Gossamer Forum

GT::Template problem with utf-8

Hello,

I'm not sure whether this is the right place to post this, but as many other GT::Template-related questions show up here, I thought it just might be.

I'm using GT::Template in a private database manager whose data is in utf-8. One of the optional columns contains XML markup. The Perl script I use to extract search results parses this markup with XML::LibXML and then adds the resulting value to a hash that is passed to a template.


The problem is: when I send parsed markup to GT::Template, all utf-8 characters in the *unparsed* column values come out garbled. When a record has no value for the markup column, things work fine. When I print the values directly rather than through the template, everything works fine as well.

This is with MySQL 4.1.1-alpha-standard and Perl 5.8.1.

This is the code of a test script:

Code:
#!/usr/bin/perl
use strict;
use DBI;
use XML::LibXML;
use GT::Template;

use vars qw($dbh %rec);
&connect;

my $sql = qq|select * from pv_text where ID='5'|;
my $sth = $dbh->prepare($sql) || die ("can't prepare: $DBI::errstr");
$sth->execute || die ("can't execute: $DBI::errstr");

# Fetch the record; if it has XML markup, convert that column to HTML
while (my $hashref = $sth->fetchrow_hashref) {
    %rec = %$hashref;
    if ($rec{skt_notes}) {
        $rec{skt_notes} = &construct_apparatus($rec{skt_notes});
    }
}

print "Content-type: text/html\n\n";
print qq|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>UTF-8 oddities test</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta http-equiv="Pragma" content="no-cache" />
<meta http-equiv="Cache-Control" content="no-cache, must-revalidate" />
<meta http-equiv="Expires" content="-1" />
</head>
<body><p>|;
GT::Template->parse_print("/srv/www/htdocs/istb/cgi-bin/pv3/template/test.tmpl", \%rec);
print qq|</p></body></html>|;

&disconnect;

# Parse the XML markup with XML::LibXML and build an HTML string from it
sub construct_apparatus {
    my $app = shift;
    my $string;
    my $parser = XML::LibXML->new();
    my $tree = $parser->parse_string($app);
    my $root = $tree->getDocumentElement;
    foreach my $app_el ($root->findnodes('app')) {
        my $lemma = $app_el->findvalue('lem');
        my $wit = $app_el->findvalue('lem/@wit');
        $string .= "Lemma: $lemma ($wit) <br />";
        foreach my $rdg ($app_el->getChildrenByTagName('rdg')) {
            my $reading = $rdg->getFirstChild->getData;
            my $wit = $rdg->getAttribute('wit');
            $string .= qq|Reading: $reading ($wit)<br />|;
        }
    }
    return ($string);
}

sub connect {
    $dbh = DBI->connect("DBI:mysql:pv:localhost", "kellner", "somepassword", {RaiseError => 1, AutoCommit => 1});
}

sub disconnect {
    $dbh->disconnect;
}

This is the template:

Code:

<table width="700" cellpadding="5" cellspacing="0" border="0"><tr><td valign="top" width="50%">
<p>
<%skt%>
</p>
<p class="small">
<%skt_notes%>
</p>
<p><%tib%></p>
</td><td valign="top">
<p>
<%trans%>
</p>
<p class="small">
<%trans_notes%>

</p>
</td></tr>
<tr><td colspan="2"><h3>Notes</h3><br /> <%notes%></td></tr>
</table>


I can provide some SQL data if needed. But perhaps (hopefully) this is just a simple and really stupid mistake ...

Thanks in advance,
kellner

Re: [kellner] GT::Template problem with utf-8
Have you tried Unicode::MapUTF8?

Code:
use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);

$var =~ s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1'})/eg;

The above would convert any UTF-8 characters into readable characters. I use this code in my DMOZ_Wizard plugin, so that the 'World' category can be translated correctly into the right characters :)

Cheers

Andy (mod)
andy@ultranerds.co.uk

Re: [Andy] GT::Template problem with utf-8
Thanks, but why would I need this anyway? The database has utf-8 as its charset, the database table has utf-8 as its charset, the data is in utf-8, and the HTML charset is also utf-8. This is the way I want it.


It's just that something about GT::Template's handling produces garbled characters, but not always - only under certain conditions. This puzzles me, and I'd like to understand it.

Perhaps I have to tell GT::Template that I want it to treat the data as utf-8? (But then again, it usually does this anyway.)
kellner

Re: [kellner] GT::Template problem with utf-8
What sort of conditions cause the garbled text? GT::Template itself shouldn't care what gets passed through it.

Jason Rhinelander
Gossamer Threads
jason@gossamer-threads.com

Re: [Jagerman] GT::Template problem with utf-8
Thanks for the reply.

The conditions that cause this are, to the extent that I can track them down, described in the above code.

To summarize:

I have a database record stored in a hash. One of the hash values may contain XML markup like this:

<app><lem>some utf8 string</lem>
<rdg wit="source">another utf8 string</rdg></app>

This hash value is sent to a subroutine called construct_apparatus, which returns it as HTML.

The subroutine invokes XML::LibXML, parses the string and produces HTML output.

When, after this subroutine does its job, I send the hash on to GT::Template, it garbles all hash values EXCEPT for the one parsed by XML::LibXML.

When I don't invoke the subroutine and no XML is involved, everything turns out fine.
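
To narrow down conditions like these, it can help to check which hash values Perl treats as decoded character strings and which as raw bytes. The following is only a minimal sketch and not part of the test script above: flag_report is a hypothetical helper, and the sample record merely imitates what the database and the XML parser might return.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report for each hash value whether Perl has it
# flagged as a character string (UTF-8 flag on) or as a plain byte string.
sub flag_report {
    my ($rec) = @_;
    my @lines;
    for my $key (sort keys %$rec) {
        my $kind = utf8::is_utf8($rec->{$key}) ? 'characters' : 'bytes';
        push @lines, "$key: $kind";
    }
    return @lines;
}

# Assumed sample data: "skt" imitates undecoded UTF-8 bytes from the
# database, "skt_notes" imitates a decoded string from an XML parser.
my %sample = (
    skt       => "\xC4\x81",
    skt_notes => "Lemma: \x{0101}",
);

print "$_\n" for flag_report(\%sample);
```

If the values in %rec turn out to be a mixture of both kinds after construct_apparatus has run, that would be worth mentioning when describing the problem.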


I've uploaded two screenshots to my server.

OK output: http://mailbox.univie.ac.at/...pv_ok_screenshot.jpg

Garbled output: http://mailbox.univie.ac.at/...rbled_screenshot.jpg (the part in the left column, reading "Lemma" etc., is parsed correctly; everything else is garbled).

I'd really appreciate any help on this,

thanks,
kellner

Re: [kellner] GT::Template problem with utf-8
I don't think GT::Template is changing the output here. It doesn't care about any sort of character encoding; it simply prints whatever it finds in a template/tag. It looks more like the browser isn't picking up the charset string - does it show up correctly in the source of both pages?

Jason Rhinelander
Gossamer Threads
jason@gossamer-threads.com

Re: [Jagerman] GT::Template problem with utf-8
No, the source text is garbled where the HTML output is garbled.
I've checked this with Mozilla 1.6b and Konqueror 3.1.4.
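
Since the page source itself is garbled, the bytes are presumably altered before they reach the browser. One plausible cause, not confirmed anywhere in this thread, is that Perl joins a decoded character string (which XML::LibXML returns) with undecoded utf-8 bytes (which the database driver may return) and upgrades the bytes as if they were Latin-1. The sketch below only illustrates that general Perl behaviour under these assumptions; hex_bytes is a hypothetical helper and the sample strings are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: show a byte string as uppercase hex digits.
sub hex_bytes {
    my ($bytes) = @_;
    return uc unpack('H*', $bytes);
}

# Assumption: the database value arrives as raw UTF-8 bytes.
my $from_db = "\xC3\xA4";      # the two UTF-8 bytes of U+00E4

# Assumption: the XML parser returns a decoded character string.
my $from_xml = "\x{0101}";     # U+0101 as a single character

# Joining the two upgrades each database byte to a character of its own,
# so encoding the result as UTF-8 encodes those bytes a second time.
my $garbled = encode('UTF-8', $from_db . $from_xml);

# Decoding the database value first keeps both parts as characters.
my $clean = encode('UTF-8', decode('UTF-8', $from_db) . $from_xml);

print "mixed:   ", hex_bytes($garbled), "\n";
print "decoded: ", hex_bytes($clean), "\n";
```

In this sketch the mixed string comes out as C383C2A4C481, while the decoded one comes out as C3A4C481. The part that came from the parser (C481) is intact in both cases, which would match the symptom that only the unparsed columns are garbled.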
kellner