Gossamer Forum

GT::Template problem with utf-8

Hello,

I'm not sure whether this is the right place to post this, but as many other GT::Template-related questions show up here, I thought it just might be.

I'm using GT::Template in a private database manager with data in utf-8. One of the optional columns contains XML markup. The Perl script I use to extract search results parses this markup with XML::LibXML and then adds the resulting value to a hash that is passed to a template.

The problem is: when I send parsed markup to GT::Template, all utf-8 characters in the *unparsed* column values come out garbled. When a record has no value for the markup column, things work fine. When I print the values directly rather than through the template, everything works fine as well.

This is with MySQL 4.1.1-alpha-standard and Perl 5.8.1.

This is the code of a test script:

Code:
#!/usr/bin/perl
use strict;
use DBI;
use XML::LibXML;
use GT::Template;

use vars qw($dbh %rec);

# The leading & is needed here: a bare connect() would call Perl's built-in socket function.
&connect;

# Fetch the test record.
my $sql = qq|select * from pv_text where ID='5'|;
my $sth = $dbh->prepare($sql) || die("can't prepare: $DBI::errstr");
$sth->execute || die("can't execute: $DBI::errstr");

while (my $hashref = $sth->fetchrow_hashref) {
    %rec = %$hashref;
    # Only the optional markup column is run through the XML parser.
    if ($rec{skt_notes}) {
        $rec{skt_notes} = &construct_apparatus($rec{skt_notes});
    }
}

print "Content-type: text/html\n\n";
print qq|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>UTF-8 oddities test</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta http-equiv="Pragma" content="no-cache" />
<meta http-equiv="Cache-Control" content="no-cache, must-revalidate" />
<meta http-equiv="Expires" content="-1" />
</head>
<body><p>|;
GT::Template->parse_print("/srv/www/htdocs/istb/cgi-bin/pv3/template/test.tmpl", \%rec);
print qq|</p></body></html>|;

&disconnect;

# Convert the <app> markup into HTML lines listing each lemma and its readings.
sub construct_apparatus {
    my $app = shift;
    my $string;
    my $parser = XML::LibXML->new();
    my $tree = $parser->parse_string($app);
    my $root = $tree->getDocumentElement;
    foreach my $app_el ($root->findnodes('app')) {
        my $lemma = $app_el->findvalue('lem');
        my $wit = $app_el->findvalue('lem/@wit');
        $string .= "Lemma: $lemma ($wit) <br />";
        foreach my $rdg ($app_el->getChildrenByTagName('rdg')) {
            my $reading = $rdg->getFirstChild->getData;
            my $wit = $rdg->getAttribute('wit');
            $string .= qq|Reading: $reading ($wit)<br />|;
        }
    }
    return ($string);
}

sub connect {
    $dbh = DBI->connect("DBI:mysql:pv:localhost", "kellner", "somepassword", {RaiseError => 1, AutoCommit => 1});
}

sub disconnect {
    $dbh->disconnect;
}

This is the template:

Code:

<table width="700" cellpadding="5" cellspacing="0" border="0"><tr><td valign="top" width="50%">
<p>
<%skt%>
</p>
<p class="small">
<%skt_notes%>
</p>
<p><%tib%></p>
</td><td valign="top">
<p>
<%trans%>
</p>
<p class="small">
<%trans_notes%>

</p>
</td></tr>
<tr><td colspan="2"><h3>Notes</h3><br /> <%notes%></td></tr>
</table>


I can provide some SQL data if needed. But perhaps this is just a simple and really stupid mistake (hopefully) ...

Thanks in advance,
kellner

Re: [kellner] GT::Template problem with utf-8
Have you tried Unicode::MapUTF8?

Code:
use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);

$var =~ s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1'})/eg;

The above would convert any UTF-8 characters into readable ISO-8859-1 characters. I use this code in my DMOZ_Wizard plugin, so that the 'World' category can be translated correctly into the right characters :)

Cheers

Andy (mod)
andy@ultranerds.co.uk


Re: [Andy] GT::Template problem with utf-8
Thanks, but why would I need this anyway? The database has utf-8 as its charset, the database table has utf-8 as charset, the data is in utf-8, and the html-charset is also utf-8. This is the way I want it.


It's just that something about GT::Template's handling produces garbled characters, but not always, only under certain conditions. This puzzles me, and I'd like to understand why.

Perhaps I have to tell GT::Template that I want it to treat the data as utf-8? (But then again, it usually does this anyway.)
kellner

Re: [kellner] GT::Template problem with utf-8
What sort of conditions cause the garbled text? GT::Template itself shouldn't care what gets passed through it.

Jason Rhinelander
Gossamer Threads
jason@gossamer-threads.com

Re: [Jagerman] GT::Template problem with utf-8
Thanks for the reply.

The conditions that cause this are, to the extent that I can track them down, described in the above code.

To summarize:

I have a database record stored in a hash. One of the hash values may contain XML markup like this:

<app><lem>some utf8 string</lem>
<rdg wit="source">another utf8 string</rdg></app>

This hash value is sent to a subroutine called construct_apparatus, and returned in HTML.

The subroutine invokes XML::LibXML, parses the string and produces HTML output.

When, after this subroutine does its job, I send the hash on to GT::Template, it garbles all hash values EXCEPT for the one parsed by XML::LibXML.

When I don't invoke the subroutine and no XML is involved, everything turns out fine.


I've uploaded two screenshots to my server.

OK output: http://mailbox.univie.ac.at/...pv_ok_screenshot.jpg

Garbled output: http://mailbox.univie.ac.at/...rbled_screenshot.jpg (the part in the left column, reading "Lemma" etc., is parsed correctly; everything else is garbled).

I'd really appreciate any help on this,

thanks,
kellner
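
A plausible explanation for the behaviour summarized above, not confirmed anywhere in this thread, is that two kinds of Perl strings end up in the same hash. XML::LibXML returns character strings with Perl's internal UTF-8 flag set, while a database driver without UTF-8 support returns the raw UTF-8 bytes with the flag off. If the two are joined into one output string, Perl treats the bytes as Latin-1 and they are encoded a second time on output. The sketch below reproduces that effect with the core Encode module only; the sample values and the hex_bytes helper are made up for illustration, and it is an assumption that the template engine joins the values before printing.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: show a byte string as space-separated hex values.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# Byte string, as assumed to come from the database: the two UTF-8 bytes
# of U+0101 (a with macron), with the internal UTF-8 flag off.
my $from_db = "\xC4\x81";

# Character string, as XML::LibXML returns it: one character, flag on.
my $from_xml = "\x{0101}";

# Joining the two upgrades the byte string as if it were Latin-1, so its
# two bytes become two separate characters (U+00C4 and U+0081).
my $joined = $from_db . $from_xml;

# Encoding the joined string for output double-encodes the database bytes.
my $garbled = encode('UTF-8', $joined);

# Decoding the database value first keeps both parts as characters.
my $clean = encode('UTF-8', decode('UTF-8', $from_db) . $from_xml);

print "garbled: ", hex_bytes($garbled), "\n";   # garbled: C3 84 C2 81 C4 81
print "clean:   ", hex_bytes($clean),   "\n";   # clean:   C4 81 C4 81
```

In the garbled line only the last two bytes, the part that came from the XML side, are still correct, which would match a page where the parsed markup looks right and everything else does not.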

Re: [kellner] GT::Template problem with utf-8
I don't think GT::Template is changing the output here - it doesn't care about any sort of character encoding, it simply prints whatever it finds in a template/tag. It looks more like the browser isn't picking up the charset string - does it show up correctly in the source of both pages?

Jason Rhinelander
Gossamer Threads
jason@gossamer-threads.com

Re: [Jagerman] GT::Template problem with utf-8
No, the source text is garbled where the HTML output is garbled.
I've checked this with Mozilla 1.6b and Konqueror 3.1.4.
kellner
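
Garbling that is already present in the page source suggests the bytes leave the script in that state. One thing that might be worth trying, assuming the cause is a mix of byte strings from the database and character strings from XML::LibXML, is to bring every hash value into the same form before it reaches the template. The following is only a sketch: decode_record is a hypothetical helper, the sample record stands in for a fetched row, and it assumes that any value without the internal UTF-8 flag holds UTF-8 encoded bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return a copy of the record in which every value is a
# character string. Values without the UTF-8 flag are assumed to be UTF-8 bytes.
sub decode_record {
    my ($rec) = @_;
    my %out;
    for my $key (keys %$rec) {
        my $value = $rec->{$key};
        if (defined $value && !utf8::is_utf8($value)) {
            $value = decode('UTF-8', $value);
        }
        $out{$key} = $value;
    }
    return \%out;
}

# Sample data standing in for a fetched row (not real table contents).
my %rec = (
    skt       => "\xC4\x81",   # raw UTF-8 bytes, as assumed from the database
    skt_notes => "\x{0101}",   # already characters, as from XML::LibXML
);

my $decoded = decode_record(\%rec);

# Encode on the way out so that characters are written as UTF-8 bytes.
binmode STDOUT, ':encoding(UTF-8)';
print "$_: $decoded->{$_}\n" for sort keys %$decoded;
```

The opposite direction should work too: passing the string returned by construct_apparatus through Encode::encode('UTF-8', ...) turns it back into bytes, so that every value in the hash is a byte string and no output layer is needed.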