Gossamer Forum

convert to UTF8?

I'm trying to fix an issue for someone who's using a meta-search engine on a French web site. I would have assumed that the data being pulled in is already in UTF-8, since the engines being searched are French as well. But apparently it's not, or it's getting munged at some point.

In his code, I've tried:

Code:
use Encode;
foreach my $r (@$results)
{
    $r->{title} = encode_utf8($r->{title});
    $r->{description} = encode_utf8($r->{description});
}

and:
Code:
foreach my $r (@$results)
{
    utf8::encode($r->{title});
    utf8::encode($r->{description});
}

The client says neither solution seems to be working correctly. What else should I try?

Philip
------------------
Limecat is not pleased.
Re: [fuzzy logic] convert to UTF8? In reply to
Every program that handles the data needs to know which encoding it is in and needs to understand that encoding. If the initial encoding is ISO-8859-15 or Windows-1252 but your system's encoding is UTF-8, you might not be able to re-encode the characters properly unless you first decode them from their real encoding. If the characters were encoded properly to UTF-8 but something downstream still expects the previous encoding, they will not display properly.
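
To make the mismatch concrete, here is a minimal sketch using the core Encode module. It assumes the input is a single "é" in ISO-8859-1; the variable names are only for illustration. It shows the conversion when the source encoding is known, and what happens when encode_utf8 is called on bytes that are already UTF-8:

Code:
```perl
use strict;
use warnings;
use Encode qw(decode encode encode_utf8);

# Assumption: the remote site sent "é" as one ISO-8859-1 byte.
my $latin1_bytes = "\xE9";

# Decode from the real source encoding into Perl characters,
# then encode those characters as UTF-8 bytes.
my $chars      = decode('ISO-8859-1', $latin1_bytes);
my $utf8_bytes = encode('UTF-8', $chars);    # "\xC3\xA9"

# Calling encode_utf8 on bytes that are already UTF-8 treats every byte
# as a Latin-1 character and encodes it again ("double encoding").
my $double = encode_utf8($utf8_bytes);       # "\xC3\x83\xC2\xA9"

printf "correct: %vX\n", $utf8_bytes;
printf "double:  %vX\n", $double;
```

If the page shows "Ã©" where "é" should be, double encoding like this is a likely suspect.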

Check the charset specified by the server in the result pages. If no charset is specified, user agents must assume ISO-8859-1 (which is missing Œ, œ, and Ÿ).
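
As a rough sketch of that check, the helper below pulls the charset out of a Content-Type header value and falls back to ISO-8859-1 when none is given. The name charset_from_content_type is made up for this example, and the regex only covers the common "charset=..." form, not every legal header syntax:

Code:
```perl
use strict;
use warnings;

# Hypothetical helper: returns the charset named in a Content-Type
# header value, or ISO-8859-1 when the header does not name one.
sub charset_from_content_type {
    my ($content_type) = @_;
    if (defined $content_type
        && $content_type =~ /;\s*charset\s*=\s*"?([^\s;"]+)"?/i)
    {
        return $1;
    }
    return 'ISO-8859-1';
}

print charset_from_content_type('text/html; charset=UTF-8'), "\n";
print charset_from_content_type('text/html'), "\n";
```

The header value itself would come from whatever HTTP client the meta-search script already uses.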
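You might have to specify use encoding 'something_or_other'; in your program, or decode the incoming data explicitly with the Encode module, and then write the characters to a file or stream that has a UTF-8 output layer so they don't get downgraded. Note that 'use utf8;' on its own only tells Perl that the source code itself is written in UTF-8.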
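
Applied to the loop from the original post, that could look something like the sketch below. It assumes the engines send Windows-1252 (substitute whatever charset the server actually reports), uses made-up sample data in place of the real @$results, and writes to an in-memory handle only so the example is self-contained:

Code:
```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the remote engine sends Windows-1252 bytes.
my $source_charset = 'windows-1252';

# Sample data standing in for the real @$results (hypothetical values).
# Byte 0x9C is the "oe" ligature in Windows-1252.
my $results = [
    { title => "Caf\xE9", description => "\x9Cuvre" },
];

# Decode each field from the source encoding into Perl characters.
for my $r (@$results) {
    for my $field (qw(title description)) {
        $r->{$field} = decode($source_charset, $r->{$field});
    }
}

# Let the output layer do the UTF-8 encoding. An in-memory handle is
# used here; a real script would put the layer on STDOUT or a file.
my $output = '';
open(my $out, '>:encoding(UTF-8)', \$output)
    or die "Cannot open in-memory handle: $!";
print {$out} "$_->{title}: $_->{description}\n" for @$results;
close($out);
```

For normal output, binmode(STDOUT, ':encoding(UTF-8)') sets the same layer on STDOUT, so there is no need to call encode_utf8 by hand on each field.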


The forum says it supports UTF-8... does it really?
«»્ਉ餉٦

Last edited by:

mkp: Jul 18, 2006, 7:20 PM