
Converting UTF8 to ISO-8859-1...

Below is some code Alex posted a while back to convert UTF-8 to ISO-8859-1. Unfortunately, when I run it on large cuts of the DMOZ RDF (like Regional), I run out of memory (my server has 1.5 GB). Is there an easy way to modify this code so that it doesn't load the entire cut into memory, convert it, and only then write it to a file? Maybe it could use a temp/swap file and write the data a piece at a time. Any ideas?

Code:
use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);
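# Read the entire input file into $data
open (FH, "$rdf_path/Regional.rdf.u8") or die "Cannot open input: $!";
read (FH, $data, -s FH);
close FH;

# Convert every run of high-bit bytes, then write the whole result at once
open (OUT, "> $out_path/Regional.rdf") or die "Cannot open output: $!";
$data =~ s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1'})/eg;
print OUT $data;
close OUT;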

Thanks in advance,
Sean
Re: [SeanP] Converting UTF8 to ISO-8859-1...
Hmm, you could try reading the file line by line in a while() loop instead of loading the whole thing into memory.
Re: [Paul] Converting UTF8 to ISO-8859-1...
Good call. I'll try to redo this thing a little. Thanks.

Sean
Re: [SeanP] Converting UTF8 to ISO-8859-1...
Code:
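use Unicode::MapUTF8 qw(from_utf8);

open FH, "$rdf_path/Regional.rdf.u8" or die "Cannot open input: $!";
open OUT, ">$out_path/Regional.rdf" or die "Cannot open output: $!";
# Convert and write one line at a time, so only one line is in memory
while (<FH>) {
    s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1'})/eg;
    print OUT;
}
close OUT;
close FH;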

That might help.
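
If your Perl is 5.8 or newer, the same line-at-a-time idea can also be sketched with the core :encoding I/O layers instead of Unicode::MapUTF8. This is a different technique from the code above, not a drop-in copy of it, and the sub name convert_utf8_to_latin1 is made up for illustration. Characters that have no ISO-8859-1 equivalent still cannot be represented, so check the output for those.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the original code):
# stream-convert a UTF-8 file to ISO-8859-1 one line at a time.
sub convert_utf8_to_latin1 {
    my ($in_path, $out_path) = @_;

    # The :encoding layers decode on read and encode on write,
    # so no explicit conversion call is needed inside the loop.
    open my $in,  '<:encoding(UTF-8)',      $in_path
        or die "Cannot read $in_path: $!";
    open my $out, '>:encoding(ISO-8859-1)', $out_path
        or die "Cannot write $out_path: $!";

    # Only the current line is held in memory.
    while (my $line = <$in>) {
        print {$out} $line;
    }

    close $out or die "Cannot close $out_path: $!";
    close $in;
    return;
}
```

You would call it as convert_utf8_to_latin1("$rdf_path/Regional.rdf.u8", "$out_path/Regional.rdf").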

Last edited by Paul: Jul 22, 2002, 2:10 PM
Re: [Paul] Converting UTF8 to ISO-8859-1...
That works great! Thanks so much for your help!

Sean