Gossamer Forum

Just a quickie .. regarding non-English characters...

Hi guys. Just a quick one.

Is there a way to decode something like;

World/Espa%c3%b1ol

...to;

World/Español

I'm not quite sure what the %c3%b1 is or how it gets assigned :(

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] Just a quickie .. regarding non-English characters...
This should do it

Code:
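use strict;
use warnings;
use Unicode::String;

my $str = 'World/Espa%c3%b1ol';
# Turn each %XX escape back into the byte it encodes.
$str =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/eg;
# Wrap the resulting UTF-8 bytes and print them as Latin-1.
my $ustr = Unicode::String->new($str);
print $ustr->latin1();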

If your terminal or output layer already expects UTF-8 (as on OS X), you can skip the Unicode::String bit and just use the regex.
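
For reference, here is a minimal sketch of the same decoding using only the core Encode module, assuming Perl 5.8 or later. The sub name `decode_percent_utf8` is made up for illustration and is not part of any library. It shows that `%c3%b1` is simply the two UTF-8 bytes of a single character, percent-escaped for use in a URL.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: undo %XX escaping, then read the bytes as UTF-8.
sub decode_percent_utf8 {
    my ($escaped) = @_;
    (my $bytes = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return decode('UTF-8', $bytes);    # Perl character string
}

my $text = decode_percent_utf8('World/Espa%c3%b1ol');

# Report how many characters came out and which code point the
# escaped pair turned into (index 10 is the character after "Espa").
printf "characters: %d, escaped pair decodes to U+%04X\n",
    length($text), ord(substr($text, 10, 1));

# Re-encode for whatever the output side expects.
my $latin1 = encode('ISO-8859-1', $text);    # one byte for the accented letter
my $utf8   = encode('UTF-8',      $text);    # two bytes for the accented letter
```

Keeping the decoded character string around and encoding only at output time means the same value can be written as Latin-1 or UTF-8 without decoding the URL twice.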

-gordon

s/(\d{2})/chr($1)/ge + print if $_ = '8284703280698276687967';

Last edited by: GClemmons, Jul 21, 2004, 5:33 PM
Re: [GClemmons] Just a quickie .. regarding non-English characters...
Thanks Gordon. I'll give that a go :)

Cheers

Andy (mod)
Re: [GClemmons] Just a quickie .. regarding non-English characters...
Thanks. That worked ok, except it looks like the RDF file uses stuff like;

<Topic r:id="Top/World/Espa├▒ol">

... what's the ├▒? Do you think there is any way to go through the RDF file and convert each non-English character correctly? (Unicode::MapUTF8 kind of works, but if you run it more than once, it seems to get rid of foreign characters altogether, i.e. "Espa├▒ol" changes to "Español", and then once you run it again, it turns into "Espaol", with the ñ missing.)

Any ideas?

TIA

Andy (mod)
Re: [Andy] Just a quickie .. regarding non-English characters...
Quote:
<Topic r:id="Top/World/Español">

Yeah, that's what a UTF-8 string looks like when it is viewed through an interface that assumes a single-byte charset. If you want to see the Latin-1 equivalent, you can use Unicode::String to convert it. Of course, that only works for characters that exist in Latin-1; if you have non-Latin characters (like Cyrillic or Japanese), it won't work. If your data is in Unicode format, I would leave it that way unless there is a good reason to switch it. If your presentation layer provides the right headers, everything should come out fine on the browsers.
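
To make the point about misread UTF-8 concrete, here is a small sketch using the core Encode module. It takes the two bytes that UTF-8 uses for the accented letter and interprets them under three charsets. The choice of ISO-8859-1 and cp437 as the "wrong" charsets is an assumption picked to match the two garbled forms quoted in this thread, not something confirmed about the original poster's setup.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes UTF-8 uses for U+00F1 (n with tilde).
my $bytes = "\xC3\xB1";

# Interpret the same bytes under three different charsets.
my %seen_as = (
    'UTF-8'      => decode('UTF-8',      $bytes),
    'ISO-8859-1' => decode('ISO-8859-1', $bytes),
    'cp437'      => decode('cp437',      $bytes),
);

# Print each interpretation with its code points (needs a UTF-8 terminal).
binmode(STDOUT, ':encoding(UTF-8)');
for my $charset (sort keys %seen_as) {
    my $chars = $seen_as{$charset};
    my $codes = join ' ', map { sprintf 'U+%04X', ord } split //, $chars;
    print "$charset: $chars ($codes)\n";
}
```

Only the UTF-8 reading yields a single character; the other two each produce two unrelated characters, which is why the same category name can show up garbled in different ways depending on the viewer.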

-gordon

Re: [GClemmons] Just a quickie .. regarding non-English characters...
Quote:
I would leave it that way unless there is a good reason to switch it. If your presentation layer provides the right headers, everything should come out fine on the browsers.

The problem I am having is with my DMOZ wizard. I'm working on a cleanup routine to put in it, which looks like this so far:

Code:
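use Unicode::MapUTF8 qw(from_utf8);

sub run_rdf_cleanup {
    print "Cleaning up RDF file... \n";

    # Keep the original as content.rdf.u8.2 and write the cleaned copy in its place.
    rename('content.rdf.u8', 'content.rdf.u8.2') or die "rename failed: $!";

    open(my $in,  '<', 'content.rdf.u8.2') or die $!;
    open(my $out, '>', 'content.rdf.u8')   or die $!;
    while (my $line = <$in>) {
        # Convert each run of high-bit (non-ASCII) bytes from UTF-8 to ISO-8859-1.
        $line =~ s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1' })/eg;
        print {$out} $line;
    }
    close($out) or die $!;
    close($in);
}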

It's far from perfect, but I'm hoping it will do the trick.
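
One possible way to guard against the repeated-run problem mentioned earlier in the thread (a second pass stripping the accented characters) is to convert a run of bytes only when it is valid UTF-8. The sketch below swaps in the core Encode module instead of Unicode::MapUTF8, and the helper name `utf8_to_latin1_once` is hypothetical. It is an illustration under those assumptions, not the plugin's actual code.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: convert a run of bytes from UTF-8 to ISO-8859-1,
# but return it untouched if it is not valid UTF-8 (for example because
# an earlier pass already converted it) or does not fit in ISO-8859-1.
sub utf8_to_latin1_once {
    my ($bytes) = @_;
    my $converted = eval {
        my $chars = decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC);
        encode('ISO-8859-1', $chars, FB_CROAK | LEAVE_SRC);
    };
    return defined $converted ? $converted : $bytes;
}

# Run the same substitution twice over one line of RDF.
my $line = qq{<Topic r:id="Top/World/Espa\xC3\xB1ol">\n};
for my $pass (1 .. 2) {
    $line =~ s/([\200-\377]+)/utf8_to_latin1_once($1)/eg;
}
```

The second pass leaves the line alone because a lone Latin-1 byte is not a valid UTF-8 sequence. This is a heuristic: a Latin-1 run that happens to form valid UTF-8 would still be converted again, so tracking whether the file has already been cleaned is the more robust option.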

As you say though, it won't be able to handle BIG5 and similar formats, which is a bit of a pain. Fortunately, almost everyone who uses my plugin imports English categories, but there are always a few who need non-English characters, such as French, German, Spanish, etc.

It's a tough one.

I guess what I could do is just put something up on the detailed page, letting people know that it WON'T be able to import non-UTF8 characters (sorry, thinking out loud :) ).

Cheers

Andy (mod)
Re: [GClemmons] Just a quickie .. regarding non-English characters...
Yaaaay... seems to be working :)

I'm gonna give it a few more tests, and if it's OK, then I'll get it finished off and get the update released.

Thanks for your help on this one Gordon :)

Cheers

Andy (mod)
Re: [Andy] Just a quickie .. regarding non-English characters...
Any time. Regarding the Spanish, French, German, etc. characters, you can still convert those to Latin-1; the problem lies in the non-Latin characters like Arabic, Cyrillic, Japanese, etc.
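
As a rough illustration of that split, the sketch below checks whether a UTF-8 byte string can be represented in Latin-1 at all, by testing that every decoded character has a code point of U+00FF or lower. The helper name `fits_in_latin1` and the sample strings are made up for this example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if every character of a UTF-8 byte string
# has a code point of U+00FF or below, i.e. exists in ISO-8859-1.
sub fits_in_latin1 {
    my ($utf8_bytes) = @_;
    my $chars = decode('UTF-8', $utf8_bytes);
    return $chars !~ /[^\x{00}-\x{FF}]/;
}

# Sample category names as raw UTF-8 bytes: Spanish, French, Russian.
my @names = (
    "Espa\xC3\xB1ol",
    "Fran\xC3\xA7ais",
    "\xD0\xA0\xD1\x83\xD1\x81\xD1\x81\xD0\xBA\xD0\xB8\xD0\xB9",
);

for my $name (@names) {
    my $verdict = fits_in_latin1($name)
        ? 'can be converted to Latin-1'
        : 'has to stay in UTF-8';
    print "$name: $verdict\n";
}
```

A check like this could be used to skip or flag categories before conversion rather than silently dropping characters.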

Good luck with the release.

-gordon

Re: [GClemmons] Just a quickie .. regarding non-English characters...
Quote:
the problem lies in the non-Latin characters like Arabic, Cyrillic, Japanese, etc.

Yeah, I'm just going to put something on the site about the Arabic, Cyrillic, and Japanese imports.

Thanks again.

Andy (mod)