Gossamer Forum

Just a quickie... regarding non-English characters

Hi guys. Just a quick one.

Is there a way to decode something like:

World/Espa%c3%b1ol

...to:

World/Español

I'm not quite sure what the %c3%b1 is or how it's set up :(
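For reference, `%c3%b1` is the percent-encoded form of the two UTF-8 bytes for ñ (U+00F1). This minimal sketch uses Perl's core Encode module (the variable names are only for illustration) to show how that escape is produced from the character:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $char  = "\x{F1}";                 # U+00F1, LATIN SMALL LETTER N WITH TILDE
my $bytes = encode('UTF-8', $char);   # UTF-8 needs two bytes here: 0xC3 0xB1
# Percent-escape every byte as %xx (lowercase hex, as in the URL above)
my $escaped = join '', map { sprintf '%%%02x', ord } split //, $bytes;
print "$escaped\n";                   # prints %c3%b1
```

Decoding simply runs this in reverse: turn each `%xx` back into a byte, then read the bytes as UTF-8.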

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] Just a quickie... regarding non-English characters
This should do it:

Code:
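use strict;
use warnings;
use Unicode::String;

my $str = 'World/Espa%c3%b1ol';
# Turn each %XX escape back into the raw byte it stands for
$str =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/eg;
# Read those bytes as UTF-8 and print them converted to Latin-1
my $ustr = Unicode::String->new($str);
print $ustr->latin1();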

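If your OS is Unicode-friendly (like OS X), you can skip the Unicode::String step and just use the regex: the unescaped bytes are already UTF-8, so a UTF-8 terminal will display them correctly as they are.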
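On Perl 5.8 or later, the core Encode module can do the decoding step without Unicode::String. A sketch of that alternative, which keeps the result as a Perl character string instead of converting it to Latin-1 (the variable names are just for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode);

my $str = 'World/Espa%c3%b1ol';
# Undo the %XX escapes to get the raw UTF-8 bytes
(my $bytes = $str) =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/eg;
# Decode the bytes into characters: 0xC3 0xB1 becomes the single character U+00F1
my $chars = decode('UTF-8', $bytes);

binmode(STDOUT, ':encoding(UTF-8)');   # write the characters back out as UTF-8
print "$chars\n";                      # prints World/Español
```

Keeping text as characters internally and choosing the output encoding only when printing avoids converting the same data twice.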

-gordon

s/(\d{2})/chr($1)/ge + print if $_ = '8284703280698276687967';

Last edited by GClemmons: Jul 21, 2004, 5:33 PM
Re: [GClemmons] Just a quickie... regarding non-English characters
Thanks Gordon. I'll give that a go :)

Cheers

Andy (mod)
Re: [GClemmons] Just a quickie... regarding non-English characters
Thanks. That worked OK, except it looks like the RDF file uses stuff like:

<Topic r:id="Top/World/EspaÃ±ol">

... what's the Ã±? Do you think there's any way to go through the RDF file and convert each non-English character correctly? Unicode::MapUTF8 kind of works, but if you run it more than once, it seems to remove foreign characters altogether: "EspaÃ±ol" changes to "Español" on the first run, and running it again turns it into "Espaol" (i.e. the ñ goes missing).

Any ideas?

TIA

Andy (mod)
Re: [Andy] Just a quickie... regarding non-English characters
Quote:
<Topic r:id="Top/World/EspaÃ±ol">

Yeah, that's what a UTF-8 string looks like when it's viewed through an interface that treats every byte as a separate single-byte (ASCII/Latin-1) character: the two bytes that make up ñ show up as "Ã±". If you want to see the Latin-1 equivalent, you can use Unicode::String to convert it. Of course, that only works for Latin characters; non-Latin characters (like Cyrillic or Japanese) have no Latin-1 equivalent, so the conversion won't work for them. If your data is in Unicode format, I would leave it that way unless there is a good reason to switch it. If your presentation layer provides the right headers, everything should come out fine on the browsers.
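To illustrate the point above, this sketch (core Encode only; the word is just the example from this thread) takes the UTF-8 bytes for "Español" and misreads them one byte per character as Latin-1, which is exactly how "Ã±" appears:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $word = "Espa\x{F1}ol";             # "Español" as Perl characters
my $utf8 = encode('UTF-8', $word);     # ñ becomes the two bytes 0xC3 0xB1
# Misread the UTF-8 bytes as Latin-1: each byte turns into its own character
my $mojibake = decode('ISO-8859-1', $utf8);

binmode(STDOUT, ':encoding(UTF-8)');
print "$mojibake\n";                   # prints EspaÃ±ol
```

Byte 0xC3 is "Ã" in Latin-1 and 0xB1 is "±", so one real character is displayed as two.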

-gordon

Re: [GClemmons] Just a quickie... regarding non-English characters
Quote:
I would leave it that way unless there is a good reason to switch it. If your presentation layer provides the right headers, everything should come out fine on the browsers.

The problem I'm having is with my DMOZ wizard. I'm working on a cleanup routine to add to it, which looks like this so far:

Code:
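use strict;
use warnings;
use Unicode::MapUTF8 qw(from_utf8);

sub run_rdf_cleanup {

    print "Cleaning up RDF file... \n";

    # Keep the original file as a backup and read from it
    rename('content.rdf.u8', 'content.rdf.u8.2') or die "Can't rename content.rdf.u8: $!";

    open(my $in,  '<', 'content.rdf.u8.2') or die "Can't read content.rdf.u8.2: $!";
    open(my $out, '>', 'content.rdf.u8')   or die "Can't write content.rdf.u8: $!";
    while (<$in>) {
        # Convert each run of high-bit (non-ASCII) bytes from UTF-8 to ISO-8859-1
        s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1' })/eg;
        print $out $_;
    }
    close($out);
    close($in);

}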

It's far from perfect, but I'm hoping it will do the trick.

As you say, though, it won't be able to handle Big5 and similar encodings, which is a bit of a pain. Fortunately, almost everyone who uses my plugin imports English categories, but there are always a few who need non-English characters, such as French, German, Spanish, etc.

It's a tough one.

I guess what I could do is just put a note on the details page letting people know that it WON'T be able to import non-Latin characters (sorry, thinking out loud :) ).
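One possible way around both the run-it-twice problem and the non-Latin characters is sketched below, using the core Encode module instead of Unicode::MapUTF8; the helper name `utf8_line_to_latin1` is hypothetical. It converts a line only if the line is still valid UTF-8, so a second run leaves already-converted lines alone, and it writes characters with no Latin-1 equivalent as XML numeric character references (e.g. `&#x416;`) instead of dropping them. The validity test is a heuristic: Latin-1 text that happens to form valid UTF-8 byte sequences would still be converted again.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert one line of UTF-8 bytes to Latin-1 bytes.
sub utf8_line_to_latin1 {
    my ($line) = @_;
    my $copy = $line;    # decode() with a CHECK flag may consume its input
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    # Not valid UTF-8: assume the line was already converted and keep it
    return $line unless defined $chars;
    # Characters outside Latin-1 become &#xHHHH; references instead of vanishing
    return encode('ISO-8859-1', $chars, Encode::FB_XMLCREF);
}

my $once  = utf8_line_to_latin1("Espa\xC3\xB1ol\n");   # gives "Espa\xF1ol\n"
my $twice = utf8_line_to_latin1($once);                # unchanged on a 2nd run
my $ru    = utf8_line_to_latin1("\xD0\x96\n");         # Cyrillic Zhe, U+0416
print $ru;                                             # prints &#x416;
```

Inside a read/write loop like the one above, each line could be passed through this helper in place of the substitution. Numeric character references are valid in an XML file such as the RDF dump, so the non-Latin categories keep their names.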

Cheers

Andy (mod)
Re: [GClemmons] Just a quickie... regarding non-English characters
Yaaaay... seems to be working :)

I'm gonna give it a few more tests, and if it's OK, I'll get it finished off and get the update released.

Thanks for your help on this one Gordon :)

Cheers

Andy (mod)
Re: [Andy] Just a quickie... regarding non-English characters
Any time. Regarding the Spanish, French, German, etc. characters, you can still convert those to Latin-1; the problem lies in the non-Latin characters like Arabic, Cyrillic, Japanese, etc.
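That distinction can be checked directly: a decoded (character) string fits in Latin-1 only if every character is at or below U+00FF. A small sketch, with a made-up helper name `fits_latin1`, that could be used to flag categories the conversion can't handle:

```perl
use strict;
use warnings;

# Hypothetical helper: true if every character of a decoded string has a
# Latin-1 equivalent, i.e. a code point of U+00FF or below.
sub fits_latin1 {
    my ($chars) = @_;
    return $chars !~ /[^\x00-\xFF]/;
}

my %samples = (
    Spanish  => "Espa\x{F1}ol",                                          # ñ
    French   => "Fran\x{E7}ais",                                         # ç
    German   => "M\x{FC}nchen",                                          # ü
    Russian  => "\x{420}\x{443}\x{441}\x{441}\x{43A}\x{438}\x{439}",     # Cyrillic
    Japanese => "\x{65E5}\x{672C}\x{8A9E}",                              # Kanji
);
for my $name (sort keys %samples) {
    printf "%-8s %s\n", $name, fits_latin1($samples{$name}) ? 'ok' : 'not Latin-1';
}
```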

Good luck with the release.

-gordon

Re: [GClemmons] Just a quickie... regarding non-English characters
Quote:
the problem lies in the non-Latin characters like Arabic, Cyrillic, Japanese, etc.

Yeah, I'm just going to put something on the site about Arabic, Cyrillic, and Japanese imports.

Thanks again.

Andy (mod)