
string encode problem

I have a character, $string = "€". When I run it through the JavaScript below,
I get a 7-byte string like this: $encoded_result = "&#x20AC;"

Is there an equivalent module or piece of code in Perl that does the same job, turning $string = "€" into $encoded_result = "&#x20AC;"?

Code:


<SCRIPT type=text/javascript>
function ConvUtf(obj, btn) {
  document.getElementById("result").value = obj.value.replace(/[^\u0000-\u00FF]/g, function ($0) {
    return escape($0).replace(/(%u)(\w{4})/gi, "&#x$2;");
  });
}
function ResChinese(obj, btn) {
  document.getElementById("content").value = unescape(obj.value.replace(/&#x/g, '%u').replace(/;/g, ''));
}
</SCRIPT>
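
For comparison, a rough Perl analogue of ConvUtf above might look like the sketch below. The helper name conv_utf is made up for illustration, and the sketch assumes the string already holds decoded characters (written here with a \x{...} escape) rather than raw bytes.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring ConvUtf: replace every character above
# U+00FF with a hexadecimal numeric character reference.
sub conv_utf {
    my ($text) = @_;
    $text =~ s/([^\x{00}-\x{FF}])/sprintf('&#x%04X;', ord $1)/ge;
    return $text;
}

# \x{20AC} is the euro sign; the output is plain ASCII.
print conv_utf("price: 5 \x{20AC}"), "\n";   # price: 5 &#x20AC;
```

Characters at or below U+00FF pass through untouched, just as in the JavaScript version.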




I have asked some Perl gurus, and some of them mentioned HTML::Entities.
However, in my testing I can't get the expected result.

Here is the code I tested; its output is "&#x80;":



#!/usr/bin/perl
#use HTML::Entities;
use HTML::Entities qw(encode_entities_numeric);
$string = "€"; # ("&#x20AC;" expected result)
$encoded_result = encode_entities_numeric($string);
print $encoded_result;




Is there anything I am missing? Or is there any other code I could use for this kind of encoding?


Thanks in advance

Last edited by:

courierb: Jan 19, 2011, 8:06 PM
Re: [courierb] string encode problem In reply to
It seems some of the JavaScript has been truncated,
so I am uploading it again here as an attachment.

Last edited by:

courierb: Jan 19, 2011, 8:04 PM
Re: [courierb] string encode problem In reply to
Hi,

Have you tried something like:

Code:
$encoded = join '', map { '&#'.ord.';' } split //, $string;

So something like this may be better:

Code:
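my $string = q|some example with things like $ ^ % & * €|;
my $back = '';
foreach (split //, $string) {
    # Replace anything that is not a digit, word character or space
    # with a decimal numeric entity built from its ordinal value.
    if ($_ !~ /^[\d\w ]$/) {
        $_ = "&#" . ord($_) . ";";
    }
    $back .= $_;
}

print "FOO: $back ";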

Just tested that code locally, and looks like it works perfectly - gives out stuff like:

Code:
FOO: some example with things like &#36; &#94; &#37; &#38; &#42; &#128;

Hope that helps.

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] string encode problem In reply to
Hi Andy,

I really appreciate your reply.

Your code does encode the string. However, "€" has been encoded to "&#128", which is 5 digits, while I need to convert "€" to "&#x20AC;", which is 7 digits.


Most importantly, after conversion, if you put "&#x20AC;" into a web page, it shows the character "€" directly on the page.

For example, if you open the code below in any browser:

<html >

"&#x20AC;"

</html>


you will see the character "€", which is the result I want.
Someone suggested that I use HTML::Entities to encode it.
However, I can't get it to work for me (although its decode function works perfectly for me).

Logically, if a module can decode perfectly, it should be able to encode perfectly too, so I suspect I have just messed something up in my encoding code.



Here is my test code

Code:


#!/usr/bin/perl
use 5.010;    # needed for say
use strict;
use warnings;
use HTML::Entities qw[encode_entities_numeric];
my $string = "€";
my $encoded = encode_entities_numeric $string;
say $encoded;



It gives output like:
Code:
&#128


which is exactly the same as yours (5 digits only), while I am expecting a 7-digit encoded string. Most importantly, if I put "&#128" into an HTML page, the browser shows a corrupted character.

Would it be possible to have a double-byte character ("€") encoded into a 7-digit string?


Thanks and have a nice day

Last edited by:

courierb: Jan 20, 2011, 7:19 AM
Re: [courierb] string encode problem In reply to
You sure? See attached image.

NOTE: You are missing ; at the end of the string

For example - this ISN'T valid:

Code:
&#128

..but this IS:

Code:
&#128;

Hope that helps

Cheers

Andy (mod)

Last edited by:

Andy: Jan 20, 2011, 7:24 AM
Re: [courierb] string encode problem In reply to
BTW, exact same results ;)

Code:
<html >

&#x20AC;
&#128;

</html>

Cheers

Andy (mod)
Re: [Andy] string encode problem In reply to
Dear Andy

You are right

The output now is &#128;
Re: [courierb] string encode problem In reply to
Andy, thank you very much.

It seems that it does show the same character this time.


I will test various double-byte characters, such as Thai, Japanese or Chinese, and check the encoded output in various browsers.


Thank you very much for your help.
Re: [courierb] string encode problem In reply to
Dear Andy,

It seems these two types of output are slightly different.

For example, with
$string = "中"
the output (encoded by your code) is
&#214;&#208;
If you put "&#xD6;&#xD0;" into an HTML page, it shows a corrupted character.


Whereas if $string = "中" is encoded into the seven-digit format
(I used JavaScript to convert it to "&#x4E2D;"),

"&#x4E2D;" is my expected result. If I put it into an HTML page, you will see the correct character.

Code:


<html >

&#x20AC;
&#128;

<br>
1. encode_5= " &#214;&#208;" corrupted
<br>
2.encode_7= "&#x4E2D;" correct one
<br>

</html>
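
The difference between the two outputs can be reproduced with a small sketch. It assumes the input arrived as GB2312/EUC-CN bytes, which is only a guess based on the values 214 and 208 (0xD6 0xD0) in the output above; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: "中" reached the script as the two EUC-CN bytes 0xD6 0xD0.
my $bytes = "\xD6\xD0";

# Encoding byte by byte, as the loop earlier in the thread does.
my $per_byte = join '', map { sprintf('&#%d;', ord $_) } split //, $bytes;

# Decoding to characters first, then emitting one reference per character.
my $chars    = decode('euc-cn', $bytes);
my $per_char = join '', map { sprintf('&#x%04X;', ord $_) } split //, $chars;

print "$per_byte\n";   # &#214;&#208;
print "$per_char\n";   # &#x4E2D;
```

The only difference between the two results is whether the bytes were decoded into characters before the references were built.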

Last edited by:

courierb: Jan 20, 2011, 8:09 AM
Re: [courierb] string encode problem In reply to
Hi,

Mmm, not sure

Have you tried asking at perlmonks.org? Those guys really know their stuff when it comes to UTF-8 characters, encoding, decoding, and whatnot ;)

Cheers

Andy (mod)
Re: [Andy] string encode problem In reply to
Hi Andy,
I did post the question there: http://www.perlmonks.org/?node_id=883082
and I was offered one solution.

Here is the sample code:

Code:

#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
use HTML::Entities qw[encode_entities_numeric];
my $string = "\x{20AC}";
my $encoded = encode_entities_numeric $string;
say $encoded;
__END__
&#x20AC;

However, I don't understand why he uses the string "\x{20AC}"; the actual input value should be "€".
When I replace
my $string = "\x{20AC}";
with
my $string = "€";
the output is &#128; (not seven digits), which is not my expected result.

I probably lack knowledge in this area. I was really confused by "\x{20AC}" and where it came from.
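
One way to see where "\x{20AC}" comes from: it is the Unicode code point of the euro sign (U+20AC) written as a Perl escape. The sketch below uses the core Encode module; the byte values are assumptions about how an editor might store "€" on disk (UTF-8 or Windows-1252), not something stated in this thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed on-disk forms of the euro sign in two common encodings.
my $utf8_bytes   = "\xE2\x82\xAC";   # UTF-8: three bytes
my $cp1252_bytes = "\x80";           # Windows-1252: one byte (decimal 128)

# Decoding either form yields the same single character.
my $from_utf8   = decode('UTF-8',  $utf8_bytes);
my $from_cp1252 = decode('cp1252', $cp1252_bytes);

printf "U+%04X\n", ord $from_utf8;     # U+20AC
printf "U+%04X\n", ord $from_cp1252;   # U+20AC
print "both equal \\x{20AC}\n"
    if $from_utf8 eq "\x{20AC}" && $from_cp1252 eq "\x{20AC}";
```

If the undecoded single byte 0x80 is encoded instead, the result is a reference to 128 rather than to 0x20AC, which would match the &#128; seen above.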

Cheers

Last edited by:

courierb: Jan 20, 2011, 10:13 AM
Re: [courierb] string encode problem In reply to
Hi,

I think you need to explain that the data being PASSED IN, looks like:


£

etc

...and that you need to convert it TO the correct 7 digit version (explain also that it's going to be used with UTF-8 characters too, as that may help too :))

Cheers

Andy (mod)
Re: [Andy] string encode problem In reply to
Hi Andy,

Obviously the guru at PerlMonks understood what I meant,
as "\x{20AC}" looks like &#x20AC;



Anyway, thanks for your reminder. I will update the post to make it clearer.

A while back I found something interesting on this topic by searching Google for the keywords "\x{
Obviously they were talking about the same issue.


http://stackoverflow.com/...-of-a-string-in-perl
Re: [courierb] string encode problem In reply to
Perl needs to know you're dealing with Unicode characters; otherwise it will treat a multi-byte character as separate bytes instead of a single character.
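
As a minimal sketch of that point, assuming the script file itself is saved as UTF-8: the same "€" literal is three bytes without the utf8 pragma and one character with it. The helper name to_refs is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: one hexadecimal reference per element of the string.
sub to_refs {
    my ($text) = @_;
    return join '', map { sprintf('&#x%X;', ord $_) } split //, $text;
}

# Assumption: this file is saved as UTF-8.
my $as_bytes = do { no utf8;  "€" };   # Perl sees three separate bytes
my $as_chars = do { use utf8; "€" };   # Perl sees one character, U+20AC

print to_refs($as_bytes), "\n";   # &#xE2;&#x82;&#xAC;
print to_refs($as_chars), "\n";   # &#x20AC;
```

A string that holds characters rather than bytes is also what the PerlMonks example earlier in the thread passes to encode_entities_numeric.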

Adrian