
Mailing List Archive: Catalyst: Users

ajax character encoding issue solved, but WHY?


seasprocket at gmail

Jun 18, 2009, 9:23 PM

Post #1 of 8
ajax character encoding issue solved, but WHY?

I had a character encoding issue that I finally solved, but I don't
understand why the fix works. I'm hoping someone can explain this to me!

The issue was that non-ascii chars were appearing as junk BUT only when
retrieved via ajax calls. Otherwise, they displayed fine. The junk display
was due to them being interpreted as ISO-8859-1, but I could not figure out
why the browser was interpreting that way. All my data is handled as UTF-8.

The problem was fixed by calling utf8::decode on the data prior to sending
back via ajax. BUT WHY?

I am using the JSON view to render ajax responses, and it sets the charset
header correctly to UTF-8. Of course, even when you decode, perl still
represents it as "internal" utf8. But why should this be necessary?

Thanks!



--
==========================
http://www.bikewise.org

2People citizen's network for climate action: http://www.2people.org

Greater Seattle Climate Dialogues: http://www.climatedialogues.org
==========================


onken at houseofdesign

Jun 19, 2009, 12:52 AM

Post #2 of 8
Re: ajax character encoding issue solved, but WHY?

On 19.06.2009 at 06:23, seasprocket[at]gmail.com wrote:

> I had a character encoding issue that I finally solved, but I don't
> understand why the fix works. I'm hoping someone can explain this to
> me!
>
> The issue was that non-ascii chars were appearing as junk BUT only
> when retrieved via ajax calls. Otherwise, they displayed fine. The
> junk display was due to them being interpreted as ISO-8859-1, but I
> could not figure out why the browser was interpreting that way. All
> my data is handled as UTF-8.
>
> The problem was fixed by calling utf8::decode on the data prior to
> sending back via ajax. BUT WHY?
>
> I am using the JSON view to render ajax responses, and it sets the
> charset header correctly to UTF-8. Of course, even when you decode,
> perl still represents as "internal" utf8. But why should this be
> necessary?
>
> Thanks!
>

What is the encoding of the web page that issues that ajax request?
Does this occur in other browsers as well?
I had similar problems and solved them by making sure that
every page has the UTF-8 charset header set.

IMHO using utf8::decode is a hack and should be avoided if possible.

moritz

_______________________________________________
List: Catalyst[at]lists.scsys.co.uk
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst[at]lists.scsys.co.uk/
Dev site: http://dev.catalyst.perl.org/


phil.mitchell at pobox

Jun 19, 2009, 8:25 AM

Post #3 of 8
Re: ajax character encoding issue solved, but WHY?

On Fri, Jun 19, 2009 at 12:52 AM, Moritz Onken <onken[at]houseofdesign.de> wrote:

>
> On 19.06.2009 at 06:23, seasprocket[at]gmail.com wrote:
> What is the encoding of the web page that issues that ajax request?


charset=UTF-8


>
> Does this occur on different browser as well?

yes (tested on FF and IE)

>
> I had similar problems and solved it by making sure that
> every page has the utf8 encoding header set.
>
> IMHO using utf8::decode is a hack and should be avoided if possible.


I totally agree, but it needs to be fixed!


>
>
> moritz
>





francesc.roma+catalyst at gmail

Jun 19, 2009, 8:51 AM

Post #4 of 8
Re: ajax character encoding issue solved, but WHY?

On Fri, Jun 19, 2009 at 6:23 AM, <seasprocket[at]gmail.com> wrote:

>
> The problem was fixed by calling utf8::decode on the data prior to sending
> back via ajax. BUT WHY?
>
> I am using the JSON view to render ajax responses, and it sets the charset
> header correctly to UTF-8. Of course, even when you decode, perl still
> represents as "internal" utf8. But why should this be necessary?
>


I had exactly the same problem and solution using Catalyst::Controller::REST
with the JSON serializer. Still in my list of 'big mysteries to be solved'.


I hadn't discovered Catalyst::Plugin::Unicode back then. I wonder if using
it would help; I haven't tried it myself yet.
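As I understand its documentation, the idea behind a plugin like Catalyst::Plugin::Unicode is to decode request parameters from UTF-8 octets into characters on the way in and encode the response body back to UTF-8 on the way out, so everything in between works with characters. A minimal plain-Perl sketch of that "decode once at input, encode once at output" pattern follows; it is not the plugin's code, and `$input_octets` is a hypothetical stand-in for data arriving from a request or a database driver.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical input: raw UTF-8 octets, e.g. from a request or from a
# database driver that does not decode for you. "ó" is the octets C3 B3.
my $input_octets = "Madrid Alarc\xC3\xB3n";

# 1. Decode once at the input boundary: octets -> characters.
#    FB_CROAK dies on malformed UTF-8; LEAVE_SRC keeps $input_octets intact.
my $text = Encode::decode('UTF-8', $input_octets,
    Encode::FB_CROAK() | Encode::LEAVE_SRC());

# 2. Inside the application, work with characters only.
printf "octets: %d, characters: %d\n", length $input_octets, length $text;

# 3. Encode once at the output boundary: characters -> octets.
my $output_octets = Encode::encode('UTF-8', $text);
print "round trip ok\n" if $output_octets eq $input_octets;
```

The point of the pattern is that no code in the middle ever has to guess whether a string is octets or characters.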

Cheers
Francesc


pagaltzis at gmx

Jun 20, 2009, 3:50 AM

Post #5 of 8
Re: ajax character encoding issue solved, but WHY?

* seasprocket[at]gmail.com <seasprocket[at]gmail.com> [2009-06-19 06:30]:
> The issue was that non-ascii chars were appearing as junk BUT
> only when retrieved via ajax calls. Otherwise, they displayed
> fine. The junk display was due to them being interpreted as
> ISO-8859-1, but I could not figure out why the browser was
> interpreting that way. All my data is handled as UTF-8.
>
> The problem was fixed by calling utf8::decode on the data prior
> to sending back via ajax. BUT WHY?

Looks like your code is broken and assumes bytes throughout; as
long as all your data is UTF-8 you won’t notice. Apparently the
JSON serialiser is trying to produce UTF-8 output correctly by
encoding the strings you pass it; since they’re already encoded,
you get double-encoding gremlins.
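A small reproduction of that double-encoding, using core JSON::PP's `encode_json` as a stand-in for whatever serializer the JSON view actually uses (an assumption). `encode_json` expects character strings and returns UTF-8 octets, so feeding it undecoded octets encodes them a second time:

```perl
use strict;
use warnings;
use JSON::PP qw(encode_json);

# Hypothetical stand-in for a value read as raw UTF-8 octets and never
# decoded: "Alarcón", where "ó" is the two octets C3 B3.
my $octets = "Alarc\xC3\xB3n";

# encode_json treats its input as characters and UTF-8-encodes the result,
# so C3 and B3 are each taken as a Latin-1 character and encoded again.
my $double = encode_json([$octets]);

# Decoding first (what utf8::decode did in the original post) hands the
# serializer real characters, so the output is encoded exactly once.
my $chars = $octets;
utf8::decode($chars) or die "input is not valid UTF-8";
my $once = encode_json([$chars]);

# Show the octets as hex so the difference is visible in any terminal.
sub hex_octets { join ' ', map { sprintf '%02X', ord } split //, shift }
print 'double-encoded: ', hex_octets($double), "\n";
print 'encoded once:   ', hex_octets($once),   "\n";
```

A browser reading the double-encoded result as UTF-8 shows "AlarcÃ³n", which is the kind of junk described in the original post.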

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>



seasprocket at gmail

Jun 22, 2009, 5:50 PM

Post #6 of 8
Re: Re: ajax character encoding issue solved, but WHY?

On Sat, Jun 20, 2009 at 3:50 AM, Aristotle Pagaltzis <pagaltzis[at]gmx.de> wrote:

> * seasprocket[at]gmail.com <seasprocket[at]gmail.com> [2009-06-19 06:30]:
> > The problem was fixed by calling utf8::decode on the data prior
> > to sending back via ajax. BUT WHY?
>
> Looks like your code is broken and assumes bytes throughout; as
> long as all your data is UTF-8 you won’t notice. Apparently the
> JSON serialiser is trying to produce UTF-8 output correctly by
> encoding the strings you pass it; since they’re already encoded,
> you get double-encoding gremlins.


Thanks for your suggestion, but I'm pretty sure that the data is not getting
encoded twice. C::V::JSON tests the data before it encodes
(Encode::is_utf8()) and only encodes if this test is true. This test only
passes if the data is decoded.

I have confirmed this by checking to see if Encode::encode is getting called
in C::V::JSON (it's not).

I agree something's broken, I just don't know what it is ... My suspicion is
that I don't really understand what's happening inside sqlite -- I assume
it's storing as UTF-8, but I don't really know what it's doing.








pagaltzis at gmx

Jun 23, 2009, 1:28 AM

Post #7 of 8
Re: ajax character encoding issue solved, but WHY?

* seasprocket[at]gmail.com <seasprocket[at]gmail.com> [2009-06-23 03:00]:
> Thanks for your suggestion, but I'm pretty sure that the data
> is not getting encoded twice. C::V::JSON tests the data before
> it encodes ( Encode::is_utf8() ) and only encodes if this test
> is true. This test only passes if the data is decoded.

Augh! Augh! Why do people keep reading stuff into the UTF8 flag
that it doesn’t mean? (Yeah, I know why: because it’s called the
UTF8 flag when it should’ve been the UOK flag or something.) You
can have decoded data with the UTF8 flag off, and you can have
encoded data with the UTF8 flag on. The UTF8 flag is about the
internals-level format of the byte buffer of a scalar, it has
nothing to do with the meaning of the data on the Perl level.
Testing the flag in pure-Perl code is an almost certain sign of
brokenness.
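To make that concrete, here is a hypothetical helper that encodes only when the flag is on, modelled on the logic described earlier in the thread (it is not the actual Catalyst::View::JSON source). It gets both of the cases above wrong, while encoding based on what you know the data is works regardless of the flag:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical "encode only if the UTF8 flag is on" helper.
sub encode_if_flagged {
    my ($s) = @_;
    return Encode::is_utf8($s) ? Encode::encode('UTF-8', $s) : $s;
}

# Decoded text with the flag OFF: "\xE9" is the character U+00E9 (é).
my $chars_flag_off = "caf\xE9";

# Encoded UTF-8 octets with the flag ON: utf8::upgrade changes only the
# internal storage, not the value, so these are still the octets C3 A9.
my $octets_flag_on = "caf\xC3\xA9";
utf8::upgrade($octets_flag_on);

my $out1 = encode_if_flagged($chars_flag_off);  # passed through: lone byte E9
my $out2 = encode_if_flagged($octets_flag_on);  # encoded twice: C3 83 C2 A9

# Encoding because we know $chars_flag_off holds characters is correct
# whatever the flag says.
my $out_ok = Encode::encode('UTF-8', $chars_flag_off);

for my $pair (['flag-off chars', $out1], ['flag-on octets', $out2],
              ['always encode', $out_ok]) {
    my ($label, $bytes) = @$pair;
    printf "%-15s %s\n", $label,
        join ' ', map { sprintf '%02X', ord } split //, $bytes;
}
```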

> My suspicion is that I don't really understand what's happening
> inside sqlite -- I assume it's storing as UTF-8, but I don't
> really know what it's doing.

Try Devel::Peek to examine the strings that come out of it?
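For example, something along these lines, where the literal string is a hypothetical stand-in for a value fetched from SQLite. `Dump` prints the scalar's internals to STDERR; the things to look at are whether `UTF8` appears in FLAGS and what the PV buffer holds:

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

# Hypothetical stand-in for a value fetched from the database.
my $from_db = "Madrid Alarc\xC3\xB3n";

# Undecoded octets: FLAGS has no UTF8 and PV shows "ó" as two octets.
Dump($from_db);

# After decoding, FLAGS should include UTF8 and the dump adds a
# [UTF8 "..."] annotation showing the characters.
my $decoded = $from_db;
utf8::decode($decoded) or die "not valid UTF-8";
Dump($decoded);

printf "length before: %d, after: %d\n", length $from_db, length $decoded;
```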

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>



seasprocket at gmail

Jun 29, 2009, 11:46 AM

Post #8 of 8
Re: Re: ajax character encoding issue solved, but WHY?

On Tue, Jun 23, 2009 at 1:28 AM, Aristotle Pagaltzis <pagaltzis[at]gmx.de> wrote:

> * seasprocket[at]gmail.com <seasprocket[at]gmail.com> [2009-06-23 03:00]:
> > Thanks for your suggestion, but I'm pretty sure that the data
> > is not getting encoded twice. C::V::JSON tests the data before
> > it encodes ( Encode::is_utf8() ) and only encodes if this test
> > is true. This test only passes if the data is decoded.
>
> Augh! Augh! Why do people keep reading stuff into the UTF8 flag
> that it doesn’t mean. (Yeah, I know why, because it’s called the
> UTF8 flag when it should’ve been the UOK flag or something.) You
> can have decoded data with the UTF8 flag off, and you can have
> encoded data with the UTF8 flag on.


(Sorry to be so slow to reply. I wanted to find time to fully investigate
this, but haven't.)

The Encode docs state:

# When you encode, the resulting UTF8 flag is always off.
# When you decode, the resulting UTF8 flag is on unless you can
# unambiguously represent data [as ASCII].

I was interpreting this to apply to all encoding/decoding -- but I now
realize that it may only apply to the Encode package. Which really just
leaves me more confused .. :)

> > My suspicion is that I don't really understand what's happening
> > inside sqlite -- I assume it's storing as UTF-8, but I don't
> > really know what it's doing.
>
> Try Devel::Peek to examine the strings that come out of it?


I used Devel::StringInfo and found:

[info] string: Madrid Alarcón
is_utf8: 0
octet_length: 15
valid_utf8: 1
decoded_is_same: 0
decoded:
octet_length: 15
downgradable: 1
char_length: 14
string: Madrid Alarc
is_utf8: 1
raw = <<Madrid Alarcón>>

I did not draw any brilliant conclusions from this, although I'm curious why
the decoded version has the non-ASCII char cut off.
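One likely explanation for the truncated `string: Madrid Alarc` line, offered as a guess since it depends on how Devel::StringInfo prints: the octet counts show the value from SQLite is raw UTF-8 (15 octets for 14 characters). Once decoded, printing it to a handle with no encoding layer writes "ó" as the single Latin-1 byte F3, which a UTF-8 terminal or web archive cannot display. The sketch below compares the two output paths using in-memory filehandles:

```perl
use strict;
use warnings;

# The raw value as reported: 15 UTF-8 octets for 14 characters.
my $raw = "Madrid Alarc\xC3\xB3n";
my $decoded = $raw;
utf8::decode($decoded) or die "input is not valid UTF-8";

# No encoding layer: every character is below U+0100, so Perl writes the
# string as Latin-1 and "ó" goes out as the single byte F3.
my $raw_out = '';
open my $raw_fh, '>', \$raw_out or die "cannot open in-memory handle: $!";
print {$raw_fh} $decoded;
close $raw_fh;

# With :encoding(UTF-8) the same characters are written as UTF-8,
# so "ó" goes out as C3 B3.
my $utf8_out = '';
open my $utf8_fh, '>:encoding(UTF-8)', \$utf8_out
    or die "cannot open in-memory handle: $!";
print {$utf8_fh} $decoded;
close $utf8_fh;

printf "no layer:    %d bytes\n", length $raw_out;
printf "UTF-8 layer: %d bytes\n", length $utf8_out;
```

In a script, `binmode STDOUT, ':encoding(UTF-8)'` applies the second behaviour to STDOUT.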

At this point, obviously, I need to find the time to dig in further. Thanks
for your thoughts!






