
Mailing List Archive: Catalyst: Users

unicode best practices

 

 


richardjolly at mac

Feb 15, 2007, 12:16 PM

Post #1 of 13 (937 views)
unicode best practices

Hi,

The crux of this question: what is the best practice for making a
Catalyst/DBIC app fully Unicode-aware?

We've got:

MySQL with the charset defined as UTF8
DBIC with on_connect_do running SET NAMES and SET CHARSET, plus UTF8Columns
Catalyst::Plugin::Unicode
End actions that specify a content type including charset utf8
Then, finally, making sure any files we open (including log files and
fixture files for tests) use the utf8 IO layer.

Should this be enough?
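
For the last item in the list above, a minimal sketch of opening files through an encoding layer, using only core Perl. The sample text and the temporary file are made up for illustration, and the sketch uses the stricter `:encoding(UTF-8)` layer rather than the plain `:utf8` layer, which is a choice of this example and not something the post specifies.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample text containing two non-ASCII characters.
my $text = "na\x{ef}ve caf\x{e9}";

# Create a scratch file; a real app would use its log or fixture path.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write through an encoding layer: characters become UTF-8 octets on disk.
open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
print {$out} $text;
close $out;

# Read back through the same layer: octets become characters again.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot read $path: $!";
my $round_trip = do { local $/; <$in> };
close $in;

# Read the raw octets for comparison.
open my $raw, '<:raw', $path or die "Cannot read $path: $!";
my $octets = do { local $/; <$raw> };
close $raw;

printf "characters: %d, octets on disk: %d\n",
    length($round_trip), length($octets);
```

The character count and the octet count differ because each non-ASCII character here takes two octets in UTF-8, which is why every filehandle needs an explicit layer.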

Thanks,

Richard




_______________________________________________
List: Catalyst [at] lists
Listinfo: http://lists.rawmode.org/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst [at] lists/
Dev site: http://dev.catalyst.perl.org/


danielmcbrearty at gmail

Feb 15, 2007, 1:11 PM

Post #2 of 13 (894 views)
Re: unicode best practices [In reply to]

other stuff I've found:

http://search.cpan.org/~bricas/DBIx-Class-0.07003/lib/DBIx/Class/UTF8Columns.pm
- utf8 text fields in the db are automatically flagged as such when retrieved

I also found once I started using the above, I needed to install

http://search.cpan.org/~lyokato/Catalyst-View-TT-ForceUTF8-0.06/lib/Catalyst/View/TT/ForceUTF8.pm

to avoid garbled output.

There is also somewhere a module which causes stash to be flagged
utf8, but I haven't needed it. Yet ...





--
Daniel McBrearty
email : danielmcbrearty at gmail.com
www.engoi.com : the multi - language vocab trainer
BTW : 0873928131



jon at jrock

Feb 15, 2007, 1:15 PM

Post #3 of 13 (891 views)
Re: unicode best practices [In reply to]

Daniel McBrearty wrote:
> I also found once I started using the above, I needed to install
> C::V::TT::ForceUTF8 to avoid garbled output.

This means you're probably not decoding octets to characters (and
encoding them back to octets on output) properly.
See what encoding::warnings says about your code.

Also, read http://www.catalystframework.org/calendar/2006/21 for
unicode details.
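
A minimal sketch of the octets-versus-characters discipline referred to above, using the core Encode module. The input octets are a made-up example; the idea is to decode once at the input boundary, work on characters, and encode once at the output boundary.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: UTF-8 octets as they might arrive from a socket
# or a database driver that has not decoded them.
my $octets = "caf\xc3\xa9";

# Input boundary: octets become a character string.
my $chars = decode('UTF-8', $octets);

# Inside the application, string operations see whole characters.
my $shout = uc $chars;

# Output boundary: characters become octets again.
my $out = encode('UTF-8', $shout);

printf "%d octets in, %d characters, %d octets out\n",
    length($octets), length($chars), length($out);
```

If the decode step is skipped, the same string operations run on individual octets, which is one common source of garbled output further downstream.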

>
> There is also somewhere a module which causes stash to be flagged
> utf8, but I haven't needed it. Yet ...

Shouldn't be necessary if you're writing correct code.

--
package JAPH;use Catalyst qw/-Debug/;($;=JAPH)->config(name => do {
$,.=reverse qw[Jonathan tsu rehton lre rekca Rockway][$_].[split //,
";$;"]->[$_].q; ;for 1..4;$,=~s;^.;;;$,});$;->setup;



danielmcbrearty at gmail

Feb 15, 2007, 1:35 PM

Post #4 of 13 (897 views)
Re: unicode best practices [In reply to]

thanks! I'll try it.





richardjolly at mac

Feb 16, 2007, 12:45 AM

Post #5 of 13 (889 views)
Re: unicode best practices [In reply to]

On 15 Feb 2007, at 21:15, Jonathan Rockway wrote:

>
> Daniel McBrearty wrote:
>> I also found once I started using the above, I needed to install
>> C::V::TT::ForceUTF8 to avoid garbled output.
>
> This means you're probably not encoding octets to characters properly.
> See what encoding::warnings says about your code.

So needing C::V::TT::ForceUTF8 is a sign you've not handled it properly
further upstream?

> Also, read http://www.catalystframework.org/calendar/2006/21 for
> unicode details.

Thanks, nice summary.

Richard




dhoworth at mrc-lmb

Feb 16, 2007, 3:19 AM

Post #6 of 13 (887 views)
Re: unicode best practices [In reply to]

Jonathan Rockway wrote:
> Also, read http://www.catalystframework.org/calendar/2006/21 for
> unicode details.

Nice page, thanks. I'm puzzled by one sentence:

$1, in this example, will contain the first character in the
string. This intuitive if the string is something like "abcde",
but it also holds true for a string like ???.

Is '???' supposed to appear like that? If so, I don't understand the
point. If not, is something broken on my end or the server end?
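
A minimal sketch of the behaviour the quoted sentence seems to describe. The article's original example string did not survive, so the three-character string below is a hypothetical stand-in, not the article's text. It compares a first-character match on undecoded octets with the same match on a decoded string.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: three characters (U+65E5 U+672C U+8A9E),
# supplied here as nine UTF-8 octets.
my $octets = "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e";
my $chars  = decode('UTF-8', $octets);

# On undecoded octets, "." matches a single octet.
my ($first_octet) = $octets =~ /^(.)/s;

# On a decoded string, "." matches a whole character.
my ($first_char) = $chars =~ /^(.)/s;

printf "octet string: matched %d of %d octets\n",
    length($first_octet), length($octets);
printf "character string: matched U+%04X, 1 of %d characters\n",
    ord($first_char), length($chars);
```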

Cheers, Dave



Alexander.Hartmaier at t-systems

Feb 16, 2007, 3:34 AM

Post #7 of 13 (890 views)
Re: unicode best practices [In reply to]

I use Oracle 10.1 and didn't have to define anything to get utf8 working.
My Oracle charset is AL32UTF8, which is the recommended charset for UTF-8 usage.

-Alex





email at jasonkohles

Feb 16, 2007, 5:30 AM

Post #8 of 13 (892 views)
Re: unicode best practices [In reply to]

On Feb 16, 2007, at 6:19 AM, Dave Howorth wrote:

> Jonathan Rockway wrote:
>> Also, read http://www.catalystframework.org/calendar/2006/21 for
>> unicode details.
>
> Nice page, thanks. I'm puzzled by one sentence:
>
> $1, in this example, will contain the first character in the
> string. This intuitive if the string is something like "abcde",
> but it also holds true for a string like ???.
>
> Is '???' supposed to appear like that? If so, I don't understand the
> point. If not, is something broken on my end or the server end?
>

I wondered the same thing, so I pulled the advent calendar sources
from subversion, and in the .pod it's literally C<???>, so it appears
it is supposed to be like that, and I don't get the point either...

--
Jason Kohles
email [at] jasonkohles
http://www.jasonkohles.com/
"A witty saying proves nothing." -- Voltaire


Richard.Jolly at bbc

Feb 16, 2007, 6:05 AM

Post #9 of 13 (892 views)
RE: unicode best practices [In reply to]

Jason Kohles wrote:
> On Feb 16, 2007, at 6:19 AM, Dave Howorth wrote:
>
>> Jonathan Rockway wrote:
>>> Also, read http://www.catalystframework.org/calendar/2006/21 for
>>> unicode details.
>>
>> Nice page, thanks. I'm puzzled by one sentence:
>>
>> $1, in this example, will contain the first character in the
>> string. This intuitive if the string is something like "abcde",
>> but it also holds true for a string like ???.
>>
>> Is '???' supposed to appear like that? If so, I don't understand the
>> point. If not, is something broken on my end or the server end?
>
> I wondered the same thing, so I pulled the advent calendar sources
> from subversion, and in the .pod it's literally C<???>, so it appears
> it is supposed to be like that, and I don't get the point either...

I think it's a problem with pod2html, but I can't investigate now.

Richard

http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.




Richard.Jolly at bbc

Feb 16, 2007, 7:58 AM

Post #10 of 13 (886 views)
RE: unicode best practices [In reply to]

Jonathan Rockway wrote:

> Also, read http://www.catalystframework.org/calendar/2006/21
> for unicode details.

The encoding::warnings pragma is great, though I'm a little shocked at
how many warnings it's uncovered.

Other questions: C::P::Unicode encodes the body in the finalize method.
That makes sense.

But I'm tempted to try putting "use encoding 'utf8'" in the main
catalyst application module. Or at least "use open 'utf8'". That would
handle all the file reads and writes, but presumably also the final
write to STDOUT that catalyst does. So it sounds like it would conflict
with C::P::Unicode.

I'll be experimenting later, but does anyone have comments on doing it
like that?

Thanks,

Richard



jon at jrock

Feb 16, 2007, 8:39 AM

Post #11 of 13 (892 views)
Re: unicode best practices [In reply to]

Richard Jolly wrote:
> I think it's a problem with pod2html, but I can't investigate now.

Nope, I accidentally edited (and committed) the file from Windows, which
wiped out the unicode. I'll fix it when I have access to a sane machine.



richardjolly at mac

Feb 17, 2007, 1:22 PM

Post #12 of 13 (885 views)
Re: unicode best practices [In reply to]

On 16 Feb 2007, at 15:58, Richard Jolly wrote:

To reply to myself...

> Jonathan Rockway wrote:
>
>> Also, read http://www.catalystframework.org/calendar/2006/21
>> for unicode details.

Thanks again for this.

> The encoding::warnings pragma is great, though I'm a little shocked at
> how many warnings it's uncovered.
>
> Other questions: C::P::Unicode encodes the body in the finalize method.
> That makes sense.

It only handles text/* content types. I had to patch it to handle the
application/xml we produce.

The main surprise (at least to me) was that POSTed content (not from a
form, but in a RESTy style) comes in as a File::Temp object, so
I had to make sure that was explicitly encoded as well.
Catalyst::Plugin::Unicode handles most parameters, but not that.
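
A minimal sketch of handling such a body by hand, assuming the goal is to turn the UTF-8 octets in the filehandle into a character string. The File::Temp handle below is a stand-in for the request body, and `decode_body` is a hypothetical helper name, not part of Catalyst or Catalyst::Plugin::Unicode.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp ();

# Stand-in for the POSTed body: a File::Temp handle holding UTF-8 octets.
# In a Catalyst action the handle would come from the request (assumption).
my $body_fh = File::Temp->new;
binmode $body_fh;
print {$body_fh} "<name>Z\xc3\xbcrich</name>";

# Hypothetical helper: slurp a body filehandle and decode it as UTF-8.
sub decode_body {
    my ($fh) = @_;
    seek $fh, 0, 0 or die "Cannot rewind body: $!";
    binmode $fh;                              # read raw octets, no layers
    my $octets = do { local $/; <$fh> };
    # FB_CROAK makes malformed input die instead of being silently replaced.
    return decode('UTF-8', $octets, Encode::FB_CROAK);
}

my $xml = decode_body($body_fh);
printf "%d characters\n", length($xml);
```

Decoding strictly at this point means a malformed upload fails loudly at the boundary rather than corrupting data later.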

> But I'm tempted to try putting "use encoding 'utf8'" in the main
> catalyst application module. Or at least "use open 'utf8'". That would
> handle all the file reads and writes, but presumably also the final
> write to STDOUT that catalyst does. So it sounds like it would conflict
> with C::P::Unicode.

I didn't end up doing this. Instead I just attacked all inputs and
outputs and made sure they were handled explicitly.

Thanks for everyone's help.

Richard




danielmcbrearty at gmail

Mar 4, 2007, 4:56 AM

Post #13 of 13 (773 views)
Re: unicode best practices [In reply to]

Hmm. I got round to this today. I installed encoding::warnings and put
it at the top of the code ... but it's not saying anything at all. (I
guess I should be looking for warnings in the debug output?)




