Mailing List Archive: ModPerl: ModPerl

quick pure perl question


aw at ice-sa

Jun 28, 2009, 8:41 AM

Post #1 of 7 (1121 views)
quick pure perl question

Hi.
Out of curiosity, and just in case anyone knows offhand:

perl 5.8.8

In a script, I essentially do this:

open(FIRST,'<:utf8',$name1);
open(SECOND,'>:raw',$name2);
while(defined($line = <FIRST>)) {
    print SECOND $line;
}

and I get warnings : "wide character in print to <SECOND>,.."

I mean, I know that my data is UTF-8, and I know that some characters
are going to be "wide", and that's how I want them.
I also know that I could specify the output I/O layer as 'utf8' (which
avoids the warning).
But why do I get warnings when I specified 'raw' as the I/O layer ?
Doesn't 'raw' mean like 'as is' ?


mike at acorg

Jun 28, 2009, 9:12 AM

Post #2 of 7 (1054 views)
Re: quick pure perl question

Check out the documentation for open: http://perldoc.perl.org/functions/open.html
For UTF-8 encoding, the example there is:

open(FH, "<:encoding(UTF-8)", "file")

Mike


moseley at hank

Jun 28, 2009, 9:33 AM

Post #3 of 7 (1067 views)
Re: quick pure perl question

On Sun, Jun 28, 2009 at 8:41 AM, André Warnier<aw [at] ice-sa> wrote:
> Hi.
> By curiosity, and just in case anyone knows off-hand :
>
> perl 5.8.8
>
> In a script, I substantially do this :
>
> open(FIRST,'<:utf8',$name1);
> open(SECOND,'>:raw',$name2);
> while(defined($line = <FIRST>)) {
>  print SECOND $line;
> }
>
> and I get warnings : "wide character in print to <SECOND>,.."
>
> I mean, I know that my data is UTF-8, and I know that some characters are
> going to be "wide", and that's how I want them.
> I also know that I could specify the output I/O layer as 'utf8' (which
> avoids the warning).
> But why do I get warnings when I specified 'raw' as the I/O layer ?
> Doesn't 'raw' mean like 'as is' ?

You are decoding into characters when reading in. Perl sets the utf8
flag on $line to indicate that $line is character data. Then you are
attempting to write characters (which are an abstraction) out as byte
data. Perl warns you about this because the string contains characters
above 0xFF, which cannot be written as single bytes, and the output
handle has no encoding layer.

You need to encode the character data before writing back out either
by encoding explicitly or using a layer.
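
As a minimal sketch of those two options (the sample string and the temporary files are assumptions made for illustration, not part of the original script), something like this should write the same bytes either way without the warning:

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical sample data: one line containing a character above U+00FF.
my $line = "snowman: \x{2603}\n";

# Option 1: encode explicitly, then print the resulting bytes to a :raw handle.
my ($fh1, $name1) = tempfile(UNLINK => 1);
binmode($fh1, ':raw');
print {$fh1} encode('UTF-8', $line);
close($fh1) or die "close failed: $!";

# Option 2: let an output layer do the encoding on the way out.
my ($fh2, $name2) = tempfile(UNLINK => 1);
binmode($fh2, ':encoding(UTF-8)');
print {$fh2} $line;
close($fh2) or die "close failed: $!";

# Compare the sizes of the two files on disk.
printf "%d bytes vs %d bytes\n", (-s $name1), (-s $name2);
```

Which option fits better probably depends on whether the handle is shared with code that already writes bytes (option 1) or only ever sees character strings (option 2).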



--
Bill Moseley
moseley [at] hank


andy at hexten

Jun 30, 2009, 5:17 AM

Post #4 of 7 (1059 views)
Re: quick pure perl question

On 28 Jun 2009, at 17:33, Bill Moseley wrote:
> You need to encode the character data before writing back out either
> by encoding explicitly or using a layer.


Or possibly not decode it in the first place and treat it as an opaque
octet stream. All depending, of course, on what it is you're trying to
achieve.
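
A rough sketch of that approach, assuming a throwaway input file created just for the example (the lexical handles and temporary file names are illustrative): both handles use :raw, so the data is never decoded and there is nothing to warn about.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input: a small file holding UTF-8 encoded bytes.
my ($in_fh, $name1) = tempfile(UNLINK => 1);
binmode($in_fh, ':raw');
print {$in_fh} "caf\xC3\xA9\n";    # "cafe" with e-acute, written as raw octets
close($in_fh) or die "close failed: $!";

# Reserve a second temporary name for the copy.
my ($tmp_fh, $name2) = tempfile(UNLINK => 1);
close($tmp_fh) or die "close failed: $!";

# Read octets and write octets: no decoding, no encoding.
open(my $first,  '<:raw', $name1) or die "cannot read $name1: $!";
open(my $second, '>:raw', $name2) or die "cannot write $name2: $!";
while (defined(my $line = <$first>)) {
    print {$second} $line;
}
close($second) or die "close failed: $!";
close($first)  or die "close failed: $!";

printf "copied %d bytes\n", (-s $name2);
```

This only makes sense when the program does not need to inspect or modify the text as characters; functions such as length or regex character classes would then operate on bytes.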

--
Andy Armstrong, Hexten


aw at ice-sa

Jun 30, 2009, 6:13 AM

Post #5 of 7 (1058 views)
Re: quick pure perl question

Andy Armstrong wrote:
> On 28 Jun 2009, at 17:33, Bill Moseley wrote:
>> You need to encode the character data before writing back out either
>> by encoding explicitly or using a layer.
>
>
> Or possibly not decode it in the first place and treat it as an opaque
> octet stream. All depending, of course, on what it is you're trying to
> achieve.
>

I was not trying to achieve anything, and I do understand the
encoding/decoding aspect.
Basically, by using the '>:raw' encoding for the output stream, I was
not expecting perl to warn me that I was (knowingly) outputting "wide
characters" there, so I was surprised at the warning.

I /would/ have expected it if I was /not/ specifying an encoding, like
using simply '>'. But not when I am explicitly specifying '>:raw',
which in my mind, and according to my interpretation of the on-line
documentation, is equivalent to saying "output whatever you have as
bytes in that string variable right now, as is, I know what I'm doing".
But I guess my interpretation of the documentation is incorrect then.


andy at hexten

Jun 30, 2009, 6:33 AM

Post #6 of 7 (1055 views)
Re: quick pure perl question

On 30 Jun 2009, at 14:13, André Warnier wrote:
> I /would/ have expected it if I was /not/ specifying an encoding,
> like using simply '>'. But not when I am explicitly specifying
> '>:raw', which in my mind, and according to my interpretation of the
> on-line documentation, is equivalent to saying "output whatever you
> have as bytes in that string variable right now, as is, I know what
> I'm doing".


You have that bit right - but the string doesn't contain bytes[1] - it
contains characters. Strings can either be an octet stream or a stream
of wide characters. By reading utf8 into a string you've turned it
into the latter. Perl's warning that you're pushing character data
into an octet hole.

[1] of course it's /made/ of bytes but that's not how Perl sees it.
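
To make the distinction visible, here is a small sketch (the sample character and the temporary file are assumptions chosen for the example). It compares the same data as octets and as a decoded character string, then captures whatever warning print() raises when the character string goes to a :raw handle:

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

my $octets = "\xE2\x98\x83";            # three bytes: U+2603 encoded as UTF-8
my $chars  = decode('UTF-8', $octets);  # the same data decoded into characters

printf "octets: length=%d utf8_flag=%s\n",
    length($octets), (utf8::is_utf8($octets) ? 'on' : 'off');
printf "chars:  length=%d utf8_flag=%s\n",
    length($chars), (utf8::is_utf8($chars) ? 'on' : 'off');

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my ($fh, $name) = tempfile(UNLINK => 1);
binmode($fh, ':raw');
print {$fh} $chars;                     # character data into a byte-only handle
close($fh) or die "close failed: $!";

print "warning seen: $_" for @warnings;
```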

--
Andy Armstrong, Hexten


moseley at hank

Jun 30, 2009, 6:45 AM

Post #7 of 7 (1054 views)
Re: quick pure perl question

On Tue, Jun 30, 2009 at 6:13 AM, André Warnier <aw [at] ice-sa> wrote:

> Basically, by using the '>:raw' encoding for the output stream, I was not
> expecting perl to warn me that I was (knowingly) outputting "wide
> characters" there, so I was surprised at the warning.
>
> I /would/ have expected it if I was /not/ specifying an encoding, like
> using simply '>'. But not when I am explicitly specifying '>:raw', which in
> my mind, and according to my interpretation of the on-line documentation, is
> equivalent to saying "output whatever you have as bytes in that string
> variable right now, as is, I know what I'm doing".


I think it's because it's not bytes. Well, technically it's bytes of
course, but conceptually once you decode bytes you no longer have bytes.
You have that abstract idea of characters. And the only way to output that
information into a file (which holds bytes) is by first converting it to
bytes, and that requires encoding.

It's just like a thought you have in your brain. I'm not aware of any way
(yet) to output that in raw format -- must be encoded into typed, spoken, or
signed language first. Even if most of what I write would be considered
pretty raw.

Isn't :raw mostly a way to use layers to say don't do CRLF conversion --
like the old use of binmode()? Oh, maybe not according to the docs.
It's best to decode and encode all character data at program boundaries and
stay away from Windows.


--
Bill Moseley
moseley [at] hank
