Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Perl: porters

UTF8 and Data::Dumper

 

 

Perl porters RSS feed   Index | Next | Previous | View Threaded


smueller at cpan

Aug 3, 2012, 8:12 AM

Post #1 of 8 (93 views)
Permalink
UTF8 and Data::Dumper

Hi Karl,

In the attached change (4b88fb76efce8c436e63b907c9842345d4fa77c7), you
modified the XS in Data::Dumper to fix some UTF8 issue the details of
which I do not know. That's great, but unfortunately, your changes won't
allow that version of D::D to run on older perls, which is quite important.

I'd now just go ahead and #ifdef the hell out of it since I don't know
how to get it right, but the specific way in which you patched the code
makes me wonder what's going on: What you cannot see in the diff is that
the function and macro definitions that you modified are only applicable
to 5.6 (yes, 5.six not sixteen) and earlier. This basically leaves 5.8
to 5.16 in the same bucket as blead and it blows up in fun ways.

Since I'm unsure what you intended, could you please have a look at this
when you have some time and try to make D::D work on both blead and perl
5.8-5.16?

Best regards,
Steffen
Attachments: 4b88fb76efce8c436e63b907c9842345d4fa77c7.patch (20.6 KB)


public at khwilliamson

Aug 3, 2012, 10:36 AM

Post #2 of 8 (90 views)
Permalink
Re: UTF8 and Data::Dumper [In reply to]

On 08/03/2012 09:12 AM, Steffen Mueller wrote:
> Hi Karl,
>
> In the attached change (4b88fb76efce8c436e63b907c9842345d4fa77c7), you
> modified the XS in Data::Dumper to fix some UTF8 issue the details of
> which I do not know. That's great, but unfortunately, your changes won't
> allow that version of D::D to run on older perls, which is quite important.
>
> I'd now just go ahead and #ifdef the hell out of it since I don't know
> how to get it right, but the specific way in which you patched the code
> makes me wonder what's going on: What you cannot see in the diff is that
> the function and macro definitions that you modified are only applicable
> to 5.6 (yes, 5.six not sixteen) and earlier. This basically leaves 5.8
> to 5.16 in the same bucket as blead and it blows up in fun ways.
>
> Since I'm unsure what you intended, could you please have a look at this
> when you have some time and try to make D::D work on both blead and perl
> 5.8-5.16?
>
> Best regards,
> Steffen


I'm trying to understand this. You're saying D::D fails in 5.16 and
blead? I guess that means the regression tests for it are not sufficient.

I think we should use #irc to figure out the best way forward.


smueller at cpan

Aug 5, 2012, 10:58 PM

Post #3 of 8 (86 views)
Permalink
Re: UTF8 and Data::Dumper [In reply to]

Hi Karl,

On 08/03/2012 07:36 PM, Karl Williamson wrote:
> On 08/03/2012 09:12 AM, Steffen Mueller wrote:
>> Hi Karl,
>>
>> In the attached change (4b88fb76efce8c436e63b907c9842345d4fa77c7), you
>> modified the XS in Data::Dumper to fix some UTF8 issue the details of
>> which I do not know. That's great, but unfortunately, your changes won't
>> allow that version of D::D to run on older perls, which is quite
>> important.
>>
>> I'd now just go ahead and #ifdef the hell out of it since I don't know
>> how to get it right, but the specific way in which you patched the code
>> makes me wonder what's going on: What you cannot see in the diff is that
>> the function and macro definitions that you modified are only applicable
>> to 5.6 (yes, 5.six not sixteen) and earlier. This basically leaves 5.8
>> to 5.16 in the same bucket as blead and it blows up in fun ways.
>>
>> Since I'm unsure what you intended, could you please have a look at this
>> when you have some time and try to make D::D work on both blead and perl
>> 5.8-5.16?

> I'm trying to understand this. You're saying D::D fails in 5.16 and
> blead? I guess that means the regression tests for it are not sufficient.
>
> I think we should use #irc to figure out the best way forward.

That's harder to do than you think - time zones, work hours, train time
all conspire against us.

So let me try via email again:

- Data::Dumper has an "#if PERL_VERSION < 5.6" block that you patched as
part of 4b88fb76efce8c436e63b907c9842345d4fa77c7 to use the newer
utf8-related functions.

- I don't think that's ever going to work, 5.6 being ancient and all.

- Data::Dumper (as in blead) now no longer works on 5.12.4 (which is
what I had to test with then) nor on 5.14.2 (which is what I just
tried). Not tested with 5.16 yet.

- Since DD is such a commonly used module and dual-lived, we need to
make it work for older releases. IMO all the way back to some 5.8
release. Since some crazy people still use 5.6, those may argue 5.6
(though I'd call "patches welcome").

I do not understand why you patched the #if block that pertains to 5.6
only to use functions that were introduced much later. I have a hunch
that we may now need two #if's: 5.6, 5.8-5.14 (or 5.16?) and one for all
perls since.

Does this make more sense? I'm at a loss about what exactly the patch
is, what's compatible with what, and what needs doing to get DD to work
on older perls again. This is where I'm hoping for your help.

Best regards,
Steffen


public at khwilliamson

Aug 6, 2012, 10:30 AM

Post #4 of 8 (83 views)
Permalink
Re: UTF8 and Data::Dumper [In reply to]

On 08/05/2012 11:58 PM, Steffen Mueller wrote:
> Hi Karl,
>
> On 08/03/2012 07:36 PM, Karl Williamson wrote:
>> On 08/03/2012 09:12 AM, Steffen Mueller wrote:
>>> Hi Karl,
>>>
>>> In the attached change (4b88fb76efce8c436e63b907c9842345d4fa77c7), you
>>> modified the XS in Data::Dumper to fix some UTF8 issue the details of
>>> which I do not know. That's great, but unfortunately, your changes won't
>>> allow that version of D::D to run on older perls, which is quite
>>> important.
>>>
>>> I'd now just go ahead and #ifdef the hell out of it since I don't know
>>> how to get it right, but the specific way in which you patched the code
>>> makes me wonder what's going on: What you cannot see in the diff is that
>>> the function and macro definitions that you modified are only applicable
>>> to 5.6 (yes, 5.six not sixteen) and earlier. This basically leaves 5.8
>>> to 5.16 in the same bucket as blead and it blows up in fun ways.
>>>
>>> Since I'm unsure what you intended, could you please have a look at this
>>> when you have some time and try to make D::D work on both blead and perl
>>> 5.8-5.16?
>
>> I'm trying to understand this. You're saying D::D fails in 5.16 and
>> blead? I guess that means the regression tests for it are not
>> sufficient.
>>
>> I think we should use #irc to figure out the best way forward.
>
> That's harder to do than you think - time zones, work hours, train time
> all conspire against us.

I thought I had recently seen you on #irc.
>
> So let me try via email again:
>
> - Data::Dumper has an "#if PERL_VERSION < 5.6" block that you patched as
> part of 4b88fb76efce8c436e63b907c9842345d4fa77c7 to use the newer
> utf8-related functions.
>
> - I don't think that's ever going to work, 5.6 being ancient and all.
>
> - Data::Dumper (as in blead) now no longer works on 5.12.4 (which is
> what I had to test with then) nor on 5.14.2 (which is what I just
> tried). Not tested with 5.16 yet.
>
> - Since DD is such a commonly used module and dual-lived, we need to
> make it work for older releases. IMO all the way back to some 5.8
> release. Since some crazy people still use 5.6, those may argue 5.6
> (though I'd call "patches welcome").
>
> I do not understand why you patched the #if block that pertains to 5.6
> only to use functions that were introduced much later. I have a hunch
> that we may now need two #if's: 5.6, 5.8-5.14 (or 5.16?) and one for all
> perls since.
>
> Does this make more sense? I'm at a loss about what exactly the patch
> is, what's compatible with what, and what needs doing to get DD to work
> on older perls again. This is where I'm hoping for your help.
>

By not working, it appears it won't even compile on older releases.

Clearly, I failed to notice that

#if PERL_VERSION < 5.6

when I patched it. I'm sorry.

I am attaching a new patch, that hopefully will fix the problem. It
doesn't include any version changes, or Changes file changes. I notice
that the version is now postdated to the planned next 5.17 release.

What the original patch attempted to solve was the bug that was
introduced in 5.8 (or 5.7) in which malformed UTF-8 could cause a read
beyond the end of the buffer. The first byte of a UTF-8 encoded
character indicates how many bytes in the entire character. Perl allows
up to 13 bytes in a character. If that first byte actually was wrong,
and was at the very end of the buffer, Perl would attempt to read up to
12 more bytes beyond the actual end. In 5.6 and 5.16, an extra
parameter is passed giving the length of the buffer, which acts as a
fence beyond which the base level routine will not go. At the time I
wrote the patch, I didn't realize that the bug wasn't always there, so I
didn't go looking for that #if 5.6.

So, you were right, an extra #if block is needed. I have compiled and
run the attached patch on 5.14, and attempted to compile it on 5.6.
There are compile-time bugs on 5.6, but none of them appear to be
related to my changes. Thus, other patches to this module have occurred
over the years which prevent its use in 5.6.
Attachments: 0001-Data-Dumper-Fix-to-use-with-earlier-Perls.patch (2.19 KB)


smueller at cpan

Aug 6, 2012, 11:01 AM

Post #5 of 8 (85 views)
Permalink
Re: UTF8 and Data::Dumper [In reply to]

Hi Karl,

thanks for helping me out so quickl!

On 08/06/2012 07:30 PM, Karl Williamson wrote:
> On 08/05/2012 11:58 PM, Steffen Mueller wrote:
>> On 08/03/2012 07:36 PM, Karl Williamson wrote:
>>> I think we should use #irc to figure out the best way forward.
>>
>> That's harder to do than you think - time zones, work hours, train time
>> all conspire against us.
>
> I thought I had recently seen you on #irc.

Must have been a week-end or I just popped in real quick.

> By not working, it appears it won't even compile on older releases.

> What the original patch attempted to solve was the bug that was
> introduced in 5.8 (or 5.7) in which malformed UTF-8 could cause a read
> beyond the end of the buffer. The first byte of a UTF-8 encoded
> character indicates how many bytes in the entire character. Perl allows
> up to 13 bytes in a character. If that first byte actually was wrong,
> and was at the very end of the buffer, Perl would attempt to read up to
> 12 more bytes beyond the actual end. In 5.6 and 5.16, an extra
> parameter is passed giving the length of the buffer, which acts as a
> fence beyond which the base level routine will not go. At the time I
> wrote the patch, I didn't realize that the bug wasn't always there, so I
> didn't go looking for that #if 5.6.
>
> So, you were right, an extra #if block is needed. I have compiled and
> run the attached patch on 5.14, and attempted to compile it on 5.6.
> There are compile-time bugs on 5.6, but none of them appear to be
> related to my changes. Thus, other patches to this module have occurred
> over the years which prevent its use in 5.6.

I tested it on 5.12.4, 5.14.2, 5.16.0 and it works there. It's better
than before for sure, so a DD developer release won't hurt to tickle
CPAN testers.

Thank you very much for looking at this quickly!

Best regards,
Steffen


public at khwilliamson

Aug 6, 2012, 2:48 PM

Post #6 of 8 (86 views)
Permalink
Re: UTF8 and Data::Dumper [In reply to]

On 08/06/2012 12:01 PM, Steffen Mueller wrote:
> Hi Karl,
>
> thanks for helping me out so quickl!
>
> On 08/06/2012 07:30 PM, Karl Williamson wrote:
>> On 08/05/2012 11:58 PM, Steffen Mueller wrote:
>>> On 08/03/2012 07:36 PM, Karl Williamson wrote:
>>>> I think we should use #irc to figure out the best way forward.
>>>
>>> That's harder to do than you think - time zones, work hours, train time
>>> all conspire against us.
>>
>> I thought I had recently seen you on #irc.
>
> Must have been a week-end or I just popped in real quick.
>
>> By not working, it appears it won't even compile on older releases.
>
>> What the original patch attempted to solve was the bug that was
>> introduced in 5.8 (or 5.7) in which malformed UTF-8 could cause a read
>> beyond the end of the buffer. The first byte of a UTF-8 encoded
>> character indicates how many bytes in the entire character. Perl allows
>> up to 13 bytes in a character. If that first byte actually was wrong,
>> and was at the very end of the buffer, Perl would attempt to read up to
>> 12 more bytes beyond the actual end. In 5.6 and 5.16, an extra
>> parameter is passed giving the length of the buffer, which acts as a
>> fence beyond which the base level routine will not go. At the time I
>> wrote the patch, I didn't realize that the bug wasn't always there, so I
>> didn't go looking for that #if 5.6.
>>
>> So, you were right, an extra #if block is needed. I have compiled and
>> run the attached patch on 5.14, and attempted to compile it on 5.6.
>> There are compile-time bugs on 5.6, but none of them appear to be
>> related to my changes. Thus, other patches to this module have occurred
>> over the years which prevent its use in 5.6.
>
> I tested it on 5.12.4, 5.14.2, 5.16.0 and it works there. It's better
> than before for sure, so a DD developer release won't hurt to tickle
> CPAN testers.
>
> Thank you very much for looking at this quickly!
>
> Best regards,
> Steffen
>

The fix is now in blead as
54161612423d90fe313643089d18d7a3a98460c7


rurban at x-ray

Aug 6, 2012, 3:41 PM

Post #7 of 8 (86 views)
Permalink
Re: UTF8 and Data::Dumper [In reply to]

On Mon, Aug 6, 2012 at 4:48 PM, Karl Williamson <public [at] khwilliamson> wrote:
> On 08/06/2012 12:01 PM, Steffen Mueller wrote:
>>
>> Hi Karl,
>>
>> thanks for helping me out so quickl!
>>
>> On 08/06/2012 07:30 PM, Karl Williamson wrote:
>>>
>>> On 08/05/2012 11:58 PM, Steffen Mueller wrote:
>>>>
>>>> On 08/03/2012 07:36 PM, Karl Williamson wrote:
>>>>>
>>>>> I think we should use #irc to figure out the best way forward.
>>>>
>>>>
>>>> That's harder to do than you think - time zones, work hours, train time
>>>> all conspire against us.
>>>
>>>
>>> I thought I had recently seen you on #irc.
>>
>>
>> Must have been a week-end or I just popped in real quick.
>>
>>> By not working, it appears it won't even compile on older releases.
>>
>>
>>> What the original patch attempted to solve was the bug that was
>>> introduced in 5.8 (or 5.7) in which malformed UTF-8 could cause a read
>>> beyond the end of the buffer. The first byte of a UTF-8 encoded
>>> character indicates how many bytes in the entire character. Perl allows
>>> up to 13 bytes in a character. If that first byte actually was wrong,
>>> and was at the very end of the buffer, Perl would attempt to read up to
>>> 12 more bytes beyond the actual end. In 5.6 and 5.16, an extra
>>> parameter is passed giving the length of the buffer, which acts as a
>>> fence beyond which the base level routine will not go. At the time I
>>> wrote the patch, I didn't realize that the bug wasn't always there, so I
>>> didn't go looking for that #if 5.6.
>>>
>>> So, you were right, an extra #if block is needed. I have compiled and
>>> run the attached patch on 5.14, and attempted to compile it on 5.6.
>>> There are compile-time bugs on 5.6, but none of them appear to be
>>> related to my changes. Thus, other patches to this module have occurred
>>> over the years which prevent its use in 5.6.
>>
>>
>> I tested it on 5.12.4, 5.14.2, 5.16.0 and it works there. It's better
>> than before for sure, so a DD developer release won't hurt to tickle
>> CPAN testers.
>>
>> Thank you very much for looking at this quickly!
>>
>> Best regards,
>> Steffen
>>
>
> The fix is now in blead as
> 54161612423d90fe313643089d18d7a3a98460c7

5.6 has 2.121 which is fine, and I see no problem with patching blead
for 5.6 support or not.
2.121 has no major problem, and 2.135_07 does not survive the 5.6
tests, e.g. for my $v (@$val)

new are mainly vstrings, regex objects, better can support for
freezer, and ensure hash iterators are reset.

There are however still errors in the new Data::Dumper pureperl:
t/dumper.t: $TMAX = 207 not 201,
t/bless.t fails to skip XS in the pureperl case,
t/perl-74170.t fails to skip XS in the pureperl case.
--
Reini Urban
http://cpanel.net/ http://www.perl-compiler.org/


smueller at cpan

Aug 6, 2012, 11:59 PM

Post #8 of 8 (77 views)
Permalink
Re: UTF8 and Data::Dumper [In reply to]

On 08/07/2012 12:41 AM, Reini Urban wrote:
> 5.6 has 2.121 which is fine, and I see no problem with patching blead
> for 5.6 support or not.
> 2.121 has no major problem, and 2.135_07 does not survive the 5.6
> tests, e.g. for my $v (@$val)
>
> new are mainly vstrings, regex objects, better can support for
> freezer, and ensure hash iterators are reset.
>
> There are however still errors in the new Data::Dumper pureperl:
> t/dumper.t: $TMAX = 207 not 201,

Thanks for that. This's not what I see back to 5.8.9 (oldest perl I have
on my laptop). There, it's 207 tests, but the PP implementation tests
were still hosed since Father Chrysostomos fixed the vstring dumping
code. That's fixed now.

If you would like to submit a patch to make 5.6 work, I'll apply it, but
I won't go through the pain of compiling a 5.6.

> t/bless.t fails to skip XS in the pureperl case,
> t/perl-74170.t fails to skip XS in the pureperl case.

Not fixed yet. Not entirely my itch either, but if I find some spare
time, I'll give it a shot.

--Steffen

Perl porters RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.