Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Perl: porters

[PATCH] h2ph: allow the quote mark delimiter when chasing #include directives with "-a"

 

 

Perl porters RSS feed   Index | Next | Previous | View Threaded


ntyni at debian

May 15, 2008, 1:15 PM

Post #1 of 8 (85 views)
Permalink
[PATCH] h2ph: allow the quote mark delimiter when chasing #include directives with "-a"

Recent Linux kernel headers like <asm/unistd.h> have quote delimiters in system
header #include directives:

> # ifdef __i386__
> # include "unistd_32.h"
> # else
> # include "unistd_64.h"
> # endif

which results in both unistd_32.h and unistd_64.h being skipped altogether.
See <http://bugs.debian.org/479762>.

In addition to simply matching the quote marks, we need to prepend the
directory prefix of the current file when the quote syntax is used. Unlike
cpp, 'require' will not look in the directory of the source file, only
in @INC.

This is most probably still suboptimal, but it seems to work and generated
no regressions on the translated system headers on current Debian unstable
(manually checked on the amd64 architecture.)

The patch is currently included in the Debian package as of 5.10.0-10.
---
utils/h2ph.PL | 20 +++++++++++++++-----
1 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/utils/h2ph.PL b/utils/h2ph.PL
index 0bfea18..a3ff285 100644
--- a/utils/h2ph.PL
+++ b/utils/h2ph.PL
@@ -85,7 +85,7 @@ sub reindent($) {
}

my ($t, $tab, %curargs, $new, $eval_index, $dir, $name, $args, $outfile);
-my ($incl, $incl_type, $next);
+my ($incl, $incl_type, $incl_quote, $next);
while (defined (my $file = next_file())) {
if (-l $file and -d $file) {
link_if_possible($file) if ($opt_l);
@@ -186,9 +186,10 @@ while (defined (my $file = next_file())) {
print OUT $t,"unless(defined(\&$name)) {\n sub $name () {\t",$new,";}\n}\n";
}
}
- } elsif (/^(include|import|include_next)\s*[<\"](.*)[>\"]/) {
+ } elsif (/^(include|import|include_next)\s*([<\"])(.*)[>\"]/) {
$incl_type = $1;
- $incl = $2;
+ $incl_quote = $2;
+ $incl = $3;
if (($incl_type eq 'include_next') ||
($opt_e && exists($bad_file{$incl}))) {
$incl =~ s/\.h$/.ph/;
@@ -221,6 +222,10 @@ while (defined (my $file = next_file())) {
"warn(\$\@) if \$\@;\n");
} else {
$incl =~ s/\.h$/.ph/;
+ # copy the prefix in the quote syntax (#include "x.h") case
+ if ($incl !~ m|/| && $incl_quote eq q{"} && $file =~ m|^(.*)/|) {
+ $incl = "$1/$incl";
+ }
print OUT $t,"require '$incl';\n";
}
} elsif (/^ifdef\s+(\w+)/) {
@@ -724,8 +729,13 @@ sub queue_includes_from
$line .= <HEADER>;
}

- if ($line =~ /^#\s*include\s+<(.*?)>/) {
- push(@ARGV, $1) unless $Is_converted{$1};
+ if ($line =~ /^#\s*include\s+([<"])(.*?)[>"]/) {
+ my ($delimiter, $new_file) = ($1, $2);
+ # copy the prefix in the quote syntax (#include "x.h") case
+ if ($delimiter eq q{"} && $file =~ m|^(.*)/|) {
+ $new_file = "$1/$new_file";
+ }
+ push(@ARGV, $new_file) unless $Is_converted{$new_file};
}
}
close HEADER;
--
1.5.5.1


h.m.brand at xs4all

May 15, 2008, 11:45 PM

Post #2 of 8 (79 views)
Permalink
Re: [PATCH] h2ph: allow the quote mark delimiter when chasing #include directives with "-a" [In reply to]

On Thu, 15 May 2008 23:15:35 +0300, Niko Tyni <ntyni[at]debian.org> wrote:

> Recent Linux kernel headers like <asm/unistd.h> have quote delimiters in system
> header #include directives:
>
> > # ifdef __i386__
> > # include "unistd_32.h"
> > # else
> > # include "unistd_64.h"
> > # endif
>
> which results in both unistd_32.h and unistd_64.h being skipped altogether.
> See <http://bugs.debian.org/479762>.
>
> In addition to simply matching the quote marks, we need to prepend the
> directory prefix of the current file when the quote syntax is used. Unlike
> cpp, 'require' will not look in the directory of the source file, only
> in @INC.

Some comments that are not only to your patch, which - I think - looks
fine. I just wondered, looking at this code ...

> This is most probably still suboptimal, but it seems to work and generated
> no regressions on the translated system headers on current Debian unstable
> (manually checked on the amd64 architecture.)
>
> The patch is currently included in the Debian package as of 5.10.0-10.
> ---
> utils/h2ph.PL | 20 +++++++++++++++-----
> 1 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/utils/h2ph.PL b/utils/h2ph.PL
> index 0bfea18..a3ff285 100644
> --- a/utils/h2ph.PL
> +++ b/utils/h2ph.PL
> @@ -85,7 +85,7 @@ sub reindent($) {
> }
>
> my ($t, $tab, %curargs, $new, $eval_index, $dir, $name, $args, $outfile);
> -my ($incl, $incl_type, $next);
> +my ($incl, $incl_type, $incl_quote, $next);
> while (defined (my $file = next_file())) {
> if (-l $file and -d $file) {
> link_if_possible($file) if ($opt_l);
> @@ -186,9 +186,10 @@ while (defined (my $file = next_file())) {
> print OUT $t,"unless(defined(\&$name)) {\n sub $name () {\t",$new,";}\n}\n";
> }
> }
> - } elsif (/^(include|import|include_next)\s*[<\"](.*)[>\"]/) {
> + } elsif (/^(include|import|include_next)\s*([<\"])(.*)[>\"]/) {

why are the " escaped in the char classes?

is this safe enough? it would also match #include "foo.h>
not that it is very likely to happen, but still

> $incl_type = $1;
> - $incl = $2;
> + $incl_quote = $2;
> + $incl = $3;
> if (($incl_type eq 'include_next') ||
> ($opt_e && exists($bad_file{$incl}))) {
> $incl =~ s/\.h$/.ph/;
> @@ -221,6 +222,10 @@ while (defined (my $file = next_file())) {
> "warn(\$\@) if \$\@;\n");
> } else {
> $incl =~ s/\.h$/.ph/;
> + # copy the prefix in the quote syntax (#include "x.h") case
> + if ($incl !~ m|/| && $incl_quote eq q{"} && $file =~ m|^(.*)/|) {
> + $incl = "$1/$incl";
> + }
> print OUT $t,"require '$incl';\n";
> }
> } elsif (/^ifdef\s+(\w+)/) {
> @@ -724,8 +729,13 @@ sub queue_includes_from
> $line .= <HEADER>;
> }
>
> - if ($line =~ /^#\s*include\s+<(.*?)>/) {
> - push(@ARGV, $1) unless $Is_converted{$1};
> + if ($line =~ /^#\s*include\s+([<"])(.*?)[>"]/) {

here the quotes are not escaped, and still no check on matching quotes

> + my ($delimiter, $new_file) = ($1, $2);
> + # copy the prefix in the quote syntax (#include "x.h") case
> + if ($delimiter eq q{"} && $file =~ m|^(.*)/|) {
> + $new_file = "$1/$new_file";
> + }
> + push(@ARGV, $new_file) unless $Is_converted{$new_file};
> }
> }
> close HEADER;


--
H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using & porting perl 5.6.2, 5.8.x, 5.10.x on HP-UX 10.20, 11.00, 11.11,
& 11.23, SuSE 10.1 & 10.2, AIX 5.2, and Cygwin. http://qa.perl.org
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org
http://www.goldmark.org/jeff/stupid-disclaimers/


rgarciasuarez at gmail

May 16, 2008, 1:25 AM

Post #3 of 8 (77 views)
Permalink
Re: [PATCH] h2ph: allow the quote mark delimiter when chasing #include directives with "-a" [In reply to]

2008/5/16 H.Merijn Brand <h.m.brand[at]xs4all.nl>:
>> The patch is currently included in the Debian package as of 5.10.0-10.
>> ---
>> utils/h2ph.PL | 20 +++++++++++++++-----
>> 1 files changed, 15 insertions(+), 5 deletions(-)
>>
>> diff --git a/utils/h2ph.PL b/utils/h2ph.PL
>> index 0bfea18..a3ff285 100644
>> --- a/utils/h2ph.PL
>> +++ b/utils/h2ph.PL
>> @@ -85,7 +85,7 @@ sub reindent($) {
>> }
>>
>> my ($t, $tab, %curargs, $new, $eval_index, $dir, $name, $args, $outfile);
>> -my ($incl, $incl_type, $next);
>> +my ($incl, $incl_type, $incl_quote, $next);
>> while (defined (my $file = next_file())) {
>> if (-l $file and -d $file) {
>> link_if_possible($file) if ($opt_l);
>> @@ -186,9 +186,10 @@ while (defined (my $file = next_file())) {
>> print OUT $t,"unless(defined(\&$name)) {\n sub $name () {\t",$new,";}\n}\n";
>> }
>> }
>> - } elsif (/^(include|import|include_next)\s*[<\"](.*)[>\"]/) {
>> + } elsif (/^(include|import|include_next)\s*([<\"])(.*)[>\"]/) {
>
> why are the " escaped in the char classes?
>
> is this safe enough? it would also match #include "foo.h>
> not that it is very likely to happen, but still

([<"])([^<"]+)\2 would probably be better. (and faster)

>> $incl_type = $1;
>> - $incl = $2;
>> + $incl_quote = $2;
>> + $incl = $3;
>> if (($incl_type eq 'include_next') ||
>> ($opt_e && exists($bad_file{$incl}))) {
>> $incl =~ s/\.h$/.ph/;
>> @@ -221,6 +222,10 @@ while (defined (my $file = next_file())) {
>> "warn(\$\@) if \$\@;\n");
>> } else {
>> $incl =~ s/\.h$/.ph/;
>> + # copy the prefix in the quote syntax (#include "x.h") case
>> + if ($incl !~ m|/| && $incl_quote eq q{"} && $file =~ m|^(.*)/|) {

That last regexp is equivalent to $file =~ m|/|, isn't it ?

>> + $incl = "$1/$incl";
>> + }
>> print OUT $t,"require '$incl';\n";
>> }
>> } elsif (/^ifdef\s+(\w+)/) {
>> @@ -724,8 +729,13 @@ sub queue_includes_from
>> $line .= <HEADER>;
>> }
>>
>> - if ($line =~ /^#\s*include\s+<(.*?)>/) {
>> - push(@ARGV, $1) unless $Is_converted{$1};
>> + if ($line =~ /^#\s*include\s+([<"])(.*?)[>"]/) {
>
> here the quotes are not escaped, and still no check on matching quotes
>
>> + my ($delimiter, $new_file) = ($1, $2);
>> + # copy the prefix in the quote syntax (#include "x.h") case
>> + if ($delimiter eq q{"} && $file =~ m|^(.*)/|) {
>> + $new_file = "$1/$new_file";
>> + }
>> + push(@ARGV, $new_file) unless $Is_converted{$new_file};
>> }
>> }
>> close HEADER;

Apart from those little nits, that can be applied later, the patch
looks safe to me.


moritz at casella

May 16, 2008, 1:51 AM

Post #4 of 8 (74 views)
Permalink
Re: [PATCH] h2ph: allow the quote mark delimiter when chasing #include directives with "-a" [In reply to]

Rafael Garcia-Suarez wrote:
> 2008/5/16 H.Merijn Brand <h.m.brand[at]xs4all.nl>:
>>> The patch is currently included in the Debian package as of 5.10.0-10.
>>> ---
>>> utils/h2ph.PL | 20 +++++++++++++++-----
>>> 1 files changed, 15 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/utils/h2ph.PL b/utils/h2ph.PL
>>> index 0bfea18..a3ff285 100644
>>> --- a/utils/h2ph.PL
>>> +++ b/utils/h2ph.PL
>>> @@ -85,7 +85,7 @@ sub reindent($) {
>>> }
>>>
>>> my ($t, $tab, %curargs, $new, $eval_index, $dir, $name, $args, $outfile);
>>> -my ($incl, $incl_type, $next);
>>> +my ($incl, $incl_type, $incl_quote, $next);
>>> while (defined (my $file = next_file())) {
>>> if (-l $file and -d $file) {
>>> link_if_possible($file) if ($opt_l);
>>> @@ -186,9 +186,10 @@ while (defined (my $file = next_file())) {
>>> print OUT $t,"unless(defined(\&$name)) {\n sub $name () {\t",$new,";}\n}\n";
>>> }
>>> }
>>> - } elsif (/^(include|import|include_next)\s*[<\"](.*)[>\"]/) {
>>> + } elsif (/^(include|import|include_next)\s*([<\"])(.*)[>\"]/) {
>>
>> why are the " escaped in the char classes?
>>
>> is this safe enough? it would also match #include "foo.h>
>> not that it is very likely to happen, but still
>
> ([<"])([^<"]+)\2 would probably be better. (and faster)

That would match <foo<, not <foo>


ntyni at debian

May 16, 2008, 2:58 AM

Post #5 of 8 (72 views)
Permalink
Re: [PATCH] h2ph: allow the quote mark delimiter when chasing #include directives with "-a" [In reply to]

On Fri, May 16, 2008 at 10:25:45AM +0200, Rafael Garcia-Suarez wrote:
> 2008/5/16 H.Merijn Brand <h.m.brand[at]xs4all.nl>:

> >> - } elsif (/^(include|import|include_next)\s*[<\"](.*)[>\"]/) {
> >> + } elsif (/^(include|import|include_next)\s*([<\"])(.*)[>\"]/) {
> >
> > why are the " escaped in the char classes?

Because they backslashes were there earlier and I didn't want to confuse
the issue, I suppose.

> > is this safe enough? it would also match #include "foo.h>
> > not that it is very likely to happen, but still

I thought of that but again decided to keep the patch simple, particularly
because the old code didn't care about that either.

I have to admit I thought it would only catch invalid cpp syntax, but
somewhat surprisingly cpp actually handles these right here:

#include "a>b.h"
#include <a"b.h>

Still, handling these cases right should IMO go in a separate patch.

> ([<"])([^<"]+)\2 would probably be better. (and faster)

Even if you meant ([<"])([^>"]+)\2 , I don't think that works.
Sorry if I'm missing something.

> >> + # copy the prefix in the quote syntax (#include "x.h") case
> >> + if ($incl !~ m|/| && $incl_quote eq q{"} && $file =~ m|^(.*)/|) {
>
> That last regexp is equivalent to $file =~ m|/|, isn't it ?

It would be, but it's capturing the prefix in the same go:

> >> + $incl = "$1/$incl";
> >> + }

> Apart from those little nits, that can be applied later, the patch
> looks safe to me.

Thanks for the review.
--
Niko Tyni ntyni[at]debian.org


rgarciasuarez at gmail

May 16, 2008, 4:33 AM

Post #6 of 8 (74 views)
Permalink
Re: [PATCH] h2ph: allow the quote mark delimiter when chasing #include directives with "-a" [In reply to]

2008/5/16 Moritz Lenz <moritz[at]casella.verplant.org>:
>> ([<"])([^<"]+)\2 would probably be better. (and faster)
>
> That would match <foo<, not <foo>

Wow, indeed. Good catch.


h.m.brand at xs4all

May 16, 2008, 5:48 AM

Post #7 of 8 (72 views)
Permalink
Re: [PATCH] h2ph: allow the quote mark delimiter when chasing #include directives with "-a" [In reply to]

On Fri, 16 May 2008 12:58:15 +0300, Niko Tyni <ntyni[at]debian.org> wrote:

> > Apart from those little nits, that can be applied later, the patch
> > looks safe to me.
>
> Thanks for the review.

The original patch is now in as #33835

We're open for the followup now :)

--
H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using & porting perl 5.6.2, 5.8.x, 5.10.x on HP-UX 10.20, 11.00, 11.11,
& 11.23, SuSE 10.1 & 10.2, AIX 5.2, and Cygwin. http://qa.perl.org
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org
http://www.goldmark.org/jeff/stupid-disclaimers/


demerphq at gmail

May 16, 2008, 5:54 AM

Post #8 of 8 (72 views)
Permalink
Re: [PATCH] h2ph: allow the quote mark delimiter when chasing #include directives with "-a" [In reply to]

2008/5/16 Rafael Garcia-Suarez <rgarciasuarez[at]gmail.com>:
> 2008/5/16 H.Merijn Brand <h.m.brand[at]xs4all.nl>:
>>> The patch is currently included in the Debian package as of 5.10.0-10.
>>> ---
>>> utils/h2ph.PL | 20 +++++++++++++++-----
>>> 1 files changed, 15 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/utils/h2ph.PL b/utils/h2ph.PL
>>> index 0bfea18..a3ff285 100644
>>> --- a/utils/h2ph.PL
>>> +++ b/utils/h2ph.PL
>>> @@ -85,7 +85,7 @@ sub reindent($) {
>>> }
>>>
>>> my ($t, $tab, %curargs, $new, $eval_index, $dir, $name, $args, $outfile);
>>> -my ($incl, $incl_type, $next);
>>> +my ($incl, $incl_type, $incl_quote, $next);
>>> while (defined (my $file = next_file())) {
>>> if (-l $file and -d $file) {
>>> link_if_possible($file) if ($opt_l);
>>> @@ -186,9 +186,10 @@ while (defined (my $file = next_file())) {
>>> print OUT $t,"unless(defined(\&$name)) {\n sub $name () {\t",$new,";}\n}\n";
>>> }
>>> }
>>> - } elsif (/^(include|import|include_next)\s*[<\"](.*)[>\"]/) {
>>> + } elsif (/^(include|import|include_next)\s*([<\"])(.*)[>\"]/) {
>>
>> why are the " escaped in the char classes?
>>
>> is this safe enough? it would also match #include "foo.h>
>> not that it is very likely to happen, but still
>
> ([<"])([^<"]+)\2 would probably be better. (and faster)

Actually any pattern using backreferences is going to be slower than
one that doesnt (all other things being equal of course).

>
>>> $incl_type = $1;
>>> - $incl = $2;
>>> + $incl_quote = $2;
>>> + $incl = $3;
>>> if (($incl_type eq 'include_next') ||
>>> ($opt_e && exists($bad_file{$incl}))) {
>>> $incl =~ s/\.h$/.ph/;
>>> @@ -221,6 +222,10 @@ while (defined (my $file = next_file())) {
>>> "warn(\$\@) if \$\@;\n");
>>> } else {
>>> $incl =~ s/\.h$/.ph/;
>>> + # copy the prefix in the quote syntax (#include "x.h") case
>>> + if ($incl !~ m|/| && $incl_quote eq q{"} && $file =~ m|^(.*)/|) {
>
> That last regexp is equivalent to $file =~ m|/|, isn't it ?

Not exactly equivalent, but I suspect morally equivalent.

yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"

Perl porters RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact lists@gossamer-threads.com
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.