
Mailing List Archive: Perl: porters

It's wafer thin!

 

 



davidnicol at gmail

May 9, 2008, 11:14 AM

Post #1 of 23 (268 views)
Permalink
It's wafer thin!

--- perlop.pod 2008-05-09 12:57:06.000000000 +0000
+++ perlop.pod.new 2008-05-09 13:10:35.000000000 +0000
@@ -1050,15 +1050,27 @@
the life of the script. However, mentioning C</o> constitutes a promise
that you won't change the variables in the pattern. If you change them,
Perl won't even notice. See also L<"qr/STRING/imosx">.

+=item /THE EMPTY PATTERN/cg
+
If the PATTERN evaluates to the empty string, the last
I<successfully> matched regular expression is used instead. In this
case, only the C<g> and C<c> flags on the empty pattern is honoured -
the other flags are taken from the original pattern. If no match has
previously succeeded, this will (silently) act instead as a genuine
empty pattern (which will always match).

+Example:
+
+ $ranch =~ /(pig)/i or $ranch =~ /(sheep)/i or $ranch =~ /(cow)/i
+ or $ranch =~ /(chicken)/i or $ranch =~ /(horse)/i or
+ $ranch =~ /(turtle)/i or die "Are you sure <<$ranch>> is a ranch?";
+
+ push @animals, //g; # EMPTY PATTERN stands for last successful pattern
+
+=item matching in list context
+
If the C</g> option is not used, C<m//> in list context returns a
list consisting of the subexpressions matched by the parentheses in the
pattern, i.e., (C<$1>, C<$2>, C<$3>...). (Note that here C<$1> etc. are
also set, and that this differs from Perl 4's behavior.) When there are
@@ -1103,8 +1115,10 @@
search position to the beginning of the string, but you can avoid that
by adding the C</c> modifier (e.g. C<m//gc>). Modifying the target
string also resets the search position.

+=item \G assertion
+
You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
zero-width assertion that matches the exact position where the previous
C<m//g>, if any, left off. Without the C</g> modifier, the C<\G> assertion
still anchors at pos(), but the match is of course only attempted once.
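
A minimal sketch of the C<\G> behaviour described in the last paragraph of the patch, written as a tiny tokenizer. The variable names and the input string are hypothetical; only C<\G>, C</g>, C</c> and pos() come from the text above.

```perl
use strict;
use warnings;

my $input = "abc123def";   # hypothetical input
my @tokens;

pos($input) = 0;           # start scanning at the beginning
while (pos($input) < length $input) {
    # Each alternative is anchored with \G at the current pos().
    # /c keeps pos() where it was when an alternative fails.
    if    ($input =~ /\G([a-z]+)/gc) { push @tokens, "WORD:$1" }
    elsif ($input =~ /\G(\d+)/gc)    { push @tokens, "NUM:$1"  }
    else  { die "unexpected input at offset " . pos($input) }
}

print "@tokens\n";
```

Without C</c>, the first failed alternative would reset pos() and the scan would start over from the beginning of the string.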


rgarciasuarez at gmail

May 18, 2008, 1:56 AM

Post #2 of 23 (233 views)
Permalink
Re: It's wafer thin! [In reply to]

2008/5/9 David Nicol <davidnicol[at]gmail.com>:
> --- perlop.pod 2008-05-09 12:57:06.000000000 +0000
> +++ perlop.pod.new 2008-05-09 13:10:35.000000000 +0000

The new headings are a good idea: added as #33852, thanks.


demerphq at gmail

May 18, 2008, 4:17 AM

Post #3 of 23 (231 views)
Permalink
Re: It's wafer thin! [In reply to]

2008/5/9 David Nicol <davidnicol[at]gmail.com>:
> --- perlop.pod 2008-05-09 12:57:06.000000000 +0000
> +++ perlop.pod.new 2008-05-09 13:10:35.000000000 +0000
> @@ -1050,15 +1050,27 @@
> the life of the script. However, mentioning C</o> constitutes a promise
> that you won't change the variables in the pattern. If you change them,
> Perl won't even notice. See also L<"qr/STRING/imosx">.
>
> +=item /THE EMPTY PATTERN/cg
> +
> If the PATTERN evaluates to the empty string, the last
> I<successfully> matched regular expression is used instead. In this
> case, only the C<g> and C<c> flags on the empty pattern is honoured -
> the other flags are taken from the original pattern. If no match has
> previously succeeded, this will (silently) act instead as a genuine
> empty pattern (which will always match).
>
> +Example:
> +
> + $ranch =~ /(pig)/i or $ranch =~ /(sheep)/i or $ranch =~ /(cow)/i
> + or $ranch =~ /(chicken)/i or $ranch =~ /(horse)/i or
> + $ranch =~ /(turtle)/i or die "Are you sure <<$ranch>> is a ranch?";
> +
> + push @animals, //g; # EMPTY PATTERN stands for last successful pattern

I know Rafael has applied this already, but I want to register that
I'm not happy with this patch. I don't want the documentation to include
m!! as I don't think there is any need for it and it handcuffs us from
REMOVING it in the future. Explain what m!! does that you can't do with
$1 et al. and maybe I'll change my mind; alternatively, change it to
s//something/; and my objection is withdrawn.

As it stands, I would prefer the docs specifically mention that it is
unwise to do m// and note that the special-case behaviour may be
disabled in a future perl, but that it probably won't be removed for
s///. I'd also like the docs to specifically mention the traps
associated with this construct and the possibility that in future perls
it may be slightly changed to avoid these traps. In general I think
the docs should recommend that this construct not be used at all, and
including examples like this for it will achieve the opposite effect.
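
For readers who have not met the construct under discussion, here is a minimal sketch of what the patch's ranch example relies on. The sample string and variable names are made up for illustration; the behaviour shown is the one the quoted perlop paragraph describes.

```perl
use strict;
use warnings;

my $ranch = "two sheep, a cow and another sheep";   # hypothetical data
my @animals;

# The first alternative that succeeds becomes the "last successful pattern".
if ($ranch =~ /(pig)/i or $ranch =~ /(sheep)/i or $ranch =~ /(cow)/i) {
    # The empty pattern stands for /(sheep)/i here. Only the /g flag is
    # taken from the empty pattern itself; /i comes from the original.
    @animals = $ranch =~ //g;
}

print "@animals\n";
```

Nothing at the site of the empty pattern says which regex it will run, which is the readability concern raised above.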

cheers,
yves


--
perl -Mre=debug -e "/just|another|perl|hacker/"


rgarciasuarez at gmail

May 18, 2008, 4:54 AM

Post #4 of 23 (231 views)
Permalink
Re: It's wafer thin! [In reply to]

2008/5/18 demerphq <demerphq[at]gmail.com>:
> I know Raphael has applied this already buit i want to register that
> Im not happy with this patch. I dont want the documentation to include
> m!! as I dont think there is any need for it and it handcuffs us from
> REMOVING it in the future. Explain what m!! does that you cant do with
> $1 et al and maybe ill change my mind, alternatively change it to
> s//something/; and my objection is withdrawn.
>
> As it stands I would prefer the docs specifically mention that it is
> unwise to do m// and note that the special case behaviour may be
> disabled in a future perl, but that it probably wont be removed for
> s///. Id also like the docs to specifically mention the traps
> associated to this construct and the possibility that in future perls
> it may be slighlty changed to avoid these traps. In general i think
> the docs should recommend that this construct not be used at all and
> including examples like this for it will achieve the opposite effect.

Apply a patch? The addition of the =items was merely a clarification
of the current docs, but I'd be happy to discourage m!! in future
perls.


demerphq at gmail

May 18, 2008, 5:57 AM

Post #5 of 23 (231 views)
Permalink
Re: It's wafer thin! [In reply to]

2008/5/18 Rafael Garcia-Suarez <rgarciasuarez[at]gmail.com>:
> 2008/5/18 demerphq <demerphq[at]gmail.com>:
>> I know Raphael has applied this already buit i want to register that
>> Im not happy with this patch. I dont want the documentation to include
>> m!! as I dont think there is any need for it and it handcuffs us from
>> REMOVING it in the future. Explain what m!! does that you cant do with
>> $1 et al and maybe ill change my mind, alternatively change it to
>> s//something/; and my objection is withdrawn.
>>
>> As it stands I would prefer the docs specifically mention that it is
>> unwise to do m// and note that the special case behaviour may be
>> disabled in a future perl, but that it probably wont be removed for
>> s///. Id also like the docs to specifically mention the traps
>> associated to this construct and the possibility that in future perls
>> it may be slighlty changed to avoid these traps. In general i think
>> the docs should recommend that this construct not be used at all and
>> including examples like this for it will achieve the opposite effect.
>
> Apply a patch ? the addition of the =items was merely a clarification
> of the current docs, but I'd be happy to discourage m!!in future
> perls.

OK, I wanted to discuss my rationale publicly so that you had a chance
to veto it, and so that any interested parties had a chance to
comment, but if you are OK with me introducing some deprecatory-sounding
language then I'll go ahead and do that when I get back to a
machine with p4 access.

I would still like to hear about it if anybody can work out a
problem that you can solve with m// that you can't solve using qr// and
capture variables.

It also occurs to me that maybe we can deprecate this construct and
provide an equivalent _explicit_ pattern for this behaviour as a
workaround. If we introduced this in the next release then we could
drop the empty match behaviour in the following one.

The more I think about it the more I like this option, as it would allow
some interesting ideas, for instance embedding the last
successful match inside a larger pattern. Thus it would allow us to
do things that we can't do now, provide a workaround for
deprecating the unpredictable empty match behaviour in both s/// and
m// form, and provide an escape hatch for the case where
people really do want $s=""; m/$s/ to be equivalent to m//.
Specifically I'm thinking that m/(*LAST_PAT)/ would be the same as m//.

This seems better than the other escape hatch options, like adding a
flag or the extreme of just outright dropping the construct.
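
The "unpredictable" case mentioned above can be sketched as follows. This is an illustration only: the variable names are hypothetical, and the C<(?:)> prefix is one commonly used way to force a genuinely empty pattern, not something proposed in this thread.

```perl
use strict;
use warnings;

my $text = "hello world";

# Some earlier, unrelated match succeeds: /b/ is now the last successful pattern.
"abc" =~ /b/ or die "setup match failed";

my $filter = "";    # e.g. an empty user-supplied filter

# The interpolated pattern is empty, so Perl silently reuses /b/,
# and "hello world" does not contain a "b".
my $surprise = ($text =~ /$filter/) ? 1 : 0;

# Prefixing (?:) makes the pattern non-empty, so it matches anywhere.
my $explicit = ($text =~ /(?:)$filter/) ? 1 : 0;

print "surprise=$surprise explicit=$explicit\n";
```

The result of the first test depends on whichever match happened to succeed last in the enclosing scope, not on C<$filter> or C<$text> alone.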

Cheers,
yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"


perl at aaroncrane

May 18, 2008, 6:07 AM

Post #6 of 23 (231 views)
Permalink
Re: It's wafer thin! [In reply to]

demerphq writes:
> Specifically Im thinking that m/(*LAST_PAT)/ would be the same as m//.

I haven't thought much about the rest of what you're suggesting, but
it occurs to me that LAST_PAT is potentially confusing -- is this
"last" as in "most recent", or as in "final"?

I think /(*PREV_PAT)/ or /(*PREVIOUS_PAT)/ would be an improvement.

--
Aaron Crane ** http://aaroncrane.co.uk/


h.m.brand at xs4all

May 18, 2008, 6:26 AM

Post #7 of 23 (230 views)
Permalink
Re: It's wafer thin! [In reply to]

On Sun, 18 May 2008 14:07:41 +0100, Aaron Crane <perl[at]aaroncrane.co.uk>
wrote:

> demerphq writes:
> > Specifically Im thinking that m/(*LAST_PAT)/ would be the same as m//.
>
> I haven't thought much about the rest of what you're suggesting, but
> it occurs to me that LAST_PAT is potentially confusing -- is this
> "last" as in "most recent", or as in "final"?
>
> I think /(*PREV_PAT)/ or /(*PREVIOUS_PAT)/ would be an improvement.

but actually, it is
(*THE_MOST_RECENT_PATTERN_THAT_HAD_A_SUCCESSFUL_MATCH)

--
H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using & porting perl 5.6.2, 5.8.x, 5.10.x on HP-UX 10.20, 11.00, 11.11,
& 11.23, SuSE 10.1 & 10.2, AIX 5.2, and Cygwin. http://qa.perl.org
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org
http://www.goldmark.org/jeff/stupid-disclaimers/


rvtol+news at isolution

May 18, 2008, 7:04 AM

Post #8 of 23 (230 views)
Permalink
Re: It's wafer thin! [In reply to]

"H.Merijn Brand" schreef:
> Aaron Crane:
>> demerphq:

>>> Specifically Im thinking that m/(*LAST_PAT)/ would be the same as
>>> m//.
>>
>> I haven't thought much about the rest of what you're suggesting, but
>> it occurs to me that LAST_PAT is potentially confusing -- is this
>> "last" as in "most recent", or as in "final"?
>>
>> I think /(*PREV_PAT)/ or /(*PREVIOUS_PAT)/ would be an improvement.
>
> but actually, it is
> (*THE_MOST_RECENT_PATTERN_THAT_HAD_A_SUCCESSFUL_MATCH)

And then some words about scope.

--
Affijn, Ruud

"Gewoon is een tijger."


abigail at abigail

May 19, 2008, 9:14 AM

Post #9 of 23 (215 views)
Permalink
Re: It's wafer thin! [In reply to]

On Sun, May 18, 2008 at 02:57:14PM +0200, demerphq wrote:
>
> The more i think about it the more like this option, as it would allow
> some interesting ideas. like for instance embedding the last
> successful match inside of a larger pattern. Thus it would allow us to
> do things that we cant do now, as well as provide a work around for
> deprecating the unpredictable empty match behaviour in both s/// and
> m// form, as well as providing an escape hatch for the case where
> people really do want $s=""; m/$s/ to be equivalent to m//.
> Specifically Im thinking that m/(*LAST_PAT)/ would be the same as m//.


No, I don't think they would be the same.

Oh, sure, semantically they would be the same. But I don't think the main
reason there is m// is for its unique semantic behaviour only. Part of
the "usefulness" of m// is its shortness. m/(*LAST_PAT)/ completely misses
that point.

Now, I'm not arguing we should keep m//. But I think we should either keep
it, and keep it short and easy to remember [*], or remove it completely.
There's no point in crippling it.


[*] Easy to remember is subjective, but there are several tools where an
empty pattern means repeating the previous pattern. 'mutt' and 'vi'
spring to mind.


Abigail


davidnicol at gmail

May 19, 2008, 10:06 AM

Post #10 of 23 (215 views)
Permalink
Re: It's wafer thin! [In reply to]

On Sun, May 18, 2008 at 7:57 AM, demerphq <demerphq[at]gmail.com> wrote:
> Although i would like to hear about it if anybody can work out a
> problem that you can solve with m// that you cant solve using qr// and
> capture variables.

On can and can't, there's nothing you can do with the whole regex
paradigm that you can't do with substr.

The ranch example was contrived to demonstrate a situation where //
makes it easier. Using an s/// would make it even better:

    sub WordCounter($) { # takes string as argument, returns hash pairs
        my %ret;
        my $string = shift;
        while ( $string =~ /(\w+)/ ) {
            $ret{$1} = $string =~ s/\b$1\b//g;
        }
        %ret
    }

Hey, you're right, capture variables do seem like the better way to write
that. But with the feature, we can be a little more structured, and
faster:


    sub RanchCounter($) { # takes ranch as argument, returns hash pairs
        my %ret;
        local *_ = shift;
        while ( /\b(pig)\b/i or /\b(horse)\b/i or /\b(cow)\b/i or
                /\b(chicken)\b/i or /\b(turtle)\b/i ) {
            $ret{lc $1} = s///g;
        }
        %ret
    }

Yes, the problem is contrived, but doing it that way is about as
readable as, and arguably no more awkward than, the
equivalent approach with $1 in the second regex:

    sub RanchCounter2($) { # takes ranch as argument, returns hash pairs
        my %ret;
        local *_ = shift;
        while ( /\b(pig|horse|cow|chicken|turtle)\b/i ) {
            $ret{lc $1} = s/\b$1\b//gi;
        }
        %ret
    }


Quick and unreadable one-offs seem to be the use case that this
feature supports, not clear enterprise-grade auditable infrastructure.
And backwards compatibility, too. Without knowing one way or the
other, I expect that the magical empty pattern might be one of perl 5's
features left out of Kurila.

The side effect of /o on the LAST_MATCH is worth warning about in the
discussion of the empty pattern.


The one thing // would do that would be trickier to do by
reconstructing the to-be-matched-again pattern is the case where
LAST_MATCH has captures in it. I haven't contrived an example of this,
but I have contrived a realistic real-world situation that would
generate the problem.

For instance, the input is supposed to specify names and addresses,
and there are ten different possible ways these are going to get
specified, and there are a thousand files that will get read, and each
file will be internally consistent but there's no telling which of the
ten formats is to be used, and new formats (which the code must be
adjusted to handle, possibly by someone else) appear every eight
months, on average. By using the empty pattern, the block of possible
formats can be in one easy-to-maintain series, and the code that
extracts the names and addresses can use // to repeat the active one,
allowing greater flexibility and clarity (especially with the new
named capture feature!) in composing the format identification regex
series, and a simpler maintenance procedure at new format integration
time.

Although doing that with named pre-compiled regexen would again be
better, you're right

open PF, "PossibleFormats.txt";
our @PossibleFormats = map {qr/$_/} <PF>; close PF;
my $ThisFormat;
$InputData =~ /$_/ and $ThisFormat=$_ and last for @PossibleFormats;
$ThisFormat or die "Shut 'er down, Clancy, time to add a format";
...
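
A self-contained sketch of the precompiled-format approach, using the named captures mentioned above. The two formats and the sample lines are invented for illustration; nothing here comes from a real PossibleFormats.txt.

```perl
use strict;
use warnings;

# Hypothetical formats; each one exposes the same capture names.
my @formats = (
    qr/^(?<name>[^,|]+),\s*(?<address>.+)$/,       # "Name, Address"
    qr/^(?<address>.+?)\s*\|\s*(?<name>.+)$/,      # "Address | Name"
);

my @lines = ("12 Elm St | Ann Lee", "9 Oak Ave | Bob Ray");

# Identify the active format once, from the first line of the file.
my ($format) = grep { $lines[0] =~ $_ } @formats
    or die "Time to add a format";

# Reuse the identified format explicitly instead of relying on //.
my @people;
for my $line (@lines) {
    $line =~ $format or next;
    push @people, { name => $+{name}, address => $+{address} };
}

print "$_->{name}: $_->{address}\n" for @people;
```

Because every format uses the same capture names, the extraction loop does not need to know which format was selected.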


pagaltzis at gmx

May 19, 2008, 10:54 AM

Post #11 of 23 (215 views)
Permalink
Re: It's wafer thin! [In reply to]

* David Nicol <davidnicol[at]gmail.com> [2008-05-19 19:10]:
> The ranch example was contrived to demonstrate a situation
> where // makes it easier. Using a s/// would make it even
> better
>
> sub WordCounter($) { # takes string as argument, returns hash pairs
> my %ret;
> $string = shift;
> while ( $string =~ /(\w+)/ ){
> $ret{$1} = $string =~ s/\b$1\b//g
> };
> %ret
> }
>
> Hey your right, capture variables do seem like the better way to write
> that.

?

Uhm, I fail to see the point.

    sub WordCounter($) {
        my $string = shift;
        my %ret;
        ++$ret{$1} while $string =~ s/(\w+)//;
        %ret
    }

> But with the feature, we can be a little more structured,
> faster

I don’t see how it’s fast to reperform the match you just tried
with m// in a separate s///.

> sub RanchCounter($) { # takes ranch as argument, returns hash pairs
> my %ret;
> local *_ = shift;
> while ( /\b(pig)\b/i or /\b(horse)\b/i or/\b(cow)\b/i or
> /\b(chicken)\b/i or /\b(turtle)\b/i){
> $ret{lc $1} = s///g
> };
> %ret
> }
>
> yes, the problem is contrived, but doing it that way is about
> as readable and debatable on comparative awkwardness versus the
> equivalent approach with $1 in the second regex.
>
> sub RanchCounter2($) { # takes ranch as argument, returns hash pairs
> my %ret;
> local *_ = shift;
> while ( /\b(pig|horse|cow|chicken|turtle)\b/i){
> $ret{lc $1} = s/\b$1\b//gi
> };
> %ret
> }

I still don’t understand why you want two regexes.

    sub RanchCounter2($) { # takes ranch as argument, returns hash pairs
        local *_ = shift;
        my %ret;
        for my $animal ( qw/ pig horse cow chicken turtle / ) {
            $ret{ $animal } = s/\b$animal\b//gi;
        }
        %ret
    }

Personally I continue to see no real value in keeping magic empty
match in the presence of qr//. With that, you can write a direct
equivalent for any code that might utilize the empty pattern:

    sub RanchCounter($) { # takes ranch as argument, returns hash pairs
        my %ret;
        local *_ = shift;
        for my $rx ( qr/\b(pig)\b/i, qr/\b(horse)\b/i, qr/\b(cow)\b/i,
                     qr/\b(chicken)\b/i, qr/\b(turtle)\b/i ) {
            next if not /$rx/;
            $ret{lc $1} = s/$rx//g;
        }
        %ret
    }

Note that my goal here was to stay very close to your original
formulation, not write it in a clean way; the point is to show
that you can faithfully emulate any use of magic empty pattern
with the aid of qr//, should you want to for some inexplicable
reason.
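
As a runnable illustration of the same qr// emulation, here is a sketch that works on a lexical copy of the string rather than on C<$_>. The sub name, the shortened animal list and the sample input are assumptions made for the example.

```perl
use strict;
use warnings;

# Count and strip each known animal using precompiled patterns,
# with no reliance on the empty pattern.
sub ranch_counter {
    my ($ranch) = @_;                 # work on a private copy
    my %count;
    for my $rx (qr/\b(pig)\b/i, qr/\b(horse)\b/i, qr/\b(cow)\b/i) {
        next unless $ranch =~ $rx;    # sets $1 on success
        my $animal = lc $1;           # save it before the next regex runs
        # s///g returns the number of substitutions made.
        $count{$animal} = ($ranch =~ s/$rx//g);
    }
    return %count;
}

my %count = ranch_counter("Pig, cow, pig and a PIG");
print "$_=$count{$_}\n" for sort keys %count;
```

Saving C<$1> into a lexical before the substitution keeps the hash key independent of whatever the later regex does to the capture variables.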

It’s more cumbersome than with magic empty pattern, of course,
but that seems like a feature.

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>


davidnicol at gmail

May 19, 2008, 12:01 PM

Post #12 of 23 (216 views)
Permalink
Re: It's wafer thin! [In reply to]

On Mon, May 19, 2008 at 12:54 PM, Aristotle Pagaltzis <pagaltzis[at]gmx.de> wrote:

> Uhm, I fail to see the point.
>
> sub WordCounter($) {
> my $string = shift;
> my %ret;
> ++$ret{$1} while $string =~ s/(\w+)//;
> %ret
> }


You're right, that's a better word counter.

>> But with the feature, we can be a little more structured,
>> faster
>
> I don't see how it's fast to reperform the match you just tried
> with m// in a separate s///.

faster to write than separating the list of possible animals and then
constructing the list of regexen from it. Or not, as you wrote it
just as concisely and proved me wrong.

> I still don't understand why you want two regexes.

because I was trying to contrive a demonstration of how // ( a feature
I don't actually ever use) works

> Personally I continue to see no real value in keeping magic empty
> match in the presence of qr//. With that, you can write a direct
> equivalent for any code that might utilize the empty pattern:
>
> sub RanchCounter($) { # takes ranch as argument, returns hash pairs
> my %ret;
> local *_ = shift;
> for my $rx ( qr/\b(pig)\b/i, qr/\b(horse)\b/i, qr/\b(cow)\b/i,
> qr/\b(chicken)\b/i, qr/\b(turtle)\b/i ) {
> next if not /$rx/;
> $ret{lc $1} = s/$rx//g;
> };
> %ret
> }
>
> Note that my goal here was to stay very close to your original
> formulation, not write it in a clean way; the point is to show
> that you can faithfully emulate any use of magic empty pattern
> with the aid of qr//, should you want to for some inexplicable
> reason.

yes, that is the same conclusion reached while writing the message you
replied to.

> It's more cumbersome than with magic empty pattern, of course,
> but that seems like a feature.
>
> Regards,
> --
> Aristotle Pagaltzis // <http://plasmasturm.org/>

so magic empty pattern springs from a past time when qr// was not
available, and should be safe to deprecate.

Hopefully highlighting the existing paragraph in the documentation
with a paragraph title (the only part of my patch RGS applied; there
is no painfully contrived example in blead crying for improvement)
will reduce the number of bug reports related to the magic empty
pattern surprising people.

Let's table further discussion of what contrived example to include in
the documentation until we find out if that works or not. If every
rarely-used feature in the documentation had its very own contrived
example, nobody could ever read it all the way through (as if anyone
does now.)


pagaltzis at gmx

May 19, 2008, 2:17 PM

Post #13 of 23 (207 views)
Permalink
Re: It's wafer thin! [In reply to]

* David Nicol <davidnicol[at]gmail.com> [2008-05-19 21:05]:
> On Mon, May 19, 2008 at 12:54 PM, Aristotle Pagaltzis <pagaltzis[at]gmx.de> wrote:
> > I still don't understand why you want two regexes.
>
> because I was trying to contrive a demonstration of how // ( a
> feature I don't actually ever use) works

I know. I meant that even after taking that into account, I
couldn’t see how using the empty pattern would be more desirable
(or at least, no less) than doing it in some other fashion.

I tried to contrive such an example myself, but failed. I was
unable to think of a single task in which I could use the empty
pattern where I didn’t also clearly prefer some other way of
achieving the same intent.

Hence my opinion that this should be deprecated.

> If every rarely-used feature in the documentation had its very
> own contrived example, nobody could ever read it all the way
> through (as if anyone does now.)

Yes, absolutely. I have a bit of a love/hate opinion of the
docs as they stand. There is way too much in there and it’s hard
to find things whose location you haven’t memorised, yet I can’t
really point at any one place and say “that bit is superfluous or
misplaced.” Which is, naturally, the very reason I feel that way.

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>


demerphq at gmail

May 19, 2008, 3:22 PM

Post #14 of 23 (207 views)
Permalink
Re: It's wafer thin! [In reply to]

2008/5/19 Aristotle Pagaltzis <pagaltzis[at]gmx.de>:
> * David Nicol <davidnicol[at]gmail.com> [2008-05-19 21:05]:
>> On Mon, May 19, 2008 at 12:54 PM, Aristotle Pagaltzis <pagaltzis[at]gmx.de> wrote:
>> > I still don't understand why you want two regexes.
>>
>> because I was trying to contrive a demonstration of how // ( a
>> feature I don't actually ever use) works
>
> I know. I meant that even after taking that into account, I
> couldn't see how using the empty pattern would be more desirable
> (or at least, no less) than doing it in some other fashion.
>
> I tried to contrive such an example myself, but failed. I was
> unable to think of a single task in which I could use the empty
> pattern where I didn't also clearly prefer some other way of
> achieving the same intent.

Erm, so what did you come up with for my example?

    if (/$pat1/ or /$pat2/ or /$pat3/) {
        $hash{$1}=$_;
        s///;
    }

Cheers,
yves



--
perl -Mre=debug -e "/just|another|perl|hacker/"


davidnicol at gmail

May 19, 2008, 3:42 PM

Post #15 of 23 (207 views)
Permalink
Re: It's wafer thin! [In reply to]

On Mon, May 19, 2008 at 5:22 PM, demerphq <demerphq[at]gmail.com> wrote:

> Erm, so what did you come up with for my example?
>
> if (/$pat1/ or /$pat2/ or /$pat3/) {
> $hash{$1}=$_;
> s///;
> }
>
> Cheers,
> yves

After deprecating // this could become the slightly expanded

    for my $p ($pat1, $pat2, $pat3) {
        /$p/ or next;
        $hash{$1} = $_;
        s/$p//;
        last
    }


ben at morrow

May 19, 2008, 3:54 PM

Post #16 of 23 (207 views)
Permalink
Re: It's wafer thin! [In reply to]

Quoth demerphq[at]gmail.com (demerphq):
>
> Erm, so what did you come up with for my example?
>
> if (/$pat1/ or /$pat2/ or /$pat3/) {
> $hash{$1}=$_;
> s///;
> }

How doable would

    if (/$pat1/ or /$pat2/ or /$pat3/) {
        $hash{$1} = $_;
        $& = ...;
    }

be? Of course, we would probably want to use ${^MATCH} instead, to avoid
keeping the source string around along with $&.

Ben

--
I've seen things you people wouldn't believe: attack ships on fire off
the shoulder of Orion; I watched C-beams glitter in the dark near the
Tannhauser Gate. All these moments will be lost, in time, like tears in rain.
Time to die. ben[at]morrow.me.uk


abigail at abigail

May 19, 2008, 4:22 PM

Post #17 of 23 (207 views)
Permalink
Re: It's wafer thin! [In reply to]

On Tue, May 20, 2008 at 12:22:00AM +0200, demerphq wrote:
> 2008/5/19 Aristotle Pagaltzis <pagaltzis[at]gmx.de>:
> > * David Nicol <davidnicol[at]gmail.com> [2008-05-19 21:05]:
> >> On Mon, May 19, 2008 at 12:54 PM, Aristotle Pagaltzis <pagaltzis[at]gmx.de> wrote:
> >> > I still don't understand why you want two regexes.
> >>
> >> because I was trying to contrive a demonstration of how // ( a
> >> feature I don't actually ever use) works
> >
> > I know. I meant that even after taking that into account, I
> > couldn't see how using the empty pattern would be more desirable
> > (or at least, no less) than doing it in some other fashion.
> >
> > I tried to contrive such an example myself, but failed. I was
> > unable to think of a single task in which I could use the empty
> > pattern where I didn't also clearly prefer some other way of
> > achieving the same intent.
>
> Erm, so what did you come up with for my example?
>
> if (/$pat1/ or /$pat2/ or /$pat3/) {
> $hash{$1}=$_;
> s///;
> }

I'd write that as:

    my $copy = $_;
    $hash {$1} = $copy if s/$pat1// or
                          s/$pat2// or
                          s/$pat3//;



Abigail


pagaltzis at gmx

May 20, 2008, 6:09 AM

Post #18 of 23 (186 views)
Permalink
Re: It's wafer thin! [In reply to]

* Abigail <abigail[at]abigail.be> [2008-05-20 01:25]:
> On Tue, May 20, 2008 at 12:22:00AM +0200, demerphq wrote:
> > 2008/5/19 Aristotle Pagaltzis <pagaltzis[at]gmx.de>:
> > > I tried to contrive such an example myself, but failed. I
> > > was unable to think of a single task in which I could use
> > > the empty pattern where I didn't also clearly prefer some
> > > other way of achieving the same intent.
> >
> > Erm, so what did you come up with for my example?
> >
> > if (/$pat1/ or /$pat2/ or /$pat3/) {
> > $hash{$1}=$_;
> > s///;
> > }
>
> I'd write that as:
>
> my $copy = $_;
> $hash {$1} = $copy if s/$pat1// or
> s/$pat2// or
> s/$pat3//;

Exactly. And if the behaviour of only creating a copy when a
match is known to exist is desired, empty pattern can be emulated
directly with qr//, as I said:

    my $str = \$_;
    my @rx = ( qr/$pat1/, qr/$pat2/, qr/$pat3/ );
    if ( my $m = List::Util::first { $$str =~ $_ } @rx ) {
        $hash{$1} = $_;
        s/$m//;
    }

Again, it takes more work, but I consider this a feature. (Less
so in this case than in my previous mail, but still.)

[[ This would be *trivial*, btw, if we had a flag to ask `s///`
to return a modified copy instead of modifying in situ. Assuming
it was called `/R` and returned undef on failure (which I would
advocate, as it’s very easy to check for this and use the
original string instead if that’s what you want (particularly
since 5.10)), this could be written thus:

    if ( my $cleaned = s!$pat1!!R // s!$pat2!!R // s!$pat3!!R ) {
        ( $hash{$1}, $_ ) = ( $_, $cleaned );
    }

I *far* prefer this over any other variant. It’s shorter and
expresses the entire intent directly, and yet it will still
allocate memory for a copy only if the substitution succeeds. And
not only is it cleaner, it’s also potentially much faster as it
will only ever attempt any one match once, regardless of success
or failure.

Unfortunately, it’s also not currently possible. :-) ]]

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>


demerphq at gmail

May 20, 2008, 6:23 AM

Post #19 of 23 (186 views)
Permalink
Re: It's wafer thin! [In reply to]

2008/5/20 Aristotle Pagaltzis <pagaltzis[at]gmx.de>:
> * Abigail <abigail[at]abigail.be> [2008-05-20 01:25]:
>> On Tue, May 20, 2008 at 12:22:00AM +0200, demerphq wrote:
>> > 2008/5/19 Aristotle Pagaltzis <pagaltzis[at]gmx.de>:
>> > > I tried to contrive such an example myself, but failed. I
>> > > was unable to think of a single task in which I could use
>> > > the empty pattern where I didn't also clearly prefer some
>> > > other way of achieving the same intent.
>> >
>> > Erm, so what did you come up with for my example?
>> >
>> > if (/$pat1/ or /$pat2/ or /$pat3/) {
>> > $hash{$1}=$_;
>> > s///;
>> > }
>>
>> I'd write that as:
>>
>> my $copy = $_;
>> $hash {$1} = $copy if s/$pat1// or
>> s/$pat2// or
>> s/$pat3//;
>
> Exactly. And if the behaviour of only creating a copy when a
> match is known to exist is desired, empty pattern can be emulated
> directly with qr//, as I said:
>
> my $str = \$_;
> my @rx = ( qr/$pat1/, qr/$pat2/, qr/$pat3/ );
> if ( my $m = List::Util::first { $$str =~ $_ } @rx ) {
> $hash{$1} = $_;
> s/$m//;
> }
>
> Again, it takes more work, but I consider this a feature. (Less
> so in this case than in my previous mail, but still.)
>
> [[ This would be *trivial*, btw, if we had a flag to ask `s///`
> to return a modified copy instead of modifying in situ. Assuming
> it was called `/R` and returned undef on failure (which I would
> advocate, as it's very easy to check for this and use the
> original string instead if that's what you want (particularly
> since 5.10)), this could be written thus:
>
> if ( my $cleaned = s!$pat1!!R // s!$pat2!!R // s!$pat3!!R ) {
> ( $hash{$1}, $_ ) = ( $_, $cleaned );
> }
>
> I *far* prefer this over any other variant. It's shorter and
> expresses the entire intent directly, and yet it will still
> allocate memory for a copy only if the substitution succeeds. And
> not only is it cleaner, it's also potentially much faster as it
> will only ever attempt any one match once, regardless of success
> or failure.

I like this idea for s///R

yves



--
perl -Mre=debug -e "/just|another|perl|hacker/"


abigail at abigail

May 20, 2008, 6:39 AM

Post #20 of 23 (182 views)
Permalink
Re: It's wafer thin! [In reply to]

On Tue, May 20, 2008 at 03:09:58PM +0200, Aristotle Pagaltzis wrote:
> * Abigail <abigail[at]abigail.be> [2008-05-20 01:25]:
> > On Tue, May 20, 2008 at 12:22:00AM +0200, demerphq wrote:
> > > 2008/5/19 Aristotle Pagaltzis <pagaltzis[at]gmx.de>:
> > > > I tried to contrive such an example myself, but failed. I
> > > > was unable to think of a single task in which I could use
> > > > the empty pattern where I didn't also clearly prefer some
> > > > other way of achieving the same intent.
> > >
> > > Erm, so what did you come up with for my example?
> > >
> > > if (/$pat1/ or /$pat2/ or /$pat3/) {
> > > $hash{$1}=$_;
> > > s///;
> > > }
> >
> > I'd write that as:
> >
> > my $copy = $_;
> > $hash {$1} = $copy if s/$pat1// or
> > s/$pat2// or
> > s/$pat3//;
>
> Exactly. And if the behaviour of only creating a copy when a
> match is known to exist is desired, empty pattern can be emulated
> directly with qr//, as I said:
>
> my $str = \$_;
> my @rx = ( qr/$pat1/, qr/$pat2/, qr/$pat3/ );
> if ( my $m = List::Util::first { $$str =~ $_ } @rx ) {
> $hash{$1} = $_;
> s/$m//;
> }
>
> Again, it takes more work, but I consider this a feature. (Less
> so in this case than in my previous mail, but still.)

My version has the disadvantage of making a potentially unnecessary copy,
but it has the advantage of not running the same pattern twice (and hence
eliminating the need for m//). Which is more efficient will depend on the
size of the string and the complexity of the pattern.

I guess the following neither makes an unnecessary copy nor
runs a pattern twice:

    if (/$pat1/ or /$pat2/ or /$pat3/) {
        $hash {$1} = $_;
        substr $hash {$1}, $- [0], $+ [0] - $- [0], "";
    }
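
A self-contained sketch of the offset-based technique: C<@-> and C<@+> hold the start and end offsets of the last successful match, so the matched span can be cut out with four-argument substr without re-running any pattern. Note one assumption: to mirror the earlier examples, which store the original string and strip the match from C<$_>, this sketch applies the cut to C<$_> rather than to the hash entry. The patterns and input are hypothetical.

```perl
use strict;
use warnings;

my %hash;
my ($pat1, $pat2, $pat3) = (qr/(foo)\d+/, qr/(bar)\d+/, qr/(baz)\d+/);

my $line = "keep bar42 this";   # hypothetical input
for ($line) {                   # alias $_ to $line
    if (/$pat1/ or /$pat2/ or /$pat3/) {
        my $key = $1;
        $hash{$key} = $_;       # remember the original string
        # $-[0] and $+[0] are the start and end offsets of the whole match.
        substr($_, $-[0], $+[0] - $-[0], "");
    }
}

print "$line\n";
```

Each pattern is attempted at most once, and the only copy made is the one stored in the hash.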

> [.[. This would be *trivial*, btw, if we had a flag to ask `s///`
> to return a modified copy instead of modifying in situ. Assuming
> it was called `/R` and returned undef on failure (which I would
> advocate, as it's very easy to check for this and use the
> original string instead if that's what you want (particularly
> since 5.10)), this could be written thus:
>
> if ( my $cleaned = s!$pat1!!R // s!$pat2!!R // s!$pat3!!R ) {
> ( $hash{$1}, $_ ) = ( $_, $cleaned );
> }
>
> I *far* prefer this over any other variant. It's shorter and
> expresses the entire intent directly, and yet it will still
> allocate memory for a copy only if the substitution succeeds. And
> not only is it cleaner, it's also potentially much faster as it
> will only ever attempt any one match once, regardless of success
> or failure.


It may be shorter, but as it is written, it's broken; it will fail if one
of the patterns matches the entire string - or the entire string except
for a leading or trailing 0. You ought to write it as:

    if (defined (my $cleaned = s!$pat1!!R // s!$pat2!!R // s!$pat3!!R)) {
        ($hash {$1}, $_) = ($_, $cleaned);
    }

or

    {
        my $cleaned = s!$pat1!!R // s!$pat2!!R // s!$pat3!!R // last;
        ($hash {$1}, $_) = ($_, $cleaned);
    }


neither of which is that nice.
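
To see why the plain truth test is broken, here is a sketch that emulates the proposed /R with an ordinary sub. `subst_copy` is a hypothetical stand-in (the /R flag is only a proposal in this thread); it returns the modified copy on success and undef on failure, as suggested above. It needs Perl 5.10 or later for the `//` operator.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed s///R: returns a modified copy
# on success, undef when the pattern does not match.
sub subst_copy {
    my ($string, $pat) = @_;
    my $copy = $string;
    return $copy =~ s/$pat// ? $copy : undef;
}

# A result of "" or "0" is defined but false, so only a defined() test
# tells a successful substitution apart from a failed one.
for my $input ("pig", "0pig", "big pig") {
    my $cleaned = subst_copy($input, qr/sheep/) // subst_copy($input, qr/pig/);
    printf "%-9s truthy=%d defined=%d\n",
        "'$input'", ($cleaned ? 1 : 0), (defined $cleaned ? 1 : 0);
}
```

For the first two inputs the substitution succeeds, yet the result is false in boolean context, which is exactly the case the defined() wrapper guards against.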


And then there's this, where s///R doesn't help:

    for (my $i = 0; $i < @array1; $i ++) {
        say "Match on index $i" if $array1 [$i] =~ /pat/ && $array2 [$i] =~ //;
    }

Of course, here the only gain of m// is the number of characters typed (which
is mostly useful for the command line and short scripts).
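
A small sketch of the empty-pattern reuse in the loop above, assuming the behaviour described in perlop (an empty pattern stands for the last successfully matched regular expression). `both_match` and the sample strings are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: the second match uses the empty pattern, which
# Perl replaces with the last pattern that matched successfully (/pat/).
# The && short-circuits, so // is only reached after /pat/ has matched.
sub both_match {
    my ($first, $second) = @_;
    return ($first =~ /pat/ && $second =~ //) ? 1 : 0;
}

my @array1 = qw(pattern spat foo);
my @array2 = qw(patio   bar  pat);
for my $i (0 .. $#array1) {
    print "Match on index $i\n" if both_match($array1[$i], $array2[$i]);
}
```

If // were a genuine empty pattern it would match "bar" as well; here it does not, because /pat/ is what actually gets run.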


Abigail


abigail at abigail

May 20, 2008, 6:44 AM

Post #21 of 23 (182 views)
Re: It's wafer thin! [In reply to]

On Tue, May 20, 2008 at 03:23:52PM +0200, demerphq wrote:
>
> I like this idea for s///R


Yes, and I forgot to mention that in my previous reply. Well, I did write
it, then rewrote the paragraph it was in, and it fell through the cracks.

This idiom is common:

(my $mod = $orig) =~ s/pat/replacement/;

with the suggested /R modifier, it could be written as:

my $mod = $orig =~ s/pat/replacement/R;


doesn't save much in number of keystrokes, but, IMO, it is more readable.



Abigail


demerphq at gmail

May 20, 2008, 6:55 AM

Post #22 of 23 (182 views)
Re: It's wafer thin! [In reply to]

2008/5/20 Abigail <abigail[at]abigail.be>:
> On Tue, May 20, 2008 at 03:23:52PM +0200, demerphq wrote:
>>
>> I like this idea for s///R
>
>
> Yes, and I forgot to mention that in my previous reply. Well, I did write
> it, then rewrote the paragraph it was in, and it fell through the cracks.
>
> This idiom is common:
>
> (my $mod = $orig) =~ s/pat/replacement/;
>
> with the suggested /R modifier, it could be written as:
>
> my $mod = $orig =~ s/pat/replacement/R;
>
>
> doesn't save much in number of keystrokes, but, IMO, it is more readable.

I was liking it especially because it would make it possible to write:

@changed= map { s/$pat/$something/R } @list;

which is really ugly to do currently.

@changed= map { (my $x= $_)=~s/$pat/$something/; $x } @list;
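
For reference, a runnable sketch of the copy-then-substitute form inside map. The sub name `changed_copies` and the sample data are hypothetical; the point is only that the copy keeps s/// from modifying the original list through the `$_` alias.

```perl
use strict;
use warnings;

# Hypothetical helper: return modified copies, leaving the input alone.
# Each element is copied into $x first; s/// then works on the copy,
# and $x (not the return value of s///) is what map collects.
sub changed_copies {
    my ($pat, $replacement, @list) = @_;
    return map { (my $x = $_) =~ s/$pat/$replacement/; $x } @list;
}

my @list    = ("red pig", "blue cow", "green pig");
my @changed = changed_copies(qr/pig/, "sheep", @list);

print "original: @list\n";
print "changed:  @changed\n";
```

Perl 5.14 and later provide a non-destructive /r flag for s///, though it returns the original string rather than undef when nothing matches.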

Yves


--
perl -Mre=debug -e "/just|another|perl|hacker/"


pagaltzis at gmx

May 20, 2008, 8:10 AM

Post #23 of 23 (179 views)
Re: It's wafer thin! [In reply to]

* Abigail <abigail[at]abigail.be> [2008-05-20 15:45]:
> It may be shorter, but as it is written, it's broken; it will
> fail if one of the patterns matches the entire string - or the
> entire string except for a leading or trailing 0. You ought to
> write it as:
>
> if (defined (my $cleaned = s!$pat1!!R // s!$pat2!!R // s!$pat3!!R)) {
> ($hash {$1}, $_) = ($_, $cleaned);
> }

Oops. Good point.

Still, keeping an unmodified copy and then updating `$_` is
implicit in the sequence of operations in the empty-pattern
version, whereas it’s explicit with this approach. So even
though your correct version is a little less nice than the
broken version I first wrote, I like it better than using the
empty pattern.

> And then there's this, where s///R doesn't help:
>
> for (my $i = 0; $i < @array1; $i ++) {
> say "Match on index $i" if $array1 [$i] =~ /pat/ && $array2 [$i] =~ //;
> }
>
> Of course, here the only gain of m// is the number of characters
> typed (which is mostly useful for the command line and short
> scripts).

Again, qr// works just as well.

    my $rx = qr/pat/;
    for (my $i = 0; $i < @array1; $i ++) {
        say "Match on index $i" if $array1 [$i] =~ /$rx/ && $array2 [$i] =~ /$rx/;
    }

It doesn’t save as many keystrokes, but it does remove the
repetition of the pattern. Or in this particular case,

    for (my $i = 0; $i < @array1; $i ++) {
        say "Match on index $i" if not grep $_ !~ /$rx/, $array1 [$i], $array2 [$i];
    }

But you know that. Of course the double negation can be a
mindbender, so I’d prefer List::MoreUtils here:

    for (my $i = 0; $i < @array1; $i ++) {
        say "Match on index $i" if all { /$rx/ } $array1 [$i], $array2 [$i];
    }

That’s slightly shorter than the empty-pattern version, even.
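
A self-contained sketch of the qr// plus all variant. One assumption to flag: it imports `all` from List::Util (which provides it from version 1.33 on) instead of List::MoreUtils, purely to avoid a non-core dependency; the sub name `matching_indices` and the sample arrays are made up.

```perl
use strict;
use warnings;
use List::Util qw(all);    # assumption: List::Util 1.33 or later

# Return the indices at which both arrays match the same compiled pattern.
sub matching_indices {
    my ($rx, $array1, $array2) = @_;
    return grep {
        my $i = $_;    # keep the index; all() rebinds $_ per element
        all { /$rx/ } $array1->[$i], $array2->[$i];
    } 0 .. $#$array1;
}

my $rx   = qr/pat/;
my @hits = matching_indices($rx, [qw(pattern foo spat)], [qw(patio pat bar)]);
print "Match on index $_\n" for @hits;
```

The pattern is compiled once and named once, so nothing depends on which regex happened to match last.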

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>
