
nick at ccl4
Nov 26, 2009, 3:41 AM
Re: [PATCH] Functional substitution (s///f)

On Mon, Nov 23, 2009 at 12:16:59PM -0800, David Caldwell wrote:
> On 11/23/09 11:38 AM -0500 jesse wrote:
> >On Mon, Nov 23, 2009 at 04:31:23PM +0000, Zefram wrote:
> >>David Caldwell wrote:
> >>> So, I'm looking for comments and hopefully an indication of what it
> >>> takes to get this into perl.
> >>
> >>Unfortunately you just missed the feature cutoff for 5.12, so this is a
> >>bad time for a feature patch. But basically the process is what you've
> >>just done: mailing a patch to perl5-porters. The patch would have to
> >>include documentation to be fully acceptable.
> >
> >It would also probably want tests that include some more advanced/insane
> >regex features and tests that "prove" what it should or shouldn't be
> >doing with captures.
>
> I didn't include any tests for captures because the patch didn't touch
> anything that would affect captures. It basically copies the target scalar
> before anything is done and operates on the copy instead of the original.
> Then it changes the return value to return the target scalar instead of the
> number of matches. All the middle subst regexp stuff is untouched.

Right. But what I think is important is to make sure that there isn't any
part of the behaviour that *relies* on this implementation. Because it would
be useful in future to have the possibility to optimise the implementation.
Right now there is an explicit, up-front copy, which then gets shuffled
around in-place every time a length-changing substitution is made. It would
be nice to be able to change to having the matching part of the regexp
engine walk the original (read-only), only copying chunks across that are
unchanged, and writing substitutions out directly. But we'd only have the
freedom to do this if we have all the corner cases covered now, to make sure
that there isn't a way to "see through" this and spot the difference, at the
point when /f is first introduced.
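The concern about relying on the implementation can be made concrete with a small pure-Perl sketch. Neither helper below is part of the patch, and both names, `subst_copy` and `subst_by_walking`, are hypothetical. `subst_copy` imitates the patch as David describes it (copy the target first, substitute on the copy, return the copy instead of the match count), while `subst_by_walking` imitates the read-only strategy: it walks the original with `m//g`, copying unchanged chunks across and appending each replacement to a fresh buffer. If replacement code cannot tell the two apart, the optimisation stays open. The callbacks reuse the `length` and `$'` expressions from the examples in this message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the patch): mimics the proposed s///f as
# described above. Copy the target up front, run an ordinary s///ge on the
# copy, and return the modified copy rather than the number of matches.
sub subst_copy {
    my ($target, $re, $replace) = @_;
    my $copy = $target;               # explicit, up-front copy
    $copy =~ s/$re/$replace->()/ge;   # the caller's string is never written to
    return $copy;
}

# Hypothetical helper: the alternative strategy. The original is only read;
# unchanged chunks are copied across and each replacement is appended
# directly to a new buffer. Zero-length matches and other corner cases of
# s///g have not been checked against this sketch.
sub subst_by_walking {
    my ($orig, $re, $replace) = @_;
    my $out  = '';
    my $last = 0;                     # offset just past the previous match
    while ($orig =~ /$re/g) {
        my ($start, $end) = ($-[0], $+[0]);
        $out .= substr($orig, $last, $start - $last);   # unchanged chunk
        $out .= $replace->();         # $' here is the rest of the original
        $last = $end;
    }
    $out .= substr($orig, $last);     # trailing unchanged chunk
    return $out;
}

my $str = 'abcbcdcde';
for my $impl (\&subst_copy, \&subst_by_walking) {
    # The replacement sees the unchanged length of the whole string (9).
    print $impl->($str, qr/c/, sub { 2 ** length $str }), "\n";
    # The replacement sees the remainder of the original after each match.
    print $impl->($str, qr/c/, sub { 2 ** length $' }), "\n";
}
print "$str\n";                       # the original is left untouched
```

Run as written, each helper is expected to print ab512b512d512de followed by ab64b16d4de, and the final line shows the original string unchanged. In the walking version the existing behaviour of `$'` comes for free, because every match is made against the untouched original.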
However, I can't (yet) find a way to "spot" this, as it seems that

1: the variable that one is matching isn't "changed" until the end of the
   match, so that actions such as C<length> on it don't alter midway:

    $ ./perl -lw
    $_ = 'abcbcdcde';
    s/c/2 ** length $_/ge;
    print $_;
    __END__
    ab512b512d512de

2: variables such as C<$'>, which *would* differ depending on the
   implementation above, can still be given the existing behaviour, by
   making them (continue to) track the remainder of the original.

    $ ./perl -lw
    $_ = 'abcbcdcde';
    s/c/2 ** length $'/ge;
    print $_;
    __END__
    ab64b16d4de

Nicholas Clark