
nospam-abuse at bloodgate
Nov 22, 2006, 11:59 AM
Post #12 of 32
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Moin,

offlist because I accidentally replied to myself, then resent it but forgot
to add p5p. So, repost to list:

On Wednesday 22 November 2006 19:23, you wrote:
> On 11/22/06, Tels <nospam-abuse[at]bloodgate.com> wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Moin,
> >
> > On Wednesday 22 November 2006 17:55, you wrote:
> > > On 11/22/06, Tels <nospam-abuse[at]bloodgate.com> wrote:
> > > > * I want to spur Yves into looking into why qr// is slower than it
> > > >   should be; the slowdown makes assembling regexps from dynamically
> > > >   created parts much less useful than it could be.
> > >
> > > It's because there is no easy way to merge a compiled regex into one
> > > that is being compiled. It's not impossible, it's just not easy. I don't
> > > see this happening in time for 5.10 if we want 5.10 coming out around
> >
> > Okay on the time frame. Just a small follow-up question for me to
> > understand:
> >
> > As far as I understood Dave's message,
> >
> >   $qr_foo = qr/foo/; $qr_foobar = qr/bar$qr_foo/;
> >
> > results in $qr_foobar merely containing:
> >
> >   (?-xism:bar(?-xism:foo))
> >
> > as a string. Why aren't the "inner" (?-xism:...) cut away when the
> > content of the xism doesn't contain any variables or is "simple" or
> > whatever? (I know that probably sounds much easier than it is :)
>
> They have to be wrapped in (?: ) so they are self-contained.
>
> Think about:
>
>   $p=qr/a|b|c/;
>   $x=qr/foo$p/;
>
> You don't want $x to be: fooa|b|c. You want it to be (?:foo(?:a|b|c)).

Good point. (That was exactly the point I was waiting for, since I couldn't
come up with an example.)

So

  qr/foo/   => foo
  qr/a|b/   => (?:a|b)
  qr/(foo)/ => (foo)

etc. Maybe a simple heuristic is that if it contains only [a-z0-9\[\]],
interpolate it as it is, otherwise interpolate it as "(?:" $qr ")", unless it
already has "(" at the start. Hm, but the regexp engine would surely know
much better which case could be handled in which way.
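The heuristic above can be sketched in a few lines of Perl. This is only an illustration of the rule as stated in the message, not what perl itself does; the helper name splice_fragment is made up for the example, and it works on pattern source strings rather than on compiled qr// objects.

```perl
use strict;
use warnings;

# Hypothetical helper: decide how a pattern source string gets spliced
# into a larger pattern, following the heuristic described in the message.
sub splice_fragment {
    my ($source) = @_;
    return $source if $source =~ /\A[a-z0-9\[\]]*\z/;  # "simple": splice as is
    return $source if $source =~ /\A\(/;                # already starts with "("
    return "(?:$source)";                               # otherwise isolate it
}

# The three cases listed in the message.
for my $source ('foo', 'a|b', '(foo)') {
    printf "%-6s => %s\n", $source, splice_fragment($source);
}

# Why the wrapper matters: without it, "|" splits the whole pattern.
my $naive   = 'foo' . 'a|b|c';                   # "fooa" or "b" or "c"
my $wrapped = 'foo' . splice_fragment('a|b|c');  # "foo", then one of a, b, c

for my $input ('foob', 'b') {
    printf "%-4s naive=%d wrapped=%d\n", $input,
        ($input =~ /\A(?:$naive)\z/   ? 1 : 0),
        ($input =~ /\A(?:$wrapped)\z/ ? 1 : 0);
}
```

Note that the 'starts with "("' test is only a rough shortcut: a source such as (a)|(b) also starts with "(" but would still need the wrapper, which is one reason the engine is better placed to make this decision.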
> > I want the result to be:
> >
> >   (?-xism:barfoo)
> >
> > and actually not a bunch of pointers to the little parts.
>
> That would be nice. I don't know if it's that easy to do. We have some
> of the information on hand to make it happen, though.
>
> > The reason is that it can become unwieldy, large and complicated:
> >
> >   perl -le '$f = qr/f/; for (0..3) { $f = qr/$f$f/; } print $f,"\n"'
> >
> > I have already written a parser that nests about 3 layers of multiple
> > qr// objects, and a simple debug printout of the resulting regexp starts
> > to spam the screen with xisms :-)
>
> Yeah, well, you could post-process the regex with a regex. :-) Remove
> all the unnecessary wrappers. Or you could use something like the
> tools in the latest version of the re module that allow you to get
> access to the pattern unwrapped.

Well, if I had some free time lying around here somewhere :/

> > Plus, every time you use that string to match
> > something, the regexp compiler seems to go crazy.
>
> In what way?

Parsing & processing all these (?-xism:) on every match takes time :) Not
much, but needless time anyway.

> > Maybe someone should write RegExp::Optimize::qr? :-D
>
> Heh.
>
> BTW, why offlist?
>
> Cheers,
> Yves

- --
 Signed on Wed Nov 22 20:54:46 2006 with key 0x93B84C15.
 Visit my photo gallery at http://bloodgate.com/photos/
 PGP key on http://bloodgate.com/tels.asc or per email.

 This email violates U.S. patent #5,978,791 <http://tinyurl.com/5t6ft>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iQEVAwUBRWSsEHcLPEOTuEwVAQLQyQf9HPq78BzY68q5xLdsF5tS499q0PWk+dB2
EW4k9DuFpWk59h86igOmfsDjJH3Fl/GPc6WiyIVmjEPr0GAB7OSHlCa6cIul4BGR
CF+0mCzmEPODGim9TCLhprtb19l7qfmiG4PT6zsP+qpeF3yVHianjjdHWfLfLRp9
MjO/96EulOI3/GRBfFdAA01SAAJDanqwbz9TPVIWUfQltq/9i8G10cTU3fi8XnMY
Q572lOfjM6TYLfAyAsTE1XcHJ5gfSOJGazgMKtt84JFViocIZ2WdnWXwfCp7GYAp
XoeI79qkJ4nEjItJUgDPmWovZcJz2TMQnu5PY4xKDhw4AmMeLlXxpg==
=gf83
-----END PGP SIGNATURE-----
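The "post-process the regex with a regex" idea from the thread could look roughly like the sketch below. It makes several assumptions: the helper name strip_plain_wrappers is invented for the example, only wrappers around plain letters and digits are removed, and the string is left untouched as soon as any group switches a flag on, since removing an inner wrapper could then change what the pattern matches. Newer perls stringify qr// with (?^:...) instead of (?-xism:...), so both spellings are handled.

```perl
use strict;
use warnings;

# Hypothetical helper: strip wrappers that carry no flags and enclose only
# literal letters and digits. The input is returned unchanged if any group
# switches a flag on, because dropping an inner wrapper could then change
# case sensitivity or other matching behaviour.
sub strip_plain_wrappers {
    my ($source) = @_;
    return $source if $source =~ /\(\?\^?[a-z]/;
    # (?^:...) is the default-flags wrapper on newer perls,
    # (?-xism:...) the one shown in the thread. A wrapper followed by a
    # quantifier is kept, since "(?:ab)+" and "ab+" differ.
    1 while $source =~ s/\(\?(?:\^|-xism):([A-Za-z0-9]*)\)(?![*+?{])/$1/g;
    return $source;
}

# The nesting experiment from the thread: each round doubles the pattern.
my $f = qr/f/;
$f = qr/$f$f/ for 0 .. 3;

my $long  = "$f";
my $short = strip_plain_wrappers($long);

print length($long), " characters before, ", length($short), " after\n";
print "$short\n";
```

The stripped string can be compiled again with qr/$short/ to get a single flat pattern, which is the outcome asked for in the thread, at least for this restricted class of patterns.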