
pm at flodhest
Sep 26, 2008, 4:11 PM
Post #2 of 4
On Sun, Sep 14, 2008 at 8:04 PM, Florian Zumbiehl <florz [at] florz> wrote:
> Hi,
>
> this email basically arose from a discussion on #catalyst/irc.perl.org
> where my (more or less) original question was for the format of the
> string that Regex actions do match against.
>
> As nobody really seemed to know the answer, it got into a discussion
> of basic URI semantics and finally kind of to the conclusion that
> the current implementation of Regex (at least) probably is broken.
> Part of that conclusion actually isn't from first-hand experience on
> my part, but rather from Sebastian Riedel's examination of the source
> of the current version, AFAICT - the Debian backport package (5.7006)
> I am using behaves differently. So, please forgive me, should this
> invalidate parts of the following.
>
> So, to finally get to the meat of it: According to sri's examination,
> Catalyst simply extracts the path component from the URI, but
> doesn't do any normalisation on it. This would mean that a request
> for http://bar/foo would have a different string being matched against
> the regexes than a request for http://bar/f%6fo . As those two URIs
> are mandated to be equivalent (to refer to the same resource) by the
> URI RFC (3986, 2.3), this kind of behaviour does make it pretty difficult
> to write standards-compliant software, as you'd have to match against
> ^(?:f|%66)(?:o|%6[fF]){2}$ for the example given above to meet the
> requirements.
>
> I've got no clue whether other action types may be affected by
> this, too.
>
> The behaviour I would consider sensible would be the normalisation
> of the path in such a way that any two URI paths that are mandated
> by the RFC to be equivalent will result in the exact same string,
> and any two URI paths that are not mandated by the RFC to be
> equivalent will result in different strings.
>
> IMO, in addition, as many characters as possible should be in
> unescaped form after normalisation. For the path alone, that
> would mean that only slashes in path components would really have
> to be escaped. I assume that also escaping the ASCII control range
> might be a good idea for security reasons with regard to use
> on syscall/shell interfaces. If it's supposed to be safe for direct
> injection into a URI, any other URI reserved characters probably
> should be escaped, too. But above all, I think the important
> thing is consistent, documented normalisation, independent of the
> engine.
>
> Well, I guess that this email is somewhat open-ended so far.
> But I don't really know what the next step should be - so, I'll
> leave it at that. Please don't flame me for it ;-)
>
> Florian
>
> _______________________________________________
> Catalyst-dev mailing list
> Catalyst-dev [at] lists
> http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst-dev
>

Not sure if I'm on the right track or not, but I think the
normalisation of the URL would be very good. I'm guessing the Regex
problem is connected with

    sub foo :Local { my ($self, $c, @args) = @_ }

...where @args might contain "f%6fo" instead of whatever was meant to
be there.

I haven't dug into the source myself, but would there be any issues
with making the path "sane" before it's actually handled in any way?

--
Best regards,
Jan Henning Thorsen
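
To make the idea of a "sane" path a bit more concrete, here is a
minimal sketch of the percent-encoding part of the normalisation
discussed above. It is not Catalyst code: the helper name
normalise_path_escapes is made up for illustration, and it assumes
the conservative rule from RFC 3986 (sections 2.3 and 6.2.2), i.e.
decode escapes only for unreserved characters and uppercase the hex
digits of every escape that remains. It does not remove dot segments
or handle anything beyond escapes.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Catalyst: normalise the
# percent-escapes in a URI path so that equivalent spellings
# such as "/foo" and "/f%6fo" end up as the same string.
sub normalise_path_escapes {
    my ($path) = @_;

    $path =~ s{%([0-9A-Fa-f]{2})}{
        my $hex  = $1;              # copy before running another match
        my $char = chr hex $hex;
        # Unreserved characters (RFC 3986, 2.3) are decoded; every
        # other escape is kept, with its hex digits uppercased.
        $char =~ /\A[A-Za-z0-9._~-]\z/ ? $char : '%' . uc $hex;
    }ge;

    return $path;
}

print normalise_path_escapes('/f%6fo'), "\n";    # prints /foo
print normalise_path_escapes('/a%2fb'), "\n";    # prints /a%2Fb
```

With something like this applied before dispatch, a Regex action
could match against ^/foo$ alone. Escaped slashes stay escaped, so
"/a%2Fb" is still distinguishable from "/a/b".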