Gossamer Forum: General: Perl Programming: regexp to decode url?

May 2, 1999, 9:48 PM

dahamsta

User (429 posts)

Post #1 of 9

regexp to decode url?

Hi folks,

Could somebody help me with a regular expression to decode a URL? It's for a navbar. I'm getting the current location using the REQUEST_URI environment variable, and now I want to split the string up into directories and put them into an array. The problem is that when I use "split", the first forward slash causes an empty value in the array, and I don't want the filename either.

For example, I would like to return an array like this:

@dirs = ("services", "design", "samples");

from all of these:

/services/design/samples
/services/design/samples/
/services/design/samples/index.htm

Any suggestions?

Thanks,
adam

May 3, 1999, 11:43 AM

Enthusiast (567 posts)

Post #2 of 9

Re: regexp to decode url?

try http://agora.leeds.ac.uk/nik/Perl/

hope this helps.....

May 3, 1999, 1:09 PM

dahamsta

User (429 posts)

Post #3 of 9

Re: regexp to decode url?

Weeeell, I was looking for a regexp to get rid of the first slash and the filename, but I guess maybe it's time I *did* delve into the complicated world of regexps and learn them for myself! :)

The link should have been http://agora.leeds.ac.uk/nik btw, but I found it fast enough, so thanks.

adam

May 4, 1999, 6:37 AM

Alex

Administrator (9387 posts)

Post #4 of 9

Re: regexp to decode url?

If you want to really decode a URL and get things like protocol, hostname, relative link, absolute link, etc, use the URI module available from CPAN.

If it's pretty simple and those are the only URLs, you could do:

$input =~ m,/([^/]+)/([^/]+)/([^/]+),;

and you'll have services, design and samples in $1, $2, $3. It will work with all three samples you provided but wouldn't work if the input didn't look like:

/something/something/something
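
As an aside, a match in list context returns its captures, so the three pieces can go straight into an array instead of being read out of $1, $2 and $3. A minimal sketch, assuming exactly three directory levels as in the samples (three_dirs is a hypothetical helper name, not something from the original post):

```perl
use strict;
use warnings;

# Hypothetical helper: returns the three captured directory names,
# or an empty list if the input has fewer than three levels.
sub three_dirs {
    my ($input) = @_;
    return $input =~ m,/([^/]+)/([^/]+)/([^/]+),;
}

for my $input ('/services/design/samples',
               '/services/design/samples/',
               '/services/design/samples/index.htm') {
    my @dirs = three_dirs($input);
    print "$input => @dirs\n";   # each line ends in: services design samples
}
```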

Cheers,

Alex

May 4, 1999, 11:56 PM

dahamsta

User (429 posts)

Post #5 of 9

Re: regexp to decode url?

Alex,

That's what the input *should* always look like, but I guess there'll always be anomalies. I'll have a look at the URI module as well.

Thanks,
adam

May 12, 1999, 1:54 PM

dahamsta

User (429 posts)

Post #6 of 9

Re: regexp to decode url?

Back to this one again. :)

Ok, I guess my real problem isn't putting it into an array, it's stripping off the filename or forward slash at the end if there is one.

So is it possible to get something like this:

dir1/dir2/dir3

from the three examples I gave above? The string wouldn't necessarily be three dirs though (Alex's example depended on that); it could be one or ten...

Cheers,
adam

May 12, 1999, 3:11 PM

Alex

Administrator (9387 posts)

Post #7 of 9

Re: regexp to decode url?

Sure:

Code:
$input =~ s,
            ^/?     # Find 0 or 1 leading slashes.
            (.+?)   # Store everything in the middle in $1
            /       # Followed by a slash.
            [^/]+$  # Followed by the file name and end of string.
          ,$1,x;    # Replace with just the middle.

would do the trick.

Cheers,

Alex

[This message has been edited by Alex (edited May 12, 1999).]

May 13, 1999, 12:02 AM

dahamsta

User (429 posts)

Post #8 of 9

Re: regexp to decode url?

Hi again Alex,

Sorry to be a bother, but that's still not getting the result I want. If there's a trailing slash, but no filename, it's leaving the leading and trailing slashes.

Also, I'm a bit worried that if the request_uri had no trailing slash, but was referring to a directory, that it would strip that off thinking it was a filename. Apache seems to rewrite the URI internally now, but I like to be sure.
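
Both cases are easy to reproduce. A minimal sketch, with the substitution from post #7 written on one line without the /x comments and wrapped in a hypothetical helper, strip_ends, so it can be run over the sample inputs:

```perl
use strict;
use warnings;

# Hypothetical helper: applies the post #7 substitution to a copy of
# the input and returns the result.
sub strip_ends {
    my ($input) = @_;
    $input =~ s,^/?(.+?)/[^/]+$,$1,;
    return $input;
}

print strip_ends('/services/design/samples/index.htm'), "\n";  # services/design/samples
print strip_ends('/services/design/samples/'), "\n";           # no match, so unchanged
print strip_ends('/services/design/samples'), "\n";            # services/design (last dir lost)
```

With a trailing slash there is nothing after the final "/" for [^/]+ to match, so the substitution never fires. Without one, the last directory is taken for the file name.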

So can the regex check to see if it's a valid filename (something-dot-something)? I thought it would be something like *\.* but that doesn't work.

Thanks again Alex,
adam

May 13, 1999, 5:30 AM

dahamsta

User (429 posts)

Post #9 of 9

Re: regexp to decode url?

Back again! :)

Ok, I think the easiest way to do it is to split the request_uri with the forward slash:

Code:
$uri = $ENV{REQUEST_URI}; 
@dirs = split("/",$uri);
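
For what it's worth, here's a quick sketch of what split hands back for the three sample URIs (hard-coded here rather than read from $ENV{REQUEST_URI}; @parts is just an illustrative name). Note that split keeps the empty field before the leading slash but drops trailing empty fields, so the trailing-slash case needs no extra handling:

```perl
use strict;
use warnings;

for my $uri ('/services/design/samples',
             '/services/design/samples/',
             '/services/design/samples/index.htm') {
    my @parts = split m{/}, $uri;
    # Bracket each field so the empty leading one is visible.
    print join(' ', map { "[$_]" } @parts), "\n";
}
# Prints:
# [] [services] [design] [samples]
# [] [services] [design] [samples]
# [] [services] [design] [samples] [index.htm]
```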

Now I can get rid of the empty first element (left by the leading forward slash) with shift:

Code:
shift(@dirs);

And I can get the last value like this:

Code:
$last_index = $#dirs;          # index of the last element (not the length)
$last = $dirs[$last_index];    # same as $dirs[-1]

So now all I have to do is check to see if $last is a valid filename, and if it is I can remove it with:

Code:
pop(@dirs);

Am I right so far? So now all I have left to do is actually CHECK for a filename. Filenames won't actually exist in this setup, I'm using mod_rewrite to send everything off to this script, so by rights the server will be looking for the first filename in DirectoryIndex, which in this case is "index.htm". However, for later scripts, it would be nice to check for any valid filename, excluding ones without an extension. As I said, I thought it would be *\.*, but that isn't valid (obviously now I come to look at it again!). So I looked at FileMan and then I reckoned:

m,^([A-Za-z0-9\-_.]\.[A-Za-z])$,)

...would do the job, but no. So any ideas what it would be?

I'll build me an indexable version of dbMan if it kills me! :)

adam
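
A possible answer to the last question, offered as a sketch rather than a definitive fix. In a regexp, * is a quantifier rather than a shell-style wildcard, which is why *\.* is not valid. In the second attempt neither character class has a quantifier, so each matches exactly one character and the pattern only accepts names like "a.b". Adding + after each class gives a "something dot something" test. The helper names looks_like_filename and dirs_from_uri below are hypothetical, the extension is assumed to be letters and digits, and the URI is assumed to carry no query string:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $name looks like "something.something".
sub looks_like_filename {
    my ($name) = @_;
    return $name =~ /^[A-Za-z0-9_.-]+\.[A-Za-z0-9]+$/ ? 1 : 0;
}

# Hypothetical helper: follows the split / shift / pop steps from post #9.
sub dirs_from_uri {
    my ($uri) = @_;
    my @dirs = split m{/}, $uri;
    shift @dirs if @dirs && $dirs[0] eq '';                  # empty field from the leading slash
    pop @dirs   if @dirs && looks_like_filename($dirs[-1]);  # drop a trailing file name
    return @dirs;
}

for my $uri ('/services/design/samples',
             '/services/design/samples/',
             '/services/design/samples/index.htm') {
    print join(', ', dirs_from_uri($uri)), "\n";   # services, design, samples
}
```

A directory whose name contains a dot (say v1.2) would be mistaken for a file by this test, so it's only as reliable as the site's naming habits.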