Gossamer Forum

Regex headache

Regex headache
Okay, admittedly I am not the greatest with regexes, but I should have been able to figure this out. In short, I am using LWP to download the images on a page and then having Image::Magick resize them. My problem comes when an image's URL is relative to the page URL, for instance:

The page I want to "spider" is at

http://www.domain.com/...s/that/whatever.html

The tags on that page look like this:

<img src="../image.gif">

Basically, I need a regex to take

http://www.domain.com/...s/that/whatever.html

and turn it into

http://www.domain.com/1/two/this/

so the image can be grabbed. Like I said, I am terrible with regexes, so this is probably something simple...
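A minimal sketch of the resolution step being asked for, in core Perl with no extra modules: strip the file name from the page URL, then remove one directory for each leading "../". The function name resolve_relative and the example.com URLs are hypothetical, and the sketch does not handle query strings, fragments or protocol-relative (//host/...) src values. If the URI module (which LWP itself depends on) is available, URI->new_abs($src, $page_url) handles the general case and is likely the more robust choice.

```perl
use strict;
use warnings;

# Hypothetical helper: resolve a relative src such as "../image.gif" against
# the URL of the page it appeared on, using only regexes (no CPAN modules).
# Not handled: query strings, fragments, protocol-relative "//host/..." srcs.
sub resolve_relative {
    my ($page_url, $src) = @_;

    # An absolute src (e.g. "http://...") is returned unchanged.
    return $src if $src =~ m{^[a-z][a-z0-9+.-]*://}i;

    # Split the page URL into "scheme://host[:port]" and its path.
    my ($origin, $path) = $page_url =~ m{^([a-z][a-z0-9+.-]*://[^/]+)(/.*)?$}i
        or die "Cannot parse URL: $page_url\n";
    $path = '/' unless defined $path;

    # A root-relative src ("/img/x.gif") only needs the origin.
    return $origin . $src if $src =~ m{^/};

    # Drop the file name but keep the trailing slash: /a/b/c/page.html -> /a/b/c/
    (my $dir = $path) =~ s{[^/]*$}{};

    # Each leading "../" removes one directory (never going above "/");
    # a leading "./" just stays in the current directory.
    while ($src =~ s{^(\.\.?)/}{}) {
        $dir =~ s{[^/]+/$}{} if $1 eq '..';
    }

    return $origin . $dir . $src;
}

# Hypothetical page URL containing <img src="../image.gif">
print resolve_relative('http://www.example.com/a/b/c/page.html', '../image.gif'), "\n";
# prints http://www.example.com/a/b/image.gif
```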
Re: [BennyHill] Regex headache
I'm a little confused about what you are trying to do. So you want to grab the HTML from a page and get the image name, then download that image and resize it locally on your server?

Cheers

Andy (mod)
andy@ultranerds.co.uk
Re: [BennyHill] Regex headache
$url =~ s|(.*/).*/.*\.html|$1|;

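Not sure if this is the best way, but it works because .* is greedy: the capture group grabs everything up to the second-to-last slash, which drops the last directory and the file name.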
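A quick check of the substitution above on a hypothetical URL. The greedy (.*/) first runs to the last slash, then backtracks one directory so that .*/ and .*\.html can still match; $1 is therefore the URL minus its last directory and file name. The second case shows the limits: the pattern assumes the page ends in .html and that the src climbs exactly one level (a single ../).

```perl
use strict;
use warnings;

# adrockjames's substitution applied to a hypothetical page URL.
my $url = 'http://www.example.com/a/b/c/page.html';
(my $base = $url) =~ s|(.*/).*/.*\.html|$1|;
print "$base\n";    # prints http://www.example.com/a/b/

# No ".html" in the URL, so nothing matches and the string is left as-is.
(my $other = 'http://www.example.com/a/b/c/page.php') =~ s|(.*/).*/.*\.html|$1|;
print "$other\n";   # prints http://www.example.com/a/b/c/page.php
```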
Re: [adrockjames] Regex headache
I'm not sure that's what he means. He wants to take the document URL and work out the full path from the relative path in order to find the image.
Re: [BennyHill] Regex headache
m{^http://([^/:]+)(:(\d+))?(/.*)?$}

Will get you this:
host is $1 # like mydomain.com
port is $3 # like 8080, or undefined when the URL has no port (80 is assumed)
path is $4 # like /root/relative/path

Not sure if that will help you, but you should be able to put some logic around what comes back in $4.
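A sketch of the corrected pattern in use, with the "logic around $4" filled in for the ../image.gif case. The sub name parse_http_url, the example.com URL and the hard-coded single "../" step are assumptions for illustration; the pattern only accepts plain http:// URLs without a user:password@ part, and a query string would end up inside $4.

```perl
use strict;
use warnings;

# Hypothetical helper built on the corrected pattern:
# $1 = host, $2 = ":port", $3 = port digits, $4 = path.
sub parse_http_url {
    my ($url) = @_;
    my ($host, undef, $port, $path) =
        $url =~ m{^http://([^/:]+)(:(\d+))?(/.*)?$}
        or return;                                  # not a plain http:// URL
    $port = 80  unless defined $port;               # no port given: assume 80
    $path = '/' unless defined $path;               # no path: the site root
    return ($host, $port, $path);
}

# Hypothetical page URL containing <img src="../image.gif">
my ($host, $port, $path) = parse_http_url('http://www.example.com/a/b/c/page.html');

# Logic around the path: drop the file name, then go up one directory for "../".
(my $dir = $path) =~ s{[^/]*$}{};                   # /a/b/c/
(my $up  = $dir)  =~ s{[^/]+/$}{};                  # /a/b/
my $port_part = $port == 80 ? '' : ":$port";
my $image_url = "http://$host$port_part${up}image.gif";
print "$image_url\n";    # prints http://www.example.com/a/b/image.gif
```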