Gossamer Forum

Grab actual URL...

GT's scripts have been causing problems again with grabbing other sites' URLs :P

I'm writing a spider plugin for LSQL, and I use the following to grab the HTML from the page:

my @html = get($_);

The problem I'm having is that some of the spidered pages (which I fetch to find new links) link to jump.cgi on their own domain. What I really need is the final URL that the spider gets sent to... anyone got any ideas? I'm stumped :(

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] Grab actual URL...
...you mean you want to follow the hyperlinks?

That's kind of the beauty of spiders... they are written to do that ;)

You need to grab all the links, fetch them, parse them, and so on, up to a certain number of levels deep, probably specified as an admin option.

You could do with using parallel fetching too.
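
The approach described above (collect the links, fetch each one, parse it, and stop at a set depth) might be sketched roughly as follows. This is only an illustration, not code from any Links SQL plugin: `extract_links`, `crawl`, and the `$fetch` callback are made-up names, and it assumes the HTML::LinkExtor and URI modules are installed. Fetching is left to the caller through `$fetch`.

```perl
use strict;
use warnings;
use HTML::LinkExtor;
use URI;

# Pull every <a href> out of a page and resolve it against the page's URL.
# (Hypothetical helper name.)
sub extract_links {
    my ($html, $base) = @_;
    my @links;
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @links, URI->new_abs($attr{href}, $base)->as_string
            if $tag eq 'a' && defined $attr{href};
    });
    $parser->parse($html);
    $parser->eof;
    return @links;
}

# Breadth-first crawl from $start, going at most $max_depth links away.
# $fetch is a code ref that takes a URL and returns its HTML, or undef
# on failure. Returns the URLs that were fetched, in visiting order.
sub crawl {
    my ($start, $max_depth, $fetch) = @_;
    my %seen  = ($start => 1);
    my @queue = ([$start, 0]);
    my @visited;
    while (my $item = shift @queue) {
        my ($url, $depth) = @$item;
        my $html = $fetch->($url);
        next unless defined $html;
        push @visited, $url;
        next if $depth >= $max_depth;    # deep enough, do not expand further
        for my $link (extract_links($html, $url)) {
            push @queue, [$link, $depth + 1] unless $seen{$link}++;
        }
    }
    return @visited;
}
```

With LWP::Simple loaded, something like `crawl($start_url, 2, \&get)` would walk two levels out from the start page; the depth would presumably come from the admin option.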
Re: [Paul] Grab actual URL...
Yes, I know that's what spiders are meant to do, which is why I'm trying to do it ;)

I still don't understand how you're suggesting I grab the actual URL rather than the jump.cgi one :-/

Cheers

Andy (mod)
Re: [Andy] Grab actual URL...
I don't get why you need to grab the real URL. If you spider a page and jump.cgi happens to be one of the hyperlinks, then so be it. It will still point to the right place.
Re: [Paul] Grab actual URL...
Yes, but it needs to be stored in the LSQL database as the actual URL, and not the jump.cgi one ;)
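
If jump.cgi answers with an ordinary HTTP redirect (a Location header), which is an assumption about the remote site, then LWP::UserAgent follows it automatically and the response object remembers the request that finally succeeded. LWP::Simple's `get` (assuming that is the `get` used above) only hands back the content, so the sketch below switches to LWP::UserAgent. `final_url` and `fetch_with_final_url` are made-up helper names, and this would not catch meta-refresh or JavaScript redirects.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Return the URL a response really came from. LWP keeps the request
# that produced each response, so after redirects this is the URI of
# the last request in the chain, not the one originally asked for.
sub final_url {
    my ($response) = @_;
    return $response->request->uri->as_string;
}

# Fetch $url with redirects followed. Returns (final URL, HTML) on
# success and an empty list on failure.
sub fetch_with_final_url {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    return unless $response->is_success;
    return (final_url($response), $response->decoded_content);
}

# Example use (commented out because it needs network access; the
# jump.cgi address is a placeholder):
#
#   my $ua = LWP::UserAgent->new(max_redirect => 7, timeout => 15);
#   my ($real_url, $html) =
#       fetch_with_final_url($ua, 'http://example.com/cgi-bin/jump.cgi?ID=1');
#   print "Store this one: $real_url\n" if defined $real_url;
```

The value returned as the final URL is what would go into the database in place of the jump.cgi link. If the intermediate hops matter, `$response->redirects` returns the redirect responses that were followed along the way.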

Andy (mod)
Re: [Andy] Grab actual URL...
Anyone? :-/

Andy (mod)