Gossamer Forum

Extract URL and its link title (Plz HELP)

Hi! Does anyone have any idea how to extract the URLs (of a certain type of file) from a web page and put them in a flat text database?

For example:
file.html containing the following contents:
<a href="../files/num_1.zip">1st one</a>
<a href="http://www.fff.com/files/num_2.zip">2nd one</a>

This should be processed and written to spidered.db as:
1st one|../files/num_1.zip
2nd one|http://www.fff.com/files/num_2.zip

I know this is kind of complicated. I tried, but nothing works :(
Thanks in advance!
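
The transformation described above (anchor tags in, `title|url` records out) can be sketched in plain Perl. This is only a rough sketch, not a tested spider: the helper name `extract_links` and the extra `about.html` link are made up for illustration, and a regex will miss unusual markup that a real HTML parser would handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return "title|url" records for every link whose
# href ends in the given file extension (for example "zip").
sub extract_links {
    my ($html, $ext) = @_;
    my @records;
    # Match <a ... href="url" ...>title</a>; quotes around the URL are optional.
    while ($html =~ m{<a\s[^>]*?href\s*=\s*["']?([^"'\s>]+)["']?[^>]*>(.*?)</a>}gis) {
        my ($link, $title) = ($1, $2);
        next unless $link =~ /\.\Q$ext\E$/i;   # keep only the wanted file type
        $title =~ s/<[^>]+>//g;                # drop any tags nested in the title
        $title =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
        push @records, "$title|$link";
    }
    return @records;
}

# Sample input taken from the post, plus one non-zip link (an assumption)
# to show the filter at work.
my $html = <<'HTML';
<a href="../files/num_1.zip">1st one</a>
<a href="http://www.fff.com/files/num_2.zip">2nd one</a>
<a href="about.html">About</a>
HTML

my @records = extract_links($html, 'zip');

# Append to the flat-file database, one record per line.
open(my $db, '>>', 'spidered.db') or die "Cannot open spidered.db: $!";
print {$db} "$_\n" for @records;
close($db);
```

With the sample input, only the two .zip links are kept, so spidered.db gains the two lines shown in the post.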

Re: Extract URL and its link title
Have you looked at the goFetch Modification? Have you read the Threads in this
forum about the goFetch Modification?

The goFetch Modification is located at:

http://lookhard.hypermart.net/links-mods/

Regards,

Eliot Lee

Re: Extract URL and its link title
Thanks!

But how do I modify goFetch so that it grabs the content between <a> and </a>, instead of fetching the address and getting the title of the page?

Thank you.

Re: Extract URL and its link title
The old spider did this. You just need to add something to it to save the results, and you shouldn't copy what I have in goFetch to do it.

Re: Extract URL and its link title
Does "the old spider" refer to the "Virtual Solutions Links Spider"?

What's wrong w/ this script?
This is the script, but it prints nothing. Any idea why?
#!perl
use LWP::Simple;
$URL = "http://www.perl.com";
$src = get($URL);
while ($src =~ m#<a\s+ href\s*=\s*"?([^"] ?)"?>(. ?)</a>#ig) {
($link, $title) = ($1, $2);
$output .= "$title|$link\n";
}
print "$output";

Re: What's wrong w/ this script?
The $title and $link variables seem to be undefined, meaning that nothing will print.

Regards,

Eliot Lee

Re: What's wrong w/ this script?
What should I do? Thanks for your help.
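
One possible explanation, offered as a guess rather than a confirmed diagnosis: as the pattern appears in the post, `[^"] ?` and `. ?` each match a single character followed by an optional space, and `\s+ href` demands extra whitespace before `href`, so the loop body never runs and `$output` stays empty. The sketch below compares the pattern as posted with a repaired version. The `+?` quantifiers and the sample link text are assumptions about what was intended.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input (an assumption) so the comparison runs without network access.
my $src = '<a href="http://www.perl.com/pub/">Perl articles</a>';

# Pattern as posted: each capture group can match only one character,
# and "\s+ href" needs whitespace followed by a literal space.
my $posted = qr{<a\s+ href\s*=\s*"?([^"] ?)"?>(. ?)</a>}i;

# Assumed intent: non-greedy "+?" quantifiers and no stray space.
my $repaired = qr{<a\s+href\s*=\s*"?([^"]+?)"?>(.+?)</a>}is;

my $posted_ok = ($src =~ $posted) ? 'yes' : 'no';

my $output = '';
while ($src =~ /$repaired/g) {
    my ($link, $title) = ($1, $2);
    $output .= "$title|$link\n";
}

print "posted pattern matches: $posted_ok\n";   # prints "no"
print $output;                                  # prints "Perl articles|http://www.perl.com/pub/"
```

Two other things may be worth checking in the original script. LWP::Simple's get() returns undef when the fetch fails, so testing `defined $src` before the loop separates a download problem from a pattern problem. The repaired pattern also still assumes that href is the first and only attribute of the tag, so links with other attributes would be skipped.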