Gossamer Forum

Grab a list of URL's from a file?

Hi. I have a file consisting of:

Code:
<LI><A href="http://www.aiuniv.edu/">American InterContinental
University - Georgia</A>
<LI><A href="http://www.andoncollege.com/">Andon College -
Modesto</A>
<LI><A href="http://www.andoncollege.org/">Andon College -
Stockton</A>
<LI><A href="http://www.antiochla.edu/">Antioch University Los
Angeles</A>
<LI><A href="http://www.antiochsb.edu/">Antioch University Santa
Barbara</A>
<LI><A href="http://www.armstrong-u.edu/">Armstrong University</A>

<LI><A href="http://www.artcenter.edu/">Art Center College of
Design</A>
<LI><A href="http://www.aisc.edu/">Art Institute of Southern
California</A>
<LI><A href="http://www.apu.edu/">Azusa Pacific University</A>
<LI><A href="http://www.bethany.edu/">Bethany College
California</A>
<LI><A href="http://www.biola.edu/">Biola University</A>
<LI><A href="http://www.brooks.edu/">Brooks Institute of
Photography</A>
<LI><A href="http://www.calbaptist.edu/">California Baptist
College</A>
<LI><A href="http://www.calcoast.edu/">California Coast
University</A>
<LI><A href="http://www.cchs.edu/">California College for Health
Sciences</A>
<LI><A href="http://www.ccac-art.edu/">California College of Arts
and Crafts</A>
<LI><A href="http://www.ccpm.edu/">California College of Podiatric
Medicine</A>
<LI><A href="http://www.ciis.edu/">California Institute of
Integral Studies</A>
<LI><A href="http://www.caltech.edu/">California Institute of
Technology</A>
<LI><A href="http://www.calarts.edu/">California Institute of the
Arts</A>
<LI><A href="http://www.callutheran.edu/">California Lutheran
University</A>
<LI><A href="http://www.csum.edu/">California Maritime Academy</A>

I am currently splitting this data into single lines, then looping over them with a foreach and using a regex match to grab the titles and URLs:

Code:
$line =~ m/\<A href=\"(.+?)\"\>(.+?)\<\/A\>/ and $url = $1, $title = $2;

The problem is that some of the links are split across separate lines, so a line-by-line match misses them. I need a way to grab all the URLs and put them into an array. I could use HTML::LinkExtor, *but* because the file is in the same folder as the script (a cgi-bin), it would give a 500 Internal Server Error, and the script wouldn't grab anything.

Is there some kind of regex I can use to grab multiple links/titles at the same time?

TIA

Andy (mod)
andy@ultranerds.co.uk


IMPORTANT: I've now moved to ultranerds.co.uk, and the .com will no longer work!
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package (plugins total "value" $3,325 & rising, for just $350)| GLinks ULTRA Package PRO (plugins total "value" $5,625 & rising, for just $500)
Support Forum | Links SQL Plugins | DMOZ Dumps | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Compare our different Plugin packages *new* Free CSS Templates

Re: [Andy] Grab a list of URL's from a file?
If you have the data in a scalar called $data:
Code:
my %urls;
$data =~ s!$/! !g;
# Each pass removes one href="URL">Title pair from $data and stores URL => title.
$urls{$1} = $2 while ($data =~ s/"([^"]+)">([^<]+)//);
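
For anyone who would rather leave the string intact, here is a minimal sketch of the same idea using a global match (m//g) in a while loop instead of a destructive s///. The sample data is just two entries copied from the list in the question, and the stricter pattern (anchoring on <A ... </A>) is my own assumption about the input, not something the original snippet relied on.

```perl
use strict;
use warnings;

# Sample input: two entries from the list in the question, wrapped the same way.
my $data = <<'HTML';
<LI><A href="http://www.aiuniv.edu/">American InterContinental
University - Georgia</A>
<LI><A href="http://www.apu.edu/">Azusa Pacific University</A>
HTML

my %urls;
# /g in a while loop visits every match without modifying $data;
# /i accepts <a href=...> as well as <A href=...>.
# [^<]+ also matches newlines, so wrapped titles are captured whole.
while ($data =~ m{<A\s+href="([^"]+)"\s*>([^<]+)</A>}gi) {
    my ($url, $title) = ($1, $2);
    $title =~ s/\s+/ /g;    # collapse the embedded line break to one space
    $urls{$url} = $title;
}

print "$_ => $urls{$_}\n" for sort keys %urls;
```

This is still a regex over HTML, so it only holds up while every link looks like the ones in the sample (double-quoted href, no nested tags inside the title).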

-g

s/(\d{2})/chr($1)/ge + print if $_ = '8284703280698276687967';

Re: [GClemmons] Grab a list of URL's from a file?
Thanks, worked great. I ended up using:

Code:
my @list;
$joined =~ s!$/! !g;
push(@list,"$1::$2") while ($joined =~ s/"([^"]+)">([^<]+)//);
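
Joining the two fields with "::" is fine as long as neither one can contain "::" itself. As a hedged alternative, the sketch below stores each link as a [url, title] pair in the array, which also keeps the file order. The sample data is two entries from the original list; the pair layout is just one possible choice.

```perl
use strict;
use warnings;

# Sample input: two entries from the list in the question.
my $joined = <<'HTML';
<LI><A href="http://www.biola.edu/">Biola University</A>
<LI><A href="http://www.brooks.edu/">Brooks Institute of
Photography</A>
HTML

my @list;
# Collect every href/title pair in file order, without modifying $joined.
while ($joined =~ m{href="([^"]+)"\s*>([^<]+)<}g) {
    my ($url, $title) = ($1, $2);
    $title =~ s/\s+/ /g;               # turn the wrapped line break into a space
    push @list, [ $url, $title ];      # one array ref per link
}

# Unpack each pair when it is needed.
for my $pair (@list) {
    my ($url, $title) = @$pair;
    print "$url\t$title\n";
}
```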

Out of interest, what does the 2nd line in your code do?

Cheers

Andy (mod)
andy@ultranerds.co.uk
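
Re: [Andy] Grab a list of URL's from a file?
$/ is the input record separator variable. It defaults to "\n", so that line takes out every newline in $data and the wrapped lines end up joined into one string before the match runs. It only matches whatever $/ currently holds, so it will not remove a stray \r on its own.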
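
A small sketch of what $/ controls, in case it helps. It reads from an in-memory filehandle so it runs without any file on disk; the two-line sample text and the variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Default: $/ is "\n", so <$fh> in list context returns one record per line.
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
my @lines = <$fh>;
close $fh;

# Slurp mode: with $/ undefined (local resets it to undef inside the block),
# a single read returns the whole input.
open $fh, '<', \$text or die "cannot open in-memory handle: $!";
my $data = do { local $/; <$fh> };
close $fh;

# Outside the do block $/ is "\n" again, so this swaps each newline for a space.
(my $one_line = $data) =~ s!$/! !g;

print scalar(@lines), " lines; joined: <", $one_line, ">\n";
```

The ! delimiters matter here: with s/.../.../ the slash in $/ would end the pattern early.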

-g