Gossamer Forum

Grab a list of URL's from a file?

Hi. I have a file containing:

Code:
<LI><A href="http://www.aiuniv.edu/">American InterContinental
University - Georgia</A>
<LI><A href="http://www.andoncollege.com/">Andon College -
Modesto</A>
<LI><A href="http://www.andoncollege.org/">Andon College -
Stockton</A>
<LI><A href="http://www.antiochla.edu/">Antioch University Los
Angeles</A>
<LI><A href="http://www.antiochsb.edu/">Antioch University Santa
Barbara</A>
<LI><A href="http://www.armstrong-u.edu/">Armstrong University</A>

<LI><A href="http://www.artcenter.edu/">Art Center College of
Design</A>
<LI><A href="http://www.aisc.edu/">Art Institute of Southern
California</A>
<LI><A href="http://www.apu.edu/">Azusa Pacific University</A>
<LI><A href="http://www.bethany.edu/">Bethany College
California</A>
<LI><A href="http://www.biola.edu/">Biola University</A>
<LI><A href="http://www.brooks.edu/">Brooks Institute of
Photography</A>
<LI><A href="http://www.calbaptist.edu/">California Baptist
College</A>
<LI><A href="http://www.calcoast.edu/">California Coast
University</A>
<LI><A href="http://www.cchs.edu/">California College for Health
Sciences</A>
<LI><A href="http://www.ccac-art.edu/">California College of Arts
and Crafts</A>
<LI><A href="http://www.ccpm.edu/">California College of Podiatric
Medicine</A>
<LI><A href="http://www.ciis.edu/">California Institute of
Integral Studies</A>
<LI><A href="http://www.caltech.edu/">California Institute of
Technology</A>
<LI><A href="http://www.calarts.edu/">California Institute of the
Arts</A>
<LI><A href="http://www.callutheran.edu/">California Lutheran
University</A>
<LI><A href="http://www.csum.edu/">California Maritime Academy</A>

I am currently splitting this data into single lines, then looping over them with foreach and using a regex match to grab the titles and URLs:

Code:
$line =~ m/\<A href=\"(.+?)\"\>(.+?)\<\/A\>/ and $url = $1, $title = $2;

Basically, because parts of the HTML are on separate lines, I need a way to grab the URLs and put them into an array. I could use HTML::LinkExtor, *but* because the file is in the same folder as the script (a cgi-bin), it would give a 500 Internal Server Error, and the script wouldn't grab anything.

Is there some kind of regex I can use to grab multiple links/titles at the same time?

TIA

Andy (mod)
andy@ultranerds.co.uk
Re: [Andy] Grab a list of URL's from a file?
If you have the data in a scalar called $data:
Code:
my %urls;
$data =~ s!$/!!g;
$urls{$1} = $2 while ($data =~ s/"([^"]+)">([^<]+)//);
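
One thing to watch with the snippet above: stripping the line breaks outright can glue words together (for example "Los" and "Angeles") if the lines in the file don't end in a space. As a rough alternative sketch, not a drop-in replacement, a global match can walk the links without modifying $data and turn each embedded line break into a single space. The sample data and the @links array here are made up for illustration:

```perl
use strict;
use warnings;

# Sample input in the same shape as the file in the question
# (some titles wrap onto a second line).
my $data = <<'HTML';
<LI><A href="http://www.aiuniv.edu/">American InterContinental
University - Georgia</A>
<LI><A href="http://www.apu.edu/">Azusa Pacific University</A>
HTML

my @links;
# /g steps through every match, /s lets . cross newlines, /i accepts <a> or <A>.
while ($data =~ m{<A\s+href="([^"]+)"\s*>(.*?)</A>}gis) {
    my ($url, $title) = ($1, $2);
    $title =~ s/\s+/ /g;     # collapse the embedded line break into one space
    $title =~ s/^ | $//g;    # trim a leading or trailing space
    push @links, [$url, $title];
}

print "$_->[0] => $_->[1]\n" for @links;
```

Each element of @links is a two-element array reference (URL, then title), so the order of the links in the file is preserved, which a hash would not guarantee.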

-g

s/(\d{2})/chr($1)/ge + print if $_ = '8284703280698276687967';
Re: [GClemmons] Grab a list of URL's from a file?
Thanks, worked great. I ended up using:

Code:
my @list;
$joined =~ s!$/!!g;
push(@list,"$1::$2") while ($joined =~ s/"([^"]+)">([^<]+)//);
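
For anyone reusing this, here is a minimal sketch of how the "url::title" strings might be taken apart again later. The two sample entries and the @pairs array are hypothetical, and it assumes the URLs themselves never contain "::":

```perl
use strict;
use warnings;

# Hypothetical entries in the "url::title" shape built by the push above.
my @list = (
    'http://www.apu.edu/::Azusa Pacific University',
    'http://www.biola.edu/::Biola University',
);

my @pairs;
for my $entry (@list) {
    # A limit of 2 keeps any "::" inside the title as part of the title.
    my ($url, $title) = split /::/, $entry, 2;
    push @pairs, { url => $url, title => $title };
}

print "$_->{title} <$_->{url}>\n" for @pairs;
```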

Out of interest, what does the 2nd line in your code do?

Cheers

Andy (mod)
Re: [Andy] Grab a list of URL's from a file?
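$/ is the input record separator variable. It defaults to "\n", so that line strips every newline from the data (it saves typing the newline out, but it won't catch a stray \r unless you set $/ yourself).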
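
To see what that second line does in practice, here is a small sketch. The sample string and variable names are made up for illustration, and it assumes $/ has not been changed from its default:

```perl
use strict;
use warnings;

# By default $/ holds a single newline.
my $default_is_newline = ($/ eq "\n") ? 1 : 0;

# Made-up sample with one Unix line ending and one Windows line ending.
my $data = "one\ntwo\r\nthree\n";

# Same substitution as in the thread, applied to a copy.
(my $joined = $data) =~ s!$/!!g;

# The newlines are gone, but the carriage return is still there.
my $has_cr = ($joined =~ /\r/) ? 1 : 0;

# Dropping both kinds of line-ending character explicitly.
(my $clean = $data) =~ s/[\r\n]+//g;

print "default separator is newline: $default_is_newline\n";
print "carriage return survives: $has_cr\n";
print "cleaned: $clean\n";
```

If the file might have Windows line endings, the explicit character class is the safer choice.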

-g
