Gossamer Forum

Get a line from a html file

How can I write a script that, when it finds "<a href" in an HTML file, copies the line up to the matching "</a>" into another HTML file?

Example:

open (FILE, "test.html");
@lines=<FILE>;
close (FILE);

foreach $line (@lines) {
    if ($line =~ /<a href/) {
        # if "<a href" is found, copy everything on the line up to "</a>" into $getline
        print "$getline"; # print the line here
    }
}

Re: Get a line from a html file
Are you just trying to get all the links out of a page? I would recommend looking at:

http://www.personal.u-net.com/.../console32/xurl.html

which is a Win95/NT exe file, or:

http://www.oasis.leo.org/...w/html/xurl.dsc.html

which is a Perl script (though you'll need LWP and HTML::Parse).

It'll be much easier than trying to catch all the different possibilities of links. You could try:

if ($line =~ /<a\s+href[^>]+>([^<]+)<\/a>/i) {
    print $1;
}

However, that can fail in a lot of cases, so it's better to use one of the programs above.
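
To make "can fail in a lot of cases" a bit more concrete, here is a minimal sketch that runs the pattern over a few made-up lines. The helper name link_text and the sample lines are only illustrations, not part of any script mentioned in this thread.

```perl
use strict;
use warnings;

# Return the link text captured by the pattern, or undef if it does not match.
sub link_text {
    my ($line) = @_;
    return $line =~ /<a\s+href[^>]+>([^<]+)<\/a>/i ? $1 : undef;
}

# Hypothetical sample lines, made up for illustration.
my @samples = (
    '<a href="a.html">Home</a>',              # simple case
    '<a href="b.html"><b>Bold</b></a>',       # markup inside the link text
    '<a class="nav" href="c.html">News</a>',  # another attribute before href
    '<a href="d.html">',                      # link continues on the next line
);

for my $line (@samples) {
    my $text = link_text($line);
    printf "%-40s %s\n", $line, defined $text ? "matched: $text" : "no match";
}
```

Only the first sample matches. The other three are ordinary HTML that the pattern misses, which is why a real HTML parser tends to be the safer choice.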

Cheers,

Alex
Re: Get a line from a html file
The script below prints only the text between the tags. How can I change it to print the whole link exactly?

if ($line =~ /<a\s+href[^>]+>([^<]+)<\/a>/i) {
    print $1;
}

Also, what do "^>" and "^<" mean? Could you please explain them to me?

Thank you very much!!

best regards,
kian
Re: Get a line from a html file
That's an easy fix, just move the () brackets:

if ($line =~ /(<a\s+href[^>]+>[^<]+<\/a>)/i) {
    print $1;
}

Should do the trick. Walking through it:

/ - Begin Reg Exp.
( - Save in $1 everything in the ( .. ) brackets.
<a - Look for the string '<a'
\s+ - Followed by one or more whitespace characters.
href - Followed by href.
[^>]+ - Followed by one or more (+) elements of the set [^>]. The ^> means everything except a >. So [^>]+ means everything up to the next >.
> - The closing >
[^<]+ - Like above, but everything up to the next <.
< - The opening <
\/a> - escaped / followed by closing a>
) - closing ).
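
The same walk-through can also live inside the pattern itself by using the /x modifier, which lets a regex carry whitespace and comments. This is only a sketch: the sub name extract_anchor and the sample line are made up, and m{...} is used as the delimiter so the / in </a> needs no escaping.

```perl
use strict;
use warnings;

# Return the whole anchor element found on a line, or nothing if none matches.
sub extract_anchor {
    my ($line) = @_;
    if ($line =~ m{
            (           # start of the saved group
              <a        # the string '<a'
              \s+       # one or more whitespace characters
              href      # the string 'href'
              [^>]+     # one or more characters that are not '>'
              >         # the '>' that ends the opening tag
              [^<]+     # the link text: one or more characters that are not '<'
              </a>      # the closing tag
            )           # end of the saved group
        }xi) {
        return $1;
    }
    return;
}

# Hypothetical input line, made up for illustration.
print extract_anchor('See <a href="page.html">this page</a> now.'), "\n";
```

The trailing i makes the match case-insensitive, so <A HREF=...> is accepted as well.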

Hope that helps,

Alex
Re: Get a line from a html file
Below is the script that gets the hyperlinks from "test.html" and prints them to a new page. My problem is that it only handles 5 links: if "test.html" contains more than 5 links, it displays a blank screen. What am I doing wrong here? Thanks for the help!

P.S.: I downloaded the two files from http://www.oasis.leo.org/perl/scripts/net/infosys/www/html/xurl.dsc.html, but there isn't any readme file. Would you please tell me how to use them? Thanks again!

#!/usr/bin/perl

$html_url = "test.html";

open (FILE, $html_url);
@lines=<FILE>;
close (FILE);

foreach $line (@lines) {
    if ($line =~ /(<a\s+href[^>]+>[^<]+<\/a>)/i) {
        @links = (@links,$1);
    }
}

foreach $links (@links) {
    print "$links<br>";
}

Re: Get a line from a html file
Quote:
@links = (@links,$1);

Ok, this just looks weird. =) Try:

push(@links, $1);
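
For what it's worth, both forms build the same list, so the change is about style and about not copying the whole array on every pass rather than about the output. A minimal sketch with made-up sample data (the array names here are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical matches, made up for illustration.
my @found = ('<a href="a.html">A</a>', '<a href="b.html">B</a>');

# Style from the quoted line: build a new list and assign it back each time.
my @rebuilt;
@rebuilt = (@rebuilt, $_) for @found;

# Suggested style: append to the array in place.
my @pushed;
push(@pushed, $_) for @found;

print "same result\n" if "@rebuilt" eq "@pushed";
```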

Cheers,

Alex
Re: Get a line from a html file
Alex,

That doesn't solve it; I still have the same problem. By the way, can you please tell me how to use the script that I downloaded from

http://www.oasis.leo.org/perl/scripts/net/infosys/www/html/xurl.dsc.html

which you suggested before?

Thanks!

Kian
Re: [kian] Get a line from a html file
Hi

I am a novice Perlian. I hope this simple code helps.

It takes the file name as input and prints the URL from the href attribute of each <a href="..."> tag it finds.



#!/usr/bin/perl
# Vishal19178 : Perl program to scan for html links.
# (The #! line has to be the first line of the file to take effect.)

print "Enter the html file name to be scanned:\n";

# Read the file name from standard input and strip the trailing newline.
$html_url=<STDIN>;
chomp($html_url);

open (FILE,$html_url) or die ("Cannot open file: $!");

@lines=<FILE>;
close(FILE);

print "Scanned links of the html file.....\n";
print "####################################\n";
print "\n";

# Save the href value of the first <a href="..."> on each line.
foreach $line (@lines)
{
    if ($line =~ /<a\s+href="([^"]+)"/i)
    {
        @links = (@links,$1);
    }
}

foreach $links (@links)
{
    print "$links\n";
    print "\n";
}


Please feel free to make any changes that improve it.

Thank you.


Vishal
Novice Perlian
Re: [vishal19178] Get a line from a html file
Not much better than the original... but it should be a bit faster....

Code:
#!/usr/bin/perl

print "Enter the html file name to be scanned:\n";

$html_url = <>;

chomp($html_url);

open (FILE,$html_url) or die ("Cannot open file");
# Read one line at a time into $line instead of loading the whole file.
while ($line = <FILE>) {
    if ($line =~ /<a\s+href="([^"]+)"/i) {
        push (@links,$1);
    }
}
close(FILE);

print "Content-type: text/html\n\n";
print join ("\n<BR>",@links);
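
One limitation shared by the loops in this thread is that an if around the match records at most one link per line. If several links can sit on the same line, a /g match in list context returns every capture. A minimal sketch, assuming double-quoted href values as in the pattern above; the sub name extract_hrefs and the sample line are made up:

```perl
use strict;
use warnings;

# Return every href value found in a string of HTML.
sub extract_hrefs {
    my ($html) = @_;
    # In list context, a /g match returns the captures from all matches.
    my @links = $html =~ /<a\s+href="([^"]+)"/gi;
    return @links;
}

# Hypothetical input line, made up for illustration.
my $sample = '<a href="one.html">One</a> | <a href="two.html">Two</a>';
print join("\n", extract_hrefs($sample)), "\n";
```

Inside the file-reading loop, this would amount to push(@links, extract_hrefs($line)); in place of the if block.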

Cheers

Andy (mod)
andy@ultranerds.co.uk


Re: [Andy] Get a line from a html file
Hi Andy

I am still learning. Thanks for the correction from veterans like you.

Cheers.

Vishal

Newbie