Gossamer Forum

Get a line from a html file

How do I write a script that, when it finds "<a href" in an HTML file, copies that line up to the matching "</a>" into another HTML file?

Example:

open (FILE, "test.html");
@lines=<FILE>;
close (FILE);

foreach $line (@lines) {
if ($line=~ /<a href/) {
# if found "<a href" then copy all of the line until "</a>" to $getline
print "$getline"; #print the line here
}
}

Re: Get a line from a html file
Are you just trying to get all the links out of a page? I would recommend looking at:

http://www.personal.u-net.com/.../console32/xurl.html

which is a win95/nt exe file, or:

http://www.oasis.leo.org/...w/html/xurl.dsc.html

which is a perl script (though you'll need LWP and HTML::Parse).

It'll be much easier than trying to catch all the different possibilities of links. You could try:

if ($line =~ /<a\s+href[^>]+>([^<]+)<\/a>/i) {
print $1;
}

However, that can fail in a lot of cases. It's better to use one of the above programs.
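
To make "can fail in a lot of cases" a bit more concrete, here is a minimal sketch that runs the same pattern over a few sample lines. The sample lines and the helper name link_text are made up for illustration; they are not from any real page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the post above, unchanged.
my $re = qr/<a\s+href[^>]+>([^<]+)<\/a>/i;

# Hypothetical sample lines, made up for illustration.
my @samples = (
    '<a href="a.html">Plain link</a>',                    # matches
    '<a href="b.html"><b>Bold</b> link</a>',              # missed: a tag inside the link text
    '<a class="nav" href="c.html">Attribute first</a>',   # missed: href is not the first attribute
    '<a href="d.html">Split across',                      # missed: </a> is on the next line
);

# Return the link text if the pattern matches, otherwise undef.
sub link_text {
    my ($line) = @_;
    return $line =~ $re ? $1 : undef;
}

for my $line (@samples) {
    my $text = link_text($line);
    print defined $text ? "matched: $text\n" : "missed:  $line\n";
}
```

Only the first sample is matched; the other three are the kind of markup a real HTML parser handles and a one-line pattern does not.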

Cheers,

Alex
Re: Get a line from a html file
The script below prints only the text between the tags. How can I change it to print the whole link exactly?

if ($line =~ /<a\s+href[^>]+>([^<]+)<\/a>/i) {
print $1;
}

And what do "^>" and "^<" mean? Can you please explain them for me?

Thank you very much!!

best regards,
kian
Re: Get a line from a html file
That's an easy fix, just move the () brackets:

if ($line =~ /(<a\s+href[^>]+>[^<]+<\/a>)/i) {
print $1;
}

Should do the trick. Walking through it:

/ - Begin Reg Exp.
( - Save in $1 everything in the ( .. ) brackets.
<a - Look for the string '<a'
\s+ - Followed by one or more whitespace characters (spaces, tabs, newlines).
href - Followed by href.
[^>]+ - Followed by one or more (+) elements of the set [^>]. The ^> means everything except a >. So [^>]+ means everything up to the next >.
> - The closing >
[^<]+ - Like above, but everything up to the next <.
< - The opening <
\/a> - escaped / followed by closing a>
) - closing ).
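
The same walkthrough can be written into the pattern itself with the /x flag, which lets a regular expression carry whitespace and comments. This is only a sketch: the sample line and the helper name whole_link are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as above, spread out with /x so each piece has a comment.
my $link_re = qr{
    (            # start capturing; the match is saved in the first capture variable
      <a         # the literal string '<a'
      \s+        # one or more whitespace characters
      href       # the literal string 'href'
      [^>]+      # one or more characters that are not '>'
      >          # the '>' that closes the opening tag
      [^<]+      # the link text: one or more characters that are not '<'
      </a>       # the closing tag; the slash needs no backslash with these delimiters
    )            # stop capturing
}xi;

# Return the whole link (tag included) if the pattern matches, otherwise undef.
sub whole_link {
    my ($text) = @_;
    return $text =~ $link_re ? $1 : undef;
}

# Hypothetical sample line, made up for illustration.
my $line = 'See <a href="page.html">this page</a> for details.';
print whole_link($line), "\n";
```

With the sample line above this prints <a href="page.html">this page</a>, tag and all, because the brackets now surround the whole pattern.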

Hope that helps,

Alex
Re: Get a line from a html file
Below is the script that gets the hyperlinks from "test.html" and prints them to a new page. My problem is that it only works for up to 5 links: if "test.html" contains more than 5 links, it displays a blank screen. What am I doing wrong here? Thanks for the help!

P.S. I downloaded the two files from http://www.oasis.leo.org/perl/scripts/net/infosys/www/html/xurl.dsc.html, but there is no readme file. Could you please tell me how to use them? Thanks again!

#!/usr/bin/perl

$html_url = "test.html";

open (FILE, $html_url);
@lines=<FILE>;
close (FILE);

foreach $line (@lines) {
if ($line =~ /(<a\s+href[^>]+>[^<]+<\/a>)/i) {
@links = (@links,$1);
}
}

foreach $links (@links) {
print "$links<br>";
}

Re: Get a line from a html file
 
Quote:
@links = (@links,$1);

Ok, this just looks weird. =) Try:

push(@links, $1);
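
For what it's worth, both forms end up with the same list; push just appends in place instead of copying the whole array on every pass. A minimal sketch, with a made-up @found list standing in for the captured links:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical captured links, made up for illustration.
my @found = ('<a href="1.html">one</a>', '<a href="2.html">two</a>');

my (@rebuilt, @pushed);
for my $link (@found) {
    @rebuilt = (@rebuilt, $link);   # builds a new list and copies it back each time
    push @pushed, $link;            # appends to the existing array
}

print "same contents\n" if "@rebuilt" eq "@pushed";
```

So the change is tidier, but on its own it would not be expected to change which links get printed.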

Cheers,

Alex
Re: Get a line from a html file
Alex,

I can't solve it; it's still the same problem. BTW, can you please tell me how to use the script that I downloaded from

http://www.oasis.leo.org/perl/scripts/net/infosys/www/html/xurl.dsc.html

that you suggested before?

Thanks!

Kian
Re: [kian] Get a line from a html file
Hi

I am a novice Perl programmer. I hope this simple code helps.

It takes the file name as input and prints the URL from each <a href="..."> tag.
:)



#!/usr/bin/perl
# Vishal19178 : Perl program to scan for html links.
###############################################

print "Enter the html file name to be scanned:\n";

$html_url=<>;
chomp($html_url);

open (FILE,$html_url) or die ("Cannot open file");

@lines=<FILE>;
close(FILE);


print"Scanned links of the html file.....\n";
print"####################################\n";
print"\n";

foreach $line(@lines)
{
if ($line =~ /<a\s+href="([^"]+)"/i)
{
@links = (@links,$1);
}
}

foreach $links (@links)
{
print "$links\n";
print"\n";
}

######################################################


Please feel free to suggest any improvements.

Thank you.


Vishal
Novice Perlian
Re: [vishal19178] Get a line from a html file
Not much better than the original... but it should be a bit faster....

Code:
#!/usr/bin/perl

print "Enter the html file name to be scanned:\n";

$html_url = <>;

chomp($html_url);

open (FILE,$html_url) or die ("Cannot open file");
while ($line = <FILE>) {
if ($line =~ /<a\s+href="([^"]+)"/i) {
push (@links,$1);
}
}
close(FILE);

print "Content-type: text/html \n\n";
print join ("\n<BR>",@links);
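
Going one step further, here is a rough sketch (not a drop-in replacement) that also picks up several links on the same line by using /g in list context. The sub name extract_hrefs and the sample HTML are made up for illustration, and the input is read from an in-memory filehandle only so the sketch runs without a file on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every href value found on the lines of an open filehandle.
sub extract_hrefs {
    my ($fh) = @_;
    my @links;
    while (my $line = <$fh>) {
        # With /g in list context the match returns every capture on the
        # line, not just the first one.
        push @links, $line =~ /<a\s+href="([^"]+)"/gi;
    }
    return @links;
}

# Hypothetical sample input, made up for illustration. For a real file use:
#   open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
my $sample = <<'HTML';
<p><a href="one.html">One</a> and <a href="two.html">Two</a></p>
<a href="three.html">Three</a>
HTML

open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
my @links = extract_hrefs($fh);
close($fh);

print join("\n", @links), "\n";
```

It still only sees double-quoted href values that come first in the tag, so the earlier caveat about using a real HTML parser applies here too.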

Cheers

Andy (mod)
andy@ultranerds.co.uk
Re: [Andy] Get a line from a html file
Hi Andy

I am still learning. Thanks for the correction; it helps to hear from veterans like you.

Cheers.

Vishal

Newbie