Gossamer Forum

Sucking out text between HTML tags?

I have a program that reads a directory, sorts the file names, and prints each one as a link on an HTML page, so the output looks something like this:

<UL>
<LI><A HREF="/november26.shtml">november26.shtml</A>
<LI><A HREF="/november25.shtml">november25.shtml</A>
<LI><A HREF="/november24.shtml">november24.shtml</A>
</UL>
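For reference, a program like the one described above might look roughly like this minimal Perl sketch. It is not Bryan's actual code: the directory `.` and the sub names `list_pages` and `page_list_html` are hypothetical, and it assumes the pages are named like `november26.shtml` so that a reverse string sort lists the newest first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical directory holding the .shtml pages; change to suit your site.
my $dir = '.';

# Hypothetical helper: return the .shtml file names in $path, newest first.
# Assumes names such as "november26.shtml", where a reverse string sort
# puts 26 ahead of 25.
sub list_pages {
    my ($path) = @_;
    opendir(my $dh, $path) or die "Cannot open $path: $!";
    my @files = grep { /\.shtml\z/ && -f "$path/$_" } readdir($dh);
    closedir($dh);
    my @sorted = reverse sort @files;
    return @sorted;
}

# Hypothetical helper: build the <UL> list, using each file name as its
# own link text.
sub page_list_html {
    my @files = @_;
    my $html = "<UL>\n";
    $html .= qq{<LI><A HREF="/$_">$_</A>\n} for @files;
    $html .= "</UL>\n";
    return $html;
}

my $html = page_list_html(list_pages($dir));
print $html;
```

A plain string sort only works while the day numbers have the same number of digits: `november9.shtml` would be listed ahead of `november26.shtml`, so a full month would need a numeric sort on the day.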

Instead of the file name, I want to use the text between each file's <TITLE> and </TITLE> tags as the link text, so the page looks like this:

<UL>
<LI><A HREF="/november26.shtml">November 26 Title Here</A>
<LI><A HREF="/november25.shtml">November 25 Title Here</A>
<LI><A HREF="/november24.shtml">November 24 Title Here</A>
</UL>

Any ideas?

[This message has been edited by Bryan (edited November 26, 1998).]

Re: Sucking out text between HTML tags?
Did you write the first program? If so, I assume you're comfortable reading the files from the directory.

Inside your loop over the files, add something like:

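open(my $fh, '<', "$dir/$file") or die "Cannot open $dir/$file: $!";
my $text = do { local $/; <$fh> };                  # read the whole file into one string
close($fh);
my ($title) = $text =~ m,<title>(.*?)</title>,is;   # the (...) captures the title text
$title = $file unless defined $title;               # fall back to the file name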

Then print $title as the link text when you write out each <LI> line.
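Putting the pieces together, here is a minimal, self-contained Perl sketch of the whole loop. The sub names `extract_title` and `read_file` and the directory `.` are hypothetical; it assumes each page has at most one `<TITLE>` element and falls back to the file name when a page has none. The `/s` modifier lets a title that spans several lines still match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text between <TITLE> and </TITLE>, or
# undef if there is none. The (...) capture group is what puts the title
# into $title; /i ignores case and /s lets . match newlines.
sub extract_title {
    my ($html) = @_;
    my ($title) = $html =~ m{<title>(.*?)</title>}is;
    return undef unless defined $title;
    $title =~ s/\s+/ /g;          # collapse line breaks and runs of spaces
    $title =~ s/^\s+|\s+$//g;     # trim leading and trailing space
    return $title;
}

# Hypothetical helper: read a whole file into one string, the same job
# as join("", <FILE>) in the reply.
sub read_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $text = do { local $/; <$fh> };   # undefined $/ reads everything at once
    close($fh);
    return $text // '';
}

# Hypothetical directory; in practice reuse the file list your program builds.
my $dir = '.';
opendir(my $dh, $dir) or die "Cannot open $dir: $!";
my @files = grep { /\.shtml\z/ && -f "$dir/$_" } readdir($dh);
closedir($dh);
@files = reverse sort @files;

print "<UL>\n";
for my $file (@files) {
    # Fall back to the file name when a page has no <TITLE>.
    my $title = extract_title(read_file("$dir/$file")) // $file;
    print qq{<LI><A HREF="/$file">$title</A>\n};
}
print "</UL>\n";
```

A regular expression is fine for simple, consistently written pages like these; for arbitrary HTML, a real parser such as HTML::TokeParser is more robust.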

Hope that helps,

Alex