Gossamer Forum

Regular Expression

Regular Expression
Does anyone know what regex to use to print out all of the tags in an HTML file? Thanks.

Adrian

Re: Regular Expression
Try this (untested):

$text =~ s/<[^>]*>//g;   # remove everything from "<" up to the next ">"
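For reference, here is a self-contained sketch of that substitution. The `strip_tags` helper name is only for illustration. Because `[^>]` also matches newlines, a tag split across lines is removed too; a `>` inside a quoted attribute value or an HTML comment will still end the match early, so treat it as a quick filter rather than a real HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: delete anything that looks like a tag,
# i.e. "<", then any run of non-">" characters (newlines included), then ">".
sub strip_tags {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;
    return $text;
}

print strip_tags('<p>Hello, <b>world</b>!</p>'), "\n";    # prints: Hello, world!
```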


Dan Cool


Re: Regular Expression
That would delete all the HTML tags in an HTML file.

Use this instead:

[I have a feeling it won't work...]

#!/usr/bin/perl

$file = "whatever.html";

open FILE, '<', $file or die "Can't open $file: $!";
while (<FILE>) { $html .= $_; }
close FILE;

print "Content-type: text/plain\n\n";
foreach $tag ($html =~ /<(.+?)>/) {
print "$tag\n";
}

If I am correct, that will print out the tags.

For example, for:
<img src="blah.gif">
it will print out:
img src="blah.gif"

If you just want "img", change:

foreach $tag ($html =~ /<(.+?)>/) {

to

foreach $tag ($html =~ /<([^\s>]+)[^>]*>/) {
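To see how the two kinds of pattern differ on the `<img>` example, here is a small sketch. A list-context match returns one element per capture group, so a pattern with a single group around the name returns only "img"; an extra group such as `(.*?)` would add the attributes as a second element. Excluding `>` from the name (`[^\s>]`) also keeps a bare tag like `<br>` from running into the following text.

```perl
use strict;
use warnings;

my $html = '<img src="blah.gif">';

# Whole tag contents: everything between "<" and the first ">".
my ($contents) = $html =~ /<(.+?)>/;

# Tag name only: characters that are neither whitespace nor ">",
# then the rest of the tag, which is matched but not captured.
my ($name) = $html =~ /<([^\s>]+)[^>]*>/;

print "$contents\n";    # img src="blah.gif"
print "$name\n";        # img
```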

Jerry Su
widgetz sucks

Re: Regular Expression
Jerry:

I assumed he wanted the HTML tags removed, i.e., to convert HTML to text.

Adrian, could you clarify?


Dan Cool


Re: Regular Expression
Hi Dan,

Sorry for the confusion. Jerry understood what I was trying to say. Thanks a lot to both of you for your help.

Regards,
Adrian

Re: Regular Expression
Hi Jerry,

Thanks for the code. However, it only printed the first HTML tag, so I made a few changes:

#!/usr/bin/perl

$file = "whatever.html";

open FILE, '<', $file or die "Can't open $file: $!";
while (<FILE>) {
@tags = /<(\/?\w+)+/ig;
foreach (@tags) { print "$_\n" }
}
close FILE;
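A quick way to check what that pattern picks up is to run it over one sample line. The `tag_names` sub below is a hypothetical wrapper for illustration. It drops the trailing `+` on the group and the `/i` flag, which are not needed for ordinary tags since `\w` already matches both upper- and lower-case letters. Closing tags keep their leading `/`, and things like `<!DOCTYPE` or `<!--` are skipped because `!` is not a word character.

```perl
use strict;
use warnings;

# Hypothetical wrapper around Adrian's pattern: "<", an optional "/",
# then a run of word characters. With /g in list context it returns
# the capture from every match on the line.
sub tag_names {
    my ($line) = @_;
    return $line =~ /<(\/?\w+)/g;
}

my @tags = tag_names('<p>See <a href="x.html">this</a>.</p>');
print "$_\n" foreach @tags;    # prints p, a, /a and /p, one per line
```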


Thanks,
Adrian

Re: Regular Expression
#!/usr/bin/perl

$file = "whatever.html";

open FILE, '<', $file or die "Can't open $file: $!";
while (<FILE>) { $html .= $_; }
close FILE;

print "Content-type: text/plain\n\n";
foreach $tag ($html =~ m#</?(.+?)>#g) {
print "$tag\n";
}


I forgot the /g! No!!!! Without it, the match returns only the first tag.
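The effect of the missing `/g` is easy to see in isolation. This sketch runs the same pattern on a made-up sample string once without and once with the modifier. In list context, a match without `/g` returns the captures of the first match only, while `/g` returns the captures from every match. Note that closing tags come back without their `/`, since it sits outside the capture group.

```perl
use strict;
use warnings;

my $html = '<p>Hi <b>there</b></p>';

# Without /g: captures from the FIRST match only.
my @first = $html =~ m#</?(.+?)>#;

# With /g: captures from every match in the string.
my @all = $html =~ m#</?(.+?)>#g;

print scalar(@first), " vs ", scalar(@all), "\n";    # 1 vs 4
```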

Jerry Su
widgetz sucks