Gossamer Forum

Best way to strip most HTML?

Hi,

Does anyone know of a way to strip everything out of a passed-in variable except plain text and basic formatting tags such as <i>, <u>, and <b>?

Code:
sub cleanhtml {

my $in = $_[0];
# Remove every tag: "<", an optional "/", then anything up to the next ">"
$in =~ s|</?[^>]*>||g;
return $in;

}

...but that will filter out all HTML tags, which I don't want.
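One way to read the request is as an allowlist: drop every tag except a named few. The following is only a minimal sketch under that assumption. `strip_except` is a hypothetical name, not something from this thread, and the regex treats a tag as anything from "<" to the next ">", so it will not cope with comments or with ">" inside attribute values. It also drops the attributes of the tags it keeps.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the tags named in @keep, drop all others.
sub strip_except {
    my ($in, @keep) = @_;
    my %allowed = map { lc($_) => 1 } @keep;

    # Match one tag at a time: optional slash, tag name, then anything up to ">".
    # Allowed tags are re-emitted without attributes; everything else is removed.
    $in =~ s{<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>}
            {$allowed{lc $2} ? "<$1" . lc($2) . ">" : ""}ge;
    return $in;
}

print strip_except('<div class="x"><b>Bold</b> and <i>italic</i></div>', qw(i u b)), "\n";
# prints: <b>Bold</b> and <i>italic</i>
```

For input that is not under your control, a real HTML parser is likely the safer choice than a regex.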

TIA

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] Best way to strip most HTML?
Sorry, I want to help, but first I need to understand the question.

Do you want to filter only the <i>, <u>, and <b> tags?
Re: [NamedRisk] Best way to strip most HTML?
Hi,

It's OK, I managed to get it working <G>

Code:
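sub cleanhtml {

my $in = $_[0];

# Tags to strip (the tag itself only; any text inside is kept)
my @to_exclude = qw/style font script html body head div span meta/;
my $exclude_re = join '|', @to_exclude;

# Drop <style> blocks together with their contents
$in =~ s|<style\b[^>]*>.*?</style>||sig;
# Turn the page title into a heading
$in =~ s|<title\b[^>]*>(.*?)</title>|<h1>$1</h1>|sig;

# Strip attributes from <p> tags; \b stops this matching <pre>, <param>, etc.
$in =~ s|<p\b[^>]*>(.*?)</p>|<p>$1</p>|sig;

# Remove opening and closing forms of each excluded tag;
# \b stops "head" from also matching <header>
$in =~ s{</?(?:$exclude_re)\b[^>]*>}{}gi;

return $in;

}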
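One thing to watch with this kind of tag stripping: removing only the <script> tags leaves the script source behind as visible text. Below is a minimal sketch of removing whole elements, contents included. `drop_elements` is a hypothetical name, and the pattern assumes the elements are not nested and are properly closed.

```perl
use strict;
use warnings;

# Hypothetical helper: remove whole elements (tags plus contents) for each name given.
sub drop_elements {
    my ($in, @names) = @_;
    for my $name (@names) {
        # Opening tag with optional attributes, shortest possible body, closing tag.
        $in =~ s{<\Q$name\E\b[^>]*>.*?</\Q$name\E\s*>}{}gis;
    }
    return $in;
}

print drop_elements('<p>Hi</p><script type="text/javascript">alert(1);</script><p>Bye</p>', qw(script style)), "\n";
# prints: <p>Hi</p><p>Bye</p>
```

Running something like this before the tag-only pass would keep script and style bodies out of the cleaned text.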

Thanks for the reply, though.

Cheers

Andy (mod)