Gossamer Forum

Removing HTML with perl

Hey peoples,

I'm trying to use wget to download an HTML file and display its text in an xterm (basically what the webpage looks like, minus the pictures). Is there any way in Perl to strip every <whatever_it_is> tag from a file? That way I'm left with just the text and no HTML code. Or does anyone have a better way of going about it? Thanks
Re: [mtorres] Removing HTML with perl
Funny, I posted that same question about 3 days ago, and I'm still working on a GOOD solution. Tags that span multiple lines and convoluted cases like commented-out code make it difficult.

Last edited by: webslicer: Apr 30, 2003, 2:29 PM
Re: [webslicer] Removing HTML with perl
This is the closest I've been able to get. Try this from a Unix shell:

sed -e :a -e 's/<[^>]*>//g;/</N;//ba' "$YOUR_FILENAME_HERE" | grep -v "&nbsp"