Gossamer Forum: General: Perl Programming: Removing HTML with perl

Removing HTML with perl

Apr 30, 2003, 1:14 PM

mtorres

Novice (16 posts)

Post #1 of 3

Hey peoples,

I'm trying to use wget to download an HTML file and display its text in an xterm (basically what the webpage looks like, minus the pictures). Is there any way, with Perl, to strip every < whatever_it_is > tag from a file? That way I'm left with just the text and no HTML code. Or does anyone have a better way of going about it? Thanks.
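One way to approach the question above is a Perl one-liner run from the shell. This is a minimal sketch, not a full HTML parser: `strip_tags` is a hypothetical helper name, and it assumes that no attribute value contains a literal `>`. The `-0777` switch makes Perl read the whole input as a single string, so a tag split across lines is still removed.

```shell
# strip_tags: hypothetical helper, not something from the thread.
# Reads HTML from the named files (or stdin) and prints it with
# everything between "<" and ">" removed.
strip_tags() {
    # -0777 slurps the whole input; -p prints the result after the edit.
    perl -0777 -pe 's/<[^>]*>//g' "$@"
}

# Example: the second tag spans two lines and is still removed.
printf '<p>Hello <b>world</b></p>\n<a\n href="x">link</a>\n' | strip_tags
```

To feed it a downloaded page, something like `wget -q -O - "$URL" | strip_tags` should work, where `$URL` is a placeholder for the page address.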

Apr 30, 2003, 2:24 PM

webslicer

User (236 posts)

Post #2 of 3

Re: [mtorres] Removing HTML with perl

Funny, I just posted that question about 3 days ago. I'm still working to get a GOOD solution. Tags that span multiple lines and convoluted items like commented-out code pose some difficulties. I am working hard on this.
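The two difficulties mentioned above (multi-line markup and commented-out code) can be handled, at least roughly, by reading the whole file at once and removing comments before the tags. The sketch below is an assumption-laden illustration rather than a complete answer: `strip_html` is a hypothetical name, and treating `script` and `style` blocks as things to drop entirely is a choice made here, not something from the thread. Regular expressions can still be fooled, for example by `-->` appearing inside a script.

```shell
# strip_html: hypothetical helper, not something from the thread.
strip_html() {
    perl -0777 -pe '
        s{<!--.*?-->}{}gs;                            # comments, even multi-line ones
        s{<(script|style)\b[^>]*>.*?</\1\s*>}{}gis;   # script/style blocks and their contents
        s{<[^>]*>}{}g;                                # whatever tags remain
    ' "$@"
}

# Example: the commented-out markup and the style rules do not reach the output.
printf '<p>one<!-- <b>hidden\ntext</b> --> two</p>\n<style>\np { color: red }\n</style>\n<p>three</p>\n' | strip_html
```

The order matters: if tags were stripped first, the text inside a commented-out block would be left behind as if it were page content.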

Last edited by webslicer: Apr 30, 2003, 2:29 PM

Apr 30, 2003, 2:33 PM

mtorres

Novice (16 posts)

Post #3 of 3

Re: [webslicer] Removing HTML with perl

This is the closest I've been able to get it. Try this from a Unix shell:

sed -e :a -e 's/<[^>]*>//g;/</N;//ba' "$YOUR_FILENAME_HERE" | grep -v "&nbsp"
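For readers unpicking the sed command above, here is the same loop written out one step per line. This is a sketch with one deliberate variation: instead of `grep -v`, which drops every line that contains `&nbsp`, it replaces each `&nbsp;` entity with a plain space so the rest of the line survives. `strip_tags_sed` is a hypothetical name.

```shell
# strip_tags_sed: hypothetical helper, not something from the thread.
strip_tags_sed() {
    sed '
:a
# drop every complete <...> tag currently in the pattern space
s/<[^>]*>//g
# a leftover "<" means a tag continues on the next line: append it and retry
/</{
N
ba
}
# swap non-breaking-space entities for plain spaces
s/&nbsp;/ /g
' "$@"
}

# Example: the <b ...> tag is split across two input lines.
printf 'a&nbsp;<b\nclass="x">bold</b> text\n' | strip_tags_sed
```

One caveat: a stray `<` in ordinary text (as in "1 < 2") makes the loop keep appending lines while it looks for a closing `>`, so output for such pages may come out joined in odd places.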
