Gossamer Forum

Extracting text from a webpage

I'm working on a Perl subroutine or global to extract a text sample from a webpage. It would supplement the "Description" meta tag and similar info.

Specifically, how would I extract a digest of the first 200 text characters of a web page (stripped of HTML code and broken on a word boundary) using Links SQL 2.12?
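
For illustration, here is a minimal sketch of the extraction step described above, using only core Perl. The sub name `page_extract` and its arguments are hypothetical and not part of Links SQL. It assumes the page's HTML is already in a string, and the regex-based tag stripping is deliberately naive, so malformed markup can trip it up.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Links SQL function): returns up to $max
# characters of plain text from $html, cut at a word boundary.
sub page_extract {
    my ($html, $max) = @_;
    $max = 200 unless defined $max;

    my $text = $html;
    $text =~ s/<!--.*?-->//gs;                          # drop HTML comments
    $text =~ s/<(script|style|head)\b.*?<\/\1\s*>//gis; # drop non-visible blocks
    $text =~ s/<[^>]*>/ /g;                             # drop remaining tags

    # Decode only a handful of common entities (an intentional simplification).
    my %ent = (nbsp => ' ', amp => '&', lt => '<', gt => '>', quot => '"');
    $text =~ s/&(nbsp|amp|lt|gt|quot);/$ent{$1}/g;

    $text =~ s/\s+/ /g;      # collapse runs of whitespace
    $text =~ s/^ | $//g;     # trim leading and trailing space

    return $text if length($text) <= $max;

    # Take one extra character so a word ending exactly at $max is kept,
    # then drop the trailing partial word.
    my $cut = substr($text, 0, $max + 1);
    $cut = substr($text, 0, $max) unless $cut =~ s/\s+\S*$//;
    return $cut;
}

my $sample = '<html><head><title>T</title></head>'
           . '<body><h1>Hello</h1> <p>big &amp; wide world</p></body></html>';
print page_extract($sample, 12), "\n";
```

If CPAN modules are available, HTML::Parser and HTML::Entities would likely be more robust than regexes, and the page itself could be fetched with something like `get($url)` from LWP::Simple before calling the sub.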

I will need this to run through all valid links with a "200" (good page) status and insert the result into the links database in a custom field, let's say "page_extract".
The same goes for the Title and Description meta tags, and email, but there may already be code examples for that ...
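
As a rough sketch of the Title and Description part, the two subs below pull those values out of an HTML string with plain regexes. The names `page_title` and `meta_description` are hypothetical, nothing here is a Links SQL API, and the patterns assume reasonably well-formed markup with a quoted `content` attribute.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the contents of the first <title> element,
# with whitespace normalised, or nothing if there is no title.
sub page_title {
    my ($html) = @_;
    return unless $html =~ m{<title\b[^>]*>(.*?)</title\s*>}is;
    my $title = $1;
    $title =~ s/\s+/ /g;
    $title =~ s/^ | $//g;
    return $title;
}

# Hypothetical helper: returns the content of the first
# <meta name="description" ...> tag, or nothing if none is found.
sub meta_description {
    my ($html) = @_;
    while ($html =~ m{<meta\b([^>]*)>}gi) {
        my $attrs = $1;
        next unless $attrs =~ m{\bname\s*=\s*["']?description\b}i;
        return $1 if $attrs =~ m{\bcontent\s*=\s*"([^"]*)"}i;
        return $1 if $attrs =~ m{\bcontent\s*=\s*'([^']*)'}i;
    }
    return;
}

my $page = qq{<html><head><title>\n  My  Page </title>}
         . qq{<meta name="keywords" content="a,b">}
         . qq{<META NAME="Description" CONTENT="A short summary."></head>}
         . qq{<body>x</body></html>};
print page_title($page), "\n";
print meta_description($page), "\n";
```

Both subs return nothing when the tag is missing, so the caller can leave the corresponding database field untouched in that case.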

thanks!
Re: [webslicer] Extracting text from a webpage
http://www.gossamer-threads.com/...;;page=unread#unread

Only one is needed; which should I remove? ;)
Re: [Paul] Extracting text from a webpage
Hmmm..

What do you think gives me a better chance of a solution?

Yes, there is no need for a near-duplicate post. I posted here because, on second thought, I believed it would have a better chance of being solved.

What do you think?

Of course, Paul, if you help out *blush*, where it is posted is moot, except for the benefit of the community!