Apr 27, 2003, 2:45 PM

webslicer

User (236 posts)

Post #1 of 3

Extracting text from a webpage

I'm working on a Perl subroutine or global to extract a text sample from a webpage. It would supplement the "Description" meta tag and similar info.

Specifically, how would I extract a digest of the first 200 text characters of a web page (stripped of HTML code and broken on a word boundary) using Links SQL 2.12?
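To make the question concrete, here is a minimal sketch of the extraction step on its own: strip the markup, collapse whitespace, and cut at the last word boundary within 200 characters. It is written in Python rather than Perl purely for illustration, it is not Links SQL code, and the names `page_extract` and `_TextCollector` are hypothetical. It assumes the HTML has already been fetched into a string, and it skips `<title>` text on the assumption that the title is stored separately.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, ignoring the contents of the tags in SKIP."""

    # Hypothetical choice: the title is assumed to be handled separately.
    SKIP = {"script", "style", "title"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; and friends
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def page_extract(html, limit=200):
    """Return at most `limit` characters of page text, cut on a word boundary."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()

    # Join text nodes with a space so adjacent blocks do not run together,
    # then collapse every run of whitespace to a single space.
    text = " ".join(" ".join(collector.parts).split())
    if len(text) <= limit:
        return text

    cut = text[:limit]
    if text[limit] != " ":
        # The cut landed mid-word: back up to the last space, if there is one.
        head, sep, _ = cut.rpartition(" ")
        if sep:
            cut = head
    return cut.rstrip()


if __name__ == "__main__":
    sample = "<html><body><p>Hello <b>world</b> &amp; friends</p></body></html>"
    print(page_extract(sample))
```

Joining text nodes with a space is a trade-off: it keeps neighbouring paragraphs apart, but a word split by inline markup (for example `<b>hel</b>lo`) would come out as two words.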

I will need this to run through all valid links (status "200", i.e. good page) and insert the result into the links database in a custom field, let's say "page_extract".
The same goes for the Title and Description meta tags, and email, but there may be code examples for that ...

Thanks!

Apr 27, 2003, 4:24 PM

Paul

Veteran (19537 posts)

Post #2 of 3

Re: [webslicer] Extracting text from a webpage

http://www.gossamer-threads.com/...;;page=unread#unread

Only one is needed. Which should I remove? ;)

Apr 27, 2003, 4:40 PM

webslicer

User (236 posts)

Post #3 of 3

Re: [Paul] Extracting text from a webpage

Hmmm..

What do you think gives me a better chance of a solution?

Yes, there is no need for a near-duplicate post. I posted here because, on second thought, I believed it would have a better chance of being solved.

What do you think?

Of course, Paul, if you help out :), where it is posted is moot, except for the benefit of the community...!