Gossamer Forum

Suggestions for reading Web Page (Scraping)?

I need to get text from a website, but have never really thought about it before...

Basically I need to compare data from a Govt website in an automated way. I need a way to "get" a webpage and then "save" the contents. (From there I can use pattern matching to compare the numbers and find what I need.)

Someone suggested a "Screen Scraper"... In the forums here I've seen a couple of references - does anybody have any comments, suggestions?

I think for my purpose having the data saved as a text-based file is the best solution. I've done similar things with Net::FTP, but never thought about trying it via HTTP.

Any comments are welcome. Is this legal, ethical, etc.?
Re: [Watts] Suggestions for reading Web Page (Scraping)?
You'd probably want to look at using LWP::UserAgent and HTTP::Request.
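
For what it's worth, a minimal sketch of that approach might look like the following. The fetch_page name, the user-agent string and the 30-second timeout are assumptions made up for illustration, not anything from this thread, and the URL is passed on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Fetch $url and return the response body.
# Dies with the HTTP status line if the request fails.
# (fetch_page is a hypothetical helper name.)
sub fetch_page {
    my ($url) = @_;

    my $ua = LWP::UserAgent->new(timeout => 30);   # timeout is an assumed value
    $ua->agent('page-fetcher/0.1');                # hypothetical agent string

    my $request  = HTTP::Request->new(GET => $url);
    my $response = $ua->request($request);

    die "Can't fetch $url: " . $response->status_line . "\n"
        unless $response->is_success;

    # decoded_content undoes any transfer/charset encoding for us.
    return $response->decoded_content;
}

# Only fetch when a URL is given, e.g.  perl fetch.pl http://www.domain.com/
print fetch_page($ARGV[0]) if @ARGV;
```

Compared with LWP::Simple, this gives you the status line when something goes wrong, and lets you set a timeout and an agent string.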

Philip
Re: [Watts] Suggestions for reading Web Page (Scraping)?
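Something like this should work:

Code:
#!/usr/bin/perl

use strict;
use warnings;
use LWP::Simple;

my $url        = 'http://www.domain.com/something/foo/bar.html';
my $write_path = 'file1.txt';

# get() returns the page body as a single string, or undef on failure.
my $page = get($url);
die "Can't fetch $url" unless defined $page;

# Save the page contents to a local text file.
open(my $fh, '>', $write_path) || die "Can't write $write_path. Reason: $!";
print $fh $page;
close($fh);

# CGI header, so the script also works when called from a browser.
print "Content-type: text/html\n\n";
print "Done!";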

Hope that helps.

Cheers

Andy (mod)
andy@ultranerds.co.uk


Re: [Andy] Suggestions for reading Web Page (Scraping)?
Thanks for the feedback guys!

Andy's example works. I think I'll be able to utilize it to start with.
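
For the pattern-matching step mentioned in the first post, a rough sketch could look like this. It assumes two saved copies of the page (an older and a newer one) given on the command line; the helper names extract_numbers, new_numbers and slurp are made up for illustration, and the tag stripping is deliberately crude, so treat it as a starting point rather than a real HTML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every number found in $text, with thousands separators removed.
sub extract_numbers {
    my ($text) = @_;
    $text =~ s/<[^>]*>/ /g;    # crude tag stripping; fine for simple pages only
    my @numbers = $text =~ /(\d[\d,]*(?:\.\d+)?)/g;
    tr/,//d for @numbers;      # "1,234.50" becomes "1234.50"
    return @numbers;
}

# Return the numbers that appear in $new but not in $old.
sub new_numbers {
    my ($old, $new) = @_;
    my %seen = map { $_ => 1 } extract_numbers($old);
    return grep { !$seen{$_} } extract_numbers($new);
}

# Read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't read $path. Reason: $!";
    local $/;                  # undef record separator: read everything at once
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Usage: perl compare.pl old_page.txt new_page.txt
if (@ARGV == 2) {
    print "$_\n" for new_numbers(slurp($ARGV[0]), slurp($ARGV[1]));
}
```

Comparing as strings means "1234.50" and "1234.5" count as different values; whether that matters depends on how the site formats its figures.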