Gossamer Forum

Suggestions for reading Web Page (Scraping)?

I need to get text from a website, but have never really thought about it before...

Basically, I need to compare data from a government website in an automated way. I need a way to "get" a web page and then "save" its contents. (From there I can use pattern matching to compare the numbers and find what I need.)
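
For the comparison step, I'm picturing something like this rough sketch once the page is saved (the file name and the number pattern are just placeholders until I see the real page):

Code:
#!/usr/bin/perl

use strict;
use warnings;

# Placeholder name for wherever the fetched page ends up
my $saved_page = 'saved_page.txt';

open(my $fh, '<', $saved_page) or die "Can't read $saved_page: $!";

while (my $line = <$fh>) {
    # Hypothetical pattern - just pulls any integer or decimal off the line
    while ($line =~ /(\d+(?:\.\d+)?)/g) {
        print "Found number: $1\n";
    }
}

close($fh);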

Someone suggested a "Screen Scraper"... In the forums here I've seen a couple of references - does anybody have any comments, suggestions?

I think for my purpose having the data saved as a text-based file is the best solution. I've done similar things with Net::FTP, but never thought about trying it via HTTP.

Any comments are welcome. Is this legal, ethical, etc.?
Re: [Watts] Suggestions for reading Web Page (Scraping)?
You'd probably want to look at using LWP::UserAgent and HTTP::Request.
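
A bare-bones sketch of how those two fit together (the URL is just a stand-in for whatever page you're after):

Code:
#!/usr/bin/perl

use strict;
use warnings;

use LWP::UserAgent;
use HTTP::Request;

# Stand-in URL - swap in the page you actually need
my $url = 'http://www.example.gov/data/report.html';

my $ua = LWP::UserAgent->new(timeout => 30);
$ua->agent('MyScraper/0.1');    # identify your script to the server

my $request  = HTTP::Request->new(GET => $url);
my $response = $ua->request($request);

if ($response->is_success) {
    print $response->content;    # raw page body
}
else {
    die "Couldn't fetch $url: " . $response->status_line . "\n";
}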

Philip
------------------
Limecat is not pleased.
Re: [Watts] Suggestions for reading Web Page (Scraping)?
Something like this should work:

Code:
#!/usr/bin/perl

use strict;
use warnings;

use LWP::Simple;

my $url        = 'http://www.domain.com/something/foo/bar.html';
my $write_path = 'file1.txt';

# get() returns the page body as one string, or undef if the fetch fails
my $page = get($url);
die "Couldn't fetch $url" unless defined $page;

open(my $fh, '>', $write_path) or die "Can't write $write_path. Reason: $!";
print $fh $page;
close($fh);

print "Content-type: text/html\n\n";
print "Done!";
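
As a side note, LWP::Simple also has getstore(), which fetches the URL and writes it straight to disk in one call if you don't need the page in memory first:

Code:
#!/usr/bin/perl

use strict;
use warnings;

use LWP::Simple;

my $url        = 'http://www.domain.com/something/foo/bar.html';
my $write_path = 'file1.txt';

# getstore() returns the HTTP status code; is_success() is also exported by LWP::Simple
my $status = getstore($url, $write_path);
die "Fetch failed with status $status" unless is_success($status);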

Hope that helps.

Cheers

Andy (mod)
andy@ultranerds.co.uk
Re: [Andy] Suggestions for reading Web Page (Scraping)?
Thanks for the feedback guys!

Andy's example works; I think I'll be able to use it as a starting point.