Gossamer Forum

web scrape

I am pretty new to Perl and am working on a small project to grab some data from a website. Here is what I have come up with:

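#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use HTML::TokeParser;

# Fetch the page
my $agent = WWW::Mechanize->new();
$agent->get("http://mospublic.ercot.com/...ing_services_mcp.jsp");

# Parse the fetched HTML (use the content() accessor, not the object's internals)
my $html   = $agent->content;
my $stream = HTML::TokeParser->new(\$html);

# Grab the text and turn non-breaking spaces into plain spaces
my $info1 = $stream->get_trimmed_text("");
$info1 =~ s/\xa0/ /g;
print "$info1\n";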

output:

Balancing Services MCP BALANCING SERVICES MCP Market Date: Interval Ending ZoneHOUSTON2003 ZoneNORTH2003 ZoneSOUTH2003 ZoneWEST2003
0015 $74.60 $74.60 $74.60 $74.60 0030 $31.95 $31.95 $31.95 $31.95 0045 $35.40 $35.40 $35.40 $35.40 0100 $31.95 $31.95 $31.95 $31.95
0115 $36.63 $36.63 $36.63 $36.63 0130 $33.43 $33.43 $33.43 $33.43 0145 $27.30 $27.30 $27.30 $27.30 0200 $9.40 $9.40 $9.40 $9.40
0215 $23.97 $23.97 $23.97 $23.97 0230 $31.90 $31.90 $31.90 $31.90 0245 $33.43 $33.43 $33.43 $33.43 0300 $32.50 $32.50 $32.50 $32.50
0315 $29.95 $29.95 $29.95 $29.95 0330 $10.78 $10.78 $10.78 $10.78 0345 $8.50 $8.50 $8.50 $8.50 0400 $18.21 $18.21 $18.21 $18.21
0415 $11.51 $11.51 $11.51 $11.51 0430 $27.50 $27.50 $27.50 $27.50 0445 $19.01 $19.01 $19.01 $19.01 0500 $27.35 $27.35 $27.35 $27.35
0515 $4.70 $4.70 $4.70 $4.70 0530 $5.89 $5.89 $5.89 $5.89 0545 $11.00 $11.00 $11.00 $11.00 0600 $17.42 $17.42 $17.42 $17.42
0615 $8.10 $8.10 $8.10 $8.10 0630 $26.99 $26.99 $26.99 $26.99 0645 $27.70 $27.70 $27.70 $27.70 0700 $30.40 $30.40 $30.40 $30.40
0715 $28.40 $28.40 $28.40 $28.40 0730 $31.11 $31.11 $31.11 $31.11 0745 $31.77 $31.77 $31.77 $31.77 0800 $33.04 $33.04 $33.04 $33.04
0815 $36.86 $36.86 $36.86 $36.86 0830 $34.89 $34.89 $34.89 $34.89 0845 $34.89 $34.89 $34.89 $34.89 0900 $35.80 $35.80 $35.80 $35.80
0915 $37.36 $37.36 $37.36 $37.36 0930 $38.12 $38.12 $38.12 $38.12 0945 $38.82 $38.82 $38.82 $38.82 1000 $38.82 $38.82 $38.82 $38.82
1015 $43.75 $43.52 $43.09 $43.47 1030 $43.52 $43.52 $43.52 $43.52 1045 $44.94 $41.72 $35.90 $41.11 1100 $39.82 $39.82 $39.82 $39.82
1115 $43.22 $43.22 $43.22 $43.22 1130 $46.33 $43.52 $38.43 $42.99 1145 $43.81 $43.81 $43.81 $43.81 1200 $44.01 $44.01 $44.01 $44.01
1215 $44.01 $44.01 $44.01 $44.01 1230 $44.31 $44.31 $44.31 $44.31 1245 $44.11 $44.11 $44.11 $44.11 1300 $43.81 $43.81 $43.81 $43.81
1315 $41.30 $41.30 $41.30 $41.30 1330 $39.50 $39.50 $39.50 $39.50 1345 $43.62 $43.62 $43.62 $43.62 1400 $45.30 $45.30 $45.30 $45.30
1415 $46.20 $46.20 $46.20 $46.20 1430 $46.90 $46.90 $46.90 $46.90 1445 $47.10 $47.10 $47.10 $47.10 1500 $47.10 $47.10 $47.10 $47.10
1515 $49.25 $49.25 $49.25 $49.25 1530 $49.25 $49.25 $49.25 $49.25 1545 $49.25 $49.25 $49.25 $49.25 1600 $49.38 $49.38 $49.38 $49.38
1615 $48.94 $48.94 $48.94 $48.94 1630 $49.37 $49.37 $49.37 $49.37 addButton('Help', "loadHelp('/balancing_services_mcpHelp.htm');", '50');

My problem is how to get it to scrape only the data that I want and leave out the unnecessary stuff. From the output above, all I want is the zone names and all the values below them. For some reason my script extracts all the text from the HTML and does not trim out the stuff I asked it to. Please help. Thanks.
Re: [ercotmaed] web scrape
I've removed the thread you just made. If you want to follow up regarding this topic then please post here and don't create new threads.
Re: [Paul] web scrape
Can someone help me out, please?
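
A likely reason the script prints everything: get_trimmed_text("") is given an empty end-tag name, which probably never matches a real tag, so the parser keeps collecting text until the end of the document. One low-tech way around this is to post-process $info1 with regular expressions. What follows is only a minimal sketch under assumptions taken from the output shown above: zone headers look like ZoneNAME followed by digits, and each row is a four-digit interval followed by one $price per zone. The helper name parse_mcp_text is hypothetical and not part of WWW::Mechanize or HTML::TokeParser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split the flattened page text into zone names
# and per-interval prices.
# Returns (\@zones, \@rows); each row is [interval, price, price, ...].
sub parse_mcp_text {
    my ($text) = @_;

    # Zone headers look like "ZoneHOUSTON2003"; keep the name between
    # "Zone" and the trailing digits (assumed to be a year).
    my @zones = $text =~ /\bZone([A-Z]+)\d*\b/g;
    return ([], []) unless @zones;

    # A data row is a four-digit interval followed by one "$price"
    # per zone, e.g. "0015 $74.60 $74.60 $74.60 $74.60".
    my $n = @zones;
    my @rows;
    while ($text =~ /\b(\d{4})((?:\s+\$\d+\.\d{2}){$n})/g) {
        my ($interval, $prices) = ($1, $2);
        push @rows, [ $interval, $prices =~ /\$(\d+\.\d{2})/g ];
    }
    return (\@zones, \@rows);
}

# Usage with a shortened sample of the output above; in the real
# script you would pass $info1 instead of $sample.
my $sample = 'Market Date: Interval Ending ZoneHOUSTON2003 ZoneNORTH2003 '
           . 'ZoneSOUTH2003 ZoneWEST2003 0015 $74.60 $74.60 $74.60 $74.60 '
           . '1015 $43.75 $43.52 $43.09 $43.47 addButton(\'Help\');';
my ($zones, $rows) = parse_mcp_text($sample);
print join("\t", 'Interval', @$zones), "\n";
print join("\t", @$_), "\n" for @$rows;
```

This prints a tab-separated table with one line per interval and drops the page title and the trailing addButton(...) text, because neither matches the row pattern. If the page layout changes, walking the table cells with HTML::TokeParser's get_tag would probably be more robust than matching on the flattened text.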