I am pretty new to Perl and am working on a small project to grab some data off a website. Here is what I have come up with:
#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;
use HTML::TokeParser;
my $agent = WWW::Mechanize->new();
$agent->get("http://mospublic.ercot.com/...ing_services_mcp.jsp");
my $stream = HTML::TokeParser->new(\$agent->{content});
my $info1;
$info1 = $stream->get_trimmed_text("");
$info1 =~ s/\xa0/ /g;
print "$info1\n";
Output:
Balancing Services MCP BALANCING SERVICES MCP Market Date: Interval Ending ZoneHOUSTON2003 ZoneNORTH2003 ZoneSOUTH2003 ZoneWEST2003 0015 $74.60 $74.60 $74.60
$74.60 0030 $31.95 $31.95 $31.95 $31.95 0045 $35.40 $35.40 $35.40 $35.40 0100 $31.95 $31.95 $31.95 $31.95 0115 $36.63 $36.63 $36.63 $36.63
0130 $33.43 $33.43 $33.43 $33.43 0145 $27.30 $27.30 $27.30 $27.30 0200 $9.40 $9.40 $9.40 $9.40 0215 $23.97 $23.97 $23.97 $23.97 0230 $31.
90 $31.90 $31.90 $31.90 0245 $33.43 $33.43 $33.43 $33.43 0300 $32.50 $32.50 $32.50 $32.50 0315 $29.95 $29.95 $29.95 $29.95 0330 $10.78 $10.
78 $10.78 $10.78 0345 $8.50 $8.50 $8.50 $8.50 0400 $18.21 $18.21 $18.21 $18.21 0415 $11.51 $11.51 $11.51 $11.51 0430 $27.50 $27.50 $27.50
$27.50 0445 $19.01 $19.01 $19.01 $19.01 0500 $27.35 $27.35 $27.35 $27.35 0515 $4.70 $4.70 $4.70 $4.70 0530 $5.89 $5.89 $5.89 $5.89 0545 $1
1.00 $11.00 $11.00 $11.00 0600 $17.42 $17.42 $17.42 $17.42 0615 $8.10 $8.10 $8.10 $8.10 0630 $26.99 $26.99 $26.99 $26.99 0645 $27.70 $27.70
$27.70 $27.70 0700 $30.40 $30.40 $30.40 $30.40 0715 $28.40 $28.40 $28.40 $28.40 0730 $31.11 $31.11 $31.11 $31.11 0745 $31.77 $31.77 $31.77
$31.77 0800 $33.04 $33.04 $33.04 $33.04 0815 $36.86 $36.86 $36.86 $36.86 0830 $34.89 $34.89 $34.89 $34.89 0845 $34.89 $34.89 $34.89 $34.89
0900 $35.80 $35.80 $35.80 $35.80 0915 $37.36 $37.36 $37.36 $37.36 0930 $38.12 $38.12 $38.12 $38.12 0945 $38.82 $38.82 $38.82 $38.82 1000
$38.82 $38.82 $38.82 $38.82 1015 $43.75 $43.52 $43.09 $43.47 1030 $43.52 $43.52 $43.52 $43.52 1045 $44.94 $41.72 $35.90 $41.11 1100 $39.82
$39.82 $39.82 $39.82 1115 $43.22 $43.22 $43.22 $43.22 1130 $46.33 $43.52 $38.43 $42.99 1145 $43.81 $43.81 $43.81 $43.81 1200 $44.01 $44.01
$44.01 $44.01 1215 $44.01 $44.01 $44.01 $44.01 1230 $44.31 $44.31 $44.31 $44.31 1245 $44.11 $44.11 $44.11 $44.11 1300 $43.81 $43.81 $43.81
$43.81 1315 $41.30 $41.30 $41.30 $41.30 1330 $39.50 $39.50 $39.50 $39.50 1345 $43.62 $43.62 $43.62 $43.62 1400 $45.30 $45.30 $45.30 $45.30
1415 $46.20 $46.20 $46.20 $46.20 1430 $46.90 $46.90 $46.90 $46.90 1445 $47.10 $47.10 $47.10 $47.10 1500 $47.10 $47.10 $47.10 $47.10 1515 $4
9.25 $49.25 $49.25 $49.25 1530 $49.25 $49.25 $49.25 $49.25 1545 $49.25 $49.25 $49.25 $49.25 1600 $49.38 $49.38 $49.38 $49.38 1615 $48.94 $4
8.94 $48.94 $48.94 1630 $49.37 $49.37 $49.37 $49.37 addButton('Help', "loadHelp('/balancing_services_mcpHelp.htm');", '50');
My problem is: how can I get it to scrape only the data that I want and leave out the unnecessary stuff? From the output above, all I want is the zone names and all the values below them. For some reason my script extracts all the text from the HTML and does not trim out the stuff I ask it to. Please help. Thanks.
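For reference, here is a rough sketch of the kind of thing I am after. As far as I can tell, get_trimmed_text("") reads to the end of the document because no tag ever matches the empty name, which would explain why everything comes out in one lump. The sketch walks the tokens instead and only collects text inside table cells. The sample markup and the table_rows helper are my own assumptions for illustration; I have not confirmed that the real page lays out its data as a plain table with th/td cells like this.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample markup standing in for the real page (its actual
# HTML is not shown above, so this structure is an assumption).
my $html = <<'HTML';
<html><head><title>Balancing Services MCP</title></head><body>
<h1>BALANCING SERVICES MCP</h1>
<table>
<tr><th>Interval Ending</th><th>Zone HOUSTON2003</th><th>Zone NORTH2003</th></tr>
<tr><td>0015</td><td>$74.60</td><td>$74.60</td></tr>
<tr><td>0030</td><td>$31.95</td><td>$31.95</td></tr>
</table>
<script>addButton('Help');</script>
</body></html>
HTML

# table_rows is a hypothetical helper: it returns one array ref per table
# row, holding the trimmed text of each th/td cell in that row.
sub table_rows {
    my ($html) = @_;
    my $stream = HTML::TokeParser->new(\$html) or die "cannot parse HTML";
    my (@rows, $row);
    while (my $token = $stream->get_token) {
        my ($type, $tag) = @$token;
        if ($type eq 'S' && $tag eq 'tr') {
            # A new row starts; keep the previous one if </tr> was missing.
            push @rows, $row if $row && @$row;
            $row = [];
        }
        elsif ($type eq 'S' && ($tag eq 'td' || $tag eq 'th')) {
            # Read only up to the matching end tag, not to end of document.
            my $text = $stream->get_trimmed_text("/$tag");
            $text =~ s/\xa0/ /g;    # non-breaking spaces to plain spaces
            push @$row, $text if $row;
        }
        elsif ($type eq 'E' && $tag eq 'tr') {
            push @rows, $row if $row && @$row;
            undef $row;
        }
    }
    push @rows, $row if $row && @$row;    # last row without a closing </tr>
    return @rows;
}

# Print one tab-separated line per row: header first, then the values.
print join("\t", @$_), "\n" for table_rows($html);
```

With the real page, the idea would be to pass the fetched body (for example the string returned by $agent->content) to table_rows instead of the sample string. Text outside table cells, such as the page title and the addButton script, is skipped because only th and td tokens are collected.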