Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: MythTV: Users

New Australian XMLTV grabber

 

 

First page Previous page 1 2 Next page Last page  View All MythTV users RSS feed   Index | Next | Previous | View Threaded


michael-mlists at cheshire

Nov 17, 2004, 3:18 AM

Post #1 of 30 (10437 views)
Permalink
New Australian XMLTV grabber

The great script, updated with foxtel channels.

#!/usr/bin/perl -w
# Australian TV Guide XMLTV grabber by Damon Searle
# Derived from a yahoo XMLTV grabber by Ron Kellam which was itself...
# Derived from original code by Justin Hawkins
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

# 30 Oct 2004
# Damon Searle <djsearle [at] netspace>
# - wrote first version
# - gets data from NineMSN as a backup. Its not that fancy,
# 31 Oct 2004
# Fred Donelly <fdonelly [at] hotmail>
# - added an option so that the output file can be specified on the
# command line and from the quick test I gave it, it now works with
# mythfilldatabase.
# - $offset set to +1000 at the top and then had "+1000" set in a
# output string further down rather than the variable
# 4 Nov 2004
# Paul Andreassen <paulx [at] andreassen>
# - learned some perl and now wants to go back to python
# - added and then reduced status info
# - retry on failure to getstore
# - changed cache to '/var/local/tv_grab_au'
# - added threading for each day
# 5 Nov 2004
# - improved threading with use of queue
# Eyal Lebedinsky <eyal [at] eyal>
# - easier location selection
# 8 Nov 2004
# Paul
# - fixed pid=0 bug
# - did some merging, I hate merging
# 9 Nov 2004
# Rob Hill <rob [at] dot>
# - added Sydney
# 10 Nov 2004
# Mary Wright <mwright [at] taz-devil>
# - digital info for Sydney
# Paul
# - more cleanup and improved error checking
# - used mirror instead of getstore to get any updates
# - mirror didn't work replaced with own smarts to check for updates to
times
# 11 Nov 2004
# - added program name in check
# 13 Nov 2004
# - added freesd for Brisbane
# 14 Nov 2004
# - --configure to exit nicely
# - if no program data then skip program nicely, mainly for foxtel data
# - added foxtel channels

# 17 Nov 2004

# - added remaining foxtel channels

use strict;
use Getopt::Long;
use XMLTV;
use LWP::Simple;
use Date::Manip;
use File::Path;
use threads;
use Thread::Queue;

# Instructions:
# Select your region and source.
# If your location isn't listed below, go to
# http://tvguide.ninemsn.com.au/guide/ select your area
# look at the last number in the URL before ".asp" and set
# the region variable below. Then put the channel names as listed
# on the tv guide site into the variables below.
# Then set your XMLTV ids from the database in the XMLTVID_URL variable.
#
# If it doesn't work with mythfilldatabase, try:
# ./tv_grab_au
# mythfilldatabase --file 1 -1 /var/local/tv_grab_au/guide.xml

# pick your region
#
#my $location = "Brisbane";
#my $location = "Canberra";
#my $location = "Sydney";
#my $location = "Adelaide";
my $location = "Australia";

# pick your source
#
#my $source = "free";
#my $source = "freesd";
#my $source = "freehd";
my $source = "foxtel";

# choose the XMLID URL suffix that mythtv knows
#
my $XMLTVID_URL = "d1.com.au";

# change to how you think it should work
my $days_to_grab = 7;
my $threads = 5;
my $retrys = 3;
my $secondsbeforeretry = 2;

# Variables
my $guide_url = "http://tvguide.ninemsn.com.au/guide/";
my $details_url = "http://tvguide.ninemsn.com.au/closeup/default.asp?pid=";
my $cache_dir = "/var/local/tv_grab_au.foxtel";

my $XMLTV_prefix = $source . "." . $location . ".";
my $XMLTV_suffix = "." . $XMLTVID_URL;

my $region; my $offset;
my %channels;

if ("Canberra" eq $location) {
$region = "126";
$offset = "+1100";
if ("free" eq $source) {
$channels{"ABC NSW"}="2";
$channels{"Prime Southern"}="PrimS";
$channels{"SBS Sydney"}="SBS";
$channels{"Southern Cross TEN Capital"}="10Cap";
$channels{"WIN Television NSW"}="WIN"
} elsif ("freesd" eq $source) {
$channels{"ABC NSW"}="2";
$channels{"Prime Southern"}="7";
$channels{"SBS Sydney"}="SBS";
$channels{"Southern Cross TEN Capital"}="10";
$channels{"WIN Television NSW"}="9"
} else {
print "Unknows source '$source' for $location\n";
exit (1);
}
} elsif ("Brisbane" eq $location) {
$region = "79";
$offset = "+1000";
if (("free" eq $source)||("freesd" eq $source)) {
$channels{"ABC QLD"}="2";
$channels{"Channel Seven Queensland"}="7";
$channels{"SBS Queensland"}="SBS";
$channels{"Southern Cross TEN Queensland"}="10";
$channels{"WIN Television QLD"}="9";
} else {
print "Unknows source '$source' for $location\n";
exit (1);
}
} elsif ("Sydney" eq $location) {
$region = "73";
$offset = "+1000";
if (("free" eq $source)||("freesd" eq $source)) {
$channels{"ABC NSW"}="2";
$channels{"Channel Seven Sydney"}="7";
$channels{"SBS Sydney"}="SBS";
$channels{"Network TEN Sydney"}="10";
$channels{"Channel Nine Sydney"}="9";
} else {
print "Unknows source '$source' for $location\n";
exit (1);
}
} elsif ("Adelaide" eq $location) {
$region = "81";
$offset = "+0930";
if (("free" eq $source)||("freesd" eq $source)) {
$channels{"ABC SA"}="2";
$channels{"Channel Seven Adelaide"}="7";
$channels{"SBS"}="SBS";
$channels{"Network TEN Adekaude"}="10";
$channels{"Channel Nine Adekaude"}="9";
} else {
print "Unknows source '$source' for $location\n";
exit (1);
}
} elsif ("Australia" eq $location) {
$region = "123";
$offset = "+0930";
if ("foxtel" eq $source) {
$channels{"Arena TV"}="Arena";
$channels{"BBC World"}="BBC";
$channels{"Cartoon Network"}="Cartoon";
$channels{"Channel [V]"}="Red";
$channels{"CNBC"}="CNBC";
$channels{"CNN"}="CNN";
$channels{"Discovery Channel"}="Disc";
$channels{"FOX News"}="FoxFNC";
$channels{"FOX8"}="FOX";
$channels{"MAX"}="FoxMMX";
$channels{"National Geographic Channel"}="NatGe";
$channels{"Nickelodeon"}="Nick";
$channels{"Showtime"}="Show";
$channels{"Showtime 2"}="FoxSH2";
$channels{"Sky News"}="SkyNews";
$channels{"TV1"}="TV1";
$channels{"UKTV"}="UKTV";
$channels{"Showtime Greats"}="ShowGreats";
$channels{"World Movies"}="wmov";
$channels{"WCH"}="WCH";
$channels{"TVSN"}="TVSN";
$channels{"Sky Racing"}="SkyRa";
$channels{"Ovation"}="Ovation";
$channels{"Disney Channel"}="Disney";
$channels{"Animal Planet"}="Animal";
$channels{"The Comedy Channel"}="Com";
$channels{"The LifeStyle Channel"}="Lifes";
$channels{"FOX Sports 1"}="FoxFS1";
$channels{"Movie One"}="Movie1";
$channels{"TCM"}="TCM";
$channels{"MTV"}="MTV";
$channels{"FOX Sports 2"}="FoxSP2";
$channels{"FOX Footy Channel"}="FFC";
$channels{"Movie Extra"}="MovieEx";
$channels{"Hallmark Channel"}="Hall";
$channels{"The History Channel"}="FoxHST";
$channels{"ESPN"}="ESPN";
$channels{"FOX Classics"}="FoxCLA";
$channels{"Movie Greats"}="MovieGr";
} else {
print "Unknows source '$source' for $location\n";
exit (1);
}
} else {
print "Unknows location '$location'\n";
exit (1);
}

my $prog_ref;
my $chan_ref;

foreach my $channel (keys %channels)
{
$$chan_ref{$channel} =
{
'id' => $XMLTV_prefix . $channels{$channel} . $XMLTV_suffix,
'display-name' => [ [ $channel, undef ]]
};
}


# Options
my $opt_days;
my $opt_output;
my $opt_configfile;
my $opt_configure = 0;

GetOptions('days=i' => \$opt_days,
'output=s' => \$opt_output,
'config-file=s' => \$opt_configfile,
'configure' => \$opt_configure,
);

if ($opt_days) {
$days_to_grab = $opt_days
}

if (!($opt_output)) {
$opt_output = $cache_dir . "/guide.xml";
}

# $opt_configfile should probably do something
('/home/mythtv/.mythtv/tv_grab_au.xmltv')

if ($opt_configure == 1)
{
print "configuration must be done in this script $0\n";
exit (0);
}

print "grabing $days_to_grab days into $opt_output\n";




print "starting $threads threads\n";

my @thrlist;
my $datepids = Thread::Queue->new;

for (my $thread=0; $thread<$threads; $thread++)
{
push @thrlist, threads->new(\&fetch_details);
}

print "loading queue\n";

my $currentday = &ParseDate("today");
my $day_counter = 1;
while ($day_counter <= $days_to_grab)
{
my $date = &UnixDate($currentday, "%d%m%Y");
my @day_lines = get_day($date,1);
if (@day_lines == 0)
{
$currentday = &DateCalc($currentday, "+ 1 day");
$day_counter++;
next;
}

my @pids;
my @rowspans;
my @names;
foreach my $line (@day_lines)
{
foreach my $link (split /\n|tr|TR|TD|tr/, $line )
{
if ($link =~ /closeup\/default.asp/)
{
my $rowspan = $link;
$rowspan =~ s/.+rowspan=//g;
$rowspan =~ s/ .+//g;

my $name = $link;
$name =~ s/.+target=new>(<P>|)//g;
$name =~ s/<\/a>.+//g;

$link =~ s/.+pid=//g;
$link =~ s/".+//g; #"
if (($rowspan =~ /\d+/) and ($link =~ /\d\d+/))
{
push @pids, $link;
push @rowspans, $rowspan;
push @names, $name;
}
}
}
}

if (changed_guide($date,@pids,@rowspans,@names))
{
for (my $count=0; $count <= $#pids; $count++)
{
$datepids->enqueue($date . "-" . $pids[$count]);
}
}

$day_counter++;
$currentday = &DateCalc($currentday, "+ 1 day");
}

for (my $thread=0; $thread<$threads; $thread++)
{
$datepids->enqueue(0 . "-" . 0);
}

print "queue is complete\n";

foreach my $thr (@thrlist)
{
$thr->join;
}

print "all threads done\n";
print "building xml structure\n";

$currentday = &ParseDate("today");
$day_counter = 1;
while ($day_counter <= $days_to_grab)
{
my @pids;
my $date = &UnixDate($currentday, "%d%m%Y");

my $guide_prn_file = $cache_dir . "/" . $date . "/guide.prn";
if (open(PRN, $guide_prn_file))
{
my @prn = split />/, <PRN>;
close(PRN);

if ($#prn > 1)
{
my $pidlast = ($#prn + 1)/3 - 1;
@pids=@prn[0..$pidlast];
}
else
{
print "no pids in $guide_prn_file\n";
@pids=();
}
}
else
{
print "can't read $guide_prn_file\n";
@pids=()
}

my $retry = 0;
foreach my $pid (@pids)
{
my @details = get_details($date, $pid);
if (@details == 0)
{
next;
}

my $show_details_table = "";
my $use_line = 0;
my $close_html = 0;
foreach my $line (@details)
{
if ($line =~ /bgColor=#f7f3e8/)
{
$use_line = 0;
}
if ($use_line == 1)
{
$show_details_table .= $line;
}
if ($line =~ /bgcolor=#ffffff/)
{
$use_line = 1;
}
if ($line =~ /<\/HTML>/)
{
$close_html = 1;
}
}

if ($close_html == 0)
{
my $name = $cache_dir . "/" . $date . "/" . $pid . ".html";
if ($retry++ >= $retrys)
{
print "giving up on truncated $name\n";
$retry=0;
next;
}
unlink $name;
push @pids, $pid;
print "t"; # truncated
sleep($secondsbeforeretry);
next;
}

if ((length $show_details_table) == 0)
{
print "m"; # missing: can't do anything about this
$retry=0;
next;
}

$show_details_table =~ s/<[^>]*>/\n/g;
$show_details_table =~ s/\&nbsp\;//g;
#$show_details_table =~ s/<BR>|<TR>|<TD><B><b><\/B><\/b>/\n/g;
#$show_details_table =~ s/Genre://g;
#$show_details_table =~ s/Rated:/\n/g;
my $count = 0;

my $channel = "";
my $start_date = &UnixDate($currentday, "%Y-%m-%d");
my $time;
my $title1 = "";
my $title2 = "";
my $genre = "";
my $descr = "";
my $details = "";
my $duration;


#print $show_details_table. "\n\n\n";
foreach my $line (split /\n/, $show_details_table)
{
if ($count == 4){
#print "Time: " . $line . "\n";
$time = $line;
}
elsif ($count == 7){
$channel = $line;
#print "Channel: " . $line . "\n";
}
elsif ($count == 19){
$title1 = $line;
#print "Program: " . $line . "\n";
}
elsif ($count == 20){
$line =~ s/ - //g;
$title2 = $line;
#print "Subtitle: " . $line . "\n";
}
elsif ($count == 21){
$line =~ s/\D//g;
$duration = $line;
#print "Run time: " . $line . "\n";
}
elsif ($count == 22){
$line =~ s/[^A-Z]//g;
$details = $line;
#print "Rating: " . $line . "\n";
}
elsif ($count == 26){
$line =~ s/ //g;
$genre = $line;
#print "Genre: " . $line . "\n";
}
elsif ($count == 28 && $line =~ /[a-zA-Z]/){
$descr = $line;
#print "Description: " . $line . "\n";
}
#elsif ($count == 26 && $line =~ /[a-zA-Z]/){
# $descr = $line;
# print "Description: " . $line . "\n";
#}
#print $count .": " . $line . "\n";
++$count;
}


my $start_time = &UnixDate($time, "%H:%M");
# my $start_datetime = $start_date . " " . $start_time;
if ($start_time =~ /00:|01:|02:|03:|04:|05:/)
{
$start_date = &DateCalc($start_date, "+ 1 day");
}
$start_date = &UnixDate($start_date, "%Y%m%d");
my $end_time = &DateCalc($start_time, " + " . $duration . "minutes");
$end_time = &UnixDate($end_time, "%H:%M");

my $end_date;
if (&Date_Cmp($start_time, $end_time) <= 0)
{
$end_date = $start_date;
}
else
{
my $err;
my $edate = &DateCalc($start_date, "+ 1 day", \$err);
$end_date = &UnixDate($edate, "%Y%m%d");
}

if (defined $channels{$channel})
{
$channel = $XMLTV_prefix . $channels{$channel} . $XMLTV_suffix;
}
else
{
print "unknown channel $channel\n";
$retry=0;
next;
}

my $start;
my $stop;

$start = $start_date . &UnixDate($start_time,"%H%M") . "00 " . $offset;
$stop = $end_date . &UnixDate($end_time,"%H%M") . "00 " . $offset;

my $a_prog = {
channel => $channel,
start => $start,
stop => $stop,
title => [ [ $title1, undef ] ]
};

$descr =~ s/^\s+//;
$descr =~ s/\s+$//;

if ($title2) { $$a_prog{'sub-title'} = [ [ $title2, undef ] ]; }
if ($descr) { $$a_prog{desc} = [ [ $descr, undef ] ]; }
if ($genre) { $$a_prog{category} = [ [ $genre, undef ] ]; }

push @$prog_ref, $a_prog;
$retry=0;
}

$currentday = &DateCalc($currentday, "+ 1 day");
$day_counter++;
}

my $data = [
'ISO-8859-1',
{
'source-info-name' => 'http://tvguide.ninemsn.com.au/',
'generator-info-name' => 'NineMSN grabber',
'generator-info-url' => '',
'generator-info-name' => "XMLTV - tv_grab_au NineMSN v0.2"
},
$chan_ref,
$prog_ref
];

my $hour=&UnixDate(&ParseDate("now"),"%H");
if ($hour < 6)
{
print "can't update between 0:00 and 6:00\n";
# If we update between these hours we lose any data we had up to 6:00.
# This is because the web site starts a day at 6:00 and ends at 6:00 the
next day
# This could be fixed by read the previous days info and adding the needed
shows.
# I did try adding the whole previous day but got lots of mythfilldatabase
errors.
exit(1);
}

print "writing file\n";

my $fh = new IO::File ">$opt_output";
XMLTV::write_data($data, OUTPUT=>$fh);

print "done\n";

# subroutines
sub get_day
{
my $date = shift;
my $force = shift;
my $url = $guide_url . $date . "_" . $region . ".asp";

my $guide_dir = $cache_dir . "/" . $date;
my $guide_file = $guide_dir . "/guide.html";
mkpath ($guide_dir);

for (my $retry=0; (($force==1) || (!(-e $guide_file))) &&
is_error(getstore($url, $guide_file)) && ($retry<$retrys); $retry++)
{
print ".";
sleep($secondsbeforeretry);
}

my @guide_lines;
if (open(GUIDE, $guide_file))
{
@guide_lines = <GUIDE>;
close(GUIDE);
}
else
{
@guide_lines = ();
print "giving up on $guide_file\n";
}
return @guide_lines;
}

sub get_details
{
my $date = shift;
my $program_id = shift;

my $url = $details_url . $program_id;
my $guide_dir = $cache_dir . "/" . $date;
my $details_file = $guide_dir . "/" . $program_id . ".html";
mkpath ($guide_dir);

for (my $retry=0; (!(-e $details_file)) && is_error(getstore($url,
$details_file)) && ($retry<$retrys); $retry++)
{
print ".";
sleep($secondsbeforeretry);
}

my @details_lines;
if (open(DETAILS, $details_file))
{
@details_lines = <DETAILS>;
close(DETAILS);
}
else
{
@details_lines = ();
print "giving up on $details_file\n";
}
return @details_lines;
}

sub fetch_details
{
my $datepid=$datepids->dequeue;
my @datepidl=split /-/, $datepid;
my $date = $datepidl[0];
my $pid = $datepidl[1];

while (($date!=0) and ($pid!=0))
{
my $guide_dir = $cache_dir . "/" . $date;
mkpath ($guide_dir);

my $url = $details_url . $pid;
my $details_file = $guide_dir . "/" . $pid . ".html";

for (my $retry=0; is_error(getstore($url, $details_file)) &&
($retry<$retrys); $retry++)
{
sleep($secondsbeforeretry);
}

$datepid=$datepids->dequeue;
@datepidl=split /-/, $datepid;
$date = $datepidl[0];
$pid = $datepidl[1];
}
}

sub changed_guide
{
my $date = shift;
my @pidsrowspansnames = @_;

my $guide_prn_file = $cache_dir . "/" . $date . "/guide.prn";
if (open(PRN, $guide_prn_file))
{
my @prn = split />/, <PRN>;
close(PRN);

if (($#prn > 1) and ($#prn == $#pidsrowspansnames))
{
my $count;
my $diff = ((($#prn+1)*2)/3)-1;
for ($count=0; ($count <= $diff) &&
($prn[$count]==$pidsrowspansnames[$count]); $count++)
{ }

if ($count==($diff+1))
{
for (; ($count <= $#prn) && ($prn[$count] eq $pidsrowspansnames[$count]);
$count++)
{ }

if ($count==($#prn+1))
{
print "$date unchanged\n";
return 0;
}
}
}
}

print "$date downloading\n";

if (open(PRN, ">", $guide_prn_file))
{
for (my $count=0; $count<$#pidsrowspansnames; $count++)
{
print PRN "$pidsrowspansnames[$count]>";
}
print PRN "$pidsrowspansnames[$#pidsrowspansnames]";
close(PRN);
}
else
{
print "can't open for writing $guide_prn_file\n";
}

return 1;
}


eyal at eyal

Nov 17, 2004, 4:33 AM

Post #2 of 30 (10371 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

Michael Cheshire (Mailing Lists) wrote:
> The great script, updated with foxtel channels.

If I did not break anything else then this is the same thing, as an attachment
so that whitespace is not lost, and with one misspelling fixed...

--
Eyal Lebedinsky (eyal [at] eyal) <http://samba.org/eyal/>
Attachments: tv_grab_au (17.5 KB)


michael-mlists at cheshire

Nov 17, 2004, 1:20 PM

Post #3 of 30 (10337 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

Actually there were several more spelling mistakes..

Woops :)

----- Original Message -----
From: "Eyal Lebedinsky" <eyal [at] eyal>
To: "Discussion about mythtv" <mythtv-users [at] mythtv>
Sent: Wednesday, November 17, 2004 11:03 PM
Subject: Re: [mythtv-users] New Australian XMLTV grabber


> Michael Cheshire (Mailing Lists) wrote:
>> The great script, updated with foxtel channels.
>
> If I did not break anything else then this is the same thing, as an
> attachment
> so that whitespace is not lost, and with one misspelling fixed...
>
> --
> Eyal Lebedinsky (eyal [at] eyal) <http://samba.org/eyal/>
>


--------------------------------------------------------------------------------


> #!/usr/bin/perl -w
> # Australian TV Guide XMLTV grabber by Damon Searle
> # Derived from a yahoo XMLTV grabber by Ron Kellam which was itself...
> # Derived from original code by Justin Hawkins
> #
> # This program is free software; you can redistribute it and/or modify
> # it under the terms of the GNU General Public License as published by
> # the Free Software Foundation; either version 2 of the License, or
> # (at your option) any later version.
> #
> # This program is distributed in the hope that it will be useful,
> # but WITHOUT ANY WARRANTY; without even the implied warranty of
> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> # GNU General Public License for more details.
> #
> # You should have received a copy of the GNU General Public License
> # along with this program; if not, write to the Free Software
> # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
> USA
>
> # 30 Oct 2004
> # Damon Searle <djsearle [at] netspace>
> # - wrote first version
> # - gets data from NineMSN as a backup. Its not that fancy,
> # 31 Oct 2004
> # Fred Donelly <fdonelly [at] hotmail>
> # - added an option so that the output file can be specified on the
> # command line and from the quick test I gave it, it now works with
> # mythfilldatabase.
> # - $offset set to +1000 at the top and then had "+1000" set in a
> # output string further down rather than the variable
> # 4 Nov 2004
> # Paul Andreassen <paulx [at] andreassen>
> # - learned some perl and now wants to go back to python
> # - added and then reduced status info
> # - retry on failure to getstore
> # - changed cache to '/var/local/tv_grab_au'
> # - added threading for each day
> # 5 Nov 2004
> # - improved threading with use of queue
> # Eyal Lebedinsky <eyal [at] eyal>
> # - easier location selection
> # 8 Nov 2004
> # Paul
> # - fixed pid=0 bug
> # - did some merging, I hate merging
> # 9 Nov 2004
> # Rob Hill <rob [at] dot>
> # - added Sydney
> # 10 Nov 2004
> # Mary Wright <mwright [at] taz-devil>
> # - digital info for Sydney
> # Paul
> # - more cleanup and improved error checking
> # - used mirror instead of getstore to get any updates
> # - mirror didn't work replaced with own smarts to check for updates to
> times
> # 11 Nov 2004
> # - added program name in check
> # 13 Nov 2004
> # - added freesd for Brisbane
> # 14 Nov 2004
> # - --configure to exit nicely
> # - if no program data then skip program nicely, mainly for foxtel data
> # - added foxtel channels
> # 17 Nov 2004
> # - added remaining foxtel channels
> # Eyal Lebedinsky <eyal [at] eyal>
> # - Fix misspelling Unknows -> Unknown
> # - Note: is Sydney now is on summer time +1100?
>
> use strict;
> use Getopt::Long;
> use XMLTV;
> use LWP::Simple;
> use Date::Manip;
> use File::Path;
> use threads;
> use Thread::Queue;
>
> # Instructions:
> # Select your region and source.
> # If your location isn't listed below, go to
> # http://tvguide.ninemsn.com.au/guide/ select your area
> # look at the last number in the URL before ".asp" and set
> # the region variable below. Then put the channel names as listed
> # on the tv guide site into the variables below.
> # Then set your XMLTV ids from the database in the XMLTVID_URL variable.
> #
> # If it doesn't work with mythfilldatabase, try:
> # ./tv_grab_au
> # mythfilldatabase --file 1 -1 /var/local/tv_grab_au/guide.xml
>
> # pick your region
> #
> my $location = "Canberra";
> #my $location = "Brisbane";
> #my $location = "Sydney";
> #my $location = "Australia";
>
> # pick your source
> #
> my $source = "free";
> #my $source = "freesd";
> #my $source = "freehd";
> #my $source = "foxtel";
>
> # choose the XMLID URL suffix that mythtv knows
> #
> my $XMLTVID_URL = "d1.com.au";
>
> # change to how you think it should work
> my $days_to_grab = 7;
> my $threads = 5;
> my $retrys = 3;
> my $secondsbeforeretry = 2;
>
> # Variables
> my $guide_url = "http://tvguide.ninemsn.com.au/guide/";
> my $details_url =
> "http://tvguide.ninemsn.com.au/closeup/default.asp?pid=";
> my $cache_dir = "/var/local/tv_grab_au";
>
> my $XMLTV_prefix = $source . "." . $location . ".";
> my $XMLTV_suffix = "." . $XMLTVID_URL;
>
> my $region; my $offset;
> my %channels;
>
> if ("Canberra" eq $location) {
> $region = "126";
> $offset = "+1100";
> if ("free" eq $source) {
> $channels{"ABC NSW"}="2";
> $channels{"Prime Southern"}="PrimS";
> $channels{"SBS Sydney"}="SBS";
> $channels{"Southern Cross TEN Capital"}="10Cap";
> $channels{"WIN Television NSW"}="WIN"
> } elsif ("freesd" eq $source or "freehd" eq $source) {
> $channels{"ABC NSW"}="2";
> $channels{"Prime Southern"}="7";
> $channels{"SBS Sydney"}="SBS";
> $channels{"Southern Cross TEN Capital"}="10";
> $channels{"WIN Television NSW"}="9"
> } else {
> print "Unknown source '$source' for $location\n";
> exit (1);
> }
> } elsif ("Brisbane" eq $location) {
> $region = "79";
> $offset = "+1000";
> if (("free" eq $source)||("freesd" eq $source)) {
> $channels{"ABC QLD"}="2";
> $channels{"Channel Seven Queensland"}="7";
> $channels{"SBS Queensland"}="SBS";
> $channels{"Southern Cross TEN Queensland"}="10";
> $channels{"WIN Television QLD"}="9";
> } else {
> print "Unknown source '$source' for $location\n";
> exit (1);
> }
> } elsif ("Sydney" eq $location) {
> $region = "73";
> $offset = "+1100";
> if (("free" eq $source)||("freesd" eq $source)) {
> $channels{"ABC NSW"}="2";
> $channels{"Channel Seven Sydney"}="7";
> $channels{"SBS Sydney"}="SBS";
> $channels{"Network TEN Sydney"}="10";
> $channels{"Channel Nine Sydney"}="9";
> } else {
> print "Unknown source '$source' for $location\n";
> exit (1);
> }
> } elsif ("Adelaide" eq $location) {
> $region = "81";
> $offset = "+0930";
> if (("free" eq $source)||("freesd" eq $source)) {
> $channels{"ABC SA"}="2";
> $channels{"Channel Seven Adelaide"}="7";
> $channels{"SBS"}="SBS";
> $channels{"Network TEN Adekaude"}="10";
> $channels{"Channel Nine Adekaude"}="9";
> } else {
> print "Unknown source '$source' for $location\n";
> exit (1);
> }
> } elsif ("Australia" eq $location) {
> $region = "123";
> $offset = "+0930";
> if ("foxtel" eq $source) {
> $channels{"Arena TV"}="Arena";
> $channels{"BBC World"}="BBC";
> $channels{"Cartoon Network"}="Cartoon";
> $channels{"Channel [V]"}="Red";
> $channels{"CNBC"}="CNBC";
> $channels{"CNN"}="CNN";
> $channels{"Discovery Channel"}="Disc";
> $channels{"FOX News"}="FoxFNC";
> $channels{"FOX8"}="FOX";
> $channels{"MAX"}="FoxMMX";
> $channels{"National Geographic Channel"}="NatGe";
> $channels{"Nickelodeon"}="Nick";
> $channels{"Showtime"}="Show";
> $channels{"Showtime 2"}="FoxSH2";
> $channels{"Sky News"}="SkyNews";
> $channels{"TV1"}="TV1";
> $channels{"UKTV"}="UKTV";
> $channels{"Showtime Greats"}="ShowGreats";
> $channels{"World Movies"}="wmov";
> $channels{"WCH"}="WCH";
> $channels{"TVSN"}="TVSN";
> $channels{"Sky Racing"}="SkyRa";
> $channels{"Ovation"}="Ovation";
> $channels{"Disney Channel"}="Disney";
> $channels{"Animal Planet"}="Animal";
> $channels{"The Comedy Channel"}="Com";
> $channels{"The LifeStyle Channel"}="Lifes";
> $channels{"FOX Sports 1"}="FoxFS1";
> $channels{"Movie One"}="Movie1";
> $channels{"TCM"}="TCM";
> $channels{"MTV"}="MTV";
> $channels{"FOX Sports 2"}="FoxSP2";
> $channels{"FOX Footy Channel"}="FFC";
> $channels{"Movie Extra"}="MovieEx";
> $channels{"Hallmark Channel"}="Hall";
> $channels{"The History Channel"}="FoxHST";
> $channels{"ESPN"}="ESPN";
> $channels{"FOX Classics"}="FoxCLA";
> $channels{"Movie Greats"}="MovieGr";
> } else {
> print "Unknown source '$source' for $location\n";
> exit (1);
> }
> } else {
> print "Unknown location '$location'\n";
> exit (1);
> }
>
> my $prog_ref;
> my $chan_ref;
>
> foreach my $channel (keys %channels)
> {
> $$chan_ref{$channel} =
> {
> 'id' => $XMLTV_prefix . $channels{$channel} . $XMLTV_suffix,
> 'display-name' => [ [ $channel, undef ]]
> };
> }
>
>
> # Options
> my $opt_days;
> my $opt_output;
> my $opt_configfile;
> my $opt_configure = 0;
>
> GetOptions('days=i' => \$opt_days,
> 'output=s' => \$opt_output,
> 'config-file=s' => \$opt_configfile,
> 'configure' => \$opt_configure,
> );
>
> if ($opt_days) {
> $days_to_grab = $opt_days
> }
>
> if (!($opt_output)) {
> $opt_output = $cache_dir . "/guide.xml";
> }
>
> # $opt_configfile should probably do something
> ('/home/mythtv/.mythtv/tv_grab_au.xmltv')
>
> if ($opt_configure == 1)
> {
> print "configuration must be done in this script $0\n";
> exit (0);
> }
>
> print "grabing $days_to_grab days into $opt_output\n";
>
>
>
>
> print "starting $threads threads\n";
>
> my @thrlist;
> my $datepids = Thread::Queue->new;
>
> for (my $thread=0; $thread<$threads; $thread++)
> {
> push @thrlist, threads->new(\&fetch_details);
> }
>
> print "loading queue\n";
>
> my $currentday = &ParseDate("today");
> my $day_counter = 1;
> while ($day_counter <= $days_to_grab)
> {
> my $date = &UnixDate($currentday, "%d%m%Y");
> my @day_lines = get_day($date,1);
> if (@day_lines == 0)
> {
> $currentday = &DateCalc($currentday, "+ 1 day");
> $day_counter++;
> next;
> }
>
> my @pids;
> my @rowspans;
> my @names;
> foreach my $line (@day_lines)
> {
> foreach my $link (split /\n|tr|TR|TD|tr/, $line )
> {
> if ($link =~ /closeup\/default.asp/)
> {
> my $rowspan = $link;
> $rowspan =~ s/.+rowspan=//g;
> $rowspan =~ s/ .+//g;
>
> my $name = $link;
> $name =~ s/.+target=new>(<P>|)//g;
> $name =~ s/<\/a>.+//g;
>
> $link =~ s/.+pid=//g;
> $link =~ s/".+//g; #"
> if (($rowspan =~ /\d+/) and ($link =~ /\d\d+/))
> {
> push @pids, $link;
> push @rowspans, $rowspan;
> push @names, $name;
> }
> }
> }
> }
>
> if (changed_guide($date,@pids,@rowspans,@names))
> {
> for (my $count=0; $count <= $#pids; $count++)
> {
> $datepids->enqueue($date . "-" . $pids[$count]);
> }
> }
>
> $day_counter++;
> $currentday = &DateCalc($currentday, "+ 1 day");
> }
>
> for (my $thread=0; $thread<$threads; $thread++)
> {
> $datepids->enqueue(0 . "-" . 0);
> }
>
> print "queue is complete\n";
>
> foreach my $thr (@thrlist)
> {
> $thr->join;
> }
>
> print "all threads done\n";
> print "building xml structure\n";
>
> $currentday = &ParseDate("today");
> $day_counter = 1;
> while ($day_counter <= $days_to_grab)
> {
> my @pids;
> my $date = &UnixDate($currentday, "%d%m%Y");
>
> my $guide_prn_file = $cache_dir . "/" . $date . "/guide.prn";
> if (open(PRN, $guide_prn_file))
> {
> my @prn = split />/, <PRN>;
> close(PRN);
>
> if ($#prn > 1)
> {
> my $pidlast = ($#prn + 1)/3 - 1;
> @pids=@prn[0..$pidlast];
> }
> else
> {
> print "no pids in $guide_prn_file\n";
> @pids=();
> }
> }
> else
> {
> print "can't read $guide_prn_file\n";
> @pids=()
> }
>
> my $retry = 0;
> foreach my $pid (@pids)
> {
> my @details = get_details($date, $pid);
> if (@details == 0)
> {
> next;
> }
>
> my $show_details_table = "";
> my $use_line = 0;
> my $close_html = 0;
> foreach my $line (@details)
> {
> if ($line =~ /bgColor=#f7f3e8/)
> {
> $use_line = 0;
> }
> if ($use_line == 1)
> {
> $show_details_table .= $line;
> }
> if ($line =~ /bgcolor=#ffffff/)
> {
> $use_line = 1;
> }
> if ($line =~ /<\/HTML>/)
> {
> $close_html = 1;
> }
> }
>
> if ($close_html == 0)
> {
> my $name = $cache_dir . "/" . $date . "/" . $pid . ".html";
> if ($retry++ >= $retrys)
> {
> print "giving up on truncated $name\n";
> $retry=0;
> next;
> }
> unlink $name;
> push @pids, $pid;
> print "t"; # truncated
> sleep($secondsbeforeretry);
> next;
> }
>
> if ((length $show_details_table) == 0)
> {
> print "m"; # missing: can't do anything about this
> $retry=0;
> next;
> }
>
> $show_details_table =~ s/<[^>]*>/\n/g;
> $show_details_table =~ s/\&nbsp\;//g;
> #$show_details_table =~ s/<BR>|<TR>|<TD><B><b><\/B><\/b>/\n/g;
> #$show_details_table =~ s/Genre://g;
> #$show_details_table =~ s/Rated:/\n/g;
> my $count = 0;
>
> my $channel = "";
> my $start_date = &UnixDate($currentday, "%Y-%m-%d");
> my $time;
> my $title1 = "";
> my $title2 = "";
> my $genre = "";
> my $descr = "";
> my $details = "";
> my $duration;
>
>
> #print $show_details_table. "\n\n\n";
> foreach my $line (split /\n/, $show_details_table)
> {
> if ($count == 4){
> #print "Time: " . $line . "\n";
> $time = $line;
> }
> elsif ($count == 7){
> $channel = $line;
> #print "Channel: " . $line . "\n";
> }
> elsif ($count == 19){
> $title1 = $line;
> #print "Program: " . $line . "\n";
> }
> elsif ($count == 20){
> $line =~ s/ - //g;
> $title2 = $line;
> #print "Subtitle: " . $line . "\n";
> }
> elsif ($count == 21){
> $line =~ s/\D//g;
> $duration = $line;
> #print "Run time: " . $line . "\n";
> }
> elsif ($count == 22){
> $line =~ s/[^A-Z]//g;
> $details = $line;
> #print "Rating: " . $line . "\n";
> }
> elsif ($count == 26){
> $line =~ s/ //g;
> $genre = $line;
> #print "Genre: " . $line . "\n";
> }
> elsif ($count == 28 && $line =~ /[a-zA-Z]/){
> $descr = $line;
> #print "Description: " . $line . "\n";
> }
> #elsif ($count == 26 && $line =~ /[a-zA-Z]/){
> # $descr = $line;
> # print "Description: " . $line . "\n";
> #}
> #print $count .": " . $line . "\n";
> ++$count;
> }
>
>
> my $start_time = &UnixDate($time, "%H:%M");
> # my $start_datetime = $start_date . " " . $start_time;
> if ($start_time =~ /00:|01:|02:|03:|04:|05:/)
> {
> $start_date = &DateCalc($start_date, "+ 1 day");
> }
> $start_date = &UnixDate($start_date, "%Y%m%d");
> my $end_time = &DateCalc($start_time, " + " . $duration . "minutes");
> $end_time = &UnixDate($end_time, "%H:%M");
>
> my $end_date;
> if (&Date_Cmp($start_time, $end_time) <= 0)
> {
> $end_date = $start_date;
> }
> else
> {
> my $err;
> my $edate = &DateCalc($start_date, "+ 1 day", \$err);
> $end_date = &UnixDate($edate, "%Y%m%d");
> }
>
> if (defined $channels{$channel})
> {
> $channel = $XMLTV_prefix . $channels{$channel} . $XMLTV_suffix;
> }
> else
> {
> print "unknown channel $channel\n";
> $retry=0;
> next;
> }
>
> my $start;
> my $stop;
>
> $start = $start_date . &UnixDate($start_time,"%H%M") . "00 " . $offset;
> $stop = $end_date . &UnixDate($end_time,"%H%M") . "00 " . $offset;
>
> my $a_prog = {
> channel => $channel,
> start => $start,
> stop => $stop,
> title => [ [ $title1, undef ] ]
> };
>
> $descr =~ s/^\s+//;
> $descr =~ s/\s+$//;
>
> if ($title2) { $$a_prog{'sub-title'} = [ [ $title2, undef ] ]; }
> if ($descr) { $$a_prog{desc} = [ [ $descr, undef ] ]; }
> if ($genre) { $$a_prog{category} = [ [ $genre, undef ] ]; }
>
> push @$prog_ref, $a_prog;
> $retry=0;
> }
>
> $currentday = &DateCalc($currentday, "+ 1 day");
> $day_counter++;
> }
>
> my $data = [
> 'ISO-8859-1',
> {
> 'source-info-name' => 'http://tvguide.ninemsn.com.au/',
> 'generator-info-name' => 'NineMSN grabber',
> 'generator-info-url' => '',
> 'generator-info-name' => "XMLTV - tv_grab_au NineMSN v0.2"
> },
> $chan_ref,
> $prog_ref
> ];
>
> my $hour=&UnixDate(&ParseDate("now"),"%H");
> if ($hour < 6)
> {
> print "can't update between 0:00 and 6:00\n";
> # If we update between these hours we lose any data we had up to 6:00.
> # This is because the web site starts a day at 6:00 and ends at 6:00 the
> next day
> # This could be fixed by read the previous days info and adding the needed
> shows.
> # I did try adding the whole previous day but got lots of mythfilldatabase
> errors.
> exit(1);
> }
>
> print "writing file\n";
>
> my $fh = new IO::File ">$opt_output";
> XMLTV::write_data($data, OUTPUT=>$fh);
>
> print "done\n";
>
> # subroutines
> sub get_day
> {
> my $date = shift;
> my $force = shift;
> my $url = $guide_url . $date . "_" . $region . ".asp";
>
> my $guide_dir = $cache_dir . "/" . $date;
> my $guide_file = $guide_dir . "/guide.html";
> mkpath ($guide_dir);
>
> for (my $retry=0; (($force==1) || (!(-e $guide_file))) &&
> is_error(getstore($url, $guide_file)) && ($retry<$retrys); $retry++)
> {
> print ".";
> sleep($secondsbeforeretry);
> }
>
> my @guide_lines;
> if (open(GUIDE, $guide_file))
> {
> @guide_lines = <GUIDE>;
> close(GUIDE);
> }
> else
> {
> @guide_lines = ();
> print "giving up on $guide_file\n";
> }
> return @guide_lines;
> }
>
> sub get_details
> {
> my $date = shift;
> my $program_id = shift;
>
> my $url = $details_url . $program_id;
> my $guide_dir = $cache_dir . "/" . $date;
> my $details_file = $guide_dir . "/" . $program_id . ".html";
> mkpath ($guide_dir);
>
> for (my $retry=0; (!(-e $details_file)) && is_error(getstore($url,
> $details_file)) && ($retry<$retrys); $retry++)
> {
> print ".";
> sleep($secondsbeforeretry);
> }
>
> my @details_lines;
> if (open(DETAILS, $details_file))
> {
> @details_lines = <DETAILS>;
> close(DETAILS);
> }
> else
> {
> @details_lines = ();
> print "giving up on $details_file\n";
> }
> return @details_lines;
> }
>
> sub fetch_details
> {
> my $datepid=$datepids->dequeue;
> my @datepidl=split /-/, $datepid;
> my $date = $datepidl[0];
> my $pid = $datepidl[1];
>
> while (($date!=0) and ($pid!=0))
> {
> my $guide_dir = $cache_dir . "/" . $date;
> mkpath ($guide_dir);
>
> my $url = $details_url . $pid;
> my $details_file = $guide_dir . "/" . $pid . ".html";
>
> for (my $retry=0; is_error(getstore($url, $details_file)) &&
> ($retry<$retrys); $retry++)
> {
> sleep($secondsbeforeretry);
> }
>
> $datepid=$datepids->dequeue;
> @datepidl=split /-/, $datepid;
> $date = $datepidl[0];
> $pid = $datepidl[1];
> }
> }
>
> sub changed_guide
> {
> my $date = shift;
> my @pidsrowspansnames = @_;
>
> my $guide_prn_file = $cache_dir . "/" . $date . "/guide.prn";
> if (open(PRN, $guide_prn_file))
> {
> my @prn = split />/, <PRN>;
> close(PRN);
>
> if (($#prn > 1) and ($#prn == $#pidsrowspansnames))
> {
> my $count;
> my $diff = ((($#prn+1)*2)/3)-1;
> for ($count=0; ($count <= $diff) &&
> ($prn[$count]==$pidsrowspansnames[$count]); $count++)
> { }
>
> if ($count==($diff+1))
> {
> for (; ($count <= $#prn) && ($prn[$count] eq $pidsrowspansnames[$count]);
> $count++)
> { }
>
> if ($count==($#prn+1))
> {
> print "$date unchanged\n";
> return 0;
> }
> }
> }
> }
>
> print "$date downloading\n";
>
> if (open(PRN, ">", $guide_prn_file))
> {
> for (my $count=0; $count<$#pidsrowspansnames; $count++)
> {
> print PRN "$pidsrowspansnames[$count]>";
> }
> print PRN "$pidsrowspansnames[$#pidsrowspansnames]";
> close(PRN);
> }
> else
> {
> print "can't open for writing $guide_prn_file\n";
> }
>
> return 1;
> }
>


--------------------------------------------------------------------------------


_______________________________________________
mythtv-users mailing list
mythtv-users [at] mythtv
http://mythtv.org/cgi-bin/mailman/listinfo/mythtv-users
Attachments: tv_grab_au.txt (17.5 KB)


markusb at netspace

Nov 18, 2004, 3:43 PM

Post #4 of 30 (10355 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

Okay, this looks good but I can't get it to work for me...

I am in Adelaide, get all sorts of compile errors when it runs. Had
thread errors etc initially, but recompiled Perl with threads so that is
fixed, but I guess I have stuffed something when editing the script, not
being a programming/scripting person, I am tearing my hair out.

Does anyone have an already edited script for Adelaide?

thanks

markus



Michael Cheshire (Mailing Lists) wrote:

> Actually there were several more spelling mistakes..
>
> Woops :)
>
> ----- Original Message ----- From: "Eyal Lebedinsky"
> <eyal [at] eyal>
> To: "Discussion about mythtv" <mythtv-users [at] mythtv>
> Sent: Wednesday, November 17, 2004 11:03 PM
> Subject: Re: [mythtv-users] New Australian XMLTV grabber
>
>
>> Michael Cheshire (Mailing Lists) wrote:
>>
>>> The great script, updated with foxtel channels.
>>
>>
>> If I did not break anything else then this is the same thing, as an
>> attachment
>> so that whitespace is not lost, and with one misspelling fixed...
>>
>> --
>> Eyal Lebedinsky (eyal [at] eyal) <http://samba.org/eyal/>
>>
>
>
> --------------------------------------------------------------------------------
>
>
>
>> #!/usr/bin/perl -w
>> # Australian TV Guide XMLTV grabber by Damon Searle
>> # Derived from a yahoo XMLTV grabber by Ron Kellam which was itself...
>> # Derived from original code by Justin Hawkins
>> #
>> # This program is free software; you can redistribute it and/or modify
>> # it under the terms of the GNU General Public License as published by
>> # the Free Software Foundation; either version 2 of the License, or
>> # (at your option) any later version.
>> #
>> # This program is distributed in the hope that it will be useful,
>> # but WITHOUT ANY WARRANTY; without even the implied warranty of
>> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> # GNU General Public License for more details.
>> #
>> # You should have received a copy of the GNU General Public License
>> # along with this program; if not, write to the Free Software
>> # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
>> 02111-1307 USA
>>
>> # 30 Oct 2004
>> # Damon Searle <djsearle [at] netspace>
>> # - wrote first version
>> # - gets data from NineMSN as a backup. Its not that fancy,
>> # 31 Oct 2004
>> # Fred Donelly <fdonelly [at] hotmail>
>> # - added an option so that the output file can be specified on the
>> # command line and from the quick test I gave it, it now works with
>> # mythfilldatabase.
>> # - $offset set to +1000 at the top and then had "+1000" set in a
>> # output string further down rather than the variable
>> # 4 Nov 2004
>> # Paul Andreassen <paulx [at] andreassen>
>> # - learned some perl and now wants to go back to python
>> # - added and then reduced status info
>> # - retry on failure to getstore
>> # - changed cache to '/var/local/tv_grab_au'
>> # - added threading for each day
>> # 5 Nov 2004
>> # - improved threading with use of queue
>> # Eyal Lebedinsky <eyal [at] eyal>
>> # - easier location selection
>> # 8 Nov 2004
>> # Paul
>> # - fixed pid=0 bug
>> # - did some merging, I hate merging
>> # 9 Nov 2004
>> # Rob Hill <rob [at] dot>
>> # - added Sydney
>> # 10 Nov 2004
>> # Mary Wright <mwright [at] taz-devil>
>> # - digital info for Sydney
>> # Paul
>> # - more cleanup and improved error checking
>> # - used mirror instead of getstore to get any updates
>> # - mirror didn't work replaced with own smarts to check for updates
>> to times
>> # 11 Nov 2004
>> # - added program name in check
>> # 13 Nov 2004
>> # - added freesd for Brisbane
>> # 14 Nov 2004
>> # - --configure to exit nicely
>> # - if no program data then skip program nicely, mainly for foxtel data
>> # - added foxtel channels
>> # 17 Nov 2004
>> # - added remaining foxtel channels
>> # Eyal Lebedinsky <eyal [at] eyal>
>> # - Fix misspelling Unknows -> Unknown
>> # - Note: is Sydney now is on summer time +1100?
>>
>> use strict;
>> use Getopt::Long;
>> use XMLTV;
>> use LWP::Simple;
>> use Date::Manip;
>> use File::Path;
>> use threads;
>> use Thread::Queue;
>>
>> # Instructions:
>> # Select your region and source.
>> # If your location isn't listed below, go to
>> # http://tvguide.ninemsn.com.au/guide/ select your area
>> # look at the last number in the URL before ".asp" and set
>> # the region variable below. Then put the channel names as listed
>> # on the tv guide site into the variables below.
>> # Then set your XMLTV ids from the database in the XMLTVID_URL variable.
>> #
>> # If it doesn't work with mythfilldatabase, try:
>> # ./tv_grab_au
>> # mythfilldatabase --file 1 -1 /var/local/tv_grab_au/guide.xml
>>
>> # pick your region
>> #
>> my $location = "Canberra";
>> #my $location = "Brisbane";
>> #my $location = "Sydney";
>> #my $location = "Australia";
>>
>> # pick your source
>> #
>> my $source = "free";
>> #my $source = "freesd";
>> #my $source = "freehd";
>> #my $source = "foxtel";
>>
>> # choose the XMLID URL suffix that mythtv knows
>> #
>> my $XMLTVID_URL = "d1.com.au";
>>
>> # change to how you think it should work
>> my $days_to_grab = 7;
>> my $threads = 5;
>> my $retrys = 3;
>> my $secondsbeforeretry = 2;
>>
>> # Variables
>> my $guide_url = "http://tvguide.ninemsn.com.au/guide/";
>> my $details_url =
>> "http://tvguide.ninemsn.com.au/closeup/default.asp?pid=";
>> my $cache_dir = "/var/local/tv_grab_au";
>>
>> my $XMLTV_prefix = $source . "." . $location . ".";
>> my $XMLTV_suffix = "." . $XMLTVID_URL;
>>
>> my $region; my $offset;
>> my %channels;
>>
>> if ("Canberra" eq $location) {
>> $region = "126";
>> $offset = "+1100";
>> if ("free" eq $source) {
>> $channels{"ABC NSW"}="2";
>> $channels{"Prime Southern"}="PrimS";
>> $channels{"SBS Sydney"}="SBS";
>> $channels{"Southern Cross TEN Capital"}="10Cap";
>> $channels{"WIN Television NSW"}="WIN"
>> } elsif ("freesd" eq $source or "freehd" eq $source) {
>> $channels{"ABC NSW"}="2";
>> $channels{"Prime Southern"}="7";
>> $channels{"SBS Sydney"}="SBS";
>> $channels{"Southern Cross TEN Capital"}="10";
>> $channels{"WIN Television NSW"}="9"
>> } else {
>> print "Unknown source '$source' for $location\n";
>> exit (1);
>> }
>> } elsif ("Brisbane" eq $location) {
>> $region = "79";
>> $offset = "+1000";
>> if (("free" eq $source)||("freesd" eq $source)) {
>> $channels{"ABC QLD"}="2";
>> $channels{"Channel Seven Queensland"}="7";
>> $channels{"SBS Queensland"}="SBS";
>> $channels{"Southern Cross TEN Queensland"}="10";
>> $channels{"WIN Television QLD"}="9";
>> } else {
>> print "Unknown source '$source' for $location\n";
>> exit (1);
>> }
>> } elsif ("Sydney" eq $location) {
>> $region = "73";
>> $offset = "+1100";
>> if (("free" eq $source)||("freesd" eq $source)) {
>> $channels{"ABC NSW"}="2";
>> $channels{"Channel Seven Sydney"}="7";
>> $channels{"SBS Sydney"}="SBS";
>> $channels{"Network TEN Sydney"}="10";
>> $channels{"Channel Nine Sydney"}="9";
>> } else {
>> print "Unknown source '$source' for $location\n";
>> exit (1);
>> }
>> } elsif ("Adelaide" eq $location) {
>> $region = "81";
>> $offset = "+0930";
>> if (("free" eq $source)||("freesd" eq $source)) {
>> $channels{"ABC SA"}="2";
>> $channels{"Channel Seven Adelaide"}="7";
>> $channels{"SBS"}="SBS";
>> $channels{"Network TEN Adekaude"}="10";
>> $channels{"Channel Nine Adekaude"}="9";
>> } else {
>> print "Unknown source '$source' for $location\n";
>> exit (1);
>> }
>> } elsif ("Australia" eq $location) {
>> $region = "123";
>> $offset = "+0930";
>> if ("foxtel" eq $source) {
>> $channels{"Arena TV"}="Arena";
>> $channels{"BBC World"}="BBC";
>> $channels{"Cartoon Network"}="Cartoon";
>> $channels{"Channel [V]"}="Red";
>> $channels{"CNBC"}="CNBC";
>> $channels{"CNN"}="CNN";
>> $channels{"Discovery Channel"}="Disc";
>> $channels{"FOX News"}="FoxFNC";
>> $channels{"FOX8"}="FOX";
>> $channels{"MAX"}="FoxMMX";
>> $channels{"National Geographic Channel"}="NatGe";
>> $channels{"Nickelodeon"}="Nick";
>> $channels{"Showtime"}="Show";
>> $channels{"Showtime 2"}="FoxSH2";
>> $channels{"Sky News"}="SkyNews";
>> $channels{"TV1"}="TV1";
>> $channels{"UKTV"}="UKTV";
>> $channels{"Showtime Greats"}="ShowGreats";
>> $channels{"World Movies"}="wmov";
>> $channels{"WCH"}="WCH";
>> $channels{"TVSN"}="TVSN";
>> $channels{"Sky Racing"}="SkyRa";
>> $channels{"Ovation"}="Ovation";
>> $channels{"Disney Channel"}="Disney";
>> $channels{"Animal Planet"}="Animal";
>> $channels{"The Comedy Channel"}="Com";
>> $channels{"The LifeStyle Channel"}="Lifes";
>> $channels{"FOX Sports 1"}="FoxFS1";
>> $channels{"Movie One"}="Movie1";
>> $channels{"TCM"}="TCM";
>> $channels{"MTV"}="MTV";
>> $channels{"FOX Sports 2"}="FoxSP2";
>> $channels{"FOX Footy Channel"}="FFC";
>> $channels{"Movie Extra"}="MovieEx";
>> $channels{"Hallmark Channel"}="Hall";
>> $channels{"The History Channel"}="FoxHST";
>> $channels{"ESPN"}="ESPN";
>> $channels{"FOX Classics"}="FoxCLA";
>> $channels{"Movie Greats"}="MovieGr";
>> } else {
>> print "Unknown source '$source' for $location\n";
>> exit (1);
>> }
>> } else {
>> print "Unknown location '$location'\n";
>> exit (1);
>> }
>>
>> my $prog_ref;
>> my $chan_ref;
>>
>> foreach my $channel (keys %channels)
>> {
>> $$chan_ref{$channel} =
>> {
>> 'id' => $XMLTV_prefix . $channels{$channel} . $XMLTV_suffix,
>> 'display-name' => [ [ $channel, undef ]]
>> };
>> }
>>
>>
>> # Options
>> my $opt_days;
>> my $opt_output;
>> my $opt_configfile;
>> my $opt_configure = 0;
>>
>> GetOptions('days=i' => \$opt_days,
>> 'output=s' => \$opt_output,
>> 'config-file=s' => \$opt_configfile,
>> 'configure' => \$opt_configure,
>> );
>>
>> if ($opt_days) {
>> $days_to_grab = $opt_days
>> }
>>
>> if (!($opt_output)) {
>> $opt_output = $cache_dir . "/guide.xml";
>> }
>>
>> # $opt_configfile should probably do something
>> ('/home/mythtv/.mythtv/tv_grab_au.xmltv')
>>
>> if ($opt_configure == 1)
>> {
>> print "configuration must be done in this script $0\n";
>> exit (0);
>> }
>>
>> print "grabing $days_to_grab days into $opt_output\n";
>>
>>
>>
>>
>> print "starting $threads threads\n";
>>
>> my @thrlist;
>> my $datepids = Thread::Queue->new;
>>
>> for (my $thread=0; $thread<$threads; $thread++)
>> {
>> push @thrlist, threads->new(\&fetch_details);
>> }
>>
>> print "loading queue\n";
>>
>> my $currentday = &ParseDate("today");
>> my $day_counter = 1;
>> while ($day_counter <= $days_to_grab)
>> {
>> my $date = &UnixDate($currentday, "%d%m%Y");
>> my @day_lines = get_day($date,1);
>> if (@day_lines == 0)
>> {
>> $currentday = &DateCalc($currentday, "+ 1 day");
>> $day_counter++;
>> next;
>> }
>>
>> my @pids;
>> my @rowspans;
>> my @names;
>> foreach my $line (@day_lines)
>> {
>> foreach my $link (split /\n|tr|TR|TD|tr/, $line )
>> {
>> if ($link =~ /closeup\/default.asp/)
>> {
>> my $rowspan = $link;
>> $rowspan =~ s/.+rowspan=//g;
>> $rowspan =~ s/ .+//g;
>>
>> my $name = $link;
>> $name =~ s/.+target=new>(<P>|)//g;
>> $name =~ s/<\/a>.+//g;
>>
>> $link =~ s/.+pid=//g;
>> $link =~ s/".+//g; #"
>> if (($rowspan =~ /\d+/) and ($link =~ /\d\d+/))
>> {
>> push @pids, $link;
>> push @rowspans, $rowspan;
>> push @names, $name;
>> }
>> }
>> }
>> }
>>
>> if (changed_guide($date,@pids,@rowspans,@names))
>> {
>> for (my $count=0; $count <= $#pids; $count++)
>> {
>> $datepids->enqueue($date . "-" . $pids[$count]);
>> }
>> }
>>
>> $day_counter++;
>> $currentday = &DateCalc($currentday, "+ 1 day");
>> }
>>
>> for (my $thread=0; $thread<$threads; $thread++)
>> {
>> $datepids->enqueue(0 . "-" . 0);
>> }
>>
>> print "queue is complete\n";
>>
>> foreach my $thr (@thrlist)
>> {
>> $thr->join;
>> }
>>
>> print "all threads done\n";
>> print "building xml structure\n";
>>
>> $currentday = &ParseDate("today");
>> $day_counter = 1;
>> while ($day_counter <= $days_to_grab)
>> {
>> my @pids;
>> my $date = &UnixDate($currentday, "%d%m%Y");
>>
>> my $guide_prn_file = $cache_dir . "/" . $date . "/guide.prn";
>> if (open(PRN, $guide_prn_file))
>> {
>> my @prn = split />/, <PRN>;
>> close(PRN);
>>
>> if ($#prn > 1)
>> {
>> my $pidlast = ($#prn + 1)/3 - 1;
>> @pids=@prn[0..$pidlast];
>> }
>> else
>> {
>> print "no pids in $guide_prn_file\n";
>> @pids=();
>> }
>> }
>> else
>> {
>> print "can't read $guide_prn_file\n";
>> @pids=()
>> }
>>
>> my $retry = 0;
>> foreach my $pid (@pids)
>> {
>> my @details = get_details($date, $pid);
>> if (@details == 0)
>> {
>> next;
>> }
>>
>> my $show_details_table = "";
>> my $use_line = 0;
>> my $close_html = 0;
>> foreach my $line (@details)
>> {
>> if ($line =~ /bgColor=#f7f3e8/)
>> {
>> $use_line = 0;
>> }
>> if ($use_line == 1)
>> {
>> $show_details_table .= $line;
>> }
>> if ($line =~ /bgcolor=#ffffff/)
>> {
>> $use_line = 1;
>> }
>> if ($line =~ /<\/HTML>/)
>> {
>> $close_html = 1;
>> }
>> }
>>
>> if ($close_html == 0)
>> {
>> my $name = $cache_dir . "/" . $date . "/" . $pid . ".html";
>> if ($retry++ >= $retrys)
>> {
>> print "giving up on truncated $name\n";
>> $retry=0;
>> next;
>> }
>> unlink $name;
>> push @pids, $pid;
>> print "t"; # truncated
>> sleep($secondsbeforeretry);
>> next;
>> }
>>
>> if ((length $show_details_table) == 0)
>> {
>> print "m"; # missing: can't do anything about this
>> $retry=0;
>> next;
>> }
>>
>> $show_details_table =~ s/<[^>]*>/\n/g;
>> $show_details_table =~ s/\&nbsp\;//g;
>> #$show_details_table =~ s/<BR>|<TR>|<TD><B><b><\/B><\/b>/\n/g;
>> #$show_details_table =~ s/Genre://g;
>> #$show_details_table =~ s/Rated:/\n/g;
>> my $count = 0;
>>
>> my $channel = "";
>> my $start_date = &UnixDate($currentday, "%Y-%m-%d");
>> my $time;
>> my $title1 = "";
>> my $title2 = "";
>> my $genre = "";
>> my $descr = "";
>> my $details = "";
>> my $duration;
>>
>>
>> #print $show_details_table. "\n\n\n";
>> foreach my $line (split /\n/, $show_details_table)
>> {
>> if ($count == 4){
>> #print "Time: " . $line . "\n";
>> $time = $line;
>> }
>> elsif ($count == 7){
>> $channel = $line;
>> #print "Channel: " . $line . "\n";
>> }
>> elsif ($count == 19){
>> $title1 = $line;
>> #print "Program: " . $line . "\n";
>> }
>> elsif ($count == 20){
>> $line =~ s/ - //g;
>> $title2 = $line;
>> #print "Subtitle: " . $line . "\n";
>> }
>> elsif ($count == 21){
>> $line =~ s/\D//g;
>> $duration = $line;
>> #print "Run time: " . $line . "\n";
>> }
>> elsif ($count == 22){
>> $line =~ s/[^A-Z]//g;
>> $details = $line;
>> #print "Rating: " . $line . "\n";
>> }
>> elsif ($count == 26){
>> $line =~ s/ //g;
>> $genre = $line;
>> #print "Genre: " . $line . "\n";
>> }
>> elsif ($count == 28 && $line =~ /[a-zA-Z]/){
>> $descr = $line;
>> #print "Description: " . $line . "\n";
>> }
>> #elsif ($count == 26 && $line =~ /[a-zA-Z]/){
>> # $descr = $line;
>> # print "Description: " . $line . "\n";
>> #}
>> #print $count .": " . $line . "\n";
>> ++$count;
>> }
>>
>>
>> my $start_time = &UnixDate($time, "%H:%M");
>> # my $start_datetime = $start_date . " " . $start_time;
>> if ($start_time =~ /00:|01:|02:|03:|04:|05:/)
>> {
>> $start_date = &DateCalc($start_date, "+ 1 day");
>> }
>> $start_date = &UnixDate($start_date, "%Y%m%d");
>> my $end_time = &DateCalc($start_time, " + " . $duration . "minutes");
>> $end_time = &UnixDate($end_time, "%H:%M");
>>
>> my $end_date;
>> if (&Date_Cmp($start_time, $end_time) <= 0)
>> {
>> $end_date = $start_date;
>> }
>> else
>> {
>> my $err;
>> my $edate = &DateCalc($start_date, "+ 1 day", \$err);
>> $end_date = &UnixDate($edate, "%Y%m%d");
>> }
>>
>> if (defined $channels{$channel})
>> {
>> $channel = $XMLTV_prefix . $channels{$channel} . $XMLTV_suffix;
>> }
>> else
>> {
>> print "unknown channel $channel\n";
>> $retry=0;
>> next;
>> }
>>
>> my $start;
>> my $stop;
>>
>> $start = $start_date . &UnixDate($start_time,"%H%M") . "00 " . $offset;
>> $stop = $end_date . &UnixDate($end_time,"%H%M") . "00 " . $offset;
>>
>> my $a_prog = {
>> channel => $channel,
>> start => $start,
>> stop => $stop,
>> title => [ [ $title1, undef ] ]
>> };
>>
>> $descr =~ s/^\s+//;
>> $descr =~ s/\s+$//;
>>
>> if ($title2) { $$a_prog{'sub-title'} = [ [ $title2, undef ] ]; }
>> if ($descr) { $$a_prog{desc} = [ [ $descr, undef ] ]; }
>> if ($genre) { $$a_prog{category} = [ [ $genre, undef ] ]; }
>>
>> push @$prog_ref, $a_prog;
>> $retry=0;
>> }
>>
>> $currentday = &DateCalc($currentday, "+ 1 day");
>> $day_counter++;
>> }
>>
>> my $data = [
>> 'ISO-8859-1',
>> {
>> 'source-info-name' => 'http://tvguide.ninemsn.com.au/',
>> 'generator-info-name' => 'NineMSN grabber',
>> 'generator-info-url' => '',
>> 'generator-info-name' => "XMLTV - tv_grab_au NineMSN v0.2"
>> },
>> $chan_ref,
>> $prog_ref
>> ];
>>
>> my $hour=&UnixDate(&ParseDate("now"),"%H");
>> if ($hour < 6)
>> {
>> print "can't update between 0:00 and 6:00\n";
>> # If we update between these hours we lose any data we had up to 6:00.
>> # This is because the web site starts a day at 6:00 and ends at 6:00
>> the next day
>> # This could be fixed by read the previous days info and adding the
>> needed shows.
>> # I did try adding the whole previous day but got lots of
>> mythfilldatabase errors.
>> exit(1);
>> }
>>
>> print "writing file\n";
>>
>> my $fh = new IO::File ">$opt_output";
>> XMLTV::write_data($data, OUTPUT=>$fh);
>>
>> print "done\n";
>>
>> # subroutines
>> sub get_day
>> {
>> my $date = shift;
>> my $force = shift;
>> my $url = $guide_url . $date . "_" . $region . ".asp";
>>
>> my $guide_dir = $cache_dir . "/" . $date;
>> my $guide_file = $guide_dir . "/guide.html";
>> mkpath ($guide_dir);
>>
>> for (my $retry=0; (($force==1) || (!(-e $guide_file))) &&
>> is_error(getstore($url, $guide_file)) && ($retry<$retrys); $retry++)
>> {
>> print ".";
>> sleep($secondsbeforeretry);
>> }
>>
>> my @guide_lines;
>> if (open(GUIDE, $guide_file))
>> {
>> @guide_lines = <GUIDE>;
>> close(GUIDE);
>> }
>> else
>> {
>> @guide_lines = ();
>> print "giving up on $guide_file\n";
>> }
>> return @guide_lines;
>> }
>>
>> sub get_details
>> {
>> my $date = shift;
>> my $program_id = shift;
>>
>> my $url = $details_url . $program_id;
>> my $guide_dir = $cache_dir . "/" . $date;
>> my $details_file = $guide_dir . "/" . $program_id . ".html";
>> mkpath ($guide_dir);
>>
>> for (my $retry=0; (!(-e $details_file)) && is_error(getstore($url,
>> $details_file)) && ($retry<$retrys); $retry++)
>> {
>> print ".";
>> sleep($secondsbeforeretry);
>> }
>>
>> my @details_lines;
>> if (open(DETAILS, $details_file))
>> {
>> @details_lines = <DETAILS>;
>> close(DETAILS);
>> }
>> else
>> {
>> @details_lines = ();
>> print "giving up on $details_file\n";
>> }
>> return @details_lines;
>> }
>>
>> sub fetch_details
>> {
>> my $datepid=$datepids->dequeue;
>> my @datepidl=split /-/, $datepid;
>> my $date = $datepidl[0];
>> my $pid = $datepidl[1];
>>
>> while (($date!=0) and ($pid!=0))
>> {
>> my $guide_dir = $cache_dir . "/" . $date;
>> mkpath ($guide_dir);
>>
>> my $url = $details_url . $pid;
>> my $details_file = $guide_dir . "/" . $pid . ".html";
>>
>> for (my $retry=0; is_error(getstore($url, $details_file)) &&
>> ($retry<$retrys); $retry++)
>> {
>> sleep($secondsbeforeretry);
>> }
>>
>> $datepid=$datepids->dequeue;
>> @datepidl=split /-/, $datepid;
>> $date = $datepidl[0];
>> $pid = $datepidl[1];
>> }
>> }
>>
>> sub changed_guide
>> {
>> my $date = shift;
>> my @pidsrowspansnames = @_;
>>
>> my $guide_prn_file = $cache_dir . "/" . $date . "/guide.prn";
>> if (open(PRN, $guide_prn_file))
>> {
>> my @prn = split />/, <PRN>;
>> close(PRN);
>>
>> if (($#prn > 1) and ($#prn == $#pidsrowspansnames))
>> {
>> my $count;
>> my $diff = ((($#prn+1)*2)/3)-1;
>> for ($count=0; ($count <= $diff) &&
>> ($prn[$count]==$pidsrowspansnames[$count]); $count++)
>> { }
>>
>> if ($count==($diff+1))
>> {
>> for (; ($count <= $#prn) && ($prn[$count] eq
>> $pidsrowspansnames[$count]); $count++)
>> { }
>>
>> if ($count==($#prn+1))
>> {
>> print "$date unchanged\n";
>> return 0;
>> }
>> }
>> }
>> }
>>
>> print "$date downloading\n";
>>
>> if (open(PRN, ">", $guide_prn_file))
>> {
>> for (my $count=0; $count<$#pidsrowspansnames; $count++)
>> {
>> print PRN "$pidsrowspansnames[$count]>";
>> }
>> print PRN "$pidsrowspansnames[$#pidsrowspansnames]";
>> close(PRN);
>> }
>> else
>> {
>> print "can't open for writing $guide_prn_file\n";
>> }
>>
>> return 1;
>> }
>>

_______________________________________________
mythtv-users mailing list
mythtv-users [at] mythtv
http://mythtv.org/cgi-bin/mailman/listinfo/mythtv-users


paulx at andreassen

Nov 27, 2004, 11:08 PM

Post #5 of 30 (10269 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

On Thu, 2004-11-18 at 07:20, Michael Cheshire (Mailing Lists) wrote:
> Actually there were several more spelling mistakes..
>
> Woops :)

Hi,

Made some more changes.

It now converts &amp; -> & and &quot; -> " to fix some display issues.
There is also an option to convert all & to 'and' to fix cases where
names change week to week. "The Bold and the Beautiful" does this, not
that I watch "The Bold & the Beautiful".

I believe tv_grab_au also now handles 0:00 to 6:00 correctly. And it
deletes the last 7 days in the cache. It can miss some if not run for
more then 7 days.

Todo
- read / write config (could save alot of setup stuff in top of file)
- two different region codes
- do partial updates when day changes
- do updates when data for program changes (title2 and description)

The last todo is the most annoying because programs aren't recorded when
title2 and description are the same as last week. My only thought on
who to fix is to download everything each day or maybe only today. Any
suggestions?

Paul
Attachments: tv_grab_au.pl (18.3 KB)


phill_edwards at hotmail

Nov 29, 2004, 6:47 PM

Post #6 of 30 (10272 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

Thanks Paul - this certainly fixes the 00:00 to 06:00 issue. I was wondering
why mythfilldatabase didn't work when I was running it at 2am!

Well done and on behalf of the Aussie MythTV community, thanks very much for
this.

Regards,
Phill

>From: Paul Andreassen <paulx [at] andreassen>
>Reply-To: Discussion about mythtv <mythtv-users [at] mythtv>
>To: Discussion about mythtv <mythtv-users [at] mythtv>
>Subject: Re: [mythtv-users] New Australian XMLTV grabber
>Date: Sun, 28 Nov 2004 17:08:16 +1000
>
>On Thu, 2004-11-18 at 07:20, Michael Cheshire (Mailing Lists) wrote:
> > Actually there were several more spelling mistakes..
> >
> > Woops :)
>
>Hi,
>
>Made some more changes.
>
>It now converts &amp; -> & and &quot; -> " to fix some display issues.
>There is also an option to convert all & to 'and' to fix cases where
>names change week to week. "The Bold and the Beautiful" does this, not
>that I watch "The Bold & the Beautiful".
>
>I believe tv_grab_au also now handles 0:00 to 6:00 correctly. And it
>deletes the last 7 days in the cache. It can miss some if not run for
>more then 7 days.
>


paulx at andreassen

Dec 1, 2004, 6:18 AM

Post #7 of 30 (10263 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

Hi Mark,

I can't see how the offset is supposed to straddle daylight saving
change. Perldoc reports '"tz_local_offset()" determins the offset from
GMT time in seconds. It only does the calculation once.' Date::Manip
is also brain dead.

Modified the rating fix and seems to work.

Left out a few other changes that seem silly. Like changing cache_dir
and location from command line, and renaming channels.

Thanks for the changes,
Paul

-----Forwarded Message-----
> From: Mark Spieth <mspieth [at] dclabs>
> To: paulx [at] andreassen
> Subject: update to ninemsn grabber
> Date: Wed, 01 Dec 2004 14:49:31 +1100
>
> paul,
>
> based on your 17/11 one
> simple rating fix, auto timezone offset calc including straddle of
> daylightsavings change.
> also wanted to do auto region getting as this is in a list in the
first
> page but havent done it as Im too lazy.
> also want to put in an override for xmltvid so that I can fetch for
> foxtel.Melbourne.xxxx and freehd.Melbourne.yyy as the names are
slightly
> different.
>
> also a foxtel with conf one that grabs from www.foxtel.tv
> which works ok. can get listing a whole month ahead.
> conf is extension of d1 conf file to allow mapping of unknown d1
channels
> also since main and adult are same in analog, a special case for
> multichannels. takes about 20 min for a month of progs.
> post if you like. too many posts on users for me so only on dev. thus
I
> send it to you.
> may be able to parallelize gets too.
>
> I run them both after d1 fetch in cron so that if these fail at least
I
> have something.
>
> cheers
> mark
Attachments: tv_grab_au (19.0 KB)


paulx at andreassen

Dec 11, 2004, 5:28 AM

Post #8 of 30 (10181 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

On Thu, 2004-12-02 at 07:02, Mark Spieth wrote:
> tz_local_offset works on each time supplied individually. I tested it.
> works.
> thus 1am on last sund of oct is +1000 and 4am on same day is +1100 and diff
> on the dates gets 2 hours as it should be.
> cache dir set was mainly for testing so I could test in a private dir.
> main reason for renaming channes is so that it operates together with d1 as
> backup in case ninemsn change and the script stops working. same reason for
> the foxtel one. no myth db changes required.
> cheers
> mark

Hi Mark,

Could you try this version for daylight saving straddling. I believe
your version started about 15 hours early, due to Date::Manip getting
the timezone wrong. Why can't the perl people get there act together
and write some date routines that work with timezones?

Paul
Attachments: tv_grab_au (19.1 KB)


rob at hillis

Dec 12, 2004, 2:27 AM

Post #9 of 30 (10196 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

Paul Andreassen wrote:

> Could you try this version for daylight saving straddling. I believe
> your version started about 15 hours early, due to Date::Manip getting
> the timezone wrong. Why can't the perl people get there act together
> and write some date routines that work with timezones?

After far too long, I've changed over to this posted version of the
script - it's working fine for the freesd channels, but when I run an
appropriately modified copy for Foxtel, I end up with the following...

mythtv root # tv_grab_au_foxtel
grabing 7 days into /var/local/tv_grab_au_foxtel/guide.xml
starting 5 threads
loading queue
11122004 unchanged
12122004 unchanged
13122004 unchanged
14122004 unchanged
15122004 unchanged
16122004 downloading
Use of uninitialized value in string at /usr/bin/tv_grab_au_foxtel
line 722.
17122004 downloading
Use of uninitialized value in string at /usr/bin/tv_grab_au_foxtel
line 722.
18122004 downloading
Use of uninitialized value in string at /usr/bin/tv_grab_au_foxtel
line 722.
queue is complete
all threads done
building xml structure

At this point, the script just hangs. Doesn't do anything, even if left
alone for a few hours. I probably should point out that the Foxtel
channels aren't in my channels table (yet) though I doubt this is the
cause of the problem. A diff between the two scripts yeilds the
following...

mythtv bin # diff -u tv_grab_au tv_grab_au_foxtel
--- tv_grab_au 2004-12-12 18:58:37.059979392 +1100
+++ tv_grab_au_foxtel 2004-12-12 19:01:11.487502840 +1100
@@ -115,15 +115,15 @@
#my $location = "Brisbane";
#my $location = "Queensland";
#my $location = "Sydney";
-my $location = "Melbourne";
-#my $location = "Australia";
+#my $location = "Melbourne";
+my $location = "Australia";

# pick your source
#
#my $source = "free";
-my $source = "freesd";
+#my $source = "freesd";
#my $source = "freehd";
-#my $source = "foxtel";
+my $source = "foxtel";

# choose the XMLID URL suffix that mythtv knows
#
@@ -144,7 +144,7 @@

my $guide_url = "http://tvguide.ninemsn.com.au/guide/";
my $details_url = "http://tvguide.ninemsn.com.au/closeup/default.asp?pid=";
-my $cache_dir = "/var/local/tv_grab_au";
+my $cache_dir = "/var/local/tv_grab_au_foxtel";

my $XMLTV_prefix = $source . "." . $location . ".";
my $XMLTV_suffix = "." . $XMLTVID_URL;

I haven't seen anything else that should need to be reconfigured. Just
in case there's an extra couple of linefeeds in my script, line 722 is
the following one that is in bold. It's in the last function of the script.

if (open(PRN, ">", $guide_prn_file))
{
for (my $count=0; $count<$#pidsrowspansnames; $count++) {
print PRN "$pidsrowspansnames[$count]>";
}
* print PRN "$pidsrowspansnames[$#pidsrowspansnames]";
* close(PRN);
} else {
print "can't open for writing $guide_prn_file\n";
}

I am as sure as I can be that I have copied the script verbatim. My
knowledge of PERL is near zero, though I know enough C/C++ to be able to
make some sense of the code. That line does look a little odd, though
I'd be willing to bet the difference between $pids.. and $#pids is like
the difference between a variable and a pointer, so I may well be wrong.

Is anyone else running in to this? Is it just me?


paulx at andreassen

Dec 12, 2004, 6:55 AM

Post #10 of 30 (10189 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

On Sun, 2004-12-12 at 20:27, Rob Hillis wrote:
> After far too long, I've changed over to this posted version of the
> script

Excellent.

> - it's working fine for the freesd channels, but when I run an
> appropriately modified copy for Foxtel, I end up with the following...
> mythtv root # tv_grab_au_foxtel
> grabing 7 days into /var/local/tv_grab_au_foxtel/guide.xml
> starting 5 threads
> loading queue
> 11122004 unchanged
> 12122004 unchanged
> 13122004 unchanged
> 14122004 unchanged
> 15122004 unchanged
> 16122004 downloading
> Use of uninitialized value in string at
> /usr/bin/tv_grab_au_foxtel line 722.

This is probably caused by have no data to download.

> 17122004 downloading
> Use of uninitialized value in string at
> /usr/bin/tv_grab_au_foxtel line 722.
> 18122004 downloading
> Use of uninitialized value in string at
> /usr/bin/tv_grab_au_foxtel line 722.
> queue is complete
> all threads done
> building xml structure
> At this point, the script just hangs.

The hang is probably because the files aren't closed and then can't be
opened.

> Doesn't do anything, even if left alone for a few hours. I probably
> should point out that the Foxtel channels aren't in my channels table
> (yet) though I doubt this is the cause of the problem. A diff between
> the two scripts yeilds the following...
>
> mythtv bin # diff -u tv_grab_au tv_grab_au_foxtel
> --- tv_grab_au 2004-12-12 18:58:37.059979392 +1100
> +++ tv_grab_au_foxtel 2004-12-12 19:01:11.487502840 +1100
> @@ -115,15 +115,15 @@
> #my $location = "Brisbane";
> #my $location = "Queensland";
> #my $location = "Sydney";
> -my $location = "Melbourne";
> -#my $location = "Australia";
> +#my $location = "Melbourne";
> +my $location = "Australia";
>
> # pick your source
> #
> #my $source = "free";
> -my $source = "freesd";
> +#my $source = "freesd";
> #my $source = "freehd";
> -#my $source = "foxtel";
> +my $source = "foxtel";
>
> # choose the XMLID URL suffix that mythtv knows
> #
> @@ -144,7 +144,7 @@
>
> my $guide_url = "http://tvguide.ninemsn.com.au/guide/";
> my $details_url =
> "http://tvguide.ninemsn.com.au/closeup/default.asp?pid=";
> -my $cache_dir = "/var/local/tv_grab_au";
> +my $cache_dir = "/var/local/tv_grab_au_foxtel";
>
> my $XMLTV_prefix = $source . "." . $location . ".";
> my $XMLTV_suffix = "." . $XMLTVID_URL;
>

Look good to me.

> I haven't seen anything else that should need to be reconfigured.
> Just in case there's an extra couple of linefeeds in my script, line
> 722 is the following one that is in bold. It's in the last function
> of the script.
>
> if (open(PRN, ">", $guide_prn_file))
> {
> for (my $count=0; $count<$#pidsrowspansnames;
> $count++) {
> print PRN "$pidsrowspansnames[$count]>";
> }
> print PRN "$pidsrowspansnames[$#pidsrowspansnames]";

I believe this is the line giving grieve.

> close(PRN);
> } else {
> print "can't open for writing $guide_prn_file\n";
> }
>
> I am as sure as I can be that I have copied the script verbatim. My
> knowledge of PERL is near zero, though I know enough C/C++ to be able
> to make some sense of the code.

Read the 'man perlintro' and you should be ok.

> That line does look a little odd, though I'd be willing to bet the
> difference between $pids.. and $#pids is like the difference between a
> variable and a pointer, so I may well be wrong.

Perl is much more convoluted than that.

> Is anyone else running in to this? Is it just me?

Not may people use the foxtel stuff.

I've attached a fixed version and here is a test output:

grabing 1 days into /var/local/tv_grab_au.foxtel/guide.xml
starting 5 threads
loading queue
18122004 downloading
19122004 downloading
queue is complete
all threads done
building xml structure
no pids in /var/local/tv_grab_au.foxtel/18122004/guide.prn
no pids in /var/local/tv_grab_au.foxtel/19122004/guide.prn
writing file
cleaning 7 days
done

Paul
Attachments: tv_grab_au (19.4 KB)


paulx at andreassen

Dec 12, 2004, 7:02 AM

Post #11 of 30 (10168 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

Hi Rob,

Here is a foxtel grabber from Mark. He sent it to me awhile ago because
he isn't on the myth-user mail list.

Paul

-----Forwarded Message-----
> From: Mark Spieth <mspieth [at] dclabs>
> To: paulx [at] andreassen
> Subject: update to ninemsn grabber
> Date: Wed, 01 Dec 2004 14:49:31 +1100
>
> also a foxtel with conf one that grabs from www.foxtel.tv
> which works ok. can get listing a whole month ahead.
> conf is extension of d1 conf file to allow mapping of unknown d1
channels
> also since main and adult are same in analog, a special case for
> multichannels. takes about 20 min for a month of progs.
> post if you like. too many posts on users for me so only on dev. thus
I
> send it to you.
> may be able to parallelize gets too.
>
> I run them both after d1 fetch in cron so that if these fail at least
I
> have something.
>
> cheers
> mark
Attachments: foxtel.conf (2.68 KB)
  foxtel.tv_grab_au.pl (12.6 KB)


rob at hillis

Dec 13, 2004, 1:39 AM

Post #12 of 30 (10167 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

Paul Andreassen wrote:

>On Sun, 2004-12-12 at 20:27, Rob Hillis wrote:
>
>
>>After far too long, I've changed over to this posted version of the
>>script
>>
>>
>Excellent.
>
>
So far, that's been my reaction too. I'm pleasantly surprised by it's speed over the old screen-scraper grabbers - probably one of the major reasons I put off the move for this long.

>> - it's working fine for the freesd channels, but when I run an
>>appropriately modified copy for Foxtel, I end up with the following...
>> [...]
>> Use of uninitialized value in string at
>> /usr/bin/tv_grab_au_foxtel line 722.
>>
>>
>This is probably caused by have no data to download.
>
>
That was my initial guess, though I couldn't be sure.

>>Doesn't do anything, even if left alone for a few hours. I probably
>>should point out that the Foxtel channels aren't in my channels table
>>(yet) though I doubt this is the cause of the problem. A diff between
>>the two scripts yeilds the following...
>>
>>[...]
>>
>Look good to me.
>
>
I didn't think there was a problem there, but best to be sure.

>> print PRN "$pidsrowspansnames[$#pidsrowspansnames]";
>>
>>
>I believe this is the line giving grieve.
>
>
Yep, that was the one. Apparently Thunderbird didn't think to highlight
it in any way when converting to text only...

>>I am as sure as I can be that I have copied the script verbatim. My
>>knowledge of PERL is near zero, though I know enough C/C++ to be able
>>to make some sense of the code.
>>
>>
>Read the 'man perlintro' and you should be ok.
>
>
Perl makes me nervous... I've heard multiple times that it can be far
more complex and unreadable than C/C++, and given what I've learnt about
regular expressions, I have no problem believing that... :) I have no
doubt what you lose in readability and simplicity, you pick up in
bucketloads in flexibility, but it *does* make for a steep learning curve.

>> That line does look a little odd, though I'd be willing to bet the
>>difference between $pids.. and $#pids is like the difference between a
>>variable and a pointer, so I may well be wrong.
>>
>>
>Perl is much more convoluted than that.
>
>
That's what I was afraid of... :)

>>Is anyone else running in to this? Is it just me?
>>
>>
>Not may people use the foxtel stuff.
>
>
Silly think is that I'm not - yet. I've been asked if I want to share
the cost of a Foxtel connection with my father (it's incredible - you
move out of home into your own place and one of your parents moves with
you...) Since I work rather a lot, I'm not interested if I have a lot
of trouble integrating Foxtel into MythTV, so I was starting to cover
the bases by adding channels to see what's on...

I haven't started toying with how to get the audio/video to MythTV yet,
whether it be via SVIDEO/composite to a BT878 card or via a DVB-S card
with a CAM. Obviously the latter would be vastly preferable, though
being a consortia essentially between channel 9 and Tel$tra, I have no
doubt that my wishes have been taken into account and a product
providing exactly the opposite will be provided.

>I've attached a fixed version and here is a test output:
>
>
I'm trying that at the moment. Whilst I'm no longer getting the errors,
the hanging may not yet be fixed... Now I'm getting:-

mythtv bin # tv_grab_au_foxtel
grabing 7 days into /var/local/tv_grab_au_foxtel/guide.xml
starting 5 threads
loading queue
12122004 unchanged
13122004 downloading
14122004 downloading
15122004 downloading
16122004 downloading
17122004 downloading
18122004 downloading
19122004 downloading
queue is complete

Repeated du -h commands in the cache directory does not reveal
incremental disk usage. It's only been running for around ten minutes,
so I'll leave it alone for a while and see what transpires...


rob at hillis

Dec 13, 2004, 2:38 AM

Post #13 of 30 (10199 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

Rob Hillis wrote:

>>I've attached a fixed version and here is a test output:
>>
>>
> I'm trying that at the moment. Whilst I'm no longer getting the
> errors, the hanging may not yet be fixed... Repeated du -h commands
> in the cache directory does not reveal incremental disk usage. It's
> only been running for around ten minutes, so I'll leave it alone for a
> while and see what transpires...

Just me being impatient. I didn't realise there was that much garbage
on Foxtel... 40mb+ for 5 days so far...


bam at snoopy

Dec 13, 2004, 3:31 PM

Post #14 of 30 (10176 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

>>>>> "Rob" == Rob Hillis <rob [at] hillis> writes:

Rob> Perl makes me nervous... I've heard multiple times that it
Rob> can be far more complex and unreadable than C/C++, and given
Rob> what I've learnt about regular expressions, I have no problem
Rob> believing that... :) I have no doubt what you lose in
Rob> readability and simplicity, you pick up in bucketloads in
Rob> flexibility, but it *does* make for a steep learning curve.

It is possible to have readable Perl code just as it is possible to
have unreadable C++ code. It is also possible to have regular
expressions (using an appropriate library) in C++.

Having said that, it seems much simpler to write unreadable Perl code
then to write unreadable C++ code when you are attempting to do the
opposite...

Rob> That line does look a little odd, though I'd be
Rob> willing to bet the difference between $pids.. and $#pids is
Rob> like the difference between a variable and a pointer, so I
Rob> may well be wrong.

I have heard all of this is going to change in the next major version
of Perl, when it finally gets released (read: years).

I am going to assume @pids is an array, e.g.:

my @pids;

If so, then you use

$pids[0] to get the first element, or:

$#pids to get the last index of pids (no this is not the total number
of pids),

$pids[$#pids] gets the last element of pids. This is also an example
of what I mean when I say "unreadable"!

Rob> Perl is much more convoluted than that.

The problem with perl as far as readability is concerned, not only
does it have a numerous ways of doing exactly the same thing, but its
goes all these weird symbols, and I can never remember what each one means,

eg:

( ) vs [ ] vs { }

$# _ $@ $_ @$_ $>

-o -w -x -d

"" vs '' vs q{} vs qq{} vs qx{} vs qw{} vs etc

@ vs $ vs %

== vs eq vs != vs ne

Ok, maybe I am going to extremes here, I actually have memorized some
of these, but the fact remains it isn't obvious to a non Perl expert
what these mean without looking up the documentation (and finding the
appropriate spot in the documentation too!).

Personally, I much prefer a compiled language because the compiler is
able to pick up on many mistakes that perl won't detect until you run
the code in question. However, I still use perl for quick and dirty
scripts.

Hope some of this helps....
--
Brian May <bam [at] snoopy>


paulx at andreassen

Dec 13, 2004, 11:12 PM

Post #15 of 30 (10166 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

On Tue, 2004-12-14 at 09:31, Brian May wrote:
> The problem with perl as far as readability is concerned, not only
> does it have a numerous ways of doing exactly the same thing, but its
> goes all these weird symbols, and I can never remember what each one means,
>
> eg:
> ( ) vs [ ] vs { }
> $# _ $@ $_ @$_ $>
> -o -w -x -d
> "" vs '' vs q{} vs qq{} vs qx{} vs qw{} vs etc
> @ vs $ vs %
> == vs eq vs != vs ne
>
> Ok, maybe I am going to extremes here, I actually have memorized some
> of these, but the fact remains it isn't obvious to a non Perl expert
> what these mean without looking up the documentation (and finding the
> appropriate spot in the documentation too!).
>
> Personally, I much prefer a compiled language because the compiler is
> able to pick up on many mistakes that perl won't detect until you run
> the code in question. However, I still use perl for quick and dirty
> scripts.
>
> Hope some of this helps....

Gee, I had just started to do some perl programing and now I find out
its alot more complex then I had hoped.

I like c / c++ very much but it lags behind the speed with which I can
write complex operations like http page getting and parsing. And the
standard libraries still have functions that are thread unsafe and
possibly buffer overflow problems.

Thanks
Paul


bend at bedel

Dec 14, 2004, 2:22 AM

Post #16 of 30 (10135 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

>>
>
> Gee, I had just started to do some perl programing and now I find out
> its alot more complex then I had hoped.
>
> I like c / c++ very much but it lags behind the speed with which I can
> write complex operations like http page getting and parsing. And the
> standard libraries still have functions that are thread unsafe and
> possibly buffer overflow problems.
>

Hi Paul
I might suggest using python instead of perl, its hard to write
unreadable python code. Its taken me about 1 day to write a parser for
the nine msn guide with it.


paulx at andreassen

Dec 17, 2004, 4:01 AM

Post #17 of 30 (10131 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

On Tue, 2004-12-14 at 20:22, Ben de Luca wrote:
> Hi Paul
> I might suggest using python instead of perl, its hard to write
> unreadable python code. Its taken me about 1 day to write a parser for
> the nine msn guide with it.

Thanks Ben,

I'm looking into writing a Python grabber using the
http://pytvgrab.sourceforge.net/ library as a base.

It may take awhile due to a bit of planning. ;-)

Here is the lastest perl version. Progress on this has slowed.

Paul
Attachments: tv_grab_au (20.2 KB)


bam at snoopy

Dec 17, 2004, 2:20 PM

Post #18 of 30 (10124 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

>>>>> "Paul" == Paul Andreassen <paulx [at] andreassen> writes:

Paul> It may take awhile due to a bit of planning. ;-)

Your going to *plan* first?

Isn't the traditional open source strategy to write code first until
you end up in a complete and non-recoverable mess ? ;-)

Good luck with the Python rewrite.
--
Brian May <bam [at] snoopy>


bend at bedel

Dec 19, 2004, 3:59 AM

Post #19 of 30 (10117 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

planning we dont need no stinking planing
Attachments: tv_grab_au.py (12.3 KB)


paulx at andreassen

Dec 19, 2004, 4:58 AM

Post #20 of 30 (10120 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

On Sat, 2004-12-18 at 08:20, Brian May wrote:
> >>>>> "Paul" == Paul Andreassen <paulx [at] andreassen> writes:
> Paul> It may take awhile due to a bit of planning. ;-)
>
> Your going to *plan* first?
>
> Isn't the traditional open source strategy to write code first until
> you end up in a complete and non-recoverable mess ? ;-)

I find there is nothing wrong with the traditional open source method
because the more I plan the less I work. :-P

On Sun, 2004-12-19 at 21:59, Ben de Luca wrote:
> planning we dont need no stinking planing
>
> ______________________________________________________________________
> here is my complete mess, but in python

What I can't get over with this beautiful Python program is how closely
it follows the Perl version.

I going away for two week over Christmas so Merry Christmas.
Paul


bend at bedel

Dec 19, 2004, 7:54 AM

Post #21 of 30 (10105 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

>>
>
> What I can't get over with this beautiful Python program is how closely
> it follows the Perl version.


I only spent a few minutes looking at the perl program before it hurt
my eyes


merry xmas

> I going away for two week over Christmas so Merry Christmas.
> Paul


michael-mlists at cheshire

Dec 19, 2004, 1:42 PM

Post #22 of 30 (10141 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

Why not a version which pulls Foxtel listings off www.foxtel.com - they post
1 month (!) in advance :)

M.
----- Original Message -----
From: "Paul Andreassen" <paulx [at] andreassen>
To: "Discussion about mythtv" <mythtv-users [at] mythtv>
Sent: Friday, December 17, 2004 10:31 PM
Subject: Re: [mythtv-users] New Australian XMLTV grabber


> On Tue, 2004-12-14 at 20:22, Ben de Luca wrote:
>> Hi Paul
>> I might suggest using python instead of perl, its hard to write
>> unreadable python code. Its taken me about 1 day to write a parser for
>> the nine msn guide with it.
>
> Thanks Ben,
>
> I'm looking into writing a Python grabber using the
> http://pytvgrab.sourceforge.net/ library as a base.
>
> It may take awhile due to a bit of planning. ;-)
>
> Here is the lastest perl version. Progress on this has slowed.
>
> Paul
>
>


--------------------------------------------------------------------------------


_______________________________________________
mythtv-users mailing list
mythtv-users [at] mythtv
http://mythtv.org/cgi-bin/mailman/listinfo/mythtv-users


bend at bedel

Dec 19, 2004, 7:30 PM

Post #23 of 30 (10060 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

On 20/12/2004, at 8:42 AM, Michael Cheshire ((Mailing Lists)) wrote:

> Why not a version which pulls Foxtel listings off www.foxtel.com -
> they post 1 month (!) in advance :)
>

unfortunately there is no FTA stuff there, so you might not see me
writing it.


paulx at andreassen

Dec 20, 2004, 11:02 PM

Post #24 of 30 (10114 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

Hi Australian Myth Users,

This version is FIVE TIMES FASTER. I was having trouble with my router
overflowing its NAT because the old version was leaving the ip
connections open. Its was probably also doing a DOS on the web site,
although only in a small way. It now takes 7:46 minutes instead of
36:33 minutes.

Later
Paul
Attachments: tv_grab_au (20.4 KB)


hamish at cloud

Dec 28, 2004, 9:40 PM

Post #25 of 30 (10039 views)
Permalink
Re: New Australian XMLTV grabber [In reply to]

On Tue, Dec 21, 2004 at 05:02:07PM +1000, Paul Andreassen wrote:
> This version is FIVE TIMES FASTER. I was having trouble with my router
> overflowing its NAT because the old version was leaving the ip
> connections open. Its was probably also doing a DOS on the web site,
> although only in a small way. It now takes 7:46 minutes instead of
> 36:33 minutes.

With 5 threads, isn't it DOSing their web site every time?


Hamish
--
Hamish Moffatt VK3SB <hamish [at] debian> <hamish [at] cloud>

First page Previous page 1 2 Next page Last page  View All MythTV users RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.