Andy,
I checked my script and found it had picked up errors when I saved it (&quot; instead of ", oops). So I re-uploaded it without those errors and ran it again.
I ran this command: perl /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi
Here are the new errors:
Backslash found where operator expected at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 4, near "html \"
(Do you need to predeclare html?)
Backslash found where operator expected at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 4, near "n\"
Operator or semicolon missing before " at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 4.
Ambiguous use of & resolved as operator & at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 4.
Scalar found where operator expected at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 8, near "$full"
(Missing semicolon on previous line?)
Semicolon seems to be missing at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 31.
Precedence problem: open Sendmail should be open(Sendmail) at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 32.
Operator or semicolon missing before " at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 32.
Ambiguous use of & resolved as operator & at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 32.
syntax error at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 3, near "br>"
syntax error at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 4, near "type:"
syntax error at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 5, near "br>"
syntax error at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 13, near "br>"
Execution of /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi aborted due to compilation errors.
Here is the script:
#!/usr/bin/perl
print "Content-type: text/html \n\n";
# helps us catch nasty errors
use CGI::Carp qw(fatalsToBrowser);
$full = 1; # if only wanting everything bar regional and world...use this!
######################################################
# GET THE DUMP FILE STARTS HERE ######################
######################################################
# get rid of the old file... #
# unlink "content.rdf.u8";
# $main_rdf_start_time = time;
# `wget --no-directories http://dmoz.org/rdf/content.rdf.u8.gz`;
# `gzip -d content.rdf.u8.gz`;
# finished with rdf.u8.gz, so delete now...keep space!
# unlink "content.rdf.u8.gz";
#$main_rdf_end_time = time;
#$main_rdf_total_time = $main_rdf_end_time - $main_rdf_start_time;
# open(MAIL,"|/usr/sbin/sendmail -t") || die &error("Unable to open Sendmail. Reason: $!");
# $webmaster = 'webmaster@assistantdirectors.com';
# print MAIL "To: $webmaster \n";
# print MAIL "From: $webmaster \n";
# print MAIL "Reply-to: $webmaster \n";
# print MAIL "Subject: RE Dump... \n\n";
# print MAIL "content.rdf.u8.gz has successfully been downloaded and decompressed. Took $main_rdf_total_time\n";
# print MAIL "\n \n Thanks";
# print MAIL "\n";
# print MAIL "A.J.Newby \n";
# print MAIL "Ace Installer \n";
# close(MAIL);
###################################################
### END THE GETTING OF THE MAIN DUMP FILE #########
###################################################
##################################################
### CUT THE DUMP INTO 17 SMALLER CATEGORIES ######
##################################################
$categories = "Top\/Adult::Top\/Arts";
$categories .= "~Top\/Arts::Top\/Business";
$categories .= "~Top\/Business::Top\/Computers";
$categories .= "~Top\/Computers::Top\/Games";
$categories .= "~Top\/Games::Top\/Health";
$categories .= "~Top\/Health::Top\/Home";
$categories .= "~Top\/News::Top\/Recreation";
$categories .= "~Top\/Reference::Top\/Regional";
$categories .= "~Top\/Regional::Top\/Science";
$categories .= "~Top\/Science::Top\/Shopping";
$categories .= "~Top\/Shopping::Top\/Society";
$categories .= "~Top\/Sports::Top\/World";
$categories .= "~Top\/Home::Top\/Kids_and_Teens";
@categories = split("~", $categories); # now loop through them all....
foreach (@categories) {
@aaa = split("::", $_);
$start_line = $aaa[0];
$end_line = $aaa[1];
$file_save = lc($start_line);
$file_save =~ s/Top//i; # open up the main dmoz dump u8 file
open(DMOZ, "./content.rdf.u8") || &error("Unable to read dump file. Reason: $!"); # category
open(CLEAN_DUMP, ">./$file_save.dump.slice");
print CLEAN_DUMP ""; close(CLEAN_DUMP); # to make the file blank...
open(DUMP_FILE, ">>./$file_save.dump.slice") or &error("cant do it: $! : ./$file_save.dump.slice"); # open ready for input....
# start a while..not closed til right near the end...
$do = 0;
while (<DMOZ>) {
# doing the arts category only needs this...then if the lines matches the regex we are moved onto the next category..
# check to see when we wanna start, otherwise use next;
if ($start_line) {
if ($_ =~ /<Topic r:id=\"$start_line\">/) { $do = 1; }
}
if ($_ =~ /<Topic r:id=\"$end_line\">/) { close(DUMP_FILE); &import_done_email($start_line); last; }
else { if ($do) { print DUMP_FILE "$_\n"; } }
} # end the while
close(DMOZ); # close up the main file...
} # end the foreach
sub import_done_email {
my $cat = shift;
open(MAIL,"|/usr/sbin/sendmail -t") || die &error("Unable to open Sendmail. Reason: $!");
$webmaster = 'webmaster@assistantdirectors.com';
print MAIL "To: $webmaster \n";
print MAIL "From: $webmaster \n";
print MAIL "Reply-to: $webmaster \n";
print MAIL "Subject: RE Main $cat Dump... \n\n";
print MAIL "$cat has now been imported into the SQL database.... \n";
print MAIL "\n \n Thanks";
print MAIL "\n";
print MAIL "A.J.Newby \n";
print MAIL "Ace Installer \n";
close(MAIL);
}
# error in case stuff goes wrong...
sub error {
my ($error) = shift;
print $error; exit;
}
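To make the slicing part easier to follow, here is a minimal standalone sketch of how the category list gets turned into start/end markers and slice file names. The sub name parse_category_pairs is made up for illustration and is not in the script above; it only mirrors the split on ~ and :: and the way $file_save is derived, and it does not touch content.rdf.u8 at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of dmoz.cgi): turn a spec such as
# "Start::End~Start::End" into [start marker, end marker, slice file] triples.
sub parse_category_pairs {
    my ($spec) = @_;
    my @pairs;
    foreach my $entry (split /~/, $spec) {
        my ($start, $end) = split /::/, $entry;
        # Same idea as $file_save in the script: lowercase, then drop "top".
        (my $file = lc $start) =~ s/^top//;    # "Top/Arts" becomes "/arts"
        push @pairs, [$start, $end, ".$file.dump.slice"];
    }
    return @pairs;
}

# Two of the pairs from the script, written without the extra backslashes.
my @pairs = parse_category_pairs("Top/Adult::Top/Arts~Top/Arts::Top/Business");
foreach my $p (@pairs) {
    print "copy from <Topic r:id=\"$p->[0]\"> up to <Topic r:id=\"$p->[1]\"> into $p->[2]\n";
}
```

Each triple is what one pass of the foreach loop works with: start copying at the first marker, stop at the second, and write everything in between to the slice file.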
Could the problem be that I already have the unzipped content.rdf.u8 on my server?
Thanks
Lennie