Gossamer Forum

Re: [Andy] DMOZ import question

Andy,

I checked my script and it had errors from when I saved it (&quot; instead of ", oops). So I re-uploaded it without the errors and ran it again.

I ran this command:

perl /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi

Here are the new errors:

Backslash found where operator expected at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 4, near "html \"
(Do you need to predeclare html?)
Backslash found where operator expected at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 4, near "n\"
Operator or semicolon missing before &quot at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 4.
Ambiguous use of & resolved as operator & at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 4.
Scalar found where operator expected at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 8, near "$full"
(Missing semicolon on previous line?)
Semicolon seems to be missing at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 31.
Precedence problem: open Sendmail should be open(Sendmail) at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 32.
Operator or semicolon missing before &quot at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 32.
Ambiguous use of & resolved as operator & at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 32.
syntax error at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 3, near "br>"
syntax error at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 4, near "type:"
syntax error at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 5, near "br>"
syntax error at /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi line 13, near "br>"
Execution of /home/virtual/site2/fst/var/www/cgi-bin/dmoz.cgi aborted due to compilation errors.

Here is the script:

#!/usr/bin/perl

print "Content-type: text/html \n\n";

# helps us catch nasty errors
use CGI::Carp qw(fatalsToBrowser);

$full = 1; # if only wanting everything bar regional and world...use this!

######################################################
# GET THE DUMP FILE STARTS HERE ######################
######################################################

# get rid of the old file... #

# unlink "content.rdf.u8";

# $main_rdf_start_time = time;

# `wget --no-directories http://dmoz.org/rdf/content.rdf.u8.gz`;

# `gzip -d content.rdf.u8.gz`;

# finished with rdf.u8.gz, so delete now...keep space!

# unlink "content.rdf.u8.gz";

#$main_rdf_end_time = time;

#$main_rdf_total_time = $main_rdf_end_time - $main_rdf_start_time;

# open(MAIL,"|/usr/sbin/sendmail -t") || die &error("Unable to open Sendmail. Reason: $!");
# $webmaster = 'webmaster@assistantdirectors.com';
# print MAIL "To: $webmaster \n";
# print MAIL "From: $webmaster \n";
# print MAIL "Reply-to: $webmaster \n";
# print MAIL "Subject: RE Dump... \n\n";
# print MAIL "content.rdf.u8.gz has successfully been downloaded and decompressed. Took $main_rdf_total_time\n";
# print MAIL "\n \n Thanks";
# print MAIL "\n";
# print MAIL "A.J.Newby \n";
# print MAIL "Ace Installer \n";
# close(MAIL);

###################################################
### END THE GETTING OF THE MAIN DUMP FILE #########
###################################################

##################################################
### CUT THE DUMP INTO 17 SMALLER CATEGORIES ######
##################################################

$categories = "Top\/Adult::Top\/Arts";
$categories .= "~Top\/Arts::Top\/Business";
$categories .= "~Top\/Business::Top\/Computers";
$categories .= "~Top\/Computers::Top\/Games";
$categories .= "~Top\/Games::Top\/Health";
$categories .= "~Top\/Health::Top\/Home";
$categories .= "~Top\/News::Top\/Recreation";
$categories .= "~Top\/Reference::Top\/Regional";
$categories .= "~Top\/Regional::Top\/Science";
$categories .= "~Top\/Science::Top\/Shopping";
$categories .= "~Top\/Shopping::Top\/Society";
$categories .= "~Top\/Sports::Top\/World";
$categories .= "~Top\/Home::Top\/Kids_and_Teens";

@categories = split("~", $categories); # now loop through them all....

foreach (@categories) {
    @aaa = split("::", $_);
    $start_line = $aaa[0];
    $end_line = $aaa[1];
    $file_save = lc($start_line);
    $file_save =~ s/Top//i;

    # open up the main dmoz dump u8 file
    open(DMOZ, "./content.rdf.u8") || &error("Unable to read dump file. Reason: $!");

    # category
    open(CLEAN_DUMP, ">./$file_save.dump.slice");
    print CLEAN_DUMP ""; close(CLEAN_DUMP); # to make the file blank...
    open(DUMP_FILE, ">>./$file_save.dump.slice") or &error("cant do it: $! : ./$file_save.dump.slice"); # open ready for input....

    # start a while..not closed til right near the end...
    $do = 0;
    while (<DMOZ>) {
        # doing the arts category only needs this...then if the line matches the regex we are moved onto the next category..
        # check to see when we wanna start, otherwise use next;
        if ($start_line) {
            if ($_ =~ /<Topic r:id=\"$start_line\">/) { $do = 1; }
        }
        if ($_ =~ /<Topic r:id=\"$end_line\">/) { close(DUMP_FILE); &import_done_email($start_line); last; }
        else { if ($do) { print DUMP_FILE "$_\n"; } }
    } # end the while

    close(DMOZ); # close up the main file...

} # end the foreach


sub import_done_email {

    my $cat = shift;
    open(MAIL,"|/usr/sbin/sendmail -t") || die &error("Unable to open Sendmail. Reason: $!");
    $webmaster = 'webmaster@assistantdirectors.com';
    print MAIL "To: $webmaster \n";
    print MAIL "From: $webmaster \n";
    print MAIL "Reply-to: $webmaster \n";
    print MAIL "Subject: RE Main $cat Dump... \n\n";
    print MAIL "$cat has now been imported into the SQL database.... \n";
    print MAIL "\n \n Thanks";
    print MAIL "\n";
    print MAIL "A.J.Newby \n";
    print MAIL "Ace Installer \n";
    close(MAIL);
}


# error in case stuff goes wrong...
sub error {
    my ($error) = shift;
    print $error; exit;
}

Could the problem be that I already have the unzipped content.rdf.u8 on my server?

Thanks

Lennie