I know a lot of people are interested in importing from DMOZ and want to know how much server space is needed, how long it takes and so on, so I thought I'd share a little information.
Right, over the last two days I've been importing from DMOZ, and below are the steps you need to follow, plus some other useful information.
First, you obviously need Links SQL installed. The next step is to get the content.rdf file from the DMOZ website. It is 139MB gzipped and 700MB unzipped, so make sure you have enough space on your server before you start.
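If you're not sure how much room you have, you can check from the shell before downloading anything. This is a minimal sketch in POSIX shell; `enough_space` is a hypothetical helper name, and the 800MB figure is the total quoted in the stats (the 700MB unzipped file plus roughly 100MB of database).

```shell
# enough_space DIR NEEDED_MB  (hypothetical helper)
# Succeeds (exit 0) if the filesystem holding DIR has at least NEEDED_MB
# megabytes free. df -P gives one line per filesystem and -k reports 1K
# blocks; column 4 of the second line is the available space.
enough_space() {
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( $2 * 1024 )) ]
}

if enough_space . 800; then
    echo "OK: at least 800MB free here"
else
    echo "Not enough space for content.rdf plus the database" >&2
fi
```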
To get the content.rdf file onto your server, log in to your telnet account and type:
wget ftp://ftp.dmoz.org/rdf/content.rdf.u8.gz
After about 20 minutes (depending on your line speed), you should have the 139MB file on your server.
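A 139MB transfer can get cut short, and a truncated archive would only show up partway through the import. If the transfer is interrupted, re-running wget with `-c` (`--continue`) should resume the partial download. Before unzipping, you can test the archive with `gzip -t`, which decompresses it in memory and discards the output. The helper name `check_gz` below is hypothetical; this is just a sketch.

```shell
# check_gz FILE  (hypothetical helper)
# gzip -t reads and decompresses FILE without writing anything to disk,
# and exits non-zero if the archive is corrupt or incomplete.
check_gz() {
    if gzip -t "$1" 2>/dev/null; then
        echo "$1: archive OK"
    else
        echo "$1: archive damaged or incomplete, download it again" >&2
        return 1
    fi
}

# Usage: check_gz content.rdf.u8.gz
```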
Next you need to type:
gzip -d content.rdf.u8.gz
This unzips the file to its full 700MB size. This step is optional: Parse_RDF.pl can unzip the file for you if you tell it to in the script.
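If you leave the file compressed and let Parse_RDF.pl handle it, you can still see how big it is unzipped without writing it to disk, by streaming it through `gzip -dc` and counting the bytes. A sketch, with `uncompressed_mb` as a hypothetical helper name:

```shell
# uncompressed_mb FILE.gz  (hypothetical helper)
# Decompresses to a pipe and counts the bytes, so nothing extra is
# written to disk. Prints the size in whole megabytes (rounded down).
uncompressed_mb() {
    bytes=$(gzip -dc "$1" | wc -c)
    echo $(( $bytes / 1048576 ))
}

# Usage: uncompressed_mb content.rdf.u8.gz
```

Note that `gzip -d` replaces content.rdf.u8.gz with content.rdf.u8 once it finishes, so afterwards only the unzipped copy is left on disk.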
Next, upload Parse_RDF.pl to your server, chmod it to 755, and edit the variables inside the script, such as which category you want to import and the path to your content.rdf file.
Back in telnet, type:
perl Parse_RDF.pl
This will begin the import.
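The Arts import alone took about an hour, and if your telnet session drops in the meantime the import may be killed along with it. One option is to start it under `nohup`, which makes the process ignore the hangup signal, and send its output to a log file. This is a sketch; `run_detached` and the log file name are hypothetical.

```shell
# run_detached LOGFILE COMMAND...  (hypothetical helper)
# Starts COMMAND in the background, immune to hangups, with stdout and
# stderr written to LOGFILE, and prints the process ID.
run_detached() {
    log=$1; shift
    nohup "$@" > "$log" 2>&1 &
    echo "Started PID $! (log: $log)"
}

# Usage: run_detached import.log perl Parse_RDF.pl
#        tail -f import.log     # watch progress; Ctrl-C stops tail only
```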
When the import is complete, re-index your directory from the admin area (it only takes about 30 seconds), then rebuild from telnet using:
perl nph-build.cgi
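If you want to compare your own times with the stats in this post, you can wrap the import or build in a small timer. A sketch assuming a `date` that supports `+%s` (GNU and BSD versions do); `timed` is a hypothetical helper name.

```shell
# timed LABEL COMMAND...  (hypothetical helper)
# Runs COMMAND, then prints its wall-clock time to stderr and returns
# COMMAND's own exit status.
timed() {
    label=$1; shift
    start=$(date +%s)
    "$@"
    status=$?
    elapsed=$(( $(date +%s) - start ))
    echo "$label: $(( elapsed / 60 ))m $(( elapsed % 60 ))s" >&2
    return $status
}

# Usage: timed build perl nph-build.cgi > build.log
```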
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ SOME STATS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I imported the first two categories, Adult and Arts.
The Adult category took 10 minutes to import and 15 minutes to build.
The Arts category took 60 minutes to import and 30 minutes to build.
Combined, they came to 30,155 categories and 300,000 links.
The database needs around 100MB of space in total.
Including the content.rdf file, you will need 800MB of free space, but once the import is done you can delete the 700MB file.
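To get a rough idea for other categories, you can scale these figures by the number of links: 300,000 links took 10 + 60 = 70 minutes to import, 15 + 30 = 45 minutes to build, and about 100MB of database. This sketch assumes everything grows linearly with the link count, which is only a guess; `estimate` is a hypothetical helper name.

```shell
# estimate LINKS  (hypothetical helper)
# Linear scaling from the Adult + Arts import: 300,000 links,
# 70 min import, 45 min build, ~100MB database. Integer maths,
# so results are rounded down.
estimate() {
    links=$1
    echo "import: ~$(( links * 70 / 300000 )) min"
    echo "build: ~$(( links * 45 / 300000 )) min"
    echo "database: ~$(( links * 100 / 300000 )) MB"
}
```

For example, `estimate 600000` prints about 140 minutes to import, 90 minutes to build and 200MB of database.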
Hope this helps you all.
Paul Wilson.
http://www.wiredon.net