>> Just skipping the record all the time is not good, as there is no log file made of skipped records.
>> Just adding it will wipe out existing data.
I must be missing something here. We are not talking about a spider seeking out new, unknown links. These are DMOZ links: some are already imported into your database, and some are not yet imported.
Skipped records already exist in your database, so what is the issue? You want to be notified that the record you got from DMOZ is in DMOZ? If you do a 5,000 record import, and there is one new record, you'll have 4,999 "skipped" records, which is pretty pointless.
If you are looking for changed records, that is a double-edged sword. Also, DMOZ drops good links for all sorts of reasons, and keeps bad ones around way longer than it should. You are better off running "Validate Links" and keeping your own record of still-live sites. That will be more accurate than DMOZ, which _rarely_ seems to prune dead links. At least not as often as it should.
I'm pretty hard on this kind of tool, and I've been beating it to death with loads of imports over the past 2-3 weeks. I really don't see what you are asking for.
If you have something specific in mind for your needs, maybe you need a custom job. But I don't see what added functionality you are trying to get.
The only thing I can see in all this is something I've wanted for pruning duplicate links:
1) The import runs, and checks whether the link (URL) already exists in the database. If it does, it checks whether the Title & Description match.
2) If they match, the record is skipped (in the duplicates database, it's deleted, after a cat_links addition). At most, keep a count of skipped links; there is _no_ point in doing anything else.
3) If they don't match, insert the link into a duplicates database.
4) If the link exists, but the category is different, either a) ignore it, b) add a cat_links record if the category exists, or c) add it to a suggestion database for adding a cat_links record, or creating a new title if the category doesn't exist.
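To make the four steps concrete, here is a rough sketch in Python. It is not how the actual import tool works; it only mirrors the logic described above with in-memory dicts and lists standing in for the real tables. The function name, the record fields (`url`, `title`, `description`, `category`), and the result structure are all made up for illustration. For step 4 it assumes option b when the category exists and option c when it doesn't, and it assumes a record whose Title/Description differ goes to duplicates without any category handling.

```python
from dataclasses import dataclass, field


@dataclass
class ImportResult:
    """Hypothetical container for what the import pass did."""
    skipped: int = 0                                  # step 2: only a count is kept
    inserted: list = field(default_factory=list)      # URLs not seen before
    duplicates: list = field(default_factory=list)    # step 3: same URL, different text
    cat_links: list = field(default_factory=list)     # step 4b: (url, category) added
    suggestions: list = field(default_factory=list)   # step 4c: (url, unknown category)


def import_records(records, links, categories):
    """Sketch of the duplicate-pruning pass.

    links:      dict mapping url -> {"title", "description", "categories" (a set)}
    categories: set of category names that already exist
    records:    iterable of dicts with url, title, description, category
    """
    result = ImportResult()
    for rec in records:
        url = rec["url"]
        existing = links.get(url)

        if existing is None:
            # Not in the database yet: a normal import.
            links[url] = {
                "title": rec["title"],
                "description": rec["description"],
                "categories": {rec["category"]},
            }
            result.inserted.append(url)
            continue

        # Step 1: URL exists, so compare Title & Description.
        same_text = (
            existing["title"] == rec["title"]
            and existing["description"] == rec["description"]
        )
        if not same_text:
            # Step 3: park it in the duplicates store for manual review.
            result.duplicates.append(rec)
            continue

        category = rec["category"]
        if category in existing["categories"]:
            # Step 2: identical record, just count it.
            result.skipped += 1
        elif category in categories:
            # Step 4b: same link, another existing category.
            existing["categories"].add(category)
            result.cat_links.append((url, category))
        else:
            # Step 4c: category unknown, leave a suggestion instead.
            result.suggestions.append((url, category))
    return result
```

Calling `import_records(records, links, categories)` on a batch returns the counts and lists in one pass, so the 4,999 identical records end up as a single `skipped` number rather than a log file.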
This adds a slight bit of functionality that I just haven't been able to allocate the time for. The current tools allow you to do all this, just not in a simple/integrated manner.
PUGDOG Enterprises, Inc.
The best way to contact me is to NOT use Email.
Please leave a PM here.