I regularly visit a few news sites covering the topic of my links script, and I keep seeing new sites that I would like to "mine" into Links 2.0. I don't want to crawl randomly around the web, nor do I want to list every sub-page of a site. I just want the main page of each site that gets mentioned, submitted automatically so that I (or my links admin) can go through the results by hand and validate/duplicate-check them.
I posted an ad for a different script I wanted done and have been getting quotes on the following proposal as well. Would anyone be interested in splitting the cost with me? I've got about a half dozen offers already and am negotiating with a few of them.
The current lowball price is about $400 for the script as described below. If you would like to help split the cost, or have some other creative way to get the script done, I'm open to ideas. I'd like to see this released as shareware after it is complete (maybe $25?) to help recoup the costs.
Here's the idea:
A Web Robot for Gossamer Threads Links 2.0
There are three distinct parts that I see to this CGI: Seek, Validate, and Submit.
Seek
Via a web interface, an admin enters the URLs of sites for the bot to visit. This list is saved for the next run. I'm planning to point this at news sites that announce new products, so it would probably need to run nearly daily. When given the command, the bot visits the first site, takes all of its external links, and writes them to a file. The bot then goes on to the next site and does the same, appending to the same file. The process repeats until all the URLs have been visited or reported as unavailable.
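Just to make the Seek step concrete, here is a minimal sketch in Python using only the standard library (the real script would presumably be Perl, like Links 2.0 itself). The file names "seed_sites.txt" and "found_links.txt" are placeholders, not anything from Links.

# seek.py - rough sketch of the "Seek" step: fetch each seed page and
# append its external links to a file.  Names here are illustrative only.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def external_links(seed_url):
    """Return links on seed_url that point to a different host."""
    html = urlopen(seed_url, timeout=30).read().decode("utf-8", errors="replace")
    parser = LinkCollector()
    parser.feed(html)
    seed_host = urlparse(seed_url).netloc
    found = []
    for href in parser.links:
        absolute = urljoin(seed_url, href)
        if urlparse(absolute).netloc not in ("", seed_host):
            found.append(absolute)
    return found

if __name__ == "__main__":
    seeds = [line.strip() for line in open("seed_sites.txt") if line.strip()]
    with open("found_links.txt", "a") as out:
        for seed in seeds:
            try:
                for link in external_links(seed):
                    out.write(link + "\n")
            except OSError:
                # Seed unavailable; skip it and move on, as described above.
                pass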
The script then eliminates duplicate URLs and keeps only the highest level of any multi-level listing. For example, if it finds both "www.foo.com" and "www.foo.com/bar", it will drop "www.foo.com/bar" and keep "www.foo.com". The results are written to a separate "history" file. On future runs the bot will compare the links it finds against the history file and eliminate any that are already listed there, so only new links get added later.
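A rough illustration of that de-duplication and history check, again just a sketch with assumed file names ("found_links.txt" and "history.txt"):

# dedupe.py - sketch of the second half of "Seek": collapse multi-level
# listings to their top level, drop duplicates, and skip anything
# already present in the history file.  File names are placeholders.
from urllib.parse import urlparse

def top_level(url):
    """Reduce an absolute URL to scheme and host, e.g. http://www.foo.com/bar -> http://www.foo.com"""
    parts = urlparse(url)
    if not parts.netloc:
        return ""  # skip anything that is not an absolute URL
    return f"{parts.scheme}://{parts.netloc}"

def new_links(found_file="found_links.txt", history_file="history.txt"):
    try:
        history = {line.strip() for line in open(history_file)}
    except FileNotFoundError:
        history = set()

    fresh = []
    seen = set()
    for line in open(found_file):
        url = top_level(line.strip())
        if url and url not in seen and url not in history:
            seen.add(url)
            fresh.append(url)

    # Record the new links so future runs ignore them.
    with open(history_file, "a") as hist:
        for url in fresh:
            hist.write(url + "\n")
    return fresh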
Validate
The bot will visit the remaining "short list" of URLs that passed all the above sorting and uniqueness tests and make sure each site is up. Once at the site, it will attempt to grab as much useful information as possible (a rough sketch follows the list below):
"Site Title" from the Title tags
"Description" from the Meta description tag, or, if no meta tag, take the first paragraph on the site that is smaller than H3.
"Contact Name and E-mail" from an e-mail address found near the words "PR, contact, e-mail, write, webmaster, owner, editor" or other pre-defined strings.
"Category" If Possible, match a category in Links with the site contents (not vital if too difficult, but really nice if it can be done.)
Submit
Take the new list of URLs, along with the above information, and submit each one to the "add a link" form of Links 2.0 with the proper fields filled in, ready for normal admin validation from within Links 2.0 (to approve the addition of the site into the list).
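Submitting is just an HTTP POST to the add form. A minimal sketch, assuming the form lives at add.cgi and uses field names like "Title", "URL", "Description", "Contact Name", "Contact Email", and "Category" (check your own add.cgi and its template for the exact form URL and field names):

# submit.py - sketch of the "Submit" step: POST one entry to the Links 2.0
# "add a link" form.  The form URL and field names below are assumptions;
# adjust them to match your own installation.
from urllib.parse import urlencode
from urllib.request import urlopen

ADD_FORM_URL = "http://www.example.com/cgi-bin/links/add.cgi"  # placeholder

def submit_link(title, url, description, contact_name="", contact_email="", category=""):
    fields = {
        "Title": title,
        "URL": url,
        "Description": description,
        "Contact Name": contact_name,
        "Contact Email": contact_email,
        "Category": category,
    }
    data = urlencode(fields).encode("ascii")
    with urlopen(ADD_FORM_URL, data=data, timeout=30) as response:
        return response.status  # 200 means the form accepted the POST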
After the sites are submitted, the admin will manually look them over and approve them so they go into Links. This will help make sure the bot does its job properly and keeps the quality up.
The process may take some time and could be broken down into different parts to be run on a staggered basis if it gets too lengthy.
Comments? Suggestions? Better Ideas?
Rich Barron
rbarron@exo.com