Gossamer Forum

spider, crawler, bot, link adder mod?

I regularly visit a few news sites covering the topic of my Links installation, and I see all sorts of new sites there that I would like to "mine" into Links 2.0. I don't want to crawl the web at random, nor do I want to list every sub-page of a site. I just want the main page of each listed site submitted automatically, so that I (or my Links admin) can go through the submissions by hand to validate them and check for duplicates.

I posted an ad for a different script I wanted done and have been getting quotes on the following proposal as well. Would anyone be interested in splitting the cost with me? I've got about a half dozen offers already and am negotiating with a few of them.

The current lowball price is about $400 for the script as described below. If you would like to help split the cost or have some other creative ideas to get the script done, I'm open to ideas. I'd like to see this as shareware after it is complete (maybe $25?) to help recoup the costs.

Here's the idea:

A Web Robot For Gossamer Threads Links 2.0

There are three different parts that I see to this CGI. Seek, Validate, and Submit.

Seek
Via a web interface, an admin enters the URLs of sites for the bot to visit. This list is kept for the next time. I'm looking at doing this with news sites that announce new products, and would probably need to run it nearly daily. When given the command, the bot goes to the first site, takes all the external links, and writes them to a file. The bot then goes to the next site and does the same, appending to the same file. The process repeats until all the URLs have been visited or have returned as unavailable.
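
As a rough illustration of the "take all the external links" step, here is a minimal Python sketch using only the standard library. The names `LinkCollector` and `external_links` are made up for this example, and it assumes "external" simply means "on a different host than the page being scanned". Fetching the page and appending to the shared file are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_links(page_url, html):
    """Return absolute http(s) links in `html` that point off page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    own_host = urlparse(page_url).netloc.lower()
    found = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)
        # Assumption: "external" means a different host; mailto: etc. are skipped.
        if parts.scheme in ("http", "https") and parts.netloc.lower() != own_host:
            found.append(absolute)
    return found
```

The result for each news site could then be appended, one URL per line, to the shared file described above.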

The script then eliminates duplicate URLs and keeps only the highest level of multi-level listings. For example, if it finds "www.foo.com" and "www.foo.com/bar", it will eliminate "www.foo.com/bar" but keep "www.foo.com". These are then written to a separate "history" file. Future visits by the bot will compare the found listings to the history file and eliminate the ones already listed there, so only new links will be added later.
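
The de-duplication and history check might look something like the sketch below. It makes one simplifying assumption beyond the description: every URL is reduced to its host, so "www.foo.com/bar" becomes "www.foo.com" even when the bare host was not found on its own. `site_root` and `new_sites` are hypothetical names, and the history is shown as an in-memory set rather than a file.

```python
from urllib.parse import urlparse


def site_root(url):
    """Reduce a URL to its host, e.g. 'www.foo.com/bar' -> 'www.foo.com'."""
    if "://" not in url:
        url = "http://" + url  # assumption: bare hosts are plain HTTP
    return urlparse(url).netloc.lower()


def new_sites(found_urls, history):
    """Return unseen site roots in first-seen order and add them to `history`."""
    fresh = []
    for url in found_urls:
        root = site_root(url)
        if root and root not in history:
            history.add(root)  # remember it so later runs skip it
            fresh.append(root)
    return fresh
```

Loading the set from the history file before a run and writing it back afterwards would give the "only new links" behaviour across daily runs.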

Validate
The bot will visit the remaining "short list" of URLs that passed all the above sorting and uniqueness tests and make sure each site is up. Once at the site, it will attempt to grab as much useful information as possible:

"Site Title" from the Title tags
"Description" from the Meta description tag, or, if no meta tag, take the first paragraph on the site that is smaller than H3.
"Contact Name and E-mail" from an e-mail address found near the words "PR, contact, e-mail, write, webmaster, owner, editor" or other pre-defined strings.
"Category" If Possible, match a category in Links with the site contents (not vital if too difficult, but really nice if it can be done.)
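
A minimal sketch of the title, description, and e-mail extraction, again in standard-library Python. It is deliberately simpler than the list above: it takes the first e-mail address anywhere on the page rather than one near a keyword, does not fall back to the first paragraph when the meta tag is missing, and makes no attempt at category matching. `SiteInfoParser`, `site_info`, and the e-mail pattern are illustrative assumptions.

```python
import re
from html.parser import HTMLParser

# Loose e-mail pattern; an assumption for this sketch, not a full validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class SiteInfoParser(HTMLParser):
    """Pick out the <title> text and the meta description (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def site_info(html):
    """Return title, description and first e-mail address found in `html`."""
    parser = SiteInfoParser()
    parser.feed(html)
    parser.close()
    emails = EMAIL_RE.findall(html)
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "email": emails[0] if emails else "",
    }
```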

Submit
Take the new list of URLs with the above information and submit them to the "add a link" form of Links 2.0, in the proper fields, ready for a normal admin validation from within Links 2.0 (to approve the addition of the site into the list).

After the sites are submitted, the admin will manually look them over and approve them so they go into Links. This will help make sure the bot does its job properly and keeps the quality up.
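
The submit step could be sketched as a form POST built with Python's standard library. The field names below ("Title", "URL", "Category", "Description", "Contact Name", "Contact Email") and the `add.cgi` address are assumptions for illustration and would need to be checked against the actual "add a link" form of the installation. `build_add_request` is a hypothetical helper, and it only builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_add_request(add_cgi_url, site):
    """Build (but do not send) a POST to the 'add a link' form."""
    # Assumed field names; verify them against the real form before use.
    fields = {
        "Title": site["title"],
        "URL": site["url"],
        "Category": site.get("category", ""),
        "Description": site["description"],
        "Contact Name": site.get("contact_name", ""),
        "Contact Email": site.get("email", ""),
    }
    body = urlencode(fields).encode("ascii")  # form-encoded request body
    return Request(add_cgi_url, data=body, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send the submission, which would then wait for the manual approval described above.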

The process may take some time and could be broken down into different parts to be done on a staggered basis if it gets too lengthy.

Comments? Suggestions? Better Ideas?

Rich Barron
rbarron@exo.com
Re: spider, crawler, bot, link adder mod? In reply to
A company called Innerprise offers some of this, but not all. I downloaded the demo, but it crashed a couple of times...maybe someone else has had better luck.

You can download the functional demo at http://www.innerprise.net/us4.htm .

It doesn't do what you suggested with the email and contact name thing, but I bet if you contact them, they will agree it's a great idea for a future mod.

------------------
http://www.diamondgrading.com
Learn about Diamonds from former Gem-Lab Graders
Re: spider, crawler, bot, link adder mod? In reply to
Innerprise seems to have a good solution, but it doesn't fit my needs.
It only runs on Windows.
It's $100 per machine (and I have a handful of people who would be running this in various locations).
My machine at work is behind a firewall that would block this app.

I've also found a couple of freeware/shareware apps that come close to doing what I need, but my best solution would be a CGI script. I could run it from anywhere with a net connection and set it up to run via cron.

The extra benefit would be that we could split the development cost, open it up to the public as shareware, and make a nice contribution to Links for those that need it.

I suppose that's being idealistic, so plan B would be some custom scripting using a shareware application that will do it all from a desktop machine instead of a server, much like the program you suggested.

Thanks for the input!
Re: spider, crawler, bot, link adder mod? In reply to
Please remember that this forum is to be used to discuss ongoing Links modification projects or to get help with existing Links mods already published. It is not for soliciting programmers to help with an "envisioned" Links mod. That would be better discussed in the Links Discussion Forum which is open to any discussion concerning Links. This forum is more of a "support" forum to solve problems with mods. Thank you for your cooperation.

------------------
Bob Connors, Moderator
Links Modification Forum
Moderator Email: moderator@orphanage.com
Personal Email: bobsie@orphanage.com
goodstuff.orphanage.com/

Re: spider, crawler, bot, link adder mod? In reply to
Greetings rbarron!

I too am at the place of looking for a spider mod for LINKS.

I downloaded URL Spider Pro and it crashed several times (I use Win98).

Keep my email address etc and when you have finished send me an email.

Bobsie

Quote:
Please remember that this forum is to be used to discuss ongoing Links modification projects or to get help with existing Links mods already published

This sort of conversation is coming up more and more now as Links establishes itself in the marketplace. Can you ask Alex to start another forum for people seeking custom mods? Hyperseek has one, and other software has similar forums as well. It would save you having to post messages like this in a forum not meant for that type of message.

Just a suggestion.




------------------
http://www.nzcid.godzone.net.nz
New Zealand Christian Internet Directory


Re: spider, crawler, bot, link adder mod? In reply to
You're absolutely right Bobsie.

Could you please help me out a little and delete this thread from the forum? We've decided to not make it open source as previously discussed and may go shareware or commercial. By deleting this thread you can help us curtail any other outside development.

Thank You,
Rich Barron
Re: spider, crawler, bot, link adder mod? In reply to
OK rbarron, you said you found some shareware apps that came close to what you were looking for. Where can I find these?

Thanks,
SaintSilver