Gossamer Forum
Home : Products : Links 2.0 : Discussions :

New mod/addon idea ....

After installing both the altavista.cgi
and the bulkload mod from the Resource Center,
I thought of maybe combining the two.

Sorry if this gets a bit long-winded ..
Hopefully some people will see where I'm going.

My idea is to create a script that can be used to bulk-up your link database.

The script would have to ...
Create a page where an admin can choose which category to build up, and whether to use
meta tags or link descriptions as keywords.

Once an admin chooses these items and submits
the page, the script would search AltaVista
using the category title plus either that category's meta tags and/or words grabbed
from the pre-existing links in that category.

What is returned from the search engine gets
placed into text fields on a results-style page. This results page would allow
editing of what is returned.

Then the next step would be to have the CGI write that info to a file for bulkloading
into your link database. That is, unless the data can be passed straight from the page
into the database without having to save it.

Ideally there would be a check to see if the link is already in the database before adding it.
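A duplicate check like that could be a short loop over the existing flat file. A minimal sketch, assuming a pipe-delimited links.db with the URL in the third field (the field position and the filename here are assumptions about your setup, not anything fixed by Links itself):

```perl
#!/usr/bin/perl
# Sketch: is this URL already in a pipe-delimited links.db?
# Assumption: the URL sits in the third field (ID|Title|URL|...).
use strict;

my $URL_FIELD = 2;    # zero-based index of the URL field -- adjust to suit

sub url_exists {
    my ($db_file, $url) = @_;
    open my $fh, '<', $db_file or return 0;   # no db yet means no duplicates
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /\|/, $line;
        # Compare case-insensitively so HTTP://Foo/ and http://foo/ match.
        if (defined $fields[$URL_FIELD] and lc $fields[$URL_FIELD] eq lc $url) {
            close $fh;
            return 1;
        }
    }
    close $fh;
    return 0;
}
```

For a big database it would be faster to read all the URLs into a hash once and test each candidate against that, rather than rescanning the file for every link.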

Now I won't comment on the legalities of
doing such a grab.

I have already altered the altavista.cgi
so that it returns the items in pipe-delimited
form ready for bulkloading,
but I would still have to enter the categories manually for it to be of any use.
I would prefer to use MetaCrawler, as it
uses a lot of search engines at once, so
there would be no need to do a version for Yahoo, WebCrawler, etc.
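For what it's worth, the pipe-delimited output step is tiny. A sketch, assuming a Title|URL|Description record order -- match the join order and field list to whatever your bulkload mod actually expects:

```perl
#!/usr/bin/perl
# Sketch: flatten one search result into a pipe-delimited bulkload line.
# The Title|URL|Description order is an assumption -- reorder the fields
# to match your bulkload mod's import format.
use strict;

sub to_bulkload_line {
    my ($title, $url, $desc) = @_;
    # A stray literal | inside a field would shift every later column,
    # so strip them out before joining.
    s/\|//g for ($title, $url, $desc);
    return join '|', $title, $url, $desc;
}
```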

So anyone have any thoughts on my idea ?

Summary of the idea:
* A script to bulk up your own link db.
* Use categories and meta tags to figure
out what to search for.
* Return the info in an editable form so
bad links and descriptions can be removed or fixed.
* Check for duplicate links.
* Bulkload the results into your links.db.
* Avoid being sued.
Re: New mod/addon idea .... In reply to
Are you talking about a spider-type script that takes links from the Net and puts them into a temporary database to use with your links.db?

This has already been done in the following formats:

1) Spider Pro in the Resource Center
2) GetURL.cgi found in one of the Threads in the Modification Forum written by Widgetz.
3) DMOZ Integration script (works best with SQL version of LINKS)

Regards,

------------------
Eliot Lee
Anthro TECH,L.L.C
www.anthrotech.com
----------------------

Re: New mod/addon idea .... In reply to
No, not exactly .. not taken from the Net ..
researched through another search engine.

1) Spider Pro in the Resource Center

Spider Pro is a Windows-based program ..
It wouldn't allow direct addition of new
links into a db. The data still has to be edited and checked for duplicate links.

2) GetURL.cgi found in one of the Threads in the Modification Forum written by Widgetz.

I have used "a" geturl.cgi before. Not sure if it was Widgetz's one. It was great for grabbing pages but still doesn't have the interaction I'm talking about.

3) DMOZ Integration script (works best with SQL version of LINKS)
Yes, this is closer .. but not what I'm after.

I'm talking about an integrated script to be used from the admin panel. A painless way to add extra links to categories that you already have. I don't wish to add millions of links just because someone else has them.
This is directed at those wishing to get the harder-to-find links that aren't on Yahoo.

Using the category name, the category meta tags and description text as keywords, the
CGI would contact MetaCrawler and grab
the results. The results page would be editable.

The edited data could then be submitted into
the category chosen prior to contacting MetaCrawler. The added links would of course be checked for duplicates first.

I mention MetaCrawler over AltaVista as the results come from 11 search engines.
There is no need to reprogram a CGI for whatever
search engine you want .. MetaCrawler already
gives a consistent output for each engine it uses.
Check out http://beta.metacrawler.com/index_power.html
for some of its features. The domain limit
is especially handy to me when I want only
Australian content.
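Building the query value from the category name and meta keywords is the easy bit. A sketch -- the escaping below is standard URL encoding, but whichever engine parameter you bolt it onto (MetaCrawler's included) would need to be read off that engine's own search form first:

```perl
#!/usr/bin/perl
# Sketch: turn a category name plus its meta keywords into a
# URL-encoded query-string value.
use strict;

sub build_query {
    my ($category, @keywords) = @_;
    my $q = join ' ', $category, @keywords;
    # Escape anything that isn't a letter, digit, or space ...
    $q =~ s/([^A-Za-z0-9 ])/sprintf('%%%02X', ord $1)/ge;
    # ... then turn spaces into + signs, query-string style.
    $q =~ tr/ /+/;
    return $q;
}
```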

It's all fine to grab someone else's database,
but when you are running Links for a specific
topic it's hard, other than manually, to get
the links you want. Being able to use
multiple search engines at once and have the
returned info editable and ready for import
would make a lot of these "niche" links sites
a lot easier to maintain.

cheers
shaun
p.s. Does that make it any more confusing?
Re: New mod/addon idea .... In reply to
eliot,

where can i get the "DMOZ Integration script" ?

many thanks

------------------
ciao
Nicky
mse.nicky.net
www.nicky.net/forum german forum for GT Links


Re: New mod/addon idea .... In reply to
Look in this Forum for DMOZ. There is a Perl script posted that does this...however, it is most effective with the SQL version of LINKS.

Regards,

------------------
Eliot Lee
Anthro TECH,L.L.C
www.anthrotech.com
----------------------

Re: New mod/addon idea .... In reply to
parse_rdf.pl comes with Links SQL.. it's made for Links SQL..

------------------
Jerry Su
Links SQL Licensed
------------------
Re: New mod/addon idea .... In reply to
Jerry,

I can't find your getURL.cgi code anywhere...do you think you could post it here? I would appreciate it a lot.

Thank you
Re: New mod/addon idea .... In reply to
This is just a for-fun program made a while ago..

www.pdamania.com/cgi-bin/boom1.0/boom.cgi

it's not really a spider.. i have a spider somewhere else on my site though..

------------------
Jerry Su
Links SQL Licensed
------------------
Re: New mod/addon idea .... In reply to
Hi blurb, I've just been reading your post and it's just what I'm looking for! I'm in the process of building a UK directory, and the webcrawler idea sounds really good! Anyway, would you mind giving me a shout if you do get this one sorted? My email is devs1@usa.net

Cheers,
Devs
Re: New mod/addon idea .... In reply to
Hi devs,
If I can get my head around the LWP
module it shouldn't be very hard to code.
I have been looking at the code for
the "LWP URL Checker mod" and for the altavista.cgi
from the resource area.

The search submission and the grabbing of the results is only a couple of lines of Perl.
Stripping the results into link name, URL
and description is the hard part ..

Anyone out there an expert pattern matcher?
Any example LWP code would be handy.

The good thing about the MetaCrawler output is that it is so neat and tidy that looking for the links and descriptions should not be that
complex. I already have a macro for TextPad that does the ripping and converting to a
links.db file. I just need to rewrite it in Perl.
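Here is roughly what I mean about the fetch being short and the stripping being the real work -- a sketch only. The pattern assumes results shaped like <a href="URL">Title</a><br>description; MetaCrawler's real markup, and the query URL and parameter name in the commented fetch lines, are guesses that would need checking against the live pages:

```perl
#!/usr/bin/perl
# Sketch: strip link name, URL and description out of a results page
# and emit pipe-delimited records. The regex is a starting point,
# not a finished parser -- match it against the engine's actual markup.
use strict;

sub parse_results {
    my ($html) = @_;
    my @records;
    while ($html =~ m{<a\s+href="([^"]+)">([^<]+)</a>\s*<br>\s*([^<]+)}gis) {
        my ($url, $title, $desc) = ($1, $2, $3);
        $desc =~ s/\s+/ /g;          # collapse runs of whitespace
        $desc =~ s/^\s+|\s+$//g;     # trim the edges
        push @records, join '|', $title, $url, $desc;
    }
    return @records;
}

# The grab itself really is a couple of lines with LWP::Simple
# (uncomment if LWP is installed; the URL and the "general" parameter
# are placeholders -- check MetaCrawler's own form for the real names):
# use LWP::Simple qw(get);
# my $html = get('http://beta.metacrawler.com/crawler?general=perl+links');
# print "$_\n" for parse_results($html);
```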
cheers
blurb

Re: New mod/addon idea .... In reply to
Jerry,

Do you think you could post your getURL.cgi code here? I'd really like to quit spending hours per day manually importing files into my database....it'd be ok if I didn't have to input the name and email over and over each time I do it! Any way around that? Smile

Thanks
Re: New mod/addon idea .... In reply to
Hey widgetz
I checked out
www.pdamania.com/cgi-bin/boom1.0/boom.cgi

It seems to output the wrong code on the
initial page, so it wouldn't do what
it was supposed to. I did get it working
by saving the first page and editing it to
remove an extra /

Does it use LWP?
Is it possible to get the code for it, plus the code for this geturl that everyone is mentioning?
cheers
blurb
Re: New mod/addon idea .... In reply to
Blurb,

Which / did you remove, and where, on that boom.cgi?

Thank you
Re: New mod/addon idea .... In reply to
When I went to www.pdamania.com/cgi-bin/boom1.0/boom.cgi

It output a page with this in the code:
action="//cgi-bin/boom1.0/boom.cgi"

It was taking me to cgi-bin.com ...
which I'm sure wasn't the right effect.

So I saved the page, removed the extra /
and put in the full path to your CGI, and it
worked .. I notice it is fixed now, so either
you found it or my Netscape was playing tricks on me.

cheers
blurb
p.s. any chance of looking at the code .. ?