Gossamer Forum

Mod of Links Spider

I have partly modified the Links Spider to do what some people, including myself, want. It does what it used to, but now it also gets the info from the sites that it finds on the spidered site. I would use this for my add.cgi, although if you did, you could only have Title, Description, Keywords, and I guess Hits and Ratings fields. I have not made it write to a database yet, because spidering the main site and its links is a separate step from getting the other sites' information. I would probably make it write to the database in both steps, since each step gets the same info. Anyway, it's at
http://lookhard.hypermart.net/...-bin/Look/spider.cgi .
I will probably write the database stuff later.
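
A minimal sketch of the "get the info" step, assuming the page has already been fetched into a string (for example with LWP::Simple's get). The helper name extract_meta and the sample page are hypothetical and not taken from the actual spider.cgi. It only handles meta tags where name= comes before content= and the value is quoted.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the title plus the description and keywords
# meta tags out of an HTML string using simple regexes.
sub extract_meta {
    my ($html) = @_;
    my %info = (title => '', description => '', keywords => '');

    if ($html =~ m{<title[^>]*>(.*?)</title>}is) {
        ($info{title} = $1) =~ s/\s+/ /g;   # collapse runs of whitespace
        $info{title} =~ s/^ | $//g;         # trim the ends
    }

    # Assumes name= precedes content= and the content value is quoted.
    for my $name (qw(description keywords)) {
        if ($html =~ m{<meta\s+name\s*=\s*["']?$name["']?\s+content\s*=\s*(["'])(.*?)\1}is) {
            $info{$name} = $2;
        }
    }
    return \%info;
}

# Made-up sample page standing in for a spidered site.
my $page = <<'HTML';
<html><head>
<title> Example Page </title>
<meta name="description" content="A sample page.">
<meta name="keywords" content="sample, test">
</head><body>Hello</body></html>
HTML

my $info = extract_meta($page);
print "$_: $info->{$_}\n" for qw(title description keywords);
```

A real HTML parser would cope better with unusual markup; the regexes here are only meant to show which three fields are being collected.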

------------------
LookHard Search
lookhard.hypermart.net
Lavon Russell

Re: Mod of Links Spider In reply to
sounds interesting...
where is it available?

Webmaster33
Re: Mod of Links Spider In reply to
hmmm.
It seems it doesn't exclude JavaScript from indexing.
It should...

Webmaster33
Re: Mod of Links Spider In reply to
I haven't finished with the code yet. It does exclude some forms of JavaScript. I already said I hadn't made it write to a database yet, so the code isn't done. Which site did you try that showed JavaScript? I'm asking because my site has JavaScript but the spider doesn't bring it back.
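
One way to keep JavaScript out of the indexed text is to drop whole script blocks before stripping the remaining tags. This is only a sketch under that assumption; the function name visible_text is made up, and it is not how the spider currently does it.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the text a visitor would see.
sub visible_text {
    my ($html) = @_;
    # Remove <script>...</script> blocks first, including their contents,
    # so that the script source never reaches the index.
    $html =~ s{<script\b[^>]*>.*?</script\s*>}{ }gis;
    $html =~ s{<[^>]+>}{ }g;    # strip the remaining tags
    $html =~ s/\s+/ /g;         # collapse whitespace
    $html =~ s/^ | $//g;        # trim the ends
    return $html;
}

# Made-up sample with an old-style comment-wrapped script.
my $page = '<p>Hello</p><script language="JavaScript"><!-- '
         . 'document.write("x"); //--></script><p>world</p>';
print visible_text($page), "\n";
```

If the tags were stripped first, the script body would be left behind as plain text, which may be what shows up on some sites. The same pattern could be repeated for style blocks.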

------------------
LookHard Search
lookhard.hypermart.net
Lavon Russell



[This message has been edited by Bmxer (edited February 27, 2000).]
Re: Mod of Links Spider In reply to
Try themoney.hypolit.net. The site is in Hungarian, but I don't think that matters :-)
Re: Mod of Links Spider In reply to
Very nice, Bmxer... However, there seems to be a bug in the last updated date. All the links shown for Anthro TECH (www.anthrotech.com) show today's date rather than the actual modified date.

Regards,

------------------
Eliot Lee....
Former Handle: Eliot
* Check Resource Center
* Search Forums
Re: Mod of Links Spider In reply to
That's right, it's like Links, showing the day that the link was added, because it still has the basics of the Xav search. I was thinking about getting the time the page was actually last updated after I do the database stuff, but I'm not sure how to do it with sockets. I also can't get it to write to a database.
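
A rough sketch of the socket approach, assuming plain HTTP on port 80 and a server that actually sends a Last-Modified header (many dynamic pages do not). Both function names are hypothetical. Only the header parser is exercised on a made-up sample response, so nothing here touches the network when run.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Pull the Last-Modified value out of a raw HTTP response.
sub last_modified_from_headers {
    my ($raw) = @_;
    my ($head) = split /\r?\n\r?\n/, $raw, 2;   # keep the header block only
    return undef unless defined $head;
    return $head =~ /^Last-Modified:\s*(.+?)\s*$/mi ? $1 : undef;
}

# Send a HEAD request over a plain socket (no redirects, no HTTPS).
sub fetch_last_modified {
    my ($host, $path) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host, PeerPort => 80, Proto => 'tcp', Timeout => 10,
    ) or return undef;
    print $sock "HEAD $path HTTP/1.0\r\nHost: $host\r\n\r\n";
    local $/;                  # slurp the whole response at once
    my $raw = <$sock>;
    close $sock;
    return defined $raw ? last_modified_from_headers($raw) : undef;
}

# Made-up response used to demonstrate the parser.
my $sample = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n"
           . "Last-Modified: Sun, 27 Feb 2000 18:30:00 GMT\r\n\r\n";
print last_modified_from_headers($sample), "\n";
```

Since the spider already loads LWP::Simple, its head() function is another option: in list context it returns the modification time along with the content type and length.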

------------------
LookHard Search
lookhard.hypermart.net
Lavon Russell

Re: Mod of Links Spider In reply to
OK, I got the URLs and info to save to a database; now I have to update them with an ID number in front. I tried using the code from add.cgi, but all the links have 1 in front of them. I may use a while loop and then do <ID> + 1, I don't know.
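
One possible cause of every link getting the same ID is that the counter is read once and never incremented per link. A minimal sketch of the loop idea follows, assuming pipe-delimited records with the ID as the first field; the function name number_records and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: number a batch of spidered records, continuing
# from the last ID that was already used.
sub number_records {
    my ($last_id, @records) = @_;
    my @lines;
    for my $rec (@records) {
        $last_id++;                              # bump once per record, inside the loop
        push @lines, join '|', $last_id, @$rec;  # ID goes in front
    }
    return ($last_id, @lines);
}

# Made-up records: title, URL, description.
my @found = (
    [ 'Example One', 'http://example.com/one', 'First page' ],
    [ 'Example Two', 'http://example.com/two', 'Second page' ],
);

my ($new_last, @lines) = number_records(41, @found);
print "$_\n" for @lines;
print "counter value to store for next time: $new_last\n";
```

The returned counter would need to be written back to wherever the last ID is kept, and any field containing a pipe character would have to be escaped or stripped first.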

------------------
LookHard Search
lookhard.hypermart.net
Lavon Russell

Re: Mod of Links Spider In reply to
OK, got that. I didn't plan on using this for a database with categories. I should release an example and a way to view the database soon.

------------------
LookHard Search
lookhard.hypermart.net
Lavon Russell

Re: Mod of Links Spider In reply to
Hi Bmxer,

Any chance this could be used with an SQL database? I am not using the SQL version now, but I assume that if one wants to build a decent database, a flat file won't do. I currently have a Xav search engine for a small database of articles (16 MB), but it takes 6-10 seconds to search. I am looking for an alternative. It would be nice to have an SQL solution.
Re: Mod of Links Spider In reply to
See, that is one thing I don't know about, because I'm testing on Hypermart and they don't tell you how much bandwidth you're using until they are ready to kick you off. Obviously it's Perl, so it would need that, and it uses various Perl modules like LWP and Socket, which also use server power. Also, I've found by testing that I can get up to 2000 links within one night of having it write to a database, so you would need a good server to run Links on. Also, it's not very attractive: since the Xav spider didn't use templates, I did all the code, HTML and Perl, in the CGI file, so most novices won't be able to follow it. But I guess I can do that now: work on the internal functions and then the design later, right? :)

------------------
LookHard Search
lookhard.hypermart.net
Lavon Russell

Re: Mod of Links Spider In reply to
Cool mod! But my only concern is bandwidth.
Does it use too much bandwidth?
Re: Mod of Links Spider In reply to
If you anticipate having a large database file and you are relying on this kind of mod to extract links from the Web, the better route is a dedicated server (no less than 4 Gigs) with a fast network connection, plus Links SQL. A fast connection at home, via broadband or DSL, would also help when pulling data down to your PC from the remote server.

Regards,

------------------
Eliot Lee....
Former Handle: Eliot
* Check Resource Center
* Search Forums
Re: Mod of Links Spider In reply to
I may make it for SQL someday, when I learn it.
Re: Mod of Links Spider In reply to
I'd like to try it on my server when you are done. I would know within minutes if it uses too much bandwidth: my host will stop it and send me an e-mail.
Re: Mod of Links Spider In reply to
I was going to make a CGI script with instructions for all my mods. I was done with the spider mod, and I had it working with templates, because Eliot saved me time by giving me his version of the original Links spider formatted in templates. I've finished converting my code into his, but now I have the problem that whenever I try to spider a URL, the browser asks me to download the script. So I have to start over with a blank copy of the old one, put my code in it, and go step by step to find what is wrong, because I've tried just about everything. I will try to finish by today and write some crappy instructions, and then by this week I should have a revamped mod center/collection script.

------------------
LookHard Search
lookhard.hypermart.net
Lavon Russell


Re: Mod of Links Spider In reply to
Bmxer,

Any luck getting together the mod to download?

Thanks!
Adam
Re: Mod of Links Spider In reply to
I finally have it templated and it all works, but if you go to the spider and spider a URL, you will see the main URL printed at the top of the page. I have no idea why it's doing this, because the tag isn't at the top of the HTML page, and the code doesn't look messed up. So I just have to finish this and then write instructions.

------------------
LookHard Search
lookhard.hypermart.net
Lavon Russell


Re: Mod of Links Spider In reply to
Bmxer,

It could be a problem with the following code:

Code:
print "Content-type: text/html\n\n";

if you have it in the spider.cgi script. Since you have templated this file, you do not need the above code. I noticed this when I first templated the spider.cgi script: weird code and the URL with its description were printed twice, once at the top of the screen and once in the appropriate place.

I don't know if this will help.
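
The header needs to go out exactly once: if it is sent twice, the second copy and anything printed between the two show up as page text, and if it is never sent, the browser may offer the script as a download. A small sketch of a guard that makes extra calls harmless, assuming a made-up helper name print_header_once that is not part of Links:

```perl
use strict;
use warnings;

my $header_sent = 0;   # hypothetical flag, not a Links variable

# Return the CGI header on the first call and an empty string afterwards.
sub print_header_once {
    return '' if $header_sent++;
    return "Content-type: text/html\n\n";
}

my $out = '';
$out .= print_header_once();   # e.g. called from spider.cgi
$out .= print_header_once();   # e.g. called again by the template routine: adds nothing
$out .= "<html><body>Spider results</body></html>\n";
print $out;
```

With a guard like this it would not matter whether spider.cgi or the template routine asks for the header first.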

Regards,

------------------
Eliot Lee....
Former Handle: Eliot
Anthro TECH, L.L.C
anthrotech.com
* Check Resource Center
* Search Forums
* Thinking out of the box (codes) is not only fun, but effective.


Re: Mod of Links Spider In reply to
That does happen, but when I removed the html_print_headers call from site_html_templates.pl and put the content-type call in spider.cgi, it stopped trying to download itself. I'll try your suggestion and remove the code.

Another problem I've found is that when submitting large sites to spider, such as yours or Yahoo, the CGI runs and works, but the browser comes back with a DNS error. Would this be Hypermart's fault or the script's?

------------------
LookHard Search
lookhard.hypermart.net
Lavon Russell


Re: Mod of Links Spider In reply to
Would anyone let me test this on their server?

I would need a login and password for FTP.

------------------
LookHard Search
lookhard.hypermart.net
Lavon Russell


Re: Mod of Links Spider In reply to
I could set something up on my server if you would like. How long would you need it active for? I would want to have the script run through cgiwrap, unless you need it otherwise.

Let me know and I will set it up.
Larry
Re: Mod of Links Spider In reply to
Probably till Wednesday.
I don't know about cgiwrap though. I've never used it and don't know what it's for; I heard it was for security.
Your server would need the Perl modules
LWP::Simple
LWP::UserAgent
Socket

I think that's it.
You also need Links installed and running.
When you set it up, you can email Litrbf@aol.com to tell me the URL, login, and password for FTP.

------------------
LookHard Search
lookhard.hypermart.net
Lavon Russell


Re: Mod of Links Spider In reply to
I was just going to give you access to one of my servers... :)

I have everything needed for the scripts to work.

Let me know,

Larry
Re: Mod of Links Spider In reply to
OK, any time you want to send me the info is good. ASAP.

------------------
LookHard Search
lookhard.hypermart.net
Lavon Russell

