Gossamer Forum

gofetch 2.1 update

I am working on the latest download. If you go to the demo page http://lookhard.hypermart.net/.../Look/admin/dmoz.cgi
you will see that I added the option to spider regular URLs as well. It still blocks certain URLs, so if you spider a Yahoo search page, it will block out all of Yahoo's internal links. It still updates the linksid file the same way and writes to validate.db in the same format, and the dmoz spider function works as before. I should have this up within a day or so. It will not be a complete reinstall. You would just need to add the new subs to site_html_templates.pl and then copy over the existing dmoz.cgi and gofetch.cgi with the ones from the zip. If you've customized fields, you'll just need to apply your changes to the fields in the new gofetch. In the new release, though, there will be two places to change fields: one for dmoz and one for web URLs.
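
A rough sketch of the blocking idea, for anyone curious. It is written in Python rather than the mod's Perl, and the function name external_links and the host-matching rule are assumptions for illustration, not the actual gofetch code:

```python
from urllib.parse import urlparse

def external_links(urls, blocked_hosts):
    """Return only the URLs whose host is not a blocked host or a subdomain of one."""
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # A host is blocked if it equals a blocked host or ends with ".<blocked host>".
        blocked = any(host == b or host.endswith("." + b) for b in blocked_hosts)
        if not blocked:
            # Note: URLs with no host part (relative links) are kept as-is by this sketch.
            kept.append(url)
    return kept

found = [
    "http://search.yahoo.com/search?p=bmx",
    "http://www.example.com/bmx/",
    "http://dir.yahoo.com/Recreation/",
]
print(external_links(found, ["yahoo.com"]))
```

Matching on the host instead of searching the whole URL avoids dropping an outside link that merely mentions the blocked site in its path or query string.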
Lavon Russell
LookHard Mods
lavon@lh.links247.net
Re: [Bmxer] gofetch 2.1 update In reply to
Nice job, Bmxer :)
Re: [reenee] gofetch 2.1 update In reply to
Yes, it is great!

I already use it and I am satisfied with it!

But I think there should be a button to validate all sites at once, rather than validating every site manually.

Think about it, Bmxer. ;)

Loucian

Last edited by loucian: Oct 1, 2001, 11:39 AM
Re: [loucian] gofetch 2.1 update In reply to
Well, I don't really see the point of that. But I remember there being a little JavaScript snippet in the forums that lets you tick a checkbox so that all the validate radio buttons get selected. If you mean making an option outside of the admin to validate all, I won't do that. That is something you can find in the mycgiscripts Links.
Lavon Russell
LookHard Mods
lavon@lh.links247.net
Re: [Bmxer] gofetch 2.1 update In reply to
Hey all. First post to the forums. I've gotten a lot of help from you all in the past though, so thanks for it.

Now...I'm feeling like a bit of an idiot here.

I've run the 2.2 demo on your site, Bmxer, and I'm impressed. I know that it's only in Beta, and isn't available for download yet, so I downloaded and installed the 2.0 zip.

After a few installation problems (solved by reading the forums, thank you guys), it looked like everything was running great. I can index URLs and add them to the queue.

I'm having problems getting the queue to run though. Serious problems.

Problem one: In comparing my screen to your demo on dmoz.cgi...I have no "Spider Web URLS" option. Not really a big problem compared to the other ones though.

Problem two: I add URLS to the queue, press "Spider Indexed", and absolutely nothing happens. It gives me the number of categories in the queue, Spiders DB size (0 bytes), number of links spidered (0), and says "1 categories blocked". I have a feeling that this is my big problem, and I've done something stupid to make this happen. I've tried indexing several different categories on DMOZ...and nothing changes.

Problem three: If I run gofetch.cgi and try to import URLS into the Links2DB, I get a 404 Error.

I went to your demo, to see if I could get it to work, and when I push the "Index" button, I get a 500 ISE with the message: Debugging Information:

Premature end of script headers: /data1/hm/lookhard/look-bin/Look/admin/dmoz.cgi

I'm almost positive that this is something stupid that I've done, making this happen, but have no idea where to begin looking. If you could maybe give me a push toward the section of code I might have messed up? I would appreciate it greatly.

Thanks again for everything.

bri
Re: [brihana25] gofetch 2.1 update In reply to
If you want the demo, go here:
http://lh.links247.net/.../Look/admin/dmoz.cgi

You don't index URLs; it indexes categories from dmoz.
There wouldn't be a "Spider Web URLS" option in the 2.0 download, because that feature isn't in that version.
2.0 is dmoz only.
Quote:
Problem two: I add URLS to the queue, press "Spider Indexed", and absolutely nothing happens. It gives me the number of categories in the queue, Spiders DB size (0 bytes), number of links spidered (0), and says "1 categories blocked". I have a feeling that this is my big problem, and I've done something stupid to make this happen. I've tried indexing several different categories on DMOZ...and nothing changes.
Was this a top-level directory on dmoz that you added? Most top-level directories don't have links in them. Was there an error message at the top saying "no links spidered" or something?
One problem with the hypermart demo on my site could be that I am testing my installer. It works beautifully, but maybe some code got messed up, because it should have installed 2.2 onto my hypermart demo. I don't know. I'll check that out.
I don't know about the gofetch.cgi 404 error. Is it in the admin folder? Check the templates to make sure that the location of gofetch in the form input is right.
Lavon Russell
LookHard Mods
lavon@lh.links247.net
Re: [Bmxer] gofetch 2.1 update In reply to
Bmxer,

Thanks so much for answering so fast.

I understand the things about the Web URLS not being there. DUH! Should have figured that out on my own. Thanks.

I've done a clean re-install of the MOD, and had the same results. The only modification I've had to make to the script is taking out the <%build_site_title%> tags in the templates, because of the Unknown Tag error.

It's not a top-level directory, and it does have links to 12 sites in it. No, there's no error message. There's just nothing. I enter the URL from the DMOZ page I want to index, hit "Index", it adds it to the queue. I hit "Spider Indexed" and nothing happens. If you want my user name and password for my admin folder, let me know, and I'll email it to you, so you can see what's happening.

As for the 404 Error...everything is where you said to put it. The .cgi's are in the admin folder, the templates are in the template folder, and the .db's are in the data folder. All permissions are set the way you said.

I'm sorry to be such a bother, but I've got a rather ambitious project on my hands, which would take me weeks (if not months) to do by hand, and I know this would save me so much time.

Thanks again for all the help.

bri
Re: [brihana25] gofetch 2.1 update In reply to
OK, can you send me temporary FTP login info and a username/password to access your admin folder by browser? Also send the URL that gofetch is at. You can send them to bmxer@links247.net

-----------
Installer Update...
It works exactly the way I want it and everything is fine.
I even made it so it will add the paths to dmoz.db and all that other junk to links.cfg automatically.
Now I'm trying to make it do the same for all the subs that go in site_html_templates.pl, but I'm hitting some problems. So I should be done with the install program within a week.
Lavon Russell
LookHard Mods
lavon@lh.links247.net

Last edited by Bmxer: Oct 21, 2001, 10:18 PM
Re: [Bmxer] gofetch 2.1 update In reply to
Hey, Bmxer.

I played around with things after I emailed you last night...so I don't know if I fixed it or if you did...but it is working this morning. Turns out that I had a _ instead of a - in one path in links.cfg...weird thing was that it didn't affect Links2.0 itself.

I've only indexed one Cat from DMOZ so far, but I can't wait to get all of the ones I need done. This is going to be such a great MOD. Thank you so much.

Can't wait for 2.2!

bri
Re: [brihana25] gofetch 2.1 update In reply to
brihana25,
Whatever you changed was most likely the problem, and you fixed it yourself, because I just woke up and went to your site to fix it. I just realized that in the globals everyone has site_title => $build_site_title, but I had build_site_title => $build_site_title, so I thought that was the default and put that in the mod. I'll change this in the installs and templates.

Note:
If you download the current 2.0 zip, change <%build_site_title%> to <%site_title%>.
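
If there are many templates, the tag swap in the note above can be scripted. This is only a sketch in Python (not part of the mod, which is Perl); the function name fix_templates and the assumption that the templates are .html files in a single folder are mine. Back up the folder before running anything like it:

```python
from pathlib import Path

OLD_TAG = "<%build_site_title%>"
NEW_TAG = "<%site_title%>"

def fix_templates(template_dir):
    """Replace OLD_TAG with NEW_TAG in every .html file; return the names of changed files."""
    changed = []
    for path in sorted(Path(template_dir).glob("*.html")):
        # latin-1 maps every byte to a character, so odd characters survive the round trip.
        text = path.read_text(encoding="latin-1")
        if OLD_TAG in text:
            path.write_text(text.replace(OLD_TAG, NEW_TAG), encoding="latin-1")
            changed.append(path.name)
    return changed
```

Calling fix_templates with the path to your template folder returns the list of files it rewrote, which makes it easy to check that nothing unexpected was touched.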
Lavon Russell
LookHard Mods
lavon@lh.links247.net
Re: [Bmxer] gofetch 2.1 update In reply to
I was wondering if someone would be willing to let me test the install on their Links. It backs up the files it changes (links.cfg and site_html_templates.pl). I would just need their FTP server address, username, and password, plus the username and password to get into the admin folder. Then, if it tests well, I will ask you to try the install yourself and see how well you understand the tips and how user friendly they are, and then release it. Right now, while I'm waiting, I'm working on gofetch, because the web spidering is really slow. I have to speed it up. If you want me to test the install with you, you can email me at webmaster@links247.net

BTW...
The install works great. All you would have to do when it's released is add tags like
<--dmozurl--> and <--dmozdb--> to links.cfg, and, my favorite, <--sht--> for site html templates.
It prints about 100 - 300 lines of code for the template subs in about 20 seconds.
Just thought I'd tell you about that.
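
For readers wondering how tags like these can work, here is a minimal sketch of marker substitution. It is Python rather than the installer's own code, and the function name fill_tags, the config variable names, and the values are all hypothetical; only the tag names come from the post:

```python
def fill_tags(text, values):
    """Replace each <--name--> marker in text with its value; unknown markers are left alone."""
    for name, value in values.items():
        text = text.replace("<--" + name + "-->", value)
    return text

# Hypothetical config lines; the real links.cfg variable names are not given in the thread.
cfg = '$example_dmoz_url = "<--dmozurl-->";\n$example_dmoz_db = "<--dmozdb-->";\n'
values = {
    "dmozurl": "http://www.example.com/cgi-bin/admin/dmoz.cgi",
    "dmozdb": "/path/to/data/dmoz.db",
}
print(fill_tags(cfg, values))
```

Leaving unknown markers untouched means a missed tag stays visible in the file instead of silently becoming an empty string.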
Lavon Russell
LookHard Mods
lavon@lh.links247.net
Re: [Bmxer] gofetch 2.1 update In reply to
Emailed you.

You can try it out on my site if you want, the one you fixed it on before. Does your install substitute the whole file or update the subs in the files? Just asking because some of my files are fairly well hacked up now, mostly with other mods and some of my own junk.
Re: [roman365] gofetch 2.1 update In reply to
Um, yes, the install substitutes the whole file, which at first I thought was good, but now I do think it's unfair to do that to those who have modified it. I think it would be kind of hard to update just the subs: if portions of the main sub are what you changed and also what I'm updating, they won't get updated, because the code no longer matches the original.
I guess I would suggest that those who have modded save gofetch.cgi and dmoz.cgi in separate folders, and when the new one is installed, just compare what you did in the first one with the second one. Mostly only dmoz.cgi is changed, but I added a lot of things to gofetch.cgi too, and I plan on working more on gofetch.cgi for the next release. I'll back up your gofetch.cgi and dmoz.cgi for your mods' sake.
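
One way to do that comparison without reading both files side by side is a line diff. The sketch below uses Python's standard difflib module; the function name show_changes, the backup/ and new/ labels, and the two sample lines are made up for illustration:

```python
import difflib

def show_changes(old_lines, new_lines, name="gofetch.cgi"):
    """Return a unified diff (as a list of lines) between a saved copy and the new file."""
    return list(difflib.unified_diff(
        old_lines, new_lines,
        fromfile="backup/" + name, tofile="new/" + name, lineterm=""))

# Tiny stand-ins for the two versions of the file.
old = ['print "a";', 'print SPIDER "$ID|$myurl\\n";']
new = ['print "a";', 'print SPIDER "$ID|http://$myurl\\n";']
for line in show_changes(old, new):
    print(line)
```

Lines starting with - exist only in your saved copy and lines starting with + only in the new file, so your own modifications show up as - lines that need to be carried over by hand.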
Lavon Russell
LookHard Mods
lavon@lh.links247.net
Re: [Bmxer] gofetch 2.1 update In reply to
 
Hi all, great hack. I am a bit confused. Let's say I want a particular category off the DMOZ, does this program also download all the individual URLs from that particular category and convert it into Links 2 format?

Any help would be appreciated. Thanks in advance.
Re: [svr] gofetch 2.1 update In reply to
After the links have been spidered how do I merge the links with my own Links Database?
Re: [svr] gofetch 2.1 update In reply to
Quote:
does this program also download all the individual URLs from that particular category and convert it into Links 2 format?

Huh? It only converts external links. Any link with dmoz.com will not be spidered. The demo would show this.

Quote:
After the links have been spidered how do I merge the links with my own Links Database?

It already merges the links into your own database: it writes the spidered links to validate.db. If I recall correctly, that's the Links validation database. Just go into your admin and validate them.
Lavon Russell
LookHard Mods
lavon@lh.links247.net
Re: [Bmxer] gofetch 2.1 update In reply to
 
Hi Bmxer, your script works great ;)

Thanks very much for excellent work.

Could you keep us all updated as to when the version that'll fetch non-dmoz.org URLs will be released, please?

Thanks again!!!!!!!!!!
Re: [svr] gofetch 2.1 update In reply to
 
Err... I found a bug. The database created (validate.db) somehow strips the chars "http://" from all the URLs spidered! Does anyone have the same problem? A fix would be appreciated a lot!

Thanks.
Re: [svr] gofetch 2.1 update In reply to
You must have downloaded that a while ago. It strips out the http:// when spidering and was supposed to add it back when writing to the database, but after the first release I was notified that I hadn't put that in. The fix goes at the line that prints to the database.
The line that reads:
print SPIDER "$ID|$mytitle|$myurl|$mydescrip|||$lastupd|$FORM{'Category'}|$email|0||20\n";

change it to:

print SPIDER "$ID|$mytitle|http://$myurl|$mydescrip|||$lastupd|$FORM{'Category'}|$email|0||20\n";
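
A side note on the fix above: it prepends http:// unconditionally, which is fine as long as the scheme has always been stripped first. If a URL could ever arrive with its scheme intact (the thread does not say whether that can happen), a guarded version avoids doubling it. The sketch is in Python rather than Perl, and the helper name with_scheme is hypothetical:

```python
def with_scheme(url):
    """Prefix http:// unless the URL already starts with http:// or https://."""
    url = url.strip()
    if url.lower().startswith(("http://", "https://")):
        return url
    return "http://" + url

print(with_scheme("www.example.com/bmx/"))
print(with_scheme("http://www.example.com/bmx/"))
```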

Lavon Russell
LookHard Mods
lavon@lh.links247.net
Re: [Bmxer] gofetch 2.1 update In reply to
Hi Bmxer, thanks for the advice.

I have 2 questions:

1. Why are there two chars || after $mydescrip? You show three chars ||| in your post, but the gofetch.cgi file I have from the download has two chars ||. The reason I am asking is that all the fields in my links.db file are separated by only one | character. I have removed the || chars and changed them to one | char in my gofetch.cgi file. Is that OK?

2. After fetching the dmoz links, I noticed that the validate.db file (in my case) has one space after the | char that is there before the description starts, and one space after the description ends. So it is like this:

|Arts/Performing_Arts/Acting/Actors_and_Actresses| Description of the link here |

Notice the space after the | char that starts before the word Description, and the space after the word here

Why is that? Thanks again for your time!
Re: [Bmxer] gofetch 2.1 update In reply to
By the way I have added the following to dmoz_error.html, dmoz_success.html and getdmoz.html :

Contact Name:
<input type="text" size=15 name="username" maxlength="25" value=""><br>


And I have modified the print SPIDER line and it is now as follows:

print SPIDER "$ID|$mytitle|http://$myurl|$lastupd|$FORM{'Category'}|$mydescrip|$username|$email|0|No|No|0|0|No\n";
Re: [svr] gofetch 2.1 update In reply to
1. Because I used the print-to-db code from my 2.2 version, which is different from 2.0.

2. It's in the parsing. I guess I leave in spaces at the beginning and end.
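
The stray spaces can be removed by trimming each field just before the record is written. A small sketch of that idea, in Python rather than the mod's Perl; the function name make_record and the sample values are made up:

```python
def make_record(fields):
    """Join fields with | after trimming leading and trailing whitespace from each one."""
    return "|".join(field.strip() for field in fields)

row = ["12", "Some Title", "Arts/Performing_Arts/Acting", " Description of the link here "]
print(make_record(row))
```

Only the outer whitespace is trimmed, so spaces inside a description are preserved. Empty fields still produce adjacent | characters, which keeps the field count the same.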
Lavon Russell
LookHard Mods
lavon@lh.links247.net