
About Metacrawlers like altavista.cgi

I found some info that may be of interest to people who want to develop metacrawlers or want info on getting permission from search engines. . .

www.chatologica.com/site/collection.htm

Also, they have upgraded their scripts to support HotBot and AltaVista searches, and added a FAQ on developing modules for it. It seems that even non-advanced Perl programmers can make modules for it now :)


------------------
www.techdevelopers.com
ASP, HTML, CGI, Flash, and more!

HPCalc.com - English
Official HP Calculator Site
www.hpcalc.com/english


[This message has been edited by XanthisHP (edited July 23, 1999).]
Re: About Metacrawlers like altavista.cgi In reply to
Hi XanthisHP,

I've tried to create a module for Chatologica, but I don't understand the module code...

Can you help me?

------------------
[]'s

Lucas Saud - #34750464
Re: About Metacrawlers like altavista.cgi In reply to
I have also tried, but with no success. I found out that with the updated version you also have to edit another script for it to recognize the modules. What search engine did you try to create the module for?
Vince Urmanski
Re: About Metacrawlers like altavista.cgi In reply to
Hi Vince,

I've tried to create modules for WebCrawler, canada.com, Infoseek and Yahoo, but with no success!

Has anyone managed to create a module?

------------------
[]'s

Lucas Saud - #34750464
Re: About Metacrawlers like altavista.cgi In reply to
Hey Lucas,

I have tried to create a module following their FAQ myself, but the way they coded the modules seems a bit complicated, and their instructions are a bit vague about the parsing technique. I'll give it another shot tomorrow when I get the time... :)

------------------
www.techdevelopers.com
ASP, HTML, CGI, Flash, and more!

HPCalc.com - English
Official HP Calculator Site
http://www.hpcalc.com/english
Re: About Metacrawlers like altavista.cgi In reply to
Hi XanthisHP,

I've looked at the submit script on your website (www.techdevelopers.com), and I love it! Can you send me the source code?

PS: I'm working on a Yahoo add-on for Chatologica. When I finish, I'll send it to you.


------------------
[]'s

Lucas Saud - #34750464
Re: About Metacrawlers like altavista.cgi In reply to
Hey Lucas,

Well, Vince and I worked on the script together. We can make a deal if you can get that Yahoo module working :)

------------------
www.techdevelopers.com
ASP, HTML, CGI, Flash, and more!

HPCalc.com - English
Official HP Calculator Site
http://www.hpcalc.com/english
Re: About Metacrawlers like altavista.cgi In reply to
Hey XanthisHP,

Will you only send the source if I get that Yahoo module working?

I'm trying to get this module working. My problem with the Yahoo module is getting the body of the page. I don't understand this part:

if($page =~ m{<b>$i\. </b>(.+)<b>$j\. </b>}is){

I need to change the code above for the script to work, but I don't understand it!

Could you send the script to my e-mail anyway? (lsaud@manaus.br)

------------------
[]'s

Lucas Saud - #34750464


[This message has been edited by Lucas (edited July 24, 1999).]
Re: About Metacrawlers like altavista.cgi In reply to
Lucas,

Quote:
if($page =~ m{<b>$i\. </b>(.+)<b>$j\. </b>}is){

You got the body of the Yahoo page? I think this line means that if the page contains at least one search result, the script begins parsing it. In the AltaVista results source, each result number is bold. It looks like it's trying to check whether there is at least one entry shaped like this:

Quote:
<dl><dt><b>1. </b><a href="http://www.gamesdomain.co.uk/"><b>Games Domain - Games Domain</b></a><dd>
Welcome to Games Domain. Console Domain - Kids Domain - MPOG. Latest & Greatest from E3. Excitement! Ecstasy! Exercise! The GDR reports continue to flood..<br><b>URL:</b> <font color=gray>www.gamesdomain.co.uk/<br>
Last modified 28-May-99 - page size 26K - in English</font> [ <a href="http://jump.altavista.com/trans.go?urltext=http%3a%2f%2fwww%2egamesdomain%2eco%2euk%2f&language=en">Translate</a> ]</dl>

<dl><dt><b>2. </b><a href="http://www.gamespot.co.uk/"><b>GameSpot UK: PC Games news, reviews, demos & strategy guides</b></a><dd>
Comprehensive PC gamers site. Offers daily news, reviews, demos & downloads, hints & strategy guides for PC gamers. Plus competitions and 3D...<br><b>URL:</b> <font color=gray>www.gamespot.co.uk/<br>
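
As a rough illustration of what the quoted `if($page =~ ...)` line does with markup like this: on each pass of the loop it captures everything between the bold number `$i` and the bold number `$j`, and that captured text becomes the result's body. The sample page below is a shortened, made-up stand-in for the AltaVista source, so treat this as a sketch rather than the module's actual behaviour.

```perl
use strict;
use warnings;

# Shortened stand-in for an AltaVista-style result page (illustrative only).
my $page = join '',
    '<dl><dt><b>1. </b><a href="http://www.gamesdomain.co.uk/"><b>Games Domain</b></a><dd>Welcome to Games Domain.</dl>',
    '<dl><dt><b>2. </b><a href="http://www.gamespot.co.uk/"><b>GameSpot UK</b></a><dd>Comprehensive PC gamers site.</dl>',
    '<dl><dt><b>3. </b><a href="http://example.com/"><b>Third result</b></a><dd>Placeholder.</dl>';

my @bodies;
foreach my $i (1 .. 9) {
    my $j = $i + 1;
    # Same pattern as in the thread: capture the text between number $i and number $j.
    # /s lets "." match newlines, /i makes the match case-insensitive.
    if ($page =~ m{<b>$i\. </b>(.+)<b>$j\. </b>}is) {
        push @bodies, $1;
    }
}

print scalar(@bodies), " bodies extracted\n";
print "$_\n" for @bodies;
```

Note that a result is only captured when the next number follows it, so with three numbered entries this sketch extracts two bodies. It also shows why the pattern cannot match a page whose results carry no numbers.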


These are the first two results from a Yahoo search:

Quote:
<li><a href="http://www.gamespot.com/zdnet/index.html"><b>GameS</b>pot</a> - <b>GameS</b>pot's home page, the most comprehensive gaming site on the Web, offering the most up-to-date reviews, features, demos, links, hints, cheats, and tech support on PC <b>games</b> available. Includes reviews of PC <b>games</b>, online <b>games</b>, PC gaming har<br><i>--http://www.<b>games</b>pot.com/zdnet/index.html</i><p>

<li><a href="http://home.miningco.com/games/index.htm">About.com - <b>Games</b></a> - About <b>games</b>. Get hot news, helpful advice, key links, and invaluable perspective from our expert human Guides<br><i>--http://home.miningco.com/<b>games</b>/index.htm</i><p>

The AltaVista code uses a foreach loop that needs every result to be numbered, using $i and $j = $i + 1 as delimiters. Yahoo doesn't number its listings.

I'm not sure how it's going to work unless we modify the foreach loop...
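
One possible way to modify the loop, sketched under the assumption that every Yahoo result follows the `<li><a href="...">title</a> - description<br>` shape of the two samples quoted above: walk the page with a global match and number the results in the script itself. The `strip_tags` helper and the keys of the result hash are made up for this example; they are not Chatologica's own `rm_tags` or result structure.

```perl
use strict;
use warnings;

# Two items in the shape of the Yahoo sample quoted above (descriptions shortened).
my $page = <<'HTML';
<li><a href="http://www.gamespot.com/zdnet/index.html"><b>GameS</b>pot</a> - <b>GameS</b>pot's home page.<br><i>--http://www.<b>games</b>pot.com/zdnet/index.html</i><p>

<li><a href="http://home.miningco.com/games/index.htm">About.com - <b>Games</b></a> - About <b>games</b>.<br><i>--http://home.miningco.com/<b>games</b>/index.htm</i><p>
HTML

# Hypothetical helper: remove HTML tags from a piece of text.
sub strip_tags {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;
    return $text;
}

my @results;
my $rank = 0;
# The /g flag in a while loop moves through the page one <li> item at a time,
# so the HTML itself does not need to contain any numbers.
while ($page =~ m{<li><a href="([^"]+)">(.*?)</a>\s*-\s*(.*?)<br>}isg) {
    my ($url, $title, $description) = ($1, $2, $3);
    $rank++;    # number the results ourselves
    push @results, {
        rate        => $rank,
        URL         => $url,
        title       => strip_tags($title),
        description => strip_tags($description),
    };
}

printf "%d. %s (%s)\n", @{$_}{qw(rate title URL)} for @results;
```

The non-greedy `(.*?)` groups stop at the first `</a>` and the first `<br>`, which keeps one item from swallowing the next.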

PS: Lucas, if you want the script, let's trade. If you can get a module working for any engine besides HotBot and AltaVista, I'll send you the script. Deal?


------------------
http://www.techdevelopers.com
ASP, HTML, CGI, Flash, and more!

HPCalc.com - English
Official HP Calculator Site
http://www.hpcalc.com/english


[This message has been edited by XanthisHP (edited July 25, 1999).]
Re: About Metacrawlers like altavista.cgi In reply to
Hello XanthisHP,

I've changed the part that gets the result lines from Yahoo. Here is the script:

Code:
# 0.SETTING THE MAIN REMOTE ENGINE PARAMETERS

$site_name   = 'Yahoo';          # name of the remote site - this name will appear
                                 # on the result's output page
$host        = "ink.yahoo.com";  # remote host address
$port        = 80;               # default web server port is 80
$method      = 'GET';            # POST or GET http method we use
$remote_path = "/bin/query";     # relative web path to the remote script/page

# 3.PREPARING THE VARIABLES THAT WE HAVE TO PASS
# Variable/value pairs that we will pass to the remote search script.
# Look at the URL of the search result page if 'GET' method to see what are the pairs.
# If method is 'POST' open the search page source and look at the form description.
%variables = (
    'p',  '$query',
    'hc', '11',
    'hs', '500'
);
# Some of the search engines like HotBot require a special order of the pairs.
# The list below is the order in which we send the variables. If it does not
# matter make the list empty.
@order_of_variables = ('p', 'hc', 'hs');


# 5.PARSING THE DOWNLOADED PAGE
# Now we will look through the source trying to extract the search results.
$results = '';  # here we will store all search results in database format -
                # string with | signs as delimiter of the values
$n = 0;         # number of results extracted successfully
# This is parsing TYPE1. - delimiters are numbers in order
foreach $i (1..9) {
    $j = $i + 1;
    %result = ();                # here we will store all components of a search result
    $result_string = "";         # all components of a search result in database format
    $result{'link'} = $link;     # link to the downloaded page

    # Now we try to extract the current search result's body from the source
    if ($page =~ m{<b>$i\. </b>(.+)<b>$j\. </b>}is) {
        $body = $1;  # body is extracted
        # print "search result's body - extracted\n"; # uncomment to debug
        # try parsing the search result's body to get the components
        if ($body =~ m{^<a href=\"(.+)\"><b>(.*)</b></a>(.+)--}is) {
            # print "search result's components - parsed\n"; # uncomment to debug
            $n++;                          # counting the found entries
            $result{'URL'} = $1;           # site's URL
            $result{'title'} = $2;         # site's title
            $result{'description'} = $3;   # site's description
            $result{'rate'} = $i;          # how do we rate this result

            # removing all html tags from the site's description
            $result{'description'} = &rm_tags($result{'description'});

            # make the components in database formatted string
            $result_string = &ArrStr(\%result, \@result_structure);
            # print "extracted data: $i $result_string\n"; # uncomment to debug

            $results .= "$result_string\n";  # collect here the results
                                             # as a multiline string
        }
    }
}
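
For anyone unsure what section 3 of the module is for, here is a minimal sketch of how `%variables` and `@order_of_variables` could be turned into the path of a GET request. It assumes that the literal text `$query` (in single quotes, so Perl does not interpolate it) is a placeholder that the main script replaces with the user's search words; the thread does not show that part of Chatologica, so `build_request_path` and `url_encode` are hypothetical helpers.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode a value for a URL query string
# (written for plain ASCII input).
sub url_encode {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Hypothetical helper: build "/path?name=value&..." with the pairs in the given order.
sub build_request_path {
    my ($path, $variables, $order, $query) = @_;
    # Fall back to sorted keys when the order list is empty.
    my @names = @$order ? @$order : sort keys %$variables;
    my @pairs;
    foreach my $name (@names) {
        my $value = $variables->{$name};
        # Assumption: the literal text '$query' stands for the user's search words.
        $value =~ s/\$query/$query/g;
        push @pairs, url_encode($name) . '=' . url_encode($value);
    }
    return $path . '?' . join('&', @pairs);
}

my %variables          = ('p', '$query', 'hc', '11', 'hs', '500');
my @order_of_variables = ('p', 'hc', 'hs');

print build_request_path('/bin/query', \%variables, \@order_of_variables, 'pc games'), "\n";
# prints /bin/query?p=pc%20games&hc=11&hs=500
```

A hash has no guaranteed key order in Perl, which is presumably why the module keeps the order in a separate list.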

Please help me finish this.


About your submit.cgi: do you use LWP in that script?

------------------
[]'s

Lucas Saud - #34750464
Re: About Metacrawlers like altavista.cgi In reply to
Yes, it does. In fact, it requires several different modules, which I list below:
Code:
use CGI;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Status;
These are the modules required to run it properly.
Vince Urmanski
Re: About Metacrawlers like altavista.cgi In reply to
Lucas, can you send me the module that you have so far? I can't copy and paste it from this page because it comes out as a single line of code instead of the original multiple lines...

------------------
http://www.techdevelopers.com
ASP, HTML, CGI, Flash, and more!

HPCalc.com - English
Official HP Calculator Site
http://www.hpcalc.com/english

Re: About Metacrawlers like altavista.cgi In reply to
Well, I was able to pull the content from Yahoo and parse it, but it's not numbering the results. I don't think I am doing this right, but here is what I've got:

Link: www.techdevelopers.com/cgi-bin/meta/modules/yahoo.txt

Example of a search: http://www.techdevelopers.com/...ow=and&where=web

[This message has been edited by XanthisHP (edited July 25, 1999).]

[This message has been edited by XanthisHP (edited July 26, 1999).]
Re: About Metacrawlers like altavista.cgi In reply to
Hello,

I've tested your metasearch. It shows all the results from Yahoo, but it's not numbering them.

I will contact the Chatologica owner and ask for help...



------------------
[]'s

Lucas Saud - #34750464
Re: About Metacrawlers like altavista.cgi In reply to
It looks like Chatologica is no longer freeware or shareware; you now have to buy it for 300 dollars per server.

Luckily, we were among the few who had a chance to get it while it was still offered as freeware. There are some new modules that you might want to get. They are freeware, but they were developed by users rather than by Chatologica.

http://www.chatologica.com/site/collection.htm

------------------
www.techdevelopers.com
ASP, HTML, CGI, Flash, and more!

HPCalc.com - English
Official HP Calculator Site
www.hpcalc.com/english



[This message has been edited by XanthisHP (edited July 27, 1999).]
Re: About Metacrawlers like altavista.cgi In reply to
Hi ERROR,
I only have the old 1.1 Chatologica script, not the new 1.1.1, but if it helps, tell me and I'll mail it to you.
Koolski has the new script, but he didn't answer my mails.
Can another "friendly" guy mail me the new script? (I really need the new features but don't have the money.)

kAOs

[This message has been edited by sourceofkaos (edited August 07, 1999).]

Re: About Metacrawlers like altavista.cgi In reply to
Hey, what's up? Can one of you guys send me that metasearch script? My email address is jxl3046@rit.edu. It would really be appreciated! Please help. I've been looking all over for a script like that, and I really don't have enough money to buy it. I think I need to pay for college books first :-). Any help would be great.

------------------
There's only now, there's only this, forget regret or life is yours to miss. No other road, no other way, no other day but today!
Re: About Metacrawlers like altavista.cgi In reply to
Sounds good. I'll email you too, under my name jxl3046@rit.edu, in case you get a funny email :-) and wonder who it's from. Also, I'll keep looking around for the newer version, and if I can get it I will give it to you. Thanks a lot,
John

------------------
There's only now, there's only this, forget regret or life is yours to miss. No other road, no other way, no other day but today!
Re: About Metacrawlers like altavista.cgi In reply to
I'm wondering if I can use something like this to call a remote script (on another domain/server we have). Is there any example of sending a simple query to a remote server and getting the result back?
Thank you.
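
A minimal sketch of such a remote call, for what it's worth. The submit script discussed earlier in the thread uses LWP::UserAgent; this example uses the core HTTP::Tiny module instead, only so that it runs without installing anything extra. The host `example.com`, the script name `search.cgi` and the parameter name `q` are placeholders, and `build_url` and `fetch` are hypothetical helper names.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: append form-encoded parameters to a base URL.
sub build_url {
    my ($base, @params) = @_;
    my $http = HTTP::Tiny->new;
    # An array reference keeps the parameters in the order given.
    return $base . '?' . $http->www_form_urlencode([@params]);
}

# Hypothetical helper: GET the URL and return the response body,
# or die with the HTTP status if the request did not succeed.
sub fetch {
    my ($url) = @_;
    my $http = HTTP::Tiny->new(timeout => 10);
    my $res  = $http->get($url);
    die "Request failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

# Placeholder address: replace it with your own server and script.
my $url = build_url('http://example.com/cgi-bin/search.cgi', q => 'perl');
print "Would request: $url\n";
# my $body = fetch($url);   # uncomment to actually send the query
# print $body;
```

The remote script only has to print its result as the response body; whatever it prints comes back from `fetch` as a string that the calling script can parse.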