Gossamer Forum

Searching Like Yahoo :)

Links SQL should search like Yahoo. Here is an example of what I mean:

TITLE:
Auto Shop
DESCRIPTION:
Find the best car accessories. Wheels, stereos, and more.
URL:
www.racestore.com

In Yahoo, if you search for words like:

car shop
wheels shop
race wheels
shop stereo
stereo store
auto store
wheels accessories
best wheels

You will get a good match for that link. What they do is simple: the search engine looks at all the words in the Title, Description, and URL, so you can type the words in any order and Yahoo will still give you a good match.
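The any-order matching described above can be sketched in a few lines. This is only an illustration in Python, not how Yahoo or Links SQL actually works: the function name `matches_any_order` and the field names are made up for the example, and it assumes plain substring matching so that "store" is found inside "racestore".

```python
def matches_any_order(query, link):
    """Return True if every query word occurs somewhere in the link's fields."""
    # Join the searchable fields into one lowercase string.
    haystack = " ".join(link[f] for f in ("title", "description", "url")).lower()
    # Substring test, so "store" matches inside "racestore" and word order is irrelevant.
    return all(word in haystack for word in query.lower().split())


link = {
    "title": "Auto Shop",
    "description": "Find the best car accessories. Wheels, stereos, and more.",
    "url": "www.racestore.com",
}

queries = [
    "car shop", "wheels shop", "race wheels", "shop stereo",
    "stereo store", "auto store", "wheels accessories", "best wheels",
]
results = {q: matches_any_order(q, link) for q in queries}
```

Every query in the list matches the sample link under this scheme. The cost is that each link's full text has to be examined for every search, which is the performance concern raised later in the thread.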

At the moment Links 2.0 searches very well, but what I don't like is that it can't match words in any order across the Title, Description, and URL. If you add the example I gave above to Links, you will see that if you search for words like:

Wheels accessories
best wheels

You will get a good match, but if you search for words like:

car shop
wheels shop
race wheels
shop stereo
stereo store
auto store

You will not get a good match, because Links can't match words in any order across fields like Title, Description, and URL.

Also, Yahoo removes very common words from searches, like:

What
I
a
is

What I would most like to see in Links SQL is a way to remove words from searches, so that everyone can make their search engine work the way they want.
Re: Searching Like Yahoo :) In reply to
You can remove words from being indexed by using the:

$LINKS{search_stopwords}

at the bottom of the LINKS.pm file
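As a rough illustration of what a stop-word list does (sketched in Python, not taken from LINKS.pm), the filter below drops the common words named earlier in the thread before a search runs. The names `STOPWORDS` and `strip_stopwords` are hypothetical.

```python
# Stop words from the original post; a real list would be site-specific.
STOPWORDS = {"what", "i", "a", "is"}


def strip_stopwords(query, stopwords=STOPWORDS):
    """Return the query's words in lowercase, minus any stop words."""
    return [word for word in query.lower().split() if word not in stopwords]


terms = strip_stopwords("What is a wheels shop")
```

Only the remaining terms would then be passed on to the index lookup.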

Alex has explained the situation with searching. It's a complicated problem. I've got a workaround that works for me.

Maybe there will be additional options in the future.

You do understand 'weighting', right? If you give the various fields a weight, such as Title=5, Description=3, URL=1, then when Links searches, it will weight its finds.

Code:
TITLE:
Auto Shop
DESCRIPTION:
Find the best car accessories. Wheels, stereos, and more.
URL:
www.racestore.com

If you search on "auto shop", the weight for this link will be 10, for the double hit in the title.

If you reversed it so that title=3 and description=5, then 'auto shop' would only return a 6, whereas 'car shop' would return an 8 and 'wheels stereos' would return a 10.
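The arithmetic above can be checked with a small sketch. This is Python for illustration only, not the Links SQL indexer: the `score` function and the field names are assumptions, and it simply adds a field's weight once for each query word found in that field.

```python
import re


def score(query, link, weights):
    """Sum the weight of every field in which each query word appears."""
    total = 0
    for word in query.lower().split():
        for field, weight in weights.items():
            # Split the field into lowercase words, ignoring punctuation.
            field_words = re.findall(r"[a-z0-9]+", link[field].lower())
            if word in field_words:
                total += weight
    return total


link = {
    "title": "Auto Shop",
    "description": "Find the best car accessories. Wheels, stereos, and more.",
    "url": "www.racestore.com",
}

title_heavy = {"title": 5, "description": 3, "url": 1}
description_heavy = {"title": 3, "description": 5, "url": 1}

auto_shop_title_heavy = score("auto shop", link, title_heavy)          # 5 + 5
auto_shop_desc_heavy = score("auto shop", link, description_heavy)     # 3 + 3
car_shop_desc_heavy = score("car shop", link, description_heavy)       # 5 + 3
wheels_stereos_desc_heavy = score("wheels stereos", link, description_heavy)  # 5 + 5
```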

The major search engines use a very complicated process. It's not easy to implement on this scale.

One way would be to pass the field names as parameters to the search, so you can include or exclude certain fields, such as URL or title.

Full text searching is very inefficient. Some compromise has to be made for speed or resources vs targeted accuracy. It's better to over include, than under.

A nice feature would be to 'narrow' a search, but to really implement this you would need a lot more behind-the-scenes trickery for preserving user state, intermediate returns, and more. The larger engines do that, but it was a do-or-die situation for them, since they couldn't afford the resource penalties associated with repeated identical searches. I've seen some back-end code for this, and it's as large as or larger than LinkSQL itself _just_ for these semi-persistent searches. They also work best on RAID machines, since they do a lot of disk I/O in lieu of memory and CPU (state is saved to files rather than kept in memory, to allow more processes to run on the physically available machines).

What you ask is not impossible, but to expect Yahoo- or Lycos-type searches from LinkSQL, MySQL, and the average webserver (machine) is somewhat unrealistic.

Weighted results and inclusion/exclusion of fields is the best alternative.



Re: Searching Like Yahoo :) In reply to
The trade-off is resources consumed for features. LinkSQL makes a good trade-off. Other programs run more slowly, use more resources, or don't have the features.

I'm sure Alex can come up with a quick way to enable full-text searches, but that almost defeats the purpose of indexing. A couple of people doing those kinds of searches can kill a CPU pretty quickly -- especially on a shared machine.

I don't see the lack of searches 'like Yahoo' as a major drawback. There is "and" and "or" searching, as well as "partial" and "exact" searching.

Re: Searching Like Yahoo :) In reply to
Making a search system that searches like Yahoo is easy for a good programmer.

I have seen a lot of search engine programs that work that way. If you want, I can send you the names of the companies and the URLs. Those systems cost the same as Links SQL.

I have also seen a search system that uses mSQL and searches the way I describe. I once had that script installed on my server, but because I couldn't import all my links, I deleted it.

That script was very simple, and it worked like Yahoo. If you don't know much about programming you can't build the right code, but if you are a good programmer you can build the right code for every server, using the same lines that Links SQL uses.

I have also seen a script that searches the way I describe and doesn't use SQL, only flat databases. If you use SQL, it is easier to write a program that does what I describe.

Come on, reading this forum has disappointed me. It looks like Links SQL is going to be a poor searching system forever. Is that right, ALEX? Everyone here says that searching like Yahoo is like going to Mars for the weekend. I don't think so.

Because I have seen other scripts search like Yahoo, I know that it can be done very easily.
Re: Searching Like Yahoo :) In reply to
Hi,

The only feature Links SQL doesn't have right now is phrase searching, i.e. a search for "auto shop" that matches only links that have the words auto shop together as a phrase.

If you are concerned about returning the best matches first, try using the order by score method. If you pass in order=SCORE, Links SQL will use the weighting pugdog described to return the most relevant matches first.

Adding in phrase matching correctly is not trivial. The easy way to do it is to just pass into SQL Title LIKE '%my phrase%'. However, this is extremely inefficient, as you have to do a full table scan for every search. To work phrases into the indexed system is much more complex.

However, the next version will include a non-indexed search engine that does what's described above. This is necessary for foreign languages where words are not separated by spaces, so indexing is very difficult. You can also use this to do phrase searching.

Its limitation, of course, is that it's not able to achieve nearly the same performance as an indexed system.
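The full-table-scan point can be seen with any SQL engine. The sketch below uses Python's built-in SQLite as a stand-in for MySQL (an assumption made purely so the example is self-contained), with a hypothetical two-column `links` table. It asks the database for its query plan for a leading-wildcard LIKE and for an exact match on the same indexed column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (ID INTEGER PRIMARY KEY, Title TEXT)")
conn.execute("CREATE INDEX idx_title ON links (Title)")
conn.executemany(
    "INSERT INTO links (Title) VALUES (?)",
    [("Auto Shop",), ("Race Wheels",)],
)


def plan(sql, params):
    """Return SQLite's query plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


# Phrase search with a leading wildcard: the index cannot narrow the lookup.
phrase_plan = plan("SELECT ID FROM links WHERE Title LIKE ?", ("%auto shop%",))
# Exact match on the indexed column: the index can be used directly.
exact_plan = plan("SELECT ID FROM links WHERE Title = ?", ("Auto Shop",))

phrase_hits = conn.execute(
    "SELECT Title FROM links WHERE Title LIKE ?", ("%auto shop%",)
).fetchall()
```

In SQLite the first plan is reported as a SCAN and the second as a SEARCH. The phrase query still returns the right row; it just has to visit every row to find it.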

If you have any other questions about searching, don't hesitate to ask.

Cheers,

Alex
Re: Searching Like Yahoo :) In reply to
Alex,

So what you are saying is that searching by 'phrase' can be enabled in the next version by setting $LINKS{foreign_char} = 1;


This comes at the price of a _significant_ processing penalty, since this type of search treats the SQL database as if it were a very large (and more efficient) flat-file database, bypassing the advantages of indexing.

What about making an easy way to turn this on for individual searches? In the 'advanced' option (the multi-line type search box), allow a "by phrase" search but note it as "SLOW". This would pass the search term to the search routine, but also set the $LINKS{foreign_char} flag to TRUE. It would give site operators the choice of enabling phrase searches, or turning them off if performance starts to suffer.

This gives everyone what they want -- and they can _see_ the performance penalties and make the decision to accept them, or disable them.

Basically, it's allowing an additional parameter to search.cgi -- to override the default search.

Problem: people who pass their own parameter to the engine. Solution: use an '$override' variable in Links.cfg (set it to true to allow phrase searching when foreign language support is off). If it is set, the $foreign_language variable from the form (in the form of a "by phrase 1/0" field) is processed; otherwise it's ignored.

Shouldn't be more than a few lines of code.
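A minimal sketch of that decision, in Python rather than the Perl of search.cgi, might look like the following. The function name `use_phrase_search`, the form field `by_phrase`, and the config keys are all hypothetical; only the idea of an operator-controlled override comes from the post above.

```python
def use_phrase_search(form, config):
    """Decide whether a request may run the slow, non-indexed phrase search."""
    if config.get("foreign_char"):
        # Non-indexed searching is already the site-wide default.
        return True
    if not config.get("override"):
        # Operator has not allowed it, so a visitor-supplied flag is ignored.
        return False
    # Operator allows it: honour the "by phrase" checkbox from the form.
    return form.get("by_phrase") == "1"


indexed_site = {"foreign_char": 0, "override": 0}
opt_in_site = {"foreign_char": 0, "override": 1}

ignored = use_phrase_search({"by_phrase": "1"}, indexed_site)
allowed = use_phrase_search({"by_phrase": "1"}, opt_in_site)
not_requested = use_phrase_search({}, opt_in_site)
```

With the override off, a hand-crafted `by_phrase=1` in the query string has no effect, which addresses the concern about people passing their own parameters.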

Could probably be done after the fact, but if the capability is there, why not make it an option -- "fast" and "slow" searches :)
Re: Searching Like Yahoo :) In reply to
Alex, how can I use the order by score method in Links SQL?
Re: Searching Like Yahoo :) In reply to
PugDog,

Please tell me how to change the weight of the searches, as you described before.

I asked Alex to explain, but I don't understand him.
Re: Searching Like Yahoo :) In reply to
In the ADMIN area, go to the table maint -> links -> update fields area.

The last box is an index weight. '0' means ignore. The higher the number, the more 'weight' the field will have.

For instance, if the DESCRIPTION has a 5, and the TITLE has a 3, the word "Car" appearing in the DESCRIPTION will show higher on the list of search results than the 'car' showing in the TITLE.

Then, inside the <FORM> tags of your search, you need to tell Links to return the links in 'score' order by adding an <input type=hidden name="order" value="score">

Or, you can edit the search.cgi file at the top and set the default to be 'score':

Code:
($in->param('order') =~ /^(score|category)$/i) ? ($order = uc $1) : ($order = 'CATEGORY');

To change the default, replace the final 'CATEGORY' in the line above with 'SCORE'.

That line checks whether param{'order'} is either 'score' or 'category'. If it is, $order is set to the uppercase 'SCORE' or 'CATEGORY'; otherwise it falls back to 'CATEGORY', and that fallback is what you'd change to 'SCORE'.
Re: Searching Like Yahoo :) In reply to
Thanks, PugDog.

I have good news: I just looked at a search program based on Links that uses PHP3 and MySQL and searches like Yahoo, with words in any order, as I described before.

I will not give the name of that system in this forum, but I would like Alex to talk with the person who builds that search engine, so that Links SQL can search like Yahoo.

I am sure that with a little help, and with very little extra code, Links SQL can search like Yahoo.

:)

Re: Searching Like Yahoo :) In reply to
Another change I had to make, which Alex might want to make in the distribution, is:

Code:
# Set the order preference.
($order eq 'SCORE') ? ($relevance = 'ORDER BY Weight DESC') : ($relevance = "ORDER BY Name, $LINKS{build_sort_order} ASC");

to

Code:
# Set the order preference.
($order eq 'SCORE') ? ($relevance = 'ORDER BY score DESC') : ($relevance = "ORDER BY Name, $LINKS{build_sort_order} ASC");

Otherwise, the sort was on the 'weight' and not the SUM(Weight) stored as 'score'
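The difference between sorting on one hit's weight and sorting on the summed score can be shown with a toy table. This sketch uses Python's built-in SQLite and an invented `hits` table (one row per matched word per link); the real Links SQL schema is not shown in this thread, so treat every name here as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (LinkID INTEGER, Word TEXT, Weight INTEGER)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        (1, "auto", 3), (1, "shop", 3),  # link 1: two hits at weight 3, total 6
        (2, "shop", 5),                  # link 2: one hit at weight 5, total 5
    ],
)

base = "SELECT LinkID, SUM(Weight) AS score FROM hits GROUP BY LinkID "

# Ordering by the summed score ranks link 1 first (6 beats 5).
by_score = [row[0] for row in conn.execute(base + "ORDER BY score DESC")]

# Ordering by a single hit's weight (MAX used here to make it deterministic)
# ranks link 2 first, even though it matched fewer of the search words.
by_single_weight = [
    row[0] for row in conn.execute(base + "ORDER BY MAX(Weight) DESC")
]
```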



[This message has been edited by pugdog (edited September 29, 1999).]
Re: Searching Like Yahoo :) In reply to
PugDog, thank you very much. With your help I have found a way to make the searches better for all the users.

Using your instructions I made the search results work like Altavista. I hope that Alex can also make Links SQL work like Yahoo, so that people can select what they think is best for their sites.

Another question, PugDog: I can't make Links SQL search in the URL. How can I do that? I already gave the URL a higher weight, but it is not working.

What is the maximum number for the weight option?
Re: Searching Like Yahoo :) In reply to
PugDog, the search results are a lot better than before, but I also want to display the category that is related to every link.

For example, if I search for Microsoft Windows, I want the category Computer:Software to be displayed at the top of the list. Is that possible?


Thanks
Re: Searching Like Yahoo :) In reply to
 
Quote:
Another question PugDog, I can't make Links SQL searches in the URL, how can I do that, I already select a higher weight to the URL, but it is not working.

Re-index the site; that should update the weights in the index.

As for the category names -- I haven't quite figured that one out. It might not be possible without changing the code around a bit.
Re: Searching Like Yahoo :) In reply to
Well, it looks like Alex needs to make Links SQL better: you can search like Altavista, but you can't search like Yahoo.

It is important in a directory to display the related categories in every search so that people browse them and generate more pageviews.

Thanks anyway PugDog.
Re: Searching Like Yahoo :) In reply to
Supersearch: The product you mentioned uses Title LIKE '%my_phrase%' and as such needs to do a full table scan. This is not realistic on large sites, as even MySQL takes a while when a search has to look at every record.

Pugdog: Correct, that should be order by score, not by weight! Thanks!

One change I was thinking of making to the searches: normally Links groups by category and returns results by category name ascending. This really favours results that are in categories that start with a, b, c, etc.

How about if we switch to always returning results by score, and then just group the links together by category on a page? So on page one, you'd have the top 20 matches, but they would be grouped together by category. This means that on page 2 you might have a link in a category that has already appeared on page 1.
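That proposal can be sketched as follows, in Python for illustration only. The function name `page_grouped` and the result fields are invented, and the choice to list each category in the order of its best-scoring link on the page is an assumption, since the post does not say how the groups themselves would be ordered.

```python
def page_grouped(results, page, per_page=20):
    """Rank all results by score, slice out one page, then group it by category."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    chunk = ranked[(page - 1) * per_page : page * per_page]
    groups = {}
    for r in chunk:
        # dicts keep insertion order, so categories appear by their best hit.
        groups.setdefault(r["category"], []).append(r["title"])
    return list(groups.items())


results = [
    {"title": "A", "category": "Wheels", "score": 9},
    {"title": "B", "category": "Audio", "score": 8},
    {"title": "C", "category": "Wheels", "score": 7},
    {"title": "D", "category": "Audio", "score": 2},
]

page_one = page_grouped(results, page=1, per_page=3)
page_two = page_grouped(results, page=2, per_page=3)
```

With three results per page, "Audio" shows up on page one and again on page two, which is the side effect mentioned above.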

Cheers,

Alex
Re: Searching Like Yahoo :) In reply to
Alex...

This is the flexibility in the searches that I and others keep bringing up in different forums.

What some sites have done is go to a 'priority' ranking system, then randomly organize the rest of the links.

The way I would really like to see the links returned is with the category_linked attached to each individual link, so that it can be put either first, or at the end in tiny type, etc.

I don't like the group-by-category, since on a search (top ten, most visited, etc.) people are looking for the most relevant first -- not a category.

They will browse and search in a specific category, but they want a search to do the work for them, and return the most valid links first....

In fairness to the other sites, randomly mixing the returns within a weight/score or category, rather than using alpha order, would be nice. That way 'a' sites aren't given premier status.
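One way to get that effect is to keep the score order but shuffle links that tie on score. The sketch below is Python and purely illustrative; `rank_with_random_ties` and the result fields are made-up names, and it does not handle the 'priority' or 'new' flags.

```python
import random
from itertools import groupby


def rank_with_random_ties(results, rng=None):
    """Sort by score descending, shuffling links that share the same score."""
    rng = rng or random.Random()
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    mixed = []
    for _, tied in groupby(ranked, key=lambda r: r["score"]):
        tied = list(tied)
        rng.shuffle(tied)  # alphabetical position no longer decides the order
        mixed.extend(tied)
    return mixed


results = [
    {"title": "Aardvark Autos", "score": 5},
    {"title": "Zebra Wheels", "score": 5},
    {"title": "Midtown Stereo", "score": 8},
    {"title": "Best Parts", "score": 5},
]
mixed = rank_with_random_ties(results, rng=random.Random(1999))
```

The highest-scoring link always comes first; only the order among the three links that tie on 5 changes from one search to the next.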

Not an easy issue -- especially if you try to make it flexible -- but with 300 links in a category, alpha order is unfair to the later-lettered links.

You'd still want 'priority' and 'new' links to appear first, and searches to appear in ranked order.