Gossamer Forum

Search results improvement with the Links cgi scripts

So i just looked through that recent post about improving that silly thing with the search results, like:
a search for "art" also returns Arthur, artifacts or artists; that's not exactly professional.
They were talking about a fix within the HTML templates with some radio button or hidden input for "exact" search, but i have a Flash searchbox and i do not think that is the right solution anyway.

I would like to find a solution within the cgi script(s), which are written to return anything with the search term somewhere in it. Isn't there a way to mod the scripts to return only search results that match an exact full word, rather than also parts of words that contain the search term?
Re: [gossy] Search results improvement with the Links cgi scripts In reply to
oh, i've spent my due hours searching for an answer within the mods and the links2 forum.
one thing
jsu7785.hypermart.net/search.txt
no longer exists,
the other needs some radio button, and that's already all...
I am surprised how little interest folks seem to have in relevant search results. Is everyone just plain happy with
Bart Simpson as a search result for art?
I hope not

Re: [gossy] Search results improvement with the Links cgi scripts In reply to
The code in this thread works:

http://www.gossamer-threads.com/...cgi?post=33256#33256

(see also post #21 in that thread)

Then to make ALL searches use exact words, include this in your search forms:

<input type="hidden" name="exact" value="exact" />

To make it an option in your advanced search, use this (thanks KM):

exact word:<input type="radio" name="exact" value="exact" checked="checked" /> any form of word:<input type="radio" name="exact" value="" />

Of course, if you use the option above, do not use the hidden input line...


Leonard
aka PerlFlunkie
Re: [PerlFlunkie] Search results improvement with the Links cgi scripts In reply to
Thanks for that.
Now, i did find this post late last night, along with a similar one.
With the one you linked to i had only ONE BIG confusion:
it says
"in your search.html template", but what is quoted there is cgi script code, so i assume that really belongs in search.pl under the search sub, right??

Also last night i tried a few variations on just this
if ($in{'exact'}) {
$link_results =~ s,(<[^>]+>)|(\b\Q$term\E\b),defined($1) ? $1 : "<STRONG>$2</STRONG>",gie;
$category_results =~ s,(<[^>]+>)|(\b\Q$term\E\b),defined($1) ? $1 : "<STRONG>$2</STRONG>",gie;
}
but of course i did not add the
<input type="hidden" name="exact" value="exact" />
into the html template's search box, because i have that embedded in Flash, and as of now i am not sure how i can embed this hidden value into the Flash4 action

Re: [gossy] Search results improvement with the Links cgi scripts In reply to
You're right, it should say search.cgi, not template. You can do this for exact matching, no option ( note the changing of a : to a ; )...


# Save the reg expressions to avoid rebuilding.
$or_match = $bool ne 'and';
if ($or_match) {
for (0 .. $#{$search_terms}) {
next if (length ${$search_terms}[$_] < 2); # Skip single letter words.

$tmp .= "m/\\b\Q${$search_terms}[$_]\E\\b/io &#0124; &#0124;";

}
}
else {
for (0 .. $#{$search_terms}) {
next if (length ${$search_terms}[$_] < 2); # Skip single letter words.

$tmp .= "m/\\b\Q${$search_terms}[$_]\E\\b/io &&";

}
}
chop ($tmp); chop ($tmp);


and for the other piece of code, change it from this

if ($in{'exact'}) {
$link_results =~ s,(<[^>]+> )|(\b\Q$term\E\b),defined($1) ? $1 : "<STRONG>$2</STRONG>",gie;
$category_results =~ s,(<[^>]+> )|(\b\Q$term\E\b),defined($1) ? $1 : "<STRONG>$2</STRONG>",gie;
}
else {
$link_results =~ s,(<[^>]+> )|(\Q$term\E),defined($1) ? $1 : "<STRONG>$2</STRONG>",gie;
$category_results =~ s,(<[^>]+> )|(\Q$term\E),defined($1) ? $1 : "<STRONG>$2</STRONG>",gie;
}


to this:

$link_results =~ s,(<[^>]+>)|(\b\Q$term\E\b),defined($1) ? $1 : "<STRONG>$2</STRONG>",gie;
$category_results =~ s,(<[^>]+>)|(\b\Q$term\E\b),defined($1) ? $1 : "<STRONG>$2</STRONG>",gie;


## Note this from the original post.
If there is a space after the +> take it out!
I just made the correction to the code above.

This should do the job for you.
You will not need the radio-button option or the hidden input code, as now ALL searches will return exact matches only.

For the observant, the only difference now is the addition of the two \b --

|(\b\Q$term\E\b),

This means that the search term (and ONLY the search term) has to fit between the beginning and the end of a word. Without those, ANY pattern match satisfies the search, which gets you shampoo when you searched for ham.
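The effect of the two \b anchors can be checked outside of Links with a few lines of Perl. This is only a minimal sketch: term_matches and the sample titles are made up for illustration and are not part of search.cgi.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Links 2.0: true if $text contains $term.
# With $exact set, the term has to sit between two word boundaries (\b).
sub term_matches {
    my ($text, $term, $exact) = @_;
    return $exact
        ? scalar($text =~ /\b\Q$term\E\b/i)
        : scalar($text =~ /\Q$term\E/i);
}

# Made-up titles: only the first one contains "ham" as a whole word.
my @titles = ('Ham recipes', 'Shampoo shop', 'Hamster care');
for my $title (@titles) {
    printf "%-14s substring=%d exact=%d\n", $title,
        term_matches($title, 'ham', 0) ? 1 : 0,
        term_matches($title, 'ham', 1) ? 1 : 0;
}
```

All three titles match as a substring, but only "Ham recipes" matches once the word boundaries are added.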


Leonard
aka PerlFlunkie

Last edited by:

PerlFlunkie: Feb 18, 2005, 10:19 AM
Re: [PerlFlunkie] Search results improvement with the Links cgi scripts In reply to
At first glance that's a lot more than i expected, an actual solution like i was looking for. Hey, really, great job indeed, thanks.
Now of course i have to test it all out and see what happens, and i will post a comment on it back here in a day or two. It would be great to put something positive here; next step is to take on Google... in another 100 years...
But no joking, improving search result relevancy is something a few folks should try to sink their teeth into, it's so important these days.
Re: [gossy] Search results improvement with the Links cgi scripts In reply to
Have you removed the Category name from your search results? Use the Relevency mod, and make this correction to it (remove the [$db_category] in the push line, which was marked in red):

# If we have a hit, add it in!
if (($or_match && $match) or $andmatch) {
    push (@{$link_results{$values[$db_category]}}, @values);
    $numhits++; # We have a match!
}


And now your searches will sort by relevency only, not by relevency per category.

Another way to aid your searches is to use the keyword mod (3 Shetsapolov mods). This creates a field for each link that is not seen by visitors, but that the search looks to for keywords related to the site, words not in the title or description.


Leonard
aka PerlFlunkie

Last edited by:

PerlFlunkie: Feb 18, 2005, 11:28 AM
Re: [PerlFlunkie] Search results improvement with the Links cgi scripts In reply to
I did not yet get to your latest suggestion on increased relevancy - but will right now,
-- but first i have 1 tiny little problem still to solve with the mod for EXACT search results on ALL queries:
It works already, that's nice to have, - but the bolding is gone, it sure looks more informative to have the matching search term within the Title and Description texts stand out, probably it is somehow possible to add this back into it???

Re: [PerlFlunkie] Search results improvement with the Links cgi scripts In reply to
Done that, works. Thanks for the great suggestions, much appreciated; only the now-missing BOLDING issue to go.
Well, if i may ask here: now that i have a different relevancy for the search results (i never displayed the categories on the search results pages) and have checked it on a couple of simple test searches, i can't really tell what now determines the relevancy.
(It's been too long since i was poking around the scripts to set everything up the way i need. The only thing i remember doing is within the categories, where i set it up to display the links by submission date: first submission to a category / sub-category at the top of the first page, then descending.) But now that we removed that
[$db_category] from search.pl, i guess that is not what determines the relevancy???
The reason why i ask is:
Can we muck around with the relevancy settings somewhere, and to what extent???
Re: [PerlFlunkie] Search results improvement with the Links cgi scripts In reply to
OK, the BOLDING now works, i overlooked that i needed to remove these
if ($in{'exact'}) {
$link_results =~ s,(<[^>]+> )|(\b\Q$term\E\b),defined($1) ? $1 : "<STRONG>$2</STRONG>",gie;
$category_results =~ s,(<[^>]+> )|(\b\Q$term\E\b),defined($1) ? $1 : "<STRONG>$2</STRONG>",gie;
}
from the previous attempt.
so, to make the process easy for others here the complete sections WITH the changes that give EXACT search results returns WITHOUT any changes to any html templates:
In search.pl (or search.cgi)
# If we want to bold the search terms...
if ($search_bold) {
    foreach $term (@search_terms) {
        # This reg expression will do the trick, and doesn't bold things inside <> tags such as
        # URL's.
        $link_results =~ s,(<[^>]+>)|(\b\Q$term\E\b),defined($1) ? $1 : "<STRONG>$2</STRONG>",gie;
        $category_results =~ s,(<[^>]+>)|(\b\Q$term\E\b),defined($1) ? $1 : "<STRONG>$2</STRONG>",gie;
    }
}

# If we have to many hits, let's build the next toolbar, and return only the hits we want.

and a bit further down in - well, here i made a VERY INTERESTING discovery.
I used first the original (with only the mod above) section in
sub search
# Save the reg expressions to avoid rebuilding.
$or_match = $bool ne 'and';
if ($or_match) {
for (0 .. $#{$search_terms}) {
next if (length ${$search_terms}[$_] < 2); # Skip single letter words.
$tmp .= "m/\Q${$search_terms}[$_]\E/io ||";
}
}
else {
for (0 .. $#{$search_terms}) {
next if (length ${$search_terms}[$_] < 2); # Skip single letter words.
$tmp .= "m/\Q${$search_terms}[$_]\E/io &&";
}
}
chop ($tmp); chop ($tmp);
instead of the suggested mod for this section:
# Save the reg expressions to avoid rebuilding.
$or_match = $bool ne 'and';
if ($or_match) {
for (0 .. $#{$search_terms}) {
next if (length ${$search_terms}[$_] < 2); # Skip single letter words.
$tmp .= "m/\\b\Q${$search_terms}[$_]\E\\b/io &#0124; &#0124;";
}
}
else {
for (0 .. $#{$search_terms}) {
next if (length ${$search_terms}[$_] < 2); # Skip single letter words.
$tmp .= "m/\\b\Q${$search_terms}[$_]\E\\b/io &&";
}
}
chop ($tmp); chop ($tmp);

The interesting thing was that without the changes in
sub search
the search results returned also contained links that DID NOT CONTAIN the actual search term within the Title or Description!
However, they did match the search criteria!
Example:
I used the search term "artist".
It returned of course all links with this exact word in title and/or description, but also other links from artists that did not have the word artist in either title or description.
i don't understand this part, but
A: i would like to
B: i kinda like it
So, now that i have at least the EXACT SEARCH RESULTS worked out, i still have one thing to go into:
i do not understand yet where the RELEVANCY of the results comes from or how to optimize it.

Re: [gossy] Search results improvement with the Links cgi scripts In reply to
Actually this line should have two pipes (||), the &#0124; was put in due to the old forum's tendency to mess up the proper code:

$tmp .= "m/\\b\Q${$search_terms}[$_]\E\\b/io &#0124; &#0124;";

$tmp .= "m/\\b\Q${$search_terms}[$_]\E\\b/io ||";
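Why the trailing || matters becomes clearer when the whole expression string is built and compiled in isolation. The following is a rough sketch, not the Links 2.0 source: build_matcher is a hypothetical stand-in for that part of sub search, and compiling the string with eval into a sub that tests $_ is an assumption based on the &{$regexp} calls in search.cgi.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the regexp-building part of "sub search".
# $terms is an array ref, $bool is 'and' or 'or', $exact adds \b around terms.
sub build_matcher {
    my ($terms, $bool, $exact) = @_;
    my $op  = $bool eq 'and' ? '&&' : '||';
    my $wb  = $exact ? '\\b' : '';
    my $tmp = '';
    for my $term (@$terms) {
        next if length($term) < 2;           # skip single letter words
        $tmp .= "m/$wb\Q$term\E$wb/i $op";   # e.g.  m/\bart\b/i ||
    }
    chop $tmp; chop $tmp;                    # drop the trailing || or &&
    return eval "sub { $tmp }";              # compiled sub, matches against $_
}

my $loose = build_matcher(['art'], 'or', 0);
my $exact = build_matcher(['art'], 'or', 1);
for ('Modern art', 'Bart Simpson') {
    printf "%-13s loose=%s exact=%s\n", $_,
        ($loose->() ? 'hit' : 'miss'), ($exact->() ? 'hit' : 'miss');
}
```

The two chop calls only remove a clean two-character operator. If the string ends in anything else, the expression handed to eval is no longer the one intended.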



The relevency mod is explained in the instructions:

... it is first matching any words contained in the relevency fields. Each word it picks up in those fields it adds 1 to the relevency score. Then it goes through the title and category. Here it adds 2 for each one it sees there hence more emphasis is placed on the category or title.

#Any extra fields such as url and description you'd like to have it on
@relevency_fields = (2,5);
#These are the field numbers you want the relevency search to look through. Don't use title or category as these are already
#in use further on. Field 2 is URL and field 5 is description.


If you use the keyword mod, remember to add the keyword field to the fields to be searched by the relevency mod.

Leaving the [$db_category] in will cause the results to sort by relevency WITHIN each category, so your results may look like:

link A relevency: 100
link B relevency: 50
link C relevency: 100
link D relevency: 50
link E relevency: 20
link F relevency: 100

...when of course they should look like this:

link A relevency: 100
link B relevency: 100
link C relevency: 100
link D relevency: 50
link E relevency: 50
link F relevency: 20

...which is the results without ANY relation to the categories.
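The difference between the two listings comes down to the hash key the hits are collected under. Here is a small sketch with made-up data: ordered_titles is a hypothetical helper, and the category names and the tie-break by title are added only for illustration.

```perl
use strict;
use warnings;

# Made-up hits: [title, category, relevency score].
my @hits = (
    ['link A', 'Cat 1', 100], ['link B', 'Cat 1', 50],
    ['link C', 'Cat 2', 100], ['link D', 'Cat 2', 50],
    ['link E', 'Cat 2', 20],  ['link F', 'Cat 3', 100],
);

# Hypothetical helper: collect hits into buckets by the key that $key_for
# returns, sort each bucket by score (highest first, ties by title), then
# list bucket after bucket.
sub ordered_titles {
    my ($key_for, @rows) = @_;
    my %buckets;
    push @{ $buckets{ $key_for->($_) } }, $_ for @rows;
    my @out;
    for my $key (sort keys %buckets) {
        push @out, map  { $_->[0] }
                   sort { $b->[2] <=> $a->[2] or $a->[0] cmp $b->[0] }
                   @{ $buckets{$key} };
    }
    return @out;
}

my @per_category = ordered_titles(sub { $_[0][1] }, @hits);  # keyed by category
my @overall      = ordered_titles(sub { '' },       @hits);  # one shared bucket
print "per category: @per_category\n";
print "overall:      @overall\n";
```

Keyed by category, the 100/50/100/50/20/100 pattern from the first listing comes back. With one shared key, all the 100s come first.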

Do a search on my site, and you will see that the categories are scattered, not grouped (including them is another code hack...).


Leonard
aka PerlFlunkie
Re: [PerlFlunkie] Search results improvement with the Links cgi scripts In reply to
I just made the change to
$tmp .= "m/\\b\Q${$search_terms}[$_]\E\\b/io ||";
apparently there is no difference, it worked both ways and the results are the same.

Now i looked at the search results on your site. They are clean and well done.
On my site, however, i have taken out anything other than TITLE and DESCRIPTION
(see http://useroo.businessresearchsources.com/Useroo.html )
also, in
links.def
i have these settings:
# Field names you want to allow visitors to search on:
@search_fields = (1,5);

so again only title and description, as i find URLs often have little to do with a site's content, and the rest, ratings and such, just clogs up your server space (i posted a mod in the Links mod section a few years ago to remove all the "waste" that can be built by links2).

Now i want to get into the relevancy issue.
I did find the mod from gb resources, but i am not clear on this;
Is it only for the CATEGORIES???
My categories are listed the way i want them to (by submission date), i do not need to change anything in there, what i want to improve is only the results of a search query via search (keyword) box.
And in that again only for TITLE and DESCRIPTION,
so, this section of your answer i can currently not put together;
@relevency_fields = (2,5);
#These are the field numbers you want the relevency search to look through. Don't use title or category as these are already
#in use further on. Field 2 is URL and field 5 is description.

If you use the keyword mod, remember to add the keyword field to the fields to be searched by the relevency mod.

Keyword mod i have not found yet.
Where those relevancy fields go in i am not sure.
(i do not actually want to display any relevancy numbers on the search results like on your site, just make the results more relevant; in short,
have all search results that contain the exact search term in the title AND description ONCE be listed first, those with it only in the title second, and all results that have the exact search term only in the description at the end.)
Do i need the entire relevancy mod for this, or just the keyword mod i have yet to find???
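The ordering asked for here (title AND description first, title only second, description only last) could be sketched as a plain tiered sort. This is neither the Relevency mod nor Links 2.0 code: match_tier and sort_by_tier are hypothetical names, and the links are made-up hash refs.

```perl
use strict;
use warnings;

# Hypothetical helper: a lower tier number sorts first.
sub match_tier {
    my ($term, $title, $description) = @_;
    my $in_title = $title       =~ /\b\Q$term\E\b/i;
    my $in_desc  = $description =~ /\b\Q$term\E\b/i;
    return 0 if $in_title && $in_desc;   # exact match in both
    return 1 if $in_title;               # exact match in title only
    return 2 if $in_desc;                # exact match in description only
    return 3;                            # no exact match at all
}

# Sort links (hash refs with title and description) by tier, then by title.
sub sort_by_tier {
    my ($term, @links) = @_;
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] or $a->[1]{title} cmp $b->[1]{title} }
           map  { [ match_tier($term, $_->{title}, $_->{description}), $_ ] }
           @links;
}

my @links = (
    { title => 'Ham shop',  description => 'Shampoo too' },
    { title => 'Deli',      description => 'Fresh ham daily' },
    { title => 'Ham world', description => 'All about ham' },
);
print "$_->{title}\n" for sort_by_tier('ham', @links);
```

Computing the tier once per link and sorting on it keeps the comparison cheap. Ties inside a tier fall back to the title here, which is an arbitrary choice.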

Here some observation "on the job".
Not changing
$tmp .= "m/\Q${$search_terms}[$_]\E/io ||";
$tmp .= "m/\Q${$search_terms}[$_]\E/io &&";
while having changed
$link_results =~ s,(<[^>]+>)|(\b\Q$term\E\b),defined($1) ? $1 : "<STRONG>$2</STRONG>",gie;
$category_results =~ s,(<[^>]+>)|(\b\Q$term\E\b),defined($1) ? $1 : "<STRONG>$2</STRONG>",gie;
makes for the interesting mix i mentioned in my previous comment.
It only marks EXACT matches as bold but still also returns the old "ham"ster for "ham" as a search term. Sometimes that is useful, sometimes it is just irrelevant; i guess i can always play around with it according to what's really in my database.
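This mix follows from matching and bolding being two separate regular expressions. The bolding half can be tried on its own with the sketch below, where bold_term is a hypothetical wrapper around the same substitution idea as in search.cgi and the HTML snippet is made up.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the bolding substitution: anything inside
# <...> tags is copied through untouched, matches outside tags are wrapped
# in <STRONG>. With $exact set, only whole words are bolded.
sub bold_term {
    my ($html, $term, $exact) = @_;
    my $re = $exact ? qr/\b\Q$term\E\b/i : qr/\Q$term\E/i;
    $html =~ s{(<[^>]+>)|($re)}{ defined $1 ? $1 : "<STRONG>$2</STRONG>" }ge;
    return $html;
}

my $html = '<a href="ham.html">Ham</a> and hamster';
print bold_term($html, 'ham', 1), "\n";   # exact: only "Ham" is bolded
print bold_term($html, 'ham', 0), "\n";   # loose: "ham" in hamster as well
```

A link found by the loose matcher in sub search still shows up in the results, but with the \b version of the bolding regex nothing in it gets highlighted unless the whole word is there.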

In this section:
# If we have a hit, add it in!
if (($or_match && $match) or $andmatch) {
push (@{$link_results{$values}}, @values);
$numhits++; # We have a match!
}

# Check to see if the category matches.
if ($regexp and !$seen{$values[$db_category]}++) {
$match=0; $andmatch = 1;
$_ = $values[$db_category];
$or_match ?
($match = $match || &{$regexp}) :
($match = &{$regexp});
$andmatch = $andmatch && $match;

if (($or_match && $match) or $andmatch) {
$numcat++;
push (@category_results, $values[$db_category]);
}
}
}
close DB;
i only removed the db_category as told, no difference yet, i assume this only works in connection with the rest of the relevancy mod??
Also i was wondering about the other
[$db_category] that are inside the # Check to see if the category matches. do they not need to be removed as well?

Re: [gossy] Search results improvement with the Links cgi scripts In reply to
  
Keywords mod: http://sport.kc.ru/Links/3sh.htm (from Resource Section)

Relevency mod: http://cgi-resource.co.uk/pages/relevency.shtml

Quote:
Now i want to get into the relevancy issue.
I did find the mod from gb resources, but i am not clear on this;
Is it only for the CATEGORIES???


No, it is for links, but it will sort by relevency WITHIN each category:

Category 1
link A relevency: 100
link B relevency: 50

Category 2
link A relevency: 100
link B relevency: 50
link C relevency: 20

Category 3
link A relevency: 100

Removing the [$db_category] is to be done only after installing the relevency mod, and after doing this in search.cgi:

# Go through the hash just built, and build the complete link output. Store in $link_results.
foreach $setoflinks (sort keys %link_output) {
    $cat_clean = &build_clean ($setoflinks);
    $title_linked = &build_linked_title ($setoflinks);
    # blocking next line will remove category results >
    # $link_results .= qq|<P>$title_linked\n|;
    $link_results .= $link_output{$setoflinks};
}

These changes will result in your search results sorting ONLY by relevency:

link A relevency: 100
link B relevency: 100
link C relevency: 100
link D relevency: 50
link E relevency: 50
link F relevency: 20

And of course you do not need the Relevency: tag in your code, I have that in while I continue to test the mods.

Quote:
My categories are listed the way i want them to (by submission date), i do not need to change anything in there, what i want to improve is only the results of a search query via search (keyword) box.


These mods affect ONLY the search results.

Quote:
And in that again only for TITLE and DESCRIPTION,
so, this section of your answer i can currently not put together;
@relevency_fields = (2,5);


These are fields (URL and Description) that the relevency mod uses to rate a link, in addition to Title and Category. You can change them so that the URL is not included, and to include the keyword mod.

Quote:
Do i need the entire relevancy mod for this, or just the keyword mod i yet have to find???


Not really, the search will use the keyword field without installing the Relevency mod. But, the Relevency mod will sort the links in a more logical manner.

Quote:
Also i was wondering about the other [$db_category] that are inside the # Check to see if the category matches. do they not need to be removed as well?


Apparently not. That section is for categories, not links, so I suspect it is currently useless since I do not sort by categories at all.


Leonard
aka PerlFlunkie
Re: [PerlFlunkie] Search results improvement with the Links cgi scripts In reply to
Thanks for your patience and good explaining " Enthusiast " (it helps build enthusiasm).
Funny, i already had this line blocked, must have been from another older mod or so.
# blocking next line will remove category results >
# $link_results .= qq|<P>$title_linked\n|;
The mod from Sergey i need to study first, has me a bit confused, it seems to create some sort of "Search queries backup" or so??
Do i actually need this KEYWORD mod or parts of it at all to just have the search results appear sorted like this;
on top = EXACT match in title AND description
in the middle = EXACT match only in the title
at the bottom = EXACT match only in the description
and of course only one word in either title or description
or can the RELEVANCY mod alone do this???

Are you saying i do not need to do these changes from within the RELEVANCY mod

Add: ###Relevency search mod (cgi-resource.co.uk)####
$values[$db_votes] = 0;
foreach $term (split (/\s/, $in{'query'})) {
FIELD: foreach $field (@relevency_fields) {
and so on....
if i do not show any relevancy percentages in the html code of the search_results template???

What i also don't understand is why it would need the
template called searchlink.html
and hence the
new sub in site_html_templates.pl called sub_html_searchlink
if i do not want to display any relevancy figure in the search results anyway, but only want the sorting by relevancy as said above???

Re: [gossy] Search results improvement with the Links cgi scripts In reply to
The Keyword mod does also have a search term log, you do not need to use it if not wanted.

The Keyword mod is not needed for anything but to include words ONLY used by the Links2 search function. These words do not have to appear anywhere else (Title, Description, etc.)

Quote:

Do i actually need this KEYWORD mod or parts of it at all to just have the search results appear sorted like this;
on top = EXACT match in title AND description
in the middle = EXACT match only in the title
at the bottom = EXACT match only in the description
and of course only one word in either title or description
or can the RELEVANCY mod alone do this???


There are three mods in that quote: Keyword, Exact Match, and Relevency. Any one can be used alone or in any combination with the others.


If you are going to use the relevency mod, and also use Votes, you would need to add a new Relevency field to links.def. OR, it may work without writing to a field; I have not tried that. BUT, you can use it as written, using the Votes field, and just not show the rank in your results by leaving out the Relevency: <%Votes%>.

On my site, you may notice that the "normal" link is different than the searchlink. It has different information, and this is done with the new searchlink template. No, it is not required, you can have your search results use the same link.html as your normal links.


Leonard
aka PerlFlunkie
Re: [PerlFlunkie] Search results improvement with the Links cgi scripts In reply to
In the meantime i found my old mod
http://www.gossamer-threads.com/perl/gforum/gforum.cgi?post=218198;do=post_view_threaded#218198
it shows you, i have already long time ago stopped or removed all the
"cool" "new" & "rating" from being built for good
and of course i have the EXACT MATCH mod done now, and i guess it's clear the
Keyword mod is at present not needed.

So, the Relevancy mod is the only one i still have to install, but obviously only a segment of it, the simplified version so to speak.

Now i have made a first attempt.
I went step-by-step.
I did not make a new template hence did not make any changes to
site_html_templates:
sub site_html_searchlink

I added
(now in search.cgi)
@relevency_fields = (5);
left
$link_output{$setoflinks} .= &site_html_link (%tmp) . "\n";
as is because i want to use the regular links template
I added the 3 maxrelevent plus
my $relvar = (($maxrelevent) * 3);
to the reg expressions section - and saved and uploaded it.
This of course did not work as far as the actual sorting by relevancy is concerned, but at least i did not get a "Internal Server Error".
So, now i added
###Relevency search mod (cgi-resource.co.uk)####
in full
changed to
@{$link_results{$link}} = &dynamic_sorthit (@{$link_results{$link}});
and eventually added
sub dynamic_sorthit {
i did this carefully to avoid any typos and of course did not forget the CHMOD.
However, now i got an "Internal Server Error"
and i assume it has something to do with the

$sort_it = $votes;
$dbsort_it = $db_votes;
because i have disabled that whole vote stuff all together a long time ago.
I don't see any "db_votes" file in my entire Links2 dir........
IS THIS INDEED THE PROBLEM???
If so, how to get around it???
So, i went back to the step by step thing and changed the bottom 2 mod sections back to original so that i had been at the
###Relevency search mod end####
installed and to this point no Internal Server Error - but of course still no relevancy sorting either.
Next step was putting in
@{$link_results{$link}} = &dynamic_sorthit (@{$link_results{$link}});
of course i got a "fatal error" pointing to the missing subroutine,
but once i put that subroutine at the bottom of search.cgi i am back at the Internal Server Error. i made sure i copy&pasted exactly what's in the mod...
So, somewhere there is a worm in it...
I made one last try in removing the category stuff as i do not display those, but that did not change anything.
How can i sort this out??