Gossamer Forum

Ignoring stopwords in search results

I am using dbman to manage lists of journals.
Some of the journals have titles like "The Lancet" & "The Journal of Extrasensory Perception".
When I do a keyword search and retrieve an alphabetical list of titles, these titles appear under the T's (i.e. sorted on the word "The").
Any ideas about how to get around this?
Re: Ignoring stopwords in search results
To sort the results alphabetically by the Title field, all you have to do is add the following hidden fields to the form in sub html_view_search:

Code:
<input type="hidden" name="so" value="ascend">
<input type="hidden" name="sb" value="3">

Change 3 to the field number of your Title field.

Regards,

------------------
Eliot Lee
Anthro TECH,L.L.C
www.anthrotech.com
----------------------

Re: Ignoring stopwords in search results
I think you misunderstand me. I know how to sort by the title field.

My problem is that several hundred entries in the title field begin with the word "The" and therefore these entries appear under the letter T in an alphabetical sort.

For example, I would like the entry "The Lancet" to be treated as if it were just "Lancet".

An obvious solution would be to remove the "The" before the record is added to the dbman database. Unfortunately, I have very little control over how the data is provided to me.

My own thought was there might be some way of automatically generating a field in the database which is a mirror of the title field without the "The" prefix.
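
A rough sketch of that mirror-field idea, in case it helps anyone reading along. The helper name sort_key and the SortTitle field are made up for illustration and are not part of DBMan; the record is assumed to be a plain hash with a Title field. Only a leading "The" is dropped.

```perl
use strict;
use warnings;

# Hypothetical helper (not a DBMan routine): build a sort key from a
# title by dropping a leading "The" and lowercasing what is left.
sub sort_key {
    my ($title) = @_;
    my $key = $title;
    $key =~ s/^\s+//;         # trim leading whitespace
    $key =~ s/^the\s+//i;     # drop one leading "The" (any case)
    return lc $key;
}

# Assumed record layout: a hash with a Title field. SortTitle is the
# hypothetical mirror field that the list would be sorted on.
my %rec = (Title => 'The Lancet');
$rec{SortTitle} = sort_key($rec{Title});

print "$rec{Title} => $rec{SortTitle}\n";   # The Lancet => lancet
```

Because the pattern requires whitespace after "the", a title such as "Theology Today" is left alone.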

Re: Ignoring stopwords in search results
Okay....

Then try the following:

1) Add the following variable to your default.cfg file, keeping the whole string on a single line:

Code:
$refnormlang = "your you www with will why who which where when what web we was want w used use two to this they these there then them their the that than t so site should see s re quot page our other org or only one on of now not no new net nbsp name n my ms mrs mr most more me may lt like just its it is in if i http how he have has gt get from for find ed do d com can by but been be b at as are any and an amp also all after about a";

(You can add or delete words between the double quotes.)

2) Then add the following code to your sub html_view_success routine in the html.pl file:

Code:
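# Collect every submitted field value into one comma-separated string.
foreach $column (@db_cols) {
    if ($in{$column}) {
        $search_terms .= " $in{$column},";
    }
}
chop($search_terms);    # drop the trailing comma

# Replace each stopword with a space (case-insensitive, whole words only).
# Assumption: $search_terms, built above, is the string to filter.
@normlang = split(' ', $refnormlang);
foreach $excnormlang (@normlang) {
    $search_terms =~ s/\b\Q$excnormlang\E\b/ /gi;
}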
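
The stopword-stripping step can also be tried on its own, outside DBMan. This is only a sketch: strip_stopwords is a hypothetical helper name, and the word list is cut down to a few entries so the effect is easy to follow.

```perl
use strict;
use warnings;

# Shortened, hypothetical stopword list; the full one would live in
# default.cfg as $refnormlang.
my $refnormlang = "the of and a an in";

# Hypothetical helper: remove every stopword from a string.
sub strip_stopwords {
    my ($text, $stopwords) = @_;
    for my $word (split ' ', $stopwords) {
        # \Q...\E quotes any regex metacharacters in the word;
        # /i makes "The" match the lowercase list entry "the".
        $text =~ s/\b\Q$word\E\b/ /gi;
    }
    $text =~ s/\s+/ /g;         # collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;    # trim both ends
    return $text;
}

print strip_stopwords("The Journal of Extrasensory Perception", $refnormlang), "\n";
```

Note that this removes stopwords anywhere in the string ("of" as well as "The"), which is not the same thing as dropping only a leading "The" from a title before sorting.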

Source: Human Language like Ask.com Mod in the
Gossamer Threads Resources: Links: Modifications: Links2

gamefinder.psybercounsellor.com/mods/hls.html

I REALLY hope this helps.

Regards,

------------------
Eliot Lee
Anthro TECH,L.L.C
www.anthrotech.com
----------------------

[This message has been edited by Eliot (edited January 13, 2000).]
Re: Ignoring stopwords in search results
I really appreciate the tip from Eliot.
I couldn't get it to work straight away, but I will have a fiddle and post a solution if I am successful.
One thing: the code you sent me seemed to have too many }'s.
Re: Ignoring stopwords in search results
You probably added an extra right bracket. If you look at the code more carefully, you will see that the left and right brackets MATCH!

How about posting the error messages you are receiving? That would HELP us to HELP YOU.

Regards,

------------------
Eliot Lee
Anthro TECH,L.L.C
www.anthrotech.com
----------------------

[This message has been edited by Eliot (edited January 13, 2000).]
Re: Ignoring stopwords in search results
I could swear there is an extra } in the code you provided. The error message I get is the usual "cgi programming fault - you have an unmatched } in your document at ...".

I have tried various permutations of the code (less one }) and I get a result but with no change from my original output.

If you go to

http://www.library.unsw.edu.au/cgi-bin/Data/db.cgi?db=ejcat&uid=default&Title-gt=T&Title-lt=U&mh=300&sb=1&so=ascend&view_records=View+Records

you will see exactly the problem that I have. This is the T page of a journal list: note all the "The American Journal of ..." titles, which I would like to see appearing on the A page.
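
To make the wanted behaviour concrete, here is a small standalone sketch of sorting a title list while ignoring a leading "The", and of working out which titles belong on a given letter page. Nothing here is DBMan code: sort_key is a hypothetical helper, and "Thorax" and "Annals of Botany" were added to the sample list purely for illustration.

```perl
use strict;
use warnings;

# Sample titles; the last two are added only for illustration.
my @titles = (
    'The Lancet',
    'The Journal of Extrasensory Perception',
    'Thorax',
    'Annals of Botany',
);

# Hypothetical helper: title without a leading "The", lowercased.
sub sort_key {
    my ($title) = @_;
    (my $key = $title) =~ s/^\s*the\s+//i;
    return lc $key;
}

# Compute each key once, sort on the key, then keep the original title.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ sort_key($_), $_ ] } @titles;

# Titles that belong on the "T" page once the article is ignored.
my @t_page = grep { substr(sort_key($_), 0, 1) eq 't' } @sorted;

print "$_\n" for @sorted;
print "T page: @t_page\n";
```

With this sample list, "The Lancet" sorts between the J and T entries, and only "Thorax" is left on the T page.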
Re: Ignoring stopwords in search results
Try adding "The" to the list of words in the $refnormlang variable.

Regards,

------------------
Eliot Lee
Anthro TECH,L.L.C
www.anthrotech.com
----------------------