Gossamer Forum

ignoring words that are too common

hi again. i was searching around the forums for a modification that would let my search simply ignore common words (not by numbers) such as 'and', 'of', 'it', 'the' etc. but couldn't find a definitive answer. i believe the ask.com mod ignores every instance of, say, 'and', so basically every word that contains those letters in that exact order will have them stripped out. for example, 'andrew' will be treated as 'rew' and 'bandstand' as 'bst'. please help! thank you!

Re: ignoring words that are too common
If you believe that's the case, you can just change the regex to remove whole words only.

Without seeing the code used, I'd say something like...
Code:
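my @ignored = qw(a i if is an it so to do);
# Build a lookup hash so each term is checked as a whole word.
my %ignored = map { $_ => 1 } @ignored;
# Keep only the terms that are not in the ignore list (case-insensitive).
# Filtering in one pass avoids changing @query while looping over it.
@query = grep { !$ignored{lc $_} } @query;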
Happy coding,

--Drew
http://www.camelsoup.com/links_mods/
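
For anyone comparing the two behaviours discussed above, here is a minimal sketch of substring removal versus whole-word removal. It is not taken from the ask.com mod or from search.cgi; the stop-word list and the query string are made up for illustration. The only difference between the two substitutions is the \b word-boundary anchors.

```perl
use strict;
use warnings;

# Hypothetical stop-word list and query, for illustration only.
my @ignored = qw(and of it the);
my $query   = 'andrew and the bandstand';

# Build one alternation; quotemeta escapes any regex metacharacters.
my $alt = join '|', map { quotemeta } @ignored;

# Substring removal: damages words that merely contain a stop word.
(my $substring = $query) =~ s/(?:$alt)//gi;

# Whole-word removal: the \b anchors leave 'andrew' and 'bandstand' intact.
(my $whole = $query) =~ s/\b(?:$alt)\b//gi;
$whole =~ s/\s+/ /g;        # collapse leftover whitespace
$whole =~ s/^\s+|\s+$//g;   # trim both ends

print "substring: '$substring'\n";
print "whole:     '$whole'\n";
```

With the substring version 'andrew' loses its first three letters, while the whole-word version only drops the standalone 'and' and 'the'.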

Re: ignoring words that are too common
where do i insert this piece of code? and how do i implement this into search.cgi?

Re: ignoring words that are too common
alright, managed to solve my problem. using what junko posted and bmx's mod, i combined the two to simply ignore every common word listed in blockterm.txt. it's different because if you type in "What is a bandstand?" as your query, it ignores the common words (what, is, a) and searches only on bandstand. it doesn't block the query; it skips the common words and searches on the remaining ones. moreover, 'bandstand' remains intact and is not treated as 'bst' just because it contains 'and'.

thanks to everyone for the help. took me the whole of yesterday and until 9pm tonight to solve this dilemma. perl is such a pain, but a joy at the same time. thanks bmx and junko!! i'll have to do more testing before i can post it here if anyone wants it.
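
Since the combined mod was not posted, here is a rough sketch of the skip-the-common-words approach described above, not the poster's actual code. It assumes blockterm.txt holds one word per line, which the thread does not confirm. The helpers load_stopwords and filter_query are hypothetical names and are not part of Links 2.0. An in-memory string stands in for the file so the sketch runs on its own.

```perl
use strict;
use warnings;

# Assumption: the block list holds one word per line.
# This string stands in for the contents of blockterm.txt.
my $blockterm = "what\nis\na\nand\nthe\n";

# Hypothetical helper: read stop words from a filehandle into a hash.
sub load_stopwords {
    my ($fh) = @_;
    my %stop;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim whitespace and the newline
        next unless length $line;   # skip blank lines
        $stop{lc $line} = 1;
    }
    return \%stop;
}

# Hypothetical helper: split the query on non-word characters,
# then drop every term that appears in the stop list.
sub filter_query {
    my ($query, $stop) = @_;
    my @terms = grep { length } split /\W+/, $query;
    return grep { !$stop->{lc $_} } @terms;
}

# Open the in-memory string as if it were a file.
open my $fh, '<', \$blockterm or die "cannot open block list: $!";
my $stop = load_stopwords($fh);
close $fh;

my @kept = filter_query('What is a bandstand?', $stop);
print join(' ', @kept), "\n";
```

To try it against a real list, open the file path in place of \$blockterm. Because terms are compared as whole words through a hash lookup, 'bandstand' is never altered by the presence of 'and' in the list.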