colossus.forbin at gmail
Dec 31, 2007, 2:34 PM
Post #8 of 8
On Dec 31, 2007 1:57 PM, Marvin Humphrey <marvin [at] rectangular> wrote:
> On Mon, Dec 31, 2007 at 09:21:24AM -0800, colossus forbin wrote:
> > This approach would make sense for a large site that expects a large
> > set of search terms, but what about a small site expecting a limited
> > number of terms, such as a small ecommerce site with a limited number
> > of products? If a user misspells a product name, it would make sense
> > to not only offer a corrected spelling, but perhaps suggest a similar
> > product which is carried by the site. These actions would be done at
> > run-time so it would be important to know which terms did not
> > contribute to any hits.
> There's a method on Searcher, doc_freq(), which returns an integer telling you
> how many documents a given term occurs in. Terms with a doc_freq of 0
> have no chance of contributing to a score.
> Searcher->doc_freq isn't public yet, but there's a good chance it will
> be exposed in time -- document frequency information will always be
> needed during the Query-to-Scorer compilation phase for weighting. I don't
> think the API has changed since 0.05, so go ahead and use it, with caution. :)
Excellent! Thank you.
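As a rough illustration of the runtime check described above, here is a minimal Python sketch rather than KinoSearch's Perl. It models the index as a hypothetical mapping from each term to the set of IDs of documents containing it, with a stand-in doc_freq() function. This is not KinoSearch's API. With KinoSearch you would ask the Searcher's doc_freq() method instead. The sketch also assumes the query terms have already been tokenized and normalized the same way the index was.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy index: term -> set of IDs of documents containing it.
# It stands in for a real KinoSearch index and is not its API.
INDEX: Dict[str, Set[int]] = {
    "widget": {1, 2, 5},
    "gadget": {3},
    "sprocket": {2, 4},
}


def doc_freq(index: Dict[str, Set[int]], term: str) -> int:
    """Number of documents the term occurs in (0 if the term is absent)."""
    return len(index.get(term, ()))


def dead_terms(index: Dict[str, Set[int]], terms: Iterable[str]) -> List[str]:
    """Query terms with a doc_freq of 0, i.e. terms that cannot contribute to any hit."""
    return [t for t in terms if doc_freq(index, t) == 0]


if __name__ == "__main__":
    query = ["widgit", "gadget"]
    print(dead_terms(INDEX, query))  # ['widgit']
```

The terms this returns are the candidates to pass on to a spelling suggester.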
> Once you've identified your terms, you need to figure out what to suggest.
> Aspell would help with ordinary words, but might not help with product names.
> Maybe one of the edit-distance CPAN modules could help.
Aspell should work with product names, since it allows the use of custom
dictionaries (not just supplemental ones). It also makes suggestions
based on both phonetic and edit-distance misspellings.
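To make the edit-distance idea concrete for a small catalog, here is a minimal Python sketch that ranks a hypothetical list of product names by Levenshtein distance to a misspelled term. It uses a hand-written Levenshtein function in place of a CPAN module or Aspell, covers only the edit-distance half (no phonetic matching), and the product list and the max_distance cutoff of 2 are assumptions chosen for illustration.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance; insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def suggest(term: str, vocabulary: List[str], max_distance: int = 2) -> List[str]:
    """Vocabulary entries within max_distance edits of term, closest first."""
    scored = [(levenshtein(term.lower(), word.lower()), word) for word in vocabulary]
    return [word for dist, word in sorted(scored) if dist <= max_distance]


if __name__ == "__main__":
    # Hypothetical product catalog for a small store.
    products = ["widget", "gadget", "sprocket", "gizmo"]
    print(suggest("widgit", products))  # ['widget']
```

For a catalog of a few thousand names, scanning the whole list per misspelled term is cheap; a larger vocabulary would call for an index such as a BK-tree or n-gram prefilter.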
I noticed that Xapian has support for misspelled terms, so maybe it
would make sense to have a recipe on the wiki on how to use KinoSearch
KinoSearch mailing list
KinoSearch [at] rectangular