marvin at rectangular
Jul 11, 2006, 9:19 PM
On Jul 11, 2006, at 6:24 AM, henka [at] cityweb wrote:
> How can one influence the ranking of results? Let's say you have a
> special field with an integer value which is determined not by the
> indexer, but by some other algorithm, and when this index page is hit
> because of a standard search query, you would like this special field
> to influence the ranking.
> Can this be achieved?
At present, only with hacks, and not for large datasets.

You've described the Sort functionality we've been discussing in
other threads. Lucene implements it using a FieldCache, which is what
we've been faking with Perl arrays. A FieldCache in Lucene is an array
of field values indexed by document number, just like ours, but the
values aren't retrieved from stored documents as Gavin is doing.
Instead, they are loaded from the term dictionary and parsed as
integers, floats, or strings, and an array of the corresponding
low-level type is built up. This technique is still memory-intensive
for large document collections, but considerably less so than a Perl
array.
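To make the idea concrete, here is a minimal Java sketch of building
an integer cache from a term dictionary. It is not Lucene's FieldCache
API: the Map of term text to document numbers is a hypothetical
stand-in for the term dictionary and its postings, and buildIntCache
is an illustrative name. It assumes each document has at most one term
in the field and that every term parses as an integer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class IntFieldCacheSketch {
    /**
     * Builds an int[] indexed by document number from a simplified term
     * dictionary (hypothetical stand-in): each key is a term's text, which
     * is the field value, and each value lists the documents containing it.
     * Documents with no term in the field keep the default value 0.
     */
    static int[] buildIntCache(Map<String, List<Integer>> termDictionary, int maxDoc) {
        int[] cache = new int[maxDoc];
        for (Map.Entry<String, List<Integer>> entry : termDictionary.entrySet()) {
            // Parse the term text once per term, not once per document.
            int value = Integer.parseInt(entry.getKey());
            for (int docNum : entry.getValue()) {
                cache[docNum] = value; // one primitive slot per document
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> terms = new TreeMap<>();
        terms.put("10", List.of(2));
        terms.put("42", List.of(0, 3));
        terms.put("7", List.of(1));
        int[] cache = buildIntCache(terms, 4);
        System.out.println(Arrays.toString(cache)); // prints [42, 7, 10, 42]
    }
}
```

Each document costs one 4-byte int slot rather than a full Perl scalar,
which is much of why this approach is leaner than a Perl array.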

Inverted indexes excel at relevance scoring. Sorting on secondary
fields while reading from disk is not their forte, because that
information is not normally stored in the data structures that are
used for scoring and heavily optimized for speed. However, if you have
the memory and the time to pre-load, the FieldCache technique is quite
efficient. It ought to be faster, for instance, than naively sorting
document numbers obtained during a search against rows in a flat-file
database: if you already have the sort field values, all you need are
document numbers, and those are quicker to extract from a Lucene index
than from a data file with fixed-width rows.
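A minimal sketch of the sorting step, again in Java and again
hypothetical rather than Lucene's Sort API: the search supplies only
document numbers and scores (the Hit record is an illustrative
stand-in), while the sort values come from a pre-loaded int[] indexed
by document number. Sorting by field value in descending order, with
relevance score as a tie-breaker, is an assumption about what the
original poster wants.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CachedSortSketch {
    // Hypothetical search hit: document number plus relevance score.
    record Hit(int docNum, float score) {}

    /**
     * Orders hits by the pre-loaded field value (descending, an assumption),
     * breaking ties by relevance score (descending). Only document numbers
     * come from the search; sort values are read from the in-memory cache.
     */
    static List<Hit> sortByCachedField(List<Hit> hits, int[] cache) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator
                .comparingInt((Hit h) -> cache[h.docNum()]).reversed()
                .thenComparing(Comparator.comparingDouble((Hit h) -> h.score()).reversed()));
        return sorted;
    }

    public static void main(String[] args) {
        int[] cache = {42, 7, 10, 42}; // field value for docs 0..3
        List<Hit> hits = List.of(new Hit(1, 0.9f), new Hit(0, 0.2f),
                                 new Hit(3, 0.5f), new Hit(2, 0.7f));
        // Prints docs in the order 3, 0, 2, 1.
        for (Hit h : sortByCachedField(hits, cache)) {
            System.out.println(h.docNum() + " value=" + cache[h.docNum()]
                    + " score=" + h.score());
        }
    }
}
```

Note that the field value dominates here; relevance only matters when
two hits share the same cached value.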

The documentation for Lucene's Sort class explains some of the
caveats around selecting a field to load into a FieldCache.