
Mailing List Archive: kinosearch: discuss

Boost value query

 

 



henka at cityweb

Jul 11, 2006, 4:04 AM

Post #1 of 5
Boost value query

Hello all and sundry,

With regard to this thread
http://www.rectangular.com/pipermail/kinosearch/2006-January/000034.html,
wherein the issue of boosting a "special" field is discussed amongst
other things: what is a *reasonable* boost value? 3, 1, 0.1?

I understand the default boost value is 1 - will this suffice for a
typical search to ensure more relevant results surface first, or is it
better to use 3?

Furthermore, let's say you have tens of special fields, each with a boost
value. The same question applies: default of 1, or some other value? Or
maybe a sliding scale of importance, say from 3 to 1 in steps of 0.1?

Anyone with experience or comments will be appreciated.


henka at cityweb

Jul 11, 2006, 6:27 AM

Post #2 of 5
Boost value query [In reply to]

Hello,

How can one influence the ranking of results? Let's say you have a
special field with an integer value which is determined not by the
indexer, but by some other algorithm, and when this index page is hit
because of a standard search query, you would like this special field to
influence the ranking.

Can this be achieved?

Thanks


marvin at rectangular

Jul 11, 2006, 8:03 PM

Post #3 of 5
Boost value query [In reply to]

On Jul 11, 2006, at 4:01 AM, henka [at] cityweb wrote:

> Hello all and sundry,
>
> wrt this thread
> http://www.rectangular.com/pipermail/kinosearch/2006-January/000034.html,
> wherein the issue of boosting a "special" field is discussed, amongst
> other things, what is a *reasonable* boost value? 3, 1, 0.1?

Those are all fine under certain circumstances. It would be unusual
to downgrade a field to 0.1, though, unless you really didn't want it
to factor in. Most people don't think that way.

> I understand the default boost value is 1 - will this suffice for a
> typical search to ensure more relevant results surface first, or is it
> better to use 3?

I'd recommend that you boost short fields like title somewhat. 2 or 3 is
a good starting point, but higher might be reasonable depending on how
important title is to you.
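To make the effect of a title boost concrete, here is a minimal Python sketch under a deliberately simplified model: a document's total score is taken to be the boost-weighted sum of hypothetical per-field scores, and any field without an explicit boost keeps the default of 1. The real KinoSearch/Lucene scoring formula also involves normalization factors that this ignores, and `FIELD_SCORES`, `boosted_score`, and `rank` are illustrative names only.

```python
# Hypothetical per-field relevance scores for three documents; in a real
# engine these would come from the scorer, not a hand-written table.
FIELD_SCORES = {
    "doc_a": {"title": 0.5, "body": 0.2},
    "doc_b": {"title": 0.0, "body": 0.9},
    "doc_c": {"title": 0.3, "body": 0.5},
}


def boosted_score(fields, boosts):
    """Sum each field's score multiplied by its boost (default boost 1.0)."""
    return sum(score * boosts.get(name, 1.0) for name, score in fields.items())


def rank(boosts):
    """Return document ids ordered from highest to lowest boosted score."""
    return sorted(
        FIELD_SCORES,
        key=lambda doc: boosted_score(FIELD_SCORES[doc], boosts),
        reverse=True,
    )


print(rank({}))                # ['doc_b', 'doc_c', 'doc_a']
print(rank({"title": 3.0}))    # ['doc_a', 'doc_c', 'doc_b']
```

With default boosts the body-heavy `doc_b` wins; boosting title to 3 lets the short title match in `doc_a` surface first.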

> Furthermore, let's say you have tens of special fields, each with a
> boost value - same question applies, default of 1, or some other
> value? -- maybe a sliding scale of importance: say from 3 to 1 in
> steps of 0.1?

You could experiment with that by building complex queries up and
assigning different boosts to the different components.

Lucene has an "explain" method that makes this sort of refinement a
bit easier, but it is unimplemented in KinoSearch.
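Without an explain method, one hypothetical way to experiment with a sliding scale is to compute the per-component contributions yourself. The Python sketch below assigns boosts from 3 down to 1 in steps of 0.1, in the order the fields are listed, and reports a per-field breakdown under the same simplified weighted-sum assumption. `sliding_scale` and `explain` are made-up helpers, not KinoSearch or Lucene APIs.

```python
def sliding_scale(fields, high=3.0, step=0.1, low=1.0):
    """Assign boosts from `high` down to `low` in `step` decrements, in list
    order; every field past the floor gets `low`. Rounding hides float noise."""
    return {
        name: max(low, round(high - i * step, 10))
        for i, name in enumerate(fields)
    }


def explain(field_scores, boosts):
    """Return (total, lines): the boosted total plus one line per field,
    a rough stand-in for the per-clause breakdown an explain() would give."""
    total = 0.0
    lines = []
    for name, raw in field_scores.items():
        boost = boosts.get(name, 1.0)
        contribution = raw * boost
        total += contribution
        lines.append(f"{name}: {raw} x boost {boost} = {contribution:.3f}")
    return total, lines


boosts = sliding_scale([f"field{i}" for i in range(25)])
print(boosts["field0"], boosts["field10"], boosts["field24"])  # 3.0 2.0 1.0

total, lines = explain({"title": 0.5, "body": 0.2}, {"title": 3.0})
for line in lines:
    print(line)
```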

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


marvin at rectangular

Jul 11, 2006, 9:19 PM

Post #4 of 5
Boost value query [In reply to]

On Jul 11, 2006, at 6:24 AM, henka [at] cityweb wrote:

> Hello,
>
> How can one influence the ranking of results? Let's say you have a
> special field with an integer value which is determined not by the
> indexer, but by some other algorithm, and when this index page is hit
> because of a standard search query, you would like this special
> field to influence the ranking.
>
> Can this be achieved?

At present, only with hacks, and not for large datasets.

You've described the Sort functionality we've been discussing in
other threads. It's implemented in Lucene using a FieldCache, which
is what we've been faking using Perl arrays. A FieldCache in Lucene is
an array of field values, just like here, but the values are not
retrieved from stored documents as Gavin is doing. Instead, they are
loaded from the term dictionary and parsed as integers, floats, or
strings, and an array of the corresponding low-level data type is built
up. This technique is still memory-intensive for large document
collections, but considerably less so than using a Perl array.
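The FieldCache idea can be sketched in Python, assuming the term dictionary for one field is available as a plain mapping from each term to the document numbers that contain it (a stand-in, not KinoSearch's or Lucene's actual interface). Each term is parsed once and the value lands in a compact typed array indexed by document number, rather than in a list of full scalar values. Documents with no term get a hypothetical `missing` default, and a document with more than one term in the field is rejected, since a single slot per document cannot represent it.

```python
from array import array


def build_field_cache(term_postings, max_doc, parse=int, typecode="l",
                      missing=0):
    """Build a FieldCache-style array where slot i holds doc i's parsed value.

    term_postings maps each term in the field to the doc numbers containing
    it, standing in for a walk over the term dictionary.
    """
    cache = array(typecode, [missing] * max_doc)
    seen = set()
    for term, doc_nums in term_postings.items():
        value = parse(term)  # parse each distinct term only once
        for doc in doc_nums:
            if doc in seen:
                raise ValueError(f"doc {doc} has more than one term in this field")
            seen.add(doc)
            cache[doc] = value
    return cache


cache = build_field_cache({"10": [0, 3], "7": [1], "42": [2]}, max_doc=5)
print(list(cache))  # [10, 7, 42, 10, 0]
```

For float-valued fields, `parse=float, typecode="d"` would build an array of doubles instead.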

Inverted indexes excel at relevance scoring. Sorting on secondary
fields while reading from disk is not their forte, because that
information is not normally housed in the data structures used for
scoring, which are heavily optimized for speed. However, if you have the
memory and the time to pre-load, the FieldCache technique is quite
efficient. It ought to be faster, for instance, than naively sorting
document numbers obtained during a search against rows in a flat-file
database. If you already have the sort field values, all you need are
document numbers, and those are quicker to extract from a Lucene index
than from a data file with fixed-width rows containing other
information.
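Given such a cache, sorting hits needs only the document numbers returned by the search. A minimal Python sketch, assuming hits arrive as hypothetical `(doc_num, score)` pairs and that ties on the field value fall back to relevance score (a choice made here for illustration, not a claim about how Lucene's Sort breaks ties):

```python
from array import array


def sort_hits(hits, cache, descending=True):
    """Order (doc_num, score) hits by each doc's cached field value,
    breaking ties by higher relevance score."""
    sign = -1 if descending else 1
    return sorted(hits, key=lambda hit: (sign * cache[hit[0]], -hit[1]))


# Hypothetical cache: slot i holds the field value for document i.
cache = array("l", [10, 7, 42, 10, 0])
hits = [(0, 0.5), (1, 0.9), (2, 0.1), (3, 0.7)]
print(sort_hits(hits, cache))  # [(2, 0.1), (3, 0.7), (0, 0.5), (1, 0.9)]
```

The lookup `cache[doc_num]` is a constant-time array index, which is the advantage over fetching each document's row from a data file.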

The documentation for Lucene's Sort class explains some of the
caveats around selecting a field which you would load into a FieldCache.

http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Sort.html

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


henka at cityweb

Jul 11, 2006, 11:47 PM

Post #5 of 5
Boost value query [In reply to]

>
> On Jul 11, 2006, at 6:24 AM, henka wrote:
>
>> Hello,
>>
>> How can one influence the ranking of results? Let's say you have a
>> special field with an integer value which is determined not by the
>> indexer, but by some other algorithm, and when this index page is hit
>> because of a standard search query, you would like this special
>> field to influence the ranking.
>>
>> Can this be achieved?
>
> At present, only with hacks, and not for large datasets.

Thanks for the detailed responses to this and the previous question.
