
marvin at rectangular
Apr 13, 2007, 12:29 AM
Post #9 of 15
On Apr 12, 2007, at 11:42 PM, Chris Nandor wrote:

> Sure ... I am just not entirely sure what the resulting value should be
> for "no lower bound" and "no upper bound."  Should no lower be -2?  0?
> And should no upper be ... num_docs?

No lower should probably be 0. No upper should be Reader->max_doc.

The range filter uses an IntMap object, which is essentially just an array of i32_t. Array index corresponds to doc num and value corresponds to "term number". Term number is the would-be array index of the term if all the terms in that field were marshalled into an array. Docs without a term in that field are assigned -1 by default.

-1 is the lowest possible value that can appear in the array; therefore assigning -1 or less to lower_bound means that no document will be excluded by the lower bound test, even those with no value for the field. Also, assigning -2 to the upper_bound means that no hits will be collected.

Reader->max_doc works out to 1 greater than the highest possible term number. RangeFilter can only be used against an unanalyzed field, which means we have at most one unique term per document. Reader->max_doc is 1 greater than the maximum number of documents; therefore, it is also 1 greater than the maximum number of terms in an unanalyzed field.

> Or should this instead be smarter and skip those bounds checks?  For
> example, return "-3" to mean no bound check, then:
>
>     // no upper or lower bound
>     if (data->lower_bound == -3 && data->upper_bound == -3) {
>         data->inner_coll->collect(data->inner_coll, doc_num, score);
>     }
>     // no lower bound
>     else if (data->lower_bound == -3 && locus <= data->upper_bound) {
>         data->inner_coll->collect(data->inner_coll, doc_num, score);
>     }
>     // no upper bound
>     else if (data->upper_bound == -3 && locus >= data->lower_bound) {
>         data->inner_coll->collect(data->inner_coll, doc_num, score);
>     }
>     else if (locus >= data->lower_bound && locus <= data->upper_bound) {
>         data->inner_coll->collect(data->inner_coll, doc_num, score);
>     }

This is an implementation detail, so it's not something I'm gonna sweat too hard. However, my mild preference is to resolve this in RangeFilter.pm and keep the HitCollector code, which is a performance-critical inner loop, as simple as possible. These conditionals aren't going to make a difference, but as a general rule, I try to keep the inner loop stuff uncluttered.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
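
For illustration, here is a minimal sketch in C of the approach preferred in the reply: open-ended bounds are resolved once, before the hit loop, so the per-document test stays a single two-sided comparison. Everything named here (NO_BOUND, RangeBounds, resolve_bounds, in_range, count_hits) is hypothetical and not taken from the actual RangeFilter or HitCollector code; a plain int32_t array stands in for the IntMap, and max_doc is assumed to be 1 greater than the highest possible term number, as described in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "the caller supplied no bound". */
#define NO_BOUND INT32_MIN

/* Hypothetical container for the resolved, always-concrete bounds. */
typedef struct {
    int32_t lower_bound;
    int32_t upper_bound;
} RangeBounds;

/* Resolve open-ended bounds once, outside the hit loop.
 * A missing lower bound becomes 0, so docs with no term (-1) are excluded.
 * A missing upper bound becomes max_doc, which no term number can exceed. */
RangeBounds
resolve_bounds(int32_t lower, int32_t upper, int32_t max_doc)
{
    RangeBounds bounds;
    assert(max_doc >= 0);
    bounds.lower_bound = (lower == NO_BOUND) ? 0 : lower;
    bounds.upper_bound = (upper == NO_BOUND) ? max_doc : upper;
    return bounds;
}

/* The inner-loop test: one comparison pair, no special cases. */
int
in_range(const RangeBounds *bounds, int32_t locus)
{
    return locus >= bounds->lower_bound && locus <= bounds->upper_bound;
}

/* Stand-in for the collector loop: term_nums[doc_num] is the term number
 * for that doc, or -1 if the doc has no value in the field. */
size_t
count_hits(const int32_t *term_nums, size_t num_docs,
           const RangeBounds *bounds)
{
    size_t hits = 0;
    for (size_t doc_num = 0; doc_num < num_docs; doc_num++) {
        if (in_range(bounds, term_nums[doc_num])) {
            hits++;
        }
    }
    return hits;
}
```

With this arrangement, a lower bound of -1 lets every document through the lower test, including those with no value for the field, and an upper bound of -2 collects nothing, matching the behavior described in the post.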