marvin at rectangular
Jul 11, 2006, 8:44 PM
Post #6 of 6
On Jul 11, 2006, at 1:59 AM, Tony Bowden wrote:
> On Mon, Jul 10, 2006 at 07:38:05PM -0700, Marvin Humphrey wrote:
>>> How do I sort search results in KinoSearch?
>>> I want to sort the search results by a numeric field.
>> I'm sorry, but it's not currently possible to sort KinoSearch search
>> results on anything except relevance score.
> Couldn't this be done the same way as the approach in the 'filtering
> search results' thread?
> Pre-cache the score you want to sort by, then get the BitVector of
> search results, map them to the score array and sort.
In KinoSearch, unlike Plucene, BitVector is not a public class. I
plan to make it public eventually, but with tweaks. Same deal with
the bits() method from QueryFilter, which isn't public either.
The go-slow approach to expanding the API is deliberate and is one of
the reasons that KinoSearch has had relatively few bug reports for a
project of its size and complexity.
Having a few intrepid individuals tinker with non-public stuff helps
to vet potential API changes before they are made public. Also, the
explanations I send to the mailing list serve as first drafts for
documentation.
I have to be up for the task of supporting these experiments and
explaining lightly documented functionality. While KinoSearch is
heavily commented throughout, the private function descriptions are
sometimes rather light.
Last night I was nearing the end of what was a 13-hour workday when I
sent those emails. It was not a good time for me to explain all the
caveats surrounding a hack. However, I didn't want the question to
linger unanswered any longer than it already had.
> I haven't actually done this yet, but it seems like it would be quite
> simple (as long as you're able to re-build your scoring cache each
> time you update the index). Or am I missing something?
You're correct that it would work, though as has been pointed out,
the technique does not scale up as well as other aspects of KinoSearch.
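
For illustration, here is a minimal sketch of the pre-cache-and-sort idea. It is written in Python rather than Perl, and nothing in it is KinoSearch or Plucene API: build_sort_cache, sort_hits, and the use of a plain integer as the bit vector are all assumptions made for the example.

```python
# Illustrative stand-ins only; these are not KinoSearch or Plucene classes.

def build_sort_cache(docs, field):
    """Pre-cache one numeric value per document number.

    The cache has to be rebuilt whenever the index is updated.
    """
    return [doc[field] for doc in docs]


def sort_hits(hit_bits, sort_cache, descending=False):
    """Return matching doc numbers ordered by their cached value.

    hit_bits is an int used as a bit vector: bit N set means doc N matched.
    """
    # Map the set bits back to document numbers.
    doc_nums = [n for n in range(len(sort_cache)) if (hit_bits >> n) & 1]
    # Sort the doc numbers by the cached value instead of relevance score.
    return sorted(doc_nums, key=lambda n: sort_cache[n], reverse=descending)


docs = [{"price": 30}, {"price": 10}, {"price": 20}, {"price": 5}]
cache = build_sort_cache(docs, "price")
hits = 0b1011  # docs 0, 1 and 3 matched
ordered = sort_hits(hits, cache)
```

In this sketch both the cache and the scan over the bit vector grow with the total number of documents in the index, not with the number of hits, which is one plausible reason the technique scales less well than the rest of the search path.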
I think the right way to handle the need for matching categories is
to implement an abstract IntSet class, of which BitVector would be
one implementation. One way of replacing the QueryFilter::bits()
hack would be to add a search_intset() method to Searcher. However,
Searcher and InvIndexer, as KinoSearch's two main classes, have a
tendency to accumulate clutter, so a better solution is to make
something like Plucene's search_hc() method available.
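
A rough sketch of what an abstract IntSet with a BitVector implementation could look like. This is Python for illustration only; the method names are assumptions, not a planned KinoSearch interface.

```python
from abc import ABC, abstractmethod


class IntSet(ABC):
    """Hypothetical abstract set of non-negative integers (doc numbers)."""

    @abstractmethod
    def add(self, n):
        """Add doc number n to the set."""

    @abstractmethod
    def __contains__(self, n):
        """Return True if doc number n is in the set."""

    @abstractmethod
    def __iter__(self):
        """Yield the members in ascending order."""


class BitVector(IntSet):
    """One possible IntSet implementation: one bit per doc number."""

    def __init__(self):
        self._bits = 0  # a Python int serves as an unbounded bit array

    def add(self, n):
        self._bits |= 1 << n

    def __contains__(self, n):
        return bool((self._bits >> n) & 1)

    def __iter__(self):
        n, bits = 0, self._bits
        while bits:
            if bits & 1:
                yield n
            bits >>= 1
            n += 1


matches = BitVector()
for doc_num in (3, 1, 8):
    matches.add(doc_num)
```

Code that consumes the set only needs membership tests and iteration, so a sparse implementation (a sorted list of doc numbers, say) could be swapped in without changing callers.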
Unfortunately, the HitCollector class is one of those things which
simply cannot be done well in a dynamic language -- the callbacks
have to be C function pointers or it's just too slow with large
datasets. I haven't yet figured out how to expose a decent API for
it. We might provide a smorgasbord of pre-built HitCollectors
matching the most common use cases, but that's not nearly as good.
I'm inching towards the conclusion that there's nothing for it but to
make HitCollector functionality a permanent alpha, advanced feature
and document a C API.
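
To make the callback shape concrete, here is a sketch of a collector that keeps the top N hits by a cached field value. It is Python for illustration only: the search_hc function below is a stand-in loop, not Plucene's method, and TopFieldCollector is a hypothetical example of a "pre-built" collector.

```python
import heapq


class HitCollector:
    """Hypothetical base class: collect() is called once per matching doc."""

    def collect(self, doc_num, score):
        raise NotImplementedError


class TopFieldCollector(HitCollector):
    """Keep the `size` docs with the largest cached field value."""

    def __init__(self, sort_cache, size):
        self.sort_cache = sort_cache  # one numeric value per doc number
        self.size = size
        self._heap = []               # min-heap of (value, doc_num)

    def collect(self, doc_num, score):
        item = (self.sort_cache[doc_num], doc_num)
        if len(self._heap) < self.size:
            heapq.heappush(self._heap, item)
        elif item > self._heap[0]:
            # Evict the current smallest entry in favor of the new one.
            heapq.heapreplace(self._heap, item)

    def top_docs(self):
        """Doc numbers ordered from largest to smallest cached value."""
        return [doc_num for _, doc_num in sorted(self._heap, reverse=True)]


def search_hc(matches, collector):
    """Stand-in for a search loop: one callback per (doc_num, score) hit."""
    for doc_num, score in matches:
        collector.collect(doc_num, score)


field_values = [30, 10, 20, 5, 40]
hits = [(0, 0.5), (1, 0.9), (3, 0.1), (4, 0.3)]
collector = TopFieldCollector(field_values, size=2)
search_hc(hits, collector)
best = collector.top_docs()
```

The per-hit call to collect() is the cost in question: in an interpreted language that call happens once for every matching document, which is why the callbacks would need to be C function pointers for large result sets.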
Sort will ultimately be implemented using a FieldCache of some kind.
I'd thought I was going to need this for a project, but it turns out
I didn't, so I haven't gotten to it. Not a lot is going to happen for
the next 3 or 4 weeks. Stuff is just too crazy with my main clients.