
marvin at rectangular
Jul 11, 2006, 8:44 PM
Post #6 of 6
On Jul 11, 2006, at 1:59 AM, Tony Bowden wrote:

> On Mon, Jul 10, 2006 at 07:38:05PM -0700, Marvin Humphrey wrote:
>>> How to Sort the search result in KinoSearch.
>>> I want sort by numeric field the search result
>> I'm sorry, but it's not currently possible to sort KinoSearch search
>> results on anything except relevance score.
>
> Couldn't this be done the same way as the approach in the 'filtering
> search results' thread?
>
> Pre-cache the score you want to sort by, then get the BitVector of
> search results, map them to the score array and sort.

In KinoSearch, unlike Plucene, BitVector is not a public class. I plan to make it public eventually, but with tweaks. Same deal with the bits() method from QueryFilter, which isn't public either.

The go-slow approach to expanding the API is deliberate and is one of the reasons that KinoSearch has had relatively few bug reports for a project of its size and complexity. Having a few intrepid individuals tinker with non-public stuff helps to vet potential API changes before they are made public. Also, the explanations I send to the mailing list serve as first drafts for documentation.

I have to be up for the task of supporting these experiments and explaining lightly documented functionality. While KinoSearch is heavily commented throughout, the private function descriptions are sometimes rather light. Last night I was nearing the end of what was a 13-hour workday when I sent those emails. It was not a good time for me to explain all the caveats surrounding a hack. However, I didn't want the question to linger unanswered any longer than it already had.

> I haven't actually done this yet, but it seems like it would be quite
> simple (as long as you're able to re-build your scoring cache each time
> you update the index). Or am I missing something?

You're correct that it would work, though as has been pointed out, the technique does not scale up as well as other aspects of KinoSearch.
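For readers following along, here is a minimal sketch of the pre-cache-and-sort idea Tony describes. It is plain Python and deliberately does not use KinoSearch: `price_cache`, `hit_bits`, and `sort_hits` are hypothetical names chosen for illustration, and the list of booleans only stands in for the non-public BitVector of search results.

```python
# Hypothetical stand-ins: neither structure is KinoSearch API.
# price_cache[doc_num] holds the numeric field value for each document.
# It has to be rebuilt whenever the index is updated.
price_cache = [19.99, 5.00, 42.50, 5.00, 12.25]

# hit_bits[doc_num] is True when the document matched the query,
# playing the role of the BitVector of search results.
hit_bits = [True, False, True, True, True]


def sort_hits(hit_bits, cache, descending=False):
    """Return matching doc numbers ordered by their cached value."""
    hits = [doc_num for doc_num, matched in enumerate(hit_bits) if matched]
    # sorted() is stable, so ties keep ascending doc-number order.
    return sorted(hits, key=lambda doc_num: cache[doc_num], reverse=descending)


ordered = sort_hits(hit_bits, price_cache)
```

In this sketch the cache holds one value per document in the index and every hit takes part in the sort, so memory grows with index size and sorting time grows with the number of hits. That is one plausible reading of why the approach scales less well than relevance-ranked search.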
I think the right way to handle the need for matching categories is to implement an abstract IntSet class, of which BitVector would be one implementation. One way of replacing the QueryFilter::bits() hack would be to add a search_intset() method to Searcher. However, Searcher and InvIndexer, as KinoSearch's two main classes, have a tendency to accumulate clutter, so a better solution is to make something like Plucene's search_hc() method available.

Unfortunately, the HitCollector class is one of those things which simply cannot be done well in a dynamic language -- the callbacks have to be C function pointers or it's just too slow with large datasets. I haven't yet figured out how to expose a decent API for it. We might provide a smorgasbord of pre-built HitCollectors matching the most common use cases, but that's not nearly as good. I'm inching towards the conclusion that there's nothing for it but to make HitCollector functionality a permanent alpha, advanced feature and document a C API.

Sort will ultimately be implemented using a FieldCache of some kind. I'd thought I was going to need this for a project, but it turns out I didn't, so I haven't gotten to it.

Not a lot is going to happen for the next 3 or 4 weeks. Stuff is just too crazy with my main clients.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/