
bramble.andrew at gmail
Jul 29, 2008, 4:33 AM
Given my head is in this problem space already, I figure there's no harm in distilling some thoughts while the ideas are 'fresh'. Forgive me if there are some details of KinoSearch that I have yet to properly understand.

Providing faceted search (FS) capability for KinoSearch (KS) requires, to quote Marvin, "massive server-side caching". We like!!

An FS class would trawl the index on startup (or, even better, at index time, storing the result with the invindex; there's an API for index overlays... right?!?) and generate bitvectors for the terms of the desired field(s), storing a 1 at position doc_num for each document possessing the given term. For a field with 100 facets or terms, you'd need 100 bitvectors, each at most max_docs bits long.

An FS query would wrap a regular KS query, AND'ing the query results with each term's cached bitvector to derive a count of the documents within the wrapped query that possess that term (100 ANDs + 100 counts of bitvectors no longer than max_docs bits).

I have made a VERY naive implementation of this without gluing into KinoSearch XS/C, since I confess to _barely_ grokking charmony + XS + C beyond kindergarten level. Facet::Counter2 (yes, I embrace version control, really) is quite hopeless from a practical standpoint and will only count the facets of documents returned by KS::Search::Hits, limited to num_wanted.

My next goal would be to better understand KS internals so as to:

* use KS BitVectors to replace scalars and vec
* make Facet::Counter into KSx::Search::FacetQuery, to collect the >0 scored results of a child query and count facets this way instead of using KS::Search::Hits

Constructive abuse welcome.

AB
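To make the bitvector idea concrete, here is a minimal sketch of the scheme described above, written in plain Python rather than against KinoSearch. It does not use any KS API: Python integers stand in for bitvectors, and the names `build_facet_bitvectors`, `hits_to_bitvector`, `count_facets`, the `colour` field and the sample documents are all made up for illustration.

```python
from collections import defaultdict


def build_facet_bitvectors(docs, field):
    """Build one bitvector per term of `field` (hypothetical helper).

    Each bitvector is a Python int; bit `doc_num` is set when the
    document at that position possesses the term.
    """
    vectors = defaultdict(int)
    for doc_num, doc in enumerate(docs):
        for term in doc.get(field, ()):
            vectors[term] |= 1 << doc_num
    return dict(vectors)


def hits_to_bitvector(doc_nums):
    """Turn the doc numbers matched by the wrapped query into a bitvector."""
    bits = 0
    for doc_num in doc_nums:
        bits |= 1 << doc_num
    return bits


def count_facets(hit_bits, facet_vectors):
    """AND the hit bitvector with each cached term bitvector and count the 1s."""
    counts = {}
    for term, term_bits in facet_vectors.items():
        n = bin(hit_bits & term_bits).count("1")
        if n:
            counts[term] = n
    return counts


# Toy "index": the list position plays the role of doc_num.
docs = [
    {"colour": ["red"]},
    {"colour": ["blue"]},
    {"colour": ["red", "green"]},
    {"colour": ["green"]},
]

# Built once (at startup or index time) and cached.
FACETS = build_facet_bitvectors(docs, "colour")

# Pretend the wrapped query matched documents 0, 2 and 3.
HITS = hits_to_bitvector([0, 2, 3])
COUNTS = count_facets(HITS, FACETS)
print(COUNTS)
```

The per-query cost is one AND plus one population count per cached term, which is the "100 ANDs + 100 counts" figure for a field with 100 terms. The counts cover every matching document, not just the first num_wanted hits.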