bramble.andrew at gmail
Jul 29, 2008, 4:33 AM
Given my head is in this problem space already, I figure there's no
harm in distilling some thoughts while the ideas are 'fresh'.
Forgive me if there are some details of KinoSearch that I have yet to
grasp.
Providing faceted search (FS) for KinoSearch (KS) requires, to quote
Marvin, "massive server-side caching". We like!!
An FS class would trawl the index on startup (or, even better, at
index time, storing the result alongside the invindex; there's an API
for index overlays... right?!?) and generate a bit vector for each term
of the desired field(s), storing a 1 at the doc_num position of every
document possessing that term. For a field with 100 facets (terms),
you'd need 100 bit vectors, each at most max_docs bits long.
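A minimal sketch of that startup/index-time step, assuming documents can
be read as a list of dicts indexed by doc_num, with each facet field
holding a list of terms. Python ints stand in for bit vectors, and the
function name build_facet_bitvectors is hypothetical, not KS API:

```python
# Hypothetical sketch: build one bit vector per term of a facet field.
# Docs are modelled as a list of dicts indexed by doc_num; this is NOT
# the KinoSearch API, just an illustration of the data structure.
from typing import Dict, List


def build_facet_bitvectors(docs: List[Dict[str, List[str]]], field: str) -> Dict[str, int]:
    """Return {term: bitvector}; bit doc_num is set if that doc has the term.

    Python ints act as arbitrary-length bit vectors here.
    """
    vectors: Dict[str, int] = {}
    for doc_num, doc in enumerate(docs):
        # A doc may carry several terms (multi-valued field) or none at all.
        for term in doc.get(field, []):
            vectors[term] = vectors.get(term, 0) | (1 << doc_num)
    return vectors


docs = [
    {"category": ["books"]},
    {"category": ["music"]},
    {"category": ["books"]},
    {"category": ["film"]},
]
facets = build_facet_bitvectors(docs, "category")
print({t: bin(v) for t, v in sorted(facets.items())})
# prints {'books': '0b101', 'film': '0b1000', 'music': '0b10'}
```

Storage is one vector per term regardless of how many terms a document
carries, which is why 100 facets means 100 vectors of up to max_docs bits.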
An FS query would wrap a regular KS query, AND'ing the query's results
with each term's cached bit vector and counting the set bits to get the
number of documents within the wrapped query that possess that term
(100 ANDs plus 100 counts over bit vectors no longer than max_docs bits).
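The counting step might look roughly like this, again assuming Python
ints in place of KS BitVectors and assuming the wrapped query's matches
are already available as a bit vector (count_facets is a hypothetical
name):

```python
# Hypothetical sketch of the FS query's counting step: AND the wrapped
# query's result bit vector with each cached term bit vector, then count
# the set bits. Python ints stand in for KS BitVectors.
from typing import Dict


def count_facets(result_bits: int, facet_vectors: Dict[str, int]) -> Dict[str, int]:
    """One AND plus one popcount per term."""
    return {
        term: bin(result_bits & bits).count("1")  # popcount of the intersection
        for term, bits in facet_vectors.items()
    }


# Cached per-term vectors for docs 0..3, as built at startup/index time.
facet_vectors = {"books": 0b0101, "music": 0b0010, "film": 0b1000}
# Suppose the wrapped query matched docs 0, 1 and 2.
result_bits = 0b0111
print(count_facets(result_bits, facet_vectors))
# prints {'books': 2, 'music': 1, 'film': 0}
```

The cost is linear in (number of terms) x (max_docs / word size), with no
per-document work beyond building the result vector.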
I have made a VERY naive implementation of this without gluing it into
KinoSearch's XS/C layer, since I confess to _barely_ grokking charmony,
XS and C beyond kindergarten level.
Facet::Counter2 (yes, I really do embrace version control) is quite
hopeless from a practical standpoint: it only counts the facets of the
documents returned by KS::Search::Hits, which is limited to num_wanted.
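To show why counting over the returned hits is hopeless, here is a toy
comparison using plain (score, facet term) pairs. Nothing here is
KinoSearch API; the helper names are hypothetical:

```python
# Hypothetical illustration of why counting facets over the top num_wanted
# hits undercounts: matches ranked below the cut-off are never seen.
from collections import Counter
from typing import Dict, List, Tuple


def count_top_hits(scored: List[Tuple[float, str]], num_wanted: int) -> Dict[str, int]:
    """Count facet terms among only the num_wanted best-scoring matches."""
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:num_wanted]
    return dict(Counter(term for _, term in top))


def count_all_matches(scored: List[Tuple[float, str]]) -> Dict[str, int]:
    """Count facet terms across every match, as faceted search should."""
    return dict(Counter(term for _, term in scored))


# (score, facet term) for each matching document.
scored = [(0.9, "books"), (0.7, "music"), (0.5, "books"), (0.2, "film")]
print(count_top_hits(scored, num_wanted=2))  # prints {'books': 1, 'music': 1}
print(count_all_matches(scored))             # prints {'books': 2, 'music': 1, 'film': 1}
```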
My next goal would be to better understand KS internals so as to
* use KS BitVectors in place of Perl scalars and vec()
* turn Facet::Counter into KSx::Search::FacetQuery, which would collect
every result scoring > 0 from a child query and count facets from those
instead of from the num_wanted-limited Hits
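A rough sketch of what such a FacetQuery might do internally, assuming
the child query can be walked as a stream of (doc_num, score) pairs.
That stream, the function names and the use of Python ints as bit
vectors are all assumptions for illustration, not the KinoSearch API:

```python
# Hypothetical sketch of the proposed FacetQuery idea: collect every doc
# the child query scores above zero into a bit vector, then count facets
# against the cached per-term vectors.
from typing import Dict, Iterable, Tuple


def collect_positive(matches: Iterable[Tuple[int, float]]) -> int:
    """Set bit doc_num for every match with score > 0."""
    bits = 0
    for doc_num, score in matches:
        if score > 0:
            bits |= 1 << doc_num
    return bits


def facet_counts(matches: Iterable[Tuple[int, float]],
                 facet_vectors: Dict[str, int]) -> Dict[str, int]:
    """Collect positive-scoring docs, then AND + popcount per term."""
    result_bits = collect_positive(matches)
    return {t: bin(result_bits & v).count("1") for t, v in facet_vectors.items()}


facet_vectors = {"books": 0b0101, "music": 0b0010, "film": 0b1000}
child_matches = [(0, 0.9), (1, 0.0), (2, 0.4), (3, 0.1)]
print(facet_counts(child_matches, facet_vectors))
# prints {'books': 2, 'music': 0, 'film': 1}
```

Because every positive-scoring match lands in the vector, the counts no
longer depend on num_wanted.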
Constructive abuse welcome.