
marvin at rectangular
Mar 21, 2007, 12:18 PM
Post #12 of 46
On Mar 20, 2007, at 11:55 AM, Chris Nandor wrote:

> At 9:39 -0700 2007.03.15, Marvin Humphrey wrote:
>> On Mar 8, 2007, at 2:36 PM, Chris Nandor wrote:
>>> how do I combine my $range_filter with other filters?  Is it
>>> possible?
>>
>> Not presently.  I've been contemplating how to make this available
>> (i.e. procrastinating) while working on a bunch of other problems.
>> The trick is how ranges should score.
>
> This is something we need pretty soon; is there anything I can do to
> help make it work?

Yes, there is.

QueryFilter needs to be changed to cache BitVector objects in a hash, keyed per IndexReader. The bits() method should be changed to take an IndexReader rather than a Searcher as an argument, and so should make_collector(). Calls to those methods in the library and the test suite need to be adjusted. Tests need to be added to t/507-query_filter.t to ensure that...

  * The caching mechanism works and we don't keep generating new
    BitVectors.
  * The correct BitVector is returned by the bits() method (i.e. not
    one belonging to another IndexReader).

Ideally, destruction of the cached BitVectors held by a QueryFilter object would be triggered when the IndexReader gets destroyed, since they're no longer of any use after that. That's a little harder, and may require some sort of stupid hack to store references to the BitVectors in IndexReader along with calling weaken() on the refs held by the QueryFilter object. The point is that we don't want to accumulate BitVectors when the Searcher/Reader is being continually refreshed.

RangeFilter also needs make_collector() changed to be keyed off of an IndexReader. That will be straightforward, as the first thing RangeFilter->make_collector does right now is call get_reader(). Tests and library calls to the method need to be adjusted, but won't need any changes to their substance.

RangeFilter then needs a bits() method added to it. It will probably look like this...

    sub bits {
        my ( $self, $reader ) = @_;

        # collect docs that have a value for this field which passes the filter
        my $collector = KinoSearch::Search::HitCollector->new_bit_coll;
        my $searcher  = KinoSearch::Searcher->new( reader => $reader );
        my $query     = KinoSearch::Search::MatchFieldQuery->new(
            field => $self->{field},
        );
        $searcher->collect(
            query     => $query,
            filter    => $self,
            collector => $collector,
        );
        return $collector->get_bit_vector;
    }

Searcher->collect needs to be created, but that will basically be a refactoring of Searcher->search_hit_collector which will be trivial for me and hard for anyone else... so I'll handle that.

MatchFieldQuery (which will be nearly identical to TermQuery) also needs to be written. Writing tests to ensure that a Searcher returns correct results when supplied with a MatchFieldQuery will be pretty straightforward and would be appreciated. I'd love it if someone else wanted to get involved in writing MatchFieldQuery itself, but such a person would need to be willing to absorb some information retrieval theory -- so I'll assume it will be my sole responsibility (as will MatchFieldScorer) unless someone expresses an interest.

Finally, we need to create PolyFilter. PolyFilter will have an add() method which works like this:

    $poly_filter->add(
        filter => $filter,
        logic  => 'AND',
    );

PolyFilter->bits() will call bits() on each of its sub-filters, then it will combine the BitVectors together. Like QueryFilter, it will cache filters per-IndexReader.

At present, BitVector only has a logical_and() method; if PolyFilter is to be able to combine filters using OR, XOR, etc, the appropriate methods need to be added to BitVector. This is deceptively difficult. It involves classic C bit-twiddling, but has to be maximally efficient, and there are a lot of nasty corner cases that need tests. I'm assuming I'll be handling this one.

Still with me? ;)

I also ask that potential hackers agree to contribute their code to Apache.
That way we can use it in Lucy without complication.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
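
For anyone picking up the QueryFilter task, here is a minimal sketch of the per-IndexReader caching and weaken() idea described above. It is not KinoSearch code: Sketch::Reader and Sketch::Filter are hypothetical stand-ins, the "bit vector" is just a reference to a Perl string manipulated with vec(), and keying the cache on refaddr() is an assumption made for illustration.

```perl
use strict;
use warnings;

package Sketch::Reader;    # hypothetical stand-in for an IndexReader

sub new {
    my ( $class, %args ) = @_;
    # bit_vectors holds the only long-lived strong refs to cached vectors.
    return bless { docs => $args{docs} || [], bit_vectors => [] }, $class;
}

package Sketch::Filter;    # hypothetical stand-in for QueryFilter

use Scalar::Util qw( refaddr weaken );

sub new { return bless { cache => {}, builds => 0 }, shift }

sub bits {
    my ( $self, $reader ) = @_;
    my $cache = $self->{cache};

    # Drop entries whose reader (and therefore bit vector) has gone away.
    delete @{$cache}{ grep { !defined $cache->{$_} } keys %$cache };

    my $key = refaddr($reader);
    return $cache->{$key} if defined $cache->{$key};

    # Build a toy bit vector: a reference to a string with one bit per doc.
    $self->{builds}++;
    my $string = '';
    vec( $string, $_, 1 ) = 1 for @{ $reader->{docs} };
    my $bits = \$string;

    push @{ $reader->{bit_vectors} }, $bits;    # strong ref lives in the reader
    $cache->{$key} = $bits;
    weaken( $cache->{$key} );                   # the filter holds a weak ref only
    return $bits;
}

package main;

my $filter = Sketch::Filter->new;
my $reader = Sketch::Reader->new( docs => [ 1, 3 ] );
$filter->bits($reader) for 1 .. 3;
print "builds: $filter->{builds}\n";            # prints "builds: 1"

undef $reader;    # destroying the reader frees its bit vectors
my $live = grep { defined } values %{ $filter->{cache} };
print "live cache entries: $live\n";            # prints "live cache entries: 0"
```

The two prints correspond to the two tests asked for in t/507-query_filter.t: repeated bits() calls reuse one vector, and nothing accumulates in the filter once a reader is gone. Because a freed reader's address can be reused, the sketch treats an undefined cache slot as a miss and rebuilds.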