
marvin at rectangular
Apr 28, 2008, 1:00 PM
Post #1 of 2
On Apr 27, 2008, at 3:07 PM, Nathan Kurz wrote:

> You could have the Parser build a tree with a special field type of
> 'any', which then gets expanded out to multiple fields at a later
> stage.

I think the most elegant solution is to use undef/NULL for the 'any' field type. (We'll have to modify TermQuery etc. to accept an undef value for 'field'.)

> I'd sort of like to have this stage anyway, since it keep the
> Parser more independent of the Index, and would let me do tricks like
> replacing OrScorer with MyOrScorer.

OK, let's run with the idea of an intermediate representation before we get to Query objects.

The final output of the parser *has* to be a Query object, because this code snippet has to work:

    my $query = $query_parser->parse($query_string);
    my $hits  = $searcher->search( query => $query );

Internally, though, we can divide up parse() into two stages:

    my $tree  = $query_parser->tree($query_string);
    my $query = $query_parser->build($tree);

The tree stage would be constructed from "ParseNode" objects and would be single-field (except where the field gets explicitly specified via the query string).

Hmm. I think we'll need ParseNode subclasses like ANDNode, ORNode, PhraseNode, and so on. Are we any better off than when the tree was being built with ANDQuery, ORQuery, etc? I don't see how we are.

So... keep the two-stage compilation, but have both stages output Query objects and just make it possible to walk child nodes for ANDQuery, ORQuery, etc.

> Instead of trying to build an
> optimizing Parser, you could do the optimizations and checks in a
> separate pass and keep the Parser simpler.

It seems like a sound principle that the parser should not optimize. Where do you normally perform optimization? Not in the lexer, not in the parser... optimization is normally the compiler's job.

And that finally suggests a decent replacement name for the Weight class: it should be renamed to KinoSearch::Search::Compiler.

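
The two-pass idea above (first build a tree of Query objects with undef fields, then walk the child nodes in a later pass) can be sketched roughly as follows. This is only an illustration under stated assumptions: the Sketch::* packages, expand_fields() and to_string() are hypothetical stand-ins invented for this example, not KinoSearch classes or methods, and the field names 'title' and 'body' are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the Query classes discussed in this thread.
# They are NOT the KinoSearch classes; they only model walkable nodes.
package Sketch::Query;
sub new {
    my ( $class, %args ) = @_;
    return bless { children => [], %args }, $class;
}
sub children { @{ $_[0]{children} } }
sub add_child {
    my ( $self, %args ) = @_;
    push @{ $self->{children} }, $args{query};
    return $self;
}

package Sketch::TermQuery; our @ISA = ('Sketch::Query');
package Sketch::ANDQuery;  our @ISA = ('Sketch::Query');
package Sketch::ORQuery;   our @ISA = ('Sketch::Query');
package Sketch::NOTQuery;  our @ISA = ('Sketch::Query');

package main;

# Second pass: return a new tree in which every TermQuery with an undef
# field is replaced by an ORQuery across the supplied fields.
sub expand_fields {
    my ( $query, @fields ) = @_;
    if ( $query->isa('Sketch::TermQuery') ) {
        return $query if defined $query->{field};
        my $or = Sketch::ORQuery->new;
        $or->add_child(
            query => Sketch::TermQuery->new( field => $_, term => $query->{term} )
        ) for @fields;
        return $or;
    }
    # Compound node: copy it and recurse into its children.
    my $copy = ref($query)->new;
    $copy->add_child( query => expand_fields( $_, @fields ) )
        for $query->children;
    return $copy;
}

# Render a tree as a string for inspection ('*' marks an undef field).
sub to_string {
    my ($query) = @_;
    if ( $query->isa('Sketch::TermQuery') ) {
        my $field = defined $query->{field} ? $query->{field} : '*';
        return "$field:$query->{term}";
    }
    my ($op) = ref($query) =~ /::(\w+)Query$/;
    return "$op(" . join( ', ', map { to_string($_) } $query->children ) . ")";
}

# First-pass output for 'foo AND NOT bar', built by hand here.
my $and = Sketch::ANDQuery->new;
$and->add_child(
    query => Sketch::TermQuery->new( field => undef, term => 'foo' ) );
my $not = Sketch::NOTQuery->new;
$not->add_child(
    query => Sketch::TermQuery->new( field => undef, term => 'bar' ) );
$and->add_child( query => $not );

my $expanded = expand_fields( $and, 'title', 'body' );

print to_string($and),      "\n";   # AND(*:foo, NOT(*:bar))
print to_string($expanded), "\n";
# AND(OR(title:foo, body:foo), NOT(OR(title:bar, body:bar)))
```

Because the second pass returns a fresh tree instead of modifying nodes in place, the first-pass output stays untouched, which is one way to keep the parser independent of the index and to leave room for swapping in other passes later.
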
To refine what we want out of the parser class, though... its output ought to be an abstract syntax tree built from Query objects, rather than a parse tree per se. (<http://en.wikipedia.org/wiki/Abstract_syntax_tree>)

For example, we want the output of the parser for both of these query string inputs...

    'foo -bar'
    'foo AND NOT bar'

... to be exactly equivalent to this:

    my $foo_query     = TermQuery->new( field => undef, term => 'foo' );
    my $bar_query     = TermQuery->new( field => undef, term => 'bar' );
    my $not_bar_query = NOTQuery->new( query => $bar_query );
    my $and_query     = ANDQuery->new;
    $and_query->add_child( query => $foo_query );
    $and_query->add_child( query => $not_bar_query );

That's lossy -- because we lose the ability to recreate the input query string -- but we don't lose any intent and it's not a true optimization.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/