
marvin at rectangular
Jul 2, 2007, 8:18 AM
On Jun 29, 2007, at 5:45 AM, Hans Dieter Pearcey wrote:

>> BooleanQuery?
>
> I don't see how I'd do this just in terms of matching. Maybe I don't
> understand SHOULD?

If you add two clauses to a BooleanQuery with SHOULD, then their result sets get OR'd together.

    $bool_query->add_clause( query => $term_query_a, occur => 'SHOULD' );
    $bool_query->add_clause( query => $term_query_b, occur => 'SHOULD' );

> If some particular selection mechanism is available both as a Query and as a
> Filter -- e.g. BooleanQuery, which you can also use as part of a Queryfilter --
> is there any reason to prefer one over the other, assuming that you are (as I
> am) only interested in matching, not scoring? Do Filters have any kind of
> startup overhead compared to Queries, etc.?

If you don't care about scoring and you can reuse Filters, you should use as many as practical. Scorers require hitting the disk. QueryFilters and PolyFilters, once their internal caches are warmed, do not.

The startup cost for a RangeFilter only happens once per field per IndexReader, when a portion of that field's lexicon is read into memory. The main per-query cost is a single burst of disk activity to look up the search term and assign it a "term number" based on where it falls in the lexicon, after which everything else is CPU crunching and memory access.

>> I think the ultimate solution will be to make MatchFieldQuery public
>> and give it a constant score which defaults to zero. Then it could
>> be combined with a RangeFilter to produce the same effect as a
>> ConstantScoreRangeQuery. MatchFieldQuery is relatively simple, and
>> lets you do things that require kludges otherwise.
>
> I had found MatchFieldQuery, and thought that that might work, but didn't know
> enough internals to be sure. I like this idea. What can I do to make it work?

Sorry for the delayed response -- I had to think this over.

I've resisted making MatchFieldQuery public because I didn't feel like its API was mature enough. I'm still not sure about it, and I don't want to add it to the list of things that have to get done prior to the release of 0.20. For the time being, I suggest you go ahead and use MatchFieldQuery as is, but mark that aspect of your module experimental.

Looking forward, you can help move things along by participating in design discussions about subclassing strategies.

A lot of the KS public API and class design is pretty solid. To touch on one aspect, I'm pleased that the Query components allow you to create your own query building mechanism as an alternative to QueryParser. I'm also more certain than ever that the decision to limit QueryParser to a much simpler syntax than its Lucene counterpart was the right one. What you are doing demonstrates that it is possible to write custom KSx extensions to play the Query-building role, and if someone wants to write a Lucene-ish query parser that supports syntax like 'boost^3', they can. Core KinoSearch, by opting out of the more complex high-level task, lowers its support costs and maintains greater flexibility.

This is successful modularization, "divide and conquer", "loose coupling", etc., in action. Every class has its own reasonably contained problem domain. There are no "God Objects" that know too much or do too much. The components tolerate being assembled into many different configurations.

The main goal of KinoSearch 0.30 will be to reproduce this flexibility across more phases of search and indexing. Scorer should be public and it should not be so challenging to subclass. If that were already the case, somebody could whip up KSx::Search::RangeQuery and you could use it without waiting for me to act.

For 0.20, though, it's time to think reductively (to echo a sentiment expressed by Nathan Kurz).
Rather than add new public APIs, it's time to yank Hits->seek (and simplify Searcher), migrate some documentation out of POD and onto the new wiki, and possibly redact the public APIs for Analyzer, Token, and TokenBatch, marking them as experimental once again so that we have the option to modify them.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/