
marvin at rectangular
Jan 30, 2008, 9:51 AM
Query-Weight-Scorer hierarchy (was Re: Wildcards)
On Jan 29, 2008, at 8:06 PM, Nathan Kurz wrote:

> What I meant to say was that
> the globals information doesn't need to be known by the query, only by
> the Scorer.

By "globals information", I presume you mean the IDF. IDF is needed by the Weight.

> The Query would deal with only the per-document data.

That confuses me. Query objects represent an abstract ideal. They don't dirty their hands with actual real-world index data.

    Query:  A pure, abstract representation of a logical query.
    Weight: Applies a query to a particular collection of documents.
    Scorer: Applies a query to individual documents.

So the Weight deals with the per-collection information, and it's the Scorer -- not the Query -- that deals with per-document data.

This actually has implications for generating HighlightSpan objects. I've been saying that we should go back to the Query for that, but really, Query objects won't know what to do with an individual document. We'll have to compile the Query to a Weight to a Scorer and have the Scorer perform that task.

>> Or maybe the default TermQuery class can do flat scoring and
>> TFIDFTermQuery would override? I imagine that would make you
>> happy. ;)
>
> Given the smileys, I'm not sure if this is a joke or not. To be
> clear, this solution would make me ill.

Heh. No, I was serious.

> My desire is to separate the
> query from the scoring, so having a different Query class for each
> possible scoring option is the antithesis of what I want. What I want
> is to have a number of independent Scorers that can be plugged into a
> Scorer-agnostic set of Queries: simple Queries, simple Scorers,
> complex combinations.

That's an interesting vision. It's sort-of at odds with how things currently work, because the expectation is that a FooQuery will be associated with FooWeight and FooScorer. However, BooleanScorers are aggregates of many other Scorers, and a PhraseWeight will actually kick out a TermScorer if you only give it one term.
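To make the Query / Weight / Scorer split concrete, here is a rough Python sketch of the division of labor described in this message. It is not KinoSearch code. The method names (make_weight, make_scorer, score), the helper idf(), the toy dict-based index, and the TF-IDF formula are all assumptions chosen for illustration, and the index is assumed to be non-empty. The sketch also shows a PhraseWeight handing back a TermScorer when the phrase has only one term.

```python
import math
from dataclasses import dataclass


def idf(term, index):
    """Per-collection statistic (illustrative formula): rarer terms weigh more."""
    doc_freq = sum(1 for tokens in index.values() if term in tokens)
    return 1.0 + math.log(len(index) / (doc_freq + 1))


@dataclass(frozen=True)
class TermQuery:
    """Query: an abstract description only; it never touches index data."""
    term: str

    def make_weight(self, index):
        return TermWeight(self, index)


@dataclass(frozen=True)
class PhraseQuery:
    """Query for a sequence of adjacent terms, given as a tuple of strings."""
    terms: tuple

    def make_weight(self, index):
        return PhraseWeight(self, index)


class TermWeight:
    """Weight: binds a query to one collection and computes its IDF once."""

    def __init__(self, query, index):
        self.query, self.index = query, index
        self.idf = idf(query.term, index)

    def make_scorer(self):
        return TermScorer(self.query.term, self.idf, self.index)


class PhraseWeight:
    """Weight for a phrase: sums the IDF of each term in the collection."""

    def __init__(self, query, index):
        self.query, self.index = query, index
        self.idf = sum(idf(t, index) for t in query.terms)

    def make_scorer(self):
        if len(self.query.terms) == 1:
            # A one-term phrase is just a term, so hand back a TermScorer.
            return TermScorer(self.query.terms[0], self.idf, self.index)
        return PhraseScorer(self.query.terms, self.idf, self.index)


class TermScorer:
    """Scorer: works with per-document data (here, the term frequency)."""

    def __init__(self, term, idf_value, index):
        self.term, self.idf, self.index = term, idf_value, index

    def score(self, doc_id):
        freq = self.index[doc_id].count(self.term)
        return math.sqrt(freq) * self.idf


class PhraseScorer:
    """Scorer: counts exact adjacent occurrences of the phrase in one document."""

    def __init__(self, terms, idf_value, index):
        self.terms, self.idf, self.index = tuple(terms), idf_value, index

    def score(self, doc_id):
        tokens = self.index[doc_id]
        n = len(self.terms)
        freq = sum(1 for i in range(len(tokens) - n + 1)
                   if tuple(tokens[i:i + n]) == self.terms)
        return math.sqrt(freq) * self.idf


# Toy index (hypothetical): doc id -> list of tokens.
index = {0: ["quick", "brown", "fox"], 1: ["lazy", "dog"]}
scorer = PhraseQuery(("quick", "brown")).make_weight(index).make_scorer()
print(type(scorer).__name__, scorer.score(0), scorer.score(1))
```

In this sketch only the Weight ever sees collection-wide statistics and only the Scorer ever looks at an individual document, so a per-document task such as generating highlight spans would naturally hang off the Scorer.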
Plus, the association of a field with particular Posting and Similarity subclasses affects how scores are calculated.

This is an area that's ripe for refactoring. I've already pulled out a bunch of cruft that was inherited from Lucene. Why don't we see what we come up with if we go back to first principles?

I think the division of labor in the Query-Weight-Scorer hierarchy described above is sound. Do you agree?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch