
mike.klaas at gmail
Mar 13, 2009, 10:51 AM
Post #5 of 11
Re: operator precedence and confusing result
On 11-Mar-09, at 7:13 PM, Jenny Brown wrote:

> I use the boolean logic heavily in a production app, because it's the
> grammar that my users understand (and they put together complex
> boolean queries in other apps too). Also, we're not using relevance
> ranking. A document either "matches the query" and gets returned, or
> "doesn't match" and doesn't get returned. We only want yes/no
> answers.
>
> I haven't had time to really figure out what the earlier commenter
> meant with the + operators syntax conversion. I still thought it
> would have meant the same thing as the query I had posted, ie, article
> has to match all terms in the AND clauses, and at least one of the
> terms in the OR list. I guess I'm still missing what his explanation
> was trying to demonstrate.
>
> Anyway, just a note to say that boolean matching is important to me
> and my users; it'd be good if it worked the way it looks like it
> would. If it doesn't, I need to understand better what the current
> limitations are.

Well, this is precisely why I am suggesting that we remove it (in some future version of Lucene). Lucene doesn't have a hierarchical boolean query model that works like people "expect", and bugs filed that report discrepancies between the way boolean operators work and intuition are rejected.

We are left with something that is convenient if you understand how it works, but if that is so, there is no reason that translation into the alternate syntax can't be used.

Lucene's query model is based on REQUIRED, OPTIONAL, and EXCLUDED clauses. A clause with no annotation is always OPTIONAL, and doesn't affect matching unless there are only OPTIONAL clauses on that level. Brackets () create a subclause (note that this is OPTIONAL by default!). AND terms are translated into REQUIRED clauses; AND NOTs are translated into EXCLUDED clauses.
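
For anyone who finds the rules easier to follow as code, here is a rough Python sketch of the yes/no matching behaviour described above. It is only a toy model, not Lucene's API: the names `matches`, `and_then_or`, and `grouped_or` are made up for illustration, a document is assumed to be just a set of terms, and scoring is ignored entirely.

```python
# Toy model of the REQUIRED / OPTIONAL / EXCLUDED clause rules.
# Hypothetical names throughout; this is not Lucene code.
REQUIRED, OPTIONAL, EXCLUDED = "REQUIRED", "OPTIONAL", "EXCLUDED"


def matches(node, doc_terms):
    """Return True if the set doc_terms satisfies node (yes/no only, no scoring).

    A node is either a term (a str) or a group: a list of (occur, node) pairs.
    """
    if isinstance(node, str):
        return node in doc_terms

    required = [child for occur, child in node if occur == REQUIRED]
    optional = [child for occur, child in node if occur == OPTIONAL]
    excluded = [child for occur, child in node if occur == EXCLUDED]

    # Any matching EXCLUDED clause rejects the document outright.
    if any(matches(child, doc_terms) for child in excluded):
        return False

    # If the level has REQUIRED clauses, they alone decide the match;
    # OPTIONAL clauses on the same level no longer affect it.
    if required:
        return all(matches(child, doc_terms) for child in required)

    # Only OPTIONAL clauses on this level: at least one must match.
    return any(matches(child, doc_terms) for child in optional)


# Two REQUIRED terms next to OPTIONAL ones on the same level.
and_then_or = [(REQUIRED, "A"), (REQUIRED, "B"), (OPTIONAL, "C"), (OPTIONAL, "D")]

# A REQUIRED term plus a REQUIRED subclause containing only OPTIONAL terms.
grouped_or = [(REQUIRED, "A"), (REQUIRED, [(OPTIONAL, "B"), (OPTIONAL, "C")])]

print(matches(and_then_or, {"A", "B"}))  # True: both REQUIRED terms present
print(matches(and_then_or, {"A", "C"}))  # False: C cannot stand in for B
print(matches(grouped_or, {"A", "C"}))   # True: A plus one term of the group
print(matches(grouped_or, {"A"}))        # False: the REQUIRED group is unmatched
```

The part that surprises people is the `if required:` branch: once a level contains a REQUIRED clause, its OPTIONAL siblings stop mattering for matching, so getting "all of these and at least one of those" means making the OR group itself a REQUIRED subclause.
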

Required clauses are annotated with +'s:

A AND B OR C OR D OR E OR F
  -> +A +B C D E F
  -> find documents that match clause A and clause B (other clauses don't affect matching)

C OR D OR E OR F
  -> C D E F
  -> find documents matching at least one of these clauses

A AND (B OR C OR D OR E OR F)
  -> +A +(B C D E F)
  -> find documents that match A, and match one of B, C, D, E, or F

(A AND B) OR C OR D OR E OR F
  -> (+A +B) C D E F
  -> find documents that match at least one of C, D, E, F, or both of A and B

The key takeaway: once you have an AND in a grouped set of clauses, the ORs are completely irrelevant for matching.

-Mike