
renaud+lucene at waldura
Oct 12, 2006, 4:11 PM
QueryParser Is Badly Broken
I'm developing an application used by scientists -- people who have a pretty good idea of what logic is -- and they were shocked to find out that no two of these queries return the same results:

1- banana AND apple OR orange
2- banana AND (apple OR orange)
3- (banana AND apple) OR orange

I'd expect (1) to be equivalent to either (2) or (3), but it turns out it's parsed as "+banana apple orange". I was rather, uh, dismayed by this finding, as it doesn't seem to make sense.

I just spent half a day reading up on the various ways QueryParser is broken, going through the bug reports and the mailing-list archives, and I'm still unable to come to a conclusion. Here's where I'm at:

a- queries which mix boolean operators require strict parenthesizing to work right
b- "+" isn't shorthand for "AND"; using it with "AND"/"OR"/"NOT" and the default operator "" rarely does what you expect
c- the stock QueryParser doesn't work well in these cases
d- there's a new PrecedenceQueryParser at http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/miscellaneous that solves *some* of the issues but creates others
e- there is a non-Lucene effort to create a query parser with a different syntax at http://famestalker.com/devwiki/

While we are also developing a query-building UI, users must be able to enter text queries as well. What do other folks do?

I mean, this is pretty bad. I can hardly go back to my scientists and tell them that Lucene is unable to handle two boolean operators in one query and that they should parenthesize everything by hand. I mean, that's just cheesy.

--Renaud
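
One possible workaround for point (a), sketched here rather than tested against any Lucene release: pre-process the user's text so that it is fully parenthesized before it ever reaches QueryParser. The class below is hypothetical (Parenthesizer is not part of Lucene), and it assumes the conventional precedence where AND binds tighter than OR, so that query (1) is rewritten as query (3). It only understands bare terms, AND, OR and parentheses; NOT, "+"/"-", quoted phrases, field prefixes and the implicit default operator are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical pre-processor (not a Lucene class): rewrites a query that
 * mixes AND and OR into a fully parenthesized string.
 * Assumption: AND binds tighter than OR.
 */
final class Parenthesizer {
    private final List<String> tokens;
    private int pos = 0;

    private Parenthesizer(List<String> tokens) {
        this.tokens = tokens;
    }

    static String parenthesize(String query) {
        // Pad parentheses with spaces so a whitespace split isolates them.
        String spaced = query.replace("(", " ( ").replace(")", " ) ").trim();
        List<String> toks = new ArrayList<>();
        for (String t : spaced.split("\\s+")) {
            if (!t.isEmpty()) {
                toks.add(t);
            }
        }
        Parenthesizer p = new Parenthesizer(toks);
        String result = p.parseOr();
        if (p.pos != toks.size()) {
            // Covers stray ")" and adjacent terms with no explicit operator.
            throw new IllegalArgumentException("Unexpected token: " + toks.get(p.pos));
        }
        return result;
    }

    // or := and ("OR" and)*   (lowest precedence, left-associative)
    private String parseOr() {
        String left = parseAnd();
        while (peekIs("OR")) {
            pos++;
            left = "(" + left + " OR " + parseAnd() + ")";
        }
        return left;
    }

    // and := atom ("AND" atom)*   (binds tighter than OR)
    private String parseAnd() {
        String left = parseAtom();
        while (peekIs("AND")) {
            pos++;
            left = "(" + left + " AND " + parseAtom() + ")";
        }
        return left;
    }

    // atom := TERM | "(" or ")"
    private String parseAtom() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of query");
        }
        String t = tokens.get(pos++);
        if (t.equals("(")) {
            String inner = parseOr();
            if (!peekIs(")")) {
                throw new IllegalArgumentException("Missing ')'");
            }
            pos++;
            return inner;
        }
        if (t.equals(")") || t.equals("AND") || t.equals("OR")) {
            throw new IllegalArgumentException("Unexpected token: " + t);
        }
        return t;
    }

    private boolean peekIs(String s) {
        return pos < tokens.size() && tokens.get(pos).equals(s);
    }
}
```

With this sketch, Parenthesizer.parenthesize("banana AND apple OR orange") returns "((banana AND apple) OR orange)", and "banana AND (apple OR orange)" comes back as "(banana AND (apple OR orange))". The resulting string would then be handed to the stock QueryParser; whether that gives the expected results in every case is something I'd still want to verify.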