
Mailing List Archive: Lucene: Java-User

Technology Preview of new Lucene QueryParser

 

 



markrmiller at gmail

Jan 9, 2007, 7:43 PM

Post #1 of 20
Technology Preview of new Lucene QueryParser

I have released a Technology preview of my Lucene query parser Qsol.
This is the first official release. The purpose of this release is to
gather feedback for a 1.0 release.
If you have an interest in using this parser please lend a hand in
testing it out and making suggestions.

A recap of the parser's features:

These were my goals in building Qsol:

1. Proximity Operators in the search syntax
2. Paragraph/Sentence proximity searching
3. Query abbreviation tokens
4. SuggestedQuery support
5. Very customizable search syntax
6. Easily replaceable Date Parser
7. A cleaner syntax without precedence issues


http://www.myhardshadow.com/qsol.php


Qsol is open source and released under the Apache 2.0 license.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


hossman_lucene at fucit

Jan 10, 2007, 5:32 PM

Post #2 of 20
Re: Technology Preview of new Lucene QueryParser

: http://www.myhardshadow.com/qsol.php

Mark: I only read your querysyntax.php and didn't dig into the source, but
i'm curious about the "There are no unary operators in Qsol syntax"
statement.... what is the Qsol equivalent of the QueryParser syntax: "A -B -C"

It's also not clear to me how different fields can be queried ... you give
creditcard[23907094 - 23094345] as an example of a range query, but how
does one search for the word "foo" in the field "title" ?

Hmm... is there no support for query boosts?


-Hoss




markrmiller at gmail

Jan 10, 2007, 6:43 PM

Post #3 of 20
Re: Technology Preview of new Lucene QueryParser

Hey Hoss,

I didn't realize that I had left out the field stuff...I really am still
working on a lot with the parser's documentation and I apologize.
>
> Mark: I only read your querysyntax.php and didn't dig into the source, but
> i'm curious about the "There are no unary operators in Qsol syntax"
> statement.... what is the Qsol equivalent of the QueryParser syntax: "A -B -C"
>
Perhaps I am not being completely accurate with that statement, I'll let
you be the judge.
It works like this: "A -B -C" would be expressed as "A ! B ! C"
By binary, I mean that each operator must connect two clauses...in that
case A is connected to B and C is connected to A ! B.
I avoid the single prohibit clause issue, -query, by not really allowing
it in the syntax. The operators are: AND, OR, ANDNOT, and PROXIMITY.
A ! B ! C = A ANDNOT B ANDNOT C
> It's also not clear to me how different fields can be queried ... you give
> creditcard[23907094 - 23094345] as an example of a range query, but how
> does one search for the word "foo" in the field "title" ?
>
I will add this...a field search is: field1,field2(foo) | field3(foobar)

I have to update that documentation...field search was '[ ]' but is now
'( )'.
Thanks for pointing this out to me.
> Hmm... is there no support for query boosts?
>
This is a glaring omission eh? I have to plead laziness...I haven't
needed it yet in my work so it is not there...I will add it before a
1.0 release but was hoping for some syntax suggestions -- though I'm
betting people are happy with Lucene's syntax for this.
> -Hoss
>



hossman_lucene at fucit

Jan 10, 2007, 8:15 PM

Post #4 of 20
Re: Technology Preview of new Lucene QueryParser

: It works like this: "A -B -C" would be expressed as "A ! B ! C"
: By binary, I mean that each operator must connect two clauses...in that
: case A is connected to B and C is connected to A ! B.
: I avoid the single prohibit clause issue, -query, by not really allowing

so do you convert A ! B ! C into a three clause boolean query, or a two
clause BooleanQuery that contains another two clause BooleanQuery?

: I will add this...a field search is: field1,field2(foo) | field3(foobar)

is that field1,field2(foo) construct a DisjunctionMaxQuery or just a
BooleanQuery?

: 1.0 release but was hoping for some syntax suggestions -- though I'm
: betting people are happy with Lucene's syntax for this.

yeah, "^" is pretty straight forward

incidentally: was there a motivating factor behind the mixed use of
both ~ and : to denote slop?


-Hoss




markrmiller at gmail

Jan 11, 2007, 4:00 AM

Post #5 of 20
Re: Technology Preview of new Lucene QueryParser

> : It works like this: "A -B -C" would be expressed as "A ! B ! C"
> : By binary, I mean that each operator must connect two clauses...in that
> : case A is connected to B and C is connected to A ! B.
> : I avoid the single prohibit clause issue, -query, by not really allowing
>
> so do you convert A ! B ! C into a three clause boolean query, or a two
> clause BooleanQuery that contains another two clause BooleanQuery?
>
It becomes a three clause boolean query...would there be a difference in
scoring? I assumed not and it used to make a boolean that contained
another boolean...these days it checks to see if it's in a chain of the
same operator and makes only one boolean.
> : I will add this...a field search is: field1,field2(foo) | field3(foobar)
>
> is that field1,field2(foo) construct a DisjunctionMaxQuery or just a
> BooleanQuery?
>
Just a boolean right now. I will have to look at DisjunctionMaxQuery.
Currently it's just a boolean: +field1:foo +field2:foo
> : 1.0 release but was hoping for some syntax suggestions -- though I'm
> : betting people are happy with Lucene's syntax for this.
>
> yeah, "^" is pretty straight forward
>
> incidentally: was there a motivating factor behind the mixed use of
> both ~ and : to denote slop?
>
':' is for slop on a phrase query. "the car is burning so get out":2
will allow for each word to be within 2.
'~' is a binary operator...mark ~4 postman...or say: (mark ~5 (horse &
car) ~6 tom brady | "hard knocks dude":3) ~6 garbage

Phrase slop could be specified with the '~' op too: the ~2 car ~2 is
~2 burning ~2 so ~2 get ~2 out : but that is a pain in the butt.

Also, '~' is needed for paragraph and sentence prox searches: (old
crooner) ~3p johnny

- Mark



hossman_lucene at fucit

Jan 11, 2007, 2:09 PM

Post #6 of 20
Re: Technology Preview of new Lucene QueryParser

: > so do you convert A ! B ! C into a three clause boolean query, or a two
: > clause BooleanQuery that contains another two clause BooleanQuery?
: >
: It becomes a three clause boolean query...would there be a difference in
: scoring? I assumed not and it used to make a boolean that contained
: another boolean...these days it checks to see if it's in a chain of the
: same operator and makes only one boolean.

there is in fact a difference in score ... a big difference depending
on how the coordFactor comes into play. your three-clause approach makes
sense to me as the "right" approach, but your "in a chain of the same
operator" comment scares me ... how does "A | B | C ! D ! E" get parsed?
I would assume it should result in the QueryParser equivalent of
"A B C -D -E" ... is there any way to produce the same underlying
BooleanQuery using your syntax?

: Just a boolean right now. I will have to look at DisjunctionMaxQuery.
: Currently it's just a boolean: +field1:foo +field2:foo

hmmm... so field1,field2(foo) requires that foo match on both field1 and
field2, even if you've used it in this context...

fieldX(bar) | field1,field2(foo)

...it seems like a shortcut for "match on foo in ANY of the following
fields" would be needed in more cases than a shortcut for "match on foo in
ALL of the following fields"

: > incidentally: was there a motivating factor behind the mixed use of
: > both ~ and : to denote slop?
: >
: ':' is for slop on a phrase query. "the car is burning so get out":2
: will allow for each word to be within 2.
: '~' is a binary operator...mark ~4 postman...or say: (mark ~5 (horse &
: car) ~6 tom brady | "hard knocks dude":3) ~6 garbage
:
: Phrase slop could be specified with the '~' op too: the ~2 car ~2 is
: ~2 burning ~2 so ~2 get ~2 out : but that is a pain in the butt.

you kind of lost me there ... i get that ~ is a binary operator, but in
both cases the intent is to say "these words must appear near each other"
... so i'm wondering why you chose to use "hard knocks dude":3 instead of
"hard knocks dude"~3 .... oh wait, i think i get it ... was it to
eliminate ambiguity of something like ("hard knocks dude" ~3 foo) ...
is the whitespace around binary operators optional?




-Hoss




markrmiller at gmail

Jan 11, 2007, 3:18 PM

Post #7 of 20
Re: Technology Preview of new Lucene QueryParser

> : > so do you convert A ! B ! C into a three clause boolean query, or a two
> : > clause BooleanQuery that contains another two clause BooleanQuery?
> : >
> : It becomes a three clause boolean query...would there be a difference in
> : scoring? I assumed not and it used to make a boolean that contained
> : another boolean...these days it checks to see if it's in a chain of the
> : same operator and makes only one boolean.
>
> there is in fact a difference in score ... a big difference depending
> on how the coordFactor comes into play. your three-clause approach makes
> sense to me as the "right" approach, but your "in a chain of the same
> operator" comment scares me ... how does "A | B | C ! D ! E" get parsed?
> I would assume it should result in the QueryParser equivalent of
> "A B C -D -E" ... is there any way to produce the same underlying
> BooleanQuery using your syntax?
>
This sounds troubling to me now :) I may need to clear up my
understanding of this and rework the parser:
"A | B | C ! D ! E" would get parsed as allFields:a allFields:b
(+allFields:c -allFields:d -allFields:e)
This is because ! binds tighter than |...
Sounds like I need to bone up on how I thought this query would operate.
I set up this logic back when I was new to Lucene and have not
considered it since. Seems as though the hits will be right but perhaps
the scoring will not be correct?
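
The parse described above can be reproduced with a small recursive-descent sketch. This is not Qsol's actual implementation: the class name, the whitespace tokenization, and the restriction to the `|` and `!` operators are assumptions made for illustration, and the `allFields:` prefix and lowercasing simply imitate the output quoted in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: '!' (ANDNOT) binds tighter than '|' (OR).
class QsolPrecedenceSketch {
    private final String[] tokens;
    private int pos = 0;

    QsolPrecedenceSketch(String query) {
        // Operators must be surrounded by whitespace, so splitting is enough here.
        this.tokens = query.trim().split("\\s+");
    }

    // or := andNot ( '|' andNot )*   -> one flat list of optional clauses
    String parseOr() {
        List<String> clauses = new ArrayList<>();
        clauses.add(parseAndNot());
        while (pos < tokens.length && tokens[pos].equals("|")) {
            pos++;
            clauses.add(parseAndNot());
        }
        return String.join(" ", clauses);
    }

    // andNot := term ( '!' term )*   -> one nested query: +first -rest...
    String parseAndNot() {
        String first = term();
        if (pos >= tokens.length || !tokens[pos].equals("!")) {
            return first;
        }
        StringBuilder sb = new StringBuilder("(+").append(first);
        while (pos < tokens.length && tokens[pos].equals("!")) {
            pos++;
            sb.append(" -").append(term());
        }
        return sb.append(")").toString();
    }

    private String term() {
        if (pos >= tokens.length) {
            throw new IllegalArgumentException("expected a term at token " + pos);
        }
        return "allFields:" + tokens[pos++].toLowerCase(Locale.ROOT);
    }

    static String parse(String query) {
        return new QsolPrecedenceSketch(query).parseOr();
    }

    public static void main(String[] args) {
        System.out.println(parse("A | B | C ! D ! E"));
    }
}
```

Because the `!` chain is collected inside `parseAndNot`, it necessarily ends up as its own nested group, which is the nesting Hoss is asking about.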
> : Just a boolean right now. I will have to look at DisjunctionMaxQuery.
> : Currently it's just a boolean: +field1:foo +field2:foo
>
> hmmm... so field1,field2(foo) requires that foo match on both field1 and
> field2, even if you've used it in this context...
>
> fieldX(bar) | field1,field2(foo)
>
> ...it seems like a shortcut for "match on foo in ANY of the following
> fields" would be needed in more cases than a shortcut for "match on foo in
> ALL of the following fields"
>
I was mistaken: it is actually field1:foo field2:foo. OR instead of AND.
Sorry about that. Obviously it looks like I should be looking into
DisjunctionMaxQuery instead though.
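
As a rough illustration of the corrected behaviour, the sketch below expands a multi-field search into one optional clause per field. The class and method names are hypothetical, and only a single plain term inside the parentheses is handled; it is not taken from the Qsol source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: "field1,field2(foo)" -> "field1:foo field2:foo" (OR, not AND).
class FieldExpansionSketch {
    static String expand(String fieldSearch) {
        int open = fieldSearch.indexOf('(');
        int close = fieldSearch.lastIndexOf(')');
        if (open < 1 || close < open) {
            throw new IllegalArgumentException("expected fields(term): " + fieldSearch);
        }
        String term = fieldSearch.substring(open + 1, close).trim();
        List<String> clauses = new ArrayList<>();
        // Each listed field gets its own optional clause for the same term.
        for (String field : fieldSearch.substring(0, open).split(",")) {
            clauses.add(field.trim() + ":" + term);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(expand("field1,field2(foo)"));
    }
}
```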
> : > incidentally: was there a motivating factor behind the mixed use of
> : > both ~ and : to denote slop?
> : >
> : ':' is for slop on a phrase query. "the car is burning so get out":2
> : will allow for each word to be within 2.
> : '~' is a binary operator...mark ~4 postman...or say: (mark ~5 (horse &
> : car) ~6 tom brady | "hard knocks dude":3) ~6 garbage
> :
> : Phrase slop could be specified with the '~' op too: the ~2 car ~2 is
> : ~2 burning ~2 so ~2 get ~2 out : but that is a pain in the butt.
>
> you kind of lost me there ... i get that ~ is a binary operator, but in
> both cases the intent is to say "these words must appear near each other"
> ... so i'm wondering why you chose to use "hard knocks dude":3 instead of
> "hard knocks dude"~3 .... oh wait, i think i get it ... was it to
> eliminate ambiguity of something like ("hard knocks dude" ~3 foo) ...
> is the whitespace around binary operators optional?
>
Eliminating ambiguity was the intention. You think they should be the same?
The whitespace is not optional. I chose this route so that you would be
able to query m&m without escaping the &..instead you would use m & m
for an AND search.
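
A minimal sketch of that whitespace rule, assuming a tokenizer that splits on whitespace and treats a token as an operator only when it is exactly `&`, `|` or `!` (the proximity operator and parentheses are left out, and the names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an operator character only counts when it stands alone.
class OperatorTokenSketch {
    static List<String> classify(String query) {
        List<String> out = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty query
            }
            boolean isOperator = token.equals("&") || token.equals("|") || token.equals("!");
            out.add((isOperator ? "OP:" : "TERM:") + token);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(classify("m&m"));   // one term, nothing to escape
        System.out.println(classify("m & m")); // term, AND operator, term
    }
}
```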
> -Hoss
>
I can't thank you enough even for this brief exchange Hoss. You are a
tremendous help. I will be using this system in a production environment
and need it to be perfect before I am done.
The booleanquery question you brought up seems very troubling.

- Mark



markrmiller at gmail

Jan 11, 2007, 3:46 PM

Post #8 of 20
Re: Technology Preview of new Lucene QueryParser

> you kind of lost me there ... i get that ~ is a binary operator, but in
> both cases the intent is to say "these words must appear near each other"
> ... so i'm wondering why you chose to use "hard knocks dude":3 instead of
> "hard knocks dude"~3 .... oh wait, i think i get it ... was it to
> eliminate ambiguity of something like ("hard knocks dude" ~3 foo) ...
> is the whitespace around binary operators optional?
>
I wasn't clear on this answer. The problem was not grammar ambiguity but
from a user standpoint...I wanted to differentiate the proximity binary
operator from the phrase distance operator...even though they are
similar. Perhaps the differentiation is more confusing than helpful.



hossman_lucene at fucit

Jan 11, 2007, 5:47 PM

Post #9 of 20
Re: Technology Preview of new Lucene QueryParser

: I wasn't clear on this answer. The problem was not grammar ambiguity but
: from a user standpoint...I wanted to differentiate the proximity binary
: operator from the phrase distance operator...even though they are
: similar. Perhaps the differentiation is more confusing than helpful.

it's not confusing if your target audience isn't confused. .. i asked only
as someone who understands Lucene fairly well, and recognised that under
the covers they were (probably) both producing PhraseQueries ... i was
mainly just curious as to your motivation, which makes sense now

(it's mainly just one more special character people have to worry about
escaping)


-Hoss




markrmiller at gmail

Jan 11, 2007, 5:58 PM

Post #10 of 20
Re: Technology Preview of new Lucene QueryParser

Chris Hostetter wrote:
> : I wasn't clear on this answer. The problem was not grammar ambiguity but
> : from a user standpoint...I wanted to differentiate the proximity binary
> : operator from the phrase distance operator...even though they are
> : similar. Perhaps the differentiation is more confusing than helpful.
>
> it's not confusing if your target audience isn't confused. .. i asked only
> as someone who understands Lucene fairly well, and recognised that under
> the covers they were (probably) both producing PhraseQueries ... i was
> mainly just curious as to your motivation, which makes sense now
>
Both actually produce SpanQueries...I think that Surround makes this
compromise as well. Because the Phrase search must be able to
be in a span search and it is desirable to have the same type of phrase
search in a Span search or not, both must produce a SpanQuery.
I just worry that after saying ~ is a binary proximity operator that it
is confusing to say
"old horse"~3 cow
is different than
"old horse" ~3 cow
is different from
"old horse"~3 ~3 cow
> (it's mainly just one more special character people have to worry about
> escaping)
>
I hope that this will not be much of a problem...the ':' will only be
recognized if it follows two sets of quotes and a series of digits
follows it..."the search":4
if it does not fit that pattern there is no need to escape it. Also, no
need to escape ~ unless it's surrounded by spaces.
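
The ':' rule could be sketched with a regular expression like the one below. The pattern and names are my own assumption of how "a quoted phrase followed by ':' and digits" might be recognized, not the grammar Qsol actually uses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: ':' is a slop marker only in the shape "quoted phrase":digits.
class PhraseSlopSketch {
    private static final Pattern PHRASE_SLOP =
            Pattern.compile("\"([^\"]*)\":(\\d{1,9})");

    /** Returns the slop if the input has the phrase-slop shape, otherwise -1. */
    static int slopOf(String input) {
        Matcher m = PHRASE_SLOP.matcher(input);
        return m.matches() ? Integer.parseInt(m.group(2)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(slopOf("\"the search\":4")); // phrase with slop
        System.out.println(slopOf("10:30"));            // plain text, ':' needs no escaping
    }
}
```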



I'm most curious to hear what you think of the nested boolean query
issue...will my scores be screwed up? behave in a usable fashion? Any
insight to offer?


- Mark



hossman_lucene at fucit

Jan 11, 2007, 6:00 PM

Post #11 of 20
Re: Technology Preview of new Lucene QueryParser

: This sounds troubling to me now :) I may need to clear up my
: understanding of this and rework the parser:
: "A | B | C ! D ! E" would get parsed as allFields:a allFields:b
: (+allFields:c -allFields:d -allFields:e)
: This is because ! binds tighter than |...
: Sounds like I need to bone up on how I thought this query would operate.
: I set up this logic back when I was new to Lucene and have not
: considered it since. Seems as though the hits will be right but perhaps
: the scoring will not be correct?

it depends on your definition of "correct" .. take a look at the
Query.toString and Explanation.toString output from a query for something
like "A X Y B C -D" vs the same results of "A X Y B (C -D)" or "(A X Y B)
(C -D)" .. particularly when X and Y aren't in the documents and you'll
see what i mean.

: > ...it seems like a shortcut for "match on foo in ANY of the following
: > fields" would be needed in more cases than a shortcut for "match on foo in
: > ALL of the following fields"
: >
: I was mistaken: it is actually field1:foo field2:foo. OR instead of AND.
: Sorry about that. Obviously it looks like I should be looking into
: DisjunctionMaxQuery instead though.

well, i have an unhealthy obsession with DisjunctionMaxQuery, so don't
assume you *should* use it ... it's just the first thing that occurred to
me when i saw your field1,field2(foo) syntax ... making that be a shortcut
for (field1(foo) field2(foo)) is perfectly fine too if that's the use case
you expect to be more useful.

where supporting DisjunctionMaxQuery would really be cool is if you
allowed people to specify the tiebreaker value, with something like...

field1,field2(foo):0.05

..and if a tiebreaker isn't specified, you default to just using a
BooleanQuery.
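
For readers unfamiliar with the tiebreaker: as far as I know, DisjunctionMaxQuery scores a document as the best sub-query score plus the tiebreaker multiplied by the sum of the other sub-query scores. The sketch below is plain arithmetic under that assumption, with made-up per-field scores; it does not call Lucene.

```java
// Hypothetical sketch of dismax-style scoring: max + tieBreaker * (sum of the rest).
class DisMaxSketch {
    static double disMax(double tieBreaker, double... subScores) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : subScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // foo scores 2.0 in field1 and 1.0 in field2 (invented numbers).
        System.out.println(disMax(0.05, 2.0, 1.0)); // best field dominates
        System.out.println(disMax(0.0, 2.0, 1.0));  // pure max
        System.out.println(disMax(1.0, 2.0, 1.0));  // plain sum of both fields
    }
}
```

A tiebreaker of 0 keeps only the best field, while 1.0 adds the fields together, so a small value such as 0.05 mostly ranks by the best field and uses the others to break ties.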

Incidentally, is field1(foo bar) a shortcut for field1(foo) field1(bar) like
in the regular QueryParser?



-Hoss




markrmiller at gmail

Jan 11, 2007, 6:14 PM

Post #12 of 20
Re: Technology Preview of new Lucene QueryParser

>
> Incidentally, is field1(foo bar) a shortcut for field1(foo) field1(bar) like
> in the regular QueryParser?
>
I believe I just OR the queries together:

example = "field1,field2((search & old) ~3 horse)";
expected = "(+spanNear([field1:search, field1:horse], 3, false)
+spanNear([field1:old, field1:horse], 3, false))
(+spanNear([field2:search, field2:horse], 3, false)
+spanNear([field2:old, field2:horse], 3, false))";



markrmiller at gmail

Jan 12, 2007, 11:26 AM

Post #13 of 20
Re: Technology Preview of new Lucene QueryParser

> : This sounds troubling to me now :) I may need to clear up my
> : understanding of this and rework the parser:
> : "A | B | C ! D ! E" would get parsed as allFields:a allFields:b
> : (+allFields:c -allFields:d -allFields:e)
> : This is because ! binds tighter than |...
> : Sounds like I need to bone up on how I thought this query would operate.
> : I set up this logic back when I was new to Lucene and have not
> : considered it since. Seems as though the hits will be right but perhaps
> : the scoring will not be correct?
>
> it depends on your definition of "correct" .. take a look at the
> Query.toString and Explanation.toString output from a query for something
> like "A X Y B C -D" vs the same results of "A X Y B (C -D)" or "(A X Y B)
> (C -D)" .. particularly when X and Y aren't in the documents and you'll
> see what i mean.
>
>
I will certainly start experimenting with this. For clarification
though, you are telling me that the Lucene syntax query: 'Mark AND pig
AND man' is different than the query: '(Mark AND pig) AND man', correct?




hossman_lucene at fucit

Jan 12, 2007, 11:56 AM

Post #14 of 20
Re: Technology Preview of new Lucene QueryParser

: I will certainly start experimenting with this. For clarification
: though, you are telling me that the Lucene syntax query: 'Mark AND pig
: AND man' is different than the query: '(Mark AND pig) AND man', correct?

Ummm... because you are making all of the clauses required, the parens
*may* not affect the final scores .. i can't remember off the top of my
head (it depends on how the queryNorm is calculated)

i know for a fact that you will see lots of score differences however
between the queries 'Mark OR pig OR man' and '(Mark OR pig) OR man'
for documents which only contain one or two of those terms -- that's
because of the coordFactor.
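
To make the coordFactor effect concrete, here is a toy model, not Lucene code. It assumes every matching term contributes a raw score of 1.0 and that a boolean query of optional clauses multiplies the sum of its matching clauses by coord = matching clauses / total clauses, which is how I understand the default similarity to behave. queryNorm, tf/idf and field norms are ignored.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of how nesting changes scores through the coord factor.
class CoordSketch {
    /** A query scores a document, modelled as the set of terms it contains. */
    interface Q { double score(Set<String> doc); }

    /** Toy term query: 1.0 on a match, 0.0 otherwise. */
    static Q term(String t) {
        return doc -> doc.contains(t) ? 1.0 : 0.0;
    }

    /** Toy boolean query of optional clauses: sum of matches times coord. */
    static Q or(Q... clauses) {
        return doc -> {
            double sum = 0.0;
            int overlap = 0;
            for (Q clause : clauses) {
                double s = clause.score(doc);
                if (s > 0.0) {
                    overlap++;
                    sum += s;
                }
            }
            return sum * overlap / clauses.length; // coord = overlap / maxOverlap
        };
    }

    static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    static Q flat()   { return or(term("mark"), term("pig"), term("man")); }
    static Q nested() { return or(or(term("mark"), term("pig")), term("man")); }

    public static void main(String[] args) {
        Set<String> onlyMark = doc("mark");
        System.out.println(flat().score(onlyMark));   // 1 of 3 clauses matched
        System.out.println(nested().score(onlyMark)); // 1 of 2 inner, then 1 of 2 outer
    }
}
```

In this model a document containing only "mark" gets one third of its raw score from the flat query but one quarter from the nested one, while a document containing all three terms scores the same either way.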


-Hoss




markrmiller at gmail

Jan 20, 2007, 10:07 AM

Post #15 of 20
Re: Technology Preview of new Lucene QueryParser

Chris Hostetter wrote:
> : > so do you convert A ! B ! C into a three clause boolean query, or a two
> : > clause BooleanQuery that contains another two clause BooleanQuery?
> : >
> : It becomes a three clause boolean query...would there be a difference in
> : scoring? I assumed not and it used to make a boolean that contained
> : another boolean...these days it checks to see if it's in a chain of the
> : same operator and makes only one boolean.
>
> there is in fact a difference in score ... a big difference depending
> on how the coordFactor comes into play. your three-clause approach makes
> sense to me as the "right" approach, but your "in a chain of the same
> operator" comment scares me ... how does "A | B | C ! D ! E" get parsed?
> I would assume it should result in the QueryParser equivalent of
> "A B C -D -E" ... is there any way to produce the same underlying
> BooleanQuery using your syntax?
>
This exchange has caused me to reassess my syntax. It seems that
QueryParser's handling of A B C -D -E is special because QueryParser
does not have any operator precedence rules (unless the 1 rule is that
all operators resolve with the same precedence <g>). What would appear
in my parser to map to A B C -D -E i.e. A | B | C ! D ! E, actually
maps to: A B (+C -D -E). If you want precedence applied to your query,
there is no way around this-- that is what operator precedence creates.
Unfortunately, that means my parser is not quite as "rich" as
QueryParser. While I can duplicate the same hits, QueryParser has a
greater scoring expressiveness (QueryParser can express precedence
queries, but you must use parentheses). My current idea (mostly
implemented) is to add a set of operators to my syntax that force a
first-level precedence resolve. To express the query A B C -D -E you
would use A || B || C !! D !! E. Doubling the operator would
effectively make the operator bind at the level of parentheses. You
could then mix like this: A B ! C & A || Z || B !! H & Q.

I think that this change will be a large step forward in supporting the
entire Lucene Query/Scoring language with my parser.

- Mark



hossman_lucene at fucit

Jan 20, 2007, 6:14 PM

Post #16 of 20
Re: Technology Preview of new Lucene QueryParser

: This exchange has caused me to reassess my syntax. It seems that
: QueryParser's handling of A B C -D -E is special because QueryParser
: does not have any operator precedence rules (unless the 1 rule is that
: all operators resolve with the same precedence <g>). What would appear

there are some precedence rules, but they really only apply when using
"AND", "OR", or "NOT" -- which as i've gone on record with many times: are
big freaking hacks that people shouldn't use.

the big reason QueryParser doesn't have to worry about precedence, is
because its core operators: "+" and "-" are unary, which makes sense
given the way BooleanQuery works: the "prohibited" and "mandatory"
options apply to individual clauses... in your syntax you've made all
operators binary, so you have to decide how to map those binary
operator concepts to the unary "options" of BooleanQuery.
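
A small sketch of the unary model described here: each whitespace-separated clause carries its own option, and the result is one flat clause list. The enum constants borrow the names of Lucene's BooleanClause.Occur values, but the class is a self-contained toy with hypothetical names, not the real QueryParser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: unary '+' and '-' map directly onto per-clause options.
class UnaryClauseSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static List<String> clauses(String query) {
        List<String> out = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty query
            }
            Occur occur = Occur.SHOULD; // no prefix: optional clause
            String term = token;
            if (token.startsWith("+")) {
                occur = Occur.MUST;
                term = token.substring(1);
            } else if (token.startsWith("-")) {
                occur = Occur.MUST_NOT;
                term = token.substring(1);
            }
            out.add(occur + ":" + term);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(clauses("+a b c -d -e"));
    }
}
```

Since every clause is tagged independently, no precedence decision is needed to keep all five clauses in a single boolean query; a binary operator has to pick an option for both of its operands at once.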





-Hoss




markrmiller at gmail

Jan 20, 2007, 7:09 PM

Post #17 of 20
Re: Technology Preview of new Lucene QueryParser

Chris Hostetter wrote:
> : This exchange has caused me to reassess my syntax. It seems that
> : QueryParser's handling of A B C -D -E is special because QueryParser
> : does not have any operator precedence rules (unless the 1 rule is that
> : all operators resolve with the same precedence <g>). What would appear
>
> there are some precedence rules, but they really only apply when using
> "AND", "OR", or "NOT" -- which as i've gone on record with many times: are
> big freaking hacks that people shouldn't use.
>
>
From what I understand "AND", "OR", and "NOT" do not follow normal
precedence rules. I thought instead that AND just means to MUST both
sides of the AND and OR means to SHOULD both sides etc. While this might
create a form of precedence, it's a confusing oddity that I am not a fan
of either. I am sure I saw a comment from Erick explaining that the
QueryParser works this way.

With my syntax you can get real precedence that mixes with how no
precedence (Lucene's unary operators) works. No precedence is created by
allowing you to make any operator resolve first...any operator that
resolves first connected with another operator that resolves first will
behave as if neither has precedence over the other and generate a single
BooleanQuery.

- Mark



hossman_lucene at fucit

Jan 22, 2007, 1:29 PM

Post #18 of 20
Re: Technology Preview of new Lucene QueryParser

: With my syntax you can get real precedence that mixes with how no
: precedence (Lucene's unary operators) works. No precedence is created by
: allowing you to make any operator resolve first...any operator that
: resolves first connected with another operator that resolves first will
: behave as if neither has precedence over the other and generate a single
: BooleanQuery.

what i was trying to get at is that i don't think precedence is really the
issue -- it's the lack of unary operators. If the only way to get a
single BooleanQuery is to use operators that have the exact same
precedence, and all operators are binary, then how do you create the
equivalent of QueryParser "+a b c -d -e" ? ... if i remember your syntax
correctly the only way to match the same documents is...
"a & ( b | c ) ! d ! e"

...but it won't score the same way because the parens force a nested
boolean query to be created.


-Hoss




paul.elschot at xs4all

Jan 22, 2007, 2:54 PM

Post #19 of 20
Re: Technology Preview of new Lucene QueryParser

On Monday 22 January 2007 22:29, Chris Hostetter wrote:
>
> : With my syntax you can get real precedence that mixes with how no
> : precedence (Lucene's unary operators) works. No precedence is created by
> : allowing you to make any operator resolve first...any operator that
> : resolves first connected with another operator that resolves first will
> : behave as if neither has precedence over the other and generate a single
> : BooleanQuery.
>
> what i was trying to get at is that i don't think precedence is really the
> issue -- it's the lack of unary operators. If the only way to get a
> single BooleanQuery is to use operators that have the exact same
> precedence, and all operators are binary, then how do you create the
> equivalent of QueryParser "+a b c -d -e" ? ... if i remember your syntax
> correctly the only way to match the same documents is...
> "a & ( b | c ) ! d ! e"
>
> ...but it won't score the same way because the parens force a nested
> boolean query to be created.

I considered adding the removal of such nests to the surround
query language, but I never took the time to actually do it.

Anyway, which of the two forms is more user friendly? I wish I knew,
but the lack of brackets in the prefix form is tempting.

Thanks for spelling this out,

Paul Elschot



markrmiller at gmail

Jan 22, 2007, 3:43 PM

Post #20 of 20
Re: Technology Preview of new Lucene QueryParser

As I humbly ran into. I thought of '-a', and 'a' but hadn't thought too
far ahead. It covers enough ground to satisfy me for now though. Mixing
real precedence and unary operators is something I experimented with a
little a few months back and couldn't find anything good. This is my
first parser so I am sure I am a bit limited in this regard. Unary
syntax equivalence will have to be pushed aside for now. Thanks for all
your input Hoss.

- Mark

Chris Hostetter wrote:
> : With my syntax you can get real precedence that mixes with how no
> : precedence (Lucene's unary operators) works. No precedence is created by
> : allowing you to make any operator resolve first...any operator that
> : resolves first connected with another operator that resolves first will
> : behave as if neither has precedence over the other and generate a single
> : BooleanQuery.
>
> what i was trying to get at is that i don't think precedence is really the
> issue -- it's the lack of unary operators. If the only way to get a
> single BooleanQuery is to use operators that have the exact same
> precedence, and all operators are binary, then how do you create the
> equivalent of QueryParser "+a b c -d -e" ? ... if i remember your syntax
> correctly the only way to match the same documents is...
> "a & ( b | c ) ! d ! e"
>
> ...but it won't score the same way because the parens force a nested
> boolean query to be created.
>
>
> -Hoss

