
Mailing List Archive: Lucene: Java-Dev

[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

 

 



jira at apache

Jan 22, 2012, 2:19 AM

Post #1 of 12
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190640#comment-13190640 ]

Mike commented on SOLR-2649:
----------------------------

Yeah, I'm seeing this too. A user has reported that they queried:
(internet OR online OR web) "personal jurisdiction"

I have defaultOperator set to AND, so I'd expect the query to get processed as:
(internet OR online OR web) AND "personal jurisdiction"

But it is instead getting processed with an OR statement. I've confirmed this using debug.

It doesn't seem right that the default operator is honoured only until the user overrides it in one part of the query. This seems like more than a minor issue to me.

> MM ignored in edismax queries with operators
> --------------------------------------------
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
> Issue Type: Bug
> Components: search
> Affects Versions: 3.3
> Reporter: Magnus Bergmark
> Priority: Minor
>
> Hypothetical scenario:
> 1. User searches for "stocks oil gold" with MM set to "50%"
> 2. User adds "-stockings" to the query: "stocks oil gold -stockings"
> 3. User gets no hits since MM was ignored and all terms were AND-ed together
> The behavior seems to be intentional, although the reason why is never explained:
> // For correct lucene queries, turn off mm processing if there
> // were explicit operators (except for AND).
> boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0;
> (lines 232-234 taken from tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the primary features of dismax.
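The quoted condition can be modelled with a small, hypothetical Java sketch. It is not the edismax source: the class `LegacyMmRule` is invented here, and it splits the query on whitespace only, whereas the real parser also handles fields, phrases and parentheses. It shows how a single `-stockings` clause makes the operator count non-zero, so mm stops being applied.

```java
import java.util.List;

// Hypothetical, simplified model of the Solr 3.3 check quoted above.
// Assumption: whitespace tokenization only; the real parser does much more.
final class LegacyMmRule {

    // Returns true when mm would be applied, i.e. no OR, NOT, + or - was seen.
    static boolean doMinMatched(String query) {
        int numOR = 0, numNOT = 0, numPluses = 0, numMinuses = 0;
        for (String token : query.trim().split("\\s+")) {
            if (token.equals("OR")) {
                numOR++;
            } else if (token.equals("NOT")) {
                numNOT++;
            } else if (token.startsWith("+")) {
                numPluses++;
            } else if (token.startsWith("-")) {
                numMinuses++;
            }
            // "AND" is intentionally not counted, as in the quoted comment.
        }
        return (numOR + numNOT + numPluses + numMinuses) == 0;
    }

    public static void main(String[] args) {
        for (String q : List.of("stocks oil gold", "stocks oil gold -stockings")) {
            System.out.println(q + " -> mm applied: " + doMinMatched(q));
        }
    }
}
```

Running `main` prints `true` for the first query and `false` for the second, which is the scenario in the issue description.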

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Jan 22, 2012, 9:28 AM

Post #2 of 12
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190715#comment-13190715 ]

Brian Carver commented on SOLR-2649:
------------------------------------

If this bug is responsible for the behavior Mike describes, then I agree with him that this should not be classed as "minor", as it results in precisely the opposite of the behavior the user/maintainer would anticipate.



jira at apache

Jan 22, 2012, 11:31 PM

Post #3 of 12
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190909#comment-13190909 ]

Ron Davies commented on SOLR-2649:
----------------------------------

A significant portion of our users (professional searchers) would never accept this behaviour, so this issue is a blocker for us, i.e. it prevents us from using edismax (which we would very much like to do).



jira at apache

Feb 2, 2012, 12:24 PM

Post #4 of 12
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199182#comment-13199182 ]

Jan Høydahl commented on SOLR-2649:
-----------------------------------

So how should the parser interpret these examples?

{noformat}
q=word1 word2 word3 -word4&mm=100%
{noformat}
I agree with Ahmet that here all of word1, word2 and word3 must be required, since mm is explicitly specified. If mm is not specified, mm is set from defaultOperator, i.e. AND=>100%, OR=>0.

{noformat}
q=word1 word2 word3 -word4&mm=50%
{noformat}
Here you'd expect that two of the three first words must match.

{noformat}
q=word1 OR word2 word3&mm=100%
Example after having indexed exampledocs:
http://localhost:8983/solr/browse?q=ipod%20OR%20samsung%20printer&debugQuery=true&mm=100%25
{noformat}
With ipod OR samsung I get 5 hits. Adding the word "printer" yields 6 hits, i.e. it is OR'ed too. Here I'd expect the equivalent of (word1 OR word2) AND word3.
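In these examples `&` separates request parameters, and a literal `%` in the mm value has to be sent as `%25`, as in the /browse URL above. A hypothetical helper (`BrowseUrl.build`, not part of Solr) can encode both values with the JDK's `URLEncoder`; the `Charset` overload needs Java 10 or later, and spaces come out as `+` rather than `%20`, which form-decoding servers normally treat the same way.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; host, port and path are copied from the example URL.
final class BrowseUrl {

    static String build(String q, String mm) {
        // URLEncoder uses form encoding: spaces become '+', '%' becomes "%25".
        return "http://localhost:8983/solr/browse"
                + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&debugQuery=true"
                + "&mm=" + URLEncoder.encode(mm, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints ...?q=ipod+OR+samsung+printer&debugQuery=true&mm=100%25
        System.out.println(build("ipod OR samsung printer", "100%"));
    }
}
```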

{noformat}
q=word1 AND word2 word3&mm=50%
{noformat}
What would you expect for this? Perhaps (word1 AND word2) to be treated as clause1 and word3 as clause2 and then apply mm=1?

{noformat}
q=word1 OR word2 word3 word4 word5&mm=50%
{noformat}
How about this? Again, it would make sense to respect (word1 OR word2) as one clause and then require two clauses out of the resulting four.
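The arithmetic behind these examples can be sketched as follows. This is an assumption-laden model, not Solr's implementation: `MinShouldMatch.resolve` is a hypothetical name, it accepts only plain integers and non-negative percentages, rounds percentages down (integer division), caps the result at the number of optional clauses, and falls back to the defaultOperator mapping described above (AND=>100%, OR=>0) when mm is absent.

```java
// Hypothetical model of turning an mm spec into a number of required clauses.
// Assumptions: integers ("2") or non-negative percentages ("50%") only,
// percentages round down, result capped to [0, optionalClauses].
final class MinShouldMatch {

    static int resolve(int optionalClauses, String mm, String defaultOp) {
        // No mm given: derive it from the default operator (AND => 100%, OR => 0).
        String spec = (mm != null) ? mm.trim() : ("AND".equals(defaultOp) ? "100%" : "0");
        int required;
        if (spec.endsWith("%")) {
            int percent = Integer.parseInt(spec.substring(0, spec.length() - 1).trim());
            required = optionalClauses * percent / 100; // integer division rounds down
        } else {
            required = Integer.parseInt(spec);
        }
        return Math.max(0, Math.min(required, optionalClauses));
    }

    public static void main(String[] args) {
        // (word1 OR word2) word3 word4 word5 treated as four clauses, mm=50%
        System.out.println(resolve(4, "50%", "AND")); // 2
        System.out.println(resolve(3, null, "AND"));  // 3
        System.out.println(resolve(3, null, "OR"));   // 0
    }
}
```

With rounding down, `resolve(3, "50%", "AND")` returns 1 rather than the two expected in the second example; getting 2 there would require rounding up.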



jira at apache

Feb 2, 2012, 12:38 PM

Post #6 of 12
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199194#comment-13199194 ]

James Dyer commented on SOLR-2649:
----------------------------------

Maybe a simple answer is to have it make "mm" apply to all optional terms and ignore the rest. So for...
{noformat}
q=word1 AND word2 word3&mm=50%
{noformat}
..."word3" is the only optional term, so mm=50% only applies to "word3".

And for...
{noformat}
q=word1 OR word2 word3 word4 word5&mm=50%
{noformat}
...everything here is optional, so "mm" applies to all the terms. Otherwise, you'd be in a situation where "OR" takes on a meaning that is different from "optional", and I'm not sure you want to introduce a 4th concept beyond what we already have: required/optional/prohibited.

The semantics of "mm" would then become "the minimum number of optional terms that must match".
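James's rule can be sketched by classifying each term and counting only the optional ones. Everything here is a hypothetical simplification: `OptionalTermCounter` is an invented name, tokens are split on whitespace, parentheses and phrases are not handled, `x AND y` marks both neighbours as required, `NOT x` and `-x` mark x as prohibited, `+x` marks x as required, and `OR` leaves its neighbours optional.

```java
// Hypothetical sketch: count the optional terms that mm would apply to.
// Assumptions: whitespace tokens only, no parentheses or phrases.
final class OptionalTermCounter {

    static int countOptional(String query) {
        if (query.trim().isEmpty()) {
            return 0;
        }
        String[] tokens = query.trim().split("\\s+");
        int n = tokens.length;
        boolean[] notOptional = new boolean[n]; // operator, required or prohibited
        for (int i = 0; i < n; i++) {
            String t = tokens[i];
            if (t.equals("AND")) {
                // "x AND y": both neighbours become required
                notOptional[i] = true;
                if (i > 0) notOptional[i - 1] = true;
                if (i + 1 < n) notOptional[i + 1] = true;
            } else if (t.equals("NOT")) {
                // "NOT x": x becomes prohibited
                notOptional[i] = true;
                if (i + 1 < n) notOptional[i + 1] = true;
            } else if (t.equals("OR")) {
                // "OR" itself is not a term; its neighbours stay optional
                notOptional[i] = true;
            } else if (t.startsWith("+") || t.startsWith("-")) {
                notOptional[i] = true; // explicitly required or prohibited
            }
        }
        int optional = 0;
        for (boolean skip : notOptional) {
            if (!skip) optional++;
        }
        return optional;
    }

    public static void main(String[] args) {
        System.out.println(countOptional("word1 AND word2 word3"));            // 1
        System.out.println(countOptional("word1 OR word2 word3 word4 word5")); // 5
        System.out.println(countOptional("stocks oil gold -stockings"));       // 3
    }
}
```

The returned count is what mm would be evaluated against, so for `word1 AND word2 word3` mm applies to a single optional term.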



jira at apache

Feb 2, 2012, 12:43 PM

Post #7 of 12
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199199#comment-13199199 ]

Mike commented on SOLR-2649:
----------------------------

That makes sense to me and sounds like the simplest, most logical solution.

I'm mostly in favor of the easiest thing that will make default AND queries work properly as quickly as possible.



jira at apache

Feb 2, 2012, 1:44 PM

Post #8 of 12
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199259#comment-13199259 ]

Jan Høydahl commented on SOLR-2649:
-----------------------------------

Yes, I think the key here is which terms are part of some user-imposed operator (forced MUST or MUST NOT) vs. which terms are left dangling in the wild to be subject to mm. But what about this?

{noformat}
q=word1 AND word2 (word3 OR word4) word5&mm=100%
{noformat}
Should this be interpreted as MUST have word1 AND word2, with mm=3 for word3, word4 and word5? I don't think so. An OR does not mean the same as a "loose" term. This would clearly (perhaps because of the parens) signal that word3 OR word4 should be treated as one unit, not requiring both of them?



jira at apache

Feb 2, 2012, 2:22 PM

Post #9 of 12
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199286#comment-13199286 ]

James Dyer commented on SOLR-2649:
----------------------------------

It seems it would be simpler to implement and understand if we just counted up the optional words in the query and apply "mm" to those. I suppose you could create a subtle rule that naked terms count for "mm" but OR-ed terms do not. This might be functionality someone wants but then again it might confuse others who would expect "x OR y" to mean the same as "x y".

Counting multiple terms as 1 because they are in parentheses together doesn't seem like a good idea to me. But then again, maybe someone out there would appreciate all the subtle things you could do with this?

I guess whatever is decided just needs to be well-documented so when/if someone is surprised by the functionality they can look it up and see what's going on. Whatever is done, it will be a nice improvement over the current behavior.



jira at apache

Feb 2, 2012, 4:20 PM

Post #10 of 12
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199400#comment-13199400 ]

Jan Høydahl commented on SOLR-2649:
-----------------------------------

When bringing up all these cases, we may perhaps understand the reason for the current behavior after all :) However, it is flawed in assuming that schema's defaultOperator should be used instead of mm.

Here's a concrete suggestion for improvement:

* For mm=0%, mm=100% or no mm specified: disable mm as today, but induce defaultOperator from the mm value.
* For all other values of mm, use James' method of counting "optional" terms (including OR'ed ones) and apply "mm" to those.

This would be a big step in the right direction and would probably meet most people's needs.
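The two-branch suggestion above might look like this as a decision function. The enum and method names are hypothetical, mm is assumed to arrive as a raw string (or null when absent), and a bare `0` is treated like `0%`. For the `APPLY_MM` branch, a count of optional terms as James describes would supply the number of clauses mm is applied to.

```java
// Hypothetical decision function for the two-branch suggestion above.
final class MmStrategy {

    enum Mode {
        USE_SCHEMA_DEFAULT, // no mm given: keep the schema's defaultOperator
        USE_AND,            // mm=100%: disable mm, behave like defaultOperator=AND
        USE_OR,             // mm=0%: disable mm, behave like defaultOperator=OR
        APPLY_MM            // anything else: apply mm to the optional terms
    }

    static Mode choose(String mm) {
        if (mm == null || mm.trim().isEmpty()) {
            return Mode.USE_SCHEMA_DEFAULT;
        }
        String spec = mm.trim();
        if (spec.equals("100%")) {
            return Mode.USE_AND;
        }
        if (spec.equals("0%") || spec.equals("0")) { // assumption: bare 0 == 0%
            return Mode.USE_OR;
        }
        return Mode.APPLY_MM;
    }

    public static void main(String[] args) {
        for (String mm : new String[] {null, "100%", "0%", "50%", "2"}) {
            System.out.println(mm + " -> " + choose(mm));
        }
    }
}
```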



jira at apache

Feb 3, 2012, 2:20 PM

Post #11 of 12
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200102#comment-13200102 ]

Hoss Man commented on SOLR-2649:
--------------------------------

bq. Counting multiple terms as 1 because they are in parentheses together doesn't seem like a good idea to me.

I disagree, but it definitely just seems like a matter of opinion -- i don't know that we could ever come up with something that makes sense in all use cases

personally i think the sanest change would be to say that "mm" applies to all top level SHOULD clauses in the query (regardless of whether they have an explicit OR or not) -- exactly as it always has in dismax. if a top level clause is a nested boolean query, then "mm" shouldn't apply to it because it doesn't make sense to blur the "count" of how many SHOULD clauses there are at the various levels.

what would mm=5 mean for a query like "q=X AND Y (a b) (c d) (e f) (g h)" if you looked at all the nested subqueries? that only 5 of those 8 (lowercase) leaf level clauses are required? how would that be implemented on the underlying BooleanQuery objects w/o completely flattening the query (which would break the intent of the user when they grouped them) ... it seems like mm=5 (or mm=100%) should mean 5 (or 100%) of the top level SHOULD clauses are required ... the default query op should determine how any top level clauses that are BooleanQueries are dealt with.

...but that's just my opinion.
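Hoss's reading, where mm counts only top-level SHOULD clauses and a parenthesised group counts as one clause, can be sketched like this. The class is hypothetical and heavily simplified: it splits on whitespace outside parentheses, treats `AND` as making both neighbours MUST, `+x` as MUST, `-x` and `NOT x` as MUST_NOT, rounds percentages down, and caps mm at the number of SHOULD clauses (an assumption; the comment does not say what an mm larger than the clause count should do).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: mm counts only top-level SHOULD clauses.
final class TopLevelClauses {

    // Splits on whitespace outside parentheses, so "(a b)" stays one clause.
    static List<String> split(String query) {
        List<String> clauses = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : query.toCharArray()) {
            if (c == '(') depth++;
            if (c == ')') depth--;
            if (Character.isWhitespace(c) && depth == 0) {
                if (current.length() > 0) {
                    clauses.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            clauses.add(current.toString());
        }
        return clauses;
    }

    // Counts top-level clauses that are neither operators, MUST nor MUST_NOT.
    static int countShould(String query) {
        List<String> clauses = split(query);
        int n = clauses.size();
        boolean[] notShould = new boolean[n];
        for (int i = 0; i < n; i++) {
            String c = clauses.get(i);
            if (c.equals("AND")) {
                notShould[i] = true;
                if (i > 0) notShould[i - 1] = true;
                if (i + 1 < n) notShould[i + 1] = true;
            } else if (c.equals("NOT")) {
                notShould[i] = true;
                if (i + 1 < n) notShould[i + 1] = true;
            } else if (c.equals("OR") || c.startsWith("+") || c.startsWith("-")) {
                notShould[i] = true;
            }
        }
        int should = 0;
        for (boolean b : notShould) {
            if (!b) should++;
        }
        return should;
    }

    // Absolute mm, capped at the number of SHOULD clauses (an assumption).
    static int minShouldMatch(String query, int mm) {
        return Math.max(0, Math.min(mm, countShould(query)));
    }

    // Percentage mm, rounded down (an assumption).
    static int minShouldMatchPercent(String query, int percent) {
        return minShouldMatch(query, countShould(query) * percent / 100);
    }

    public static void main(String[] args) {
        String q = "X AND Y (a b) (c d) (e f) (g h)";
        System.out.println(split(q));                      // [X, AND, Y, (a b), (c d), (e f), (g h)]
        System.out.println(countShould(q));                // 4
        System.out.println(minShouldMatch(q, 5));          // 4
        System.out.println(minShouldMatchPercent(q, 100)); // 4
    }
}
```

For Hoss's example this finds four top-level SHOULD clauses, so mm=5 is capped at 4 and mm=100% also means 4; how the terms inside each group combine is left to the default operator.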





jira at apache

Feb 4, 2012, 8:52 AM

Post #12 of 12
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200476#comment-13200476 ]

Brian Carver commented on SOLR-2649:
------------------------------------

I'm new to solr, so I have a tenuous grasp on some of these issues, but I've understood boolean logic for a couple of decades and it seems to me like solr's current behavior is thwarting the expectations of those who understand what they want and explicitly ask for it. Mike's example above is what troubles me.

Principles:
1. The maintainer sets whitespace to be interpreted as AND or OR and solr should do nothing to change that in particular instances.
2. Where a user inputs an ambiguous query, a default rule about how operator scope will work is needed and that also should not be changed in particular instances.

So, Mike says he sets whitespace to AND, users know this, and then a user enters:

Example 1: (A or B or C) "D E"

Given the above assumptions, the only reasonable interpretation of this is:

(A or B or C) AND "D E" which is a conjunction with two conjuncts, both of which must be satisfied for a result to be produced, yet Mike/the user gets results that only satisfy one of the conjuncts. That shouldn't happen.
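The difference Brian describes can be checked with a small hypothetical evaluator over a document represented as a token list, assuming exact token matching and treating a phrase as consecutive tokens in order. `matches` is the conjunction he argues for; `matchesAsDisjunction` models the OR-ed processing Mike reported. Both names are invented for illustration.

```java
import java.util.List;
import java.util.Set;

// Hypothetical evaluator for Example 1: (A or B or C) AND "D E".
final class ConjunctionCheck {

    private static final Set<String> FIRST_CONJUNCT = Set.of("A", "B", "C");
    private static final List<String> PHRASE = List.of("D", "E");

    // True if the phrase tokens occur consecutively and in order.
    static boolean containsPhrase(List<String> doc, List<String> phrase) {
        outer:
        for (int i = 0; i + phrase.size() <= doc.size(); i++) {
            for (int j = 0; j < phrase.size(); j++) {
                if (!doc.get(i + j).equals(phrase.get(j))) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    static boolean firstHolds(List<String> doc) {
        return doc.stream().anyMatch(FIRST_CONJUNCT::contains);
    }

    // The reading Brian argues for: both conjuncts must hold.
    static boolean matches(List<String> doc) {
        return firstHolds(doc) && containsPhrase(doc, PHRASE);
    }

    // Models the OR-ed processing Mike reported: either part suffices.
    static boolean matchesAsDisjunction(List<String> doc) {
        return firstHolds(doc) || containsPhrase(doc, PHRASE);
    }

    public static void main(String[] args) {
        List<String> doc = List.of("B", "unrelated", "words");
        System.out.println(matches(doc));              // false
        System.out.println(matchesAsDisjunction(doc)); // true
    }
}
```

A document containing `B` but not the phrase satisfies the disjunction and fails the conjunction, which is the kind of result Mike's user was getting.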

I'd agree though that how to understand/apply mm in some of the examples above creates hard questions, but that is why many search engines provide two interfaces, one "natural language" interface and one that requires strict use of boolean syntax. Allowing people to enter some boolean operators (which they're going to expect will be respected-no-matter-what) and simultaneously interpreting their query using mm handlers intended for a more rough-and-ready approach is just going to lead to confused end users most of the time. So, in some ways, ignoring mm when operators are used is a feature, not a bug, but that seems orthogonal to the completely unacceptable outcome Mike described: whatever is causing THAT, is a bug.
