
Mailing List Archive: Lucene: Java-Dev

[jira] [Commented] (SOLR-3028) Support for additional query operators (feature parity request)

 

 



jira at apache

Feb 3, 2012, 11:54 AM

Post #1 of 4 (38 views)

[ https://issues.apache.org/jira/browse/SOLR-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199990#comment-13199990 ]

Hoss Man commented on SOLR-3028:
--------------------------------


#1) you can either index both stemmed and non-stemmed forms in different fields, and then specify the appropriate field name at query time for each input word to control what gets queried, or something like SOLR-2866 would be needed, along with additional filters to record in the terms whether each one is stemmed or unstemmed (possibly with a payload?) so that it's available at query time
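
As a rough illustration of the two-field approach, the sketch below rewrites a Sphinx-style query, where a leading `=` marks an exact-match word, into a fielded Lucene query. The field names `text_exact` and `text_stemmed` and the helper `route_terms` are hypothetical, and it assumes the schema indexes the same content into both fields with different analyzers.

```python
def route_terms(query, exact_field="text_exact", stemmed_field="text_stemmed"):
    """Send '=word' to the unstemmed field and plain words to the stemmed field.

    Field names are assumptions for illustration; no escaping of Lucene
    special characters is attempted here.
    """
    clauses = []
    for token in query.split():
        if token.startswith("=") and len(token) > 1:
            # Strip the '=' marker and target the field indexed without stemming.
            clauses.append(f"{exact_field}:{token[1:]}")
        else:
            clauses.append(f"{stemmed_field}:{token}")
    return " ".join(clauses)


print(route_terms("=running walking"))
```

A real implementation would live in a query parser plugin rather than in client code, but the routing decision per word is the same.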

#2) already possible with the standard lucene syntax: "cat dog goat"~15

#3) is already possible on trunk with the surround parser (SOLR-2703) -- although there isn't a lot of documentation out there about the syntax...

{code}
{!surround}(this W that) AND (other W next)
{code}

...it seems like the only real missing piece is some query side support for SOLR-2866, and it seems like that would best be tracked in SOLR-2866 right? ... make sure everything works all the way through the system?

> Support for additional query operators (feature parity request)
> ---------------------------------------------------------------
>
> Key: SOLR-3028
> URL: https://issues.apache.org/jira/browse/SOLR-3028
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 4.0
> Reporter: Mike
> Labels: operator, queryparser
> Original Estimate: 6h
> Remaining Estimate: 6h
>
> I'm migrating my system from Sphinx Search, and there are a couple of operators that are not available to Solr, which are available in Sphinx.
> I would love to see the following added to the Dismax parser:
> 1. Exact match. This might be tricky to get right, since it requires work on the index side as well[1], but in Sphinx, you can do a query such as [ =running walking ], and running will have stemming off, while walking will have it on.
> 2. Term quorum. In Sphinx and some commercial search engines (like Recommind, Westlaw and Lexis), you can do a search such as [ (cat dog goat)/15 ], and find the three words within 15 terms of each other. I think this is possible in the backend via the span query, but there's no front end option for it, so it's quite hard to reveal to users.
> 3. Word order. Being able to say, "this term before that one, and this other term before the next" is something else in Sphinx that span queries support, but is missing in the query parser. Would be great to get this in too.
> These seem like the three biggest missing operators in Solr to me. I would love to help move these forward if there is any way I can help.
> [1] At least, *I* think it does. There's some discussion of one way of doing exact match like support in SOLR-2866.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Feb 3, 2012, 12:25 PM

Post #2 of 4 (34 views)

[ https://issues.apache.org/jira/browse/SOLR-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200022#comment-13200022 ]

Mike commented on SOLR-3028:
----------------------------

Thanks for the response. I really do appreciate it.

#1) It's possible to implement a custom query parser to add my own operator that directs the user's query to separate fields (one stemmed, one not), but it would be better if built in for two reasons. One, I'm sure I'd do it incorrectly or inefficiently. Two, having two fields seems like a rather inefficient way of implementing exact match -- intuitively at least, having two nearly identical indexes seems very bad.

I'm also not sure SOLR-2866 is a good place for that discussion, since that issue is to implement non-stemmed search by using humongous synonym files. Is it worth opening a new issue for the index side of this feature?

#2) Sorry - I messed up in my description. I'm looking for *quorum search*, but I described *proximity search*. Quorum search is more like "of these five words, find documents that contain at least two of them." I suppose it's possible to do this with the mm parameter, but there's no operator available to users, right?

#3) Whoa, that's awesome! But I don't think I can ask users to write queries with squiggly brackets. Some kind of sane operator seems necessary to me.



jira at apache

Feb 3, 2012, 2:49 PM

Post #3 of 4 (34 views)

[ https://issues.apache.org/jira/browse/SOLR-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200119#comment-13200119 ]

Hoss Man commented on SOLR-3028:
--------------------------------

#1) maybe i'm misunderstanding SOLR-2866 ... it talks about synonyms, but the crux of it is really indexing multiple variants of a stemmed word with information about whether it is a stem or not, and then being able to query on both -- your request seems to heavily overlap with that -- in Victor's case he may be using a dictionary based stemmer, and in your case you may want a heuristic stemmer, but the underlying plumbing should probably all be the same.

#2) sorry, yeah i missed your label and only looked at the example. quorum search is definitely possible using the dismax parser with the mm param, but there is no explicit syntax for it in any parser i know of at the moment.
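
For reference, a minimal sketch of the mm approach: the request parameters below ask the dismax parser to match at least two of five words. The helper `quorum_params` and the field name `text` are hypothetical; `defType`, `q`, `qf` and `mm` are the standard dismax request parameters.

```python
from urllib.parse import urlencode


def quorum_params(words, minimum, field="text"):
    """Build dismax parameters requiring at least `minimum` of `words` to match.

    `field` is an assumed field name; substitute whatever the schema defines.
    """
    return {
        "defType": "dismax",
        "q": " ".join(words),
        "qf": field,
        "mm": str(minimum),  # minimum number of optional clauses that must match
    }


params = quorum_params(["cat", "dog", "goat", "horse", "pig"], 2)
print(urlencode(params))
```

Because mm is a request parameter rather than query syntax, the application has to set it on the user's behalf, which is the gap being discussed here.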

#3) the curly braces in that example were just me being explicit about which parser was in use via local params -- that's not the query syntax. you could just as easily do...

{code}
defType=surround&q=(this W that) AND (other W next)
{code}
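
When that request is sent over HTTP, the parentheses and spaces in the query need URL-encoding. A small sketch, assuming the query is passed as ordinary request parameters (the helper name `surround_query_string` is made up for illustration):

```python
from urllib.parse import parse_qs, urlencode


def surround_query_string(query):
    """Encode a surround-parser query as a URL query string."""
    return urlencode({"defType": "surround", "q": query})


encoded = surround_query_string("(this W that) AND (other W next)")
# Round-trip to confirm the operators survive encoding unchanged.
decoded = parse_qs(encoded)["q"][0]
print(encoded)
print(decoded)
```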

In general, my suggestion for moving forward would be to break these individual requests out into 3 distinct issues since they are largely unrelated (or only open two issues and ask about #1 in SOLR-2866 ... make an offshoot issue as needed).

individual issues with more direct issue summaries are easier to track and more likely to encourage patches from people who see the summaries and realize it's something they are interested in.



jira at apache

Feb 5, 2012, 6:43 PM

Post #4 of 4 (29 views)

[ https://issues.apache.org/jira/browse/SOLR-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201006#comment-13201006 ]

Mike commented on SOLR-3028:
----------------------------

Agreed - now that we're talking through three threads simultaneously, it seems obvious we need three tickets. This one can serve as a meta ticket, I suppose.

Therefore:
1. I split off *exact match* into SOLR-3099, and made a comment in SOLR-2866. I think they're different enough to warrant separate issues.
2. I split off *quorum search* into SOLR-3100.
3. I split off *word order* into SOLR-3101.

And I'll set the "depends on" flags here shortly, assuming I have the needed permissions. Thanks again for the guidance and help, Hoss.

