Mailing List Archive: Lucene: Java-Dev

[jira] [Commented] (SOLR-2368) Improve extended dismax (edismax) parser

 

 



jira at apache

Jan 25, 2012, 10:22 AM

Post #1 of 7

[ https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193175#comment-13193175 ]

Jan Høydahl commented on SOLR-2368:
-----------------------------------

I've started a Wiki page to document eDismax here: http://wiki.apache.org/solr/ExtendedDisMax - feel free to contribute!

Once we get the "userFields" feature SOLR-3026 in, are there any blockers left for retiring the old dismax parser?

> Improve extended dismax (edismax) parser
> ----------------------------------------
>
> Key: SOLR-2368
> URL: https://issues.apache.org/jira/browse/SOLR-2368
> Project: Solr
> Issue Type: Improvement
> Components: search
> Reporter: Yonik Seeley
> Labels: QueryParser
>
> This is a "mother" issue to track further improvements for the eDismax parser.
> The goal is to be able to deprecate and remove the old dismax once edismax satisfies all use cases of dismax.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Jan 26, 2012, 4:33 AM

Post #2 of 7

[ https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193772#comment-13193772 ]

Okke Klein commented on SOLR-2368:
----------------------------------

Was the feature {quote}advanced stopword handling... stopwords are not required in the mandatory part of the query but are still used (if indexed) in the proximity boosting part. If a query consists of all stopwords (e.g. to be or not to be) then all will be required.{quote} from https://issues.apache.org/jira/browse/SOLR-1553 ever implemented? If not, can this feature be added?



jira at apache

Feb 1, 2012, 3:53 PM

Post #3 of 7

[ https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198340#comment-13198340 ]

Hoss Man commented on SOLR-2368:
--------------------------------

bq. are there any blockers left for retiring the old dismax parser?

As i've mentioned before, I don't think DismaxQParser should ever be retired ... i'm still not convinced that the (default) parser you get when using "defType=dismax" should change to an ExtendedDismaxQParser instance.

My three main reasons for (still) feeling this way are:
* I see no advantage to changing what QParser you get (by default) when asking for "dismax" ... not when it's so easy for new users (or old users who want to switch) to just use "edismax" by name. (or explicitly declare their own instance of ExtendedDismaxQParser with the name "dismax" if that's what they always want)
* ExtendedDismaxQParser is a significantly more complex beast than DismaxQParser, and likely to have a lot of little quirks (and bugs) that no one has really noticed yet. For people who are happy with DismaxQParser, we should leave well enough alone.
* Even with things like SOLR-3026 allowing you to disable field specific queries, ExtendedDismaxQParser still supports more types of queries/syntax than DismaxQParser (ie: fuzzy queries, prefix queries, wildcard queries, etc...) which may have performance impacts on existing dismax users, many of whom probably don't want to start allowing them from their users -- particularly considering that limited syntax w/o metacharacters was a major advertised advantage of using dismax from day 1.
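
For the last option in the first point, a minimal sketch of such a declaration in solrconfig.xml might look like the following. It reuses the plugin class name quoted elsewhere in this thread and assumes that a parser declared under a built-in name takes precedence over the built-in registration, which should be checked against the Solr version in use.

{code:xml}
<!-- Sketch only: make defType=dismax resolve to the extended parser for this core -->
<queryParser name="dismax" class="solr.ExtendedDismaxQParserPlugin"/>
{code}

Users who do not add such a declaration would keep getting the original DismaxQParser when they ask for "dismax" by name.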

Please note: i have no tangible objection to smiley's suggestion that...

bq. defType should default to ... [edismax] in Solr 4

...if folks think that the ExtendedDismaxQParser would make a better default than the LuceneQParser moving forward, i've got no objection to that -- but if someone explicitly asks for "defType=dismax" by name, that should be the DismaxQParser (and its limited syntax) ... ExtendedDismaxQParser is a completely different animal.

saying defType=dismax should return an ExtendedDismaxQParser makes as much sense to me as saying that defType=lucene should return an ExtendedDismaxQParser -- just because the legal syntax of edismax is a superset of dismax/lucene doesn't mean they are equivalent or that we should assume "it's better" for people who ask for a specific QParser by name.



jira at apache

Feb 1, 2012, 5:43 PM

Post #4 of 7

[ https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198448#comment-13198448 ]

Jan Høydahl commented on SOLR-2368:
-----------------------------------

Regarding old DisMax, do all bugs being detected in eDismax also get fixed in DisMax if applicable? I'm not sure.
If/when eDismax can be configured to fill the original role of DisMax, why should we maintain the old one?
And if edismax had been the one written first, it would have had the name "dismax", so why could it not get that name the day it supersedes dismax in features and usage?

It's a bit analogous to IntField/TrieIntField - the naming hints at implementation details to distinguish it from other implementations, but if TrieIntField had been developed in one go and not committed incrementally, it could simply have replaced IntField. eDisMax is @lucene.experimental and when it is up to par with dismax all over, it should in my opinion take over its name with a fat notice in CHANGES.txt. For a few versions edismax could be a valid alias too, and the old dismax could be kept around as "legacydismax" for the conservative/lazy. How would you like to have to relate to EnhancedCsvUpdateRequestHandler at /solr/update/ecsv just because the original is less complex? :-)

Created SOLR-3086 to let users lobotomize edismax so it will only accept the query syntaxes they choose.




jira at apache

Feb 2, 2012, 5:25 AM

Post #5 of 7

[ https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198773#comment-13198773 ]

Jan Høydahl commented on SOLR-2368:
-----------------------------------

Personally I don't think we should worry about the added features after edismax becomes dismax. If people don't read the release notes when upgrading, they cannot complain later: "No one told me that fuzzy queries were allowed by dismax". Especially if we provide a way to turn it off. In the worst case I can commit to changing the config defaults for edismax to resemble dismax behaviour, i.e. uf=-*&us.all=false (see SOLR-3086).
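
As a rough sketch, locked-down defaults of that kind could be expressed on a request handler in solrconfig.xml along these lines. This is illustrative only: the handler name "/select" is an assumption, and only uf is shown because us.all is still just a proposal in SOLR-3086.

{code:xml}
<!-- Sketch only: edismax with all user field queries disabled by default -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- "-*" disallows every field for explicit field:value syntax in the user query -->
    <str name="uf">-*</str>
  </lst>
</requestHandler>
{code}

Because these are defaults rather than invariants, a client could still override them per request by passing its own uf parameter.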

Is anyone out there using "dismax" over "edismax" for reasons other than the ones already mentioned?



jira at apache

Feb 3, 2012, 1:50 PM

Post #6 of 7

[ https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200079#comment-13200079 ]

Hoss Man commented on SOLR-2368:
--------------------------------

bq. If/when eDismax can be configured to fill the original role of DisMax, why should we maintain the old one?

my chief concerns -- as i mentioned -- are that _currently_ edismax has behavior dismax doesn't support that people may actively *not* want, and that edismax may have quirks dismax doesn't that we have yet to discover and don't realize because the overall test coverage is low, the EDisMaxQParser is so much more complex, and there are so many weird edge cases.

But sure: if SOLR-3086 makes it possible to configure EDisMaxQParser to behave the same as DisMaxQParser, and if we feel confident through testing that (when configured as such) they behave the same, i won't have any objections whatsoever to retiring the DisMaxQParser class to simplify code maintenance.

bq. Personally I don't think we should worry about the added features after edismax becomes dismax.

this part i don't understand ... even if all of the functionality ultimately merges and only the EDisMaxQparser remains, why should defType=dismax and defType=edismax suddenly become the same thing? why not offer two instances by default, "edismax" which is open and everything defaults to on, and "dismax" where it's more locked down like it is today? ... what is gained by changing the default behavior when people use "defType=dismax"?

(as i said before (in a slightly different way above): would you suggest that defType=lucene should now be an EDisMaxQparser instance as well? with a CHANGES.txt note telling people that if they only want features LuceneQParser supported, they have to add invariant params to disable them????)




jira at apache

Feb 3, 2012, 4:31 PM

Post #7 of 7

[ https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200199#comment-13200199 ]

Jan Høydahl commented on SOLR-2368:
-----------------------------------

Are you saying it would be possible to define something like this in solrconfig.xml?

{code:xml}
<queryParser name="dismax" class="solr.ExtendedDismaxQParserPlugin">
  <lst name="defaults">
    <str name="uf">-*</str>
  </lst>
</queryParser>
{code}

And when someone says {{&defType=dismax}} it uses those defaults?

If so, that is simply a brilliant way to do it.

