
Mailing List Archive: Lucene: Java-Dev

[jira] [Commented] (SOLR-3442) Example schema switch to DisMax instead of CopyField

 

 



jira at apache

May 7, 2012, 3:15 AM

Post #1 of 8 (188 views)

[ https://issues.apache.org/jira/browse/SOLR-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269489#comment-13269489 ]

Chris Male commented on SOLR-3442:
----------------------------------

I think it's a pretty bold claim to call it an anti-pattern. I've seen it successfully used in many projects and it continues to fulfill user needs.

> Example schema switch to DisMax instead of CopyField
> ----------------------------------------------------
>
> Key: SOLR-3442
> URL: https://issues.apache.org/jira/browse/SOLR-3442
> Project: Solr
> Issue Type: Improvement
> Components: Schema and Analysis
> Reporter: Jan Høydahl
> Labels: dismax
>
> Spinoff from SOLR-3439:
> The use of copyField in today's example schema is an anti-pattern since we indirectly teach people to duplicate most of their content, while most would be better off using DisMax, or at least a combination.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

May 7, 2012, 3:47 AM

Post #2 of 8 (182 views)

[ https://issues.apache.org/jira/browse/SOLR-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269509#comment-13269509 ]

Jan Høydahl commented on SOLR-3442:
-----------------------------------

Sure, I've seen it successfully used too, and I use it myself now and then to reduce the number of fields required in "qf".

For very small indexes without much need for tuning analysis or relevancy it does not matter very much. But I'm arguing that copyField is the legacy way of searching multiple fields in one go, while DisMax is the current recommendation. So why stick to the legacy in the default example?
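
A minimal sketch of the two request styles contrasted above, not taken from the thread. The endpoint URL, the field names (name, features, cat) and the boost values are assumptions for illustration; q, df, defType and qf are standard Solr request parameters. The first function searches one catch-all field that copyField filled at index time, while the second lets DisMax spread the query over several fields at query time.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust to the local Solr install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def catch_all_params(user_text):
    """Search the single catch-all field that copyField populated at index time."""
    return {"q": user_text, "defType": "lucene", "df": "text"}


def dismax_params(user_text, field_boosts):
    """Search several fields at query time; qf lists each field with its boost."""
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"q": user_text, "defType": "dismax", "qf": qf}


def to_url(params):
    """Render the parameters as a select URL against the assumed endpoint."""
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(to_url(catch_all_params("ipod")))
    # Field names and boosts below are illustrative only.
    print(to_url(dismax_params("ipod", {"name": 2.0, "features": 1.0, "cat": 0.5})))
```

Changing which fields are searched, or how they are weighted, only touches the qf string in the second request; the first depends on what was copied into "text" when the documents were indexed.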



jira at apache

May 7, 2012, 5:58 AM

Post #3 of 8 (177 views)

[ https://issues.apache.org/jira/browse/SOLR-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269584#comment-13269584 ]

Jack Krupansky commented on SOLR-3442:
--------------------------------------

Maybe Solr has outgrown the concept of a single example schema/config. "Full function" and "maximal performance" conflict to some degree, and picking one arbitrary point on the design spectrum does a disservice to those who have varying requirements. The current example already has performance tips and a warning not to use it for benchmarking. And handling SolrCell documents, which share "core" common metadata, is somewhat distinct from full-custom schema design.

The copyField to "text" pattern is more clearly targeted at non-dismax users, where "text" is the single default search field.

This issue essentially raises the question: Is non-dismax query parsing dead? If not, the copyField/text pattern still seems relevant.

Maybe it would be worth having a modest library of schema/config files that the user can select from when running "example". OTOH, maintaining a lot of somewhat similar files can be a pain. A way to configure the schema/config files (conditionals) would be helpful.




jira at apache

May 7, 2012, 6:32 AM

Post #4 of 8 (181 views)

[ https://issues.apache.org/jira/browse/SOLR-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269603#comment-13269603 ]

Jan Høydahl commented on SOLR-3442:
-----------------------------------

I'm not saying anything is "dead". Both the "lucene" query parser and copyField have their place and are supported, and you can mix and match these with DisMax to fit your needs. But for the example we should select the most useful and flexible way to show indexing and search, and that is no longer the "text" catch-all and copyField. Aside from doubling the size of your index, it is inflexible in that adding or removing a field from search means a schema update and re-indexing. Catch-all fields with copyField can sometimes be used as a performance optimization, but that is not where you start.
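
A toy model of the duplication described above, under stated assumptions: the document, its field names and the whitespace tokenizer are all hypothetical, and token count is only a crude stand-in for index size. Real index growth depends on analysis and on stored versus indexed settings, so this is a sketch of the effect rather than a measurement.

```python
def apply_copy_fields(doc, sources, dest="text"):
    """Mimic copyField: gather the source fields' values into a catch-all field."""
    indexed = dict(doc)
    indexed[dest] = " ".join(doc[field] for field in sources if field in doc)
    return indexed


def token_count(doc):
    """Count whitespace-separated tokens across all fields (a crude size proxy)."""
    return sum(len(value.split()) for value in doc.values())


# Hypothetical product document, loosely in the spirit of the example data.
doc = {
    "name": "Apple iPod 60 GB",
    "features": "stores up to 15000 songs",
    "cat": "electronics music",
}

plain = token_count(doc)
with_catch_all = token_count(apply_copy_fields(doc, ["name", "features", "cat"]))
print(plain, with_catch_all)
```

When every searchable field is copied, the catch-all carries a second copy of each token, which is where the doubling comes from. Dropping a field from the catch-all means changing the sources list and rebuilding every indexed document, whereas with DisMax the equivalent change is an edit to the qf request parameter.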

Maintaining many examples has proven not to be a very good strategy; look at the multi-core and DIH examples, which lag several versions behind when it comes to schema version and new solrconfig syntax. Instead, a single schema which can handle both the product search and document search use cases well is easy to achieve. The Velocity GUI can be extended with two tabs if need be, one "products" tab and one "documents" tab. If we choose the example documents to index wisely, e.g. user guides for the products, we get a nice connection. You can search for "ipod" and see both products and user guides matching your search.



jira at apache

May 7, 2012, 8:14 AM

Post #5 of 8 (178 views)

[ https://issues.apache.org/jira/browse/SOLR-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269684#comment-13269684 ]

Jack Krupansky commented on SOLR-3442:
--------------------------------------

I don't disagree with the gist of your argument, but I would cringe a little if we changed the schema so that it no longer works very well when the user drops back to the lucene query parser with &defType=lucene, which has only a single default field.

OTOH, maybe that is simply the cost of making the example schema (and config) be more representative of "best practices". But, that sort of implies that the Lucene query parser is not a "best practice", at least when searchable text content is spread over multiple fields.




jira at apache

May 7, 2012, 8:46 AM

Post #6 of 8 (178 views)

[ https://issues.apache.org/jira/browse/SOLR-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269710#comment-13269710 ]

Yonik Seeley commented on SOLR-3442:
------------------------------------

> I would cringe a little if we change the schema so that it doesn't work very well if the user does drop back to the lucene query parser

The lucene query parser generally shouldn't be used for user queries, only programmatically generated ones. Using explicit field names (or specifying df) for that case should be fine.
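
A hedged sketch of what a programmatically generated query with explicit field names might look like. The helper names and the field names are hypothetical, and the escaping covers the characters commonly listed as special in the classic Lucene query syntax rather than following any particular Solr release; defType, q and df are standard Solr request parameters.

```python
import re
from urllib.parse import urlencode

# Characters treated as special by the classic Lucene query syntax (assumed list).
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_term(text):
    """Backslash-escape query-syntax characters in a literal value."""
    return _SPECIAL.sub(r"\\\1", text)


def fielded_query(clauses):
    """Join (field, value) pairs into an explicit-field query combined with AND."""
    return " AND ".join(
        f'{field}:"{escape_term(value)}"' for field, value in clauses
    )


params = {
    "defType": "lucene",
    # Every clause names its field, so no catch-all field is required.
    "q": fielded_query([("name", "ipod"), ("cat", "electronics")]),
    # Alternative to naming every field: set the default field per request.
    "df": "name",
}
print(urlencode(params))
```

Because the application builds the query string, it knows which fields exist and can name them directly, which is why a catch-all default field matters less in this case.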



jira at apache

May 7, 2012, 10:26 AM

Post #7 of 8 (183 views)

[ https://issues.apache.org/jira/browse/SOLR-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269794#comment-13269794 ]

Jack Krupansky commented on SOLR-3442:
--------------------------------------

> The lucene query parser generally shouldn't be used for user queries...

If that is the general sentiment, then having the default example *user* query parser be edismax makes perfect sense.




jira at apache

May 14, 2012, 3:30 PM

Post #8 of 8 (167 views)

[ https://issues.apache.org/jira/browse/SOLR-3442 ]

Jack Krupansky commented on SOLR-3442:
--------------------------------------

When I initially read this issue I mistakenly read it as edismax rather than dismax. So, I would request that the intent be crystal clear - is it reasonable to switch the default query parser handler to edismax, or is it being suggested that the more limited dismax query parser be the new default? If the latter, we won't even be able to query specific fields without config changes.
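
As a rough illustration of the distinction being drawn, the sketch below sends the same user-typed string with each parser. The helper and the field names are hypothetical; defType, qf and uf are Solr request parameters, with uf (the list of fields a user may name) applying to edismax only. As commonly documented, dismax does not interpret field:value syntax and matches the text against the qf fields, whereas edismax accepts it; this is worth verifying against the Solr version in use.

```python
def user_query_params(user_text, parser, qf="name features", user_fields=None):
    """Build request parameters for a user-typed query under dismax or edismax."""
    if parser not in ("dismax", "edismax"):
        raise ValueError(f"unsupported parser: {parser}")
    params = {"q": user_text, "defType": parser, "qf": qf}
    # uf is only meaningful for edismax; it limits which fields users may name.
    if parser == "edismax" and user_fields is not None:
        params["uf"] = " ".join(user_fields)
    return params


typed_by_user = "name:ipod"
for parser in ("dismax", "edismax"):
    print(parser, user_query_params(typed_by_user, parser, user_fields=["name", "cat"]))
```

The two requests differ only in defType (plus the optional uf), so the choice of default parser decides whether a fielded query typed by a user is honoured.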

Some of the discussion over on SOLR-2368 (https://issues.apache.org/jira/browse/SOLR-2368) might be relevant, as to whether the default query for example should be severely "locked-down" as opposed to highly functional (fields, Lucene syntax, etc.).

I was going to proceed with an edismax-based patch, but now I am not so sure.
