
Mailing List Archive: Lucene: Java-Dev

[jira] [Updated] (SOLR-3439) Add "content" field to example schema to make SolrCell easier to use out of the box


jira at apache

May 5, 2012, 4:03 PM

Post #1 of 3
[jira] [Updated] (SOLR-3439) Add "content" field to example schema to make SolrCell easier to use out of the box

[ https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-3439:
------------------------------

Fix Version/s: 4.0

> Add "content" field to example schema to make SolrCell easier to use out of the box
> -----------------------------------------------------------------------------------
>
> Key: SOLR-3439
> URL: https://issues.apache.org/jira/browse/SOLR-3439
> Project: Solr
> Issue Type: Improvement
> Components: contrib - Solr Cell (Tika extraction), Schema and Analysis
> Reporter: Jack Krupansky
> Priority: Minor
> Fix For: 4.0
>
>
> Currently, SolrCell is configured to map Tika "content" (the main body of a document) to the "text" field, which is the indexed-only (not stored) catch-all field for default queries. Searching works fine, but the document content is not shown in the results, which sometimes leads users to think that something is wrong. Sure, the user can easily add such a field (and this is documented), but it would be a better user experience to have such a basic feature work right out of the box, without any config editing and without the user needing to read the fine print in the documentation.
> I propose that we add a "content" field to the example schema, in the section of fields already defined to support SolrCell metadata. It would be both stored and indexed.
> I further propose adding copyField directives that copy the "title", "description" and "content" fields (and maybe a couple of others) into the "text" field for searching. Again, the goal is to improve the out-of-the-box user experience. It also simplifies testing, since less setup is needed.
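To make the proposal concrete, here is a toy Java model (not Solr code) of what the changed example schema would do at index time: "content" is stored, and copyField rules copy "title", "description" and "content" into the indexed-only "text" field. The class and method names are hypothetical; the real change is a schema.xml edit, and Solr's own copyField handling is more involved than this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (hypothetical, not Solr code) of the proposed example schema:
 * "content" is stored and indexed, and copyField rules copy "title",
 * "description" and "content" into the indexed-only catch-all "text" field.
 */
public class CopyFieldSketch {

    // Fields whose values would come back in search results (stored="true").
    // Assumption: "text" is indexed-only, as the issue describes.
    static final Set<String> STORED = Set.of("id", "title", "description", "content");

    // Proposed copyField rules: source field -> destination field.
    static final Map<String, String> COPY_FIELDS = Map.of(
            "title", "text",
            "description", "text",
            "content", "text");

    /** Applies the copyField rules, returning field -> values to be indexed. */
    static Map<String, List<String>> applyCopyFields(Map<String, String> doc) {
        Map<String, List<String>> indexed = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : doc.entrySet()) {
            // The source field is indexed under its own name...
            indexed.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            // ...and, if a rule exists, its value is also appended to the destination.
            String dest = COPY_FIELDS.get(e.getKey());
            if (dest != null) {
                indexed.computeIfAbsent(dest, k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return indexed;
    }

    /** Keeps only stored fields, i.e. what a query result could display. */
    static Map<String, String> storedView(Map<String, String> doc) {
        Map<String, String> result = new LinkedHashMap<>();
        doc.forEach((k, v) -> {
            if (STORED.contains(k)) {
                result.put(k, v);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample values, loosely modeled on the attached test documents.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "doc1");
        doc.put("title", "Gettysburg Address");
        doc.put("content", "Four score and seven years ago...");
        System.out.println("indexed: " + applyCopyFields(doc));
        System.out.println("stored:  " + storedView(doc));
    }
}
```

Running main shows "text" receiving both the title and the body for searching, while only the stored view (which includes "content" but never "text") would be returned in results. That split between searchable and displayable fields is the gap the issue describes.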

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

May 6, 2012, 8:31 AM

Post #2 of 3
[jira] [Updated] (SOLR-3439) Add "content" field to example schema to make SolrCell easier to use out of the box [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jack Krupansky updated SOLR-3439:
---------------------------------

Attachment: Lincoln-Gettysburg-Address.pdf
Lincoln-Gettysburg-Address.docx

Test documents for SolrCell. Both have a bunch of metadata fields defined. The PDF was generated from the Word doc.

We can consider them for inclusion in exampledocs, but for now they are posted here for reference and anybody wanting to test this issue.
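For anyone wanting to try these attachments against SolrCell, the following is a minimal Java 11+ sketch that streams one file to the extracting request handler. It assumes the stock example server at http://localhost:8983/solr with the handler at /update/extract; the class name ExtractPost and the choice of the file name as the document id are hypothetical. literal.id and commit are standard ExtractingRequestHandler / update parameters.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: posts a single file to the example's SolrCell handler. */
public class ExtractPost {

    // Assumption: the stock example server on its default port and path.
    static final String SOLR_BASE = "http://localhost:8983/solr";

    /** Builds a POST that sends the raw file bytes to /update/extract. */
    static HttpRequest buildExtractRequest(String id, byte[] body, String contentType) {
        String uri = SOLR_BASE + "/update/extract?literal.id="
                + URLEncoder.encode(id, StandardCharsets.UTF_8)
                + "&commit=true";
        return HttpRequest.newBuilder(URI.create(uri))
                .header("Content-Type", contentType)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: ExtractPost <file> <content-type>");
            return;
        }
        Path file = Path.of(args[0]);
        // Use the bare file name as the document id (an illustrative choice).
        HttpRequest req = buildExtractRequest(
                file.getFileName().toString(), Files.readAllBytes(file), args[1]);
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```

For example, run `java ExtractPost.java Lincoln-Gettysburg-Address.pdf application/pdf` (or the .docx with `application/vnd.openxmlformats-officedocument.wordprocessingml.document`), then query with `q=*:*&fl=id,content`. With the example schema as described in the issue, the body text is searchable via "text" but not returned; with the proposed stored "content" field it should appear in the results.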




jira at apache

May 6, 2012, 9:13 AM

Post #3 of 3
[jira] [Updated] (SOLR-3439) Add "content" field to example schema to make SolrCell easier to use out of the box [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jack Krupansky updated SOLR-3439:
---------------------------------

Attachment: SOLR-3439.patch

Preliminary patch. "content" is both stored and indexed, with multiple copyField directives.

