Mailing List Archive: Lucene: Java-Dev

[jira] [Updated] (SOLR-3707) Upgrade Solr to Tika 1.2



jira at apache

Aug 3, 2012, 10:55 AM

Post #1 of 4
[jira] [Updated] (SOLR-3707) Upgrade Solr to Tika 1.2

[ https://issues.apache.org/jira/browse/SOLR-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-3707:
------------------------------

Description: Tika 1.2 has been released with these improvements: http://tika.apache.org/1.2/index.html (was: Tika 1.1 is being released soon. It features some new parsers, ability to extract text from password protected PDFs and office docs, and several bug fixes. See http://people.apache.org/~mattmann/apache-tika-1.1/rc1/CHANGES-1.1.txt

We should upgrade as soon as it is released.)
Fix Version/s: 5.0, 4.0 (was: 4.0-ALPHA)

> Upgrade Solr to Tika 1.2
> ------------------------
>
> Key: SOLR-3707
> URL: https://issues.apache.org/jira/browse/SOLR-3707
> Project: Solr
> Issue Type: Improvement
> Components: contrib - LangId, contrib - Solr Cell (Tika extraction)
> Reporter: Jan Høydahl
> Assignee: Jan Høydahl
> Fix For: 4.0, 5.0
>
>
> Tika 1.2 has been released with these improvements: http://tika.apache.org/1.2/index.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Aug 3, 2012, 5:11 PM

Post #2 of 4
[jira] [Updated] (SOLR-3707) Upgrade Solr to Tika 1.2 [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-3707:
------------------------------

Attachment: SOLR-3707.patch

Patch for trunk that upgrades to Tika 1.2. Two new JARs are included:
* xz-1.0.jar for more compression formats
* juniversalchardet-1.0.3.jar for the new charset detection
Two unused JARs have also been removed:
* scannotation-1.0.2.jar
* javassist-3.6.0.GA.jar
Tests pass after updating some of them to ignore the extra metadata fields that the enhanced metadata parser in Tika 1.2 now extracts.
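
One way to make a test tolerate extra metadata fields is to assert only on the fields it cares about, rather than on the full set the parser returns. The sketch below is illustrative only and is not taken from the patch: the helper name `missing_or_changed` and the field names are hypothetical, and the actual Solr tests are written in Java.

```python
def missing_or_changed(expected, actual):
    """Return the expected fields that are absent from, or differ in, `actual`.

    Fields present only in `actual` are ignored, so a parser that starts
    emitting additional metadata does not break the check.
    """
    return {
        key: (value, actual.get(key))
        for key, value in expected.items()
        if key not in actual or actual[key] != value
    }


# Hypothetical metadata: the newer parser returns two extra fields.
expected = {"title": "Sample", "author": "Jane"}
parsed = {
    "title": "Sample",
    "author": "Jane",
    "extra_field_a": "x",  # hypothetical extra field
    "extra_field_b": "y",  # hypothetical extra field
}

# The check passes because only the expected fields are compared.
assert missing_or_changed(expected, parsed) == {}
```

An exact-equality comparison of `expected` and `parsed` would fail here, which is the kind of breakage a parser upgrade can cause in tests that compare the full metadata set.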

> Upgrade Solr to Tika 1.2
> ------------------------
>
> Key: SOLR-3707
> URL: https://issues.apache.org/jira/browse/SOLR-3707
> Project: Solr
> Issue Type: Improvement
> Components: contrib - LangId, contrib - Solr Cell (Tika extraction)
> Reporter: Jan Høydahl
> Assignee: Jan Høydahl
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-3707.patch
>
>
> Tika 1.2 has been released with these improvements: http://tika.apache.org/1.2/index.html



jira at apache

Aug 4, 2012, 3:51 PM

Post #3 of 4
[jira] [Updated] (SOLR-3707) Upgrade Solr to Tika 1.2 [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-3707:
------------------------------

Attachment: SOLR-3707.patch

Patch with updated classpath for Eclipse.

Anything else needed before commit?

> Upgrade Solr to Tika 1.2
> ------------------------
>
> Key: SOLR-3707
> URL: https://issues.apache.org/jira/browse/SOLR-3707
> Project: Solr
> Issue Type: Improvement
> Components: contrib - LangId, contrib - Solr Cell (Tika extraction)
> Reporter: Jan Høydahl
> Assignee: Jan Høydahl
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-3707.patch, SOLR-3707.patch
>
>
> Tika 1.2 has been released with these improvements: http://tika.apache.org/1.2/index.html



jira at apache

Aug 8, 2012, 7:39 AM

Post #4 of 4
[jira] [Updated] (SOLR-3707) Upgrade Solr to Tika 1.2 [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-3707:
------------------------------

Fix Version/s: 5.0

> Upgrade Solr to Tika 1.2
> ------------------------
>
> Key: SOLR-3707
> URL: https://issues.apache.org/jira/browse/SOLR-3707
> Project: Solr
> Issue Type: Improvement
> Components: contrib - LangId, contrib - Solr Cell (Tika extraction)
> Reporter: Jan Høydahl
> Assignee: Jan Høydahl
> Fix For: 5.0, 4.0
>
> Attachments: SOLR-3707.patch, SOLR-3707.patch
>
>
> Tika 1.2 has been released with these improvements: http://tika.apache.org/1.2/index.html

