
Mailing List Archive: Lucene: Java-Dev

[jira] Created: (LUCENE-1377) Add HTMLStripReader and WordDelimiterFilter from SOLR

 

 



jira at apache

Sep 4, 2008, 11:47 AM

Post #1 of 2
[jira] Created: (LUCENE-1377) Add HTMLStripReader and WordDelimiterFilter from SOLR

Add HTMLStripReader and WordDelimiterFilter from SOLR
-----------------------------------------------------

Key: LUCENE-1377
URL: https://issues.apache.org/jira/browse/LUCENE-1377
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Affects Versions: 2.3.2
Reporter: Jason Rutherglen
Priority: Minor


SOLR has two classes HTMLStripReader and WordDelimiterFilter which are very useful for a wide variety of use cases. It would be good to place them into core Lucene.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


hossman_lucene at fucit

Sep 4, 2008, 10:15 PM

Post #2 of 2
Re: [jira] Created: (LUCENE-1377) Add HTMLStripReader and WordDelimiterFilter from SOLR

: SOLR has two classes HTMLStripReader and WordDelimiterFilter which are
: very useful for a wide variety of use cases. It would be good to place
: them into core Lucene.

FWIW: Just about every concrete TokenFilter and Tokenizer in Solr's code
base could and probably should be promoted up into Lucene-Java -- at the
very least into a contrib, if not into the "core".

A big reason why there hasn't been any movement to do this in many cases
is refactoring the test cases -- most Solr tests use the Solr TestHarness
to test things at a very high level, black-box style. Essentially all new
test cases would be needed.

(In other cases there are no test cases, but they were committed to Solr
anyway to scratch an itch.)

The best approach for dealing with things like this is probably to track
each individual piece that people want to promote in separate Jira issues
with separate patches ... that way, if someone does write good generalized
unit tests for WordDelimiterFilter but not HTMLStripReader (for example),
the issues remain detangled and one can be committed before the other.

(Smaller, more self-contained patches are a lot easier to review and
commit.)



-Hoss

