Mailing List Archive: Lucene: Java-User

Recommend an example of implementing an analyzer that parses CamelCase

 

 



thienthanhomenh at gmail

Feb 7, 2010, 7:36 AM

Post #1 of 5 (1091 views)
Recommend an example of implementing an analyzer that parses CamelCase

Could you suggest an example of implementing an analyzer that splits
CamelCase words?

I can build an analyzer from the existing StopFilter, PorterStemFilter and
LowerCaseTokenizer, but I have no solution for a new filter that differs
from the available ones.
Thank you!


iorixxx at yahoo

Feb 7, 2010, 8:11 AM

Post #2 of 5 (1070 views)
Re: Recommend an example of implementing an analyzer that parses CamelCase


You can use WordDelimiterFilterFactory[1] with splitOnCaseChange="1"

[1]http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

You need to get it from the Solr artifacts, since it is not part of the Lucene jars.
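
To make the effect of splitOnCaseChange concrete without pulling in Solr, here is a minimal plain-Java sketch of splitting a word at lowercase-to-uppercase transitions. It is only an illustration: the class CamelCaseSplitSketch and its split method are hypothetical names, not part of Lucene or Solr, and the regular expression is a stand-in for the idea, not the code WordDelimiterFilter actually runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene or Solr.
class CamelCaseSplitSketch {

    // Zero-width boundary: a lowercase letter on the left, an uppercase letter on the right.
    private static final Pattern CASE_CHANGE =
            Pattern.compile("(?<=\\p{Ll})(?=\\p{Lu})");

    // Splits a single word at every lowercase-to-uppercase transition.
    static List<String> split(String word) {
        List<String> parts = new ArrayList<String>();
        for (String part : CASE_CHANGE.split(word)) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("parsingCamelCase")); // prints [parsing, Camel, Case]
    }
}
```

A run of capitals such as "XMLParser" is left whole by this sketch, because it contains no lowercase letter followed by an uppercase one.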




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


thienthanhomenh at gmail

Feb 7, 2010, 8:18 AM

Post #3 of 5 (1070 views)
Re: Recommend an example of implementing an analyzer that parses CamelCase

Hi Ahmet,
I have heard of WordDelimiterFilterFactory, but I have never used Solr.
How do I download this class?
Can I use it in Lucene 3.0, or should I extend Analyzer and override its
methods?
Sorry if my questions are too detailed.


On Mon, Feb 8, 2010 at 1:11 AM, Ahmet Arslan <iorixxx [at] yahoo> wrote:

> You can use WordDelimiterFilterFactory[1] with splitOnCaseChange="1"
>
> [1]
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
>
> You need to consume it from solr artifacts.


iorixxx at yahoo

Feb 7, 2010, 8:37 AM

Post #4 of 5 (1066 views)
Re: Recommend an example of implementing an analyzer that parses CamelCase

> Hi Ahmet,
> I have heard of WordDelimiterFilterFactory, but I have never used
> Solr.
> How do I download this class?

http://repo1.maven.org/maven2/org/apache/solr/solr-core/1.4.0/

> Can I use it in Lucene 3.0, or should I extend Analyzer and
> override its
> methods?

It does not use the new TokenStream API yet, but you can still use it. WordDelimiterFilter is package-private, but you can use its factory as follows:

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.analysis.WordDelimiterFilterFactory;

// Factory arguments are passed as strings; "1" enables an option, "0" disables it.
Map<String, String> delimiterArgs = new HashMap<String, String>(9);

delimiterArgs.put("generateWordParts", "1");
delimiterArgs.put("generateNumberParts", "0");
delimiterArgs.put("catenateWords", "0");
delimiterArgs.put("catenateNumbers", "0");
delimiterArgs.put("catenateAll", "0");
// "1" splits on case changes, which is what CamelCase parsing needs.
delimiterArgs.put("splitOnCaseChange", "1");
delimiterArgs.put("splitOnNumerics", "1");
delimiterArgs.put("preserveOriginal", "1");
delimiterArgs.put("stemEnglishPossessive", "0");

WordDelimiterFilterFactory wordDelimiterFactory = new WordDelimiterFilterFactory();

wordDelimiterFactory.init(delimiterArgs);

You can append it to your analyzer chain:

result = wordDelimiterFactory.create(result);

The parameters are explained in the wiki.







thienthanhomenh at gmail

Feb 7, 2010, 9:16 AM

Post #5 of 5 (1062 views)
Re: Recommend an example of implementing an analyzer that parses CamelCase

That is much more detailed.
Thank you very much!

