Mailing List Archive: Lucene: Java-User

Analyzer for stripping non alpha-numeric characters?

 

 



jason.rutherglen at gmail

Feb 4, 2010, 9:18 AM

Post #1 of 4
Analyzer for stripping non alpha-numeric characters?

Is there an analyzer that easily strips non-alphanumeric characters
from the end of a token?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


sarowe at syr

Feb 4, 2010, 12:18 PM

Post #2 of 4
RE: Analyzer for stripping non alpha-numeric characters?

Hi Jason,

Solr's PatternReplaceFilter(ts, Pattern.compile("\\P{Alnum}+$"), "", false) should work, chained after an appropriate tokenizer.

Steve
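
To see what the suggested regex does on its own, here is a minimal sketch in plain Java with no Lucene or Solr dependency. The class and method names are hypothetical; only the pattern comes from the post above. It assumes the final `false` argument in the suggestion means "replace the first match only", which is what `replaceFirst` models here.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself is taken from the thread.
class TrailingNonAlnumStripper {
    // \P{Alnum} matches one character that is NOT an ASCII letter or digit;
    // "+$" requires the run of such characters to reach the end of the token.
    private static final Pattern TRAILING = Pattern.compile("\\P{Alnum}+$");

    // Removes the trailing run of non-alphanumeric characters, if any.
    static String strip(String token) {
        return TRAILING.matcher(token).replaceFirst("");
    }

    public static void main(String[] args) {
        for (String t : new String[] {"apple!&*", "foo-bar.", "42", "!!!"}) {
            System.out.println("[" + t + "] -> [" + strip(t) + "]");
        }
    }
}
```

Two things to keep in mind: a token made up entirely of punctuation becomes an empty string, and `\p{Alnum}` is ASCII-only in Java by default, so a trailing accented letter would be stripped as well unless a Unicode-aware class is used instead.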



jason.rutherglen at gmail

Feb 4, 2010, 2:16 PM

Post #3 of 4
Re: Analyzer for stripping non alpha-numeric characters?

Transferred partially to solr-user...

Steven, thanks for the reply!

I wonder if PatternReplaceFilter can output multiple tokens? I'd like
to progressively strip the non-alphanums, for example output:

apple!&*
apple!&
apple!
apple
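
The progressive stripping described above can be sketched for a single token in plain Java. This is only an illustration of the desired output, not an existing Lucene or Solr class: `ProgressiveStripper`, `variants` and `isAsciiAlnum` are hypothetical names, and "alphanumeric" is assumed to mean ASCII letters and digits, matching `\p{Alnum}`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists a token followed by each shorter form obtained
// by dropping one trailing non-alphanumeric character at a time.
class ProgressiveStripper {
    static boolean isAsciiAlnum(int cp) {
        return (cp >= '0' && cp <= '9')
            || (cp >= 'A' && cp <= 'Z')
            || (cp >= 'a' && cp <= 'z');
    }

    static List<String> variants(String token) {
        List<String> out = new ArrayList<>();
        out.add(token);                          // original form comes first
        int end = token.length();
        while (end > 0) {
            int cp = token.codePointBefore(end); // last code point still kept
            if (isAsciiAlnum(cp)) {
                break;                           // reached the alphanumeric core
            }
            end -= Character.charCount(cp);      // drop that one character
            if (end > 0) {
                out.add(token.substring(0, end)); // never emit an empty variant
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String v : variants("apple!&*")) {
            System.out.println(v);
        }
    }
}
```

For `apple!&*` this yields the same four forms as the list in the post, in the same order.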



jason.rutherglen at gmail

Feb 4, 2010, 3:00 PM

Post #4 of 4
Re: Analyzer for stripping non alpha-numeric characters?

Answering my own question... PatternReplaceFilter doesn't output
multiple tokens...

Which means messing with capture state...
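
A filter that emits several tokens for one input token has to remember what is still owed before it pulls the next token from upstream. The sketch below models that buffering in plain Java, without Lucene, so it runs on its own. Every name in it (`StackedTokenModel`, `Tok`, `ProgressiveFilter`) is hypothetical. In a real Lucene TokenFilter the saved copy of the current token's attributes would presumably come from `captureState()` and be replayed with `restoreState()`, with the extra tokens given a position increment of 0 so they stack on the original; here a simple queue of strings stands in for that state.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;

// Lucene-free model of a "one token in, several tokens out" filter.
class StackedTokenModel {
    // A term plus its position increment (1 = new position, 0 = stacked).
    static final class Tok {
        final String term;
        final int posInc;
        Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    static final class ProgressiveFilter implements Iterator<Tok> {
        private final Iterator<String> input;
        // Variants still owed for the most recent input token.
        private final Deque<String> pending = new ArrayDeque<>();

        ProgressiveFilter(Iterator<String> input) { this.input = input; }

        private static boolean isAsciiAlnum(char c) {
            return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        }

        @Override public boolean hasNext() {
            return !pending.isEmpty() || input.hasNext();
        }

        @Override public Tok next() {
            if (!pending.isEmpty()) {
                return new Tok(pending.poll(), 0);   // replay a buffered variant
            }
            String term = input.next();
            int end = term.length();
            // Queue each shorter form; stop before the term would become empty.
            while (end > 1 && !isAsciiAlnum(term.charAt(end - 1))) {
                end--;
                pending.add(term.substring(0, end));
            }
            return new Tok(term, 1);                 // original token advances position
        }
    }

    public static void main(String[] args) {
        Iterator<Tok> it = new ProgressiveFilter(Arrays.asList("apple!&*", "pie").iterator());
        while (it.hasNext()) {
            Tok t = it.next();
            System.out.println(t.term + " (posInc=" + t.posInc + ")");
        }
    }
}
```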
