
Mailing List Archive: Lucene: Java-User

Is creating an analyzer expensive?

 

 



dseltzer at tveyes

Jul 12, 2012, 1:13 PM

Post #1 of 4 (456 views)
Permalink
Is creating an analyzer expensive?

I have one more question to pose to the group today:

I have several thousand searches being performed against MemoryIndexes on
a regular basis.

I'd like the ability for each search to choose its own Analyzer, such
that some queries could use a regex pattern, other queries could use the
Standard Analyzer.

Does anyone know how expensive it is to create an Analyzer? Am I better
off creating analyzers as needed, or should I store them once and re-use
them?

Thanks very much!

-Dave

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


simon.willnauer at gmail

Jul 12, 2012, 3:22 PM

Post #2 of 4 (443 views)
Permalink
Re: Is creating an analyzer expensive? [In reply to]

You can safely reuse a single analyzer across threads. The Analyzer
class maintains ThreadLocal storage for TokenStreams internally so you
can just create the analyzer once and use it throughout your
application.
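
One way to "create once and reuse" while still letting each search pick its own analyzer is a small registry keyed by name. The sketch below is plain Java with no Lucene dependency: the type parameter A stands in for org.apache.lucene.analysis.Analyzer, and the class name AnalyzerRegistry and the keys are hypothetical, not part of Lucene.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not a Lucene class): builds each analyzer at most once
 * per key and hands out the same shared instance afterwards.
 * The type parameter A is a stand-in for Lucene's Analyzer.
 */
final class AnalyzerRegistry<A> {
    private final Map<String, A> cache = new ConcurrentHashMap<>();

    /** Returns the instance stored under key, creating it with factory on first use. */
    A get(String key, Supplier<A> factory) {
        // computeIfAbsent on ConcurrentHashMap is atomic, so concurrent callers
        // asking for the same key all receive the same instance.
        return cache.computeIfAbsent(key, k -> factory.get());
    }

    /** Number of distinct analyzers created so far. */
    int size() {
        return cache.size();
    }
}
```

With Lucene on the classpath, a call might look like registry.get("standard", ...) with a factory that constructs the analyzer you want; the constructor arguments depend on the Lucene version, so they are left out here.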

simon




rrs at rand

Aug 17, 2012, 9:38 AM

Post #3 of 4 (399 views)
Permalink
Re: Is creating an analyzer expensive? [In reply to]

Hi Simon,

I'm trying to reuse a custom analyzer and it's not working unless I manually
call reset() on the TokenStream. Basically the analyzer will work on the
first string, but completely fails on any string after that. The weird part is
that this is only necessary when using the SynonymFilter.

I wrote a short piece of code that shows what's going on. Let me know if you
want me to post the code. But it's more likely that I don't understand what
I'm doing - I am new to Lucene.

Thank you,

-ricardo






uwe at thetaphi

Aug 17, 2012, 2:39 PM

Post #4 of 4 (398 views)
Permalink
RE: Is creating an analyzer expensive? [In reply to]

You have to consume the TokenStream returned by the Analyzer by calling its
methods in the specified order; otherwise it will not work correctly and will
behave as you describe:

reset()
while (incrementToken())
end()
close()

You also have to call reset() the first time you use the stream! That's
specified in the TokenStream Javadoc. If you do this, it will work as expected.
See:
https://builds.apache.org/job/Lucene-Artifacts-4.x/javadoc/core/org/apache/lucene/analysis/TokenStream.html
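
To make the symptom concrete, here is a toy model of why a reused stream needs reset() before every pass. It is a sketch in plain Java, not Lucene code: ToyTokenStream and ToyConsumer are hypothetical stand-ins, and the stale-position behaviour only illustrates the general idea of leftover state, not how SynonymFilter is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a reusable token stream; NOT Lucene's TokenStream. */
final class ToyTokenStream {
    private String[] words = new String[0];
    private int pos;      // state that survives between inputs unless reset() is called
    private String term;

    /** Points the stream at new text but deliberately leaves pos untouched. */
    void setInput(String text) {
        String trimmed = text.trim();
        words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Clears per-pass state; must be called before the first incrementToken(). */
    void reset() {
        pos = 0;
    }

    /** Advances to the next whitespace-separated word, if any. */
    boolean incrementToken() {
        if (pos >= words.length) {
            return false;
        }
        term = words[pos++];
        return true;
    }

    String term() {
        return term;
    }

    void end() { /* end-of-stream bookkeeping would go here */ }

    void close() { /* resource release would go here */ }
}

/** Consumes the toy stream in the order reset, incrementToken loop, end, close. */
final class ToyConsumer {
    static List<String> tokens(ToyTokenStream ts, String text) {
        ts.setInput(text);
        List<String> out = new ArrayList<>();
        try {
            ts.reset();                      // required on every pass, including the first
            while (ts.incrementToken()) {
                out.add(ts.term());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }
}
```

In this toy, skipping reset() happens to work for the first input because pos starts at zero, and then yields no tokens for a shorter second input because pos is still at the end of the previous one. Following the full order on every pass avoids depending on that accident.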
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi



