
Mailing List Archive: Lucene: Java-User

Clarification on TokenStream.close() needed

 

 



Kuro at basistech

Oct 20, 2009, 12:27 PM

Post #1 of 4 (484 views)
Clarification on TokenStream.close() needed

Hi,
My Tokenizer started showing an error when I switched
to Solr 1.4 dev version. I am not too confident but
it seems that Solr 1.4 calls close() on my Tokenizer
before calling reset(Reader) in order to reuse
the Tokenizer. That is, close() is called more than
once.
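
One way to check this suspicion is to log the order of the lifecycle calls. The sketch below is not Lucene code: CallOrderProbe and simulateReuse() are hypothetical names, and the class copies only the shape of reset(Reader) and close(). In a real Tokenizer the same logging lines would go into the overridden methods.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Tokenizer; it only records lifecycle calls. */
public class CallOrderProbe {
    private final List<String> calls = new ArrayList<>();
    private Reader input;

    public CallOrderProbe(Reader input) {
        this.input = input;
        calls.add("new");
    }

    /** Same shape as Tokenizer.reset(Reader): swap in a new input. */
    public void reset(Reader newInput) {
        this.input = newInput;
        calls.add("reset");
    }

    /** Same shape as TokenStream.close(): close the current input. */
    public void close() throws IOException {
        input.close();
        calls.add("close");
    }

    public List<String> calls() {
        return calls;
    }

    /** Imitates a caller that reuses one instance for two documents. */
    public static List<String> simulateReuse() {
        try {
            CallOrderProbe probe = new CallOrderProbe(new StringReader("first document"));
            probe.close();                                     // first document done
            probe.reset(new StringReader("second document"));  // instance is reused
            probe.close();                                     // second document done
            return probe.calls();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(simulateReuse()); // prints [new, close, reset, close]
    }
}
```

If the log from the real Tokenizer shows close followed by reset, the instance is being reused between documents rather than closed twice in a row.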

The API doc of close() reads:
Releases resources associated with this stream.

So I thought close() should be called only once,
and the Tokenizer objects cannot be reused after
close() is called. Is my interpretation correct?

If my interpretation is wrong and it is legal to
call close() more than once, where is the best place
to free per-instance resources?

T. "Kuro" Kurosaka


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


yonik at lucidimagination

Oct 20, 2009, 12:48 PM

Post #2 of 4 (462 views)
Re: Clarification on TokenStream.close() needed

2009/10/20 Teruhiko Kurosaka <Kuro [at] basistech>:
> My Tokenizer started showing an error when I switched
> to Solr 1.4 dev version.  I am not too confident but
> it seems that Solr 1.4 calls close() on my Tokenizer
> before calling reset(Reader) in order to reuse
> the Tokenizer.  That is, close() is called more than
> once.

Is this when indexing a document or when querying?
close() should only be called once.

If indexing, it would be closed in Lucene at DocInverterPerField.java:197

-Yonik
http://www.lucidimagination.com






uwe at thetaphi

Oct 20, 2009, 12:48 PM

Post #3 of 4 (457 views)
RE: Clarification on TokenStream.close() needed

TokenStream.close() is called (and always has been) when tokenization is
done, in order to close the Reader. Calling reset(Reader) is equivalent to
creating a new instance, only without the cost of actually creating
one.

The change in Solr 1.4 is that TokenStreams are now reused if the Analyzer
supports it. You should release all resources in close() and recreate them
in reset(). If that is too costly, do not release them in close(), but
then there is no way to release them other than in finalize(), which is
called by the GC.
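
A minimal sketch of that release-in-close, recreate-in-reset pattern follows. It is only an illustration under assumptions: ReusableTokenizerSketch, Resource and demo() are hypothetical names, the class does not extend any Lucene class, and Resource stands in for whatever per-instance state is expensive to hold.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a reusable Tokenizer; not a Lucene class. */
public class ReusableTokenizerSketch {

    /** Hypothetical per-instance resource, e.g. a native handle or large buffer. */
    static final class Resource {
        boolean open = true;
        void release() { open = false; }
    }

    private Reader input;
    private Resource resource;
    int acquisitions = 0; // how many times the resource was (re)created

    public ReusableTokenizerSketch(Reader input) {
        this.input = input;
        acquire();
    }

    private void acquire() {
        resource = new Resource();
        acquisitions++;
    }

    /** Acts like constructing a new instance: new input, resource recreated if needed. */
    public void reset(Reader newInput) {
        this.input = newInput;
        if (resource == null) {
            acquire(); // recreate what close() released
        }
    }

    /** Releases the Reader and the resource; calling it again is a no-op. */
    public void close() throws IOException {
        if (input != null) {
            input.close();
            input = null;
        }
        if (resource != null) {
            resource.release();
            resource = null;
        }
    }

    public boolean isUsable() {
        return input != null && resource != null;
    }

    /** Runs two documents through one instance; returns the acquisition count. */
    public static int demo() {
        try {
            ReusableTokenizerSketch t = new ReusableTokenizerSketch(new StringReader("doc one"));
            t.close();
            t.close();                              // harmless second close
            t.reset(new StringReader("doc two"));   // reuse: resource is recreated
            t.close();
            return t.acquisitions;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2
    }
}
```

The null checks make close() tolerant of repeated calls. The costly alternative described above would skip the release in close() and keep the resource alive across reset() calls, at the price of having no explicit release point.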

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi






Kuro at basistech

Oct 20, 2009, 4:47 PM

Post #4 of 4 (463 views)
RE: Clarification on TokenStream.close() needed

> From: Uwe Schindler [mailto:uwe [at] thetaphi]

> TokenStream.close() is called (and always has been) when
> tokenization is done, in order to close the Reader. Calling
> reset(Reader) is equivalent to creating a new instance, only
> without the cost of actually creating one.

Shouldn't that be done in end()? If not, what is
the difference in purpose between end() and close()?
What is the purpose of end()?

In any event, if close() is meant to close the Reader,
I think the current description of TokenStream.close()
in the javadoc is a bit confusing. Instead of saying
"Releases resources associated with this stream.", perhaps
it should say something like "Release resources associated with
the current input source.".

-kuro

