Mailing List Archive: Lucene: Java-User

Searching while optimizing

 

 



amordo at infosciences

Jul 31, 2003, 11:20 AM

Post #1 of 20
Searching while optimizing

Is it possible and safe to search an index while another thread adds
documents or optimizes the same index?


cutting at lucene

Jul 31, 2003, 12:31 PM

Post #2 of 20
Re: Searching while optimizing [In reply to]

Aviran Mordo wrote:
> Is it possible and safe to search an index while another thread adds
> documents or optimizes the same index?

Yes.


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe [at] jakarta
For additional commands, e-mail: lucene-user-help [at] jakarta


SteveR at opin

Jul 31, 2003, 12:57 PM

Post #3 of 20
RE: Searching while optimizing [In reply to]

This seems to contradict an item from the Lucene FAQ:

<<
41. Can I modify the index while performing ongoing searches?
Yes and no. At the time of writing this FAQ (June 2001), Lucene is not
thread safe in this regard. Here is a quote from Doug Cutting, the creator
of Lucene:


The problems arise only when you add documents or optimize an index, and then
search with an IndexReader that was constructed before those changes to the
index were made.
A possible workaround is to perform the index updates in a parallel,
separate index and switch to the new index when its update is done. The
switch may be done, for example, using a variable that points to the
directory of the current active index. Since searches have a relatively
short lifetime, you may discard (or reuse) the old index a short time after
performing the switch (this grace period should be a little longer if you
want to let all searches that involved paging through the hit list
complete with consistent results).
>>

Can you explain further?
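For readers following along, here is a minimal sketch of the FAQ's directory-switching workaround using only the Java standard library. The class name ActiveIndexSwitch and the directory names are hypothetical, not Lucene API: each search resolves the active index directory once through a shared AtomicReference, and the updater swaps in the freshly rebuilt copy in a single atomic step. Per Doug's reply later in this thread, the workaround predates Lucene becoming thread safe, so treat this as an illustration of the FAQ rather than current advice.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical holder for "the variable that points to the directory of the
 * current active index" from the FAQ. Not a Lucene class.
 */
final class ActiveIndexSwitch {
    private final AtomicReference<Path> active;

    ActiveIndexSwitch(Path initial) {
        this.active = new AtomicReference<>(initial);
    }

    /** A search calls this once at its start and keeps using the returned path. */
    Path current() {
        return active.get();
    }

    /**
     * Atomically points new searches at the freshly built index and returns the
     * previous directory. The caller should only discard (or reuse) the returned
     * directory after a grace period long enough for in-flight searches,
     * including paged ones, to finish.
     */
    Path switchTo(Path freshlyBuilt) {
        return active.getAndSet(freshlyBuilt);
    }

    public static void main(String[] args) {
        ActiveIndexSwitch indexes = new ActiveIndexSwitch(Paths.get("index-a"));
        Path old = indexes.switchTo(Paths.get("index-b"));
        System.out.println("now searching " + indexes.current() + ", retire " + old + " later");
    }
}
```

A search that captured index-a before the swap keeps using it until it finishes; only new searches see index-b.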



jgaliana at renr

Aug 1, 2003, 1:58 AM

Post #4 of 20
RE: Searching while optimizing [In reply to]

OK, would the following scenario be possible?

1. We run an IndexWriter and several IndexSearchers in parallel.
The IndexWriter is open all the time. There are two contexts in Tomcat, one for
indexing and the other for searching and administrative work.
2. The "search" context creates an IndexSearcher.
3. At the same time, a document arrives to be indexed.
4. The "indexer" context uses the IndexWriter to add the document, and this
triggers a merge that creates and deletes segments.
5. When the IndexSearcher is about to be used, its files have changed and it
can't find them.

Is this possible?
Jose Galiana
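Doug's answer further down is that this is safe, and Mike McCandless later describes the underlying idea: an open reader holds on to segment files that the writer would otherwise delete. The toy model below illustrates that idea with plain Java collections. It is not Lucene's actual implementation (ToyIndex and its methods are hypothetical), and a real index tracks files on disk rather than names in a set.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model, not Lucene code: a reader pins the segment files it opened, and
 * files the writer no longer needs are removed only once no reader pins them.
 */
final class ToyIndex {
    final Set<String> disk = new HashSet<>();                  // files "physically" present
    private List<String> liveSegments = new ArrayList<>();     // segments the writer considers current
    private final Map<String, Integer> pins = new HashMap<>(); // open-reader reference counts
    private final Set<String> pendingDelete = new HashSet<>(); // obsolete but possibly pinned

    void addSegment(String name) {
        disk.add(name);
        liveSegments.add(name);
    }

    /** Opening a reader snapshots the current segment list and pins those files. */
    List<String> openReader() {
        List<String> snapshot = new ArrayList<>(liveSegments);
        for (String f : snapshot) {
            pins.merge(f, 1, Integer::sum);
        }
        return snapshot;
    }

    /** Closing a reader unpins its files and deletes any that became unreferenced. */
    void closeReader(List<String> snapshot) {
        for (String f : snapshot) {
            if (pins.merge(f, -1, Integer::sum) == 0) {
                pins.remove(f);
            }
        }
        deletePending();
    }

    /** Merges every live segment into one new segment, as an optimize would. */
    void mergeAll(String merged) {
        disk.add(merged);
        pendingDelete.addAll(liveSegments);
        liveSegments = new ArrayList<>(List.of(merged));
        deletePending();
    }

    private void deletePending() {
        pendingDelete.removeIf(f -> {
            if (pins.containsKey(f)) {
                return false; // still in use by an open reader: keep the file
            }
            disk.remove(f);
            return true;
        });
    }
}
```

In the model, a merge with no reader open frees the old segments at once; with a reader open they linger until closeReader is called, which is also why readers that are never closed keep tying up disk space.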



cutting at lucene

Aug 20, 2003, 12:25 PM

Post #6 of 20
Re: Searching while optimizing [In reply to]

That is an old FAQ item. Lucene has been thread safe for a while now.

Doug

Steve Rajavuori wrote:
> This seems to contradict an item from the Lucene FAQ:



v.sevel at lombardodier

Nov 23, 2009, 10:44 PM

Post #8 of 20
Re: Searching while optimizing [In reply to]

1) Correct: I am using IndexWriter.getReader(). I guess I was assuming that
it returned a privately owned object and that I had no business managing its
lifecycle. The API would be clearer if the operation were renamed createReader().

2) How much transient disk space should I expect? Isn't this pretty much
what the IndexWriter javadoc says we should not do: "When running in this
mode, be careful not to refresh your readers while optimize or segment
merges are taking place as this can tie up substantial disk space."


Michael McCandless-2 wrote:
>
> When you say "getting a reader of the writer" do you mean
> writer.getReader()? Ie the new near real-time API in 2.9?
>
> For that API (and in general whenever you open a reader), you must
> close it. I think all your extra files are there because you're not
> closing your old readers.
>
> Reopening readers during optimize is fine, if you close the old reader
> each time. It will possibly tie up more transient disk usage than had
> you reopened at the end of optimize, but if you have plenty of disk
> space it shouldn't be a problem.
>
> Mike
>
> On Mon, Nov 23, 2009 at 3:20 PM, vsevel <v.sevel [at] lombardodier> wrote:
>>
>> Hi, I am using Lucene 2.9.1 to index a continuous flow of events. My
>> server keeps an index writer open at all times and writes events in
>> groups of a few hundred, each followed by a commit. While writing, users
>> invoke my server to perform searches. Once a day I optimize the index,
>> while writes and searches may be happening. I adopted the following
>> strategy:
>>
>> For every search I open a new IndexSearcher on the writer's reader. I
>> execute the search, fetch the documents and finally close the searcher.
>> Specifically, I never close the reader, nor the writer.
>>
>> Q: Is that a reasonable strategy?
>>
>> I found out that my 40 GB index grew to 200 GB while the number of docs
>> stayed put at 30 million. I suspect that a search during the optimize
>> caused this situation, as described in the IndexWriter javadoc (about
>> refreshing readers during an optimize).
>>
>> Q: Is that the likely cause? Is getting a reader from the writer just as
>> "bad" as refreshing a reader during an optimize? How can I avoid this
>> behavior? Should I just deny searches while optimizing?
>>
>> Question on the side: is there any way to interrupt a search that takes
>> too long? For instance, by setting a boolean from another thread on the
>> searcher currently performing the search.
>>
>> Thanks,
>> vincent
>> --
>> View this message in context:
>> http://old.nabble.com/Searching-while-optimizing-tp26485138p26485138.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
>> For additional commands, e-mail: java-user-help [at] lucene
>>
>>

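Mike's rule is that every reader you open, including one returned by IndexWriter.getReader(), must be closed. When one reader is shared by concurrent searches, a common way to close it safely is reference counting: the holder keeps one reference, each search borrows one, and the reader is released when the count reaches zero. The sketch below shows that pattern with hypothetical Snapshot and SnapshotHolder classes standing in for a reader; as far as I know, IndexReader in Lucene 2.9 offers incRef() and decRef() for the same purpose, but check the javadocs for your version.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for a reader; "closed" marks where files would be released. */
final class Snapshot {
    final long version;
    private final AtomicInteger refs = new AtomicInteger(1); // the holder's own reference
    volatile boolean closed;

    Snapshot(long version) {
        this.version = version;
    }

    /** Borrows a reference unless the snapshot has already been fully released. */
    boolean tryIncRef() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                return false;
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    void decRef() {
        if (refs.decrementAndGet() == 0) {
            closed = true; // a real implementation would close its files here
        }
    }
}

/** Shares one current snapshot among searches and swaps it on refresh(). */
final class SnapshotHolder {
    private volatile Snapshot current;
    private final Supplier<Snapshot> opener;

    SnapshotHolder(Supplier<Snapshot> opener) {
        this.opener = opener;
        this.current = opener.get();
    }

    /** Every search must pair acquire() with release() in a finally block. */
    Snapshot acquire() {
        while (true) {
            Snapshot s = current;
            if (s.tryIncRef()) {
                return s;
            }
            // Lost a race with refresh(): the old snapshot was just released, retry.
        }
    }

    void release(Snapshot s) {
        s.decRef();
    }

    /** Opens a fresh snapshot and drops the holder's reference to the old one. */
    synchronized void refresh() {
        Snapshot old = current;
        current = opener.get();
        old.decRef(); // released now, or when the last in-flight search releases it
    }
}
```

Each search would call acquire(), run, and call release() in a finally block; refresh() can then run on a schedule or after each batch commit without closing a reader out from under an in-flight search.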

lucene at mikemccandless

Nov 24, 2009, 1:59 AM

Post #9 of 20
Re: Searching while optimizing [In reply to]

On Tue, Nov 24, 2009 at 1:44 AM, vsevel <v.sevel [at] lombardodier> wrote:
>
> 1) correct: I am using IndexWriter.getReader(). I guess I was assuming that
> was a privately owned object and I had no business dealing with its
> lifecycle. the api would be clearer to rename the operation createReader().

I just committed an addition to the javadocs that the caller is
responsible for closing the returned reader.

I think createReader() isn't great either because it sounds more
expensive than it is -- under the hood, the returned reader is
typically sharing many subreaders with the last reader obtained. That
sharing is what makes the reopen time fast.
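To illustrate why that sharing makes reopening cheap, here is a toy model in which a top-level reader is nothing more than an ordered map from segment name to per-segment reader. SegmentReaderStub and TopReaderStub are hypothetical names, not Lucene classes; the only point is that a reopen reuses the instances for unchanged segments and opens just the new ones.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-segment reader; opening one is the expensive step. */
final class SegmentReaderStub {
    final String segment;

    SegmentReaderStub(String segment) {
        this.segment = segment;
    }
}

/** Hypothetical top-level reader: segment name -> sub-reader, in index order. */
final class TopReaderStub {
    final Map<String, SegmentReaderStub> subs;

    TopReaderStub(Map<String, SegmentReaderStub> subs) {
        this.subs = subs;
    }

    /**
     * Builds a reader for the current segment list, keeping the existing
     * sub-reader for every segment that is unchanged and opening only new ones.
     */
    TopReaderStub reopen(List<String> currentSegments, Function<String, SegmentReaderStub> open) {
        Map<String, SegmentReaderStub> next = new LinkedHashMap<>();
        for (String seg : currentSegments) {
            SegmentReaderStub existing = subs.get(seg);
            next.put(seg, existing != null ? existing : open.apply(seg));
        }
        return new TopReaderStub(next);
    }
}
```

The same model hints at why reopening during an optimize is costly: once every segment is merged into one, nothing is left to share, and the old sub-readers keep their files around until they are closed.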

> 2) how much transient disk space should I expect? isn't this pretty much
> what the index writer javadoc said we should not do: "When running in this
> mode, be careful not to refresh your readers while optimize or segment
> merges are taking place as this can tie up substantial disk space."

It is exactly what the javadoc says you should not do, but if you know
the risks, go for it ;)

How much space is tied up depends on how often you reopen and how
quickly you close the last reader. If, e.g., you aggressively close the
last reader, such that effectively only one reader is open at once,
then I think you're looking at a worst case where the index consumes 4X its
"nominal" size (vs 3X if you don't open a single reader).

Mike



uwe at thetaphi

Nov 24, 2009, 2:02 AM

Post #10 of 20
RE: Searching while optimizing [In reply to]

How about newReader()?

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi




lucene at mikemccandless

Nov 24, 2009, 2:22 AM

Post #11 of 20
Re: Searching while optimizing [In reply to]

I don't really like that name, for the same reason ("create" and "new"
imply that an entirely new reader is being created, which is far more
costly than what normally happens).

Mike

On Tue, Nov 24, 2009 at 5:02 AM, Uwe Schindler <uwe [at] thetaphi> wrote:
> How about newReader()?


v.sevel at lombardodier

Nov 24, 2009, 6:08 AM

Post #12 of 20
Re: Searching while optimizing [In reply to]

Hi, just to make sure I understand correctly... After an optimize, without
any reader, my index takes 30 GB on disk. Are you saying that if I can
ensure there is only one reader at a time, it could take up to 120 GB on
disk if I search while an optimize is going on?

I did not get your 3X when there is no reader. In that situation, isn't that
the nominal size?

Different subject: I saw in 3.0.0RC1 that interrupting a merging thread was
being discussed. Couldn't you do something similar for searches? I let my
users do full-text searches on documents with over 50 fields. If they use too
many wildcards, the search could take a long time, and rather than
restricting what they can do, I would rather let them cancel the search
gracefully. Would that be feasible?

Thanks,
vincent




lucene at mikemccandless

Nov 24, 2009, 8:59 AM

Post #13 of 20
Re: Searching while optimizing [In reply to]

On Tue, Nov 24, 2009 at 9:08 AM, vsevel <v.sevel [at] lombardodier> wrote:

> Hi, just to make sure I understand correctly... After an optimize, without
> any reader, my index takes 30Gb on the disk. Are you saying that if I can
> ensure there is only one reader at a time, it could take up to 120Gb on the
> disk if searching while an optimize is going on?
>
> I did not get your 3X when there is no reader. In that situation isn't that
> the nominal size?

If your index takes 30 GB before optimizing, and you then open a writer,
start the optimize and wait for it to finish, it can take up to 90
GB. Once the optimize is done, but before you commit, 60 GB will be
in use. Once you commit/close, this will drop to 30 GB.

(These are all worst-case numbers -- in practice, an optimized index
is smaller than the original, sometimes by a lot, e.g. if there are many
pending deletions.)

If the reader was already opened before you opened the writer, then
there's no change to disk space requirements, because the reader has
opened a commit (the starting commit) that the writer will not delete
anyway.

But if you open a new reader while the optimize is underway, it's
possible to require a total of 120 GB of space (30 GB for your index, 90 GB
transient), because the reader is holding open temporary segments that
the writer wants to delete. If you open more than one reader and
don't close the old ones, you can tie up even more disk space.
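The worst-case figures above are simple multiples of the pre-optimize index size (30 GB in this example). The enum below only encodes the multipliers stated in this thread: 3X with no reader opened during the optimize, 4X with one reader opened mid-optimize and closed promptly, 2X after the optimize but before commit, and 1X after commit/close. It is plain arithmetic, not a model of Lucene's merge internals; OptimizePhase is a hypothetical name.

```java
/**
 * Worst-case disk usage around an optimize, expressed as multiples of the
 * index size before optimizing. Multipliers are the ones quoted in this thread.
 */
enum OptimizePhase {
    BEFORE(1),                        // just the existing index
    PEAK_NO_NEW_READER(3),            // optimizing, no reader opened meanwhile
    PEAK_ONE_READER_OPENED_DURING(4), // a single reader opened mid-optimize
    DONE_BEFORE_COMMIT(2),            // optimize finished, not yet committed
    AFTER_COMMIT(1);                  // after commit/close

    final int multiplier;

    OptimizePhase(int multiplier) {
        this.multiplier = multiplier;
    }

    long worstCaseGb(long indexGb) {
        return multiplier * indexGb;
    }
}
```

For the 30 GB index this gives 90 GB, 120 GB, 60 GB and 30 GB respectively; each additional reader left open during the optimize can push the peak higher still.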

> different subject: I saw in 3.0.0RC1 that interrupting a merging thread was
> being discussed. couldn't you do something similar for searches. I let my
> users do full text searches on documents with over 50 fields. if using too
> many wildcards, the search could take a long time. and rather than
> restricting what they can do, I would rather let them cancel the search
> gracefully. would that be something feasible?

IndexWriter is interruptible via Thread.interrupt(), but searching
currently is not. However, TimeLimitingCollector can be used to set a
timeout for searches.

Mike


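Vincent's cancel button can be approximated the way Mike suggests for timeouts: wrap the result collector and abort from inside it. The sketch below uses a hypothetical one-method HitSink interface instead of Lucene's Collector (which has more methods), and checks both a user-controlled AtomicBoolean and a deadline on every collected hit, throwing an unchecked exception to stop the search. This is roughly the idea behind TimeLimitingCollector, not its actual code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical one-method stand-in for a per-hit search callback. */
interface HitSink {
    void collect(int doc);
}

/** Thrown to abandon a search part-way through; hits collected so far remain usable. */
final class SearchAbortedException extends RuntimeException {
    SearchAbortedException(String reason) {
        super(reason);
    }
}

/**
 * Wraps another sink and aborts the search when either the cancel flag is set
 * (e.g. by a "Cancel" request handled on another thread) or the deadline passes.
 */
final class CancellableSink implements HitSink {
    private final HitSink delegate;
    private final AtomicBoolean cancelled;
    private final long deadlineNanos;

    CancellableSink(HitSink delegate, AtomicBoolean cancelled, long timeoutMillis) {
        this.delegate = delegate;
        this.cancelled = cancelled;
        this.deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
    }

    @Override
    public void collect(int doc) {
        if (cancelled.get()) {
            throw new SearchAbortedException("cancelled by user");
        }
        if (System.nanoTime() - deadlineNanos > 0) { // overflow-safe comparison
            throw new SearchAbortedException("timed out");
        }
        delegate.collect(doc);
    }
}
```

One caveat, as far as I know: a check in collect() only runs as matching documents are found, so work done before the first hit (for example expanding a wildcard into its matching terms) is not interrupted by it.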

v.sevel at lombardodier

Nov 24, 2009, 12:31 PM

Post #14 of 20
Re: Searching while optimizing [In reply to]

Hi, this is good information. As I read your post I realized that I am
supposed to commit after an optimize, which is something I do not currently
do. That probably led to the extra disk space I saw being consumed.
If this is correct, then the optimize javadoc could be improved to say that
it needs to be followed by a commit or close, like any other write.
Thanks for the help,
vincent




lucene at mikemccandless

Nov 24, 2009, 12:39 PM

Post #15 of 20
Re: Searching while optimizing [In reply to]

OK, I'll add that to the javadocs; thanks.

But the fact that you weren't closing the old readers was probably
also tying up lots of disk space...

Mike

On Tue, Nov 24, 2009 at 3:31 PM, vsevel <v.sevel [at] lombardodier> wrote:
>
> Hi, this is good information. as I read your post I realized that I am
> supposed to commit after an optimize, which is something I do not currently
> do. That would probably lead to the extra disk space I saw being consumed.
> If this is correct, then the optimize javadoc could be improved to say that
> it needs to be followed by a commit or close, like any other write.
> thanks for the help,
> vincent


v.sevel at lombardodier

Nov 27, 2009, 1:12 PM

Post #16 of 20 (1758 views)
Re: Searching while optimizing [In reply to]

Hi, I have done some testing that I would like to share with you.

I am starting my tests with an unoptimized 40 MB index. I have 3 test cases:
1) open a writer, optimize, commit, close
2) open a writer, open a reader from the writer, optimize, commit, close
3) same as 2) except the reader is opened while the optimize is done in a
different thread

During all the tests, I monitor the size of the index on the disk. The
results are:
1) initial=41Mb, before end of optimize=122Mb, after end of optimize=81Mb,
after commit=40Mb, after writer close=40Mb
2) initial=41Mb, before end of optimize=122Mb, after end of optimize=104Mb,
after commit=104Mb, after reader close=104Mb, after writer close=40Mb
3) initial=41Mb, before end of optimize=145Mb, after end of optimize=127Mb,
after commit=103Mb, after reader close=103Mb, after writer close=40Mb

From your various posts I assumed that a commit would have the same effect
as a close as far as reclaiming disk space is concerned. However, test cases
2 and 3 show that whether the reader is opened before or during the optimize,
after the commit we end up with an index that is about 2.5 times its nominal
size. Closing the reader does not change anything; only closing the writer
gets the index back to its nominal size.
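As a quick sanity check on the "about 2.5 times" figure, the ratios can be computed straight from the measurements above (all in MB). The hypothetical helper withinWorstCase also encodes the worst-case multiples Mike gave in this thread: roughly 3x the index size while optimizing, and up to 4x if a new reader is opened while the optimize runs (his 90 GB and 120 GB figures for a 30 GB index). SizeRatios and its method names are our own.

```java
// Back-of-the-envelope check of the sizes reported above (all figures in MB).
final class SizeRatios {
    // How many times the nominal size a measured on-disk size is.
    static double ratio(double measuredMb, double nominalMb) {
        if (nominalMb <= 0) {
            throw new IllegalArgumentException("nominal size must be positive");
        }
        return measuredMb / nominalMb;
    }

    // Mike's worst-case rule of thumb from this thread: about 3x the index size while
    // optimizing, or about 4x if a new reader is opened while the optimize runs.
    static boolean withinWorstCase(double peakMb, double indexMb, boolean readerOpenedDuringOptimize) {
        double multiplier = readerOpenedDuringOptimize ? 4.0 : 3.0;
        return peakMb <= multiplier * indexMb;
    }
}
```

With these figures, ratio(104, 40) is 2.6 and ratio(103, 40) is 2.575, a little above "about 2.5 times". The optimize peaks (122 MB and 145 MB from a 41 MB start, roughly 3.0x and 3.5x) stay inside Mike's bounds, and case 3 is the one that exceeds the 3x bound, consistent with a reader having been opened mid-optimize.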

Why can neither the commit nor closing the reader get us back to
nominal?
Do you recommend closing and recreating the writer after an optimize?

thanks
vincent


Michael McCandless-2 wrote:
>
> OK, I'll add that to the javadocs; thanks.
>
> But the fact that you weren't closing the old readers was probably
> also tying up lots of disk space...
>
> Mike
>



lucene at mikemccandless

Nov 27, 2009, 1:47 PM

Post #17 of 20 (1764 views)
Re: Searching while optimizing [In reply to]

Phew, thanks for testing! It's all explainable...

When you have a reader open, it prevents the segments it had opened
from being deleted.

When you close that reader, the segments could be deleted; however,
that won't happen until the writer next tries to delete, which it does
only periodically (e.g., on flushing a new segment, committing a new
merge, etc.).

Could you try closing your reader, then calling writer.commit() (which
is a no-op, since you had already committed, but it may tickle the
writer into attempting the deletions), and see if that frees up disk
space w/o closing?
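Mike's explanation, that a closed reader's segments are only removed the next time the writer attempts deletions, can be pictured with a toy reference-counting model. This is a hypothetical JDK-only sketch, not Lucene's actual deletion code; ToyDeleter, its methods and the _0.cfs-style file names are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of deferred file deletion (NOT Lucene's actual implementation):
// a file stays on disk while the live commit or an open reader references it,
// and unreferenced files are removed only when the writer runs a deletion pass.
final class ToyDeleter {
    private final Set<String> onDisk = new HashSet<>();
    private final Map<String, Integer> readerRefs = new HashMap<>();
    private Set<String> liveCommit = new HashSet<>();

    // The writer creates files (e.g. a merged segment); they exist on disk at once.
    void write(String... files) {
        onDisk.addAll(Arrays.asList(files));
    }

    // A commit switches the live commit to these files, then tries to delete.
    void commit(String... files) {
        liveCommit = new HashSet<>(Arrays.asList(files));
        deletePass();
    }

    // An open reader pins every file it uses.
    void openReader(Set<String> files) {
        for (String f : files) {
            readerRefs.merge(f, 1, Integer::sum);
        }
    }

    // Closing a reader only drops its pins; nothing is deleted at this point.
    void closeReader(Set<String> files) {
        for (String f : files) {
            readerRefs.computeIfPresent(f, (name, n) -> n == 1 ? null : n - 1);
        }
    }

    // The writer's periodic attempt (flush, merge, commit): drop unreferenced files.
    void deletePass() {
        onDisk.removeIf(f -> !liveCommit.contains(f) && !readerRefs.containsKey(f));
    }

    int filesOnDisk() {
        return onDisk.size();
    }
}
```

In the model, closing the reader frees nothing by itself, and the second, no-op commit is what finally removes the two old segments: the effect Mike hoped the extra writer.commit() would have. In vsevel's real test that commit only partly helped; Mike later traced the rest to an IndexWriter bug with near-real-time readers and CFS.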

Mike




v.sevel at lombardodier

Nov 28, 2009, 12:02 PM

Post #18 of 20 (1691 views)
Re: Searching while optimizing [In reply to]

Hi, thanks for the explanations. Unfortunately I had no luck...

I now close the reader before the commit, but still only closing the
writer gets us back to nominal. Here is the complete test:

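import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Assert;
import org.junit.Test;

    // Test method from the test class (Lucene 2.9/3.0 APIs, JUnit 4).
    // "log" and monitorIndexSize(File) are members of that class not shown here;
    // judging by the log below, monitorIndexSize logs the index size about once per second.
    @Test
    public void optimize() throws Exception {
        final File dir = new File("lucene_work/optimize");
        dir.mkdirs();

        // Start from an empty index directory.
        for (File f : dir.listFiles()) {
            f.delete();
        }

        Assert.assertEquals(0, dir.listFiles().length);

        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        MaxFieldLength maxLength = IndexWriter.MaxFieldLength.UNLIMITED;
        IndexWriter writer = new IndexWriter(FSDirectory.open(dir),
                analyzer, true, maxLength);
        monitorIndexSize(dir);
        long time = 2000; // pause between steps so the monitor can sample the size

        // Step 1: index 1,000,000 small documents and commit.
        log.info("writing...");
        for (int i = 0; i < 1000000; i++) {
            Document doc = new Document();
            doc.add(new Field("foo", "bar " + i, Store.YES,
                    Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }

        writer.commit();
        log.info("done write");
        Thread.sleep(time);

        // Step 2: open a near-real-time reader from the writer.
        log.info("opening reader...");
        IndexReader reader = writer.getReader();
        log.info("done open reader");
        Thread.sleep(time);

        // Step 3: optimize while that reader is still open.
        log.info("optimizing...");
        writer.optimize();
        log.info("done optimize");
        Thread.sleep(time);

        // Step 4: close the reader, releasing the segments it holds.
        log.info("closing reader...");
        reader.close();
        log.info("done reader close");
        Thread.sleep(time);

        // Step 5: commit again (per Mike's suggestion) so the writer retries deletions.
        log.info("committing...");
        writer.commit();
        log.info("done commit");
        Thread.sleep(time);

        // Step 6: close the writer.
        log.info("closing writer...");
        writer.close();
        log.info("done writer close");
        Thread.sleep(time);
    }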
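The test calls a monitorIndexSize(dir) helper that the post doesn't include. Judging only from the log (a size=...Mb line roughly every second), it could look something like the hypothetical JDK-only stand-in below. IndexSizeMonitor, dirSizeMb and printing to stdout are our own assumptions; vsevel's real helper presumably writes through the test's logger.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class IndexSizeMonitor {
    // Total size of the regular files directly inside dir, in whole MiB (integer division).
    static long dirSizeMb(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // not a directory, or not readable
        }
        long bytes = 0;
        for (File f : files) {
            if (f.isFile()) {
                bytes += f.length();
            }
        }
        return bytes / (1024 * 1024);
    }

    // Prints "size=<n>Mb" once per second on a daemon thread until the returned executor is shut down.
    static ScheduledExecutorService monitorIndexSize(File dir) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "index-size-monitor");
            t.setDaemon(true);
            return t;
        });
        exec.scheduleAtFixedRate(
                () -> System.out.println("size=" + dirSizeMb(dir) + "Mb"),
                0, 1, TimeUnit.SECONDS);
        return exec;
    }
}
```

Returning the executor lets the caller stop sampling with shutdown(); a daemon thread keeps a forgotten monitor from blocking JVM exit.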

And an exec log:

15:58:46,875 INFO logserver.LuceneSystemTest writing...
15:58:46,875 INFO logserver.LuceneSystemTest size=0Mb
15:58:47,891 INFO logserver.LuceneSystemTest size=1Mb
15:58:48,891 INFO logserver.LuceneSystemTest size=3Mb
15:58:49,891 INFO logserver.LuceneSystemTest size=5Mb
15:58:50,906 INFO logserver.LuceneSystemTest size=8Mb
15:58:51,906 INFO logserver.LuceneSystemTest size=9Mb
15:58:52,906 INFO logserver.LuceneSystemTest size=12Mb
15:58:53,922 INFO logserver.LuceneSystemTest size=14Mb
15:58:54,984 INFO logserver.LuceneSystemTest size=15Mb
15:58:55,984 INFO logserver.LuceneSystemTest size=18Mb
15:58:56,984 INFO logserver.LuceneSystemTest size=20Mb
15:58:58,000 INFO logserver.LuceneSystemTest size=21Mb
15:58:59,000 INFO logserver.LuceneSystemTest size=25Mb
15:59:00,016 INFO logserver.LuceneSystemTest size=27Mb
15:59:01,016 INFO logserver.LuceneSystemTest size=29Mb
15:59:02,016 INFO logserver.LuceneSystemTest size=52Mb
15:59:03,031 INFO logserver.LuceneSystemTest size=52Mb
15:59:04,031 INFO logserver.LuceneSystemTest size=32Mb
15:59:04,328 INFO logserver.LuceneSystemTest done write
15:59:05,031 INFO logserver.LuceneSystemTest size=32Mb
15:59:06,031 INFO logserver.LuceneSystemTest size=32Mb
15:59:06,328 INFO logserver.LuceneSystemTest opening reader...
15:59:06,453 INFO logserver.LuceneSystemTest done open reader
15:59:07,031 INFO logserver.LuceneSystemTest size=32Mb
15:59:08,031 INFO logserver.LuceneSystemTest size=32Mb
15:59:08,453 INFO logserver.LuceneSystemTest optimizing...
15:59:09,047 INFO logserver.LuceneSystemTest size=34Mb
15:59:10,047 INFO logserver.LuceneSystemTest size=37Mb
15:59:11,047 INFO logserver.LuceneSystemTest size=40Mb
15:59:12,047 INFO logserver.LuceneSystemTest size=42Mb
15:59:12,391 INFO logserver.LuceneSystemTest done optimize
15:59:13,062 INFO logserver.LuceneSystemTest size=55Mb
15:59:14,062 INFO logserver.LuceneSystemTest size=55Mb
15:59:14,391 INFO logserver.LuceneSystemTest closing reader...
15:59:14,406 INFO logserver.LuceneSystemTest done reader close
15:59:15,062 INFO logserver.LuceneSystemTest size=55Mb
15:59:16,062 INFO logserver.LuceneSystemTest size=55Mb
15:59:16,406 INFO logserver.LuceneSystemTest committing...
15:59:16,469 INFO logserver.LuceneSystemTest done commit
15:59:17,062 INFO logserver.LuceneSystemTest size=43Mb
15:59:18,062 INFO logserver.LuceneSystemTest size=43Mb
15:59:18,469 INFO logserver.LuceneSystemTest closing writer...
15:59:18,484 INFO logserver.LuceneSystemTest done writer close
15:59:19,062 INFO logserver.LuceneSystemTest size=32Mb
15:59:20,078 INFO logserver.LuceneSystemTest size=32Mb

I guess I could close and reopen the writer if I really need to. But if
there is a nicer, more natural solution, I would love to know about it.

thanks,
vincent


Michael McCandless-2 wrote:
>
> Phew, thanks for testing! It's all explainable...
>
> When you have a reader open, it prevents the segments it had opened
> from being deleted.
>
> When you close that reader, the segments could be deleted; however,
> that won't happen until the writer next tries to delete, which it does
> only periodically (e.g., on flushing a new segment, committing a new
> merge, etc.).
>
> Could you try closing your reader, then calling writer.commit() (which
> is a no-op, since you had already committed, but it may tickle the
> writer into attempting the deletions), and see if that frees up disk
> space w/o closing?
>
> Mike



lucene at mikemccandless

Nov 29, 2009, 2:57 AM

Post #19 of 20 (1680 views)
Re: Searching while optimizing [In reply to]

OK, I dug down on this one... it's actually a bug in IndexWriter when
used in near-real-time mode *and* when CFS (the compound file format) is enabled. In that case,
internally IndexWriter holds open the wrong SegmentReader, thus tying
up more disk space than it should.

Functionally, the bug is harmless -- it's just tying up disk space.

I've boiled your example down to a test case.

Thanks for catching & reporting this! I'll open an issue.

If it's a problem, you can work around the bug either by turning off
CFS, or by using IndexReader.open (and reopen) to get your reader
instead of the near-real-time writer.getReader() method.

Mike




lucene at mikemccandless

Nov 29, 2009, 3:09 AM

Post #20 of 20 (1667 views)
Re: Searching while optimizing [In reply to]

OK I opened https://issues.apache.org/jira/browse/LUCENE-2097 to track this.

Thanks v.sevel!

Mike

On Sun, Nov 29, 2009 at 5:57 AM, Michael McCandless
<lucene [at] mikemccandless> wrote:
> OK I dug down on this one... it's actually a bug in IndexWriter, when
> used in near real-time mode *and* when CFS is enabled.  In that case,
> internally IndexWriter holds open the wrong SegmentReader, thus tying
> up more disk space than it should.
>
> Functionally, the bug is harmless -- it's just tying up disk space.
>
> I've boiled your example down to a test case.
>
> Thanks for catching & reporting this! I'll open an issue.
>
> If it's a problem, you can work around the bug either by turning off
> CFS, or by using IndexReader.open (and reopen) to get your reader
> instead of the near-real-time writer.getReader() method.
>
> Mike
>
> On Sat, Nov 28, 2009 at 3:02 PM, vsevel <v.sevel [at] lombardodier> wrote:
>>
>> Hi, thanks for the explanations. Though I had no luck...
>>
>> I now do the close of the reader before the commit. But still, only the
>> close get us back to nominal. Here is the complete test:
>>
>>    @Test
>>    public void optimize() throws Exception {
>>        final File dir = new File("lucene_work/optimize");
>>        dir.mkdirs();
>>
>>        for (File f : dir.listFiles()) {
>>            f.delete();
>>        }
>>
>>        Assert.assertEquals(0, dir.listFiles().length);
>>
>>        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
>>        MaxFieldLength maxLength = IndexWriter.MaxFieldLength.UNLIMITED;
>>        IndexWriter writer = new IndexWriter(FSDirectory.open(dir),
>> analyzer, true, maxLength);
>>        monitorIndexSize(dir);
>>        long time = 2000;
>>
>>        log.info("writing...");
>>        for (int i = 0; i < 1000000; i++) {
>>            Document doc = new Document();
>>            doc.add(new Field("foo", "bar " + i, Store.YES,
>> Index.NOT_ANALYZED));
>>            writer.addDocument(doc);
>>        }
>>
>>        writer.commit();
>>        log.info("done write");
>>        Thread.sleep(time);
>>
>>        log.info("opening reader...");
>>        IndexReader reader = writer.getReader();
>>        log.info("done open reader");
>>        Thread.sleep(time);
>>
>>        log.info("optimizing...");
>>        writer.optimize();
>>        log.info("done optimize");
>>        Thread.sleep(time);
>>
>>        log.info("closing reader...");
>>        reader.close();
>>        log.info("done reader close");
>>        Thread.sleep(time);
>>
>>        log.info("committing...");
>>        writer.commit();
>>        log.info("done commit");
>>        Thread.sleep(time);
>>
>>        log.info("closing writer...");
>>        writer.close();
>>        log.info("done writer close");
>>        Thread.sleep(time);
>>    }
>>
>> And an exec log:
>>
>> 15:58:46,875  INFO logserver.LuceneSystemTest     writing...
>> 15:58:46,875  INFO logserver.LuceneSystemTest     size=0Mb
>> 15:58:47,891  INFO logserver.LuceneSystemTest     size=1Mb
>> 15:58:48,891  INFO logserver.LuceneSystemTest     size=3Mb
>> 15:58:49,891  INFO logserver.LuceneSystemTest     size=5Mb
>> 15:58:50,906  INFO logserver.LuceneSystemTest     size=8Mb
>> 15:58:51,906  INFO logserver.LuceneSystemTest     size=9Mb
>> 15:58:52,906  INFO logserver.LuceneSystemTest     size=12Mb
>> 15:58:53,922  INFO logserver.LuceneSystemTest     size=14Mb
>> 15:58:54,984  INFO logserver.LuceneSystemTest     size=15Mb
>> 15:58:55,984  INFO logserver.LuceneSystemTest     size=18Mb
>> 15:58:56,984  INFO logserver.LuceneSystemTest     size=20Mb
>> 15:58:58,000  INFO logserver.LuceneSystemTest     size=21Mb
>> 15:58:59,000  INFO logserver.LuceneSystemTest     size=25Mb
>> 15:59:00,016  INFO logserver.LuceneSystemTest     size=27Mb
>> 15:59:01,016  INFO logserver.LuceneSystemTest     size=29Mb
>> 15:59:02,016  INFO logserver.LuceneSystemTest     size=52Mb
>> 15:59:03,031  INFO logserver.LuceneSystemTest     size=52Mb
>> 15:59:04,031  INFO logserver.LuceneSystemTest     size=32Mb
>> 15:59:04,328  INFO logserver.LuceneSystemTest     done write
>> 15:59:05,031  INFO logserver.LuceneSystemTest     size=32Mb
>> 15:59:06,031  INFO logserver.LuceneSystemTest     size=32Mb
>> 15:59:06,328  INFO logserver.LuceneSystemTest     opening reader...
>> 15:59:06,453  INFO logserver.LuceneSystemTest     done open reader
>> 15:59:07,031  INFO logserver.LuceneSystemTest     size=32Mb
>> 15:59:08,031  INFO logserver.LuceneSystemTest     size=32Mb
>> 15:59:08,453  INFO logserver.LuceneSystemTest     optimizing...
>> 15:59:09,047  INFO logserver.LuceneSystemTest     size=34Mb
>> 15:59:10,047  INFO logserver.LuceneSystemTest     size=37Mb
>> 15:59:11,047  INFO logserver.LuceneSystemTest     size=40Mb
>> 15:59:12,047  INFO logserver.LuceneSystemTest     size=42Mb
>> 15:59:12,391  INFO logserver.LuceneSystemTest     done optimize
>> 15:59:13,062  INFO logserver.LuceneSystemTest     size=55Mb
>> 15:59:14,062  INFO logserver.LuceneSystemTest     size=55Mb
>> 15:59:14,391  INFO logserver.LuceneSystemTest     closing reader...
>> 15:59:14,406  INFO logserver.LuceneSystemTest     done reader close
>> 15:59:15,062  INFO logserver.LuceneSystemTest     size=55Mb
>> 15:59:16,062  INFO logserver.LuceneSystemTest     size=55Mb
>> 15:59:16,406  INFO logserver.LuceneSystemTest     committing...
>> 15:59:16,469  INFO logserver.LuceneSystemTest     done commit
>> 15:59:17,062  INFO logserver.LuceneSystemTest     size=43Mb
>> 15:59:18,062  INFO logserver.LuceneSystemTest     size=43Mb
>> 15:59:18,469  INFO logserver.LuceneSystemTest     closing writer...
>> 15:59:18,484  INFO logserver.LuceneSystemTest     done writer close
>> 15:59:19,062  INFO logserver.LuceneSystemTest     size=32Mb
>> 15:59:20,078  INFO logserver.LuceneSystemTest     size=32Mb
>>
>> I guess I would be able to do a close and reopen if really I need to. But if
>> there is a nicer and more natural solution, I would love to know about it.
>>
>> thanks,
>> vincent
>>
>>
>> Michael McCandless-2 wrote:
>>>
>>> Phew, thanks for testing!  It's all explainable...
>>>
>>> When you have a reader open, it prevents the segments it had opened
>>> from being deleted.
>>>
>>> When you close that reader, the segments could be deleted, however,
>>> that won't happen until the writer next tries to delete, which it does
>>> only periodically (eg, on flushing a new segment, committing a new
>>> merge, etc.).
>>>
>>> Could you try closing your reader, then calling writer.commit() (which
>>> is a no-op, since you had already committed, but it may tickle the
>>> writer into attempting the deletions), and see if that frees up disk
>>> space w/o closing?
>>>
>>> Mike
>>>
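Mike's explanation can be made concrete with a toy model. The sketch below is hypothetical and is not Lucene's actual file-deletion code. Each segment file carries a reference count, and both the current commit and every open reader hold references. Closing a reader only drops its references. Unreferenced files are removed from "disk" only when the writer runs its deletion pass, which in this model happens on commit. The test scenario mirrors test case 2. The optimize writes a merged segment, and the commit cannot free the old segments because the reader still pins them. Closing the reader frees nothing by itself, and only a second commit reclaims the space. The segment names and sizes are invented for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DeferredDeletionModel {
    // file name -> size in MB; stands in for the index directory on disk
    private final Map<String, Long> disk = new HashMap<>();
    // file name -> number of commit points / open readers referencing it
    private final Map<String, Integer> refCounts = new HashMap<>();
    // files referenced by the most recent commit
    private Set<String> committed = new HashSet<>();

    /** The writer flushes or merges a new segment file to disk. */
    public void writeSegment(String name, long sizeMb) {
        disk.put(name, sizeMb);
    }

    /** Opening a reader pins every file of the current commit. */
    public Set<String> openReader() {
        Set<String> snapshot = new HashSet<>(committed);
        incRef(snapshot);
        return snapshot;
    }

    /** Closing a reader releases its references but deletes nothing. */
    public void closeReader(Set<String> snapshot) {
        decRef(snapshot);
    }

    /** A commit switches to a new file set and then runs a deletion pass. */
    public void commit(Set<String> files) {
        Set<String> next = new HashSet<>(files);
        incRef(next);        // reference the new commit first...
        decRef(committed);   // ...then release the previous one
        committed = next;
        deleteUnreferenced();
    }

    /** Removes every file on disk that nothing references any more. */
    private void deleteUnreferenced() {
        disk.keySet().removeIf(f -> refCounts.getOrDefault(f, 0) == 0);
        refCounts.values().removeIf(c -> c == 0);
    }

    public long diskMb() {
        return disk.values().stream().mapToLong(Long::longValue).sum();
    }

    private void incRef(Set<String> files) {
        for (String f : files) refCounts.merge(f, 1, Integer::sum);
    }

    private void decRef(Set<String> files) {
        for (String f : files) refCounts.merge(f, -1, Integer::sum);
    }
}
```

In the real test, the corresponding calls would be `reader.close()` followed by `writer.commit()`, as Mike suggests. The log earlier in this thread shows that sequence dropping the index from 55Mb to 43Mb before the writer close brought it to 32Mb. The model only illustrates why the order matters. It makes no claim about exactly when Lucene's writer runs its deletions.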
>>> On Fri, Nov 27, 2009 at 4:12 PM, vsevel <v.sevel [at] lombardodier> wrote:
>>>> I am starting my tests with an unoptimized 40Mb index. I have 3 test
>>>> cases:
>>>> 1) open a writer, optimize, commit, close
>>>> 2) open a writer, open a reader from the writer, optimize, commit, close
>>>> 3) same as 2) except the reader is opened while the optimize is done in a
>>>> different thread
>>>>
>>>> During all the tests, I monitor the size of the index on the disk. The
>>>> results are:
>>>>
>>>>   step                     case 1   case 2   case 3
>>>>   initial                  41Mb     41Mb     41Mb
>>>>   before end of optimize   122Mb    122Mb    145Mb
>>>>   after end of optimize    81Mb     104Mb    127Mb
>>>>   after commit             40Mb     104Mb    103Mb
>>>>   after reader close       n/a      104Mb    103Mb
>>>>   after writer close       40Mb     40Mb     40Mb
>>>>
>>>> From your different posts I assumed that a commit would have the same
>>>> effect as a close as far as reclaiming disk space is concerned. However,
>>>> test cases 2 and 3 show that whether the reader is opened before or during
>>>> the optimize, we end up after commit with an index that is 2.5 times the
>>>> nominal size. Closing the reader does not change anything; only closing
>>>> the writer gets the index back to nominal.
>>>>
>>>> Why can neither the commit nor closing the reader get us back to nominal?
>>>> Do you recommend closing the writer and creating a new one after an optimize?
>>>>
>>>> thanks
>>>> vincent
>>>>
>>>>
>>>> Michael McCandless-2 wrote:
>>>>>
>>>>> OK, I'll add that to the javadocs; thanks.
>>>>>
>>>>> But the fact that you weren't closing the old readers was probably
>>>>> also tying up lots of disk space...
>>>>>
>>>>> Mike
>>>>>
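Mike's point about old readers tying up disk space comes down to one rule. Whenever a fresher reader replaces an old one, the old reader must be closed, or the segments it pins can never be deleted. Below is a minimal generic sketch of that rule. `ReaderSwap` is a hypothetical helper, not a Lucene class, and it works with any `Closeable`. It assumes no other thread is still searching the old reader when it is closed. A real multi-threaded search service would have to guarantee that separately, for example with reference counting.

```java
import java.io.Closeable;
import java.io.IOException;

public class ReaderSwap<R extends Closeable> {
    private R current;

    public ReaderSwap(R initial) {
        this.current = initial;
    }

    /** Returns the reader that searches should currently use. */
    public synchronized R current() {
        return current;
    }

    /**
     * Installs a fresh reader and closes the one it replaces.
     * If the "fresh" reader is the same object (nothing changed), nothing is closed.
     */
    public synchronized void swap(R fresh) throws IOException {
        if (fresh == current) {
            return;
        }
        R old = current;
        current = fresh;
        old.close(); // releases the segments the old reader was pinning
    }
}
```

With Lucene 2.9/3.0, the fresh reader would typically come from `IndexReader.reopen()` or `IndexWriter.getReader()`. `reopen()` returns the same instance when the index has not changed, and the identity check above handles that case.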
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>

