
Mailing List Archive: Lucene: General

IndexWriter.optimize() fail

 

 



sendtoprat at yahoo

Feb 10, 2009, 8:29 AM

Post #1 of 3 (974 views)
IndexWriter.optimize() fail

Hi,
We scan the web and index the pages in Lucene. Our index size is in the range
of 500K to 1 million documents. As we index pages, we also call
IndexWriter.optimize() at certain time intervals [I believe Lucene also
does optimization in the background?]. So far it has worked great. But for
just this one scan we noticed that our index grew to 90 GB for
about 900K documents [typical index size should be around 17-18 GB]. We are
not sure what caused the index to grow this large. Outside of our system,
when we ran a forced IndexWriter.optimize() on this 90 GB Lucene index, it
did indeed shrink to 17 GB. My question is: what may have caused the size to
grow to 90 GB? Did the size grow because optimization failed? Does
optimization fail if there is any foreign file in the Lucene index directory
[though we tried optimizing with foreign files in the Lucene directory, and
Lucene still optimized the index]?

Any suggestions or input would be quite valuable.
Thanks,
Pratyush
--
View this message in context: http://www.nabble.com/InderxWriter.optimize%28%29-fail-tp21937277p21937277.html
Sent from the Lucene - General mailing list archive at Nabble.com.


lucene at mikemccandless

Feb 10, 2009, 3:00 PM

Post #2 of 3 (906 views)
Re: IndexWriter.optimize() fail

Which version of Lucene are you using?

More questions/answers below...

sendtoprat [at] yahoo wrote:

> We scan the web and index the pages in Lucene. Our index size is in the
> range of 500K to 1 million documents. As we index pages, we also call
> IndexWriter.optimize() at certain time intervals [I believe Lucene
> also does optimization in the background?].

Actually Lucene merges segments periodically in the background, but does
not optimize.

> So far it has worked great. But for just this one scan we noticed
> that our index grew to 90 GB for about 900K documents [typical index
> size should be around 17-18 GB]. We are not sure what caused the
> index to grow this large. Outside of our system, when we ran a forced
> IndexWriter.optimize() on this 90 GB Lucene index, it did indeed
> shrink to 17 GB. My question is: what may have caused the size to
> grow to 90 GB?

Optimize requires free temporary disk space equal to 1X the index size.

Do you have an IndexReader open on the index when optimize runs? That
ties up another 1X.

That should mean a 17-18 GB index takes at most 51-54 GB (3X) while
optimize runs, so I'm not sure why you got up to 90 GB. Were there no
exceptions, even in the background merge threads?
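
The arithmetic behind the 51-54 GB figure can be sketched as below. This is only an illustration of the 1X + 1X + 1X reasoning in this reply; the class and method names are made up for the example and are not part of the Lucene API.

```java
// Hypothetical helper (not Lucene API): estimates peak disk usage during
// optimize from the figures quoted in this thread.
class OptimizeDiskEstimate {
    /**
     * Peak usage as a multiple of the optimized index size:
     * 1X for the existing index, 1X of temporary space for the merged copy,
     * plus 1X more if an open IndexReader keeps the old files alive.
     */
    static int peakMultiple(boolean readerOpen) {
        return readerOpen ? 3 : 2;
    }

    /** Peak disk usage in GB for an index of indexGb gigabytes. */
    static long peakGb(long indexGb, boolean readerOpen) {
        return indexGb * peakMultiple(readerOpen);
    }

    public static void main(String[] args) {
        // A 17-18 GB index with a reader open during optimize.
        System.out.println(peakGb(17, true) + "-" + peakGb(18, true) + " GB"); // prints "51-54 GB"
    }
}
```

By this estimate a 90 GB directory is well above the expected peak, which is why the questions here focus on what else might be holding files open.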

Are you reopening readers while optimize is running? In theory that
could tie up even more disk space (e.g. if you didn't close the old
readers).
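
One way to check whether old files are being kept around is to log the total size of the index directory before optimize, after optimize, and again after all readers are closed. Below is a minimal sketch using only the JDK. It assumes the index lives in a single flat directory, and `IndexDirSize` and `totalBytes` are hypothetical names, not part of Lucene.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not Lucene API): reports how much disk space the
// files in an index directory currently occupy.
class IndexDirSize {
    /** Sums the sizes of the regular files directly inside dir (no recursion). */
    static long totalBytes(Path dir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    total += Files.size(f);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // First argument: path of the index directory (defaults to the current one).
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(dir + ": " + totalBytes(dir) + " bytes");
    }
}
```

If the total only drops back to the expected size once the old readers are closed, unclosed readers are a likely cause of the extra disk usage.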

> Did the size grow because optimization failed?

If optimization fails, it removes the partially written files, so I
don't think this would explain the too-high disk usage.

> Does optimization fail if there is any foreign file in the Lucene
> index directory [though we tried optimizing with foreign files in the
> Lucene directory, and Lucene still optimized the index]?

Foreign files are harmless as long as their names don't conflict with
Lucene's file names.

Mike


sendtoprat at yahoo

Feb 10, 2009, 3:14 PM

Post #3 of 3 (896 views)
Re: IndexWriter.optimize() fail

We are using Lucene 2.4.



--
View this message in context: http://www.nabble.com/InderxWriter.optimize%28%29-fail-tp21937277p21944987.html

 
 

