
Mailing List Archive: Lucene: Java-User

background merge hit exception

 

 



gaddamsandeeps at gmail

Jul 12, 2008, 3:05 AM

Post #1 of 8 (562 views)
background merge hit exception

Hi all,
This exception is raised while I am indexing records (I have 10
million records, and I got this exception after indexing 4 million of them):

java.io.IOException: background merge hit exception: _8n:c7759352 _8o:c57658
_8p:c55810 into _8q [optimize]

Please give me a solution.
--
View this message in context: http://www.nabble.com/background-merge-hit-exception-tp18417970p18417970.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-user-help[at]lucene.apache.org


lucene at mikemccandless

Jul 12, 2008, 3:46 AM

Post #2 of 8 (540 views)
Re: background merge hit exception [In reply to]

Normally when optimize throws this exception, either it includes a
"caused by" in the exception, or, you should have seen a stack trace
from a merge thread printed to your stderr.

One quick thing to check is whether you have enough disk space to
complete the optimize. It looks like that was your final merge, which
is the biggest, so you'll need ~1X the size of your index in free
space. EG if your index is 10 GB you need 10 GB of free disk space
before attempting optimize.
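
A quick way to check this before calling optimize is to compare the size of
the index directory with the usable space on its partition. The following is
only a minimal sketch using plain JDK calls (File.getUsableSpace needs Java 6
or later). The class name OptimizeSpaceCheck, its method names, and the
multiplier parameter are made up for illustration and are not part of Lucene.

```java
import java.io.File;

public class OptimizeSpaceCheck {

    /** Recursively sums the sizes of all regular files under dir. */
    public static long directorySize(File dir) {
        long total = 0;
        File[] children = dir.listFiles();
        if (children == null) {
            return 0; // not a directory, or it could not be listed
        }
        for (File child : children) {
            total += child.isDirectory() ? directorySize(child) : child.length();
        }
        return total;
    }

    /** True if freeBytes is at least multiplier times indexBytes. */
    public static boolean hasRoom(long indexBytes, long freeBytes, double multiplier) {
        return freeBytes >= (long) Math.ceil(indexBytes * multiplier);
    }

    public static void main(String[] args) {
        // Hypothetical usage: pass the index directory as the first argument.
        File indexDir = new File(args.length > 0 ? args[0] : ".");
        long indexBytes = directorySize(indexDir);
        long freeBytes = indexDir.getUsableSpace();
        System.out.println("index bytes: " + indexBytes + ", free bytes: " + freeBytes);
        // 1.0 is the ~1X rule of thumb from the paragraph above.
        System.out.println("room for optimize: " + hasRoom(indexBytes, freeBytes, 1.0));
    }
}
```

The multiplier is a parameter so that a more conservative factor can be
passed when other processes are also holding index files open.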

Finally, you can switch temporarily to SerialMergeScheduler and then
run your optimize, eg:

writer.setMergeScheduler(new SerialMergeScheduler());
writer.optimize();

Mike



gaddamsandeeps at gmail

Jul 21, 2008, 12:00 AM

Post #3 of 8 (478 views)
Re: background merge hit exception [In reply to]

Hi,
Thanks for the reply.
But I had enough space on my disk.




lucene at mikemccandless

Jul 21, 2008, 3:09 AM

Post #4 of 8 (472 views)
Re: background merge hit exception [In reply to]

Did you see a "caused by" in your stack trace? Or, a separate
unhandled exception in a thread? Is this reproducible?

Also, did you have a reader open on the index when you ran the
optimize? If so, you'd need 2X the index size free (eg 20 GB for the
10 GB example in my earlier reply).

Mike


lucene at mikemccandless

Sep 18, 2008, 7:19 AM

Post #5 of 8 (296 views)
Re: Background merge hit exception [In reply to]

Lucene tries to carry forward the root cause exception from the merge,
into that IOException that optimize throws. But it doesn't always
succeed in doing so; I'll open a Jira issue and try to figure out why
this is the case.

All the exception "means" is that the optimize didn't finish -- you
still have multiple segments in the index. The index should still be
fine (not corrupt, nothing lost).

What's happening is a BG merge thread is hitting an unhandled
exception. The JRE will log such unhandled exceptions to System.err
by default, so, you should scour the app server's logs to find it (it
should be there).

BTW, that merge that's being attempted is particularly inefficient --
you are merging an immense segment (the first one) with a bunch of
tiny ones. A partial optimize could be much better.

Things to try w/o code changes:

* Use a separate tool, eg Luke, to run optimize and see what
root exception is thrown.

Things to try with code changes:

* Switch to SerialMergeScheduler -- it should still throw the
exception, but you'll see the full root cause.

* As of 2.4 (coming soon), subclass ConcurrentMergeScheduler and
override the handleMergeException to do your own logging so
that you see the detailed root-cause exception.

* Alternatively, before Lucene 2.4, if you are running in JRE 1.5+
environment, you can set the default exception handler for
threads to do your own logging:

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Thread.html#setDefaultUncaughtExceptionHandler(java.lang.Thread.UncaughtExceptionHandler)
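
The last option can be sketched with the JDK alone. This is a minimal
example, not Lucene code: the class name MergeThreadLogging and the LAST
field are made up for illustration, and the worker thread in main is only a
stand-in for a background merge thread that fails. In a real application you
would call install() once at startup and route the output to your own logger.

```java
import java.util.concurrent.atomic.AtomicReference;

public class MergeThreadLogging {

    /** Last uncaught exception seen by the handler, kept so callers can inspect it. */
    public static final AtomicReference<Throwable> LAST = new AtomicReference<Throwable>();

    /** Installs a JVM-wide handler that logs any exception that kills a thread. */
    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            public void uncaughtException(Thread t, Throwable e) {
                LAST.set(e);
                System.err.println("Uncaught exception in thread " + t.getName());
                e.printStackTrace(); // prints the full chain of "Caused by" entries
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        // Stand-in for a background merge thread hitting an unhandled exception.
        Thread worker = new Thread(new Runnable() {
            public void run() {
                throw new RuntimeException("simulated merge failure");
            }
        }, "simulated-merge-thread");
        worker.start();
        worker.join();
    }
}
```

Note that the default handler is only consulted when neither the thread nor
its thread group has a more specific handler of its own.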

Mike

On Sep 17, 2008, at 4:24 PM, vivek sar wrote:

> Hi,
>
> We have been running Lucene 2.3 for the last few months with our
> application and all of a sudden we have hit the following exception,
>
> java.lang.RuntimeException: java.io.IOException: background
> merge hit exception: _2uxy:c11345949 _2uxz:c150 _2uy0:c150 _2uy1:c150
> _2uy2:c150 _2uy3:c150 _2uy4:c82 into _2uy5 [optimize]
>
> I don't see any other error messages (or stacktrace) around this
> exception message. This problem doesn't seem to be recoverable and the
> indexer process is failing even after the reboot of the machine.
>
> I've gone through the mailing list over this issue and saw few
> suggestions,
>
> 1) Make sure you've enough disk space (x2 the index size) - our
> index size is around 5 GB and we have around 50GB space available so
> this shouldn't be the case
> 2) Is your machine multi-core - yes, this application is running on
> Linux box with 8 CPUs, not sure if this is the problem
>
> I can't update the code as this is running on the customer site. Here
> are my questions,
>
> a) Is there any workaround to this problem without updating the
> code base?
> b) Is there a jira opened on this issue?
> c) Has this been fixed in the subsequent Lucene releases?
>
> Thanks,
> -vivek
>


vivextra at gmail

Sep 18, 2008, 4:13 PM

Post #6 of 8 (294 views)
Re: Background merge hit exception [In reply to]

Thanks Mike for the insight. I did check the stdout log and found it
was complaining of not having enough disk space. I thought we need
only x2 of the index size. Our index size is 10G (max) and we had 45G
left on that partition - should it still complain of the space?


Some comments/questions on other issues you raised,


We have 2 threads that index the data in two different indexes and
then we merge them into a master index with following call,

masterWriter.addIndexesNoOptimize(indices);

Once the smaller indices have merged into the master index we delete
the smaller indices.

This process runs every 5 minutes. Master Index can grow up to 10G
before we partition it - move it to other directory and start a new
master index.

Every hour we then optimize the master index using,

writer.optimize(optimizeSegment); //where optimizeSegment = 10

Here are my questions,

1) Is this process flawed in terms of performance and efficiency? What
would you recommend?
2) When you say "partial optimize" what do you mean by that?
3) In Lucene 2.3 "segment merging is done in a background thread" -
how does it work, ie, how does it know which segments to merge? What
would cause this background merge exception?
4) Can we turn off "background merge" if I'm running the optimize
every hour in any case? How do we turn it off?

Thanks,
-vivek





lucene at mikemccandless

Sep 19, 2008, 2:49 AM

Post #7 of 8 (292 views)
Re: Background merge hit exception [In reply to]

vivek sar wrote:

> Thanks Mike for the insight. I did check the stdout log and found it
> was complaining of not having enough disk space. I thought we need
> only x2 of the index size. Our index size is 10G (max) and we had 45G
> left on that partition - should it still complain of the space?

Is there a reader open on the index while optimize is running? That
ties up potentially another 1X.

Are you certain you're closing all previously open readers?

On Linux, because the semantics are "delete on last close", it's hard
to detect when you have IndexReaders still open because an "ls" won't
show the deleted files, yet they are still consuming bytes on disk
until the last open file handle is closed. You can try running "lsof"
while optimize is running to see which files are held open.
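
The "delete on last close" behaviour can be seen with a small JDK-only
experiment. This is just an illustrative sketch (the class name
DeleteOnLastClose and its method are made up, and no Lucene classes are
involved): it opens a temp file, tries to delete it, and then reads the data
back through the handle that is still open. On Linux the delete succeeds and
the file disappears from the directory listing, yet the bytes stay readable,
and stay on disk, until the stream is closed. On platforms that refuse to
delete open files the delete simply reports false.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class DeleteOnLastClose {

    /**
     * Writes payload to a temp file, opens it, tries to delete it, and then
     * reads it back through the still-open handle. Returns the bytes read.
     */
    public static int readAfterDelete(byte[] payload) throws IOException {
        File f = File.createTempFile("segment", ".tmp");
        FileOutputStream out = new FileOutputStream(f);
        try {
            out.write(payload);
        } finally {
            out.close();
        }

        FileInputStream in = new FileInputStream(f);
        try {
            boolean deleted = f.delete();
            System.out.println("deleted=" + deleted + ", still listed=" + f.exists());
            byte[] buf = new byte[payload.length];
            int total = 0;
            int n;
            while (total < buf.length
                    && (n = in.read(buf, total, buf.length - total)) > 0) {
                total += n;
            }
            return total;
        } finally {
            in.close();
            f.delete(); // cleanup where the first delete was refused
        }
    }

    public static void main(String[] args) throws IOException {
        int read = readAfterDelete("still here".getBytes("UTF-8"));
        System.out.println("bytes read after delete attempt: " + read);
    }
}
```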

Also, if you can call IndexWriter.setInfoStream(...) for all of the
operations below, I can peek at it to try to see why it's using up so
much intermediate disk space.

> Some comments/questions on other issues you raised,
>
>
> We have 2 threads that index the data in two different indexes and
> then we merge them into a master index with following call,
>
> masterWriter.addIndexesNoOptimize(indices);
>
> Once the smaller indices have merged into the master index we delete
> the smaller indices.
>
> This process runs every 5 minutes. Master Index can grow up to 10G
> before we partition it - move it to other directory and start a new
> master index.
>
> Every hour we then optimize the master index using,
>
> writer.optimize(optimizeSegment); //where optimizeSegment = 10

How long does that optimize take? And what do you do with the
every-5-minutes job while optimize is running? Do you run it anyway,
sharing the same writer (ie you're calling addIndexesNoOptimize while
another thread is running the optimize)?

>
> Here are my questions,
>
> 1) Is this process flawed in terms of performance and efficiency? What
> would you recommend?

Actually I think your approach is the right approach.
>
> 2) When you say "partial optimize" what do you mean by that?

Actually, it's what you're already doing (passing 10 to optimize).
This means the index just has to reduce itself to <= 10 segments,
instead of the normal 1 segment for a full optimize.

Still I find that particular merge being done somewhat odd: it was
merging 7 segments, the first of which was immense, and the final 6
were tiny. It's not an efficient merge to do. Seeing the infoStream
output might help explain what led to that...
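
To put numbers on why that merge is inefficient, here is a small JDK-only
sketch that pulls the document counts out of the exception message reported
earlier in this thread. It assumes that the digits after ":c" in each segment
entry are that segment's document count. The class name MergeCost and its
methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MergeCost {

    /** Extracts per-segment doc counts from text like "_2uxy:c11345949 _2uxz:c150". */
    public static List<Long> docCounts(String segments) {
        List<Long> counts = new ArrayList<Long>();
        // Assumed format: _<name>:c<docCount>; the merge target has no ":c" part.
        Matcher m = Pattern.compile("_\\w+:[cC](\\d+)").matcher(segments);
        while (m.find()) {
            counts.add(Long.parseLong(m.group(1)));
        }
        return counts;
    }

    /** Sums the given doc counts. */
    public static long sum(List<Long> counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        String merge = "_2uxy:c11345949 _2uxz:c150 _2uy0:c150 _2uy1:c150 "
                + "_2uy2:c150 _2uy3:c150 _2uy4:c82 into _2uy5 [optimize]";
        List<Long> counts = docCounts(merge);
        long total = sum(counts);
        long largest = 0;
        for (long c : counts) {
            largest = Math.max(largest, c);
        }
        System.out.println("segments merged: " + counts.size());
        System.out.println("docs rewritten: " + total);
        System.out.println("docs coming from the small segments: " + (total - largest));
    }
}
```

The point is the ratio: the whole large segment has to be rewritten in order
to absorb a few hundred documents from the small ones.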

>
> 3) In Lucene 2.3 "segment merging is done in a background thread" -
> how does it work, ie, how does it know which segments to merge? What
> would cause this background merge exception?

The selection of segments to merge, and when, is done by the
LogByteSizeMergePolicy, which you can swap out for your own merge
policy (should not in general be necessary). Once a merge is
selected, the execution of that merge is controlled by
ConcurrentMergeScheduler, which runs merges in background threads.
You can also swap that out (eg for SerialMergeScheduler, which uses
the FG thread to do the merging, like Lucene used to before 2.3).

I think the background merge exception is often disk full, but in
general it can be anything that went wrong while merging. Such
exceptions won't corrupt your index because the merge only commits the
changes to the index if it completes successfully.

>
> 4) Can we turn off "background merge" if I'm running the optimize
> every hour in any case? How do we turn it off?

Yes: IndexWriter.setMergeScheduler(new SerialMergeScheduler()) gets
you back to the old (fg thread) way of running merges. But in general
this gets you worse net performance, unless you are already using
multiple threads when adding documents.

Mike



lucene at mikemccandless

Sep 22, 2008, 4:02 PM

Post #8 of 8 (274 views)
Re: Background merge hit exception [In reply to]

OK I found one path whereby optimize would detect that the
ConcurrentMergeScheduler had hit an exception while merging in a BG
thread, and correctly throw an IOException back to its caller, but
fail to set the root cause in that exception. I just committed a fix, so
it should be fixed in 2.4:

https://issues.apache.org/jira/browse/LUCENE-1397

Mike

