
Mailing List Archive: Lucene: Java-User

background merge hit exception

 

 



gaddamsandeeps at gmail

Jul 12, 2008, 3:05 AM

Post #1 of 29 (6635 views)
Permalink
background merge hit exception

Hi all,
This is the exception raised while I am indexing records (I have 10
million records, and after indexing about 4 million of them I got this exception):

java.io.IOException: background merge hit exception: _8n:c7759352 _8o:c57658
_8p:c55810 into _8q [optimize]

Please suggest a solution.
--
View this message in context: http://www.nabble.com/background-merge-hit-exception-tp18417970p18417970.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


lucene at mikemccandless

Jul 12, 2008, 3:46 AM

Post #2 of 29 (6526 views)
Permalink
Re: background merge hit exception [In reply to]

Normally when optimize throws this exception, either it includes a
"caused by" in the exception, or, you should have seen a stack trace
from a merge thread printed to your stderr.

One quick thing to check is whether you have enough disk space to
complete the optimize. It looks like that was your final merge, which
is the biggest, so you'll need ~1X the size of your index in free
space. EG if your index is 10 GB you need 10 GB of free disk space
before attempting optimize.

Finally, you can switch temporarily to SerialMergeScheduler and then
run your optimize, eg:

writer.setMergeScheduler(new SerialMergeScheduler());
writer.optimize();
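
For reference, a complete little program along those lines might look like
this (a sketch against the 2.3-era API; the index path and analyzer are
placeholders for whatever your application actually uses):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.SerialMergeScheduler;
import org.apache.lucene.store.FSDirectory;

public class SerialOptimize {
  public static void main(String[] args) throws Exception {
    // Open the existing index (the path is a placeholder).
    IndexWriter writer = new IndexWriter(
        FSDirectory.getDirectory("/path/to/index"),
        new StandardAnalyzer(), false);
    // With SerialMergeScheduler the merge runs in this thread, so any
    // failure surfaces directly from optimize() with its root cause.
    writer.setMergeScheduler(new SerialMergeScheduler());
    try {
      writer.optimize();
    } finally {
      writer.close();
    }
  }
}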

Mike



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


gaddamsandeeps at gmail

Jul 21, 2008, 12:00 AM

Post #3 of 29 (6462 views)
Permalink
Re: background merge hit exception [In reply to]

Hi,
Thanks for the reply.
But I had enough space on my disk.



--
View this message in context: http://www.nabble.com/background-merge-hit-exception-tp18417970p18563153.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


lucene at mikemccandless

Jul 21, 2008, 3:09 AM

Post #4 of 29 (6445 views)
Permalink
Re: background merge hit exception [In reply to]

Did you see a "caused by" in your stack trace? Or, a separate
unhandled exception in a thread? Is this reproducible?

Also, did you have a reader open on the index when you ran the
optimize? If so, you'd need 2X the index size free (eg 20 GB for the
10 GB example in my previous mail).

Mike



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


lucene at mikemccandless

Sep 18, 2008, 7:19 AM

Post #5 of 29 (6280 views)
Permalink
Re: Background merge hit exception [In reply to]

Lucene tries to carry forward the root cause exception from the merge,
into that IOException that optimize throws. But it doesn't always
succeed in doing so; I'll open a Jira issue and try to figure out why
this is the case.

All the exception "means" is that the optimize didn't finish -- you
still have multiple segments in the index. The index should still be
fine (not corrupt, nothing lost).

What's happening is a BG merge thread is hitting an unhandled
exception. The JRE will log such unhandled exceptions to System.err
by default, so, you should scour the app server's logs to find it (it
should be there).

BTW, that merge that's being attempted is particularly inefficient --
you are merging an immense segment (the first one) with a bunch of
tiny ones. A partial optimize could be much better.

Things to try w/o code changes:

* Use a separate tool, eg Luke, to run optimize and see what
root exception is thrown.

Things to try with code changes:

* Switch to SerialMergeScheduler -- it should still throw the
exception, but you'll see the full root cause.

* As of 2.4 (coming soon), subclass ConcurrentMergeScheduler and
override handleMergeException to do your own logging so that you
see the detailed root-cause exception.

* Alternatively, before Lucene 2.4, if you are running in a JRE 1.5+
environment, you can set the default exception handler for
threads to do your own logging (sketch below):

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Thread.html#setDefaultUncaughtExceptionHandler(java.lang.Thread.UncaughtExceptionHandler)
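
A rough sketch of both approaches (the class name here is made up, and the
handleMergeException signature is from memory of the 2.4 javadocs, so check
it against your release):

import org.apache.lucene.index.ConcurrentMergeScheduler;

public class LoggingMergeScheduler extends ConcurrentMergeScheduler {
  // Signature as of Lucene 2.4 (verify against your version's javadocs).
  protected void handleMergeException(Throwable exc) {
    System.err.println("merge hit an exception:");
    exc.printStackTrace(System.err);
    super.handleMergeException(exc); // keep default behavior (rethrow as MergeException)
  }
}

// Then: writer.setMergeScheduler(new LoggingMergeScheduler());

And the pre-2.4, plain-JDK-1.5 alternative, installed once at startup so an
uncaught exception from a merge thread lands in logging you control:

Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
  public void uncaughtException(Thread t, Throwable e) {
    System.err.println("uncaught exception in thread " + t.getName());
    e.printStackTrace(System.err);
  }
});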

Mike

On Sep 17, 2008, at 4:24 PM, vivek sar wrote:

> Hi,
>
> We have been running Lucene 2.3 for the last few months with our
> application, and all of a sudden we have hit the following exception:
>
> java.lang.RuntimeException: java.io.IOException: background
> merge hit exception: _2uxy:c11345949 _2uxz:c150 _2uy0:c150 _2uy1:c150
> _2uy2:c150 _2uy3:c150 _2uy4:c82 into _2uy5 [optimize]
>
> I don't see any other error messages (or stacktrace) around this
> exception message. This problem doesn't seem to be recoverable and the
> indexer process is failing even after the reboot of the machine.
>
> I've gone through the mailing list on this issue and saw a few
> suggestions:
>
> 1) Make sure you have enough disk space (2x the index size) - our
> index size is around 5 GB and we have around 50 GB available, so
> this shouldn't be the case
> 2) Is your machine multi-core - yes, this application is running on
> Linux box with 8 CPUs, not sure if this is the problem
>
> I can't update the code as this is running on the customer site. Here
> are my questions,
>
> a) Is there any workaround to this problem without updating the
> code base?
> b) Is there a jira opened on this issue?
> c) Has this been fixed in the subsequent Lucene releases?
>
> Thanks,
> -vivek
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> For additional commands, e-mail: java-user-help [at] lucene
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


vivextra at gmail

Sep 18, 2008, 4:13 PM

Post #6 of 29 (6270 views)
Permalink
Re: Background merge hit exception [In reply to]

Thanks Mike for the insight. I did check the stdout log and found it
was complaining of not having enough disk space. I thought we needed
only 2x the index size. Our index size is 10G (max) and we had 45G
left on that partition - should it still complain about space?


Some comments/questions on other issues you raised,


We have 2 threads that index the data into two different indexes, and
then we merge them into a master index with the following call:

masterWriter.addIndexesNoOptimize(indices);

Once the smaller indices have been merged into the master index we
delete the smaller indices.

This process runs every 5 minutes. The master index can grow up to 10G
before we partition it - move it to another directory and start a new
master index.

Every hour we then optimize the master index using:

writer.optimize(optimizeSegment); // where optimizeSegment = 10
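
(Pulled together, that flow is roughly the following; the directory paths
and variable names here are only illustrative:)

// Every 5 minutes: fold the freshly built small indexes into the master.
Directory[] indices = new Directory[] {
    FSDirectory.getDirectory("/indexes/small-1"),
    FSDirectory.getDirectory("/indexes/small-2")
};
masterWriter.addIndexesNoOptimize(indices);

// Every hour: partial optimize down to at most 10 segments, rather than
// the single segment a full optimize() would produce.
masterWriter.optimize(10);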

Here are my questions,

1) Is this process flawed in terms of performance and efficiency? What
would you recommend?
2) When you say "partial optimize" what do you mean by that?
3) In Lucene 2.3 "segment merging is done in a background thread" -
how does it work, ie, how does it know which segments to merge? What
would cause this background merge exception?
4) Can we turn off "background merge" if I'm running the optimize
every hour in any case? How do we turn it off?

Thanks,
-vivek




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


lucene at mikemccandless

Sep 19, 2008, 2:49 AM

Post #7 of 29 (6266 views)
Permalink
Re: Background merge hit exception [In reply to]

vivek sar wrote:

> Thanks Mike for the insight. I did check the stdout log and found it
> was complaining of not having enough disk space. I thought we need
> only x2 of the index size. Our index size is 10G (max) and we had 45G
> left on that parition - should it still complain of the space?

Is there a reader open on the index while optimize is running? That
ties up potentially another 1X.

Are you certain you're closing all previously open readers?

On Linux, because the semantics are "delete on last close", it's hard
to detect when you have IndexReaders still open: an "ls" won't show
the deleted files, yet they are still consuming bytes on disk until
the last open file handle is closed. You could try running "lsof"
while optimize is running to see which files are held open.

Also, if you can call IndexWriter.setInfoStream(...) for all of the
operations below, I can peek at the output to see why it's using up so
much intermediate disk space.
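
(For what it's worth, that call is just the following; the file name is only
an example, any PrintStream works:)

// Send IndexWriter's diagnostics (flushes, merges, file sizes) to a file
// so the output can be inspected after the optimize runs.
java.io.PrintStream infoStream = new java.io.PrintStream(
    new java.io.FileOutputStream("/tmp/lucene-info.log"), true);
writer.setInfoStream(infoStream);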

> Some comments/questions on other issues you raised,
>
>
> We have 2 threads that index the data in two different indexes and
> then we merge them into a master index with following call,
>
> masterWriter.addIndexesNoOptimize(indices);
>
> Once the smaller indices have merged into the master index we delete
> the smaller indices.
>
> This process runs every 5 minutes. Master Index can grow up to 10G
> before we partition it - move it to other directory and start a new
> master index.
>
> Every hour we then optimize the master index using,
>
> writer.optimize(optimizeSegment); //where optimizeSegment =
> 10

How long does that optimize take? And what do you do with the
every-5-minutes job while optimize is running? Do you run it anyway,
sharing the same writer (ie calling addIndexesNoOptimize while another
thread is running the optimize)?

>
> Here are my questions,
>
> 1) Is this process flawed in terms of performance and efficiency? What
> would you recommend?

Actually I think your approach is the right approach.
>
> 2) When you say "partial optimize" what do you mean by that?

Actually, it's what you're already doing (passing 10 to optimize).
This means the index just has to reduce itself to <= 10 segments,
instead of the normal 1 segment for a full optimize.

Still I find that particular merge being done somewhat odd: it was
merging 7 segments, the first of which was immense, and the final 6
were tiny. It's not an efficient merge to do. Seeing the infoStream
output might help explain what led to that...

>
> 3) In Lucene 2.3 "segment merging is done in a background thread" -
> how does it work, ie, how does it know which segments to merge? What
> would cause this background merge exception?

The selection of segments to merge, and when, is done by the
LogByteSizeMergePolicy, which you can swap out for your own merge
policy (should not in general be necessary). Once a merge is
selected, the execution of that merge is controlled by
ConcurrentMergeScheduler, which runs merges in background threads.
You can also swap that out (eg for SerialMergeScheduler, which uses
the FG thread to do the merging, like Lucene did before 2.3).

I think the background merge exception is often disk full, but in
general it can be anything that went wrong while merging. Such
exceptions won't corrupt your index because the merge only commits the
changes to the index if it completes successfully.

>
> 4) Can we turn off "background merge" if I'm running the optimize
> every hour in any case? How do we turn it off?

Yes: IndexWriter.setMergeScheduler(new SerialMergeScheduler()) gets
you back to the old (fg thread) way of running merges. But in general
this gets you worse net performance, unless you are already using
multiple threads when adding documents.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


lucene at mikemccandless

Sep 22, 2008, 4:02 PM

Post #8 of 29 (6256 views)
Permalink
Re: Background merge hit exception [In reply to]

OK I found one path whereby optimize would detect that the
ConcurrentMergeScheduler had hit an exception while merging in a BG
thread, and correctly throw an IOException back to its caller, but
fail to set the root cause in that exception. I just committed it, so
it should be fixed in 2.4:

https://issues.apache.org/jira/browse/LUCENE-1397

Mike



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


lucene at mikemccandless

Jan 2, 2009, 12:44 PM

Post #9 of 29 (5936 views)
Permalink
Re: background merge hit exception [In reply to]

It looks like your index has some kind of corruption. Were there any other
exceptions prior to this one, or, any previous problems with the OS/IO
system?

Can you run CheckIndex (java org.apache.lucene.index.CheckIndex to see
usage) and post the output?
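
(For reference, a typical invocation looks like this; the jar name and the
index path are placeholders for your own:)

java -cp lucene-core-2.4.0.jar org.apache.lucene.index.CheckIndex /path/to/index

Run with no extra arguments it only reports problems; it does not modify
the index.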
Mike

Brian Whitman <brian [at] echonest> wrote:

> I am getting this on a 10GB index (via solr 1.3) during an optimize:
> Jan 2, 2009 6:51:52 PM org.apache.solr.common.SolrException log
> SEVERE: java.io.IOException: background merge hit exception: _ks4:C2504982
> _oaw:C514635 _tll:C827949 _tdx:C18372 _te8:C19929 _tej:C22201 _1agw:C1717926
> _1agz:C1 into _1ah2 [optimize]
> at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2346)
> at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2280)
> at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:355)
> at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
> ...
>
> Exception in thread "Lucene Merge Thread #2"
> org.apache.lucene.index.MergePolicy$MergeException:
> java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
> at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:314)
> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
> at org.apache.lucene.util.BitVector.get(BitVector.java:91)
> at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:125)
> at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:98)
> ...
>
> Does anyone know how this is caused and how I can fix it? It happens with
> every optimize. Commits were very slow on this index as well (40x as slow
> as a similar index on another machine). I have plenty of disk space (many
> 100s of GB) free.
>


brian at echonest

Jan 2, 2009, 12:47 PM

Post #10 of 29 (5946 views)
Permalink
Re: background merge hit exception [In reply to]

I will, but I bet I can guess what happened -- this index has many
duplicates in it as well (the same uniqueKey id multiple times). This
happened to us once before, and it was because the Solr server went down
during an add. We may have to re-index, but I will run CheckIndex now. Thanks
(Thread for dupes here :
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200803.mbox/%3c4ED8C459-1B0F-41CC-986C-4FFCEEF82E55 [at] variogr%3e)



brian at echonest

Jan 2, 2009, 1:02 PM

Post #11 of 29 (5937 views)
Permalink
Re: background merge hit exception [In reply to]

Here's the CheckIndex output:

NOTE: testing will be more thorough if you run java with
'-ea:org.apache.lucene', so assertions are enabled

Opening index @ /vol/solr/data/index/

Segments file=segments_vxx numSegments=8 version=FORMAT_HAS_PROX [Lucene
2.4]
1 of 8: name=_ks4 docCount=2504982
compound=false
hasProx=true
numFiles=11
size (MB)=3,965.695
no deletions
test: open reader.........OK
test: fields, norms.......OK [343 fields]
test: terms, freq, prox...OK [37238560 terms; 161527224 terms/docs
pairs; 186273362 tokens]
test: stored fields.......OK [55813402 total field count; avg 22.281
fields per doc]
test: term vectors........OK [.7998458 total vector count; avg 3.193
term/freq vector fields per doc]

2 of 8: name=_oaw docCount=514635
compound=false
hasProx=true
numFiles=12
size (MB)=746.887
has deletions [delFileName=_oaw_1rb.del]
test: open reader.........OK [155528 deleted docs]
test: fields, norms.......OK [172 fields]
test: terms, freq, prox...OK [7396227 terms; 28146962 terms/docs pairs;
17298364 tokens]
test: stored fields.......OK [5736012 total field count; avg 15.973
fields per doc]
test: term vectors........OK [.1045176 total vector count; avg 2.91
term/freq vector fields per doc]

3 of 8: name=_tll docCount=827949
compound=false
hasProx=true
numFiles=12
size (MB)=761.782
has deletions [delFileName=_tll_2fs.del]
test: open reader.........OK [39283 deleted docs]
test: fields, norms.......OK [180 fields]
test: terms, freq, prox...OK [10925397 terms; 43361019 terms/docs pairs;
42123294 tokens]
test: stored fields.......OK [8673255 total field count; avg 10.997
fields per doc]
test: term vectors........OK [.880272 total vector count; avg 1.116
term/freq vector fields per doc]

4 of 8: name=_tdx docCount=18372
compound=false
hasProx=true
numFiles=12
size (MB)=56.856
has deletions [delFileName=_tdx_9.del]
test: open reader.........OK [18368 deleted docs]
test: fields, norms.......OK [50 fields]
test: terms, freq, prox...OK [261974 terms; 2018842 terms/docs pairs;
150 tokens]
test: stored fields.......OK [76 total field count; avg 19 fields per
doc]
test: term vectors........OK [.14 total vector count; avg 3.5 term/freq
vector fields per doc]

5 of 8: name=_te8 docCount=19929
compound=false
hasProx=true
numFiles=12
size (MB)=60.475
has deletions [delFileName=_te8_a.del]
test: open reader.........OK [19900 deleted docs]
test: fields, norms.......OK [72 fields]
test: terms, freq, prox...OK [276045 terms; 2166958 terms/docs pairs;
1196 tokens]
test: stored fields.......OK [522 total field count; avg 18 fields per
doc]
test: term vectors........OK [.132 total vector count; avg 4.552
term/freq vector fields per doc]

6 of 8: name=_tej docCount=22201
compound=false
hasProx=true
numFiles=12
size (MB)=65.827
has deletions [delFileName=_tej_o.del]
test: open reader.........OK [22171 deleted docs]
test: fields, norms.......OK [50 fields]
test: terms, freq, prox...FAILED
WARNING: would remove reference to this segment (-fix was not
specified); full exception:
java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
at org.apache.lucene.util.BitVector.get(BitVector.java:91)
at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:125)
at
org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:98)
at org.apache.lucene.index.CheckIndex.check(CheckIndex.java:222)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:433)

7 of 8: name=_1agw docCount=1717926
compound=false
hasProx=true
numFiles=12
size (MB)=2,390.413
has deletions [delFileName=_1agw_1.del]
test: open reader.........OK [1 deleted docs]
test: fields, norms.......OK [438 fields]
test: terms, freq, prox...OK [20959015 terms; 101603282 terms/docs
pairs; 123561985 tokens]
test: stored fields.......OK [26248407 total field count; avg 15.279
fields per doc]
test: term vectors........OK [.4911368 total vector count; avg 2.859
term/freq vector fields per doc]

8 of 8: name=_1agz docCount=1
compound=false
hasProx=true
numFiles=8
size (MB)=0
no deletions
test: open reader.........OK
test: fields, norms.......OK [6 fields]
test: terms, freq, prox...OK [6 terms; 6 terms/docs pairs; 6 tokens]
test: stored fields.......OK [6 total field count; avg 6 fields per doc]
test: term vectors........OK [.0 total vector count; avg 0 term/freq
vector fields per doc]

WARNING: 1 broken segments detected
WARNING: 30 documents would be lost if -fix were specified

NOTE: would write new segments file [-fix was not specified]





yseeley at gmail

Jan 2, 2009, 2:09 PM

Post #12 of 29 (5943 views)
Permalink
Re: background merge hit exception [In reply to]

On Fri, Jan 2, 2009 at 3:47 PM, Brian Whitman <brian [at] echonest> wrote:
> I will but I bet I can guess what happened -- this index has many duplicates
> in it as well (same uniqueKey id multiple times) - this happened to us once
> before and it was because the solr server went down during an add.

That should no longer be possible with Solr 1.3, which uses Lucene for
handling the duplicates in a transactional manner.

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


lucene at mikemccandless

Jan 2, 2009, 2:26 PM

Post #13 of 29 (5940 views)
Permalink
Re: background merge hit exception [In reply to]

Also, this (Solr server going down during an add) should not be able to
cause this kind of corruption.
Mike



lucene at mikemccandless

Jan 2, 2009, 2:36 PM

Post #14 of 29 (5936 views)
Permalink
Re: background merge hit exception [In reply to]

So you have a segment (_tej) with 22201 docs, all but 30 of which are
deleted, and somehow one of the posting lists in _tej.frq is referencing an
out-of-bound docID 34950. Odd...

Are you sure the IO system doesn't have any consistency issues? What
environment are you running on (machine, OS, filesystem, JVM)?

You could re-run CheckIndex with -fix to remove that one problematic segment
(you'd lose the 30 docs in it though).
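
(Concretely, that would be something like the following; the jar name is a
placeholder, and it's worth copying the index directory aside first, since
-fix permanently drops the broken segment:)

java -cp lucene-core-2.4.0.jar org.apache.lucene.index.CheckIndex /vol/solr/data/index -fix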
Mike



brian at echonest

Jan 2, 2009, 3:39 PM

Post #15 of 29 (5928 views)
Permalink
Re: background merge hit exception [In reply to]

So my apologies for the duplicate comments; I went to get proof of
duplicates and was confused, as we apparently have duplicates across
different shards now in our distributed setup (a bug on our end). I assumed
when I saw duplicates that it was the same problem as last time. Still
doesn't help me get at my segment corruption problem, though :(

Michael, in answer to your question: Java 1.6 64-bit, Debian Linux, an
Amazon EC2 machine with the index on an elastic block store. No other
problems with that setup for a few months now.

I ran CheckIndex with -fix and optimize still throws the same error.




lucene at mikemccandless

Jan 3, 2009, 5:55 AM

Post #16 of 29 (5900 views)
Permalink
Re: background merge hit exception [In reply to]

It's remotely possible you hit a JVM bug in the past and that caused the
corruption.
EG there is at least one JVM bug lurking that can affect Lucene (though
apparently with an OS-level fault):
https://issues.apache.org/jira/browse/LUCENE-1342

I don't know much about Amazon's elastic block store, but presumably it's
unlikely to have undetected IO errors.

Did this corruption happen only once? (You mentioned hitting dups in the
past... but did you also see corruption too?)

It's very strange that CheckIndex -fix did not resolve the issue. After
fixing it, if you re-run CheckIndex on the index, do you still see that
original broken segment present? CheckIndex should have removed the
reference to that one segment.

Mike



brian at echonest

Jan 3, 2009, 8:04 AM

Post #17 of 29 (5906 views)
Permalink
Re: background merge hit exception [In reply to]

> It's very strange that CheckIndex -fix did not resolve the issue. After
> fixing it, if you re-run CheckIndex on the index, do you still see that
> original broken segment present? CheckIndex should have removed the
> reference to that one segment.

I just ran it again, and it detected the same error and claimed to fix it. I
then shut down the solr server (I wasn't sure if this would be an issue),
ran it a third time (where it again found and claimed to fix the error),
then a fourth where it did not find any problems, and now the optimize()
call on the running server does not throw the merge exception.

> Did this corruption happen only once? (You mentioned hitting dups in
> the past... but did you also see corruption too?)


Not that we know of, but it's very likely we never noticed. (The only reason
I discovered this was our commits were taking 20-40x longer on this index
than others)


vivextra at gmail

Feb 25, 2009, 5:29 PM

Post #18 of 29 (5594 views)
Permalink
Re: background merge hit exception [In reply to]

Hi,

We ran into the same issue (a corrupted index) using Lucene 2.4.0.
There was no outage or system reboot - not sure how it could have
gotten corrupted. Here is the exception:

Caused by: java.io.IOException: background merge hit exception:
_io5:c66777491 _nh9:c10656736 _taq:c2021563 _s8m:c1421051
_uh5:c2065961 _r0y:c1124653 _s4s:c2477731 _t6w:c4340938 _ucx:c8018451
_xkb:c13842776 _xkd:c3394 _xke:c3379 _xkg:c1231 _xkh:c1252 _xkj:c1680
_xkk:c1689 into _xkl [optimize]
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2258)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2194)
at com.packetmotion.manager.query.fulltext.index.Indexer.optimizeMasterIndex(Indexer.java:887)
at com.packetmotion.manager.query.fulltext.index.Indexer.createNewIndexPartition(Indexer.java:783)
at com.packetmotion.manager.query.fulltext.index.Indexer.indexByDevice(Indexer.java:582)
at com.packetmotion.manager.query.fulltext.index.Indexer.indexData(Indexer.java:440)
at com.packetmotion.manager.query.fulltext.index.Indexer$$FastClassByCGLIB$$97fb7e9b.invoke(<generated>)
at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:149)
at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:628)
at com.packetmotion.manager.query.fulltext.index.Indexer$$EnhancerByCGLIB$$ebbe3914.indexData(<generated>)
at com.packetmotion.manager.query.fulltext.index.IndexerJob$1.run(IndexerJob.java:38)
... 8 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 51, Size: 26
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:265)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:185)
at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:729)
at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:359)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4226)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3877)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:205)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:260)


I ran the CheckIndex tool to fix the corrupted index. Here is its output:


Opening index @ /opt/ps_manager/apps/conf/index/MasterIndex

Segments file=segments_1587 numSegments=18 version=FORMAT_HAS_PROX [Lucene 2.4]
1 of 18: name=_io5 docCount=66777491
compound=true
hasProx=true
numFiles=1
size (MB)=6,680.574
no deletions
test: open reader.........OK
test: fields, norms.......OK [30 fields]
test: terms, freq, prox...OK [368415 terms; 1761057989 terms/docs
pairs; 2095636359 tokens]
test: stored fields.......OK [66777491 total field count; avg 1
fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

2 of 18: name=_nh9 docCount=10656736
compound=true
hasProx=true
numFiles=1
size (MB)=1,058.869
no deletions
test: open reader.........OK
test: fields, norms.......OK [30 fields]
test: terms, freq, prox...OK [118964 terms; 278176718 terms/docs
pairs; 329825350 tokens]
test: stored fields.......OK [10656736 total field count; avg 1
fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

3 of 18: name=_taq docCount=2021563
compound=true
hasProx=true
numFiles=1
size (MB)=208.544
no deletions
test: open reader.........OK
test: fields, norms.......OK [30 fields]
test: terms, freq, prox...OK [145494 terms; 54131697 terms/docs
pairs; 65411701 tokens]
test: stored fields.......OK [2021563 total field count; avg 1
fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

4 of 18: name=_s8m docCount=1421051
compound=true
hasProx=true
numFiles=1
size (MB)=162.443
no deletions
test: open reader.........OK
test: fields, norms.......OK [30 fields]
test: terms, freq, prox...OK [210276 terms; 42491363 terms/docs
pairs; 53054214 tokens]
test: stored fields.......OK [1421051 total field count; avg 1
fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

5 of 18: name=_uh5 docCount=2065961
compound=true
hasProx=true
numFiles=1
size (MB)=229.394
no deletions
test: open reader.........OK
test: fields, norms.......OK [30 fields]
test: terms, freq, prox...OK [126789 terms; 60416663 terms/docs
pairs; 75120265 tokens]
test: stored fields.......OK [2065964 total field count; avg 1
fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

6 of 18: name=_r0y docCount=1124653
compound=true
hasProx=true
numFiles=1
size (MB)=115.792
no deletions
test: open reader.........OK
test: fields, norms.......OK [26 fields]
test: terms, freq, prox...FAILED
WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.RuntimeException: term desthost:wir docFreq=6 != num docs
seen 0 + num docs deleted 0
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:475)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:676)

7 of 18: name=_s4s docCount=2477731
compound=true
hasProx=true
numFiles=1
size (MB)=230.698
no deletions
    test: open reader.........OK
test: fields, norms.......OK [25 fields]
test: terms, freq, prox...OK [90327 terms; 60212096 terms/docs
pairs; 72740383 tokens]
test: stored fields.......OK [2477731 total field count; avg 1
fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

8 of 18: name=_t6w docCount=4340938
compound=true
hasProx=true
numFiles=1
size (MB)=451.389
no deletions
test: open reader.........OK
test: fields, norms.......OK [26 fields]
test: terms, freq, prox...OK [157817 terms; 121753990 terms/docs
pairs; 151141568 tokens]
test: stored fields.......OK [4340938 total field count; avg 1
fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

9 of 18: name=_ucx docCount=8018451
compound=true
hasProx=true
numFiles=1
size (MB)=845.968
no deletions
test: open reader.........OK
test: fields, norms.......OK [26 fields]
test: terms, freq, prox...OK [410057 terms; 227617455 terms/docs
pairs; 283398975 tokens]
test: stored fields.......OK [8018451 total field count; avg 1
fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

10 of 18: name=_xl9 docCount=13891933
compound=true
hasProx=true
numFiles=1
size (MB)=1,408.881
no deletions
test: open reader.........OK
test: fields, norms.......OK [26 fields]
test: terms, freq, prox...OK [535226 terms; 376394493 terms/docs
pairs; 465003060 tokens]
test: stored fields.......OK [13891933 total field count; avg 1
fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

11 of 18: name=_xlb docCount=3521
compound=true
hasProx=true
numFiles=1
size (MB)=0.411
no deletions
test: open reader.........OK
test: fields, norms.......OK [25 fields]
test: terms, freq, prox...OK [2070 terms; 108496 terms/docs pairs;
136365 tokens]
test: stored fields.......OK [3521 total field count; avg 1 fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

12 of 18: name=_xlc docCount=3529
compound=true
hasProx=true
numFiles=1
size (MB)=0.412
no deletions
test: open reader.........OK
test: fields, norms.......OK [22 fields]
test: terms, freq, prox...OK [2214 terms; 108633 terms/docs pairs;
136384 tokens]
test: stored fields.......OK [3529 total field count; avg 1 fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

13 of 18: name=_xle docCount=1401
compound=true
hasProx=true
numFiles=1
size (MB)=0.188
no deletions
test: open reader.........OK
test: fields, norms.......OK [21 fields]
test: terms, freq, prox...OK [1854 terms; 48703 terms/docs pairs;
62221 tokens]
test: stored fields.......OK [1401 total field count; avg 1 fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

14 of 18: name=_xlf docCount=1399
compound=true
hasProx=true
numFiles=1
size (MB)=0.188
no deletions
test: open reader.........OK
test: fields, norms.......OK [21 fields]
test: terms, freq, prox...OK [1763 terms; 48910 terms/docs pairs;
63035 tokens]
test: stored fields.......OK [1399 total field count; avg 1 fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

15 of 18: name=_xlh docCount=1727
compound=true
hasProx=true
numFiles=1
size (MB)=0.24
no deletions
test: open reader.........OK
test: fields, norms.......OK [21 fields]
test: terms, freq, prox...OK [2318 terms; 62596 terms/docs pairs;
80688 tokens]
test: stored fields.......OK [1727 total field count; avg 1 fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

16 of 18: name=_xli docCount=1716
compound=true
hasProx=true
numFiles=1
size (MB)=0.237
no deletions
test: open reader.........OK
test: fields, norms.......OK [21 fields]
test: terms, freq, prox...OK [2264 terms; 61867 terms/docs pairs;
79497 tokens]
test: stored fields.......OK [1716 total field count; avg 1 fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

17 of 18: name=_xlk docCount=2921
compound=true
hasProx=true
numFiles=1
size (MB)=0.364
no deletions
test: open reader.........OK
test: fields, norms.......OK [25 fields]
test: terms, freq, prox...OK [2077 terms; 96536 terms/docs pairs;
123166 tokens]
test: stored fields.......OK [2921 total field count; avg 1 fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

18 of 18: name=_xll docCount=3876
compound=true
hasProx=true
numFiles=1
size (MB)=0.476
no deletions
test: open reader.........OK
test: fields, norms.......OK [21 fields]
test: terms, freq, prox...OK [2261 terms; 130104 terms/docs pairs;
166867 tokens]
test: stored fields.......OK [3876 total field count; avg 1 fields per doc]
test: term vectors........OK [.0 total vector count; avg 0
term/freq vector fields per doc]

WARNING: 1 broken segments (containing 1124653 documents) detected
WARNING: 1124653 documents will be lost

NOTE: will write new segments file in 5 seconds; this will remove
1124653 docs from the index. THIS IS YOUR LAST CHANCE TO CTRL+C!
5...
4...
3...
2...
1...
Writing...
OK
Wrote new segments file "segments_1588"


Any ideas how the index could have gotten corrupted?

Thanks,
-vivek


On Sat, Jan 3, 2009 at 8:04 AM, Brian Whitman <brian [at] echonest> wrote:
>>
>>
>> It's very strange that CheckIndex -fix did not resolve the issue. After
>> fixing it, if you re-run CheckIndex on the index do you still see that
>> original one broken segment present? CheckIndex should have removed
>> reference to that one segment.
>>
>
> I just ran it again, and it detected the same error and claimed to fix it. I
> then shut down the solr server (I wasn't sure if this would be an issue),
> ran it a third time (where it again found and claimed to fix the error),
> then a fourth where it did not find any problems, and now the optimize()
> call on the running server does not throw the merge exception.
>
> Did this corruption happen only once? (You mentioned hitting dups in
> the past...
>> but did you also see corruption too?)
>
>
> Not that we know of, but it's very likely we never noticed. (The only reason
> I discovered this was our commits were taking 20-40x longer on this index
> than others)
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


lucene at mikemccandless

Feb 26, 2009, 3:52 AM

Post #19 of 29 (5600 views)
Permalink
Re: background merge hit exception [In reply to]

The exception you hit during optimize indicates some corruption in the
stored fields file (_X.fdt). Then, the exception hit during
CheckIndex was different -- the postings entry (_r0y.frq) for term
"desthost:wir" was supposed to have 6 entries but had 0.

Did "CheckIndex -fix" allow you to optimize successfully? In which
case, both of these different corruptions were in the same segment
(_r0y) that CheckIndex removed.
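
For reference, the same check/fix can also be driven from code; a
minimal sketch, assuming Lucene 2.4's programmatic CheckIndex API
(CheckIndex(Directory), checkIndex(), fixIndex(Status)) and a
placeholder index path:

  import java.io.IOException;
  import org.apache.lucene.index.CheckIndex;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;

  public class CheckAndFixIndex {
    public static void main(String[] args) throws IOException {
      // Placeholder path -- point this at the real index directory.
      Directory dir = FSDirectory.getDirectory("/path/to/index");
      CheckIndex checker = new CheckIndex(dir);
      CheckIndex.Status status = checker.checkIndex();   // read-only diagnosis
      if (!status.clean) {
        // Rewrites the segments file without the broken segment(s);
        // the documents in those segments are lost.
        checker.fixIndex(status);
      }
      dir.close();
    }
  }

Run it against a copy of the index first, since fixIndex() permanently
drops the documents in any broken segments.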

Was this index fully created with 2.4.0 or a prior release?

Which exact JRE was used to create/write to this index? And which one
was used for searching it?

How often do you hit this? Is it repeatable?

It's remotely possible you have bad hardware (RAM, hard drive).

Most likely it's gone... but if somehow I could get access to that bad
segment, I could try to dig.

vivek sar wrote:

> Hi,
>
> We ran into the same issue (corrupted index) using Lucene 2.4.0.
> There was no outage or system reboot - not sure how could it get
> corrupted. Here is the exception,
>
> Caused by: java.io.IOException: background merge hit exception:
> _io5:c66777491 _nh9:c10656736 _taq:c2021563 _s8m:c1421051
> _uh5:c2065961 _r0y:c1124653 _s4s:c2477731 _t6w:c4340938 _ucx:c8018451
> _xkb:c13842776 _xkd:c3394 _xke:c3379 _xkg:c1231 _xkh:c1252 _xkj:c1680
> _xkk:c1689 into _xkl [optimize]
> at
> org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2258)
> at
> org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2194)
> at
> com
> .packetmotion
> .manager
> .query.fulltext.index.Indexer.optimizeMasterIndex(Indexer.java:887)
> at
> com
> .packetmotion
> .manager
> .query.fulltext.index.Indexer.createNewIndexPartition(Indexer.java:
> 783)
> at
> com
> .packetmotion
> .manager.query.fulltext.index.Indexer.indexByDevice(Indexer.java:582)
> at
> com
> .packetmotion
> .manager.query.fulltext.index.Indexer.indexData(Indexer.java:440)
> at com.packetmotion.manager.query.fulltext.index.Indexer$
> $FastClassByCGLIB$$97fb7e9b.invoke(<generated>)
> at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:149)
> at org.springframework.aop.framework.Cglib2AopProxy
> $DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:628)
> at com.packetmotion.manager.query.fulltext.index.Indexer$
> $EnhancerByCGLIB$$ebbe3914.indexData(<generated>)
> at com.packetmotion.manager.query.fulltext.index.IndexerJob
> $1.run(IndexerJob.java:38)
> ... 8 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 51, Size: 26
> at java.util.ArrayList.RangeCheck(ArrayList.java:547)
> at java.util.ArrayList.get(ArrayList.java:322)
> at
> org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:265)
> at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:
> 185)
> at
> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:729)
> at
> org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:
> 359)
> at
> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
> at
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4226)
> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:
> 3877)
> at
> org
> .apache
> .lucene
> .index
> .ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:205)
> at org.apache.lucene.index.ConcurrentMergeScheduler
> $MergeThread.run(ConcurrentMergeScheduler.java:260)
>
> [... CheckIndex output and the rest of the quoted message snipped; see the full text earlier in the thread ...]


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


lucene at mikemccandless

Feb 26, 2009, 3:58 AM

Post #20 of 29 (5583 views)
Permalink
Re: background merge hit exception [In reply to]

Also: what sorts of stored fields are you storing? Binary?
Compressed? Text with unicode characters? Roughly how many stored
fields per document?
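
By "compressed" here I mean fields added with Field.Store.COMPRESS. A
rough, hypothetical sketch of the three kinds of stored field (not
taken from your code):

  Document doc = new Document();
  doc.add(new Field("title", "plain stored text", Field.Store.YES, Field.Index.ANALYZED));
  doc.add(new Field("body", "compressed stored text", Field.Store.COMPRESS, Field.Index.ANALYZED));
  doc.add(new Field("raw", new byte[] {1, 2, 3}, Field.Store.YES));   // binary stored field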

Mike

vivek sar wrote:

> [... original message quoted in full; snipped -- see the full text earlier in the thread ...]


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


koji at r

Sep 9, 2011, 3:25 PM

Post #21 of 29 (3895 views)
Permalink
Re: background merge hit exception [In reply to]

Hi Mike,

> This wouldn't be caused by the "NFS happily deletes open files"
> problem (= Stale NFS file handle error).
>
> But this could in theory be caused by the NFS client somehow being
> wrong about the file's metadata (file length). It's sort of odd
> because I would expect since the client wrote the file, there wouldn't
> be any stale client-side cache problems.

I see.

> What happened is SegmentMerger just merged all the stored docs, and as
> a check in the end it verifies that the fdx file size is exactly 4 +
> numDocs*8 bytes in length, but in your case it wasn't -- it was 10572
> bytes short, and so it aborts the merge.
>
> Is it possible disk filled up? Though I'd expect an IOE during write
> or close in that case.
>
> In this case nothing should be lost in the index: the merge simply
> refused to commit itself, since it detected something went wrong. But
> I believe we also have the same check during flush... have they hit an
> exception during flush?

I'll ask the user if there were such cases.
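
For reference, the length check described above works out like this
with the numbers from the original stack trace (just a rough sketch,
not code from the actual index):

  long docCount = 290089L;                // docCount of the aborted merge segment (_73is)
  long expected = 4 + docCount * 8;       // 4-byte header + one 8-byte pointer per doc = 2320716
  long actual   = 2310144L;               // fdx file size reported in the exception
  System.out.println(expected - actual);  // 10572 -> the "10572 bytes short"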

> Also: what java version are they running? We added this check
> originally as a workaround for a JRE bug... but usually when that bug
> strikes the file size is very close (like off by just 1 byte or 8
> bytes or something).

I think they are using 1.6, but I should ask the minor number.
Could you point me to the JRE bug you mentioned?

Thank you very much!

koji

>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> 2011/9/9 Koji Sekiguchi<koji [at] r>:
>> A user here hit the exception in the title when optimizing. They're using Solr 1.4
>> (Lucene 2.9) running on a server that mounts NFS for the index.
>>
>> I think I know the famous "Stale NFS File Handle IOException" problem, but I think it causes
>> FileNotFoundException. Is there any chance of hitting the exception in the title due
>> to NFS? If so, what is the mechanism?
>>
>> The full stack trace is:
>>
>> 2011/09/07 9:40:00 org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: start commit(optimize=true,waitFlush=true,waitSearcher=true,expungeDeletes=false)
>>
>> :
>>
>> 2011/09/07 9:40:52 org.apache.solr.update.processor.LogUpdateProcessor finish
>> INFO: {} 0 52334
>> 2011/09/07 9:40:52 org.apache.solr.common.SolrException log
>> FATAL: java.io.IOException: background merge hit exception: _73ie:C290089 _73if:C34 _73ig:C31
>> _73ir:C356 into _73is [optimize] [mergeDocStores]
>> at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2908)
>> at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2829)
>> at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:403)
>> at
>> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
>> at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:169)
>> at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>> at
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>> at
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>> at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
>> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>> at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568)
>> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>> at
>> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>> at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.lang.RuntimeException: mergeFields produced an invalid result: docCount is 290089
>> but fdx file size is 2310144 file=_73is.fdx file exists?=true; now aborting this merge to prevent
>> index corruption
>> at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:369)
>> at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:153)
>> at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5112)
>> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4675)
>> at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
>> at
>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
>>
>>
>> koji
>> --
>> Check out "Query Log Visualizer" for Apache Solr
>> http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
>> http://www.rondhuit.com/en/
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
>> For additional commands, e-mail: java-user-help [at] lucene
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> For additional commands, e-mail: java-user-help [at] lucene
>
>


--
Check out "Query Log Visualizer" for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


koji at r

Sep 9, 2011, 3:50 PM

Post #22 of 29 (3897 views)
Permalink
Re: background merge hit exception [In reply to]

>> Also: what java version are they running? We added this check
>> originally as a workaround for a JRE bug... but usually when that bug
>> strikes the file size is very close (like off by just 1 byte or 8
>> bytes or something).
>
> I think they are using 1.6, but I should ask the minor number.
> Could you point me to the JRE bug you mentioned?

I found:

https://issues.apache.org/jira/browse/LUCENE-1282
https://issues.apache.org/jira/browse/LUCENE-1869

koji
--
Check out "Query Log Visualizer" for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


lucene at mikemccandless

Sep 10, 2011, 3:58 AM

Post #23 of 29 (3889 views)
Permalink
Re: background merge hit exception [In reply to]

On Fri, Sep 9, 2011 at 6:50 PM, Koji Sekiguchi <koji [at] r> wrote:
>>> Also: what java version are they running? We added this check
>>> originally as a workaround for a JRE bug... but usually when that bug
>>> strikes the file size is very close (like off by just 1 byte or 8
>>> bytes or something).
>>
>> I think they are using 1.6, but I should ask the minor number.
>> Could you point me to the JRE bug you mentioned?
>
> I found:
>
> https://issues.apache.org/jira/browse/LUCENE-1282
> https://issues.apache.org/jira/browse/LUCENE-1869

Yes, it's LUCENE-1282!

Somehow hotspot had trouble properly compiling an if statement
embedded in a for loop when that if statement "rarely" changes its
condition. But I believe it was fixed by 1.6.0_10.
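
A quick, generic way to double-check the exact JRE the indexing
process itself runs on (in case it differs from whatever "java
-version" reports on the shell PATH):

  System.out.println(System.getProperty("java.version"));   // e.g. "1.6.0_10"
  System.out.println(System.getProperty("java.vm.name") + " "
      + System.getProperty("java.vm.version"));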

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


simon.willnauer at googlemail

Sep 10, 2011, 10:49 AM

Post #24 of 29 (3885 views)
Permalink
Re: background merge hit exception [In reply to]

On Sat, Sep 10, 2011 at 12:58 PM, Michael McCandless
<lucene [at] mikemccandless> wrote:
> On Fri, Sep 9, 2011 at 6:50 PM, Koji Sekiguchi <koji [at] r> wrote:
>>>> Also: what java version are they running? We added this check
>>>> originally as a workaround for a JRE bug... but usually when that bug
>>>> strikes the file size is very close (like off by just 1 byte or 8
>>>> bytes or something).
>>>
>>> I think they are using 1.6, but I should ask the minor number.
>>> Could you point me to the JRE bug you mentioned?
>>
>> I found:
>>
>> https://issues.apache.org/jira/browse/LUCENE-1282
>> https://issues.apache.org/jira/browse/LUCENE-1869
>
> Yes, it's LUCENE-1282!
>
> Somehow hotspot had trouble properly compiling an if statement
> embedded in a for loop when that if statement "rarely" changes its
> condition.  But I believe it was fixed by 1.6.0_10.

oh boy, sometimes I think using plain C isn't the worst idea :/

simon
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> For additional commands, e-mail: java-user-help [at] lucene
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


koji at r

Sep 12, 2011, 5:42 PM

Post #25 of 29 (3866 views)
Permalink
Re: background merge hit exception [In reply to]

I've got some follow-up from the user.

> Is it possible disk filled up? Though I'd expect an IOE during write
> or close in that case.
>
> In this case nothing should be lost in the index: the merge simply
> refused to commit itself, since it detected something went wrong. But
> I believe we also have the same check during flush... have they hit an
> exception during flush?

They couldn't find any errors (including disk full) in their Solr log, Tomcat
log, or syslog, except for the exception in the title.

> Also: what java version are they running? We added this check
> originally as a workaround for a JRE bug... but usually when that bug
> strikes the file size is very close (like off by just 1 byte or 8
> bytes or something).

They are using JDK6u15.

If you can think of anything that might be the cause of this problem, please let me know!

koji
--
Check out "Query Log Visualizer" for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/

(11/09/09 21:36), Michael McCandless wrote:
> Interesting...
>
> This wouldn't be caused by the "NFS happily deletes open files"
> problem (= Stale NFS file handle error).
>
> But this could in theory be caused by the NFS client somehow being
> wrong about the file's metadata (file length). It's sort of odd
> because I would expect since the client wrote the file, there wouldn't
> be any stale client-side cache problems.
>
> What happened is SegmentMerger just merged all the stored docs, and as
> a check in the end it verifies that the fdx file size is exactly 4 +
> numDocs*8 bytes in length, but in your case it wasn't -- it was 10572
> bytes short, and so it aborts the merge.
>
> Is it possible disk filled up? Though I'd expect an IOE during write
> or close in that case.
>
> In this case nothing should be lost in the index: the merge simply
> refused to commit itself, since it detected something went wrong. But
> I believe we also have the same check during flush... have they hit an
> exception during flush?
>
> Also: what java version are they running? We added this check
> originally as a workaround for a JRE bug... but usually when that bug
> strikes the file size is very close (like off by just 1 byte or 8
> bytes or something).
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> 2011/9/9 Koji Sekiguchi<koji [at] r>:
>> A user here hit the exception the title says when optimizing. They're using Solr 1.4
>> (Lucene 2.9) running on a server that mounts NFS for index.
>>
>> I think I know the famous "Stale NFS File Handle IOException" problem, but I think it causes
>> FileNoutFoundException. Is there any chance to hit the exception in the title due
>> to NFS? If so what is the mechanism?
>>
>> The full stack trace is:
>>
>> 2011/09/07 9:40:00 org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: start commit(optimize=true,waitFlush=true,waitSearcher=true,expungeDeletes=false)
>>
>> :
>>
>> 2011/09/07 9:40:52 org.apache.solr.update.processor.LogUpdateProcessor finish
>> INFO: {} 0 52334
>> 2011/09/07 9:40:52 org.apache.solr.common.SolrException log
>> FATAL: java.io.IOException: background merge hit exception: _73ie:C290089 _73if:C34 _73ig:C31
>> _73ir:C356 into _73is [optimize] [mergeDocStores]
>> at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2908)
>> at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2829)
>> at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:403)
>> at
>> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
>> at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:169)
>> at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>> at
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>> at
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>> at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
>> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>> at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568)
>> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>> at
>> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>> at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.lang.RuntimeException: mergeFields produced an invalid result: docCount is 290089
>> but fdx file size is 2310144 file=_73is.fdx file exists?=true; now aborting this merge to prevent
>> index corruption
>> at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:369)
>> at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:153)
>> at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5112)
>> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4675)
>> at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
>> at
>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291\
>> )
>>
>>
>> koji
>> --
>> Check out "Query Log Visualizer" for Apache Solr
>> http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
>> http://www.rondhuit.com/en/
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
>> For additional commands, e-mail: java-user-help [at] lucene
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> For additional commands, e-mail: java-user-help [at] lucene
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene
