Mailing List Archive: Lucene: Java-Dev

[jira] [Commented] (SOLR-3180) ChaosMonkey test failures

 

 



jira at apache

Feb 29, 2012, 6:37 AM

Post #1 of 3 (71 views)
[jira] [Commented] (SOLR-3180) ChaosMonkey test failures

[ https://issues.apache.org/jira/browse/SOLR-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219246#comment-13219246 ]

Yonik Seeley commented on SOLR-3180:
------------------------------------

We've come a long way: the monkey has uncovered a number of bugs that we've fixed, and it is helping to make this a really solid solution.

I just uncovered another one, a race on shutdown.
Killing the Jetty instance can raise an interrupted exception that closes the underlying NIO files under Lucene.
If a commit is happening concurrently, we can end up with more than one unfinished transaction log.

We call preCommit, which moves the current tlog to prevTlog.
The commit fails, but other updates are coming in concurrently and they cause a new tlog to be created.
Updates that arrive after this point can still succeed, since they are simply buffered in memory by the IW.
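
To make the sequence above concrete, here is a toy Java model of the log rotation. It is only a sketch: ToyUpdateLog, ToyTlog, postCommit(boolean) and unfinishedLogs() are hypothetical names invented for illustration and are not Solr's actual update log code. It runs the steps on one thread in the order the race would produce them.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of transaction-log rotation; NOT Solr's real implementation. */
class ToyUpdateLog {
    /** A log counts as "capped" once its contents are known to be in the index. */
    static final class ToyTlog {
        final List<String> entries = new ArrayList<>();
        boolean capped = false;
    }

    private ToyTlog tlog;      // log receiving current updates, created lazily
    private ToyTlog prevTlog;  // log that the in-flight commit is meant to cover
    private final List<ToyTlog> allLogs = new ArrayList<>();

    /** Appends an update, creating a new tlog if none is open. */
    void add(String doc) {
        if (tlog == null) {
            tlog = new ToyTlog();
            allLogs.add(tlog);
        }
        tlog.entries.add(doc);
    }

    /** Rotation step: the current tlog becomes prevTlog. */
    void preCommit() {
        prevTlog = tlog;
        tlog = null;
    }

    /** Caps prevTlog only if the commit succeeded (an assumption of this sketch). */
    void postCommit(boolean commitSucceeded) {
        if (commitSucceeded && prevTlog != null) {
            prevTlog.capped = true;
        }
    }

    /** Number of logs that would need replaying at the next startup. */
    long unfinishedLogs() {
        return allLogs.stream().filter(l -> !l.capped).count();
    }
}

class TlogRaceDemo {
    public static void main(String[] args) {
        ToyUpdateLog log = new ToyUpdateLog();
        log.add("doc1");
        log.preCommit();        // tlog -> prevTlog
        log.add("doc2");        // concurrent update opens a new tlog
        log.postCommit(false);  // commit failed because of the interrupt
        log.add("doc3");        // later updates are still accepted
        System.out.println("unfinished logs: " + log.unfinishedLogs());
    }
}
```

In this model both the rotated log and the newly created one remain uncapped, which is the "more than one unfinished transaction log" state described above.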

> ChaosMonkey test failures
> -------------------------
>
> Key: SOLR-3180
> URL: https://issues.apache.org/jira/browse/SOLR-3180
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: Yonik Seeley
>
> Handle intermittent failures in the ChaosMonkey tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Mar 2, 2012, 2:15 PM

Post #2 of 3 (66 views)
[jira] [Commented] (SOLR-3180) ChaosMonkey test failures [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221300#comment-13221300 ]

Yonik Seeley commented on SOLR-3180:
------------------------------------

Just checked in a fix for this as well as a test that recovers from more than one tlog at startup.
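
As an illustration of what recovering from more than one tlog at startup could involve, the sketch below replays every uncapped log in age order so that the newest value for a document wins. This is an assumption about the general approach, not the checked-in fix: ToyTlogRecovery, Tlog, Update and recover() are hypothetical names, and the index is modeled as a plain map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy startup recovery; names are hypothetical, not Solr's API. */
class ToyTlogRecovery {
    /** One logged update: a document id and its new value. */
    static final class Update {
        final String id;
        final String value;
        Update(String id, String value) { this.id = id; this.value = value; }
    }

    /** A log found at startup; capped means its updates are already committed. */
    static final class Tlog {
        final List<Update> updates = new ArrayList<>();
        final boolean capped;
        Tlog(boolean capped) { this.capped = capped; }
        Tlog add(String id, String value) {
            updates.add(new Update(id, value));
            return this;
        }
    }

    /** Replays every uncapped log, oldest first; returns the number of updates replayed. */
    static int recover(Map<String, String> index, List<Tlog> logsOldestFirst) {
        int replayed = 0;
        for (Tlog log : logsOldestFirst) {
            if (log.capped) {
                continue; // contents are assumed to be in the index already
            }
            for (Update u : log.updates) {
                index.put(u.id, u.value);
                replayed++;
            }
        }
        return replayed;
    }
}

class TlogRecoveryDemo {
    public static void main(String[] args) {
        Map<String, String> index = new HashMap<>();
        index.put("a", "v1"); // state left by the last successful commit
        List<ToyTlogRecovery.Tlog> logs = Arrays.asList(
            new ToyTlogRecovery.Tlog(true).add("a", "v1"),
            new ToyTlogRecovery.Tlog(false).add("a", "v2").add("b", "v1"),
            new ToyTlogRecovery.Tlog(false).add("b", "v2"));
        int replayed = ToyTlogRecovery.recover(index, logs);
        System.out.println(replayed + " updates replayed, index = " + index);
    }
}
```

The point of the sketch is only that recovery must not stop after the first unfinished log; ordering and deduplication in a real system would need more care.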



jira at apache

Mar 3, 2012, 2:05 PM

Post #3 of 3 (60 views)
[jira] [Commented] (SOLR-3180) ChaosMonkey test failures [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221712#comment-13221712 ]

Yonik Seeley commented on SOLR-3180:
------------------------------------

Failures are *much* less frequent, but I still got one after about 7 hours, I think.
I saw a commit fail (due to the interrupted exception), but later saw IW.close() succeed, which caused Solr to cap the log file on the assumption that everything was in the index.

As a result, I just committed a change to the shutdown code to do an explicit commit.
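
One way to picture the idea of an explicit commit on shutdown is the toy Java sketch below. It is an assumption-laden illustration, not the committed change: ToyWriter, ScriptedWriter and shutdown() are hypothetical stand-ins rather than Lucene or Solr APIs, and tying the capping decision to the commit result is this sketch's own reading of the comment.

```java
import java.io.IOException;

/** Toy shutdown sequence; hypothetical stand-ins, not Lucene or Solr APIs. */
class ToyShutdown {
    /** Minimal writer abstraction used only by this sketch. */
    interface ToyWriter {
        void commit() throws Exception;
        void close() throws Exception;
    }

    /** Writer whose commit can be scripted to fail, as if interrupted. */
    static final class ScriptedWriter implements ToyWriter {
        private final boolean commitFails;
        boolean closed = false;
        ScriptedWriter(boolean commitFails) { this.commitFails = commitFails; }

        @Override public void commit() throws Exception {
            if (commitFails) {
                throw new IOException("simulated interrupt during commit");
            }
        }

        @Override public void close() { closed = true; }
    }

    /**
     * Commits explicitly, then closes. Returns true only if the commit
     * succeeded; in this sketch the caller caps the log only in that case,
     * instead of inferring success from close() alone.
     */
    static boolean shutdown(ToyWriter writer) {
        boolean committed = false;
        try {
            writer.commit();
            committed = true;
        } catch (Exception e) {
            // Commit failed: the log must stay uncapped so it is replayed later.
        }
        try {
            writer.close();
        } catch (Exception e) {
            // Best-effort close; the commit result already decides capping.
        }
        return committed;
    }
}

class ShutdownDemo {
    public static void main(String[] args) {
        boolean capAllowed = ToyShutdown.shutdown(new ToyShutdown.ScriptedWriter(true));
        System.out.println("may cap log: " + capAllowed);
    }
}
```

With a failing commit the method reports that the log may not be capped even though close() completes, which avoids the false assumption described in the failure above.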

