Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-Dev

[jira] [Commented] (LUCENE-4025) ReferenceManager.maybeRefresh should allow the caller to block

 

 

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded


jira at apache

Apr 30, 2012, 9:17 AM

Post #1 of 7 (65 views)
Permalink
[jira] [Commented] (LUCENE-4025) ReferenceManager.maybeRefresh should allow the caller to block

[ https://issues.apache.org/jira/browse/LUCENE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265004#comment-13265004 ]

Michael McCandless commented on LUCENE-4025:
--------------------------------------------

I'm torn here...

Always blocking seems dangerous because "simple" apps, that call .maybeRefresh() from searching threads, will suddenly see all search threads block when a reopen is happening ... I think it's too trappy to do that. I think when it comes to concurrent code you should try hard to not have "blocks all threads" trappiness...

But I also agree it's a hassle now for callers that want to ensure specific changes are now visible... though, we do have NRTManager for that use case.

Maybe we can add an optional boolean (doWait, defaulting to false) to maybeRefresh? Or we a separate method for it (maybeRefreshAndWaitIfItsAlreadyHappening), or, we can make decoupled method (waitForRefreshToFinish) and change maybeRefresh to return 3 results (IS_CURRENT, DID_REFRESH, OTHER_THREAD_IS_REFRESHING)... or, something else?

> ReferenceManager.maybeRefresh should allow the caller to block
> --------------------------------------------------------------
>
> Key: LUCENE-4025
> URL: https://issues.apache.org/jira/browse/LUCENE-4025
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/search
> Reporter: Shai Erera
> Priority: Minor
> Fix For: 4.0
>
>
> ReferenceManager.maybeRefresh() returns a boolean today, specifying whether the maybeRefresh logic was executed by the caller or not. If it's false, it means that another thread is currently refreshing and the call returns immediately.
> I think that that's inconvenient to the caller. I.e., if you wanted to do something like:
> {code}
> writer.commit();
> searcherManager.maybeRefresh();
> searcherManager.acquire();
> {code}
> It'd be better if you could guarantee that when the maybeRefresh() call returned, the follow on acquire() will return a refreshed IndexSearcher. Even if you omit the commit instruction, it'd be good if you can guarantee that.
> I don't quite see the benefit of having the caller thread not block if there's another thread currently refreshing. In, I believe, most cases, you'd anyway have just one thread calling maybeRefresh(). Even if not, the only benefit of not blocking is if you have commit() followed by maybeRefresh() logic done by some threads, while other threads acquire searchers - maybe then you wouldn't care if another thread is currently doing the refresh?
> Actually, I tend to think that not blocking is buggy? I mean, what if two threads do commit() + maybeRefresh(). The first thread finishes commit, enters maybeRefresh(), acquires the lock and reopens the Reader. Then the second thread does its commit(), enters maybeRefresh, fails to obtain the lock and exits. Its changes won't be exposed by SM until the next maybeRefresh() is called.
> So it looks to me like current logic may be buggy in that sense?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Apr 30, 2012, 9:55 AM

Post #2 of 7 (58 views)
Permalink
[jira] [Commented] (LUCENE-4025) ReferenceManager.maybeRefresh should allow the caller to block [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265027#comment-13265027 ]

Hoss Man commented on LUCENE-4025:
----------------------------------

isn't this exactly why things like the ExecutorService and/or Future APIs exist? let the caller decide if they want the methods they call to be executed in the current thread or as part of a thread pool and/or block until the logic is executed (with a possible time limit)

> ReferenceManager.maybeRefresh should allow the caller to block
> --------------------------------------------------------------
>
> Key: LUCENE-4025
> URL: https://issues.apache.org/jira/browse/LUCENE-4025
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/search
> Reporter: Shai Erera
> Priority: Minor
> Fix For: 4.0
>
>
> ReferenceManager.maybeRefresh() returns a boolean today, specifying whether the maybeRefresh logic was executed by the caller or not. If it's false, it means that another thread is currently refreshing and the call returns immediately.
> I think that that's inconvenient to the caller. I.e., if you wanted to do something like:
> {code}
> writer.commit();
> searcherManager.maybeRefresh();
> searcherManager.acquire();
> {code}
> It'd be better if you could guarantee that when the maybeRefresh() call returned, the follow on acquire() will return a refreshed IndexSearcher. Even if you omit the commit instruction, it'd be good if you can guarantee that.
> I don't quite see the benefit of having the caller thread not block if there's another thread currently refreshing. In, I believe, most cases, you'd anyway have just one thread calling maybeRefresh(). Even if not, the only benefit of not blocking is if you have commit() followed by maybeRefresh() logic done by some threads, while other threads acquire searchers - maybe then you wouldn't care if another thread is currently doing the refresh?
> Actually, I tend to think that not blocking is buggy? I mean, what if two threads do commit() + maybeRefresh(). The first thread finishes commit, enters maybeRefresh(), acquires the lock and reopens the Reader. Then the second thread does its commit(), enters maybeRefresh, fails to obtain the lock and exits. Its changes won't be exposed by SM until the next maybeRefresh() is called.
> So it looks to me like current logic may be buggy in that sense?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Apr 30, 2012, 12:25 PM

Post #3 of 7 (55 views)
Permalink
[jira] [Commented] (LUCENE-4025) ReferenceManager.maybeRefresh should allow the caller to block [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265132#comment-13265132 ]

Shai Erera commented on LUCENE-4025:
------------------------------------

bq. I'm torn here...

I'm also torn here :).

bq. Always blocking seems dangerous because "simple" apps, that call .maybeRefresh() from searching threads, will suddenly see all search threads block when a reopen is happening

maybeRefresh says that you should call this *periodically*. I consider SearcherManager kind of advanced API. Calling maybeRefresh() on every search is like calling IndexReader.openIfChanged before every search. If we need to, let's document the implications of the method.

Not blocking might be buggy, while blocking might affect your performance. I think that the performance issue is really and edge case of stupidity, while the former is a simple expectation. maybeRefresh documents that if it returns false, it might be that the next acquire won't return the most up-to-date IndexSearcher, but it doesn't give you any way to ensure that.

As for the extra API, can we start by unnecessarily complicating the API? It looks to me that the API is clear with plenty of sample code and documentation (Mike, you even wrote a couple of blog posts about it). It's just a matter of semantics and if we tell people to call maybeRefresh periodically, then let's help them by ensuring this call does something.

> ReferenceManager.maybeRefresh should allow the caller to block
> --------------------------------------------------------------
>
> Key: LUCENE-4025
> URL: https://issues.apache.org/jira/browse/LUCENE-4025
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/search
> Reporter: Shai Erera
> Priority: Minor
> Fix For: 4.0
>
>
> ReferenceManager.maybeRefresh() returns a boolean today, specifying whether the maybeRefresh logic was executed by the caller or not. If it's false, it means that another thread is currently refreshing and the call returns immediately.
> I think that that's inconvenient to the caller. I.e., if you wanted to do something like:
> {code}
> writer.commit();
> searcherManager.maybeRefresh();
> searcherManager.acquire();
> {code}
> It'd be better if you could guarantee that when the maybeRefresh() call returned, the follow on acquire() will return a refreshed IndexSearcher. Even if you omit the commit instruction, it'd be good if you can guarantee that.
> I don't quite see the benefit of having the caller thread not block if there's another thread currently refreshing. In, I believe, most cases, you'd anyway have just one thread calling maybeRefresh(). Even if not, the only benefit of not blocking is if you have commit() followed by maybeRefresh() logic done by some threads, while other threads acquire searchers - maybe then you wouldn't care if another thread is currently doing the refresh?
> Actually, I tend to think that not blocking is buggy? I mean, what if two threads do commit() + maybeRefresh(). The first thread finishes commit, enters maybeRefresh(), acquires the lock and reopens the Reader. Then the second thread does its commit(), enters maybeRefresh, fails to obtain the lock and exits. Its changes won't be exposed by SM until the next maybeRefresh() is called.
> So it looks to me like current logic may be buggy in that sense?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Apr 30, 2012, 1:41 PM

Post #4 of 7 (58 views)
Permalink
[jira] [Commented] (LUCENE-4025) ReferenceManager.maybeRefresh should allow the caller to block [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265195#comment-13265195 ]

Simon Willnauer commented on LUCENE-4025:
-----------------------------------------

my take on this is that if you really want this point in time semantics you are describing you should use your own application locking on top. You can simply use a read/write lock and acquire the write lock if you enter the "commit" block which I assume you don't do too frequently. The other option is to simply call maybeRefresh in an empty loop. The non-blocking fashion is a big asset here IMO since you can't build a non-blocking app with blocking components but you can do the other way around.

bq. Not blocking might be buggy, while blocking might affect your performance.

that is bogus IMO.

> ReferenceManager.maybeRefresh should allow the caller to block
> --------------------------------------------------------------
>
> Key: LUCENE-4025
> URL: https://issues.apache.org/jira/browse/LUCENE-4025
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/search
> Reporter: Shai Erera
> Priority: Minor
> Fix For: 4.0
>
>
> ReferenceManager.maybeRefresh() returns a boolean today, specifying whether the maybeRefresh logic was executed by the caller or not. If it's false, it means that another thread is currently refreshing and the call returns immediately.
> I think that that's inconvenient to the caller. I.e., if you wanted to do something like:
> {code}
> writer.commit();
> searcherManager.maybeRefresh();
> searcherManager.acquire();
> {code}
> It'd be better if you could guarantee that when the maybeRefresh() call returned, the follow on acquire() will return a refreshed IndexSearcher. Even if you omit the commit instruction, it'd be good if you can guarantee that.
> I don't quite see the benefit of having the caller thread not block if there's another thread currently refreshing. In, I believe, most cases, you'd anyway have just one thread calling maybeRefresh(). Even if not, the only benefit of not blocking is if you have commit() followed by maybeRefresh() logic done by some threads, while other threads acquire searchers - maybe then you wouldn't care if another thread is currently doing the refresh?
> Actually, I tend to think that not blocking is buggy? I mean, what if two threads do commit() + maybeRefresh(). The first thread finishes commit, enters maybeRefresh(), acquires the lock and reopens the Reader. Then the second thread does its commit(), enters maybeRefresh, fails to obtain the lock and exits. Its changes won't be exposed by SM until the next maybeRefresh() is called.
> So it looks to me like current logic may be buggy in that sense?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Apr 30, 2012, 10:25 PM

Post #5 of 7 (55 views)
Permalink
[jira] [Commented] (LUCENE-4025) ReferenceManager.maybeRefresh should allow the caller to block [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265663#comment-13265663 ]

Shai Erera commented on LUCENE-4025:
------------------------------------

Calling maybeRefresh in an empty loop is not good. You'd want to at least add some sleep in between calls. And then you need to decide on the sleep interval. Complicate things.

What's not clear to me is why was this API made non-blocking in the first place? Did someone complain about it being blocking?

bq. The non-blocking fashion is a big asset here IMO since you can't build a non-blocking app with blocking components but you can do the other way around.

That's generally a correct statement, but who said that in the context of a ReferenceManager you want to build a non-blocking app?

Perhaps a maybeRefreshBlocking() will be the best compromise after all. That method won't return anything and will just block on reopenLock. I'll create a patch, let's see how it looks first. While I'm at it, I'll rename reopenLock to refreshLock (reopenLock was from the time it was in SearcherManager).

> ReferenceManager.maybeRefresh should allow the caller to block
> --------------------------------------------------------------
>
> Key: LUCENE-4025
> URL: https://issues.apache.org/jira/browse/LUCENE-4025
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/search
> Reporter: Shai Erera
> Priority: Minor
> Fix For: 4.0
>
>
> ReferenceManager.maybeRefresh() returns a boolean today, specifying whether the maybeRefresh logic was executed by the caller or not. If it's false, it means that another thread is currently refreshing and the call returns immediately.
> I think that that's inconvenient to the caller. I.e., if you wanted to do something like:
> {code}
> writer.commit();
> searcherManager.maybeRefresh();
> searcherManager.acquire();
> {code}
> It'd be better if you could guarantee that when the maybeRefresh() call returned, the follow on acquire() will return a refreshed IndexSearcher. Even if you omit the commit instruction, it'd be good if you can guarantee that.
> I don't quite see the benefit of having the caller thread not block if there's another thread currently refreshing. In, I believe, most cases, you'd anyway have just one thread calling maybeRefresh(). Even if not, the only benefit of not blocking is if you have commit() followed by maybeRefresh() logic done by some threads, while other threads acquire searchers - maybe then you wouldn't care if another thread is currently doing the refresh?
> Actually, I tend to think that not blocking is buggy? I mean, what if two threads do commit() + maybeRefresh(). The first thread finishes commit, enters maybeRefresh(), acquires the lock and reopens the Reader. Then the second thread does its commit(), enters maybeRefresh, fails to obtain the lock and exits. Its changes won't be exposed by SM until the next maybeRefresh() is called.
> So it looks to me like current logic may be buggy in that sense?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

May 1, 2012, 6:41 AM

Post #6 of 7 (50 views)
Permalink
[jira] [Commented] (LUCENE-4025) ReferenceManager.maybeRefresh should allow the caller to block [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265800#comment-13265800 ]

Michael McCandless commented on LUCENE-4025:
--------------------------------------------

Approach & patch look good!

Maybe change javadocs to state that "you must call maybeRefresh or maybeRefreshBlocking periodically" (now each one states you must call that one). Also maybe say "if another thread is currently refreshing, this method blocks until that thread completes".

> ReferenceManager.maybeRefresh should allow the caller to block
> --------------------------------------------------------------
>
> Key: LUCENE-4025
> URL: https://issues.apache.org/jira/browse/LUCENE-4025
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/search
> Reporter: Shai Erera
> Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-4025.patch
>
>
> ReferenceManager.maybeRefresh() returns a boolean today, specifying whether the maybeRefresh logic was executed by the caller or not. If it's false, it means that another thread is currently refreshing and the call returns immediately.
> I think that that's inconvenient to the caller. I.e., if you wanted to do something like:
> {code}
> writer.commit();
> searcherManager.maybeRefresh();
> searcherManager.acquire();
> {code}
> It'd be better if you could guarantee that when the maybeRefresh() call returned, the follow on acquire() will return a refreshed IndexSearcher. Even if you omit the commit instruction, it'd be good if you can guarantee that.
> I don't quite see the benefit of having the caller thread not block if there's another thread currently refreshing. In, I believe, most cases, you'd anyway have just one thread calling maybeRefresh(). Even if not, the only benefit of not blocking is if you have commit() followed by maybeRefresh() logic done by some threads, while other threads acquire searchers - maybe then you wouldn't care if another thread is currently doing the refresh?
> Actually, I tend to think that not blocking is buggy? I mean, what if two threads do commit() + maybeRefresh(). The first thread finishes commit, enters maybeRefresh(), acquires the lock and reopens the Reader. Then the second thread does its commit(), enters maybeRefresh, fails to obtain the lock and exits. Its changes won't be exposed by SM until the next maybeRefresh() is called.
> So it looks to me like current logic may be buggy in that sense?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

May 1, 2012, 7:45 AM

Post #7 of 7 (49 views)
Permalink
[jira] [Commented] (LUCENE-4025) ReferenceManager.maybeRefresh should allow the caller to block [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265842#comment-13265842 ]

Michael McCandless commented on LUCENE-4025:
--------------------------------------------

+1, looks good. Thanks Shai!

> ReferenceManager.maybeRefresh should allow the caller to block
> --------------------------------------------------------------
>
> Key: LUCENE-4025
> URL: https://issues.apache.org/jira/browse/LUCENE-4025
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/search
> Reporter: Shai Erera
> Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-4025.patch, LUCENE-4025.patch
>
>
> ReferenceManager.maybeRefresh() returns a boolean today, specifying whether the maybeRefresh logic was executed by the caller or not. If it's false, it means that another thread is currently refreshing and the call returns immediately.
> I think that that's inconvenient to the caller. I.e., if you wanted to do something like:
> {code}
> writer.commit();
> searcherManager.maybeRefresh();
> searcherManager.acquire();
> {code}
> It'd be better if you could guarantee that when the maybeRefresh() call returned, the follow on acquire() will return a refreshed IndexSearcher. Even if you omit the commit instruction, it'd be good if you can guarantee that.
> I don't quite see the benefit of having the caller thread not block if there's another thread currently refreshing. In, I believe, most cases, you'd anyway have just one thread calling maybeRefresh(). Even if not, the only benefit of not blocking is if you have commit() followed by maybeRefresh() logic done by some threads, while other threads acquire searchers - maybe then you wouldn't care if another thread is currently doing the refresh?
> Actually, I tend to think that not blocking is buggy? I mean, what if two threads do commit() + maybeRefresh(). The first thread finishes commit, enters maybeRefresh(), acquires the lock and reopens the Reader. Then the second thread does its commit(), enters maybeRefresh, fails to obtain the lock and exits. Its changes won't be exposed by SM until the next maybeRefresh() is called.
> So it looks to me like current logic may be buggy in that sense?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.