Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-Dev

[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

 

 

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded


jira at apache

Jun 25, 2008, 11:02 AM

Post #1 of 21 (840 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608128#action_12608128 ]

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

Although this doesn't solve the general problem, this probably still makes sense to do now for the no-deletes case.
Todd, can you produce a patch? See http://wiki.apache.org/lucene-java/HowToContribute

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 25, 2008, 11:02 AM

Post #2 of 21 (825 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608129#action_12608129 ]

Hoss Man commented on LUCENE-1316:
----------------------------------

rather then attempting localized optimizations of individual Query classes, a more generalized improvements would probably be to change SegmentReader.isDeleted so that instead of the entire method being synchronized, it first checks if the segment has any deletions, and if not then enters a synchronized block to check deletedDocs.get(n).

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 25, 2008, 11:14 AM

Post #3 of 21 (825 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608134#action_12608134 ]

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

> a more generalized improvements would probably be to change SegmentReader.isDeleted so that instead of the entire method being synchronized

Right, but that's not totally back compatible. Code that depended on deletes being instantly visible across threads would no longer be guaranteed.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 25, 2008, 11:24 AM

Post #4 of 21 (825 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608137#action_12608137 ]

Hoss Man commented on LUCENE-1316:
----------------------------------

bq. Code that depended on deletes being instantly visible across threads would no longer be guaranteed.

you lost me there ... why would deletes be stop being instantly visible if we changed this...

{code}
public synchronized boolean isDeleted(int n) {
return (deletedDocs != null && deletedDocs.get(n));
}
{code}

...to this...

{code}
public boolean isDeleted(int n) {
if (null == deletedDocs) return false;
synchronized (this) { return (deletedDocs.get(n)); }
}
{code}

?

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 25, 2008, 11:42 AM

Post #5 of 21 (822 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608146#action_12608146 ]

robert engels commented on LUCENE-1316:
---------------------------------------

According to the java memory model, hasDeletions() would need to be synchronized as well , since if another thread did perform a deletion, it would need to be up to date.

This might work in later JVMs by declaring the deletedDocs variable volatile, but no guarantees.

Seems better to ALLOW this behavior, that a reader might not see up to date deletions made during a query, and do a single synchronized check of deletions at the start.



> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 25, 2008, 11:44 AM

Post #6 of 21 (825 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608147#action_12608147 ]

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

bq. why would deletes be stop being instantly visible

It's minor, but before, if thread A deleted a document, and then thread B checked if it was deleted, thread B was guaranteed to see that it was in fact deleted.

If the check for deletedDocs == null were moved outside of the synchronized, there's no guarantee when thread B will see (if ever) that deletedDocs has been set (no memory barrier).

It's not a major issue since client code shouldn't be written that way IMO, but it's worth factoring into the decision.


> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 25, 2008, 11:46 AM

Post #7 of 21 (825 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608148#action_12608148 ]

robert engels commented on LUCENE-1316:
---------------------------------------

According to

http://www.ibm.com/developerworks/java/library/j-jtp06197.html

declaring the deletedDocs volatile should do the trick.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 25, 2008, 11:50 AM

Post #8 of 21 (824 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608149#action_12608149 ]

robert engels commented on LUCENE-1316:
---------------------------------------

The Pattern#5 referenced (cheap read-write lock) is exactly what is trying to be accomplished.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 25, 2008, 12:24 PM

Post #9 of 21 (819 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608160#action_12608160 ]

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

bq. declaring the deletedDocs volatile should do the trick.

Right... that would be cheaper when no docs were deleted. But would it be more expensive when there were deleted docs (a volatile + a synchronized?) I don't know if lock coarsening could do anything with this case...

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 25, 2008, 12:30 PM

Post #10 of 21 (823 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608162#action_12608162 ]

Mark Miller commented on LUCENE-1316:
-------------------------------------

If I remember correctly, volatile does not work correctly until java 1.5. At best I think it was implementation dependent with the old memory model.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 25, 2008, 1:50 PM

Post #11 of 21 (822 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608183#action_12608183 ]

Hoss Man commented on LUCENE-1316:
----------------------------------

bq. if thread A deleted a document, and then thread B checked if it was deleted, thread B was guaranteed to see that it was in fact deleted.

Hmmm.... i'll take your word for it, but i don't follow the rational: the current synchronization just ensured that either the isDeleted() call will complete before the delete() call started or vice versa -- but you have no guarantee that thread B would run after thread A and get true. .... unless... is your point that without synchronization on the null check there's no garuntee that B will ever see the change to deletedDocs even if it does execute after delete() ?

either way: robert's point about hasDeletions() needing to be synchronized seems like a bigger issue -- isn't that a bug in the current implementation? assuming we fix that then it seems like the original issue is back to square one: synchro bottlenecks when there are no deletions.





> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 25, 2008, 2:02 PM

Post #12 of 21 (824 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608187#action_12608187 ]

robert engels commented on LUCENE-1316:
---------------------------------------

Hoss, that is indeed the case, another thread would see deletedDocs as null, even though another thread has set it

hasDeletions does not need to be synchronized if deletedDocs is volatile

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 25, 2008, 2:04 PM

Post #13 of 21 (822 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608189#action_12608189 ]

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

bq. is your point that without synchronization on the null check there's no garuntee that B will ever see the change to deletedDocs even if it does execute after delete()

Right... it's about the memory barrier.

The reality is that there is normally a need for higher level synchronization anyway. That's why it was always silly for things like StringBuffer to be synchronized.

bq. assuming we fix that then it seems like the original issue is back to square one: synchro bottlenecks when there are no deletions.

A scorer could just check once when initialized... there's never been any guarantees about in-flight queries immediately seeing deleted docs changes - now that *really* doesn't make sense. TermScorer grabs the whole bit vector at the start and never checks again.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 26, 2008, 8:45 AM

Post #14 of 21 (762 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608472#action_12608472 ]

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

20tps (queries per second?) for 4700 docs is *very* slow... how many threads were you testing with?

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 26, 2008, 11:43 AM

Post #15 of 21 (763 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608542#action_12608542 ]

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

bq. Test results show this solution performs equally as the other solution did.

That's good news (as I assume this was for an optimized index).
Would it be possible for you to try the same test on a non-optimized index (multi-segment) with a couple deletions?


> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: LUCENE_1316.patch, MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 27, 2008, 10:01 AM

Post #16 of 21 (745 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608841#action_12608841 ]

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

I'll look into it (that's why it was marked as "prototype")

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: LUCENE_1316.patch, MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 27, 2008, 10:25 AM

Post #17 of 21 (744 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608849#action_12608849 ]

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

Sigh... the problem is that things are done in a two step process by default.
A TermDocs is created, and then seek is called (by which time the impl is set already).

{code}
public TermDocs termDocs(Term term) throws IOException {
ensureOpen();
TermDocs termDocs = termDocs();
termDocs.seek(term);
return termDocs;
}

public abstract TermDocs termDocs() throws IOException;
{code}

So a full solution down this road would be slightly more involved (overriding termDocs(Term) in all the sublcasses rather than just termDocs())



> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 27, 2008, 5:57 PM

Post #18 of 21 (730 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608956#action_12608956 ]

Hoss Man commented on LUCENE-1316:
----------------------------------

bq. TermDocs instance returned cannot be used to seek to a different term. However, this is minor and not a back compatibility concern since "null" was not previously a supported value.

so essentially this approach only improves MatchAllDocsQuery correct? .... Other use cases where lots of termDoc enumeration take place (RangeFilter and PrefixFilter type code) wouldn't' benefit from this at all.

Assuming the genuinely eliminating the synchronization is infeasible, the other approach that occurred to me along the lines of a "read only" IndexReader is that if we started providing a public method for getting the list of all deleted docs (either as a BitVector or as a DocIdSet or something) then it would be easy to implement a SnapshotFilteredIndexReader that on construction cached the current list of all deleted docs in the IndexReader it's wrapping, used it for all isDeleted() methods, and proxied all other methods to the underlying reader.

then things like MatchAllDocs, RangeFilter, and PrefixFilter could (optionally?) construct one of those prior to their big iteration loops, and use it in place of the original IndexReader. Trade the syncro bottleneck for deletion data as of the moment the query was started (a fair trade for most people)

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jul 2, 2008, 6:26 PM

Post #19 of 21 (633 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610101#action_12610101 ]

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

bq. so essentially this approach only improves MatchAllDocsQuery correct?

It would essentially simulate a term indexed for every document, and improve any query that could use that (i.e. that needed to iterate over all documents). For things like MatchAllDocuments and FunctionQuery on a MultiReader, it should actually be superior to the elimination of all synchronization on isDeleted() since it also removes the binary search to find the correct reader for a document.

bq. Other use cases where lots of termDoc enumeration take place (RangeFilter and PrefixFilter type code) wouldn't' benefit from this at all.

Right, but using TermDocs, they already have the same style of optimization and hence suffer no synchronization overhead.

bq. the other approach that occurred to me along the lines of a "read only" IndexReader is that if we started providing a public method for getting the list of all deleted docs

Right... that could be more useful for someone that needs random access to isDeleted(), at the cost of greater setup cost, and greater memory usage. However the TermDocs approach solves the use-cases we have today in lucene-core.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jul 2, 2008, 9:48 PM

Post #20 of 21 (622 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610123#action_12610123 ]

Hoss Man commented on LUCENE-1316:
----------------------------------


bq. Right, but using TermDocs, they already have the same style of optimization and hence suffer no synchronization overhead.

Arggg.... sorry, my bad: i was thinking reader.isDeleted was used in those
cases as well -- I was totally missing that SegmentTermDocs takes care of
it directly.

but ... thinking about how SegmentTermDocs are constructed for a moment,
isn't the (unsynchronized) usage of deletedDocs in SegmentTermDocs.next
prone to the same concern you had about removing (or reducing) the
synchronization in SegmentReader.isDeleted ... "deletes aren't instantly
visible across threads" ... are they?

Is SegmentTermDocs.next too lax in how it deals with deletedDocs, or is
SegmentReader.isDeleted to paranoid?



> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jul 3, 2008, 6:36 AM

Post #21 of 21 (596 views)
Permalink
[jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610228#action_12610228 ]

Yonik Seeley commented on LUCENE-1316:
--------------------------------------

{quote}but ... thinking about how SegmentTermDocs are constructed for a moment,
isn't the (unsynchronized) usage of deletedDocs in SegmentTermDocs.next
prone to the same concern you had about removing (or reducing) the
synchronization in SegmentReader.isDeleted ... "deletes aren't instantly
visible across threads" ... are they?
{quote}

No, deletes aren't instantly visible across threads (when one thread has started a query and the other thread does a delete). AFAIK it's always been that way, so I think it's probably acceptable. On a practical level, seeking on a TermDocs crosses synchronization points, so deletes won't go unrecognized by other threads forever either.

But there is a thread-safety issue I've been mulling over since I wrote this patch.
SegmentTermDocs.next() is fine... unsynchronized reads look benign on the BitVector class since writes just change a byte at a time. "count" could be off, but that's OK too... it will simply be a stale value since updates to it are atomic.

The issue is the possibility of grabbing a partially constructed BitVector object to begin with. Notice how I synchronize to avoid this in AllTermDocs:
{code}
protected AllTermDocs(SegmentReader parent) {
synchronized (parent) {
this.deletedDocs = parent.deletedDocs;
}
this.maxDoc = parent.maxDoc();
}
{code}

That should probably be done in SegmentTermDocs too. Without it, a null pointer exception or an array out of bounds exception could result when checking the BitVector.

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
> Key: LUCENE-1316
> URL: https://issues.apache.org/jira/browse/LUCENE-1316
> Project: Lucene - Java
> Issue Type: Bug
> Components: Query/Scoring
> Affects Versions: 2.3
> Environment: All
> Reporter: Todd Feak
> Priority: Minor
> Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch, MatchAllDocsQuery.java
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr paricularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests for NOT queries, due to this bottleneck. The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issues, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader, much of this can be avoided. Especially in a read-only environment for production where you have slaves doing all the high load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
> if (!reader.isDeleted(id)) {
> TO:
> if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact lists@gossamer-threads.com
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.