
Mailing List Archive: Lucene: Java-Dev

[jira] [Commented] (LUCENE-4080) SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs



jira at apache

Jun 27, 2012, 9:56 AM

Post #1 of 6
[jira] [Commented] (LUCENE-4080) SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs

[ https://issues.apache.org/jira/browse/LUCENE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402353#comment-13402353 ]

Robert Muir commented on LUCENE-4080:
-------------------------------------

I think it's cleaner not to have the 'if numDocs >= 0' check in SegmentReader ctor #2.
Instead, I think ctor #1 should just forward docCount - delCount, like ctor #3 does.
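Here is a minimal sketch of the constructor chaining suggested above. It uses a hypothetical SketchSegmentReader, not Lucene's real SegmentReader, and its constructors and fields are assumptions for illustration only. Each non-private constructor computes the live-doc count itself and forwards it to one designated constructor. That constructor therefore never needs a "numDocs >= 0" sentinel branch.

```java
import java.util.BitSet;

// Hypothetical stand-in for a segment reader; NOT Lucene's SegmentReader.
// Every constructor forwards an explicit live-doc count, so no sentinel is needed.
final class SketchSegmentReader {
  private final int maxDoc;      // total docs in the segment (docCount)
  private final BitSet liveDocs; // per-doc live bits (not inspected in this sketch)
  private final int numDocs;     // live docs, i.e. maxDoc - delCount

  // Designated constructor: numDocs is always supplied by the caller.
  private SketchSegmentReader(int maxDoc, BitSet liveDocs, int numDocs) {
    if (numDocs < 0 || numDocs > maxDoc) {
      throw new IllegalArgumentException("numDocs out of range: " + numDocs);
    }
    this.maxDoc = maxDoc;
    this.liveDocs = liveDocs;
    this.numDocs = numDocs;
  }

  // Analogue of "ctor #1": forwards docCount - delCount instead of a sentinel.
  SketchSegmentReader(int docCount, int delCount, BitSet liveDocs) {
    this(docCount, liveDocs, docCount - delCount);
  }

  // Analogue of "ctor #3": same segment as another reader, different deletions.
  SketchSegmentReader(SketchSegmentReader shared, BitSet liveDocs, int delCount) {
    this(shared.maxDoc, liveDocs, shared.maxDoc - delCount);
  }

  int maxDoc() { return maxDoc; }
  int numDocs() { return numDocs; }
  int numDeletedDocs() { return maxDoc - numDocs; }

  public static void main(String[] args) {
    SketchSegmentReader r = new SketchSegmentReader(10, 2, new BitSet());
    SketchSegmentReader r2 = new SketchSegmentReader(r, new BitSet(), 5);
    System.out.println(r.numDeletedDocs() + " " + r2.numDeletedDocs()); // prints "2 5"
  }
}
```

With a single place that stores numDocs, numDeletedDocs() is always derived from a count the caller stated explicitly. No branch can silently fall back to a different computation.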

> SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs
> --------------------------------------------------------------------------
>
> Key: LUCENE-4080
> URL: https://issues.apache.org/jira/browse/LUCENE-4080
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/index
> Affects Versions: 4.0, 4.1
> Reporter: Adrien Grand
> Priority: Trivial
> Fix For: 4.1
>
> Attachments: LUCENE-4080.patch
>
>
> At merge time, SegmentReader sometimes gives an incorrect value for numDeletedDocs.
> From LUCENE-2357:
> bq. As far as I know, [SegmentReader.numDeletedDocs() is] only unreliable in this context (SegmentReader passed to SegmentMerger for merging); this is because we allow newly marked deleted docs to happen concurrently up until the moment we need to pass the SR instance to the merger (search for "// Must sync to ensure BufferedDeletesStream" in IndexWriter.java) ... but it would be nice to fix that, so maybe we should open a new issue (it won't block this one)? We should be able to make a new SR instance, sharing the same core as the current one but using the correct delCount...
> bq. It would be cleaner (but I think hairier) to create a new SR for merging that holds the correct delCount, but let's do that under the separate issue.
> bq. It would be best if SegmentReader's numDeletedDocs were always correct, but fixing that in IndexWriter is somewhat tricky. I.e., the fix could be hairy, but the end result ("SegmentReader.numDeletedDocs can always be trusted") would be cleaner...
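The race described above can be made concrete with a small single-threaded sketch. It uses hypothetical classes (Reader, SegmentDeletes), not Lucene's. A reader copies the delete count when it is created. A delete that arrives after the merge reader was pulled therefore leaves numDeletedDocs() stale. Making a new reader over the same segment picks up the correct count.

```java
import java.util.BitSet;

// Hypothetical sketch, not Lucene code: shows why a reader's delete count can be
// stale by merge time, and how a freshly made reader fixes it.
final class StaleCountDemo {
  // Immutable reader view; maxDoc stands in for the shared segment "core".
  static final class Reader {
    final int maxDoc;
    final BitSet liveDocs; // private copy taken at creation time
    final int delCount;

    Reader(int maxDoc, BitSet liveDocs, int delCount) {
      this.maxDoc = maxDoc;
      this.liveDocs = liveDocs;
      this.delCount = delCount;
    }

    int numDeletedDocs() { return delCount; }
  }

  // Mutable per-segment delete state that keeps changing after readers are made.
  static final class SegmentDeletes {
    final int maxDoc;
    private final BitSet liveDocs;
    private int delCount;

    SegmentDeletes(int maxDoc) {
      this.maxDoc = maxDoc;
      this.liveDocs = new BitSet(maxDoc);
      this.liveDocs.set(0, maxDoc); // every doc starts live
    }

    synchronized void delete(int docId) {
      if (liveDocs.get(docId)) { // deleting an already-deleted doc is a no-op
        liveDocs.clear(docId);
        delCount++;
      }
    }

    synchronized int delCount() { return delCount; }

    // New reader over the same segment, carrying a copy of the current deletions.
    synchronized Reader newReader() {
      return new Reader(maxDoc, (BitSet) liveDocs.clone(), delCount);
    }
  }

  public static void main(String[] args) {
    SegmentDeletes seg = new SegmentDeletes(10);
    seg.delete(1);
    Reader forMerge = seg.newReader(); // reader pulled for the merge
    seg.delete(4);                     // a delete arrives afterwards
    System.out.println(forMerge.numDeletedDocs() + " vs " + seg.delCount()); // prints "1 vs 2"
    Reader fresh = seg.newReader();    // new reader, same segment, correct count
    System.out.println(fresh.numDeletedDocs()); // prints "2"
  }
}
```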

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Jun 27, 2012, 10:17 AM

Post #2 of 6
[jira] [Commented] (LUCENE-4080) SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402365#comment-13402365 ]

Robert Muir commented on LUCENE-4080:
-------------------------------------

Also, is it OK that mergeMiddle calls rld.getMergeReader inside the sync block?

Previously, we never did actual I/O here...


jira at apache

Jun 28, 2012, 4:07 AM

Post #3 of 6
[jira] [Commented] (LUCENE-4080) SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403007#comment-13403007 ]

Michael McCandless commented on LUCENE-4080:
--------------------------------------------

I only looked briefly at the patch ... but could we do the liveDocs/numDeletedDocs copy up above, in IndexWriter (while synchronized), and pass them to ReadersAndLiveDocs.getMergeReader? Then we wouldn't need to cut over to ReentrantLock?
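Here is a sketch of this suggestion under stated assumptions. Writer, SegmentState and MergeReader are hypothetical stand-ins for IndexWriter, ReadersAndLiveDocs and the merge-time reader; none of them is Lucene's API. The liveDocs and delete-count copy is taken inside synchronized (writer). getMergeReader only receives that copy, so this sketch involves no ReentrantLock. The explicit Thread.holdsLock check in delete models the rule that deletes require the writer's monitor.

```java
import java.util.BitSet;

// Hypothetical stand-ins: Writer ~ IndexWriter, SegmentState ~ ReadersAndLiveDocs.
// None of these are Lucene's real classes or signatures.
final class SnapshotUnderWriterLock {
  static final class Writer {} // used only as the shared monitor

  static final class MergeReader {
    final BitSet liveDocs;
    final int delCount;

    MergeReader(BitSet liveDocs, int delCount) {
      this.liveDocs = liveDocs;
      this.delCount = delCount;
    }
  }

  static final class SegmentState {
    private final Writer writer;
    private final BitSet liveDocs;
    private int delCount;

    SegmentState(Writer writer, int maxDoc) {
      this.writer = writer;
      this.liveDocs = new BitSet(maxDoc);
      this.liveDocs.set(0, maxDoc); // all docs start live
    }

    void delete(int docId) {
      // Deletes require the writer's monitor (checked explicitly, not via assert).
      if (!Thread.holdsLock(writer)) {
        throw new IllegalStateException("must hold the writer's lock to delete");
      }
      if (liveDocs.get(docId)) {
        liveDocs.clear(docId);
        delCount++;
      }
    }

    // Receives a snapshot taken by the caller; any slow reader setup would
    // happen here, outside the writer's monitor.
    MergeReader getMergeReader(BitSet liveDocsCopy, int delCountCopy) {
      return new MergeReader(liveDocsCopy, delCountCopy);
    }
  }

  static MergeReader prepareMerge(Writer writer, SegmentState seg) {
    BitSet liveCopy;
    int delCopy;
    synchronized (writer) { // no delete can interleave with this copy
      liveCopy = (BitSet) seg.liveDocs.clone();
      delCopy = seg.delCount;
    }
    return seg.getMergeReader(liveCopy, delCopy);
  }

  public static void main(String[] args) {
    Writer writer = new Writer();
    SegmentState seg = new SegmentState(writer, 8);
    synchronized (writer) {
      seg.delete(2);
      seg.delete(5);
    }
    MergeReader r = prepareMerge(writer, seg);
    System.out.println(r.delCount + " " + r.liveDocs.cardinality()); // prints "2 6"
  }
}
```

Deletes and the snapshot share one monitor, so the copied liveDocs and count always agree with each other. Later deletes do not affect the reader built from the copy.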



jira at apache

Jun 28, 2012, 4:15 AM

Post #4 of 6
[jira] [Commented] (LUCENE-4080) SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403011#comment-13403011 ]

Robert Muir commented on LUCENE-4080:
-------------------------------------

I don't understand the concurrency here, but that's what I read from the issue description. My concern was just that it's not obvious in the first patch whether we are actually just opening an SR with an existing core inside this sync or not, and I don't even know if it's a problem :)
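The distinction raised here can be sketched with a hypothetical ref-counted core; this is not Lucene's SegmentCoreReaders API. In this sketch, opening a reader from scratch creates a core, which is where file I/O would happen. Opening another reader on an existing core only increments a reference count.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Lucene's SegmentCoreReaders: the expensive part of a
// reader lives in a ref-counted core that later readers can share.
final class SharedCoreDemo {
  static final AtomicInteger CORE_OPENS = new AtomicInteger(); // counts simulated I/O

  static final class Core {
    private final AtomicInteger refCount = new AtomicInteger(1);

    Core() { CORE_OPENS.incrementAndGet(); } // stands in for reading segment files

    void incRef() {
      while (true) { // CAS loop so a closed core can never be revived
        int n = refCount.get();
        if (n <= 0) throw new IllegalStateException("core already closed");
        if (refCount.compareAndSet(n, n + 1)) return;
      }
    }

    void decRef() {
      if (refCount.decrementAndGet() == 0) {
        // a real core would close its files here
      }
    }

    int refCount() { return refCount.get(); }
  }

  static final class Reader implements AutoCloseable {
    final Core core;
    final int delCount;

    private Reader(Core core, int delCount) {
      this.core = core;
      this.delCount = delCount;
    }

    // Cold open: creates a new core (the I/O-heavy path in this sketch).
    static Reader open(int delCount) { return new Reader(new Core(), delCount); }

    // Warm open: shares an existing core; only the ref count changes.
    static Reader withCore(Reader existing, int delCount) {
      existing.core.incRef();
      return new Reader(existing.core, delCount);
    }

    @Override public void close() { core.decRef(); }
  }

  public static void main(String[] args) {
    Reader a = Reader.open(0);
    Reader b = Reader.withCore(a, 3);
    System.out.println(CORE_OPENS.get() + " " + a.core.refCount()); // prints "1 2"
    b.close();
    a.close();
  }
}
```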



jira at apache

Jun 28, 2012, 4:33 AM

Post #5 of 6
[jira] [Commented] (LUCENE-4080) SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403017#comment-13403017 ]

Adrien Grand commented on LUCENE-4080:
--------------------------------------

@Robert I think this is an issue, since one must hold the IndexWriter lock to perform deletes on any segment (cf. the assert Thread.holdsLock(writer) in ReadersAndLiveDocs.delete).

@Michael Oh right, I think I understand what you mean. I'll try to produce a better patch.



jira at apache

Jul 3, 2012, 4:30 PM

Post #6 of 6
[jira] [Commented] (LUCENE-4080) SegmentReader.numDeletedDocs() sometimes gives an incorrect numDeletedDocs [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406168#comment-13406168 ]

Michael McCandless commented on LUCENE-4080:
--------------------------------------------

New patch looks great! I like this solution: much simpler. Maybe just add a comment explaining why we must sometimes make a new reader? (I.e., deletes could have snuck in after we pulled the merge reader but before the sync block where we get the live docs.)

It's nice to simply pass around the reader and not the pair of liveDocs + reader...

We can remove liveDocs and delCount args to SegmentMerger.add now?
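Here is a sketch of the approach discussed in this comment. It uses hypothetical names (MergeReaderRefresh, refreshForMerge, Merger.add) rather than Lucene's actual code. Inside the writer's synchronized block, the previously pulled reader is reused unless deletes snuck in meanwhile. In that case, a new reader carrying the current liveDocs is built. The merger then receives only the reader, with no separate liveDocs or delCount arguments.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: not Lucene's IndexWriter/SegmentMerger code.
final class MergeReaderRefresh {
  static final class Reader {
    final int maxDoc;
    final BitSet liveDocs;
    final int delCount;

    Reader(int maxDoc, BitSet liveDocs, int delCount) {
      this.maxDoc = maxDoc;
      this.liveDocs = liveDocs;
      this.delCount = delCount;
    }

    int numDocs() { return maxDoc - delCount; }
    int numDeletedDocs() { return delCount; }
  }

  static final class Merger {
    private final List<Reader> readers = new ArrayList<>();

    // Reader-only add: liveDocs and delCount travel inside the reader.
    void add(Reader reader) { readers.add(reader); }

    int totalLiveDocs() {
      int total = 0;
      for (Reader r : readers) total += r.numDocs();
      return total;
    }
  }

  private final Object writerLock = new Object(); // stands in for the IndexWriter monitor
  private final int maxDoc;
  private final BitSet liveDocs;
  private int delCount;

  MergeReaderRefresh(int maxDoc) {
    this.maxDoc = maxDoc;
    this.liveDocs = new BitSet(maxDoc);
    this.liveDocs.set(0, maxDoc); // all docs start live
  }

  void delete(int docId) {
    synchronized (writerLock) {
      if (liveDocs.get(docId)) {
        liveDocs.clear(docId);
        delCount++;
      }
    }
  }

  Reader getMergeReader() {
    synchronized (writerLock) {
      return new Reader(maxDoc, (BitSet) liveDocs.clone(), delCount);
    }
  }

  Reader refreshForMerge(Reader pulled) {
    synchronized (writerLock) {
      // Deletes only ever clear bits, so an unchanged count means unchanged liveDocs.
      if (pulled.delCount == delCount) {
        return pulled;
      }
      // Deletes snuck in after the reader was pulled: build a new one.
      return new Reader(maxDoc, (BitSet) liveDocs.clone(), delCount);
    }
  }

  public static void main(String[] args) {
    MergeReaderRefresh seg = new MergeReaderRefresh(10);
    Reader pulled = seg.getMergeReader();
    seg.delete(3); // arrives before the synchronized block in refreshForMerge
    Reader forMerge = seg.refreshForMerge(pulled);
    Merger merger = new Merger();
    merger.add(forMerge);
    System.out.println(forMerge.numDeletedDocs() + " " + merger.totalLiveDocs()); // prints "1 9"
  }
}
```

In the common case no delete arrives in between, and refreshForMerge returns the same reader without copying anything. A new reader is made only when the counts differ.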

+1, thanks Adrien!

