Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-Dev

[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

 

 

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded


jira at apache

Jun 1, 2012, 10:50 AM

Post #1 of 4 (85 views)
Permalink
[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287563#comment-13287563 ]

Michael McCandless commented on LUCENE-2858:
--------------------------------------------

Not entirely sure why but it looks like the commits here slowed down our NRT reopen latency. If you look at the nightly bench graph: http://people.apache.org/~mikemccand/lucenebench/nrt.html and click + drag from Jan 2012 to today, annotation R shows we increased from ~46 msec NRT reopen latency to ~50 msec ... could just be hotspot being upset...

> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
> Key: LUCENE-2858
> URL: https://issues.apache.org/jira/browse/LUCENE-2858
> Project: Lucene - Java
> Issue Type: Task
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Priority: Blocker
> Fix For: 4.0
>
> Attachments: LUCENE-2858-FCinsanity.patch, LUCENE-2858-FixSlowEnsureOpen.patch, LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Jun 3, 2012, 3:00 AM

Post #2 of 4 (75 views)
Permalink
[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288129#comment-13288129 ]

Michael McCandless commented on LUCENE-2858:
--------------------------------------------

bq. Not entirely sure why but it looks like the commits here slowed down our NRT reopen latency.

OK, sorry, I was wrong about this!

I went back and re-ran the NRTPerfTest and isolated the slowdown to LUCENE-3728 (also committed on the same day)... it was because we lost the SegmentInfo.sizeInBytes caching... but then in LUCENE-4055 we got it back and we got the performance back ... so all's well that ends well :)

> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
> Key: LUCENE-2858
> URL: https://issues.apache.org/jira/browse/LUCENE-2858
> Project: Lucene - Java
> Issue Type: Task
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Priority: Blocker
> Fix For: 4.0
>
> Attachments: LUCENE-2858-FCinsanity.patch, LUCENE-2858-FixSlowEnsureOpen.patch, LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Jun 3, 2012, 3:17 AM

Post #3 of 4 (75 views)
Permalink
[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288133#comment-13288133 ]

Uwe Schindler commented on LUCENE-2858:
---------------------------------------

Thanks Mike for taking care. For me this looked crazy, because refactoring like this should not change perf.

This explains the change back after 4055, I thought un-compressed oops responsible for the speedup.

> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
> Key: LUCENE-2858
> URL: https://issues.apache.org/jira/browse/LUCENE-2858
> Project: Lucene - Java
> Issue Type: Task
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Priority: Blocker
> Fix For: 4.0
>
> Attachments: LUCENE-2858-FCinsanity.patch, LUCENE-2858-FixSlowEnsureOpen.patch, LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Jun 3, 2012, 3:21 AM

Post #4 of 4 (75 views)
Permalink
[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288135#comment-13288135 ]

Michael McCandless commented on LUCENE-2858:
--------------------------------------------

bq. This explains the change back after 4055, I thought un-compressed oops responsible for the speedup.

That's what I thought too (and I was very confused that uncompressed oops would improve things)! But LUCENE-4055 landed on the same day I turned off uncompressed oops... so I think all mysteries are now explained.

> Separate SegmentReaders (and other atomic readers) from composite IndexReaders
> ------------------------------------------------------------------------------
>
> Key: LUCENE-2858
> URL: https://issues.apache.org/jira/browse/LUCENE-2858
> Project: Lucene - Java
> Issue Type: Task
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Priority: Blocker
> Fix For: 4.0
>
> Attachments: LUCENE-2858-FCinsanity.patch, LUCENE-2858-FixSlowEnsureOpen.patch, LUCENE-2858.patch, LUCENE-2858.patch
>
>
> With current trunk, whenever you open an IndexReader on a directory you get back a DirectoryReader which is a composite reader. The interface of IndexReader has now lots of methods that simply throw UOE (in fact more than 50% of all methods that are commonly used ones are unuseable now). This confuses users and makes the API hard to understand.
> This issue should split "atomic readers" from "reader collections" with a separate API. After that, you are no longer able, to get TermsEnum without wrapping from those composite readers. We currently have helper classes for wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*), those should be retrofitted to implement the correct classes (SlowMultiReaderWrapper would be an atomic reader but takes a composite reader as ctor param, maybe it could also simply take a List<AtomicReader>). In my opinion, maybe composite readers could implement some collection APIs and also have the ReaderUtil method directly built in (possibly as a "view" in the util.Collection sense). In general composite readers do not really need to look like the previous IndexReaders, they could simply be a "collection" of SegmentReaders with some functionality like reopen.
> On the other side, atomic readers do not need reopen logic anymore? When a segment changes, you need a new atomic reader? - maybe because of deletions thats not the best idea, but we should investigate. Maybe make the whole reopen logic simplier to use (ast least on the collection reader level).
> We should decide about good names, i have no preference at the moment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.