Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-Dev

[jira] Commented: (LUCENE-743) IndexReader.reopen()

 

 

First page Previous page 1 2 3 Next page Last page  View All Lucene java-dev RSS feed   Index | Next | Previous | View Threaded


jira at apache

Oct 17, 2007, 2:23 PM

Post #26 of 71 (774 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535753 ]

Michael Busch commented on LUCENE-743:
--------------------------------------

{quote}
How about just defining the behavior such that any pending changes are flushed. That would make it more useful because you could then reopen readers you used for deletes.
{quote}

Hmm, I'm not sure I understand. A reader which is being used for deletes or setting norms is always current (it owns the write lock), so there should never be the need to re-open such a reader.

However, if you re-open an existing reader which was not used for deletes before and use the new instance (b) to perform deletes, it will result in a undefined behavior for the old reader (a):

{code:java}
IndexReader a = .....
....
IndexReader b = a.reopen();
b.deleteDocument(...);
{code}

> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 18, 2007, 9:55 AM

Post #27 of 71 (773 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535970 ]

Yonik Seeley commented on LUCENE-743:
-------------------------------------

{quote}
A reader which is being used for deletes or setting norms is always current (it owns the write lock), so there should never be the need to re-open such a reader.
{quote}

I was thinking about the "add some docs" then "delete some docs" scenario:
One currently needs to close() the deleting reader to open an IndexWriter. If IndexReader.commit() was public, then one could simply flush changes, and then call reopen() after the IndexWriter was done adding new documents. But perhaps longer term, all deletions should be done via the IndexWriter anyway.

{quote}
if you re-open an existing reader which was not used for deletes before and use the new instance (b) to perform deletes
{quote}

Ah, thanks for that clarification. I guess that should remain undefined.


> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 18, 2007, 12:49 PM

Post #28 of 71 (767 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536028 ]

Michael Busch commented on LUCENE-743:
--------------------------------------

Hmm one other thing: how should IndexReader.close() work? If we re-open a reader (a is the old reader, b is the new one), and then someone calls a.close(), then a should not close any resources that it shares with b.

One way to make this work would be reference counting. Since we want to avoid that, we could simply restrict reopen() from being called twice for the same reader. Then there would never be more than 2 readers sharing the same ressources. The old reader a would remember that reopen() was called already and would not close the shared ressources on close(). However, the new reader b would close all ressources, meaning that reader a would not work anymore after b.close() was called.
Thoughts?

> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 18, 2007, 1:04 PM

Post #29 of 71 (768 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536033 ]

Michael McCandless commented on LUCENE-743:
-------------------------------------------

I think reference counting would solve this issue quite nicely. How come we want to avoid reference counting? It seems like the right solution here.

The implementation seems simple. When a reader is opened, it starts with RC 1. When it is closed, it decrefs the RC and marks itself closed (to make sure double-close does not re-decref the RC). When a MultiReader needs to use the reader, it calls incref. And when that MultiReader is done with it, it calls decref. Whenever the RC hits 0 it's safe to free all resources.

Wouldn't that work?

> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 18, 2007, 1:35 PM

Post #30 of 71 (772 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536041 ]

Yonik Seeley commented on LUCENE-743:
-------------------------------------

> When it is closed, it decrefs the RC and marks itself closed (to make sure double-close does not re-decref the RC)

But if a reader is shared, how do you tell two real closes from an erronous double-close?
Perhaps have a top level close() that is only invoked once via isClosed, and a projected doClose() that a multi-reader would use that actually does the decref?


> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 18, 2007, 1:45 PM

Post #31 of 71 (771 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536044 ]

Michael Busch commented on LUCENE-743:
--------------------------------------

{quote}
The implementation seems simple. When a reader is opened, it starts with RC 1. When it is closed, it decrefs the RC and marks itself closed (to make sure double-close does not re-decref the RC). When a MultiReader needs to use the reader, it calls incref. And when that MultiReader is done with it, it calls decref. Whenever the RC hits 0 it's safe to free all resources.

Wouldn't that work?
{quote}

You're right, for a MultiReader and ParallelReader this would work and wouldn't be hard to implement.

It is quite different when it comes to SegmentReaders. SegmentReader.reopen() checks SegmentInfos if there is a segment with the same name but newer normGen or delGen. If there is one then a new SegmentReader instance is created that reuses resources like e. g. TermInfosReader and loads the new generation of the del or norm file.

Now you can end up having a bunch of SegmentReaders that share the same resources but don't know about each other. The reference counting would have to happen somewhere else, e. g. in the TermInfosReader. Of course this is doable, but it's a special case and more complicated compared to the "restrict reopen() to only be called once"-approach.

> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 18, 2007, 2:53 PM

Post #32 of 71 (771 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536063 ]

Michael McCandless commented on LUCENE-743:
-------------------------------------------

> But if a reader is shared, how do you tell two real closes from an erronous double-close?
> Perhaps have a top level close() that is only invoked once via isClosed, and a projected doClose() that a multi-reader would use that actually does the decref?

Yes I think that's the right approach.

> It is quite different when it comes to SegmentReaders. SegmentReader.reopen() checks SegmentInfos if there is a segment with the same name but newer normGen or delGen. If there is one then a new SegmentReader instance is created that reuses resources like e. g. TermInfosReader and loads the new generation of the del or norm file.
>
> Now you can end up having a bunch of SegmentReaders that share the same resources but don't know about each other. The reference counting would have to happen somewhere else, e. g. in the TermInfosReader. Of course this is doable, but it's a special case and more complicated compared to the "restrict reopen() to only be called once"-approach.

For SegmentReader, I think on reopen(), everything would be shared
except norms and/or deletedDocs right? In which case couldn't all
cascaded reopens hold onto the original SegmentReader & call doClose()
on it when they are closed? (Ie it is the "owner" of all the shared
parts of a SegmentReader). Then, deletedDocs needs no protection (it
has no close()), and for Norms we could push the reference counting
down into it as well?

We wouldn't need to push reference counting into all the readers that
a SegmentReader holds because those are always shared when a
SegmentReader is reopened?

> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 19, 2007, 2:50 PM

Post #33 of 71 (763 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536353 ]

Michael Busch commented on LUCENE-743:
--------------------------------------

Hi Mike,

I'm not sure if I fully understand your comment. Consider the following (quite constructed) example:

{code:java}
IndexReader reader1 = IndexReader.open(index1); // optimized index, reader1 is a SegmentReader
IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
... // modify index2
IndexReader multiReader2 = multiReader1.reopen();
// only index2 changed, so multiReader2 uses reader1 and has to increment the refcount of reader1
... // modify index1
IndexReader reader2 = reader1.reopen();
// reader2 is a new instance that shares resources with reader1
... // modify index1
IndexReader reader3 = reader2.reopen();
// reader3 is a new instance that shares resources with reader1 and reader2
{code}

Now the user closes the readers in this order:
# multiReader1.close();
# multiReader2.close();
# reader2.close();
# reader3.close();

reader1 should be marked as closed after 2., right? Because multiReader1.close() and multiReader2.close() have to decrement reader1's refcount. But the underlying files have to remain open until after 4., because reader2 and reader3 use reader1's resources.

So don't we need 2 refcount values in reader1? One that tells us when the reader itself can be marked as closed, and one that tells when the resources can be closed? Then multiReader1 and multiReader2 would decrement the first refCount, whereas reader2 and reader3 both have to "know" reader1, so that they can decrement the second refcount.

I hope I'm just completely confused now and someone tells me that the whole thing is much simpler :-)



> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 20, 2007, 2:09 AM

Post #34 of 71 (759 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536409 ]

Michael McCandless commented on LUCENE-743:
-------------------------------------------

It's not nearly this complex (we don't need two ref counts). If we
follow the simple rule that "every time reader X wants to use reader
Y, it increfs it" and "whenver reader X is done using reader Y, it
decrefs it", all should work correctly.

Also we should think of "close()" as the way that the external user
does the decref of their reader. We just special-case this call, by
setting isOpen=false, to make sure we don't double decref on a double
close call.

Let's walk through your example...

I'm assuming in your example you meant for reader2 and reader3 to also
be SegmentReaders? Ie, the changes that are happening to the
single-segment index1 are just changes to norms and/or deletes. If
not, the example is less interesting because reader1 will be closed
sooner :)

Also in your example let's insert missing "reader1.close()" as the
very first close? (Else it will never be closed because it's RC never
hits 0).

When reader1 is created it has RC 1.

When multiReader1 is created, reader1 now has RC 2.

When multiReader2 is created, reader1 now has RC 3.

When reader2 is created (by reader1.reopen()), it incref's reader1
because it's sharing the sub-readers in reader1. So reader1 now has
RC 4.

When reader3 was created (by reader2.reopen()), it incref's reader2
because it's sharing the sub-readers reader2 contains. So reader1 is
still at RC 4 and reader2 is now at RC 2.

Now, we close.

After reader1.close() is called, reader1 sets isOpen=false (to prevent
double close by the user) and RC drops to 3.

With multiReader1.close(), multiReader1 is not at RC 0, and so it
decrefs all readers it was using, and so reader1 RC is now 2.

With multiReader2.close(), likewise it is now at RC 0 and so it
decrefs all readers it was using, and so reader1 RC is now 1.

With reader2.close(), it decrefs its own RC, however that brings its
RC to 1 (reader3 is still referring to it) and so it does not decref
the reader1 that it's referring to.

Finally, with reader3.close(), it is now at RC 0 and so it decrefs the
reader2 it refers to. This brings reader2's RC to 0, and so reader2
decrefs the reader1 that it's referring to. Which brings reader1's RC
to 0, and so reader1 finally closes all its internal sub-readers.


> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 20, 2007, 2:46 AM

Post #35 of 71 (763 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536413 ]

Michael Busch commented on LUCENE-743:
--------------------------------------

{quote}
I'm assuming in your example you meant for reader2 and reader3 to also
be SegmentReaders?
{quote}
Yes that's what I meant. Sorry, I didn't make that clear.

{quote}
Also in your example let's insert missing "reader1.close()" as the
very first close? (Else it will never be closed because it's RC never
hits 0).
{quote}
Doesn't what you describe change the semantics of MultiReader.close()?

If you do:
{code:java}
IndexReader reader1 = IndexReader.open(index1);
IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
multiReader1.close();
{code}

then today multiReader1.close() also closes reader1. That's why I consciously omitted reader1.close().

Consequently, if you do
{code:java}
IndexReader reader1 = IndexReader.open(index1);
IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
IndexReader multiReader2 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index3)});
multiReader1.close();
{code}
then multiReader2 is not usable anymore, because multiReader1.close() closes reader1. But that can be explicitly avoided by the user because it's known that multiReader1 and multiReader2 share the same reader.

Now, with the reopen() code:
{code:java}
IndexReader reader1 = IndexReader.open(index1); // optimized index, reader1 is a SegmentReader
IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
... // modify index2
IndexReader multiReader2 = multiReader1.reopen();
// only index2 changed, so multiReader2 uses reader1 and has to increment the refcount of reader1
{code}
The user gets a new reader instance from multiReader.reopen(), but can't tell which of the subreaders has been reopened and which are shared. That's why multiReader1.close() should not close reader1 in this case and we need refcounting in order to make this work.

So do you suggest that a MultiReader should increment the refcounts when it is opened as well (in the constructor)? I believe we can implement it like this, but as I said it changes the semantics of MultiReader.close() (and ParallelReader.close() is, I believe, the same). A user would then have to close subreaders manually.



> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 20, 2007, 3:45 AM

Post #36 of 71 (759 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536418 ]

Michael McCandless commented on LUCENE-743:
-------------------------------------------

{quote}
If you do:
{code:java}
IndexReader reader1 = IndexReader.open(index1);
IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
multiReader1.close();
{code}

then today multiReader1.close() also closes reader1. That's why I consciously omitted reader1.close().
{quote}

Ahh, I missed that MultiReader is allowed to close all readers that
were passed into it, when it is closed. OK, let's leave
reader1.close() out of the example.

It's somewhat "aggressive" of MultiReader/ParallelReader to do that?
If you go and use those same sub-readers in other MultiReaders then
they closing of the first MultiReader will then break the other ones?

I think we are forced to keep this semantics, for backwards
compatibility. But I don't really think MultiReader/ParallelReader
should actually be this aggressive. Maybe in the future we can add
ctors for MultiReader/ParallelReader that accept a "doClose" boolean
to turn this off.

Anyway, it's simple to preserve this semantics with reference
counting. It just means that IndexReader / MultiReader do not incref
the readers they receive, and, when they are done with those readers,
they must call their close(), not decref. Ie they "borrow the
reference" that was passed in, rather than incref'ing their own
reference, to the child readers.

With that change, plus the change below, your example works fine.

{quote}
Consequently, if you do
{code:java}
IndexReader reader1 = IndexReader.open(index1);
IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
IndexReader multiReader2 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index3)});
multiReader1.close();
{code}
then multiReader2 is not usable anymore, because multiReader1.close() closes reader1. But that can be explicitly avoided by the user because it's known that multiReader1 and multiReader2 share the same reader.
{quote}

This is why I don't like the semantics we have today -- I don't think
it's right that the multiReader1.close() breaks multiReader2.

{quote}
Now, with the reopen() code:
{code:java}
IndexReader reader1 = IndexReader.open(index1); // optimized index, reader1 is a SegmentReader
IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
... // modify index2
IndexReader multiReader2 = multiReader1.reopen();
// only index2 changed, so multiReader2 uses reader1 and has to increment the refcount of reader1
{code}
The user gets a new reader instance from multiReader.reopen(), but can't tell which of the subreaders has been reopened and which are shared. That's why multiReader1.close() should not close reader1 in this case and we need refcounting in order to make this work.
{quote}

Both of these cases are easy to fix with reference counting: we just
have to change ensureOpen() to assert that RC > 0 instead of
closed==false. Ie, a reader may still be used as long as its RC is
still non-zero.


> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 20, 2007, 4:06 AM

Post #37 of 71 (760 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536419 ]

Michael McCandless commented on LUCENE-743:
-------------------------------------------

{quote}
I think we are forced to keep this semantics, for backwards
compatibility. But I don't really think MultiReader/ParallelReader
should actually be this aggressive. Maybe in the future we can add
ctors for MultiReader/ParallelReader that accept a "doClose" boolean
to turn this off.
{quote}

Actually I retract this: it's no longer necessary as long as we change
ensureOpen to assert that RC > 0 instead of closed==false.

I think this is actually a nice unexpected side-effect of using
reference counting: it resolves this overly aggressive behavior of
MultiReader/ParallelReader.


> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 22, 2007, 12:36 PM

Post #38 of 71 (745 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536805 ]

Michael Busch commented on LUCENE-743:
--------------------------------------

{quote}
With that change, plus the change below, your example works fine.
{quote}

Two things:
- MultiReader/ParallelReader must not incref the subreaders on open()
as you said. But on reopen() it must incref the subreaders that
haven't changed and thus are shared with the old MultiReader/
ParallelReader. This further means, that the re-opened MultiReader/
ParallelReader must remember which of the subreaders to decref on
close(), right?

- If we change ensureOpen() like you suggest, then the user might
still be able to use reader1 (in my example), even after
reader1.close() was explicitly called. Probably not a big deal?


> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 22, 2007, 1:37 PM

Post #39 of 71 (739 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536827 ]

Michael McCandless commented on LUCENE-743:
-------------------------------------------

{quote}
* MultiReader/ParallelReader must not incref the subreaders on open()
as you said. But on reopen() it must incref the subreaders that
haven't changed and thus are shared with the old MultiReader/
ParallelReader. This further means, that the re-opened MultiReader/
ParallelReader must remember which of the subreaders to decref on
close(), right?
{quote}

Hmm, right. MultiReader/ParallelReader must keep track of whether it
should call decref() or close() on each of its child readers, when it
itself is closed.

{quote}
* If we change ensureOpen() like you suggest, then the user might
still be able to use reader1 (in my example), even after
reader1.close() was explicitly called. Probably not a big deal?
{quote}

I think this is OK?


> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 22, 2007, 2:36 PM

Post #40 of 71 (741 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536845 ]

Michael Busch commented on LUCENE-743:
--------------------------------------

{quote}
I think this is OK?
{quote}

This was essentially the reason why I suggested to use two refcount values:
one to control when to close a reader, and one to control when to close
it's (shared) resources in case of SegmentReader. That approach would not
alter the behaviour of IndexReader.close().
But I agree that your approach is simpler and I also think it is okay to
change ensureOpen() and accept the slight API change.

So I'll go ahead and implement the refcount approach unless anybody objects.

Oh and Mike, thanks for bearing with me :-)

> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 22, 2007, 2:55 PM

Post #41 of 71 (740 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536848 ]

Yonik Seeley commented on LUCENE-743:
-------------------------------------

What about a new constructor for MultiReader/ParallelReader that implements more sensible semantics (increment refcount on readers passed to it, and decrement on close())?

> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 22, 2007, 4:08 PM

Post #42 of 71 (740 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536862 ]

Michael Busch commented on LUCENE-743:
--------------------------------------

{quote}
What about a new constructor for MultiReader/ParallelReader that implements more sensible semantics (increment refcount on readers passed to it, and decrement on close())?
{quote}

Yeah, when reference counting is implemented then such a constructor should be easy to add.

> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 22, 2007, 4:31 PM

Post #43 of 71 (741 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536866 ]

Michael McCandless commented on LUCENE-743:
-------------------------------------------

{quote}
Oh and Mike, thanks for bearing with me :-)
{quote}

Thank you for bearing with me!

{quote}
What about a new constructor for MultiReader/ParallelReader that implements more sensible semantics (increment refcount on readers passed to it, and decrement on close())?
{quote}

+1

> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 30, 2007, 1:56 AM

Post #44 of 71 (702 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538705 ]

Michael McCandless commented on LUCENE-743:
-------------------------------------------

Patch looks great! I'm still working through it but found a few small
issues...

It might be good to put a "assert refCount > 0" at various places like
decRef(), incRef(), ensureOpen()? That would require changing the
constructors to init refCount=1 rather than incRef() it to 1.

I'm seeing a failure in contrib/memory testcase:

{code}
[junit] *********** FILE=./NOTICE.txt
[junit] Fatal error at query=Apache, file=./NOTICE.txt, anal=org.apache.lucene.analysis.SimpleAnalyzer [at] 34196
[junit] ------------- ---------------- ---------------
[junit] Testcase: testMany(org.apache.lucene.index.memory.MemoryIndexTest): Caused an ERROR
[junit] this IndexReader is closed
[junit] org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
[junit] at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:158)
[junit] at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:632)
[junit] at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:64)
[junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:143)
[junit] at org.apache.lucene.search.Searcher.search(Searcher.java:118)
[junit] at org.apache.lucene.search.Searcher.search(Searcher.java:97)
[junit] at org.apache.lucene.index.memory.MemoryIndexTest.query(MemoryIndexTest.java:412)
[junit] at org.apache.lucene.index.memory.MemoryIndexTest.run(MemoryIndexTest.java:313)
[junit] at org.apache.lucene.index.memory.MemoryIndexTest.testMany(MemoryIndexTest.java:234)
{code}

I think it's because MemoryIndexReader (private class in
MemoryIndex.java) calls super(null) =
IndexReader.IndexReader(Directory) in its constructor, which does not
initialize the refCount to 1? If I insert incRef() into
IndexReader.IndexReader(Directory) constructor, the test passes, but
who else is using that constructor (ie will this double-incref in
those cases?).


> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743-take3.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 30, 2007, 4:42 AM

Post #45 of 71 (698 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538743 ]

Michael McCandless commented on LUCENE-743:
-------------------------------------------

OK I think this patch is very close! I finished reviewing it --
here's some more feedback:

- In multiple places you catch an IOException and undo the attempted
re-open, but shouldn't this be a try/finally instead so you also
clean up on hitting any unchecked exceptions?

- I think you need an explicit refCount for the Norm class in
SegmentReader.
.
Say I've done a chain of 10 re-opens for SegmentReader and each
time only the segment's norms has changed. I've closed all but
the last SegmentReader. At this point all 10 SegmentReaders are
still alive (RC > 0) and holding open all file handles for their
copies of the norms. So this will leak file handles/RAM with each
reopen?
.
To fix this, I think you just need to add refCount into Norm class
& set refCount to 1 in the constructor. Then, each each
SegmentReader calls Norm.decRef(), not Norm.close(), when it's
done. When refCount hits 0 then the Norm closes itself. Finally,
during re-open you should share a Norm instance (rather than open
a new one) if it had not changed from the previous SegmentReader.
.
For singleNormStream, I think each reopened SegmentReader should
always re-open this descriptor and then we can forcefully close
this stream when the SegmentReader is closed (what you are doing
now). Ie the SegmentReader fully owns singleNormStream.

- If you have a long series of reopens, then, all SegmentReaders in
the chain will remain alive. So this is a [small] memory leak
with time. I think if you changed referencedSegmentReader to
always be the *starting* SegmentReader then this chain is broken
and after 10 reopens only the original SegmentReader and the most
recent one will remain alive (assuming I closed all SegmentReaders
but the most recent one).


> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743-take3.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 31, 2007, 11:05 AM

Post #46 of 71 (690 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539139 ]

Michael McCandless commented on LUCENE-743:
-------------------------------------------

Looks great! All tests pass for me.

{quote}
OK I see. I made this change as well. I also made the change that
there is no chain, but one starting SegmentReader which all re-opened
ones reference (see below). Now this starting SegmentReader won't close
its norms until all other readers that reference it are closed (RC=0),
because only then doClose() is called, which calls closeNorms().
Do you see an easy way how to improve this?
{quote}

How about if SegmentReader.close() always calls Norm.decRef(),
immediately, for each Norm is has open? EG you could implement
doCloseUnsharedResources in SegmentReader and do it there). This way,
if the SegmentReader has been closed but it shares resources (and not
the Norms) with reopened SegmentReaders then its Norms would all
decRef to 0 & be closed.

Also make sure that if a SegmentReader is decRef'd to 0 and close was
never called, that also in this case you remember to call Norm.decRef
for all open norms.

One more comment: I think you can partially share Norm instances? Eg
if I have 2 fields that have norms, but only one of them changed since
I opened this SegmentReader, then the reopened SegmentReader could
share the Norm instance of the field that didn't change with the old
SegmentReader? But right now you're re-loading all the Norms.

Otherwise no more comments!

> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743-take3.patch, lucene-743-take4.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 31, 2007, 3:57 PM

Post #47 of 71 (690 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539208 ]

Michael Busch commented on LUCENE-743:
--------------------------------------

{quote}
How about if SegmentReader.close() always calls Norm.decRef(),
immediately, for each Norm is has open? EG you could implement
doCloseUnsharedResources in SegmentReader and do it there). This way,
{quote}

Hmm I was thinking about this before (that's actually why I put that
method in there). But I don't think this is gonna work. For example,
let's say we use a MultiReader that has two SegmentReader SR1 and SR2.
Now only SR2 changed, we reopen the MR which increases the refCount on
SR1, because it shares that SR. Now we close the old MultiReader, which
calls close() on SR1. If now SegmentReader.close() calls Norm.decRef(),
then it will close the norms even though they are still used by the new
MultiReader.


> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743-take3.patch, lucene-743-take4.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 31, 2007, 3:57 PM

Post #48 of 71 (691 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539209 ]

Michael Busch commented on LUCENE-743:
--------------------------------------

{quote}
One more comment: I think you can partially share Norm instances? Eg
{quote}
Good idea! Will make the change.

> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743-take3.patch, lucene-743-take4.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Nov 1, 2007, 6:12 AM

Post #49 of 71 (683 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539359 ]

Michael McCandless commented on LUCENE-743:
-------------------------------------------

{quote}
Hmm I was thinking about this before (that's actually why I put that
method in there). But I don't think this is gonna work. For example,
let's say we use a MultiReader that has two SegmentReader SR1 and SR2.
Now only SR2 changed, we reopen the MR which increases the refCount on
SR1, because it shares that SR. Now we close the old MultiReader, which
calls close() on SR1. If now SegmentReader.close() calls Norm.decRef(),
then it will close the norms even though they are still used by the new
MultiReader.
{quote}

Ugh, you're right. The challenge is sometimes a reference to SR means
"I will use norms" (this is when MultiReader incRefs) but other times
it means "I will not use norms" (this is when SR incRefs due to
reopen).

OK, I like your original proposal: SR overrides incRef() and incrs its
RC as well as the RC for each Norm it's using. Then, in SR's
reopenSegment, you carefully incRef the "original" SR without
incRef'ing its Norms (except for those Norms you will keep).

Likewise, SR overrides decRef() to decr its RC and RC for each Norm.
But, when a reopened SR1.doClose() is called, you must carefully
decRef the RD of the original SR but not decRef each of its Norms
(except for those you had actually shared).

This way when MR calls SR.incRef/decRef then all Norms and the SR's RC
are incr'd/decr'd. But when SR1 shares resources with an original SR
it only incr's/decr's the refCount of the SR.


> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743-take3.patch, lucene-743-take4.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Nov 2, 2007, 7:10 AM

Post #50 of 71 (682 views)
Permalink
[jira] Commented: (LUCENE-743) IndexReader.reopen() [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539588 ]

Yonik Seeley commented on LUCENE-743:
-------------------------------------

I just did a quick partial review of SegmentReader for thread safety only and ran across some potential issues

- It looks like fieldsReader is shared between clones(), and that isn't thread-safe (synchronization is done at the SegmentReader level, and now there is more than one)
- maybe the same issue with deletedDocs? mutual exclusion is no longer enforced.
- it looks like the norms Hashtable could be shared... looping over the individual norms and calling incRef doesn't seem safe for a number of reasons (for example, you might miss some just being added)
- reading new norms isn't safe...
synchronized norms(String field, byte[] bytes, int offset) uses the "norm' IndexInput that is shared. synchronization on a single reader no longer guarantees mutual exclusion.

There's probably other stuff, but I stopped looking. Since we are sharing things now, every method that was synchronized is now potentially unsafe. Synchronizing on the object being shared is probably a much better strategy now.

This is complex enough that in addition to review, I think we need a good multi-threaded test - 100 or 1000 threads over a ram directory, all changing, querying, retrieving docs, reopening, closing, etc.

> IndexReader.reopen()
> --------------------
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Otis Gospodnetic
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.3
>
> Attachments: IndexReaderUtils.java, lucene-743-take2.patch, lucene-743-take3.patch, lucene-743-take4.patch, lucene-743-take5.patch, lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene

First page Previous page 1 2 3 Next page Last page  View All Lucene java-dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.