Mailing List Archive: Lucene: Java-Dev

[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force)

 

 



jira at apache

Jun 24, 2008, 6:24 AM

Post #1 of 20 (673 views)
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force)

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607609#action_12607609 ]

Jason Rutherglen commented on LUCENE-1314:
------------------------------------------

A package-protected field "boolean openNewFieldsReader = true;" (defaulting to true to mimic the previous behavior) should be added to SegmentReader so that subclasses can decide whether they want a new fieldsReader opened every time a reopen occurs. SegmentReader.doClose would then need to leave fieldsReader open when openNewFieldsReader is set to false.

The SegmentReader.reopenSegment method directly instantiates a SegmentReader rather than using IMPL like SegmentReader.get(Directory dir, SegmentInfo si, SegmentInfos sis, boolean closeDir, boolean ownDir, int readBufferSize, boolean doOpenStores) does.

In my SegmentReader subclass I am passing a lock and passing a reference to fieldsReader for global locking and a single fieldsReader across all instances. Otherwise there are too many instances of fieldsReader and file descriptors will be used up.

> IndexReader.reopen(boolean force)
> ---------------------------------
>
> Key: LUCENE-1314
> URL: https://issues.apache.org/jira/browse/LUCENE-1314
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Index
> Affects Versions: 2.3.1
> Reporter: Jason Rutherglen
> Assignee: Michael McCandless
> Priority: Minor
> Attachments: lucene-1314.patch
>
>
> Based on discussion http://www.nabble.com/IndexReader.reopen-issue-td18070256.html. The problem is reopen returns the same reader if there are no changes, so if docs are deleted from the new reader, they are also reflected in the previous reader which is not always desired behavior.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Jun 24, 2008, 10:02 PM

Post #2 of 20 (640 views)
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607857#action_12607857 ]

Nadav Har'El commented on LUCENE-1314:
--------------------------------------

At first glance, my opinion was that adding this flag to reopen() is confusing. reopen()'s current behavior is explained well in the documentation, and has a particular use case in mind (checking if the index has changed, and if it has, reopen it). Frankly, I didn't understand why reopen() should (even with the addition of a new parameter) clone or "copy on write" the IndexReader when the index hasn't changed.
If this capability is needed, wouldn't it have been clearer if IndexReader had some new clone() or copyOnWrite() (in IndexReader's case, a write would actually be a delete...) method that can be called to get a new object that behaves independently from the previous one when it comes to writing (again, a delete)?
In your code, you could then do something like

newIndexReader = indexReader.reopen();
if (newIndexReader == indexReader) {
    newIndexReader = indexReader.clone(); // copy on write
} else {
    indexReader.close(); // most applications won't do this here, but never mind now.
}
indexReader = newIndexReader;

I thought that this was a cleaner API, because reopen() isn't complicated with an extra flag that has nothing to do with its intended function, and the new clone() or copyOnWrite() method can also be used in other situations when you want different objects of the same index to handle deletes separately.

But on second glance, it dawned on me: You can't actually delete on both objects at once, because when you start deleting in one object, it holds a lock and then you can't do deletions in the second object! So I have to admit, the usefulness of a general clone/copyOnWrite feature for IndexReader is quite limited. My suggestion above can still be the API, but I admit it will hardly be useful in any situation except (the rare situation nowadays of?) a reopen() and later deletes.



jira at apache

Jun 25, 2008, 3:53 AM

Post #3 of 20 (637 views)
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607954#action_12607954 ]

Michael McCandless commented on LUCENE-1314:
--------------------------------------------

bq. In my SegmentReader subclass I am passing a lock and passing a reference to fieldsReader for global locking and a single fieldsReader across all instances. Otherwise there are too many instances of fieldsReader and file descriptors will be used up.

Maybe instead we should just fix access to FieldsReader to be thread safe, either by making FieldsReader itself thread safe, or by doing something similar to what's done for TermVectorsReader (where each thread makes a "shallow" clone of the original TermVectorsReader, held in a ThreadLocal instance). If we do that, then in SegmentReader.doReopen() we never have to clone FieldsReader.
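To illustrate the per-thread shallow-clone idea, here is a minimal sketch. It does not use Lucene's actual classes: `SketchFieldsReader` and `PerThreadFieldsReader` are hypothetical stand-ins, and the real FieldsReader and TermVectorsReader internals may well differ.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a reader whose cursor is not thread safe. */
class SketchFieldsReader implements Cloneable {
    private final String[] storedDocs; // shared, read-only data (stands in for the stored-fields file)
    private int position;              // per-instance cursor, unsafe to share between threads

    SketchFieldsReader(String[] storedDocs) {
        this.storedDocs = storedDocs;
    }

    String doc(int n) {
        position = n;                  // simulates a seek
        return storedDocs[position];
    }

    /** Shallow clone: the copy shares storedDocs but has its own cursor. */
    @Override
    public SketchFieldsReader clone() {
        return new SketchFieldsReader(storedDocs);
    }
}

/** Hands each calling thread its own shallow clone, created lazily on first use. */
class PerThreadFieldsReader {
    private final AtomicInteger clonesMade = new AtomicInteger();
    private final ThreadLocal<SketchFieldsReader> perThread;

    PerThreadFieldsReader(final SketchFieldsReader original) {
        this.perThread = ThreadLocal.withInitial(() -> {
            clonesMade.incrementAndGet();
            return original.clone();
        });
    }

    String doc(int n) {
        return perThread.get().doc(n);
    }

    int clonesMade() {
        return clonesMade.get();
    }
}
```

In this sketch the number of clones grows with the number of threads that touch the reader, not with the number of calls.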



jira at apache

Jun 25, 2008, 6:39 AM

Post #4 of 20 (637 views)
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608039#action_12608039 ]

Jason Rutherglen commented on LUCENE-1314:
------------------------------------------

Here is the code of the SegmentReader subclass. The clone terminology would work as well; inside SegmentReader, the clone would most likely reuse SegmentReader.reopenSegment. The subclass turns off locking by overriding acquireWriteLock so that it does nothing. I do not know of a general fix for the locking issue mentioned ("it holds a lock and then you can't do deletions in the second object"). Perhaps there is a way using lockless commits. SegmentReader could be made to fail when deletes are applied to an earlier IndexReader and a flush is attempted, rather than failing in a newer IndexReader as it would now. This would require keeping track of later IndexReaders, which is something Ocean does outside of IndexReader.

As far as the FieldsReader goes, given how many SegmentReaders Ocean creates (up to one per update), a shallow-clone ThreadLocal would still potentially create many file descriptors. I would rather see a synchronized FieldsReader, or simply use the approach in the code below. The external lock seems OK because there is little contention for reading Documents: no more than in a normal Lucene application that uses a single IndexReader to load documents for N results.

{code}
public class OceanSegmentReader extends SegmentReader {
  protected ReentrantLock fieldsReaderLock;

  public OceanSegmentReader() {
    openNewFieldsReader = false;
  }

  protected void doInitialize() {
    fieldsReaderLock = new ReentrantLock();
  }

  protected void acquireWriteLock() throws IOException {
  }

  protected synchronized DirectoryIndexReader doReopen(SegmentInfos infos, boolean force) throws CorruptIndexException, IOException {
    OceanSegmentReader segmentReader = (OceanSegmentReader)super.doReopen(infos, force);
    segmentReader.fieldsReaderLock = fieldsReaderLock;
    return segmentReader;
  }

  /**
   * @throws CorruptIndexException
   *           if the index is corrupt
   * @throws IOException
   *           if there is a low-level IO error
   */
  public synchronized Document document(int n, FieldSelector fieldSelector) throws CorruptIndexException, IOException {
    ensureOpen();
    if (isDeleted(n))
      throw new IllegalArgumentException("attempt to access a deleted document");
    fieldsReaderLock.lock();
    try {
      return getFieldsReader().doc(n, fieldSelector);
    } finally {
      fieldsReaderLock.unlock();
    }
  }
}
{code}



jira at apache

Jun 26, 2008, 6:34 PM

Post #5 of 20 (620 views)
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608635#action_12608635 ]

Jason Rutherglen commented on LUCENE-1314:
------------------------------------------

Using the patch and the above subclass of SegmentReader, I received the following error. I am assuming it has something to do with SegmentInfos committing. Ideally the new clone method of IndexReader will avoid things like reloading SegmentInfos from disk each time, which would probably slow down the rapid updates too much.

{noformat}
1) testSearch(org.apache.lucene.ocean.TestSearch)java.lang.AssertionError: delete count mismatch: info=1 vs BitVector=5
at org.apache.lucene.index.SegmentReader.loadDeletedDocs(SegmentReader.java:365)
at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:328)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:267)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:235)
at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:90)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:649)
at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:97)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:213)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
{noformat}



jira at apache

Jun 29, 2008, 2:45 AM

Post #6 of 20 (598 views)
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609071#action_12609071 ]

Michael McCandless commented on LUCENE-1314:
--------------------------------------------

bq. Using the patch and the above subclass of SegmentReader received the following bug. I am assuming it has something to do with SegmentInfos committing. Ideally the new clone method of IndexReader will avoid things like reloading SegmentInfos from disk each time. That will probably slow down the rapid updates too much.

Right, that exception happens because you are carrying your own deletedDocs in memory to the new SegmentReader without first saving them to the _X_N.del file for that segment. The new clone() approach definitely should not reload the segments_N file, and thus not call SegmentReader.initialize.



jira at apache

Jun 29, 2008, 2:51 AM

Post #7 of 20 (601 views)
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609072#action_12609072 ]

Michael McCandless commented on LUCENE-1314:
--------------------------------------------

bq. It is possible to have SegmentReader implement if deletes occur to an earlier IndexReader and a flush is tried it fails, rather than fail in a newer IndexReader like it would now. This would require keeping track of later IndexReaders which is something Ocean does outside of IndexReader.

I think this is tricky, since SegmentReader doesn't explicitly track whether there is a "cloned" reader out there. As things stand now, there is no such thing as a cloned reader, and so the only way that another SegmentReader is out there is if there have been commits to the index, in which case isCurrent() returns false and the old reader will not allow deletes to be performed. I suppose we could look at the refCount of the IndexReader: any reader that has been cloned and not yet closed will have a refCount > 1, whereas the last reader returned from a clone() call will have refCount 1.

So unless we try to track this, when there are N clones out there, any one of them will be allowed to grab the write lock when a change (deletion or setNorm) is attempted, thus preventing all the other clones (and all readers open on previous commits) from making changes.
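The "first clone to make a change wins" behavior can be sketched as follows. This is only an illustration under stated assumptions: `SketchLockingReader` is a hypothetical class, the shared `AtomicReference` stands in for the index write.lock, and an `IllegalStateException` stands in for whatever exception Lucene would actually throw when the lock cannot be obtained.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical reader whose clones all share one write-lock slot. */
class SketchLockingReader {
    // Shared by every clone; holds the one reader currently allowed to make changes.
    private final AtomicReference<SketchLockingReader> writeLockOwner;
    private int pendingDeletes;

    SketchLockingReader() {
        this(new AtomicReference<SketchLockingReader>());
    }

    private SketchLockingReader(AtomicReference<SketchLockingReader> sharedLock) {
        this.writeLockOwner = sharedLock;
    }

    /** The clone shares the lock slot with the reader it was cloned from. */
    SketchLockingReader cloneReader() {
        return new SketchLockingReader(writeLockOwner);
    }

    void deleteDocument(int docNum) {
        acquireWriteLock();
        pendingDeletes++;
    }

    /** The first reader to attempt a change claims the lock; all others are refused. */
    private void acquireWriteLock() {
        if (writeLockOwner.get() == this) {
            return; // already the owner
        }
        if (!writeLockOwner.compareAndSet(null, this)) {
            throw new IllegalStateException("write lock is held by another reader");
        }
    }

    int pendingDeletes() {
        return pendingDeletes;
    }
}
```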



jira at apache

Jun 30, 2008, 10:11 PM

Post #8 of 20 (587 views)
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609456#action_12609456 ]

Hoss Man commented on LUCENE-1314:
----------------------------------

I haven't really been following this issue, but this comment caught my eye...

bq. So unless we try to track this, when there are N clones out there, any one of them will be allowed to grab the write lock when a change (deletion or setNorm) is attempted, thus preventing all the other clones (and all readers open on previous commits) from making changes.

...and gave me an eerie sense of deja vu. Looking back at LUCENE-743, the original approach for reopen was based on cloning and prompted this comment from me...

https://issues.apache.org/jira/browse/LUCENE-743?focusedCommentId=12534123#action_12534123

...which led to some interesting discussion about what the semantics of cloning an IndexReader should be (even though ultimately cloning wasn't used).





jira at apache

Jul 2, 2008, 11:27 AM

Post #9 of 20 (553 views)
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609999#action_12609999 ]

Michael McCandless commented on LUCENE-1314:
--------------------------------------------

bq. ...which led to some interesting discussion about what the semantics of cloning an IndexReader should be (even though ultimately cloning wasn't used).

In fact that 2nd point you raised ("what happens when you clone an IndexReader that has pending changes") makes me nervous here. I think I'd prefer that we disallow that (throw an exception when this is attempted). Ie you can only clone a reader that has no pending changes.
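A minimal sketch of that rule, assuming a hypothetical `SketchPendingReader` class and using `IllegalStateException` as a placeholder for whichever exception type would actually be chosen:

```java
/** Hypothetical reader that refuses to be cloned while it has uncommitted changes. */
class SketchPendingReader {
    private boolean hasChanges;

    void deleteDocument(int docNum) {
        hasChanges = true;  // the deletion is now pending until commit()
    }

    void commit() {
        hasChanges = false; // pending changes are considered written out
    }

    SketchPendingReader cloneReader() {
        if (hasChanges) {
            throw new IllegalStateException("cannot clone a reader with pending changes; commit first");
        }
        return new SketchPendingReader();
    }
}
```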

Also Jason you added set/getWriteLock: how come you couldn't just customize LockFactory for that, instead? Eg if you want to turn off locking you can just use NoLockFactory.

{quote}
Would like to be able to optionally have this line run in DirectoryIndexReader in reopen. Does it need to be run on a clone?

SegmentInfos infos = new SegmentInfos();
infos.read(directory, segmentFileName);
{quote}

I agree that line should not run on clone().



jira at apache

Jul 3, 2008, 7:54 AM

Post #10 of 20 (533 views)
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610244#action_12610244 ]

Jason Rutherglen commented on LUCENE-1314:
------------------------------------------

> check isCurrent()

I thought we wanted to check, when a clone commits, that the index is current? Does that check need to live in a clone-only portion of the code? Which class is best?

> clone() the norms

We need to clone norms. I want to make cloning deletedDocs and norms optional, mainly because it is a waste in Ocean to clone norms. Is the best way to pass the options as parameters to the clone method (breaking Cloneable)? An additional option could be readOnly. Perhaps norms or deletedDocs become readOnly if they are copied by reference and not cloned. IndexReader.open and reopen would need a readOnly parameter. Or should a subclass of SegmentReader handle cloning or referencing norms and deletedDocs?

I think it may be easiest to have readOnly be a part of this patch. I wanted to separate out the FieldsReader synchronization code into a separate patch but then this patch would have been messed up without it (the new FieldsReader per SegmentReader issue). Readonly may end up being similar.

The newlines are another Eclipse thing I haven't figured out yet.



jira at apache

Jul 3, 2008, 8:38 AM

Post #11 of 20 (532 views)
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610260#action_12610260 ]

Michael McCandless commented on LUCENE-1314:
--------------------------------------------

bq. I thought we wanted to check a commit on a clone that the index is current? Does it need to be in a clone only portion of the code? Which class is best?

I think checking when a change is first made (and grabbing the write.lock to prevent others) is safer? Else you could lose changes you had made. Ie if there are clones out there, the first one that starts making changes prevents any others from doing so. But I feel like I'm missing something about Ocean: was there some driver for checking on commit instead?

bq. We need to clone norms. I want to make cloning deletedDocs and norms optional mainly because it is a waste in Ocean to clone norms. Is the best way to give the option parameters to the clone method (breaking Cloneable)? An additional option could be readOnly. Perhaps norms or deletedDocs becomes readOnly if they are ref copied and not cloned. IndexReader.open and reopen would need a readOnly parameter. Or should a subclass of SegmentReader handle cloning or refing norms and deletedDocs.

It'd be nice if we could do a copy-on-write approach. This way no copy is made when you first clone, but if you go to make a change, it makes a private copy only at that point. And you don't have to separately specify a readOnly up front only to find later you didn't pass in the right value.
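A rough sketch of copy-on-write for deleted docs, under stated assumptions: `CowDeletes` is a hypothetical class, `java.util.BitSet` stands in for Lucene's BitVector, and the sketch ignores thread safety entirely.

```java
import java.util.BitSet;

/** Hypothetical deleted-docs holder that copies its bits only on the first write after a clone. */
class CowDeletes {
    private BitSet bits;
    private boolean shared; // true while another instance may reference the same BitSet

    CowDeletes(int maxDoc) {
        this.bits = new BitSet(maxDoc);
    }

    private CowDeletes(BitSet sharedBits) {
        this.bits = sharedBits;
        this.shared = true;
    }

    /** Cloning is cheap: no bits are copied, both instances just mark themselves shared. */
    CowDeletes cloneShared() {
        shared = true;
        return new CowDeletes(bits);
    }

    boolean isDeleted(int n) {
        return bits.get(n);
    }

    void delete(int n) {
        if (shared) {
            bits = (BitSet) bits.clone(); // make the private copy only now
            shared = false;
        }
        bits.set(n);
    }

    boolean sharesStorageWith(CowDeletes other) {
        return bits == other.bits;
    }
}
```

A clone that is never written to never pays for a copy, and no readOnly flag has to be chosen up front.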

bq. I think it may be easiest to have readOnly be a part of this patch. I wanted to separate out the FieldsReader synchronization code into a separate patch but then this patch would have been messed up without it (the new FieldsReader per SegmentReader issue). Readonly may end up being similar.

Maybe instead we wait on the readOnly patch until we resolve this one (ie stage them)?



jira at apache

Jul 3, 2008, 11:05 AM

Post #12 of 20 (529 views)
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610303#action_12610303 ]

Jason Rutherglen commented on LUCENE-1314:
------------------------------------------

There are really only two options here and perhaps this API will work.

// true makes a copy of the data structure always, while false passes the reference if the data structure is read only or makes a copy if it is writeable.
IndexReader.getCopy(boolean normsWriteable, boolean deletesWriteable)
IndexReader.getCopyReadOnly() // defaults to getCopy(false, false)

Clone can be removed or default to getCopy(true, true). The current APIs default to getCopy(true, true). It is good to make this explicit here so that the deletedDocs or norms cannot be changed later when that is the clear intention of the code. It is no different than RandomAccessFile(file, "r") and RandomAccessFile(file, "rw").
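One possible reading of the getCopy semantics described in the comment above, as a sketch only. `SketchCopyReader` is a hypothetical class, plain arrays stand in for the norms and deletedDocs structures, and the choice of `UnsupportedOperationException` for writes to a read-only copy is an assumption.

```java
/** Hypothetical reader illustrating getCopy(normsWriteable, deletesWriteable). */
class SketchCopyReader {
    private final byte[] norms;
    private final boolean[] deleted;
    private final boolean normsWriteable;
    private final boolean deletesWriteable;

    SketchCopyReader(int maxDoc) {
        this(new byte[maxDoc], new boolean[maxDoc], true, true);
    }

    private SketchCopyReader(byte[] norms, boolean[] deleted, boolean normsWriteable, boolean deletesWriteable) {
        this.norms = norms;
        this.deleted = deleted;
        this.normsWriteable = normsWriteable;
        this.deletesWriteable = deletesWriteable;
    }

    /** true always copies; false shares the reference only if the source structure is read-only. */
    SketchCopyReader getCopy(boolean normsWriteable, boolean deletesWriteable) {
        byte[] n = (normsWriteable || this.normsWriteable) ? norms.clone() : norms;
        boolean[] d = (deletesWriteable || this.deletesWriteable) ? deleted.clone() : deleted;
        return new SketchCopyReader(n, d, normsWriteable, deletesWriteable);
    }

    SketchCopyReader getCopyReadOnly() {
        return getCopy(false, false);
    }

    void setNorm(int doc, byte value) {
        if (!normsWriteable) throw new UnsupportedOperationException("norms are read-only");
        norms[doc] = value;
    }

    void deleteDocument(int doc) {
        if (!deletesWriteable) throw new UnsupportedOperationException("deletes are read-only");
        deleted[doc] = true;
    }

    byte norm(int doc) { return norms[doc]; }

    boolean isDeleted(int doc) { return deleted[doc]; }

    boolean sharesNormsWith(SketchCopyReader other) { return norms == other.norms; }
}
```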

Lucene is supposed to be designed for fast reads at the expense of writes, no? This code in SegmentReader with the deletedDocs and norms synchronization goes against that. I think it is important to figure out a solution to give users the option of removing synchronization in SegmentReader, users who are willing to give up a little bit in memory (norms or deletedDocs don't use very much anyways).

> copy-on-write approach

The problem is now with isDeleted. The java.util.concurrent.CopyOnWriteArrayList, for example, uses a volatile list and synchronized update methods, which will not work because of JDK1.4.

> driver for checking on commit

Yes, it is more of an assertion, it can be performed in Ocean as well.



jira at apache

Jul 3, 2008, 11:55 AM

Post #13 of 20 (528 views)
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610318#action_12610318 ]

robert engels commented on LUCENE-1314:
---------------------------------------

volatile should work on just about all 1.4.1+ JVMs, which I think is the required JVM level for Lucene anyway...



jira at apache

Jul 3, 2008, 12:49 PM

Post #14 of 20 (527 views)
Permalink
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610330#action_12610330 ]

Jason Rutherglen commented on LUCENE-1314:
------------------------------------------

Following up on the API comment, there can be a version of the norms or deletedDocs wrapper class for pre-JDK1.5 that uses a synchronized accessor, as demonstrated in http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/CopyOnWriteArrayList.java (works on JDK1.2 and above), and a version for JDK1.5 that uses volatile. This is only for the writeable norms or deletedDocs anyway, but it will yield results for users who continue to use the default API with a writeable IndexReader. The null check can be synchronized, and there can be a global setting that tells the IndexReader to instantiate a new deletedDocs or norms on init.
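A rough sketch of the two wrapper variants, for norms only, might look like the following. All names here (NormsHolder, SynchronizedNormsHolder, VolatileNormsHolder, NormsHolders) are hypothetical and do not come from the patch; the boolean passed to create() merely stands in for the global setting mentioned above.

```java
// Hypothetical names throughout; none of these types exist in Lucene.
interface NormsHolder {
    byte[] get();
    void set(byte[] norms);
}

// Pre-JDK1.5 variant: both accessors synchronize on the holder.
final class SynchronizedNormsHolder implements NormsHolder {
    private byte[] norms;
    public synchronized byte[] get() { return norms; }
    public synchronized void set(byte[] norms) { this.norms = norms; }
}

// JDK1.5 variant: volatile reference, so reads take no lock.
final class VolatileNormsHolder implements NormsHolder {
    private volatile byte[] norms;
    public byte[] get() { return norms; }
    public void set(byte[] norms) { this.norms = norms; }
}

final class NormsHolders {
    // Stand-in for a global setting that picks the variant at init time.
    static NormsHolder create(boolean useVolatile) {
        if (useVolatile) {
            return new VolatileNormsHolder();
        }
        return new SynchronizedNormsHolder();
    }

    // Copy-on-write update: never modify an array that readers may hold.
    static void setNorm(NormsHolder holder, int doc, byte value) {
        synchronized (holder) {
            byte[] copy = (byte[]) holder.get().clone();
            copy[doc] = value;
            holder.set(copy);
        }
    }
}
```

Both variants share the same update path; only the cost of get() differs, which is what matters for read-heavy searching.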



jira at apache

Jul 4, 2008, 3:05 AM

Post #15 of 20 (509 views)
Permalink
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610510#action_12610510 ]

Michael McCandless commented on LUCENE-1314:
--------------------------------------------

Why would you ever need to make a read-only clone of a writable IndexReader? In fact, once we have readOnly IndexReaders, why would you ever clone one? The [precarious] use case that set us down this path (adding clone()) in the first place was to make a clone so that you could make changes to the clone without affecting the original reader.

I think clone() should just clone() and not "alter" the readOnly-ness of the original IndexReader? We could still then under the hood do copy-on-write. We can just do this ourselves -- keep a boolean isShared in SegmentReader that's true when more than one SegmentReader is referencing the norms/deletedDocs.
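As a rough illustration of the isShared idea (not the actual SegmentReader change), a clone can hand out the same deletedDocs instance and defer the copy until either side first writes. The class SharedDeletesReader and its use of java.util.BitSet are assumptions made for this sketch.

```java
// Hypothetical sketch of copy-on-write behind clone(); not SegmentReader code.
final class SharedDeletesReader {
    private java.util.BitSet deletedDocs;
    // True while some other reader may still reference deletedDocs.
    private boolean isShared;

    SharedDeletesReader(int maxDoc) {
        deletedDocs = new java.util.BitSet(maxDoc);
    }

    private SharedDeletesReader(java.util.BitSet shared) {
        deletedDocs = shared;
        isShared = true;
    }

    // Cloning copies nothing; both readers now treat the bits as shared.
    public synchronized SharedDeletesReader clone() {
        isShared = true;
        return new SharedDeletesReader(deletedDocs);
    }

    // First write after sharing makes a private copy, then writes in place.
    public synchronized void deleteDocument(int docId) {
        if (isShared) {
            deletedDocs = (java.util.BitSet) deletedDocs.clone();
            isShared = false;
        }
        deletedDocs.set(docId);
    }

    public synchronized boolean isDeleted(int docId) {
        return deletedDocs.get(docId);
    }
}

final class SharedDeletesReaderDemo {
    public static void main(String[] args) {
        SharedDeletesReader original = new SharedDeletesReader(8);
        original.deleteDocument(1);
        SharedDeletesReader copy = original.clone();
        copy.deleteDocument(5);
        System.out.println(original.isDeleted(5)); // prints "false"
        System.out.println(copy.isDeleted(5));     // prints "true"
        System.out.println(copy.isDeleted(1));     // prints "true"
    }
}
```

With a plain boolean, the last remaining holder still copies once on its first write even though nobody else references the bits any more; a reference count would avoid that extra copy at the cost of more bookkeeping.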

I do think we should make IndexReader.open take a readOnly boolean (LUCENE-1030). In fact, maybe we should go off and do that one first, since it may change our approach to clone? (I.e., swap the order of these two?)



jira at apache

Jul 8, 2008, 5:36 AM

Post #16 of 20 (454 views)
Permalink
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611556#action_12611556 ]

Michael McCandless commented on LUCENE-1314:
--------------------------------------------

Jason, I had some problems with the latest patch. First, it wouldn't compile, because createIndex was private in TestIndexReaderReopen.

Once I changed that to protected, I'm seeing this failure in TestStressIndexing2:

[junit] Testsuite: org.apache.lucene.index.TestStressIndexing2
[junit] Tests run: 2, Failures: 2, Errors: 0, Time elapsed: 9.596 sec

[junit] ------------- Standard Output ---------------
[junit] [.stored/uncompressed,indexed,termVector,termVectorOffsets,termVectorPosition<f71:??? >, stored/uncompressed,indexed,tokenized,termVector,termVectorOffsets,termVectorPosition<f99:E J H J H F E C I C >, stored/uncompressed,indexed,omitNorms<id:1000085>]
[junit] [stored/uncompressed,indexed,termVector,termVectorOffsets,termVectorPosition<f27:???@?p+???l? >, stored/uncompressed,indexed,omitNorms<id:1000000>]
[junit] [stored/uncompressed,indexed,omitNorms<id:12>]
[junit] [stored/uncompressed,indexed,termVector,termVectorOffsets,termVectorPosition,omitNorms<f64:C >, stored/uncompressed,indexed,termVector,termVectorOffsets,termVectorPosition,omitNorms<f90:1???t?? >, stored/uncompressed,indexed,omitNorms<id:0>]
[junit] ------------- ---------------- ---------------
[junit] Testcase: testRandom(org.apache.lucene.index.TestStressIndexing2): FAILED
[junit] expected:<3> but was:<2>
[junit] junit.framework.AssertionFailedError: expected:<3> but was:<2>
[junit] at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:336)
[junit] at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:234)
[junit] at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:193)
[junit] at org.apache.lucene.index.TestStressIndexing2.testRandom(TestStressIndexing2.java:68)


[junit] Testcase: testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED
[junit] expected:<1> but was:<3>
[junit] junit.framework.AssertionFailedError: expected:<1> but was:<3>
[junit] at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:336)
[junit] at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:234)
[junit] at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:193)
[junit] at org.apache.lucene.index.TestStressIndexing2.testMultiConfig(TestStressIndexing2.java:97)


Are you seeing this too?



jira at apache

Jul 8, 2008, 6:56 AM

Post #17 of 20 (451 views)
Permalink
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611585#action_12611585 ]

Jason Rutherglen commented on LUCENE-1314:
------------------------------------------

I am seeing the error. It is not norms, because the test does not test norms. It is either a problem with fieldsReader or with deleted docs.



jira at apache

Jul 10, 2008, 7:30 AM

Post #18 of 20 (379 views)
Permalink
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612511#action_12612511 ]

Michael McCandless commented on LUCENE-1314:
--------------------------------------------

bq. Added protected DirectoryIndexReader.allowCloneWithChanges which allows clones with changes to be made.

This makes me a bit nervous. What does it "mean" to clone an IndexReader that has changes? Normally Lucene only allows one writer at a time on the index.



jira at apache

Jul 10, 2008, 8:08 AM

Post #19 of 20 (378 views)
Permalink
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612525#action_12612525 ]

Jason Rutherglen commented on LUCENE-1314:
------------------------------------------

Because Ocean does not flush to disk with every transaction (deleteDocument, then clone the reader), there is a need to allow cloning of readers that have not flushed changes. I did create a workaround, which involves overriding clone, setting hasChanges=false, cloning, and then setting hasChanges=true again, but this seemed cleaner.
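For readers following along, the workaround has roughly the shape below. GuardedReader is a hypothetical stand-in for a reader that refuses to clone while it holds unflushed changes, and OceanStyleReader is a made-up subclass name; only the hasChanges flag and the toggle-around-clone pattern come from the description above.

```java
// Hypothetical stand-in for a reader that will not clone with pending changes.
class GuardedReader {
    protected boolean hasChanges;
    protected final java.util.BitSet deletedDocs = new java.util.BitSet();

    void deleteDocument(int docId) {
        deletedDocs.set(docId);
        hasChanges = true;
    }

    GuardedReader cloneReader() {
        if (hasChanges) {
            throw new IllegalStateException("cannot clone a reader with pending changes");
        }
        GuardedReader copy = newInstance();
        copy.deletedDocs.or(deletedDocs); // the copy starts with the same deletes
        return copy;
    }

    protected GuardedReader newInstance() {
        return new GuardedReader();
    }
}

// The workaround: hide the flag while cloning, then restore it.
class OceanStyleReader extends GuardedReader {
    @Override
    GuardedReader cloneReader() {
        boolean saved = hasChanges;
        hasChanges = false;
        try {
            return super.cloneReader();
        } finally {
            hasChanges = saved; // restored even if cloning throws
        }
    }

    @Override
    protected GuardedReader newInstance() {
        return new OceanStyleReader();
    }
}
```

The save-and-restore in a finally block is a small hardening added in this sketch; the point is that the subclass has to lie to the base class, which is what an explicit opt-in flag would avoid.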



jira at apache

Jul 11, 2008, 5:33 PM

Post #20 of 20 (349 views)
Permalink
[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613025#action_12613025 ]

Michael McCandless commented on LUCENE-1314:
--------------------------------------------

bq. Because Ocean does not flush to disk with every transaction (deleteDocument then clone the reader) there is a need to allow cloning of readers that have not flushed changes.

But when you clone a reader with pending changes, is the old reader's write lock revoked? Meaning it is not allowed to commit (yet continues to hold the changes it has)? And the newly cloned reader gets the write lock? Are further changes allowed against the old reader, or maybe it should become "frozen"?

