
Mailing List Archive: Lucene: Java-Dev

[jira] Updated: (LUCENE-1313) Near Realtime Search (using a built in RAMDirectory)

 

 



jira at apache

Nov 2, 2009, 11:04 AM

Post #1 of 9
[jira] Updated: (LUCENE-1313) Near Realtime Search (using a built in RAMDirectory)

[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1313:
-------------------------------------

Attachment: LUCENE-1313.patch

The approach in the previous patches had grown too complicated,
so this is a far simpler implementation that fulfills the same
goal: batch segments in RAM until they exceed a given maximum
size, then merge those RAM segments into a primary directory
(i.e. disk). Throughout, all segments remain searchable with
minimal turnaround latency.

* Segment names for the ram writer are generated from the
primary writer, which ensures name continuity. Actually, I'm not
sure this is necessary anymore.

* The problem is that when the ram segments are merged into the
primary writer, they appear to be non-contiguous. Some of the
contiguous segment checking has been relaxed for this case, and
the relaxation needs to be conditional on the merge coming from
the ram dir. Perhaps we can have our cake and eat it too here by
keeping the contiguous check around for all cases?

* When the ram writer's usage exceeds a specified size, the ram
buffer is flushed, and the ram segments are synchronously merged
to the primary writer using a mechanism similar to
addIndexesNoOptimize.
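
To make the batching scheme above concrete, here is a minimal sketch in
plain Java. It is not the code from the patch and uses no Lucene classes:
Segment, RamBatchingIndex and all method names are hypothetical, and a
"merge" is modeled as nothing more than summing sizes. It only shows the
control flow: segments accumulate in RAM, a size threshold triggers a
synchronous merge into the primary side, and both sides stay visible to
searches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a segment: just a name and a size in bytes.
final class Segment {
    final String name;
    final long sizeBytes;

    Segment(String name, long sizeBytes) {
        this.name = name;
        this.sizeBytes = sizeBytes;
    }
}

// Hypothetical model of the two-tier scheme; this is not Lucene's IndexWriter.
final class RamBatchingIndex {
    private final long maxRamBytes;
    private final List<Segment> ramSegments = new ArrayList<>();
    private final List<Segment> primarySegments = new ArrayList<>();
    private long ramBytes = 0;
    private int nextSegmentId = 0; // one shared counter keeps names continuous

    RamBatchingIndex(long maxRamBytes) {
        this.maxRamBytes = maxRamBytes;
    }

    // A new flushed segment always lands in RAM first.
    synchronized void addSegment(long sizeBytes) {
        ramSegments.add(new Segment("_" + (nextSegmentId++), sizeBytes));
        ramBytes += sizeBytes;
        if (ramBytes > maxRamBytes) {
            flushRamToPrimary();
        }
    }

    // Synchronously merge every RAM segment into one primary segment.
    private void flushRamToPrimary() {
        long mergedSize = 0;
        for (Segment s : ramSegments) {
            mergedSize += s.sizeBytes;
        }
        primarySegments.add(new Segment("_" + (nextSegmentId++), mergedSize));
        ramSegments.clear();
        ramBytes = 0;
    }

    // Searches see primary and RAM segments alike.
    synchronized List<String> searchableSegmentNames() {
        List<String> names = new ArrayList<>();
        for (Segment s : primarySegments) names.add(s.name);
        for (Segment s : ramSegments) names.add(s.name);
        return names;
    }

    synchronized int ramSegmentCount() { return ramSegments.size(); }

    synchronized int primarySegmentCount() { return primarySegments.size(); }
}

class RamBatchingDemo {
    public static void main(String[] args) {
        RamBatchingIndex index = new RamBatchingIndex(100);
        index.addSegment(40);
        index.addSegment(40);
        index.addSegment(40); // pushes RAM usage past 100, triggering the merge
        index.addSegment(10);
        System.out.println(index.searchableSegmentNames());
    }
}
```

Because the flush runs inside addSegment, the caller is blocked while the
merge happens, which matches the "synchronously merged" behavior described
in the last bullet.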

> Near Realtime Search (using a built in RAMDirectory)
> ----------------------------------------------------
>
> Key: LUCENE-1313
> URL: https://issues.apache.org/jira/browse/LUCENE-1313
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Index
> Affects Versions: 2.4.1
> Reporter: Jason Rutherglen
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch, lucene-1313.patch, lucene-1313.patch, lucene-1313.patch
>
>
> Enable near realtime search in Lucene without external
> dependencies. When RAM NRT is enabled, the implementation adds a
> RAMDirectory to IndexWriter. Flushes go to the ramdir unless
> there is no available space. Merges are completed in the ram
> dir until there is no more available ram.
> IW.optimize and IW.commit flush the ramdir to the primary
> directory, all other operations try to keep segments in ram
> until there is no more space.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


jira at apache

Nov 2, 2009, 4:38 PM

Post #2 of 9
[jira] Updated: (LUCENE-1313) Near Realtime Search (using a built in RAMDirectory) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1313:
-------------------------------------

Attachment: LUCENE-1313.patch

* The ensure-contiguous check is mostly back

* Cleaned up the code and made the flush method non-synchronized

* There's a subtle synchronization bug that causes files not to be found in the testRandomThreads method

* There's excessive merge logging, left in to debug the sync issue



jira at apache

Nov 4, 2009, 11:51 AM

Post #3 of 9
[jira] Updated: (LUCENE-1313) Near Realtime Search (using a built in RAMDirectory) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1313:
-------------------------------------

Attachment: LUCENE-1313.patch

I wanted to simplify a little more so the failing edge cases
are easier to understand. In the multithreaded test, files were
sometimes left open, which is hard for me to debug.

The TestNRT.testSimple method passes; however, IndexFileDeleter
complains that it cannot delete files when expected, as shown in
the IW.infoStream output.

The NRT.flush method creates a merge of all the ram segments,
then calls IW.mergeIn to manually merge the ram segments into
the writer. OneMerge contains the writer where the segment
readers should be obtained from. In this case, the primary
writer obtains the readers from the ram writer's readerpool.
This is important because deletes may be coming in as we're
merging. However, I'm not sure this will work without a shared
lock between the writers for commitMergedDeletes, which requires
syncing.

Mike, can you take a look to see if this path will work?
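
The shared-lock concern raised above can be illustrated with a small
standalone model. This is only a sketch under assumptions, not the patch:
SharedDeletes and its methods are hypothetical names, and a set of doc ids
stands in for real deletes. The point is that the side accepting deletes
and the side committing merged deletes take the same lock, so a delete
that arrives during a merge is either carried over by that commit or left
for the next one, never lost.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model: both "writers" go through one object and one lock.
final class SharedDeletes {
    private final ReentrantLock lock = new ReentrantLock();
    private final Set<Integer> pendingDeletes = new HashSet<>();

    // Called on the ram-writer side as deletes arrive.
    void delete(int docId) {
        lock.lock();
        try {
            pendingDeletes.add(docId);
        } finally {
            lock.unlock();
        }
    }

    // Called on the primary-writer side when a merge commits: take a
    // snapshot of the pending deletes and clear them in one locked step.
    Set<Integer> commitMergedDeletes() {
        lock.lock();
        try {
            Set<Integer> carried = new HashSet<>(pendingDeletes);
            pendingDeletes.clear();
            return carried;
        } finally {
            lock.unlock();
        }
    }
}

class SharedLockDemo {
    // Deletes 1000 docs on one thread while committing on another, and
    // returns how many distinct deletes the commits carried over in total.
    static int run() {
        SharedDeletes deletes = new SharedDeletes();
        Thread deleter = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                deletes.delete(i);
            }
        });
        deleter.start();

        Set<Integer> seen = new HashSet<>();
        while (deleter.isAlive()) {
            seen.addAll(deletes.commitMergedDeletes());
        }
        try {
            deleter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        seen.addAll(deletes.commitMergedDeletes()); // pick up any stragglers
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("deletes carried over: " + run());
    }
}
```

Without the common lock, the snapshot and the clear could interleave with
an incoming delete, which is the kind of loss the question to Mike is
about.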



jira at apache

Nov 4, 2009, 6:05 PM

Post #4 of 9
[jira] Updated: (LUCENE-1313) Near Realtime Search (using a built in RAMDirectory) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1313:
-------------------------------------

Attachment: LUCENE-1313.patch

The tests pass, and the previous kinks seem to be worked out.
Actually, there is still one issue: in waitForMerges, the
assertion that mergingSegments is empty occasionally fails. I
think this is a small sync problem caused by the manual merge
between the two writers.

I'll run the multithreaded test for a longer interval to see
what other errors crop up. Once it runs successfully for,
let's say, 30 minutes, we can beef up the stress testing of this
patch by doing concurrent updates, deletes, etc.



jira at apache

Nov 4, 2009, 6:15 PM

Post #5 of 9
[jira] Updated: (LUCENE-1313) Near Realtime Search (using a built in RAMDirectory) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1313:
-------------------------------------

Attachment: LUCENE-1313.patch

Alright, the issue was simple: OneMerge.registerDone was being set to false by the primary writer, so the ram writer wasn't removing the infos from mergingSegments in mergeFinish.
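
A toy model of how a shared flag can produce this symptom, written as a
sketch rather than taken from the patch. MergeModel and WriterModel are
hypothetical stand-ins; only the names registerDone, mergingSegments and
mergeFinish come from the description above, and the assumption that
mergeFinish clears the flag after cleaning up is part of the model.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a merge shared by two writers.
final class MergeModel {
    final List<String> segments;
    boolean registerDone;

    MergeModel(List<String> segments) {
        this.segments = segments;
    }
}

// Hypothetical stand-in for a writer that tracks segments being merged.
final class WriterModel {
    final Set<String> mergingSegments = new HashSet<>();

    void registerMerge(MergeModel merge) {
        mergingSegments.addAll(merge.segments);
        merge.registerDone = true;
    }

    // Cleanup only runs while the flag is set, then the flag is cleared.
    void mergeFinish(MergeModel merge) {
        if (merge.registerDone) {
            mergingSegments.removeAll(merge.segments);
            merge.registerDone = false;
        }
    }
}

class RegisterDoneDemo {
    // Returns how many segments the ram writer still considers "merging".
    static int leftover(boolean primaryClearsFlagFirst) {
        WriterModel ramWriter = new WriterModel();
        WriterModel primaryWriter = new WriterModel();
        MergeModel merge = new MergeModel(Arrays.asList("_0", "_1"));

        ramWriter.registerMerge(merge);
        if (primaryClearsFlagFirst) {
            // Buggy path: the primary writer clears the shared flag first.
            primaryWriter.mergeFinish(merge);
        }
        ramWriter.mergeFinish(merge);
        return ramWriter.mergingSegments.size();
    }

    public static void main(String[] args) {
        System.out.println("flag cleared by primary first: " + leftover(true));
        System.out.println("flag left alone:               " + leftover(false));
    }
}
```

In the buggy path the ram writer's set never empties, which would be
consistent with an "is mergingSegments empty" assertion failing in
waitForMerges.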





jira at apache

Nov 4, 2009, 10:34 PM

Post #6 of 9
[jira] Updated: (LUCENE-1313) Near Realtime Search (using a built in RAMDirectory) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1313:
-------------------------------------

Attachment: LUCENE-1313.patch

TestNRTReaderWithThreads2 fails periodically; it's just another
synchronization issue. I added syncing on the merge writer in
methods like commitMergedDeletes and commitMerge. Perhaps more
of that type of syncing needs to be added. It can take time to
figure these issues out.

There are also remnants of a first attempt at transparently
utilizing the NRT class within IW.



jira at apache

Nov 5, 2009, 11:34 AM

Post #7 of 9
[jira] Updated: (LUCENE-1313) Near Realtime Search (using a built in RAMDirectory) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1313:
-------------------------------------

Attachment: LUCENE-1313.patch

New assertions in NRT.flush are catching the issue that's occurring.



jira at apache

Nov 5, 2009, 1:45 PM

Post #8 of 9
[jira] Updated: (LUCENE-1313) Near Realtime Search (using a built in RAMDirectory) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1313:
-------------------------------------

Attachment: LUCENE-1313.patch

OK, all the tests pass consistently now.

I guess the next feature is to have NRT.flush execute in a single background thread rather than blocking update doc calls.
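
One possible shape for such a background flush, sketched with
java.util.concurrent only. BackgroundFlusher and its methods are
hypothetical and are not the NRT class from the patch; the flush itself is
whatever Runnable the caller passes in. Requests made while a flush is
already queued are coalesced, so indexing threads return immediately and
at most one flush waits behind the running one.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: runs flushes on one background thread.
final class BackgroundFlusher {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final AtomicBoolean flushQueued = new AtomicBoolean(false);
    private final AtomicInteger completed = new AtomicInteger();
    private final Runnable flushAction;

    BackgroundFlusher(Runnable flushAction) {
        this.flushAction = flushAction;
    }

    // Called from indexing threads; never blocks on the flush itself.
    void requestFlush() {
        // Only queue a task if none is already waiting to start.
        if (flushQueued.compareAndSet(false, true)) {
            executor.execute(() -> {
                // Reset before flushing so that a request arriving during
                // the flush queues a follow-up instead of being dropped.
                flushQueued.set(false);
                flushAction.run();
                completed.incrementAndGet();
            });
        }
    }

    int completedFlushes() {
        return completed.get();
    }

    // Stops accepting work and waits for queued flushes to finish.
    // Returns true if they all finished within the timeout.
    boolean close(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class BackgroundFlushDemo {
    public static void main(String[] args) {
        BackgroundFlusher flusher =
            new BackgroundFlusher(() -> System.out.println("flushing ram segments"));
        for (int i = 0; i < 5; i++) {
            flusher.requestFlush(); // returns immediately each time
        }
        boolean finished = flusher.close(5000);
        System.out.println("finished=" + finished
            + " flushes=" + flusher.completedFlushes());
    }
}
```

The number of flushes that actually run depends on timing, since requests
that arrive while one is still queued are folded into it.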



jira at apache

Nov 9, 2009, 12:48 PM

Post #9 of 9
[jira] Updated: (LUCENE-1313) Near Realtime Search (using a built in RAMDirectory) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1313:
-------------------------------------

Attachment: LUCENE-1313.patch

This patch includes flushing in a background thread. Some formatting has been cleaned up and javadocs have been added.

I ran TestNRTReaderWithThreads2 a couple of times for kicks and didn't see the assert sr.hasChanges error. I'll probably focus on adding more stress testing.

