
Mailing List Archive: Lucene: Java-Dev

[jira] Commented: (LUCENE-983) Enable IndexReader to merge tail segments on demand, in RAM, when opening

 

 



jira at apache

Aug 17, 2007, 12:24 AM


[ https://issues.apache.org/jira/browse/LUCENE-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520480 ]

Michael Busch commented on LUCENE-983:
--------------------------------------

I like this idea. Merging small segments in memory is probably fast,
is only necessary during open()/reopen(), and will improve search
performance.

LUCENE-743 will become a bit more difficult. We'll have to keep a
list of the segments that are part of the merged index held in the
RAMDirectory. During reopen() we have to check whether any of those
segments changed. If so, we have to empty the RAMDirectory and
load/merge the small segments again. Otherwise we just add the new
segments to the RAMDirectory, if the buffer size permits.
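
The bookkeeping described above could look roughly like the sketch
below. It is only an illustration in plain Java, not Lucene code:
MergedTailTracker, SegmentInfo, the per-segment "version" number and
the byte budget are all hypothetical names, and the actual
loading/merging into a RAMDirectory is left out. The sketch also
assumes that a tracked segment which has disappeared from the index
counts as "changed".

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical bookkeeping for the merged tail; not part of Lucene's API. */
class MergedTailTracker {

    /** One small on-disk segment as seen at open()/reopen() time (assumed shape). */
    static final class SegmentInfo {
        final String name;
        final long version;   // assumed to change whenever the segment changes
        final long sizeBytes;

        SegmentInfo(String name, long version, long sizeBytes) {
            this.name = name;
            this.version = version;
            this.sizeBytes = sizeBytes;
        }
    }

    // segment name -> version at the time it was merged into RAM
    private final Map<String, Long> merged = new LinkedHashMap<>();
    private final long ramBudgetBytes;
    private long usedBytes = 0;

    MergedTailTracker(long ramBudgetBytes) {
        this.ramBudgetBytes = ramBudgetBytes;
    }

    int mergedSegmentCount() {
        return merged.size();
    }

    /**
     * Updates the tracked set for the current tail segments.
     * Returns true if a tracked segment changed, so the RAM index
     * had to be emptied and rebuilt.
     */
    boolean reopen(List<SegmentInfo> currentTail) {
        Map<String, Long> currentVersions = new HashMap<>();
        for (SegmentInfo s : currentTail) {
            currentVersions.put(s.name, s.version);
        }

        // Did any segment that is part of the merged RAM index change?
        boolean stale = false;
        for (Map.Entry<String, Long> e : merged.entrySet()) {
            Long now = currentVersions.get(e.getKey());
            if (now == null || !now.equals(e.getValue())) {
                stale = true;
                break;
            }
        }

        if (stale) {
            // Empty the RAM index; everything is loaded/merged again below.
            merged.clear();
            usedBytes = 0;
        }

        // Add segments not yet in RAM, as long as the buffer size permits.
        for (SegmentInfo s : currentTail) {
            if (merged.containsKey(s.name)) {
                continue;
            }
            if (usedBytes + s.sizeBytes > ramBudgetBytes) {
                continue; // stays on disk and is searched from there
            }
            merged.put(s.name, s.version);
            usedBytes += s.sizeBytes;
        }
        return stale;
    }
}
```

In this sketch a false return value from reopen() is the cheap path:
nothing already in RAM had to be touched, and at most some new
segments were appended.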

Hmm, we could even do more sophisticated things, e.g. if only the
deleted bits of a segment changed, we could map them to the merged
RAM index and so avoid opening/merging the small segments
again during reopen(). But the small performance gain is probably
not worth the extra code complexity, as the segments should be
quite small.
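
For what it's worth, mapping the deleted bits could be sketched as
follows. This is again plain Java with hypothetical names
(DeleteMapper, buildDocMap, applyNewDeletes), and it assumes that the
in-RAM merge drops documents that were already deleted and numbers the
survivors of each segment consecutively, so that a per-segment doc map
can be recorded at merge time.

```java
import java.util.BitSet;

/** Hypothetical helper; illustrates the idea only, not Lucene's API. */
class DeleteMapper {

    /**
     * Builds the doc map for one small segment at merge time:
     * local doc id -> doc id in the merged RAM index, or -1 if the
     * document was already deleted and therefore dropped by the merge.
     */
    static int[] buildDocMap(int maxDoc, BitSet deletedAtMerge, int mergedBase) {
        int[] docMap = new int[maxDoc];
        int next = mergedBase;
        for (int local = 0; local < maxDoc; local++) {
            docMap[local] = deletedAtMerge.get(local) ? -1 : next++;
        }
        return docMap;
    }

    /**
     * Carries the segment's current deletions over to the merged index
     * by setting the corresponding bits in mergedDeletes.
     */
    static void applyNewDeletes(int[] docMap, BitSet currentDeletes, BitSet mergedDeletes) {
        for (int local = currentDeletes.nextSetBit(0);
             local >= 0 && local < docMap.length;
             local = currentDeletes.nextSetBit(local + 1)) {
            int mergedId = docMap[local];
            if (mergedId >= 0) {
                mergedDeletes.set(mergedId);
            }
        }
    }
}
```

The price is one int[] per merged segment plus the extra code path,
which is the complexity weighed against the small gain above.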

> Enable IndexReader to merge tail segments on demand, in RAM, when opening
> -------------------------------------------------------------------------
>
> Key: LUCENE-983
> URL: https://issues.apache.org/jira/browse/LUCENE-983
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.3
>
>
> Spinoff from LUCENE-845.
> In LUCENE-845, the IndexWriter must pay a high cost (O(N^2) merge
> cost) for keeping the number of segments "always small" in the case
> where flushes of very small segments (1 doc as worst case) happen
> frequently. This happens in "low latency" applications.
> This is because IndexWriter must be ready "at every moment" for an
> IndexReader to open the index.
> But, if we allow IndexReader to use some RAM (give it a RAM buffer) to
> load the long tail of small segments into a RAMDirectory, and then
> merge them (in RAM), this allows IndexReader to still have good
> performance on the index without IndexWriter paying this high merge
> cost. This effectively allows us to optimize the tail segments "on
> demand" when a reader needs to use them.
> When we combine this with LUCENE-743 (efficient "re-open" of a reader)
> then we should be able to efficiently handle low latency applications.
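
As a rough illustration of the O(N^2) figure in the description
above: the sketch below counts how many documents get copied under a
deliberately simplified model that is an assumption, not a description
of IndexWriter's actual merge policy. In the "eager" case every
one-document flush is followed by rewriting the whole tail into a
single segment; in the "on demand" case the tail is merged once, when
a reader opens. MergeCostDemo and its methods are hypothetical names.

```java
/** Hypothetical cost model; it counts copied documents and nothing else. */
class MergeCostDemo {

    /** Tail is rewritten into one segment after every 1-doc flush. */
    static long eagerCopies(int flushes) {
        long copied = 0;
        long tailDocs = 0;
        for (int i = 0; i < flushes; i++) {
            tailDocs += 1;       // one new single-document segment
            copied += tailDocs;  // the whole tail is written again
        }
        return copied;           // 1 + 2 + ... + N = N(N+1)/2
    }

    /** Tail segments are left alone and merged once at reader open. */
    static long onDemandCopies(int flushes) {
        return flushes;          // each document is copied once
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("eager:     " + eagerCopies(n));    // 500500
        System.out.println("on demand: " + onDemandCopies(n)); // 1000
    }
}
```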

