
Mailing List Archive: Lucene: General

segment ? new segment after a commit

 

 



johanna.34 at gmail

Apr 16, 2009, 6:51 AM

Post #1 of 2 (727 views)
segment ? new segment after a commit

Hi,

I just ran an update:
Indexing completed. Added/Updated: 6327 documents. Deleted 0 documents.

But I don't get why it doesn't just add a new segment instead of changing all
the segments.

ls data/index/
_1zf.fdt _1zf.fdx _1zf.fnm _1zf.frq _1zf.nrm _1zf.prx _1zf.tii
_1zf.tis _1zf.tvd _1zf.tvf _1zf.tvx segments.gen segments_1o

During the update:
ls data/index/
_1zf.fdt _1zf.fdx _1zf.fnm _1zf.frq _1zf.nrm _1zf.prx _1zf.tii
_1zf.tis _1zf.tvd _1zf.tvf _1zf.tvx _1zf_1.del
_1zg.fdt _1zg.fdx _1zg.fnm _1zg.frq _1zg.nrm _1zg.prx _1zg.tii
_1zg.tis _1zg.tvd _1zg.tvf _1zg.tvx
_1zh.fnm _1zh.frq _1zh.nrm _1zh.prx _1zh.tii _1zh.tis
_1zi.fdt _1zi.fdx _1zi.fnm _1zi.frq _1zi.prx _1zi.tii _1zi.tis
segments.gen segments_1o

ls data/index/
_1zi.fdt _1zi.fdx _1zi.fnm _1zi.frq _1zi.nrm _1zi.prx _1zi.tii
_1zi.tis _1zi.tvd _1zi.tvf _1zi.tvx segments.gen segments_1p
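
The listing taken during the update is easier to read when the files are grouped by segment. The following is a minimal sketch, assuming that the part of each file name before the first dot (minus any trailing "_N" generation suffix, as in _1zf_1.del) identifies the segment. The helper names segment_name and group_by_segment are made up for this illustration and are not part of Lucene or Solr.

```python
from collections import defaultdict

# File names copied from the "during the update" listing.
LISTING = """
_1zf.fdt _1zf.fnm _1zf.nrm _1zf.tii _1zf.tvd _1zf.tvx _1zg.fdt
_1zg.fnm _1zg.nrm _1zg.tii _1zg.tvd _1zg.tvx _1zh.frq _1zh.prx
_1zh.tis _1zi.fdx _1zi.frq _1zi.tii segments.gen
_1zf.fdx _1zf.frq _1zf.prx _1zf.tis _1zf.tvf _1zf_1.del _1zg.fdx
_1zg.frq _1zg.prx _1zg.tis _1zg.tvf _1zh.fnm _1zh.nrm _1zh.tii
_1zi.fdt _1zi.fnm _1zi.prx _1zi.tis segments_1o
"""


def segment_name(filename):
    """Return the segment prefix: '_1zf.fdt' and '_1zf_1.del' both give '_1zf'."""
    stem = filename.split(".", 1)[0]
    return "_" + stem[1:].split("_", 1)[0]


def group_by_segment(names):
    """Group per-segment files; skips segments.gen and segments_N."""
    groups = defaultdict(list)
    for name in names:
        if name.startswith("_"):
            groups[segment_name(name)].append(name)
    return dict(groups)


groups = group_by_segment(LISTING.split())
for seg in sorted(groups):
    print(seg, len(groups[seg]), "files")
```

Grouped this way, the listing shows four segments on disk at once: the original _1zf (now with a .del file), plus _1zg, _1zh and _1zi. Since only _1zi remains in the final listing, it is presumably the merged result, although the file names alone cannot prove that.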


This is my conf:

<mergeFactor>15</mergeFactor>
<!--
If both ramBufferSizeMB and maxBufferedDocs are set, then Lucene will
flush based on whichever limit is hit first.

-->
<maxBufferedDocs>30000</maxBufferedDocs>
<!-- Tell Lucene when to flush documents to disk.
Giving Lucene more memory for indexing means faster indexing at the cost
of more RAM

If both ramBufferSizeMB and maxBufferedDocs are set, then Lucene will
flush based on whichever limit is hit first.


<ramBufferSizeMB>150</ramBufferSizeMB>-->
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>10000</maxFieldLength>
<writeLockTimeout>1000</writeLockTimeout>
<commitLockTimeout>1000000</commitLockTimeout>

<!--
Expert: Turn on Lucene's auto commit capability.
This causes intermediate segment flushes to write a new lucene
index descriptor, enabling it to be opened by an external
IndexReader.
NOTE: Despite the name, this value does not have any relation to Solr's
autoCommit functionality
-->
<!--<luceneAutoCommit>false</luceneAutoCommit>-->
<!--
Expert:
The Merge Policy in Lucene controls how merging is handled by Lucene.
The default in 2.3 is the LogByteSizeMergePolicy, previous
versions used LogDocMergePolicy.

LogByteSizeMergePolicy chooses segments to merge based on their size.
The Lucene 2.2 default, LogDocMergePolicy chose when
to merge based on number of documents

Other implementations of MergePolicy must have a no-argument
constructor
-->

<!--<mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>-->

<!--
Expert:
The Merge Scheduler in Lucene controls how merges are performed. The
ConcurrentMergeScheduler (Lucene 2.3 default)
can perform merges in the background using separate threads. The
SerialMergeScheduler (Lucene 2.2 default) does not.
-->

<!--<mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>-->

<!--
This option specifies which Lucene LockFactory implementation to use.

single = SingleInstanceLockFactory - suggested for a read-only index
or when there is no possibility of another process trying
to modify the index.
native = NativeFSLockFactory
simple = SimpleFSLockFactory

(For backwards compatibility with Solr 1.2, 'simple' is the default
if not specified.)
-->
<lockType>single</lockType>
</indexDefaults>

<mainIndex>
<!-- options specific to the main on-disk lucene index -->
<useCompoundFile>false</useCompoundFile>
<mergeFactor>15</mergeFactor>
<!-- Deprecated -->
<maxBufferedDocs>50000</maxBufferedDocs>
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>10000</maxFieldLength>


--
View this message in context: http://www.nabble.com/segment---new-segment-after-a-commit-tp23078413p23078413.html
Sent from the Lucene - General mailing list archive at Nabble.com.


ted.dunning at gmail

Apr 16, 2009, 9:36 AM

Post #2 of 2 (653 views)
Re: segment ? new segment after a commit

Here is a link to a talk that Doug gave describing the basics of the
indexing process. It should answer your questions.

http://lucene.sourceforge.net/talks/pisa/

The basic answer is that newly added documents go into small segments. Once
there are enough small segments, they get merged into a bigger one, and when
enough of those bigger segments have accumulated, they get merged in turn
into the next larger segment.

If you add a small number of new documents, measured as a fraction of all
documents, then only a few files will change. If you add many documents,
then many files will change.
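
The cascade described above can be illustrated with a toy model. This is a minimal sketch, not Lucene's actual merge policy: it assumes every flush produces one segment at level 0, and that whenever merge_factor segments sit at the same level they are merged into a single segment one level up. The function add_segment and the levels structure are invented for this illustration.

```python
def add_segment(levels, docs, merge_factor):
    """Add one freshly flushed segment and cascade any merges.

    levels[i] holds the document counts of the segments at level i.
    Returns how many existing segments were rewritten by merges.
    """
    if not levels:
        levels.append([])
    levels[0].append(docs)

    rewritten = 0
    level = 0
    # Merge whenever a level has collected merge_factor segments.
    while len(levels[level]) >= merge_factor:
        merged = sum(levels[level])
        rewritten += len(levels[level])
        levels[level] = []
        if level + 1 == len(levels):
            levels.append([])
        levels[level + 1].append(merged)
        level += 1
    return rewritten


levels = []
history = []
for flush in range(1, 10):
    n = add_segment(levels, docs=1, merge_factor=3)
    history.append(n)
    print(flush, "rewrote", n, "segments ->", levels)
```

With merge_factor=3, most flushes rewrite nothing, every third flush rewrites three segments, and the ninth rewrites six because the merge cascades up a level. The configuration in the original post uses mergeFactor 15, so under this simplified model a cascade would be rarer but would touch more segments when it happens.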

On Thu, Apr 16, 2009 at 6:51 AM, sunnyfr <johanna.34 [at] gmail> wrote:

>
>
> I just made an update :
> Indexing completed. Added/Updated: 6327 documents. Deleted 0 documents.
>
> But I don't get why it doesn't just add a new segment instead of changing all
> the segments.
>
>
