
Mailing List Archive: Lucene: Java-User

Clarification about segments

 

 



davidtlee at gmail

Aug 22, 2008, 4:35 PM

Post #1 of 4 (217 views)
Clarification about segments

So from what I understand, is it true that if mergeFactor is 10, then when I
index my first 9 documents, I have 9 separate segments, each containing 1
document? And when searching, it will search through every segment?

Thanks!
David


karsten-lucene at fiz-technik

Aug 23, 2008, 2:40 AM

Post #2 of 4 (195 views)
Re: Clarification about segments

Hi David,

this is not true. When buffered documents are flushed to a new segment
depends on the settings below, not on mergeFactor; please take a look at
IndexWriter#setRAMBufferSizeMB
and
IndexWriter#setMaxBufferedDocs

But you can produce 9 segments (each with only one document) if you call
IndexWriter#flush
or
IndexWriter#commit
after each addDocument.

As far as I know, there is no difference between
#flush
and
#optimize(getMergeFactor())
(by the way, #optimize() is equivalent to #optimize(1)).


Best regards
Karsten

p.s. and yes, searching goes through every segment.
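
A minimal sketch of the behaviour described above, written as a toy model in plain Java. ToyIndexWriter and all of its methods are hypothetical stand-ins invented for illustration; they are not Lucene's IndexWriter API, and the model ignores RAM-based flushing and merging entirely. It only shows that documents sit in a buffer until a flush, that each flush produces one segment, and that a search visits every segment.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: hypothetical class, not Lucene's IndexWriter. */
final class ToyIndexWriter {
    private final int maxBufferedDocs;                               // flush trigger, by document count
    private final List<String> ramBuffer = new ArrayList<>();        // documents not yet flushed
    private final List<List<String>> segments = new ArrayList<>();   // flushed segments

    ToyIndexWriter(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Buffers a document and flushes automatically once the buffer is full. */
    void addDocument(String doc) {
        ramBuffer.add(doc);
        if (ramBuffer.size() >= maxBufferedDocs) {
            flush();
        }
    }

    /** Writes everything buffered so far as ONE new segment. */
    void flush() {
        if (ramBuffer.isEmpty()) {
            return;
        }
        segments.add(new ArrayList<>(ramBuffer));
        ramBuffer.clear();
    }

    int segmentCount() {
        return segments.size();
    }

    /** Visits every segment and counts the documents containing the term. */
    int countMatches(String term) {
        int hits = 0;
        for (List<String> segment : segments) {
            for (String doc : segment) {
                if (doc.contains(term)) {
                    hits++;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // 9 documents with a buffer limit of 10: nothing is flushed automatically.
        ToyIndexWriter buffered = new ToyIndexWriter(10);
        for (int i = 0; i < 9; i++) {
            buffered.addDocument("doc " + i);
        }
        System.out.println(buffered.segmentCount()); // 0
        buffered.flush();
        System.out.println(buffered.segmentCount()); // 1

        // Flushing after every document: 9 segments with one document each.
        ToyIndexWriter eager = new ToyIndexWriter(10);
        for (int i = 0; i < 9; i++) {
            eager.addDocument("doc " + i);
            eager.flush();
        }
        System.out.println(eager.segmentCount());      // 9
        System.out.println(eager.countMatches("doc")); // 9
    }
}
```

In this model the number of segments depends only on when flush is called, which is the point made above: mergeFactor does not appear anywhere in it.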



--
View this message in context: http://www.nabble.com/Clarification-about-segments-tp19117115p19120086.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-user-help[at]lucene.apache.org


davidtlee at gmail

Aug 25, 2008, 10:39 AM

Post #3 of 4 (172 views)
Re: Clarification about segments

ok, thanks. I knew that the documents were buffered in memory until they
were flushed, but I thought that in memory, they were still separate
documents/segments until they were merged together at the appropriate time
(dependent on the mergeFactor).

Do you mean that when the IndexWriter flushes the documents in memory to the
disk, it will merge all the documents in that flush to one segment?

Thanks!
David



lucene at mikemccandless

Aug 26, 2008, 1:39 AM

Post #4 of 4 (165 views)
Re: Clarification about segments

Before 2.3, each doc was in fact a separate segment in memory, and
then these segments were merged together to flush a single segment in
the Directory.

As of 2.3, IndexWriter now writes directly into RAM the data
structures that are needed to create the segment, and then flushing
the segment is a matter of copying these data structures into the
Directory. This gave a substantial speedup to indexing throughput,
much better RAM efficiency (documents per MB that IndexWriter can
buffer), etc.

In any event, for all versions of Lucene, when a flush happens it
adds a single new segment to the index.

Mike
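
To connect this back to the original mergeFactor question, here is a rough sketch of how segments might accumulate and merge if every flush adds exactly one segment. It is a deliberately simplified toy model in plain Java: ToyMergeModel is a hypothetical class, and the rule it uses (merge the newest mergeFactor segments whenever they all have the same size) is an assumption made for illustration. Lucene's real merge policy groups segments into size levels and takes more factors into account.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: hypothetical class, not Lucene's merge policy. */
final class ToyMergeModel {

    /** Returns segment sizes (document counts), oldest first, after the given number of flushes. */
    static List<Integer> afterFlushes(int flushes, int docsPerFlush, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> segments = new ArrayList<>();
        for (int i = 0; i < flushes; i++) {
            segments.add(docsPerFlush);       // each flush adds a single new segment
            mergeTail(segments, mergeFactor); // then check whether a merge is due
        }
        return segments;
    }

    /** While the newest mergeFactor segments all have the same size, merge them into one. */
    private static void mergeTail(List<Integer> segments, int mergeFactor) {
        while (segments.size() >= mergeFactor) {
            int from = segments.size() - mergeFactor;
            int size = segments.get(from);
            int total = 0;
            for (int i = from; i < segments.size(); i++) {
                if (segments.get(i) != size) {
                    return; // sizes differ, so no merge in this toy model
                }
                total += segments.get(i);
            }
            segments.subList(from, segments.size()).clear();
            segments.add(total);
        }
    }

    public static void main(String[] args) {
        System.out.println(afterFlushes(9, 1, 10));  // [1, 1, 1, 1, 1, 1, 1, 1, 1]
        System.out.println(afterFlushes(10, 1, 10)); // [10]
        System.out.println(afterFlushes(25, 1, 10)); // [10, 10, 1, 1, 1, 1, 1]
    }
}
```

Under these assumptions, nine one-document flushes leave nine segments, and the tenth flush triggers a merge into a single ten-document segment.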

