
Mailing List Archive: Lucene: Java-User

Comparing Indexing Speed of Lucene 3.5 and 4.0

 

 



peathal at yahoo

Jan 3, 2012, 8:56 AM

Post #1 of 8 (331 views)
Comparing Indexing Speed of Lucene 3.5 and 4.0

Hi,

I recently switched an experimental project from Lucene 3.5 to a 4.0 build
from 6 Dec 2011, and my indexing time increased by nearly 20% on my local
machine*. It seems to me that two simple StringFields cause this slowdown:
Field uIdField = new Field("_uid", "" + id, StringField.TYPE_STORED);
Field typeField = new Field("_type", "test", StringField.TYPE_STORED);

Without them, Lucene 4 is faster**. Here is a reproduction that uses a
separate branch for each Lucene version:
https://github.com/karussell/lucene-tmp
Or is there something wrong with my overly simplistic scenario?

Also: how can I further improve Lucene 4.0 indexing speed?
(I have already read through the performance list on the wiki.)

Regards,
Peter.

*
OpenJDK 1.6.0_20 (also confirmed with the latest Java 6 from Oracle)
Ubuntu 10.10, Linux 2.6.35-31 i686, 2 GB RAM

**
Lucene 3.5
23.5 sec  indexing all three fields: _id, _uid, _type
19.0 sec  indexing only the _id field

Lucene 4
29.5 sec  indexing all three fields: _id, _uid, _type
16.5 sec  indexing only the _id field
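
For reference, the relative differences implied by the timings above can be worked out with a few lines of Java. This is only a sketch of the arithmetic; the class and method names are made up for illustration. With these numbers, the three-field run comes out roughly 25.5% slower on Lucene 4, and the _id-only run roughly 13.2% faster.

```java
final class TimingDelta {
    /** Percentage change of candidate relative to baseline (positive = slower). */
    static double percentChange(double baselineSec, double candidateSec) {
        return (candidateSec - baselineSec) / baselineSec * 100.0;
    }

    public static void main(String[] args) {
        // Timings in seconds as reported above: Lucene 3.5 is the baseline.
        System.out.println("all three fields: " + percentChange(23.5, 29.5) + " %");
        System.out.println("_id only:         " + percentChange(19.0, 16.5) + " %");
    }
}
```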


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


simon.willnauer at googlemail

Jan 3, 2012, 12:43 PM

Post #2 of 8 (329 views)
Re: Comparing Indexing Speed of Lucene 3.5 and 4.0

Hey Peter,

As far as I can see, you are comparing apples and pears. Your
benchmark includes the time spent waiting for merges to finish, and if
you are using multiple threads, Lucene 4.0 will flush more segments to
disk than 3.5. So what you are seeing is likely a merge that is still
working through small segments. Can you rerun and measure only the time
until the last commit finishes (not the close)?

One more thing: you are always indexing more or less the same document,
and the text is very short. You should add some more randomness or
realism to your test.
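
A minimal sketch of what timing the phases separately could look like. It is plain Java with no Lucene dependency: the three Runnables are placeholders (an assumption for illustration) for the indexing loop, the final commit, and the close call, so that time spent in close (which may include waiting for merges) is not counted as indexing time.

```java
import java.util.concurrent.TimeUnit;

final class PhaseTimer {
    /**
     * Runs the three phases in order and returns the elapsed milliseconds of
     * each one as {index, commit, close}.
     */
    static long[] time(Runnable index, Runnable commit, Runnable close) {
        long t0 = System.nanoTime();
        index.run();   // placeholder: add/update all documents
        long t1 = System.nanoTime();
        commit.run();  // placeholder: the last commit
        long t2 = System.nanoTime();
        close.run();   // placeholder: closing the writer
        long t3 = System.nanoTime();
        return new long[] {
            TimeUnit.NANOSECONDS.toMillis(t1 - t0),
            TimeUnit.NANOSECONDS.toMillis(t2 - t1),
            TimeUnit.NANOSECONDS.toMillis(t3 - t2)
        };
    }

    public static void main(String[] args) {
        long[] ms = time(() -> {}, () -> {}, () -> {});
        System.out.println("index=" + ms[0] + "ms commit=" + ms[1] + "ms close=" + ms[2] + "ms");
    }
}
```

Comparing index + commit between the two versions, and reporting close separately, keeps background merge work from distorting the comparison.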

simon



peathal at yahoo

Jan 3, 2012, 3:52 PM

Post #3 of 8 (363 views)
Re: Comparing Indexing Speed of Lucene 3.5 and 4.0

Thanks Simon for your answer!

> as far as I can see you are comparing apples and pears.

When I exclude the waiting time I still get the slight but reproducible
difference**. The times for waitForGeneration are nearly the same
(~2 sec). Calling commit instead of waitForGeneration makes no
difference either. Would you mind giving me some more hints or
explanations? I'll try to dig deeper :)

> Your comparison is waiting for merges to finish and if you are using
> multiple threads lucene 4.0 will flush more segments to disk than 3.5

It does not seem to be an I/O-related issue, because using RAMDirectory
results in the same times.
And indexing with Lucene 4 using only one thread shouldn't be slower
than 3.5 (?)


> You should add some more randomness or reality to your test.

Hmm, OK. The _uid and _type fields reflect reality in my other
(experimental) project, which uses an id generated by incrementing an
AtomicLong, and two types.
Or do you have an explanation for why Lucene 4 can be slower on such
'simple' fields?

Could it be due to garbage collection or thread overhead in Lucene 4?
I see a bigger variation in execution time between single Lucene 4.0
runs (differences of seconds!) than for 3.5 (differences of 0.1 seconds!).
If so, how could I reduce the number of threads?

Regards,
Peter.



**
sw = new StopWatch("perf" + trial).start();
for (int i = 0; i < items; i++) {
    innerRun(trial, i);
}
float indexingTime = sw.stop().getSeconds();


// luc3.5
@Override public void innerRun(int trial, int i) {
    long id = i;
    Document newDoc = new Document();
    NumericField idField = new NumericField("_id", 6, Field.Store.YES, true).setLongValue(id);
    Field uIdField = new Field("_uid", "" + id, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS);
    uIdField.setIndexOptions(IndexOptions.DOCS_ONLY);
    Field typeField = new Field("_type", "test", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS);
    typeField.setIndexOptions(IndexOptions.DOCS_ONLY);
    newDoc.add(idField);
    newDoc.add(uIdField);
    newDoc.add(typeField);
    try {
        // the update key is the prefix-coded string form of the numeric id
        String longStr = NumericUtils.longToPrefixCoded(id);
        latestGen = nrtManager.updateDocument(new Term("_id", longStr), newDoc);
        docs++;
    } catch (IOException ex) {
        logger.error("Cannot update " + i, ex);
    }
}


// luc4.0
@Override public void innerRun(int trial, int i) {
    long id = i;
    Document newDoc = new Document();
    NumericField idField = new NumericField("_id", 6, NumericField.TYPE_STORED).setLongValue(id);
    Field uIdField = new Field("_uid", "" + id, StringField.TYPE_STORED);
    Field typeField = new Field("_type", "test", StringField.TYPE_STORED);
    newDoc.add(idField);
    newDoc.add(uIdField);
    newDoc.add(typeField);
    try {
        // problem when reusing: nrt thread and this thread access the
        // same bytes at the same time!
        final BytesRef bytes = new BytesRef();
        NumericUtils.longToPrefixCoded(id, 0, bytes);
        latestGen = nrtManager.updateDocument(new Term("_id", bytes), newDoc);
        docs++;
    } catch (IOException ex) {
        logger.error("Cannot update " + i, ex);
    }
}



simon.willnauer at googlemail

Jan 5, 2012, 12:21 AM

Post #4 of 8 (326 views)
Re: Comparing Indexing Speed of Lucene 3.5 and 4.0

Hey Peter,

On Wed, Jan 4, 2012 at 12:52 AM, Peter K <peathal [at] yahoo> wrote:
> Thanks Simon for your answer!
>
>> as far as I can see you are comparing apples and pears.
>
> When I exclude the waiting time I still get the slight but reproducible
> difference**. The times for waitForGeneration are nearly the same
> (~2 sec). Calling commit instead of waitForGeneration makes no
> difference either. Would you mind giving me some more hints or
> explanations? I'll try to dig deeper :)
>
>> Your comparison is waiting for merges to finish and if you are using
>> multiple threads lucene 4.0 will flush more segments to disk than 3.5
>
> It does not seem to be an I/O-related issue, because using RAMDirectory
> results in the same times.
> And indexing with Lucene 4 using only one thread shouldn't be slower
> than 3.5 (?)

It could be, since we use a different term dictionary implementation
that is more expensive to build than in previous versions; that's just
a guess.
What I am really wondering is why you are using the NRT manager and
reopening during indexing. Are you measuring the NRT reopen times too?
Maybe you can run your tests without NRT support, just plain indexing.
Which merge policies are you using for 3.x and 4.x?


>
>
>> You should add some more randomness or reality to your test.
>
> Hmmh, ok. The uid and type is the reality in my other (experimental)
> project as it uses a generated and incremented id from AtomicLong and
> two types.
> Or do you have an explanation why luc4 can be slower on such 'simple'
> fields?

You reported that indexing only the ID is faster in 4.x, but the other
fields, AFAIK, are likely always the same for all docs, no? Maybe there
is some weirdness where the term dictionary takes longer on that kind
of input?

>
> Could it be due to some garbage collector or thread overhead with luc4?
> As I see a bigger execution speed variation for single lucene 4.0 runs
> (differences of seconds!) than for 3.5 (differences in 0.1seconds!).
> E.g. how could I try to reduce those/some threads?

You are indexing with one thread, right? My benchmarks show up to 300%
improvement with 4.x versus older versions, so either something here is
weird (i.e. non-realistic) or there is a bug. Let's figure this out.
Can you profile your app and see if you find anything suspicious?
I'd also try indexing way more documents to make your benchmarks run
a little longer, just to be sure.

simon


peathal at yahoo

Jan 5, 2012, 4:25 AM

Post #5 of 8 (315 views)
Re: Comparing Indexing Speed of Lucene 3.5 and 4.0

Hi Simon,

answers below.

>> It does not seem to be an 'IO related issue' because using RAMDirectory
>> results in the same times.
>> And indexing via Luc4 with only one thread shouldn't be slower than 3.5 (?)
> it could be since we use a different term dictionary impl which is
> more expensive in building than the previous versions; thats just a
> guess.
> What I am really wondering is why you are using the NRT manager and
> reopen during indexing - are you measuring the NRT reopen times too?

My project requires reopening, as it clears some caches at that point.

Reopening isn't that frequent (every 5 seconds). When I disable it, the
difference even increases slightly, but the big variation for Lucene 4
goes away!


> What merge policies are you using for 3x and 4x?

The default ones. I'm now using LogByteSizeMergePolicy for both, but the
difference is nearly the same.


>>> You should add some more randomness or reality to your test.
>> Hmmh, ok. The uid and type is the reality in my other (experimental)
>> project as it uses a generated and incremented id from AtomicLong and
>> two types.
>> Or do you have an explanation why luc4 can be slower on such 'simple'
>> fields?
> you reported that indexing only the ID is faster in 4.x but the other
> fields AFAIK are likely always the same for all docs, no?

No, the _uid field is different: it's the id field converted to a string.


> you are indexing with one thread right?

yes.


> I mean my benchmarks show up
> to 300% improvement with 4.x versus older versions so something is
> weird ie. non-realistic here or there is a bug so let's figure this
> out. Can you profile your app and see if you find something suspicious?

I'll try now and report back.


> I'd also try to index way more documents to make your benchmarks run
> little longer just to be sure.

With ~5 times more docs (5 million), the difference is nearly the same.


Regards,
Peter.



peathal at yahoo

Jan 7, 2012, 4:48 AM

Post #6 of 8 (314 views)
Re: Comparing Indexing Speed of Lucene 3.5 and 4.0

> I mean my benchmarks show up
> to 300% improvement with 4.x versus older versions so something is
> weird ie. non-realistic here or there is a bug so let's figure this
> out. Can you profile your app and see if you find something suspicious?
> I'll try now and report back.

It seems to be largely my mistake: Maven enables assertions automatically when running tests.
Executing it as a normal class with a public main method results in faster indexing times for 4.0 compared to 3.5.
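
A benchmark can guard against this by checking at startup whether the JVM was launched with assertions enabled. The sketch below uses the common idiom of an assignment inside an assert statement; the class name is made up for illustration. (In Maven, the Surefire plugin's enableAssertions parameter controls this for test runs.)

```java
final class AssertionCheck {
    /** Returns true only when the JVM runs with assertions enabled (-ea). */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment is a side effect that only executes under -ea.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        if (assertionsEnabled()) {
            System.err.println("WARNING: assertions are enabled; timings are not representative");
        } else {
            System.out.println("assertions disabled; OK to benchmark");
        }
    }
}
```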

Conclusion:
1. With assertions enabled, 4.0 is slower than 3.5 (that's mainly what I measured :/).
2. Lucene 4.0 execution times vary more than 3.5 when using the reopen thread (with one single indexing thread; other setups not tested).
3. Lucene 4.0 is then still slower, but for 5 million of my items the difference is less than 5%.
The hot spots are:
* 30% ThreadAffinityDocumentsWriterThreadPool -> java.util.concurrent.ConcurrentHashMap.get(Object) -> threadBindings.get
* 26% BufferedDeletesStream.applyTermDeletes(Iterable, SegmentReader)
* 16% FreqProxTermsWriterPerField.flush(String, FieldsConsumer, SegmentWriteState)
* 10% DocFieldProcessor.processDocument

Now, when reusing the BytesRef in 4.0 (and reusing the char array in 3.5), Lucene 4 is >20% faster than 3.5 for 5 million docs!
But at some point I had problems because a thread concurrently modified the docs. Can this happen e.g. from the reopen thread? Or is it safe to reuse the BytesRef?

Regards,
Peter.






uwe at thetaphi

Jan 7, 2012, 5:03 AM

Post #7 of 8 (315 views)
RE: Comparing Indexing Speed of Lucene 3.5 and 4.0

Hi,

> > I mean my benchmarks show up
> > to 300% improvement with 4.x versus older versions so something is
> > weird ie. non-realistic here or there is a bug so lets figure this
> > out. Can you profile you app and see if you find something suspicious?
> > I'll try now and report back.
>
> It seems to be largely my mistake: maven enables assertions automatically
> when running tests.
> Executing it as normal public main class results in faster indexing times for 4.0
> compared to 3.5.
>
> Conclusion:
> 1. execution with assertions for 4.0 is slower than 3.5 (thats what I mainly
> measured :/)

Die, Maven, die :-)

> 2. luc 4.0 execution times vary more than 3.5 when using reopen thread (and
> one single indexing thread, others not tested).
> 3. luc 4.0 then is still slower, but for 5 million of my items it's less than 5%.
> The hot spots are:
> * 30% ThreadAffinityDocumentsWriterThreadPool ->
> java.util.concurrent.ConcurrentHashMap.get(Object) -> threadBindings.get
> * 26% BufferedDeletesStream.applyTermDeletes(Iterable, SegmentReader)
> * 16% FreqProxTermsWriterPerField.flush(String, FieldsConsumer,
> SegmentWriteState)
> * 10% DocFieldProcessor.processDocument
>
> Now when reusing BytesRef in 4.0 (and reusing the char array in 3.5) then luc 4
> is >20% faster than 3.5 for 5 million docs!

You can only reuse the BytesRef (I assume the one used to encode the numeric key for deleting the document) from within the same thread! I see no other BytesRef use in your code. If you reuse the BytesRef, you can also reuse all Fields and Documents, but only within the same thread.

> But somewhen I had problems as a thread concurrently modified the docs - can
> this happen e.g. from the reopen thread? Or is it safe to reuse BytesRef?

In one thread: yes!
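
One way to keep such reuse confined to a single thread is a ThreadLocal. The sketch below is an illustration, not Lucene code: KeyBuffer is a hypothetical stand-in for a mutable buffer such as BytesRef, and the encoding is plain big-endian rather than the prefix coding done by NumericUtils.

```java
final class PerThreadBuffer {
    /** Hypothetical stand-in for a mutable, reusable key buffer. */
    static final class KeyBuffer {
        final byte[] bytes = new byte[8];
    }

    // Each thread lazily gets its own buffer, so no two threads share bytes.
    private static final ThreadLocal<KeyBuffer> BUFFER =
            ThreadLocal.withInitial(KeyBuffer::new);

    /** Writes id into the calling thread's buffer (big-endian) and returns it. */
    static KeyBuffer encode(long id) {
        KeyBuffer buf = BUFFER.get();
        for (int i = 7; i >= 0; i--) {
            buf.bytes[i] = (byte) id;
            id >>>= 8;
        }
        return buf;
    }

    public static void main(String[] args) {
        KeyBuffer a = encode(1);
        KeyBuffer b = encode(2);
        System.out.println("same instance within one thread: " + (a == b));
    }
}
```

The returned buffer is overwritten by the next encode call on the same thread, so it has to be fully consumed before that call and must not be handed to another thread.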

Uwe



peathal at yahoo

Jan 7, 2012, 5:38 AM

Post #8 of 8 (305 views)
Re: Comparing Indexing Speed of Lucene 3.5 and 4.0

Hi Uwe,

> Die, Maven, die :-)

Well, I have a love-hate relationship with Maven myself: it's simple
and works nicely for dependency management. Others can also set it up
quickly, and IDE support is good. But sometimes it does a bit too much
(unexpectedly ;)) or is too complicated to customize.


> (I assume the one to encode the numeric key to delete the document)

exactly

>> But at some point I had problems as a thread concurrently modified the docs - can
>> this happen e.g. from the reopen thread? Or is it safe to reuse BytesRef?
> In one thread: yes!

Thanks for confirming!

Regards,
Peter.

--
http://jetsli.de news reader for geeks

