Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-User

new sorting api and some perf numbers

 

 

Lucene java-user RSS feed   Index | Next | Previous | View Threaded


john.wang at gmail

Oct 11, 2009, 10:19 PM

Post #1 of 2 (324 views)
Permalink
new sorting api and some perf numbers

Hi guys:
The new FieldComparator api looks really scary :)

But after some perf testing with numbers I'd like to share, I guess it
is worth it:

HW: Mac Pro with 16G memory
jvm: 1.6.0_13"
jvm arg: -Xms1g -Xmx1g -server

setup

index:
1M docs even split into 8 segments (to make sure the test is fair across
segment boundaries)
each doc has 3 fields:
1) id - stored
2) val - random number, indexed, not analyzed, no norms, omit tf
3) string - "even" or "odd" of the corresponding id, not analyzed, no norms,
omit tf

built with lucene 2.4.1 to keep the same index across lucene 2.4.1 and
lucene 2.9.0 search tests

Search:
query on the term: "even" (TermQuery, minimizes the overhead of the text
search), matches 500k docs, and across segment boundary, sort by val, sort
type: string. Numhits, e.g. number of slots = 100.

ran 20 iterations of the same query for each test.

First query, includes loading

lucene 2.4.1: 4858ms, lucene 2.9.0: 816ms, gain of 595%

avg of the rest 19 queries:

lucene 2.4.1: 32ms, lucene 2.9.0: 17ms , gain of 188%

I ran this test about 5 times, the findings are similar.

The performance gain is significant!

Great job!

-John


bradfordstephens at gmail

Oct 12, 2009, 1:01 PM

Post #2 of 2 (287 views)
Permalink
Re: new sorting api and some perf numbers [In reply to]

Wow! This is awesome. Can't wait to see how it plays with Bobo :)

On Sun, Oct 11, 2009 at 10:19 PM, John Wang <john.wang [at] gmail> wrote:
> Hi guys:
>    The new FieldComparator api looks really scary :)
>
>    But after some perf testing with numbers I'd like to share, I guess it
> is worth it:
>
> HW: Mac Pro with 16G memory
> jvm: 1.6.0_13"
> jvm arg: -Xms1g -Xmx1g -server
>
> setup
>
> index:
> 1M docs even split into 8 segments (to make sure the test is fair across
> segment boundaries)
> each doc has 3 fields:
> 1) id - stored
> 2) val - random number, indexed, not analyzed, no norms, omit tf
> 3) string - "even" or "odd" of the corresponding id, not analyzed, no norms,
> omit tf
>
> built with lucene 2.4.1 to keep the same index across lucene 2.4.1 and
> lucene 2.9.0 search tests
>
> Search:
> query on the term: "even" (TermQuery, minimizes the overhead of the text
> search), matches 500k docs, and across segment boundary, sort by val, sort
> type: string. Numhits, e.g. number of slots = 100.
>
> ran 20 iterations of the same query for each test.
>
> First query, includes loading
>
> lucene 2.4.1: 4858ms, lucene 2.9.0: 816ms, gain of 595%
>
> avg of the rest 19 queries:
>
> lucene 2.4.1: 32ms, lucene 2.9.0: 17ms , gain of 188%
>
> I ran this test about 5 times, the findings are similar.
>
> The performance gain is significant!
>
> Great job!
>
> -John
>



--
http://www.drawntoscaleconsulting.com - Scalability, Hadoop, HBase,
and Distributed Lucene Consulting

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene

Lucene java-user RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.