
Mailing List Archive: Lucene: Java-Dev

[jira] Updated: (LUCENE-2061) Create benchmark & approach for testing Lucene's near real-time performance

 

 



jira at apache

Nov 13, 2009, 7:13 AM

Post #1 of 3 (276 views)
[jira] Updated: (LUCENE-2061) Create benchmark & approach for testing Lucene's near real-time performance

[ https://issues.apache.org/jira/browse/LUCENE-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2061:
---------------------------------------

Attachment: LUCENE-2061.patch

Attached first cut python script nrtBench.py.

You have to edit the constants up top to point to both a Wiki XML
export and a Wiki line file. It uses the XML export to build up the
base index, and then the line file to do the "live" indexing.

It first runs a baseline, redlining search with 9 threads (the
default), and reports the net QPS. (You'll have to write a queries.txt
with the queries to test.) Then it steps through NRT reopen periods of
0.1, 1.0, 2.5, 5.0 seconds X indexing rates of 1, 10, 100, 1000 docs
per sec (using 2 indexing threads), redlining the search threads at
each step and comparing their search throughput to the baseline.
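
The sweep described above amounts to a small parameter grid. The following is only a rough sketch of that grid, not the contents of nrtBench.py: the names `sweep_points`, `run_sweep` and the `run_point` callback are hypothetical, and only the rate lists and thread counts come from this message.

```python
import itertools

# Values taken from the description in this message.
REOPEN_PERIODS_SEC = [0.1, 1.0, 2.5, 5.0]
INDEXING_RATES_DOCS_PER_SEC = [1, 10, 100, 1000]
SEARCH_THREADS = 9
INDEXING_THREADS = 2


def sweep_points():
    """Return every (docs/sec, reopen period) combination, in order."""
    return list(itertools.product(INDEXING_RATES_DOCS_PER_SEC, REOPEN_PERIODS_SEC))


def run_sweep(run_point):
    """Call run_point(docs_per_sec, reopen_sec) for each grid point.

    run_point is a hypothetical callback; in a real benchmark it would
    start the indexing and search threads and return the measured QPS.
    """
    results = {}
    for docs_per_sec, reopen_sec in sweep_points():
        results[(docs_per_sec, reopen_sec)] = run_point(docs_per_sec, reopen_sec)
    return results
```

Passing a stub such as `lambda docs_per_sec, reopen_sec: 0.0` to `run_sweep` is enough to check the grid order before wiring in a real measurement.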


> Create benchmark & approach for testing Lucene's near real-time performance
> ---------------------------------------------------------------------------
>
> Key: LUCENE-2061
> URL: https://issues.apache.org/jira/browse/LUCENE-2061
> Project: Lucene - Java
> Issue Type: Task
> Components: Index
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Attachments: LUCENE-2061.patch
>
>
> With the improvements to contrib/benchmark in LUCENE-2050, it's now
> possible to create compelling algs to test indexing & searching
> throughput against a periodically reopened near-real-time reader from
> the IndexWriter.
> Coming out of the discussions in LUCENE-1526, I think to properly
> characterize NRT, we should measure net search throughput as a
> function of both reopen rate (ie how often you get a new NRT reader
> from the writer) and indexing rate. We should also separately measure
> pure adds vs updates (deletes + adds); the latter is much more work
> for Lucene.
> This can help apps make capacity decisions... and can help us test
> performance of pending improvements for NRT (eg LUCENE-1313,
> LUCENE-2047).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Nov 14, 2009, 2:55 AM

Post #2 of 3 (240 views)
[jira] Updated: (LUCENE-2061) Create benchmark & approach for testing Lucene's near real-time performance [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2061:
---------------------------------------

Attachment: LUCENE-2061.patch

New nrtBench.py attached; it fixes a few small issues. I also removed
-Xbatch from the java command line, since it seems to produce less
consistent results.

My initial results:


JAVA:
java version "1.6.0_14"
Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)


OS:
SunOS rhumba 5.11 snv_111b i86pc i386 i86pc Solaris


Baseline QPS 158.12

||Indexing docs/sec||NRT reopen period (sec)||QPS add||QPS update||QPS add (% diff)||QPS update (% diff)||
|1|1|157.5|125.7|-0.4%|-20.5%|
|1|2.5|157.6|127.5|-0.4%|-19.4%|
|1|5|156.9|127.2|-0.8%|-19.5%|
|10|0.1|156.3|142.4|-1.2%|-9.9%|
|10|0.5|155.8|125.0|-1.5%|-20.9%|
|10|1|156.0|142.6|-1.3%|-9.8%|
|10|2.5|156.6|143.4|-0.9%|-9.3%|
|10|5|156.2|144.0|-1.2%|-8.9%|
|100|0.1|153.9|138.8|-2.7%|-12.2%|
|100|0.5|155.0|141.1|-2.0%|-10.8%|
|100|1|156.1|141.3|-1.3%|-10.6%|
|100|2.5|155.9|116.7|-1.4%|-26.2%|
|100|5|157.0|143.8|-0.7%|-9.1%|
|1000|0.1|145.9|110.0|-7.7%|-30.4%|
|1000|0.5|148.0|117.6|-6.4%|-25.6%|
|1000|1|148.3|97.7|-6.2%|-38.2%|
|1000|2.5|149.3|99.1|-5.6%|-37.3%|
|1000|5|147.4|124.3|-6.8%|-21.4%|
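
The "% diff" columns appear to be the relative change from the baseline QPS. A minimal sketch of that calculation follows; the formula is inferred from the numbers in the table rather than taken from nrtBench.py, and the function name `pct_diff` is made up for illustration.

```python
BASELINE_QPS = 158.12  # baseline reported above


def pct_diff(qps, baseline=BASELINE_QPS):
    """Percentage change of qps relative to baseline, rounded to one decimal."""
    return round(100.0 * (qps - baseline) / baseline, 1)


# Example: the 1000 docs/sec, 1 sec reopen row (QPS update = 97.7).
print(pct_diff(97.7))
```

Since the QPS values in the table are themselves rounded, a recomputed percentage can differ from the table in the last digit.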

The docs are ~1KB sized docs derived from wikipedia. The searching is
only running a single fixed query (1), over and over.

Some rough observations:

* Even at only 1 update/sec, QPS already drops way too much
(~20%), which is weird. Something is amiss.

* At all indexing rates, handling updates slows NRT down much more
than pure adds.

* Pure adds (no deletes) are handled quite well: at <= 100 adds/sec,
the hit to QPS is ~1-2%, which is great. Even at 1000 docs/sec
the QPS only drops ~6%, which seems reasonable.

* There isn't really a clear correlation of reopen rate to QPS,
which is also weird.
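
One rough way to check the last observation is to hold the indexing rate fixed and see whether update QPS moves consistently in one direction as the reopen period grows. The sketch below does that with the "QPS update" column copied from the table; the helper names are hypothetical, and a monotonicity check is only a crude stand-in for a proper correlation analysis.

```python
from collections import defaultdict

# (indexing docs/sec, reopen period sec, QPS update), copied from the table.
ROWS = [
    (1, 1, 125.7), (1, 2.5, 127.5), (1, 5, 127.2),
    (10, 0.1, 142.4), (10, 0.5, 125.0), (10, 1, 142.6),
    (10, 2.5, 143.4), (10, 5, 144.0),
    (100, 0.1, 138.8), (100, 0.5, 141.1), (100, 1, 141.3),
    (100, 2.5, 116.7), (100, 5, 143.8),
    (1000, 0.1, 110.0), (1000, 0.5, 117.6), (1000, 1, 97.7),
    (1000, 2.5, 99.1), (1000, 5, 124.3),
]


def qps_by_rate(rows):
    """Map each indexing rate to its update QPS values, ordered by reopen period."""
    grouped = defaultdict(list)
    for rate, period, qps in sorted(rows):
        grouped[rate].append(qps)
    return dict(grouped)


def is_monotonic(values):
    """True if the values never decrease or never increase."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


for rate, series in qps_by_rate(ROWS).items():
    label = "monotonic" if is_monotonic(series) else "not monotonic"
    print(rate, series, label)
```

Note that the 1 doc/sec series has only three reopen periods in the table, so it carries less weight than the others.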

Looks like we have some puzzles to solve...





jira at apache

Nov 20, 2009, 9:26 AM

Post #3 of 3 (192 views)
[jira] Updated: (LUCENE-2061) Create benchmark & approach for testing Lucene's near real-time performance [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2061:
---------------------------------------

Attachment: LUCENE-2061.patch

Just attaching latest nrtBench.py...

