Mailing List Archive: Lucene: Java-Dev

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

 

 


jira at apache

Oct 20, 2009, 9:41 AM

Post #1 of 110 (905 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767870#action_12767870 ]

Michael McCandless commented on LUCENE-1997:
--------------------------------------------

OK, I ran sortBench.py on an OpenSolaris 2009.06 box, Java 1.6.0_13.

It'd be great if others with more mainstream platforms (Linux,
Windows) could run this and post back.

Raw results (only ran on the log-sized segments):

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|1|318481|title|10|114.26|112.40|{color:red}-1.6%{color}|
|log|1|318481|title|25|117.59|110.08|{color:red}-6.4%{color}|
|log|1|318481|title|50|116.22|106.96|{color:red}-8.0%{color}|
|log|1|318481|title|100|114.48|100.07|{color:red}-12.6%{color}|
|log|1|318481|title|500|103.16|73.98|{color:red}-28.3%{color}|
|log|1|318481|title|1000|95.60|57.85|{color:red}-39.5%{color}|
|log|<all>|1000000|title|10|95.71|109.41|{color:green}14.3%{color}|
|log|<all>|1000000|title|25|111.56|101.73|{color:red}-8.8%{color}|
|log|<all>|1000000|title|50|110.56|98.84|{color:red}-10.6%{color}|
|log|<all>|1000000|title|100|104.09|93.02|{color:red}-10.6%{color}|
|log|<all>|1000000|title|500|93.36|66.67|{color:red}-28.6%{color}|
|log|<all>|1000000|title|1000|97.07|50.03|{color:red}-48.5%{color}|
|log|<all>|1000000|rand string|10|118.10|109.63|{color:red}-7.2%{color}|
|log|<all>|1000000|rand string|25|107.68|102.33|{color:red}-5.0%{color}|
|log|<all>|1000000|rand string|50|107.12|100.37|{color:red}-6.3%{color}|
|log|<all>|1000000|rand string|100|110.63|95.17|{color:red}-14.0%{color}|
|log|<all>|1000000|rand string|500|79.97|72.09|{color:red}-9.9%{color}|
|log|<all>|1000000|rand string|1000|76.82|54.67|{color:red}-28.8%{color}|
|log|<all>|1000000|country|10|129.49|103.63|{color:red}-20.0%{color}|
|log|<all>|1000000|country|25|111.74|102.60|{color:red}-8.2%{color}|
|log|<all>|1000000|country|50|108.82|100.90|{color:red}-7.3%{color}|
|log|<all>|1000000|country|100|108.01|96.84|{color:red}-10.3%{color}|
|log|<all>|1000000|country|500|97.60|72.02|{color:red}-26.2%{color}|
|log|<all>|1000000|country|1000|85.19|54.56|{color:red}-36.0%{color}|
|log|<all>|1000000|rand int|10|151.75|110.37|{color:red}-27.3%{color}|
|log|<all>|1000000|rand int|25|138.06|109.15|{color:red}-20.9%{color}|
|log|<all>|1000000|rand int|50|135.40|106.49|{color:red}-21.4%{color}|
|log|<all>|1000000|rand int|100|108.30|101.86|{color:red}-5.9%{color}|
|log|<all>|1000000|rand int|500|94.45|73.42|{color:red}-22.3%{color}|
|log|<all>|1000000|rand int|1000|88.30|54.71|{color:red}-38.0%{color}|

Some observations:

* MultiPQ seems generally slower, though it is faster in
one case, when topN = 10, sorting by title. It's only faster with
the *:* (MatchAllDocsQuery) query, not with the TermQuery for
term=1, which is odd.

* MultiPQ slows down, relatively, as topN increases.

* Sorting by int acts differently: MultiPQ is quite a bit slower
across the board, except for topN=100, where the gap is much smaller.
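
For readers new to the two strategies being compared, here is a minimal Python sketch of the idea. It is not the Lucene code or the patch on this issue: the function names are hypothetical, segments are modeled as plain lists of integer sort values, and `heapq` stands in for Lucene's priority queue.

```python
import heapq
import random


def single_pq_top_n(segments, n):
    """Collect the n smallest sort values with one bounded queue shared by all segments."""
    heap = []  # max-heap via negated values; holds the best n seen so far
    for segment in segments:
        for value in segment:
            if len(heap) < n:
                heapq.heappush(heap, -value)
            elif value < -heap[0]:
                # New value beats the current worst entry, so replace it.
                heapq.heapreplace(heap, -value)
    return sorted(-v for v in heap)


def multi_pq_top_n(segments, n):
    """Collect the n smallest values per segment, then merge the per-segment results."""
    per_segment = [heapq.nsmallest(n, segment) for segment in segments]
    # Each per-segment list is sorted, so a k-way merge yields global order.
    merged = heapq.merge(*per_segment)
    return [value for _, value in zip(range(n), merged)]


# Illustrative data only: 20 segments of 1000 random ints each.
random.seed(0)
segments = [[random.randint(0, 10**6) for _ in range(1000)] for _ in range(20)]
assert single_pq_top_n(segments, 10) == multi_pq_top_n(segments, 10)
```

Both functions return the same top N; the difference is that the multi-PQ variant holds up to N entries per segment and pays for a final merge, while the single-PQ variant holds N entries in total.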


> Explore performance of multi-PQ vs single-PQ sorting API
> --------------------------------------------------------
>
> Key: LUCENE-1997
> URL: https://issues.apache.org/jira/browse/LUCENE-1997
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 2.9
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Attachments: LUCENE-1997.patch, LUCENE-1997.patch
>
>
> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev,
> where a simpler (non-segment-based) comparator API is proposed that
> gathers results into multiple PQs (one per segment) and then merges
> them in the end.
> I started from John's multi-PQ code and worked it into
> contrib/benchmark so that we could run perf tests. Then I generified
> the Python script I use for running search benchmarks (in
> contrib/benchmark/sortBench.py).
> The script first creates indexes with 1M docs (based on
> SortableSingleDocSource, and based on wikipedia, if available). Then
> it runs various combinations:
> * Index with 20 balanced segments vs index with the "normal" log
> segment size
> * Queries with different numbers of hits (only for wikipedia index)
> * Different top N
> * Different sorts (by title, for wikipedia, and by random string,
> random int, and country for the random index)
> For each test, 7 search rounds are run and the best QPS is kept. The
> script runs singlePQ then multiPQ, records the resulting best QPS
> for each, and produces a table (in Jira format) as output.
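
The reporting step described above can be sketched roughly as follows. This is an assumption about how such a script might compute and format its output, not the contents of sortBench.py; the helper names `best_qps`, `pct_change`, and `jira_row` are hypothetical.

```python
def best_qps(round_qps):
    """Return the best (highest) QPS observed across the search rounds."""
    return max(round_qps)


def pct_change(qps_old, qps_new):
    """Percentage change of the new QPS relative to the old QPS."""
    return 100.0 * (qps_new - qps_old) / qps_old


def jira_row(seg_size, query, tot_hits, sort, top_n, qps_old, qps_new):
    """Format one result row in Jira table markup, colored by the sign of the change."""
    pct = pct_change(qps_old, qps_new)
    color = "green" if pct >= 0 else "red"
    return "|%s|%s|%d|%s|%d|%.2f|%.2f|{color:%s}%.1f%%{color}|" % (
        seg_size, query, tot_hits, sort, top_n, qps_old, qps_new, color, pct)


# Reproduces the first row of the table posted in this thread.
print(jira_row("log", "1", 318481, "title", 10, 114.26, 112.40))
```

Keeping the best of several rounds, rather than the mean, reduces the influence of warm-up and background noise on each reported number.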

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Oct 22, 2009, 8:14 PM

Post #2 of 110 (867 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769039#action_12769039 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

Results from John Wang:

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|<all>|1000000|rand string|10|91.76|108.63|{color:green}18.4%{color}|
|log|<all>|1000000|rand string|25|92.39|106.79|{color:green}15.6%{color}|
|log|<all>|1000000|rand string|50|91.30|104.02|{color:green}13.9%{color}|
|log|<all>|1000000|rand string|500|86.16|63.27|{color:red}-26.6%{color}|
|log|<all>|1000000|rand string|1000|76.92|64.85|{color:red}-15.7%{color}|
|log|<all>|1000000|country|10|92.42|108.78|{color:green}17.7%{color}|
|log|<all>|1000000|country|25|92.60|106.26|{color:green}14.8%{color}|
|log|<all>|1000000|country|50|92.64|103.76|{color:green}12.0%{color}|
|log|<all>|1000000|country|500|83.92|50.30|{color:red}-40.1%{color}|
|log|<all>|1000000|country|1000|74.78|46.59|{color:red}-37.7%{color}|
|log|<all>|1000000|rand int|10|114.03|114.85|{color:green}0.7%{color}|
|log|<all>|1000000|rand int|25|113.77|112.92|{color:red}-0.7%{color}|
|log|<all>|1000000|rand int|50|113.36|109.56|{color:red}-3.4%{color}|
|log|<all>|1000000|rand int|500|103.90|66.29|{color:red}-36.2%{color}|
|log|<all>|1000000|rand int|1000|89.52|70.67|{color:red}-21.1%{color}|



jira at apache

Oct 22, 2009, 8:18 PM

Post #3 of 110 (874 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769042#action_12769042 ]

Jake Mannix commented on LUCENE-1997:
-------------------------------------

Hah! Thanks for posting that, Mark! Much easier to read. :)

Hey John, can you comment with your hardware specs on this, so it can be recorded for posterity? ;)



jira at apache

Oct 22, 2009, 8:26 PM

Post #4 of 110 (863 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769045#action_12769045 ]

John Wang commented on LUCENE-1997:
-----------------------------------

My machine HW spec:

Model Name: MacBook Pro
Model Identifier: MacBookPro3,1
Processor Name: Intel Core 2 Duo
Processor Speed: 2.4 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache: 4 MB
Memory: 4 GB
Bus Speed: 800 MHz



jira at apache

Oct 22, 2009, 9:13 PM

Post #5 of 110 (872 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769051#action_12769051 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

Another run:

I made the changes to int/string comparator to do the faster compare.
Java 1.5.0_20
Laptop
Quad Core - 2.0 GHz
Ubuntu 9.10 Kernel 2.6.31
4 GB RAM

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|1|317925|title|10|87.38|75.42|{color:red}-13.7%{color}|
|log|1|317925|title|25|86.55|74.49|{color:red}-13.9%{color}|
|log|1|317925|title|50|90.49|71.90|{color:red}-20.5%{color}|
|log|1|317925|title|100|88.07|83.08|{color:red}-5.7%{color}|
|log|1|317925|title|500|76.67|54.34|{color:red}-29.1%{color}|
|log|1|317925|title|1000|69.29|38.54|{color:red}-44.4%{color}|
|log|<all>|1000000|title|10|109.01|92.78|{color:red}-14.9%{color}|
|log|<all>|1000000|title|25|108.30|89.43|{color:red}-17.4%{color}|
|log|<all>|1000000|title|50|107.19|85.86|{color:red}-19.9%{color}|
|log|<all>|1000000|title|100|94.84|80.25|{color:red}-15.4%{color}|
|log|<all>|1000000|title|500|78.84|49.10|{color:red}-37.7%{color}|
|log|<all>|1000000|title|1000|72.52|26.90|{color:red}-62.9%{color}|
|log|<all>|1000000|rand string|10|115.32|101.53|{color:red}-12.0%{color}|
|log|<all>|1000000|rand string|25|115.22|91.82|{color:red}-20.3%{color}|
|log|<all>|1000000|rand string|50|114.40|89.70|{color:red}-21.6%{color}|
|log|<all>|1000000|rand string|100|91.30|81.04|{color:red}-11.2%{color}|
|log|<all>|1000000|rand string|500|76.31|43.94|{color:red}-42.4%{color}|
|log|<all>|1000000|rand string|1000|67.33|28.29|{color:red}-58.0%{color}|
|log|<all>|1000000|country|10|115.40|101.46|{color:red}-12.1%{color}|
|log|<all>|1000000|country|25|115.06|92.15|{color:red}-19.9%{color}|
|log|<all>|1000000|country|50|114.03|90.06|{color:red}-21.0%{color}|
|log|<all>|1000000|country|100|99.30|80.07|{color:red}-19.4%{color}|
|log|<all>|1000000|country|500|75.64|43.44|{color:red}-42.6%{color}|
|log|<all>|1000000|country|1000|66.05|27.94|{color:red}-57.7%{color}|
|log|<all>|1000000|rand int|10|118.47|109.30|{color:red}-7.7%{color}|
|log|<all>|1000000|rand int|25|118.72|99.37|{color:red}-16.3%{color}|
|log|<all>|1000000|rand int|50|118.25|95.14|{color:red}-19.5%{color}|
|log|<all>|1000000|rand int|100|97.57|83.39|{color:red}-14.5%{color}|
|log|<all>|1000000|rand int|500|86.55|46.21|{color:red}-46.6%{color}|
|log|<all>|1000000|rand int|1000|78.23|28.94|{color:red}-63.0%{color}|





jira at apache

Oct 22, 2009, 9:25 PM

Post #6 of 110 (865 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769053#action_12769053 ]

Yonik Seeley commented on LUCENE-1997:
--------------------------------------

While Java5 numbers are still important, I'd say that Java6 (-server of course) should be weighted far more heavily? That must be what a majority of people are running in production for new systems?




jira at apache

Oct 22, 2009, 9:37 PM

Post #7 of 110 (861 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769055#action_12769055 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

Hey John, did you pull from a wiki dump or use the random index?



jira at apache

Oct 22, 2009, 9:41 PM

Post #8 of 110 (868 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769056#action_12769056 ]

Jake Mannix commented on LUCENE-1997:
-------------------------------------

Java6 is standard in production servers, since when? What justified lucene staying java1.4 for so long if this is the case? In my own experience, my last job only moved to java1.5 a year ago, and at my current company, we're still on 1.5. I've seen that be pretty common, and I'm in the Valley, where things update pretty quickly.



jira at apache

Oct 22, 2009, 9:43 PM

Post #9 of 110 (873 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769058#action_12769058 ]

Jake Mannix commented on LUCENE-1997:
-------------------------------------

I would say that of course results on Linux and Solaris should be weighted more heavily than results on Macs, because while I love my Mac, I've yet to see a production cluster running on MacBook Pros... :)



jira at apache

Oct 22, 2009, 9:53 PM

Post #10 of 110 (867 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769059#action_12769059 ]

Yonik Seeley commented on LUCENE-1997:
--------------------------------------

bq. Java6 is standard in production servers, since when?

Maybe I'm wrong... it was just a guess. It's just what I've seen most customers deploying new projects on.

bq. What justified lucene staying java1.4 for so long if this is the case?

The decision of what JVM a business should use to deploy their new app is a very different one than what Lucene should require.
A minority of users may be justification enough to avoid requiring a new JVM... unless the benefits are really that huge. Lucene does not target the JVM that most people will be deploying on - if that were the case, I have a feeling we'd be switching to Java6 instead of Java5.



jira at apache

Oct 22, 2009, 9:55 PM

Post #11 of 110 (868 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769060#action_12769060 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

Same system, Java 1.6.0_15

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|1|317925|title|10|105.46|97.11|{color:red}-7.9%{color}|
|log|1|317925|title|25|109.08|98.34|{color:red}-9.8%{color}|
|log|1|317925|title|50|108.01|93.99|{color:red}-13.0%{color}|
|log|1|317925|title|100|105.79|84.08|{color:red}-20.5%{color}|
|log|1|317925|title|500|91.12|50.28|{color:red}-44.8%{color}|
|log|1|317925|title|1000|80.51|33.59|{color:red}-58.3%{color}|
|log|<all>|1000000|title|10|113.89|105.39|{color:red}-7.5%{color}|
|log|<all>|1000000|title|25|113.14|102.13|{color:red}-9.7%{color}|
|log|<all>|1000000|title|50|111.30|96.51|{color:red}-13.3%{color}|
|log|<all>|1000000|title|100|86.77|83.86|{color:red}-3.4%{color}|
|log|<all>|1000000|title|500|78.00|42.15|{color:red}-46.0%{color}|
|log|<all>|1000000|title|1000|70.50|27.02|{color:red}-61.7%{color}|
|log|<all>|1000000|rand string|10|107.78|106.09|{color:red}-1.6%{color}|
|log|<all>|1000000|rand string|25|103.09|102.53|{color:red}-0.5%{color}|
|log|<all>|1000000|rand string|50|106.42|95.17|{color:red}-10.6%{color}|
|log|<all>|1000000|rand string|100|86.28|85.41|{color:red}-1.0%{color}|
|log|<all>|1000000|rand string|500|76.69|37.76|{color:red}-50.8%{color}|
|log|<all>|1000000|rand string|1000|68.48|22.95|{color:red}-66.5%{color}|
|log|<all>|1000000|country|10|103.36|106.79|{color:green}3.3%{color}|
|log|<all>|1000000|country|25|103.43|102.69|{color:red}-0.7%{color}|
|log|<all>|1000000|country|50|102.93|94.97|{color:red}-7.7%{color}|
|log|<all>|1000000|country|100|108.49|85.71|{color:red}-21.0%{color}|
|log|<all>|1000000|country|500|80.87|38.23|{color:red}-52.7%{color}|
|log|<all>|1000000|country|1000|67.24|22.79|{color:red}-66.1%{color}|
|log|<all>|1000000|rand int|10|120.59|112.03|{color:red}-7.1%{color}|
|log|<all>|1000000|rand int|25|119.80|107.49|{color:red}-10.3%{color}|
|log|<all>|1000000|rand int|50|119.96|98.84|{color:red}-17.6%{color}|
|log|<all>|1000000|rand int|100|88.58|89.24|{color:green}0.7%{color}|
|log|<all>|1000000|rand int|500|83.50|40.13|{color:red}-51.9%{color}|
|log|<all>|1000000|rand int|1000|74.80|23.83|{color:red}-68.1%{color}|
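
For anyone reading the tables: the "Pct change" column appears to be the relative difference between "QPS new" (multi-PQ) and "QPS old" (single-PQ). A minimal sketch, assuming that formula; the class and method names are hypothetical and not part of sortBench.py:

```java
final class PctChange {
    /** Relative change of newQps versus oldQps, in percent. */
    static double pctChange(double oldQps, double newQps) {
        return 100.0 * (newQps - oldQps) / oldQps;
    }

    public static void main(String[] args) {
        // First row of the table above: 105.46 QPS old, 97.11 QPS new.
        System.out.println(pctChange(105.46, 97.11));
    }
}
```

For that first row the result rounds to -7.9%, which matches the table.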


> Explore performance of multi-PQ vs single-PQ sorting API
> --------------------------------------------------------
>
> Key: LUCENE-1997
> URL: https://issues.apache.org/jira/browse/LUCENE-1997
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 2.9
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Attachments: LUCENE-1997.patch, LUCENE-1997.patch
>
>
> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev,
> where a simpler (non-segment-based) comparator API is proposed that
> gathers results into multiple PQs (one per segment) and then merges
> them in the end.
> I started from John's multi-PQ code and worked it into
> contrib/benchmark so that we could run perf tests. Then I generified
> the Python script I use for running search benchmarks (in
> contrib/benchmark/sortBench.py).
> The script first creates indexes with 1M docs (based on
> SortableSingleDocSource, and based on wikipedia, if available). Then
> it runs various combinations:
> * Index with 20 balanced segments vs index with the "normal" log
> segment size
> * Queries with different numbers of hits (only for wikipedia index)
> * Different top N
> * Different sorts (by title, for wikipedia, and by random string,
> random int, and country for the random index)
> For each test, 7 search rounds are run and the best QPS is kept. The
> script runs singlePQ then multiPQ, and records the resulting best QPS
> for each and produces table (in Jira format) as output.
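
To make the two approaches described above concrete, here is a rough sketch using java.util.PriorityQueue over plain int sort values. It is not the code from the patch: the class and method names are hypothetical, and it ignores the per-segment comparator work (ordinals, field caches) that the real API has to deal with.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class TopNSketch {
    /** Max-heap bounded (by offer) at n entries, so it retains the n smallest values. */
    static PriorityQueue<Integer> newQueue(int n) {
        return new PriorityQueue<Integer>(n, Collections.<Integer>reverseOrder());
    }

    /** Adds value if the queue is not full or value beats the current worst entry. */
    static void offer(PriorityQueue<Integer> pq, int value, int n) {
        if (pq.size() < n) {
            pq.add(value);
        } else if (value < pq.peek()) {
            pq.poll(); // evict the current worst (largest) entry
            pq.add(value);
        }
    }

    /** Single-PQ: one queue shared across all segments. */
    static List<Integer> singlePQ(int[][] segments, int n) {
        PriorityQueue<Integer> pq = newQueue(n);
        for (int[] segment : segments) {
            for (int value : segment) {
                offer(pq, value, n);
            }
        }
        return sorted(pq);
    }

    /** Multi-PQ: one queue per segment, merged into a final queue at the end. */
    static List<Integer> multiPQ(int[][] segments, int n) {
        List<PriorityQueue<Integer>> perSegment = new ArrayList<PriorityQueue<Integer>>();
        for (int[] segment : segments) {
            PriorityQueue<Integer> pq = newQueue(n);
            for (int value : segment) {
                offer(pq, value, n);
            }
            perSegment.add(pq);
        }
        PriorityQueue<Integer> merged = newQueue(n);
        for (PriorityQueue<Integer> pq : perSegment) {
            for (int value : pq) {
                offer(merged, value, n);
            }
        }
        return sorted(merged);
    }

    /** Copies the queue contents into an ascending list. */
    static List<Integer> sorted(PriorityQueue<Integer> pq) {
        List<Integer> out = new ArrayList<Integer>(pq);
        Collections.sort(out);
        return out;
    }
}
```

Both methods return the same top N. In this sketch the multi-PQ variant holds up to N entries per segment, so its queue work grows with both N and the number of segments.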


jira at apache

Oct 22, 2009, 10:06 PM

Post #12 of 110 (864 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769085#action_12769085 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

bq. Java6 is standard in production servers, since when?

bq. Maybe I'm wrong... it was just a guess. It's just what I've seen most customers deploying new projects on.

That's my impression too - Java 1.6 is mainly just a bug fix and performance release and has been out for a while, so it's usually the choice I've seen.
Sounds like Uwe thinks it's more buggy though, so who knows if that's a good idea :)


jira at apache

Oct 22, 2009, 10:09 PM

Post #13 of 110 (863 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769088#action_12769088 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

John, what happened to your topn:100 results?


jira at apache

Oct 22, 2009, 10:13 PM

Post #14 of 110 (868 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769089#action_12769089 ]

Yonik Seeley commented on LUCENE-1997:
--------------------------------------

There was a bad stretch in Java6... they plopped in a major JVM upgrade (not just bug fixes) and there were bugs. I think that's been behind us for a little while now though. If someone were starting a project today, I'd recommend the latest Java6 JVM.


jira at apache

Oct 22, 2009, 10:15 PM

Post #15 of 110 (862 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769090#action_12769090 ]

John Wang commented on LUCENE-1997:
-----------------------------------

bq. topn:100
I had changed sortBench.py to look at each run, and forgot to add 100 back in :) My bad.



jira at apache

Oct 22, 2009, 10:23 PM

Post #16 of 110 (861 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769092#action_12769092 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

Anyone got a Windows box to run this on? I'm only running Windows on a VM these days.


jira at apache

Oct 22, 2009, 10:37 PM

Post #17 of 110 (863 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769097#action_12769097 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

bq. There was a bad stretch in Java6...

But how can that be :)? Number 10 of the top 10 of what's new is the -lities! :)

10. The -lities: Quality, Compatibility, Stability


jira at apache

Oct 22, 2009, 11:03 PM

Post #18 of 110 (866 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769099#action_12769099 ]

Uwe Schindler commented on LUCENE-1997:
---------------------------------------

bq. That's my impression too - Java 1.6 is mainly just a bug fix and performance release and has been out for a while, so it's usually the choice I've seen. Sounds like Uwe thinks it's more buggy though, so who knows if that's a good idea

Because of this, we should say that Lucene 3.0 is a Java 1.5-compatible release. As Mark said, Java 6 does not contain anything really new that is usable for Lucene, so we are fine with staying on 1.5. Whether somebody runs 1.5 or 1.6 is their choice, but we should not force people to use 1.6. As long as at least one developer uses 1.5 for development, we will catch any functions added to core classes in 1.6 that we accidentally use (like String.isEmpty(), a common problem because it was added in 1.6 and many developers use it intuitively).
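
As an illustration of the String.isEmpty() case, a check written like the following compiles on both 1.5 and 1.6. This is only a sketch; the helper class name is made up and is not something in Lucene:

```java
final class Java5Strings {
    /** Java 1.5-compatible stand-in for String.isEmpty(), which only exists from 1.6 on. */
    static boolean isEmpty(String s) {
        return s.length() == 0;
    }
}
```

Calling s.isEmpty() directly compiles fine against a 1.6 class library but fails against 1.5, which is why having someone build with 1.5 catches it.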

Even though 1.5 is EOLed by Sun, they recently added a new release 1.5.0_21. I was also wondering about that, but it seems that Sun is still providing "support" for it.

About the stability: maybe it is better now, but I saw so many crashed JVMs in the earlier versions (<= _12) that I stayed on 1.5. But we are also thinking of switching here at some point.


jira at apache

Oct 23, 2009, 12:05 AM

Post #19 of 110 (865 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769116#action_12769116 ]

John Wang commented on LUCENE-1997:
-----------------------------------

I think I found the reason for the discrepancy: 32 vs 64 bit:

32-bit run:
jwang-mn:benchmark jwang$ python -u sortBench.py -report john3

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|<all>|1000000|rand string|10|92.24|103.65|{color:green}12.4%{color}|
|log|<all>|1000000|rand string|25|91.88|102.06|{color:green}11.1%{color}|
|log|<all>|1000000|rand string|50|91.72|99.07|{color:green}8.0%{color}|
|log|<all>|1000000|rand string|100|106.26|90.61|{color:red}-14.7%{color}|
|log|<all>|1000000|rand string|500|86.38|59.88|{color:red}-30.7%{color}|
|log|<all>|1000000|rand string|1000|74.88|39.93|{color:red}-46.7%{color}|
|log|<all>|1000000|country|10|92.33|103.79|{color:green}12.4%{color}|
|log|<all>|1000000|country|25|92.27|101.60|{color:green}10.1%{color}|
|log|<all>|1000000|country|50|91.58|99.14|{color:green}8.3%{color}|
|log|<all>|1000000|country|100|100.76|82.25|{color:red}-18.4%{color}|
|log|<all>|1000000|country|500|75.18|48.65|{color:red}-35.3%{color}|
|log|<all>|1000000|country|1000|67.68|32.67|{color:red}-51.7%{color}|
|log|<all>|1000000|rand int|10|88.14|101.93|{color:green}15.6%{color}|
|log|<all>|1000000|rand int|25|95.02|96.14|{color:green}1.2%{color}|
|log|<all>|1000000|rand int|50|96.54|89.61|{color:red}-7.2%{color}|
|log|<all>|1000000|rand int|100|88.58|92.06|{color:green}3.9%{color}|
|log|<all>|1000000|rand int|500|103.60|62.25|{color:red}-39.9%{color}|
|log|<all>|1000000|rand int|1000|92.36|40.84|{color:red}-55.8%{color}|

64-bit run:
jwang-mn:benchmark jwang$ python -u sortBench.py -report john4

||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log|<all>|1000000|rand string|10|119.59|107.52|{color:red}-10.1%{color}|
|log|<all>|1000000|rand string|25|119.25|105.05|{color:red}-11.9%{color}|
|log|<all>|1000000|rand string|50|117.22|101.99|{color:red}-13.0%{color}|
|log|<all>|1000000|rand string|100|95.78|86.19|{color:red}-10.0%{color}|
|log|<all>|1000000|rand string|500|76.05|54.71|{color:red}-28.1%{color}|
|log|<all>|1000000|rand string|1000|68.37|38.94|{color:red}-43.0%{color}|
|log|<all>|1000000|country|10|119.68|108.12|{color:red}-9.7%{color}|
|log|<all>|1000000|country|25|119.10|105.72|{color:red}-11.2%{color}|
|log|<all>|1000000|country|50|115.85|99.70|{color:red}-13.9%{color}|
|log|<all>|1000000|country|100|97.44|91.03|{color:red}-6.6%{color}|
|log|<all>|1000000|country|500|78.92|40.97|{color:red}-48.1%{color}|
|log|<all>|1000000|country|1000|68.48|30.43|{color:red}-55.6%{color}|
|log|<all>|1000000|rand int|10|121.64|108.82|{color:red}-10.5%{color}|
|log|<all>|1000000|rand int|25|121.68|113.92|{color:red}-6.4%{color}|
|log|<all>|1000000|rand int|50|120.80|110.45|{color:red}-8.6%{color}|
|log|<all>|1000000|rand int|100|101.36|95.68|{color:red}-5.6%{color}|
|log|<all>|1000000|rand int|500|90.15|60.29|{color:red}-33.1%{color}|
|log|<all>|1000000|rand int|1000|80.23|40.67|{color:red}-49.3%{color}|




jira at apache

Oct 23, 2009, 12:17 AM

Post #20 of 110 (870 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769119#action_12769119 ]

John Wang commented on LUCENE-1997:
-----------------------------------

I wrote a small test and verified that the 64-bit VM's string compare is much faster than the 32-bit VM's (kinda makes sense),
so the numbers above now all make sense.
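
A test along these lines could look like the sketch below. This is not the actual test that was run, just an illustration with made-up names and sizes: it times String.compareTo over neighbouring random strings and prints the JVM architecture next to the result, so the same class can be run under a 32-bit and a 64-bit JVM.

```java
import java.util.Random;

final class StringCompareBench {
    /** Builds count pseudo-random lowercase strings of the given length. */
    static String[] randomStrings(int count, int length, long seed) {
        Random random = new Random(seed);
        String[] out = new String[count];
        for (int i = 0; i < count; i++) {
            char[] chars = new char[length];
            for (int j = 0; j < length; j++) {
                chars[j] = (char) ('a' + random.nextInt(26));
            }
            out[i] = new String(chars);
        }
        return out;
    }

    /** Compares neighbouring strings for the given number of rounds; returns elapsed nanoseconds. */
    static long timeCompares(String[] values, int rounds) {
        long sink = 0; // accumulated so the loop has an observable result
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (int i = 1; i < values.length; i++) {
                sink += values[i - 1].compareTo(values[i]);
            }
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // practically never taken; keeps sink in use
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String[] values = randomStrings(100000, 10, 42L);
        timeCompares(values, 20); // warm-up rounds before measuring
        long nanos = timeCompares(values, 100);
        System.out.println(System.getProperty("os.arch") + ": " + (nanos / 1000000) + " ms");
    }
}
```

Like any micro-benchmark, the numbers depend heavily on warm-up and on the JVM, so treat them as a rough indication only.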


jira at apache

Oct 23, 2009, 12:53 AM

Post #21 of 110 (863 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769127#action_12769127 ]

Uwe Schindler commented on LUCENE-1997:
---------------------------------------

So it has nothing to do with Java 1.5 vs 1.6, but rather with 32 vs 64 bit. As most servers are running 64 bit, I think the new 2.9 search API is fine?

I agree with you, the new API is cleaner overall; the old API could only be reimplemented with major refactorings, as it does not fit well with multi-segment search.

By the way, while refactoring for Java 5 I found some inconsistencies in MultiSearcher/ParallelMultiSearcher, which use FieldDocSortedHitQueue (it's used nowhere else anymore): when merging the queues of all Searchers, it sorts using some native compareTo operations, which may not work correctly with custom comparators. Is this correct? In my opinion this queue should also somehow use at least the FieldComparator. Mark, I do not understand it completely, but how does this fit together? I added a warning because of very strange (unsafe) casts in the source code, plus a SuppressWarnings("unchecked"), so it's easy to find in FieldDocSortedHitQueue.


jira at apache

Oct 23, 2009, 6:32 AM

Post #22 of 110 (837 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769221#action_12769221 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

bq. but how does this fit together.

That's what Comparable FieldComparator#value is for: fillFields will grab all of those and load up the FieldDoc fields, so the custom FieldComparator is tied into it. It creates Comparable objects that can be compared by the native compareTos.

{code}
/**
 * Given a queue Entry, creates a corresponding FieldDoc
 * that contains the values used to sort the given document.
 * These values are not the raw values out of the index, but the internal
 * representation of them. This is so the given search hit can be collated by
 * a MultiSearcher with other search hits.
 *
 * @param entry The Entry used to create a FieldDoc
 * @return The newly created FieldDoc
 * @see Searchable#search(Weight,Filter,int,Sort)
 */
FieldDoc fillFields(final Entry entry) {
  final int n = comparators.length;
  final Comparable[] fields = new Comparable[n];
  for (int i = 0; i < n; ++i) {
    fields[i] = comparators[i].value(entry.slot);
  }
  //if (maxscore > 1.0f) doc.score /= maxscore; // normalize scores
  return new FieldDoc(entry.docID, entry.score, fields);
}
{code}
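
To make the merge side concrete, here is a minimal, self-contained sketch of how hits from several searchers could be collated once each hit carries Comparable sort values, as fillFields arranges. It is not the actual FieldDocSortedHitQueue code: the Hit class, the BY_FIELDS comparator and the merge method are hypothetical names, and doc ids are assumed to be already global across searchers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of merging per-searcher hits by their Comparable sort values.
class FieldValueMergeSketch {

    /** Stand-in for FieldDoc: a doc id plus the values it was sorted by. */
    static final class Hit {
        final int doc;
        final Comparable<?>[] fields;

        Hit(int doc, Comparable<?>... fields) {
            this.doc = doc;
            this.fields = fields;
        }
    }

    /** Compares field by field using each value's own compareTo; ties go to the lower doc id. */
    static final Comparator<Hit> BY_FIELDS = new Comparator<Hit>() {
        @Override
        @SuppressWarnings({"unchecked", "rawtypes"}) // raw Comparable cast, as in the thread
        public int compare(Hit a, Hit b) {
            for (int i = 0; i < a.fields.length; i++) {
                int c = ((Comparable) a.fields[i]).compareTo(b.fields[i]);
                if (c != 0) {
                    return c;
                }
            }
            return Integer.compare(a.doc, b.doc);
        }
    };

    /** Collates the already-sorted top hits of every searcher into one top-N list. */
    static List<Hit> merge(List<List<Hit>> perSearcher, int topN) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perSearcher) {
            all.addAll(hits);
        }
        all.sort(BY_FIELDS);
        int end = Math.max(0, Math.min(topN, all.size()));
        return new ArrayList<>(all.subList(0, end));
    }

    public static void main(String[] args) {
        List<Hit> first = Arrays.asList(new Hit(1, "germany"), new Hit(2, "spain"));
        List<Hit> second = Arrays.asList(new Hit(10, "france"), new Hit(11, "italy"));
        for (Hit hit : merge(Arrays.asList(first, second), 3)) {
            System.out.println(hit.doc + " " + hit.fields[0]);
        }
    }
}
```

The merge only sees the Comparable values, so a custom comparator influences the result solely through what its value method returns. The raw cast in compare is the kind of unchecked operation the SuppressWarnings was added for.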



jira at apache

Oct 23, 2009, 6:34 AM

Post #23 of 110 (836 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769222#action_12769222 ]

Mark Miller commented on LUCENE-1997:
-------------------------------------

bq. As most servers are running 64 bit,

Aren't we at the tipping point where even non-servers are 64-bit now? My consumer desktops/laptops have been 64-bit for years now.



jira at apache

Oct 23, 2009, 6:54 AM

Post #24 of 110 (834 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769227#action_12769227 ]

Uwe Schindler commented on LUCENE-1997:
---------------------------------------

bq. it creates Comparable objects that can be compared by the native compareTos. (the old API did the same thing)

OK, understood. I will try to fix the generics somehow so that the SuppressWarnings can be removed.



jira at apache

Oct 23, 2009, 9:34 AM

Post #25 of 110 (839 views)
Permalink
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769287#action_12769287 ]

Michael McCandless commented on LUCENE-1997:
--------------------------------------------

Env:

JAVA:
java version "1.5.0_19"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-b02, mixed mode)


OS:
SunOS rhumba 5.11 snv_111b i86pc i386 i86pc Solaris


Results:

||Source||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|wiki|log|1|318481|title|10|98.47|104.60|{color:green}6.2%{color}|
|wiki|log|1|318481|title|25|97.90|103.63|{color:green}5.9%{color}|
|wiki|log|1|318481|title|50|105.12|101.50|{color:red}-3.4%{color}|
|wiki|log|1|318481|title|100|102.30|108.59|{color:green}6.1%{color}|
|wiki|log|1|318481|title|500|89.43|79.40|{color:red}-11.2%{color}|
|wiki|log|1|318481|title|1000|82.83|63.75|{color:red}-23.0%{color}|
|wiki|log|<all>|1000000|title|10|152.56|157.40|{color:green}3.2%{color}|
|wiki|log|<all>|1000000|title|25|151.95|148.52|{color:red}-2.3%{color}|
|wiki|log|<all>|1000000|title|50|148.52|142.90|{color:red}-3.8%{color}|
|wiki|log|<all>|1000000|title|100|127.70|138.72|{color:green}8.6%{color}|
|wiki|log|<all>|1000000|title|500|104.30|90.30|{color:red}-13.4%{color}|
|wiki|log|<all>|1000000|title|1000|99.10|66.05|{color:red}-33.4%{color}|
|random|log|<all>|1000000|rand string|10|153.13|157.74|{color:green}3.0%{color}|
|random|log|<all>|1000000|rand string|25|128.79|150.62|{color:green}17.0%{color}|
|random|log|<all>|1000000|rand string|50|122.46|153.95|{color:green}25.7%{color}|
|random|log|<all>|1000000|rand string|100|116.26|141.43|{color:green}21.6%{color}|
|random|log|<all>|1000000|rand string|500|98.24|96.17|{color:red}-2.1%{color}|
|random|log|<all>|1000000|rand string|1000|86.38|71.95|{color:red}-16.7%{color}|
|random|log|<all>|1000000|country|10|148.65|153.23|{color:green}3.1%{color}|
|random|log|<all>|1000000|country|25|148.52|152.69|{color:green}2.8%{color}|
|random|log|<all>|1000000|country|50|122.01|149.52|{color:green}22.5%{color}|
|random|log|<all>|1000000|country|100|120.39|145.99|{color:green}21.3%{color}|
|random|log|<all>|1000000|country|500|99.70|95.65|{color:red}-4.1%{color}|
|random|log|<all>|1000000|country|1000|90.18|69.46|{color:red}-23.0%{color}|
|random|log|<all>|1000000|rand int|10|150.85|171.22|{color:green}13.5%{color}|
|random|log|<all>|1000000|rand int|25|151.13|167.94|{color:green}11.1%{color}|
|random|log|<all>|1000000|rand int|50|152.51|162.23|{color:green}6.4%{color}|
|random|log|<all>|1000000|rand int|100|130.54|145.04|{color:green}11.1%{color}|
|random|log|<all>|1000000|rand int|500|108.38|43.74|{color:red}-59.6%{color}|
|random|log|<all>|1000000|rand int|1000|98.27|63.56|{color:red}-35.3%{color}|
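
For anyone checking the numbers, the "Pct change" column appears to be the relative difference between the two QPS columns. The sketch below reproduces it under that assumption; the formula is inferred from the table rows, not taken from sortBench.py, and the class and method names are made up for the example.

```java
import java.util.Locale;

// Hypothetical helper; the formula is inferred from the table, not from sortBench.py.
class PctChangeSketch {

    /** Relative change of the new QPS against the old QPS, in percent. */
    static double pctChange(double qpsOld, double qpsNew) {
        return (qpsNew - qpsOld) / qpsOld * 100.0;
    }

    /** Formats the change with one decimal place, like the table column. */
    static String format(double qpsOld, double qpsNew) {
        return String.format(Locale.ROOT, "%.1f%%", pctChange(qpsOld, qpsNew));
    }

    public static void main(String[] args) {
        System.out.println(format(98.47, 104.60)); // wiki, title, top 10: prints 6.2%
        System.out.println(format(108.38, 43.74)); // random, rand int, top 500: prints -59.6%
    }
}
```

A positive value means the multi-PQ run ("QPS new") was faster than the single-PQ run ("QPS old").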


> Explore performance of multi-PQ vs single-PQ sorting API
> --------------------------------------------------------
>
> Key: LUCENE-1997
> URL: https://issues.apache.org/jira/browse/LUCENE-1997
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 2.9
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Attachments: LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch
>
>
