Mailing List Archive: Lucene: Java-Dev

[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

 

 



jira at apache

Jan 3, 2007, 6:40 PM

Post #1 of 7 (612 views)
[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

[ https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462117 ]

Grant Ingersoll commented on LUCENE-675:
----------------------------------------

Doron,

When I apply your patch, I get strange errors. It seems to apply cleanly, but each of the new files (for instance, byTask.stats.Report.java) ends up containing its entire contents twice, which causes duplicate class errors. This happens for all the files in the byTask package. The changes to the other files apply cleanly.

I applied the patch as: patch -p0 -i <patch file> as I always do on a clean version.

I suspect that your last comment may be at the root of the issue. Can you try applying this again to a clean version and see if you still have issues or whether it is something I am missing? Can you regenerate this patch, perhaps using a command line tool? Looking at the patch file, I am not sure what the issue is.

Otherwise, based on the documentation, this sounds really interesting and useful. Based on some of your other patches, I assume you are using this to do benchmarking, no?

Thanks,
Grant

> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
> Key: LUCENE-675
> URL: https://issues.apache.org/jira/browse/LUCENE-675
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Andrzej Bialecki
> Assigned To: Grant Ingersoll
> Priority: Minor
> Attachments: benchmark.byTask.patch, benchmark.patch, BenchmarkingIndexer.pm, extract_reuters.plx, LuceneBenchmark.java, LuceneIndexer.java, taskBenchmark.zip, timedata.zip, tiny.alg, tiny.properties
>
>
> We need an objective way to measure the performance of Lucene, both indexing and querying, on a known corpus. This issue is intended to collect comments and patches implementing a suite of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is the original Reuters collection, available from http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz or http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz. I propose to use this corpus as a base for benchmarks. The benchmarking suite could automatically retrieve it from known locations, and cache it locally.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Jan 4, 2007, 11:05 AM

Post #2 of 7 (569 views)
[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462287 ]

Doron Cohen commented on LUCENE-675:
------------------------------------

Grant, thanks for trying this out - I will update the patch shortly.
I am using this for benchmarking - it is quite easy to add new stuff - and in fact I added some stuff lately but did not update the patch here because I wasn't sure whether others were interested.
I will verify what I have with svn head and pack it here as an updated patch.
Regards,
Doron



jira at apache

Jan 4, 2007, 8:43 PM

Post #3 of 7 (568 views)
[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462402 ]

Doron Cohen commented on LUCENE-675:
------------------------------------

This update of the byTask package includes:
- the ability to tailor a perf test programmatically (without an .alg file).
- maintaining both the "algorithm" and the run-properties in a single .alg file - this is easier to maintain in my opinion.
- some code cleanup.
- build.xml now has a single "task related" target: run-task. An Ant property is used to invoke other .alg files.
- updated documentation (package docs under byTask).

To apply the patch from the trunk dir: patch -p0 -i <byTask.2.patch.txt>
To test it, cd to contrib/benchmark and type: ant run-task

Grant, I noticed that the patch file contains EOL characters - Unix/DOS thing I guess.
But 'patch' works cleanly for me either with these characters or without them, so I am leaving these characters there.
I hope this patch applies cleanly for you.




jira at apache

Jan 10, 2007, 7:22 PM

Post #4 of 7 (542 views)
[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463792 ]

Grant Ingersoll commented on LUCENE-675:
----------------------------------------

Hey Doron,

Your patch uses JDK 1.5. I am assuming it is safe to use Class.getName in place of Class.getSimpleName, right? I think once I do that plus change the String.contains calls to String.indexOf it should all be fine, right? I have it compiling and running, so that is a good sign. I will look to commit soon.

-Grant



jira at apache

Jan 10, 2007, 11:49 PM

Post #5 of 7 (541 views)
[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463830 ]

Doron Cohen commented on LUCENE-675:
------------------------------------

Oops... I had the impression that compiling with compliance level 1.4 was sufficient to prevent this, but I guess I need to read again what that compliance level setting guarantees exactly.

Anyhow, there are 3 things that require 1.5:
- Boolean.parseBoolean() --> Boolean.valueOf().booleanValue()
- String.contains() --> indexOf()
- Class.getSimpleName() --> ?

Changing Class.getSimpleName() to Class.getName() would not be very nice - the printed queries and task names would be quite ugly. To fix that I added a method simpleName(Class) to byTask.util.Format. I am attaching an updated patch - byTask.jre1.4.patch.txt - that includes this method and removes the Java 1.5 dependency.
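
For illustration, the three substitutions listed above might look like the following when only the 1.4 class library is available. This is a sketch under stated assumptions, not the code from byTask.jre1.4.patch.txt: the class name Jdk14Compat and all method bodies are hypothetical, and the real Format.simpleName(Class) may behave differently (for a nested class, this version returns "Outer$Inner" rather than "Inner").

```java
// Hypothetical helper class; not part of the Lucene benchmark package.
// Uses only methods that already exist in the JDK 1.4 class library.
class Jdk14Compat {

    // Stand-in for Boolean.parseBoolean(String), which was added in 1.5.
    static boolean parseBoolean(String s) {
        return Boolean.valueOf(s).booleanValue();
    }

    // Stand-in for String.contains(CharSequence), which was added in 1.5.
    static boolean contains(String text, String sub) {
        return text.indexOf(sub) >= 0;
    }

    // Stand-in for Class.getSimpleName(), which was added in 1.5.
    // Assumption: stripping the package prefix is enough for readable output.
    static String simpleName(Class cls) {
        String name = cls.getName();
        int lastDot = name.lastIndexOf('.');
        return lastDot < 0 ? name : name.substring(lastDot + 1);
    }

    public static void main(String[] args) {
        System.out.println(parseBoolean("true"));
        System.out.println(contains("benchmark.byTask", "byTask"));
        System.out.println(simpleName(String.class));
    }
}
```

Keeping the substitutes in one utility class means call sites change in a single, mechanical way, and the helpers can be dropped once 1.5 becomes the minimum.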

Thanks for catching this!
Doron



hossman_lucene at fucit

Jan 11, 2007, 12:51 PM

Post #6 of 7 (553 views)
Re: [jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene [In reply to]

: Oops... I had the impression that compiling with compliance level 1.4 is
: sufficient to prevent this, but guess I need to read again what that
: compliance level setting guarantees exactly.

NOTE: see LUCENE-718 for an explanation of your problem, and a possible
solution I've been toying with.



-Hoss




jira at apache

Jan 12, 2007, 8:15 PM

Post #7 of 7 (527 views)
[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464410 ]

Grant Ingersoll commented on LUCENE-675:
----------------------------------------

Doron,

I have committed your additions. This truly is great stuff. Thank you so much for contributing. The documentation (code and package level) is well done, and the output is very readable. The alg language is a bit cryptic and takes a little deciphering, but you do document it quite nicely. I like the extensibility, and I think it will make it easier for people to contribute benchmarking capabilities.

I would love to see someone mod the reporting mechanism in the future to allow for printing info to something other than System.out, as I know people have expressed interest in being able to slurp the output into Excel or similar number crunching tools. This could also lead to the possibility of running some of the algorithms nightly and then integrating with JUnitPerf or some other performance unit testing approach.
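
As a rough illustration of that idea, a report could be written as comma-separated values to any PrintStream, so the same code can target System.out or a file that Excel can open. This is only a sketch: the class CsvReportSketch, its methods, and the sample column names and values are all made up for the example and are not part of the byTask reporting code.

```java
import java.io.PrintStream;

// Hypothetical sketch; none of these names exist in the benchmark package.
class CsvReportSketch {

    // Quote a field only when it contains a comma, a quote, or a newline;
    // embedded quotes are doubled, as spreadsheet tools generally expect.
    static String escape(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join one row of fields into a single CSV line (no trailing newline).
    static String formatRow(String[] fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(escape(fields[i]));
        }
        return line.toString();
    }

    // Write a header and rows to whatever destination the caller supplies.
    static void report(PrintStream out, String[] header, String[][] rows) {
        out.println(formatRow(header));
        for (int i = 0; i < rows.length; i++) {
            out.println(formatRow(rows[i]));
        }
    }

    public static void main(String[] args) {
        // Made-up sample values, purely to show the output shape.
        String[] header = {"task", "runs", "elapsedSec"};
        String[][] rows = {{"exampleTask, batch 1", "3", "1.25"}};
        report(System.out, header, rows);
    }
}
```

Passing the destination in as a parameter is the main point: a nightly run could hand report() a PrintStream wrapping a FileOutputStream, while interactive runs keep using System.out.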

We may want to consider deprecating the other benchmarking stuff, although, I suppose it can't hurt to have multiple opinions in this area.

At any rate, this is very much appreciated. I would encourage everyone who is interested in benchmarking to take a look and provide feedback. I'm going to mark this bug as finished for now as I think we have a good baseline for benchmarking at this point.

Thanks again,
Grant




