marvin at rectangular
Sep 14, 2008, 4:36 PM
On Sep 13, 2008, at 1:56 PM, Dan wrote:
> So now I have made claims... :)
> I'll try to give more details.
In my book, benchmarking claims presented without code, corpus, stats,
raw data, and detailed methodological descriptions qualify as
"anecdotal evidence". If you have a scientific background, you know
what that means: not to be ignored, but requiring a high degree of
skepticism and not particularly useful.
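To make that concrete, here is a minimal sketch of a harness that keeps the raw per-query timings and derives summary stats from them, so both can be published alongside the code and corpus. The names run_benchmark, summarize, and search_fn are hypothetical, not part of KinoSearch, Lucene, or Solr; search_fn stands in for whatever call runs a single query against the engine under test.

```python
import statistics
import time


def run_benchmark(search_fn, queries, repeats=5):
    """Time search_fn on every query, keeping all raw timings.

    search_fn is a hypothetical callable that runs one query.
    Returns a dict mapping each query to its list of timings in seconds.
    """
    raw = {}
    for query in queries:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            search_fn(query)
            timings.append(time.perf_counter() - start)
        raw[query] = timings
    return raw


def summarize(raw):
    """Reduce the raw timings to summary stats; the raw data stays intact."""
    all_times = [t for timings in raw.values() for t in timings]
    return {
        "runs": len(all_times),
        "mean": statistics.mean(all_times),
        "median": statistics.median(all_times),
        "stdev": statistics.stdev(all_times) if len(all_times) > 1 else 0.0,
        "min": min(all_times),
        "max": max(all_times),
    }
```

Publishing the output of run_benchmark as well as the output of summarize lets anyone recompute the stats or spot outliers such as cold-cache first runs.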
> So as you can see this whole "test" is pretty simple with many
> possible holes to try and get this Apples Vs Oranges test running.
KinoSearch is a low-level engine analogous to Lucene; Solr is a higher-
level library built on top of Lucene that does a lot of extra stuff,
including copious caching.
A comparison of Lucene to KinoSearch would be more germane from a
development standpoint. By using Solr rather than Lucene, you've
polluted the experiment with an extra layer of variables. I actually
think that testing with all of Solr's default caching mechanisms *on*
would be more interesting in a sense than what we've gotten from you
so far. It wouldn't be helpful for development in terms of
identifying optimization opportunities within KS, but it might be more
interesting for decision makers.
> Is there anything I can do to make these searches perform better?
There are a couple of known issues on the todo list that affect
search speed. One is a bugfix (SegPList_Skip_To had to be temporarily
disabled due to corrupt .skip files), and the other is a design flaw,
described in <http://www.mail-archive.com/java-dev [at] lucene/msg15825.html>.
Additionally, implementing the PForDelta compression algorithm
for postings should speed up searching, but I'd planned to put that off.
However, measuring progress on those issues using a closed source
benchmark with "many possible holes" would be foolish. If we're going
to do benchmarking at all, we're going to do it right:
<http://www.rectangular.com/kinosearch/benchmarks.html>
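For readers unfamiliar with PForDelta, the idea behind it can be sketched roughly as follows. This is a simplified illustration, not KinoSearch or Lucene code and not a full PForDelta implementation: real implementations bit-pack the slots and store exceptions more compactly. It assumes doc ids are sorted ascending and non-negative; encode, decode, and the bits parameter are hypothetical names.

```python
def encode(doc_ids, bits):
    """Delta-encode sorted doc ids; gaps too large for `bits` become exceptions.

    Returns (slots, exceptions): slots holds one small value per posting,
    exceptions maps a slot position to the gap that did not fit.
    """
    limit = 1 << bits
    slots, exceptions = [], {}
    previous = 0
    for position, doc_id in enumerate(doc_ids):
        gap = doc_id - previous
        previous = doc_id
        if gap < limit:
            slots.append(gap)
        else:
            slots.append(0)  # placeholder, patched from exceptions on decode
            exceptions[position] = gap
    return slots, exceptions


def decode(slots, exceptions):
    """Rebuild the doc ids by summing gaps, patching in the exceptions."""
    doc_ids, current = [], 0
    for position, gap in enumerate(slots):
        current += exceptions.get(position, gap)
        doc_ids.append(current)
    return doc_ids
```

Because most gaps fit in the small fixed width, the common case decodes in a tight loop, which is where the expected search speedup comes from.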