Mailing List Archive: kinosearch: discuss
Re: Queries with large number of hits.
 

marvin at rectangular

Sep 14, 2008, 4:36 PM



On Sep 13, 2008, at 1:56 PM, Dan wrote:

> So now I have made claims... :)
> I'll try to give more details.

In my book, benchmarking claims presented without code, corpus, stats,
raw data, and detailed methodological descriptions qualify as
"anecdotal evidence". If you have a scientific background, you know
what that means: not to be ignored, but requiring a high degree of
skepticism and not particularly useful.

> So as you can see this whole "test" is pretty simple with many
> possible holes to try and get this Apples Vs Oranges test running.

KinoSearch is a low-level engine analogous to Lucene; Solr is a higher-
level library built on top of Lucene that does a lot of extra stuff,
including copious caching.

A comparison of Lucene to KinoSearch would be more germane from a
development standpoint. By using Solr rather than Lucene, you've
polluted the experiment with an extra layer of variables. I actually
think that testing with all of Solr's default caching mechanisms *on*
would be more interesting in a sense than what we've gotten from you
so far. It wouldn't be helpful for development in terms of
identifying optimization opportunities within KS, but it might be more
interesting for decision makers.

> Is there anything I can do to make these searches perform better?

There are a couple of known issues on the todo list that affect
search speed. One is a bugfix (SegPList_Skip_To had to be temporarily
disabled due to corrupt .skip files), and the other is a design flaw,
described in <http://www.mail-archive.com/java-dev [at] lucene/msg15825.html>.
Additionally, implementing the PForDelta compression algorithm for
postings should speed up searching, but I'd planned to put that off.
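
For readers unfamiliar with why a disabled skip-to hurts queries that
match many documents, here is a minimal sketch of posting-list skipping.
It is an illustration only and not KinoSearch's code: the PostingList
class, its skip_interval parameter, and the intersect helper are
hypothetical names, and the in-memory layout is an assumption made for
brevity.

```python
import bisect


class PostingList:
    """Sorted doc ids plus a sparse skip table (hypothetical layout)."""

    def __init__(self, doc_ids, skip_interval=4):
        self.doc_ids = doc_ids
        self.skip_interval = skip_interval
        # Every skip_interval-th doc id, used to jump ahead quickly.
        self.skips = doc_ids[::skip_interval]
        self.pos = 0  # index of the current doc id

    def skip_to(self, target):
        """Advance to the first doc id >= target; return it, or None."""
        start = self.pos
        # Last skip entry whose doc id is <= target marks a safe jump point.
        block = bisect.bisect_right(self.skips, target) - 1
        if block > 0:
            start = max(start, block * self.skip_interval)
        # Short linear scan inside the block.
        i = start
        while i < len(self.doc_ids) and self.doc_ids[i] < target:
            i += 1
        self.pos = i
        return self.doc_ids[i] if i < len(self.doc_ids) else None


def intersect(a, b):
    """Doc ids present in both posting lists, leapfrogging with skip_to."""
    hits = []
    doc_a = a.skip_to(0)
    doc_b = b.skip_to(0)
    while doc_a is not None and doc_b is not None:
        if doc_a == doc_b:
            hits.append(doc_a)
            doc_a = a.skip_to(doc_a + 1)
            doc_b = b.skip_to(doc_b + 1)
        elif doc_a < doc_b:
            doc_a = a.skip_to(doc_b)
        else:
            doc_b = b.skip_to(doc_a)
    return hits
```

Without the skip table, every skip_to call degrades to a linear walk
from the current position, which is the cost that grows with the number
of hits.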

However, measuring progress on those issues using a closed source
benchmark with "many possible holes" would be foolish. If we're going
to do benchmarking at all, we're going to do it right:
<http://www.rectangular.com/kinosearch/benchmarks.html>.
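
As a rough sketch of the kind of reporting asked for above (code, raw
data, and stats), a timing harness might look like the following. It is
not the harness from the linked benchmarks page; run_benchmark, the
search callable, and the repeats parameter are hypothetical names chosen
for illustration.

```python
import statistics
import time


def run_benchmark(search, queries, repeats=5):
    """Time search(query) repeatedly; return raw samples and a summary."""
    raw = []
    for query in queries:
        for run in range(repeats):
            start = time.perf_counter()
            search(query)
            elapsed = time.perf_counter() - start
            # Keep every sample so others can re-analyze the raw data.
            raw.append({"query": query, "run": run, "seconds": elapsed})

    times = [sample["seconds"] for sample in raw]
    summary = {
        "n": len(times),
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
        "min": min(times),
        "max": max(times),
    }
    return raw, summary
```

Publishing the raw list alongside the summary, together with the corpus
and the exact queries, is what lets someone else reproduce or dispute
the numbers.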

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch

Subject User Time
Queries with large number of hits. dmarkham at gmail Sep 13, 2008, 1:56 PM
    Re: Queries with large number of hits. henka at cityweb Sep 14, 2008, 10:41 AM
        Re: Queries with large number of hits. dmarkham at gmail Sep 14, 2008, 12:12 PM
    Re: Queries with large number of hits. marvin at rectangular Sep 14, 2008, 4:36 PM
    Re: Queries with large number of hits. dmarkham at gmail Sep 14, 2008, 6:05 PM
    Re: Queries with large number of hits. nate at verse Sep 14, 2008, 10:02 PM
    Re: Queries with large number of hits. dmarkham at gmail Sep 14, 2008, 10:55 PM
    Re: Queries with large number of hits. marvin at rectangular Sep 16, 2008, 11:36 PM
    Re: Queries with large number of hits. dmarkham at gmail Sep 17, 2008, 11:23 AM
    Re: Queries with large number of hits. nate at verse Sep 17, 2008, 1:16 PM
    Re: Queries with large number of hits. marvin at rectangular Sep 18, 2008, 9:25 PM
    Re: Queries with large number of hits. nate at verse Sep 19, 2008, 11:25 AM
    Re: Queries with large number of hits. marvin at rectangular Sep 19, 2008, 5:28 PM
    Re: Queries with large number of hits. marvin at rectangular Sep 19, 2008, 7:14 PM
    Re: Queries with large number of hits. dmarkham at gmail Sep 19, 2008, 10:37 PM
