
rcmuir at gmail
Jan 11, 2010, 10:59 AM
If you look at the code for these indexers: why did they pre-process the corpus into the specific format Sphinx needs and then run 'time indexer', while for the Lucene benchmark, parsing was part of their indexer and included in the benchmarking time?

On Mon, Jan 11, 2010 at 1:52 PM, Marvin Humphrey <marvin [at] rectangular> wrote:
> On Mon, Jan 11, 2010 at 10:20:04AM -0800, Ashwin Jayaprakash wrote:
>> The results here might prove useful -
>> http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/
>
> They're totally worthless.
>
> One key design decision I made was not to change any numerical tuning
> parameters. I really wanted to test “Out of the Box” performance to
> simulate the common developer scenario. Plus, it takes forever to optimize
> parameters fairly across multiple platforms and different data sets esp.
> for an over-the-weekend benchmark (see disclaimer in the Conclusion
> section).
>
> If you're not going to tune your installation, you deserve the crappy
> performance you'll get.
>
> Exposing shoddy, weekender foolishness like this is why I want ORP to do
> scientifically rigorous performance benchmarking as well as relevance
> benchmarking.
>
> Marvin Humphrey
>

--
Robert Muir
rcmuir [at] gmail
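The fairness problem raised above is that one engine's timing excluded corpus parsing while the other's included it. A minimal sketch of one way to avoid that, assuming each engine can be driven through a parse step and an index step: time the two phases separately and report both, so every engine is compared on the same basis (parse only, index only, or the sum). This is not the code used in the blog benchmark; `timed_phases`, `toy_parse` and `toy_index` are hypothetical stand-ins, not Lucene or Sphinx APIs.

```python
import time
from typing import Callable, Dict, Iterable, List


def timed_phases(raw_docs: Iterable[str],
                 parse: Callable[[str], List[str]],
                 index: Callable[[List[List[str]]], object]) -> Dict[str, float]:
    """Time parsing and indexing as separate phases (hypothetical helper)."""
    start = time.perf_counter()
    parsed = [parse(doc) for doc in raw_docs]   # phase 1: corpus parsing
    parse_s = time.perf_counter() - start

    start = time.perf_counter()
    index(parsed)                               # phase 2: index building only
    index_s = time.perf_counter() - start

    return {"parse_s": parse_s, "index_s": index_s, "total_s": parse_s + index_s}


def toy_parse(doc: str) -> List[str]:
    # Hypothetical parser: lowercase and split on whitespace.
    return doc.lower().split()


def toy_index(docs: List[List[str]]) -> Dict[str, List[int]]:
    # Hypothetical indexer: map each term to the ids of the documents it
    # occurs in (one entry per occurrence).
    postings: Dict[str, List[int]] = {}
    for doc_id, terms in enumerate(docs):
        for term in terms:
            postings.setdefault(term, []).append(doc_id)
    return postings


if __name__ == "__main__":
    corpus = ["Lucene indexes documents", "Sphinx indexes documents too"] * 1000
    print(timed_phases(corpus, toy_parse, toy_index))
```

If one engine is fed a pre-parsed corpus, its number should be compared against the other engine's `index_s`, not its `total_s`.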