
markrmiller at gmail
Nov 2, 2009, 3:34 PM
Post #85 of 110
(782 views)
Permalink
|
|
Re: [jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[In reply to]
|
|
Yeah, I was half joking about the Latin - I'm sure you do know what it means. But you still chose to reword it as "the" concern. And so I chose to flaunt my ridiculously miniscule grasp of high school Latin :) - Mark http://www.lucidimagination.com (mobile) On Nov 2, 2009, at 6:27 PM, Jake Mannix <jake.mannix [at] gmail> wrote: > Right, that's my common sense saying that I'm guessing that worrying > about memory usage of the order of numSegments * numResultsReturned > is rarely going to be an issue. > > I think I actually took latin once, so I may have even known what > "eg" literally meant back at some point in prehistory. I just have > seen how these discussions can catch on one particular thing that is > brought up, and then derailed into long banterings back and forth on > that topic like it was the most important in the world for ever (for > example, how I'm highlighting this one parenthetical remark which > Mike made), losing what the actual consensus and disagreement was. > > As far as I could see, in general, for most common use cases, the > performance was so comparable as to be changeable by wind patterns, > and ram usage is such that in comparison to how much memory a lucene > system which is asking for thousands of hits to be returned across > hundred-segment indices (for typical LogMergePolicy, this would only > happen once you get up toward a billion documents) is probably > already so large that worrying about another 10KB here and there is > really going to get washed away in the same way that the leading > order complexity is totally dominating the CPU time. > > The real thing to focus on is the API itself, for maximal ease of > use for the next possibly 3 years (if the timeframe of 2.x is any > indicator). > > -jake > > On Mon, Nov 2, 2009 at 3:16 PM, Mark Miller <markrmiller [at] gmail> > wrote: > That I don't know. Probably fewer than more, but that's just me > guessing based on common sense :) > > > - Mark > > http://www.lucidimagination.com (mobile) > > On Nov 2, 2009, at 6:13 PM, Jake Mannix <jake.mannix [at] gmail> wrote: > >> Ok, and how many of those users are also running on indices with >> hundreds of segments? >> >> -jake >> >> On Mon, Nov 2, 2009 at 3:10 PM, Mark Miller <markrmiller [at] gmail> >> wrote: >> There are plenty of Lucene users that do go 1000 in. We've been >> calling it "deep paging" at LI. I like that name :) >> >> - Mark >> >> http://www.lucidimagination.com (mobile) >> >> >> On Nov 2, 2009, at 6:04 PM, "Jake Mannix (JIRA)" <jira [at] apache> >> wrote: >> >> >> [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772741#action_12772741 >> ] >> >> Jake Mannix commented on LUCENE-1997: >> ------------------------------------- >> >> The current concern is to do with the memory? I'm more concerned >> with the weird "java ghosts" that are flying around, sometimes >> swaying results by 20-40%... the memory could only be an issue on >> a setup with hundreds of segments and sorting the top 1000 values >> (do we really try to optimize for this performance case?). In the >> normal case (no more than tens of segments, and the top 10 or 100 >> hits), we're talking about what, 100-1000 PQ entries? >> >> Explore performance of multi-PQ vs single-PQ sorting API >> -------------------------------------------------------- >> >> Key: LUCENE-1997 >> URL: https://issues.apache.org/jira/browse/LUCENE-1997 >> Project: Lucene - Java >> Issue Type: Improvement >> Components: Search >> Affects Versions: 2.9 >> Reporter: Michael McCandless >> Assignee: Michael McCandless >> Attachments: LUCENE-1997.patch, LUCENE-1997.patch, >> LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, >> LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, >> LUCENE-1997.patch >> >> >> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java- >> dev, >> where a simpler (non-segment-based) comparator API is proposed that >> gathers results into multiple PQs (one per segment) and then merges >> them in the end. >> I started from John's multi-PQ code and worked it into >> contrib/benchmark so that we could run perf tests. Then I generified >> the Python script I use for running search benchmarks (in >> contrib/benchmark/sortBench.py). >> The script first creates indexes with 1M docs (based on >> SortableSingleDocSource, and based on wikipedia, if available). Then >> it runs various combinations: >> * Index with 20 balanced segments vs index with the "normal" log >> segment size >> * Queries with different numbers of hits (only for wikipedia index) >> * Different top N >> * Different sorts (by title, for wikipedia, and by random string, >> random int, and country for the random index) >> For each test, 7 search rounds are run and the best QPS is kept. The >> script runs singlePQ then multiPQ, and records the resulting best QPS >> for each and produces table (in Jira format) as output. >> >> -- >> This message is automatically generated by JIRA. >> - >> You can reply to this email to add a comment to the issue online. >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene >> For additional commands, e-mail: java-dev-help [at] lucene >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene >> For additional commands, e-mail: java-dev-help [at] lucene >> >> >
|