
marvin at rectangular
Mar 7, 2007, 10:05 PM
Post #8 of 11
On Mar 7, 2007, at 8:36 PM, Chris Nandor wrote:

> I was updating my searcher code, and previously I had been setting the
> offset passed to seek() using $hits->total_hits.

I don't understand what the use is of this, unless it's to naively retrieve the last (worst) matches.

FYI, in KS 0.15 and earlier, calling total_hits before seek() actually triggers a call to seek(0, 100) internally. It's not possible to know how many documents a query matches without running the whole scoring routine.

Note that there's not much difference between calling seek(0, 10) and seek(0, 100). The only change is the size of a priority queue; the cost of matching and scoring remains the same.

KS 0.15 also performed unnecessary seeks in some cases -- for instance, calling seek(0, 10) when you've already called seek(0, 100) shouldn't be necessary, but KS was doing that if you called seek(0, 10) after total_hits(). This has changed in 0.20. Credit to Henry for identifying the issue.

> But now I can't get that
> before I call seek(),

While you were "able" to get it before, you still had doubled costs.

> and as a result, I was passing num_wanted => 0 to
> search(). This bug in my code causing a bus error in KS.

Heh. I'll go fix that.

> That said, I wonder if 0 or something similar might be a way to denote
> "send everything."

There are memory and performance implications for setting a large num_wanted. Hits are collected in a priority queue, and the size of the queue is determined by num_wanted.

> My workaround now is to send $reader->num_docs instead,
> which is fine too, I think.

That will work -- sort of. If your index is large, that's gonna be a huge priority queue. Each element in the queue is either a ScoreDoc (16 bytes) or, when sorting, a FieldDoc (20 bytes presently, and probably about to grow to take in an arbitrary string).

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
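
To make the priority-queue point in the post concrete, here is a rough sketch in Python rather than KinoSearch's own code. The names collect_top_hits and queue_bytes are hypothetical and are not part of the KS API; the only figures taken from the post are the 16-byte ScoreDoc and 20-byte FieldDoc sizes. It assumes the usual bounded-heap approach to collecting hits: every match is scored and counted no matter what, and num_wanted only controls how many entries are retained.

```python
import heapq

SCORE_DOC_BYTES = 16  # per-element size quoted in the post
FIELD_DOC_BYTES = 20  # per-element size when sorting, per the post


def collect_top_hits(scored_docs, num_wanted):
    """Return (total_hits, top hits) from an iterable of (doc_id, score).

    Illustrative only: a bounded min-heap standing in for the hit queue.
    """
    queue = []  # min-heap of (score, doc_id); never grows past num_wanted
    total_hits = 0
    for doc_id, score in scored_docs:
        total_hits += 1  # every match is scored, whatever num_wanted is
        if num_wanted <= 0:
            continue  # nothing to retain, but the count is still valid
        if len(queue) < num_wanted:
            heapq.heappush(queue, (score, doc_id))
        elif score > queue[0][0]:
            # New hit beats the weakest retained hit: swap it in.
            heapq.heapreplace(queue, (score, doc_id))
    best_first = sorted(queue, reverse=True)
    return total_hits, [(doc_id, score) for score, doc_id in best_first]


def queue_bytes(num_wanted, sorting=False):
    """Estimate queue payload size from the per-element sizes in the post."""
    per_element = FIELD_DOC_BYTES if sorting else SCORE_DOC_BYTES
    return num_wanted * per_element


matches = [(1, 0.2), (2, 0.9), (3, 0.5), (4, 0.7)]
total, top = collect_top_hits(matches, num_wanted=2)
```

In this sketch the total is only known once the loop has visited every match, which is why it cannot be had before the search runs. Passing the document count as num_wanted makes the queue scale with the index: queue_bytes(1_000_000) comes to 16,000,000 bytes of elements alone, before any per-object overhead.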