
ab at getopt
Apr 26, 2012, 7:21 AM
Views: 318
Permalink
|
On 26/04/2012 09:49, Uwe Schindler wrote: > There are possibilities to truncate those lists, but this is not implemented > in Lucene core. The main problem with Lucene's segmented index structure is, > that you cannot early exit, because the very last document in the posting > list could be one with a very high score. Techniques like index pruning or > index sorting can optimize all this, but need index preprocessing and loose > updatability of the index while in use. > > The Top-N result collection is optimized in Lucene (item #3), to throw away > all collected documents, where the score is never be able to get into the > top-n priority queue (too small score, once the queue is full). But the > score has to be calculated. This paper presents yet another option: keep a value that represents max. impact of postings in a block and then skip whole blocks if the max impact is too low to get docs within the block in the queue: http://cis.poly.edu/suel/papers/bmw.pdf Implementing this in Lucene would require changing the iterators so that they can take a threshold value, i.e. the current lowest score in the queue, so that nextDoc() and advance() could skip whole blocks when their max impact is lower than the current lowest score. -- Best regards, Andrzej Bialecki <>< ___. ___ ___ ___ _ _ __________________________________ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe [at] lucene For additional commands, e-mail: java-user-help [at] lucene
|