
Mailing List Archive: Lucene: Java-Dev

[jira] [Comment Edited] (LUCENE-4283) Support more frequent skip with Block Postings Format



jira at apache

Aug 3, 2012, 12:25 PM


[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428311#comment-13428311 ]

Han Jiang edited comment on LUCENE-4283 at 8/3/12 7:24 PM:
-----------------------------------------------------------

bq. Can't we call skipWriter.bufferSkip every skipInterval docs (and pass it lastDocID, etc.)? Then it can write the skip point immediately.
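Hmm, actually, no. We can't know the final df while we are buffering skip data, so we might end up saving extra skip data inside the vInt block. For example, with df=128+33 and interval=32, a skip point would be buffered at doc 160, which falls inside the trailing 33-doc vInt block.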
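To make the df=128+33 example concrete, here is a small sketch that buffers a skip point every skipInterval docs without knowing the final df, then reports which of those points land in the vInt-encoded tail. It assumes blockSize=128 (full blocks are packed, leftover docs are vInt-encoded); the class and method names are hypothetical and are not Lucene's skip writer API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not the real BlockPostingsFormat skip writer. */
public class SkipBufferingSketch {

    /** Skip points a writer would buffer if it called bufferSkip every skipInterval docs. */
    static List<Integer> bufferedSkipPoints(int df, int blockSize, int skipInterval) {
        List<Integer> points = new ArrayList<>();
        for (int docCount = 1; docCount <= df; docCount++) {
            // The writer cannot tell yet whether more docs will follow,
            // so it records a point at every multiple of skipInterval.
            if (docCount % skipInterval == 0) {
                points.add(docCount);
            }
        }
        return points;
    }

    /** The subset of buffered points that fall after the last full packed block. */
    static List<Integer> skipPointsInsideVIntTail(int df, int blockSize, int skipInterval) {
        int packedDocs = (df / blockSize) * blockSize; // docs stored in full packed blocks
        List<Integer> tail = new ArrayList<>();
        for (int p : bufferedSkipPoints(df, blockSize, skipInterval)) {
            if (p > packedDocs) {
                tail.add(p); // this point lands inside the vInt-encoded tail
            }
        }
        return tail;
    }

    public static void main(String[] args) {
        int df = 128 + 33;
        System.out.println("buffered skip points: " + bufferedSkipPoints(df, 128, 32));
        System.out.println("inside vInt tail: " + skipPointsInsideVIntTail(df, 128, 32));
    }
}
```

Running it prints `[32, 64, 96, 128, 160]` for the buffered points and `[160]` for the ones inside the vInt tail, i.e. exactly the extra skip data the writer could not have avoided without knowing df up front.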

bq. Also, in BlockPostingsReader, why do we need a separate docBufferOffset? Can't we just set docBufferUpto to wherever (32, 64, 96) we had skipped to within the block?
Yes, you're right! I'll clean up that code.
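The suggestion above amounts to keeping a single cursor into the decoded block: when the skipper lands at an in-block position (32, 64 or 96), the reader just moves docBufferUpto there and scans forward. A minimal sketch, assuming the block has already been decoded into absolute doc IDs; the class and method names are hypothetical, not the actual BlockPostingsReader code.

```java
/** Hypothetical sketch of in-block skipping with one cursor (no docBufferOffset). */
public class InBlockSkipSketch {
    static final int BLOCK_SIZE = 128;

    final int[] docBuffer;  // decoded doc IDs of the current block (assumed absolute)
    int docBufferUpto;      // index of the next doc to read

    InBlockSkipSketch(int[] decodedBlock) {
        this.docBuffer = decodedBlock;
        this.docBufferUpto = 0;
    }

    /** The skipper reports how many docs of this block were skipped (0, 32, 64 or 96). */
    void skipWithinBlock(int docsSkippedInBlock) {
        docBufferUpto = docsSkippedInBlock; // move the single cursor; no extra offset field
    }

    /** Scan from the cursor to the first doc >= target; returns -1 if the block is exhausted. */
    int advance(int target) {
        while (docBufferUpto < docBuffer.length) {
            int doc = docBuffer[docBufferUpto++];
            if (doc >= target) {
                return doc;
            }
        }
        return -1;
    }

    int position() {
        return docBufferUpto;
    }

    public static void main(String[] args) {
        int[] block = new int[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE; i++) {
            block[i] = i * 3; // doc IDs 0, 3, 6, ...
        }
        InBlockSkipSketch reader = new InBlockSkipSketch(block);
        reader.skipWithinBlock(64);              // skip data says the first 64 docs can be skipped
        System.out.println(reader.advance(200)); // scans indexes 64..67 and prints 201
    }
}
```

Because the skip position and the read position are the same index, there is no second field to keep in sync when the next block is loaded.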

was (Author: billy):
bq. Can't we call skipWriter.bufferSkip every skipInterval docs (and pass it lastDocID, etc.)? Then it can write the skip point immediately.
Hmm, actually, no. We can't predict the df when buffering skip data, so we may save extra skip data for the vInt block. For example, df=128+33 and interval=32.

bq. Also, in BlockPostingsReader, why do we need a separate docBufferOffset? Can't we just set docBufferUpto to wherever (32, 64, 96) we had skipped to within the block?
Yes, you're right! I'll clean up that code.

> Support more frequent skip with Block Postings Format
> -----------------------------------------------------
>
> Key: LUCENE-4283
> URL: https://issues.apache.org/jira/browse/LUCENE-4283
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Han Jiang
> Priority: Minor
> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch
>
>
> This change works on the new bulk branch.
> Currently, our BlockPostingsFormat only supports skipInterval==blockSize. Every time the skipper reaches the last level-0 skip point, we have to decode a whole block to read the doc/freq data. Also, a higher-level skip list is only created for terms with df>blockSize^k, which means that for most terms skipping is just a linear scan. If we increase the current blockSize for better bulk I/O performance, the current skip setting will become a bottleneck.
> For ForPF, the encoded block can easily be split if we set skipInterval=32*k.
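One way to see why skipInterval=32*k splits a ForPF block cleanly: if the 128 values of a block are stored with a fixed bitsPerValue and packed back to back into 32-bit ints (an assumed FOR-style layout, not necessarily ForPF's exact on-disk format), then value number 32*k starts at bit 32*k*bitsPerValue, which is always an int boundary. The sketch below checks this for every width from 1 to 32; all names are hypothetical.

```java
/** Sketch under an assumed layout: 128 values of bitsPerValue bits packed into 32-bit ints. */
public class ForSplitSketch {
    static final int BLOCK_SIZE = 128;

    /** Bit position where value number valueIndex starts inside the packed block. */
    static long startBit(int valueIndex, int bitsPerValue) {
        return (long) valueIndex * bitsPerValue;
    }

    /** True if value valueIndex starts exactly on a 32-bit int boundary. */
    static boolean startsOnIntBoundary(int valueIndex, int bitsPerValue) {
        return startBit(valueIndex, bitsPerValue) % 32 == 0;
    }

    /** True if every in-block skip point (multiples of skipInterval) is int-aligned for widths 1..32. */
    static boolean alignedForAllWidths(int skipInterval) {
        for (int bits = 1; bits <= 32; bits++) {
            for (int idx = skipInterval; idx < BLOCK_SIZE; idx += skipInterval) {
                if (!startsOnIntBoundary(idx, bits)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 32 values of b bits occupy exactly b ints, so every multiple of 32 is aligned.
        System.out.println("skipInterval=32: " + alignedForAllWidths(32));
        System.out.println("skipInterval=64: " + alignedForAllWidths(64));
        // 16 values of an odd width end mid-int (e.g. 16 * 3 = 48 bits).
        System.out.println("skipInterval=16: " + alignedForAllWidths(16));
        // Value 32 at 5 bits per value starts at int 5 of the packed block.
        System.out.println("int offset of value 32 at 5 bits: " + startBit(32, 5) / 32);
    }
}
```

Under this layout a reader can start decoding at int offset k*bitsPerValue and decode just one 32-value chunk instead of the whole 128-value block.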
