
Mailing List Archive: Lucene: Java-Dev

[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format

 

 



jira at apache

Aug 2, 2012, 1:15 PM

Post #1 of 9
[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format

[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427573#comment-13427573 ]

Michael McCandless commented on LUCENE-4283:
--------------------------------------------

Billy, it looks like this patch is a bit stale (it doesn't apply on the current branch)? Can you please update it? Thanks.

> Support more frequent skip with Block Postings Format
> -----------------------------------------------------
>
> Key: LUCENE-4283
> URL: https://issues.apache.org/jira/browse/LUCENE-4283
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Han Jiang
> Priority: Minor
> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch
>
>
> This change works on the new bulk branch.
> Currently, our BlockPostingsFormat only supports skipInterval==blockSize. Every time the skipper reaches the last level 0 skip point, we have to decode a whole block to read doc/freq data. Also, a higher-level skip list is created only for terms with df>blockSize^k, which means that for most terms skipping is just a linear scan. If we increase the current blockSize for better bulk I/O performance, the current skip setting will become a bottleneck.
> For ForPF, the encoded block can easily be split if we set skipInterval=32*k.
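
The df>blockSize^k condition in the description can be made concrete with a small sketch. It assumes each skip level samples every skipInterval-th entry of the level below, so another level appears each time df grows by a factor of skipInterval. `SkipLevels` and `skipLevels` are illustrative names, not Lucene API.

```java
class SkipLevels {
    // Hypothetical helper: how many skip levels a term with the given df gets,
    // assuming a fan-out of skipInterval at every level.
    static int skipLevels(int df, int skipInterval) {
        int levels = 0;
        for (long n = df; n >= skipInterval; n /= skipInterval) {
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        int[] dfs = {100, 128, 5000, 16384, 2_097_152};
        for (int interval : new int[] {128, 32}) {
            for (int df : dfs) {
                System.out.printf("skipInterval=%d df=%d levels=%d%n",
                        interval, df, skipLevels(df, interval));
            }
        }
    }
}
```

Under this assumption a term needs df >= 16384 before a second level exists with skipInterval=128, but only df >= 1024 with skipInterval=32.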

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Aug 2, 2012, 3:11 PM

Post #2 of 9
[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427667#comment-13427667 ]

Michael McCandless commented on LUCENE-4283:
--------------------------------------------

I think we shouldn't have to do our own buffering up of the skip points within one block?

Can't we call skipWriter.bufferSkip every skipInterval docs (and pass it lastDocID, etc.)? Then it can write the skip point immediately.

Also, in BlockPostingsReader, why do we need a separate docBufferOffset? Can't we just set docBufferUpto to wherever (32, 64, 96) we had skipped to within the block?



jira at apache

Aug 3, 2012, 12:25 PM

Post #3 of 9
[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428311#comment-13428311 ]

Han Jiang commented on LUCENE-4283:
-----------------------------------

bq. Can't we call skipWriter.bufferSkip every skipInterval docs (and pass it lastDocID, etc.)? Then it can write the skip point immediately.
Hmm, actually, no. We can't predict the df while buffering skip data, so we might save extra skip data for the vInt block. For example, with df=128+33 and interval=32, the skip point buffered after the 160th doc would land inside the 33-doc vInt tail.

bq. Also, in BlockPostingsReader, why do we need a separate docBufferOffset? Can't we just set docBufferUpto to wherever (32, 64, 96) we had skipped to within the block?
Yes, you're right! I'll clean up that code.
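
The df=128+33 example can be checked with a short sketch. It assumes blockSize=128, that only full blocks are packed, and that the remaining docs form the vInt tail; `SkipPointSketch` and its methods are hypothetical helpers, not code from the patch.

```java
import java.util.ArrayList;
import java.util.List;

class SkipPointSketch {
    // Doc counts at which a skip point would be buffered if one were
    // written eagerly every skipInterval docs.
    static List<Integer> eagerSkipPoints(int df, int skipInterval) {
        List<Integer> points = new ArrayList<>();
        for (int count = skipInterval; count <= df; count += skipInterval) {
            points.add(count);
        }
        return points;
    }

    // Eager skip points that fall beyond the last full block,
    // i.e. inside the vInt-encoded tail.
    static List<Integer> pointsInVIntTail(int df, int blockSize, int skipInterval) {
        int packedDocs = (df / blockSize) * blockSize;
        List<Integer> extra = new ArrayList<>();
        for (int point : eagerSkipPoints(df, skipInterval)) {
            if (point > packedDocs) {
                extra.add(point);
            }
        }
        return extra;
    }

    public static void main(String[] args) {
        int df = 128 + 33;
        System.out.println("eager skip points: " + eagerSkipPoints(df, 32));
        System.out.println("inside vInt tail:  " + pointsInVIntTail(df, 128, 32));
    }
}
```

Because df is only known once the term is finished, an eager writer cannot tell at doc 160 whether that point belongs to a full block or to the tail.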



jira at apache

Aug 4, 2012, 2:37 PM

Post #4 of 9
[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428682#comment-13428682 ]

Michael McCandless commented on LUCENE-4283:
--------------------------------------------

I added some new tasks to luceneutil (AndHighLow, OrHighLow), and also
separated tasks for Low/Med/HighTerm (and same for SpanNear/Phrase
queries) so that we can see the impact on the different queries, and
so that we actually test skipping (AndHighLow).

Then I ran a test w/ the 2nd patch (non-buggy, partial decode,
skipInterval=32):

{noformat}
Task              QPS base  StdDev base  QPS comp  StdDev comp   Pct diff
AndHighLow          631.54        10.72    101.44         0.70   -84% - -83%
AndHighMed           44.85         0.94     39.31         0.36   -14% - -9%
AndHighHigh          18.39         0.27     16.16         0.08   -13% - -10%
MedSloppyPhrase      12.15         0.14     11.27         0.30   -10% - -3%
MedSpanNear           9.11         0.10      8.58         0.10   -7% - -3%
LowSpanNear           5.05         0.03      4.78         0.03   -6% - -4%
MedPhrase             5.09         0.10      4.81         0.10   -9% - -1%
LowPhrase             7.80         0.08      7.43         0.07   -6% - -2%
HighSloppyPhrase      2.13         0.06      2.04         0.06   -10% - 1%
LowSloppyPhrase       5.28         0.11      5.09         0.15   -8% - 1%
HighTerm             22.85         0.11     22.08         0.56   -6% - 0%
LowTerm             526.19         3.56    510.53         9.14   -5% - 0%
MedTerm             138.34         0.51    134.66         3.58   -5% - 0%
HighPhrase            3.55         0.11      3.46         0.11   -8% - 3%
HighSpanNear          1.64         0.00      1.60         0.02   -3% - 0%
Fuzzy1               99.11         3.49     98.91         2.71   -6% - 6%
Fuzzy2               88.31         3.05     88.19         2.32   -6% - 6%
Respell              77.97         1.75     78.24         1.86   -4% - 5%
PKLookup            192.61         1.47    193.47         1.53   -1% - 2%
OrHighMed            25.14         1.23     25.28         1.16   -8% - 10%
OrHighHigh            9.22         0.47      9.30         0.45   -8% - 11%
OrHighLow            37.28         1.79     37.60         1.75   -8% - 10%
Wildcard             67.88         0.33     69.19         2.70   -2% - 6%
Prefix3              25.67         0.35     26.25         1.22   -3% - 8%
IntNRQ                8.85         0.02      9.27         0.98   -6% - 15%
{noformat}

I'm confused why AndHighLow got slower... this patch should have
lowered the per-skip cost.


> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch, LUCENE-4283-slow.patch, LUCENE-4283-small-interval-fully.patch, LUCENE-4283-small-interval-partially.patch



jira at apache

Aug 4, 2012, 4:05 PM

Post #5 of 9
[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428698#comment-13428698 ]

Michael McCandless commented on LUCENE-4283:
--------------------------------------------

I tested the -fully patch:
{noformat}
Task              QPS base  StdDev base  QPS comp  StdDev comp   Pct diff
AndHighLow          628.46         8.28    155.04         1.42   -75% - -74%
LowSpanNear           5.07         0.02      4.85         0.10   -6% - -2%
MedSpanNear           9.12         0.07      8.86         0.22   -5% - 0%
OrHighMed            26.16         1.15     25.53         2.65   -16% - 12%
AndHighMed           44.92         0.88     43.94         0.30   -4% - 0%
OrHighLow            38.76         1.70     37.97         4.03   -16% - 13%
OrHighHigh            9.57         0.45      9.40         1.02   -16% - 14%
HighTerm             22.88         0.13     22.83         0.95   -4% - 4%
HighSloppyPhrase      2.14         0.10      2.14         0.11   -9% - 10%
LowSloppyPhrase       5.31         0.22      5.32         0.22   -7% - 8%
LowPhrase             7.85         0.09      7.87         0.21   -3% - 3%
HighSpanNear          1.65         0.01      1.66         0.04   -2% - 3%
Respell              77.70         1.24     78.14         2.12   -3% - 4%
MedTerm             138.26         0.52    139.07         5.52   -3% - 4%
PKLookup            193.63         2.06    195.98         2.84   -1% - 3%
MedSloppyPhrase      12.15         0.34     12.33         0.48   -5% - 8%
LowTerm             525.12         4.89    534.89        14.12   -1% - 5%
Fuzzy2               87.20         2.05     89.05         3.27   -3% - 8%
Fuzzy1               97.81         2.33     99.94         3.99   -4% - 8%
AndHighHigh          18.39         0.27     19.62         0.06   4% - 8%
MedPhrase             5.09         0.11      5.52         0.33   0% - 17%
Wildcard             67.59         0.58     73.76         3.37   3% - 15%
Prefix3              25.51         0.39     29.54         1.60   7% - 23%
HighPhrase            3.55         0.12      4.13         0.33   3% - 30%
IntNRQ                8.79         0.08     10.67         1.52   3% - 40%
{noformat}

It seems like we are getting some gains for Med/HighPhrase, but AndHighLow is still way off.
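
One rough way to read these tables is to treat the Pct diff column as a worst-case to best-case range and to call a task slower or faster only when the whole range lies on one side of zero. That reading is an assumption about the report, and the rule is a heuristic rather than a statistical test; `PctDiffReader` is a hypothetical helper.

```java
class PctDiffReader {
    enum Verdict { SLOWER, FASTER, INCONCLUSIVE }

    // Expects one table row: task name, four numbers, then "<low>% - <high>%".
    static Verdict classify(String row) {
        String[] tokens = row.trim().split("\\s+");
        if (tokens.length != 8) {
            throw new IllegalArgumentException("unexpected row: " + row);
        }
        int low = Integer.parseInt(tokens[5].replace("%", ""));
        int high = Integer.parseInt(tokens[7].replace("%", ""));
        if (high < 0) {
            return Verdict.SLOWER;   // even the best case is a loss
        }
        if (low > 0) {
            return Verdict.FASTER;   // even the worst case is a gain
        }
        return Verdict.INCONCLUSIVE; // range touches or straddles zero
    }

    public static void main(String[] args) {
        String[] rows = {
            "AndHighLow 628.46 8.28 155.04 1.42 -75% - -74%",
            "HighPhrase 3.55 0.12 4.13 0.33 3% - 30%",
            "OrHighMed 26.16 1.15 25.53 2.65 -16% - 12%",
        };
        for (String row : rows) {
            System.out.println(row.split("\\s+")[0] + ": " + classify(row));
        }
    }
}
```

On the rows from the -fully run this classifies AndHighLow as slower, HighPhrase as faster, and OrHighMed as inconclusive.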



jira at apache

Aug 5, 2012, 3:42 PM

Post #6 of 9
[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428926#comment-13428926 ]

Michael McCandless commented on LUCENE-4283:
--------------------------------------------

Thanks Billy, I tweaked the cleanup patch some (removed blockInts, restored lost DEBUGs, added some nocommits, etc.) and committed.

> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch, LUCENE-4283-codes-cleanup.patch, LUCENE-4283-slow.patch, LUCENE-4283-small-interval-fully.patch, LUCENE-4283-small-interval-partially.patch



jira at apache

Aug 7, 2012, 2:27 PM

Post #7 of 9
[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430636#comment-13430636 ]

Michael McCandless commented on LUCENE-4283:
--------------------------------------------

Thanks Billy, that's a nice optimization! I think other postings formats should do the same thing...

It seems to give a small gain to the skip-heavy queries:
{noformat}
Task              QPS base  StdDev base  QPS nextskip  StdDev nextskip   Pct diff
AndHighHigh          23.87         0.09         23.56             0.19   -2% - 0%
Fuzzy2               63.37         1.07         62.59             0.86   -4% - 1%
OrHighHigh           11.67         0.08         11.53             0.35   -4% - 2%
Fuzzy1               75.44         1.02         74.59             0.74   -3% - 1%
OrHighMed            24.14         0.18         23.89             0.72   -4% - 2%
Respell              62.66         0.65         62.04             1.37   -4% - 2%
OrHighLow            27.86         0.23         27.60             0.85   -4% - 2%
HighSloppyPhrase      2.00         0.04          1.99             0.05   -5% - 3%
HighSpanNear          1.70         0.02          1.69             0.01   -2% - 1%
LowTerm             517.40         1.67        514.32             2.68   -1% - 0%
LowSloppyPhrase       7.61         0.07          7.58             0.16   -3% - 2%
MedSloppyPhrase       6.90         0.09          6.88             0.13   -3% - 2%
PKLookup            192.23         1.99        191.81             3.80   -3% - 2%
Prefix3              82.35         0.63         82.36             1.06   -2% - 2%
Wildcard             52.49         0.44         52.54             0.41   -1% - 1%
HighTerm             36.03         0.11         36.09             0.03   0% - 0%
IntNRQ               11.56         0.07         11.58             0.03   0% - 1%
MedTerm             197.94         0.88        198.87             0.36   0% - 1%
MedSpanNear           4.84         0.07          4.86             0.03   -1% - 2%
LowSpanNear           9.49         0.26          9.64             0.01   -1% - 4%
LowPhrase            21.95         0.38         22.39             0.08   0% - 4%
AndHighLow          641.56        10.38        657.49             5.64   0% - 5%
MedPhrase            13.04         0.30         13.37             0.05   0% - 5%
AndHighMed           67.13         0.57         69.30             0.80   1% - 5%
HighPhrase            1.81         0.10          1.87             0.03   -3% - 11%
{noformat}
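
The thread does not spell out what the optimization is. Going only by the attachment name LUCENE-4283-record-next-skip.patch, one plausible reading is that the reader remembers the doc ID at which the next skip point lies and consults the skip list only when the target is beyond it. The sketch below illustrates that idea on a plain sorted array; `NextSkipDocSketch`, its fields, and the interval lookup standing in for the skip list are all assumptions, not the patch itself.

```java
class NextSkipDocSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;        // sorted doc IDs of one term
    private final int skipInterval;  // docs per skip interval
    private int pos = 0;             // index of the next doc to scan
    private int nextSkipDoc;         // last doc ID of the interval being scanned
    int skipperCalls = 0;            // how often the (simulated) skip list was consulted

    NextSkipDocSketch(int[] sortedDocs, int skipInterval) {
        if (sortedDocs.length == 0 || skipInterval <= 0) {
            throw new IllegalArgumentException("need at least one doc and a positive interval");
        }
        this.docs = sortedDocs;
        this.skipInterval = skipInterval;
        this.nextSkipDoc = lastDocOfInterval(0);
    }

    private int lastDocOfInterval(int interval) {
        int end = Math.min((interval + 1) * skipInterval, docs.length);
        return docs[end - 1];
    }

    int advance(int target) {
        if (target > nextSkipDoc) {
            // Only now pay for a skip lookup: jump over whole intervals
            // whose last doc is still below the target.
            skipperCalls++;
            int interval = pos / skipInterval;
            while (lastDocOfInterval(interval) < target
                    && (interval + 1) * skipInterval < docs.length) {
                interval++;
            }
            pos = Math.max(pos, interval * skipInterval);
            nextSkipDoc = lastDocOfInterval(interval);
        }
        // Linear scan inside the current interval.
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos < docs.length ? docs[pos++] : NO_MORE_DOCS;
    }

    public static void main(String[] args) {
        int[] docs = new int[1000];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = 2 * i;
        }
        NextSkipDocSketch postings = new NextSkipDocSketch(docs, 32);
        System.out.println(postings.advance(10));
        System.out.println(postings.advance(1000));
        System.out.println(postings.advance(1010));
        System.out.println("skipper calls: " + postings.skipperCalls);
    }
}
```

In this toy run only the jump to 1000 consults the simulated skip list; the nearby targets are served by the linear scan alone.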


> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch, LUCENE-4283-codes-cleanup.patch, LUCENE-4283-record-next-skip.patch, LUCENE-4283-slow.patch, LUCENE-4283-small-interval-fully.patch, LUCENE-4283-small-interval-partially.patch



jira at apache

Aug 8, 2012, 2:00 AM

Post #8 of 9
[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430958#comment-13430958 ]

Han Jiang commented on LUCENE-4283:
-----------------------------------

Hmm, the improvement isn't that noisy
{noformat}
Task              QPS base  StdDev base  QPS comp  StdDev comp   Pct diff
AndHighHigh          83.84         5.07     88.64         2.41   -3% - 15%
AndHighLow         1716.87        62.53   1891.91        20.85   5% - 15%
AndHighMed          348.15        37.20    441.49        10.78   11% - 45%
Fuzzy1               87.67         0.92     84.80         2.36   -6% - 0%
Fuzzy2               32.84         0.37     31.41         1.06   -8% - 0%
HighPhrase           18.45         0.93     18.88         0.53   -5% - 10%
HighSloppyPhrase     22.16         0.76     21.55         0.57   -8% - 3%
HighSpanNear          3.07         0.11      3.09         0.04   -3% - 5%
HighTerm            181.58        18.26    171.10         6.44   -17% - 8%
IntNRQ               48.39         1.47     49.28         0.88   -2% - 6%
LowPhrase            80.49         3.34     87.04         2.63   0% - 16%
LowSloppyPhrase      28.53         1.09     27.31         0.71   -10% - 2%
LowSpanNear          46.86         1.63     49.34         1.15   0% - 11%
LowTerm            1637.37        19.39   1608.23        16.93   -3% - 0%
MedPhrase            22.48         1.03     23.27         0.52   -3% - 10%
MedSloppyPhrase      15.46         0.52     15.00         0.37   -8% - 2%
MedSpanNear          37.09         1.21     37.80         0.69   -3% - 7%
MedTerm             587.20        44.40    560.78        19.09   -14% - 6%
OrHighHigh           62.10         0.88     62.95         1.05   -1% - 4%
OrHighLow           126.89         1.48    128.30         1.53   -1% - 3%
OrHighMed           124.20         1.18    125.34         1.23   -1% - 2%
PKLookup            213.54         3.75    211.98         0.37   -2% - 1%
Prefix3             106.76         2.31    107.79         0.84   -1% - 3%
Respell             100.12         1.00     96.48         2.58   -7% - 0%
Wildcard            149.61         3.53    150.29         0.88   -2% - 3%
{noformat}

> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch, LUCENE-4283-codes-cleanup.patch, LUCENE-4283-record-next-skip.patch, LUCENE-4283-record-skip&inlining-scanning.patch, LUCENE-4283-slow.patch, LUCENE-4283-small-interval-fully.patch, LUCENE-4283-small-interval-partially.patch



jira at apache

Aug 8, 2012, 4:01 PM

Post #9 of 9
[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431486#comment-13431486 ]

Michael McCandless commented on LUCENE-4283:
--------------------------------------------

Thanks Billy, patch looks good... I also see some improvements in the
skip-heavy queries:

{noformat}
Task              QPS base  StdDev base   QPS for   StdDev for   Pct diff
HighSpanNear          1.70         0.05      1.66         0.02   -6% - 2%
PKLookup            192.84         3.29    190.09         2.97   -4% - 1%
MedSloppyPhrase       6.86         0.09      6.79         0.13   -4% - 2%
HighSloppyPhrase      1.97         0.04      1.96         0.08   -6% - 5%
MedSpanNear           4.88         0.12      4.85         0.06   -4% - 3%
OrHighMed            23.40         0.74     23.31         0.73   -6% - 6%
LowSloppyPhrase       7.58         0.12      7.56         0.18   -4% - 3%
OrHighLow            27.00         0.92     26.93         0.86   -6% - 6%
Wildcard             52.66         0.43     52.54         0.32   -1% - 1%
Prefix3              82.44         0.90     82.36         0.87   -2% - 2%
IntNRQ               11.61         0.02     11.60         0.02   0% - 0%
LowTerm             513.72         0.95    513.40         2.77   0% - 0%
OrHighHigh           11.27         0.35     11.27         0.35   -6% - 6%
HighTerm             36.10         0.07     36.10         0.03   0% - 0%
MedTerm             198.76         0.26    198.85         0.23   0% - 0%
Respell              61.52         1.12     61.88         0.36   -1% - 3%
Fuzzy1               74.60         1.37     75.07         0.58   -1% - 3%
Fuzzy2               62.36         1.33     63.09         0.33   -1% - 3%
AndHighHigh          23.62         0.08     24.07         0.21   0% - 3%
LowSpanNear           9.65         0.22      9.88         0.06   0% - 5%
LowPhrase            22.08         0.37     22.63         0.31   0% - 5%
HighPhrase            1.77         0.10      1.83         0.09   -6% - 14%
MedPhrase            13.09         0.29     13.54         0.25   0% - 7%
AndHighLow          662.00         1.45    700.98        24.76   1% - 9%
AndHighMed           69.58         0.18     75.15         1.28   5% - 10%
{noformat}


