
Mailing List Archive: Lucene: Java-Dev

[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)

 

 



jira at apache

Jun 2, 2012, 6:30 AM

Post #1 of 94 (1427 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287936#comment-13287936 ]

Michael McCandless commented on LUCENE-3892:
--------------------------------------------

Awesome progress! Nice to have a dirt path online that we can then
iterate from ...

Hmm, I'm seeing some test failures when I run:
{noformat}
ant test -Dtests.postingsformat=PFor
{noformat}
Eg, TestNRTThreads, TestShardSearching, TestTimeLimitingCollector.

Remember to add the standard copyright headers to each new source
file...

We don't have to do this now, but I wonder if we can share code w/ the
packed ints impl we have, instead of generating another one with the .py
source.

TestDemo makes a nice TestMin... I usually start with TestDemo when
testing scary new code, and then it's a huge milestone once TestDemo
passes :)

We should definitely cutover to BlockTree terms dict (I would upgrade
that TODO to a nocommit!).

I suspect that wrapping the block's byte[] as ByteBuffer and then
IntBuffer is going to be too costly per decode, so we should init them
once and re-use (upgrade that TODO to a nocommit).
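
The "init once and re-use" idea can be sketched roughly as follows. This is a minimal illustration, assuming a fixed maximum encoded block size that is a multiple of 4; the class name ReusableBlockView and its members are hypothetical and not taken from the patch.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

// Hypothetical helper: wraps the block's byte[] once and reuses the views for every decode.
final class ReusableBlockView {
  final byte[] bytes;            // raw encoded block, refilled by the caller for each block
  private final IntBuffer ints;  // int view over the same memory, created once

  ReusableBlockView(int maxBytes) {
    bytes = new byte[maxBytes];
    // ByteBuffer.wrap and asIntBuffer allocate wrapper objects; do it once, not per decode.
    // The view uses ByteBuffer's default byte order (big-endian).
    ints = ByteBuffer.wrap(bytes).asIntBuffer();
  }

  // Copies the first numInts ints of the current block contents into dest.
  void readInts(int[] dest, int numInts) {
    ints.rewind();               // reset the position; the backing bytes are shared
    ints.get(dest, 0, numInts);
  }
}
```

Because the IntBuffer is a view over the same byte[], refilling bytes for the next block is enough; no new wrapper objects are created per decode.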


> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892_pfor.patch, LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Jun 2, 2012, 7:21 AM

Post #2 of 94 (1387 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287951#comment-13287951 ]

Han Jiang commented on LUCENE-3892:
-----------------------------------

Ah, yes, I forgot to use -Dtests.postingsformat...I can see the errors
now.

{quote}
TestDemo makes a nice TestMin... I usually start with TestDemo when
testing scary new code, and then it's a huge milestone once TestDemo
passes
{quote}
Hmm, that means I should remove TestMin.java? This testcase works fine
for the patch.

{quote}
We should definitely cutover to BlockTree terms dict (I would upgrade
that TODO to a nocommit!).
{quote}
I'm not familiar with these markers. Shall I change every
"TODO" into "nocommit"? Are the markers related to documentation,
or are they just reminders not to commit the current code?



jira at apache

Jun 2, 2012, 7:41 AM

Post #3 of 94 (1390 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287952#comment-13287952 ]

Michael McCandless commented on LUCENE-3892:
--------------------------------------------

bq. Hmm, that means I should remove TestMin.java? This testcase works fine for the patch.

Oh it's fine to keep TestMin now that you wrote it ... I was just saying that TestDemo is the test I run when I want the most trivial test for a new big change.

{quote}
I'm not familiar with these markers. Shall I change every
"TODO" into "nocommit"? Are the markers related to documentation,
or are they just reminders not to commit the current code?
{quote}

Sorry - this is just a convention I use: I put a // nocommit comment whenever there's a "blocker" to committing; this way I can grep for nocommit to see what still needs fixing... and towards the end, nocommits will often be downgraded to TODOs since on closer inspection they really don't have to block committing...



jira at apache

Jun 4, 2012, 10:01 AM

Post #4 of 94 (1384 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288675#comment-13288675 ]

Michael McCandless commented on LUCENE-3892:
--------------------------------------------

Excellent! All tests pass for me w/ the PFor postings format as
well... this is a great starting point :) One Solr test failed
(ContentStreamTest)... but I think it was a false failure...

I did notice the tests seem to run slower, especially certain ones eg
TestJoinUtil.

Still missing a couple license headers (TestMin, TestCompress)...

I ran a quick perf test using
http://code.google.com/a/apache-extras.org/p/luceneutil on a 10M doc
Wikipedia index.

Indexing time is ~18% slower than Lucene40PostingsFormat (1071 sec vs
1261 sec).

But more important is the slower search times:

{noformat}
Task            QPS base  StdDev base  QPS pfor  StdDev pfor  Pct diff
Phrase              8.52         0.50      4.43         0.40  -55% - -39%
SloppyPhrase       12.52         0.39      7.87         0.51  -43% - -30%
AndHighMed         67.69         2.82     44.22         1.47  -39% - -29%
SpanNear            5.19         0.12      3.90         0.28  -31% - -17%
PKLookup          112.16         1.71     95.61         1.30  -17% - -12%
AndHighHigh        13.22         0.34     11.86         0.72  -17% - -2%
Wildcard           46.04         0.37     41.68         4.45  -19% - 1%
Fuzzy1             50.11         2.03     48.06         1.91  -11% - 3%
OrHighMed           9.26         0.48      8.90         0.37  -12% - 5%
OrHighHigh         12.28         0.56     11.83         0.49  -11% - 5%
TermBGroup1M1P     40.47         1.94     39.88         2.51  -11% - 10%
Fuzzy2             53.71         2.66     53.01         2.08  -9% - 7%
TermGroup1M        36.46         1.21     35.99         1.58  -8% - 6%
TermBGroup1M       55.53         1.99     55.26         2.68  -8% - 8%
Respell            69.71         4.49     69.73         2.07  -8% - 10%
Term               94.38         7.62     94.96        12.19  -18% - 23%
Prefix3            41.63         0.34     42.21         5.82  -13% - 16%
IntNRQ              7.08         0.15      7.28         1.29  -17% - 23%

The queries that do skipping are quite a bit slower; this makes sense,
since on skip we do a full block decode. A smaller block size (we use
128 now right?) should help I think.

It's strange that the non-skipping queries (Term, OrHighMed,
OrHighHigh) don't show any performance gain ... maybe we need to
optimize the decode... or it could be the removal of the bulk api
is hurting us here.

I'm also curious if we tried a pure FOR (no patching, so we must set
numBits according to the max value = larger index but hopefully faster
decode) if the results would improve...
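
For reference, a pure FOR block can be sketched as below: numBits comes from the largest value in the block, so there are no exceptions to patch. This is an illustrative sketch only. The names SimpleFor, bitsRequired, encode and decode are hypothetical, values are assumed non-negative, and the long[] layout is an assumption rather than the format used by the patch.

```java
// Hypothetical pure-FOR sketch: every value in a block is stored with the same bit width.
final class SimpleFor {
  // Bits needed for the largest value in the block (at least 1).
  static int bitsRequired(int[] values) {
    int or = 0;
    for (int v : values) or |= v;
    return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
  }

  // Packs the values back to back, numBits each, into a long[].
  static long[] encode(int[] values, int numBits) {
    long[] packed = new long[(values.length * numBits + 63) / 64];
    for (int i = 0; i < values.length; i++) {
      long bitPos = (long) i * numBits;
      int word = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      packed[word] |= ((long) values[i]) << shift;
      if (shift + numBits > 64) {               // value straddles two longs
        packed[word + 1] |= ((long) values[i]) >>> (64 - shift);
      }
    }
    return packed;
  }

  // Unpacks count values of numBits each.
  static int[] decode(long[] packed, int numBits, int count) {
    int[] values = new int[count];
    long mask = (1L << numBits) - 1;
    for (int i = 0; i < count; i++) {
      long bitPos = (long) i * numBits;
      int word = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      long bits = packed[word] >>> shift;
      if (shift + numBits > 64) {               // pull in the high part from the next long
        bits |= packed[word + 1] << (64 - shift);
      }
      values[i] = (int) (bits & mask);
    }
    return values;
  }
}
```

The trade-off is the one described above: a single large value forces a wide numBits for the whole block, but decode has no patching step.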



> Attachments: LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_settings.patch, LUCENE-3892_settings.patch


jira at apache

Jun 4, 2012, 8:00 PM

Post #5 of 94 (1379 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289104#comment-13289104 ]

Han Jiang commented on LUCENE-3892:
-----------------------------------

Thanks Mike, there are so many details here to help us optimize!

bq.Still missing a couple license headers (TestMin, TestCompress)...
Ok, I'll add them later.

bq.I ran a quick perf test using http://code.google.com/a/apache-extras.org/p/luceneutil on a 10M doc Wikipedia index.
The script is wonderful! But the wiki data is missing? Can I get it from a wiki dump instead?

bq.Indexing time is ~18% slower than Lucene40PostingsFormat (1071 sec vs 1261 sec).
Yes, that is expected: the encoder actually scans every block 33 times to estimate metadata such as numFrameBits and numExceptions.
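
An estimate of this kind can be sketched as one scan per candidate width, 0 through 32, which gives 33 scans. The cost model below (numFrameBits bits per value plus a fixed 32 bits per exception) is an assumption for illustration and not necessarily what the patch does; FrameBitsEstimator is a hypothetical name.

```java
// Hypothetical sketch: pick numFrameBits by trying all 33 candidate widths.
final class FrameBitsEstimator {
  static final int EXCEPTION_COST_BITS = 32;  // assumed cost of storing one exception

  // Returns {numFrameBits, numExceptions} minimizing the assumed encoded size in bits.
  static int[] estimate(int[] block) {
    int bestBits = -1;
    int bestExceptions = 0;
    long bestSize = Long.MAX_VALUE;
    for (int bits = 0; bits <= 32; bits++) {  // 33 candidates, one full scan each
      long limit = 1L << bits;                // values >= limit do not fit in this width
      int exceptions = 0;
      for (int v : block) {
        if ((v & 0xFFFFFFFFL) >= limit) exceptions++;
      }
      long size = (long) bits * block.length + (long) EXCEPTION_COST_BITS * exceptions;
      if (size < bestSize) {                  // on ties the smaller width wins
        bestSize = size;
        bestBits = bits;
        bestExceptions = exceptions;
      }
    }
    return new int[] {bestBits, bestExceptions};
  }
}
```

A single pass that histograms the bit length of each value would yield the same exception counts for all widths, so the 33 scans are not inherent to the approach.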



jira at apache

Jun 5, 2012, 3:06 AM

Post #6 of 94 (1379 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289296#comment-13289296 ]

Michael McCandless commented on LUCENE-3892:
--------------------------------------------

Hi Billy,

bq. Can I get it from a wiki dump instead?

You can download it at http://people.apache.org/~mikemccand/enwiki-20120502-lines-1k.txt.lzma

That's ~6.3 GB (compressed) and 28.7 GB (decompressed); it's the 2012/05/02 Wikipedia en export, filtered to plain text and then broken into 33.3 M ~1 KB sized docs. I can help you get the luceneutil env set up...

{quote}
bq. Indexing time is ~18% slower than Lucene40PostingsFormat (1071 sec vs 1261 sec).

Yes, that is expected: the encoder actually scans every block 33 times to estimate metadata such as numFrameBits and numExceptions.
{quote}

OK, in that case I'm surprised it's only ~18% slower!



jira at apache

Jun 8, 2012, 9:33 AM

Post #7 of 94 (1380 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291850#comment-13291850 ]

Han Jiang commented on LUCENE-3892:
-----------------------------------

OK, here are the results I got when reproducing this with Mike's test script:
Indexing time:
trunk: 2396 sec
patch: 2793 sec

Searching time:
{noformat}
Task            QPS Lucene40  StdDev Lucene40  QPS PFor  StdDev PFor  Pct diff
AndHighMed             22.76             0.54     14.68         1.00  -41% - -29%
SloppyPhrase            3.58             0.17      2.46         0.27  -41% - -19%
SpanNear                5.90             0.09      4.08         0.37  -38% - -23%
AndHighHigh            10.00             0.17      8.08         0.57  -26% - -11%
Phrase                  1.68             0.07      1.45         0.17  -27% - 0%
Respell                37.65             0.74     33.41         1.04  -15% - -6%
Fuzzy1                 38.00             1.60     34.37         1.06  -15% - -2%
IntNRQ                  4.27             0.33      3.87         0.19  -19% - 3%
Fuzzy2                 16.35             0.60     15.02         0.31  -13% - -2%
Wildcard               30.24             0.57     28.24         1.85  -14% - 1%
PKLookup               85.82             5.04     83.25         2.81  -11% - 6%
Prefix3                19.20             0.40     19.19         1.46  -9% - 9%
OrHighMed               9.25             0.59      9.41         0.70  -11% - 16%
TermGroup1M            11.46             0.62     11.74         0.81  -9% - 15%
OrHighHigh              3.15             0.17      3.28         0.23  -8% - 17%
TermBGroup1M1P         19.28             0.38     20.32         1.14  -2% - 13%
TermBGroup1M            6.23             0.21      6.71         0.46  -3% - 19%
Term                   30.86             1.52     34.34         3.26  -4% - 28%
{noformat}

This was run on a 64-bit AMD server with Java 1.7.0.



jira at apache

Jun 18, 2012, 6:54 AM

Post #8 of 94 (1369 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395894#comment-13395894 ]

Han Jiang commented on LUCENE-3892:
-----------------------------------

There's a potential bottleneck in method-call overhead... Here is an example for PFor, with blocksize=128, exception rate = 97%, normal value <= 2 bits, exception value <= 32 bits:

{noformat}
Decoding normal values: 4703 ns
Patching exceptions: 5797 ns
Single call of PForUtil.decompress totally takes: 58318 ns
{noformat}

In addition, it costs about 4000 ns to record the time span.
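
One way to estimate how much of such a measurement is timer overhead is to time back-to-back System.nanoTime() calls. A small sketch follows; the class name TimerOverhead is hypothetical, and absolute numbers vary by JVM, OS and hardware.

```java
// Hypothetical sketch: average cost of one System.nanoTime() call.
final class TimerOverhead {
  static double averageNanoTimeCost(int iterations) {
    long sink = 0;
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      sink += System.nanoTime();     // accumulate so the calls cannot be dropped
    }
    long elapsed = System.nanoTime() - start;
    if (sink == 42) System.out.println(sink);  // keeps sink observably used
    return (double) elapsed / iterations;
  }

  public static void main(String[] args) {
    averageNanoTimeCost(1_000_000);  // warm up the JIT before measuring
    System.out.println(averageNanoTimeCost(1_000_000) + " ns per nanoTime() call");
  }
}
```

Subtracting this per-call cost from each recorded span gives a rough correction when individual phases take only a few microseconds.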

> Attachments: LUCENE-3892_for.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_settings.patch, LUCENE-3892_settings.patch


jira at apache

Jun 18, 2012, 3:44 PM

Post #9 of 94 (1360 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396325#comment-13396325 ]

Michael McCandless commented on LUCENE-3892:
--------------------------------------------

On the For patch ... we shouldn't encode/decode numInts right? It's
always 128?

Up above, in ForFactory, when we readInt() to get numBytes ... it
seems like we could stuff the header numBits into that same int and
save checking that in FORUtil.decompress....
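
A sketch of packing both header fields into the single int that is already read per block. The layout (numBits in the low 6 bits, numBytes above them) and the name BlockHeader are assumptions for illustration, not the layout used by the patch.

```java
// Hypothetical sketch: one int carries both numBytes and numBits.
final class BlockHeader {
  private static final int BITS_FIELD = 6;                     // enough for numBits in 0..32
  private static final int BITS_MASK = (1 << BITS_FIELD) - 1;
  private static final int MAX_BYTES = 1 << (31 - BITS_FIELD); // keeps the header non-negative

  static int encode(int numBytes, int numBits) {
    if (numBits < 0 || numBits > 32 || numBytes < 0 || numBytes >= MAX_BYTES) {
      throw new IllegalArgumentException("header fields out of range");
    }
    return (numBytes << BITS_FIELD) | numBits;
  }

  static int numBytes(int header) {
    return header >>> BITS_FIELD;
  }

  static int numBits(int header) {
    return header & BITS_MASK;
  }
}
```

With this layout the reader gets numBits from the same readInt() call, so the decoder no longer needs to parse a separate header inside the block.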

I think there are a few possible ideas to explore to get faster
PFor/For performance:

* Get more direct access to the file as an int[]; eg MMapDir could
expose an IntBuffer from its ByteBuffer (saving the initial copy
into byte[] that we now do). Or maybe we add
IndexInput.readInts(int[]) and dir impl can optimize how that's
done (MMapDir could use Unsafe.copyBytes... except for little
endian architectures ... we'd probably have to have separate
specialized decoder rather than letting Int/ByteBuffer do the byte
swapping). This would require the whole file stays aligned w/ int
(eg the header must be 0 mod 4).

* Copy/share how oal.packed works, i.e. being able to waste a bit to
have faster decode (eg storing the 7 bit case as byte[], wasting 1
bit for each value).

* Skipping: can we partially decode a block? EG if we are skipping
and we know we only want values after the 80th one, then we
shouldn't decode those first 80...

* Since doc/freq are "aligned", when we store pointers to a given
spot, eg in the terms dict or in skip data, we should only store
the offset once (today we store it twice).

* Alternatively, maybe we should only save skip data on doc/freq
block boundaries (prox would still need skip-within-block).

* Maybe we should store doc & frq blocks interleaved in a single
file (since they are "aligned") and then skip would skip to the
start of a doc/frq block pair.

Other ideas...?
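
Two of the ideas above, wasting a bit to get a byte-aligned layout and decoding only the tail of a block when skipping, can be sketched together. This assumes values that fit in 7 bits, stored one per byte. ByteAlignedBlock and its methods are hypothetical names and are not taken from oal.packed.

```java
// Hypothetical sketch: 7-bit values stored one per byte, with partial decode.
final class ByteAlignedBlock {
  // Stores each value in its own byte, wasting 1 bit per value.
  static byte[] encode(int[] values) {
    byte[] out = new byte[values.length];
    for (int i = 0; i < values.length; i++) {
      if ((values[i] & ~0x7F) != 0) {
        throw new IllegalArgumentException("value needs more than 7 bits");
      }
      out[i] = (byte) values[i];
    }
    return out;
  }

  // Decodes only positions [from, block.length); earlier values are never touched.
  static int[] decodeFrom(byte[] block, int from) {
    int[] out = new int[block.length - from];
    for (int i = from; i < block.length; i++) {
      out[i - from] = block[i] & 0xFF;
    }
    return out;
  }
}
```

Because every value starts on a byte boundary, skipping to the 80th value is just an index offset, with no bit arithmetic for the values that are skipped.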




jira at apache

Jun 19, 2012, 12:03 PM

Post #10 of 94 (1362 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396987#comment-13396987 ]

Han Jiang commented on LUCENE-3892:
-----------------------------------

Oh, thank you Mike! I haven't thought too much about those skipping policies.

bq. Up above, in ForFactory, when we readInt() to get numBytes ... it seems like we could stuff the header numBits into that same int and save checking that in FORUtil.decompress....
Ah, yes, I just forgot to remove the redundant code. Here is an initial attempt that removes the header and calls ForDecompressImpl directly in readBlock(), with For and blockSize=128. Figures in parentheses show the prior benchmark.
{noformat}
Task            QPS Base  StdDev Base  QPS For  StdDev For  Pct diff     (prior)
Phrase              4.99         0.37     3.57        0.26  -38% - -17%  (-44% - -18%)
AndHighMed         28.91         2.17    22.66        0.82  -29% - -12%  (-38% - -9%)
SpanNear            2.72         0.14     2.22        0.13  -26% - -8%   (-36% - -8%)
SloppyPhrase        4.24         0.26     3.70        0.16  -21% - -3%   (-33% - -6%)
Respell            40.71         2.59    37.66        1.36  -16% - 2%    (-18% - 0%)
Fuzzy1             43.22         2.01    40.66        0.32  -10% - 0%    (-12% - 0%)
Fuzzy2             16.25         0.90    15.64        0.26  -10% - 3%    (-12% - 3%)
Wildcard           19.07         0.86    19.07        0.73  -8% - 8%     (-21% - 3%)
AndHighHigh         7.76         0.47     7.77        0.15  -7% - 8%     (-21% - 10%)
PKLookup           87.50         4.56    88.51        1.24  -5% - 8%     ( -2% - 5%)
TermBGroup1M       20.42         0.87    21.32        0.74  -3% - 12%    ( 2% - 10%)
OrHighMed           5.33         0.68     5.61        0.14  -9% - 23%    (-16% - 25%)
OrHighHigh          4.43         0.53     4.69        0.12  -8% - 23%    (-15% - 24%)
TermGroup1M        13.30         0.34    14.31        0.40  2% - 13%     ( 0% - 13%)
TermBGroup1M1P     20.92         0.59    23.71        0.86  6% - 20%     ( -1% - 22%)
Prefix3            30.30         1.41    35.14        1.76  5% - 27%     (-14% - 21%)
IntNRQ              3.90         0.54     4.58        0.47  -7% - 50%    (-25% - 33%)
Term               42.17         1.55    52.33        2.57  13% - 35%    ( 1% - 33%)
{noformat}
The improvement is quite general. However, I suspect it comes mainly from fewer method calls. I'm now changing the PFor code to remove those nested calls as well.

bq. Get more direct access to the file as an int[]; ...
Ok, this will be considered when the pfor+pulsing is completed. I'm just curious why we don't have readInts in ora.util yet...

bq. Skipping: can we partially decode a block? ...
The pfor-opt approach (encode the lower bits of each exception in the normal area, and the remaining bits in the exception area) naturally fits "partially decode a block"; that will become possible when we optimize skipping queries.
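
A sketch of that split, assuming exceptions are recorded as (position, high bits) pairs. The names PForOptSketch, Encoded, encode and decodeFrom are hypothetical, values are assumed non-negative, and the low bits are kept in a plain int[] rather than bit-packed to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: low bits in the normal area, high bits in the exception area.
final class PForOptSketch {
  static final class Encoded {
    final int numBits;          // width of the normal area, assumed to be in 1..31
    final int[] low;            // low numBits of every value (bit-packed in a real format)
    final int[] exceptionPos;   // positions whose value did not fit in numBits
    final int[] exceptionHigh;  // remaining high bits for those positions

    Encoded(int numBits, int[] low, int[] exceptionPos, int[] exceptionHigh) {
      this.numBits = numBits;
      this.low = low;
      this.exceptionPos = exceptionPos;
      this.exceptionHigh = exceptionHigh;
    }
  }

  static Encoded encode(int[] values, int numBits) {
    int mask = (1 << numBits) - 1;
    int[] low = new int[values.length];
    List<Integer> pos = new ArrayList<>();
    List<Integer> high = new ArrayList<>();
    for (int i = 0; i < values.length; i++) {
      low[i] = values[i] & mask;
      int h = values[i] >>> numBits;
      if (h != 0) {             // value overflows the normal area: record an exception
        pos.add(i);
        high.add(h);
      }
    }
    return new Encoded(numBits, low, toArray(pos), toArray(high));
  }

  // Decodes positions [from, length): copy the low bits, then patch exceptions in range.
  static int[] decodeFrom(Encoded e, int from) {
    int[] out = Arrays.copyOfRange(e.low, from, e.low.length);
    for (int k = 0; k < e.exceptionPos.length; k++) {
      int p = e.exceptionPos[k];
      if (p >= from) out[p - from] |= e.exceptionHigh[k] << e.numBits;
    }
    return out;
  }

  private static int[] toArray(List<Integer> list) {
    int[] out = new int[list.size()];
    for (int i = 0; i < out.length; i++) out[i] = list.get(i);
    return out;
  }
}
```

Since every position in the normal area holds real low bits, the normal area can be decoded from any offset, and patching only needs to visit exceptions at or after that offset.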



jira at apache

Jun 19, 2012, 8:18 PM

Post #11 of 94 (1359 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397228#comment-13397228 ]

Han Jiang commented on LUCENE-3892:
-----------------------------------

And the result for PFor (blocksize=128):
{noformat}
Task            QPS Base  StdDev Base  QPS PFor  StdDev PFor  Pct diff     (prior)
Phrase              4.87         0.36      3.39         0.18  -38% - -20%  (-47% - -25%)
AndHighMed         27.78         2.35     21.13         0.52  -31% - -14%  (-37% - -15%)
SpanNear            2.70         0.14      2.20         0.11  -26% - -9%   (-36% - -13%)
SloppyPhrase        4.17         0.15      3.77         0.21  -17% - 0%    (-30% - -6%)
Respell            39.97         1.56     37.65         1.95  -14% - 3%    (-15% - 2%)
Wildcard           19.08         0.77     18.33         0.92  -12% - 5%    (-17% - 3%)
Fuzzy1             42.29         1.13     40.78         1.44  -9% - 2%     (-11% - 1%)
AndHighHigh         7.61         0.55      7.45         0.08  -9% - 6%     (-19% - 6%)
Fuzzy2             15.79         0.55     15.64         0.70  -8% - 7%     (-11% - 6%)
PKLookup           86.71         2.13     88.92         2.24  -2% - 7%     ( -2% - 7%)
TermGroup1M        13.04         0.23     14.03         0.40  2% - 12%     ( 1% - 9%)
IntNRQ              3.97         0.48      4.35         0.61  -15% - 41%   (-16% - 24%)
TermBGroup1M1P     21.04         0.35     23.20         0.60  5% - 14%     ( 0% - 14%)
TermBGroup1M       19.27         0.47     21.28         0.84  3% - 17%     ( 1% - 10%)
OrHighHigh          4.13         0.47      4.63         0.27  -5% - 34%    (-14% - 27%)
OrHighMed           4.95         0.59      5.58         0.34  -5% - 35%    (-14% - 27%)
Prefix3            30.33         1.36     34.26         2.14  1% - 25%     ( -6% - 20%)
Term               41.99         1.19     50.75         1.72  13% - 28%    ( 2% - 26%)
{noformat}
It works, and it is quite interesting that the StdDev for the Term query is reduced significantly.



jira at apache

Jun 20, 2012, 9:14 AM

Post #12 of 94 (1370 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397605#comment-13397605 ]

Michael McCandless commented on LUCENE-3892:
--------------------------------------------

OK, I created a branch and committed the latest For patch: https://svn.apache.org/repos/asf/lucene/dev/branches/pforcodec_3892



jira at apache

Jun 20, 2012, 10:58 AM

Post #13 of 94 (1371 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397694#comment-13397694 ]

Han Jiang commented on LUCENE-3892:
-----------------------------------

OK, I just reproduced your test. But Mike, are we using the same tasks file? Our relative speeds for different queries are not the same.
{quote}
Task            QPS Base  StdDev Base  QPS For  StdDev For  Pct diff
Phrase              5.07         0.45     3.76        0.19  -35% - -14% (-44% - -18%)
AndHighMed         28.32         2.34    22.67        0.67  -28% - -10% (-38% - -9%)
SpanNear            2.72         0.13     2.36        0.14  -22% - -3% (-36% - -8%)
SloppyPhrase        4.18         0.20     3.83        0.15  -16% - 0% (-33% - -6%)
Respell            42.02         1.83    38.86        2.30  -16% - 2% (-18% - 0%)
Fuzzy1             44.96         1.58    42.85        1.69  -11% - 2% (-12% - 0%)
Fuzzy2             16.78         0.69    16.34        0.68  -10% - 5% (-12% - 3%)
PKLookup           89.11         2.15    87.33        2.19  -6% - 2% ( -2% - 5%)
AndHighHigh         7.61         0.44     7.69        0.21  -7% - 10% (-21% - 10%)
Wildcard           19.50         0.91    20.02        0.72  -5% - 11% (-21% - 3%)
TermBGroup1M       20.82         0.37    21.73        0.69  0% - 9% ( 2% - 10%)
TermGroup1M        13.79         0.13    14.61        0.32  2% - 9% ( 1% - 9%)
IntNRQ              4.11         0.56     4.56        0.56  -14% - 43% (-25% - 33%)
TermBGroup1M1P     21.45         0.75    24.00        0.51  5% - 18% ( -1% - 22%)
OrHighMed           5.08         0.49     5.73        0.15  0% - 28% (-16% - 25%)
OrHighHigh          4.22         0.39     4.78        0.13  1% - 28% (-15% - 24%)
Prefix3            30.91         1.63    35.65        2.02  3% - 28% (-14% - 21%)
Term               44.36         1.87    54.01        1.96  12% - 31% ( -1% - 33%)
{quote}



jira at apache

Jun 20, 2012, 3:41 PM

Post #14 of 94 (1360 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397958#comment-13397958 ]

Michael McCandless commented on LUCENE-3892:
--------------------------------------------

bq. But Mike, are we using the same tasks file? Our relative speeds for different queries are not the same.

Sorry, I'm using a hand-edited "hard" tasks file; I'll commit and push it to luceneutil. Separately, though: each run picks a different subset of the tasks from each category, so results from one run to the next generally aren't comparable unless we fix the random seed it uses.
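
To make the point about the random seed concrete, here is a minimal Java sketch of seeded per-category sampling. It is not luceneutil's actual code: TaskSampler, pick and the task strings are hypothetical names used only for illustration. The idea it shows is that the chosen subset depends only on the seed, so two runs with the same seed exercise the same tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not taken from luceneutil.
final class TaskSampler {
    /** Returns up to count tasks from one category; the choice depends only on seed. */
    static List<String> pick(List<String> categoryTasks, int count, long seed) {
        List<String> shuffled = new ArrayList<>(categoryTasks); // leave the input untouched
        Collections.shuffle(shuffled, new Random(seed));        // deterministic for a given seed
        int n = Math.min(count, shuffled.size());
        return new ArrayList<>(shuffled.subList(0, n));
    }

    public static void main(String[] args) {
        List<String> termTasks = Arrays.asList("term:a", "term:b", "term:c", "term:d", "term:e");
        // Same seed twice: both lines print the same two tasks.
        System.out.println(pick(termTasks, 2, 17L));
        System.out.println(pick(termTasks, 2, 17L));
    }
}
```

If the seed changes between runs, each run may time a different mix of cheap and expensive queries within a category, which would be one plausible reason for two people seeing different relative speeds.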



jira at apache

Jun 21, 2012, 12:50 PM

Post #15 of 94 (1364 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398800#comment-13398800 ]

Han Jiang commented on LUCENE-3892:
-----------------------------------

And here is the same code run with the wikimediumhard.tasks file. (This is really a hard test case: the QPS values are so small that we can hardly rely on Pct diff :) )
{noformat}
Task            QPS Base  StdDev Base  QPS For  StdDev For  Pct diff
AndHighMed         10.76         0.21     6.47        0.32  -43% - -35%
AndHighHigh         2.89         0.08     2.57        0.19  -20% - -1%
SpanNear            0.60         0.01     0.55        0.01  -11% - -6%
SloppyPhrase        0.61         0.01     0.57        0.01  -9% - -3%
PKLookup           87.72         2.61    86.28        1.48  -6% - 3%
Fuzzy1             36.22         1.14    35.90        0.97  -6% - 5%
Phrase              1.22         0.03     1.22        0.08  -9% - 8%
Respell            32.84         0.92    33.55        0.87  -3% - 7%
IntNRQ              3.66         0.35     3.74        0.08  -8% - 15%
Fuzzy2             21.62         0.66    22.10        0.51  -3% - 7%
Prefix3            13.30         0.49    14.09        0.76  -3% - 15%
OrHighMed           3.43         0.16     3.65        0.45  -10% - 25%
OrHighHigh          1.66         0.09     1.79        0.22  -10% - 28%
Wildcard            3.39         0.14     3.74        0.20  0% - 21%
TermBGroup1M1P      1.84         0.03     2.10        0.16  3% - 25%
TermGroup1M         1.14         0.03     1.34        0.10  5% - 29%
TermBGroup1M        1.49         0.05     1.78        0.13  7% - 32%
Term                3.49         0.13     4.38        0.65  2% - 49%
{noformat}
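
For anyone reading the Pct diff column: the ranges above look consistent with a worst-case/best-case ratio built from the QPS and StdDev columns, truncated to a whole percent. The sketch below is inferred from the printed numbers and is an assumption, not the benchmark tool's source; PctDiff, low and high are hypothetical names.

```java
// Inferred from the tables in this thread; an assumption, not the benchmark tool's source.
final class PctDiff {
    /** Worst case: slowest plausible candidate against fastest plausible baseline. */
    static int low(double baseQps, double baseStdDev, double candQps, double candStdDev) {
        return (int) (100.0 * ((candQps - candStdDev) / (baseQps + baseStdDev) - 1.0));
    }

    /** Best case: fastest plausible candidate against slowest plausible baseline. */
    static int high(double baseQps, double baseStdDev, double candQps, double candStdDev) {
        return (int) (100.0 * ((candQps + candStdDev) / (baseQps - baseStdDev) - 1.0));
    }

    public static void main(String[] args) {
        // AndHighMed row: base 10.76 +/- 0.21, For 6.47 +/- 0.32
        System.out.println(low(10.76, 0.21, 6.47, 0.32) + "% - "
                + high(10.76, 0.21, 6.47, 0.32) + "%"); // prints -43% - -35%
    }
}
```

Because the table prints its inputs rounded to two decimals, recomputing from them can land one percent off (the Term row gives 3% rather than 2% for the low end). It also shows why small QPS values make the range hard to rely on: a StdDev of 0.65 on a QPS of 4.38 is enough to stretch the Term range from 2% to 49%.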



jira at apache

Jun 23, 2012, 12:24 AM

Post #16 of 94 (1365 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399869#comment-13399869 ]

Chris Male commented on LUCENE-3892:
------------------------------------

It's really interesting the effect of peeling back those abstractions.



jira at apache

Jun 23, 2012, 1:37 AM

Post #17 of 94 (1363 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399883#comment-13399883 ]

Han Jiang commented on LUCENE-3892:
-----------------------------------

Yes, really interesting, and it makes sense. As far as I know, a method that relies on exception handling can be quite a bit slower than a simple if-statement check. Here is part of the results from my test, with Mike's patch:
{noformat}
OrHighMed           2.53         0.31     2.57        0.13  -13% - 21%
Wildcard            3.86         0.12     3.94        0.38  -10% - 15%
OrHighHigh          1.57         0.18     1.61        0.08  -12% - 21%
TermBGroup1M1P      1.93         0.03     2.48        0.10  21% - 35%
TermGroup1M         1.37         0.02     1.81        0.05  26% - 37%
TermBGroup1M        1.17         0.02     1.64        0.07  32% - 47%
Term                2.92         0.13     4.46        0.23  38% - 68%
{noformat}
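
As a rough illustration of the exception-versus-if point, here are two ways to detect the end of a block of ints. This is a hypothetical sketch, not the code from the patch discussed here; EndOfBlock and both method names are made up for the example. Both variants compute the same sum; the difference is only in how the end of the block is signalled.

```java
// Hypothetical illustration; not the code from the patch discussed in this thread.
final class EndOfBlock {
    /** Signals the end of the block by catching an exception. */
    static long sumUsingException(int[] block) {
        long sum = 0;
        int i = 0;
        try {
            while (true) {
                sum += block[i++]; // throws once i runs past the end of the array
            }
        } catch (ArrayIndexOutOfBoundsException endOfBlock) {
            return sum;
        }
    }

    /** Same result, but the end of the block is a plain comparison. */
    static long sumUsingCheck(int[] block) {
        long sum = 0;
        for (int i = 0; i < block.length; i++) {
            sum += block[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] block = {1, 2, 3};
        System.out.println(sumUsingException(block) + " " + sumUsingCheck(block)); // prints 6 6
    }
}
```

The sketch measures nothing by itself. How much the exception path costs depends on the JVM and on how often it is taken, so any speed claim would need a benchmark like the ones in this thread.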



jira at apache

Jul 2, 2012, 5:45 PM

Post #18 of 94 (1323 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405477#comment-13405477 ]

Michael McCandless commented on LUCENE-3892:
--------------------------------------------

Thanks Billy, I committed this to the branch.



jira at apache

Jul 9, 2012, 10:32 AM

Post #19 of 94 (1318 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409664#comment-13409664 ]

Michael McCandless commented on LUCENE-3892:
--------------------------------------------

bq. Current branch cannot pass tests like this:

Thanks, I committed the patch.



jira at apache

Jul 11, 2012, 6:22 AM

Post #20 of 94 (1315 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411472#comment-13411472 ]

Michael McCandless commented on LUCENE-3892:
--------------------------------------------

bq. I'm still not sure about the IOUtils.closeWhileHandlingException(), I think the exceptions should not be suppressed when out.close() is called?

Actually I think you want them to be suppressed, so that the original exception is seen?



jira at apache

Jul 11, 2012, 6:26 AM

Post #21 of 94 (1316 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411475#comment-13411475 ]

Michael McCandless commented on LUCENE-3892:
--------------------------------------------

Docs/cleanup patch looks good, I'll commit to the branch! Thanks.



jira at apache

Jul 11, 2012, 6:42 AM

Post #22 of 94 (1315 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411495#comment-13411495 ]

Han Jiang commented on LUCENE-3892:
-----------------------------------

bq. Actually I think you want them to be suppressed, so that the original exception is seen?

Not my idea actually, I think the exception should be thrown for out.close()? closeWhileHandlingException() will suppress those exceptions.



jira at apache

Jul 11, 2012, 6:58 AM

Post #23 of 94 (1316 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411515#comment-13411515 ]

Michael McCandless commented on LUCENE-3892:
--------------------------------------------

bq. Not my idea actually, I think the exception should be thrown for out.close()? closeWhileHandlingException() will suppress those exceptions

But the problem is some other exception has already been thrown (because success is false). If out.close then hits a second exception we have to pick which one should be thrown, and I think the original one is better? (Since it's likely the root cause of whatever went wrong).
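
The pattern being discussed can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the code from the patch: closeSuppressing is a hypothetical stand-in for a close-and-swallow helper such as IOUtils.closeWhileHandlingException(), and writeAll with its failDuringWrite switch exists only to drive the example.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical illustration of the success-flag pattern; not the patch's code.
final class SuccessFlagDemo {
    /** Stand-in for a close-and-swallow helper: closes c and ignores any IOException. */
    static void closeSuppressing(Closeable c) {
        try {
            if (c != null) {
                c.close();
            }
        } catch (IOException ignored) {
            // Deliberately dropped so the exception already in flight is the one the caller sees.
        }
    }

    static void writeAll(Closeable out, boolean failDuringWrite) throws IOException {
        boolean success = false;
        try {
            if (failDuringWrite) {
                throw new IOException("write failed"); // simulated root cause
            }
            success = true;
        } finally {
            if (success) {
                out.close();           // nothing else went wrong: a close failure must be reported
            } else {
                closeSuppressing(out); // something already failed: keep that original exception
            }
        }
    }
}
```

With an `out` whose close() always throws, writeAll(out, true) surfaces "write failed" and writeAll(out, false) surfaces the close failure, so exceptions from out.close() are only suppressed when an earlier exception is already propagating.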



jira at apache

Jul 11, 2012, 7:00 AM

Post #24 of 94 (1317 views)
Permalink
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.) [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411517#comment-13411517 ]

Han Jiang commented on LUCENE-3892:
-----------------------------------

bq. The Pulsing parts of the last patch are not included here, because they don't improve performance significantly.

Here are some tests comparing For vs PulsingFor and PFor vs PulsingPFor, run on 1M docs with wikimediumhard.tasks.

It is strange that PKLookup still doesn't benefit with FixedBlockInt:

{noformat}
Task            QPS For  StdDev For  QPS PulsingFor  StdDev PulsingFor  Pct diff
AndHighHigh       23.01        0.33           22.94               0.66  -4% - 4%
AndHighMed        56.41        0.76           57.41               1.74  -2% - 6%
Fuzzy1            86.74        0.85           82.22               2.39  -8% - -1%
Fuzzy2            28.23        0.38           26.15               0.97  -11% - -2%
IntNRQ            41.78        1.65           40.78               3.53  -14% - 10%
OrHighHigh        14.44        0.34           14.50               0.92  -8% - 9%
OrHighMed         30.59        0.77           31.12               1.93  -6% - 10%
PKLookup         110.31        2.03          109.22               2.43  -4% - 3%
Phrase             8.18        0.44            7.97               0.40  -12% - 8%
Prefix3           99.64        2.38           97.09               3.46  -8% - 3%
Respell           99.66        0.45           92.76               2.81  -10% - -3%
SloppyPhrase       4.28        0.16            4.08               0.13  -11% - 2%
SpanNear           4.08        0.13            3.93               0.06  -7% - 0%
Term              33.63        1.25           34.06               1.71  -7% - 10%
TermBGroup1M      15.54        0.46           15.78               0.56  -4% - 8%
TermBGroup1M1P    20.34        0.73           20.62               0.62  -5% - 8%
TermGroup1M       19.18        0.52           19.72               0.49  -2% - 8%
Wildcard          34.86        0.88           34.27               1.77  -9% - 6%
{noformat}

{noformat}
Task            QPS PFor  StdDev PFor  QPS PulsingPFor  StdDev PulsingPFor  Pct diff
AndHighHigh        19.98         0.31            19.92                0.26  -3% - 2%
AndHighMed         58.21         1.51            57.86                1.18  -5% - 4%
Fuzzy1             91.86         1.17            85.86                1.18  -8% - -4%
Fuzzy2             32.66         0.58            30.08                0.57  -11% - -4%
IntNRQ             33.89         0.82            32.66                1.10  -9% - 2%
OrHighHigh         15.79         1.29            14.96                0.67  -16% - 7%
OrHighMed          30.31         2.09            28.91                1.67  -15% - 8%
PKLookup          112.80         0.81           111.82                2.90  -4% - 2%
Phrase              6.14         0.11             6.23                0.10  -1% - 5%
Prefix3           147.80         2.88           138.35                2.11  -9% - -3%
Respell           118.57         1.18           108.30                1.86  -11% - -6%
SloppyPhrase        5.78         0.15             5.66                0.29  -9% - 5%
SpanNear            6.32         0.14             6.40                0.16  -3% - 6%
Term               41.60         2.44            38.12                0.33  -14% - -1%
TermBGroup1M       14.40         0.48            13.73                0.19  -8% - 0%
TermBGroup1M1P     23.68         0.44            22.82                0.44  -7% - 0%
TermGroup1M        15.25         0.48            14.51                0.20  -9% - 0%
Wildcard           32.76         0.53            31.76                0.62  -6% - 0%
{noformat}
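
For anyone reading the Pct diff column: the ranges look consistent with comparing the two runs one standard deviation apart in each direction (worst case of the second run vs best case of the first, and the reverse), truncated to whole percents. That formula is my assumption, inferred from the numbers above, and is not stated anywhere in this thread. The class and method names below are made up for illustration. A small sketch that reproduces a few rows:

```java
// Hypothetical helper, not part of Lucene or the benchmark tooling.
// Assumption: the range is built from QPS +/- one StdDev on each side.
final class PctDiffSketch {

  // Lower bound: second run at its worst vs first run at its best.
  static int worst(double qpsBase, double sdBase, double qpsCmp, double sdCmp) {
    double baseBest = qpsBase + sdBase;
    double cmpWorst = qpsCmp - sdCmp;
    return (int) (100.0 * (cmpWorst - baseBest) / baseBest); // truncates toward zero
  }

  // Upper bound: second run at its best vs first run at its worst.
  static int best(double qpsBase, double sdBase, double qpsCmp, double sdCmp) {
    double baseWorst = qpsBase - sdBase;
    double cmpBest = qpsCmp + sdCmp;
    return (int) (100.0 * (cmpBest - baseWorst) / baseWorst); // truncates toward zero
  }

  public static void main(String[] args) {
    // AndHighHigh row, For vs PulsingFor: 23.01 +/- 0.33 vs 22.94 +/- 0.66
    System.out.println(worst(23.01, 0.33, 22.94, 0.66) + "% - "
        + best(23.01, 0.33, 22.94, 0.66) + "%"); // prints "-4% - 4%"
  }
}
```

Read this way, a row is only a clear loss or win when both ends of the range have the same sign (eg Fuzzy1, Fuzzy2 and Respell in both tables); PKLookup straddles zero in both.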



jira at apache

Jul 11, 2012, 7:06 AM

Post #25 of 94 (1318 views)
[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)

[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411533#comment-13411533 ]

Han Jiang commented on LUCENE-3892:
-----------------------------------

bq. But the problem is some other exception has already been thrown (because success is false). If out.close then hits a second exception we have to pick which one should be thrown, and I think the original one is better? (Since it's likely the root cause of whatever went wrong).

OK, I see, then let's change ForPostingsFormat.fieldsConsumer/Producer as well.
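
To make the quoted reasoning concrete, here is a minimal, self-contained sketch of the success-flag pattern. It is not the actual ForPostingsFormat code; `CloseSketch`, `Body`, `runThenClose` and `closeQuietly` are names made up for illustration. I believe Lucene's `IOUtils.closeWhileHandlingException` plays roughly the role of `closeQuietly` here, but treat that mapping as an assumption.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical illustration of "keep the original exception when close() also fails".
final class CloseSketch {

  // Stand-in for the work done while the resource is open.
  interface Body {
    void run() throws IOException;
  }

  // Closes c and swallows anything it throws, so an earlier exception is not masked.
  static void closeQuietly(Closeable c) {
    try {
      if (c != null) {
        c.close();
      }
    } catch (IOException | RuntimeException ignored) {
      // Intentionally dropped: the exception already in flight is the likely root cause.
    }
  }

  static void runThenClose(Closeable out, Body body) throws IOException {
    boolean success = false;
    try {
      body.run();
      success = true;
    } finally {
      if (success) {
        out.close();       // nothing failed so far: a close() failure should surface
      } else {
        closeQuietly(out); // something already failed: keep that exception
      }
    }
  }

  public static void main(String[] args) {
    Closeable failingClose = () -> { throw new IOException("close failed"); };
    try {
      runThenClose(failingClose, () -> { throw new IOException("root cause"); });
    } catch (IOException e) {
      System.out.println(e.getMessage()); // prints "root cause"
    }
  }
}
```

The same shape should carry over to fieldsConsumer/fieldsProducer: only the failure path needs the quiet close, and the success path leaves closing to whoever owns the returned object.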

