Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-Dev

revisit payloads API in DocsAndPositionsEnum

 

 

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded


rcmuir at gmail

Aug 11, 2012, 7:08 AM

Post #1 of 8 (141 views)
Permalink
revisit payloads API in DocsAndPositionsEnum

The payloads api is really confusing:

/** Returns the payload at this position, or null if no
* payload was indexed. Only call this once per
* position. You should not modify anything (neither
* members of the returned BytesRef nor bytes in the
* byte[]). */
public abstract BytesRef getPayload() throws IOException;

public abstract boolean hasPayload();

1. is it ok for the consumer to call getPayload() [checking for null],
and never call hasPayload? It seems so, so why do we have hasPayload?
can we remove it?
The current situation requires impls to handle 'hasPayload' logic
twice: in hasPayload itself and also in getPayload.

2. You should not modify anything (neither members of the returned
BytesRef nor bytes in the byte[]). Then why do we have this in
TestPayloads.java:
// Just to ensure all codecs can
// handle a caller that mucks with the
// returned payload:
if (rarely()) {
br.bytes = new byte[random().nextInt(5)];
}
br.length = 0;
br.offset = 0;

Testing for this totally disagrees with the javadocs.

3. 'Only call this once per position'. This is totally different than
any of our other 'attributes' on the enums (freq(), startOffset(),
endOffset(), etc). I think we should
remove this.

So I want to propose we remove hasPayload(), remove 'only call once
per position', and remove this test in TestPayloads.java. If we want
to make sure none
of lucene's code itself is mucking with the returned BytesRef, we can
add such a check in AssertingCodec.

/** Returns the payload at this position, or null if no
* payload was indexed. You should not modify anything (neither
* members of the returned BytesRef nor bytes in the
* byte[]). */
public abstract BytesRef getPayload() throws IOException;

--
lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


lucene at mikemccandless

Aug 11, 2012, 10:28 AM

Post #2 of 8 (133 views)
Permalink
Re: revisit payloads API in DocsAndPositionsEnum [In reply to]

+1

Mike McCandless

http://blog.mikemccandless.com


On Sat, Aug 11, 2012 at 10:08 AM, Robert Muir <rcmuir [at] gmail> wrote:
> The payloads api is really confusing:
>
> /** Returns the payload at this position, or null if no
> * payload was indexed. Only call this once per
> * position. You should not modify anything (neither
> * members of the returned BytesRef nor bytes in the
> * byte[]). */
> public abstract BytesRef getPayload() throws IOException;
>
> public abstract boolean hasPayload();
>
> 1. is it ok for the consumer to call getPayload() [checking for null],
> and never call hasPayload? It seems so, so why do we have hasPayload?
> can we remove it?
> The current situation requires impls to handle 'hasPayload' logic
> twice: in hasPayload itself and also in getPayload.
>
> 2. You should not modify anything (neither members of the returned
> BytesRef nor bytes in the byte[]). Then why do we have this in
> TestPayloads.java:
> // Just to ensure all codecs can
> // handle a caller that mucks with the
> // returned payload:
> if (rarely()) {
> br.bytes = new byte[random().nextInt(5)];
> }
> br.length = 0;
> br.offset = 0;
>
> Testing for this totally disagrees with the javadocs.
>
> 3. 'Only call this once per position'. This is totally different than
> any of our other 'attributes' on the enums (freq(), startOffset(),
> endOffset(), etc). I think we should
> remove this.
>
> So I want to propose we remove hasPayload(), remove 'only call once
> per position', and remove this test in TestPayloads.java. If we want
> to make sure none
> of lucene's code itself is mucking with the returned BytesRef, we can
> add such a check in AssertingCodec.
>
> /** Returns the payload at this position, or null if no
> * payload was indexed. You should not modify anything (neither
> * members of the returned BytesRef nor bytes in the
> * byte[]). */
> public abstract BytesRef getPayload() throws IOException;
>
> --
> lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
> For additional commands, e-mail: dev-help [at] lucene
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


simon.willnauer at gmail

Aug 11, 2012, 10:35 AM

Post #3 of 8 (132 views)
Permalink
Re: revisit payloads API in DocsAndPositionsEnum [In reply to]

+1 this makes lots of sense

simon

On Sat, Aug 11, 2012 at 7:28 PM, Michael McCandless
<lucene [at] mikemccandless> wrote:
> +1
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Sat, Aug 11, 2012 at 10:08 AM, Robert Muir <rcmuir [at] gmail> wrote:
>> The payloads api is really confusing:
>>
>> /** Returns the payload at this position, or null if no
>> * payload was indexed. Only call this once per
>> * position. You should not modify anything (neither
>> * members of the returned BytesRef nor bytes in the
>> * byte[]). */
>> public abstract BytesRef getPayload() throws IOException;
>>
>> public abstract boolean hasPayload();
>>
>> 1. is it ok for the consumer to call getPayload() [checking for null],
>> and never call hasPayload? It seems so, so why do we have hasPayload?
>> can we remove it?
>> The current situation requires impls to handle 'hasPayload' logic
>> twice: in hasPayload itself and also in getPayload.
>>
>> 2. You should not modify anything (neither members of the returned
>> BytesRef nor bytes in the byte[]). Then why do we have this in
>> TestPayloads.java:
>> // Just to ensure all codecs can
>> // handle a caller that mucks with the
>> // returned payload:
>> if (rarely()) {
>> br.bytes = new byte[random().nextInt(5)];
>> }
>> br.length = 0;
>> br.offset = 0;
>>
>> Testing for this totally disagrees with the javadocs.
>>
>> 3. 'Only call this once per position'. This is totally different than
>> any of our other 'attributes' on the enums (freq(), startOffset(),
>> endOffset(), etc). I think we should
>> remove this.
>>
>> So I want to propose we remove hasPayload(), remove 'only call once
>> per position', and remove this test in TestPayloads.java. If we want
>> to make sure none
>> of lucene's code itself is mucking with the returned BytesRef, we can
>> add such a check in AssertingCodec.
>>
>> /** Returns the payload at this position, or null if no
>> * payload was indexed. You should not modify anything (neither
>> * members of the returned BytesRef nor bytes in the
>> * byte[]). */
>> public abstract BytesRef getPayload() throws IOException;
>>
>> --
>> lucidimagination.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
>> For additional commands, e-mail: dev-help [at] lucene
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
> For additional commands, e-mail: dev-help [at] lucene
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


rcmuir at gmail

Aug 11, 2012, 10:56 PM

Post #4 of 8 (133 views)
Permalink
Re: revisit payloads API in DocsAndPositionsEnum [In reply to]

Here's a patch: http://pastebin.com/d2DdWxJp

On Sat, Aug 11, 2012 at 1:35 PM, Simon Willnauer
<simon.willnauer [at] gmail> wrote:
> +1 this makes lots of sense
>
> simon
>
> On Sat, Aug 11, 2012 at 7:28 PM, Michael McCandless
> <lucene [at] mikemccandless> wrote:
>> +1
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Sat, Aug 11, 2012 at 10:08 AM, Robert Muir <rcmuir [at] gmail> wrote:
>>> The payloads api is really confusing:
>>>
>>> /** Returns the payload at this position, or null if no
>>> * payload was indexed. Only call this once per
>>> * position. You should not modify anything (neither
>>> * members of the returned BytesRef nor bytes in the
>>> * byte[]). */
>>> public abstract BytesRef getPayload() throws IOException;
>>>
>>> public abstract boolean hasPayload();
>>>
>>> 1. is it ok for the consumer to call getPayload() [checking for null],
>>> and never call hasPayload? It seems so, so why do we have hasPayload?
>>> can we remove it?
>>> The current situation requires impls to handle 'hasPayload' logic
>>> twice: in hasPayload itself and also in getPayload.
>>>
>>> 2. You should not modify anything (neither members of the returned
>>> BytesRef nor bytes in the byte[]). Then why do we have this in
>>> TestPayloads.java:
>>> // Just to ensure all codecs can
>>> // handle a caller that mucks with the
>>> // returned payload:
>>> if (rarely()) {
>>> br.bytes = new byte[random().nextInt(5)];
>>> }
>>> br.length = 0;
>>> br.offset = 0;
>>>
>>> Testing for this totally disagrees with the javadocs.
>>>
>>> 3. 'Only call this once per position'. This is totally different than
>>> any of our other 'attributes' on the enums (freq(), startOffset(),
>>> endOffset(), etc). I think we should
>>> remove this.
>>>
>>> So I want to propose we remove hasPayload(), remove 'only call once
>>> per position', and remove this test in TestPayloads.java. If we want
>>> to make sure none
>>> of lucene's code itself is mucking with the returned BytesRef, we can
>>> add such a check in AssertingCodec.
>>>
>>> /** Returns the payload at this position, or null if no
>>> * payload was indexed. You should not modify anything (neither
>>> * members of the returned BytesRef nor bytes in the
>>> * byte[]). */
>>> public abstract BytesRef getPayload() throws IOException;
>>>
>>> --
>>> lucidimagination.com
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
>>> For additional commands, e-mail: dev-help [at] lucene
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
>> For additional commands, e-mail: dev-help [at] lucene
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
> For additional commands, e-mail: dev-help [at] lucene
>



--
lucidworks.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


serera at gmail

Aug 11, 2012, 11:10 PM

Post #5 of 8 (133 views)
Permalink
Re: revisit payloads API in DocsAndPositionsEnum [In reply to]

Looks good.

Perhaps separately, what do you think about doing the same to
Spans.isPayloadAvailable/getPayload?

Shai

On Sun, Aug 12, 2012 at 8:56 AM, Robert Muir <rcmuir [at] gmail> wrote:

> Here's a patch: http://pastebin.com/d2DdWxJp
>
> On Sat, Aug 11, 2012 at 1:35 PM, Simon Willnauer
> <simon.willnauer [at] gmail> wrote:
> > +1 this makes lots of sense
> >
> > simon
> >
> > On Sat, Aug 11, 2012 at 7:28 PM, Michael McCandless
> > <lucene [at] mikemccandless> wrote:
> >> +1
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >>
> >> On Sat, Aug 11, 2012 at 10:08 AM, Robert Muir <rcmuir [at] gmail> wrote:
> >>> The payloads api is really confusing:
> >>>
> >>> /** Returns the payload at this position, or null if no
> >>> * payload was indexed. Only call this once per
> >>> * position. You should not modify anything (neither
> >>> * members of the returned BytesRef nor bytes in the
> >>> * byte[]). */
> >>> public abstract BytesRef getPayload() throws IOException;
> >>>
> >>> public abstract boolean hasPayload();
> >>>
> >>> 1. is it ok for the consumer to call getPayload() [checking for null],
> >>> and never call hasPayload? It seems so, so why do we have hasPayload?
> >>> can we remove it?
> >>> The current situation requires impls to handle 'hasPayload' logic
> >>> twice: in hasPayload itself and also in getPayload.
> >>>
> >>> 2. You should not modify anything (neither members of the returned
> >>> BytesRef nor bytes in the byte[]). Then why do we have this in
> >>> TestPayloads.java:
> >>> // Just to ensure all codecs can
> >>> // handle a caller that mucks with the
> >>> // returned payload:
> >>> if (rarely()) {
> >>> br.bytes = new byte[random().nextInt(5)];
> >>> }
> >>> br.length = 0;
> >>> br.offset = 0;
> >>>
> >>> Testing for this totally disagrees with the javadocs.
> >>>
> >>> 3. 'Only call this once per position'. This is totally different than
> >>> any of our other 'attributes' on the enums (freq(), startOffset(),
> >>> endOffset(), etc). I think we should
> >>> remove this.
> >>>
> >>> So I want to propose we remove hasPayload(), remove 'only call once
> >>> per position', and remove this test in TestPayloads.java. If we want
> >>> to make sure none
> >>> of lucene's code itself is mucking with the returned BytesRef, we can
> >>> add such a check in AssertingCodec.
> >>>
> >>> /** Returns the payload at this position, or null if no
> >>> * payload was indexed. You should not modify anything (neither
> >>> * members of the returned BytesRef nor bytes in the
> >>> * byte[]). */
> >>> public abstract BytesRef getPayload() throws IOException;
> >>>
> >>> --
> >>> lucidimagination.com
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
> >>> For additional commands, e-mail: dev-help [at] lucene
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
> >> For additional commands, e-mail: dev-help [at] lucene
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe [at] lucene
> > For additional commands, e-mail: dev-help [at] lucene
> >
>
>
>
> --
> lucidworks.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
> For additional commands, e-mail: dev-help [at] lucene
>
>


rcmuir at gmail

Aug 11, 2012, 11:18 PM

Post #6 of 8 (132 views)
Permalink
Re: revisit payloads API in DocsAndPositionsEnum [In reply to]

That would definitely be separate. I looked into this: the problem is
things like LUCENE-4219.

So the current behavior for payload-using span queries (at least
span-near with payloads) is wrong, they score differently depending
upon whether you next() or advance() them (which is horrible!), so I
completely don't trust any of the existing tests.

I think we need to either first fix the bugs in spans or do something
else with them before trying to simplify their APIs.

On Sun, Aug 12, 2012 at 2:10 AM, Shai Erera <serera [at] gmail> wrote:
> Looks good.
>
> Perhaps separately, what do you think about doing the same to
> Spans.isPayloadAvailable/getPayload?
>
> Shai
>
>
> On Sun, Aug 12, 2012 at 8:56 AM, Robert Muir <rcmuir [at] gmail> wrote:
>>
>> Here's a patch: http://pastebin.com/d2DdWxJp
>>
>> On Sat, Aug 11, 2012 at 1:35 PM, Simon Willnauer
>> <simon.willnauer [at] gmail> wrote:
>> > +1 this makes lots of sense
>> >
>> > simon
>> >
>> > On Sat, Aug 11, 2012 at 7:28 PM, Michael McCandless
>> > <lucene [at] mikemccandless> wrote:
>> >> +1
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >>
>> >> On Sat, Aug 11, 2012 at 10:08 AM, Robert Muir <rcmuir [at] gmail> wrote:
>> >>> The payloads api is really confusing:
>> >>>
>> >>> /** Returns the payload at this position, or null if no
>> >>> * payload was indexed. Only call this once per
>> >>> * position. You should not modify anything (neither
>> >>> * members of the returned BytesRef nor bytes in the
>> >>> * byte[]). */
>> >>> public abstract BytesRef getPayload() throws IOException;
>> >>>
>> >>> public abstract boolean hasPayload();
>> >>>
>> >>> 1. is it ok for the consumer to call getPayload() [checking for null],
>> >>> and never call hasPayload? It seems so, so why do we have hasPayload?
>> >>> can we remove it?
>> >>> The current situation requires impls to handle 'hasPayload' logic
>> >>> twice: in hasPayload itself and also in getPayload.
>> >>>
>> >>> 2. You should not modify anything (neither members of the returned
>> >>> BytesRef nor bytes in the byte[]). Then why do we have this in
>> >>> TestPayloads.java:
>> >>> // Just to ensure all codecs can
>> >>> // handle a caller that mucks with the
>> >>> // returned payload:
>> >>> if (rarely()) {
>> >>> br.bytes = new byte[random().nextInt(5)];
>> >>> }
>> >>> br.length = 0;
>> >>> br.offset = 0;
>> >>>
>> >>> Testing for this totally disagrees with the javadocs.
>> >>>
>> >>> 3. 'Only call this once per position'. This is totally different than
>> >>> any of our other 'attributes' on the enums (freq(), startOffset(),
>> >>> endOffset(), etc). I think we should
>> >>> remove this.
>> >>>
>> >>> So I want to propose we remove hasPayload(), remove 'only call once
>> >>> per position', and remove this test in TestPayloads.java. If we want
>> >>> to make sure none
>> >>> of lucene's code itself is mucking with the returned BytesRef, we can
>> >>> add such a check in AssertingCodec.
>> >>>
>> >>> /** Returns the payload at this position, or null if no
>> >>> * payload was indexed. You should not modify anything (neither
>> >>> * members of the returned BytesRef nor bytes in the
>> >>> * byte[]). */
>> >>> public abstract BytesRef getPayload() throws IOException;
>> >>>
>> >>> --
>> >>> lucidimagination.com
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
>> >>> For additional commands, e-mail: dev-help [at] lucene
>> >>>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
>> >> For additional commands, e-mail: dev-help [at] lucene
>> >>
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe [at] lucene
>> > For additional commands, e-mail: dev-help [at] lucene
>> >
>>
>>
>>
>> --
>> lucidworks.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
>> For additional commands, e-mail: dev-help [at] lucene
>>
>



--
lucidworks.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


serera at gmail

Aug 11, 2012, 11:29 PM

Post #7 of 8 (137 views)
Permalink
Re: revisit payloads API in DocsAndPositionsEnum [In reply to]

I agree

Shai

On Sun, Aug 12, 2012 at 9:18 AM, Robert Muir <rcmuir [at] gmail> wrote:

> That would definitely be separate. I looked into this: the problem is
> things like LUCENE-4219.
>
> So the current behavior for payload-using span queries (at least
> span-near with payloads) is wrong, they score differently depending
> upon whether you next() or advance() them (which is horrible!), so I
> completely don't trust any of the existing tests.
>
> I think we need to either first fix the bugs in spans or do something
> else with them before trying to simplify their APIs.
>
> On Sun, Aug 12, 2012 at 2:10 AM, Shai Erera <serera [at] gmail> wrote:
> > Looks good.
> >
> > Perhaps separately, what do you think about doing the same to
> > Spans.isPayloadAvailable/getPayload?
> >
> > Shai
> >
> >
> > On Sun, Aug 12, 2012 at 8:56 AM, Robert Muir <rcmuir [at] gmail> wrote:
> >>
> >> Here's a patch: http://pastebin.com/d2DdWxJp
> >>
> >> On Sat, Aug 11, 2012 at 1:35 PM, Simon Willnauer
> >> <simon.willnauer [at] gmail> wrote:
> >> > +1 this makes lots of sense
> >> >
> >> > simon
> >> >
> >> > On Sat, Aug 11, 2012 at 7:28 PM, Michael McCandless
> >> > <lucene [at] mikemccandless> wrote:
> >> >> +1
> >> >>
> >> >> Mike McCandless
> >> >>
> >> >> http://blog.mikemccandless.com
> >> >>
> >> >>
> >> >> On Sat, Aug 11, 2012 at 10:08 AM, Robert Muir <rcmuir [at] gmail>
> wrote:
> >> >>> The payloads api is really confusing:
> >> >>>
> >> >>> /** Returns the payload at this position, or null if no
> >> >>> * payload was indexed. Only call this once per
> >> >>> * position. You should not modify anything (neither
> >> >>> * members of the returned BytesRef nor bytes in the
> >> >>> * byte[]). */
> >> >>> public abstract BytesRef getPayload() throws IOException;
> >> >>>
> >> >>> public abstract boolean hasPayload();
> >> >>>
> >> >>> 1. is it ok for the consumer to call getPayload() [checking for
> null],
> >> >>> and never call hasPayload? It seems so, so why do we have
> hasPayload?
> >> >>> can we remove it?
> >> >>> The current situation requires impls to handle 'hasPayload'
> logic
> >> >>> twice: in hasPayload itself and also in getPayload.
> >> >>>
> >> >>> 2. You should not modify anything (neither members of the returned
> >> >>> BytesRef nor bytes in the byte[]). Then why do we have this in
> >> >>> TestPayloads.java:
> >> >>> // Just to ensure all codecs can
> >> >>> // handle a caller that mucks with the
> >> >>> // returned payload:
> >> >>> if (rarely()) {
> >> >>> br.bytes = new byte[random().nextInt(5)];
> >> >>> }
> >> >>> br.length = 0;
> >> >>> br.offset = 0;
> >> >>>
> >> >>> Testing for this totally disagrees with the javadocs.
> >> >>>
> >> >>> 3. 'Only call this once per position'. This is totally different
> than
> >> >>> any of our other 'attributes' on the enums (freq(), startOffset(),
> >> >>> endOffset(), etc). I think we should
> >> >>> remove this.
> >> >>>
> >> >>> So I want to propose we remove hasPayload(), remove 'only call once
> >> >>> per position', and remove this test in TestPayloads.java. If we want
> >> >>> to make sure none
> >> >>> of lucene's code itself is mucking with the returned BytesRef, we
> can
> >> >>> add such a check in AssertingCodec.
> >> >>>
> >> >>> /** Returns the payload at this position, or null if no
> >> >>> * payload was indexed. You should not modify anything (neither
> >> >>> * members of the returned BytesRef nor bytes in the
> >> >>> * byte[]). */
> >> >>> public abstract BytesRef getPayload() throws IOException;
> >> >>>
> >> >>> --
> >> >>> lucidimagination.com
> >> >>>
> >> >>>
> ---------------------------------------------------------------------
> >> >>> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
> >> >>> For additional commands, e-mail: dev-help [at] lucene
> >> >>>
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
> >> >> For additional commands, e-mail: dev-help [at] lucene
> >> >>
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: dev-unsubscribe [at] lucene
> >> > For additional commands, e-mail: dev-help [at] lucene
> >> >
> >>
> >>
> >>
> >> --
> >> lucidworks.com
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
> >> For additional commands, e-mail: dev-help [at] lucene
> >>
> >
>
>
>
> --
> lucidworks.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
> For additional commands, e-mail: dev-help [at] lucene
>
>


lucene at mikemccandless

Aug 12, 2012, 11:21 AM

Post #8 of 8 (133 views)
Permalink
Re: revisit payloads API in DocsAndPositionsEnum [In reply to]

+1, this is a great cleanup. Thank Robert!

Mike McCandless

http://blog.mikemccandless.com

On Sun, Aug 12, 2012 at 1:56 AM, Robert Muir <rcmuir [at] gmail> wrote:
> Here's a patch: http://pastebin.com/d2DdWxJp
>
> On Sat, Aug 11, 2012 at 1:35 PM, Simon Willnauer
> <simon.willnauer [at] gmail> wrote:
>> +1 this makes lots of sense
>>
>> simon
>>
>> On Sat, Aug 11, 2012 at 7:28 PM, Michael McCandless
>> <lucene [at] mikemccandless> wrote:
>>> +1
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Sat, Aug 11, 2012 at 10:08 AM, Robert Muir <rcmuir [at] gmail> wrote:
>>>> The payloads api is really confusing:
>>>>
>>>> /** Returns the payload at this position, or null if no
>>>> * payload was indexed. Only call this once per
>>>> * position. You should not modify anything (neither
>>>> * members of the returned BytesRef nor bytes in the
>>>> * byte[]). */
>>>> public abstract BytesRef getPayload() throws IOException;
>>>>
>>>> public abstract boolean hasPayload();
>>>>
>>>> 1. is it ok for the consumer to call getPayload() [checking for null],
>>>> and never call hasPayload? It seems so, so why do we have hasPayload?
>>>> can we remove it?
>>>> The current situation requires impls to handle 'hasPayload' logic
>>>> twice: in hasPayload itself and also in getPayload.
>>>>
>>>> 2. You should not modify anything (neither members of the returned
>>>> BytesRef nor bytes in the byte[]). Then why do we have this in
>>>> TestPayloads.java:
>>>> // Just to ensure all codecs can
>>>> // handle a caller that mucks with the
>>>> // returned payload:
>>>> if (rarely()) {
>>>> br.bytes = new byte[random().nextInt(5)];
>>>> }
>>>> br.length = 0;
>>>> br.offset = 0;
>>>>
>>>> Testing for this totally disagrees with the javadocs.
>>>>
>>>> 3. 'Only call this once per position'. This is totally different than
>>>> any of our other 'attributes' on the enums (freq(), startOffset(),
>>>> endOffset(), etc). I think we should
>>>> remove this.
>>>>
>>>> So I want to propose we remove hasPayload(), remove 'only call once
>>>> per position', and remove this test in TestPayloads.java. If we want
>>>> to make sure none
>>>> of lucene's code itself is mucking with the returned BytesRef, we can
>>>> add such a check in AssertingCodec.
>>>>
>>>> /** Returns the payload at this position, or null if no
>>>> * payload was indexed. You should not modify anything (neither
>>>> * members of the returned BytesRef nor bytes in the
>>>> * byte[]). */
>>>> public abstract BytesRef getPayload() throws IOException;
>>>>
>>>> --
>>>> lucidimagination.com
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
>>>> For additional commands, e-mail: dev-help [at] lucene
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
>>> For additional commands, e-mail: dev-help [at] lucene
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
>> For additional commands, e-mail: dev-help [at] lucene
>>
>
>
>
> --
> lucidworks.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe [at] lucene
> For additional commands, e-mail: dev-help [at] lucene
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.