Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-Dev

SinkTokenizer: next(Token) vs. next()

 

 

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded


cdoronc at gmail

Dec 26, 2007, 3:20 PM

Post #1 of 9 (2834 views)
Permalink
SinkTokenizer: next(Token) vs. next()

Working on Lucene-1101 I checked if SinkTokenizer.next(Token) should also
call Token.clear(). (It shouldn't, because it ignores the input token.)

However I think that calls to next() would end up creating Tokens for
nothing (by TokenStream.next()).

May currently be an empty case (if all current uses call next(Token)), but
still - is it safer for SinkTokenizer to implement next() rather than
next(Token)?


gsingers at apache

Dec 27, 2007, 6:20 AM

Post #2 of 9 (2732 views)
Permalink
Re: SinkTokenizer: next(Token) vs. next() [In reply to]

On Dec 26, 2007, at 6:20 PM, Doron Cohen wrote:

> Working on Lucene-1101 I checked if SinkTokenizer.next(Token) should
> also
> call Token.clear(). (It shouldn't, because it ignores the input
> token.)
>
> However I think that calls to next() would end up creating Tokens for
> nothing (by TokenStream.next()).
>
> May currently be an empty case (if all current uses call
> next(Token)), but
> still - is it safer for SinkTokenizer to implement next() rather than
> next(Token)?

I'm still a bit fuzzy on the interplay of these myself, but what makes
the call of SinkTokenizer.next(Token) unsafe or is it just the
potential of Tokens being created? I guess SinkTokenizer could just
override both methods.

-Grant



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


cdoronc at gmail

Dec 28, 2007, 5:20 AM

Post #3 of 9 (2724 views)
Permalink
Re: SinkTokenizer: next(Token) vs. next() [In reply to]

Hi Grant,

"safer" was not the best wording, sorry for that - I meant performance
wise, there's no correctness issue.

The "contract" of the two next methods as I understand it is that
a TS must implement one of them. I see no harm in implementing
the two (but doing so is likely to just duplicate TokenStream's code.)

For SinkTokenizer it actually implements next with no reuse logic,
so it really should implement just next(). Then, if any consumer
of SinkTokenizer calls next(Token), the default impl of this method
in TokenStream would call SinkTokenizers' next().

Do you agree with this?

Cheers,
Doron

On Dec 27, 2007 4:20 PM, Grant Ingersoll <gsingers [at] apache> wrote:

>
> On Dec 26, 2007, at 6:20 PM, Doron Cohen wrote:
>
> > Working on Lucene-1101 I checked if SinkTokenizer.next(Token) should
> > also
> > call Token.clear(). (It shouldn't, because it ignores the input
> > token.)
> >
> > However I think that calls to next() would end up creating Tokens for
> > nothing (by TokenStream.next()).
> >
> > May currently be an empty case (if all current uses call
> > next(Token)), but
> > still - is it safer for SinkTokenizer to implement next() rather than
> > next(Token)?
>
> I'm still a bit fuzzy on the interplay of these myself, but what makes
> the call of SinkTokenizer.next(Token) unsafe or is it just the
> potential of Tokens being created? I guess SinkTokenizer could just
> override both methods.
>
> -Grant
>


yonik at apache

Dec 28, 2007, 5:37 AM

Post #4 of 9 (2729 views)
Permalink
Re: SinkTokenizer: next(Token) vs. next() [In reply to]

On Dec 28, 2007 8:20 AM, Doron Cohen <cdoronc [at] gmail> wrote:
> The "contract" of the two next methods as I understand it is that
> a TS must implement one of them. I see no harm in implementing
> the two (but doing so is likely to just duplicate TokenStream's code.)

I don't think the contract was ever laid out so strictly. I think
it's fine for any TokenStream to implement both if it's advantageous
to do so.

> For SinkTokenizer it actually implements next with no reuse logic,
> so it really should implement just next(). Then, if any consumer
> of SinkTokenizer calls next(Token), the default impl of this method
> in TokenStream would call SinkTokenizers' next().
>
> Do you agree with this?

A agree. The current implementation is sub-optimal if the caller uses next()

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


cdoronc at gmail

Dec 28, 2007, 5:43 AM

Post #5 of 9 (2719 views)
Permalink
Re: SinkTokenizer: next(Token) vs. next() [In reply to]

>
> > a TS must implement one of them. I see no harm in implementing
> > the two (but doing so is likely to just duplicate TokenStream's code.)
>
> I don't think the contract was ever laid out so strictly. I think
> it's fine for any TokenStream to implement both if it's advantageous
> to do so.
>

From TokenStream's Javadocs:
"subclasses must override at least one of next() or next(Token)."


yonik at apache

Dec 28, 2007, 5:54 AM

Post #6 of 9 (2725 views)
Permalink
Re: SinkTokenizer: next(Token) vs. next() [In reply to]

On Dec 28, 2007 8:43 AM, Doron Cohen <cdoronc [at] gmail> wrote:
> >
> > > a TS must implement one of them. I see no harm in implementing
> > > the two (but doing so is likely to just duplicate TokenStream's code.)
> >
> > I don't think the contract was ever laid out so strictly. I think
> > it's fine for any TokenStream to implement both if it's advantageous
> > to do so.
> >
>
> From TokenStream's Javadocs:
> "subclasses must override at least one of next() or next(Token)."

Meaning that it's also fine to override both. We are agreeing, right?

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


gsingers at apache

Dec 28, 2007, 6:10 AM

Post #7 of 9 (2722 views)
Permalink
Re: SinkTokenizer: next(Token) vs. next() [In reply to]

I'm fine w/ making this change. No sense in implementing both as we
can just rely on next(Token) to call next(). I will commit the change
and put a comment on the issue that created the SinkTokenizer.


-Grant

On Dec 28, 2007, at 8:43 AM, Doron Cohen wrote:

>>
>>> a TS must implement one of them. I see no harm in implementing
>>> the two (but doing so is likely to just duplicate TokenStream's
>>> code.)
>>
>> I don't think the contract was ever laid out so strictly. I think
>> it's fine for any TokenStream to implement both if it's advantageous
>> to do so.
>>
>
> From TokenStream's Javadocs:
> "subclasses must override at least one of next() or next(Token)."



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


cdoronc at gmail

Dec 28, 2007, 6:26 AM

Post #8 of 9 (2751 views)
Permalink
Re: SinkTokenizer: next(Token) vs. next() [In reply to]

On Dec 28, 2007 3:54 PM, Yonik Seeley <yonik [at] apache> wrote:

> On Dec 28, 2007 8:43 AM, Doron Cohen <cdoronc [at] gmail> wrote:
> > >
> > > > a TS must implement one of them. I see no harm in implementing
> > > > the two (but doing so is likely to just duplicate TokenStream's
> code.)
> > >
> > > I don't think the contract was ever laid out so strictly. I think
> > > it's fine for any TokenStream to implement both if it's advantageous
> > > to do so.
> > >
> >
> > From TokenStream's Javadocs:
> > "subclasses must override at least one of next() or next(Token)."
>
> Meaning that it's also fine to override both. We are agreeing, right?


Yes :-)

Doron


cdoronc at gmail

Dec 28, 2007, 6:27 AM

Post #9 of 9 (2720 views)
Permalink
Re: SinkTokenizer: next(Token) vs. next() [In reply to]

On Dec 28, 2007 4:10 PM, Grant Ingersoll <gsingers [at] apache> wrote:

> I'm fine w/ making this change. No sense in implementing both as we
> can just rely on next(Token) to call next(). I will commit the change
> and put a comment on the issue that created the SinkTokenizer.
>

Cool thanks!

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.