
Mailing List Archive: Lucene: Java-User

docBase Parameter in Collector.setNextReader

 

 



benhei at gmail

Nov 12, 2009, 1:25 PM

Post #1 of 7
docBase Parameter in Collector.setNextReader

Hello everyone,

I'm a little bit confused about the docBase parameter of
Collector.setNextReader.

Imagine the following:
- Create new Index
- Index 5 docs
- Call IndexWriter.commit()
- Index 7 docs
- Call IndexWriter.commit()
- close Writer

Now I have a 2-segment index, right?

I have implemented my own Collector. If I execute a query matching all
docs against the index above, the Collector's setNextReader method is
called twice (as I expected).
But docBase equals 0 both times. Shouldn't it be 0 and 5?

What mistake could trigger such behaviour?


Benjamin

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


lucene at mikemccandless

Nov 12, 2009, 2:14 PM

Post #2 of 7
Re: docBase Parameter in Collector.setNextReader

Yes, it should be 0 and 5.

I'm not sure what would cause 0 and 0, offhand.

Can you make a small standalone test case showing it?

Mike
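
To illustrate the expected 0 and 5 above, here is a minimal plain-Java sketch of the arithmetic. It does not use Lucene itself: the class and method names (DocBaseSketch, docStarts, globalDocId) are made up for illustration, and it assumes segments are visited in index order, so that each segment's docBase is the total doc count of the segments before it.

```java
import java.util.Arrays;

// Plain-Java sketch, no Lucene dependency. Assumption: the docBase passed
// for a segment equals the sum of the doc counts of all earlier segments.
final class DocBaseSketch {

    // Returns the starting (base) doc ID of every segment, in segment order.
    static int[] docStarts(int[] segmentSizes) {
        int[] starts = new int[segmentSizes.length];
        int base = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            starts[i] = base;
            base += segmentSizes[i];
        }
        return starts;
    }

    // Rebases a segment-relative doc ID to an index-wide doc ID.
    static int globalDocId(int docBase, int segmentDocId) {
        return docBase + segmentDocId;
    }

    public static void main(String[] args) {
        // Two commits of 5 and 7 docs, as in the question.
        int[] starts = docStarts(new int[] {5, 7});
        System.out.println(Arrays.toString(starts));    // prints [0, 5]
        System.out.println(globalDocId(starts[1], 2));  // prints 7
    }
}
```

Under that assumption, a Collector that wants index-wide doc IDs would remember the docBase from the latest setNextReader call and add it to each doc ID it receives in collect.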



uwe at thetaphi

Nov 12, 2009, 2:20 PM

Post #3 of 7
RE: docBase Parameter in Collector.setNextReader

Could it be that you are using the expert IndexSearcher ctor that takes the
sub-reader array and docStarts?

Otherwise it is impossible for all docBases to be 0 (look into the code).

By the way, the docStarts should be 5 and then 0, as IndexSearcher starts to
search bigger segments first. Maybe this is your problem, that you have only
looked at the second call?

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi



uwe at thetaphi

Nov 12, 2009, 2:28 PM

Post #4 of 7
RE: docBase Parameter in Collector.setNextReader

> By the way, the docStarts should be 5 and then 0, as IndexSearcher starts to
> search bigger segments first. Maybe this is your problem, that you have only
> looked at the second call?

Oh, that's no longer the case. Sorry. The docBases should be in ascending
order. Mike: What was the reason for this change? By the way,
Oal.util.SortTemplate is now dead code, but it's released, so we cannot
remove it (but it's really handy). :-)

Uwe



lucene at mikemccandless

Nov 12, 2009, 4:33 PM

Post #5 of 7
Re: docBase Parameter in Collector.setNextReader

On Thu, Nov 12, 2009 at 5:28 PM, Uwe Schindler <uwe [at] thetaphi> wrote:

> Mike: What was the reason for this change?

We first thought this (visiting segments from largest to smallest
size) improved performance, but then we decided a better optimization
was to let Collectors skip tie-breaking, since they can rely on docIDs
always arriving in order. So we reverted the change.

> By the way, Oal.util.SortTemplate
> is now dead code, but it's released, so we cannot remove it (but it's really
> handy). :-)

Sigh! We need a dead-code-finder. Would have also discovered we no
longer use Directory.touchFile ;)

Mike
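
A rough sketch of the tie-breaking optimization described above, in plain Java rather than Lucene's own classes. The names (InOrderTopN, Hit, collect, topDocs) are hypothetical, and the sketch assumes that among equal scores the lower docID ranks higher and that collect is always called with increasing docIDs. Under those assumptions a new hit that only ties the weakest queued score can never displace it, so a single score comparison is enough and docIDs never need to be compared on the hot path.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java sketch, no Lucene dependency. Keeps the n best hits.
final class InOrderTopN {

    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Weakest hit first: lowest score, and among equal scores the higher doc ID.
    private static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> {
        int c = Float.compare(a.score, b.score);
        return c != 0 ? c : Integer.compare(b.doc, a.doc);
    };

    private final int n;
    private final PriorityQueue<Hit> queue = new PriorityQueue<>(WEAKEST_FIRST);

    InOrderTopN(int n) {
        this.n = n;
    }

    // Assumption: called with strictly increasing doc IDs.
    void collect(int doc, float score) {
        if (queue.size() < n) {
            queue.add(new Hit(doc, score));
            return;
        }
        // doc is larger than every queued doc ID, so on an equal score the
        // new hit loses the tie. No doc ID comparison is needed here.
        if (score <= queue.peek().score) {
            return;
        }
        queue.poll();
        queue.add(new Hit(doc, score));
    }

    // Returns the collected hits, best first.
    List<Hit> topDocs() {
        List<Hit> hits = new ArrayList<>(queue);
        hits.sort(WEAKEST_FIRST.reversed());
        return hits;
    }

    public static void main(String[] args) {
        InOrderTopN collector = new InOrderTopN(2);
        collector.collect(0, 1.0f);
        collector.collect(1, 2.0f);
        collector.collect(2, 1.0f); // ties the weakest hit, skipped
        collector.collect(3, 2.0f); // beats the weakest hit, replaces doc 0
        for (Hit h : collector.topDocs()) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```

If hits could arrive out of order, the `score <= queue.peek().score` check would be wrong for ties, because a tying hit with a lower docID should then replace the queued one.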



benhei at gmail

Nov 13, 2009, 2:22 AM

Post #6 of 7
Re: docBase Parameter in Collector.setNextReader

Hello,

Sorry for causing inconvenience.
It was my mistake, and I wasn't able to reproduce it completely this morning.

My test case was a little too complex, and there were two or three bugs /
false assumptions which made it look to me like what I explained above.


Benjamin



lucene at mikemccandless

Nov 13, 2009, 2:36 AM

Post #7 of 7
Re: docBase Parameter in Collector.setNextReader

Phew :)

Thanks for bringing closure!

Mike

