
Mailing List Archive: Lucene: Java-User

BlockGroupingCollector, not always getting first document

 

 



goliatus at polzone

Mar 8, 2012, 1:30 AM

Post #1 of 7
BlockGroupingCollector, not always getting first document

Hello,

I am using BlockGroupingCollector for the first time and I have a small
problem with it. The indexing code is pretty much a copy of the one from
the docs. The search code looks like this:

    Filter groupEndFilter = new CachingWrapperFilter(
        new QueryWrapperFilter(new TermQuery(new Term("last", "true"))));
    ...
    BlockGroupingCollector c =
        new BlockGroupingCollector(SORT_SCORE, offset + n, false, groupEndFilter);
    searcher.search(query, filter, c);
    TopGroups groups = c.getTopGroups(SORT_ID, offset, 0, 1, true);
    if (groups != null) {
        results.total_hits = groups.totalGroupCount.intValue();
        for (int i = 0; i < groups.groups.length; i++)
            if (groups.groups[i].totalHits > 0)
                results.add(getResult(searcher, groups.groups[i].scoreDocs[0]));
    }

So I want to get the top groups for a given query, with the documents in
each group sorted by their IDs. For some reason I don't always get the
first document of the group: roughly every 10th group in the search
results does not have the document with the lowest ID in the first
position of scoreDocs.
ID is a numeric field. Sorting groups by field values works fine.
The documents are also sorted by their IDs during indexing, and I'm
adding them as a block.

What am I doing wrong?

--
Regards,
Grzegorz
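
For readers unfamiliar with block grouping: the groupEndFilter in the search code above matches the documents carrying last=true, which mark the final document of each block. The following is a minimal sketch of that idea in plain Java, not Lucene code. The class BlockBoundaries, the method split and the boolean array are hypothetical names used only for illustration, and ignoring documents after the final marker is a choice made by this model.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockBoundaries {
    /**
     * Splits doc IDs 0..n-1 into blocks; a block ends at each doc whose flag is true.
     * In this model, docs after the final marker do not form a complete block and are ignored.
     */
    static List<List<Integer>> split(boolean[] isLast) {
        List<List<Integer>> blocks = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (int doc = 0; doc < isLast.length; doc++) {
            current.add(doc);
            if (isLast[doc]) {
                // The marker doc closes the current block; start a new one.
                blocks.add(current);
                current = new ArrayList<>();
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Five docs in index order; docs 2 and 4 carry the "last" marker.
        boolean[] isLast = {false, false, true, false, true};
        System.out.println(split(isLast)); // [[0, 1, 2], [3, 4]]
    }
}
```

Because group membership depends only on index order and the end markers, all documents of a group have to be added together as one contiguous block, with the marker on the last one.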


lucene at mikemccandless

Mar 8, 2012, 3:12 AM

Post #2 of 7
Re: BlockGroupingCollector, not always getting first document

Hmm... that doesn't sound good.

Is the issue repeatable once it happens? And, when it happens, can
you verify that the index is correct (e.g., the missing doc is
retrievable by non-grouped searches)? This way we can isolate the
issue to the search side.

Can you boil it down to a small test case?

Mike McCandless

http://blog.mikemccandless.com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


grzegorz.tanczyk at polskastrefa

Mar 8, 2012, 4:22 AM

Post #3 of 7
Re: Re: BlockGroupingCollector, not always getting first document

Hello,

Thanks for the reply. I can find the first document of a group using a
non-grouping search.

To be sure about this, I deleted the index and indexed only the first
100 groups, which gives around 2300 documents, and I see the problem in
at least half of the groups. There is no problem finding the first
documents normally.
I first noticed this problem when I had indexed a few thousand groups.

When I index everything (15k groups, which means around 200k documents,
with a commit every 500 groups), the problem goes away, or at least I
can't find any group with a non-first document in scoreDocs[0]. I have
been reindexing since this morning, and I will reindex once again to be
sure about this.

I'm not a Lucene internals expert, but maybe this problem is somehow
connected to segment merging?

Some additional info:

I'm using Lucene 3.5.0.

Sort:
public final static Sort SORT_ID = new Sort(new SortField("id_n",
SortField.INT));

Adding the field to the document:
doc.add(new NumericField("id_n", Store.NO, true).setIntValue(rs.getInt(1)));

(I checked how it works with Store.YES; it didn't change anything.)

I also call searcher.setDefaultFieldSortScoring(true, true) before the
grouping search.

Calling optimize() also didn't help (but I wouldn't use this method
anyway, even if it were the solution to this problem ;-) )

The index writer config has default settings.

For now I'm using a workaround, but I'm looking forward to finding a
solution to this problem.




lucene at mikemccandless

Mar 9, 2012, 3:06 AM

Post #5 of 7
Re: Re: BlockGroupingCollector, not always getting first document

On Thu, Mar 8, 2012 at 7:22 AM, Grzegorz Tańczyk
<grzegorz.tanczyk [at] polskastrefa> wrote:
> Hello,
>
> Thanks for reply, I can find first document from group using non grouping
> search.

OK, so the index seems ok.

> To be sure about this I deleted index and indexed only first 100 groups
> which gives around 2300 documents and I see the problem on at least half of
> groups.  No problem in finding first documents normally.
> I noticed this problem first when I had indexed few thousands groups.

Hmm.

> When I index everything(15k groups, which means around 200k documents,
> commit every 500 groups) the problem is no more or at least I can't find any
> group with non first document in scoreDocs[0]. I'm reindexing it since
> morning, I will reindex it once again to be sure about this one.

Weird that the full index doesn't show the issue but the partial index does.

> I'm not Lucene internals expert, but maybe this problem is somehow connected
> to segment merging?

Well, a simple way to test this is to set NoMergePolicy on the
IndexWriterConfig.

> Some additional info:
>
> I'm using Lucene 3.5.0.
>
> Sort:
> public final static Sort SORT_ID = new Sort(new SortField("id_n",
> SortField.INT));
>
> Adding field to document:
> doc.add(new NumericField("id_n", Store.NO, true).setIntValue(rs.getInt(1)));
>
> (I checked how it works with Store.YES, it didn't change anything.)
>
> I also call searcher.setDefaultFieldSortScoring(true, true) before grouping
> search.

If you don't call this, is the issue still there?

> Calling optimize() also didn't help(but anyway I wouldn't use this method
> even if it was the solution for this problem ;-) )

OK. Did calling optimize() change which docs were missing...?

> Index writer config has default settings.

Are you doing any deleteDocuments or updateDocument calls?

> For now I'm using workaround, but I'm looking forward to finding solution of
> this problem.

Wait, what's the workaround?

I noticed you pass maxDocsPerGroup=1; if you increase that (e.g., to 10)
does it change the bug...?

Is it possible to boil this down to a small test case?

Mike McCandless

http://blog.mikemccandless.com



goliatus at polzone

Mar 9, 2012, 5:52 AM

Post #6 of 7
Re: Re: Re: BlockGroupingCollector, not always getting first document

Hello,

I found the problem, and it was my misunderstanding. I didn't get the
first document in every group because some of the head documents didn't
match the given query. I wrongly assumed that the sort applies across
all documents within a group, not just the matching ones.

--
Regards,
Grzegorz
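
The behaviour described above can be illustrated with a small model in plain Java. This is only a sketch, not Lucene code: the class GroupSortModel, the nested Doc type and the method topDocsById are hypothetical names. It assumes, as the post concludes, that only documents matching the query are collected, and that the within-group sort and the maxDocsPerGroup cut-off are then applied to those documents alone.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class GroupSortModel {
    /** Hypothetical stand-in for an indexed document: its numeric ID and whether the query matches it. */
    static final class Doc {
        final int id;
        final boolean matchesQuery;

        Doc(int id, boolean matchesQuery) {
            this.id = id;
            this.matchesQuery = matchesQuery;
        }
    }

    /** Returns the IDs of the matching docs in one group, sorted ascending and capped at maxDocsPerGroup. */
    static List<Integer> topDocsById(List<Doc> group, int maxDocsPerGroup) {
        List<Integer> ids = new ArrayList<>();
        for (Doc d : group) {
            if (d.matchesQuery) {
                ids.add(d.id); // non-matching docs are never collected, so they cannot be sorted first
            }
        }
        Collections.sort(ids);
        return new ArrayList<>(ids.subList(0, Math.min(maxDocsPerGroup, ids.size())));
    }

    public static void main(String[] args) {
        List<Doc> group = new ArrayList<>();
        group.add(new Doc(10, false)); // head of the block, but the query does not match it
        group.add(new Doc(11, true));
        group.add(new Doc(12, true));
        System.out.println(topDocsById(group, 1)); // [11], not [10]
    }
}
```

In this model the head document (ID 10) never shows up in the first position because the query does not match it, even though a non-grouped lookup would still find it in the index.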




lucene at mikemccandless

Mar 9, 2012, 9:19 AM

Post #7 of 7
Re: Re: Re: BlockGroupingCollector, not always getting first document

Phew, thanks for bringing closure!

Mike McCandless

http://blog.mikemccandless.com

