
Mailing List Archive: Lucene: Java-User

Do deleted documents affect scores?

 

 



yuvalf at answers

Feb 10, 2010, 11:34 PM

Post #1 of 4 (866 views)
Do deleted documents affect scores?

I want to narrow down my previous question.
Say we have two Lucene indexes: A and B.
Index A contains documents a and b.
Index B used to contain documents a, b and c,
but c was deleted.
All documents share some vocabulary.
If we search using terms common to documents b and c,
can we get a different score for document b in index A and in index B?
Note that both indexes are identical with regard to the non-deleted documents,
and differ only by the deleted document c.
Thanks,
Yuval


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


ian.lea at gmail

Feb 11, 2010, 8:35 AM

Post #2 of 4 (794 views)
Re: Do deleted documents affect scores? [In reply to]

I'm pretty sure that the answer is no. A quick test with 3.0 on a small
index, with and without deleted docs, showed no difference in the scores.
But that was hardly a rigorous test, and I don't know enough about Lucene
internals and scoring to give a definitive answer.

Shouldn't be too hard for you to verify or disprove: build an index
and throw loads of updates and deletes at it, checking scores as you
go.


--
Ian.


On Thu, Feb 11, 2010 at 7:34 AM, Yuval Feinstein <yuvalf [at] answers> wrote:
> I want to focus my previous question.
> Say we have two Lucene indexes: A and B.
> Index A contains documents a and b.
> Index B used to contain documents a, b and c,
> But c was deleted.
> All documents share some vocabulary.
> If we search using terms common to documents b and c,
> Can we get a different score for document b in index A and index B?
> Note that both indexes are identical with regard to the non-deleted documents,
> And only differ by the deleted document c.
> Thanks,
> Yuval


ab at getopt

Feb 11, 2010, 8:53 AM

Post #3 of 4 (802 views)
Re: Do deleted documents affect scores? [In reply to]

On 2010-02-11 17:35, Ian Lea wrote:
> I'm pretty sure that the answer is no and a quick test on a small
> index with/without deleted docs showed no difference in the scores,
> using 3.0. But that was hardly a rigorous test and I don't know
> enough about lucene internals and scoring to give a definitive answer.
>
> Shouldn't be too hard for you to verify or disprove: build an index
> and throw loads of updates and deletes at it, checking scores as you
> go.

Actually, deleted docs do affect scoring for a time: the IDF of a term is
not updated until you optimize (or until Lucene decides to merge segments).
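
To illustrate the point above, here is a minimal sketch in plain Java (no
Lucene dependency). It assumes the classic DefaultSimilarity formula,
idf = 1 + ln(numDocs / (docFreq + 1)), and assumes that both counts still
include deleted documents until their segments are merged. The class name,
the method name and the document counts are hypothetical, chosen only for
illustration.

```java
// Hypothetical sketch: how stale document counts change a term's IDF.
// Assumes the classic formula idf = 1 + ln(numDocs / (docFreq + 1)).
class IdfSketch {

    // docFreq: number of documents containing the term.
    // numDocs: total number of documents counted by the index.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // Index A: documents a and b, and the query term occurs in both.
        double idfA = idf(2, 2);

        // Index B before a merge: c is deleted but (by assumption) is
        // still counted in both the document frequency and the total.
        double idfBStale = idf(3, 3);

        // Index B after optimize/merge: c no longer contributes.
        double idfBMerged = idf(2, 2);

        System.out.println("idf in A:              " + idfA);
        System.out.println("idf in B before merge: " + idfBStale);
        System.out.println("idf in B after merge:  " + idfBMerged);
    }
}
```

With these counts the IDF in index B differs from index A only until the
merge. For some counts the two values happen to coincide, for example
idf(1, 2) and idf(2, 3), which may be why a quick test on a small index
shows no difference.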


--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com




yuvalf at answers

Feb 11, 2010, 10:54 AM

Post #4 of 4 (783 views)
RE: Do deleted documents affect scores? [In reply to]

Thanks Ian and Andrzej.
You solved a mystery for us.
-- Yuval

