
gsingers at apache
Nov 19, 2007, 10:38 AM
Re: Scoring for all the documents in the index relative to a query
Lucene only scores those documents that have at least one matching term. It doesn't implement a pure vector space model in which every document is scored (it uses a combination of the Boolean Model and VSM). Thus, I am not sure you can do a pure comparison. I suppose you could simulate the relevance by using TermVectors and looping over all documents, but I think one could argue this isn't exactly what Lucene does, so it isn't comparable.

http://lucene.apache.org/java/docs/scoring.html might help in understanding this stuff.

HTH,
Grant

On Nov 19, 2007, at 1:25 PM, HAIDUC SONIA wrote:

> I am trying to order all the documents in the index according to their similarity to a given query. I am interested in having a complete list of *all* the documents in the index with their score. From what I understood by reading some documentation, Lucene internally assigns scores to all the documents in the index according to their similarity to the query, but when returning the hits, all the scores that are less than 0 are rounded to 0 and only the documents with the score > 0 are returned as hits. But what I would like to get is the list before this intermediate processing, so the list of all the documents with their raw score. I am trying to compare Lucene with LSI and for the comparison I want to do, I need the entire list of documents. Is there a way that I can get that with Lucene?
> I hope I explained it clearly this time. If you need more details let me know.
>
> Thank you,
> Sonia
>
> ----- Original Message ----
> From: Erick Erickson <erickerickson[at]gmail.com>
> To: java-user[at]lucene.apache.org
> Sent: Monday, November 19, 2007 11:55:00 AM
> Subject: Re: Scoring for all the documents in the index relative to a query
>
> Could you explain a bit more what problem you're trying to solve? The reason I ask is that your question doesn't make sense to me, since I have no idea what you expect by the term "negative score".
>
> My simplistic view has been that all the docs returned via Hits or HitCollector have scores > 0, and all the rest have scores of 0, and this view is supported by the explanation of HitCollector.collect:
>
> "Called once for every non-zero scoring document, with the document number and its score."
>
> You might also get value from this page:
> http://lucene.apache.org/java/docs/scoring.html#Scoring
>
> Best
> Erick
>
> On Nov 19, 2007 11:05 AM, HAIDUC SONIA <haiduc_sonia[at]yahoo.com> wrote:
>
>> Hi everyone,
>>
>> I am trying to obtain the score for each document in the index relative to a given query. For example, if I have the query "search file", I am trying to get the list of all documents in the index and their scores relative to the given query. I tried first using Hits, which gave me the normalized score. I thought that I don't see the whole list of documents and their scores because of the normalization, so I tried using HitsCollector. But even after using HitsCollector, I get the same number of matching documents, so the normalization didn't exclude documents because of negative scoring. Does Lucene actually compute the score for all the documents in the index or just for matching documents? I really need to have the scores for all the documents in the index relative to the query (even if negative), not just the ones that contain the query terms (this is what Lucene considers "matching documents", right?). Is this possible using Lucene?
>>
>> I really appreciate your time and effort!
>> Thanks,
>> Sonia

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
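The "loop over all documents" idea from the reply can be sketched outside Lucene. The Java below is a minimal, hypothetical illustration, not Lucene's scoring formula and not code from this thread. It assumes each document is already available as an in-memory term-frequency map (in practice that might be filled from the per-document term vectors mentioned above, which is not shown here), weights terms with plain tf-idf using idf = ln(N/df), and scores with cosine similarity. The class and method names are made up for the example. Every document gets an entry, and a document sharing no weighted term with the query simply keeps 0.0, which matches the "all the rest have scores of 0" view of documents a HitCollector never sees.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: score EVERY document against a query with a plain
 * tf-idf cosine vector space model, so non-matching documents still appear
 * (with 0.0). This is NOT Lucene's scoring formula.
 */
public class PureVsmScorer {

    /** Number of documents each term occurs in. */
    static Map<String, Integer> documentFrequencies(List<Map<String, Integer>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> doc : docs) {
            for (String term : doc.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** tf-idf weights with idf = ln(N / df); terms absent from the collection are dropped. */
    static Map<String, Double> weigh(Map<String, Integer> tf, Map<String, Integer> df, int n) {
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            Integer d = df.get(e.getKey());
            if (d == null) continue; // query term never occurs in any document
            w.put(e.getKey(), e.getValue() * Math.log((double) n / d));
        }
        return w;
    }

    /** Euclidean length of a sparse weight vector. */
    static double norm(Map<String, Double> v) {
        double s = 0;
        for (double x : v.values()) s += x * x;
        return Math.sqrt(s);
    }

    /** One cosine score per document, index-aligned with docs (the "doc id"). */
    static double[] scoreAll(List<Map<String, Integer>> docs, Map<String, Integer> queryTf) {
        int n = docs.size();
        Map<String, Integer> df = documentFrequencies(docs);
        Map<String, Double> q = weigh(queryTf, df, n);
        double qNorm = norm(q);
        double[] scores = new double[n]; // every document starts at 0.0
        for (int i = 0; i < n; i++) {
            Map<String, Double> d = weigh(docs.get(i), df, n);
            double dNorm = norm(d);
            if (qNorm == 0 || dNorm == 0) continue; // nothing to compare: keep 0.0
            double dot = 0;
            for (Map.Entry<String, Double> e : q.entrySet()) {
                Double dw = d.get(e.getKey());
                if (dw != null) dot += e.getValue() * dw;
            }
            scores[i] = dot / (qNorm * dNorm);
        }
        return scores;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> docs = new ArrayList<>();
        docs.add(Map.of("search", 2, "file", 1));
        docs.add(Map.of("file", 1, "system", 1));
        docs.add(Map.of("kernel", 3));
        double[] scores = scoreAll(docs, Map.of("search", 1, "file", 1));
        for (int i = 0; i < scores.length; i++) {
            System.out.printf("doc %d -> %.4f%n", i, scores[i]);
        }
    }
}
```

Running `main` prints a score for all three documents, including 0.0 for the third, which contains neither query term. With non-negative tf-idf weights no document can score below zero, so the full ranked list is just the matching documents followed by the zero-scored rest.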