
Mailing List Archive: Lucene: Java-User

Is it a lucene bug?

 

 



songzi0206 at gmail

Nov 26, 2009, 4:01 AM

Post #1 of 8 (1060 views)
Permalink
Is it a lucene bug?

Hi,
Recently we got a requirement to sort the hits by both the document
score and updateTime, a field that records when the document was last
updated. We want newer documents near the front and also high-scoring
documents near the front. In other words, we want to mix the score and
updateTime rather than sort first by one and then by the other. So I
designed a time-based function f(t) to calculate a boost for each
document, which is written into the index.
As a result, I can calculate a value for each document based on its
update time, and that value influences the document score by adjusting
the fieldNorm value. But when I look up the boost through
document.getBoost() on documents read from the index, the boost value
is always 1.0. That means I can set a document's boost and it does
adjust the final score, but I cannot read the boost back from the
documents returned by a search.
Is this a bug in Lucene? Thanks. I use Lucene version 2.4.1.
PS: Is there any other way to meet my requirement? I don't think it is
a good idea to adjust a document's final score by writing a document
boost into the index, because I want to offer two interfaces to the
client: one sorts documents only by the plain similarity score, not
adjusted by the boost value f(t); the other sorts by the final score,
adjusted by the boost value f(t). Thanks a lot!

wilson

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


savvas.andreas.moysidis at googlemail

Nov 26, 2009, 5:24 AM

Post #2 of 8 (1014 views)
Permalink
Re: Is it a lucene bug? [In reply to]

Hi,

I'm not exactly sure I understand the type of sorting you are trying to
achieve.

You have an updateTime field and you mention that "We want the new document
in the front and also want high score document in the front".

My take on this is that you want to sort first by updateTime and then by
score, but you say this is not the case?

Instead of calculating a boost value with f(t), can you not calculate and
index the actual value you need for every document?

Then you could sort first by this value and then by score.

regards,

savvas




uwe at thetaphi

Nov 26, 2009, 5:41 AM

Post #3 of 8 (1004 views)
Permalink
RE: Is it a lucene bug? [In reply to]

Read the documentation of the Document class: if you set a boost for a
document, it is used when indexing the fields and multiplied into each field.
No boost value is stored for the document itself, so you cannot get it back
(only so-called stored fields are retrievable).

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi




songzi0206 at gmail

Nov 26, 2009, 5:59 PM

Post #4 of 8 (987 views)
Permalink
Re: Is it a lucene bug? [In reply to]

Hi,
Thank you very much.
I have another question. As mentioned in the Document class
documentation, if I set a boost for a document, it is used when indexing
the fields and multiplied into each field. Here is my case: sometimes I
want the boost to be a factor of the score, but sometimes I want to
ignore the boost when scoring the hits. Can Lucene do this? If so, how
should I write the search code in that case?



tangfulin at gmail

Nov 26, 2009, 6:00 PM

Post #5 of 8 (989 views)
Permalink
Re: Is it a lucene bug? [In reply to]

Maybe you should take a look at the Scorer and Similarity families of
classes. They show how the score is calculated; make some changes to
them and you will get what you want.

We had the same problem and solved it by writing subclasses of
DefaultSimilarity and BooleanScorer.





--
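The dream's beginning struggles at the edge of the city
The heart's far horizon holds fast in the instant of a footstep
My fate has buried the eternity of loneliness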



songzi0206 at gmail

Nov 26, 2009, 7:02 PM

Post #6 of 8 (990 views)
Permalink
Re: Is it a lucene bug? [In reply to]

Hi,
I am afraid I didn't describe this clearly enough in my last mail.
Let me describe it again.
For example, suppose there are 5 documents, doc1, doc2, doc3, doc4, doc5,
in the search hits. Their updateTimes are respectively
t1 = doc1.updateTime = 2009-01-01 12:45:00
t2 = doc2.updateTime = 2009-01-01 15:30:00
t3 = doc3.updateTime = 2009-01-05 09:45:00
t4 = doc4.updateTime = 2009-08-01 12:45:00
t5 = doc5.updateTime = 2009-11-27 12:45:00
Suppose their relevancy scores are:
score1 = doc1.score = 2.4
score2 = doc2.score = 2.3
score3 = doc3.score = 2.3
score4 = doc4.score = 1.8
score5 = doc5.score = 1.6
If I ignore updateTime and sort by document score (relevancy) only,
the sequence should be doc1 > doc2 = doc3 > doc4 > doc5, am I right?
But we need to take updateTime into account as a sorting factor. Through
the function f(t), we can calculate a value from each updateTime.
Suppose the values are
v1 = f(t1) = 2.00
v2 = f(t2) = 2.01
v3 = f(t3) = 2.1
v4 = f(t4) = 2.5
v5 = f(t5) = 3.5
So the final result is:
r1 = v1 * score1 = 2.00 * 2.4 = 4.8
r2 = v2 * score2 = 2.01 * 2.3 = 4.623
r3 = v3 * score3 = 2.1 * 2.3 = 4.83
r4 = v4 * score4 = 2.5 * 1.8 = 4.5
r5 = v5 * score5 = 3.5 * 1.6 = 5.6
r5 > r3 > r1 > r2 > r4
so the sequence should be doc5 > doc3 > doc1 > doc2 > doc4.

In the above example, although score1 (= 2.4) > score2 (= 2.3) =
score3 (= 2.3), t2 is almost 3 hours later than t1, and t3 is almost
4 days later than t1. We consider 3 hours a small difference and
4 days a much bigger one, so the final result is r3 > r1 > r2. We can
also change how much weight updateTime carries among the sorting
factors by changing the function f(t).

Is my description clear now?
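
For readers following along, the arithmetic in the example above can be reproduced with a few lines of plain Java. This is only a sketch of the re-ranking step, not Lucene code: the class names RecencyRerank and Hit are hypothetical, and the f(t) values are taken directly from the example rather than computed, since the thread does not say what f(t) is.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RecencyRerank {

    // Hypothetical stand-in for a search hit; this is not a Lucene class.
    static final class Hit {
        final String id;
        final double score;        // relevancy score returned by the search
        final double recencyValue; // v = f(t), derived from updateTime

        Hit(String id, double score, double recencyValue) {
            this.id = id;
            this.score = score;
            this.recencyValue = recencyValue;
        }

        // r = v * score, as in the example.
        double combined() {
            return recencyValue * score;
        }
    }

    // Returns the document ids ordered by combined value, highest first.
    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<Hit>(hits);
        sorted.sort((a, b) -> Double.compare(b.combined(), a.combined()));
        List<String> ids = new ArrayList<String>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit("doc1", 2.4, 2.00),
            new Hit("doc2", 2.3, 2.01),
            new Hit("doc3", 2.3, 2.1),
            new Hit("doc4", 1.8, 2.5),
            new Hit("doc5", 1.6, 3.5));
        System.out.println(rank(hits)); // [doc5, doc3, doc1, doc2, doc4]
    }
}
```

Because the combination happens after the search, the unadjusted relevancy score stays available for a score-only ordering.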








songzi0206 at gmail

Nov 26, 2009, 7:09 PM

Post #7 of 8 (985 views)
Permalink
Re: Is it a lucene bug? [In reply to]

Hi,
Thanks for the inspiration. Which version (Lucene 2.4, 2.9, or
another) do you use in your project? Can you give more details about
your suggestion? Thanks.

Wilson



savvas.andreas.moysidis at googlemail

Nov 27, 2009, 2:18 AM

Post #8 of 8 (974 views)
Permalink
Re: Is it a lucene bug? [In reply to]

Have you considered a custom sort strategy using a ScoreDocComparator?
Inside your implementation you have access to the individual doc scores, and
you could create an array of floats, parallel to your docs, which stores your
r1, r2, r3 etc. values.
Then use this array to implement your int compare(ScoreDoc i, ScoreDoc j)
method.
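
A rough sketch of the parallel-array idea in plain Java follows. It does not use the Lucene API: DocHit is a hypothetical stand-in for ScoreDoc (a document number plus a score), and ParallelArraySort, byCombined, byScoreOnly and order are illustrative names. The array is indexed by document number and holds the precomputed r values.

```java
import java.util.Arrays;
import java.util.Comparator;

final class ParallelArraySort {

    // Hypothetical stand-in for Lucene's ScoreDoc; not the real class.
    static final class DocHit {
        final int doc;     // document number
        final float score; // relevancy score

        DocHit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Orders hits by the precomputed r value stored at combinedByDoc[doc],
    // highest first.
    static Comparator<DocHit> byCombined(final float[] combinedByDoc) {
        return new Comparator<DocHit>() {
            public int compare(DocHit i, DocHit j) {
                return Float.compare(combinedByDoc[j.doc], combinedByDoc[i.doc]);
            }
        };
    }

    // Orders hits by relevancy score only, highest first.
    static Comparator<DocHit> byScoreOnly() {
        return new Comparator<DocHit>() {
            public int compare(DocHit i, DocHit j) {
                return Float.compare(j.score, i.score);
            }
        };
    }

    // Returns document numbers in the order given by the comparator.
    static int[] order(DocHit[] hits, Comparator<DocHit> cmp) {
        DocHit[] copy = hits.clone();
        Arrays.sort(copy, cmp);
        int[] docs = new int[copy.length];
        for (int k = 0; k < copy.length; k++) {
            docs[k] = copy[k].doc;
        }
        return docs;
    }

    public static void main(String[] args) {
        DocHit[] hits = {
            new DocHit(0, 2.4f), new DocHit(1, 2.3f), new DocHit(2, 2.3f),
            new DocHit(3, 1.8f), new DocHit(4, 1.6f)
        };
        float[] r = {4.8f, 4.623f, 4.83f, 4.5f, 5.6f}; // r values from the thread
        System.out.println(Arrays.toString(order(hits, byCombined(r))));
        System.out.println(Arrays.toString(order(hits, byScoreOnly())));
    }
}
```

Choosing the comparator at query time would cover both interfaces Wilson asked for: one ordering by relevancy alone, the other by the time-adjusted value.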

savvas.

