Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-User

答复: how to effeciently implement the stastical scores like pagerank?

 

 

Lucene java-user RSS feed   Index | Next | Previous | View Threaded


zhou_qi at sjtu

Nov 15, 2007, 6:25 AM

Post #1 of 3 (199 views)
Permalink
答复: how to effeciently implement the stastical scores like pagerank?

Thank you, my score is fixed score from the properties of the page, but at first we need to adjust the score for a promising result.
I have tried one way of manually re-ranking all the documents by the search results. But it needs to iterate all the retrieved results and fetch the re-ranking score (stored in the index) to sum the overall score. It is inefficient. How to improve that by a new approach?

Sorry for making you misunderstanding. Thanks

Best Regards,

Zhou Qi
Dept. Computer Science & Engineering
Shanghai Jiaotong University


-----邮件原件-----
发件人: Chee Wu [mailto:chee.wu[at]gmail.com]
发送时间: 2007年11月15日 14:23
收件人: java-user[at]lucene.apache.org
主题: Re: how to effeciently implement the stastical scores like pagerank?

Not sure with what you want to do ..
There are many factors can affect the rank of documents.Some factors should
be fixed, always the same for different query words ,such as PageRank and
the ratio between amount of the links and the full text length of the
pages,and VSM should be a dynamic factor..

So how do you want to implement your own score mechanism, just import some
new factors or a completely new way? If add new factor, then a fixed one or
a dynamic one ?



2007/11/15, Zhou Qi <zhou_qi[at]sjtu.edu.cn>:
>
> Hi Guys,
>
>
>
> I made a problem in implement some extra scores besides the VSM
> model. My works entails with re-ranking the returned documents from the
> extra scores like page quality or page property ( good page or navigator
> page). I have tried two approaches before:
>
> A. Setting the document boosting attribute in indexing, but it is
> hard
> to adjust these weight after indexing the documents.
>
> B. The second way is to retrieve all the documents and getting the
> score first, and then I will need to iterate all the documents to fetch
> some
> store fields to get the property of the page to compute the re-ranking
> score. From the experiments before, this way takes too much time and it
> will
> need to iterate all the documents to re-ranking. This implementation is
> easy
> to adjust the ranking algorithem but it is too slow.
>
> Can anyone give a better solution to do that? Thanks in advance.
>
>
>
> Best Regards,
>
>
>
> Zhou Qi
>
> Dept. Computer Science & Engineering
>
> Shanghai Jiaotong University
>
>
>
>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-user-help[at]lucene.apache.org


john.wang at gmail

Nov 15, 2007, 7:03 AM

Post #2 of 3 (190 views)
Permalink
Re: : how to effeciently implement the stastical scores like pagerank? [In reply to]

Would payload work?
-John

On 11/15/07, Zhou Qi <zhou_qi[at]sjtu.edu.cn> wrote:
>
> Thank you, my score is fixed score from the properties of the page, but at
> first we need to adjust the score for a promising result.
> I have tried one way of manually re-ranking all the documents by the
> search results. But it needs to iterate all the retrieved results and fetch
> the re-ranking score (stored in the index) to sum the overall score. It is
> inefficient. How to improve that by a new approach?
>
> Sorry for making you misunderstanding. Thanks
>
> Best Regards,
>
> Zhou Qi
> Dept. Computer Science & Engineering
> Shanghai Jiaotong University
>
>
> -----ʼԭ-----
> : Chee Wu [mailto:chee.wu[at]gmail.com]
> ʱ: 20071115 14:23
> ռ: java-user[at]lucene.apache.org
> : Re: how to effeciently implement the stastical scores like pagerank?
>
> Not sure with what you want to do ..
> There are many factors can affect the rank of documents.Some factors
> should
> be fixed, always the same for different query words ,such as PageRank and
> the ratio between amount of the links and the full text length of the
> pages,and VSM should be a dynamic factor..
>
> So how do you want to implement your own score mechanism, just import some
> new factors or a completely new way? If add new factor, then a fixed one
> or
> a dynamic one ?
>
>
>
> 2007/11/15, Zhou Qi <zhou_qi[at]sjtu.edu.cn>:
> >
> > Hi Guys,
> >
> >
> >
> > I made a problem in implement some extra scores besides the VSM
> > model. My works entails with re-ranking the returned documents from the
> > extra scores like page quality or page property ( good page or navigator
> > page). I have tried two approaches before:
> >
> > A. Setting the document boosting attribute in indexing, but it is
> > hard
> > to adjust these weight after indexing the documents.
> >
> > B. The second way is to retrieve all the documents and getting the
> > score first, and then I will need to iterate all the documents to fetch
> > some
> > store fields to get the property of the page to compute the re-ranking
> > score. From the experiments before, this way takes too much time and it
> > will
> > need to iterate all the documents to re-ranking. This implementation is
> > easy
> > to adjust the ranking algorithem but it is too slow.
> >
> > Can anyone give a better solution to do that? Thanks in advance.
> >
> >
> >
> > Best Regards,
> >
> >
> >
> > Zhou Qi
> >
> > Dept. Computer Science & Engineering
> >
> > Shanghai Jiaotong University
> >
> >
> >
> >
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org
> For additional commands, e-mail: java-user-help[at]lucene.apache.org
>
>


buschmic at gmail

Nov 15, 2007, 10:33 AM

Post #3 of 3 (189 views)
Permalink
Re: : how to effeciently implement the stastical scores like pagerank? [In reply to]

John Wang wrote:
> Would payload work?
> -John
>
>
Yes, if you used payloads instead of stored fields your performance
should be much better.

Try and index one special term per document (e. g. score:pagerank), and
index one position with a payload for each doc. Then when you retrieve
hits open a TermPositions using the special term, get the payload and
incorporate it in the docs' score.

The performance overhead should be comparable to adding one AND-term to
your query.

-Michael

> On 11/15/07, Zhou Qi <zhou_qi[at]sjtu.edu.cn> wrote:
>
>> Thank you, my score is fixed score from the properties of the page, but at
>> first we need to adjust the score for a promising result.
>> I have tried one way of manually re-ranking all the documents by the
>> search results. But it needs to iterate all the retrieved results and fetch
>> the re-ranking score (stored in the index) to sum the overall score. It is
>> inefficient. How to improve that by a new approach?
>>
>> Sorry for making you misunderstanding. Thanks
>>
>> Best Regards,
>>
>> Zhou Qi
>> Dept. Computer Science & Engineering
>> Shanghai Jiaotong University
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-user-help[at]lucene.apache.org

Lucene java-user RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact lists@gossamer-threads.com
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.