fancyerii at gmail
Apr 27, 2012, 12:40 AM
Post #8 of 8
Re: two fields, the first important than the second
+(title:hello title:world desc:hello desc:world)
The boost values (100, 50, 10, 10) should be adjusted carefully.
If the tf of a document is very large, a boost of 10 may not be enough.
You can subclass DefaultSimilarity and override its methods, such as tf() and
idf(), to constrain them to a controllable range.
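
A minimal sketch of what "constrain them to a controllable range" could look like for tf. In Lucene 3.x, DefaultSimilarity computes tf as the square root of the term frequency; the class name CappedTf and the cap value 3 below are assumptions made for illustration, not part of Lucene. The code is plain Java so it runs without Lucene on the classpath; in a real index the same logic would presumably go into an override of tf() in a DefaultSimilarity subclass, whose exact signature depends on the Lucene version.

```java
/** Hypothetical illustration of a capped term-frequency factor (not a Lucene class). */
final class CappedTf {
    private CappedTf() {}

    /**
     * Square root of the raw frequency, the same shape as DefaultSimilarity's tf,
     * cut off at maxTf so a very frequent term cannot outweigh a field boost.
     */
    static float tf(float freq, float maxTf) {
        float raw = (float) Math.sqrt(freq);
        return Math.min(raw, maxTf);
    }

    public static void main(String[] args) {
        System.out.println(tf(4f, 3f));     // below the cap: prints 2.0
        System.out.println(tf(10000f, 3f)); // raw value would be 100, capped: prints 3.0
    }
}
```

With a cap in place, the largest contribution a single field can gain from repetition is bounded, which makes it easier to pick boost values that keep title matches ahead of description matches.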
On Fri, Apr 27, 2012 at 2:59 PM, Akos Tajti <akos.tajti [at] gmail> wrote:
> Thanks for the detailed explanation. But as I understand it, this query will
> still match only documents that contain both terms (either in the same
> field or in different fields). What if there's a document that contains only
> "hello"? This query will not find it, am I right? But we do want such
> documents to be found. So in the results, the documents that contain both
> terms have to come first, then those that contain only one of them.
> On Fri, Apr 27, 2012 at 5:17 AM, Li Li <fancyerii [at] gmail> wrote:
>> sorry for some typos.
>> original query +(title:hello desc:hello) +(title:world desc:world)
>> boosted one +(title:hello^2 desc:hello) +(title:world^2 desc:world)
>> last one +(title:hello desc:hello) +(title:world desc:world)
>> (+title:hello +title:world)^10 (+desc:hello +desc:world)^5
>> the example has two terms. if it has more terms, the query will become too
>> On Fri, Apr 27, 2012 at 11:12 AM, Li Li <fancyerii [at] gmail> wrote:
>> > you should describe your ranking strategy more precisely.
>> > if the query has 2 terms, "hello" and "world" for example, and your
>> > search fields are title and description. There are many possible
>> > combinations.
>> > Here is my understanding.
>> > Both terms should occur in title or desc.
>> > The query may be +(title:hello desc:hello) +(title:world desc:world)
>> > The problem is that we need title to weigh more than desc, so maybe we
>> > rewrite it to
>> > +(title:hello^2 desc:hello) +(title:world^2 desc:world)
>> > But consider these two scenarios:
>> > 1. hello hits only in title, world hits only in desc
>> > 2. hello and world both hit in desc
>> > Because title is boosted, 1 scores higher than 2.
>> > But we may think 2 is better than 1 because hello world is a phrase.
>> > We don't want to use a phrase query, though, because it is too strict
>> > for the recall to meet our needs.
>> > Our solution is to modify Lucene so that the boolean scorer can tell us
>> > which terms matched. Then we use our own collector to boost scenario 2.
>> > This solution requires modifying Lucene (I have posted a mail, and you can
>> > patch your DisjunctionSumScorer with
>> > https://issues.apache.org/jira/browse/LUCENE-2686)
>> > Another solution I can come up with is to use a more complicated query:
>> > +(title:hello desc:hello) +(title:world desc:world)
>> > (+title:hello +title:world)^10 (+desc:hello +desc:world)^5
>> > The must-occur condition is the same as before, but if hello and world
>> > are both in title, we give the document a boost. Similarly, if hello and
>> > world are both in desc, we also boost it.
>> > On Fri, Apr 27, 2012 at 3:12 AM, Akos Tajti <akos.tajti [at] gmail>
>> >> Dear List,
>> >> we've been struggling with the following problem for a while:
>> >> we have two fields: title and description. Title is generated from short
>> >> summaries while description is generated from long texts. We want to search
>> >> on both fields at the same time, but we'd like to get all documents in which
>> >> the title matches the search term before all others. For multi-term queries
>> >> we want to achieve the following: all documents that contain all terms in
>> >> their title must come before every other document, no matter how many times
>> >> the description matches the query. Is there a simple way to achieve this?
>> >> Thanks in advance,
>> >> Ákos Tajti
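
The query shapes discussed in this thread can be generated mechanically for any number of terms. The sketch below builds the query-parser string only; the class TwoFieldQuery and its method names are hypothetical, the field names title and desc and the boosts 10 and 5 are taken from the thread, and the terms are assumed to be plain words that need no escaping for the query parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds query strings for the two-field ranking idea. */
final class TwoFieldQuery {
    private TwoFieldQuery() {}

    /** Every term required in one field, e.g. "+title:hello +title:world". */
    static String allTermsIn(String field, List<String> terms) {
        List<String> parts = new ArrayList<>();
        for (String term : terms) {
            parts.add("+" + field + ":" + term);
        }
        return String.join(" ", parts);
    }

    /**
     * requireAllTerms = true:  each term must occur in title or desc.
     * requireAllTerms = false: at least one term must occur in title or desc.
     * In both cases, documents with all terms in one field get an extra boost.
     */
    static String build(List<String> terms, boolean requireAllTerms,
                        int titleBoost, int descBoost) {
        List<String> clauses = new ArrayList<>();
        if (requireAllTerms) {
            for (String term : terms) {
                clauses.add("+(title:" + term + " desc:" + term + ")");
            }
        } else {
            List<String> any = new ArrayList<>();
            for (String term : terms) any.add("title:" + term);
            for (String term : terms) any.add("desc:" + term);
            clauses.add("+(" + String.join(" ", any) + ")");
        }
        // Optional clauses: they do not restrict matching, they only add score.
        clauses.add("(" + allTermsIn("title", terms) + ")^" + titleBoost);
        clauses.add("(" + allTermsIn("desc", terms) + ")^" + descBoost);
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("hello", "world");
        System.out.println(build(terms, true, 10, 5));
        System.out.println(build(terms, false, 10, 5));
    }
}
```

The second form is the one that also returns documents containing only one of the terms. Note that the boosts only shift scores; they do not by themselves guarantee that every title match outranks every description match, which is why the boost values need tuning.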