Mailing List Archive: Lucene: Java-User

Similarity coefficient for more exact matching

 

 


sxamt at yahoo

Apr 27, 2012, 5:18 AM

Post #1 of 4 (497 views)
Similarity coefficient for more exact matching

Hi guys,
I have a field that is ANALYZED and Store.NO.
Suppose one document has the value "Hello" in this field,
and another has "Hello world, one, two, three, four".
Since the field is analyzed (with norms), the extra terms "one, two, three, four" will definitely affect the resulting score when we search for the query "Hello world". Does anyone know whether I can control some coefficients that determine the weight of exact matching vs. the number of words (the norm factor)?
Thanks,
 

Maxim


ian.lea at gmail

Apr 27, 2012, 6:29 AM

Post #2 of 4 (474 views)
Re: Similarity coefficient for more exact matching [In reply to]

You can override org.apache.lucene.search.Similarity/DefaultSimilarity
to tweak quite a lot of stuff.

computeNorm() may be the method you are interested in. It is called at
indexing time, so be sure to use the same implementation at index and
query time, using IndexWriterConfig.setSimilarity() and
IndexSearcher.setSimilarity(), unless you are clever or like being
confused.
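
To make the length effect concrete, here is a minimal plain-Java sketch of the arithmetic. It is not Lucene code: it assumes the default length norm is 1/sqrt(numTerms), and it models a gentler variant through a hypothetical exponent parameter. The class and method names are invented for illustration, and the token counts assume the analyzer keeps all six words of the longer document.

```java
/** Plain-Java model of length normalization; illustrative names, no Lucene dependency. */
final class LengthNormSketch {

    /** Assumed default behaviour: norm = 1 / sqrt(numTerms). */
    static double defaultNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /**
     * Hypothetical tunable variant: norm = numTerms^(-exponent).
     * exponent = 0.5 reproduces defaultNorm; exponent = 0 ignores length entirely.
     */
    static double tunedNorm(int numTerms, double exponent) {
        return Math.pow(numTerms, -exponent);
    }

    public static void main(String[] args) {
        int shortDoc = 1; // "Hello"
        int longDoc = 6;  // "Hello world one two three four"
        for (double exponent : new double[] {0.5, 0.25, 0.0}) {
            System.out.printf("exponent=%.2f short=%.3f long=%.3f%n",
                    exponent, tunedNorm(shortDoc, exponent), tunedNorm(longDoc, exponent));
        }
    }
}
```

With the assumed default, the six-term document gets a norm of about 0.41 against 1.0 for the one-term document; lowering the exponent narrows that gap. In a real override, a formula of this kind would live in computeNorm() of a DefaultSimilarity subclass. Note that Lucene stores each norm in a single byte, so the stored values are coarser than the doubles shown here.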

SweetSpotSimilarity might also be worth a look.
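
The idea behind SweetSpotSimilarity is that every document whose length falls inside a configured range gets the same, maximal length norm, and only lengths outside that range are penalized. The sketch below models that plateau in plain Java. The formula is an assumption based on the SweetSpotSimilarity documentation, and the min, max and steepness values are made up, so treat it as an illustration rather than the library's exact behavior.

```java
/** Plain-Java model of a "sweet spot" length norm; illustrative names, no Lucene dependency. */
final class SweetSpotSketch {

    /**
     * Assumed plateau formula:
     * 1 / sqrt(steepness * (|n - min| + |n - max| - (max - min)) + 1).
     * Any length n with min <= n <= max yields exactly 1.0.
     */
    static double plateauNorm(int numTerms, int min, int max, double steepness) {
        double distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return 1.0 / Math.sqrt(steepness * distance + 1.0);
    }

    public static void main(String[] args) {
        int min = 1;            // hypothetical lower edge of the plateau
        int max = 10;           // hypothetical upper edge of the plateau
        double steepness = 0.5; // hypothetical penalty rate outside the plateau
        for (int numTerms : new int[] {1, 6, 12, 50}) {
            System.out.println(numTerms + " terms -> " + plateauNorm(numTerms, min, max, steepness));
        }
    }
}
```

With a plateau of 1 to 10 terms, both example documents from the original question (one term and six terms) would receive the same norm, so the extra words would no longer pull the longer document down.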

--
Ian.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


paul at metajure

May 4, 2012, 9:32 AM

Post #3 of 4 (456 views)
RE: Similarity coefficient for more exact matching [In reply to]

> [use] IndexWriterConfig.setSimilarity() and
> IndexSearcher.setSimilarity(), unless you are clever or like being confused.
>
> SweetSpotSimilarity might also be worth a look.
>
> --
> Ian.

Being even less clever, I just make sure I set:

Similarity.setDefault(new MySimilarity())

when crawling and searching, so everything uses the same similarity strategies.

Checking the 3.4 code, IndexWriterConfig and IndexSearcher both default to Similarity.getDefault().
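
One practical consequence of a process-wide default is ordering. The toy model below uses invented stand-in classes, not Lucene's, and assumes that a component reads the default once, when it is constructed. Under that assumption, the default has to be set before any writer or searcher is created.

```java
/** Toy model of a process-wide default strategy; all names are hypothetical stand-ins. */
final class DefaultSimilaritySketch {

    /** Stand-in for a similarity strategy. */
    interface Sim {
        String name();
    }

    /** Stand-in for the static default holder. */
    static final class Registry {
        private static Sim current = () -> "builtin";

        static Sim getDefault() {
            return current;
        }

        static void setDefault(Sim sim) {
            current = sim;
        }
    }

    /** Stand-in for a writer or searcher that captures the default at construction (assumption). */
    static final class Component {
        private Sim sim = Registry.getDefault();

        Sim getSim() {
            return sim;
        }

        void setSim(Sim sim) {
            this.sim = sim;
        }
    }

    public static void main(String[] args) {
        Component early = new Component();    // created before the default changes
        Registry.setDefault(() -> "custom");
        Component late = new Component();     // created after the default changes
        System.out.println(early.getSim().name()); // prints "builtin"
        System.out.println(late.getSim().name());  // prints "custom"
    }
}
```

If the real classes behave like this model, calling Similarity.setDefault() at application start-up, before any index or searcher is opened, avoids the mismatch.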

Any thoughts on scenarios where you'd not push a custom similarity into the default position?

-Paul




ian.lea at gmail

May 10, 2012, 1:26 AM

Post #4 of 4 (433 views)
Re: Similarity coefficient for more exact matching [In reply to]

Similarity.setDefault(new MySimilarity()) is certainly better than the
2 calls I recommended. Thanks.

I find it hard to see why one might not want to do this in normal
usage but have a vague recollection of someone once outlining some
obscure scenarios where different similarities at index and search
time made sense.
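
Whatever those scenarios were, the mechanics are worth keeping in mind: the norm is computed at indexing time and stored, so a different similarity at search time can only change the query-time factors. The plain-Java sketch below illustrates this with a deliberately simplified score (tf times stored norm, leaving out idf, coord and boosts); every name in it is hypothetical.

```java
/** Toy model of index-time norms versus search-time scoring; hypothetical names throughout. */
final class FrozenNormSketch {

    /** Toy similarity: length norm is numTerms^(-exponent), tf is sqrt(freq). */
    static final class Sim {
        final double exponent;

        Sim(double exponent) {
            this.exponent = exponent;
        }

        double lengthNorm(int numTerms) {
            return Math.pow(numTerms, -exponent);
        }

        double tf(int freq) {
            return Math.sqrt(freq);
        }
    }

    /** Index time: the norm is computed once and stored with the document. */
    static double indexDocument(Sim indexSim, int numTerms) {
        return indexSim.lengthNorm(numTerms);
    }

    /** Search time: uses the stored norm; searchSim.lengthNorm is never consulted. */
    static double score(Sim searchSim, double storedNorm, int freq) {
        return searchSim.tf(freq) * storedNorm;
    }

    public static void main(String[] args) {
        Sim standard = new Sim(0.5);
        Sim flat = new Sim(0.0); // ignores document length
        double stored = indexDocument(standard, 6);
        System.out.println(score(standard, stored, 1));
        System.out.println(score(flat, stored, 1));                  // same as above: norm already fixed
        System.out.println(score(flat, indexDocument(flat, 6), 1));  // differs only after re-indexing
    }
}
```

In this model, switching to the length-agnostic similarity at search time alone changes nothing about the length penalty; the documents have to be re-indexed for a new norm formula to take effect.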


--
Ian.


