Mailing List Archive: Lucene: Java-Dev

[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

 

 



jira at apache

Nov 24, 2009, 3:29 PM

Post #1 of 24 (670 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782235#action_12782235 ]

Robert Muir commented on LUCENE-2091:
-------------------------------------

I am using a refactored version of this too, looking forward to seeing your implementation!


> Add BM25 Scoring to Lucene
> --------------------------
>
> Key: LUCENE-2091
> URL: https://issues.apache.org/jira/browse/LUCENE-2091
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/*
> Reporter: Yuval Feinstein
> Priority: Minor
> Fix For: 3.1
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> http://nlp.uned.es/~jperezi/Lucene-BM25/ describes an implementation of Okapi-BM25 scoring in the Lucene framework,
> as an alternative to the standard Lucene scoring (which is a version of mixed boolean/TFIDF).
> I have refactored this a bit, added unit tests and improved the runtime somewhat.
> I would like to contribute the code to Lucene under contrib.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jira at apache

Nov 29, 2009, 8:02 PM

Post #2 of 24 (623 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783530#action_12783530 ]

Otis Gospodnetic commented on LUCENE-2091:
------------------------------------------

Has anyone compared this particular BM25 impl. to the current Lucene's quasi-VSM approach in terms of:
* any of the relevance eval methods
* indexing performance
* search performance
* ...

Also, this issue is marked as contrib/*. Should this not go straight to core, so more people actually use this and provide feedback? Who knows, there is a chance (ha!) BM25 might turn out better than the current approach, and become the default.



jira at apache

Nov 29, 2009, 11:03 PM

Post #3 of 24 (621 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783555#action_12783555 ]

Yuval Feinstein commented on LUCENE-2091:
-----------------------------------------

Otis and Robert, Here's my (limited) experience with BM25:
On a proprietary corpus (alas) I got a nice improvement, which was more pronounced in recall
(hits that were previously not ranked as top ones, and therefore remained unseen, now appear in the top results).
I have worked on lowering the BM25 run time to a reasonable level.
I hope that once this gets into the hands of the Lucene community, BM25 performance
will approach the current Lucene scoring's performance. This is a tall order,
as the latter has been in the works for the last eight years or so.
As for use cases: BM25 helps in mine, and I believe this may be true for other cases.


> Add BM25 Scoring to Lucene
> --------------------------
>
> Key: LUCENE-2091
> URL: https://issues.apache.org/jira/browse/LUCENE-2091
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/*
> Reporter: Yuval Feinstein
> Priority: Minor
> Fix For: 3.1
>
> Attachments: persianlucene.jpg
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> http://nlp.uned.es/~jperezi/Lucene-BM25/ describes an implementation of Okapi-BM25 scoring in the Lucene framework,
> as an alternative to the standard Lucene scoring (which is a version of mixed boolean/TFIDF).
> I have refactored this a bit, added unit tests and improved the runtime somewhat.
> I would like to contribute the code to Lucene under contrib.



jira at apache

Nov 29, 2009, 11:23 PM

Post #4 of 24 (619 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783558#action_12783558 ]

Robert Muir commented on LUCENE-2091:
-------------------------------------

Yuval, BM25 has been working nicely for me too.
On some collections it really helps, and I haven't yet found a case where it hurts (compared to Lucene's current scoring algorithm).

Thanks in advance for working on this!




jira at apache

Dec 1, 2009, 8:02 AM

Post #5 of 24 (602 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784270#action_12784270 ]

Joaquin Perez-Iglesias commented on LUCENE-2091:
------------------------------------------------

Hi Otis, Robert and Yuval.

I developed this add-on for Lucene in 2008, for some experiments that I was doing, and I would like to express my impressions about this.

In my experience, and after reading a lot of papers, I have never found a case where the Lucene-VSM implementation outperforms BM25.
BM25 (with standard parameters) outperforms Lucene-VSM, and there is further room for improvement if the parameters are tuned specifically for the collection. I published some results with the Eurogov collection some time ago.

I can show you some experiments with the TREC Disk4&5 collection. These results were obtained with default parameters on the Robust track topics. As you can see, BM25 improves on the Lucene-VSM ranking function.

        MAP     P@5
VSM     0.2079  0.4096
BM25    0.2340  0.4578



This implementation is getting more popular, and I know that some people are using it in their research, so it would be really nice if at some point it were included in the core.

My only concerns about it are:
- Only simple boolean queries based on terms are supported (with the operators OR, AND, NOT). For instance, it does not support PhraseQuery.
- IDF cannot be calculated at the document level (this is important for BM25F).
- Computing the average document length is another issue, but this could be easily solved.


These issues are described in detail in the documentation that I made public on my website.

Thanks to all for your interest and work.

Joaquin Perez-Iglesias



jira at apache

Dec 1, 2009, 8:38 AM

Post #6 of 24 (602 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784284#action_12784284 ]

Robert Muir commented on LUCENE-2091:
-------------------------------------

{quote}
In my experience, and after reading a lot of papers, I have never found a case where the Lucene-VSM implementation outperforms BM25.
BM25 (with standard parameters) outperforms Lucene-VSM, and there is further room for improvement if the parameters are tuned specifically for the collection
{quote}

Just wanted to mention that in the results I have provided, I never change the default parameters (B,K1) either.




jira at apache

Dec 3, 2009, 2:22 AM

Post #7 of 24 (581 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785250#action_12785250 ]

Yuval Feinstein commented on LUCENE-2091:
-----------------------------------------

So what is the next step? Robert, do you have time to look at this? Or should I assign this to someone else? And how?
Sorry if these are trivial questions, but I'm a newbie...

> Add BM25 Scoring to Lucene
> --------------------------
>
> Key: LUCENE-2091
> URL: https://issues.apache.org/jira/browse/LUCENE-2091
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/*
> Reporter: Yuval Feinstein
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2091.patch, persianlucene.jpg
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> http://nlp.uned.es/~jperezi/Lucene-BM25/ describes an implementation of Okapi-BM25 scoring in the Lucene framework,
> as an alternative to the standard Lucene scoring (which is a version of mixed boolean/TFIDF).
> I have refactored this a bit, added unit tests and improved the runtime somewhat.
> I would like to contribute the code to Lucene under contrib.



jira at apache

Dec 3, 2009, 2:54 AM

Post #8 of 24 (582 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785264#action_12785264 ]

Robert Muir commented on LUCENE-2091:
-------------------------------------

Hi Yuval, I see your patch, I can help with some relevance testing and comments.

I don't know if it should be assigned to me; maybe we can trick one of the devs who knows the scoring system really well into looking at it, especially for performance and things like that.

Here is the first thing I noticed. Maybe I am completely stupid, but I never understood this:

I don't understand why we need BM25Boolean.* and everything like that. They seem to be duplicates of BooleanQuery etc. that just sum up subscorers or whatever.

So in my usage I dropped them. I just have BM25TermQuery, BM25TermScorer, and BM25Parameters, and to use them I override a method in QueryParser.




jira at apache

Dec 3, 2009, 3:04 AM

Post #9 of 24 (580 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785271#action_12785271 ]

Uwe Schindler commented on LUCENE-2091:
---------------------------------------

I was wondering about the separate BooleanQuery too, as it is almost simply a copy (of an old version of it). The bigger question is why we need the BM25 classes at all: why should it not be possible to use normal term queries and other query types together with BM25 by just changing some scoring defaults? So replace Similarity and maybe have a switch inside the scorers. TermQuery could then be switched to BM25 mode and use another Scorer or something like that.

That was just my first impression; these additional classes do not look like a good public API to me. Query classes should be abstract wrappers for weights and scorers. The internal implementation, BM25 or conventional, should be hidden from the user (with maybe a property, e.g. on the IndexSearcher, to use BM25 scoring). This way it could also be used for other query types (not only TermQuery/BooleanQuery), e.g. for function queries (to further change the score), FuzzyQuery, and so on.

If what I said is complete nonsense, don't hurt me. I do not know much about BM25, but to me it is an implementation detail and not part of a public API.



jira at apache

Dec 3, 2009, 3:10 AM

Post #10 of 24 (580 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785272#action_12785272 ]

Robert Muir commented on LUCENE-2091:
-------------------------------------

Uwe, yes I agree it would be nicer to do a tighter integration.

I am suggesting we tackle it one step at a time: first we can answer this question, then we can talk about average document length and other tricky things like that.

For average document length, I calculate it from norms, as Doug suggested here: https://issues.apache.org/jira/browse/LUCENE-965?focusedCommentId=12515803&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12515803 , but let's get to this later.




jira at apache

Dec 3, 2009, 4:18 AM

Post #11 of 24 (580 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785284#action_12785284 ]

Robert Muir commented on LUCENE-2091:
-------------------------------------

Hi Yuval, I think we should also consider treating BM25F as the really relevant version for Lucene, as it is field-oriented (and dropping document-oriented BM25).

BM25F makes more sense for Lucene, in my opinion. If you are searching BODY for some keywords, who cares how long the AUTHOR field is!

I think narrowing the focus to BM25F would eliminate confusion about things such as "average document length"; we would then work with just fields and "average field length", and more people will have ideas.




jira at apache

Dec 3, 2009, 11:55 AM

Post #12 of 24 (572 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785473#action_12785473 ]

Otis Gospodnetic commented on LUCENE-2091:
------------------------------------------

+1 for skipping BM25 and going straight to BM25F.

I think the answer to Uwe's question about why this can't just be a different Similarity or some such is that BM25 requires some data that Lucene currently doesn't collect. That's why there were some of those static methods in the examples on the author's site. I *think* what I'm saying is correct. :)




jira at apache

Dec 3, 2009, 1:34 PM

Post #13 of 24 (570 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785538#action_12785538 ]

Joaquin Perez-Iglesias commented on LUCENE-2091:
------------------------------------------------

Hi everybody,

I'm going to try to answer some of your questions. When I started to develop this library I didn't want
to modify the Lucene code; instead I tried to create a jar that could be added straight to the official
Lucene distribution. That is the main reason why there are some duplicated classes.
So yes, a tighter integration would be better, and I believe we would get support for more query types.

Regarding BM25 versus BM25F: they are equivalent; BM25F is the version for more than one field, so yes, go for BM25F.
What is really important is the way boost factors are applied: as you can see in the equation, these must be applied to raw frequencies and not to normalized or saturated frequencies.
(Currently Lucene applies them after normalization and saturation of frequencies, which in my opinion is not the best approach.)
A more detailed explanation of BM25F and this issue can be found in this paper: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.9.5255

The problem, as I said, comes from IDF. In the BM25 family of equations, IDF is always computed at the document level (that is why
I recommend, as a heuristic, using the field with the most terms, or a special field that contains all the terms). As far as I know, that is a problem
because Lucene stores the document frequency per field, not per document.

Otis is right: as far as I know, just changing the Similarity is not enough. Some data is not available to either TermScorer or Similarity, and TermScorer
applies the values obtained from Similarity in a way that makes it incompatible with BM25.
It is really important to follow the steps as they appear in my explanation:

1. Normalize frequencies with document/field length and the b factor.
2. Saturate the effect of frequency with k1.
3. Compute the sum of the term weights.
4. Apply IDF.
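
The four steps above can be sketched in plain Java. This is only an illustration under stated assumptions, not the contributed patch and not Lucene API: the class Bm25fSketch, the FieldStat holder and all of its field names are hypothetical, the per-field length normalization is one common BM25F variant, and IDF is applied per term while summing, which is one reading of steps 3 and 4. Boosts are applied to the raw frequencies before saturation, as the post requires.

```java
import java.util.List;
import java.util.Map;

/** Toy BM25F scorer; none of these names are Lucene API. */
final class Bm25fSketch {

    /** Statistics for one term in one field of one document (illustrative names). */
    static final class FieldStat {
        final double rawTf;          // raw term frequency in the field
        final double boost;          // field boost, applied to the raw frequency
        final double fieldLength;    // number of terms in this field of the document
        final double avgFieldLength; // average length of this field over the collection
        final double b;              // length-normalization strength in [0, 1]

        FieldStat(double rawTf, double boost, double fieldLength,
                  double avgFieldLength, double b) {
            this.rawTf = rawTf;
            this.boost = boost;
            this.fieldLength = fieldLength;
            this.avgFieldLength = avgFieldLength;
            this.b = b;
        }
    }

    /** Step 1: boost the raw tf of each field, normalize by field length, and combine. */
    static double normalizedWeight(List<FieldStat> fields) {
        double weight = 0.0;
        for (FieldStat f : fields) {
            double lengthNorm = (1.0 - f.b) + f.b * (f.fieldLength / f.avgFieldLength);
            weight += f.boost * f.rawTf / lengthNorm;
        }
        return weight;
    }

    /** Step 2: saturate the combined weight with k1. */
    static double saturate(double weight, double k1) {
        return weight / (weight + k1);
    }

    /** Steps 3 and 4: sum the saturated term weights, each multiplied by its IDF. */
    static double score(Map<String, List<FieldStat>> statsByTerm,
                        Map<String, Double> idfByTerm, double k1) {
        double total = 0.0;
        for (Map.Entry<String, List<FieldStat>> e : statsByTerm.entrySet()) {
            double idf = idfByTerm.getOrDefault(e.getKey(), 0.0);
            total += idf * saturate(normalizedWeight(e.getValue()), k1);
        }
        return total;
    }

    public static void main(String[] args) {
        // One query term found once in a boosted title and once in the body.
        List<FieldStat> stats = List.of(
                new FieldStat(1, 3.0, 5, 5, 0.75),      // title, boost 3
                new FieldStat(1, 1.0, 100, 100, 0.75)); // body, boost 1
        double s = score(Map.of("lucene", stats), Map.of("lucene", 1.5), 2.0);
        System.out.println("score = " + s);
    }
}
```

Because the boost multiplies the frequency before saturate() is called, a large title boost can never push a single term's contribution above its IDF, which is the property the post argues for.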

I really believe that this can be done (not sure how), so maybe we will need the suggestions of some 'scorer guru'.



jira at apache

Dec 3, 2009, 2:14 PM

Post #14 of 24 (569 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785569#action_12785569 ]

Uwe Schindler commented on LUCENE-2091:
---------------------------------------

Thanks for the explanation!

About the IDF: The problem with a per-document IDF in Lucene would be that most users also add fields that are e.g. catch-all fields (which would give the IDF you want to have), but in addition they add special fields like numeric fields (which would not produce a good IDF; at the moment this IDF is ignored). Some users also add fields simply for sorting. So an IDF for documents is impossible with Lucene. You can only use e.g. catch-all fields (which are always a good idea for non-fielded searches, because ORing all fields together is slower than just indexing the same terms a second time in a catch-all field), e.g. "contents" contains all terms from "title", "subject", "mailtext" as an example for emails. But the IDF for BM25F could be taken from the "contents" field even when searching only for a title.
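
The catch-all idea can be illustrated with a small plain-Java sketch. It is an assumption-laden toy, not Lucene code: the class CatchAllIdfSketch and its methods are hypothetical, each document is reduced to the set of terms in its catch-all field, and the IDF formula is the classic Robertson/Sparck Jones form commonly used with BM25. In Lucene itself the document frequency would presumably come from the index's statistics for the term in the "contents" field.

```java
import java.util.List;
import java.util.Set;

/** Toy document-level IDF taken from a catch-all field; not Lucene API. */
final class CatchAllIdfSketch {

    /** Number of documents whose catch-all field contains the term. */
    static int docFreq(List<Set<String>> catchAllTermsPerDoc, String term) {
        int n = 0;
        for (Set<String> terms : catchAllTermsPerDoc) {
            if (terms.contains(term)) {
                n++;
            }
        }
        return n;
    }

    /**
     * Robertson/Sparck Jones IDF. Note that it becomes negative when the term
     * occurs in more than half of the documents.
     */
    static double bm25Idf(int docFreq, int numDocs) {
        return Math.log((numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        // "contents" holds the union of title, subject and mailtext terms per document.
        List<Set<String>> contents = List.of(
                Set.of("bm25", "scoring", "lucene"),
                Set.of("lucene", "index"),
                Set.of("lucene", "release"),
                Set.of("jira", "issue"));
        int df = docFreq(contents, "bm25");
        System.out.println("df = " + df + ", idf = " + bm25Idf(df, contents.size()));
    }
}
```

A query restricted to "title" would still look up the term's document frequency in "contents", so the IDF reflects the whole document rather than a single short field.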



jira at apache

Dec 3, 2009, 5:03 PM

Post #15 of 24 (566 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785690#action_12785690 ]

Otis Gospodnetic commented on LUCENE-2091:
------------------------------------------

Joaquin - could you please explain what you mean by "Saturate the effect of frequency with k1"? Thanks.



jira at apache

Dec 4, 2009, 1:43 AM

Post #16 of 24 (551 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785840#action_12785840 ]

Joaquin Perez-Iglesias commented on LUCENE-2091:
------------------------------------------------

Yes, sorry.

Basically, what we are trying to do is constrain the effect of the raw frequency (saturate the frequency).
In Lucene this is done by taking the square root of the frequency; another classical approach
is to use the log. Both approaches avoid giving the frequency a linear 'importance'.

BM25 is a bit tricky: it parametrises the 'saturation' of the frequency with a parameter k1, using the
equation weight(t)/(weight(t)+k1). Usually k1 is fixed to 2, but it can be set per collection.
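
To make the three damping approaches concrete, here is a minimal sketch that prints them side by side. The class and method names are hypothetical, the log variant (1 + ln freq) is just one common form, and the raw frequency is used as weight(t) purely for illustration; this is not code from the patch.

```java
public class TfSaturation {
    // Square root of the raw frequency, as in Lucene's default tf component.
    static double sqrtTf(double freq) {
        return Math.sqrt(freq);
    }

    // One common log damping variant (assumption): 1 + ln(freq) for freq > 0.
    static double logTf(double freq) {
        return freq > 0 ? 1.0 + Math.log(freq) : 0.0;
    }

    // BM25-style saturation: weight / (weight + k1). Always below 1 for k1 > 0,
    // and equal to 0.5 when weight == k1.
    static double bm25Saturation(double weight, double k1) {
        return weight / (weight + k1);
    }

    public static void main(String[] args) {
        double k1 = 2.0; // the usual default mentioned above
        for (int freq : new int[] {1, 2, 5, 10, 100}) {
            System.out.printf("freq=%d sqrt=%.3f log=%.3f bm25=%.3f%n",
                    freq, sqrtTf(freq), logTf(freq), bm25Saturation(freq, k1));
        }
    }
}
```

The square root and the log keep growing without bound, only more slowly than the raw frequency, while the BM25 term approaches 1; k1 controls how quickly it gets there.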



jira at apache

Dec 4, 2009, 3:20 AM

Post #17 of 24 (548 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785880#action_12785880 ]

Michael McCandless commented on LUCENE-2091:
--------------------------------------------

Could someone summarize what's missing in Lucene's index format, to allow a query's scorer to efficiently compute BM25?

It looks like these two are important for BM25F:

* How many times does term T occur in all fields for doc D?

* How many documents contain term T in any of their fields? (for computing document level IDF)

Is there anything else?

bq. Only simple boolean queries based on terms are supported (with operators or, and, not). For instance it does not support PhraseQuery.

This is concerning -- is there no way to score a PhraseQuery in BM25F?



jira at apache

Dec 4, 2009, 5:12 AM

Post #18 of 24 (541 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785918#action_12785918 ]

Simon Willnauer commented on LUCENE-2091:
-----------------------------------------

Joaquin,
bq. For example if we have indexed with 3 fields. F1, F2, F3, and the user want to search on F1, and F2 there is no way to compute docFreq in both fields. With a catch-all field we have docFreq for all fields.
Can you explain this in a little more detail? From my understanding, if I use a BooleanQuery a:x1 AND b:x2, TermWeight will calculate the IDF for Term(a, x1) and Term(b, x2). Am I missing something?



jira at apache

Dec 4, 2009, 11:17 AM

Post #19 of 24 (528 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786087#action_12786087 ]

Joaquin Perez-Iglesias commented on LUCENE-2091:
------------------------------------------------

Yes, you are right. What I meant was related to multi-field queries: if you search a:F1^F2, the right approach would be to compute IDF with docFreq(a,F1^F2), which in my understanding cannot be done.

If I'm right, Lucene does weight(a)*idf(a,F1) + weight(a)*idf(a,F2), and the correct approach would be weight(a)*idf(a,F1^F2).

That's the reason why Uwe (and I) suggested using IDF per field in the previous case and, if the query is executed on each field, using a kind of catch-all field to compute docFreq in all fields.
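
The difference between per-field docFreq and docFreq over the combined fields can be shown with a toy in-memory index. This is only a sketch: the class, its methods and the sample data are hypothetical and do not use the Lucene API, and the idf formula is the classic 1 + ln(numDocs/(docFreq+1)) form.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CombinedDocFreq {
    // field -> term -> ids of the documents containing the term in that field
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    private final Set<Integer> allDocs = new HashSet<>();

    void add(int docId, String field, String term) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new HashSet<>())
                .add(docId);
        allDocs.add(docId);
    }

    int numDocs() {
        return allDocs.size();
    }

    // Per-field document frequency, the statistic a per-field term lookup gives.
    int docFreq(String term, String field) {
        return postings.getOrDefault(field, Collections.emptyMap())
                       .getOrDefault(term, Collections.emptySet())
                       .size();
    }

    // Document-level frequency: documents containing the term in ANY of the fields.
    // A catch-all field would provide this number directly.
    int docFreqAnyField(String term, String... fields) {
        Set<Integer> union = new HashSet<>();
        for (String field : fields) {
            union.addAll(postings.getOrDefault(field, Collections.emptyMap())
                                 .getOrDefault(term, Collections.emptySet()));
        }
        return union.size();
    }

    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        CombinedDocFreq index = new CombinedDocFreq();
        index.add(0, "F1", "a");
        index.add(1, "F2", "a");
        index.add(2, "F1", "a");
        index.add(2, "F2", "a");
        index.add(3, "F1", "b");

        int n = index.numDocs();
        System.out.println("idf(a,F1)    = " + idf(index.docFreq("a", "F1"), n));
        System.out.println("idf(a,F2)    = " + idf(index.docFreq("a", "F2"), n));
        System.out.println("idf(a,F1^F2) = " + idf(index.docFreqAnyField("a", "F1", "F2"), n));
    }
}
```

Document 2 contains the term in both fields, so the per-field counts cannot simply be added to obtain the document-level count; that is why a catch-all field (or an equivalent statistic) is needed.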

(Michael)
In summary it will be nice to have:

1. docFreq at document level, something like "int docFreq(term, doc_id)" and return the number of documents where term occurs, but if it is not possible a catch-all field will be enough.
2. The Collection Average Document Length and Collection Average Field Length (per each field).

I don't think that we need "How many times does term T occur in all fields for doc D", frequency is necessary per field and not per document.

I don't know too much about the implementation of PhraseQuery, but I think that should be possible to implement BM25F for it (and any other query type), as far as frequency and docFreq of the phrase/terms are available.

At this point it is not supported in the patch, but I don't see any reason why it couldn't be implemented, other than that I don't really know how to do it :-).




jira at apache

Dec 4, 2009, 12:25 PM

Post #20 of 24 (525 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786135#action_12786135 ]

Michael McCandless commented on LUCENE-2091:
--------------------------------------------

bq. I think that should be possible to implement BM25F for it

Ahh OK I just misunderstood -- BM25 can score PhraseQuery; it's just that the current patch doesn't implement that.

bq. 1. docFreq at document level, something like "int docFreq(term, doc_id)" and return the number of documents where term occurs, but if it is not possible a catch-all field will be enough.

OK, catch all seems like an OK starting point. I wonder if we could enable storing terms dict but not postings... then we could store catch all just for the terms stats, so we wouldn't waste disk space. Though merging gets tricky, since we'd have to walk postings for all fields (or at least all involved in BM25F) in parallel, re-computing the catch-all stats.

bq. 2. The Collection Average Document Length and Collection Average Field Length (per each field).

Lucene doesn't store/compute this today... we can easily compute these stats for newly created segments, and record in the segments file, but then recomputing them during segment merging with deletions gets tricky. We could just take the linear approximate avg with deletions, but that may end up being too approximate, so we could instead make a dedicated posting list, which would be properly merged, but we'd then have to re-walk to compute the stats for the newly merged segment.
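
As a rough illustration of the bookkeeping involved, here is a sketch of per-segment length stats and how they combine on merge. Everything in it is hypothetical (class, fields, methods), nothing here exists in Lucene's index format, and the deletion handling reflects one reading of "linear approximate": assume the deleted docs had the segment's average length.

```java
public class FieldLengthStats {
    final long sumLength; // total number of tokens in this field across the segment
    final int docCount;   // number of documents counted in sumLength

    FieldLengthStats(long sumLength, int docCount) {
        this.sumLength = sumLength;
        this.docCount = docCount;
    }

    double average() {
        return docCount == 0 ? 0.0 : (double) sumLength / docCount;
    }

    // Merging without deletions is exact: sums and counts simply add up.
    static FieldLengthStats merge(FieldLengthStats a, FieldLengthStats b) {
        return new FieldLengthStats(a.sumLength + b.sumLength, a.docCount + b.docCount);
    }

    // Approximation when docs were deleted and their individual lengths are unknown:
    // scale the sum as if every deleted doc had the average length.
    FieldLengthStats withDeletionsApprox(int deletedDocs) {
        int live = docCount - deletedDocs;
        long approxSum = Math.round(average() * live);
        return new FieldLengthStats(approxSum, live);
    }

    public static void main(String[] args) {
        FieldLengthStats seg1 = new FieldLengthStats(100, 10);
        FieldLengthStats seg2 = new FieldLengthStats(300, 10);
        FieldLengthStats merged = merge(seg1.withDeletionsApprox(5), seg2);
        System.out.println("approximate average field length = " + merged.average());
    }
}
```

The approximation drifts whenever the deleted docs are systematically longer or shorter than average, which is the case where re-walking the per-doc lengths during the merge would be needed for an exact value.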



jira at apache

Dec 4, 2009, 1:03 PM

Post #21 of 24 (524 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786164#action_12786164 ]

Grant Ingersoll commented on LUCENE-2091:
-----------------------------------------

I haven't looked at the patch yet, but...

Should we take just a small step back and consider what it would take to actually make scoring more pluggable instead of just thinking about how best to integrate BM25? In other words, someone else has also donated an implementation of the Axiomatic Retr. Function. Much like BM25, I think it also requires avg. doc length, as does (I believe) language modeling and some other approaches. Of course, we need to do this in a way that doesn't hurt performance for the default case.

I'm also curious if anyone has compared BM25 w/ a Lucene similarity that uses a different length normalization factor? I've seen many people use a different len. norm with good success, but it isn't necessarily for everyone.



jira at apache

Dec 4, 2009, 2:37 PM

Post #22 of 24 (519 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786207#action_12786207 ]

Robert Muir commented on LUCENE-2091:
-------------------------------------

bq. In other words, someone else has also donated an implementation of the Axiomatic Retr. Function.

I've never been able to get that scoring function to do anything more than be consistently worse than the default Lucene formula. I tried at least 3 test collections with it...

bq. I'm also curious if anyone has compared BM25 w/ a Lucene similarity that uses a different length normalization factor? I've seen many people use a different len. norm with good success, but it isn't necessarily for everyone.

Yes in the image posted here, I tried modifying length normalization with SweetSpot etc as others have done in the past. For this corpus I was unable to improve it in this way.

But maybe I made some mistakes in both these cases, so anyone feel free to try this themselves.




jira at apache

Dec 4, 2009, 2:59 PM

Post #23 of 24 (521 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786220#action_12786220 ]

Grant Ingersoll commented on LUCENE-2091:
-----------------------------------------

bq. Yes in the image posted here, I tried modifying length normalization with SweetSpot etc as others have done in the past. For this corpus I was unable to improve it in this way.

Yeah, can't speak for SweetSpot, but there are other approaches too that don't favor shorter docs all the time.



jira at apache

Dec 4, 2009, 3:03 PM

Post #24 of 24 (521 views)
Permalink
[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene [In reply to]

[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786223#action_12786223 ]

Robert Muir commented on LUCENE-2091:
-------------------------------------

bq. Yeah, can't speak for SweetSpot, but there are other approaches too that don't favor shorter docs all the time.

This is why it would be interesting if someone came up with a different modification (preferably one that isn't corpus-specific tuning) that actually works for that one corpus.

It's in openrelevance svn, anyone can try.

