
Mailing List Archive: Lucene: Java-User

Custom scoring algorithm

 

 



gimenete at gmail

Nov 13, 2009, 6:42 AM

Post #1 of 2
Custom scoring algorithm

Hi.

I am developing an application and I would like to add search
capabilities to it. I have a database of items. Each item has a number of
"features", each with a numeric value, for example feature_x=100,
feature_y=200. Items can share features or have different ones, and the
number of features varies from item to item. Across the whole
application there can be hundreds of different features. I need users to be
able to query by features, and I need the results to be sorted by the
sum of the queried feature values. For example, if a user searches for "x, y,
z", the first result should be the item with the greatest
(feature_x + feature_y + feature_z).
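To pin down the ordering being asked for, here is a minimal plain-Java sketch of it, independent of Lucene. It assumes items are held in memory as maps from feature name to value (a hypothetical representation; the post only says the items live in a database) and that an item must have at least one of the queried features to match. The class and method names are illustrative, and Map.of/List.of need Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class FeatureRanking {

    // Score of one item: the sum of its values for the queried features.
    // A queried feature the item lacks contributes 0.
    static int score(Map<String, Integer> item, List<String> query) {
        int sum = 0;
        for (String feature : query) {
            sum += item.getOrDefault(feature, 0);
        }
        return sum;
    }

    // Ids of the items that have at least one queried feature,
    // ordered by descending score.
    static List<String> rank(Map<String, Map<String, Integer>> items,
                             List<String> query) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> e : items.entrySet()) {
            if (query.stream().anyMatch(e.getValue()::containsKey)) {
                ids.add(e.getKey());
            }
        }
        ids.sort(Comparator.comparingInt(
                (String id) -> score(items.get(id), query)).reversed());
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> items = Map.of(
                "item1", Map.of("x", 100, "y", 200),
                "item2", Map.of("x", 50, "z", 400),
                "item3", Map.of("w", 999));
        // item2: 50 + 400 = 450, item1: 100 + 200 = 300, item3 does not match
        System.out.println(rank(items, List.of("x", "y", "z")));  // [item2, item1]
    }
}
```

Whatever Lucene setup is chosen, its ranking can be checked against this reference ordering.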

I think Lucene could be a solution, but I don't need some of its
features, such as Analyzers, Filters, Tokenizers... I think I should
implement my own scoring algorithm. What is the easiest way to do
that? I have read something about payloads. Could they be useful
for my needs?
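For reference, a Lucene payload is an arbitrary byte array stored with an individual term occurrence, so one way to use payloads here would be to index each feature name as a token and attach the feature's value to that token. The sketch below covers only the byte-level part, in plain Java, under an assumed 4-byte float layout that Lucene does not prescribe. The Lucene side (a TokenFilter that sets the payload at index time, and a payload-aware query plus a Similarity overriding scorePayload at search time) is not shown, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

class FeaturePayload {

    // Hypothetical payload layout: the feature value as a 4-byte
    // big-endian float (ByteBuffer's default byte order).
    static byte[] encode(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    // Reads the value back from a payload that starts at 'offset'
    // inside a possibly larger byte array, using the same layout.
    static float decode(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, 4).getFloat();
    }

    public static void main(String[] args) {
        byte[] payload = encode(110f);
        System.out.println(payload.length + " bytes -> " + decode(payload, 0));
        // prints "4 bytes -> 110.0"
    }
}
```

With this layout the full 32-bit value survives the round trip, so a scorer reading it back could add the values of the matched features directly.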

My first attempt was to generate dumb strings containing each feature
name repeated as many times as the feature value, for example "x, x, x,
x, y, y, z, z". Of course I know this is a very poor solution, and I
have also seen that it doesn't work as I expected, because the default
scoring algorithm is much more complex than just counting words.
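To see why counting repetitions doesn't give the sum, consider term frequency alone: Lucene 2.x's DefaultSimilarity computes tf(freq) as sqrt(freq), and the score also multiplies in idf, a length norm of 1/sqrt(number of terms in the field), coord and queryNorm. The plain-Java sketch below isolates only the sqrt(freq) factor, using two hypothetical documents, and shows that this factor by itself can reverse the intended order.

```java
import java.util.Map;

class RepeatedTokenScoring {

    // Mirrors DefaultSimilarity's tf: square root of the term frequency.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Intended score: raw repetition counts summed.
    static int naive(Map<String, Integer> counts, String... query) {
        int s = 0;
        for (String q : query) s += counts.getOrDefault(q, 0);
        return s;
    }

    // tf-only part of the default score: sqrt of each count, summed.
    // idf, length norm, coord and queryNorm are deliberately left out.
    static double tfOnly(Map<String, Integer> counts, String... query) {
        double s = 0;
        for (String q : query) s += tf(counts.getOrDefault(q, 0));
        return s;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = Map.of("x", 9);          // "x" repeated 9 times
        Map<String, Integer> b = Map.of("x", 4, "y", 4);  // "x" and "y" 4 times each
        System.out.println(naive(a, "x", "y") + " vs " + naive(b, "x", "y"));   // 9 vs 8
        System.out.println(tfOnly(a, "x", "y") + " vs " + tfOnly(b, "x", "y")); // 3.0 vs 4.0
    }
}
```

Document a should rank first by the sum, but after the square root document b is ahead, before the other factors are even applied.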

Thank you very much in advance.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


gimenete at gmail

Nov 13, 2009, 9:09 AM

Post #2 of 2
Re: Custom scoring algorithm [In reply to]

Hi again.

I've made a proof of concept using the boost factor: I add one field
per feature and set that field's boost to the feature value.

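import java.io.IOException;
import java.util.Map;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

private static void addDocument(String id, Map<String, Integer> features,
        IndexWriter writer) throws IOException {
    Document doc = new Document();

    // Stored so it can be read back from hits, but not searchable.
    doc.add(new Field("id", id, Field.Store.YES, Field.Index.NO));

    // One field per feature: the field name and its only token are the
    // feature name, and the feature value becomes the field boost.
    for (Map.Entry<String, Integer> feature : features.entrySet()) {
        Field field = new Field(feature.getKey(), feature.getKey(),
                Field.Store.YES, Field.Index.ANALYZED);
        field.setBoost(feature.getValue());
        doc.add(field);
    }

    writer.addDocument(doc);
}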

But I'm not sure this is the best way to do it.
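One caveat with the boost approach: in Lucene 2.x a field boost is not stored as-is. It is multiplied into the field's norm (together with the document boost and the length norm, which is 1.0 for a one-token field), and the norm is encoded into a single byte that keeps only 3 significant bits. The plain-Java sketch below re-implements that encoding, on the assumption that it matches SmallFloat.floatToByte315 and byte315ToFloat as used by the default Similarity.encodeNorm, to show how nearby feature values collapse.

```java
class NormPrecision {

    // Assumed to mirror Lucene's SmallFloat.floatToByte315: keeps the
    // implicit leading 1 plus 2 mantissa bits, truncating the rest.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            // zero, negative, or too small to represent
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1;  // too large: clamp to the largest code (0xFF)
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Assumed to mirror SmallFloat.byte315ToFloat.
    static float decode(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        for (float boost : new float[] {1f, 100f, 110f, 127f, 200f}) {
            System.out.println(boost + " -> " + decode(encode(boost)));
        }
        // 1.0 -> 1.0, 100.0 -> 96.0, 110.0 -> 96.0, 127.0 -> 112.0, 200.0 -> 192.0
    }
}
```

So items with feature values 100 and 110 would be indistinguishable through their boosts. Independently of this rounding, the default score also multiplies in per-feature idf and the coord factor, so the ranking is a weighted combination of the boosts rather than their plain sum.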

Thanks.


On Fri, Nov 13, 2009 at 3:42 PM, Alberto Gimeno <gimenete [at] gmail> wrote:

