Mailing List Archive: kinosearch: discuss

Weight requires a Similarity


sprout at cpan

Mar 1, 2008, 10:57 AM

Post #1 of 4
Weight requires a Similarity

I just got this error message when testing my RegexpTermWeight class:

Error in function kino_Weight_init_from_hash at ../c_src/KinoSearch/Search/Weight.c:24: Can't find 'similarity'

Although this is not documented, Weight apparently requires a
Similarity object to be passed to its constructor. Is this permanent
(in which case I can write a doc patch), or is it going to change?

Note that this also prevents the example in ::Cookbook::WildCardQuery
from working.


_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch


marvin at rectangular

Mar 1, 2008, 12:56 PM

Post #2 of 4
Re: Weight requires a Similarity [In reply to]

On Mar 1, 2008, at 10:57 AM, Father Chrysostomos wrote:

> I just got this error message when testing my RegexpTermWeight class:
>
> Error in function kino_Weight_init_from_hash at ../c_src/KinoSearch/Search/Weight.c:24: Can't find 'similarity'
>
> Although this is not documented, Weight apparently requires a
> Similarity object to be passed to its constructor. Is this permanent
> (in which case I can write a doc patch)? or is this going to change?

It's permanent. I've committed a doc patch, and would appreciate
review:

http://xrl.us/bgzv6

Similarity objects are assigned via Schema, almost exactly like
Analyzers. The Schema itself has one primary Similarity; individual
FieldSpecs may override FieldSpec::similarity() to provide another.
The only difference is that Schema::analyzer() is an abstract method
that every subclass has to implement, while Schema::similarity()
returns a standard Similarity object by default.
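
To make the lookup order concrete, here is a minimal sketch of a schema
with one primary similarity and optional per-field overrides. It is an
illustration only: the Sketch::* packages are hypothetical stand-ins,
not KinoSearch classes, and the fallback to the primary similarity for
a field with no override is an assumption about how such a lookup
would behave.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Similarity object (not a KinoSearch class).
package Sketch::Similarity;
sub new { my ( $class, %args ) = @_; return bless {%args}, $class }
sub name { $_[0]{name} }

# Hypothetical stand-in for a Schema holding one primary similarity
# plus optional per-field overrides.
package Sketch::Schema;
sub new {
    my ( $class, %args ) = @_;
    return bless {
        primary  => Sketch::Similarity->new( name => 'default' ),
        by_field => $args{by_field} || {},
    }, $class;
}

# Return the field's own similarity if one was registered,
# otherwise the schema's primary similarity (assumed behavior).
sub fetch_sim {
    my ( $self, $field ) = @_;
    return $self->{primary} unless defined $field;
    return $self->{by_field}{$field} || $self->{primary};
}

package main;

my $schema = Sketch::Schema->new(
    by_field => { title => Sketch::Similarity->new( name => 'title_sim' ) },
);
print $schema->fetch_sim('title')->name, "\n";    # field with an override
print $schema->fetch_sim('body')->name,  "\n";    # field without one
```

Here 'title' resolves to its own similarity, while 'body' falls back to
the primary one.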

If every Weight subclass were associated with a field, it would be
possible to automatically retrieve the correct similarity like so:

my $sim = $searchable->get_schema->fetch_sim($field);

However, some Weight subclasses don't have a field -- e.g.
BooleanWeight.

It's tempting to default to the Schema's primary similarity, but
that's not failsafe design. If a field-specific Weight subclass fails
to supply a value for "similarity", it should trigger an error rather
than the silently incorrect behavior of defaulting to the wrong object.
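
A minimal sketch of that fail-fast choice, for illustration only:
Sketch::Weight is a hypothetical class, not the KinoSearch
implementation, and the error text simply echoes the message quoted
earlier in this thread. The constructor refuses to run without a
"similarity" argument rather than quietly substituting a default.

```perl
use strict;
use warnings;

# Hypothetical Weight-like class (not the KinoSearch implementation).
package Sketch::Weight;
use Carp qw( croak );

sub new {
    my ( $class, %args ) = @_;

    # Fail loudly instead of defaulting to a possibly wrong object.
    croak("Can't find 'similarity'") unless defined $args{similarity};
    return bless { similarity => $args{similarity} }, $class;
}
sub get_similarity { $_[0]{similarity} }

package main;

# Supplying a similarity succeeds.
my $ok = Sketch::Weight->new( similarity => 'field_sim' );
print $ok->get_similarity, "\n";

# Omitting it dies; eval traps the error so the script can report it.
my $weight = eval { Sketch::Weight->new() };
print "construction failed: $@" unless $weight;
```

A subclass that forgets the argument is caught at construction time,
which is easier to debug than scores computed with the wrong object.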

> Note that this also prevents the example
> in ::Cookbook::WildCardQuery from working.


I changed the Cookbook example in the commit as well. However, both
Weight and Cookbook::WildCardQuery now refer to Schema::fetch_sim,
which isn't currently exposed as a public method. We'll need to fix
that.

I think Nathan will argue that Schema::fetch_sim should be renamed to
Schema::fetch_similarity() before it goes public. If he does, he's
probably right. :)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



sprout at cpan

Mar 1, 2008, 1:24 PM

Post #3 of 4
Re: Weight requires a Similarity [In reply to]

On Mar 1, 2008, at 12:56 PM, Marvin Humphrey wrote:

> It's permanent. I've committed a doc patch, and would appreciate
> review:

Sounds fine to me.

> I think Nathan will argue that Schema::fetch_sim should be renamed
> to Schema::fetch_similarity() before it goes public. If he does,
> he's probably right. :)

Six of one or half a dozen of the other. :-)



nate at verse

Mar 3, 2008, 3:49 PM

Post #4 of 4
Re: Weight requires a Similarity [In reply to]

On 3/1/08, Marvin Humphrey <marvin [at] rectangular> wrote:
> I think Nathan will argue that Schema::fetch_sim should be renamed to
> Schema::fetch_similarity() before it goes public. If he does, he's
> probably right. :)

Well, depending on how curmudgeonly-versus-practical I'm being, I'd
probably argue that the whole concept of a 'Similarity' object is
specific to the TF/IDF scheme, and that while you are at it you should
get rid of 'Weight' as well. But I appreciate the sentiment, and am
glad you changed to Scorer::get_doc_num.

I like that you've internalized my objections, since I've been too
busy to closely read the list just lately. For the benefit of the
archive, it might be good to have 'my' objections in print. If it
would help, I can give you access to my email account so you can write
them yourself. :)

If I could have a word with the homunculus of me, though, I'd
encourage it to have you write up a high level architecture document
describing the life cycle of a query. If I were to have a further
word with it, I'd encourage it to keep nagging you until that
architecture has been simplified to fit on one side of a 3"x5" card
without omitting any essential components!

Nathan Kurz
nate [at] verse