
nate at verse
Apr 19, 2008, 12:10 PM
Post #5 of 8
On Fri, Apr 18, 2008 at 7:50 PM, Marvin Humphrey <marvin [at] rectangular> wrote:
> > My desire for simplicity makes me wonder if
> > one could just have a single 'QueryNode' class that instantiates a
> > customizable Scorer.
>
> I don't quite follow.

Instead of building a tree of different classes of Query, it seems simpler to me to build a tree out of nodes of the same type and move the class to a field:

    QueryNode:
      scorer: "KSx::MyScorer"
      children: [Child1, Child2]

Probably just my quirk, though. I've never liked subclassing for the sake of avoiding function pointers (or their Perlish equivalents). I'd also want a better name than QueryNode. ;)

So long as this tree can be easily parsed and 'optimized', I guess I don't have a problem with the current approach, though.

> You mean how would you persuade QueryParser to use your ORQuery variant
> rather than the default?

Yes, I'm wondering how to get a variant to actually be used. As it is, the official way seems to be to rewrite QueryParser to use my own classes, but this seems onerous. Or one could post-process the Query tree and swap in the custom class. Alternatively, one could take the approach I did before I bogged down, and conclude that it's simpler to skip the indirection and build the Scorer tree directly.

> Probably we'd need to give QueryParser some sort
> of make_orquery() factory method you could override.
> I'm not sure I want that to happen right away in core, though.
> QueryParser-type classes are sadly prone to death by Featuritis. This is
> the kind of thing I'd rather see refined via KSx.

Definitely the custom scorer should go in KSx (or in some other userspace), but there needs to be some way to use this class without writing a lot of other infrastructure. Either QueryParser needs to be more easily subclassable, or it needs to have customizable types (skip the factories; all we need is a class name string), or there needs to be a hook to post-process the Query tree (s/// for trees, a la XSLT).

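To make the single-node-type idea and the post-processing hook a bit more concrete, here is a minimal sketch. It is in Python rather than Perl, and apart from the QueryNode layout and the "KSx::MyScorer" string above, every name in it (rewrite, ORScorer, TermScorer, the term field) is made up for illustration. None of this is KinoSearch API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QueryNode:
    """One node type for the whole tree; behavior is selected by `scorer`."""
    scorer: str                                   # class-name string
    children: List["QueryNode"] = field(default_factory=list)
    term: str = ""                                # only meaningful on leaves


def rewrite(node: QueryNode, swaps: Dict[str, str]) -> QueryNode:
    """Return a copy of the tree with scorer names replaced per `swaps`."""
    return QueryNode(
        scorer=swaps.get(node.scorer, node.scorer),
        children=[rewrite(child, swaps) for child in node.children],
        term=node.term,
    )


# What a parser might emit for 'foo OR bar' (scorer names are hypothetical).
parsed = QueryNode("ORScorer", [
    QueryNode("TermScorer", term="foo"),
    QueryNode("TermScorer", term="bar"),
])

# Swap in the custom OR variant without touching the parser itself.
custom = rewrite(parsed, {"ORScorer": "KSx::MyScorer"})
```

The point of the sketch is only that, with the class moved into a field, "use my variant" becomes a string substitution over the tree rather than a QueryParser subclass.
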
> QueryParser doesn't parse 'NOT brobniquitz' down to a NOTQuery because
> it's standard behavior for search engines to parse that kind of thing
> as a void query with no result set rather than return the universe.

I strongly think you want to 'return the universe' here. If you design the system so it doesn't choke on large result sets, it will be truly industrial strength and multi-purpose. Instead of thinking about this as a search engine (with standard search engine constraints), think of KinoSearch as a general-purpose database with some really cool retrieval functions. Make it strong and fearless!

> > > ANDORQuery is the odd one out, because it doesn't really mean
> > > 'a AND/OR b'.
> >
> > Ditto. Why not just layer an AND and an OR?
>
> I don't think that's quite the same thing??

I was shooting from the hip, but I think 'A AND (A OR B)' would produce the same results once normalized. Given the way caching works, this probably isn't actually that expensive, but I can certainly see why it isn't perfect. Alternatively, one could allow OrScorer a non-zero no-match score, or come up with an 'OptionalTermScorer'.

But you are probably right: while I like the building-block simplicity of these approaches, it's not that bad to have a custom Scorer for this situation. Although "Term AND OptionalTerm" is pretty clear too. If you do go with a RequiredAndOptionalScorer, though, I'd request that it be able to handle arbitrary subqueries under the Required half, rather than just straight Terms.

> Or, even better, say you have a simple TermQuery, and you
> find out that the term isn't in the index (because $searchable->doc_freq
> returns 0). Then you can just return undef (indicating a null result set)
> instead of a Scorer.

This doesn't really strike me as an 'even better', but as a recipe for poorly handled rare error conditions. A query with no results is going to be very fast to run, so this optimization isn't really saving much.
And it definitely made my code paths uglier trying to handle this case. I'd much prefer to get an actual empty set of results after running the search (which I need to handle anyway) rather than have this special case.

> There is actually quite a lot that happens in between a Query and a Scorer.
> That's where the "Weight" classes come in - they encapsulate the process of
> compiling a Query to a Scorer.

Any chance you could write up what actually happens here? And then, perhaps feeling too embarrassed to publish the as-builts, rework this part of the architecture to make it simple, streamlined, and 3x5 cardable? ;)

Nathan Kurz
nate [at] verse

> > ps. The ice cream goes pretty well: http://screamsorbet.com/
>
> Beet Lemon Sorbet! Awesome.

Yeah, it's surprisingly good. I'll send you up some once we figure out the right way to package it for shipping.

_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch