
nate at verse
Sep 6, 2007, 12:17 AM
Post #3 of 5
After sending you this message, I worked on this particular approach enough to convince myself that it was a dead end. I've moved on to another approach that I hope you'll like better, and while I don't have it working yet I think I'm convinced it can be made to work. There is still a Match structure, but it doesn't nest. Instead, the existing Scorer hierarchy is used. I'll send you more details in a followup message, and respond more generally here.

> FWIW, I've written up the naming principles I follow as a blog entry:
> <http://www.rectangular.com/blog/my_name_is_variable.html>.

Yes, I can agree with most of that. Personally, I have less fear of long identifiers and find 'string_compare' to be clearer in purpose than 'scomp'. :)

> One of your inventions is Scorer_Advance. I like it as a substitute
> for Scorer_Next, and it might be worth a global search and replace
> since that method isn't public yet. :) However, in your code it
> appears to be a substitute for Scorer_Skip_To.

I'm hoping to collapse those two down to a single function. Currently, I'm thinking that function is Scorer_Match(), to emphasize that the contents of the Match struct are available only until the next such call, in a manner parallel to Scorer_Tally().

> From a DRY standpoint, it would be nice to have a single
> PhraseScorer working over sub-scorers rather than having one which
> uses sub-Scorers and one which uses PostingLists.

Yes, I think that PhraseScorer should use a subscorer and not PostingLists. That said, it may be simpler to restrict the complexity of that subscorer, at least temporarily, so that we don't have to start with a fully recursive phrase scorer. Something like allowing:

    PhraseScorer -> AndScorer -> [TermScorer TermScorer TermScorer]

and not yet handling:

    PhraseScorer -> AndScorer -> [OrScorer PhraseScorer AndScorer]

> I think similar reasoning led you to Match and me to Tally.

Well, that and the hope that if I paralleled Match and Tally you'd like the idea better :).
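To make the single Scorer_Match() call discussed above concrete, here is a minimal sketch in C. It is not the KinoSearch code: the Posting, Match, and TermScorer layouts, the TermScorer_init and TermScorer_match names, and the `target` parameter are all assumptions made for illustration. The only ideas taken from the message are that one call covers both "next" and "skip to", and that the returned Match is valid only until the following call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the actual KinoSearch structs. */
typedef struct Posting {
    uint32_t        doc_num;
    size_t          num_positions;
    const uint32_t *positions;
} Posting;

typedef struct Match {
    uint32_t        doc_num;
    size_t          num_positions;
    const uint32_t *positions;    /* borrowed from the current posting */
} Match;

typedef struct TermScorer {
    const Posting *postings;      /* sorted by ascending doc_num */
    size_t         num_postings;
    size_t         tick;          /* index of the next unread posting */
    Match          match;         /* overwritten by every match call */
} TermScorer;

void
TermScorer_init(TermScorer *self, const Posting *postings,
                size_t num_postings)
{
    assert(self != NULL);
    self->postings            = postings;
    self->num_postings        = num_postings;
    self->tick                = 0;
    self->match.doc_num       = 0;
    self->match.num_positions = 0;
    self->match.positions     = NULL;
}

/* Move to the first unread posting whose doc_num is >= target and return
 * a borrowed Match, or NULL once the postings are exhausted.  A target of
 * 0 behaves like a plain "next"; a larger target behaves like "skip to".
 * The returned struct is reused, so it is valid only until the next call. */
const Match*
TermScorer_match(TermScorer *self, uint32_t target)
{
    assert(self != NULL);
    while (self->tick < self->num_postings
           && self->postings[self->tick].doc_num < target) {
        self->tick++;
    }
    if (self->tick >= self->num_postings) {
        return NULL;
    }
    const Posting *posting    = &self->postings[self->tick++];
    self->match.doc_num       = posting->doc_num;
    self->match.num_positions = posting->num_positions;
    self->match.positions     = posting->positions;
    return &self->match;
}
```

Because every call hands back the same struct, a caller that needs a match beyond the next call has to copy it, which is the constraint the Scorer_Match() name is meant to advertise.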
> > The trickiness (and I don't like trickiness) is that each Match is
> > allowed to contain either an array of positions, or an array of Match
> > structs:
>
> I doubt that's necessary. Just create a default wrapper at the
> lowest level. That's how TermScorer does things presently.

I fear the trickiness is still necessary at some level, but I think I've managed to hide it in a place you'll like better. Essentially, I'm going to propose two main subclasses for Scorer, MultiScorer and MatchScorer. MultiScorers contain a public VArray of other Scorers, while MatchScorers contain a public Match struct.

> This variable name violates my "avoid overload overload" rule. :)
> "field" has a very specific meaning in the context of KS and this
> isn't it.

I agree with you in general, but I thought this was the specific meaning. It's removed from Match in my new incarnation, but what would you prefer it to be called: 'index_field', 'field_num'?

> I think we can avoid this union. See below.

Yes, it's gone in the current incarnation. Unfortunately, what it switches to is a run-time type check for OBJ_IS_A(MultiScorer).

> This was the driving factor behind the ScoreProx class.

I've forgotten the details, but I came to the conclusion that ScoreProx was at odds with Rich Positions, and that to allow a Proximity type scorer to use Positions specific weights some wider interface was needed.

> A better name for the ScoreProx class would be appreciated. :) It's
> the worst class name in KS, and the "num_sproxen" member var in Tally
> is the worst member var name. :)

> Collation of positions gets complicated when these scorers are nested.

It's possible we are defining terms differently here, but my current plan is that there never will be any collation. Instead, the MultiScorers (AndScorer, OrScorer) will allow their children's Match structs to be accessed directly. I tried to pursue collation at one point, and gave up: positions from multiple fields, phrases of different lengths.
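As a rough illustration of the MultiScorer / MatchScorer split and of reading the children's Match structs directly instead of collating them, here is a small C sketch. Everything in it is assumed for the example: the ScorerKind tag stands in for the OBJ_IS_A(MultiScorer) run-time check, a plain pointer array stands in for the VArray, and Scorer_visit_matches and count_positions are hypothetical helpers, not KinoSearch functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tag standing in for an OBJ_IS_A() run-time check. */
typedef enum { SCORER_MATCH, SCORER_MULTI } ScorerKind;

typedef struct Match {
    uint32_t        doc_num;
    size_t          num_positions;
    const uint32_t *positions;
} Match;

typedef struct Scorer {
    ScorerKind kind;
} Scorer;

/* Leaf scorer: exposes one Match struct. */
typedef struct MatchScorer {
    Scorer base;                /* must be the first member */
    Match  match;
} MatchScorer;

/* Composite scorer (e.g. AndScorer, OrScorer): exposes its children. */
typedef struct MultiScorer {
    Scorer   base;              /* must be the first member */
    Scorer **children;          /* stand-in for a VArray of Scorers */
    size_t   num_children;
} MultiScorer;

typedef void (*MatchVisitor)(const Match *match, void *context);

/* Walk the scorer tree and hand every leaf Match to the visitor.
 * Nothing is copied or merged: each Match is read where it lives. */
void
Scorer_visit_matches(const Scorer *scorer, MatchVisitor visit, void *context)
{
    assert(scorer != NULL && visit != NULL);
    if (scorer->kind == SCORER_MULTI) {
        const MultiScorer *multi = (const MultiScorer*)scorer;
        for (size_t i = 0; i < multi->num_children; i++) {
            Scorer_visit_matches(multi->children[i], visit, context);
        }
    }
    else {
        const MatchScorer *leaf = (const MatchScorer*)scorer;
        visit(&leaf->match, context);
    }
}

/* Example visitor: total up the positions across all leaves. */
void
count_positions(const Match *match, void *context)
{
    *(size_t*)context += match->num_positions;
}
```

A caller would pass the root scorer, a visitor, and a context pointer, for example a size_t accumulator with count_positions. The type check is paid once per node, and each positions array is read in place.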
On the bright side, direct access is very efficient!

More tomorrow about what I'm currently aiming for. It's still rough, but I think you'll like it better than the initial proposal.

Nathan Kurz
nate [at] verse

_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch