marvin at rectangular
Jun 21, 2007, 4:30 PM
Post #2 of 3
On Jun 21, 2007, at 3:25 PM, Chris Nandor wrote:
> Can't locate object method "get_size" via package
> "KinoSearch::Index::MultiLexicon" at
> line 159, <GEN0> line 1.
Mmf. OK, no big deal. This is much easier to solve than the last
one you threw my way. :)
A Lexicon's "size" is the number of terms it holds. We can't know
the size of a MultiLexicon until we've iterated over the entire thing
once. We can know the number of terms each SegLexicon in the
MultiLexicon holds, but we don't know how many terms overlap. The
iterator uses a PriorityQueue which checks for duplicates, though, so
if we start at the top and count how many times
Lex_Next(multi_lexicon) returns true, we have the size.
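
A minimal sketch of that counting pass, under loud assumptions: ToySegLex, ToyMultiLex, toy_multi_lex_next and toy_multi_lex_count are hypothetical names, not KinoSearch API, and a linear scan over the segment heads stands in for the PriorityQueue. Each segment is assumed to hold sorted, unique terms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a SegLexicon: a sorted array of unique terms. */
typedef struct {
    const char **terms;
    size_t       size;   /* number of terms in this segment */
    size_t       pos;    /* iteration cursor */
} ToySegLex;

/* Hypothetical stand-in for a MultiLexicon spanning several segments. */
typedef struct {
    ToySegLex  *segs;
    size_t      n_segs;
    const char *current;   /* term most recently returned */
    size_t      term_num;  /* unique terms seen so far */
} ToyMultiLex;

/* Advance to the next unique term. Returns 1 on success, 0 when exhausted. */
int
toy_multi_lex_next(ToyMultiLex *self)
{
    const char *least = NULL;
    size_t i;

    /* Find the smallest term at the head of any segment. */
    for (i = 0; i < self->n_segs; i++) {
        ToySegLex *seg = &self->segs[i];
        if (seg->pos < seg->size) {
            const char *cand = seg->terms[seg->pos];
            if (least == NULL || strcmp(cand, least) < 0)
                least = cand;
        }
    }
    if (least == NULL)
        return 0;

    /* Advance every segment whose head equals that term, so a term that
     * overlaps several segments is only counted once. */
    for (i = 0; i < self->n_segs; i++) {
        ToySegLex *seg = &self->segs[i];
        if (seg->pos < seg->size && strcmp(seg->terms[seg->pos], least) == 0)
            seg->pos++;
    }
    self->current = least;
    self->term_num++;
    return 1;
}

/* Start at the top and count how many times "next" returns true. */
size_t
toy_multi_lex_count(ToyMultiLex *self)
{
    size_t i;
    assert(self != NULL);
    for (i = 0; i < self->n_segs; i++)
        self->segs[i].pos = 0;
    self->current  = NULL;
    self->term_num = 0;
    while (toy_multi_lex_next(self)) { }
    return self->term_num;
}
```

With two segments holding {apple, banana, cherry} and {banana, cherry, date}, the per-segment sizes add up to 6, but the count comes out as 4 because the two overlapping terms are merged.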
Fortunately, by this point, we'll have already performed that
iteration -- during the call to build_sort_cache. What we need to do
is add a self->size member var to the MultiLexicon struct, then set
it to self->term_num as soon as the iteration finishes in
The actual accessor will look like this:
    if (self->lex_cache == NULL)
        CONFESS("Can't call MultiLex_Size unless cache filled");
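
To make the "record the size once, guard the accessor" idea concrete, here is a rough standalone sketch. It is not the KinoSearch code: ToyCachedLex, toy_cached_lex_fill and toy_cached_lex_get_size are hypothetical names, the cache-filling step is reduced to an allocation, and a -1 return code stands in for CONFESS so the failure path can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of a lexicon with a cache and a recorded size. */
typedef struct {
    char   **lex_cache;  /* NULL until the cache has been filled */
    size_t   term_num;   /* running count kept during iteration */
    size_t   size;       /* recorded once the iteration finishes */
} ToyCachedLex;

/* Pretend to run the full iteration that fills the cache.
 * n_unique_terms is the number of terms that iteration would visit.
 * Returns 0 on success, -1 on allocation failure. */
int
toy_cached_lex_fill(ToyCachedLex *self, size_t n_unique_terms)
{
    assert(self != NULL);
    self->lex_cache = calloc(n_unique_terms ? n_unique_terms : 1,
                             sizeof(char *));
    if (self->lex_cache == NULL)
        return -1;
    self->term_num = n_unique_terms;
    /* The iteration is done, so the running count is now the size. */
    self->size = self->term_num;
    return 0;
}

/* Accessor: refuses to answer until the cache has been filled.
 * Returns 0 and writes *size_out on success, -1 otherwise. */
int
toy_cached_lex_get_size(const ToyCachedLex *self, size_t *size_out)
{
    assert(self != NULL && size_out != NULL);
    if (self->lex_cache == NULL)
        return -1;  /* the real accessor would CONFESS here */
    *size_out = self->size;
    return 0;
}
```

The point of the guard is that a size read before the iteration has run would be silently wrong, so the sketch fails loudly instead of returning 0.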
We should add a Lex_Get_Size abstract accessor to Lexicon.c/h, along
with an XS hook in Lexicon.pm which both SegLexicon and MultiLexicon
will inherit. We should zap the current XS hook in SegLexicon.pm and
replace it with an implementation of Lex_Get_Size in SegLexicon.c/h.
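
One way an abstract accessor shared by both subclasses could be laid out in plain C is a function-pointer slot in the base struct. This is only an illustration of the idiom: the post does not show how KinoSearch actually dispatches methods, and every name below (ToyLexicon, ToySegLexicon, ToyMultiLexicon, toy_lex_dispatch_size) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyLexicon ToyLexicon;
typedef size_t (*toy_get_size_fn)(const ToyLexicon *self);

/* Base "class": declares the accessor slot but supplies no implementation. */
struct ToyLexicon {
    toy_get_size_fn get_size;
};

/* Segment-level lexicon: its size is known up front. */
typedef struct {
    ToyLexicon base;  /* first member, so a ToyLexicon* can be cast back */
    size_t     size;
} ToySegLexicon;

/* Multi-segment lexicon: its size is only valid once the cache is filled. */
typedef struct {
    ToyLexicon base;
    size_t     size;
    int        cache_filled;
} ToyMultiLexicon;

size_t
toy_seg_lexicon_get_size(const ToyLexicon *self)
{
    return ((const ToySegLexicon *)self)->size;
}

size_t
toy_multi_lexicon_get_size(const ToyLexicon *self)
{
    const ToyMultiLexicon *multi = (const ToyMultiLexicon *)self;
    assert(multi->cache_filled);  /* stand-in for the CONFESS guard */
    return multi->size;
}

/* Single entry point that a binding layer could call for either subclass. */
size_t
toy_lex_dispatch_size(const ToyLexicon *self)
{
    assert(self != NULL && self->get_size != NULL);
    return self->get_size(self);
}
```

Callers only ever see toy_lex_dispatch_size, which is what lets one hook in the base class serve both kinds of lexicon.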
I have a deadline tomorrow, so I don't think I'll get to adding this
code and the accompanying tests before the weekend.