
marvin at rectangular
Sep 19, 2008, 5:28 PM
On Sep 19, 2008, at 11:25 AM, Nathan Kurz wrote:

> The third thing (tiny, but perhaps easy to fix) is that
> Scorepost_read_record is spending 40% of its time in REALLOC. Is the
> enlarged position buffer not getting reused for some reason?

Oi, good catch! With one line of code, we see a 10-20% search-time speed
improvement:

Index: ../c_src/KinoSearch/Posting/ScorePosting.c
===================================================================
--- ../c_src/KinoSearch/Posting/ScorePosting.c (revision 3882)
+++ ../c_src/KinoSearch/Posting/ScorePosting.c (working copy)
@@ -145,6 +145,7 @@
     num_prox = self->freq;
     if (num_prox > self->prox_cap) {
         self->prox = REALLOCATE(self->prox, num_prox, u32_t);
+        self->prox_cap = num_prox;
     }
     positions = self->prox;

> ps. The directions for building the Reuters benchmark index seem out
> of date. '-Mblib' no longer finds the uninstalled KinoSearch.so in
> the parent hierarchy.

I'll try to get updates committed later this evening.

Incidentally, although there are c. 19,000 unique documents in the Reuters
corpus, the indexing benchmarker will loop if you specify a larger number,
e.g. --docs=1000000.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch