
simon at simon-cozens
Apr 14, 2007, 7:21 AM
Post #1 of 2
Re: KinoSearch postings database
[Private mail => list, as requested]

Marvin Humphrey wrote:
>> I'd like to be able to use kinosearch to generate my tag
>> clouds, which are essentially mappings between all terms in a given
>> field and the number of postings for that term. Is there a way
>> (supported or otherwise) for me to grab this directly from the index
>> data, or do I have to grovel around doing a search for each individual
>> term and counting up the hits?
>
> If all you need is the document frequency for each term in the corpus,
> and it doesn't matter whether some of those docs might be deleted,
> you can use this unsupported method in any version of KS:
>
>     my $doc_freq = $searcher->doc_freq($term);
>
> That's nice and fast, because all it does is access the term infos file
> rather than consult the postings files as a search would.

Nice and fast indeed! It's just that...

    perl -MGlob -le 'print Glob->searcher->search(query => "tag:theology")->total_hits; print Glob->searcher->doc_freq(KinoSearch::Index::Term->new("tag", "theology"))'
    130
    0

What am I doing wrong?

> Another way, also unsupported, is to get a terms iterator (TermEnum in
> Lucene/Plucene), which allows you to access the term infos data
> sequentially. The exact incantation to get one in KS has changed a
> number of times, though, and is in flux again today.

This one would be useful for me eventually for building tag-clouds, but
for the time being I can get away with having a "tag" table in the
database and going through that. Ideally most of my database would move
to KinoSearch though.

Simon