
bramble.andrew at gmail
Jul 22, 2008, 7:11 PM
Post #3 of 7
Justin,

Plans? Plans? I'm still struggling for ideas :) Don't stop on my account. I never finish anythi...

KinoSearch was one of several approaches to indexing and slicing CPAN data. My original hack was not a text-based search but rather an index of meta.yaml information against distributions, allowing for queries like

    requires:Test::More

(you'll never guess the headslap effect of 'set_heed_colons' when doing this with QueryParser) or

    requires: Test::More license: apache

The original naive implementation used Graph and some list utils for intersections.

To be honest, I'm working more towards a product search engine than a CPAN index in particular. CPAN data was safe to work on at home; business data, not so safe.

What pushed me towards KinoSearch was seeing some results from the evo.com beta linked from rectangular.com. evo.com appears to have the functionality I'm thinking of, where the results have computed 'refinements' for categories like brand that are presumably document fields. See http://www.evo.com/search?q=cooking&tag=Lead-Free

On Wed, Jul 23, 2008 at 10:57 AM, Justin DeVuyst <justin [at] devuyst> wrote:
> Hello,
>
> I was playing around with indexing and searching CPAN with KinoSearch
> recently myself. Could you elaborate on what your plans are? I'd
> like to move on to something else if someone else is already doing
> what I would like to see happen.
>
> Basically my goal is to make searchable, in one place, everything
> known about modules on the CPAN. Whether KinoSearch can fit the
> whole bill or just part of the bill I'm still not sure of.
>
> Thanks,
> jdv
>
> Andrew Bramble wrote:
> > Hello,
> >
> > After getting useful results, and fast, with KinoSearch .20 I began
> > looking at ways to narrow results further using field-specific
> > refinements, e.g. having CPAN metadata indexed and being able to
> > slice into it by a license field.
> >
> > Might it be possible for a Scorer (I think it's a scorer) to compute,
> > from within the set of matched results, the total frequency of tokens
> > from a given field? To use the CPAN example again, rather than
> > choosing to search for "date parser" and license:artistic, might the
> > initial search for "date parser" return the matching results AND a
> > structure describing that, of 100 matched documents, the field
> > 'license' breaks down to perl=50, artistic=30, gpl=10, bsd=5,
> > apache=5?
> >
> > One could then repeat the original search, adding 'license:perl' to
> > narrow the search to only the 50 matching documents.
> >
> > Since this would require reading/examining each matched record I
> > would guess this belongs in the XS/C rather than Perl.
> >
> > Is it wishful thinking? Or might this be possible with subclassable
> > scorers/hit collectors?
> >
> > ++KinoSearch
> >
> > Andrew
> > _______________________________________________
> > KinoSearch mailing list
> > KinoSearch [at] rectangular
> > http://www.rectangular.com/mailman/listinfo/kinosearch
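
For anyone skimming the thread, the 'refinements' idea in the quoted message (count the values of one field across only the matched documents, then narrow by one of those values) can be sketched roughly as below. This is a plain Python illustration over an in-memory list, not KinoSearch code and not its Scorer or hit collector API. The sample documents, the field names, and the helpers `search`, `facet_counts` and `refine` are all made up for the example.

```python
from collections import Counter

# Hypothetical stand-in for an index: one dict of field -> value per document.
DOCS = [
    {"name": "Dist-A", "abstract": "a fast date parser", "license": "perl"},
    {"name": "Dist-B", "abstract": "strict date parser", "license": "perl"},
    {"name": "Dist-C", "abstract": "date parser for logs", "license": "artistic"},
    {"name": "Dist-D", "abstract": "lenient date parser", "license": "gpl"},
    {"name": "Dist-E", "abstract": "yet another date parser", "license": "perl"},
    {"name": "Dist-F", "abstract": "an xml writer", "license": "bsd"},
]


def search(docs, query):
    """Naive match: keep docs whose abstract contains every query word."""
    wanted = query.lower().split()
    return [d for d in docs
            if all(w in d["abstract"].lower().split() for w in wanted)]


def facet_counts(hits, field):
    """Tally the values of `field` across the matched documents only."""
    return Counter(d[field] for d in hits if field in d)


def refine(hits, field, value):
    """Narrow an existing result set to documents where field == value."""
    return [d for d in hits if d.get(field) == value]


hits = search(DOCS, "date parser")
licenses = facet_counts(hits, "license")   # breakdown shown next to the results
narrowed = refine(hits, "license", "perl")  # the follow-up 'license:perl' search
```

The tally visits every matched document once, which is why the quoted message guesses that a real implementation would belong in the XS/C layer rather than in Perl.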