bramble.andrew at gmail
Jul 22, 2008, 7:11 PM
Post #3 of 7
Plans ? Plans ? I'm still struggling for ideas :) Don't stop on my account.
I never finish anythi...
KinoSearch was one of several approaches to indexing and slicing CPAN data.
My original hack was not a text-based search but rather an index of
meta.yaml information against distributions, allowing for queries like
requires:Test::More (you'll never guess the headslap effect of
'set_heed_colons' when doing this with QueryParser) or
The original naive implementation used Graph and some list utils for
To be honest - I'm working more towards a product search engine than a CPAN
index in particular. CPAN data was safe to work on at home ... business
data - not so safe.
What pushed me towards KinoSearch was seeing some results from the
evo.com beta linked from rectangular.com. evo.com appears to have the
functionality I'm thinking of, where the results have computed 'refinements'
for categories like brand that are presumably document fields.
See
On Wed, Jul 23, 2008 at 10:57 AM, Justin DeVuyst <justin [at] devuyst> wrote:
> I was playing around with indexing and searching CPAN with KinoSearch
> recently myself. Could you elaborate on what your plans are? I'd
> like to move on to something else if someone else is already doing
> what I would like to see happen.
> Basically my goal is to make searchable, in one place, everything
> known about modules on the CPAN. Whether KinoSearch can fit the
> whole bill or just part of the bill I'm still not sure of.
> Andrew Bramble wrote:
> > Hello,
> > After getting useful results, and fast, with KinoSearch .20 I began looking at
> > ways to narrow results further using field-specific refinements, e.g. having
> > CPAN metadata indexed and being able to slice into it by a license field.
> > Might it be possible for a Scorer (I think it's a scorer) to compute, from
> > within the set of matched results, the total frequency of tokens from a
> > given field? To use the CPAN example again, rather than choosing to search
> > for "date parser" and license:artistic, might the initial search for
> > "date parser" return the matching results AND a structure describing that, of
> > 100 matched documents, the field 'license' breaks down to perl=50,
> > artistic=30, gpl=10, bsd=5, apache=5?
> > One could then repeat the original search, adding 'license:perl' to
> > narrow the search to only the 50 matching documents.
> > Since this would require reading/examining each matched record, I would
> > guess this belongs in the XS/C rather than Perl.
> > Is it wishful thinking? Or might this be possible with subclassable
> > scorers/hit collectors?
> > ++KinoSearch
> > Andrew
> > _______________________________________________
> > KinoSearch mailing list
> > KinoSearch [at] rectangular
> > http://www.rectangular.com/mailman/listinfo/kinosearch
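
For what it's worth, the per-field 'refinement' counts described in the quoted message can be sketched outside KinoSearch in a few lines of plain Python. This is only an illustration under stated assumptions: the facet_counts and refine helpers, the dictionary-shaped documents and the sample hit list are all hypothetical, and none of it is KinoSearch API or a claim about how a Scorer or hit collector would do it.

```python
from collections import Counter


def facet_counts(matched_docs, field):
    """Count how often each value of `field` occurs among the matched documents.

    `matched_docs` is assumed to be an iterable of plain dicts (hypothetical
    stand-ins for search hits); documents lacking the field are skipped.
    """
    counts = Counter()
    for doc in matched_docs:
        value = doc.get(field)
        if value is not None:
            counts[value] += 1
    return counts


def refine(matched_docs, field, value):
    """Narrow an existing result set to the documents whose `field` equals `value`."""
    return [doc for doc in matched_docs if doc.get(field) == value]


# Hypothetical result set for "date parser": 100 hits with the license
# breakdown used as the example in the quoted message.
hits = (
    [{"license": "perl"}] * 50
    + [{"license": "artistic"}] * 30
    + [{"license": "gpl"}] * 10
    + [{"license": "bsd"}] * 5
    + [{"license": "apache"}] * 5
)

counts = facet_counts(hits, "license")      # breakdown to show next to the results
narrowed = refine(hits, "license", "perl")  # equivalent of adding license:perl
```

Doing this after the fact means touching every matched record, which is the cost the quoted message guesses would need to live in XS/C rather than in the calling code.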