
barborak at basikgroup
Aug 25, 2008, 12:56 PM
Re: newbie: Indexing and searching text not working
Hi,

There is a utility that comes with the KinoSearch distribution called
dump_index. Running that shows these terms associated with the body
field:

Terms:
body:a
    Doc 0 (1 occurrences)
body:bodi
    Doc 0 (1 occurrences)
body:here
    Doc 0 (1 occurrences)
body:is
    Doc 0 (1 occurrences)
body:short
    Doc 0 (1 occurrences)
body:this
    Doc 0 (1 occurrences)
body:veri
    Doc 0 (1 occurrences)

So you can see that the PolyAnalyzer converted "very" to "veri." To get
your example to work then, either search for "veri" or run the word
"very" through the PolyAnalyzer first.

Best,
Mike

On Mon, Aug 25, 2008 at 2:58 PM, <kinosearch-request [at] rectangular> wrote:
> Date: Mon, 25 Aug 2008 11:40:10 +0530
> From: ram <ram [at] netcore>
> Subject: Re: [KinoSearch] newbie: Indexing and searching text not working
> To: KinoSearch discussion forum <kinosearch [at] rectangular>
> Message-ID: <1219644610.22357.61.camel [at] darkstar>
> Content-Type: text/plain
>
>
> On Sat, 2008-08-23 at 15:22 -0400, Mike Barborak wrote:
> > Hi,
> >
> > After creating your index with PolyAnalyzer, your body field will have
> > the terms "short" and "body" but not "short body." Take a look at
> > KinoSearch::QueryParser::QueryParser as it will likely do what you
> > want.
>
> I think my installation has got some issue. I cant search on a single
> word too
>
>
>
> ---------------------------------------
> use KinoSearch::InvIndexer;
> use KinoSearch::Analysis::PolyAnalyzer;
> use KinoSearch::Searcher;
> use strict;
> #
> # Start on a clean slate
> #
> system("rm -rf /tmp/invindex/*");
> my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
> @gl::headers = qw(from to cc subject body date reply-to message-id
>                   in-reply-to filename);
> my $invindexer = KinoSearch::InvIndexer->new(
>     invindex => '/tmp/invindex',
>     create   => 1,
>     analyzer => $analyzer,
> );
> foreach (@gl::headers) {
>     $invindexer->spec_field( name => $_ ,indexed =>1);
> }
> my $doc = $invindexer->new_doc;
> my %mail = (
>     'date' => 'Mon, 07 Jan 2008 14:04:35 +0530',
>     'to' => 'myteam [at] example',
>     'subject' => 'subject test here ',
>     'body' => 'This is a very short body here ',
>     'cc' => 'ram [at] example',
>     'from' => 'sagar [at] example',
>     'message-id' => '<1199694875.14998.392.camel [at] sagar>',
>     'filename'=>'/abc/def'
> );
> foreach (keys %mail) {
>     next unless($mail{$_});
>     $doc->set_value( $_ => $mail{$_} );
> }
> $invindexer->add_doc($doc);
> $invindexer->finish;
>
>
> $analyzer = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
> my $searcher = KinoSearch::Searcher->new(
>     invindex => '/tmp/invindex',
>     analyzer => $analyzer,
> );
> #
> # Search on body
> #
> my $term = KinoSearch::Index::Term->new("body","very");
> my $term_query = KinoSearch::Search::TermQuery->new(term => $term);
> my $hits = $searcher->search( query => $term_query );
> while ( my $hit = $hits->fetch_hit_hashref ){
>     print "Found HIT in body" . $hit->{body}."\n";
> }
>
> -----------------------------------------------------------------
>
> I am using Fedora-8 and perl-5.10 and latest kinosearch installed via
> CPAN
>
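
To illustrate why the lookup for "very" comes back empty while "veri" would match, here is a minimal, self-contained sketch that does not use KinoSearch at all. The term list is copied from the dump_index output above. The `toy_analyze` and `lookup` helpers are hypothetical names made up for this illustration, and `toy_analyze` is only a crude stand-in for the analyzer (it lowercases and turns a trailing "y" into "i"), not the stemming PolyAnalyzer actually performs.

```perl
use strict;
use warnings;

# Terms reported by dump_index for the body field, each pointing at Doc 0.
my %body_terms = map { $_ => [0] } qw(a bodi here is short this veri);

# Hypothetical stand-in for the analyzer: lowercase the word and rewrite a
# trailing "y" to "i". The real PolyAnalyzer applies a full stemmer; this
# toy rule only covers the words in this example.
sub toy_analyze {
    my ($word) = @_;
    my $term = lc $word;
    $term =~ s/y\z/i/;
    return $term;
}

# Exact-match lookup against the stored terms. The assumption here is that
# a term query compares the term as given, without analyzing it first.
sub lookup {
    my ($term) = @_;
    return @{ $body_terms{$term} || [] };
}

my @raw      = lookup('very');                 # the index holds "veri", not "very"
my @analyzed = lookup( toy_analyze('very') );  # analyzed form matches Doc 0

printf "raw 'very': %d hit(s)\n", scalar @raw;
printf "analyzed '%s': %d hit(s)\n", toy_analyze('very'), scalar @analyzed;
```

The point of the sketch is only that the query term has to go through the same transformation as the indexed text before it is compared against the stored terms. In the script above, that corresponds to building the Term with "veri" instead of "very", or letting a query parser configured with the same analyzer produce the query.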