Mailing List Archive: kinosearch: discuss

Error: Maximum token length is 65535


riyaad.miller at predix

Jul 14, 2008, 6:18 AM

Error: Maximum token length is 65535

Hi All

Hi Marvin, this is really great work and truly appreciated.

I'm using KS 0.162. When using the following code, the error below is produced:

My Definitions
my $stemmer = KinoSearch::Analysis::Stemmer->new( language => 'en' );
my $stopalizer = KinoSearch::Analysis::Stopalizer->new(language => 'en');
my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new(analyzers => [$stemmer, $stopalizer]);


The Error
Maximum token length is 65535; got 107462 at /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/KinoSearch/Index/SegWriter.pm line 82
KinoSearch::Index::SegWriter::add_doc('KinoSearch::Index::SegWriter=HASH(0x852d47c)', 'KinoSearch::Document::Doc=HASH(0x852cf90)') called at /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/KinoSearch/InvIndexer.pm line 224
KinoSearch::InvIndexer::add_doc('KinoSearch::InvIndexer=HASH(0x8546d7c)', 'KinoSearch::Document::Doc=HASH(0x852cf90)')

If I comment out the $stemmer and $stopalizer definitions and use the code below instead, indexing works perfectly, but that clearly won't allow for stemming and stop-word removal:
my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new(language => 'en');

Could anyone suggest a possible workaround for this? Your assistance is greatly appreciated.

Regards,
Riyaad


marvin at rectangular

Jul 14, 2008, 10:07 PM

Re: Error: Maximum token length is 65535 [In reply to]

On Jul 14, 2008, at 6:18 AM, Riyaad Miller wrote:

> I'm using KS 0.162. When using the following code, the error below
> is produced:
>
> My Definitions
> my $stemmer = KinoSearch::Analysis::Stemmer->new( language =>
> 'en' );
> my $stopalizer = KinoSearch::Analysis::Stopalizer->new(language =>
> 'en');
> my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new(analyzers
> => [$stemmer, $stopalizer]);
>
> The Error
> Maximum token length is 65535; got 107462

You have a PolyAnalyzer which contains a Stemmer and a Stopalizer, but
not a Tokenizer. Thus, the entire field value, all 107462 characters
of it, is the only token.

Theoretically, if KS had completed indexing successfully rather than
choked on that value, and at search-time someone were to type in the
appropriate 100,000+ character search string, you might get a hit.

Whatever those 107462 characters are, I can guarantee you that nothing
that long exists in the English stop list. Similarly, I doubt the
Stemmer has anything useful to say about the last few characters of
that field.
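
To illustrate the point above, here is a minimal pure-Perl sketch. It does not use KinoSearch at all: lc_normalize, tokenize, stopalize and analyze are hypothetical stand-ins written only for this example, and the stop list is a made-up handful of words. It assumes an analyzer chain works roughly like a pipeline that starts with the whole field value as a single token.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for analyzers: each takes a list of tokens and
# returns a list of tokens. These are NOT the KinoSearch classes.
my %stop = map { $_ => 1 } qw(the a an and of);

sub lc_normalize { return map { lc $_ } @_ }            # lowercase each token
sub tokenize     { return map { $_ =~ /(\w+)/g } @_ }   # split tokens into words
sub stopalize    { return grep { !$stop{$_} } @_ }      # drop exact stop-word matches

# Run a field value through a chain of analyzers.
# The chain starts with the entire field value as one token.
sub analyze {
    my ( $text, @chain ) = @_;
    my @tokens = ($text);
    @tokens = $_->(@tokens) for @chain;
    return @tokens;
}

my $field = "The quick brown fox and the lazy dog";

# No tokenizer: the stop list is compared against the whole field value.
my @without = analyze( $field, \&stopalize );

# Tokenizer first: the stop list is compared against individual words.
my @with = analyze( $field, \&lc_normalize, \&tokenize, \&stopalize );

printf "without tokenizer: %d token(s), longest is %d chars\n",
    scalar @without, length $without[0];
printf "with tokenizer:    %d token(s): %s\n", scalar @with, "@with";
```

Without the tokenizing step, the stop-word filter only ever sees one token, the complete field value, so nothing is removed and the token is as long as the field itself.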

You really need a Tokenizer. You probably also want an LCNormalizer
in there unless you really want searches to be case sensitive.

    my $lc_normalizer = KinoSearch::Analysis::LCNormalizer->new;
    my $tokenizer     = KinoSearch::Analysis::Tokenizer->new;
    my $stemmer       = KinoSearch::Analysis::Stemmer->new(
        language => 'en',
    );
    my $stopalizer = KinoSearch::Analysis::Stopalizer->new(
        language => 'en',
    );
    my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new(
        analyzers => [ $lc_normalizer, $tokenizer, $stopalizer, $stemmer ],
    );

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch


riyaad.miller at predix

Jul 15, 2008, 6:24 AM

Re: Error: Maximum token length is 65535 [In reply to]

Hi Marvin

Thank you for the help. I did as mentioned and it worked brilliantly.
We not worthy ... :-)

Regards
Riyaad

