Mailing List Archive: kinosearch: discuss

bad initialization in SegTermDocs.pm

 

 



jack_tanner at yahoo

Apr 10, 2008, 3:22 PM

Post #1 of 5 (1240 views)
bad initialization in SegTermDocs.pm

I'm a KS newbie, but I think I've found a bug that has to do with SegTermDocs.pm in 0.162.

Specifically, I'm seeing a BooleanQuery return 0 docs. I stepped through KS's code (yay EPIC) to find that in SegTermDocs->new, after _init_child($self) the $self remains undef. Subsequently, $self->_set_reader( $reader ) also fails, and new() returns an undef SegTermDocs object. I'm an XS n00b, and I wouldn't know how to begin tracing that part of the code.

In case this is relevant (probably not), the docs are analyzed with only a whitespace tokenizer because I do my own stemming and stopword removal. They're stored in a RAMInvIndex. I'm running on Fedora 8, Perl 5.8.8, with KS installed from CPAN.




__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com


_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch


marvin at rectangular

Apr 10, 2008, 3:51 PM

Post #2 of 5 (1181 views)
Re: bad initialization in SegTermDocs.pm

On Apr 10, 2008, at 3:22 PM, jack_tanner [at] yahoo wrote:
> I'm a KS newbie, but I think I've found a bug that has to do with
> SegTermDocs.pm in 0.162.
>
> Specifically, I'm seeing a BooleanQuery return 0 docs. I stepped
> through KS's code (yay EPIC) to find that in SegTermDocs->new, after
> _init_child($self) the $self remains undef. Subsequently,
> $self->_set_reader( $reader ) also fails, and new() returns an undef
> SegTermDocs object. I'm an XS n00b, and I wouldn't know how to begin
> tracing that part of the code.
>
> In case this is relevant (probably not), the docs are analyzed with
> only a whitespace tokenizer because I do my own stemming and
> stopword removal. They're stored in a RAMInvIndex. I'm running on
> Fedora 8, Perl 5.8.8, with KS installed from CPAN.

My first guess is that the root of the problem is a mismatch between
what's in the index and what's been requested. But it's hard to say,
and the term_docs() code is indeed a little messed up in 0.162. Can
you supply a failing test case?

Also, if you don't need API stability, I encourage you to use the
devel release.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/




jack_tanner at yahoo

Apr 10, 2008, 5:32 PM

Post #3 of 5 (1178 views)
Re: bad initialization in SegTermDocs.pm

I'm working on a test case. In the meantime, here's a test case for a different issue. (Could be an issue with the wetware, too...)

Yields: Can't call method "get_score" on unblessed reference at ks-test-case.pl line 33.

I'm kind of tied down to 0.162 because that's what's available via ActiveState PPMs. I really don't want to get into compiling my own manually on Windows, and the app needs to deploy on Windows (although I can develop on Linux).



Attachments: ks-test-case.pl (1.08 KB)


jack_tanner at yahoo

Apr 10, 2008, 6:12 PM

Post #4 of 5 (1166 views)
Re: bad initialization in SegTermDocs.pm

> From: Marvin Humphrey <marvin [at] rectangular>
>
> My first guess is that the root of the problem is a mismatch between
> what's in the index and what's been requested. But it's hard to say,

Bingo! OK, that was a bug in my code. I now always get the error 'Can't call method "get_score" on unblessed reference', as in the e-mail from an hour ago.

This leads me to another question: if I have tokenized (and stemmed and removed stopwords from) my own input, is there a way to create a new KS doc with the list of tokens? For now, I have to do this:

my $doc = $invindexer->new_doc;
$doc->set_value(txt => join (' ', @tokens));
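
For what it's worth, the join approach should be lossless as long as no token contains whitespace, assuming the whitespace tokenizer splits on runs of whitespace the way Perl's split ' ' does (that is an assumption, not something checked against KS). A quick core-Perl sanity check, with made-up tokens:

```perl
use strict;
use warnings;

# Hypothetical pre-stemmed, stopword-free tokens (illustration only).
my @tokens = qw(quick brown fox jump);

# Same join as above: a single space between tokens.
my $text = join ' ', @tokens;

# split ' ' splits on runs of whitespace and ignores leading whitespace.
# It stands in here for a whitespace tokenizer (assumption).
my @round_trip = split ' ', $text;

print "round trip ok\n" if "@round_trip" eq "@tokens";
```

A token that itself contains a space would come back as two tokens, so the check is only meaningful if the custom tokenizer never emits such tokens.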






marvin at rectangular

Apr 10, 2008, 8:02 PM

Post #5 of 5 (1166 views)
Re: bad initialization in SegTermDocs.pm

On Apr 10, 2008, at 5:32 PM, jack_tanner [at] yahoo wrote:

> I'm working on a test case. In the meantime, here's a test case for
> a different issue. (Could be an issue with the wetware, too...)
>
> Yields: Can't call method "get_score" on unblessed reference at
> ks-test-case.pl line 33.

This is an area where the maint and devel branches diverge. Maint
returns a simple hashref with extra entries. SVN trunk returns a
HitDoc object overloaded to behave like a hashref, but with accessors
instead of hash entries for things like "score".

0.162:

    while ( my $hashref = $hits->fetch_hit_hashref ) {
        print "$hashref->{title}: $hashref->{score}\n";
    }

SVN trunk:

    while ( my $hit_doc = $hits->fetch_hit ) {
        my $score = $hit_doc->get_score;
        print "$hit_doc->{title}: $score\n";
    }
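
If the same script has to run against both branches, one option is to check whether the hit is a blessed object before calling an accessor. The following is only a sketch: FakeHitDoc and score_of are made-up names for illustration, not part of KinoSearch, and the stand-in class merely imitates the accessor style described above.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for a devel-style hit object (NOT a KinoSearch class).
{
    package FakeHitDoc;

    sub new {
        my ( $class, %args ) = @_;
        # Keep the score under a private key so it is only reachable
        # through the accessor, like the HitDoc described above.
        return bless { title => $args{title}, _score => $args{score} }, $class;
    }

    sub get_score { return $_[0]->{_score} }
}

# Return the score whether $hit is a plain hashref (maint style)
# or an object with a get_score accessor (devel style).
sub score_of {
    my ($hit) = @_;
    if ( blessed($hit) && $hit->can('get_score') ) {
        return $hit->get_score;
    }
    return $hit->{score};
}

my @hits = (
    { title => 'plain hashref', score => 0.25 },
    FakeHitDoc->new( title => 'object', score => 0.75 ),
);

for my $hit (@hits) {
    my $score = score_of($hit);
    print "$hit->{title}: $score\n";
}
```

Calling get_score directly on the plain hashref is what triggers the "unblessed reference" error; the blessed() check avoids the method call in that case.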

> I'm kind of tied down to 0.162 because that's what's available via
> ActiveState PPMs. I really don't want to get into compiling my own
> manually on Windows, and the app needs to deploy on Windows
> (although I can develop on Linux).

Gotcha. FWIW, Windows support will continue with 0.2x. I did a bunch
of work on restoring Windows compatibility over the last couple weeks,
and now devel at least compiles again -- though we're getting "free to
wrong pool" memory errors. I think those are "canary in the coal
mine" evidence of leaking objects making it to global destruction when
they shouldn't; I need to find and fix some memory problems which are
leaks on Unixen but errors on Windows.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


