
edwardbetts at gmail
Jun 5, 2007, 8:32 AM
Post #1 of 4
Another minimal test case: File::Find causes crash
Here is my code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    package Schema;
    use base qw( KinoSearch::Schema );
    use KinoSearch::Analysis::PolyAnalyzer;

    our %fields = ( title => 'KinoSearch::Schema::FieldSpec' );

    sub analyzer { KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' ) }

    package main;
    use File::Find;
    use KinoSearch::InvIndexer;

    my $index = KinoSearch::InvIndexer->new(invindex => Schema->clobber('index'));

    find(\&wanted, "en");
    $index->finish();

    sub wanted {
        /\.html$/ or return;
        my $filename = $_;
        my %field;
        open my $fh, $filename or die "$filename: $!";
        while (<$fh>) {
            m!<body>! and last;
            if (m!<title>(.*)</title>!) {
                $field{title} = $1;
                last;
            }
        }
        close $fh;
        $index->add_doc(\%field);
    }

I'm running this with KinoSearch-0.20_03 from CPAN. It needs a reasonably big collection of files, like 50,000 of them. I've used a static dump from Wikipedia. If you want to try that you need to install 7zip; if you're running Debian the package name is p7zip-full.

Assuming you want to use the wiki dump and you've put the code in index_wiki.pl, the steps to run look like this:

    wget http://static.wikipedia.org/downloads/April_2007/en/wikipedia-en-html.0.7z
    7z x wikipedia-en-html.0.7z
    perl index_wiki.pl

The output I get is:

    Error in function kino_FSFolder_open_outstream at c_src/KinoSearch/Store/FSFolder.c:56: Can't open '_1.skip': No such file or directory
     at /home/edward/src/KinoSearch-0.20_03/blib/lib/KinoSearch/Index/SegWriter.pm line 121
        KinoSearch::Index::SegWriter::add_doc('KinoSearch::Index::SegWriter=HASH(0x816bdfc)', 'HASH(0x890e790)', 1) called at /home/edward/src/KinoSearch-0.20_03/blib/lib/KinoSearch/InvIndexer.pm line 114
        KinoSearch::InvIndexer::add_doc('KinoSearch::InvIndexer=HASH(0x816b7c0)', 'HASH(0x890e790)') called at ./index_wiki.pl line 42
        main::wanted() called at /usr/share/perl/5.8/File/Find.pm line 886
        File::Find::_find_dir('HASH(0x816c00c)', 'en', 8) called at /usr/share/perl/5.8/File/Find.pm line 700
        File::Find::_find_opt('HASH(0x816c00c)', 'en') called at /usr/share/perl/5.8/File/Find.pm line 1223
        File::Find::find('CODE(0x8337cac)', 'en') called at ./index_wiki.pl line 23

The line numbers in index_wiki.pl are wrong because I took out the 'use lib' line in the sample above.

Let me know if you need any more info.

-- 
Edward Betts