
marvin at rectangular
Aug 29, 2008, 1:32 PM
r3793 - in trunk/perl: . lib/KinoSearch/Docs/Cookbook
Author: creamyg
Date: 2008-08-29 13:32:22 -0700 (Fri, 29 Aug 2008)
New Revision: 3793

Added:
   trunk/perl/lib/KinoSearch/Docs/Cookbook/CachedSearcher.pod
Modified:
   trunk/perl/MANIFEST
Log:
Add KinoSearch::Docs::Cookbook::CachedSearcher.

Modified: trunk/perl/MANIFEST
===================================================================
--- trunk/perl/MANIFEST 2008-08-29 20:19:31 UTC (rev 3792)
+++ trunk/perl/MANIFEST 2008-08-29 20:32:22 UTC (rev 3793)
@@ -62,6 +62,7 @@
 lib/KinoSearch/Doc.pm
 lib/KinoSearch/Doc/HitDoc.pm
 lib/KinoSearch/Docs/Cookbook.pod
+lib/KinoSearch/Docs/Cookbook/CachedSearcher.pod
 lib/KinoSearch/Docs/Cookbook/CustomQuery.pod
 lib/KinoSearch/Docs/Cookbook/CustomQueryParser.pod
 lib/KinoSearch/Docs/DocNums.pod

Added: trunk/perl/lib/KinoSearch/Docs/Cookbook/CachedSearcher.pod
===================================================================
--- trunk/perl/lib/KinoSearch/Docs/Cookbook/CachedSearcher.pod (rev 0)
+++ trunk/perl/lib/KinoSearch/Docs/Cookbook/CachedSearcher.pod 2008-08-29 20:32:22 UTC (rev 3793)
@@ -0,0 +1,92 @@
+=head1 NAME
+
+KinoSearch::Docs::Cookbook::CachedSearcher - Improve search-time
+responsiveness with a cached Searcher.
+
+=head1 ABSTRACT
+
+At the core of every Searcher object is an IndexReader, and when an
+IndexReader object is created, a small portion of the InvIndex is loaded into
+memory. Additional caches are filled as relevant queries arrive.
+
+For small document collections on lightly-loaded servers, the time to warm up
+the Searcher/Reader isn't worth worrying about. For large document
+collections or busy servers, the warmup time may become significant, in which
+case reusing the Searcher is likely to speed up your application.
+
+=head1 FastCGI
+
+A script running under standard CGI runs once per request. In contrast, a
+script running on a FastCGI webserver using the CGI::Fast module from CPAN
+starts upon the first request, then executes a loop once per request.
+
+Create your Searcher outside this loop:
+
+    my $searcher = KinoSearch::Searcher->new(
+        invindex => MySchema->read('/path/to/invindex/')
+    );
+    while ( my $cgi = CGI::Fast->new ) {
+        my $hits = $searcher->search( query => $cgi->param('q') || '' );
+        ...
+    }
+
+=head1 mod_perl
+
+Under mod_perl, the Searcher can be stored in a module loaded by startup.pl.
+
+    package CachedSearcher;
+
+    my $searcher;
+
+    sub obtain {
+        $searcher ||= KinoSearch::Searcher->new(
+            invindex => MySchema->read('/path/to/invindex/')
+        );
+        return $searcher;
+    }
+
+    sub refresh {
+        undef $searcher;
+        return obtain();
+    }
+
+    # Load at startup rather than wait for first request.
+    obtain();
+
+Individual search processes call CachedSearcher->obtain rather than
+creating their own Searcher object. If an index gets updated, a special HTTP
+request can be made which triggers a call to CachedSearcher->refresh.
+
+=head1 Benchmarks
+
+Using Benchmark::Stopwatch to measure a lightly-modified version of the sample
+search.cgi app, we get the following results for a query for "congress" under
+standard CGI...
+
+    NAME              TIME     CUMULATIVE    PERCENTAGE
+    load modules      0.121    0.121         73.754%
+    init searcher     0.004    0.125         2.626%
+    process search    0.032    0.158         19.735%
+    fetch hits        0.006    0.164         3.877%
+    _stop_            0.000    0.164         0.008%
+
+... and these results under CGI::Fast:
+
+    NAME              TIME     CUMULATIVE    PERCENTAGE
+    process search    0.002    0.002         24.213%
+    fetch hits        0.006    0.008         75.602%
+    _stop_            0.000    0.008         0.186%
+
+As the numbers indicate, for a simple term query, per-request startup (loading
+modules, warming up the Searcher) overwhelms the time to search and return results.
+
+=head1 COPYRIGHT
+
+Copyright 2008 Marvin Humphrey
+
+=head1 LICENSE, DISCLAIMER, BUGS, etc.
+
+See L<KinoSearch> version 0.20.
+
+=cut
+

_______________________________________________
kinosearch-commits mailing list
kinosearch-commits [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch-commits
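The obtain/refresh pattern from the mod_perl section can be tried without an index. What follows is only an illustrative sketch and is not part of r3793: the package name CachedSearcherSketch, the _build_searcher stand-in (which takes the place of KinoSearch::Searcher->new), and the build_count helper are all hypothetical. It shows that a persistent process pays the construction cost once, and once more after each refresh.

```perl
use strict;
use warnings;

package CachedSearcherSketch;

my $searcher;      # cached instance, shared by every caller in this process
my $builds = 0;    # number of times the expensive constructor has run

# Hypothetical stand-in for KinoSearch::Searcher->new(...). A real
# application would open the invindex here.
sub _build_searcher {
    $builds++;
    return { id => $builds };
}

# Return the cached Searcher, creating it on first use.
sub obtain {
    $searcher ||= _build_searcher();
    return $searcher;
}

# Discard the cached Searcher and build a fresh one, for example after
# the index has been updated.
sub refresh {
    undef $searcher;
    return obtain();
}

# Hypothetical helper, present only so the sketch can report its work.
sub build_count { return $builds }

package main;

# Simulate five requests served by one persistent process.
CachedSearcherSketch->obtain for 1 .. 5;
print "builds after 5 requests: ", CachedSearcherSketch->build_count, "\n";

# Simulate an index update followed by a refresh request.
CachedSearcherSketch->refresh;
print "builds after refresh: ", CachedSearcherSketch->build_count, "\n";
```

Run as a plain script, this prints a build count of 1 after the five simulated requests and 2 after the refresh. Under standard CGI each request would be a new process and would rebuild the object every time.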