
Mailing List Archive: kinosearch: discuss

Sort::External 0.10_2

 

 



marvin at rectangular

Jul 23, 2005, 8:28 PM


Greets,

A developer's release of Sort::External, version 0.10_2, has been
uploaded to CPAN.

http://search.cpan.org/~creamyg/Sort-External-0.10_2/

It includes the new Sort::External::Cookbook, thanks to Ken's
encouragement. :) Ken, please let me know if it answers some of
your questions, e.g. about null-terminating fields.

http://search.cpan.org/~creamyg/Sort-External-0.10_2/lib/Sort/External/Cookbook.pm

The major API change is the implementation of -mem_threshold (!),
which also required extensive changes to feed, fetch, and especially,
_consolidate_one_level, which is about 50 lines shorter (for now).

Initial testing seems to indicate somewhat diminished performance in
_consolidate_one_level (a finish routine which used to take 3.2
seconds now takes 4.7), which may be due to one of two factors.

1) The new version relies entirely on OS read buffering for
efficiency. It may be that I can restore the efficiency by reading 1
meg or so at a time per temp file into buffers.

2) It's possible that the new version may require a greater number of
dereference ops when sorting, though I don't think so. This will be
harder to fix, because I'm already doing some pretty unusual stuff in
an effort to get that number down.
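
To make factor 1 concrete, here is a minimal sketch of merging sorted temp files while reading each one through its own explicit buffer of about 1 meg. It is written in Python for illustration and is not Sort::External's code; the names write_sorted_run, merge_runs, and READ_BUFFER_BYTES are hypothetical, and items are assumed to be newline-free strings.

```python
import heapq
import os
import tempfile

READ_BUFFER_BYTES = 1024 * 1024  # assumption: roughly 1 MB of buffer per temp file


def write_sorted_run(items):
    """Write one sorted run to a temp file, one item per line, and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as fh:
        for item in sorted(items):
            fh.write(item + "\n")
    return path


def merge_runs(paths, buffer_bytes=READ_BUFFER_BYTES):
    """Yield items from all runs in sorted order.

    Each file is opened with its own read buffer, so the merge pulls data
    from disk in large chunks instead of relying on the OS defaults.
    """
    handles = [open(p, "r", buffering=buffer_bytes) for p in paths]
    try:
        # heapq.merge performs a lazy k-way merge over already-sorted inputs.
        for line in heapq.merge(*handles):
            yield line.rstrip("\n")
    finally:
        for fh in handles:
            fh.close()
```

Whether a larger buffer actually recovers the lost time depends on the platform's own read-ahead, so it would need to be measured rather than assumed.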

There is also diminished performance in fetch (a routine involving
fetch which used to take 12 seconds now takes 14 -- and most of that
discrepancy is due to fetch). This is needless and due to extra
lexical variable creation, conditional testing, and probably, poor OS
read buffering. I thought it was necessary to eliminate the
output_cache, but it isn't. Putting it back should fix things.
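
The idea behind an output cache can be sketched roughly as follows: fetch hands items out of an in-memory batch and only goes back to the slower source when the batch runs dry. This is a Python illustration under assumptions, not the module's implementation; CachedFetcher and cache_size are hypothetical names.

```python
from itertools import islice


class CachedFetcher:
    """Serve items one at a time from a batch that is refilled in bulk."""

    def __init__(self, source, cache_size=1000):
        self._source = iter(source)    # stands in for the sorted, merged stream
        self._cache_size = cache_size  # assumption: number of items per refill
        self._cache = []               # kept reversed so pop() is cheap

    def _refill(self):
        # Pull up to cache_size items from the source in one go.
        batch = list(islice(self._source, self._cache_size))
        batch.reverse()
        self._cache = batch

    def fetch(self):
        """Return the next item, or None once the source is exhausted."""
        if not self._cache:
            self._refill()
            if not self._cache:
                return None
        return self._cache.pop()
```

The common path through fetch is then a single emptiness check and a pop, which is the sort of saving the paragraph above is after.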

There is definitely decreased performance in feed if you use
-mem_threshold, though I haven't measured that yet. That's the price
you pay for -mem_threshold, which calls Devel::Size's size() on each
item.
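
The per-item cost can be seen in a minimal sketch of a feed that tracks memory use and flushes a sorted run once a threshold is crossed. It is a Python illustration, with sys.getsizeof standing in for Devel::Size's size(); ThresholdFeeder and its attributes are hypothetical names, and runs are kept in memory here where a real sorter would write them to temp files.

```python
import sys


class ThresholdFeeder:
    """Accumulate items and flush a sorted run when they exceed a byte budget."""

    def __init__(self, mem_threshold):
        self.mem_threshold = mem_threshold  # budget in bytes
        self.cache = []
        self.cache_bytes = 0
        self.runs = []  # sorted runs; a real sorter would spill these to disk

    def feed(self, item):
        self.cache.append(item)
        # This measurement happens on every item, which is the overhead
        # that a memory threshold adds to feed.
        self.cache_bytes += sys.getsizeof(item)
        if self.cache_bytes >= self.mem_threshold:
            self.flush()

    def flush(self):
        """Sort the cached items into a run and reset the counters."""
        if self.cache:
            self.runs.append(sorted(self.cache))
            self.cache = []
            self.cache_bytes = 0
```

A sorter that flushes after a fixed number of items skips the size call entirely, which is why that route can be faster when item sizes are known in advance.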

It's my hope that I can tweak _consolidate_one_level and fetch until
they're better than they were before. Then you can use
-mem_threshold if you need roughly reliable memory consumption, or
avoid it if you know how large your items are going to be and want to
squeeze that last little bit of performance out.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
