
marvin at rectangular
Jul 23, 2005, 8:28 PM
Greets,

A developer's release of Sort::External, version 0.10_2, has been uploaded to CPAN.

http://search.cpan.org/~creamyg/Sort-External-0.10_2/

It includes the new Sort::External::Cookbook, thanks to Ken's encouragement. :) Ken, please let me know if it answers some of your questions, e.g. about null-terminating fields.

http://search.cpan.org/~creamyg/Sort-External-0.10_2/lib/Sort/External/Cookbook.pm

The major API change is the implementation of -mem_threshold (!), which also required extensive changes to feed, fetch, and especially _consolidate_one_level, which is about 50 lines shorter (for now).

Initial testing seems to indicate somewhat diminished performance in _consolidate_one_level (a finish routine which used to take 3.2 seconds now takes 4.7), which may be due to one of two factors.

1) The new version relies entirely on OS read buffering for efficiency. It may be that I can restore the efficiency by reading 1 meg or so at a time per temp file into buffers.

2) It's possible that the new version requires a greater number of dereference ops when sorting, though I don't think so. This will be harder to fix, because I'm already doing some pretty unusual stuff in an effort to get that number down.

There is also diminished performance in fetch (a routine involving fetch which used to take 12 seconds now takes 14, and most of that discrepancy is due to fetch). This is needless, and is due to extra lexical variable creation, conditional testing, and probably poor OS read buffering. I thought it was necessary to eliminate the output_cache, but it isn't. Putting it back should fix things.

There is definitely decreased performance in feed if you use -mem_threshold, though I haven't measured that yet. That's the price you pay for -mem_threshold, which calls Devel::Size's size() on each item.

It's my hope that I can tweak _consolidate_one_level and fetch until they're better than they were before.
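
For readers unfamiliar with the idea, here is a minimal sketch of a memory-threshold-driven external sort. It is written in Python purely for illustration and is not Sort::External's code. The class name TinyExternalSorter, the method fetch_all, and the read_buffer parameter are hypothetical. sys.getsizeof stands in for Devel::Size's size(), items are assumed to be strings without embedded newlines, and the 1 MB per-file read buffer mirrors the "1 meg or so at a time per temp file" idea from point 1.

```python
import heapq
import sys
import tempfile


class TinyExternalSorter:
    """Illustrative only; not how Sort::External is implemented."""

    def __init__(self, mem_threshold=1 << 20, read_buffer=1 << 20):
        self.mem_threshold = mem_threshold  # approximate bytes held before flushing
        self.read_buffer = read_buffer      # buffer size used for each temp file
        self.cache = []                     # items currently held in memory
        self.cache_bytes = 0                # running size estimate for the cache
        self.runs = []                      # sorted temp files written so far

    def feed(self, *items):
        for item in items:
            self.cache.append(item)
            # Measuring every item is the per-item cost of a memory threshold.
            self.cache_bytes += sys.getsizeof(item)
            if self.cache_bytes >= self.mem_threshold:
                self._flush()

    def _flush(self):
        """Sort the in-memory cache and write it out as one sorted run."""
        if not self.cache:
            return
        self.cache.sort()
        run = tempfile.TemporaryFile(
            mode="w+", buffering=self.read_buffer, encoding="utf-8", newline="\n"
        )
        run.writelines(item + "\n" for item in self.cache)
        run.flush()
        run.seek(0)  # rewind so the run can be read back during the merge
        self.runs.append(run)
        self.cache = []
        self.cache_bytes = 0

    def fetch_all(self):
        """Yield every item in sorted order by merging the sorted runs."""
        self._flush()
        streams = [(line.rstrip("\n") for line in run) for run in self.runs]
        try:
            yield from heapq.merge(*streams)
        finally:
            for run in self.runs:
                run.close()
            self.runs = []


if __name__ == "__main__":
    sorter = TinyExternalSorter(mem_threshold=200)
    sorter.feed("pear", "apple", "fig", "kiwi", "banana", "cherry")
    for item in sorter.fetch_all():
        print(item)
```

The per-item size measurement inside feed is where the extra cost of a memory threshold comes from; a fixed item-count limit would skip that call but give no control over actual memory use.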
Once those routines are tuned, you can use -mem_threshold if you need roughly predictable memory consumption, or avoid it if you know how large your items are going to be and want to squeeze out that last little bit of performance.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/