
marvin at rectangular
Mar 28, 2007, 9:59 AM
Post #4 of 8
On Mar 28, 2007, at 9:23 AM, Roger Dooley wrote:

> Marvin Humphrey (3/28/2007 12:03 PM) wrote:
>> On Mar 28, 2007, at 7:11 AM, Roger Dooley wrote:
>>> When indexing with 0.20_02,
>> What's your actual config? I know you're using the new Tokenizer,
>> but that's not in 0.20_02. Did you copy just Tokenizer.pm into
>> 0.20_02, or did you check out from subversion?
>
> New Tokenizer from the previous week, but the rest is from 0.20_02.

OK. Unfortunately, I can't duplicate this issue using either that config, or subversion trunk. In both cases, memory usage plateaus at 33.8 MB on my box for the benchmarking script.

> I've commented that part out for this round of indexing. I can try
> setting this again and see what happens.

It would be better to leave it at its default. My only concern was the remote possibility that it was set to a value that was causing the problem.

> Anything else I can try to figure out what is going on?

Troubleshooting memory leaks isn't easy. Here's what I would do, which is not the same as what I recommend you do:

1) Move the problem script to a Linux system if it's not there already.
2) Compile a debugging perl from 5.8.8 sources.
3) Run devel/valgrind_test.plx using the debugging perl.
4) Examine the output for memory leaks. If none show up, then the problem is script specific.
5) Run the script under valgrind and debug perl. The environment variable PERL_DESTRUCT_LEVEL has to be set to 2 and the suppressions file devel/p588_valgrind.supp has to be fed to valgrind. (Peek the commands that valgrind_test.plx runs.)
6) If nothing turns up after indexing a few documents and exiting cleanly, invoke the script under valgrind and debug perl again, but let it run for a long time and then crash it intentionally so that Perl doesn't run its cleanup routines. Then examine valgrind's output looking for clues as to where the memory went.

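As a rough illustration of step 5, the invocation might be assembled along these lines. This is only a sketch: the debug perl path and the script name are placeholders, `valgrind_cmd` is a made-up helper, and `--leak-check=full` is an assumed choice of valgrind option. The commands in devel/valgrind_test.plx remain the authoritative reference.

```shell
#!/bin/sh
# Sketch of step 5. DEBUG_PERL and SCRIPT are hypothetical placeholders;
# substitute the debugging perl you built and the script that leaks.
DEBUG_PERL="${DEBUG_PERL:-/usr/local/debugperl/bin/perl}"
SCRIPT="${SCRIPT:-my_indexer.plx}"

# Print the command line rather than running it, so it can be compared
# against what devel/valgrind_test.plx does before anything is executed.
# --leak-check=full is an assumption; the other two settings come from step 5.
valgrind_cmd() {
    printf '%s\n' "PERL_DESTRUCT_LEVEL=2 valgrind --leak-check=full --suppressions=devel/p588_valgrind.supp $DEBUG_PERL $SCRIPT"
}

valgrind_cmd
```

Once the printed line looks right, it could be run from the distribution directory with `eval "$(valgrind_cmd)"`.
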
Hopefully at that point we'd be able to narrow down the search to KS's perl code (not likely), KS's C code (likely), or your script itself (quite possible -- could be a black hole hash, for example).

What I recommend you do is attempt to duplicate the problem so that I can hunt it down. Create a script I'll be able to run and monitor its memory usage using top. Use the us_constitution HTML presentation if you can. If the footprint keeps growing long past 30 MB, send it my way.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
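
For anyone who would rather log the footprint than watch top, a minimal sketch of sampling a running process's resident memory follows. It assumes a Linux system with /proc mounted, and `rss_kb` and `sample_rss` are hypothetical helper names, not part of KinoSearch.

```shell
#!/bin/sh
# Sketch: record a process's resident set size over time (Linux only).

# Print VmRSS in kB for PID $1; print nothing if the process is gone.
rss_kb() {
    [ -r "/proc/$1/status" ] || return 0
    while read -r key value rest; do
        if [ "$key" = "VmRSS:" ]; then
            echo "$value"
        fi
    done < "/proc/$1/status"
}

# Usage: sample_rss PID [INTERVAL_SECONDS] [SAMPLES]
sample_rss() {
    pid=$1
    interval=${2:-5}
    samples=${3:-12}
    i=0
    while [ "$i" -lt "$samples" ]; do
        rss=$(rss_kb "$pid")
        # Stop early once the process has exited.
        [ -n "$rss" ] || break
        echo "$rss"
        i=$((i + 1))
        # No need to wait after the final sample.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

Starting the indexing script in the background and then calling `sample_rss $! 5 60` would print one reading every five seconds for five minutes. Readings that keep climbing well past roughly 30000 kB would match the growth described in this thread.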