
btx at mojo
May 7, 2004, 8:32 PM
Aha! snmp_portscan.nes lockups
All:

Aha! This is so typical: I work on this thing for a few days, think of dozens of things, build test/debug code, etc., finally post to the mailing list, and an hour later I [seemingly] figure out what the problem is...

It SEEMS like the problem is a combination of two things: the timeout not killing the process, and a problem in the nessusd plugin code. To reproduce it, I just set the timeout of the snmp_portscan plugin to 15 seconds and kicked off a scan. About halfway through the third snmpwalk execution the test timed out, and that is where the problem happened.

As I mentioned before, sigterm_handler doesn't keep the pid of the latest snmpwalk child; it only has the pid from the version-discovery execution. Therefore, the snmpwalk that was running when the timeout hit could not be killed.

Next, I don't have a clue WHY (yet), but the nessusd child running snmp_portscan.nes never exits. I ran gdb on that pid and got this:

    #0 0x420dabc2 in recv () from /lib/i686/libc.so.6
    #1 0x400537fa in comm_send_status () from /usr/lib/libnessus.so.2
    #2 0x4020cdc6 in plugin_run (desc=0x8123750) at snmp_portscan.c:365
    #3 0x080544e0 in nes_thread (args=0x8123750) at nes_plugins.c:310
    #4 0x0804ebd1 in create_process (function=0x8054388 <nes_thread>, argument=0x8123750) at processes.c:108
    #5 0x0805433f in nes_plugin_launch (globals=0x812df88, plugin=0x8123750, hostinfos=0x81af520, preferences=0x806e608, kb=0x81c1be0, name=0x81c1c6a "", soc=7) at nes_plugins.c:251
    #6 0x08059e17 in plugin_launch (globals=0x812df88, plugin=0x8122ad0, hostinfos=0x81af520, preferences=0x806e608, key=0x81c1be0, name=0x81c1c40 "/usr/lib/nessus/plugins/snmp_portscan.nes", launcher=0xa7f) at pluginlaunch.c:503
    #7 0x0804bcad in launch_plugin (globals=0x812df88, plugins=0x8122ad0, hostname=0xbfffd988 "10.1.2.9", cur_plug=0xbfffd878, num_plugs=1784, hostinfos=0x81af520, key=0x81c1be0, new_kb=1) at attack.c:271
    #8 0x0804c05e in attack_host (globals=0x812df88, hostinfos=0x81af520, hostname=0xbfffd988 "10.1.2.9", sched=0x81591b0) at attack.c:423
    #9 0x0804c261 in attack_start (args=0x81af520) at attack.c:524
    #10 0x0804ebd1 in create_process (function=0x804c11c <attack_start>, argument=0xbfffd970) at processes.c:108
    #11 0x0804cb4f in attack_network (globals=0x812df88) at attack.c:820
    #12 0x08055267 in server_thread (globals=0x812df88) at nessusd.c:526
    #13 0x0804ebd1 in create_process (function=0x8054d88 <server_thread>, argument=0x812df88) at processes.c:108
    #14 0x080557b7 in main_loop () at nessusd.c:860
    #15 0x0805624e in main (argc=0, argv=0xbfffe424, envp=0xbfffe438) at nessusd.c:1323
    #16 0x420158d4 in __libc_start_main () from /lib/i686/libc.so.6
    ...

So the child appears to be waiting for the status from the parent nessusd process. This completely locks up the works: nothing continues until the child process (the snmp_portscan.nes process) is kill -9'd.

If the SIGTERM handler is modified so that it kills its own pid after killing the snmpwalk child (plus all the changes that go with that), the problem doesn't show up. I personally would initialize the snmpwalk_process variable to something like -1 or 0, then make sure the value is > 0 before passing it to kill(). Otherwise it seems like you'd have a race condition in which nessusd and everything in its process group would be whacked (assuming the user cancels the scan at exactly the right moment), because kill() with a pid of 0 signals the caller's entire process group, and a pid of -1 signals every process the caller is allowed to signal.

I don't know enough about the nessusd plugin scheduler or the control connection (where it seems to be locked up) to suggest a definitive fix or give an accurate analysis of the problem, but it seems like nessusd isn't expecting to have to ack something, and since the plugin isn't killed, it ends up having to ack it.

So that left me with the problem of figuring out how on earth this was triggered in the first place. snmpwalk's default timeout is around 6 seconds, and snmpwalk is run only 4 times per scan (at least when WIN_INST_SOFT isn't defined).
It's run once more to get the version, but that run usually finishes in well under a second. My best guess, therefore, is that the people reporting this were running several gigantic scans at once. The load must have been extremely high, so each snmpwalk took 9+ seconds to execute, which pushed the plugin's total execution time over the limit.

So, as a resolution, I can submit my suggested fix (as a context diff patch?), but I can't help thinking there was a reason why the child pids were ignored. I'll wait to see what other people say before submitting any code.

Thank you,
Brian Costello
btx [at] calyx
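The SIGTERM handling change proposed in this post could look roughly like the C sketch below. It is a minimal, hypothetical sketch, not the actual snmp_portscan.c code (which isn't shown here): only the names snmpwalk_process and sigterm_handler come from the post, while their types, the helpers kill_snmpwalk_if_running(), install_sigterm_handler() and run_snmpwalk(), and the use of plain fork()/execvp()/waitpid() to launch snmpwalk are assumptions. It records the pid of every snmpwalk run (not just the version-discovery one), refuses to call kill() unless the recorded pid is > 0, and has the handler terminate the plugin process itself after stopping the child.

```c
/* Hypothetical sketch of the proposed snmp_portscan SIGTERM handling; not
 * the real plugin source. Apart from snmpwalk_process and sigterm_handler,
 * every name here is made up for illustration. */
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Pid of the snmpwalk child running right now, or -1 when none is.
 * kill(0, sig) signals the caller's whole process group and kill(-1, sig)
 * signals every process we may signal, so an unset value must never reach
 * kill(). Assumes pid_t loads/stores are atomic (true for int-sized pid_t). */
static volatile pid_t snmpwalk_process = -1;

/* Signal the current snmpwalk child only if one is actually running.
 * Returns 0 when there is nothing to kill, otherwise kill()'s result. */
static int kill_snmpwalk_if_running(int sig)
{
    pid_t pid = snmpwalk_process;
    if (pid <= 0)
        return 0;
    return kill(pid, sig);
}

/* Stop the running snmpwalk, then terminate this plugin process as well.
 * Only async-signal-safe calls are used (kill, signal, getpid). */
static void sigterm_handler(int sig)
{
    kill_snmpwalk_if_running(SIGTERM);
    signal(sig, SIG_DFL);
    /* sig is blocked while this handler runs, so this stays pending and the
     * default action (termination) fires as soon as the handler returns. */
    kill(getpid(), sig);
}

static int install_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigterm_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Run one command (e.g. snmpwalk) and record its pid for the handler.
 * SIGTERM is blocked across fork() so the handler never sees a child that
 * exists but has not been recorded yet. Returns the waitpid() status, or
 * -1 on error. */
static int run_snmpwalk(char *const argv[])
{
    sigset_t block, old;
    int status;
    pid_t pid;

    sigemptyset(&block);
    sigaddset(&block, SIGTERM);
    sigprocmask(SIG_BLOCK, &block, &old);

    pid = fork();
    if (pid == 0) {
        /* Child: drop the plugin's handler and restore the mask before exec. */
        signal(SIGTERM, SIG_DFL);
        sigprocmask(SIG_SETMASK, &old, NULL);
        execvp(argv[0], argv);
        _exit(127);
    }
    if (pid > 0)
        snmpwalk_process = pid; /* updated on every run, not just the first */
    sigprocmask(SIG_SETMASK, &old, NULL);
    if (pid < 0)
        return -1;

    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            status = -1;
            break;
        }
    }
    snmpwalk_process = -1;
    return status;
}
```

Blocking SIGTERM around fork() is an extra precaution not mentioned above; it closes the small window in which a child exists but its pid has not been recorded. Restoring SIG_DFL and re-sending SIGTERM to getpid() is just one way to have the handler "kill its own pid"; calling _exit() from the handler would be another.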