branchnetconsulting at gmail
Jun 16, 2011, 3:13 PM
Post #3 of 3
Re: How to diagnose recurrent CPU soft lockup problems with the latest PF_RING on Ubuntu 10.04 server
[In reply to]
I filed the bug on bugzilla.ntop.org.
Here is how I start snort. Both versions frequently lock up the same
way when the group of snort daemons on a given box is restarted.
taskset $CPUMASK snort -i $IFACE -l /opt/nids/sensor/var/log/snort/$IFACE -u snort -c --pid-path /var/run/snort -D -F /opt/nids/sensor/etc/snort/$IFACE.bpf
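For context, the wrapper that launches one instance per interface looks roughly like this (a dry-run sketch: the interface names, the CPU masks, and the -c config path are made up for illustration; the log and BPF paths match the real command above):

```shell
# Dry-run sketch of the per-interface launcher. The interface list,
# CPU masks, and the -c config path are hypothetical; "echo" keeps this
# from actually starting anything.
CONF_DIR=/opt/nids/sensor/etc/snort
LOG_DIR=/opt/nids/sensor/var/log/snort

cpu=0
for IFACE in eth2 eth3 eth4 eth5; do
    CPUMASK=$(printf '0x%x' $((1 << cpu)))   # pin each instance to one logical CPU
    echo taskset "$CPUMASK" snort -i "$IFACE" -l "$LOG_DIR/$IFACE" -u snort \
        -c "$CONF_DIR/$IFACE.conf" --pid-path /var/run/snort -D \
        -F "$CONF_DIR/$IFACE.bpf"
    cpu=$((cpu + 1))
done
```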
I have not tried snort clustering yet.
My modprobe config line for pf_ring looks like this:
options pf_ring num_slots=30000 transparent_mode=0 enable_tx_capture=0
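To confirm the module actually loaded with those values rather than the defaults, the parameters can be read back from sysfs. A quick sketch, assuming the standard /sys/module parameter layout (pf_ring will not be present on every box, hence the guard):

```shell
# Read back the live pf_ring parameters; degrades gracefully when the
# module is not loaded (e.g. on a non-sensor box).
for p in num_slots transparent_mode enable_tx_capture; do
    f="/sys/module/pf_ring/parameters/$p"
    if [ -r "$f" ]; then
        echo "$p=$(cat "$f")"
    else
        echo "$p=(pf_ring not loaded)"
    fi
done
```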
On 6/16/2011 1:08 PM, Luca Deri wrote:
> It looks to me that there is a lock that is not released, and that causes this problem. It might be the clustering you use.
> Can you please explain how you started snort?
> Please file a bug on bugzilla.ntop.org for tracking this issue
> Regards Luca
> On Jun 16, 2011, at 6:53 PM, Kevin Branch wrote:
>> Ever since upgrading a couple of my CentOS 4.6 NIDS sensor hosts to use the latest PF_RING from subversion, they have been recurrently locking up with CPU soft lockup errors like this, forcing a manual system reboot:
>> BUG: soft lockup - CPU#1 stuck for 61s! [snort:5049]
>> On one sensor host, I downgraded to PF_RING 4195 and the problem completely went away. However, on the other host I need to take advantage of the snort 2.9 DAQ and PF_RING clustering, so there I replaced the hardware (now a dual-core hyperthreaded Xeon system) and switched to Ubuntu 10.04 server, hoping that better alignment with current PF_RING development would eliminate the problem -- but it still comes up frequently. I have also diagnostically downgraded snort by one release and shut down all other NIDS processes besides snort.
>> Currently on my Ubuntu sensor host, I am running 4 instances of snort linked against the latest PF_RING. Each instance sniffs its own physical network interface (e1000). I am not using transparent mode or the PF_RING version of the e1000 driver at this point. When I shut down these snort instances and then attempt to start them again, at least half the time the NIDS host seizes up with CPU soft lockup errors and has to be rebooted. Disabling hyperthreading did not help. I also started using taskset to set CPU affinity so that each snort instance runs on its own logical CPU. I can recreate the problem even with only 2 snort processes, as long as they run on the same physical CPU.
>> When I rebuild snort without PF_RING, I can restart all 4 snort instances endlessly without CPU soft lockups.
>> It seems that I can't reliably restart sets of snort daemons on the same physical CPU if snort is linked against current PF_RING.
>> I really don't know where to go from here, so I'm hoping someone else has encountered something like this or can suggest where I should go next in the process of diagnosing the issue.
>> Kevin Branch
>> Ntop-misc mailing list
>> Ntop-misc [at] listgateway
> If you can not measure it, you can not improve it - Lord Kelvin