
Mailing List Archive: NTop: Misc

How to diagnose recurrent CPU soft lockup problems with the latest PF_RING on Ubuntu 10.04 server

 

 



branchnetconsulting at gmail

Jun 16, 2011, 9:53 AM

Post #1 of 3 (649 views)

Ever since upgrading a couple of my CentOS 4.6 NIDS sensor hosts to use
the latest PF_RING from subversion, they have been recurrently locking
up with CPU soft lockup errors like this, forcing a manual system reboot:
BUG: soft lockup - CPU#1 stuck for 61s! [snort:5049]

On one sensor host, I downgraded to PF_RING 4195 and the problem
completely went away. However, on the other host I need to take
advantage of the snort 2.9 DAQ and PF_RING clustering, so in that case I
replaced the hardware (now a dual-core hyperthreaded Xeon system) and
switched to Ubuntu 10.04 server in hopes of getting better alignment
with current PF_RING development and eliminating this problem -- but it
still comes up frequently. I've also diagnostically downgraded from
snort-2.9.0.5 to snort 2.8.6.1 and shut down all other NIDS processes
besides snort.

Currently on my Ubuntu sensor host, I am running 4 instances of snort
2.8.6.1 linked against the latest PF_RING. Each snort instance is
sniffing a unique physical network interface (e1000). I am not using
transparent mode or the PF_RING version of the e1000 driver at this
point. When I shut down these snort instances and then attempt to start
them up again, at least half the time the NIDS host seizes up with CPU
soft lockup errors and has to be rebooted. I have tried disabling
hyperthreading to no avail. I also started using taskset to set cpu
affinity for each snort process such that each instance runs on a unique
logical cpu. I have been able to recreate this problem even when
limiting myself to 2 snort processes, as long as they are running on the
same physical cpu.

When I rebuild snort 2.8.6.1 to not use PF_RING, I can restart all 4
snort instances endlessly without CPU soft lockups.

It seems that I can't reliably restart sets of snort daemons on the same
physical CPU if snort is linked against current PF_RING.

I really don't know where to go from here, so I'm hoping someone else
has encountered something like this or can suggest where I should go
next in the process of diagnosing the issue.
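
As a starting point for tallying which CPUs and processes the lockup messages name, the kernel log line quoted above can be split into its fields. This is a minimal sketch, assuming the message always has the form shown in this post; the function name parse_soft_lockup is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print one
# summary line per "soft lockup" message in the format quoted above.
parse_soft_lockup() {
    sed -n 's/.*BUG: soft lockup - CPU#\([0-9][0-9]*\) stuck for \([0-9][0-9]*\)s! \[\(.*\):\([0-9][0-9]*\)].*/cpu=\1 seconds=\2 comm=\3 pid=\4/p'
}

# Example using the exact line from this post.
echo 'BUG: soft lockup - CPU#1 stuck for 61s! [snort:5049]' | parse_soft_lockup
```

On a live host the same function could be fed from the kernel log, for example `dmesg | parse_soft_lockup`.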

Kevin Branch
_______________________________________________
Ntop-misc mailing list
Ntop-misc [at] listgateway
http://listgateway.unipi.it/mailman/listinfo/ntop-misc


deri at ntop

Jun 16, 2011, 10:08 AM

Post #2 of 3 (628 views)
Re: How to diagnose recurrent CPU soft lockup problems with the latest PF_RING on Ubuntu 10.04 server

Kevin,
It looks to me like a lock is not being released, and that is causing this problem. It might be the clustering you use.
Can you please explain how you started snort?

Please file a bug on bugzilla.ntop.org so this issue can be tracked.

Regards Luca


---
If you can not measure it, you can not improve it - Lord Kelvin



branchnetconsulting at gmail

Jun 16, 2011, 3:13 PM

Post #3 of 3 (628 views)
Re: How to diagnose recurrent CPU soft lockup problems with the latest PF_RING on Ubuntu 10.04 server

Luca,

I filed the bug on bugzilla.ntop.org.

Here is how I start snort 2.8.6.1 and snort 2.9.0.5. Both versions
frequently lock up the same way when the group of snort daemons on a
given box is restarted.

taskset $CPUMASK snort -i $IFACE \
  -l /opt/nids/sensor/var/log/snort/$IFACE -u snort \
  -c /opt/nids/sensor/etc/snort/rules/$IFACE/snort.conf \
  --create-pidfile --pid-path /var/run/snort -D \
  -F /opt/nids/sensor/etc/snort/$IFACE.bpf
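
The values of $CPUMASK and $IFACE are not given here. The following is a minimal sketch of how one mask per instance might be generated, assuming each instance is pinned to a single logical CPU numbered from 0 and that the interfaces are named eth1 through eth4 (both are assumptions, as are the helper names). It only prints the launch commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: hex affinity mask that selects one logical CPU.
cpu_mask() {
    printf '0x%x\n' $((1 << $1))
}

# Print (do not run) one launch command per interface, assigning each
# instance its own logical CPU. Interface names are placeholders.
print_launch_commands() {
    cpu=0
    for iface in "$@"; do
        echo "taskset $(cpu_mask "$cpu") snort -i $iface" \
             "-l /opt/nids/sensor/var/log/snort/$iface -u snort" \
             "-c /opt/nids/sensor/etc/snort/rules/$iface/snort.conf" \
             "--create-pidfile --pid-path /var/run/snort -D" \
             "-F /opt/nids/sensor/etc/snort/$iface.bpf"
        cpu=$((cpu + 1))
    done
}

print_launch_commands eth1 eth2 eth3 eth4
```

Printing the commands first makes it easy to confirm that no two instances share a mask before anything is started.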

I have not tried snort clustering yet.

My modprobe config line for pf_ring looks like this:
options pf_ring num_slots=30000 transparent_mode=0 enable_tx_capture=0
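
To confirm that options like these were actually applied to the loaded module, the values can be read back from sysfs. This is a sketch only: it assumes the module exposes its parameters under /sys/module/pf_ring/parameters, which holds only for parameters declared with readable permissions and has not been verified for pf_ring. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print name=value for each readable parameter file
# in a module's sysfs parameters directory (default path is an assumption).
show_module_params() {
    dir=${1:-/sys/module/pf_ring/parameters}
    if [ ! -d "$dir" ]; then
        echo "no parameters directory at $dir"
        return 1
    fi
    for f in "$dir"/*; do
        if [ -r "$f" ]; then
            printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
        fi
    done
    return 0
}

# Report the current values, or a message if the module is not loaded.
show_module_params || true
```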

Kevin Branch


