Mailing List Archive: Cisco: NSP

3550 High CPU - nothing in proc cpu


mail4hh at pobox

Nov 14, 2009, 4:58 PM

Post #1 of 14 (1640 views)
3550 High CPU - nothing in proc cpu

During a high network usage event, the cpu load increased to 90%
sustained, while a 'show processes cpu' did not reveal any culprits.
I suspected IP Input may be consuming a high amount of cpu, but it was
only at 2.7%

The 3550 is working as a L3 router with two static entries for the
default gw (for load balancing on our uplink).

Traffic levels at the time of the high cpu usage were ~120Mbps.

I also examined broadcast packet counts and traffic destined for the
router itself. They also did not reveal anything out of the ordinary.

Do you have any suggestions on what I should be looking at to
determine the source of the high cpu usage?

Thank you,

Hector
_______________________________________________
cisco-nsp mailing list cisco-nsp [at] puck
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/


maillist at thelan

Nov 14, 2009, 6:59 PM

Post #2 of 14 (1585 views)
Re: 3550 High CPU - nothing in proc cpu

Hector Herrera wrote:
> During a high network usage event, the cpu load increased to 90%
> sustained, while a 'show processes cpu' did not reveal any culprits.
> I suspected IP Input may be consuming a high amount of cpu, but it was
> only at 2.7%
>
> The 3550 is working as a L3 router with two static entries for the
> default gw (for load balancing on our uplink).
>
> Traffic levels at the time of the high cpu usage were ~120Mbps.
>
> I also examined broadcast packet counts and traffic destined for the
> router itself. They also did not reveal anything out of the ordinary.
>
> Do you have any suggestions on what I should be looking at to
> determine the source of the high cpu usage?
>
What did the topmost line of "show processes cpu" say? The five-second
average shows two values: the first is total CPU usage and the second
is the share spent in interrupts. My guess is you were seeing a lot of
interrupts, which means traffic was being punted to the CPU. Take a look
at some of the other threads on c-nsp to find out what kind of traffic
was being punted ("show cef not-cef-switched" is a good start).
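
The split between interrupt and process load can be read off that first line mechanically. A minimal Python sketch, assuming the usual IOS summary format in which the number before the slash is total CPU and the number after it is interrupt-level CPU; the function name split_cpu_line is made up for illustration:

```python
import re

def split_cpu_line(line):
    """Split the five-second figure into total, interrupt and process shares."""
    match = re.search(r"five seconds:\s*(\d+)%/(\d+)%", line)
    if match is None:
        raise ValueError("not a 'CPU utilization' summary line")
    total, interrupt = int(match.group(1)), int(match.group(2))
    # Process-level load is whatever is left after interrupt-level load.
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}

# Sample summary line in the format printed by 'show processes cpu'.
sample = "CPU utilization for five seconds: 55%/9%; one minute: 54%; five minutes: 53%"
print(split_cpu_line(sample))
```

If the interrupt share is close to the total, the load comes from packets being switched in software rather than from any listed process.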

Hope this was helpful

--
Harald Firing Karlsen


mail4hh at pobox

Nov 14, 2009, 10:43 PM

Post #3 of 14 (1593 views)
Re: 3550 High CPU - nothing in proc cpu

Thank you for your responses.

I collected the commands to run the next time the cpu utilization
spikes. I did manage to capture the output of 'show cef
not-cef-switched' and it shows a very large number under the
"unsupported" column. All the other columns are zero.

Reading the list archives I found a few commands to diagnose the
"unsupported" column, and according to their output the punts are
caused by TTL-expired packets being sent to the cpu for processing.
Does this mean that the hardware can't handle the TTL-expired load, or
that TTL-expired handling is strictly a software process on this
hardware (3550-12t)?

If I have such a large number of TTL-expired messages, does that mean
I have a routing loop somewhere? If so, I have three uplink
interfaces, how do I find out which interface is causing the punts?

Here is the output from the commands I ran:

van-hc16-423-router#show ip cef switching stat

       Reason                          Drop       Punt  Punt2Host
RP LES No route                           0          0         37
RP LES Packet destined for us             0     273716          0
RP LES No adjacency                    8587          0          0
RP LES TTL expired                        0          0    1676276
RP LES Unclassified reason                1          0          0
RP LES Neighbor resolution req       210055          3          0
RP LES Total                         218643     273719    1676313

All    Total                         218643     273719    1676313
van-hc16-423-router#show ip cef switching stat feature
IPv4 CEF input features:
Feature Drop Consume Punt Punt2Host Gave route
Total 0 0 0 0 0

IPv4 CEF output features:
Feature Drop Consume Punt Punt2Host New i/f
Total 0 0 0 0 0

IPv4 CEF post-encap features:
Feature Drop Consume Punt Punt2Host New i/f
Total 0 0 0 0 0

IPv4 CEF for us features:
Feature Drop Consume Punt Punt2Host New i/f
Total 0 0 0 0 0

IPv4 CEF punt features:
Feature Drop Consume Punt Punt2Host New i/f
Total 0 0 0 0 0

IPv4 CEF local features:
Feature Drop Consume Punt Punt2Host Gave route
Total 0 0 0 0 0
van-hc16-423-router#sh ip arp summ
16 IP ARP entries, with 0 of them incomplete
van-hc16-423-router#sh sdm prefer
The current template is the routing extended-match template.
The selected template optimizes the resources in
the switch to support this level of features for
16 routed interfaces and 1K VLANs.

number of unicast mac addresses: 6K
number of igmp groups: 6K
number of qos aces: 1K
number of security aces: 1K
number of unicast routes: 12K
number of multicast routes: 6K

van-hc16-423-router#sh ip route summary
IP routing table name is Default-IP-Routing-Table(0)
IP routing table maximum-paths is 32
Route Source    Networks    Subnets     Overhead    Memory (bytes)
connected       0           1           64          152
static          0           0           0           0
bgp 4280        0           0           0           0
  External: 0 Internal: 0 Local: 0
internal        1                                   1172
Total           1           1           64          1324
van-hc16-423-router#sh ip route vrf PublicRouter sum
van-hc16-423-router#sh ip route vrf PublicRouter summary
IP routing table name is PublicRouter(1)
IP routing table maximum-paths is 32
Route Source    Networks    Subnets     Overhead    Memory (bytes)
connected       0           4           256         608
static          1           0           128         152
bgp 4280        1274        1134        154112      367036
  External: 2408 Internal: 0 Local: 0
internal        66                                  77352
Total           1341        1138        154496      445148
van-hc16-423-router#
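
The per-reason counters from 'show ip cef switching stat' can be ranked with a few lines of script. A rough Python sketch, assuming each data row has the shape "RP LES <reason> <drop> <punt> <punt2host>" as in the output above; parse_cef_stats is a hypothetical helper, not an existing tool:

```python
def parse_cef_stats(text):
    """Return {reason: (drop, punt, punt2host)} for the per-reason rows."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like: RP LES <reason words> <drop> <punt> <punt2host>
        if len(fields) < 6 or fields[:2] != ["RP", "LES"]:
            continue
        reason = " ".join(fields[2:-3])
        if reason == "Total":
            continue  # skip the summary row
        stats[reason] = tuple(int(value) for value in fields[-3:])
    return stats

cef_output = """\
Reason Drop Punt Punt2Host
RP LES No route 0 0 37
RP LES Packet destined for us 0 273716 0
RP LES No adjacency 8587 0 0
RP LES TTL expired 0 0 1676276
RP LES Unclassified reason 1 0 0
RP LES Neighbor resolution req 210055 3 0
RP LES Total 218643 273719 1676313
"""

stats = parse_cef_stats(cef_output)
# Rank reasons by the Punt2Host column (index 2).
top_reason = max(stats, key=lambda reason: stats[reason][2])
print(top_reason, stats[top_reason][2])
```

Because these counters are cumulative, comparing two snapshots taken a known interval apart gives a rate, which is more telling than the absolute totals.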


--
Hector Herrera
President
Pier Programming Services Ltd.


swmike at swm

Nov 15, 2009, 12:30 AM

Post #4 of 14 (1575 views)
Re: 3550 High CPU - nothing in proc cpu

On Sat, 14 Nov 2009, Hector Herrera wrote:

> If I have such a large number of TTL-expired messages, does that mean I
> have a routing loop somewhere? If so, I have three uplink interfaces,
> how do I find out which interface is causing the punts?

Try "show int switching" (hidden command, you can't tab-complete).

--
Mikael Abrahamsson email: swmike [at] swm


mail4hh at pobox

Nov 15, 2009, 1:43 AM

Post #5 of 14 (1575 views)
Re: 3550 High CPU - nothing in proc cpu

Great, so now I know:

From 'show ip cef switching stat' I learned that there is a large
number of packets with an expired TTL (TTL-expired is handled by the
IP process, i.e. software routing).

from 'show interface switching' (hidden command) I learned the
interface that has a high number of packets In and packets Out in the
row "IP Process"

Since the packet counts from the two commands above are very close to
each other, I think I have identified the network interface with the
large number of TTL-expired packets. It is a BGP interface, so my best
guess is that a BGP neighbour is advertising routes that they don't
actually carry in their routing tables, and for some reason they are
sending the packets back to me. The next step is to locate the culprit
route advertisement and contact the neighbour. Right?

Still, for the next time I see high cpu usage, the commands to use are:

'show process cpu' and look at the first few lines to determine if
it's interrupts or processes consuming the cpu time. If it's
processes, look at the list of processes for any that are using large
percentages.

To diagnose high cpu consumption by interrupts, CPU Profiling
(http://www.cisco.com/en/US/products/hw/routers/ps359/products_tech_note09186a00801c2af0.shtml)
is a possible tool.

Thank you all for your help!

Hector


swmike at swm

Nov 15, 2009, 2:12 AM

Post #6 of 14 (1570 views)
Re: 3550 High CPU - nothing in proc cpu

On Sun, 15 Nov 2009, Hector Herrera wrote:

> Since the number of packets in the two commands above are very close to
> each other, I think I have identified the network interface with the
> large number of TTL-expired packets. It is a BGP interface, so my best
> guess is that a BGP neighbour is advertising routes that they don't
> actually carry in their routing tables and for some reason they are
> sending the packets back to me, and the question now is to locate the
> culprit route advertisement and contact the neighbor. Right?

Yes, or they didn't null-route their aggregate prefix and have a default
route to you (or you didn't null-route your prefix and you have a default
route to them).
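
To see why a missing null route produces TTL-expired punts, here is a toy Python simulation. Everything in it is an assumption made for illustration: the router names, the 10.0.0.0/22 aggregate and the simplified TTL handling are invented, and no real router works from a dict.

```python
import ipaddress

def lookup(table, dst):
    """Longest-prefix match; returns the next hop or None."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(prefix), next_hop)
               for prefix, next_hop in table.items()
               if addr in ipaddress.ip_network(prefix)]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]

def forward(tables, router, dst, ttl=64):
    """Follow one packet; return (outcome, router where it ended, hops)."""
    hops = 0
    while True:
        next_hop = lookup(tables[router], dst)
        if next_hop is None:
            return ("no-route", router, hops)
        if next_hop == "connected":
            return ("delivered", router, hops)
        if next_hop == "Null0":
            return ("discarded", router, hops)
        if ttl <= 1:
            # Cannot forward any further: this router has to handle the
            # expiry itself, which is the work that lands on the CPU.
            return ("ttl-expired", router, hops)
        ttl -= 1
        router = next_hop
        hops += 1

provider = {"10.0.0.0/22": "customer"}            # follows the advertised aggregate
customer_loop = {"10.0.0.0/24": "connected",      # only part of the aggregate is in use
                 "0.0.0.0/0": "provider"}         # everything else follows the default
customer_fixed = {**customer_loop, "10.0.0.0/22": "Null0"}

looped = forward({"provider": provider, "customer": customer_loop},
                 "provider", "10.0.1.5")
fixed = forward({"provider": provider, "customer": customer_fixed},
                "provider", "10.0.1.5")
print(looped)
print(fixed)
```

In the first case the packet bounces between the two routers until its TTL runs out; with the aggregate routed to Null0 it is discarded on arrival.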

The best way is probably to port-mirror the port and look at the ICMP
messages generated. You might also have luck with "debug ip icmp" on the
3550 to see where the ICMP messages are being sent. There might also be
a debug command to actually tell you what unreachables are being sent.
Make sure you have "no logging console", and remember it's always a risk
to debug things...

--
Mikael Abrahamsson email: swmike [at] swm


mail4hh at pobox

Nov 21, 2009, 5:01 PM

Post #7 of 14 (1522 views)
Re: 3550 High CPU - nothing in proc cpu

I had another opportunity to debug the high cpu usage on the 3550-12t.

show proc cpu indicated that cpu load was 39% interrupt, 40% total

So it's definitely a high interrupt rate that is using up the cpu.

I also debugged the switching mechanism, and although I have high
amounts of TTL-expired events, they only occur at a rate of 2-3 per
second.

I proceeded to profile the cpu usage with:

profile <start> <end> <granularity>
profile start
... 10 mins later
profile stop
show profile terse

Granularity was 8 due to the largest free block being about half the
size of the main:text section.

This gave me a listing of all the memory ranges and a count of how
many times the cpu was found to be in that memory location.

System Total = 000141506
Interrupt Total = 000056163 (39 percent)
Sched Total = 000094547 (66 percent)

Interrupt [00] = 000056163 (39 percent)

The interrupt breakdown is (top 3):

0x475F50 with 3281 counts (~5.4 per sec.)
0x4B82B8 with 1667 counts (~2.7 per sec)
0x4B8F90 with 1456 counts (~2.4 per sec)
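
The per-second figures and each bucket's share of the interrupt samples can be recomputed from the counts. A small Python sketch, assuming the profile ran for exactly 10 minutes; note that these are profiler samples per second, which is not necessarily the same thing as interrupts per second:

```python
interrupt_total = 56163   # "Interrupt Total" from the profile output
duration_s = 10 * 60      # assumed run time: "10 mins later"

# Top three interrupt-level buckets: start address -> sample count.
top_buckets = {0x475F50: 3281, 0x4B82B8: 1667, 0x4B8F90: 1456}

def summarize(buckets, total, seconds):
    """Return (address, samples per second, percent of total) per bucket."""
    rows = []
    for address, count in buckets.items():
        rows.append((hex(address),
                     round(count / seconds, 2),
                     round(100 * count / total, 1)))
    return rows

rows = summarize(top_buckets, interrupt_total, duration_s)
combined_share = round(100 * sum(top_buckets.values()) / interrupt_total, 1)

for row in rows:
    print(row)
print("top three combined:", combined_share, "percent of interrupt samples")
```

On these numbers the three busiest buckets together hold only about a tenth of the interrupt-level samples, so the time is spread over many addresses rather than concentrated in one routine.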

My question is:

How do I convert those memory addresses into something that would tell
me what interrupts are being triggered so much?

Thank you,

Hector


eninja at gmail

Nov 22, 2009, 4:01 PM

Post #8 of 14 (1510 views)
Re: 3550 High CPU - nothing in proc cpu

Hector,

It is interesting that the cisco article tells you how to profile your cpu
but not how to interpret the results ;-)

There is only one way to interpret the results: contact Cisco to report the
abnormality. They will have to decode the address(es) using the symbol files
for your device software, which will reveal the culprit function(s). It should
be pretty straightforward to isolate the cause and rectify it thereafter.
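
Decoding an address against a symbol table is conceptually a sorted lookup: find the last symbol that starts at or below the address. A generic Python sketch of that idea follows. The symbol table in it is ENTIRELY HYPOTHETICAL (the start addresses and names are invented placeholders); the real symbols for an IOS image are only available to Cisco.

```python
import bisect

# HYPOTHETICAL symbol table: (start address, function name), sorted by address.
# These entries are placeholders and do not describe any real IOS image.
symbols = [
    (0x475000, "hypothetical_fn_a"),
    (0x4B8000, "hypothetical_fn_b"),
    (0x4B8F00, "hypothetical_fn_c"),
]
starts = [start for start, _ in symbols]

def resolve(address):
    """Map an address to 'symbol+offset', or None if it precedes all symbols."""
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None
    start, name = symbols[index]
    return f"{name}+{hex(address - start)}"

for sampled in (0x475F50, 0x4B82B8, 0x4B8F90):
    print(hex(sampled), "->", resolve(sampled))
```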

FYI, seeing CPU spikes to X% during high traffic is not abnormal for most
non-distributed platforms that are groaning under an inappropriate switching
algorithm or overload.

Out of curiosity, is 40% cpu utilization above your benchmarked baseline? If
no, ignore. Also, any alignment corrections? device#sh align

Eninja
PS. Note to CPU profiler PM, help customers to help themselves - enhance cpu
profiler to display decoded addresses in *show profile terse* results and
display culprit functions so users can resolve these simple issues
themselves. Justification - reduction in TAC calls.





mail4hh at pobox

Nov 22, 2009, 4:45 PM

Post #9 of 14 (1504 views)
Re: 3550 High CPU - nothing in proc cpu

On Sun, Nov 22, 2009 at 4:01 PM, e ninja <eninja [at] gmail> wrote:
> Hector,
>
> It is interesting that the cisco article tells you how to profile your cpu
> but not how to interpret the results ;-)
>
> There is only one way to interpret the results - contact Cisco to report the
> abnormality. They will have to decode the address/es using the symbol files
> for your device software which will reveal the culprit function/s. It should
> be pretty straight forward to isolate cause and rectify thereafter.

I did receive an email from someone at Cisco offering to look up the
functions. Thank you :-) I can't wait to see the outcome.

> FYI, seeing CPU spikes to X% during high traffic is not abnormal for most
> non-distributed platforms that are groaning under an inappropriate switching
> algorithm or overload.
>
> Out of curiosity, is 40% cpu utilization above your benchmarked baseline? If
> no, ignore. Also, any alignment corrections? device#sh align

Your question made me go back and review my notes. CPU load appears
to be directly correlated to the amount of traffic on the switch. At
50Mbps the cpu load is 40%, at 200Mbps the load is 100%. At 20Mbps
the load (currently) is 10%

I wonder if expecting the 3550-12t platform to handle more than
200Mbps is too much to ask? The specs indicate it's capable of
17Mpps. According to the logs, at 200Mbps (with the 100% cpu load)
the router was forwarding 45Kpps, much less than the advertised
capacity.
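
Putting those figures side by side as a quick Python calculation (decimal units assumed, i.e. 200Mbps = 200e6 bits per second, 45Kpps = 45e3 and 17Mpps = 17e6 packets per second):

```python
def average_packet_bytes(bits_per_second, packets_per_second):
    """Average packet size implied by a bit rate and a packet rate."""
    return bits_per_second / 8 / packets_per_second

observed_bps = 200e6     # traffic level at 100% cpu load
observed_pps = 45e3      # forwarding rate from the logs
advertised_pps = 17e6    # platform spec quoted above

avg_bytes = average_packet_bytes(observed_bps, observed_pps)
fraction_of_spec = observed_pps / advertised_pps

print(f"average packet size: {avg_bytes:.0f} bytes")
print(f"share of advertised pps: {100 * fraction_of_spec:.2f}%")
```

The observed packet rate is a fraction of one percent of the advertised figure, which suggests the limit being hit is not the hardware forwarding path.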

Perhaps it is a bad design on my part.

I learned that the 3550-12t has three forwarding engines, one for each
set of four interfaces (0/1 to 0/4, 0/5 to 0/8 and 0/9 to 0/12)

With that in mind, I configured a VRF with four routed interfaces (0/1
to 0/4). 0/3 is a BGP interface. 0/4 is the LAN. 0/1 and 0/2 are
configured in a load-balancing static default route. The forwarding
engine is configured to use per-destination load-balancing.

If I understand it correctly, Cisco's load-balancing in
per-destination mode has an initial cost when the destination is not
present in the routing table, but once it is there, CEF takes care of
the forwarding. The traffic on the network is stream based (live video
streams), with very few new destinations (fewer than 500 per hour) and a
constant stream of packets, so it should be handled by CEF.
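
As an illustration of per-destination balancing, the Python sketch below hashes each source/destination pair onto one of two next hops, so all packets of a given stream take the same uplink. This is only a model: CRC32 stands in for whatever hash the platform really uses, and the uplink labels and addresses are made up.

```python
import zlib

# Labels for the two uplinks in the static default route (illustrative names).
NEXT_HOPS = ["uplink-0/1", "uplink-0/2"]

def pick_next_hop(src_ip, dst_ip, next_hops=NEXT_HOPS):
    """Choose a next hop from the source/destination pair only."""
    key = f"{src_ip}->{dst_ip}".encode()
    # CRC32 is a stand-in hash; the real CEF hash is not reproduced here.
    return next_hops[zlib.crc32(key) % len(next_hops)]

# Every packet of the same stream maps to the same uplink.
first = pick_next_hop("10.0.0.10", "192.0.2.50")
again = pick_next_hop("10.0.0.10", "192.0.2.50")
print(first, again, first == again)
```

The choice depends only on the address pair, so a long-lived video stream never flips between uplinks and its packets stay in order.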

So I'm still at a loss ... Should I expect better performance from the
3550-12t or am I trying to squeeze blood out of stones?

Hector


gert at greenie

Nov 22, 2009, 11:28 PM

Post #10 of 14 (1496 views)
Re: 3550 High CPU - nothing in proc cpu

Hi,

On Sun, Nov 22, 2009 at 04:45:18PM -0800, Hector Herrera wrote:
> So I'm still at a loss ... Should I expect better performance from the
> 3550-12t or am I trying to squeeze blood out of stones?

Normally, hardware-forwarding boxes should never show significant CPU
load. So some of your traffic is software-forwarded - and you need to
figure out what and why.

gert

--
USENET is *not* the non-clickable part of WWW!
//www.muc.de/~gert/
Gert Doering - Munich, Germany gert [at] greenie
fax: +49-89-35655025 gert [at] net


oboehmer at cisco

Nov 22, 2009, 11:48 PM

Post #11 of 14 (1498 views)
Re: 3550 High CPU - nothing in proc cpu

> On Sun, Nov 22, 2009 at 04:45:18PM -0800, Hector Herrera wrote:
> > So I'm still at a loss ... Should I expect better performance from the
> > 3550-12t or am I trying to squeeze blood out of stones?
>
> Normally, hardware-forwarding boxes should never show significant CPU
> load. So some of your traffic is software-forwarded - and you need to
> figure out what and why.

ack.. I don't have much experience with this platform, but
http://www.cisco.com/en/US/docs/switches/lan/catalyst3750/software/troubleshooting/cpu_util.html
seems to be a good start.

oli


sthaug at nethelp

Nov 23, 2009, 12:45 AM

Post #12 of 14 (1493 views)
Re: 3550 High CPU - nothing in proc cpu

> Normally, hardware-forwarding boxes should never show significant CPU
> load.

With the exception of the old 3500XL series using 50% or more of the
CPU to drive the front panel LEDs :-)

(Yes, I know, EoL years ago...)

Steinar Haug, Nethelp consulting, sthaug [at] nethelp


MatlockK at exempla

Nov 24, 2009, 7:03 AM

Post #13 of 14 (1468 views)
Re: 3550 High CPU - nothing in proc cpu

Heh, or the old ACC boxes (I think the Danube), where the original
design was to not have ANY front-panel LEDs. The 'managers' didn't like
that, so all they did was create a simple oscillator circuit that
blinked an LED.

The LED has NO correlation to the real status of the chassis. The
chassis can be locked up solid, and that LED will continue blinking
merrily.

Ken Matlock
Network Analyst
Exempla Healthcare
(303) 467-4671
matlockk [at] exempla





jeff-kell at utc

Nov 24, 2009, 8:03 AM

Post #14 of 14 (1482 views)
Re: 3550 High CPU - nothing in proc cpu

> From: sthaug [at] nethelp
>> Normally, hardware-forwarding boxes should never show significant CPU
>> load.
>>
> With the exception of the old 3500XL series using 50% or more of the
> CPU to drive the front panel LEDs :-)

Yes, a 3500XL...

PCP-2000-IDF-3-2#show proc cpu | e 0.00.*0.00.*0.00
CPU utilization for five seconds: 55%/9%; one minute: 54%; five minutes: 53%
 PID  Runtime(ms)     Invoked  uSecs    5Sec    1Min    5Min  TTY  Process
   2          324         145   2234   0.20%   0.24%   0.07%    1  Virtual Exec
  19    298827467   725645780    411   4.99%   3.38%   2.15%    0  LED Control Proc
  20     22588231     6594172   3425   0.32%   0.28%   0.28%    0  Frank Aging
  21   1508076145  1340714509   1124  12.28%  11.87%  11.73%    0  Port Status Proc
or a 2924XL...

NVA-CM-1#show proc cpu | e 0.00.*0.00.*0.00
CPU utilization for five seconds: 30%/7%; one minute: 32%; five minutes: 33%
 PID  Runtime(ms)     Invoked  uSecs    5Sec    1Min    5Min  TTY  Process
   2          333          84   3964   0.00%   0.30%   0.07%    1  Virtual Exec
  19     87700963   110937038    790   1.12%   3.35%   3.93%    0  LED Control Proc
  21    361698432   203598505   1776  12.20%  11.28%  11.21%    0  Port Status Proc
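
The "| e 0.00.*0.00.*0.00" filter used in both commands drops every row where all three averages read 0.00. The same filter over captured output can be sketched in Python, assuming the pattern behaves the same way in Python's re module for this simple case; the idle row in the sample is a made-up example:

```python
import re

# Same pattern as in the CLI filter; "." matches any character.
IDLE_ROW = re.compile(r"0.00.*0.00.*0.00")

def exclude_idle(lines, pattern=IDLE_ROW):
    """Keep only the rows that do not match the exclude pattern."""
    return [line for line in lines if not pattern.search(line)]

captured = [
    "19 87700963 110937038 790 1.12% 3.35% 3.93% 0 LED Control Proc",
    "5 0 1 0 0.00% 0.00% 0.00% 0 Example Idle Proc",  # hypothetical idle row
]

busy = exclude_idle(captured)
print(busy)
```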

Jeff