Mailing List Archive: NANOG: users

Catalyst 6500 High Switch Proc

 

 



phil at mindfury

Nov 15, 2008, 1:35 PM

Post #1 of 8 (1952 views)
Catalyst 6500 High Switch Proc

Hello.

I've run into a bit of a snag and I hope some folks here may be able to
enlighten. From time to time I check the 'sh platform hardware
capacity' command on our Catalyst 6509s and have noticed this item:

CPU Resources
  CPU utilization: Module    5 seconds    1 minute    5 minutes
                   5 RP      1% / 0%      3%          4%
                   5 SP      82% / 27%    62%         73%

This is shown on two 6509 switches that we operate as Core layer
devices. This value goes up to 85-90% during periods of peak traffic
and I'm concerned that this may be a problem.

Checking 'sh proc cpu' is usually 10% or less.

I've gone over this document backwards and forwards and none of the
situations outlined seem to apply here:
http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note09186a00804916e0.shtml

One thing to note is that our main ACL for ingress traffic is applied
here due to historical reasons. It's roughly 5000 single host entries
at present. We also use these devices for NDE.

I'm probably missing some other key details, but what could influence
the SP like this? Any insight would be appreciated.

--
Philip L.


jlewis at lewis

Nov 15, 2008, 1:57 PM

Post #2 of 8 (1910 views)
Re: Catalyst 6500 High Switch Proc [In reply to]

On Sat, 15 Nov 2008, Philip L. wrote:

> I've run into a bit of a snag and I hope some folks here may be able to
> enlighten. From time to time I check the 'sh platform hardware capacity'
> command on our Catalyst 6509s and have noticed this item:
>
> CPU Resources
> CPU utilization: Module 5 seconds 1 minute 5 minutes
> 5 RP 1% / 0% 3% 4%
> 5 SP 82% / 27% 62% 73%
>
> This is shown on two 6509 switches that we operate as Core layer devices.
> This value goes up to 85-90% during periods of peak traffic and I'm concerned
> that this may be a problem.
>
> Checking 'sh proc cpu' is usually 10% or less.
>
> I've gone over this document backwards and forwards and none of the
> situations outlined seem to apply here:
> http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note09186a00804916e0.shtml
>
> One thing to note, is that our main ACL for ingress traffic is applied here
> due to historical reasons. It's roughly 5000 single host entries at present.
> We also use these devices for NDE.

This should probably be on cisco-nsp rather than nanog, but...

5000 lines for ACL? I don't have any experience with ACLs of that size,
but it sounds like a possible problem.

If you're doing netflow export and not doing sampled netflow, I'm guessing
this is where your problem is. sh mls netflow table-contention detailed
might be able to confirm or rule this out.

----------------------------------------------------------------------
Jon Lewis | I route
Senior Network Engineer | therefore you are
Atlantic Net |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________


fw at deneb

Nov 15, 2008, 2:05 PM

Post #3 of 8 (1920 views)
Re: Catalyst 6500 High Switch Proc [In reply to]

* Jon Lewis:

>> I've run into a bit of a snag and I hope some folks here may be able
>> to enlighten. From time to time I check the 'sh platform hardware
>> capacity' command on our Catalyst 6509s and have noticed this item:

MSFC/PFC version is also relevant.

> 5000 lines for ACL? I don't have any experience with ACLs of that
> size, but it sounds like a possible problem.

Yes, but it should be doable. I don't know the commands for the
current IOS releases, but "show tcam" (including "show tcam detail")
and "show fm interface" were quite helpful for designing ACLs for
efficient processing.


phil at mindfury

Nov 15, 2008, 2:08 PM

Post #4 of 8 (1915 views)
Re: Catalyst 6500 High Switch Proc [In reply to]

This is on a Sup720-3BXL by the way:

'sh mls netflow table-con detailed':
Earl in Module 5
Detailed Netflow CAM (TCAM and ICAM) Utilization
================================================
TCAM Utilization : 100%
ICAM Utilization : 6%
Netflow TCAM count : 262024
Netflow ICAM count : 8
Netflow Creation Failures : 2085847
Netflow CAM aliases : 0

I had read about this earlier, along with 100% TCAM usage for the FIB,
but that wouldn't be the case here, as we're only showing 25% of the FIB
TCAM being used.
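
For anyone who wants to watch for this condition automatically, here is a
minimal Python sketch that parses the output above and flags a full NetFlow
table. The function names and the 95% threshold are illustrative assumptions,
not anything defined by Cisco, and the regex is written only against the
sample pasted in this post; other IOS releases may format the output
differently.

```python
import re

# Sample copied from the 'sh mls netflow table-con detailed' output in this post.
SAMPLE_OUTPUT = """\
Earl in Module 5
Detailed Netflow CAM (TCAM and ICAM) Utilization
================================================
TCAM Utilization : 100%
ICAM Utilization : 6%
Netflow TCAM count : 262024
Netflow ICAM count : 8
Netflow Creation Failures : 2085847
Netflow CAM aliases : 0
"""


def parse_table_contention(text):
    """Return {field name: integer value} for every 'name : value' line."""
    fields = {}
    for line in text.splitlines():
        # Matches 'name : 123' or 'name : 45%'; header lines have no colon.
        match = re.match(r"^\s*(.+?)\s*:\s*(\d+)%?\s*$", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def netflow_table_exhausted(fields, tcam_threshold=95):
    """Hypothetical heuristic: table (nearly) full and flows failing to install."""
    return (
        fields.get("TCAM Utilization", 0) >= tcam_threshold
        and fields.get("Netflow Creation Failures", 0) > 0
    )


stats = parse_table_contention(SAMPLE_OUTPUT)
print(stats["TCAM Utilization"], stats["Netflow Creation Failures"])
print(netflow_table_exhausted(stats))
```

The creation-failure counter looks cumulative, so a real monitor would
probably compare it between two polls rather than just test it against zero.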

--
Philip L.

Jon Lewis wrote:
>
> This should probably be on cisco-nsp rather than nanog, but...
>
> 5000 lines for ACL? I don't have any experience with ACLs of that
> size, but it sounds like a possible problem.
>
> If you're doing netflow export and not doing sampled netflow, I'm
> guessing this is where your problem is. sh mls netflow table-contention
> detailed might be able to confirm or rule this out.
>
> ----------------------------------------------------------------------
> Jon Lewis | I route
> Senior Network Engineer | therefore you are
> Atlantic Net |
> _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________


jlewis at lewis

Nov 15, 2008, 2:23 PM

Post #5 of 8 (1910 views)
Re: Catalyst 6500 High Switch Proc [In reply to]

On Sat, 15 Nov 2008, Philip L. wrote:

> This is on a Sup720-3BXL by the way:
>
> 'sh mls netflow table-con detailed:'
> Earl in Module 5
> Detailed Netflow CAM (TCAM and ICAM) Utilization
> ================================================
> TCAM Utilization : 100%
> ICAM Utilization : 6%
> Netflow TCAM count : 262024
> Netflow ICAM count : 8
> Netflow Creation Failures : 2085847
> Netflow CAM aliases : 0

This looks like the same issue I ran into not long ago. Switch your
netflow over from full to sampled...you lose lots of data, but your
hardware can't handle full netflow for your traffic level.

AFAIK, your only other options are to mess with the mls aging timers
(shorten them) or buy cards with DFC and hope that gets you enough
additional netflow capacity for the interfaces you're collecting.

http://www.gossamer-threads.com/lists/cisco/nsp/94953
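
As a rough way to reason about why shorter aging timers or sampling help, the
steady-state number of table entries can be approximated with Little's law:
entries = new flows per second x average time an entry stays in the table.
The Python sketch below is a back-of-the-envelope model only. The 256K-entry
capacity is an assumption (chosen because it is close to the TCAM count in
the output above), and the flow rate and lifetime in the example are made-up
numbers, not measurements from this network.

```python
# Assumed table size; not taken from the thread or verified against a datasheet.
NETFLOW_TABLE_ENTRIES = 256 * 1024


def steady_state_entries(new_flows_per_sec, mean_entry_lifetime_sec):
    """Little's law: average occupancy = arrival rate * average residence time."""
    return new_flows_per_sec * mean_entry_lifetime_sec


def max_lifetime_for_rate(new_flows_per_sec,
                          capacity=NETFLOW_TABLE_ENTRIES,
                          headroom=0.8):
    """Longest mean entry lifetime that keeps occupancy under headroom * capacity."""
    return capacity * headroom / new_flows_per_sec


# Illustrative numbers only: 5000 new flows/s, entries living 300 s on average.
demand = steady_state_entries(5000, 300)
print(demand, demand > NETFLOW_TABLE_ENTRIES)
print(round(max_lifetime_for_rate(5000), 1))
```

With those made-up inputs the table would need about 1.5 million entries, far
more than it has, which would be consistent with creation failures piling up.
Sampling lowers the effective new-flow rate and shorter aging lowers the
lifetime, so either one pulls the product down.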

----------------------------------------------------------------------
Jon Lewis | I route
Senior Network Engineer | therefore you are
Atlantic Net |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________


kloch at kl

Nov 15, 2008, 6:53 PM

Post #6 of 8 (1897 views)
Re: Catalyst 6500 High Switch Proc [In reply to]

Jon Lewis wrote:
> On Sat, 15 Nov 2008, Philip L. wrote:
>
>> This is on a Sup720-3BXL by the way:
>>
>> 'sh mls netflow table-con detailed:'
>> Earl in Module 5
>> Detailed Netflow CAM (TCAM and ICAM) Utilization
>> ================================================
>> TCAM Utilization : 100%
>> ICAM Utilization : 6%
>> Netflow TCAM count : 262024
>> Netflow ICAM count : 8
>> Netflow Creation Failures : 2085847
>> Netflow CAM aliases : 0
>
> This looks like the same issue I ran into not long ago. Switch your
> netflow over from full to sampled...you lose lots of data, but your
> hardware can't handle full netflow for your traffic level.
>
> AFAIK, your only other options are to mess with the mls aging timers
> (shorten them) or buy cards with DFC and hope that gets you enough
> additional netflow capacity for the interfaces you're collecting.
>
> http://www.gossamer-threads.com/lists/cisco/nsp/94953

Hopefully he is not trying to use netflow for accounting/billing.
I use:

mls sampling packet-based 1024 8192

As it is a convenient factor of ~1000 from the real numbers.
1Gbit/s of traffic shows up as 1Mbit/s. This has been accurate enough
for anything I have wanted to look at like per-AS traffic.
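
To make the "factor of ~1000" concrete, here is a small Python sketch of the
scaling arithmetic. It assumes the first argument (1024) is the 1-in-N packet
sampling rate and that sampled packets are representative of the whole
stream; the function names are hypothetical.

```python
# Assumed to be the 1-in-N rate from 'mls sampling packet-based 1024 8192'.
SAMPLING_RATE = 1024


def apparent_rate(actual_bps, sampling_rate=SAMPLING_RATE):
    """What the collector sees when only 1 in sampling_rate packets is exported."""
    return actual_bps / sampling_rate


def estimated_actual_rate(sampled_bps, sampling_rate=SAMPLING_RATE):
    """Scale a sampled figure back up to an estimate of the real traffic."""
    return sampled_bps * sampling_rate


print(apparent_rate(1_000_000_000))      # 1 Gbit/s of real traffic -> 976562.5
print(estimated_actual_rate(1_000_000))  # 1 Mbit/s as reported -> 1024000000
```

So 1 Gbit/s appears as roughly 0.98 Mbit/s, and reading the reported figures
as "thousandths" of the real rate is off by only about 2.4%.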

- Kevin


ross at kallisti

Nov 17, 2008, 11:11 AM

Post #7 of 8 (1876 views)
Re: Catalyst 6500 High Switch Proc [In reply to]

On Sat, Nov 15, 2008 at 04:35:28PM -0500, Philip L. wrote:
> One thing to note, is that our main ACL for ingress traffic is applied
> here due to historical reasons. It's roughly 5000 single host entries
> at present. We also use these devices for NDE.

On a Sup720-3BXL, if your ACL TCAM utilization is fine, this shouldn't
impact performance unless you're logging too much. Since you've been
over the CPU utilization doc, I'm guessing you know that.

"show platform hardware capacity acl" will give you a breakdown on
your ACL TCAM usage.

> I'm probably missing some other key details, but what could influence
> the SP like this? Any insight would be appreciated.

Cisco says that Netflow-based features always handle the first packet
of a flow in software, but I don't know if this is the RP or the SP.
It would make sense if a first-flow packet that didn't need punting
hit the SP and not the RP. In that case, your traffic level with
netflow enabled could explain your high SP utilization.

--
Ross Vandegrift
ross [at] kallisti

"If the fight gets hot, the songs get hotter. If the going gets tough,
the songs get tougher."
--Woody Guthrie


phil at mindfury

Nov 17, 2008, 6:34 PM

Post #8 of 8 (1869 views)
Re: Catalyst 6500 High Switch Proc [In reply to]

Ross Vandegrift wrote:
> On Sat, Nov 15, 2008 at 04:35:28PM -0500, Philip L. wrote:
>
>> One thing to note, is that our main ACL for ingress traffic is applied
>> here due to historical reasons. It's roughly 5000 single host entries
>> at present. We also use these devices for NDE.
>>
>
> On a Sup720-3BXL, if your ACL TCAM utilization is fine, this shouldn't
> impact performance unless you're logging too much. Since you've been
> over the CPU utilization doc, I'm guessing you know that.
>
> "show platform hardware capacity acl" will give you a breakdown on
> your ACL TCAM usage.
>
>
>> I'm probably missing some other key details, but what could influence
>> the SP like this? Any insight would be appreciated.
>>
>
> Cisco says that Netflow-based features always handle the first packet
> of a flow in software, but I don't know if this is the RP or the SP.
> It would make sense if a first-flow packet that didn't need punting
> hit the SP and not the RP. In that case, your traffic level with
> netflow enabled could explain your high SP utilization.
>
>
It is a Sup720-3BXL. Based on the suggestions here, I went ahead and
did 'no ip flow ingress' on all the interfaces just to see, and sure
enough, the SP went down to about 10-15%. My colleague implemented
packet count-based NetFlow sampling to attempt to reduce the 100%
NetFlow TCAM usage, and it appears to be partially effective. It still
fills up frequently, so we'll have to do some more tweaking.

I appreciate all the replies, public and private.

--
Philip L.
