Errors from MLX card

jethro.binks at strath

Feb 8, 2011, 1:20 PM

Post #1 of 3
Errors from MLX card

Hi all,

My usual friendly Brocade engineer is unavailable just now, so can I run
this one past you for a view.

Brand new MLX chassis and line cards; I've just started putting this network
together. We are seeing OSPF instability between one box and its two peers
-- the three form a triangle, and both peers are currently connected via the
same card. This message appears regularly in the box's logs:

Feb 8 21:14:37:A:System: Health Monitoring: TM Egress data errors detected on LP 1/TM 0
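
Because the alert has a fixed shape, how often it fires, and on which LP/TM, can be tallied from a syslog export. A minimal Python sketch follows. The regex is an assumption built only from the single line quoted above, not from Brocade documentation, and `parse_tm_alert` and `count_tm_alerts` are hypothetical helper names.

```python
import re
from collections import Counter

# Pattern written against the one alert line quoted above. The direction word
# is matched generically because only the "Egress" form has been seen here.
TM_ALERT = re.compile(
    r"Health Monitoring: TM (\w+) data errors detected on LP (\d+)/TM (\d+)"
)

def parse_tm_alert(line):
    """Return (direction, lp_slot, tm_index) for a TM data-error alert, or None."""
    m = TM_ALERT.search(line)
    if m is None:
        return None
    direction, slot, tm = m.groups()
    return direction, int(slot), int(tm)

def count_tm_alerts(lines):
    """Count alerts per (direction, LP slot, TM) so a single repeating source stands out."""
    return Counter(a for a in map(parse_tm_alert, lines) if a is not None)

line = "Feb 8 21:14:37:A:System: Health Monitoring: TM Egress data errors detected on LP 1/TM 0"
print(parse_tm_alert(line))  # ('Egress', 1, 0)
```

If every alert maps to the same LP/TM pair, that points at one line card rather than the chassis as a whole.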

Taking a leap of faith, I used rconsole to reach LP slot 1 and ran "sh tm stats
all-counters 0":

Ingress Counters:
...
Egress Counters:
EGQ EnQue Pkt Count: 82632384114
EGQ EnQue Byte Count: 15803851648864
EGQ Discard Pkt Count: 0
EGQ Discard Byte Count: 0
EGQ Segment Error Count: 45156
EGQ Fragment Error Count: 1857249
Port63 Error Pkt Count: 0
Pkt Header Error Pkt Count: 0
Pkt Lost Due to Buffer Full Pkt Count: 0
Reassem Err Discard Pkt Count: 301089
Reassem Err Discard Fragment(32B) Count: 750175
TDM_A Lost Pkt Count: 0
TDM_B Lost Pkt Count: 0

Programmable Egress Counters:
[.Port Id for Enque: 0 (Disable), Port Id for Discard: 0 (Disable)]
EGQ EnQue Pkt Count: 20659499964
EGQ EnQue Byte Count: 3951348364608
EGQ Discard Pkt Count: 0
EGQ Discard Byte Count: 0

and I guess those errors on the Egress Counters are not good.
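
To pull the non-zero error counters out of that output and put them on a rough scale, here is a minimal Python sketch that works on pasted text. It assumes every counter is a `Name: value` line and picks "error-like" counters by keyword, which is an assumption rather than a Brocade definition; the helper names are hypothetical. The output itself does not say over what period the counters have accumulated.

```python
# The Egress Counters section as pasted above. The Programmable Egress Counters
# section is left out because it reuses some of the same counter names.
EGRESS = """\
EGQ EnQue Pkt Count: 82632384114
EGQ EnQue Byte Count: 15803851648864
EGQ Discard Pkt Count: 0
EGQ Discard Byte Count: 0
EGQ Segment Error Count: 45156
EGQ Fragment Error Count: 1857249
Port63 Error Pkt Count: 0
Pkt Header Error Pkt Count: 0
Pkt Lost Due to Buffer Full Pkt Count: 0
Reassem Err Discard Pkt Count: 301089
Reassem Err Discard Fragment(32B) Count: 750175
TDM_A Lost Pkt Count: 0
TDM_B Lost Pkt Count: 0
"""

def parse_counters(text):
    """Parse 'Name: value' lines into {name: int}; lines without a plain integer value are skipped."""
    counters = {}
    for raw in text.splitlines():
        name, sep, value = raw.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters

def nonzero_errors(counters, keywords=("Err", "Discard", "Lost")):
    """Keep counters whose name looks error-related (keyword match) and whose value is non-zero."""
    return {name: n for name, n in counters.items()
            if n and any(k in name for k in keywords)}

counters = parse_counters(EGRESS)
for name, n in nonzero_errors(counters).items():
    print(f"{name}: {n}")

# Rough scale only: assumes reassembly discards are counted in the same
# packet units as EGQ EnQue Pkt Count.
ratio = counters["Reassem Err Discard Pkt Count"] / counters["EGQ EnQue Pkt Count"]
print(f"{ratio:.2e}")  # 3.64e-06
```

The ratio only gives a sense of scale (a few reassembly discards per million enqueued packets); it says nothing about the cause.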

What might this indicate? A hardware fault on the slot or module? Hardware
info is appended for anyone interested; thanks for any comments.


. . . . . . . . . . . . . . . . . . . . . . . . .
Jethro R Binks, Network Manager,
Information Services Directorate, University Of Strathclyde, Glasgow, UK

The University of Strathclyde is a charitable body, registered in
Scotland, number SC015263.



SL 1: NI-MLX-10Gx4 4-port 10GbE Module (Serial #: N12623F1NS, Part #: 35600-203H)
Boot : Version 5.0.0T175 Copyright (c) 1996-2009 Brocade Communications Systems, Inc.
Compiled on Apr 19 2010 at 17:27:52 labeled as xmlb05000
(486524 bytes) from boot flash
Monitor : Version 5.0.0T175 Copyright (c) 1996-2009 Brocade Communications Systems, Inc.
Compiled on Apr 19 2010 at 17:27:32 labeled as xmlprm05000
(486034 bytes) from code flash
IronWare : Version 5.0.0cT177 Copyright (c) 1996-2009 Brocade Communications Systems, Inc.
Compiled on Aug 17 2010 at 13:31:04 labeled as xmlp05000c
(4838219 bytes) from Primary
FPGA versions:
Valid PBIF Version = 3.21, Build Time = 11/11/2009 13:57:00

Valid XPP Version = 6.03, Build Time = 1/28/2010 8:17:00

Valid XGMAC Version = 0.12, Build Time = 11/10/2008 15:50:00

X10G2MAC 0
X10G2MAC 1
666 MHz MPC 8541 (version 8020/0020) 333 MHz bus
512 KB Boot Flash (MX29LV040C), 16 MB Code Flash (MT28F640J3)
512 MB DRAM, 8 KB SRAM, 0 Bytes BRAM
PPCR0: 768K entries CAM, 8192K PRAM, 2048K AGE RAM
PPCR1: 768K entries CAM, 8192K PRAM, 2048K AGE RAM
LP Slot 1 uptime is 43 days 20 hours 34 minutes 32 seconds


_______________________________________________
foundry-nsp mailing list
foundry-nsp [at] puck
http://puck.nether.net/mailman/listinfo/foundry-nsp


tomeks at man

Feb 10, 2011, 12:12 AM

Post #2 of 3
Re: Errors from MLX card

Hi,

I remember a similar case. We had to replace the module (LP). However, try
reloading the module first. In our case, TAC's answer was that they
suspected a hardware failure. But I also remember a case where such errors
disappeared for a "while" after a module power-off/on.

Tomek




jethro.binks at strath

Mar 2, 2011, 4:00 AM

Post #3 of 3
Re: Errors from MLX card

On Thu, 10 Feb 2011, Tomasz Szewczyk wrote:

> I remember a similar case. We had to replace the module (LP). However, try
> reloading the module first. In our case, TAC's answer was that they
> suspected a hardware failure. But I also remember a case where such errors
> disappeared for a "while" after a module power-off/on.

For what it's worth, reloading the box made the problem go away, and it
hasn't returned.
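
One way to back up "it hasn't returned" is to take two readings of the TM egress counters some time apart after the reload and check that none of the error counters moved. Comparing against a pre-reload reading may not mean much if the reload reset the counters. A minimal Python sketch of that comparison follows, assuming each reading has already been turned into a name -> value mapping; the keyword list and `growing_error_counters` are hypothetical.

```python
# Error-like counter names are picked by keyword; this is an assumption, not a
# list taken from Brocade documentation.
ERROR_KEYWORDS = ("Err", "Discard", "Lost")

def growing_error_counters(before, after):
    """Return {name: increase} for error-like counters that grew between two snapshots.

    Both arguments map counter name -> cumulative value, e.g. two readings of
    the Egress Counters taken some minutes apart. A counter that went down
    (for example because it was reset) is ignored rather than reported.
    """
    return {
        name: after[name] - before[name]
        for name in sorted(before.keys() & after.keys())
        if any(k in name for k in ERROR_KEYWORDS) and after[name] > before[name]
    }

# Made-up values for illustration only.
before = {"EGQ EnQue Pkt Count": 1_000_000, "EGQ Fragment Error Count": 12}
after = {"EGQ EnQue Pkt Count": 2_500_000, "EGQ Fragment Error Count": 12}
print(growing_error_counters(before, after))  # {}
```

An empty result while the EnQue counters keep climbing is consistent with traffic flowing without new egress errors.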

Thanks for the comments,

Jethro.


