
Mailing List Archive: nsp: juniper

DSCP-marked traffic mysteriously being dropped by MX960

 

 



jneiberger at gmail

Jul 20, 2012, 1:56 PM

Post #1 of 8 (566 views)
DSCP-marked traffic mysteriously being dropped by MX960

We've been troubleshooting a strange problem for a few days. JTAC is
on the case, too, but we have not found any resolution. I thought
maybe picking some minds here would be helpful. Here is a simplified
diagram:

[Device A] ------- [Router A] ------- [Router B] ------- [Router C] ------- [Device B]

The problem is that packets from Device B to Device A are being
dropped at Router A. Routers A and C are MX960s. Router B is a CRS.
Router C has an ingress firewall filter that does nothing but mark
traffic as cs2. Router A has an egress firewall filter toward Device
A, but it specifically allows the source IP address of Device B as
well as any traffic marked as cs2.

Here is where it really gets weird. If we remove the filter on Router
C that marks the traffic, everything starts working. Put the filter
back in place and the traffic stops. We've been looking at this for a
couple of days and JTAC has spent a few hours looking at it and we're
still no closer to figuring out why cs2 traffic is being dropped. With
the filter in place, traceroutes from Device B to A stop at Router A.
Remove the marking filter and traceroutes complete and pings start
succeeding.

Can any of you think of a potential culprit that we're not seeing? I
would hope that if this were something obvious, JTAC would have caught
it by now. We're all stumped.

Thanks!
John
_______________________________________________
juniper-nsp mailing list juniper-nsp [at] puck
https://puck.nether.net/mailman/listinfo/juniper-nsp


ObrienH at missouri

Jul 20, 2012, 2:04 PM

Post #2 of 8 (540 views)
Re: DSCP-marked traffic mysteriously being dropped by MX960 [In reply to]

Have you captured traffic before and after to validate the marking?
Relevant config bits would help.
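
When validating the marking in a capture, it helps to remember that DSCP is the upper six bits of the IPv4 ToS (or IPv6 Traffic Class) byte, so cs2 (DSCP 16) shows up as 0x40 when the ECN bits are zero. A minimal sketch for decoding the byte by hand; the function names are made up for illustration and are not part of any capture tool:

```python
# DSCP occupies the upper six bits of the ToS / Traffic Class byte;
# the lower two bits are ECN.
CS2_DSCP = 16  # class selector 2, binary 010000


def dscp_from_tos(tos_byte: int) -> int:
    """Return the 6-bit DSCP value carried in a ToS/Traffic Class byte."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("ToS byte must be in 0..255")
    return tos_byte >> 2


def tos_from_dscp(dscp: int, ecn: int = 0) -> int:
    """Build the ToS/Traffic Class byte from a DSCP value and ECN bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must be in 0..3")
    return (dscp << 2) | ecn


def is_cs2(tos_byte: int) -> bool:
    """True if the byte carries DSCP cs2, regardless of the ECN bits."""
    return dscp_from_tos(tos_byte) == CS2_DSCP


if __name__ == "__main__":
    print(hex(tos_from_dscp(CS2_DSCP)))
    print(is_cs2(0x40), is_cs2(0x00))
```

Comparing the byte seen at Device B's side with the byte seen at Device A's side would show whether the marking survives the path.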



jneiberger at gmail

Jul 20, 2012, 2:15 PM

Post #3 of 8 (541 views)
Re: DSCP-marked traffic mysteriously being dropped by MX960 [In reply to]

We have packet captures going at the endpoints, but not in between,
unfortunately. It would be nice if we had a sniffer at that location
so we could mirror the ports and get some data there.

The inbound filter on Router C looks like this:

    term netmgmt {
        then {
            count fec-cs2;
            loss-priority high;
            forwarding-class MNGMT;
        }
    }

Elsewhere, MNGMT is defined as cs2.

The outbound filter on Router A toward the end device is very long,
but the first two terms should allow ICMP and traceroute from the
address space that Device B is in. Further down in the filter is
another term that specifically permits CS2, among other things. We
know that that filter is working because when we remove the ingress
marking filter everything starts working, which indicates that the
outbound filter on Router A is correctly matching the source IP
address of Device B. I'd like to post more of the config, but I'm
trying to keep it sanitized and anonymous. :) I don't want to
irritate any bosses by posting our configs publicly.

Thanks,
John



wayne at tuckerlabs

Jul 20, 2012, 2:49 PM

Post #4 of 8 (542 views)
Re: DSCP-marked traffic mysteriously being dropped by MX960 [In reply to]

Does show interfaces <blah> extensive on the interface between Router A and
Device A show any drops? IIRC, the default scheduler map does not define
schedulers for anything other than be and nc - so if you're classifying the
packets on input then it could be that they're going to a class that has no
resources on the egress interface.

:w




jneiberger at gmail

Jul 20, 2012, 3:06 PM

Post #5 of 8 (545 views)
Re: DSCP-marked traffic mysteriously being dropped by MX960 [In reply to]

Someone else off-list just mentioned something similar. We're checking
into that now.

Thanks!
John



jneiberger at gmail

Jul 20, 2012, 9:39 PM

Post #6 of 8 (540 views)
Re: DSCP-marked traffic mysteriously being dropped by MX960 [In reply to]

On Fri, Jul 20, 2012 at 3:49 PM, Wayne Tucker <wayne [at] tuckerlabs> wrote:
> Does show interfaces <blah> extensive on the interface between Router A and
> Device A show any drops? IIRC, the default scheduler map does not define
> schedulers for anything other than be and nc - so if you're classifying the
> packets on input then it could be that they're going to a class that has no
> resources on the egress interface.
>
> :w

This is certainly what is happening. I checked and saw that we're
seeing output drops in queue 1. Based on the reading I did tonight,
the default is for 95% of the bandwidth to be assigned to best effort
in queue 0 and 5% to be set aside for network control in queue 3,
which leaves nothing for queues 1 and 2. The fact that we're seeing
all those drops in queue 1 pretty much proves the issue. We have some
groups configured that have the right scheduler map on them. I just
need to determine exactly which group is the right one and apply it
to the right interfaces.
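
To make the arithmetic concrete, here is a rough sketch of the default split described above (95% to queue 0, 5% to queue 3). The 1 Gbit/s port speed and the function names are assumptions for illustration only, and real hardware may still let a zero-share queue borrow unused bandwidth, so treat this as a simplified model:

```python
# Default transmit-rate shares as described in this thread:
# queue 0 (best effort) gets 95%, queue 3 (network control) gets 5%,
# and queues 1 and 2 get no guaranteed share.
DEFAULT_TRANSMIT_PERCENT = {0: 95, 1: 0, 2: 0, 3: 5}


def guaranteed_bps(queue, port_bps, shares=DEFAULT_TRANSMIT_PERCENT):
    """Guaranteed bandwidth for a queue, in bits per second."""
    return port_bps * shares.get(queue, 0) // 100


def starved_queues(shares=DEFAULT_TRANSMIT_PERCENT):
    """Queues that have no guaranteed transmit rate at all."""
    return sorted(q for q, pct in shares.items() if pct == 0)


if __name__ == "__main__":
    port_bps = 1_000_000_000  # assumed 1 Gbit/s port, not taken from the thread
    for queue in sorted(DEFAULT_TRANSMIT_PERCENT):
        print(queue, guaranteed_bps(queue, port_bps))
    print("no guaranteed bandwidth:", starved_queues())
```

Traffic classified into queue 1 lands in one of the starved queues, which matches the drops seen there.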

I haven't had a chance to apply the fix yet, and all of the people who
have access to the end devices for testing are gone for the weekend,
but I wanted to thank everyone for the help on this. I'm pretty new to
Juniper and I (and everyone else looking at this, including JTAC) were
stumped.


Thanks again,
John


jneiberger at gmail

Jul 23, 2012, 8:02 AM

Post #7 of 8 (513 views)
Re: DSCP-marked traffic mysteriously being dropped by MX960 [In reply to]

Well, we have applied the scheduler map to the interface but we're
still seeing 100% drops in queue 1, which is where CS2 is hitting. It
is literally dropping every packet in queue 1, but I don't understand
enough about what I'm seeing to understand why.

Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0 HSD, BASIC-D              4775                 4775                    0
    1 MNGMT, VOIP-              4913                    0                 4913
    2 UET, CDN, VO                 0                    0                    0
    3 VOIP-BEARER,                81                   81                    0
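
For anyone who wants to watch these counters over time, a small sketch that parses the rows above and reports the drop percentage per queue. The regular expression and the names are assumptions based only on the four rows pasted here, not on a documented output format:

```python
import re
from typing import NamedTuple


class QueueCounters(NamedTuple):
    queue: int
    classes: str       # forwarding-class names as displayed (may be truncated)
    queued: int
    transmitted: int
    dropped: int

    @property
    def drop_percent(self) -> float:
        return 100.0 * self.dropped / self.queued if self.queued else 0.0


# queue number, class names, then three integer counters
ROW = re.compile(r"^\s*(\d+)\s+(.*?)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")

SAMPLE = """\
Queue counters: Queued packets Transmitted packets Dropped packets
0 HSD, BASIC-D 4775 4775 0
1 MNGMT, VOIP- 4913 0 4913
2 UET, CDN, VO 0 0 0
3 VOIP-BEARER, 81 81 0
"""


def parse_queue_counters(text):
    """Return one QueueCounters per data row; the header line is skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append(QueueCounters(int(m[1]), m[2], int(m[3]),
                                      int(m[4]), int(m[5])))
    return rows


def fully_dropped(rows):
    """Queues that received packets and dropped every one of them."""
    return [r.queue for r in rows if r.queued and r.dropped == r.queued]


if __name__ == "__main__":
    rows = parse_queue_counters(SAMPLE)
    for r in rows:
        print(r.queue, r.classes, f"{r.drop_percent:.1f}%")
    print("fully dropped:", fully_dropped(rows))
```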


show configuration class-of-service interfaces ge-2/2/0
apply-groups CRAN-P2P-COS;

show configuration groups CRAN-P2P-COS
class-of-service {
    interfaces {
        <*> {
            scheduler-map QOS-MAP;
            unit 0 {
                classifiers {
                    dscp DSCPV4-CLASSIFIER;
                    dscp-ipv6 DSCPV6-CLASSIFIER;
                    exp EXP-CLASSIFIER;
                }
                rewrite-rules {
                    dscp DSCPV4-REWRITE;
                    dscp-ipv6 DSCPV6-REWRITE;
                    exp EXP-REWRITE;
                }
            }
        }
    }
}

show class-of-service interface ge-2/2/0
Physical interface: ge-2/2/0, Index: 270
Queues supported: 8, Queues in use: 4
  Scheduler map: QOS-MAP, Index: 26435

  Logical interface: ge-2/2/0.0, Index: 205
    Object        Name                 Type         Index
    Rewrite       DSCPV4-REWRITE       dscp         39698
    Rewrite       DSCPV6-REWRITE       dscp-ipv6     6938
    Classifier    DSCPV4-CLASSIFIER    dscp          7318
    Classifier    DSCPV6-CLASSIFIER    dscp-ipv6    40094

show class-of-service scheduler-map QOS-MAP
Scheduler map: QOS-MAP, Index: 26435

  Scheduler: TRAFFIC-CLASS-1-SCHEDULER, Forwarding class: BASIC-DATA, Index: 34325
    Transmit rate: 20 percent, Rate Limit: none, Buffer size: 50000 us, Priority: low
    Excess Priority: unspecified
    Drop profiles:
      Loss priority   Protocol    Index    Name
      Low             any         41108    DROP-LOW
      Medium low      any             1    <default-drop-profile>
      Medium high     any             1    <default-drop-profile>
      High            any          2270    DROP-HIGH

  Scheduler: TRAFFIC-CLASS-2-SCHEDULER, Forwarding class: PRIORITY-DATA, Index: 34329
    Transmit rate: 30 percent, Rate Limit: none, Buffer size: 20000 us, Priority: low
    Excess Priority: unspecified
    Drop profiles:
      Loss priority   Protocol    Index    Name
      Low             any         41108    DROP-LOW
      Medium low      any             1    <default-drop-profile>
      Medium high     any             1    <default-drop-profile>
      High            any          2270    DROP-HIGH

  Scheduler: TRAFFIC-CLASS-3-SCHEDULER, Forwarding class: VOD, Index: 34333
    Transmit rate: 45 percent, Rate Limit: none, Buffer size: 10000 us, Priority: low
    Excess Priority: unspecified
    Drop profiles:
      Loss priority   Protocol    Index    Name
      Low             any             1    <default-drop-profile>
      Medium low      any             1    <default-drop-profile>
      Medium high     any             1    <default-drop-profile>
      High            any          2270    DROP-HIGH

  Scheduler: TRAFFIC-CLASS-4-SCHEDULER, Forwarding class: PREMIUM-DATA, Index: 34305
    Transmit rate: unspecified, Rate Limit: none, Buffer size: 35000 us, Priority: strict-high
    Excess Priority: unspecified
    Drop profiles:
      Loss priority   Protocol    Index    Name
      Low             any             1    <default-drop-profile>
      Medium low      any             1    <default-drop-profile>
      Medium high     any             1    <default-drop-profile>
      High            any             1    <default-drop-profile>


TRAFFIC-CLASS-1-SCHEDULER {
    transmit-rate percent 20;
    buffer-size temporal 50k;
    priority low;
    drop-profile-map loss-priority low protocol any drop-profile DROP-LOW;
    drop-profile-map loss-priority high protocol any drop-profile DROP-HIGH;
}
TRAFFIC-CLASS-2-SCHEDULER {
    transmit-rate percent 30;
    buffer-size temporal 20k;
    priority low;
    drop-profile-map loss-priority low protocol any drop-profile DROP-LOW;
    drop-profile-map loss-priority high protocol any drop-profile DROP-HIGH;
}
TRAFFIC-CLASS-3-SCHEDULER {
    transmit-rate percent 45;
    buffer-size temporal 10k;
    priority low;
    drop-profile-map loss-priority high protocol any drop-profile DROP-HIGH;
}
TRAFFIC-CLASS-4-SCHEDULER {
    buffer-size temporal 35k;
    priority strict-high;
}
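
One thing the scheduler-map output above may hint at, though nothing in this thread confirms it: QOS-MAP lists schedulers for BASIC-DATA, PRIORITY-DATA, VOD and PREMIUM-DATA, and MNGMT is not among them. A rough sketch of that cross-check follows. The regular expression is an assumption based on the pasted text, and the helper names are invented for illustration:

```python
import re

# Excerpt of the "show class-of-service scheduler-map QOS-MAP" output
# pasted above, including the wrapped lines as they appeared in the mail.
SCHEDULER_MAP_OUTPUT = """\
Scheduler: TRAFFIC-CLASS-1-SCHEDULER, Forwarding class: BASIC-DATA,
Index: 34325
Scheduler: TRAFFIC-CLASS-2-SCHEDULER, Forwarding class:
PRIORITY-DATA, Index: 34329
Scheduler: TRAFFIC-CLASS-3-SCHEDULER, Forwarding class: VOD, Index: 34333
Scheduler: TRAFFIC-CLASS-4-SCHEDULER, Forwarding class:
PREMIUM-DATA, Index: 34305
"""


def scheduled_classes(output):
    """Forwarding classes that have a scheduler in the pasted output."""
    # \s+ also matches the newline where the mail client wrapped the line.
    return set(re.findall(r"Forwarding class:\s+([A-Za-z0-9_-]+)", output))


def unscheduled(classes_in_use, output=SCHEDULER_MAP_OUTPUT):
    """Classes in use that have no scheduler listed in the output."""
    return sorted(set(classes_in_use) - scheduled_classes(output))


if __name__ == "__main__":
    # MNGMT is the class set by the marking filter on Router C.
    print(unscheduled(["MNGMT", "BASIC-DATA"]))
```

If MNGMT really has no scheduler in the map, applying QOS-MAP to the interface would not by itself give queue 1 any resources for that class.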




wayne at tuckerlabs

Jul 23, 2012, 9:53 AM

Post #8 of 8 (515 views)
Re: DSCP-marked traffic mysteriously being dropped by MX960 [In reply to]

On Mon, Jul 23, 2012 at 8:02 AM, John Neiberger <jneiberger [at] gmail> wrote:
>
> Well, we have applied the scheduler map to the interface but we're
> still seeing 100% drops in queue 1, which is where CS2 is hitting. It
> is literally dropping every packet in queue 1, but I don't understand
> enough about what I'm seeing to understand why.
>
> Queue counters: Queued packets Transmitted packets Dropped packets
> 0 HSD, BASIC-D 4775 4775 0
> 1 MNGMT, VOIP- 4913 0 4913
> 2 UET, CDN, VO 0 0 0
> 3 VOIP-BEARER, 81 81 0

It looks like you're mapping multiple classes to the same queue. I
try to avoid doing that - from what I recall it comes with an added
requirement that all forwarding classes in that queue use the same
scheduler.

I noticed in your previous output that you're setting the
loss-priority for the management traffic to high - are you sure you
want to do that? It's a little counter-intuitive, but loss-priority
high is the traffic you want to drop before low (or medium, etc.).
Depending on the drop profile used it's pretty much guaranteed to not
get any resources in cases like this where there is no applicable
scheduler (the aggressive settings keep the packets from even getting
into a queue). In fact, it's probably why you're seeing these drops
in the first place - the drop profile for loss-priority low is usually
liberal enough that the packets still get queued and forwarded.
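
To illustrate the loss-priority point, here is a simplified model of a drop profile as a set of (fill level %, drop probability %) points with linear interpolation between them. The two profiles below are hypothetical; the actual contents of DROP-LOW and DROP-HIGH were not posted in this thread:

```python
def drop_probability(profile, fill_percent):
    """Interpolate drop probability (%) for a given queue fill level (%).

    profile is a list of (fill %, drop %) points.
    """
    points = sorted(profile)
    if fill_percent <= points[0][0]:
        return float(points[0][1])
    for (f0, p0), (f1, p1) in zip(points, points[1:]):
        if fill_percent <= f1:
            return p0 + (p1 - p0) * (fill_percent - f0) / (f1 - f0)
    # Past the last point: hold the final drop probability.
    return float(points[-1][1])


# Hypothetical profiles, for illustration only.
LIBERAL_LOW = [(0, 0), (80, 0), (100, 100)]   # no drops until the queue is 80% full
AGGRESSIVE_HIGH = [(0, 0), (25, 100)]         # drops everything from 25% fill onward

if __name__ == "__main__":
    for fill in (10, 50, 90):
        print(fill,
              drop_probability(LIBERAL_LOW, fill),
              drop_probability(AGGRESSIVE_HIGH, fill))
```

With profiles shaped like these, a half-full queue passes all loss-priority low traffic and drops all loss-priority high traffic, which is the behavior described above.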

> classifiers {
> dscp DSCPV4-CLASSIFIER;

Can you paste the config for DSCPV4-CLASSIFIER?

:w
