Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

[PATCH 1/4] veth: move loopback logic to common location

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


arnd at arndb

Nov 23, 2009, 4:56 PM

Post #1 of 17 (416 views)
Permalink
[PATCH 1/4] veth: move loopback logic to common location

The veth driver contains code to forward an skb
from the start_xmit function of one network
device into the receive path of another device.

Moving that code into a common location lets us
reuse the code for direct forwarding of data
between macvlan ports, and possibly in other
drivers.

Signed-off-by: Arnd Bergmann <arnd [at] arndb>
---
drivers/net/veth.c | 17 +++--------------
include/linux/netdevice.h | 2 ++
net/core/dev.c | 36 ++++++++++++++++++++++++++++++++++++
3 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 2d657f2..6c4b5a2 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -155,8 +155,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
struct veth_net_stats *stats, *rcv_stats;
int length, cpu;

- skb_orphan(skb);
-
priv = netdev_priv(dev);
rcv = priv->peer;
rcv_priv = netdev_priv(rcv);
@@ -168,20 +166,12 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
if (!(rcv->flags & IFF_UP))
goto tx_drop;

- if (skb->len > (rcv->mtu + MTU_PAD))
- goto rx_drop;
-
- skb->tstamp.tv64 = 0;
- skb->pkt_type = PACKET_HOST;
- skb->protocol = eth_type_trans(skb, rcv);
if (dev->features & NETIF_F_NO_CSUM)
skb->ip_summed = rcv_priv->ip_summed;

- skb->mark = 0;
- secpath_reset(skb);
- nf_reset(skb);
-
- length = skb->len;
+ length = skb->len + ETH_HLEN;
+ if (dev_forward_skb(rcv, skb) != NET_RX_SUCCESS)
+ goto rx_drop;

stats->tx_bytes += length;
stats->tx_packets++;
@@ -189,7 +179,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
rcv_stats->rx_bytes += length;
rcv_stats->rx_packets++;

- netif_rx(skb);
return NETDEV_TX_OK;

tx_drop:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 97873e3..9428793 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1562,6 +1562,8 @@ extern int dev_set_mac_address(struct net_device *,
extern int dev_hard_start_xmit(struct sk_buff *skb,
struct net_device *dev,
struct netdev_queue *txq);
+extern int dev_forward_skb(struct net_device *dev,
+ struct sk_buff *skb);

extern int netdev_budget;

diff --git a/net/core/dev.c b/net/core/dev.c
index ccefa24..ba18e82 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -105,6 +105,7 @@
#include <net/dst.h>
#include <net/pkt_sched.h>
#include <net/checksum.h>
+#include <net/xfrm.h>
#include <linux/highmem.h>
#include <linux/init.h>
#include <linux/kmod.h>
@@ -1419,6 +1420,41 @@ static inline void net_timestamp(struct sk_buff *skb)
skb->tstamp.tv64 = 0;
}

+/**
+ * dev_forward_skb - loopback an skb to another netif
+ *
+ * @dev: destination network device
+ * @skb: buffer to forward
+ *
+ * return values:
+ * NET_RX_SUCCESS (no congestion)
+ * NET_RX_DROP (packet was dropped)
+ *
+ * dev_forward_skb can be used for injecting an skb from the
+ * start_xmit function of one device into the receive queue
+ * of another device.
+ */
+int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
+{
+ skb_orphan(skb);
+
+ if (!(dev->flags & IFF_UP))
+ return NET_RX_DROP;
+
+ if (skb->len > (dev->mtu + dev->hard_header_len))
+ return NET_RX_DROP;
+
+ skb_dst_drop(skb);
+ skb->tstamp.tv64 = 0;
+ skb->pkt_type = PACKET_HOST;
+ skb->protocol = eth_type_trans(skb, dev);
+ skb->mark = 0;
+ secpath_reset(skb);
+ nf_reset(skb);
+ return netif_rx(skb);
+}
+EXPORT_SYMBOL_GPL(dev_forward_skb);
+
/*
* Support routine. Sends outgoing frames to any network
* taps currently in use.
--
1.6.3.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


kaber at trash

Nov 24, 2009, 1:51 AM

Post #2 of 17 (413 views)
Permalink
Re: [PATCH 1/4] veth: move loopback logic to common location [In reply to]

Arnd Bergmann wrote:
> +int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
> +{
> + skb_orphan(skb);
> +
> + if (!(dev->flags & IFF_UP))
> + return NET_RX_DROP;
> +
> + if (skb->len > (dev->mtu + dev->hard_header_len))
> + return NET_RX_DROP;
> +
> + skb_dst_drop(skb);
> + skb->tstamp.tv64 = 0;
> + skb->pkt_type = PACKET_HOST;
> + skb->protocol = eth_type_trans(skb, dev);
> + skb->mark = 0;

skb->mark clearing should stay private to veth since its usually
supposed to stay intact. The only exception is packets crossing
namespaces, where they should appear like a freshly received skbs.

> + secpath_reset(skb);
> + nf_reset(skb);
> + return netif_rx(skb);
> +}
> +EXPORT_SYMBOL_GPL(dev_forward_skb);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


arnd at arndb

Nov 24, 2009, 2:02 AM

Post #3 of 17 (412 views)
Permalink
Re: [PATCH 1/4] veth: move loopback logic to common location [In reply to]

On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote:
> > + skb_dst_drop(skb);
> > + skb->tstamp.tv64 = 0;
> > + skb->pkt_type = PACKET_HOST;
> > + skb->protocol = eth_type_trans(skb, dev);
> > + skb->mark = 0;
>
> skb->mark clearing should stay private to veth since its usually
> supposed to stay intact. The only exception is packets crossing
> namespaces, where they should appear like a freshly received skbs.

But isn't that what we want in macvlan as well when we're
forwarding from one downstream interface to another?

I did all my testing with macvlan interfaces in separate namespaces
communicating with each other, so I'd assume that we should always
clear skb->mark and skb->dst in this function. Maybe I should make
the documentation clearer?

---
net: clarify documentation of dev_forward_skb

Signed-off-by: Arnd Bergmann <arnd [at] arndb>

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1433,6 +1433,10 @@ static inline void net_timestamp(struct sk_buff *skb)
* dev_forward_skb can be used for injecting an skb from the
* start_xmit function of one device into the receive queue
* of another device.
+ *
+ * The receiving device may be in another namespace, so
+ * we have to clear all information in the skb that could
+ * impact namespace isolation.
*/
int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
{
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


kaber at trash

Nov 24, 2009, 2:17 AM

Post #4 of 17 (412 views)
Permalink
Re: [PATCH 1/4] veth: move loopback logic to common location [In reply to]

Arnd Bergmann wrote:
> On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote:
>>> + skb_dst_drop(skb);
>>> + skb->tstamp.tv64 = 0;
>>> + skb->pkt_type = PACKET_HOST;
>>> + skb->protocol = eth_type_trans(skb, dev);
>>> + skb->mark = 0;
>> skb->mark clearing should stay private to veth since its usually
>> supposed to stay intact. The only exception is packets crossing
>> namespaces, where they should appear like a freshly received skbs.
>
> But isn't that what we want in macvlan as well when we're
> forwarding from one downstream interface to another?

In the TX direction you can use the mark for TC classification
on the underlying device.

> I did all my testing with macvlan interfaces in separate namespaces
> communicating with each other, so I'd assume that we should always
> clear skb->mark and skb->dst in this function.

Good point, in that case we probably should clear it as well. But
in the non-namespace case the TC classification currently works and
this is consistent with any other virtual device driver, so it
should continue to work.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


arnd at arndb

Nov 24, 2009, 2:34 AM

Post #5 of 17 (409 views)
Permalink
Re: [PATCH 1/4] veth: move loopback logic to common location [In reply to]

On Tuesday 24 November 2009 10:17:11 Patrick McHardy wrote:
> Arnd Bergmann wrote:
> > On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote:
> >>> + skb_dst_drop(skb);
> >>> + skb->tstamp.tv64 = 0;
> >>> + skb->pkt_type = PACKET_HOST;
> >>> + skb->protocol = eth_type_trans(skb, dev);
> >>> + skb->mark = 0;
> >> skb->mark clearing should stay private to veth since its usually
> >> supposed to stay intact. The only exception is packets crossing
> >> namespaces, where they should appear like a freshly received skbs.
> >
> > But isn't that what we want in macvlan as well when we're
> > forwarding from one downstream interface to another?
>
> In the TX direction you can use the mark for TC classification
> on the underlying device.

I don't use dev_forward_skb for the case where the data is sent
to the underlying device, so the TC classification should stay
intact.

> > I did all my testing with macvlan interfaces in separate namespaces
> > communicating with each other, so I'd assume that we should always
> > clear skb->mark and skb->dst in this function.
>
> Good point, in that case we probably should clear it as well. But
> in the non-namespace case the TC classification currently works and
> this is consistent with any other virtual device driver, so it
> should continue to work.

Do you think we should be able to use TC to direct traffic between
macvlans on the same underlying device in bridge mode? It does sound
useful, but I'm not sure how to implement that or if you'd expect
it to work with the current code. If we support that, it should probably
also work with namespaces, by consuming the mark in the macvlan
and veth drivers.

Arnd <><
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


kaber at trash

Nov 24, 2009, 2:40 AM

Post #6 of 17 (403 views)
Permalink
Re: [PATCH 1/4] veth: move loopback logic to common location [In reply to]

Arnd Bergmann wrote:
> On Tuesday 24 November 2009 10:17:11 Patrick McHardy wrote:
>> Arnd Bergmann wrote:
>>> On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote:
>>>>> + skb_dst_drop(skb);
>>>>> + skb->tstamp.tv64 = 0;
>>>>> + skb->pkt_type = PACKET_HOST;
>>>>> + skb->protocol = eth_type_trans(skb, dev);
>>>>> + skb->mark = 0;
>>>> skb->mark clearing should stay private to veth since its usually
>>>> supposed to stay intact. The only exception is packets crossing
>>>> namespaces, where they should appear like a freshly received skbs.
>>> But isn't that what we want in macvlan as well when we're
>>> forwarding from one downstream interface to another?
>> In the TX direction you can use the mark for TC classification
>> on the underlying device.
>
> I don't use dev_forward_skb for the case where the data is sent
> to the underlying device, so the TC classification should stay
> intact.

Right, I see. This looks fine.

>>> I did all my testing with macvlan interfaces in separate namespaces
>>> communicating with each other, so I'd assume that we should always
>>> clear skb->mark and skb->dst in this function.
>> Good point, in that case we probably should clear it as well. But
>> in the non-namespace case the TC classification currently works and
>> this is consistent with any other virtual device driver, so it
>> should continue to work.
>
> Do you think we should be able to use TC to direct traffic between
> macvlans on the same underlying device in bridge mode? It does sound
> useful, but I'm not sure how to implement that or if you'd expect
> it to work with the current code. If we support that, it should probably
> also work with namespaces, by consuming the mark in the macvlan
> and veth drivers.

I don't think its necessary, we bypass outgoing queuing anyways.
But if you'd want to add it, just keeping the skb->mark clearing
in veth should work from what I can tell.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


arnd at arndb

Nov 24, 2009, 5:13 AM

Post #7 of 17 (401 views)
Permalink
Re: [PATCH 1/4] veth: move loopback logic to common location [In reply to]

On Tuesday 24 November 2009, Patrick McHardy wrote:
> I don't think its necessary, we bypass outgoing queuing anyways.
> But if you'd want to add it, just keeping the skb->mark clearing
> in veth should work from what I can tell.

Ok, I won't bother with it for now then.

Arnd <><
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


ebiederm at xmission

Nov 24, 2009, 8:42 AM

Post #8 of 17 (394 views)
Permalink
Re: [PATCH 1/4] veth: move loopback logic to common location [In reply to]

Patrick McHardy <kaber [at] trash> writes:

>>>> I did all my testing with macvlan interfaces in separate namespaces
>>>> communicating with each other, so I'd assume that we should always
>>>> clear skb->mark and skb->dst in this function.
>>> Good point, in that case we probably should clear it as well. But
>>> in the non-namespace case the TC classification currently works and
>>> this is consistent with any other virtual device driver, so it
>>> should continue to work.
>>
>> Do you think we should be able to use TC to direct traffic between
>> macvlans on the same underlying device in bridge mode? It does sound
>> useful, but I'm not sure how to implement that or if you'd expect
>> it to work with the current code. If we support that, it should probably
>> also work with namespaces, by consuming the mark in the macvlan
>> and veth drivers.
>
> I don't think its necessary, we bypass outgoing queuing anyways.
> But if you'd want to add it, just keeping the skb->mark clearing
> in veth should work from what I can tell.

veth doesn't have an outgoing queue. The reason we clear skb->mark
in veth is because when reentering the networking stack the packet
needs to be reclassified. At the point of loopback we are talking
a packet that has at least logically gone out of the machine on a
wire and come back into the machine on another physical interface.

So it seems to me we should have consistent handling for macvlans,
veth, for the cases where we are looping packets back around. In
practice I expect all of those cases are going to be cross namespace
as otherwise we would have intercepted the packet before going
out a physical interface.

Eric




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


kaber at trash

Nov 24, 2009, 8:56 AM

Post #9 of 17 (394 views)
Permalink
Re: [PATCH 1/4] veth: move loopback logic to common location [In reply to]

Eric W. Biederman wrote:
> Patrick McHardy <kaber [at] trash> writes:
>
>>>>> I did all my testing with macvlan interfaces in separate namespaces
>>>>> communicating with each other, so I'd assume that we should always
>>>>> clear skb->mark and skb->dst in this function.
>>>> Good point, in that case we probably should clear it as well. But
>>>> in the non-namespace case the TC classification currently works and
>>>> this is consistent with any other virtual device driver, so it
>>>> should continue to work.
>>> Do you think we should be able to use TC to direct traffic between
>>> macvlans on the same underlying device in bridge mode? It does sound
>>> useful, but I'm not sure how to implement that or if you'd expect
>>> it to work with the current code. If we support that, it should probably
>>> also work with namespaces, by consuming the mark in the macvlan
>>> and veth drivers.
>> I don't think its necessary, we bypass outgoing queuing anyways.
>> But if you'd want to add it, just keeping the skb->mark clearing
>> in veth should work from what I can tell.
>
> veth doesn't have an outgoing queue. The reason we clear skb->mark
> in veth is because when reentering the networking stack the packet
> needs to be reclassified. At the point of loopback we are talking
> a packet that has at least logically gone out of the machine on a
> wire and come back into the machine on another physical interface.
>
> So it seems to me we should have consistent handling for macvlans,
> veth, for the cases where we are looping packets back around. In
> practice I expect all of those cases are going to be cross namespace
> as otherwise we would have intercepted the packet before going
> out a physical interface.

Agreed on the looping case, that's what we're doing now.

In the layered case (macvlan -> eth0) its common behaviour to
keep the mark however. But in case of different namespaces,
I think macvlan should also clear the mark on the dev_queue_xmit()
path since this is just a shortcut to looping the packets
through veth. In fact probably both of them should also clear
skb->priority so other namespaces don't accidentally misclassify
packets.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


ebiederm at xmission

Nov 24, 2009, 10:10 AM

Post #10 of 17 (396 views)
Permalink
Re: [PATCH 1/4] veth: move loopback logic to common location [In reply to]

Patrick McHardy <kaber [at] trash> writes:

> Eric W. Biederman wrote:
>> Patrick McHardy <kaber [at] trash> writes:
>>
>>>>>> I did all my testing with macvlan interfaces in separate namespaces
>>>>>> communicating with each other, so I'd assume that we should always
>>>>>> clear skb->mark and skb->dst in this function.
>>>>> Good point, in that case we probably should clear it as well. But
>>>>> in the non-namespace case the TC classification currently works and
>>>>> this is consistent with any other virtual device driver, so it
>>>>> should continue to work.
>>>> Do you think we should be able to use TC to direct traffic between
>>>> macvlans on the same underlying device in bridge mode? It does sound
>>>> useful, but I'm not sure how to implement that or if you'd expect
>>>> it to work with the current code. If we support that, it should probably
>>>> also work with namespaces, by consuming the mark in the macvlan
>>>> and veth drivers.
>>> I don't think its necessary, we bypass outgoing queuing anyways.
>>> But if you'd want to add it, just keeping the skb->mark clearing
>>> in veth should work from what I can tell.
>>
>> veth doesn't have an outgoing queue. The reason we clear skb->mark
>> in veth is because when reentering the networking stack the packet
>> needs to be reclassified. At the point of loopback we are talking
>> a packet that has at least logically gone out of the machine on a
>> wire and come back into the machine on another physical interface.
>>
>> So it seems to me we should have consistent handling for macvlans,
>> veth, for the cases where we are looping packets back around. In
>> practice I expect all of those cases are going to be cross namespace
>> as otherwise we would have intercepted the packet before going
>> out a physical interface.
>
> Agreed on the looping case, that's what we're doing now.
>
> In the layered case (macvlan -> eth0) its common behaviour to
> keep the mark however. But in case of different namespaces,
> I think macvlan should also clear the mark on the dev_queue_xmit()
> path since this is just a shortcut to looping the packets
> through veth. In fact probably both of them should also clear
> skb->priority so other namespaces don't accidentally misclassify
> packets.

That is why I pushed for what is becoming dev_forward_skb. So that
we have one place where we can make all of those tweaks. It seems
like in every review we find another field that should be cleared/handled
specially.

I don't quite follow what you intend with dev_queue_xmit when the macvlan
is in one namespace and the real physical device is in another. Are
you mentioning that the packet classifier runs in the namespace where
the primary device lives with packets from a different namespace?

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


arnd at arndb

Nov 24, 2009, 10:28 AM

Post #11 of 17 (395 views)
Permalink
Re: [PATCH 1/4] veth: move loopback logic to common location [In reply to]

On Tuesday 24 November 2009, Eric W. Biederman wrote:
> I don't quite follow what you intend with dev_queue_xmit when the macvlan
> is in one namespace and the real physical device is in another. Are
> you mentioning that the packet classifier runs in the namespace where
> the primary device lives with packets from a different namespace?

I treat internal and external delivery very differently, the three
cases are:

1. skb from real device to macvlan (macvlan_handle_frame): basically
unchanged from before, except avoiding duplicate broadcasts. All
skbs end up in netif_rx(vlan->dev) without clearing any data.
We catch the frame in netif_receive_skb before it interacts with the
namespace of the real device.

2. skb to external device (macvlan_start_xmit): if the destination
is external, we just end up in dev_queue_xmit, with skb->dev set to
the external device but no other changes. The data is already on the
way out at this stage, so the namespace should not matter any more.

3. internal delivery: an skb from one macvlan to another gets always
sent through dev_forward_skb, which is supposed to clear anything
that must not leave the namespace.

Does this make sense?

Arnd <><
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


kaber at trash

Nov 24, 2009, 10:38 AM

Post #12 of 17 (392 views)
Permalink
Re: [PATCH 1/4] veth: move loopback logic to common location [In reply to]

Eric W. Biederman wrote:
> Patrick McHardy <kaber [at] trash> writes:
>
>> In the layered case (macvlan -> eth0) its common behaviour to
>> keep the mark however. But in case of different namespaces,
>> I think macvlan should also clear the mark on the dev_queue_xmit()
>> path since this is just a shortcut to looping the packets
>> through veth. In fact probably both of them should also clear
>> skb->priority so other namespaces don't accidentally misclassify
>> packets.
>
> That is why I pushed for what is becoming dev_forward_skb. So that
> we have one place where we can make all of those tweaks. It seems
> like in every review we find another field that should be cleared/handled
> specially.
>
> I don't quite follow what you intend with dev_queue_xmit when the macvlan
> is in one namespace and the real physical device is in another. Are
> you mentioning that the packet classifier runs in the namespace where
> the primary device lives with packets from a different namespace?

Exactly. And I think we should make sure that the namespace of
the macvlan device can't (deliberately or accidentally) cause
misclassification.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


arnd at arndb

Nov 26, 2009, 7:21 AM

Post #13 of 17 (396 views)
Permalink
Re: [PATCH 1/4] veth: move loopback logic to common location [In reply to]

On Tuesday 24 November 2009, Patrick McHardy wrote:
> Eric W. Biederman wrote:
> > I don't quite follow what you intend with dev_queue_xmit when the macvlan
> > is in one namespace and the real physical device is in another. Are
> > you mentioning that the packet classifier runs in the namespace where
> > the primary device lives with packets from a different namespace?
>
> Exactly. And I think we should make sure that the namespace of
> the macvlan device can't (deliberately or accidentally) cause
> misclassification.

This is independent of my series and a preexisting problem, right?

Which fields do you think need to be reset to maintain namespace
isolation for the outbound path in macvlan?
---
net: maintain namespace isolation between vlan and real device

In the vlan and macvlan drivers, the start_xmit function forwards
data to the dev_queue_xmit function for another device, which may
potentially belong to a different namespace.

To make sure that classification stays within a single namespace,
this resets the potentially critical fields.

Signed-off-by: Arnd Bergmann <arnd [at] arndb>

---
drivers/net/macvlan.c | 2 +-
include/linux/netdevice.h | 9 +++++++++
net/8021q/vlan_dev.c | 2 +-
net/core/dev.c | 26 ++++++++++++++++++++++----
4 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 322112c..edcebf1 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -269,7 +269,7 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
}

xmit_world:
- skb->dev = vlan->lowerdev;
+ skb_set_dev(skb, vlan->lowerdev);
return dev_queue_xmit(skb);
}

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9428793..fdf4a1a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1009,6 +1009,15 @@ static inline bool netdev_uses_dsa_tags(struct net_device *dev)
return 0;
}

+#ifdef CONFIG_NET_NS
+static inline void skb_set_dev(struct sk_buff *skb, struct net_device *dev)
+{
+ skb->dev = dev;
+}
+#else /* CONFIG_NET_NS */
+void skb_set_dev(struct sk_buff *skb, struct net_device *dev);
+#endif
+
static inline bool netdev_uses_trailer_tags(struct net_device *dev)
{
#ifdef CONFIG_NET_DSA_TAG_TRAILER
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index de0dc6b..51fcfff 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -323,7 +323,7 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb,
}


- skb->dev = vlan_dev_info(dev)->real_dev;
+ skb_set_dev(skb, vlan_dev_info(dev)->real_dev);
len = skb->len;
ret = dev_queue_xmit(skb);

diff --git a/net/core/dev.c b/net/core/dev.c
index f8baa15..7179b58 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1448,13 +1448,10 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
if (skb->len > (dev->mtu + dev->hard_header_len))
return NET_RX_DROP;

- skb_dst_drop(skb);
+ skb_set_dev(skb, dev);
skb->tstamp.tv64 = 0;
skb->pkt_type = PACKET_HOST;
skb->protocol = eth_type_trans(skb, dev);
- skb->mark = 0;
- secpath_reset(skb);
- nf_reset(skb);
return netif_rx(skb);
}
EXPORT_SYMBOL_GPL(dev_forward_skb);
@@ -1614,6 +1611,27 @@ static bool dev_can_checksum(struct net_device *dev, struct sk_buff *skb)
return false;
}

+/**
+ * skb_dev_set -- assign a buffer to a new device
+ * @skb: buffer for the new device
+ * @dev: network device
+ *
+ * If an skb is owned by a device already, we have to reset
+ * all data private to the namespace a device belongs to
+ * before assigning it a new device.
+ */
+void skb_set_dev(struct sk_buff *skb, struct net_device *dev)
+{
+ if (skb->dev && !net_eq(dev_net(skb->dev), dev_net(dev))) {
+ secpath_reset(skb);
+ skb_dst_drop(skb);
+ nf_reset(skb);
+ skb->mark = 0;
+ }
+ skb->dev = dev;
+}
+EXPORT_SYMBOL(skb_set_dev);
+
/*
* Invalidate hardware checksum when packet is to be mangled, and
* complete checksum manually on outgoing path.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


kaber at trash

Nov 26, 2009, 7:33 AM

Post #14 of 17 (395 views)
Permalink
Re: [PATCH 1/4] veth: move loopback logic to common location [In reply to]

Arnd Bergmann wrote:
> On Tuesday 24 November 2009, Patrick McHardy wrote:
>> Eric W. Biederman wrote:
>>> I don't quite follow what you intend with dev_queue_xmit when the macvlan
>>> is in one namespace and the real physical device is in another. Are
>>> you mentioning that the packet classifier runs in the namespace where
>>> the primary device lives with packets from a different namespace?
>> Exactly. And I think we should make sure that the namespace of
>> the macvlan device can't (deliberately or accidentally) cause
>> misclassification.
>
> This is independent of my series and a preexisting problem, right?

Correct.

> Which fields do you think need to be reset to maintain namespace
> isolation for the outbound path in macvlan?

In addition to those already handled, I'd say

- priority: affects qdisc classification, may refer to classes of the
old namespace
- ipvs_property: might cause packets to incorrectly skip netfilter hooks
- nf_trace: might trigger packet tracing
- nf_bridge: contains references to network devices in the old NS,
also indicates packet was bridged
- iif: index is only valid in the originating namespace
- tc_index: classification result, should only be set in the namespace
of the classifier
- tc_verd: RTTL etc. should begin at zero again
- probably secmark.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


ebiederm at xmission

Nov 26, 2009, 8:38 AM

Post #15 of 17 (392 views)
Permalink
Re: [PATCH 1/4] veth: move loopback logic to common location [In reply to]

Patrick McHardy <kaber [at] trash> writes:

> Arnd Bergmann wrote:
>> On Tuesday 24 November 2009, Patrick McHardy wrote:
>>> Eric W. Biederman wrote:
>>>> I don't quite follow what you intend with dev_queue_xmit when the macvlan
>>>> is in one namespace and the real physical device is in another. Are
>>>> you mentioning that the packet classifier runs in the namespace where
>>>> the primary device lives with packets from a different namespace?
>>> Exactly. And I think we should make sure that the namespace of
>>> the macvlan device can't (deliberately or accidentally) cause
>>> misclassification.
>>
>> This is independent of my series and a preexisting problem, right?
>
> Correct.
>
>> Which fields do you think need to be reset to maintain namespace
>> isolation for the outbound path in macvlan?
>
> In addition to those already handled, I'd say
>
> - priority: affects qdisc classification, may refer to classes of the
> old namespace
> - ipvs_property: might cause packets to incorrectly skip netfilter hooks
> - nf_trace: might trigger packet tracing
> - nf_bridge: contains references to network devices in the old NS,
> also indicates packet was bridged
> - iif: index is only valid in the originating namespace
> - tc_index: classification result, should only be set in the namespace
> of the classifier
> - tc_verd: RTTL etc. should begin at zero again
> - probably secmark.

Wow. I thought we were trying to reduce skbuff, where did all of those
fields come from? Regarless that sounds like a good list to get stomped.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


arnd at arndb

Nov 26, 2009, 9:44 AM

Post #16 of 17 (394 views)
Permalink
Re: [PATCH 1/4] veth: move loopback logic to common location [In reply to]

On Thursday 26 November 2009, Patrick McHardy wrote:
> In addition to those already handled, I'd say
>
> - priority: affects qdisc classification, may refer to classes of the
> old namespace
> - ipvs_property: might cause packets to incorrectly skip netfilter hooks
> - nf_trace: might trigger packet tracing
> - nf_bridge: contains references to network devices in the old NS,
> also indicates packet was bridged
> - iif: index is only valid in the originating namespace
> - probably secmark.

ok

> - tc_index: classification result, should only be set in the namespace
> of the classifier
> - tc_verd: RTTL etc. should begin at zero again

Wouldn't that defeat the purpose of RTTL? If you create a loop
across two devices in different namespaces, it may no longer get
detected. Or is that a different problem again?

Arnd <><

---

net: maintain namespace isolation between vlan and real device

In the vlan and macvlan drivers, the start_xmit function forwards
data to the dev_queue_xmit function for another device, which may
potentially belong to a different namespace.

To make sure that classification stays within a single namespace,
this resets the potentially critical fields.

Still needs testing, don't apply

Signed-off-by: Arnd Bergmann <arnd [at] arndb>
---
drivers/net/macvlan.c | 2 +-
include/linux/netdevice.h | 9 +++++++++
net/8021q/vlan_dev.c | 2 +-
net/core/dev.c | 37 +++++++++++++++++++++++++++++++++----
4 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 322112c..edcebf1 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -269,7 +269,7 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
}

xmit_world:
- skb->dev = vlan->lowerdev;
+ skb_set_dev(skb, vlan->lowerdev);
return dev_queue_xmit(skb);
}

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9428793..fdf4a1a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1009,6 +1009,15 @@ static inline bool netdev_uses_dsa_tags(struct net_device *dev)
return 0;
}

+#ifdef CONFIG_NET_NS
+static inline void skb_set_dev(struct sk_buff *skb, struct net_device *dev)
+{
+ skb->dev = dev;
+}
+#else /* CONFIG_NET_NS */
+void skb_set_dev(struct sk_buff *skb, struct net_device *dev);
+#endif
+
static inline bool netdev_uses_trailer_tags(struct net_device *dev)
{
#ifdef CONFIG_NET_DSA_TAG_TRAILER
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index de0dc6b..51fcfff 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -323,7 +323,7 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb,
}


- skb->dev = vlan_dev_info(dev)->real_dev;
+ skb_set_dev(skb, vlan_dev_info(dev)->real_dev);
len = skb->len;
ret = dev_queue_xmit(skb);

diff --git a/net/core/dev.c b/net/core/dev.c
index f8baa15..220d4e4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1448,13 +1448,10 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
if (skb->len > (dev->mtu + dev->hard_header_len))
return NET_RX_DROP;

- skb_dst_drop(skb);
+ skb_set_dev(skb, dev);
skb->tstamp.tv64 = 0;
skb->pkt_type = PACKET_HOST;
skb->protocol = eth_type_trans(skb, dev);
- skb->mark = 0;
- secpath_reset(skb);
- nf_reset(skb);
return netif_rx(skb);
}
EXPORT_SYMBOL_GPL(dev_forward_skb);
@@ -1614,6 +1611,39 @@ static bool dev_can_checksum(struct net_device *dev, struct sk_buff *skb)
return false;
}

+/**
+ * skb_dev_set -- assign a buffer to a new device
+ * @skb: buffer for the new device
+ * @dev: network device
+ *
+ * If an skb is owned by a device already, we have to reset
+ * all data private to the namespace a device belongs to
+ * before assigning it a new device.
+ */
+void skb_set_dev(struct sk_buff *skb, struct net_device *dev)
+{
+ if (skb->dev && !net_eq(dev_net(skb->dev), dev_net(dev))) {
+ secpath_reset(skb);
+ skb_dst_drop(skb);
+ nf_reset(skb);
+ skb_init_secmark(skb);
+ skb->mark = 0;
+ skb->priority = 0;
+ skb->nf_trace = 0;
+ skb->ipvs_property = 0;
+#ifdef CONFIG_NET_SCHED
+ skb->tc_index = 0;
+#ifdef CONFIG_NET_CLS_ACT
+ skb->tc_verd = SET_TC_VERD(skb->tc_verd, 0);
+ skb->tc_verd = SET_TC_RTTL(skb->tc_verd, 0);
+#endif
+#endif
+ }
+ skb->dev = dev;
+ skb->skb_iif = skb->dev->ifindex;
+}
+EXPORT_SYMBOL(skb_set_dev);
+
/*
* Invalidate hardware checksum when packet is to be mangled, and
* complete checksum manually on outgoing path.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


kaber at trash

Nov 26, 2009, 1:14 PM

Post #17 of 17 (391 views)
Permalink
Re: [PATCH 1/4] veth: move loopback logic to common location [In reply to]

Arnd Bergmann wrote:
> On Thursday 26 November 2009, Patrick McHardy wrote:
>> In addition to those already handled, I'd say
>>
>> - priority: affects qdisc classification, may refer to classes of the
>> old namespace
>> - ipvs_property: might cause packets to incorrectly skip netfilter hooks
>> - nf_trace: might trigger packet tracing
>> - nf_bridge: contains references to network devices in the old NS,
>> also indicates packet was bridged
>> - iif: index is only valid in the originating namespace
>> - probably secmark.
>
> ok
>
>> - tc_index: classification result, should only be set in the namespace
>> of the classifier
>> - tc_verd: RTTL etc. should begin at zero again
>
> Wouldn't that defeat the purpose of RTTL? If you create a loop
> across two devices in different namespaces, it may no longer get
> detected. Or is that a different problem again?

Mhh good point, that would indeed be possible. OTOH using ingress
filtering in one namespace currently might cause the packet to get
dropped in a different namespace because the ttl runs out. For now
I'd suggest to go the safe route and keep the TTL intact until we
can come up with something better.

> +void skb_set_dev(struct sk_buff *skb, struct net_device *dev)
> +{
> + if (skb->dev && !net_eq(dev_net(skb->dev), dev_net(dev))) {
> + secpath_reset(skb);
> + skb_dst_drop(skb);
> + nf_reset(skb);
> + skb_init_secmark(skb);
> + skb->mark = 0;
> + skb->priority = 0;
> + skb->nf_trace = 0;
> + skb->ipvs_property = 0;
> +#ifdef CONFIG_NET_SCHED
> + skb->tc_index = 0;
> +#ifdef CONFIG_NET_CLS_ACT
> + skb->tc_verd = SET_TC_VERD(skb->tc_verd, 0);
> + skb->tc_verd = SET_TC_RTTL(skb->tc_verd, 0);
> +#endif
> +#endif

This makes we wonder which ones we actually should keep. Most of the
others get reinitialized anyways, so maybe its better to simply clear
the entire area up until ->tail like f.i. skb_recycle_check().

> + }
> + skb->dev = dev;
> + skb->skb_iif = skb->dev->ifindex;

This doesn't seem necessary, if the packet goes through
netif_receive_skb, it will be set anyways.

> +}
> +EXPORT_SYMBOL(skb_set_dev);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.