
Mailing List Archive: Linux: Kernel

QUESTION: can netdev_alloc_skb() errors be reduced by tuning?

 

 



starlight at binnacle

Jun 15, 2009, 5:19 PM

Post #1 of 8
QUESTION: can netdev_alloc_skb() errors be reduced by tuning?

Hello,

I submitted a testcase for a hugepages bug that has been
successfully resolved. I now have an apparently obscure question
related to MM, and so I am asking anyone who might have some idea
on this. Nothing much turned up via Google, and digging into
the KMEM code looks daunting.

Running Intel 82598/ixgbe 10 gig Ethernet under heavy stress.
It is generally working well after tuning IRQ affinities, but a
fair number of buffer allocation failures are occurring in the
'ixgbe' device driver and are reported via 'ethtool' statistics.
This may be causing data loss.

The kernel primitive returning the error is netdev_alloc_skb().

Are any tuneable parameters available that can reduce or
eliminate these allocation failures? Have about eleven
gigabytes of free memory, though most of that is consumed
by non-dirty file cache data. Total system memory is 16GB with
4GB allocated to hugepages. Zero swap usage and activity though
swap is enabled. Most application memory is hugepage or is
'mlock()'ed.

Thank you.





System rebooted before test run.

Dual Xeon E5430, 16GB FB-DIMM RAM.


$ cat /proc/meminfo
MemTotal: 16443828 kB
MemFree: 281176 kB
Buffers: 53896 kB
Cached: 11331924 kB
SwapCached: 0 kB
Active: 200740 kB
Inactive: 11284312 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 16443828 kB
LowFree: 281176 kB
SwapTotal: 2031608 kB
SwapFree: 2031400 kB
Dirty: 4 kB
Writeback: 0 kB
AnonPages: 104464 kB
Mapped: 14644 kB
Slab: 440452 kB
PageTables: 4032 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 8156368 kB
Committed_AS: 122452 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 266872 kB
VmallocChunk: 34359471043 kB
HugePages_Total: 2048
HugePages_Free: 735
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
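
As a rough cross-check of the figures above, the "about eleven gigabytes" can be reproduced by adding MemFree, Buffers and Cached, and the 4GB hugepage pool by multiplying HugePages_Total by Hugepagesize. The sketch below is illustrative only: the helper name `parse_meminfo` is made up here, and the sample text is a subset of the dump above. Only the MemFree part (roughly 275 MB) is free in the strict sense; the rest is page cache that would have to be reclaimed first.

```python
def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to integer values."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # skip blank or malformed lines
        fields[name.strip()] = int(rest.split()[0])  # drop the trailing "kB"
    return fields


# Subset of the /proc/meminfo dump posted in this message.
SAMPLE = """\
MemTotal: 16443828 kB
MemFree: 281176 kB
Buffers: 53896 kB
Cached: 11331924 kB
HugePages_Total: 2048
Hugepagesize: 2048 kB
"""

info = parse_meminfo(SAMPLE)

# Strictly free pages plus cache that is cheap to drop (not dirty).
free_plus_cache_kb = info["MemFree"] + info["Buffers"] + info["Cached"]

# Memory set aside for the hugepage pool.
hugepage_pool_kb = info["HugePages_Total"] * info["Hugepagesize"]

print(f"free + cache:  {free_plus_cache_kb / 2**20:.1f} GiB")
print(f"hugepage pool: {hugepage_pool_kb / 2**20:.1f} GiB")
print(f"strictly free: {info['MemFree'] / 2**10:.0f} MiB")
```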


# ethtool -S eth2 | egrep -v ': 0$'
NIC statistics:
rx_packets: 724246449
tx_packets: 229847
rx_bytes: 152691992335
tx_bytes: 10573426
multicast: 725997241
broadcast: 6
rx_csum_offload_good: 723051776
alloc_rx_buff_failed: 7119
tx_queue_0_packets: 229847
tx_queue_0_bytes: 10573426
rx_queue_0_packets: 340698332
rx_queue_0_bytes: 70844299683
rx_queue_1_packets: 385298923
rx_queue_1_bytes: 82276167594


ixgbe driver fragment
=====================
    struct sk_buff *skb = netdev_alloc_skb(adapter->netdev, bufsz);

    if (!skb) {
        /* allocation failed: count it and stop refilling the ring */
        adapter->alloc_rx_buff_failed++;
        goto no_buffers;
    }

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


eric.dumazet at gmail

Jun 15, 2009, 7:26 PM

Post #2 of 8
Re: QUESTION: can netdev_alloc_skb() errors be reduced by tuning? [In reply to]

starlight [at] binnacle wrote:
> # ethtool -S eth2 | egrep -v ': 0$'
> NIC statistics:
> rx_packets: 724246449
> tx_packets: 229847
> rx_bytes: 152691992335

152691992335/724246449 = about 210 bytes per rx packet on average.

It could make sense to add a copybreak feature to this driver to reduce memory needs,
but that would also consume more CPU cycles and slow down forwarding setups.

Maybe this packet trimming could be done generically in the UDP stack input path,
before queueing the packet into a receive queue, if the amount of available memory
is under a given threshold.
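
The arithmetic above can be repeated directly from the `ethtool -S` output, and the same counters also give a feel for how rare the allocation failures are relative to the traffic. This is only a sketch: `parse_ethtool_stats` is a hypothetical helper name, and the sample text is copied from the statistics quoted in this thread.

```python
def parse_ethtool_stats(text):
    """Parse 'name: value' lines from `ethtool -S` output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if sep and value.isdigit():  # skips the "NIC statistics:" header
            stats[name.strip()] = int(value)
    return stats


# Counters taken from the ethtool output posted in the thread.
SAMPLE = """\
NIC statistics:
rx_packets: 724246449
rx_bytes: 152691992335
alloc_rx_buff_failed: 7119
"""

stats = parse_ethtool_stats(SAMPLE)

avg_rx_bytes = stats["rx_bytes"] / stats["rx_packets"]
failures_per_million = 1e6 * stats["alloc_rx_buff_failed"] / stats["rx_packets"]

print(f"average rx packet: {avg_rx_bytes:.1f} bytes")
print(f"allocation failures per million rx packets: {failures_per_million:.2f}")
```

An average this far below a full Ethernet frame is what makes copybreak or trimming attractive: most of each receive buffer is unused.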


starlight at binnacle

Jun 15, 2009, 9:12 PM

Post #3 of 8
Re: QUESTION: can netdev_alloc_skb() errors be reduced by tuning? [In reply to]

Eric,

Great thought--thank you. Running a similar server with
82571/e1000e and it does not exhibit the problem. 'e1000e' has
default copybreak=256 while 'ixgbe' has no copybreak. Rationale
given is

http://osdir.com/ml/linux.drivers.e1000.devel/2008-01/msg00103.html

But the comparison is a bit apples-and-oranges since the 'e1000e'
system is dual Opteron 2354 while the 'ixgbe' system is Xeon
E5430 (a painful choice thus far). Also 'e1000e' system passes
data via a PACKET socket while the 'ixgbe' system passes data
via UDP (a configurable option).

I'm not fully up on how this all works: am I to understand that
the error could result from RX ring-queue buffers not freeing
quickly enough because they have a use-count held non-zero as
the packet travels the stack?

I've just doubled some SLAB tuneables that seem relevant, but
if the cause is the aforementioned, this won't help. Will
have the answer on the tweaks by the end of Tuesday.

David



At 04:26 AM 6/16/2009 +0200, Eric Dumazet wrote:
>
>152691992335/724246449 = 210 bytes per rx packet in average
>
>It could make sense to add copybreak feature in this driver to
>reduce memory needs, but that also would consume more cpu
>cycles, and slow down forwarding setups.
>
>Maybe this packet trimming could be done generically in UDP
>stack input path, before queueing packet into a receive queue,
>if amount of available memory is under a given threshold.



eric.dumazet at gmail

Jun 15, 2009, 11:12 PM

Post #4 of 8
Re: QUESTION: can netdev_alloc_skb() errors be reduced by tuning? [In reply to]

Please don't top-post; we prefer it the other way around :)

starlight [at] binnacle wrote:
> Eric,
>
> Great thought--thank you. Running a similar server with
> 82571/e1000e and it does not exhibit the problem. 'e1000e' has
> default copybreak=256 while 'ixgbe' has no copybreak. Rationale
> given is
>
> http://osdir.com/ml/linux.drivers.e1000.devel/2008-01/msg00103.html
>
> But the comparison is a bit apples-and-oranges since the 'e1000e'
> system is dual Opteron 2354 while the 'ixgbe' system is Xeon
> E5430 (a painful choice thus far). Also 'e1000e' system passes
> data via a PACKET socket while the 'ixgbe' system passes data
> via UDP (a configurable option).
>
> I'm not fully up on how this all works: am I to understand that
> the error could result from RX ring-queue buffers not freeing
> quickly enough because they have a use-count held non-zero as
> the packet travels the stack?

Well, the error is normal under stress, when no more kernel
memory is available.

cat /proc/net/udp

can show you (in the last column) the sockets where packets were dropped
by the UDP stack because their receive queue was full.
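
For repeated checks it can be handy to pull just the port and the drop counter out of that file. The following is a minimal sketch, assuming the usual 13-column layout in which the local address is `hexip:hexport` and `drops` is the final field; the function name `udp_drops` and the two sample socket lines are invented for illustration and are not taken from the system discussed here.

```python
def udp_drops(text):
    """Return [(local_port, drops), ...] parsed from /proc/net/udp text."""
    results = []
    for line in text.splitlines():
        fields = line.split()
        # Socket lines start with a slot number such as "12:"; skip the header.
        if len(fields) < 13 or not fields[0].rstrip(":").isdigit():
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)  # port is hex
        drops = int(fields[-1])                            # last column
        results.append((local_port, drops))
    return results


# Made-up sample in the /proc/net/udp layout (values are illustrative only).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
  12: 00000000:1F90 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 11111 2 ffff880000000000 0
  34: 00000000:2710 00000000:0000 07 00000000:0001A000 00:00000000 00000000  1000        0 22222 2 ffff880000000001 4821
"""

for port, drops in udp_drops(SAMPLE):
    if drops:
        print(f"UDP port {port}: {drops} packets dropped at the socket")
```

On a live system the text would come from reading `/proc/net/udp` instead of `SAMPLE`. A non-zero count here points at a full socket receive queue rather than at the driver's buffer allocation.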

>
> I've just doubled some SLAB tuneables that seem relevant, but
> if the cause is the aforementioned, this won't help. Will
> have the answer on the tweaks by the end of Tuesday.
>
> David

copybreak in the drivers themselves is nice because the driver can recycle
its rx skbs much faster, but that is suboptimal in forwarding (router)
workloads. It's also a lot of duplicated code in every driver.
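
To make the trade-off concrete, here is a toy model of a receive path with and without copybreak. It is not driver code: the 2048-byte buffer size, the strict `length < copybreak` comparison, and the function name `simulate_rx` are all assumptions for illustration, and the model ignores sk_buff overhead and allocator rounding.

```python
def simulate_rx(packet_lengths, bufsz=2048, copybreak=0):
    """Toy model of rx buffer usage.

    Returns (ring_buffers_given_away, bytes_passed_up_the_stack).
    With copybreak, a short packet is copied into a right-sized buffer and
    the full-size ring buffer is recycled instead of being handed upward.
    """
    ring_buffers_given_away = 0
    bytes_passed_up = 0
    for length in packet_lengths:
        if copybreak and length < copybreak:
            # Small packet: costs a copy, but the ring buffer stays in place.
            bytes_passed_up += length
        else:
            # Large packet (or no copybreak): the whole buffer leaves the
            # ring and a replacement has to be allocated.
            ring_buffers_given_away += 1
            bytes_passed_up += bufsz
    return ring_buffers_given_away, bytes_passed_up


# Hypothetical traffic mix: mostly small packets, a few full-size frames.
traffic = [210] * 8 + [1500] * 2

print("no copybreak: ", simulate_rx(traffic))
print("copybreak=256:", simulate_rx(traffic, copybreak=256))
```

In this model copybreak cuts the number of replacement buffers the driver must allocate, at the price of one copy per small packet, which is the CPU cost mentioned above.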

So we could do the skb trimming (i.e. reallocating the data portion to exactly
the size of the packet) in the core network stack, when we know the packet must be handled
by an application, and not dropped or forwarded by the kernel.

Because of slab rounding, this reallocation should be done only if the resulting data
portion is substantially smaller (around 50%) than that of the original skb.
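
The 50% rule can be sketched as a small decision function. This is a simplification under stated assumptions: allocation sizes are modelled as powers of two with a 32-byte minimum (real kmalloc caches also include sizes such as 96 and 192, and skb data carries extra shared-info overhead), and the names `kmalloc_bucket` and `worth_trimming` are hypothetical.

```python
def kmalloc_bucket(size, minimum=32):
    """Round size up to the next power-of-two bucket (simplified slab model)."""
    bucket = minimum
    while bucket < size:
        bucket *= 2
    return bucket


def worth_trimming(orig_alloc, payload_len, ratio=0.5):
    """True if reallocating to fit payload_len would really save memory.

    The payload is first rounded up to its bucket, since a smaller request
    that lands in the same bucket frees nothing.
    """
    return kmalloc_bucket(payload_len) <= orig_alloc * ratio


# A 210-byte payload in a 2048-byte buffer rounds to 256 bytes: worth it.
print(worth_trimming(2048, 210))
# A 1500-byte payload still needs a 2048-byte bucket: no gain.
print(worth_trimming(2048, 1500))
```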




mel at csn

Jun 16, 2009, 2:19 AM

Post #5 of 8
Re: QUESTION: can netdev_alloc_skb() errors be reduced by tuning? [In reply to]

On Mon, Jun 15, 2009 at 08:19:33PM -0400, starlight [at] binnacle wrote:
> Hello,
>
> I submitted testcase for a hugepages bug that has been
> successfully resolved. Have an apparently obscure question
> related to MM, and so I am asking anyone who might have some idea
> on this. Nothing much turned up via Google and digging into
> the KMEM code looks daunting.
>
> Running Intel 82598/ixgbe 10 gig Ethernet under heavy stress.
> Generally is working well after tuning IRQ affinities, but a
> fair number of buffer allocation failures are occurring in the
> 'ixgbe' device driver and are reported via 'ethtool' statistics.
> This may be causing data loss.
>

Can you give an example of an allocation failure? Specifically, I want to
see what sort of allocation it was and what order.

For reliable protocols, an allocation failure should recover and the
data get through but obviously there is a drop in network performance
when this happens.

> The kernel primitive returning the error is netdev_alloc_skb().
>
> Are any tuneable parameters available that can reduce or
> eliminate these allocation failures? Have about eleven
> gigabytes of free memory, though most of that is consumed
> by non-dirty file cache data. Total system memory is 16GB with
> 4GB allocated to hugepages. Zero swap usage and activity though
> swap is enabled. Most application memory is hugepage or is
> 'mlock()'ed.
>

If the allocations are high-order and atomic, increasing min_free_kbytes
can help, particularly in situations where there is a burst of network
traffic. I won't know if they are atomic until I see an error message
though.
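
For reference, the tunable mentioned above is exposed as `/proc/sys/vm/min_free_kbytes`. The sketch below reads it and raises it only if the requested value is larger than the current one. The helper names are made up, writing requires root, and any specific value is an assumption that would need testing on the machine in question rather than a recommendation.

```python
from pathlib import Path

MIN_FREE_KBYTES = Path("/proc/sys/vm/min_free_kbytes")


def read_kbytes(path=MIN_FREE_KBYTES):
    """Return the current value of the tunable as an integer (in kB)."""
    return int(Path(path).read_text().split()[0])


def raise_kbytes(new_value, path=MIN_FREE_KBYTES):
    """Raise the tunable to new_value if that is higher than the current one.

    Returns the value in effect afterwards. Never lowers the setting.
    Writing to the real /proc file needs root privileges.
    """
    current = read_kbytes(path)
    if new_value > current:
        Path(path).write_text(f"{new_value}\n")
        return new_value
    return current


if MIN_FREE_KBYTES.exists():
    print("current min_free_kbytes:", read_kbytes())
```

A change made this way lasts only until reboot; a persistent setting would normally go through the system's sysctl configuration.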


--
Mel Gorman
Part-time PhD Student, University of Limerick
Linux Technology Center, IBM Dublin Software Lab


starlight at binnacle

Jun 16, 2009, 8:25 AM

Post #6 of 8
Re: QUESTION: can netdev_alloc_skb() errors be reduced by tuning? [In reply to]

At 10:19 AM 6/16/2009 +0100, Mel Gorman wrote:

>Can you give an example of an allocation failure? Specifically, I want to
>see what sort of allocation it was and what order.

I think it's just the basic buffer allocation for
Ethernet frames arriving in the 'ixgbe' driver. Seems
like it's one allocation per frame. Per the original
message the allocations are made with the 'netdev_alloc_skb()'
kernel call. The function where this code appears is
named 'ixgbe_alloc_rx_buffers()' and the comment is
"Replace used receive buffers."

The code path in question does not generate an error. It just
increments the 'alloc_rx_buff_failed' counter for the ethX
device. In addition it appears that the frame is dropped
only if the PCIe hardware ring-queue associated with each
interface is full. So on the next interrupt the allocation
is retried and appears to be successful 99% of the time.

>For reliable protocols, an allocation failure should recover and the
>data get through but obviously there is a drop in network performance
>when this happens.

This is for a specialized high-volume UDP multicast application
where data loss of any kind is unacceptable.

>If the allocations are high-order and atomic, increasing min_free_kbytes
>can help, particularly in situations where there is a burst of network
>traffic. I won't know if they are atomic until I see an error message
>though.

Doesn't the use of the 'netdev_alloc_skb()' kernel primitive
imply what the nature of the allocation is? I followed the
call graph down into "kmem" land, but it's a complex place
and so I abandoned the review.

My impression is that 'min_free_kbytes' relates mainly to systems
where significant paging pressure exists. The servers have zero
paging pressure and lots of free memory, though mostly in the
form of instantly discardable file data cache pages. In the
past disabling the program that generates the cache pressure
has had no effect on data loss, though I haven't tried it in
relation to this specific issue.

Tried increasing a few /proc/slabinfo tuneable parameters today
and this appears to have fixed the issue so far.



starlight at binnacle

Jun 16, 2009, 10:24 AM

Post #7 of 8
Re: QUESTION: can netdev_alloc_skb() errors be reduced by tuning? [In reply to]

>Tried increasing a few /proc/slabinfo tuneable parameters today
>and this appears to have fixed the issue so far today.

Spoke too soon. A burst of allocation failures appeared
and some incoming data was lost. The 'e1000e' system had
no problem.



herbert at gondor

Jul 4, 2009, 8:44 PM

Post #8 of 8
Re: QUESTION: can netdev_alloc_skb() errors be reduced by tuning? [In reply to]

Eric Dumazet <eric.dumazet [at] gmail> wrote:
>
> Because of slab rounding, this reallocation should be done only if resulting data
> portion is really smaller (50 %) than original skb.

If we're going to do this in the core then we should only do it
in the spots where the packet may be held indefinitely.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert [at] gondor>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
