Mailing List Archive: NTop: Misc

X520 what options for higher throughput with pfcount_multichannel?

 

 



andrew_lehane at agilent

Aug 12, 2011, 5:32 AM

Post #1 of 4
X520 what options for higher throughput with pfcount_multichannel?

Hi,

I have recently purchased a machine with an Intel X520 NIC and, having read the documentation, I am confused as to what options I have for maximizing the performance of "pfcount_multichannel".

My initial experiments, using the driver released with PF_RING-4.7.2 (drivers/intel/ixgbe/ixgbe-3.1.15-FlowDirector-NoTNAPI) and transparent mode = 2, show roughly the following performance.

...

Absolute Stats: [channel=6][381188 pkts rcvd][0 pkts dropped]
Total Pkts=381188/Dropped=0.0 %
381188 pkts - 22871280 bytes [7401.6 pkt/sec - 3.55 Mbit/sec]
=========================
Actual Stats: [channel=6][12514 pkts][1003.1 ms][12475.0 pkt/sec]
=========================
Absolute Stats: [channel=7][381036 pkts rcvd][0 pkts dropped]
Total Pkts=381036/Dropped=0.0 %
381036 pkts - 22862160 bytes [7398.6 pkt/sec - 3.55 Mbit/sec]
=========================
Actual Stats: [channel=7][13010 pkts][1003.1 ms][12969.4 pkt/sec]
=========================
Absolute Stats: [channel=8][381835 pkts rcvd][0 pkts dropped]
Total Pkts=381835/Dropped=0.0 %
381835 pkts - 22910100 bytes [7414.1 pkt/sec - 3.56 Mbit/sec]
=========================
Actual Stats: [channel=8][13031 pkts][1003.1 ms][12990.4 pkt/sec]
=========================
Absolute Stats: [channel=9][381299 pkts rcvd][0 pkts dropped]
Total Pkts=381299/Dropped=0.0 %
381299 pkts - 22877940 bytes [7403.7 pkt/sec - 3.55 Mbit/sec]
=========================

...

Aggregate stats (all channels): [371781.1 pkt/sec][0 pkts dropped]

At 64-byte packets this, if my maths is correct, is about 190 Mbit/s.
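My working, for reference (just the aggregate packet rate times the frame size, as a quick bc check):

echo '371781.1 * 64 * 8 / 1000000' | bc -l    # ~190.35 Mbit/s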

I am using a 10 Gig source that is set to ignore PAUSE frames and deliver full line rate 64-byte packets, so about 7.6 Gbit/s. I assume that the traffic is being buffered, as no packets seem to be dropped and pfcount_multichannel keeps on processing even after the source has stopped sending.
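For 64-byte frames, 10 GbE line rate works out to roughly 14.88 Mpps once the 20 bytes of preamble and inter-frame gap per packet are counted, which is where the ~7.6 Gbit/s of actual frame bits comes from; a quick check:

echo '10000000000 / ((64 + 20) * 8)' | bc -l    # ~14.88 Mpps on the wire
echo '14880952 * 64 * 8 / 1000000000' | bc -l   # ~7.62 Gbit/s of 64-byte frames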

# cat /proc/net/pf_ring/info

PF_RING Version : 4.7.2 ($Revision: exported$)
Ring slots : 4096
Slot version : 13
Capture TX : Yes [RX+TX]
IP Defragment : No
Socket Mode : Standard
Transparent mode : No (mode 2)
Total rings : 45
Total plugins : 0

Is this the correct/expected throughput for this driver in transparent mode? And what do I need to do to get to the next level? Do I need the TNAPI driver, or is it better to get the Silicom card? (It is my understanding that the Silicom card supports DMA and will therefore be as fast as, or faster than, TNAPI while using fewer CPU cycles; am I correct?)

Many thanks.

Andrew


andrew_lehane at agilent

Aug 12, 2011, 5:51 AM

Post #2 of 4
Re: X520 what options for higher throughput with pfcount_multichannel?

Hi Luca,

I was getting some vmap errors; I wonder if this is the reason for the low throughput?

[ 7313.184192] device eth4 entered promiscuous mode
[ 7313.195911] device eth4 left promiscuous mode
[ 7313.200292] device eth4 entered promiscuous mode
[ 7313.385476] vmap allocation for size 2101248 failed: use vmalloc=<size> to increase size.
[ 7313.385483] [PF_RING] ERROR: not enough memory for ring
[ 7313.385487] [PF_RING] ring_mmap(): unable to allocate memory

Is there any documentation on how to deal with this? Do I reduce the number of queues?
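From the kernel message itself, my guess is that I either need a larger vmalloc area at boot or to shrink what each ring asks for; something along these lines, untested, and with the module parameter names from memory, so please correct me if they are wrong:

vmalloc=512M                                              # appended to the kernel boot line, then reboot

insmod pf_ring.ko transparent_mode=2 min_num_slots=2048   # smaller rings
insmod ixgbe.ko RSS=8,8                                   # fewer RX queues per port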

Thanks,

Andrew


From: ntop-misc-bounces [at] listgateway On Behalf Of Luca Deri
Sent: Friday, August 12, 2011 1:42 PM
To: ntop-misc [at] listgateway
Subject: Re: [Ntop-misc] X520 what options for higher throughput with pfcount_multichannel?

Hi Andrew
nice to hear from you. The figures you have are poor, as in total you are capturing far too little. Typical figures are shown here: http://www.ntop.org/blog/pf_ring/packet-capture-performance-at-10-gbit-pf_ring-vs-tnapi/

I suggest going for DNA, as it is both fast and light on the CPU (see http://www.ntop.org/blog/pf_ring/how-to-sendreceive-26mpps-using-pf_ring-on-commodity-hardware/).

Luca



deri at ntop

Aug 12, 2011, 5:57 AM

Post #3 of 4
Re: X520 what options for higher throughput with pfcount_multichannel?

Andrew
1. How did you insmod the pf_ring/ixgbe modules (I mean, what options did you use)?
2. How do you start the multichannel app?

Luca



andrew_lehane at agilent

Aug 12, 2011, 6:06 AM

Post #4 of 4
Re: X520 what options for higher throughput with pfcount_multichannel?

Hi Luca,


1. Clean-booted the machine.
2. Built your ixgbe driver for the kernel version I am using.
3. Built your code as per the README instructions.
4. rmmod'ed the existing ixgbe module that was loaded when the machine started up.
5. Inserted your ixgbe driver using insmod, with the default parameters.
6. Inserted the pf_ring module using insmod, with the default parameters.
7. Then ran pfcount_multichannel -i ethX.
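In command form, that was approximately the following, with build steps abbreviated as per the README and paths from memory, so treat it as a sketch rather than an exact transcript:

# from the top of the PF_RING-4.7.2 tree
make -C kernel
make -C drivers/intel/ixgbe/ixgbe-3.1.15-FlowDirector-NoTNAPI/src
rmmod ixgbe                                                                 # driver loaded at boot
insmod drivers/intel/ixgbe/ixgbe-3.1.15-FlowDirector-NoTNAPI/src/ixgbe.ko   # default parameters
insmod kernel/pf_ring.ko                                                    # default parameters
make -C userland/examples
./userland/examples/pfcount_multichannel -i eth4                            # eth4 is the X520 port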


The machine has 32 cores, so 64 logical 'processors', I would guess, due to hyper-threading.

Andrew





