Mailing List Archive: NTop: Misc

libzero CPU affinity

 

 



c.d.wakelin at reading

Jul 9, 2012, 9:06 AM

Post #1 of 6
libzero CPU affinity

Hi,

I've got a few more questions on libzero.

I have a 2 x 8-core machine with an Intel 10Gb ixgbe card. I want to run
three apps on the same packets, currently maxing out at about 1GB/s (most
of that is research data that can largely be ignored quickly; I just want
to log flow data for it).

1) If I use fan-out in libzero to duplicate packets (using my modified
version of pfdnacluster_master), is it best to have each application use
the same CPU affinity for each thread?

So (for 8 threads each):

app1 thread 1 connects to dnacluster:1@0, bound to CPU0
...
app1 thread 8 connects to dnacluster:1@7, bound to CPU7
app2 thread 1 connects to dnacluster:1@8, bound to CPU0
...
app2 thread 8 connects to dnacluster:1@15, bound to CPU7
app3 thread 1 connects to dnacluster:1@16, bound to CPU0

etc.
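
(For concreteness, a minimal sketch of what each worker thread would do,
assuming the PF_RING 5.x pfring_open(device, caplen, flags) call; error
handling omitted:)

#define _GNU_SOURCE
#include <pfring.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Sketch: worker n opens its cluster queue and pins itself to core n. */
static void *worker(void *arg) {
  int n = *(int *)arg;            /* thread/queue index, 0..7 */
  char dev[32];
  cpu_set_t cpus;
  pfring *ring;

  CPU_ZERO(&cpus);
  CPU_SET(n, &cpus);                                 /* bind to CPU n */
  pthread_setaffinity_np(pthread_self(), sizeof(cpus), &cpus);

  snprintf(dev, sizeof(dev), "dnacluster:1@%d", n);  /* app1; app2 would use n+8, etc. */
  ring = pfring_open(dev, 1518, PF_RING_PROMISC);
  pfring_enable_ring(ring);

  /* ... pfring_recv() loop ... */
  return NULL;
}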

2) I can't easily use all my cores with this, as 3 apps x 16 cores = 48 >
32 (the cluster queue limit). Is there any way to increase this (probably
not)? Would a 64-bit version be slower?

3) (Mad idea!) Could I use ixgbe DNA + RSS to split the traffic up
(possibly with the modified RSS function in the driver to make it
symmetric) and then libzero to fan-out to the three apps?

i.e.

insmod ixgbe RSS=16,16; ifconfig dna0 up; set_irq_affinity.sh dna0

pfdnacluster_master -i dna0@0 -c 0 -n 3 -m 3
...
pfdnacluster_master -i dna0@15 -c 15 -n 3 -m 3

(I'd probably make a version that creates N clusters and amalgamates stats)

app1 thread 1 connects to dnacluster:0@0
...
app1 thread 16 connects to dnacluster:15@0
app2 thread 1 connects to dnacluster:0@1
...
app2 thread 16 connects to dnacluster:15@1
app3 thread 1 connects to dnacluster:0@2

etc.

4) I have hyperthreading turned off. Would enabling it make things better?

Best Wishes,
Chris

--
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
Christopher Wakelin, c.d.wakelin [at] reading
IT Services Centre, The University of Reading, Tel: +44 (0)118 378 2908
Whiteknights, Reading, RG6 6AF, UK Fax: +44 (0)118 975 3094

_______________________________________________
Ntop-misc mailing list
Ntop-misc [at] listgateway
http://listgateway.unipi.it/mailman/listinfo/ntop-misc


c.d.wakelin at reading

Jul 17, 2012, 3:29 AM

Post #2 of 6
Re: libzero CPU affinity

I'm still not sure of the answers to these, but I've been running pretty
much as in "1)".

I did also try "3)" which worked pretty well, except that, for some
reason, the modified ixgbe RSS hash function (overwriting the RSS key
with repeated "0xAFE3AFE3") doesn't seem to do anything in Ubuntu 12.04
(i.e. 3.2.0 kernel) but works in Ubuntu 10.04 (2.6.32).

My suspicion is this is something to do with "net device ops", which is
enabled in 3.2.x, but I haven't managed to work out how.

I also have a problem with one of the three apps, ARGUS, using 100% CPU.
It uses select() with a timeout to generate idle time, but with DNA
clusters this always seems to return immediately, even with no traffic.
Is that expected?

Best Wishes,
Chris

On 09/07/12 17:06, Chris Wakelin wrote:
> <snip>


--
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
Christopher Wakelin, c.d.wakelin [at] reading
IT Services Centre, The University of Reading, Tel: +44 (0)118 378 2908
Whiteknights, Reading, RG6 6AF, UK Fax: +44 (0)118 975 3094




cardigliano at ntop

Jul 17, 2012, 4:07 AM

Post #3 of 6
Re: libzero CPU affinity

Chris,
sorry, but I just realized my answer never reached the mailing list;
please see inline

---------- Forwarded message ----------
Date: 2012/7/10
Subject: Re: [Ntop-misc] libzero CPU affinity
To: "ntop-misc [at] listgateway" <ntop-misc [at] listgateway>
Cc: "ntop-misc [at] listgateway" <ntop-misc [at] listgateway>


Hi Chris
Please see inline

On 09/lug/2012, at 18:06, Chris Wakelin <c.d.wakelin [at] reading> wrote:

> Hi,
>
> I've got a few more questions on libzero.
>
> I have 2 x 8-core machine with an Intel 10Gb ixgbe card. I want to run
> three apps on the same packets, currently maxing at about 1GB/s (most of
> that is research data that can be largely ignored quickly; I want to
> just log flow data for that).
>
> 1) If I use fan-out in libzero to duplicate packets (using my modified
> version of pfdnacluster_master), is it best to have each application use
> the same CPU affinity for each thread?
>
> So (for 8 threads each):
>
> app1 thread 1 connects to dnacluster:1@0, bound to CPU0
> ...
> app1 thread 8 connects to dnacluster:1@7, bound to CPU7
> app2 thread 1 connects to dnacluster:1@8, bound to CPU0
> ...
> app2 thread 8 connects to dnacluster:1@15, bound to CPU7
> app3 thread 1 connects to dnacluster:1@16, bound to CPU0
>
> etc.

Yes, this affinity looks good

>
> 2) I can't use all my cores easily with this as 3 apps x 16 cores > 32.
> Is there any way to increase this (probably not)?. Would 64 bits be slower?

No, currently there is no way. We thought 32 was enough, a good compromise between performance and flexibility, but we will consider whether to move to 64.

>
> 3) (Mad idea!) Could I use ixgbe DNA + RSS to split the traffic up
> (possibly with the modified RSS function in the driver to make it
> symmetric) and then libzero to fan-out to the three apps?
>
> i.e.
>
> insmod ixgbe RSS=16,16; ifconfig dna0 up; set_irq_affinity.sh dna0
>
> pfdnacluster_master -i dna0@0 -c 0 -n 3 -m 3
> ...
> pfdnacluster_master -i dna0@15 -c 15 -n 3 -m 3
>
> (I'd probably make a version that creates N clusters and amalgamates stats)
>
> app1 thread 1 connects to dnacluster:0@0
> ...
> app1 thread 16 connects to dnacluster:15@0
> app2 thread 1 connects to dnacluster:0@1
> ...
> app2 thread 16 connects to dnacluster:15@1
> app3 thread 1 connects to dnacluster:0@2
>
>

Mad idea but yes, you could :-) (probably two queues are enough, 3x8 + 3x8)

>
> 4) I have hyperthreading turned off. Would enabling it make things better?

It depends on your applications; the best way to find out is to run some tests.

Regards
Alfredo

>
> Best Wishes,
> Chris
>
> --
> --+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
> Christopher Wakelin, c.d.wakelin [at] reading
> IT Services Centre, The University of Reading, Tel: +44 (0)118 378 2908
> Whiteknights, Reading, RG6 6AF, UK Fax: +44 (0)118 975 3094
>

On Jul 17, 2012, at 12:29 PM, Chris Wakelin wrote:

> I'm still not sure of the answers to these but I've been running pretty
> much as "1)".
>
> I did also try "3)" which worked pretty well, except that, for some
> reason, the modified ixgbe RSS hash function (overwriting the RSS key
> with repeated "0xAFE3AFE3") doesn't seem to do anything in Ubuntu 12.04
> (i.e. 3.2.0 kernel) but works in Ubuntu 10.04 (2.6.32).

Please try updating to latest SVN and using the pfring_open() flag PF_RING_DNA_SYMMETRIC_RSS.
BTW the RSS key we are using is different.

> My suspicion is this is something to do with "net device ops", which is
> enabled in 3.2.x, but I haven't managed to work out how.

I will check this.

>
> I also have a problem with one of the three apps, ARGUS, using 100% CPU.
> It uses select() with a timeout to generate idle time, but with DNA
> clusters this always seems to return immediately, even with no traffic.
> Is that expected?

poll() on the fd returned by pfring_get_selectable_fd() should work; I will check this as well.
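
Something like this (just a sketch, untested; "ring" here is the pfring
handle ARGUS already has):

/* Block for up to 200 ms instead of busy-looping on the cluster socket. */
#include <pfring.h>
#include <poll.h>

static int wait_for_packets(pfring *ring) {
  struct pollfd pfd;
  pfd.fd = pfring_get_selectable_fd(ring);
  pfd.events = POLLIN;
  return poll(&pfd, 1, 200 /* timeout in ms */);  /* > 0 means readable */
}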

Thank you
Alfredo

>
> Best Wishes,
> Chris
>
> On 09/07/12 17:06, Chris Wakelin wrote:
>> <snip>



c.d.wakelin at reading

Jul 17, 2012, 5:28 AM

Post #4 of 6
Re: libzero CPU affinity

On 17/07/12 12:07, Alfredo Cardigliano wrote:
> Chris
> sorry but I just realized my answer never reached the ml, please see inline

Ah, glad to know it wasn't because the questions have no answers :)

>> 1) If I use fan-out in libzero to duplicate packets (using my modified
>> version of pfdnacluster_master), is it best to have each application use
>> the same CPU affinity for each thread?
>>
...
>
> Yes, this affinity looks good
>
>>
>> 2) I can't use all my cores easily with this as 3 apps x 16 cores > 32.
>> Is there any way to increase this (probably not)?. Would 64 bits be slower?
>
> No, currently there is no way. We tought 32 was enough, a good compromise between performance and flexibility, but we will consider whether to move to 64.

I think my case might be a bit extreme. However, if it's no slower, there
might be a case for a 64-bit library.

>> 3) (Mad idea!) Could I use ixgbe DNA + RSS to split the traffic up
>> (possibly with the modified RSS function in the driver to make it
>> symmetric) and then libzero to fan-out to the three apps?
>>

...

>
> Mad idea but yes, you could :-) (probably two queues are enough, 3x8 + 3x8)
>

You mean "insmod ixgbe RSS=2,2" then "pfdnalcuster_master_cdw -i dna0@0
-c0 -n 8 -D3" and "-i dna0@1 -c1 -n 8 -D3" (I changed to "-D x" for the
multiplier)?

>>
>> I did also try "3)" which worked pretty well, except that, for some
>> reason, the modified ixgbe RSS hash function (overwriting the RSS key
>> with repeated "0xAFE3AFE3") doesn't seem to do anything in Ubuntu 12.04
>> (i.e. 3.2.0 kernel) but works in Ubuntu 10.04 (2.6.32).
>
> Please try updating to latest SVN and using the pfring_open() flag PF_RING_DNA_SYMMETRIC_RSS.
> BTW the RSS key we are using is different.
>

That's in PF_RING 5.4.4, I think. I'd already patched Suricata to use
it, and I was going to use my patch for an environment variable check in
libpcap (now in SVN) so that ARGUS could use it too (I'll be running it
on its own interface to start with).
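
(For the record, the Suricata change boils down to something like this, if
I've remembered the 5.4.x call correctly:)

/* Sketch: ask the DNA driver for a symmetric RSS key when opening the queue. */
pfring *pd = pfring_open("dna0@0", 1518,
                         PF_RING_PROMISC | PF_RING_DNA_SYMMETRIC_RSS);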

However, isn't this only useful for plain DNA, since DNA clusters already
do their own RSS?

>> My suspicion is this is something to do with "net device ops", which is
>> enabled in 3.2.x, but I haven't managed to work out how.
>
> I will check this.
>
>>
>> I also have a problem with one of the three apps, ARGUS, using 100% CPU.
>> It uses select() with a timeout to generate idle time, but with DNA
>> clusters this always seems to return immediately, even with no traffic.
>> Is that expected?
>
> poll() on the fd returned by pfring_get_selectable_fd() should work, I will check also this.

This is with ARGUS monitoring dnacluster:1@Y. I'm not sure whether the
same problem appears in DNA, with dna0@Y, but I had no problems with
plain PF_RING and just eth1 (no multiqueue).

Best Wishes,
Chris

--
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
Christopher Wakelin, c.d.wakelin [at] reading
IT Services Centre, The University of Reading, Tel: +44 (0)118 378 2908
Whiteknights, Reading, RG6 6AF, UK Fax: +44 (0)118 975 3094




cardigliano at ntop

Jul 17, 2012, 7:47 AM

Post #5 of 6
Re: libzero CPU affinity

Chris
see inline

On Jul 17, 2012, at 2:28 PM, Chris Wakelin wrote:

> On 17/07/12 12:07, Alfredo Cardigliano wrote:
>> Chris
>> sorry but I just realized my answer never reached the ml, please see inline
>
> Ah, glad to know it wasn't because the question have no answers :)
>
>>> 1) If I use fan-out in libzero to duplicate packets (using my modified
>>> version of pfdnacluster_master), is it best to have each application use
>>> the same CPU affinity for each thread?
>>>
> ...
>>
>> Yes, this affinity looks good
>>
>>>
>>> 2) I can't use all my cores easily with this as 3 apps x 16 cores > 32.
>>> Is there any way to increase this (probably not)?. Would 64 bits be slower?
>>
>> No, currently there is no way. We tought 32 was enough, a good compromise between performance and flexibility, but we will consider whether to move to 64.
>
> I think my case might be a bit extreme. However if it's no slower, there
> might be a case in the 64-bit library.
>
>>> 3) (Mad idea!) Could I use ixgbe DNA + RSS to split the traffic up
>>> (possibly with the modified RSS function in the driver to make it
>>> symmetric) and then libzero to fan-out to the three apps?
>>>
>
> ...
>
>>
>> Mad idea but yes, you could :-) (probably two queues are enough, 3x8 + 3x8)
>>
>
> You mean "insmod ixgbe RSS=2,2" then "pfdnalcuster_master_cdw -i dna0@0
> -c0 -n 8 -D3" and "-i dna0@1 -c1 -n 8 -D3" (I changed to "-D x" for the
> multiplier)?

Yes

>
>>>
>>> I did also try "3)" which worked pretty well, except that, for some
>>> reason, the modified ixgbe RSS hash function (overwriting the RSS key
>>> with repeated "0xAFE3AFE3") doesn't seem to do anything in Ubuntu 12.04
>>> (i.e. 3.2.0 kernel) but works in Ubuntu 10.04 (2.6.32).
>>
>> Please try updating to latest SVN and using the pfring_open() flag PF_RING_DNA_SYMMETRIC_RSS.
>> BTW the RSS key we are using is different.
>>
>
> That's in PF_RING 5.4.4, I think. I'd already patched Suricata to use
> it, and I was going to use my patch for an environment variable check in
> libpcap (now in SVN) so that ARGUS could use it too (I'll be running it
> on its own interface to start with).
>
> However, isn't this useful only for plain DNA, and DNA clusters do their
> own RSS already?

Yes, RSS is useless with the DNA cluster, as it uses a software distribution function.

>
>>> My suspicion is this is something to do with "net device ops", which is
>>> enabled in 3.2.x, but I haven't managed to work out how.
>>
>> I will check this.
>>
>>>
>>> I also have a problem with one of the three apps, ARGUS, using 100% CPU.
>>> It uses select() with a timeout to generate idle time, but with DNA
>>> clusters this always seems to return immediately, even with no traffic.
>>> Is that expected?
>>
>> poll() on the fd returned by pfring_get_selectable_fd() should work, I will check also this.
>
> This is with ARGUS monitoring dnacluster:1@Y. I'm not sure whether the
> same problem appears in DNA, with dna0@Y, but I had no problems with
> plain PF_RING and just eth1 (no multi queue).

Understood; however, the DNA cluster and standard DNA are two different modules.

Thanks
Alfredo

>
> Best Wishes,
> Chris
>
> --
> --+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
> Christopher Wakelin, c.d.wakelin [at] reading
> IT Services Centre, The University of Reading, Tel: +44 (0)118 378 2908
> Whiteknights, Reading, RG6 6AF, UK Fax: +44 (0)118 975 3094
>
>



c.d.wakelin at reading

Jul 25, 2012, 3:55 PM

Post #6 of 6
Re: libzero CPU affinity

On 17/07/12 15:47, Alfredo Cardigliano wrote:
> Chris
> see inline
>
> On Jul 17, 2012, at 2:28 PM, Chris Wakelin wrote:

<snip>

>>>> I also have a problem with one of the three apps, ARGUS, using 100% CPU.
>>>> It uses select() with a timeout to generate idle time, but with DNA
>>>> clusters this always seems to return immediately, even with no traffic.
>>>> Is that expected?
>>>
>>> poll() on the fd returned by pfring_get_selectable_fd() should work, I will check also this.
>>
>> This is with ARGUS monitoring dnacluster:1@Y. I'm not sure whether the
>> same problem appears in DNA, with dna0@Y, but I had no problems with
>> plain PF_RING and just eth1 (no multi queue).
>
> Understood; however, the DNA cluster and standard DNA are two different modules.
>

I've been doing some more testing with this. I wrote a counting script
around the PF_RING "Num Poll Calls" stats and it turns out that ARGUS is
managing 1.8m polls a second!
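
(In case it's useful, a rough C equivalent of the counting, pointed at the
socket's stats file under /proc/net/pf_ring/; the exact filename depends
on the process and device, so treat the path as a placeholder:)

/* Rough sketch: print the per-second delta of the "Num Poll Calls" counter. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static long read_polls(const char *path) {
  char line[256];
  long v = -1;
  FILE *f = fopen(path, "r");
  if (f == NULL) return -1;
  while (fgets(line, sizeof(line), f)) {
    char *colon = strchr(line, ':');
    if (strstr(line, "Num Poll Calls") && colon) {
      sscanf(colon + 1, " %ld", &v);
      break;
    }
  }
  fclose(f);
  return v;
}

int main(int argc, char *argv[]) {
  long prev, cur;
  if (argc < 2) {
    fprintf(stderr, "usage: %s /proc/net/pf_ring/<socket-stats-file>\n", argv[0]);
    return 1;
  }
  prev = read_polls(argv[1]);
  for (;;) {
    sleep(1);
    cur = read_polls(argv[1]);
    if (prev >= 0 && cur >= 0)
      printf("%ld polls/sec\n", cur - prev);
    prev = cur;
  }
}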

Bro IDS, which also uses libpcap, only does 100k polls a second on the
same traffic. Bro calls select() with a 0 timeout, though, and has a 20
microsecond pause built in by calling select(0,0,0,0,&timeout). I tried
adding something similar to ARGUS, but it doesn't make much difference.

Anyway, I also found that the only pfcount* example app that uses
select/poll seems to be pfcount_aggregator, and with a cluster on an
idle interface, dna1,

pfcount_aggregator -i dnacluster:2@0 -p 200 -w 128 -s

is making 3m polls a second, whereas with "-i dna1" it's only 10 (mind
you, I expected it to be 5?).

I haven't tried the latest SVN yet, this is all with PF_RING 5.4.4.

I've got a distribution function (a further generalisation of
pfdnacluster_master: "-D" for the number of duplicates of the "-n" queues
and "-A" for the number of additional all-traffic queues, if anyone is
interested) that gives ARGUS all the traffic on one queue, while Bro and
Suricata each get the traffic split 8 ways (so "-n 8 -D 2 -A 1"). This
means ARGUS just ties up one CPU core, which I can live with for now, and
doesn't seem to be missing anything :-)
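
(The flow balancing itself is conceptually just a direction-independent
hash, so both directions of a connection land on the same queue; an
illustrative sketch only, not the actual libzero callback:)

/* Illustrative only: src->dst and dst->src traffic of the same flow map
   to the same one of n_queues queues. */
#include <stdint.h>

static uint32_t symmetric_queue(uint32_t src_ip, uint32_t dst_ip,
                                uint16_t src_port, uint16_t dst_port,
                                uint32_t n_queues) {
  /* XOR is commutative, so swapping src and dst changes nothing. */
  uint32_t h = (src_ip ^ dst_ip) + (uint32_t)(src_port ^ dst_port);
  return h % n_queues;
}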

Best Wishes,
Chris

--
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
Christopher Wakelin, c.d.wakelin [at] reading
IT Services Centre, The University of Reading, Tel: +44 (0)118 378 8439
Whiteknights, Reading, RG6 2AF, UK Fax: +44 (0)118 975 3094
