
Mailing List Archive: Linux Virtual Server: Users

[lvs-users] Sync daemon and many concurrent connections

 

 



siim at p6drad-teel

Oct 8, 2009, 8:31 AM

Post #1 of 3
[lvs-users] Sync daemon and many concurrent connections

Hi

We are planning to use LVS for a setup with millions of concurrent
(mostly idle) connections, and were setting up the sync daemon to avoid
a reconnect flood when the master fails.

Originally I was planning to ask for help, but it turned out to be one
of those cases where you go over the problem description and refine the
details until the problem ceases to exist. So instead I'll post the
results and what we needed to tune to get it working.

Short summary: the sync daemon works very well with a high connection
rate if you increase the rmem_default and wmem_default sysctls.

Initially, there was a problem with the sync_master daemon sending
updates. As it only sent updates once every second, the send buffer of
the socket filled up and we got ip_vs_sync_send_async errors in the
kernel log. We decreased the sleep time to 100ms, which gave slightly
better results, but net.core.wmem_max and net.core.wmem_default also
needed increasing (which probably means that the larger buffers alone
would have been enough and we could have left the kernel unchanged).
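
To illustrate why the flush interval and the send buffer size interact,
here is a rough back-of-the-envelope sketch. It assumes each sync
message is about 1500 bytes (roughly one Ethernet MTU; that figure is an
assumption, not something measured in this test) and reuses the 900
packets/s of sync traffic reported further down. Kernel per-packet
bookkeeping overhead is ignored, so real buffer needs would be higher.

```python
# Rough estimate of how many bytes of sync messages pile up between two
# flushes of the sync_master daemon. All constants are illustrative.

SYNC_MSG_BYTES = 1500      # assumption: about one Ethernet MTU per message
SYNC_MSGS_PER_SEC = 900    # sync daemon traffic reported in this post


def burst_bytes(flush_interval_s, msgs_per_sec=SYNC_MSGS_PER_SEC,
                msg_bytes=SYNC_MSG_BYTES):
    """Bytes handed to the socket in one go after sleeping flush_interval_s."""
    return round(msgs_per_sec * flush_interval_s * msg_bytes)


if __name__ == "__main__":
    for interval in (1.0, 0.1):
        kib = burst_bytes(interval) / 1024
        print(f"{interval}s interval: ~{kib:.0f} KiB per burst")
```

Under these assumptions a one-second interval hands the socket about
1.35 MB at once, and a 100ms interval about 135 KB. Either way the send
buffer has to be large enough to absorb a whole burst, which is
consistent with the buffer sysctls being the setting that mattered.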

After that we had problems on the sync_backup daemon side, whose receive
buffer now filled up from time to time, resulting in lost sync packets
(visible as UDP receive errors). So we also increased the rmem sysctls
quite a bit, which solved that problem as well.
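
For anyone wanting to watch for the same symptom, the UDP error
counters can be read from /proc/net/snmp on Linux. The following is a
minimal sketch, not the method used in this test; the function names
are made up for illustration. It relies only on the file's layout of a
"Udp:" header line followed by a "Udp:" value line.

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict."""
    # "UdpLite:" lines do not start with "Udp:", so they are skipped here.
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp header/value pair found")
    header, values = udp_lines[0], udp_lines[1]
    return {name: int(value) for name, value in zip(header[1:], values[1:])}


def read_udp_counters(path="/proc/net/snmp"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_udp_counters(f.read())


if __name__ == "__main__":
    counters = read_udp_counters()
    # InErrors growing over time on the backup suggests dropped datagrams.
    print({k: v for k, v in counters.items() if "Errors" in k})
```

Sampling this twice on the backup and comparing the values shows
whether receive errors are still increasing after the rmem sysctls have
been raised.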

Another consideration for mostly idle connections seems to be choosing
appropriate sync_threshold and TCP timeout (ipvsadm -L --timeout)
values. Our current plan is to increase the TCP timeout to 30 minutes
(1800 seconds) and reduce sync_threshold to (3 10) so that the
connections stay current on the backup even with relatively infrequent
keepalives being sent.
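
The interaction between sync_threshold and the keepalive interval can
be sketched as follows. This models the rule as described in the kernel
IPVS sysctl documentation, as far as I understand it: a connection is
synced whenever its incoming packet count modulo the period equals the
threshold. The 60 second keepalive interval is purely hypothetical (the
real interval is not stated here), and the sketch assumes each keepalive
counts as exactly one incoming packet.

```python
def synced_packet_numbers(threshold, period, packets):
    """Incoming-packet counts (1-based) at which a connection is synced."""
    return [n for n in range(1, packets + 1) if n % period == threshold]


def seconds_between_syncs(period, keepalive_interval_s):
    """Steady-state gap between syncs if one packet arrives per keepalive."""
    return period * keepalive_interval_s


KEEPALIVE_INTERVAL_S = 60   # hypothetical; not stated in the post
TCP_TIMEOUT_S = 1800        # planned TCP timeout from the post

if __name__ == "__main__":
    print(synced_packet_numbers(3, 10, 30))   # planned setting (3 10)
    print(synced_packet_numbers(0, 1, 5))     # test setting (0 1)
    gap = seconds_between_syncs(10, KEEPALIVE_INTERVAL_S)
    print(f"sync every {gap}s, timeout {TCP_TIMEOUT_S}s, "
          f"refreshed in time: {gap < TCP_TIMEOUT_S}")
```

With (3 10) a connection is synced on its 3rd, 13th, 23rd packet and so
on, so the period multiplied by the keepalive interval should stay
comfortably below the TCP timeout for the backup to keep seeing updates.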

Hardware for testing was a few dual quad-core Opterons with 16GB of
memory, dual e1000 and onboard dual bnx network cards, with
sync_threshold = 0 1 (sync on every packet, for testing), using LVS-NAT.
Set up and run by a very diligent coworker :)

Some results:
- 8.5 million connections, all synced
- ~100K packets/s of keepalives on the external interface
- 900 packets/s of sync daemon traffic
- just over 100Mbps of traffic (short packets)

On the primary LVS: ~1% of one core for the sync_master daemon, one core
at 10-40% in softirq (ipvs?), ~1.7GB of memory used in total.
On the secondary LVS: ~10% of one core for the sync_backup daemon, one
core at 20% in softirq (ipvs?), ~1.7GB of memory used in total.
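
A few derived figures help put these numbers in perspective. The sketch
below only does arithmetic on the rounded values reported above; the
variable names are made up, and since the memory figure is total system
usage, the per-connection value is an upper bound rather than the size
of an IPVS connection entry.

```python
# Rounded figures taken from the results above.
connections = 8.5e6        # concurrent connections
keepalive_pps = 100e3      # packets/s on the external interface
sync_pps = 900             # sync daemon packets/s
traffic_bps = 100e6        # "just over 100Mbps"
memory_bytes = 1.7e9       # total memory used on each director

# Average size of a packet on the external interface, in bytes.
avg_packet_bytes = traffic_bps / 8 / keepalive_pps

# External packets handled per sync packet sent to the backup.
packets_per_sync_packet = keepalive_pps / sync_pps

# Upper bound on memory per connection (includes everything else running).
memory_per_connection = memory_bytes / connections

# Average time between packets on any single connection, in seconds.
seconds_between_packets = connections / keepalive_pps

if __name__ == "__main__":
    print(f"avg packet size:        {avg_packet_bytes:.0f} bytes")
    print(f"packets per sync msg:   {packets_per_sync_packet:.0f}")
    print(f"memory per connection: <{memory_per_connection:.0f} bytes")
    print(f"packet gap per conn:    {seconds_between_packets:.0f} s")
```

This works out to about 125 bytes per packet, roughly 111 external
packets per sync packet, at most about 200 bytes of memory per
connection, and one packet per connection every 85 seconds or so (with
both directions lumped together, since the packet rate is for the
interface as a whole).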

Failover with keepalived worked as expected once all connections were
established.

The likely limiting factor seems to be the one core at 40% in softirq.
This was also the core which serviced the bnx network card, so it's
possible that switching entirely to e1000 would alleviate the problem
(the core responsible for e1000 was at ~10% in softirq). Also, time
spent in softirq was not really consistent and sometimes dropped quite
low (maybe an altogether different problem).
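
Per-core softirq time like this can be derived from two samples of
/proc/stat. The sketch below is one way to do it and is not the tool
used in this test; the function names are hypothetical. It assumes the
usual column order of the per-CPU lines (user, nice, system, idle,
iowait, irq, softirq, steal) and ignores any later columns.

```python
SOFTIRQ = 6    # index of the softirq column in a "cpuN" line
N_FIELDS = 8   # user nice system idle iowait irq softirq steal


def parse_cpu_times(stat_text):
    """Map 'cpuN' -> list of time counters from /proc/stat-style text."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 ..."; skip the aggregate "cpu" line.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            times[fields[0]] = [int(v) for v in fields[1:1 + N_FIELDS]]
    return times


def softirq_share(before, after):
    """Fraction of each core's time spent in softirq between two samples."""
    shares = {}
    for cpu, new in after.items():
        delta = [n - o for n, o in zip(new, before[cpu])]
        total = sum(delta)
        shares[cpu] = delta[SOFTIRQ] / total if total else 0.0
    return shares


if __name__ == "__main__":
    import time
    with open("/proc/stat") as f:
        first = parse_cpu_times(f.read())
    time.sleep(1)
    with open("/proc/stat") as f:
        second = parse_cpu_times(f.read())
    for cpu, share in sorted(softirq_share(first, second).items()):
        print(f"{cpu}: {share:.0%} softirq")
```

Sampling repeatedly over a longer period would also show the kind of
inconsistency in softirq time mentioned above.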

Interrupt load was low (8K/s in total) with both e1000 and bnx cards in
use, although we still superstitiously suspect Broadcom is not quite as
scalable as Intel.

Siim

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users[at]LinuxVirtualServer.org
Send requests to lvs-users-request[at]LinuxVirtualServer.org
or go to http://lists.graemef.net/mailman/listinfo/lvs-users


sashi.kant at eng

Oct 8, 2009, 9:23 AM

Post #2 of 3
Re: [lvs-users] Sync daemon and many concurrent connections

Hello Siim,

We had a similar packets/second rate in our environment and had issues
with dropped packets during floods of incoming requests. We also have a
very similar traffic pattern (small packets and lots of them). Our
client connections are not long-running.

We started seeing issues with Broadcom cards at ~40K PPS; swapping them
for Intel cards helped us reach ~80K PPS.

Ultimately we had to replace the Intel (e1000) type cards with newer
Intel cards with MSI-X capabilities. These cards use the igb driver,
which needed to be compiled for Debian etch and lenny systems.

During our tests we could reach up to 600K PPS using these cards
without hogging the interrupts.
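
One way to check how a card's interrupts are spread over the cores is
to sum the per-CPU columns of /proc/interrupts for that device. This is
a minimal sketch rather than anything used in the tests described here;
the function name and the interface name "eth0" are placeholders.

```python
def interrupts_per_cpu(interrupts_text, device):
    """Sum interrupt counts per CPU for lines whose description names device."""
    lines = interrupts_text.splitlines()
    cpus = lines[0].split()                 # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        fields = line.split()
        # Layout: "IRQ:" then one count per CPU, then the description.
        description = " ".join(fields[1 + len(cpus):])
        if device not in description:
            continue
        for i, count in enumerate(fields[1:1 + len(cpus)]):
            totals[i] += int(count)
    return dict(zip(cpus, totals))


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        print(interrupts_per_cpu(f.read(), "eth0"))   # placeholder name
```

A card with several MSI-X vectors typically shows up as several lines
for the same interface, which can then be spread over different cores
instead of all landing on one.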

Hope this helps

-Sashi

On Oct 8, 2009, at 8:31 AM, Siim Põder wrote:

> [...]


horms at verge

Oct 8, 2009, 3:51 PM

Post #3 of 3
Re: [lvs-users] Sync daemon and many concurrent connections

On Thu, Oct 08, 2009 at 09:23:40AM -0700, Sashi Kant wrote:
> Hello Siim,
>
> We had a similar packets/second rate in our environment and had issues
> with dropped packets during floods of incoming requests. We also have a
> very similar traffic pattern (small packets and lots of them). Our
> client connections are not long-running.
>
> We started seeing issues with Broadcom cards at ~40K PPS; swapping them
> for Intel cards helped us reach ~80K PPS.
>
> Ultimately we had to replace the Intel (e1000) type cards with newer
> Intel cards with MSI-X capabilities. These cards use the igb driver,
> which needed to be compiled for Debian etch and lenny systems.
>
> During our tests we could reach up to 600K PPS using these cards
> without hogging the interrupts.

Out of curiosity, are you using the Intel 82576 or the 82575?


