[lvs-users] issue with closing connections with Julian's nfct patches

 

 



windo at p6drad-teel

Apr 7, 2008, 9:23 AM

Post #1 of 3 (556 views)

Yo

Kernel 2.6.22 + Julian's nfct patch.
/proc/sys/net/ipv4/vs/:
snat_reroute=1
conntrack=1

I have a server behind LVS-NAT that sends all its data quite fast,
followed by a FIN. After that, it retransmits lost packets as needed.
The problem is that, for some reason, the connection-terminating FIN
(with the last ACK) from the CLIENT isn't delivered to the RS in some
cases, so the RS keeps on sending the last packet until it gives up.

The following rules in FORWARD chain:
ACCEPT  0  --  0.0.0.0/0  0.0.0.0/0  ctstate ESTABLISHED
DROP    0  --  0.0.0.0/0  0.0.0.0/0  ctstate INVALID
LOG     0  --  0.0.0.0/0  0.0.0.0/0  LOG flags 0 level 4 prefix `forward: '

Netfilter seems to be matching a lot of ESTABLISHED and some INVALID
packets. All those retransmissions from RS to CLIENT end up in the LOG
rule and get dropped, so for them no ctstate was found?
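
To make the rule order concrete, here is a minimal Python sketch of how a packet walks the three rules above. It is only a model, not netfilter code: the names (RULES, POLICY, forward_verdict) are made up, the chain policy of DROP is taken from the fuller listing later in this thread, and "NEW" is just an assumed example of a state that is neither ESTABLISHED nor INVALID.

```python
# Rules in the order listed above. LOG is non-terminating, so a logged
# packet keeps going and ends up at the chain policy.
RULES = [
    ("ESTABLISHED", "ACCEPT"),
    ("INVALID", "DROP"),
    (None, "LOG"),  # None stands for "matches any ctstate"
]
POLICY = "DROP"  # assumption: policy as shown in the later FORWARD listing


def forward_verdict(ctstate):
    """Return (verdict, logged) for a packet with the given ctstate."""
    logged = False
    for match, target in RULES:
        if match is not None and match != ctstate:
            continue  # rule does not match this packet
        if target == "LOG":
            logged = True  # log and continue with the next rule
            continue
        return target, logged
    return POLICY, logged


for state in ("ESTABLISHED", "INVALID", "NEW"):
    print(state, forward_verdict(state))
```

In this model, a packet that netfilter no longer considers ESTABLISHED (and does not flag INVALID) is logged and then dropped by the policy, which matches what the retransmissions from the RS run into.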

Packet traces (from external and internal interfaces: 1.2.3.4 VIP,
10.0.0.1 RIP, 4.3.2.1 CIP):

external:
13:34:04 4.3.2.1.9876 > 1.2.3.4.8888: S 3015053360:3015053360(0)
13:34:04 1.2.3.4.8888 > 4.3.2.1.9876: S 3950144430:3950144430(0) ack 3015053361
13:34:05 4.3.2.1.9876 > 1.2.3.4.8888: . ack 1
13:34:05 4.3.2.1.9876 > 1.2.3.4.8888: P 1:6(5) ack 1
13:34:05 1.2.3.4.8888 > 4.3.2.1.9876: . ack 6
13:34:05 1.2.3.4.8888 > 4.3.2.1.9876: P 1:6(5) ack 6
13:34:05 4.3.2.1.9876 > 1.2.3.4.8888: . ack 6
13:34:05 4.3.2.1.9876 > 1.2.3.4.8888: P 6:216(210) ack 6
13:34:05 1.2.3.4.8888 > 4.3.2.1.9876: . ack 216
13:34:06 4.3.2.1.9876 > 1.2.3.4.8888: P 216:323(107) ack 6
13:34:06 1.2.3.4.8888 > 4.3.2.1.9876: . ack 323
13:34:06 1.2.3.4.8888 > 4.3.2.1.9876: P 6:22(16) ack 323
13:34:06 1.2.3.4.8888 > 4.3.2.1.9876: . 22:1462(1440) ack 323
13:34:07 4.3.2.1.9876 > 1.2.3.4.8888: . ack 22
13:34:07 1.2.3.4.8888 > 4.3.2.1.9876: . 1462:2902(1440) ack 323
13:34:09 1.2.3.4.8888 > 4.3.2.1.9876: . 22:1462(1440) ack 323
13:34:10 4.3.2.1.9876 > 1.2.3.4.8888: . ack 1462
13:34:10 1.2.3.4.8888 > 4.3.2.1.9876: . 1462:2902(1440) ack 323
13:34:11 4.3.2.1.9876 > 1.2.3.4.8888: . ack 2902
13:34:11 1.2.3.4.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
13:34:15 1.2.3.4.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
...skip some...
13:34:21 1.2.3.4.8888 > 4.3.2.1.9876: . 60502:61942(1440) ack 323
13:34:21 1.2.3.4.8888 > 4.3.2.1.9876: . 61942:63382(1440) ack 323
13:34:21 1.2.3.4.8888 > 4.3.2.1.9876: FP 63382:64463(1081) ack 323
13:34:22 4.3.2.1.9876 > 1.2.3.4.8888: . ack 2902
13:34:22 4.3.2.1.9876 > 1.2.3.4.8888: . ack 2902
13:34:25 1.2.3.4.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
13:34:25 4.3.2.1.9876 > 1.2.3.4.8888: . ack 7222
13:34:25 1.2.3.4.8888 > 4.3.2.1.9876: . 7222:8662(1440) ack 323
13:34:43 1.2.3.4.8888 > 4.3.2.1.9876: . 7222:8662(1440) ack 323
13:34:44 4.3.2.1.9876 > 1.2.3.4.8888: . ack 8662
13:34:44 1.2.3.4.8888 > 4.3.2.1.9876: . 8662:10102(1440) ack 323
13:35:21 1.2.3.4.8888 > 4.3.2.1.9876: . 8662:10102(1440) ack 323
13:35:22 4.3.2.1.9876 > 1.2.3.4.8888: . ack 10102
13:35:22 1.2.3.4.8888 > 4.3.2.1.9876: . 10102:11542(1440) ack 323
13:35:22 4.3.2.1.9876 > 1.2.3.4.8888: . ack 11542
13:35:22 1.2.3.4.8888 > 4.3.2.1.9876: . 11542:12982(1440) ack 323
13:35:23 4.3.2.1.9876 > 1.2.3.4.8888: . ack 12982
13:35:23 1.2.3.4.8888 > 4.3.2.1.9876: . 12982:14422(1440) ack 323
13:35:24 4.3.2.1.9876 > 1.2.3.4.8888: . ack 14422
13:35:24 1.2.3.4.8888 > 4.3.2.1.9876: . 14422:15862(1440) ack 323
13:35:25 4.3.2.1.9876 > 1.2.3.4.8888: . ack 15862
13:35:25 1.2.3.4.8888 > 4.3.2.1.9876: . 15862:17302(1440) ack 323
13:35:25 4.3.2.1.9876 > 1.2.3.4.8888: . ack 17302
13:35:25 1.2.3.4.8888 > 4.3.2.1.9876: . 17302:18742(1440) ack 323
13:37:25 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
13:37:28 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
13:37:33 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
13:37:45 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
13:38:09 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302

internal:
13:34:05 4.3.2.1.9876 > 10.0.0.1.8888: . ack 1
13:34:05 4.3.2.1.9876 > 10.0.0.1.8888: P 1:6(5) ack 1
13:34:05 10.0.0.1.8888 > 4.3.2.1.9876: . ack 6
13:34:05 10.0.0.1.8888 > 4.3.2.1.9876: P 1:6(5) ack 6
13:34:05 4.3.2.1.9876 > 10.0.0.1.8888: . ack 6
13:34:05 4.3.2.1.9876 > 10.0.0.1.8888: P 6:216(210) ack 6
13:34:05 10.0.0.1.8888 > 4.3.2.1.9876: . ack 216
13:34:06 4.3.2.1.9876 > 10.0.0.1.8888: P 216:323(107) ack 6
13:34:06 10.0.0.1.8888 > 4.3.2.1.9876: . ack 323
13:34:06 10.0.0.1.8888 > 4.3.2.1.9876: P 6:22(16) ack 323
13:34:06 10.0.0.1.8888 > 4.3.2.1.9876: . 22:1462(1440) ack 323
13:34:07 4.3.2.1.9876 > 10.0.0.1.8888: . ack 22
13:34:07 10.0.0.1.8888 > 4.3.2.1.9876: . 1462:2902(1440) ack 323
13:34:09 10.0.0.1.8888 > 4.3.2.1.9876: . 22:1462(1440) ack 323
13:34:10 4.3.2.1.9876 > 10.0.0.1.8888: . ack 1462
13:34:10 10.0.0.1.8888 > 4.3.2.1.9876: . 1462:2902(1440) ack 323
13:34:11 4.3.2.1.9876 > 10.0.0.1.8888: . ack 2902
13:34:11 10.0.0.1.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
13:34:15 10.0.0.1.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
...skip some...
13:34:21 10.0.0.1.8888 > 4.3.2.1.9876: . 60502:61942(1440) ack 323
13:34:21 10.0.0.1.8888 > 4.3.2.1.9876: . 61942:63382(1440) ack 323
13:34:21 10.0.0.1.8888 > 4.3.2.1.9876: FP 63382:64463(1081) ack 323
13:34:22 4.3.2.1.9876 > 10.0.0.1.8888: . ack 2902
13:34:22 4.3.2.1.9876 > 10.0.0.1.8888: . ack 2902
13:34:25 10.0.0.1.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
13:34:25 4.3.2.1.9876 > 10.0.0.1.8888: . ack 7222
13:34:25 10.0.0.1.8888 > 4.3.2.1.9876: . 7222:8662(1440) ack 323
13:34:43 10.0.0.1.8888 > 4.3.2.1.9876: . 7222:8662(1440) ack 323
13:34:44 4.3.2.1.9876 > 10.0.0.1.8888: . ack 8662
13:34:44 10.0.0.1.8888 > 4.3.2.1.9876: . 8662:10102(1440) ack 323
13:35:21 10.0.0.1.8888 > 4.3.2.1.9876: . 8662:10102(1440) ack 323
13:35:22 4.3.2.1.9876 > 10.0.0.1.8888: . ack 10102
13:35:22 10.0.0.1.8888 > 4.3.2.1.9876: . 10102:11542(1440) ack 323
13:35:22 4.3.2.1.9876 > 10.0.0.1.8888: . ack 11542
13:35:22 10.0.0.1.8888 > 4.3.2.1.9876: . 11542:12982(1440) ack 323
13:35:23 4.3.2.1.9876 > 10.0.0.1.8888: . ack 12982
13:35:23 10.0.0.1.8888 > 4.3.2.1.9876: . 12982:14422(1440) ack 323
13:35:24 4.3.2.1.9876 > 10.0.0.1.8888: . ack 14422
13:35:24 10.0.0.1.8888 > 4.3.2.1.9876: . 14422:15862(1440) ack 323
13:35:25 4.3.2.1.9876 > 10.0.0.1.8888: . ack 15862
13:35:25 10.0.0.1.8888 > 4.3.2.1.9876: . 15862:17302(1440) ack 323
13:35:25 4.3.2.1.9876 > 10.0.0.1.8888: . ack 17302
13:35:25 10.0.0.1.8888 > 4.3.2.1.9876: . 17302:18742(1440) ack 323
13:36:38 10.0.0.1.8888 > 4.3.2.1.9876: . 17302:18742(1440) ack 323

As seen, the RS keeps on trying to send the last packet while CLIENT
keeps on trying to send the FIN.
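
One way to see which packets the director did not pass on is to diff the two captures after mapping the RIP back to the VIP. A minimal Python sketch, using only the tail of the traces above; it assumes both captures stamp the same packet with the same second, and the helper names (normalize, only_internal, only_external) are made up for illustration:

```python
from collections import Counter

VIP, RIP = "1.2.3.4", "10.0.0.1"

# Tail of the external capture (copied from the trace above).
EXTERNAL = """\
13:35:25 4.3.2.1.9876 > 1.2.3.4.8888: . ack 17302
13:35:25 1.2.3.4.8888 > 4.3.2.1.9876: . 17302:18742(1440) ack 323
13:37:25 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
13:37:28 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
"""

# Tail of the internal capture (copied from the trace above).
INTERNAL = """\
13:35:25 4.3.2.1.9876 > 10.0.0.1.8888: . ack 17302
13:35:25 10.0.0.1.8888 > 4.3.2.1.9876: . 17302:18742(1440) ack 323
13:36:38 10.0.0.1.8888 > 4.3.2.1.9876: . 17302:18742(1440) ack 323
"""


def normalize(trace):
    """Count trace lines with the RIP rewritten to the VIP, so that both
    captures describe the same packet with the same text."""
    return Counter(
        line.replace(RIP + ".", VIP + ".")
        for line in trace.splitlines()
        if line.strip()
    )


ext, internal = normalize(EXTERNAL), normalize(INTERNAL)
only_internal = sorted((internal - ext).elements())  # never left the director
only_external = sorted((ext - internal).elements())  # never reached the RS

print("seen only on the internal side:")
for line in only_internal:
    print("  " + line)
print("seen only on the external side:")
for line in only_external:
    print("  " + line)
```

On this excerpt the diff leaves the 13:36:38 retransmission from the RS on the internal side only, and the client's FINs on the external side only.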

I'm not entirely sure if I was able to read the said information fast
enough (lots of connections, big tables) but it seems that at that time
ipvsadm -L --connection shows that connection in "FIN_WAIT" while
/proc/net/ip_conntrack does not have an entry for it at all.

There is also a variation of this issue, where the final FIN is
delivered from CLIENT to RS, but the RS's ACK isn't delivered to the
CLIENT, so the client still keeps on sending FINs. In that case, ipvsadm
shows the connection in "TIME_WAIT" state (still nothing in conntrack).

Altogether, a few percent of connections are affected by this. My
interpretation is that, for some reason, the LVS code seems to remove
the conntrack entry immediately when a final FIN is seen, and stops
forwarding packets after that. My iptables rules stop the answers going
out, because the connection is no longer ESTABLISHED.

Siim


_______________________________________________
LinuxVirtualServer.org mailing list - lvs-users [at] LinuxVirtualServer
Send requests to lvs-users-request [at] LinuxVirtualServer
or go to http://lists.graemef.net/mailman/listinfo/lvs-users


ja at ssi

Apr 7, 2008, 2:08 PM

Post #2 of 3 (530 views)
Re: [lvs-users] issue with closing connections with Julian's nfct patches

Hello,

On Mon, 7 Apr 2008, Siim Põder wrote:

> I have a server behind LVS-NAT that sends all its data quite fast
> followed by a FIN. after that, it retransmits lost packets as needed.
> the problem is, that for some reason, the connection-terminating FIN
> (with the last ACK) from CLIENT isn't delivered to the RS (in some
> cases), which keeps on sending the last packet until it gives up.

Server sends FIN, IPVS switches to FIN_WAIT (2 minutes):

> 13:34:21 1.2.3.4.8888 > 4.3.2.1.9876: FP 63382:64463(1081) ack 323

2 minutes passed:

> 13:35:25 4.3.2.1.9876 > 1.2.3.4.8888: . ack 15862
> 13:35:25 1.2.3.4.8888 > 4.3.2.1.9876: . 15862:17302(1440) ack 323
> 13:35:25 4.3.2.1.9876 > 1.2.3.4.8888: . ack 17302
> 13:35:25 1.2.3.4.8888 > 4.3.2.1.9876: . 17302:18742(1440) ack 323
> 13:37:25 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
> 13:37:28 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
> 13:37:33 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302

> I'm not entirely sure if I was able to read the said information fast
> enough (lots of connections, big tables) but it seems that at that time
> ipvsadm -L --connection shows that connection in "FIN_WAIT" while
> /proc/net/ip_conntrack does not have an entry for it at all.

Hm, you can try to increase the FIN_WAIT timeout in Netfilter.
IPVS extends its timeout by 2 mins on every packet and, as you see,
the IPVS connection is still present. Maybe you can play with the
following parameters in /proc/sys/net/netfilter/:

nf_conntrack_tcp_max_retrans
nf_conntrack_tcp_timeout_fin_wait

I'm not an expert on Netfilter timeouts; the idea is to
increase the FIN_WAIT timeout. You can do the same for IPVS timeouts,
see ipvsadm --set.

> Altogether, a few percent of connections are affected by this. My
> interpretation is that, for some reason, the LVS code seems to remove
> the conntrack entry immediately when a final FIN is seen, and stops
> forwarding

... or Netfilter removes its connection when its timer
strictly expires after 2 mins in FIN_WAIT.
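
The "strict timer" idea can be checked against the timestamps in the first post. A small Python sketch, assuming a 120 s FIN_WAIT timeout (the "2 mins" above) and taking the 13:36:38 retransmission, which shows up only in the internal trace, as the first dropped packet; the variable names are made up, and the traces have 1-second resolution with some lines skipped, so this is a plausibility check rather than proof:

```python
from datetime import datetime, timedelta

# Assumption: 120 s, i.e. the "2 mins" FIN_WAIT timeout mentioned above.
FIN_WAIT_TIMEOUT = timedelta(seconds=120)


def at(hms):
    """Parse an HH:MM:SS trace timestamp (the date part is irrelevant)."""
    return datetime.strptime(hms, "%H:%M:%S")


server_fin = at("13:34:21")      # FP from the RS
last_forwarded = at("13:35:25")  # last RS segment seen on both interfaces
first_dropped = at("13:36:38")   # RS retransmit seen on the internal side only

# Timer started at the FIN and never refreshed:
strict_deadline = server_fin + FIN_WAIT_TIMEOUT
# Timer restarted by every forwarded packet:
refreshed_deadline = last_forwarded + FIN_WAIT_TIMEOUT

print("strict expiry:   ", strict_deadline.time())
print("refreshed expiry:", refreshed_deadline.time())
print("dropped after strict expiry:    ", first_dropped > strict_deadline)
print("dropped before refreshed expiry:", first_dropped < refreshed_deadline)
```

With these numbers the dropped retransmission falls after the strict deadline (13:36:21) but before the refreshed one (13:37:25), which fits a timer that was not extended by the intermediate packets.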

Regards

--
Julian Anastasov <ja [at] ssi>



windo at p6drad-teel

Apr 10, 2008, 6:38 AM

Post #3 of 3 (512 views)
Re: [lvs-users] issue with closing connections with Julian's nfct patches

Yo!

Julian Anastasov wrote:
>> why should the timer expire, if there are packets transmitted? also, i
>> set the timer to 8 minutes, longer than those connections. still the
>> same behaviour:
> Maybe it is not only the timeout; there are retransmission
> counters used in Netfilter.

ok, i set all netfilter tcp timeouts to 10min, retransmissions to 50 and
tcp_be_liberal + ipvs tcpfin timeout to 10min. it seemed to make the
situation better (not perfect though).
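
For reference, the settings described above could be put together roughly like this. A dry-run Python sketch that only prints what it would do; it assumes the sysctls live in /proc/sys/net/netfilter/ under the nf_conntrack_tcp_* names (tcp_be_liberal written out as nf_conntrack_tcp_be_liberal), and that a 0 passed to ipvsadm --set leaves that timeout unchanged. The function names are made up.

```python
from pathlib import Path

NF_DIR = Path("/proc/sys/net/netfilter")  # assumption: location on this kernel


def plan_writes(sysctl_names, timeout_s=600, max_retrans=50):
    """Map sysctl name -> new value: every TCP timeout to timeout_s,
    plus the retransmission limit and the be_liberal switch."""
    plan = {
        name: timeout_s
        for name in sorted(sysctl_names)
        if name.startswith("nf_conntrack_tcp_timeout_")
    }
    plan["nf_conntrack_tcp_max_retrans"] = max_retrans
    plan["nf_conntrack_tcp_be_liberal"] = 1
    return plan


def ipvs_command(tcpfin_s=600):
    """ipvsadm --set takes tcp, tcpfin and udp timeouts in seconds;
    0 is used here to keep the current tcp and udp values."""
    return ["ipvsadm", "--set", "0", str(tcpfin_s), "0"]


# Dry run: list what is present and print the equivalent shell commands.
names = [p.name for p in NF_DIR.iterdir()] if NF_DIR.is_dir() else []
for name, value in plan_writes(names).items():
    print(f"echo {value} > {NF_DIR / name}")
print(" ".join(ipvs_command()))
```

Nothing is written by this sketch; the printed lines would have to be run as root to take effect.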

looking at packet traces and traceroutes to those clients that still
have trouble, it seems they are far away (network path wise), have
packet loss or unstable latency to us. those conditions could cause a
lot of retransmissions and also timeouts in various connection states.

looking at it from this perspective: if you have some clients with a
really bad link to you, it probably makes sense to have longer timeouts
for fin_wait and close_wait (and possibly higher max_retrans). also
last_ack, close and time_wait if you like to get the connections closed
properly.

or just accept the fact that if their links (to us) suck, no amount
of netfilter tuning will help when the cables/hw between us and them
are broken.

>> i log the dropped packets and take the last one with tail. i search for
>> it from ipvsadm connection tables and conntrack:
>
> Any error message why packets are dropped in Netfilter?

It is because of the firewall. It accepts all ESTABLISHED packets and
drops the rest (logging them):

Chain FORWARD (policy DROP)
target  prot  opt  source     destination
ACCEPT  0     --   0.0.0.0/0  0.0.0.0/0    ctstate ESTABLISHED
DROP    0     --   0.0.0.0/0  0.0.0.0/0    ctstate INVALID
LOG     0     --   0.0.0.0/0  0.0.0.0/0    LOG flags 0 level 4 prefix `forward: '

There are NAT connections on the LVS and I want to allow only the LVS
connections and the NAT connections to pass. nfct patches make it
possible to ACCEPT ESTABLISHED packets and first packets of any new NAT
connection (outgoing) and drop the rest. The problem seems to be that
ipvs tables are not in sync with conntrack tables and some connections
are forgotten by netfilter that are still forwarded by ipvs.

> IPVS tries to drop netfilter conn only at one place:
> ip_vs_nfct_conn_drop(), called when IPVS connection is removed.
> So, I assume it is not happening in your case because IPVS conn
> is still present.

would it make sense to have a true ipvs state match for netfilter? that
way we can make a fw policy that accepts known outgoing ipvs traffic on
the same basis as ipvs would accept incoming?

Siim
