Mailing List Archive: Linux Virtual Server: Users

[lvs-users] Problem with udp/1812 on a 2-node UltraMonkey style HA cluster


john.donath at xb

Oct 24, 2007, 5:46 AM

Post #1 of 6 (404 views)
[lvs-users] Problem with udp/1812 on a 2-node UltraMonkey style HA cluster

Hi,

I have set up a 2-node HA cluster based on the Streamline High
Availability and Load Balancing concept.

The weird thing is that it works fantastically for tcp/80 but it doesn't
work properly for a udp service like radius (udp/1812).

-------------------
Problem description
-------------------

Assume we have both the http and radius service down on the failover
director (grind12):

[root@grind11 ~]# ipvsadm
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP  172.31.1.10:radius rr
  -> 172.31.1.11:radius           Local   1      0          0
TCP  172.31.1.10:http rr persistent 600
  -> 172.31.1.11:http             Local   1      0          0

I now can access the webserver but I don't get any response from the
radius service.

Here are results from tcpdump on both nodes when a radius request is
initiated:
[root@grind11 ~]# tcpdump -ni any -p udp and host 83.162.10.97
14:41:10.069858 IP 83.162.10.97.32843 > 172.31.1.10.radius: RADIUS,
Access Request (1), id: 0xdb length: 65
14:41:10.069891 IP 172.31.1.11.radius > 83.162.10.97.32843: RADIUS,
Access Accept (2), id: 0xdb length: 26

As you will note, the wrong source address is used!!
It's responding with the realnode IP instead of the VIP, and that's
causing the problem.

I am puzzled why this problem does not exist when testing http (tcp/80),
as you can see from this:
14:43:53.399206 IP 83.162.10.97.41143 > 172.31.1.10.http: F 553:553(0)
ack 268 win 1728 <nop,nop,timestamp 496389562 507325571>
14:43:53.399224 IP 172.31.1.10.http > 83.162.10.97.41143: . ack 554 win
1724 <nop,nop,timestamp 507325582 496389562>

Might this be UDP related?

[root@grind12 ~]# tcpdump -ni any -p udp and host 83.162.10.97
** nothing of course **

If I reverse the situation - bringing down both services on the primary
director node (grind11) and starting them up on the failover director
(grind12) then both services are accessible.

[root@grind11 ~]# ipvsadm
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP  172.31.1.10:radius rr
  -> 172.31.1.12:radius           Route   1      0          0
TCP  172.31.1.10:http rr persistent 600
  -> 172.31.1.12:http             Route   1      0          0

[root@grind11 ~]# tcpdump -ni any -p udp and host 83.162.10.97
11:28:18.604803 IP 83.162.10.97.32841 > 172.31.1.10.radius: RADIUS,
Access Request (1), id: 0x88 length: 65
11:28:18.604915 IP 83.162.10.97.32841 > 172.31.1.10.radius: RADIUS,
Access Request (1), id: 0x88 length: 65

[root@grind12 ~]# tcpdump -ni any -p udp and host 83.162.10.97
11:28:22.517935 IP 83.162.10.97.32841 > 172.31.1.10.radius: RADIUS,
Access Request (1), id: 0x88 length: 65
11:28:22.522124 IP 172.31.1.10.radius > 83.162.10.97.32841: RADIUS,
Access Accept (2), id: 0x88 length: 26
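
The difference between the two captures can be checked mechanically. The following is a minimal sketch, assuming tcpdump lines of the form shown above with each wrapped packet joined onto one line; the function name `mismatched_replies` is made up for illustration. It remembers which address each client sent its request to and flags any reply whose source differs from that address.

```python
import re

# Matches the "IP <src> > <dst>:" part of a tcpdump line as printed above.
_PACKET = re.compile(r"IP (\S+) > (\S+):")


def mismatched_replies(lines):
    """Return (address the client asked, address that answered) for each bad reply."""
    asked = {}  # client endpoint -> endpoint the client sent its request to
    bad = []
    for line in lines:
        match = _PACKET.search(line)
        if not match:
            continue
        src, dst = match.group(1), match.group(2)
        if dst in asked:
            # Packet going back to a known client: its source should be
            # the same endpoint the client originally addressed.
            if src != asked[dst]:
                bad.append((asked[dst], src))
        else:
            # Request from a client: remember where it was sent.
            asked[src] = dst
    return bad


# Capture taken on grind11 while radius runs locally (LocalNode case).
grind11_capture = [
    "14:41:10.069858 IP 83.162.10.97.32843 > 172.31.1.10.radius: RADIUS, Access Request (1), id: 0xdb length: 65",
    "14:41:10.069891 IP 172.31.1.11.radius > 83.162.10.97.32843: RADIUS, Access Accept (2), id: 0xdb length: 26",
]

# Capture taken on grind12 while radius runs there (Route case).
grind12_capture = [
    "11:28:22.517935 IP 83.162.10.97.32841 > 172.31.1.10.radius: RADIUS, Access Request (1), id: 0x88 length: 65",
    "11:28:22.522124 IP 172.31.1.10.radius > 83.162.10.97.32841: RADIUS, Access Accept (2), id: 0x88 length: 26",
]

if __name__ == "__main__":
    print(mismatched_replies(grind11_capture))
    print(mismatched_replies(grind12_capture))
```

Only the first capture is flagged: the client asked 172.31.1.10 and was answered by 172.31.1.11, which a RADIUS client will normally discard.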

I have tried all I can think of and I am getting a little desperate now
.. :-(

Do you gurus have any clue?

------------------------------------
Configuration and topology
------------------------------------

ha.cf
-----
logfacility local0
debug 0
keepalive 2
deadtime 10
warntime 5
initdead 120
udpport 694
ucast eth1 172.31.1.12
ucast eth3 10.0.0.2
auto_failback on
node grind11.graddelt.com
node grind12.graddelt.com
respawn hacluster /usr/lib/heartbeat/ipfail
crm off

haresources
-----------
grind11.graddelt.com \
ldirectord::ldirectord.cf \
LVSSyncDaemonSwap::master \
IPaddr2::172.31.1.10/24/eth1/172.31.1.255

/etc/ha.d/ldirectord.cf
checktimeout=10
checkinterval=2
autoreload=no
logfile="/var/log/ldirectord.log"
quiescent=no
virtual=172.31.1.10:1812
fallback=127.0.0.1:1812
real=172.31.1.11:1812 gate
real=172.31.1.12:1812 gate
service=radius
scheduler=rr
#persistent=600
protocol=udp
checktype=negotiate
login="ldtest[at]xb.nl"
passwd="ScdCz32v"
secret="ldtest123"

virtual=172.31.1.10:80
fallback=127.0.0.1:80
real=172.31.1.11:80 gate
real=172.31.1.12:80 gate
service=http
scheduler=rr
persistent=600
protocol=tcp
checktype=negotiate
request="ldtest.html"
receive="ALIVE"

sysctl
------

[root@grind11 ~]# sysctl -a | egrep "(forward|arp)"
net.ipv4.conf.eth3.arp_ignore = 1
net.ipv4.conf.eth3.arp_announce = 2
net.ipv4.conf.eth3.arp_filter = 0
net.ipv4.conf.eth3.proxy_arp = 0
net.ipv4.conf.eth3.mc_forwarding = 0
net.ipv4.conf.eth3.forwarding = 1
net.ipv4.conf.eth1.arp_ignore = 1
net.ipv4.conf.eth1.arp_announce = 2
net.ipv4.conf.eth1.arp_filter = 0
net.ipv4.conf.eth1.proxy_arp = 0
net.ipv4.conf.eth1.mc_forwarding = 0
net.ipv4.conf.eth1.forwarding = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.eth0.arp_announce = 2
net.ipv4.conf.eth0.arp_filter = 0
net.ipv4.conf.eth0.proxy_arp = 0
net.ipv4.conf.eth0.mc_forwarding = 0
net.ipv4.conf.eth0.forwarding = 1
net.ipv4.conf.lo.arp_ignore = 0
net.ipv4.conf.lo.arp_announce = 0
net.ipv4.conf.lo.arp_filter = 0
net.ipv4.conf.lo.proxy_arp = 0
net.ipv4.conf.lo.mc_forwarding = 0
net.ipv4.conf.lo.forwarding = 1
net.ipv4.conf.default.arp_ignore = 0
net.ipv4.conf.default.arp_announce = 0
net.ipv4.conf.default.arp_filter = 0
net.ipv4.conf.default.proxy_arp = 0
net.ipv4.conf.default.mc_forwarding = 0
net.ipv4.conf.default.forwarding = 1
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.arp_filter = 0
net.ipv4.conf.all.proxy_arp = 0
net.ipv4.conf.all.mc_forwarding = 0
net.ipv4.conf.all.forwarding = 1
net.ipv4.ip_forward = 1

_______________________________________________
LinuxVirtualServer.org mailing list - lvs-users[at]LinuxVirtualServer.org
Send requests to lvs-users-request[at]LinuxVirtualServer.org
or go to http://lists.graemef.net/mailman/listinfo/lvs-users


jmack at wm7d

Oct 24, 2007, 7:20 AM

Post #2 of 6 (334 views)
Re: [lvs-users] Problem with udp/1812 on a 2-node UltraMonkey style HA cluster [In reply to]

On Wed, 24 Oct 2007, John Donath wrote:

> Hi,
>
> I have setup a 2 node HA cluster based on the Streamline High
> availability and Load Balancing concept.
>
> The weird thing is that it works fantastic for tcp/80 but it doesn't
> work properly for a udp service like radius (up/1812).

There are conceptual problems loadbalancing UDP, as there is
no connection (see UDP in the HOWTO; there are solutions, but
all have problems). Also, do you understand the
many-reader/single-writer problem when loadbalancing databases?

> Assume we have both the http and radius service down on the failover
> director (grind12):
>
> [root[at]grind11 ~]# ipvsadm
> IP Virtual Server version 1.2.0 (size=4096)
> Prot LocalAddress:Port Scheduler Flags
> -> RemoteAddress:Port Forward Weight ActiveConn InActConn
> UDP 172.31.1.10:radius rr
> -> 172.31.1.11:radius Local 1 0 0
> TCP 172.31.1.10:http rr persistent 600
> -> 172.31.1.11:http Local 1 0 0

(not related to your problem) persistence has problems. You
could look at the -SH scheduler instead.

> I now can access the webserver but I don't get any response from the
> radius service.

how can you access a service when the service is down?

Is Radius listening on the VIP? (it should be, see writeup
for LocalNode)


> Here are results from tcpdump on both nodes when a radius request is
> initiated:
> [root[at]grind11 ~]# tcpdump -ni any -p udp and host 83.162.10.97
> 14:41:10.069858 IP 83.162.10.97.32843 > 172.31.1.10.radius: RADIUS,
> Access Request (1), id: 0xdb length: 65
> 14:41:10.069891 IP 172.31.1.11.radius > 83.162.10.97.32843: RADIUS,
> Access Accept (2), id: 0xdb length: 26
>
> As you will note the wrong source address is used !!
> It's responding with the realnode IP instead of the VIP and that's
> causing the problem.

No idea. I assume that Radius is listening on x.x.x.11
(instead of x.x.x.10), in which case I can't imagine how
Radius is getting packets at all.

> I am puzzled why this problem does not exist when testing http (tcp/80)
> as yo can see from this:
> 14:43:53.399206 IP 83.162.10.97.41143 > 172.31.1.10.http: F 553:553(0)
> ack 268 win 1728 <nop,nop,timestamp 496389562 507325571>
> 14:43:53.399224 IP 172.31.1.10.http > 83.162.10.97.41143: . ack 554 win
> 1724 <nop,nop,timestamp 507325582 496389562>
>
> Might this be UDP related?

possibly (since I have no idea what's wrong yet).

> [root[at]grind12 ~]# tcpdump -ni any -p udp and host 83.162.10.97
> ** nothing of course **

I'm sorry, this went over my head. Why "of course"?


> If I reverse the situation - bringing down both services on the primary
> director node (grind11) and starting them up on the failover director
> (grind12) then both services are accessible.

hmm. let's leave this till later.

Joe

--
Joseph Mack NA3T EME(B,D), FM05lw North Carolina
jmack (at) wm7d (dot) net - azimuthal equidistant map
generator at http://www.wm7d.net/azproj.shtml
Homepage http://www.austintek.com/ It's GNU/Linux!



john.donath at xb

Oct 24, 2007, 12:30 PM

Post #3 of 6 (328 views)
Re: [lvs-users] Problem with udp/1812 on a 2-node UltraMonkey style HA cluster [In reply to]

Joseph Mack NA3T wrote:
> On Wed, 24 Oct 2007, John Donath wrote:
>
>
>> Hi,
>>
>> I have setup a 2 node HA cluster based on the Streamline High
>> availability and Load Balancing concept.
>>
>> The weird thing is that it works fantastic for tcp/80 but it doesn't
>> work properly for a udp service like radius (up/1812).
>>
>
> There are conceptual problems loadbalancing UDP, as there is
> no connection (see UDP in the HOWTO, there are solutions but
> all have problems). As well do you understand the many
> reader/single writer problem when loadbalancing databases?
>
>
Yes, I do. This is not a problem as only read actions are involved.
>> Assume we have both the http and radius service down on the failover
>> director (grind12):
>>
>> [root[at]grind11 ~]# ipvsadm
>> IP Virtual Server version 1.2.0 (size=4096)
>> Prot LocalAddress:Port Scheduler Flags
>> -> RemoteAddress:Port Forward Weight ActiveConn InActConn
>> UDP 172.31.1.10:radius rr
>> -> 172.31.1.11:radius Local 1 0 0
>> TCP 172.31.1.10:http rr persistent 600
>> -> 172.31.1.11:http Local 1 0 0
>>
>
> (not related to your problem) persistence has problems. You
> could look at the -SH scheduler instead.
>
I sure will.
>
>> I now can access the webserver but I don't get any response from the
>> radius service.
>>
>
> how can you access a service when the service is down?
>
>
The service is down on the failover node but up on the primary.
> Is Radius listening on the VIP? (it should be, see writeup
> for LocalNode)
>
>
Radius is listening on 0.0.0.0.
>
>> Here are results from tcpdump on both nodes when a radius request is
>> initiated:
>> [root[at]grind11 ~]# tcpdump -ni any -p udp and host 83.162.10.97
>> 14:41:10.069858 IP 83.162.10.97.32843 > 172.31.1.10.radius: RADIUS,
>> Access Request (1), id: 0xdb length: 65
>> 14:41:10.069891 IP 172.31.1.11.radius > 83.162.10.97.32843: RADIUS,
>> Access Accept (2), id: 0xdb length: 26
>>
>> As you will note the wrong source address is used !!
>> It's responding with the realnode IP instead of the VIP and that's
>> causing the problem.
>>
>
> No idea. I assume that Radius is listening on x.x.x.11
> (instead of x.x.x.10), in which case I can't imagine how
> Radius is getting packets at all.
>
>
>> I am puzzled why this problem does not exist when testing http (tcp/80)
>> as yo can see from this:
>> 14:43:53.399206 IP 83.162.10.97.41143 > 172.31.1.10.http: F 553:553(0)
>> ack 268 win 1728 <nop,nop,timestamp 496389562 507325571>
>> 14:43:53.399224 IP 172.31.1.10.http > 83.162.10.97.41143: . ack 554 win
>> 1724 <nop,nop,timestamp 507325582 496389562>
>>
>> Might this be UDP related?
>>
>
> possibly (since I have no idea what's wrong yet).
>
>
>> [root[at]grind12 ~]# tcpdump -ni any -p udp and host 83.162.10.97
>> ** nothing of course **
>>
>
> I'm sorry, this went over my head. Why "of course"?
>
"Of course" because I don't expect any packets on the failover node as
the service is only up on the primary node.
So nothing will be forwarded ...
>
>
>> If I reverse the situation - bringing down both services on the primary
>> director node (grind11) and starting them up on the failover director
>> (grind12) then both services are accessible.
>>
>
> hmm. let's leave this till later.
>
>
Just a remark - when the radius service is down on the primary but up on
the failover node the radius service nicely responds to requests.
> Joe
>
>

Thanks for your quick response.

John



jmack at wm7d

Oct 24, 2007, 1:00 PM

Post #4 of 6 (325 views)
Re: [lvs-users] Problem with udp/1812 on a 2-node UltraMonkey style HA cluster [In reply to]

On Wed, 24 Oct 2007, John Donath wrote:

> Yes, I do. This is not a problem as only read actions are involved.

just checking

>> Is Radius listening on the VIP? (it should be, see writeup
>> for LocalNode)
>>
>>
> Radius is listening on 0.0.0.0.

that knocks my main theory down.

Just to clean things up a little, can you run it only on the
VIP?
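
Why the bind address could matter for UDP but not for TCP can be sketched as follows. This is a plausible explanation, not something confirmed in this thread: a UDP daemon bound to 0.0.0.0 answers with sendto(), and the kernel then picks the source address from the routing table for each reply, which would typically be the primary address of the outgoing interface (the RIP) rather than the VIP. A TCP connection keeps the local address the client connected to. A socket bound to one specific address always replies from that address. The demo below uses 127.0.0.1 as a stand-in for the VIP so it runs on any machine; the function name is made up for illustration.

```python
import socket

# Stand-in for the VIP (172.31.1.10 in this thread); an assumption so the
# demo can run locally without that address being configured.
VIP_STAND_IN = "127.0.0.1"


def reply_source_when_bound_to(addr):
    """Send one datagram to a UDP server bound to `addr`.

    Returns (source address of the reply, address the request was sent to).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.settimeout(2.0)
        client.settimeout(2.0)
        server.bind((addr, 0))            # port 0: let the kernel pick a free port
        port = server.getsockname()[1]
        client.sendto(b"request", (addr, port))
        _, peer = server.recvfrom(1024)
        # The socket is bound to one address, so the reply leaves from it.
        server.sendto(b"reply", peer)
        _, source = client.recvfrom(1024)
        return source, (addr, port)
    finally:
        server.close()
        client.close()


if __name__ == "__main__":
    source, target = reply_source_when_bound_to(VIP_STAND_IN)
    print("request sent to", target, "reply came from", source)
```

A daemon that must stay on 0.0.0.0 can get the same effect by reading the destination address of each request (IP_PKTINFO on Linux) and using it as the source of the reply, provided the daemon supports that.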

> Just a remark - when the radius service is down on the primary but up on
> the failover node the radius service nicely responds to requests.

hmm, udp load balances by staying on one realserver for a
while (15mins? - see the UDP write ups in the HOWTO). It
doesn't behave like tcp at all. I don't know what will
happen if the realserver that clients are connected to
fails during that time interval.
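
The "staying on one realserver for a while" behaviour can be modelled with a toy sketch. This is not the IPVS implementation; the class name, the table layout and the 300 second timeout used below are assumptions for illustration (the real value on a director can be read with `ipvsadm -L --timeout`). Packets from the same client address and port keep going to the same realserver until the entry expires, and only then does round-robin pick the next one.

```python
import itertools


class UdpRoundRobin:
    """Toy model of rr scheduling with a per-client UDP timeout."""

    def __init__(self, realservers, udp_timeout):
        self._cycle = itertools.cycle(realservers)
        self._timeout = udp_timeout
        self._table = {}  # (client_ip, client_port) -> (realserver, expiry time)

    def schedule(self, client_ip, client_port, now):
        """Return the realserver for a packet arriving at time `now` (seconds)."""
        key = (client_ip, client_port)
        entry = self._table.get(key)
        if entry is not None and now < entry[1]:
            server = entry[0]            # entry still live: stay on the same realserver
        else:
            server = next(self._cycle)   # new or expired entry: next realserver in turn
        # Every packet refreshes the timer for this client.
        self._table[key] = (server, now + self._timeout)
        return server


if __name__ == "__main__":
    lb = UdpRoundRobin(["172.31.1.11", "172.31.1.12"], udp_timeout=300)
    for t in (0, 10, 311):
        print(t, lb.schedule("83.162.10.97", 32841, now=t))
```

In this model a client that reuses its source port within the timeout never reaches the second realserver, so a short test from one client can look as if only one realserver is in use.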

Do you have another udp service you can test? ntp is udp,
but the time interval for checks increases. Hopefully
ntpdate is udp and you can run that on demand.

you don't have any firewall rules anywhere? turn them off
for testing.

Joe

--
Joseph Mack NA3T EME(B,D), FM05lw North Carolina
jmack (at) wm7d (dot) net - azimuthal equidistant map
generator at http://www.wm7d.net/azproj.shtml
Homepage http://www.austintek.com/ It's GNU/Linux!



john.donath at xb

Oct 26, 2007, 3:02 PM

Post #5 of 6 (316 views)
Re: [lvs-users] Problem with udp/1812 on a 2-node UltraMonkey style HA cluster [In reply to]

Joseph Mack NA3T wrote:
> On Wed, 24 Oct 2007, John Donath wrote:
>
>
>> Yes, I do. This is not a problem as only read actions are involved.
>>
>
> just checking
>
>
>>> Is Radius listening on the VIP? (it should be, see writeup
>>> for LocalNode)
>>>
>>>
>>>
>> Radius is listening on 0.0.0.0.
>>
>
> that knocks my main theory down.
>
> Just to clean things up a little, can you run it only on the
> VIP?
>
>
Yes, I can:
[root@grind11 ~]# netstat -an | grep 1812
udp 0 0 172.31.1.10:1812 0.0.0.0:*

But this won't work either because the check - as defined in the
ldirectord.cf - will fail:
virtual=172.31.1.10:1812
#fallback=127.0.0.1:1812
real=172.31.1.11:1812 gate
real=172.31.1.12:1812 gate
service=radius
...

This should be the situation I guess:

[root@grind11 ~]# ipvsadm
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP  172.31.1.10:radius rr
  -> 172.31.1.12:radius           Route   1      0          0
  -> 172.31.1.11:radius           Local   1      0          3

But when I bind radius to the VIP only, one of the realnodes has gone away:

IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP  172.31.1.10:radius rr
  -> 172.31.1.12:radius           Route   1      0          0

>> Just a remark - when the radius service is down on the primary but up on
>> the failover node the radius service nicely responds to requests.
>>
>
> hmm, udp load balances by staying on one realserver for a
> while (15mins? - see the UDP write ups in the HOWT0). It
> doesn't behave like tcp at all. I don't know what will
> happen if the realserver fails that clients are connecting
> to for that time interval.
>
> Do you have another udp service you can test? ntp is udp,
> but the time interval for checks increases. Hopefully
> ntpdate is udp and you can run that on demand.
>
> you don't have any firewall rules anywhere? turn them off
> for testing.
>
No firewalling involved.
> Joe
>
>





jmack at wm7d

Oct 26, 2007, 4:10 PM

Post #6 of 6 (319 views)
Re: [lvs-users] Problem with udp/1812 on a 2-node UltraMonkey style HA cluster [In reply to]

On Sat, 27 Oct 2007, John Donath wrote:

>> Just to clean things up a little, can you run it only on the
>> VIP?
>>
>>
> Yes, I can:
> [root[at]grind11 ~]# netstat -an | grep 1812
> udp 0 0 172.31.1.10:1812 0.0.0.0:*
>
> But this won't work either because the check - as defined in the
> ldirector.cf - will fail :

Forgot about that :-(

(In my configure script, the director ssh's to the RIP and
checks that the daemon is running on the VIP).

> This should be the situation I guess:
>
> [root[at]grind11 ~]# ipvsadm
> IP Virtual Server version 1.2.0 (size=4096)
> Prot LocalAddress:Port Scheduler Flags
> -> RemoteAddress:Port Forward Weight ActiveConn InActConn
> UDP 172.31.1.10:radius rr
> -> 172.31.1.12:radius Route 1 0 0
> -> 172.31.1.11:radius Local 1 0 3
>
> But when I bind radius to the VIP only one if the realnodes has gone away:
>
> IP Virtual Server version 1.2.0 (size=4096)
> Prot LocalAddress:Port Scheduler Flags
> -> RemoteAddress:Port Forward Weight ActiveConn InActConn
> UDP 172.31.1.10:radius rr
> -> 172.31.1.12:radius Route 1 0 0

sorry I don't get the point here.

I'm pretty much out of ideas. Is this a radius problem or an
ldirectord problem? i.e. can you set up a single director (no
ldirectord) by hand and have it loadbalance two radius
realservers?
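
A by-hand setup of that kind could look roughly like the commands this sketch prints. It only builds the command lines and does not run them; the helper name is hypothetical, and the addresses are the ones from the ldirectord.cf earlier in the thread. The flags are standard ipvsadm options: -A adds a virtual service, -u marks it as UDP, -s selects the scheduler, -a adds a realserver, -r names it and -g selects gatewaying (direct routing, the "gate" keyword in ldirectord.cf).

```python
def ipvsadm_commands(vip, port, realservers, scheduler="rr"):
    """Build ipvsadm command lines for one UDP virtual service (nothing is executed)."""
    service = f"{vip}:{port}"
    # One command to create the virtual service ...
    commands = [["ipvsadm", "-A", "-u", service, "-s", scheduler]]
    # ... and one per realserver, forwarded by direct routing.
    for rip in realservers:
        commands.append(["ipvsadm", "-a", "-u", service, "-r", f"{rip}:{port}", "-g"])
    return commands


if __name__ == "__main__":
    for cmd in ipvsadm_commands("172.31.1.10", 1812, ["172.31.1.11", "172.31.1.12"]):
        print(" ".join(cmd))
```

Running the printed commands needs root on the director, with heartbeat and ldirectord stopped so that nothing rewrites the table during the test.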

Joe

--
Joseph Mack NA3T EME(B,D), FM05lw North Carolina
jmack (at) wm7d (dot) net - azimuthal equidistant map
generator at http://www.wm7d.net/azproj.shtml
Homepage http://www.austintek.com/ It's GNU/Linux!

