
davea at ingraftedsoftware
May 3, 2011, 4:17 AM
[lvs-users] problem with ldirectord- web server up/site down :(
We had a failure yesterday, and one of our web sites was unavailable. (This has happened before, about once a month; I am now taking the time to post the problem.) After a few minutes of investigation, I found that the load balancer did not have any hosts in the rotation for that site. All 3 web servers were up and working, so the checks in ldirectord should have kept all 3 in the running ipvs configuration. A simple restart of ldirectord added all 3 web servers back into the rotation immediately, and the site was restored to service.

There is no clustering software used in this current configuration. It seems that ldirectord forgets what it is supposed to do over time (a few weeks), and a simple restart makes it happy again, as it has in this case and in previous cases.

Here are the software versions for the load balancer:

CentOS release 5.5 x86_64
ldirectord-1.0.4-1.1.el5
kernel 2.6.18-194.32.1.el5

Here are the important parts of the ldirectord.cf file (anonymized):
=============================
# Global Directives
checktimeout=20
checkinterval=30
autoreload=yes
logfile="local0"
quiescent=no
fork=yes

# http virtual service for redirecting port 80 to my.securesite.com
virtual=192.168.35.117:80
        real=192.168.35.43:80 gate 100
        real=192.168.35.44:80 gate 100
        real=192.168.35.45:80 gate 100
        service=http
        scheduler=rr
        netmask=255.255.255.255
        protocol=tcp

# http virtual service for my.securesite.com
virtual=192.168.35.117:443
        real=192.168.35.43:40117 gate 100
        real=192.168.35.44:40117 gate 100
        real=192.168.35.45:40117 gate 100
        service=https
        scheduler=wlc
        persistent=600
        netmask=255.255.255.255
        protocol=tcp
        virtualhost=my.securesite.com
=============================

/etc/ipvsadm.rules
=============================
(no entry for this host- let ldirectord figure it out)
(note: I have since ADDED the rules here for the 117 https host, but I don't see how not having it matters, as ldirectord manages that.)
=============================

The logs had no entry showing where the site was actually removed from ipvs. They did have some entries like the following with "failed" - notice the timestamps:

May 1 21:10:56 lb71 ldirectord[7336]: system(/sbin/ipvsadm -a -t 63.251.35.117:80 -r 192.168.35.45:80 -g -w 100) failed:
May 1 21:10:56 lb71 ldirectord[7336]: Added real server: 192.168.35.45:80 (192.168.35.117:80) (Weight set to 100)
May 1 21:10:56 lb71 ldirectord[7343]: Resetting soft failure count: 192.168.35.45:40117 (tcp:192.168.35.117:443)
May 1 21:10:56 lb71 ldirectord[7343]: system(/sbin/ipvsadm -a -t 192.168.35.117:443 -r 192.168.35.45:40117 -g -w 100) failed:
May 1 21:10:56 lb71 ldirectord[7343]: Added real server: 192.168.35.45:40117 (192.168.35.117:443) (Weight set to 100)

Is this a bug in ldirectord? Something wrong in my config? Should I look to keepalived? mon?

Thanks,
Dave

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users [at] LinuxVirtualServer
Send requests to lvs-users-request [at] LinuxVirtualServer
or go to http://lists.graemef.net/mailman/listinfo/lvs-users
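
The symptom described in this post (a virtual service left with no real servers while ldirectord itself keeps running) could be caught by an independent check of the ipvs table. What follows is only a minimal sketch and is not part of ldirectord. It assumes Python 3 is available, and that `ipvsadm -L -n` prints each virtual service on a line starting with TCP, UDP or FWM, followed by its real servers on lines starting with "->". The function names and the SAMPLE text are hypothetical illustrations, not captured output from the load balancer in question.

```python
import subprocess


def empty_virtual_services(ipvsadm_output):
    """Return the virtual services ('PROTO addr:port') that list no real servers."""
    empty = []
    current = None      # virtual service currently being scanned
    has_real = False    # whether 'current' has at least one real server
    for line in ipvsadm_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("TCP", "UDP", "FWM") and len(fields) >= 2:
            # A new virtual service starts; close out the previous one.
            if current is not None and not has_real:
                empty.append(current)
            current = "%s %s" % (fields[0], fields[1])
            has_real = False
        elif fields[0] == "->" and current is not None:
            # Real-server line belonging to the current virtual service.
            # (The column-header line also starts with "->", but it appears
            # before any virtual service, so 'current' is still None there.)
            has_real = True
    if current is not None and not has_real:
        empty.append(current)
    return empty


def read_ipvs_table(ipvsadm="/sbin/ipvsadm"):
    """Run 'ipvsadm -L -n' (needs root) and return its output as text."""
    return subprocess.check_output([ipvsadm, "-L", "-n"],
                                   universal_newlines=True)


# Hand-written sample in the assumed output layout: port 80 has one real
# server, port 443 has none.
SAMPLE = """\
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.35.117:80 rr
  -> 192.168.35.43:80             Route   100    0          0
TCP  192.168.35.117:443 wlc persistent 600
"""

print(empty_virtual_services(SAMPLE))  # -> ['TCP 192.168.35.117:443']
```

On the load balancer itself, `empty_virtual_services(read_ipvs_table())` could be run periodically (for example from cron) to raise an alert when a service has an empty rotation. This only detects the condition; it does not explain why ldirectord stopped maintaining the table.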