
alanr at unix
Aug 2, 2005, 3:38 AM
Post #2 of 5
Hi Diego,

When both nodes of a newly set up heartbeat cluster each think the other is dead, the usual cause is a firewall on one or both machines blocking the heartbeat port (UDP 694).

Diego de Felice wrote:
> Hi to all, I'm making my first steps with Linux-HA and I'm having some problems.
>
> First of all the scenario.
>
> - nodes: slave1 and slave2 in an active/passive configuration
> - each node has 2 ethernet cards, eth0 is connected to the normal LAN,
> eth1 is used for the cluster (using a cross-cable). slave1 eth0 has
> 10.10.1.40, eth1 192.168.1.2. slave2 eth0 has 10.10.1.50, eth1
> 192.168.1.1
> - each node has Linux 2.4.21-4.ELsmp #1 SMP and Linux-HA installed
> from heartbeat-1.2.3-2.rh.el.3.0.i386.rpm found in the redhat_el_3.0
> directory on http://www.ultramonkey.org/download/heartbeat/1.2.3/
>
> The configuration files for slave1 are these:
>
> ha.cf
>
> keepalive 1
> deadtime 5
> warntime 3
> initdead 10
> udpport 694
> bcast eth1
> ucast eth1 192.168.1.1
> auto_failback off
> node slave1
> node slave2
>
> haresources
>
> slave1 10.10.1.45 smb
>
> The configuration files for slave2 are these:
>
> ha.cf
>
> keepalive 1
> deadtime 5
> warntime 3
> initdead 10
> udpport 694
> bcast eth1
> ucast eth1 192.168.1.2
> auto_failback off
> node slave1
> node slave2
>
> haresources
>
> slave1 10.10.1.45 smb
>
> Now, here is the problem. I start slave1 (slave2 is powered down), and I
> expect it to acquire the resources and the virtual IP. The problem is
> that slave1 behaves as if it were dead! It acquires no virtual IP. The log
> for slave1 is reported below:
>
> heartbeat[8082]: 2005/08/02_09:37:26 info: AUTH: i=1: key = 0x80fe474,
> auth=0xb75e5634, authname=sha1
> heartbeat[8082]: 2005/08/02_09:37:26 WARN: Logging daemon is disabled
> --enabling logging daemon is recommended
> heartbeat[8082]: 2005/08/02_09:37:26 info: **************************
> heartbeat[8082]: 2005/08/02_09:37:26 info: Configuration validated.
> Starting heartbeat 1.99.5
> heartbeat[8083]: 2005/08/02_09:37:26 info: heartbeat: version 1.99.5
> heartbeat[8083]: 2005/08/02_09:37:27 info: Heartbeat generation: 18
> heartbeat[8083]: 2005/08/02_09:37:27 info: glib: UDP Broadcast
> heartbeat started on port 694 (694) interface eth1
> heartbeat[8083]: 2005/08/02_09:37:27 info: glib: ucast: write socket
> priority set to IPTOS_LOWDELAY on eth1
> heartbeat[8083]: 2005/08/02_09:37:27 info: glib: ucast: bound send
> socket to device: eth1
> heartbeat[8083]: 2005/08/02_09:37:27 info: glib: ucast: bound receive
> socket to device: eth1
> heartbeat[8083]: 2005/08/02_09:37:27 info: glib: ucast: started on
> port 694 interface eth1 to 192.168.1.1
> heartbeat[8083]: 2005/08/02_09:37:27 info: G_main_add_SignalHandler:
> Added signal handler for signal 17
> heartbeat[8083]: 2005/08/02_09:37:27 info: pid 8083 locked in memory.
> heartbeat[8083]: 2005/08/02_09:37:27 info: Local status now set to: 'up'
> heartbeat[8090]: 2005/08/02_09:37:28 info: pid 8090 locked in memory.
> heartbeat[8091]: 2005/08/02_09:37:28 info: pid 8091 locked in memory.
> heartbeat[8092]: 2005/08/02_09:37:28 info: pid 8092 locked in memory.
> heartbeat[8083]: 2005/08/02_09:37:28 info: Link slave1:eth1 up.
> heartbeat[8093]: 2005/08/02_09:37:28 info: pid 8093 locked in memory.
> heartbeat[8094]: 2005/08/02_09:37:28 info: pid 8094 locked in memory.
> heartbeat[8083]: 2005/08/02_09:37:37 WARN: node slave2: is dead
> heartbeat[8083]: 2005/08/02_09:37:37 info: Local status now set to: 'active'
> heartbeat[8083]: 2005/08/02_09:37:37 WARN: No STONITH device configured.
> heartbeat[8083]: 2005/08/02_09:37:37 WARN: Shared disks are not protected.
> heartbeat[8083]: 2005/08/02_09:37:37 info: Resources being acquired from slave2.
> harc[8096]: 2005/08/02_09:37:37 info: Running /etc/ha.d/rc.d/status status
> mach_down[8106]: 2005/08/02_09:37:37 info:
> /usr/lib/heartbeat/mach_down: nice_failback: foreign resources
> acquired
> mach_down[8106]: 2005/08/02_09:37:37 info: mach_down takeover complete
> for node slave2.
> heartbeat[8083]: 2005/08/02_09:37:37 info: Exiting status process 8096
> returned rc 0.
> req_resource[8139]: 2005/08/02_09:37:37 debug: in
> /usr/lib/heartbeat/req_resource 10.10.1.45
> req_resource[8139]: 2005/08/02_09:37:37 debug: dont_ask: yes nice_failback: yes
> heartbeat[8120]: 2005/08/02_09:37:37 info: 1 local resources from
> [/usr/lib/heartbeat/ResourceManager listkeys slave1]
> heartbeat[8120]: 2005/08/02_09:37:37 info: Local Resource acquisition completed.
> heartbeat[8083]: 2005/08/02_09:37:37 info: Exiting req_our_resources
> process 8120 returned rc 0.
> heartbeat[8083]: 2005/08/02_09:37:37 info: AnnounceTakeover(local 1,
> foreign 0, reason 'req_our_resources' (0))
>
>
> I think slave1 is always considered dead, because if I start slave2
> (starting heartbeat also), it acquires the resources and the virtual
> IP, and this is very strange. But the strangest thing is that if I
> shut down slave2, slave1 continues to be dead and the resources are left
> unassigned (the virtual IP is not bound to anything)... not a very
> useful cluster :-)
>
> I report the slave2 log file, but this is not so useful because the
> cluster is not working with one node, so the first problem is the
> first node:
>
> heartbeat[6128]: 2005/08/02_09:41:26 info: AUTH: i=1: key = 0x80fe474,
> auth=0xb75e5634, authname=sha1
> heartbeat[6128]: 2005/08/02_09:41:26 WARN: Logging daemon is disabled
> --enabling logging daemon is recommended
> heartbeat[6128]: 2005/08/02_09:41:26 info: **************************
> heartbeat[6128]: 2005/08/02_09:41:26 info: Configuration validated.
> Starting heartbeat 1.99.5
> heartbeat[6129]: 2005/08/02_09:41:26 info: heartbeat: version 1.99.5
> heartbeat[6129]: 2005/08/02_09:41:27 info: Heartbeat generation: 15
> heartbeat[6129]: 2005/08/02_09:41:27 info: glib: UDP Broadcast
> heartbeat started on port 694 (694) interface eth1
> heartbeat[6129]: 2005/08/02_09:41:27 info: glib: ucast: write socket
> priority set to IPTOS_LOWDELAY on eth1
> heartbeat[6129]: 2005/08/02_09:41:27 info: glib: ucast: bound send
> socket to device: eth1
> heartbeat[6129]: 2005/08/02_09:41:27 info: glib: ucast: bound receive
> socket to device: eth1
> heartbeat[6129]: 2005/08/02_09:41:27 info: glib: ucast: started on
> port 694 interface eth1 to 192.168.1.2
> heartbeat[6129]: 2005/08/02_09:41:27 info: G_main_add_SignalHandler:
> Added signal handler for signal 17
> heartbeat[6129]: 2005/08/02_09:41:27 info: pid 6129 locked in memory.
> heartbeat[6129]: 2005/08/02_09:41:27 info: Local status now set to: 'up'
> heartbeat[6136]: 2005/08/02_09:41:28 info: pid 6136 locked in memory.
> heartbeat[6137]: 2005/08/02_09:41:28 info: pid 6137 locked in memory.
> heartbeat[6139]: 2005/08/02_09:41:28 info: pid 6139 locked in memory.
> heartbeat[6138]: 2005/08/02_09:41:28 info: pid 6138 locked in memory.
> heartbeat[6129]: 2005/08/02_09:41:28 info: Link slave2:eth1 up.
> heartbeat[6140]: 2005/08/02_09:41:28 info: pid 6140 locked in memory.
> heartbeat[6129]: 2005/08/02_09:41:37 WARN: node slave1: is dead
> heartbeat[6129]: 2005/08/02_09:41:37 info: Local status now set to: 'active'
> heartbeat[6129]: 2005/08/02_09:41:37 WARN: No STONITH device configured.
> heartbeat[6129]: 2005/08/02_09:41:37 WARN: Shared disks are not protected.
> heartbeat[6129]: 2005/08/02_09:41:37 info: Resources being acquired from slave1.
> harc[6141]: 2005/08/02_09:41:37 info: Running /etc/ha.d/rc.d/status status
> mach_down[6151]: 2005/08/02_09:41:37 info: Taking over resource group 10.10.1.45
> heartbeat[6163]: 2005/08/02_09:41:37 info: No local resources
> [/usr/lib/heartbeat/ResourceManager listkeys slave2] to acquire.
> heartbeat[6129]: 2005/08/02_09:41:37 info: AnnounceTakeover(local 0,
> foreign 1, reason 'T_RESOURCES' (0))
> heartbeat[6129]: 2005/08/02_09:41:37 info: AnnounceTakeover(local 1,
> foreign 1, reason 'T_RESOURCES(us)' (0))
> heartbeat[6129]: 2005/08/02_09:41:37 info: Initial resource
> acquisition complete (T_RESOURCES(us))
> ResourceManager[6181]: 2005/08/02_09:41:37 info: Acquiring resource
> group: slave1 10.10.1.45 smb
> heartbeat[6129]: 2005/08/02_09:41:37 info: STATE 1 => 3
> heartbeat[6129]: 2005/08/02_09:41:37 info: Exiting req_our_resources
> process 6163 returned rc 0.
> heartbeat[6129]: 2005/08/02_09:41:37 info: AnnounceTakeover(local 1,
> foreign 1, reason 'req_our_resources' (1))
> ResourceManager[6181]: 2005/08/02_09:41:37 info: Running
> /etc/ha.d/resource.d/IPaddr 10.10.1.45 start
> IPaddr[6239]: 2005/08/02_09:41:37 info: /sbin/ifconfig eth0:0
> 10.10.1.45 netmask 255.255.255.0 broadcast 10.10.1.255
> IPaddr[6239]: 2005/08/02_09:41:37 info: Sending Gratuitous Arp for
> 10.10.1.45 on eth0:0 [eth0]
> IPaddr[6239]: 2005/08/02_09:41:37 /usr/lib/heartbeat/send_arp -i 500
> -r 10 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-10.10.1.45 eth0
> 10.10.1.45 auto 10.10.1.45 ffffffffffff
> ResourceManager[6181]: 2005/08/02_09:41:37 info: Running /etc/init.d/smb start
> mach_down[6151]: 2005/08/02_09:41:37 info:
> /usr/lib/heartbeat/mach_down: nice_failback: foreign resources
> acquired
> mach_down[6151]: 2005/08/02_09:41:38 info: mach_down takeover complete
> for node slave1.
> heartbeat[6129]: 2005/08/02_09:41:38 info: Exiting status process 6141
> returned rc 0.
>

--
Alan Robertson <alanr[at]unix.sh>

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions."
- William Wilberforce

_______________________________________________
Linux-HA mailing list
Linux-HA[at]lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
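
One way to test the firewall theory from the reply above is to check whether a plain UDP datagram on the heartbeat port actually crosses the link between the nodes. The following is only a minimal sketch, not part of Linux-HA: the helper names (`open_listener`, `send_probe`, `receive_probe`) and the probe payload are made up for illustration, and it assumes a Python 3 interpreter is available on both nodes. Port 694 comes from the `udpport` line in the quoted ha.cf. Binding a port below 1024 normally needs root, and heartbeat would have to be stopped on the listening node so the port is free.

```python
import socket

HEARTBEAT_PORT = 694                 # "udpport 694" in the quoted ha.cf
PROBE = b"heartbeat-port-probe"      # arbitrary payload, chosen for this sketch


def open_listener(bind_ip, port=HEARTBEAT_PORT):
    """Bind a UDP socket on bind_ip:port and return it (port 0 picks a free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_ip, port))
    return sock


def send_probe(peer_ip, port=HEARTBEAT_PORT, payload=PROBE):
    """Send one UDP datagram to peer_ip:port; return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (peer_ip, port))


def receive_probe(sock, timeout=5.0):
    """Wait up to `timeout` seconds for one datagram.

    Returns (payload, sender_ip), or None if nothing arrived in time,
    which is what a firewall silently dropping the traffic would look like.
    """
    sock.settimeout(timeout)
    try:
        data, (sender_ip, _sender_port) = sock.recvfrom(1024)
    except socket.timeout:
        return None
    return data, sender_ip
```

With the addresses from the original message, this might be used by running `receive_probe(open_listener("192.168.1.1"), timeout=30)` on slave2 and then `send_probe("192.168.1.1")` on slave1, and afterwards repeating it in the other direction with 192.168.1.2. A `None` result in either direction would be consistent with something dropping UDP 694 on eth1, although it does not say which machine is responsible.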