Mailing List Archive: Linux-HA: Users

Newbie question again

 

 



diego.defelice at gmail

Aug 2, 2005, 1:51 AM

Post #1 of 5
Newbie question again

Hi all, I'm taking my first steps with Linux-HA and I'm having some problems.

First, the scenario.

- nodes: slave1 and slave2 in an active/passive configuration
- each node has two Ethernet cards: eth0 is connected to the normal LAN,
eth1 is used for the cluster (over a crossover cable). slave1 has eth0
10.10.1.40 and eth1 192.168.1.2; slave2 has eth0 10.10.1.50 and eth1
192.168.1.1
- each node runs Linux 2.4.21-4.ELsmp #1 SMP with Linux-HA installed
from heartbeat-1.2.3-2.rh.el.3.0.i386.rpm, found in the redhat_el_3.0
directory on http://www.ultramonkey.org/download/heartbeat/1.2.3/

The configuration files for slave1 are these:

ha.cf

keepalive 1
deadtime 5
warntime 3
initdead 10
udpport 694
bcast eth1
ucast eth1 192.168.1.1
auto_failback off
node slave1
node slave2

haresources

slave1 10.10.1.45 smb

The configuration files for slave2 are these:

ha.cf

keepalive 1
deadtime 5
warntime 3
initdead 10
udpport 694
bcast eth1
ucast eth1 192.168.1.2
auto_failback off
node slave1
node slave2

haresources

slave1 10.10.1.45 smb
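
A quick way to sanity-check the two ha.cf files above is to confirm
that each node's ucast line names the peer's eth1 address. The sketch
below is only an illustration in Python: the function names
ucast_targets and ucast_reaches are hypothetical, and it assumes the
"ucast <device> <peer-ip>" form used in this post.

```python
def ucast_targets(ha_cf_text):
    """Return (device, peer_ip) pairs for every ucast directive in ha.cf text."""
    targets = []
    for line in ha_cf_text.splitlines():
        # Drop trailing comments, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "ucast":
            targets.append((fields[1], fields[2]))
    return targets


def ucast_reaches(ha_cf_text, peer_ip):
    """True if at least one ucast directive is addressed to peer_ip."""
    return any(ip == peer_ip for _dev, ip in ucast_targets(ha_cf_text))
```

For the setup described here, the file on slave1 (eth1 192.168.1.2)
would be expected to reach 192.168.1.1, and the file on slave2 the
reverse.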

Now, here is the problem. I start slave1 (slave2 is powered down) and I
expect it to acquire the resources and the virtual IP, but slave1
behaves as if it were dead: it acquires no virtual IP. The log for
slave1 is reported below:

heartbeat[8082]: 2005/08/02_09:37:26 info: AUTH: i=1: key = 0x80fe474,
auth=0xb75e5634, authname=sha1
heartbeat[8082]: 2005/08/02_09:37:26 WARN: Logging daemon is disabled
--enabling logging daemon is recommended
heartbeat[8082]: 2005/08/02_09:37:26 info: **************************
heartbeat[8082]: 2005/08/02_09:37:26 info: Configuration validated.
Starting heartbeat 1.99.5
heartbeat[8083]: 2005/08/02_09:37:26 info: heartbeat: version 1.99.5
heartbeat[8083]: 2005/08/02_09:37:27 info: Heartbeat generation: 18
heartbeat[8083]: 2005/08/02_09:37:27 info: glib: UDP Broadcast
heartbeat started on port 694 (694) interface eth1
heartbeat[8083]: 2005/08/02_09:37:27 info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth1
heartbeat[8083]: 2005/08/02_09:37:27 info: glib: ucast: bound send
socket to device: eth1
heartbeat[8083]: 2005/08/02_09:37:27 info: glib: ucast: bound receive
socket to device: eth1
heartbeat[8083]: 2005/08/02_09:37:27 info: glib: ucast: started on
port 694 interface eth1 to 192.168.1.1
heartbeat[8083]: 2005/08/02_09:37:27 info: G_main_add_SignalHandler:
Added signal handler for signal 17
heartbeat[8083]: 2005/08/02_09:37:27 info: pid 8083 locked in memory.
heartbeat[8083]: 2005/08/02_09:37:27 info: Local status now set to: 'up'
heartbeat[8090]: 2005/08/02_09:37:28 info: pid 8090 locked in memory.
heartbeat[8091]: 2005/08/02_09:37:28 info: pid 8091 locked in memory.
heartbeat[8092]: 2005/08/02_09:37:28 info: pid 8092 locked in memory.
heartbeat[8083]: 2005/08/02_09:37:28 info: Link slave1:eth1 up.
heartbeat[8093]: 2005/08/02_09:37:28 info: pid 8093 locked in memory.
heartbeat[8094]: 2005/08/02_09:37:28 info: pid 8094 locked in memory.
heartbeat[8083]: 2005/08/02_09:37:37 WARN: node slave2: is dead
heartbeat[8083]: 2005/08/02_09:37:37 info: Local status now set to: 'active'
heartbeat[8083]: 2005/08/02_09:37:37 WARN: No STONITH device configured.
heartbeat[8083]: 2005/08/02_09:37:37 WARN: Shared disks are not protected.
heartbeat[8083]: 2005/08/02_09:37:37 info: Resources being acquired from slave2.
harc[8096]: 2005/08/02_09:37:37 info: Running /etc/ha.d/rc.d/status status
mach_down[8106]: 2005/08/02_09:37:37 info:
/usr/lib/heartbeat/mach_down: nice_failback: foreign resources
acquired
mach_down[8106]: 2005/08/02_09:37:37 info: mach_down takeover complete
for node slave2.
heartbeat[8083]: 2005/08/02_09:37:37 info: Exiting status process 8096
returned rc 0.
req_resource[8139]: 2005/08/02_09:37:37 debug: in
/usr/lib/heartbeat/req_resource 10.10.1.45
req_resource[8139]: 2005/08/02_09:37:37 debug: dont_ask: yes nice_failback: yes
heartbeat[8120]: 2005/08/02_09:37:37 info: 1 local resources from
[/usr/lib/heartbeat/ResourceManager listkeys slave1]
heartbeat[8120]: 2005/08/02_09:37:37 info: Local Resource acquisition completed.
heartbeat[8083]: 2005/08/02_09:37:37 info: Exiting req_our_resources
process 8120 returned rc 0.
heartbeat[8083]: 2005/08/02_09:37:37 info: AnnounceTakeover(local 1,
foreign 0, reason 'req_our_resources' (0))


I think slave1 is always considered dead, because if I start slave2
(and start heartbeat on it), slave2 acquires the resources and the
virtual IP, which is very strange. But the strangest thing is that if I
shut down slave2, slave1 continues to act dead and the resources are
left unassigned (the virtual IP is not bound to anything)... not a very
useful cluster :-)

I include the slave2 log file too, although it is not very useful:
the cluster does not work with one node, so the first problem to solve
is the first node:

heartbeat[6128]: 2005/08/02_09:41:26 info: AUTH: i=1: key = 0x80fe474,
auth=0xb75e5634, authname=sha1
heartbeat[6128]: 2005/08/02_09:41:26 WARN: Logging daemon is disabled
--enabling logging daemon is recommended
heartbeat[6128]: 2005/08/02_09:41:26 info: **************************
heartbeat[6128]: 2005/08/02_09:41:26 info: Configuration validated.
Starting heartbeat 1.99.5
heartbeat[6129]: 2005/08/02_09:41:26 info: heartbeat: version 1.99.5
heartbeat[6129]: 2005/08/02_09:41:27 info: Heartbeat generation: 15
heartbeat[6129]: 2005/08/02_09:41:27 info: glib: UDP Broadcast
heartbeat started on port 694 (694) interface eth1
heartbeat[6129]: 2005/08/02_09:41:27 info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth1
heartbeat[6129]: 2005/08/02_09:41:27 info: glib: ucast: bound send
socket to device: eth1
heartbeat[6129]: 2005/08/02_09:41:27 info: glib: ucast: bound receive
socket to device: eth1
heartbeat[6129]: 2005/08/02_09:41:27 info: glib: ucast: started on
port 694 interface eth1 to 192.168.1.2
heartbeat[6129]: 2005/08/02_09:41:27 info: G_main_add_SignalHandler:
Added signal handler for signal 17
heartbeat[6129]: 2005/08/02_09:41:27 info: pid 6129 locked in memory.
heartbeat[6129]: 2005/08/02_09:41:27 info: Local status now set to: 'up'
heartbeat[6136]: 2005/08/02_09:41:28 info: pid 6136 locked in memory.
heartbeat[6137]: 2005/08/02_09:41:28 info: pid 6137 locked in memory.
heartbeat[6139]: 2005/08/02_09:41:28 info: pid 6139 locked in memory.
heartbeat[6138]: 2005/08/02_09:41:28 info: pid 6138 locked in memory.
heartbeat[6129]: 2005/08/02_09:41:28 info: Link slave2:eth1 up.
heartbeat[6140]: 2005/08/02_09:41:28 info: pid 6140 locked in memory.
heartbeat[6129]: 2005/08/02_09:41:37 WARN: node slave1: is dead
heartbeat[6129]: 2005/08/02_09:41:37 info: Local status now set to: 'active'
heartbeat[6129]: 2005/08/02_09:41:37 WARN: No STONITH device configured.
heartbeat[6129]: 2005/08/02_09:41:37 WARN: Shared disks are not protected.
heartbeat[6129]: 2005/08/02_09:41:37 info: Resources being acquired from slave1.
harc[6141]: 2005/08/02_09:41:37 info: Running /etc/ha.d/rc.d/status status
mach_down[6151]: 2005/08/02_09:41:37 info: Taking over resource group 10.10.1.45
heartbeat[6163]: 2005/08/02_09:41:37 info: No local resources
[/usr/lib/heartbeat/ResourceManager listkeys slave2] to acquire.
heartbeat[6129]: 2005/08/02_09:41:37 info: AnnounceTakeover(local 0,
foreign 1, reason 'T_RESOURCES' (0))
heartbeat[6129]: 2005/08/02_09:41:37 info: AnnounceTakeover(local 1,
foreign 1, reason 'T_RESOURCES(us)' (0))
heartbeat[6129]: 2005/08/02_09:41:37 info: Initial resource
acquisition complete (T_RESOURCES(us))
ResourceManager[6181]: 2005/08/02_09:41:37 info: Acquiring resource
group: slave1 10.10.1.45 smb
heartbeat[6129]: 2005/08/02_09:41:37 info: STATE 1 => 3
heartbeat[6129]: 2005/08/02_09:41:37 info: Exiting req_our_resources
process 6163 returned rc 0.
heartbeat[6129]: 2005/08/02_09:41:37 info: AnnounceTakeover(local 1,
foreign 1, reason 'req_our_resources' (1))
ResourceManager[6181]: 2005/08/02_09:41:37 info: Running
/etc/ha.d/resource.d/IPaddr 10.10.1.45 start
IPaddr[6239]: 2005/08/02_09:41:37 info: /sbin/ifconfig eth0:0
10.10.1.45 netmask 255.255.255.0 broadcast 10.10.1.255
IPaddr[6239]: 2005/08/02_09:41:37 info: Sending Gratuitous Arp for
10.10.1.45 on eth0:0 [eth0]
IPaddr[6239]: 2005/08/02_09:41:37 /usr/lib/heartbeat/send_arp -i 500
-r 10 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-10.10.1.45 eth0
10.10.1.45 auto 10.10.1.45 ffffffffffff
ResourceManager[6181]: 2005/08/02_09:41:37 info: Running /etc/init.d/smb start
mach_down[6151]: 2005/08/02_09:41:37 info:
/usr/lib/heartbeat/mach_down: nice_failback: foreign resources
acquired
mach_down[6151]: 2005/08/02_09:41:38 info: mach_down takeover complete
for node slave1.
heartbeat[6129]: 2005/08/02_09:41:38 info: Exiting status process 6141
returned rc 0.

--
Diego de Felice
_______________________________________________
Linux-HA mailing list
Linux-HA[at]lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha


alanr at unix

Aug 2, 2005, 3:38 AM

Post #2 of 5
Re: Newbie question again

Hi Diego,

When both nodes of a new heartbeat cluster think the other is dead, the
usual newbie problem is that a firewall on one or both machines is
blocking the heartbeat port (UDP 694).
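
One way to test this independently of heartbeat is to send a single
UDP datagram across the link and see whether it arrives. The following
is a minimal sketch in Python, not part of heartbeat: the function
names and the "ha-probe" payload are made up for illustration, and it
assumes heartbeat is stopped while you test, since heartbeat itself
binds the port.

```python
import socket


def open_listener(bind_ip, port):
    """Bind a UDP socket on bind_ip:port and return it (caller closes it)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_ip, port))
    return sock


def wait_for_probe(sock, timeout=5.0):
    """Return (payload, sender_ip) for one datagram, or None on timeout."""
    sock.settimeout(timeout)
    try:
        data, addr = sock.recvfrom(1024)
    except socket.timeout:
        return None
    return data, addr[0]


def send_probe(dest_ip, port, payload=b"ha-probe"):
    """Send one UDP datagram to dest_ip:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (dest_ip, port))
```

With the addresses from the original post, you would call
open_listener("192.168.1.1", 694) and wait_for_probe() on slave2, then
send_probe("192.168.1.1", 694) on slave1, and repeat in the other
direction. Binding port 694 needs root because it is below 1024. A
None result suggests the datagram was dropped somewhere on the path.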





--
Alan Robertson <alanr[at]unix.sh>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce


diego.defelice at gmail

Aug 2, 2005, 7:20 AM

Post #3 of 5
Re: Newbie question again

Thanks for the help. I ran some tests, and it is not a firewall
problem. I used a program called server_client.tar.gz that I found on
the ML, and the two nodes can see each other. I also tried another
crossover cable and swapped the two eth interfaces, but nothing
changed. The problem, however, is with the first node. I noticed that
if haresources assigns a VIP and a resource to node x and I start node
y, then node y takes the resource and the VIP; but when I then start
node x, it reports that node y is dead and requests the resource. If I
now shut down node y, node x doesn't reacquire the resource...

My first goal is to get the "cluster" working with a single node that
acquires all its resources. Any ideas? I noticed some strange lines in
the log files:

req_resource[8139]: 2005/08/02_09:37:37 debug: in
/usr/lib/heartbeat/req_resource 10.10.1.45
req_resource[8139]: 2005/08/02_09:37:37 debug: dont_ask: yes
nice_failback: yes
heartbeat[8120]: 2005/08/02_09:37:37 info: 1 local resources from
[/usr/lib/heartbeat/ResourceManager listkeys slave1]

Are they normal?

Another thing: in all the configuration files I refer to the nodes as
slave1 and slave2, so I think I need these names as aliases in the
hosts file. I've tested with and without them, but it made no
difference. Is this setting required? (There is no DNS on the LAN.)

--
Diego de Felice


hendrik.p.leroux at siemens

Aug 2, 2005, 9:54 PM

Post #4 of 5
RE: Newbie question again

=> -----Original Message-----
=> From: Diego de Felice [mailto:diego.defelice[at]gmail.com]
=> Sent: 02 August 2005 16:20
=> To: General Linux-HA mailing list
=> Subject: Re: [Linux-HA] Newbie question again
=>
=>
=> Another thing, in all configuration files I call the nodes with the
=> name slave1 and slave2, I think I need these aliases in the hosts
=> file. I've tested with and without, but nothing. Is this setting
=> supposed to be done ? (there is no DNS in the LAN)
=>
I think the documentation and comments in the ha.cf file are quite clear on
this: the names you give the nodes must be the real names of the systems
(the name you see if you execute `uname -n`).

Do not use hosts file aliases: they do not work consistently (been
there, checked that; better to follow the documentation).
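
This comparison can also be scripted. Below is a small sketch in
Python (the helper names are hypothetical, and it assumes node names
must match exactly, including case) that reads the node directives from
ha.cf text and checks them against the name the kernel reports, which
is the same value `uname -n` prints:

```python
import os


def configured_nodes(ha_cf_text):
    """Collect every name listed on 'node' directives in ha.cf text."""
    nodes = []
    for line in ha_cf_text.splitlines():
        # Ignore comments; a node line may list more than one name.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "node":
            nodes.extend(fields[1:])
    return nodes


def local_name_is_configured(ha_cf_text, local_name=None):
    """True if this machine's `uname -n` name appears as a node in ha.cf."""
    if local_name is None:
        local_name = os.uname().nodename  # same value as `uname -n`
    return local_name in configured_nodes(ha_cf_text)
```

If `uname -n` returned something like a fully qualified name while
ha.cf only lists slave1, this check would return False on that node.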

Hendrik le Roux
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
email: mailto:hendrik.p.leroux[at]siemens.com



diego.defelice at gmail

Aug 3, 2005, 12:30 AM

Post #5 of 5
Re: Newbie question again

No luck: it doesn't work without hosts file aliases either. The problem
seems to be that slave1 is not able to acquire its own resources! In
fact, look at the last lines of the slave1 log:

heartbeat[8083]: 2005/08/02_09:37:37 WARN: node slave2: is dead
heartbeat[8083]: 2005/08/02_09:37:37 info: Local status now set to: 'active'
heartbeat[8083]: 2005/08/02_09:37:37 WARN: No STONITH device configured.
heartbeat[8083]: 2005/08/02_09:37:37 WARN: Shared disks are not protected.
heartbeat[8083]: 2005/08/02_09:37:37 info: Resources being acquired from slave2.
harc[8096]: 2005/08/02_09:37:37 info: Running /etc/ha.d/rc.d/status status
mach_down[8106]: 2005/08/02_09:37:37 info:
/usr/lib/heartbeat/mach_down: nice_failback: foreign resources
acquired
mach_down[8106]: 2005/08/02_09:37:37 info: mach_down takeover complete
for node slave2.
heartbeat[8083]: 2005/08/02_09:37:37 info: Exiting status process 8096
returned rc 0.
req_resource[8139]: 2005/08/02_09:37:37 debug: in
/usr/lib/heartbeat/req_resource 10.10.1.45
req_resource[8139]: 2005/08/02_09:37:37 debug: dont_ask: yes nice_failback: yes
heartbeat[8120]: 2005/08/02_09:37:37 info: 1 local resources from
[/usr/lib/heartbeat/ResourceManager listkeys slave1]
heartbeat[8120]: 2005/08/02_09:37:37 info: Local Resource acquisition completed.
heartbeat[8083]: 2005/08/02_09:37:37 info: Exiting req_our_resources
process 8120 returned rc 0.
heartbeat[8083]: 2005/08/02_09:37:37 info: AnnounceTakeover(local 1,
foreign 0, reason 'req_our_resources' (0))

First it says "Resources being acquired from slave2", but slave2 has
no resources (in haresources they are assigned to slave1). Then the log
says "AnnounceTakeover" and nothing more. Other logs on the ML and in
the GettingStarted doc show further steps after that. Is there anything
else I can check?
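
To make the difference between the two logs easier to see, here is a
rough Python sketch (my own helper names, based only on the log format
shown in this thread) that rejoins the wrapped lines and lists the
commands ResourceManager reports running:

```python
import re

# A log entry starts with "name[pid]: timestamp "; anything else is a
# continuation of the previous entry that was wrapped by the mailer.
ENTRY_START = re.compile(r"^\w+\[\d+\]: \S+ ")
RUNNING = re.compile(r"^ResourceManager\[\d+\]: \S+ info: Running (.+)$")


def join_wrapped(lines):
    """Merge wrapped continuation lines back into single log entries."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def resource_commands(lines):
    """Return the commands that ResourceManager logged as 'Running ...'."""
    commands = []
    for entry in join_wrapped(lines):
        match = RUNNING.match(entry)
        if match:
            commands.append(match.group(1))
    return commands
```

Fed the slave2 log from the first post, this returns the IPaddr and
smb start commands; fed the slave1 excerpt above, it returns an empty
list, which matches the observation that slave1 never starts its
resources.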

On 8/3/05, Le Roux, Hendrik <hendrik.p.leroux[at]siemens.com> wrote:
>
>
> I think the documentation and comments in the ha.cf file are quite clear on
> this: the names you give the nodes must be the real names of the systems
> (the name you see if you execute `uname -n`).
>
> Do not use host file aliasses - it does not work consistently (been there,
> checked that - rather listen to the documentation).
>

--
Diego de Felice
