
welisson at conectcor
Nov 5, 2007, 9:27 AM
Post #9 of 9
Thanks. I will check my kernel version and compile a different kernel version.

Regards
Welisson

On Mon 05 Nov 2007 13:36, Yan Fitterer wrote:
> Welisson wrote:
> > Hi all,
> >
> > I still have the same heartbeat problem described in the e-mail
> > below. I tested the cable, I increased the value of deadtime, and
> > nothing solved it. I would like to know whether this could be a
> > kernel problem, because the main server runs kernel 2.6.18, the
> > Debian etch default, and the secondary (Conectiva 10) runs a
> > self-compiled 2.6.12.2.
> >
> > What could it be in relation to the kernel?
>
> As Dejan said already:
> > This indicates one of three possible problems: flakey
> > communications, high load, or a kernel scheduler problem.
>
> So - Yes, it _could_ be a kernel issue. Have you ruled out the other two
> possible causes? If not, you should probably start there (as they are
> typically easier to identify / fix, and, if relevant, they MUST be fixed
> if you want a stable cluster anyway). If comms are clean and load is not
> the problem, then re-visit kernel issue.
>
> > Regards
> >
> > Welisson
> >
> > On Mon 29 Oct 2007 10:53, Dejan Muhamedagic wrote:
> >> Hi,
> >>
> >> On Sun, Oct 28, 2007 at 11:19:32PM -0300, welisson [at] conectcor wrote:
> >>> Hi all.
> >>>
> >>>
> >>> I have 2 servers, set up to work as firewalls, with the following
> >>> configuration.
> >>>
> >>> Server Master
> >>> P4 3.0HT
> >>> 2GB Ram
> >>> 4 HD (2 used for the system and 2 for squid cache, firewall, shaper and BGP-4)
> >>> Motherboard Intel
> >>>
> >>>
> >>> Server Slave
> >>> P4 2.0
> >>> 1GB Ram
> >>> 2 HD
> >>> Motherboard Intel; without squid, but used for firewall, shaper and BGP-4
> >>>
> >>> What happens is the following: I have heartbeat installed on both
> >>> servers, and for some days now heartbeat has been going down and
> >>> coming back, as shown in the log below, recorded on the main server:
> >>>
> >>>
> >>> Oct 22 21:10:53 gateway heartbeat[19084]: WARN: Late heartbeat: Node gateway2.domain.com.br: interval 12530 ms
> >>> Oct 22 22:20:37 gateway heartbeat[19084]: WARN: node gateway2.domain.com.br: is dead
> >>> Oct 22 22:20:37 gateway heartbeat[19084]: WARN: No STONITH device configured.
> >>> Oct 22 22:20:37 gateway heartbeat[19084]: WARN: Shared disks are not protected.
> >>> Oct 22 22:20:37 gateway heartbeat[19084]: info: Resources being acquired from gateway2.domain.com.br.
> >>> Oct 22 22:20:37 gateway heartbeat[19084]: info: Link gateway2.domain.com.br:/dev/ttyS0 dead.
> >>> Oct 22 22:20:38 gateway heartbeat: info: Running /etc/ha.d/rc.d/status status
> >>> Oct 22 22:20:38 gateway heartbeat: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
> >>> Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Cluster node gateway2.domain.com.br returning after partition.
> >>> Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Deadtime value may be too small.
> >>> Oct 22 22:20:42 gateway heartbeat[19084]: info: See documentation for information on tuning deadtime.
> >>> Oct 22 22:20:42 gateway heartbeat[19084]: info: Link gateway2.domain.com.br:/dev/ttyS0 up.
> >>> Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Late heartbeat: Node gateway2.domain.com.br: interval 35790 ms
> >>
> >> This indicates one of three possible problems: flakey
> >> communications, high load, or a kernel scheduler problem.
> >>
> >> Thanks,
> >>
> >> Dejan
> >>
> >>> Oct 22 22:20:42 gateway heartbeat[19084]: info: Status update for node gateway2.domain.com.br: status active
> >>> Oct 22 22:20:42 gateway heartbeat[19084]: info: mach_down takeover complete.
> >>> Oct 22 22:20:42 gateway heartbeat: info: mach_down takeover complete for node gateway2.domain.com.br.
> >>> Oct 22 22:20:42 gateway heartbeat[14883]: info: Local Resource acquisition completed.
> >>> Oct 22 22:20:42 gateway heartbeat: info: Running /etc/ha.d/rc.d/status status
> >>> Oct 22 22:20:44 gateway heartbeat[19084]: info: Heartbeat shutdown in progress. (19084)
> >>> Oct 22 22:20:44 gateway heartbeat[16667]: info: Giving up all HA resources.
> >>> Oct 22 22:20:44 gateway heartbeat: info: Releasing resource group: gateway.domain.com.br 200.xxx.xxx.xxx/30/eth0 200.xxx.xxx.x6/30/eth1 200.xxx.xxx.x7/29/eth2 firewall shaper
> >>> Oct 22 22:20:44 gateway heartbeat: info: Running /etc/init.d/shaper stop
> >>> Oct 22 22:20:46 gateway heartbeat: info: Running /etc/init.d/firewall stop
> >>> Oct 22 22:20:46 gateway heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x7/29/eth2 stop
> >>> Oct 22 22:20:47 gateway heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x6/30/eth1 stop
> >>> Oct 22 22:20:47 gateway heartbeat: info: /sbin/route -n del -host 200.xxx.xxx.x6
> >>> Oct 22 22:20:47 gateway heartbeat: info: /sbin/ifconfig eth1:0 down
> >>> Oct 22 22:20:47 gateway heartbeat: info: IP Address 200.xxx.xxx.x6 released
> >>> Oct 22 22:20:47 gateway heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.xxx/30/eth0 stop
> >>> Oct 22 22:20:47 gateway heartbeat[16667]: info: All HA resources relinquished.
> >>> Oct 22 22:20:47 gateway heartbeat[19084]: WARN: 1 lost packet(s) for [gateway2.domain.com.br] [239455:239457]
> >>> Oct 22 22:20:47 gateway heartbeat[19084]: info: No pkts missing from gateway2.domain.com.br!
> >>> Oct 22 22:20:48 gateway heartbeat[19084]: info: killing HBFIFO process 19086 with signal 15
> >>> Oct 22 22:20:48 gateway heartbeat[19084]: info: killing HBWRITE process 19087 with signal 15
> >>> Oct 22 22:20:48 gateway heartbeat[19084]: info: killing HBREAD process 19088 with signal 15
> >>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Core process 19088 exited. 3 remaining
> >>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Core process 19086 exited. 2 remaining
> >>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Core process 19087 exited. 1 remaining
> >>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Heartbeat shutdown complete.
> >>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Heartbeat restart triggered.
> >>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Restarting heartbeat.
> >>> Oct 22 22:20:48 gateway heartbeat[19084]: info: Performing heartbeat restart exec.
> >>> Oct 22 22:21:19 gateway heartbeat[19084]: info: **************************
> >>> Oct 22 22:21:19 gateway heartbeat[19084]: info: Configuration validated. Starting heartbeat 1.2.5
> >>> Oct 22 22:21:19 gateway heartbeat[19947]: info: heartbeat: version 1.2.5
> >>> Oct 22 22:21:19 gateway heartbeat[19947]: info: Heartbeat generation: 23
> >>> Oct 22 22:21:20 gateway heartbeat[19947]: info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
> >>> Oct 22 22:21:20 gateway heartbeat[19947]: info: pid 19947 locked in memory.
> >>> Oct 22 22:21:20 gateway heartbeat[19947]: info: Local status now set to: 'up'
> >>> Oct 22 22:21:21 gateway heartbeat[19949]: info: pid 19949 locked in memory.
> >>> Oct 22 22:21:21 gateway heartbeat[19950]: info: pid 19950 locked in memory.
> >>> Oct 22 22:21:21 gateway heartbeat[19951]: info: pid 19951 locked in memory.
> >>> Oct 22 22:21:21 gateway heartbeat[19947]: WARN: string2msg_ll: node [gateway2.domain.com.br] failed authentication
> >>> Oct 22 22:21:22 gateway heartbeat[19947]: info: Link gateway2.domain.com.br:/dev/ttyS0 up.
> >>> Oct 22 22:21:22 gateway heartbeat[19947]: info: Status update for node gateway2.domain.com.br: status active
> >>> Oct 22 22:21:22 gateway heartbeat[19947]: info: Local status now set to: 'active'
> >>> Oct 22 22:21:22 gateway heartbeat: info: Running /etc/ha.d/rc.d/status status
> >>> Oct 22 22:21:22 gateway heartbeat[19947]: info: remote resource transition completed.
> >>> Oct 22 22:21:22 gateway heartbeat[19947]: info: remote resource transition completed.
> >>> Oct 22 22:21:22 gateway heartbeat[19947]: info: Local Resource acquisition completed. (none)
> >>> Oct 22 22:21:23 gateway heartbeat[19947]: info: gateway2.domain.com.br wants to go standby [foreign]
> >>> Oct 22 22:21:35 gateway heartbeat[19947]: info: standby: acquire [foreign] resources from gateway2.domain.com.br
> >>> Oct 22 22:21:35 gateway heartbeat[19956]: info: acquire local HA resources (standby).
> >>> Oct 22 22:21:35 gateway heartbeat: info: Acquiring resource group: gateway.domain.com.br 200.xxx.xxx.xxx/30/eth0 200.xxx.xxx.x6/30/eth1 200.xxx.xxx.x7/29/eth2 firewall shaper
> >>> Oct 22 22:21:35 gateway heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.xxx/30/eth0 start
> >>> Oct 22 22:21:35 gateway heartbeat: info: /sbin/ifconfig eth0:0 200.xxx.xxx.xxx netmask 255.255.255.252 broadcast 200.208.220.131
> >>> Oct 22 22:21:35 gateway heartbeat: info: Sending Gratuitous Arp for 200.xxx.xxx.xxx on eth0:0 [eth0]
> >>> Oct 22 22:21:35 gateway heartbeat: /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-200.xxx.xxx.xxx eth0 200.xxx.xxx.xxx auto 200.xxx.xxx.xxx ffffffffffff
> >>> Oct 22 22:21:35 gateway heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x6/30/eth1 start
> >>> Oct 22 22:21:35 gateway heartbeat: info: /sbin/ifconfig eth1:0 200.xxx.xxx.x6 netmask 255.255.255.252 broadcast 200.208.223.67
> >>> Oct 22 22:21:35 gateway heartbeat: info: Sending Gratuitous Arp for 200.xxx.xxx.x6 on eth1:0 [eth1]
> >>> Oct 22 22:21:35 gateway heartbeat: /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-200.xxx.xxx.x6 eth1 200.xxx.xxx.x6 auto 200.xxx.xxx.x6 ffffffffffff
> >>> Oct 22 22:21:36 gateway heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 200.xxx.xxx.x7/29/eth2 start
> >>> Oct 22 22:21:36 gateway heartbeat: info: /sbin/ifconfig eth2:0 200.xxx.xxx.x7 netmask 255.255.255.248 broadcast 200.208.220.151
> >>> Oct 22 22:21:36 gateway heartbeat: info: Sending Gratuitous Arp for 200.xxx.xxx.x7 on eth2:0 [eth2]
> >>> Oct 22 22:21:36 gateway heartbeat: /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-200.xxx.xxx.x7 eth2 200.xxx.xxx.x7 auto 200.xxx.xxx.x7 ffffffffffff
> >>> Oct 22 22:21:36 gateway heartbeat: info: Running /etc/init.d/firewall start
> >>> Oct 22 22:21:36 gateway heartbeat: info: Running /etc/init.d/shaper start
> >>> Oct 22 22:21:41 gateway heartbeat[19956]: info: local HA resource acquisition completed (standby).
> >>> Oct 22 22:21:41 gateway heartbeat[19947]: info: Standby resource acquisition done [foreign].
> >>> Oct 22 22:21:41 gateway heartbeat[19947]: info: Initial resource acquisition complete (auto_failback)
> >>> Oct 22 22:21:41 gateway heartbeat[19947]: info: remote resource transition completed.
> >>>
> >>> ----------------------------------------------------------------
> >>> Conectcor - speed with quality
> >>> www.conectcor.com.br
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Linux-HA mailing list
> >>> Linux-HA [at] lists
> >>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >>> See also: http://linux-ha.org/ReportingProblems
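
For anyone following Yan's advice to rule out communications and load before looking at the kernel, it may help to pull the "Late heartbeat" intervals out of the log so they can be compared with the configured deadtime. The following is only a sketch in Python, assuming the log lines look exactly like the ones quoted in this thread; the function names late_heartbeats and worst_interval_ms are made up for illustration and are not part of heartbeat.

```python
import re

# Matches heartbeat warnings of the form seen in this thread, e.g.
#   "... WARN: Late heartbeat: Node gateway2.domain.com.br: interval 12530 ms"
# Assumption: the message format is the one shown in the quoted log.
LATE_RE = re.compile(r"WARN: Late heartbeat: Node ([^\s:]+): interval (\d+) ms")


def late_heartbeats(lines):
    """Return a list of (node, interval_ms) for every late-heartbeat warning."""
    found = []
    for line in lines:
        match = LATE_RE.search(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


def worst_interval_ms(lines):
    """Return the largest late-heartbeat interval in ms, or None if there is none."""
    intervals = [ms for _, ms in late_heartbeats(lines)]
    return max(intervals) if intervals else None


# Three lines copied from the log quoted in this thread.
sample = [
    "Oct 22 21:10:53 gateway heartbeat[19084]: WARN: Late heartbeat: Node gateway2.domain.com.br: interval 12530 ms",
    "Oct 22 22:20:37 gateway heartbeat[19084]: WARN: node gateway2.domain.com.br: is dead",
    "Oct 22 22:20:42 gateway heartbeat[19084]: WARN: Late heartbeat: Node gateway2.domain.com.br: interval 35790 ms",
]

print(late_heartbeats(sample))
print(worst_interval_ms(sample))
```

On the quoted log the largest interval is 35790 ms, the same entry that heartbeat itself follows with "Deadtime value may be too small". To scan a whole file, passing an open file object (for example open("/var/log/ha-log"), a path that is an assumption here) to late_heartbeats should work, since it only iterates over lines.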