
Mailing List Archive: Linux-HA: Users

New problem(s) with heartbeat 2.0.3 and STONITH

 

 



beekhof at gmail

Oct 31, 2005, 9:49 AM

Post #26 of 55 (1845 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

> Btw. pengine tells something about a memory leak.

I'll take a look at this part tomorrow.
_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


beekhof at gmail

Oct 31, 2005, 10:02 AM

Post #27 of 55 (1842 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

On 10/31/05, Alan Robertson <alanr [at] unix> wrote:
> > It's about a 2-line change in the TE where it calls stonith.
> >
> > On the other hand, if you want me using uuid_t EVERYWHERE... that's a
> > different story.
>
>
> No, no no.
>
> I just meant - let's not call it a uuid. Call it a charhandle or
> something. uniquestring or something.
>
> It's simply a nomenclature issue.

OH.

It is a UUID though, just the human-readable form, no?
Maybe node_id is a good name.


alanr at unix

Oct 31, 2005, 11:34 AM

Post #28 of 55 (1862 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

Andrew Beekhof wrote:
> On 10/31/05, Alan Robertson <alanr [at] unix> wrote:
>>> It's about a 2-line change in the TE where it calls stonith.
>>>
>>> On the other hand, if you want me using uuid_t EVERYWHERE... that's a
>>> different story.
>>
>> No, no no.
>>
>> I just meant - let's not call it a uuid. Call it a charhandle or
>> something. uniquestring or something.
>>
>> It's simply a nomenclature issue.
>
> OH.
>
> It is a UUID though, just the human-readable form, no?
> Maybe node_id is a good name.

Then maybe it should be called a uuid_string instead?

I thought this was just a unique string. I didn't realize it was the canonical
string representation of a UUID.

My bad.
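
The distinction drawn here, a binary UUID versus its canonical human-readable string, can be illustrated with a small sketch. The helper name `uuid_bytes_to_string` is hypothetical and is not part of heartbeat; it only shows the 8-4-4-4-12 text layout that a 16-byte UUID is conventionally printed in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Length of the canonical text form: 32 hex digits plus 4 hyphens. */
#define UUID_STRING_LEN 36

/*
 * Hypothetical helper (not a heartbeat function): format 16 raw UUID bytes
 * as the canonical lowercase 8-4-4-4-12 string.
 * `out` must have room for UUID_STRING_LEN + 1 bytes.
 */
static void uuid_bytes_to_string(const unsigned char bytes[16], char *out)
{
    size_t i;
    size_t pos = 0;

    assert(bytes != NULL && out != NULL);

    for (i = 0; i < 16; i++) {
        /* Hyphens are placed before bytes 4, 6, 8 and 10. */
        if (i == 4 || i == 6 || i == 8 || i == 10) {
            out[pos++] = '-';
        }
        /* Two hex digits per byte; snprintf also writes a terminating NUL. */
        snprintf(out + pos, 3, "%02x", (unsigned int)bytes[i]);
        pos += 2;
    }
    out[pos] = '\0';
}
```

Under this reading, a name such as uuid_string describes the 36-character text form, while uuid_t would remain the 16-byte binary value.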


--
Alan Robertson <alanr [at] unix>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce


peinkofe at fhm

Nov 4, 2005, 7:00 AM

Post #29 of 55 (1847 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

Hello everybody,

I just wanted to ask what the status of this problem is.
Am I supposed to provide further information?

Many thanks in advance.
Stefan Peinkofer

On Mon, Oct 31, 2005 at 10:49:11AM -0700, Alan Robertson wrote:
> peinkofe [at] fhm wrote:
> > Hello Alan,
> > On Mon, Oct 31, 2005 at 09:35:58AM -0700, Alan Robertson wrote:
> >> peinkofe [at] fhm wrote:
> >>> Hello Alan, On Mon, Oct 31, 2005 at 08:18:10AM -0700, Alan Robertson
> >>> wrote:
> >>>> peinkofe [at] fhm wrote:
> >>>>> Hello everybody, On Sun, Oct 30, 2005 at 07:29:31PM +0100,
> >>>>> peinkofe [at] fhm wrote:
> >>>>>> Yes, I just tried the current cvs version and it works.
> >>>>>> (Problem 2 (the "cannot add field to ha_msg" Error) is gone and
> >>>>>> Problem 1 seems to be solved as well)
> >>>>>>
> >>>>> Seems that I was a little bit too optimistic. Problem 1 isn't
> >>>>> solved yet. In fact it worked once and failed many times. In the
> >>>>> case which worked, a timeout of the monitor op was discovered:
> >>>>> Oct 30 19:01:46 spock lrmd: [4468]: WARN: on_op_timeout_expired:
> >>>>> TIMEOUT: operation monitor[15] on stonith::wti_nps::kill_sarek
> >>>>> for client 4469, its parameters: timeout=5000
> >>>>> ipaddr=192.168.1.205 te-target-rc=7 lrm-is-probe=true
> >>>>> password=XXXXXXX crm_feature_set=1.0.3 interval=10000 .
> >>>>>
> >>>>> Oct 30 19:01:51 spock crmd: [4469]: ERROR:
> >>>>> mask(lrm.c:do_lrm_event): LRM operation (15) monitor_10000 on
> >>>>> kill_sarek Timed Out
> >>>>>
> >>>>> Then it said that stonithd was killed by signal 11 and respawned
> >>>>> it. Oct 30 19:01:55 spock heartbeat: [4447]: ERROR: Exiting
> >>>>> /usr/lib/heartbeat/stonithd process 4467 killed by signal 11. Oct
> >>>>> 30 19:01:55 spock heartbeat: [4447]: ERROR: Exiting
> >>>>> /usr/lib/heartbeat/stonithd process 4467 dumped core
> >>>> WE NEED THE STACK TRACE FROM THIS CORE DUMP.
> >>>>
> >>> I'm sorry, I forgot. Attached are some gdb backtraces (I hope that is what
> >>> you want, since pstack on Linux seems not to support core files).
> >>>
> >>> To avoid misunderstandings: do you agree that solving the stonithd
> >>> core dump does not solve the whole problem? I mean, stonithd
> >>> recovers through the respawning mechanism, but what makes the
> >>> situation worse is that the stonith resources fail to restart and
> >>> therefore remain inactive.
> >> I agree that there are two problems.
> >>
> >> IMHO, the more serious of the two is the core dump. The other wouldn't
> >> be a problem if the stonithd hadn't needed to restart.
>
> > From my humble user's point of view it's the other way round because,
> > to overstate it, a user doesn't care that stonithd segfaults as long as the
> > cluster does what it's supposed to do.
>
> I understand. Obviously, I have a different perspective.
>
> > By the way, I personally like
> > the approach of accepting that failures occur and adding "self healing"
> > capabilities to recover, if possible.
>
> We obviously agree on that. Stuff happens.
>
> >> I don't know why the CRM didn't restart the resources when the
> >> monitor operation failed. (At least, I think it failed)
>
> The respawn should more often happen before the monitor failed - unless
> things were unlucky.
>
> > I think CRM at least tried to restart the stonith resources, and one
> > time (see the first set of the logfiles for this) it even succeeded
> > in doing so. Maybe there is a timing "problem", since in the case where
> > it succeeded, the announcement of the resource restart came after the
> > stonithd respawn announcement. In the other cases, where the restart didn't
> > succeed, it was exactly the other way round. Many thanks in advance.
>
> OK
>
> So it did succeed some times.
>
> --
> Alan Robertson <alanr [at] unix>
>
> "Openness is the foundation and preservative of friendship... Let me
> claim from you at all times your undisguised opinions." - William
> Wilberforce


alanr at unix

Nov 4, 2005, 8:26 AM

Post #30 of 55 (1844 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

peinkofe [at] fhm wrote:
> Hello everybody,
>
> I just wanted to ask how the status of this Problem is.
> Am I'm supposed to provide further infos?


Sun Jiang Dong wrote:
>
> Anyway I think the problem you met has been fixed in CVS. Please have a try.
> If you still meet it, please tell me. Thanks.

And, I put some more safeguards into the code which was implicated.
And, gshi fixed a somewhat-related problem.

Could you try again from CVS(HEAD)?

Thanks!

--
Alan Robertson <alanr [at] unix>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce


peinkofe at fhm

Nov 4, 2005, 8:46 AM

Post #31 of 55 (1846 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

Hello Alan,
On Fri, Nov 04, 2005 at 09:26:42AM -0700, Alan Robertson wrote:
> peinkofe [at] fhm wrote:
> > Hello everybody,
> >
> > I just wanted to ask how the status of this Problem is.
> > Am I'm supposed to provide further infos?
>
>
> Sun Jiang Dong wrote:
> >
> > Anyway I think the problem you met has been fixed in CVS. Please have a try.
> > If you still meet it, please tell me. Thanks.
>
That was Problem 2 (the "cannot add field to ha_msg" error), which was fixed one or two weeks ago. What I mean is Problem 1: the stonithd core dump plus the improperly handled restart of the stonith resources after the core dump.
> And, I put some more safeguards into the code which was implicated.
> And, gshi fixed a somewhat-related problem.
>
> Could you try again from CVS(HEAD)?
I tried one from 2005-11-02, but it still had problem 2. I will try again tomorrow and report the results.
>
> Thanks!
I have to thank you.
Stefan Peinkofer
>
> --
> Alan Robertson <alanr [at] unix>
>
> "Openness is the foundation and preservative of friendship... Let me
> claim from you at all times your undisguised opinions." - William
> Wilberforce


peinkofe at fhm

Nov 5, 2005, 3:17 AM

Post #32 of 55 (1850 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

Hello Alan,
On Fri, Nov 04, 2005 at 05:46:17PM +0100, peinkofe [at] fhm wrote:
> Hello Alan,
> On Fri, Nov 04, 2005 at 09:26:42AM -0700, Alan Robertson wrote:
> > peinkofe [at] fhm wrote:
> > > Hello everybody,
> > >
> > > I just wanted to ask how the status of this Problem is.
> > > Am I'm supposed to provide further infos?
> >
> >
> > Sun Jiang Dong wrote:
> > >
> > > Anyway I think the problem you met has been fixed in CVS. Please have a try.
> > > If you still meet it, please tell me. Thanks.
> >
> That was Problem 2 (cannot add field to ha_msg Error) which was fixed one or two weeks ago. What I mean is Problem 1 the stonithd coredump + not properly handled restart of the stonithd resources, after the core dump.
> > And, I put some more safeguards into the code which was implicated.
> > And, gshi fixed a somewhat-related problem.
> >
> > Could you try again from CVS(HEAD)?
> I tried one from 2005-11-2 but it had still the problem 2. I will make a new try tomorrow and report the results.
> >
I tried the recent CVS HEAD, and it still shows the same behavior.
After heartbeat had been running for some time:
Nov 5 11:47:58 sarek lrmd: [9297]: WARN: on_op_timeout_expired: TIMEOUT: operation monitor[22] on stonith::wti_nps::kill_spock for client 9298, its parameters: timeout=5000 ipaddr=192.168.1.204 te-target-rc=7 lrm-is-probe=true password=XXXXX crm_feature_set=1.0.3 interval=10000
...
Nov 5 11:48:01 sarek crmd: [9298]: ERROR: mask(lrm.c:do_lrm_event): LRM operation (22) monitor_10000 on kill_spock Timed Out
...
Nov 5 11:48:02 sarek crmd: [9298]: info: mask(lrm.c:do_lrm_rsc_op): Performing op stop on kill_spock
Nov 5 11:48:02 sarek crmd: [9298]: WARN: mask(lrm.c:do_lrm_event): LRM operation (22) monitor_10000 on kill_spock Cancelled
...
Nov 5 11:48:04 sarek crmd: [9298]: info: mask(lrm.c:do_lrm_rsc_op): Performing op start on kill_spoc
...
Nov 5 11:48:20 sarek crmd: [9298]: ERROR: mask(lrm.c:do_lrm_event): LRM operation (26) start_0 on kill_spock Error: unknown error
..
Nov 5 11:48:21 sarek crmd: [9298]: info: mask(lrm.c:do_lrm_rsc_op): Performing op stop on kill_spock
Nov 5 11:48:21 sarek stonithd: [9296]: notice: try to stop a resource kill_spock who is not in started resource queue.
Nov 5 11:48:22 sarek crmd: [9298]: info: mask(lrm.c:do_update_resource): Updating kill_spock resource definitions after stop op
...
Nov 5 11:48:24 sarek heartbeat: [9261]: WARN: Exiting /usr/lib/heartbeat/stonithd process 9296 killed by signal 11.
Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Exiting /usr/lib/heartbeat/stonithd process 9296 dumped core
Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Client /usr/lib/heartbeat/stonithd killed by signal 11.
Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Respawning client "/usr/lib/heartbeat/stonithd":
Nov 5 11:48:24 sarek heartbeat: [9261]: info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
Nov 5 11:48:24 sarek heartbeat: [17057]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 17057)
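
The last lines of this log show the parent process noticing that its child was killed by signal 11 and respawning it. As a minimal sketch of how a parent can detect that condition with `waitpid()`, assuming a POSIX system: the helper names below (`run_child_and_get_signal`, `crash_with_sigsegv`, `exit_cleanly`) are hypothetical and this is not heartbeat's actual implementation.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not heartbeat code: run child_fn in a forked child
 * and report how the child ended. Returns the number of the signal that
 * killed it, 0 if it exited on its own, or -1 if fork/waitpid failed.
 */
static int run_child_and_get_signal(void (*child_fn)(void))
{
    int status = 0;
    pid_t pid;

    assert(child_fn != NULL);

    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: disable core files so the demo leaves nothing behind. */
        struct rlimit no_core = { 0, 0 };
        (void)setrlimit(RLIMIT_CORE, &no_core);
        child_fn();
        _exit(0);
    }

    /* Parent: block until the child ends, then decode the wait status. */
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    if (WIFSIGNALED(status)) {
        fprintf(stderr, "child %ld killed by signal %d\n",
                (long)pid, WTERMSIG(status));
        return WTERMSIG(status);
    }
    return 0;
}

/* Stand-in for a crashing daemon: restore the default action, then raise SIGSEGV. */
static void crash_with_sigsegv(void)
{
    signal(SIGSEGV, SIG_DFL);
    raise(SIGSEGV);
}

/* Stand-in for a daemon that returns normally. */
static void exit_cleanly(void)
{
}
```

A supervising process could call such a helper in a loop and start the child again whenever the result is non-zero, which matches the "killed by signal 11" followed by "Respawning client" sequence in the log.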

Note: I haven't attached full logs and a core backtrace, since they look like the ones I provided in the previous mail. If you want them regardless, let me know.
BTW, at least the OCF resource script IPAddr in the recent CVS HEAD is "broken" (at least on my system). To get heartbeat working for testing the Problem 2 status, I used the ones from a CVS version from 2005-11-02. I have no time today to investigate further, but I think I will look at it more closely tomorrow evening.
Many thanks in advance.
Stefan Peinkofer

> > Thanks!
> I have to Thank.
> Stefan Peinkofer
> >
> > --
> > Alan Robertson <alanr [at] unix>
> >
> > "Openness is the foundation and preservative of friendship... Let me
> > claim from you at all times your undisguised opinions." - William
> > Wilberforce


hasjd at cn

Nov 6, 2005, 10:52 PM

Post #33 of 55 (1848 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

Andrew Beekhof wrote:
>>Btw. pengine tells something about a memory leak.
I am also seeing the memory leak in crmd and pengine. FYI.
>
>
> i'll take a look at this part tomorrow.
>

--
BRs,

Sun Jiang Dong



hasjd at cn

Nov 6, 2005, 11:45 PM

Post #34 of 55 (1845 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

peinkofe [at] fhm wrote:
> Hello Alan,
> On Fri, Nov 04, 2005 at 05:46:17PM +0100, peinkofe [at] fhm wrote:
>
>>Hello Alan,
>>On Fri, Nov 04, 2005 at 09:26:42AM -0700, Alan Robertson wrote:
>>
>>>peinkofe [at] fhm wrote:
>>>
>>>>Hello everybody,
>>>>
>>>>I just wanted to ask how the status of this Problem is.
>>>>Am I'm supposed to provide further infos?
>>>
>>>
>>>Sun Jiang Dong wrote:
>>>
>>>>Anyway I think the problem you met has been fixed in CVS. Please have a try.
>>>>If you still meet it, please tell me. Thanks.
>>>
>>That was Problem 2 (cannot add field to ha_msg Error) which was fixed one or two weeks ago. What I mean is Problem 1 the stonithd coredump + not properly handled restart of the stonithd resources, after the core dump.
>>
>>>And, I put some more safeguards into the code which was implicated.
>>>And, gshi fixed a somewhat-related problem.
>>>
>>>Could you try again from CVS(HEAD)?
>>
>>I tried one from 2005-11-2 but it had still the problem 2. I will make a new try tomorrow and report the results.
>>
> I tryed the recent CVS HEAD, and it shows still the same behavior.
> After some time heartbeat was running:
> Nov 5 11:47:58 sarek lrmd: [9297]: WARN: on_op_timeout_expired: TIMEOUT: operation monitor[22] on stonith::wti_nps::kill_spock for client 9298, its parameters: timeout=5000 ipaddr=192.168.1.204 te-target-rc=7 lrm-is-probe=true password=XXXXX crm_feature_set=1.0.3 interval=10000
> ...
> Nov 5 11:48:01 sarek crmd: [9298]: ERROR: mask(lrm.c:do_lrm_event): LRM operation (22) monitor_10000 on kill_spock Timed Out
> ...
> Nov 5 11:48:02 sarek crmd: [9298]: info: mask(lrm.c:do_lrm_rsc_op): Performing op stop on kill_spock
> Nov 5 11:48:02 sarek crmd: [9298]: WARN: mask(lrm.c:do_lrm_event): LRM operation (22) monitor_10000 on kill_spock Cancelled
> ...
> Nov 5 11:48:04 sarek crmd: [9298]: info: mask(lrm.c:do_lrm_rsc_op): Performing op start on kill_spoc
> ...
> Nov 5 11:48:20 sarek crmd: [9298]: ERROR: mask(lrm.c:do_lrm_event): LRM operation (26) start_0 on kill_spock Error: unknown error
> ..
> Nov 5 11:48:21 sarek crmd: [9298]: info: mask(lrm.c:do_lrm_rsc_op): Performing op stop on kill_spock
> Nov 5 11:48:21 sarek stonithd: [9296]: notice: try to stop a resource kill_spock who is not in started resource queue.
> Nov 5 11:48:22 sarek crmd: [9298]: info: mask(lrm.c:do_update_resource): Updating kill_spock resource definitions after stop op
> ...
> Nov 5 11:48:24 sarek heartbeat: [9261]: WARN: Exiting /usr/lib/heartbeat/stonithd process 9296 killed by signal 11.
> Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Exiting /usr/lib/heartbeat/stonithd process 9296 dumped core
> Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Client /usr/lib/heartbeat/stonithd killed by signal 11.
> Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Respawning client "/usr/lib/heartbeat/stonithd":
> Nov 5 11:48:24 sarek heartbeat: [9261]: info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
> Nov 5 11:48:24 sarek heartbeat: [17057]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 17057)
I have been puzzled by this issue (stonithd killed by signal 11) for a long time,
because it cannot be reproduced on my machine.
Fortunately for me, you can reproduce it reliably. ;-)
I made another small patch against lib/clplumbing/cl_msg.c in the current HEAD. Can you
please apply it and try again? It should help me locate the
issue more precisely. Thanks a lot in advance.

>
> Note, I haven't attached full logs + core backtrace since the look like the onesI have provided in the former mail. If you want them regardles of that, let me know.
> BTW. At least the OCF resource script IPAddr in the recent CVS HEAD is "broken" (at least for my system). To get heartbeat working for testing Problem 2 status, I used the ones from a CVS version from 2005-11-02. I have no time today to investigate further, but I think I will look at it closer towmorrow evening.
> Many thanks in advance.
BTW, I have fixed the breakage in the OCF IPAddr script.
> Stefan Peinkofer
>
>
>>> Thanks!
>>
>>I have to Thank.
>>Stefan Peinkofer
>>
>>>--
>>> Alan Robertson <alanr [at] unix>
>>>
>>>"Openness is the foundation and preservative of friendship... Let me
>>>claim from you at all times your undisguised opinions." - William
>>>Wilberforce
>

--
BRs,

Sun Jiang Dong
Attachments: cl_msg.c.bug730.patch (1.06 KB)


peinkofe at fhm

Nov 7, 2005, 3:28 AM

Post #35 of 55 (1869 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

Hello,
On Mon, Nov 07, 2005 at 03:45:43PM +0800, Sun Jiang Dong wrote:
>
>
> peinkofe [at] fhm wrote:
> > Hello Alan,
> > On Fri, Nov 04, 2005 at 05:46:17PM +0100, peinkofe [at] fhm wrote:
> >
> >>Hello Alan,
> >>On Fri, Nov 04, 2005 at 09:26:42AM -0700, Alan Robertson wrote:
> >>
> >>>peinkofe [at] fhm wrote:
> >>>
> >>>>Hello everybody,
> >>>>
> >>>>I just wanted to ask how the status of this Problem is.
> >>>>Am I'm supposed to provide further infos?
> >>>
> >>>
> >>>Sun Jiang Dong wrote:
> >>>
> >>>>Anyway I think the problem you met has been fixed in CVS. Please have a try.
> >>>>If you still meet it, please tell me. Thanks.
> >>>
> >>That was Problem 2 (cannot add field to ha_msg Error) which was fixed one or two weeks ago. What I mean is Problem 1 the stonithd coredump + not properly handled restart of the stonithd resources, after the core dump.
> >>
> >>>And, I put some more safeguards into the code which was implicated.
> >>>And, gshi fixed a somewhat-related problem.
> >>>
> >>>Could you try again from CVS(HEAD)?
> >>
> >>I tried one from 2005-11-2 but it had still the problem 2. I will make a new try tomorrow and report the results.
> >>
> > I tryed the recent CVS HEAD, and it shows still the same behavior.
> > After some time heartbeat was running:
> > Nov 5 11:47:58 sarek lrmd: [9297]: WARN: on_op_timeout_expired: TIMEOUT: operation monitor[22] on stonith::wti_nps::kill_spock for client 9298, its parameters: timeout=5000 ipaddr=192.168.1.204 te-target-rc=7 lrm-is-probe=true password=XXXXX crm_feature_set=1.0.3 interval=10000
> > ...
> > Nov 5 11:48:01 sarek crmd: [9298]: ERROR: mask(lrm.c:do_lrm_event): LRM operation (22) monitor_10000 on kill_spock Timed Out
> > ...
> > Nov 5 11:48:02 sarek crmd: [9298]: info: mask(lrm.c:do_lrm_rsc_op): Performing op stop on kill_spock
> > Nov 5 11:48:02 sarek crmd: [9298]: WARN: mask(lrm.c:do_lrm_event): LRM operation (22) monitor_10000 on kill_spock Cancelled
> > ...
> > Nov 5 11:48:04 sarek crmd: [9298]: info: mask(lrm.c:do_lrm_rsc_op): Performing op start on kill_spoc
> > ...
> > Nov 5 11:48:20 sarek crmd: [9298]: ERROR: mask(lrm.c:do_lrm_event): LRM operation (26) start_0 on kill_spock Error: unknown error
> > ..
> > Nov 5 11:48:21 sarek crmd: [9298]: info: mask(lrm.c:do_lrm_rsc_op): Performing op stop on kill_spock
> > Nov 5 11:48:21 sarek stonithd: [9296]: notice: try to stop a resource kill_spock who is not in started resource queue.
> > Nov 5 11:48:22 sarek crmd: [9298]: info: mask(lrm.c:do_update_resource): Updating kill_spock resource definitions after stop op
> > ...
> > Nov 5 11:48:24 sarek heartbeat: [9261]: WARN: Exiting /usr/lib/heartbeat/stonithd process 9296 killed by signal 11.
> > Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Exiting /usr/lib/heartbeat/stonithd process 9296 dumped core
> > Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Client /usr/lib/heartbeat/stonithd killed by signal 11.
> > Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Respawning client "/usr/lib/heartbeat/stonithd":
> > Nov 5 11:48:24 sarek heartbeat: [9261]: info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
> > Nov 5 11:48:24 sarek heartbeat: [17057]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 17057)
> I'm puzzled by this issue ( stonithd killed by signal 11 ) for a long time,
> because it's not reproduced on my machine.
> It's so fortune for me you can reproduce it stably. ;-)
In fact it is killed every time I start heartbeat. Sometimes it is killed after 4 or 5 minutes; sometimes it takes a little longer (1 hour). My subjective impression is that it takes longer if the machine has been freshly rebooted.
> I make a small patch again to current HEAD file lib/clplumbing/cl_msg.c. Can you
> please apply it and try again? This should be helpful for me to located the
> issue more further. Thanks a lots in advance.
>

OK, I used the current CVS HEAD from today. I have attached the logs of both nodes.
I'm not 100 percent sure yet, but it seems that once stonithd has segfaulted, and therefore no monitor operations are carried out anymore, it does not segfault again. So maybe the monitor operation somehow causes the segfault?
(I just wanted to mention that; perhaps it's helpful.)

BTW: I would much appreciate it if someone could get LRM (or CRM) to restart the stonith resources reliably in such a case. It may be sufficient if the stonith resources get restarted until the start operation succeeds. Is there a setting in cib.xml where I can specify retrying the restart indefinitely (or at least 100 times or so :)?
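
The behaviour asked for here, retrying the start operation until it succeeds or a limit is reached, can be sketched in a few lines. This is only an illustration of the idea; `start_with_retries` and `flaky_start` are hypothetical names, and nothing below reflects how LRM or CRM actually schedule restarts or any real cib.xml setting.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: call start_fn(ctx) until it returns 0 (success) or
 * max_attempts calls have been made. Returns the 1-based number of the
 * successful attempt, or -1 if every attempt failed.
 */
static int start_with_retries(int (*start_fn)(void *ctx), void *ctx,
                              int max_attempts)
{
    int attempt;

    assert(start_fn != NULL);

    for (attempt = 1; attempt <= max_attempts; attempt++) {
        if (start_fn(ctx) == 0) {
            return attempt;
        }
    }
    return -1;
}

/*
 * Fake start operation for demonstration: ctx points to a counter of
 * remaining failures; the call fails until that counter reaches 0.
 */
static int flaky_start(void *ctx)
{
    int *remaining_failures = ctx;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return -1;
    }
    return 0;
}
```

A real supervisor would normally wait between attempts so that a daemon that has just been respawned has time to come up before the next start is tried.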

Many thanks in advance.
Stefan Peinkofer
> >
> > Note, I haven't attached full logs + core backtrace since the look like the onesI have provided in the former mail. If you want them regardles of that, let me know.
> > BTW. At least the OCF resource script IPAddr in the recent CVS HEAD is "broken" (at least for my system). To get heartbeat working for testing Problem 2 status, I used the ones from a CVS version from 2005-11-02. I have no time today to investigate further, but I think I will look at it closer towmorrow evening.
> > Many thanks in advance.
> BTW, I fixed the broken issue of the OCF IPAddr.
> > Stefan Peinkofer
> >
> >
> >>> Thanks!
> >>
> >>I have to Thank.
> >>Stefan Peinkofer
> >>
> >>>--
> >>> Alan Robertson <alanr [at] unix>
> >>>
> >>>"Openness is the foundation and preservative of friendship... Let me
> >>>claim from you at all times your undisguised opinions." - William
> >>>Wilberforce
> >
>
> --
> BRs,
>
> Sun Jiang Dong

> Index: cl_msg.c
> ===================================================================
> RCS file: /home/cvs/linux-ha/linux-ha/lib/clplumbing/cl_msg.c,v
> retrieving revision 1.101
> diff -u -r1.101 cl_msg.c
> --- cl_msg.c 3 Nov 2005 22:28:32 -0000 1.101
> +++ cl_msg.c 7 Nov 2005 07:35:43 -0000
> @@ -1964,11 +1964,24 @@
> return HA_FAIL;
> }
>
> + /*
> + * Just for debugging bug 730, will remove it after the bug is fixed.
> + * http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=730
> + */
> + cl_log(LOG_INFO, "%s:%d: Will audit the ha_msg.", __FUNCTION__, __LINE__);
> + AUDITMSG(m);
> +
> + cl_log(LOG_INFO, "%s:%d: Will detect the status of the channel as an "
> + " indirect checking", __FUNCTION__, __LINE__);
> + cl_log(LOG_INFO, "Channel staus: %d", ch->ops->get_chan_status(ch));
> +
> if ((imsg = hamsg2ipcmsg(m, ch)) == NULL) {
> cl_log(LOG_ERR, "hamsg2ipcmsg() failure");
> return HA_FAIL;
> }
>
> + cl_log(LOG_INFO, "%s:%d: hamsg2ipcmsg() ok.", __FUNCTION__, __LINE__);
> +
> if (ch->ops->send(ch, imsg) != IPC_OK) {
> if (ch->ch_status == IPC_CONNECT) {
> snprintf(ch->failreason,MAXFAILREASON,

Attachments: messages.tar.gz (40.8 KB)
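
The debugging patch quoted in this post brackets the suspect hamsg2ipcmsg() call with log lines that record the function name and line number, so that the last message written before a crash narrows down where it happened. The same technique can be sketched in a self-contained form; `checkpoint`, `CHECKPOINT`, `convert` and `send_value` are hypothetical stand-ins, and the sketch writes to stderr rather than using heartbeat's cl_log().

```c
#include <assert.h>
#include <stdio.h>

/* Most recent checkpoint, kept so it can be inspected after a failure. */
static const char *last_func = "";
static const char *last_msg = "";
static int last_line = 0;

/* Record and print a checkpoint; called through the CHECKPOINT macro. */
static void checkpoint(const char *func, int line, const char *msg)
{
    assert(func != NULL && msg != NULL);
    last_func = func;
    last_line = line;
    last_msg = msg;
    fprintf(stderr, "%s:%d: %s\n", func, line, msg);
}

/* Captures the calling function and line, like __FUNCTION__/__LINE__ in the patch. */
#define CHECKPOINT(msg) checkpoint(__func__, __LINE__, (msg))

/* Stand-in for the suspect call: fails for negative input. */
static int convert(int value)
{
    return value < 0 ? -1 : value * 2;
}

/* Bracket the suspect call with checkpoints before and after it. */
static int send_value(int value)
{
    int converted;

    CHECKPOINT("about to convert");
    converted = convert(value);
    if (converted < 0) {
        return -1;
    }
    CHECKPOINT("convert ok");
    return converted;
}
```

If the process dies inside the bracketed call, the "about to convert" line is the last one logged; if it dies later, "convert ok" appears as well, which rules that call out.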


alanr at unix

Nov 7, 2005, 6:36 AM

Post #36 of 55 (1845 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

peinkofe [at] fhm wrote:
> Hello,
> On Mon, Nov 07, 2005 at 03:45:43PM +0800, Sun Jiang Dong wrote:
>>
>> peinkofe [at] fhm wrote:

Unless your messages files are really huge, please just attach them as
text/plain files as suggested by the link below.

> See also: http://linux-ha.org/ReportingProblems

Over a month I look at many dozens of logs, and it's much more
convenient to look at them inline (which I can't do if they're tar files).

--
Alan Robertson <alanr [at] unix>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce


peinkofe at fhm

Nov 7, 2005, 6:59 AM

Post #37 of 55 (1843 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

Hello Alan,
On Mon, Nov 07, 2005 at 07:36:23AM -0700, Alan Robertson wrote:
> peinkofe [at] fhm wrote:
> > Hello,
> > On Mon, Nov 07, 2005 at 03:45:43PM +0800, Sun Jiang Dong wrote:
> >>
> >> peinkofe [at] fhm wrote:
>
> Unless your messages files are really huge, please just attach them as
> text/plain files as suggested by the link below.
>
> > See also: http://linux-ha.org/ReportingProblems
>
> Over a month I look at many dozens of logs, and it's much more
> convenient to look at them inline (which I can't do if they're tar files).
>
I know this, and I think I have attached them as plain text whenever I was able to (if not, I apologize). But since heartbeat logs quickly grow past the 100 KB boundary (at least on the DC), I'm forced to compress them.
But I will double-check from now on.
Or did you mean that I should not provide the whole log, from heartbeat startup until the failure?
(I thought this would be a good idea, since something may be going wrong long before the failure occurs. If I'm wrong, I'm sorry about that.)

Many thanks in advance.
Stefan Peinkofer
> Alan Robertson <alanr [at] unix>
>
> "Openness is the foundation and preservative of friendship... Let me
> claim from you at all times your undisguised opinions." - William
> Wilberforce


alanr at unix

Nov 7, 2005, 7:37 AM

Post #38 of 55 (1843 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

peinkofe [at] fhm wrote:
> Hello Alan,
> On Mon, Nov 07, 2005 at 07:36:23AM -0700, Alan Robertson wrote:
>> peinkofe [at] fhm wrote:
>>> Hello,
>>> On Mon, Nov 07, 2005 at 03:45:43PM +0800, Sun Jiang Dong wrote:
>>>> peinkofe [at] fhm wrote:
>> Unless your messages files are really huge, please just attach them as
>> text/plain files as suggested by the link below.
>>
>> > See also: http://linux-ha.org/ReportingProblems
>>
>> Over a month I look at many dozens of logs, and it's much more
>> convenient to look at them inline (which I can't do if they're tar files).
>>
> I know this, and I think I have attached them in plain, whenever I was able to (if not, I appologize for it). But since heartbeat logs grow fast over the 100kb boundary (at least on the DC) I'm forced to compress them.
> But I will double check it from now on.
> Or did you mean, I should provide not the whole log, from heartbeat startup until the failure?
> (I though this would be a good idea, since something may be going wrong long before the failure occours. If i'm wrong, i'm sorry about that.)

I just meant to attach them as plain text when you can. It was just a
general comment. It doesn't sound like it was appropriate in your case.





--
Alan Robertson <alanr [at] unix>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce


hasjd at cn

Nov 7, 2005, 10:06 PM

Post #39 of 55 (1852 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

peinkofe [at] fhm wrote:
> Hello,
> On Mon, Nov 07, 2005 at 03:45:43PM +0800, Sun Jiang Dong wrote:
>
>>
>>peinkofe [at] fhm wrote:
>>
>>>Hello Alan,
>>>On Fri, Nov 04, 2005 at 05:46:17PM +0100, peinkofe [at] fhm wrote:
>>>
>>>
>>>>Hello Alan,
>>>>On Fri, Nov 04, 2005 at 09:26:42AM -0700, Alan Robertson wrote:
>>>>
>>>>
>>>>>peinkofe [at] fhm wrote:
>>>>>
>>>>>
>>>>>>Hello everybody,
>>>>>>
>>>>>>I just wanted to ask what the status of this problem is.
>>>>>>Am I supposed to provide further information?
>>>>>
>>>>>
>>>>>Sun Jiang Dong wrote:
>>>>>
>>>>>
>>>>>>Anyway, I think the problem you ran into has been fixed in CVS. Please give it a try.
>>>>>>If you still see it, please tell me. Thanks.
>>>>>
>>>>That was Problem 2 (the "cannot add field to ha_msg" error), which was fixed one or two weeks ago. What I mean is Problem 1: the stonithd core dump plus the improperly handled restart of the stonithd resources after the core dump.
>>>>
>>>>
>>>>>And, I put some more safeguards into the code which was implicated.
>>>>>And, gshi fixed a somewhat-related problem.
>>>>>
>>>>>Could you try again from CVS(HEAD)?
>>>>
>>>>I tried one from 2005-11-02, but it still had Problem 2. I will try again tomorrow and report the results.
>>>>
>>>
>>>I tried the recent CVS HEAD, and it still shows the same behavior.
>>>After heartbeat had been running for some time:
>>>Nov 5 11:47:58 sarek lrmd: [9297]: WARN: on_op_timeout_expired: TIMEOUT: operation monitor[22] on stonith::wti_nps::kill_spock for client 9298, its parameters: timeout=5000 ipaddr=192.168.1.204 te-target-rc=7 lrm-is-probe=true password=XXXXX crm_feature_set=1.0.3 interval=10000
>>>...
>>>Nov 5 11:48:01 sarek crmd: [9298]: ERROR: mask(lrm.c:do_lrm_event): LRM operation (22) monitor_10000 on kill_spock Timed Out
>>>...
>>>Nov 5 11:48:02 sarek crmd: [9298]: info: mask(lrm.c:do_lrm_rsc_op): Performing op stop on kill_spock
>>>Nov 5 11:48:02 sarek crmd: [9298]: WARN: mask(lrm.c:do_lrm_event): LRM operation (22) monitor_10000 on kill_spock Cancelled
>>>...
>>>Nov 5 11:48:04 sarek crmd: [9298]: info: mask(lrm.c:do_lrm_rsc_op): Performing op start on kill_spoc
>>>...
>>>Nov 5 11:48:20 sarek crmd: [9298]: ERROR: mask(lrm.c:do_lrm_event): LRM operation (26) start_0 on kill_spock Error: unknown error
>>>..
>>>Nov 5 11:48:21 sarek crmd: [9298]: info: mask(lrm.c:do_lrm_rsc_op): Performing op stop on kill_spock
>>>Nov 5 11:48:21 sarek stonithd: [9296]: notice: try to stop a resource kill_spock who is not in started resource queue.
>>>Nov 5 11:48:22 sarek crmd: [9298]: info: mask(lrm.c:do_update_resource): Updating kill_spock resource definitions after stop op
>>>...
>>>Nov 5 11:48:24 sarek heartbeat: [9261]: WARN: Exiting /usr/lib/heartbeat/stonithd process 9296 killed by signal 11.
>>>Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Exiting /usr/lib/heartbeat/stonithd process 9296 dumped core
>>>Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Client /usr/lib/heartbeat/stonithd killed by signal 11.
>>>Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Respawning client "/usr/lib/heartbeat/stonithd":
>>>Nov 5 11:48:24 sarek heartbeat: [9261]: info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
>>>Nov 5 11:48:24 sarek heartbeat: [17057]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 17057)
>>
>>I have been puzzled by this issue (stonithd killed by signal 11) for a long time,
>>because it cannot be reproduced on my machine.
>>I'm fortunate that you can reproduce it reliably. ;-)
>
> In fact it is killed every time I start heartbeat. Sometimes it is killed after 4 or 5 minutes; sometimes it takes a little longer (1 hour). (My subjective impression is that it takes longer if the machine is freshly rebooted.)
>
>>I made another small patch against the current HEAD file lib/clplumbing/cl_msg.c. Can you
>>please apply it and try again? This should help me locate the
>>issue more precisely. Thanks a lot in advance.
>>
>
>
> OK, I used the current CVS HEAD from today. I have attached the logs of both nodes.
> I'm not 100 percent sure yet, but it seems that once stonithd has segfaulted, and therefore no more monitor operations are carried out, it does not segfault again. So maybe the monitor operation causes the segfault somehow?
> (I just wanted to mention that; perhaps it's helpful.)
Thanks so much for your help.
Also, did you apply the small patch I sent as an attachment? I cannot see its
output in the logs.
And from the log you attached, it seems that this time the issue has a different
cause than the last one. I added several memory-initialization statements
in CVS. Could you please try again? Thanks; I'm waiting for your result.
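
As background on why adding memory initialization can make a crash like this go away: a struct that comes straight from malloc() contains whatever was left on the heap, so a pointer member that is never assigned may later be dereferenced or freed as garbage, which typically ends in signal 11. The following is only a minimal sketch of the general technique. `struct op_record`, `op_record_new`, and `op_record_free` are hypothetical names for illustration and are not taken from stonithd or the clplumbing library.

```c
#include <stdlib.h>

/* Hypothetical record type for illustration; not a heartbeat structure. */
struct op_record {
    char *rsc_id;   /* owned string, or NULL when unset */
    int   timeout;  /* milliseconds */
};

/*
 * Allocate a record with every member explicitly initialized, so that
 * pointer members start out as NULL instead of leftover heap contents.
 */
struct op_record *op_record_new(void)
{
    struct op_record *rec = malloc(sizeof(*rec));

    if (rec == NULL) {
        return NULL;
    }
    rec->rsc_id = NULL;
    rec->timeout = 0;
    return rec;
}

/* Safe on a freshly created record, because free(NULL) is a no-op. */
void op_record_free(struct op_record *rec)
{
    if (rec == NULL) {
        return;
    }
    free(rec->rsc_id);
    free(rec);
}
```

Without the two assignments in `op_record_new`, calling `op_record_free` on a new record would pass an indeterminate pointer to free(), which is undefined behavior and may only crash occasionally, depending on what the heap happened to contain.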

>
> BTW: I would much appreciate it if someone could get LRM (or CRM) to restart the stonith resources reliably in such a case. It may be sufficient if the stonith resources are restarted until the start operation succeeds. Is there a setting somewhere in cib.xml where I can specify that the restart should be retried indefinitely (or at least 100 times or so)? :)
>
I'll file a bug for this, but currently only for tracking the requirement.
http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=950
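
The requested behavior (retry the start operation until it succeeds or a limit is reached) could be sketched roughly as follows. This only illustrates the requirement being tracked; `start_fn`, `retry_start`, `flaky_start`, and `max_attempts` are hypothetical names and are not part of the LRM, CRM, or cib.xml.

```c
#include <stddef.h>

/* Hypothetical callback type: returns 0 on success, non-zero on failure. */
typedef int (*start_fn)(void *ctx);

/*
 * Call start() until it succeeds or max_attempts is reached.
 * Returns the number of attempts used on success, or -1 if every
 * attempt failed or the arguments were invalid.
 */
int retry_start(start_fn start, void *ctx, int max_attempts)
{
    int attempt;

    if (start == NULL || max_attempts <= 0) {
        return -1;
    }
    for (attempt = 1; attempt <= max_attempts; attempt++) {
        if (start(ctx) == 0) {
            return attempt;
        }
    }
    return -1;
}

/*
 * Stand-in for a start operation, for demonstration only:
 * ctx points to an int holding how many more times it should fail.
 */
int flaky_start(void *ctx)
{
    int *remaining_failures = ctx;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return 1;
    }
    return 0;
}
```

A real implementation would presumably also wait between attempts and log each failure; both are left out here to keep the sketch short.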
> Many thanks in advance.
> Stefan Peinkofer
>
>>>Note: I haven't attached the full logs and core backtrace, since they look like the ones I provided in the previous mail. If you want them regardless, let me know.
>>>BTW, at least the OCF resource script IPAddr in the recent CVS HEAD is "broken" (at least on my system). To get heartbeat working for testing the status of Problem 2, I used the ones from a CVS version from 2005-11-02. I have no time today to investigate further, but I think I will look at it more closely tomorrow evening.
>>>Many thanks in advance.
>>
>>BTW, I fixed the broken issue of the OCF IPAddr.
>>
>>>Stefan Peinkofer
>>>
>>>
>>>
>>>>> Thanks!
>>>>
>>>>I have to Thank.
>>>>Stefan Peinkofer
>>>>
>>>>
>>>>>--
>>>>> Alan Robertson <alanr [at] unix>
>>>>>
>>>>>"Openness is the foundation and preservative of friendship... Let me
>>>>>claim from you at all times your undisguised opinions." - William
>>>>>Wilberforce
>>
>>--
>>BRs,
>>
>>Sun Jiang Dong
>
>
>>Index: cl_msg.c
>>===================================================================
>>RCS file: /home/cvs/linux-ha/linux-ha/lib/clplumbing/cl_msg.c,v
>>retrieving revision 1.101
>>diff -u -r1.101 cl_msg.c
>>--- cl_msg.c 3 Nov 2005 22:28:32 -0000 1.101
>>+++ cl_msg.c 7 Nov 2005 07:35:43 -0000
>>@@ -1964,11 +1964,24 @@
>> return HA_FAIL;
>> }
>>
>>+ /*
>>+ * Just for debugging bug 730, will remove it after the bug is fixed.
>>+ * http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=730
>>+ */
>>+ cl_log(LOG_INFO, "%s:%d: Will audit the ha_msg.", __FUNCTION__, __LINE__);
>>+ AUDITMSG(m);
>>+
>>+ cl_log(LOG_INFO, "%s:%d: Will detect the status of the channel as an "
>>+ " indirect checking", __FUNCTION__, __LINE__);
>>+ cl_log(LOG_INFO, "Channel staus: %d", ch->ops->get_chan_status(ch));
>>+
>> if ((imsg = hamsg2ipcmsg(m, ch)) == NULL) {
>> cl_log(LOG_ERR, "hamsg2ipcmsg() failure");
>> return HA_FAIL;
>> }
>>
>>+ cl_log(LOG_INFO, "%s:%d: hamsg2ipcmsg() ok.", __FUNCTION__, __LINE__);
>>+
>> if (ch->ops->send(ch, imsg) != IPC_OK) {
>> if (ch->ch_status == IPC_CONNECT) {
>> snprintf(ch->failreason,MAXFAILREASON,
>
>

--
BRs,

Sun Jiang Dong



peinkofe at fhm

Nov 8, 2005, 2:13 AM

Post #40 of 55 (1853 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

Hello Sun Jiang Dong,
On Tue, 2005-11-08 at 14:06 +0800, Sun Jiang Dong wrote:
>
> Thanks so much for your help.
> Besides, do you apply my small patch as the attachment? I cannot see the output
> from the small patch.
> And, from the log you attached, it seems the issue of this time has a different
> cause comparing to the last one. I added several memory initializing statements
> in CVS. Could you please have a try again. Thanks and waiting for your result.
>
Oops, I misunderstood your mail; I thought the patch was in the CVS HEAD,
sorry. I think I will be able to apply the patch in a few hours and then
mail you the logs.
> >
> > BTW: I would much appreciate it, if someone could get LRM (or CRM) to restart the stonith resources reliably, in such a case. It's maybe sufficient if the stonith resources get restarted until the start operation succeeds. Is there somewhere a trigger in cib.xml where I can specify, try to restart infinitely? (or at least try it 100 times or so :)
> >
> I'll file a bug for this, but currently only for tracking the requirement.
> http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=950
Many thanks for that.

Stefan Peinkofer
Attachments: signature.asc (0.18 KB)


hasjd at cn

Nov 8, 2005, 2:23 AM

Post #41 of 55 (1859 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

Stefan Peinkofer wrote:
> Hello Sun Jiang Dong,
> On Tue, 2005-11-08 at 14:06 +0800, Sun Jiang Dong wrote:
>
>>Thanks so much for your help.
>>Besides, do you apply my small patch as the attachment? I cannot see the output
>>from the small patch.
>>And, from the log you attached, it seems the issue of this time has a different
>>cause comparing to the last one. I added several memory initializing statements
>>in CVS. Could you please have a try again. Thanks and waiting for your result.
>>
>
> Ups, I misunderstood your mail, I though the patch were in the CVS HEAD,
> sorry. I think I will be able to apply the patch in a few hours and then
> mail you the logs.
>
No problem. I look forward to your results.

>>>BTW: I would much appreciate it, if someone could get LRM (or CRM) to restart the stonith resources reliably, in such a case. It's maybe sufficient if the stonith resources get restarted until the start operation succeeds. Is there somewhere a trigger in cib.xml where I can specify, try to restart infinitely? (or at least try it 100 times or so :)
>>>
>>
>>I'll file a bug for this, but currently only for tracking the requirement.
>>http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=950
>
> Many thanks for that.
You're welcome.
>

--
BRs,

Sun Jiang Dong



peinkofe at fhm

Nov 8, 2005, 11:26 AM

Post #42 of 55 (1846 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

Hello Sun Jiang Dong,
On Tue, 2005-11-08 at 18:23 +0800, Sun Jiang Dong wrote:

> >>Thanks so much for your help.
> >>Besides, did you apply the small patch I attached? I cannot see its output
> >>in the log.
> >>Also, from the log you attached, it seems that this time the issue has a different
> >>cause than last time. I added several memory initialization statements
> >>in CVS. Could you please try again? Thanks, and I look forward to your result.
> >>
> >
> > Oops, I misunderstood your mail; I thought the patch was in the CVS HEAD,
> > sorry. I think I will be able to apply the patch in a few hours and then
> > mail you the logs.
> >
> No problem. Look forward to your result.
OK, I applied the patch some hours ago and started heartbeat. Somehow,
it took much longer until stonithd segfaulted (3 hours versus a few
minutes).
Since the log file is pretty huge (6.6 MB unzipped, 176 KB bzipped), I
attached only a small part of it. If you want me to mail the full logs
directly, let me know.

Many thanks in advance.
Stefan Peinkofer
>
> >>>BTW: I would much appreciate it if someone could get the LRM (or CRM) to restart the stonith resources reliably in such a case. It may be sufficient to restart the stonith resources until the start operation succeeds. Is there an option in cib.xml where I can specify that the restart should be retried indefinitely (or at least 100 times or so)? :)
> >>>
> >>
> >>I'll file a bug for this, but currently only for tracking the requirement.
> >>http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=950
> >
> > Many thanks for that.
> Welcome.
> >
> > Stefan Peinkofer
> >
> >>>Many thanks in advance.
> >>>Stefan Peinkofer
> >>>
> >>>
> >>>>>Note: I haven't attached full logs and a core backtrace, since they look like the ones I provided in my previous mail. If you want them regardless, let me know.
> >>>>>BTW: at least the OCF resource script IPAddr in the recent CVS HEAD is "broken" (at least on my system). To get heartbeat working for testing the status of Problem 2, I used the scripts from a CVS version from 2005-11-02. I have no time today to investigate further, but I think I will look at it more closely tomorrow evening.
> >>>>>Many thanks in advance.
> >>>>
> >>>>BTW, I fixed the breakage in the OCF IPAddr script.
> >>>>
> >>>>
> >>>>>Stefan Peinkofer
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>> Thanks!
> >>>>>>
> >>>>>>I have to Thank.
> >>>>>>Stefan Peinkofer
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>--
> >>>>>>> Alan Robertson <alanr [at] unix>
> >>>>>>>
> >>>>>>>"Openness is the foundation and preservative of friendship... Let me
> >>>>>>>claim from you at all times your undisguised opinions." - William
> >>>>>>>Wilberforce
> >>>>
> >>>>--
> >>>>BRs,
> >>>>
> >>>>Sun Jiang Dong
> >>>
> >>>
> >>>>Index: cl_msg.c
> >>>>===================================================================
> >>>>RCS file: /home/cvs/linux-ha/linux-ha/lib/clplumbing/cl_msg.c,v
> >>>>retrieving revision 1.101
> >>>>diff -u -r1.101 cl_msg.c
> >>>>--- cl_msg.c 3 Nov 2005 22:28:32 -0000 1.101
> >>>>+++ cl_msg.c 7 Nov 2005 07:35:43 -0000
> >>>>@@ -1964,11 +1964,24 @@
> >>>> return HA_FAIL;
> >>>> }
> >>>>
> >>>>+ /*
> >>>>+ * Just for debugging bug 730, will remove it after the bug is fixed.
> >>>>+ * http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=730
> >>>>+ */
> >>>>+ cl_log(LOG_INFO, "%s:%d: Will audit the ha_msg.", __FUNCTION__, __LINE__);
> >>>>+ AUDITMSG(m);
> >>>>+
> >>>>+ cl_log(LOG_INFO, "%s:%d: Will detect the status of the channel as an "
> >>>>+ " indirect checking", __FUNCTION__, __LINE__);
> >>>>+ cl_log(LOG_INFO, "Channel staus: %d", ch->ops->get_chan_status(ch));
> >>>>+
> >>>> if ((imsg = hamsg2ipcmsg(m, ch)) == NULL) {
> >>>> cl_log(LOG_ERR, "hamsg2ipcmsg() failure");
> >>>> return HA_FAIL;
> >>>> }
> >>>>
> >>>>+ cl_log(LOG_INFO, "%s:%d: hamsg2ipcmsg() ok.", __FUNCTION__, __LINE__);
> >>>>+
> >>>> if (ch->ops->send(ch, imsg) != IPC_OK) {
> >>>> if (ch->ch_status == IPC_CONNECT) {
> >>>> snprintf(ch->failreason,MAXFAILREASON,
> >>>
> >>>
>
Attachments: messages-sarek-edited.txt (55.7 KB)
  signature.asc (0.18 KB)


gshi at ncsa

Nov 8, 2005, 12:21 PM

Post #43 of 55 (1858 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

Nov 8 19:03:35 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will audit the ha_msg.
Nov 8 19:03:35 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
Nov 8 19:03:35 sarek heartbeat: [4018]: WARN: Exiting /usr/lib/heartbeat/stonithd process 4038 killed by signal 11.
Nov 8 19:03:35 sarek heartbeat: [4018]: ERROR: Exiting /usr/lib/heartbeat/stonithd process 4038 dumped core


So the message is fine (the audit returned), but the channel is messed up: stonithd died before it could log the channel status.
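
One way to double-check this reading on a longer log is to look at the last debug checkpoint a given process logged before it died. The following is only a minimal sketch in C, not part of heartbeat: the names `checkpoint` and `last_checkpoint` are made up for illustration, and the strings it searches for are the ones emitted by the debugging patch to msg2ipcchan quoted in this thread (including its "staus" spelling).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the debug checkpoints added by the patch. */
enum checkpoint {
    CP_NONE = 0,   /* no debug line seen for this process */
    CP_AUDIT,      /* "Will audit the ha_msg." */
    CP_DETECT,     /* "Will detect the status of the channel ..." */
    CP_STATUS,     /* "Channel staus: N" (spelling as in the patch) */
    CP_CONVERTED   /* "hamsg2ipcmsg() ok." */
};

/*
 * Return the last checkpoint logged by the process whose syslog tag is
 * `tag`, for example "stonithd: [4038]". Lines from other processes
 * are ignored.
 */
enum checkpoint last_checkpoint(const char *const *lines, size_t n,
                                const char *tag)
{
    enum checkpoint cp = CP_NONE;
    size_t i;

    assert(tag != NULL);
    for (i = 0; i < n; i++) {
        if (strstr(lines[i], tag) == NULL) {
            continue; /* line belongs to another process */
        }
        if (strstr(lines[i], "Will audit the ha_msg") != NULL) {
            cp = CP_AUDIT;
        } else if (strstr(lines[i], "Will detect the status of the channel") != NULL) {
            cp = CP_DETECT;
        } else if (strstr(lines[i], "Channel staus:") != NULL) {
            cp = CP_STATUS;
        } else if (strstr(lines[i], "hamsg2ipcmsg() ok") != NULL) {
            cp = CP_CONVERTED;
        }
    }
    return cp;
}
```

For the four log lines at the top of this post, calling `last_checkpoint` with the tag "stonithd: [4038]" yields `CP_DETECT`: the process announced the channel status check but never logged its result.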

I suspect the channel had already been destroyed when the core dump happened.
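
If the channel really was destroyed before the call, one cheap debugging aid is to poison the channel on destroy and check it before use, so that a stale channel produces a loggable error instead of a segfault. The sketch below illustrates that idea under loud assumptions: `demo_chan`, `demo_chan_ops` and all `demo_*` functions are hypothetical stand-ins, not the real clplumbing IPC types or API, and the check only works while the storage behind the channel is still valid. It cannot detect a pointer to memory that has already been freed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real clplumbing IPC types. */
#define DEMO_CHAN_MAGIC 0x43484e4cU /* marks a live channel */
#define DEMO_CHAN_DEAD  0xdeadc0deU /* marks a destroyed channel */

struct demo_chan;

struct demo_chan_ops {
    int (*get_chan_status)(struct demo_chan *ch);
};

struct demo_chan {
    unsigned int magic;              /* set on init, poisoned on destroy */
    int ch_status;
    const struct demo_chan_ops *ops;
};

static int demo_get_status(struct demo_chan *ch)
{
    return ch->ch_status;
}

static const struct demo_chan_ops demo_ops = { demo_get_status };

/* Initialise caller-owned storage as a live channel. */
void demo_chan_init(struct demo_chan *ch, int status)
{
    assert(ch != NULL);
    ch->magic = DEMO_CHAN_MAGIC;
    ch->ch_status = status;
    ch->ops = &demo_ops;
}

/* Poison the channel instead of leaving stale pointers behind. */
void demo_chan_destroy(struct demo_chan *ch)
{
    assert(ch != NULL);
    ch->magic = DEMO_CHAN_DEAD;
    ch->ops = NULL;
}

/*
 * Store the channel status in *out and return 0, or return -1 if the
 * channel is NULL, destroyed, or has no status operation.
 */
int demo_chan_status_checked(struct demo_chan *ch, int *out)
{
    assert(out != NULL);
    if (ch == NULL || ch->magic != DEMO_CHAN_MAGIC
        || ch->ops == NULL || ch->ops->get_chan_status == NULL) {
        return -1;
    }
    *out = ch->ops->get_chan_status(ch);
    return 0;
}
```

A caller would log an error and return a failure code when `demo_chan_status_checked` returns -1, rather than dereferencing `ch->ops` directly.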

-Guochun




Stefan Peinkofer wrote:

>Hello Sun Jiang Dong,
>On Tue, 2005-11-08 at 18:23 +0800, Sun Jiang Dong wrote:
>
>
>
>>>>>>>>>>Anyway I think the problem you met has been fixed in CVS. Please have a try.
>>>>>>>>>>If you still meet it, please tell me. Thanks.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>That was Problem 2 (cannot add field to ha_msg Error) which was fixed one or two weeks ago. What I mean is Problem 1 the stonithd coredump + not properly handled restart of the stonithd resources, after the core dump.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>And, I put some more safeguards into the code which was implicated.
>>>>>>>>>And, gshi fixed a somewhat-related problem.
>>>>>>>>>
>>>>>>>>>Could you try again from CVS(HEAD)?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>I tried one from 2005-11-02, but it still had Problem 2. I will try again tomorrow and report the results.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>I tried the recent CVS HEAD, and it still shows the same behavior.
>>>>>>>After heartbeat had been running for some time:
>>>>>>>Nov 5 11:47:58 sarek lrmd: [9297]: WARN: on_op_timeout_expired: TIMEOUT: operation monitor[22] on stonith::wti_nps::kill_spock for client 9298, its parameters: timeout=5000 ipaddr=192.168.1.204 te-target-rc=7 lrm-is-probe=true password=XXXXX crm_feature_set=1.0.3 interval=10000
>>>>>>>...
>>>>>>>Nov 5 11:48:01 sarek crmd: [9298]: ERROR: mask(lrm.c:do_lrm_event): LRM operation (22) monitor_10000 on kill_spock Timed Out
>>>>>>>...
>>>>>>>Nov 5 11:48:02 sarek crmd: [9298]: info: mask(lrm.c:do_lrm_rsc_op): Performing op stop on kill_spock
>>>>>>>Nov 5 11:48:02 sarek crmd: [9298]: WARN: mask(lrm.c:do_lrm_event): LRM operation (22) monitor_10000 on kill_spock Cancelled
>>>>>>>...
>>>>>>>Nov 5 11:48:04 sarek crmd: [9298]: info: mask(lrm.c:do_lrm_rsc_op): Performing op start on kill_spoc
>>>>>>>...
>>>>>>>Nov 5 11:48:20 sarek crmd: [9298]: ERROR: mask(lrm.c:do_lrm_event): LRM operation (26) start_0 on kill_spock Error: unknown error
>>>>>>>..
>>>>>>>Nov 5 11:48:21 sarek crmd: [9298]: info: mask(lrm.c:do_lrm_rsc_op): Performing op stop on kill_spock
>>>>>>>Nov 5 11:48:21 sarek stonithd: [9296]: notice: try to stop a resource kill_spock who is not in started resource queue.
>>>>>>>Nov 5 11:48:22 sarek crmd: [9298]: info: mask(lrm.c:do_update_resource): Updating kill_spock resource definitions after stop op
>>>>>>>...
>>>>>>>Nov 5 11:48:24 sarek heartbeat: [9261]: WARN: Exiting /usr/lib/heartbeat/stonithd process 9296 killed by signal 11.
>>>>>>>Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Exiting /usr/lib/heartbeat/stonithd process 9296 dumped core
>>>>>>>Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Client /usr/lib/heartbeat/stonithd killed by signal 11.
>>>>>>>Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Respawning client "/usr/lib/heartbeat/stonithd":
>>>>>>>Nov 5 11:48:24 sarek heartbeat: [9261]: info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
>>>>>>>Nov 5 11:48:24 sarek heartbeat: [17057]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 17057)
>>>>>>>
>>>>>>>
>>>>>>I have been puzzled by this issue (stonithd killed by signal 11) for a long time,
>>>>>>because I cannot reproduce it on my machine.
>>>>>>It is fortunate for me that you can reproduce it reliably. ;-)
>>>>>>
>>>>>>
>>>>>In fact, it is killed every time I start heartbeat. Sometimes it is killed after 4 or 5 minutes, sometimes it takes a bit longer (1 hour). My subjective impression is that it takes longer if the machine was freshly rebooted.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>I made another small patch against the current HEAD file lib/clplumbing/cl_msg.c. Can you
>>>>>>please apply it and try again? It should help me locate the
>>>>>>issue more precisely. Thanks a lot in advance.
>>>>>>
>>>>>>
>>>>>>
>>>>>OK, used the current CVS HEAD from today. I have attached the logs of both nodes.
>>>>>I'm not 100 percent sure yet, but it seems that once stonithd has segfaulted, and therefore no more monitor operations are carried out, it does not segfault again. So maybe the monitor operation causes the segfault somehow?
>>>>>(Just wanted to mention that, perhaps it's helpful)
>>>>>
>>>>>
>>>>Thanks so much for your help.
>>>>Besides, did you apply the small patch I attached? I cannot see its output
>>>>in the log.
>>>>Also, from the log you attached, it seems that this time the issue has a different
>>>>cause than last time. I added several memory initialization statements
>>>>in CVS. Could you please try again? Thanks, and I look forward to your result.
>>>>
>>>>
>>>>
>>>Oops, I misunderstood your mail; I thought the patch was in the CVS HEAD,
>>>sorry. I think I will be able to apply the patch in a few hours and then
>>>mail you the logs.
>>>
>>>
>>>
>>No problem. Look forward to your result.
>>
>>
>OK, I applied the patch some hours ago and started heartbeat. Somehow,
>it took much longer until stonithd segfaulted (3 hours versus a few
>minutes).
>Since the log file is pretty huge (6.6 MB unzipped, 176 KB bzipped), I
>attached only a small part of it. If you want me to mail the full logs
>directly, let me know.
>
>Many thanks in advance.
>Stefan Peinkofer
>
>
>>>>>BTW: I would much appreciate it if someone could get the LRM (or CRM) to restart the stonith resources reliably in such a case. It may be sufficient to restart the stonith resources until the start operation succeeds. Is there an option in cib.xml where I can specify that the restart should be retried indefinitely (or at least 100 times or so)? :)
>>>>>
>>>>>
>>>>>
>>>>I'll file a bug for this, but currently only for tracking the requirement.
>>>>http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=950
>>>>
>>>>
>>>Many thanks for that.
>>>
>>>
>>Welcome.
>>
>>
>>>Stefan Peinkofer
>>>
>>>
>>>
>>>>>Many thanks in advance.
>>>>>Stefan Peinkofer
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>>Note: I haven't attached full logs and a core backtrace, since they look like the ones I provided in my previous mail. If you want them regardless, let me know.
>>>>>>>BTW: at least the OCF resource script IPAddr in the recent CVS HEAD is "broken" (at least on my system). To get heartbeat working for testing the status of Problem 2, I used the scripts from a CVS version from 2005-11-02. I have no time today to investigate further, but I think I will look at it more closely tomorrow evening.
>>>>>>>Many thanks in advance.
>>>>>>>
>>>>>>>
>>>>>>BTW, I fixed the breakage in the OCF IPAddr script.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>Stefan Peinkofer
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>I have to Thank.
>>>>>>>>Stefan Peinkofer
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>--
>>>>>>>>> Alan Robertson <alanr [at] unix>
>>>>>>>>>
>>>>>>>>>"Openness is the foundation and preservative of friendship... Let me
>>>>>>>>>claim from you at all times your undisguised opinions." - William
>>>>>>>>>Wilberforce
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>--
>>>>>>BRs,
>>>>>>
>>>>>>Sun Jiang Dong
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>>Index: cl_msg.c
>>>>>>===================================================================
>>>>>>RCS file: /home/cvs/linux-ha/linux-ha/lib/clplumbing/cl_msg.c,v
>>>>>>retrieving revision 1.101
>>>>>>diff -u -r1.101 cl_msg.c
>>>>>>--- cl_msg.c 3 Nov 2005 22:28:32 -0000 1.101
>>>>>>+++ cl_msg.c 7 Nov 2005 07:35:43 -0000
>>>>>>@@ -1964,11 +1964,24 @@
>>>>>> return HA_FAIL;
>>>>>> }
>>>>>>
>>>>>>+ /*
>>>>>>+ * Just for debugging bug 730, will remove it after the bug is fixed.
>>>>>>+ * http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=730
>>>>>>+ */
>>>>>>+ cl_log(LOG_INFO, "%s:%d: Will audit the ha_msg.", __FUNCTION__, __LINE__);
>>>>>>+ AUDITMSG(m);
>>>>>>+
>>>>>>+ cl_log(LOG_INFO, "%s:%d: Will detect the status of the channel as an "
>>>>>>+ " indirect checking", __FUNCTION__, __LINE__);
>>>>>>+ cl_log(LOG_INFO, "Channel staus: %d", ch->ops->get_chan_status(ch));
>>>>>>+
>>>>>> if ((imsg = hamsg2ipcmsg(m, ch)) == NULL) {
>>>>>> cl_log(LOG_ERR, "hamsg2ipcmsg() failure");
>>>>>> return HA_FAIL;
>>>>>> }
>>>>>>
>>>>>>+ cl_log(LOG_INFO, "%s:%d: hamsg2ipcmsg() ok.", __FUNCTION__, __LINE__);
>>>>>>+
>>>>>> if (ch->ops->send(ch, imsg) != IPC_OK) {
>>>>>> if (ch->ch_status == IPC_CONNECT) {
>>>>>> snprintf(ch->failreason,MAXFAILREASON,
>>>>>>
>>>>>>
>>>>>
>>>>>
>>------------------------------------------------------------------------
>>
>>Nov 8 19:02:51 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:02:51 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_query from client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>Nov 8 19:02:51 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:02:51 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:02:51 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:02:51 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:02:53 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:02:53 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:02:53 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:02:53 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:02:53 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_query from client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>Nov 8 19:02:53 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:02:53 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:02:53 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:02:53 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:02:56 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:02:56 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:02:56 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:02:56 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:02:56 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_query from client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>Nov 8 19:02:56 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:02:56 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:02:56 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:02:56 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:02:57 sarek lrmd: [25206]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:02:57 sarek lrmd: [25206]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:02:57 sarek lrmd: [25206]: info: Channel staus: 1
>>Nov 8 19:02:57 sarek lrmd: [25206]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:02:57 sarek lrmd: [25206]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:02:57 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:02:57 sarek lrmd: [25206]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:02:57 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:02:57 sarek lrmd: [25206]: info: Channel staus: 1
>>Nov 8 19:02:57 sarek stonithd: [4038]: info: Channel staus: 1
>>Nov 8 19:02:57 sarek lrmd: [25206]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:02:57 sarek stonithd: [4038]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:02:57 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:02:57 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:02:57 sarek stonithd: [4038]: info: Channel staus: 1
>>Nov 8 19:02:57 sarek stonithd: [4038]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:02:58 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:02:58 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:02:58 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:02:58 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:02:58 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_query from client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>Nov 8 19:02:58 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:02:58 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:02:58 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:02:58 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:00 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:00 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:00 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:03:00 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:00 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_query from client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>Nov 8 19:03:00 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:00 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:00 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:00 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:02 sarek lrmd: [4039]: WARN: on_op_timeout_expired: TIMEOUT: operation monitor[22] on stonith::wti_nps::kill_spock for client 4040, its parameters: timeout=5000 ipaddr=192.168.1.204 te-target-rc=7 lrm-is-probe=true password=XXXXXX crm_feature_set=1.0.3 interval=10000 .
>>Nov 8 19:03:02 sarek T/O PS:: F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
>>Nov 8 19:03:02 sarek t/o ps:: PID TTY STAT TIME COMMAND
>>Nov 8 19:03:02 sarek t/o ps:: 1 ? S 0:00 init [3]
>>Nov 8 19:03:02 sarek t/o ps:: 2 ? S 0:00 [migration/0]
>>Nov 8 19:03:02 sarek t/o ps:: 3 ? SN 0:00 [ksoftirqd/0]
>>Nov 8 19:03:02 sarek t/o ps:: 4 ? S 0:00 [migration/1]
>>Nov 8 19:03:02 sarek t/o ps:: 5 ? SN 0:00 [ksoftirqd/1]
>>Nov 8 19:03:02 sarek t/o ps:: 6 ? S 0:00 [migration/2]
>>Nov 8 19:03:02 sarek t/o ps:: 7 ? SN 0:00 [ksoftirqd/2]
>>Nov 8 19:03:02 sarek t/o ps:: 8 ? S 0:00 [migration/3]
>>Nov 8 19:03:02 sarek t/o ps:: 9 ? SN 0:00 [ksoftirqd/3]
>>Nov 8 19:03:02 sarek t/o ps:: 10 ? S< 0:00 [events/0]
>>Nov 8 19:03:02 sarek t/o ps:: 11 ? S< 0:00 [events/1]
>>Nov 8 19:03:02 sarek t/o ps:: 12 ? S< 0:00 [events/2]
>>Nov 8 19:03:02 sarek t/o ps:: 13 ? S< 0:00 [events/3]
>>Nov 8 19:03:02 sarek t/o ps:: 14 ? S< 0:00 [khelper]
>>Nov 8 19:03:02 sarek t/o ps:: 15 ? S< 0:00 [kacpid]
>>Nov 8 19:03:02 sarek t/o ps:: 41 ? S< 0:00 [kblockd/0]
>>Nov 8 19:03:02 sarek t/o ps:: 42 ? S< 0:00 [kblockd/1]
>>Nov 8 19:03:02 sarek t/o ps:: 43 ? S< 0:00 [kblockd/2]
>>Nov 8 19:03:02 sarek t/o ps:: 44 ? S< 0:00 [kblockd/3]
>>Nov 8 19:03:02 sarek t/o ps:: 54 ? S 0:00 [pdflush]
>>Nov 8 19:03:02 sarek t/o ps:: 55 ? S 0:00 [pdflush]
>>Nov 8 19:03:02 sarek t/o ps:: 57 ? S< 0:00 [aio/0]
>>Nov 8 19:03:02 sarek t/o ps:: 58 ? S< 0:00 [aio/1]
>>Nov 8 19:03:02 sarek t/o ps:: 59 ? S< 0:00 [aio/2]
>>Nov 8 19:03:02 sarek t/o ps:: 60 ? S< 0:00 [aio/3]
>>Nov 8 19:03:02 sarek t/o ps:: 45 ? S 0:00 [khubd]
>>Nov 8 19:03:02 sarek t/o ps:: 56 ? S 0:00 [kswapd0]
>>Nov 8 19:03:02 sarek t/o ps:: 133 ? S 0:00 [kseriod]
>>Nov 8 19:03:02 sarek t/o ps:: 203 ? S 0:00 [scsi_eh_0]
>>Nov 8 19:03:02 sarek t/o ps:: 204 ? S 0:00 [ahd_dv_0]
>>Nov 8 19:03:02 sarek t/o ps:: 235 ? S 0:00 [scsi_eh_1]
>>Nov 8 19:03:02 sarek t/o ps:: 236 ? S 0:00 [ahd_dv_1]
>>Nov 8 19:03:02 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:02 sarek t/o ps:: 242 ? S 0:00 [scsi_eh_2]
>>Nov 8 19:03:02 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:02 sarek t/o ps:: 243 ? S< 0:00 [qla2300_2_dpc]
>>Nov 8 19:03:03 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:03:03 sarek t/o ps:: 261 ? S 0:00 [scsi_eh_3]
>>Nov 8 19:03:03 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:03 sarek t/o ps:: 262 ? S< 0:00 [qla2300_3_dpc]
>>Nov 8 19:03:03 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_query from client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>Nov 8 19:03:03 sarek t/o ps:: 270 ? S 0:00 [md2_raid1]
>>Nov 8 19:03:03 sarek t/o ps:: 272 ? S 0:00 [md1_raid1]
>>Nov 8 19:03:03 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:03 sarek t/o ps:: 273 ? S 0:02 [md0_raid1]
>>Nov 8 19:03:03 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:03 sarek t/o ps:: 274 ? D 0:46 [kjournald]
>>Nov 8 19:03:03 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:03 sarek t/o ps:: 1599 ? S<s 0:00 udevd
>>Nov 8 19:03:03 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:03 sarek t/o ps:: 1829 ? S< 0:00 [kmirrord/0]
>>Nov 8 19:03:03 sarek t/o ps:: 1830 ? S< 0:00 [kmirrord/1]
>>Nov 8 19:03:03 sarek t/o ps:: 1831 ? S< 0:00 [kmirrord/2]
>>Nov 8 19:03:03 sarek t/o ps:: 1832 ? S< 0:00 [kmirrord/3]
>>Nov 8 19:03:03 sarek t/o ps:: 1870 ? S 0:00 [kjournald]
>>Nov 8 19:03:03 sarek t/o ps:: 2645 ? Ds 0:13 syslogd -m 0
>>Nov 8 19:03:03 sarek t/o ps:: 2649 ? Ss 0:00 klogd -x
>>Nov 8 19:03:03 sarek t/o ps:: 2659 ? Ss 0:00 irqbalance
>>Nov 8 19:03:03 sarek t/o ps:: 2676 ? Ss 0:00 portmap
>>Nov 8 19:03:03 sarek t/o ps:: 2695 ? Ss 0:00 rpc.statd
>>Nov 8 19:03:03 sarek t/o ps:: 2706 ? Ss 0:00 mdadm --monitor --scan -f
>>Nov 8 19:03:04 sarek t/o ps:: 2736 ? Ss 0:00 rpc.idmapd
>>Nov 8 19:03:04 sarek t/o ps:: 2903 ? S 0:00 /usr/sbin/smartd
>>Nov 8 19:03:04 sarek t/o ps:: 2912 ? Ss 0:00 /usr/sbin/acpid
>>Nov 8 19:03:04 sarek t/o ps:: 2923 ? Ss 0:00 cupsd
>>Nov 8 19:03:04 sarek t/o ps:: 2987 ? Ss 0:00 /usr/sbin/sshd
>>Nov 8 19:03:04 sarek t/o ps:: 3006 ? Ss 0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
>>Nov 8 19:03:04 sarek t/o ps:: 3021 ? SLs 0:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid
>>Nov 8 19:03:04 sarek t/o ps:: 3039 ? Ss 0:00 sendmail: accepting connections
>>Nov 8 19:03:04 sarek t/o ps:: 3047 ? Ss 0:00 sendmail: Queue runner [at] 0:00:00 for /var/spool/clientmqueue
>>Nov 8 19:03:04 sarek t/o ps:: 3077 ? Ss 0:00 /usr/sbin/htt -retryonerror 0
>>Nov 8 19:03:04 sarek t/o ps:: 3078 ? S 0:00 htt_server -nodaemon
>>Nov 8 19:03:04 sarek t/o ps:: 3087 ? Ss 0:00 crond
>>Nov 8 19:03:04 sarek t/o ps:: 3127 ? Ss 0:00 xfs -droppriv -daemon
>>Nov 8 19:03:04 sarek t/o ps:: 3144 ? Ss 0:00 /usr/sbin/atd
>>Nov 8 19:03:04 sarek t/o ps:: 3153 ? Ssl 0:00 dbus-daemon-1 --system
>>Nov 8 19:03:04 sarek t/o ps:: 3164 ? Ss 0:00 rhnsd --interval 60
>>Nov 8 19:03:04 sarek t/o ps:: 3173 ? Ss 0:01 hald
>>Nov 8 19:03:04 sarek t/o ps:: 3183 tty1 Ss+ 0:00 /sbin/mingetty tty1
>>Nov 8 19:03:04 sarek t/o ps:: 3185 tty2 Ss+ 0:00 /sbin/mingetty tty2
>>Nov 8 19:03:04 sarek t/o ps:: 3186 tty3 Ss+ 0:00 /sbin/mingetty tty3
>>Nov 8 19:03:04 sarek t/o ps:: 3187 tty4 Ss+ 0:00 /sbin/mingetty tty4
>>Nov 8 19:03:04 sarek t/o ps:: 3188 tty5 Ss+ 0:00 /sbin/mingetty tty5
>>Nov 8 19:03:04 sarek t/o ps:: 3190 tty6 Ss+ 0:00 /sbin/mingetty tty6
>>Nov 8 19:03:04 sarek t/o ps:: 3898 ? Ss 0:00 sshd: root [at] pt/0
>>Nov 8 19:03:04 sarek t/o ps:: 3900 pts/0 Ss+ 0:00 -bash
>>Nov 8 19:03:04 sarek t/o ps:: 3953 ? Ss 0:00 sshd: root [at] pt/1
>>Nov 8 19:03:04 sarek t/o ps:: 3955 pts/1 Ss+ 0:00 -bash
>>Nov 8 19:03:04 sarek t/o ps:: 3995 pts/1 S 0:12 ha_logd: read process
>>Nov 8 19:03:04 sarek t/o ps:: 4017 pts/1 S 0:16 ha_logd: write process
>>Nov 8 19:03:04 sarek t/o ps:: 4018 ? SLs 0:24 heartbeat: master control process
>>Nov 8 19:03:04 sarek t/o ps:: 4025 ? SL 0:00 heartbeat: FIFO reader
>>Nov 8 19:03:04 sarek t/o ps:: 4026 ? SL 0:00 heartbeat: write: bcast eth3
>>Nov 8 19:03:04 sarek t/o ps:: 4027 ? SL 0:00 heartbeat: read: bcast eth3
>>Nov 8 19:03:04 sarek t/o ps:: 4028 ? SL 0:00 heartbeat: write: ping gomtuu.rz.fh-muenchen.de
>>Nov 8 19:03:05 sarek t/o ps:: 4029 ? SL 0:02 heartbeat: read: ping gomtuu.rz.fh-muenchen.de
>>Nov 8 19:03:05 sarek t/o ps:: 4030 ? SL 0:01 heartbeat: write: ping nagilum.rz.fh-muenchen.de
>>Nov 8 19:03:05 sarek t/o ps:: 4031 ? SL 0:02 heartbeat: read: ping nagilum.rz.fh-muenchen.de
>>Nov 8 19:03:05 sarek t/o ps:: 4032 ? SL 0:02 heartbeat: write: ping infotest.rz.fh-muenchen.de
>>Nov 8 19:03:05 sarek t/o ps:: 4033 ? SL 0:03 heartbeat: read: ping infotest.rz.fh-muenchen.de
>>Nov 8 19:03:05 sarek t/o ps:: 4034 ? SL 0:02 heartbeat: write: ping infotst2.rz.fh-muenchen.de
>>Nov 8 19:03:05 sarek t/o ps:: 4035 ? SL 0:02 heartbeat: read: ping infotst2.rz.fh-muenchen.de
>>Nov 8 19:03:05 sarek t/o ps:: 4036 ? S 0:00 /usr/lib/heartbeat/ccm
>>Nov 8 19:03:05 sarek t/o ps:: 4037 ? S 8:00 /usr/lib/heartbeat/cib
>>Nov 8 19:03:05 sarek t/o ps:: 4038 ? SL 0:02 /usr/lib/heartbeat/stonithd
>>Nov 8 19:03:05 sarek t/o ps:: 4039 ? S 0:01 /usr/lib/heartbeat/lrmd
>>Nov 8 19:03:05 sarek t/o ps:: 4040 ? S 0:00 /usr/lib/heartbeat/crmd
>>Nov 8 19:03:05 sarek t/o ps:: 4041 ? Ss 0:01 sshd: root [at] pt/2
>>Nov 8 19:03:05 sarek t/o ps:: 4043 pts/2 Ss 0:00 -bash
>>Nov 8 19:03:05 sarek t/o ps:: 4076 pts/2 S+ 16:10 /root/cluster/heartbeat/heartbeat_current/linux-ha-2005-11-7-2/crm/admin/.libs/lt-crm_mon -i 2
>>Nov 8 19:03:05 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:05 sarek t/o ps:: 4299 ? S 0:00 [kjournald]
>>Nov 8 19:03:05 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:05 sarek t/o ps:: 4345 ? S 0:02 /usr/postgres/bin/postmaster -D /telebase/data
>>Nov 8 19:03:05 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:03:05 sarek t/o ps:: 4370 ? S 0:00 postgres: writer process
>>Nov 8 19:03:05 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:05 sarek t/o ps:: 4371 ? S 0:00 postgres: stats buffer process
>>Nov 8 19:03:05 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_query from client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>Nov 8 19:03:05 sarek t/o ps:: 4372 ? S 0:00 postgres: stats collector process
>>Nov 8 19:03:05 sarek t/o ps:: 25206 ? S 0:00 /usr/lib/heartbeat/lrmd
>>Nov 8 19:03:05 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:05 sarek t/o ps:: 25207 ? S 0:00 /usr/lib/heartbeat/stonithd
>>Nov 8 19:03:05 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:05 sarek t/o ps:: 25231 ? S 0:00 sh -c ps axww | logger -p daemon.info -t 't/o ps:'
>>Nov 8 19:03:05 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:05 sarek t/o ps:: 25232 ? R 0:00 ps axww
>>Nov 8 19:03:05 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:06 sarek t/o ps:: 25233 ? S 0:00 logger -p daemon.info -t t/o ps:
>>Nov 8 19:03:06 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:06 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:06 sarek lrmd: [4039]: info: Channel staus: 1
>>Nov 8 19:03:06 sarek crmd: [4040]: ERROR: mask(lrm.c:do_lrm_event): LRM operation (22) monitor_10000 on kill_spock Timed Out
>>Nov 8 19:03:06 sarek lrmd: [4039]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:06 sarek crmd: [4040]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:06 sarek crmd: [4040]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:06 sarek crmd: [4040]: info: Channel staus: 1
>>Nov 8 19:03:06 sarek crmd: [4040]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:06 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_update from client 01d8a5e6-1642-4182-801b-91b74af0ee8/cib_rw
>>Nov 8 19:03:07 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:07 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:07 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:07 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:07 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:07 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:07 sarek heartbeat: [4018]: info: Channel staus: 1
>>Nov 8 19:03:07 sarek heartbeat: [4018]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:07 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:07 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:07 sarek heartbeat: [4018]: info: Channel staus: 1
>>Nov 8 19:03:07 sarek heartbeat: [4018]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:07 sarek cib: [4037]: info: mask(callbacks.c:cib_peer_callback): Processing cib_apply_diff msg (171f) from spock
>>Nov 8 19:03:07 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:07 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:08 sarek cib: [4037]: WARN: mask(io.c:initializeCib): Option suppress_cib_writes not set
>>Nov 8 19:03:08 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:03:08 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:08 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:08 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:08 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:08 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:08 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:08 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:08 sarek heartbeat: [4018]: info: Channel staus: 1
>>Nov 8 19:03:08 sarek heartbeat: [4018]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:08 sarek crmd: [4040]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:08 sarek crmd: [4040]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:08 sarek crmd: [4040]: info: Channel staus: 1
>>Nov 8 19:03:08 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:08 sarek crmd: [4040]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:09 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:09 sarek lrmd: [4039]: info: Channel staus: 1
>>Nov 8 19:03:09 sarek lrmd: [4039]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:09 sarek crmd: [4040]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:09 sarek crmd: [4040]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:09 sarek crmd: [4040]: info: Channel staus: 1
>>Nov 8 19:03:09 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:09 sarek crmd: [4040]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:09 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:09 sarek crmd: [4040]: info: mask(lrm.c:do_lrm_rsc_op): Performing op stop on kill_spock
>>Nov 8 19:03:09 sarek lrmd: [4039]: info: Channel staus: 1
>>Nov 8 19:03:09 sarek lrmd: [4039]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:09 sarek crmd: [4040]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:09 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:09 sarek crmd: [4040]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:10 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:10 sarek crmd: [4040]: info: Channel staus: 1
>>Nov 8 19:03:10 sarek lrmd: [4039]: info: Channel staus: 1
>>Nov 8 19:03:10 sarek crmd: [4040]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:10 sarek lrmd: [4039]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:10 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:10 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:10 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:10 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:10 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:03:10 sarek lrmd: [4039]: info: Channel staus: 1
>>Nov 8 19:03:10 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:11 sarek lrmd: [4039]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:11 sarek lrmd: [25254]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:11 sarek lrmd: [25254]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:11 sarek lrmd: [25254]: info: Channel staus: 1
>>Nov 8 19:03:11 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:11 sarek crmd: [4040]: WARN: mask(lrm.c:do_lrm_event): LRM operation (22) monitor_10000 on kill_spock Cancelled
>>Nov 8 19:03:11 sarek lrmd: [25254]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:11 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:11 sarek stonithd: [4038]: info: Channel staus: 1
>>Nov 8 19:03:11 sarek lrmd: [25254]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:11 sarek stonithd: [4038]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:11 sarek lrmd: [25254]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:11 sarek lrmd: [25254]: info: Channel staus: 1
>>Nov 8 19:03:11 sarek lrmd: [25254]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:12 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:12 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:12 sarek lrmd: [25254]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:12 sarek stonithd: [4038]: info: Channel staus: 1
>>Nov 8 19:03:12 sarek lrmd: [25254]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:12 sarek stonithd: [4038]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:12 sarek lrmd: [25254]: info: Channel staus: 1
>>Nov 8 19:03:12 sarek lrmd: [25254]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:12 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:12 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:13 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:13 sarek lrmd: [4039]: info: Channel staus: 1
>>Nov 8 19:03:13 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:13 sarek lrmd: [4039]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:13 sarek stonithd: [4038]: info: Channel staus: 1
>>Nov 8 19:03:13 sarek stonithd: [4038]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:13 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:13 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:13 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:03:13 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:13 sarek crmd: [4040]: info: mask(lrm.c:do_update_resource): Updating kill_spock resource definitions after stop op
>>Nov 8 19:03:13 sarek crmd: [4040]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:13 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:13 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:13 sarek crmd: [4040]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:13 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:14 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:14 sarek crmd: [4040]: info: Channel staus: 1
>>Nov 8 19:03:14 sarek lrmd: [4039]: info: Channel staus: 1
>>Nov 8 19:03:14 sarek stonithd: [4038]: info: Channel staus: 1
>>Nov 8 19:03:14 sarek crmd: [4040]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:14 sarek lrmd: [4039]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:14 sarek stonithd: [4038]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:14 sarek crmd: [4040]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:14 sarek crmd: [4040]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:14 sarek crmd: [4040]: info: Channel staus: 1
>>Nov 8 19:03:14 sarek crmd: [4040]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:14 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_update from client 01d8a5e6-1642-4182-801b-91b74af0ee8/cib_rw
>>Nov 8 19:03:15 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:15 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:15 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:15 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:15 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:15 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:15 sarek heartbeat: [4018]: info: Channel staus: 1
>>Nov 8 19:03:15 sarek heartbeat: [4018]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:15 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:15 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:15 sarek heartbeat: [4018]: info: Channel staus: 1
>>Nov 8 19:03:15 sarek heartbeat: [4018]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:15 sarek cib: [4037]: info: mask(callbacks.c:cib_peer_callback): Processing cib_apply_diff msg (1721) from spock
>>Nov 8 19:03:15 sarek cib: [4037]: WARN: mask(io.c:initializeCib): Option suppress_cib_writes not set
>>Nov 8 19:03:15 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:15 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:15 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:15 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:15 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:16 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:16 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:03:16 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:16 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:16 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:16 sarek heartbeat: [4018]: info: Channel staus: 1
>>Nov 8 19:03:16 sarek heartbeat: [4018]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:16 sarek cib: [4037]: info: mask(callbacks.c:cib_peer_callback): Processing cib_apply_diff msg (1722) from spock
>>Nov 8 19:03:16 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:16 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:16 sarek heartbeat: [4018]: info: Channel staus: 1
>>Nov 8 19:03:16 sarek heartbeat: [4018]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:16 sarek crmd: [4040]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:16 sarek crmd: [4040]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:16 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:16 sarek crmd: [4040]: info: Channel staus: 1
>>Nov 8 19:03:17 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:17 sarek crmd: [4040]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:17 sarek lrmd: [4039]: info: Channel staus: 1
>>Nov 8 19:03:17 sarek crmd: [4040]: info: mask(lrm.c:do_lrm_rsc_op): Performing op start on kill_spock
>>Nov 8 19:03:17 sarek lrmd: [4039]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:17 sarek crmd: [4040]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:17 sarek crmd: [4040]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:17 sarek crmd: [4040]: info: Channel staus: 1
>>Nov 8 19:03:17 sarek crmd: [4040]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:17 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:17 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:17 sarek lrmd: [4039]: info: Channel staus: 1
>>Nov 8 19:03:18 sarek lrmd: [4039]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:18 sarek lrmd: [25256]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:18 sarek lrmd: [25256]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:18 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:18 sarek lrmd: [25256]: info: Channel staus: 1
>>Nov 8 19:03:18 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:18 sarek lrmd: [25256]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:18 sarek stonithd: [4038]: info: Channel staus: 1
>>Nov 8 19:03:18 sarek stonithd: [4038]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:18 sarek lrmd: [25256]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:18 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:18 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:18 sarek lrmd: [25256]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:18 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:03:18 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:18 sarek lrmd: [25256]: info: Channel staus: 1
>>Nov 8 19:03:19 sarek lrmd: [25256]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:19 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:19 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:19 sarek stonithd: [4038]: info: Channel staus: 1
>>Nov 8 19:03:19 sarek stonithd: [4038]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:19 sarek cib: [4037]: WARN: mask(io.c:initializeCib): Option suppress_cib_writes not set
>>Nov 8 19:03:19 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_query from client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>Nov 8 19:03:19 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:19 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:19 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:19 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:19 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_query from client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>Nov 8 19:03:19 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:19 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:20 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:20 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_query from client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:20 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:20 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_query from client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:20 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:20 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_query from client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:21 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:21 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:21 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:21 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:21 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:03:21 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:21 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_query from client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>Nov 8 19:03:21 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:21 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:21 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:21 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:23 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:23 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:23 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:03:23 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:23 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_query from client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>Nov 8 19:03:23 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:23 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:23 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:23 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:24 sarek lrmd: [25256]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:24 sarek lrmd: [25256]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:24 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:24 sarek lrmd: [25256]: info: Channel staus: 1
>>Nov 8 19:03:24 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:24 sarek lrmd: [25256]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:24 sarek stonithd: [4038]: info: Channel staus: 1
>>Nov 8 19:03:24 sarek stonithd: [4038]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:24 sarek lrmd: [25256]: WARN: mapped the invalid return code 14.
>>Nov 8 19:03:24 sarek crmd: [4040]: ERROR: mask(lrm.c:do_lrm_event): LRM operation (26) start_0 on kill_spock Error: unknown error
>>Nov 8 19:03:24 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:24 sarek postgres[25332]: [2-1] ERROR: database "hb_rg_testdb" does not exist
>>Nov 8 19:03:24 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:24 sarek postgres[25332]: [2-2] STATEMENT: DROP DATABASE hb_rg_testdb
>>Nov 8 19:03:25 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:25 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:25 sarek crmd: [4040]: info: mask(lrm.c:do_update_resource): Updating kill_spock resource definitions after start op
>>Nov 8 19:03:25 sarek lrmd: [4039]: info: Channel staus: 1
>>Nov 8 19:03:25 sarek stonithd: [4038]: info: Channel staus: 1
>>Nov 8 19:03:25 sarek crmd: [4040]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:25 sarek lrmd: [4039]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:25 sarek stonithd: [4038]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:25 sarek crmd: [4040]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:25 sarek postgres[25339]: [2-1] NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "hb_rg_testtable_pkey" for table "hb_rg_testtable"
>>Nov 8 19:03:25 sarek crmd: [4040]: info: Channel staus: 1
>>Nov 8 19:03:25 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:25 sarek crmd: [4040]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:25 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:25 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:25 sarek lrmd: [4039]: info: Channel staus: 1
>>Nov 8 19:03:25 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:26 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:03:26 sarek lrmd: [4039]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:26 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:26 sarek crmd: [4040]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:26 sarek crmd: [4040]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:26 sarek crmd: [4040]: info: Channel staus: 1
>>Nov 8 19:03:26 sarek crmd: [4040]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:26 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_update from client 01d8a5e6-1642-4182-801b-91b74af0ee8/cib_rw
>>Nov 8 19:03:26 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:26 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:26 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:26 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:26 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:26 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:27 sarek heartbeat: [4018]: info: Channel staus: 1
>>Nov 8 19:03:27 sarek heartbeat: [4018]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:27 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:27 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:27 sarek heartbeat: [4018]: info: Channel staus: 1
>>Nov 8 19:03:27 sarek heartbeat: [4018]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:27 sarek cib: [4037]: info: mask(callbacks.c:cib_peer_callback): Processing cib_apply_diff msg (172d) from spock
>>Nov 8 19:03:27 sarek cib: [4037]: WARN: mask(io.c:initializeCib): Option suppress_cib_writes not set
>>Nov 8 19:03:27 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:27 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:27 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:27 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:27 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:27 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:27 sarek heartbeat: [4018]: info: Channel staus: 1
>>Nov 8 19:03:27 sarek heartbeat: [4018]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:28 sarek crmd: [4040]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:28 sarek crmd: [4040]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:28 sarek crmd: [4040]: info: Channel staus: 1
>>Nov 8 19:03:28 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:28 sarek crmd: [4040]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:28 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:28 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:28 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:28 sarek crmd: [4040]: info: mask(lrm.c:do_lrm_rsc_op): Performing op stop on kill_spock
>>Nov 8 19:03:28 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:03:28 sarek lrmd: [4039]: info: Channel staus: 1
>>Nov 8 19:03:28 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:29 sarek lrmd: [4039]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:29 sarek crmd: [4040]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:29 sarek crmd: [4040]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:29 sarek crmd: [4040]: info: Channel staus: 1
>>Nov 8 19:03:29 sarek crmd: [4040]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:29 sarek lrmd: [25335]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:29 sarek lrmd: [25335]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:29 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:29 sarek lrmd: [25335]: info: Channel staus: 1
>>Nov 8 19:03:29 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:29 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:30 sarek lrmd: [4039]: info: Channel staus: 1
>>Nov 8 19:03:30 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:30 sarek lrmd: [25335]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:30 sarek stonithd: [4038]: info: Channel staus: 1
>>Nov 8 19:03:30 sarek lrmd: [4039]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:30 sarek stonithd: [4038]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:30 sarek lrmd: [25335]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:30 sarek stonithd: [4038]: notice: try to stop a resource kill_spock who is not in started resource queue.
>>Nov 8 19:03:30 sarek lrmd: [25335]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:30 sarek lrmd: [25335]: info: Channel staus: 1
>>Nov 8 19:03:30 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:30 sarek lrmd: [25335]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:30 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:30 sarek stonithd: [4038]: info: Channel staus: 1
>>Nov 8 19:03:30 sarek stonithd: [4038]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:31 sarek lrmd: [25335]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:31 sarek lrmd: [25335]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:31 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:31 sarek lrmd: [25335]: info: Channel staus: 1
>>Nov 8 19:03:31 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:31 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:03:31 sarek lrmd: [25335]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:31 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:31 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:31 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:31 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:31 sarek stonithd: [4038]: info: Channel staus: 1
>>Nov 8 19:03:31 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:32 sarek stonithd: [4038]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:32 sarek lrmd: [4039]: info: Channel staus: 1
>>Nov 8 19:03:32 sarek lrmd: [4039]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:32 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:32 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:32 sarek stonithd: [4038]: info: Channel staus: 1
>>Nov 8 19:03:32 sarek stonithd: [4038]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:32 sarek crmd: [4040]: info: mask(lrm.c:do_update_resource): Updating kill_spock resource definitions after stop op
>>Nov 8 19:03:32 sarek crmd: [4040]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:32 sarek crmd: [4040]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:32 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:32 sarek crmd: [4040]: info: Channel staus: 1
>>Nov 8 19:03:32 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:33 sarek crmd: [4040]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:33 sarek lrmd: [4039]: info: Channel staus: 1
>>Nov 8 19:03:33 sarek lrmd: [4039]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:33 sarek crmd: [4040]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:33 sarek crmd: [4040]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:33 sarek crmd: [4040]: info: Channel staus: 1
>>Nov 8 19:03:33 sarek crmd: [4040]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:33 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_update from client 01d8a5e6-1642-4182-801b-91b74af0ee8/cib_rw
>>Nov 8 19:03:33 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:33 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:33 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:33 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:33 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:33 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:33 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:34 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:34 sarek heartbeat: [4018]: info: Channel staus: 1
>>Nov 8 19:03:34 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:03:34 sarek heartbeat: [4018]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:34 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:34 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:34 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:34 sarek heartbeat: [4018]: info: Channel staus: 1
>>Nov 8 19:03:34 sarek heartbeat: [4018]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:34 sarek cib: [4037]: info: mask(callbacks.c:cib_peer_callback): Processing cib_apply_diff msg (172f) from spock
>>Nov 8 19:03:34 sarek cib: [4037]: WARN: mask(io.c:initializeCib): Option suppress_cib_writes not set
>>Nov 8 19:03:34 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:34 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:34 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:35 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:35 sarek cib: [4037]: info: mask(callbacks.c:cib_common_callback): Operation cib_query from client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>Nov 8 19:03:35 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:35 sarek cib: [4037]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:35 sarek cib: [4037]: info: Channel staus: 1
>>Nov 8 19:03:35 sarek cib: [4037]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:36 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:36 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:36 sarek heartbeat: [4018]: info: Channel staus: 3
>>Nov 8 19:03:36 sarek heartbeat: [4018]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:36 sarek heartbeat: [4018]: ERROR: Client /usr/lib/heartbeat/stonithd killed by signal 11.
>>Nov 8 19:03:36 sarek heartbeat: [4018]: ERROR: Respawning client "/usr/lib/heartbeat/stonithd":
>>Nov 8 19:03:36 sarek heartbeat: [4018]: info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
>>Nov 8 19:03:36 sarek heartbeat: [25368]: info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 25368)
>>Nov 8 19:03:36 sarek stonithd: [25368]: info: Enable using logging daemon
>>Nov 8 19:03:36 sarek stonithd: [25368]: info: G_main_add_SignalHandler: Added signal handler for signal 10
>>Nov 8 19:03:36 sarek stonithd: [25368]: info: G_main_add_SignalHandler: Added signal handler for signal 12
>>Nov 8 19:03:36 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:36 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:36 sarek stonithd: [25368]: info: pid 25368 locked in memory.
>>Nov 8 19:03:36 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>Nov 8 19:03:36 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:36 sarek stonithd: [25368]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:37 sarek stonithd: [25368]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:37 sarek stonithd: [25368]: info: Channel staus: 1
>>Nov 8 19:03:37 sarek stonithd: [25368]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:37 sarek stonithd: [25368]: info: Signing in with heartbeat.
>>Nov 8 19:03:37 sarek stonithd: [25368]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:37 sarek stonithd: [25368]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:37 sarek stonithd: [25368]: info: Channel staus: 1
>>Nov 8 19:03:38 sarek stonithd: [25368]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:38 sarek stonithd: [25368]: notice: /usr/lib/heartbeat/stonithd start up successfully.
>>Nov 8 19:03:38 sarek stonithd: [25368]: info: G_main_add_SignalHandler: Added signal handler for signal 17
>>Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:38 sarek heartbeat: [4018]: info: Channel staus: 1
>>Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:38 sarek heartbeat: [4018]: info: Channel staus: 1
>>Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will audit the ha_msg.
>>Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will detect the status of the channel as an indirect checking
>>Nov 8 19:03:38 sarek heartbeat: [4018]: info: Channel staus: 1
>>Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1983: hamsg2ipcmsg() ok.
>>



hasjd at cn

Nov 9, 2005, 4:20 AM

Post #44 of 55 (1846 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

Hi Stefan Peinkofer,

I removed ZAPCHAN according to gshi's suggestion. Could you try CVS HEAD
again? Thanks a lot in advance.
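
The debugging patch quoted further down logs a marker before and after each
step of msg2ipcchan(), so the last marker a process prints before it dies
shows which step it reached. A minimal sketch of reading the logs that way,
in C. The marker strings are copied from the patch (including its "staus"
spelling); the enum, classify_line() and last_stage_for() are hypothetical
helper names made up for this illustration and are not part of heartbeat.

```c
#include <stddef.h>
#include <string.h>

/* Stages logged by the debugging patch, in the order msg2ipcchan() emits them.
 * These names are illustrative assumptions, not heartbeat identifiers. */
enum stage {
    STAGE_NONE = 0,     /* line is not one of the patch markers */
    STAGE_AUDIT,        /* "Will audit the ha_msg." */
    STAGE_CHAN_CHECK,   /* "Will detect the status of the channel ..." */
    STAGE_CHAN_STATUS,  /* "Channel staus: N" (spelling as in the patch) */
    STAGE_CONVERT_OK    /* "hamsg2ipcmsg() ok." */
};

/* Map one syslog line to the patch marker it contains, if any. */
enum stage classify_line(const char *line)
{
    if (strstr(line, "Will audit the ha_msg") != NULL)
        return STAGE_AUDIT;
    if (strstr(line, "Will detect the status of the channel") != NULL)
        return STAGE_CHAN_CHECK;
    if (strstr(line, "Channel staus:") != NULL)
        return STAGE_CHAN_STATUS;
    if (strstr(line, "hamsg2ipcmsg() ok") != NULL)
        return STAGE_CONVERT_OK;
    return STAGE_NONE;
}

/* Return the last marker printed by the process whose syslog tag is `tag`,
 * e.g. "stonithd: [4038]". Lines from other processes are ignored. */
enum stage last_stage_for(const char *const *lines, size_t n, const char *tag)
{
    enum stage last = STAGE_NONE;
    size_t i;

    for (i = 0; i < n; i++) {
        enum stage s;

        if (strstr(lines[i], tag) == NULL)
            continue;
        s = classify_line(lines[i]);
        if (s != STAGE_NONE)
            last = s;
    }
    return last;
}
```

For the excerpt quoted below, where stonithd [4038] prints the "Will audit"
and "Will detect" lines and is then killed by signal 11, calling
last_stage_for() with the tag "stonithd: [4038]" would return
STAGE_CHAN_CHECK. That would be consistent with the crash happening in the
channel status call rather than in the message audit.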

Guochun Shi wrote:
> Nov 8 19:03:35 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will
> audit the ha_msg.
> Nov 8 19:03:35 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will
> detect the status of the channel as an indirect checking
> Nov 8 19:03:35 sarek heartbeat: [4018]: WARN: Exiting
> /usr/lib/heartbeat/stonithd process 4038 killed by signal 11.
> Nov 8 19:03:35 sarek heartbeat: [4018]: ERROR: Exiting
> /usr/lib/heartbeat/stonithd process 4038 dumped core
>
>
> So the message is fine; the channel is messed up.
>
> I suspect the channel had already been destroyed when the core dump
> happened.
>
> -Guochun
>
>
>
>
> Stefan Peinkofer wrote:
>
>> Hello Sun Jiang Dong,
>> On Tue, 2005-11-08 at 18:23 +0800, Sun Jiang Dong wrote:
>>
>>
>>
>>>>>>>>>>> Anyway I think the problem you met has been fixed in CVS.
>>>>>>>>>>> Please have a try.
>>>>>>>>>>> If you still meet it, please tell me. Thanks.
>>>>>>>>>
>>>>>>>>> That was Problem 2 (the "cannot add field to ha_msg" error), which
>>>>>>>>> was fixed one or two weeks ago. What I mean is Problem 1: the
>>>>>>>>> stonithd core dump plus the improperly handled restart of the
>>>>>>>>> stonithd resources after the core dump.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> And, I put some more safeguards into the code which was
>>>>>>>>>> implicated. And, gshi fixed a somewhat-related problem.
>>>>>>>>>>
>>>>>>>>>> Could you try again from CVS(HEAD)?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I tried one from 2005-11-2 but it still had Problem 2. I
>>>>>>>>> will try again tomorrow and report the results.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I tried the recent CVS HEAD, and it still shows the same
>>>>>>>> behavior. After heartbeat had been running for some time:
>>>>>>>> Nov 5 11:47:58 sarek lrmd: [9297]: WARN: on_op_timeout_expired:
>>>>>>>> TIMEOUT: operation monitor[22] on stonith::wti_nps::kill_spock
>>>>>>>> for client 9298, its parameters: timeout=5000
>>>>>>>> ipaddr=192.168.1.204 te-target-rc=7 lrm-is-probe=true
>>>>>>>> password=XXXXX crm_feature_set=1.0.3 interval=10000 ...
>>>>>>>> Nov 5 11:48:01 sarek crmd: [9298]: ERROR:
>>>>>>>> mask(lrm.c:do_lrm_event): LRM operation (22) monitor_10000 on
>>>>>>>> kill_spock Timed Out
>>>>>>>> ...
>>>>>>>> Nov 5 11:48:02 sarek crmd: [9298]: info:
>>>>>>>> mask(lrm.c:do_lrm_rsc_op): Performing op stop on kill_spock
>>>>>>>> Nov 5 11:48:02 sarek crmd: [9298]: WARN:
>>>>>>>> mask(lrm.c:do_lrm_event): LRM operation (22) monitor_10000 on
>>>>>>>> kill_spock Cancelled
>>>>>>>> ...
>>>>>>>> Nov 5 11:48:04 sarek crmd: [9298]: info:
>>>>>>>> mask(lrm.c:do_lrm_rsc_op): Performing op start on kill_spoc
>>>>>>>> ...
>>>>>>>> Nov 5 11:48:20 sarek crmd: [9298]: ERROR:
>>>>>>>> mask(lrm.c:do_lrm_event): LRM operation (26) start_0 on
>>>>>>>> kill_spock Error: unknown error
>>>>>>>> ..
>>>>>>>> Nov 5 11:48:21 sarek crmd: [9298]: info:
>>>>>>>> mask(lrm.c:do_lrm_rsc_op): Performing op stop on kill_spock
>>>>>>>> Nov 5 11:48:21 sarek stonithd: [9296]: notice: try to stop a
>>>>>>>> resource kill_spock who is not in started resource queue.
>>>>>>>> Nov 5 11:48:22 sarek crmd: [9298]: info:
>>>>>>>> mask(lrm.c:do_update_resource): Updating kill_spock resource
>>>>>>>> definitions after stop op
>>>>>>>> ...
>>>>>>>> Nov 5 11:48:24 sarek heartbeat: [9261]: WARN: Exiting
>>>>>>>> /usr/lib/heartbeat/stonithd process 9296 killed by signal 11.
>>>>>>>> Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Exiting
>>>>>>>> /usr/lib/heartbeat/stonithd process 9296 dumped core
>>>>>>>> Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Client
>>>>>>>> /usr/lib/heartbeat/stonithd killed by signal 11.
>>>>>>>> Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Respawning
>>>>>>>> client "/usr/lib/heartbeat/stonithd":
>>>>>>>> Nov 5 11:48:24 sarek heartbeat: [9261]: info: Starting child
>>>>>>>> client "/usr/lib/heartbeat/stonithd" (0,0)
>>>>>>>> Nov 5 11:48:24 sarek heartbeat: [17057]: info: Starting
>>>>>>>> "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 17057)
>>>>>>>>
>>>>>>>
>>>>>>> I have been puzzled by this issue (stonithd killed by signal 11) for
>>>>>>> a long time, because I cannot reproduce it on my machine.
>>>>>>> It is fortunate for me that you can reproduce it reliably. ;-)
>>>>>>>
>>>>>>
>>>>>> In fact it is killed every time I start heartbeat. Sometimes it is
>>>>>> killed after 4 or 5 minutes, sometimes it takes a little longer (1 hour).
>>>>>> (My subjective impression is that it takes longer if the machine is
>>>>>> freshly rebooted.)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I made another small patch against the current HEAD file
>>>>>>> lib/clplumbing/cl_msg.c. Can you please apply it and try again?
>>>>>>> This should help me locate the issue more precisely.
>>>>>>> Thanks a lot in advance.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> OK, I used the current CVS HEAD from today. I have attached the logs
>>>>>> of both nodes.
>>>>>> I'm not 100 percent sure yet, but it seems that once stonithd has
>>>>>> segfaulted, and therefore no monitor operations are carried out
>>>>>> anymore, it does not segfault again. So maybe the monitor operation
>>>>>> causes the segfault somehow?
>>>>>> (I just wanted to mention that; perhaps it's helpful.)
>>>>>>
>>>>>
>>>>> Thanks so much for your help.
>>>>> Also, did you apply my small patch from the attachment? I cannot
>>>>> see the output from the patch.
>>>>
>>>>
>>>>
>>>>> And, from the log you attached, it seems that this time the issue has
>>>>> a different cause from the last one. I added several memory
>>>>> initialization statements in CVS. Could you please try again?
>>>>> Thanks; I am waiting for your result.
>>>>>
>>>>>
>>>>
>>>> Oops, I misunderstood your mail; I thought the patch was in the CVS
>>>> HEAD, sorry. I think I will be able to apply the patch in a few hours
>>>> and then mail you the logs.
>>>>
>>>>
>>>
>>> No problem. Look forward to your result.
>>>
>>
>> OK, I applied the patch some hours ago and started heartbeat. Somehow,
>> it took much longer until stonithd segfaulted (3 hours versus a few
>> minutes).
>> Since the log file is pretty huge (6.6 MB unzipped and 176 KB bzipped), I
>> attached only a small part of it. If you want me to mail the full logs
>> directly, let me know.
>>
>> Many thanks in advance.
>> Stefan Peinkofer
>>
>>
>>>>>> BTW: I would much appreciate it if someone could get LRM (or CRM)
>>>>>> to restart the stonith resources reliably in such a case. It may be
>>>>>> sufficient if the stonith resources get restarted until the
>>>>>> start operation succeeds. Is there a setting somewhere in cib.xml
>>>>>> where I can specify that restarts be retried indefinitely (or at
>>>>>> least 100 times or so :)?
>>>>>>
>>>>>
>>>>> I'll file a bug for this, but currently only for tracking the
>>>>> requirement.
>>>>> http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=950
>>>>>
>>>>
>>>> Many thanks for that.
>>>>
>>>
>>> Welcome.
>>>
>>>
>>>> Stefan Peinkofer
>>>>
>>>>
>>>>
>>>>>> Many thanks in advance.
>>>>>> Stefan Peinkofer
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> Note, I haven't attached full logs + core backtrace since they
>>>>>>>> look like the ones I provided in the former mail. If you
>>>>>>>> want them regardless, let me know.
>>>>>>>> BTW. At least the OCF resource script IPAddr in the recent CVS
>>>>>>>> HEAD is "broken" (at least for my system). To get heartbeat
>>>>>>>> working for testing the Problem 2 status, I used the ones from a CVS
>>>>>>>> version from 2005-11-02. I have no time today to investigate
>>>>>>>> further, but I think I will look at it more closely tomorrow evening.
>>>>>>>> Many thanks in advance.
>>>>>>>>
>>>>>>>
>>>>>>> BTW, I fixed the broken issue of the OCF IPAddr.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Stefan Peinkofer
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I have to Thank.
>>>>>>>>> Stefan Peinkofer
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Alan Robertson <alanr [at] unix>
>>>>>>>>>>
>>>>>>>>>> "Openness is the foundation and preservative of friendship...
>>>>>>>>>> Let me claim from you at all times your undisguised opinions."
>>>>>>>>>> - William Wilberforce
>>>>>>>
>>>>>>> --
>>>>>>> BRs,
>>>>>>>
>>>>>>> Sun Jiang Dong
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Index: cl_msg.c
>>>>>>> ===================================================================
>>>>>>> RCS file: /home/cvs/linux-ha/linux-ha/lib/clplumbing/cl_msg.c,v
>>>>>>> retrieving revision 1.101
>>>>>>> diff -u -r1.101 cl_msg.c
>>>>>>> --- cl_msg.c 3 Nov 2005 22:28:32 -0000 1.101
>>>>>>> +++ cl_msg.c 7 Nov 2005 07:35:43 -0000
>>>>>>> @@ -1964,11 +1964,24 @@
>>>>>>>          return HA_FAIL;
>>>>>>>      }
>>>>>>>
>>>>>>> +    /*
>>>>>>> +     * Just for debugging bug 730, will remove it after the bug is fixed.
>>>>>>> +     * http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=730
>>>>>>> +     */
>>>>>>> +    cl_log(LOG_INFO, "%s:%d: Will audit the ha_msg.", __FUNCTION__, __LINE__);
>>>>>>> +    AUDITMSG(m);
>>>>>>> +
>>>>>>> +    cl_log(LOG_INFO, "%s:%d: Will detect the status of the channel as an "
>>>>>>> +        " indirect checking", __FUNCTION__, __LINE__);
>>>>>>> +    cl_log(LOG_INFO, "Channel staus: %d", ch->ops->get_chan_status(ch));
>>>>>>> +
>>>>>>>      if ((imsg = hamsg2ipcmsg(m, ch)) == NULL) {
>>>>>>>          cl_log(LOG_ERR, "hamsg2ipcmsg() failure");
>>>>>>>          return HA_FAIL;
>>>>>>>      }
>>>>>>>
>>>>>>> +    cl_log(LOG_INFO, "%s:%d: hamsg2ipcmsg() ok.", __FUNCTION__, __LINE__);
>>>>>>> +
>>>>>>>      if (ch->ops->send(ch, imsg) != IPC_OK) {
>>>>>>>          if (ch->ch_status == IPC_CONNECT) {
>>>>>>>              snprintf(ch->failreason,MAXFAILREASON,
>>>>>>
>>>>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> Nov 8 19:02:51 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:02:51 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_query from
>>> client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>> Nov 8 19:02:51 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:02:51 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:02:51 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:02:51 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:02:53 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:02:53 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:02:53 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:02:53 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:02:53 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_query from
>>> client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>> Nov 8 19:02:53 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:02:53 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:02:53 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:02:53 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:02:56 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:02:56 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:02:56 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:02:56 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:02:56 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_query from
>>> client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>> Nov 8 19:02:56 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:02:56 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:02:56 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:02:56 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:02:57 sarek lrmd: [25206]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:02:57 sarek lrmd: [25206]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:02:57 sarek lrmd: [25206]: info: Channel staus: 1
>>> Nov 8 19:02:57 sarek lrmd: [25206]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:02:57 sarek lrmd: [25206]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:02:57 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:02:57 sarek lrmd: [25206]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:02:57 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:02:57 sarek lrmd: [25206]: info: Channel staus: 1
>>> Nov 8 19:02:57 sarek stonithd: [4038]: info: Channel staus: 1
>>> Nov 8 19:02:57 sarek lrmd: [25206]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:02:57 sarek stonithd: [4038]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:02:57 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:02:57 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:02:57 sarek stonithd: [4038]: info: Channel staus: 1
>>> Nov 8 19:02:57 sarek stonithd: [4038]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:02:58 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:02:58 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:02:58 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:02:58 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:02:58 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_query from
>>> client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>> Nov 8 19:02:58 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:02:58 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:02:58 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:02:58 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:00 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:03:00 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:03:00 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:03:00 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:00 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_query from
>>> client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>> Nov 8 19:03:00 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:00 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:00 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:00 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:02 sarek lrmd: [4039]: WARN: on_op_timeout_expired:
>>> TIMEOUT: operation monitor[22] on stonith::wti_nps::kill_spock for
>>> client 4040, its parameters: timeout=5000 ipaddr=192.168.1.204
>>> te-target-rc=7 lrm-is-probe=true password=XXXXXX
>>> crm_feature_set=1.0.3 interval=10000 .
>>> Nov 8 19:03:02 sarek T/O PS:: F S UID PID PPID C PRI NI
>>> ADDR SZ WCHAN STIME TTY TIME CMD
>>> Nov 8 19:03:02 sarek t/o ps:: PID TTY STAT TIME COMMAND
>>> Nov 8 19:03:02 sarek t/o ps:: 1 ? S 0:00 init [3]
>>> Nov 8 19:03:02 sarek t/o ps:: 2 ? S 0:00 [migration/0]
>>> Nov 8 19:03:02 sarek t/o ps:: 3 ? SN 0:00 [ksoftirqd/0]
>>> Nov 8 19:03:02 sarek t/o ps:: 4 ? S 0:00 [migration/1]
>>> Nov 8 19:03:02 sarek t/o ps:: 5 ? SN 0:00 [ksoftirqd/1]
>>> Nov 8 19:03:02 sarek t/o ps:: 6 ? S 0:00 [migration/2]
>>> Nov 8 19:03:02 sarek t/o ps:: 7 ? SN 0:00 [ksoftirqd/2]
>>> Nov 8 19:03:02 sarek t/o ps:: 8 ? S 0:00 [migration/3]
>>> Nov 8 19:03:02 sarek t/o ps:: 9 ? SN 0:00 [ksoftirqd/3]
>>> Nov 8 19:03:02 sarek t/o ps:: 10 ? S< 0:00 [events/0]
>>> Nov 8 19:03:02 sarek t/o ps:: 11 ? S< 0:00 [events/1]
>>> Nov 8 19:03:02 sarek t/o ps:: 12 ? S< 0:00 [events/2]
>>> Nov 8 19:03:02 sarek t/o ps:: 13 ? S< 0:00 [events/3]
>>> Nov 8 19:03:02 sarek t/o ps:: 14 ? S< 0:00 [khelper]
>>> Nov 8 19:03:02 sarek t/o ps:: 15 ? S< 0:00 [kacpid]
>>> Nov 8 19:03:02 sarek t/o ps:: 41 ? S< 0:00 [kblockd/0]
>>> Nov 8 19:03:02 sarek t/o ps:: 42 ? S< 0:00 [kblockd/1]
>>> Nov 8 19:03:02 sarek t/o ps:: 43 ? S< 0:00 [kblockd/2]
>>> Nov 8 19:03:02 sarek t/o ps:: 44 ? S< 0:00 [kblockd/3]
>>> Nov 8 19:03:02 sarek t/o ps:: 54 ? S 0:00 [pdflush]
>>> Nov 8 19:03:02 sarek t/o ps:: 55 ? S 0:00 [pdflush]
>>> Nov 8 19:03:02 sarek t/o ps:: 57 ? S< 0:00 [aio/0]
>>> Nov 8 19:03:02 sarek t/o ps:: 58 ? S< 0:00 [aio/1]
>>> Nov 8 19:03:02 sarek t/o ps:: 59 ? S< 0:00 [aio/2]
>>> Nov 8 19:03:02 sarek t/o ps:: 60 ? S< 0:00 [aio/3]
>>> Nov 8 19:03:02 sarek t/o ps:: 45 ? S 0:00 [khubd]
>>> Nov 8 19:03:02 sarek t/o ps:: 56 ? S 0:00 [kswapd0]
>>> Nov 8 19:03:02 sarek t/o ps:: 133 ? S 0:00 [kseriod]
>>> Nov 8 19:03:02 sarek t/o ps:: 203 ? S 0:00 [scsi_eh_0]
>>> Nov 8 19:03:02 sarek t/o ps:: 204 ? S 0:00 [ahd_dv_0]
>>> Nov 8 19:03:02 sarek t/o ps:: 235 ? S 0:00 [scsi_eh_1]
>>> Nov 8 19:03:02 sarek t/o ps:: 236 ? S 0:00 [ahd_dv_1]
>>> Nov 8 19:03:02 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:03:02 sarek t/o ps:: 242 ? S 0:00 [scsi_eh_2]
>>> Nov 8 19:03:02 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:03:02 sarek t/o ps:: 243 ? S< 0:00
>>> [qla2300_2_dpc]
>>> Nov 8 19:03:03 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:03:03 sarek t/o ps:: 261 ? S 0:00 [scsi_eh_3]
>>> Nov 8 19:03:03 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:03 sarek t/o ps:: 262 ? S< 0:00
>>> [qla2300_3_dpc]
>>> Nov 8 19:03:03 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_query from
>>> client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>> Nov 8 19:03:03 sarek t/o ps:: 270 ? S 0:00 [md2_raid1]
>>> Nov 8 19:03:03 sarek t/o ps:: 272 ? S 0:00 [md1_raid1]
>>> Nov 8 19:03:03 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:03 sarek t/o ps:: 273 ? S 0:02 [md0_raid1]
>>> Nov 8 19:03:03 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:03 sarek t/o ps:: 274 ? D 0:46 [kjournald]
>>> Nov 8 19:03:03 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:03 sarek t/o ps:: 1599 ? S<s 0:00 udevd
>>> Nov 8 19:03:03 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:03 sarek t/o ps:: 1829 ? S< 0:00 [kmirrord/0]
>>> Nov 8 19:03:03 sarek t/o ps:: 1830 ? S< 0:00 [kmirrord/1]
>>> Nov 8 19:03:03 sarek t/o ps:: 1831 ? S< 0:00 [kmirrord/2]
>>> Nov 8 19:03:03 sarek t/o ps:: 1832 ? S< 0:00 [kmirrord/3]
>>> Nov 8 19:03:03 sarek t/o ps:: 1870 ? S 0:00 [kjournald]
>>> Nov 8 19:03:03 sarek t/o ps:: 2645 ? Ds 0:13 syslogd -m 0
>>> Nov 8 19:03:03 sarek t/o ps:: 2649 ? Ss 0:00 klogd -x
>>> Nov 8 19:03:03 sarek t/o ps:: 2659 ? Ss 0:00 irqbalance
>>> Nov 8 19:03:03 sarek t/o ps:: 2676 ? Ss 0:00 portmap
>>> Nov 8 19:03:03 sarek t/o ps:: 2695 ? Ss 0:00 rpc.statd
>>> Nov 8 19:03:03 sarek t/o ps:: 2706 ? Ss 0:00 mdadm
>>> --monitor --scan -f
>>> Nov 8 19:03:04 sarek t/o ps:: 2736 ? Ss 0:00 rpc.idmapd
>>> Nov 8 19:03:04 sarek t/o ps:: 2903 ? S 0:00
>>> /usr/sbin/smartd
>>> Nov 8 19:03:04 sarek t/o ps:: 2912 ? Ss 0:00
>>> /usr/sbin/acpid
>>> Nov 8 19:03:04 sarek t/o ps:: 2923 ? Ss 0:00 cupsd
>>> Nov 8 19:03:04 sarek t/o ps:: 2987 ? Ss 0:00 /usr/sbin/sshd
>>> Nov 8 19:03:04 sarek t/o ps:: 3006 ? Ss 0:00 xinetd
>>> -stayalive -pidfile /var/run/xinetd.pid
>>> Nov 8 19:03:04 sarek t/o ps:: 3021 ? SLs 0:00 ntpd -u
>>> ntp:ntp -p /var/run/ntpd.pid
>>> Nov 8 19:03:04 sarek t/o ps:: 3039 ? Ss 0:00 sendmail:
>>> accepting connections
>>> Nov 8 19:03:04 sarek t/o ps:: 3047 ? Ss 0:00 sendmail:
>>> Queue runner [at] 0:00:00 for /var/spool/clientmqueue
>>> Nov 8 19:03:04 sarek t/o ps:: 3077 ? Ss 0:00
>>> /usr/sbin/htt -retryonerror 0
>>> Nov 8 19:03:04 sarek t/o ps:: 3078 ? S 0:00 htt_server
>>> -nodaemon
>>> Nov 8 19:03:04 sarek t/o ps:: 3087 ? Ss 0:00 crond
>>> Nov 8 19:03:04 sarek t/o ps:: 3127 ? Ss 0:00 xfs
>>> -droppriv -daemon
>>> Nov 8 19:03:04 sarek t/o ps:: 3144 ? Ss 0:00 /usr/sbin/atd
>>> Nov 8 19:03:04 sarek t/o ps:: 3153 ? Ssl 0:00
>>> dbus-daemon-1 --system
>>> Nov 8 19:03:04 sarek t/o ps:: 3164 ? Ss 0:00 rhnsd
>>> --interval 60
>>> Nov 8 19:03:04 sarek t/o ps:: 3173 ? Ss 0:01 hald
>>> Nov 8 19:03:04 sarek t/o ps:: 3183 tty1 Ss+ 0:00
>>> /sbin/mingetty tty1
>>> Nov 8 19:03:04 sarek t/o ps:: 3185 tty2 Ss+ 0:00
>>> /sbin/mingetty tty2
>>> Nov 8 19:03:04 sarek t/o ps:: 3186 tty3 Ss+ 0:00
>>> /sbin/mingetty tty3
>>> Nov 8 19:03:04 sarek t/o ps:: 3187 tty4 Ss+ 0:00
>>> /sbin/mingetty tty4
>>> Nov 8 19:03:04 sarek t/o ps:: 3188 tty5 Ss+ 0:00
>>> /sbin/mingetty tty5
>>> Nov 8 19:03:04 sarek t/o ps:: 3190 tty6 Ss+ 0:00
>>> /sbin/mingetty tty6
>>> Nov 8 19:03:04 sarek t/o ps:: 3898 ? Ss 0:00 sshd: root [at] pt/0
>>> Nov 8 19:03:04 sarek t/o ps:: 3900 pts/0 Ss+ 0:00 -bash
>>> Nov 8 19:03:04 sarek t/o ps:: 3953 ? Ss 0:00 sshd: root [at] pt/1
>>> Nov 8 19:03:04 sarek t/o ps:: 3955 pts/1 Ss+ 0:00 -bash
>>> Nov 8 19:03:04 sarek t/o ps:: 3995 pts/1 S 0:12 ha_logd: read process
>>> Nov 8 19:03:04 sarek t/o ps:: 4017 pts/1 S 0:16 ha_logd: write process
>>> Nov 8 19:03:04 sarek t/o ps:: 4018 ? SLs 0:24 heartbeat: master control process
>>> Nov 8 19:03:04 sarek t/o ps:: 4025 ? SL 0:00 heartbeat: FIFO reader
>>> Nov 8 19:03:04 sarek t/o ps:: 4026 ? SL 0:00 heartbeat: write: bcast eth3
>>> Nov 8 19:03:04 sarek t/o ps:: 4027 ? SL 0:00 heartbeat: read: bcast eth3
>>> Nov 8 19:03:04 sarek t/o ps:: 4028 ? SL 0:00 heartbeat: write: ping gomtuu.rz.fh-muenchen.de
>>> Nov 8 19:03:05 sarek t/o ps:: 4029 ? SL 0:02 heartbeat:
>>> read: ping gomtuu.rz.fh-muenchen.de
>>> Nov 8 19:03:05 sarek t/o ps:: 4030 ? SL 0:01 heartbeat:
>>> write: ping nagilum.rz.fh-muenchen.de
>>> Nov 8 19:03:05 sarek t/o ps:: 4031 ? SL 0:02 heartbeat:
>>> read: ping nagilum.rz.fh-muenchen.de
>>> Nov 8 19:03:05 sarek t/o ps:: 4032 ? SL 0:02 heartbeat:
>>> write: ping infotest.rz.fh-muenchen.de
>>> Nov 8 19:03:05 sarek t/o ps:: 4033 ? SL 0:03 heartbeat:
>>> read: ping infotest.rz.fh-muenchen.de
>>> Nov 8 19:03:05 sarek t/o ps:: 4034 ? SL 0:02 heartbeat:
>>> write: ping infotst2.rz.fh-muenchen.de
>>> Nov 8 19:03:05 sarek t/o ps:: 4035 ? SL 0:02 heartbeat:
>>> read: ping infotst2.rz.fh-muenchen.de
>>> Nov 8 19:03:05 sarek t/o ps:: 4036 ? S 0:00
>>> /usr/lib/heartbeat/ccm
>>> Nov 8 19:03:05 sarek t/o ps:: 4037 ? S 8:00
>>> /usr/lib/heartbeat/cib
>>> Nov 8 19:03:05 sarek t/o ps:: 4038 ? SL 0:02
>>> /usr/lib/heartbeat/stonithd
>>> Nov 8 19:03:05 sarek t/o ps:: 4039 ? S 0:01
>>> /usr/lib/heartbeat/lrmd
>>> Nov 8 19:03:05 sarek t/o ps:: 4040 ? S 0:00
>>> /usr/lib/heartbeat/crmd
>>> Nov 8 19:03:05 sarek t/o ps:: 4041 ? Ss 0:01 sshd:
>>> root [at] pt/2
>>> Nov 8 19:03:05 sarek t/o ps:: 4043 pts/2 Ss 0:00
>>> -bash
>>> Nov 8 19:03:05 sarek t/o ps:: 4076 pts/2 S+ 16:10
>>> /root/cluster/heartbeat/heartbeat_current/linux-ha-2005-11-7-2/crm/admin/.libs/lt-crm_mon
>>> -i 2
>>> Nov 8 19:03:05 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:03:05 sarek t/o ps:: 4299 ? S 0:00 [kjournald]
>>> Nov 8 19:03:05 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:03:05 sarek t/o ps:: 4345 ? S 0:02
>>> /usr/postgres/bin/postmaster -D /telebase/data
>>> Nov 8 19:03:05 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:03:05 sarek t/o ps:: 4370 ? S 0:00 postgres:
>>> writer process
>>> Nov 8 19:03:05 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:05 sarek t/o ps:: 4371 ? S 0:00 postgres:
>>> stats buffer process
>>> Nov 8 19:03:05 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_query from
>>> client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>> Nov 8 19:03:05 sarek t/o ps:: 4372 ? S 0:00 postgres:
>>> stats collector process
>>> Nov 8 19:03:05 sarek t/o ps:: 25206 ? S 0:00
>>> /usr/lib/heartbeat/lrmd
>>> Nov 8 19:03:05 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:05 sarek t/o ps:: 25207 ? S 0:00
>>> /usr/lib/heartbeat/stonithd
>>> Nov 8 19:03:05 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:05 sarek t/o ps:: 25231 ? S 0:00 sh -c ps
>>> axww | logger -p daemon.info -t 't/o ps:'
>>> Nov 8 19:03:05 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:05 sarek t/o ps:: 25232 ? R 0:00 ps axww
>>> Nov 8 19:03:05 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:06 sarek t/o ps:: 25233 ? S 0:00 logger -p
>>> daemon.info -t t/o ps:
>>> Nov 8 19:03:06 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:06 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:06 sarek lrmd: [4039]: info: Channel staus: 1
>>> Nov 8 19:03:06 sarek crmd: [4040]: ERROR: mask(lrm.c:do_lrm_event):
>>> LRM operation (22) monitor_10000 on kill_spock Timed Out
>>> Nov 8 19:03:06 sarek lrmd: [4039]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:06 sarek crmd: [4040]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:06 sarek crmd: [4040]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:06 sarek crmd: [4040]: info: Channel staus: 1
>>> Nov 8 19:03:06 sarek crmd: [4040]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:06 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_update from
>>> client 01d8a5e6-1642-4182-801b-91b74af0ee8/cib_rw
>>> Nov 8 19:03:07 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:07 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:07 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:07 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:07 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:07 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:07 sarek heartbeat: [4018]: info: Channel staus: 1
>>> Nov 8 19:03:07 sarek heartbeat: [4018]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:07 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:07 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:07 sarek heartbeat: [4018]: info: Channel staus: 1
>>> Nov 8 19:03:07 sarek heartbeat: [4018]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:07 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_peer_callback): Processing cib_apply_diff msg
>>> (171f) from spock
>>> Nov 8 19:03:07 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:03:07 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:03:08 sarek cib: [4037]: WARN: mask(io.c:initializeCib):
>>> Option suppress_cib_writes not set
>>> Nov 8 19:03:08 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:03:08 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:08 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:08 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:08 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:08 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:08 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:08 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:08 sarek heartbeat: [4018]: info: Channel staus: 1
>>> Nov 8 19:03:08 sarek heartbeat: [4018]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:08 sarek crmd: [4040]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:08 sarek crmd: [4040]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:08 sarek crmd: [4040]: info: Channel staus: 1
>>> Nov 8 19:03:08 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:08 sarek crmd: [4040]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:09 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:09 sarek lrmd: [4039]: info: Channel staus: 1
>>> Nov 8 19:03:09 sarek lrmd: [4039]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:09 sarek crmd: [4040]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:09 sarek crmd: [4040]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:09 sarek crmd: [4040]: info: Channel staus: 1
>>> Nov 8 19:03:09 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:09 sarek crmd: [4040]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:09 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:09 sarek crmd: [4040]: info: mask(lrm.c:do_lrm_rsc_op):
>>> Performing op stop on kill_spock
>>> Nov 8 19:03:09 sarek lrmd: [4039]: info: Channel staus: 1
>>> Nov 8 19:03:09 sarek lrmd: [4039]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:09 sarek crmd: [4040]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:09 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:09 sarek crmd: [4040]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:10 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:10 sarek crmd: [4040]: info: Channel staus: 1
>>> Nov 8 19:03:10 sarek lrmd: [4039]: info: Channel staus: 1
>>> Nov 8 19:03:10 sarek crmd: [4040]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:10 sarek lrmd: [4039]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:10 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:10 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:03:10 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:10 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:03:10 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:03:10 sarek lrmd: [4039]: info: Channel staus: 1
>>> Nov 8 19:03:10 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:11 sarek lrmd: [4039]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:11 sarek lrmd: [25254]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:11 sarek lrmd: [25254]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:11 sarek lrmd: [25254]: info: Channel staus: 1
>>> Nov 8 19:03:11 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:11 sarek crmd: [4040]: WARN: mask(lrm.c:do_lrm_event):
>>> LRM operation (22) monitor_10000 on kill_spock Cancelled
>>> Nov 8 19:03:11 sarek lrmd: [25254]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:11 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:11 sarek stonithd: [4038]: info: Channel staus: 1
>>> Nov 8 19:03:11 sarek lrmd: [25254]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:11 sarek stonithd: [4038]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:11 sarek lrmd: [25254]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:11 sarek lrmd: [25254]: info: Channel staus: 1
>>> Nov 8 19:03:11 sarek lrmd: [25254]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:12 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:12 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:12 sarek lrmd: [25254]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:12 sarek stonithd: [4038]: info: Channel staus: 1
>>> Nov 8 19:03:12 sarek lrmd: [25254]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:12 sarek stonithd: [4038]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:12 sarek lrmd: [25254]: info: Channel staus: 1
>>> Nov 8 19:03:12 sarek lrmd: [25254]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:12 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:12 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:13 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:13 sarek lrmd: [4039]: info: Channel staus: 1
>>> Nov 8 19:03:13 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:13 sarek lrmd: [4039]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:13 sarek stonithd: [4038]: info: Channel staus: 1
>>> Nov 8 19:03:13 sarek stonithd: [4038]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:13 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:03:13 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:03:13 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:03:13 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:13 sarek crmd: [4040]: info:
>>> mask(lrm.c:do_update_resource): Updating kill_spock resource
>>> definitions after stop op
>>> Nov 8 19:03:13 sarek crmd: [4040]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:13 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:13 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:13 sarek crmd: [4040]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:13 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:14 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:14 sarek crmd: [4040]: info: Channel staus: 1
>>> Nov 8 19:03:14 sarek lrmd: [4039]: info: Channel staus: 1
>>> Nov 8 19:03:14 sarek stonithd: [4038]: info: Channel staus: 1
>>> Nov 8 19:03:14 sarek crmd: [4040]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:14 sarek lrmd: [4039]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:14 sarek stonithd: [4038]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:14 sarek crmd: [4040]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:14 sarek crmd: [4040]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:14 sarek crmd: [4040]: info: Channel staus: 1
>>> Nov 8 19:03:14 sarek crmd: [4040]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:14 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_update from
>>> client 01d8a5e6-1642-4182-801b-91b74af0ee8/cib_rw
>>> Nov 8 19:03:15 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:15 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:15 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:15 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:15 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:15 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:15 sarek heartbeat: [4018]: info: Channel staus: 1
>>> Nov 8 19:03:15 sarek heartbeat: [4018]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:15 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:15 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:15 sarek heartbeat: [4018]: info: Channel staus: 1
>>> Nov 8 19:03:15 sarek heartbeat: [4018]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:15 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_peer_callback): Processing cib_apply_diff msg
>>> (1721) from spock
>>> Nov 8 19:03:15 sarek cib: [4037]: WARN: mask(io.c:initializeCib):
>>> Option suppress_cib_writes not set
>>> Nov 8 19:03:15 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:15 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:15 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:03:15 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:15 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:03:16 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:16 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:03:16 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:16 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:16 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:16 sarek heartbeat: [4018]: info: Channel staus: 1
>>> Nov 8 19:03:16 sarek heartbeat: [4018]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:16 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_peer_callback): Processing cib_apply_diff msg
>>> (1722) from spock
>>> Nov 8 19:03:16 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:16 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:16 sarek heartbeat: [4018]: info: Channel staus: 1
>>> Nov 8 19:03:16 sarek heartbeat: [4018]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:16 sarek crmd: [4040]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:16 sarek crmd: [4040]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:16 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:16 sarek crmd: [4040]: info: Channel staus: 1
>>> Nov 8 19:03:17 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:17 sarek crmd: [4040]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:17 sarek lrmd: [4039]: info: Channel staus: 1
>>> Nov 8 19:03:17 sarek crmd: [4040]: info: mask(lrm.c:do_lrm_rsc_op):
>>> Performing op start on kill_spock
>>> Nov 8 19:03:17 sarek lrmd: [4039]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:17 sarek crmd: [4040]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:17 sarek crmd: [4040]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:17 sarek crmd: [4040]: info: Channel staus: 1
>>> Nov 8 19:03:17 sarek crmd: [4040]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:17 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:17 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:17 sarek lrmd: [4039]: info: Channel staus: 1
>>> Nov 8 19:03:18 sarek lrmd: [4039]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:18 sarek lrmd: [25256]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:18 sarek lrmd: [25256]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:18 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:18 sarek lrmd: [25256]: info: Channel staus: 1
>>> Nov 8 19:03:18 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:18 sarek lrmd: [25256]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:18 sarek stonithd: [4038]: info: Channel staus: 1
>>> Nov 8 19:03:18 sarek stonithd: [4038]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:18 sarek lrmd: [25256]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:18 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:03:18 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:03:18 sarek lrmd: [25256]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:18 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:03:18 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:18 sarek lrmd: [25256]: info: Channel staus: 1
>>> Nov 8 19:03:19 sarek lrmd: [25256]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:19 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:19 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:19 sarek stonithd: [4038]: info: Channel staus: 1
>>> Nov 8 19:03:19 sarek stonithd: [4038]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:19 sarek cib: [4037]: WARN: mask(io.c:initializeCib):
>>> Option suppress_cib_writes not set
>>> Nov 8 19:03:19 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_query from
>>> client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>> Nov 8 19:03:19 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:19 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:19 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:19 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:19 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_query from
>>> client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>> Nov 8 19:03:19 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:19 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:20 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:20 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_query from
>>> client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>> Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:20 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:20 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_query from
>>> client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>> Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:20 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:20 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_query from
>>> client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>> Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:20 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:21 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:21 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:21 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:03:21 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:03:21 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:03:21 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:21 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_query from
>>> client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>> Nov 8 19:03:21 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:21 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:21 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:21 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:23 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:03:23 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:03:23 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:03:23 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:23 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_query from
>>> client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>> Nov 8 19:03:23 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:23 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:23 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:23 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:24 sarek lrmd: [25256]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:24 sarek lrmd: [25256]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:24 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:24 sarek lrmd: [25256]: info: Channel staus: 1
>>> Nov 8 19:03:24 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:24 sarek lrmd: [25256]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:24 sarek stonithd: [4038]: info: Channel staus: 1
>>> Nov 8 19:03:24 sarek stonithd: [4038]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:24 sarek lrmd: [25256]: WARN: mapped the invalid return
>>> code 14.
>>> Nov 8 19:03:24 sarek crmd: [4040]: ERROR: mask(lrm.c:do_lrm_event):
>>> LRM operation (26) start_0 on kill_spock Error: unknown error
>>> Nov 8 19:03:24 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:24 sarek postgres[25332]: [2-1] ERROR: database
>>> "hb_rg_testdb" does not exist
>>> Nov 8 19:03:24 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:24 sarek postgres[25332]: [2-2] STATEMENT: DROP
>>> DATABASE hb_rg_testdb
>>> Nov 8 19:03:25 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:25 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:25 sarek crmd: [4040]: info:
>>> mask(lrm.c:do_update_resource): Updating kill_spock resource
>>> definitions after start op
>>> Nov 8 19:03:25 sarek lrmd: [4039]: info: Channel staus: 1
>>> Nov 8 19:03:25 sarek stonithd: [4038]: info: Channel staus: 1
>>> Nov 8 19:03:25 sarek crmd: [4040]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:25 sarek lrmd: [4039]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:25 sarek stonithd: [4038]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:25 sarek crmd: [4040]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:25 sarek postgres[25339]: [2-1] NOTICE: CREATE TABLE /
>>> PRIMARY KEY will create implicit index "hb_rg_testtable_pkey" for
>>> table "hb_rg_testtable"
>>> Nov 8 19:03:25 sarek crmd: [4040]: info: Channel staus: 1
>>> Nov 8 19:03:25 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:25 sarek crmd: [4040]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:25 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:25 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:03:25 sarek lrmd: [4039]: info: Channel staus: 1
>>> Nov 8 19:03:25 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:03:26 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:03:26 sarek lrmd: [4039]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:26 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:26 sarek crmd: [4040]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:26 sarek crmd: [4040]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:26 sarek crmd: [4040]: info: Channel staus: 1
>>> Nov 8 19:03:26 sarek crmd: [4040]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:26 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_update from
>>> client 01d8a5e6-1642-4182-801b-91b74af0ee8/cib_rw
>>> Nov 8 19:03:26 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:26 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:26 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:26 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:26 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:26 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:27 sarek heartbeat: [4018]: info: Channel staus: 1
>>> Nov 8 19:03:27 sarek heartbeat: [4018]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:27 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:27 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:27 sarek heartbeat: [4018]: info: Channel staus: 1
>>> Nov 8 19:03:27 sarek heartbeat: [4018]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:27 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_peer_callback): Processing cib_apply_diff msg
>>> (172d) from spock
>>> Nov 8 19:03:27 sarek cib: [4037]: WARN: mask(io.c:initializeCib):
>>> Option suppress_cib_writes not set
>>> Nov 8 19:03:27 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:27 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:27 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:27 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:27 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:27 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:27 sarek heartbeat: [4018]: info: Channel staus: 1
>>> Nov 8 19:03:27 sarek heartbeat: [4018]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:28 sarek crmd: [4040]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:28 sarek crmd: [4040]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:28 sarek crmd: [4040]: info: Channel staus: 1
>>> Nov 8 19:03:28 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:28 sarek crmd: [4040]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:28 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:03:28 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:28 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:03:28 sarek crmd: [4040]: info: mask(lrm.c:do_lrm_rsc_op):
>>> Performing op stop on kill_spock
>>> Nov 8 19:03:28 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:03:28 sarek lrmd: [4039]: info: Channel staus: 1
>>> Nov 8 19:03:28 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:29 sarek lrmd: [4039]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:29 sarek crmd: [4040]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:29 sarek crmd: [4040]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:29 sarek crmd: [4040]: info: Channel staus: 1
>>> Nov 8 19:03:29 sarek crmd: [4040]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:29 sarek lrmd: [25335]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:29 sarek lrmd: [25335]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:29 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:29 sarek lrmd: [25335]: info: Channel staus: 1
>>> Nov 8 19:03:29 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:29 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:30 sarek lrmd: [4039]: info: Channel staus: 1
>>> Nov 8 19:03:30 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:30 sarek lrmd: [25335]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:30 sarek stonithd: [4038]: info: Channel staus: 1
>>> Nov 8 19:03:30 sarek lrmd: [4039]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:30 sarek stonithd: [4038]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:30 sarek lrmd: [25335]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:30 sarek stonithd: [4038]: notice: try to stop a
>>> resource kill_spock who is not in started resource queue.
>>> Nov 8 19:03:30 sarek lrmd: [25335]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:30 sarek lrmd: [25335]: info: Channel staus: 1
>>> Nov 8 19:03:30 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:30 sarek lrmd: [25335]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:30 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:30 sarek stonithd: [4038]: info: Channel staus: 1
>>> Nov 8 19:03:30 sarek stonithd: [4038]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:31 sarek lrmd: [25335]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:31 sarek lrmd: [25335]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:31 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:03:31 sarek lrmd: [25335]: info: Channel staus: 1
>>> Nov 8 19:03:31 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:03:31 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:03:31 sarek lrmd: [25335]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:31 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:31 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:31 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:31 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:31 sarek stonithd: [4038]: info: Channel staus: 1
>>> Nov 8 19:03:31 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:32 sarek stonithd: [4038]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:32 sarek lrmd: [4039]: info: Channel staus: 1
>>> Nov 8 19:03:32 sarek lrmd: [4039]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:32 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:32 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:32 sarek stonithd: [4038]: info: Channel staus: 1
>>> Nov 8 19:03:32 sarek stonithd: [4038]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:32 sarek crmd: [4040]: info:
>>> mask(lrm.c:do_update_resource): Updating kill_spock resource
>>> definitions after stop op
>>> Nov 8 19:03:32 sarek crmd: [4040]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:32 sarek crmd: [4040]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:32 sarek lrmd: [4039]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:32 sarek crmd: [4040]: info: Channel staus: 1
>>> Nov 8 19:03:32 sarek lrmd: [4039]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:33 sarek crmd: [4040]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:33 sarek lrmd: [4039]: info: Channel staus: 1
>>> Nov 8 19:03:33 sarek lrmd: [4039]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:33 sarek crmd: [4040]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:33 sarek crmd: [4040]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:33 sarek crmd: [4040]: info: Channel staus: 1
>>> Nov 8 19:03:33 sarek crmd: [4040]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:33 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_update from
>>> client 01d8a5e6-1642-4182-801b-91b74af0ee8/cib_rw
>>> Nov 8 19:03:33 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:33 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:33 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:33 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:33 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:33 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:03:33 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:34 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:03:34 sarek heartbeat: [4018]: info: Channel staus: 1
>>> Nov 8 19:03:34 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:03:34 sarek heartbeat: [4018]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:34 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:34 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:34 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:34 sarek heartbeat: [4018]: info: Channel staus: 1
>>> Nov 8 19:03:34 sarek heartbeat: [4018]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:34 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_peer_callback): Processing cib_apply_diff msg
>>> (172f) from spock
>>> Nov 8 19:03:34 sarek cib: [4037]: WARN: mask(io.c:initializeCib):
>>> Option suppress_cib_writes not set
>>> Nov 8 19:03:34 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:34 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:34 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:35 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:35 sarek cib: [4037]: info:
>>> mask(callbacks.c:cib_common_callback): Operation cib_query from
>>> client 0f8c334b-f562-4069-b5c3-d12ea6f68ab/cib_ro
>>> Nov 8 19:03:35 sarek cib: [4037]: info: msg2ipcchan:1971: Will audit
>>> the ha_msg.
>>> Nov 8 19:03:35 sarek cib: [4037]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:35 sarek cib: [4037]: info: Channel staus: 1
>>> Nov 8 19:03:35 sarek cib: [4037]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:36 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:36 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:36 sarek heartbeat: [4018]: info: Channel staus: 3
>>> Nov 8 19:03:36 sarek heartbeat: [4018]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:36 sarek heartbeat: [4018]: ERROR: Client
>>> /usr/lib/heartbeat/stonithd killed by signal 11.
>>> Nov 8 19:03:36 sarek heartbeat: [4018]: ERROR: Respawning client
>>> "/usr/lib/heartbeat/stonithd":
>>> Nov 8 19:03:36 sarek heartbeat: [4018]: info: Starting child client
>>> "/usr/lib/heartbeat/stonithd" (0,0)
>>> Nov 8 19:03:36 sarek heartbeat: [25368]: info: Starting
>>> "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 25368)
>>> Nov 8 19:03:36 sarek stonithd: [25368]: info: Enable using logging
>>> daemon
>>> Nov 8 19:03:36 sarek stonithd: [25368]: info:
>>> G_main_add_SignalHandler: Added signal handler for signal 10
>>> Nov 8 19:03:36 sarek stonithd: [25368]: info:
>>> G_main_add_SignalHandler: Added signal handler for signal 12
>>> Nov 8 19:03:36 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1971:
>>> Will audit the ha_msg.
>>> Nov 8 19:03:36 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1975:
>>> Will detect the status of the channel as an indirect checking
>>> Nov 8 19:03:36 sarek stonithd: [25368]: info: pid 25368 locked in
>>> memory.
>>> Nov 8 19:03:36 sarek lt-crm_mon: [4076]: info: Channel staus: 1
>>> Nov 8 19:03:36 sarek lt-crm_mon: [4076]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:36 sarek stonithd: [25368]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:37 sarek stonithd: [25368]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:37 sarek stonithd: [25368]: info: Channel staus: 1
>>> Nov 8 19:03:37 sarek stonithd: [25368]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:37 sarek stonithd: [25368]: info: Signing in with
>>> heartbeat.
>>> Nov 8 19:03:37 sarek stonithd: [25368]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:37 sarek stonithd: [25368]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:37 sarek stonithd: [25368]: info: Channel staus: 1
>>> Nov 8 19:03:38 sarek stonithd: [25368]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:38 sarek stonithd: [25368]: notice:
>>> /usr/lib/heartbeat/stonithd start up successfully.
>>> Nov 8 19:03:38 sarek stonithd: [25368]: info:
>>> G_main_add_SignalHandler: Added signal handler for signal 17
>>> Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:38 sarek heartbeat: [4018]: info: Channel staus: 1
>>> Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:38 sarek heartbeat: [4018]: info: Channel staus: 1
>>> Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>> Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1971: Will
>>> audit the ha_msg.
>>> Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1975: Will
>>> detect the status of the channel as an indirect checking
>>> Nov 8 19:03:38 sarek heartbeat: [4018]: info: Channel staus: 1
>>> Nov 8 19:03:38 sarek heartbeat: [4018]: info: msg2ipcchan:1983:
>>> hamsg2ipcmsg() ok.
>>>

--
BRs,

Sun Jiang Dong



peinkofe at fhm

Nov 9, 2005, 7:44 AM

Post #45 of 55 (1848 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

Hello Sun Jiang Dong and Guochun Shi,

On Wed, 2005-11-16 at 17:07 +0800, Sun Jiang Dong wrote:
> Hi Stefan Peinkofer,
>
> I removed ZAPCHAN according to gshi's suggestion. Can you have a try
> of CVS HEAD again. Thanks a lots in advance.
>
I tried the CVS HEAD (with Sun's patch applied) but nothing has changed.
I have attached the logfile.
Question: If I look at the end of the logfile I see that some time after
the segfault, the messages from Sun's patch disappear. Is this normal?
(It was not so in the prior CVS HEAD version)
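
A rough way to check this from the logfile itself is to list, for each daemon and PID, the timestamp of its last message from the debugging patch. The sketch below assumes syslog-style lines laid out like the ones quoted in this thread (month, day, time, host, daemon, PID) and that every patch message contains `msg2ipcchan:`. The function name `last_patch_msg` is made up for illustration.

```shell
#!/bin/sh
# last_patch_msg LOGFILE
# For every "daemon: [pid]:" pair, print the timestamp of the last log line
# that contains "msg2ipcchan:" (the marker emitted by the debugging patch).
# Assumed field layout: $1=month $2=day $3=time $4=host $5=daemon $6=[pid]
last_patch_msg() {
    awk '/msg2ipcchan:/ {
             key = $5 " " $6                 # e.g. "stonithd: [4038]:"
             last[key] = $1 " " $2 " " $3    # later lines overwrite earlier ones
         }
         END { for (k in last) print k, last[k] }' "$1" | LC_ALL=C sort
}
```

Running `last_patch_msg /var/log/messages` (the path is only an example) and comparing the timestamps against the time of the segfault would show which processes stopped logging and when.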

> >>>>>> Im not 100 percent sure yet, but it seems, that if stonithd
> >>>>>> segfaulted one time, and therefore no monitor operations are
> >>>>>> carried out anymore it will not segfault anymore. So maybe the
> >>>>>> monitor operation causes the segfault somehow???
> >>>>>> (Just wanted to mention that, perhaps it's helpful)
Note: I let heartbeat run overnight.
After the segfault on sarek at 19:03, stonithd did not segfault again.
(I watched until 11:40 the next day, when I stopped heartbeat and started
the new CVS HEAD version.)

Many thanks in advance.
Stefan Peinkofer

> Guochun Shi wrote:
> > Nov 8 19:03:35 sarek stonithd: [4038]: info: msg2ipcchan:1971: Will
> > audit the ha_msg.
> > Nov 8 19:03:35 sarek stonithd: [4038]: info: msg2ipcchan:1975: Will
> > detect the status of the channel as an indirect checking
> > Nov 8 19:03:35 sarek heartbeat: [4018]: WARN: Exiting
> > /usr/lib/heartbeat/stonithd process 4038 killed by signal 11.
> > Nov 8 19:03:35 sarek heartbeat: [4018]: ERROR: Exiting
> > /usr/lib/heartbeat/stonithd process 4038 dumped core
> >
> >
> > So the message is fine, the channel is messed up.
> >
> > I suspect the channel had already been destroyed when the core dump
> > happened.
> >
> > -Guochun
> >
> >
> >
> >
> > Stefan Peinkofer wrote:
> >
> >> Hello Sun Jiang Dong,
> >> On Tue, 2005-11-08 at 18:23 +0800, Sun Jiang Dong wrote:
> >>
> >>
> >>
> >>>>>>>>>>> Anyway I think the problem you met has been fixed in CVS.
> >>>>>>>>>>> Please have a try.
> >>>>>>>>>>> If you still meet it, please tell me. Thanks.
> >>>>>>>>>
> >>>>>>>>> That was Problem 2 (cannot add field to ha_msg Error) which was
> >>>>>>>>> fixed one or two weeks ago. What I mean is Problem 1 the
> >>>>>>>>> stonithd coredump + not properly handled restart of the
> >>>>>>>>> stonithd resources, after the core dump.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> And, I put some more safeguards into the code which was
> >>>>>>>>>> implicated. And, gshi fixed a somewhat-related problem.
> >>>>>>>>>>
> >>>>>>>>>> Could you try again from CVS(HEAD)?
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I tried one from 2005-11-2 but it had still the problem 2. I
> >>>>>>>>> will make a new try tomorrow and report the results.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> I tried the recent CVS HEAD, and it shows still the same
> >>>>>>>> behavior. After some time heartbeat was running:
> >>>>>>>> Nov 5 11:47:58 sarek lrmd: [9297]: WARN: on_op_timeout_expired:
> >>>>>>>> TIMEOUT: operation monitor[22] on stonith::wti_nps::kill_spock
> >>>>>>>> for client 9298, its parameters: timeout=5000
> >>>>>>>> ipaddr=192.168.1.204 te-target-rc=7 lrm-is-probe=true
> >>>>>>>> password=XXXXX crm_feature_set=1.0.3 interval=10000 ...
> >>>>>>>> Nov 5 11:48:01 sarek crmd: [9298]: ERROR:
> >>>>>>>> mask(lrm.c:do_lrm_event): LRM operation (22) monitor_10000 on
> >>>>>>>> kill_spock Timed Out
> >>>>>>>> ...
> >>>>>>>> Nov 5 11:48:02 sarek crmd: [9298]: info:
> >>>>>>>> mask(lrm.c:do_lrm_rsc_op): Performing op stop on kill_spock
> >>>>>>>> Nov 5 11:48:02 sarek crmd: [9298]: WARN:
> >>>>>>>> mask(lrm.c:do_lrm_event): LRM operation (22) monitor_10000 on
> >>>>>>>> kill_spock Cancelled
> >>>>>>>> ...
> >>>>>>>> Nov 5 11:48:04 sarek crmd: [9298]: info:
> >>>>>>>> mask(lrm.c:do_lrm_rsc_op): Performing op start on kill_spoc
> >>>>>>>> ...
> >>>>>>>> Nov 5 11:48:20 sarek crmd: [9298]: ERROR:
> >>>>>>>> mask(lrm.c:do_lrm_event): LRM operation (26) start_0 on
> >>>>>>>> kill_spock Error: unknown error
> >>>>>>>> ..
> >>>>>>>> Nov 5 11:48:21 sarek crmd: [9298]: info:
> >>>>>>>> mask(lrm.c:do_lrm_rsc_op): Performing op stop on kill_spock
> >>>>>>>> Nov 5 11:48:21 sarek stonithd: [9296]: notice: try to stop a
> >>>>>>>> resource kill_spock who is not in started resource queue.
> >>>>>>>> Nov 5 11:48:22 sarek crmd: [9298]: info:
> >>>>>>>> mask(lrm.c:do_update_resource): Updating kill_spock resource
> >>>>>>>> definitions after stop op
> >>>>>>>> ...
> >>>>>>>> Nov 5 11:48:24 sarek heartbeat: [9261]: WARN: Exiting
> >>>>>>>> /usr/lib/heartbeat/stonithd process 9296 killed by signal 11.
> >>>>>>>> Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Exiting
> >>>>>>>> /usr/lib/heartbeat/stonithd process 9296 dumped core
> >>>>>>>> Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Client
> >>>>>>>> /usr/lib/heartbeat/stonithd killed by signal 11.
> >>>>>>>> Nov 5 11:48:24 sarek heartbeat: [9261]: ERROR: Respawning
> >>>>>>>> client "/usr/lib/heartbeat/stonithd":
> >>>>>>>> Nov 5 11:48:24 sarek heartbeat: [9261]: info: Starting child
> >>>>>>>> client "/usr/lib/heartbeat/stonithd" (0,0)
> >>>>>>>> Nov 5 11:48:24 sarek heartbeat: [17057]: info: Starting
> >>>>>>>> "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 17057)
> >>>>>>>>
> >>>>>>>
> >>>>>>> I'm puzzled by this issue ( stonithd killed by signal 11 ) for a
> >>>>>>> long time, because it's not reproduced on my machine.
> >>>>>>> It's so fortune for me you can reproduce it stably. ;-)
> >>>>>>>
> >>>>>>
> >>>>>> In fact it is killed everytime I start heartbeat. Sometimes it is
> >>>>>> killed after 4 or 5 minutes, sometimes it takes a little bit longer
> >>>>>> (1 hour) (subjective impression is that it takes longer if the
> >>>>>> machine is fresh rebooted)
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> I make a small patch again to current HEAD file
> >>>>>>> lib/clplumbing/cl_msg.c. Can you please apply it and try again?
> >>>>>>> This should be helpful for me to located the issue more further.
> >>>>>>> Thanks a lots in advance.
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> OK, used the current CVS HEAD from today. I have attached the logs
> >>>>>> of both nodes.
> >>>>>> Im not 100 percent sure yet, but it seems, that if stonithd
> >>>>>> segfaulted one time, and therefore no monitor operations are
> >>>>>> carried out anymore it will not segfault anymore. So maybe the
> >>>>>> monitor operation causes the segfault somehow???
> >>>>>> (Just wanted to mention that, perhaps it's helpful)
> >>>>>>
> >>>>>
> >>>>> Thanks so much for your help.
> >>>>> Besides, do you apply my small patch as the attachment? I cannot
> >>>>> see the output
> >>>>> from the small patch.
> >>>>
> >>>>
> >>>>
> >>>>> And, from the log you attached, it seems the issue of this time has
> >>>>> a different cause comparing to the last one. I added several memory
> >>>>> initializing statements in CVS. Could you please have a try again.
> >>>>> Thanks and waiting for your result.
> >>>>>
> >>>>>
> >>>>
> >>>> Oops, I misunderstood your mail; I thought the patch was in the CVS
> >>>> HEAD, sorry. I think I will be able to apply the patch in a few
> >>>> hours and then mail you the logs.
> >>>>
> >>>>
> >>>
> >>> No problem. Look forward to your result.
> >>>
> >>
> >> OK, I applied the patch some hours ago and started heartbeat. Somehow,
> >> it took much longer until stonithd segfaulted (3 hours against a few
> >> minutes).
> >> Since the log file is pretty huge (6.6mb unzipped and 176kb bzipped) I
> >> attached only a little part of it. If you want me to mail the full logs
> >> directly, let me know.
> >>
> >> Many thanks in advance.
> >> Stefan Peinkofer
> >>
> >>
> >>>>>> BTW: I would much appreciate it, if someone could get LRM (or CRM)
> >>>>>> to restart the stonith resources reliably, in such a case. It's
> >>>>>> maybe sufficient if the stonith resources get restarted until the
> >>>>>> start operation succeeds. Is there somewhere a trigger in cib.xml
> >>>>>> where I can specify, try to restart infinitely? (or at least try
> >>>>>> it 100 times or so :)
> >>>>>>
> >>>>>
> >>>>> I'll file a bug for this, but currently only for tracking the
> >>>>> requirement.
> >>>>> http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=950
> >>>>>
> >>>>
> >>>> Many thanks for that.
> >>>>
> >>>
> >>> Welcome.
> >>>
> >>>
> >>>> Stefan Peinkofer
> >>>>
> >>>>
> >>>>
> >>>>>> Many thanks in advance.
> >>>>>> Stefan Peinkofer
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>> Note, I haven't attached full logs + core backtrace since they
> >>>>>>>> look like the ones I have provided in the former mail. If you
> >>>>>>>> want them regardless of that, let me know.
> >>>>>>>> BTW. At least the OCF resource script IPAddr in the recent CVS
> >>>>>>>> HEAD is "broken" (at least for my system). To get heartbeat
> >>>>>>>> working for testing Problem 2 status, I used the ones from a CVS
> >>>>>>>> version from 2005-11-02. I have no time today to investigate
> >>>>>>>> further, but I think I will look at it closer tomorrow evening.
> >>>>>>>> Many thanks in advance.
> >>>>>>>>
> >>>>>>>
> >>>>>>> BTW, I fixed the broken issue of the OCF IPAddr.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> Stefan Peinkofer
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>> Thanks!
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I have to Thank.
> >>>>>>>>> Stefan Peinkofer
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Alan Robertson <alanr [at] unix>
> >>>>>>>>>>
> >>>>>>>>>> "Openness is the foundation and preservative of friendship...
> >>>>>>>>>> Let me claim from you at all times your undisguised opinions."
> >>>>>>>>>> - William Wilberforce
> >>>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> BRs,
> >>>>>>>
> >>>>>>> Sun Jiang Dong
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> Index: cl_msg.c
> >>>>>>> ===================================================================
> >>>>>>> RCS file: /home/cvs/linux-ha/linux-ha/lib/clplumbing/cl_msg.c,v
> >>>>>>> retrieving revision 1.101
> >>>>>>> diff -u -r1.101 cl_msg.c
> >>>>>>> --- cl_msg.c 3 Nov 2005 22:28:32 -0000 1.101
> >>>>>>> +++ cl_msg.c 7 Nov 2005 07:35:43 -0000
> >>>>>>> @@ -1964,11 +1964,24 @@
> >>>>>>>  return HA_FAIL;
> >>>>>>>  }
> >>>>>>>
> >>>>>>> + /*
> >>>>>>> +  * Just for debugging bug 730, will remove it after the bug is fixed.
> >>>>>>> +  * http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=730
> >>>>>>> +  */
> >>>>>>> + cl_log(LOG_INFO, "%s:%d: Will audit the ha_msg.", __FUNCTION__, __LINE__);
> >>>>>>> + AUDITMSG(m);
> >>>>>>> +
> >>>>>>> + cl_log(LOG_INFO, "%s:%d: Will detect the status of the channel as an "
> >>>>>>> + " indirect checking", __FUNCTION__, __LINE__);
> >>>>>>> + cl_log(LOG_INFO, "Channel staus: %d", ch->ops->get_chan_status(ch));
> >>>>>>> +
> >>>>>>>  if ((imsg = hamsg2ipcmsg(m, ch)) == NULL) {
> >>>>>>>  cl_log(LOG_ERR, "hamsg2ipcmsg() failure");
> >>>>>>>  return HA_FAIL;
> >>>>>>>  }
> >>>>>>>
> >>>>>>> + cl_log(LOG_INFO, "%s:%d: hamsg2ipcmsg() ok.", __FUNCTION__, __LINE__);
> >>>>>>> +
> >>>>>>>  if (ch->ops->send(ch, imsg) != IPC_OK) {
> >>>>>>>  if (ch->ch_status == IPC_CONNECT) {
> >>>>>>>  snprintf(ch->failreason,MAXFAILREASON,
> >>>>>>
> >>>>>>
Attachments: messages-spock-cvs-head-2005-11-09.txt.gz (47.4 KB)
  signature.asc (0.18 KB)


peinkofe at fhm

Nov 9, 2005, 8:10 AM

Post #46 of 55 (1841 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

Hello Sun Jiang Dong and Guochun Shi,
On Wed, 2005-11-09 at 16:44 +0100, Stefan Peinkofer wrote:
> Hello Sun Jiang Dong and Guochun Shi,
>
> On Wed, 2005-11-16 at 17:07 +0800, Sun Jiang Dong wrote:
> > Hi Stefan Peinkofer,
> >
> > I removed ZAPCHAN according to gshi's suggestion. Can you have a try
> of CVS HEAD
> > again. Thanks a lots in advance.
> >
> I tried the CVS HEAD (with Sun's patch applied) but nothing has changed.
> I have attached the logfile.
> Question: If I look at the end of the logfile I see that some time after
> the segfault, the messages from Sun's patch disappear. Is this normal?
> (It was not so in the prior CVS HEAD version)
Now stonithd on the other machine segfaulted too. But this time, the
messages from Sun's patch continued to appear after the segfault.(??)
(Attached a part of the log)
>
> > >>>>>> Im not 100 percent sure yet, but it seems, that if stonithd
> > >>>>>> segfaulted one time, and therefore no monitor operations are
> > >>>>>> carried out anymore it will not segfault anymore. So maybe the
> > >>>>>> monitor operation causes the segfault somehow???
> > >>>>>> (Just wanted to mention that, perhaps it's helpful)
> Note, I let heartbeat run over the night, tonight.
> After the segfault on sarek at 19:03 stonithd segfaulted not again.
> (Watched until 11:40 on the next day because then I stopped and started
> the new CVS HEAD version)
>
Many thanks in advance.
Stefan Peinkofer
Attachments: messages-sarek-cvs-head-2005-11-09.txt (90.0 KB)


peinkofe at fhm

Nov 9, 2005, 9:09 AM

Post #47 of 55 (1843 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

Hello Sun Jiang Dong and Guochun Shi,

I connected to the stonith devices via telnet today and they hung in
the middle of displaying the plug state. (I could not even ping them
anymore.) That was weird, since I had to restart the PWSWs in order to
log in again. (Maybe waiting until the network connection timeout of
the power switches ran out would have done the job too.) I hope it was
not a problem with the PWSWs that caused the segfault.
Could it be that a timeout of the monitor action, or rather a hang of
the network power switches, has caused the segfault?

Many thanks in advance.
Stefan Peinkofer
On Wed, 2005-11-09 at 17:10 +0100, Stefan Peinkofer wrote:
> Hello Sun Jiang Dong and Guochun Shi,
> On Wed, 2005-11-09 at 16:44 +0100, Stefan Peinkofer wrote:
> > Hello Sun Jiang Dong and Guochun Shi,
> >
> > On Wed, 2005-11-16 at 17:07 +0800, Sun Jiang Dong wrote:
> > > Hi Stefan Peinkofer,
> > >
> > > I removed ZAPCHAN according to gshi's suggestion. Can you have a try
> > of CVS HEAD
> > > again. Thanks a lots in advance.
> > >
> > I tried the CVS HEAD (with Sun's patch applied) but nothing has changed.
> > I have attached the logfile.
> > Question: If I look at the end of the logfile I see that some time after
> > the segfault, the messages from Sun's patch disappear. Is this normal?
> > (It was not so in the prior CVS HEAD version)
> Now stonithd on the other machine segfaulted too. But this time, the
> messages from Sun's patch continued to appear after the segfault.(??)
> (Attached a part of the log)
> >
> > > >>>>>> Im not 100 percent sure yet, but it seems, that if stonithd
> > > >>>>>> segfaulted one time, and therefore no monitor operations are
> > > >>>>>> carried out anymore it will not segfault anymore. So maybe the
> > > >>>>>> monitor operation causes the segfault somehow???
> > > >>>>>> (Just wanted to mention that, perhaps it's helpful)
> > Note, I let heartbeat run over the night, tonight.
> > After the segfault on sarek at 19:03 stonithd segfaulted not again.
> > (Watched until 11:40 on the next day because then I stopped and started
> > the new CVS HEAD version)
> >
> Many thanks in advance.
> Stefan Peinkofer
Attachments: signature.asc (0.18 KB)


alanr at unix

Nov 9, 2005, 9:59 AM

Post #48 of 55 (1841 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

Stefan Peinkofer wrote:
> Hello Sun Jiang Dong and Guochun Shi,
>
> I connected to the stonith devices via telnet today and they got hung in
> the middle of displaying the plug state. (I could even ping them
> anymore) That was weird since I had to restart the PWSW's in order to
> login again. (Maybe waiting some time until the network connection
> timeout of the power switches runs out had done the job too) I hope
> there was no problem with the PWSW's that caused the segfault.

No matter what else is true, it's a bug.



--
Alan Robertson <alanr [at] unix>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce


peinkofe at fhm

Nov 9, 2005, 10:56 AM

Post #49 of 55 (1840 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

On Wed, 2005-11-09 at 10:59 -0700, Alan Robertson wrote:
> Stefan Peinkofer wrote:
> > Hello Sun Jiang Dong and Guochun Shi,
> >
> > I connected to the stonith devices via telnet today and they hung in
> > the middle of displaying the plug state. (I could not even ping them
> > anymore.) That was weird, since I had to restart the PWSWs in order to
> > log in again. (Maybe waiting until the network connection timeout of
> > the power switches expired would have done the job too.) I hope there
> > was no problem with the PWSWs that caused the segfault.
>
OK, it's getting clearer and weirder.
Note: In my heartbeat config I use a monitoring interval of 10 seconds
and a timeout of 5 seconds for the stonith resources.

I ran:
while [ 1 ]; date; do stonith -t wti_nps ipaddr=192.168.1.204 password=XXXXX -S; done;
(The log is attached.)
At the beginning, each call returns within a second. After some minutes,
it abruptly starts taking about 3 to 4 seconds. If I cancel the loop at
this stage and try to log on manually, the connection freezes as shown
below. (How does what the stonith plugin does differ from a manual telnet
login???)

[root@sare log]# telnet kill-spock
Trying 192.168.1.204...
Connected to kill-spock (192.168.1.204).
Escape character is '^]'.

Enter Password: *****

Network Power Switch v3.02 Site: STONITH FOR SPOCK

Plug | Name             | Status  | Boot Delay | Password         | Default |
-----+------------------+---------+------------+------------------+---------+
1    | spock            | ON      | 5 sec      | (undefined)      | ON      |
2    | (undefined)      | ON      | 5 sec      | (undefined)      | ON      |
3    | (undefined)      | ON      | 5 sec      | (undefined)      | ON      |
4    | (undefined)      | ON      | 5 sec      | (un

(Note: sometimes it also froze right at the password prompt.)
Once it freezes, the device no longer responds to pings.
Note that this is reliably reproducible, but only if I abort the stonith -S
loop and open a manual telnet connection. The stonith -S loop itself seems
to run 'forever', albeit slowly.
If I wait until the configured network connection timeout expires, the
stonith device becomes accessible again. Unfortunately, the timeout cannot
be set to less than 2 minutes.
After this has occurred, the connections are fast again (for some time :)
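
To pin down when the slowdown starts, the status loop above could record how long each call takes. The following is only a sketch: the helper names time_call and classify are made up for illustration, the limit passed to classify is whatever threshold you want to flag (for example the 5 second resource timeout), and it assumes a date that supports +%s (GNU and BSD date do).

```shell
#!/bin/sh
# Run a command once and print how many whole seconds it took.
# The command's own output is discarded; its exit status is returned.
time_call() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo $((end - start))
    return $status
}

# Print "slow" if the elapsed time reaches the limit, otherwise "ok".
# $1 = elapsed seconds, $2 = limit in seconds
classify() {
    if [ "$1" -ge "$2" ]; then
        echo slow
    else
        echo ok
    fi
}

# Example loop (commented out: it needs the stonith binary and a real device):
# while true; do
#     t=$(time_call stonith -t wti_nps ipaddr=192.168.1.204 password=XXXXX -S)
#     echo "$(date) ${t}s $(classify "$t" 5)"
# done
```

Each output line of the example loop would then carry a timestamp, the duration in seconds, and a flag, which makes the jump from under a second to several seconds easy to spot in the log.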


Many thanks in advance.
Stefan Peinkofer

> No matter what else is true, it's a bug.
>
>
>
Attachments: nps.log (3.82 KB)
  signature.asc (0.18 KB)


peinkofe at fhm

Nov 9, 2005, 11:04 AM

Post #50 of 55 (1844 views)
Permalink
Re: New problem(s) with heartbeat 2.0.3 and STONITH [In reply to]

On Wed, 2005-11-09 at 19:56 +0100, Stefan Peinkofer wrote:
> On Wed, 2005-11-09 at 10:59 -0700, Alan Robertson wrote:
> > Stefan Peinkofer wrote:
> > > Hello Sun Jiang Dong and Guochun Shi,
> > >
> > > I connected to the stonith devices via telnet today and they hung in
> > > the middle of displaying the plug state. (I could not even ping them
> > > anymore.) That was weird, since I had to restart the PWSWs in order to
> > > log in again. (Maybe waiting until the network connection timeout of
> > > the power switches expired would have done the job too.) I hope there
> > > was no problem with the PWSWs that caused the segfault.
> >
Ahhhhhh, from wti_nps.c:
* 2. We observed that on busy networks where there may be high occurances
* of broadcasts, the NPS became unresponsive. In some
* configurations this necessitated placing the power switch onto a
* private subnet.
In fact it is on a private subnet, but it may be getting too many
connections because of the 10-second interval!?
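
One way to check whether the connection rate is really the trigger might be to repeat the status call with a pause between runs and see whether the switch still slows down. This is a sketch only: paced_loop is a hypothetical helper, and the pause and run count in the example are arbitrary values, not recommendations.

```shell
#!/bin/sh
# Run a command a fixed number of times with a pause between runs.
# $1 = pause in seconds, $2 = number of runs, remaining args = command to run
paced_loop() {
    pause=$1
    runs=$2
    shift 2
    i=0
    while [ "$i" -lt "$runs" ]; do
        date
        "$@"
        i=$((i + 1))
        # No need to wait after the final run.
        if [ "$i" -lt "$runs" ]; then
            sleep "$pause"
        fi
    done
}

# Example (commented out: it needs the stonith binary and a real device):
# paced_loop 60 100 stonith -t wti_nps ipaddr=192.168.1.204 password=XXXXX -S
```

If the device stays responsive at a longer pause but degrades at 10 seconds or less, that would point at the polling rate rather than at the plugin itself.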
Attachments: signature.asc (0.18 KB)
