Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux-HA: Pacemaker

monitor of stonith resource times out

 

 

Linux-HA pacemaker RSS feed   Index | Next | Previous | View Threaded


inouekazu at intellilink

Aug 10, 2012, 3:08 AM

Post #1 of 3 (280 views)
Permalink
monitor of stonith resource times out

Hi,
I am using Pacemaker-1.1.
When stonith resource is in fencing, monitor of the stonith resource times out.

[test case]
1. add sleep to reset() of stonith-plugin.
i am using libvirt.

# diff -u libvirt.ORG libvirt
--- libvirt.ORG 2012-07-17 13:10:01.000000000 +0900
+++ libvirt 2012-08-10 16:52:31.214366776 +0900
@@ -204,6 +204,7 @@
reset)
# libvirt has no reset so we do a power cycle
+ sleep 3600
libvirt_check_config
libvirt_set_domain_id $2


2. start a stonith resource (libvirt).
- stonith-timeout is 600s
- monitor interval is 10s
- monitor timeout is 60s
* stonith-timeout > monitor interval + monitor timeout

# crm_mon
:
Stack: corosync
Current DC: dev1 (-1788499776) - partition with quorum
Version: 1.1.7-916198b
2 Nodes configured, unknown expected votes
1 Resources configured.
============
Online: [ dev1 dev2 ]

Full list of resources:
f-dev2 (stonith:external/libvirt): Started dev1


3. require 'fence' and wait..
# crm node fence dev2

and wait..
then, although monitor of libvirt is performed, since libvirt is in
fencing by added sleep, monitor of the libvirt times out.

# crm_mon
:
============
Node dev2 (-1822054208): UNCLEAN (online)
Online: [ dev1 ]

Full list of resources:
f-dev2 (stonith:external/libvirt): Started dev1 FAILED

Failed actions:
f-dev2_monitor_10000 (node=dev1, call=10, rc=1, status=Timed Out): unknown error

----
Aug 10 15:39:59 [6559] dev1 stonith-ng: info: call_remote_stonith: Requesting that dev1 perform op reboot dev2
Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: stonith_command: Processing st_fence from dev1 ( 0)
Aug 10 15:39:59 [6559] dev1 stonith-ng: info: can_fence_host_with_device: f-dev2 can fence dev2: dynamic-list
Aug 10 15:39:59 [6559] dev1 stonith-ng: info: stonith_fence: Found 1 matching devices for 'dev2'
Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: schedule_stonith_command: Scheduling reboot on f-dev2 for 15fb4a4b-4d91-4b8e-9e3d-8ca9394ce7e5 (timeout=601ms)
Aug 10 15:39:59 [6559] dev1 stonith-ng: info: stonith_command: Processed st_fence from dev1: Operation now in progress (-115)
Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: plugin=external/libvirt
Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: hostlist=dev2
Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: hypervisor_uri=qemu+ssh://n7/system
Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: action=reboot
Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: nodename=dev2
Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: make_args: Performing reboot action for node 'dev2' as 'port=dev2'
Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: port=dev2
Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: make_args: Calculated: plugin=external/libvirt
hostlist=dev2
hypervisor_uri=qemu+ssh://n7/system
action=reboot
nodename=dev2
port=dev2
Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: run_stonith_agent: forking
Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: run_stonith_agent: sending args
Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: run_stonith_agent: Op: reboot on fence_legacy, pid: 6725, timeout: 601s
Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: stonith_device_execute: Operation reboot for node dev2 on f-dev2 now running with pid=6725, timeout=601ms
Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_create_op: Sending call options: 00000000, 0
Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_send_command: Sending st_execute message to STONITH service, Timeout: 64s
Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_send_command: Reply: No data to dump as XML
Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_send_command: Async call 8, returning
Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_api_add_callback: Added callback to lrmd_stonith_callback for call 8
Aug 10 15:40:06 [6559] dev1 stonith-ng: debug: stonith_command: Processing st_execute from lrmd ( 0)
Aug 10 15:40:06 [6559] dev1 stonith-ng: debug: schedule_stonith_command: Scheduling monitor on f-dev2 for 15cf9708-0b22-4e8e-a724-7e58f1014b99 (timeout=64ms)
Aug 10 15:40:06 [6559] dev1 stonith-ng: info: stonith_command: Processed st_execute from lrmd: Operation now in progress (-115)
Aug 10 15:40:59 [6559] dev1 stonith-ng: debug: remote_op_query_timeout: Operation 15fb4a4b-4d91-4b8e-9e3d-8ca9394ce7e5 for dev2 already in progress
Aug 10 15:42:10 [6560] dev1 lrmd: error: stonith_async_timeout_handler: Async call 8 timed out after 124000ms
Aug 10 15:42:10 [6560] dev1 lrmd: trace: stonith_perform_callback: Invoking callback lrmd_stonith_callback for call 8
Aug 10 15:42:10 [6560] dev1 lrmd: trace: stonith_perform_callback: OP callback activated.
Aug 10 15:42:10 [6560] dev1 lrmd: debug: process_lrmd_message: Processed lrmd_rsc_info operation from eadb7520-fdff-4231-a488-9fde0b45073c: rc=0, reply=0, notify=0, exit=0
Aug 10 15:42:10 [6563] dev1 crmd: error: process_lrm_event: LRM operation f-dev2_monitor_10000 (10) Timed Out (timeout=64000ms)

Best Regards,
Kazunori INOUE
Attachments: stonith-monitor-timedout.tar.bz2 (84.2 KB)


andrew at beekhof

Aug 15, 2012, 3:08 AM

Post #2 of 3 (253 views)
Permalink
Re: monitor of stonith resource times out [In reply to]

Ouch. Yes definitely a bug.
I seem to recall David looking into something like this recently...
did you file a bug that he has been working on?

-- Andrew

On Fri, Aug 10, 2012 at 8:08 PM, Kazunori INOUE
<inouekazu [at] intellilink> wrote:
> Hi,
> I am using Pacemaker-1.1.
> When stonith resource is in fencing, monitor of the stonith resource times out.
>
> [test case]
> 1. add sleep to reset() of stonith-plugin.
> i am using libvirt.
>
> # diff -u libvirt.ORG libvirt
> --- libvirt.ORG 2012-07-17 13:10:01.000000000 +0900
> +++ libvirt 2012-08-10 16:52:31.214366776 +0900
> @@ -204,6 +204,7 @@
> reset)
> # libvirt has no reset so we do a power cycle
> + sleep 3600
> libvirt_check_config
> libvirt_set_domain_id $2
>
>
> 2. start a stonith resource (libvirt).
> - stonith-timeout is 600s
> - monitor interval is 10s
> - monitor timeout is 60s
> * stonith-timeout > monitor interval + monitor timeout
>
> # crm_mon
> :
> Stack: corosync
> Current DC: dev1 (-1788499776) - partition with quorum
> Version: 1.1.7-916198b
> 2 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
> Online: [ dev1 dev2 ]
>
> Full list of resources:
> f-dev2 (stonith:external/libvirt): Started dev1
>
>
> 3. require 'fence' and wait..
> # crm node fence dev2
>
> and wait..
> then, although monitor of libvirt is performed, since libvirt is in
> fencing by added sleep, monitor of the libvirt times out.
>
> # crm_mon
> :
> ============
> Node dev2 (-1822054208): UNCLEAN (online)
> Online: [ dev1 ]
>
> Full list of resources:
> f-dev2 (stonith:external/libvirt): Started dev1 FAILED
>
> Failed actions:
> f-dev2_monitor_10000 (node=dev1, call=10, rc=1, status=Timed Out): unknown error
>
> ----
> Aug 10 15:39:59 [6559] dev1 stonith-ng: info: call_remote_stonith: Requesting that dev1 perform op reboot dev2
> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: stonith_command: Processing st_fence from dev1 ( 0)
> Aug 10 15:39:59 [6559] dev1 stonith-ng: info: can_fence_host_with_device: f-dev2 can fence dev2: dynamic-list
> Aug 10 15:39:59 [6559] dev1 stonith-ng: info: stonith_fence: Found 1 matching devices for 'dev2'
> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: schedule_stonith_command: Scheduling reboot on f-dev2 for 15fb4a4b-4d91-4b8e-9e3d-8ca9394ce7e5 (timeout=601ms)
> Aug 10 15:39:59 [6559] dev1 stonith-ng: info: stonith_command: Processed st_fence from dev1: Operation now in progress (-115)
> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: plugin=external/libvirt
> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: hostlist=dev2
> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: hypervisor_uri=qemu+ssh://n7/system
> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: action=reboot
> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: nodename=dev2
> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: make_args: Performing reboot action for node 'dev2' as 'port=dev2'
> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: port=dev2
> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: make_args: Calculated: plugin=external/libvirt
> hostlist=dev2
> hypervisor_uri=qemu+ssh://n7/system
> action=reboot
> nodename=dev2
> port=dev2
> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: run_stonith_agent: forking
> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: run_stonith_agent: sending args
> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: run_stonith_agent: Op: reboot on fence_legacy, pid: 6725, timeout: 601s
> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: stonith_device_execute: Operation reboot for node dev2 on f-dev2 now running with pid=6725, timeout=601ms
> Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_create_op: Sending call options: 00000000, 0
> Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_send_command: Sending st_execute message to STONITH service, Timeout: 64s
> Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_send_command: Reply: No data to dump as XML
> Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_send_command: Async call 8, returning
> Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_api_add_callback: Added callback to lrmd_stonith_callback for call 8
> Aug 10 15:40:06 [6559] dev1 stonith-ng: debug: stonith_command: Processing st_execute from lrmd ( 0)
> Aug 10 15:40:06 [6559] dev1 stonith-ng: debug: schedule_stonith_command: Scheduling monitor on f-dev2 for 15cf9708-0b22-4e8e-a724-7e58f1014b99 (timeout=64ms)
> Aug 10 15:40:06 [6559] dev1 stonith-ng: info: stonith_command: Processed st_execute from lrmd: Operation now in progress (-115)
> Aug 10 15:40:59 [6559] dev1 stonith-ng: debug: remote_op_query_timeout: Operation 15fb4a4b-4d91-4b8e-9e3d-8ca9394ce7e5 for dev2 already in progress
> Aug 10 15:42:10 [6560] dev1 lrmd: error: stonith_async_timeout_handler: Async call 8 timed out after 124000ms
> Aug 10 15:42:10 [6560] dev1 lrmd: trace: stonith_perform_callback: Invoking callback lrmd_stonith_callback for call 8
> Aug 10 15:42:10 [6560] dev1 lrmd: trace: stonith_perform_callback: OP callback activated.
> Aug 10 15:42:10 [6560] dev1 lrmd: debug: process_lrmd_message: Processed lrmd_rsc_info operation from eadb7520-fdff-4231-a488-9fde0b45073c: rc=0, reply=0, notify=0, exit=0
> Aug 10 15:42:10 [6563] dev1 crmd: error: process_lrm_event: LRM operation f-dev2_monitor_10000 (10) Timed Out (timeout=64000ms)
>
> Best Regards,
> Kazunori INOUE
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


inouekazu at intellilink

Aug 15, 2012, 11:41 PM

Post #3 of 3 (247 views)
Permalink
Re: monitor of stonith resource times out [In reply to]

Hi,

I reported it in Bugzilla.
* http://bugs.clusterlabs.org/show_bug.cgi?id=5098

Best Regards,
Kazunori INOUE

(12.08.15 19:08), Andrew Beekhof wrote:
> Ouch. Yes definitely a bug.
> I seem to recall David looking into something like this recently...
> did you file a bug that he has been working on?
>
> -- Andrew
>
> On Fri, Aug 10, 2012 at 8:08 PM, Kazunori INOUE
> <inouekazu [at] intellilink> wrote:
>> Hi,
>> I am using Pacemaker-1.1.
>> When stonith resource is in fencing, monitor of the stonith resource times out.
>>
>> [test case]
>> 1. add sleep to reset() of stonith-plugin.
>> i am using libvirt.
>>
>> # diff -u libvirt.ORG libvirt
>> --- libvirt.ORG 2012-07-17 13:10:01.000000000 +0900
>> +++ libvirt 2012-08-10 16:52:31.214366776 +0900
>> @@ -204,6 +204,7 @@
>> reset)
>> # libvirt has no reset so we do a power cycle
>> + sleep 3600
>> libvirt_check_config
>> libvirt_set_domain_id $2
>>
>>
>> 2. start a stonith resource (libvirt).
>> - stonith-timeout is 600s
>> - monitor interval is 10s
>> - monitor timeout is 60s
>> * stonith-timeout > monitor interval + monitor timeout
>>
>> # crm_mon
>> :
>> Stack: corosync
>> Current DC: dev1 (-1788499776) - partition with quorum
>> Version: 1.1.7-916198b
>> 2 Nodes configured, unknown expected votes
>> 1 Resources configured.
>> ============
>> Online: [ dev1 dev2 ]
>>
>> Full list of resources:
>> f-dev2 (stonith:external/libvirt): Started dev1
>>
>>
>> 3. require 'fence' and wait..
>> # crm node fence dev2
>>
>> and wait..
>> then, although monitor of libvirt is performed, since libvirt is in
>> fencing by added sleep, monitor of the libvirt times out.
>>
>> # crm_mon
>> :
>> ============
>> Node dev2 (-1822054208): UNCLEAN (online)
>> Online: [ dev1 ]
>>
>> Full list of resources:
>> f-dev2 (stonith:external/libvirt): Started dev1 FAILED
>>
>> Failed actions:
>> f-dev2_monitor_10000 (node=dev1, call=10, rc=1, status=Timed Out): unknown error
>>
>> ----
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: info: call_remote_stonith: Requesting that dev1 perform op reboot dev2
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: stonith_command: Processing st_fence from dev1 ( 0)
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: info: can_fence_host_with_device: f-dev2 can fence dev2: dynamic-list
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: info: stonith_fence: Found 1 matching devices for 'dev2'
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: schedule_stonith_command: Scheduling reboot on f-dev2 for 15fb4a4b-4d91-4b8e-9e3d-8ca9394ce7e5 (timeout=601ms)
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: info: stonith_command: Processed st_fence from dev1: Operation now in progress (-115)
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: plugin=external/libvirt
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: hostlist=dev2
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: hypervisor_uri=qemu+ssh://n7/system
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: action=reboot
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: nodename=dev2
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: make_args: Performing reboot action for node 'dev2' as 'port=dev2'
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg: Appending: port=dev2
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: make_args: Calculated: plugin=external/libvirt
>> hostlist=dev2
>> hypervisor_uri=qemu+ssh://n7/system
>> action=reboot
>> nodename=dev2
>> port=dev2
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: run_stonith_agent: forking
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: run_stonith_agent: sending args
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: run_stonith_agent: Op: reboot on fence_legacy, pid: 6725, timeout: 601s
>> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: stonith_device_execute: Operation reboot for node dev2 on f-dev2 now running with pid=6725, timeout=601ms
>> Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_create_op: Sending call options: 00000000, 0
>> Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_send_command: Sending st_execute message to STONITH service, Timeout: 64s
>> Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_send_command: Reply: No data to dump as XML
>> Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_send_command: Async call 8, returning
>> Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_api_add_callback: Added callback to lrmd_stonith_callback for call 8
>> Aug 10 15:40:06 [6559] dev1 stonith-ng: debug: stonith_command: Processing st_execute from lrmd ( 0)
>> Aug 10 15:40:06 [6559] dev1 stonith-ng: debug: schedule_stonith_command: Scheduling monitor on f-dev2 for 15cf9708-0b22-4e8e-a724-7e58f1014b99 (timeout=64ms)
>> Aug 10 15:40:06 [6559] dev1 stonith-ng: info: stonith_command: Processed st_execute from lrmd: Operation now in progress (-115)
>> Aug 10 15:40:59 [6559] dev1 stonith-ng: debug: remote_op_query_timeout: Operation 15fb4a4b-4d91-4b8e-9e3d-8ca9394ce7e5 for dev2 already in progress
>> Aug 10 15:42:10 [6560] dev1 lrmd: error: stonith_async_timeout_handler: Async call 8 timed out after 124000ms
>> Aug 10 15:42:10 [6560] dev1 lrmd: trace: stonith_perform_callback: Invoking callback lrmd_stonith_callback for call 8
>> Aug 10 15:42:10 [6560] dev1 lrmd: trace: stonith_perform_callback: OP callback activated.
>> Aug 10 15:42:10 [6560] dev1 lrmd: debug: process_lrmd_message: Processed lrmd_rsc_info operation from eadb7520-fdff-4231-a488-9fde0b45073c: rc=0, reply=0, notify=0, exit=0
>> Aug 10 15:42:10 [6563] dev1 crmd: error: process_lrm_event: LRM operation f-dev2_monitor_10000 (10) Timed Out (timeout=64000ms)
>>
>> Best Regards,
>> Kazunori INOUE
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker [at] oss
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

Linux-HA pacemaker RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.