
inouekazu at intellilink
Aug 2, 2012, 6:55 PM
Post #4 of 4
Re: Among monitor of STONITH resource, monitor of other resources is not performed
Hi David,

I reported it in Bugzilla with crm_report.
Please confirm it.

* http://bugs.clusterlabs.org/show_bug.cgi?id=5090

Best Regards,
Kazunori INOUE

(12.08.01 06:49), David Vossel wrote:
>
>
> ----- Original Message -----
>> From: "David Vossel" <dvossel [at] redhat>
>> To: "The Pacemaker cluster resource manager" <pacemaker [at] oss>
>> Sent: Tuesday, July 31, 2012 10:11:44 AM
>> Subject: Re: [Pacemaker] Among monitor of STONITH resource, monitor of other resources is not performed
>>
>> ----- Original Message -----
>>> From: "Kazunori INOUE" <inouekazu [at] intellilink>
>>> To: "pacemaker [at] os" <pacemaker [at] oss>
>>> Cc: shimazakik [at] intellilink
>>> Sent: Monday, July 30, 2012 5:15:26 AM
>>> Subject: [Pacemaker] Among monitor of STONITH resource, monitor of other resources is not performed
>>>
>>> Hi,
>>>
>>> I am using Pacemaker-1.1.
>>> - glue      (2012 Jul 16) 2719:18489f275f75
>>> - libqb     (2012 Jul 19) 11b20e19beff7f1b6003be0b4c73da8ecf936442
>>> - corosync  (2012 Jul 19) b9eb19e623d2b69c86cf78f1fa50a004c804ac20
>>> - pacemaker (2012 Jul 29) 33119da31c235710195c783e5c9a32c6e95b3efc
>>>
>>> The monitor operation of other resources is not performed during the
>>> monitor of STONITH resource.
>>
>> Looking at the code I can see this is true. How important is it for
>> the stonith monitors to be able to be performed in parallel with the
>> other resource monitors? I would expect the stonith monitor
>> interval to be quite large compared to all the other monitors,
>> meaning that it only executes once every hour or so.
>>
>> -- Vossel
>
> Yeah, this should be fixed. Can you create an issue for this on bugs.clusterlabs.org with your crm_report please.
>
> -- Vossel
>
>
>>
>>> 1. service corosync start ; service pacemaker start
>>> 2. cibadmin -U -x test.xml
>>>    (STONITH resource + Dummy resource are started on the same node)
>>> 3. After libvirt (STONITH resource) started, sleep was added to status()
>>>    of libvirt.
>>>
>>> [root [at] dev external]# diff -u libvirt.ORG libvirt
>>> --- libvirt.ORG 2012-07-17 13:10:01.000000000 +0900
>>> +++ libvirt 2012-07-30 13:36:19.661431208 +0900
>>> @@ -221,6 +221,7 @@
>>>  ;;
>>>
>>>  status)
>>> + sleep 3600
>>>  libvirt_check_config
>>>  libvirt_status
>>>  exit $?
>>>
>>> [root [at] dev ~]# ps -ef|egrep "UID|corosync|pacemaker|stonith|fence|sleep"
>>> UID    PID   PPID  C STIME TTY  TIME     CMD
>>> root   18567     1  0 18:47 ?    00:00:02 corosync
>>> root   18585     1  0 18:47 ?    00:00:00 pacemakerd
>>> 496    18587 18585  0 18:47 ?    00:00:00 /usr/libexec/pacemaker/cib
>>> root   18588 18585  0 18:47 ?    00:00:00 /usr/libexec/pacemaker/stonithd
>>> root   18589 18585 76 18:47 ?    00:05:27 /usr/libexec/pacemaker/lrmd
>>> 496    18590 18585  0 18:47 ?    00:00:00 /usr/libexec/pacemaker/attrd
>>> 496    18591 18585  0 18:47 ?    00:00:00 /usr/libexec/pacemaker/pengine
>>> 496    18592 18585  0 18:47 ?    00:00:00 /usr/libexec/pacemaker/crmd
>>> root   18767 18588  0 18:48 ?    00:00:00 /usr/bin/perl /usr/sbin/fence_legacy
>>> root   18768 18767  0 18:48 ?    00:00:00 stonith -t external/libvirt -E -S
>>> root   18778 18768  0 18:48 ?    00:00:00 /bin/sh /usr/lib64/stonith/plugins/external/libvirt status
>>> root   18792 18778  0 18:48 ?    00:00:00 sleep 3600
>>>
>>> 4. Then monitor of Dummy resource is not performed.
>>>
>>> The following is behavior of lrmd at that time.
>>>
>>> # gdb /usr/libexec/pacemaker/lrmd `pgrep lrmd`
>>> (gdb) bt
>>> #0  0x0000003f808e83e2 in recv () from /lib64/libc.so.6
>>> #1  0x00007f0de3820062 in qb_ipc_us_recv_at_most (one_way=0x1118ee8, msg=0x111f390, len=20480, timeout=500) at ipc_us.c:299
>>> #2  0x00007f0de381a28e in qb_ipcc_recv (c=0x1118ba0, msg_ptr=0x111f390, msg_len=20480, ms_timeout=500) at ipcc.c:249
>>> #3  0x00007f0de42bc5fb in crm_ipc_send (client=0x111b580, message=0x111d860, reply=0x7fffc8ec4c60, ms_timeout=61060000) at ipc.c:517
>>> #4  0x00007f0de3c97e29 in stonith_send_command (stonith=0x111a6c0, op=0x7f0de3c998dd "st_execute", data=0x1119170, output_data=0x0, call_options=4096, timeout=61000) at st_client.c:1676
>>> #5  0x00007f0de3c94bd1 in stonith_api_call (stonith=0x111a6c0, call_options=4096, id=0x1129860 "f-2", action=0x7f0de3c998f7 "monitor", victim=0x0, timeout=61000, output=0x0) at st_client.c:951
>>> #6  0x00007f0de3c94d31 in stonith_api_monitor (stonith=0x111a6c0, call_options=4096, id=0x1129860 "f-2", timeout=61000) at st_client.c:985
>>> #7  0x00000000004044e2 in lrmd_rsc_execute_stonith (rsc=0x111f130, cmd=0x1129660) at lrmd.c:522
>>> #8  0x0000000000404cd6 in lrmd_rsc_execute (rsc=0x111f130) at lrmd.c:667
>>> #9  0x0000000000404d2d in lrmd_rsc_dispatch (user_data=0x111f130) at lrmd.c:678
>>> #10 0x00007f0de42dcd00 in crm_trigger_dispatch (source=0x111f300, callback=0x404d06 <lrmd_rsc_dispatch>, userdata=0x111f300) at mainloop.c:105
>>> #11 0x0000003642638f0e in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>>> #12 0x000000364263c938 in ?? () from /lib64/libglib-2.0.so.0
>>> #13 0x000000364263cd55 in g_main_loop_run () from /lib64/libglib-2.0.so.0
>>> #14 0x0000000000402d3f in main (argc=1, argv=0x7fffc8ec5188) at main.c:285
>>> (gdb) fin
>>> Run till exit from #0 0x0000003f808e83e2 in recv () from /lib64/libc.so.6
>>> 0x00007f0de3820062 in qb_ipc_us_recv_at_most (one_way=0x1118ee8, msg=0x111f390, len=20480, timeout=500) at ipc_us.c:299
>>> 299 result = recv(one_way->u.us.sock, &data[processed], to_recv,
>>> (gdb) fin
>>> Run till exit from #0 0x00007f0de3820062 in qb_ipc_us_recv_at_most (one_way=0x1118ee8, msg=0x111f390, len=20480, timeout=500) at ipc_us.c:299
>>> 0x00007f0de381a28e in qb_ipcc_recv (c=0x1118ba0, msg_ptr=0x111f390, msg_len=20480, ms_timeout=500) at ipcc.c:249
>>> 249 res = c->funcs.recv(&c->response, msg_ptr, msg_len, ms_timeout);
>>> Value returned is $1 = -11
>>> (gdb) fin
>>> Run till exit from #0 0x00007f0de381a28e in qb_ipcc_recv (c=0x1118ba0, msg_ptr=0x111f390, msg_len=20480, ms_timeout=500) at ipcc.c:249
>>> 0x00007f0de42bc5fb in crm_ipc_send (client=0x111b580, message=0x111d860, reply=0x7fffc8ec4c60, ms_timeout=61060000) at ipc.c:517
>>> 517 rc = qb_ipcc_recv(client->ipc, client->buffer, client->buf_size, 500);
>>> Value returned is $2 = -11
>>> (gdb) n
>>> 518 if(rc > 0 || crm_ipc_connected(client) == FALSE) {
>>> (gdb) p rc
>>> $3 = -11
>>> (gdb) n
>>> 522 } while(time(NULL) < timeout);
>>> (gdb) n
>>> 517 rc = qb_ipcc_recv(client->ipc, client->buffer, client->buf_size, 500);
>>> (gdb) n
>>> 518 if(rc > 0 || crm_ipc_connected(client) == FALSE) {
>>> (gdb) p rc
>>> $4 = -11
>>> (gdb) n
>>> 522 } while(time(NULL) < timeout);
>>> (gdb) n
>>> 517 rc = qb_ipcc_recv(client->ipc, client->buffer, client->buf_size, 500);
>>> (gdb) n
>>> 518 if(rc > 0 || crm_ipc_connected(client) == FALSE) {
>>> (gdb) p rc
>>> $5 = -11
>>> (gdb) n
>>> 522 } while(time(NULL) < timeout);
>>> (gdb) n
>>> 517 rc = qb_ipcc_recv(client->ipc, client->buffer, client->buf_size, 500);
>>> (gdb) n
>>> 518 if(rc > 0 || crm_ipc_connected(client) == FALSE) {
>>> (gdb) p rc
>>> $6 = -11
>>> (gdb)
>>>
>>> It seems that lrmd has repeated the reply reception from stonithd
>>> out of the g_main_loop. Therefore, monitor of Dummy is not performed.
>>>
>>> [root [at] dev ~]# top -bn1
>>> top - 18:53:31 up 5 days, 8:56, 4 users, load average: 0.98, 0.63, 0.44
>>> Tasks: 198 total, 2 running, 196 sleeping, 0 stopped, 0 zombie
>>> Cpu(s): 0.9%us, 1.2%sy, 0.0%ni, 97.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>>> Mem: 5089052k total, 2417444k used, 2671608k free, 266660k buffers
>>> Swap: 1048568k total, 0k used, 1048568k free, 1724616k cached
>>>
>>>   PID USER PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>>> 18589 root 20  0 83932 3448 2548 R 98.2  0.1  4:15.89 lrmd
>>>     1 root 20  0 19348 1520 1212 S  0.0  0.0  0:00.78 init
>>>     2 root 20  0     0    0    0 S  0.0  0.0  0:00.00 kthreadd
>>>     3 root RT  0     0    0    0 S  0.0  0.0  0:06.93 migration/0
>>>     4 root 20  0     0    0    0 S  0.0  0.0 15:23.59 ksoftirqd/0
>>>     5 root RT  0     0    0    0 S  0.0  0.0  0:00.10 migration/0
>>>
>>> Best Regards,
>>> Kazunori INOUE
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker [at] oss
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
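
For readers following the gdb session in the quoted report: the value -11 that keeps coming back corresponds to -EAGAIN on Linux, and the do/while loop in crm_ipc_send retries until a reply arrives or the timeout passes. Because that loop runs inside a main-loop callback, nothing else can be dispatched while it spins. The following is a minimal toy model of that effect, not Pacemaker or libqb code. Every name in it (fake_reply, reply_ready, dummy_runs_blocking, dummy_runs_polled) is hypothetical, and the second function only shows the general idea of returning to the loop between checks; it does not claim to be how the issue was or should be fixed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a slow stonith agent: the reply becomes
 * available on the `ready_at`-th check and not before. */
struct fake_reply {
    int polls;     /* how many times we have checked so far */
    int ready_at;  /* check number on which the reply shows up */
};

/* One non-blocking check, loosely analogous to a receive call that
 * reports "try again" until data is there. */
static bool reply_ready(struct fake_reply *r)
{
    r->polls++;
    return r->polls >= r->ready_at;
}

/* Pattern resembling the backtrace: the callback retries in its own loop,
 * so control never returns to the main loop while the reply is pending.
 * Returns how many times the Dummy monitor ran in the meantime. */
int dummy_runs_blocking(int ready_at)
{
    struct fake_reply r = { 0, ready_at };
    int dummy_runs = 0;

    assert(ready_at > 0);
    while (!reply_ready(&r)) {
        /* still inside the callback; no other source can be dispatched */
    }
    return dummy_runs;
}

/* Alternative shape: check once per main-loop iteration and return,
 * which leaves room for the Dummy monitor between checks. */
int dummy_runs_polled(int ready_at)
{
    struct fake_reply r = { 0, ready_at };
    int dummy_runs = 0;

    assert(ready_at > 0);
    for (;;) {
        if (reply_ready(&r)) {
            break;          /* reply arrived, the stonith monitor is done */
        }
        dummy_runs++;       /* the loop moves on and runs the Dummy monitor */
    }
    return dummy_runs;
}
```

With a reply that arrives on the fifth check, dummy_runs_blocking(5) returns 0 and dummy_runs_polled(5) returns 4, which mirrors the observation in the report that the Dummy monitor is never run while the STONITH monitor is outstanding.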