
Mailing List Archive: Linux-HA: Pacemaker

[Problem]It is judged that a stopping resource is starting.

 

 



renayama19661014 at ybb

Dec 26, 2011, 11:15 PM

Post #1 of 15
[Problem]It is judged that a stopping resource is starting.

Hi All,

When Pacemaker is stopped while there is a resource that failed its probe, crmd outputs the following error message.


Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown. You may ignore this error if it is unmanaged.


Because a resource that failed its probe never started, this error message is not correct.

We think the following correction may be appropriate, but we are not certain.


* crmd/lrm.c
(snip)
        } else if(op->rc == EXECRA_NOT_RUNNING) {
                active = FALSE;
+       } else if(op->rc != EXECRA_OK && op->interval == 0
+                       && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
+               active = FALSE;
        } else {
                active = TRUE;
        }
(snip)


In the Pacemaker development source, the handling of this logic appears to have changed considerably.
We would like to request that this change be backported to the Pacemaker 1.0 series, if possible.

Best Regards,
Hideo Yamauchi.



_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


andrew at beekhof

Jan 5, 2012, 4:29 PM

Post #2 of 15
Re: [Problem]It is judged that a stopping resource is starting. [In reply to]

On Tue, Dec 27, 2011 at 6:15 PM, <renayama19661014 [at] ybb> wrote:
> Hi All,
>
> When Pacemaker is stopped while there is a resource that failed its probe, crmd outputs the following error message.
>
>
>  Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
>
>
> Because a resource that failed its probe never started,

But it should have a subsequent stop action which would set it back to
being inactive.
Did that not happen in this case?

> this error message is not correct.
> (snip)



renayama19661014 at ybb

Jan 5, 2012, 5:37 PM

Post #3 of 15
Re: [Problem]It is judged that a stopping resource is starting. [In reply to]

Hi Andrew,

Thank you for your comment.

> But it should have a subsequent stop action which would set it back to
> being inactive.
> Did that not happen in this case?

Yes, that stop did not happen.

Only the "verify_stopped" error was logged.
No stop was carried out for the resource that failed its probe.

-----------------------------
######### yamauchi PREV STOP ##########
Jan 6 19:21:56 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ifcheckd process group 3462 with signal 15
Jan 6 19:21:56 rh57-1 ifcheckd: [3462]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 6 19:21:56 rh57-1 ifcheckd: [3462]: info: do_node_walk: Requesting the list of configured nodes
Jan 6 19:21:58 rh57-1 ifcheckd: [3462]: info: main: Exiting ifcheckd
Jan 6 19:21:58 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/crmd process group 3461 with signal 15
Jan 6 19:21:58 rh57-1 crmd: [3461]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 6 19:21:58 rh57-1 crmd: [3461]: info: crm_shutdown: Requesting shutdown
Jan 6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Jan 6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Jan 6 19:21:58 rh57-1 crmd: [3461]: info: do_shutdown_req: Sending shutdown request to DC: rh57-1
Jan 6 19:21:59 rh57-1 crmd: [3461]: info: handle_shutdown_request: Creating shutdown request for rh57-1 (state=S_POLICY_ENGINE)
Jan 6 19:21:59 rh57-1 attrd: [3460]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1325845319)
Jan 6 19:21:59 rh57-1 attrd: [3460]: info: attrd_perform_update: Sent update 14: shutdown=1325845319
Jan 6 19:21:59 rh57-1 crmd: [3461]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-1fdd5e2a-44b6-44b9-9993-97fa120072a4-shutdown, name=shutdown, value=1325845319, magic=NA, cib=0.101.16) : Transient attribute: update
Jan 6 19:22:01 rh57-1 crmd: [3461]: info: crm_timer_popped: New Transition Timer (I_PE_CALC) just popped!
Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE
Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke_callback: Invoking the PE: query=44, ref=pe_calc-dc-1325845321-26, seq=1, quorate=0
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 6 19:22:01 rh57-1 pengine: [3464]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jan 6 19:22:01 rh57-1 pengine: [3464]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
Jan 6 19:22:01 rh57-1 pengine: [3464]: info: determine_online_status: Node rh57-1 is shutting down
Jan 6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: group_print: Resource Group: grpUltraMonkey
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: prmVIP (ocf::heartbeat:LVM): Stopped
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: group_print: Resource Group: grpStonith1
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: prmStonith1-2 (stonith:external/ssh): Stopped
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: prmStonith1-3 (stonith:meatware): Stopped
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: group_print: Resource Group: grpStonith2
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: prmStonith2-2 (stonith:external/ssh): Started rh57-1
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: prmStonith2-3 (stonith:meatware): Started rh57-1
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: clone_print: Clone Set: clnPingd
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: short_print: Started: [ rh57-1 ]
Jan 6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: clnPingd: Rolling back scores from prmVIP
Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmPingd:0 cannot run anywhere
Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmVIP cannot run anywhere
Jan 6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith1-2: Rolling back scores from prmStonith1-3
Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-2 cannot run anywhere
Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-3 cannot run anywhere
Jan 6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith2-2: Rolling back scores from prmStonith2-3
Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-2 cannot run anywhere
Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-3 cannot run anywhere
Jan 6 19:22:01 rh57-1 pengine: [3464]: info: stage6: Scheduling Node rh57-1 for shutdown
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave resource prmVIP (Stopped)
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave resource prmStonith1-2 (Stopped)
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave resource prmStonith1-3 (Stopped)
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop resource prmStonith2-2 (rh57-1)
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop resource prmStonith2-3 (rh57-1)
Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop resource prmPingd:0 (rh57-1)
Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jan 6 19:22:01 rh57-1 pengine: [3464]: info: process_pe_message: Transition 4: PEngine Input stored in: /var/lib/pengine/pe-input-4.bz2
Jan 6 19:22:01 rh57-1 crmd: [3461]: info: unpack_graph: Unpacked transition 4: 9 actions in 9 synapses
Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
Jan 6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
Jan 6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
Jan 6 19:22:01 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 21: stop prmPingd:0_stop_0 on rh57-1 (local)
Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[10] on prmPingd:0 for client 3461, its parameters: CRM_meta_interval=[10000] multiplier=[100] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] name=[default_ping_set] CRM_meta_clone_max=[1] crm_feature_set=[3.0.1] host_list=[192.168.40.1] CRM_meta_globally_unique=[false] CRM_meta_name=[monitor] CRM_meta_clone=[0] CRM_meta_clone_node_max=[1] CRM_meta_notify=[false] cancelled
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=21:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmPingd:0_stop_0 )
Jan 6 19:22:02 rh57-1 pingd: [3529]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmPingd:0 stop[14] (pid 3612)
Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[14] on prmPingd:0 for client 3461: pid 3612 exited with return code 0
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=10, status=1, cib-update=0, confirmed=true) Cancelled
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=14, rc=0, cib-update=45, confirmed=true) ok
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmPingd:0_stop_0 (21) confirmed on rh57-1 (rc=0)
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 4 fired and confirmed
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 16: stop prmStonith2-3_stop_0 on rh57-1 (local)
Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[13] on prmStonith2-3 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[600s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor] cancelled
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=16:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-3_stop_0 )
Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-3 stop[15] (pid 3617)
Jan 6 19:22:02 rh57-1 lrmd: [3617]: info: Try to stop STONITH resource <rsc_id=prmStonith2-3> : Device=meatware
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_monitor_3600000 (call=13, status=1, cib-update=0, confirmed=true) Cancelled
Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[15] on prmStonith2-3 for client 3461: pid 3617 exited with return code 0
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_stop_0 (call=15, rc=0, cib-update=46, confirmed=true) ok
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-3_stop_0 (16) confirmed on rh57-1 (rc=0)
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 15: stop prmStonith2-2_stop_0 on rh57-1 (local)
Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[11] on prmStonith2-2 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[60s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor] cancelled
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=15:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-2_stop_0 )
Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-2 stop[16] (pid 3619)
Jan 6 19:22:02 rh57-1 lrmd: [3619]: info: Try to stop STONITH resource <rsc_id=prmStonith2-2> : Device=external/ssh
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_monitor_3600000 (call=11, status=1, cib-update=0, confirmed=true) Cancelled
Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[16] on prmStonith2-2 for client 3461: pid 3619 exited with return code 0
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_stop_0 (call=16, rc=0, cib-update=47, confirmed=true) ok
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-2_stop_0 (15) confirmed on rh57-1 (rc=0)
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: Executing crm-event (28): do_shutdown on rh57-1
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: crm-event (28) is a local shutdown
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: run_graph: ====================================================
Jan 6 19:22:02 rh57-1 crmd: [3461]: notice: run_graph: Transition 4 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4.bz2): Complete
Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_graph_trigger: Transition 4 is now complete
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_dc_release: DC role released
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
Jan 6 19:22:03 rh57-1 pengine: [3464]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Transitioner is now inactive
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Disconnecting STONITH...
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
Jan 6 19:22:03 rh57-1 crmd: [3461]: notice: Not currently connected.
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
Jan 6 19:22:03 rh57-1 crmd: [3461]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
Jan 6 19:22:03 rh57-1 crmd: [3461]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
Jan 6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 420 ms (> 100 ms) before being called (GSource: 0x179d9b0)
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: G_SIG_dispatch: started at 429442052 should have started at 429442010
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: crmdManagedChildDied: Process pengine:[3464] exited (signal=0, exitcode=0)
Jan 6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 80 ms (> 30 ms) (GSource: 0x179d9b0)
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: pe_msg_dispatch: Received HUP from pengine:[3464]
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: pe_connection_destroy: Connection to the Policy Engine released
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
Jan 6 19:22:03 rh57-1 crmd: [3461]: ERROR: verify_stopped: Resource prmVIP was active at shutdown. You may ignore this error if it is unmanaged.
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_lrm_control: Disconnected from the LRM
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_ha_control: Disconnected from Heartbeat
Jan 6 19:22:03 rh57-1 ccm: [3456]: info: client (pid=3461) removed from ccm
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_cib_control: Disconnecting CIB
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
Jan 6 19:22:03 rh57-1 cib: [3457]: info: cib_process_readwrite: We are now in R/O mode
Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
Jan 6 19:22:03 rh57-1 cib: [3457]: WARN: send_ipc_message: IPC Channel to 3461 is not connected
Jan 6 19:22:04 rh57-1 crmd: [3461]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
Jan 6 19:22:04 rh57-1 cib: [3457]: WARN: send_via_callback_channel: Delivery of reply to client 3461/5f69edda-aec9-42c7-ae52-045a05d1c5db failed
Jan 6 19:22:04 rh57-1 crmd: [3461]: info: do_exit: [crmd] stopped (0)
Jan 6 19:22:04 rh57-1 cib: [3457]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
Jan 6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/attrd process group 3460 with signal 15
Jan 6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x7b28140)
Jan 6 19:22:04 rh57-1 attrd: [3460]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 6 19:22:04 rh57-1 attrd: [3460]: info: attrd_shutdown: Exiting
Jan 6 19:22:04 rh57-1 attrd: [3460]: info: main: Exiting...
Jan 6 19:22:04 rh57-1 attrd: [3460]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
Jan 6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/stonithd process group 3459 with signal 15
Jan 6 19:22:04 rh57-1 stonithd: [3459]: notice: /usr/lib64/heartbeat/stonithd normally quit.
Jan 6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/lrmd -r process group 3458 with signal 15
Jan 6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
Jan 6 19:22:04 rh57-1 lrmd: [3458]: info: lrmd is shutting down
Jan 6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/cib process group 3457 with signal 15
Jan 6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
Jan 6 19:22:04 rh57-1 cib: [3457]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 6 19:22:04 rh57-1 cib: [3457]: info: cib_shutdown: Disconnected 0 clients
Jan 6 19:22:04 rh57-1 cib: [3457]: info: cib_process_disconnect: All clients disconnected...
Jan 6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
Jan 6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: Exiting...
Jan 6 19:22:04 rh57-1 cib: [3457]: info: main: Done
Jan 6 19:22:04 rh57-1 ccm: [3456]: info: client (pid=3457) removed from ccm
Jan 6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ccm process group 3456 with signal 15
Jan 6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource: 0x7b28140)
Jan 6 19:22:04 rh57-1 ccm: [3456]: info: received SIGTERM, going to shut down
Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBFIFO process 3446 with signal 15
Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3447 with signal 15
Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3448 with signal 15
Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3449 with signal 15
Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3450 with signal 15
Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 exited. 5 remaining
Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 exited. 4 remaining
Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 exited. 3 remaining
Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 exited. 2 remaining
Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 exited. 1 remaining
Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat shutdown complete.

-----------------------------



Best Regards,
Hideo Yamauchi.





--- On Fri, 2012/1/6, Andrew Beekhof <andrew [at] beekhof> wrote:
> (snip)


andrew at beekhof

Jan 15, 2012, 10:28 PM

Post #4 of 15
Re: [Problem]It is judged that a stopping resource is starting. [In reply to]

On Fri, Jan 6, 2012 at 12:37 PM, <renayama19661014 [at] ybb> wrote:
> Hi Andrew,
>
> Thank you for comment.
>
>> But it should have a subsequent stop action which would set it back to
>> being inactive.
>> Did that not happen in this case?
>
> Yes.

Could you send me the PE file related to this log please?

Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2



> Only the "verify_stopped" error was logged.
> No stop was carried out for the resource that failed its probe.
>
> (snip)
> Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[15] on prmStonith2-3 for client 3461: pid 3617 exited with return code 0
> Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_stop_0 (call=15, rc=0, cib-update=46, confirmed=true) ok
> Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-3_stop_0 (16) confirmed on rh57-1 (rc=0)
> Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 15: stop prmStonith2-2_stop_0 on rh57-1 (local)
> Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[11] on prmStonith2-2 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[60s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor]  cancelled
> Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=15:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-2_stop_0 )
> Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-2 stop[16] (pid 3619)
> Jan  6 19:22:02 rh57-1 lrmd: [3619]: info: Try to stop STONITH resource <rsc_id=prmStonith2-2> : Device=external/ssh
> Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_monitor_3600000 (call=11, status=1, cib-update=0, confirmed=true) Cancelled
> Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[16] on prmStonith2-2 for client 3461: pid 3619 exited with return code 0
> Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_stop_0 (call=16, rc=0, cib-update=47, confirmed=true) ok
> Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-2_stop_0 (15) confirmed on rh57-1 (rc=0)
> Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
> Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: Executing crm-event (28): do_shutdown on rh57-1
> Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: crm-event (28) is a local shutdown
> Jan  6 19:22:02 rh57-1 crmd: [3461]: info: run_graph: ====================================================
> Jan  6 19:22:02 rh57-1 crmd: [3461]: notice: run_graph: Transition 4 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4.bz2): Complete
> Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_graph_trigger: Transition 4 is now complete
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_dc_release: DC role released
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> Jan  6 19:22:03 rh57-1 pengine: [3464]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Transitioner is now inactive
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Disconnecting STONITH...
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
> Jan  6 19:22:03 rh57-1 crmd: [3461]: notice: Not currently connected.
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
> Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 420 ms (> 100 ms) before being called (GSource: 0x179d9b0)
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: G_SIG_dispatch: started at 429442052 should have started at 429442010
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmdManagedChildDied: Process pengine:[3464] exited (signal=0, exitcode=0)
> Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 80 ms (> 30 ms) (GSource: 0x179d9b0)
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_msg_dispatch: Received HUP from pengine:[3464]
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_connection_destroy: Connection to the Policy Engine released
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> Jan  6 19:22:03 rh57-1 crmd: [3461]: ERROR: verify_stopped: Resource prmVIP was active at shutdown.  You may ignore this error if it is unmanaged.
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_lrm_control: Disconnected from the LRM
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_ha_control: Disconnected from Heartbeat
> Jan  6 19:22:03 rh57-1 ccm: [3456]: info: client (pid=3461) removed from ccm
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_cib_control: Disconnecting CIB
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> Jan  6 19:22:03 rh57-1 cib: [3457]: info: cib_process_readwrite: We are now in R/O mode
> Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> Jan  6 19:22:03 rh57-1 cib: [3457]: WARN: send_ipc_message: IPC Channel to 3461 is not connected
> Jan  6 19:22:04 rh57-1 crmd: [3461]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
> Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: send_via_callback_channel: Delivery of reply to client 3461/5f69edda-aec9-42c7-ae52-045a05d1c5db failed
> Jan  6 19:22:04 rh57-1 crmd: [3461]: info: do_exit: [crmd] stopped (0)
> Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
> Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/attrd process group 3460 with signal 15
> Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x7b28140)
> Jan  6 19:22:04 rh57-1 attrd: [3460]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_shutdown: Exiting
> Jan  6 19:22:04 rh57-1 attrd: [3460]: info: main: Exiting...
> Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
> Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/stonithd process group 3459 with signal 15
> Jan  6 19:22:04 rh57-1 stonithd: [3459]: notice: /usr/lib64/heartbeat/stonithd normally quit.
> Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/lrmd -r process group 3458 with signal 15
> Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
> Jan  6 19:22:04 rh57-1 lrmd: [3458]: info: lrmd is shutting down
> Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/cib process group 3457 with signal 15
> Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
> Jan  6 19:22:04 rh57-1 cib: [3457]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_shutdown: Disconnected 0 clients
> Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_process_disconnect: All clients disconnected...
> Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
> Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: Exiting...
> Jan  6 19:22:04 rh57-1 cib: [3457]: info: main: Done
> Jan  6 19:22:04 rh57-1 ccm: [3456]: info: client (pid=3457) removed from ccm
> Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ccm process group 3456 with signal 15
> Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource: 0x7b28140)
> Jan  6 19:22:04 rh57-1 ccm: [3456]: info: received SIGTERM, going to shut down
> Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBFIFO process 3446 with signal 15
> Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3447 with signal 15
> Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3448 with signal 15
> Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3449 with signal 15
> Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3450 with signal 15
> Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 exited. 5 remaining
> Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 exited. 4 remaining
> Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 exited. 3 remaining
> Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 exited. 2 remaining
> Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 exited. 1 remaining
> Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat shutdown complete.
>
> -----------------------------
>
>
>
> Best Regards,
> Hideo Yamauchi.
>
>
>
>
>
> --- On Fri, 2012/1/6, Andrew Beekhof <andrew [at] beekhof> wrote:
>
>> On Tue, Dec 27, 2011 at 6:15 PM,  <renayama19661014 [at] ybb> wrote:
>> > Hi All,
>> >
>> > When Pacemaker stops when there is the resource that failed in probe processing, crmd outputs the following error message.
>> >
>> >
>> >  Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
>> >
>> >
>> > Because the resource that failed in probe processing does not start,
>>
>> But it should have a subsequent stop action which would set it back to
>> being inactive.
>> Did that not happen in this case?
>>
>> > this error message is not right.
>> >
>> > I think that the following correction may be good, but we do not have conviction.
>> >
>> >
>> >  * crmd/lrm.c
>> >  (snip)
>> >                } else if(op->rc == EXECRA_NOT_RUNNING) {
>> >                        active = FALSE;
>> > +                } else if(op->rc != EXECRA_OK && op->interval == 0
>> > +                                && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
>> > +                        active = FALSE;
>> >                } else {
>> >                        active = TRUE;
>> >                }
>> >  (snip)
>> >
>> >
>> > In the Pacemaker development source, the handling of this processing seems to have changed considerably.
>> > We would like this change backported to the Pacemaker 1.0 series.
>> >
>> > Best Regards,
>> > Hideo Yamauchi.
>> >
>> >
>> >
>>
>

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


renayama19661014 at ybb

Jan 15, 2012, 11:03 PM

Post #5 of 15 (1515 views)
Permalink
Re: [Problem]It is judged that a stopping resource is starting. [In reply to]

Hi Andrew,

Thank you for your comments.

> Could you send me the PE file related to this log please?
>
> Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2

The old file is gone.
I am sending a log and the PE file reproduced with the same procedure.

* trac1818.zip
* https://skydrive.live.com/?cid=3a14d57622c66876&id=3A14D57622C66876%21127

Best Regards,
Hideo Yamauchi.


--- On Mon, 2012/1/16, Andrew Beekhof <andrew [at] beekhof> wrote:

> On Fri, Jan 6, 2012 at 12:37 PM,  <renayama19661014 [at] ybb> wrote:
> > Hi Andrew,
> >
> > Thank you for comment.
> >
> >> But it should have a subsequent stop action which would set it back to
> >> being inactive.
> >> Did that not happen in this case?
> >
> > Yes.
>
> Could you send me the PE file related to this log please?
>
> Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
>
>
>
> > Only the "verify_stopped" log is recorded.
> > The stop for the resource that failed in probe was not carried out.
> >
> > -----------------------------
> > ######### yamauchi PREV STOP ##########
> > Jan  6 19:21:56 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ifcheckd process group 3462 with signal 15
> > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: do_node_walk: Requesting the list of configured nodes
> > Jan  6 19:21:58 rh57-1 ifcheckd: [3462]: info: main: Exiting ifcheckd
> > Jan  6 19:21:58 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/crmd process group 3461 with signal 15
> > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_shutdown: Requesting shutdown
> > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
> > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_shutdown_req: Sending shutdown request to DC: rh57-1
> > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: handle_shutdown_request: Creating shutdown request for rh57-1 (state=S_POLICY_ENGINE)
> > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1325845319)
> > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_perform_update: Sent update 14: shutdown=1325845319
> > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-1fdd5e2a-44b6-44b9-9993-97fa120072a4-shutdown, name=shutdown, value=1325845319, magic=NA, cib=0.101.16) : Transient attribute: update
> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: crm_timer_popped: New Transition Timer (I_PE_CALC) just popped!
> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE
> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke_callback: Invoking the PE: query=44, ref=pe_calc-dc-1325845321-26, seq=1, quorate=0
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: determine_online_status: Node rh57-1 is shutting down
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpUltraMonkey
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmVIP       (ocf::heartbeat:LVM):   Stopped
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith1
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-2        (stonith:external/ssh): Stopped
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-3        (stonith:meatware):     Stopped
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith2
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-2        (stonith:external/ssh): Started rh57-1
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-3        (stonith:meatware):     Started rh57-1
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: clone_print:  Clone Set: clnPingd
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: short_print:      Started: [ rh57-1 ]
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: clnPingd: Rolling back scores from prmVIP
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmPingd:0 cannot run anywhere
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmVIP cannot run anywhere
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith1-2: Rolling back scores from prmStonith1-3
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-2 cannot run anywhere
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-3 cannot run anywhere
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith2-2: Rolling back scores from prmStonith2-3
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-2 cannot run anywhere
> > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-3 cannot run anywhere
> > (snip)
> > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3450 with signal 15
> > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 exited. 5 remaining
> > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 exited. 4 remaining
> > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 exited. 3 remaining
> > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 exited. 2 remaining
> > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 exited. 1 remaining
> > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat shutdown complete.
> >
> > -----------------------------
> >
> >
> >
> > Best Regards,
> > Hideo Yamauchi.
> >
> >
> >
> >
> >
> > --- On Fri, 2012/1/6, Andrew Beekhof <andrew [at] beekhof> wrote:
> >
> >> On Tue, Dec 27, 2011 at 6:15 PM,  <renayama19661014 [at] ybb> wrote:
> >> > Hi All,
> >> >
> >> > When Pacemaker stops when there is the resource that failed in probe processing, crmd outputs the following error message.
> >> >
> >> >
> >> >  Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
> >> >
> >> >
> >> > Because a resource that failed in probe processing never starts,
> >>
> >> But it should have a subsequent stop action which would set it back to
> >> being inactive.
> >> Did that not happen in this case?
> >>
> >> > this error message is not correct.
> >> >
> >> > I think the following correction may be appropriate, but we are not certain.
> >> >
> >> >
> >> >  * crmd/lrm.c
> >> >  (snip)
> >> >                } else if(op->rc == EXECRA_NOT_RUNNING) {
> >> >                        active = FALSE;
> >> > +                } else if(op->rc != EXECRA_OK && op->interval == 0
> >> > +                                && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
> >> > +                        active = FALSE;
> >> >                } else {
> >> >                        active = TRUE;
> >> >                }
> >> >  (snip)
> >> >
> >> >
> >> > In the Pacemaker development source, the handling of this processing seems to have changed considerably.
> >> > We would ask that this change be backported to the Pacemaker 1.0 series if possible.
> >> >
> >> > Best Regards,
> >> > Hideo Yamauchi.
> >> >
> >> >
> >> >
> >> > _______________________________________________
> >> > Pacemaker mailing list: Pacemaker [at] oss
> >> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >> >
> >> > Project Home: http://www.clusterlabs.org
> >> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> > Bugs: http://bugs.clusterlabs.org
> >>
> >
>

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


renayama19661014 at ybb

Feb 13, 2012, 4:20 PM

Post #6 of 15 (1465 views)
Permalink
Re: [Problem]It is judged that a stopping resource is starting. [In reply to]

Hi Andrew,

Has there been any progress on this problem?

Best Regards,
Hideo Yamauchi.


--- On Mon, 2012/1/16, renayama19661014 [at] ybb <renayama19661014 [at] ybb> wrote:

> Hi Andrew,
>
> Thank you for the comments.
>
> > Could you send me the PE file related to this log please?
> >
> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
> > /var/lib/pengine/pe-input-4.bz2
>
> The old file has been lost.
> I am sending a log and the PE file reproduced by the same procedure.
>
> * trac1818.zip   
>   * https://skydrive.live.com/?cid=3a14d57622c66876&id=3A14D57622C66876%21127
>
> Best Regards,
> Hideo Yamauchi.
>
>
> --- On Mon, 2012/1/16, Andrew Beekhof <andrew [at] beekhof> wrote:
>
> > On Fri, Jan 6, 2012 at 12:37 PM,  <renayama19661014 [at] ybb> wrote:
> > > Hi Andrew,
> > >
> > > Thank you for the comment.
> > >
> > >> But it should have a subsequent stop action which would set it back to
> > >> being inactive.
> > >> Did that not happen in this case?
> > >
> > > Yes.
> >
> > Could you send me the PE file related to this log please?
> >
> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
> > /var/lib/pengine/pe-input-4.bz2
> >
> >
> >
> > > Log of "verify_stopped" is only recorded.
> > > The stop handling of resource that failed in probe was not carried out.
> > >
> > > -----------------------------
> > > ######### yamauchi PREV STOP ##########
> > > Jan  6 19:21:56 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ifcheckd process group 3462 with signal 15
> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: do_node_walk: Requesting the list of configured nodes
> > > Jan  6 19:21:58 rh57-1 ifcheckd: [3462]: info: main: Exiting ifcheckd
> > > Jan  6 19:21:58 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/crmd process group 3461 with signal 15
> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_shutdown: Requesting shutdown
> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_shutdown_req: Sending shutdown request to DC: rh57-1
> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: handle_shutdown_request: Creating shutdown request for rh57-1 (state=S_POLICY_ENGINE)
> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1325845319)
> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_perform_update: Sent update 14: shutdown=1325845319
> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-1fdd5e2a-44b6-44b9-9993-97fa120072a4-shutdown, name=shutdown, value=1325845319, magic=NA, cib=0.101.16) : Transient attribute: update
> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: crm_timer_popped: New Transition Timer (I_PE_CALC) just popped!
> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE
> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke_callback: Invoking the PE: query=44, ref=pe_calc-dc-1325845321-26, seq=1, quorate=0
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: determine_online_status: Node rh57-1 is shutting down
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpUltraMonkey
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmVIP       (ocf::heartbeat:LVM):   Stopped
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith1
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-2        (stonith:external/ssh): Stopped
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-3        (stonith:meatware):     Stopped
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith2
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-2        (stonith:external/ssh): Started rh57-1
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-3        (stonith:meatware):     Started rh57-1
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: clone_print:  Clone Set: clnPingd
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: short_print:      Started: [ rh57-1 ]
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: clnPingd: Rolling back scores from prmVIP
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmPingd:0 cannot run anywhere
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmVIP cannot run anywhere
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith1-2: Rolling back scores from prmStonith1-3
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-2 cannot run anywhere
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-3 cannot run anywhere
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith2-2: Rolling back scores from prmStonith2-3
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-2 cannot run anywhere
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-3 cannot run anywhere
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: stage6: Scheduling Node rh57-1 for shutdown
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmVIP     (Stopped)
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-2      (Stopped)
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-3      (Stopped)
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-2      (rh57-1)
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-3      (rh57-1)
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmPingd:0 (rh57-1)
> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: process_pe_message: Transition 4: PEngine Input stored in: /var/lib/pengine/pe-input-4.bz2
> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: unpack_graph: Unpacked transition 4: 9 actions in 9 synapses
> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 21: stop prmPingd:0_stop_0 on rh57-1 (local)
> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[10] on prmPingd:0 for client 3461, its parameters: CRM_meta_interval=[10000] multiplier=[100] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] name=[default_ping_set] CRM_meta_clone_max=[1] crm_feature_set=[3.0.1] host_list=[192.168.40.1] CRM_meta_globally_unique=[false] CRM_meta_name=[monitor] CRM_meta_clone=[0] CRM_meta_clone_node_max=[1] CRM_meta_notify=[false]  cancelled
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=21:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmPingd:0_stop_0 )
> > > Jan  6 19:22:02 rh57-1 pingd: [3529]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmPingd:0 stop[14] (pid 3612)
> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[14] on prmPingd:0 for client 3461: pid 3612 exited with return code 0
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=10, status=1, cib-update=0, confirmed=true) Cancelled
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=14, rc=0, cib-update=45, confirmed=true) ok
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmPingd:0_stop_0 (21) confirmed on rh57-1 (rc=0)
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 4 fired and confirmed
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 16: stop prmStonith2-3_stop_0 on rh57-1 (local)
> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[13] on prmStonith2-3 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[600s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor]  cancelled
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=16:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-3_stop_0 )
> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-3 stop[15] (pid 3617)
> > > Jan  6 19:22:02 rh57-1 lrmd: [3617]: info: Try to stop STONITH resource <rsc_id=prmStonith2-3> : Device=meatware
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_monitor_3600000 (call=13, status=1, cib-update=0, confirmed=true) Cancelled
> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[15] on prmStonith2-3 for client 3461: pid 3617 exited with return code 0
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_stop_0 (call=15, rc=0, cib-update=46, confirmed=true) ok
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-3_stop_0 (16) confirmed on rh57-1 (rc=0)
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 15: stop prmStonith2-2_stop_0 on rh57-1 (local)
> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[11] on prmStonith2-2 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[60s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor]  cancelled
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=15:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-2_stop_0 )
> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-2 stop[16] (pid 3619)
> > > Jan  6 19:22:02 rh57-1 lrmd: [3619]: info: Try to stop STONITH resource <rsc_id=prmStonith2-2> : Device=external/ssh
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_monitor_3600000 (call=11, status=1, cib-update=0, confirmed=true) Cancelled
> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[16] on prmStonith2-2 for client 3461: pid 3619 exited with return code 0
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_stop_0 (call=16, rc=0, cib-update=47, confirmed=true) ok
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-2_stop_0 (15) confirmed on rh57-1 (rc=0)
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: Executing crm-event (28): do_shutdown on rh57-1
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: crm-event (28) is a local shutdown
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: run_graph: ====================================================
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: notice: run_graph: Transition 4 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4.bz2): Complete
> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_graph_trigger: Transition 4 is now complete
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_dc_release: DC role released
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> > > Jan  6 19:22:03 rh57-1 pengine: [3464]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Transitioner is now inactive
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Disconnecting STONITH...
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: notice: Not currently connected.
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 420 ms (> 100 ms) before being called (GSource: 0x179d9b0)
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: G_SIG_dispatch: started at 429442052 should have started at 429442010
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmdManagedChildDied: Process pengine:[3464] exited (signal=0, exitcode=0)
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 80 ms (> 30 ms) (GSource: 0x179d9b0)
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_msg_dispatch: Received HUP from pengine:[3464]
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_connection_destroy: Connection to the Policy Engine released
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: ERROR: verify_stopped: Resource prmVIP was active at shutdown.  You may ignore this error if it is unmanaged.
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_lrm_control: Disconnected from the LRM
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_ha_control: Disconnected from Heartbeat
> > > Jan  6 19:22:03 rh57-1 ccm: [3456]: info: client (pid=3461) removed from ccm
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_cib_control: Disconnecting CIB
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> > > Jan  6 19:22:03 rh57-1 cib: [3457]: info: cib_process_readwrite: We are now in R/O mode
> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> > > Jan  6 19:22:03 rh57-1 cib: [3457]: WARN: send_ipc_message: IPC Channel to 3461 is not connected
> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: send_via_callback_channel: Delivery of reply to client 3461/5f69edda-aec9-42c7-ae52-045a05d1c5db failed
> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: do_exit: [crmd] stopped (0)
> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/attrd process group 3460 with signal 15
> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x7b28140)
> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_shutdown: Exiting
> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: main: Exiting...
> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/stonithd process group 3459 with signal 15
> > > Jan  6 19:22:04 rh57-1 stonithd: [3459]: notice: /usr/lib64/heartbeat/stonithd normally quit.
> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/lrmd -r process group 3458 with signal 15
> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
> > > Jan  6 19:22:04 rh57-1 lrmd: [3458]: info: lrmd is shutting down
> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/cib process group 3457 with signal 15
> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_shutdown: Disconnected 0 clients
> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_process_disconnect: All clients disconnected...
> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: Exiting...
> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: main: Done
> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: client (pid=3457) removed from ccm
> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ccm process group 3456 with signal 15
> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource: 0x7b28140)
> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: received SIGTERM, going to shut down
> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBFIFO process 3446 with signal 15
> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3447 with signal 15
> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3448 with signal 15
> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3449 with signal 15
> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3450 with signal 15
> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 exited. 5 remaining
> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 exited. 4 remaining
> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 exited. 3 remaining
> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 exited. 2 remaining
> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 exited. 1 remaining
> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat shutdown complete.
> > >
> > > -----------------------------
> > >
> > >
> > >
> > > Best Regards,
> > > Hideo Yamauchi.
> > >
> > >
> > >
> > >
> > >
> > > --- On Fri, 2012/1/6, Andrew Beekhof <andrew [at] beekhof> wrote:
> > >
> > >> On Tue, Dec 27, 2011 at 6:15 PM,  <renayama19661014 [at] ybb> wrote:
> > >> > Hi All,
> > >> >
> > >> > When Pacemaker stops when there is the resource that failed in probe processing, crmd outputs the following error message.
> > >> >
> > >> >
> > >> >  Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
> > >> >
> > >> >
> > >> > Because a resource that failed in probe processing never starts,
> > >>
> > >> But it should have a subsequent stop action which would set it back to
> > >> being inactive.
> > >> Did that not happen in this case?
> > >>
> > >> > this error message is not correct.
> > >> >
> > >> > I think the following correction may be appropriate, but we are not certain.
> > >> >
> > >> >
> > >> >  * crmd/lrm.c
> > >> >  (snip)
> > >> >                } else if(op->rc == EXECRA_NOT_RUNNING) {
> > >> >                        active = FALSE;
> > >> > +                } else if(op->rc != EXECRA_OK && op->interval == 0
> > >> > +                                && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
> > >> > +                        active = FALSE;
> > >> >                } else {
> > >> >                        active = TRUE;
> > >> >                }
> > >> >  (snip)
> > >> >
> > >> >
> > >> > In the Pacemaker development source, the handling of this processing seems to have changed considerably.
> > >> > We would ask that this change be backported to the Pacemaker 1.0 series if possible.
> > >> >
> > >> > Best Regards,
> > >> > Hideo Yamauchi.
> > >> >
> > >> >
> > >> >
> > >>
> > >
> >
>

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


andrew at beekhof

Feb 16, 2012, 3:54 AM

Post #7 of 15 (1463 views)
Permalink
Re: [Problem]It is judged that a stopping resource is starting. [In reply to]

Sorry!

I'm getting to this soon, really :-)
First it was corosync 2.0 stuff, so that /something/ in fedora-17
works, then fixing everything I broke when adding corosync 2.0
support.

On Tue, Feb 14, 2012 at 11:20 AM, <renayama19661014 [at] ybb> wrote:
> Hi Andrew,
>
> Has there been any progress on this problem?
>
> Best Regards,
> Hideo Yamauchi.
>
>
> --- On Mon, 2012/1/16, renayama19661014 [at] ybb <renayama19661014 [at] ybb> wrote:
>
>> Hi Andrew,
>>
>> Thank you for the comments.
>>
>> > Could you send me the PE file related to this log please?
>> >
>> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
>> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
>> > /var/lib/pengine/pe-input-4.bz2
>>
>> The old file has been lost.
>> I am sending a log and the PE file reproduced by the same procedure.
>>
>>  * trac1818.zip
>>   * https://skydrive.live.com/?cid=3a14d57622c66876&id=3A14D57622C66876%21127
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>> --- On Mon, 2012/1/16, Andrew Beekhof <andrew [at] beekhof> wrote:
>>
>> > On Fri, Jan 6, 2012 at 12:37 PM,  <renayama19661014 [at] ybb> wrote:
>> > > Hi Andrew,
>> > >
>> > > Thank you for the comment.
>> > >
>> > >> But it should have a subsequent stop action which would set it back to
>> > >> being inactive.
>> > >> Did that not happen in this case?
>> > >
>> > > Yes.
>> >
>> > Could you send me the PE file related to this log please?
>> >
>> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
>> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
>> > /var/lib/pengine/pe-input-4.bz2
>> >
>> >
>> >
>> > > Log of "verify_stopped" is only recorded.
>> > > The stop handling of resource that failed in probe was not carried out.
>> > >
>> > > -----------------------------
>> > > ######### yamauchi PREV STOP ##########
>> > > Jan  6 19:21:56 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ifcheckd process group 3462 with signal 15
>> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: do_node_walk: Requesting the list of configured nodes
>> > > Jan  6 19:21:58 rh57-1 ifcheckd: [3462]: info: main: Exiting ifcheckd
>> > > Jan  6 19:21:58 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/crmd process group 3461 with signal 15
>> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_shutdown: Requesting shutdown
>> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
>> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
>> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_shutdown_req: Sending shutdown request to DC: rh57-1
>> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: handle_shutdown_request: Creating shutdown request for rh57-1 (state=S_POLICY_ENGINE)
>> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1325845319)
>> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_perform_update: Sent update 14: shutdown=1325845319
>> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-1fdd5e2a-44b6-44b9-9993-97fa120072a4-shutdown, name=shutdown, value=1325845319, magic=NA, cib=0.101.16) : Transient attribute: update
>> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: crm_timer_popped: New Transition Timer (I_PE_CALC) just popped!
>> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE
>> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke_callback: Invoking the PE: query=44, ref=pe_calc-dc-1325845321-26, seq=1, quorate=0
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: unpack_config: On loss of CCM Quorum: Ignore
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: determine_online_status: Node rh57-1 is shutting down
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpUltraMonkey
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmVIP       (ocf::heartbeat:LVM):   Stopped
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith1
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-2        (stonith:external/ssh): Stopped
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-3        (stonith:meatware):     Stopped
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith2
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-2        (stonith:external/ssh): Started rh57-1
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-3        (stonith:meatware):     Started rh57-1
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: clone_print:  Clone Set: clnPingd
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: short_print:      Started: [ rh57-1 ]
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: clnPingd: Rolling back scores from prmVIP
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmPingd:0 cannot run anywhere
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmVIP cannot run anywhere
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith1-2: Rolling back scores from prmStonith1-3
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-2 cannot run anywhere
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-3 cannot run anywhere
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith2-2: Rolling back scores from prmStonith2-3
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-2 cannot run anywhere
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-3 cannot run anywhere
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: stage6: Scheduling Node rh57-1 for shutdown
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmVIP     (Stopped)
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-2      (Stopped)
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-3      (Stopped)
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-2      (rh57-1)
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-3      (rh57-1)
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmPingd:0 (rh57-1)
>> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: process_pe_message: Transition 4: PEngine Input stored in: /var/lib/pengine/pe-input-4.bz2
>> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: unpack_graph: Unpacked transition 4: 9 actions in 9 synapses
>> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
>> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
>> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
>> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 21: stop prmPingd:0_stop_0 on rh57-1 (local)
>> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[10] on prmPingd:0 for client 3461, its parameters: CRM_meta_interval=[10000] multiplier=[100] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] name=[default_ping_set] CRM_meta_clone_max=[1] crm_feature_set=[3.0.1] host_list=[192.168.40.1] CRM_meta_globally_unique=[false] CRM_meta_name=[monitor] CRM_meta_clone=[0] CRM_meta_clone_node_max=[1] CRM_meta_notify=[false]  cancelled
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=21:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmPingd:0_stop_0 )
>> > > Jan  6 19:22:02 rh57-1 pingd: [3529]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmPingd:0 stop[14] (pid 3612)
>> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[14] on prmPingd:0 for client 3461: pid 3612 exited with return code 0
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=10, status=1, cib-update=0, confirmed=true) Cancelled
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=14, rc=0, cib-update=45, confirmed=true) ok
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmPingd:0_stop_0 (21) confirmed on rh57-1 (rc=0)
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 4 fired and confirmed
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 16: stop prmStonith2-3_stop_0 on rh57-1 (local)
>> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[13] on prmStonith2-3 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[600s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor]  cancelled
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=16:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-3_stop_0 )
>> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-3 stop[15] (pid 3617)
>> > > Jan  6 19:22:02 rh57-1 lrmd: [3617]: info: Try to stop STONITH resource <rsc_id=prmStonith2-3> : Device=meatware
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_monitor_3600000 (call=13, status=1, cib-update=0, confirmed=true) Cancelled
>> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[15] on prmStonith2-3 for client 3461: pid 3617 exited with return code 0
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_stop_0 (call=15, rc=0, cib-update=46, confirmed=true) ok
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-3_stop_0 (16) confirmed on rh57-1 (rc=0)
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 15: stop prmStonith2-2_stop_0 on rh57-1 (local)
>> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[11] on prmStonith2-2 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[60s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor]  cancelled
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=15:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-2_stop_0 )
>> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-2 stop[16] (pid 3619)
>> > > Jan  6 19:22:02 rh57-1 lrmd: [3619]: info: Try to stop STONITH resource <rsc_id=prmStonith2-2> : Device=external/ssh
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_monitor_3600000 (call=11, status=1, cib-update=0, confirmed=true) Cancelled
>> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[16] on prmStonith2-2 for client 3461: pid 3619 exited with return code 0
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_stop_0 (call=16, rc=0, cib-update=47, confirmed=true) ok
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-2_stop_0 (15) confirmed on rh57-1 (rc=0)
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: Executing crm-event (28): do_shutdown on rh57-1
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: crm-event (28) is a local shutdown
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: run_graph: ====================================================
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: notice: run_graph: Transition 4 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4.bz2): Complete
>> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_graph_trigger: Transition 4 is now complete
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_dc_release: DC role released
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> > > Jan  6 19:22:03 rh57-1 pengine: [3464]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Transitioner is now inactive
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Disconnecting STONITH...
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: notice: Not currently connected.
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 420 ms (> 100 ms) before being called (GSource: 0x179d9b0)
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: G_SIG_dispatch: started at 429442052 should have started at 429442010
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmdManagedChildDied: Process pengine:[3464] exited (signal=0, exitcode=0)
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 80 ms (> 30 ms) (GSource: 0x179d9b0)
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_msg_dispatch: Received HUP from pengine:[3464]
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_connection_destroy: Connection to the Policy Engine released
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: ERROR: verify_stopped: Resource prmVIP was active at shutdown.  You may ignore this error if it is unmanaged.
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_lrm_control: Disconnected from the LRM
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_ha_control: Disconnected from Heartbeat
>> > > Jan  6 19:22:03 rh57-1 ccm: [3456]: info: client (pid=3461) removed from ccm
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_cib_control: Disconnecting CIB
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
>> > > Jan  6 19:22:03 rh57-1 cib: [3457]: info: cib_process_readwrite: We are now in R/O mode
>> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
>> > > Jan  6 19:22:03 rh57-1 cib: [3457]: WARN: send_ipc_message: IPC Channel to 3461 is not connected
>> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
>> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: send_via_callback_channel: Delivery of reply to client 3461/5f69edda-aec9-42c7-ae52-045a05d1c5db failed
>> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: do_exit: [crmd] stopped (0)
>> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
>> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/attrd process group 3460 with signal 15
>> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x7b28140)
>> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_shutdown: Exiting
>> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: main: Exiting...
>> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
>> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/stonithd process group 3459 with signal 15
>> > > Jan  6 19:22:04 rh57-1 stonithd: [3459]: notice: /usr/lib64/heartbeat/stonithd normally quit.
>> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/lrmd -r process group 3458 with signal 15
>> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
>> > > Jan  6 19:22:04 rh57-1 lrmd: [3458]: info: lrmd is shutting down
>> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/cib process group 3457 with signal 15
>> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
>> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_shutdown: Disconnected 0 clients
>> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_process_disconnect: All clients disconnected...
>> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
>> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: Exiting...
>> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: main: Done
>> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: client (pid=3457) removed from ccm
>> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ccm process group 3456 with signal 15
>> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource: 0x7b28140)
>> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: received SIGTERM, going to shut down
>> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBFIFO process 3446 with signal 15
>> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3447 with signal 15
>> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3448 with signal 15
>> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3449 with signal 15
>> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3450 with signal 15
>> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 exited. 5 remaining
>> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 exited. 4 remaining
>> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 exited. 3 remaining
>> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 exited. 2 remaining
>> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 exited. 1 remaining
>> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat shutdown complete.
>> > >
>> > > -----------------------------
>> > >
>> > >
>> > >
>> > > Best Regards,
>> > > Hideo Yamauchi.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > --- On Fri, 2012/1/6, Andrew Beekhof <andrew [at] beekhof> wrote:
>> > >
>> > >> On Tue, Dec 27, 2011 at 6:15 PM,  <renayama19661014 [at] ybb> wrote:
>> > >> > Hi All,
>> > >> >
>> > >> > When Pacemaker stops while there is a resource that failed in probe processing, crmd outputs the following error message.
>> > >> >
>> > >> >
>> > >> >  Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
>> > >> >
>> > >> >
>> > >> > Because the resource that failed in probe processing does not start,
>> > >>
>> > >> But it should have a subsequent stop action which would set it back to
>> > >> being inactive.
>> > >> Did that not happen in this case?
>> > >>
>> > >> > this error message is not right.
>> > >> >
>> > >> > I think that the following correction may be good, but we are not certain.
>> > >> >
>> > >> >
>> > >> >  * crmd/lrm.c
>> > >> >  (snip)
>> > >> >                } else if(op->rc == EXECRA_NOT_RUNNING) {
>> > >> >                        active = FALSE;
>> > >> > +                } else if(op->rc != EXECRA_OK && op->interval == 0
>> > >> > +                                && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
>> > >> > +                        active = FALSE;
>> > >> >                } else {
>> > >> >                        active = TRUE;
>> > >> >                }
>> > >> >  (snip)
>> > >> >
>> > >> >
>> > >> > In the Pacemaker development source, the handling of this processing seems to have changed considerably.
>> > >> > We would like to request that this change be backported to the Pacemaker 1.0 series.
>> > >> >
>> > >> > Best Regards,
>> > >> > Hideo Yamauchi.
>> > >> >
>> > >> >
>> > >> >
>> > >> > _______________________________________________
>> > >> > Pacemaker mailing list: Pacemaker [at] oss
>> > >> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> > >> >
>> > >> > Project Home: http://www.clusterlabs.org
>> > >> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > >> > Bugs: http://bugs.clusterlabs.org
>> > >>
>> > >
>> >
>>
>



renayama19661014 at ybb

Feb 16, 2012, 3:49 PM

Post #8 of 15 (1461 views)
Permalink
Re: [Problem]It is judged that a stopping resource is starting. [In reply to]

Hi Andrew,

Thank you for your comment.

> I'm getting to this soon, really :-)
> First it was corosync 2.0 stuff, so that /something/ in fedora-17
> works, then fixing everything I broke when adding corosync 2.0
> support.

All right!

I will wait for your reply.

Best Regards,
Hideo Yamauchi.

--- On Thu, 2012/2/16, Andrew Beekhof <andrew [at] beekhof> wrote:

> Sorry!
>
> I'm getting to this soon, really :-)
> First it was corosync 2.0 stuff, so that /something/ in fedora-17
> works, then fixing everything I broke when adding corosync 2.0
> support.
>
> On Tue, Feb 14, 2012 at 11:20 AM,  <renayama19661014 [at] ybb> wrote:
> > Hi Andrew,
> >
> > What became of this problem afterwards?
> >
> > Best Regards,
> > Hideo Yamauchi.
> >
> >
> > --- On Mon, 2012/1/16, renayama19661014 [at] ybb <renayama19661014 [at] ybb> wrote:
> >
> >> Hi Andrew,
> >>
> >> Thank you for comments.
> >>
> >> > Could you send me the PE file related to this log please?
> >> >
> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
> >> > /var/lib/pengine/pe-input-4.bz2
> >>
> >> The old file was lost.
> >> I am sending the log and the PE file, reproduced by the same procedure.
> >>
> >>  * trac1818.zip
> >>   * https://skydrive.live.com/?cid=3a14d57622c66876&id=3A14D57622C66876%21127
> >>
> >> Best Regards,
> >> Hideo Yamauchi.
> >>
> >>
> >> --- On Mon, 2012/1/16, Andrew Beekhof <andrew [at] beekhof> wrote:
> >>
> >> > On Fri, Jan 6, 2012 at 12:37 PM,  <renayama19661014 [at] ybb> wrote:
> >> > > Hi Andrew,
> >> > >
> >> > > Thank you for comment.
> >> > >
> >> > >> But it should have a subsequent stop action which would set it back to
> >> > >> being inactive.
> >> > >> Did that not happen in this case?
> >> > >
> >> > > Yes.
> >> >
> >> > Could you send me the PE file related to this log please?
> >> >
> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
> >> > /var/lib/pengine/pe-input-4.bz2
> >> >
> >> >
> >> >
> >> > > Only the "verify_stopped" log is recorded.
> >> > > The stop handling of the resource that failed in the probe was not carried out.
> >> > >
> >> > > -----------------------------
> >> > > ######### yamauchi PREV STOP ##########
> >> > > Jan  6 19:21:56 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ifcheckd process group 3462 with signal 15
> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: do_node_walk: Requesting the list of configured nodes
> >> > > Jan  6 19:21:58 rh57-1 ifcheckd: [3462]: info: main: Exiting ifcheckd
> >> > > Jan  6 19:21:58 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/crmd process group 3461 with signal 15
> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_shutdown: Requesting shutdown
> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_shutdown_req: Sending shutdown request to DC: rh57-1
> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: handle_shutdown_request: Creating shutdown request for rh57-1 (state=S_POLICY_ENGINE)
> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1325845319)
> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_perform_update: Sent update 14: shutdown=1325845319
> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-1fdd5e2a-44b6-44b9-9993-97fa120072a4-shutdown, name=shutdown, value=1325845319, magic=NA, cib=0.101.16) : Transient attribute: update
> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: crm_timer_popped: New Transition Timer (I_PE_CALC) just popped!
> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE
> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke_callback: Invoking the PE: query=44, ref=pe_calc-dc-1325845321-26, seq=1, quorate=0
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: unpack_config: On loss of CCM Quorum: Ignore
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: determine_online_status: Node rh57-1 is shutting down
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpUltraMonkey
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmVIP       (ocf::heartbeat:LVM):   Stopped
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith1
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-2        (stonith:external/ssh): Stopped
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-3        (stonith:meatware):     Stopped
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith2
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-2        (stonith:external/ssh): Started rh57-1
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-3        (stonith:meatware):     Started rh57-1
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: clone_print:  Clone Set: clnPingd
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: short_print:      Started: [ rh57-1 ]
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: clnPingd: Rolling back scores from prmVIP
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmPingd:0 cannot run anywhere
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmVIP cannot run anywhere
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith1-2: Rolling back scores from prmStonith1-3
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-2 cannot run anywhere
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-3 cannot run anywhere
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith2-2: Rolling back scores from prmStonith2-3
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-2 cannot run anywhere
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-3 cannot run anywhere
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: stage6: Scheduling Node rh57-1 for shutdown
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmVIP     (Stopped)
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-2      (Stopped)
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-3      (Stopped)
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-2      (rh57-1)
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-3      (rh57-1)
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmPingd:0 (rh57-1)
> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: process_pe_message: Transition 4: PEngine Input stored in: /var/lib/pengine/pe-input-4.bz2
> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: unpack_graph: Unpacked transition 4: 9 actions in 9 synapses
> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 21: stop prmPingd:0_stop_0 on rh57-1 (local)
> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[10] on prmPingd:0 for client 3461, its parameters: CRM_meta_interval=[10000] multiplier=[100] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] name=[default_ping_set] CRM_meta_clone_max=[1] crm_feature_set=[3.0.1] host_list=[192.168.40.1] CRM_meta_globally_unique=[false] CRM_meta_name=[monitor] CRM_meta_clone=[0] CRM_meta_clone_node_max=[1] CRM_meta_notify=[false]  cancelled
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=21:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmPingd:0_stop_0 )
> >> > > Jan  6 19:22:02 rh57-1 pingd: [3529]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmPingd:0 stop[14] (pid 3612)
> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[14] on prmPingd:0 for client 3461: pid 3612 exited with return code 0
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=10, status=1, cib-update=0, confirmed=true) Cancelled
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=14, rc=0, cib-update=45, confirmed=true) ok
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmPingd:0_stop_0 (21) confirmed on rh57-1 (rc=0)
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 4 fired and confirmed
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 16: stop prmStonith2-3_stop_0 on rh57-1 (local)
> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[13] on prmStonith2-3 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[600s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor]  cancelled
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=16:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-3_stop_0 )
> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-3 stop[15] (pid 3617)
> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3617]: info: Try to stop STONITH resource <rsc_id=prmStonith2-3> : Device=meatware
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_monitor_3600000 (call=13, status=1, cib-update=0, confirmed=true) Cancelled
> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[15] on prmStonith2-3 for client 3461: pid 3617 exited with return code 0
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_stop_0 (call=15, rc=0, cib-update=46, confirmed=true) ok
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-3_stop_0 (16) confirmed on rh57-1 (rc=0)
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 15: stop prmStonith2-2_stop_0 on rh57-1 (local)
> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[11] on prmStonith2-2 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[60s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor]  cancelled
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=15:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-2_stop_0 )
> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-2 stop[16] (pid 3619)
> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3619]: info: Try to stop STONITH resource <rsc_id=prmStonith2-2> : Device=external/ssh
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_monitor_3600000 (call=11, status=1, cib-update=0, confirmed=true) Cancelled
> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[16] on prmStonith2-2 for client 3461: pid 3619 exited with return code 0
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_stop_0 (call=16, rc=0, cib-update=47, confirmed=true) ok
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-2_stop_0 (15) confirmed on rh57-1 (rc=0)
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: Executing crm-event (28): do_shutdown on rh57-1
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: crm-event (28) is a local shutdown
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: run_graph: ====================================================
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: notice: run_graph: Transition 4 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4.bz2): Complete
> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_graph_trigger: Transition 4 is now complete
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_dc_release: DC role released
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> >> > > Jan  6 19:22:03 rh57-1 pengine: [3464]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Transitioner is now inactive
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Disconnecting STONITH...
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: notice: Not currently connected.
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 420 ms (> 100 ms) before being called (GSource: 0x179d9b0)
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: G_SIG_dispatch: started at 429442052 should have started at 429442010
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmdManagedChildDied: Process pengine:[3464] exited (signal=0, exitcode=0)
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 80 ms (> 30 ms) (GSource: 0x179d9b0)
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_msg_dispatch: Received HUP from pengine:[3464]
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_connection_destroy: Connection to the Policy Engine released
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: ERROR: verify_stopped: Resource prmVIP was active at shutdown.  You may ignore this error if it is unmanaged.
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_lrm_control: Disconnected from the LRM
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_ha_control: Disconnected from Heartbeat
> >> > > Jan  6 19:22:03 rh57-1 ccm: [3456]: info: client (pid=3461) removed from ccm
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_cib_control: Disconnecting CIB
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: info: cib_process_readwrite: We are now in R/O mode
> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: WARN: send_ipc_message: IPC Channel to 3461 is not connected
> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: send_via_callback_channel: Delivery of reply to client 3461/5f69edda-aec9-42c7-ae52-045a05d1c5db failed
> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: do_exit: [crmd] stopped (0)
> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/attrd process group 3460 with signal 15
> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x7b28140)
> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_shutdown: Exiting
> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: main: Exiting...
> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/stonithd process group 3459 with signal 15
> >> > > Jan  6 19:22:04 rh57-1 stonithd: [3459]: notice: /usr/lib64/heartbeat/stonithd normally quit.
> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/lrmd -r process group 3458 with signal 15
> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
> >> > > Jan  6 19:22:04 rh57-1 lrmd: [3458]: info: lrmd is shutting down
> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/cib process group 3457 with signal 15
> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_shutdown: Disconnected 0 clients
> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_process_disconnect: All clients disconnected...
> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: Exiting...
> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: main: Done
> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: client (pid=3457) removed from ccm
> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ccm process group 3456 with signal 15
> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource: 0x7b28140)
> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: received SIGTERM, going to shut down
> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBFIFO process 3446 with signal 15
> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3447 with signal 15
> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3448 with signal 15
> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3449 with signal 15
> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3450 with signal 15
> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 exited. 5 remaining
> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 exited. 4 remaining
> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 exited. 3 remaining
> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 exited. 2 remaining
> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 exited. 1 remaining
> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat shutdown complete.
> >> > >
> >> > > -----------------------------
> >> > >
> >> > >
> >> > >
> >> > > Best Regards,
> >> > > Hideo Yamauchi.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > --- On Fri, 2012/1/6, Andrew Beekhof <andrew [at] beekhof> wrote:
> >> > >
> >> > >> On Tue, Dec 27, 2011 at 6:15 PM,  <renayama19661014 [at] ybb> wrote:
> >> > >> > Hi All,
> >> > >> >
> >> > >> > When Pacemaker stops while there is a resource that failed during probe processing, crmd outputs the following error message.
> >> > >> >
> >> > >> >
> >> > >> >  Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
> >> > >> >
> >> > >> >
> >> > >> > Because a resource that failed its probe was never started,
> >> > >>
> >> > >> But it should have a subsequent stop action which would set it back to
> >> > >> being inactive.
> >> > >> Did that not happen in this case?
> >> > >>
> >> > >> > this error message is incorrect.
> >> > >> >
> >> > >> > I think the following correction may be appropriate, but we are not certain.
> >> > >> >
> >> > >> >
> >> > >> >  * crmd/lrm.c
> >> > >> >  (snip)
> >> > >> >                } else if(op->rc == EXECRA_NOT_RUNNING) {
> >> > >> >                        active = FALSE;
> >> > >> > +                } else if(op->rc != EXECRA_OK && op->interval == 0
> >> > >> > +                                && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
> >> > >> > +                        active = FALSE;
> >> > >> >                } else {
> >> > >> >                        active = TRUE;
> >> > >> >                }
> >> > >> >  (snip)
> >> > >> >
> >> > >> >
> >> > >> > In the Pacemaker development sources, the handling of this processing seems to have changed considerably.
> >> > >> > We would like to request that this change be backported to the Pacemaker 1.0 series.
> >> > >> >
> >> > >> > Best Regards,
> >> > >> > Hideo Yamauchi.
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > _______________________________________________
> >> > >> > Pacemaker mailing list: Pacemaker [at] oss
> >> > >> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >> > >> >
> >> > >> > Project Home: http://www.clusterlabs.org
> >> > >> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> > >> > Bugs: http://bugs.clusterlabs.org
> >> > >>
> >> > >
> >> >
> >>
> >
>



andrew at beekhof

Feb 20, 2012, 5:42 PM

Post #9 of 15 (1442 views)
Re: [Problem]It is judged that a stopping resource is starting. [In reply to]

On Fri, Feb 17, 2012 at 10:49 AM, <renayama19661014 [at] ybb> wrote:
> Hi Andrew,
>
> Thank you for comment.
>
>> I'm getting to this soon, really :-)
>> First it was corosync 2.0 stuff, so that /something/ in fedora-17
>> works, then fixing everything I broke when adding corosync 2.0
>> support.
>
> All right!
>
> I will wait for your answer.

I somehow missed that the failure was "not configured"

Failed actions:
prmVIP_monitor_0 (node=rh57-1, call=2, rc=6, status=complete): not configured

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
lists rc=6 as fatal, but I believe we changed that behaviour (the
stopping aspect) in the PE, since the agent also had insufficient
information to stop the service. A failed stop results in the node
being fenced, the resource being probed again, the probe failing along
with the subsequent stop, then the node being fenced again, and so on.

So two things:

This log message should include the human-readable version of rc=6:
Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster

and the docs need to be updated.

>
> Best Regards,
> Hideo Yamauchi.
>
> --- On Thu, 2012/2/16, Andrew Beekhof <andrew [at] beekhof> wrote:
>
>> Sorry!
>>
>> I'm getting to this soon, really :-)
>> First it was corosync 2.0 stuff, so that /something/ in fedora-17
>> works, then fixing everything I broke when adding corosync 2.0
>> support.
>>
>> On Tue, Feb 14, 2012 at 11:20 AM,  <renayama19661014 [at] ybb> wrote:
>> > Hi Andrew,
>> >
>> > About this problem, how did it turn out afterwards?
>> >
>> > Best Regards,
>> > Hideo Yamauchi.
>> >
>> >
>> > --- On Mon, 2012/1/16, renayama19661014 [at] ybb <renayama19661014 [at] ybb> wrote:
>> >
>> >> Hi Andrew,
>> >>
>> >> Thank you for comments.
>> >>
>> >> > Could you send me the PE file related to this log please?
>> >> >
>> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
>> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
>> >> > /var/lib/pengine/pe-input-4.bz2
>> >>
>> >> The old file no longer exists.
>> >> I am sending a log and the PE file reproduced with the same procedure.
>> >>
>> >>  * trac1818.zip
>> >>   * https://skydrive.live.com/?cid=3a14d57622c66876&id=3A14D57622C66876%21127
>> >>
>> >> Best Regards,
>> >> Hideo Yamauchi.
>> >>
>> >>
>> >> --- On Mon, 2012/1/16, Andrew Beekhof <andrew [at] beekhof> wrote:
>> >>
>> >> > On Fri, Jan 6, 2012 at 12:37 PM,  <renayama19661014 [at] ybb> wrote:
>> >> > > Hi Andrew,
>> >> > >
>> >> > > Thank you for comment.
>> >> > >
>> >> > >> But it should have a subsequent stop action which would set it back to
>> >> > >> being inactive.
>> >> > >> Did that not happen in this case?
>> >> > >
>> >> > > Yes.
>> >> >
>> >> > Could you send me the PE file related to this log please?
>> >> >
>> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
>> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
>> >> > /var/lib/pengine/pe-input-4.bz2
>> >> >
>> >> >
>> >> >
>> >> > > Only the "verify_stopped" log is recorded.
>> >> > > Stop processing for the resource that failed its probe was never carried out.
>> >> > >
>> >> > > -----------------------------
>> >> > > ######### yamauchi PREV STOP ##########
>> >> > > Jan  6 19:21:56 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ifcheckd process group 3462 with signal 15
>> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: do_node_walk: Requesting the list of configured nodes
>> >> > > Jan  6 19:21:58 rh57-1 ifcheckd: [3462]: info: main: Exiting ifcheckd
>> >> > > Jan  6 19:21:58 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/crmd process group 3461 with signal 15
>> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_shutdown: Requesting shutdown
>> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
>> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
>> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_shutdown_req: Sending shutdown request to DC: rh57-1
>> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: handle_shutdown_request: Creating shutdown request for rh57-1 (state=S_POLICY_ENGINE)
>> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1325845319)
>> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_perform_update: Sent update 14: shutdown=1325845319
>> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-1fdd5e2a-44b6-44b9-9993-97fa120072a4-shutdown, name=shutdown, value=1325845319, magic=NA, cib=0.101.16) : Transient attribute: update
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: crm_timer_popped: New Transition Timer (I_PE_CALC) just popped!
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke_callback: Invoking the PE: query=44, ref=pe_calc-dc-1325845321-26, seq=1, quorate=0
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: unpack_config: On loss of CCM Quorum: Ignore
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: determine_online_status: Node rh57-1 is shutting down
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpUltraMonkey
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmVIP       (ocf::heartbeat:LVM):   Stopped
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith1
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-2        (stonith:external/ssh): Stopped
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-3        (stonith:meatware):     Stopped
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith2
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-2        (stonith:external/ssh): Started rh57-1
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-3        (stonith:meatware):     Started rh57-1
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: clone_print:  Clone Set: clnPingd
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: short_print:      Started: [ rh57-1 ]
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: clnPingd: Rolling back scores from prmVIP
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmPingd:0 cannot run anywhere
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmVIP cannot run anywhere
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith1-2: Rolling back scores from prmStonith1-3
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-2 cannot run anywhere
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-3 cannot run anywhere
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith2-2: Rolling back scores from prmStonith2-3
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-2 cannot run anywhere
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-3 cannot run anywhere
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: stage6: Scheduling Node rh57-1 for shutdown
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmVIP     (Stopped)
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-2      (Stopped)
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-3      (Stopped)
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-2      (rh57-1)
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-3      (rh57-1)
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmPingd:0 (rh57-1)
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: process_pe_message: Transition 4: PEngine Input stored in: /var/lib/pengine/pe-input-4.bz2
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: unpack_graph: Unpacked transition 4: 9 actions in 9 synapses
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 21: stop prmPingd:0_stop_0 on rh57-1 (local)
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[10] on prmPingd:0 for client 3461, its parameters: CRM_meta_interval=[10000] multiplier=[100] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] name=[default_ping_set] CRM_meta_clone_max=[1] crm_feature_set=[3.0.1] host_list=[192.168.40.1] CRM_meta_globally_unique=[false] CRM_meta_name=[monitor] CRM_meta_clone=[0] CRM_meta_clone_node_max=[1] CRM_meta_notify=[false]  cancelled
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=21:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmPingd:0_stop_0 )
>> >> > > Jan  6 19:22:02 rh57-1 pingd: [3529]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmPingd:0 stop[14] (pid 3612)
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[14] on prmPingd:0 for client 3461: pid 3612 exited with return code 0
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=10, status=1, cib-update=0, confirmed=true) Cancelled
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=14, rc=0, cib-update=45, confirmed=true) ok
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmPingd:0_stop_0 (21) confirmed on rh57-1 (rc=0)
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 4 fired and confirmed
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 16: stop prmStonith2-3_stop_0 on rh57-1 (local)
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[13] on prmStonith2-3 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[600s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor]  cancelled
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=16:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-3_stop_0 )
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-3 stop[15] (pid 3617)
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3617]: info: Try to stop STONITH resource <rsc_id=prmStonith2-3> : Device=meatware
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_monitor_3600000 (call=13, status=1, cib-update=0, confirmed=true) Cancelled
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[15] on prmStonith2-3 for client 3461: pid 3617 exited with return code 0
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_stop_0 (call=15, rc=0, cib-update=46, confirmed=true) ok
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-3_stop_0 (16) confirmed on rh57-1 (rc=0)
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 15: stop prmStonith2-2_stop_0 on rh57-1 (local)
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[11] on prmStonith2-2 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[60s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor]  cancelled
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=15:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-2_stop_0 )
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-2 stop[16] (pid 3619)
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3619]: info: Try to stop STONITH resource <rsc_id=prmStonith2-2> : Device=external/ssh
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_monitor_3600000 (call=11, status=1, cib-update=0, confirmed=true) Cancelled
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[16] on prmStonith2-2 for client 3461: pid 3619 exited with return code 0
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_stop_0 (call=16, rc=0, cib-update=47, confirmed=true) ok
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-2_stop_0 (15) confirmed on rh57-1 (rc=0)
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: Executing crm-event (28): do_shutdown on rh57-1
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: crm-event (28) is a local shutdown
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: run_graph: ====================================================
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: notice: run_graph: Transition 4 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4.bz2): Complete
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_graph_trigger: Transition 4 is now complete
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_dc_release: DC role released
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> >> > > Jan  6 19:22:03 rh57-1 pengine: [3464]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Transitioner is now inactive
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Disconnecting STONITH...
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: notice: Not currently connected.
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 420 ms (> 100 ms) before being called (GSource: 0x179d9b0)
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: G_SIG_dispatch: started at 429442052 should have started at 429442010
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmdManagedChildDied: Process pengine:[3464] exited (signal=0, exitcode=0)
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 80 ms (> 30 ms) (GSource: 0x179d9b0)
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_msg_dispatch: Received HUP from pengine:[3464]
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_connection_destroy: Connection to the Policy Engine released
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: ERROR: verify_stopped: Resource prmVIP was active at shutdown.  You may ignore this error if it is unmanaged.
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_lrm_control: Disconnected from the LRM
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_ha_control: Disconnected from Heartbeat
>> >> > > Jan  6 19:22:03 rh57-1 ccm: [3456]: info: client (pid=3461) removed from ccm
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_cib_control: Disconnecting CIB
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
>> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: info: cib_process_readwrite: We are now in R/O mode
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
>> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: WARN: send_ipc_message: IPC Channel to 3461 is not connected
>> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
>> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: send_via_callback_channel: Delivery of reply to client 3461/5f69edda-aec9-42c7-ae52-045a05d1c5db failed
>> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: do_exit: [crmd] stopped (0)
>> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/attrd process group 3460 with signal 15
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x7b28140)
>> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_shutdown: Exiting
>> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: main: Exiting...
>> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/stonithd process group 3459 with signal 15
>> >> > > Jan  6 19:22:04 rh57-1 stonithd: [3459]: notice: /usr/lib64/heartbeat/stonithd normally quit.
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/lrmd -r process group 3458 with signal 15
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
>> >> > > Jan  6 19:22:04 rh57-1 lrmd: [3458]: info: lrmd is shutting down
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/cib process group 3457 with signal 15
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
>> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_shutdown: Disconnected 0 clients
>> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_process_disconnect: All clients disconnected...
>> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
>> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: Exiting...
>> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: main: Done
>> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: client (pid=3457) removed from ccm
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ccm process group 3456 with signal 15
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource: 0x7b28140)
>> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: received SIGTERM, going to shut down
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBFIFO process 3446 with signal 15
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3447 with signal 15
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3448 with signal 15
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3449 with signal 15
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3450 with signal 15
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 exited. 5 remaining
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 exited. 4 remaining
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 exited. 3 remaining
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 exited. 2 remaining
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 exited. 1 remaining
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat shutdown complete.
>> >> > >
>> >> > > -----------------------------
>> >> > >
>> >> > >
>> >> > >
>> >> > > Best Regards,
>> >> > > Hideo Yamauchi.
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > > --- On Fri, 2012/1/6, Andrew Beekhof <andrew [at] beekhof> wrote:
>> >> > >
>> >> > >> On Tue, Dec 27, 2011 at 6:15 PM,  <renayama19661014 [at] ybb> wrote:
>> >> > >> > Hi All,
>> >> > >> >
>> >> > >> > When Pacemaker stops when there is the resource that failed in probe processing, crmd outputs the following error message.
>> >> > >> >
>> >> > >> >
>> >> > >> >  Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
>> >> > >> >
>> >> > >> >
>> >> > >> > Because the resource that failed in probe processing does not start,
>> >> > >>
>> >> > >> But it should have a subsequent stop action which would set it back to
>> >> > >> being inactive.
>> >> > >> Did that not happen in this case?
>> >> > >>
>> >> > >> > this error message is not right.
>> >> > >> >
>> >> > >> > I think that the following correction may be good, but we do not have conviction.
>> >> > >> >
>> >> > >> >
>> >> > >> >  * crmd/lrm.c
>> >> > >> >  (snip)
>> >> > >> >                } else if(op->rc == EXECRA_NOT_RUNNING) {
>> >> > >> >                        active = FALSE;
>> >> > >> > +                } else if(op->rc != EXECRA_OK && op->interval == 0
>> >> > >> > +                                && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
>> >> > >> > +                        active = FALSE;
>> >> > >> >                } else {
>> >> > >> >                        active = TRUE;
>> >> > >> >                }
>> >> > >> >  (snip)
>> >> > >> >
>> >> > >> >
>> >> > >> > In the source for development of Pacemaker, handling of this processing seems to be considerably changed.
>> >> > >> > It requests backporting to Pacemaker1.0 system of this change that we can do it.
>> >> > >> >
>> >> > >> > Best Regards,
>> >> > >> > Hideo Yamauchi.
>> >> > >> >
>> >> > >> >
>> >> > >> >
>> >> > >>
>> >> > >
>> >> >
>> >>
>> >
>>
>

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


renayama19661014 at ybb

Feb 21, 2012, 4:31 PM

Post #10 of 15 (1441 views)
Permalink
Re: [Problem]It is judged that a stopping resource is starting. [In reply to]

Hi Andrew,

Thank you for your comment.

Sorry, I could not follow your answer completely.

Do you mean the following?

1) When rc=6 (fatal) is logged, it is the system administrator who must deal with it.
2) And this behaviour needs to be reflected in the documentation.

And do you mean that the following log should not be output until the administrator has dealt with the failure?

Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown. You may ignore this error if it is unmanaged.

Best Regards,
Hideo Yamauchi.
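[Editor's note: the patch proposed earlier in this thread can be read as the stand-alone sketch below. The struct and the numeric constants are simplified stand-ins for crmd's real lrmd types (the values mirror the OCF return codes), not the actual Pacemaker source; the point is only to show the decision the patch adds: a probe (a monitor with interval 0) that returned anything other than OK means the resource never started, so it must not be counted as active at shutdown.]

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-ins for Pacemaker's lrmd codes; values mirror the OCF return codes. */
enum { EXECRA_OK = 0, EXECRA_NOT_CONFIGURED = 6, EXECRA_NOT_RUNNING = 7 };
#define CRMD_ACTION_STATUS "monitor"

struct lrm_op {
    int rc;              /* return code of the last operation          */
    int interval;        /* 0 for a probe, >0 for a recurring monitor  */
    const char *op_type; /* e.g. "monitor", "start", "stop"            */
};

/* Decide whether the last recorded op implies the resource is active.
 * The middle branch is the condition the proposed patch adds. */
static bool op_implies_active(const struct lrm_op *op)
{
    if (op->rc == EXECRA_NOT_RUNNING) {
        return false;
    } else if (op->rc != EXECRA_OK && op->interval == 0
               && strcmp(op->op_type, CRMD_ACTION_STATUS) == 0) {
        return false;   /* failed probe: the resource was never started */
    }
    return true;
}
```

With this condition in place, a probe that fails with rc=6 ("not configured") no longer leaves the resource marked active, so verify_stopped() would not emit the "was active at shutdown" error for it.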

--- On Tue, 2012/2/21, Andrew Beekhof <andrew [at] beekhof> wrote:

> On Fri, Feb 17, 2012 at 10:49 AM,  <renayama19661014 [at] ybb> wrote:
> > Hi Andrew,
> >
> > Thank you for comment.
> >
> >> I'm getting to this soon, really :-)
> >> First it was corosync 2.0 stuff, so that /something/ in fedora-17
> >> works, then fixing everything I broke when adding corosync 2.0
> >> support.
> >
> > All right!
> >
> > I wait for your answer.
>
> I somehow missed that the failure was "not configured"
>
> Failed actions:
>     prmVIP_monitor_0 (node=rh57-1, call=2, rc=6, status=complete): not
> configured
>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
> lists rc=6 as fatal, but I believe we changed that behaviour (the
> stopping aspect) in the PE as there was also insufficient information
> for the agent to stop the service.
> Which results in the node being fenced, the resource being probed,
> which fails along with the subsequent stop, then the node is fenced
> again, etc.
>
> So two things:
>
> this log message should include the human version of rc=6
> Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard
> error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from
> re-starting anywhere in the cluster
>
> and the docs need to be updated.
>
> >
> > Best Regards,
> > Hideo Yamauchi.
> >
> > --- On Thu, 2012/2/16, Andrew Beekhof <andrew [at] beekhof> wrote:
> >
> >> Sorry!
> >>
> >> I'm getting to this soon, really :-)
> >> First it was corosync 2.0 stuff, so that /something/ in fedora-17
> >> works, then fixing everything I broke when adding corosync 2.0
> >> support.
> >>
> >> On Tue, Feb 14, 2012 at 11:20 AM,  <renayama19661014 [at] ybb> wrote:
> >> > Hi Andrew,
> >> >
> >> > About this problem, how did it turn out afterwards?
> >> >
> >> > Best Regards,
> >> > Hideo Yamauchi.
> >> >
> >> >
> >> > --- On Mon, 2012/1/16, renayama19661014 [at] ybb <renayama19661014 [at] ybb> wrote:
> >> >
> >> >> Hi Andrew,
> >> >>
> >> >> Thank you for comments.
> >> >>
> >> >> > Could you send me the PE file related to this log please?
> >> >> >
> >> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
> >> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
> >> >> > /var/lib/pengine/pe-input-4.bz2
> >> >>
> >> >> The old file disappeared.
> >> >> I send log and the PE file which reappeared in the same procedure.
> >> >>
> >> >>  * trac1818.zip
> >> >>   * https://skydrive.live.com/?cid=3a14d57622c66876&id=3A14D57622C66876%21127
> >> >>
> >> >> Best Regards,
> >> >> Hideo Yamauchi.
> >> >>
> >> >>
> >> >> --- On Mon, 2012/1/16, Andrew Beekhof <andrew [at] beekhof> wrote:
> >> >>
> >> >> > On Fri, Jan 6, 2012 at 12:37 PM,  <renayama19661014 [at] ybb> wrote:
> >> >> > > Hi Andrew,
> >> >> > >
> >> >> > > Thank you for comment.
> >> >> > >
> >> >> > >> But it should have a subsequent stop action which would set it back to
> >> >> > >> being inactive.
> >> >> > >> Did that not happen in this case?
> >> >> > >
> >> >> > > Yes.
> >> >> >
> >> >> > Could you send me the PE file related to this log please?
> >> >> >
> >> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
> >> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
> >> >> > /var/lib/pengine/pe-input-4.bz2
> >> >> >
> >> >> >
> >> >> >
> >> >> > > Log of "verify_stopped" is only recorded.
> >> >> > > The stop handling of resource that failed in probe was not carried out.
> >> >> > >
> >> >> > > -----------------------------
> >> >> > > ######### yamauchi PREV STOP ##########
> >> >> > > Jan  6 19:21:56 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ifcheckd process group 3462 with signal 15
> >> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: do_node_walk: Requesting the list of configured nodes
> >> >> > > Jan  6 19:21:58 rh57-1 ifcheckd: [3462]: info: main: Exiting ifcheckd
> >> >> > > Jan  6 19:21:58 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/crmd process group 3461 with signal 15
> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_shutdown: Requesting shutdown
> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_shutdown_req: Sending shutdown request to DC: rh57-1
> >> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: handle_shutdown_request: Creating shutdown request for rh57-1 (state=S_POLICY_ENGINE)
> >> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1325845319)
> >> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_perform_update: Sent update 14: shutdown=1325845319
> >> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-1fdd5e2a-44b6-44b9-9993-97fa120072a4-shutdown, name=shutdown, value=1325845319, magic=NA, cib=0.101.16) : Transient attribute: update
> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: crm_timer_popped: New Transition Timer (I_PE_CALC) just popped!
> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE
> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke_callback: Invoking the PE: query=44, ref=pe_calc-dc-1325845321-26, seq=1, quorate=0
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: unpack_config: On loss of CCM Quorum: Ignore
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: determine_online_status: Node rh57-1 is shutting down
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpUltraMonkey
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmVIP       (ocf::heartbeat:LVM):   Stopped
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith1
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-2        (stonith:external/ssh): Stopped
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-3        (stonith:meatware):     Stopped
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith2
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-2        (stonith:external/ssh): Started rh57-1
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-3        (stonith:meatware):     Started rh57-1
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: clone_print:  Clone Set: clnPingd
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: short_print:      Started: [ rh57-1 ]
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: clnPingd: Rolling back scores from prmVIP
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmPingd:0 cannot run anywhere
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmVIP cannot run anywhere
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith1-2: Rolling back scores from prmStonith1-3
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-2 cannot run anywhere
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-3 cannot run anywhere
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith2-2: Rolling back scores from prmStonith2-3
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-2 cannot run anywhere
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-3 cannot run anywhere
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: stage6: Scheduling Node rh57-1 for shutdown
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmVIP     (Stopped)
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-2      (Stopped)
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-3      (Stopped)
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-2      (rh57-1)
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-3      (rh57-1)
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmPingd:0 (rh57-1)
> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: process_pe_message: Transition 4: PEngine Input stored in: /var/lib/pengine/pe-input-4.bz2
> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: unpack_graph: Unpacked transition 4: 9 actions in 9 synapses
> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 21: stop prmPingd:0_stop_0 on rh57-1 (local)
> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[10] on prmPingd:0 for client 3461, its parameters: CRM_meta_interval=[10000] multiplier=[100] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] name=[default_ping_set] CRM_meta_clone_max=[1] crm_feature_set=[3.0.1] host_list=[192.168.40.1] CRM_meta_globally_unique=[false] CRM_meta_name=[monitor] CRM_meta_clone=[0] CRM_meta_clone_node_max=[1] CRM_meta_notify=[false]  cancelled
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=21:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmPingd:0_stop_0 )
> >> >> > > Jan  6 19:22:02 rh57-1 pingd: [3529]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmPingd:0 stop[14] (pid 3612)
> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[14] on prmPingd:0 for client 3461: pid 3612 exited with return code 0
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=10, status=1, cib-update=0, confirmed=true) Cancelled
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=14, rc=0, cib-update=45, confirmed=true) ok
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmPingd:0_stop_0 (21) confirmed on rh57-1 (rc=0)
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 4 fired and confirmed
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 16: stop prmStonith2-3_stop_0 on rh57-1 (local)
> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[13] on prmStonith2-3 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[600s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor]  cancelled
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=16:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-3_stop_0 )
> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-3 stop[15] (pid 3617)
> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3617]: info: Try to stop STONITH resource <rsc_id=prmStonith2-3> : Device=meatware
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_monitor_3600000 (call=13, status=1, cib-update=0, confirmed=true) Cancelled
> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[15] on prmStonith2-3 for client 3461: pid 3617 exited with return code 0
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_stop_0 (call=15, rc=0, cib-update=46, confirmed=true) ok
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-3_stop_0 (16) confirmed on rh57-1 (rc=0)
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 15: stop prmStonith2-2_stop_0 on rh57-1 (local)
> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[11] on prmStonith2-2 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[60s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor]  cancelled
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=15:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-2_stop_0 )
> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-2 stop[16] (pid 3619)
> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3619]: info: Try to stop STONITH resource <rsc_id=prmStonith2-2> : Device=external/ssh
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_monitor_3600000 (call=11, status=1, cib-update=0, confirmed=true) Cancelled
> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[16] on prmStonith2-2 for client 3461: pid 3619 exited with return code 0
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_stop_0 (call=16, rc=0, cib-update=47, confirmed=true) ok
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-2_stop_0 (15) confirmed on rh57-1 (rc=0)
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: Executing crm-event (28): do_shutdown on rh57-1
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: crm-event (28) is a local shutdown
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: run_graph: ====================================================
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: notice: run_graph: Transition 4 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4.bz2): Complete
> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_graph_trigger: Transition 4 is now complete
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_dc_release: DC role released
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> >> >> > > Jan  6 19:22:03 rh57-1 pengine: [3464]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Transitioner is now inactive
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Disconnecting STONITH...
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: notice: Not currently connected.
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 420 ms (> 100 ms) before being called (GSource: 0x179d9b0)
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: G_SIG_dispatch: started at 429442052 should have started at 429442010
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmdManagedChildDied: Process pengine:[3464] exited (signal=0, exitcode=0)
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 80 ms (> 30 ms) (GSource: 0x179d9b0)
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_msg_dispatch: Received HUP from pengine:[3464]
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_connection_destroy: Connection to the Policy Engine released
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: ERROR: verify_stopped: Resource prmVIP was active at shutdown.  You may ignore this error if it is unmanaged.
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_lrm_control: Disconnected from the LRM
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_ha_control: Disconnected from Heartbeat
> >> >> > > Jan  6 19:22:03 rh57-1 ccm: [3456]: info: client (pid=3461) removed from ccm
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_cib_control: Disconnecting CIB
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> >> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: info: cib_process_readwrite: We are now in R/O mode
> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> >> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: WARN: send_ipc_message: IPC Channel to 3461 is not connected
> >> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: send_via_callback_channel: Delivery of reply to client 3461/5f69edda-aec9-42c7-ae52-045a05d1c5db failed
> >> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: do_exit: [crmd] stopped (0)
> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/attrd process group 3460 with signal 15
> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x7b28140)
> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_shutdown: Exiting
> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: main: Exiting...
> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/stonithd process group 3459 with signal 15
> >> >> > > Jan  6 19:22:04 rh57-1 stonithd: [3459]: notice: /usr/lib64/heartbeat/stonithd normally quit.
> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/lrmd -r process group 3458 with signal 15
> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
> >> >> > > Jan  6 19:22:04 rh57-1 lrmd: [3458]: info: lrmd is shutting down
> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/cib process group 3457 with signal 15
> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_shutdown: Disconnected 0 clients
> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_process_disconnect: All clients disconnected...
> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: Exiting...
> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: main: Done
> >> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: client (pid=3457) removed from ccm
> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ccm process group 3456 with signal 15
> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource: 0x7b28140)
> >> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: received SIGTERM, going to shut down
> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBFIFO process 3446 with signal 15
> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3447 with signal 15
> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3448 with signal 15
> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3449 with signal 15
> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3450 with signal 15
> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 exited. 5 remaining
> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 exited. 4 remaining
> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 exited. 3 remaining
> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 exited. 2 remaining
> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 exited. 1 remaining
> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat shutdown complete.
> >> >> > >
> >> >> > > -----------------------------
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > Best Regards,
> >> >> > > Hideo Yamauchi.
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > --- On Fri, 2012/1/6, Andrew Beekhof <andrew [at] beekhof> wrote:
> >> >> > >
> >> >> > >> On Tue, Dec 27, 2011 at 6:15 PM,  <renayama19661014 [at] ybb> wrote:
> >> >> > >> > Hi All,
> >> >> > >> >
> >> >> > >> > When Pacemaker stops when there is the resource that failed in probe processing, crmd outputs the following error message.
> >> >> > >> >
> >> >> > >> >
> >> >> > >> >  Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
> >> >> > >> >
> >> >> > >> >
> >> >> > >> > Because the resource that failed in probe processing does not start,
> >> >> > >>
> >> >> > >> But it should have a subsequent stop action which would set it back to
> >> >> > >> being inactive.
> >> >> > >> Did that not happen in this case?
> >> >> > >>
> >> >> > >> > this error message is not right.
> >> >> > >> >
> >> >> > >> > I think that the following correction may be good, but we do not have conviction.
> >> >> > >> >
> >> >> > >> >
> >> >> > >> >  * crmd/lrm.c
> >> >> > >> >  (snip)
> >> >> > >> >                } else if(op->rc == EXECRA_NOT_RUNNING) {
> >> >> > >> >                        active = FALSE;
> >> >> > >> > +                } else if(op->rc != EXECRA_OK && op->interval == 0
> >> >> > >> > +                                && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
> >> >> > >> > +                        active = FALSE;
> >> >> > >> >                } else {
> >> >> > >> >                        active = TRUE;
> >> >> > >> >                }
> >> >> > >> >  (snip)
> >> >> > >> >
> >> >> > >> >
> >> >> > >> > In the source for development of Pacemaker, handling of this processing seems to be considerably changed.
> >> >> > >> > It requests backporting to Pacemaker1.0 system of this change that we can do it.
> >> >> > >> >
> >> >> > >> > Best Regards,
> >> >> > >> > Hideo Yamauchi.
> >> >> > >> >
> >> >> > >> >
> >> >> > >> >
> >> >> > >> > _______________________________________________
> >> >> > >> > Pacemaker mailing list: Pacemaker [at] oss
> >> >> > >> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >> >> > >> >
> >> >> > >> > Project Home: http://www.clusterlabs.org
> >> >> > >> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> >> > >> > Bugs: http://bugs.clusterlabs.org
> >> >> > >>
> >> >> > >
> >> >> >
> >> >>
> >> >
> >>
> >
>

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


andrew at beekhof

Feb 23, 2012, 3:03 PM

Post #11 of 15 (1431 views)
Permalink
Re: [Problem]It is judged that a stopping resource is starting. [In reply to]

On Wed, Feb 22, 2012 at 11:31 AM, <renayama19661014 [at] ybb> wrote:
> Hi Andrew,
>
> Thank you for comment.
>
> Sorry... I could not fully understand your answer.
>
> Does your answer mean the following?
>
> 1) The system administrator needs to handle the case where a log shows rc=6 (fatal).
> 2) And this needs to be reflected in the documentation.
>
> And does it mean that the following log should not be output until a system administrator deals with it?
>
> Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.

Right. There was actually a third part... a slightly more restrictive
version of your original patch:

--- a/crmd/lrm.c
+++ b/crmd/lrm.c
@@ -694,6 +694,9 @@ is_rsc_active(const char *rsc_id)

} else if (entry->last->rc == EXECRA_NOT_RUNNING) {
return FALSE;
+
+ } else if (entry->last->interval == 0 && entry->last->rc == EXECRA_NOT_CONFIGURED) {
+ return FALSE;
}

return TRUE;


>
> Best Regards,
> Hideo Yamauchi.
>
> --- On Tue, 2012/2/21, Andrew Beekhof <andrew [at] beekhof> wrote:
>
>> On Fri, Feb 17, 2012 at 10:49 AM,  <renayama19661014 [at] ybb> wrote:
>> > Hi Andrew,
>> >
>> > Thank you for comment.
>> >
>> >> I'm getting to this soon, really :-)
>> >> First it was corosync 2.0 stuff, so that /something/ in fedora-17
>> >> works, then fixing everything I broke when adding corosync 2.0
>> >> support.
>> >
>> > All right!
>> >
>> > I wait for your answer.
>>
>> I somehow missed that the failure was "not configured"
>>
>> Failed actions:
>>     prmVIP_monitor_0 (node=rh57-1, call=2, rc=6, status=complete): not
>> configured
>>
>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
>> lists rc=6 as fatal, but I believe we changed that behaviour (the
>> stopping aspect) in the PE as there was also insufficient information
>> for the agent to stop the service.
>> Which results in the node being fenced, the resource being probed,
>> which fails along with the subsequent stop, then the node is fenced
>> again, etc.
>>
>> So two things:
>>
>> this log message should include the human version of rc=6
>> Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard
>> error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from
>> re-starting anywhere in the cluster
>>
>> and the docs need to be updated.
>>
>> >
>> > Best Regards,
>> > Hideo Yamauchi.
>> >
>> > --- On Thu, 2012/2/16, Andrew Beekhof <andrew [at] beekhof> wrote:
>> >
>> >> Sorry!
>> >>
>> >> I'm getting to this soon, really :-)
>> >> First it was corosync 2.0 stuff, so that /something/ in fedora-17
>> >> works, then fixing everything I broke when adding corosync 2.0
>> >> support.
>> >>
>> >> On Tue, Feb 14, 2012 at 11:20 AM,  <renayama19661014 [at] ybb> wrote:
>> >> > Hi Andrew,
>> >> >
>> >> > About this problem, how did it turn out afterwards?
>> >> >
>> >> > Best Regards,
>> >> > Hideo Yamauchi.
>> >> >
>> >> >
>> >> > --- On Mon, 2012/1/16, renayama19661014 [at] ybb <renayama19661014 [at] ybb> wrote:
>> >> >
>> >> >> Hi Andrew,
>> >> >>
>> >> >> Thank you for comments.
>> >> >>
>> >> >> > Could you send me the PE file related to this log please?
>> >> >> >
>> >> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
>> >> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
>> >> >> > /var/lib/pengine/pe-input-4.bz2
>> >> >>
>> >> >> The old file disappeared.
>> >> >> I send log and the PE file which reappeared in the same procedure.
>> >> >>
>> >> >>  * trac1818.zip
>> >> >>   * https://skydrive.live.com/?cid=3a14d57622c66876&id=3A14D57622C66876%21127
>> >> >>
>> >> >> Best Regards,
>> >> >> Hideo Yamauchi.
>> >> >>
>> >> >>
>> >> >> --- On Mon, 2012/1/16, Andrew Beekhof <andrew [at] beekhof> wrote:
>> >> >>
>> >> >> > On Fri, Jan 6, 2012 at 12:37 PM,  <renayama19661014 [at] ybb> wrote:
>> >> >> > > Hi Andrew,
>> >> >> > >
>> >> >> > > Thank you for comment.
>> >> >> > >
>> >> >> > >> But it should have a subsequent stop action which would set it back to
>> >> >> > >> being inactive.
>> >> >> > >> Did that not happen in this case?
>> >> >> > >
>> >> >> > > Yes.
>> >> >> >
>> >> >> > Could you send me the PE file related to this log please?
>> >> >> >
>> >> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
>> >> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
>> >> >> > /var/lib/pengine/pe-input-4.bz2
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > > Log of "verify_stopped" is only recorded.
>> >> >> > > The stop handling of resource that failed in probe was not carried out.
>> >> >> > >
>> >> >> > > -----------------------------
>> >> >> > > ######### yamauchi PREV STOP ##########
>> >> >> > > Jan  6 19:21:56 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ifcheckd process group 3462 with signal 15
>> >> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: do_node_walk: Requesting the list of configured nodes
>> >> >> > > Jan  6 19:21:58 rh57-1 ifcheckd: [3462]: info: main: Exiting ifcheckd
>> >> >> > > Jan  6 19:21:58 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/crmd process group 3461 with signal 15
>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_shutdown: Requesting shutdown
>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_shutdown_req: Sending shutdown request to DC: rh57-1
>> >> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: handle_shutdown_request: Creating shutdown request for rh57-1 (state=S_POLICY_ENGINE)
>> >> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1325845319)
>> >> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_perform_update: Sent update 14: shutdown=1325845319
>> >> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-1fdd5e2a-44b6-44b9-9993-97fa120072a4-shutdown, name=shutdown, value=1325845319, magic=NA, cib=0.101.16) : Transient attribute: update
>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: crm_timer_popped: New Transition Timer (I_PE_CALC) just popped!
>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE
>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke_callback: Invoking the PE: query=44, ref=pe_calc-dc-1325845321-26, seq=1, quorate=0
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: unpack_config: On loss of CCM Quorum: Ignore
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: determine_online_status: Node rh57-1 is shutting down
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpUltraMonkey
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmVIP       (ocf::heartbeat:LVM):   Stopped
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith1
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-2        (stonith:external/ssh): Stopped
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-3        (stonith:meatware):     Stopped
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith2
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-2        (stonith:external/ssh): Started rh57-1
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-3        (stonith:meatware):     Started rh57-1
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: clone_print:  Clone Set: clnPingd
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: short_print:      Started: [ rh57-1 ]
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: clnPingd: Rolling back scores from prmVIP
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmPingd:0 cannot run anywhere
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmVIP cannot run anywhere
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith1-2: Rolling back scores from prmStonith1-3
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-2 cannot run anywhere
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-3 cannot run anywhere
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith2-2: Rolling back scores from prmStonith2-3
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-2 cannot run anywhere
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-3 cannot run anywhere
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: stage6: Scheduling Node rh57-1 for shutdown
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmVIP     (Stopped)
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-2      (Stopped)
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-3      (Stopped)
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-2      (rh57-1)
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-3      (rh57-1)
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmPingd:0 (rh57-1)
>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: process_pe_message: Transition 4: PEngine Input stored in: /var/lib/pengine/pe-input-4.bz2
>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: unpack_graph: Unpacked transition 4: 9 actions in 9 synapses
>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 21: stop prmPingd:0_stop_0 on rh57-1 (local)
>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[10] on prmPingd:0 for client 3461, its parameters: CRM_meta_interval=[10000] multiplier=[100] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] name=[default_ping_set] CRM_meta_clone_max=[1] crm_feature_set=[3.0.1] host_list=[192.168.40.1] CRM_meta_globally_unique=[false] CRM_meta_name=[monitor] CRM_meta_clone=[0] CRM_meta_clone_node_max=[1] CRM_meta_notify=[false]  cancelled
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=21:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmPingd:0_stop_0 )
>> >> >> > > Jan  6 19:22:02 rh57-1 pingd: [3529]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmPingd:0 stop[14] (pid 3612)
>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[14] on prmPingd:0 for client 3461: pid 3612 exited with return code 0
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=10, status=1, cib-update=0, confirmed=true) Cancelled
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=14, rc=0, cib-update=45, confirmed=true) ok
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmPingd:0_stop_0 (21) confirmed on rh57-1 (rc=0)
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 4 fired and confirmed
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 16: stop prmStonith2-3_stop_0 on rh57-1 (local)
>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[13] on prmStonith2-3 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[600s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor]  cancelled
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=16:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-3_stop_0 )
>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-3 stop[15] (pid 3617)
>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3617]: info: Try to stop STONITH resource <rsc_id=prmStonith2-3> : Device=meatware
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_monitor_3600000 (call=13, status=1, cib-update=0, confirmed=true) Cancelled
>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[15] on prmStonith2-3 for client 3461: pid 3617 exited with return code 0
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_stop_0 (call=15, rc=0, cib-update=46, confirmed=true) ok
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-3_stop_0 (16) confirmed on rh57-1 (rc=0)
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 15: stop prmStonith2-2_stop_0 on rh57-1 (local)
>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[11] on prmStonith2-2 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[60s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor]  cancelled
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=15:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-2_stop_0 )
>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-2 stop[16] (pid 3619)
>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3619]: info: Try to stop STONITH resource <rsc_id=prmStonith2-2> : Device=external/ssh
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_monitor_3600000 (call=11, status=1, cib-update=0, confirmed=true) Cancelled
>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[16] on prmStonith2-2 for client 3461: pid 3619 exited with return code 0
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_stop_0 (call=16, rc=0, cib-update=47, confirmed=true) ok
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-2_stop_0 (15) confirmed on rh57-1 (rc=0)
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: Executing crm-event (28): do_shutdown on rh57-1
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: crm-event (28) is a local shutdown
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: run_graph: ====================================================
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: notice: run_graph: Transition 4 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4.bz2): Complete
>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_graph_trigger: Transition 4 is now complete
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_dc_release: DC role released
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> >> >> > > Jan  6 19:22:03 rh57-1 pengine: [3464]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Transitioner is now inactive
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Disconnecting STONITH...
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: notice: Not currently connected.
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 420 ms (> 100 ms) before being called (GSource: 0x179d9b0)
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: G_SIG_dispatch: started at 429442052 should have started at 429442010
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmdManagedChildDied: Process pengine:[3464] exited (signal=0, exitcode=0)
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 80 ms (> 30 ms) (GSource: 0x179d9b0)
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_msg_dispatch: Received HUP from pengine:[3464]
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_connection_destroy: Connection to the Policy Engine released
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: ERROR: verify_stopped: Resource prmVIP was active at shutdown.  You may ignore this error if it is unmanaged.
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_lrm_control: Disconnected from the LRM
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_ha_control: Disconnected from Heartbeat
>> >> >> > > Jan  6 19:22:03 rh57-1 ccm: [3456]: info: client (pid=3461) removed from ccm
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_cib_control: Disconnecting CIB
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
>> >> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: info: cib_process_readwrite: We are now in R/O mode
>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
>> >> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: WARN: send_ipc_message: IPC Channel to 3461 is not connected
>> >> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: send_via_callback_channel: Delivery of reply to client 3461/5f69edda-aec9-42c7-ae52-045a05d1c5db failed
>> >> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: do_exit: [crmd] stopped (0)
>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/attrd process group 3460 with signal 15
>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x7b28140)
>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_shutdown: Exiting
>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: main: Exiting...
>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/stonithd process group 3459 with signal 15
>> >> >> > > Jan  6 19:22:04 rh57-1 stonithd: [3459]: notice: /usr/lib64/heartbeat/stonithd normally quit.
>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/lrmd -r process group 3458 with signal 15
>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
>> >> >> > > Jan  6 19:22:04 rh57-1 lrmd: [3458]: info: lrmd is shutting down
>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/cib process group 3457 with signal 15
>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_shutdown: Disconnected 0 clients
>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_process_disconnect: All clients disconnected...
>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: Exiting...
>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: main: Done
>> >> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: client (pid=3457) removed from ccm
>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ccm process group 3456 with signal 15
>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource: 0x7b28140)
>> >> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: received SIGTERM, going to shut down
>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBFIFO process 3446 with signal 15
>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3447 with signal 15
>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3448 with signal 15
>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3449 with signal 15
>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3450 with signal 15
>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 exited. 5 remaining
>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 exited. 4 remaining
>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 exited. 3 remaining
>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 exited. 2 remaining
>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 exited. 1 remaining
>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat shutdown complete.
>> >> >> > >
>> >> >> > > -----------------------------
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > > Best Regards,
>> >> >> > > Hideo Yamauchi.
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > > --- On Fri, 2012/1/6, Andrew Beekhof <andrew [at] beekhof> wrote:
>> >> >> > >
>> >> >> > >> On Tue, Dec 27, 2011 at 6:15 PM,  <renayama19661014 [at] ybb> wrote:
>> >> >> > >> > Hi All,
>> >> >> > >> >
>> >> >> > >> > When Pacemaker stops when there is the resource that failed in probe processing, crmd outputs the following error message.
>> >> >> > >> >
>> >> >> > >> >
>> >> >> > >> >  Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
>> >> >> > >> >
>> >> >> > >> >
>> >> >> > >> > Because a resource that failed its probe never started,
>> >> >> > >>
>> >> >> > >> But it should have a subsequent stop action which would set it back to
>> >> >> > >> being inactive.
>> >> >> > >> Did that not happen in this case?
>> >> >> > >>
>> >> >> > >> > this error message is incorrect.
>> >> >> > >> >
>> >> >> > >> > I think the following correction may be appropriate, but we are not certain.
>> >> >> > >> >
>> >> >> > >> >
>> >> >> > >> >  * crmd/lrm.c
>> >> >> > >> >  (snip)
>> >> >> > >> >                } else if(op->rc == EXECRA_NOT_RUNNING) {
>> >> >> > >> >                        active = FALSE;
>> >> >> > >> > +                } else if(op->rc != EXECRA_OK && op->interval == 0
>> >> >> > >> > +                                && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
>> >> >> > >> > +                        active = FALSE;
>> >> >> > >> >                } else {
>> >> >> > >> >                        active = TRUE;
>> >> >> > >> >                }
>> >> >> > >> >  (snip)
>> >> >> > >> >
>> >> >> > >> >
>> >> >> > >> > In the Pacemaker development source, the handling of this processing seems to have changed considerably.
>> >> >> > >> > We request that this change be backported to the Pacemaker 1.0 series.
>> >> >> > >> >
>> >> >> > >> > Best Regards,
>> >> >> > >> > Hideo Yamauchi.
>> >> >> > >> >
>> >> >> > >> >
>> >> >> > >> >
>> >> >> > >>
>> >> >> > >
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>>
>

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


andrew at beekhof

Feb 23, 2012, 3:08 PM

Post #12 of 15 (1432 views)
Permalink
Re: [Problem]It is judged that a stopping resource is starting. [In reply to]

On Fri, Feb 24, 2012 at 10:03 AM, Andrew Beekhof <andrew [at] beekhof> wrote:
> On Wed, Feb 22, 2012 at 11:31 AM,  <renayama19661014 [at] ybb> wrote:
>> Hi Andrew,
>>
>> Thank you for comment.
>>
>> Sorry... I cannot fully understand your answer.
>>
>> Does your answer mean the following?
>>
>> 1) The system administrator must handle it when the rc=6 (fatal) log appears.
>> 2) And this must be reflected in the documentation.

No to both.

>> And does it mean that the following log should not be output unless a system administrator intervenes?
>>
>> Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
>
> Right.  There was actually a third part... a slightly more restrictive
> version of your original patch:

https://github.com/beekhof/pacemaker/commit/543ee8e

> --- a/crmd/lrm.c
> +++ b/crmd/lrm.c
> @@ -694,6 +694,9 @@ is_rsc_active(const char *rsc_id)
>
>     } else if (entry->last->rc == EXECRA_NOT_RUNNING) {
>         return FALSE;
> +
> +    } else if (entry->last->interval == 0 && entry->last->rc == EXECRA_NOT_CONFIGURED) {
> +        return FALSE;
>     }
>
>     return TRUE;
>
>
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>> --- On Tue, 2012/2/21, Andrew Beekhof <andrew [at] beekhof> wrote:
>>
>>> On Fri, Feb 17, 2012 at 10:49 AM,  <renayama19661014 [at] ybb> wrote:
>>> > Hi Andrew,
>>> >
>>> > Thank you for comment.
>>> >
>>> >> I'm getting to this soon, really :-)
>>> >> First it was corosync 2.0 stuff, so that /something/ in fedora-17
>>> >> works, then fixing everything I broke when adding corosync 2.0
>>> >> support.
>>> >
>>> > All right!
>>> >
>>> > I will wait for your answer.
>>>
>>> I somehow missed that the failure was "not configured"
>>>
>>> Failed actions:
>>>     prmVIP_monitor_0 (node=rh57-1, call=2, rc=6, status=complete): not
>>> configured
>>>
>>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
>>> lists rc=6 as fatal, but I believe we changed that behaviour (the
>>> stopping aspect) in the PE as there was also insufficient information
>>> for the agent to stop the service.
>>> Which results in the node being fenced, the resource being probed,
>>> which fails along with the subsequent stop, then the node is fenced
>>> again, etc.
>>>
>>> So two things:
>>>
>>> this log message should include the human version of rc=6
>>> Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard
>>> error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from
>>> re-starting anywhere in the cluster
>>>
>>> and the docs need to be updated.
>>>
>>> >
>>> > Best Regards,
>>> > Hideo Yamauchi.
>>> >
>>> > --- On Thu, 2012/2/16, Andrew Beekhof <andrew [at] beekhof> wrote:
>>> >
>>> >> Sorry!
>>> >>
>>> >> I'm getting to this soon, really :-)
>>> >> First it was corosync 2.0 stuff, so that /something/ in fedora-17
>>> >> works, then fixing everything I broke when adding corosync 2.0
>>> >> support.
>>> >>
>>> >> On Tue, Feb 14, 2012 at 11:20 AM,  <renayama19661014 [at] ybb> wrote:
>>> >> > Hi Andrew,
>>> >> >
>>> >> > How did this problem turn out in the end?
>>> >> >
>>> >> > Best Regards,
>>> >> > Hideo Yamauchi.
>>> >> >
>>> >> >
>>> >> > --- On Mon, 2012/1/16, renayama19661014 [at] ybb <renayama19661014 [at] ybb> wrote:
>>> >> >
>>> >> >> Hi Andrew,
>>> >> >>
>>> >> >> Thank you for comments.
>>> >> >>
>>> >> >> > Could you send me the PE file related to this log please?
>>> >> >> >
>>> >> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
>>> >> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
>>> >> >> > /var/lib/pengine/pe-input-4.bz2
>>> >> >>
>>> >> >> The old file is gone.
>>> >> >> I am sending the log and the PE file reproduced by the same procedure.
>>> >> >>
>>> >> >>  * trac1818.zip
>>> >> >>   * https://skydrive.live.com/?cid=3a14d57622c66876&id=3A14D57622C66876%21127
>>> >> >>
>>> >> >> Best Regards,
>>> >> >> Hideo Yamauchi.
>>> >> >>
>>> >> >>
>>> >> >> --- On Mon, 2012/1/16, Andrew Beekhof <andrew [at] beekhof> wrote:
>>> >> >>
>>> >> >> > On Fri, Jan 6, 2012 at 12:37 PM,  <renayama19661014 [at] ybb> wrote:
>>> >> >> > > Hi Andrew,
>>> >> >> > >
>>> >> >> > > Thank you for comment.
>>> >> >> > >
>>> >> >> > >> But it should have a subsequent stop action which would set it back to
>>> >> >> > >> being inactive.
>>> >> >> > >> Did that not happen in this case?
>>> >> >> > >
>>> >> >> > > Yes.
>>> >> >> >
>>> >> >> > Could you send me the PE file related to this log please?
>>> >> >> >
>>> >> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
>>> >> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
>>> >> >> > /var/lib/pengine/pe-input-4.bz2
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > > Only the "verify_stopped" log is recorded.
>>> >> >> > > The stop of the resource that failed its probe was not carried out.
>>> >> >> > >
>>> >> >> > > -----------------------------
>>> >> >> > > ######### yamauchi PREV STOP ##########
>>> >> >> > > Jan  6 19:21:56 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ifcheckd process group 3462 with signal 15
>>> >> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>>> >> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: do_node_walk: Requesting the list of configured nodes
>>> >> >> > > Jan  6 19:21:58 rh57-1 ifcheckd: [3462]: info: main: Exiting ifcheckd
>>> >> >> > > Jan  6 19:21:58 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/crmd process group 3461 with signal 15
>>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_shutdown: Requesting shutdown
>>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
>>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
>>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_shutdown_req: Sending shutdown request to DC: rh57-1
>>> >> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: handle_shutdown_request: Creating shutdown request for rh57-1 (state=S_POLICY_ENGINE)
>>> >> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1325845319)
>>> >> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_perform_update: Sent update 14: shutdown=1325845319
>>> >> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-1fdd5e2a-44b6-44b9-9993-97fa120072a4-shutdown, name=shutdown, value=1325845319, magic=NA, cib=0.101.16) : Transient attribute: update
>>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: crm_timer_popped: New Transition Timer (I_PE_CALC) just popped!
>>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE
>>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke_callback: Invoking the PE: query=44, ref=pe_calc-dc-1325845321-26, seq=1, quorate=0
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: unpack_config: On loss of CCM Quorum: Ignore
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: determine_online_status: Node rh57-1 is shutting down
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpUltraMonkey
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmVIP       (ocf::heartbeat:LVM):   Stopped
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith1
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-2        (stonith:external/ssh): Stopped
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-3        (stonith:meatware):     Stopped
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith2
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-2        (stonith:external/ssh): Started rh57-1
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-3        (stonith:meatware):     Started rh57-1
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: clone_print:  Clone Set: clnPingd
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: short_print:      Started: [ rh57-1 ]
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: clnPingd: Rolling back scores from prmVIP
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmPingd:0 cannot run anywhere
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmVIP cannot run anywhere
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith1-2: Rolling back scores from prmStonith1-3
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-2 cannot run anywhere
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-3 cannot run anywhere
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith2-2: Rolling back scores from prmStonith2-3
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-2 cannot run anywhere
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-3 cannot run anywhere
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: stage6: Scheduling Node rh57-1 for shutdown
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmVIP     (Stopped)
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-2      (Stopped)
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-3      (Stopped)
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-2      (rh57-1)
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-3      (rh57-1)
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmPingd:0 (rh57-1)
>>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: process_pe_message: Transition 4: PEngine Input stored in: /var/lib/pengine/pe-input-4.bz2
>>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: unpack_graph: Unpacked transition 4: 9 actions in 9 synapses
>>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
>>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
>>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
>>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 21: stop prmPingd:0_stop_0 on rh57-1 (local)
>>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[10] on prmPingd:0 for client 3461, its parameters: CRM_meta_interval=[10000] multiplier=[100] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] name=[default_ping_set] CRM_meta_clone_max=[1] crm_feature_set=[3.0.1] host_list=[192.168.40.1] CRM_meta_globally_unique=[false] CRM_meta_name=[monitor] CRM_meta_clone=[0] CRM_meta_clone_node_max=[1] CRM_meta_notify=[false]  cancelled
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=21:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmPingd:0_stop_0 )
>>> >> >> > > Jan  6 19:22:02 rh57-1 pingd: [3529]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmPingd:0 stop[14] (pid 3612)
>>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[14] on prmPingd:0 for client 3461: pid 3612 exited with return code 0
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=10, status=1, cib-update=0, confirmed=true) Cancelled
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=14, rc=0, cib-update=45, confirmed=true) ok
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmPingd:0_stop_0 (21) confirmed on rh57-1 (rc=0)
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 4 fired and confirmed
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 16: stop prmStonith2-3_stop_0 on rh57-1 (local)
>>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[13] on prmStonith2-3 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[600s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor]  cancelled
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=16:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-3_stop_0 )
>>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-3 stop[15] (pid 3617)
>>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3617]: info: Try to stop STONITH resource <rsc_id=prmStonith2-3> : Device=meatware
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_monitor_3600000 (call=13, status=1, cib-update=0, confirmed=true) Cancelled
>>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[15] on prmStonith2-3 for client 3461: pid 3617 exited with return code 0
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_stop_0 (call=15, rc=0, cib-update=46, confirmed=true) ok
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-3_stop_0 (16) confirmed on rh57-1 (rc=0)
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 15: stop prmStonith2-2_stop_0 on rh57-1 (local)
>>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[11] on prmStonith2-2 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[60s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor]  cancelled
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=15:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-2_stop_0 )
>>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-2 stop[16] (pid 3619)
>>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3619]: info: Try to stop STONITH resource <rsc_id=prmStonith2-2> : Device=external/ssh
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_monitor_3600000 (call=11, status=1, cib-update=0, confirmed=true) Cancelled
>>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[16] on prmStonith2-2 for client 3461: pid 3619 exited with return code 0
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_stop_0 (call=16, rc=0, cib-update=47, confirmed=true) ok
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-2_stop_0 (15) confirmed on rh57-1 (rc=0)
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: Executing crm-event (28): do_shutdown on rh57-1
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: crm-event (28) is a local shutdown
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: run_graph: ====================================================
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: notice: run_graph: Transition 4 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4.bz2): Complete
>>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_graph_trigger: Transition 4 is now complete
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_dc_release: DC role released
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>>> >> >> > > Jan  6 19:22:03 rh57-1 pengine: [3464]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Transitioner is now inactive
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Disconnecting STONITH...
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: notice: Not currently connected.
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 420 ms (> 100 ms) before being called (GSource: 0x179d9b0)
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: G_SIG_dispatch: started at 429442052 should have started at 429442010
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmdManagedChildDied: Process pengine:[3464] exited (signal=0, exitcode=0)
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 80 ms (> 30 ms) (GSource: 0x179d9b0)
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_msg_dispatch: Received HUP from pengine:[3464]
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_connection_destroy: Connection to the Policy Engine released
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: ERROR: verify_stopped: Resource prmVIP was active at shutdown.  You may ignore this error if it is unmanaged.
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_lrm_control: Disconnected from the LRM
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_ha_control: Disconnected from Heartbeat
>>> >> >> > > Jan  6 19:22:03 rh57-1 ccm: [3456]: info: client (pid=3461) removed from ccm
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_cib_control: Disconnecting CIB
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
>>> >> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: info: cib_process_readwrite: We are now in R/O mode
>>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
>>> >> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: WARN: send_ipc_message: IPC Channel to 3461 is not connected
>>> >> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
>>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: send_via_callback_channel: Delivery of reply to client 3461/5f69edda-aec9-42c7-ae52-045a05d1c5db failed
>>> >> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: do_exit: [crmd] stopped (0)
>>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
>>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/attrd process group 3460 with signal 15
>>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x7b28140)
>>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_shutdown: Exiting
>>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: main: Exiting...
>>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
>>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/stonithd process group 3459 with signal 15
>>> >> >> > > Jan  6 19:22:04 rh57-1 stonithd: [3459]: notice: /usr/lib64/heartbeat/stonithd normally quit.
>>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/lrmd -r process group 3458 with signal 15
>>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
>>> >> >> > > Jan  6 19:22:04 rh57-1 lrmd: [3458]: info: lrmd is shutting down
>>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/cib process group 3457 with signal 15
>>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
>>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_shutdown: Disconnected 0 clients
>>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_process_disconnect: All clients disconnected...
>>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
>>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: Exiting...
>>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: main: Done
>>> >> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: client (pid=3457) removed from ccm
>>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ccm process group 3456 with signal 15
>>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource: 0x7b28140)
>>> >> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: received SIGTERM, going to shut down
>>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBFIFO process 3446 with signal 15
>>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3447 with signal 15
>>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3448 with signal 15
>>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3449 with signal 15
>>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3450 with signal 15
>>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 exited. 5 remaining
>>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 exited. 4 remaining
>>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 exited. 3 remaining
>>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 exited. 2 remaining
>>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 exited. 1 remaining
>>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat shutdown complete.
>>> >> >> > >
>>> >> >> > > -----------------------------
>>> >> >> > >
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > Best Regards,
>>> >> >> > > Hideo Yamauchi.
>>> >> >> > >
>>> >> >> > >
>>> >> >> > >
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > --- On Fri, 2012/1/6, Andrew Beekhof <andrew [at] beekhof> wrote:
>>> >> >> > >
>>> >> >> > >> On Tue, Dec 27, 2011 at 6:15 PM,  <renayama19661014 [at] ybb> wrote:
>>> >> >> > >> > Hi All,
>>> >> >> > >> >
>>> >> >> > >> > When Pacemaker stops when there is the resource that failed in probe processing, crmd outputs the following error message.
>>> >> >> > >> >
>>> >> >> > >> >
>>> >> >> > >> >  Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
>>> >> >> > >> >
>>> >> >> > >> >
>>> >> >> > >> > Because a resource that failed its probe never started,
>>> >> >> > >>
>>> >> >> > >> But it should have a subsequent stop action which would set it back to
>>> >> >> > >> being inactive.
>>> >> >> > >> Did that not happen in this case?
>>> >> >> > >>
>>> >> >> > >> > this error message is incorrect.
>>> >> >> > >> >
>>> >> >> > >> > I think the following correction may be appropriate, but we are not certain.
>>> >> >> > >> >
>>> >> >> > >> >
>>> >> >> > >> >  * crmd/lrm.c
>>> >> >> > >> >  (snip)
>>> >> >> > >> >                } else if(op->rc == EXECRA_NOT_RUNNING) {
>>> >> >> > >> >                        active = FALSE;
>>> >> >> > >> > +                } else if(op->rc != EXECRA_OK && op->interval == 0
>>> >> >> > >> > +                                && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
>>> >> >> > >> > +                        active = FALSE;
>>> >> >> > >> >                } else {
>>> >> >> > >> >                        active = TRUE;
>>> >> >> > >> >                }
>>> >> >> > >> >  (snip)
>>> >> >> > >> >
>>> >> >> > >> >
>>> >> >> > >> > In the source for development of Pacemaker, handling of this processing seems to be considerably changed.
>>> >> >> > >> > It requests backporting to Pacemaker1.0 system of this change that we can do it.
>>> >> >> > >> >
>>> >> >> > >> > Best Regards,
>>> >> >> > >> > Hideo Yamauchi.
>>> >> >> > >> >
>>> >> >> > >> >
>>> >> >> > >> >
>>> >> >> > >> > _______________________________________________
>>> >> >> > >> > Pacemaker mailing list: Pacemaker [at] oss
>>> >> >> > >> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> >> >> > >> >
>>> >> >> > >> > Project Home: http://www.clusterlabs.org
>>> >> >> > >> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> >> >> > >> > Bugs: http://bugs.clusterlabs.org
>>> >> >> > >>
>>> >> >> > >
>>> >> >> >
>>> >> >>
>>> >> >
>>> >>
>>> >
>>>
>>



renayama19661014 at ybb

Feb 23, 2012, 4:06 PM

Post #13 of 15 (1437 views)
Permalink
Re: [Problem]It is judged that a stopping resource is starting. [In reply to]

Hi Andrew,

Thank you for your comment.

> >>
> >> 1)It is necessary for the manager of the system to cope when rc is 6(fatal) log.
> >> 2)And it is necessary for this to be reflected by a document.
>
> No to both.

All right.

> >> And does it mean that the next log should not be output until a system administrator controls it?
> >>
> >> Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
> >
> > Right.  There was actually a third part... a slightly more restrictive
> > version of your original patch:
>
> https://github.com/beekhof/pacemaker/commit/543ee8e

I confirmed it.

Many Thanks!!
Hideo Yamauchi.

> >> --- On Tue, 2012/2/21, Andrew Beekhof <andrew [at] beekhof> wrote:
> >>
> >>> On Fri, Feb 17, 2012 at 10:49 AM,  <renayama19661014 [at] ybb> wrote:
> >>> > Hi Andrew,
> >>> >
> >>> > Thank you for comment.
> >>> >
> >>> >> I'm getting to this soon, really :-)
> >>> >> First it was corosync 2.0 stuff, so that /something/ in fedora-17
> >>> >> works, then fixing everything I broke when adding corosync 2.0
> >>> >> support.
> >>> >
> >>> > All right!
> >>> >
> >>> > I wait for your answer.
> >>>
> >>> I somehow missed that the failure was "not configured"
> >>>
> >>> Failed actions:
> >>>     prmVIP_monitor_0 (node=rh57-1, call=2, rc=6, status=complete): not
> >>> configured
> >>>
> >>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
> >>> lists rc=6 as fatal, but I believe we changed that behaviour (the
> >>> stopping aspect) in the PE as there was also insufficient information
> >>> for the agent to stop the service.
> >>> Which results in the node being fenced, the resource being probed,
> >>> which fails along with the subsequent stop, then the node is fenced
> >>> again, etc.
> >>>
> >>> So two things:
> >>>
> >>> this log message should include the human version of rc=6
> >>> Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard
> >>> error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from
> >>> re-starting anywhere in the cluster
> >>>
> >>> and the docs need to be updated.
> >>>
> >>> >
> >>> > Best Regards,
> >>> > Hideo Yamauchi.
> >>> >
> >>> > --- On Thu, 2012/2/16, Andrew Beekhof <andrew [at] beekhof> wrote:
> >>> >
> >>> >> Sorry!
> >>> >>
> >>> >> I'm getting to this soon, really :-)
> >>> >> First it was corosync 2.0 stuff, so that /something/ in fedora-17
> >>> >> works, then fixing everything I broke when adding corosync 2.0
> >>> >> support.
> >>> >>
> >>> >> On Tue, Feb 14, 2012 at 11:20 AM,  <renayama19661014 [at] ybb> wrote:
> >>> >> > Hi Andrew,
> >>> >> >
> >>> >> > About this problem, how did it turn out afterwards?
> >>> >> >
> >>> >> > Best Regards,
> >>> >> > Hideo Yamauchi.
> >>> >> >
> >>> >> >
> >>> >> > --- On Mon, 2012/1/16, renayama19661014 [at] ybb <renayama19661014 [at] ybb> wrote:
> >>> >> >
> >>> >> >> Hi Andrew,
> >>> >> >>
> >>> >> >> Thank you for comments.
> >>> >> >>
> >>> >> >> > Could you send me the PE file related to this log please?
> >>> >> >> >
> >>> >> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
> >>> >> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
> >>> >> >> > /var/lib/pengine/pe-input-4.bz2
> >>> >> >>
> >>> >> >> The old file disappeared.
> >>> >> >> I send log and the PE file which reappeared in the same procedure.
> >>> >> >>
> >>> >> >>  * trac1818.zip
> >>> >> >>   * https://skydrive.live.com/?cid=3a14d57622c66876&id=3A14D57622C66876%21127
> >>> >> >>
> >>> >> >> Best Regards,
> >>> >> >> Hideo Yamauchi.
> >>> >> >>
> >>> >> >>
> >>> >> >> --- On Mon, 2012/1/16, Andrew Beekhof <andrew [at] beekhof> wrote:
> >>> >> >>
> >>> >> >> > On Fri, Jan 6, 2012 at 12:37 PM,  <renayama19661014 [at] ybb> wrote:
> >>> >> >> > > Hi Andrew,
> >>> >> >> > >
> >>> >> >> > > Thank you for comment.
> >>> >> >> > >
> >>> >> >> > >> But it should have a subsequent stop action which would set it back to
> >>> >> >> > >> being inactive.
> >>> >> >> > >> Did that not happen in this case?
> >>> >> >> > >
> >>> >> >> > > Yes.
> >>> >> >> >
> >>> >> >> > Could you send me the PE file related to this log please?
> >>> >> >> >
> >>> >> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
> >>> >> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
> >>> >> >> > /var/lib/pengine/pe-input-4.bz2
> >>> >> >> >
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > > Log of "verify_stopped" is only recorded.
> >>> >> >> > > The stop handling of resource that failed in probe was not carried out.
> >>> >> >> > >
> >>> >> >> > > -----------------------------
> >>> >> >> > > ######### yamauchi PREV STOP ##########
> >>> >> >> > > Jan  6 19:21:56 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ifcheckd process group 3462 with signal 15
> >>> >> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>> >> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: do_node_walk: Requesting the list of configured nodes
> >>> >> >> > > Jan  6 19:21:58 rh57-1 ifcheckd: [3462]: info: main: Exiting ifcheckd
> >>> >> >> > > Jan  6 19:21:58 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/crmd process group 3461 with signal 15
> >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_shutdown: Requesting shutdown
> >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
> >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_shutdown_req: Sending shutdown request to DC: rh57-1
> >>> >> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: handle_shutdown_request: Creating shutdown request for rh57-1 (state=S_POLICY_ENGINE)
> >>> >> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1325845319)
> >>> >> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_perform_update: Sent update 14: shutdown=1325845319
> >>> >> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-1fdd5e2a-44b6-44b9-9993-97fa120072a4-shutdown, name=shutdown, value=1325845319, magic=NA, cib=0.101.16) : Transient attribute: update
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: crm_timer_popped: New Transition Timer (I_PE_CALC) just popped!
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke_callback: Invoking the PE: query=44, ref=pe_calc-dc-1325845321-26, seq=1, quorate=0
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: unpack_config: On loss of CCM Quorum: Ignore
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: determine_online_status: Node rh57-1 is shutting down
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpUltraMonkey
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmVIP       (ocf::heartbeat:LVM):   Stopped
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith1
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-2        (stonith:external/ssh): Stopped
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-3        (stonith:meatware):     Stopped
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith2
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-2        (stonith:external/ssh): Started rh57-1
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-3        (stonith:meatware):     Started rh57-1
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: clone_print:  Clone Set: clnPingd
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: short_print:      Started: [ rh57-1 ]
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: clnPingd: Rolling back scores from prmVIP
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmPingd:0 cannot run anywhere
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmVIP cannot run anywhere
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith1-2: Rolling back scores from prmStonith1-3
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-2 cannot run anywhere
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-3 cannot run anywhere
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith2-2: Rolling back scores from prmStonith2-3
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-2 cannot run anywhere
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-3 cannot run anywhere
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: stage6: Scheduling Node rh57-1 for shutdown
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmVIP     (Stopped)
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-2      (Stopped)
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-3      (Stopped)
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-2      (rh57-1)
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-3      (rh57-1)
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmPingd:0 (rh57-1)
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: process_pe_message: Transition 4: PEngine Input stored in: /var/lib/pengine/pe-input-4.bz2
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: unpack_graph: Unpacked transition 4: 9 actions in 9 synapses
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
> >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 21: stop prmPingd:0_stop_0 on rh57-1 (local)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[10] on prmPingd:0 for client 3461, its parameters: CRM_meta_interval=[10000] multiplier=[100] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] name=[default_ping_set] CRM_meta_clone_max=[1] crm_feature_set=[3.0.1] host_list=[192.168.40.1] CRM_meta_globally_unique=[false] CRM_meta_name=[monitor] CRM_meta_clone=[0] CRM_meta_clone_node_max=[1] CRM_meta_notify=[false]  cancelled
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=21:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmPingd:0_stop_0 )
> >>> >> >> > > Jan  6 19:22:02 rh57-1 pingd: [3529]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmPingd:0 stop[14] (pid 3612)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[14] on prmPingd:0 for client 3461: pid 3612 exited with return code 0
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=10, status=1, cib-update=0, confirmed=true) Cancelled
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=14, rc=0, cib-update=45, confirmed=true) ok
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmPingd:0_stop_0 (21) confirmed on rh57-1 (rc=0)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 4 fired and confirmed
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 16: stop prmStonith2-3_stop_0 on rh57-1 (local)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[13] on prmStonith2-3 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[600s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor]  cancelled
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=16:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-3_stop_0 )
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-3 stop[15] (pid 3617)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3617]: info: Try to stop STONITH resource <rsc_id=prmStonith2-3> : Device=meatware
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_monitor_3600000 (call=13, status=1, cib-update=0, confirmed=true) Cancelled
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[15] on prmStonith2-3 for client 3461: pid 3617 exited with return code 0
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_stop_0 (call=15, rc=0, cib-update=46, confirmed=true) ok
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-3_stop_0 (16) confirmed on rh57-1 (rc=0)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 15: stop prmStonith2-2_stop_0 on rh57-1 (local)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[11] on prmStonith2-2 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[60s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor]  cancelled
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=15:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-2_stop_0 )
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-2 stop[16] (pid 3619)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3619]: info: Try to stop STONITH resource <rsc_id=prmStonith2-2> : Device=external/ssh
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_monitor_3600000 (call=11, status=1, cib-update=0, confirmed=true) Cancelled
> >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[16] on prmStonith2-2 for client 3461: pid 3619 exited with return code 0
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_stop_0 (call=16, rc=0, cib-update=47, confirmed=true) ok
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-2_stop_0 (15) confirmed on rh57-1 (rc=0)
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: Executing crm-event (28): do_shutdown on rh57-1
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: crm-event (28) is a local shutdown
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: run_graph: ====================================================
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: notice: run_graph: Transition 4 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4.bz2): Complete
> >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_graph_trigger: Transition 4 is now complete
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_dc_release: DC role released
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> >>> >> >> > > Jan  6 19:22:03 rh57-1 pengine: [3464]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Transitioner is now inactive
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Disconnecting STONITH...
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: notice: Not currently connected.
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 420 ms (> 100 ms) before being called (GSource: 0x179d9b0)
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: G_SIG_dispatch: started at 429442052 should have started at 429442010
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmdManagedChildDied: Process pengine:[3464] exited (signal=0, exitcode=0)
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 80 ms (> 30 ms) (GSource: 0x179d9b0)
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_msg_dispatch: Received HUP from pengine:[3464]
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_connection_destroy: Connection to the Policy Engine released
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: ERROR: verify_stopped: Resource prmVIP was active at shutdown.  You may ignore this error if it is unmanaged.
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_lrm_control: Disconnected from the LRM
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_ha_control: Disconnected from Heartbeat
> >>> >> >> > > Jan  6 19:22:03 rh57-1 ccm: [3456]: info: client (pid=3461) removed from ccm
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_cib_control: Disconnecting CIB
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> >>> >> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: info: cib_process_readwrite: We are now in R/O mode
> >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> >>> >> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: WARN: send_ipc_message: IPC Channel to 3461 is not connected
> >>> >> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
> >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: send_via_callback_channel: Delivery of reply to client 3461/5f69edda-aec9-42c7-ae52-045a05d1c5db failed
> >>> >> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: do_exit: [crmd] stopped (0)
> >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/attrd process group 3460 with signal 15
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x7b28140)
> >>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_shutdown: Exiting
> >>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: main: Exiting...
> >>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/stonithd process group 3459 with signal 15
> >>> >> >> > > Jan  6 19:22:04 rh57-1 stonithd: [3459]: notice: /usr/lib64/heartbeat/stonithd normally quit.
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/lrmd -r process group 3458 with signal 15
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
> >>> >> >> > > Jan  6 19:22:04 rh57-1 lrmd: [3458]: info: lrmd is shutting down
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/cib process group 3457 with signal 15
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
> >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_shutdown: Disconnected 0 clients
> >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_process_disconnect: All clients disconnected...
> >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
> >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: Exiting...
> >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: main: Done
> >>> >> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: client (pid=3457) removed from ccm
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ccm process group 3456 with signal 15
> >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource: 0x7b28140)
> >>> >> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: received SIGTERM, going to shut down
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBFIFO process 3446 with signal 15
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3447 with signal 15
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3448 with signal 15
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3449 with signal 15
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3450 with signal 15
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 exited. 5 remaining
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 exited. 4 remaining
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 exited. 3 remaining
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 exited. 2 remaining
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 exited. 1 remaining
> >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat shutdown complete.
> >>> >> >> > >
> >>> >> >> > > -----------------------------
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> > > Best Regards,
> >>> >> >> > > Hideo Yamauchi.
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> > > --- On Fri, 2012/1/6, Andrew Beekhof <andrew [at] beekhof> wrote:
> >>> >> >> > >
> >>> >> >> > >> On Tue, Dec 27, 2011 at 6:15 PM,  <renayama19661014 [at] ybb> wrote:
> >>> >> >> > >> > Hi All,
> >>> >> >> > >> >
> >>> >> >> > >> > When Pacemaker stops when there is the resource that failed in probe processing, crmd outputs the following error message.
> >>> >> >> > >> >
> >>> >> >> > >> >
> >>> >> >> > >> >  Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
> >>> >> >> > >> >
> >>> >> >> > >> >
> >>> >> >> > >> > Because the resource that failed in probe processing does not start,
> >>> >> >> > >>
> >>> >> >> > >> But it should have a subsequent stop action which would set it back to
> >>> >> >> > >> being inactive.
> >>> >> >> > >> Did that not happen in this case?
> >>> >> >> > >>
> >>> >> >> > >> > this error message is not right.
> >>> >> >> > >> >
> >>> >> >> > >> > I think that the following correction may be good, but we do not have conviction.
> >>> >> >> > >> >
> >>> >> >> > >> >
> >>> >> >> > >> >  * crmd/lrm.c
> >>> >> >> > >> >  (snip)
> >>> >> >> > >> >                } else if(op->rc == EXECRA_NOT_RUNNING) {
> >>> >> >> > >> >                        active = FALSE;
> >>> >> >> > >> > +                } else if(op->rc != EXECRA_OK && op->interval == 0
> >>> >> >> > >> > +                                && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
> >>> >> >> > >> > +                        active = FALSE;
> >>> >> >> > >> >                } else {
> >>> >> >> > >> >                        active = TRUE;
> >>> >> >> > >> >                }
> >>> >> >> > >> >  (snip)
> >>> >> >> > >> >
> >>> >> >> > >> >
> >>> >> >> > >> > In the source for development of Pacemaker, handling of this processing seems to be considerably changed.
> >>> >> >> > >> > It requests backporting to Pacemaker1.0 system of this change that we can do it.
> >>> >> >> > >> >
> >>> >> >> > >> > Best Regards,
> >>> >> >> > >> > Hideo Yamauchi.
> >>> >> >> > >> >
> >>> >> >> > >> >
> >>> >> >> > >> >
> >>> >> >> > >>
> >>> >> >> > >
> >>> >> >> >
> >>> >> >>
> >>> >> >
> >>> >>
> >>> >
> >>>
> >>
>



renayama19661014 at ybb

Feb 23, 2012, 4:44 PM

Post #14 of 15 (1428 views)
Permalink
Re: [Problem]It is judged that a stopping resource is starting. [In reply to]

Hi Andrew,

I overlooked it.
We would like a similar correction to be applied to Pacemaker 1.0 (e.g., like the patch I contributed).

Best Regards,
Hideo Yamauchi.


--- On Fri, 2012/2/24, renayama19661014 [at] ybb <renayama19661014 [at] ybb> wrote:

> Hi Andrew,
>
> Thank you for comment.
>
> > >>
> > >> 1) The system administrator needs to respond when the rc=6 (fatal) log appears.
> > >> 2) And this needs to be reflected in the documentation.
> >
> > No to both.
>
> All right.
>
> > >> And does it mean that the next log should not be output until a system administrator controls it?
> > >>
> > >> Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
> > >
> > > Right.  There was actually a third part... a slightly more restrictive
> > > version of your original patch:
> >
> > https://github.com/beekhof/pacemaker/commit/543ee8e
>
> I confirmed it.
>
> Many Thanks!!
> Hideo Yamauchi.
>
> > >> --- On Tue, 2012/2/21, Andrew Beekhof <andrew [at] beekhof> wrote:
> > >>
> > >>> On Fri, Feb 17, 2012 at 10:49 AM,  <renayama19661014 [at] ybb> wrote:
> > >>> > Hi Andrew,
> > >>> >
> > >>> > Thank you for comment.
> > >>> >
> > >>> >> I'm getting to this soon, really :-)
> > >>> >> First it was corosync 2.0 stuff, so that /something/ in fedora-17
> > >>> >> works, then fixing everything I broke when adding corosync 2.0
> > >>> >> support.
> > >>> >
> > >>> > All right!
> > >>> >
> > >>> > I wait for your answer.
> > >>>
> > >>> I somehow missed that the failure was "not configured"
> > >>>
> > >>> Failed actions:
> > >>>     prmVIP_monitor_0 (node=rh57-1, call=2, rc=6, status=complete): not
> > >>> configured
> > >>>
> > >>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
> > >>> lists rc=6 as fatal, but I believe we changed that behaviour (the
> > >>> stopping aspect) in the PE as there was also insufficient information
> > >>> for the agent to stop the service.
> > >>> Which results in the node being fenced, the resource being probed,
> > >>> which fails along with the subsequent stop, then the node is fenced
> > >>> again, etc.
> > >>>
> > >>> So two things:
> > >>>
> > >>> this log message should include the human version of rc=6
> > >>> Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard
> > >>> error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from
> > >>> re-starting anywhere in the cluster
> > >>>
> > >>> and the docs need to be updated.
> > >>>
> > >>> >
> > >>> > Best Regards,
> > >>> > Hideo Yamauchi.
> > >>> >
> > >>> > --- On Thu, 2012/2/16, Andrew Beekhof <andrew [at] beekhof> wrote:
> > >>> >
> > >>> >> Sorry!
> > >>> >>
> > >>> >> I'm getting to this soon, really :-)
> > >>> >> First it was corosync 2.0 stuff, so that /something/ in fedora-17
> > >>> >> works, then fixing everything I broke when adding corosync 2.0
> > >>> >> support.
> > >>> >>
> > >>> >> On Tue, Feb 14, 2012 at 11:20 AM,  <renayama19661014 [at] ybb> wrote:
> > >>> >> > Hi Andrew,
> > >>> >> >
> > >>> >> > About this problem, how did it turn out afterwards?
> > >>> >> >
> > >>> >> > Best Regards,
> > >>> >> > Hideo Yamauchi.
> > >>> >> >
> > >>> >> >
> > >>> >> > --- On Mon, 2012/1/16, renayama19661014 [at] ybb <renayama19661014 [at] ybb> wrote:
> > >>> >> >
> > >>> >> >> Hi Andrew,
> > >>> >> >>
> > >>> >> >> Thank you for comments.
> > >>> >> >>
> > >>> >> >> > Could you send me the PE file related to this log please?
> > >>> >> >> >
> > >>> >> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
> > >>> >> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
> > >>> >> >> > /var/lib/pengine/pe-input-4.bz2
> > >>> >> >>
> > >>> >> >> The old file is gone.
> > >>> >> >> I am sending the log and the PE file reproduced by the same procedure.
> > >>> >> >>
> > >>> >> >>  * trac1818.zip
> > >>> >> >>   * https://skydrive.live.com/?cid=3a14d57622c66876&id=3A14D57622C66876%21127
> > >>> >> >>
> > >>> >> >> Best Regards,
> > >>> >> >> Hideo Yamauchi.
> > >>> >> >>
> > >>> >> >>
> > >>> >> >> --- On Mon, 2012/1/16, Andrew Beekhof <andrew [at] beekhof> wrote:
> > >>> >> >>
> > >>> >> >> > On Fri, Jan 6, 2012 at 12:37 PM,  <renayama19661014 [at] ybb> wrote:
> > >>> >> >> > > Hi Andrew,
> > >>> >> >> > >
> > >>> >> >> > > Thank you for comment.
> > >>> >> >> > >
> > >>> >> >> > >> But it should have a subsequent stop action which would set it back to
> > >>> >> >> > >> being inactive.
> > >>> >> >> > >> Did that not happen in this case?
> > >>> >> >> > >
> > >>> >> >> > > Yes.
> > >>> >> >> >
> > >>> >> >> > Could you send me the PE file related to this log please?
> > >>> >> >> >
> > >>> >> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
> > >>> >> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
> > >>> >> >> > /var/lib/pengine/pe-input-4.bz2
> > >>> >> >> >
> > >>> >> >> >
> > >>> >> >> >
> > >>> >> >> > > Only the "verify_stopped" log is recorded.
> > >>> >> >> > > The stop handling of the resource that failed in the probe was not carried out.
> > >>> >> >> > >
> > >>> >> >> > > -----------------------------
> > >>> >> >> > > ######### yamauchi PREV STOP ##########
> > >>> >> >> > > Jan  6 19:21:56 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ifcheckd process group 3462 with signal 15
> > >>> >> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> > >>> >> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: do_node_walk: Requesting the list of configured nodes
> > >>> >> >> > > Jan  6 19:21:58 rh57-1 ifcheckd: [3462]: info: main: Exiting ifcheckd
> > >>> >> >> > > Jan  6 19:21:58 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/crmd process group 3461 with signal 15
> > >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> > >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_shutdown: Requesting shutdown
> > >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
> > >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> > >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_shutdown_req: Sending shutdown request to DC: rh57-1
> > >>> >> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: handle_shutdown_request: Creating shutdown request for rh57-1 (state=S_POLICY_ENGINE)
> > >>> >> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1325845319)
> > >>> >> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_perform_update: Sent update 14: shutdown=1325845319
> > >>> >> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-1fdd5e2a-44b6-44b9-9993-97fa120072a4-shutdown, name=shutdown, value=1325845319, magic=NA, cib=0.101.16) : Transient attribute: update
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: crm_timer_popped: New Transition Timer (I_PE_CALC) just popped!
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke_callback: Invoking the PE: query=44, ref=pe_calc-dc-1325845321-26, seq=1, quorate=0
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: determine_online_status: Node rh57-1 is shutting down
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpUltraMonkey
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmVIP       (ocf::heartbeat:LVM):   Stopped
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith1
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-2        (stonith:external/ssh): Stopped
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-3        (stonith:meatware):     Stopped
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith2
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-2        (stonith:external/ssh): Started rh57-1
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-3        (stonith:meatware):     Started rh57-1
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: clone_print:  Clone Set: clnPingd
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: short_print:      Started: [ rh57-1 ]
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: clnPingd: Rolling back scores from prmVIP
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmPingd:0 cannot run anywhere
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmVIP cannot run anywhere
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith1-2: Rolling back scores from prmStonith1-3
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-2 cannot run anywhere
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-3 cannot run anywhere
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith2-2: Rolling back scores from prmStonith2-3
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-2 cannot run anywhere
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-3 cannot run anywhere
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: stage6: Scheduling Node rh57-1 for shutdown
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmVIP     (Stopped)
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-2      (Stopped)
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-3      (Stopped)
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-2      (rh57-1)
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-3      (rh57-1)
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmPingd:0 (rh57-1)
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: process_pe_message: Transition 4: PEngine Input stored in: /var/lib/pengine/pe-input-4.bz2
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: unpack_graph: Unpacked transition 4: 9 actions in 9 synapses
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 21: stop prmPingd:0_stop_0 on rh57-1 (local)
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[10] on prmPingd:0 for client 3461, its parameters: CRM_meta_interval=[10000] multiplier=[100] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] name=[default_ping_set] CRM_meta_clone_max=[1] crm_feature_set=[3.0.1] host_list=[192.168.40.1] CRM_meta_globally_unique=[false] CRM_meta_name=[monitor] CRM_meta_clone=[0] CRM_meta_clone_node_max=[1] CRM_meta_notify=[false]  cancelled
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=21:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmPingd:0_stop_0 )
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 pingd: [3529]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmPingd:0 stop[14] (pid 3612)
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[14] on prmPingd:0 for client 3461: pid 3612 exited with return code 0
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=10, status=1, cib-update=0, confirmed=true) Cancelled
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=14, rc=0, cib-update=45, confirmed=true) ok
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmPingd:0_stop_0 (21) confirmed on rh57-1 (rc=0)
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 4 fired and confirmed
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 16: stop prmStonith2-3_stop_0 on rh57-1 (local)
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[13] on prmStonith2-3 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[600s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor]  cancelled
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=16:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-3_stop_0 )
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-3 stop[15] (pid 3617)
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3617]: info: Try to stop STONITH resource <rsc_id=prmStonith2-3> : Device=meatware
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_monitor_3600000 (call=13, status=1, cib-update=0, confirmed=true) Cancelled
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[15] on prmStonith2-3 for client 3461: pid 3617 exited with return code 0
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_stop_0 (call=15, rc=0, cib-update=46, confirmed=true) ok
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-3_stop_0 (16) confirmed on rh57-1 (rc=0)
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 15: stop prmStonith2-2_stop_0 on rh57-1 (local)
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[11] on prmStonith2-2 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[60s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor]  cancelled
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=15:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-2_stop_0 )
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-2 stop[16] (pid 3619)
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3619]: info: Try to stop STONITH resource <rsc_id=prmStonith2-2> : Device=external/ssh
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_monitor_3600000 (call=11, status=1, cib-update=0, confirmed=true) Cancelled
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[16] on prmStonith2-2 for client 3461: pid 3619 exited with return code 0
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_stop_0 (call=16, rc=0, cib-update=47, confirmed=true) ok
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-2_stop_0 (15) confirmed on rh57-1 (rc=0)
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: Executing crm-event (28): do_shutdown on rh57-1
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: crm-event (28) is a local shutdown
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: run_graph: ====================================================
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: notice: run_graph: Transition 4 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4.bz2): Complete
> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_graph_trigger: Transition 4 is now complete
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_dc_release: DC role released
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 pengine: [3464]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Transitioner is now inactive
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Disconnecting STONITH...
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: notice: Not currently connected.
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 420 ms (> 100 ms) before being called (GSource: 0x179d9b0)
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: G_SIG_dispatch: started at 429442052 should have started at 429442010
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmdManagedChildDied: Process pengine:[3464] exited (signal=0, exitcode=0)
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 80 ms (> 30 ms) (GSource: 0x179d9b0)
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_msg_dispatch: Received HUP from pengine:[3464]
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_connection_destroy: Connection to the Policy Engine released
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: ERROR: verify_stopped: Resource prmVIP was active at shutdown.  You may ignore this error if it is unmanaged.
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_lrm_control: Disconnected from the LRM
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_ha_control: Disconnected from Heartbeat
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 ccm: [3456]: info: client (pid=3461) removed from ccm
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_cib_control: Disconnecting CIB
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: info: cib_process_readwrite: We are now in R/O mode
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> > >>> >> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: WARN: send_ipc_message: IPC Channel to 3461 is not connected
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: send_via_callback_channel: Delivery of reply to client 3461/5f69edda-aec9-42c7-ae52-045a05d1c5db failed
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: do_exit: [crmd] stopped (0)
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/attrd process group 3460 with signal 15
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x7b28140)
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_shutdown: Exiting
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: main: Exiting...
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/stonithd process group 3459 with signal 15
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 stonithd: [3459]: notice: /usr/lib64/heartbeat/stonithd normally quit.
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/lrmd -r process group 3458 with signal 15
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 lrmd: [3458]: info: lrmd is shutting down
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/cib process group 3457 with signal 15
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_shutdown: Disconnected 0 clients
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_process_disconnect: All clients disconnected...
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: Exiting...
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: main: Done
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: client (pid=3457) removed from ccm
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ccm process group 3456 with signal 15
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource: 0x7b28140)
> > >>> >> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: received SIGTERM, going to shut down
> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBFIFO process 3446 with signal 15
> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3447 with signal 15
> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3448 with signal 15
> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3449 with signal 15
> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3450 with signal 15
> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 exited. 5 remaining
> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 exited. 4 remaining
> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 exited. 3 remaining
> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 exited. 2 remaining
> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 exited. 1 remaining
> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat shutdown complete.
> > >>> >> >> > >
> > >>> >> >> > > -----------------------------
> > >>> >> >> > >
> > >>> >> >> > >
> > >>> >> >> > >
> > >>> >> >> > > Best Regards,
> > >>> >> >> > > Hideo Yamauchi.
> > >>> >> >> > >
> > >>> >> >> > >
> > >>> >> >> > >
> > >>> >> >> > >
> > >>> >> >> > >
> > >>> >> >> > > --- On Fri, 2012/1/6, Andrew Beekhof <andrew [at] beekhof> wrote:
> > >>> >> >> > >
> > >>> >> >> > >> On Tue, Dec 27, 2011 at 6:15 PM,  <renayama19661014 [at] ybb> wrote:
> > >>> >> >> > >> > Hi All,
> > >>> >> >> > >> >
> > >>> >> >> > >> > When Pacemaker stops when there is the resource that failed in probe processing, crmd outputs the following error message.
> > >>> >> >> > >> >
> > >>> >> >> > >> >
> > >>> >> >> > >> >  Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
> > >>> >> >> > >> >
> > >>> >> >> > >> >
> > >>> >> >> > >> > Because the resource that failed in probe processing does not start,
> > >>> >> >> > >>
> > >>> >> >> > >> But it should have a subsequent stop action which would set it back to
> > >>> >> >> > >> being inactive.
> > >>> >> >> > >> Did that not happen in this case?
> > >>> >> >> > >>
> > >>> >> >> > >> > this error message is not right.
> > >>> >> >> > >> >
> > >>> >> >> > >> > I think that the following correction may be good, but we do not have conviction.
> > >>> >> >> > >> >
> > >>> >> >> > >> >
> > >>> >> >> > >> >  * crmd/lrm.c
> > >>> >> >> > >> >  (snip)
> > >>> >> >> > >> >                } else if(op->rc == EXECRA_NOT_RUNNING) {
> > >>> >> >> > >> >                        active = FALSE;
> > >>> >> >> > >> > +                } else if(op->rc != EXECRA_OK && op->interval == 0
> > >>> >> >> > >> > +                                && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
> > >>> >> >> > >> > +                        active = FALSE;
> > >>> >> >> > >> >                } else {
> > >>> >> >> > >> >                        active = TRUE;
> > >>> >> >> > >> >                }
> > >>> >> >> > >> >  (snip)
> > >>> >> >> > >> >
> > >>> >> >> > >> >
> > >>> >> >> > >> > In the development source of Pacemaker, the handling of this processing appears to have changed considerably.
> > >>> >> >> > >> > We request that this change be backported to the Pacemaker 1.0 series, if possible.
> > >>> >> >> > >> >
> > >>> >> >> > >> > Best Regards,
> > >>> >> >> > >> > Hideo Yamauchi.
> > >>> >> >> > >> >
> > >>> >> >> > >> >
> > >>> >> >> > >> >
> > >>> >> >> > >>
> > >>> >> >> > >
> > >>> >> >> >
> > >>> >> >>
> > >>> >> >
> > >>> >>
> > >>> >
> > >>>
> > >>
> >
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


andrew at beekhof

Feb 23, 2012, 4:46 PM

Post #15 of 15 (1428 views)
Permalink
Re: [Problem]It is judged that a stopping resource is starting. [In reply to]

Sure :)

On Fri, Feb 24, 2012 at 11:44 AM, <renayama19661014 [at] ybb> wrote:
> Hi Andrew,
>
> I overlooked it.
> We would like a similar correction applied to Pacemaker 1.0 (e.g., like the patch I contributed).
>
> Best Regards,
> Hideo Yamauchi.
>
>
> --- On Fri, 2012/2/24, renayama19661014 [at] ybb <renayama19661014 [at] ybb> wrote:
>
>> Hi Andrew,
>>
>> Thank you for comment.
>>
>> > >>
>> > >> 1) The system administrator must respond when an rc=6 (fatal) error is logged.
>> > >> 2) And this needs to be reflected in the documentation.
>> >
>> > No to both.
>>
>> All right.
>>
>> > >> And does this mean that the following log message should not be output until a system administrator intervenes?
>> > >>
>> > >> Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
>> > >
>> > > Right.  There was actually a third part... a slightly more restrictive
>> > > version of your original patch:
>> >
>> > https://github.com/beekhof/pacemaker/commit/543ee8e
>>
>> I confirmed it.
>>
>> Many Thanks!!
>> Hideo Yamauchi.
>>
>> > >> --- On Tue, 2012/2/21, Andrew Beekhof <andrew [at] beekhof> wrote:
>> > >>
>> > >>> On Fri, Feb 17, 2012 at 10:49 AM,  <renayama19661014 [at] ybb> wrote:
>> > >>> > Hi Andrew,
>> > >>> >
>> > >>> > Thank you for comment.
>> > >>> >
>> > >>> >> I'm getting to this soon, really :-)
>> > >>> >> First it was corosync 2.0 stuff, so that /something/ in fedora-17
>> > >>> >> works, then fixing everything I broke when adding corosync 2.0
>> > >>> >> support.
>> > >>> >
>> > >>> > All right!
>> > >>> >
>> > >>> > I wait for your answer.
>> > >>>
>> > >>> I somehow missed that the failure was "not configured"
>> > >>>
>> > >>> Failed actions:
>> > >>>     prmVIP_monitor_0 (node=rh57-1, call=2, rc=6, status=complete): not
>> > >>> configured
>> > >>>
>> > >>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
>> > >>> lists rc=6 as fatal, but I believe we changed that behaviour (the
>> > >>> stopping aspect) in the PE as there was also insufficient information
>> > >>> for the agent to stop the service.
>> > >>> Which results in the node being fenced, the resource being probed,
>> > >>> which fails along with the subsequent stop, then the node is fenced
>> > >>> again, etc.
>> > >>>
>> > >>> So two things:
>> > >>>
>> > >>> this log message should include the human version of rc=6
>> > >>> Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard
>> > >>> error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from
>> > >>> re-starting anywhere in the cluster
>> > >>>
>> > >>> and the docs need to be updated.
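[Editor's note] Andrew's first point — that the pengine log should spell out what rc=6 means — can be illustrated with a small lookup. This is an illustrative sketch only, not Pacemaker's actual helper (`ocf_rc_name` is a hypothetical name); the strings follow the OCF resource agent exit-code definitions.

```c
/* Illustrative sketch only: a hypothetical helper mapping OCF agent
 * exit codes to the human-readable names used in the Pacemaker docs.
 * Pacemaker's own sources contain an equivalent lookup. */
static const char *ocf_rc_name(int rc)
{
    switch (rc) {
    case 0:  return "ok";
    case 1:  return "unknown error";
    case 2:  return "invalid parameter";
    case 3:  return "unimplemented feature";
    case 4:  return "insufficient privileges";
    case 5:  return "not installed";
    case 6:  return "not configured";
    case 7:  return "not running";
    case 8:  return "master";
    case 9:  return "failed master";
    default: return "unknown exit status";
    }
}
```

With such a helper, the message quoted above could read "... failed with rc=6 (not configured)" instead of the bare number.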
>> > >>>
>> > >>> >
>> > >>> > Best Regards,
>> > >>> > Hideo Yamauchi.
>> > >>> >
>> > >>> > --- On Thu, 2012/2/16, Andrew Beekhof <andrew [at] beekhof> wrote:
>> > >>> >
>> > >>> >> Sorry!
>> > >>> >>
>> > >>> >> I'm getting to this soon, really :-)
>> > >>> >> First it was corosync 2.0 stuff, so that /something/ in fedora-17
>> > >>> >> works, then fixing everything I broke when adding corosync 2.0
>> > >>> >> support.
>> > >>> >>
>> > >>> >> On Tue, Feb 14, 2012 at 11:20 AM,  <renayama19661014 [at] ybb> wrote:
>> > >>> >> > Hi Andrew,
>> > >>> >> >
>> > >>> >> > How did this problem turn out in the end?
>> > >>> >> >
>> > >>> >> > Best Regards,
>> > >>> >> > Hideo Yamauchi.
>> > >>> >> >
>> > >>> >> >
>> > >>> >> > --- On Mon, 2012/1/16, renayama19661014 [at] ybb <renayama19661014 [at] ybb> wrote:
>> > >>> >> >
>> > >>> >> >> Hi Andrew,
>> > >>> >> >>
>> > >>> >> >> Thank you for comments.
>> > >>> >> >>
>> > >>> >> >> > Could you send me the PE file related to this log please?
>> > >>> >> >> >
>> > >>> >> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
>> > >>> >> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
>> > >>> >> >> > /var/lib/pengine/pe-input-4.bz2
>> > >>> >> >>
>> > >>> >> >> The old file is gone.
>> > >>> >> >> I am sending the log and the PE file reproduced with the same procedure.
>> > >>> >> >>
>> > >>> >> >>  * trac1818.zip
>> > >>> >> >>   * https://skydrive.live.com/?cid=3a14d57622c66876&id=3A14D57622C66876%21127
>> > >>> >> >>
>> > >>> >> >> Best Regards,
>> > >>> >> >> Hideo Yamauchi.
>> > >>> >> >>
>> > >>> >> >>
>> > >>> >> >> --- On Mon, 2012/1/16, Andrew Beekhof <andrew [at] beekhof> wrote:
>> > >>> >> >>
>> > >>> >> >> > On Fri, Jan 6, 2012 at 12:37 PM,  <renayama19661014 [at] ybb> wrote:
>> > >>> >> >> > > Hi Andrew,
>> > >>> >> >> > >
>> > >>> >> >> > > Thank you for comment.
>> > >>> >> >> > >
>> > >>> >> >> > >> But it should have a subsequent stop action which would set it back to
>> > >>> >> >> > >> being inactive.
>> > >>> >> >> > >> Did that not happen in this case?
>> > >>> >> >> > >
>> > >>> >> >> > > Yes.
>> > >>> >> >> >
>> > >>> >> >> > Could you send me the PE file related to this log please?
>> > >>> >> >> >
>> > >>> >> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
>> > >>> >> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
>> > >>> >> >> > /var/lib/pengine/pe-input-4.bz2
>> > >>> >> >> >
>> > >>> >> >> >
>> > >>> >> >> >
>> > >>> >> >> > > Only the "verify_stopped" log was recorded.
>> > >>> >> >> > > No stop was carried out for the resource that failed its probe.
>> > >>> >> >> > >
>> > >>> >> >> > > -----------------------------
>> > >>> >> >> > > ######### yamauchi PREV STOP ##########
>> > >>> >> >> > > Jan  6 19:21:56 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ifcheckd process group 3462 with signal 15
>> > >>> >> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> > >>> >> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: do_node_walk: Requesting the list of configured nodes
>> > >>> >> >> > > Jan  6 19:21:58 rh57-1 ifcheckd: [3462]: info: main: Exiting ifcheckd
>> > >>> >> >> > > Jan  6 19:21:58 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/crmd process group 3461 with signal 15
>> > >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> > >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_shutdown: Requesting shutdown
>> > >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
>> > >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
>> > >>> >> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_shutdown_req: Sending shutdown request to DC: rh57-1
>> > >>> >> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: handle_shutdown_request: Creating shutdown request for rh57-1 (state=S_POLICY_ENGINE)
>> > >>> >> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1325845319)
>> > >>> >> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_perform_update: Sent update 14: shutdown=1325845319
>> > >>> >> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-1fdd5e2a-44b6-44b9-9993-97fa120072a4-shutdown, name=shutdown, value=1325845319, magic=NA, cib=0.101.16) : Transient attribute: update
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: crm_timer_popped: New Transition Timer (I_PE_CALC) just popped!
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke_callback: Invoking the PE: query=44, ref=pe_calc-dc-1325845321-26, seq=1, quorate=0
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: unpack_config: On loss of CCM Quorum: Ignore
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: determine_online_status: Node rh57-1 is shutting down
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpUltraMonkey
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmVIP       (ocf::heartbeat:LVM):   Stopped
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith1
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-2        (stonith:external/ssh): Stopped
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-3        (stonith:meatware):     Stopped
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith2
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-2        (stonith:external/ssh): Started rh57-1
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-3        (stonith:meatware):     Started rh57-1
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: clone_print:  Clone Set: clnPingd
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: short_print:      Started: [ rh57-1 ]
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: clnPingd: Rolling back scores from prmVIP
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmPingd:0 cannot run anywhere
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmVIP cannot run anywhere
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith1-2: Rolling back scores from prmStonith1-3
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-2 cannot run anywhere
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-3 cannot run anywhere
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith2-2: Rolling back scores from prmStonith2-3
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-2 cannot run anywhere
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-3 cannot run anywhere
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: stage6: Scheduling Node rh57-1 for shutdown
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmVIP     (Stopped)
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-2      (Stopped)
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-3      (Stopped)
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-2      (rh57-1)
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-3      (rh57-1)
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmPingd:0 (rh57-1)
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: process_pe_message: Transition 4: PEngine Input stored in: /var/lib/pengine/pe-input-4.bz2
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: unpack_graph: Unpacked transition 4: 9 actions in 9 synapses
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
>> > >>> >> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 21: stop prmPingd:0_stop_0 on rh57-1 (local)
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[10] on prmPingd:0 for client 3461, its parameters: CRM_meta_interval=[10000] multiplier=[100] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] name=[default_ping_set] CRM_meta_clone_max=[1] crm_feature_set=[3.0.1] host_list=[192.168.40.1] CRM_meta_globally_unique=[false] CRM_meta_name=[monitor] CRM_meta_clone=[0] CRM_meta_clone_node_max=[1] CRM_meta_notify=[false]  cancelled
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=21:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmPingd:0_stop_0 )
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 pingd: [3529]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmPingd:0 stop[14] (pid 3612)
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[14] on prmPingd:0 for client 3461: pid 3612 exited with return code 0
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=10, status=1, cib-update=0, confirmed=true) Cancelled
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=14, rc=0, cib-update=45, confirmed=true) ok
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmPingd:0_stop_0 (21) confirmed on rh57-1 (rc=0)
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 4 fired and confirmed
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 16: stop prmStonith2-3_stop_0 on rh57-1 (local)
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[13] on prmStonith2-3 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[600s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor]  cancelled
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=16:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-3_stop_0 )
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-3 stop[15] (pid 3617)
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3617]: info: Try to stop STONITH resource <rsc_id=prmStonith2-3> : Device=meatware
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_monitor_3600000 (call=13, status=1, cib-update=0, confirmed=true) Cancelled
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[15] on prmStonith2-3 for client 3461: pid 3617 exited with return code 0
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_stop_0 (call=15, rc=0, cib-update=46, confirmed=true) ok
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-3_stop_0 (16) confirmed on rh57-1 (rc=0)
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 15: stop prmStonith2-2_stop_0 on rh57-1 (local)
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[11] on prmStonith2-2 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[60s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor]  cancelled
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=15:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-2_stop_0 )
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-2 stop[16] (pid 3619)
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3619]: info: Try to stop STONITH resource <rsc_id=prmStonith2-2> : Device=external/ssh
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_monitor_3600000 (call=11, status=1, cib-update=0, confirmed=true) Cancelled
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[16] on prmStonith2-2 for client 3461: pid 3619 exited with return code 0
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_stop_0 (call=16, rc=0, cib-update=47, confirmed=true) ok
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-2_stop_0 (15) confirmed on rh57-1 (rc=0)
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: Executing crm-event (28): do_shutdown on rh57-1
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: crm-event (28) is a local shutdown
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: run_graph: ====================================================
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: notice: run_graph: Transition 4 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4.bz2): Complete
>> > >>> >> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_graph_trigger: Transition 4 is now complete
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_dc_release: DC role released
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 pengine: [3464]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Transitioner is now inactive
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Disconnecting STONITH...
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: notice: Not currently connected.
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 420 ms (> 100 ms) before being called (GSource: 0x179d9b0)
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: G_SIG_dispatch: started at 429442052 should have started at 429442010
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmdManagedChildDied: Process pengine:[3464] exited (signal=0, exitcode=0)
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 80 ms (> 30 ms) (GSource: 0x179d9b0)
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_msg_dispatch: Received HUP from pengine:[3464]
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_connection_destroy: Connection to the Policy Engine released
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: ERROR: verify_stopped: Resource prmVIP was active at shutdown.  You may ignore this error if it is unmanaged.
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_lrm_control: Disconnected from the LRM
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_ha_control: Disconnected from Heartbeat
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 ccm: [3456]: info: client (pid=3461) removed from ccm
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_cib_control: Disconnecting CIB
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: info: cib_process_readwrite: We are now in R/O mode
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
>> > >>> >> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: WARN: send_ipc_message: IPC Channel to 3461 is not connected
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: send_via_callback_channel: Delivery of reply to client 3461/5f69edda-aec9-42c7-ae52-045a05d1c5db failed
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: do_exit: [crmd] stopped (0)
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/attrd process group 3460 with signal 15
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x7b28140)
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_shutdown: Exiting
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: main: Exiting...
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/stonithd process group 3459 with signal 15
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 stonithd: [3459]: notice: /usr/lib64/heartbeat/stonithd normally quit.
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/lrmd -r process group 3458 with signal 15
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 lrmd: [3458]: info: lrmd is shutting down
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/cib process group 3457 with signal 15
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_shutdown: Disconnected 0 clients
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_process_disconnect: All clients disconnected...
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: Exiting...
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: main: Done
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: client (pid=3457) removed from ccm
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ccm process group 3456 with signal 15
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource: 0x7b28140)
>> > >>> >> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: received SIGTERM, going to shut down
>> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBFIFO process 3446 with signal 15
>> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3447 with signal 15
>> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3448 with signal 15
>> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3449 with signal 15
>> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3450 with signal 15
>> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 exited. 5 remaining
>> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 exited. 4 remaining
>> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 exited. 3 remaining
>> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 exited. 2 remaining
>> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 exited. 1 remaining
>> > >>> >> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat shutdown complete.
>> > >>> >> >> > >
>> > >>> >> >> > > -----------------------------
>> > >>> >> >> > >
>> > >>> >> >> > >
>> > >>> >> >> > >
>> > >>> >> >> > > Best Regards,
>> > >>> >> >> > > Hideo Yamauchi.
>> > >>> >> >> > >
>> > >>> >> >> > >
>> > >>> >> >> > >
>> > >>> >> >> > >
>> > >>> >> >> > >
>> > >>> >> >> > > --- On Fri, 2012/1/6, Andrew Beekhof <andrew [at] beekhof> wrote:
>> > >>> >> >> > >
>> > >>> >> >> > >> On Tue, Dec 27, 2011 at 6:15 PM,  <renayama19661014 [at] ybb> wrote:
>> > >>> >> >> > >> > Hi All,
>> > >>> >> >> > >> >
>> > >>> >> >> > >> > When Pacemaker stops when there is the resource that failed in probe processing, crmd outputs the following error message.
>> > >>> >> >> > >> >
>> > >>> >> >> > >> >
>> > >>> >> >> > >> >  Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
>> > >>> >> >> > >> >
>> > >>> >> >> > >> >
>> > >>> >> >> > >> > Because the resource that failed in probe processing does not start,
>> > >>> >> >> > >>
>> > >>> >> >> > >> But it should have a subsequent stop action which would set it back to
>> > >>> >> >> > >> being inactive.
>> > >>> >> >> > >> Did that not happen in this case?
>> > >>> >> >> > >>
>> > >>> >> >> > >> > this error message is not right.
>> > >>> >> >> > >> >
>> > >>> >> >> > >> > I think the following correction may be appropriate, but we are not certain.
>> > >>> >> >> > >> >
>> > >>> >> >> > >> >
>> > >>> >> >> > >> >  * crmd/lrm.c
>> > >>> >> >> > >> >  (snip)
>> > >>> >> >> > >> >                } else if(op->rc == EXECRA_NOT_RUNNING) {
>> > >>> >> >> > >> >                        active = FALSE;
>> > >>> >> >> > >> > +                } else if(op->rc != EXECRA_OK && op->interval == 0
>> > >>> >> >> > >> > +                                && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
>> > >>> >> >> > >> > +                        active = FALSE;
>> > >>> >> >> > >> >                } else {
>> > >>> >> >> > >> >                        active = TRUE;
>> > >>> >> >> > >> >                }
>> > >>> >> >> > >> >  (snip)
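[Editor's note] The proposed check can be read in isolation as follows. This is a standalone sketch, not the actual crmd code: the EXECRA_* values are assumed to match the usual OCF exit codes (0 = ok, 7 = not running), and CRMD_ACTION_STATUS is assumed to be the string "monitor"; the earlier (snipped) branches of the original chain are not modeled.

```c
#include <string.h>

/* Assumed values for illustration (see lead-in). */
#define EXECRA_OK           0
#define EXECRA_NOT_RUNNING  7
#define CRMD_ACTION_STATUS  "monitor"

/* Returns 1 if the last recorded operation implies the resource is
 * active, 0 otherwise — a standalone rendering of the patched logic. */
static int resource_active(int rc, int interval, const char *op_type)
{
    if (rc == EXECRA_NOT_RUNNING) {
        return 0;                       /* agent reported: not running */
    } else if (rc != EXECRA_OK && interval == 0
               && strcmp(op_type, CRMD_ACTION_STATUS) == 0) {
        return 0;                       /* failed probe: never started */
    }
    return 1;                           /* otherwise assume active */
}
```

Under this logic a failed one-shot monitor (a probe, interval == 0) no longer counts as active, so verify_stopped() would have nothing to complain about at shutdown.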
>> > >>> >> >> > >> >
>> > >>> >> >> > >> >
>> > >>> >> >> > >> > In the Pacemaker development source, this handling seems to have changed considerably.
>> > >>> >> >> > >> > We request that this change be backported to the Pacemaker 1.0 series, if possible.
>> > >>> >> >> > >> >
>> > >>> >> >> > >> > Best Regards,
>> > >>> >> >> > >> > Hideo Yamauchi.
>> > >>> >> >> > >> >
>> > >>> >> >> >
>> > >>> >> >>
>> > >>> >> >
>> > >>> >>
>> > >>> >
>> > >>>
>> > >>
>> >
>>
>>
>

