krienke at uni-koblenz
Apr 4, 2012, 2:28 AM
Am 03.04.2012 17:06, schrieb Lars Marowsky-Bree:
Re: Cluster node hanging upon access to ocfs2 fs when second cluster node dies ?
> On 2012-04-03T15:59:00, Rainer Krienke <krienke [at] uni-koblenz> wrote:
>> Hi Lars,
>> this was something I detected already. And I changed the timeout in the
>> cluster configuration to 200sec. So the log I posted was the result of
>> the configuration below (200sec). Is this still to small?
>> $ crm configure show
>> primitive stonith_sbd stonith:external/sbd \
>> op monitor interval="200" timeout="200" start-delay="200" \
>> params sbd_device="/dev/disk/by-id/scsi-259316a7265713551-part1"
> This is not what I meant. I meant to change the setting stonith-timeout,
> not the settings on the primitive ;-) In fact, monitoring sbd is quite
> unnecessary, and you actually don't need to specify sbd_device anymore,
> you can just do:
> primitive stonith_sbd stonith:external/sbd
> and leave it at this. But, back to your timeout! Run this:
> crm configure property stonith-timeout=240s
> (And yes, it needs to be over 10% higher than the msgwait timeout,
> because of how stonith-ng internally allocates the stonith-timeout value
> to various stages in the stonith process. Sorry about that, that's a
> pacemaker issue.)
> You will still see IO freeze for approx. 3 minutes until the fence
> completes. That's a side-effect of the sbd values you have configured,
> in particular watchdog and msgwait.
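The quoted advice boils down to a simple rule of thumb: stonith-timeout must exceed sbd's msgwait by more than 10%. A minimal shell sketch, assuming msgwait is 200s (that value is an assumption here; read the real one from the sbd header with `sbd -d <your-sbd-device> dump`):

```shell
# Sketch: deriving stonith-timeout from sbd's msgwait, per the rule above.
# msgwait=200 is an assumed value; check yours with:
#   sbd -d /dev/disk/by-id/<your-sbd-device> dump
msgwait=200

# stonith-timeout must be more than 10% above msgwait; take 20% headroom
stonith_timeout=$(( msgwait * 12 / 10 ))

echo "crm configure property stonith-timeout=${stonith_timeout}s"
```

With msgwait=200 this prints the 240s value suggested above; for other msgwait settings, scale accordingly.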
thanks a lot for finding the problem. The wrongly set timeout value was
really what caused the trouble. Now it works. I lowered the timeout
values to avoid freezing the clustered filesystem for too long.
There is one basic thing, however, that I do not understand: my setup
involves only a clustered filesystem. Why is a stonith resource needed
at all in this case, given that it causes freezes of the cluster
filesystem depending on the timeout values?
Basically, with a cluster fs it should not matter if a node dies. It's
the nature of a cluster fs that many nodes can access it. If one dies,
this should be of no consequence to the other nodes that can still
access the filesystem.
So my question comes down to this: why do I have to fence a node (in
case it fails) in a cluster that runs nothing but a cluster
filesystem? What could go wrong without fencing in this case?
Thanks a lot
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, http://userpages.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287
Linux-HA mailing list
Linux-HA [at] lists
See also: http://linux-ha.org/ReportingProblems