
Mailing List Archive: Linux-HA: Pacemaker

pacemaker/dlm problems

 

 



bubble at hoster-ok

Sep 6, 2011, 12:27 AM

Post #1 of 43
pacemaker/dlm problems

Hi Andrew, hi all,

I'm further investigating dlm lockspace hangs I described in
https://www.redhat.com/archives/cluster-devel/2011-August/msg00133.html
and in the thread starting from
https://lists.linux-foundation.org/pipermail/openais/2011-September/016701.html
.

What I described there is a setup involving pacemaker-1.1.6 with
corosync-1.4.1 and dlm_controld.pcmk from cluster-3.0.17 (without cman).
I use the openais stack for pacemaker.

I found that it is possible to reproduce the dlm kern_stop state across a
whole cluster with iptables on just one node: it is sufficient to block
all (or just corosync-specific) incoming/outgoing UDP for several
seconds (the exact time probably depends on corosync settings). In my case
I reproduced the hang with a 3-second traffic block:
iptables -I INPUT 1 -p udp -j REJECT; \
iptables -I OUTPUT 1 -p udp -j REJECT; \
sleep 3; \
iptables -D INPUT 1; \
iptables -D OUTPUT 1

I tried to make dlm_controld schedule fencing on the CPG_REASON_NODEDOWN
event (just to see whether it helps with the problems I described in the
posts referenced above), but without much success; the following code does not work:

int fd = pcmk_cluster_fd;
int rc = crm_terminate_member_no_mainloop(nodeid, NULL, &fd);

I get a "Could not kick node XXX from the cluster" message accompanied
by "No connection to the cluster". That means that
attrd_update_no_mainloop() fails.
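
For reference, a minimal sketch of that failing call site with the error
handling spelled out (log_error() is just a placeholder, not the actual
dlm_controld logging helper, and the return-code convention is an assumption):

int fd = pcmk_cluster_fd;
int rc = crm_terminate_member_no_mainloop(nodeid, NULL, &fd);

if (rc <= 0) {
    /* Assumption: a non-positive return code means the kick failed,
     * which is the "No connection to the cluster" case seen here. */
    log_error("Could not kick node %d from the cluster (rc=%d)", nodeid, rc);
}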

Andrew, could you please give some pointers as to why it may fail? I'd then
try to fix dlm_controld. I do not see any uses of that function
other than in dlm_controld.pcmk.

I agree with Jiaju
(https://lists.linux-foundation.org/pipermail/openais/2011-September/016713.html)
that this could be solely a pacemaker problem, because pacemaker probably
should originate the fencing itself in such a situation.

So, using pacemaker/dlm with the openais stack is currently risky due to
possible hangs of dlm lockspaces. Originally I hit this due to heavy load
on one of the cluster nodes (actually on the host which runs that cluster
node as a virtual guest).

Ok, I switched to cman to see if it helps. Fencing is configured in
pacemaker, not in cluster.conf.

Things became even worse ;( .

Although it took 25 seconds instead of 3 to break the cluster (I
understand it is almost impossible to load a host that much, but
anyway), I then got a real nightmare: two nodes of the 3-node cluster had
cman stopped (and pacemaker too, because of the cman connection loss) -
they each asked kick_node_from_cluster() for the other, and that succeeded.
But fencing didn't happen (I still need to look into why, but this is
cman-specific).
The remaining node had pacemaker hung: it didn't even notice the cluster
infrastructure change, the down nodes were listed as online, one of them
was the DC, and all resources were marked as started on all nodes (the
down ones too). No log entries from pacemaker at all.

So, from my PoV cman+pacemaker is not currently suitable for HA tasks either.

That means that both possible alternatives are currently unusable if one
needs a self-repairing pacemaker cluster with dlm support ;( That is
really regrettable.

I can provide all the needed information and really hope that it is
possible to fix both issues:
* dlm blockage with openais and
* pacemaker lock with cman and no fencing from within dlm_controld

I think both issues are really high priority, because it is definitely
not acceptable for problems with load on one cluster node (or with the
link to that node) to lead to a total cluster lock or even crash.

I also offer any possible assistance from my side (e.g. patch trials
etc.) to get all of that fixed. I can run either openais or cman and can
quickly switch between those stacks.

Sorry for not being brief,

Best regards,
Vladislav


_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


andrew at beekhof

Sep 26, 2011, 12:10 AM

Post #2 of 43
Re: pacemaker/dlm problems

On Tue, Sep 6, 2011 at 5:27 PM, Vladislav Bogdanov <bubble [at] hoster-ok> wrote:
> Hi Andrew, hi all,
>
> I'm further investigating dlm lockspace hangs I described in
> https://www.redhat.com/archives/cluster-devel/2011-August/msg00133.html
> and in the thread starting from
> https://lists.linux-foundation.org/pipermail/openais/2011-September/016701.html
> .
>
> What I described there is setup which involves pacemaker-1.1.6 with
> corosync-1.4.1 and dlm_controld.pcmk from cluster-3.0.17 (without cman).
> I use openais stack for pacemaker.
>
> I found that it is possible to reproduce dlm kern_stop state across a
> whole cluster with iptables on just one node, it is sufficient to block
> all (or just corosync-specific) incoming/outgoing UDP for several
> seconds (that time probably depends on corosync settings). I my case I
> reproduced hang with 3-seconds traffic block:
> iptables -I INPUT 1 -p udp -j REJECT; \
> iptables -I OUTPUT 1 -p udp -j REJECT; \
> sleep 3; \
> iptables -D INPUT 1; \
> iptables -D OUTPUT 1
>
> I tried to make dlm_controld schedule fencing on CPG_REASON_NODEDOWN
> event (just to look if it helps with problems I described in posts
> referenced above), but without much success, following code does not work:
>
>    int fd = pcmk_cluster_fd;
>    int rc = crm_terminate_member_no_mainloop(nodeid, NULL, &fd);
>
> I get "Could not kick node XXX from the cluster" message accompanied
> with "No connection to the cluster". That means that
> attrd_update_no_mainloop() fails.
>
> Andrew, could you please give some pointers why may it fail? I'd then
> try to fix dlm_controld. I do not see any other uses of that function
> except than in dlm_controld.pcmk.

I can't think of anything except that attrd might not be running. Is it?

Regardless, for 1.1.6 the dlm would be better off making a call like:

rc = st->cmds->fence(st, st_opts, target, "reboot", 120);

from fencing/admin.c

That would talk directly to the fencing daemon, bypassing attrd, crmd
and PE - and thus be more reliable.

This is what the cman plugin will be doing soon too.
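
For illustration, a minimal self-contained sketch of such a direct stonith
call, loosely modeled on fencing/admin.c (the header name, the
connect()/disconnect() calls and the st_opt_sync_call option are assumptions
about the 1.1.x stonith-ng client API, so treat this as a sketch rather
than working code):

#include <crm/stonith-ng.h>  /* assumed header for the stonith-ng client API */

/* Sketch: ask stonithd directly to reboot "target", bypassing attrd,
 * crmd and the PE entirely. */
static int fence_node_directly(const char *target)
{
    int rc = -1;
    stonith_t *st = stonith_api_new();

    if (st == NULL) {
        return rc;
    }

    /* Assumption: connect() returns 0 on success, as in fencing/admin.c */
    if (st->cmds->connect(st, "dlm_controld", NULL) == 0) {
        /* The call quoted above: a "reboot" with a 120s timeout */
        rc = st->cmds->fence(st, st_opt_sync_call, target, "reboot", 120);
        st->cmds->disconnect(st);
    }

    stonith_api_delete(st);
    return rc;
}

Nothing in that path depends on attrd or on a DC being elected, which is
the point of the suggestion above.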

>
> I agree with Jiaju
> (https://lists.linux-foundation.org/pipermail/openais/2011-September/016713.html),
> that could be solely pacemaker problem, because it probably should
> originate fencing itself is such situation I think.
>
> So, using pacemaker/dlm with openais stack is currently risky due to
> possible hangs of dlm_lockspaces.

It shouldn't be; failing to connect to attrd is very unusual.

> Originally I got it due to heavy load
> on one cluster nodes (actually on a host which has that cluster node
> running as virtual guest).
>
> Ok, I switched to cman to see if it helps. Fencing is configured in
> pacemaker, not in cluster.conf.
>
> Things became even worse ;( .
>
> Although it took 25 seconds instead of 3 to break the cluster (I
> understand, this is almost impossible to load host so much, but
> anyways), then I got a real nightmare: two nodes of 3-node cluster had
> cman stopped (and pacemaker too because of cman connection loss) - they
> asked to kick_node_from_cluster() for each other, and that succeeded.
> But fencing didn't happen (I still need to look why, but this is cman
> specific).
> Remaining node had pacemaker hanged, it doesn't even
> notice cluster infrastructure change, down nodes were listed as a
> online, one of them was a DC, all resources are marked as started on all
> (down too) nodes. No log entries from pacemaker at all.

Well, I can't see any logs from anyone, so it's hard for me to comment.

> So, from my PoV cman+pacemaker is not currently suitable for HA tasks too.
>
> That means that both possible alternatives are currently unusable if one
> needs self-repairing pacemaker cluster with dlm support ;( That is
> really regrettable.
>
> I can provide all needed information and really hope that it is possible
> to fix both issues:
> * dlm blockage with openais and
> * pacemaker lock with cman and no fencing from within dlm_controld
>
> I think both issues are really high priority, because it is definitely
> not acceptable when problems with load on one cluster node (or with link
> to that node) lead to a total cluster lock or even crash.
>
> I also offer any possible assistance from my side (f.e. patch trials
> etc.) to get that all fixed. I can run either openais or cman and can
> quickly switch between that stacks.
>
> Sorry for not being brief,
>
> Best regards,
> Vladislav
>
>


bubble at hoster-ok

Sep 26, 2011, 12:38 AM

Post #3 of 43
Re: pacemaker/dlm problems

Hi Andrew,

26.09.2011 10:10, Andrew Beekhof wrote:
> On Tue, Sep 6, 2011 at 5:27 PM, Vladislav Bogdanov <bubble [at] hoster-ok> wrote:
>> Hi Andrew, hi all,
>>
>> I'm further investigating dlm lockspace hangs I described in
>> https://www.redhat.com/archives/cluster-devel/2011-August/msg00133.html
>> and in the thread starting from
>> https://lists.linux-foundation.org/pipermail/openais/2011-September/016701.html
>> .
>>
>> What I described there is setup which involves pacemaker-1.1.6 with
>> corosync-1.4.1 and dlm_controld.pcmk from cluster-3.0.17 (without cman).
>> I use openais stack for pacemaker.
>>
>> I found that it is possible to reproduce dlm kern_stop state across a
>> whole cluster with iptables on just one node, it is sufficient to block
>> all (or just corosync-specific) incoming/outgoing UDP for several
>> seconds (that time probably depends on corosync settings). I my case I
>> reproduced hang with 3-seconds traffic block:
>> iptables -I INPUT 1 -p udp -j REJECT; \
>> iptables -I OUTPUT 1 -p udp -j REJECT; \
>> sleep 3; \
>> iptables -D INPUT 1; \
>> iptables -D OUTPUT 1
>>
>> I tried to make dlm_controld schedule fencing on CPG_REASON_NODEDOWN
>> event (just to look if it helps with problems I described in posts
>> referenced above), but without much success, following code does not work:
>>
>> int fd = pcmk_cluster_fd;
>> int rc = crm_terminate_member_no_mainloop(nodeid, NULL, &fd);
>>
>> I get "Could not kick node XXX from the cluster" message accompanied
>> with "No connection to the cluster". That means that
>> attrd_update_no_mainloop() fails.
>>
>> Andrew, could you please give some pointers why may it fail? I'd then
>> try to fix dlm_controld. I do not see any other uses of that function
>> except than in dlm_controld.pcmk.
>
> I can't think of anything except that attrd might not be running. Is it?

Will recheck.

>
> Regardless, for 1.1.6 the dlm would be better off making a call like:
>
> rc = st->cmds->fence(st, st_opts, target, "reboot", 120);
>
> from fencing/admin.c
>
> That would talk directly to the fencing daemon, bypassing attrd, crnd
> and PE - and thus be more reliable.
>
> This is what the cman plugin will be doing soon too.

Great to know, I'll try that in the near future. Thank you very much for
the pointer.

>
>>
>> I agree with Jiaju
>> (https://lists.linux-foundation.org/pipermail/openais/2011-September/016713.html),
>> that could be solely pacemaker problem, because it probably should
>> originate fencing itself is such situation I think.
>>
>> So, using pacemaker/dlm with openais stack is currently risky due to
>> possible hangs of dlm_lockspaces.
>
> It shouldn't be, failing to connect to attrd is very unusual.

By the way, one of the underlying problems, which actually made me notice
all this, is that a pacemaker cluster does not fence its DC if it leaves
the cluster for a very short time. That is what Jiaju said in his notes,
and I can confirm it.

>
>> Originally I got it due to heavy load
>> on one cluster nodes (actually on a host which has that cluster node
>> running as virtual guest).
>>
>> Ok, I switched to cman to see if it helps. Fencing is configured in
>> pacemaker, not in cluster.conf.
>>
>> Things became even worse ;( .
>>
>> Although it took 25 seconds instead of 3 to break the cluster (I
>> understand, this is almost impossible to load host so much, but
>> anyways), then I got a real nightmare: two nodes of 3-node cluster had
>> cman stopped (and pacemaker too because of cman connection loss) - they
>> asked to kick_node_from_cluster() for each other, and that succeeded.
>> But fencing didn't happen (I still need to look why, but this is cman
>> specific).
>> Remaining node had pacemaker hanged, it doesn't even
>> notice cluster infrastructure change, down nodes were listed as a
>> online, one of them was a DC, all resources are marked as started on all
>> (down too) nodes. No log entries from pacemaker at all.
>
> Well I can't see any logs from anyone to its hard for me to comment.

Logs are sent privately.

>
>> So, from my PoV cman+pacemaker is not currently suitable for HA tasks too.
>>
>> That means that both possible alternatives are currently unusable if one
>> needs self-repairing pacemaker cluster with dlm support ;( That is
>> really regrettable.
>>
>> I can provide all needed information and really hope that it is possible
>> to fix both issues:
>> * dlm blockage with openais and
>> * pacemaker lock with cman and no fencing from within dlm_controld
>>
>> I think both issues are really high priority, because it is definitely
>> not acceptable when problems with load on one cluster node (or with link
>> to that node) lead to a total cluster lock or even crash.
>>
>> I also offer any possible assistance from my side (f.e. patch trials
>> etc.) to get that all fixed. I can run either openais or cman and can
>> quickly switch between that stacks.
>>
>> Sorry for not being brief,
>>
>> Best regards,
>> Vladislav
>>
>>


andrew at beekhof

Sep 26, 2011, 1:16 AM

Post #4 of 43
Re: pacemaker/dlm problems

On Mon, Sep 26, 2011 at 5:38 PM, Vladislav Bogdanov
<bubble [at] hoster-ok> wrote:
> Hi Andrew,
>
> 26.09.2011 10:10, Andrew Beekhof wrote:
>> On Tue, Sep 6, 2011 at 5:27 PM, Vladislav Bogdanov <bubble [at] hoster-ok> wrote:
>>> Hi Andrew, hi all,
>>>
>>> I'm further investigating dlm lockspace hangs I described in
>>> https://www.redhat.com/archives/cluster-devel/2011-August/msg00133.html
>>> and in the thread starting from
>>> https://lists.linux-foundation.org/pipermail/openais/2011-September/016701.html
>>> .
>>>
>>> What I described there is setup which involves pacemaker-1.1.6 with
>>> corosync-1.4.1 and dlm_controld.pcmk from cluster-3.0.17 (without cman).
>>> I use openais stack for pacemaker.
>>>
>>> I found that it is possible to reproduce dlm kern_stop state across a
>>> whole cluster with iptables on just one node, it is sufficient to block
>>> all (or just corosync-specific) incoming/outgoing UDP for several
>>> seconds (that time probably depends on corosync settings). I my case I
>>> reproduced hang with 3-seconds traffic block:
>>> iptables -I INPUT 1 -p udp -j REJECT; \
>>> iptables -I OUTPUT 1 -p udp -j REJECT; \
>>> sleep 3; \
>>> iptables -D INPUT 1; \
>>> iptables -D OUTPUT 1
>>>
>>> I tried to make dlm_controld schedule fencing on CPG_REASON_NODEDOWN
>>> event (just to look if it helps with problems I described in posts
>>> referenced above), but without much success, following code does not work:
>>>
>>>    int fd = pcmk_cluster_fd;
>>>    int rc = crm_terminate_member_no_mainloop(nodeid, NULL, &fd);
>>>
>>> I get "Could not kick node XXX from the cluster" message accompanied
>>> with "No connection to the cluster". That means that
>>> attrd_update_no_mainloop() fails.
>>>
>>> Andrew, could you please give some pointers why may it fail? I'd then
>>> try to fix dlm_controld. I do not see any other uses of that function
>>> except than in dlm_controld.pcmk.
>>
>> I can't think of anything except that attrd might not be running.  Is it?
>
> Will recheck.
>
>>
>> Regardless, for 1.1.6 the dlm would be better off making a call like:
>>
>>           rc = st->cmds->fence(st, st_opts, target, "reboot", 120);
>>
>> from fencing/admin.c
>>
>> That would talk directly to the fencing daemon, bypassing attrd, crnd
>> and PE - and thus be more reliable.
>>
>> This is what the cman plugin will be doing soon too.
>
> Great to know, I'll try that in near future. Thank you very much for
> pointer.

1.1.7 will actually make use of this API regardless of any *_controld
changes - I'm in the middle of updating the two library functions they
use (crm_terminate_member and crm_terminate_member_no_mainloop).

>
>>
>>>
>>> I agree with Jiaju
>>> (https://lists.linux-foundation.org/pipermail/openais/2011-September/016713.html),
>>> that could be solely pacemaker problem, because it probably should
>>> originate fencing itself is such situation I think.
>>>
>>> So, using pacemaker/dlm with openais stack is currently risky due to
>>> possible hangs of dlm_lockspaces.
>>
>> It shouldn't be, failing to connect to attrd is very unusual.
>
> By the way, one of underlying problems, which actually made me to notice
> all this, is that pacemaker cluster does not fence its DC if it leaves
> the cluster for a very short time. That is what Jiaju told in his notes.
> And I can confirm that.

That's highly surprising. Do the logs you sent display this behaviour?

>
>>
>>> Originally I got it due to heavy load
>>> on one cluster nodes (actually on a host which has that cluster node
>>> running as virtual guest).
>>>
>>> Ok, I switched to cman to see if it helps. Fencing is configured in
>>> pacemaker, not in cluster.conf.
>>>
>>> Things became even worse ;( .
>>>
>>> Although it took 25 seconds instead of 3 to break the cluster (I
>>> understand, this is almost impossible to load host so much, but
>>> anyways), then I got a real nightmare: two nodes of 3-node cluster had
>>> cman stopped (and pacemaker too because of cman connection loss) - they
>>> asked to kick_node_from_cluster() for each other, and that succeeded.
>>> But fencing didn't happen (I still need to look why, but this is cman
>>> specific).
>>> Remaining node had pacemaker hanged, it doesn't even
>>> notice cluster infrastructure change, down nodes were listed as a
>>> online, one of them was a DC, all resources are marked as started on all
>>> (down too) nodes. No log entries from pacemaker at all.
>>
>> Well I can't see any logs from anyone to its hard for me to comment.
>
> Logs are sent privately.
>
>>
>>> So, from my PoV cman+pacemaker is not currently suitable for HA tasks too.
>>>
>>> That means that both possible alternatives are currently unusable if one
>>> needs self-repairing pacemaker cluster with dlm support ;( That is
>>> really regrettable.
>>>
>>> I can provide all needed information and really hope that it is possible
>>> to fix both issues:
>>> * dlm blockage with openais and
>>> * pacemaker lock with cman and no fencing from within dlm_controld
>>>
>>> I think both issues are really high priority, because it is definitely
>>> not acceptable when problems with load on one cluster node (or with link
>>> to that node) lead to a total cluster lock or even crash.
>>>
>>> I also offer any possible assistance from my side (f.e. patch trials
>>> etc.) to get that all fixed. I can run either openais or cman and can
>>> quickly switch between that stacks.
>>>
>>> Sorry for not being brief,
>>>
>>> Best regards,
>>> Vladislav
>>>
>>>


bubble at hoster-ok

Sep 26, 2011, 1:41 AM

Post #5 of 43
Re: pacemaker/dlm problems

26.09.2011 11:16, Andrew Beekhof wrote:
[snip]
>>
>>>
>>> Regardless, for 1.1.6 the dlm would be better off making a call like:
>>>
>>> rc = st->cmds->fence(st, st_opts, target, "reboot", 120);
>>>
>>> from fencing/admin.c
>>>
>>> That would talk directly to the fencing daemon, bypassing attrd, crnd
>>> and PE - and thus be more reliable.
>>>
>>> This is what the cman plugin will be doing soon too.
>>
>> Great to know, I'll try that in near future. Thank you very much for
>> pointer.
>
> 1.1.7 will actually make use of this API regardless of any *_controld
> changes - i'm in the middle of updating the two library functions they
> use (crm_terminate_member and crm_terminate_member_no_mainloop).

Ah, I'll try your patch then and wait for that to be resolved.

>
>>
>>>
>>>>
>>>> I agree with Jiaju
>>>> (https://lists.linux-foundation.org/pipermail/openais/2011-September/016713.html),
>>>> that could be solely pacemaker problem, because it probably should
>>>> originate fencing itself is such situation I think.
>>>>
>>>> So, using pacemaker/dlm with openais stack is currently risky due to
>>>> possible hangs of dlm_lockspaces.
>>>
>>> It shouldn't be, failing to connect to attrd is very unusual.
>>
>> By the way, one of underlying problems, which actually made me to notice
>> all this, is that pacemaker cluster does not fence its DC if it leaves
>> the cluster for a very short time. That is what Jiaju told in his notes.
>> And I can confirm that.
>
> Thats highly surprising. Do the logs you sent display this behaviour?

They do. The rest of the cluster begins the election, but then accepts
the returned DC back (I write this from memory - I looked at the logs on
Sep 5-6 - so I may be mixing something up).

[snip]
>>>> Although it took 25 seconds instead of 3 to break the cluster (I
>>>> understand, this is almost impossible to load host so much, but
>>>> anyways), then I got a real nightmare: two nodes of 3-node cluster had
>>>> cman stopped (and pacemaker too because of cman connection loss) - they
>>>> asked to kick_node_from_cluster() for each other, and that succeeded.
>>>> But fencing didn't happen (I still need to look why, but this is cman
>>>> specific).

Btw, the underlying logic of this part is tricky for me to understand:
* cman just stops the cman processes on remote nodes, disregarding
quorum. I hope that could be fixed in corosync, if I understand one of
the latest threads there correctly.
* But cman does not fence those nodes, and they still run resources.
This could be extremely dangerous under some circumstances. And cman does
not fence even if it has fence devices configured in cluster.conf
(I verified that).

>>>> Remaining node had pacemaker hanged, it doesn't even
>>>> notice cluster infrastructure change, down nodes were listed as a
>>>> online, one of them was a DC, all resources are marked as started on all
>>>> (down too) nodes. No log entries from pacemaker at all.
>>>
>>> Well I can't see any logs from anyone to its hard for me to comment.
>>
>> Logs are sent privately.
>>
>>>

Vladislav




andrew at beekhof

Sep 26, 2011, 10:59 PM

Post #6 of 43
Re: pacemaker/dlm problems

On Mon, Sep 26, 2011 at 6:41 PM, Vladislav Bogdanov
<bubble [at] hoster-ok> wrote:
> 26.09.2011 11:16, Andrew Beekhof wrote:
> [snip]
>>>
>>>>
>>>> Regardless, for 1.1.6 the dlm would be better off making a call like:
>>>>
>>>>           rc = st->cmds->fence(st, st_opts, target, "reboot", 120);
>>>>
>>>> from fencing/admin.c
>>>>
>>>> That would talk directly to the fencing daemon, bypassing attrd, crnd
>>>> and PE - and thus be more reliable.
>>>>
>>>> This is what the cman plugin will be doing soon too.
>>>
>>> Great to know, I'll try that in near future. Thank you very much for
>>> pointer.
>>
>> 1.1.7 will actually make use of this API regardless of any *_controld
>> changes - i'm in the middle of updating the two library functions they
>> use (crm_terminate_member and crm_terminate_member_no_mainloop).
>
> Ah, I then try your patch and wait for that to be resolved.
>
>>
>>>
>>>>
>>>>>
>>>>> I agree with Jiaju
>>>>> (https://lists.linux-foundation.org/pipermail/openais/2011-September/016713.html),
>>>>> that could be solely pacemaker problem, because it probably should
>>>>> originate fencing itself is such situation I think.
>>>>>
>>>>> So, using pacemaker/dlm with openais stack is currently risky due to
>>>>> possible hangs of dlm_lockspaces.
>>>>
>>>> It shouldn't be, failing to connect to attrd is very unusual.
>>>
>>> By the way, one of underlying problems, which actually made me to notice
>>> all this, is that pacemaker cluster does not fence its DC if it leaves
>>> the cluster for a very short time. That is what Jiaju told in his notes.
>>> And I can confirm that.
>>
>> Thats highly surprising.  Do the logs you sent display this behaviour?
>
> They do. Rest of the cluster begins the election, but then accepts
> returned DC back (I write this from memory, I looked at logs Sep 5-6, so
> I may mix up something).

Actually, this might be possible - if DC.old came back before DC.new
had a chance to get elected, run the PE and initiate fencing, then
there would be no need to fence.

> [snip]
>>>>> Although it took 25 seconds instead of 3 to break the cluster (I
>>>>> understand, this is almost impossible to load host so much, but
>>>>> anyways), then I got a real nightmare: two nodes of 3-node cluster had
>>>>> cman stopped (and pacemaker too because of cman connection loss) - they
>>>>> asked to kick_node_from_cluster() for each other, and that succeeded.
>>>>> But fencing didn't happen (I still need to look why, but this is cman
>>>>> specific).
>
> Btw this part is tricky for me to understand the underlying logic:
> * cman just stops cman processes on remote nodes, disregarding the
> quorum. I hope that could be fixed in corosync If I understand one of
> latest threads there right.
> * But cman does not do fencing of that nodes, and they still run
> resources. And this could be extremely dangerous under some
> circumstances. And cman does not do fencing even if it has fence devices
> configure in cluster.conf (I verified that).
>
>>>>> Remaining node had pacemaker hanged, it doesn't even
>>>>> notice cluster infrastructure change, down nodes were listed as a
>>>>> online, one of them was a DC, all resources are marked as started on all
>>>>> (down too) nodes. No log entries from pacemaker at all.
>>>>
>>>> Well I can't see any logs from anyone to its hard for me to comment.
>>>
>>> Logs are sent privately.
>>>
>>>>
>
> Vladislav
>
>


bubble at hoster-ok

Sep 27, 2011, 12:07 AM

Post #7 of 43
Re: pacemaker/dlm problems

27.09.2011 08:59, Andrew Beekhof wrote:
[snip]
>>>>>> I agree with Jiaju
>>>>>> (https://lists.linux-foundation.org/pipermail/openais/2011-September/016713.html),
>>>>>> that could be solely pacemaker problem, because it probably should
>>>>>> originate fencing itself is such situation I think.
>>>>>>
>>>>>> So, using pacemaker/dlm with openais stack is currently risky due to
>>>>>> possible hangs of dlm_lockspaces.
>>>>>
>>>>> It shouldn't be, failing to connect to attrd is very unusual.
>>>>
>>>> By the way, one of underlying problems, which actually made me to notice
>>>> all this, is that pacemaker cluster does not fence its DC if it leaves
>>>> the cluster for a very short time. That is what Jiaju told in his notes.
>>>> And I can confirm that.
>>>
>>> Thats highly surprising. Do the logs you sent display this behaviour?
>>
>> They do. Rest of the cluster begins the election, but then accepts
>> returned DC back (I write this from memory, I looked at logs Sep 5-6, so
>> I may mix up something).
>
> Actually, this might be possible - if DC.old came back before DC.new
> had a chance to get elected, run the PE and initiate fencing, then
> there would be no need to fence.
>

(text below is for pacemaker on top of openais stack, not for cman)

Except that the dlm lockspaces are in kern_stop state, so the whole
dlm-related part is frozen :( - clvmd in my case, but I expect the same
from gfs2 and ocfs2.
And the fencing requests originated on the CPG NODEDOWN event by
dlm_controld (with my patch to dlm_controld and your patch for
crm_terminate_member_common()) on the quorate partition are lost: DC.old
doesn't accept CIB updates from other nodes, so those fencing requests
are discarded.

I think the problem is that membership changes are handled in a
non-transactional way (?).
If pacemaker fully finished processing one membership change - electing a
new DC on the quorate partition, and not trying to take over the DC role
(or releasing it) on a non-quorate partition while a quorate one exists -
that problem could be gone.
I didn't dig into the code much, so all of the above is just my deduction,
which may be completely wrong.
And of course the real logic could (should) be much more complicated, with
handling of just-rebooted members, etc.

(end of openais specific part)

>> [snip]
>>>>>> Although it took 25 seconds instead of 3 to break the cluster (I
>>>>>> understand, this is almost impossible to load host so much, but
>>>>>> anyways), then I got a real nightmare: two nodes of 3-node cluster had
>>>>>> cman stopped (and pacemaker too because of cman connection loss) - they
>>>>>> asked to kick_node_from_cluster() for each other, and that succeeded.
>>>>>> But fencing didn't happen (I still need to look why, but this is cman
>>>>>> specific).
>>
>> Btw this part is tricky for me to understand the underlying logic:
>> * cman just stops cman processes on remote nodes, disregarding the
>> quorum. I hope that could be fixed in corosync If I understand one of
>> latest threads there right.
>> * But cman does not do fencing of that nodes, and they still run
>> resources. And this could be extremely dangerous under some
>> circumstances. And cman does not do fencing even if it has fence devices
>> configure in cluster.conf (I verified that).
>>
>>>>>> Remaining node had pacemaker hanged, it doesn't even
>>>>>> notice cluster infrastructure change, down nodes were listed as a
>>>>>> online, one of them was a DC, all resources are marked as started on all
>>>>>> (down too) nodes. No log entries from pacemaker at all.
>>>>>
>>>>> Well I can't see any logs from anyone to its hard for me to comment.
>>>>
>>>> Logs are sent privately.
>>>>
>>>>>
>>
>> Vladislav
>>
>>


andrew at beekhof

Sep 27, 2011, 12:56 AM

Post #8 of 43
Re: pacemaker/dlm problems

On Tue, Sep 27, 2011 at 5:07 PM, Vladislav Bogdanov
<bubble [at] hoster-ok> wrote:
> 27.09.2011 08:59, Andrew Beekhof wrote:
> [snip]
>>>>>>> I agree with Jiaju
>>>>>>> (https://lists.linux-foundation.org/pipermail/openais/2011-September/016713.html),
>>>>>>> that could be solely pacemaker problem, because it probably should
>>>>>>> originate fencing itself is such situation I think.
>>>>>>>
>>>>>>> So, using pacemaker/dlm with openais stack is currently risky due to
>>>>>>> possible hangs of dlm_lockspaces.
>>>>>>
>>>>>> It shouldn't be, failing to connect to attrd is very unusual.
>>>>>
>>>>> By the way, one of underlying problems, which actually made me to notice
>>>>> all this, is that pacemaker cluster does not fence its DC if it leaves
>>>>> the cluster for a very short time. That is what Jiaju told in his notes.
>>>>> And I can confirm that.
>>>>
>>>> Thats highly surprising.  Do the logs you sent display this behaviour?
>>>
>>> They do. Rest of the cluster begins the election, but then accepts
>>> returned DC back (I write this from memory, I looked at logs Sep 5-6, so
>>> I may mix up something).
>>
>> Actually, this might be possible - if DC.old came back before DC.new
>> had a chance to get elected, run the PE and initiate fencing, then
>> there would be no need to fence.
>>
>
> (text below is for pacemaker on top of openais stack, not for cman)
>
> Except dlm lockspaces are in kern_stop state, so a whole dlm-related
> part is frozen :( - clvmd in my case, but I expect the same from gfs2
> and ocfs2.
> And fencing requests originated on CPG NODEDOWN event by dlm_controld
> (with my patch to dlm_controld and your patch for
> crm_terminate_member_common()) on a quorate partition are lost. DC.old
> doesn't accept CIB updates from other nodes, so that fencing requests
> are discarded.

All the more reason to start using the stonith API directly.
I was playing around last night with the dlm_controld.pcmk code:
https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787

>
> I think that problem is that membership changes are handled in a
> non-transactional way (?).

Sounds more like the dlm/etc is being dumb - if the host is back and
healthy, why would we want to shoot it?

> If pacemaker fully finish processing of one membership change - elect
> new DC on a quorate partition, and do not try to take over dc role (or
> release it) on a non-quorate partition if quorate one exists, that
> problem could be gone.

Non-quorate partitions still have a DC.
They're just not supposed to do anything (depending on the value of
no-quorum-policy).

> I didn't dig into code so much, so all above is just my deduction which
> may be completely wrong.
> And of course real logic could (should) be much more complicated, with
> handling of just rebooted members, etc.
>
> (end of openais specific part)
>
>>> [snip]
>>>>>>> Although it took 25 seconds instead of 3 to break the cluster (I
>>>>>>> understand, this is almost impossible to load host so much, but
>>>>>>> anyways), then I got a real nightmare: two nodes of 3-node cluster had
>>>>>>> cman stopped (and pacemaker too because of cman connection loss) - they
>>>>>>> asked to kick_node_from_cluster() for each other, and that succeeded.
>>>>>>> But fencing didn't happen (I still need to look why, but this is cman
>>>>>>> specific).
>>>
>>> Btw this part is tricky for me to understand the underlying logic:
>>> * cman just stops cman processes on remote nodes, disregarding the
>>> quorum. I hope that could be fixed in corosync If I understand one of
>>> latest threads there right.
>>> * But cman does not do fencing of that nodes, and they still run
>>> resources. And this could be extremely dangerous under some
>>> circumstances. And cman does not do fencing even if it has fence devices
>>> configure in cluster.conf (I verified that).
>>>
>>>>>>> Remaining node had pacemaker hanged, it doesn't even
>>>>>>> notice cluster infrastructure change, down nodes were listed as a
>>>>>>> online, one of them was a DC, all resources are marked as started on all
>>>>>>> (down too) nodes. No log entries from pacemaker at all.
>>>>>>
>>>>>> Well I can't see any logs from anyone to its hard for me to comment.
>>>>>
>>>>> Logs are sent privately.
>>>>>
>>>>>>
>>>
>>> Vladislav
>>>
>>>


bubble at hoster-ok

Sep 27, 2011, 1:24 AM

Post #9 of 43
Re: pacemaker/dlm problems

27.09.2011 10:56, Andrew Beekhof wrote:
> On Tue, Sep 27, 2011 at 5:07 PM, Vladislav Bogdanov
> <bubble [at] hoster-ok> wrote:
>> 27.09.2011 08:59, Andrew Beekhof wrote:
>> [snip]
>>>>>>>> I agree with Jiaju
>>>>>>>> (https://lists.linux-foundation.org/pipermail/openais/2011-September/016713.html),
>>>>>>>> that could be solely pacemaker problem, because it probably should
>>>>>>>> originate fencing itself is such situation I think.
>>>>>>>>
>>>>>>>> So, using pacemaker/dlm with openais stack is currently risky due to
>>>>>>>> possible hangs of dlm_lockspaces.
>>>>>>>
>>>>>>> It shouldn't be, failing to connect to attrd is very unusual.
>>>>>>
>>>>>> By the way, one of underlying problems, which actually made me to notice
>>>>>> all this, is that pacemaker cluster does not fence its DC if it leaves
>>>>>> the cluster for a very short time. That is what Jiaju told in his notes.
>>>>>> And I can confirm that.
>>>>>
>>>>> Thats highly surprising. Do the logs you sent display this behaviour?
>>>>
>>>> They do. Rest of the cluster begins the election, but then accepts
>>>> returned DC back (I write this from memory, I looked at logs Sep 5-6, so
>>>> I may mix up something).
>>>
>>> Actually, this might be possible - if DC.old came back before DC.new
>>> had a chance to get elected, run the PE and initiate fencing, then
>>> there would be no need to fence.
>>>
>>
>> (text below is for pacemaker on top of openais stack, not for cman)
>>
>> Except dlm lockspaces are in kern_stop state, so a whole dlm-related
>> part is frozen :( - clvmd in my case, but I expect the same from gfs2
>> and ocfs2.
>> And fencing requests originated on CPG NODEDOWN event by dlm_controld
>> (with my patch to dlm_controld and your patch for
>> crm_terminate_member_common()) on a quorate partition are lost. DC.old
>> doesn't accept CIB updates from other nodes, so that fencing requests
>> are discarded.
>
> All the more reason to start using the stonith api directly.
> I was playing around list night with the dlm_controld.pcmk code:
> https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787

Wow, I'll try it!

Btw (off-topic), don't you think it could be interesting to have
stack support via dlopen()ed modules there? From what I see in that code,
it could be achieved fairly easily. One would just need to create a module
API structure, enumerate the functions in each stack, add module loading to
the dlm_controld core, and change the calls to module functions.
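
For illustration, a minimal sketch of what such a dlopen()-based stack
module could look like (the struct layout, the "dlm_stack_ops" symbol and
the file naming are purely hypothetical, not taken from the dlm_controld
sources):

#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical per-stack API: one entry per operation that currently
 * differs between the cman and pacemaker code paths. */
struct stack_ops {
    int  (*setup)(void);
    void (*cleanup)(void);
    int  (*fence_node)(int nodeid);
};

/* Load dlm_stack_<name>.so and look up its exported ops table. */
static const struct stack_ops *load_stack(const char *name)
{
    char path[128];
    void *handle;

    snprintf(path, sizeof(path), "dlm_stack_%s.so", name);
    handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "cannot load %s: %s\n", path, dlerror());
        return NULL;
    }
    return dlsym(handle, "dlm_stack_ops");
}

The dlm_controld core would then just call ops->fence_node() and friends,
without knowing which stack module was loaded.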

>
>>
>> I think that problem is that membership changes are handled in a
>> non-transactional way (?).
>
> Sounds more like the dlm/etc is being dumb - if the host is back and
> healthy, why would we want to shoot it?

Ammmm..... No comments from me on this ;)

But, anyway, something needs to be done on one side or the other...

>
>> If pacemaker fully finish processing of one membership change - elect
>> new DC on a quorate partition, and do not try to take over dc role (or
>> release it) on a non-quorate partition if quorate one exists, that
>> problem could be gone.
>
> Non quorate partitions still have a DC.
> They're just not supposed to do anything (depending on the value of
> no-quorum-policy).

I actually meant "do not try to take over the DC role in a rejoined
cluster (or release that role) if it was running on a non-quorate
partition before the rejoin while a quorate one existed". Sorry for the
confusion. Not very natural wording again, but it should be better.

Maybe the DC from the non-quorate partition should just have a lower
priority to become DC when the cluster rejoins and a new election
happens (does it?)?

>
>> I didn't dig into code so much, so all above is just my deduction which
>> may be completely wrong.
>> And of course real logic could (should) be much more complicated, with
>> handling of just rebooted members, etc.
>>
>> (end of openais specific part)
>>
>>>> [snip]
>>>>>>>> Although it took 25 seconds instead of 3 to break the cluster (I
>>>>>>>> understand, this is almost impossible to load host so much, but
>>>>>>>> anyways), then I got a real nightmare: two nodes of 3-node cluster had
>>>>>>>> cman stopped (and pacemaker too because of cman connection loss) - they
>>>>>>>> asked to kick_node_from_cluster() for each other, and that succeeded.
>>>>>>>> But fencing didn't happen (I still need to look why, but this is cman
>>>>>>>> specific).
>>>>
>>>> Btw this part is tricky for me to understand the underlying logic:
>>>> * cman just stops cman processes on remote nodes, disregarding the
>>>> quorum. I hope that could be fixed in corosync If I understand one of
>>>> latest threads there right.
>>>> * But cman does not do fencing of that nodes, and they still run
>>>> resources. And this could be extremely dangerous under some
>>>> circumstances. And cman does not do fencing even if it has fence devices
>>>> configure in cluster.conf (I verified that).
>>>>
>>>>>>>> Remaining node had pacemaker hanged, it doesn't even
>>>>>>>> notice cluster infrastructure change, down nodes were listed as a
>>>>>>>> online, one of them was a DC, all resources are marked as started on all
>>>>>>>> (down too) nodes. No log entries from pacemaker at all.
>>>>>>>
>>>>>>> Well I can't see any logs from anyone to its hard for me to comment.
>>>>>>
>>>>>> Logs are sent privately.
>>>>>>
>>>>>>>
>>>>
>>>> Vladislav
>>>>
>>>>


bubble at hoster-ok

Sep 28, 2011, 2:12 AM

Post #10 of 43
Re: pacemaker/dlm problems

Hi,

27.09.2011 10:56, Andrew Beekhof wrote:
[snip]
> All the more reason to start using the stonith api directly.
> I was playing around list night with the dlm_controld.pcmk code:
> https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787

It doesn't seem to apply to 3.0.17, so I rebased that commit against it
for my build. Even then it doesn't compile without the attached patch.
The patch may need to be rebased a bit against your tree.

Now I have the package built and am building node images. I will try it shortly.

[snip]

Best,
Vladislav
Attachments: cluster-3.0.17-dlm-pcmk-stonith-api-new-compile-fix.patch (2.33 KB)


bubble at hoster-ok

Sep 28, 2011, 7:41 AM

Post #11 of 43
Re: [Partially SOLVED] pacemaker/dlm problems

Hi Andrew,

>> All the more reason to start using the stonith api directly.
>> I was playing around list night with the dlm_controld.pcmk code:
>> https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787
>
> Doesn't seem to apply to 3.0.17, so I rebased that commit against it for
> my build. Then it doesn't compile without attached patch.
> It may need to be rebased a bit against your tree.
>
> Now I have package built and am building node images. Will try shortly.

Fencing from within dlm_controld.pcmk still did not work with your first
patch against that _no_mainloop function (as expected).

So I did my best to build packages from the current git tree.

Voila! I got the failed node correctly fenced!
I'll do some more extensive testing over the next days, but I believe
everything should be much better now.

I knew you're a genius he-he ;)

So, here are the steps to get DLM to handle CPG NODEDOWN events correctly
with pacemaker using the openais stack:

1. Build pacemaker (as of 2011-09-28) from git.
2. Apply the attached patches to the cluster-3.0.17 source tree.
3. Build dlm_controld.pcmk.

One note - gfs2_controld probably needs to be fixed too (FIXME).

Best regards,
Vladislav
Attachments: cluster-3.0.17-dlm-fence-nodedown.patch (0.63 KB)
  cluster-3.0.17-dlm-pcmk-new-stonith-api.patch (4.68 KB)
  cluster-3.0.17-dlm-pcmk-stonith-api-new-compile-fix.patch (2.32 KB)


andrew at beekhof

Oct 2, 2011, 6:41 PM

Post #12 of 43
Re: pacemaker/dlm problems

On Tue, Sep 27, 2011 at 6:24 PM, Vladislav Bogdanov
<bubble [at] hoster-ok> wrote:
> 27.09.2011 10:56, Andrew Beekhof wrote:
>> On Tue, Sep 27, 2011 at 5:07 PM, Vladislav Bogdanov
>> <bubble [at] hoster-ok> wrote:
>>> 27.09.2011 08:59, Andrew Beekhof wrote:
>>> [snip]
>>>>>>>>> I agree with Jiaju
>>>>>>>>> (https://lists.linux-foundation.org/pipermail/openais/2011-September/016713.html),
>>>>>>>>> that could be solely pacemaker problem, because it probably should
>>>>>>>>> originate fencing itself is such situation I think.
>>>>>>>>>
>>>>>>>>> So, using pacemaker/dlm with openais stack is currently risky due to
>>>>>>>>> possible hangs of dlm_lockspaces.
>>>>>>>>
>>>>>>>> It shouldn't be, failing to connect to attrd is very unusual.
>>>>>>>
>>>>>>> By the way, one of underlying problems, which actually made me to notice
>>>>>>> all this, is that pacemaker cluster does not fence its DC if it leaves
>>>>>>> the cluster for a very short time. That is what Jiaju told in his notes.
>>>>>>> And I can confirm that.
>>>>>>
>>>>>> Thats highly surprising.  Do the logs you sent display this behaviour?
>>>>>
>>>>> They do. Rest of the cluster begins the election, but then accepts
>>>>> returned DC back (I write this from memory, I looked at logs Sep 5-6, so
>>>>> I may mix up something).
>>>>
>>>> Actually, this might be possible - if DC.old came back before DC.new
>>>> had a chance to get elected, run the PE and initiate fencing, then
>>>> there would be no need to fence.
>>>>
>>>
>>> (text below is for pacemaker on top of openais stack, not for cman)
>>>
>>> Except dlm lockspaces are in kern_stop state, so a whole dlm-related
>>> part is frozen :( - clvmd in my case, but I expect the same from gfs2
>>> and ocfs2.
>>> And fencing requests originated on CPG NODEDOWN event by dlm_controld
>>> (with my patch to dlm_controld and your patch for
>>> crm_terminate_member_common()) on a quorate partition are lost. DC.old
>>> doesn't accept CIB updates from other nodes, so that fencing requests
>>> are discarded.
>>
>> All the more reason to start using the stonith api directly.
>> I was playing around list night with the dlm_controld.pcmk code:
>>    https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787
>
> Wow, I'll try it!
>
> Btw (offtopic), don't you think that it could be interesting to have
> stacks support in dlopened modules there? From what I see in that code,
> it could be almost easily achieved. One just needs to create module API
> structure, enumerate functions in each stack, add module loading to
> dlm_controld core and change calls to module functions.

I'm sure it's possible. It's just up to David whether he wants to support it.

>
>>
>>>
>>> I think that problem is that membership changes are handled in a
>>> non-transactional way (?).
>>
>> Sounds more like the dlm/etc is being dumb - if the host is back and
>> healthy, why would we want to shoot it?
>
> Ammmm..... No comments from me on this ;)
>
> But, anyways, something needs to be done at either side...
>
>>
>>> If pacemaker fully finish processing of one membership change - elect
>>> new DC on a quorate partition, and do not try to take over dc role (or
>>> release it) on a non-quorate partition if quorate one exists, that
>>> problem could be gone.
>>
>> Non quorate partitions still have a DC.
>> They're just not supposed to do anything (depending on the value of
>> no-quorum-policy).
>
> I actually meant "do not try to take over dc role in a rejoined cluster
> (or release that role) if it was running on a non-quorate partition
> before rejoin if quorate one existed".

All existing DCs give up the role and a new one is elected when two
partitions join.
So I'm unsure what you're referring to here :-)

> Sorry for confusion. Not very
> natural wording again, but should be better.
>
> May be DC from non-quorate partition should just have lower priority to
> become DC when cluster rejoins and new election happen (does it?)?

There is no bias towards past DCs in the election.



bubble at hoster-ok

Oct 2, 2011, 9:34 PM

Post #13 of 43
Re: pacemaker/dlm problems

03.10.2011 04:41, Andrew Beekhof wrote:
[...]
>>>> If pacemaker fully finish processing of one membership change - elect
>>>> new DC on a quorate partition, and do not try to take over dc role (or
>>>> release it) on a non-quorate partition if quorate one exists, that
>>>> problem could be gone.
>>>
>>> Non quorate partitions still have a DC.
>>> They're just not supposed to do anything (depending on the value of
>>> no-quorum-policy).
>>
>> I actually meant "do not try to take over dc role in a rejoined cluster
>> (or release that role) if it was running on a non-quorate partition
>> before rejoin if quorate one existed".
>
> All existing DC's give up the role and a new one is elected when two
> partitions join.
> So I'm unsure what you're referring to here :-)
>
>> Sorry for confusion. Not very
>> natural wording again, but should be better.
>>
>> May be DC from non-quorate partition should just have lower priority to
>> become DC when cluster rejoins and new election happen (does it?)?
>
> There is no bias towards past DCs in the election.

From what I understand, the election result depends highly on node
(pacemaker process) uptime. And DC.old has a great chance of winning the
election, just because it won it before and nothing in the election
parameters has changed since then. Please correct me if I'm wrong.

Best,
Vladislav



andrew at beekhof

Oct 3, 2011, 12:56 AM

Post #14 of 43 (3870 views)
Permalink
Re: pacemaker/dlm problems [In reply to]

On Mon, Oct 3, 2011 at 3:34 PM, Vladislav Bogdanov <bubble [at] hoster-ok> wrote:
> 03.10.2011 04:41, Andrew Beekhof wrote:
> [...]
>>>>> If pacemaker fully finish processing of one membership change - elect
>>>>> new DC on a quorate partition, and do not try to take over dc role (or
>>>>> release it) on a non-quorate partition if quorate one exists, that
>>>>> problem could be gone.
>>>>
>>>> Non quorate partitions still have a DC.
>>>> They're just not supposed to do anything (depending on the value of
>>>> no-quorum-policy).
>>>
>>> I actually meant "do not try to take over dc role in a rejoined cluster
>>> (or release that role) if it was running on a non-quorate partition
>>> before rejoin if quorate one existed".
>>
>> All existing DC's give up the role and a new one is elected when two
>> partitions join.
>> So I'm unsure what you're referring to here :-)
>>
>>> Sorry for confusion. Not very
>>> natural wording again, but should be better.
>>>
>>> May be DC from non-quorate partition should just have lower priority to
>>> become DC when cluster rejoins and new election happen (does it?)?
>>
>> There is no bias towards past DCs in the election.
>
> From what I understand, election result highly depends on nodes
> (pacemaker processes) uptime. And DC.old has a great chance to win an
> election, just because it won it before, and nothing changed in election
> parameters after that. Please fix me.

Correct. But it's not getting an advantage just because it was the DC.

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


bubble at hoster-ok

Oct 3, 2011, 1:29 AM

Post #15 of 43 (3846 views)
Permalink
Re: pacemaker/dlm problems [In reply to]

03.10.2011 10:56, Andrew Beekhof wrote:
> On Mon, Oct 3, 2011 at 3:34 PM, Vladislav Bogdanov<bubble [at] hoster-ok> wrote:
>> 03.10.2011 04:41, Andrew Beekhof wrote:
>> [...]
>>>>>> If pacemaker fully finish processing of one membership change - elect
>>>>>> new DC on a quorate partition, and do not try to take over dc role (or
>>>>>> release it) on a non-quorate partition if quorate one exists, that
>>>>>> problem could be gone.
>>>>>
>>>>> Non quorate partitions still have a DC.
>>>>> They're just not supposed to do anything (depending on the value of
>>>>> no-quorum-policy).
>>>>
>>>> I actually meant "do not try to take over dc role in a rejoined cluster
>>>> (or release that role) if it was running on a non-quorate partition
>>>> before rejoin if quorate one existed".
>>>
>>> All existing DC's give up the role and a new one is elected when two
>>> partitions join.
>>> So I'm unsure what you're referring to here :-)
>>>
>>>> Sorry for confusion. Not very
>>>> natural wording again, but should be better.
>>>>
>>>> May be DC from non-quorate partition should just have lower priority to
>>>> become DC when cluster rejoins and new election happen (does it?)?
>>>
>>> There is no bias towards past DCs in the election.
>>
>> From what I understand, election result highly depends on nodes
>> (pacemaker processes) uptime. And DC.old has a great chance to win an
>> election, just because it won it before, and nothing changed in election
>> parameters after that. Please fix me.
>
> Correct. But its not getting an advantage because it was DC.

But it could have an advantage because it has, e.g., greater uptime (and
that was actually the reason it won the previous elections, before the
split-brain). It can then drop all CIB modifications which happened in
the quorate partition during the split-brain. At least some messages in
the logs (you should have them) make me think so. If it is possible to
avoid this, it would be great. So, from my PoV, one of two things should happen:
* DC.old does not win
* DC.old wins and replaces its CIB with a copy from DC.new

Am I wrong here?

Vladislav

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


andrew at beekhof

Oct 3, 2011, 4:53 PM

Post #16 of 43 (3825 views)
Permalink
Re: pacemaker/dlm problems [In reply to]

On Mon, Oct 3, 2011 at 7:29 PM, Vladislav Bogdanov <bubble [at] hoster-ok> wrote:
> 03.10.2011 10:56, Andrew Beekhof wrote:
>>
>> On Mon, Oct 3, 2011 at 3:34 PM, Vladislav Bogdanov<bubble [at] hoster-ok>
>>  wrote:
>>>
>>> 03.10.2011 04:41, Andrew Beekhof wrote:
>>> [...]
>>>>>>>
>>>>>>> If pacemaker fully finish processing of one membership change - elect
>>>>>>> new DC on a quorate partition, and do not try to take over dc role
>>>>>>> (or
>>>>>>> release it) on a non-quorate partition if quorate one exists, that
>>>>>>> problem could be gone.
>>>>>>
>>>>>> Non quorate partitions still have a DC.
>>>>>> They're just not supposed to do anything (depending on the value of
>>>>>> no-quorum-policy).
>>>>>
>>>>> I actually meant "do not try to take over dc role in a rejoined cluster
>>>>> (or release that role) if it was running on a non-quorate partition
>>>>> before rejoin if quorate one existed".
>>>>
>>>> All existing DC's give up the role and a new one is elected when two
>>>> partitions join.
>>>> So I'm unsure what you're referring to here :-)
>>>>
>>>>> Sorry for confusion. Not very
>>>>> natural wording again, but should be better.
>>>>>
>>>>> May be DC from non-quorate partition should just have lower priority to
>>>>> become DC when cluster rejoins and new election happen (does it?)?
>>>>
>>>> There is no bias towards past DCs in the election.
>>>
>>>  From what I understand, election result highly depends on nodes
>>> (pacemaker processes) uptime. And DC.old has a great chance to win an
>>> election, just because it won it before, and nothing changed in election
>>> parameters after that. Please fix me.
>>
>> Correct.  But its not getting an advantage because it was DC.
>
> But it could have it because it f.e. has greater uptime (and that actually
> was a reason it won previous elections, before split-brain).
> And then it can drop all cib modifications which happened in a quorate
> partition during split-brain. At least some messages in logs (you should
> have them) make me think so. If it is possible to avoid this - it would be
> great. So, from my PoV, one of two should happen
> * DC.old does not win
> * DC old wins and replaces its CIB with copy from DC.new
>
> Am I wrong here?

The CIB which is used depends not on which node was DC but on which node
had the latest CIB (CIB.latest).
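
For illustration, and assuming the usual precedence of admin_epoch, then
epoch, then num_updates, "latest" boils down to a tuple comparison along
these lines (just a sketch, not the actual CIB code):

    struct cib_version {
            int admin_epoch;        /* bumped manually by admins, wins over everything */
            int epoch;              /* bumped on configuration changes */
            int num_updates;        /* bumped on status updates */
    };

    /* Returns >0 if a is newer than b, <0 if older, 0 if equal. */
    static int compare_cib_version(const struct cib_version *a,
                                   const struct cib_version *b)
    {
            if (a->admin_epoch != b->admin_epoch)
                    return a->admin_epoch - b->admin_epoch;
            if (a->epoch != b->epoch)
                    return a->epoch - b->epoch;
            return a->num_updates - b->num_updates;
    }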

>
> Vladislav
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


bubble at hoster-ok

Nov 14, 2011, 12:36 PM

Post #17 of 43 (3776 views)
Permalink
Re: [Partially SOLVED] pacemaker/dlm problems [In reply to]

Hi Andrew,

I just found another problem with dlm_controld.pcmk (with your latest
patch from github applied, plus my fixes to actually build it - they are
included in a message referenced by this one).
A node which has just requested fencing of another one gets stuck
printing the message where you print ctime() in fence_node_time()
(pacemaker.c near line 293) every second. No other messages appear,
although fence_node_time() is called only from check_fencing_done()
(cpg.c near line 444). So both (last_fenced_time >= node->fail_time) and
(!node->fence_queries || node->fence_time != last_fenced_time) are
false; otherwise one of the messages for those cases would be shown.
check_fencing_done() then seems to return 0 from
if (wait_count)
        return 0;
(wait_count is incremented when (last_fenced_time >= node->fail_time) is
false), so it never gets past that check and never returns the expected 1.
The offending node was actually fenced, but that was never noticed by
dlm_controld.
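
To make that flow easier to follow, here is a condensed, self-contained
sketch of the logic as I read it, with simplified types (not the actual
dlm_controld structures or code):

    #include <stdint.h>
    #include <stdio.h>

    struct member {                 /* simplified stand-in for the node entry */
            int nodeid;
            uint64_t fail_time;     /* when the member failure was recorded */
            uint64_t fence_time;    /* fencing timestamp seen on the previous poll */
            int fence_queries;      /* how many times we have polled so far */
    };

    /* Returns 1 when fencing of this member is confirmed, 0 while waiting. */
    static int check_one_member(struct member *node, uint64_t last_fenced_time,
                                int *wait_count)
    {
            if (last_fenced_time >= node->fail_time) {
                    printf("check_fencing %d done\n", node->nodeid);
                    return 1;
            }
            if (!node->fence_queries || node->fence_time != last_fenced_time) {
                    printf("check_fencing %d wait\n", node->nodeid);
                    node->fence_queries++;
                    node->fence_time = last_fenced_time;
            }
            /* Once fence_queries is non-zero and fence_time equals
             * last_fenced_time, neither message is printed any more: this
             * is the silent path that only bumps wait_count, and with a
             * non-zero wait_count the caller returns 0 and keeps polling. */
            (*wait_count)++;
            return 0;
    }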

May I ask you to help me a bit with all that logic (since you have
already dived into the dlm_controld sources again)? I seem to be so near
success... :|

BTW, I can't find which source your dlm repo was forked from - maybe you
remember?

Best,
Vladislav

28.09.2011 17:41, Vladislav Bogdanov wrote:
> Hi Andrew,
>
>>> All the more reason to start using the stonith api directly.
>>> I was playing around list night with the dlm_controld.pcmk code:
>>> https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787
>>
>> Doesn't seem to apply to 3.0.17, so I rebased that commit against it for
>> my build. Then it doesn't compile without attached patch.
>> It may need to be rebased a bit against your tree.
>>
>> Now I have package built and am building node images. Will try shortly.
>
> Fencing from within dlm_controld.pcmk still did not work with your first
> patch against that _no_mainloop function (expected).
>
> So I did my best to build packages from the current git tree.
>
> Voila! I got failed node correctly fenced!
> I'll do some more extensive testing next days, but I believe everything
> should be much better now.
>
> I knew you're genius he-he ;)
>
> So, here are steps to get DLM handle CPG NODEDOWN events correctly with
> pacemaker using openais stack:
>
> 1. Build pacemaker (as of 2011-09-28) from git.
> 2. Apply attached patches to cluster-3.0.17 source tree.
> 3. Build dlm_controld.pcmk
>
> One note - gfs2_controld probably needs to be fixed too (FIXME).
>
> Best regards,
> Vladislav
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


bubble at hoster-ok

Nov 14, 2011, 12:58 PM

Post #18 of 43 (3774 views)
Permalink
Re: [Partially SOLVED] pacemaker/dlm problems [In reply to]

Ah, one (possibly important) addition.

That offending node is a VM on another cluster, which experienced serious
problems at the time due to a bug in one of my RAs, and the cluster host
carrying that VM was hard shut down. So it may be that the fencing
request succeeded right before that host went down. There was a
significant delay before the VM was started on another host, and it is
also possible that the VM failed again after starting. I can't say
anything more precise right now - I needed to get everything back to a
live state quickly.
From the above, might it be that last_fenced_time should be set to zero
if the node has been seen after it was fenced and then appeared again?
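
In code, the idea would be something along these lines (purely a sketch;
last_join_time is a hypothetical field standing in for "when the node was
last seen joining", not something dlm_controld tracks today):

    /* Ignore a fencing record that predates the node's most recent
     * (re)join, so an old "shot at" timestamp cannot satisfy a newer
     * failure. */
    if (last_fenced_time && last_fenced_time < node->last_join_time)
            last_fenced_time = 0;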

14.11.2011 23:36, Vladislav Bogdanov wrote:
> Hi Andrew,
>
> I just found another problem with dlm_controld.pcmk (with your latest
> patch from github applied and also my fixes to actually build it - they
> are included in a message referenced by this one).
> One node which just requested fencing of another one stucks at printing
> that message where you print ctime() in fence_node_time() (pacemaker.c
> near 293) every second. No other messages appear, although
> fence_node_time() is called only from check_fencing_done() (cpg.c near
> 444). So, both of (last_fenced_time >= node->fail_time) and
> (!node->fence_queries || node->fence_time != last_fenced_time) are
> false, otherwise one of messages for that cases should be shown. Then,
> fence_node_time() seems to return 0 from
> if (wait_count)
> return 0;
> (wait_count is incremented if (last_fenced_time >= node->fail_time) is
> false), so it never reaches check_fencing_done() call and never return
> expected 1.
> Offending node was actually fenced, but that was actually not handled by
> dlm_controld.
>
> May I ask you to help me a bit with all that logic (as you already dived
> into dlm_controld sources again), I seem to be so near the success... :|
>
> btw, I cant find what source is your dlm repo forked from, may be you
> remember?
>
> Best,
> Vladislav
>
> 28.09.2011 17:41, Vladislav Bogdanov wrote:
>> Hi Andrew,
>>
>>>> All the more reason to start using the stonith api directly.
>>>> I was playing around list night with the dlm_controld.pcmk code:
>>>> https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787
>>>
>>> Doesn't seem to apply to 3.0.17, so I rebased that commit against it for
>>> my build. Then it doesn't compile without attached patch.
>>> It may need to be rebased a bit against your tree.
>>>
>>> Now I have package built and am building node images. Will try shortly.
>>
>> Fencing from within dlm_controld.pcmk still did not work with your first
>> patch against that _no_mainloop function (expected).
>>
>> So I did my best to build packages from the current git tree.
>>
>> Voila! I got failed node correctly fenced!
>> I'll do some more extensive testing next days, but I believe everything
>> should be much better now.
>>
>> I knew you're genius he-he ;)
>>
>> So, here are steps to get DLM handle CPG NODEDOWN events correctly with
>> pacemaker using openais stack:
>>
>> 1. Build pacemaker (as of 2011-09-28) from git.
>> 2. Apply attached patches to cluster-3.0.17 source tree.
>> 3. Build dlm_controld.pcmk
>>
>> One note - gfs2_controld probably needs to be fixed too (FIXME).
>>
>> Best regards,
>> Vladislav
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker [at] oss
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


andrew at beekhof

Nov 23, 2011, 8:33 PM

Post #19 of 43 (3728 views)
Permalink
Re: [Partially SOLVED] pacemaker/dlm problems [In reply to]

On Tue, Nov 15, 2011 at 7:36 AM, Vladislav Bogdanov
<bubble [at] hoster-ok> wrote:
> Hi Andrew,
>
> I just found another problem with dlm_controld.pcmk (with your latest
> patch from github applied and also my fixes to actually build it - they
> are included in a message referenced by this one).
> One node which just requested fencing of another one stucks at printing
> that message where you print ctime() in fence_node_time() (pacemaker.c
> near 293) every second.

So not blocked, it just keeps repeating that message?
What date does it print?

Did you change it to the following?
log_debug("Node %d was last shot at: %s", nodeid, ctime(*last_fenced_time));

> No other messages appear, although
> fence_node_time() is called only from check_fencing_done() (cpg.c near
> 444). So, both of (last_fenced_time >= node->fail_time) and
> (!node->fence_queries || node->fence_time != last_fenced_time) are
> false, otherwise one of messages for that cases should be shown. Then,
> fence_node_time() seems to return 0 from
> if (wait_count)
>        return 0;
> (wait_count is incremented if (last_fenced_time >= node->fail_time) is
> false), so it never reaches check_fencing_done() call and never return
> expected 1.
> Offending node was actually fenced, but that was actually not handled by
> dlm_controld.
>
> May I ask you to help me a bit with all that logic (as you already dived
> into dlm_controld sources again), I seem to be so near the success... :|
>
> btw, I cant find what source is your dlm repo forked from, may be you
> remember?

iirc, it was dlm.git on fedorahosted.

>
> Best,
> Vladislav
>
> 28.09.2011 17:41, Vladislav Bogdanov wrote:
>> Hi Andrew,
>>
>>>> All the more reason to start using the stonith api directly.
>>>> I was playing around list night with the dlm_controld.pcmk code:
>>>>    https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787
>>>
>>> Doesn't seem to apply to 3.0.17, so I rebased that commit against it for
>>> my build. Then it doesn't compile without attached patch.
>>> It may need to be rebased a bit against your tree.
>>>
>>> Now I have package built and am building node images. Will try shortly.
>>
>> Fencing from within dlm_controld.pcmk still did not work with your first
>> patch against that _no_mainloop function (expected).
>>
>> So I did my best to build packages from the current git tree.
>>
>> Voila! I got failed node correctly fenced!
>> I'll do some more extensive testing next days, but I believe everything
>> should be much better now.
>>
>> I knew you're genius he-he ;)
>>
>> So, here are steps to get DLM handle CPG NODEDOWN events correctly with
>> pacemaker using openais stack:
>>
>> 1. Build pacemaker (as of 2011-09-28) from git.
>> 2. Apply attached patches to cluster-3.0.17 source tree.
>> 3. Build dlm_controld.pcmk
>>
>> One note - gfs2_controld probably needs to be fixed too (FIXME).
>>
>> Best regards,
>> Vladislav
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker [at] oss
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


bubble at hoster-ok

Nov 23, 2011, 8:58 PM

Post #20 of 43 (3716 views)
Permalink
Re: [Partially SOLVED] pacemaker/dlm problems [In reply to]

24.11.2011 07:33, Andrew Beekhof wrote:
> On Tue, Nov 15, 2011 at 7:36 AM, Vladislav Bogdanov
> <bubble [at] hoster-ok> wrote:
>> Hi Andrew,
>>
>> I just found another problem with dlm_controld.pcmk (with your latest
>> patch from github applied and also my fixes to actually build it - they
>> are included in a message referenced by this one).
>> One node which just requested fencing of another one stucks at printing
>> that message where you print ctime() in fence_node_time() (pacemaker.c
>> near 293) every second.
>
> So not blocked, it just keeps repeating that message?
> What date does it print?

Blocked... kern_stop

It prints the same date, from not long before (in that case).
I have caught it only once and cannot reproduce it yet. The date is
printed correctly under "normal" fencing circumstances.

>
> Did you change it to the following?
> log_debug("Node %d was last shot at: %s", nodeid, ctime(*last_fenced_time));

http://www.mail-archive.com/pacemaker [at] oss/msg09959.html
contains the patches against 3.0.17 which I use. I only backported commits
to the dlm_controld core from 3.1.1 (and, in the last few days, 3.1.7) to
bring it up to date (they are minor).

man ctime
char *ctime(const time_t *timep);

int fence_node_time(int nodeid, uint64_t *last_fenced_time)
is called from check_fencing_done() with
uint64_t last_fenced_time;
rv = fence_node_time(node->nodeid, &last_fenced_time);
so I changed it to ctime(last_fenced_time). BTW, ctime() appends a trailing
newline, so it is a poor fit for log messages.
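
For what it's worth, one way to log that timestamp without the trailing
newline (and without assuming time_t and uint64_t are the same type)
would be something like this - just a sketch, with printf() standing in
for the real log call and node_uname as a placeholder:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    static void log_last_shot(int nodeid, const char *node_uname,
                              uint64_t last_fenced_time)
    {
            time_t t = (time_t) last_fenced_time;   /* ctime() expects a time_t */
            char buf[32];                           /* ctime_r() needs >= 26 bytes */

            ctime_r(&t, buf);                       /* e.g. "Thu Dec  1 14:04:56 2011\n" */
            buf[strcspn(buf, "\n")] = '\0';         /* drop the trailing newline */
            printf("Node %d/%s was last shot at: %s\n", nodeid, node_uname, buf);
    }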

One thought: maybe the latest commits to dlm.git (with membership monitoring,
notably e529211682418a8e33feafc9f703cff87e23aeba) might help here?

And one note - I use fence_xvm for that failed VM, and I found that it
is a little deficient - only one instance of it can run on a host at a
time, since it binds to a predefined TCP port. Maybe that has an
influence as well...

>
>> No other messages appear, although
>> fence_node_time() is called only from check_fencing_done() (cpg.c near
>> 444). So, both of (last_fenced_time >= node->fail_time) and
>> (!node->fence_queries || node->fence_time != last_fenced_time) are
>> false, otherwise one of messages for that cases should be shown. Then,
>> fence_node_time() seems to return 0 from
>> if (wait_count)
>> return 0;
>> (wait_count is incremented if (last_fenced_time >= node->fail_time) is
>> false), so it never reaches check_fencing_done() call and never return
>> expected 1.
>> Offending node was actually fenced, but that was actually not handled by
>> dlm_controld.
>>
>> May I ask you to help me a bit with all that logic (as you already dived
>> into dlm_controld sources again), I seem to be so near the success... :|
>>
>> btw, I cant find what source is your dlm repo forked from, may be you
>> remember?
>
> iirc, it was dlm.git on fedorahosted.

Yep, I already found that - the pacemaker branch. It seems to be a little
outdated compared to 3.0.17, BTW.

>
>>
>> Best,
>> Vladislav
>>
>> 28.09.2011 17:41, Vladislav Bogdanov wrote:
>>> Hi Andrew,
>>>
>>>>> All the more reason to start using the stonith api directly.
>>>>> I was playing around list night with the dlm_controld.pcmk code:
>>>>> https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787
>>>>
>>>> Doesn't seem to apply to 3.0.17, so I rebased that commit against it for
>>>> my build. Then it doesn't compile without attached patch.
>>>> It may need to be rebased a bit against your tree.
>>>>
>>>> Now I have package built and am building node images. Will try shortly.
>>>
>>> Fencing from within dlm_controld.pcmk still did not work with your first
>>> patch against that _no_mainloop function (expected).
>>>
>>> So I did my best to build packages from the current git tree.
>>>
>>> Voila! I got failed node correctly fenced!
>>> I'll do some more extensive testing next days, but I believe everything
>>> should be much better now.
>>>
>>> I knew you're genius he-he ;)
>>>
>>> So, here are steps to get DLM handle CPG NODEDOWN events correctly with
>>> pacemaker using openais stack:
>>>
>>> 1. Build pacemaker (as of 2011-09-28) from git.
>>> 2. Apply attached patches to cluster-3.0.17 source tree.
>>> 3. Build dlm_controld.pcmk
>>>
>>> One note - gfs2_controld probably needs to be fixed too (FIXME).
>>>
>>> Best regards,
>>> Vladislav
>>>
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker [at] oss
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>
>>


_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


andrew at beekhof

Nov 23, 2011, 9:49 PM

Post #21 of 43 (3726 views)
Permalink
Re: [Partially SOLVED] pacemaker/dlm problems [In reply to]

On Thu, Nov 24, 2011 at 3:58 PM, Vladislav Bogdanov
<bubble [at] hoster-ok> wrote:
> 24.11.2011 07:33, Andrew Beekhof wrote:
>> On Tue, Nov 15, 2011 at 7:36 AM, Vladislav Bogdanov
>> <bubble [at] hoster-ok> wrote:
>>> Hi Andrew,
>>>
>>> I just found another problem with dlm_controld.pcmk (with your latest
>>> patch from github applied and also my fixes to actually build it - they
>>> are included in a message referenced by this one).
>>> One node which just requested fencing of another one stucks at printing
>>> that message where you print ctime() in fence_node_time() (pacemaker.c
>>> near 293) every second.
>>
>> So not blocked, it just keeps repeating that message?
>> What date does it print?
>
> Blocked... kern_stop

I'm confused.
How can it do that every second?

>
> It prints the same date not so far ago (in that case).
> I did catch it only once and cannot repeat yet. Date is printed correct
> in a "normal" fencing circumstances.
>
>>
>> Did you change it to the following?
>>   log_debug("Node %d was last shot at: %s", nodeid, ctime(*last_fenced_time));
>
> http://www.mail-archive.com/pacemaker [at] oss/msg09959.html
> contains patches against 3.0.17 which I use. I only backported commits
> to dlm_controld core from 3.1.1 (and 3.1.7 last days) to make it up2date
> (they are minor).

Ok, this (which was from my original patch) is wrong:

+ log_debug("Node %d/%s was last shot at: %s", nodeid,
ctime(*last_fenced_time));

The format string expects 3 parameters but there are only 2 supplied.
This could easily result in what you're seeing.
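
As an aside, if log_debug() is (or wraps) a printf-style variadic
function (the real one may well be a macro, so this is illustrative
only), gcc can catch this class of mistake at compile time via the
format attribute:

    #include <time.h>

    void log_debug(const char *fmt, ...) __attribute__((format(printf, 1, 2)));

    void example(int nodeid, time_t t)
    {
            /* With -Wall/-Wformat, gcc warns here that the trailing '%s'
             * has no matching argument. */
            log_debug("Node %d/%s was last shot at: %s", nodeid, ctime(&t));
    }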


>
> man ctime
> char *ctime(const time_t *timep);
>
> int fence_node_time(int nodeid, uint64_t *last_fenced_time)
> is called from check_fencing_done() with
> uint64_t last_fenced_time;
> rv = fence_node_time(node->nodeid, &last_fenced_time);
> so, I changed it to ctime(last_fenced_time). btw ctime adds trailing
> newline, so it badly fits for logs.
>
> One thought: may be last commits to dlm.git (with membership monitoring,
> notably e529211682418a8e33feafc9f703cff87e23aeba) may help here?
>
> And one note - I use fence_xvm for that failed VM, and I found that it
> is a little bit deficient - only one instance of it can be run on a host
> simultaneously as it binds to the predefined TCP port. May be that may
> influence as well...
>
>>
>>> No other messages appear, although
>>> fence_node_time() is called only from check_fencing_done() (cpg.c near
>>> 444). So, both of (last_fenced_time >= node->fail_time) and
>>> (!node->fence_queries || node->fence_time != last_fenced_time) are
>>> false, otherwise one of messages for that cases should be shown. Then,
>>> fence_node_time() seems to return 0 from
>>> if (wait_count)
>>>        return 0;
>>> (wait_count is incremented if (last_fenced_time >= node->fail_time) is
>>> false), so it never reaches check_fencing_done() call and never return
>>> expected 1.
>>> Offending node was actually fenced, but that was actually not handled by
>>> dlm_controld.
>>>
>>> May I ask you to help me a bit with all that logic (as you already dived
>>> into dlm_controld sources again), I seem to be so near the success... :|
>>>
>>> btw, I cant find what source is your dlm repo forked from, may be you
>>> remember?
>>
>> iirc, it was dlm.git on fedorahosted.
>
> Yep, I found that already, pacemaker branch. It seems to be a little bit
> outdated comparing to 3.0.17 btw.
>
>>
>>>
>>> Best,
>>> Vladislav
>>>
>>> 28.09.2011 17:41, Vladislav Bogdanov wrote:
>>>> Hi Andrew,
>>>>
>>>>>> All the more reason to start using the stonith api directly.
>>>>>> I was playing around list night with the dlm_controld.pcmk code:
>>>>>>    https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787
>>>>>
>>>>> Doesn't seem to apply to 3.0.17, so I rebased that commit against it for
>>>>> my build. Then it doesn't compile without attached patch.
>>>>> It may need to be rebased a bit against your tree.
>>>>>
>>>>> Now I have package built and am building node images. Will try shortly.
>>>>
>>>> Fencing from within dlm_controld.pcmk still did not work with your first
>>>> patch against that _no_mainloop function (expected).
>>>>
>>>> So I did my best to build packages from the current git tree.
>>>>
>>>> Voila! I got failed node correctly fenced!
>>>> I'll do some more extensive testing next days, but I believe everything
>>>> should be much better now.
>>>>
>>>> I knew you're genius he-he ;)
>>>>
>>>> So, here are steps to get DLM handle CPG NODEDOWN events correctly with
>>>> pacemaker using openais stack:
>>>>
>>>> 1. Build pacemaker (as of 2011-09-28) from git.
>>>> 2. Apply attached patches to cluster-3.0.17 source tree.
>>>> 3. Build dlm_controld.pcmk
>>>>
>>>> One note - gfs2_controld probably needs to be fixed too (FIXME).
>>>>
>>>> Best regards,
>>>> Vladislav
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker [at] oss
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>>
>>>
>
>

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


bubble at hoster-ok

Nov 23, 2011, 11:21 PM

Post #22 of 43 (3724 views)
Permalink
Re: [Partially SOLVED] pacemaker/dlm problems [In reply to]

24.11.2011 08:49, Andrew Beekhof wrote:
> On Thu, Nov 24, 2011 at 3:58 PM, Vladislav Bogdanov
> <bubble [at] hoster-ok> wrote:
>> 24.11.2011 07:33, Andrew Beekhof wrote:
>>> On Tue, Nov 15, 2011 at 7:36 AM, Vladislav Bogdanov
>>> <bubble [at] hoster-ok> wrote:
>>>> Hi Andrew,
>>>>
>>>> I just found another problem with dlm_controld.pcmk (with your latest
>>>> patch from github applied and also my fixes to actually build it - they
>>>> are included in a message referenced by this one).
>>>> One node which just requested fencing of another one stucks at printing
>>>> that message where you print ctime() in fence_node_time() (pacemaker.c
>>>> near 293) every second.
>>>
>>> So not blocked, it just keeps repeating that message?
>>> What date does it print?
>>
>> Blocked... kern_stop
>
> I'm confused.

As well as me...

> How can it do that every second?

Only in one case: when both (last_fenced_time >= node->fail_time) and
(!node->fence_queries || node->fence_time != last_fenced_time) are *false*.

So, three conditions are *true* at the same moment:
* last_fenced_time < node->fail_time
* node->fence_queries != 0
* node->fence_time == last_fenced_time

If all of those are true, check_fencing_done() just silently returns 0.

In all other cases I would see one of the messages "check_fencing %d done" or
"check_fencing %d wait" (the first one should stop that loop, BTW) in between
consecutive "Node %d/%s was last shot at: %s" messages.

>
>>
>> It prints the same date not so far ago (in that case).
>> I did catch it only once and cannot repeat yet. Date is printed correct
>> in a "normal" fencing circumstances.
>>
>>>
>>> Did you change it to the following?
>>> log_debug("Node %d was last shot at: %s", nodeid, ctime(*last_fenced_time));
>>
>> http://www.mail-archive.com/pacemaker [at] oss/msg09959.html
>> contains patches against 3.0.17 which I use. I only backported commits
>> to dlm_controld core from 3.1.1 (and 3.1.7 last days) to make it up2date
>> (they are minor).
>
> Ok, this (which was from my original patch) is wrong:
>
> + log_debug("Node %d/%s was last shot at: %s", nodeid,
> ctime(*last_fenced_time));

Agree, and I use
log_debug("Node %d/%s was last shot at: %s", nodeid, node_uname,
ctime(last_fenced_time));
Please see the patches included in the message referenced above (a little
below the backport of your original patch).

gcc sometimes is smart enough ;)

>
> The format string expects 3 parameters but there are only 2 supplied.
> This could easily result in what you're seeing.

So, no, that's not it.

>
>
>>
>> man ctime
>> char *ctime(const time_t *timep);
>>
>> int fence_node_time(int nodeid, uint64_t *last_fenced_time)
>> is called from check_fencing_done() with
>> uint64_t last_fenced_time;
>> rv = fence_node_time(node->nodeid, &last_fenced_time);
>> so, I changed it to ctime(last_fenced_time). btw ctime adds trailing
>> newline, so it badly fits for logs.
>>
>> One thought: may be last commits to dlm.git (with membership monitoring,
>> notably e529211682418a8e33feafc9f703cff87e23aeba) may help here?
>>
>> And one note - I use fence_xvm for that failed VM, and I found that it
>> is a little bit deficient - only one instance of it can be run on a host
>> simultaneously as it binds to the predefined TCP port. May be that may
>> influence as well...
>>
>>>
>>>> No other messages appear, although
>>>> fence_node_time() is called only from check_fencing_done() (cpg.c near
>>>> 444). So, both of (last_fenced_time >= node->fail_time) and
>>>> (!node->fence_queries || node->fence_time != last_fenced_time) are
>>>> false, otherwise one of messages for that cases should be shown. Then,
>>>> fence_node_time() seems to return 0 from
>>>> if (wait_count)
>>>> return 0;
>>>> (wait_count is incremented if (last_fenced_time >= node->fail_time) is
>>>> false), so it never reaches check_fencing_done() call and never return
>>>> expected 1.
>>>> Offending node was actually fenced, but that was actually not handled by
>>>> dlm_controld.
>>>>
>>>> May I ask you to help me a bit with all that logic (as you already dived
>>>> into dlm_controld sources again), I seem to be so near the success... :|
>>>>
>>>> btw, I cant find what source is your dlm repo forked from, may be you
>>>> remember?
>>>
>>> iirc, it was dlm.git on fedorahosted.
>>
>> Yep, I found that already, pacemaker branch. It seems to be a little bit
>> outdated comparing to 3.0.17 btw.
>>
>>>
>>>>
>>>> Best,
>>>> Vladislav
>>>>
>>>> 28.09.2011 17:41, Vladislav Bogdanov wrote:
>>>>> Hi Andrew,
>>>>>
>>>>>>> All the more reason to start using the stonith api directly.
>>>>>>> I was playing around list night with the dlm_controld.pcmk code:
>>>>>>> https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787
>>>>>>
>>>>>> Doesn't seem to apply to 3.0.17, so I rebased that commit against it for
>>>>>> my build. Then it doesn't compile without attached patch.
>>>>>> It may need to be rebased a bit against your tree.
>>>>>>
>>>>>> Now I have package built and am building node images. Will try shortly.
>>>>>
>>>>> Fencing from within dlm_controld.pcmk still did not work with your first
>>>>> patch against that _no_mainloop function (expected).
>>>>>
>>>>> So I did my best to build packages from the current git tree.
>>>>>
>>>>> Voila! I got failed node correctly fenced!
>>>>> I'll do some more extensive testing next days, but I believe everything
>>>>> should be much better now.
>>>>>
>>>>> I knew you're genius he-he ;)
>>>>>
>>>>> So, here are steps to get DLM handle CPG NODEDOWN events correctly with
>>>>> pacemaker using openais stack:
>>>>>
>>>>> 1. Build pacemaker (as of 2011-09-28) from git.
>>>>> 2. Apply attached patches to cluster-3.0.17 source tree.
>>>>> 3. Build dlm_controld.pcmk
>>>>>
>>>>> One note - gfs2_controld probably needs to be fixed too (FIXME).
>>>>>
>>>>> Best regards,
>>>>> Vladislav
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker [at] oss
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>>>
>>>>
>>
>>


_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


bubble at hoster-ok

Dec 1, 2011, 6:32 AM

Post #23 of 43 (3729 views)
Permalink
Re: [Partially SOLVED] pacemaker/dlm problems [In reply to]

Hi Andrew,

I investigated on my test cluster what actually happens with dlm and
fencing.

I added more debug messages to the dlm dump output, and also re-kicked
nodes after some time.

The result is that the stonith history doesn't actually contain any
information until pacemaker decides to fence the node itself.

The test case I used is: killall -9 dlm_controld.pcmk on one node.
After that I see in the dlm dump:
1322748122 dlm:controld conf 3 0 1 memb 1074005258 1090782474 1124336906 join left 1107559690
1322748122 dlm:ls:clvmd conf 3 0 1 memb 1074005258 1090782474 1124336906 join left 1107559690
1322748122 clvmd add_change cg 7 remove nodeid 1107559690 reason 5
1322748122 Requested that node 1107559690 be kicked from the cluster
1322748122 clvmd add_change cg 7 counts member 3 joined 0 remove 1 failed 1
1322748122 clvmd stop_kernel cg 7
1322748122 write "0" to "/sys/kernel/dlm/clvmd/control"
1322748122 It does not appear node 1107559690/vd01-c has been shot
1322748122 clvmd check_fencing 1107559690 wait add 1322748073 fail 1322748122 last 0
1322748122 It does not appear node 1107559690/vd01-c has been shot
1322748123 It does not appear node 1107559690/vd01-c has been shot
...
1322748133 It does not appear node 1107559690/vd01-c has been shot
1322748133 Requested that node 1107559690 be kicked from the cluster
1322748134 It does not appear node 1107559690/vd01-c has been shot
...
1322748276 It does not appear node 1107559690/vd01-c has been shot
1322748276 Requested that node 1107559690 be kicked from the cluster
1322748277 It does not appear node 1107559690/vd01-c has been shot
1322748278 It does not appear node 1107559690/vd01-c has been shot
1322748279 It does not appear node 1107559690/vd01-c has been shot
1322748280 It does not appear node 1107559690/vd01-c has been shot
1322748281 It does not appear node 1107559690/vd01-c has been shot
1322748282 It does not appear node 1107559690/vd01-c has been shot
1322748283 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748284 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748285 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748286 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748287 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748288 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748289 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748290 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748291 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748292 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748293 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748294 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748295 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748296 Processing membership 488
1322748296 Skipped active node 1124336906: born-on=476, last-seen=488, this-event=488, last-event=484
1322748296 Skipped active node 1074005258: born-on=484, last-seen=488, this-event=488, last-event=484
1322748296 Skipped active node 1090782474: born-on=464, last-seen=488, this-event=488, last-event=484
1322748296 del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/1107559690"
1322748296 Removed inactive node 1107559690: born-on=468, last-seen=484, this-event=488, last-event=484
1322748296 Stonith history[0]: Node 1107559690/vd01-c fenced at 1322748296
1322748296 Node 1107559690/vd01-c was last shot at: 1322748296
1322748296 clvmd check_fencing 1107559690 done add 1322748073 fail 1322748122 last 1322748296

So, the first stonith history entry appeared only 161 seconds after the
initial fencing attempt.
And that corresponds to the following log lines (1322748283 = Dec 01 2011,
14:04:43 UTC):
Dec 1 14:04:42 vd01-b pengine: [1894]: WARN: stage6: Scheduling Node vd01-c for STONITH
Dec 1 14:04:42 vd01-b pengine: [1894]: WARN: native_stop_constraints: Stop of failed resource dlm:2 is implicit after vd01-c is fenced
Dec 1 14:04:42 vd01-b pengine: [1894]: WARN: native_stop_constraints: Stop of failed resource clvmd:2 is implicit after vd01-c is fenced

From my PoV that means that the call to
crm_terminate_member_no_mainloop() does not actually schedule a fencing
operation.
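
For completeness, the periodic re-kick mentioned at the top of this
message could be wired in roughly like this (a sketch only, not the
actual change I tested; the retry interval, the cluster_fd variable and
the maybe_rekick() helper are made up, and the prototype is assumed to
match pacemaker's crm/cluster.h):

    #include <stdint.h>
    #include <time.h>

    #define KICK_RETRY_SECS 30      /* arbitrary retry interval for the test */

    extern int cluster_fd;          /* the existing cluster connection fd */
    int crm_terminate_member_no_mainloop(int nodeid, const char *uname,
                                         int *connection);

    static void maybe_rekick(int nodeid, uint64_t fail_time,
                             uint64_t last_fenced_time, time_t *last_kick)
    {
            time_t now = time(NULL);

            if (last_fenced_time >= fail_time)
                    return;         /* node already fenced since it failed */
            if (now - *last_kick < KICK_RETRY_SECS)
                    return;         /* give the previous request time to act */

            int fd = cluster_fd;
            crm_terminate_member_no_mainloop(nodeid, NULL, &fd);
            *last_kick = now;
    }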

Best,
Vladislav


_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


andrew at beekhof

Dec 8, 2011, 4:11 PM

Post #24 of 43 (3617 views)
Permalink
Re: [Partially SOLVED] pacemaker/dlm problems [In reply to]

On Fri, Dec 2, 2011 at 1:32 AM, Vladislav Bogdanov <bubble [at] hoster-ok> wrote:
> Hi Andrew,
>
> I investigated on my test cluster what actually happens with dlm and
> fencing.
>
> I added more debug messages to dlm dump, and also did a re-kick of nodes
> after some time.
>
> Results are that stonith history actually doesn't contain any
> information until pacemaker decides to fence node itself.

...

> From my PoV that means that the call to
> crm_terminate_member_no_mainloop() does not actually schedule fencing
> operation.

You're going to have to remind me... what does your copy of
crm_terminate_member_no_mainloop() look like?
This is with the non-cman editions of the controlds too, right?

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


andrew at beekhof

Dec 8, 2011, 4:15 PM

Post #25 of 43 (3614 views)
Permalink
Re: [Partially SOLVED] pacemaker/dlm problems [In reply to]

On Thu, Nov 24, 2011 at 6:21 PM, Vladislav Bogdanov
<bubble [at] hoster-ok> wrote:
> 24.11.2011 08:49, Andrew Beekhof wrote:
>> On Thu, Nov 24, 2011 at 3:58 PM, Vladislav Bogdanov
>> <bubble [at] hoster-ok> wrote:
>>> 24.11.2011 07:33, Andrew Beekhof wrote:
>>>> On Tue, Nov 15, 2011 at 7:36 AM, Vladislav Bogdanov
>>>> <bubble [at] hoster-ok> wrote:
>>>>> Hi Andrew,
>>>>>
>>>>> I just found another problem with dlm_controld.pcmk (with your latest
>>>>> patch from github applied and also my fixes to actually build it - they
>>>>> are included in a message referenced by this one).
>>>>> One node which just requested fencing of another one stucks at printing
>>>>> that message where you print ctime() in fence_node_time() (pacemaker.c
>>>>> near 293) every second.
>>>>
>>>> So not blocked, it just keeps repeating that message?
>>>> What date does it print?
>>>
>>> Blocked... kern_stop
>>
>> I'm confused.
>
> As well as me...
>
>> How can it do that every second?
>
> Only in one case:

I'm clearly not a kernel guy, but once the kernel is stopped, wouldn't
it be doing nothing?
How could the system re-hit the same condition if it's stopped?

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
