
Mailing List Archive: Linux-HA: Dev

Fwd: [PATCH] call validate-all when monitoring with OCF_RESKEY_CRM_meta_interval=0

r.bhatia at ipax

Nov 10, 2009, 3:35 AM

Post #1 of 7
Fwd: [PATCH] call validate-all when monitoring with OCF_RESKEY_CRM_meta_interval=0

hi,

i've got the patch below waiting in my repository. i remember a
discussion with andrew and/or lars on irc but cannot recall the
exact reasons for doing this in mysqlproxy_monitor().

maybe you can figure it out and then either apply or deny my patch ;)

cheers,
raoul

-------- Original Message --------
Subject: [PATCH] call validate-all when monitoring with
OCF_RESKEY_CRM_meta_interval=0
Date: Fri, 18 Sep 2009 16:18:03 +0200
From: Raoul Bhatia [IPAX] <r.bhatia [at] ipax>
To: r.bhatia [at] ipax

# HG changeset patch
# User Raoul Bhatia [IPAX] <r.bhatia [at] ipax>
# Date 1253283431 -7200
# Node ID 751cdef555dee4af414be66de3919a22896c8310
# Parent be501346e016014a14a3078d4af0e824331586fb
call validate-all when monitoring with OCF_RESKEY_CRM_meta_interval=0

diff -r be501346e016 -r 751cdef555de heartbeat/mysql-proxy
--- a/heartbeat/mysql-proxy Wed Sep 16 13:52:48 2009 +0200
+++ b/heartbeat/mysql-proxy Fri Sep 18 16:17:11 2009 +0200
@@ -349,6 +349,14 @@

mysqlproxy_monitor()
{
+ if [ "${OCF_RESKEY_CRM_meta_interval:-0}" -eq "0" ]; then
+ # in case of probe, monitor operation is surely treated as
+ # under suspension. This will call start operation.
+ # (c/p from ocf:heartbeat:sfex)
+ mysqlproxy_validate_all
+ return $?
+ fi
+
if mysqlproxy_status ; then
return $OCF_SUCCESS
fi
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


dejanmm at fastmail

Nov 10, 2009, 4:57 AM

Post #2 of 7
Re: Fwd: [PATCH] call validate-all when monitoring with OCF_RESKEY_CRM_meta_interval=0

Hi,

On Tue, Nov 10, 2009 at 12:35:29PM +0100, Raoul Bhatia [IPAX] wrote:
> hi,
>
> i've got the patch below waiting in my repository. i remember a
> discussion with andrew and/or lars on irc but cannot recall the
> exact reasons for doing this in mysqlproxy_monitor().
>
> maybe you can figure it out and then either apply or deny my patch ;)

Since the probe is the first action on a resource, it does make
sense to do the validation there, though I'm not sure why it
shouldn't always be done on monitor.
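A probe is a one-off monitor with an interval of 0, which is exactly what the patch tests for. A minimal sketch of that test as a reusable helper; the name is_probe and passing the action name as $1 are assumptions for illustration, not part of the mysql-proxy agent:

```sh
#!/bin/sh
# Hypothetical helper: succeeds when this invocation is a probe, i.e. a
# "monitor" action without a recurring interval. $1 is the action name the
# agent was called with (assumption: the caller passes it through).
# An unset interval is treated as 0, as in the patch.
is_probe() {
    [ "$1" = "monitor" ] && [ "${OCF_RESKEY_CRM_meta_interval:-0}" -eq 0 ]
}
```

An agent could call it as `is_probe "$1"` before deciding which checks to run.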

The patch is good in principle, but not so in implementation.
I'll fix it and apply.

Cheers,

Dejan



lmb at suse

Nov 10, 2009, 3:16 PM

Post #3 of 7
Re: Fwd: [PATCH] call validate-all when monitoring with OCF_RESKEY_CRM_meta_interval=0

On 2009-11-10T13:57:48, Dejan Muhamedagic <dejanmm [at] fastmail> wrote:

> The patch is good in principle, but not so in implementation.
> I'll fix it and apply.

The problem is that the mysqld binary might well reside on the shared
device. If you error out with ERR_CONFIGURED in the probe, it will never
get started.

The patch is buggy.

validate-all is only allowed to do _syntax_ checks, not _semantic_ ones.
You can basically only check for all prerequisites at start time,
because only at that point must all dependencies be satisfied.
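A sketch of this syntax/semantic split for a hypothetical agent with a single OCF_RESKEY_binary parameter (the name is assumed for illustration). The validate function only inspects the parameter's value, so it gives the same answer whether or not shared storage is mounted; the existence check is deferred to a separate function meant to be called from start.

```sh
#!/bin/sh
# Standard OCF exit codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_ERR_CONFIGURED=6

# Syntax only: looks at the parameter value, never at the filesystem.
# OCF_RESKEY_binary is a hypothetical parameter name.
proxy_validate_all() {
    case "$OCF_RESKEY_binary" in
        "") return $OCF_ERR_CONFIGURED ;;  # required parameter missing
        /*) return $OCF_SUCCESS ;;         # absolute path: well-formed
        *)  return $OCF_ERR_CONFIGURED ;;  # assumption: relative paths rejected
    esac
}

# Semantic check: call from start only, when every resource this one
# depends on (e.g. the Filesystem holding the binary) must already be up.
proxy_check_prereqs() {
    proxy_validate_all || return $?
    [ -x "$OCF_RESKEY_binary" ] || return $OCF_ERR_INSTALLED
    return $OCF_SUCCESS
}
```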

This is a common mistake; we remove such checks when we find them, but
please don't add new ones ;-)


Regards,
Lars

--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



florian.haas at linbit

Nov 11, 2009, 1:14 AM

Post #4 of 7
Re: Fwd: [PATCH] call validate-all when monitoring with OCF_RESKEY_CRM_meta_interval=0

On 2009-11-11 00:16, Lars Marowsky-Bree wrote:
> On 2009-11-10T13:57:48, Dejan Muhamedagic <dejanmm [at] fastmail> wrote:
>
>> The patch is good in principle, but not so in implementation.
>> I'll fix it and apply.
>
> The problem is that the mysqld binary might well reside on the shared
> device. If you error out with ERR_CONFIGURED in the probe, it will never
> get started.
>
> The patch is buggy.
>
> validate-all is only allowed to do _syntax_ checks, not _semantic_ ones.
> You can only check for all pre-requisites at start time, basically,
> because only then all dependencies must be satisfied.
>
> This is a common mistake, we remove them when we find them, but please
> don't add new ones ;-)

Maybe I'm missing something, but I don't follow that. If that is indeed
a mistake, then please replace "common" with "ubiquitous", as I believe
that at least checking for binaries, and erroring out if they are not
present, is something almost all resource agents do.

Examples (by no means exhaustive):

- mysql
- pgsql
- Route
- VirtualDomain
- Filesystem

Some of these check for binaries inside validate, some in a part of the
script that is unconditionally executed every time it's invoked
(including during validate).

So please, either revisit that policy, or fix all those RAs.

Cheers,
Florian


lmb at suse

Nov 11, 2009, 1:45 AM

Post #5 of 7
Re: Fwd: [PATCH] call validate-all when monitoring with OCF_RESKEY_CRM_meta_interval=0

On 2009-11-11T10:14:29, Florian Haas <florian.haas [at] linbit> wrote:

> > This is a common mistake, we remove them when we find them, but please
> > don't add new ones ;-)
>
> Maybe I'm missing something, but I don't follow that. If that is indeed
> a mistake, then please replace "common" with "ubiquitous" as I believe
> at least checking for binaries and erroring out if they are not present
> is something that almost all resource agents do.
>
> Examples (by no means exhaustive):
>
> - mysql
> - pgsql
> - Route
> - VirtualDomain
> - Filesystem
>
> Some of these check for binaries inside validate, some in a part of the
> script that is unconditionally executed every time it's invoked
> (including during validate).
>
> So please, either revisit that policy, or fix all those RAs.

The point being that it doesn't even need to be a policy; it is simply
an observation.

"Filesystem" is quite probably fine, because it just relies on standard
system binaries, and none of the binary paths are configurable. (Same
applies to Route/VirtualDomain.) Further, it is quite unlikely that
Filesystem needs to have a special cluster fs mounted, since one would
use Filesystem to do that ;-)

For mysql, pgsql, apache or others with configurable paths, it is
possible (and sometimes implemented as such) that they reference a
binary/file on cluster-managed storage. That storage won't be mounted
yet during a normal start-up, so a probe that tries to verify the path
will fail.

Simply put: if you're checking for dependencies possibly provided by
other resources, these won't be present at probe time. (Kind of obvious,
really.)
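One way a monitor could respect this, sketched with hypothetical names (proxy_monitor, OCF_RESKEY_binary, OCF_RESKEY_pidfile). It assumes that if the configured binary is not visible during a probe, the storage holding it is not mounted on this node and the service cannot be running here, so it reports OCF_NOT_RUNNING instead of an error. Outside a probe the binary is not consulted at all; the process check alone decides.

```sh
#!/bin/sh
# Standard OCF exit codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical monitor for an agent whose binary may live on
# cluster-managed storage.
proxy_monitor() {
    # Probe (interval 0) and the binary is not visible: assume its storage
    # is not mounted here, so the service cannot be running on this node.
    if [ "${OCF_RESKEY_CRM_meta_interval:-0}" -eq 0 ] &&
       [ ! -x "$OCF_RESKEY_binary" ]; then
        return $OCF_NOT_RUNNING
    fi
    # Otherwise decide from the process itself: is the PID in the pidfile alive?
    if [ -f "$OCF_RESKEY_pidfile" ] &&
       kill -0 "$(cat "$OCF_RESKEY_pidfile")" 2>/dev/null; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```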

So yes, all RAs which unconditionally check for files which could
realistically be on cluster-managed storage need to be fixed.

(pgsql would benefit immensely from using check_binary directly, too;
but that script also assumes that a missing binary means the service
is stopped. Ah, well.)


Regards,
Lars



dejanmm at fastmail

Nov 11, 2009, 3:22 AM

Post #6 of 7
Re: Fwd: [PATCH] call validate-all when monitoring with OCF_RESKEY_CRM_meta_interval=0

Hi,

On Wed, Nov 11, 2009 at 10:45:13AM +0100, Lars Marowsky-Bree wrote:
> On 2009-11-11T10:14:29, Florian Haas <florian.haas [at] linbit> wrote:
>
> > > This is a common mistake, we remove them when we find them, but please
> > > don't add new ones ;-)
> >
> > Maybe I'm missing something, but I don't follow that. If that is indeed
> > a mistake, then please replace "common" with "ubiquitous" as I believe
> > at least checking for binaries and erroring out if they are not present
> > is something that almost all resource agents do.
> >
> > Examples (by no means exhaustive):
> >
> > - mysql
> > - pgsql
> > - Route
> > - VirtualDomain
> > - Filesystem
> >
> > Some of these check for binaries inside validate, some in a part of the
> > script that is unconditionally executed every time it's invoked
> > (including during validate).
> >
> > So please, either revisit that policy, or fix all those RAs.
>
> The point being that it doesn't even need to be a policy; it is simply
> an observation.
>
> "Filesystem" is quite probably fine, because it just relies on standard
> system binaries, and none of the binary paths are configurable. (Same
> applies to Route/VirtualDomain.) Further, it is quite unlikely that
> Filesystem needs to have a special cluster fs mounted, since one would
> use Filesystem to do that ;-)
>
> For mysql, pgsql, apache or others with configurable paths, it is
> possible (and sometimes implemented as such) that they reference a
> binary/file on cluster-managed storage. That won't be yet mounted on a
> normal start-up, and thus, if the probe tries to verify the path, fail.
>
> Simply put: if you're checking for dependencies possibly provided by
> other resources, these won't be present at probe time. (Kind of obvious,
> really.)

In this case, the CRM, which knows that dependencies are not
started, should ignore the INSTALLED error. But that's probably
an extra burden on an already complex piece of software. Or we
provide some facility to all resource agents which would handle
binaries residing on a filesystem that is not mounted yet. I mean,
this is an issue common to all resource agents.

> So yes, all RAs which unconditionally check for files which could
> realistically be on cluster managed storage need to be fixed.
>
> (pgsql would benefit immensely from using check_binary directly, too;
> but that script further believes that the binary not present means that
> the service is stopped. Ah, well.)

Recently, I opened a bugzilla for this issue:

http://developerbugs.linux-foundation.org/show_bug.cgi?id=2204

which is still open.

Thanks to Lars for the exhaustive account on the matter (which
one can read in the bugzilla too).

Dejan



lmb at suse

Nov 11, 2009, 4:09 AM

Post #7 of 7
Re: Fwd: [PATCH] call validate-all when monitoring with OCF_RESKEY_CRM_meta_interval=0

On 2009-11-11T12:22:48, Dejan Muhamedagic <dejanmm [at] fastmail> wrote:

> > Simply put: if you're checking for dependencies possibly provided by
> > other resources, these won't be present at probe time. (Kind of obvious,
> > really.)
>
> In this case, the CRM, which knows that dependencies are not
> started, should ignore the INSTALLED error.

No, that is silly, sorry. The RA simply shouldn't return the wrong
result.

For example, the ERR_INSTALLED error is quite proper to return if the
node will never have enough memory to support the resource, if it's
running the wrong kernel or architecture, or if a system binary is
missing or is the wrong version ... In that case, one wouldn't want the
CRM to try and start it there.

The distinction is quite similar to ERR_INSTALLED versus ERR_CONFIGURED;
the RA "simply" must return the proper value. It's almost like saying
the CRM should know that "ERR_GENERIC" on probe means stopped ...
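A sketch of picking the code by where the fault lies, with hypothetical names (check_environment, OCF_RESKEY_config, REQUIRED_TOOLS, REQUIRED_ARCH). In Pacemaker, ERR_CONFIGURED is generally treated as fatal for the resource on every node, whereas ERR_INSTALLED only rules out the node that reported it, which matches the cases listed above.

```sh
#!/bin/sh
# Standard OCF exit codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_ERR_CONFIGURED=6

# Hypothetical pre-start check; the code returned depends on where the fault lies.
check_environment() {
    # Missing mandatory parameter: wrong on every node -> configuration error.
    [ -n "$OCF_RESKEY_config" ] || return $OCF_ERR_CONFIGURED
    # Wrong architecture: a property of this node -> installation error.
    if [ -n "$REQUIRED_ARCH" ] && [ "$(uname -m)" != "$REQUIRED_ARCH" ]; then
        return $OCF_ERR_INSTALLED
    fi
    # Missing system tool: no other cluster resource will provide it on this
    # node -> installation error. Only list tools that can never live on
    # cluster-managed storage.
    for tool in $REQUIRED_TOOLS; do
        command -v "$tool" >/dev/null 2>&1 || return $OCF_ERR_INSTALLED
    done
    return $OCF_SUCCESS
}
```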

> But that's probably an extra burden on an already complex piece of
> software. Or we provide some facility to all resource agents which
> would deal with the binaries on a missing filesystem situation. I
> mean, this is an issue common to all resource agents.

We already have check_binary and have_binary; adding "probe_binary" and
"probe_path" (for non-executables) would be trivial.

Still, the RAs would need to be fixed accordingly.
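A minimal sketch of what the proposed probe_binary and probe_path could look like. These are hypothetical implementations of the names suggested here, not existing ocf-shellfuncs functions; RA_ACTION (the action the agent was invoked with) is likewise an assumed variable. Unlike check_binary, they return a code rather than exiting, so the caller decides:

```sh
#!/bin/sh
# Standard OCF exit codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_NOT_RUNNING=7

# Assumption: RA_ACTION holds the action name the agent was invoked with.
in_probe() {
    [ "$RA_ACTION" = "monitor" ] && [ "${OCF_RESKEY_CRM_meta_interval:-0}" -eq 0 ]
}

# Hypothetical probe_path: a missing file is an installation error, except
# during a probe, where it only means "not running here" because the
# storage providing it may not be mounted yet.
probe_path() {
    [ -e "$1" ] && return $OCF_SUCCESS
    in_probe && return $OCF_NOT_RUNNING
    return $OCF_ERR_INSTALLED
}

# Hypothetical probe_binary: same policy for executables; a bare name is
# looked up in PATH, a name containing "/" is tested directly.
probe_binary() {
    case "$1" in
        */*) [ -x "$1" ] && return $OCF_SUCCESS ;;
        *)   command -v "$1" >/dev/null 2>&1 && return $OCF_SUCCESS ;;
    esac
    in_probe && return $OCF_NOT_RUNNING
    return $OCF_ERR_INSTALLED
}
```

A caller would typically write `probe_binary "$OCF_RESKEY_binary" || exit $?`: a probe then ends with OCF_NOT_RUNNING, while a start ends with OCF_ERR_INSTALLED.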


Regards,
Lars
