Mailing List Archive: Linux-HA: Dev

drbd peer outdater exit codes

 

 

r.bhatia at ipax

Sep 12, 2008, 2:34 AM

Post #1 of 6
drbd peer outdater exit codes

hello drbd-user,
hello linux-ha-dev,

[0] shows a list of exit codes of the "peer outdater interface".

during my tests i encountered two other exit codes, which might be worth
mentioning somewhere (i do not know if this should be in the drbd
user-guide or somewhere at linux-ha):

for the record, i use "on node_x" instead of the real host name
in my drbd.conf, as suggested in [1].


on wc01 aka node_0, i issue:
> # /usr/lib/heartbeat/drbd-peer-outdater -p host -r mysql; echo $?
> 20

on wc02 aka node_1, i see:
> /usr/lib/heartbeat/dopd[31027]: 2008/09/12_11:17:24 info: unknown exit code from drbdadm outdate mysql: 10
> /usr/lib/heartbeat/dopd[31027]: 2008/09/12_11:17:24 info: sending return code: 20, wc02 -> wc01

drbdadm exit code 10 is the usual thing i see when __DRBD_NODE__ is not
correctly set:

> 2 ~ # drbdadm state all; echo $?
> /etc/drbd.conf:475: in resource www, on node_0 { ... } ... on node_1 { ... }:
> There are multiple host sections for the peer.
> Maybe misspelled local host name 'wc02'?
> ...
> 10

moreover, in the dopd source i find:
> * other => 20 (which is "officially undefined",
> * unspecified error, could not be outdated)

which explains the conversion from return code 10 to return code 20.
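
A minimal sketch of that conversion, in Python. Only the fallback value 20 comes from the dopd comment quoted above; the function name, the lookup table, and its single entry (0 -> 4) are assumptions made for illustration and are not copied from the dopd source:

```python
# Fallback taken from the dopd comment quoted above: "other => 20".
UNSPECIFIED_ERROR = 20

# Hypothetical table of recognised drbdadm exit codes. The single entry
# here is an illustrative assumption, not copied from the dopd source.
KNOWN_CODES = {0: 4}


def dopd_return_code(drbdadm_rc, known=KNOWN_CODES):
    """Translate a drbdadm exit code into the code dopd sends to the peer."""
    # Anything not listed in the table collapses into the fallback.
    return known.get(drbdadm_rc, UNSPECIFIED_ERROR)


if __name__ == "__main__":
    print(dopd_return_code(10))  # prints 20
```

Under these assumptions, the drbdadm exit code 10 seen on wc02 is not in the table and falls through to 20, which matches the two log lines above.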

so i would suggest to mention this at the relevant places, e.g. [0] and
maybe at [1]. what do you think?

cheers,
raoul
[0] http://www.drbd.org/users-guide/s-outdate-peer.html
[1] http://www.linux-ha.org/DRBD/HowTov2#head-d708dad9dda821dbd1f53296bedebd74339dfa82
--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc. email. r.bhatia [at] ipax
Technischer Leiter

IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
Barawitzkagasse 10/2/2/11 email. office [at] ipax
1190 Wien tel. +43 1 3670030
FN 277995t HG Wien fax. +43 1 3670030 15
____________________________________________________________________
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


lars.ellenberg at linbit

Sep 12, 2008, 2:54 AM

Post #2 of 6
Re: [DRBD-user] drbd peer outdater exit codes [drbd ocf floating peers not working. don't try]

On Fri, Sep 12, 2008 at 11:34:46AM +0200, Raoul Bhatia [IPAX] wrote:
> hello drbd-user,
> hello linux-ha-dev,
>
> [0] shows a list of exit codes of the "peer outdater interface".
>
> during my tests i encountered two other exit codes, which might be worth
> mentioning somewhere (i do not know if this should be in the drbd
> user-guide or somewhere at linux-ha):
>
> for the record, i use "on node_x" instead of the real host name
> in my drbd.conf, as suggested in [1].

don't do that.
won't work with any handlers.
for the "floating peer" stuff to work in practice,
we'd need much more work in the user land tools.

while a neat idea in theory,
it does not work out in practice.

it even uses an undocumented "hack",
namely setting the "__DRBD_NODE__" environment variable,
which was introduced solely to make it easier for me
to "sanity check" drbd config files sent to me to comment on,
using the "dump" and "-d up" and similar commands.
it was never intended to be actually used.

so the whole "floating peer" stuff is a hack in itself.
does anybody out there really use it?

it might be possible to get this hack sort of working
by adding more hacks. like telling the kernel which
"nodename" to fake when calling the user space helpers.

but I don't think that would be a good idea.

--
: Lars Ellenberg
: LINBIT HA-Solutions GmbH
: DRBD®/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks
of LINBIT Information Technologies GmbH
__
please don't Cc me, but send to list -- I'm subscribed


r.bhatia at ipax

Sep 12, 2008, 4:53 AM

Post #3 of 6
Re: [DRBD-user] drbd peer outdater exit codes [drbd ocf floating peers not working. don't try]

Lars Ellenberg wrote:
> don't do that.
> won't work with any handlers.
> for the "floating peer" stuff to work in practice,
> we'd need much more work in the user land tools.
>
> while a neat idea in theory,
> it does not work out in practice.
>
> it even uses an undocumented "hack",
> namely setting the "__DRBD_NODE__" environment variable,
> which was introduced solely to make it easier for me
> to "sanity check" drbd config files sent to me to comment on,
> using the "dump" and "-d up" and similar commands.
> it was never intended to be actually used.

ok, i never realized that as it was explicitly mentioned in
the linux-ha documentation.

> so the whole "floating peer" stuff is a hack in itself.
> does anybody out there really use it?

well, if you have not got a dedicated storage subsystem with
2/3 nodes but rely on e.g. 2 nodes out of 5 doing the mirroring
and offering files via nfs, or
you do not care where service x is started but need the drbd device
for it to function properly, or
you want to have some "hot-spare-nodes" for your cluster, it sounds
like "a neat idea" :)

but i've only used it in cases where i too would be able to use the
actual hostname(s).

> it might be possible to get this hack sort of working
> by adding more hacks. like telling the kernel which
> "nodename" to fake when calling the user space helpers.
>
> but I don't think that would be a good idea.

correct me if i am wrong but saving the node's alias during drbdadm
attach/up/etc. shouldn't be that bad, because it already has been used
to find the right configuration from the drbd.conf file anyway.

cheers,
raoul


lmb at suse

Sep 12, 2008, 11:36 AM

Post #4 of 6
Re: Re: [DRBD-user] drbd peer outdater exit codes [drbd ocf floating peers not working. don't try]

On 2008-09-12T11:54:38, Lars Ellenberg <lars.ellenberg [at] linbit> wrote:

> don't do that.
> won't work with any handlers.
> for the "floating peer" stuff to work in practice,
> we'd need much more work in the user land tools.
>
> while a neat idea in theory,
> it does not work out in practice.

That's simply not true. It works, but it does not work with dopd. That
is a difference. ;-)

> so the whole "floating peer" stuff is a hack in itself.
> does anybody out there really use it?

Yes, I think we have one or two customers using it to move the peers. Each
rack has access to a separate SAN.

> it might be possible to get this hack sort of working
> by adding more hacks. like telling the kernel which
> "nodename" to fake when calling the user space helpers.
>
> but I don't think that would be a good idea.

I actually think that dopd is the real hack, and drbd instead should
listen to the notifications we provide, and infer the peer state by that
means ... ;-)



Regards,
Lars

--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



lars.ellenberg at linbit

Sep 12, 2008, 1:15 PM

Post #5 of 6
Re: Re: [DRBD-user] drbd peer outdater exit codes [drbd ocf floating peers not working. don't try]

On Fri, Sep 12, 2008 at 08:36:14PM +0200, Lars Marowsky-Bree wrote:
> On 2008-09-12T11:54:38, Lars Ellenberg <lars.ellenberg [at] linbit> wrote:
>
> > don't do that.
> > won't work with any handlers.
> > for the "floating peer" stuff to work in practice,
> > we'd need much more work in the user land tools.
> >
> > while a neat idea in theory,
> > it does not work out in practice.
>
> That's simply not true. It works, but it does not work with dopd. That
> is a difference. ;-)

and it does not work with _any_ handler.
by default, starting with 8.2.6 iirc,
we call the "before-resync-target" handler,
whether that is configured or not,
before we become sync target.
if that does not know what node it is,
it will return != 0.
so resync (and connection) is aborted.
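
A rough Python model of that failure path. It assumes the handler is started without __DRBD_NODE__ in its environment and therefore falls back to the real node name; the function names are hypothetical, and the status 10 is simply the value drbdadm printed in the example from post #1, not a documented constant:

```python
EXIT_OK = 0
EXIT_CONFIG_ERROR = 10  # the status drbdadm printed in the example from post #1


def local_section(sections, nodename, env):
    """Return the "on <name>" section this node should use, or None."""
    # The override wins when present; otherwise use the real node name.
    name = env.get("__DRBD_NODE__", nodename)
    return name if name in sections else None


def run_handler(sections, nodename, env):
    """Model a handler that must identify the local node before acting."""
    if local_section(sections, nodename, env) is None:
        return EXIT_CONFIG_ERROR
    return EXIT_OK


def resync_allowed(handler_rc):
    """Any non-zero handler status aborts the resync (and the connection)."""
    return handler_rc == EXIT_OK


if __name__ == "__main__":
    sections = {"node_0", "node_1"}
    # Handler started without the override: wc02 matches no section.
    print(resync_allowed(run_handler(sections, "wc02", {})))  # prints False
```

With the override set by hand (env = {"__DRBD_NODE__": "node_1"}) the same call succeeds, which is why the setup looks fine from an interactive shell and still fails once a handler is involved.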


> > so the whole "floating peer" stuff is a hack in itself.
> > does anybody out there really use it?
>
> Yes, I think we have one or two customers use it to move the peers. Each
> rack has access to a separate SAN.

using drbd 0.7 ?

> > it might be possible to get this hack sort of working
> > by adding more hacks. like telling the kernel which
> > "nodename" to fake when calling the user space helpers.
> >
> > but I don't think that would be a good idea.
>
> I actually think that dopd is the real hack, and drbd instead should

dopd is a hack in itself.
but it is not the problem here.

the problem is to pretend to be someone which you are not,
and rely on the fragile hope that either no-one cares,
or that impersonation would somehow be propagated.

but, maybe we actually will propagate it starting with drbd 8.2.7.
I'm not sure yet.
maybe I simply remove the __DRBD_NODE__ hack instead.
;-)

> listen to the notifications we provide, and infer the peer state by that
> means ... ;-)

yeah. I asked you before,
what exactly that would look like,
and so far I saw only handwaving.

yes, dopd is a hack.
but right now it's the only thing that can do what it does,
namely prevent (ok, reduce the chance of) going online with stale data.

tell me how to get that done using "higher level"
heartbeat/crm/pacemaker constructs, and I'm happy to do that.
I'd much rather see dopd (the functionality) be implemented
in "higher levels" than to have to port that to OpenAIS or whatever
other low level cluster communications infrastructure there is to come.



lmb at suse

Sep 12, 2008, 1:27 PM

Post #6 of 6
Re: Re: [DRBD-user] drbd peer outdater exit codes [drbd ocf floating peers not working. don't try]

On 2008-09-12T22:15:41, Lars Ellenberg <lars.ellenberg [at] linbit> wrote:

> > That's simply not true. It works, but it does not work with dopd. That
> > is a difference. ;-)
> and it does not work with _any_ handler.
> by default, starting with 8.2.6 iirc,
> we call the "before-resync-target" handler,
> whether that is configured or not,
> before we become sync target.
> if that does not know what node it is,
> it will return != 0.
> so resync (and connection) is aborted.

Ah, right; I admit I was thinking drbd 0.7, not drbd8. Indeed, drbd8 has
the restrictions you mention.

But I'd really like to get this kind of functionality for drbd8 too.

> the problem is to pretend to be someone which you are not,
> and rely on the fragile hope that either no-one cares,
> or that impersonation would somehow be propagated.

Well, it worked for the use case we had.

> > listen to the notifications we provide, and infer the peer state by that
> > means ... ;-)
> yeah. I asked you before,
> what exactly that would look like,
> and so far I saw only handwaving.

Hm, I don't think there was hand-waving. Sorry. What was unclear?

You get notifications when the peer starts or goes down (or is fenced,
which looks the same). This is not yet relayed to drbd internally (just
the RA gets the notification so far), but we could, for example, call
"standalone" explicitly to disconnect; we can discuss this mechanism.

When drbd loses the peer internally without us providing the
notification, either the replication link crashed, fencing failed,
or quorum was lost; in any case, you'd "outdate" yourself (and freeze
io) until this notification was provided (which of course needs to be
persistent across reboots).

Wouldn't that work?
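
To make the proposal concrete, here is a small Python sketch of the state handling described above. The class and method names are hypothetical, persistence across reboots is only indicated by a comment, and clearing the "outdated" flag when the notification arrives is an assumption, since the text only says the state lasts "until this notification was provided":

```python
from dataclasses import dataclass


@dataclass
class DrbdResource:
    peer_down_confirmed: bool = False  # would have to survive reboots
    outdated: bool = False
    io_frozen: bool = False

    def on_peer_lost(self):
        """drbd notices internally that the peer is gone."""
        # Without a notification from the cluster manager the cause is
        # unknown (link crash, failed fencing, lost quorum): play it safe.
        if not self.peer_down_confirmed:
            self.outdated = True
            self.io_frozen = True

    def on_cluster_notification(self):
        """The cluster manager reports the peer as stopped or fenced."""
        self.peer_down_confirmed = True
        self.outdated = False  # assumption: the notification lifts the state
        self.io_frozen = False


if __name__ == "__main__":
    res = DrbdResource()
    res.on_peer_lost()
    print(res.io_frozen)  # prints True
    res.on_cluster_notification()
    print(res.io_frozen)  # prints False
```

The order of events is the point of the sketch: a peer loss that arrives after the notification leaves io untouched, while a peer loss without it freezes io until the notification shows up.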

> tell me how to get that done using "higher level"
> heartbeat/crm/pacemaker constructs, and I'm happy to do that.

Yes, sorry if that wasn't clear, and this is a good discussion to have.
I was very busy in the last few weeks, and apologize if I dropped the
ball somewhere.


Regards,
Lars

