
Mailing List Archive: DRBD: Users

3-node active/active/active config?

 

 



sujiannming at gmail

Nov 14, 2009, 10:08 PM

Post #1 of 9 (1506 views)
3-node active/active/active config?

Don't know if this has been discussed before, but it seems like with
8.3's dual primary support, it's possible to set up a 3-node, all
active drbd cluster.

Each node would be dual primary with each other node. Node A would
form drbd0 with node B and drbd1 with node C. Node B would have drbd0
with A and drbd2 with C. And, C would have drbd1 with A and drbd2
with B.

Then you can use software raid to create md0 on each node with its
respective drbd devices.

Node A: md0 (drbd0, drbd1)
Node B: md0 (drbd0, drbd2)
Node C: md0 (drbd1, drbd2)

Then format md0 on each node with a cluster filesystem like ocfs2 with
each node as a member of the cluster.

Do the drbd experts here have any thoughts on how this idea would go
horribly wrong? Thanks for any input and feedback.

--
Jiann-Ming Su
"I have to decide between two equally frightening options.
If I wanted to do that, I'd vote." --Duckman
"The system's broke, Hank. The election baby has peed in
the bath water. You got to throw 'em both out." --Dale Gribble
"Those who vote decide nothing.
Those who count the votes decide everything." --Joseph Stalin
_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


lars.ellenberg at linbit

Nov 17, 2009, 1:42 AM

Post #2 of 9 (1430 views)
Re: 3-node active/active/active config? [In reply to]

On Sun, Nov 15, 2009 at 01:08:30AM -0500, Jiann-Ming Su wrote:
> Don't know if this has been discussed before, but it seems like with
> 8.3's dual primary support, it's possible to set up a 3-node, all
> active drbd cluster.
>
> Each node would be dual primary with each other node. Node A would
> form drbd0 with node B and drbd1 with node C. Node B would have drbd0
> with A and drbd2 with C. And, C would have drbd1 with A and drbd2
> with B.
>
> Then you can use software raid to create md0 on each node with its
> respective drbd devices.
>
> Node A: md0 (drbd0, drbd1)
> Node B: md0 (drbd0, drbd2)
> Node C: md0 (drbd1, drbd2)
>
> Then format md0 on each node with a cluster filesystem like ocfs2 with
> each node as a member of the cluster.
>
> Do the drbd experts here have any thoughts on how this idea would go
> horribly wrong? Thanks for any input and feedback.


A-MD(A-DRBD0(A-sdx), A-DRBD1(A-sdy))
B-MD(B-DRBD0(B-sdx),                 B-DRBD2(B-sdy))
C-MD(                C-DRBD1(C-sdx), C-DRBD2(C-sdy))


So you write block N on A-MD.
It will hit A-DRBD0, and A-DRBD1,
which will hit local disks A-sdx, A-sdy,
and via DRBD replication B-sdx, and C-sdx.

Still with me?
Great.

Now, you want to read that block N on B,
which decides in its read balancing path to fetch
block N from B-sdy.

It will surely get _some_ data from there,
but certainly not the data you just wrote on A-MD.

To sum it up in one word:
DON'T.
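
The failure described above can be reproduced with a toy model. The
following is only a sketch under stated assumptions: each backing disk is
a Python dict, each DRBD device mirrors exactly two disks, and md RAID1 is
free to serve a read from either leg. The function names and the block
number are made up for illustration; none of this is DRBD or md code.

```python
# Toy model of the proposed layout: one dict per backing disk,
# mapping block number -> data. All names are illustrative.
disks = {name: {} for name in
         ("A-sdx", "A-sdy", "B-sdx", "B-sdy", "C-sdx", "C-sdy")}

# Each DRBD device mirrors exactly two backing disks (one per peer).
drbd = {
    "drbd0": ("A-sdx", "B-sdx"),   # A <-> B
    "drbd1": ("A-sdy", "C-sdx"),   # A <-> C
    "drbd2": ("B-sdy", "C-sdy"),   # B <-> C
}

# Each node's md0 is a RAID1 over its two DRBD devices; the second
# tuple lists the local disk behind each leg, in the same order.
md = {
    "A": (("drbd0", "drbd1"), ("A-sdx", "A-sdy")),
    "B": (("drbd0", "drbd2"), ("B-sdx", "B-sdy")),
    "C": (("drbd1", "drbd2"), ("C-sdx", "C-sdy")),
}

def md_write(node, block, data):
    """RAID1 write: hits both legs; DRBD replicates each leg to its peer."""
    for dev in md[node][0]:
        for disk in drbd[dev]:
            disks[disk][block] = data

def md_read(node, block, leg):
    """RAID1 read: md may pick either leg; DRBD reads the local disk."""
    return disks[md[node][1][leg]].get(block, "<stale>")

md_write("A", 7, "new data")
print(md_read("B", 7, leg=0))  # served from B-sdx
print(md_read("B", 7, leg=1))  # served from B-sdy
```

In this model the two reads on B disagree: the leg backed by B-sdx
returns the data written on A, while the leg backed by B-sdy never saw
the write.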

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


sujiannming at gmail

Nov 17, 2009, 7:10 AM

Post #3 of 9 (1432 views)
Re: [Drbd-dev] 3-node active/active/active config? [In reply to]

On Tue, Nov 17, 2009 at 4:42 AM, Lars Ellenberg
<lars.ellenberg [at] linbit> wrote:
> On Sun, Nov 15, 2009 at 01:08:30AM -0500, Jiann-Ming Su wrote:
>> Don't know if this has been discussed before, but it seems like with
>> 8.3's dual primary support, it's possible to set up a 3-node, all
>> active drbd cluster.
>>
>> Each node would be dual primary with each other node.  Node A would
>> form drbd0 with node B and drbd1 with node C.  Node B would have drbd0
>> with A and drbd2 with C.  And, C would have drbd1 with A and drbd2
>> with B.
>>
>> Then you can use software raid to create md0 on each node with its
>> respective drbd devices.
>>
>> Node A:  md0 (drbd0, drbd1)
>> Node B:  md0 (drbd0, drbd2)
>> Node C:  md0 (drbd1, drbd2)
>>
>> Then format md0 on each node with a cluster filesystem like ocfs2 with
>> each node as a member of the cluster.
>>
>> Do the drbd experts here have any thoughts on how this idea would go
>> horribly wrong?  Thanks for any input and feedback.
>
>
>  A-MD(A-DRBD0(A-sdx), A-DRBD1(A-sdy))
>  B-MD(B-DRBD0(B-sdx),                 B-DRBD2(B-sdy))
>  C-MD(                C-DRBD1(C-sdx), C-DRBD2(C-sdy))
>
>
> So you write block N on A-MD.
> It will hit A-DRBD0, and A-DRBD1,
>  which will hit local disks A-sdx, A-sdy,
>  and via DRBD replication B-sdx, and C-sdx.
>
> Still with me?
> Great.
>
> Now, you want to read that block N on B,
> which decides in its read balancing path to fetch
> block N from B-sdy.
>
> It will surely get _some_ data from there,
> but certainly not the data you just wrote on A-MD.
>
> To sum it up in one word:
> DON'T.
>

Thanks for the insight! So there's no easy way to verify a write has
sync'd across all the nodes? For the application I want to use this
type of 3-node config on, I think I'm willing to sacrifice the
performance for the data replication.

--
Jiann-Ming Su


lars.ellenberg at linbit

Nov 17, 2009, 9:50 AM

Post #4 of 9 (1433 views)
Re: 3-node active/active/active config? [In reply to]

On Tue, Nov 17, 2009 at 10:10:13AM -0500, Jiann-Ming Su wrote:
> On Tue, Nov 17, 2009 at 4:42 AM, Lars Ellenberg
> <lars.ellenberg [at] linbit> wrote:
> > On Sun, Nov 15, 2009 at 01:08:30AM -0500, Jiann-Ming Su wrote:
> >> Don't know if this has been discussed before, but it seems like with
> >> 8.3's dual primary support, it's possible to set up a 3-node, all
> >> active drbd cluster.
> >>
> >> Each node would be dual primary with each other node.  Node A would
> >> form drbd0 with node B and drbd1 with node C.  Node B would have drbd0
> >> with A and drbd2 with C.  And, C would have drbd1 with A and drbd2
> >> with B.
> >>
> >> Then you can use software raid to create md0 on each node with its
> >> respective drbd devices.
> >>
> >> Node A:  md0 (drbd0, drbd1)
> >> Node B:  md0 (drbd0, drbd2)
> >> Node C:  md0 (drbd1, drbd2)
> >>
> >> Then format md0 on each node with a cluster filesystem like ocfs2 with
> >> each node as a member of the cluster.
> >>
> >> Do the drbd experts here have any thoughts on how this idea would go
> >> horribly wrong?  Thanks for any input and feedback.
> >
> >
> >  A-MD(A-DRBD0(A-sdx), A-DRBD1(A-sdy))
> >  B-MD(B-DRBD0(B-sdx),                 B-DRBD2(B-sdy))
> >  C-MD(                C-DRBD1(C-sdx), C-DRBD2(C-sdy))
> >
> >
> > So you write block N on A-MD.
> > It will hit A-DRBD0, and A-DRBD1,
> >  which will hit local disks A-sdx, A-sdy,
> >  and via DRBD replication B-sdx, and C-sdx.
> >
> > Still with me?
> > Great.
> >
> > Now, you want to read that block N on B,
> > which decides in its read balancing path to fetch
> > block N from B-sdy.
> >
> > It will surely get _some_ data from there,
> > but certainly not the data you just wrote on A-MD.
> >
> > To sum it up in one word:
> > DON'T.
> >
>
> Thanks for the insight! So there's no easy way to verify a write has
> sync'd across all the nodes? For the application I want to use this
> type of 3-node config on, I think I'm willing to sacrifice the
> performance for the data replication.

No. You did not understand.

It is not a question of performance.
Or whether a write reached all *nodes*.

In your setup, it is technically *impossible*
for a write to reach all lower level *disks*.

A sure way to data corruption.
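
To make "impossible to reach all lower level disks" concrete, one can
enumerate which backing disks a single md0 write touches for each writing
node. This is a rough sketch using the topology from this thread; the
dictionary and function names are hypothetical, and it assumes every
DRBD device replicates to exactly one peer disk.

```python
# Which backing disks does a single md0 write reach, per writing node?
# Topology as described in the thread; names are illustrative.
DRBD_PAIRS = {
    "drbd0": {"A-sdx", "B-sdx"},
    "drbd1": {"A-sdy", "C-sdx"},
    "drbd2": {"B-sdy", "C-sdy"},
}
MD_LEGS = {
    "A": ("drbd0", "drbd1"),
    "B": ("drbd0", "drbd2"),
    "C": ("drbd1", "drbd2"),
}
ALL_DISKS = set().union(*DRBD_PAIRS.values())

def disks_reached(node):
    """Disks updated by one write to md0 on `node`."""
    return set().union(*(DRBD_PAIRS[dev] for dev in MD_LEGS[node]))

for node in sorted(MD_LEGS):
    reached = disks_reached(node)
    missed = sorted(ALL_DISKS - reached)
    print(f"write on {node}-MD reaches {len(reached)} of "
          f"{len(ALL_DISKS)} disks, misses {missed}")
```

Whichever node writes, the DRBD device it is not part of is never
updated, so two of the six disks always miss the write.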

Got me this time?

;)

So by all means: use one iSCSI on DRBD cluster,
and have any number of ocfs2 clients via iSCSI.
Or double check if NFS can do the trick for you.

again, your fancy cool and whatever setup won't work.
DO NOT DO THIS.

--
: Lars Ellenberg


sujiannming at gmail

Nov 17, 2009, 8:30 PM

Post #5 of 9 (1417 views)
Re: 3-node active/active/active config? [In reply to]

On Tue, Nov 17, 2009 at 12:50 PM, Lars Ellenberg
<lars.ellenberg [at] linbit> wrote:
>
> No. You did not understand.
>
> It is not a question of performance.
> Or whether a write reached all *nodes*.
>
> In your setup, it is technically *impossible*
> for a write to reach all lower level *disks*.
>

Can you give a brief explanation of why that is the case?

> again, your fancy cool and whatever setup won't work.
> DO NOT DO THIS.
>


Is the problem the dual path? What if a single path was used in some
stacked configuration where a drbd is used as a backing device for
another drbd share?


> A sure way to data corruption.
>
> Got me this time?
>
>  ;)
>
> So by all means: use one iSCSI on DRBD cluster,
> and have any number of ocfs2 clients via iSCSI.
> Or double check if NFS can do the trick for you.
>

We have three geographically independent locations that have to share
data, but still remain independent. iSCSI is an interesting idea
since it exposes a block device. But, I could ask the same silly
question I'm asking now about drbd: is it possible to come up with
some clever way to do a 3-node config? Or, will it run into the same
technical limitations that you say drbd has?

I'm almost better off synchronizing manually with rsync than
using iSCSI or NFS in the more conventional sense. I mean, currently,
we do a pretty simple two node DRBD config, but it'd be nice to add in
a third.

> again, your fancy cool and whatever setup won't work.
> DO NOT DO THIS.
>

The real question is whether this is a desirable configuration. If
so, what technical limitations would have to be overcome to make
something like this work?

--
Jiann-Ming Su


gianluca.cecchi at gmail

Nov 18, 2009, 12:12 AM

Post #6 of 9 (1413 views)
Re: 3-node active/active/active config? [In reply to]

On Wed, Nov 18, 2009 at 5:30 AM, Jiann-Ming Su <sujiannming [at] gmail> wrote:

>
> Can you give a brief explanation of why that is the case?
>
>
> Is the problem the dual path? What if a single path was used in some
> stacked configuration where a drbd is used as a backing device for
> another drbd share?
>
>
> --
> Jiann-Ming Su
>
>
It is wrong from a conceptual point of view, because as Lars already
explained, when you write on md0 of A, you are only writing on half md0 of B
(left side) and half md0 of C (right side).
So the other two md0 devices are not in sync with the original md0....
HIH,
Gianluca


gianluca.cecchi at gmail

Nov 18, 2009, 1:05 AM

Post #7 of 9 (1416 views)
Re: 3-node active/active/active config? [In reply to]

On Wed, Nov 18, 2009 at 9:12 AM, Gianluca Cecchi
<gianluca.cecchi [at] gmail> wrote:
>
> On Wed, Nov 18, 2009 at 5:30 AM, Jiann-Ming Su <sujiannming [at] gmail> wrote:
>>
>> Can you give a brief explanation of why that is the case?
>>
>>
>> Is the problem the dual path?  What if a single path was used in some
>> stacked configuration where a drbd is used as a backing device for
>> another drbd share?
>>
>>
>> --
>> Jiann-Ming Su
>
> It is wrong from a conceptual point of view, because as Lars already explained, when you write on md0 of A, you are only writing on half md0 of B (left side) and half md0 of C (right side).
> So the other two md0 devices are not in sync with the original md0....
> HIH,
> Gianluca

Actually, based on your first post where you gave:

Node A: md0 (drbd0, drbd1)
Node B: md0 (drbd0, drbd2)
Node C: md0 (drbd1, drbd2)

for C you are also hitting the left side, so the correct explanation would be:
when you write to md0 on A, you are only writing to half of md0 on B
(left side) and half of md0 on C (left side)


lars.ellenberg at linbit

Nov 18, 2009, 1:19 AM

Post #8 of 9 (1416 views)
Re: 3-node active/active/active config? [In reply to]

On Tue, Nov 17, 2009 at 11:30:04PM -0500, Jiann-Ming Su wrote:
> On Tue, Nov 17, 2009 at 12:50 PM, Lars Ellenberg
> <lars.ellenberg [at] linbit> wrote:
> >
> > No. You did not understand.
> >
> > It is not a question of performance.
> > Or whether a write reached all *nodes*.
> >
> > In your setup, it is technically *impossible*
> > for a write to reach all lower level *disks*.
> >
>
> Can you give a brief explanation of why that is the case?

I thought I already did?

> > again, your fancy cool and whatever setup won't work.
> > DO NOT DO THIS.
> >
>
>
> Is the problem the dual path? What if a single path was used in some
> stacked configuration where a drbd is used as a backing device for
> another drbd share?

"Stacked DRBD", three (or four) node setups, are perfectly fine and
supported. It is NOT possible to have more than two nodes _active_
though. See the User's Guide or contact LINBIT for details.

> > A sure way to data corruption.
> >
> > Got me this time?
> >
> >  ;)
> >
> > So by all means: use one iSCSI on DRBD cluster,
> > and have any number of ocfs2 clients via iSCSI.
> > Or double check if NFS can do the trick for you.
> >
>
> We have three geographically independent locations

Who is "We", and where are those locations?
How far apart? Network latency? Bandwidth?
Data Volume?
Approximate number of Files? Directories? Files per Directory?
Average and peak _file_ creation/deletion/modification rate?
Average _data_ change rate?
Peak data change rate?

> that have to share data, but still remain independent.

And you think you want to drive one of the currently available
cluster filesystems in some "geographically dispersed" mode.

Yeahright.

Cluster file systems are latency critical.

Even if you'd get the all-active-fully-meshed-replication to work (we at
LINBIT are working on adding that functionality in some later version of
DRBD), latency will kill your performance.

And whenever you have a network hiccup, you'd have no availability,
because you'd need to reboot at least one node.

I'm sure you have an interesting setup.
But to save you a lot of time experimenting with things that simply
won't work outside the lab, or possibly not even there, I think you
really could use some consultancy ;)

--
: Lars Ellenberg


sujiannming at gmail

Nov 18, 2009, 12:07 PM

Post #9 of 9 (1398 views)
Re: 3-node active/active/active config? [In reply to]

On Wed, Nov 18, 2009 at 4:19 AM, Lars Ellenberg
<lars.ellenberg [at] linbit> wrote:
> On Tue, Nov 17, 2009 at 11:30:04PM -0500, Jiann-Ming Su wrote:
>> On Tue, Nov 17, 2009 at 12:50 PM, Lars Ellenberg
>> <lars.ellenberg [at] linbit> wrote:
>> >
>> > No. You did not understand.
>> >
>> > It is not a question of performance.
>> > Or whether a write reached all *nodes*.
>> >
>> > In your setup, it is technically *impossible*
>> > for a write to reach all lower level *disks*.
>> >
>>
>> Can you give a brief explanation of why that is the case?
>
> I thought I already did?
>

Gianluca's explanation cleared it up for me.

>> Is the problem the dual path?  What if a single path was used in some
>> stacked configuration where a drbd is used as a backing device for
>> another drbd share?
>
> "Stacked DRBD", three (or four) node setups, are perfectly fine and
> supported.  It is NOT possible to have more than two nodes _active_
> though.  See the User's Guide or contact LINBIT for details.
>

Ah, okay. Thanks for clarifying that.

>> > A sure way to data corruption.
>> >
>> > Got me this time?
>> >
>> >  ;)
>> >
>> > So by all means: use one iSCSI on DRBD cluster,
>> > and have any number of ocfs2 clients via iSCSI.
>> > Or double check if NFS can do the trick for you.
>> >
>>
>> We have three geographically independent locations
>
> Who is "We", and where are those locations?
> How far apart? Network latency? Bandwidth?

Gig attached, less than 5ms ping times.

> Data Volume?

Less than 1GB.

> Approximate number of Files? Directories? Files per Directory?

Roughly 5000-10000 files and directories combined.

> Average and peak _file_ creation/deletion/modification rate?

Over a day, as low as 500 files/hr up to 10000 files/hr modification rate.

> Average _data_ change rate?
> Peak data change rate?
>
>> that have to share data, but still remain independent.
>
> And you think you want to drive one of the currently available
> cluster filesystems in some "geographically dispersed" mode.
>
> Yeahright.
>
> Cluster file systems are latency critical.
>

For this application, the filesystem performance, specifically writes,
is not that critical. What's important is data replication. We're
much more interested in the ability to write/modify files from any of
the three nodes.
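
For a rough sense of scale, the figures quoted in this post can be put
through some back-of-the-envelope arithmetic. This is only a sketch: it
assumes a synchronously replicated write cannot complete in less than
one network round trip, and the number of round trips per file
modification is a made-up placeholder, not a measured value. It says
nothing about the availability problem raised earlier in the thread.

```python
# Back-of-the-envelope numbers taken from the thread.
rtt_s = 0.005                 # "less than 5ms ping times" (upper bound)
peak_files_per_hour = 10_000  # "up to 10000 files/hr"

# Assumption: a synchronously replicated write cannot complete
# in less than one network round trip.
max_serial_writes_per_s = 1 / rtt_s

peak_files_per_s = peak_files_per_hour / 3600

# Assumption (hypothetical): each file modification costs this many
# serialized round trips (data, metadata, cluster lock traffic).
round_trips_per_file = 20
network_wait_per_file_s = round_trips_per_file * rtt_s
busy_fraction = peak_files_per_s * network_wait_per_file_s

print(f"ceiling: {max_serial_writes_per_s:.0f} serialized writes/s")
print(f"peak load: {peak_files_per_s:.2f} files/s")
print(f"network wait per file: {network_wait_per_file_s * 1000:.0f} ms")
print(f"time spent waiting at peak: {busy_fraction:.0%}")
```

Changing round_trips_per_file shows how quickly the estimate moves; the
real figure depends on the cluster filesystem and the workload.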

> Even if you'd get the all-active-fully-meshed-replication to work (we at
> LINBIT are working on adding that functionality in some later version of
> DRBD), latency will kill your performance.
>

Performance is in the eye of the beholder... ;-)

> And whenever you have a network hiccup, you'd have no availability,
> because you'd need to reboot at least one node.
>

Yeah, that's one of the nice things about the two node config. It's
relatively resilient to network issues.

> I'm sure you have an interesting setup.
> But to save you a lot of time experimenting with things that simply
> won't work outside the lab, or possibly not even there, I think you
> really could use some consultancy  ;)
>

Yeah, that's why I asked here first. :-) Thanks for all your insights.


--
Jiann-Ming Su
