
Mailing List Archive: DRBD: Users

question on recovery from network failure on primary/primary

 

 



brchrisman at gmail

Apr 5, 2012, 11:34 AM

Post #1 of 3
question on recovery from network failure on primary/primary

I have a shared/parallel filesystem on top of drbd dual primary/protocol C
(using 8.3.11 right now).

My question is about recovering after a network outage where I have a
'resource-and-stonith' fence handler which panics both systems as soon as
possible.

Even with Protocol C, can the bitmaps still have dirty bits set? (i.e.,
different writes on each local device which haven't returned/acknowledged
to the shared filesystem because they haven't yet been written remotely?)

Maybe a more concrete example will make my question clearer:
- node A & B (2 node cluster) are operating nominally in primary/primary
mode (shared filesystem provides locking and prevents simultaneous write
access to the same blocks on the shared disk).
- node A: write to drbd device, block 234567, written locally, but remote
copy does not complete due to network failure
- node B: write to drbd device, block 876543, written locally, but remote
copy does not complete due to network failure
- Neither write completes, and neither returns successfully to the
filesystem (Protocol C).
- Fencing handler is invoked, where I can suspend-io and/or panic both
nodes (since neither one is reliable at this point).

If there is a chance of having unreplicated/unacknowledged writes on two
different disks (those writes can't conflict, because the shared filesystem
won't write to the same blocks on both nodes simultaneously), is there a
resync option that will effectively 'revert' any
unreplicated/unacknowledged writes?

I am considering writing a test for this and would like to know a bit more
about what to expect before I do so.

Thanks,
Brian


florian at hastexo

Apr 5, 2012, 11:53 AM

Post #2 of 3
Re: question on recovery from network failure on primary/primary

On Thu, Apr 5, 2012 at 8:34 PM, Brian Chrisman <brchrisman [at] gmail> wrote:
> I have a shared/parallel filesystem on top of drbd dual primary/protocol C
> (using 8.3.11 right now).

_Which_ filesystem precisely?

> My question is about recovering after a network outage where I have a
> 'resource-and-stonith' fence handler which panics both systems as soon as
> possible.

Self-fencing is _not_ how a resource-and-stonith fencing handler is
meant to operate.

> Even with Protocol-C, can the bitmaps still have dirty bits set? (ie,
> different writes on each local device which haven't returned/acknowledged to
> the shared filesystem because they haven't yet been written remotely?)

The bitmaps only apply to background synchronization. Foreground
replication does not use the quick-sync bitmap.

> Maybe a more concrete example will make my question clearer:
> - node A & B (2 node cluster) are operating nominally in primary/primary
> mode (shared filesystem provides locking and prevents simultaneous write
> access to the same blocks on the shared disk).
> - node A: write to drbd device, block 234567, written locally, but remote
> copy does not complete due to network failure
> - node B: write to drbd device, block 876543, written locally, but remote
> copy does not complete due to network failure

Makes sense up to here.

> - Both writes do not complete and do not return successfully to the
> filesystem (protocolC).

You are aware that "do not return successfully" means that no
completion is signaled, which is correct, but not that non-completion
is signaled, which would be incorrect?

> - Fencing handler is invoked, where I can suspend-io and/or panic both nodes
> (since neither one is reliable at this point).

"Panicking" a node is pointless, and panicking both is even worse.
What fencing is meant to do is use an alternate communications channel
to remove the _other_ node, not the local one. And only one of them
will win.

> If there is a chance of having unreplicated/unacknowledged writes on two
> different disks (those writes can't conflict, because the shared filesystem
> won't write to the same blocks on both nodes simultaneously), is there a
> resync option that will effectively 'revert' any unreplicated/unacknowledged
> writes?

Yes, it's called the Activity Log, but you've got this part wrong as
you're under an apparent misconception as to what the fencing handler
should be doing.

> I am considering writing a test for this and would like to know a bit more
> about what to expect before I do so.

Tell us what exactly you're trying to achieve please?

Florian



brchrisman at gmail

Apr 5, 2012, 1:07 PM

Post #3 of 3
Re: question on recovery from network failure on primary/primary

On Thu, Apr 5, 2012 at 11:53 AM, Florian Haas <florian [at] hastexo> wrote:

> On Thu, Apr 5, 2012 at 8:34 PM, Brian Chrisman <brchrisman [at] gmail> wrote:
> > I have a shared/parallel filesystem on top of drbd dual primary/protocol C
> > (using 8.3.11 right now).
>
> _Which_ filesystem precisely?
>

I'm testing this with GPFS.


>
> > My question is about recovering after a network outage where I have a
> > 'resource-and-stonith' fence handler which panics both systems as soon as
> > possible.
>
> Self-fencing is _not_ how a resource-and-stonith fencing handler is
> meant to operate.
>

I'm not concerned about basic disconnect where I can use a tie breaker
setup (I do have a fencing setup which looks like it handles that just
fine, ie, selecting the 'working' node --defined by cluster membership-- to
continue/resume IO). I'm talking about something more apocalyptic where
both nodes can't contact a tie breaker. At this point I don't care about
having a node continue operations, I just want to make sure there's no data
corruption.


>
> > Even with Protocol-C, can the bitmaps still have dirty bits set? (ie,
> > different writes on each local device which haven't returned/acknowledged to
> > the shared filesystem because they haven't yet been written remotely?)
>
> The bitmaps only apply to background synchronization. Foreground
> replication does not use the quick-sync bitmap.
>

I was reading in the documentation that when a disconnect event occurs,
there's a UUID shuffle where 'current' -> 'bitmap' -> historic... and
'new' becomes 'current'. Is that the scheme we're discussing that's only
applicable to background sync?
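
To make the shuffle concrete, here is a toy model of how I read it from the documentation. This is only a sketch under my own assumptions, not DRBD's code: the names GenerationIds, start_new_generation and finish_resync are made up for illustration, an empty bitmap slot is represented as 0, and I am assuming the bitmap UUID only moves into the history slots once a resync has finished.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class GenerationIds:
    """Toy stand-in for one node's generation identifiers (hypothetical)."""
    current: int
    bitmap: int = 0                      # 0 means "slot is empty"
    history: list = field(default_factory=list)


def start_new_generation(g, new_uuid=None):
    """Peer lost while Primary: old 'current' moves to 'bitmap', and a
    fresh UUID becomes 'current'. History is left untouched here."""
    fresh = new_uuid if new_uuid is not None else secrets.randbits(64)
    return GenerationIds(current=fresh, bitmap=g.current,
                         history=list(g.history))


def finish_resync(g, keep=2):
    """Resync done: 'bitmap' rotates into the history slots and is emptied.
    Only the newest `keep` historical UUIDs are retained (assumed limit)."""
    rotated = ([g.bitmap] if g.bitmap else []) + list(g.history)
    return GenerationIds(current=g.current, bitmap=0, history=rotated[:keep])
```

If that reading is right, a node that started a new generation and has not yet resynced is the one carrying a non-empty 'bitmap' slot.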


>
> > Maybe a more concrete example will make my question clearer:
> > - node A & B (2 node cluster) are operating nominally in primary/primary
> > mode (shared filesystem provides locking and prevents simultaneous write
> > access to the same blocks on the shared disk).
> > - node A: write to drbd device, block 234567, written locally, but remote
> > copy does not complete due to network failure
> > - node B: write to drbd device, block 876543, written locally, but remote
> > copy does not complete due to network failure
>
> Makes sense up to here.
>
> > - Both writes do not complete and do not return successfully to the
> > filesystem (protocolC).
>
> You are aware that "do not return successfully" means that no
> completion is signaled, which is correct, but not that non-completion
> is signaled, which would be incorrect?
>

Yeah, I suppose there are a whole host of issues with this in regard to
sync/async writes, but my expectation was that a synchronous call would
hang.


>
> > - Fencing handler is invoked, where I can suspend-io and/or panic both nodes
> > (since neither one is reliable at this point).
>
> "Panicking" a node is pointless, and panicking both is even worse.
> What fencing is meant to do is use an alternate communications channel
> to remove the _other_ node, not the local one. And only one of them
> will win.
>

I was expecting fencing to basically mean the same thing as in the old SAN
sense of 'fencing off' a path to a device such that a surviving node can
tell the SAN "shut out that node that's screwed up/don't allow it to
write". In the apocalyptic case, I was using (perhaps abusing) this as a
callout in the case where a drbd network dies. But I suppose that this
would be the same scenario (if I crashed the nodes) as if there were a
simultaneous power failure on both nodes.


>
> > If there is a chance of having unreplicated/unacknowledged writes on two
> > different disks (those writes can't conflict, because the shared filesystem
> > won't write to the same blocks on both nodes simultaneously), is there a
> > resync option that will effectively 'revert' any unreplicated/unacknowledged
> > writes?
>
> Yes, it's called the Activity Log, but you've got this part wrong as
> you're under an apparent misconception as to what the fencing handler
> should be doing.
>

My impression of the fencing handler, with the 'resource-and-stonith'
option selected is:
When a write can't be completed to the remote disk, immediately suspend all
requests and call the provided fencing handler. If the fence handler
returns 7, then continue on in standalone mode (well, that's what I've been
intending to use it for).

The fence handler can/does get invoked on both nodes in primary/primary,
though not necessarily both at the same time. It seems once either fs
client/app issues a write to drbd, and it can't contact its peer, it
invokes the fencing handler (which is what I want).
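
A minimal sketch of the decision I have in mind for the handler, under stated assumptions: the inputs tiebreaker_reachable and local_in_membership are hypothetical (they would come from my cluster membership checks), the Action names are made up, and the only DRBD-specific detail used is the exit code 7 mentioned above. It does not actually panic anything.

```python
from enum import Enum

# Exit code that, as described above, lets the node carry on after fencing.
EXIT_PEER_FENCED = 7


class Action(Enum):
    RESUME = "resume"   # this node is the survivor; report the peer as fenced
    HALT = "halt"       # apocalyptic case: nobody can be trusted to continue


def decide(tiebreaker_reachable: bool, local_in_membership: bool) -> Action:
    """Resume only if the tie breaker is reachable AND it says this node
    is still a cluster member; in every other case, stop."""
    if tiebreaker_reachable and local_in_membership:
        return Action.RESUME
    return Action.HALT


def exit_code_for(action: Action):
    """Map the decision to a handler exit code. HALT has no exit code in
    this sketch because the handler would take the node down instead."""
    return EXIT_PEER_FENCED if action is Action.RESUME else None


if __name__ == "__main__":
    # Example: tie breaker unreachable, so the sketch chooses HALT.
    print(decide(tiebreaker_reachable=False, local_in_membership=True))
```

The point of keeping HALT separate is that in the no-quorum case I never want the handler to hand back a code that lets IO resume.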



>
> > I am considering writing a test for this and would like to know a bit more
> > about what to expect before I do so.
>
> Tell us what exactly you're trying to achieve please?
>

My current state:
My current setup is such that drbd in primary/primary handles a node being
disconnected from a cluster just fine (with a quorum indicating the
surviving node). I've been able to recover from that (treating the
surviving node as 'good' for continuity purposes). When the disconnected
node reconnects, it has to become secondary and sync to the 'good' node,
discarding, etc.

I was concerned about whether an apocalyptic outage (where everybody loses
quorum) can be recovered from. I hadn't read up on the activity log before,
but that's indeed what I was looking for. If there's a primary/primary setup
and the whole cluster loses power, will each peer in the drbd device
roll back to a consistent point in the activity log?



>
> Florian
