philipp.reisner at linbit
Apr 6, 2012, 6:01 AM
Compared to 8.3.12, this release contains only bug fixes. All of
them fix bugs that are very hard to trigger, or that affect
operation only in ways that go unnoticed.
Nevertheless, I need to explain the first item on the list.
Quite a few conditions have to be met in order to
trigger this bug:
1) You have a device that actually reorders writes
(i.e. a spinning hard disk, not an SSD, without a RAID
controller in front of it).
2) Network connection gets interrupted
3) A block on the degraded primary gets modified
4) Network connection gets established again, DRBD starts the resync
5) The same block on the primary is written again, at the same
time the resync process is about to resync it.
DRBD orders the operations on the SyncSource correctly, and it orders
the write submissions on the SyncTarget correctly. In prior releases,
however, DRBD might have submitted the second write before completion
of the first one had been signaled.
At this point in time, one hardware and Linux kernel combination
is known where this could lead to a reordering of these two writes
on the SyncTarget node, i.e. making it look as if one write
operation was not mirrored.
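To make the failure mode easier to picture, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not DRBD code: the class ReorderingDisk and both functions are hypothetical names, the "device" simply completes queued writes in reverse order as a worst case, and the first write is assumed to be the resync write while the second is the mirrored application write.

```python
class ReorderingDisk:
    """Hypothetical toy device that may reorder writes still in flight."""

    def __init__(self):
        self.blocks = {}      # block number -> data that reached the medium
        self.in_flight = []   # writes submitted but not yet completed

    def submit(self, block, data):
        self.in_flight.append((block, data))

    def wait_for_completion(self):
        # Worst case for this sketch: queued writes hit the medium in
        # reverse submission order.
        while self.in_flight:
            block, data = self.in_flight.pop()
            self.blocks[block] = data


def sync_target_prior(disk, block, first, second):
    # Assumed model of the old behaviour: the second write is submitted
    # before completion of the first one has been signaled.
    disk.submit(block, first)
    disk.submit(block, second)
    disk.wait_for_completion()
    return disk.blocks[block]


def sync_target_fixed(disk, block, first, second):
    # Assumed model of the fix: wait for the first write to complete,
    # so the device never sees both writes in flight at once.
    disk.submit(block, first)
    disk.wait_for_completion()
    disk.submit(block, second)
    disk.wait_for_completion()
    return disk.blocks[block]


if __name__ == "__main__":
    print(sync_target_prior(ReorderingDisk(), 7, "resync data", "new write"))
    print(sync_target_fixed(ReorderingDisk(), 7, "resync data", "new write"))
```

In this model the prior behaviour leaves the older data on the block, which is what makes it look as if the later write was never mirrored. Waiting for the first completion removes the window in which the device is free to reorder.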
Please note: This is a release candidate, not intended for production.
Please help with testing!
* Fixed a write ordering problem on SyncTarget nodes for a write
to a block that gets resynced at the same time. The bug can
only be triggered with a device whose firmware actually
reorders writes to the same block
* Fixed a race between disconnect and receive_state that could cause
an IO lockup
* Make sure that hard state changes do not disturb the connection
establishing process (e.g. detach due to an IO error). When the
bug was triggered it caused a retry in the connect process
* Postpone soft state changes so that they do not disturb the
connection establishing process (e.g. becoming primary). When the
bug was triggered it could cause both nodes to go into SyncSource state
* Fixed a refcount leak that could cause failures when trying to
unload a protocol family module that was used by DRBD
* Deny normal detach (as opposed to --forced) if the user tries
to detach from the last UpToDate disk in the resource
* Fixed a possible protocol error that could be caused by
* Enforce the disk-timeout option also on meta-data IO operations
: Dipl-Ing Philipp Reisner
: LINBIT | Your Way to High Availability
: Tel: +43-1-8178292-50, Fax: +43-1-8178292-82
DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.