
Mailing List Archive: DRBD: Users

Switching from internal to external meta-disk

 

 



jbfarez at gmail

Jul 10, 2012, 9:59 AM

Post #1 of 4
Switching from internal to external meta-disk

Hello,

Today I ran into this problem:

I have a two-node cluster. Both nodes use DRBD resources to provide
failover.
The main DRBD resource stores all the data of an Oracle database (DBMS).

To solve my problem (I frequently resize resources), I decided to move
the DRBD metadata from internal to an external volume.

The underlying layer is LVM.
Here is how I did it:


1. Create a new logical volume to store the metadata
2. Unmount the resource to migrate
3. Dump the metadata to a plain file
4. Shut down the resource (drbdadm down <RES>)
5. Modify drbd.conf to use an external meta-disk
6. Restart the drbd daemon
7. Create the metadata for the resource (drbdadm create-md <RES>)
8. Start up the resource
9. Run drbdadm -- --overwrite-data-of-peer primary <RES> (from the
primary node)
10. Let the synchronization finish
11. Done

At this point everything looked fine: my database restarted without any
warning and nothing seemed to go wrong.
But ... I lost 11 days of data in my database.

I'm disappointed. Does anyone have an idea about this,
or at least an explanation?

Thanks


ff at mpexnet

Jul 11, 2012, 12:32 AM

Post #2 of 4
Re: Switching from internal to external meta-disk [In reply to]

Hi,

On 07/10/2012 06:59 PM, Jean-Baptiste wrote:
> 9. Doing drbdadm -- --overwrite-data-of-peer primary <RES> (from the
> primary node)
> 10. Let synchronize process ending
> 11. Done
>
> At this step everything is fine, my SGBD was restarted without any
> warning, nothing seems to go wrong.
> But ... I was lost 11 days of data in my SGBD.

we've seen similar effects on several occasions on this list. So far, it
has always (iirc) been a case of "diskless primary".

Have you retained logs from 11 days ago? I'd expect you to find a note
around that time stating that your primary detached its backing device.

*If* this assumption is correct, here's what happened: Your
primary happily kept writing data, but it never reached its local HDD.
Instead, all changes were written to the secondary's disk only. When you
made your changes and overwrote the data of the secondary, you killed
your data.

Bottom line is, it's crucial to be mindful of the health state of your
resources. Ideally, monitoring should report whenever your disks are not
UpToDate/UpToDate, among other possible problems.
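
A minimal sketch of such a monitoring check, assuming the /proc/drbd
status-line layout printed by DRBD 8.x (cs:/ro:/ds: fields). The function
name and the sample text are illustrative assumptions, not part of DRBD:

```python
import re

# One device status line of /proc/drbd (DRBD 8.x layout assumed), e.g.
#  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def unhealthy_resources(proc_drbd_text):
    """Return (minor, cs, ro, ds) for each device that is not
    Connected with disk states UpToDate/UpToDate."""
    problems = []
    for line in proc_drbd_text.splitlines():
        m = STATUS_RE.match(line)
        if m is None:
            continue  # version header, counters, unconfigured devices
        if m["cs"] != "Connected" or m["ds"] != "UpToDate/UpToDate":
            problems.append((int(m["minor"]), m["cs"], m["ro"], m["ds"]))
    return problems


# Hypothetical sample: device 1 is a "diskless primary".
SAMPLE = """\
version: 8.3.11 (api:88/proto:86-96)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
 1: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate C r-----
"""

if __name__ == "__main__":
    for minor, cs, ro, ds in unhealthy_resources(SAMPLE):
        print(f"minor {minor}: cs={cs} ro={ro} ds={ds}")
```

On a real node the text would come from reading /proc/drbd instead of
SAMPLE; a non-empty result is what monitoring should alert on.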

HTH,
Felix
_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


dbarker at visioncomm

Jul 11, 2012, 6:45 AM

Post #3 of 4
Re: Switching from internal to external meta-disk [In reply to]

And further, the FIRST step in any maintenance should be "cat
/proc/drbd". You would have seen which node had the current data.

Dan



lars.ellenberg at linbit

Jul 11, 2012, 12:24 PM

Post #4 of 4
Re: Switching from internal to external meta-disk [In reply to]

On Tue, Jul 10, 2012 at 06:59:36PM +0200, Jean-Baptiste wrote:
> Hello,
>
> Today I was confronted with this problem :
>
> I've got a 2 nodes cluster. Both of them uses drbd resource to assume
> failover.
> The main drbd resource store all the data of an Oracle SGBD.
>
> To solve my problematic (frequently resize resources), I decided to move
> the drbd metadatas from internal to external volume.
>
> The sub-layer used is LVM.
> Here is how I was do this :
>

The starting point is missing.

Were the peers "Connected UpToDate/UpToDate"?

> 1. Create new logical volume to store meta-datas
> 2. Unmount the resource to migrate
> 3. Dump the meta-datas to plain file
> 4. Shutdown the resource (drbdadm down <RES>)
> 5. Modify the drbd.conf to use meta-disk external
> 6. Restart drbd daemon

uhm. there is no drbd "daemon" ... but anyways...

> 7. Create md for the resource (drbdadm create-md <RES>)

I don't see the step where you restore the previously dumped meta data?

> 8. Startup resource

So in 7. you told DRBD that it was a fresh "just created" instance,
without any valid data.

If by "startup resource" you mean "drbdadm up <resource>" (or adjust),
it will connect to the peer, which *does* have valid meta data,
so an immediate full sync is started right here, from the peer
to the node where you did "create-md".

> 9. Doing drbdadm -- --overwrite-data-of-peer primary <RES> (from the
> primary node)

Well, of course you can promote a SyncTarget to Primary,
as it has access to the peer's data.
But the "--overwrite-data-of-peer" is actually a "--force",
and only relevant if DRBD would otherwise have refused to promote.
Which in this case, it likely would not have, anyway.

> 10. Let synchronize process ending

Without noticing that it was not the expected direction.
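
A small sketch of how the direction could be checked from a /proc/drbd
status line before letting a resync run, assuming the DRBD 8.x connection
states SyncSource and SyncTarget. The helper name and the returned strings
are illustrative assumptions:

```python
import re

# Leading part of a /proc/drbd device line: "<minor>: cs:<state> ..."
CS_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)")


def sync_direction(status_line):
    """Describe the resync direction as seen from the local node.

    Returns None if the line is not a device status line.
    """
    m = CS_RE.match(status_line)
    if m is None:
        return None
    cs = m.group(2)
    if cs == "SyncTarget":
        # Local disk is being overwritten with the peer's data.
        return "peer -> local"
    if cs == "SyncSource":
        # Local data is being copied to the peer.
        return "local -> peer"
    return "no resync"


if __name__ == "__main__":
    line = " 0: cs:SyncTarget ro:Primary/Secondary ds:Inconsistent/UpToDate C r-----"
    print(sync_direction(line))
```

Seeing "peer -> local" on the node that was expected to hold the good
data is the signal to stop and investigate.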


> 11. Done
>
> At this step everything is fine, my SGBD was restarted without any warning,
> nothing seems to go wrong.
> But ... I was lost 11 days of data in my SGBD.
>
> I'm disapointed, does anyone have an idea about this ?
> At least to explain this.
>
> Thanks



--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
