Mailing List Archive: DRBD: Users

cannot connect primary (remains StandAlone)


Marcel.Gsteiger at milprog

Oct 2, 2009, 6:53 AM

Post #1 of 5
cannot connect primary (remains StandAlone)

Hi all

I'm using DRBD 8.3.2-6 on CentOS x86_64. I have an active/backup setup with six DRBD devices, all in the primary role on server 1, with server 2 in the secondary role.

After restarting my primary server while the secondary server was online, I get the following /proc/drbd output on server 1:

cat /proc/drbd
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by mockbuild [at] v20z-x86-64, 2009-08-29 14:08:07
(drbd0-2 are ok)
3: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
ns:0 nr:0 dw:51316 dr:267907 al:140 bm:140 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:559036
(drbd4-5 are ok again)
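To compare the two nodes quickly, you can pull the connection state (cs), roles (ro) and disk states (ds) out of /proc/drbd. This is a minimal sketch. It assumes the DRBD 8.3 line layout shown above (`N: cs:... ro:... ds:...`). The function name `drbd_states` is hypothetical. It takes an optional file argument, so it can also be run on a saved copy:

```shell
# drbd_states [FILE]
# Print "minor cs ro ds" for each device line of /proc/drbd (or a saved copy).
# Hypothetical helper; assumes the DRBD 8.3 layout "N: cs:X ro:Y ds:Z ...".
drbd_states() {
    awk '
        $1 ~ /^[0-9]+:$/ && $2 ~ /^cs:/ {
            minor = substr($1, 1, length($1) - 1)   # drop the trailing colon
            cs = ro = ds = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)
                else if ($i ~ /^ro:/) ro = substr($i, 4)
                else if ($i ~ /^ds:/) ds = substr($i, 4)
            }
            print minor, cs, ro, ds
        }
    ' "${1:-/proc/drbd}"
}
```

On server 1, `drbd_states` should print `3 StandAlone Primary/Unknown UpToDate/DUnknown` for the device shown above.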

On server 2, the status was UpToDate/DUnknown and WFConnection. However, trying to connect the primary using
drbdadm connect res5
failed.

Suspecting a split brain, I tried to resolve it according to the manual as follows.
On the secondary:
drbdadm disconnect res5
drbdadm -- --discard-my-data connect winxp_c

Now cat /proc/drbd on server 2 shows:
3: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:14680064

But the primary server still cannot connect; it stays in the StandAlone state without giving an error message.

What is the "official" way to get out of this situation?

Thanks for any help and regards
--Marcel
_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


johannes.thoma at linbit

Oct 2, 2009, 7:36 AM

Post #2 of 5
Re: cannot connect primary (remains StandAlone) [In reply to]

Marcel,

Did you check the kernel logs (normally /var/log/kern.log)? Usually DRBD will
print the reason why it stays in StandAlone mode.
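Here is a minimal sketch of that check. It assumes the kernel prefixes DRBD messages with the device name (typically `block drbd3: ...` in 8.3). It also assumes syslog writes these messages to /var/log/kern.log or, as on CentOS by default, to /var/log/messages. The helper name `drbd_log` is hypothetical:

```shell
# drbd_log MINOR [LOGFILE...]
# Show the kernel log lines that belong to one DRBD device.
# Hypothetical helper; assumes the lines contain "drbdMINOR:", e.g. "block drbd3: ...".
drbd_log() {
    minor=$1
    shift
    # CentOS default; other distributions may use /var/log/kern.log instead
    [ "$#" -eq 0 ] && set -- /var/log/messages
    grep -hF "drbd${minor}:" "$@"
}
```

For example, `drbd_log 3 /var/log/messages` would list only the messages for device 3. The trailing colon keeps drbd13 out of the results.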

Hope that helps,

- Johannes


--
: DI. Johannes Thoma
: LINBIT | Your Way to High Availability
: Tel. +43-1-8178292-0, Fax +43-1-8178292-82


Marcel.Gsteiger at milprog

Oct 5, 2009, 3:44 AM

Post #3 of 5
Re: cannot connect primary (remains StandAlone) [In reply to]

Johannes,

the logs say:

conn( StandAlone -> Unconnected )
Starting receiver thread (from drbd3_worker [6482])
receiver (re)started
conn( Unconnected -> WFConnection )
bind before listen failed, err = -98
conn( WFConnection -> Disconnecting )
Discarding network configuration.
connection closed
conn( Disconnecting -> StandAlone )
receiver terminated
Terminating receiver thread

Perhaps I should mention that I am running DRBD across a bonding interface (two gigabit interfaces connected directly, without a switch) in bonding mode 0 (balance-rr). Judging by the counters and /proc/net/bonding/bond0, everything works fine for the other DRBD devices synced over the same link. I get near-perfect load balancing and redundancy this way (or so I hoped; this is the first time I am trying this).

My box has three interfaces: eth0 and eth2 are the slaves of the bonding interface bond0, and eth1 is my "outside connection". bond0 is used exclusively for the DRBD interconnect between the two boxes.

Unfortunately I don't know what err = -98 could mean - any suggestions?

regards
-Marcel


herve.gautier at thalesgroup

Oct 5, 2009, 5:03 AM

Post #4 of 5
Re: cannot connect primary (remains StandAlone) [In reply to]

Marcel Gsteiger wrote:
> Unfortunately I don't know what err = -98 could mean - any suggestions?
>
$ grep 98 /usr/include/asm/errno.h
#define EADDRINUSE 98 /* Address already in use */
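Kernel code reports errors as negative errno values, so `err = -98` is errno 98. On newer systems the x86 definitions usually live in /usr/include/asm-generic/errno.h, so the grep above may come up empty there. This is a minimal lookup sketch. The name `errno_lookup` and the default header paths are assumptions; pass other headers as extra arguments:

```shell
# errno_lookup NUM [HEADER...]
# Print the errno #define line(s) matching NUM. A leading minus sign, as in
# the kernel's "err = -98", is stripped first.
# Hypothetical helper; the default header paths are assumptions and vary by system.
errno_lookup() {
    num=${1#-}
    shift
    [ "$#" -eq 0 ] && set -- /usr/include/asm-generic/errno-base.h \
        /usr/include/asm-generic/errno.h /usr/include/asm/errno.h
    # Match "#define E<NAME> <num>" where <num> is not the prefix of a longer number
    grep -hE "^#define[[:space:]]+E[A-Z0-9]+[[:space:]]+${num}([^0-9]|\$)" "$@" 2>/dev/null
}
```

On x86 Linux with the kernel headers installed, `errno_lookup -98` should print the same EADDRINUSE definition as above.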
--
Rv



Marcel.Gsteiger at milprog

Oct 5, 2009, 6:40 AM

Post #5 of 5
Re: cannot connect primary (remains StandAlone) [In reply to]

I can't imagine why it gets that "address already in use" error. I
have six identically configured resources: resource 0 on port 7780,
resource 1 on port 7781, and so on. Resources 0, 1, 2, 4 and 5 work as
expected, while resource 3 gets this error. lsof -i does not show the
address as used, and I don't know how to find out who (other than DRBD
itself) could be using this address.
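One likely reason lsof -i shows nothing is that DRBD's sockets are created inside the kernel and belong to no process. Tools that walk process file descriptors therefore do not see them, while /proc/net/tcp (and netstat, which reads it) still lists them.

Below is a minimal sketch of two checks for an EADDRINUSE on a single resource. Both helpers are hypothetical:
- `tcp_port_sockets` looks for existing sockets on a local port.
- `drbd_dup_addresses` looks for an `address IP:PORT` that appears more than once in the configuration. It assumes one `address` statement per line, with /etc/drbd.conf as the default file.

```shell
# tcp_port_sockets PORT [FILE...]
# List sockets whose local address uses PORT, read from /proc/net/tcp-style
# files. Unlike lsof -i, this also covers sockets owned by the kernel.
# Hypothetical helper; assumes the "sl local_address rem_address st ..." layout.
tcp_port_sockets() {
    hex=$(printf '%04X' "$1")   # the port appears as 4 upper-case hex digits
    shift
    [ "$#" -eq 0 ] && set -- /proc/net/tcp
    awk -v suffix=":$hex" '
        FNR > 1 && substr($2, length($2) - 4) == suffix { print $2, $3, "state=" $4 }
    ' "$@"
}

# drbd_dup_addresses [CONFIG...]
# Print every "address IP:PORT" that occurs more than once, with its count.
# Hypothetical helper; assumes one "address ...;" statement per line.
drbd_dup_addresses() {
    [ "$#" -eq 0 ] && set -- /etc/drbd.conf
    awk '
        $1 == "address" { a = $NF; sub(/;$/, "", a); n[a]++ }
        END { for (k in n) if (n[k] > 1) print k, n[k] }
    ' "$@"
}
```

Assuming resource 3 follows the 7780 + N scheme, `tcp_port_sockets 7783` on the primary would show any socket still holding port 7783. In /proc/net/tcp, state 0A is LISTEN and 06 is TIME_WAIT.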

I'm simply looking for a safe way to restart this resource without
having to throw away the entire partition and start from scratch; I
don't mind if the secondary needs a full resynchronization.

Still looking for help...

--Marcel

