Mailing List Archive: DRBD: Users

PingAsk timeout.


litao5 at hisense

Aug 15, 2012, 8:18 PM

Post #1 of 4
PingAsk timeout.

Hi all,



I use DRBD 8.3.7 with HA. When the master host dies and HA switches from
master to slave, DRBD can't switch over because it takes 10 minutes to mount
its partition. That exceeds the HA timeout (in HA, the default timeout is 2
minutes).



Why does DRBD take so long?



The log is:

Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739458] block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739468] block drbd1: asender terminated
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739470] block drbd1: Terminating asender thread
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739526] block drbd1: short read expecting header on sock: r=-512
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739666] block drbd1: Connection closed
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739672] block drbd1: conn( NetworkFailure -> Unconnected )
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739678] block drbd1: receiver terminated
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739680] block drbd1: Restarting receiver thread
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739683] block drbd1: receiver (re)started
Jul 22 21:06:34 QD-CS-MDC-B kernel: [325560.739687] block drbd1: conn( Unconnected -> WFConnection )
Jul 22 21:06:39 QD-CS-MDC-B pengine: [17776]: info: crm_log_init: Changed active directory to /usr/var/lib/heartbeat/cores/root
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.727331] NET: Registered protocol family 17
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.768912] block drbd0: role( Secondary -> Primary )
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.772742] block drbd1: role( Secondary -> Primary )
Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.772997] block drbd1: Creating new current UUID
Jul 22 21:08:47 QD-CS-MDC-B su: (to hitv) root on none
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032485] block drbd0: PingAck did not arrive in time.
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032493] block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032503] block drbd0: asender terminated
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032506] block drbd0: Terminating asender thread
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032514] block drbd0: Creating new current UUID
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032567] block drbd0: short read expecting header on sock: r=-512
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032868] block drbd0: Connection closed
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032875] block drbd0: conn( NetworkFailure -> Unconnected )
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032879] block drbd0: receiver terminated
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032881] block drbd0: Restarting receiver thread
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032884] block drbd0: receiver (re)started
Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032888] block drbd0: conn( Unconnected -> WFConnection )
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.600888] kjournald starting. Commit interval 15 seconds
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.600956] EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601330] EXT3 FS on drbd0, internal journal
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601334] EXT3-fs: recovery complete.
Jul 22 21:16:48 QD-CS-MDC-B kernel: [326174.601392] EXT3-fs: mounted filesystem with ordered data mode.



According to the log, the timeout is in the PingAsk operation.
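The size of the delay can be checked directly from two drbd0 lines in the log: the promotion to Primary at 21:06:47 and the "PingAck did not arrive in time" message at 21:16:47. The minimal Python sketch below subtracts both the kernel timestamps and the syslog wall-clock times. The year 2012 is an assumption (syslog lines omit it), and the helper names `kernel_seconds` and `wall_clock` are hypothetical.

```python
from datetime import datetime

# Two drbd0 lines copied verbatim from the log above.
PROMOTED = ("Jul 22 21:06:47 QD-CS-MDC-B kernel: [325573.768912] "
            "block drbd0: role( Secondary -> Primary )")
PINGACK = ("Jul 22 21:16:47 QD-CS-MDC-B kernel: [326174.032485] "
           "block drbd0: PingAck did not arrive in time.")


def kernel_seconds(line: str) -> float:
    """Return the [seconds.micro] kernel timestamp of a kernel syslog line."""
    start = line.index("[") + 1  # first '[' is the kernel timestamp here
    end = line.index("]", start)
    return float(line[start:end].strip())


def wall_clock(line: str) -> datetime:
    """Parse the 15-character syslog prefix; the year 2012 is assumed."""
    return datetime.strptime("2012 " + line[:15], "%Y %b %d %H:%M:%S")


gap_kernel = kernel_seconds(PINGACK) - kernel_seconds(PROMOTED)
gap_wall = (wall_clock(PINGACK) - wall_clock(PROMOTED)).total_seconds()
print(f"kernel clock: {gap_kernel:.1f} s, wall clock: {gap_wall:.0f} s")
# prints: kernel clock: 600.3 s, wall clock: 600 s
```

Both clocks agree: roughly 600 seconds, or 10 minutes, pass between drbd0 becoming Primary and DRBD declaring its peer dead.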





Thanks for your help.




simon


ff at mpexnet

Aug 28, 2012, 2:00 AM

Post #2 of 4
Re: PingAsk timeout.

On 08/16/2012 05:18 AM, simon wrote:
> According to the log, the timeout is in the PingAsk operation.

"PingAck did not arrive" means DRBD has just noticed that the peer is dead.
It's funny that this happens much later for drbd0 than for drbd1, but there
you have it.
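As a generic illustration of why a ping/ack scheme reports a dead peer only some time after it dies, here is a minimal Python sketch of timeout-based liveness detection. It is not DRBD's actual code; DRBD exposes related settings (`ping-int` and `ping-timeout` in the `net` section of drbd.conf), but the interval and timeout values below are hypothetical and not claimed to be DRBD's defaults. The function name `failure_detected_at` is also hypothetical.

```python
from typing import Optional, Sequence


def failure_detected_at(ping_sent: Sequence[float],
                        ack_received: Sequence[Optional[float]],
                        ping_timeout: float) -> Optional[float]:
    """Return the time at which the peer is declared dead, or None.

    ping_sent[i] is when ping i went out; ack_received[i] is when its ack
    came back, or None if it never did. The first ping whose ack is missing
    or arrives later than ping_timeout marks the peer as dead at
    ping_sent[i] + ping_timeout.
    """
    for sent, acked in zip(ping_sent, ack_received):
        deadline = sent + ping_timeout
        if acked is None or acked > deadline:
            return deadline
    return None


# Hypothetical values: a ping every 10 s, 0.5 s allowed for each ack.
pings = [0.0, 10.0, 20.0, 30.0]
acks = [0.01, 10.02, None, None]  # peer dies somewhere between t=10 and t=20
print(failure_detected_at(pings, acks, ping_timeout=0.5))  # prints 20.5
```

The peer may have died just after t=10, yet the failure is only reported at t=20.5, once the next ping goes unanswered. Detection latency is bounded by the ping interval plus the timeout, so a gap as long as the 10 minutes seen for drbd0 points to something outside this basic mechanism.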

The problem is probably with your "HA" configuration, because it's too
slow to notice that a failover is necessary.

Are you running pacemaker? Your crm configuration would be relevant.

Best,
Felix
_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


pascal.berton3 at free

Aug 28, 2012, 2:28 AM

Post #3 of 4
Re: PingAsk timeout.

Yes, Simon, thanks for yesterday's ifconfig reports (which did not reveal
anything wrong, BTW), but I also asked you for the "crm configure show"
output and you still left it out...

Regards,

Pascal.

-----Original Message-----
From: drbd-user-bounces [at] lists
[mailto:drbd-user-bounces [at] lists] On behalf of Felix Frank
Sent: Tuesday, August 28, 2012 11:00
To: simon
Cc: drbd-user [at] lists
Subject: Re: [DRBD-user] PingAsk timeout.



mahatma at bspu

Aug 28, 2012, 5:26 AM

Post #4 of 4
Re: PingAsk timeout.

simon wrote:

> I use DRBD 8.3.7 with HA. When the master host dies and HA switches from
> master to slave, DRBD can't switch over because it takes 10 minutes to mount
> its partition. That exceeds the HA timeout (in HA, the default timeout is 2
> minutes).
>
>
>
> Why does DRBD take so long?

For me this was a new problem this summer. Even when I temporarily replaced a
broken onboard e1000e with a semi-compatible D-Link/VIA card (with slow PCI
transfers and lost packets), I did not get this error. But it SUDDENLY appeared
this summer with the same configuration, starting with kernel 3.5.0. IMHO,
newer kernels (including the TCP stack and network drivers) and DRBD handle
things like multitasking/preemption differently and need more care.

First, check hardware health. Then enable PAUSE frame support on all network
cards on the same switch, not only the ones used here (using ethtool; enable
it if you are not sure. Other cases, such as PSPacer, are a separate subject).
Then check the kernel config, the PREEMPT settings, and related options. For
example, I am now virtualizing a semi-dead server, and inside qemu-kvm I use
CONFIG_PARAVIRT_TIME_ACCOUNTING=y (with PREEMPT and TREE_PREEMPT_RCU).

--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/


