
dejanmm at fastmail
Jul 2, 2009, 2:00 AM
Post #5 of 8
(512 views)
Permalink
|
|
Re: [patch] clock_t wrapped around causingfalse resourcestart failure
[In reply to]
|
|
On Wed, Jul 01, 2009 at 05:13:42PM +0200, Lars Ellenberg wrote: > On Wed, Jul 01, 2009 at 04:59:24PM +0200, Dejan Muhamedagic wrote: > > Hi Lars, > > > > On Wed, Jul 01, 2009 at 04:44:06PM +0200, Lars Ellenberg wrote: > > > On Wed, Jul 01, 2009 at 10:22:58AM -0400, Tavanyar, Simon wrote: > > > > Aahh ... so it's measured after reboot. That makes sense. > > > > Thanks, Lars. > > > > > > > > Should I wait for you to log a bugzilla on this? > > > > > > bugzilla and me don't go along very well, really ;-) > > > so please feel free to open that bugzilla, > > > and attach my proposed patch there. > > > > > > just to reiterate: > > > the problem is not the uptime wrap or resulting longclock wrap, > > > as that is handled correctly in cl_times() and time_longclock. > > > > > > but the cmp_longclock comparing unsigned values > > > without checking for wrap. > > > > The wrap is stuffed into upper bits of the longclock_t variables: > > > > return (lc_wrapcount | (longclock_t)timesval); > > > > Or am I missing something? The only explanation I can think of is > > that this is actually wrong: > > absolutely. Looking again, this is probably not a problem. The nexttime is always in the future. > > if (cmp_longclock(lnow, append->nexttime) >= 0) { > > > > i.e. it should be strictly greater. I guess that what happened is > > that somehow the prepare function got called fast enough, so that > > both timestamps ended equal. > > absolutely not. > ;) > > Much worse! > as far as I see, cmp_longclock is broken (does not care for wrap) I still don't understand what do you mean by "does not care for wrap". In case of 32-bit clock_t the wrapcount is stuffed into the upper bits of longclock_t which is 64-bit. How can it then not take it into account? In case of 64-bit clock_t, there is no wrap. It has enough bits to serve us until the end of the world. > even on 64bit archs, and the time window for this to happen is about > as large as the larges action (or else) timeout you set somewhere in > crm/lrm/whatever. > > did you have a look at my explanation ealier in this thread, > and the patch? (part of that message pasted here for reference) > > > if (cmp_longclock(lnow, append->nexttime) >= 0) > assuming that nexttime was set correctly, and lnow is correct, too, > and further assuming your clock_t is only 32 bit, > longclock_t is defined as unsigned long long, > and that thing becomes: > > if ((unsigned long long)(lnow) >= (unsigned long long)(append->nexttime)) > > which exactly does _NOT_ care for wrap around :( > > > example: > say, you start with a _large_ lnow (e.g. the equivalent of "-15 But you can't start with large lnow. lnow is 64-bit and you can't get that far into the future. > seconds"), and add an intervall of 30 seconds, resulting > "nexttime" become the equivalent of +15 seconds. > comparing absolute _unsigned_ means that test ends up comparing > if (something verry large >= something very small). > which is always true :( > > you need instead cast both values to _signed_, and compare the > _difference_ against zero, as below: > > > cmp_longclock macro and functions are broken. conceivably some other > longclock related macros/functions may be broken as well. > > it should probably be more like > (caution, not even compile tested) > ((and this time incompletely copy'n'pasted, > so whitespace will be wrong)) > > diff -r 731f8f7b5450 include/clplumbing/longclock.h > --- a/include/clplumbing/longclock.h Tue Jun 30 12:02:16 2009 +0200 > +++ b/include/clplumbing/longclock.h Wed Jul 01 12:27:27 2009 +0200 > @@ -127,9 +127,9 @@ > ((longclock_t)(l) - (longclock_t)(r)) > > # define cmp_longclock(l,r) \ > - (((longclock_t)(l) < (longclock_t)(r)) \ > + ((((long long)(l) - (long long)(r)) < 0) \ > ? -1 \ > - : (((longclock_t)(l) > (longclock_t)(r)) \ > + : ((((long long)(l) - (long long)(r)) > 0) \ > ? +1 : 0)) This patch may be correct, but for practical reasons it won't make any difference. Thanks, Dejan > -- > : Lars Ellenberg > : LINBIT | Your Way to High Availability > : DRBD/HA support and consulting http://www.linbit.com > > DRBD? and LINBIT? are registered trademarks of LINBIT, Austria. > _______________________________________________ > Linux-HA mailing list > Linux-HA[at]lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha > See also: http://linux-ha.org/ReportingProblems _______________________________________________ Linux-HA mailing list Linux-HA[at]lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
|