Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

Bug: loops_per_jiffy based udelay() mostly shorter than requested

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


rmk at arm

Jan 9, 2011, 9:08 AM

Post #1 of 3 (620 views)
Permalink
Bug: loops_per_jiffy based udelay() mostly shorter than requested

I've been looking at the ARM implementation of udelay(), as Moore's Law
may bite us in maybe the next six or so months. In doing so, I've been
comparing udelay() candidates against our replacement sched_clock()
implementation, which on the platform I've been using has a resolution of
about 41ns.

However, I'm having a hard time making it satisfy the requirement that
"udelay() must produce a delay of at least the requested duration." It
appears that it mostly produces a delay less than requested, particularly
if IRQs are off or for delays shorter than the jiffy interval (up to a
point). At the bottom of this message are the measured udelay() values
for a range of delays from 1us to 5ms.

It seems that loops_per_jiffy is being a little optimistic. The causes
seem to be:

1. the calibration loop in init/calibrate.c runs a successive
approximation on the value. This is biased towards producing a
shorter delay than one jiffy, as the last bit tested will be cleared
if __delay(loops_per_jiffy) produced a delay longer than one jiffy.

Eg, if on the penultimate iteration, loopbit = 0x800, and
__delay(0x7f800) produced a longer than desired duration, we clear
this bit and try the next bit: loopbit = 0x400, __delay(0x7f400).
If this produces a delay longer than the duration, we exit with
loops_per_jiffy = 0x7f000 - and __delay(0x7f000) will produce a
delay shorter than one jiffy. Otherwise, the delay was still
shorter and we exit with loops_per_jiffy leaving the bit set -
so again __delay(0x7f400) was shorter than one jiffy.

2. as a consequence of how this algorithm works, what we're actually
measuring is the time taken for __delay(loops_per_jiffy) plus the
overhead of one timer IRQ. So we end up with a loops_per_jiffy which
produces a delay equivalent to
(usec_per_jiffy - timer_interrupt_usec)

The error we're talking about is around .7% towards delays being "too
short" on the ARM926 test platform. To get correct results, I need (eg)
a lpj value to be increased from 521216 (0x7F400) to 5000000 / 4964033
* 521216 = 524993 (0x802C1).

For udelay(us), we - just like many architectures - calculate the number
of loops using the formula:

loops = us * loops_per_jiffy * HZ / 1000000

and we fall short on 'loops' - so that a requested 5ms delay becomes a
4.96ms delay. This error is also reflected as a %age in shorter delays
too, until about 2us delays where we finally produce delays longer than
requested due to the overheads in calculating the loops value.

Given the difference in values (0x7F400 vs 0x802C1) (1) doesn't really
come into it as the dominant error is from (2) - in this case, 0x7f800
was measured as too long so cleared, and the last trial of 0x7f400 was
measured as too short.

Obviously, we can't run the calibration with IRQs off, otherwise jiffies
won't increment and the calibration algorithm locks up. We also may not
have sched_clock() initialized at this point either (it may also be
jiffy based), so we can't use that.

Any suggestions or thoughts, or should we not care too much if udelay()
produces slightly shorter than desired delays? Or am I doing something
horribly wrong in the ARM code?


== results and code ===
Measured delays against sched_lock() counter running at 24MHz (41ns
resolution) with IRQs off, and with the patch below applied to ARM to
eliminate lost precision in the udelay conversion:

Calibrating delay loop... 104.24 BogoMIPS (lpj=521216)
udelay(1) = 1516ns
udelay(2) = 2424ns
udelay(5) = 5116ns
udelay(10) = 10091ns
udelay(20) = 20099ns
udelay(50) = 49766ns
udelay(100) = 99474ns
udelay(200) = 198674ns
udelay(500) = 496524ns
udelay(1000) = 992916ns
udelay(2000) = 1985766ns
udelay(5000) = 4964033ns

With loops_per_jiffy forced to 524993 (based on measured shortfall and
current lpj):
udelay(1) = 1166ns
udelay(2) = 2232ns
udelay(5) = 5141ns
udelay(10) = 10166ns
udelay(20) = 20250ns
udelay(50) = 50208ns
udelay(100) = 100232ns
udelay(200) = 200241ns
udelay(500) = 500424ns
udelay(1000) = 1000733ns
udelay(2000) = 2001391ns
udelay(5000) = 5003041ns

This was produced by:

static int __init test_udelay(void)
{
unsigned long long sc1, sc2, sd0, sd;
unsigned long delay, dm, d[3] = {1, 2, 5};
int i, j;

local_irq_disable();
sc1 = sched_clock();
sc2 = sched_clock();
sd0 = sc2 - sc1; /* eliminate sched_clock() overhead */

for (dm = 1; dm < 10000; dm *= 10) {
for (j = 0; j < 3; j++) {
delay = dm * d[j];
for (sd = 0, i = 0; i < 5; i++) {
sc1 = sched_clock();
udelay(delay);
sc2 = sched_clock();
sd += sc2 - sc1 - sd0;
}
printk("udelay(%lu) = %luns\n", delay,
(unsigned long)sd / 5);
}
}
local_irq_enable();
return 0;
}
subsys_initcall(test_udelay).

=== patch to fix loss of precision in calculating 'loops' ===
This affects only udelay() itself. The function used to calibrate lpj
(__delay()) uses the result of this calculation and is independent of
this error.

diff --git a/arch/arm/lib/delay.S b/arch/arm/lib/delay.S
index 8d6a876..3c9a05c 100644
--- a/arch/arm/lib/delay.S
+++ b/arch/arm/lib/delay.S
@@ -25,11 +25,15 @@ ENTRY(__udelay)
ldr r2, .LC1
mul r0, r2, r0
ENTRY(__const_udelay) @ 0 <= r0 <= 0x7fffff06
+ mov r1, #-1
ldr r2, .LC0
ldr r2, [r2] @ max = 0x01ffffff
+ add r0, r0, r1, lsr #32-14
mov r0, r0, lsr #14 @ max = 0x0001ffff
+ add r2, r2, r1, lsr #32-10
mov r2, r2, lsr #10 @ max = 0x00007fff
mul r0, r2, r0 @ max = 2^32-1
+ add r0, r0, r1, lsr #32-6
movs r0, r0, lsr #6
moveq pc, lr


--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


torvalds at linux-foundation

Jan 12, 2011, 1:19 PM

Post #2 of 3 (603 views)
Permalink
Re: Bug: loops_per_jiffy based udelay() mostly shorter than requested [In reply to]

On Sun, Jan 9, 2011 at 9:08 AM, Russell King <rmk [at] arm> wrote:
>
> Any suggestions or thoughts, or should we not care too much if udelay()
> produces slightly shorter than desired delays?  Or am I doing something
> horribly wrong in the ARM code?

Judging by the numbers you quote, I would definitely put this in the
"don't care too much".

If it's about 1% off, it's all fine. If somebody picked a delay value
that is so sensitive to small errors in the delay that they notice
that - or even notice something like 5% - then they have picked too
short of a delay.

udelay() was never really meant to be some kind of precision
instrument. Especially with CPU's running at different frequencies,
we've historically had some rather wild fluctuation. The traditional
busy loop ends up being affected not just by interrupts, but also by
things like cache alignment (we used to inline it), and then later the
TSC-based one obviously depended on TSC's being stable (which they
weren't for a while).

So historically, we've seen udelay() being _really_ off (ie 50% off
etc), I wouldn't worry about things in the 1% range.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


linux at arm

Jan 12, 2011, 2:23 PM

Post #3 of 3 (602 views)
Permalink
Re: Bug: loops_per_jiffy based udelay() mostly shorter than requested [In reply to]

On Wed, Jan 12, 2011 at 01:19:49PM -0800, Linus Torvalds wrote:
> On Sun, Jan 9, 2011 at 9:08 AM, Russell King <rmk [at] arm> wrote:
> >
> > Any suggestions or thoughts, or should we not care too much if udelay()
> > produces slightly shorter than desired delays?  Or am I doing something
> > horribly wrong in the ARM code?
>
> Judging by the numbers you quote, I would definitely put this in the
> "don't care too much".

Thanks - that's my thoughts too. I think I'm still going to push out
the rounding patch for the sake of correctness even though it's down
in the noise.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.