Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

Linux-next Alsa hda_intel events/0 high CPU usage, bisected

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


benjkidwell at yahoo

Jun 6, 2008, 2:07 PM

Post #1 of 12 (906 views)
Permalink
Linux-next Alsa hda_intel events/0 high CPU usage, bisected

Linux-next Alsa hda_intel events/0 high CPU usage, bisected

During sound playback with recent linux-next-20080806 I observed much higher
than usual cpu usage for the events/0 process, locked at a near constant 4.3%
according to ps. 2.6.25-rc5 does not exhibit this behavior on my machine.

I git-bisected and arrived at this commit to sound/pci/hda/hda_intel.c
8a4bd4dff53ad23c32ffc0b366e592319dd566a4
which fixes IRQ timing issues.

As a very crude debugging test, I short-circuited this decision:

--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1489,7 +1489,7 @@
}

if (pos % azx_dev->period_bytes > azx_dev->period_bytes / 2)
- return 0; /* NG - it's below the period boundary */
+ return 1; /* Tell a lie */
return 1; /* OK, it's fine */
}

and sure enough, the events/0 process cpu load dropped down to its usual
level, undetectable with top and ps. I don't know if the high events/0 load is
the intended behavior of this code on my hardware or not. Sound playback and
general system behavior seemed subjectively equivalent regardless of the
highly active events/0 process.

This report is from /dev/random_user and I'm not qualified to diff bugs vs.
features, so sorry if this message is just noise. I'm happy to provide more
info if needed.

Here's a bit of relevant info from my machine from a run with the high
events/0 usage. A few bits of dmesg:

[ 0.066419] ACPI: Core revision 20080321
[ 0.075988] ENABLING IO-APIC IRQs
[ 0.075988] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.075988] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[ 0.075988] ...trying to set up timer (IRQ0) through the 8259A ...
failed.
[ 0.075988] ...trying to set up timer as Virtual Wire IRQ... works.
[ 0.086170] CPU0: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 09
[ 0.086986] CPU 1 irqstacks, hard=c0359000 soft=c0357000
[ 0.086986] Booting processor 1/1 ip 6000
[ 0.096985] Initializing CPU#1
[ 0.096985] Calibrating delay using timer specific routine.. 5998.67
BogoMIPS (lpj=2999338)
[ 0.160975] ACPI: bus type pci registered
[ 0.160975] PCI: MCFG configuration 0: base e0000000 segment 0 buses
0 -
255
[ 0.160975] PCI: MCFG area at e0000000 reserved in E820
[ 0.160975] PCI: Using MMCONFIG for extended config space
[ 0.160975] PCI: Using configuration type 1 for base access
[ 0.160975] Setting up standard PCI resources
[ 0.166580] ACPI: EC: Look up EC in DSDT
[ 0.172973] ACPI: Interpreter enabled
[ 0.172977] ACPI: (supports S0 S3 S5)
[ 0.173185] ACPI: Using IOAPIC for interrupt routing
[ 0.179736] ACPI: PCI Root Bridge [PCI0] (0000:00)
[ 0.180460] PCI: Transparent bridge - 0000:00:14.4
[ 0.180566] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[ 0.180972] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P2P_._PRT]
[ 0.194970] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 11)
*0, disabled.
[ 0.195606] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 10 11)
*0, disabled.
[ 0.196309] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 10 11)
*0, disabled.
[ 0.197026] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 11)
*0, disabled.
[ 0.197969] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 *7 10 11)
[ 0.198286] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11)
*0, disabled.
[ 0.198989] ACPI: PCI Interrupt Link [LNK0] (IRQs 3 4 5 6 7 10 *11)
[ 0.199567] ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 *5 6 7 10 11)
[ 17.037709] piix4_smbus 0000:00:14.0: Found 0000:00:14.0 device
[ 17.135812] ACPI: PCI Interrupt 0000:00:14.2[A] -> GSI 16 (level,
low) -> IRQ 16
[ 27.183697] hda-intel: Invalid position buffer, using LPIB read
method instead.

output of lsmod:

ipv6 223300 12
cpufreq_userspace 2988 0
cpufreq_stats 4748 0
cpufreq_powersave 1792 0
cpufreq_ondemand 6796 0
freq_table 3592 2 cpufreq_stats,cpufreq_ondemand
sbs 11144 0
sbshc 4864 1 sbs
container 3456 0
battery 10116 0
iptable_filter 2688 0
ip_tables 9872 1 iptable_filter
x_tables 12164 1 ip_tables
ac 4228 0
loop 13452 0
af_packet 15232 2
snd_hda_intel 55692 3
snd_pcm 62084 1 snd_hda_intel
snd_timer 18312 1 snd_pcm
serio_raw 5124 0
snd 39092 9 snd_hda_intel,snd_pcm,snd_timer
psmouse 23432 0
i2c_piix4 7696 0
soundcore 5984 1 snd
snd_page_alloc 7304 2 snd_hda_intel,snd_pcm
i2c_core 18324 1 i2c_piix4
ati_agp 6540 0
agpgart 26312 1 ati_agp
button 6032 0
evdev 8704 3
ext3 116616 1
jbd 39828 1 ext3
mbcache 6404 1 ext3
usbhid 19364 0
hid 36096 1 usbhid
ide_cd_mod 29444 0
cdrom 33056 1 ide_cd_mod
sd_mod 20760 3
pata_atiixp 4608 0
8139too 20992 0
8139cp 17408 0
mii 4480 2 8139too,8139cp
atiixp 3588 0 [permanent]
ehci_hcd 31372 0
ohci_hcd 20228 0
sata_sil 7560 2
pata_acpi 3968 0
ata_generic 4868 0
usbcore 120176 4 usbhid,ehci_hcd,ohci_hcd
libata 139616 4
pata_atiixp,sata_sil,pata_acpi,ata_generic
scsi_mod 85448 2 sd_mod,libata
dock 7824 1 libata
ide_pci_generic 4100 0 [permanent]
ide_core 84904 3 ide_cd_mod,atiixp,ide_pci_generic
thermal 15004 0
processor 31424 1 thermal
fan 4228 0
fuse 45468 3

audio entry snipped from lspci -vvv
00:14.2 Audio device: ATI Technologies Inc IXP SB4x0 High Definition
Audio Controller (rev 01)
Subsystem: Micro-Star International Co., Ltd. Unknown device 725a
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 64, Cache Line Size: 32 bytes
Interrupt: pin ? routed to IRQ 16
Region 0: Memory at fdff4000 (64-bit, non-prefetchable)
[size=16K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+
Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


benjkidwell at yahoo

Jun 6, 2008, 2:15 PM

Post #2 of 12 (827 views)
Permalink
Re: Linux-next Alsa hda_intel events/0 high CPU usage, bisected [In reply to]

--- Benjamin Kidwell <benjkidwell[at]yahoo.com> wrote:

> according to ps. 2.6.25-rc5 does not exhibit this behavior on my machine.

That's a thinko. 2.6.26-rc5 does not exhibit the behavior. Sorry. 2.6.25.4
also does not, of course.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tiwai at suse

Jun 7, 2008, 2:38 AM

Post #3 of 12 (816 views)
Permalink
Re: Linux-next Alsa hda_intel events/0 high CPU usage, bisected [In reply to]

At Fri, 6 Jun 2008 14:07:55 -0700 (PDT),
Benjamin Kidwell wrote:
>
> Linux-next Alsa hda_intel events/0 high CPU usage, bisected
>
> During sound playback with recent linux-next-20080806 I observed much higher
> than usual cpu usage for the events/0 process, locked at a near constant 4.3%
> according to ps. 2.6.25-rc5 does not exhibit this behavior on my machine.
>
> I git-bisected and arrived at this commit to sound/pci/hda/hda_intel.c
> 8a4bd4dff53ad23c32ffc0b366e592319dd566a4
> which fixes IRQ timing issues.
>
> As a very crude debugging test, I short-circuited this decision:
>
> --- a/sound/pci/hda/hda_intel.c
> +++ b/sound/pci/hda/hda_intel.c
> @@ -1489,7 +1489,7 @@
> }
>
> if (pos % azx_dev->period_bytes > azx_dev->period_bytes / 2)
> - return 0; /* NG - it's below the period boundary */
> + return 1; /* Tell a lie */
> return 1; /* OK, it's fine */
> }
>
> and sure enough, the events/0 process cpu load dropped down to its usual
> level, undetectable with top and ps. I don't know if the high events/0 load is
> the intended behavior of this code on my hardware or not. Sound playback and
> general system behavior seemed subjectively equivalent regardless of the
> highly active events/0 process.

The problem is that your hardware reports the wrong DMA position.
Some apps don't care about the exact position, but some do. The
latter apps (such as JACK running with two periods) suffer from this
bug severely.

I didn't expect that this delay is so much, (IOW, the hardware is so
buggy). Maybe it should be put into its own workq.
Also, you can replace cond_resched() with msleep(1) in
azx_irq_pending_work() as well.


Takashi

> This report is from /dev/random_user and I'm not qualified to diff bugs vs.
> features, so sorry if this message is just noise. I'm happy to provide more
> info if needed.
>
> Here's a bit of relevant info from my machine from a run with the high
> events/0 usage. A few bits of dmesg:
>
> [ 0.066419] ACPI: Core revision 20080321
> [ 0.075988] ENABLING IO-APIC IRQs
> [ 0.075988] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
> [ 0.075988] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> [ 0.075988] ...trying to set up timer (IRQ0) through the 8259A ...
> failed.
> [ 0.075988] ...trying to set up timer as Virtual Wire IRQ... works.
> [ 0.086170] CPU0: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 09
> [ 0.086986] CPU 1 irqstacks, hard=c0359000 soft=c0357000
> [ 0.086986] Booting processor 1/1 ip 6000
> [ 0.096985] Initializing CPU#1
> [ 0.096985] Calibrating delay using timer specific routine.. 5998.67
> BogoMIPS (lpj=2999338)
> [ 0.160975] ACPI: bus type pci registered
> [ 0.160975] PCI: MCFG configuration 0: base e0000000 segment 0 buses
> 0 -
> 255
> [ 0.160975] PCI: MCFG area at e0000000 reserved in E820
> [ 0.160975] PCI: Using MMCONFIG for extended config space
> [ 0.160975] PCI: Using configuration type 1 for base access
> [ 0.160975] Setting up standard PCI resources
> [ 0.166580] ACPI: EC: Look up EC in DSDT
> [ 0.172973] ACPI: Interpreter enabled
> [ 0.172977] ACPI: (supports S0 S3 S5)
> [ 0.173185] ACPI: Using IOAPIC for interrupt routing
> [ 0.179736] ACPI: PCI Root Bridge [PCI0] (0000:00)
> [ 0.180460] PCI: Transparent bridge - 0000:00:14.4
> [ 0.180566] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
> [ 0.180972] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P2P_._PRT]
> [ 0.194970] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 11)
> *0, disabled.
> [ 0.195606] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 10 11)
> *0, disabled.
> [ 0.196309] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 10 11)
> *0, disabled.
> [ 0.197026] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 11)
> *0, disabled.
> [ 0.197969] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 *7 10 11)
> [ 0.198286] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11)
> *0, disabled.
> [ 0.198989] ACPI: PCI Interrupt Link [LNK0] (IRQs 3 4 5 6 7 10 *11)
> [ 0.199567] ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 *5 6 7 10 11)
> [ 17.037709] piix4_smbus 0000:00:14.0: Found 0000:00:14.0 device
> [ 17.135812] ACPI: PCI Interrupt 0000:00:14.2[A] -> GSI 16 (level,
> low) -> IRQ 16
> [ 27.183697] hda-intel: Invalid position buffer, using LPIB read
> method instead.
>
> output of lsmod:
>
> ipv6 223300 12
> cpufreq_userspace 2988 0
> cpufreq_stats 4748 0
> cpufreq_powersave 1792 0
> cpufreq_ondemand 6796 0
> freq_table 3592 2 cpufreq_stats,cpufreq_ondemand
> sbs 11144 0
> sbshc 4864 1 sbs
> container 3456 0
> battery 10116 0
> iptable_filter 2688 0
> ip_tables 9872 1 iptable_filter
> x_tables 12164 1 ip_tables
> ac 4228 0
> loop 13452 0
> af_packet 15232 2
> snd_hda_intel 55692 3
> snd_pcm 62084 1 snd_hda_intel
> snd_timer 18312 1 snd_pcm
> serio_raw 5124 0
> snd 39092 9 snd_hda_intel,snd_pcm,snd_timer
> psmouse 23432 0
> i2c_piix4 7696 0
> soundcore 5984 1 snd
> snd_page_alloc 7304 2 snd_hda_intel,snd_pcm
> i2c_core 18324 1 i2c_piix4
> ati_agp 6540 0
> agpgart 26312 1 ati_agp
> button 6032 0
> evdev 8704 3
> ext3 116616 1
> jbd 39828 1 ext3
> mbcache 6404 1 ext3
> usbhid 19364 0
> hid 36096 1 usbhid
> ide_cd_mod 29444 0
> cdrom 33056 1 ide_cd_mod
> sd_mod 20760 3
> pata_atiixp 4608 0
> 8139too 20992 0
> 8139cp 17408 0
> mii 4480 2 8139too,8139cp
> atiixp 3588 0 [permanent]
> ehci_hcd 31372 0
> ohci_hcd 20228 0
> sata_sil 7560 2
> pata_acpi 3968 0
> ata_generic 4868 0
> usbcore 120176 4 usbhid,ehci_hcd,ohci_hcd
> libata 139616 4
> pata_atiixp,sata_sil,pata_acpi,ata_generic
> scsi_mod 85448 2 sd_mod,libata
> dock 7824 1 libata
> ide_pci_generic 4100 0 [permanent]
> ide_core 84904 3 ide_cd_mod,atiixp,ide_pci_generic
> thermal 15004 0
> processor 31424 1 thermal
> fan 4228 0
> fuse 45468 3
>
> audio entry snipped from lspci -vvv
> 00:14.2 Audio device: ATI Technologies Inc IXP SB4x0 High Definition
> Audio Controller (rev 01)
> Subsystem: Micro-Star International Co., Ltd. Unknown device 725a
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 64, Cache Line Size: 32 bytes
> Interrupt: pin ? routed to IRQ 16
> Region 0: Memory at fdff4000 (64-bit, non-prefetchable)
> [size=16K]
> Capabilities: [50] Power Management version 2
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+
> Queue=0/0 Enable-
> Address: 0000000000000000 Data: 0000
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo[at]vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tiwai at suse

Jun 7, 2008, 2:38 AM

Post #4 of 12 (822 views)
Permalink
Re: Linux-next Alsa hda_intel events/0 high CPU usage, bisected [In reply to]

At Fri, 6 Jun 2008 14:15:37 -0700 (PDT),
Benjamin Kidwell wrote:
>
> --- Benjamin Kidwell <benjkidwell[at]yahoo.com> wrote:
>
> > according to ps. 2.6.25-rc5 does not exhibit this behavior on my machine.
>
> That's a thinko. 2.6.26-rc5 does not exhibit the behavior. Sorry. 2.6.25.4
> also does not, of course.

Right. The patch is only on linux-next.


Takashi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


benjkidwell at yahoo

Jun 7, 2008, 6:56 PM

Post #5 of 12 (795 views)
Permalink
Re: Linux-next Alsa hda_intel events/0 high CPU usage, bisected [In reply to]

--- Takashi Iwai <> wrote:

> The problem is that your hardware reports the wrong DMA position.
> I didn't expect that this delay is so much, (IOW, the hardware is so
> buggy). Maybe it should be put into its own workq.

I've been getting my process timing information from ps, top, and htop, which
I know can be unreliable in some ways (I've been using a tickless (NO_HZ=y)
kernel). I recompiled the g8a4bd4d kernel with that option off and a few more
changes to make my settings more conservative, and rechecked events/0. It
still reports at a consistent 2.5%, (down from about 4.3% with previous kernel
options) and after a short period of use (uptime 30 minutes) events/0 has had
0:48 of time. In comparison, Xorg and Firefox (usually the biggest users)
report 1:12 and 0:44 in the same timeframe.

So if I understand the purpose of the code, that means that my system has
generated 48 seconds worth of timing mistakes in audio playback relative to 25
minutes or so of audio stream, so that suggests the timing error of the
hardware is about 3%! That seems like huge latency for hardware so I see why
you wouldn't expect that.

I should also say that close listening has convinced me that my audio
performance has been IMPROVED by the patch - IOW, the code is working well at
improving the audio stream, even though its using a fair amount of CPU time to
do it. With the new code, I seem to hear less random clicks, interruptions, or
static that I had previously blamed on the source of the audio. I don't know
how to test the output to the speakers relative to the input source
quantitatively, though.

Given that the worst-case-scenario for compensating for buggy hardware is
worse than expected, what is the best behavior for the driver given hardware
like mine? I don't know if a high CPU time for the events/0 process is
actually a problem, I'm just not used to seeing a -5 niceness process so
active under top. Let me know if you'd like any more info on my hardware or
setup, or if there are any additional kernel config options that might affect
my DMA reporting errors. I'm trying to be a useful linux-next tester, so any
advice is appreciated.
Ben



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tiwai at suse

Jun 9, 2008, 4:11 AM

Post #6 of 12 (789 views)
Permalink
Re: Linux-next Alsa hda_intel events/0 high CPU usage, bisected [In reply to]

At Sat, 7 Jun 2008 18:56:33 -0700 (PDT),
Benjamin Kidwell wrote:
>
> --- Takashi Iwai <> wrote:
>
> > The problem is that your hardware reports the wrong DMA position.
> > I didn't expect that this delay is so much, (IOW, the hardware is so
> > buggy). Maybe it should be put into its own workq.
>
> I've been getting my process timing information from ps, top, and htop, which
> I know can be unreliable in some ways (I've been using a tickless (NO_HZ=y)
> kernel). I recompiled the g8a4bd4d kernel with that option off and a few more
> changes to make my settings more conservative, and rechecked events/0. It
> still reports at a consistent 2.5%, (down from about 4.3% with previous kernel
> options) and after a short period of use (uptime 30 minutes) events/0 has had
> 0:48 of time. In comparison, Xorg and Firefox (usually the biggest users)
> report 1:12 and 0:44 in the same timeframe.
>
> So if I understand the purpose of the code, that means that my system has
> generated 48 seconds worth of timing mistakes in audio playback relative to 25
> minutes or so of audio stream, so that suggests the timing error of the
> hardware is about 3%! That seems like huge latency for hardware so I see why
> you wouldn't expect that.

Could you check the patch below in addition? It's against my latest
git tree and should be applicable as is to linux-next (since it merges
from my tree), too.

This will show some latency time in the kernel messages. If the
average latency is about 20 or 30us, this is almost in the expected
range. Occasional high latencies should be the result of
cond_resched(), so it can be ignored.

> I should also say that close listening has convinced me that my audio
> performance has been IMPROVED by the patch - IOW, the code is working well at
> improving the audio stream, even though its using a fair amount of CPU time to
> do it. With the new code, I seem to hear less random clicks, interruptions, or
> static that I had previously blamed on the source of the audio. I don't know
> how to test the output to the speakers relative to the input source
> quantitatively, though.

First we need to gather some hardware information to sort out the
issue. If the latency is almost constant, we may add some another
trick, e.g. setting different IRQ-trigger pointers. I'll prepare
another patch.


> Given that the worst-case-scenario for compensating for buggy hardware is
> worse than expected, what is the best behavior for the driver given hardware
> like mine? I don't know if a high CPU time for the events/0 process is
> actually a problem, I'm just not used to seeing a -5 niceness process so
> active under top. Let me know if you'd like any more info on my hardware or
> setup, or if there are any additional kernel config options that might affect
> my DMA reporting errors. I'm trying to be a useful linux-next tester, so any
> advice is appreciated.

The loop has cond_resched() and thus it won't give a significant
system latency (and thus niceness wouldn't be much important here).
But, of course, such a loop takes some CPU time, and it'd be better to
avoid.


thanks,

Takashi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tiwai at suse

Jun 9, 2008, 4:13 AM

Post #7 of 12 (787 views)
Permalink
Re: Linux-next Alsa hda_intel events/0 high CPU usage, bisected [In reply to]

At Mon, 09 Jun 2008 13:11:43 +0200,
I wrote:
>
> At Sat, 7 Jun 2008 18:56:33 -0700 (PDT),
> Benjamin Kidwell wrote:
> >
> > --- Takashi Iwai <> wrote:
> >
> > > The problem is that your hardware reports the wrong DMA position.
> > > I didn't expect that this delay is so much, (IOW, the hardware is so
> > > buggy). Maybe it should be put into its own workq.
> >
> > I've been getting my process timing information from ps, top, and htop, which
> > I know can be unreliable in some ways (I've been using a tickless (NO_HZ=y)
> > kernel). I recompiled the g8a4bd4d kernel with that option off and a few more
> > changes to make my settings more conservative, and rechecked events/0. It
> > still reports at a consistent 2.5%, (down from about 4.3% with previous kernel
> > options) and after a short period of use (uptime 30 minutes) events/0 has had
> > 0:48 of time. In comparison, Xorg and Firefox (usually the biggest users)
> > report 1:12 and 0:44 in the same timeframe.
> >
> > So if I understand the purpose of the code, that means that my system has
> > generated 48 seconds worth of timing mistakes in audio playback relative to 25
> > minutes or so of audio stream, so that suggests the timing error of the
> > hardware is about 3%! That seems like huge latency for hardware so I see why
> > you wouldn't expect that.
>
> Could you check the patch below in addition? It's against my latest
> git tree and should be applicable as is to linux-next (since it merges
> from my tree), too.

Doh forgot the patch. Here it is.


thanks,

Takashi


diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index dc68709..c0ae0e8 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -917,6 +917,26 @@ static void azx_init_pci(struct azx *chip)
}


+static struct timespec irq_pending_time;
+static long max_delay, avg_delay, avg_count;
+
+static void check_latency(void)
+{
+ struct timespec end_time;
+ long delay;
+
+ getnstimeofday(&end_time);
+ delay = end_time.tv_sec - irq_pending_time.tv_sec;
+ delay *= 1000000000L;
+ delay += end_time.tv_nsec - irq_pending_time.tv_nsec;
+ if (delay > max_delay) {
+ max_delay = delay;
+ printk(KERN_DEBUG "XXX hda: max_delay=%ld ns\n", max_delay);
+ }
+ avg_delay += delay / 1000;
+ avg_count++;
+}
+
static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev);

/*
@@ -952,6 +972,7 @@ static irqreturn_t azx_interrupt(int irq, void *dev_id)
} else {
/* bogus IRQ, process it later */
azx_dev->irq_pending = 1;
+ getnstimeofday(&irq_pending_time);
schedule_work(&chip->irq_pending_work);
}
}
@@ -1288,6 +1309,13 @@ static int azx_pcm_close(struct snd_pcm_substream *substream)
hinfo->ops.close(hinfo, apcm->codec, substream);
snd_hda_power_down(apcm->codec);
mutex_unlock(&chip->open_mutex);
+ if (avg_count) {
+ printk(KERN_DEBUG "XXX hda: avg_delay=%ld us\n",
+ avg_delay / avg_count);
+ avg_count = 0;
+ avg_delay = 0;
+ max_delay = 0;
+ }
return 0;
}

@@ -1521,6 +1549,7 @@ static void azx_irq_pending_work(struct work_struct *work)
azx_dev->irq_pending = 0;
spin_unlock(&chip->reg_lock);
snd_pcm_period_elapsed(azx_dev->substream);
+ check_latency();
spin_lock(&chip->reg_lock);
} else
pending++;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tiwai at suse

Jun 9, 2008, 6:13 AM

Post #8 of 12 (792 views)
Permalink
Re: Linux-next Alsa hda_intel events/0 high CPU usage, bisected [In reply to]

At Mon, 09 Jun 2008 13:11:43 +0200,
I wrote:
>
> At Sat, 7 Jun 2008 18:56:33 -0700 (PDT),
> Benjamin Kidwell wrote:
> >
> > --- Takashi Iwai <> wrote:
> >
> > > The problem is that your hardware reports the wrong DMA position.
> > > I didn't expect that this delay is so much, (IOW, the hardware is so
> > > buggy). Maybe it should be put into its own workq.
> >
> > I've been getting my process timing information from ps, top, and htop, which
> > I know can be unreliable in some ways (I've been using a tickless (NO_HZ=y)
> > kernel). I recompiled the g8a4bd4d kernel with that option off and a few more
> > changes to make my settings more conservative, and rechecked events/0. It
> > still reports at a consistent 2.5%, (down from about 4.3% with previous kernel
> > options) and after a short period of use (uptime 30 minutes) events/0 has had
> > 0:48 of time. In comparison, Xorg and Firefox (usually the biggest users)
> > report 1:12 and 0:44 in the same timeframe.
> >
> > So if I understand the purpose of the code, that means that my system has
> > generated 48 seconds worth of timing mistakes in audio playback relative to 25
> > minutes or so of audio stream, so that suggests the timing error of the
> > hardware is about 3%! That seems like huge latency for hardware so I see why
> > you wouldn't expect that.
>
> Could you check the patch below in addition? It's against my latest
> git tree and should be applicable as is to linux-next (since it merges
> from my tree), too.
>
> This will show some latency time in the kernel messages. If the
> average latency is about 20 or 30us, this is almost in the expected
> range. Occasional high latencies should be the result of
> cond_resched(), so it can be ignored.
>
> > I should also say that close listening has convinced me that my audio
> > performance has been IMPROVED by the patch - IOW, the code is working well at
> > improving the audio stream, even though its using a fair amount of CPU time to
> > do it. With the new code, I seem to hear less random clicks, interruptions, or
> > static that I had previously blamed on the source of the audio. I don't know
> > how to test the output to the speakers relative to the input source
> > quantitatively, though.
>
> First we need to gather some hardware information to sort out the
> issue. If the latency is almost constant, we may add some another
> trick, e.g. setting different IRQ-trigger pointers. I'll prepare
> another patch.

Here is the patch to do another trick.
The driver tries to delay the IRQ timing in one sample so that the DMA
pointer returns the proper position.

It seems working on my test machine. This patch can work together
with the last "latency measurement" patch, and you'll see no longer
latency things in the kernel message if this workaround really plays
its role.

Give it a try, and let me know if you still have any issue.


thanks,

Takashi


diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index dc68709..401a4fd 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -58,6 +58,7 @@ static int position_fix[SNDRV_CARDS];
static int probe_mask[SNDRV_CARDS] = {[0 ... (SNDRV_CARDS-1)] = -1};
static int single_cmd;
static int enable_msi;
+static int bdl_pos_adj = 1;

module_param_array(index, int, NULL, 0444);
MODULE_PARM_DESC(index, "Index value for Intel HD audio interface.");
@@ -77,6 +78,8 @@ MODULE_PARM_DESC(single_cmd, "Use single command to communicate with codecs "
"(for debugging only).");
module_param(enable_msi, int, 0444);
MODULE_PARM_DESC(enable_msi, "Enable Message Signaled Interrupt (MSI)");
+module_param(bdl_pos_adj, int, 0644);
+MODULE_PARM_DESC(bdl_pos_adj, "BDL position adjustment offset");

#ifdef CONFIG_SND_HDA_POWER_SAVE
/* power_save option is defined in hda_codec.c */
@@ -310,6 +313,7 @@ struct azx_dev {
unsigned int opened :1;
unsigned int running :1;
unsigned int irq_pending: 1;
+ unsigned int irq_ignore: 1;
};

/* CORB/RIRB */
@@ -943,6 +947,11 @@ static irqreturn_t azx_interrupt(int irq, void *dev_id)
azx_sd_writeb(azx_dev, SD_STS, SD_INT_MASK);
if (!azx_dev->substream || !azx_dev->running)
continue;
+ /* ignore the first dummy IRQ (due to pos_adj) */
+ if (azx_dev->irq_ignore) {
+ azx_dev->irq_ignore = 0;
+ continue;
+ }
/* check whether this IRQ is really acceptable */
if (azx_position_ok(chip, azx_dev)) {
azx_dev->irq_pending = 0;
@@ -977,14 +986,53 @@ static irqreturn_t azx_interrupt(int irq, void *dev_id)


/*
+ * set up a BDL entry
+ */
+static int setup_bdle(struct snd_pcm_substream *substream,
+ struct azx_dev *azx_dev, u32 **bdlp,
+ int ofs, int size, int with_ioc)
+{
+ struct snd_sg_buf *sgbuf = snd_pcm_substream_sgbuf(substream);
+ u32 *bdl = *bdlp;
+
+ while (size > 0) {
+ dma_addr_t addr;
+ int chunk;
+
+ if (azx_dev->frags >= AZX_MAX_BDL_ENTRIES)
+ return -EINVAL;
+
+ addr = snd_pcm_sgbuf_get_addr(sgbuf, ofs);
+ /* program the address field of the BDL entry */
+ bdl[0] = cpu_to_le32((u32)addr);
+ bdl[1] = cpu_to_le32(upper_32bit(addr));
+ /* program the size field of the BDL entry */
+ chunk = PAGE_SIZE - (ofs % PAGE_SIZE);
+ if (size < chunk)
+ chunk = size;
+ bdl[2] = cpu_to_le32(chunk);
+ /* program the IOC to enable interrupt
+ * only when the whole fragment is processed
+ */
+ size -= chunk;
+ bdl[3] = (size || !with_ioc) ? 0 : cpu_to_le32(0x01);
+ bdl += 4;
+ azx_dev->frags++;
+ ofs += chunk;
+ }
+ *bdlp = bdl;
+ return ofs;
+}
+
+/*
* set up BDL entries
*/
static int azx_setup_periods(struct snd_pcm_substream *substream,
struct azx_dev *azx_dev)
{
- struct snd_sg_buf *sgbuf = snd_pcm_substream_sgbuf(substream);
u32 *bdl;
int i, ofs, periods, period_bytes;
+ int pos_adj = 0;

/* reset BDL address */
azx_sd_writel(azx_dev, SD_BDLPL, 0);
@@ -998,39 +1046,44 @@ static int azx_setup_periods(struct snd_pcm_substream *substream,
bdl = (u32 *)azx_dev->bdl.area;
ofs = 0;
azx_dev->frags = 0;
- for (i = 0; i < periods; i++) {
- int size, rest;
- if (i >= AZX_MAX_BDL_ENTRIES) {
- snd_printk(KERN_ERR "Too many BDL entries: "
- "buffer=%d, period=%d\n",
- azx_dev->bufsize, period_bytes);
- /* reset */
- azx_sd_writel(azx_dev, SD_BDLPL, 0);
- azx_sd_writel(azx_dev, SD_BDLPU, 0);
- return -EINVAL;
+ azx_dev->irq_ignore = 0;
+ if (bdl_pos_adj > 0) {
+ struct snd_pcm_runtime *runtime = substream->runtime;
+ pos_adj = (bdl_pos_adj * runtime->rate + 47999) / 48000;
+ if (!pos_adj)
+ pos_adj = 1;
+ pos_adj = frames_to_bytes(runtime, pos_adj);
+ if (pos_adj >= period_bytes) {
+ snd_printk(KERN_WARNING "Too big adjustment %d\n",
+ bdl_pos_adj);
+ pos_adj = 0;
+ } else {
+ ofs = setup_bdle(substream, azx_dev,
+ &bdl, ofs, pos_adj, 1);
+ if (ofs < 0)
+ goto error;
+ azx_dev->irq_ignore = 1;
}
- rest = period_bytes;
- do {
- dma_addr_t addr = snd_pcm_sgbuf_get_addr(sgbuf, ofs);
- /* program the address field of the BDL entry */
- bdl[0] = cpu_to_le32((u32)addr);
- bdl[1] = cpu_to_le32(upper_32bit(addr));
- /* program the size field of the BDL entry */
- size = PAGE_SIZE - (ofs % PAGE_SIZE);
- if (rest < size)
- size = rest;
- bdl[2] = cpu_to_le32(size);
- /* program the IOC to enable interrupt
- * only when the whole fragment is processed
- */
- rest -= size;
- bdl[3] = rest ? 0 : cpu_to_le32(0x01);
- bdl += 4;
- azx_dev->frags++;
- ofs += size;
- } while (rest > 0);
+ }
+ for (i = 0; i < periods; i++) {
+ if (i == periods - 1 && pos_adj)
+ ofs = setup_bdle(substream, azx_dev, &bdl, ofs,
+ period_bytes - pos_adj, 0);
+ else
+ ofs = setup_bdle(substream, azx_dev, &bdl, ofs,
+ period_bytes, 1);
+ if (ofs < 0)
+ goto error;
}
return 0;
+
+ error:
+ snd_printk(KERN_ERR "Too many BDL entries: buffer=%d, period=%d\n",
+ azx_dev->bufsize, period_bytes);
+ /* reset */
+ azx_sd_writel(azx_dev, SD_BDLPL, 0);
+ azx_sd_writel(azx_dev, SD_BDLPU, 0);
+ return -EINVAL;
}

/*
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


benjkidwell at yahoo

Jun 9, 2008, 3:09 PM

Post #9 of 12 (773 views)
Permalink
Re: Linux-next Alsa hda_intel events/0 high CPU usage, bisected [In reply to]

--- Takashi Iwai <> wrote:
>Here is the patch to do another trick.
>The driver tries to delay the IRQ timing in one sample so that the DMA
>pointer returns the proper position.

I refer to the above patch as "patch #2" in discussing the tests below.

>Could you check the patch below in addition? It's against my latest
>git tree and should be applicable as is to linux-next (since it merges
>from my tree), too.
>This will show some latency time in the kernel messages. If the
>average latency is about 20 or 30us, this is almost in the expected
>range. Occasional high latencies should be the result of
>cond_resched(), so it can be ignored.

Thanks very much for your time. Both patches applied to next-20080606. Patch
#2 to delay IRQ had 20/21 line offsets in hunks 4,5, and 6. I built kernels
with patch #1 only and with both patch #1 and #2. With reporting only (#1)
patch:
[ 28.897483] hda-intel: Invalid position buffer, using LPIB read method
instead.
[ 28.900819] XXX hda: max_delay=4535702 ns
[ 29.344196] XXX hda: avg_delay=940 us
[ 37.787584] XXX hda: max_delay=730886 ns
[ 38.008854] XXX hda: max_delay=733118 ns
[ 38.799034] XXX hda: max_delay=734487 ns
[ 38.849953] XXX hda: avg_delay=721 us
[ 38.958973] XXX hda: max_delay=728287 ns
[ 40.269209] XXX hda: max_delay=2020331 ns
[ 40.840784] XXX hda: max_delay=6639814 ns
[ 46.543949] XXX hda: max_delay=8514464 ns
[ 47.660992] XXX hda: avg_delay=905 us
[ 55.677483] XXX hda: max_delay=721319 ns
[ 55.701527] XXX hda: max_delay=722752 ns
[ 56.515656] XXX hda: max_delay=724277 ns
[ 56.762500] XXX hda: max_delay=729125 ns
[ 58.147477] XXX hda: max_delay=3541590 ns
[ 61.185499] XXX hda: max_delay=5094055 ns
[ 68.174152] XXX hda: max_delay=7323778 ns
[ 190.467470] XXX hda: max_delay=7446118 ns
[ 205.747463] XXX hda: max_delay=8168689 ns
[ 206.240797] XXX hda: max_delay=8587483 ns
[ 252.643947] XXX hda: max_delay=9425211 ns

With BOTH patches applied:
[ 29.988316] hda-intel: Invalid position buffer, using LPIB read method
instead.
[ 29.991655] XXX hda: max_delay=4357840 ns
[ 30.435030] XXX hda: avg_delay=603 us
[ 36.211751] XXX hda: max_delay=374595 ns
[ 36.534235] XXX hda: max_delay=1521772 ns
[ 37.278175] XXX hda: avg_delay=404 us
[ 37.384060] XXX hda: max_delay=366048 ns
[ 37.408317] XXX hda: max_delay=372760 ns
[ 37.583300] XXX hda: max_delay=2233243 ns
[ 39.310484] XXX hda: max_delay=2369779 ns
[ 39.367269] XXX hda: max_delay=9307321 ns
[ 46.185005] XXX hda: avg_delay=502 us
[ 66.041615] XXX hda: max_delay=367668 ns
[ 66.065345] XXX hda: max_delay=372093 ns
[ 67.521437] XXX hda: max_delay=373733 ns
[ 68.656740] XXX hda: max_delay=373915 ns
[ 71.574977] XXX hda: max_delay=4827092 ns
[ 107.734975] XXX hda: max_delay=6589856 ns
[ 111.288309] XXX hda: max_delay=6715648 ns
[ 151.121642] XXX hda: max_delay=7695259 ns
[ 220.478297] XXX hda: max_delay=9907063 ns
[ 352.964963] XXX hda: max_delay=10756583 ns

With both patches in place, CPU% reported by ps for events/0 during sound
playback is substantially reduced. ps reports consistent 0.6% cpu instead of
2-3% or more. I notice that the avg_delay report is much higher than the
20-30us you say is the expected range. It is lower with the second patch
applied, but still large.

Additional experimentation with various .config options had also shown that
options such as CONFIG_HZ also affect the reported CPU% and TIME from
events/0. Here's output from a kernel with both new patches applied and
configued with HZ=1000, and PREEMPT_RCU=y instead of 300 and
PREEMPT_VOLUNTARY.

[ 29.550748] hda-intel: Invalid position buffer, using LPIB read method
instead.
[ 29.553750] XXX hda: max_delay=3314411 ns
[ 29.998764] XXX hda: avg_delay=519 us
[ 31.406760] eth0: no IPv6 routers present
[ 35.297443] XXX hda: max_delay=375157 ns
[ 36.355211] XXX hda: avg_delay=147 us
[ 36.595685] XXX hda: max_delay=367326 ns
[ 36.620318] XXX hda: max_delay=373443 ns
[ 37.041228] XXX hda: max_delay=1481064 ns
[ 37.364519] XXX hda: max_delay=3983313 ns
[ 41.265473] XXX hda: max_delay=5290882 ns
[ 45.309086] XXX hda: avg_delay=437 us
[ 80.042739] XXX hda: max_delay=367816 ns
[ 80.067627] XXX hda: max_delay=370010 ns
[ 80.955786] XXX hda: max_delay=389501 ns
[ 82.881726] XXX hda: max_delay=1042298 ns
[ 95.345744] XXX hda: max_delay=1144114 ns
[ 95.397576] XXX hda: max_delay=3741325 ns
[ 104.039739] XXX hda: max_delay=6832958 ns
[ 137.435750] XXX hda: max_delay=9916982 ns

With this higher timer frequency and patch #2, ps reports avg. 1.2% events/0
usage vs. 2.3% with reporting patch #1 only. Here are timings from the same
.config but with only patch #1 (reporting) applied:

[ 29.782022] hda-intel: Invalid position buffer, using LPIB read method
instead.
[ 29.782708] XXX hda: max_delay=578709 ns
[ 29.804012] XXX hda: max_delay=667540 ns
[ 29.825934] XXX hda: max_delay=672530 ns
[ 29.847229] XXX hda: max_delay=727943 ns
[ 29.890260] XXX hda: max_delay=987514 ns
[ 30.231269] XXX hda: avg_delay=677 us
[ 30.254334] eth0: no IPv6 routers present
[ 35.129783] XXX hda: max_delay=725808 ns
[ 35.205719] XXX hda: max_delay=2394065 ns
[ 36.181885] XXX hda: avg_delay=787 us
[ 36.191131] XXX hda: max_delay=719683 ns
[ 36.215970] XXX hda: max_delay=725257 ns
[ 36.487256] XXX hda: max_delay=725928 ns
[ 36.734025] XXX hda: max_delay=726886 ns
[ 36.758713] XXX hda: max_delay=728874 ns
[ 37.257252] XXX hda: max_delay=5298044 ns
[ 43.330254] XXX hda: max_delay=6694653 ns
[ 44.392286] XXX hda: max_delay=7673067 ns
[ 44.892458] XXX hda: avg_delay=849 us
[ 70.744252] XXX hda: max_delay=721293 ns
[ 70.769134] XXX hda: max_delay=724172 ns
[ 71.262857] XXX hda: max_delay=726017 ns
[ 71.459987] XXX hda: max_delay=726454 ns
[ 71.657588] XXX hda: max_delay=729122 ns
[ 72.002882] XXX hda: max_delay=732543 ns
[ 78.322253] XXX hda: max_delay=948022 ns
[ 81.901253] XXX hda: max_delay=1121037 ns
[ 81.951748] XXX hda: max_delay=2948531 ns
[ 85.387252] XXX hda: max_delay=7308180 ns
[ 101.678225] XXX hda: max_delay=8139181 ns
[ 190.655620] XXX hda: max_delay=8721454 ns
[ 315.991170] XXX hda: max_delay=10913594 ns
[ 438.610252] XXX hda: max_delay=11273250 ns

SUMMARY: the IRQ timing delay change patch (#2) produces improvement in all
test cases, reducing the avg_delay and events/0 CPU%. The avg_delay is still
outside expected values, but perhaps this is the best possible result given
the bugginess of the hardware. Audio quality and system performance are very
good in all tests.

Thanks again very much for your time and the test patches. Happy to provide
more info and testing if its useful to you.
Ben



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tiwai at suse

Jun 10, 2008, 3:13 AM

Post #10 of 12 (775 views)
Permalink
Re: Linux-next Alsa hda_intel events/0 high CPU usage, bisected [In reply to]

At Mon, 9 Jun 2008 15:09:33 -0700 (PDT),
Benjamin Kidwell wrote:
>
> --- Takashi Iwai <> wrote:
> >Here is the patch to do another trick.
> >The driver tries to delay the IRQ timing in one sample so that the DMA
> >pointer returns the proper position.
>
> I refer to the above patch as "patch #2" in discussing the tests below.
>
> >Could you check the patch below in addition? It's against my latest
> >git tree and should be applicable as is to linux-next (since it merges
> >from my tree), too.
> >This will show some latency time in the kernel messages. If the
> >average latency is about 20 or 30us, this is almost in the expected
> >range. Occasional high latencies should be the result of
> >cond_resched(), so it can be ignored.
>
> Thanks very much for your time. Both patches applied to next-20080606. Patch
> #2 to delay IRQ had 20/21 line offsets in hunks 4,5, and 6. I built kernels
> with patch #1 only and with both patch #1 and #2. With reporting only (#1)
> patch:
> [ 28.897483] hda-intel: Invalid position buffer, using LPIB read method
> instead.
> [ 28.900819] XXX hda: max_delay=4535702 ns
> [ 29.344196] XXX hda: avg_delay=940 us
> [ 37.787584] XXX hda: max_delay=730886 ns
> [ 38.008854] XXX hda: max_delay=733118 ns
> [ 38.799034] XXX hda: max_delay=734487 ns
> [ 38.849953] XXX hda: avg_delay=721 us
> [ 38.958973] XXX hda: max_delay=728287 ns
> [ 40.269209] XXX hda: max_delay=2020331 ns
> [ 40.840784] XXX hda: max_delay=6639814 ns
> [ 46.543949] XXX hda: max_delay=8514464 ns
> [ 47.660992] XXX hda: avg_delay=905 us
> [ 55.677483] XXX hda: max_delay=721319 ns
> [ 55.701527] XXX hda: max_delay=722752 ns
> [ 56.515656] XXX hda: max_delay=724277 ns
> [ 56.762500] XXX hda: max_delay=729125 ns
> [ 58.147477] XXX hda: max_delay=3541590 ns
> [ 61.185499] XXX hda: max_delay=5094055 ns
> [ 68.174152] XXX hda: max_delay=7323778 ns
> [ 190.467470] XXX hda: max_delay=7446118 ns
> [ 205.747463] XXX hda: max_delay=8168689 ns
> [ 206.240797] XXX hda: max_delay=8587483 ns
> [ 252.643947] XXX hda: max_delay=9425211 ns
>
> With BOTH patches applied:
> [ 29.988316] hda-intel: Invalid position buffer, using LPIB read method
> instead.
> [ 29.991655] XXX hda: max_delay=4357840 ns
> [ 30.435030] XXX hda: avg_delay=603 us
> [ 36.211751] XXX hda: max_delay=374595 ns
> [ 36.534235] XXX hda: max_delay=1521772 ns
> [ 37.278175] XXX hda: avg_delay=404 us
> [ 37.384060] XXX hda: max_delay=366048 ns
> [ 37.408317] XXX hda: max_delay=372760 ns
> [ 37.583300] XXX hda: max_delay=2233243 ns
> [ 39.310484] XXX hda: max_delay=2369779 ns
> [ 39.367269] XXX hda: max_delay=9307321 ns
> [ 46.185005] XXX hda: avg_delay=502 us
> [ 66.041615] XXX hda: max_delay=367668 ns
> [ 66.065345] XXX hda: max_delay=372093 ns
> [ 67.521437] XXX hda: max_delay=373733 ns
> [ 68.656740] XXX hda: max_delay=373915 ns
> [ 71.574977] XXX hda: max_delay=4827092 ns
> [ 107.734975] XXX hda: max_delay=6589856 ns
> [ 111.288309] XXX hda: max_delay=6715648 ns
> [ 151.121642] XXX hda: max_delay=7695259 ns
> [ 220.478297] XXX hda: max_delay=9907063 ns

Thanks for testing!

All the results show that the default bdl_pos_adj value is obviously
too small. As default it's 1, corresponding to 22 us. Judging from
avg_delay, bdl_pos_adj=32 could work better in your case.

Try to run

# echo 32 > /sys/modules/snd_hda_intel/parameters/bdl_pos_adj

and test again.

> With both patches in place, CPU% reported by ps for events/0 during sound
> playback is substantially reduced. ps reports consistent 0.6% cpu instead of
> 2-3% or more. I notice that the avg_delay report is much higher than the
> 20-30us you say is the expected range. It is lower with the second patch
> applied, but still large.

This reflects how often inaccurate IRQs are issued and/or inaccurate
DMA positions are reported. With the second patch, bdl_pos_adj
addition, the timing is forced to be slightly delayed and thus reduces
the busy loop. Maybe this delay covered some IRQs that were triggered
almost at the right time.


> Additional experimentation with various .config options had also shown that
> options such as CONFIG_HZ also affect the reported CPU% and TIME from
> events/0. Here's output from a kernel with both new patches applied and
> configued with HZ=1000, and PREEMPT_RCU=y instead of 300 and
> PREEMPT_VOLUNTARY.
>
> [ 29.550748] hda-intel: Invalid position buffer, using LPIB read method
> instead.
> [ 29.553750] XXX hda: max_delay=3314411 ns
> [ 29.998764] XXX hda: avg_delay=519 us
> [ 31.406760] eth0: no IPv6 routers present
> [ 35.297443] XXX hda: max_delay=375157 ns
> [ 36.355211] XXX hda: avg_delay=147 us
> [ 36.595685] XXX hda: max_delay=367326 ns
> [ 36.620318] XXX hda: max_delay=373443 ns
> [ 37.041228] XXX hda: max_delay=1481064 ns
> [ 37.364519] XXX hda: max_delay=3983313 ns
> [ 41.265473] XXX hda: max_delay=5290882 ns
> [ 45.309086] XXX hda: avg_delay=437 us
> [ 80.042739] XXX hda: max_delay=367816 ns
> [ 80.067627] XXX hda: max_delay=370010 ns
> [ 80.955786] XXX hda: max_delay=389501 ns
> [ 82.881726] XXX hda: max_delay=1042298 ns
> [ 95.345744] XXX hda: max_delay=1144114 ns
> [ 95.397576] XXX hda: max_delay=3741325 ns
> [ 104.039739] XXX hda: max_delay=6832958 ns
> [ 137.435750] XXX hda: max_delay=9916982 ns
>
> With this higher timer frequency and patch #2, ps reports avg. 1.2% events/0
> usage vs. 2.3% with reporting patch #1 only. Here are timings from the same
> .config but with only patch #1 (reporting) applied:
>
> [ 29.782022] hda-intel: Invalid position buffer, using LPIB read method
> instead.
> [ 29.782708] XXX hda: max_delay=578709 ns
> [ 29.804012] XXX hda: max_delay=667540 ns
> [ 29.825934] XXX hda: max_delay=672530 ns
> [ 29.847229] XXX hda: max_delay=727943 ns
> [ 29.890260] XXX hda: max_delay=987514 ns
> [ 30.231269] XXX hda: avg_delay=677 us
> [ 30.254334] eth0: no IPv6 routers present
> [ 35.129783] XXX hda: max_delay=725808 ns
> [ 35.205719] XXX hda: max_delay=2394065 ns
> [ 36.181885] XXX hda: avg_delay=787 us
> [ 36.191131] XXX hda: max_delay=719683 ns
> [ 36.215970] XXX hda: max_delay=725257 ns
> [ 36.487256] XXX hda: max_delay=725928 ns
> [ 36.734025] XXX hda: max_delay=726886 ns
> [ 36.758713] XXX hda: max_delay=728874 ns
> [ 37.257252] XXX hda: max_delay=5298044 ns
> [ 43.330254] XXX hda: max_delay=6694653 ns
> [ 44.392286] XXX hda: max_delay=7673067 ns
> [ 44.892458] XXX hda: avg_delay=849 us
> [ 70.744252] XXX hda: max_delay=721293 ns
> [ 70.769134] XXX hda: max_delay=724172 ns
> [ 71.262857] XXX hda: max_delay=726017 ns
> [ 71.459987] XXX hda: max_delay=726454 ns
> [ 71.657588] XXX hda: max_delay=729122 ns
> [ 72.002882] XXX hda: max_delay=732543 ns
> [ 78.322253] XXX hda: max_delay=948022 ns
> [ 81.901253] XXX hda: max_delay=1121037 ns
> [ 81.951748] XXX hda: max_delay=2948531 ns
> [ 85.387252] XXX hda: max_delay=7308180 ns
> [ 101.678225] XXX hda: max_delay=8139181 ns
> [ 190.655620] XXX hda: max_delay=8721454 ns
> [ 315.991170] XXX hda: max_delay=10913594 ns
> [ 438.610252] XXX hda: max_delay=11273250 ns
>
> SUMMARY: the IRQ timing delay change patch (#2) produces improvement in all
> test cases, reducing the avg_delay and events/0 CPU%. The avg_delay is still
> outside expected values, but perhaps this is the best possible result given
> the bugginess of the hardware. Audio quality and system performance are very
> good in all tests.

As mentioned in previous mails, you can ignore higher numbers in
general. This is a result of cond_resched() in the loop.
Better to concentrate on avg_delay (although it also counts these high
numbers and cannot be so accurate).

(BTW, please add me to Cc at the next time (or make lkml Cc) as other
people do. Otherwise it's hard to find your reply and follow up.)


thanks,

Takashi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


benjkidwell at yahoo

Jun 10, 2008, 5:20 AM

Post #11 of 12 (769 views)
Permalink
Re: Linux-next Alsa hda_intel events/0 high CPU usage, bisected [In reply to]

> --- Takashi Iwai <> wrote:
>Try to run
>
> # echo 32 > /sys/modules/snd_hda_intel/parameters/bdl_pos_adj
>and test again.

Thanks, I think that value must be just about perfect! Tested as follows:

(kill gdm)
#modprobe -r snd_hda_intel
#modprobe snd_hda_intel
#echo 32 > /sys/module/snd_hda_intel/bdl_pos_adj
(restart gdm, start playing sound)

That caused events/0 CPU% to return to almost nothing and the XXX hda delay
output did not trigger. I assume this means the latency has disappeared. For
comparison, I repeated the above processand returned bdl_pos_adj to 1, and the
delay printks reappeared and events/0 started showing significant cpu usage
again. Another reset of the driver and I reset bdl_pos_adj to 32 -- no delay
printks, almost no events/0 activity apparent.

So setting bdl_pos_adj to 32 with all new patches applied makes everything
perfect so far as I can tell. Thanks for the fix-ups for my buggy hardware,
and for reminding me to use reply and Cc: correctly. Let me know if there's
anything else I should test.
Ben



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tiwai at suse

Jun 10, 2008, 5:36 AM

Post #12 of 12 (770 views)
Permalink
Re: Linux-next Alsa hda_intel events/0 high CPU usage, bisected [In reply to]

At Tue, 10 Jun 2008 05:20:49 -0700 (PDT),
Benjamin Kidwell wrote:
>
> > --- Takashi Iwai <> wrote:
> >Try to run
> >
> > # echo 32 > /sys/modules/snd_hda_intel/parameters/bdl_pos_adj
> >and test again.
>
> Thanks, I think that value must be just about perfect! Tested as follows:
>
> (kill gdm)
> #modprobe -r snd_hda_intel
> #modprobe snd_hda_intel
> #echo 32 > /sys/module/snd_hda_intel/bdl_pos_adj
> (restart gdm, start playing sound)
>
> That caused events/0 CPU% to return to almost nothing and the XXX hda delay
> output did not trigger. I assume this means the latency has disappeared. For
> comparison, I repeated the above processand returned bdl_pos_adj to 1, and the
> delay printks reappeared and events/0 started showing significant cpu usage
> again. Another reset of the driver and I reset bdl_pos_adj to 32 -- no delay
> printks, almost no events/0 activity apparent.
>
> So setting bdl_pos_adj to 32 with all new patches applied makes everything
> perfect so far as I can tell. Thanks for the fix-ups for my buggy hardware,
> and for reminding me to use reply and Cc: correctly. Let me know if there's
> anything else I should test.

Thanks for quick testing!

The above value 32 is likely the size of FIFO. I guess ATI chip
reports the position with FIFO offset while ICH reports smaller delay.
I'll create one more patch to assign this value for ATI chipset.

BTW, my latest test patches can be found on the following git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-unstable-2.6.git

These are not for 2.6.27, but some of them should go to the linux-next
tree now.


Takashi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact lists@gossamer-threads.com
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.