
valentin at siteground
May 8, 2012, 1:26 AM
Post #1 of 4
PROBLEM: 3.2.14 - Sporadic HARD lockups/__slab_free/intel_pmu_handle_irq->perf_event_overflow
Kindly note that the kernel is patched with grsec, but grsec is not enabled in the kernel configuration.

[1.] One line summary of the problem:
Sporadic hard lockups cause the server to freeze.

[2.] Full description of the problem/report:
The server freezes with no output logged to /var/log/messages. The bug messages are, however, properly exported and captured by the netconsole module. The issue can't be reproduced. It occurs sporadically, and the hard lockups appear, at least to me, to be caused by different parts of the kernel. I am not sure which part triggers them.

[3.] Keywords (i.e., modules, networking, kernel):
intel_pmu_handle_irq perf_event_overflow hard lockup __slab_free aacraid

[4.] Kernel information
[4.1.] Kernel version (from /proc/version):
Linux version 3.2.14-grsec-clean-sg1 (root [at] testbe) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-52)) #2 SMP Wed Apr 11 03:57:50 CDT 2012

[4.2.] Kernel .config file:
Attached

[5.] Most recent kernel version which did not have the bug:
2.6.28 :-)

[6.] Output of Oops..
message (if applicable) with symbolic information resolved (see Documentation/oops-tracing.txt)

Case 1:
------------[ cut here ]------------
WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x9a/0xa4()
Hardware name: X8DTL
Watchdog detected hard LOCKUP on cpu 1
Modules linked in: netconsole configfs ipv6 nf_nat_ftp nf_conntrack_ftp xt_length xt_pkttype xt_dscp xt_multiport xt_owner ipt_REDIRECT iptable_nat nf_nat iptable_mangle iptable_raw autofs4 lockd sunrpc nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ip6_tables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_multipath acpi_pad acpi_ipmi ipmi_msghandler e1000e iTCO_wdt iTCO_vendor_support ioatdma dca i2c_i801 i7core_edac dm_region_hash dm_log usb_storage ata_piix aacraid uhci_hcd ohci_hcd ehci_hcd raid1 md_mod [last unloaded: microcode]
Pid: 0, comm: swapper/1 Not tainted 3.2.14-grsec-clean-sg1 #2
Call Trace:
[<c102f100>] ? vprintk+0x248/0x32b
[<c1061ea7>] ? watchdog_overflow_callback+0x9a/0xa4
[<c102e3ce>] warn_slowpath_common+0x75/0x8a
[<c1061ea7>] ? watchdog_overflow_callback+0x9a/0xa4
[<c1061e0d>] ? __touch_watchdog+0x16/0x16
[<c102e45f>] warn_slowpath_fmt+0x2e/0x30
[<c1061ea7>] watchdog_overflow_callback+0x9a/0xa4
[<c106edd8>] __perf_event_overflow+0x131/0x1a4
[<c100c0b5>] ? x86_perf_event_set_period+0x1e7/0x1f2
[<c106ef6b>] perf_event_overflow+0x15/0x17
[<c100fa78>] intel_pmu_handle_irq+0x1d1/0x221
[<c1076ade>] ? __free_pages+0x1e/0x29
[<c109cf0f>] ? __free_slab+0xd8/0xe0
[<c109e1ed>] ? __slab_free+0x162/0x253
[<c100bb9f>] perf_event_nmi_handler+0x16/0x1c
[<c1004e3a>] nmi_handle+0x2e/0x49
[<c1005083>] do_nmi+0x72/0x2af
[<c1269650>] ? free_iova_mem+0xf/0x11
[<c13025e9>] nmi_stack_correct+0x28/0x2d
[<c13017ed>] ? _raw_spin_lock_irqsave+0x1b/0x24
[<c126adac>] add_unmap+0x14/0x91
[<c126cc27>] intel_unmap_sg+0xf6/0xfe
[<c11fcfa7>] ? scsi_done+0xb/0xd
[<c122ab02>] ? ata_scsi_qc_complete+0x2e7/0x2ef
[<c126cb31>] ? intel_map_sg+0x1f0/0x1f0
[<c122559a>] ata_sg_clean+0x6e/0x81
[<c12255f6>] __ata_qc_complete+0x49/0xb4
[<c12264ac>] ata_qc_complete+0x11e/0x131
[<c12339a7>] ata_hsm_qc_complete+0xb5/0xbb
[<c1234013>] ata_sff_hsm_move+0x666/0x6bd
[<c1004e3a>] ? nmi_handle+0x2e/0x49
[<c1005299>] ? do_nmi+0x288/0x2af
[<c130007b>] ? __schedule+0x33e/0x836
[<c1234107>] __ata_sff_port_intr+0x9d/0xa9
[<c1234776>] ata_bmdma_port_intr+0x6d/0xce
[<c1232957>] ata_bmdma_interrupt+0x71/0x14f
[<c1062870>] handle_irq_event_percpu+0x25/0x110
[<c106297f>] handle_irq_event+0x24/0x3b
[<c1064908>] ? handle_simple_irq+0x4f/0x4f
[<c106496e>] handle_fasteoi_irq+0x66/0x8c
[<c1003c43>] handle_irq+0x6a/0x8c

Case 2:
------------[ cut here ]------------
WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x9a/0xa4()
Hardware name: X8DTL
Watchdog detected hard LOCKUP on cpu 5
Modules linked in: netconsole configfs nf_nat_ftp nf_conntrack_ftp xt_length xt_state xt_pkttype xt_dscp xt_multiport xt_owner ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_raw ip6t_REJECT ip6table_filter ip6_tables ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_multipath acpi_pad acpi_ipmi ipmi_msghandler e1000e ioatdma iTCO_wdt iTCO_vendor_support dca i7core_edac i2c_i801 dm_region_hash dm_log usb_storage ata_piix aacraid uhci_hcd ohci_hcd ehci_hcd raid1 md_mod
Pid: 0, comm: swapper/5 Not tainted 3.2.14-grsec-clean-sg1 #2
Call Trace:
[<c102f100>] ? vprintk+0x248/0x32b
[<c1061ea7>] ? watchdog_overflow_callback+0x9a/0xa4
[<c102e3ce>] warn_slowpath_common+0x75/0x8a
[<c1061ea7>] ? watchdog_overflow_callback+0x9a/0xa4
[<c1061e0d>] ? __touch_watchdog+0x16/0x16
[<c102e45f>] warn_slowpath_fmt+0x2e/0x30
[<c1061ea7>] watchdog_overflow_callback+0x9a/0xa4
[<c106edd8>] __perf_event_overflow+0x131/0x1a4
[<c100c0b5>] ? x86_perf_event_set_period+0x1e7/0x1f2
[<c106ef6b>] perf_event_overflow+0x15/0x17
[<c100fa78>] intel_pmu_handle_irq+0x1d1/0x221
[<c100fabb>] ? intel_pmu_handle_irq+0x214/0x221
[<c109e1ed>] ? __slab_free+0x162/0x253
[<c100bb9f>] perf_event_nmi_handler+0x16/0x1c
[<c1004e3a>] nmi_handle+0x2e/0x49
[<c1005083>] do_nmi+0x72/0x2af
[<c1269650>] ? free_iova_mem+0xf/0x11
[<c13025e9>] nmi_stack_correct+0x28/0x2d
[<c13017ef>] ? _raw_spin_lock_irqsave+0x1d/0x24
[<c126adac>] add_unmap+0x14/0x91
[<c126cc27>] intel_unmap_sg+0xf6/0xfe
[<c10558e3>] ? __smp_call_function_single+0x7d/0x83
[<c126cb31>] ? intel_map_sg+0x1f0/0x1f0
[<c12038cc>] scsi_dma_unmap+0x48/0x4f
[<f82d7a15>] io_callback+0x62/0x145 [aacraid]
[<f82dcc6b>] aac_intr_normal+0x1d3/0x25f [aacraid]
[<f82ddd3a>] aac_rx_intr_message+0x60/0x98 [aacraid]
[<c1062870>] handle_irq_event_percpu+0x25/0x110
[<c130007b>] ? __schedule+0x33e/0x836
[<c106297f>] handle_irq_event+0x24/0x3b
[<c1064908>] ? handle_simple_irq+0x4f/0x4f
[<c106496e>] handle_fasteoi_irq+0x66/0x8c
[<c1003c43>] handle_irq+0x6a/0x8c
<IRQ>
[<c1032d9c>] ? _local_bh_enable+0xd/0xf
[<c1003541>] do_IRQ+0x36/0x9c
[<c1302d29>] common_interrupt+0x29/0x30
[<c104007b>] ? flush_workqueue_prep_cwqs+0x14b/0x14b
[<c11cbbfa>] ? acpi_idle_enter_bm+0x237/0x26e
[<c1260f9b>] cpuidle_idle_call+0x57/0x9f
[<c100183c>] cpu_idle+0x47/0x6a
[<c1521271>] start_secondary+0x1a5/0x1ab
---[ end trace 3aa9068f1323980e ]---

[7.] A small shell script or example program which triggers the problem (if possible)
Unfortunately unavailable

[8.] Environment
[8.1.] Software (add the output of the ver_linux script here)
-----------------
Linux servername.com 3.2.14-grsec-clean-sg1 #2 SMP Wed Apr 11 03:57:50 CDT 2012 i686 i686 i386 GNU/Linux Gnu C 4.1.2 Gnu make 3.81 binutils 2.17.50.0.6 util-linux 2.13-pre7 mount 2.13-pre7 module-init-tools 3.3-pre2 e2fsprogs 1.39 pcmciautils 014 quota-tools 3.13.
PPP 2.4.4 isdn4k-utils 3.9 Linux C Library 2.5 Dynamic linker (ldd) 2.5 Procps 3.2.7 Net-tools 1.60 Kbd 1.12 Sh-utils 5.97 udev 095 Modules Loaded hcpdriver nf_nat_ftp nf_conntrack_ftp xt_length xt_state xt_pkttype xt_dscp xt_multiport xt_owner ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_raw ip6t_REJECT ip6table_filter ip6_tables ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_multipath acpi_pad acpi_ipmi ipmi_msghandler igb iTCO_wdt iTCO_vendor_support ioatdma i2c_i801 dca i7core_edac ata_piix aacraid uhci_hcd ohci_hcd ehci_hcd raid1 md_mod ----------------- [8.2.] Processor information (from /proc/cpuinfo): processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz stepping : 2 cpu MHz : 2400.324 cache size : 12288 KB physical id : 0 siblings : 8 core id : 0 cpu cores : 4 apicid : 0 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm [8] bogomips : 4800.64 [8.3.] 
Module information (from /proc/modules): netconsole 5653 0 - Live 0xf8305000 0xf82dc000 configfs 18130 2 netconsole, Live 0xf84c4000 0xf8220000 nf_nat_ftp 1314 0 - Live 0xf8757000 0xf8755000 nf_conntrack_ftp 4724 1 nf_nat_ftp, Live 0xf874e000 0xf874c000 xt_length 914 1 - Live 0xf8740000 0xf873e000 xt_state 948 51 - Live 0xf8738000 0xf8736000 xt_pkttype 798 4 - Live 0xf8730000 0xf872e000 xt_dscp 1203 1 - Live 0xf86b8000 0xf86b6000 xt_multiport 1331 2 - Live 0xf86b0000 0xf86ae000 xt_owner 864 4 - Live 0xf86a8000 0xf86a6000 ipt_REDIRECT 883 1 - Live 0xf869a000 0xf8698000 iptable_nat 2983 1 - Live 0xf8690000 0xf868e000 nf_nat 13254 3 nf_nat_ftp,ipt_REDIRECT,iptable_nat, Live 0xf8683000 0xf8681000 nf_conntrack_ipv4 9626 54 iptable_nat,nf_nat, Live 0xf8673000 0xf8671000 nf_conntrack 56362 6 nf_nat_ftp,nf_conntrack_ftp,xt_state,iptable_nat,nf_nat,nf_conntrack_ipv4, Live 0xf8654000 0xf8652000 nf_defrag_ipv4 1123 1 nf_conntrack_ipv4, Live 0xf8638000 0xf8636000 iptable_mangle 1269 0 - Live 0xf862a000 0xf8628000 iptable_raw 1060 0 - Live 0xf861b000 0xf8619000 ip6t_REJECT 2091 1 - Live 0xf85ee000 0xf85ec000 ip6table_filter 1245 1 - Live 0xf85b3000 0xf85b1000 ip6_tables 12454 1 ip6table_filter, Live 0xf85a7000 0xf85a5000 ipv6 242902 45 ip6t_REJECT, Live 0xf8554000 0xf8550000 iscsi_tcp 7435 0 - Live 0xf84f3000 0xf84f1000 libiscsi_tcp 12208 1 iscsi_tcp, Live 0xf84e4000 0xf84e2000 libiscsi 33681 2 iscsi_tcp,libiscsi_tcp, Live 0xf84cc000 0xf84ca000 scsi_transport_iscsi 33760 4 iscsi_tcp,libiscsi, Live 0xf84ad000 0xf84ab000 dm_mirror 10932 0 - Live 0xf83f7000 0xf83f5000 dm_region_hash 7873 1 dm_mirror, Live 0xf83e8000 0xf83e6000 dm_log 7385 2 dm_mirror,dm_region_hash, Live 0xf83b9000 0xf83b7000 dm_multipath 12486 0 - Live 0xf839c000 0xf8350000 acpi_pad 4593 0 - Live 0xf8334000 0xf8332000 acpi_ipmi 2615 0 - Live 0xf8314000 0xf82cd000 ipmi_msghandler 28300 1 acpi_ipmi, Live 0xf831b000 0xf82c5000 e1000e 114447 0 - Live 0xf837f000 0xf829d000 ioatdma 31331 24 - Live 0xf8223000 0xf8210000 
dca 5184 1 ioatdma, Live 0xf8356000 0xf8354000 i7core_edac 14595 0 - Live 0xf8348000 0xf8346000 i2c_i801 6703 0 - Live 0xf8339000 0xf8337000 iTCO_wdt 10354 0 - Live 0xf8328000 0xf8326000 iTCO_vendor_support 2309 1 iTCO_wdt, Live 0xf8319000 0xf8317000 raid1 21370 0 - Live 0xf830a000 0xf8308000 md_mod 90902 1 raid1, Live 0xf82e0000 0xf82de000 ata_piix 12086 1 - Live 0xf82b3000 0xf82b1000 aacraid 64508 3 - Live 0xf827e000 0xf827c000 uhci_hcd 15568 0 - Live 0xf825a000 0xf8258000 ohci_hcd 14656 0 - Live 0xf8248000 0xf8246000 ehci_hcd 28982 0 - Live 0xf822e000 0xf822c000 [8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem) 0000-03af : PCI Bus 0000:00 0000-001f : dma1 0020-0021 : pic1 0040-0043 : timer0 0050-0053 : timer1 0060-0060 : keyboard 0064-0064 : keyboard 0070-0071 : rtc0 0080-008f : dma page reg 00a0-00a1 : pic2 00c0-00df : dma2 00f0-00ff : fpu 02f8-02ff : serial 03b0-03bb : PCI Bus 0000:00 03c0-03df : PCI Bus 0000:00 03c0-03df : vga+ 03e0-0cf7 : PCI Bus 0000:00 03f8-03ff : serial 0400-041f : 0000:00:1f.3 0400-041f : i801_smbus 04d0-04d1 : pnp 00:09 0500-057f : pnp 00:09 0800-087f : pnp 00:09 0800-0803 : ACPI PM1a_EVT_BLK 0804-0805 : ACPI PM1a_CNT_BLK 0808-080b : ACPI PM_TMR 0810-0815 : ACPI CPU throttle 0820-082f : ACPI GPE0_BLK 0830-0833 : iTCO_wdt 0850-0850 : ACPI PM2_CNT_BLK 0860-087f : iTCO_wdt 0a10-0a1f : pnp 00:06 0ca2-0ca3 : pnp 00:09 0cf8-0cff : PCI conf1 0d00-efff : PCI Bus 0000:00 1000-1fff : PCI Bus 0000:05 a400-a40f : 0000:00:1f.5 a400-a40f : ata_piix a480-a48f : 0000:00:1f.5 a480-a48f : ata_piix a800-a803 : 0000:00:1f.5 a800-a803 : ata_piix a880-a887 : 0000:00:1f.5 a880-a887 : ata_piix ac00-ac03 : 0000:00:1f.5 ac00-ac03 : ata_piix b000-b007 : 0000:00:1f.5 b000-b007 : ata_piix b400-b40f : 0000:00:1f.2 b400-b40f : ata_piix b480-b48f : 0000:00:1f.2 b480-b48f : ata_piix b800-b803 : 0000:00:1f.2 b800-b803 : ata_piix b880-b887 : 0000:00:1f.2 b880-b887 : ata_piix bc00-bc03 : 0000:00:1f.2 bc00-bc03 : ata_piix c000-c007 : 0000:00:1f.2 
c000-c007 : ata_piix c080-c09f : 0000:00:1d.2 c080-c09f : uhci_hcd c400-c41f : 0000:00:1d.1 c400-c41f : uhci_hcd c480-c49f : 0000:00:1d.0 c480-c49f : uhci_hcd c800-c81f : 0000:00:1a.2 c800-c81f : uhci_hcd c880-c89f : 0000:00:1a.1 c880-c89f : uhci_hcd cc00-cc1f : 0000:00:1a.0 cc00-cc1f : uhci_hcd d000-dfff : PCI Bus 0000:06 dc00-dc1f : 0000:06:00.0 e000-efff : PCI Bus 0000:07 ec00-ec1f : 0000:07:00.0 f000-ffff : PCI Bus 0000:00 [8.5.] PCI information ('lspci -vvv' as root) 0000-03af : PCI Bus 0000:00 0000-001f : dma1 0020-0021 : pic1 0040-0043 : timer0 0050-0053 : timer1 0060-0060 : keyboard 0064-0064 : keyboard 0070-0071 : rtc0 0080-008f : dma page reg 00a0-00a1 : pic2 00c0-00df : dma2 00f0-00ff : fpu 02f8-02ff : serial 03b0-03bb : PCI Bus 0000:00 03c0-03df : PCI Bus 0000:00 03c0-03df : vga+ 03e0-0cf7 : PCI Bus 0000:00 03f8-03ff : serial 0400-041f : 0000:00:1f.3 0400-041f : i801_smbus 04d0-04d1 : pnp 00:09 0500-057f : pnp 00:09 0800-087f : pnp 00:09 0800-0803 : ACPI PM1a_EVT_BLK 0804-0805 : ACPI PM1a_CNT_BLK 0808-080b : ACPI PM_TMR 0810-0815 : ACPI CPU throttle 0820-082f : ACPI GPE0_BLK 0830-0833 : iTCO_wdt 0850-0850 : ACPI PM2_CNT_BLK 0860-087f : iTCO_wdt 0a10-0a1f : pnp 00:06 0ca2-0ca3 : pnp 00:09 0cf8-0cff : PCI conf1 0d00-efff : PCI Bus 0000:00 1000-1fff : PCI Bus 0000:05 a400-a40f : 0000:00:1f.5 a400-a40f : ata_piix a480-a48f : 0000:00:1f.5 a480-a48f : ata_piix a800-a803 : 0000:00:1f.5 a800-a803 : ata_piix a880-a887 : 0000:00:1f.5 a880-a887 : ata_piix ac00-ac03 : 0000:00:1f.5 ac00-ac03 : ata_piix b000-b007 : 0000:00:1f.5 b000-b007 : ata_piix b400-b40f : 0000:00:1f.2 b400-b40f : ata_piix b480-b48f : 0000:00:1f.2 b480-b48f : ata_piix b800-b803 : 0000:00:1f.2 b800-b803 : ata_piix b880-b887 : 0000:00:1f.2 b880-b887 : ata_piix bc00-bc03 : 0000:00:1f.2 bc00-bc03 : ata_piix c000-c007 : 0000:00:1f.2 c000-c007 : ata_piix c080-c09f : 0000:00:1d.2 c080-c09f : uhci_hcd c400-c41f : 0000:00:1d.1 c400-c41f : uhci_hcd c480-c49f : 0000:00:1d.0 c480-c49f : uhci_hcd c800-c81f : 
0000:00:1a.2 c800-c81f : uhci_hcd c880-c89f : 0000:00:1a.1 c880-c89f : uhci_hcd cc00-cc1f : 0000:00:1a.0 cc00-cc1f : uhci_hcd d000-dfff : PCI Bus 0000:06 dc00-dc1f : 0000:06:00.0 e000-efff : PCI Bus 0000:07 ec00-ec1f : 0000:07:00.0 f000-ffff : PCI Bus 0000:00 root [at] m03ne:~# cat /proc/iomem 00000000-0000ffff : reserved 00010000-00095bff : System RAM 00095c00-0009ffff : reserved 000a0000-000bffff : PCI Bus 0000:00 000a0000-000bffff : Video RAM area 000c0000-000c7fff : Video ROM 000c8000-000c8fff : Adapter ROM 000c9000-000cf7ff : Adapter ROM 000d0000-000dffff : PCI Bus 0000:00 000e4000-000fffff : reserved 000f0000-000fffff : System ROM 00100000-bf77ffff : System RAM 01000000-01306fff : Kernel code 01481000-014eecbf : Kernel data 0154b000-01598fff : Kernel bss bf780000-bf78dfff : RAM buffer bf78e000-bf78ffff : reserved bf790000-bf79dfff : ACPI Tables bf79e000-bf7cffff : ACPI Non-volatile Storage bf7d0000-bf7dffff : reserved bf7e0000-bf7ebfff : RAM buffer bf7ec000-bfffffff : reserved c0000000-dfffffff : PCI Bus 0000:00 c0000000-c01fffff : PCI Bus 0000:07 c0200000-c03fffff : PCI Bus 0000:06 c0400000-c05fffff : PCI Bus 0000:05 c0600000-c07fffff : PCI Bus 0000:05 e0000000-efffffff : PCI MMCONFIG 0000 [bus 00-ff] e0000000-efffffff : reserved e0000000-efffffff : pnp 00:0c f0000000-fed8ffff : PCI Bus 0000:00 f9000000-f9ffffff : PCI Bus 0000:08 f9000000-f9ffffff : 0000:08:01.0 fab00000-fadfffff : PCI Bus 0000:01 fab80000-fabfffff : 0000:01:00.0 fac00000-fadfffff : 0000:01:00.0 faf00000-fb7fffff : PCI Bus 0000:08 faffc000-faffffff : 0000:08:01.0 fb000000-fb7fffff : 0000:08:01.0 fbc00000-fbcfffff : PCI Bus 0000:06 fbcdc000-fbcdffff : 0000:06:00.0 fbcdc000-fbcdffff : e1000e fbce0000-fbcfffff : 0000:06:00.0 fbce0000-fbcfffff : e1000e fbd00000-fbdfffff : PCI Bus 0000:07 fbddc000-fbddffff : 0000:07:00.0 fbddc000-fbddffff : e1000e fbde0000-fbdfffff : 0000:07:00.0 fbde0000-fbdfffff : e1000e fbed2000-fbed20ff : 0000:00:1f.3 fbed4000-fbed7fff : 0000:00:1b.0 fbed8000-fbed83ff : 
0000:00:1d.7 fbed8000-fbed83ff : ehci_hcd fbeda000-fbeda3ff : 0000:00:1a.7 fbeda000-fbeda3ff : ehci_hcd fbedc000-fbedffff : 0000:00:16.7 fbedc000-fbedffff : ioatdma fbee0000-fbee3fff : 0000:00:16.6 fbee0000-fbee3fff : ioatdma fbee4000-fbee7fff : 0000:00:16.5 fbee4000-fbee7fff : ioatdma fbee8000-fbeebfff : 0000:00:16.4 fbee8000-fbeebfff : ioatdma fbeec000-fbeeffff : 0000:00:16.3 fbeec000-fbeeffff : ioatdma fbef0000-fbef3fff : 0000:00:16.2 fbef0000-fbef3fff : ioatdma fbef4000-fbef7fff : 0000:00:16.1 fbef4000-fbef7fff : ioatdma fbef8000-fbefbfff : 0000:00:16.0 fbef8000-fbefbfff : ioatdma fec00000-fec003ff : IOAPIC 0 fec8a000-fec8afff : 0000:00:13.0 fec8a000-fec8a3ff : IOAPIC 1 fed00000-fed003ff : HPET 0 fed1c000-fed1ffff : pnp 00:01 fed1c000-fed1ffff : pnp 00:09 fed20000-fed3ffff : pnp 00:09 fed40000-fed8ffff : pnp 00:09 fee00000-fee00fff : Local APIC fee00000-fee00fff : reserved fee00000-fee00fff : pnp 00:0b ffc00000-ffffffff : reserved 100000000-43fffffff : System RAM [8.6.] SCSI information (from /proc/scsi/scsi) Attached devices: Host: scsi0 Channel: 00 Id: 00 Lun: 00 Vendor: Adaptec Model: 5405Z RAID10 Rev: V1.0 Type: Direct-Access ANSI SCSI revision: 02 Host: scsi0 Channel: 01 Id: 00 Lun: 00 Vendor: WDC Model: WD1003FBYX-0 Rev: 01.0 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi0 Channel: 01 Id: 01 Lun: 00 Vendor: WDC Model: WD1003FBYX-0 Rev: 01.0 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi0 Channel: 01 Id: 02 Lun: 00 Vendor: WDC Model: WD1003FBYX-0 Rev: 01.0 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi0 Channel: 01 Id: 03 Lun: 00 Vendor: WDC Model: WD1002FBYS-0 Rev: 03.0 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi1 Channel: 00 Id: 00 Lun: 00 Vendor: ATA Model: INTEL SSDSA2CW12 Rev: 4PC1 Type: Direct-Access ANSI SCSI revision: 05 [8.7.] Other information that might be relevant to the problem (please look in /proc and include all information that you think to be relevant): None [X.] 
Other notes, patches, fixes, workarounds:
Freezes like the ones described above have been ongoing for us since 2.6.28. Every kernel version we have tried to use _after_ 2.6.28 causes sporadic machine restarts and hard lockups, and in 99% of the cases _NO_ output is logged to /var/log/messages. This applies to 2.6.32, 2.6.38.6, 3.2.5 and 3.2.9. Sadly, we are still struggling to find a stable kernel for our servers after 2.6.28.

Not sure if this is going to help, but before we tried 3.2.14 we used 3.2.5 and experienced similar lockups on a similar hardware configuration.

------------[ cut here ]------------
WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x97/0xa3()
Watchdog detected hard LOCKUP on cpu 1
Modules linked in: netconsole cryptoloop configfs nf_nat_ftp nf_conntrack_ftp xt_length ipt_REJECT xt_state xt_pkttype xt_dscp xt_multiport xt_owner iptable_filter ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_raw ip_tables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi xt_set ip_set nfnetlink dm_mirror dm_region_hash dm_log dm_multipath dm_mod thermal pci_slot hed acpi_pad sg usbhid evdev button processor e1000e ioatdma iTCO_wdt iTCO_vendor_support i7core_edac dca edac_core ata_piix libata 3w_sas sd_mod scsi_mod uhci_hcd ohci_hcd ehci_hcd [last unloaded: netconsole]
Pid: 0, comm: swapper/1 Not tainted 3.2.5-grsec-sg6 #1
Call Trace:
[<0005c011>] ? watchdog_overflow_callback+0x97/0xa3
[<0002ac89>] ? warn_slowpath_common+0x5d/0x72
[<0005c011>] ? watchdog_overflow_callback+0x97/0xa3
[<0005bf7a>] ? __touch_watchdog+0x12/0x12
[<0002ad26>] ? warn_slowpath_fmt+0x33/0x37
[<0005c011>] ? watchdog_overflow_callback+0x97/0xa3
[<00068454>] ? __perf_event_overflow+0x141/0x1bb
[<0000ffff>] ? intel_pmu_enable_event+0x142/0x1d0
[<000686c4>] ? perf_event_overflow+0x12/0x14
[<0001079c>] ? intel_pmu_handle_irq+0x1c3/0x213
[<0000cfaa>] ? perf_event_nmi_handler+0x13/0x18
[<00005f83>] ? nmi_handle+0x2c/0x48
[<000061db>] ? do_nmi+0x9d/0x2de
[<0021b15c>] ? dma_pfn_level_pte+0x7b/0x93
[<0021c4ca>] ? dma_pte_free_pagetable+0x74/0x1d6
[<0029a625>] ? nmi_stack_correct+0x34/0x3e
[<00004642>] ? arch_show_interrupts+0xab/0x5a7
[<00210068>] ? write_page+0x1e8/0x3a1
[<00299533>] ? _raw_spin_lock_irqsave+0x18/0x20
[<0021cad1>] ? add_unmap+0x11/0x8c
[<0021e531>] ? intel_map_sg+0x1de/0x1de
[<002bead5>] ? scsi_dma_unmap+0x45/0x4c [scsi_mod]
[<00750001>] ? 0x750000
[<002efef7>] ? twl_interrupt+0x524/0x54d [3w_sas]
[<000063f7>] ? do_nmi+0x2b9/0x2de
[<00092724>] ? compaction_alloc+0xa5/0x219
[<000713ea>] ? show_free_areas+0x5db/0x6e4
[<0005c9c3>] ? handle_irq_event_percpu+0x24/0x110
[<00006161>] ? do_nmi+0x23/0x2de
[<0005e940>] ? handle_edge_irq+0xa1/0xa1
[<0005cad0>] ? handle_irq_event+0x21/0x37
[<0005e940>] ? handle_edge_irq+0xa1/0xa1
[<0005e9a3>] ? handle_fasteoi_irq+0x63/0x7b
[<00004c97>] ? handle_irq+0x6a/0x98
<IRQ>
[<00004bbf>] ? do_IRQ+0x31/0x8a
[<000143e4>] ? smp_apic_timer_interrupt+0x61/0x6d
[<0029ae35>] ? common_interrupt+0x35/0x40
[<00013791>] ? native_machine_shutdown+0x3b/0x72
[<0000488b>] ? arch_show_interrupts+0x2f4/0x5a7
[<0000488b>] ? arch_show_interrupts+0x2f4/0x5a7
[<000100d8>] ? intel_pmu_enable_all+0x4b/0xd1
[<002123bf>] ? need_resched+0xc/0x10
[<0021252c>] ? poll_idle+0x1e/0x6b
[<002124c6>] ? cpuidle_idle_call+0x53/0x9b
[<00002a8c>] ? cpu_idle+0x44/0x6e
---[ end trace 7c5429b10ec8a603 ]---
------------[ cut here ]------------
WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x97/0xa3()

Not sure if the above information will help you track this issue down, but any assistance will be greatly appreciated. If there is any other information we can provide, let us know. Unfortunately, the machines where these issues occur are in production, so enabling many debugging options and rebooting often is not a good solution :(

Best regards,
vaLentin
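
For comparing the netconsole captures, here is a minimal sketch, assuming Python is available on a workstation. It pulls the function names out of each flattened call trace and reports the ones present in every trace. The helper names `frame_symbols` and `common_symbols` and the regular expression are illustrative assumptions, not an existing tool; the sample strings are short excerpts copied from the three traces in this report.

```python
import re

# One frame looks like "[<c126adac>] add_unmap+0x14/0x91", optionally with a
# "?" before the symbol. Raw addresses without a symbol are ignored.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def frame_symbols(trace_text):
    """Return the set of function names found in one flattened call trace."""
    return {match.group(2) for match in FRAME_RE.finditer(trace_text)}


def common_symbols(traces):
    """Return the function names that occur in every trace of the list."""
    symbol_sets = [frame_symbols(trace) for trace in traces]
    return set.intersection(*symbol_sets) if symbol_sets else set()


# Excerpts around the interrupted code path (3.2.14 case 1, case 2, and 3.2.5).
CASE_1 = (
    "[<c13017ed>] ? _raw_spin_lock_irqsave+0x1b/0x24 "
    "[<c126adac>] add_unmap+0x14/0x91 "
    "[<c126cc27>] intel_unmap_sg+0xf6/0xfe "
    "[<c122559a>] ata_sg_clean+0x6e/0x81"
)
CASE_2 = (
    "[<c13017ef>] ? _raw_spin_lock_irqsave+0x1d/0x24 "
    "[<c126adac>] add_unmap+0x14/0x91 "
    "[<c126cc27>] intel_unmap_sg+0xf6/0xfe "
    "[<c12038cc>] scsi_dma_unmap+0x48/0x4f"
)
CASE_3_2_5 = (
    "[<00299533>] ? _raw_spin_lock_irqsave+0x18/0x20 "
    "[<0021cad1>] ? add_unmap+0x11/0x8c "
    "[<0021e531>] ? intel_map_sg+0x1de/0x1de "
    "[<002bead5>] ? scsi_dma_unmap+0x45/0x4c [scsi_mod]"
)
TRACES = [CASE_1, CASE_2, CASE_3_2_5]

if __name__ == "__main__":
    print(sorted(common_symbols(TRACES)))
```

On these excerpts the intersection is `_raw_spin_lock_irqsave` and `add_unmap`. Frames marked with `?` are addresses found on the stack that the unwinder could not confirm as part of the call chain, so a shared symbol is only a hint about where to look.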