Topic: How to deal with kernel oops and Core Hardlocks... ?

bulek

  • Administrator
  • wants to work for LinuxMCE
  • Posts: 909
  • Living with LMCE
How to deal with kernel oops and Core Hardlocks... ?
« on: January 23, 2008, 01:31:22 pm »
Hi,

I'm getting a lot of kernel oopses, and sometimes my LMCE 704 Core also hard-locks (only a hardware reset helps)...

I've noticed one weird problem and have a few related questions:
1. I see kernel Oops or Eeek messages only if I'm connected via ssh at the time they occur. If I search the syslog, messages or dmesg logs afterwards, I can't find any Oops or Eeek kernel messages in them. Is there some setting that prevents them from being logged? I get 1-2 lockups per day on my Core, but I can't see the kernel messages that describe the cause of the hard lockups... How should I deal with this?

2. How can I tell whether a kernel message points to a pure kernel bug causing the hard lockups, rather than to a hardware failure?

3. What would be the procedure to rule out hardware failure as the cause of the hard lockups? I ran memtest for 1 day and it found no memory errors. I'm not sure how to check whether anything is wrong with the disks; they don't seem to support SMART health checks...

4. I've seen that all my kernel messages have already been reported in the proper place (https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/107325).
Do I just need to wait and try newer kernels, or is there anything else that needs to be done?
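On question 1: the `Message from syslogd@dcerouter ...` lines are syslogd broadcasting emergency-priority messages (the level the kernel uses for oops text) to every logged-in terminal, so they only appear while an ssh session is open. Whether the same lines are also written to a file depends on /etc/syslog.conf; on Ubuntu-based systems kernel messages are commonly routed to /var/log/kern.log, and older entries end up in rotated, gzip-compressed copies. Below is a minimal Python sketch (not an LMCE tool) that searches plain and `.gz` logs for the marker strings seen in this thread; the file globs and the marker list are assumptions to adjust for your system.

```python
import glob
import gzip
import re
from pathlib import Path
from typing import Iterable, List, Tuple

# Marker strings taken from the oops output quoted in this thread;
# extend the pattern if your kernel prints other headers.
OOPS_PATTERN = re.compile(r"Oops:|Eeek!|invalid opcode:|Call Trace:|BUG:")


def open_log(path: Path):
    """Open a plain or gzip-rotated log file (e.g. kern.log.1.gz) as text."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def find_oops_lines(paths: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (file, line number, text) for every line that matches OOPS_PATTERN."""
    hits = []
    for name in paths:
        path = Path(name)
        if not path.is_file():
            continue  # skip candidates that do not exist on this system
        with open_log(path) as fh:
            for lineno, line in enumerate(fh, start=1):
                if OOPS_PATTERN.search(line):
                    hits.append((name, lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumed log locations for an Ubuntu-based Core; adjust as needed.
    candidates = sorted(
        glob.glob("/var/log/kern.log*")
        + glob.glob("/var/log/syslog*")
        + glob.glob("/var/log/messages*")
    )
    for name, lineno, text in find_oops_lines(candidates):
        print(f"{name}:{lineno}: {text}")
```

If nothing turns up even in kern.log, the text may simply never have reached the disk before the Core locked up.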
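Also for question 1: if the Core locks up before syslogd gets the oops onto its own disk, the text can be sent off the box as it is printed. The kernel's netconsole module forwards kernel log messages over UDP to another machine; its parameters are described in the kernel's Documentation/networking/netconsole.txt (format `netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]`), and 6666 is the documented default target port. This assumes the LMCE kernel ships netconsole as a module. On the receiving machine, a minimal Python sketch like the one below can log every packet with a local timestamp; the port and the `netconsole.log` file name are assumptions.

```python
import socket
import sys
from datetime import datetime


def serve(sock: socket.socket, logfile: str, max_packets: int = 0) -> int:
    """Append every UDP datagram received on `sock` to `logfile`.

    Each line gets a local timestamp and the sender's address.
    max_packets=0 means run until interrupted. Returns the packet count.
    """
    count = 0
    with open(logfile, "a", encoding="utf-8") as out:
        while max_packets == 0 or count < max_packets:
            data, (sender, _port) = sock.recvfrom(4096)
            stamp = datetime.now().isoformat(timespec="seconds")
            text = data.decode("utf-8", errors="replace").rstrip("\n")
            out.write(f"{stamp} {sender} {text}\n")
            out.flush()  # push each line to the file immediately
            count += 1
    return count


if __name__ == "__main__":
    port = int(sys.argv[1]) if len(sys.argv) > 1 else 6666
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener:
        listener.bind(("0.0.0.0", port))
        serve(listener, "netconsole.log")
```

UDP gives no delivery guarantee, so the very last lines before a hard lock can still be lost, but the log then lives on a machine that did not crash.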
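On question 3: a disk without SMART can still get a crude surface test by reading every sector once and noting where reads fail. `badblocks` (from e2fsprogs) does this in its default read-only mode; the Python sketch below only illustrates the same idea. It assumes a Unix system and root access to the raw device (any device name you pass, such as /dev/hda, is a placeholder), and it reads through the page cache rather than using O_DIRECT, so recently cached data is not re-read from the platters. It opens the device read-only and never writes to it.

```python
import os
import sys
from typing import List, Tuple


def read_scan(path: str, chunk_size: int = 1 << 20) -> Tuple[int, List[int]]:
    """Read `path` front to back; return (bytes read, offsets of unreadable chunks).

    Opens the file or block device read-only, so nothing is ever written.
    """
    bad_offsets: List[int] = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # device or file size in bytes
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, length, offset)
            except OSError:
                bad_offsets.append(offset)  # record the failed chunk and move on
                offset += length
                continue
            if not data:
                break  # unexpected end of data
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets


if __name__ == "__main__":
    device = sys.argv[1]  # placeholder device path; reading it needs root
    total, bad = read_scan(device)
    print(f"read {total} bytes, {len(bad)} unreadable chunk(s)")
    for off in bad:
        print(f"  read error near byte offset {off}")
```

Read errors here, or I/O errors that the disk driver logs to dmesg while the scan runs, point toward hardware rather than the kernel.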


Thanks in advance,

regards,

Bulek.


Code:
cerouter_42077:~#
Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] Eeek! page_mapcount(page) went negative! (-1)

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] page pfn = 23232

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] page->flags = 40000004

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] page->count = 1

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] page->mapping = 00000000

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] vma->vm_ops = 0x0

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] ------------[ cut here ]------------

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] invalid opcode: 0000 [1]

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] SMP

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] CPU: 0

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] EIP: 0060:[page_remove_rmap+224/240] Tainted: P VLI

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] EFLAGS: 00210246 (2.6.20-15-generic 2)

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] EIP is at page_remove_rmap+0xe0/0xf0

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] eax: 00000000 ebx: c1464640 ecx: 00200046 edx: 00000000

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] esi: de27f128 edi: b7800000 ebp: c1464640 esp: daaf5eb8

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] ds: 007b es: 007b ss: 0068

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] Process perl (pid: 29700, ti=daaf4000 task=eb80b030 task.ti=daaf4000)

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] Stack: c036966b 00000000 00000000 e6fc5000 c0163a8d 00000000 b7b63fff 00000000

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] de27f128 daaf5f44 00000000 00000001 b7b64000 f280fb78 cbe74200 c1806640

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] 00000000 ffffffff c14df8ac e6fc5000 f280fb78 23232323 c14df8ac 00362bf3

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] Call Trace:

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] [unmap_vmas+733/1472] unmap_vmas+0x2dd/0x5c0

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] [exit_mmap+119/240] exit_mmap+0x77/0xf0

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] [mmput+56/160] mmput+0x38/0xa0

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] [do_exit+242/2048] do_exit+0xf2/0x800

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] [do_page_fault+831/1520] do_page_fault+0x33f/0x5f0

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] [do_group_exit+38/128] do_group_exit+0x26/0x80

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] [sysenter_past_esp+105/169] sysenter_past_esp+0x69/0xa9

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] [xfrm_state_find+1251/1392] xfrm_state_find+0x4e3/0x570

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] =======================

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] Code: c0 74 0d 8b 50 08 b8 fc a1 36 c0 e8 3b ca fd ff 8b 46 48 85 c0 74 14 8b 40 10 85 c0 74 0d 8b 50 2c b8 1c a2 36 c0 e8 20 ca fd ff <0f> 0b eb fe 8b 53 0c eb 95 8d b4 26 00 00 00 00 55 57 56 89 d6

Message from syslogd@dcerouter at Mon Jan 14 23:39:48 2008 ...
dcerouter kernel: [14373.108000] EIP: [page_remove_rmap+224/240] page_remove_rmap+0xe0/0xf0 SS:ESP 0068:daaf5eb8

dcerouter_42077:~#





dcerouter kernel: [100495.644000] Oops: 0002 [#1]

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000] SMP

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000] CPU:    0

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000] EIP:    0060:[cache_alloc_refill+298/1360]    Tainted: P      VLI

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000] EFLAGS: 00010046   (2.6.20-15-generic #2)

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000] EIP is at cache_alloc_refill+0x12a/0x550

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000] eax: df90bdc0   ebx: 00000006   ecx: 00000035   edx: a7b4bbc0

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000] esi: 0000001d   edi: dbdfe000   ebp: df909c00   esp: e545be68

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000] ds: 007b   es: 007b   ss: 0068

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000] Process fuser (pid: 13555, ti=e545a000 task=d54a5560 task.ti=e545a000)

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000] Stack: 00000010 000000d0 00000004 000000d0 df90e9c0 df908000 df90bdc0 00000000

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000]        479726af 0e3c60b5 e56b9560 c0189dba e56b9560 dbdfe01c 00000246 000000d0

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000]        df90e9c0 da1657c8 c0172c00 00000000 00000423 00000004 c018811b e545bf08

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000] Call Trace:

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000]  [alloc_inode+202/384] alloc_inode+0xca/0x180

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000]  [kmem_cache_alloc+128/144] kmem_cache_alloc+0x80/0x90

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000]  [d_alloc+27/400] d_alloc+0x1b/0x190

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000]  [proc_fill_cache+257/320] proc_fill_cache+0x101/0x140

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000]  [filldir64+0/224] filldir64+0x0/0xe0

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000]  [proc_readfd+223/480] proc_readfd+0xdf/0x1e0

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000]  [proc_fd_instantiate+0/352] proc_fd_instantiate+0x0/0x160

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000]  [filldir64+0/224] filldir64+0x0/0xe0

Message from syslogd@dcerouter at Wed Jan 23 12:36:15 2008 ...
dcerouter kernel: [100495.644000]  [filldir64+0/224] filldir64+0x0/0xe0

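On question 2: both oops headers above carry a `Tainted:` field, and its flag letters describe the state of the system when it crashed. Per the kernel's Documentation/oops-tracing.txt, `P` means a proprietary (non-GPL) module was loaded, for example a binary graphics driver, which is why kernel developers usually ask for a reproduction without it; `M` means a CPU reported a machine check exception, a strong hint at hardware. The small Python sketch below decodes the letters; the table is transcribed from that document, and which letters a given kernel version can actually print varies. It also assumes, as in these i386 logs, that the trailing `VLI` is a literal added by the i386 oops printer and not a taint flag.

```python
from typing import Dict

# Taint letters as described in the kernel's Documentation/oops-tracing.txt.
# Not every kernel version can print every letter.
TAINT_FLAGS: Dict[str, str] = {
    "P": "proprietary (non-GPL) module loaded",
    "G": "only GPL-compatible modules loaded",
    "F": "a module was force-loaded",
    "S": "SMP kernel on hardware not certified as SMP-safe",
    "R": "a module was force-unloaded",
    "M": "a CPU reported a machine check exception (hardware error)",
    "B": "bad page reference or unexpected page flags found",
    "U": "taint explicitly requested by a user or application",
    "D": "the kernel died recently (OOPS or BUG)",
}


def decode_taint(line: str) -> Dict[str, str]:
    """Return {letter: meaning} for the taint flags on an oops header line."""
    if "Tainted:" not in line:
        return {}
    flags = line.split("Tainted:", 1)[1]
    # i386 kernels of this era print a literal "VLI" after the taint string;
    # it is not a taint flag, so drop it before reading the letters.
    flags = flags.replace("VLI", "")
    return {ch: TAINT_FLAGS.get(ch, "unknown flag") for ch in flags if ch.isupper()}


if __name__ == "__main__":
    sample = "EIP: 0060:[page_remove_rmap+224/240] Tainted: P VLI"
    for letter, meaning in decode_taint(sample).items():
        print(f"{letter}: {meaning}")
```

For the lines in this thread only `P` is set, so the logs carry no machine-check taint; that does not rule hardware out, it just means the CPU did not report an error itself.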
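The oops headers themselves can be read a little further. On i386, an oops with the plain `Oops:` header normally comes from the page-fault handler, and the number after it is the page-fault error code: bit 0 set means a protection violation (clear means the page was not present), bit 1 means a write, and bit 2 means the fault came from user mode. So `Oops: 0002` is a kernel-mode write to a non-present page, here inside the slab allocator (`cache_alloc_refill`). In the first log, the byte marked `<0f>` and the following `0b` in the `Code:` line are the x86 `ud2` instruction, which i386 kernels use to implement `BUG()`; together with `invalid opcode`, that suggests the kernel stopped itself deliberately after its `page_mapcount` sanity check failed. Neither check alone distinguishes a kernel or driver bug from memory corruption caused by hardware. A Python sketch of both checks, written against the line formats in this thread only:

```python
import re
from typing import List, Optional


def decode_fault_code(oops_line: str) -> Optional[List[str]]:
    """Decode the low three bits of the x86 page-fault error code in 'Oops: NNNN'."""
    match = re.search(r"Oops:\s*([0-9a-fA-F]{4})", oops_line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return [
        "protection violation" if code & 0x1 else "page not present",
        "write access" if code & 0x2 else "read access",
        "user mode" if code & 0x4 else "kernel mode",
    ]


def faults_on_ud2(code_line: str) -> bool:
    """True if the bytes at the faulting EIP (marked <..>) are 0f 0b, x86 ud2."""
    match = re.search(r"<([0-9a-f]{2})>\s+([0-9a-f]{2})", code_line)
    return match is not None and match.groups() == ("0f", "0b")


if __name__ == "__main__":
    print(decode_fault_code("dcerouter kernel: [100495.644000] Oops: 0002 [#1]"))
    print(faults_on_ud2("Code: e8 20 ca fd ff <0f> 0b eb fe 8b 53 0c"))
```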
« Last Edit: January 23, 2008, 01:37:30 pm by bulek »

rafik24

  • Guru
  • Posts: 158
Re: How to deal with kernel oops and Core Hardlocks... ?
« Reply #1 on: January 23, 2008, 05:50:58 pm »
 Hi Bulek,

 I'm not sure if this is related to your issue, but most of the hard lockups I had were either hardware-related or graphics-related. On some of my machines I had to disable ACPI and play around with the graphics-related BIOS settings, and also with the memory timings.

 Regards,

Rafik