Tuesday, June 28, 2011

Kernel crash debugging tip: how to find which line of your code caused the crash

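Use the "objdump" utility with the "-S" option on your object file (.o). It dumps the disassembly of the object file interleaved with the source code, so each source line is shown together with the instructions generated for it. The object file must have been compiled with debugging information (gcc -g) for the source lines to appear.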

Example: the following kernel crash log shows that the crash happened at offset 0x58 inside the init_module() function, which is 0x8c bytes long, and that the module name is domu_share:

root@PVHVM-domU:~/tets_programs/page_share_interdomain# dmesg -c
[ 2463.297489] BUG: unable to handle kernel paging request at 000000003bd28000
[ 2463.297495] IP: [] init_module+0x58/0x8c [domu_share]
[ 2463.297503] PGD 3bdb3067 PUD 36f56067 PMD 0
[ 2463.297506] Oops: 0002 [#1] SMP
[ 2463.297508] last sysfs file: /sys/devices/pci0000:00/0000:00:01.2/usb1/1-0:1.0/uevent
[ 2463.297512] CPU 0
=========
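
The "symbol+offset/size [module]" notation on the IP line can be split mechanically. The following is only a minimal sketch: the variable names (func, offset, size, module) are illustrative, and the sed pattern assumes the IP line has exactly the shape shown in the log above.

```shell
# IP line copied from the oops above (the bracketed address is empty in this copy of the log).
line='[ 2463.297495] IP: [] init_module+0x58/0x8c [domu_share]'

# Capture symbol, offset, size and module name. In a basic regular
# expression the '+' is a literal plus sign.
fields=$(printf '%s\n' "$line" |
  sed -n 's|.* \([A-Za-z_][A-Za-z0-9_.]*\)+\(0x[0-9a-f]*\)/\(0x[0-9a-f]*\) \[\(.*\)]$|\1 \2 \3 \4|p')

# Split the four space-separated fields into positional parameters.
set -- $fields
func=$1 offset=$2 size=$3 module=$4

echo "function=$func offset=$offset size=$size module=$module"
```

Here the offset (0x58) is the distance from the start of the function to the faulting instruction, and the size (0x8c) is the total length of the function.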

Now run objdump on domu_share.o and redirect the output to a file (the dump can be very large):
# objdump -S domu_share.o > my_objdump

Now look for the init_module function in "my_objdump". Take the address of its first instruction as the base, add 0x58 to it, and find the instruction that covers the resulting address. The source line shown above that instruction is exactly where the crash happened.
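
The lookup can be sketched in plain shell. This is an illustration under stated assumptions, not a general tool: the base address 0x80 and the four instruction lines are taken from the listing further down, and the loop simply remembers the last instruction whose address is not greater than the target.

```shell
base=0x80      # address of the first instruction of init_module in the listing
offset=0x58    # offset from the oops line: init_module+0x58
target=$(($base + $offset))
printf 'target: 0x%x\n' "$target"

hit=
while read -r addr rest; do
  # Only lines that start with "<hex>:" are instruction lines.
  case $addr in
    *:) hex=${addr%:} ;;
    *)  continue ;;
  esac
  case $hex in
    ''|*[!0-9a-f]*) continue ;;
  esac
  # Remember the last instruction that starts at or before the target.
  if [ "$((0x$hex))" -le "$target" ]; then
    hit="$addr $rest"
  fi
done <<'EOF'
cd: 40 f6 c7 01 test $0x1,%dil
d1: 0f 85 f1 00 00 00 jne 1c8
d7: 40 f6 c7 02 test $0x2,%dil
db: 0f 85 ff 00 00 00 jne 1e0
EOF

echo "instruction covering the target: $hit"
```

For a real dump, the same loop could read only the init_module part of my_objdump instead of the inline sample, since addresses may repeat across sections of an object file.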

The section of the dump that contains the crash is shown below:

int init_module(void)
{
80: 55 push %rbp <<<<<------ 80 (i.e. 0x80) is the base address of init_module()
81: 48 89 e5 mov %rsp,%rbp
84: e8 00 00 00 00 callq 89
* that is several physically contiguous pages long, and doesn't zero
.....
..... /* crash is at 0x58 of init_module; so go to 0x80+0x58 = 0xd8 */
sring = (struct as_sring*) page;
a6: 48 63 d0 movslq %eax,%rdx

SHARED_RING_INIT(sring);

a9: be 2f 00 00 00 mov $0x2f,%esi
ae: 48 8d 7a 11 lea 0x11(%rdx),%rdi
b2: c7 42 08 00 00 00 00 movl $0x0,0x8(%rdx)
b9: c7 02 00 00 00 00 movl $0x0,(%rdx)
bf: c7 42 0c 01 00 00 00 movl $0x1,0xc(%rdx)
c6: c7 42 04 01 00 00 00 movl $0x1,0x4(%rdx)
cd: 40 f6 c7 01 test $0x1,%dil
d1: 0f 85 f1 00 00 00 jne 1c8
d7: 40 f6 c7 02 test $0x2,%dil <<<<<----- 0xd8 falls inside this instruction (bytes 0xd7-0xda), so this is where the crash happened;
this assembly belongs to
SHARED_RING_INIT(sring);

db: 0f 85 ff 00 00 00 jne 1e0
==============

So SHARED_RING_INIT(sring) is the statement to examine: the crash happened in the code generated for it.
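
When the full dump is too large to scroll through, objdump can be limited to a window of addresses with its --start-address and --stop-address options. The sketch below only computes such a window around 0xd8 and prints the resulting command. The helper name "window" and the default span of 16 bytes are assumptions made for illustration.

```shell
# window TARGET [SPAN]: print the start and stop addresses of a window
# of SPAN bytes on each side of TARGET (hypothetical helper).
window() {
  target=$(($1))
  span=$((${2:-16}))
  start=$((target - span))
  [ "$start" -lt 0 ] && start=0
  printf '0x%x 0x%x\n' "$start" "$((target + span))"
}

range=$(window 0xd8)
set -- $range

# Print the command rather than running it; remove "echo" to run it
# against the real object file.
echo objdump -S --start-address="$1" --stop-address="$2" domu_share.o
```

The window may begin in the middle of an instruction, in which case the first few lines can be decoded incorrectly, so it is worth choosing a start address that is a known instruction boundary from the full dump.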
