Kernel module debugging: a simple technique
anomit | November 4, 2009
Disclaimer: I have only just started developing kernel modules, and even "novice" would overstate my current skills. What follows is material I gathered from different sources while trying to debug a kernel oops caused by a module. Some of it I found through Google and some came from the LDD3 book (Linux Device Drivers, 3rd edition). Put together, it gives a basic strategy for getting started with debugging a kernel module.
As I learned from reading LDD3, you can use any of these tools to debug a module:
- plain ol’ gdb
- kgdb
- kdb
kgdb doesn’t really strike me as something I will need in the near future. It runs the debugger on a second machine connected to the one being debugged (over a serial line, for instance), and I’m quite sure I won’t go to the trouble of finding another system just to set up a debug session. For all I know, though, it might be invaluable to those doing serious work.
kdb requires you to patch the kernel. I’ll admit I didn’t try this out of sheer laziness.
gdb should be part of the arsenal of even a half-serious programmer, and in my case it was. A few things need to be in place before you can use it on the kernel. First, you need the uncompressed kernel image, vmlinux (not the compressed vmlinuz). Second, the kernel must be compiled with some extra options that help with debugging. This list is again from the LDD3 book, Chapter 4:
CONFIG_DEBUG_KERNEL*
CONFIG_DEBUG_SLAB
CONFIG_DEBUG_PAGEALLOC
CONFIG_DEBUG_SPINLOCK
CONFIG_DEBUG_SPINLOCK_SLEEP
CONFIG_INIT_DEBUG*
CONFIG_DEBUG_INFO*
CONFIG_MAGIC_SYSRQ
CONFIG_DEBUG_STACKOVERFLOW
CONFIG_DEBUG_STACK_USAGE
CONFIG_KALLSYMS*
CONFIG_IKCONFIG*
CONFIG_IKCONFIG_PROC*
CONFIG_ACPI_DEBUG
CONFIG_DEBUG_DRIVER
CONFIG_SCSI_CONSTANTS
CONFIG_INPUT_EVBUG
CONFIG_PROFILING*
Not all of these are strictly necessary for every kind of debugging work, but you never know what kind of oops or kernel panic you might be facing. Still, I have starred the ones that I feel *must* be enabled. But don’t take my word for it: compile and recompile to find out the truth.
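If you want to check an existing configuration rather than hunt through menuconfig, a few lines of bash can grep the kernel’s .config for the starred options. This is a minimal sketch, assuming the usual .config format (CONFIG_FOO=y, CONFIG_FOO=m or "# CONFIG_FOO is not set"); the function name check_debug_config is made up for illustration. The option names are copied from the LDD3 list above, so some of them may be named differently, or not exist at all, in other kernel versions.

```shell
#!/bin/bash
# check_debug_config (hypothetical helper): report which of the starred
# debug options from the LDD3 list are enabled in a kernel .config file.
check_debug_config() {
    local config=$1 opt
    # Starred options from the list above; names as given in LDD3 (2.6 era).
    local starred=(CONFIG_DEBUG_KERNEL CONFIG_INIT_DEBUG CONFIG_DEBUG_INFO
                   CONFIG_KALLSYMS CONFIG_IKCONFIG CONFIG_IKCONFIG_PROC
                   CONFIG_PROFILING)
    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 1
    fi
    for opt in "${starred[@]}"; do
        # =y (built in) or =m (module) counts as enabled; "is not set"
        # or no line at all counts as missing.
        if grep -Eq "^${opt}=[ym]\$" "$config"; then
            echo "ok      $opt"
        else
            echo "MISSING $opt"
        fi
    done
}

# Run directly, e.g.: ./check_debug_config.sh /usr/src/linux/.config
if [ $# -ge 1 ]; then
    check_debug_config "$1"
fi
```

If the running kernel was built with CONFIG_IKCONFIG_PROC, its configuration is available as /proc/config.gz, so running zcat /proc/config.gz > /tmp/running.config and then pointing the script at /tmp/running.config checks the kernel you are actually running (the script file name is, again, just an example).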
With all the yak shaving out of the way, you can finally start debugging the module with your freshly recompiled kernel.
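Start the debugger as root (/proc/kcore, the running kernel’s memory presented as a core file, is readable only by root) with:
# gdb /usr/src/linux/vmlinux /proc/kcore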
But gdb doesn’t yet know where to find the module’s code and data sections. You can tell it manually: go into /sys/module/module_name/sections (the section files are hidden dot-files, so use ls -a to see them), cat the values of .text, .data and .bss, and then run this command at the gdb prompt, giving the .text address right after the module path:
(gdb) add-symbol-file /path/to/module 0xd081d000 \
    -s .data 0xd08232c0 \
    -s .bss 0xd0823e20
Or let this shell script generate the whole command for you:
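#!/bin/bash
#
# gdbline module image
#
# Outputs an add-symbol-file line suitable for pasting into gdb to examine
# a loaded module.
#
if [ $# -ne 2 ]; then
    echo "usage: $0 module image" >&2
    exit 1
fi
cd "/sys/module/$1/sections" || exit 1
# Make unmatched globs expand to nothing instead of staying literal.
shopt -s nullglob
echo -n "add-symbol-file $2 $(/bin/cat .text)"
for section in .[a-z]* *; do
    if [ "$section" != ".text" ]; then
        echo " \\"
        echo -n " -s $section $(/bin/cat "$section")"
    fi
done
echo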
Credit for this goes, once again, to Jonathan Corbet, one of the LDD3 authors; it comes from this article. What would I have done without his book and articles?!
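If you’d rather not paste anything by hand, the same idea can feed gdb a command file in batch mode. The sketch below assumes bash; gdbline_oneline is a hypothetical variant of the script above that puts the whole command on one line (so there are no backslash continuations to worry about in a command file), and SYSFS_MODULE_DIR is a made-up override that exists only so the function can be tried on fake data. gdb’s -batch and -x options are standard, and in batch mode gdb behaves as if confirmation were turned off, so add-symbol-file won’t stop to ask "(y or n)".

```shell
#!/bin/bash
# gdbline_oneline (hypothetical variant of gdbline): print the whole
# add-symbol-file command on one line, ready for a gdb command file.
# SYSFS_MODULE_DIR is a made-up override; it defaults to /sys/module.
gdbline_oneline() {
    local module=$1 image=$2 section line saved
    local dir="${SYSFS_MODULE_DIR:-/sys/module}/$module/sections"
    if [ ! -r "$dir/.text" ]; then
        echo "no section info for module '$module'" >&2
        return 1
    fi
    line="add-symbol-file $image $(cat "$dir/.text")"
    saved=$(shopt -p nullglob)      # remember the caller's nullglob setting
    shopt -s nullglob               # unmatched globs expand to nothing
    for section in "$dir"/.[a-z]* "$dir"/*; do
        section=${section##*/}      # keep only the section name
        [ "$section" = ".text" ] && continue
        line+=" -s $section $(cat "$dir/$section")"
    done
    eval "$saved"                   # restore nullglob as it was
    echo "$line"
}

# Example, as root with the module loaded:
#   gdbline_oneline plan9_net /path/to/plan9_net.ko > /tmp/mod.gdb
#   echo 'list *socknet_connect+0xd1' >> /tmp/mod.gdb
#   gdb -batch -x /tmp/mod.gdb /usr/src/linux/vmlinux /proc/kcore
```

gdb loads vmlinux and /proc/kcore before it runs the -x command file, so the module symbols are added on top of the kernel’s own, and the list command then resolves against them.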
The module I was trying to debug was causing an oops due to a NULL pointer dereference, a class of bug that has been the source of quite a few vulnerabilities in the mainline kernel. The following is what it looked like (taken from dmesg):
[27570.020736] BUG: unable to handle kernel NULL pointer dereference at 00000018
[27570.020747] IP: [<e07b3c31>] :plan9_net:socknet_connect+0xd1/0x110
[27570.020760] *pde = 00000000
[27570.020767] Oops: 0000 [#1] SMP
[snip]
[27570.020939]
[27570.020945] Pid: 8622, comm: bash Tainted: P (2.6.27-14-generic #1)
[27570.020951] EIP: 0060:[<e07b3c31>] EFLAGS: 00010296 CPU: 0
[27570.020960] EIP is at socknet_connect+0xd1/0x110 [plan9_net]
[27570.020966] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00f60000
[27570.020971] ESI: de4182a8 EDI: 00000002 EBP: dddedf20 ESP: dddedef4
[27570.020977] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[27570.020983] Process bash (pid: 8622, ti=dddec000 task=c198f110 task.ti=dddec000)
[27570.020988] Stack: 00000000 de4189f0 de418308 17000002 0101a8c0 d0f85494 d0f85493 de4ed200
[27570.021004] dda077c0 d0f85494 d0f85493 dddedf54 e07b37d5 00000000 dd4fd340 dda077c0
[27570.021019] 0000000e dd4fd340 de4182a8 d0f85493 dd4fd340 d0f85480 09582808 caaa4540
[27570.021035] Call Trace:
[27570.021041] [<e07b37d5>] ? tcp_n_ctl_process+0x145/0x170 [plan9_net]
[27570.021053] [<e07b3505>] ? slashnet_write_file+0x185/0x190 [plan9_net]
[27570.021070] [<c01b2c70>] ? vfs_write+0xa0/0x110
[27570.021081] [<e07b3380>] ? slashnet_write_file+0x0/0x190 [plan9_net]
[27570.021092] [<c01b2db2>] ? sys_write+0x42/0x70
[27570.021101] [<c0103f7b>] ? sysenter_do_call+0x12/0x2f
[27570.021110] [<c0380000>] ? __down_killable+0x60/0xd0
[27570.021121] =======================
[27570.021124] Code: 7b e0 bb ff ff ff ff e8 cc ab bc df eb 8f 8b 58 0c 8d 55 e0 b9 10 00 00 00 c7 04 24 00 00 00 00 ff 53 10 89 c3 8b 45 f0 8b 40 14 <8b> 40 18 c7 04 24 5c 3e 7b e0 89 44 24 04 e8 9a ab bc df 85 db
[27570.021211] EIP: [<e07b3c31>] socknet_connect+0xd1/0x110 [plan9_net] SS:ESP 0068:dddedef4
[27570.021235] ---[ end trace 1d54537d6fc8b3bc ]---
Phew, that’s a lot of information! An oops message gives you a dump of all the register values, the stack trace, the machine code around the faulting instruction and so on. I’ve given a couple of links at the end that explain everything in it; refer to them for more details.
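For now, we can see that the instruction at offset 0xd1 into socknet_connect (a function 0x110 bytes long, which is what the /0x110 means) caused the NULL pointer dereference. The Code: line agrees: the marked bytes <8b> 40 18 decode to mov 0x18(%eax),%eax, and EAX is 00000000, which is why the faulting address is 00000018. We’re very close to finding the errant piece of code now. Just do the following at the gdb prompt to home in right on the culprit statement: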
(gdb) list *socknet_connect+0xd1
...and we are done! Pretty simple and basic, wasn’t it?
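Since the function name and offset come straight from the oops, a bit of sed can also build that gdb command for you. A minimal sketch, assuming the 32-bit x86 oops format shown above ("EIP is at function+offset/size [module]"); x86-64 oopses print a RIP: line instead, so the pattern would need adjusting there. The function name oops_to_gdb_list is hypothetical.

```shell
#!/bin/bash
# oops_to_gdb_list (hypothetical helper): read an oops on stdin and print the
# gdb "list" command for the faulting function and offset. Matches the 32-bit
# x86 line "EIP is at func+0xOFF/0xSIZE [module]"; only the first match is used.
oops_to_gdb_list() {
    sed -n 's/.*EIP is at \([A-Za-z0-9_.]*\)+\(0x[0-9a-f]*\)\/0x[0-9a-f]*.*/list *\1+\2/p' |
        head -n 1
}

# Example: dmesg | oops_to_gdb_list
```

For the oops above, piping dmesg into oops_to_gdb_list prints list *socknet_connect+0xd1, ready to paste at the gdb prompt. If the log holds several oopses, only the first one is used, which is usually the one worth looking at anyway.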
These two links are really good pointers on how to find the necessary information in an oops message:
- Re: what’s an OOPS by John Bradford from LKML
- A very detailed oops report analysis that’ll really help you with ‘how to get from bug report to the source of bug’
I’ve also been trying to use the offset information with the disassembled module to figure out which part of the source code it corresponds to, but I haven’t had much success so far.
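One thing that may help with this: in a .ko file, nm and objdump both report addresses relative to the start of each section, so adding the oops offset to the function’s address from nm gives the address to look for in objdump’s disassembly. Below is a minimal sketch of that arithmetic. It assumes the function lives in the module’s .text section and that the module was built with -g so that objdump -S can interleave source lines; ko_offset_addr is a hypothetical name, not an existing tool.

```shell
#!/bin/bash
# ko_offset_addr (hypothetical helper): read "nm" output for a module on stdin
# and print FUNC's address plus OFFSET, i.e. where the faulting instruction
# should appear in "objdump -d" output for the same .ko file. Assumes FUNC is
# a text symbol (nm type t or T); addresses in a .ko are section-relative.
ko_offset_addr() {
    local func=$1 off=$2 addr
    addr=$(awk -v f="$func" '$3 == f && ($2 == "t" || $2 == "T") { print $1; exit }')
    [ -n "$addr" ] || return 1
    printf '0x%x\n' "$(( 0x$addr + off ))"
}

# Example, for a module built with -g:
#   a=$(nm plan9_net.ko | ko_offset_addr socknet_connect 0xd1)
#   objdump -d -S -j .text --start-address="$a" plan9_net.ko | head -n 40
```

If the function sits in another section, such as .init.text for __init code, that section also starts at address 0 in the .ko, so pass its name to -j instead of .text.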