Kernel Debugging Tricks

简介: Kernel Debugging Tricks

Debugging the kernel is not necessarily rocket science; in fact it can be achieved using very simple and straight forward techniques and some time, patience and perseverance. This page describes some tricks and techniques to help debug the kernel.

  • printk is your friend
    The simplest, and probably most effective way to debug the kernel is via printk(). This enables one to print messages to the console, and it very similar to printf(). Note that printk() can slow down the execution of code which can alter the way code runs, for example, changing the way race conditions occur.
  • CHANGING THE RING BUFFER SIZE
    The internal kernel console message buffer can sometimes be too small to capture all of the printk messages, especially when debug code generates a lot of printk messages. If the buffer fills up, it wraps around and one can lose valueable debug messages.

To increase the internal buffer, use the kernel boot parameter:

log_buf_len=N

where N is the size of the buffer in bytes, and must be a power of 2.

  • CHANGING DEBUG LEVELS
    One can specify the type of printk() log level by pre-pending the 1st printk() argument with one of the following:
KERN_EMERG    /* system is unusable                   */
KERN_ALERT    /* action must be taken immediately     */
KERN_CRIT     /* critical conditions                  */
KERN_ERR      /* error conditions                     */
KERN_WARNING  /* warning conditions                   */
KERN_NOTICE   /* normal but significant condition     */
KERN_INFO     /* informational                        */
KERN_DEBUG    /* debug-level messages                 */
e.g. printk(KERN_DEBUG "example debug message\n");

If one does not specify the log level then the default log level of KERN_WARNING is used. For example, enable all levels of console message:

echo 7 > /proc/sys/kernel/printk

To view console messages at boot, remove the quite and splash boot parameters from the kernel boot line in grub. This will disable the usplash splash screen and re-enable console messages.

  • Serial Console
    Serial console enables one to dump out console messages over a serial cable. Most modern PCs do not have legacy serial ports, so instead, one can use a USB serial dongle instead. A "null serial cable" or "universal file transfer cable" is needed to connect the target computer with the host. Most commonly this will be a DB9 female to DB9 female null serial cable. In addition, one needs to enable USB serial support as a kernel build configuration:
CONFIG_USB_SERIAL_CONSOLE=y
CONFIG_USB_SERIAL=y

and enable the appropriate driver, e.g.:

CONFIG_USB_SERIAL_PL2303=y

and boot this kernel with

console=ttyUSB0,9600n8

one may need to adjust the baud rate appropriately.

Note: Generally, there is NO hardware or software flow control on serial console drivers, which means one may get dropped characters when running very high speed tty baud rates, such as 115200 baud.

  • Console Messages
    Kernel Oops messages general contain a fair amount of information, ranging from register and process state dump and a stack dump too. Unfortunately the stack dump can be more than 25 lines and can scroll off the top of the 25 line Virtual Console. Hence to capture more of a Oops, try the following:
chvt 1
setfont /usr/share/consolefonts/Uni1-VGA8.psf.gz

Of course, one may still have a stack dump that scrolls the top of the Oops message off the console, so one trick is to rebuild the kernel with the stack dump removed, just to capture the initial Oops information. To do this, modify dump_stack in arch/x86/kernel/dumpstack_*.c and comment out the call to show_trace()

  • Slowing down kernel messages on boot
    One may find a machine hangs during the kernel boot process and one would like to be able to see all the kernel messages but unfortunately they scroll off the console too quickly. One can slow down kernel console messages at boot time using by building the kernel with the following option enabled:
CONFIG_BOOT_PRINTK_DELAY=y

And boot the machine with the following kernel boot parameter:

boot_delay=N

where N = msecs delay between each console message.

  • Kernel panic during suspend
    Debugging suspend/resume issues can be difficult if the kernel panics during suspend, especially late in the suspend because console messages are disabled. One can stop console messages from being suspended by using the kernel parameter no_console_suspend:
    no_console_suspend=1
    This will force the console not to suspend. Boot with this option, chvt 1 (to console #1), and suspend using pm-suspend
  • Serial Console in VirtualBox
    In some debug scenerios it can be helpful to debug the kernel running inside a virtual machine. This is useful for some classes of non-hardware specific bugs, for example generic kernel core problems or debugging file system drivers.

One can capture Linux console messages running inside VirtualBox by setting it the VirtualBox serial log to /tmp/vbox and running a serial tty communications program such as minicom, and configure it to communicate with a named pipe tty called unix#/tmp/vbox

Boot with virtualised kernel boot line:

console=ttyS0,9600

and minicom will capture the console messages

  • Network Console
    One can route console messages over a network using netconsole. Note that it's not useful for capturing kernel panics as kernel halts before the messages can be transmitted over the network. However it can be useful to monitor systems without the need of message serial console cabling.

see Documentation/networking/netconsole.txt and also Kernel/Netconsole.

netconsole=[src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]
        where
             src-port      source for UDP packets (defaults to 6665)
             src-ip        source IP to use (interface address)
             dev           network interface (eth0)
             tgt-port      port for logging agent (6666)
             tgt-ip        IP address for logging agent
             tgt-macaddr   ethernet MAC address for logging agent (broadcast)

Examples:

linux netconsole=4444@10.0.0.1/eth1,9353@10.0.0.2/12:34:56:78:9a:bc

The remote host can run either netcat -u -l -p <port>or syslogd.

  • gdb on vmlinux
    One can disassemble a built kernel using gdb on the vmlinux image. This is useful when one gets a kernel Oops message and a stack dump - one can then disassemble the object code and see where the Oops is occuring. For example:
gdb  debian/build/build-generic/vmlinux
(gdb) disassemble printk
Dump of assembler code for function printk:
0xffffffff8023dce0 <printk+0>:  sub    $0xd8,%rsp
0xffffffff8023dce7 <printk+7>:  lea    0xe0(%rsp),%rax
0xffffffff8023dcef <printk+15>: mov    %rsi,0x28(%rsp)
0xffffffff8023dcf4 <printk+20>: mov    %rsp,%rsi
0xffffffff8023dcf7 <printk+23>: mov    %rdx,0x30(%rsp)
0xffffffff8023dcfc <printk+28>: mov    %rcx,0x38(%rsp)
0xffffffff8023dd01 <printk+33>: mov    %rax,0x8(%rsp)
0xffffffff8023dd06 <printk+38>: lea    0x20(%rsp),%rax
0xffffffff8023dd0b <printk+43>: mov    %r8,0x40(%rsp)
0xffffffff8023dd10 <printk+48>: mov    %r9,0x48(%rsp)
0xffffffff8023dd15 <printk+53>: movl   $0x8,(%rsp)
0xffffffff8023dd1c <printk+60>: movl   $0x30,0x4(%rsp)
0xffffffff8023dd24 <printk+68>: mov    %rax,0x10(%rsp)
0xffffffff8023dd29 <printk+73>: callq  0xffffffff8023d980 <vprintk>
0xffffffff8023dd2e <printk+78>: add    $0xd8,%rsp
0xffffffff8023dd35 <printk+85>: retq

End of assembler dump.

  • Objdump
    If one has the built object code at hand, one can disassemble the object using objdump as follows:
objdump -SdCg debian/build/build-generic/fs/dcache.o
  • Using GDB to find the location where your kernel panicked or oopsed.
    A quick and easy way to find the line of code where your kernel panicked or oopsed is to use GDB list command. You can do this as follows.

Lets assume your panic/oops message says something like:

[  174.507084] Stack:
[  174.507163]  ce0bd8ac 00000008 00000000 ce4a7e90 c039ce30 ce0bd8ac c0718b04 c07185a0
[  174.507380]  ce4a7ea0 c0398f22 ce0bd8ac c0718b04 ce4a7eb0 c037deee ce0bd8e0 ce0bd8ac
[  174.507597]  ce4a7ec0 c037dfe0 c07185a0 ce0bd8ac ce4a7ed4 c037d353 ce0bd8ac ce0bd8ac
[  174.507888] Call Trace:
[  174.508125]  [<c039ce30>] ? sd_remove+0x20/0x70
[  174.508235]  [<c0398f22>] ? scsi_bus_remove+0x32/0x40
[  174.508326]  [<c037deee>] ? __device_release_driver+0x3e/0x70
[  174.508421]  [<c037dfe0>] ? device_release_driver+0x20/0x40
[  174.508514]  [<c037d353>] ? bus_remove_device+0x73/0x90
[  174.508606]  [<c037bccf>] ? device_del+0xef/0x150
[  174.508693]  [<c0399207>] ? __scsi_remove_device+0x47/0x80
[  174.508786]  [<c0399262>] ? scsi_remove_device+0x22/0x40
[  174.508877]  [<c0399324>] ? __scsi_remove_target+0x94/0xd0
[  174.508969]  [<c03993c0>] ? __remove_child+0x0/0x20
[  174.509060]  [<c03993d7>] ? __remove_child+0x17/0x20
[  174.509148]  [<c037b868>] ? device_for_each_child+0x38/0x60
[  174.509241]  [<c039938f>] ? scsi_remove_target+0x2f/0x60
[  174.509393]  [<d0c38907>] ? __iscsi_unbind_session+0x77/0xa0 [scsi_transport_iscsi]
[  174.509699]  [<c015272e>] ? run_workqueue+0x6e/0x140
[  174.509801]  [<d0c38890>] ? __iscsi_unbind_session+0x0/0xa0 [scsi_transport_iscsi]
[  174.509977]  [<c0152888>] ? worker_thread+0x88/0xe0
[  174.510047]  [<c01566a0>] ? autoremove_wake_function+0x0/0x40

Lets say you want to know what line of code represents sd_remove+0x20/0x70. cd to the ubuntu debian/build/build-generic directory in your kernel tree and run gdb on the ".o" file which has the function sd_remove() in this case in sd.o, and use the gdb "list" command, (gdb) list *(function+0xoffset), in this case function is sd_remove() and offset is 0x20, and gdb should tell you the line number where you hit the panic or oops. This has worked for me very reliably for most cases.

manjo@hungry:~/devel/ubuntu/kernel/ubuntu-karmic-397906/debian/build/build-generic/drivers/scsi$ gdb sd.o
GNU gdb (GDB) 6.8.50.20090628-cvs-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
(gdb) list *(sd_remove+0x20)
0x1650 is in sd_remove (/home/manjo/devel/ubuntu/kernel/ubuntu-karmic-397906/drivers/scsi/sd.c:2125).
2120    static int sd_remove(struct device *dev)
2121    {
2122            struct scsi_disk *sdkp;
2123
2124            async_synchronize_full();
2125            sdkp = dev_get_drvdata(dev);
2126            blk_queue_prep_rq(sdkp->device->request_queue, scsi_prep_fn);
2127            device_del(&sdkp->dev);
2128            del_gendisk(sdkp->disk);
2129            sd_shutdown(dev);
(gdb)
相关实践学习
阿里云图数据库GDB入门与应用
图数据库(Graph Database,简称GDB)是一种支持Property Graph图模型、用于处理高度连接数据查询与存储的实时、可靠的在线数据库服务。它支持Apache TinkerPop Gremlin查询语言,可以帮您快速构建基于高度连接的数据集的应用程序。GDB非常适合社交网络、欺诈检测、推荐引擎、实时图谱、网络/IT运营这类高度互连数据集的场景。 GDB由阿里云自主研发,具备如下优势: 标准图查询语言:支持属性图,高度兼容Gremlin图查询语言。 高度优化的自研引擎:高度优化的自研图计算层和存储层,云盘多副本保障数据超高可靠,支持ACID事务。 服务高可用:支持高可用实例,节点故障迅速转移,保障业务连续性。 易运维:提供备份恢复、自动升级、监控告警、故障切换等丰富的运维功能,大幅降低运维成本。 产品主页:https://www.aliyun.com/product/gdb
相关文章
|
5月前
|
开发框架 JSON .NET
探索Semantic Kernel内置插件:深入了解HttpPlugin的应用
其他 Put和Delete类似。 最后 可以借鉴HttpPlugin的实现思路在项目中灵活的运行,如果不支持那就可以自定义插件来完成需求的开发,还是比较期待这个插件能够更加完善的一点,在未来以更灵活的方式支持Post等请求的多种形式。
|
Oracle 关系型数据库 Unix
Linux Kernel Lowmem Pressure Issues and Related Kernel Structures (Doc ID 452326.1)
Linux Kernel Lowmem Pressure Issues and Related Kernel Structures (Doc ID 452326.1)
129 0
|
TensorFlow 算法框架/工具 Python
成功解决Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
成功解决Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
成功解决Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
|
SQL 编译器 API
Efficiently Compiling Efficient Query Plans for Modern Hardware 论文解读
这应该是SQL查询编译的一篇经典文章了,作者是著名的Thomas Neumann,主要讲解了TUM的HyPer数据库中对于CodeGen的应用。 在morsel-driven那篇paper 中,介绍了HyPer的整个执行框架,会以task为单位处理一个morsel的数据,而执行的处理逻辑(一个pipeline job)就被编译为一个函数。这篇paper则具体讲如何实现动态编译。
444 0
Efficiently Compiling Efficient Query Plans for Modern Hardware 论文解读
|
开发工具
修复VirtualBox "This kernel requires the following features not present on the CPU: pae Unable to boot – please use a kernel appropriate for your CPU"
异常处理汇总-开发工具  http://www.cnblogs.com/dunitian/p/4522988.html 修复VirtualBox "This kernel requires the following features not present on the CPU: pae Una...
1013 0
How to Detect and Respond to Malware-Free Intrusions
http://www.crowdstrike.com/beyond-malware-how-to-detect-and-respond-to-malware-free-intrusions/...
848 0
|
PHP 测试技术
OpenCASCADE6.8.0 Reference Manual Serach Problem
OpenCASCADE6.8.0 Reference Manual Serach Problem eryar@163.com 1. Problem 有网友反映OpenCASCADE6.8.0的Reference Manual的搜索功能不好用,以前没注意,自己试了下,的确在本机上不能正确搜索出给出的关键字相关的内容。
1030 0