Kernel Debugging Tricks

简介: Kernel Debugging Tricks

Debugging the kernel is not necessarily rocket science; in fact it can be achieved using very simple and straight forward techniques and some time, patience and perseverance. This page describes some tricks and techniques to help debug the kernel.

  • printk is your friend
    The simplest, and probably most effective way to debug the kernel is via printk(). This enables one to print messages to the console, and it very similar to printf(). Note that printk() can slow down the execution of code which can alter the way code runs, for example, changing the way race conditions occur.
  • CHANGING THE RING BUFFER SIZE
    The internal kernel console message buffer can sometimes be too small to capture all of the printk messages, especially when debug code generates a lot of printk messages. If the buffer fills up, it wraps around and one can lose valueable debug messages.

To increase the internal buffer, use the kernel boot parameter:

log_buf_len=N

where N is the size of the buffer in bytes, and must be a power of 2.

  • CHANGING DEBUG LEVELS
    One can specify the type of printk() log level by pre-pending the 1st printk() argument with one of the following:
KERN_EMERG    /* system is unusable                   */
KERN_ALERT    /* action must be taken immediately     */
KERN_CRIT     /* critical conditions                  */
KERN_ERR      /* error conditions                     */
KERN_WARNING  /* warning conditions                   */
KERN_NOTICE   /* normal but significant condition     */
KERN_INFO     /* informational                        */
KERN_DEBUG    /* debug-level messages                 */
e.g. printk(KERN_DEBUG "example debug message\n");

If one does not specify the log level then the default log level of KERN_WARNING is used. For example, enable all levels of console message:

echo 7 > /proc/sys/kernel/printk

To view console messages at boot, remove the quite and splash boot parameters from the kernel boot line in grub. This will disable the usplash splash screen and re-enable console messages.

  • Serial Console
    Serial console enables one to dump out console messages over a serial cable. Most modern PCs do not have legacy serial ports, so instead, one can use a USB serial dongle instead. A "null serial cable" or "universal file transfer cable" is needed to connect the target computer with the host. Most commonly this will be a DB9 female to DB9 female null serial cable. In addition, one needs to enable USB serial support as a kernel build configuration:
CONFIG_USB_SERIAL_CONSOLE=y
CONFIG_USB_SERIAL=y

and enable the appropriate driver, e.g.:

CONFIG_USB_SERIAL_PL2303=y

and boot this kernel with

console=ttyUSB0,9600n8

one may need to adjust the baud rate appropriately.

Note: Generally, there is NO hardware or software flow control on serial console drivers, which means one may get dropped characters when running very high speed tty baud rates, such as 115200 baud.

  • Console Messages
    Kernel Oops messages general contain a fair amount of information, ranging from register and process state dump and a stack dump too. Unfortunately the stack dump can be more than 25 lines and can scroll off the top of the 25 line Virtual Console. Hence to capture more of a Oops, try the following:
chvt 1
setfont /usr/share/consolefonts/Uni1-VGA8.psf.gz

Of course, one may still have a stack dump that scrolls the top of the Oops message off the console, so one trick is to rebuild the kernel with the stack dump removed, just to capture the initial Oops information. To do this, modify dump_stack in arch/x86/kernel/dumpstack_*.c and comment out the call to show_trace()

  • Slowing down kernel messages on boot
    One may find a machine hangs during the kernel boot process and one would like to be able to see all the kernel messages but unfortunately they scroll off the console too quickly. One can slow down kernel console messages at boot time using by building the kernel with the following option enabled:
CONFIG_BOOT_PRINTK_DELAY=y

And boot the machine with the following kernel boot parameter:

boot_delay=N

where N = msecs delay between each console message.

  • Kernel panic during suspend
    Debugging suspend/resume issues can be difficult if the kernel panics during suspend, especially late in the suspend because console messages are disabled. One can stop console messages from being suspended by using the kernel parameter no_console_suspend:
    no_console_suspend=1
    This will force the console not to suspend. Boot with this option, chvt 1 (to console #1), and suspend using pm-suspend
  • Serial Console in VirtualBox
    In some debug scenerios it can be helpful to debug the kernel running inside a virtual machine. This is useful for some classes of non-hardware specific bugs, for example generic kernel core problems or debugging file system drivers.

One can capture Linux console messages running inside VirtualBox by setting it the VirtualBox serial log to /tmp/vbox and running a serial tty communications program such as minicom, and configure it to communicate with a named pipe tty called unix#/tmp/vbox

Boot with virtualised kernel boot line:

console=ttyS0,9600

and minicom will capture the console messages

  • Network Console
    One can route console messages over a network using netconsole. Note that it's not useful for capturing kernel panics as kernel halts before the messages can be transmitted over the network. However it can be useful to monitor systems without the need of message serial console cabling.

see Documentation/networking/netconsole.txt and also Kernel/Netconsole.

netconsole=[src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]
        where
             src-port      source for UDP packets (defaults to 6665)
             src-ip        source IP to use (interface address)
             dev           network interface (eth0)
             tgt-port      port for logging agent (6666)
             tgt-ip        IP address for logging agent
             tgt-macaddr   ethernet MAC address for logging agent (broadcast)

Examples:

linux netconsole=4444@10.0.0.1/eth1,9353@10.0.0.2/12:34:56:78:9a:bc

The remote host can run either netcat -u -l -p <port>or syslogd.

  • gdb on vmlinux
    One can disassemble a built kernel using gdb on the vmlinux image. This is useful when one gets a kernel Oops message and a stack dump - one can then disassemble the object code and see where the Oops is occuring. For example:
gdb  debian/build/build-generic/vmlinux
(gdb) disassemble printk
Dump of assembler code for function printk:
0xffffffff8023dce0 <printk+0>:  sub    $0xd8,%rsp
0xffffffff8023dce7 <printk+7>:  lea    0xe0(%rsp),%rax
0xffffffff8023dcef <printk+15>: mov    %rsi,0x28(%rsp)
0xffffffff8023dcf4 <printk+20>: mov    %rsp,%rsi
0xffffffff8023dcf7 <printk+23>: mov    %rdx,0x30(%rsp)
0xffffffff8023dcfc <printk+28>: mov    %rcx,0x38(%rsp)
0xffffffff8023dd01 <printk+33>: mov    %rax,0x8(%rsp)
0xffffffff8023dd06 <printk+38>: lea    0x20(%rsp),%rax
0xffffffff8023dd0b <printk+43>: mov    %r8,0x40(%rsp)
0xffffffff8023dd10 <printk+48>: mov    %r9,0x48(%rsp)
0xffffffff8023dd15 <printk+53>: movl   $0x8,(%rsp)
0xffffffff8023dd1c <printk+60>: movl   $0x30,0x4(%rsp)
0xffffffff8023dd24 <printk+68>: mov    %rax,0x10(%rsp)
0xffffffff8023dd29 <printk+73>: callq  0xffffffff8023d980 <vprintk>
0xffffffff8023dd2e <printk+78>: add    $0xd8,%rsp
0xffffffff8023dd35 <printk+85>: retq

End of assembler dump.

  • Objdump
    If one has the built object code at hand, one can disassemble the object using objdump as follows:
objdump -SdCg debian/build/build-generic/fs/dcache.o
  • Using GDB to find the location where your kernel panicked or oopsed.
    A quick and easy way to find the line of code where your kernel panicked or oopsed is to use GDB list command. You can do this as follows.

Lets assume your panic/oops message says something like:

[  174.507084] Stack:
[  174.507163]  ce0bd8ac 00000008 00000000 ce4a7e90 c039ce30 ce0bd8ac c0718b04 c07185a0
[  174.507380]  ce4a7ea0 c0398f22 ce0bd8ac c0718b04 ce4a7eb0 c037deee ce0bd8e0 ce0bd8ac
[  174.507597]  ce4a7ec0 c037dfe0 c07185a0 ce0bd8ac ce4a7ed4 c037d353 ce0bd8ac ce0bd8ac
[  174.507888] Call Trace:
[  174.508125]  [<c039ce30>] ? sd_remove+0x20/0x70
[  174.508235]  [<c0398f22>] ? scsi_bus_remove+0x32/0x40
[  174.508326]  [<c037deee>] ? __device_release_driver+0x3e/0x70
[  174.508421]  [<c037dfe0>] ? device_release_driver+0x20/0x40
[  174.508514]  [<c037d353>] ? bus_remove_device+0x73/0x90
[  174.508606]  [<c037bccf>] ? device_del+0xef/0x150
[  174.508693]  [<c0399207>] ? __scsi_remove_device+0x47/0x80
[  174.508786]  [<c0399262>] ? scsi_remove_device+0x22/0x40
[  174.508877]  [<c0399324>] ? __scsi_remove_target+0x94/0xd0
[  174.508969]  [<c03993c0>] ? __remove_child+0x0/0x20
[  174.509060]  [<c03993d7>] ? __remove_child+0x17/0x20
[  174.509148]  [<c037b868>] ? device_for_each_child+0x38/0x60
[  174.509241]  [<c039938f>] ? scsi_remove_target+0x2f/0x60
[  174.509393]  [<d0c38907>] ? __iscsi_unbind_session+0x77/0xa0 [scsi_transport_iscsi]
[  174.509699]  [<c015272e>] ? run_workqueue+0x6e/0x140
[  174.509801]  [<d0c38890>] ? __iscsi_unbind_session+0x0/0xa0 [scsi_transport_iscsi]
[  174.509977]  [<c0152888>] ? worker_thread+0x88/0xe0
[  174.510047]  [<c01566a0>] ? autoremove_wake_function+0x0/0x40

Lets say you want to know what line of code represents sd_remove+0x20/0x70. cd to the ubuntu debian/build/build-generic directory in your kernel tree and run gdb on the ".o" file which has the function sd_remove() in this case in sd.o, and use the gdb "list" command, (gdb) list *(function+0xoffset), in this case function is sd_remove() and offset is 0x20, and gdb should tell you the line number where you hit the panic or oops. This has worked for me very reliably for most cases.

manjo@hungry:~/devel/ubuntu/kernel/ubuntu-karmic-397906/debian/build/build-generic/drivers/scsi$ gdb sd.o
GNU gdb (GDB) 6.8.50.20090628-cvs-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
(gdb) list *(sd_remove+0x20)
0x1650 is in sd_remove (/home/manjo/devel/ubuntu/kernel/ubuntu-karmic-397906/drivers/scsi/sd.c:2125).
2120    static int sd_remove(struct device *dev)
2121    {
2122            struct scsi_disk *sdkp;
2123
2124            async_synchronize_full();
2125            sdkp = dev_get_drvdata(dev);
2126            blk_queue_prep_rq(sdkp->device->request_queue, scsi_prep_fn);
2127            device_del(&sdkp->dev);
2128            del_gendisk(sdkp->disk);
2129            sd_shutdown(dev);
(gdb)
相关实践学习
阿里云图数据库GDB入门与应用
图数据库(Graph Database,简称GDB)是一种支持Property Graph图模型、用于处理高度连接数据查询与存储的实时、可靠的在线数据库服务。它支持Apache TinkerPop Gremlin查询语言,可以帮您快速构建基于高度连接的数据集的应用程序。GDB非常适合社交网络、欺诈检测、推荐引擎、实时图谱、网络/IT运营这类高度互连数据集的场景。 GDB由阿里云自主研发,具备如下优势: 标准图查询语言:支持属性图,高度兼容Gremlin图查询语言。 高度优化的自研引擎:高度优化的自研图计算层和存储层,云盘多副本保障数据超高可靠,支持ACID事务。 服务高可用:支持高可用实例,节点故障迅速转移,保障业务连续性。 易运维:提供备份恢复、自动升级、监控告警、故障切换等丰富的运维功能,大幅降低运维成本。 产品主页:https://www.aliyun.com/product/gdb
相关文章
|
Linux Shell C语言
【Shell 命令集合 磁盘维护 】Linux 分区管理的工具 sfdisk命令使用教程
【Shell 命令集合 磁盘维护 】Linux 分区管理的工具 sfdisk命令使用教程
242 1
|
IDE PHP Apache
PhpStorm+Xampp+Xdebug搭建环境并部署应用
PhpStorm+Xampp+Xdebug搭建环境并部署应用
427 0
|
5月前
|
机器学习/深度学习 人工智能 负载均衡
Trae 04.22版本深度解析:Agent能力升级与MCP市场对复杂任务执行的革新
在当今快速发展的AI技术领域,Agent系统正成为自动化任务执行和智能交互的核心组件。Trae作为一款先进的AI协作平台,在04.22版本中带来了重大更新,特别是在Agent能力升级和MCP市场支持方面。本文将深入探讨这些更新如何重新定义复杂任务的执行方式,为开发者提供更强大的工具和更灵活的解决方案。
568 1
断路器/熔断器? 断路器的状态有哪些
● closed:关闭状态,断路器放行所有请求,并开始统计异常比例、慢请求比例。超过阈值则切换到open状态 ● open:打开状态,服务调用被熔断,访问被熔断服务的请求会被拒绝,快速失败,直接走降级逻辑。Open状态5秒后会进入half-open状态 ● half-open:半开状态,放行一次请求,根据执行结果来判断接下来的操作。 ○ 请求成功:则切换到closed状态 ○ 请求失败:则切换到open状态
|
Java 开发工具 Android开发
Android各版本对应的SDK和JDK版本
原文:Android各版本对应的SDK和JDK版本 一、Android各版本对应的SDK版本: 平台版本 SDK版本 版本名称 Android 8.
5949 0
|
Ubuntu Linux Shell
使用ramdisk启动ubuntu文件系统(pivot_root)
使用ramdisk启动ubuntu文件系统(pivot_root)
|
Linux 数据处理
探索Linux下的readelf命令:深入了解ELF文件
`readelf`是Linux下分析ELF文件的命令行工具,用于查看文件头、节区、符号表等信息。支持可执行文件、共享库等多种类型。常用选项有`-h`(文件头)、`-l`(程序头)、`-S`(节区)、`-s`(符号表)、`-r`(重定位)和`-d`(动态节区)。结合其他工具如`objdump`,能深入理解二进制文件,助力开发和调试。
|
Java 程序员 开发工具
Git Cherry-pick 使用
Git Cherry-pick 使用
|
新零售 消息中间件 存储
保证分布式系统数据一致性的6种方案
在电商等业务中,系统一般由多个独立的服务组成,如何解决分布式调用时候数据的一致性?
4518 0
|
安全 Unix Linux
`/var/log/wtmp` 和 `/var/run/utmp`日志详解
`/var/log/wtmp` 和 `/var/run/utmp` 是Unix/Linux系统中记录用户登录信息的关键文件。`wtmp` 文件存储所有登录和注销事件,供 `last` 命令显示登录历史,而 `utmp` 文件实时更新,记录当前登录用户信息,可由 `who` 或 `w` 命令解析展示。两者皆为root用户访问,系统重启可能清空,且常受安全措施保护,用于系统管理和安全审计。
1103 1