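Kprobes is a lightweight mechanism in Linux for inserting breakpoints into a running kernel. It can collect debugging information such as processor registers and global data structures, and can even be used to modify register values and global data structures.
Kprobes inserts a probe by writing a breakpoint instruction at a given address in the running kernel. Executing the probed instruction triggers a breakpoint exception; Kprobes hooks into the breakpoint handler and collects debugging information. Kprobes can even single-step the probed instruction.
Building the kernel with CONFIG_KPROBE_EVENTS=y allows kprobe events to be added dynamically through the tracing interface.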
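1. How it works
The user specifies a probe point and associates a user-defined handler with it. When kernel execution reaches the probe point, the associated handler runs, and execution then continues along the normal code path.
kprobe implements three kinds of probes: kprobes, jprobes, and kretprobes (also called return probes). A kprobe can be inserted at almost any instruction in the kernel, a jprobe can only be placed at the entry of a kernel function, and a kretprobe fires when the specified kernel function returns. (jprobes were removed from the mainline kernel in Linux 4.15.)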
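- When a kprobe is installed, kprobes first saves a copy of the probed instruction, then replaces its first byte or bytes with a breakpoint instruction.
- When execution reaches the probe point, the breakpoint instruction causes a trap: the CPU registers are saved and the corresponding trap handler is called.
- The trap handler calls every notifier function registered on the corresponding notifier_call_chain. kprobes implements probing by registering the handler associated with the probe point on the notifier_call_chain for that trap.
- The pre_handler associated with the probe point runs first, receiving the kprobe struct and the saved registers as arguments. kprobes then single-steps its copy of the probed instruction, and finally runs the post_handler. Once all of this has completed, execution continues with the instructions that follow the probed instruction.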
[Figure: kprobe execution flow]
2. kprobe initialization
The kprobes core is initialized by init_kprobes(), defined in kernel/kprobes.c.
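3. Using kprobes through the ftrace interface
Probes are defined by writing to
/sys/kernel/debug/tracing/kprobe_events
and enabled through
/sys/kernel/debug/tracing/events/kprobes/<EVENT>/enable
Syntax
The syntax of an event definition is as follows: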
p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS]  : Set a probe
r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]  : Set a return probe
-:[GRP/]EVENT                                         : Clear a probe
  GRP        : Group name. If omitted, use "kprobes" for it.
  EVENT      : Event name. If omitted, the event name is generated
               based on SYM+offs or MEMADDR.
  MOD        : Module name which has given SYM.
  SYM[+offs] : Symbol+offset where the probe is inserted.
  MEMADDR    : Address where the probe is inserted.
  MAXACTIVE  : Maximum number of instances of the specified function that
               can be probed simultaneously, or 0 for the default value
               as defined in Documentation/kprobes.txt section 1.3.1.
  FETCHARGS  : Arguments. Each probe can have up to 128 args.
    %REG              : Fetch register REG
    @ADDR             : Fetch memory at ADDR (ADDR should be in kernel)
    @SYM[+|-offs]     : Fetch memory at SYM +|- offs (SYM should be a data symbol)
    $stackN           : Fetch Nth entry of stack (N >= 0)
    $stack            : Fetch stack address.
    $retval           : Fetch return value.(*)
    $comm             : Fetch current task comm.
    +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
    NAME=FETCHARG     : Set NAME as the argument name of FETCHARG.
    FETCHARG:TYPE     : Set TYPE as the type of FETCHARG. Currently, basic types
                        (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
                        (x8/x16/x32/x64), "string" and bitfield are supported.
  (*) only for return probe.
  (**) this is useful for fetching a field of data structures.
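To make the grammar above concrete, here is a minimal shell sketch that assembles a definition line from its parts. The helper name kprobe_def is hypothetical (it is not part of the kernel or of any tool), and the function only prints the string; it does not write anything to tracefs.

```shell
#!/bin/sh
# kprobe_def KIND EVENT LOCATION [FETCHARG...]
#   KIND     : p (probe) or r (return probe)
#   EVENT    : event name, optionally prefixed with GRP/
#   LOCATION : SYM[+offs] or MEMADDR
# Hypothetical helper: builds the text of a kprobe_events definition line.
kprobe_def() {
    if [ "$#" -lt 3 ]; then
        echo "usage: kprobe_def p|r EVENT LOCATION [FETCHARG...]" >&2
        return 1
    fi
    kind=$1
    event=$2
    location=$3
    shift 3
    case $kind in
        p|r) ;;
        *) echo "kprobe_def: KIND must be p or r" >&2; return 1 ;;
    esac
    def="$kind:$event $location"
    # Append each fetch argument, separated by single spaces.
    for arg in "$@"; do
        def="$def $arg"
    done
    printf '%s\n' "$def"
}

# Print two definitions; nothing is written to tracefs here.
kprobe_def p myprobe do_sys_open 'dfd=%ax' 'filename=%dx' 'flags=%cx' 'mode=+4($stack)'
kprobe_def r myretprobe do_sys_open '$retval'
```

The printed line could then be appended, as root, to /sys/kernel/debug/tracing/kprobe_events. Single quotes keep the shell from expanding $stack and $retval.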
Adding a kprobe event
For example, to add a new probe event on do_sys_open:
echo 'p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)' > /sys/kernel/debug/tracing/kprobe_events
The definition can then be read back from the file:
# cat /sys/kernel/debug/tracing/kprobe_events
p:kprobes/myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)
In the kernel source, do_sys_open is defined as follows:
long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
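So dfd=%ax filename=%dx flags=%cx mode=+4($stack) is meant to capture the arguments of do_sys_open: because the kprobe sits at the function entry, the arguments are still in the locations defined by the calling convention. Note that this register assignment matches the 32-bit x86 kernel calling convention (first three arguments in %ax, %dx, %cx, the rest on the stack). On x86_64 the first four integer arguments are passed in %di, %si, %dx, %cx, so the same definition records different values there.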
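The do_syscall_64 frames in the trace output of this section suggest an x86_64 kernel, where the calling convention passes the first six integer arguments in %di, %si, %dx, %cx, %r8, %r9 rather than in the registers used by the example definition. The sketch below, with hypothetical helper names, generates NAME=%REG fetch arguments under that assumption; it is not a general solution for other architectures or for arguments passed on the stack.

```shell
#!/bin/sh
# Map a 1-based integer argument index to its x86_64 register fetcharg.
# Assumption: x86_64 System V calling convention, integer/pointer arguments only.
x86_64_arg_reg() {
    case $1 in
        1) echo '%di' ;;
        2) echo '%si' ;;
        3) echo '%dx' ;;
        4) echo '%cx' ;;
        5) echo '%r8' ;;
        6) echo '%r9' ;;
        *) echo "x86_64_arg_reg: only arguments 1-6 are passed in registers" >&2
           return 1 ;;
    esac
}

# Build "NAME=%REG ..." for a list of argument names given in declaration order.
x86_64_fetchargs() {
    n=1
    out=
    for name in "$@"; do
        reg=$(x86_64_arg_reg "$n") || return 1
        out="${out:+$out }$name=$reg"
        n=$((n + 1))
    done
    printf '%s\n' "$out"
}

# Arguments of do_sys_open(dfd, filename, flags, mode), in order.
x86_64_fetchargs dfd filename flags mode
```

The resulting string can be placed after the symbol name in a p: definition in place of the 32-bit register list.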
Adding a kretprobe event
A return probe event is defined as follows:
echo 'r:myretprobe do_sys_open $retval' >> /sys/kernel/debug/tracing/kprobe_events
Reading the file again now shows two events:
# cat /sys/kernel/debug/tracing/kprobe_events
p:kprobes/myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)
r:kprobes/myretprobe do_sys_open arg1=$retval
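The listing shows how the kernel fills in defaults: the group becomes "kprobes" and the unnamed $retval argument is named arg1. As a small illustration, the following sketch (kprobe_list is a hypothetical helper) splits each definition line into probe type, group, event name, and location. It assumes the line layout shown above.

```shell
#!/bin/sh
# Print "TYPE GROUP EVENT LOCATION" for each definition line read from stdin.
# Hypothetical helper; assumes lines shaped like the kprobe_events listing above.
kprobe_list() {
    awk '
        /^[pr]/ {
            split($1, head, ":")            # head[1] = type, head[2] = [GRP/]EVENT
            n = split(head[2], id, "/")
            if (n == 2) { group = id[1]; event = id[2] }
            else        { group = "kprobes"; event = id[1] }   # default group
            print substr(head[1], 1, 1), group, event, $2
        }
    '
}

# Demonstration on the two lines from the listing (no tracefs access needed).
printf '%s\n' \
    'p:kprobes/myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)' \
    'r:kprobes/myretprobe do_sys_open arg1=$retval' | kprobe_list
```

On a live system the input would come from /sys/kernel/debug/tracing/kprobe_events instead of printf.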
Viewing the format
The format of a defined event can be viewed as follows:
# cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format
name: myprobe
ID: 1844
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned long __probe_ip; offset:8; size:8; signed:0;
field:u64 dfd; offset:16; size:8; signed:0;
field:u64 filename; offset:24; size:8; signed:0;
field:u64 flags; offset:32; size:8; signed:0;
field:u64 mode; offset:40; size:8; signed:0;
print fmt: "(%lx) dfd=0x%Lx filename=0x%Lx flags=0x%Lx mode=0x%Lx", REC->__probe_ip, REC->dfd, REC->filename, REC->flags, REC->mode
The four probe arguments (dfd, filename, flags, mode) appear as fields after __probe_ip.
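When an event has many fields, it can help to reduce the format file to just the probe-specific ones. The sketch below (format_fields is a hypothetical helper) reads a format description on stdin and prints the name and size of every field that is not one of the common_ fields. It assumes the "field:TYPE NAME; offset:N; size:N; signed:N;" layout shown above.

```shell
#!/bin/sh
# Print "NAME SIZE" for each non-common field of an event format read from stdin.
# Hypothetical helper; assumes the field line layout shown in the format file.
format_fields() {
    awk '
        /field:/ {
            decl = $0
            sub(/^.*field:/, "", decl)      # drop everything up to the declaration
            sub(/;.*$/, "", decl)           # keep only "TYPE NAME"
            n = split(decl, words, " ")
            name = words[n]                 # the last word is the field name
            if (name ~ /^common_/) next     # skip fields shared by all events
            size = $0
            sub(/^.*size:/, "", size)
            sub(/;.*$/, "", size)
            print name, size
        }
    '
}

# Demonstration on three lines taken from the format listing.
printf '%s\n' \
    'field:int common_pid; offset:4; size:4; signed:1;' \
    'field:unsigned long __probe_ip; offset:8; size:8; signed:0;' \
    'field:u64 dfd; offset:16; size:8; signed:0;' | format_fields
```

On a live system the input would be /sys/kernel/debug/tracing/events/kprobes/myprobe/format.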
Enabling event tracing
Newly defined events are disabled by default. To enable them:
echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable
Once enabled, the events can be viewed in the trace buffer:
# cat /sys/kernel/debug/tracing/trace
#                  _-----=> irqs-off
#                 / _----=> need-resched
#                | / _---=> hardirq/softirq
#                || / _--=> preempt-depth
#                ||| /     delay
# TASK-PID  CPU# ||||   TIMESTAMP  FUNCTION
#    | |      |  ||||       |         |
bash-4024 [001] .... 7507.712770: myprobe: (do_sys_open+0x0/0x210) dfd=0x2 filename=0x8241 flags=0x1b6 mode=0xffffffff
awk-5016 [000] .... 7508.140821: myprobe: (do_sys_open+0x0/0x210) dfd=0x2 filename=0x88000 flags=0x1 mode=0xffffffff
awk-5016 [000] d... 7508.140829: myretprobe: (do_syscall_64+0x6e/0x1a0 <- do_sys_open) arg1=0x3
awk-5016 [000] .... 7508.140851: myprobe: (do_sys_open+0x0/0x210) dfd=0x2 filename=0x88000 flags=0xe148 mode=0xffffffff
awk-5016 [000] d... 7508.140856: myretprobe: (do_syscall_64+0x6e/0x1a0 <- do_sys_open) arg1=0x3
awk-5016 [000] .... 7508.140908: myprobe: (do_sys_open+0x0/0x210) dfd=0x2 filename=0x88000 flags=0xe148 mode=0xffffffff
awk-5016 [000] d... 7508.140913: myretprobe: (do_syscall_64+0x6e/0x1a0 <- do_sys_open) arg1=0x3
awk-5016 [000] .... 7508.140962: myprobe: (do_sys_open+0x0/0x210) dfd=0x2 filename=0x88000 flags=0xe148 mode=0xffffffff
awk-5016 [000] d... 7508.140966: myretprobe: (do_syscall_64+0x6e/0x1a0 <- do_sys_open) arg1=0x3
awk-5016 [000] .... 7508.141351: myprobe: (do_sys_open+0x0/0x210) dfd=0x2 filename=0x88000 flags=0x768 mode=0xffffffff
awk-5016 [000] d... 7508.141357: myretprobe: (do_syscall_64+0x6e/0x1a0 <- do_sys_open) arg1=0x3
awk-5016 [000] .... 7508.141451: myprobe: (do_sys_open+0x0/0x210) dfd=0x2 filename=0x8000 flags=0x0 mode=0xffffffff
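Each line records one event hit. In return-probe lines, the <- symbol shows where the function returned to: "do_syscall_64+0x6e/0x1a0 <- do_sys_open" means the kernel returned from do_sys_open to do_syscall_64+0x6e.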
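Raw trace output gets long quickly, so a per-task summary is often easier to read. The sketch below (trace_event_counts is a hypothetical helper) counts events per task and event name. It assumes the column layout shown above, with the event name in the fifth whitespace-separated field, and would miscount tasks whose names contain spaces.

```shell
#!/bin/sh
# Print "TASK-PID EVENT COUNT" for trace text read from stdin.
# Hypothetical helper; assumes the trace column layout shown above.
trace_event_counts() {
    awk '
        /^#/ || NF < 5 { next }             # skip header and blank lines
        {
            task = $1                       # e.g. awk-5016
            event = $5                      # e.g. myprobe:
            sub(/:$/, "", event)            # strip the trailing colon
            count[task " " event]++
        }
        END {
            for (key in count) print key, count[key]
        }
    ' | LC_ALL=C sort
}

# Demonstration on three lines taken from the trace listing.
printf '%s\n' \
    'bash-4024 [001] .... 7507.712770: myprobe: (do_sys_open+0x0/0x210) dfd=0x2' \
    'awk-5016 [000] .... 7508.140821: myprobe: (do_sys_open+0x0/0x210) dfd=0x2' \
    'awk-5016 [000] d... 7508.140829: myretprobe: (do_syscall_64+0x6e/0x1a0 <- do_sys_open) arg1=0x3' |
    trace_event_counts
```

On a live system the input would be /sys/kernel/debug/tracing/trace.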
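Removing kprobe events
First disable the events:
echo 0 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
echo 0 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable
Then remove the probe definitions:
echo -:myprobe >> /sys/kernel/debug/tracing/kprobe_events
echo -:myretprobe >> /sys/kernel/debug/tracing/kprobe_events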
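Since an event has to be disabled before its definition is removed, the two steps can be wrapped together. The sketch below is one way to do that. kprobe_remove is a hypothetical helper, and taking the tracefs directory as its first parameter is a choice made here so the function can be tried against a scratch directory before running it as root on the real one.

```shell
#!/bin/sh
# kprobe_remove TRACEFS EVENT...
# Disable each event in the "kprobes" group, then delete its definition.
# Hypothetical helper; TRACEFS is normally /sys/kernel/debug/tracing.
kprobe_remove() {
    if [ "$#" -lt 2 ]; then
        echo "usage: kprobe_remove TRACEFS EVENT..." >&2
        return 1
    fi
    tracefs=$1
    shift
    for ev in "$@"; do
        # An enabled event must be disabled before it can be removed.
        echo 0 > "$tracefs/events/kprobes/$ev/enable" || return 1
        echo "-:$ev" >> "$tracefs/kprobe_events" || return 1
    done
}

# Usage (as root):
#   kprobe_remove /sys/kernel/debug/tracing myprobe myretprobe
```

The function stops at the first failing write, so a typo in an event name does not silently skip the remaining steps.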
4. Using a kernel module
Example code:
/*
 * NOTE: This example works on x86.
 * Here's a sample kernel module showing the use of kprobes to dump a
 * stack trace and selected registers when do_fork() is called.
 *
 * For more information on theory of operation of kprobes, see
 * Documentation/kprobes.txt
 *
 * You will see the trace data in /var/log/messages and on the console
 * whenever do_fork() is invoked to create a new process.
 *
 * On newer kernels do_fork() has been renamed (_do_fork(), later
 * kernel_clone()); set symbol_name to match the running kernel.
 */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>

/* For each probe you need to allocate a kprobe structure */
static struct kprobe kp = {
	.symbol_name = "do_fork",
};

/* kprobe pre_handler: called just before the probed instruction is executed */
static int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
#ifdef CONFIG_X86
	printk(KERN_INFO "pre_handler: p->addr = 0x%p, ip = %lx,"
			" flags = 0x%lx\n",
		p->addr, regs->ip, regs->flags);
#endif
#ifdef CONFIG_PPC
	printk(KERN_INFO "pre_handler: p->addr = 0x%p, nip = 0x%lx,"
			" msr = 0x%lx\n",
		p->addr, regs->nip, regs->msr);
#endif
#ifdef CONFIG_MIPS
	printk(KERN_INFO "pre_handler: p->addr = 0x%p, epc = 0x%lx,"
			" status = 0x%lx\n",
		p->addr, regs->cp0_epc, regs->cp0_status);
#endif
	/* A dump_stack() here will give a stack backtrace */
	return 0;
}

/* kprobe post_handler: called after the probed instruction is executed */
static void handler_post(struct kprobe *p, struct pt_regs *regs,
				unsigned long flags)
{
#ifdef CONFIG_X86
	printk(KERN_INFO "post_handler: p->addr = 0x%p, flags = 0x%lx\n",
		p->addr, regs->flags);
#endif
#ifdef CONFIG_PPC
	printk(KERN_INFO "post_handler: p->addr = 0x%p, msr = 0x%lx\n",
		p->addr, regs->msr);
#endif
#ifdef CONFIG_MIPS
	printk(KERN_INFO "post_handler: p->addr = 0x%p, status = 0x%lx\n",
		p->addr, regs->cp0_status);
#endif
}

/*
 * fault_handler: this is called if an exception is generated for any
 * instruction within the pre- or post-handler, or when Kprobes
 * single-steps the probed instruction.
 * Recent kernels have removed the fault_handler member from struct kprobe;
 * drop this handler and its assignment there.
 */
static int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
{
	printk(KERN_INFO "fault_handler: p->addr = 0x%p, trap #%d\n",
		p->addr, trapnr);
	/* Return 0 because we don't handle the fault. */
	return 0;
}

static int __init kprobe_init(void)
{
	int ret;

	kp.pre_handler = handler_pre;
	kp.post_handler = handler_post;
	kp.fault_handler = handler_fault;

	ret = register_kprobe(&kp);
	if (ret < 0) {
		printk(KERN_INFO "register_kprobe failed, returned %d\n", ret);
		return ret;
	}
	printk(KERN_INFO "Planted kprobe at %p\n", kp.addr);
	return 0;
}

static void __exit kprobe_exit(void)
{
	unregister_kprobe(&kp);
	printk(KERN_INFO "kprobe at %p unregistered\n", kp.addr);
}

module_init(kprobe_init)
module_exit(kprobe_exit)
MODULE_LICENSE("GPL");
Add a Makefile like the following (the module source is assumed to be saved as pr.c, matching obj-m):
obj-m := pr.o
CROSS_COMPILE=''
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
default:
	$(MAKE) -C $(KDIR) M=$(PWD) modules
clean:
	rm -rf *.o .*.cmd *.ko *.mod.c .tmp_versions
After loading pr.ko (for example with insmod), the output can be viewed with dmesg.
5. References
https://blog.csdn.net/luckyapple1028/article/details/52972315
Documentation/kprobes.txt
Documentation/trace/ftrace.txt