Author
pengdonglin137@163.com
Body
Take minimal.bpf.c, one of the example programs in libbpf-bootstrap. Its source is shown below:
minimal.bpf.c
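This program attaches a hook to the sys_enter_write tracepoint. The hook checks the PID of whichever process reaches the tracepoint and prints a message only if that PID equals my_pid, the PID of the process that loaded the program. The corresponding user-space program is minimal.c, whose source follows: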
minimal.c
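It calls getpid() to obtain the PID of the current process and stores it in my_pid. Because my_pid is a global variable with no non-zero initial value, it is placed in the .bss section; once minimal_bpf__open() returns, the variable can be accessed as skel->bss->my_pid.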
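The PID filter described above can be modelled in plain user-space C. This is only a sketch, not the BPF program itself: pack_pid_tgid and should_print are hypothetical names introduced here, my_pid is an ordinary global standing in for the .bss variable, and the packing assumes the documented return value of bpf_get_current_pid_tgid (current->tgid << 32 | current->pid).

```c
#include <stdint.h>

/* Model of the value returned by bpf_get_current_pid_tgid():
 * tgid (what user space calls the PID) in the upper 32 bits,
 * kernel pid (the thread ID) in the lower 32 bits. */
static uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
	return ((uint64_t)tgid << 32) | pid;
}

/* Stand-in for the global my_pid that lives in the BPF object's .bss. */
static int my_pid;

/* Mirrors the check in handle_tp: report only for the loader process. */
static int should_print(uint64_t pid_tgid)
{
	int pid = (int)(pid_tgid >> 32);

	return pid == my_pid;
}
```

Shifting right by 32 keeps the tgid, so every thread of the loader process passes the filter, while a different process whose thread ID happens to equal my_pid does not.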
The build commands are:
clang -g -O2 -target bpf -D__TARGET_ARCH_x86 \
  -I.output -I../../libbpf/include/uapi -I../../vmlinux/x86/ \
  -I/mnt/libbpf-bootstrap_compiled/blazesym/include \
  -idirafter /usr/lib/llvm-17/lib/clang/17/include -idirafter /usr/local/include \
  -idirafter /usr/include/x86_64-linux-gnu -idirafter /include -idirafter /usr/include \
  -c minimal.bpf.c -o .output/minimal.tmp.bpf.o
/mnt/libbpf-bootstrap_compiled/examples/c/.output/bpftool/bootstrap/bpftool gen object .output/minimal.bpf.o .output/minimal.tmp.bpf.o
/mnt/libbpf-bootstrap_compiled/examples/c/.output/bpftool/bootstrap/bpftool gen skeleton .output/minimal.bpf.o > .output/minimal.skel.h
cc -g -Wall -I.output -I../../libbpf/include/uapi -I../../vmlinux/x86/ \
  -I/mnt/libbpf-bootstrap_compiled/blazesym/include -c minimal.c -o .output/minimal.o
cc -g -Wall .output/minimal.o /mnt/libbpf-bootstrap_compiled/examples/c/.output/libbpf.a -lelf -lz -o minimal
The minimal.skel.h file generated by the build above contains:
minimal.skel.h
Disassembling the intermediate file .output/minimal.bpf.o gives its BPF bytecode:
# llvm-objdump -S .output/minimal.bpf.o

.output/minimal.bpf.o:  file format ELF64-BPF

Disassembly of section tp/syscalls/sys_enter_write:
0000000000000000 handle_tp:
       0:  85 00 00 00 0e 00 00 00                          call 14
       1:  77 00 00 00 20 00 00 00                          r0 >>= 32
       2:  18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r1 = 0 ll
       4:  61 11 00 00 00 00 00 00                          r1 = *(u32 *)(r1 + 0)
       5:  5d 01 05 00 00 00 00 00                          if r1 != r0 goto +5 <LBB0_2>
       6:  18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00  r1 = 0 ll
       8:  b7 02 00 00 1c 00 00 00                          r2 = 28
       9:  bf 03 00 00 00 00 00 00                          r3 = r0
      10:  85 00 00 00 06 00 00 00                          call 6

0000000000000058 LBB0_2:
      11:  b7 00 00 00 00 00 00 00                          r0 = 0
      12:  95 00 00 00 00 00 00 00                          exit
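The raw bytes in the listing can also be decoded by hand, which makes it clear that the number after call is just the 32-bit immediate of the instruction. The following is a minimal sketch under a few assumptions: the object is little-endian (as on x86), demo_insn, decode_insn and helper_id_of_call are hypothetical names introduced for illustration, and the field layout follows struct bpf_insn from the kernel UAPI header.

```c
#include <stdint.h>

#define DEMO_OP_CALL 0x85 /* BPF_JMP | BPF_CALL */

/* Unpacked view of one 8-byte BPF instruction. */
struct demo_insn {
	uint8_t code;    /* opcode */
	uint8_t dst_reg; /* low nibble of byte 1 */
	uint8_t src_reg; /* high nibble of byte 1 */
	int16_t off;     /* bytes 2..3, little-endian */
	int32_t imm;     /* bytes 4..7, little-endian */
};

static struct demo_insn decode_insn(const uint8_t raw[8])
{
	struct demo_insn insn;

	insn.code = raw[0];
	insn.dst_reg = raw[1] & 0x0f;
	insn.src_reg = raw[1] >> 4;
	insn.off = (int16_t)(uint16_t)(raw[2] | (raw[3] << 8));
	insn.imm = (int32_t)((uint32_t)raw[4] | ((uint32_t)raw[5] << 8) |
			     ((uint32_t)raw[6] << 16) | ((uint32_t)raw[7] << 24));
	return insn;
}

/* Returns the helper ID carried by a helper call, or -1 for anything else.
 * src_reg == 0 distinguishes helper calls from other kinds of call. */
static int32_t helper_id_of_call(const uint8_t raw[8])
{
	struct demo_insn insn = decode_insn(raw);

	if (insn.code != DEMO_OP_CALL || insn.src_reg != 0)
		return -1;
	return insn.imm;
}
```

Feeding it the bytes of instruction 0 (85 00 00 00 0e 00 00 00) yields 14, and the bytes of instruction 10 yield 6.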
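As the listing shows, the calls to bpf_get_current_pid_tgid and bpf_printk (a macro that wraps the bpf_trace_printk helper) were compiled to call 14 and call 6 respectively. Where do these numbers come from?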
They come from .output/bpf/bpf_helper_defs.h:
Excerpt from bpf_helper_defs.h
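The mechanism is that each helper is declared as a function pointer whose value is the helper ID itself. The sketch below imitates that style on a normal host; the demo_ names and the HELPER_ID macro are hypothetical stand-ins introduced here, not the real declarations, and the pointers are only inspected, never called, because address 14 is not code outside of BPF.

```c
#include <stdint.h>

/* Declarations written in the style of bpf_helper_defs.h: a function
 * pointer initialised from a small integer. When clang targets BPF, a
 * call through such a pointer is emitted as "call <integer>". */
static uint64_t (*demo_get_current_pid_tgid)(void) = (void *)14;
static long (*demo_trace_printk)(const char *fmt, uint32_t fmt_size, ...) = (void *)6;

/* Recover the integer hidden in the pointer. Converting a function
 * pointer to an integer is implementation-defined, but it behaves as
 * expected with GCC and Clang on common platforms. */
#define HELPER_ID(fn) ((uintptr_t)(fn))
```

With these definitions, HELPER_ID(demo_get_current_pid_tgid) evaluates to 14 and HELPER_ID(demo_trace_printk) to 6, matching the immediates in the disassembly.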
The sync-kernel.sh script in the libbpf project contains the commands that generate this file:
# Generate bpf_helper_defs.h and commit, if anything changed

# restore Linux tip to use bpf_doc.py
cd_to ${LINUX_REPO}
git checkout ${TIP_TAG}

# re-generate bpf_helper_defs.h
cd_to ${LIBBPF_REPO}
"${LINUX_ABS_DIR}/scripts/bpf_doc.py" --header \
	--file include/uapi/linux/bpf.h > src/bpf_helper_defs.h
# if anything changed, commit it
helpers_changes=$(git status --porcelain src/bpf_helper_defs.h | wc -l)
if ((${helpers_changes} == 1)); then
	git add src/bpf_helper_defs.h
	git commit -s -m "sync: auto-generate latest BPF helpers

Latest changes to BPF helper definitions.
" -- src/bpf_helper_defs.h
fi
Once minimal is running, its bytecode has been loaded into the kernel, and bpftool can print the bytecode that actually runs:
# bpftool prog
...
22: tracepoint  name handle_tp  tag 6a5dcef153b1001e  gpl
	loaded_at 2023-11-01T22:26:31+0800  uid 0
	xlated 104B  jited 73B  memlock 4096B  map_ids 8,9
	btf_id 19

# bpftool prog dump xlated id 22
int handle_tp(void * ctx):
; int pid = bpf_get_current_pid_tgid() >> 32;
   0: (85) call bpf_get_current_pid_tgid#200752
; int pid = bpf_get_current_pid_tgid() >> 32;
   1: (77) r0 >>= 32
; if (pid != my_pid)
   2: (18) r1 = map[id:8][0]+0
   4: (61) r1 = *(u32 *)(r1 +0)
; if (pid != my_pid)
   5: (5d) if r1 != r0 goto pc+5
; bpf_printk("BPF triggered from PID %d.\n", pid);
   6: (18) r1 = map[id:9][0]+0
   8: (b7) r2 = 28
   9: (bf) r3 = r0
  10: (85) call bpf_trace_printk#-83056
; }
  11: (b7) r0 = 0
  12: (95) exit
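While the program is being loaded, the kernel replaces the helper IDs above with references to the actual functions, which is why the call instructions no longer carry 14 and 6.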
Below is the libbpf debug log captured while minimal runs (open it in a text editor):
libbpf: loading object 'minimal_bpf' from buffer
libbpf: elf: section(2) .symtab, size 216, link 1, flags 0, type=2
libbpf: elf: section(3) tp/syscalls/sys_enter_write, size 104, link 0, flags 6, type=1
libbpf: sec 'tp/syscalls/sys_enter_write': found program 'handle_tp' at insn offset 0 (0 bytes), code size 13 insns (104 bytes)
libbpf: elf: section(4) license, size 13, link 0, flags 3, type=1
libbpf: license of minimal_bpf is Dual BSD/GPL
libbpf: elf: section(5) .bss, size 4, link 0, flags 3, type=8
libbpf: elf: section(6) .rodata, size 28, link 0, flags 2, type=1
libbpf: elf: section(7) .reltp/syscalls/sys_enter_write, size 32, link 2, flags 40, type=9
libbpf: elf: section(8) .BTF, size 586, link 0, flags 0, type=1
libbpf: elf: section(9) .BTF.ext, size 160, link 0, flags 0, type=1
libbpf: looking for externs among 9 symbols...
libbpf: collected 0 externs total
libbpf: map 'minimal_.bss' (global data): at sec_idx 5, offset 0, flags 400.
libbpf: map 0 is "minimal_.bss"
libbpf: map 'minimal_.rodata' (global data): at sec_idx 6, offset 0, flags 80.
libbpf: map 1 is "minimal_.rodata"
libbpf: sec '.reltp/syscalls/sys_enter_write': collecting relocation for section(3) 'tp/syscalls/sys_enter_write'
libbpf: sec '.reltp/syscalls/sys_enter_write': relo #0: insn #2 against 'my_pid'
libbpf: prog 'handle_tp': found data map 0 (minimal_.bss, sec 5, off 0) for insn 2
libbpf: sec '.reltp/syscalls/sys_enter_write': relo #1: insn #6 against '.rodata'
libbpf: prog 'handle_tp': found data map 1 (minimal_.rodata, sec 6, off 0) for insn 6
libbpf: map 'minimal_.bss': created successfully, fd=4
libbpf: map 'minimal_.rodata': created successfully, fd=5
Successfully started! Please run `sudo cat /sys/kernel/debug/tracing/trace_pipe` to see output of the BPF programs.
...^C
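In short, every kernel function that a BPF program is allowed to call is assigned a unique number, and that number is embedded in the BPF bytecode. When the program is loaded, the kernel patches the bytecode according to the actual address of the kernel function.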
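The patching step can be illustrated with a small user-space simulation. It is a simplified model, not kernel code: every sim_ name is hypothetical, the helper bodies are dummies, and the choice to store the target as a 32-bit offset from a base function reflects my reading of the kernel, where the verifier appears to write the helper address relative to __bpf_call_base into imm. That would also explain why the xlated dump shows values such as 200752 and -83056 after the # instead of 14 and 6.

```c
#include <stdint.h>

#define SIM_LOADER_PID 1234u /* arbitrary value returned by the dummy helper */

enum sim_func_id {
	SIM_FUNC_unspec = 0,
	SIM_FUNC_trace_printk = 6,
	SIM_FUNC_get_current_pid_tgid = 14,
	SIM_FUNC_MAX_ID,
};

typedef uint64_t (*sim_helper_fn)(void);

/* Reference point for relative call targets (stand-in for __bpf_call_base). */
static uint64_t sim_call_base(void) { return 0; }

/* Dummy helper implementations. */
static uint64_t sim_trace_printk(void) { return 0; }
static uint64_t sim_get_current_pid_tgid(void)
{
	return ((uint64_t)SIM_LOADER_PID << 32) | SIM_LOADER_PID;
}

/* ID -> implementation table; unlisted IDs stay NULL. */
static const sim_helper_fn sim_helpers[SIM_FUNC_MAX_ID] = {
	[SIM_FUNC_trace_printk] = sim_trace_printk,
	[SIM_FUNC_get_current_pid_tgid] = sim_get_current_pid_tgid,
};

/* Load time: replace the helper ID in *imm with an offset from the base.
 * Returns -1 for an unknown ID, which models the program being rejected. */
static int sim_fixup_call(int32_t *imm)
{
	sim_helper_fn fn;

	if (*imm <= 0 || *imm >= SIM_FUNC_MAX_ID || !sim_helpers[*imm])
		return -1;
	fn = sim_helpers[*imm];
	*imm = (int32_t)((intptr_t)fn - (intptr_t)sim_call_base);
	return 0;
}

/* Run time: rebuild the target from base + offset and call it. */
static uint64_t sim_run_call(int32_t imm)
{
	sim_helper_fn fn = (sim_helper_fn)((intptr_t)sim_call_base + imm);

	return fn();
}
```

Starting from imm = 14, sim_fixup_call rewrites the field, and sim_run_call on the rewritten value reaches sim_get_current_pid_tgid. An ID with no table entry, such as 9, is refused. The integer/function-pointer conversions rely on GCC/Clang behaviour on common platforms.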
Some history of how bpf_func_id has been defined in the kernel:
At first the IDs were simply enumerated by hand; see the following patch:
commit ffeedafbf0236f03aeb2e8db273b3e5ae5f5bc89
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Fri Jun 12 19:39:12 2015 -0700

    bpf: introduce current->pid, tgid, uid, gid, comm accessors

    eBPF programs attached to kprobes need to filter based on
    current->pid, uid and other fields, so introduce helper functions:

    u64 bpf_get_current_pid_tgid(void)
    Return: current->tgid << 32 | current->pid

    u64 bpf_get_current_uid_gid(void)
    Return: current_gid << 32 | current_uid

    bpf_get_current_comm(char *buf, int size_of_buf)
    stores current->comm into buf

    They can be used from the programs attached to TC as well to classify
    packets based on current task fields.

    Update tracex2 example to print histogram of write syscalls for each
    process instead of aggregated for all.

    Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
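Later, so that the real function name could be shown when bytecode is dumped, __BPF_FUNC_MAPPER was added, which makes it easy to obtain the function name from the enum value: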
commit ebb676daa1a340ccef25eb769aefc09b79c01f8a
Author: Thomas Graf <tgraf@suug.ch>
Date:   Thu Oct 27 11:23:51 2016 +0200

    bpf: Print function name in addition to function id

    The verifier currently prints raw function ids when printing CALL
    instructions or when complaining:

            5: (85) call 23
            unknown func 23

    print a meaningful function name instead:

            5: (85) call bpf_redirect#23
            unknown func bpf_redirect#23

    Moves the function documentation to a single comment and renames all
    helpers names in the list to conform to the bpf_ prefix notation so
    they can be greped in the kernel source.

    Signed-off-by: Thomas Graf <tgraf@suug.ch>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
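The technique behind __BPF_FUNC_MAPPER is an X-macro: one list of names is expanded once into the enum and once into a name table, so the two cannot drift apart. Here is a reduced sketch with three entries; all DEMO_/demo_ identifiers are hypothetical, and the real list in the kernel is of course much longer.

```c
#include <stddef.h>

/* Three-entry stand-in for the helper list. */
#define DEMO_FUNC_MAPPER(FN) \
	FN(unspec),          \
	FN(map_lookup_elem), \
	FN(map_update_elem)

/* First expansion: enumerators with implicit sequential values 0, 1, 2. */
#define DEMO_ENUM_FN(x) DEMO_FUNC_##x
enum demo_func_id {
	DEMO_FUNC_MAPPER(DEMO_ENUM_FN),
	DEMO_MAX_FUNC_ID,
};
#undef DEMO_ENUM_FN

/* Second expansion: a table of names indexed by the same enumerators. */
#define DEMO_NAME_FN(x) [DEMO_FUNC_##x] = "bpf_" #x
static const char * const demo_func_name[] = {
	DEMO_FUNC_MAPPER(DEMO_NAME_FN),
};
#undef DEMO_NAME_FN

/* Map an ID back to a printable name, as a disassembler would. */
static const char *demo_func_get_name(int id)
{
	if (id >= 0 && id < DEMO_MAX_FUNC_ID && demo_func_name[id])
		return demo_func_name[id];
	return "unknown";
}
```

A caller can then format a call instruction as name#id, for example by combining demo_func_get_name(1) with the ID 1 to obtain bpf_map_lookup_elem#1.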
Later still, to make the mapping between kernel helper functions and their IDs easy to see, the macro was redesigned once more:
commit 8a76145a2ec2a81dfe34d7ac42e8c242f095e8c8
Refs: v6.0-2740-g8a76145a2ec2
Author:     Andrii Nakryiko <andrii@kernel.org>
AuthorDate: Wed Oct 5 21:24:51 2022 -0700
Commit:     Alexei Starovoitov <ast@kernel.org>
CommitDate: Thu Oct 6 08:19:30 2022 -0700

    bpf: explicitly define BPF_FUNC_xxx integer values

    Historically enum bpf_func_id's BPF_FUNC_xxx enumerators relied on
    implicit sequential values being assigned by compiler. This is
    convenient, as new BPF helpers are always added at the very end, but it
    also has its downsides, some of them being:

      - with over 200 helpers now it's very hard to know what's each
        helper's ID, which is often important to know when working with BPF
        assembly (e.g., by dumping raw bpf assembly instructions with
        llvm-objdump -d command). it's possible to work around this by
        looking into vmlinux.h, dumping /sys/btf/kernel/vmlinux, looking at
        libbpf-provided bpf_helper_defs.h, etc. But it always feels like an
        unnecessary step and one should be able to quickly figure this out
        from UAPI header.

      - when backporting and cherry-picking only some BPF helpers onto
        older kernels it's important to be able to skip some enum values
        for helpers that weren't backported, but preserve absolute integer
        IDs to keep BPF helper IDs stable so that BPF programs stay
        portable across upstream and backported kernels.

    While neither problem is insurmountable, they come up frequently enough
    and are annoying enough to warrant improving the situation. And for the
    backporting the problem can easily go unnoticed for a while, especially
    if backport is done with people not very familiar with BPF subsystem
    overall.

    Anyways, it's easy to fix this by making sure that __BPF_FUNC_MAPPER
    macro provides explicit helper IDs. Unfortunately that would
    potentially break existing users that use UAPI-exposed
    __BPF_FUNC_MAPPER and are expected to pass macro that accepts only
    symbolic helper identifier (e.g., map_lookup_elem for
    bpf_map_lookup_elem() helper).

    As such, we need to introduce a new macro (___BPF_FUNC_MAPPER) which
    would specify both identifier and integer ID, but in such a way as to
    allow existing __BPF_FUNC_MAPPER be expressed in terms of new
    ___BPF_FUNC_MAPPER macro. And that's what this patch is doing. To avoid
    duplication and allow __BPF_FUNC_MAPPER stay *exactly* the same,
    ___BPF_FUNC_MAPPER accepts arbitrary "context" arguments, which can be
    used to pass any extra macros, arguments, and whatnot. In our case we
    use this to pass original user-provided macro that expects single
    argument and __BPF_FUNC_MAPPER is using it's own three-argument
    __BPF_FUNC_MAPPER_APPLY intermediate macro to impedance-match new and
    old "callback" macros.

    Once we resolve this, we use new ___BPF_FUNC_MAPPER to define enum
    bpf_func_id with explicit values. The other users of __BPF_FUNC_MAPPER
    in kernel (namely in kernel/bpf/disasm.c) are kept exactly the same
    both as demonstration that backwards compat works, but also to avoid
    unnecessary code churn.

    Note that new ___BPF_FUNC_MAPPER() doesn't forcefully insert comma
    between values, as that might not be appropriate in all possible cases
    where ___BPF_FUNC_MAPPER might be used by users. This doesn't reduce
    usability, as it's trivial to insert that comma inside "callback"
    macro.

    To validate all the manually specified IDs are exactly right, we used
    BTF to compare before and after values:

      $ bpftool btf dump file ~/linux-build/default/vmlinux | rg bpf_func_id -A 211 > after.txt
      $ git stash # stach UAPI changes
      $ make -j90
      ... re-building kernel without UAPI changes ...
      $ bpftool btf dump file ~/linux-build/default/vmlinux | rg bpf_func_id -A 211 > before.txt
      $ diff -u before.txt after.txt
      --- before.txt  2022-10-05 10:48:18.119195916 -0700
      +++ after.txt   2022-10-05 10:46:49.446615025 -0700
      @@ -1,4 +1,4 @@
      -[14576] ENUM 'bpf_func_id' encoding=UNSIGNED size=4 vlen=211
      +[9560] ENUM 'bpf_func_id' encoding=UNSIGNED size=4 vlen=211
              'BPF_FUNC_unspec' val=0
              'BPF_FUNC_map_lookup_elem' val=1
              'BPF_FUNC_map_update_elem' val=2

    As can be seen from diff above, the only thing that changed was
    resulting BTF type ID of ENUM bpf_func_id, not any of the enumerators,
    their names or integer values.

    The only other place that needed fixing was scripts/bpf_doc.py used to
    generate man pages and bpf_helper_defs.h header for libbpf and
    selftests. That script is tightly-coupled to exact shape of
    ___BPF_FUNC_MAPPER macro definition, so had to be trivially adapted.

    Cc: Quentin Monnet <quentin@isovalent.com>
    Reported-by: Andrea Terzolo <andrea.terzolo@polito.it>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Quentin Monnet <quentin@isovalent.com>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Link: https://lore.kernel.org/r/20221006042452.2089843-1-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
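The macro arrangement that the commit message describes can be reproduced in miniature. This is a sketch of the idea rather than the kernel header: the EX_/ex_ identifiers are hypothetical, only three helpers are listed, and the named variadic parameter together with ", ##ctx" is a GNU C extension (accepted by GCC and Clang), used here because the description above relies on passing optional context arguments.

```c
#include <stddef.h>

/* New-style list: every entry carries its explicit ID and forwards any
 * extra "context" arguments. No comma is inserted between entries. */
#define EX_MAPPER(FN, ctx...)               \
	FN(unspec, 0, ##ctx)                \
	FN(map_lookup_elem, 1, ##ctx)       \
	FN(get_current_pid_tgid, 14, ##ctx)

/* Enum with explicit values; the gap between 1 and 14 is preserved,
 * which is what a partial backport needs. */
#define EX_ENUM_FN(name, value) EX_FUNC_##name = value,
enum ex_func_id {
	EX_MAPPER(EX_ENUM_FN)
	EX_FUNC_MAX_ID,
};

/* Old single-argument mapper expressed in terms of the new one: the
 * user's callback travels as context and is applied to the name only. */
#define EX_MAPPER_APPLY(name, value, FN) FN(name),
#define EX_OLD_MAPPER(FN) EX_MAPPER(EX_MAPPER_APPLY, FN)

/* A legacy user of the old mapper keeps working unchanged. */
#define EX_NAME_FN(name) [EX_FUNC_##name] = "bpf_" #name
static const char * const ex_func_name[EX_FUNC_MAX_ID] = {
	EX_OLD_MAPPER(EX_NAME_FN)
};

/* Returns NULL for IDs that are out of range or fall into a gap. */
static const char *ex_func_get_name(int id)
{
	if (id < 0 || id >= EX_FUNC_MAX_ID)
		return NULL;
	return ex_func_name[id];
}
```

Here EX_FUNC_get_current_pid_tgid is 14 regardless of how many entries precede it, and ex_func_get_name(5) returns NULL because no helper with that ID was listed.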
End.