How is bpf_func_id generated?

Summary: How is bpf_func_id generated?

Author

pengdonglin137@163.com

Main text

This article uses minimal.bpf.c, one of the example programs in libbpf-bootstrap, as its running example.

The source of minimal.bpf.c is as follows:

minimal.bpf.c

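This function attaches a hook to the sys_enter_write tracepoint. The hook checks the pid of the process that reaches the tracepoint and, if it equals the pid of the current process (the minimal loader itself), prints a message. The corresponding user-space program is minimal.c, whose source is as follows: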

minimal.c

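It obtains the pid of the current process with getpid() and stores it in my_pid. Because my_pid is a global variable that is not given a non-zero initial value, it is placed in the .bss section, and once minimal_bpf__open() has returned it can be accessed as skel->bss->my_pid.

The build commands are: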

clang -g -O2 -target bpf -D__TARGET_ARCH_x86                  \
             -I.output -I../../libbpf/include/uapi -I../../vmlinux/x86/ -I/mnt/libbpf-bootstrap_compiled/blazesym/include -idirafter /usr/lib/llvm-17/lib/clang/17/include -idirafter /usr/local/include -idirafter /usr/include/x86_64-linux-gnu -idirafter /include -idirafter /usr/include               \
             -c minimal.bpf.c -o .output/minimal.tmp.bpf.o
/mnt/libbpf-bootstrap_compiled/examples/c/.output/bpftool/bootstrap/bpftool gen object .output/minimal.bpf.o .output/minimal.tmp.bpf.o
/mnt/libbpf-bootstrap_compiled/examples/c/.output/bpftool/bootstrap/bpftool gen skeleton .output/minimal.bpf.o > .output/minimal.skel.h
cc -g -Wall -I.output -I../../libbpf/include/uapi -I../../vmlinux/x86/ -I/mnt/libbpf-bootstrap_compiled/blazesym/include -c minimal.c -o .output/minimal.o
cc -g -Wall .output/minimal.o /mnt/libbpf-bootstrap_compiled/examples/c/.output/libbpf.a   -lelf -lz -o minimal

The contents of the minimal.skel.h file generated by the build above:

minimal.skel.h

Disassembling the intermediate file .output/minimal.bpf.o gives its BPF bytecode:

# llvm-objdump -S .output/minimal.bpf.o
.output/minimal.bpf.o:  file format ELF64-BPF
Disassembly of section tp/syscalls/sys_enter_write:
0000000000000000 handle_tp:
0:       85 00 00 00 0e 00 00 00 call 14
1:       77 00 00 00 20 00 00 00 r0 >>= 32
2:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
4:       61 11 00 00 00 00 00 00 r1 = *(u32 *)(r1 + 0)
5:       5d 01 05 00 00 00 00 00 if r1 != r0 goto +5 <LBB0_2>
6:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
8:       b7 02 00 00 1c 00 00 00 r2 = 28
9:       bf 03 00 00 00 00 00 00 r3 = r0
10:       85 00 00 00 06 00 00 00 call 6
0000000000000058 LBB0_2:
11:       b7 00 00 00 00 00 00 00 r0 = 0
12:       95 00 00 00 00 00 00 00 exit

As you can see, the calls to bpf_get_current_pid_tgid and bpf_printk have become call 14 and call 6 respectively in the disassembly. Where do these numbers come from?

The numbers come from .output/bpf/bpf_helper_defs.h

Excerpt from bpf_helper_defs.h

The sync-kernel.sh script in the libbpf project contains the commands that generate this file:

# Generate bpf_helper_defs.h and commit, if anything changed
# restore Linux tip to use bpf_doc.py
cd_to ${LINUX_REPO}
git checkout ${TIP_TAG}
# re-generate bpf_helper_defs.h
cd_to ${LIBBPF_REPO}
"${LINUX_ABS_DIR}/scripts/bpf_doc.py" --header                \
  --file include/uapi/linux/bpf.h > src/bpf_helper_defs.h
# if anything changed, commit it
helpers_changes=$(git status --porcelain src/bpf_helper_defs.h | wc -l)
if ((${helpers_changes} == 1)); then
  git add src/bpf_helper_defs.h
  git commit -s -m "sync: auto-generate latest BPF helpers
 
Latest changes to BPF helper definitions.
" -- src/bpf_helper_defs.h
fi

Once minimal is running, its bytecode has been loaded into the kernel, and bpftool can print the bytecode that is actually executed:

# bpftool prog
...
22: tracepoint  name handle_tp  tag 6a5dcef153b1001e  gpl
        loaded_at 2023-11-01T22:26:31+0800  uid 0
        xlated 104B  jited 73B  memlock 4096B  map_ids 8,9
        btf_id 19
# bpftool prog dump xlated id 22
int handle_tp(void * ctx):
; int pid = bpf_get_current_pid_tgid() >> 32;
   0: (85) call bpf_get_current_pid_tgid#200752
; int pid = bpf_get_current_pid_tgid() >> 32;
   1: (77) r0 >>= 32
; if (pid != my_pid)
   2: (18) r1 = map[id:8][0]+0
   4: (61) r1 = *(u32 *)(r1 +0)
; if (pid != my_pid)
   5: (5d) if r1 != r0 goto pc+5
; bpf_printk("BPF triggered from PID %d.\n", pid);
   6: (18) r1 = map[id:9][0]+0
   8: (b7) r2 = 28
   9: (bf) r3 = r0
  10: (85) call bpf_trace_printk#-83056
; }
  11: (b7) r0 = 0
  12: (95) exit

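While the program is being loaded, the kernel replaces the ids above with actual function calls. In the xlated dump the number after # is therefore no longer the helper id (14 or 6) but the rewritten immediate, which the kernel stores as the helper's address relative to __bpf_call_base.

Below is the log captured while tracing a run of minimal (viewed as plain text):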

libbpf: loading object 'minimal_bpf' from buffer
libbpf: elf: section(2) .symtab, size 216, link 1, flags 0, type=2
libbpf: elf: section(3) tp/syscalls/sys_enter_write, size 104, link 0, flags 6, type=1
libbpf: sec 'tp/syscalls/sys_enter_write': found program 'handle_tp' at insn offset 0 (0 bytes), code size 13 insns (104 bytes)
libbpf: elf: section(4) license, size 13, link 0, flags 3, type=1
libbpf: license of minimal_bpf is Dual BSD/GPL
libbpf: elf: section(5) .bss, size 4, link 0, flags 3, type=8
libbpf: elf: section(6) .rodata, size 28, link 0, flags 2, type=1
libbpf: elf: section(7) .reltp/syscalls/sys_enter_write, size 32, link 2, flags 40, type=9
libbpf: elf: section(8) .BTF, size 586, link 0, flags 0, type=1
libbpf: elf: section(9) .BTF.ext, size 160, link 0, flags 0, type=1
libbpf: looking for externs among 9 symbols...
libbpf: collected 0 externs total
libbpf: map 'minimal_.bss' (global data): at sec_idx 5, offset 0, flags 400.
libbpf: map 0 is "minimal_.bss"
libbpf: map 'minimal_.rodata' (global data): at sec_idx 6, offset 0, flags 80.
libbpf: map 1 is "minimal_.rodata"
libbpf: sec '.reltp/syscalls/sys_enter_write': collecting relocation for section(3) 'tp/syscalls/sys_enter_write'
libbpf: sec '.reltp/syscalls/sys_enter_write': relo #0: insn #2 against 'my_pid'
libbpf: prog 'handle_tp': found data map 0 (minimal_.bss, sec 5, off 0) for insn 2
libbpf: sec '.reltp/syscalls/sys_enter_write': relo #1: insn #6 against '.rodata'
libbpf: prog 'handle_tp': found data map 1 (minimal_.rodata, sec 6, off 0) for insn 6
libbpf: map 'minimal_.bss': created successfully, fd=4
libbpf: map 'minimal_.rodata': created successfully, fd=5
Successfully started! Please run `sudo cat /sys/kernel/debug/tracing/trace_pipe` to see output of the BPF programs.
...^C

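In short, every kernel function that a BPF program may call is turned into a unique number that is embedded in the BPF bytecode; when the program is loaded into the kernel, the bytecode is then patched according to the actual address of that kernel function.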


Below is some history of how bpf_func_id has been defined in the kernel:

At first the ids were maintained as a hand-written enumeration; see the following patch:

commit ffeedafbf0236f03aeb2e8db273b3e5ae5f5bc89
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Fri Jun 12 19:39:12 2015 -0700
    bpf: introduce current->pid, tgid, uid, gid, comm accessors
    eBPF programs attached to kprobes need to filter based on
    current->pid, uid and other fields, so introduce helper functions:
    u64 bpf_get_current_pid_tgid(void)
    Return: current->tgid << 32 | current->pid
    u64 bpf_get_current_uid_gid(void)
    Return: current_gid << 32 | current_uid
    bpf_get_current_comm(char *buf, int size_of_buf)
    stores current->comm into buf
    They can be used from the programs attached to TC as well to classify packets
    based on current task fields.
    Update tracex2 example to print histogram of write syscalls for each process
    instead of aggregated for all.
    Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

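Later, so that the actual function name could be shown when bytecode is printed, __BPF_FUNC_MAPPER was added, which makes it easy to obtain a function name from its enum value: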

commit ebb676daa1a340ccef25eb769aefc09b79c01f8a
Author: Thomas Graf <tgraf@suug.ch>
Date:   Thu Oct 27 11:23:51 2016 +0200
    bpf: Print function name in addition to function id
    The verifier currently prints raw function ids when printing CALL
    instructions or when complaining:
      5: (85) call 23
            unknown func 23
    print a meaningful function name instead:
      5: (85) call bpf_redirect#23
            unknown func bpf_redirect#23
    Moves the function documentation to a single comment and renames all
    helpers names in the list to conform to the bpf_ prefix notation so
    they can be greped in the kernel source.
    Signed-off-by: Thomas Graf <tgraf@suug.ch>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Later still, to make the mapping between kernel helper functions and their ids easy to see, the macro was redesigned once more:

commit 8a76145a2ec2a81dfe34d7ac42e8c242f095e8c8
Refs: v6.0-2740-g8a76145a2ec2
Author:     Andrii Nakryiko <andrii@kernel.org>
AuthorDate: Wed Oct 5 21:24:51 2022 -0700
Commit:     Alexei Starovoitov <ast@kernel.org>
CommitDate: Thu Oct 6 08:19:30 2022 -0700
    bpf: explicitly define BPF_FUNC_xxx integer values
    Historically enum bpf_func_id's BPF_FUNC_xxx enumerators relied on
    implicit sequential values being assigned by compiler. This is
    convenient, as new BPF helpers are always added at the very end, but it
    also has its downsides, some of them being:
      - with over 200 helpers now it's very hard to know what's each helper's ID,
        which is often important to know when working with BPF assembly (e.g.,
        by dumping raw bpf assembly instructions with llvm-objdump -d
        command). it's possible to work around this by looking into vmlinux.h,
        dumping /sys/btf/kernel/vmlinux, looking at libbpf-provided
        bpf_helper_defs.h, etc. But it always feels like an unnecessary step
        and one should be able to quickly figure this out from UAPI header.
      - when backporting and cherry-picking only some BPF helpers onto older
        kernels it's important to be able to skip some enum values for helpers
        that weren't backported, but preserve absolute integer IDs to keep BPF
        helper IDs stable so that BPF programs stay portable across upstream
        and backported kernels.
    While neither problem is insurmountable, they come up frequently enough
    and are annoying enough to warrant improving the situation. And for the
    backporting the problem can easily go unnoticed for a while, especially
    if backport is done with people not very familiar with BPF subsystem overall.
    Anyways, it's easy to fix this by making sure that __BPF_FUNC_MAPPER
    macro provides explicit helper IDs. Unfortunately that would potentially
    break existing users that use UAPI-exposed __BPF_FUNC_MAPPER and are
    expected to pass macro that accepts only symbolic helper identifier
    (e.g., map_lookup_elem for bpf_map_lookup_elem() helper).
    As such, we need to introduce a new macro (___BPF_FUNC_MAPPER) which
    would specify both identifier and integer ID, but in such a way as to
    allow existing __BPF_FUNC_MAPPER be expressed in terms of new
    ___BPF_FUNC_MAPPER macro. And that's what this patch is doing. To avoid
    duplication and allow __BPF_FUNC_MAPPER stay *exactly* the same,
    ___BPF_FUNC_MAPPER accepts arbitrary "context" arguments, which can be
    used to pass any extra macros, arguments, and whatnot. In our case we
    use this to pass original user-provided macro that expects single
    argument and __BPF_FUNC_MAPPER is using it's own three-argument
    __BPF_FUNC_MAPPER_APPLY intermediate macro to impedance-match new and
    old "callback" macros.
    Once we resolve this, we use new ___BPF_FUNC_MAPPER to define enum
    bpf_func_id with explicit values. The other users of __BPF_FUNC_MAPPER
    in kernel (namely in kernel/bpf/disasm.c) are kept exactly the same both
    as demonstration that backwards compat works, but also to avoid
    unnecessary code churn.
    Note that new ___BPF_FUNC_MAPPER() doesn't forcefully insert comma
    between values, as that might not be appropriate in all possible cases
    where ___BPF_FUNC_MAPPER might be used by users. This doesn't reduce
    usability, as it's trivial to insert that comma inside "callback" macro.
    To validate all the manually specified IDs are exactly right, we used
    BTF to compare before and after values:
      $ bpftool btf dump file ~/linux-build/default/vmlinux | rg bpf_func_id -A 211 > after.txt
      $ git stash # stach UAPI changes
      $ make -j90
      ... re-building kernel without UAPI changes ...
      $ bpftool btf dump file ~/linux-build/default/vmlinux | rg bpf_func_id -A 211 > before.txt
      $ diff -u before.txt after.txt
      --- before.txt  2022-10-05 10:48:18.119195916 -0700
      +++ after.txt   2022-10-05 10:46:49.446615025 -0700
      @@ -1,4 +1,4 @@
      -[14576] ENUM 'bpf_func_id' encoding=UNSIGNED size=4 vlen=211
      +[9560] ENUM 'bpf_func_id' encoding=UNSIGNED size=4 vlen=211
              'BPF_FUNC_unspec' val=0
              'BPF_FUNC_map_lookup_elem' val=1
              'BPF_FUNC_map_update_elem' val=2
    As can be seen from diff above, the only thing that changed was resulting BTF
    type ID of ENUM bpf_func_id, not any of the enumerators, their names or integer
    values.
    The only other place that needed fixing was scripts/bpf_doc.py used to generate
    man pages and bpf_helper_defs.h header for libbpf and selftests. That script is
    tightly-coupled to exact shape of ___BPF_FUNC_MAPPER macro definition, so had
    to be trivially adapted.
    Cc: Quentin Monnet <quentin@isovalent.com>
    Reported-by: Andrea Terzolo <andrea.terzolo@polito.it>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Quentin Monnet <quentin@isovalent.com>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Link: https://lore.kernel.org/r/20221006042452.2089843-1-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

The end.
