How is bpf_func_id generated?

Summary: How is bpf_func_id generated?

Author

pengdonglin137@163.com

Main text

We take minimal.bpf.c, one of the example programs in libbpf-bootstrap, as the example.

Here is the source of minimal.bpf.c:

minimal.bpf.c

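This program attaches a hook to the sys_enter_write tracepoint. The hook checks the PID of the process that hit the tracepoint and, if it equals the PID of the loader process, prints a message. The corresponding user-space program is minimal.c, whose source follows: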

minimal.c

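The user-space program obtains its own PID with getpid() and stores it in my_pid. Because my_pid is a global variable without a non-zero initial value, it is placed in the .bss section, and once minimal_bpf__open() has returned it can be accessed as skel->bss->my_pid.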

The build commands are:

clang -g -O2 -target bpf -D__TARGET_ARCH_x86                  \
             -I.output -I../../libbpf/include/uapi -I../../vmlinux/x86/ -I/mnt/libbpf-bootstrap_compiled/blazesym/include -idirafter /usr/lib/llvm-17/lib/clang/17/include -idirafter /usr/local/include -idirafter /usr/include/x86_64-linux-gnu -idirafter /include -idirafter /usr/include               \
             -c minimal.bpf.c -o .output/minimal.tmp.bpf.o
/mnt/libbpf-bootstrap_compiled/examples/c/.output/bpftool/bootstrap/bpftool gen object .output/minimal.bpf.o .output/minimal.tmp.bpf.o
/mnt/libbpf-bootstrap_compiled/examples/c/.output/bpftool/bootstrap/bpftool gen skeleton .output/minimal.bpf.o > .output/minimal.skel.h
cc -g -Wall -I.output -I../../libbpf/include/uapi -I../../vmlinux/x86/ -I/mnt/libbpf-bootstrap_compiled/blazesym/include -c minimal.c -o .output/minimal.o
cc -g -Wall .output/minimal.o /mnt/libbpf-bootstrap_compiled/examples/c/.output/libbpf.a   -lelf -lz -o minimal

The minimal.skel.h file generated by the build above contains:

minimal.skel.h

Disassembling the intermediate file .output/minimal.bpf.o gives its BPF bytecode:

# llvm-objdump -S .output/minimal.bpf.o
.output/minimal.bpf.o:  file format ELF64-BPF
Disassembly of section tp/syscalls/sys_enter_write:
0000000000000000 handle_tp:
0:       85 00 00 00 0e 00 00 00 call 14
1:       77 00 00 00 20 00 00 00 r0 >>= 32
2:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
4:       61 11 00 00 00 00 00 00 r1 = *(u32 *)(r1 + 0)
5:       5d 01 05 00 00 00 00 00 if r1 != r0 goto +5 <LBB0_2>
6:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
8:       b7 02 00 00 1c 00 00 00 r2 = 28
9:       bf 03 00 00 00 00 00 00 r3 = r0
10:       85 00 00 00 06 00 00 00 call 6
0000000000000058 LBB0_2:
11:       b7 00 00 00 00 00 00 00 r0 = 0
12:       95 00 00 00 00 00 00 00 exit

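As the listing shows, the calls to bpf_get_current_pid_tgid and bpf_printk (a macro that expands to a call to the bpf_trace_printk helper) have become call 14 and call 6 after compilation. Where do these numbers come from?

They come from .output/bpf/bpf_helper_defs.h.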

Excerpt from bpf_helper_defs.h

The sync-kernel.sh script in the libbpf project contains the commands that generate this file:

# Generate bpf_helper_defs.h and commit, if anything changed
# restore Linux tip to use bpf_doc.py
cd_to ${LINUX_REPO}
git checkout ${TIP_TAG}
# re-generate bpf_helper_defs.h
cd_to ${LIBBPF_REPO}
"${LINUX_ABS_DIR}/scripts/bpf_doc.py" --header                \
  --file include/uapi/linux/bpf.h > src/bpf_helper_defs.h
# if anything changed, commit it
helpers_changes=$(git status --porcelain src/bpf_helper_defs.h | wc -l)
if ((${helpers_changes} == 1)); then
  git add src/bpf_helper_defs.h
  git commit -s -m "sync: auto-generate latest BPF helpers
 
Latest changes to BPF helper definitions.
" -- src/bpf_helper_defs.h
fi

Once minimal is running, the bytecode has been loaded into the kernel, and bpftool can print the bytecode that is actually in use:

# bpftool prog
...
22: tracepoint  name handle_tp  tag 6a5dcef153b1001e  gpl
        loaded_at 2023-11-01T22:26:31+0800  uid 0
        xlated 104B  jited 73B  memlock 4096B  map_ids 8,9
        btf_id 19
# bpftool prog dump xlated id 22
int handle_tp(void * ctx):
; int pid = bpf_get_current_pid_tgid() >> 32;
   0: (85) call bpf_get_current_pid_tgid#200752
; int pid = bpf_get_current_pid_tgid() >> 32;
   1: (77) r0 >>= 32
; if (pid != my_pid)
   2: (18) r1 = map[id:8][0]+0
   4: (61) r1 = *(u32 *)(r1 +0)
; if (pid != my_pid)
   5: (5d) if r1 != r0 goto pc+5
; bpf_printk("BPF triggered from PID %d.\n", pid);
   6: (18) r1 = map[id:9][0]+0
   8: (b7) r2 = 28
   9: (bf) r3 = r0
  10: (85) call bpf_trace_printk#-83056
; }
  11: (b7) r0 = 0
  12: (95) exit

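While loading the program, the kernel replaces these IDs with actual function calls: the verifier rewrites the immediate of each helper call to the helper's address relative to __bpf_call_base, which is why the dump above shows values such as #200752 and #-83056 instead of 14 and 6.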

Below is the libbpf log captured while running minimal:

libbpf: loading object 'minimal_bpf' from buffer
libbpf: elf: section(2) .symtab, size 216, link 1, flags 0, type=2
libbpf: elf: section(3) tp/syscalls/sys_enter_write, size 104, link 0, flags 6, type=1
libbpf: sec 'tp/syscalls/sys_enter_write': found program 'handle_tp' at insn offset 0 (0 bytes), code size 13 insns (104 bytes)
libbpf: elf: section(4) license, size 13, link 0, flags 3, type=1
libbpf: license of minimal_bpf is Dual BSD/GPL
libbpf: elf: section(5) .bss, size 4, link 0, flags 3, type=8
libbpf: elf: section(6) .rodata, size 28, link 0, flags 2, type=1
libbpf: elf: section(7) .reltp/syscalls/sys_enter_write, size 32, link 2, flags 40, type=9
libbpf: elf: section(8) .BTF, size 586, link 0, flags 0, type=1
libbpf: elf: section(9) .BTF.ext, size 160, link 0, flags 0, type=1
libbpf: looking for externs among 9 symbols...
libbpf: collected 0 externs total
libbpf: map 'minimal_.bss' (global data): at sec_idx 5, offset 0, flags 400.
libbpf: map 0 is "minimal_.bss"
libbpf: map 'minimal_.rodata' (global data): at sec_idx 6, offset 0, flags 80.
libbpf: map 1 is "minimal_.rodata"
libbpf: sec '.reltp/syscalls/sys_enter_write': collecting relocation for section(3) 'tp/syscalls/sys_enter_write'
libbpf: sec '.reltp/syscalls/sys_enter_write': relo #0: insn #2 against 'my_pid'
libbpf: prog 'handle_tp': found data map 0 (minimal_.bss, sec 5, off 0) for insn 2
libbpf: sec '.reltp/syscalls/sys_enter_write': relo #1: insn #6 against '.rodata'
libbpf: prog 'handle_tp': found data map 1 (minimal_.rodata, sec 6, off 0) for insn 6
libbpf: map 'minimal_.bss': created successfully, fd=4
libbpf: map 'minimal_.rodata': created successfully, fd=5
Successfully started! Please run `sudo cat /sys/kernel/debug/tracing/trace_pipe` to see output of the BPF programs.
...^C

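In short, every kernel function that a BPF program may call (a helper) is mapped to a unique number, and that number is embedded in the BPF bytecode. When the bytecode is loaded into the kernel, it is patched according to the actual address of the kernel function.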


Some history of how bpf_func_id has been defined in the kernel:

At the very beginning, the IDs were enum entries maintained by hand; see the following patch:

commit ffeedafbf0236f03aeb2e8db273b3e5ae5f5bc89
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Fri Jun 12 19:39:12 2015 -0700
    bpf: introduce current->pid, tgid, uid, gid, comm accessors
    eBPF programs attached to kprobes need to filter based on
    current->pid, uid and other fields, so introduce helper functions:
    u64 bpf_get_current_pid_tgid(void)
    Return: current->tgid << 32 | current->pid
    u64 bpf_get_current_uid_gid(void)
    Return: current_gid << 32 | current_uid
    bpf_get_current_comm(char *buf, int size_of_buf)
    stores current->comm into buf
    They can be used from the programs attached to TC as well to classify packets
    based on current task fields.
    Update tracex2 example to print histogram of write syscalls for each process
    instead of aggregated for all.
    Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

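Later, so that disassembled bytecode could show the actual function name, __BPF_FUNC_MAPPER was added, which makes it easy to get the function name from the enum value: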

commit ebb676daa1a340ccef25eb769aefc09b79c01f8a
Author: Thomas Graf <tgraf@suug.ch>
Date:   Thu Oct 27 11:23:51 2016 +0200
    bpf: Print function name in addition to function id
    The verifier currently prints raw function ids when printing CALL
    instructions or when complaining:
      5: (85) call 23
            unknown func 23
    print a meaningful function name instead:
      5: (85) call bpf_redirect#23
            unknown func bpf_redirect#23
    Moves the function documentation to a single comment and renames all
    helpers names in the list to conform to the bpf_ prefix notation so
    they can be greped in the kernel source.
    Signed-off-by: Thomas Graf <tgraf@suug.ch>
    Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Later still, to make the mapping between kernel functions and IDs easy to see, the macro was redesigned again:

commit 8a76145a2ec2a81dfe34d7ac42e8c242f095e8c8
Refs: v6.0-2740-g8a76145a2ec2
Author:     Andrii Nakryiko <andrii@kernel.org>
AuthorDate: Wed Oct 5 21:24:51 2022 -0700
Commit:     Alexei Starovoitov <ast@kernel.org>
CommitDate: Thu Oct 6 08:19:30 2022 -0700
    bpf: explicitly define BPF_FUNC_xxx integer values
    Historically enum bpf_func_id's BPF_FUNC_xxx enumerators relied on
    implicit sequential values being assigned by compiler. This is
    convenient, as new BPF helpers are always added at the very end, but it
    also has its downsides, some of them being:
      - with over 200 helpers now it's very hard to know what's each helper's ID,
        which is often important to know when working with BPF assembly (e.g.,
        by dumping raw bpf assembly instructions with llvm-objdump -d
        command). it's possible to work around this by looking into vmlinux.h,
        dumping /sys/btf/kernel/vmlinux, looking at libbpf-provided
        bpf_helper_defs.h, etc. But it always feels like an unnecessary step
        and one should be able to quickly figure this out from UAPI header.
      - when backporting and cherry-picking only some BPF helpers onto older
        kernels it's important to be able to skip some enum values for helpers
        that weren't backported, but preserve absolute integer IDs to keep BPF
        helper IDs stable so that BPF programs stay portable across upstream
        and backported kernels.
    While neither problem is insurmountable, they come up frequently enough
    and are annoying enough to warrant improving the situation. And for the
    backporting the problem can easily go unnoticed for a while, especially
    if backport is done with people not very familiar with BPF subsystem overall.
    Anyways, it's easy to fix this by making sure that __BPF_FUNC_MAPPER
    macro provides explicit helper IDs. Unfortunately that would potentially
    break existing users that use UAPI-exposed __BPF_FUNC_MAPPER and are
    expected to pass macro that accepts only symbolic helper identifier
    (e.g., map_lookup_elem for bpf_map_lookup_elem() helper).
    As such, we need to introduce a new macro (___BPF_FUNC_MAPPER) which
    would specify both identifier and integer ID, but in such a way as to
    allow existing __BPF_FUNC_MAPPER be expressed in terms of new
    ___BPF_FUNC_MAPPER macro. And that's what this patch is doing. To avoid
    duplication and allow __BPF_FUNC_MAPPER stay *exactly* the same,
    ___BPF_FUNC_MAPPER accepts arbitrary "context" arguments, which can be
    used to pass any extra macros, arguments, and whatnot. In our case we
    use this to pass original user-provided macro that expects single
    argument and __BPF_FUNC_MAPPER is using it's own three-argument
    __BPF_FUNC_MAPPER_APPLY intermediate macro to impedance-match new and
    old "callback" macros.
    Once we resolve this, we use new ___BPF_FUNC_MAPPER to define enum
    bpf_func_id with explicit values. The other users of __BPF_FUNC_MAPPER
    in kernel (namely in kernel/bpf/disasm.c) are kept exactly the same both
    as demonstration that backwards compat works, but also to avoid
    unnecessary code churn.
    Note that new ___BPF_FUNC_MAPPER() doesn't forcefully insert comma
    between values, as that might not be appropriate in all possible cases
    where ___BPF_FUNC_MAPPER might be used by users. This doesn't reduce
    usability, as it's trivial to insert that comma inside "callback" macro.
    To validate all the manually specified IDs are exactly right, we used
    BTF to compare before and after values:
      $ bpftool btf dump file ~/linux-build/default/vmlinux | rg bpf_func_id -A 211 > after.txt
      $ git stash # stach UAPI changes
      $ make -j90
      ... re-building kernel without UAPI changes ...
      $ bpftool btf dump file ~/linux-build/default/vmlinux | rg bpf_func_id -A 211 > before.txt
      $ diff -u before.txt after.txt
      --- before.txt  2022-10-05 10:48:18.119195916 -0700
      +++ after.txt   2022-10-05 10:46:49.446615025 -0700
      @@ -1,4 +1,4 @@
      -[14576] ENUM 'bpf_func_id' encoding=UNSIGNED size=4 vlen=211
      +[9560] ENUM 'bpf_func_id' encoding=UNSIGNED size=4 vlen=211
              'BPF_FUNC_unspec' val=0
              'BPF_FUNC_map_lookup_elem' val=1
              'BPF_FUNC_map_update_elem' val=2
    As can be seen from diff above, the only thing that changed was resulting BTF
    type ID of ENUM bpf_func_id, not any of the enumerators, their names or integer
    values.
    The only other place that needed fixing was scripts/bpf_doc.py used to generate
    man pages and bpf_helper_defs.h header for libbpf and selftests. That script is
    tightly-coupled to exact shape of ___BPF_FUNC_MAPPER macro definition, so had
    to be trivially adapted.
    Cc: Quentin Monnet <quentin@isovalent.com>
    Reported-by: Andrea Terzolo <andrea.terzolo@polito.it>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Quentin Monnet <quentin@isovalent.com>
    Acked-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Link: https://lore.kernel.org/r/20221006042452.2089843-1-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

The end.
