seccomp BPF and Container Security (Part 1) - Alibaba Cloud Developer Community

1 Introduction
2 Implementation
3 libseccomp
4 Other Tools
5 Using Seccomp to Secure Docker
6 References

This article covers the core concepts of seccomp: its history, how Seccomp BPF works, and related tooling. It also walks through examples of using seccomp BPF to harden Docker.

1 Introduction

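seccomp (short for secure computing mode) is a security mechanism supported by the Linux kernel. On Linux, a large number of system calls are exposed directly to user-space programs. A given program does not need all of them, and untrusted code that abuses system calls can threaten the whole system. With seccomp we can restrict which system calls a program may use, which shrinks the kernel's attack surface and puts the program into a "secure" state.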

1.1 History of `Seccomp`

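In 2005, Linux 2.6.12 introduced the first version of seccomp. It was enabled by writing "1" to /proc/PID/seccomp, and there was only one mode: strict mode. In strict mode the restricted process may use only four system calls: read(), write(), _exit(), and sigreturn(). Note that open() is forbidden too, so any files must be opened before entering strict mode. Once strict-mode seccomp is applied, any other system call triggers SIGKILL and the process is terminated immediately.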

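In 2007, Linux 2.6.23 replaced the /proc/PID/seccomp interface with prctl(): prctl(PR_SET_SECCOMP, arg) sets the caller's seccomp mode, and prctl(PR_GET_SECCOMP) returns it. A return value of 0 means the process is not under seccomp. A process in strict mode never gets an answer, because prctl() is not one of the four permitted calls, so the query itself kills it with SIGKILL. In filter mode (introduced later, see below), the call returns 2 if the filter allows prctl(); otherwise the filter's action applies.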

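In 2012, Linux 3.5 added seccomp mode 2, a new filter mode that uses a Berkeley Packet Filter (BPF) program to filter system calls and their arguments. In this mode a process calls prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...) to specify which system calls are allowed. Many applications now use seccomp filters to control system calls, including the Chrome/Chromium browser, OpenSSH, vsftpd, and Firefox OS.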

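In 2013, Linux 3.8 added a Seccomp field to /proc/PID/status, which reports the seccomp mode of the process (0 = disabled, 1 = strict, 2 = filter). The same values are defined in include/uapi/linux/seccomp.h: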

/* Valid values for seccomp.mode and prctl(PR_SET_SECCOMP, <mode>) */
#define SECCOMP_MODE_DISABLED       0 /* seccomp is not in use. */
#define SECCOMP_MODE_STRICT         1 /* uses hard-coded filter. */
#define SECCOMP_MODE_FILTER         2 /* uses user-supplied filter. */

Example:

$ cat /proc/1/status | grep Seccomp
Seccomp:        0
Seccomp_filters:    0

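In 2014, Linux 3.17 introduced the seccomp() system call. It provides a superset of the functionality available through prctl() and adds the ability to synchronize all threads of a process onto the same set of filters, which ensures that threads created before the filter was installed are covered as well.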

1.2 `Seccomp + BPF`

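Filter mode lets developers write BPF programs that filter system calls based on the system call number and the argument (register) values. When seccomp is applied to a process with seccomp() or prctl(), a prepared BPF program is installed into the kernel, and every subsequent system call passes through that filter. The process is irreversible: installing a filter effectively declares that any code executed afterwards is untrusted, so a filter can never be removed; further filters can only be added on top of it.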

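BPF was first introduced in 1992 for tcpdump, a network packet monitoring tool. Packet volumes are large, and copying every captured packet from kernel space to user space wastes a lot of performance, so packets must be filtered down to the interesting ones, and filtering in the kernel is more efficient than filtering in user space. BPF provides a way to do this filtering in the kernel, so user space only has to handle the packets that pass the kernel filter.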

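BPF defines a virtual machine (VM) that can be implemented inside the kernel. The VM has the following properties:

Simple instruction set

Small instruction set

All instructions are the same size

Simple, fast implementation

Only forward branches

Programs are directed acyclic graphs (DAGs): no loops

Easy to verify that a program is valid/safe

Simple instruction set ⇒ opcodes and operands can be verified

Dead code can be detected

Programs must end with a Return instruction

BPF filter programs are limited to 4096 instructions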

The classic BPF data structures are defined in the kernel's UAPI headers filter.h and bpf_common.h. The main ones are:

Linux v5.18.4/include/uapi/linux/filter.h->sock_fprog

struct sock_fprog {                     /* Required for SO_ATTACH_FILTER. */
    unsigned short              len;    /* number of BPF instructions */
    struct sock_filter __user   *filter;/* pointer to the array of BPF instructions */
};

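This structure records the number of filter instructions and where the instruction array starts; the filter field points to the actual instructions, each of which has the following form: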

Linux v5.18.4/include/uapi/linux/filter.h->sock_filter

struct sock_filter {
    __u16   code;       /* Actual filter code */
    __u8    jt;         /* Jump true */
    __u8    jf;         /* Jump false */
    __u32   k;          /* Generic multiuse field */
};

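Each instruction has four fields: code, the opcode; jt, the jump offset if the condition is true; jf, the jump offset if the condition is false; and k, a general-purpose operand.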

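The BPF instruction set is fairly simple. The main instruction types are:

Load instructions

Store instructions

Jump instructions

Arithmetic/logic instructions

These include ADD, SUB, MUL, DIV, MOD, NEG, OR, AND, XOR, LSH, RSH

Return instructions

Conditional jump instructions

They have two jump targets: jt for true, jf for false

The jump targets are instruction offsets, at most 255 (jt and jf are 8-bit fields)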

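How are BPF programs written? BPF instructions can be written by hand, but the kernel headers define symbolic constants and two convenience macros, BPF_STMT and BPF_JUMP, that make writing BPF rules much easier.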

Linux v5.18.4/include/uapi/linux/filter.h->BPF_STMT&BPF_JUMP

/*
 * Macros for filter block array initializers.
 */
#ifndef BPF_STMT
#define BPF_STMT(code, k) { (unsigned short)(code), 0, 0, k }
#endif
#ifndef BPF_JUMP
#define BPF_JUMP(code, k, jt, jf) { (unsigned short)(code), jt, jf, k }
#endif

BPF_STMT
BPF_STMT takes two arguments, the opcode (code) and the value (k). For example:

BPF_STMT(BPF_LD | BPF_W | BPF_ABS,(offsetof(struct seccomp_data, arch)))

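Here the opcode is formed by OR-ing three constants. BPF_LD creates a BPF load operation; BPF_W sets the operand size to a word (32 bits); BPF_ABS selects absolute addressing, meaning the value in the instruction is used as an offset into the data area. In this case the value is the offset of the arch field within the data area (struct seccomp_data), which offsetof() computes.
The instruction therefore loads the architecture identifier into the accumulator.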
BPF_JUMP
BPF_JUMP takes four arguments: the opcode, the value (k), the jump-if-true offset (jt), and the jump-if-false offset (jf). For example:

BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K ,AUDIT_ARCH_X86_64 , 1, 0)

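BPF_JMP | BPF_JEQ creates a jump-if-equal instruction, and BPF_K says the comparison operand is the constant k stored in the instruction (the second argument, AUDIT_ARCH_X86_64), which is compared with the value in the accumulator. If the architecture is x86-64, the next instruction is skipped (jt=1: when the test is true, skip one instruction); otherwise the next instruction is executed (jf=0: when the test is false, skip zero instructions, i.e. fall through).
These two instructions are commonly used together to verify the system architecture.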

Here is a practical example, a filter that blocks the execve system call:

struct sock_filter filter[] = {
    // Load the 4 bytes at offset 0 of struct seccomp_data (the system call number) into the accumulator
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, 0),
    // If A == 59, fall through to the next instruction; otherwise skip it.
    // 59 is the execve system call number on x86-64
    BPF_JUMP(BPF_JMP | BPF_JEQ, 59, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),   // return KILL
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),  // return ALLOW
};

bpf_common.h defines the opcode constants used with BPF_STMT and BPF_JUMP:

/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef _UAPI__LINUX_BPF_COMMON_H__
#define _UAPI__LINUX_BPF_COMMON_H__
/* Instruction classes */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_LD          0x00    // load a value into the accumulator
#define BPF_LDX         0x01    // load a value into the index register
#define BPF_ST          0x02    // store the accumulator into scratch memory
#define BPF_STX         0x03    // store the index register into scratch memory
#define BPF_ALU         0x04    // arithmetic/logic on the accumulator, with the index register or a constant as operand
#define BPF_JMP         0x05    // jump
#define BPF_RET         0x06    // return
#define BPF_MISC        0x07    // miscellaneous
/* ld/ldx fields */
#define BPF_SIZE(code)  ((code) & 0x18)
#define BPF_W           0x00    // word, 32 bits
#define BPF_H           0x08    // half word, 16 bits
#define BPF_B           0x10    // byte, 8 bits
/* eBPF BPF_DW          0x18 */ // double word, 64 bits (eBPF only)
#define BPF_MODE(code)  ((code) & 0xe0)
#define BPF_IMM         0x00    // immediate constant
#define BPF_ABS         0x20    // packet data at a fixed offset (absolute)
#define BPF_IND         0x40    // packet data at a variable offset (relative to the index register)
#define BPF_MEM         0x60    // a word in scratch memory
#define BPF_LEN         0x80    // packet length
#define BPF_MSH         0xa0
/* alu/jmp fields */
#define BPF_OP(code)    ((code) & 0xf0)       // for ALU instructions, selects the operator
#define BPF_ADD         0x00
#define BPF_SUB         0x10
#define BPF_MUL         0x20
#define BPF_DIV         0x30
#define BPF_OR          0x40
#define BPF_AND         0x50
#define BPF_LSH         0x60
#define BPF_RSH         0x70
#define BPF_NEG         0x80
#define BPF_MOD         0x90
#define BPF_XOR         0xa0
// for JMP instructions, selects the jump type
#define BPF_JA          0x00
#define BPF_JEQ         0x10
#define BPF_JGT         0x20
#define BPF_JGE         0x30
#define BPF_JSET        0x40
#define BPF_SRC(code)   ((code) & 0x08)
#define BPF_K           0x00    // operand is the constant k
#define BPF_X           0x08    // operand is the index register
#ifndef BPF_MAXINSNS
#define BPF_MAXINSNS    4096
#endif
#endif /* _UAPI__LINUX_BPF_COMMON_H__ */

Most seccomp-specific definitions live in seccomp.h.

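Once seccomp-BPF is configured for a program, every system call passes through the seccomp filter, which adds some overhead to each call. The filter returns a 32-bit value to the kernel that says whether the system call is allowed: the most significant 16 bits (the SECCOMP_RET_ACTION_FULL mask; the older SECCOMP_RET_ACTION mask leaves out the sign bit) specify the action the kernel should take, and the low 16 bits (the SECCOMP_RET_DATA mask) carry data associated with that action.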

/*
 * All BPF programs must return a 32-bit value.
 * The bottom 16-bits are for optional return data.
 * The upper 16-bits are ordered from least permissive values to most,
 * as a signed value (so 0x8000000 is negative).
 *
 * The ordering ensures that a min_t() over composed return values always
 * selects the least permissive choice.
 */
#define SECCOMP_RET_KILL_PROCESS 0x80000000U /* kill the process */
#define SECCOMP_RET_KILL_THREAD     0x00000000U /* kill the thread */
#define SECCOMP_RET_KILL     SECCOMP_RET_KILL_THREAD
#define SECCOMP_RET_TRAP     0x00030000U /* disallow and force a SIGSYS */
#define SECCOMP_RET_ERRNO     0x00050000U /* returns an errno */
#define SECCOMP_RET_USER_NOTIF     0x7fc00000U /* notifies userspace */
#define SECCOMP_RET_TRACE     0x7ff00000U /* pass to a tracer or disallow */
#define SECCOMP_RET_LOG         0x7ffc0000U /* allow after logging */
#define SECCOMP_RET_ALLOW     0x7fff0000U /* allow */
/* Masks for the return value sections. */
#define SECCOMP_RET_ACTION_FULL    0xffff0000U
#define SECCOMP_RET_ACTION    0x7fff0000U
#define SECCOMP_RET_DATA    0x0000ffffU

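SECCOMP_RET_ALLOW: allow the system call
SECCOMP_RET_KILL: terminate immediately; this is an alias for SECCOMP_RET_KILL_THREAD, which kills the calling thread (SECCOMP_RET_KILL_PROCESS kills the whole process)
SECCOMP_RET_ERRNO: do not execute the system call; return the errno value held in the SECCOMP_RET_DATA bits
SECCOMP_RET_TRACE: notify a ptrace() tracer so it can take control; if no tracer is attached, the call fails with ENOSYS
SECCOMP_RET_TRAP: the kernel sends a SIGSYS signal (the system call is not executed)
SECCOMP_RET_LOG: allow the system call after logging it
SECCOMP_RET_USER_NOTIF: pass the decision to a user-space supervisor through a notification file descriptor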

Every seccomp-BPF program receives a struct seccomp_data as its input:

/include/uapi/linux/seccomp.h

struct seccomp_data {
    int     nr;                     /* system call number (architecture-dependent) */
    __u32   arch;                   /* architecture (e.g. AUDIT_ARCH_X86_64) */
    __u64   instruction_pointer;    /* CPU instruction pointer */
    __u64   args[6];                /* system call arguments, at most 6 */
};

2 Implementation

2.1 `prctl()`

The prctl() function performs various operations on the calling process. Its prototype is:

#include <sys/prctl.h>
int prctl(int option,
        unsigned long arg2,
        unsigned long arg3,
        unsigned long arg4,
        unsigned long arg5);

The operation is selected by the option argument. There are many options; the two relevant to seccomp are PR_SET_NO_NEW_PRIVS and PR_SET_SECCOMP.

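PR_SET_NO_NEW_PRIVS:
Introduced in Linux 3.5. Once a process sets the no_new_privs bit, execve() can no longer grant it new privileges: set-user-ID and set-group-ID bits and file capabilities are ignored. To install a seccomp-BPF filter, a program must either hold the CAP_SYS_ADMIN capability or have set no_new_privs. Otherwise the call fails with EACCES, which prevents an unprivileged process from installing a filter and then running a set-user-ID program under conditions it controls. Setting PR_SET_NO_NEW_PRIVS is therefore what lets seccomp be used by every user, not just privileged ones.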

prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);

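Passing 1 as the second argument sets the bit. The bit is inherited by children created with fork() and clone(), preserved across execve(), and cannot be cleared. The seccomp filters themselves are likewise inherited by child processes and kept across execve(), so programs started after the filter is installed remain restricted.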
PR_SET_SECCOMP:
Sets the seccomp mode of the process; the usual form is:

prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);

SECCOMP_MODE_FILTER selects filter mode; passing SECCOMP_MODE_STRICT instead selects strict mode. In filter mode the system call restrictions are described by &prog, a pointer to the struct sock_fprog introduced above.

2.2 A Simple Strict-Mode Example

In strict mode a process may use only four system calls. Because open() is disabled as well, files must be opened before entering strict mode. A simple example:

seccomp_strict.c

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
void configure_seccomp(void) {
    printf("Configuring seccomp\n");
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
        perror("prctl(PR_SET_SECCOMP)");
}
int main(int argc, char* argv[]) {
    int         infd = -1, outfd = -1;
    ssize_t     read_bytes;
    char        buffer[1024];
    if (argc < 3) {
        printf("Usage:\n\tdup_file <input path> <output_path>\n");
        return -1;
    }
    /* Enter strict mode */
    configure_seccomp();
    printf("Opening '%s' for reading\n", argv[1]);
    /* Flush now: write() is still allowed in strict mode */
    fflush(stdout);
    /* open() is not allowed in strict mode: the process is killed here */
    if ((infd = open(argv[1], O_RDONLY)) >= 0) {
        printf("Opening '%s' for writing\n", argv[2]);
        if ((outfd = open(argv[2], O_WRONLY | O_CREAT, 0644)) >= 0) {
            while ((read_bytes = read(infd, buffer, sizeof(buffer))) > 0)
                write(outfd, buffer, (size_t)read_bytes);
        }
    }
    if (infd >= 0)
        close(infd);
    if (outfd >= 0)
        close(outfd);
    return 0;
}

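The program implements a simple file copy. With strict-mode seccomp applied, seccomp kills the program with SIGKILL (reported by the shell as "Killed") when it calls open(argv[1], O_RDONLY).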

$ gcc -o seccomp_strict seccomp_strict.c
$ ./seccomp_strict /etc/passwd output
Configuring seccomp
Opening '/etc/passwd' for reading
Killed

2.3 A Simple Filter-Mode Example

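Following the discussion above, applying a seccomp-BPF policy to a program takes three steps: define the filter array, define the prog argument (struct sock_fprog), and apply the policy with prctl().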

Example 1: blocking the execve system call (seccomp_filter_execv.c)

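#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <linux/filter.h>
struct sock_filter filter[] = {
    // Load the 4 bytes at offset 0 of struct seccomp_data (the system call number) into the accumulator
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS,0),
    // If the system call number is 59 (execve on x86-64), fall through; otherwise skip the next instruction
    BPF_JUMP(BPF_JMP+BPF_JEQ,59,0,1),
    // Return KILL
    BPF_STMT(BPF_RET+BPF_K,SECCOMP_RET_KILL),
    // Return ALLOW
    BPF_STMT(BPF_RET+BPF_K,SECCOMP_RET_ALLOW),
};
struct sock_fprog prog = {
    // number of instructions
    .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
    // pointer to the instruction array
    .filter = filter,
};
int main(void)
{
    // Set NO_NEW_PRIVS so that an unprivileged process may install the filter
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_NO_NEW_PRIVS)");
        return 1;
    }
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0) {
        perror("prctl(PR_SET_SECCOMP)");
        return 1;
    }
    write(STDOUT_FILENO, "test\n", 5);
    // system() runs /bin/sh via execve() in a child process, which the filter kills
    system("/bin/sh");
    return 0;
}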

Example 2: allowing open() only with O_RDONLY (seccomp_filter.c)

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <linux/unistd.h>
void configure_seccomp(void) {
    struct sock_filter filter [] = {
        // Load the system call number into the accumulator
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, (offsetof(struct seccomp_data, nr))),
        // write(): allow; otherwise skip the next instruction
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
#ifdef __NR_open
        // open(): the flags are the second argument (args[1]); then jump to the flag test
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_open, 0, 2),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, (offsetof(struct seccomp_data, args[1]))),
        BPF_JUMP(BPF_JMP | BPF_JA, 2, 0, 0),
#endif
        // openat() (used by glibc's open() on current systems): the flags are args[2];
        // any other system call jumps straight to KILL
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_openat, 0, 3),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, (offsetof(struct seccomp_data, args[2]))),
        // Allow only O_RDONLY (this 32-bit load reads the low half of the flags on little-endian)
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, O_RDONLY, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL)
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof (filter[0])),
        .filter = filter,
    };
    printf("Configuring seccomp\n");
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
        perror("prctl");
}
int main(int argc, char* argv[]) {
    int     infd = -1, outfd = -1;
    ssize_t read_bytes;
    char    buffer[1024];
    if (argc < 3) {
        printf("Usage:\n\tdup_file <input path> <output_path>\n");
        return -1;
    }
    printf("Duplicating file '%s' to '%s'\n", argv[1], argv[2]);
    configure_seccomp(); // install the seccomp filter
    printf("Opening '%s' for reading\n", argv[1]);
    if ((infd = open(argv[1], O_RDONLY)) >= 0) {
        printf("Opening '%s' for writing\n", argv[2]);
        // O_WRONLY | O_CREAT does not match the filter: the process is killed here
        if ((outfd = open(argv[2], O_WRONLY | O_CREAT, 0644)) >= 0) {
            while ((read_bytes = read(infd, buffer, sizeof(buffer))) > 0)
                write(outfd, buffer, (size_t)read_bytes);
        }
    }
    if (infd >= 0)
        close(infd);
    if (outfd >= 0)
        close(outfd);
    return 0;
}

In this case the seccomp-BPF program allows the first open() call, made with O_RDONLY, but kills the program when open() is called with O_WRONLY | O_CREAT. This demonstration filter also kills every other system call (including read(), close() and exit_group()) and does not check seccomp_data.arch; a real policy would allow the calls the program needs and verify the architecture first.

$ ./seccomp_filter /etc/passwd output
Duplicating file '/etc/passwd' to 'output'
Configuring seccomp
Opening '/etc/passwd' for reading
Opening 'output' for writing
Bad system call

3 libseccomp

Project page (libseccomp): https://github.com/seccomp/libseccomp

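Writing raw BPF for prctl() is not very flexible. The libseccomp library wraps the mechanism in a set of functions that achieve the same effect without requiring you to understand BPF rules. To use it from a C program, install the following packages: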

$ sudo apt install libseccomp-dev libseccomp2 seccomp

Example 1 (simple_syscall_seccomp.c)

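//gcc -g simple_syscall_seccomp.c -o simple_syscall_seccomp -lseccomp
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <seccomp.h>
#include <linux/seccomp.h>
int main(void){
    scmp_filter_ctx ctx;
    // Default action: allow every system call that no rule matches
    ctx = seccomp_init(SCMP_ACT_ALLOW);
    if (ctx == NULL)
        return 1;
    // Kill on execve, whatever its arguments (arg_cnt = 0)
    seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(execve), 0);
    if (seccomp_load(ctx) != 0) {
        fprintf(stderr, "seccomp_load failed\n");
        seccomp_release(ctx);
        return 1;
    }
    // The loaded filter stays in effect after the context is freed
    seccomp_release(ctx);
    char * filename = "/bin/sh";
    char * argv[] = {"/bin/sh",NULL};
    char * envp[] = {NULL};
    write(1,"i will give you a shell\n",24);
    syscall(SYS_execve, filename, argv, envp); // raw execve (number 59 on x86-64)
    return 0;
}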

Compile and run it; the program is killed when it calls execve:

$ gcc -g simple_syscall_seccomp.c -o simple_syscall_seccomp -lseccomp
$ ./simple_syscall_seccomp
i will give you a shell
Bad system call (core dumped)

Explanation of the code above:

scmp_filter_ctx: the filter context type
seccomp_init: initializes the filter state; its prototype is:

scmp_filter_ctx seccomp_init(uint32_t def_action);

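def_action is the action applied to system calls that no rule matches; it can be one of:

SCMP_ACT_ALLOW: allow the system call;
SCMP_ACT_KILL: the thread is terminated by the kernel as though by a SIGSYS signal;
SCMP_ACT_KILL_PROCESS: the whole process is terminated by the kernel;
SCMP_ACT_TRAP: the thread is sent a SIGSYS signal;
SCMP_ACT_TRACE(uint16_t msg_num): if the thread is traced with ptrace() and the PTRACE_O_TRACESECCOMP option, the tracer is notified and can read msg_num;
SCMP_ACT_ERRNO(uint16_t errno): the system call is not executed and returns the given errno value;
SCMP_ACT_LOG: the system call is allowed but logged;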

seccomp_rule_add: adds a rule; its prototype is:

int seccomp_rule_add(scmp_filter_ctx ctx, uint32_t action, int syscall, unsigned int arg_cnt, ...);

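The arg_cnt argument says whether the rule also constrains the system call's arguments, and how many argument comparisons follow. To allow or block a system call outright, pass 0 for arg_cnt; for example, seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(execve), 0) blocks execve regardless of its arguments. If arg_cnt is non-zero, it is the number of argument comparisons that follow, and the rule only fires (for example, only intercepts execve) when the system call is made and all of those comparisons are satisfied. So for finer-grained filtering that takes the arguments into account, set arg_cnt to a non-zero value and describe each comparison with the macros shown below.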

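Example 2 (restricting arguments)
For example, the program seccomp_write_limit.c listed below kills the process when write() is called with a count greater than 0x10.
Compile and run it: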

$ gcc -g seccomp_write_limit.c -o seccomp_write_limit -lseccomp
$ ./seccomp_write_limit
1234567812345678Bad system call (core dumped)

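Here SCMP_A2 builds a comparison on argument 2 (counting from 0, i.e. write()'s third argument, count), and SCMP_CMP_GT means "greater than". The relevant definitions from libseccomp's header follow.
Besides seccomp_rule_add(), there are other functions for adding rules, such as seccomp_rule_add_array(), seccomp_rule_add_exact() and seccomp_rule_add_exact_array(); see the reference links for details.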

libseccomp/include/seccomp.h.in:

// ...
// ...
// Comparison operators
enum scmp_compare {
    _SCMP_CMP_MIN = 0,
    SCMP_CMP_NE = 1,        /**< not equal */
    SCMP_CMP_LT = 2,        /**< less than */
    SCMP_CMP_LE = 3,        /**< less than or equal */
    SCMP_CMP_EQ = 4,        /**< equal */
    SCMP_CMP_GE = 5,        /**< greater than or equal */
    SCMP_CMP_GT = 6,        /**< greater than */
    SCMP_CMP_MASKED_EQ = 7, /**< equal after masking */
    _SCMP_CMP_MAX,
};
// ...
struct scmp_arg_cmp {
    unsigned int arg;       /**< argument number, starting at 0 */
    enum scmp_compare op;   /**< comparison operator, one of the SCMP_CMP_* values above */
    scmp_datum_t datum_a;
    scmp_datum_t datum_b;
};
// ....
/**
 * Specify a 32-bit argument comparison struct for use in declaring rules
 * @param arg the argument number, starting at 0
 * @param op the comparison operator, e.g. SCMP_CMP_*
 * @param datum_a dependent on comparison (32-bits)
 * @param datum_b dependent on comparison, optional (32-bits)
 */
#define SCMP_CMP32(x, y, ...) \
    _SCMP_MACRO_DISPATCHER(_SCMP_CMP32_, __VA_ARGS__)(x, y, __VA_ARGS__)
// 64-bit comparison on argument 0
#define SCMP_A0_64(...)         SCMP_CMP64(0, __VA_ARGS__)
#define SCMP_A0                 SCMP_A0_64
// 32-bit comparison on argument 0
#define SCMP_A0_32(x, ...)      SCMP_CMP32(0, x, __VA_ARGS__)
// 64-bit comparison on argument 1
#define SCMP_A1_64(...)         SCMP_CMP64(1, __VA_ARGS__)
#define SCMP_A1                 SCMP_A1_64
// 32-bit comparison on argument 1
#define SCMP_A1_32(x, ...)      SCMP_CMP32(1, x, __VA_ARGS__)
// 64-bit comparison on argument 2
#define SCMP_A2_64(...)         SCMP_CMP64(2, __VA_ARGS__)
#define SCMP_A2                 SCMP_A2_64
// 32-bit comparison on argument 2
#define SCMP_A2_32(x, ...)      SCMP_CMP32(2, x, __VA_ARGS__)
// 64-bit comparison on argument 3
#define SCMP_A3_64(...)         SCMP_CMP64(3, __VA_ARGS__)
#define SCMP_A3                 SCMP_A3_64
// 32-bit comparison on argument 3
#define SCMP_A3_32(x, ...)      SCMP_CMP32(3, x, __VA_ARGS__)
// 64-bit comparison on argument 4
#define SCMP_A4_64(...)         SCMP_CMP64(4, __VA_ARGS__)
#define SCMP_A4                 SCMP_A4_64
// 32-bit comparison on argument 4
#define SCMP_A4_32(x, ...)      SCMP_CMP32(4, x, __VA_ARGS__)
// 64-bit comparison on argument 5
#define SCMP_A5_64(...)         SCMP_CMP64(5, __VA_ARGS__)
#define SCMP_A5                 SCMP_A5_64
// 32-bit comparison on argument 5
#define SCMP_A5_32(x, ...)      SCMP_CMP32(5, x, __VA_ARGS__)
// ...
// ...

seccomp_write_limit.c

#include <unistd.h>
#include <seccomp.h>
#include <linux/seccomp.h>
int main(void){
    scmp_filter_ctx ctx;
    ctx = seccomp_init(SCMP_ACT_ALLOW);
    // Kill on write() when argument 2 (counting from 0, i.e. count) is greater than 0x10
    seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(write),1,SCMP_A2(SCMP_CMP_GT,0x10));
    seccomp_load(ctx);
    write(1,"1234567812345678",0x10);// count == 0x10: not intercepted
    write(1,"i will give you a shell\n",24);// count == 24 > 0x10: intercepted, process killed
    return 0;
}

seccomp_load: loads the current seccomp filter into the kernel; its prototype is:

int seccomp_load(scmp_filter_ctx ctx);

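seccomp_reset: releases the existing filter context state and reinitializes it with a new default action, as in int seccomp_reset(scmp_filter_ctx ctx, uint32_t def_action); it may only be called after a successful call to seccomp_init(). To free a context entirely, use seccomp_release().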
