k8s CKS exam question: troubleshooting with Sysdig & Falco

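Question

Task:
Use a runtime detection tool to detect anomalous processes that are frequently spawned and executed in the single container of Pod tomcat123.
Two tools are available:
    sysdig
    falco
Note: these tools are preinstalled only on the cluster's worker node node02, not on the master node.
Analyze with the tool for at least 30 seconds, using filters to check for spawned and executed processes, and write the events to the file /opt/KSR00101/incidents/summary,
which must contain the detected events in the following format:
timestamp,uid/username,processName
Keep the tool's original timestamp format unchanged.
Note: make sure the incident file is stored on the cluster's worker node.

Note that in the exam the question states that sysdig is on the worker node, so you need to ssh to the worker node named at the start of the question.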
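
Before finishing, each line of the summary file can be tested against the timestamp,uid/username,processName layout. The following is only a sketch: check_summary_format is a hypothetical helper name, and it assumes that none of the three fields contains a comma itself.

```shell
# Hypothetical helper: report lines that do not have exactly three
# non-empty comma-separated fields, and return non-zero if any exist.
check_summary_format() {
    awk -F',' '
        NF != 3 || $1 == "" || $2 == "" || $3 == "" {
            bad++
            print "line " NR ": " $0
        }
        END { exit (bad > 0) }
    ' "$1"
}

# Usage on the worker node (not executed here):
#   check_summary_format /opt/KSR00101/incidents/summary && echo "format looks fine"
```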

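Environment setup

Install sysdig

Download it from the Releases page of the GitHub link in the references below.

I downloaded sysdig-0.31.5-x86_64.tar.gz; after extracting it, you only need to copy the executable into place.

Command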

tar -zxf sysdig-0.31.5-x86_64.tar.gz
cp sysdig-0.31.5-x86_64/usr/bin/sysdig /usr/bin/sysdig

Install sysdig on Ubuntu

Command

apt-get install -y sysdig
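
Whichever install route is used, a quick sanity check is to ask the shell whether the binaries are on PATH. This is a sketch with a hypothetical helper named check_tools; it only checks for presence and says nothing about whether the kernel driver loads.

```shell
# Hypothetical helper: print where each tool is found and
# return non-zero if at least one of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found at $(command -v "$tool")"
        else
            echo "$tool: not found"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the worker node (not executed here):
#   check_tools sysdig falco
```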

sysdig --help
sysdig version 0.31.5
Usage: sysdig [options] [-p <output_format>] [filter]
Options:
 -A, --print-ascii  Only print the text portion of data buffers, and echo
                    end-of-lines. This is useful to only display human-readable
                    data.
 -b, --print-base64 Print data buffers in base64. This is useful for encoding
                    binary data that needs to be used over media designed to
                    handle textual data (i.e., terminal or json).
 -B<bpf_probe>, --bpf=<bpf_probe>
                    Enable live capture using the specified BPF probe instead of the kernel module.
                    The BPF probe can also be specified via the environment variable
                    SYSDIG_BPF_PROBE. If <bpf_probe> is left empty, sysdig will
                    try to load one from the scap-driver-loader script.
 -c <chiselname> <chiselargs>, --chisel <chiselname> <chiselargs>
                    run the specified chisel. If the chisel require arguments,
                    they must be specified in the command line after the name.
 -cl, --list-chisels
                    lists the available chisels. Sysdig looks for chisels in the
                    following directories: ./chisels, ~/.chisels, /usr/share/sysdig/chisels.
  --cpus-for-each-buffer <cpus_num>
                    [EXPERIMENTAL] Please note this config regards only the modern BPF probe.
                    They are experimental so they could change over releases.
                    How many CPUs you want to assign to a single syscall buffer (ring buffer).
                    By default, every syscall buffer is associated to 2 CPUs, so the mapping is
                    1:2. The modern BPF probe allows you to choose different mappings, for
                    example, 1:1 would mean a syscall buffer for each CPU.
 -C <file_size>, --file-size=<file_size>
                    Before writing an event, check whether the file is
                    currently larger than file_size and, if so, close the
                    current file and open a new one. Saved files will have the
                    name specified with the -w flag, with a number after it,
                    starting at 0 and continuing upward. The units of file_size
                    are millions of bytes (10^6, not 2^20). Use the -W flag to
                    determine how many files will be saved to disk.
 --cri <path>       Path to CRI socket for container metadata
                    Use the specified socket to fetch data from a CRI-compatible runtime
 --cri-timeout <timeout_ms>
                    Wait at most <timeout_ms> milliseconds for response from CRI
 -d, --displayflt   Make the given filter a display one
                    Setting this option causes the events to be filtered
                    after being parsed by the state system. Events are
                    normally filtered before being analyzed, which is more
                    efficient, but can cause state (e.g. FD names) to be lost.
 -D, --debug        Capture events about sysdig itself, display internal events
                    in addition to system events, and print additional
                    logging on standard error.
 -E, --exclude-users
                    Don't create the user/group tables by querying the OS when
                    sysdig starts. This also means that no user or group info
                    will be written to the trace file by the -w flag.
                    The user/group tables are necessary to use filter fields
                    like user.name or group.name. However, creating them can
                    increase sysdig's startup time. Moreover, they contain
                    information that could be privacy sensitive.
 -e <num_events>    If used together with -w option, creates a series of dump files
                    containing only a specified number of events given in num_events
                    parameter each.
                    Used alongside -W flags creates a ring buffer of file containing
                    num_events each.
 -F, --fatfile      Enable fatfile mode
                    when writing in fatfile mode, the output file will contain
                    events that will be invisible when reading the file, but
                    that are necessary to fully reconstruct the state.
                    Fatfile mode is useful when saving events to disk with an
                    aggressive filter. The filter could drop events that would
                    the state to be updated (e.g. clone() or open()). With
                    fatfile mode, those events are still saved to file, but
                    'hidden' so that they won't appear when reading the file.
                    Be aware that using this flag might generate substantially
                    bigger traces files.
 --filter-proclist  apply the filter to the process table
                    a full dump of /proc is typically included in any trace file
                    to make sure all the state required to decode events is in the
                    file. This could cause the file to contain unwanted or sensitive
                    information. Using this flag causes the command line filter to
                    be applied to the /proc dump as well.
 -g, --gvisor-config
                    Parse events from gVisor using the specified configuration file.
                    A sysdig-compatible configuration file can be generated with --gvisor-generate-config
                    and can be used for both runsc and sysdig.
 --gvisor-generate-config [=<socket_path>(=/tmp/gvisor.sock)]
                    Generate a configuration file that can be used for gVisor.
 --gvisor-root <gvisor_root>
                    gVisor root directory for storage of container state. Equivalent to runsc --root flag.
 -G <num_seconds>, --seconds=<num_seconds>
                    Rotates the dump file specified with the -w option every
                    num_seconds seconds. Saved files will have the name specified
                    by -w which should include a time format as defined by strftime(3).
                    If no time format is specified, a counter will be used.
                    If no data format is specified, this can be used with -W flag to
                    create a ring buffer of events.
 -h, --help         Print this page
 -H <pluginname>[:<initconfig>], --plugin <pluginname>[:<initconfig>]
                    Registers a plugin, using the passed init config if present.
                    The format of initconf is controlled by the plugin, refer to each
                    plugin's documentation to learn about it.
                    A path can also be used as pluginname.
 -I <pluginname>[:<openparams>], --input <pluginname>[:<openparams>]
                    Set a previously registered plugin as input,
                    capturing events using it and passing the 
                    openparams string as open parameters.
                    Only a single source plugin can be registered.
                    If no plugins were registered, any found plugin in the directories
                    specified by ;-separated environment variable SYSDIG_PLUGIN_DIR and
                    in /usr/share/sysdig/plugins is registered; then use the provided one as input source.
                    The format of openparams is controlled by the plugin, refer to each
                    plugin's documentation to learn about it.
                    See https://falco.org/docs/plugins/plugin-api-reference/#ss-plugin-t-plugin-init-const-char-config-int32-t-rc-required-yes
                    and https://falco.org/docs/plugins/plugin-api-reference/#ss-instance-t-plugin-open-ss-plugin-t-s-const-char-params-int32-t-rc-required-yes for more infos.
                    The event sources available for capture vary depending on which 
                    plugins have been installed.
 -Il                Lists the loaded plugins. If no plugin has been registered through '-H',
                    Sysdig looks for plugins in the directories 
                    specified by ;-separated environment variable SYSDIG_PLUGIN_DIR and
                    in /usr/share/sysdig/plugins.
 -i <chiselname>, --chisel-info <chiselname>
                    Get a longer description and the arguments associated with
                    a chisel found in the -cl option list.
 -j, --json         Emit output as json, data buffer encoding will depend from the
                    print format selected.
 -k <url>, --k8s-api=<url>
                    Enable Kubernetes support by connecting to the API server
                    specified as argument. E.g. "http://admin:password@127.0.0.1:8080".
                    The API server can also be specified via the environment variable
                    SYSDIG_K8S_API.
 --node-name=<url>
                    The node name is used as a filter when requesting metadata of pods
                    to the API server; if empty, no filter is set
 -K <bt_file> | <cert_file>:<key_file[#password]>[:<ca_cert_file>], --k8s-api-cert=<bt_file> | <cert_file>:<key_file[#password]>[:<ca_cert_file>]
                    Use the provided files names to authenticate user and (optionally) verify the K8S API
                    server identity.
                    Each entry must specify full (absolute, or relative to the current directory) path
                    to the respective file.
                    Private key password is optional (needed only if key is password protected).
                    CA certificate is optional. For all files, only PEM file format is supported. 
                    Specifying CA certificate only is obsoleted - when single entry is provided 
                    for this option, it will be interpreted as the name of a file containing bearer token.
                    Note that the format of this command-line option prohibits use of files whose names contain
                    ':' or '#' characters in the file name.
                    Option can also be provided via the environment variable SYSDIG_K8S_API_CERT.
 -L, --list-events  List the events that the engine supports
 -l, --list         List the fields that can be used for filtering and output
                    formatting. Use -lv to get additional information for each
                    field.
 --libs-version     Print the falcosecurity/libs version
 --large-environment
                    Support environments larger than 4KiB
                    When the environment is larger than 4KiB, load the whole
                    environment from /proc instead of truncating to the first 4KiB
                    This may fail for short-lived processes and in that case
                    the truncated environment is used instead.
 --log-level=<trace|debug|info|notice|warning|error|critical|fatal>
                    Select log level. Useful together with --debug.
 --list-markdown    like -l, but produces markdown output
 -m <url[,marathon_url]>, --mesos-api=<url[,marathon_url]>
                    Enable Mesos support by connecting to the API server
                    specified as argument. E.g. "http://admin:password@127.0.0.1:5050".
                    Marathon url is optional and defaults to Mesos address, port 8080.
                    The API servers can also be specified via the environment variable
                    SYSDIG_MESOS_API.
--modern-bpf
                    [EXPERIMENTAL] Enable live capture using the modern BPF probe instead of
                    of the kernel module.
 -M <num_seconds>   Stop collecting after <num_seconds> reached.
 -n <num>, --numevents=<num>
                    Stop capturing after <num> events
 --page-faults      Capture user/kernel major/minor page faults
 --plugin-config-file
                    Load the plugin configuration from a Falco-compatible yaml file.
                    Do not mix this option with the '-H' or '-I' options: it is unsupported.
                    See the plugin section in https://falco.org/docs/configuration/ for
                    additional information
 -P, --progress     Print progress on stderr while processing trace files
 -p <output_format>, --print=<output_format>
                    Specify the format to be used when printing the events.
                    With -pc or -pcontainer will use a container-friendly format.
                    With -pk or -pkubernetes will use a kubernetes-friendly format.
                    With -pm or -pmesos will use a mesos-friendly format.
                    See the examples section below for more info.
 --plugin-info <pluginname>
                    Print info for a single plugin. This includes name, author,
                    and all the descriptive info of the plugin. If present,
                    this also prints the schema format for the init configuration
                    and a list of suggested open parameters.
                    All this info is controlled by the plugin, refer to each
                    plugin's documentation to learn more about it.
                    This can be combined with the -H option to load the plugin
                    with a given configuration.
                    A path can also be used as pluginname.
 -q, --quiet        Don't print events on the screen
                    Useful when dumping to disk.
 -R                 Resolve port numbers to names.
 -r <readfile>, --read=<readfile>
                    Read the events from <readfile>.
 -S, --summary      print the event summary (i.e. the list of the top events)
                    when the capture ends.
 -s <len>, --snaplen=<len>
                    Capture the first <len> bytes of each I/O buffer.
                    By default, the first 80 bytes are captured. Use this
                    option with caution, it can generate huge trace files.
 -t <timetype>, --timetype=<timetype>
                    Change the way event time is displayed. Accepted values are
                    h for human-readable string, a for absolute timestamp from
                    epoch, r for relative time from the beginning of the
                    capture, d for delta between event enter and exit, and
                    D for delta from the previous event.
 -T, --force-tracers-capture
                    Tell the driver to make sure full buffers are captured from
                    /dev/null, to make sure that tracers are completely
                    captured. Note that sysdig will enable extended /dev/null
                    capture by itself after detecting that tracers are written
                    there, but that could result in the truncation of some
                    tracers at the beginning of the capture. This option allows
                    preventing that.
 --unbuffered       Turn off output buffering. This causes every single line
                    emitted by sysdig to be flushed, which generates higher CPU
                    usage but is useful when piping sysdig's output into another
                    process or into a script.
 -U, --suppress-comm
                    Ignore all events from processes having the provided comm.
 -v, --verbose      Verbose output.
                    This flag will cause the full content of text and binary
                    buffers to be printed on screen, instead of being truncated
                    to 40 characters. Note that data buffers length is still
                    limited by the snaplen (refer to the -s flag documentation)
                    -v will also make sysdig print some summary information at
                    the end of the capture.
 --version          Print version number.
 -w <writefile>, --write=<writefile>
                    Write the captured events to <writefile>.
 -W <num>, --limit <num>
                    Used in conjunction with the -C option, this will limit the number
                    of files created to the specified number, and begin overwriting files
                    from the beginning, thus creating a 'rotating' buffer.
                    Used in conjunction with the -G option, this will limit the number
                    of rotated dump files that get created, exiting with status 0 when
                    reaching the limit. If used with -C as well, the behavior will result
                    in cyclical files per timeslice.
 -x, --print-hex    Print data buffers in hex.
 -X, --print-hex-ascii
                    Print data buffers in hex and ASCII.
 -z, --compress     Used with -w, enables compression for trace files.
Output format:
By default, sysdig prints the information for each captured event on a single
 line with the following format:
 %evt.num %evt.outputtime %evt.cpu %proc.name (%thread.tid) %evt.dir %evt.type %evt.info
where:
 evt.num is the incremental event number
 evt.time is the event timestamp
 evt.cpu is the CPU number where the event was captured
 proc.name is the name of the process that generated the event
 thread.tid id the TID that generated the event, which corresponds to the
   PID for single thread processes
 evt.dir is the event direction, > for enter events and < for exit events
 evt.type is the name of the event, e.g. 'open' or 'read'
 evt.info is the list of event arguments.
The output format can be customized with the -p switch, using any of the
fields listed by 'sysdig -l'.
Using -pc or -pcontainer, the default format will be changed to a container-friendly one:
%evt.num %evt.outputtime %evt.cpu %container.name (%container.id) %proc.name (%thread.tid:%thread.vtid) %evt.dir %evt.type %evt.info
Using -pk or -pkubernetes, the default format will be changed to a kubernetes-friendly one:
%evt.num %evt.outputtime %evt.cpu %k8s.pod.name (%container.id) %proc.name (%thread.tid:%thread.vtid) %evt.dir %evt.type %evt.info
Using -pm or -pmesos, the default format will be changed to a mesos-friendly one:
%evt.num %evt.outputtime %evt.cpu %mesos.task.name (%container.id) %proc.name (%thread.tid:%thread.vtid) %evt.dir %evt.type %evt.info
Examples:
 Capture all the events from the live system and print them to screen
   $ sysdig
 Capture all the events from the live system and save them to disk
   $ sysdig -w dumpfile.scap
 Read events from a file and print them to screen
   $ sysdig -r dumpfile.scap
 Print all the open system calls invoked by cat
   $ sysdig proc.name=cat and evt.type=open
 Print the name of the files opened by cat
   $ sysdig -p"%evt.arg.name" proc.name=cat and evt.type=open
 Register any found plugin and use dummy as input source passing to it open params
   $ sysdig -I dummy:10'
 Load and register dummy source plugin passing to it init config and open params
   $ sysdig -H dummy:'{"jitter":50}' -I dummy:10

Create the Pod

kubectl run tomcat123 --image=nginx

Create the directory and file

mkdir -p /opt/KSR00101/incidents/ && touch /opt/KSR00101/incidents/summary

Solution - sysdig

Find the container name or ID

docker ps | grep tomcat123

If docker is not available, use

crictl ps | grep tomcat123

If neither is available, you can use

kubectl get po tomcat123 -oyaml | grep containerID
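
The containerID value that kubectl prints carries a runtime prefix such as containerd:// followed by the full ID, while sysdig's container.id field normally holds only the first 12 characters. A minimal sketch of the conversion follows; short_container_id is a hypothetical helper name, and the jsonpath expression in the comment assumes the Pod has a single container.

```shell
# Hypothetical helper: strip the "docker://" or "containerd://" prefix
# and keep the first 12 characters of the container ID.
short_container_id() {
    id="${1#*://}"            # drop everything up to and including "://"
    printf '%.12s\n' "$id"    # truncate to 12 characters
}

# Usage (not executed here):
#   full="$(kubectl get po tomcat123 -o jsonpath='{.status.containerStatuses[0].containerID}')"
#   short_container_id "$full"
```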

Run the detection with sysdig

Command (use either filter: the container name, or the container ID found above in place of CONTAINER_ID)

sysdig -M 30 -p "%evt.time,%user.name,%proc.name" --cri /run/containerd/containerd.sock container.name=tomcat123 >> /opt/KSR00101/incidents/summary
sysdig -M 30 -p "%evt.time,%user.name,%proc.name" --cri /run/containerd/containerd.sock container.id=CONTAINER_ID >> /opt/KSR00101/incidents/summary
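
The task asks specifically for spawned and executed processes, whereas the filters above match every event from the container. One possible refinement, assuming it matches what the grader expects, is to add evt.type=execve and evt.dir=< so that only completed execve calls are written and %proc.name reflects the program that was executed. capture_execs is a hypothetical wrapper name; the flags are the ones already used above.

```shell
# Hypothetical wrapper: capture only execve exit events for one container
# and append them to the given file in timestamp,username,processName form.
capture_execs() {
    container="$1"
    outfile="$2"
    seconds="${3:-30}"    # capture duration, 30 seconds by default
    sysdig -M "$seconds" \
        -p "%evt.time,%user.name,%proc.name" \
        --cri /run/containerd/containerd.sock \
        "container.name=$container and evt.type=execve and evt.dir=<" \
        >> "$outfile"
}

# Usage on the worker node, as root (not executed here):
#   capture_execs tomcat123 /opt/KSR00101/incidents/summary
```

If the task wants the numeric uid rather than the username, %user.uid can be used in place of %user.name.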

Solution - falco

Check whether falco is installed

Command

falco --help

Check that the tomcat123 Pod exists

Command

kubectl get po

Write a falco rule

rule1.yaml

- rule: rule1
  desc: rule1
  condition: container.name = "tomcat123"
  output: "%evt.time,%user.name,%proc.name"
  priority: WARNING
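
The rule above fires on every event from the container. A narrower variant that only reports executed processes could look like the sketch below. It assumes a Falco version that still distinguishes enter and exit events through evt.dir, and it spells the condition out instead of using the spawned_process macro, since the default macros are not loaded when Falco is started with only this file via -r. The file path and the rule name are made up for illustration.

```shell
# Write a hypothetical rule file that matches only completed execve calls
# inside the tomcat123 container.
rule_file=/tmp/rule1-exec.yaml    # illustrative path, not from the original post
cat > "$rule_file" <<'EOF'
- rule: tomcat123_exec
  desc: Processes executed inside the tomcat123 container
  condition: evt.type = execve and evt.dir = < and container.name = "tomcat123"
  output: "%evt.time,%user.name,%proc.name"
  priority: WARNING
EOF
echo "wrote $rule_file"
```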

Command

sudo falco -M 30 -r rule1.yaml >> /opt/KSR00101/incidents/summary
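
Falco normally prefixes each alert on stdout with its own timestamp and the rule priority, and it may print summary statistics when it exits, so the redirected file can contain more than the three required fields. Assuming alert lines of the form "<time>: <Priority> <output>", a sed filter along these lines keeps only the rule output. strip_falco_prefix is a hypothetical name, and the sample line in the comment is invented for illustration.

```shell
# Hypothetical filter: keep only the "timestamp,user,process" part of
# Falco alert lines and drop everything else (for example exit statistics).
# Invented sample input:  10:15:30.123456789: Warning 10:15:30.123456789,root,ls
strip_falco_prefix() {
    sed -nE 's/^[^ ]+: [A-Za-z]+ ([^,]+,[^,]*,[^,]+)$/\1/p'
}

# Usage on the worker node (not executed here):
#   sudo falco -M 30 -r rule1.yaml | strip_falco_prefix >> /opt/KSR00101/incidents/summary
```

The timestamp field itself is passed through untouched, so the tool's original format is preserved.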

Wait 30 seconds. While waiting, open another tab, enter the container, and run commands such as ls.
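
For practice runs, the manual step of running commands inside the container can be scripted so there is something to detect during the capture window. This is a rough sketch: generate_activity is a hypothetical helper, and it assumes kubectl in the second tab can reach the cluster.

```shell
# Hypothetical helper: run "ls /" in the tomcat123 Pod a number of times,
# pausing between runs, so the capture sees repeated process executions.
generate_activity() {
    rounds="${1:-10}"    # how many times to run the command
    pause="${2:-2}"      # seconds to sleep between runs
    i=0
    while [ "$i" -lt "$rounds" ]; do
        kubectl exec tomcat123 -- ls / >/dev/null
        sleep "$pause"
        i=$((i + 1))
    done
}

# Usage in a second tab while the capture is running (not executed here):
#   generate_activity 10 2
```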

Error (when installing on CentOS)

Unable to load the driver
error opening device /dev/scap0. Make sure you have root credentials and that the scap module is loaded: No such file or directory
modprobe scap
modprobe: ERROR: could not insert 'scap': Required key not available

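After looking into it, this turned out to be an OS-level problem, so I solved the question in the simulated environment instead.

Simulated environment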

References

github-sysdig

wiki-How-to-Install-Sysdig-for-Linux

sysdig.com

Youtube-Sysdig Open Source - Getting Started With Csysdig

github-falco

falco-rules
