K8s Diagnostics: A Pod OOM with Anomalous shmem Accounting

Summary: A customer's Java application has Xms/Xmx pinned at 8 GB while the pod's limit is 16 GB, yet the pod was OOM-killed again and again. At OOM time the business process itself reported 8 GB of usage, and Prometheus showed the pod's working_set_memory at just over 8 GB as well. So where did the remaining several GB of memory go?

Background:

A customer reported that their Java application has Xms/Xmx pinned at 8 GB while the pod's limit is 16 GB, yet the pod was OOM-killed again and again. At OOM time the business process itself reported 8 GB of usage, and Prometheus showed the pod's working_set_memory at just over 8 GB as well. So where did the remaining several GB of memory go?


Several ways to inspect a pod's memory usage:

1. Viewing pod memory usage via cgroup statistics:

By default, the memory.stat you see inside a pod already maps to the container's own cgroup directory; it does not include the pause container or the memory statistics of the pod's cgroup root directory.

# cat /sys/fs/cgroup/memory/memory.stat 
cache 1066070016
rss 4190208
rss_huge 0
shmem 1048363008
mapped_file 4730880
dirty 135168
writeback 0
swap 0
workingset_refault_anon 0
workingset_refault_file 405504
workingset_activate_anon 0
workingset_activate_file 0
workingset_restore_anon 0
workingset_restore_file 0
workingset_nodereclaim 0
pgpgin 267531
pgpgout 6185
pgfault 7293
pgmajfault 99
inactive_anon 528506880
active_anon 524181504
inactive_file 11718656
active_file 5677056
unevictable 0
hierarchical_memory_limit 1073741824
hierarchical_memsw_limit 1073741824
total_cache 1066070016
total_rss 4190208
total_rss_huge 0
total_shmem 1048363008
total_mapped_file 4730880
total_dirty 135168
total_writeback 0
total_swap 0
total_workingset_refault_anon 0
total_workingset_refault_file 405504
total_workingset_activate_anon 0
total_workingset_activate_file 0
total_workingset_restore_anon 0
total_workingset_restore_file 0
total_workingset_nodereclaim 0
total_pgpgin 267531
total_pgpgout 6185
total_pgfault 7293
total_pgmajfault 99
total_inactive_anon 528506880
total_active_anon 524181504
total_inactive_file 11718656
total_active_file 5677056
total_unevictable 0
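
The output above covers only the container's own cgroup. To see the pod-level totals (the pause container plus every business container), read the pod's cgroup directory on the node instead. A minimal sketch, assuming cgroup v1, the systemd cgroup driver, and a burstable-QoS pod (the pod name is a placeholder):

# on the node: dashes in the pod UID become underscores in the slice name
POD_UID=$(kubectl get pod <pod-name> -o jsonpath='{.metadata.uid}')
cat /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod${POD_UID//-/_}.slice/memory.stat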

Memory type classification in cgroup:

memory.stat provides the most complete information:

Why you should not rely on usage_in_bytes:

5.5 usage_in_bytes
For efficiency, as other kernel components, memory cgroup uses some optimization
to avoid unnecessary cacheline false sharing. usage_in_bytes is affected by the
method and doesn't show 'exact' value of memory (and swap) usage, it's a fuzz
value for efficient access. (Of course, when necessary, it's synchronized.)
If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP)
value in memory.stat(see 5.2).
Reference: https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt
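
Following that advice, here is a minimal sketch (run inside the container; cgroup v1 paths assumed) that derives the exact usage from memory.stat for comparison against the fuzz value:

# exact usage per the kernel doc: rss + cache (+ swap) from memory.stat
awk '/^rss /{r=$2} /^cache /{c=$2} /^swap /{s=$2} END{print r+c+s" bytes (rss+cache+swap)"}' /sys/fs/cgroup/memory/memory.stat
# the fuzz value, for comparison
cat /sys/fs/cgroup/memory/memory.usage_in_bytes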

2. Memory metrics from kubectl top pod:

# kubectl top pods
NAME                                             CPU(cores)   MEMORY(bytes)   
alpine-64ccd465b4-mzsqw                          0m           0Mi             
golang-7cc44b559c-bt6gn                          0m           0Mi             
my-nettools-multicluster-test-75497d67fb-8w5z7   1m           3Mi             
my-nettools-multicluster-test-75497d67fb-fmskd   1m           3Mi             
my-nettools-primary-8584bffdf5-x97rx             1m           15Mi            
my-wordpress-8b77b598c-ncfdm                     1m           13Mi            
nginx-deployment-basic-598887494c-9jqp9          1m           2Mi             
nginx-qinhexing-6cb7c848b-kqbxn                  0m           1Mi             
stress-ceshi-nodename-7fdcb59799-zm275           0m           0Mi             
tomcat-875bdfdc-6p725                            1m           83Mi        

How kubectl top pod calculates memory:

The memory figure reported by kubectl top pod is not cadvisor's container_memory_usage_bytes but container_memory_working_set_bytes, computed as:

container_memory_usage_bytes = container_memory_rss + container_memory_cache + kernel memory
container_memory_working_set_bytes = container_memory_usage_bytes - total_inactive_file (inactive file-backed page cache)
That is: container_memory_working_set_bytes = container_memory_rss + container_memory_cache + kernel memory (usually negligible) - total_inactive_file

container_memory_working_set_bytes is the memory the container is genuinely using, and it is also what the OOM decision against the limit is based on (some active_file pages are reclaimable, but if an OOM fires, it means there was no reclaimable memory left).
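
A minimal sketch of this arithmetic, run from inside a container (cgroup v1 paths assumed); the result should roughly match what kubectl top pod reports:

usage=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
inactive_file=$(awk '/^total_inactive_file /{print $2}' /sys/fs/cgroup/memory/memory.stat)
# working set = usage - inactive file cache, floored at zero
echo $(( usage > inactive_file ? usage - inactive_file : 0 ))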

3. Viewing via the top command inside the pod:

Running top inside a pod is a trap, so beware: the upper section (CPU, memory, and so on) shows the node's resources, while only the Tasks line and the per-PID rows below it reflect the pod itself.

Tasks:   8 total,   1 running,   7 sleeping,   0 stopped,   0 zombie
%Cpu0  : 53.5 us,  4.4 sy,  0.0 ni, 41.8 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu1  : 52.0 us,  2.7 sy,  0.0 ni, 44.9 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu2  : 41.7 us,  6.1 sy,  0.0 ni, 51.2 id,  0.7 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu3  : 41.1 us,  6.1 sy,  0.0 ni, 52.2 id,  0.3 wa,  0.0 hi,  0.3 si,  0.0 st
MiB Mem :  15752.1 total,    352.3 free,   5164.6 used,  10235.1 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  10118.9 avail Mem 
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                      
      1 root      20   0    2540     92      0 S   0.0   0.0   0:00.02 sleep                                                                                        
     23 root      20   0   58736   1520      0 S   0.0   0.0   0:00.00 nginx                                                                                        
     24 www-data  20   0   59064   2564    720 S   0.0   0.0   0:17.95 nginx                                                                                        
     25 www-data  20   0   59064   1820      0 S   0.0   0.0   0:00.00 nginx                                                                                        
     26 www-data  20   0   59064   1820      0 S   0.0   0.0   0:00.00 nginx                                                                                        
     27 www-data  20   0   59064   1820      0 S   0.0   0.0   0:00.00 nginx                                                                                        
    179 root      20   0    4468   3824   3168 S   0.0   0.0   0:00.00 bash 

4. Viewing memory usage via docker and crictl:

# docker stats e00a11510727
CONTAINER ID        NAME                                                                                                  CPU %               MEM USAGE / LIMIT   MEM %               NET I/O             BLOCK I/O           PIDS
e00a11510727        k8s_my-nettools_my-nettools-primary-8584bffdf5-x97rx_default_a9d684a4-53fe-4114-b8d0-8f138b933551_1   0.00%               3.82MiB / 1GiB      0.37%               0B / 0B             21.2MB / 1.59MB     6
# crictl stats --id 72dc2a4abc659
CONTAINER           CPU %               MEM                 DISK                INODES
72dc2a4abc659       0.00                3.92MB              16.68MB             9

5. Querying the monitoring data recorded in metrics via the apiserver:

The metrics obtained this way have already been computed; in other words, the metrics API hands you ready-made values.

kubectl get --raw   "/apis/metrics.k8s.io/v1beta1/namespaces/sj8***rod/pods/prod****-zj67p" |jq
{
  "kind": "PodMetrics",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "name": "prod-****-zj67p",
    "namespace": "s****d",
    "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/sj****od/pods/prod-****-zj67p",
    "creationTimestamp": "2021-11-09T13:03:36Z"
  },
  "timestamp": "2021-11-09T13:03:20Z",
  "window": "30s",
  "containers": [
    {
      "name": "prod-****ess",
      "usage": {
        "cpu": "1045813398n",
        "memory": "2513804Ki"
      }
    }
  ]
}
You can also access these APIs directly with kubectl, for example:
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes/<node-name>
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/<namespace-name>/pods/<pod-name>



6. Viewing the data collected by kubelet's cadvisor:

The cadvisor summary is also a computed metric. In fact, if you know how to read cadvisor's output, the answer is already visible at this point; before this issue came up, though, I had never looked closely at the cadvisor metrics.

To view cadvisor's aggregated data, fetch the summary directly like this:
curl 127.0.0.1:10255/stats/summary
  {
 "node": {
  "nodeName": "cn-beijing.192.168.0.237",
  "systemContainers": [
   {
    "name": "kubelet",
    "startTime": "2021-09-06T02:28:36Z",
    "cpu": {
     "time": "2021-11-22T06:42:50Z",
     "usageNanoCores": 145371200,
     "usageCoreNanoSeconds": 827652227406905
    },
    "memory": {
     "time": "2021-11-22T06:42:50Z",
     "usageBytes": 196685824,
     "workingSetBytes": 161320960,
     "rssBytes": 126496768,
     "pageFaults": 5177146887,
     "majorPageFaults": 2343
    }
   },
   {
    "name": "runtime",
    "startTime": "2021-10-09T05:58:09Z",
    "cpu": {
     "time": "2021-11-22T06:42:46Z",
     "usageNanoCores": 100393752,
     "usageCoreNanoSeconds": 594566183951015
    },
    "memory": {
     "time": "2021-11-22T06:42:46Z",
     "usageBytes": 1394094080,
     "workingSetBytes": 1075867648,
     "rssBytes": 124157952,
     "pageFaults": 6155392122,
     "majorPageFaults": 924
    }
   },
   {
    "name": "pods",
    "startTime": "2021-10-09T05:57:59Z",
    "cpu": {
     "time": "2021-11-22T06:43:02Z",
     "usageNanoCores": 1284929792,
     "usageCoreNanoSeconds": 4873156254419795
    },
    "memory": {
     "time": "2021-11-22T06:43:02Z",
     "availableBytes": 11415199744,
     "usageBytes": 7008845824,
     "workingSetBytes": 4368039936,
     "rssBytes": 3479117824,
     "pageFaults": 0,
     "majorPageFaults": 0
    }
   }
  ],
  "startTime": "2021-09-06T02:28:25Z",
  "cpu": {
   "time": "2021-11-22T06:43:02Z",
   "usageNanoCores": 1715353110,
   "usageCoreNanoSeconds": 9464334847104386
  },
  "memory": {
   "time": "2021-11-22T06:43:02Z",
   "availableBytes": 6170161152,
   "usageBytes": 14745010176,
   "workingSetBytes": 10347081728,
   "rssBytes": 4769071104,
   "pageFaults": 31828203,
   "majorPageFaults": 132
  },
 ...
   {
   "podRef": {
    "name": "my-net****x97rx",
    "namespace": "default",
    "uid": "a9d684a4-53fe-4114-b8d0-8f138b933551"
   },
   "startTime": "2021-11-15T12:16:16Z",
   "containers": [
    {
     "name": "my-nettools",
     "startTime": "2021-11-18T13:27:44Z",
     "cpu": {
      "time": "2021-11-22T06:43:02Z",
      "usageNanoCores": 50389,
      "usageCoreNanoSeconds": 38669070971
     },
     "memory": {
      "time": "2021-11-22T06:43:02Z",
      "availableBytes": 1058471936, 这个pod的内存统计可用很多,已用很小
      "usageBytes": 34553856,
      "workingSetBytes": 15269888,
      "rssBytes": 3649536,
      "pageFaults": 24684,
      "majorPageFaults": 165
     },
     "rootfs": {
      "time": "2021-11-22T06:43:02Z",
      "device": "",
      "availableBytes": 57990582272,
      "capacityBytes": 126692048896,
      "usedBytes": 10391552,
      "inodesFree": 7291487,
      "inodes": 7864320,
      "inodesUsed": 30
     },
     "logs": {
      "time": "2021-11-22T06:43:02Z",
      "device": "",
      "availableBytes": 57990582272,
      "capacityBytes": 126692048896,
      "usedBytes": 28672,
      "inodesFree": 7291487,
      "inodes": 7864320,
      "inodesUsed": 572833
     }
    }
   ],
   "cpu": {
    "time": "2021-11-22T06:42:58Z",
    "usageNanoCores": 77743,
    "usageCoreNanoSeconds": 715087495132
   },
   "memory": {
    "time": "2021-11-22T06:42:58Z",
    "availableBytes": 114749440,  注意这里的可用
    "usageBytes": 978309120, 已经用了九百多m了
    "workingSetBytes": 958992384, working set也是900多m
    "rssBytes": 3649536,
    "pageFaults": 0,
    "majorPageFaults": 0
   },
   "network": {
    "time": "2021-11-22T06:42:55Z",
    "name": "eth0",
    "rxBytes": 61572501,
    "rxErrors": 0,
    "txBytes": 76220238,
    "txErrors": 0,
    "interfaces": [
     {
      "name": "eth0",
      "rxBytes": 61572501,
      "rxErrors": 0,
      "txBytes": 76220238,
      "txErrors": 0
     }
    ]
   },
   "volume": [
    {
     "time": "2021-11-22T06:42:39Z",
     "device": "",
     "availableBytes": 7314903040,
     "capacityBytes": 8258621440,
     "usedBytes": 943718400,  这个就是我后面测试要用到的tmpfs对应的emptydir
     "inodesFree": 2016263,
     "inodes": 2016265,
     "inodesUsed": 2,
     "name": "volume-1623324311949" 我的tmpfsdir的名称
    },
    {
     "time": "2021-11-15T12:17:09Z",
     "device": "",
     "availableBytes": 8258609152,
     "capacityBytes": 8258621440,
     "usedBytes": 12288,
     "inodesFree": 2016256,
     "inodes": 2016265,
     "inodesUsed": 9,
     "name": "default-token-kn777"
    }
   ],
   "ephemeral-storage": {
    "time": "2021-11-22T06:43:02Z",
    "device": "",
    "availableBytes": 57990582272,
    "capacityBytes": 126692048896,
    "usedBytes": 10420224,
    "inodesFree": 7291487,
    "inodes": 7864320,
    "inodesUsed": 30
   }
  },
You can also look at cadvisor's detailed metrics, but they are fairly hard to read; the summary is the recommended view, since it is the source of the numbers behind kubectl top pod and the monitoring platforms.
# curl http://127.0.0.1:10255/metrics/cadvisor  |grep my-*****-x97rx|grep -i memory
container_memory_cache{container="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda9551.slice",image="",name="",namespace="default",pod="my-nettools-primary-8584bffdf5-x97rx"} 9.74966784e+08 1637563886638
container_memory_cache{container="POD",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda9d51.slice/docker-6b14fd.scope",image="registry-vpc.cn-beijing.aliyuncs.com/acs/pause:3.2",name="k8s_POD_my-nettools-primary-8584bffdf5-x97rx_default_a9d684a4-53fe-4114-b8d0-8f138b933551_0",namespace="default",pod="my-nett****-x97rx"} 0 1637563884720
...
container_spec_memory_swap_limit_bytes{container="my-nettools",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda9d651.slice/docker-e031fe3f53.scope",image="sha256:fa7b0d7ccb2a5d0174b4eb0972fad721af98d3e0e290dbfacb9b3537152c0580",name="k8s_my-nettools_****x97rx_default_a9d684a4-53fe-4114-b8d0-8f138b933551_1",namespace="default",pod="my-****97rx"} 1.073741824e+09
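
To pull just the per-pod memory figures out of the summary, a small jq sketch (assuming the read-only kubelet port 10255 used above):

curl -s 127.0.0.1:10255/stats/summary | jq '.pods[] | {pod: .podRef.name, usageBytes: .memory.usageBytes, workingSetBytes: .memory.workingSetBytes}'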

Calculating the number of CPU cores used from cgroup values:

A simple calculation script:

# cores used = delta of cumulative CPU time (ns) / delta of wall-clock time (ns)
tstart=$(date +%s%N); cstart=$(cat /sys/fs/cgroup/cpu/cpuacct.usage)
sleep 5
tstop=$(date +%s%N); cstop=$(cat /sys/fs/cgroup/cpu/cpuacct.usage)
awk 'BEGIN{printf "%.2f\n",'$((cstop - cstart))'/'$((tstop - tstart))'}'


OOM log captured when the issue occurred in this case:

Nov 11 06:55:00 iZb10Z kernel: GC Thread#2 invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=873
Nov 11 06:55:00 iZ0Z kernel: GC Thread#2 cpuset=docker-24ec5b44****338000ff2f7516.scope mems_allowed=0
Nov 11 06:55:00 iZbp15by10Z kernel: CPU: 3 PID: 213115 Comm: GC Thread#2 Tainted: G        W  OE     4.19.91-22.2.al7.x86_64 #1
Nov 11 06:55:00 iZbp1nenhby10Z kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8a46cfe 04/01/2014
Nov 11 06:55:00 ienhby10Z kernel: Call Trace:
Nov 11 06:55:00 iZbp15y10Z kernel: dump_stack+0x66/0x8b
Nov 11 06:55:00 iZbpy10Z kernel: dump_memcg_header+0x12/0x40
Nov 11 06:55:00 iZb10Z kernel: oom_kill_process+0x219/0x310
Nov 11 06:55:00 iZbZ kernel: out_of_memory+0xf7/0x4c0
Nov 11 06:55:00 iZb10Z kernel: mem_cgroup_out_of_memory+0xc2/0xe0
Nov 11 06:55:00 iZbpy10Z kernel: try_charge+0x7b4/0x810
Nov 11 06:55:00 i0Z kernel: mem_cgroup_charge+0xfe/0x250
Nov 11 06:55:00 iZb10Z kernel: do_anonymous_page+0xe1/0x5b0
Nov 11 06:55:00 iZbp0Z kernel: __handle_mm_fault+0x665/0xa20
Nov 11 06:55:00 iZb0Z kernel: handle_mm_fault+0x106/0x1c0
Nov 11 06:55:00 iZbp10Z kernel: __do_page_fault+0x1b7/0x470
Nov 11 06:55:00 iZbpy10Z kernel: do_page_fault+0x32/0x140
Nov 11 06:55:00 iZbhby10Z kernel: ? async_page_fault+0x8/0x30
Nov 11 06:55:00 iZbpnhby10Z kernel: async_page_fault+0x1e/0x30
Nov 11 06:55:00 iZbpy10Z kernel: RIP: 0033:0x7faef467a164
Nov 11 06:55:00 iZbpy10Z kernel: Code: 43 38 49 89 45 38 48 8b 43 30 49 89 45 30 48 8b 43 28 49 89 45 28 48 8b 43 20 49 89 45 20 48 8b 43 18 49 89 45 18 48 8b 43 10 <49> 89 45 10 48 8b 43 08 49 89 45 08 48 8b 03 49 89 45 00 e9 34 ff
Nov 11 06:55:00 iZbpy10Z kernel: RSP: 002b:00007fae8e5b69c0 EFLAGS: 00010217
Nov 11 06:55:00 iZbpy10Z kernel: RAX: 0000000087a956d8 RBX: 000000077c6f5c90 RCX: 000000067d541003
Nov 11 06:55:00 iZbpy10Z kernel: RDX: ffffffffff85c208 RSI: 0000000000000001 RDI: 00007fae5c0219f0
Nov 11 06:55:00 iZbpy10Z kernel: RBP: 00007fae8e5b6a30 R08: 0000000000000000 R09: 00000000000005c7
Nov 11 06:55:00 iZbpy10Z kernel: R10: 0000000000000001 R11: 000000000000000f R12: 00007fae5c017290
Nov 11 06:55:00 iZbpy10Z kernel: R13: 000000067d541000 R14: 0000000000000003 R15: 00007fae5c021a80
Nov 11 06:55:00 iZbpy10Z kernel: Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod96bc****670f9.slice/docker-24ec*5ae338000ff2f7516.scope killed as a result of limit of /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod96bc17dc_f218_4e61_be05_c945fa8670f9.slice
The following lines show the pod's memory usage against its limit; the limit was hit, triggering the OOM:
Nov 11 06:55:00 iZy10Z kernel: memory: usage 16777216kB, limit 16777216kB, failcnt 17883
Nov 11 06:55:00 iZy10Z kernel: memory+swap: usage 16777216kB, limit 9007199254740988kB, failcnt 0
Nov 11 06:55:00 iZbpy10Z kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
The next line is the cgroup's printout of what each docker directory under the pod's cgroup root was using; from it you can see that a single docker scope held roughly 8 GB of shmem (shmem is counted under cache):
Nov 11 06:55:00 iZbpy10Z kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-po*34a0838.scope: cache:8840436KB rss:388KB rss_huge:0KB shmem:8840436KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB workingset_refault_anon:0KB workingset_refault_file:0KB workingset_activate_anon:0KB workingset_activate_file:0KB workingset_restore_anon:0KB workingset_restore_file:0KB workingset_nodereclaim:0KB inactive_anon:4207500KB active_anon:4631220KB inactive_file:748KB active_file:0KB unevictable:0KB
This next line is the container cgroup of the main business process at the time, showing that the cgroup had recorded the pod's "residual" memory usage:
Nov 11 06:55:00 iZbp1*10Z kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-p*8000ff2f7516.scope: cache:132KB rss:7917200KB rss_huge:7163904KB shmem:0KB mapped_file:660KB dirty:0KB writeback:0KB swap:0KB workingset_refault_anon:0KB workingset_refault_file:1716KB workingset_activate_anon:0KB workingset_activate_file:0KB workingset_restore_anon:0KB workingset_restore_file:0KB workingset_nodereclaim:0KB inactive_anon:7917060KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Below is the OOM log after the cgroup memory limit was hit: each task was scored and the java process was killed (rss here is a page count; one page is 4 KB):
Nov 11 06:55:00 iZbphby10Z kernel: Tasks state (memory values in pages):
Nov 11 06:55:00 iZbphby10Z kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Nov 11 06:55:00 iZbphby10Z kernel: [2064649]     0 2064649      242        1    24576        0          -998 pause
Nov 11 06:55:00 iZbphby10Z kernel: [ 212336]  1001 212336     1095      184    49152        0           873 tini
Nov 11 06:55:00 iZbphby10Z kernel: [ 212425]  1001 212425  4025962  1985479 16760832        0           873 java
Nov 11 06:55:00 iZbphby10Z kernel: oom_reaper: reaped process 212425 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:32kB

The cause of the anomalous shmem usage in the OOM: emptyDir

An emptyDir volume is created when a Pod is assigned to a node, and it exists as long as that Pod is running on the node. As the name says, the volume is initially empty. The containers in the Pod can all read and write the same files in the emptyDir volume, even though that volume may be mounted at the same or different paths in each container. When a Pod is removed from a node for any reason, the data in the emptyDir is deleted permanently.

Note: a container crashing does not remove a Pod from its node, so the data in an emptyDir volume is safe across container crashes.

Some uses for an emptyDir:

  • scratch space, such as for a disk-based merge sort;
  • checkpointing a long computation so it can recover from crashes;
  • holding files that a content-manager container fetches while a webserver container serves the data.

Depending on your environment, emptyDir volumes are stored on whatever medium backs the node: disk, SSD, or network storage. However, you can set the emptyDir.medium field to "Memory" to tell Kubernetes to mount a tmpfs (a RAM-backed filesystem) for you instead. While tmpfs is very fast, be aware that unlike disks, tmpfs is cleared on node reboot, and any files you write count against your container's memory limit.


Reproducing and verifying the problem:

1. Create a memory-medium emptyDir:

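A minimal manifest sketch, with hypothetical names; the 1Gi limit and /memtest path mirror the test pod used elsewhere in this article:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: tmpfs-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      limits:
        memory: 1Gi        # tmpfs writes are charged against this limit
    volumeMounts:
    - mountPath: /memtest
      name: memtest-volume
  volumes:
  - name: memtest-volume
    emptyDir:
      medium: Memory       # mount a tmpfs instead of node-local disk
EOF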

2. Put some large files into the mounted directory:

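For example, a sketch that writes 900 MB of zeros into the tmpfs mount of the hypothetical pod above:

kubectl exec -it tmpfs-demo -- dd if=/dev/zero of=/memtest/900m bs=1M count=900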

3. Observe with kubectl top pod:

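Something like the following; the pod's working set should now include the roughly 900 MB of tmpfs pages even though no process RSS accounts for them:

kubectl top pod tmpfs-demo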

The puzzle at the time:

kubepods-burstable-***0f9.slice being identical means these belong to the same pod, while

docker-295fa7c9e5ed1***14d6b5180fdfc34a0838

docker-24ec5b440a5****38000ff2f7516.scope

being different means these are two containers. But normally the pod's cgroup directory contains only the pause container's and the main business container's docker directories; here a "historical" docker directory had appeared, which was very suspicious!

Reference OOM log from the reproduction:

Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: dd invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=968
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: dd cpuset=docker-45c3345632cb4d19698badde97941c43f6c3593f21d98a847049b5f03805b642.scope mems_allowed=0
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: CPU: 1 PID: 2992940 Comm: dd Kdump: loaded Tainted: G           OE     4.19.91-22.2.al7.x86_64 #1
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: Call Trace:
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  dump_stack+0x66/0x8b
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  dump_memcg_header+0x12/0x40
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  oom_kill_process+0x219/0x310
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  out_of_memory+0xf7/0x4c0
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  mem_cgroup_out_of_memory+0xc2/0xe0
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  try_charge+0x7b4/0x810
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  mem_cgroup_charge+0xfe/0x250
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  shmem_add_to_page_cache+0x1d6/0x340
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  ? shmem_alloc_and_acct_page+0x76/0x1d0
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  shmem_getpage_gfp+0x5df/0xce0
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  ? copyin+0x22/0x30
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  shmem_getpage+0x2d/0x40
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  generic_perform_write+0xb2/0x190
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  __generic_file_write_iter+0x184/0x1c0
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  ? __handle_mm_fault+0x665/0xa20
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  generic_file_write_iter+0xec/0x1d0
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  new_sync_write+0xdb/0x120
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  vfs_write+0xad/0x1a0
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  ? vfs_read+0x110/0x130
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  ksys_write+0x4a/0xc0
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  do_syscall_64+0x5b/0x1b0
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: RIP: 0033:0x7fe35dee5c27
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: Code: Bad RIP value.
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: RSP: 002b:00007ffcb77532b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: RAX: ffffffffffffffda RBX: 0000000000100000 RCX: 00007fe35dee5c27
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: RDX: 0000000000100000 RSI: 00007fe35dcd9000 RDI: 0000000000000001
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: RBP: 00007fe35dcd9000 R08: 00007fe35dcd8010 R09: 00007fe35dcd8010
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: R10: 00007fe35dcd9000 R11: 0000000000000246 R12: 0000000000000000
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: R13: 0000000000000000 R14: 00007fe35dcd9000 R15: 0000000000100000
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc00e47e7_5c66_4885_a929_607663ed3cd4.slice/docker-45c3345632cb4d19698badde97941c43f6c3593f21d98a847049b5f03805b642.scope killed as a result of limit of /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc00e47e7_5c66_4885_a929_607663ed3cd4.slice
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: memory: usage 1048576kB, limit 1048576kB, failcnt 243
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: memory+swap: usage 1048576kB, limit 9007199254740988kB, failcnt 0
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc00e47e7_5c66_4885_a929_607663ed3cd4.slice: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB workingset_refault_anon:0KB workingset_refault_file:0KB workingset_activate_anon:0KB workingset_activate_file:0KB workingset_restore_anon:0KB workingset_restore_file:0KB workingset_nodereclaim:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc00e47e7_5c66_4885_a929_607663ed3cd4.slice/docker-27240dd20e37e46a25ae6c2219763a8123782762e981e46be2b3311620e17fcc.scope: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB workingset_refault_anon:0KB workingset_refault_file:0KB workingset_activate_anon:0KB workingset_activate_file:0KB workingset_restore_anon:0KB workingset_restore_file:0KB workingset_nodereclaim:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc00e47e7_5c66_4885_a929_607663ed3cd4.slice/docker-45c3345632cb4d19698badde97941c43f6c3593f21d98a847049b5f03805b642.scope: cache:1043196KB rss:5280KB rss_huge:0KB shmem:1042932KB mapped_file:132KB dirty:132KB writeback:0KB swap:0KB workingset_refault_anon:0KB workingset_refault_file:396KB workingset_activate_anon:0KB workingset_activate_file:0KB workingset_restore_anon:0KB workingset_restore_file:0KB workingset_nodereclaim:0KB inactive_anon:536316KB active_anon:511896KB inactive_file:156KB active_file:0KB unevictable:0KB
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: Tasks state (memory values in pages):
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: [2456337]     0 2456337      242        1    28672        0          -998 pause
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: [2456661]     0 2456661      634       22    45056        0           968 sleep
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: [2456827]     0 2456827    14684      382   102400        0           968 nginx
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: [2456828]    33 2456828    14766      457   106496        0           968 nginx
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: [2456829]    33 2456829    14766      457   106496        0           968 nginx
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: [2456830]    33 2456830    14766      457   106496        0           968 nginx
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: [2456831]    33 2456831    14766      457   106496        0           968 nginx
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: [2991826]     0 2991826     1116      165    49152        0           968 bash
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: [2992940]     0 2992940      903      272    45056        0           968 dd
Nov 12 12:46:50 iZ2zeeh81ypipbs4e866uzZ kernel: oom_reaper: reaped process 2456831 (nginx), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Related docker inspect details:

# docker inspect  45c3345632cb |grep -i pid
            "Pid": 2456661,
            "PidMode": "",
            "PidsLimit": null,
[root@iZ2ze***6uzZ ~]# more /proc/2456661/cgroup 
12:memory:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc00e47e7_5c66_4885_a929_607663ed3cd4.slice/docker-45c3345632cb4d19698badde97941c43f6c3593f21d98a847049b5f03805b642.scope
...

A leftover puzzle:

Collecting shmem-related information with a script revealed that, from the beginning, the shmem was recorded against the pod's cgroup root directory:

podbasedir=/sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod***3551.slice
# sample the pod-level and per-container memory.stat once per minute
for i in {1..86400}
  do
  echo $(date) >> /root/shmem-leak-check.log
  echo "$podbasedir has this docker dir" >>/root/shmem-leak-check.log
  ls -l $podbasedir |grep docker >>/root/shmem-leak-check.log
  echo "Find docker containers and write docker inspect name,the POD is pause" >>/root/shmem-leak-check.log
  # chars 8-15 of "docker-<id>.scope" are the first 8 chars of the container ID
  ls -l $podbasedir |grep docker |awk '{print $NF}'|cut -b 8-15|xargs -n1 -I {}  docker inspect  --format="{{json .Name}}" {} >>/root/shmem-leak-check.log
  echo "Start record total memory.stat " >>/root/shmem-leak-check.log
  egrep "total_cache|total_rss|total_shmem|total_inactive_file" $podbasedir/memory.stat >>/root/shmem-leak-check.log
  echo "Start record docker memory.stat" >>/root/shmem-leak-check.log 
  egrep "total_cache|total_rss|total_shmem|total_inactive_file" $podbasedir/docker-*/memory.stat >>/root/shmem-leak-check.log
  sleep 60s
  done

Sampling output:

The first total_shmem is from the pod's cgroup directory; the ones that follow are from the sampled docker cgroup directories. The pod total shows shmem while the docker directories show none, which differs from what the OOM kill actually reported.

Thu Nov 18 13:37:49 CST 2021
/sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod96bc10f9.slice has this docker dir
drwxr-xr-x 2 root root 0 Nov 17 19:42 docker-3ebefb627.scope
drwxr-xr-x 2 root root 0 Jul 14 17:28 docker-a32b86285.scope
find docker containers and write docker inspect name,the POD is pause
"/k8s_traden-prod_96bc17dc-f218-4e61-be05-c945fa8670f9_44"
"/k8s_POD_trade-c7dc-f218-4e61-be05-c945fa8670f9_0"
start record total memory.stat 
total_cache 9075990528
total_rss 6361980928
total_rss_huge 0
total_shmem 9069502464
total_inactive_file 2592768
start record docker memory.stat
/sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstab45fa8670f9.slice/docker-3eb2fc3dbd3c99edf0937a362fb627.scope/memory.stat:total_cache 1892352
/sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstab45fa8670f9.slice/docker-3eb2fc3dbd3c99edf0937a362fb627.scope/memory.stat:total_rss 6373216256
/sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstab45fa8670f9.slice/docker-3eb2fc3dbd3c99edf0937a362fb627.scope/memory.stat:total_rss_huge 5605687296
/sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstab45fa8670f9.slice/docker-3eb2fc3dbd3c99edf0937a362fb627.scope/memory.stat:total_shmem 0
/sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstab45fa8670f9.slice/docker-3eb2fc3dbd3c99edf0937a362fb627.scope/memory.stat:total_inactive_file 1892352
/sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstab45fa8670f9.slice/docker-a32d946806ba0525a9af6cb9c86285.scope/memory.stat:total_cache 0
/sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstab45fa8670f9.slice/docker-a32d946806ba0525a9af6cb9c86285.scope/memory.stat:total_rss 180224
/sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstab45fa8670f9.slice/docker-a32d946806ba0525a9af6cb9c86285.scope/memory.stat:total_rss_huge 0
/sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstab45fa8670f9.slice/docker-a32d946806ba0525a9af6cb9c86285.scope/memory.stat:total_shmem 0
/sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstab45fa8670f9.slice/docker-a32d946806ba0525a9af6cb9c86285.scope/memory.stat:total_inactive_file 0

The reproduction steps were shared with the user, who reproduced the issue on their side:


Final conclusion:

When a pod OOMs, the pod is not deleted and recreated; only the container is recreated. Neither the newly created container nor the OOM-killed one clears the contents of the tmpfs, because an emptyDir is only cleaned up when the pod is deleted or the node is rebooted. The new container's own docker cgroup directory therefore records no shmem, while the old container's shmem charge moves up to the pod's cgroup root for accounting. That is why, when the OOM finally fires, the new container's directory shows relatively little memory usage.


Bonus: how to inspect the actual data held in memory:

1. First, check the pod's memory usage


2. Copy a large file into njq7t; here I use the messages log

kubectl cp /var/log/messages my-nettools-6fb6864b8c-njq7t:/root

Files that are too large tend to fail (you can download them inside the pod instead); docker cp has the same issue. The error looks like this:

# docker cp /var/log/messages c210654bd1f9:/root
ERRO[0006] Can't add file /var/log/messages to tar: archive/tar: write too long

3. Open the messages file inside the pod and search for various keywords; you can watch memory usage climb


4. Check the memory usage breakdown inside the pod

The units here are bytes; divide by 1024 twice to get MB. You can see that cache accounts for most of the usage.
~# cat /sys/fs/cgroup/memory/memory.stat 
cache 442810368
rss 5812224
rss_huge 0
shmem 0
mapped_file 3649536
dirty 0
writeback 0
swap 0
workingset_refault_anon 0
workingset_refault_file 733962240
workingset_activate_anon 0
workingset_activate_file 506609664
workingset_restore_anon 0
workingset_restore_file 72720384
workingset_nodereclaim 0
pgpgin 416988
pgpgout 307447
pgfault 235191
pgmajfault 231
inactive_anon 5677056
active_anon 0
inactive_file 211652608
active_file 231276544
unevictable 0
hierarchical_memory_limit 1073741824
hierarchical_memsw_limit 1073741824
total_cache 442810368
total_rss 5812224
total_rss_huge 0
total_shmem 0
total_mapped_file 3649536
total_dirty 0
total_writeback 0
total_swap 0
total_workingset_refault_anon 0
total_workingset_refault_file 733962240
total_workingset_activate_anon 0
total_workingset_activate_file 506609664
total_workingset_restore_anon 0
total_workingset_restore_file 72720384
total_workingset_nodereclaim 0
total_pgpgin 416988
total_pgpgout 307447
total_pgfault 235191
total_pgmajfault 231
total_inactive_anon 5677056
total_active_anon 0
total_inactive_file 211652608
total_active_file 231276544
total_unevictable 0
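
A quick conversion sketch for the cache line, for example:

awk '/^cache /{printf "cache: %.1f MB\n", $2/1024/1024}' /sys/fs/cgroup/memory/memory.stat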

5. So what data is actually sitting in this cache?

fincore could not capture it, and neither could pcstat.

Earlier I had opened the file with vi and run global keyword searches to drive the memory up, so the next step is to find vi's PID.


Grab it with the script below:

# dump every non-shared-library mapping of PID $1 by walking /proc/$1/maps
# and copying each address range out of /proc/$1/mem into <pid>_mem_<addr>.bin
procdump()
( 
    cat /proc/$1/maps | grep -Fv ".so" | grep " 0 " | awk '{print $1}' | ( IFS="-"
    while read a b; do
        dd if=/proc/$1/mem bs=$( getconf PAGESIZE ) iflag=skip_bytes,count_bytes \
           skip=$(( 0x$a )) count=$(( 0x$b - 0x$a )) of="$1_mem_$a.bin"
    done )
)

Output while capturing:

# procdump  1241520 
dd: ‘/proc/1241520/mem’: cannot skip to specified offset
14+0 records in
14+0 records out
57344 bytes (57 kB) copied, 0.000247268 s, 232 MB/s
dd: ‘/proc/1241520/mem’: cannot skip to specified offset
226221+0 records in
226221+0 records out
926601216 bytes (927 MB) copied, 4.14441 s, 224 MB/s
......
dd: ‘/proc/1241520/mem’: cannot skip to specified offset
dd: error reading ‘/proc/1241520/mem’: Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000151461 s, nan kB/s
dd: ‘/proc/1241520/mem’: cannot skip to specified offset
2+0 records in
2+0 records out
8192 bytes (8.2 kB) copied, 0.00014037 s, 58.4 MB/s

The script produces a set of files in the current directory, as shown below; let's look at what the 900-plus-MB one contains.


Result:

The biggest one, over 900 MB of memory, is entirely content from my messages log:

# hexdump -C -n 10000 1241520_mem_55c9bab6c000.bin |more
00000000  00 00 00 00 00 00 00 00  91 02 00 00 00 00 00 00  |................|
00000010  07 00 07 00 03 00 01 00  02 00 02 00 01 00 01 00  |................|
00000020  05 00 01 00 03 00 01 00  02 00 07 00 02 00 07 00  |................|
00000030  03 00 03 00 02 00 01 00  04 00 02 00 00 00 01 00  |................|
00000040  01 00 02 00 01 00 02 00  05 00 01 00 07 00 02 00  |................|
00000050  01 00 00 00 01 00 02 00  01 00 00 00 01 00 01 00  |................|
00000060  01 00 01 00 01 00 00 00  01 00 01 00 01 00 03 00  |................|
00000070  00 00 01 00 01 00 01 00  00 00 02 00 01 00 01 00  |................|
00000080  01 00 00 00 01 00 01 00  00 00 01 00 00 00 01 00  |................|
00000090  80 be ac c1 c9 55 00 00  60 87 b5 c2 c9 55 00 00  |.....U..`....U..|
000000a0  90 f6 b9 ba c9 55 00 00  f0 e0 e6 f1 c9 55 00 00  |.....U.......U..|
000000b0  b0 ef b9 ba c9 55 00 00  a0 b2 b7 ba c9 55 00 00  |.....U.......U..|
000000c0  e0 f8 dc ba c9 55 00 00  90 99 da ba c9 55 00 00  |.....U.......U..|
000000d0  00 34 da ba c9 55 00 00  d0 e8 b9 ba c9 55 00 00  |.4...U.......U..|
000000e0  00 75 b9 ba c9 55 00 00  10 05 c6 f1 c9 55 00 00  |.u...U.......U..|
000000f0  80 bb b9 ba c9 55 00 00  00 85 db ba c9 55 00 00  |.....U.......U..|
00000100  00 86 b9 ba c9 55 00 00  50 7b b9 ba c9 55 00 00  |.....U..P{...U..|
00000110  e0 f4 dc ba c9 55 00 00  90 f7 b9 ba c9 55 00 00  |.....U.......U..|
00000120  00 d7 b9 ba c9 55 00 00  e0 95 ea f1 c9 55 00 00  |.....U.......U..|
00000130  f0 81 da ba c9 55 00 00  30 56 e9 f1 c9 55 00 00  |.....U..0V...U..|
00000140  00 00 00 00 00 00 00 00  50 5f dc ba c9 55 00 00  |........P_...U..|
00000150  40 f7 dc ba c9 55 00 00  e0 60 dc ba c9 55 00 00  |@....U...`...U..|
00000160  c0 e9 e6 f1 c9 55 00 00  a0 dc b7 ba c9 55 00 00  |.....U.......U..|
00000170  a0 c2 b6 ba c9 55 00 00  50 e4 e6 f1 c9 55 00 00  |.....U..P....U..|
00000180  70 0c db ba c9 55 00 00  b0 ba dd ba c9 55 00 00  |p....U.......U..|
00000190  c0 b4 bb ba c9 55 00 00  00 00 00 00 00 00 00 00  |.....U..........|
000001a0  f0 ce eb f1 c9 55 00 00  a0 07 da ba c9 55 00 00  |.....U.......U..|
000001b0  a0 57 e9 f1 c9 55 00 00  00 00 00 00 00 00 00 00  |.W...U..........|
000001c0  30 1d d2 ba c9 55 00 00  00 5a e9 f1 c9 55 00 00  |0....U...Z...U..|
000001d0  80 fb c5 f1 c9 55 00 00  e0 f2 df f1 c9 55 00 00  |.....U.......U..|
000001e0  d0 99 c7 ba c9 55 00 00  00 00 00 00 00 00 00 00  |.....U..........|
000001f0  30 d1 eb f1 c9 55 00 00  b0 36 cf ba c9 55 00 00  |0....U...6...U..|
00000200  f0 8a da ba c9 55 00 00  40 e1 e6 f1 c9 55 00 00  |.....U..@....U..|
00000210  00 00 00 00 00 00 00 00  e0 41 dc ba c9 55 00 00  |.........A...U..|
00000220  b0 dd e6 f1 c9 55 00 00  20 fe c5 f1 c9 55 00 00  |.....U.. ....U..|
00000230  00 00 00 00 00 00 00 00  d0 0b da ba c9 55 00 00  |.............U..|
00000240  40 e6 e6 f1 c9 55 00 00  e0 05 c6 f1 c9 55 00 00  |@....U.......U..|
00000250  70 01 c6 f1 c9 55 00 00  00 00 00 00 00 00 00 00  |p....U..........|
00000260  b0 06 bc ba c9 55 00 00  10 d4 eb f1 c9 55 00 00  |.....U.......U..|
00000270  00 00 00 00 00 00 00 00  b0 4f e9 f1 c9 55 00 00  |.........O...U..|
00000280  00 00 00 00 00 00 00 00  80 97 ca bb c9 55 00 00  |.............U..|
00000290  00 00 00 00 00 00 00 00  e1 01 00 00 00 00 00 00  |................|
000002a0  5c 2c 22 e6 cc 55 00 00  10 c0 b6 ba c9 55 00 00  |\,"..U.......U..|
000002b0  69 5a 32 7a 65 69 7a 75  78 74 32 6e 61 71 38 31  |iZ2zeizuxt2naq81|
000002c0  32 74 32 70 66 39 5a 20  6b 75 62 65 6c 65 74 3a  |2t2pf9Z kubelet:|
000002d0  20 49 31 31 30 37 20 31  38 3a 34 36 3a 34 35 2e  | I1107 18:46:45.|
000002e0  35 35 33 37 35 33 20 20  20 20 20 35 31 34 20 6b  |553753     514 k|
000002f0  75 62 65 72 75 6e 74 69  6d 65 5f 6d 61 6e 61 67  |uberuntime_manag|
00000300  65 72 2e 67 6f 3a 36 35  30 5d 20 63 6f 6d 70 75  |er.go:650] compu|
00000310  74 65 50 6f 64 41 63 74  69 6f 6e 73 20 67 6f 74  |tePodActions got|
00000320  20 7b 4b 69 6c 6c 50 6f  64 3a 66 61 6c 73 65 20  | {KillPod:false |
00000330  43 72 65 61 74 65 53 61  6e 64 62 6f 78 3a 66 61  |CreateSandbox:fa|
00000340  6c 73 65 20 53 61 6e 64  62 6f 78 49 44 3a 31 36  |lse SandboxID:16|
00000350  39 65 65 66 37 33 36 38  38 31 65 35 39 36 36 35  |9eef736881e59665|
00000360  39 63 66 38 34 34 31 33  30 63 39 37 63 34 34 30  |9cf844130c97c440|
00000370  62 38 36 36 64 30 66 34  30 33 38 38 63 36 35 62  |b866d0f40388c65b|
00000380  66 61 63 66 62 31 31 31  65 64 65 33 37 39 20 41  |facfb111ede379 A|
00000390  74 74 65 6d 70 74 3a 32  20 4e 65 78 74 49 6e 69  |ttempt:2 NextIni|
000003a0  74 43 6f 6e 74 61 69 6e  65 72 54 6f 53 74 61 72  |tContainerToStar|
000003b0  74 3a 6e 69 6c 20 43 6f  6e 74 61 69 6e 65 72 73  |t:nil Containers|
000003c0  54 6f 53 74 61 72 74 3a  5b 5d 20 43 6f 6e 74 61  |ToStart:[] Conta|
000003d0  69 6e 65 72 73 54 6f 4b  69 6c 6c 3a 6d 61 70 5b  |inersToKill:map[|
000003e0  5d 20 45 70 68 65 6d 65  72 61 6c 43 6f 6e 74 61  |] EphemeralConta|
000003f0  69 6e 65 72 73 54 6f 53  74 61 72 74 3a 5b 5d 7d  |inersToStart:[]}|
00000400  20 66 6f 72 20 70 6f 64  20 22 6b 72 75 69 73 65  | for pod "kruise|
00000410  2d 63 6f 6e 74 72 6f 6c  6c 65 72 2d 6d 61 6e 61  |-controller-mana|
00000420  67 65 72 2d 37 62 64 66  35 34 36 37 64 38 2d 67  |ger-7bdf5467d8-g|
00000430  77 76 38 74 5f 6b 72 75  69 73 65 2d 73 79 73 74  |wv8t_kruise-syst|
00000440  65 6d 28 66 37 30 38 37  63 32 36 2d 37 33 37 64  |em(f7087c26-737d|
00000450  2d 34 64 30 36 2d 62 34  34 30 2d 32 63 66 64 63  |-4d06-b440-2cfdc|
00000460  36 30 30 31 32 35 39 29  22 00 00 00 00 00 00 00  |6001259)".......|
00000470  60 3f 22 c5 05 7f 00 00  81 00 00 00 00 00 00 00  |`?".............|
00000480  69 66 20 67 65 74 6c 69  6e 65 28 31 29 20 3d 7e  |if getline(1) =~|
00000490  20 27 5e 46 72 6f 6d 20  5b 30 2d 39 61 2d 66 5d  | '^From [0-9a-f]|
000004a0  5c 7b 34 30 5c 7d 20 4d  6f 6e 20 53 65 70 20 31  |\{40\} Mon Sep 1|
000004b0  37 20 30 30 3a 30 30 3a  30 30 20 32 30 30 31 24  |7 00:00:00 2001$|
000004c0  27 20 7c 20 20 20 73 65  74 66 20 67 69 74 73 65  |' |   setf gitse|
000004d0  6e 64 65 6d 61 69 6c 20  7c 20 65 6c 73 65 20 7c  |ndemail | else ||

Now let's come back to the memory used by tmpfs and see whether a memory dump can capture it.

Files placed in the tmpfs directory cannot be found by running procdump against the pod's PIDs on the ECS node; let's try fincore inside the pod instead.

# kubectl exec -it my-nettools-primary-8584bffdf5-x97rx -- bash
root@my-nettools-primary-8584bffdf5-x97rx:/# ls -l /memtest/
total 921600
-rw-r--r-- 1 root root 943718400 Nov 18 13:54 900m
root@my-nettools-primary-8584bffdf5-x97rx:/# 
root@my-nettools-primary-8584bffdf5-x97rx:/# 
root@my-nettools-primary-8584bffdf5-x97rx:/# cd ..
root@my-nettools-primary-8584bffdf5-x97rx:/# exit
exit
[root@iZ2zeeh81ypipbs4e866uzZ ~]# docker ps |grep my-n
e00a11510727        fa7b0d7ccb2a                                                                "sleep 360000"           3 days ago          Up 3 days                                                                k8s_my-nettools_my-nettools-primary-8584bffdf5-x97rx_default_a9d684a4-53fe-4114-b8d0-8f138b933551_1
6bfc5310e773        registry-vpc.cn-beijing.aliyuncs.com/acs/pause:3.2                          "/pause"                 6 days ago          Up 6 days                                                                k8s_POD_my-nettools-primary-8584bffdf5-x97rx_default_a9d684a4-53fe-4114-b8d0-8f138b933551_0
[root@iZ2zeeh81ypipbs4e866uzZ ~]# docker inspect e00a11510727 |grep -i pid
            "Pid": 3747629,
            "PidMode": "",
            "PidsLimit": null,
[root@iZ2zeeh81ypipbs4e866uzZ ~]# pstree -sp 3747629
systemd(1)───containerd(614)───containerd-shim(3747610)───sleep(3747629)───nginx(3747717)─┬─nginx(3747718)
                                                                                          ├─nginx(3747719)
                                                                                          ├─nginx(3747720)
                                                                                          └─nginx(3747721)
# procdump 3747610
# procdump 3747629
# ls
3747610_mem_009d6000.bin      3747610_mem_7ffcb17b2000.bin  3747610_mem_c000400000.bin    3747629_mem_7f0de51c2000.bin  3747629_mem_7ffccfdc3000.bin
3747610_mem_7fce67b20000.bin  3747610_mem_7ffcb17b5000.bin  3747629_mem_55a0c9624000.bin  3747629_mem_7f0de51cf000.bin  3747629_mem_7ffccfdc6000.bin
3747610_mem_7ffcb1771000.bin  3747610_mem_c000000000.bin    3747629_mem_7f0de4fdc000.bin  3747629_mem_7ffccfd72000.bin
[root@iZ2zeeh81ypipbs4e866uzZ dump]# for i in `ls`;do hexdump -C $i |grep 900m ;done
[root@iZ2zeeh81ypipbs4e866uzZ dump]# 

fincore scan results:

 ./fincore-detection.sh --pages=false --summarize --only-cached  /memtest/*
 filename size   total pages     cached pages    cached size     cached percentage
...
/usr/bin/bash 1404744 343 343 1404928 100.000000
...
/usr/bin/bash 1404744 343 343 1404928 100.000000
0 
/tmp/cache.pids 34 1 1 4096 100.000000
0 
/root/fincore-detection.sh 957 1 1 4096 100.000000
...
/usr/sbin/nginx 1199248 293 293 1200128 100.000000
... 
/var/log/nginx/access.log 7598726 1856 1856 7602176 100.000000
---
total cached size: 20422656

fincore cannot detect the pod's tmpfs memory usage either, so the cadvisor pod summary is the most accurate place to look!

I hope this write-up helps you have an easier (and more fun) time with memory-related problems in the future~








