Recently, a customer reported an issue where pods kept restarting after being migrated from one node to another, and the Java process inside the pod exited abnormally.
After some troubleshooting, we identified the root cause: an OOM kill triggered by the LimitRange of the namespace. The kernel killed the newly created process as soon as the JVM's memory usage exceeded the default limit. In this article, I will walk through the troubleshooting method step by step; it should serve as a common problem-determination (PD) method for most application OOM and pod-restart issues in Kubernetes.
The jetty process failed to start without leaving any application log:
kubectl logs console-54dc5566b4-nq6gs -n test
Starting Jetty: -Djava.io.tmpdir=/tmp/jetty
start-stop-daemon -S -p/var/run/jetty.pid -croot -d/var/lib/jetty -b -m -a /usr/bin/java -- -Djava.io.tmpdir=/tmp/jetty -Xms512m -Xmx1g -Djetty.logs=/usr/local/jetty/logs -Dspring.profiles.active=StandaloneLogin,Agility,NotAliCloud,NotPublicCloud -Djetty.home=/usr/local/jetty -Djetty.base=/var/lib/jetty -Djava.io.tmpdir=/tmp/jetty -jar /usr/local/jetty/start.jar jetty.state=/var/lib/jetty/jetty.state jetty-started.xml start-log-file=/usr/local/jetty/logs/start.log
jetty.state=/var/lib/jetty/jetty.state
FAILED Thu Mar 14 09:43:55 UTC 2019
tail: cannot open '/var/lib/jetty/logs/*.log' for reading: No such file or directory
tail: no files remaining
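Since the restarted container leaves nothing useful behind, it can also help to look at the output of the previously terminated container instance; for example (assuming the same pod name):
kubectl logs console-54dc5566b4-nq6gs -n test --previous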
The console pod keeps being re-created on the Kubernetes cluster:
kubectl get events --all-namespaces | grep console
test 1m 1m 1 console-54dc5566b4-sx2r6.158bc8b1f2a076ce Pod spec.containers{console} Normal Killing kubelet, k8s003 Killing container with id docker://console:Container failed liveness probe.. Container will be killed and recreated.
test 1m 6m 2 console-54dc5566b4-hx6wb.158bc86c4379c4e7 Pod spec.containers{console} Normal Started kubelet, k8s001 Started container
test 1m 6m 2 console-54dc5566b4-hx6wb.158bc86c355ab395 Pod spec.containers{console} Normal Created kubelet, k8s001 Created container
test 1m 6m 2 console-54dc5566b4-hx6wb.158bc86c2fe32c76 Pod spec.containers{console} Normal Pulled kubelet, k8s001 Container image "registry.cn-hangzhou.aliyuncs.com/eric-dev/console:0.9-62f837e" already present on machine
test 1m 1m 1 console-54dc5566b4-hx6wb.158bc8b87083e752 Pod spec.containers{console} Normal Killing kubelet, k8s001 Killing container with id docker://console:Container failed liveness probe.. Container will be killed and recreated.
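The cluster-wide event list is noisy; the events can also be narrowed down to a single pod with a field selector, for example:
kubectl get events -n test --field-selector involvedObject.name=console-54dc5566b4-hx6wb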
kubectl get pod console-54dc5566b4-hx6wb -n test -o yaml | grep reason -C5
lastState:
terminated:
containerID: docker://90e5c9e618f3e745ebf510b8f215da3a165e3d03be58e0369e27c1773e75ef70
exitCode: 137
finishedAt: 2019-03-14T09:29:51Z
reason: OOMKilled
startedAt: 2019-03-14T09:24:51Z
name: console
ready: true
restartCount: 3
state:
kubectl get pod console-54dc5566b4-hx6wb -n test -o jsonpath='{.status.containerStatuses[].lastState}'
map[terminated:map[exitCode:137 reason:OOMKilled startedAt:2019-03-14T09:24:51Z finishedAt:2019-03-14T09:29:51Z containerID:docker://90e5c9e618f3e745ebf510b8f215da3a165e3d03be58e0369e27c1773e75ef70]]
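Exit code 137 means the container was killed with SIGKILL (128 + 9), and the reason field confirms it was the OOM killer. A narrower jsonpath query can extract just the termination reason, for example:
kubectl get pod console-54dc5566b4-hx6wb -n test -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'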
The following kernel messages indicate a Java OOM kill caused by the cgroup memory setting:
# grep oom /var/log/messages
/var/log/messages:2019-03-14T09:15:17.541049+00:00 iZbp185dy2o3o6lnlo4f07Z kernel: [8040341.949064] java invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=0, order=0, oom_score_adj=968
/var/log/messages:2019-03-14T09:15:17.541117+00:00 iZbp185dy2o3o6lnlo4f07Z kernel: [8040341.949153] [<ffffffff81191de4>] oom_kill_process+0x214/0x3f0
/var/log/messages:2019-03-14T09:15:17.541119+00:00 iZbp185dy2o3o6lnlo4f07Z kernel: [8040341.949171] [<ffffffff811f9481>] mem_cgroup_oom_synchronize+0x2f1/0x310
/var/log/messages:2019-03-14T09:15:17.541147+00:00 iZbp185dy2o3o6lnlo4f07Z kernel: [8040341.950571] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
# grep oom /var/log/warn
2019-03-14T09:15:17.541049+00:00 iZbp185dy2o3o6lnlo4f07Z kernel: [8040341.949064] java invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=0, order=0, oom_score_adj=968
2019-03-14T09:15:17.541117+00:00 iZbp185dy2o3o6lnlo4f07Z kernel: [8040341.949153] [<ffffffff81191de4>] oom_kill_process+0x214/0x3f0
2019-03-14T09:15:17.541119+00:00 iZbp185dy2o3o6lnlo4f07Z kernel: [8040341.949171] [<ffffffff811f9481>] mem_cgroup_oom_synchronize+0x2f1/0x310
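The non-zero oom_score_adj and the mem_cgroup_oom_synchronize frame in the stack show this is a cgroup-level (container) OOM rather than a node-level one. The effective memory limit of the cgroup can be verified from inside the container; a sketch assuming cgroup v1 and a container image that provides cat:
kubectl exec console-54dc5566b4-hx6wb -n test -- cat /sys/fs/cgroup/memory/memory.limit_in_bytes
(this should print 536870912, i.e. 512Mi, if the limit shown below is applied)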
kubectl get pod console-54dc5566b4-hx6wb -n test -o yaml | grep limits -A4
limits:
memory: 512Mi
requests:
memory: 256Mi
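The container limit is 512Mi, while the startup command above launches the JVM with -Xms512m -Xmx1g, so the heap alone is allowed to grow well past the cgroup limit and the kernel kills the process. The heap ceiling the JVM will actually try to use can be confirmed with a sketch like this (assuming the same java binary as in the image):
java -Xms512m -Xmx1g -XX:+PrintFlagsFinal -version | grep -i maxheapsize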
In this case, the console pod picked up its memory request and limit from the default limits of namespace test, injected by the LimitRanger admission plugin:
kubectl describe pod console-54dc5566b4-hx6wb -n test | grep limit
Annotations: kubernetes.io/limit-ranger=LimitRanger plugin set: memory request for container console; memory limit for container console
kubectl get limitrange -n test
NAME CREATED AT
mem-limit-range 2019-03-14T09:04:10Z
kubectl describe ns test
Name: test
Labels: <none>
Annotations: <none>
Status: Active
No resource quota.
Resource Limits
Type Resource Min Max Default Request Default Limit Max Limit/Request Ratio
---- -------- --- --- --------------- ------------- -----------------------
Container memory - - 256Mi 512Mi -
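The Default Request (256Mi) and Default Limit (512Mi) above come from the mem-limit-range object; the actual object can be dumped with kubectl get limitrange mem-limit-range -n test -o yaml. A manifest that would produce exactly these defaults looks roughly like this (a reconstruction, not necessarily the customer's original file):
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-limit-range
  namespace: test
spec:
  limits:
  - type: Container
    default:
      memory: 512Mi
    defaultRequest:
      memory: 256Mi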
After fixing the limits setting (deleting the LimitRange) and recreating the pod, the application became healthy:
kubectl delete limitrange mem-limit-range -n test
kubectl delete pod console-54dc5566b4-hx6wb -n test
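Deleting the LimitRange only removes the 512Mi default; a more durable fix is to declare resources on the Deployment itself that accommodate the -Xmx1g heap, so the pod no longer depends on namespace defaults. A sketch (the deployment name console and the 1536Mi value are assumptions; 1536Mi simply leaves head-room above the 1g heap for metaspace, threads and other non-heap memory):
kubectl set resources deployment console -n test --requests=memory=1Gi --limits=memory=1536Mi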