Preface:
CKA is an online technical certification exam. There is plenty of material online describing exactly what the certification is, so that will not be repeated here.
The questions people care about mostly boil down to the following:
1. How much is the CKA certification actually worth?
The CKA certificate is issued by the Cloud Native Computing Foundation (CNCF). It tests whether you have the knowledge needed to administer a Kubernetes cluster. The exam is entirely hands-on: you work directly against live clusters, which is a real test of how solid your knowledge is and how much practical Kubernetes experience you have. It is a detailed and rigorous official technical exam. Among the certifications covering this area, whether you look at the issuing body, the examination format or the difficulty, the CKA is about as authoritative as it gets.
2. How hard is the CKA exam?
Every question on the CKA exam is a practical task (much like RHCE and other offline exams, it is all hands-on work on real machines; the only difference is that the CKA is taken online). There are no multiple-choice or fill-in-the-blank questions at all. Candidates also frequently report that the exam environment's network is laggy and the machines respond slowly, and you have to complete every practical task yourself in that environment. So it is genuinely difficult for people without a solid foundation; in short, easy if you know the material, hard if you don't.
3. Do you need the CKA certification?
Having gone through quite a few CKA question banks, I can say that, from a production standpoint, many parts of the CKA are rarely if ever used in practice: cluster upgrades, etcd backups (etcd backups are usually automated with scripts), node maintenance, and so on. Most of the content, however, does have real practical value.
In summary, I can say with confidence that the CKA certification is meaningful: it genuinely demonstrates a person's ability to administer a Kubernetes cluster.
Registration includes two practice-exam sessions on the official site and two attempts at the real exam.
As with any exam, there is really only one trick: study hard and practice a lot. There is no special shortcut (unless you happen to be naturally gifted).
This series uses the 2022 CKA question bank as its basis and records, in detail, the approach and solution for five questions per day. The bank contains 17 questions in total.
I.
RBAC
Task:
Create a ClusterRole named deployment-clusterrole that grants permission to create Deployments, StatefulSets and DaemonSets. Create a ServiceAccount named cicd-token in the namespace app-team1, and bind the ClusterRole to that ServiceAccount, restricted to the app-team1 namespace.
Approach:
This task involves four kinds of resources: ServiceAccount, ClusterRole, RoleBinding and Namespace (abbreviated names such as sa are used throughout this article).
Three things are being tested here: granting permissions, binding the ServiceAccount to the ClusterRole, and restricting that binding to a namespace. Of the two binding types, only RoleBinding is namespace-scoped; ClusterRoleBinding is not tied to any namespace. The binding therefore has to be a RoleBinding.
Here is how to tell whether a given resource is namespace-scoped.
For example, if we list role bindings, the output contains a NAMESPACE column.
Clearly, RoleBinding is tied to a namespace.
root@k8s-master:~# kubectl get rolebindings.rbac.authorization.k8s.io -A
NAMESPACE       NAME                                     ROLE                                          AGE
default         cicd-deployment                          ClusterRole/deployment-clusterrole            54m
default         leader-locking-nfs-client-provisioner    Role/leader-locking-nfs-client-provisioner    354d
ingress-nginx   ingress-nginx                            Role/ingress-nginx                            354d
ingress-nginx   ingress-nginx-admission                  Role/ingress-nginx-admission                  354d
kube-public     kubeadm:bootstrap-signer-clusterinfo     Role/kubeadm:bootstrap-signer-clusterinfo     369d
kube-public     system:controller:bootstrap-signer       Role/system:controller:bootstrap-signer       369d
What about Role? The output also has a NAMESPACE column.
Clearly, Role is tied to a namespace as well.
root@k8s-master:~# kubectl get roles.rbac.authorization.k8s.io -A
NAMESPACE       NAME                                    CREATED AT
default         leader-locking-nfs-client-provisioner   2021-12-23T09:57:07Z
ingress-nginx   ingress-nginx                           2021-12-23T10:10:47Z
ingress-nginx   ingress-nginx-admission                 2021-12-23T10:10:49Z
kube-public     kubeadm:bootstrap-signer-clusterinfo    2021-12-08T06:32:46Z
What about ClusterRole? There is no NAMESPACE column in the output.
Clearly, ClusterRole is not tied to a namespace.
root@k8s-master:~# kubectl get clusterroles.rbac.authorization.k8s.io -A
NAME                      CREATED AT
admin                     2021-12-08T06:32:43Z
calico-kube-controllers   2021-12-08T06:43:37Z
calico-node               2021-12-08T06:43:37Z
cluster-admin             2021-12-08T06:32:43Z
What about ClusterRoleBinding? No NAMESPACE column either.
Clearly, ClusterRoleBinding is not tied to a namespace.
root@k8s-master:~# kubectl get clusterrolebindings.rbac.authorization.k8s.io -A
NAME                        ROLE                                   AGE
calico-kube-controllers     ClusterRole/calico-kube-controllers    369d
calico-node                 ClusterRole/calico-node                369d
cluster-admin               ClusterRole/cluster-admin              369d
ingress-nginx               ClusterRole/ingress-nginx              354d
ingress-nginx-admission     ClusterRole/ingress-nginx-admission    354d
kubeadm:get-nodes           ClusterRole/kubeadm:get-nodes          369d
kubeadm:kubelet-bootstrap   ClusterRole/system:node-bootstrapper   369d
Listing ServiceAccounts (sa) also shows a NAMESPACE column,
so ServiceAccount is clearly tied to a namespace.
root@k8s-master:~# kubectl get sa -A
NAMESPACE   NAME                     SECRETS   AGE
app-team1   cicd-token               1         85m
app-team1   default                  1         354d
default     default                  1         369d
default     nfs-client-provisioner   1         354d
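For reference, a quicker way to check whether a resource type is namespace-scoped is kubectl api-resources, which prints a NAMESPACED column for every resource type (a small convenience, not required by the exam):

kubectl api-resources --namespaced=true  | grep -i role   # shows Role and RoleBinding
kubectl api-resources --namespaced=false | grep -i role   # shows ClusterRole and ClusterRoleBinding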
Solution commands:
From the task requirements we arrive at the commands below (the AlreadyExists errors appear only because these objects had already been created in my lab during an earlier run).
Note carefully: the RoleBinding must be created with the namespace specified (-n app-team1), otherwise the task is not scored as correct.
root@k8s-master:~# kubectl create ns app-team1
Error from server (AlreadyExists): namespaces "app-team1" already exists
root@k8s-master:~# kubectl create sa cicd-token -n app-team1
error: failed to create serviceaccount: serviceaccounts "cicd-token" already exists
root@k8s-master:~# kubectl create clusterrole deployment-clusterrole --verb=create --resource=Deployment,StatefulSet,DaemonSet -n app-team1
Error from server (AlreadyExists): clusterroles.rbac.authorization.k8s.io "deployment-clusterrole" already exists
root@k8s-master:~# kubectl create rolebinding cicd-deployment --clusterrole=deployment-clusterrole --serviceaccount=app-team1:cicd-token -n app-team1
error: failed to create rolebinding: rolebindings.rbac.authorization.k8s.io "cicd-deployment" already exists
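For reference, a declarative equivalent of the same ClusterRole and RoleBinding (a sketch only; the imperative commands above are all the exam requires):

kubectl apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: deployment-clusterrole
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets", "daemonsets"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cicd-deployment
  namespace: app-team1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: deployment-clusterrole
subjects:
- kind: ServiceAccount
  name: cicd-token
  namespace: app-team1
EOF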
Verifying the result:
List the created resources:
root@k8s-master:~# kubectl get sa cicd-token -n app-team1
NAME         SECRETS   AGE
cicd-token   1         96m
root@k8s-master:~# kubectl get clusterrole deployment-clusterrole
NAME                     CREATED AT
deployment-clusterrole   2022-12-13T02:52:51Z
root@k8s-master:~# kubectl get rolebindings.rbac.authorization.k8s.io cicd-deployment -n app-team1
NAME              ROLE                                 AGE
cicd-deployment   ClusterRole/deployment-clusterrole   46s
Check the permissions:
root@k8s-master:~# kubectl describe clusterroles.rbac.authorization.k8s.io deployment-clusterrole
Name:         deployment-clusterrole
Labels:       <none>
Annotations:  <none>
PolicyRule:
  Resources          Non-Resource URLs  Resource Names  Verbs
  ---------          -----------------  --------------  -----
  daemonsets.apps    []                 []              [create]
  deployments.apps   []                 []              [create]
  statefulsets.apps  []                 []              [create]
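The permissions can also be verified from the ServiceAccount's point of view with impersonation (a quick check; "yes" confirms the binding works in app-team1, and the same request against another namespace should return "no"):

kubectl auth can-i create deployments --as=system:serviceaccount:app-team1:cicd-token -n app-team1
kubectl auth can-i create deployments --as=system:serviceaccount:app-team1:cicd-token -n default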
Official reference: Using RBAC Authorization | Kubernetes (Chinese docs)
II.
Node scheduling
Task:
Mark the node ek8s-node-1 as unavailable and reschedule all Pods running on it.
Approach:
The full node-maintenance workflow looks like this:
1) First check the status of all nodes in the cluster; say there are four nodes and all of them are Ready.
2) Check that the two nginx replicas are running on d-node1 and k-node2.
3) Use the cordon command to mark d-node1 as unschedulable.
4) Run kubectl get nodes again: d-node1 is still Ready but now also shows SchedulingDisabled, meaning no new Pods will be scheduled onto it.
5) Check nginx again: nothing has changed, the two replicas are still on d-node1 and k-node2.
6) Run the drain command to gracefully evict the Pods running on d-node1 onto other nodes.
7) Check nginx again: the replica that was on d-node1 has moved to k-node1. At this point d-node1 can be maintained, e.g. upgrading the kernel or Docker.
8) When maintenance is finished, run uncordon to unlock d-node1 so it becomes schedulable again.
9) Check node status: d-node1 is back to plain Ready.
So three commands are involved: cordon, drain and uncordon. The task does not ask for the node to be made schedulable again, so uncordon is not strictly needed, though running it would not be wrong either. A minimal command sequence for the exam is sketched right below.
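A minimal sketch using the node name from the task (the lab walkthrough below uses k8s-node1 instead; in the real exam, switch to the requested context first):

kubectl cordon ek8s-node-1
kubectl drain ek8s-node-1 --ignore-daemonsets --delete-emptydir-data --force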
Solution steps:
1.
Check the node's status and the Pods running on the node to be maintained; here k8s-node1 is used as the maintenance target.
root@k8s-master:~# kubectl get no k8s-node1
NAME        STATUS   ROLES    AGE   VERSION
k8s-node1   Ready    <none>   13h   v1.22.2
root@k8s-master:~# kubectl get po -A -owide |grep k8s-node1
default         front-end-6f94965fd9-dq7t8                1/1   Running   1 (173m ago)   13h   10.244.36.74      k8s-node1   <none>   <none>
default         guestbook-86bb8f5bc9-mcdvg                1/1   Running   1 (173m ago)   13h   10.244.36.77      k8s-node1   <none>   <none>
default         guestbook-86bb8f5bc9-zh7zq                1/1   Running   1 (173m ago)   13h   10.244.36.76      k8s-node1   <none>   <none>
default         nfs-client-provisioner-56dd5765dc-gp6mz   1/1   Running   2 (173m ago)   13h   10.244.36.72      k8s-node1   <none>   <none>
default         task-2-ds-pmlqw                           1/1   Running   1 (173m ago)   13h   10.244.36.75      k8s-node1   <none>   <none>
ing-internal    nginx-app-68b95cb66f-qkkpx                1/1   Running   1 (173m ago)   13h   10.244.36.73      k8s-node1   <none>   <none>
ingress-nginx   ingress-nginx-controller-gqzgg            1/1   Running   1 (173m ago)   13h   192.168.123.151   k8s-node1   <none>   <none>
kube-system     calico-node-g6rwl                         1/1   Running   1 (173m ago)   13h   192.168.123.151   k8s-node1   <none>   <none>
kube-system     kube-proxy-6ckmt                          1/1   Running   1 (173m ago)   13h   192.168.123.151   k8s-node1   <none>   <none>
kube-system     metrics-server-576fc6cd56-svg7q           1/1   Running   1 (173m ago)   13h   10.244.36.78      k8s-node1   <none>   <none>
2.
Put the node into maintenance mode and verify it is in the expected state.
root@k8s-master:~# kubectl cordon k8s-node1
node/k8s-node1 cordoned
root@k8s-master:~# kubectl get no k8s-node1
NAME        STATUS                     ROLES    AGE   VERSION
k8s-node1   Ready,SchedulingDisabled   <none>   13h   v1.22.2
3.
Gracefully evict all Pods on the node except those managed by DaemonSets.
root@k8s-master:~# kubectl drain k8s-node1 --ignore-daemonsets --delete-emptydir-data --force
node/k8s-node1 already cordoned
WARNING: ignoring DaemonSet-managed Pods: default/task-2-ds-pmlqw, ingress-nginx/ingress-nginx-controller-gqzgg, kube-system/calico-node-g6rwl, kube-system/kube-proxy-6ckmt
evicting pod kube-system/metrics-server-576fc6cd56-svg7q
evicting pod default/guestbook-86bb8f5bc9-zh7zq
evicting pod default/front-end-6f94965fd9-dq7t8
evicting pod default/guestbook-86bb8f5bc9-mcdvg
evicting pod default/nfs-client-provisioner-56dd5765dc-gp6mz
evicting pod ing-internal/nginx-app-68b95cb66f-qkkpx
pod/guestbook-86bb8f5bc9-mcdvg evicted
pod/front-end-6f94965fd9-dq7t8 evicted
pod/guestbook-86bb8f5bc9-zh7zq evicted
pod/nfs-client-provisioner-56dd5765dc-gp6mz evicted
pod/metrics-server-576fc6cd56-svg7q evicted
pod/nginx-app-68b95cb66f-qkkpx evicted
node/k8s-node1 evicted
4.
Check the eviction result.
root@k8s-master:~# kubectl get po -A -owide |grep k8s-node1
default         task-2-ds-pmlqw                  1/1   Running   1 (179m ago)   13h   10.244.36.75      k8s-node1   <none>   <none>
ingress-nginx   ingress-nginx-controller-gqzgg   1/1   Running   1 (179m ago)   13h   192.168.123.151   k8s-node1   <none>   <none>
kube-system     calico-node-g6rwl                1/1   Running   1 (179m ago)   13h   192.168.123.151   k8s-node1   <none>   <none>
kube-system     kube-proxy-6ckmt                 1/1   Running   1 (179m ago)   13h   192.168.123.151   k8s-node1   <none>   <none>
root@k8s-master:~# kubectl get no k8s-node1
NAME        STATUS                     ROLES    AGE   VERSION
k8s-node1   Ready,SchedulingDisabled   <none>   13h   v1.22.2
You can see that the Pods not managed by a DaemonSet have all been evicted and rescheduled onto other nodes; the node is now fairly clean. A quick way to confirm where they landed is shown below.
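A quick spot check that the evicted workloads were recreated elsewhere (a sketch; the new Pods get freshly generated names, so they will not match the evicted ones):

kubectl get po -A -o wide | grep -E 'guestbook|front-end'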
5.
Make the node schedulable again and leave maintenance mode (not required by this task).
root@k8s-master:~# kubectl uncordon k8s-node1
node/k8s-node1 uncordoned
root@k8s-master:~# kubectl get no k8s-node1
NAME        STATUS   ROLES    AGE   VERSION
k8s-node1   Ready    <none>   13h   v1.22.2
Official reference: Kubectl Reference Docs
III.
Upgrading cluster components
Task:
Upgrade the master node to 1.22.2. Drain the master node before the upgrade. Do not upgrade the worker nodes, the container manager, etcd, the CNI plugin, or DNS.
Approach:
Since the worker nodes are not upgraded, everything happens on the master node; this is essentially the node-maintenance work from the previous task, applied to the master.
etcd must be excluded from the upgrade. etcd runs as a static Pod (it is not deployed as a DaemonSet), so it simply has to be skipped when running kubeadm upgrade; you can confirm that it is a static Pod as shown below.
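A quick way to confirm that etcd runs as a static Pod (a sketch; this assumes the default kubeadm layout, where static Pod manifests live under /etc/kubernetes/manifests on the control-plane node):

ssh k8s-master
ls /etc/kubernetes/manifests/
# expected: etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml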
Solution steps:
# Switch to the correct context first, then:
kubectl get nodes
ssh k8s-master
kubectl cordon k8s-master
kubectl drain k8s-master --ignore-daemonsets --force
apt-mark unhold kubeadm kubectl kubelet
apt-get update && apt-get install -y kubeadm=1.22.2-00 kubelet=1.22.2-00 kubectl=1.22.2-00
apt-mark hold kubeadm kubectl kubelet
kubeadm upgrade plan
kubeadm upgrade apply v1.22.2 --etcd-upgrade=false
systemctl daemon-reload && systemctl restart kubelet
# kubectl -n kube-system rollout undo deployment coredns   (some people suggest rolling coredns back; optional)
kubectl uncordon k8s-master
# Check the master node's status and version
kubectl get node
Official reference: Upgrading kubeadm clusters | Kubernetes
IV.
Backing up and restoring etcd
Task:
Back up the etcd data served at https://127.0.0.1:2379 to /var/lib/backup/etcd-snapshot.db, then restore etcd from the existing backup file /data/backup/etcd-snapshot-previous.db, using the specified ca.crt, etcd-client.crt and etcd-client.key.
Approach:
There is not much to discuss here. Many write-ups online offer a second or even third method; that is unnecessary, the one standard method is enough.
Note: if these commands fail with "permission denied", you lack the required privileges; prefix the command with sudo.
Backup:
ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
  --cacert=/opt/KUIN00601/ca.crt \
  --cert=/opt/KUIN00601/etcd-client.crt \
  --key=/opt/KUIN00601/etcd-client.key \
  snapshot save /var/lib/backup/etcd-snapshot.db
Restore:
ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
  --cacert=/opt/KUIN00601/ca.crt \
  --cert=/opt/KUIN00601/etcd-client.crt \
  --key=/opt/KUIN00601/etcd-client.key \
  snapshot restore /data/backup/etcd-snapshot-previous.db
After a successful restore, it is a good idea to confirm the cluster is healthy with kubectl get nodes.
Solution steps:
First check how many Pods there are in total:
root@k8s-master:~# kubectl get po -A |wc -l
24
Back up etcd as required by the task, producing the backup file /var/lib/backup/etcd-snapshot.db:
ETCDCTL_API=3 etcdctl \
  --endpoints https://192.168.123.150:2379 \
  --cacert=/opt/KUIN00601/ca.crt \
  --cert=/opt/KUIN00601/etcd-client.crt \
  --key=/opt/KUIN00601/etcd-client.key \
  snapshot save /var/lib/backup/etcd-snapshot.db
The output is as follows:
{"level":"info","ts":1670922988.1987517,"caller":"snapshot/v3_snapshot.go:68","msg":"created temporary db file","path":"/var/lib/backup/etcd-snapshot.db.part"} {"level":"info","ts":1670922988.2269363,"logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"} {"level":"info","ts":1670922988.2272522,"caller":"snapshot/v3_snapshot.go:76","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"} {"level":"info","ts":1670922988.6475282,"logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"} {"level":"info","ts":1670922988.6933029,"caller":"snapshot/v3_snapshot.go:91","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","size":"9.4 MB","took":"now"} {"level":"info","ts":1670922988.6935413,"caller":"snapshot/v3_snapshot.go:100","msg":"saved","path":"/var/lib/backup/etcd-snapshot.db"} Snapshot saved at /var/lib/backup/etcd-snapshot.db
Restore etcd:
ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
  --cacert=/opt/KUIN00601/ca.crt \
  --cert=/opt/KUIN00601/server.crt \
  --key=/opt/KUIN00601/server.key \
  snapshot restore /data/backup/etcd-snapshot-previous.db
The output is as follows:
2022-12-13T17:17:15+08:00  info  snapshot/v3_snapshot.go:251  restoring snapshot  {"path": "/data/backup/etcd-snapshot-previous.db", "wal-dir": "default.etcd/member/wal", "data-dir": "default.etcd", "snap-dir": "default.etcd/member/snap", "stack": "go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdutl/snapshot/v3_snapshot.go:257\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdutl/etcdutl/snapshot_command.go:147\ngo.etcd.io/etcd/etcdctl/v3/ctlv3/command.snapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/command/snapshot_command.go:128\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.Start\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/ctl.go:107\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.MustStart\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/ctlv3/ctl.go:111\nmain.main\n\t/tmp/etcd-release-3.5.0/etcd/release/etcd/etcdctl/main.go:59\nruntime.main\n\t/home/remote/sbatsche/.gvm/gos/go1.16.3/src/runtime/proc.go:225"}
2022-12-13T17:17:15+08:00  info  membership/store.go:119  Trimming membership information from the backend...
2022-12-13T17:17:16+08:00  info  membership/cluster.go:393  added member  {"cluster-id": "cdf818194e3a8c32", "local-member-id": "0", "added-peer-id": "8e9e05c52164694d", "added-peer-peer-urls": ["http://localhost:2380"]}
2022-12-13T17:17:16+08:00  info  snapshot/v3_snapshot.go:272  restored snapshot  {"path": "/data/backup/etcd-snapshot-previous.db", "wal-dir": "default.etcd/member/wal", "data-dir": "default.etcd", "snap-dir": "default.etcd/member/snap"}
Official reference: Operating etcd clusters for Kubernetes | Kubernetes
V.
Configuring a NetworkPolicy
Task:
Analysis:
This task is solved by starting from the official documentation and adapting it. Official reference: Network Policies | Kubernetes. Copy the first example from that page and modify it to match the task requirements. The modified file looks like this:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-port-from-namespace
  namespace: fubar
spec:
  podSelector:
    matchLabels:
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          project: my-app
    ports:
    - protocol: TCP
      port: 8080
Apply it:
root@k8s-master:~# kubectl apply -f networkpolicy.yaml
networkpolicy.networking.k8s.io/allow-port-from-namespace created
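To confirm the policy exists with the intended selector, namespace and port (a quick check, not required by the task; output not reproduced here):

kubectl describe networkpolicy allow-port-from-namespace -n fubar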