Deploying a Highly Available k8s [v1.16] Cluster on CentOS 7 with keepalived
Purpose
Typically a k8s cluster has one master and several nodes, and when that master fails, the consequences are easy to imagine.
This article therefore demonstrates cluster high availability: when one master in the cluster goes down, the VIP moves to another node, which takes over as the active master, and the cluster keeps operating normally.
Since the focus is master high availability, and resources were limited, the experiment uses 4 servers in total: 3 masters and 1 node.
At this point some readers may wonder: with 4 machines available, why not just 2 masters? Because in a kubeadm-deployed cluster the etcd members run on the master nodes by default, and a 3-member etcd cluster tolerates at most 1 failed server. With 2 masters, losing one immediately breaks the etcd cluster and with it the k8s cluster; once that foundation is gone, VIP failover and the rest of the HA machinery are useless.
Environment
Basics
Host list
10.2.2.137 master1
10.2.2.166 master2
10.2.2.96 master3
10.2.3.27 node0
Software versions
docker version:18.09.9
k8s version:v1.16.4
Architecture
This article builds the cluster with kubeadm and achieves high availability through a keepalived VIP; the active/standby architecture works as follows:
Notes on the active/standby HA architecture
a) The apiserver is made highly available via keepalived; a node failure triggers VIP failover;
b) controller-manager and scheduler elect a leader inside k8s (controlled by the leader-elect option, true by default), so only one scheduler instance runs in the cluster at any moment;
c) when the cluster is built with kubeadm, an etcd cluster is created automatically on the master nodes for high availability; deploy an odd number of members, and a 3-member cluster tolerates one machine down.
Preparation
Notes
1. Most articles spell out every command step by step, which feels tedious to anyone with deployment experience, so most of the server shell operations in this article are consolidated into scripts;
2. Every machine that will join the k8s cluster performs the steps in this part.
Steps
a) Set the matching hostname on every server; master1 as an example:
hostnamectl set-hostname master1    # the new hostname shows after re-login
b) Configure passwordless login from master1 to master2 and master3; run this step on master1 only;
[root@master1 ~]# ssh-keygen -t rsa    # press Enter through all prompts
[root@master1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.2.2.166
[root@master1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.2.2.96
c) Apply the required environment settings via script;
sh set-prenv.sh
set-prenv.sh
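The script body is collapsed in the original post; below is a minimal sketch of what set-prenv.sh presumably contains, covering the standard kubeadm prerequisites (the exact contents are an assumption):
#!/bin/bash
# set-prenv.sh -- common prerequisites on every node (hypothetical reconstruction)
# resolve all cluster hostnames locally
cat >> /etc/hosts << EOF
10.2.2.137 master1
10.2.2.166 master2
10.2.2.96  master3
10.2.3.27  node0
EOF
# disable the firewall and SELinux
systemctl stop firewalld && systemctl disable firewalld
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
# disable swap -- kubelet refuses to run with swap enabled
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab
# let iptables see bridged traffic
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
modprobe br_netfilter
sysctl --system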
Software installation
Installing docker
Note: run this part on all nodes!
sh install-docker.sh
install-docker.sh
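The script body is collapsed in the original post; a sketch of the usual way to install this docker version from the Aliyun mirror (repo URL and daemon.json contents are assumptions):
#!/bin/bash
# install-docker.sh -- install docker 18.09.9 (hypothetical reconstruction)
yum install -y yum-utils
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum install -y docker-ce-18.09.9 docker-ce-cli-18.09.9 containerd.io
# use systemd as the cgroup driver, as kubeadm recommends
mkdir -p /etc/docker
cat > /etc/docker/daemon.json << EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl daemon-reload
systemctl enable docker && systemctl start docker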
Installing keepalived
Note: run this part on the three master nodes!
Install
yum -y install keepalived
Configure
keepalived config on master1
[root@master1 ~]# cat /etc/keepalived/keepalived.conf
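The config is collapsed in the original post; a minimal sketch, assuming eth0 as the interface and 10.2.2.6 as the VIP (both match the outputs shown later); virtual_router_id, priority and auth_pass are illustrative values:
! Configuration File for keepalived
global_defs {
    router_id master1
}
vrrp_instance VI_1 {
    state MASTER          # master1 starts as the VRRP master
    interface eth0        # interface that carries the VIP
    virtual_router_id 50  # must be identical on all three masters
    priority 100          # highest priority holds the VIP
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass k8s_ha
    }
    virtual_ipaddress {
        10.2.2.6          # the apiserver VIP
    }
}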
keepalived config on master2
[root@master2 ~]# cat /etc/keepalived/keepalived.conf
keepalived config on master3
[root@master3 ~]# cat /etc/keepalived/keepalived.conf
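The master2 and master3 configs are also collapsed in the original post; presumably they differ from master1 only in router_id, state and priority, for example:
# master2: router_id master2
state BACKUP
priority 90
# master3: router_id master3
state BACKUP
priority 80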
Start
systemctl start keepalived
systemctl enable keepalived
Check the VIP
[root@master1 ~]# ip a
Failover check
Stop the keepalived service on master1 (or power master1 off) and master2 takes over the VIP; if master2 then goes down as well, master3 takes over the VIP.
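A quick way to exercise the failover before the cluster exists (assuming the VIP and interface from the configs above):
# on master1: simulate a failure
systemctl stop keepalived
# on master2: the VIP should now be bound to eth0
ip a | grep 10.2.2.6
# on master1: restore the service; with default preemption the VIP
# moves back to the highest-priority node
systemctl start keepalived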
Installing k8s
Note: run this part on all nodes!
Components:
kubelet
runs on every node in the cluster; it starts Pods, containers and other objects
kubeadm
the command-line tool that initializes and bootstraps the cluster
kubectl
the command line for talking to the cluster; with kubectl you can deploy and manage applications, inspect all kinds of resources, and create, delete and update components
sh install-k8s.sh
install-k8s.sh
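The script body is collapsed in the original post; a sketch of the usual steps, assuming the Aliyun Kubernetes yum repo:
#!/bin/bash
# install-k8s.sh -- install kubelet/kubeadm/kubectl v1.16.4 (hypothetical reconstruction)
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
EOF
yum install -y kubelet-1.16.4 kubeadm-1.16.4 kubectl-1.16.4
systemctl enable kubelet    # kubelet only starts for real after kubeadm writes its config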
Downloading images
Note: run this part on all nodes!
Because of network restrictions in mainland China, the images are pulled from the Aliyun registry and then re-tagged locally with the default names, so kubeadm can use them normally when deploying the cluster.
sh download-images.sh
download-images.sh
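The script body is collapsed in the original post; a sketch of the pull-and-retag loop (the mirror prefix is an assumption; the image list matches what kubeadm config images list reports for v1.16.4):
#!/bin/bash
# download-images.sh -- pull control-plane images from a mirror and
# re-tag them with the default k8s.gcr.io names (hypothetical reconstruction)
MIRROR=registry.cn-hangzhou.aliyuncs.com/google_containers
images=(
    kube-apiserver:v1.16.4
    kube-controller-manager:v1.16.4
    kube-scheduler:v1.16.4
    kube-proxy:v1.16.4
    pause:3.1
    etcd:3.3.15-0
    coredns:1.6.2
)
for img in "${images[@]}"; do
    docker pull ${MIRROR}/${img}
    docker tag  ${MIRROR}/${img} k8s.gcr.io/${img}
    docker rmi  ${MIRROR}/${img}
done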
Initializing the master
Initialization
[root@master1 ~]# cat kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.16.4
apiServer:
  certSANs:
  - 10.2.2.6
controlPlaneEndpoint: "10.2.2.6:6443"
networking:
  podSubnet: "10.244.0.0/16"
[root@master1 ~]# kubeadm init --config=kubeadm-config.yaml
A successful init prints the kubeadm join commands at the end; record them;
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join 10.2.2.6:6443 --token 2ccecd.v72vziyzdfnbr46u \
--discovery-token-ca-cert-hash sha256:eb92768acb748d722ef7d97bc60751a375b67b12a46c7a7232c54cdb378d2e61 \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.2.2.6:6443 --token 2ccecd.v72vziyzdfnbr46u \
--discovery-token-ca-cert-hash sha256:eb92768acb748d722ef7d97bc60751a375b67b12a46c7a7232c54cdb378d2e61
If initialization fails, reset and initialize again
kubeadm reset
rm -rf $HOME/.kube/config
Add the environment variable
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
source ~/.bash_profile
Installing the flannel plugin
wget https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
kubectl apply -f kube-flannel.yml
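To confirm the network plugin came up (plain kubectl; app=flannel is the label the manifest above applies):
kubectl get pods -n kube-system -l app=flannel -o wide   # kube-flannel-ds pods should reach Running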
Joining the control plane nodes
Distributing certificates
master1 syncs the credential files to the other master nodes
[root@master1 ~]# sh cert-others-master.sh
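The script body is collapsed in the original post; a sketch following the standard kubeadm HA procedure of copying the shared CA and service-account keys (the file list and /root/ staging paths are assumptions):
#!/bin/bash
# cert-others-master.sh -- run on master1: copy shared certs to the other masters
for host in 10.2.2.166 10.2.2.96; do
    scp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key \
        /etc/kubernetes/pki/sa.key /etc/kubernetes/pki/sa.pub \
        /etc/kubernetes/pki/front-proxy-ca.crt /etc/kubernetes/pki/front-proxy-ca.key \
        root@${host}:/root/
    scp /etc/kubernetes/pki/etcd/ca.crt root@${host}:/root/etcd-ca.crt
    scp /etc/kubernetes/pki/etcd/ca.key root@${host}:/root/etcd-ca.key
done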
master2 and master3 put the certificates in place
[root@master2 ~]# sh cert-set.sh
[root@master3 ~]# sh cert-set.sh
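Also collapsed in the original post; the counterpart sketch that moves the staged files to where kubeadm join expects them (paths mirror the script above):
#!/bin/bash
# cert-set.sh -- run on master2/master3: move the copied certs into place
mkdir -p /etc/kubernetes/pki/etcd
mv /root/ca.crt /root/ca.key /root/sa.key /root/sa.pub \
   /root/front-proxy-ca.crt /root/front-proxy-ca.key /etc/kubernetes/pki/
mv /root/etcd-ca.crt /etc/kubernetes/pki/etcd/ca.crt
mv /root/etcd-ca.key /etc/kubernetes/pki/etcd/ca.key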
Joining the other masters
master2 and master3 join the cluster; master2 is shown below, and master3 follows the same steps;
[root@master2 ~]# kubeadm join 10.2.2.6:6443 --token 2ccecd.v72vziyzdfnbr46u \
--discovery-token-ca-cert-hash sha256:eb92768acb748d722ef7d97bc60751a375b67b12a46c7a7232c54cdb378d2e61 \
--control-plane
[root@master2 ~]# scp master1:/etc/kubernetes/admin.conf /etc/kubernetes/
[root@master2 ~]# echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile && source ~/.bash_profile
Checking the cluster nodes
kubectl get nodes
NAME      STATUS   ROLES    AGE   VERSION
master1   Ready    master   20h   v1.16.4
master2   Ready    master   20h   v1.16.4
master3   Ready    master   19h   v1.16.4
Joining the worker node
Join
[root@node0 ~]# kubeadm join 10.2.2.6:6443 --token 2ccecd.v72vziyzdfnbr46u \
--discovery-token-ca-cert-hash sha256:eb92768acb748d722ef7d97bc60751a375b67b12a46c7a7232c54cdb378d2e61
Checking the nodes
kubectl get nodes
NAME      STATUS   ROLES    AGE   VERSION
node0     Ready    <none>   18h   v1.16.4
master1   Ready    master   20h   v1.16.4
master2   Ready    master   20h   v1.16.4
master3   Ready    master   19h   v1.16.4
Verifying the cluster
Steps
Power off master1 to simulate an outage
[root@master1 ~]# init 0
The VIP floats over to master2
[root@master2 ~]# ip a |grep '2.6'
inet 10.2.2.6/32 scope global eth0
The controller-manager and scheduler leadership migrates
kubectl get endpoints kube-controller-manager -n kube-system -o yaml |grep holderIdentity
control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"master3_885468ec-f9ce-4cc6-93d6-235508b5a130","leaseDurationSeconds":15,"acquireTime":"2020-04-01T10:15:28Z","renewTime":"2020-04-02T06:02:46Z","leaderTransitions":8}'
kubectl get endpoints kube-scheduler -n kube-system -o yaml |grep holderIdentity
control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"master2_cf16fd61-0202-4610-9a27-3cd9d26b4141","leaseDurationSeconds":15,"acquireTime":"2020-04-01T10:15:25Z","renewTime":"2020-04-02T06:03:09Z","leaderTransitions":9}'
Creating pods in the cluster still works as before
cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
kubectl apply -f nginx.yaml
kubectl get pods
Conclusion
1. With 3 master nodes in the cluster, the failure of any single one does not affect normal use of the cluster;
2. If 2 of the 3 master nodes fail, the etcd cluster loses quorum, which directly takes the whole cluster down!
========================================
Author: 罗穆瑞
Source: http://www.cnblogs.com/kazihuo/