3.7.2 Fabric Examples from a Production Environment
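All of the author's core production machines are AWS EC2 instances, and there are many of them. A Fabric jump host is deployed in each data center (see Figure 3-3 for the physical topology). The operating system is Amazon Linux with kernel version 3.14.34-27.48.amzn1.x86_64, and the Python version is 2.6.9.
When a core developer on a project team leaves the company, the keys on every production machine have to be changed. Keys are usually managed per group and there are a great many machines, so doing this by hand is practically impossible. With an automation tool such as Fabric, it becomes a simple job. Most production servers today are managed with SSH keys, so distributing SSH keys is part of the routine work of most system administrators, and the usage of this script is worth mastering. The sample script is as follows: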
#!/usr/bin/python2.6
# -*- coding: utf-8 -*-
from fabric.api import *
from fabric.colors import *
from fabric.context_managers import *

# To keep things simple, this script is written as plain Python
# and does not use Fabric's @task decorator.
env.user = 'ec2-user'
env.key_filename = '/home/ec2-user/.ssh/id_rsa'
hosts = ['budget', 'adserver',
         'bidder1', 'bidder2', 'bidder3', 'bidder4', 'bidder5',
         'bidder6', 'bidder7', 'bidder8', 'bidder9',
         'redis1', 'redis2', 'redis3', 'redis4', 'redis5', 'redis6']
# There are many machines; only some of them are listed here.

def put_ec2_key():
    with settings(warn_only=False):
        put("/home/ec2-user/admin-master.pub", "/home/ec2-user/admin-master.pub")
        # The shell receives \cp; the leading backslash bypasses any alias,
        # so that "cp -i" does not take effect.
        sudo("\\cp /home/ec2-user/admin-master.pub /home/ec2-user/.ssh/authorized_keys")
        sudo("chmod 600 /home/ec2-user/.ssh/authorized_keys")

def put_admin_key():
    with settings(warn_only=False):
        put("/home/ec2-user/admin-operation.pub",
            "/home/ec2-user/admin-operation.pub")
        sudo("\\cp /home/ec2-user/admin-operation.pub /home/admin/.ssh/authorized_keys")
        sudo("chown admin:admin /home/admin/.ssh/authorized_keys")
        sudo("chmod 600 /home/admin/.ssh/authorized_keys")

def put_readonly_key():
    with settings(warn_only=False):
        put("/home/ec2-user/admin-readonly.pub",
            "/home/ec2-user/admin-readonly.pub")
        sudo("\\cp /home/ec2-user/admin-readonly.pub /home/readonly/.ssh/authorized_keys")
        sudo("chown readonly:readonly /home/readonly/.ssh/authorized_keys")
        sudo("chmod 600 /home/readonly/.ssh/authorized_keys")

# Run all three key updates on every host in turn.
for host in hosts:
    env.host_string = host
    put_ec2_key()
    put_admin_key()
    put_readonly_key()
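The three functions in the script above differ only in the public key file and in the account whose authorized_keys file is overwritten, and the host list is mostly numbered names. The following is a minimal sketch of a table-driven variant, not the author's script. The names KEY_TARGETS, build_key_commands and expand_hosts are hypothetical, and the sketch also runs chown for ec2-user, which the original script does not. It only builds the command strings, so it runs without Fabric; in a real script each string would be passed to sudo() after the corresponding put().

```python
# -*- coding: utf-8 -*-
# Hypothetical helper names; paths and accounts are taken from the script above.

# (public key file, account whose authorized_keys is replaced)
KEY_TARGETS = [
    ("admin-master.pub", "ec2-user"),
    ("admin-operation.pub", "admin"),
    ("admin-readonly.pub", "readonly"),
]


def build_key_commands(pub_key, account, staging_dir="/home/ec2-user"):
    """Return the shell commands that install pub_key for account."""
    src = "%s/%s" % (staging_dir, pub_key)
    dest = "/home/%s/.ssh/authorized_keys" % account
    return [
        r"\cp %s %s" % (src, dest),           # backslash bypasses a cp alias
        "chown %s:%s %s" % (account, account, dest),
        "chmod 600 %s" % dest,
    ]


def expand_hosts(prefix, count):
    """Return [prefix1, ..., prefixN], for example bidder1 .. bidder9."""
    return ["%s%d" % (prefix, i) for i in range(1, count + 1)]


# Same host list as the script above, built from the numbered prefixes.
hosts = ["budget", "adserver"] + expand_hosts("bidder", 9) + expand_hosts("redis", 6)

if __name__ == "__main__":
    for pub_key, account in KEY_TARGETS:
        for command in build_key_commands(pub_key, account):
            print(command)
```

Adding a new account or key then means adding one row to KEY_TARGETS rather than copying a whole function.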
You can run the following command to list the aliases defined on a system (CentOS 6.4 x86_64 in this case).
alias
The command prints the following:
alias cp='cp -i'
alias l.='ls -d .* --color=auto'
alias ll='ls -l --color=auto'
alias ls='ls --color=auto'
alias mv='mv -i'
alias rm='rm -i'
alias which='alias | /usr/bin/which --tty-only --read-alias --show-dot --show-tilde'
Amazon Linux differs slightly from CentOS 6.4: it no longer defines an alias for cp.
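The sudo() calls in the key distribution script have to send a literal backslash in front of cp to the remote shell. Inside an ordinary Python string, writing a single backslash before c happens to work only because that pair is not a recognized escape sequence, and Python 3.6 and later emit a warning for it. The short sketch below, with illustrative variable names, shows two unambiguous spellings of the same command string.

```python
# Two spellings of the same shell command string.
raw_cmd = r"\cp /home/ec2-user/admin-master.pub /home/ec2-user/.ssh/authorized_keys"
escaped_cmd = "\\cp /home/ec2-user/admin-master.pub /home/ec2-user/.ssh/authorized_keys"

# Both strings contain exactly one backslash in front of cp.
same = raw_cmd == escaped_cmd
first_word = raw_cmd.split()[0]
print(first_word)  # \cp
```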
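Suppose the monitoring script on the production Nagios clients has to change again because of business requirements. The bidder cluster has about 23 machines (only 10 of them are listed below), and one of these scripts went through 4 revisions. Updating it by hand would cost a great deal of labor and time, so Fabric is used here to push the script and run the follow-up commands. The code is as follows: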
#!/usr/bin/python2.6
# -*- coding: utf-8 -*-
from fabric.api import *
from fabric.colors import *
from fabric.context_managers import *

# The login user must be set on env; a plain "user" variable has no effect on Fabric.
env.user = 'ec2-user'
hosts = ['bidder1', 'bidder2', 'bidder3', 'bidder4', 'bidder5',
         'bidder6', 'bidder7', 'bidder8', 'bidder9', 'bidder10']
# There are many machines; only 10 of them are listed here.

# The @task decorator is used here.
@task
def put_task():
    print yellow("Put Local File to remote")
    with settings(warn_only=True):
        put("/home/ec2-user/check_cpu_utili.sh",
            "/home/ec2-user/check_cpu_utili.sh")
        sudo("cp /home/ec2-user/check_cpu_utili.sh /usr/local/nagios/libexec/check_cpu_utili.sh")
        sudo("chown nagios:nagios /usr/local/nagios/libexec/check_cpu_utili.sh")
        sudo("chmod +x /usr/local/nagios/libexec/check_cpu_utili.sh")
        # Kill the first process whose ps line matches "nrpe", then start nrpe again as a daemon.
        sudo("kill `ps aux | grep nrpe | head -n1 | awk '{print $2}' `")
        sudo("/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d")
    # Printed in green so that the result of the run is easy to spot.
    print green("upload File success and restart nagios service!")

for host in hosts:
    env.host_string = host
    put_task()
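After the script above runs, Fabric prints clear output showing, host by host, which commands were executed, so it is easy to see which machines completed and which failed. The output looks like this: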
Put Local File to remote
[bidder1] put: /home/ec2-user/check_cpu_utili.sh -> /home/ec2-user/check_cpu_utili.sh
[bidder1] sudo: cp /home/ec2-user/check_cpu_utili.sh /usr/local/nagios/libexec/check_cpu_utili.sh
[bidder1] sudo: chown nagios:nagios /usr/local/nagios/libexec/check_cpu_utili.sh
[bidder1] sudo: chmod +x /usr/local/nagios/libexec/check_cpu_utili.sh
[bidder1] sudo: kill `ps aux | grep nrpe | head -n1 | awk '{print $2}' `
[bidder1] sudo: /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
upload File success and restart nagios service!
Put Local File to remote
[bidder2] put: /home/ec2-user/check_cpu_utili.sh -> /home/ec2-user/check_cpu_utili.sh
[bidder2] sudo: cp /home/ec2-user/check_cpu_utili.sh /usr/local/nagios/libexec/check_cpu_utili.sh
[bidder2] sudo: chown nagios:nagios /usr/local/nagios/libexec/check_cpu_utili.sh
[bidder2] sudo: chmod +x /usr/local/nagios/libexec/check_cpu_utili.sh
[bidder2] sudo: kill `ps aux | grep nrpe | head -n1 | awk '{print $2}' `
[bidder2] sudo: /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
upload File success and restart nagios service!
Put Local File to remote
[bidder3] put: /home/ec2-user/check_cpu_utili.sh -> /home/ec2-user/check_cpu_utili.sh
[bidder3] sudo: cp /home/ec2-user/check_cpu_utili.sh /usr/local/nagios/libexec/check_cpu_utili.sh
[bidder3] sudo: chown nagios:nagios /usr/local/nagios/libexec/check_cpu_utili.sh
[bidder3] sudo: chmod +x /usr/local/nagios/libexec/check_cpu_utili.sh
[bidder3] sudo: kill `ps aux | grep nrpe | head -n1 | awk '{print $2}' `
[bidder3] sudo: /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
upload File success and restart nagios service!
Put Local File to remote
[bidder4] put: /home/ec2-user/check_cpu_utili.sh -> /home/ec2-user/check_cpu_utili.sh
[bidder4] sudo: cp /home/ec2-user/check_cpu_utili.sh /usr/local/nagios/libexec/check_cpu_utili.sh
[bidder4] sudo: chown nagios:nagios /usr/local/nagios/libexec/check_cpu_utili.sh
[bidder4] sudo: chmod +x /usr/local/nagios/libexec/check_cpu_utili.sh
[bidder4] sudo: kill `ps aux | grep nrpe | head -n1 | awk '{print $2}' `
[bidder4] sudo: /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
upload File success and restart nagios service!
Put Local File to remote
[bidder5] put: /home/ec2-user/check_cpu_utili.sh -> /home/ec2-user/check_cpu_utili.sh
[bidder5] sudo: cp /home/ec2-user/check_cpu_utili.sh /usr/local/nagios/libexec/check_cpu_utili.sh
[bidder5] sudo: chown nagios:nagios /usr/local/nagios/libexec/check_cpu_utili.sh
[bidder5] sudo: chmod +x /usr/local/nagios/libexec/check_cpu_utili.sh
[bidder5] sudo: kill `ps aux | grep nrpe | head -n1 | awk '{print $2}' `
[bidder5] sudo: /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
upload File success and restart nagios service!
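Because the script sets warn_only=True, a failed command does not abort the run, and the green success message is printed whether or not every step worked. The following is a minimal sketch of collecting a per-host result instead. It assumes Fabric 1.x, where run() and sudo() return an object with a .failed attribute. The names FakeResult, fake_sudo, run_steps and summarize are hypothetical; the fake runner stands in for sudo() so that the example runs without Fabric or real hosts.

```python
class FakeResult(object):
    """Stand-in for the object returned by Fabric 1.x sudo()/run()."""

    def __init__(self, failed):
        self.failed = failed


def run_steps(runner, commands):
    """Run commands in order; return the first command that failed, or None."""
    for command in commands:
        if runner(command).failed:
            return command
    return None


def summarize(results):
    """Split {host: failed_command_or_None} into (ok_hosts, failed_hosts)."""
    ok = sorted(h for h, failed in results.items() if failed is None)
    bad = sorted(h for h, failed in results.items() if failed is not None)
    return ok, bad


def fake_sudo(command):
    # Assumption for the demo: pretend that only the chown step fails.
    return FakeResult(failed=command.startswith("chown"))


commands = [
    "cp /home/ec2-user/check_cpu_utili.sh /usr/local/nagios/libexec",
    "chown nagios:nagios /usr/local/nagios/libexec/check_cpu_utili.sh",
]
results = {
    "bidder1": run_steps(fake_sudo, commands[:1]),  # only the cp step
    "bidder2": run_steps(fake_sudo, commands),      # cp, then the failing chown
}
ok_hosts, failed_hosts = summarize(results)
print("ok: %s, failed: %s" % (ok_hosts, failed_hosts))
```

In a real script, Fabric's sudo would be passed as the runner inside the warn_only block, and the green message would be printed only when run_steps returns None.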
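As you can see, a few short lines of code are enough to automate the operation. All of the Fabric-related code is plain Python and shell, so developers and operations staff can pick it up easily, and it is readily accepted when rolled out within a company. As these examples show, Fabric is particularly well suited to work that involves running large numbers of shell commands repeatedly.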