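To build a web cluster on NFS shared storage, you first need a working web cluster. I cover that in another post: blog.csdn.net/qq_45714272…
How a web cluster backed by an NFS server works
The idea, as shown in the diagram, is this: web1 has an image, but web2 does not. To fix that, put an NFS storage server behind both web servers. NFS is a service that shares directories and files over the network. Whatever web1 writes is stored on the NFS server, and web2 reads the same files from there.
1. Install the NFS server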
[root@sto yum.repos.d]# yum install -y nfs-utils
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Package matching 1:nfs-utils-1.3.0-0.21.el7.x86_64 already installed. Checking for update.
Nothing to do
[root@sto yum.repos.d]# systemctl restart rpcbind
[root@sto yum.repos.d]# systemctl enable rpcbind
[root@sto yum.repos.d]# systemctl enable nfs-server
Created symlink from /etc/systemd/system/multi-user.target.wants/nfs-server.service to /usr/lib/systemd/system/nfs-server.service.
[root@sto yum.repos.d]# systemctl start nfs-server
[root@sto yum.repos.d]# systemctl status firewalld
[root@sto http]# systemctl stop firewalld
If the firewall stays enabled, add the nfs, rpc-bind, and mountd services to it, then reload so the change takes effect.
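If you would rather keep firewalld running than stop it, the three services can be opened explicitly. This is a minimal sketch, assuming firewalld's predefined service names nfs, rpc-bind, and mountd. The `run` helper and the `DRY_RUN` variable are my own illustration and not part of the original setup; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: open the NFS-related firewalld services instead of stopping firewalld.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 to run them as root.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

open_nfs_firewall() {
    for svc in nfs rpc-bind mountd; do
        run firewall-cmd --permanent --add-service="$svc"
    done
    # Permanent rules only become active after a reload.
    run firewall-cmd --reload
}

open_nfs_firewall
```

Run it once as-is to review the commands, then again with `DRY_RUN=0` on the NFS server.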
2. Prepare the NFS server resources
Create the export directory (the shared directory):
[root@sto yum.repos.d]# mkdir /http
[root@sto yum.repos.d]# chmod a+x /http
[root@sto yum.repos.d]# vi /etc/exports
/http *(rw)
[root@sto yum.repos.d]# systemctl restart nfs-server
Test from the other nodes:
[root@rs1 ~]# showmount -e sto
Export list for sto:
/http *
[root@rs1 ~]# mkdir /mnt/nfs
[root@rs1 ~]# mount sto:/http /mnt/nfs
[root@rs1 ~]# cp ./anaconda-ks.cfg /mnt/nfs/test.txt
[root@rs2 ~]# showmount -e sto
Export list for sto:
/http *
[root@rs2 ~]# mkdir /mnt/nfs
[root@rs2 ~]# mount sto:/http /mnt/nfs
[root@rs2 ~]# cp ./anaconda-ks.cfg /mnt/nfs/test2.txt
[root@sto http]# ls
test2.txt test.txt
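One detail in /etc/exports is worth double-checking: whitespace between the client and its options. In exports(5), `host(rw)` applies the options to that host, while `host (rw)` exports to `host` with default options and separately to everyone with `rw`. The sketch below flags such lines; `check_exports` is a hypothetical helper name of mine, and it reads exports-style text on stdin.

```shell
#!/bin/sh
# Sketch: flag exports(5) lines where whitespace separates a client from "(options)".
# check_exports is a hypothetical helper; it exits 1 if any suspicious line is found.
check_exports() {
    awk '
        /^[ \t]*#/ { next }          # skip comment lines
        /[^ \t][ \t]+\(/ {           # something, then a gap, then "("
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    '
}

# Demo on two sample lines; only the first one contains the gap.
printf '%s\n' '/http * (rw)' '/http *(rw)' | check_exports || echo "review the lines above"
```

On the NFS server this could be run as `check_exports < /etc/exports` before restarting nfs-server.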
Create the NFS cluster resource
[root@rs1 ~]# pcs resource create WebFS ocf:heartbeat:Filesystem \
> device='sto:/http' directory='/var/www/html' fstype='nfs' \
> op monitor interval=20s timeout=40s \
> op start timeout=60s op stop timeout=60s
[root@rs1 ~]# pcs status
Cluster name: cluster1
Stack: corosync
Current DC: rs1 (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
Last updated: Sat May 9 02:59:31 2020
Last change: Sat May 9 02:59:22 2020 by root via cibadmin on rs1
2 nodes configured
3 resources configured
Online: [ rs1 rs2 ]
Full list of resources:
VirtualIP (ocf::heartbeat:IPaddr2): Started rs1
WebFS (ocf::heartbeat:Filesystem): Started rs2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@rs1 ~]# echo NFS > /var/www/html/index.html
[root@rs1 ~]# systemctl restart httpd
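pcs status shows which node currently runs WebFS, and on that node the share should appear as an NFS mount on /var/www/html. The following sketch checks this from the node itself. `is_nfs_mounted` is a hypothetical helper of mine that reads /proc/mounts-style lines on stdin; the mount point is the `directory` value used when creating WebFS.

```shell
#!/bin/sh
# is_nfs_mounted DIR: succeeds if the /proc/mounts-style input on stdin
# lists DIR as a mount point whose filesystem type starts with "nfs".
is_nfs_mounted() {
    awk -v dir="$1" '$2 == dir && $3 ~ /^nfs/ { found = 1 } END { exit !found }'
}

# On a cluster node (Linux): report whether the share is mounted here.
if [ -r /proc/mounts ] && is_nfs_mounted /var/www/html < /proc/mounts; then
    echo "NFS share is mounted on this node"
else
    echo "NFS share is not mounted on this node"
fi
```

Only the node that holds WebFS should report the share as mounted; the other node reports it as missing until the resource moves.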
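Create the web cluster resource
pcs resource create Wbsite ocf:heartbeat:apache httpd="/usr/sbin/httpd" \
configfile=/etc/httpd/conf/httpd.conf \
statusurl="http://localhost/server-status" \
op monitor interval=1min
Earlier, the web resource would not start. I later found an article saying you need to add httpd="/usr/sbin/httpd", that is, the path to the httpd binary. With that parameter it works.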
[root@rs1 html]# pcs status
Cluster name: cluster1
Stack: corosync
Current DC: rs1 (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
Last updated: Sat May 9 03:41:11 2020
Last change: Sat May 9 03:21:45 2020 by root via cibadmin on rs1
2 nodes configured
3 resources configured
Online: [ rs1 rs2 ]
Full list of resources:
VirtualIP (ocf::heartbeat:IPaddr2): Started rs1
WebFS (ocf::heartbeat:Filesystem): Started rs1
Wbsite (ocf::heartbeat:apache): Started rs1
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Configure the group and constraints
Use a group to run the resources on the same node
[root@rs1 ~]# pcs resource group add a VirtualIP WebFS Wbsite
[root@rs1 html]# pcs status
Cluster name: cluster1
Stack: corosync
Current DC: rs1 (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
Last updated: Sat May 9 03:42:47 2020
Last change: Sat May 9 03:42:44 2020 by root via cibadmin on rs1
2 nodes configured
3 resources configured
Online: [ rs1 rs2 ]
Full list of resources:
Resource Group: a
VirtualIP (ocf::heartbeat:IPaddr2): Started rs1
WebFS (ocf::heartbeat:Filesystem): Started rs1
Wbsite (ocf::heartbeat:apache): Started rs1
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Use constraints to control the start order of the resources
[root@rs1 ~]# pcs constraint order VirtualIP then Wbsite
Adding VirtualIP Wbsite (kind: Mandatory) (Options: first-action=start then-action=start)
[root@rs1 ~]# pcs constraint order WebFS then Wbsite
Adding WebFS Wbsite (kind: Mandatory) (Options: first-action=start then-action=start)
Accessing the VIP while the resources run on web1
Accessing the VIP while the resources run on web2:
[root@rs1 ~]# pcs status
Cluster name: cluster1
Stack: corosync
Current DC: rs1 (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
Last updated: Sat May 9 03:55:54 2020
Last change: Sat May 9 03:55:40 2020 by root via cibadmin on rs1
2 nodes configured
3 resources configured
Node rs1: standby
Online: [ rs2 ]
Full list of resources:
Resource Group: a
VirtualIP (ocf::heartbeat:IPaddr2): Started rs2
WebFS (ocf::heartbeat:Filesystem): Started rs2
Wbsite (ocf::heartbeat:apache): Starting rs2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
The result is the same on both nodes, which completes the web cluster on NFS storage.
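The status output shows rs1 in standby, but the post does not include the command that put it there. A failover like this can be triggered as sketched below, assuming the pcs 0.9 series that ships with CentOS 7 (newer pcs releases use `pcs node standby` and `pcs node unstandby` instead). The `run` helper, `DRY_RUN`, and `failover_test` are illustrative names of mine; by default nothing is executed.

```shell
#!/bin/sh
# Sketch: move the resource group by putting a node into standby, then bring it back.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to execute on a cluster node.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

failover_test() {
    node="$1"
    run pcs cluster standby "$node"     # resources leave this node
    run pcs status                      # check where the group is running now
    run pcs cluster unstandby "$node"   # let the node host resources again
}

failover_test rs1
```

In a real test you would pause after the standby step and load the VIP in a browser before bringing the node back.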
One more important point:
[root@rs1 html]# systemctl status httpd
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sat 2020-05-09 03:37:45 EDT; 10min ago
Docs: man:httpd(8)
man:apachectl(8)
Process: 27583 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=1/FAILURE)
Process: 27582 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE)
Main PID: 27582 (code=exited, status=1/FAILURE)
May 09 03:37:45 rs1 httpd[27582]: (98)Address already in use: AH00072: make_sock: co...:80
May 09 03:37:45 rs1 httpd[27582]: (98)Address already in use: AH00072: make_sock: co...:80
May 09 03:37:45 rs1 httpd[27582]: no listening sockets available, shutting down
May 09 03:37:45 rs1 httpd[27582]: AH00015: Unable to open logs
May 09 03:37:45 rs1 systemd[1]: httpd.service: main process exited, code=exited, sta...URE
May 09 03:37:45 rs1 kill[27583]: kill: cannot find process ""
May 09 03:37:45 rs1 systemd[1]: httpd.service: control process exited, code=exited s...s=1
May 09 03:37:45 rs1 systemd[1]: Failed to start The Apache HTTP Server.
May 09 03:37:45 rs1 systemd[1]: Unit httpd.service entered failed state.
May 09 03:37:45 rs1 systemd[1]: httpd.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
[root@rs1 html]# ps aux |grep http
root 18067 0.0 0.3 224068 3476 ? Ss 03:19 0:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 18068 0.0 0.3 224204 3752 ? S 03:19 0:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 18069 0.0 0.3 224204 3752 ? S 03:19 0:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 18070 0.0 0.3 224204 3752 ? S 03:19 0:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 18071 0.0 0.3 224204 3756 ? S 03:19 0:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 18072 0.0 0.3 224204 3764 ? S 03:19 0:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
apache 31002 0.0 0.3 224204 3704 ? S 03:44 0:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
root 35170 0.0 0.0 112712 952 pts/1 R+ 03:49 0:00 grep --color=auto http
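systemctl reports the httpd service as failed, yet the process list shows httpd still running. The reason: once the web resource is added to the cluster, httpd is no longer controlled by systemctl but by the cluster components, which is why the processes still show up in ps. The "Address already in use" errors in the unit log fit this picture: the systemd unit could not bind port 80 because the cluster-started httpd already holds it.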
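A quick way to tell the two cases apart is the command line. In the listing, the cluster-started processes carry `-DSTATUS`, while the systemd unit starts httpd with `-DFOREGROUND`. The sketch below counts the former; it assumes the resource agent always passes `-DSTATUS` as it does in this listing, and `count_agent_httpd` is a hypothetical helper name of mine.

```shell
#!/bin/sh
# count_agent_httpd: reads `ps aux`-style lines on stdin and prints how many
# httpd processes carry the -DSTATUS flag seen on the cluster-started processes.
count_agent_httpd() {
    # "[ ]" matches a single space; writing the pattern this way keeps the
    # awk process itself from matching when the input comes from ps.
    awk '/httpd[ ]-DSTATUS/ { n++ } END { print n + 0 }'
}

# On a cluster node: a non-zero count suggests httpd is under cluster control.
ps aux 2>/dev/null | count_agent_httpd
```

If the count is non-zero, leave `systemctl start httpd` alone and manage the service through pcs instead.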
The web resource created with pcs never starts: how to fix it
Exhausting. It took me two days to solve this.
[root@rs1 ~]# pcs resource create Website ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf \
statusurl="http://localhost/server-status" \
op monitor interval=1min
[root@rs1 ~]# pcs status
Cluster name: cluster1
Stack: corosync
Current DC: rs1 (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
Last updated: Sat May 9 02:29:59 2020
Last change: Sat May 9 02:28:36 2020 by root via cibadmin on rs1
2 nodes configured
2 resources configured
Online: [ rs1 rs2 ]
Full list of resources:
VirtualIP (ocf::heartbeat:IPaddr2): Started rs1
Website (ocf::heartbeat:apache): Stopped
Failed Resource Actions:
- Website_start_0 on rs1 'unknown error' (1): call=19, status=Timed Out, exitreason='', last-rc-change='Sat May 9 02:29:17 2020', queued=0ms, exec=40004ms
- Website_start_0 on rs2 'unknown error' (1): call=14, status=Timed Out, exitreason='', last-rc-change='Sat May 9 02:28:36 2020', queued=0ms, exec=40006ms
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Here is the log as well:
May 9 02:29:48 b apache(Website)[12391]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
May 9 02:29:49 b apache(Website)[12391]: INFO: apache not running
May 9 02:29:49 b apache(Website)[12391]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
May 9 02:29:50 b apache(Website)[12391]: INFO: apache not running
May 9 02:29:50 b apache(Website)[12391]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
May 9 02:29:51 b apache(Website)[12391]: INFO: apache not running
May 9 02:29:51 b apache(Website)[12391]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
May 9 02:29:52 b apache(Website)[12391]: INFO: apache not running
May 9 02:29:52 b apache(Website)[12391]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
May 9 02:29:53 b apache(Website)[12391]: INFO: apache not running
May 9 02:29:53 b apache(Website)[12391]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
May 9 02:29:54 b apache(Website)[12391]: INFO: apache not running
May 9 02:29:54 b apache(Website)[12391]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
May 9 02:29:55 b apache(Website)[12391]: INFO: apache not running
May 9 02:29:55 b apache(Website)[12391]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
May 9 02:29:56 b apache(Website)[12391]: INFO: apache not running
May 9 02:29:56 b apache(Website)[12391]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
May 9 02:29:57 b lrmd[9228]: warning: Website_start_0 process (PID 12391) timed out
May 9 02:29:57 b lrmd[9228]: warning: Website_start_0:12391 - timed out after 40000ms
May 9 02:29:57 b crmd[9231]: error: Result of start operation for Website on rs1: Timed Out
May 9 02:29:57 b crmd[9231]: warning: Action 5 (Website_start_0) on rs1 failed (target: 0 vs. rc: 1): Error
May 9 02:29:57 b crmd[9231]: notice: Transition aborted by operation Website_start_0 'modify' on rs1: Event failed
May 9 02:29:57 b crmd[9231]: notice: Transition 16 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-16.bz2): Complete
May 9 02:29:57 b pengine[9230]: warning: Processing failed start of Website on rs1: unknown error
May 9 02:29:57 b pengine[9230]: warning: Processing failed start of Website on rs1: unknown error
May 9 02:29:57 b pengine[9230]: warning: Processing failed start of Website on rs2: unknown error
May 9 02:29:57 b pengine[9230]: warning: Forcing Website away from rs1 after 1000000 failures (max=1000000)
May 9 02:29:57 b pengine[9230]: warning: Forcing Website away from rs2 after 1000000 failures (max=1000000)
May 9 02:29:57 b pengine[9230]: notice: * Stop Website ( rs1 ) due to node availability
May 9 02:29:57 b pengine[9230]: notice: Calculated transition 17, saving inputs in /var/lib/pacemaker/pengine/pe-input-17.bz2
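Most of this log is the agent polling once per second; the line that matters is the crmd error. The sketch below pulls such lines out of a syslog-style file and prints one summary per failed operation. `failed_ops` is a hypothetical helper of mine, and it relies only on the "error: Result of ... operation for ... on ...:" wording seen in this log.

```shell
#!/bin/sh
# failed_ops: reads syslog-style Pacemaker lines on stdin and prints
# "<resource> <operation> <node> <result>" for every crmd error line of the form
# "error: Result of <op> operation for <resource> on <node>: <result>".
failed_ops() {
    sed -n 's/.*error: Result of \([a-z]*\) operation for \([^ ]*\) on \([^:]*\): \(.*\)$/\2 \1 \3 \4/p' |
        sort -u
}

# Demo on one line taken from the log above.
printf '%s\n' \
    'May 9 02:29:57 b crmd[9231]: error: Result of start operation for Website on rs1: Timed Out' |
    failed_ops
```

On a node this could be run as `failed_ops < /var/log/messages`. The "Forcing Website away ... after 1000000 failures" lines mean the fail count has hit its maximum on both nodes, so once the cause is fixed, `pcs resource cleanup Website` clears the recorded failures and lets Pacemaker retry the start.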
Solution: recreate the web cluster resource
[root@rs1 html]# pcs resource create Wbsite ocf:heartbeat:apache httpd="/usr/sbin/httpd" \
configfile=/etc/httpd/conf/httpd.conf \
statusurl="http://localhost/server-status" \
op monitor interval=1min
Compared with the original command, I added httpd="/usr/sbin/httpd", that is, the path to the httpd binary. Here is the result:
[root@rs1 html]# pcs status
Cluster name: cluster1
Stack: corosync
Current DC: rs1 (version 1.1.20-5.el7-3c4c782f70) - partition with quorum
Last updated: Sat May 9 03:33:01 2020
Last change: Sat May 9 03:21:45 2020 by root via cibadmin on rs1
2 nodes configured
3 resources configured
Online: [ rs1 rs2 ]
Full list of resources:
VirtualIP (ocf::heartbeat:IPaddr2): Started rs1
WebFS (ocf::heartbeat:Filesystem): Started rs1
Wbsite (ocf::heartbeat:apache): Started rs1
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Solved!