在 Azure CentOS VM 中配置 SQL Server 2019 AG - (上)
前文
假定您对Azure和SQL Server HA具有基础知识
假定您对Azure Cli具有基础知识
目标是在Azure Linux VM上创建一个具有三个副本的可用性组,并实现侦听器和Fencing配置
环境
SQL Server 2019 Developer on Linux
Azure VM Fencing agent
Azure Cli实现部分配置
CentOS 7.7 Azure VM,分别SQL19N1,SQL19N2,SQL19N3,位于同一VNet
步骤
为VM创建资源组和可用性集
中国东部2创建资源组
az group create --name SQL-DEMO-RG --location chinaeast2
创建用于VM人Availability Set,配置2个容错域,2个更新域
az vm availability-set create \
--resource-group SQL-DEMO-RG \
--name AGLinux-AvailabilitySet \
--platform-fault-domain-count 2 \
--platform-update-domain-count 2
使用Template部署3台VM
第一次创建VM时,会生成template,然后下载保存下,修改其中的参数值后,就可以方便地创建配置类似的VM。VM的配置主要有:
使用前面的可用性集
使用同一个子网
IP使用Standard
SSH public key配置
模板和参数文件太长,就不展示了。可以在Azure Portal上自行获取。
如下是SQL19N2的配置,修改参数文件后,直接可以用于创建SQL19N3
templateFile="./templateFile"
paramFile="./vmParams-sql19n2.json"
az deployment group validate --name sql19n2vm \
-g SQL-DEMO-RG --template-file $templateFile --parameters $paramFile
配置VM使用固定内网IP和公网DNS Label
三台VM都需要修改配置,如下只是一台的配置示例
找出nic和IP的信息
az network nic list -g SQL-DEMO-RG --query "[].{nicName:name,configuration:ipConfigurations[].{ipName:name,ip:privateIpAddress,method:privateIpAllocationMethod}}" -o yaml
修改privateIpAllocationMethod为Static
az network nic ip-config update -g SQL-DEMO-RG --nic-name sql19n1152 --name ipconfig1 --set privateIpAllocationMethod=Static
找出pbulic ip名称
az network public-ip list -g SQL-DEMO-RG --query "[].name" -o tsv
配置Public IP的DNS name,只能使用数据和小字字母
az network public-ip update -g SQL-DEMO-RG -n SQL19N1ip851 --dns-name sql19n1
安装HA相关软件包
最好先更新一下系统的软件包,再安装HA相关软件。
yum update -y
yum install -y pacemaker pcs fence-agents-all resource-agents fence-agents-azure-arm
reboot
为群集和SQL Server开放防火墙端口
Pacemaker和Corosync的端口
TCP: Ports 2224,3121,21064,5405
UDP: Port 5405
firewall-cmd --add-port=2224/tcp --permanent
firewall-cmd --add-port=2224/tcp --permanent
firewall-cmd --add-port=21064/tcp --permanent
firewall-cmd --add-port=5405/tcp --permanent
firewall-cmd --add-port=5405/udp --permanent
SQL Server端口和AG镜像端口
TCP: 1433,5022
firewall-cmd --add-port=1433/tcp --permanent
firewall-cmd --add-port=5022/tcp --permanent
firewall-cmd --reload
添加hosts记录
vi /etc/hosts
172.17.2.8 SQL19N1
172.17.2.9 SQL19N2
172.17.2.10 SQL19N3
创建Pacemaker群集
设置Pacemaker的默认用户密码,三台VM上
passwd hacluster
设置pacemaker和pcsd自启动在三台VM上
systemctl enable pcsd
systemctl start pcsd
systemctl enable pacemaker
创建群集,在master节点
sudo pcs cluster auth SQL19N1 SQL19N2 SQL19N3 -u hacluster
sudo pcs cluster setup --name agcluster SQL19N1 SQL19N2 SQL19N3 --token 30000 --force
sudo pcs cluster start --all
sudo pcs cluster enable --all
查看群集状态
pcs status
在三个节点上修改quorum的expected-votes为3,其实三节点群集默认为3
设置表示,群集存活需要3票,这个修改只影响当前running群集,不会变成群集的永久性配置保存下来
pcs quorum expected-votes 3
在Azure上为Fencing Agent配置Servic Princinpal
1. 创建 aad app,成功后记录下相应的appID
az ad app create --display-name sqldemorg-app --identifier-uris http://localhost
--password "1qaz@WSX3edc" --end-date '2030-04-27' --credential-description "sql19 ag secret"
2. 创建aad App的Service Principal
az ad sp create --id
3. 将service Principal分配到VM对应的管理role,对每个VM都要执行
我这里分配的是Owner role,这不是安全的做法。应该使用自定义一个role,只给最小权限
自定义role需要Azure订阅是PP1或者PP2级别
az role assignment create --assignee --role owner \
--scope /subscriptions//resourceGroups//providers/Microsoft.Compute/virtualMachines/SQL19N1
创建Azure的STONITH 设备
我使用的是Azure China,所以需要指定cloud=china,如果使用global Azure不需要指定此参数。
执行 fence_azure_arm -h,查看此资源代理的更多帮助信息
pcs property set stonith-timeout=900
pcs stonith create rsc_st_azure fence_azure_arm login="" passwd="" resourceGroup="" tenantId="" subscriptionId="" power_timeout=240 pcmk_reboot_timeout=900 cloud=china
安装SQL 2019及工具
安装 SQL 2019和HA 资源代理
sudo curl -o /etc/yum.repos.d/mssql-server.repo https://packages.microsoft.com/config/rhel/7/mssql-server-2019.repo
sudo yum install -y mssql-server
sudo /opt/mssql/bin/mssql-conf setup
sudo yum install mssql-server-ha
安装 mssql-tools
sudo curl -o /etc/yum.repos.d/msprod.repo https://packages.microsoft.com/config/rhel/7/prod.repo
sudo yum install -y mssql-tools unixODBC-devel
将mssql-tools目录加入到aPATH,方便使用
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
source ~/.bashrc
安装 mssql-cli
sudo rpm --import https://packages.microsoft.com/keys/microsoft.asc
sudo curl -o /etc/yum.repos.d/mssql-cli.repo https://packages.microsoft.com/config/rhel/7/prod.repo
sudo yum install mssql-cli
查看SQL 状态
systemctl status mssql-server
如果您熟悉 SQL Server相关的PowerShell,建议将PowerShell也安装上,并安装SQLServer module。对SQL Server的配置,使用PowerShell会方便很多
yum install powershell -y
pwsh
Install-Module SQLServer
查看SQL相关的命令
Get-Command -Module SQLServer
配置AG
创建PowerShell 函数方便后续执行T-SQL
打开PowerShell的 profile文件,如果不存在需要则需要创建
vi /root/.config/powershell/Microsoft.PowerShell_profile.ps1
将如下函数加入 到 profile文件中,每次打开pwsh时就可以直接调用
函数有两个参数,$sql表示需要执行的T-SQL,最好使用here-string以避免字符转义问题
$servers表示目标实例,数组类型。默认值为当前环境中的三个实例
function run-sql ($sql,$servers=("SQL19N1","SQL19N2","SQL19N3"))
{
$secpasswd = "1qaz@WSX"|ConvertTo-SecureString -AsPlainText -Force
$cred=New-Object System.Management.Automation.PSCredential -ArgumentList 'sa', $secpasswd
$sql
"---------"
foreach($svr in $servers) {"Running T-SQL on $svr..."; Invoke-Sqlcmd -ServerInstance $svr -Credential $cred -Query $sql}
}
启用 hadr功能,每个实例
sudo /opt/mssql/bin/mssql-conf set hadr.hadrenabled 1
sudo systemctl restart mssql-server
启动AG extened event session
T-SQL,每个实例
ALTER EVENT SESSION AlwaysOn_health ON SERVER WITH (STARTUP_STATE=ON);
GO
在主副本实例上创建证书,这个证书用于验证Mirroring endpoint通信。将证书和私钥复制到其它节点上的相同的目录位置。授予mssql用户访问权限
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '1qaz@WSX';
GO
CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'dbm';
GO
BACKUP CERTIFICATE dbm_certificate
TO FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
ENCRYPTION BY PASSWORD = '1qaz@WSX'
);
复制证书和私钥到辅助副本主机SQL19N2和SQL19N3
cd /var/opt/mssql/data
scp dbm_certificate.* root@SQL19N2:/var/opt/mssql/data/
scp dbm_certificate.* root@SQL19N3:/var/opt/mssql/data/
辅助副本节点上修改权限
cd /var/opt/mssql/data
chown mssql:mssql dbm_certificate.*
在辅助副本实例中创建master key并导入证书
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '1qaz@WSX';
GO
CREATE CERTIFICATE dbm_certificate
FROM FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
DECRYPTION BY PASSWORD = '1qaz@WSX'
);
创建AG的镜像端口,注意防火墙和NSG配置端口例外
CREATE ENDPOINT [Hadr_endpoint]
AS TCP (LISTENER_PORT = 5022)
FOR DATABASE_MIRRORING (
ROLE = ALL,
AUTHENTICATION = CERTIFICATE dbm_certificate,
ENCRYPTION = REQUIRED ALGORITHM AES
);
GO
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
创建三个副本,同步模式的AG,主副本实例上执行
CREATE AVAILABILITY GROUP [ag1]
WITH (DB_FAILOVER = ON, CLUSTER_TYPE = EXTERNAL)
FOR REPLICA ON
N'SQL19N1'
WITH (
ENDPOINT_URL = N'tcp://SQL19N1:5022',
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
FAILOVER_MODE = EXTERNAL,
SEEDING_MODE = AUTOMATIC,
SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)
),
N'SQL19N2'
WITH (
ENDPOINT_URL = N'tcp://SQL19N2:5022',
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
FAILOVER_MODE = EXTERNAL,
SEEDING_MODE = AUTOMATIC,
SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)
),
N'SQL19N3'
WITH(
ENDPOINT_URL = N'tcp://SQL19N3:5022',
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
FAILOVER_MODE = EXTERNAL,
SEEDING_MODE = AUTOMATIC,
SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)
);
GO
ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE;
GO
为Pacemaker创建sql登录并授权,每个实例
USE [master]
GO
CREATE LOGIN [pacemakerLogin] with PASSWORD= N'1qaz@WSX'
go
ALTER SERVER ROLE [sysadmin] ADD MEMBER [pacemakerLogin];
GO
将pacemaker的login信息保存到本地文件
echo "pacemakerLogin" >> /var/opt/mssql/secrets/passwd
echo "1qaz@WSX" >> /var/opt/mssql/secrets/passwd
只允许root读取
chown root:root /var/opt/mssql/secrets/passwd
chmod 400 /var/opt/mssql/secrets/passwd
将辅助副本加入到AG, 辅助副本执行
ALTER AVAILABILITY GROUP [ag1] JOIN WITH (CLUSTER_TYPE = EXTERNAL);
GO
auto_seeding功能需要的权限
ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE;
GO
如果您不希望pacemakerLogin具有sysadmin的权限,可以将之从sysadmin中移除,并授予如下权限。每个实例
ALTER SERVER ROLE [sysadmin] DROP MEMBER [pacemakerLogin]
GO
GRANT ALTER, CONTROL, VIEW DEFINITION ON AVAILABILITY GROUP::ag1 TO pacemakerLogin;
GO
GRANT VIEW SERVER STATE TO pacemakerLogin;
GO
添加数据库到AG,主副本执行
CREATE DATABASE [db1];
GO
ALTER DATABASE [db1] SET RECOVERY FULL;
GO
BACKUP DATABASE [db1]
TO DISK = N'nul';
GO
ALTER AVAILABILITY GROUP [ag1] ADD DATABASE [db1];
GO
可用性数据库状态
SELECT * FROM sys.databases WHERE name = 'db1';
GO
SELECT DB_NAME(database_id) AS 'database', synchronization_state_desc FROM sys.dm_hadr_database_replica_states;
在Pacemaker群集中配置AG
创建AG资源,ag_name要指定为之前创建AG名称
pcs resource create agcluster ocf:mssql:ag ag_name=ag1 meta failure-timeout=30s master notify=true
创建虚拟IP资源
禁用fencing
pcs property set stonith-enabled=false
创建VIP
pcs resource create virtualip ocf:heartbeat:IPaddr2 ip=172.17.2.7
创建 colacation constraint,vip和master必需在同一个节点上启动
pcs constraint colocation add virtualip agcluster-master INFINITY with-rsc-role=Master
创建 ordering constraint,vip要先于master副本资源启动
pcs constraint order promote agcluster-master then start virtualip
查看当前的约束
pcs constraint show --full
重新启用STONITH并查看群集状态
pcs property set stonith-enabled=true
pcs status
我的环境中的状态信息
Cluster name: agcluster
Stack: corosync
Current DC: SQL19N3 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Wed Apr 29 04:24:50 2020
Last change: Wed Apr 29 04:24:45 2020 by root via cibadmin on SQL19N1
3 nodes configured
5 resources configured
Online: [ SQL19N1 SQL19N2 SQL19N3 ]
Full list of resources:
rsc_st_azure (stonith:fence_azure_arm): Started SQL19N1
Master/Slave Set: agcluster-master [agcluster]
Masters: [ SQL19N1 ]
Slaves: [ SQL19N2 SQL19N3 ]
virtualip (ocf::heartbeat:IPaddr2): Started SQL19N1
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
测试Failover和Fencing
手动failover
pcs resource move agcluster-master SQL19N2 --master
pcs status
手动 failover会生成一个constraint,避免AG资源再回到原来的节点
如果希望AG后续还能 failover回来,需要手动删除之
pcs constraint show --full
pcs constraint remove cli-prefer-agcluster-master
尝试Fencing群集节点,每个节点都试一下
如下命令的fencing只是重启node,如果要安全关闭node,使用--off参数
pcs stonith fence SQL19N3 --debug
作者:Joe.TJ