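A replica set is one of MongoDB's core high-availability features. The primary's oplog is continuously replicated to the secondaries and replayed there, which keeps the primary and the secondaries consistent. Combined with a heartbeat mechanism, this means that when the primary is detected as unreachable or down, the remaining secondaries hold an election to choose a new primary from among themselves, giving automatic failover. The idea is broadly similar in principle to MySQL MHA. This article describes MongoDB replica sets, walks through creating one, and demonstrates automatic failover.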
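I. Replica Set Concepts
Replica set
Replication is the process of synchronizing data across multiple servers. A replica set is a group of mongod instances (processes) made up of one Primary node and several Secondary nodes
All writes from the MongoDB driver (client) go to the Primary; the Secondaries sync the written data from the Primary
This keeps the same data set on every member of the replica set and makes the data highly available
Purposes of replication
Failover (automatic switchover and recovery after a failure)
Redundancy (redundant copies of the data)
Avoiding a single point of failure; also useful for disaster recovery and reporting, and improves data availability
Read/write splitting, which spreads the read load
System maintenance and upgrades that are transparent to users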
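How a replica set works
The primary records every change in its oplog
Each secondary copies the primary's oplog and replays (re-applies) the entries locally
Members send each other heartbeats at regular intervals; once the primary goes down, an election for a new primary is triggered, and the remaining secondaries then sync from the new primary
If the secondaries cannot reach the primary for 10 s, an election is triggered
Failover usually completes within 1 minute: 10-30 s to detect that the primary has failed, then 10-30 s to finish the election and switch over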
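The oplog mechanism described above can be pictured with a toy model. What follows is a minimal sketch in plain JavaScript, not MongoDB's actual implementation: the entry shape `{ts, op, id, doc}` and the functions `applyOp` and `replay` are hypothetical names chosen for illustration, and the `"i"`/`"u"`/`"d"` codes are only loosely modeled on real oplog entries.

```javascript
// Toy model of oplog-based replication (illustrative only; real oplog
// entries and the replication protocol are more involved).

// Apply one logged operation to an in-memory "collection" (a Map keyed by _id).
function applyOp(collection, entry) {
  switch (entry.op) {
    case "i": // insert
    case "u": // update (in this sketch: replace the whole document)
      collection.set(entry.id, entry.doc);
      break;
    case "d": // delete
      collection.delete(entry.id);
      break;
    default:
      throw new Error("unknown op: " + entry.op);
  }
}

// Replay every oplog entry newer than the secondary's last applied timestamp.
// Returns the timestamp of the last entry applied.
function replay(collection, oplog, lastApplied) {
  let applied = lastApplied;
  for (const entry of oplog) {
    if (entry.ts > applied) {
      applyOp(collection, entry);
      applied = entry.ts;
    }
  }
  return applied;
}

// The primary has recorded three changes in its oplog.
const oplog = [
  { ts: 1, op: "i", id: 1, doc: { _id: 1, value: "abc" } },
  { ts: 2, op: "i", id: 2, doc: { _id: 2, value: "cde" } },
  { ts: 3, op: "d", id: 1 },
];

// A secondary that has already applied ts=1 catches up by replaying the rest.
const secondary = new Map([[1, { _id: 1, value: "abc" }]]);
const lastTs = replay(secondary, oplog, 1);
console.log(lastTs, [...secondary.keys()]);
```

After the replay, the secondary has reached timestamp 3 and holds only document 2, the same state the primary would be in.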
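Replication ≠ backup
A backup is used to restore data, prevent data loss, and recover from a disaster
Replication does not protect against data deleted by human error or corrupted by an application bug, because such changes are replicated to the secondaries as well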
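Primary
The main replica node. It is chosen by election, serves both reads and writes, and generates the oplog
Secondary
A standby (secondary) replica node. Secondaries can serve reads, so adding Secondaries increases the read capacity of the replica set
On failure, a secondary can be promoted to primary according to its configured priority, which improves the availability of the replica set
Arbiter
An Arbiter only votes in elections; it cannot be elected Primary and does not sync data from the Primary
An Arbiter stores no data itself and is a very lightweight service.
When a replica set has an even number of members, it is best to add an Arbiter to improve availability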
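The advice about even member counts follows from simple majority arithmetic. The sketch below assumes that every member has one vote; `majority` and `faultTolerance` are hypothetical helper names, not part of any MongoDB API.

```javascript
// Votes needed to elect a primary: a strict majority of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be lost while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(
    `members=${n} majority=${majority(n)} tolerated failures=${faultTolerance(n)}`
  );
}
```

With two data-bearing members the set tolerates no failures; adding an Arbiter makes three voters and the set can then lose one. Going from three to four voters adds no extra tolerance, which is why odd counts are preferred.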
[Figure: replica set diagram]
II. Creating a Replica Set
# cat /etc/redhat-release
CentOS release 6.7 (Final)
# mongod --version
db version v3.0.12
git version: 33934938e0e95d534cebbaff656cde916b9c3573
Create a data directory for each instance
# mkdir -pv /data/{n1,n2,n3}
# mongod --replSet repSetTest --dbpath /data/n1 --logpath /data/n1/n1.log \
> --port 27000 --smallfiles --oplogSize 128 --fork
# mongod --replSet repSetTest --dbpath /data/n2 --logpath /data/n2/n2.log \
> --port 27001 --smallfiles --oplogSize 128 --fork
# mongod --replSet repSetTest --dbpath /data/n3 --logpath /data/n3/n3.log \
> --port 27002 --smallfiles --oplogSize 128 --fork
Check the listening ports
# netstat -nltp|grep mongod
tcp 0 0 0.0.0.0:27000 0.0.0.0:* LISTEN 5765/mongod
tcp 0 0 0.0.0.0:27001 0.0.0.0:* LISTEN 5781/mongod
tcp 0 0 0.0.0.0:27002 0.0.0.0:* LISTEN 5810/mongod
Connect to the first instance
# mongo localhost:27000
MongoDB shell version: 3.0.12
connecting to: localhost:27000/test
> db.person.insert({name:'Fred', age:35}) // reports that the current node is not the master
WriteResult({ "writeError" : { "code" : undefined, "errmsg" : "not master" } })
>
// Next, define the replica set configuration object
> cfg = {
... '_id':'repSetTest',
... 'members':[
... {'_id':0, 'host': 'localhost:27000'},
... {'_id':1, 'host': 'localhost:27001'},
... {'_id':2, 'host': 'localhost:27002'}
... ]
... }
{
"_id" : "repSetTest",
"members" : [
{
"_id" : 0,
"host" : "localhost:27000"
},
{
"_id" : 1,
"host" : "localhost:27001"
},
{
"_id" : 2,
"host" : "localhost:27002"
}
]
}
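Before handing a configuration object like this to the shell, it can be worth checking it for simple mistakes. The following is a sketch only: `validateConfig` is a hypothetical helper written in plain JavaScript, and it covers just three constraints (the config `_id` must match the `--replSet` name, and member `_id`s and hosts must be unique). The server performs its own, more thorough validation.

```javascript
// Return a list of problems found in a replica set config object.
// An empty list means none of the checked problems were found.
function validateConfig(cfg, replSetName) {
  const problems = [];
  if (cfg._id !== replSetName) {
    problems.push(
      `config _id "${cfg._id}" does not match --replSet name "${replSetName}"`
    );
  }
  const members = Array.isArray(cfg.members) ? cfg.members : [];
  if (members.length === 0) problems.push("no members listed");
  const ids = new Set();
  const hosts = new Set();
  for (const m of members) {
    if (ids.has(m._id)) problems.push(`duplicate member _id ${m._id}`);
    if (hosts.has(m.host)) problems.push(`duplicate host ${m.host}`);
    ids.add(m._id);
    hosts.add(m.host);
  }
  return problems;
}

// The configuration used in this article.
const sampleCfg = {
  _id: "repSetTest",
  members: [
    { _id: 0, host: "localhost:27000" },
    { _id: 1, host: "localhost:27001" },
    { _id: 2, host: "localhost:27002" },
  ],
};
console.log(validateConfig(sampleCfg, "repSetTest"));
```

For the configuration above the returned list is empty.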
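// A replica set is initialized with the replSetInitiate command (or rs.initiate() in the mongo shell)
// After initialization the members start exchanging heartbeats and hold an election for the Primary
// The node that wins the votes of a majority of members becomes Primary; the others become Secondaries.
// An odd number of members is generally recommended, so that a Primary can still be elected when some members fail.
// If a failure leaves the set unable to elect a Primary, it cannot accept writes and is effectively read-only
> rs.initiate(cfg) // initialize the replica set with this configuration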
{ "ok" : 1 }
// Check the status: 27000 is the primary and the other two ports are secondaries
repSetTest:OTHER> rs.status()
{
"set" : "repSetTest",
"date" : ISODate("2016-08-30T05:41:15.302Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "localhost:27000",
"health" : 1, // health status: OK
"state" : 1,
"stateStr" : "PRIMARY", // currently the primary
"uptime" : 118,
"optime" : Timestamp(1472535666, 1),
"optimeDate" : ISODate("2016-08-30T05:41:06Z"),
"electionTime" : Timestamp(1472535670, 1),
"electionDate" : ISODate("2016-08-30T05:41:10Z"),
"configVersion" : 1,
"self" : true
},
{
"_id" : 1,
"name" : "localhost:27001",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 9,
"optime" : Timestamp(1472535666, 1),
"optimeDate" : ISODate("2016-08-30T05:41:06Z"),
"lastHeartbeat" : ISODate("2016-08-30T05:41:14.030Z"),
"lastHeartbeatRecv" : ISODate("2016-08-30T05:41:14.048Z"),
"pingMs" : 0,
"configVersion" : 1
},
{
"_id" : 2,
"name" : "localhost:27002",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 9,
"optime" : Timestamp(1472535666, 1),
"optimeDate" : ISODate("2016-08-30T05:41:06Z"),
"lastHeartbeat" : ISODate("2016-08-30T05:41:14.030Z"),
"lastHeartbeatRecv" : ISODate("2016-08-30T05:41:14.057Z"),
"pingMs" : 0,
"configVersion" : 1
}
],
"ok" : 1
}
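A status document of this size is easier to read once reduced to one line per member. The sketch below works on a plain JavaScript object shaped like the output above (trimmed to the fields it uses); `summarize` and `findPrimary` are hypothetical helper names, not shell built-ins.

```javascript
// One summary line per member: name, state string, health flag.
function summarize(status) {
  return status.members.map(
    (m) => `${m.name} ${m.stateStr} health=${m.health}`
  );
}

// Name of the member whose state is 1 (PRIMARY), or null if there is none.
function findPrimary(status) {
  const primary = status.members.find((m) => m.state === 1);
  return primary ? primary.name : null;
}

// Trimmed copy of the rs.status() output shown above.
const status = {
  set: "repSetTest",
  members: [
    { _id: 0, name: "localhost:27000", health: 1, state: 1, stateStr: "PRIMARY" },
    { _id: 1, name: "localhost:27001", health: 1, state: 2, stateStr: "SECONDARY" },
    { _id: 2, name: "localhost:27002", health: 1, state: 2, stateStr: "SECONDARY" },
  ],
};

summarize(status).forEach((line) => console.log(line));
console.log("primary:", findPrimary(status));
```

In the mongo shell the same functions could be given the live result of `rs.status()` instead of the hard-coded sample.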
// Use db.isMaster() to find out which node is the master
repSetTest:PRIMARY> db.isMaster()
{
"setName" : "repSetTest",
"setVersion" : 1,
"ismaster" : true,
"secondary" : false,
"hosts" : [
"localhost:27000",
"localhost:27001",
"localhost:27002"
],
"primary" : "localhost:27000",
"me" : "localhost:27000",
"electionId" : ObjectId("57c51c76d5963b4abbd1d72f"),
"maxBsonObjectSize" : 16777216,
"maxMessageSizeBytes" : 48000000,
"maxWriteBatchSize" : 1000,
"localTime" : ISODate("2016-08-30T05:42:12.328Z"),
"maxWireVersion" : 3,
"minWireVersion" : 0,
"ok" : 1
}
// Connect to the primary or a secondary
# mongo localhost:27000
# mongo localhost:27001
# mongo localhost:27002
// Insert a document on the primary
repSetTest:PRIMARY> db.replTest.insert({_id:1, value:'abc'})
WriteResult({ "nInserted" : 1 })
repSetTest:PRIMARY> db.replTest.findOne()
{ "_id" : 1, "value" : "abc" }
// Connect to a secondary and query; the query is rejected with "not master and slaveOk=false"
# mongo localhost:27001
MongoDB shell version: 3.0.12
connecting to: localhost:27001/test
repSetTest:SECONDARY> db.replTest.find()
Error: error: { "$err" : "not master and slaveOk=false", "code" : 13435 }
// Allow queries on the secondary
repSetTest:SECONDARY> rs.slaveOk(true)
repSetTest:SECONDARY> db.replTest.find()
{ "_id" : 1, "value" : "abc" }
// Secondaries do not accept create/update/delete (CUD) operations
repSetTest:SECONDARY> db.replTest.insert({_id:2,value:"cde"})
WriteResult({ "writeError" : { "code" : undefined, "errmsg" : "not master" } })
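The behaviour seen in this session can be condensed into one small rule. The function below is only a summary of what the transcript shows, written as plain JavaScript; `accepts` is a hypothetical name and the real server logic has more cases.

```javascript
// Would a member in the given state accept the operation?
// kind is "read" or "write"; slaveOk mirrors whether rs.slaveOk() was called.
function accepts(stateStr, kind, slaveOk) {
  if (stateStr === "PRIMARY") return true; // primary serves reads and writes
  if (stateStr === "SECONDARY") return kind === "read" && slaveOk;
  return false; // any other state (for example an arbiter) serves neither
}

console.log(accepts("PRIMARY", "write", false));   // insert on 27000
console.log(accepts("SECONDARY", "read", false));  // find() before rs.slaveOk()
console.log(accepts("SECONDARY", "read", true));   // find() after rs.slaveOk()
console.log(accepts("SECONDARY", "write", true));  // insert on 27001
```

The four calls correspond, in order, to the successful insert on the primary, the rejected query, the query after `rs.slaveOk(true)`, and the rejected insert on the secondary.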
III. Automatic Failover of the Replica Set
# netstat -nltp|grep 27000
tcp 0 0 0.0.0.0:27000 0.0.0.0:* LISTEN 13555/mongod
# kill -9 13555
# mongo localhost:27000
connecting to: localhost:27000/test
2016-08-30T13:44:55.671+0800 W NETWORK Failed to connect to 127.0.0.1:27000,
reason: errno:111 Connection refused
2016-08-30T13:44:55.672+0800 E QUERY Error: couldn't connect to server localhost:27000 (127.0.0.1),
connection attempt failed
at connect (src/mongo/shell/mongo.js:181:14)
at (connect):1:6 at src/mongo/shell/mongo.js:181
exception: connect failed
// Connect to port 27001. As the output below shows, 27000 is unreachable and 27001 has been promoted to PRIMARY
# mongo localhost:27001
MongoDB shell version: 3.0.12
connecting to: localhost:27001/test
repSetTest:PRIMARY> rs.status()
{
"set" : "repSetTest",
"date" : ISODate("2016-08-30T05:45:39.018Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "localhost:27000",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)", // 27000 is now reported as unreachable
"uptime" : 0,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2016-08-30T05:45:38.378Z"),
"lastHeartbeatRecv" : ISODate("2016-08-30T05:44:48.263Z"),
"pingMs" : 0,
"lastHeartbeatMessage" : "Failed attempt to connect to localhost:27000;
couldn't connect to server localhost:27000 (127.0.0.1), connection attempt failed",
"configVersion" : -1
},
{
"_id" : 1,
"name" : "localhost:27001", // Author : Leshami
"health" : 1, // Blog : http://blog.csdn.net/leshami
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 372,
"optime" : Timestamp(1472535845, 2),
"optimeDate" : ISODate("2016-08-30T05:44:05Z"),
"electionTime" : Timestamp(1472535890, 1),
"electionDate" : ISODate("2016-08-30T05:44:50Z"),
"configVersion" : 1,
"self" : true
},
{
"_id" : 2,
"name" : "localhost:27002",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 272,
"optime" : Timestamp(1472535845, 2),
"optimeDate" : ISODate("2016-08-30T05:44:05Z"),
"lastHeartbeat" : ISODate("2016-08-30T05:45:38.356Z"),
"lastHeartbeatRecv" : ISODate("2016-08-30T05:45:38.356Z"),
"pingMs" : 0,
"configVersion" : 1
}
],
"ok" : 1
}
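The timestamps in this output give a rough idea of how long the failover took in this test. The sketch below simply subtracts two values copied from the status document: the `lastHeartbeatRecv` of localhost:27000 and the `electionDate` of localhost:27001. `elapsedSeconds` is a hypothetical helper, and the result is only an approximation because `electionDate` is reported to whole seconds.

```javascript
// Seconds between two ISO-8601 timestamps.
function elapsedSeconds(fromIso, toIso) {
  return (new Date(toIso) - new Date(fromIso)) / 1000;
}

// Last heartbeat received from the killed primary (localhost:27000).
const lastHeartbeatRecv = "2016-08-30T05:44:48.263Z";
// Time at which localhost:27001 was elected primary.
const electionDate = "2016-08-30T05:44:50Z";

const seconds = elapsedSeconds(lastHeartbeatRecv, electionDate);
console.log(`new primary elected about ${seconds.toFixed(1)} s after the last heartbeat`);
```

On this single-host test setup the switch took roughly two seconds, well under the one-minute figure quoted earlier.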
// Restart the 27000 instance
# mongod --replSet repSetTest --dbpath /data/n1 --logpath /data/n1/n1.log --port \
> 27000 --smallfiles --oplogSize 128 --fork
about to fork child process, waiting until server is ready for connections.
forked process: 16473
child process started successfully, parent exiting
// Check the replica set status again; 27000 is now a secondary
repSetTest:PRIMARY> rs.status()
{
"set" : "repSetTest",
"date" : ISODate("2016-08-30T05:47:25.220Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "localhost:27000",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY", // this node has now become a secondary
"uptime" : 12,
"optime" : Timestamp(1472535845, 2),
"optimeDate" : ISODate("2016-08-30T05:44:05Z"),
"lastHeartbeat" : ISODate("2016-08-30T05:47:24.819Z"),
"lastHeartbeatRecv" : ISODate("2016-08-30T05:47:25.061Z"),
"pingMs" : 0,
"configVersion" : 1
},
{
"_id" : 1,
"name" : "localhost:27001",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 478,
"optime" : Timestamp(1472535845, 2),
"optimeDate" : ISODate("2016-08-30T05:44:05Z"),
"electionTime" : Timestamp(1472535890, 1),
"electionDate" : ISODate("2016-08-30T05:44:50Z"),
"configVersion" : 1,
"self" : true
},
{
"_id" : 2,
"name" : "localhost:27002",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 379,
"optime" : Timestamp(1472535845, 2),
"optimeDate" : ISODate("2016-08-30T05:44:05Z"),
"lastHeartbeat" : ISODate("2016-08-30T05:47:24.816Z"),
"lastHeartbeatRecv" : ISODate("2016-08-30T05:47:24.816Z"),
"pingMs" : 0,
"configVersion" : 1
}
],
"ok" : 1
}
IV. Getting Help for Replica Sets
repSetTest:PRIMARY> rs.help() // list the replica set helper commands
rs.status() { replSetGetStatus : 1 } checks repl set status
rs.initiate() { replSetInitiate : null } initiates set with default settings
rs.initiate(cfg) { replSetInitiate : cfg } initiates set with configuration cfg
rs.conf() get the current configuration object from local.system.replset
rs.reconfig(cfg) updates the configuration of a running replica set with cfg (disconnects)
rs.add(hostportstr) add a new member to the set with default attributes (disconnects)
rs.add(membercfgobj) add a new member to the set with extra attributes (disconnects)
rs.addArb(hostportstr) add a new member which is arbiterOnly:true (disconnects)
rs.stepDown([stepdownSecs, catchUpSecs]) step down as primary (disconnects)
rs.syncFrom(hostportstr) make a secondary sync from the given member
rs.freeze(secs) make a node ineligible to become primary for the time specified
rs.remove(hostportstr) remove a host from the replica set (disconnects)
rs.slaveOk() allow queries on secondary nodes
rs.printReplicationInfo() check oplog size and time range
rs.printSlaveReplicationInfo() check replica set members and replication lag
db.isMaster() check who is primary
reconfiguration helpers disconnect from the database so the shell will display
an error, even if the command succeeds.
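Replication lag, which `rs.printSlaveReplicationInfo()` is listed above as checking, can also be estimated by hand from the `optimeDate` fields of a status document. The sketch below is an assumption-laden illustration in plain JavaScript: `replicationLag` is a hypothetical helper, the sample object is trimmed from the last `rs.status()` output in this article, and `new Date(...)` stands in for the shell's `ISODate(...)`.

```javascript
// Seconds each secondary's last applied operation is behind the primary's.
// Returns null when the status document contains no primary.
function replicationLag(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return null;
  const lag = {};
  for (const m of status.members) {
    if (m.stateStr === "SECONDARY") {
      lag[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
    }
  }
  return lag;
}

// Trimmed from the status output taken after localhost:27000 was restarted.
const statusAfterRestart = {
  members: [
    { name: "localhost:27000", stateStr: "SECONDARY", optimeDate: new Date("2016-08-30T05:44:05Z") },
    { name: "localhost:27001", stateStr: "PRIMARY", optimeDate: new Date("2016-08-30T05:44:05Z") },
    { name: "localhost:27002", stateStr: "SECONDARY", optimeDate: new Date("2016-08-30T05:44:05Z") },
  ],
};

console.log(replicationLag(statusAfterRestart));
```

All three members report the same `optimeDate`, so both secondaries show a lag of 0 seconds: the restarted node had nothing to catch up on.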