When MongoDB users run into performance problems, they often look at the serverStatus.globalLock metrics, but the meaning of those metrics is not obvious. This article explains the globalLock metrics in depth.
PRIMARY> db.serverStatus().globalLock
{
    "totalTime" : NumberLong("7069085891000"),
    "currentQueue" : {
        "total" : 0,
        "readers" : 0,
        "writers" : 0
    },
    "activeClients" : {
        "total" : 23,
        "readers" : 0,
        "writers" : 0
    }
}
Start with the official documentation's explanation of globalLock (whenever you hit a problem with MongoDB, consult the official documentation first). If the analysis in the middle of this article is hard to follow, feel free to skip straight to the summary at the end.
globalLock
A document that reports on the database’s lock state.
Generally, the locks document provides more detailed data on lock uses.
globalLock.totalTime
The time, in microseconds, since the database last started and created the globalLock. This is roughly equivalent to total server uptime.
globalLock.currentQueue
A document that provides information concerning the number of operations queued because of a lock.
globalLock.currentQueue.total
The total number of operations queued waiting for the lock (i.e., the sum of globalLock.currentQueue.readers and globalLock.currentQueue.writers).
A consistently small queue, particularly of shorter operations, should cause no concern. The globalLock.activeClients readers and writers information provides context for this data.
globalLock.currentQueue.readers
The number of operations that are currently queued and waiting for the read lock. A consistently small read-queue, particularly of shorter operations, should cause no concern.
globalLock.currentQueue.writers
The number of operations that are currently queued and waiting for the write lock. A consistently small write-queue, particularly of shorter operations, is no cause for concern.
globalLock.activeClients
A document that provides information about the number of connected clients and the read and write operations performed by these clients.
Use this data to provide context for the globalLock.currentQueue data.
globalLock.activeClients.total
The total number of active client connections to the database (i.e., the sum of globalLock.activeClients.readers and globalLock.activeClients.writers).
globalLock.activeClients.readers
The number of the active client connections performing read operations.
globalLock.activeClients.writers
The number of active client connections performing write operations.
Client lock states
enum ClientState {  // Enumerates a Client's current lock state
    kInactive,
    kActiveReader,
    kActiveWriter,
    kQueuedReader,
    kQueuedWriter
};
Each connection to mongod corresponds to a Client object, which carries the client's current lock state. It starts out as kInactive and transitions to the other states depending on the request and the level of concurrency; the core logic is implemented in lockGlobalBegin.
template <bool IsForMMAPV1>
LockResult LockerImpl<IsForMMAPV1>::lockGlobalBegin(LockMode mode) {
    dassert(isLocked() == (_modeForTicket != MODE_NONE));
    if (_modeForTicket == MODE_NONE) {
        const bool reader = isSharedLockMode(mode);
        auto holder = ticketHolders[mode];
        if (holder) {
            // Mark the client as queued while it waits for an engine ticket
            _clientState.store(reader ? kQueuedReader : kQueuedWriter);
            holder->waitForTicket();
        }
        // Ticket acquired (or no ticket limit): the client is now active
        _clientState.store(reader ? kActiveReader : kActiveWriter);
        _modeForTicket = mode;
    }
    const LockResult result = lockBegin(resourceIdGlobal, mode);
    if (result == LOCK_OK)
        return LOCK_OK;
    // Currently, deadlock detection does not happen inline with lock acquisition so the only
    // unsuccessful result that the lock manager would return is LOCK_WAITING.
    invariant(result == LOCK_WAITING);
    return result;
}
serverStatus.globalLock is exported directly from this per-Client lock state.
2018-03-13 update
- When the Client state is sampled, a reader/writer that has already acquired a ticket but is still waiting on a lock is also counted as Queued; this was overlooked in the original version of this article.
template <bool IsForMMAPV1>
Locker::ClientState LockerImpl<IsForMMAPV1>::getClientState() const {
    auto state = _clientState.load();
    // A Client that holds a ticket but is still waiting on a lower-level
    // lock is reported as queued rather than active
    if (state == kActiveReader && hasLockPending())
        state = kQueuedReader;
    if (state == kActiveWriter && hasLockPending())
        state = kQueuedWriter;
    return state;
}
The export logic then maps the per-state Client counts onto the metrics:
// totalTime: uptime in milliseconds, multiplied by 1000 to report microseconds
ret.append("totalTime", (long long)(1000 * (curTimeMillis64() - _started)));
{
    // currentQueue counts Clients in the two queued states
    BSONObjBuilder currentQueueBuilder(ret.subobjStart("currentQueue"));
    currentQueueBuilder.append("total",
                               clientStatusCounts[Locker::kQueuedReader] +
                                   clientStatusCounts[Locker::kQueuedWriter]);
    currentQueueBuilder.append("readers", clientStatusCounts[Locker::kQueuedReader]);
    currentQueueBuilder.append("writers", clientStatusCounts[Locker::kQueuedWriter]);
    currentQueueBuilder.done();
}
{
    // activeClients.total counts Clients in every state (including kInactive),
    // which is why it can exceed readers + writers
    BSONObjBuilder activeClientsBuilder(ret.subobjStart("activeClients"));
    activeClientsBuilder.append("total", clientStatusCounts.sum());
    activeClientsBuilder.append("readers", clientStatusCounts[Locker::kActiveReader]);
    activeClientsBuilder.append("writers", clientStatusCounts[Locker::kActiveWriter]);
    activeClientsBuilder.done();
}
To summarize:
- globalLock.totalTime = time elapsed since the process started
- globalLock.currentQueue.total = the sum of the two fields below
- globalLock.currentQueue.readers = number of Clients in the kQueuedReader state
- globalLock.currentQueue.writers = number of Clients in the kQueuedWriter state
- globalLock.activeClients.total = the sum of the two fields below, plus some internal Clients (e.g. replication threads)
- globalLock.activeClients.readers = number of Clients in the kActiveReader state
- globalLock.activeClients.writers = number of Clients in the kActiveWriter state
globalLock state transitions in detail
To make the rest easier to follow, here is a quick primer on MongoDB's hierarchical locking model.
Lock modes
/**
 * Lock modes.
 *
 * Compatibility Matrix
 *                                    Granted mode
 *   ---------------.------------------------------------------------.
 *   Requested Mode | MODE_NONE  MODE_IS  MODE_IX  MODE_S  MODE_X    |
 *     MODE_IS      |     +         +        +        +       -      |
 *     MODE_IX      |     +         +        +        -       -      |
 *     MODE_S       |     +         +        -        +       -      |
 *     MODE_X       |     +         -        -        -       -      |
 */
When MongoDB takes a lock, there are four modes: MODE_IS, MODE_IX, MODE_S and MODE_X. MODE_S and MODE_X are easy to understand: the shared (read) lock and the exclusive (write) lock. MODE_IS and MODE_IX, the intent-read and intent-write locks, exist to support the hierarchical locking model. The conflicts between modes are shown in the matrix above; the sketch below shows one way to encode it.
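To make the matrix concrete, here is a minimal, self-contained C++ sketch encoding it as one conflict bitmask per requested mode (MongoDB's lock manager uses a conflict table in a similar spirit, but everything below is illustrative, not MongoDB source):

#include <cstdio>

// Illustrative lock modes, mirroring the compatibility matrix above.
enum LockMode { MODE_NONE, MODE_IS, MODE_IX, MODE_S, MODE_X, LockModesCount };

// conflictTable[requested] has a bit set for every granted mode it conflicts with.
static const int conflictTable[LockModesCount] = {
    0,                                                                // MODE_NONE
    (1 << MODE_X),                                                    // MODE_IS
    (1 << MODE_S) | (1 << MODE_X),                                    // MODE_IX
    (1 << MODE_IX) | (1 << MODE_X),                                   // MODE_S
    (1 << MODE_IS) | (1 << MODE_IX) | (1 << MODE_S) | (1 << MODE_X),  // MODE_X
};

// True if `requested` can be granted while `granted` is already held.
bool compatible(LockMode requested, LockMode granted) {
    return (conflictTable[requested] & (1 << granted)) == 0;
}

int main() {
    // IX vs IS: compatible, so reads and writes flow down to the engine together.
    std::printf("IX vs IS: %s\n", compatible(MODE_IX, MODE_IS) ? "compatible" : "conflict");
    // X vs IS: conflict, which is why a foreground index build blocks reads.
    std::printf("X  vs IS: %s\n", compatible(MODE_X, MODE_IS) ? "compatible" : "conflict");
    return 0;
}

The two checks in main() correspond to the two concurrency examples worked through below.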
MongoDB manages locks hierarchically, going globalLock ==> DBLock ==> CollectionLock ... For example, we all know that MongoDB with WiredTiger has document-level locking, so when reads and writes run concurrently, lock acquisition looks like this:
Write operation
1. globalLock MODE_IX (this level only cares whether the operation is a read or a write, not which specific lock it will take)
2. DBLock MODE_IX
3. Collection MODE_IX
4. pass request to wiredtiger
Read operation
1. globalLock MODE_IS (this level only cares whether the operation is a read or a write, not which specific lock it will take)
2. DBLock MODE_IS
3. Collection MODE_IS
4. pass request to wiredtiger
According to the compatibility matrix above, IS and IX do not conflict, so concurrent read and write requests can be handed to the wiredtiger engine without any contention.
Another example: suppose a foreground index build runs concurrently with a read.
Foreground index build
1. globalLock MODE_IX (this level only cares whether the operation is a read or a write, not which specific lock it will take)
2. DBLock MODE_X
3. pass to wiredtiger
Read operation
1. globalLock MODE_IS (this level only cares whether the operation is a read or a write, not which specific lock it will take)
2. DBLock MODE_IS
3. Collection MODE_IS
4. pass request to wiredtiger
According to the compatibility matrix, MODE_X conflicts with MODE_IS, which is exactly why reads are blocked while a foreground index build is in progress.
The globalLock discussed in this article corresponds to step 1 above. At the globalLock level, the only thing that matters is whether the operation is a read or a write, not whether the lock is exclusive or intent, so there is no contention at the globalLock level itself. What, then, do the globalLock metrics actually mean?
As the code above shows, lockGlobalBegin (which essentially every database read and write goes through) drives the globalLock state transitions. The core logic is:
template <bool IsForMMAPV1>
LockResult LockerImpl<IsForMMAPV1>::lockGlobalBegin(LockMode mode) {
    const bool reader = isSharedLockMode(mode);
    auto holder = ticketHolders[mode];
    if (holder) {
        _clientState.store(reader ? kQueuedReader : kQueuedWriter);
        holder->waitForTicket();
    }
    _clientState.store(reader ? kActiveReader : kActiveWriter);
    ....
    const LockResult result = lockBegin(resourceIdGlobal, mode);
    if (result == LOCK_OK)
        return LOCK_OK;
    ...
}
In the code above, if holder is non-null, the Client first enters the kQueuedReader or kQueuedWriter state and waits for a ticket; once it acquires one, it transitions to kActiveReader or kActiveWriter. So what is this ticket?
The ticket is a limit that the storage engine can configure. Normally, if there is no lock contention, every read and write request is passed straight down to the engine. The problem is that requests still have to queue up and execute inside the engine, and different engines have different processing capacity, so the engine can set a ticket count to cap the maximum number of concurrent operations passed down to it (a minimal sketch of the idea follows the list). For example:
- wiredtiger sets both the read and the write tickets to 128, i.e. the WiredTiger engine handles at most 128 concurrent reads and 128 concurrent writes (testing has shown this to be a very reasonable empirical value that does not need tuning).
- the mmapv1 engine sets no ticket limit, so with mmapv1, globalLock.currentQueue always stays at 0.
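To illustrate how a ticket limit produces exactly this queued-then-active behavior, here is a minimal C++20 sketch (only the idea, not MongoDB's actual TicketHolder; kNumTickets and runOperation are invented for the example). A counting semaphore admits at most kNumTickets operations into the "engine" at once; everyone else waits, which is what globalLock.currentQueue counts:

#include <chrono>
#include <cstdio>
#include <semaphore>
#include <thread>
#include <vector>

constexpr int kNumTickets = 4;  // WiredTiger's real default is 128
std::counting_semaphore<kNumTickets> tickets(kNumTickets);

void runOperation(int id) {
    tickets.acquire();  // kQueuedReader/kQueuedWriter: waiting for a ticket
    // kActiveReader/kActiveWriter: at most kNumTickets threads are in here
    std::printf("op %d inside the engine\n", id);
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    tickets.release();  // hand the ticket back when the operation completes
}

int main() {
    std::vector<std::thread> ops;
    for (int i = 0; i < 16; ++i)
        ops.emplace_back(runOperation, i);
    for (auto& t : ops)
        t.join();
    return 0;
}

With 16 concurrent operations and 4 tickets, up to 12 operations wait their turn at any moment, which is the situation globalLock.currentQueue.readers/writers reports.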
Once globalLock is done, the client is in either the kActiveReader or the kActiveWriter state, which is exactly what the globalLock.activeClients fields report. Only then does lockBegin run to take the DB, Collection and other hierarchical locks, so contention at those lower levels indirectly shows up in globalLock.
Summary
serverStatus.globalLock, or mongostat (the qr|qw and ar|aw columns), shows the state of each of mongod's globalLock metrics.
- WiredTiger caps the read and write concurrency passed down to the engine at 128 each (a reasonable empirical value that usually needs no tuning); once that threshold is exceeded, the queued requests show up in globalLock.currentQueue.readers/writers.
- If globalLock.currentQueue.readers/writers stay non-zero for a long time (in which case globalLock.activeClients.readers/writers will be persistently at or near 128), your workload's concurrency is too high, or some request is holding an exclusive lock for a long time (such as a foreground index build). You can address this by reducing the time each request takes (e.g. adding indexes to avoid COLLSCAN or in-memory SORT) or by upgrading the backend resources (memory, disk IO capacity, CPU).
- If globalLock.activeClients.readers/writers stay non-zero but below 128 (with currentQueue empty) and request handling already feels slow, the optimizations in the previous point also apply.