In-place update in WiredTiger


The release notes for MongoDB 3.5.12 announce a great new feature:

Faster In-place Updates in WiredTiger

This work brings improvements to in-place update workloads for users running the WiredTiger engine, especially for updates to large documents. Some workloads may see a reduction of up to 7x in disk utilization (from 24 MB/s to 3 MB/s) as well as a 20% improvement in throughput.

I assumed WiredTiger had implemented the delta-page feature introduced in the Bw-tree paper, that is, writing pages that are deltas from previously written pages. But after reading the source code, I found it is a totally different idea: in-place update only affects the in-memory and journal formats, and the on-disk layout of the data is unchanged.

I will explain the core of the in-place update implementation.

MongoDB introduced mutable BSON to describe a document update as an incremental (delta) update.

Mutable BSON provides classes to facilitate the manipulation of existing BSON objects or the construction of new BSON objects from scratch in an incremental fashion.

Suppose you have a very large document, say 1 MB:

{
   _id: ObjectId("59097118be4a61d87415cd15"),
   name: "ZhangYoudong",
   birthday: "xxxx",
   fightvalue: 100,
   xxx: .... // many other fields
}

If fightvalue changes from 100 to 101, a DamageEvent can describe the update. It records only the offset and size of the change; the new content is kept in a separate buffer.

struct DamageEvent {
    typedef uint32_t OffsetSizeType;
    // Offset of source data (in some buffer held elsewhere).
    OffsetSizeType sourceOffset;

    // Offset of target data (in some buffer held elsewhere).
    OffsetSizeType targetOffset;

    // Size of the damage region.
    size_t size;
};

So if a document has many small changes, you get an array of DamageEvents (a DamageVector). MongoDB added a new storage interface that accepts a DamageVector:

bool WiredTigerRecordStore::updateWithDamagesSupported() const {
    return true;
}

StatusWith<RecordData> WiredTigerRecordStore::updateWithDamages(
    OperationContext* opCtx,
    const RecordId& id,
    const RecordData& oldRec,
    const char* damageSource,
    const mutablebson::DamageVector& damages) {
    // ... (body omitted here)
}
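
To make the idea concrete, here is a minimal sketch of how a list of damage events could be applied to a record buffer. It is not MongoDB's implementation: `DamageEventSketch` and `applyDamages` are hypothetical names that only mirror the fields of the DamageEvent struct, and the sketch assumes each event overwrites a region without changing the record's length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in that mirrors the fields of DamageEvent.
struct DamageEventSketch {
    std::uint32_t sourceOffset;  // where the new bytes start in damageSource
    std::uint32_t targetOffset;  // where they land in the stored record
    std::size_t size;            // number of bytes to overwrite
};

// Overwrites the damaged regions of `record` in place.
// Returns false, leaving `record` untouched, if any event is out of range.
inline bool applyDamages(std::string& record,
                         const std::string& damageSource,
                         const std::vector<DamageEventSketch>& damages) {
    // Validate every event first so a bad one cannot leave a half-applied record.
    for (const auto& d : damages) {
        if (d.sourceOffset > damageSource.size() ||
            d.size > damageSource.size() - d.sourceOffset) {
            return false;
        }
        if (d.targetOffset > record.size() ||
            d.size > record.size() - d.targetOffset) {
            return false;
        }
    }
    for (const auto& d : damages) {
        assert(d.targetOffset + d.size <= record.size());
        std::memcpy(&record[d.targetOffset],
                    damageSource.data() + d.sourceOffset,
                    d.size);
    }
    return true;
}
```

For a record whose bytes are "fightvalue=100;name=Zhang" and a damage source "101", a single event {sourceOffset 0, targetOffset 11, size 3} rewrites only the three bytes of the value; the rest of the record is never copied.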

WiredTiger added a new update type, WT_UPDATE_MODIFIED, to support this. When a WT_UPDATE_MODIFIED update happens, WiredTiger first writes a change list (converted from the DamageVector) to the journal, then keeps the change list in memory, associated with the original record.

When the record is read, WiredTiger first reads the original record, then applies every operation in the change list, and returns the final record to the client.
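
The read path described here can be sketched roughly as follows. This is an illustration under stated assumptions, not WiredTiger's code: `ModifyEntry`, `RecordChain`, and `readRecord` are hypothetical names, and the change list is assumed to be stored oldest first.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical change-list entry: replace `size` bytes at `offset` with `data`.
struct ModifyEntry {
    std::string data;
    std::size_t offset;
    std::size_t size;
};

// Hypothetical in-memory state for one record: the original value plus
// the pending changes, oldest first.
struct RecordChain {
    std::string base;
    std::vector<ModifyEntry> changes;
};

// Rebuilds the current value by replaying every change on a copy of the base.
// The stored base itself is left untouched.
inline std::string readRecord(const RecordChain& rec) {
    std::string value = rec.base;
    for (const auto& m : rec.changes) {
        assert(m.offset <= value.size());
        value.replace(m.offset, m.size, m.data);
    }
    return value;
}
```

The trade-off in this sketch is that reads do extra work, replaying the changes each time, in exchange for writes that only record the small delta.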

So the core of in-place update is:

  1. WiredTiger supports delta updates in memory and in the journal, so the I/O for writing the journal is greatly reduced for large documents.
  2. WiredTiger's on-disk data layout is unchanged, so the I/O for writing data is unchanged.