MongoDB 3.0 WiredTiger Compression and Performance

本文涉及的产品
云数据库 MongoDB,独享型 2核8GB
推荐场景:
构建全方位客户视图
简介:

MongoDB3.0中的压缩选项

在MongoDB 3.0中,WiredTiger为集合提供三个压缩选项:

  1. 无压缩
  2. Snappy(默认启用) – 很不错的压缩,有效利用资源
  3. zlib(类似gzip) – 出色的压缩,但需要占用更多资源

有索引的两个压缩选项:

  1. 无压缩
  2. 前缀(默认启用) – 良好的压缩,资源的有效利用

 

请记住哪些适用于MongoDB的3.0所有压缩选项:

 

  1. 随机数据不能压缩
  2. 二进制数据不能压缩(它可能已经被压缩)
  3. 文本压缩效果特别好
  4. 对于文件中的字段名压缩效果特别好(尤其对短字段名来说)

 官方说法:

Compression

With WiredTiger, MongoDB supports compression for all collections and indexes. Compression minimizes storage use at the expense of additional CPU.

By default, WiredTiger uses block compression with the snappy compression library for all collections andprefix compression for all indexes.

For collections, block compression with zlib is also available. To specify an alternate compression algorithm or no compression, use the storage.wiredTiger.collectionConfig.blockCompressor setting.

For indexes, to disable prefix compression, use thestorage.wiredTiger.indexConfig.prefixCompression setting.

Compression settings are also configurable on a per-collection and per-index basis during collection and index creation. See Specify Storage Engine Options and db.collection.createIndex() storageEngine option.

For most workloads, the default compression settings balance storage efficiency and processing requirements.

The WiredTiger journal is also compressed by default. For information on journal compression, see Journal.

MongoDB 3.0 WiredTiger Compression and Performance

One of the most exciting developments over the lifetime of MongoDB must be the inclusion of the WiredTiger storage engine in MongoDB 3.0. Its very design and core architecture are legions ahead of the current MMAPv1 engine and comparable to most modern day storage engines for various relational and non-relational stores. One of the most compelling features of the WiredTiger storage engine is compression. Let's talk a bit more about performance and compression.

Configuration

MongoDB 3.0 allows the user to configure different storage engines through the storage engine API. For the first time ever we have an amazing array of options for setting up MongoDB to match our workloads and use-cases. To run WiredTiger the version must be 3.0 or higher and the configuration file must call for WiredTiger. For example:

storage:
   dbPath: "/data/mongodb" journal: enabled: true engine: "wiredTiger" wiredTiger: engineConfig: cacheSizeGB: 99 journalCompressor: none directoryForIndexes: "/indexes/mongodb/" collectionConfig: blockCompressor: snappy indexConfig: prefixCompression: true systemLog: destination: file path: "/tmp/mongodb.log" logAppend: true processManagement: fork: true net: port: 9005 unixDomainSocket: enabled : true 

There are a lot of new configuration options in 3.0 so let's take the notable options one by one.

  • storage.engine. The setting ensures we are using the WiredTiger storage engine. Should be set to "wiredTiger" to use the WiredTiger engine. It can also be set to "mmapv1". MMAPv1 is the default in 3.0, but in MongoDB 3.1 (potentially) this will change to wiredTiger.

  • storage.wiredTiger.engineConfig.cacheSizeGB. This sets up a page cache for WiredTiger to cache frequently used data and index blocks in GB. If this is not specified, MongoDB will automatically assign memory up to about 50% of total addressable memory.

  • storage.wiredTiger.engineConfig.directoryForIndexes. Yes! We can now store indexes on a separate block device. This should help DBAs size, capacity plan, and augment performance as needed.

  • storage.wiredTiger.collectionConfig.blockCompressor. This can be set to 'snappy' or 'zlib'. Snappy having higher performance and lower compression than zlib. Some more detail later on compression algorithms.

  • storage.wiredTiger.indexConfig.prefixCompression. This setting enables prefix compression for indexes. Valid options are true|false and the default is true.

Let's talk performance

WiredTiger is going to be much faster than MMAPv1 for almost all workloads. Its real sweet spot is highly concurrent and/or workloads with lots of updates. This may surprise some folks because traditionally compression is a trade off. Add compression, lose performance. That is normally true, but a couple of things need to be considered here. One, we are comparing the MMAPv1 engine with database level locking to WiredTiger with document level locking. Any reasonable concurrent workload is almost always bound by locking and seldom by pure system level resources. Two, WiredTiger does page level compression. More on this later.

There are a few things that make WiredTiger faster other than its locking scope. WiredTiger also has a streamlined process for free space lookups and management and it has a proper cache with its own I/O components.

Because WiredTiger allows for compression, a common worry is the potential for overall performance impact. But as you can see, in a practical sense this worry is mostly unfounded.

A couple graphs for relative performance difference for sysbench-mongodb. It should be noted that WiredTiger is using defaults in this configuration, including snappy compression and index prefix compression.

graph

Let's break it down a bit more:

graph

The relative CPU usage for each:

cpu

Let's talk more about compression

Compressing data inside a database is tricky. WiredTiger does a great job at handling compression because of its sophisticated management approach:

The cache generally stores uncompressed changes (the exception is for very large documents). The default snappy compression is fairly straightforward: it gathers data up to a maximum of 32KB, compresses it, and if compression is successful, writes the block rounded up to the nearest 4KB.

The alternative zlib compression works a little differently: it will gather more data and compress enough to fill a 32KB block on disk. This is more CPU intensive but generally results in better compression ratios (independent of the inherent differences between snappy and zlib).

—Michael Cahill

This approach is great for performance. But compression still has overhead and can vary in effectiveness. What this means for users is two-fold:

  • Not all data sets compress equally, it depends on the data format itself.
  • Data compression is temporal. One day being better than another depending on the specific workload.

One approach is to take a mongodump of the dataset in question then mongorestore that data to a compressed WiredTiger database and measure the difference. This gives a rough measurement of what one can expect the compression ratio to be. That said, as soon as the new compressed database starts taking load, that compression ratio may vary. Probably not by a massive margin however.

It should be noted there are some tricky bits to consider when running a database using compression. Because WiredTiger compresses each page before it hits the disk the memory region is uncompressed. This means that highly compressed data will have a large ratio between its footprint on disk and the cache that serves it. Poorly compressed data the opposite. The effect may be the database becomes slow. It will be hard to know that the problem is the caching pattern has changed because the compression properties of the underlaying data have changed. Keeping good time series data on the cache utilization, and periodically checking the compression of the data by hand may help the DBA understand these patterns better.

For instance, note the different compression ratios of various datasets:

comp

Take Aways

  • MongoDB 3.0 has a new storage engine API, and is delivered with the optional WiredTiger engine.
  • MongoDB 3.0 with WiredTiger is much faster than MMAPv1 mostly because of increased concurrency.
  • MongoDB 3.0 with WiredTiger is much faster than MMAPv1 even when compressing the data.

Lastly, remember, MongoDB 3.0 is a new piece of software. Test before moving production workloads to it. TEST TEST TEST.

If you would like to test MongoDB 3.0 with WiredTiger, ObjectRocket has it as generally available and it's simple and quick to setup. As with anything ObjectRocket, there are a team of DBAs and Developers to help you with your projects. Don't be shy hitting them up at support@objectrocket.com with questions or email me directly.

Note: test configuration and details documented here.












本文转自张昺华-sky博客园博客,原文链接:http://www.cnblogs.com/bonelee/p/6289909.html,如需转载请自行联系原作者


相关实践学习
MongoDB数据库入门
MongoDB数据库入门实验。
快速掌握 MongoDB 数据库
本课程主要讲解MongoDB数据库的基本知识,包括MongoDB数据库的安装、配置、服务的启动、数据的CRUD操作函数使用、MongoDB索引的使用(唯一索引、地理索引、过期索引、全文索引等)、MapReduce操作实现、用户管理、Java对MongoDB的操作支持(基于2.x驱动与3.x驱动的完全讲解)。 通过学习此课程,读者将具备MongoDB数据库的开发能力,并且能够使用MongoDB进行项目开发。   相关的阿里云产品:云数据库 MongoDB版 云数据库MongoDB版支持ReplicaSet和Sharding两种部署架构,具备安全审计,时间点备份等多项企业能力。在互联网、物联网、游戏、金融等领域被广泛采用。 云数据库MongoDB版(ApsaraDB for MongoDB)完全兼容MongoDB协议,基于飞天分布式系统和高可靠存储引擎,提供多节点高可用架构、弹性扩容、容灾、备份回滚、性能优化等解决方案。 产品详情: https://www.aliyun.com/product/mongodb
相关文章
|
3月前
|
存储 NoSQL 算法
MongoDB存储引擎发展及WiredTiger深入解析(二)
MongoDB存储引擎发展及WiredTiger深入解析(二)
|
10月前
|
存储 NoSQL Shell
如何将阿里云WiredTiger引擎的MongoDB物理备份文件恢复至自建数据库
数据库操作一直是一个比较敏感的话题,动不动“删库跑路”,可见数据库操作对于一个项目而言是非常重要的,我们有时候会因为一个游戏的严重bug或者运营故障要回档数据库,而你们刚好使用的是阿里云的Mongodb,那么这篇文章将给你提供一个思路(或许你按照阿里云官网的文档一顿操作下来,并不是那么顺利,有一些报错,无法登录...)
|
存储 NoSQL
MongoDB 存储引擎 WiredTiger 原理解析
在团队内部分享了 Wiredtiger 引擎的原理,为此画了多张图来辅助说明,对了解 Wiredtiger 应该是非常有帮助的,内容分享出来给大家。暂时没时间整理文字版,对实现原理非常感兴趣的同学,如果PPT没讲明白,可以找我私下交流。
|
存储 NoSQL API
MongoDB如何使用wiredTiger?
Mongodb 3.0支持用户自定义存储引擎,用户可配置使用mmapv1或者wiredTiger存储引擎,本文主要介绍Mongodb是如何使用wiredTiger数据库作为底层的数据存储层。目前还没有读过wiredTiger的源码,本文的内容都是基于wiredTiger官方文档,以及Mongodb.
MongoDB 3.2.9 请求 hang 分析及 wiredtiger 调优
MongoDB 3.2.9 版本在 wiredtiger 上做了很多改进,但不幸的时,这个版本引入了一个新的 bug,持续大量 insert/update 场景,有一定的可能导致 wiredtiger 进入 deadlock,MongoDB 官方迅速的在3.2.10里修复了该问题,该版本在 wir.
|
存储 NoSQL 关系型数据库
MongoDB WiredTiger 存储引擎cache_pool设计 (上) -- 原理篇
## 1. MongoDB 多引擎体系 -- WiredTiger MongoDB v.3.0之前的版本,默认使用`MMAP(MMap引擎)`方式对内存中的数据进行写盘存储,遭受了很多诟病。比如`并发受限的表锁、不支持压缩、不可控的IO`操作等,MMAP甚至不能称作一个完整的存储引擎(笔者的个人观点),对数据(Btree的数据页、索引页)的操作甚至要依赖os的mmap(in_page_ca
6998 0
|
存储 NoSQL 数据库
MongoDB Wiredtiger存储引擎实现原理
Mongodb Wiredtiger存储引擎实现原理 Mongodb-3.2已经WiredTiger设置为了默认的存储引擎,最近通过阅读wiredtiger源代码(在不了解其内部实现的情况下,读代码难度相当大,代码量太大,强烈建议官方多出些介绍文章),理清了wiredtiger的大致原理,并简单总
下一篇
DDNS