mongodb压缩——snappy、zlib块压缩,btree索引前缀压缩

本文涉及的产品
云数据库 MongoDB,独享型 2核8GB
推荐场景:
构建全方位客户视图
简介:

MongoDB 3.0 WiredTiger Compression and Performance

One of the most exciting developments over the lifetime of MongoDB must be the inclusion of the WiredTiger storage engine in MongoDB 3.0. Its very design and core architecture are legions ahead of the current MMAPv1 engine and comparable to most modern day storage engines for various relational and non-relational stores. One of the most compelling features of the WiredTiger storage engine is compression. Let's talk a bit more about performance and compression.

Configuration

MongoDB 3.0 allows the user to configure different storage engines through the storage engine API. For the first time ever we have an amazing array of options for setting up MongoDB to match our workloads and use-cases. To run WiredTiger the version must be 3.0 or higher and the configuration file must call for WiredTiger. For example:

storage:
   dbPath: "/data/mongodb" journal: enabled: true engine: "wiredTiger" wiredTiger: engineConfig: cacheSizeGB: 99 journalCompressor: none directoryForIndexes: "/indexes/mongodb/" collectionConfig: blockCompressor: snappy indexConfig: prefixCompression: true systemLog: destination: file path: "/tmp/mongodb.log" logAppend: true processManagement: fork: true net: port: 9005 unixDomainSocket: enabled : true 

There are a lot of new configuration options in 3.0 so let's take the notable options one by one.

  • storage.engine. The setting ensures we are using the WiredTiger storage engine. Should be set to "wiredTiger" to use the WiredTiger engine. It can also be set to "mmapv1". MMAPv1 is the default in 3.0, but in MongoDB 3.1 (potentially) this will change to wiredTiger.

  • storage.wiredTiger.engineConfig.cacheSizeGB. This sets up a page cache for WiredTiger to cache frequently used data and index blocks in GB. If this is not specified, MongoDB will automatically assign memory up to about 50% of total addressable memory.

  • storage.wiredTiger.engineConfig.directoryForIndexes. Yes! We can now store indexes on a separate block device. This should help DBAs size, capacity plan, and augment performance as needed.

  • storage.wiredTiger.collectionConfig.blockCompressor. This can be set to 'snappy' or 'zlib'. Snappy having higher performance and lower compression than zlib. Some more detail later on compression algorithms.

  • storage.wiredTiger.indexConfig.prefixCompression. This setting enables prefix compression for indexes. Valid options are true|false and the default is true.

Let's talk performance

WiredTiger is going to be much faster than MMAPv1 for almost all workloads. Its real sweet spot is highly concurrent and/or workloads with lots of updates. This may surprise some folks because traditionally compression is a trade off. Add compression, lose performance. That is normally true, but a couple of things need to be considered here. One, we are comparing the MMAPv1 engine with database level locking to WiredTiger with document level locking. Any reasonable concurrent workload is almost always bound by locking and seldom by pure system level resources. Two, WiredTiger does page level compression. More on this later.

There are a few things that make WiredTiger faster other than its locking scope. WiredTiger also has a streamlined process for free space lookups and management and it has a proper cache with its own I/O components.

Because WiredTiger allows for compression, a common worry is the potential for overall performance impact. But as you can see, in a practical sense this worry is mostly unfounded.

A couple graphs for relative performance difference for sysbench-mongodb. It should be noted that WiredTiger is using defaults in this configuration, including snappy compression and index prefix compression.

graph

Let's break it down a bit more:

graph

The relative CPU usage for each:

cpu

Let's talk more about compression

Compressing data inside a database is tricky. WiredTiger does a great job at handling compression because of its sophisticated management approach:

The cache generally stores uncompressed changes (the exception is for very large documents). The default snappy compression is fairly straightforward: it gathers data up to a maximum of 32KB, compresses it, and if compression is successful, writes the block rounded up to the nearest 4KB.

The alternative zlib compression works a little differently: it will gather more data and compress enough to fill a 32KB block on disk. This is more CPU intensive but generally results in better compression ratios (independent of the inherent differences between snappy and zlib).

—Michael Cahill

This approach is great for performance. But compression still has overhead and can vary in effectiveness. What this means for users is two-fold:

  • Not all data sets compress equally, it depends on the data format itself.
  • Data compression is temporal. One day being better than another depending on the specific workload.

One approach is to take a mongodump of the dataset in question then mongorestore that data to a compressed WiredTiger database and measure the difference. This gives a rough measurement of what one can expect the compression ratio to be. That said, as soon as the new compressed database starts taking load, that compression ratio may vary. Probably not by a massive margin however.

It should be noted there are some tricky bits to consider when running a database using compression. Because WiredTiger compresses each page before it hits the disk the memory region is uncompressed. This means that highly compressed data will have a large ratio between its footprint on disk and the cache that serves it. Poorly compressed data the opposite. The effect may be the database becomes slow. It will be hard to know that the problem is the caching pattern has changed because the compression properties of the underlaying data have changed. Keeping good time series data on the cache utilization, and periodically checking the compression of the data by hand may help the DBA understand these patterns better.

For instance, note the different compression ratios of various datasets:

comp

Take Aways

  • MongoDB 3.0 has a new storage engine API, and is delivered with the optional WiredTiger engine.
  • MongoDB 3.0 with WiredTiger is much faster than MMAPv1 mostly because of increased concurrency.
  • MongoDB 3.0 with WiredTiger is much faster than MMAPv1 even when compressing the data.

Lastly, remember, MongoDB 3.0 is a new piece of software. Test before moving production workloads to it. TEST TEST TEST.

If you would like to test MongoDB 3.0 with WiredTiger, ObjectRocket has it as generally available and it's simple and quick to setup. As with anything ObjectRocket, there are a team of DBAs and Developers to help you with your projects. Don't be shy hitting them up at support@objectrocket.com with questions or email me directly.

Note: test configuration and details documented here.














本文转自张昺华-sky博客园博客,原文链接:http://www.cnblogs.com/bonelee/p/6520269.html,如需转载请自行联系原作者

相关实践学习
MongoDB数据库入门
MongoDB数据库入门实验。
快速掌握 MongoDB 数据库
本课程主要讲解MongoDB数据库的基本知识,包括MongoDB数据库的安装、配置、服务的启动、数据的CRUD操作函数使用、MongoDB索引的使用(唯一索引、地理索引、过期索引、全文索引等)、MapReduce操作实现、用户管理、Java对MongoDB的操作支持(基于2.x驱动与3.x驱动的完全讲解)。 通过学习此课程,读者将具备MongoDB数据库的开发能力,并且能够使用MongoDB进行项目开发。   相关的阿里云产品:云数据库 MongoDB版 云数据库MongoDB版支持ReplicaSet和Sharding两种部署架构,具备安全审计,时间点备份等多项企业能力。在互联网、物联网、游戏、金融等领域被广泛采用。 云数据库MongoDB版(ApsaraDB for MongoDB)完全兼容MongoDB协议,基于飞天分布式系统和高可靠存储引擎,提供多节点高可用架构、弹性扩容、容灾、备份回滚、性能优化等解决方案。 产品详情: https://www.aliyun.com/product/mongodb
相关文章
|
存储 编解码 算法
什么是压缩算法及压缩算法定义
什么是压缩算法及压缩算法定义
197 0
|
存储 SQL 算法
无序数组压缩查询【转】
假期重新把之前在新浪博客里面的文字梳理了下,搬到这里。
216 0
|
分布式数据库 数据库 Hbase
|
SQL HIVE 索引
下一篇
无影云桌面