One of our company's MongoDB instances, which stores multimedia files, needs to be converted to a sharded cluster, so I sharded the GridFS fs.chunks collection in a test environment.
db.settings.update( {"_id" : "chunksize"}, { $set: {"value" : 200 } } )
这将改为200MB.
这个值修改后可能要重启mongos才能生效.
After running for a while, the following errors showed up in the log:
warning: chunk is larger than 65203623200 bytes because of key { files_id: ObjectId('4e2ea40efa30e751113fc633') }
Tue Sep 27 12:27:11 [conn7] about to log metadata event: { _id: "db-xxx-xxx-xxx-xxx.sky-mobi.com.sh.nj-2011-09-27T04:27:11-4388", server: "db-xxx-xxx-xxx-xxx.sky-mobi.com.sh.nj", clientAddr: "xxx.xxx.xxx.xxx:10751", time: new Date(1317097631976), what: "moveChunk.from", ns: "digoal.fs.chunks", details: { min: { files_id: ObjectId('4e2ea40efa30e751113fc633') }, max: { files_id: ObjectId('4e2ea41afa30e751bd40c633') }, step1: 0, step2: 111, note: "aborted" } }
Tue Sep 27 12:28:34 [conn12] warning: can't move chunk of size (approximately) 97528508 because maximum size allowed to move is 67108864 ns: digoal.fs.chunks { files_id: ObjectId('4e2ea40efa30e751113fc633') } -> { files_id: ObjectId('4e2ea41afa30e751bd40c633') }
Tue Sep 27 12:28:45 [conn7] command admin.$cmd command: { moveChunk: "digoal.fs.chunks", from: "digoal001/xxx.xxx.xxx:xxxx,xxx.xxx.xxx:xxxx,xxx.xxx.xxx:xxxx", to: "digoal004/xxx.xxx.xxx:xxxx,xxx.xxx.xxx:xxxx,xxx.xxx.xxx:xxxx", min: { files_id: ObjectId('4e2ea40efa30e751113fc633') }, max: { files_id: ObjectId('4e2ea41afa30e751bd40c633') }, maxChunkSizeBytes: 67108864, shardId: "digoal.fs.chunks-files_id_ObjectId('4e2ea40efa30e751113fc633')", configdb: "xxx.xxx.xxx:xxxx,xxx.xxx.xxx:xxxx,xxx.xxx.xxx:xxxx" } ntoreturn:1 reslen:109 176ms
The log shows that one chunk is roughly 97528508 bytes and cannot be migrated, because the maximum size allowed to move is 67108864 bytes (64MB).
After querying fs.chunks I found files larger than 150MB. The warning is correct, though: a chunk this large really should not be moved as-is. It should first be split into chunks under 64MB and only then migrated; otherwise the balancer process would cause serious performance problems.
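Before deciding how to handle such a chunk, its actual size can be checked with the dataSize command. A minimal sketch, using the chunk bounds reported in the log above (the namespace and ObjectIds are the ones from this log; substitute your own chunk's bounds):
> use admin
> db.runCommand( { dataSize : "digoal.fs.chunks" , keyPattern : { files_id : 1 } , min : { files_id : ObjectId("4e2ea40efa30e751113fc633") } , max : { files_id : ObjectId("4e2ea41afa30e751bd40c633") } } )
The returned size (in bytes) and numObjects fields give a good idea of whether the chunk can realistically be split below 64MB before being moved.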
To change the 64MB limit, connect to mongos and switch to the config database:
db.settings.find()
{ "_id" : "chunksize", "value" : 64 }
{ "_id" : "chunksize", "value" : 64 }
db.settings.update( {"_id" : "chunksize"}, { $set: {"value" : 200 } } )
This changes the chunk size to 200MB.
After changing this value, mongos may need to be restarted for it to take effect.
However, I recommend leaving it at the default.
Some ways to handle this kind of situation:
Change the chunksize:
> use config
> db.settings.update( {"_id" : "chunksize"}, { $set: {"value" : new_chunk_size_in_mb } } )
Note, though, that for an existing cluster it may take some time for the collections to split down to the new size if it is smaller than before, and currently autosplitting is only triggered when the collection receives new documents or updates.
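To confirm the new setting took effect and to see how chunks are currently spread across shards, something like the following can be run from mongos against the config database (a sketch; digoal.fs.chunks is the collection from this article, substitute your own namespace):
> use config
> db.settings.find( { _id : "chunksize" } )
> db.shards.find().forEach( function(s) { print( s._id + " : " + db.chunks.count( { ns : "digoal.fs.chunks" , shard : s._id } ) + " chunks" ); } )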
Split a chunk manually:
The following command splits the chunk where { _id : 99 } resides (or would reside if present) into two. The key used as the split point is computed internally and is approximately the key that would divide the chunk into two equally sized new chunks.
> use admin
switched to db admin
> db.runCommand( { split : "test.foo" , find : { _id : 99 } } )
...
The Balancer treats all chunks the same way, regardless if they were generated by a manual or an automatic split.
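The result of a manual split can be checked in the config database; a sketch for the test.foo example above:
> use config
> db.chunks.find( { ns : "test.foo" } , { min : 1 , max : 1 , shard : 1 } )
Each returned document describes one chunk, so after the split there should be one more chunk than before, with the new boundary somewhere inside the original chunk's range.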
Pre-split:
In the example below, the command splits the chunk where _id 99 would reside, using that key as the split point. Again, note that the key need not exist in the data for it to be used as a chunk boundary; the chunk may even be empty.
> use admin
switched to db admin
> db.runCommand( { split : "test.foo" , middle : { _id : 99 } } )
...
Key ranges of the chunks after the split:
["$MinKey", "99")
["99", "$MaxKey")
Pre-move a chunk:
> db.printShardingStatus("verbose") 找到属于哪个shard,需要移动到哪个shard
> db.runCommand({ moveChunk : "foo.bar" , find : { hash : "8000" }, to : "shard0000" })
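Tying this back to the problem at the beginning: once the oversized chunk on digoal.fs.chunks has been split into pieces under 64MB, the same moveChunk command can be used to migrate it manually. A sketch only; the ObjectId and the target shard name digoal004 are taken from the log above and are illustrative:
> use admin
> db.runCommand( { moveChunk : "digoal.fs.chunks" , find : { files_id : ObjectId("4e2ea40efa30e751113fc633") } , to : "digoal004" } )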
Benefits of pre-splitting and pre-moving chunks:
1. Chunks will not split until the data reaches a certain minimum size (hundreds of megabytes). Until this happens, balancing and migration will not take place; when the data volume is that small, distributing it across multiple servers is not needed anyway. When pre-splitting manually, many chunks can exist even if each chunk initially holds very little data.
2. "db.runCommand( { split : "test.foo" , middle : { _id : 99 } } ) " , This version of the command allows one to do a data presplitting that is especially useful in a load. If the range and distribution of keys to be data presplitting inserted are known in advance, the collection can be split proportionately to the number of servers using the command above, and the (empty) chunks could be migrated upfront using the moveChunk command.
Other references: