Recently I hit a nasty problem. On a 6-node distributed database, with a 10 TB disk per node, I dug through the HDFS local directories layer by layer, hunting for large files by hand, and finally found the culprit: a single dncp-block-verification.log.curr taking up 5.6 TB. As the question marks piled up in my head, I was genuinely indignant: how can this thing grow so big? Bigger than my actual data files?
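For anyone stuck on the same hunt, here is a minimal Python sketch of that search. The data directory `/data/hdfs/dn` is a made-up placeholder; substitute whatever your `dfs.datanode.data.dir` actually points to. In practice, running `du -sh` level by level gets you to the same answer.

```python
import os

# Hypothetical DataNode local data directory; point this at your
# dfs.datanode.data.dir location(s) instead.
DATA_DIR = "/data/hdfs/dn"

def largest_files(root, top_n=10):
    """Walk the local directory tree and return the top_n biggest files."""
    sizes = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # Files can disappear while scanning; just skip them.
                continue
    return sorted(sizes, reverse=True)[:top_n]

if __name__ == "__main__":
    for size, path in largest_files(DATA_DIR):
        print(f"{size / 1024 ** 4:8.2f} TiB  {path}")
```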
It was only the second day of the holiday and the customer was already chasing me: "Have you settled on a solution yet?" So I hurried to reproduce the issue in a local virtual machine. The fix itself was already clear, just delete those two files, but this is a production environment and I didn't dare delete anything on a whim; better to play it safe.
Looking back, this is actually a bug in older versions of HDFS that has since been fixed in newer releases. The workaround was simply to stop the DataNode and delete those two oversized log files.
Here is the canonical fix as well:
One solution, although slightly drastic, is to disable the block scanner entirely, by setting into the HDFS DataNode configuration the key `dfs.datanode.scan.period.hours` to `0` (default is `504` in hours). The negative effect of this is that your DNs may not auto-detect corrupted block files (and would need to wait upon a future block reading client to detect them instead); this isn't a big deal if your average replication is 3-ish, but you can consider the change as a short term one until you upgrade to a release that fixes the issue. Note that this problem will not happen if you upgrade to the latest CDH 5.4.x or higher release versions, which includes the [HDFS-7430](https://issues.apache.org/jira/browse/HDFS-7430) rewrite changes and associated bug fixes. These changes have done away with the use of such a local file, thereby removing the problem.
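In other words, per the quoted advice, the short-term workaround amounts to adding the property below to `hdfs-site.xml` on each DataNode and restarting it. This is only a sketch of the quoted setting for the affected pre-HDFS-7430 releases, and the side effect noted above (no automatic detection of corrupt block files) applies.

```xml
<!-- hdfs-site.xml on each DataNode (pre-HDFS-7430 releases): -->
<!-- disable the periodic block scanner; the default is 504 hours. -->
<property>
  <name>dfs.datanode.scan.period.hours</name>
  <value>0</value>
</property>
```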