Could it be that the original data was incomplete, corrupting data blocks on some nodes? At one point I checked the logs and saw that HDFS had entered safe mode, and I forced it to exit. Could you explain why the master keeps crashing shortly after startup, and how to fix it?
Judging from the log you provided, the master crashes because one or more DataNodes are down. Check the cluster monitoring WebUI, or log in to each node, to see whether any DataNode process has died, and also check the replication setting (dfs.replication) in hdfs-site.xml. With DataNodes dead, HDFS cannot serve the required blocks; restarting the dead DataNodes should resolve it. A less likely cause is a communication problem between DataNodes. For reference, see: https://thebipalace.com/2016/05/16/hadoop-error-org-apache-hadoop-hdfs-blockmissingexception-could-not-obtain-block/
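To check the replication setting mentioned above, you can pull dfs.replication out of hdfs-site.xml. The snippet below writes a minimal sample config to /tmp so it runs anywhere; on a real cluster you would point the grep at $HADOOP_CONF_DIR/hdfs-site.xml instead (the sample file and its value of 3 are illustrative assumptions).

```shell
# Write a minimal sample hdfs-site.xml (illustration only --
# on a real cluster use $HADOOP_CONF_DIR/hdfs-site.xml instead).
cat > /tmp/hdfs-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
EOF

# Extract the replication factor: take the line after the property name
# and strip the <value> tags.
grep -A1 '<name>dfs.replication</name>' /tmp/hdfs-site-sample.xml \
  | sed -n 's:.*<value>\([0-9]*\)</value>.*:\1:p'
```

If the value is 1, losing a single DataNode makes that node's blocks unrecoverable; the usual safeguard on production clusters is a replication factor of at least 3.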
The HMaster error occurs during master startup: startActiveMasterManager() starts a thread that calls blockUntilBecomingActiveMaster() and then finishActiveMasterInitialization() to complete active-master initialization. Inside that initialization, the line this.fileSystemManager = new MasterFileSystem(this, this); touches HDFS; with DataNodes down, constructing MasterFileSystem throws, which surfaces as the "Failed to become active master" error. The relevant source:
private void startActiveMasterManager(int infoPort) throws KeeperException {
  String backupZNode = ZKUtil.joinZNode(
    zooKeeper.backupMasterAddressesZNode, serverName.toString());
  /*
   * Add a ZNode for ourselves in the backup master directory since we
   * may not become the active master. If so, we want the actual active
   * master to know we are backup masters, so that it won't assign
   * regions to us if so configured.
   *
   * If we become the active master later, ActiveMasterManager will delete
   * this node explicitly. If we crash before then, ZooKeeper will delete
   * this node for us since it is ephemeral.
   */
  LOG.info("Adding backup master ZNode " + backupZNode);
  if (!MasterAddressTracker.setMasterAddress(zooKeeper, backupZNode,
      serverName, infoPort)) {
    LOG.warn("Failed create of " + backupZNode + " by " + serverName);
  }
  activeMasterManager.setInfoPort(infoPort);
  // Start a thread to try to become the active master, so we won't block here
  Threads.setDaemonThreadRunning(new Thread(new Runnable() {
    @Override
    public void run() {
      int timeout = conf.getInt(HConstants.ZK_SESSION_TIMEOUT,
        HConstants.DEFAULT_ZK_SESSION_TIMEOUT);
      // If we're a backup master, stall until a primary writes its address
      if (conf.getBoolean(HConstants.MASTER_TYPE_BACKUP,
        HConstants.DEFAULT_MASTER_TYPE_BACKUP)) {
        LOG.debug("HMaster started in backup mode. "
          + "Stalling until master znode is written.");
        // This will only be a minute or so while the cluster starts up,
        // so don't worry about setting watches on the parent znode
        while (!activeMasterManager.hasActiveMaster()) {
          LOG.debug("Waiting for master address ZNode to be written "
            + "(Also watching cluster state node)");
          Threads.sleep(timeout);
        }
      }
      MonitoredTask status = TaskMonitor.get().createStatus("Master startup");
      status.setDescription("Master startup");
      try {
        if (activeMasterManager.blockUntilBecomingActiveMaster(timeout, status)) {
          finishActiveMasterInitialization(status);
        }
      } catch (Throwable t) {
        status.setStatus("Failed to become active: " + t.getMessage());
        LOG.fatal("Failed to become active master", t);
        // HBASE-5680: Likely hadoop23 vs hadoop 20.x/1.x incompatibility
        if (t instanceof NoClassDefFoundError && t.getMessage()
            .contains("org/apache/hadoop/hdfs/protocol/HdfsConstants$SafeModeAction")) {
          // improved error message for this special case
          abort("HBase is having a problem with its Hadoop jars. You may need to "
            + "recompile HBase against Hadoop version "
            + org.apache.hadoop.util.VersionInfo.getVersion()
            + " or change your hadoop jars to start properly", t);
        } else {
          abort("Unhandled exception. Starting shutdown.", t);
        }
      } finally {
        status.cleanup();
      }
    }
  }, getServerName().toShortString() + ".activeMasterManager"));
}
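Before restarting the dead DataNodes, it helps to confirm which HBase files are actually affected. On a live cluster you would run `hdfs fsck /hbase -files -blocks`; since that needs a running NameNode, the sketch below greps a saved sample of fsck output instead (the paths, block IDs, and counts in the sample are made up for illustration).

```shell
# Simplified sample of `hdfs fsck` output; paths and block IDs are hypothetical.
# On a real cluster, capture it with: hdfs fsck /hbase -files -blocks > /tmp/fsck.txt
cat > /tmp/fsck-sample.txt <<'EOF'
/hbase/data/default/t1/8a1b/cf/f1: CORRUPT blockpool BP-1 block blk_1073741825
/hbase/data/default/t1/8a1b/cf/f2: OK
Status: CORRUPT
 CORRUPT FILES: 1
 MISSING BLOCKS: 1
EOF

# Count the per-file CORRUPT lines (anchored at a path so the summary
# "Status: CORRUPT" line is not counted). These files are the ones that
# make MasterFileSystem creation fail with BlockMissingException.
grep -c '^/.*: CORRUPT' /tmp/fsck-sample.txt
```

If the corrupt files disappear from the fsck report after the dead DataNodes rejoin, the master should come up normally; blocks that remain missing on every replica are unrecoverable and would need to be restored from a backup.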