柏辰爸爸 2016-11-29
This document gives an overview of NodeManager (NM) restart, a feature that enables the NodeManager to be restarted without losing the active containers running on the node. At a high level, the NM stores any necessary state to a local state-store as it processes container-management requests. When the NM restarts, it recovers by first loading state for various subsystems and then letting those subsystems perform recovery using the loaded state.
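The paragraph above is the description from the official documentation. NodeManager restart recovery is a feature that was fully introduced in Hadoop 2.6.0. Its goal is to let the NodeManager restart without losing active containers, localized resources, applications, and related state. The NodeManager uses LevelDB to record the key request state for containers, localized resources, and applications, and restores that state after it restarts.
To enable NodeManager restart, configure the following two parameters: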
<property>
  <description>Enable the node manager to recover after starting</description>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <description>The local filesystem directory in which the node manager will store state when recovery is enabled.</description>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>${hadoop.tmp.dir}/yarn-nm-recovery</value>
</property>
NodeManager restart recovery is implemented as follows:
First, the NodeManager holds an NMStateStoreService component, declared as follows:
private NMStateStoreService nmStore = null;
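This NMStateStoreService component is the concrete implementation of NodeManager restart recovery. It stores, recovers, and removes the key event state for containers, localized resources, applications, and so on.
NMStateStoreService is an abstract class that extends the abstract service class AbstractService. Its serviceInit(), serviceStart(), and serviceStop() methods call the storage-related methods initStorage(conf), startStorage(), and closeStorage() respectively, which initialize, start, and close the store.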
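The delegation from lifecycle methods to storage hooks can be sketched without any Hadoop dependency. This is only an illustration: SketchService, SketchStateStore, and RecordingStateStore are hypothetical stand-ins for AbstractService and NMStateStoreService, and the Configuration argument of initStorage(conf) is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Hadoop's AbstractService: exposes the lifecycle. */
abstract class SketchService {
    public final void init()  { serviceInit(); }
    public final void start() { serviceStart(); }
    public final void stop()  { serviceStop(); }

    protected abstract void serviceInit();
    protected abstract void serviceStart();
    protected abstract void serviceStop();
}

/** Hypothetical stand-in for NMStateStoreService: maps lifecycle hooks to storage hooks. */
abstract class SketchStateStore extends SketchService {
    @Override protected void serviceInit()  { initStorage(); }
    @Override protected void serviceStart() { startStorage(); }
    @Override protected void serviceStop()  { closeStorage(); }

    protected abstract void initStorage();
    protected abstract void startStorage();
    protected abstract void closeStorage();
}

/** Records which storage hooks ran, so the call order is visible. */
class RecordingStateStore extends SketchStateStore {
    final List<String> calls = new ArrayList<>();

    @Override protected void initStorage()  { calls.add("initStorage"); }
    @Override protected void startStorage() { calls.add("startStorage"); }
    @Override protected void closeStorage() { calls.add("closeStorage"); }
}

class LifecycleDemo {
    public static void main(String[] args) {
        RecordingStateStore store = new RecordingStateStore();
        store.init();
        store.start();
        store.stop();
        System.out.println(store.calls); // [initStorage, startStorage, closeStorage]
    }
}
```

A concrete store only has to fill in the three storage hooks; the owner of the store keeps calling the generic init(), start(), and stop() methods.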
Taking applications as an example, NMStateStoreService defines the following abstract methods:
/**
 * Load the state of applications
 * @return recovered state for applications
 * @throws IOException
 */
public abstract RecoveredApplicationsState loadApplicationsState()
    throws IOException;

/**
 * Record the start of an application
 * @param appId the application ID
 * @param p state to store for the application
 * @throws IOException
 */
public abstract void storeApplication(ApplicationId appId,
    ContainerManagerApplicationProto p) throws IOException;

/**
 * Record that an application has finished
 * @param appId the application ID
 * @throws IOException
 */
public abstract void storeFinishedApplication(ApplicationId appId)
    throws IOException;

/**
 * Remove records corresponding to an application
 * @param appId the application ID
 * @throws IOException
 */
public abstract void removeApplication(ApplicationId appId)
    throws IOException;
NMStateStoreService currently has two implementations: NMNullStateStoreService and NMLeveldbStateStoreService. NMNullStateStoreService is merely an empty NMStateStoreService; its key methods, such as initStorage(conf), startStorage(), closeStorage(), and storeApplication(), are all no-ops. NMLeveldbStateStoreService is the component that actually implements NodeManager restart recovery once the parameters above are configured.
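The difference between a no-op store and a persisting store can be sketched as follows. This is a simplified illustration, not the Hadoop code: AppStateStore, NullAppStateStore, and FileAppStateStore are hypothetical names, application IDs are plain strings, and a java.util.Properties file stands in for LevelDB.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified version of the application-related store methods. */
interface AppStateStore {
    void storeApplication(String appId);
    void storeFinishedApplication(String appId);
    void removeApplication(String appId);
    Set<String> loadApplications(); // IDs of all recorded applications
}

/** Null-object store: every method is a no-op, so nothing survives a restart. */
class NullAppStateStore implements AppStateStore {
    public void storeApplication(String appId) { }
    public void storeFinishedApplication(String appId) { }
    public void removeApplication(String appId) { }
    public Set<String> loadApplications() { return new TreeSet<>(); }
}

/** File-backed store; a Properties file is an assumed stand-in for LevelDB. */
class FileAppStateStore implements AppStateStore {
    private final Path file;
    private final Properties state = new Properties();

    FileAppStateStore(Path dir) {
        this.file = dir.resolve("apps.properties");
        if (Files.exists(file)) {
            // Recovery step: load whatever a previous instance persisted.
            try (InputStream in = Files.newInputStream(file)) {
                state.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public void storeApplication(String appId) { state.setProperty(appId, "RUNNING"); flush(); }
    public void storeFinishedApplication(String appId) { state.setProperty(appId, "FINISHED"); flush(); }
    public void removeApplication(String appId) { state.remove(appId); flush(); }
    public Set<String> loadApplications() { return new TreeSet<>(state.stringPropertyNames()); }

    String stateOf(String appId) { return state.getProperty(appId); }

    /** Writes the whole state to disk after every change. */
    private void flush() {
        try (OutputStream out = Files.newOutputStream(file)) {
            state.store(out, "sketch of NM application state");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class RecoveryDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nm-recovery-sketch");
        AppStateStore before = new FileAppStateStore(dir);
        before.storeApplication("application_1_0001");
        before.storeApplication("application_1_0002");
        before.storeFinishedApplication("application_1_0002");
        before.removeApplication("application_1_0001");

        // Simulated restart: a fresh instance reads what the old one persisted.
        FileAppStateStore after = new FileAppStateStore(dir);
        System.out.println(after.loadApplications());            // [application_1_0002]
        System.out.println(after.stateOf("application_1_0002")); // FINISHED
    }
}
```

With NullAppStateStore in place of FileAppStateStore, the same calls succeed but loadApplications() returns an empty set after the simulated restart, which mirrors the behavior when recovery is disabled.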
NodeManager's serviceInit() method contains the following call:
initAndStartRecoveryStore(conf);
The initAndStartRecoveryStore() method checks the two parameters described above and initializes the NMStateStoreService, as follows:
private void initAndStartRecoveryStore(Configuration conf)
    throws IOException {
  boolean recoveryEnabled = conf.getBoolean(
      YarnConfiguration.NM_RECOVERY_ENABLED,
      YarnConfiguration.DEFAULT_NM_RECOVERY_ENABLED);
  if (recoveryEnabled) {
    FileSystem recoveryFs = FileSystem.getLocal(conf);
    String recoveryDirName = conf.get(YarnConfiguration.NM_RECOVERY_DIR);
    if (recoveryDirName == null) {
      throw new IllegalArgumentException("Recovery is enabled but " +
          YarnConfiguration.NM_RECOVERY_DIR + " is not set.");
    }
    Path recoveryRoot = new Path(recoveryDirName);
    recoveryFs.mkdirs(recoveryRoot, new FsPermission((short)0700));
    nmStore = new NMLeveldbStateStoreService();
  } else {
    nmStore = new NMNullStateStoreService();
  }
  nmStore.init(conf);
  nmStore.start();
}
When yarn.nodemanager.recovery.enabled is set to true, the NMStateStoreService is instantiated as NMLeveldbStateStoreService and the storage directory given by yarn.nodemanager.recovery.dir is created with owner-only permissions (0700); otherwise NMNullStateStoreService is used.
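The method then calls the NMStateStoreService's init() and start() methods to complete initialization and startup.
Finally, when the NodeManager service stops, its serviceStop() method calls stopRecoveryStore(), which in turn calls the NMStateStoreService's stop() method to shut down the store.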
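The selection logic described above can be tried out in isolation with a small sketch. It assumes a plain Map in place of Hadoop's Configuration, a default of false for the enabled flag, and string labels in place of the real store classes; RecoveryStoreChooser and chooseStore are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.HashMap;
import java.util.Map;

class RecoveryStoreChooser {
    static final String ENABLED_KEY = "yarn.nodemanager.recovery.enabled";
    static final String DIR_KEY = "yarn.nodemanager.recovery.dir";

    /** Returns a label for the store that would be chosen (hypothetical helper). */
    static String chooseStore(Map<String, String> conf) {
        // Assumption: recovery is off unless the flag is explicitly "true".
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(ENABLED_KEY, "false"));
        if (!enabled) {
            return "null-store";
        }
        String dirName = conf.get(DIR_KEY);
        if (dirName == null) {
            throw new IllegalArgumentException(
                "Recovery is enabled but " + DIR_KEY + " is not set.");
        }
        Path root = Paths.get(dirName);
        try {
            Files.createDirectories(root);
            // Owner-only access (0700) where the filesystem supports POSIX permissions.
            if (root.getFileSystem().supportedFileAttributeViews().contains("posix")) {
                Files.setPosixFilePermissions(root,
                    PosixFilePermissions.fromString("rwx------"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return "leveldb-store";
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> conf = new HashMap<>();
        System.out.println(chooseStore(conf)); // null-store

        conf.put(ENABLED_KEY, "true");
        conf.put(DIR_KEY, Files.createTempDirectory("nm-sketch").resolve("recovery").toString());
        System.out.println(chooseStore(conf)); // leveldb-store
    }
}
```

Enabling recovery without setting the directory fails fast with an IllegalArgumentException, as in the method shown earlier.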