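While the NM (NodeManager) is starting a container, one of the steps writes the current tokens to a local directory. By default this happens in the startLocalizer method of the DefaultContainerExecutor class: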
    public synchronized void startLocalizer(Path nmPrivateContainerTokensPath,
        InetSocketAddress nmAddr, String user, String appId, String locId,
        List<String> localDirs, List<String> logDirs)
        throws IOException, InterruptedException {
      ContainerLocalizer localizer =
          new ContainerLocalizer(lfs, user, appId, locId, getPaths(localDirs),
              RecordFactoryProvider.getRecordFactory(getConf()));
      // Initialize the local directories for a particular user:
      // create $local.dir/usercache/$user and its immediate parent
      createUserLocalDirs(localDirs, user);
      // Initialize the local cache directories for a particular user:
      // $local.dir/usercache/$user, $local.dir/usercache/$user/appcache,
      // $local.dir/usercache/$user/filecache
      createUserCacheDirs(localDirs, user);
      // Initialize the local directories for a particular application:
      // $local.dir/usercache/$user/appcache/$appid
      createAppDirs(localDirs, user, appId);
      // Create application log directories on all disks: $log.dir/$appid
      createAppLogDirs(appId, logDirs);
      // TODO : Why pick first app dir. The same in LCE why not random?
      Path appStorageDir = getFirstApplicationDir(localDirs, user, appId);
      String tokenFn = String.format(ContainerLocalizer.TOKEN_FILE_NAME_FMT, locId);
      Path tokenDst = new Path(appStorageDir, tokenFn);
      lfs.util().copy(nmPrivateContainerTokensPath, tokenDst);
      LOG.info("Copying from " + nmPrivateContainerTokensPath + " to " + tokenDst);
      lfs.setWorkingDirectory(appStorageDir);
      LOG.info("CWD set to " + appStorageDir + " = " + lfs.getWorkingDirectory());
      // TODO : DO it over RPC for maintaining similarity?
      localizer.runLocalization(nmAddr);
    }
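The part to focus on is getFirstApplicationDir(localDirs, user, appId): the code picks the application directory, generates the token file name, and then calls copy to copy the token file into YARN's local working directory.
The first argument passed to getFirstApplicationDir is the list of directories where YARN writes its temporary data, which comes from the yarn.nodemanager.local-dirs setting (List of directories to store localized files in.). The method itself is short: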
    private Path getFirstApplicationDir(List<String> localDirs, String user,
        String appId) {
      return getApplicationDir(new Path(localDirs.get(0)), user, appId);
    }
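To make the effect of localDirs.get(0) concrete, here is a minimal standalone sketch that rebuilds the token destination path with plain java.nio instead of the Hadoop Path API. The class and method names, the sample directories, and the "%s.tokens" value used for TOKEN_FILE_NAME_FMT are assumptions for illustration only; the usercache/appcache layout follows the comments in startLocalizer.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class, not part of Hadoop.
class FirstDirSketch {
    // Assumed stand-in for ContainerLocalizer.TOKEN_FILE_NAME_FMT.
    static final String TOKEN_FILE_NAME_FMT = "%s.tokens";

    // Mirrors the idea of getFirstApplicationDir: always index 0,
    // then $local.dir/usercache/$user/appcache/$appid.
    static Path firstApplicationDir(List<String> localDirs, String user, String appId) {
        return Paths.get(localDirs.get(0), "usercache", user, "appcache", appId);
    }

    // Builds the destination of the token file for one localizer id.
    static Path tokenDestination(List<String> localDirs, String user,
                                 String appId, String locId) {
        return firstApplicationDir(localDirs, user, appId)
                .resolve(String.format(TOKEN_FILE_NAME_FMT, locId));
    }

    public static void main(String[] args) {
        // Sample values; a real list would come from yarn.nodemanager.local-dirs.
        List<String> localDirs = Arrays.asList(
                "/data1/yarn/local", "/data2/yarn/local", "/data3/yarn/local");
        for (String locId : Arrays.asList("container_01", "container_02", "container_03")) {
            // Every destination is under the first entry of localDirs.
            System.out.println(tokenDestination(localDirs, "alice", "application_0001", locId));
        }
    }
}
```

However many directories are configured, every path printed by this sketch sits under the first one.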
This always uses localDirs.get(0), so next look at where localDirs comes from.
localDirs is obtained in the run method of LocalizerRunner, an inner class of ResourceLocalizationService:
    private LocalDirsHandlerService dirsHandler;
    ....
    List<String> localDirs = dirsHandler.getLocalDirs();
    List<String> logDirs = dirsHandler.getLogDirs();
This calls into the LocalDirsHandlerService class:
    /** Local dirs to store localized files in */
    private DirectoryCollection localDirs = null;
    /** storage for container logs*/
    private DirectoryCollection logDirs = null;

    localDirs = new DirectoryCollection(
        validatePaths(conf.getTrimmedStrings(YarnConfiguration.NM_LOCAL_DIRS)));
    logDirs = new DirectoryCollection(
        validatePaths(conf.getTrimmedStrings(YarnConfiguration.NM_LOG_DIRS)));
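As a rough illustration of why the order never changes, the sketch below parses a comma-separated value the way a trimmed-strings lookup typically would (split on commas, trim whitespace). It is a simplified stand-in, not Hadoop's Configuration code; the class name, method name, and sample value are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of Hadoop.
class LocalDirsParseSketch {
    // Simplified stand-in for reading a comma-separated config value:
    // split on commas, trim each entry, skip empty entries.
    static List<String> parseLocalDirs(String configValue) {
        List<String> dirs = new ArrayList<>();
        for (String part : configValue.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(trimmed);
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Sample value for yarn.nodemanager.local-dirs (assumed for the example).
        String value = "/data1/yarn/local, /data2/yarn/local, /data3/yarn/local";
        // Parsing the same value repeatedly yields the same order,
        // so the element at index 0 is the same on every call.
        for (int i = 0; i < 3; i++) {
            System.out.println(parseLocalDirs(value).get(0));
        }
    }
}
```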
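Here localDirs is obtained by parsing the value of the yarn.nodemanager.local-dirs configuration item. Because the configured value is fixed, the resulting localDirs is always the same List in the same order, so localDirs.get(0) always returns the same directory and the token is always written there. This is in fact a bug:
https://issues.apache.org/jira/browse/YARN-2566
As a result, the token files of all containers are written to the same directory. The fix picks the directory using random numbers; see the patch for the details.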
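The difference between the two strategies can be sketched as follows. This is only an illustration of the idea of random selection, not the actual YARN-2566 patch; the class name, method names, and sample directories are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration class, not part of Hadoop and not the real patch.
class RandomDirSketch {
    // The behaviour described above: always the first configured directory.
    static String pickFirst(List<String> dirs) {
        return dirs.get(0);
    }

    // An assumed alternative: choose uniformly at random among the directories.
    static String pickRandom(List<String> dirs, Random rng) {
        return dirs.get(rng.nextInt(dirs.size()));
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList(
                "/data1/yarn/local", "/data2/yarn/local", "/data3/yarn/local");
        Random rng = new Random();
        Map<String, Integer> firstCounts = new LinkedHashMap<>();
        Map<String, Integer> randomCounts = new LinkedHashMap<>();
        // Simulate 9000 localizer starts and count where each token would land.
        for (int i = 0; i < 9000; i++) {
            firstCounts.merge(pickFirst(dirs), 1, Integer::sum);
            randomCounts.merge(pickRandom(dirs, rng), 1, Integer::sum);
        }
        System.out.println("first:  " + firstCounts);
        System.out.println("random: " + randomCounts);
    }
}
```

With pickFirst all writes go to a single directory, while pickRandom spreads them across the configured disks. A real implementation would also need to account for things such as disk health and free space, which this sketch ignores.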
This article is reposted from 菜菜光's 51CTO blog. Original link: http://blog.51cto.com/caiguangguang/1585277. For reprints, please contact the original author.