1. MapReduce map-side join fails with 'winutils.exe symlink xxx/position.txt \tmp\xxx\position.txt' failed 1 with: CreateSymbolicLink error (1314)
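When developing MapReduce jobs on Hadoop, map-side joins are a common scenario, and they generally require adding a cache file in the Driver.
Running the job on Windows, however, may fail with: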
INFO [org.apache.hadoop.mapreduce.JobSubmitter] - Submitting tokens for job: job_local1986965861_0001
INFO [org.apache.hadoop.mapred.LocalDistributedCacheManager] - Creating symlink: \tmp\xxx\position.txt <- xxx/position.txt
WARN [org.apache.hadoop.fs.FileUtil] - Command 'xxx\winutils.exe symlink xxx\position.txt \tmp\xxx\position.txt' failed 1 with: CreateSymbolicLink error (1314): ???????????
WARN [org.apache.hadoop.mapred.LocalDistributedCacheManager] - Failed to create symlink: \tmp\xxx\position.txt <- xxx/position.txt
INFO [org.apache.hadoop.mapred.LocalDistributedCacheManager] - Localized file:/xxx/position.txt as file:/xxx/position.txt
INFO [org.apache.hadoop.mapred.LocalJobRunner] - map task executor complete.
WARN [org.apache.hadoop.mapred.LocalJobRunner] - job_local1986965861_0001
java.lang.Exception: java.io.FileNotFoundException: position.txt (系统找不到指定的文件。)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:491)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:551)
Caused by: java.io.FileNotFoundException: position.txt (系统找不到指定的文件。)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    ...
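The exception arises because winutils.exe fails to create the symbolic link. The link that should point to the cached file is therefore missing from the local Windows temporary directory, the program cannot find position.txt (the FileNotFoundException message means "the system cannot find the file specified"), and the job cannot run.
The cause:
MapReduce is accessing a restricted path/location, and the Windows account does not hold the privilege to create symbolic links. In other words, this is a permissions problem; Windows error 1314 means "A required privilege is not held by the client".
The fix: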
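(1) Press Win+R to open the Run window, enter gpedit.msc to open the Local Group Policy Editor, and follow the steps in the figure below. The policy concerned is "Create symbolic links" under Computer Configuration > Windows Settings > Security Settings > Local Policies > User Rights Assignment, to which the current account is added.
Note that the computer must be restarted after the change.
(2) If entering gpedit.msc in the Run window produces the message "Windows cannot find 'gpedit.msc'. Make sure you typed the name correctly, and then try again.", the Group Policy Editor is probably missing and has to be installed. See https://blog.csdn.net/qq_41731507/article/details/115875247 for one way to do this, then open the Group Policy Editor again, make the change, and restart.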
2. Stopping YARN with the script reports no resourcemanager to stop and no nodemanager to stop
Stopping the YARN cluster with the stop-yarn.sh script sometimes produces errors such as:
[root@node03 ~]$ stop-yarn.sh
stopping yarn daemons
no resourcemanager to stop
node01: no nodemanager to stop
node02: no nodemanager to stop
node03: no nodemanager to stop
no proxyserver to stop
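Yet running jps on each node shows that ResourceManager and NodeManager are still running, so the stop did not actually succeed.
Cause:
The yarn-daemon.sh script records the pid of the ResourceManager and NodeManager services in pid files. These files are stored in /tmp by default, but the system cleans that directory periodically, so the pid files can disappear. When the stop script cannot find a pid file, it reports the error above.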
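To make the cause concrete, here is a minimal sketch of the kind of pid-file check a stop script performs. It is a simplified paraphrase, not the actual contents of yarn-daemon.sh; the function name stop_daemon, its arguments, and the pid file naming pattern are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Simplified sketch of a pid-file based "stop".
# stop_daemon, its arguments and the file naming are illustrative assumptions.
stop_daemon() {
  local pid_dir="$1" ident="$2" command="$3"
  local pid_file="$pid_dir/yarn-$ident-$command.pid"

  if [ -f "$pid_file" ]; then
    local target_pid
    target_pid="$(cat "$pid_file")"
    # kill -0 only tests whether the process exists; it sends no signal.
    if kill -0 "$target_pid" 2>/dev/null; then
      echo "stopping $command"
      kill "$target_pid"
    else
      echo "no $command to stop"
    fi
  else
    # The pid file is gone (for example, cleaned out of /tmp), so the script
    # cannot tell which process to signal, even if the daemon is still running.
    echo "no $command to stop"
  fi
}
```

With an empty pid directory, stop_daemon /tmp/empty-dir root resourcemanager prints the same "no resourcemanager to stop" message regardless of whether the daemon is alive, which matches the symptom described above.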
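Solution:
To fix this once and for all, start on one node and change the pid file path in yarn-daemon.sh. The script is in the sbin directory under the Hadoop installation directory; edit it at around line 88, as follows: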
if [ "$YARN_PID_DIR" = "" ]; then
  # YARN_PID_DIR=/tmp
  YARN_PID_DIR=/opt/software/hadoop-2.9.2/data/pids
fi
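As shown, YARN_PID_DIR originally defaulted to /tmp; here it is changed to /opt/software/hadoop-2.9.2/data/pids, and you can pick any other directory that suits your setup.
Also create the directory by hand: mkdir /opt/software/hadoop-2.9.2/data/pids.
After editing the script and creating the directory, distribute yarn-daemon.sh and the pids directory to the other nodes with your distribution script, or repeat the same steps manually on each node.
Then stop the YARN ResourceManager and NodeManager services on every node with kill -9 pid. Running start-yarn.sh afterwards creates the corresponding pid files in the configured directory (/opt/software/hadoop-2.9.2/data/pids).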
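Looking up the pids by hand on every node gets tedious, so the lookup can be scripted. The following is a small sketch that assumes jps prints one "<pid> <MainClass>" pair per line; the helper name yarn_pids is made up for this example.

```shell
#!/usr/bin/env bash
# yarn_pids reads jps-style lines ("<pid> <MainClass>") from stdin and prints
# the pids of ResourceManager and NodeManager processes, one per line.
# The function name is an illustrative assumption.
yarn_pids() {
  awk '$2 == "ResourceManager" || $2 == "NodeManager" { print $1 }'
}

# Intended use on each node (left commented out because it kills processes;
# xargs -r is a GNU extension that skips the command when there is no input):
# jps | yarn_pids | xargs -r kill -9
```

Matching on the exact class name in the second column avoids accidentally selecting unrelated processes such as DataNode or Jps itself.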
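Going further:
Stopping Hadoop and the HistoryServer can hit the same kind of problem, for example no namenode to stop and no historyserver to stop. The cause is the same as for YARN, and the corresponding pid file paths need to be changed as well:
For Hadoop, edit hadoop-daemon.sh at around line 114, as follows: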
if [ "$HADOOP_PID_DIR" = "" ]; then
  # HADOOP_PID_DIR=/tmp
  HADOOP_PID_DIR=/opt/software/hadoop-2.9.2/data/pids
fi
For the HistoryServer, edit mr-jobhistory-daemon.sh at around line 87, as follows:
if [ "$HADOOP_MAPRED_PID_DIR" = "" ]; then
  # HADOOP_MAPRED_PID_DIR=/tmp
  HADOOP_MAPRED_PID_DIR=/opt/software/hadoop-2.9.2/data/pids
fi
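Then distribute the scripts to the other nodes and manually stop the services that are currently running. Starting them again through the scripts creates the corresponding pid files in the configured directory.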
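All three scripts use the same fallback: the built-in default applies only when the variable is empty. The sketch below mimics that check with a hypothetical helper, resolve_pid_dir, and the commented export lines show an alternative that may be worth trying. It assumes that your Hadoop version sources hadoop-env.sh, yarn-env.sh, and mapred-env.sh before this check runs, which should be verified against the scripts themselves.

```shell
#!/usr/bin/env bash
# resolve_pid_dir mimics the fallback in the daemon scripts: the default
# directory is used only when no value has been configured.
# The function name is an illustrative assumption.
resolve_pid_dir() {
  local configured="${1:-}" default_dir="/tmp"
  if [ "$configured" = "" ]; then
    echo "$default_dir"
  else
    echo "$configured"
  fi
}

# Possible alternative to editing the scripts (assumption: the *-env.sh files
# are sourced before the check), one line per env file:
# export HADOOP_PID_DIR=/opt/software/hadoop-2.9.2/data/pids         # hadoop-env.sh
# export YARN_PID_DIR=/opt/software/hadoop-2.9.2/data/pids           # yarn-env.sh
# export HADOOP_MAPRED_PID_DIR=/opt/software/hadoop-2.9.2/data/pids  # mapred-env.sh
```

If that assumption holds, setting the variables in the env files keeps the shipped scripts unmodified, which may make upgrades easier; editing the scripts as described above works either way.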
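3. Building Hadoop from source fails with [ERROR] xxx.java:864: 警告: 没有 @return (warning: no @return)
Custom development on the Hadoop source sometimes calls for rebuilding it: once the changes are done, the code must be compiled and packaged with Maven before it can be used. Packaging sometimes fails with errors like these: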
...
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target/generated-sources/avro/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinished.java:864: 警告: 没有 @return
[ERROR] public org.apache.hadoop.mapreduce.jobhistory.MapAttemptFinished.Builder clearPhysMemKbytes() {
[ERROR] ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java:190: 警告: 没有 @return
[ERROR] public TaskID getTaskId() {
[ERROR] ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java:194: 警告: 没有 @return
[ERROR] public TaskAttemptID getAttemptId() {
[ERROR] ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java:199: 警告: 没有 @return
[ERROR] public TaskType getTaskType() {
[ERROR] ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java:203: 警告: 没有 @return
[ERROR] public String getTaskStatus() { return taskStatus.toString(); }
[ERROR] ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java:205: 警告: 没有 @return
[ERROR] public long getMapFinishTime() { return mapFinishTime; }
...
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/TaskReport.java:63: 警告: @param 没有说明
[ERROR] * @param startTime
[ERROR] ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/TaskReport.java:64: 警告: @param 没有说明
[ERROR] * @param finishTime
[ERROR] ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/TaskReport.java:65: 警告: @param 没有说明
[ERROR] * @param counters
[ERROR] ^
[ERROR]
[ERROR] Command line was: /opt/software/java/jdk1.8.0_231/jre/../bin/javadoc @options @packages
[ERROR]
[ERROR] Refer to the generated Javadoc files in '/opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target' dir.
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <args> -rf :hadoop-mapreduce-client-core
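That is, the errors look like xxx.java:864: 警告: 没有 @return (warning: no @return) and 警告: @param 没有说明 (warning: no description for @param), as in [ERROR] * @param counters.
The log shows that the failure happens while generating the Javadoc: the javadoc command under JDK 8 rejects doc comments that lack an @return tag or have an @param tag without a description. Since the documentation is not needed for the build, the simplest fix is to skip Javadoc generation by appending -D maven.javadoc.skip=true to the build command. The full command is mvn package -P dist,native -D skipTests -D tar -D maven.javadoc.skip=true, and the build then completes without this error.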
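Because a full Hadoop build is slow, it may be worth resuming from the failed module, as the Maven output itself suggests with mvn <args> -rf :hadoop-mapreduce-client-core. The following sketch only assembles and prints the command line; the helper name hadoop_build_cmd is made up for this example, and the flags are the ones given above.

```shell
#!/usr/bin/env bash
# hadoop_build_cmd prints the Maven command that skips Javadoc generation.
# An optional first argument names the module to resume from (-rf).
# The function name is an illustrative assumption.
hadoop_build_cmd() {
  local resume_from="${1:-}"
  local cmd="mvn package -P dist,native -D skipTests -D tar -D maven.javadoc.skip=true"
  if [ -n "$resume_from" ]; then
    cmd="$cmd -rf :$resume_from"
  fi
  echo "$cmd"
}

# Print the command for a resumed build, then run it from the source root:
hadoop_build_cmd hadoop-mapreduce-client-core
```

Resuming skips the modules that already built successfully; a full build from the first module is still the safer choice after changing code in an earlier module.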
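4. Packaging a custom Hive UDF fails with Failure to find org.pentaho:pentaho-aggdesigner-algorithm:pom:5.1.5-jhyde
Working with Hive sometimes requires a user-defined function (UDF). After the class is written it has to be packaged, and packaging sometimes fails with the following error: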
Failure to find org.pentaho:pentaho-aggdesigner-algorithm:pom:5.1.5-jhyde in http://maven.aliyun.com/nexus/content/repositories/central/ was cached in the local repository, resolution will not be reattempted until the update interval of alimaven has elapsed or updates are forced
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] 读取XXX\.m2\repository\org\pentaho\pentaho-aggdesigner-algorithm\5.1.5-jhyde\pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar时出错; zip END header not found
[ERROR] XXX/HiveUDFDemo/src/main/java/com/bigdata/hive/nvl.java:[1,1] 无法访问com.bigdata.hive
zip END header not found
[INFO] 2 errors
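The pentaho-aggdesigner-algorithm artifact is clearly the culprit: the configured mirror does not provide it, and the copy in the local repository cannot be read. It comes in through the hive-exec dependency, so edit the hive-exec dependency in the Maven project's pom.xml and add the following exclusion: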
<exclusions>
    <exclusion>
        <groupId>org.pentaho</groupId>
        <artifactId>pentaho-aggdesigner-algorithm</artifactId>
    </exclusion>
</exclusions>
The complete pom.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>HiveUDFDemo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>2.3.7</version>
            <exclusions>
                <exclusion>
                    <groupId>org.pentaho</groupId>
                    <artifactId>pentaho-aggdesigner-algorithm</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>
</project>
Then restart IDEA, and packaging succeeds.
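The "zip END header not found" message in the log suggests that an incomplete or corrupt jar is sitting in the local repository. If the error persists after adding the exclusion, removing the cached artifact directory may help. The sketch below is one way to do that under the assumption that the local repository uses the standard groupId/artifactId directory layout; the helper name purge_cached_artifact is hypothetical.

```shell
#!/usr/bin/env bash
# purge_cached_artifact removes one artifact's directory from a local Maven
# repository. Arguments: repository root, groupId, artifactId.
# The function name is an illustrative assumption.
purge_cached_artifact() {
  local repo_root="${1:-}" group_id="${2:-}" artifact_id="${3:-}"
  # Refuse to run with missing arguments so rm -rf never sees a bare path.
  if [ -z "$repo_root" ] || [ -z "$group_id" ] || [ -z "$artifact_id" ]; then
    echo "usage: purge_cached_artifact <repo_root> <groupId> <artifactId>" >&2
    return 1
  fi
  # Maven stores org.pentaho under org/pentaho: replace every dot with a slash.
  local target="$repo_root/${group_id//.//}/$artifact_id"
  if [ -d "$target" ]; then
    rm -rf "$target"
    echo "removed $target"
  else
    echo "nothing cached at $target"
  fi
}

# Example call (commented out because it deletes files):
# purge_cached_artifact "$HOME/.m2/repository" org.pentaho pentaho-aggdesigner-algorithm
```

Only the single artifact directory is removed, so the rest of the local repository stays intact and nothing else has to be downloaded again.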