Hadoop: copying results from HDFS to S3


小六码奴 2019-04-22 16:11:53

I want to copy results from HDFS to S3, but I am running into a problem.

Here is the step definition (passed via --steps):

{
    "Name": "AAAAA",
    "Type": "CUSTOM_JAR",
    "Jar": "command-runner.jar",
    "ActionOnFailure": "CONTINUE",
    "Args": [
        "s3-dist-cp",
        "--src", "hdfs:///seqaddid_output",
        "--dest", "s3://wuda-notebook/seqaddid"
    ]
}

Here is the log:

2019-04-12 03:01:23,571 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Running with args: -libjars /usr/share/aws/emr/s3-dist-cp/lib/commons-httpclient-3.1.jar,/usr/share/aws/emr/s3-dist-cp/lib/commons-logging-1.0.4.jar,/usr/share/aws/emr/s3-dist-cp/lib/guava-18.0.jar,/usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp-2.10.0.jar,/usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar --src hdfs:///seqaddid_output/ --dest s3://wuda-notebook/seqaddid
2019-04-12 03:01:24,196 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): S3DistCp args: --src hdfs:///seqaddid_output/ --dest s3://wuda-notebook/seqaddid
2019-04-12 03:01:24,203 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Using output path 'hdfs:/tmp/4f93d497-fade-4c78-86b9-59fc3da35b4e/output'
2019-04-12 03:01:24,263 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): GET http://169.254.169.254/latest/meta-data/placement/availability-zone result: us-east-1f
2019-04-12 03:01:24,664 FATAL com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Failed to get source file system
java.io.FileNotFoundException: File does not exist: hdfs:/seqaddid_output
    at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1444)
    at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1437)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1452)
    at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:795)
    at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:705)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)

All answers (1)
  • 小六码奴
    2019-07-17 23:33:59

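    The error appears to be caused by a race condition that occurs when CopyFilesReducer uses multiple CopyFilesRunable instances to download files from S3. The problem is that all of the threads share the same temporary directory, and each thread deletes that directory when it finishes. So when one thread finishes before another, it deletes the temporary directory that the other thread is still using.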
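
To make the described failure mode concrete, here is a minimal sketch in Python. It is not S3DistCp code: the function name `run_workers`, the worker names, and the directory layout are all hypothetical, and an event is used only to force the unlucky interleaving so that the outcome is repeatable. It contrasts a shared temporary directory with one directory per worker.

```python
import os
import shutil
import tempfile
import threading


def run_workers(shared: bool) -> list:
    """Run a fast and a slow worker; return the names of workers that failed."""
    errors = []
    base = tempfile.mkdtemp()
    fast_done = threading.Event()

    def tmp_dir(name: str) -> str:
        # shared=True: one directory for everyone (the pattern described above).
        # shared=False: one directory per worker.
        return os.path.join(base, "tmp" if shared else "tmp-" + name)

    # Both workers have "started": their temp directories exist up front.
    for name in ("fast", "slow"):
        os.makedirs(tmp_dir(name), exist_ok=True)

    def worker(name: str) -> None:
        tmp = tmp_dir(name)
        if name == "slow":
            fast_done.wait()  # force the slow worker to write after the fast one exits
        try:
            with open(os.path.join(tmp, name + ".part"), "w") as fh:
                fh.write("data")
        except FileNotFoundError:
            errors.append(name)  # the directory vanished underneath this worker
        finally:
            shutil.rmtree(tmp, ignore_errors=True)  # each worker cleans up on exit
            if name == "fast":
                fast_done.set()

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("fast", "slow")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    shutil.rmtree(base, ignore_errors=True)
    return errors
```

With `shared=True` the slow worker fails because the fast worker has already removed the common directory; with `shared=False` both workers succeed.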

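    I have reported this issue to AWS. In the meantime, you can work around it by setting the variable s3DistCp.copyfiles.mapper.numWorkers to 1 in your job configuration, which forces the reducer to use a single thread.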
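
One possible way to apply that setting is sketched below in Python, which simply regenerates the step definition from the question with one extra argument. It rests on an assumption that is not confirmed in this thread: that s3-dist-cp accepts Hadoop generic options such as `-Dkey=value` ahead of its own flags (the ToolRunner frames and the `-libjars` argument in the log suggest generic options are parsed). The helper name `build_s3distcp_step` is hypothetical.

```python
import json


def build_s3distcp_step(name: str, src: str, dest: str, num_workers: int = 1) -> dict:
    """Return an EMR step definition that runs s3-dist-cp with a limited worker count."""
    return {
        "Name": name,
        "Type": "CUSTOM_JAR",
        "Jar": "command-runner.jar",
        "ActionOnFailure": "CONTINUE",
        "Args": [
            "s3-dist-cp",
            # Assumption: generic Hadoop -D options are honored when placed
            # before the tool's own --src/--dest flags.
            "-Ds3DistCp.copyfiles.mapper.numWorkers={}".format(num_workers),
            "--src", src,
            "--dest", dest,
        ],
    }


step = build_s3distcp_step(
    "AAAAA", "hdfs:///seqaddid_output", "s3://wuda-notebook/seqaddid"
)
# Print a JSON list, the shape that a --steps file expects.
print(json.dumps([step], indent=2))
```

If the option is not picked up this way on your EMR release, the same property would need to be set wherever your job configuration is defined.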
