
E-MapReduce fails to read from OSS

help@ftp4oss 2016-02-07 00:08:48 3563

I created a cluster and a Spark job with E-MapReduce, using OSS for both input and output.
With small input (a few KB) the program runs fine. When the input file is 100 MB, the job fails. The error log on the node contains the following:
16/02/06 15:56:08 INFO oss.OssRDD: Input split: oss://syq-emr/testset.txt:52568064+52568064
16/02/06 15:56:08 INFO nat.NativeOssFileSystem: Opening 'oss://syq-emr/testset.txt' for reading
16/02/06 15:56:12 INFO nat.NativeOssFileSystem: Opening key 'testset.txt' for reading at position '52568064'
16/02/06 15:56:12 INFO aliyun.oss: [Server]Unable to execute HTTP request: Not Found
16/02/06 15:56:12 INFO aliyun.oss: [Server]Unable to execute HTTP request: Not Found
16/02/06 15:56:12 INFO nat.NativeOssFileSystem: OutputStream for key 'output-7/part-1' writing to tempfile '/mnt/disk1/data/oss/output-8972595934495471655.tmp'
16/02/06 15:56:12 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 1
16/02/06 15:56:12 INFO storage.MemoryStore: ensureFreeSpace(34245) called with curMem=262391, maxMem=556038881
16/02/06 15:56:12 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 33.4 KB, free 530.0 MB)
16/02/06 15:56:12 INFO broadcast.TorrentBroadcast: Reading broadcast variable 1 took 14 ms
16/02/06 15:56:13 INFO storage.MemoryStore: ensureFreeSpace(1178808) called with curMem=296636, maxMem=556038881
16/02/06 15:56:13 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 1151.2 KB, free 528.9 MB)
16/02/06 15:56:13 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 2
16/02/06 15:56:13 INFO storage.MemoryStore: ensureFreeSpace(97) called with curMem=1475444, maxMem=556038881
16/02/06 15:56:13 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 97.0 B, free 528.9 MB)
16/02/06 15:56:13 INFO broadcast.TorrentBroadcast: Reading broadcast variable 2 took 26 ms
16/02/06 15:56:13 INFO storage.MemoryStore: ensureFreeSpace(40) called with curMem=1475541, maxMem=556038881
16/02/06 15:56:13 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 40.0 B, free 528.9 MB)
16/02/06 16:09:03 ERROR executor.Executor: Exception in task 1.0 in stage 1.0 (TID 3)
org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 52568064; received: 5042666)

at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:180) 
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137) 
at com.aliyun.fs.oss.nat.NativeOssFileSystem$NativeOssFsInputStream.read(NativeOssFileSystem.java:71) 
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273) 
at java.io.BufferedInputStream.read(BufferedInputStream.java:334) 
at java.io.DataInputStream.read(DataInputStream.java:100) 
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180) 
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216) 
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174) 
at com.aliyun.fs.oss.common.OssRecordReader.next(OssRecordReader.java:120) 
at com.aliyun.fs.oss.common.OssRecordReader.next(OssRecordReader.java:37) 
at org.apache.spark.aliyun.oss.OssRDD$$anon$1.getNext(OssRDD.scala:100) 
at org.apache.spark.aliyun.oss.OssRDD$$anon$1.getNext(OssRDD.scala:88) 

at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71) 
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) 
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) 
at org.apache.spark.aliyun.oss.OssOps.org$apache$spark$aliyun$oss$OssOps$$writeToFile$1(OssOps.scala:100) 
at org.apache.spark.aliyun.oss.OssOps$$anonfun$saveToOssFile$1.apply(OssOps.scala:112) 
at org.apache.spark.aliyun.oss.OssOps$$anonfun$saveToOssFile$1.apply(OssOps.scala:112) 

at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63) 
at org.apache.spark.scheduler.Task.run(Task.scala:70) 
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:745) 

It looks like reading the input file from OSS is failing. The read was retried three times, each time producing the same error, and the task was eventually aborted.
Could anyone help analyze the cause? Many thanks!
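For context on what the exception in the log means: "Premature end of Content-Length delimited message body (expected: 52568064; received: 5042666)" says the server closed the HTTP connection after sending only ~5 MB of the ~50 MB it promised in the Content-Length header. A common mitigation for this class of failure is to resume the download with a ranged read from the last byte received, retrying a few times. The sketch below is illustrative only (it is not the E-MapReduce OSS connector, and `FlakySource` is a hypothetical stand-in that simulates a server dropping the connection mid-read):

```python
class PrematureEOF(Exception):
    """Raised when the stream ends before the promised length.
    Carries the partial bytes received so far as args[0]."""


class FlakySource:
    """Hypothetical stand-in for a ranged HTTP read that drops the
    connection once before succeeding."""

    def __init__(self, data: bytes, fail_after: int):
        self.data = data
        self.fail_after = fail_after  # bytes served before the first drop
        self.failed_once = False

    def read_range(self, start: int) -> bytes:
        # Serve data[start:]; simulate one premature connection close.
        if not self.failed_once and start < self.fail_after:
            self.failed_once = True
            raise PrematureEOF(self.data[start:self.fail_after])
        return self.data[start:]


def read_with_resume(source, total_len: int, max_retries: int = 3) -> bytes:
    """Read total_len bytes, resuming from the last good offset on failure,
    mirroring the 3-retry behavior described in the question."""
    buf = b""
    for _ in range(max_retries + 1):
        try:
            buf += source.read_range(len(buf))
            if len(buf) == total_len:
                return buf
        except PrematureEOF as e:
            buf += e.args[0]  # keep the partial bytes already received
    raise IOError("could not read %d bytes after retries" % total_len)
```

If the connector instead restarts each retry from the original split offset (as the identical error on all three attempts here might suggest), a transient server-side problem at the same byte range would fail every attempt the same way, which matches the symptom in the log.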

Distributed Computing · Apache · Object Storage · Spark
All answers (1)
  • qilu
    2019-07-17 18:28:14

    From the error, a preliminary judgment is that it may be caused either by transient OSS service instability or by improper job parameter configuration leading to an exception in the Executor at runtime, but more diagnostic information is still needed. We recommend submitting a support ticket for further troubleshooting.
