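I created a cluster and a Spark job with E-MapReduce; both the input and the output use OSS.
With very small input (a few KB) the program runs fine. With a 100 MB input file the job fails. The error log on the node contains the following: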
16/02/06 15:56:08 INFO oss.OssRDD: Input split: oss://syq-emr/testset.txt:52568064+52568064
16/02/06 15:56:08 INFO nat.NativeOssFileSystem: Opening 'oss://syq-emr/testset.txt' for reading
16/02/06 15:56:12 INFO nat.NativeOssFileSystem: Opening key 'testset.txt' for reading at position '52568064'
16/02/06 15:56:12 INFO aliyun.oss: [Server]Unable to execute HTTP request: Not Found
16/02/06 15:56:12 INFO aliyun.oss: [Server]Unable to execute HTTP request: Not Found
16/02/06 15:56:12 INFO nat.NativeOssFileSystem: OutputStream for key 'output-7/part-1' writing to tempfile '/mnt/disk1/data/oss/output-8972595934495471655.tmp'
16/02/06 15:56:12 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 1
16/02/06 15:56:12 INFO storage.MemoryStore: ensureFreeSpace(34245) called with curMem=262391, maxMem=556038881
16/02/06 15:56:12 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 33.4 KB, free 530.0 MB)
16/02/06 15:56:12 INFO broadcast.TorrentBroadcast: Reading broadcast variable 1 took 14 ms
16/02/06 15:56:13 INFO storage.MemoryStore: ensureFreeSpace(1178808) called with curMem=296636, maxMem=556038881
16/02/06 15:56:13 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 1151.2 KB, free 528.9 MB)
16/02/06 15:56:13 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 2
16/02/06 15:56:13 INFO storage.MemoryStore: ensureFreeSpace(97) called with curMem=1475444, maxMem=556038881
16/02/06 15:56:13 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 97.0 B, free 528.9 MB)
16/02/06 15:56:13 INFO broadcast.TorrentBroadcast: Reading broadcast variable 2 took 26 ms
16/02/06 15:56:13 INFO storage.MemoryStore: ensureFreeSpace(40) called with curMem=1475541, maxMem=556038881
16/02/06 15:56:13 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 40.0 B, free 528.9 MB)
16/02/06 16:09:03 ERROR executor.Executor: Exception in task 1.0 in stage 1.0 (TID 3)
org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 52568064; received: 5042666
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:180)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
at com.aliyun.fs.oss.nat.NativeOssFileSystem$NativeOssFsInputStream.read(NativeOssFileSystem.java:71)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at com.aliyun.fs.oss.common.OssRecordReader.next(OssRecordReader.java:120)
at com.aliyun.fs.oss.common.OssRecordReader.next(OssRecordReader.java:37)
at org.apache.spark.aliyun.oss.OssRDD$$anon$1.getNext(OssRDD.scala:100)
at org.apache.spark.aliyun.oss.OssRDD$$anon$1.getNext(OssRDD.scala:88)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.aliyun.oss.OssOps.org$apache$spark$aliyun$oss$OssOps$$writeToFile$1(OssOps.scala:100)
at org.apache.spark.aliyun.oss.OssOps$$anonfun$saveToOssFile$1.apply(OssOps.scala:112)
at org.apache.spark.aliyun.oss.OssOps$$anonfun$saveToOssFile$1.apply(OssOps.scala:112)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
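It looks like reading the input file from OSS failed. The read was retried 3 times, each attempt ended with the error above, and the task was finally aborted.
Could anyone help analyze the cause? Many thanks!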
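The numbers in the log above can be checked with a short calculation. The sketch below is only arithmetic on values copied from the log; it assumes the split is written in the usual Hadoop `path:start+length` notation and that the time between the "Opening key" line and the ERROR line approximates how long the stream was open.

```python
from datetime import datetime

# Values copied from the log excerpt above.
split_offset = 52568064   # "Input split: ...:52568064+52568064" (assumed: start offset)
split_length = 52568064   # same line (assumed: length of this split)
expected = 52568064       # "expected: 52568064" in the exception message
received = 5042666        # "received: 5042666" in the exception message

fmt = "%y/%m/%d %H:%M:%S"
opened = datetime.strptime("16/02/06 15:56:12", fmt)  # "Opening key ... at position"
failed = datetime.strptime("16/02/06 16:09:03", fmt)  # "ERROR executor.Executor"

elapsed_s = (failed - opened).total_seconds()
fraction = received / expected
rate = received / elapsed_s  # average bytes per second over the whole interval

print(f"split covers bytes {split_offset}..{split_offset + split_length - 1}")
print(f"elapsed: {elapsed_s:.0f} s")
print(f"received {fraction:.1%} of the split")
print(f"average rate: {rate / 1024:.1f} KiB/s")
```

This gives about 771 seconds, roughly 9.6% of the split received, and an average of around 6 KiB/s. A rate that low may mean the HTTP stream sat mostly idle while records were being processed or written, and that the connection was closed before the split was fully read; that is an inference from the log, not a confirmed cause.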
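Judging from the error, a first guess is that it was caused either by short-lived OSS service instability or by misconfigured job parameters that led to an Executor failure at run time, but more information is needed to tell which. I suggest submitting a support ticket for further investigation.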
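If the cause does turn out to be a connection dropped partway through a long read, one general mitigation is to reopen the stream at the current offset instead of restarting the whole split. The following is a minimal sketch of that idea only, not the behavior of the E-MapReduce OSS connector. `ResumingReader`, `open_at`, and `FlakyStream` are hypothetical names made up for illustration; the only assumption about the storage side is that an object can be opened at a byte position, as the "Opening key ... at position" log line suggests.

```python
import io


class ResumingReader:
    """Read a byte range, reopening at the current offset if the stream ends early."""

    def __init__(self, open_at, start, length, max_retries=3):
        self.open_at = open_at        # hypothetical: open_at(offset) -> file-like object
        self.pos = start              # next byte offset to read
        self.end = start + length     # first offset past the range
        self.max_retries = max_retries
        self.stream = None

    def read(self, n):
        n = min(n, self.end - self.pos)
        if n <= 0:
            return b""                # range fully consumed
        retries = 0
        while True:
            try:
                if self.stream is None:
                    self.stream = self.open_at(self.pos)
                chunk = self.stream.read(n)
                if not chunk:
                    # Stream ended although the range is not finished.
                    raise EOFError("stream ended before the expected length")
                self.pos += len(chunk)
                return chunk
            except (EOFError, OSError):
                self.stream = None    # force a reopen at self.pos
                retries += 1
                if retries > self.max_retries:
                    raise


# Stand-in data and a test double that simulates a connection dropping early.
DATA = bytes(range(256)) * 40


class FlakyStream:
    """Serves at most `budget` bytes from `offset`, then returns nothing."""

    def __init__(self, data, offset, budget):
        self.buf = io.BytesIO(data[offset:offset + budget])

    def read(self, n):
        return self.buf.read(n)


opens = []


def open_at(offset):
    opens.append(offset)
    return FlakyStream(DATA, offset, budget=3000)


reader = ResumingReader(open_at, start=1024, length=8192)
out = bytearray()
while True:
    chunk = reader.read(1000)
    if not chunk:
        break
    out += chunk

print(len(out), opens)
```

In this sketch the retry counter is per `read` call, so a long transfer can survive many separate drops while a stream that never returns data still fails after `max_retries` reopen attempts.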