
Unable to run a Python job on an EMR Spark cluster

小六码奴 2019-04-22 17:13:13 308

I am trying to submit a Python job to an AWS EMR Spark cluster.

My settings in the spark-submit options section are as follows:

--master yarn --driver-memory 4g --executor-memory 2g

However, the job fails while it is running.

Here is the error log:

19/04/09 10:40:25 INFO RMProxy: Connecting to ResourceManager at ip-172-31-53-241.ec2.internal/172.31.53.241:8032
19/04/09 10:40:26 INFO Client: Requesting a new application from cluster with 3 NodeManagers
19/04/09 10:40:26 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (11520 MB per container)
19/04/09 10:40:26 INFO Client: Will allocate AM container, with 4505 MB memory including 409 MB overhead
19/04/09 10:40:26 INFO Client: Setting up container launch context for our AM
19/04/09 10:40:26 INFO Client: Setting up the launch environment for our AM container
19/04/09 10:40:26 INFO Client: Preparing resources for our AM container
19/04/09 10:40:26 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
19/04/09 10:40:29 INFO Client: Uploading resource file:/mnt/tmp/spark-a8e941b7-f20f-46e5-8b2d-05c52785bd22/__spark_libs__3200812915608084660.zip -> hdfs://ip-172-31-53-241.ec2.internal:8020/user/hadoop/.sparkStaging/application_1554806206610_0001/__spark_libs__3200812915608084660.zip
19/04/09 10:40:32 INFO Client: Uploading resource s3://spark-yaowen/labelp.py -> hdfs://ip-172-31-53-241.ec2.internal:8020/user/hadoop/.sparkStaging/application_1554806206610_0001/labelp.py
19/04/09 10:40:32 INFO S3NativeFileSystem: Opening 's3://spark-yaowen/labelp.py' for reading
19/04/09 10:40:32 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/pyspark.zip -> hdfs://ip-172-31-53-241.ec2.internal:8020/user/hadoop/.sparkStaging/application_1554806206610_0001/pyspark.zip
19/04/09 10:40:33 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://ip-172-31-53-241.ec2.internal:8020/user/hadoop/.sparkStaging/application_1554806206610_0001/py4j-0.10.7-src.zip
19/04/09 10:40:34 INFO Client: Uploading resource file:/mnt/tmp/spark-a8e941b7-f20f-46e5-8b2d-05c52785bd22/__spark_conf__6746542371431989978.zip -> hdfs://ip-172-31-53-241.ec2.internal:8020/user/hadoop/.sparkStaging/application_1554806206610_0001/__spark_conf__.zip
19/04/09 10:40:34 INFO SecurityManager: Changing view acls to: hadoop
19/04/09 10:40:34 INFO SecurityManager: Changing modify acls to: hadoop
19/04/09 10:40:34 INFO SecurityManager: Changing view acls groups to:
19/04/09 10:40:34 INFO SecurityManager: Changing modify acls groups to:
19/04/09 10:40:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
19/04/09 10:40:36 INFO Client: Submitting application application_1554806206610_0001 to ResourceManager
19/04/09 10:40:37 INFO YarnClientImpl: Submitted application application_1554806206610_0001
19/04/09 10:40:38 INFO Client: Application report for application_1554806206610_0001 (state: ACCEPTED)
19/04/09 10:40:38 INFO Client:

 client token: N/A
 diagnostics: AM container is launched, waiting for AM container to Register with RM
 ApplicationMaster host: N/A
 ApplicationMaster RPC port: -1
 queue: default
 start time: 1554806436561
 final status: UNDEFINED
 tracking URL: http://ip-172-31-53-241.ec2.internal:20888/proxy/application_1554806206610_0001/
 user: hadoop

19/04/09 10:40:39 INFO Client: Application report for application_1554806206610_0001 (state: ACCEPTED)
19/04/09 10:40:40 INFO Client: Application report for application_1554806206610_0001 (state: ACCEPTED)
19/04/09 10:40:41 INFO Client: Application report for application_1554806206610_0001 (state: ACCEPTED)
19/04/09 10:40:42 INFO Client: Application report for application_1554806206610_0001 (state: ACCEPTED)
19/04/09 10:40:43 INFO Client: Application report for application_15548062066

All answers (1)
  • 小六码奴
    2019-07-17 23:34:00

    The first line of the input CSV file is blank. That blank line causes an index-out-of-range error in the program.
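
    The contents of labelp.py are not shown in the question, so the following is only a minimal sketch of how a blank line can be skipped before any field is indexed. The function name `parse_rows`, the `min_fields` parameter, and the sample data are all hypothetical and chosen for illustration.

```python
import csv
import io


def parse_rows(lines, min_fields=2):
    """Yield parsed CSV rows, skipping blank lines and rows that are too short.

    `min_fields` is a hypothetical threshold: the smallest number of columns
    the downstream code needs in order to index a row safely.
    """
    for line in lines:
        if not line.strip():
            # A blank line (such as an empty first line) has no fields,
            # so indexing into it would raise IndexError.
            continue
        fields = next(csv.reader([line]))
        if len(fields) < min_fields:
            continue
        yield fields


# Hypothetical sample input whose first line is blank, as in the answer above.
sample = "\nid,label\n1,cat\n\n2,dog\n"
rows = list(parse_rows(io.StringIO(sample)))
```

    In a PySpark job the same check could be applied before parsing, for example with `rdd.filter(lambda line: line.strip())` on an RDD of text lines, assuming the script reads the CSV that way.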
