Spark job submission through YARN fails (in both YARN cluster and YARN client modes)

Introduction:

 Whether the job is submitted in YARN cluster mode or YARN client mode, the following problem occurs.

 

[spark@master spark-1.6.1-bin-hadoop2.6]$ jps
2049 NameNode
2706 Jps
2372 ResourceManager
2660 Master
2203 SecondaryNameNode
[spark@master spark-1.6.1-bin-hadoop2.6]$ $SPARK_HOME/bin/spark-submit \
>  --master yarn\
>  --deploy-mode client \
>  --name javawordcount \
>  --num-executors 1 \
>  --driver-memory 512m \
>  --executor-memory 512m \
>  --executor-cores 1 \
>  --class zhouls.bigdata.MyJavaWordCount \
>  /home/spark/testspark/mySpark-1.0-SNAPSHOT.jar \
>  hdfs://master:9000/testspark/inputData/wordcount/wc.txt \
>  hdfs://master:9000/testspark/outData/MyJavaWordCount
17/03/30 20:36:57 INFO spark.SparkContext: Running Spark version 1.6.1
17/03/30 20:36:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/03/30 20:36:59 INFO spark.SecurityManager: Changing view acls to: spark
17/03/30 20:36:59 INFO spark.SecurityManager: Changing modify acls to: spark
17/03/30 20:36:59 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark)
17/03/30 20:37:01 INFO util.Utils: Successfully started service 'sparkDriver' on port 54074.
17/03/30 20:37:03 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/03/30 20:37:03 INFO Remoting: Starting remoting
17/03/30 20:37:04 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.80.10:52224]
17/03/30 20:37:04 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 52224.
17/03/30 20:37:04 INFO spark.SparkEnv: Registering MapOutputTracker
17/03/30 20:37:04 INFO spark.SparkEnv: Registering BlockManagerMaster
17/03/30 20:37:04 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-b6575213-cc8e-4a50-bc83-6ab089a65341
17/03/30 20:37:04 INFO storage.MemoryStore: MemoryStore started with capacity 146.2 MB
17/03/30 20:37:05 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/03/30 20:37:06 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/03/30 20:37:06 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
17/03/30 20:37:06 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/03/30 20:37:06 INFO ui.SparkUI: Started SparkUI at http://192.168.80.10:4040
17/03/30 20:37:06 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-fdfdb880-f6cf-47eb-8981-1176e657d466/httpd-f5d25b97-30bd-4f13-b925-d96026063a63
17/03/30 20:37:06 INFO spark.HttpServer: Starting HTTP Server
17/03/30 20:37:06 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/03/30 20:37:06 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:54651
17/03/30 20:37:06 INFO util.Utils: Successfully started service 'HTTP file server' on port 54651.
17/03/30 20:37:07 INFO spark.SparkContext: Added JAR file:/home/spark/testspark/mySpark-1.0-SNAPSHOT.jar at http://192.168.80.10:54651/jars/mySpark-1.0-SNAPSHOT.jar with timestamp 1490877427613
17/03/30 20:37:08 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.80.10:8032
17/03/30 20:37:09 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
17/03/30 20:37:09 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
17/03/30 20:37:09 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
17/03/30 20:37:09 INFO yarn.Client: Setting up container launch context for our AM
17/03/30 20:37:09 INFO yarn.Client: Setting up the launch environment for our AM container
17/03/30 20:37:09 INFO yarn.Client: Preparing resources for our AM container
17/03/30 20:37:14 INFO yarn.Client: Uploading resource file:/usr/local/spark/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar -> hdfs://master:9000/user/spark/.sparkStaging/application_1490877371054_0001/spark-assembly-1.6.1-hadoop2.6.0.jar
17/03/30 20:37:36 INFO yarn.Client: Uploading resource file:/tmp/spark-fdfdb880-f6cf-47eb-8981-1176e657d466/__spark_conf__3748671039525906996.zip -> hdfs://master:9000/user/spark/.sparkStaging/application_1490877371054_0001/__spark_conf__3748671039525906996.zip
17/03/30 20:37:38 INFO spark.SecurityManager: Changing view acls to: spark
17/03/30 20:37:38 INFO spark.SecurityManager: Changing modify acls to: spark
17/03/30 20:37:38 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark)
17/03/30 20:37:38 INFO yarn.Client: Submitting application 1 to ResourceManager
17/03/30 20:37:39 INFO impl.YarnClientImpl: Submitted application application_1490877371054_0001
17/03/30 20:37:40 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:40 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1490877458881
     final status: UNDEFINED
     tracking URL: http://master:8088/proxy/application_1490877371054_0001/
     user: spark
17/03/30 20:37:41 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:42 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:43 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:44 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:45 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:46 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:47 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:48 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:49 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:50 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:51 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:52 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:53 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:54 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:55 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:56 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:57 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:58 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:37:59 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:00 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:01 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:02 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:03 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:04 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:05 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:06 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:07 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:08 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:09 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:10 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:12 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:13 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:14 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:15 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:16 INFO yarn.Client: Application report for application_1490877371054_0001 (state: ACCEPTED)
17/03/30 20:38:17 INFO yarn.Client: Application report for application_1490877371054_0001 (state: FAILED)
17/03/30 20:38:17 INFO yarn.Client:
     client token: N/A
     diagnostics: Application application_1490877371054_0001 failed 2 times due to AM Container for appattempt_1490877371054_0001_000002 exited with  exitCode: -103
For more detailed output, check application tracking page:http://master:8088/proxy/application_1490877371054_0001/Then, click on links to logs of each attempt.
Diagnostics: Container [pid=2417,containerID=container_1490877371054_0001_02_000001] is running beyond virtual memory limits. Current usage: 79.2 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1490877371054_0001_02_000001 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 2421 2417 2417 2417 (java) 283 147 2256482304 19967 /usr/local/jdk/jdk1.8.0_60/bin/java -server -Xmx512m -Djava.io.tmpdir=/usr/local/hadoop/hadoop-2.6.0/tmp/nm-local-dir/usercache/spark/appcache/application_1490877371054_0001/container_1490877371054_0001_02_000001/tmp -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/hadoop-2.6.0/logs/userlogs/application_1490877371054_0001/container_1490877371054_0001_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg 192.168.80.10:54074 --executor-memory 512m --executor-cores 1 --properties-file /usr/local/hadoop/hadoop-2.6.0/tmp/nm-local-dir/usercache/spark/appcache/application_1490877371054_0001/container_1490877371054_0001_02_000001/__spark_conf__/__spark_conf__.properties
    |- 2417 2415 2417 2417 (bash) 0 1 108650496 305 /bin/bash -c /usr/local/jdk/jdk1.8.0_60/bin/java -server -Xmx512m -Djava.io.tmpdir=/usr/local/hadoop/hadoop-2.6.0/tmp/nm-local-dir/usercache/spark/appcache/application_1490877371054_0001/container_1490877371054_0001_02_000001/tmp -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/hadoop-2.6.0/logs/userlogs/application_1490877371054_0001/container_1490877371054_0001_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg '192.168.80.10:54074' --executor-memory 512m --executor-cores 1 --properties-file /usr/local/hadoop/hadoop-2.6.0/tmp/nm-local-dir/usercache/spark/appcache/application_1490877371054_0001/container_1490877371054_0001_02_000001/__spark_conf__/__spark_conf__.properties 1> /usr/local/hadoop/hadoop-2.6.0/logs/userlogs/application_1490877371054_0001/container_1490877371054_0001_02_000001/stdout 2> /usr/local/hadoop/hadoop-2.6.0/logs/userlogs/application_1490877371054_0001/container_1490877371054_0001_02_000001/stderr

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1490877458881
     final status: FAILED
     tracking URL: http://master:8088/cluster/app/application_1490877371054_0001
     user: spark
17/03/30 20:38:17 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1490877371054_0001
17/03/30 20:38:17 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at zhouls.bigdata.MyJavaWordCount.main(MyJavaWordCount.java:31)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
17/03/30 20:38:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
17/03/30 20:38:17 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.80.10:4040
17/03/30 20:38:17 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
17/03/30 20:38:17 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
17/03/30 20:38:17 INFO cluster.YarnClientSchedulerBackend: Stopped
17/03/30 20:38:17 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/03/30 20:38:17 INFO storage.MemoryStore: MemoryStore cleared
17/03/30 20:38:17 INFO storage.BlockManager: BlockManager stopped
17/03/30 20:38:17 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/03/30 20:38:17 WARN metrics.MetricsSystem: Stopping a MetricsSystem that is not running
17/03/30 20:38:17 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/03/30 20:38:17 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at zhouls.bigdata.MyJavaWordCount.main(MyJavaWordCount.java:31)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/03/30 20:38:17 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/03/30 20:38:18 INFO util.ShutdownHookManager: Shutdown hook called
17/03/30 20:38:18 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/03/30 20:38:18 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-fdfdb880-f6cf-47eb-8981-1176e657d466
17/03/30 20:38:18 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-fdfdb880-f6cf-47eb-8981-1176e657d466/httpd-f5d25b97-30bd-4f13-b925-d96026063a63
[spark@master spark-1.6.1-bin-hadoop2.6]
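In a run like the one above, the informative lines are buried among the repeated ACCEPTED reports: the final state, the exit code, and the Diagnostics line (here, a container killed for running beyond its virtual memory limit). The following is a minimal sketch for pulling those lines out of saved spark-submit output; the function name `summarize_failure` and the file name `submit.log` are illustrative assumptions, not part of the original post.

```shell
# Keep only the lines that explain why the application ended:
# a FAILED/KILLED state report, the exit code, and the Diagnostics line.
# Reads spark-submit output on stdin.
summarize_failure() {
  grep -E 'state: (FAILED|KILLED)|exitCode:|Diagnostics:|Exit code is'
}

# Example usage (submit.log is a hypothetical file name):
#   $SPARK_HOME/bin/spark-submit ... 2>&1 | tee submit.log
#   summarize_failure < submit.log
```

The match is case-sensitive on purpose, so the `diagnostics: N/A` placeholder printed while the application is still ACCEPTED is not reported.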

Troubleshooting approach

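  First approach: the first thing to suspect is that the cluster is short of memory. Check whether each machine has enough free memory (free -g). It is also possible that other Spark applications that were submitted earlier are holding most of the resources.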
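The first check can be sketched roughly as follows. The function names `free_mb` and `list_competing_apps` are illustrative assumptions; `free -m` is used instead of `free -g` only so that small amounts of memory are not rounded down to 0. The script has to be run on each machine to cover the whole cluster.

```shell
# Print the "free" column (in MB) of the Mem: line reported by `free -m`.
free_mb() {
  free -m | awk '/^Mem:/ {print $4}'
}

# List YARN applications that are running or still waiting for resources;
# these are the ones that may be holding the memory a new job needs.
list_competing_apps() {
  yarn application -list -appStates RUNNING,ACCEPTED
}

# Run each check only when the corresponding command is available.
if command -v free >/dev/null 2>&1; then
  echo "free memory on $(uname -n): $(free_mb) MB"
fi
if command -v yarn >/dev/null 2>&1; then
  list_competing_apps
fi
```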

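  Second approach: if the first check turns up nothing, verify that the YARN cluster actually started successfully. This is where the pitfall may be hiding: even if the NodeManager process exists on a slave, check the ResourceManager log to see whether each NodeManager really started successfully. That was exactly my problem: the processes were running, but the log showed the NodeManagers in the UNHEALTHY state, so the total memory that the YARN cluster could see was 0.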
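Besides reading the ResourceManager log, the node states can also be queried from the command line. The sketch below assumes the `yarn` CLI is on the PATH and that the ResourceManager is reachable; the function name `unhealthy_nodes` is an illustrative assumption.

```shell
# Print the lines of `yarn node -list -all` that report an UNHEALTHY node.
# Returns non-zero when no such line is found, which also happens when the
# yarn command itself produces no output (for example, if the
# ResourceManager is down).
unhealthy_nodes() {
  yarn node -list -all 2>/dev/null | grep -w 'UNHEALTHY'
}

if command -v yarn >/dev/null 2>&1; then
  if unhealthy_nodes; then
    echo "At least one NodeManager is UNHEALTHY; inspect it with: yarn node -status <node-id>" >&2
  else
    echo "No UNHEALTHY NodeManagers reported"
  fi
fi
```

`yarn node -status <node-id>` prints a health report for a single node, which usually states why the node was marked unhealthy.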

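  I looked into why the nodes were UNHEALTHY: a directory under /tmp had been marked as bad. Because it is only a temporary directory, I deleted the corresponding directory on every NodeManager and then restarted the YARN cluster, which finally resolved the problem.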
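The cleanup step might look like the sketch below. The post does not name the directory, so `BAD_DIR` is a placeholder that has to be filled in from the NodeManager log; the function name `remove_bad_dir` and the use of `HADOOP_HOME` are likewise assumptions. Because `rm -rf` is destructive, the function refuses anything that is not clearly a path under /tmp.

```shell
# Remove a directory that the NodeManager flagged as bad, but only if the
# path lies under /tmp and contains no ".." component.
remove_bad_dir() {
  case "$1" in
    *..*)
      echo "refusing to remove '$1': path contains '..'" >&2
      return 1 ;;
    /tmp/[!/]*)
      rm -rf -- "$1" ;;
    *)
      echo "refusing to remove '$1': not a path under /tmp" >&2
      return 1 ;;
  esac
}

# Example usage (BAD_DIR is a placeholder, not a path from the post):
#   remove_bad_dir "$BAD_DIR"            # on every NodeManager host
#   "$HADOOP_HOME"/sbin/stop-yarn.sh     # then, on the ResourceManager host
#   "$HADOOP_HOME"/sbin/start-yarn.sh
```

After the restart, the nodes should no longer be listed as UNHEALTHY before the job is resubmitted.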

 



This article is reposted from the cnblogs blog 大数据躺过的坑. Original link: http://www.cnblogs.com/zlslch/p/6645549.html. To repost it, please contact the original author.
