[Big Data Development and Operations Solutions] Hadoop 2.7.6 + Spark Single-Node Pseudo-Distributed Installation


1. Install Scala (required by Spark)

1.1 Download and extract Scala

Download address: pick a Scala release from the official Scala website.
Create a directory named scala under /usr on the Linux server and upload the downloaded archive into it:

[root@hadoop opt]# cd /usr/
[root@hadoop usr]# mkdir scala
[root@hadoop usr]# cd scala/
[root@hadoop scala]# pwd
/usr/scala
[root@hadoop scala]# ls
scala-2.12.2.tgz
[root@hadoop scala]# tar -xvf scala-2.12.2.tgz 
...
[root@hadoop scala]# ls
scala-2.12.2  scala-2.12.2.tgz
[root@hadoop scala]# rm -rf *tgz
[root@hadoop scala]# ls
scala-2.12.2
[root@hadoop scala]# cd scala-2.12.2/
[root@hadoop scala-2.12.2]# ls
bin  doc  lib  man
[root@hadoop scala-2.12.2]# pwd
/usr/scala/scala-2.12.2

1.2 Configure environment variables

Edit the /etc/profile file and add the following line to it:

export    SCALA_HOME=/usr/scala/scala-2.12.2

Then append the following entry to the PATH variable in the same file:

${SCALA_HOME}/bin

After these additions, my /etc/profile looks like this:

export JAVA_HOME=/usr/java/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/hadoop/
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export HIVE_HOME=/hadoop/hive
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export HCAT_HOME=$HIVE_HOME/hcatalog
export HIVE_DEPENDENCY=/hadoop/hive/conf:/hadoop/hive/lib/*:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-server-extensions-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-streaming-2.3.3.jar:/hadoop/hive/lib/hive-exec-2.3.3.jar
export HBASE_HOME=/hadoop/hbase/
export ZOOKEEPER_HOME=/hadoop/zookeeper
export KAFKA_HOME=/hadoop/kafka
export KYLIN_HOME=/hadoop/kylin/
export GGHOME=/hadoop/ogg12
export SCALA_HOME=/usr/scala/scala-2.12.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME:$KAFKA_HOME:$KYLIN_HOME/bin:${SCALA_HOME}/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${HIVE_HOME}/lib:$HBASE_HOME/lib:$KYLIN_HOME/lib

Save and exit, then source the file so the new variables take effect:

[root@hadoop ~]# source /etc/profile
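To confirm that the new variables are visible in the current shell, a quick check (a minimal sketch; the output should mirror the paths configured above):

[root@hadoop ~]# echo $SCALA_HOME
/usr/scala/scala-2.12.2
[root@hadoop ~]# which scala
/usr/scala/scala-2.12.2/bin/scala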

1.3 Verify Scala

[root@hadoop ~]# scala -version
Scala code runner version 2.12.2 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.
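As an additional check you can start the Scala REPL and evaluate a trivial expression (a hedged sketch; type :quit to leave the REPL):

[root@hadoop ~]# scala
scala> 1 + 1
res0: Int = 2
scala> :quit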

2. Download and extract Spark

2.1 Download Spark

Download address: pick a Spark release (spark-2.4.5-bin-hadoop2.7) from the official Spark download page.
For a multi-node cluster, install Spark on every node, i.e., repeat the steps below on each of them.

2.2 Extract Spark

Create a spark directory under /hadoop to hold the Spark installation.

[root@hadoop hadoop]# pwd
/hadoop
[root@hadoop hadoop]# mkdir spark
[root@hadoop spark]# pwd
/hadoop/spark
Upload the installation package into the spark directory via Xftp:
[root@hadoop spark]# ls
spark-2.4.5-bin-hadoop2.7.tgz
Extract it:
[root@hadoop spark]# tar -xvf spark-2.4.5-bin-hadoop2.7.tgz
[root@hadoop spark]# ls
spark-2.4.5-bin-hadoop2.7  spark-2.4.5-bin-hadoop2.7.tgz
[root@hadoop spark]# rm -rf *tgz
[root@hadoop spark]# ls
spark-2.4.5-bin-hadoop2.7
[root@hadoop spark]# mv spark-2.4.5-bin-hadoop2.7/* .
[root@hadoop spark]# ls
bin  conf  data  examples  jars  kubernetes  LICENSE  licenses  NOTICE  python  R  README.md  RELEASE  sbin  spark-2.4.5-bin-hadoop2.7  yarn
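Note that the ls above still shows the now-empty spark-2.4.5-bin-hadoop2.7 directory left behind by the mv. Removing it is optional; a minimal sketch, assuming everything has already been moved into /hadoop/spark:

[root@hadoop spark]# rm -rf spark-2.4.5-bin-hadoop2.7
[root@hadoop spark]# ls
bin  conf  data  examples  jars  kubernetes  LICENSE  licenses  NOTICE  python  R  README.md  RELEASE  sbin  yarn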

3. Spark-related configuration

3.1 Configure environment variables

Edit the /etc/profile file and add:

export  SPARK_HOME=/hadoop/spark

After adding the variable above, edit the PATH variable in the same file and append:

${SPARK_HOME}/bin

After the changes, my /etc/profile reads:

export JAVA_HOME=/usr/java/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/hadoop/
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export HIVE_HOME=/hadoop/hive
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export HCAT_HOME=$HIVE_HOME/hcatalog
export HIVE_DEPENDENCY=/hadoop/hive/conf:/hadoop/hive/lib/*:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-server-extensions-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-streaming-2.3.3.jar:/hadoop/hive/lib/hive-exec-2.3.3.jar
export HBASE_HOME=/hadoop/hbase/
export ZOOKEEPER_HOME=/hadoop/zookeeper
export KAFKA_HOME=/hadoop/kafka
export KYLIN_HOME=/hadoop/kylin/
export SPARK_HOME=/hadoop/spark
export GGHOME=/hadoop/ogg12
export SCALA_HOME=/usr/scala/scala-2.12.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME:$KAFKA_HOME:$KYLIN_HOME/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${HIVE_HOME}/lib:$HBASE_HOME/lib:$KYLIN_HOME/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/libjsig.so:$JAVA_HOME/jre/lib/amd64/server/libjvm.so:$JAVA_HOME/jre/lib/amd64/server:$JAVA_HOME/jre/lib/amd64:$GG_HOME:/lib

When you are done editing, run source /etc/profile so the environment variables take effect.
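To confirm that SPARK_HOME and the new PATH entry are effective, a quick check (a minimal sketch; the version banner should report Spark 2.4.5):

[root@hadoop ~]# echo $SPARK_HOME
/hadoop/spark
[root@hadoop ~]# spark-submit --version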

3.2 Configure the parameter file

Go to the conf directory:

[root@hadoop conf]# pwd
/hadoop/spark/conf

Make a copy of the template configuration file and rename it:

[root@hadoop conf]# cp spark-env.sh.template spark-env.sh
[root@hadoop conf]# ls 
docker.properties.template  fairscheduler.xml.template  log4j.properties.template  metrics.properties.template  slaves.template  spark-defaults.conf.template  spark-env.sh  spark-env.sh.template

Edit the spark-env.sh file and add the following settings (use the paths of your own environment):

export SCALA_HOME=/usr/scala/scala-2.12.2
export JAVA_HOME=/usr/java/jdk1.8.0_151
export HADOOP_HOME=/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SPARK_HOME=/hadoop/spark
export SPARK_MASTER_IP=192.168.1.66
export SPARK_EXECUTOR_MEMORY=1G

Run source /etc/profile so the changes take effect.
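If you want to cap the resources used by the single worker, spark-env.sh also accepts a few optional settings; a hedged sketch of values that are not required for this walkthrough (adjust or omit as needed):

export SPARK_MASTER_HOST=192.168.1.66
export SPARK_WORKER_MEMORY=2g
export SPARK_WORKER_CORES=2

On Spark 2.x, SPARK_MASTER_HOST is the preferred replacement for the older SPARK_MASTER_IP name used above.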

3.3 Create the slaves file

Create a slaves file from the template that Spark ships with:

[root@hadoop conf]# pwd
/hadoop/spark/conf
[root@hadoop conf]# cp slaves.template slaves
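For a single-node setup the copied file can be left as-is: the template already lists localhost as the only worker. A quick check (a minimal sketch that filters out the comment lines):

[root@hadoop conf]# grep -v '^#' slaves
localhost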

4. Start Spark

Spark relies on the distributed file system provided by Hadoop, so make sure Hadoop is running normally before starting Spark.
With Hadoop running, execute the following commands on the master node (here the host named hadoop, which runs the Hadoop NameNode and acts as the Spark master):

cd /hadoop/spark/sbin
./start-all.sh

[root@hadoop sbin]# ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop.out
[root@hadoop sbin]# cat /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop.out
Spark Command: /usr/java/jdk1.8.0_151/bin/java -cp /hadoop/spark/conf/:/hadoop/spark/jars/*:/hadoop/etc/hadoop/ -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop:7077
========================================
20/03/30 16:10:37 INFO worker.Worker: Started daemon with process name: 24171@hadoop
20/03/30 16:10:37 INFO util.SignalUtils: Registered signal handler for TERM
20/03/30 16:10:37 INFO util.SignalUtils: Registered signal handler for HUP
20/03/30 16:10:37 INFO util.SignalUtils: Registered signal handler for INT
20/03/30 16:10:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/30 16:10:38 INFO spark.SecurityManager: Changing view acls to: root
20/03/30 16:10:38 INFO spark.SecurityManager: Changing modify acls to: root
20/03/30 16:10:38 INFO spark.SecurityManager: Changing view acls groups to: 
20/03/30 16:10:38 INFO spark.SecurityManager: Changing modify acls groups to: 
20/03/30 16:10:38 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
20/03/30 16:10:38 INFO util.Utils: Successfully started service 'sparkWorker' on port 33620.
20/03/30 16:10:39 INFO worker.Worker: Starting Spark worker 192.168.1.66:33620 with 8 cores, 4.6 GB RAM
20/03/30 16:10:39 INFO worker.Worker: Running Spark version 2.4.5
20/03/30 16:10:39 INFO worker.Worker: Spark home: /hadoop/spark
20/03/30 16:10:39 INFO util.log: Logging initialized @1841ms
20/03/30 16:10:39 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
20/03/30 16:10:39 INFO server.Server: Started @1919ms
20/03/30 16:10:39 INFO server.AbstractConnector: Started ServerConnector@7a1c1e4a{HTTP/1.1,[http/1.1]}{0.0.0.0:8081}
20/03/30 16:10:39 INFO util.Utils: Successfully started service 'WorkerUI' on port 8081.
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@56674156{/logPage,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2aacb62f{/logPage/json,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@429b256c{/,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@540735e2{/json,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bd5ec83{/static,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5050e7d7{/log,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO ui.WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://hadoop:8081
20/03/30 16:10:39 INFO worker.Worker: Connecting to master hadoop:7077...
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53d7b653{/metrics/json,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO client.TransportClientFactory: Successfully created connection to hadoop/192.168.1.66:7077 after 30 ms (0 ms spent in bootstraps)
20/03/30 16:10:39 INFO worker.Worker: Successfully registered with master spark://hadoop:7077
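At this point jps should list one Master and one Worker process alongside the Hadoop daemons (the Worker above was started as process 24171 according to the log). A minimal sketch:

[root@hadoop sbin]# jps | egrep 'Master|Worker'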

Visit the web UI at http://192.168.1.66:8080/

(screenshot: Spark Master web UI)
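If the server has no desktop browser, the UIs can also be probed from the shell; according to the startup log the master UI listens on port 8080 and the worker UI on port 8081. A hedged sketch using curl (an HTTP code of 200 means the page is up):

[root@hadoop ~]# curl -s -o /dev/null -w "%{http_code}\n" http://192.168.1.66:8080
200
[root@hadoop ~]# curl -s -o /dev/null -w "%{http_code}\n" http://192.168.1.66:8081
200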

5. Run the Pi-calculation example program shipped with Spark

This simply runs the Pi demo in local mode; follow the steps below.

[root@hadoop sbin]# cd /hadoop/spark/
[root@hadoop spark]# ./bin/spark-submit  --class  org.apache.spark.examples.SparkPi  --master local   examples/jars/spark-examples_2.11-2.4.5.jar 
20/03/30 16:16:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/30 16:16:26 INFO spark.SparkContext: Running Spark version 2.4.5
20/03/30 16:16:26 INFO spark.SparkContext: Submitted application: Spark Pi
20/03/30 16:16:26 INFO spark.SecurityManager: Changing view acls to: root
20/03/30 16:16:26 INFO spark.SecurityManager: Changing modify acls to: root
20/03/30 16:16:26 INFO spark.SecurityManager: Changing view acls groups to: 
20/03/30 16:16:26 INFO spark.SecurityManager: Changing modify acls groups to: 
20/03/30 16:16:26 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
20/03/30 16:16:27 INFO util.Utils: Successfully started service 'sparkDriver' on port 41623.
20/03/30 16:16:27 INFO spark.SparkEnv: Registering MapOutputTracker
20/03/30 16:16:27 INFO spark.SparkEnv: Registering BlockManagerMaster
20/03/30 16:16:27 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/03/30 16:16:27 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/03/30 16:16:27 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-0af3c940-24b0-4784-8ec7-e5f4935e21f7
20/03/30 16:16:27 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
20/03/30 16:16:27 INFO spark.SparkEnv: Registering OutputCommitCoordinator
20/03/30 16:16:27 INFO util.log: Logging initialized @3646ms
20/03/30 16:16:27 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
20/03/30 16:16:27 INFO server.Server: Started @3776ms
20/03/30 16:16:27 INFO server.AbstractConnector: Started ServerConnector@7bd69e82{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/03/30 16:16:27 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@26d10f2e{/jobs,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5f577419{/jobs/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@28fa700e{/jobs/job,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e041f0c{/jobs/job/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6a175569{/stages,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@11963225{/stages/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f3c966c{/stages/stage,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@61a5b4ae{/stages/stage/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3a71c100{/stages/pool,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5b69fd74{/stages/pool/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@f325091{/storage,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@437e951d{/storage/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@77b325b3{/storage/rdd,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@63a5e46c{/storage/rdd/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7e8e8651{/environment,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49ef32e0{/environment/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@271f18d3{/executors,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6bd51ed8{/executors/json,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@61e3a1fd{/executors/threadDump,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@51abf713{/executors/threadDump/json,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eadb475{/static,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@64b31700{/,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3b65e559{/api,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1c05a54d{/jobs/job/kill,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@65ef722a{/stages/stage/kill,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://hadoop:4040
20/03/30 16:16:28 INFO spark.SparkContext: Added JAR file:/hadoop/spark/examples/jars/spark-examples_2.11-2.4.5.jar at spark://hadoop:41623/jars/spark-examples_2.11-2.4.5.jar with timestamp 1585556188078
20/03/30 16:16:28 INFO executor.Executor: Starting executor ID driver on host localhost
20/03/30 16:16:28 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39317.
20/03/30 16:16:28 INFO netty.NettyBlockTransferService: Server created on hadoop:39317
20/03/30 16:16:28 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/03/30 16:16:28 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, hadoop, 39317, None)
20/03/30 16:16:28 INFO storage.BlockManagerMasterEndpoint: Registering block manager hadoop:39317 with 366.3 MB RAM, BlockManagerId(driver, hadoop, 39317, None)
20/03/30 16:16:28 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, hadoop, 39317, None)
20/03/30 16:16:28 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, hadoop, 39317, None)
20/03/30 16:16:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@56ba8773{/metrics/json,null,AVAILABLE,@Spark}
20/03/30 16:16:30 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Missing parents: List()
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
20/03/30 16:16:31 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.0 KB, free 366.3 MB)
20/03/30 16:16:32 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1381.0 B, free 366.3 MB)
20/03/30 16:16:32 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop:39317 (size: 1381.0 B, free: 366.3 MB)
20/03/30 16:16:32 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1163
20/03/30 16:16:32 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
20/03/30 16:16:32 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
20/03/30 16:16:32 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7866 bytes)
20/03/30 16:16:32 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
20/03/30 16:16:32 INFO executor.Executor: Fetching spark://hadoop:41623/jars/spark-examples_2.11-2.4.5.jar with timestamp 1585556188078
20/03/30 16:16:32 INFO client.TransportClientFactory: Successfully created connection to hadoop/192.168.1.66:41623 after 83 ms (0 ms spent in bootstraps)
20/03/30 16:16:32 INFO util.Utils: Fetching spark://hadoop:41623/jars/spark-examples_2.11-2.4.5.jar to /tmp/spark-c53497ab-50d7-4f03-a485-07c447462768/userFiles-a7caa3f1-c1cf-4c26-afb6-883d4fd5afb2/fetchFileTemp7331251015287722970.tmp
20/03/30 16:16:32 INFO executor.Executor: Adding file:/tmp/spark-c53497ab-50d7-4f03-a485-07c447462768/userFiles-a7caa3f1-c1cf-4c26-afb6-883d4fd5afb2/spark-examples_2.11-2.4.5.jar to class loader
20/03/30 16:16:32 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 824 bytes result sent to driver
20/03/30 16:16:32 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7866 bytes)
20/03/30 16:16:32 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
20/03/30 16:16:32 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 603 ms on localhost (executor driver) (1/2)
20/03/30 16:16:32 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 824 bytes result sent to driver
20/03/30 16:16:32 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 32 ms on localhost (executor driver) (2/2)
20/03/30 16:16:32 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
20/03/30 16:16:32 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 2.083 s
20/03/30 16:16:32 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 2.207309 s
Pi is roughly 3.137355686778434
20/03/30 16:16:32 INFO server.AbstractConnector: Stopped Spark@7bd69e82{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/03/30 16:16:32 INFO ui.SparkUI: Stopped Spark web UI at http://hadoop:4040
20/03/30 16:16:32 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/03/30 16:16:32 INFO memory.MemoryStore: MemoryStore cleared
20/03/30 16:16:32 INFO storage.BlockManager: BlockManager stopped
20/03/30 16:16:32 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
20/03/30 16:16:32 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/03/30 16:16:33 INFO spark.SparkContext: Successfully stopped SparkContext
20/03/30 16:16:33 INFO util.ShutdownHookManager: Shutdown hook called
20/03/30 16:16:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-c53497ab-50d7-4f03-a485-07c447462768
20/03/30 16:16:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-27cddb1e-18a9-4066-b529-231a4a4bc936

You can see the line "Pi is roughly 3.137355686778434" in the output, so the approximate value of pi has been printed.
The run above only used single-machine local mode; a standalone-master variant is sketched below, and section 6 shows how to run the same demo in cluster mode on YARN.
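Besides local mode, the same example can be submitted to the standalone master that was started in section 4; a hedged sketch, assuming the master URL spark://hadoop:7077 reported in the startup log:

[root@hadoop spark]# ./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://hadoop:7077 \
  examples/jars/spark-examples_2.11-2.4.5.jar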

6. Run the calculation in yarn-cluster mode
Go to the Spark installation directory and run the Pi demo in yarn-cluster mode:

[root@hadoop spark]# pwd
/hadoop/spark
[root@hadoop spark]# ./bin/spark-submit  --class  org.apache.spark.examples.SparkPi  --master  yarn-cluster   examples/jars/spark-examples_2.11-2.4.5.jar 
Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
20/03/30 16:38:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/30 16:38:10 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
20/03/30 16:38:11 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
20/03/30 16:38:11 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
20/03/30 16:38:11 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
20/03/30 16:38:11 INFO yarn.Client: Setting up container launch context for our AM
20/03/30 16:38:11 INFO yarn.Client: Setting up the launch environment for our AM container
20/03/30 16:38:11 INFO yarn.Client: Preparing resources for our AM container
20/03/30 16:38:12 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
20/03/30 16:38:14 INFO yarn.Client: Uploading resource file:/tmp/spark-aab93a8f-5930-4a9d-808f-e80ec7328fe6/__spark_libs__3884281235398378542.zip -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585555727521_0002/__spark_libs__3884281235398378542.zip
20/03/30 16:38:19 INFO yarn.Client: Uploading resource file:/hadoop/spark/examples/jars/spark-examples_2.11-2.4.5.jar -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585555727521_0002/spark-examples_2.11-2.4.5.jar
20/03/30 16:38:19 INFO yarn.Client: Uploading resource file:/tmp/spark-aab93a8f-5930-4a9d-808f-e80ec7328fe6/__spark_conf__2114469885015683725.zip -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585555727521_0002/__spark_conf__.zip
20/03/30 16:38:19 INFO spark.SecurityManager: Changing view acls to: root
20/03/30 16:38:19 INFO spark.SecurityManager: Changing modify acls to: root
20/03/30 16:38:19 INFO spark.SecurityManager: Changing view acls groups to: 
20/03/30 16:38:19 INFO spark.SecurityManager: Changing modify acls groups to: 
20/03/30 16:38:19 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
20/03/30 16:38:21 INFO yarn.Client: Submitting application application_1585555727521_0002 to ResourceManager
20/03/30 16:38:21 INFO impl.YarnClientImpl: Submitted application application_1585555727521_0002
20/03/30 16:38:22 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:22 INFO yarn.Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1585557501645
     final status: UNDEFINED
     tracking URL: http://hadoop:8088/proxy/application_1585555727521_0002/
     user: root
20/03/30 16:38:23 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:24 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:25 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:26 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:27 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:28 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:29 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:30 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:31 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:32 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:33 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:34 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:35 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:36 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:36 INFO yarn.Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: hadoop
     ApplicationMaster RPC port: 43325
     queue: default
     start time: 1585557501645
     final status: UNDEFINED
     tracking URL: http://hadoop:8088/proxy/application_1585555727521_0002/
     user: root
20/03/30 16:38:37 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:38 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:39 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:40 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:41 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:42 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:43 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:44 INFO yarn.Client: Application report for application_1585555727521_0002 (state: FINISHED)
20/03/30 16:38:44 INFO yarn.Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: hadoop
     ApplicationMaster RPC port: 43325
     queue: default
     start time: 1585557501645
     final status: SUCCEEDED
     tracking URL: http://hadoop:8088/proxy/application_1585555727521_0002/
     user: root
20/03/30 16:38:44 INFO yarn.Client: Deleted staging directory hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585555727521_0002
20/03/30 16:38:44 INFO util.ShutdownHookManager: Shutdown hook called
20/03/30 16:38:44 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-aab93a8f-5930-4a9d-808f-e80ec7328fe6
20/03/30 16:38:44 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-0d062493-80a1-4624-b9bf-9f95acfb3626
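The warning at the top of this run points out that the yarn-cluster master string has been deprecated since Spark 2.0; the equivalent modern invocation would be (a sketch using the same example jar):

[root@hadoop spark]# ./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  examples/jars/spark-examples_2.11-2.4.5.jar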

Note that in yarn-cluster mode the result is not printed to the console; it is written to the logs on the Hadoop cluster. To view it, notice that the output above contains the address:
tracking URL: http://hadoop:8088/proxy/application_1585555727521_0002/
Open it in a browser:
(screenshot: YARN application overview page)
Click logs:
(screenshot: container log listing)
Select stdout:
(screenshot: stdout log containing the Pi result)
The Pi value has been printed there.
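If YARN log aggregation is enabled, the same stdout can also be retrieved from the command line instead of the web UI; a hedged sketch using the application id from the run above:

[root@hadoop ~]# yarn logs -applicationId application_1585555727521_0002 | grep "Pi is roughly"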
Finally, a few commonly used commands:

Start Spark:
./sbin/start-all.sh
Start Hadoop and Spark together (starths.sh is a custom script on this host):
./starths.sh
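To shut the standalone Spark processes down again, the matching stop script can be used (a minimal sketch):

Stop Spark:
./sbin/stop-all.sh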