[Big Data Development & Operations Solutions] Hadoop 2.7.6 + Spark Single-Node Pseudo-Distributed Installation


1. Install Scala (a Spark dependency)

1.1 Download and extract Scala

Download the Scala 2.12.2 tarball from the official Scala site. On the Linux server, create a directory named scala under /usr and upload the downloaded archive into it:

[root@hadoop opt]# cd /usr/
[root@hadoop usr]# mkdir scala
[root@hadoop usr]# cd scala/
[root@hadoop scala]# pwd
/usr/scala
[root@hadoop scala]# ls
scala-2.12.2.tgz
[root@hadoop scala]# tar -xvf scala-2.12.2.tgz 
... (extraction output omitted)
[root@hadoop scala]# ls
scala-2.12.2  scala-2.12.2.tgz
[root@hadoop scala]# rm -rf *tgz
[root@hadoop scala]# ls
scala-2.12.2
[root@hadoop scala]# cd scala-2.12.2/
[root@hadoop scala-2.12.2]# ls
bin  doc  lib  man
[root@hadoop scala-2.12.2]# pwd
/usr/scala/scala-2.12.2

1.2 Configure environment variables

Edit /etc/profile and add the following line:

export SCALA_HOME=/usr/scala/scala-2.12.2

Then append the following entry to the PATH variable in the same file:

${SCALA_HOME}/bin

After these additions, my /etc/profile looks like this:

export JAVA_HOME=/usr/java/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/hadoop/
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export HIVE_HOME=/hadoop/hive
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export HCAT_HOME=$HIVE_HOME/hcatalog
export HIVE_DEPENDENCY=/hadoop/hive/conf:/hadoop/hive/lib/*:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-server-extensions-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-streaming-2.3.3.jar:/hadoop/hive/lib/hive-exec-2.3.3.jar
export HBASE_HOME=/hadoop/hbase/
export ZOOKEEPER_HOME=/hadoop/zookeeper
export KAFKA_HOME=/hadoop/kafka
export KYLIN_HOME=/hadoop/kylin/
export GGHOME=/hadoop/ogg12
export SCALA_HOME=/usr/scala/scala-2.12.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME:$KAFKA_HOME:$KYLIN_HOME/bin:${SCALA_HOME}/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${HIVE_HOME}/lib:$HBASE_HOME/lib:$KYLIN_HOME/lib

Save and exit, then source the file so the new variables take effect:

[root@hadoop ~]# source /etc/profile
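The PATH mechanics used above can be demonstrated with a self-contained sketch. A temporary directory stands in for ${SCALA_HOME}/bin, and `dummy-tool` is a made-up name for illustration:

```shell
# Demo: appending a bin directory to PATH makes its executables
# resolvable by bare name, which is exactly why ${SCALA_HOME}/bin
# is appended to PATH in /etc/profile.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/bin"
printf '#!/bin/sh\necho ok\n' > "$demo_home/bin/dummy-tool"
chmod +x "$demo_home/bin/dummy-tool"
export PATH="$PATH:$demo_home/bin"
dummy-tool    # now resolves via the new PATH entry and prints "ok"
```

The same lookup rule is what lets you later type `scala` or `spark-submit` from any directory.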

1.3 Verify Scala

[root@hadoop ~]# scala -version
Scala code runner version 2.12.2 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.

2. Download and extract Spark

2.1 Download Spark

Download the spark-2.4.5-bin-hadoop2.7 package from the official Spark site.
For a multi-node cluster, install Spark on every node by repeating the steps below on each of them.

2.2 Extract Spark

Create a spark directory under /hadoop to hold the Spark installation.

[root@hadoop hadoop]# pwd
/hadoop
[root@hadoop hadoop]# mkdir spark
[root@hadoop spark]# pwd
/hadoop/spark
Upload the package into the spark directory (e.g. via Xftp):
[root@hadoop spark]# ls
spark-2.4.5-bin-hadoop2.7.tgz
Extract it:
[root@hadoop spark]# tar -xvf spark-2.4.5-bin-hadoop2.7.tgz
[root@hadoop spark]# ls
spark-2.4.5-bin-hadoop2.7  spark-2.4.5-bin-hadoop2.7.tgz
[root@hadoop spark]# rm -rf *tgz
[root@hadoop spark]# ls
spark-2.4.5-bin-hadoop2.7
[root@hadoop spark]# mv spark-2.4.5-bin-hadoop2.7/* .
[root@hadoop spark]# ls
bin  conf  data  examples  jars  kubernetes  LICENSE  licenses  NOTICE  python  R  README.md  RELEASE  sbin  spark-2.4.5-bin-hadoop2.7  yarn
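As an aside, `mv spark-2.4.5-bin-hadoop2.7/* .` leaves the (now empty) versioned directory behind, as the listing above shows, and skips hidden files. Extracting with `tar --strip-components=1` avoids the extra move. A self-contained sketch with a throwaway tarball standing in for the real Spark archive:

```shell
# Build a tiny stand-in tarball, then unpack its contents directly
# into the target directory with --strip-components=1, so no
# intermediate "mv versioned-dir/* ." step is needed.
work=$(mktemp -d)
mkdir -p "$work/spark-2.4.5-bin-hadoop2.7/bin"
echo demo > "$work/spark-2.4.5-bin-hadoop2.7/bin/spark-submit"
tar -czf "$work/spark.tgz" -C "$work" spark-2.4.5-bin-hadoop2.7
mkdir -p "$work/spark"
tar -xzf "$work/spark.tgz" -C "$work/spark" --strip-components=1
ls "$work/spark/bin"    # bin/ contents land directly under spark/
```

With the real archive this would be `tar -xzf spark-2.4.5-bin-hadoop2.7.tgz -C /hadoop/spark --strip-components=1`.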

3. Spark configuration

3.1 Configure environment variables

Edit /etc/profile and add:

export SPARK_HOME=/hadoop/spark

Then append the following entry to the PATH variable in the same file:

${SPARK_HOME}/bin

With these changes, my /etc/profile reads:

export JAVA_HOME=/usr/java/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/hadoop/
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export HIVE_HOME=/hadoop/hive
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export HCAT_HOME=$HIVE_HOME/hcatalog
export HIVE_DEPENDENCY=/hadoop/hive/conf:/hadoop/hive/lib/*:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-server-extensions-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-streaming-2.3.3.jar:/hadoop/hive/lib/hive-exec-2.3.3.jar
export HBASE_HOME=/hadoop/hbase/
export ZOOKEEPER_HOME=/hadoop/zookeeper
export KAFKA_HOME=/hadoop/kafka
export KYLIN_HOME=/hadoop/kylin/
export SPARK_HOME=/hadoop/spark
export GGHOME=/hadoop/ogg12
export SCALA_HOME=/usr/scala/scala-2.12.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME:$KAFKA_HOME:$KYLIN_HOME/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${HIVE_HOME}/lib:$HBASE_HOME/lib:$KYLIN_HOME/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/libjsig.so:$JAVA_HOME/jre/lib/amd64/server/libjvm.so:$JAVA_HOME/jre/lib/amd64/server:$JAVA_HOME/jre/lib/amd64:$GG_HOME:/lib

When editing is complete, run source /etc/profile to make the variables take effect.

3.2 Configure the parameter file

Go to the conf directory:

[root@hadoop conf]# pwd
/hadoop/spark/conf

Copy the template and rename it:

[root@hadoop conf]# cp spark-env.sh.template spark-env.sh
[root@hadoop conf]# ls 
docker.properties.template  fairscheduler.xml.template  log4j.properties.template  metrics.properties.template  slaves.template  spark-defaults.conf.template  spark-env.sh  spark-env.sh.template

Edit the spark-env.sh file and add the following settings (substitute your own paths):

export SCALA_HOME=/usr/scala/scala-2.12.2
export JAVA_HOME=/usr/java/jdk1.8.0_151
export HADOOP_HOME=/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SPARK_HOME=/hadoop/spark
export SPARK_MASTER_IP=192.168.1.66
export SPARK_EXECUTOR_MEMORY=1G

Run source /etc/profile again so the updated environment takes effect (spark-env.sh itself is read each time Spark starts).
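If you script your setup, the same settings can be written non-interactively with a quoted heredoc. The paths and IP below are the examples from this walkthrough, and the sketch writes to a temp file rather than the real /hadoop/spark/conf/spark-env.sh so it is safe to run anywhere:

```shell
# Sketch: generate a spark-env.sh-style file with a quoted heredoc
# (<<'EOF' prevents variable expansion, so the lines are written
# literally, exactly as they should appear in the file).
conf=$(mktemp)
cat > "$conf" <<'EOF'
export SCALA_HOME=/usr/scala/scala-2.12.2
export JAVA_HOME=/usr/java/jdk1.8.0_151
export SPARK_MASTER_IP=192.168.1.66
export SPARK_EXECUTOR_MEMORY=1G
EOF
grep -c '^export' "$conf"    # counts the export lines written
```

Point `cat > ...` at /hadoop/spark/conf/spark-env.sh (after backing it up) to use this for real.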

3.3 Create the slaves file

Create a slaves file from the template that Spark provides:

[root@hadoop conf]# pwd
/hadoop/spark/conf
[root@hadoop conf]# cp slaves.template slaves
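The slaves file simply lists one worker hostname per line, and the template's default `localhost` entry is all a single-node pseudo-distributed setup needs. A quick sketch of the format (using a demo temp file, not the real conf/slaves):

```shell
# slaves file format: one worker host per line; lines starting with
# '#' are comments and are ignored by the start scripts.
slaves_demo=$(mktemp)
printf '# workers for this cluster\nlocalhost\n' > "$slaves_demo"
grep -v '^#' "$slaves_demo"    # effective worker list
```

In a real multi-node cluster you would list each worker's hostname here instead.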

4. Start Spark

Spark relies on the distributed file system provided by Hadoop, so before starting Spark, make sure Hadoop is running normally.
With Hadoop up, run the following on the hadoop host (the Hadoop NameNode, which also serves as the Spark master node):

[root@hadoop sbin]# cd /hadoop/spark/sbin
[root@hadoop sbin]# ./start-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop.out
[root@hadoop sbin]# cat /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop.out
Spark Command: /usr/java/jdk1.8.0_151/bin/java -cp /hadoop/spark/conf/:/hadoop/spark/jars/*:/hadoop/etc/hadoop/ -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop:7077
========================================
20/03/30 16:10:37 INFO worker.Worker: Started daemon with process name: 24171@hadoop
20/03/30 16:10:37 INFO util.SignalUtils: Registered signal handler for TERM
20/03/30 16:10:37 INFO util.SignalUtils: Registered signal handler for HUP
20/03/30 16:10:37 INFO util.SignalUtils: Registered signal handler for INT
20/03/30 16:10:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/30 16:10:38 INFO spark.SecurityManager: Changing view acls to: root
20/03/30 16:10:38 INFO spark.SecurityManager: Changing modify acls to: root
20/03/30 16:10:38 INFO spark.SecurityManager: Changing view acls groups to: 
20/03/30 16:10:38 INFO spark.SecurityManager: Changing modify acls groups to: 
20/03/30 16:10:38 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
20/03/30 16:10:38 INFO util.Utils: Successfully started service 'sparkWorker' on port 33620.
20/03/30 16:10:39 INFO worker.Worker: Starting Spark worker 192.168.1.66:33620 with 8 cores, 4.6 GB RAM
20/03/30 16:10:39 INFO worker.Worker: Running Spark version 2.4.5
20/03/30 16:10:39 INFO worker.Worker: Spark home: /hadoop/spark
20/03/30 16:10:39 INFO util.log: Logging initialized @1841ms
20/03/30 16:10:39 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
20/03/30 16:10:39 INFO server.Server: Started @1919ms
20/03/30 16:10:39 INFO server.AbstractConnector: Started ServerConnector@7a1c1e4a{HTTP/1.1,[http/1.1]}{0.0.0.0:8081}
20/03/30 16:10:39 INFO util.Utils: Successfully started service 'WorkerUI' on port 8081.
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@56674156{/logPage,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2aacb62f{/logPage/json,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@429b256c{/,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@540735e2{/json,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bd5ec83{/static,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5050e7d7{/log,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO ui.WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://hadoop:8081
20/03/30 16:10:39 INFO worker.Worker: Connecting to master hadoop:7077...
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53d7b653{/metrics/json,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO client.TransportClientFactory: Successfully created connection to hadoop/192.168.1.66:7077 after 30 ms (0 ms spent in bootstraps)
20/03/30 16:10:39 INFO worker.Worker: Successfully registered with master spark://hadoop:7077

Visit the web UI at http://192.168.1.66:8080/:

(screenshot: Spark Master web UI)

5. Run the bundled Pi-estimation example

Here we simply run the Pi-estimation demo in local mode. Follow the steps below.

[root@hadoop sbin]# cd /hadoop/spark/
[root@hadoop spark]# ./bin/spark-submit  --class  org.apache.spark.examples.SparkPi  --master local   examples/jars/spark-examples_2.11-2.4.5.jar 
20/03/30 16:16:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/30 16:16:26 INFO spark.SparkContext: Running Spark version 2.4.5
20/03/30 16:16:26 INFO spark.SparkContext: Submitted application: Spark Pi
20/03/30 16:16:26 INFO spark.SecurityManager: Changing view acls to: root
20/03/30 16:16:26 INFO spark.SecurityManager: Changing modify acls to: root
20/03/30 16:16:26 INFO spark.SecurityManager: Changing view acls groups to: 
20/03/30 16:16:26 INFO spark.SecurityManager: Changing modify acls groups to: 
20/03/30 16:16:26 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
20/03/30 16:16:27 INFO util.Utils: Successfully started service 'sparkDriver' on port 41623.
20/03/30 16:16:27 INFO spark.SparkEnv: Registering MapOutputTracker
20/03/30 16:16:27 INFO spark.SparkEnv: Registering BlockManagerMaster
20/03/30 16:16:27 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/03/30 16:16:27 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/03/30 16:16:27 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-0af3c940-24b0-4784-8ec7-e5f4935e21f7
20/03/30 16:16:27 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
20/03/30 16:16:27 INFO spark.SparkEnv: Registering OutputCommitCoordinator
20/03/30 16:16:27 INFO util.log: Logging initialized @3646ms
20/03/30 16:16:27 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
20/03/30 16:16:27 INFO server.Server: Started @3776ms
20/03/30 16:16:27 INFO server.AbstractConnector: Started ServerConnector@7bd69e82{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/03/30 16:16:27 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@26d10f2e{/jobs,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5f577419{/jobs/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@28fa700e{/jobs/job,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e041f0c{/jobs/job/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6a175569{/stages,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@11963225{/stages/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f3c966c{/stages/stage,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@61a5b4ae{/stages/stage/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3a71c100{/stages/pool,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5b69fd74{/stages/pool/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@f325091{/storage,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@437e951d{/storage/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@77b325b3{/storage/rdd,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@63a5e46c{/storage/rdd/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7e8e8651{/environment,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49ef32e0{/environment/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@271f18d3{/executors,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6bd51ed8{/executors/json,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@61e3a1fd{/executors/threadDump,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@51abf713{/executors/threadDump/json,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eadb475{/static,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@64b31700{/,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3b65e559{/api,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1c05a54d{/jobs/job/kill,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@65ef722a{/stages/stage/kill,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://hadoop:4040
20/03/30 16:16:28 INFO spark.SparkContext: Added JAR file:/hadoop/spark/examples/jars/spark-examples_2.11-2.4.5.jar at spark://hadoop:41623/jars/spark-examples_2.11-2.4.5.jar with timestamp 1585556188078
20/03/30 16:16:28 INFO executor.Executor: Starting executor ID driver on host localhost
20/03/30 16:16:28 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39317.
20/03/30 16:16:28 INFO netty.NettyBlockTransferService: Server created on hadoop:39317
20/03/30 16:16:28 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/03/30 16:16:28 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, hadoop, 39317, None)
20/03/30 16:16:28 INFO storage.BlockManagerMasterEndpoint: Registering block manager hadoop:39317 with 366.3 MB RAM, BlockManagerId(driver, hadoop, 39317, None)
20/03/30 16:16:28 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, hadoop, 39317, None)
20/03/30 16:16:28 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, hadoop, 39317, None)
20/03/30 16:16:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@56ba8773{/metrics/json,null,AVAILABLE,@Spark}
20/03/30 16:16:30 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Missing parents: List()
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
20/03/30 16:16:31 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.0 KB, free 366.3 MB)
20/03/30 16:16:32 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1381.0 B, free 366.3 MB)
20/03/30 16:16:32 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop:39317 (size: 1381.0 B, free: 366.3 MB)
20/03/30 16:16:32 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1163
20/03/30 16:16:32 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
20/03/30 16:16:32 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
20/03/30 16:16:32 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7866 bytes)
20/03/30 16:16:32 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
20/03/30 16:16:32 INFO executor.Executor: Fetching spark://hadoop:41623/jars/spark-examples_2.11-2.4.5.jar with timestamp 1585556188078
20/03/30 16:16:32 INFO client.TransportClientFactory: Successfully created connection to hadoop/192.168.1.66:41623 after 83 ms (0 ms spent in bootstraps)
20/03/30 16:16:32 INFO util.Utils: Fetching spark://hadoop:41623/jars/spark-examples_2.11-2.4.5.jar to /tmp/spark-c53497ab-50d7-4f03-a485-07c447462768/userFiles-a7caa3f1-c1cf-4c26-afb6-883d4fd5afb2/fetchFileTemp7331251015287722970.tmp
20/03/30 16:16:32 INFO executor.Executor: Adding file:/tmp/spark-c53497ab-50d7-4f03-a485-07c447462768/userFiles-a7caa3f1-c1cf-4c26-afb6-883d4fd5afb2/spark-examples_2.11-2.4.5.jar to class loader
20/03/30 16:16:32 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 824 bytes result sent to driver
20/03/30 16:16:32 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7866 bytes)
20/03/30 16:16:32 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
20/03/30 16:16:32 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 603 ms on localhost (executor driver) (1/2)
20/03/30 16:16:32 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 824 bytes result sent to driver
20/03/30 16:16:32 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 32 ms on localhost (executor driver) (2/2)
20/03/30 16:16:32 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
20/03/30 16:16:32 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 2.083 s
20/03/30 16:16:32 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 2.207309 s
Pi is roughly 3.137355686778434
20/03/30 16:16:32 INFO server.AbstractConnector: Stopped Spark@7bd69e82{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/03/30 16:16:32 INFO ui.SparkUI: Stopped Spark web UI at http://hadoop:4040
20/03/30 16:16:32 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/03/30 16:16:32 INFO memory.MemoryStore: MemoryStore cleared
20/03/30 16:16:32 INFO storage.BlockManager: BlockManager stopped
20/03/30 16:16:32 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
20/03/30 16:16:32 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/03/30 16:16:33 INFO spark.SparkContext: Successfully stopped SparkContext
20/03/30 16:16:33 INFO util.ShutdownHookManager: Shutdown hook called
20/03/30 16:16:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-c53497ab-50d7-4f03-a485-07c447462768
20/03/30 16:16:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-27cddb1e-18a9-4066-b529-231a4a4bc936

You can see the line "Pi is roughly 3.137355686778434" in the output — the Pi estimate has been printed.
The run above used only single-machine local mode; to run the demo in cluster mode, read on.
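For intuition about what SparkPi actually computes: it is a Monte-Carlo estimate — throw random points into the unit square and count the fraction landing inside the quarter-circle, which approximates pi/4. The same computation as a stand-in sketch in awk (no Spark involved; the value will differ slightly from the run above):

```shell
# Monte-Carlo Pi estimate mirroring the SparkPi example: draw x,y
# uniformly in [0,1); the fraction with x^2 + y^2 <= 1 is ~pi/4.
pi=$(awk 'BEGIN {
  srand(1); n = 200000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1) inside++
  }
  printf "%.4f", 4 * inside / n
}')
echo "Pi is roughly $pi"
```

Spark's version does exactly this, but splits the n samples across partitions and sums the counts with a reduce.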

6. Run the example in yarn-cluster mode

Go to the Spark installation directory and run the Pi-estimation demo in yarn-cluster mode:

[root@hadoop spark]# pwd
/hadoop/spark
[root@hadoop spark]# ./bin/spark-submit  --class  org.apache.spark.examples.SparkPi  --master  yarn-cluster   examples/jars/spark-examples_2.11-2.4.5.jar 
Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
20/03/30 16:38:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/30 16:38:10 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
20/03/30 16:38:11 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
20/03/30 16:38:11 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
20/03/30 16:38:11 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
20/03/30 16:38:11 INFO yarn.Client: Setting up container launch context for our AM
20/03/30 16:38:11 INFO yarn.Client: Setting up the launch environment for our AM container
20/03/30 16:38:11 INFO yarn.Client: Preparing resources for our AM container
20/03/30 16:38:12 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
20/03/30 16:38:14 INFO yarn.Client: Uploading resource file:/tmp/spark-aab93a8f-5930-4a9d-808f-e80ec7328fe6/__spark_libs__3884281235398378542.zip -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585555727521_0002/__spark_libs__3884281235398378542.zip
20/03/30 16:38:19 INFO yarn.Client: Uploading resource file:/hadoop/spark/examples/jars/spark-examples_2.11-2.4.5.jar -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585555727521_0002/spark-examples_2.11-2.4.5.jar
20/03/30 16:38:19 INFO yarn.Client: Uploading resource file:/tmp/spark-aab93a8f-5930-4a9d-808f-e80ec7328fe6/__spark_conf__2114469885015683725.zip -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585555727521_0002/__spark_conf__.zip
20/03/30 16:38:19 INFO spark.SecurityManager: Changing view acls to: root
20/03/30 16:38:19 INFO spark.SecurityManager: Changing modify acls to: root
20/03/30 16:38:19 INFO spark.SecurityManager: Changing view acls groups to: 
20/03/30 16:38:19 INFO spark.SecurityManager: Changing modify acls groups to: 
20/03/30 16:38:19 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
20/03/30 16:38:21 INFO yarn.Client: Submitting application application_1585555727521_0002 to ResourceManager
20/03/30 16:38:21 INFO impl.YarnClientImpl: Submitted application application_1585555727521_0002
20/03/30 16:38:22 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:22 INFO yarn.Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1585557501645
     final status: UNDEFINED
     tracking URL: http://hadoop:8088/proxy/application_1585555727521_0002/
     user: root
20/03/30 16:38:23 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:24 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:25 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:26 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:27 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:28 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:29 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:30 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:31 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:32 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:33 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:34 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:35 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:36 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:36 INFO yarn.Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: hadoop
     ApplicationMaster RPC port: 43325
     queue: default
     start time: 1585557501645
     final status: UNDEFINED
     tracking URL: http://hadoop:8088/proxy/application_1585555727521_0002/
     user: root
20/03/30 16:38:37 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:38 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:39 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:40 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:41 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:42 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:43 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:44 INFO yarn.Client: Application report for application_1585555727521_0002 (state: FINISHED)
20/03/30 16:38:44 INFO yarn.Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: hadoop
     ApplicationMaster RPC port: 43325
     queue: default
     start time: 1585557501645
     final status: SUCCEEDED
     tracking URL: http://hadoop:8088/proxy/application_1585555727521_0002/
     user: root
20/03/30 16:38:44 INFO yarn.Client: Deleted staging directory hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585555727521_0002
20/03/30 16:38:44 INFO util.ShutdownHookManager: Shutdown hook called
20/03/30 16:38:44 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-aab93a8f-5930-4a9d-808f-e80ec7328fe6
20/03/30 16:38:44 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-0d062493-80a1-4624-b9bf-9f95acfb3626

Note that in yarn-cluster mode the result is not printed to the console; it is written to the logs on the Hadoop cluster. To view it, use the tracking URL shown in the output above:
tracking URL: http://hadoop:8088/proxy/application_1585555727521_0002/
Open it in a browser:
(screenshot: YARN application overview page)
Click "Logs":
(screenshot: container log listing)
Select "stdout":
(screenshot: stdout log with the Pi result)
The Pi estimate is printed there.
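If you prefer the command line to clicking through the web UI, the driver's stdout can usually also be pulled with the standard Hadoop `yarn logs` CLI, assuming YARN log aggregation is enabled on the cluster. Since this sketch cannot reach a real cluster, it only composes and prints the command to run, using the application id from the run above:

```shell
# Compose the YARN log-fetch command for the finished application;
# with log aggregation enabled, its output includes the driver's
# stdout, and hence the "Pi is roughly ..." line.
app_id=application_1585555727521_0002
cmd="yarn logs -applicationId $app_id"
echo "$cmd"    # run this on a node with the Hadoop client configured
```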
Finally, a few frequently used commands:

Start Spark:
./sbin/start-all.sh
Start Hadoop together with Spark (starths.sh is a custom wrapper script on this host):
./starths.sh