1. Install Scala (required by Spark)
1.1 Download and extract Scala
Download the Scala 2.12.2 archive (scala-2.12.2.tgz), either from a mirror or directly from the official Scala website.
On the Linux server, create a directory named scala under /usr and upload the downloaded archive into it:
[root@hadoop opt]# cd /usr/
[root@hadoop usr]# mkdir scala
[root@hadoop usr]# cd scala/
[root@hadoop scala]# pwd
/usr/scala
[root@hadoop scala]# ls
scala-2.12.2.tgz
[root@hadoop scala]# tar -xvf scala-2.12.2.tgz
...
[root@hadoop scala]# ls
scala-2.12.2 scala-2.12.2.tgz
[root@hadoop scala]# rm -rf *tgz
[root@hadoop scala]# ls
scala-2.12.2
[root@hadoop scala]# cd scala-2.12.2/
[root@hadoop scala-2.12.2]# ls
bin doc lib man
[root@hadoop scala-2.12.2]# pwd
/usr/scala/scala-2.12.2
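Alternatively, if the server has direct internet access, the download and extraction can be done in one pass on the server itself. This is only a sketch; the mirror URL is an assumption, so verify it on the official Scala site before using it:
cd /usr/scala
# download the 2.12.2 archive (URL assumed; check scala-lang.org for the current location)
wget https://downloads.lightbend.com/scala/2.12.2/scala-2.12.2.tgz
tar -xzf scala-2.12.2.tgz     # GNU tar also auto-detects gzip with plain -xvf, as used above
rm -f scala-2.12.2.tgz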
1.2 Configure environment variables
Edit the /etc/profile file and add the following line:
export SCALA_HOME=/usr/scala/scala-2.12.2
Then append the following to the PATH variable in the same file:
${SCALA_HOME}/bin
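If you prefer to make these changes from the shell rather than an editor, a minimal sketch (run as root; it appends a separate PATH line, which has the same effect as editing the existing one):
cat >> /etc/profile <<'EOF'
export SCALA_HOME=/usr/scala/scala-2.12.2
export PATH=$PATH:${SCALA_HOME}/bin
EOF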
After these additions, my /etc/profile looks like this:
export JAVA_HOME=/usr/java/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/hadoop/
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export HIVE_HOME=/hadoop/hive
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export HCAT_HOME=$HIVE_HOME/hcatalog
export HIVE_DEPENDENCY=/hadoop/hive/conf:/hadoop/hive/lib/*:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-server-extensions-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-streaming-2.3.3.jar:/hadoop/hive/lib/hive-exec-2.3.3.jar
export HBASE_HOME=/hadoop/hbase/
export ZOOKEEPER_HOME=/hadoop/zookeeper
export KAFKA_HOME=/hadoop/kafka
export KYLIN_HOME=/hadoop/kylin/
export GGHOME=/hadoop/ogg12
export SCALA_HOME=/usr/scala/scala-2.12.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME:$KAFKA_HOME:$KYLIN_HOME/bin:${SCALA_HOME}/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${HIVE_HOME}/lib:$HBASE_HOME/lib:$KYLIN_HOME/lib
Save and exit, then source the file so the new environment variables take effect:
[root@hadoop ~]# source /etc/profile
1.3 Verify Scala
[root@hadoop ~]# scala -version
Scala code runner version 2.12.2 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.
2. Download and extract Spark
2.1 Download Spark
Download the spark-2.4.5-bin-hadoop2.7 package from the official Apache Spark download page.
If your cluster has multiple nodes, install Spark on every node, i.e. repeat the steps below on each of them.
2.2 Extract Spark
Create a spark directory under /hadoop to hold the Spark installation.
[root@hadoop hadoop]# pwd
/hadoop
[root@hadoop hadoop]# mkdir spark
[root@hadoop spark]# pwd
/hadoop/spark
Upload the installation package to the spark directory via Xftp:
[root@hadoop spark]# ls
spark-2.4.5-bin-hadoop2.7.tgz
Extract it:
[root@hadoop spark]# tar -xvf spark-2.4.5-bin-hadoop2.7.tgz
[root@hadoop spark]# ls
spark-2.4.5-bin-hadoop2.7 spark-2.4.5-bin-hadoop2.7.tgz
[root@hadoop spark]# rm -rf *tgz
[root@hadoop spark]# ls
spark-2.4.5-bin-hadoop2.7
[root@hadoop spark]# mv spark-2.4.5-bin-hadoop2.7/* .
[root@hadoop spark]# ls
bin conf data examples jars kubernetes LICENSE licenses NOTICE python R README.md RELEASE sbin spark-2.4.5-bin-hadoop2.7 yarn
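For reference, the same result can be reached entirely from the command line; this is only a sketch, and the Apache archive URL is an assumption to verify before use. The --strip-components=1 option unpacks the contents directly into /hadoop/spark, which replaces the mv step above:
cd /hadoop/spark
wget https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
tar -xzf spark-2.4.5-bin-hadoop2.7.tgz --strip-components=1
rm -f spark-2.4.5-bin-hadoop2.7.tgz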
3. Spark configuration
3.1 Configure environment variables
Edit the /etc/profile file and add:
export SPARK_HOME=/hadoop/spark
After adding the variable above, append the following to the PATH variable in the same file:
${SPARK_HOME}/bin
After the changes, my /etc/profile contains:
export JAVA_HOME=/usr/java/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/hadoop/
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export HIVE_HOME=/hadoop/hive
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export HCAT_HOME=$HIVE_HOME/hcatalog
export HIVE_DEPENDENCY=/hadoop/hive/conf:/hadoop/hive/lib/*:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-server-extensions-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-streaming-2.3.3.jar:/hadoop/hive/lib/hive-exec-2.3.3.jar
export HBASE_HOME=/hadoop/hbase/
export ZOOKEEPER_HOME=/hadoop/zookeeper
export KAFKA_HOME=/hadoop/kafka
export KYLIN_HOME=/hadoop/kylin/
export SPARK_HOME=/hadoop/spark
export GGHOME=/hadoop/ogg12
export SCALA_HOME=/usr/scala/scala-2.12.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME:$KAFKA_HOME:$KYLIN_HOME/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${HIVE_HOME}/lib:$HBASE_HOME/lib:$KYLIN_HOME/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/libjsig.so:$JAVA_HOME/jre/lib/amd64/server/libjvm.so:$JAVA_HOME/jre/lib/amd64/server:$JAVA_HOME/jre/lib/amd64:$GG_HOME:/lib
When you have finished editing, run source /etc/profile to make the environment variables take effect.
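A quick sanity check that the new PATH entry works (expected behavior, not a captured transcript):
which spark-submit        # should resolve to /hadoop/spark/bin/spark-submit
spark-submit --version    # should report Spark 2.4.5; note the prebuilt package is compiled against Scala 2.11, independent of the Scala 2.12.2 installed above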
3.2 Configure the parameter file
Go to the conf directory:
[root@hadoop conf]# pwd
/hadoop/spark/conf
Copy the template configuration file and rename it:
[root@hadoop conf]# cp spark-env.sh.template spark-env.sh
[root@hadoop conf]# ls
docker.properties.template fairscheduler.xml.template log4j.properties.template metrics.properties.template slaves.template spark-defaults.conf.template spark-env.sh spark-env.sh.template
Edit the spark-env.sh file and add the following settings (use the paths that apply to your own environment):
export SCALA_HOME=/usr/scala/scala-2.12.2
export JAVA_HOME=/usr/java/jdk1.8.0_151
export HADOOP_HOME=/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SPARK_HOME=/hadoop/spark
export SPARK_MASTER_IP=192.168.1.66
export SPARK_EXECUTOR_MEMORY=1G
Run source /etc/profile so the changes take effect.
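One caveat on the settings above: in the Spark 2.x documentation the master address variable is spelled SPARK_MASTER_HOST, with SPARK_MASTER_IP being the older name, which may or may not still be honored by your startup scripts. If the worker fails to reach the master, try this variant of that line in spark-env.sh:
export SPARK_MASTER_HOST=192.168.1.66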
3.3 Create the slaves file
Create a slaves file from the template that Spark provides:
[root@hadoop conf]# pwd
/hadoop/spark/conf
[root@hadoop conf]# cp slaves.template slaves
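The slaves file simply lists the worker hosts, one hostname per line; for this single-node setup the default localhost entry is enough. A purely illustrative example for a hypothetical three-node cluster:
# /hadoop/spark/conf/slaves - one worker host per line
worker1
worker2
worker3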
4. Start Spark
Because Spark relies on the distributed file system provided by Hadoop, make sure Hadoop is running normally before starting Spark.
With Hadoop running, execute the following on the master node (here the host hadoop, which is also the Hadoop NameNode and the Spark master):
cd /hadoop/spark/sbin
./start-all.sh
[root@hadoop sbin]# ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop.out
[root@hadoop sbin]# cat /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop.out
Spark Command: /usr/java/jdk1.8.0_151/bin/java -cp /hadoop/spark/conf/:/hadoop/spark/jars/*:/hadoop/etc/hadoop/ -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop:7077
========================================
20/03/30 16:10:37 INFO worker.Worker: Started daemon with process name: 24171@hadoop
20/03/30 16:10:37 INFO util.SignalUtils: Registered signal handler for TERM
20/03/30 16:10:37 INFO util.SignalUtils: Registered signal handler for HUP
20/03/30 16:10:37 INFO util.SignalUtils: Registered signal handler for INT
20/03/30 16:10:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/30 16:10:38 INFO spark.SecurityManager: Changing view acls to: root
20/03/30 16:10:38 INFO spark.SecurityManager: Changing modify acls to: root
20/03/30 16:10:38 INFO spark.SecurityManager: Changing view acls groups to:
20/03/30 16:10:38 INFO spark.SecurityManager: Changing modify acls groups to:
20/03/30 16:10:38 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
20/03/30 16:10:38 INFO util.Utils: Successfully started service 'sparkWorker' on port 33620.
20/03/30 16:10:39 INFO worker.Worker: Starting Spark worker 192.168.1.66:33620 with 8 cores, 4.6 GB RAM
20/03/30 16:10:39 INFO worker.Worker: Running Spark version 2.4.5
20/03/30 16:10:39 INFO worker.Worker: Spark home: /hadoop/spark
20/03/30 16:10:39 INFO util.log: Logging initialized @1841ms
20/03/30 16:10:39 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
20/03/30 16:10:39 INFO server.Server: Started @1919ms
20/03/30 16:10:39 INFO server.AbstractConnector: Started ServerConnector@7a1c1e4a{HTTP/1.1,[http/1.1]}{0.0.0.0:8081}
20/03/30 16:10:39 INFO util.Utils: Successfully started service 'WorkerUI' on port 8081.
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@56674156{/logPage,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2aacb62f{/logPage/json,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@429b256c{/,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@540735e2{/json,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bd5ec83{/static,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5050e7d7{/log,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO ui.WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://hadoop:8081
20/03/30 16:10:39 INFO worker.Worker: Connecting to master hadoop:7077...
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53d7b653{/metrics/json,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO client.TransportClientFactory: Successfully created connection to hadoop/192.168.1.66:7077 after 30 ms (0 ms spent in bootstraps)
20/03/30 16:10:39 INFO worker.Worker: Successfully registered with master spark://hadoop:7077
Visit the web UI at http://192.168.1.66:8080/.
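Besides the web UI, you can confirm that both daemons are running with jps (the process IDs will differ on your machine):
jps | egrep 'Master|Worker'
# expect one Master line and one Worker line among your Java processes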
5. Run the Pi example that ships with Spark
Here we simply run the Pi demo in local mode; follow the steps below.
[root@hadoop sbin]# cd /hadoop/spark/
[root@hadoop spark]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.4.5.jar
20/03/30 16:16:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/30 16:16:26 INFO spark.SparkContext: Running Spark version 2.4.5
20/03/30 16:16:26 INFO spark.SparkContext: Submitted application: Spark Pi
20/03/30 16:16:26 INFO spark.SecurityManager: Changing view acls to: root
20/03/30 16:16:26 INFO spark.SecurityManager: Changing modify acls to: root
20/03/30 16:16:26 INFO spark.SecurityManager: Changing view acls groups to:
20/03/30 16:16:26 INFO spark.SecurityManager: Changing modify acls groups to:
20/03/30 16:16:26 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
20/03/30 16:16:27 INFO util.Utils: Successfully started service 'sparkDriver' on port 41623.
20/03/30 16:16:27 INFO spark.SparkEnv: Registering MapOutputTracker
20/03/30 16:16:27 INFO spark.SparkEnv: Registering BlockManagerMaster
20/03/30 16:16:27 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/03/30 16:16:27 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/03/30 16:16:27 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-0af3c940-24b0-4784-8ec7-e5f4935e21f7
20/03/30 16:16:27 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
20/03/30 16:16:27 INFO spark.SparkEnv: Registering OutputCommitCoordinator
20/03/30 16:16:27 INFO util.log: Logging initialized @3646ms
20/03/30 16:16:27 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
20/03/30 16:16:27 INFO server.Server: Started @3776ms
20/03/30 16:16:27 INFO server.AbstractConnector: Started ServerConnector@7bd69e82{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/03/30 16:16:27 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@26d10f2e{/jobs,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5f577419{/jobs/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@28fa700e{/jobs/job,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e041f0c{/jobs/job/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6a175569{/stages,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@11963225{/stages/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f3c966c{/stages/stage,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@61a5b4ae{/stages/stage/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3a71c100{/stages/pool,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5b69fd74{/stages/pool/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@f325091{/storage,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@437e951d{/storage/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@77b325b3{/storage/rdd,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@63a5e46c{/storage/rdd/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7e8e8651{/environment,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49ef32e0{/environment/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@271f18d3{/executors,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6bd51ed8{/executors/json,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@61e3a1fd{/executors/threadDump,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@51abf713{/executors/threadDump/json,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eadb475{/static,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@64b31700{/,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3b65e559{/api,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1c05a54d{/jobs/job/kill,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@65ef722a{/stages/stage/kill,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://hadoop:4040
20/03/30 16:16:28 INFO spark.SparkContext: Added JAR file:/hadoop/spark/examples/jars/spark-examples_2.11-2.4.5.jar at spark://hadoop:41623/jars/spark-examples_2.11-2.4.5.jar with timestamp 1585556188078
20/03/30 16:16:28 INFO executor.Executor: Starting executor ID driver on host localhost
20/03/30 16:16:28 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39317.
20/03/30 16:16:28 INFO netty.NettyBlockTransferService: Server created on hadoop:39317
20/03/30 16:16:28 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/03/30 16:16:28 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, hadoop, 39317, None)
20/03/30 16:16:28 INFO storage.BlockManagerMasterEndpoint: Registering block manager hadoop:39317 with 366.3 MB RAM, BlockManagerId(driver, hadoop, 39317, None)
20/03/30 16:16:28 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, hadoop, 39317, None)
20/03/30 16:16:28 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, hadoop, 39317, None)
20/03/30 16:16:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@56ba8773{/metrics/json,null,AVAILABLE,@Spark}
20/03/30 16:16:30 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Missing parents: List()
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
20/03/30 16:16:31 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.0 KB, free 366.3 MB)
20/03/30 16:16:32 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1381.0 B, free 366.3 MB)
20/03/30 16:16:32 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop:39317 (size: 1381.0 B, free: 366.3 MB)
20/03/30 16:16:32 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1163
20/03/30 16:16:32 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
20/03/30 16:16:32 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
20/03/30 16:16:32 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7866 bytes)
20/03/30 16:16:32 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
20/03/30 16:16:32 INFO executor.Executor: Fetching spark://hadoop:41623/jars/spark-examples_2.11-2.4.5.jar with timestamp 1585556188078
20/03/30 16:16:32 INFO client.TransportClientFactory: Successfully created connection to hadoop/192.168.1.66:41623 after 83 ms (0 ms spent in bootstraps)
20/03/30 16:16:32 INFO util.Utils: Fetching spark://hadoop:41623/jars/spark-examples_2.11-2.4.5.jar to /tmp/spark-c53497ab-50d7-4f03-a485-07c447462768/userFiles-a7caa3f1-c1cf-4c26-afb6-883d4fd5afb2/fetchFileTemp7331251015287722970.tmp
20/03/30 16:16:32 INFO executor.Executor: Adding file:/tmp/spark-c53497ab-50d7-4f03-a485-07c447462768/userFiles-a7caa3f1-c1cf-4c26-afb6-883d4fd5afb2/spark-examples_2.11-2.4.5.jar to class loader
20/03/30 16:16:32 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 824 bytes result sent to driver
20/03/30 16:16:32 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7866 bytes)
20/03/30 16:16:32 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
20/03/30 16:16:32 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 603 ms on localhost (executor driver) (1/2)
20/03/30 16:16:32 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 824 bytes result sent to driver
20/03/30 16:16:32 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 32 ms on localhost (executor driver) (2/2)
20/03/30 16:16:32 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
20/03/30 16:16:32 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 2.083 s
20/03/30 16:16:32 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 2.207309 s
Pi is roughly 3.137355686778434
20/03/30 16:16:32 INFO server.AbstractConnector: Stopped Spark@7bd69e82{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/03/30 16:16:32 INFO ui.SparkUI: Stopped Spark web UI at http://hadoop:4040
20/03/30 16:16:32 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/03/30 16:16:32 INFO memory.MemoryStore: MemoryStore cleared
20/03/30 16:16:32 INFO storage.BlockManager: BlockManager stopped
20/03/30 16:16:32 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
20/03/30 16:16:32 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/03/30 16:16:33 INFO spark.SparkContext: Successfully stopped SparkContext
20/03/30 16:16:33 INFO util.ShutdownHookManager: Shutdown hook called
20/03/30 16:16:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-c53497ab-50d7-4f03-a485-07c447462768
20/03/30 16:16:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-27cddb1e-18a9-4066-b529-231a4a4bc936
You can see the output: Pi is roughly 3.137355686778434
The approximate value of Pi has been printed.
The above only ran the demo in single-machine local mode; to run it in cluster mode, read on.
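Before switching to YARN, note that the same demo can also be submitted to the standalone master started in section 4; a sketch, assuming the master is still listening at spark://hadoop:7077:
cd /hadoop/spark
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://hadoop:7077 examples/jars/spark-examples_2.11-2.4.5.jar
# in the default client deploy mode the driver runs locally, so "Pi is roughly ..." is printed on this console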
6. Run the program in yarn-cluster mode
Go to the Spark installation directory and run the Pi demo in yarn-cluster mode:
[root@hadoop spark]# pwd
/hadoop/spark
[root@hadoop spark]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster examples/jars/spark-examples_2.11-2.4.5.jar
Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
20/03/30 16:38:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/30 16:38:10 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
20/03/30 16:38:11 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
20/03/30 16:38:11 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
20/03/30 16:38:11 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
20/03/30 16:38:11 INFO yarn.Client: Setting up container launch context for our AM
20/03/30 16:38:11 INFO yarn.Client: Setting up the launch environment for our AM container
20/03/30 16:38:11 INFO yarn.Client: Preparing resources for our AM container
20/03/30 16:38:12 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
20/03/30 16:38:14 INFO yarn.Client: Uploading resource file:/tmp/spark-aab93a8f-5930-4a9d-808f-e80ec7328fe6/__spark_libs__3884281235398378542.zip -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585555727521_0002/__spark_libs__3884281235398378542.zip
20/03/30 16:38:19 INFO yarn.Client: Uploading resource file:/hadoop/spark/examples/jars/spark-examples_2.11-2.4.5.jar -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585555727521_0002/spark-examples_2.11-2.4.5.jar
20/03/30 16:38:19 INFO yarn.Client: Uploading resource file:/tmp/spark-aab93a8f-5930-4a9d-808f-e80ec7328fe6/__spark_conf__2114469885015683725.zip -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585555727521_0002/__spark_conf__.zip
20/03/30 16:38:19 INFO spark.SecurityManager: Changing view acls to: root
20/03/30 16:38:19 INFO spark.SecurityManager: Changing modify acls to: root
20/03/30 16:38:19 INFO spark.SecurityManager: Changing view acls groups to:
20/03/30 16:38:19 INFO spark.SecurityManager: Changing modify acls groups to:
20/03/30 16:38:19 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
20/03/30 16:38:21 INFO yarn.Client: Submitting application application_1585555727521_0002 to ResourceManager
20/03/30 16:38:21 INFO impl.YarnClientImpl: Submitted application application_1585555727521_0002
20/03/30 16:38:22 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:22 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1585557501645
final status: UNDEFINED
tracking URL: http://hadoop:8088/proxy/application_1585555727521_0002/
user: root
20/03/30 16:38:23 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:24 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:25 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:26 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:27 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:28 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:29 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:30 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:31 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:32 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:33 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:34 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:35 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:36 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:36 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: hadoop
ApplicationMaster RPC port: 43325
queue: default
start time: 1585557501645
final status: UNDEFINED
tracking URL: http://hadoop:8088/proxy/application_1585555727521_0002/
user: root
20/03/30 16:38:37 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:38 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:39 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:40 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:41 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:42 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:43 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:44 INFO yarn.Client: Application report for application_1585555727521_0002 (state: FINISHED)
20/03/30 16:38:44 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: hadoop
ApplicationMaster RPC port: 43325
queue: default
start time: 1585557501645
final status: SUCCEEDED
tracking URL: http://hadoop:8088/proxy/application_1585555727521_0002/
user: root
20/03/30 16:38:44 INFO yarn.Client: Deleted staging directory hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585555727521_0002
20/03/30 16:38:44 INFO util.ShutdownHookManager: Shutdown hook called
20/03/30 16:38:44 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-aab93a8f-5930-4a9d-808f-e80ec7328fe6
20/03/30 16:38:44 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-0d062493-80a1-4624-b9bf-9f95acfb3626
Note that in yarn-cluster mode the result is not printed to the console; it is written to the logs on the Hadoop cluster. To view the result, note the tracking URL in the output above:
tracking URL: http://hadoop:8088/proxy/application_1585555727521_0002/
Open that URL in a browser, click Logs, then select stdout:
The Pi result has been printed there.
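If you prefer the command line to the web UI, the same stdout can be retrieved with the YARN log CLI, assuming log aggregation is enabled on the cluster (the application id is the one from this run):
yarn logs -applicationId application_1585555727521_0002 | grep "Pi is roughly"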
Finally, here are a few frequently used commands:
Start Spark:
./sbin/start-all.sh
Start Hadoop and Spark:
./starths.sh
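starths.sh is a custom wrapper script, not something shipped with Hadoop or Spark; a minimal sketch of what such a script could contain for the layout used in this guide:
#!/bin/bash
# start HDFS and YARN first, then the Spark standalone daemons
/hadoop/sbin/start-dfs.sh
/hadoop/sbin/start-yarn.sh
/hadoop/spark/sbin/start-all.sh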