【Big Data Development and Operations Solutions】Hadoop 2.7.6 + Spark Single-Node Pseudo-Distributed Installation


1. Install Scala, which Spark depends on

1.1 Download and extract Scala

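Download:
click here to download
or pick a release from the official website:
official website link
On the Linux server, create a directory named scala under /usr and upload the downloaded archive into it: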

[root@hadoop opt]# cd /usr/
[root@hadoop usr]# mkdir scala
[root@hadoop usr]# cd scala/
[root@hadoop scala]# pwd
/usr/scala
[root@hadoop scala]# ls
scala-2.12.2.tgz
[root@hadoop scala]# tar -xvf scala-2.12.2.tgz 
...
[root@hadoop scala]# ls
scala-2.12.2  scala-2.12.2.tgz
[root@hadoop scala]# rm -rf *tgz
[root@hadoop scala]# ls
scala-2.12.2
[root@hadoop scala]# cd scala-2.12.2/
[root@hadoop scala-2.12.2]# ls
bin  doc  lib  man
[root@hadoop scala-2.12.2]# pwd
/usr/scala/scala-2.12.2

1.2 Configure environment variables

Edit /etc/profile and add one line:

export SCALA_HOME=/usr/scala/scala-2.12.2

Then append the following to the PATH variable in the same file:

${SCALA_HOME}/bin

With these additions, my /etc/profile looks like this:

export JAVA_HOME=/usr/java/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/hadoop/
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export HIVE_HOME=/hadoop/hive
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export HCAT_HOME=$HIVE_HOME/hcatalog
export HIVE_DEPENDENCY=/hadoop/hive/conf:/hadoop/hive/lib/*:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-server-extensions-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-streaming-2.3.3.jar:/hadoop/hive/lib/hive-exec-2.3.3.jar
export HBASE_HOME=/hadoop/hbase/
export ZOOKEEPER_HOME=/hadoop/zookeeper
export KAFKA_HOME=/hadoop/kafka
export KYLIN_HOME=/hadoop/kylin/
export GGHOME=/hadoop/ogg12
export SCALA_HOME=/usr/scala/scala-2.12.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME:$KAFKA_HOME:$KYLIN_HOME/bin:${SCALA_HOME}/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${HIVE_HOME}/lib:$HBASE_HOME/lib:$KYLIN_HOME/lib

Save and exit, then source the file so the variables take effect:

[root@hadoop ~]# source /etc/profile
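If you repeat this setup, appending to /etc/profile by hand can leave duplicate export lines. Below is a minimal sketch that assumes bash. The helper name add_profile_line is hypothetical. It appends a line only when that exact line is not already in the file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a profile file only if the exact
# line is not already present, so re-running the setup is idempotent.
add_profile_line() {
  local file=$1 line=$2
  # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}
```

For example, run add_profile_line /etc/profile 'export SCALA_HOME=/usr/scala/scala-2.12.2' and then source /etc/profile. The single quotes stop variables such as $PATH from expanding, so they are written to the file literally.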

1.3 Verify Scala

[root@hadoop ~]# scala -version
Scala code runner version 2.12.2 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.
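To check the version from a script instead of by eye, you can parse the banner shown above. The sketch below uses a hypothetical function, scala_version. It merges stderr into the pipe in case your launcher prints the banner there. Also note that the prebuilt spark-2.4.5-bin-hadoop2.7 package bundles its own Scala library. The example jar name spark-examples_2.11-2.4.5.jar indicates that package is built for Scala 2.11. So the separate 2.12.2 install is used only by the scala command itself, and code you compile for this Spark build should target 2.11.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `scala -version` output on stdin and print
# only the version number (nothing if the banner is not found).
scala_version() {
  sed -n 's/^Scala code runner version \([0-9][0-9.]*\).*/\1/p'
}
```

Usage: [ "$(scala -version 2>&1 | scala_version)" = 2.12.2 ] && echo "Scala 2.12.2 found"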

2. Download and extract Spark

2.1 Download Spark

Download:
click here to download
For a multi-node cluster, install Spark on every node, i.e. repeat the steps below on each node.

2.2 Extract Spark

Create a spark directory under /hadoop to hold Spark.

[root@hadoop hadoop]# pwd
/hadoop
[root@hadoop hadoop]# mkdir spark
[root@hadoop hadoop]# cd spark
[root@hadoop spark]# pwd
/hadoop/spark
Upload the installation package to the spark directory (with Xftp, for example):
[root@hadoop spark]# ls
spark-2.4.5-bin-hadoop2.7.tgz
Extract it:
[root@hadoop spark]# tar -xvf spark-2.4.5-bin-hadoop2.7.tgz
[root@hadoop spark]# ls
spark-2.4.5-bin-hadoop2.7  spark-2.4.5-bin-hadoop2.7.tgz
[root@hadoop spark]# rm -rf *tgz
[root@hadoop spark]# ls
spark-2.4.5-bin-hadoop2.7
[root@hadoop spark]# mv spark-2.4.5-bin-hadoop2.7/* .
[root@hadoop spark]# ls
bin  conf  data  examples  jars  kubernetes  LICENSE  licenses  NOTICE  python  R  README.md  RELEASE  sbin  spark-2.4.5-bin-hadoop2.7  yarn
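The listing shows that mv leaves the now-empty spark-2.4.5-bin-hadoop2.7 directory behind; rmdir removes it once it is empty. Alternatively, tar can drop the top-level folder during extraction with --strip-components=1, which both GNU tar and bsdtar support. A sketch, with extract_flat as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract a release tarball directly into DEST,
# dropping the top-level "spark-2.4.5-bin-hadoop2.7/" directory.
extract_flat() {
  local tarball=$1 dest=$2
  mkdir -p "$dest"
  tar -xzf "$tarball" -C "$dest" --strip-components=1
}
```

Usage: extract_flat spark-2.4.5-bin-hadoop2.7.tgz /hadoop/spark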

3. Spark configuration

3.1 Configure environment variables

Edit /etc/profile and add

export SPARK_HOME=/hadoop/spark

Then append the following to the PATH variable in the same file:

${SPARK_HOME}/bin

After these changes, my /etc/profile reads:

export JAVA_HOME=/usr/java/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/hadoop/
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export HIVE_HOME=/hadoop/hive
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export HCAT_HOME=$HIVE_HOME/hcatalog
export HIVE_DEPENDENCY=/hadoop/hive/conf:/hadoop/hive/lib/*:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-server-extensions-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-streaming-2.3.3.jar:/hadoop/hive/lib/hive-exec-2.3.3.jar
export HBASE_HOME=/hadoop/hbase/
export ZOOKEEPER_HOME=/hadoop/zookeeper
export KAFKA_HOME=/hadoop/kafka
export KYLIN_HOME=/hadoop/kylin/
export SPARK_HOME=/hadoop/spark
export GGHOME=/hadoop/ogg12
export SCALA_HOME=/usr/scala/scala-2.12.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME:$KAFKA_HOME:$KYLIN_HOME/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${HIVE_HOME}/lib:$HBASE_HOME/lib:$KYLIN_HOME/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/libjsig.so:$JAVA_HOME/jre/lib/amd64/server/libjvm.so:$JAVA_HOME/jre/lib/amd64/server:$JAVA_HOME/jre/lib/amd64:$GG_HOME:/lib

When you finish editing, run source /etc/profile so the variables take effect.
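After sourcing, a quick sanity check can confirm that each *_HOME variable points to an existing directory. The sketch below uses a hypothetical check_homes function. It does the indirect variable lookup with eval, so it does not rely on bash's ${!name}:

```shell
#!/usr/bin/env bash
# Hypothetical check: every variable named in the arguments must be set
# and must point to an existing directory. Returns 1 if any check fails.
check_homes() {
  local name dir status=0
  for name in "$@"; do
    eval "dir=\${$name:-}"   # indirect lookup of the variable's value
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
      echo "not set or not a directory: $name" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: source /etc/profile && check_homes JAVA_HOME HADOOP_HOME SCALA_HOME SPARK_HOME && command -v spark-submit. The final command prints the path of spark-submit if ${SPARK_HOME}/bin is on PATH.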

3.2 Configure the parameter file

Go to the conf directory:

[root@hadoop conf]# pwd
/hadoop/spark/conf

Copy the template and rename it:

[root@hadoop conf]# cp spark-env.sh.template   spark-env.sh
[root@hadoop conf]# ls 
docker.properties.template  fairscheduler.xml.template  log4j.properties.template  metrics.properties.template  slaves.template  spark-defaults.conf.template  spark-env.sh  spark-env.sh.template

Edit spark-env.sh and add the following settings (adjust the paths to match your environment):

export SCALA_HOME=/usr/scala/scala-2.12.2
export JAVA_HOME=/usr/java/jdk1.8.0_151
export HADOOP_HOME=/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SPARK_HOME=/hadoop/spark
export SPARK_MASTER_IP=192.168.1.66
export SPARK_EXECUTOR_MEMORY=1G

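You do not need to source anything for this file. Spark's launch scripts (sbin/start-all.sh, bin/spark-submit and the others) load conf/spark-env.sh themselves each time they run.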
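In Spark 2.x, spark-env.sh.template documents SPARK_MASTER_HOST as the variable for the master address. SPARK_MASTER_IP is the older, deprecated name. As far as I can tell, sbin/start-master.sh passes --host $(hostname -f) whenever SPARK_MASTER_HOST is empty. That would explain why the worker log in step 4 registers with spark://hadoop:7077 instead of the IP. Below is a sketch of a hypothetical write_spark_env helper. It writes the same settings as above but uses SPARK_MASTER_HOST, and it overwrites any existing spark-env.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write CONF_DIR/spark-env.sh with the settings used in
# this post, using SPARK_MASTER_HOST instead of the deprecated SPARK_MASTER_IP.
# Warning: overwrites an existing spark-env.sh.
write_spark_env() {
  local conf_dir=$1 master_host=$2
  # \${HADOOP_HOME} is escaped so it is written literally, not expanded now
  cat > "$conf_dir/spark-env.sh" <<EOF
export SCALA_HOME=/usr/scala/scala-2.12.2
export JAVA_HOME=/usr/java/jdk1.8.0_151
export HADOOP_HOME=/hadoop
export HADOOP_CONF_DIR=\${HADOOP_HOME}/etc/hadoop
export SPARK_HOME=/hadoop/spark
export SPARK_MASTER_HOST=$master_host
export SPARK_EXECUTOR_MEMORY=1G
EOF
}
```

Usage: write_spark_env /hadoop/spark/conf 192.168.1.66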

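3.3 Create the slaves file

Create a slaves file from the template that ships with Spark. The template already lists localhost, so the single worker runs on this same machine: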

[root@hadoop conf]# pwd
/hadoop/spark/conf
[root@hadoop conf]# cp slaves.template slaves
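Each non-comment line of slaves names one host. start-all.sh logs in to each of these hosts over SSH and launches a worker there, which is why the startup log in step 4 shows "localhost: starting ... Worker". Passwordless SSH is therefore needed, even to localhost. Below is a sketch of a hypothetical list_workers helper that prints the effective host list. It strips # comments and blank lines, roughly the way Spark's sbin/slaves.sh reads the file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the worker hosts from a slaves file,
# removing "#" comments and then any lines left empty or blank.
list_workers() {
  sed -e 's/#.*$//' -e '/^[[:space:]]*$/d' "$1"
}
```

Usage: list_workers /hadoop/spark/conf/slaves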

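4. Start Spark

Here Spark is used together with Hadoop (HDFS, and YARN in step 6), so make sure Hadoop is running before you start Spark.
With Hadoop running, run the following on the master host. In this setup that is the machine named hadoop, which is both the Hadoop NameNode and the Spark master:

cd /hadoop/spark/sbin
./start-all.sh

[root@hadoop sbin]# ./start-all.sh 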
starting org.apache.spark.deploy.master.Master, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop.out
[root@hadoop sbin]# cat /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-hadoop.out
Spark Command: /usr/java/jdk1.8.0_151/bin/java -cp /hadoop/spark/conf/:/hadoop/spark/jars/*:/hadoop/etc/hadoop/ -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop:7077
========================================
20/03/30 16:10:37 INFO worker.Worker: Started daemon with process name: 24171@hadoop
20/03/30 16:10:37 INFO util.SignalUtils: Registered signal handler for TERM
20/03/30 16:10:37 INFO util.SignalUtils: Registered signal handler for HUP
20/03/30 16:10:37 INFO util.SignalUtils: Registered signal handler for INT
20/03/30 16:10:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/30 16:10:38 INFO spark.SecurityManager: Changing view acls to: root
20/03/30 16:10:38 INFO spark.SecurityManager: Changing modify acls to: root
20/03/30 16:10:38 INFO spark.SecurityManager: Changing view acls groups to: 
20/03/30 16:10:38 INFO spark.SecurityManager: Changing modify acls groups to: 
20/03/30 16:10:38 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
20/03/30 16:10:38 INFO util.Utils: Successfully started service 'sparkWorker' on port 33620.
20/03/30 16:10:39 INFO worker.Worker: Starting Spark worker 192.168.1.66:33620 with 8 cores, 4.6 GB RAM
20/03/30 16:10:39 INFO worker.Worker: Running Spark version 2.4.5
20/03/30 16:10:39 INFO worker.Worker: Spark home: /hadoop/spark
20/03/30 16:10:39 INFO util.log: Logging initialized @1841ms
20/03/30 16:10:39 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
20/03/30 16:10:39 INFO server.Server: Started @1919ms
20/03/30 16:10:39 INFO server.AbstractConnector: Started ServerConnector@7a1c1e4a{HTTP/1.1,[http/1.1]}{0.0.0.0:8081}
20/03/30 16:10:39 INFO util.Utils: Successfully started service 'WorkerUI' on port 8081.
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@56674156{/logPage,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2aacb62f{/logPage/json,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@429b256c{/,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@540735e2{/json,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bd5ec83{/static,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5050e7d7{/log,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO ui.WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://hadoop:8081
20/03/30 16:10:39 INFO worker.Worker: Connecting to master hadoop:7077...
20/03/30 16:10:39 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53d7b653{/metrics/json,null,AVAILABLE,@Spark}
20/03/30 16:10:39 INFO client.TransportClientFactory: Successfully created connection to hadoop/192.168.1.66:7077 after 30 ms (0 ms spent in bootstraps)
20/03/30 16:10:39 INFO worker.Worker: Successfully registered with master spark://hadoop:7077

Open the master web UI at http://192.168.1.66:8080/ to check the cluster status.
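Besides the web UI, jps (which ships with the JDK) should list the two standalone daemons as Master and Worker. A sketch with a hypothetical standalone_up function that reads jps output on stdin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the jps output on stdin lists both
# the standalone Master and Worker daemons (second column = class name).
standalone_up() {
  local procs
  procs=$(awk '{print $2}')
  printf '%s\n' "$procs" | grep -qx Master &&
    printf '%s\n' "$procs" | grep -qx Worker
}
```

Usage: jps | standalone_up && echo "Spark master and worker are running"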

5. Run the Pi example that ships with Spark

This step just runs the Pi demo in local mode. Follow the steps below.

[root@hadoop sbin]# cd /hadoop/spark/
[root@hadoop spark]# ./bin/spark-submit  --class  org.apache.spark.examples.SparkPi  --master local   examples/jars/spark-examples_2.11-2.4.5.jar 
20/03/30 16:16:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/30 16:16:26 INFO spark.SparkContext: Running Spark version 2.4.5
20/03/30 16:16:26 INFO spark.SparkContext: Submitted application: Spark Pi
20/03/30 16:16:26 INFO spark.SecurityManager: Changing view acls to: root
20/03/30 16:16:26 INFO spark.SecurityManager: Changing modify acls to: root
20/03/30 16:16:26 INFO spark.SecurityManager: Changing view acls groups to: 
20/03/30 16:16:26 INFO spark.SecurityManager: Changing modify acls groups to: 
20/03/30 16:16:26 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
20/03/30 16:16:27 INFO util.Utils: Successfully started service 'sparkDriver' on port 41623.
20/03/30 16:16:27 INFO spark.SparkEnv: Registering MapOutputTracker
20/03/30 16:16:27 INFO spark.SparkEnv: Registering BlockManagerMaster
20/03/30 16:16:27 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/03/30 16:16:27 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/03/30 16:16:27 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-0af3c940-24b0-4784-8ec7-e5f4935e21f7
20/03/30 16:16:27 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
20/03/30 16:16:27 INFO spark.SparkEnv: Registering OutputCommitCoordinator
20/03/30 16:16:27 INFO util.log: Logging initialized @3646ms
20/03/30 16:16:27 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
20/03/30 16:16:27 INFO server.Server: Started @3776ms
20/03/30 16:16:27 INFO server.AbstractConnector: Started ServerConnector@7bd69e82{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/03/30 16:16:27 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@26d10f2e{/jobs,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5f577419{/jobs/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@28fa700e{/jobs/job,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e041f0c{/jobs/job/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6a175569{/stages,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@11963225{/stages/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f3c966c{/stages/stage,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@61a5b4ae{/stages/stage/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3a71c100{/stages/pool,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5b69fd74{/stages/pool/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@f325091{/storage,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@437e951d{/storage/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@77b325b3{/storage/rdd,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@63a5e46c{/storage/rdd/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7e8e8651{/environment,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49ef32e0{/environment/json,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@271f18d3{/executors,null,AVAILABLE,@Spark}
20/03/30 16:16:27 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6bd51ed8{/executors/json,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@61e3a1fd{/executors/threadDump,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@51abf713{/executors/threadDump/json,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eadb475{/static,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@64b31700{/,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3b65e559{/api,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1c05a54d{/jobs/job/kill,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@65ef722a{/stages/stage/kill,null,AVAILABLE,@Spark}
20/03/30 16:16:28 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://hadoop:4040
20/03/30 16:16:28 INFO spark.SparkContext: Added JAR file:/hadoop/spark/examples/jars/spark-examples_2.11-2.4.5.jar at spark://hadoop:41623/jars/spark-examples_2.11-2.4.5.jar with timestamp 1585556188078
20/03/30 16:16:28 INFO executor.Executor: Starting executor ID driver on host localhost
20/03/30 16:16:28 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39317.
20/03/30 16:16:28 INFO netty.NettyBlockTransferService: Server created on hadoop:39317
20/03/30 16:16:28 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/03/30 16:16:28 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, hadoop, 39317, None)
20/03/30 16:16:28 INFO storage.BlockManagerMasterEndpoint: Registering block manager hadoop:39317 with 366.3 MB RAM, BlockManagerId(driver, hadoop, 39317, None)
20/03/30 16:16:28 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, hadoop, 39317, None)
20/03/30 16:16:28 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, hadoop, 39317, None)
20/03/30 16:16:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@56ba8773{/metrics/json,null,AVAILABLE,@Spark}
20/03/30 16:16:30 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Missing parents: List()
20/03/30 16:16:30 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
20/03/30 16:16:31 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.0 KB, free 366.3 MB)
20/03/30 16:16:32 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1381.0 B, free 366.3 MB)
20/03/30 16:16:32 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop:39317 (size: 1381.0 B, free: 366.3 MB)
20/03/30 16:16:32 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1163
20/03/30 16:16:32 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
20/03/30 16:16:32 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
20/03/30 16:16:32 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7866 bytes)
20/03/30 16:16:32 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
20/03/30 16:16:32 INFO executor.Executor: Fetching spark://hadoop:41623/jars/spark-examples_2.11-2.4.5.jar with timestamp 1585556188078
20/03/30 16:16:32 INFO client.TransportClientFactory: Successfully created connection to hadoop/192.168.1.66:41623 after 83 ms (0 ms spent in bootstraps)
20/03/30 16:16:32 INFO util.Utils: Fetching spark://hadoop:41623/jars/spark-examples_2.11-2.4.5.jar to /tmp/spark-c53497ab-50d7-4f03-a485-07c447462768/userFiles-a7caa3f1-c1cf-4c26-afb6-883d4fd5afb2/fetchFileTemp7331251015287722970.tmp
20/03/30 16:16:32 INFO executor.Executor: Adding file:/tmp/spark-c53497ab-50d7-4f03-a485-07c447462768/userFiles-a7caa3f1-c1cf-4c26-afb6-883d4fd5afb2/spark-examples_2.11-2.4.5.jar to class loader
20/03/30 16:16:32 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 824 bytes result sent to driver
20/03/30 16:16:32 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7866 bytes)
20/03/30 16:16:32 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
20/03/30 16:16:32 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 603 ms on localhost (executor driver) (1/2)
20/03/30 16:16:32 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 824 bytes result sent to driver
20/03/30 16:16:32 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 32 ms on localhost (executor driver) (2/2)
20/03/30 16:16:32 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
20/03/30 16:16:32 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 2.083 s
20/03/30 16:16:32 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 2.207309 s
Pi is roughly 3.137355686778434
20/03/30 16:16:32 INFO server.AbstractConnector: Stopped Spark@7bd69e82{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/03/30 16:16:32 INFO ui.SparkUI: Stopped Spark web UI at http://hadoop:4040
20/03/30 16:16:32 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/03/30 16:16:32 INFO memory.MemoryStore: MemoryStore cleared
20/03/30 16:16:32 INFO storage.BlockManager: BlockManager stopped
20/03/30 16:16:32 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
20/03/30 16:16:32 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/03/30 16:16:33 INFO spark.SparkContext: Successfully stopped SparkContext
20/03/30 16:16:33 INFO util.ShutdownHookManager: Shutdown hook called
20/03/30 16:16:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-c53497ab-50d7-4f03-a485-07c447462768
20/03/30 16:16:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-27cddb1e-18a9-4066-b529-231a4a4bc936

The output contains the line Pi is roughly 3.137355686778434, which is the estimate of π.
That run used single-machine local mode. To run the demo on the cluster, read on.
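Spark's console logging goes to stderr by default, so discarding stderr leaves only the program's own output. The sketch below uses a hypothetical pi_value helper that keeps just the number. The optional trailing argument to SparkPi sets the number of partitions. It defaults to 2, which matches "2 output partitions" in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the estimate from SparkPi's
# "Pi is roughly <number>" line on stdin.
pi_value() {
  sed -n 's/^Pi is roughly \([0-9.]*\)$/\1/p'
}
```

Usage: ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.4.5.jar 10 2>/dev/null | pi_value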

6. Run the example in yarn-cluster mode
Go to the Spark installation directory and run the following command to submit the Pi demo in yarn-cluster mode:

[root@hadoop spark]# pwd
/hadoop/spark
[root@hadoop spark]# ./bin/spark-submit  --class  org.apache.spark.examples.SparkPi  --master  yarn-cluster   examples/jars/spark-examples_2.11-2.4.5.jar 
Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
20/03/30 16:38:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/30 16:38:10 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
20/03/30 16:38:11 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
20/03/30 16:38:11 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
20/03/30 16:38:11 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
20/03/30 16:38:11 INFO yarn.Client: Setting up container launch context for our AM
20/03/30 16:38:11 INFO yarn.Client: Setting up the launch environment for our AM container
20/03/30 16:38:11 INFO yarn.Client: Preparing resources for our AM container
20/03/30 16:38:12 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
20/03/30 16:38:14 INFO yarn.Client: Uploading resource file:/tmp/spark-aab93a8f-5930-4a9d-808f-e80ec7328fe6/__spark_libs__3884281235398378542.zip -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585555727521_0002/__spark_libs__3884281235398378542.zip
20/03/30 16:38:19 INFO yarn.Client: Uploading resource file:/hadoop/spark/examples/jars/spark-examples_2.11-2.4.5.jar -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585555727521_0002/spark-examples_2.11-2.4.5.jar
20/03/30 16:38:19 INFO yarn.Client: Uploading resource file:/tmp/spark-aab93a8f-5930-4a9d-808f-e80ec7328fe6/__spark_conf__2114469885015683725.zip -> hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585555727521_0002/__spark_conf__.zip
20/03/30 16:38:19 INFO spark.SecurityManager: Changing view acls to: root
20/03/30 16:38:19 INFO spark.SecurityManager: Changing modify acls to: root
20/03/30 16:38:19 INFO spark.SecurityManager: Changing view acls groups to: 
20/03/30 16:38:19 INFO spark.SecurityManager: Changing modify acls groups to: 
20/03/30 16:38:19 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
20/03/30 16:38:21 INFO yarn.Client: Submitting application application_1585555727521_0002 to ResourceManager
20/03/30 16:38:21 INFO impl.YarnClientImpl: Submitted application application_1585555727521_0002
20/03/30 16:38:22 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:22 INFO yarn.Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1585557501645
     final status: UNDEFINED
     tracking URL: http://hadoop:8088/proxy/application_1585555727521_0002/
     user: root
20/03/30 16:38:23 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:24 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:25 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:26 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:27 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:28 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:29 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:30 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:31 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:32 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:33 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:34 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:35 INFO yarn.Client: Application report for application_1585555727521_0002 (state: ACCEPTED)
20/03/30 16:38:36 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:36 INFO yarn.Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: hadoop
     ApplicationMaster RPC port: 43325
     queue: default
     start time: 1585557501645
     final status: UNDEFINED
     tracking URL: http://hadoop:8088/proxy/application_1585555727521_0002/
     user: root
20/03/30 16:38:37 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:38 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:39 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:40 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:41 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:42 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:43 INFO yarn.Client: Application report for application_1585555727521_0002 (state: RUNNING)
20/03/30 16:38:44 INFO yarn.Client: Application report for application_1585555727521_0002 (state: FINISHED)
20/03/30 16:38:44 INFO yarn.Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: hadoop
     ApplicationMaster RPC port: 43325
     queue: default
     start time: 1585557501645
     final status: SUCCEEDED
     tracking URL: http://hadoop:8088/proxy/application_1585555727521_0002/
     user: root
20/03/30 16:38:44 INFO yarn.Client: Deleted staging directory hdfs://192.168.1.66:9000/user/root/.sparkStaging/application_1585555727521_0002
20/03/30 16:38:44 INFO util.ShutdownHookManager: Shutdown hook called
20/03/30 16:38:44 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-aab93a8f-5930-4a9d-808f-e80ec7328fe6
20/03/30 16:38:44 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-0d062493-80a1-4624-b9bf-9f95acfb3626
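The warning at the top of this run says the yarn-cluster master is deprecated since Spark 2.0. The equivalent non-deprecated form is --master yarn --deploy-mode cluster. The sketch below wraps it in a hypothetical submit_pi_yarn function. Like the command above, it assumes HADOOP_CONF_DIR points to the Hadoop configuration so that spark-submit can find the ResourceManager.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: submit the bundled SparkPi example to YARN in
# cluster mode (the non-deprecated spelling of "--master yarn-cluster").
# Extra arguments (e.g. the partition count) are passed on to SparkPi.
submit_pi_yarn() {
  local home=${SPARK_HOME:?SPARK_HOME is not set}
  "$home/bin/spark-submit" \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    "$home/examples/jars/spark-examples_2.11-2.4.5.jar" "$@"
}
```

Usage: submit_pi_yarn 10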

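Note that in yarn-cluster mode the result is not printed to the console. The driver runs inside the YARN ApplicationMaster container, so its output goes to that container's logs on the Hadoop cluster. To see the result, use the address from the output above:
tracking URL: http://hadoop:8088/proxy/application_1585555727521_0002/
Open it, click logs, then choose stdout. The value of π is printed there.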
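If YARN log aggregation is enabled (yarn.log-aggregation-enable=true in yarn-site.xml, which this post does not show), you can also fetch the same stdout from the command line with yarn logs -applicationId. The sketch below uses a hypothetical app_id helper to pick the application ID out of spark-submit's client log:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first YARN application ID
# (application_<clusterTimestamp>_<sequence>) found on stdin.
app_id() {
  grep -oE 'application_[0-9]+_[0-9]+' | head -n 1
}
```

Usage: id=$(./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster examples/jars/spark-examples_2.11-2.4.5.jar 2>&1 | app_id), then yarn logs -applicationId "$id" | grep 'Pi is roughly'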
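A few commonly used commands:

Start Spark (from /hadoop/spark):
./sbin/start-all.sh
Start both Hadoop and Spark (starths.sh is a custom script, not part of Hadoop or Spark):
./starths.sh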
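The contents of starths.sh are not shown. Below is a hypothetical sketch of what such a script might do, assuming the HADOOP_HOME and SPARK_HOME values from /etc/profile. It calls Spark's start-all.sh by its full path because $HADOOP_HOME/sbin is on PATH and contains a Hadoop start-all.sh with the same name. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a combined start script (not the author's starths.sh).
# Assumes HADOOP_HOME and SPARK_HOME are exported as in /etc/profile.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

start_hadoop_and_spark() {
  run "$HADOOP_HOME/sbin/start-dfs.sh"    # NameNode, DataNode, SecondaryNameNode
  run "$HADOOP_HOME/sbin/start-yarn.sh"   # ResourceManager, NodeManager
  # Full path on purpose: a bare "start-all.sh" would resolve to Hadoop's script
  run "$SPARK_HOME/sbin/start-all.sh"     # Spark standalone Master and Worker
}
```

Usage: start_hadoop_and_spark (or DRY_RUN=1 start_hadoop_and_spark to preview the commands)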