The most important part of the previous article was getting the Java and Hadoop environment variables right; with those in place, running Hadoop rarely causes problems. Installing Spark on top of Hadoop is quite simple.
Spark needs a Hadoop cluster underneath it, and since Hadoop is already installed, I install Spark directly on the existing Hadoop cluster.
Hardware environment:
hddcluster1 10.0.0.197 redhat7
hddcluster2 10.0.0.228 centos7 (this node serves as the master)
hddcluster3 10.0.0.202 redhat7
hddcluster4 10.0.0.181 centos7
Software environment:
scala-2.11.7
spark-2.0.2-bin-hadoop2.7.tgz
# All operations below are performed as the hadoop user
Basic steps:
1. On the master, extract scala-2.11.7 and spark-2.0.2-bin-hadoop2.7.tgz into the appropriate directories
2. Configure the scala and spark environment variables
3. Edit the configuration files
4. Copy scala and spark to every node and set ownership
5. Start the Spark cluster
# As the hadoop user: download and install scala
wget http://downloads.lightbend.com/scala/2.11.7/scala-2.11.7.tgz
tar -zxvf scala-2.11.7.tgz
sudo mv scala-2.11.7 /usr/local/scala
vim .bashrc
# Add the following
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SCALA_HOME/bin
source .bashrc
[hadoop@hddcluster2 ~]$ scala -version
Scala code runner version 2.11.7 -- Copyright 2002-2013, LAMP/EPFL
# Download spark-2.0.2-bin-hadoop2.7.tgz from the official site
tar -zxvf spark-2.0.2-bin-hadoop2.7.tgz
mv spark-2.0.2-bin-hadoop2.7 spark
sudo mv spark /usr/local/
vim .bashrc
# Add the following
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME
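To confirm the new variables take effect, a quick check along these lines can help (a minimal sketch; it assumes the .bashrc entries above have already been added):
source ~/.bashrc
# print the Spark install path and let spark-submit report its version
echo $SPARK_HOME
$SPARK_HOME/bin/spark-submit --version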
# Edit the Spark configuration files
cd /usr/local/spark/conf
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
# Add the following
### jdk dir
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64
### scala dir
export SCALA_HOME=/usr/local/scala
### the ip of master node of spark
export SPARK_MASTER_IP=10.0.0.228
### the max memory size of worker
export SPARK_WORKER_MEMORY=8G
### hadoop configuration file dir
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop/
# Edit slaves
cp slaves.template slaves
vi slaves
# Replace localhost with the following
hddcluster1
hddcluster2
hddcluster3
hddcluster4
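Spark's start scripts log in to every host listed in slaves over SSH, so it is worth confirming passwordless SSH from the master before going further (a minimal sketch, using the hostnames above):
for h in hddcluster1 hddcluster2 hddcluster3 hddcluster4; do
    ssh $h hostname    # should print each node's hostname without prompting for a password
done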
# Package /usr/local/spark and /usr/local/scala, then copy them to the slave nodes
cd /usr/local
tar -zcf ~/master.spark.tar.gz ./spark
tar -zcf ~/master.scala.tar.gz ./scala
scp master.spark.tar.gz hddcluster1:~
scp master.scala.tar.gz hddcluster1:~
# Log in to each node and extract into /usr/local
tar -zxf master.spark.tar.gz -C /usr/local/
tar -zxf master.scala.tar.gz -C /usr/local/
chown -R hadoop:hadoop /usr/local/spark
chown -R hadoop:hadoop /usr/local/scala
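The scp commands above only show hddcluster1; the same copy-and-extract steps have to be repeated on the other slave nodes. A small loop like the following can save some typing (a sketch only; it assumes passwordless SSH and that the hadoop user may write to /usr/local on each node, as in the manual steps above; hddcluster2 is skipped because spark and scala are already in place on the master):
for h in hddcluster1 hddcluster3 hddcluster4; do
    scp ~/master.spark.tar.gz ~/master.scala.tar.gz $h:~
    ssh $h "tar -zxf ~/master.spark.tar.gz -C /usr/local/ \
            && tar -zxf ~/master.scala.tar.gz -C /usr/local/ \
            && chown -R hadoop:hadoop /usr/local/spark /usr/local/scala"
done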
Then configure the .bashrc environment variables on each node exactly as on the master.
Combined with the entries from the previous Hadoop article, the .bashrc looks like this:
#scala
export SCALA_HOME=/usr/local/scala
#spark
export SPARK_HOME=/usr/local/spark
#java and hadoop
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib:$HADOOP_PREFIX/lib/native"
At this point the Spark cluster setup is complete.
Starting the Spark cluster:
Before starting Spark, Hadoop's DFS and YARN need to be started first.
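For reference, the Hadoop side is brought up with its own scripts, roughly as follows (a sketch; the paths assume the $HADOOP_HOME configured above, and the history server line is optional but matches the JobHistoryServer seen in jps below):
$HADOOP_HOME/sbin/start-dfs.sh       # NameNode, SecondaryNameNode, DataNodes
$HADOOP_HOME/sbin/start-yarn.sh      # ResourceManager, NodeManagers
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver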
/usr/local/spark/sbin/start-all.sh
After all services have started, run jps at the command line:
[hadoop@hddcluster2 ~]$ jps
29601 ResourceManager
32098 SparkSubmit
29188 DataNode
29364 SecondaryNameNode
29062 NameNode
29915 NodeManager
30251 Master
30380 Worker
30062 JobHistoryServer
18767 Jps
Compared with the Hadoop cluster alone, there are two extra processes: Master and Worker.
/usr/local/spark/bin/spark-shell
When the scala> prompt appears, the shell has started successfully.
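As a further check, one of the bundled example jobs can be submitted to the standalone master, along these lines (a sketch; the examples jar path is an assumption and may differ slightly between Spark releases):
/usr/local/spark/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://10.0.0.228:7077 \
    /usr/local/spark/examples/jars/spark-examples_2.11-2.0.2.jar 100
# an approximate value of Pi is printed when the job finishes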
Open 10.0.0.228:8080 in a browser and you will see the master web UI, as in the figure below, with 4 Workers listed.
This article is reposted from yanconggod's 51CTO blog. Original link: http://blog.51cto.com/yanconggod/1885082