Installing Hadoop 3 from Binaries on CentOS 7

Summary: Installing Hadoop 3 from binaries on CentOS 7

1. Hadoop Overview

1.1 Hadoop 3 Core Components


HDFS: distributed file system, solving massive-scale data storage.

YARN: cluster resource management and job scheduling framework, solving resource and task scheduling.

MapReduce: distributed computing framework, solving massive-scale data computation.

1.2 Hadoop Cluster Overview


A Hadoop cluster actually consists of two clusters: HDFS and YARN.

The two clusters are logically separate (they neither affect nor depend on each other) but physically co-located (some of their daemons run on the same servers).

Both use a master/slave architecture by default.

1.2.1 HDFS


Master role: NameNode (NN)

Worker role: DataNode (DN)

Auxiliary master role: SecondaryNameNode (SNN)

1.2.2 YARN


Master role: ResourceManager (RM)

Worker role: NodeManager (NM)

2. Environment Information and Preparation

2.1 Machine and Role Planning
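The role layout below follows the configuration used in the rest of this guide (NameNode and ResourceManager on hdp01, SecondaryNameNode on hdp02, and all three nodes listed in the workers file):

192.168.1.131  hdp01.dialev.com   NameNode, ResourceManager, DataNode, NodeManager
192.168.1.132  hdp02.dialev.com   SecondaryNameNode, DataNode, NodeManager
192.168.1.133  hdp03.dialev.com   DataNode, NodeManager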

2.2 Add hosts Resolution on All Nodes


192.168.1.131 hdp01.dialev.com

192.168.1.132 hdp02.dialev.com

192.168.1.133 hdp03.dialev.com
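These entries go into /etc/hosts on each of the three nodes; a minimal sketch:

cat >> /etc/hosts <<'EOF'
192.168.1.131 hdp01.dialev.com
192.168.1.132 hdp02.dialev.com
192.168.1.133 hdp03.dialev.com
EOF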

2.3 Disable the Firewall
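On CentOS 7 the default firewall is firewalld, so a minimal sketch (run on every node) would be:

systemctl stop firewalld        # stop the running firewall
systemctl disable firewalld     # keep it from starting at boot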

2.4 Passwordless SSH from hdp01 to All Three Machines


echo "StrictHostKeyChecking no" >~/.ssh/config

ssh-copy-id -i 192.168.1.13{1..3}

2.5 Time Synchronization


yum -y install ntpdate

ntpdate ntp.aliyun.com

echo '*/5 * * * * ntpdate ntp.aliyun.com > /dev/null 2>&1' >> /var/spool/cron/root

2.6 Raise the User File Descriptor Limit


vim /etc/security/limits.conf

* soft nofile 65535

* hard nofile 65535

 

# the new limits do not affect existing sessions; log in again (or reboot) for them to take effect
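After logging in again, the new limit can be verified:

ulimit -n    # should print 65535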

2.7 Install the Java Environment


tar xf jdk-8u65-linux-x64.tar.gz -C /usr/local/

cd /usr/local/

ln -sv jdk1.8.0_65/  java

 

vim /etc/profile.d/java.sh

export JAVA_HOME=/usr/local/java

export CLASSPATH=$JAVA_HOME/lib/tools.jar

export PATH=$JAVA_HOME/bin:$PATH

 

source /etc/profile.d/java.sh

java -version
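Note: the preparation steps in this chapter (hosts entries, firewall, time sync, file descriptor limits, and the JDK) should be applied on all three nodes; in particular, the Hadoop daemons on hdp02 and hdp03 also need Java installed at the same path.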

3. Install Hadoop

3.1 Extract the Package

The software packages for this document and the related Hadoop documents are all available from this Baidu Netdisk share:

链接:https://pan.baidu.com/s/11F4THdIfgrULMn2gNcObRA?pwd=cjll


# You can also download the version you actually deploy from https://archive.apache.org/dist/hadoop/common/
tar xf hadoop-3.1.4.tar.gz -C /usr/local/

cd /usr/local/

ln -sv hadoop-3.1.4 hadoop

 

Directory layout:

├── bin      Hadoop's basic management and user-facing scripts.

├── etc      Hadoop configuration files.

├── include  Header files for C++ programs that access HDFS or write MapReduce jobs.

├── lib      Dynamic and static native libraries that Hadoop exposes to external programs.

├── libexec  Shell configuration files used by the individual services (log output, startup parameters, and other basics).

├── sbin     Management scripts, mainly the start/stop scripts for the HDFS and YARN services.

└── share    Compiled jar files for each Hadoop module, plus the bundled official examples.

3.2 Configure Hadoop Environment Variables

Reference: https://hadoop.apache.org/docs/r3.1.4/    # see the Configuration section at the bottom of the left-hand sidebar


cd /usr/local/hadoop/etc/hadoop

cp hadoop-env.sh hadoop-env.sh-bak

vim hadoop-env.sh

export JAVA_HOME=/usr/local/java

export HDFS_NAMENODE_USER=root

export HDFS_DATANODE_USER=root

export HDFS_SECONDARYNAMENODE_USER=root

export YARN_RESOURCEMANAGER_USER=root

export YARN_NODEMANAGER_USER=root

export HADOOP_HOME=/usr/local/hadoop

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
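hadoop-env.sh is only sourced by the Hadoop scripts themselves, so the PATH export above does not reach an interactive shell. To make the hadoop/hdfs/yarn commands used later available directly at the prompt, an optional extra step is a profile script; a sketch:

vim /etc/profile.d/hadoop.sh
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

source /etc/profile.d/hadoop.sh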

3.3 Specify the Cluster Default Configuration (core-site.xml)


cp core-site.xml  core-site.xml-bak

vim core-site.xml

<configuration>

    <!-- RPC address of the HDFS master (NameNode) -->

    <property>

        <name>fs.defaultFS</name>

        <value>hdfs://hdp01.dialev.com:8020</value>

    </property>

    <!-- Base directory for files Hadoop generates at runtime -->

    <property>

        <name>hadoop.tmp.dir</name>

        <value>/Hadoop/tmp</value>

    </property>

    <!-- Static user for the HDFS web UI -->

    <property>

        <name>hadoop.http.staticuser.user</name>

        <value>root</value>

    </property>

</configuration>     

3.4 HDFS / SecondaryNameNode Configuration (hdfs-site.xml)


cp hdfs-site.xml hdfs-site.xml-bak

vim hdfs-site.xml

<configuration>

    <!-- HTTP address of the NameNode -->

    <property>

       <name>dfs.namenode.http-address</name>

       <value>hdp01.dialev.com:50070</value>

    </property>

    <!-- HTTP address of the SecondaryNameNode -->

    <property>

       <name>dfs.namenode.secondary.http-address</name>

       <value>hdp02.dialev.com:50090</value>

    </property>

    <!-- Directory where the NameNode stores its metadata -->

    <property>

       <name>dfs.namenode.name.dir</name>

       <value>/Hadoop/name</value>

    </property>

    <!-- Number of HDFS block replicas -->

    <property>

       <name>dfs.replication</name>

       <value>2</value>

    </property>

    <!-- Directory where the DataNode stores blocks; if unset, a path under hadoop.tmp.dir is used -->

    <property>

        <name>dfs.datanode.data.dir</name>

        <value>/Hadoop/data</value>

    </property>

</configuration>
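The /Hadoop/tmp, /Hadoop/name and /Hadoop/data paths referenced above are normally created by Hadoop itself on first format/start, but they can also be created up front on every node, for example:

mkdir -p /Hadoop/{tmp,name,data}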

3.5 MapReduce Configuration


cp mapred-site.xml mapred-site.xml-bak

vim mapred-site.xml

<configuration>

    <!-- Tell the framework to run MapReduce on YARN -->

    <property>

       <name>mapreduce.framework.name</name>

       <value>yarn</value>

    </property>

    <!-- MR ApplicationMaster environment variables -->

    <property>

      <name>yarn.app.mapreduce.am.env</name>

      <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>

    </property>

    

    <!-- MR map task environment variables -->

    <property>

      <name>mapreduce.map.env</name>

      <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>

    </property>

    

    <!-- MR reduce task environment variables -->

    <property>

      <name>mapreduce.reduce.env</name>

      <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>

    </property>

</configuration>

3.6 YARN Configuration


cp yarn-site.xml yarn-site.xml-bak

vim yarn-site.xml

<configuration>

<property>

    <name>yarn.resourcemanager.hostname</name>

    <value>hdp01.dialev.com</value>

</property>

 

<property>

    <name>yarn.nodemanager.aux-services</name>

    <value>mapreduce_shuffle</value>

</property>

 

<!-- Whether to enforce physical memory limits on containers -->

<property>

    <name>yarn.nodemanager.pmem-check-enabled</name>

    <value>false</value>

</property>

 

<!-- Whether to enforce virtual memory limits on containers -->

<property>

    <name>yarn.nodemanager.vmem-check-enabled</name>

    <value>false</value>

</property>

 

<!-- Enable log aggregation -->

<property>

  <name>yarn.log-aggregation-enable</name>

  <value>true</value>

</property>

 

<!-- Retain aggregated logs for 7 days (604800 seconds) -->

<property>

  <name>yarn.log-aggregation.retain-seconds</name>

  <value>604800</value>

</property>

</configuration>

3.7 Configure Worker Node Addresses


vim workers

hdp01.dialev.com

hdp02.dialev.com

hdp03.dialev.com

3.8 Sync the Configuration to the Other Nodes


scp -r -q hadoop-3.1.4 192.168.1.132:/usr/local/

scp -r -q hadoop-3.1.4 192.168.1.133:/usr/local/

 

# create the symlink on nodes 2 and 3 as well

ln -sv hadoop-3.1.4 hadoop

4. Start Hadoop

4.1 Format the NameNode

Run this on hdp01.dialev.com, and only once; if it is run by mistake, delete the initialized directory and format again.


hdfs namenode -format

......

2022-12-26 16:40:03,355 INFO util.GSet: 0.029999999329447746% max memory 940.5 MB = 288.9 KB

2022-12-26 16:40:03,355 INFO util.GSet: capacity      = 2^15 = 32768 entries

2022-12-26 16:40:03,406 INFO namenode.FSImage: Allocated new BlockPoolId: BP-631728325-192.168.1.131-1672044003397

2022-12-26 16:40:03,437 INFO common.Storage: Storage directory /Hadoop/name has been successfully formatted.         # /Hadoop/name is the initialized directory; this line shows that the storage was formatted successfully

2022-12-26 16:40:03,498 INFO namenode.FSImageFormatProtobuf: Saving image file /Hadoop/name/current/fsimage.ckpt_0000000000000000000 using no compression

2022-12-26 16:40:03,781 INFO namenode.FSImageFormatProtobuf: Image file /Hadoop/name/current/fsimage.ckpt_0000000000000000000 of size 391 bytes saved in 0 seconds .

2022-12-26 16:40:03,802 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0

2022-12-26 16:40:03,820 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid = 0 when meet shutdown.

2022-12-26 16:40:03,821 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at hdp01.dialev.com/192.168.1.131

************************************************************/

4.2 Start the Services


1. Using the bundled shell scripts

cd hadoop/sbin/

HDFS cluster:

    start-dfs.sh

    stop-dfs.sh

YARN cluster:

    start-yarn.sh

    stop-yarn.sh

Whole Hadoop cluster (HDFS + YARN):

    start-all.sh

    stop-all.sh

jps  # check that the running daemons match the cluster role plan

/usr/local/hadoop/logs is the log path (the logs directory under the installation directory by default).
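To check every node against the role plan in one go, jps can also be run over the passwordless SSH configured earlier (a small sketch; the full Java path is used because non-interactive SSH sessions do not source /etc/profile.d/java.sh):

for h in hdp0{1..3}.dialev.com; do echo "== $h =="; ssh $h /usr/local/java/bin/jps; done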

 

 

 

 

 

2. Manual operation (for reference):

HDFS cluster

hdfs --daemon start namenode | datanode |secondarynamenode

hdfs --daemon stop namenode | datanode | secondarynamenode

YARN cluster

yarn --daemon start resourcemanager | nodemanager

yarn --daemon stop resourcemanager |nodemanager 

5. Verification

5.1 Check the Web UIs


1. Print the cluster status

hdfs dfsadmin -report

 

2. Open the YARN management UI

http://192.168.1.131:8088/cluster/nodes   # port 8088 on the host running the ResourceManager

 

3. Open the NameNode management UI

http://192.168.1.131:50070/dfshealth.html#tab-overview  # host running the NameNode; the port comes from dfs.namenode.http-address in hdfs-site.xml. The default is 50070 in Hadoop 2.x and 9870 in 3.x; 50070 is used here because the configuration above was written with the 2.x port without noticing that the default changed.

5.2 Test Directory Creation and File Upload


hadoop fs -mkdir /bowen   # create a bowen directory under the HDFS root

hadoop fs -put yarn-env.sh /bowen  # upload a file into the bowen directory
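The upload can then be verified from the command line (or via Utilities -> Browse the file system in the NameNode web UI):

hadoop fs -ls /bowen   # should list the uploaded yarn-env.sh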

5.3 Test MapReduce Execution


cd /usr/local/hadoop/share/hadoop/mapreduce

hadoop jar hadoop-mapreduce-examples-3.1.4.jar pi 2 4

Number of Maps  = 2

Samples per Map = 4

Wrote input for Map #0

Wrote input for Map #1

Starting Job

2022-12-27 09:20:12,868 INFO client.RMProxy: Connecting to ResourceManager at hdp01.dialev.com/192.168.1.131:8032  # the client first connects to the RM to request resources

2022-12-27 09:20:14,091 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1672045511416_0001

2022-12-27 09:20:14,503 INFO input.FileInputFormat: Total input files to process : 2

2022-12-27 09:20:14,707 INFO mapreduce.JobSubmitter: number of splits:2

2022-12-27 09:20:15,349 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1672045511416_0001

2022-12-27 09:20:15,351 INFO mapreduce.JobSubmitter: Executing with tokens: []

2022-12-27 09:20:16,072 INFO conf.Configuration: resource-types.xml not found

2022-12-27 09:20:16,073 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.

2022-12-27 09:20:16,974 INFO impl.YarnClientImpl: Submitted application application_1672045511416_0001

2022-12-27 09:20:17,204 INFO mapreduce.Job: The url to track the job: http://hdp01.dialev.com:8088/proxy/application_1672045511416_0001/

2022-12-27 09:20:17,206 INFO mapreduce.Job: Running job: job_1672045511416_0001

2022-12-27 09:20:33,618 INFO mapreduce.Job: Job job_1672045511416_0001 running in uber mode : false

2022-12-27 09:20:33,621 INFO mapreduce.Job:  map 0% reduce 0%    # MapReduce has two phases: map and reduce

2022-12-27 09:20:47,862 INFO mapreduce.Job:  map 100% reduce 0%

2022-12-27 09:20:53,944 INFO mapreduce.Job:  map 100% reduce 100%

2022-12-27 09:20:53,968 INFO mapreduce.Job: Job job_1672045511416_0001 completed successfully

......

5.4 Cluster Benchmark


1. Write test: write 5 files of 10 MB each

hadoop jar hadoop-mapreduce-client-jobclient-3.1.4-tests.jar TestDFSIO -write -nrFiles 5 -fileSize 10MB

......

2022-12-27 09:51:12,775 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write

2022-12-27 09:51:12,775 INFO fs.TestDFSIO:             Date & time: Tue Dec 27 09:51:12 CST 2022

2022-12-27 09:51:12,775 INFO fs.TestDFSIO:         Number of files: 5      # number of files

2022-12-27 09:51:12,775 INFO fs.TestDFSIO:  Total MBytes processed: 50     # total data size

2022-12-27 09:51:12,775 INFO fs.TestDFSIO:       Throughput mb/sec: 12.49  # throughput

2022-12-27 09:51:12,775 INFO fs.TestDFSIO:  Average IO rate mb/sec: 15.46  # average IO rate

2022-12-27 09:51:12,775 INFO fs.TestDFSIO:   IO rate std deviation: 7.51   # IO rate standard deviation

2022-12-27 09:51:12,776 INFO fs.TestDFSIO:      Test exec time sec: 32.95  # execution time

 

2. Read test

hadoop jar hadoop-mapreduce-client-jobclient-3.1.4-tests.jar TestDFSIO -read -nrFiles 5 -fileSize 10MB

......

2022-12-27 09:54:23,826 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read

2022-12-27 09:54:23,826 INFO fs.TestDFSIO:             Date & time: Tue Dec 27 09:54:23 CST 2022

2022-12-27 09:54:23,826 INFO fs.TestDFSIO:         Number of files: 5

2022-12-27 09:54:23,826 INFO fs.TestDFSIO:  Total MBytes processed: 50

2022-12-27 09:54:23,826 INFO fs.TestDFSIO:       Throughput mb/sec: 94.34

2022-12-27 09:54:23,827 INFO fs.TestDFSIO:  Average IO rate mb/sec: 101.26

2022-12-27 09:54:23,827 INFO fs.TestDFSIO:   IO rate std deviation: 30.07

2022-12-27 09:54:23,827 INFO fs.TestDFSIO:      Test exec time sec: 34.39

 

3. Clean up the test data

hadoop jar hadoop-mapreduce-client-jobclient-3.1.4-tests.jar TestDFSIO -clean

  
