YARN Cluster Resource Management System
YARN Roles and Concepts
• YARN is Hadoop's general-purpose resource management system
• YARN roles
– ResourceManager
– NodeManager
– ApplicationMaster
– Container
– Client
• ResourceManager
– Handles client requests
– Starts and monitors the ApplicationMaster
– Monitors the NodeManagers
– Allocates and schedules resources
• NodeManager
– Manages the resources of a single node
– Handles commands from the ResourceManager
– Handles commands from the ApplicationMaster
• Container
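– An abstraction of the environment a task runs in
– Encapsulates multi-dimensional resources such as CPU and memory, together with task-related information such as environment variables and the launch command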
• ApplicationMaster
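– Splits the input data
– Requests resources for the application and assigns them to its internal tasks
– Monitors tasks and handles fault tolerance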
• Client
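– The client program through which users interact with YARN
– Submits applications, monitors their status, and kills them
YARN Architecture
• The core idea of YARN
• Separate the duties that the JobTracker and TaskTracker handled in MapReduce 1; YARN is made up of the following components:
– ResourceManager: a single global resource manager
– NodeManager: the per-node agent of the ResourceManager (RM)
– ApplicationMaster: one per application
– Each ApplicationMaster has multiple Containers running on the NodeManagers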
System Plan
Host                 Role             Software
192.168.4.1 master   ResourceManager  YARN
192.168.4.2 node1    NodeManager      YARN
192.168.4.3 node2    NodeManager      YARN
192.168.4.4 node3    NodeManager      YARN
YARN Installation and Configuration
For the lab prerequisites, see http://blog.51cto.com/13558754/2066708
# ssh 192.168.4.1
# cd /usr/local/hadoop/
# cd etc/hadoop/
# cp mapred-site.xml.template mapred-site.xml
# vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value> <!-- run MapReduce jobs on the YARN resource management system -->
</property>
</configuration>
# vim yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value> <!-- host that takes the ResourceManager role -->
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value> <!-- auxiliary service backed by a Java class; in a real environment, confirm the value with the developers -->
</property>
</configuration>
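A typo in either file tends to show up only later, when a daemon fails to start, so it can help to read the values back before syncing them to the nodes. The sketch below is one way to do that with awk. get_prop is a hypothetical helper (it is not a Hadoop command), and it assumes each <name> and <value> element sits on its own line, as in the listings above.

```shell
# get_prop FILE NAME
# Print the <value> of the property called NAME in a Hadoop-style XML file.
# Hypothetical helper: assumes one <name> or <value> element per line,
# so it is not a general XML parser.
get_prop() {
    awk -v want="$2" '
        /<name>/ {
            n = $0
            sub(/.*<name>/, "", n)      # strip everything up to the tag
            sub(/<\/name>.*/, "", n)    # strip the closing tag and the rest
            found = (n == want)
        }
        found && /<value>/ {
            v = $0
            sub(/.*<value>/, "", v)
            sub(/<\/value>.*/, "", v)   # also drops any trailing annotation
            print v
            exit
        }
    ' "$1"
}
```

For example, get_prop etc/hadoop/yarn-site.xml yarn.resourcemanager.hostname should print master if the file was edited as shown above. An unknown property name prints nothing.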
Once the configuration is complete:
# for i in node{1..3}    # sync the configuration files to all nodes
> do
> rsync -azSH --delete /usr/local/hadoop/etc/hadoop/ ${i}:/usr/local/hadoop/etc/hadoop -e 'ssh'
> done
# cd /usr/local/hadoop/
Start the YARN service
# ./sbin/start-yarn.sh
Run jps on every host to check that the daemons started
# for i in master node{1..3}
> do
> echo ${i}
> ssh ${i} "jps"
> done
master
3312 Jps
3005 ResourceManager
node1
3284 Jps
3162 NodeManager
node2
2882 NodeManager
3004 Jps
node3
2961 Jps
2831 NodeManager
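Reading the jps listings by eye is fine for four hosts; with more hosts the check can be scripted. A minimal sketch, assuming jps prints one "PID Name" pair per line as in the output above. has_daemon is a hypothetical helper name, not part of the JDK or Hadoop.

```shell
# has_daemon NAME
# Read jps output on stdin; succeed (exit 0) if a JVM called NAME is listed,
# fail (exit 1) otherwise. Hypothetical helper for illustration.
has_daemon() {
    awk -v want="$1" '
        $2 == want { ok = 1 }   # second column of jps output is the main class name
        END { exit !ok }        # ok stays unset (false) when nothing matched
    '
}
```

It could be combined with the loop above, for instance ssh node1 jps | has_daemon NodeManager || echo "node1: NodeManager is not running".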
List all available compute nodes
# ./bin/yarn node -list
18/01/31 06:41:56 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.4.1:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
node3:46007 RUNNING node3:8042 0
node2:54895 RUNNING node2:8042 0
node1:51087 RUNNING node1:8042
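The node list can also be checked from a script, for example to confirm that all three NodeManagers registered. The sketch below counts the rows whose second column is RUNNING; it assumes the column layout shown above, and count_running is a hypothetical helper name.

```shell
# count_running
# Read `yarn node -list` output on stdin and print how many nodes are RUNNING.
# Log lines, the "Total Nodes" line and the header row are ignored because
# their second field is never the literal word RUNNING.
count_running() {
    awk '$2 == "RUNNING" { n++ } END { print n + 0 }'
}
```

Usage would look like ./bin/yarn node -list | count_running, which should print 3 for the cluster planned in this article.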
Verify YARN
# bin/hadoop fs -ls /input
Found 3 items
-rw-r--r-- 2 root supergroup 84854 2018-01-29 21:37 /input/LICENSE.txt
-rw-r--r-- 2 root supergroup 14978 2018-01-29 21:37 /input/NOTICE.txt
-rw-r--r-- 2 root supergroup 1366 2018-01-29 21:37 /input/README.txt
Use YARN to count how often each word occurs in the sample files
# ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount hdfs://master:9000/input hdfs://master:9000/output
View the results
# ./bin/hadoop fs -cat hdfs://master:9000/output/*
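To spot-check the job output, the same count can be reproduced locally for a single small file. The pipeline below is a rough local equivalent, assuming the example job splits words on whitespace only. local_wordcount is a hypothetical helper, and its tab-separated "word, count" layout is chosen to resemble the job's output rather than taken from it.

```shell
# local_wordcount
# Read text on stdin and print "word<TAB>count" lines, sorted by word.
# Hypothetical helper for comparing against the wordcount job's output.
local_wordcount() {
    tr -s '[:space:]' '\n' |       # one word per line, runs of whitespace squeezed
        LC_ALL=C sort |            # byte-order sort so the result is locale independent
        uniq -c |                  # prefix each distinct word with its count
        awk '$2 != "" { print $2 "\t" $1 }'   # skip the empty word, swap the columns
}
```

For example, ./bin/hadoop fs -cat /input/README.txt | local_wordcount gives a list to compare against the README.txt share of the job output. One thing to keep in mind when rerunning the job: MapReduce refuses to start if the output directory already exists, so /output has to be removed or renamed first.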
YARN Node Management
[root@master ~]# cat /etc/hosts
192.168.4.1 master
192.168.4.2 node1
192.168.4.3 node2
192.168.4.4 node3
192.168.4.5 newnode
[root@newnode ~]# rsync -azSH --delete master:/usr/local/hadoop /usr/local
[root@master hadoop]# ./sbin/start-yarn.sh
Add a node
[root@master hadoop]# ./bin/yarn node -list
18/01/28 21:06:57 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.4.1:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
node1:33596 RUNNING node1:8042 0
node2:53475 RUNNING node2:8042 0
node3:34736 RUNNING node3:8042 0
[root@newnode hadoop]# sbin/yarn-daemon.sh start nodemanager
[root@master hadoop]# ./bin/yarn node -list
18/01/28 21:07:53 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.4.1:8032
Total Nodes:4
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
newnode:39690 RUNNING newnode:8042 0
node1:33596 RUNNING node1:8042 0
node2:53475 RUNNING node2:8042 0
node3:34736 RUNNING node3:8042 0
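Instead of comparing the two listings by hand, the state of one particular node can be pulled out of the output. A small sketch, assuming the Node-Id column has the form host:port as shown above; node_state is a hypothetical helper name.

```shell
# node_state HOST
# Read `yarn node -list` output on stdin and print the Node-State of HOST.
# Prints nothing if HOST is not listed. Hypothetical helper for illustration.
node_state() {
    awk -v host="$1" '
        { split($1, id, ":") }              # Node-Id looks like host:port
        id[1] == host { print $2; exit }    # second column is Node-State
    '
}
```

After starting the NodeManager on newnode, ./bin/yarn node -list | node_state newnode should print RUNNING once the node has registered with the ResourceManager.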
Remove a node
[root@newnode hadoop]# sbin/yarn-daemon.sh stop nodemanager
// The node is not removed from the list immediately
[root@master hadoop]# ./bin/yarn node -list
18/01/28 21:11:31 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.4.1:8032
Total Nodes:4
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
newnode:39690 RUNNING newnode:8042 0
node1:33596 RUNNING node1:8042 0
node2:53475 RUNNING node2:8042 0
node3:34736 RUNNING node3:8042 0
// Restarting the YARN service clears the stale entry
[root@master hadoop]# ./sbin/stop-yarn.sh
[root@master hadoop]# ./sbin/start-yarn.sh
[root@master hadoop]# ./bin/yarn node -list
18/01/28 21:12:46 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.4.1:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
node1:42010 RUNNING node1:8042 0
node2:55043 RUNNING node2:8042 0
node3:38256 RUNNING node3:8042 0
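In this walkthrough the stopped NodeManager stays listed as RUNNING until YARN is restarted. As far as I know, the ResourceManager should also stop reporting it as RUNNING on its own once the liveness timeout expires (yarn.nm.liveness-monitor.expiry-interval-ms, 10 minutes by default), although that was not tested here. The sketch below polls for that moment instead of restarting. wait_not_running is a hypothetical helper, and the command it polls is passed in as arguments so nothing about the cluster is hard-coded.

```shell
# wait_not_running HOST TRIES DELAY CMD [ARGS...]
# Rerun CMD until HOST is no longer listed as RUNNING in its output.
# Returns 0 once HOST is absent or in another state, 1 if it is still
# RUNNING after TRIES attempts. Caveat: a CMD that fails and prints nothing
# also counts as "not running", so check that the ResourceManager is up.
wait_not_running() {
    wnr_host=$1 wnr_tries=$2 wnr_delay=$3
    shift 3
    wnr_i=0
    while [ "$wnr_i" -lt "$wnr_tries" ]; do
        wnr_state=$("$@" | awk -v host="$wnr_host" '
            { split($1, id, ":") }
            id[1] == host { print $2; exit }')
        if [ "$wnr_state" != "RUNNING" ]; then
            return 0
        fi
        sleep "$wnr_delay"
        wnr_i=$((wnr_i + 1))
    done
    return 1
}
```

A call such as wait_not_running newnode 60 10 ./bin/yarn node -list would check every 10 seconds for about 10 minutes.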