<!-- /* Font Definitions */ @font-face {font-family:宋体; panose-1:2 1 6 0 3 1 1 1 1 1; mso-font-charset:80; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:3 680460288 22 0 262145 0;} @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:3 0 0 0 1 0;} @font-face {font-family:"\@宋体"; panose-1:2 1 6 0 3 1 1 1 1 1; mso-font-charset:80; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:3 680460288 22 0 262145 0;} @font-face {font-family:Cambria; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:-536870145 1073743103 0 0 415 0;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-unhide:no; mso-style-qformat:yes; mso-style-parent:""; margin:0cm; margin-bottom:.0001pt; text-align:justify; text-justify:inter-ideograph; mso-pagination:none; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:宋体; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-font-kerning:1.0pt;} h1 {mso-style-priority:9; mso-style-unhide:no; mso-style-qformat:yes; mso-style-link:"标题 1字符"; mso-margin-top-alt:auto; margin-right:0cm; mso-margin-bottom-alt:auto; margin-left:0cm; mso-pagination:widow-orphan; mso-outline-level:1; font-size:24.0pt; font-family:宋体;} span.MsoIntenseEmphasis {mso-style-priority:21; mso-style-unhide:no; mso-style-qformat:yes; color:#4F81BD; mso-themecolor:accent1; font-weight:bold; font-style:italic;} span.1 {mso-style-name:"标题 1字符"; mso-style-priority:9; mso-style-unhide:no; mso-style-locked:yes; mso-style-link:"标题 1"; mso-ansi-font-size:24.0pt; mso-bidi-font-size:24.0pt; font-family:宋体; mso-ascii-font-family:宋体; mso-fareast-font-family:宋体; mso-hansi-font-family:宋体; mso-font-kerning:18.0pt; font-weight:bold;} .MsoChpDefault {mso-style-type:export-only; mso-default-props:yes; font-family:Cambria; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} /* Page Definitions */ @page {mso-page-border-surround-header:no; mso-page-border-surround-footer:no;} @page WordSection1 {size:612.0pt 792.0pt; margin:72.0pt 90.0pt 72.0pt 90.0pt; mso-header-margin:36.0pt; mso-footer-margin:36.0pt; mso-paper-source:0;} div.WordSection1 {page:WordSection1;} -->
本文主要记录了fair调度器drf调度策略作业不执行问题的解决过程,重点介绍了调查方法和调查过程细节,希望对大家了解fair调度器有所帮助 。
除本人知乎专栏外,转载请联系我。
一.问题背景
yarn的调度器有capacity和fair两种,之间的区别可以自行谷歌。fair调度器(附录1)是企业级hadoop用户常用的资源池类型,该调度器默认的队列内部调度策略(SchudelingPolicy)是fair,即分配资源时只考虑内存限制。
对一个跨部门多个团队共同使用的大集群来说,如果存在cpu密集型作业,不进行cpu控制肯定会影响其他作业的运行。想要在分配资源时同时计算内存和cpu限制,需要指定队列的调度策略为drf,即DominantResourceFainessPolicy(附录2)。一般配合cgroup使用,控制容器实际的cpu资源(附录3)。
使用drf时遇到一个问题,配置非常简单的一个fair-scheduler示例,但作业提交后不分配资源。
配置如下:
<!-- /* Font Definitions */ @font-face {font-family:宋体; panose-1:2 1 6 0 3 1 1 1 1 1; mso-font-charset:80; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:3 680460288 22 0 262145 0;} @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:3 0 0 0 1 0;} @font-face {font-family:"\@宋体"; panose-1:2 1 6 0 3 1 1 1 1 1; mso-font-charset:80; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:3 680460288 22 0 262145 0;} @font-face {font-family:Cambria; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:-536870145 1073743103 0 0 415 0;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-unhide:no; mso-style-qformat:yes; mso-style-parent:""; margin:0cm; margin-bottom:.0001pt; text-align:justify; text-justify:inter-ideograph; mso-pagination:none; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:宋体; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-font-kerning:1.0pt;} .MsoChpDefault {mso-style-type:export-only; mso-default-props:yes; font-family:Cambria; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} /* Page Definitions */ @page {mso-page-border-surround-header:no; mso-page-border-surround-footer:no;} @page WordSection1 {size:612.0pt 792.0pt; margin:72.0pt 90.0pt 72.0pt 90.0pt; mso-header-margin:36.0pt; mso-footer-margin:36.0pt; mso-paper-source:0;} div.WordSection1 {page:WordSection1;} -->
运行wordcount卡住
<!-- /* Font Definitions */ @font-face {font-family:宋体; panose-1:2 1 6 0 3 1 1 1 1 1; mso-font-charset:80; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:3 680460288 22 0 262145 0;} @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:3 0 0 0 1 0;} @font-face {font-family:"\@宋体"; panose-1:2 1 6 0 3 1 1 1 1 1; mso-font-charset:80; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:3 680460288 22 0 262145 0;} @font-face {font-family:Cambria; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:-536870145 1073743103 0 0 415 0;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-unhide:no; mso-style-qformat:yes; mso-style-parent:""; margin:0cm; margin-bottom:.0001pt; text-align:justify; text-justify:inter-ideograph; mso-pagination:none; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:宋体; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-font-kerning:1.0pt;} .MsoChpDefault {mso-style-type:export-only; mso-default-props:yes; font-family:Cambria; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} /* Page Definitions */ @page {mso-page-border-surround-header:no; mso-page-border-surround-footer:no;} @page WordSection1 {size:612.0pt 792.0pt; margin:72.0pt 90.0pt 72.0pt 90.0pt; mso-header-margin:36.0pt; mso-footer-margin:36.0pt; mso-paper-source:0;} div.WordSection1 {page:WordSection1;} -->
将调度策略由drf改成fair或不设置(默认还是fair),作业运行正常
<!-- /* Font Definitions */ @font-face {font-family:Wingdings; panose-1:5 0 0 0 0 0 0 0 0 0; mso-font-charset:2; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:0 268435456 0 0 -2147483648 0;} @font-face {font-family:宋体; panose-1:2 1 6 0 3 1 1 1 1 1; mso-font-charset:80; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:3 680460288 22 0 262145 0;} @font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:3 0 0 0 1 0;} @font-face {font-family:"\@宋体"; panose-1:2 1 6 0 3 1 1 1 1 1; mso-font-charset:80; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:3 680460288 22 0 262145 0;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4; mso-font-charset:0; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:-520092929 1073786111 9 0 415 0;} @font-face {font-family:Cambria; panose-1:2 4 5 3 5 4 6 3 2 4; mso-font-charset:0; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:-536870145 1073743103 0 0 415 0;} @font-face {font-family:"PingFang SC Regular"; panose-1:2 11 4 0 0 0 0 0 0 0; mso-font-charset:80; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:-1610611969 2060451323 23 0 262145 0;} @font-face {font-family:"\@PingFang SC Regular"; panose-1:2 11 4 0 0 0 0 0 0 0; mso-font-charset:80; mso-generic-font-family:auto; mso-font-pitch:variable; mso-font-signature:-1610611969 2060451323 23 0 262145 0;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {mso-style-unhide:no; mso-style-qformat:yes; mso-style-parent:""; margin:0cm; margin-bottom:.0001pt; text-align:justify; text-justify:inter-ideograph; mso-pagination:none; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:宋体; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-font-kerning:1.0pt;} h1 {mso-style-priority:9; mso-style-unhide:no; mso-style-qformat:yes; mso-style-link:"标题 1字符"; mso-margin-top-alt:auto; margin-right:0cm; mso-margin-bottom-alt:auto; margin-left:0cm; mso-pagination:widow-orphan; mso-outline-level:1; font-size:24.0pt; font-family:宋体;} h2 {mso-style-priority:9; mso-style-qformat:yes; mso-style-link:"标题 2字符"; mso-style-next:正常; margin-top:13.0pt; margin-right:0cm; margin-bottom:13.0pt; margin-left:0cm; text-align:justify; text-justify:inter-ideograph; line-height:173%; mso-pagination:lines-together; page-break-after:avoid; mso-outline-level:2; font-size:16.0pt; font-family:Calibri; mso-ascii-font-family:Calibri; mso-ascii-theme-font:major-latin; mso-fareast-font-family:宋体; mso-fareast-theme-font:major-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:major-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:major-bidi; mso-font-kerning:1.0pt;} a:link, span.MsoHyperlink {mso-style-priority:99; color:blue; mso-themecolor:hyperlink; text-decoration:underline; text-underline:single;} a:visited, span.MsoHyperlinkFollowed {mso-style-noshow:yes; mso-style-priority:99; color:purple; mso-themecolor:followedhyperlink; text-decoration:underline; text-underline:single;} p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph {mso-style-priority:34; mso-style-unhide:no; mso-style-qformat:yes; margin:0cm; margin-bottom:.0001pt; text-align:justify; text-justify:inter-ideograph; text-indent:21.0pt; mso-char-indent-count:2.0; mso-pagination:none; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:宋体; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-font-kerning:1.0pt;} span.MsoIntenseEmphasis {mso-style-priority:21; mso-style-unhide:no; mso-style-qformat:yes; color:#4F81BD; mso-themecolor:accent1; font-weight:bold; font-style:italic;} span.1 {mso-style-name:"标题 1字符"; mso-style-priority:9; mso-style-unhide:no; mso-style-locked:yes; mso-style-link:"标题 1"; mso-ansi-font-size:24.0pt; mso-bidi-font-size:24.0pt; font-family:宋体; mso-ascii-font-family:宋体; mso-fareast-font-family:宋体; mso-hansi-font-family:宋体; mso-font-kerning:18.0pt; font-weight:bold;} span.2 {mso-style-name:"标题 2字符"; mso-style-priority:9; mso-style-unhide:no; mso-style-locked:yes; mso-style-link:"标题 2"; mso-ansi-font-size:16.0pt; mso-bidi-font-size:16.0pt; font-family:Calibri; mso-ascii-font-family:Calibri; mso-ascii-theme-font:major-latin; mso-fareast-font-family:宋体; mso-fareast-theme-font:major-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:major-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:major-bidi; font-weight:bold;} .MsoChpDefault {mso-style-type:export-only; mso-default-props:yes; font-family:Cambria; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} /* Page Definitions */ @page {mso-page-border-surround-header:no; mso-page-border-surround-footer:no;} @page WordSection1 {size:612.0pt 792.0pt; margin:72.0pt 90.0pt 72.0pt 90.0pt; mso-header-margin:36.0pt; mso-footer-margin:36.0pt; mso-paper-source:0;} div.WordSection1 {page:WordSection1;} /* List Definitions */ @list l0 {mso-list-id:407965220; mso-list-type:hybrid; mso-list-template-ids:1228422330 878216062 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;} @list l0:level1 {mso-level-tab-stop:none; mso-level-number-position:left; margin-left:18.0pt; text-indent:-18.0pt;} @list l0:level2 {mso-level-number-format:bullet; mso-level-text:; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:48.0pt; text-indent:-24.0pt; font-family:Wingdings;} @list l0:level3 {mso-level-number-format:bullet; mso-level-text:; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:72.0pt; text-indent:-24.0pt; font-family:Wingdings;} @list l0:level4 {mso-level-number-format:bullet; mso-level-text:; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:96.0pt; text-indent:-24.0pt; font-family:Wingdings;} @list l0:level5 {mso-level-number-format:bullet; mso-level-text:; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:120.0pt; text-indent:-24.0pt; font-family:Wingdings;} @list l0:level6 {mso-level-number-format:bullet; mso-level-text:; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:144.0pt; text-indent:-24.0pt; font-family:Wingdings;} @list l0:level7 {mso-level-number-format:bullet; mso-level-text:; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:168.0pt; text-indent:-24.0pt; font-family:Wingdings;} @list l0:level8 {mso-level-number-format:bullet; mso-level-text:; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:192.0pt; text-indent:-24.0pt; font-family:Wingdings;} @list l0:level9 {mso-level-number-format:bullet; mso-level-text:; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:216.0pt; text-indent:-24.0pt; font-family:Wingdings;} @list l1 {mso-list-id:915744145; mso-list-type:hybrid; mso-list-template-ids:-1065563516 878216062 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;} @list l1:level1 {mso-level-tab-stop:none; mso-level-number-position:left; margin-left:18.0pt; text-indent:-18.0pt;} @list l1:level2 {mso-level-number-format:alpha-lower; mso-level-text:"%2\)"; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:48.0pt; text-indent:-24.0pt;} @list l1:level3 {mso-level-number-format:roman-lower; mso-level-tab-stop:none; mso-level-number-position:right; margin-left:72.0pt; text-indent:-24.0pt;} @list l1:level4 {mso-level-tab-stop:none; mso-level-number-position:left; margin-left:96.0pt; text-indent:-24.0pt;} @list l1:level5 {mso-level-number-format:alpha-lower; mso-level-text:"%5\)"; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:120.0pt; text-indent:-24.0pt;} @list l1:level6 {mso-level-number-format:roman-lower; mso-level-tab-stop:none; mso-level-number-position:right; margin-left:144.0pt; text-indent:-24.0pt;} @list l1:level7 {mso-level-tab-stop:none; mso-level-number-position:left; margin-left:168.0pt; text-indent:-24.0pt;} @list l1:level8 {mso-level-number-format:alpha-lower; mso-level-text:"%8\)"; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:192.0pt; text-indent:-24.0pt;} @list l1:level9 {mso-level-number-format:roman-lower; mso-level-tab-stop:none; mso-level-number-position:right; margin-left:216.0pt; text-indent:-24.0pt;} @list l2 {mso-list-id:1632979707; mso-list-type:hybrid; mso-list-template-ids:-1647177912 878216062 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;} @list l2:level1 {mso-level-tab-stop:none; mso-level-number-position:left; margin-left:18.0pt; text-indent:-18.0pt;} @list l2:level2 {mso-level-number-format:alpha-lower; mso-level-text:"%2\)"; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:48.0pt; text-indent:-24.0pt;} @list l2:level3 {mso-level-number-format:roman-lower; mso-level-tab-stop:none; mso-level-number-position:right; margin-left:72.0pt; text-indent:-24.0pt;} @list l2:level4 {mso-level-tab-stop:none; mso-level-number-position:left; margin-left:96.0pt; text-indent:-24.0pt;} @list l2:level5 {mso-level-number-format:alpha-lower; mso-level-text:"%5\)"; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:120.0pt; text-indent:-24.0pt;} @list l2:level6 {mso-level-number-format:roman-lower; mso-level-tab-stop:none; mso-level-number-position:right; margin-left:144.0pt; text-indent:-24.0pt;} @list l2:level7 {mso-level-tab-stop:none; mso-level-number-position:left; margin-left:168.0pt; text-indent:-24.0pt;} @list l2:level8 {mso-level-number-format:alpha-lower; mso-level-text:"%8\)"; mso-level-tab-stop:none; mso-level-number-position:left; margin-left:192.0pt; text-indent:-24.0pt;} @list l2:level9 {mso-level-number-format:roman-lower; mso-level-tab-stop:none; mso-level-number-position:right; margin-left:216.0pt; text-indent:-24.0pt;} ol {margin-bottom:0cm;} ul {margin-bottom:0cm;} -->
下面详细分析调查过程,不关心分析过程的可直接看五.结论章节。
二.发现表象原因
1. 查fair官网文档,没发现有什么问题。
2. 搜谷歌,没找到类似问题。搜github hadoop的issue,也没找到类似问题。
3. 看了官网文档和其他技术网站的各种示例,灵光一闪,将queue的name从default换成root,作业运行正常了。难道是不能叫default?
4. 尝试queue的name为“defaul”等等,作业都正常。就是default不行。但调度策略换回fair后,default可以。
5. 搜hadoop源码,没找到有类似对default做特殊业务处理的地方。
6. 测试集群用的emr-hadoop 2.7.2版本,是hadoop定制版。换成apache原生的hadoop 2.7.2,queue名字为default作业也不运行。说明原生hadoop也有同样问题。
那么为什么队列叫default作业就不运行呢?
三.深入调查的方法
通过加打印日志的方式定位问题。
1. 下载hadoop源码(附录四),fair调度器的源码在./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager子项目下,org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair包里。
2. 参考一些源码分析的文章(附录五,六,七)看fair调度器的源码,在合适的地方加打印日志。
3. 对子项目打包,执行mvn package –DskipTests,target目录下生成hadoop-yarn-server-resourcemanager-2.7.2.jar。
4. 上传到服务器,代替hadoop目录下的share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.7.2.jar。
5. 重启resourcemanager。
6. 修改fair-scheduler的配置,提交作业,查看resourceManager的日志输出。
四.调查过程
4.1作业为什么不执行
FSLeafQueue类的updateDemand()方法会刷新子队列的资源请求,加打印如图,查看队列实时资源,队列名,调度策略,最大资源,作业信息等。
看日志发现,
1. 不管配置文件的队列名是什么,作业都提交给了root.default这个队列。说明作业默认都是提交到这个队列。
2. 该队列的实时资源fairShare只有内存,vcores一直是0.
3. 当配置文件显式配置了default队列调度策略为drf时,root.default队列的调度策略为drf。作业申请AM要一个vcores资源,没资源一直等待。
4. 当配置文件没显式配置default时,root.default队列的调度策略为默认的fair,作业申请AM只计算内存资源,成功分配资源。
那么只有root.default这个队列会vcores一直为0吗?
配置文件设置另一个队列test,调度策略为drf,作业提交时指定test队列:
hadoop jar /opt/apps/ecm/service/hadoop/2.7.2-1.2.14/package/hadoop-2.7.2-1.2.14/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar
wordcount -D mapreduce.job.queuename=test /apps/tez-0.9.1/NOTICE count10
发现依然vcores为0,作业一直等待。
作业不执行的原因确定了,是队列一直没有vcores资源。drf调度策略,am需要一个vcores资源,没有就一直等待资源。那为什么所有队列都没有vcores资源呢?
4.2.队列为什么没有vcores资源
继续加打印,FairScheduler类的update()方法会更新各队列的资源分配,加打印查看root根队列的集群资源。
ComputeFairShares类的computeShares()方法会计算队列内部各子队列/作业的资源分配,加打印。
日志看到,root根队列的集群资源是有vcores的,但root.default队列一直没有分配到vcores资源!
继续加打印,FSLeafQueue类和FSParentQueue类的recomputeShares()方法是重新计算队列内部资源的入口,加打印。
终于发现问题了,root根队列调度策略是默认的fair,给子队列分配资源时,FairSharePolicy类的computeShares()方法只会分配内存类型的资源,所以root.default队列只有内存资源,一旦配置子队列调度类型为drf计算vcores资源,会因为没有vcores资源一直等待。
验证,配置文件配置root和default两个队列,调度策略都是drf。提交作业到default队列,正常运行,日志显示default队列分配到了vcores资源。
4.3.调度器初始化细节
QueueManager类的initialize()方法,会用队列默认属性初始化了一个root根队列,和一个root.default子队列。
配置文件里写的队列名的规则是如果不是以root.开头或就是root,就增加“root.”前缀放到root根队列下。
读配置文件时,如果有root队列或default/root.default队列,会覆盖这两个队列的默认配置。
五.结论
如果想用drf调度策略计算vcores资源,那么必须从root根队列递归到叶子队列,显式配置所有队列调度策略都为drf。如果父队列没配置用了默认fair,那么只会给子队列分配内存资源,子队列用drf调度策略作业就都会没资源卡住。
六.Fair调度器源码详解
待补充
七.附录
1. fair调度器官方文档
2. yarn公平调度drf算法
3. yarn的cpu资源隔离
4. hadoop源码
6. fair调度器源码分析