Resolving a DataNode Startup Failure

Summary: starting the DataNode fails with "missing NameNode address"

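Running start-all reported no errors, but the DataNode did not show up in the NameNode web UI.

Logging in to the DataNode host and checking the logs revealed the following problems.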


The DataNode process never came up.

The NodeManager started, ran for a while, and then exited.

Error: java.io.IOException: No services to connect, missing NameNode address.

2021-05-15 16:31:40,824 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unable to get NameNode addresses.

2021-05-15 16:31:40,907 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.w.WebAppContext@33308786{/,null,UNAVAILABLE}{/datanode}

2021-05-15 16:31:40,921 INFO org.eclipse.jetty.server.AbstractConnector: Stopped ServerConnector@4b6e2263{HTTP/1.1,[http/1.1]}{localhost:0}

2021-05-15 16:31:40,921 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@46b61c56{/static,file:///usr/local/hadoop/hadoop-3.2.1/share/hadoop/hdfs/webapps/static/,UNAVAILABLE}

2021-05-15 16:31:40,922 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@36060e{/logs,file:///usr/local/hadoop/hadoop-3.2.1/logs/,UNAVAILABLE}

2021-05-15 16:31:40,950 INFO org.apache.hadoop.ipc.Server: Stopping server on 9867

2021-05-15 16:31:40,951 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping DataNode metrics system…

2021-05-15 16:31:40,951 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system stopped.

2021-05-15 16:31:40,952 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system shutdown complete.

2021-05-15 16:31:40,963 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Shutdown complete.

2021-05-15 16:31:40,964 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain

java.io.IOException: No services to connect, missing NameNode address.

at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:165)

at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1441)

at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:501)

at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2806)

at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2714)

at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2756)

at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2900)

at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2924)

2021-05-15 16:31:40,967 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.io.IOException: No services to connect, missing NameNode address.

2021-05-15 16:31:40,988 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:

*


A warning is logged just before this error:

Unable to get NameNode addresses
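The warning is easy to miss among the INFO lines. As a minimal sketch, the following Python filters a Hadoop log down to its WARN, ERROR, and FATAL entries. The function name and the regular expression are illustrative assumptions about the default log layout seen above, not part of Hadoop.

```python
import re

# Matches the default layout seen in this article:
# "<date> <time>,<millis> <LEVEL> <logger>: <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+): (?P<msg>.*)$"
)


def warnings_and_errors(log_text):
    """Return (timestamp, level, message) for each WARN, ERROR, or FATAL entry."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in {"WARN", "ERROR", "FATAL"}:
            hits.append((match.group("ts"), match.group("level"), match.group("msg")))
    return hits


# Three lines copied from the DataNode log above.
SAMPLE = """\
2021-05-15 16:31:40,824 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unable to get NameNode addresses.
2021-05-15 16:31:40,950 INFO org.apache.hadoop.ipc.Server: Stopping server on 9867
2021-05-15 16:31:40,964 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
"""

if __name__ == "__main__":
    for ts, level, msg in warnings_and_errors(SAMPLE):
        print(ts, level, msg)
```

Stack-trace lines do not match the pattern and are skipped, so the output is a short timeline of what went wrong and in which order.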


The NodeManager also died a short while later.

Full error stack:

2021-05-15 16:47:25,535 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2021-05-15 16:47:49,580 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 0, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0

2021-05-15 16:47:56,541 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2021-05-15 16:47:57,542 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2021-05-15 16:47:58,546 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2021-05-15 16:47:59,547 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2021-05-15 16:48:00,549 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2021-05-15 16:48:01,552 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2021-05-15 16:48:02,556 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2021-05-15 16:48:03,558 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2021-05-15 16:48:04,559 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2021-05-15 16:48:05,563 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2021-05-15 16:48:05,564 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Unexpected error starting NodeStatusUpdater

java.net.ConnectException: Your endpoint configuration is wrong; For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort

at sun.reflect.GeneratedConstructorAccessor21.newInstance(Unknown Source)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:833)

at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:753)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1549)

at org.apache.hadoop.ipc.Client.call(Client.java:1491)

at org.apache.hadoop.ipc.Client.call(Client.java:1388)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)

at com.sun.proxy.$Proxy73.registerNodeManager(Unknown Source)

at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:73)

at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)

at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)

:

2021-05-15 16:48:05,602 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting

2021-05-15 16:48:05,602 WARN org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl: org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl is interrupted. Exiting.

2021-05-15 16:48:05,609 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system…

2021-05-15 16:48:05,610 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.

2021-05-15 16:48:05,610 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.

2021-05-15 16:48:05,610 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Your endpoint configuration is wrong; For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort

at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:278)

at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)

at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)

at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)

at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:975)

at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1054)

Caused by: java.net.ConnectException: Your endpoint configuration is wrong; For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort

at sun.reflect.GeneratedConstructorAccessor21.newInstance(Unknown Source)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:833)

at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:753)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1549)

at org.apache.hadoop.ipc.Client.call(Client.java:1491)

at org.apache.hadoop.ipc.Client.call(Client.java:1388)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)

at com.sun.proxy.$Proxy73.registerNodeManager(Unknown Source)

at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:73)

at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)

at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)

at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)

at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)

at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)

at com.sun.proxy.$Proxy74.registerNodeManager(Unknown Source)

at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:416)

at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:272)

... 5 more

Caused by: java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715)

at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)

at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)

at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:700)


Analysis

Key information:

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Your endpoint configuration is wrong; For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort

Caused by: java.net.ConnectException: Connection refused


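The likely cause is that the standard configuration files were never set up on the DataNode host.

core-site.xml: the missing configuration leaves the DataNode without a NameNode address, so it never connects to port 9000 on the master.

yarn-site.xml: the missing configuration makes the NodeManager retry port 8031 (the YARN resource tracker) on 0.0.0.0, which is the default when no ResourceManager address is configured.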
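The 0.0.0.0:8031 target in the NodeManager log follows from YARN's shipped defaults: yarn.resourcemanager.hostname defaults to 0.0.0.0, and yarn.resourcemanager.resource-tracker.address defaults to ${yarn.resourcemanager.hostname}:8031. The Python below is a rough sketch of that substitution. The function name and the hostname "master" are illustrative assumptions, and this is not Hadoop's actual Configuration code.

```python
import re

# Defaults as documented in yarn-default.xml.
YARN_DEFAULTS = {
    "yarn.resourcemanager.hostname": "0.0.0.0",
    "yarn.resourcemanager.resource-tracker.address":
        "${yarn.resourcemanager.hostname}:8031",
}

_VAR = re.compile(r"\$\{([^}]+)\}")


def resolve(key, site_overrides):
    """Look up key with site values taking precedence, expanding ${var} references."""
    merged = {**YARN_DEFAULTS, **site_overrides}
    value = merged[key]
    # Expand references repeatedly; the bound guards against cyclic definitions.
    for _ in range(10):
        expanded = _VAR.sub(lambda m: merged.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value


if __name__ == "__main__":
    key = "yarn.resourcemanager.resource-tracker.address"
    # Nothing configured: the NodeManager ends up dialing its own wildcard address.
    print(resolve(key, {}))
    # "master" is a hypothetical ResourceManager hostname.
    print(resolve(key, {"yarn.resourcemanager.hostname": "master"}))
```

Setting only the hostname is enough to move every derived ResourceManager address, which is why a single property can resolve the NodeManager side of the problem.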


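These are two separate problems. In the end I made the following changes.

core-site.xml: set fs.defaultFS to the URL of the cluster's NameNode.

yarn-site.xml: set yarn.resourcemanager.hostname to the ResourceManager's host. Alternatively, set yarn.resourcemanager.resource-tracker.address directly to the ResourceManager's host:port.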
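To catch this kind of omission before starting the daemons, a small check can parse both files and report which of these settings are absent. The following is a minimal Python sketch. The function names are made up for illustration, and the values hdfs://master:9000 and master are hypothetical placeholders for the real NameNode URL and ResourceManager host.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_settings(core_site_xml, yarn_site_xml):
    """Return the names of the required settings that are absent or empty."""
    core = read_properties(core_site_xml)
    yarn = read_properties(yarn_site_xml)
    missing = []
    if not core.get("fs.defaultFS"):
        missing.append("fs.defaultFS")
    # Either the hostname or the explicit resource-tracker address is enough.
    if not (yarn.get("yarn.resourcemanager.hostname")
            or yarn.get("yarn.resourcemanager.resource-tracker.address")):
        missing.append("yarn.resourcemanager.hostname")
    return missing


# Hypothetical example values; substitute the real hosts for your cluster.
CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>"""

YARN_SITE = """<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
</configuration>"""

EMPTY_SITE = "<configuration></configuration>"

if __name__ == "__main__":
    print(missing_settings(CORE_SITE, YARN_SITE))
    print(missing_settings(EMPTY_SITE, EMPTY_SITE))
```

To check the files on disk, read them as text and pass the contents in. On an unconfigured node like the one in this article, both settings would be reported as missing.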


Restart the NodeManager process on its own. Running stop-all.sh and start-all.sh is not recommended, because that restarts the entire cluster.


Verification

Check the NameNode web UI.

The DataNode that had failed to start has rejoined the cluster.
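Besides the web UI, a quick TCP probe from the worker node confirms that the two endpoints from the logs are now reachable. This is a minimal Python sketch; the helper name is an assumption, and a successful connection only shows that something is listening, not that the service is healthy.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and hostname lookup failures.
        return False
```

For example, with "master" standing in for the real host, can_connect("master", 9000) probes the NameNode RPC port and can_connect("master", 8031) probes the ResourceManager resource tracker. The "Connection refused" in the NodeManager trace corresponds to the False case.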


The problem is fully resolved.
