Received a "Datanode is dead" alert today. Logging in to check, I found the datanode process was still "alive": no high load, memory normal. The datanode log showed only a few block-transfer exceptions, after which the node was still receiving blocks, yet its heartbeats timed out and the NameNode declared it dead:


WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver.run():
java.net.SocketException: Connection reset
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
	at java.io.DataOutputStream.flush(DataOutputStream.java:106)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1003)
	at java.lang.Thread.run(Thread.java:662)

Only after looking at the thread dump (see Appendix 1) did the cause become clear:

After the pipeline fails, a new DataXceiver is created to perform pipeline recovery: 1. it calls recoverRbw, which takes the monitor lock on FsDatasetImpl; 2. recoverRbw then calls stopWriter: during error recovery, if the old writer thread (DataReceiver) on this DataNode has not exited yet, it must first be interrupted and joined, so that the replica is quiescent before recovery continues:

public void stopWriter() throws IOException {
  if (writer != null && writer != Thread.currentThread() && writer.isAlive()) {
    writer.interrupt();
    try {
      writer.join();
    } catch (InterruptedException e) {
      throw new IOException("Waiting for writer thread is interrupted.");
    }
  }
}

3. Meanwhile, the old writer thread may be in the middle of finalizing its block, which also requires the FsDatasetImpl lock, and at this point the deadlock occurs!

4. Before sending a heartbeat, the DataNode also needs to acquire the FsDatasetImpl lock to report the node's capacity, so the heartbeat times out as well (a minimal sketch of this lock interaction follows below).
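To see the deadlock concretely, here is a minimal, self-contained Java sketch. This is not Hadoop code; the class, field, and thread names are invented for illustration. A recovery thread joins the old writer while holding the dataset monitor, the writer needs that monitor to finalize its block, and the heartbeat thread queues up behind both.

// Minimal sketch (not Hadoop source; all names invented) reproducing the lock pattern.
public class FsDatasetDeadlockDemo {

    // Stand-in for FsDatasetImpl: a single coarse-grained monitor.
    private static final Object dataset = new Object();

    public static void main(String[] args) throws InterruptedException {
        // Old writer thread: simulates receiving packets, then needs the
        // dataset lock to finalize its block (like PacketResponder -> finalizeBlock).
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(5000);           // "receiving" the block
            } catch (InterruptedException e) {
                // interrupted by the recovery thread below; fall through to finalize
            }
            synchronized (dataset) {          // BLOCKED: the recovery thread holds the lock
                System.out.println("writer: block finalized");
            }
        }, "writer");
        writer.start();

        // Heartbeat thread: needs the dataset lock to read capacity/usage.
        new Thread(() -> {
            try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
            synchronized (dataset) {          // BLOCKED behind the recovery thread
                System.out.println("heartbeat: capacity reported");
            }
        }, "heartbeat").start();

        // Recovery path (here: the main thread), playing the new DataXceiver in
        // recoverRbw: it holds the dataset lock while it interrupts and joins
        // the old writer -- the same wait seen in the thread dump in Appendix 1.
        synchronized (dataset) {
            writer.interrupt();
            writer.join();                    // never returns: writer needs the lock we hold
        }
        System.out.println("recovery: done");  // never reached
    }
}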


This is actually a known bug that has been fixed in CDH 4.4.0: a hidden parameter, dfs.datanode.xceiver.stop.timeout.millis (default 1 minute), was introduced so that stopWriter joins the writer thread with a timeout instead of waiting indefinitely.
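A hedged sketch of that timed join, modeled on the fix as described above; the exact parameter and message wording in CDH 4.4.0 may differ:

public void stopWriter(long xceiverStopTimeoutMillis) throws IOException {
  if (writer != null && writer != Thread.currentThread() && writer.isAlive()) {
    writer.interrupt();
    try {
      // Wait at most dfs.datanode.xceiver.stop.timeout.millis (default 60s)
      // instead of joining indefinitely while the FsDatasetImpl lock is held.
      writer.join(xceiverStopTimeoutMillis);
      if (writer.isAlive()) {
        throw new IOException("Join on writer thread " + writer + " timed out");
      }
    } catch (InterruptedException e) {
      throw new IOException("Waiting for writer thread is interrupted.");
    }
  }
}

With the bounded join, the worst case is that pipeline recovery fails with an IOException after the timeout, the FsDatasetImpl lock is released, and heartbeats can get through again.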

Appendix 1:

"DataXceiverfor client xxx at xxx [Receiving block xxx]"  daemon prio=10tid= 0x00000000406bc000  nid= 0x612d  in Object.wait() [ 0x00007ffdcb314000 ]
    java.lang.Thread.State: WAITING (on objectmonitor)
         at java.lang.Object.wait(Native Method)
         atjava.lang.Thread.join(Thread.java: 1186 )
         - locked < 0x00000007ca882a30 > (aorg.apache.hadoop.util.Daemon)
         atjava.lang.Thread.join(Thread.java: 1239 )
         atorg.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java: 157 )
         atorg.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverRbw(FsDatasetImpl.java: 707 )
         - locked < 0x0000000784acf478 > (aorg.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
         atorg.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverRbw(FsDatasetImpl.java: 91 )
         atorg.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java: 163 )
         atorg.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java: 457 )
         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java: 103 )
         atorg.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java: 67 )
         atorg.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java: 221 )
         at java.lang.Thread.run(Thread.java: 662 )


"PacketResponder:xxx, type=HAS_DOWNSTREAM_IN_PIPELINE"  daemon prio=10tid= 0x00007ffde0248000  nid= 0x5ed7  waiting  for  monitor entry [ 0x00007ffdccbf9000 ]
    java.lang.Thread.State: BLOCKED (on objectmonitor)
         atorg.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java: 846 )
         - waiting to lock< 0x0000000784acf478 > (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
         atorg.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java: 964 )
         atjava.lang.Thread.run(Thread.java: 662 )

This article is reproduced from the 51CTO blog of MIKE老毕; original post: http://blog.51cto.com/boylook/1313655. Please contact the original author before reposting.