A Look at Hadoop and the SASL Security Framework
1 Starting from a Hadoop SASL Exception in a Data Sync Job
A data synchronization job used DataX to copy data from an RDBMS into an HDFS file system with Kerberos authentication enabled, and failed during execution. The core error was thrown every time the client tried to create a BlockOutputStream against a DataNode: "Exception in createBlockOutputStream javax.security.sasl.SaslException: DIGEST-MD5: No common protection layer between client and server". Because every DataNode failed in this way, all of them ended up excluded, the file write failed, and the job exited with an error. The full error log is shown below:
## Error log: failure when creating a BlockOutputStream against one DataNode
Jul 06, 2023 2:45:59 PM org.apache.hadoop.security.UserGroupInformation loginUserFromKeytab
## Note: the Kerberos login itself succeeded
INFO: Login successful for user hs_dap@TDH using keytab file /etc/hs_dap.keytab
Jul 06, 2023 2:46:00 PM org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer createBlockOutputStream
INFO: Exception in createBlockOutputStream
javax.security.sasl.SaslException: DIGEST-MD5: No common protection layer between client and server
    at com.sun.security.sasl.digest.DigestMD5Client.checkQopSupport(DigestMD5Client.java:418)
    at com.sun.security.sasl.digest.DigestMD5Client.evaluateChallenge(DigestMD5Client.java:221)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:113)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:452)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:391)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:263)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
    at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1289)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1237)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
Jul 06, 2023 2:46:00 PM org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer nextBlockOutputStream
INFO: Abandoning BP-26777269-10.2.43.201-1686811609596:blk_1073808907_68083
Jul 06, 2023 2:46:00 PM org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer nextBlockOutputStream
INFO: Excluding datanode DatanodeInfoWithStorage[10.2.43.203:50010,DS-be4ebd27-0a08-4240-8a71-4355b25aa04b,DISK]
## Error log: every DataNode failed in createBlockOutputStream and was excluded, so the client write ultimately failed
Jul 06, 2023 2:46:00 PM org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer nextBlockOutputStream
INFO: Abandoning BP-26777269-10.2.43.201-1686811609596:blk_1073808909_68085
Jul 06, 2023 2:46:00 PM org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer nextBlockOutputStream
INFO: Excluding datanode DatanodeInfoWithStorage[10.2.43.201:50010,DS-cff0b0d1-88da-4a64-b9f1-ca5696d8d2c5,DISK]
Jul 06, 2023 2:46:00 PM org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer run
WARNING: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hundsun/dap_pxqs/hive/hs_ods/xxx/part_date=20230704__f6414321_134d_41a4_8fa2_baaa2f89a6be/node1__1dcd5a30_7b67_4c3f_a269_583ba389945c could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1621)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3224)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3148)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:722)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:493)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2225)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2221)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2197)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2219)
    at org.apache.hadoop.ipc.Client.call(Client.java:1476)
    at org.apache.hadoop.ipc.Client.call(Client.java:1407)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1430)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1226)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
2 The Hadoop SASL Error: Root Cause Analysis and Fix
- The root cause of the error above, "Exception in createBlockOutputStream javax.security.sasl.SaslException: DIGEST-MD5: No common protection layer between client and server at com.sun.security.sasl.digest.DigestMD5Client.checkQopSupport(DigestMD5Client.java:418)", is spelled out in the message itself: "No common protection layer between client and server". In the initial phase of connection setup, the SASL client and SASL server negotiate a QoP (quality of protection); because the sets of QoP values supported by the client and the server had no value in common, the negotiation failed, the connection could not be established, and the job failed;
- The Hadoop client (the SASL client) should configure SASL-related parameters such as hadoop.rpc.protection/dfs.data.transfer.protection according to what the NameNode/DataNode (the SASL server) uses; the two sides must have at least one value in common for the QoP negotiation to succeed and the connection to be established;
- The Hadoop client (SASL client) can set hadoop.rpc.protection/dfs.data.transfer.protection to exactly the same values as the server;
- The Hadoop client (SASL client) can also set hadoop.rpc.protection/dfs.data.transfer.protection to a comma-separated list of multiple values, in which case the client and server negotiate the concrete quality of protection (QoP);
- Some Hadoop servers may only configure hadoop.rpc.protection and leave dfs.data.transfer.protection unset; in that case the client's setting for the former should still follow the server, while the latter can be set to any valid value or list of valid values;
- If the server-side parameters hadoop.rpc.protection/dfs.data.transfer.protection need to be changed, the new configuration must be deployed and HDFS restarted for the change to take effect;
- For a successful SASL authentication, both the client and the server need to negotiate and come to agreement on a quality of protection. The Hadoop configuration properties hadoop.rpc.protection/dfs.data.transfer.protection support a comma-separated list of values that map to the SASL QoP values. The client end and the server end must have at least one value in common in that list;
- The SASL QoP-related settings on the Hadoop server side can be queried with the following commands:
  - grep -C 2 'dfs.data.transfer.protection' /etc/hdfs1/conf/hdfs-site.xml
  - grep -C 2 'hadoop.rpc.protection' /etc/hdfs1/conf/core-site.xml
  - hdfs getconf -confKey "dfs.data.transfer.protection"
  - hdfs getconf -confKey "hadoop.rpc.protection"
Specifically for a DataX job, the fix is to set the job's hadoopConfig parameter; under the hood DataX parses this parameter and applies it to the org.apache.hadoop.conf.Configuration object:
## Option 1: set a single value, matching the server side:
"hadoopConfig": {
    "dfs.data.transfer.protection": "integrity",
    "hadoop.rpc.protection": "authentication"
}
## Option 2: set a comma-separated list of values:
"hadoopConfig": {
    "hadoop.rpc.protection": "authentication,integrity,privacy",
    "dfs.data.transfer.protection": "authentication,integrity,privacy"
}
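For completeness, outside of DataX the same keys can be set directly on org.apache.hadoop.conf.Configuration before the Kerberos login. The sketch below is a minimal illustration, not production code: the NameNode URI and output path are hypothetical placeholders, while the principal and keytab path are the ones shown in the log above.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class SaslQopClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        // A comma-separated list lets the client and server negotiate a QoP;
        // at least one value must overlap with the server-side setting.
        conf.set("hadoop.rpc.protection", "authentication,integrity,privacy");
        conf.set("dfs.data.transfer.protection", "authentication,integrity,privacy");

        UserGroupInformation.setConfiguration(conf);
        // Principal and keytab taken from the error log above.
        UserGroupInformation.loginUserFromKeytab("hs_dap@TDH", "/etc/hs_dap.keytab");

        // "hdfs://namenode:8020" and the output path are placeholders for illustration only.
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
             FSDataOutputStream out = fs.create(new Path("/tmp/sasl-qop-test.txt"))) {
            out.writeUTF("hello sasl");
        }
    }
}
```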
3 The Hadoop SASL Error: Configuration in datago
- As for our company's data synchronization tool, datago, it is essentially an enhancement of the open-source DataX in both functionality and usability;
- In the see installation UI, datago exposes the native Hadoop SASL parameters dfs.data.transfer.protection/hadoop.rpc.protection; besides the values natively supported by Hadoop (authentication/integrity/privacy), both parameters can additionally be set to nosasl;
- Reading the datago source code shows that when dfs.data.transfer.protection/hadoop.rpc.protection is set to nosasl, the corresponding parameter is not passed through to DataX's hadoopConfig parameter, and therefore not to the org.apache.hadoop.conf.Configuration object;
- Reading the datago source code also shows that when dfs.data.transfer.protection/hadoop.rpc.protection is set to anything other than nosasl, the corresponding parameter is passed through to DataX's hadoopConfig parameter and from there to the org.apache.hadoop.conf.Configuration object, as illustrated by the sketch below.
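The pass-through behaviour described above can be pictured with the following hypothetical sketch. It is not the actual datago source, just an illustration of the rule that nosasl values are dropped while anything else is forwarded into the DataX hadoopConfig map (and from there into Configuration):

```java
import java.util.HashMap;
import java.util.Map;

public class DatagoSaslPassThroughSketch {
    // Hypothetical helper: builds the hadoopConfig map that would be handed to DataX.
    static Map<String, String> buildHadoopConfig(String rpcProtection, String dataTransferProtection) {
        Map<String, String> hadoopConfig = new HashMap<>();
        // "nosasl" means "do not pass the key through at all".
        if (rpcProtection != null && !"nosasl".equalsIgnoreCase(rpcProtection)) {
            hadoopConfig.put("hadoop.rpc.protection", rpcProtection);
        }
        if (dataTransferProtection != null && !"nosasl".equalsIgnoreCase(dataTransferProtection)) {
            hadoopConfig.put("dfs.data.transfer.protection", dataTransferProtection);
        }
        return hadoopConfig; // DataX later copies these entries into org.apache.hadoop.conf.Configuration
    }
}
```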
4 Background on Hadoop and SASL
- Hadoop's RPC communication is built on the SASL framework; the parameter hadoop.rpc.protection configures the quality of protection (QoP) level for RPC communication;
- The Hadoop DataNode's data transfer protocol does not use the Hadoop RPC framework;
- Before Hadoop 2.6.0, the security of DataNode data transfer relied on root privileges and privileged ports (Hadoop assumed an attacker could not gain root on a DataNode host): the DataNode was started via the jsvc utility, which first runs as root and binds the privileged ports, and only then switches to the ordinary user specified by HDFS_DATANODE_SECURE_USER to run the DataNode;
- Since Hadoop 2.6.0, the DataNode data transfer protocol supports SASL with multiple SASL QoP modes: the parameter dfs.data.transfer.protection specifies the QoP (when configuring the client's QoP, follow the DataNode server side; the two sides must have at least one value in common); if dfs.data.transfer.protection is not configured, DataNode data transfer security still falls back to the mechanism based on root privileges, privileged ports and jsvc;
- Since Hadoop 2.6.0, dfs.encrypt.data.transfer can also be set to true to encrypt HDFS block data on the wire during reads and writes; this parameter takes precedence over dfs.data.transfer.protection;
- The valid SASL QoP values (applicable to both hadoop.rpc.protection and dfs.data.transfer.protection) are listed below; the sketch at the end of this section shows how they map onto the underlying javax.security.sasl QOP values:
- authentication: authentication only;
- integrity: integrity check in addition to authentication;
- privacy: data encryption in addition to integrity.
- In Hadoop, the principles and methods for configuring the SASL client and SASL server can be summarized as follows:
- The Hadoop client should configure hadoop.rpc.protection/dfs.data.transfer.protection according to the server side; it can simply use the same values as the server;
- The Hadoop client can also set hadoop.rpc.protection/dfs.data.transfer.protection to a comma-separated list of multiple values, in which case the client and server negotiate the concrete quality of protection (QoP);
- Alternatively, dfs.encrypt.data.transfer can be set to true to encrypt HDFS block data on the wire during reads and writes; this parameter takes precedence over dfs.data.transfer.protection;
- Hadoop RPC framework: to encrypt data transferred between Hadoop services and clients, set hadoop.rpc.protection to privacy in core-site.xml;
- The DataNode data transfer protocol does not use the Hadoop RPC framework; to activate data encryption for the DataNode data transfer protocol, set dfs.encrypt.data.transfer to true in hdfs-site.xml (optionally you can also set dfs.encrypt.data.transfer.algorithm, dfs.encrypt.data.transfer.cipher.suites and dfs.encrypt.data.transfer.cipher.key.bitlength);
- Data encryption on HTTP: data transfers between web consoles and clients, such as HttpFS and WebHDFS, are protected using SSL (HTTPS).
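As a reference for how the three QoP values listed above surface at the javax.security.sasl layer: Hadoop maps authentication to "auth", integrity to "auth-int" and privacy to "auth-conf", and passes them as a comma-separated, preference-ordered Sasl.QOP property. The sketch below only illustrates the mechanism behind the "No common protection layer" error; the protocol name, server name and callback handler are placeholders, not the exact values Hadoop uses internally.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.callback.CallbackHandler;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;

public class SaslQopNegotiationSketch {
    public static SaslClient newDigestMd5Client(CallbackHandler handler) throws SaslException {
        Map<String, String> props = new HashMap<>();
        // Equivalent of hadoop.rpc.protection = "privacy,integrity,authentication":
        // privacy -> auth-conf, integrity -> auth-int, authentication -> auth.
        props.put(Sasl.QOP, "auth-conf,auth-int,auth");
        // If the server offers no QoP from this list, a later evaluateChallenge() fails with
        // "DIGEST-MD5: No common protection layer between client and server".
        return Sasl.createSaslClient(
                new String[] {"DIGEST-MD5"},   // SASL mechanism
                null,                          // authorization id
                "hdfs",                        // protocol (placeholder)
                "datanode.example.com",        // server name (placeholder)
                props,
                handler);
    }
}
```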
5 Summary of Hadoop SASL-related Parameters
## Related parameters:
- hadoop.security.authentication: defaults to simple. Possible values are simple (no authentication) and kerberos.
- hadoop.rpc.protection: defaults to authentication. A comma-separated list of protection values for secured SASL connections. Possible values are authentication, integrity and privacy. authentication means authentication only, with no integrity or privacy; integrity implies authentication and integrity are enabled; privacy implies all of authentication, integrity and privacy are enabled. hadoop.security.saslproperties.resolver.class can be used to override hadoop.rpc.protection for a connection on the server side. The data transferred between Hadoop services and clients can be encrypted on the wire; setting hadoop.rpc.protection to privacy in core-site.xml activates data encryption.
- dfs.encrypt.data.transfer: defaults to false. Only needs to be set on the NameNode and DataNodes; clients deduce it. Whether or not actual block data that is read/written from/to HDFS should be encrypted on the wire. It is possible to override this setting per connection by specifying custom logic via dfs.trustedchannel.resolver.class. You need to set dfs.encrypt.data.transfer to true in hdfs-site.xml in order to activate data encryption for the data transfer protocol of the DataNode.
- dfs.data.transfer.protection: unspecified by default. Setting this property enables SASL for authentication of the data transfer protocol. If this is enabled, then dfs.datanode.address must use a non-privileged port, dfs.http.policy must be set to HTTPS_ONLY, and the HDFS_DATANODE_SECURE_USER environment variable must be undefined when starting the DataNode process. A comma-separated list of SASL protection values used for secured connections to the DataNode when reading or writing block data. Possible values are authentication, integrity and privacy. If dfs.encrypt.data.transfer is set to true, it supersedes the setting for dfs.data.transfer.protection and enforces that all connections must use a specialized encrypted SASL handshake. This property is ignored for connections to a DataNode listening on a privileged port; in that case, it is assumed that the use of a privileged port establishes sufficient trust.

## References:
- https://docs.oracle.com/javase/8/docs/technotes/guides/security/sasl/sasl-refguide.html#DEBUG
- https://issues.apache.org/jira/browse/HADOOP-10211
- https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SecureMode.html
- https://commons.apache.org/proper/commons-daemon/jsvc.html