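A while ago, while tinkering with Cloudera Manager on a test cluster, I accidentally removed cloudera-scm-agent. I was uninstalling httpd and did not notice that cloudera-scm-agent depends on the http service, so the agent was removed along with it. I reinstalled cloudera-manager-agent and fiddled with it repeatedly, but CM still could not discover the host. Since it was only a test cluster, I gave up and reinstalled Cloudera Manager from scratch.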
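Thinking it over afterwards, that should not have been necessary. In a distributed cluster, when a worker node goes down and comes back, it should be able to reconnect to the master node by IP address or hostname, with no reinstall required. So could the problem lie in the IP address or hostname the agent uses to connect?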
Later I read through that node's cloudera-scm-agent.log carefully and found that it really was an address problem:
[13/Sep/2020 05:01:33 +0800] 22503 MainThread agent ERROR Heartbeating to localhost:7182 failed.
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1390, in _send_heartbeat
    self.cfg.master_port)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 469, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.7/httplib.py", line 833, in connect
    self.timeout, self.source_address)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[13/Sep/2020 05:01:55 +0800] 22503 MainThread heartbeat_tracker INFO HB stats (seconds): num:1 LIFE_MIN:0.00 min:0.00 mean:0.00 max:0.00 LIFE_MAX:0.00
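When started on its own, cloudera-scm-agent sends its heartbeat to localhost:7182 rather than to the server's IP. Nothing is listening on port 7182 on this node, hence the "Connection refused" error.
So we need to change the cloudera-scm-server address in the cloudera-scm-agent configuration: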
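To check which address the agent is heartbeating to without reading the log by eye, the failure lines can be pulled out with a regular expression. This is only a minimal sketch: the pattern is derived from the single ERROR line quoted above, and the names `HEARTBEAT_FAILED` and `heartbeat_targets` are hypothetical helpers, not part of Cloudera Manager.

```python
import re

# Pattern inferred from the ERROR line in the log excerpt above (an assumption:
# other agent versions may word this message differently).
HEARTBEAT_FAILED = re.compile(
    r"Heartbeating to (?P<host>[^\s:]+):(?P<port>\d+) failed"
)


def heartbeat_targets(log_text):
    """Return the distinct (host, port) pairs from heartbeat-failure lines,
    in order of first appearance."""
    seen = []
    for match in HEARTBEAT_FAILED.finditer(log_text):
        target = (match.group("host"), int(match.group("port")))
        if target not in seen:
            seen.append(target)
    return seen


# Sample line copied from the log excerpt above.
sample = (
    "[13/Sep/2020 05:01:33 +0800] 22503 MainThread agent ERROR "
    "Heartbeating to localhost:7182 failed.\n"
)
print(heartbeat_targets(sample))  # [('localhost', 7182)]
```

In practice you would pass in the contents of the node's cloudera-scm-agent.log. If the result shows localhost instead of the server's address, the agent configuration is the place to look.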
[root@cdh2 cloudera-scm-agent]# vim /etc/cloudera-scm-agent/config.ini

# Configuration file for cloudera-scm-agent.
# Please note that this file supports multi-line values. Multi-line
# values are indicated by indenting following lines with a space.
#
# If you have whitespace in front of a parameter name, it will be
# read as a continuation of the previous parameter value. Please
# be careful not to leave spaces in front of parameter names.
#
# To check if this file has spaces in front of parameters names
# you can do a grep like this:
# grep '^[[:blank:]]' /etc/cloudera-scm-agent/config.ini
[General]
# Hostname of the CM server.
server_host=192.168.0.171

# Port that the CM server is listening on.
server_port=7182
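Before restarting the agent, it can be worth confirming that the configured address is what you expect and that the server port is reachable from this node. The following is a rough sketch using only the Python standard library. `parse_server_endpoint` and `can_connect` are hypothetical helper names, and the sample text is a trimmed copy of the [General] section shown above.

```python
import configparser
import socket


def parse_server_endpoint(config_text):
    """Extract (server_host, server_port) from the [General] section
    of the agent's config.ini contents."""
    # interpolation=None so that a literal '%' elsewhere in the file
    # is not treated as a configparser substitution.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    host = parser.get("General", "server_host")
    port = parser.getint("General", "server_port")
    return host, port


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Trimmed copy of the configuration shown above.
SAMPLE_CONFIG = """\
# Configuration file for cloudera-scm-agent.
[General]
# Hostname of the CM server.
server_host=192.168.0.171

# Port that the CM server is listening on.
server_port=7182
"""

print(parse_server_endpoint(SAMPLE_CONFIG))  # ('192.168.0.171', 7182)
```

On the node itself you would read /etc/cloudera-scm-agent/config.ini, pass its contents to `parse_server_endpoint`, and then call `can_connect(host, port)`. A result of False suggests either a wrong address or a server that is not listening, which matches the "Connection refused" seen in the log.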
Then restart cloudera-scm-agent and the host reconnects:
systemctl restart cloudera-scm-agent