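Symptom: On one management node of a cluster, gccli can log in to the database normally, but executing SQL fails with: ERROR 1789 (HY000) at line 1:get cluster task id fail
Analysis: The system log on that node contains a large number of records showing failures to get the handle status, many of them with error=15 or error=2.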
220418 17:26:11 [Note] gcwCrmHandleInThreadGet() failed, error=15
220418 17:26:11 [Note] gcwCrmHandleInThreadGet() failed, error=15
220418 17:26:11 [Note] gcwCrmHandleInThreadGet() failed, error=15
220418 17:26:11 [Note] gcwCrmHandleInThreadGet() failed, error=15
220418 17:26:11 [Note] gcwCrmHandleInThreadGet() failed, error=15
220418 17:26:11 [Note] gcwCrmHandleInThreadGet() failed, error=15
220418 17:26:11 [Note] gcwCrmHandleInThreadGet() failed, error=15
220418 17:26:11 [Note] gcwCrmHandleInThreadGet() failed, error=15
220418 17:26:11 [Note] gcwCrmHandleInThreadGet() failed, error=15
220418 17:26:11 [Note] gcwCrmHandleInThreadGet() failed, error=15
220418 17:26:11 [Note] gcwCrmHandleInThreadGet() failed, error=15
220418 17:26:11 [Note] gcwCrmHandleInThreadGet() failed, error=15
220418 17:26:11 [Note] gcwCrmHandleInThreadGet() failed, error=15
220418 17:26:11 [Note] gcwCrmHandleInThreadGet() failed, error=15
220418 17:26:11 [Note] gcwCrmHandleInThreadGet() failed, error=15
220418 17:26:11 [Note] gcwCrmHandleInThreadGet() failed, error=15
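Here error=15 means /dev/shm is nearly full, error=2 means it is completely full, and error=6 means the corosync instances are synchronizing with each other.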
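To get a quick picture of which error codes dominate the log, the records can be tallied with a short script. The following is a minimal sketch, not part of any product tooling: the function names `count_handle_errors` and `summarize` are hypothetical, the regular expression assumes the log line format shown in the excerpt above, and the code meanings are taken only from this note.

```python
import re
from collections import Counter

# Meanings exactly as described in this note; other codes are not covered here.
ERROR_MEANINGS = {
    15: "/dev/shm is nearly full",
    2: "/dev/shm is completely full",
    6: "corosync instances are synchronizing",
}

# Assumes the line format seen in the log excerpt above.
PATTERN = re.compile(r"gcwCrmHandleInThreadGet\(\) failed, error=(\d+)")


def count_handle_errors(lines):
    """Count gcwCrmHandleInThreadGet() failures per error code."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def summarize(counts):
    """Return one readable line per error code, sorted by code."""
    return [
        "error={}: {} record(s), {}".format(
            code, n, ERROR_MEANINGS.get(code, "meaning not covered in this note")
        )
        for code, n in sorted(counts.items())
    ]
```

For example, passing the opened system log file object to `count_handle_errors` and printing the result of `summarize` shows at a glance whether the failures point at /dev/shm or at corosync synchronization.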
Solution: Kill the gclusterd process on that node. Once it has been restarted automatically, SQL execution returns to normal.
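Before killing the process, it can help to confirm how full /dev/shm actually is and to locate the gclusterd PID. The sketch below is an illustration under assumptions: the helper names `shm_usage_percent` and `find_pids_by_name` are hypothetical, and the PID lookup assumes a Linux host where `/proc/<pid>/comm` holds the process name.

```python
import os
import shutil


def shm_usage_percent(path="/dev/shm"):
    """Return the used percentage of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def find_pids_by_name(name, proc_root="/proc"):
    """Return the sorted PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited, or the entry is not readable; skip it.
            continue
    return sorted(pids)
```

With these helpers, `shm_usage_percent()` reports the current /dev/shm usage and `find_pids_by_name("gclusterd")` lists the candidate PIDs. Each PID can then be terminated with `os.kill(pid, signal.SIGTERM)`; the choice of signal is an assumption here, since the note only says to kill the process.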