Unable to connect to the MKS: A general system error occurred: Internal error

Summary:

All 16 hosts in the cluster had been up and running for a long time without any issue (uptime of 300+ days). Then, on all hosts, we lost access to the VM console. Opening a VM console from viClient returned the error "Unable to connect to the MKS: A general system error occurred: Internal error".
We also could not vMotion VMs to other ESXi hosts in the cluster.

[Screenshot: 2xa09-esx2e.png]


We logged in to the ESXi hosts and noticed that the root ramdisk was full:

# vdf -h | tail -6 

[Screenshot: full-root.jpg]
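
To spot a nearly full ramdisk before it causes trouble, the output of vdf can be filtered by its usage column. The following is only a minimal sketch: the function name `ramdisk_over` and the 90% default are made up for illustration, and it assumes that the usage field is printed as a number followed by `%` and that awk is available in the ESXi shell.

```shell
# Print every input line whose usage field (e.g. "100%") is at or above
# a threshold. Hypothetical helper, not part of ESXi.
ramdisk_over() {
    threshold="${1:-90}"   # assumed default of 90%
    awk -v limit="$threshold" '
        {
            for (i = 1; i <= NF; i++) {
                # A usage field looks like "100%" or "7%"; the header
                # field "Use%" does not match and is skipped.
                if ($i ~ /^[0-9]+%$/) {
                    pct = $i
                    sub(/%/, "", pct)
                    if (pct + 0 >= limit + 0) { print; break }
                }
            }
        }'
}
```

On a host it could be used as `vdf -h | tail -6 | ramdisk_over 90`, which prints only the ramdisks that are at least 90% full.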


The uptime of the ESXi hosts was impressive:

# uptime 

[Screenshot: uptime-300+.png]


When we tried to list the virtual machines with the vim-cmd command, we got an error:

# vim-cmd vmsvc/getallvms 

[Screenshot: vim-cmd.png]



To figure out what was consuming space on the root ramdisk, we ran:

# find / -size +10k -exec du -h {} \; | egrep -v volumes | egrep -v disks  | less
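
The same search can be sorted so that the largest files come first instead of being paged through by hand. This is a sketch only: `biggest_files` is a hypothetical helper name, the defaults are arbitrary, and it assumes the shell's find, du, sort and head behave like their usual busybox or GNU counterparts.

```shell
# List the largest files (over 10k) under a directory, biggest first.
# Paths containing "volumes" or "disks" are skipped, as in the command above.
biggest_files() {
    dir="${1:-/}"       # where to search; "/" by default
    count="${2:-20}"    # how many entries to show (assumed default)
    find "$dir" -type f -size +10k -exec du -k {} \; 2>/dev/null \
        | grep -E -v 'volumes|disks' \
        | sort -rn \
        | head -n "$count"
}
```

For example, `biggest_files / 30` would show the 30 largest files outside the datastore paths, with sizes in kilobytes.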

I spotted a lot of EMCProvider logs in /opt/emc/cim/log:

# ls -l | head -5

[Screenshot: EMCProvider-log.png]


And bingo! These logs were eating up the space:

# du -h /opt/emc/cim/log/

[Screenshot: du-full.png]


It seems that the EMCProvider logs were never rotated and filled up the root ramdisk. I couldn't find any parameter in the conf file to set up rotation of the EMCProvider logs - it is more of a feature than a bug ;)

We deleted logs older than 200 days (and eventually all EMCProvider logs older than 1 day) on the ESXi hosts in the cluster:

# cd /opt/emc/cim/log/
# find . -name '*.log' -mtime +200 -exec rm -f {} \;
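
Since the deletion has to be repeated on every host, it may be safer to wrap it in a small function that lists the matching files first and removes them only when asked to. This is a sketch under assumptions: `prune_old_logs` and its `delete` argument are hypothetical names, not something shipped with ESXi or the EMC provider.

```shell
# Remove *.log files older than a number of days, with a dry run by default.
# Usage: prune_old_logs DIR DAYS [delete]
prune_old_logs() {
    dir="$1"
    days="$2"
    mode="${3:-dry-run}"   # anything other than "delete" only lists files
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    if [ "$mode" = "delete" ]; then
        find "$dir" -type f -name '*.log' -mtime +"$days" -exec rm -f {} \;
    else
        find "$dir" -type f -name '*.log' -mtime +"$days" -print
    fi
}
```

Running `prune_old_logs /opt/emc/cim/log 200` prints what would be removed, and `prune_old_logs /opt/emc/cim/log 200 delete` actually removes it.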

We freed some space on root and were able to access some VM consoles again, but some VMs started to show a different error, 'Unable to connect to the MKS: Failed to connect to server fqdn.com:902':

[Screenshot: 2xa23-esx2a.png]


We found that the VMs showing this error were all located on 3 ESXi hosts.

We noticed that on the affected ESXi hosts nothing was listening on port 902, even though we already had enough free space on the root ramdisk:

# esxcli network ip connection list | grep :902

[Screenshot: no902.png]


VMs that no longer had the console access issue were located on ESXi hosts where 'busybox' was listening on port 902:

[Screenshot: yes902.png]
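
With 16 hosts to check, the listener test can be turned into a yes/no answer rather than reading grep output by eye. The sketch below is an assumption-laden helper: `has_listener` is a made-up name, and it assumes the connection list is piped in on stdin with one connection per line, a `LISTEN` state, and a local address field ending in `:<port>`.

```shell
# Succeed (exit 0) if the connection list on stdin contains a socket in
# LISTEN state on the given port, fail (exit 1) otherwise.
has_listener() {
    port="$1"
    awk -v port="$port" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++) {
                # Match an address field ending in ":<port>", e.g. 0.0.0.0:902
                if ($i ~ (":" port "$")) { found = 1 }
            }
        }
        END { exit(found ? 0 : 1) }'
}
```

A possible use on a host is `esxcli network ip connection list | has_listener 902 || echo "nothing is listening on 902"`.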

We decided to put the affected ESXi hosts into Maintenance Mode (MM) and reboot them. After the reboot, 'busybox' started listening on port 902 and the VM console issue was gone.

The main takeaway is that a full root ramdisk is an abnormal condition. We have to remember that in the *nix world everything is a file, which could explain why some hosts could not create a TCP socket for port 902 once root had filled up, even after we freed some space on the root ramdisk.

Here are all the steps in one screenshot:

[Screenshot: summary.png]



 The End.

This article is reposted from the 学海无涯 blog on 51CTO. Original link: http://blog.51cto.com/549687/1842397. To republish it, please contact the original author.

520feng2007