Fixing Insufficient resources to satisfy configured failover level for HA



This post comes from a few days of poring over manuals as well as some technical support.  This is a good one.  The error came from trying to power on a VM in our VMware cluster and we would get these errors:

“Insufficient resources to satisfy configured failover level for HA”

VMWARE: insufficient resources to satisfy configured failover level for HA

And this alert on our cluster

“Insufficient resources to satisfy HA failover level on cluster vmCluster in vmTST”

Our first thought was that we had to power one VM off in order to power another one on.

But that didn’t work.

Here is the actual solution. (p.s. Great VMware HA education for me on this one!)

PROBLEM SOURCE: VMware HA is turned on and you are violating constraints


VMware HA is turned on, and you have it configured so that there is a certain amount of resource reserve for failover.  By turning on this VM, you are going to dip into that resource reserve and so VMware is telling you “Nope, not turning it on….”

There is a quick fix to get the VM turned on (one good way, one bad way), and then there are two long term fixes for you to consider.  In my case, the first one was faster, while the second one was better for my environment.

My VMware environment

Datacenter: vmTST
Cluster: vmCluster
OS: ESXi 4.1.0
Five (5) servers in a cluster.

VMware environment

My VMware Cluster Errors

As mentioned above:

“Insufficient resources to satisfy configured failover level for HA”

and

“Insufficient resources to satisfy HA failover level on cluster vmCluster in vmTST”

TWO WAYS TO DO QUICK FIX


  1. Turning off HA (popular, and I would say WRONG)

  2. Disable Admission Control (much better!!)

#1: Turning off HA (though I recommend against)

This is the solution I saw on some forums (including the VMware forums).  After looking at it more, I recommend against it and I’ll explain why, but here it is:

vSphere Client: Browse Inventory -> Hosts and Clusters

Browse vmware inventory hosts and clusters

Edit VMware cluster settings

Right Click on Cluster name -> Edit Settings

VMware cluster: edit settings

Turning off HA

While this works, your VMs are not protected by HA while it is off, and whenever you turn it back on, HA has to recalculate the failover capacity.  Bad, especially for testing or doing temporary power ons.

WRONG WAY: do not turn off VMware HA

#2: Disable “Admission Control” (better IMO)

Better to disable “Admission Control” so VMs will power on despite violating availability constraints.  This way your HA is still on. In the long run, though, it is better to fix your issue.

Same window, but next bullet item on the left:

VMware: better to disable admission control

LONG TERM FIX: TWO WAYS

There are two things I ended up looking at.  The first is a pretty good long term fix that I found suggested on forums, including the VMware forums.

The second is the actual fix to my problem, and the best one in the long term.


FIX #1: Change from “Host Failures Cluster Tolerates” to “Percentage of cluster resources reserved as failover spare capacity”

In other words, instead of telling VMware you want to have enough resource reserve so that you can lose one host, you are telling VMware you want to have a certain percentage of resources unused for failover.

We had it configured to lose one host.  So by switching to a percentage it was a quick and easy fix for my environment.

VMware HA: Host failures cluster tolerates (?)

VMware HA: Host Failures Cluster Tolerates

So if we look at the “VMware HA” window, you’ll see that my “Host failures cluster tolerates” was set to 1.  Now with 5 servers you would think that means “20%” but that’s not so. Suppose one (or more) of your VMs, for whatever reason, took up 75% of your resources: the worst case calculation assumes every VM is that large, so you could only have one VM on your five node cluster.

A worst case calculation based on your largest VM determines what’s called the “slot” size. VMware HA then calculates how many total “slots” the cluster can hold, which determines how many VMs you can have powered on.

When this option is chosen, from what I’ve read on VMware forums, the calculations are VERY conservative.
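To make the slot idea concrete, here is a minimal sketch of how a slot count could be derived from a slot size. It is a simplified model, not VMware’s actual implementation: it assumes each host holds as many slots as its scarcer resource allows, and the host names and capacities are made up so that the numbers line up with the 55 slots reported later in this post.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_mhz: int    # CPU capacity available to VMs (hypothetical value)
    memory_mb: int  # memory capacity available to VMs (hypothetical value)


def slots_on_host(host, slot_cpu_mhz, slot_mem_mb):
    """A host holds as many whole slots as its scarcer resource allows."""
    return min(host.cpu_mhz // slot_cpu_mhz, host.memory_mb // slot_mem_mb)


def total_slots(hosts, slot_cpu_mhz, slot_mem_mb):
    """Sum the per-host slot counts across the cluster."""
    return sum(slots_on_host(h, slot_cpu_mhz, slot_mem_mb) for h in hosts)


# Five identical hosts with made-up capacities; slot size taken from this post.
hosts = [Host(f"esx{i}", cpu_mhz=27600, memory_mb=65536) for i in range(1, 6)]
print(total_slots(hosts, slot_cpu_mhz=2507, slot_mem_mb=4256))  # 55
```

With these made-up hosts, CPU is the limiting resource (11 slots per host), even though memory alone would allow 15.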

Find Your Slot Size: VMware Cluster Summary -> Advanced Runtime Info

VMware cluster summary: Advanced Runtime Info

VMware Advanced Runtime Info: Slot sizes

VMware HA Advanced Runtime Info

As you can see above, in the worst case scenario one slot is 2507 MHz and 4256 MB.  With that in mind, there are 55 slots available on my five node cluster, but a total of 156 VMs are powered on.

This means I would have to power off 102 VMs to get to 54 powered on VMs leaving one slot open to power the new one on… (YIKES!)
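The arithmetic behind that number can be sketched in a few lines. This assumes, as the post does, that every powered on VM occupies exactly one slot and that all 55 slots are usable; the function name is made up for illustration.

```python
def vms_to_power_off(total_slots, powered_on_vms, vms_to_start=1):
    """How many VMs must be stopped so that `vms_to_start` more fit in the slots."""
    shortfall = powered_on_vms + vms_to_start - total_slots
    return max(0, shortfall)


print(vms_to_power_off(total_slots=55, powered_on_vms=156))  # 102
```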

Changing To Percentage: First Check Resource Usage

Out of curiosity, I checked the actual resource usage in my cluster

VMware cluster: Hosts

Tallying up all the green bars under CPU, I could fit the CPU usage of every VM on one host.

Tallying up all the green bars under Memory, I could fit all the memory usage in about three hosts.

So why can’t I power on a VM?  Because the calculation is *THAT CONSERVATIVE* for the “Host failures cluster tolerates” option.

VMware HA: Switch to percentage

VMware HA: Percentage of cluster resources reserved as failover spare capacity

Now, the first time I did this, I chose “20%” which prorated to one server out of the five being free.

And I was able to power on a VM

On a whim, I kept upping the percentage and I got as high as 75% before I decided to stop, thinking I was doing something wrong.

Part of it was that the VM I was powering on was very small in resource usage (and, as I found out later, it also had no reservation configured), which is probably why it powered on even at 75% failover spare capacity.

Anyhow, in a pinch, this is one way to keep some amount of reserve AND be able to power on your VMs, at least if your resource usage somewhat mirrors mine (see previous picture).
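Here is a rough sketch of why a tiny VM can still power on with a high percentage configured. It assumes the percentage policy compares unreserved capacity against the configured percentage, for CPU and memory separately; the real check has more detail (per-VM defaults and memory overhead, for example), and every number below is hypothetical.

```python
def stays_above_reserve(total, reserved, vm_reservation, failover_pct):
    """True if spare capacity after the power-on is still >= failover_pct."""
    spare_pct = (total - reserved - vm_reservation) / total * 100
    return spare_pct >= failover_pct


def admit(cluster, vm, failover_pct):
    """Allow the power-on only if both CPU and memory keep enough spare capacity."""
    return all(
        stays_above_reserve(
            cluster[res]["total"], cluster[res]["reserved"], vm[res], failover_pct
        )
        for res in ("cpu_mhz", "mem_mb")
    )


# Hypothetical cluster totals and reservations already held by powered on VMs.
cluster = {
    "cpu_mhz": {"total": 100_000, "reserved": 20_000},
    "mem_mb": {"total": 300_000, "reserved": 60_000},
}
small_vm = {"cpu_mhz": 256, "mem_mb": 100}     # almost nothing reserved
big_vm = {"cpu_mhz": 8_000, "mem_mb": 32_000}  # large reservation

print(admit(cluster, small_vm, failover_pct=75))  # True
print(admit(cluster, big_vm, failover_pct=75))    # False
```

In this model the small VM barely moves the spare percentage, so it is admitted even at 75%, while the VM with the large reservation is refused.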

FIX #2: Best Long Term Fix: Determine WHY the cluster resource reserve is so high and see if it is actually needed, or if it is just poorly configured

In the end this was the actual fix for us, because it delved into the actual source of the problem, which was to find out:

WHY the heck was our VM slot size so BIG?

Because obviously all five hosts combined were using VERY LITTLE CPU and RAM.  Less than 20% on CPU (it could fit all on one server), and less than 50% on RAM (it could fit on two to three servers).

It turns out:  The slot size is not based on usage, it is based on a VM resource reservation.
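A minimal sketch of that idea, under stated assumptions: the slot size is taken as the largest CPU reservation and the largest memory reservation among powered on VMs, with a 256 MHz default when no VM reserves more CPU and a flat memory overhead parameter. The default value, the overhead handling, and the VM records are all assumptions for illustration, not values read from this cluster.

```python
def slot_size(vms, default_cpu_mhz=256, mem_overhead_mb=0):
    """Return (cpu_mhz, mem_mb) for one slot, driven by the largest reservations.

    Only powered on VMs count. default_cpu_mhz and mem_overhead_mb are
    assumed placeholders, not values taken from a real cluster.
    """
    powered_on = [vm for vm in vms if vm["powered_on"]]
    cpu = max([default_cpu_mhz] + [vm["cpu_res_mhz"] for vm in powered_on])
    mem = max([0] + [vm["mem_res_mb"] for vm in powered_on]) + mem_overhead_mb
    return cpu, mem


# Hypothetical VMs: one reservation is enough to inflate every slot.
vms = [
    {"name": "web01", "powered_on": True, "cpu_res_mhz": 0, "mem_res_mb": 0},
    {"name": "db01", "powered_on": True, "cpu_res_mhz": 2500, "mem_res_mb": 4096},
    {"name": "old01", "powered_on": False, "cpu_res_mhz": 4000, "mem_res_mb": 8192},
]
print(slot_size(vms))  # (2500, 4096)

vms[1]["cpu_res_mhz"] = 0
vms[1]["mem_res_mb"] = 0
print(slot_size(vms))  # (256, 0)
```

A single VM with a large reservation sets the slot size for the whole cluster, no matter how little the other VMs actually use.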

So here is how to check the resource reservations for your VMs.

VMware Cluster: Resource Allocation for CPU and Memory

CPU

VMware cluster: resource allocation cpu

(The dashed lines are my VM names which I blanked out)

Click on the “CPU” button and look for the “Reservation” column and sort by largest to smallest.

Memory

VMware cluster: resource allocation memory

(The dashed lines are my VM names which I blanked out)

Click on the “Memory” button and look for the “Reservation” column and sort by largest to smallest.

As you can see, there are many VMs with resource reservations.  This means as soon as the VM is powered on, it will reserve this much resource REGARDLESS OF WHETHER IT IS NEEDED OR NOT!

But as you can see by actual usage, we are not even near capacity, so there is no real reason for us to reserve that much.

One of the culprits: it turns out many of the templates we use to clone/deploy VMs had resource reservations already set, so each time we made a new VM it had a resource reservation.

VMware Cluster: Virtual Machines Actual Usage

Go to the “Virtual Machines” tab now and you can see actual usage.  There is a column “Host CPU – MHz” and a column “Guest Mem – %”.  These show actual usage by the VM.

VMware cluster: Virtual Machines List

I sorted alphabetically here and referenced the previous two pictures (VMs with the highest reservations) and then checked this list to see actual usage. Sure enough, many of our VMs were not using that much resource (as you can tell from earlier graphs)
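If you copy the reservation and usage columns out of the client, the same comparison can be sketched in a few lines. The record layout, the VM names, and the 2x threshold are all hypothetical, and memory usage is assumed to have been converted to MB (the client shows it as a percentage).

```python
def over_reserved(vms, ratio=2.0):
    """Return VMs whose CPU or memory reservation exceeds `ratio` times usage,
    sorted with the largest reservations first."""
    flagged = [
        vm for vm in vms
        if vm["cpu_res_mhz"] > ratio * vm["cpu_used_mhz"]
        or vm["mem_res_mb"] > ratio * vm["mem_used_mb"]
    ]
    return sorted(
        flagged,
        key=lambda vm: (vm["cpu_res_mhz"], vm["mem_res_mb"]),
        reverse=True,
    )


# Hypothetical inventory copied from the Resource Allocation and VM tabs.
inventory = [
    {"name": "app01", "cpu_res_mhz": 2500, "cpu_used_mhz": 300,
     "mem_res_mb": 4096, "mem_used_mb": 1024},
    {"name": "db01", "cpu_res_mhz": 1000, "cpu_used_mhz": 900,
     "mem_res_mb": 2048, "mem_used_mb": 1800},
    {"name": "web01", "cpu_res_mhz": 0, "cpu_used_mhz": 150,
     "mem_res_mb": 0, "mem_used_mb": 512},
    {"name": "clone07", "cpu_res_mhz": 2000, "cpu_used_mhz": 50,
     "mem_res_mb": 4096, "mem_used_mb": 600},
]
for vm in over_reserved(inventory):
    print(vm["name"], vm["cpu_res_mhz"], vm["mem_res_mb"])
```

The VMs it prints are only candidates; whether a reservation can really be lowered is still a question for the VM owner.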

Next step: contact the VM owners to confirm that what you see is the VM’s typical usage.  If so, get permission to turn the resource reservation down or even off.

VMware: Right Click -> Edit Settings

To configure resource reserve, right click on the VM and Edit Settings

VMware cluster: Right Click and Edit Settings

VMware: CPU reservation and Memory reservation

Here I turned the CPU Resource reservation and memory reservation low or to zero

VMware Edit CPU Resource Reservation Settings

VMware edit settings memory resource reservation

REMEMBER TO CONSULT YOUR USER FIRST TO SEE IF VM IS IN TYPICAL USE

VMware HA: Advanced Runtime Info Results

Now go back to your Advanced Runtime Info results… (you might have to switch VMware HA back to “Host failures cluster tolerates” if you had changed it to the percentage as an intermediate fix)

VMware HA: Advanced Runtime Info

When all was said and done, I went from 55 slots to 550 slots.

And from being in the “red” by 101 VM slots to being in the “green” with 394 VM slots available.

CPU slot size went down by a factor of 10
Memory slot size went down by a factor of 20

NICE!!!

Hope this has been helpful!

This article is reposted from the 学海无涯 blog on 51CTO. Original link: http://blog.51cto.com/549687/1953598. To republish it, please contact the original author.

