Worry If the CPU-Based Policy Does Not Cut It for You

Imagine you are migrating a legacy system from the data center to the cloud to make it scalable. If containerization is not an option, you will probably try cloud-native autoscaling. One of the most commonly used metrics for autoscaling is CPU utilization. If your application does not scale well on this metric, the cause is quite likely a more serious underlying issue. Let's see why.

When an instance uses its resources properly, heavy load usually pushes CPU utilization to 70% or more. That is why the usual recommendation is to scale the system out when CPU utilization reaches 50%, so that new capacity is available before the existing instances saturate. Here is how resource utilization typically behaves under heavy load, depending on the application type.

CPU-critical applications: a higher workload should obviously result in a higher CPU utilization percentage, as CPU is the main resource they use;
Memory-critical applications: if the application uses automatic garbage collection (for example, a JVM-based one), then once it runs short of memory it will run garbage collection more often, which increases CPU consumption;
Multithreaded applications: the more threads are started, the more CPU is needed to schedule and run them.

So if an instance is failing health checks without its CPU consumption reaching 70%, the cause may be an application or configuration restriction that prevents it from using all of its resources effectively.
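The rule of thumb above can be written down as a small check. This is only a sketch: the function name looks_restricted and the way the inputs are collected are assumptions for illustration, and the 70% figure is the one used in this article, not a universal constant.

```python
# The "70%" figure comes from the article; tune it for your own workload.
EXPECTED_BUSY_CPU_PERCENT = 70.0


def looks_restricted(health_check_failures: int,
                     peak_cpu_percent: float,
                     expected_busy_cpu: float = EXPECTED_BUSY_CPU_PERCENT) -> bool:
    """Flag an instance that fails health checks while its CPU stays low.

    Hypothetical helper: returns True when the instance has failed at least
    one health check but its peak CPU never reached the level a properly
    loaded instance is expected to reach.
    """
    return health_check_failures > 0 and peak_cpu_percent < expected_busy_cpu


# An instance that failed 3 checks while peaking at 25% CPU deserves a closer look.
print(looks_restricted(health_check_failures=3, peak_cpu_percent=25.0))
# An instance that failed checks at 92% CPU is simply overloaded.
print(looks_restricted(health_check_failures=3, peak_cpu_percent=92.0))
```

A True result does not prove a misconfiguration. It only marks the instance as one where CPU-based scaling is unlikely to react in time.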

A common example is a Redis instance with several CPU cores, say 4, that has performance problems while its CPU usage sits at only 25%. This happens because Redis executes commands on a single thread and can therefore saturate only one core, while the CPU utilization metric is averaged over all cores: (100% + 0% + 0% + 0%) / 4 = 25%.
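The averaging effect is easy to reproduce. The sketch below takes a list of per-core utilization values (the sample numbers are the ones from the Redis example, and the function name summarize_cpu is made up for illustration) and shows how the average hides a saturated core.

```python
def summarize_cpu(per_core_percent):
    """Return (average across cores, busiest single core), both in percent."""
    average = sum(per_core_percent) / len(per_core_percent)
    busiest = max(per_core_percent)
    return average, busiest


# One core fully busy, three idle, as in the 4-core Redis example.
average, busiest = summarize_cpu([100.0, 0.0, 0.0, 0.0])
print(f"average={average:.0f}% busiest core={busiest:.0f}%")
# prints: average=25% busiest core=100%
```

On a live host, per-core samples could come from a tool that reports each core separately, for example the third-party psutil package via psutil.cpu_percent(percpu=True), assuming it is installed. Looking at the busiest core rather than the average is what reveals the single-threaded bottleneck.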

Another case is a server configuration that restricts the number of threads. If at some point the application receives more requests than it has allowed threads, it cannot respond to the load balancer (LB) health check in time and will be removed from service.
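A back-of-the-envelope model shows why a thread limit can fail health checks regardless of CPU. It is a deliberate simplification with assumed inputs: every request takes the same time, all in-flight requests arrived together, requests are served first come first served, and the health-check handler itself is instantaneous. All names and numbers are hypothetical.

```python
def health_check_wait(in_flight_requests: int,
                      max_threads: int,
                      service_time_s: float) -> float:
    """Estimate how long a newly arrived health check waits for a free thread.

    Workers drain the queue in rounds of max_threads requests, so the health
    check starts only after every full round ahead of it has finished.
    """
    full_rounds_ahead = in_flight_requests // max_threads
    return full_rounds_ahead * service_time_s


def fails_health_check(in_flight_requests: int,
                       max_threads: int,
                       service_time_s: float,
                       timeout_s: float) -> bool:
    """True when the estimated wait exceeds the health-check timeout."""
    wait = health_check_wait(in_flight_requests, max_threads, service_time_s)
    return wait > timeout_s


# Hypothetical numbers: 50 worker threads, 2 s per request, 5 s LB timeout.
print(health_check_wait(200, 50, 2.0))        # 200 requests queued ahead
print(fails_health_check(200, 50, 2.0, 5.0))  # burst larger than the pool
print(fails_health_check(40, 50, 2.0, 5.0))   # load within the thread limit
```

Nothing in this model depends on CPU. If the worker threads spend most of their time waiting on I/O, the instance can fail its health checks while CPU utilization stays far below the scaling threshold.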

So whenever an instance is observed dying before it reaches significant CPU utilization, it is worth checking whether one of the situations described above applies, and then deciding whether to improve the application configuration or to change the instance type so that instances are used more effectively.

Sources & More details: AWS documentation