Demystifying Kubernetes CPU Limits (and Throttling)

Recently, I've been investigating high CPU utilization during routine security scans of our WordPress websites, which causes slow responses, increased errors, and other undesirable outcomes. This is typically limited to a single pod (the one the scanner randomly gets routed to) but can still be user-visible (and PagerDuty-activating 😅), so we want better monitoring for it.

Initial Investigation

Like anyone else in IT investigating something they're not sure of, I turned first to Google. I sought out what other people are doing to monitor CPU usage of pods in Kubernetes. This is what first led me to discover that it's actually far more useful to monitor how much the CPU is being throttled rather than how much it's being used.

I already knew of the kubernetes-mixin project, which provides sane default Prometheus alerting rules for monitoring Kubernetes cluster health, so I looked there first to see what rules they use to monitor CPU. Currently, the only CPU usage alert bundled in is "CPUThrottlingHigh", which calculates number_of_periods_pod_gets_throttled / number_of_periods_total (not actual metric names) to give you a percentage of how frequently your pod is getting its CPU throttled.
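As a rough sketch of that ratio: the alert compares how many CFS periods the container was throttled in against how many periods elapsed in total. As far as I can tell, the cAdvisor counters behind it are container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total. The function name and the sample numbers below are made up for illustration.

```python
def throttled_ratio(throttled_periods: float, total_periods: float) -> float:
    """Fraction of CFS periods in which the container was throttled."""
    if total_periods == 0:
        return 0.0  # no periods elapsed, so nothing could have been throttled
    return throttled_periods / total_periods


# Hypothetical increase of both counters over the same time window.
print(f"{throttled_ratio(150, 600):.0%}")  # 25%
```

Note that this ratio only says how often throttling happened, not how long the pod sat idle each time.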

But wait, what does throttled even mean? Throttled (at least in my mind) means something along the lines of just getting slowed down, but in this case throttled means completely stopped – you cannot use any more CPU until the next CFS period (every 100ms in Kubernetes, which is also the Linux default - more on this later).

While this seems pretty cut and dried in the abstract, it gets more confusing when you're actually looking at it in practice on production servers with tons of CPU cores.

Conceptualizing

For the purposes of this article, I'll be referring to a server with 128 CPU cores running a pod with a CPU limit of 4.0.

If you are not already familiar with the concept of millicores, suffice it to say that 1 millicore = 1/1000th of a CPU's time (1000 millicores = 1 whole core). This is the unit used to define CPU requests/limits in Kubernetes. Our example pod has a limit of 4.0, which is 4,000 millicores, or 4 whole cores' worth of work capability.
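To make the unit concrete, here is a minimal sketch of the conversion. The helper name to_millicores is hypothetical, and it only handles the two forms you usually see in a pod spec: a plain number of cores ("4", "4.0") and a millicore value with an "m" suffix ("500m").

```python
def to_millicores(cpu: str) -> int:
    """Convert a CPU quantity such as "4", "4.0", or "500m" to millicores."""
    cpu = cpu.strip()
    if cpu.endswith("m"):
        # Already expressed in millicores, e.g. "500m".
        return int(cpu[:-1])
    # Expressed in whole or fractional cores, e.g. "4.0".
    return round(float(cpu) * 1000)


print(to_millicores("4.0"))   # 4000
print(to_millicores("500m"))  # 500
```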

But how does the operating system kernel even enforce this measure? If you're familiar with how Linux containers work, you have probably heard of cgroups. Cgroups, put simply, are a way to group processes so that the kernel can account for and limit the resources they use. (The isolation side of containers, such as a Docker container thinking that its ENTRYPOINT + CMD is PID 1, comes from a related kernel feature, namespaces.)

Among other things, cgroups rely on the Linux CFS (Completely Fair Scheduler) to set and enforce CPU limits on groups of processes, e.g. our pods in Kubernetes. The limit is expressed as a quota and a period. A quota is how much CPU time you can use during a given period. Once you use up your quota, you are "throttled" until the next period, when you can begin using CPU again.
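The kernel keeps score of this in the cgroup's cpu.stat file. Here is a minimal sketch of reading it, assuming the cgroup v1 field names (nr_periods, nr_throttled, and throttled_time in nanoseconds); cgroup v2 reports throttled_usec instead. The sample contents are invented, and parse_cpu_stat is a hypothetical helper.

```python
# Invented sample of a cgroup v1 cpu.stat file.
SAMPLE_CPU_STAT = """\
nr_periods 600
nr_throttled 150
throttled_time 45000000000
"""


def parse_cpu_stat(text: str) -> dict:
    """Parse "key value" lines from cpu.stat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split()
        stats[key] = int(value)
    return stats


stats = parse_cpu_stat(SAMPLE_CPU_STAT)
print(stats["nr_throttled"] / stats["nr_periods"])  # 0.25 of periods throttled
print(stats["throttled_time"] / 1e9)                # 45.0 seconds throttled
```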

Going back to our discussion on millicores, this means that in every 100ms cfs_period in the operating system, we get 400ms of usage allowed. The reason we get 400ms in a 100ms time frame is that each core is capable of doing 100ms of work in a 100ms period: 100ms x 4 cores = 400ms. This 400ms of work can be broken up in any way. It could translate to 4 vCPUs each doing 100ms of work in a 100ms cfs_period, 8 vCPUs each doing 50ms of work, etc. Remember: CPU limits are based on time, not actual vCPUs.
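The arithmetic above can be sketched in a few lines. This assumes the 100ms period and expresses the quota in microseconds, the same unit as cpu.cfs_quota_us in cgroup v1; cfs_quota_us here is just a hypothetical helper name.

```python
CFS_PERIOD_US = 100_000  # 100ms period, in microseconds


def cfs_quota_us(cpu_limit: float, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (in microseconds) a pod may use per period for a given limit."""
    return round(cpu_limit * period_us)


quota = cfs_quota_us(4.0)
print(quota)  # 400000 microseconds, i.e. 400ms

# The same quota split evenly across different numbers of busy cores.
for busy_cores in (4, 8, 16):
    per_core_ms = quota / busy_cores / 1000
    print(busy_cores, per_core_ms)  # 4 100.0, then 8 50.0, then 16 25.0
```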

Understanding that, the reason for the throttling confusion starts to come into focus. So far as I can comprehend, the theoretical upper bound of throttling per period is n * 100ms - limit, where n is the number of vCPUs and limit is how many milliseconds of CPU time you are allotted in a 100ms window (calculated earlier as cpuLimit * 100ms). On my 128-core machine, that is 128 * 100ms - 400ms = 12,400ms per period. With 10 periods per second, the theoretical upper bound is 124 seconds of throttling per second.

Note: the actual CPU throttling is determined by how many processes you're running and which core(s) they're assigned to by the OS scheduler.
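To sanity-check that arithmetic, here is a minimal sketch of the simplified model described above. It assumes every busy thread runs flat out on its own core, so the threads burn through the quota together and then all sit throttled for the rest of the period. The function name is hypothetical, and the real scheduler is certainly messier than this.

```python
def throttled_seconds_per_second(busy_threads: int, cpu_limit: float,
                                 period_ms: float = 100.0) -> float:
    """Throttled time accumulated per second of wall-clock time (simplified model)."""
    quota_ms = cpu_limit * period_ms
    # Each busy thread could run for the whole period; anything beyond the
    # shared quota is time spent throttled, summed across all threads.
    throttled_ms_per_period = max(0.0, busy_threads * period_ms - quota_ms)
    periods_per_second = 1000.0 / period_ms
    return throttled_ms_per_period * periods_per_second / 1000.0


print(throttled_seconds_per_second(4, 4.0))    # 0.0, the quota is never exceeded
print(throttled_seconds_per_second(8, 4.0))    # 4.0
print(throttled_seconds_per_second(128, 4.0))  # 124.0, the upper bound on 128 cores
```

Under this model the throttled time grows with the number of busy threads rather than with the limit, which is why it can exceed one second per second.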

Putting it together

Now things finally started to click in my brain. At least... as much as they could, considering I'm still somewhat ignorant of all the nitty-gritty details of the Linux scheduler itself.

This whole investigation was kicked off by the fact that when I went to use a rate() function on the container_cpu_cfs_throttled_seconds_total metric in Prometheus, the per-second rate of throttling was significantly higher than 1s (think closer to 70s per second). How can a pod be throttled for more than 1 second in a 1-second window? I wondered.
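For intuition, a per-second rate over a counter boils down to "how much did it grow, divided by how long that took". The sketch below is a deliberately simplified stand-in for what rate() does; the real function also handles counter resets and extrapolates to the window edges. The samples are invented and simple_rate is a hypothetical name.

```python
def simple_rate(samples: list) -> float:
    """Per-second increase between the first and last (timestamp, value) samples."""
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


# Invented (timestamp_seconds, throttled_seconds_total) samples over one minute.
samples = [(0.0, 1000.0), (30.0, 3100.0), (60.0, 5200.0)]
print(simple_rate(samples))  # 70.0 seconds of throttling per second
```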

Putting all this information together, I now know that the reason for such high throttling was that httpd was spawning additional processes on additional CPU cores. Each of those processes accumulates its own throttled time once the shared quota runs out, so the total reported throttling can climb far above one second per second.

Conclusion

With my brain sufficiently exploded, I can now say that we have sufficient monitoring in place to alert us to high CPU usage based on the amount of time the CPU is being throttled. This is the alert we have in place now:

    - alert: Wordpress_High_CPU_Throttling
      expr: rate(container_cpu_cfs_throttled_seconds_total{namespace=~"wordpress-.*"}[1m]) > 1
      for: 30m
      labels:
        severity: warning
      annotations:
        message: The {{ $labels.pod_name }} pod in {{ $labels.namespace }} is experiencing a high amount of CPU throttling as a result of its CPU limit.

In summary:

- Kubernetes uses a cfs_period_us of 100ms (the Linux default).
- A CPU limit of 1.0 in k8s represents 100ms of CPU time per cfs_period.
  - Theoretically this is 100% of one CPU's time, but not practically, since pods usually run multiple processes on multiple cores.
- The upper bound on how many seconds of throttling the kernel reports for a pod is determined by the number of CPU cores that the pod is using.
  - The number of CPU cores you use is not directly related to your CPU limit. It correlates more strongly with the number of processes your pod runs.

Please feel free to reach out if I got anything wrong, or if you have any questions. I'm available on Twitter @wbhegedus

Works Cited

These resources were useful to me in my quest for knowledge.

CFS Bandwidth Control - Kernel.org
Unthrottled: Fixing CPU Limits in the Cloud - Indeed Engineering
CFS quotas can lead to unnecessary throttling - Kubernetes GitHub Issue #67577
CPU bandwidth control for CFS - Academic paper on Linux CFS from Turner, Rao, and Rao
cAdvisor GitHub
Kubernetes Monitoring Mixin CPU Alert
CPU limits and aggressive throttling in Kubernetes - Omio Engineering