How to manage, optimize, and operate a Kubernetes cluster hosted in a Cloud environment, from a FinOps perspective

Claranet
9 min read · Jun 9, 2021

To quickly summarize, a FinOps journey has three phases:

  1. Inform (know where you spend your $$$, i.e. visibility/cost allocation)
  2. Optimize (cost optimization using rate discounts, scale-down strategies, …)
  3. Operate (continuous improvement and monitoring of your FinOps strategy; automation is also part of this phase)

For each phase, we will try to describe the key concepts and the best approach, depending on the Cloud provider (mainly Amazon Web Services and Microsoft Azure) and the Cluster environment (production/non-production).

Inform

In the Kubernetes world, and especially with managed services, cost allocation can be a real nightmare: to get good accuracy, you need to identify and consider multiple resources, both at the Kubernetes level (pods, for example) and at the Cloud hosting level, where you need to take into account costs like Load Balancers/Nodes/Storage/Network/…

Cost allocation is the process of splitting your bill and associating each part with a Cost Center (an application, a service, a team, …) as accurately as possible. By doing so, you gain Cost Visibility. In a Cloud environment, this is not easy: some resources are shared between many Cost Centers, and you also need to handle discounts like reservations, network bandwidth, and many more items.

To achieve this, Tags are used to identify resource owners (e.g. application1), and an additional strategy must be established to allocate untagged/untaggable items.

Moreover, Cloud providers charge at the host level (with the exception of AWS EKS running in full Fargate mode and GKE Autopilot), not at the container level, so you need to allocate a given share of the host resources (vCPU/Memory/…) to each container and then aggregate it following your tagging strategy. You also need to deal with “shared costs” like the Kubernetes master cost and “idle costs” (the percentage of unused host resources). The standard Cloud provider cost management services like AWS Cost Explorer or Azure Cost Management cannot create this cost distribution for you.

A third-party tool like CloudHealth can be really helpful for this Cost Allocation task but comes at a high cost. Alternatively, you can use a tool like Kubecost to get a cost distribution estimate (it works with AWS and GCP).

There is no general strategy that works in all cases; you need to find the one best suited to your use case. But we can give some guidelines:

  1. You need to define a Tagging Strategy. It must be the same as the one defined at the Cloud provider level.
  2. Use Kubernetes Labels matching your defined Tagging Strategy for cost allocation (see the sketch after this list).
  3. Group containers using Kubernetes namespaces to simplify cost allocation.
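As an illustration, here is a minimal sketch of a Deployment in a per-team namespace carrying cost allocation labels. The label keys (cost-center, application, team) and all names are hypothetical: use the keys defined in your own Tagging Strategy.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-api
  namespace: team-billing      # one namespace per Cost Center simplifies allocation
  labels:
    cost-center: billing       # hypothetical keys: mirror your Cloud-level tags
    application: billing-api
    team: billing
spec:
  replicas: 2
  selector:
    matchLabels:
      application: billing-api
  template:
    metadata:
      labels:
        cost-center: billing   # pod-level labels are what cost tools aggregate on
        application: billing-api
        team: billing
    spec:
      containers:
        - name: api
          image: example/billing-api:1.0.0
          resources:
            requests:          # requests drive the per-container cost distribution
              cpu: 250m
              memory: 256Mi
```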

Optimize

The Optimize phase focuses on cost reduction processes. Some of them can be implemented regardless of the Cloud provider, while others depend on the Cloud provider's specificities.

We can group optimizations into four categories:

  1. Rightsizing: allocate the right amount of resources (vCPU/Memory) to the hosts and pods.
  2. Off-hours: turning off or scaling down resources during nights and weekends.
  3. Spot Node: use spot instances for Kubernetes Nodes.
  4. Node Reservation: Buy instance/Virtual Machine reservation (or Savings Plans for AWS).

As a quick helper, check the diagram below to find the recommended order of optimizations depending on the cluster's environment (production or non-production):

[Diagram: recommended order of optimizations depending on the cluster's environment (production or non-production)]

RightSizing

RightSizing can be done at the pod/container level and/or at the host/node level.

Note: rightsizing means not only adjusting resource capacities but also adding/removing identical resources.

Container

In the container world, there are multiple strategies to do rightsizing: Horizontal rightsizing, Vertical rightsizing, and Node rightsizing.

Horizontal rightsizing (HPA)

Like an EC2 Auto Scaling group or an Azure VMSS, Horizontal Rightsizing aims to add or remove resources (pods in this case) to follow the current workload.

The Kubernetes Horizontal Pod Autoscaler is designed for stateless applications that can spin up quickly to handle usage spikes and shut down gracefully.
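As a minimal sketch (assuming a hypothetical Deployment named web; on clusters older than Kubernetes 1.23 the API group is autoscaling/v2beta2 instead of autoscaling/v2), an HPA targeting 70% average CPU utilization looks like this:

```yaml
apiVersion: autoscaling/v2     # use autoscaling/v2beta2 on clusters < 1.23
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                  # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add/remove pods to stay around 70% of requested CPU
```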

Vertical rightsizing (VPA)

Unlike Horizontal rightsizing, which adds or removes pods, Vertical rightsizing can increase or decrease pod resources (CPU/Memory) after an evaluation period.

The Kubernetes Vertical Pod Autoscaler is especially useful to adjust the CPU and Memory requests to fit the real container consumption. Frequently, containers' requests are oversized, resulting in wasted resources.

VPA can adjust container resources automatically, but that's not recommended: use it as a recommendation engine and then adjust your Kubernetes resource definitions yourself.
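A minimal sketch of that recommendation-only setup, assuming the VPA controller is installed in the cluster and a hypothetical Deployment named web: with updateMode set to "Off", the VPA computes recommendations without ever evicting or resizing pods (read them with kubectl describe vpa web).

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web            # hypothetical Deployment to analyze
  updatePolicy:
    updateMode: "Off"    # recommendation only: pods are never touched
```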

TOOL: check the open-source kube-resource-report to get a VPA report in HTML.

Node rightsizing

Your pods consume node resources, so when you use HPA or VPA it's mandatory to be able to adjust your node capacity as well.

The Kubernetes Cluster Autoscaler can add or remove nodes to optimize the cluster resource capacity.
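One practical knob worth knowing: individual pods can opt out of autoscaler-driven eviction with the safe-to-evict annotation, which prevents the autoscaler from draining the node they run on (pod name and image below are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-job
  annotations:
    # tells the Cluster Autoscaler never to evict this pod when scaling down a node
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
    - name: job
      image: example/batch-job:1.0.0
```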

A Pod Disruption Budget is a must-have when the Cluster Autoscaler is activated, to avoid application downtime.
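A minimal PDB sketch, assuming an application labeled app: web (on clusters older than Kubernetes 1.21, the API group is policy/v1beta1):

```yaml
apiVersion: policy/v1    # policy/v1beta1 on clusters < 1.21
kind: PodDisruptionBudget
metadata:
  name: web
spec:
  minAvailable: 2        # the autoscaler may drain a node only if 2 pods stay up
  selector:
    matchLabels:
      app: web           # hypothetical label of the protected application
```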

Another important concept, regarding node rightsizing, is workload matching.

Workload matching means you should take into account the CPU-to-memory ratio of your nodes to avoid ending up with too much unused Memory or CPU.
For example, if your pods consume more Memory than CPU while your nodes offer a roughly equal balance of both, you are going to waste CPU resources and increase your infrastructure cost.
To avoid that, change the node instance type to a Memory-optimized one.

To detect a workload matching issue, compare the “CPU requested usage” and the “memory requested usage”: both should be at a similar level. For example, you have an issue if your “CPU requested usage” is at 100% whereas the “memory requested usage” is only at 50%.

Off-hour

Off-hours means stopping or reducing resource capacity during nights (and weekends). For non-production environments, it should be a standard!

Stopping a resource between 20:00 and 07:00 and during the weekend cuts its price by about 60% (it runs 13 h × 5 days = 65 h out of 168 h per week, i.e. roughly 39% of the time). Better than a reservation!

kube-downscaler is the entry point for both production and non-production clusters (it must be combined with the Cluster Autoscaler to actually save money).
This open-source project scales down Deployments/StatefulSets/HPAs matching the defined policies (time windows, exclusion conditions like namespace restrictions).
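Per the project's README, the policies are driven by annotations. A sketch with hypothetical names: this Deployment runs during working hours only and is scaled to 0 replicas outside the window.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  annotations:
    # scaled down to 0 replicas outside this window (timezone-aware);
    # use downscaler/exclude: "true" to opt a workload out entirely
    downscaler/uptime: Mon-Fri 07:00-20:00 Europe/Paris
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.21
```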

Production

For a production environment, you must ensure that, even during off-hours, your cluster and customer applications stay highly available and fault-tolerant by:

  • having at least 2 replicas of each critical application
  • having at least 2 worker nodes

It's not real off-hours; it's more of a scale-down when the cluster load decreases.
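To enforce the "2 replicas on 2 distinct worker nodes" rule, a required pod anti-affinity on the hostname topology key is a common sketch (all names are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-app
spec:
  replicas: 2                  # at least 2 replicas of each critical application
  selector:
    matchLabels:
      app: critical-app
  template:
    metadata:
      labels:
        app: critical-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: critical-app
              topologyKey: kubernetes.io/hostname   # replicas land on distinct nodes
      containers:
        - name: app
          image: example/critical-app:1.0.0
```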

Non-production

A non-production cluster should support scaling down to 0 worker nodes, but with managed Kubernetes clusters, some Cloud services have limitations (see below).

cluster-turndown is an open-source project to easily implement off-hours. Currently, the project supports GKE, EKS, and kops on AWS.
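At the time of writing, the project's README documents a TurndownSchedule custom resource along these lines (the schema may have evolved, so check the repository before using it):

```yaml
apiVersion: kubecost.k8s.io/v1alpha1
kind: TurndownSchedule
metadata:
  name: example-schedule
  finalizers:
    - finalizer.kubecost.k8s.io
spec:
  start: 2021-06-11T20:00:00Z   # first turndown (RFC 3339, UTC)
  end: 2021-06-14T07:00:00Z     # first turn-up
  repeat: weekly                # or: daily
```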

AWS EKS

You can scale down the worker nodes (managed or unmanaged node groups) to 0, but keep in mind that AWS still charges for cluster management ($0.10/hour).

AZURE AKS

Azure AKS differs from AWS: Azure doesn't charge for cluster management (except when you add the Uptime SLA option, which doesn't make sense for a non-production cluster), but you can't scale the default (system) node pool down to 0.

So… you have a constant charge, whose amount depends on the Virtual Machine size. The user (worker) node pools, however, can be scaled down to 0.

Spot Node

Spot Nodes are another good way to reduce Kubernetes cluster cost. As a general definition, spot instances/virtual machines are “normal” resources at a high discount. But this discount has a drawback: Cloud providers don't guarantee the availability of the resources. They can be terminated at any moment.
This means that for a production environment, the running applications must be fault-tolerant (in non-production it's not mandatory, but you/the customer must accept the risk of downtime until a new node is ready).
Each Cloud provider offers a way to receive a notification (via the local metadata API) some seconds/minutes before the node's termination. This helps mitigate the impact of node termination.

For successful Spot node utilization, you should consider the following Cloud provider specificities:

In December 2020, AWS announced support for Spot instances in managed node groups. With this new feature, node draining and the Spot strategy are managed by AWS.

It also offers the capacity-optimized allocation strategy and Capacity Rebalancing, two nice features for Spot Auto Scaling groups.

Azure has some limitations/conditions on the use of spot nodes:

The default (system) node pool doesn't support spot nodes (and must have at least 1 VM).

You cannot mix standard and spot nodes in the same node pool. You must create one (or more) standard node pool and one (or more) spot node pool.
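Per the Azure documentation, AKS spot node pools are tainted with kubernetes.azure.com/scalesetpriority=spot:NoSchedule and labeled with the same key, so workloads must opt in explicitly. A sketch (pod name and image are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
spec:
  tolerations:
    - key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
      effect: NoSchedule       # matches the default taint on AKS spot pools
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.azure.com/scalesetpriority
                operator: In
                values: ["spot"]   # pin this workload to spot nodes only
  containers:
    - name: worker
      image: example/batch-worker:1.0.0
```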

Production

The general guidance for using spot nodes in a production cluster is to distribute the node capacity between On-Demand/Standard and Spot, to avoid suffering a complete outage if no spot capacity is available. The right distribution mostly depends on the hosted applications.

For example, the distribution could be:

  • 25% Spot / 75% On-Demand/Standard
  • 50% Spot / 50% On-Demand/Standard
  • 75% Spot / 25% On-Demand/Standard
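On AWS, for example, such a split can be expressed with eksctl's instancesDistribution on a self-managed node group. The field names below follow eksctl's ClusterConfig schema, but treat this as a sketch under those assumptions and check the eksctl documentation:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-cluster           # hypothetical cluster
  region: eu-west-1
nodeGroups:
  - name: mixed-workers
    minSize: 2
    maxSize: 10
    instancesDistribution:
      instanceTypes: ["m5.large", "m5a.large", "m4.large"]
      onDemandBaseCapacity: 2                  # always-on On-Demand floor
      onDemandPercentageAboveBaseCapacity: 50  # 50% Spot / 50% On-Demand above the floor
      spotAllocationStrategy: capacity-optimized
```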

Non-production

For a non-production environment, your spot strategy can be much more aggressive than in production, as there is no need for absolute availability. You can even target a distribution with only spot nodes. Otherwise, like for a production environment, choose a balance between On-Demand/Standard and Spot.

Instance/VM Reservation

AWS: before reading the following section, be sure to understand the concepts of, and differences between, Reserved Instances and Savings Plans. You can check this article for a complete description.

Production

For a production cluster, and in general, you should start the reservation process as soon as possible. You can start slowly by reserving only, say, 10% of your capacity and reserve more over time.

AWS

If you use AWS Fargate (partially or fully), it's an easy choice, as only Compute Savings Plans can cover both EC2 and Fargate.

Otherwise, if you have a mix of On-Demand and Spot instances, you could choose to cover 100% of your On-Demand capacity; if you don't have Spot instances, the right coverage depends on the cluster load. In both cases, check this page for insights about how to select the right option.

AZURE

The starting point is to buy Reserved Virtual Machines for the default (system) node pool. Then, if you have a mix of Standard and Spot instances, you could choose to cover 100% of your Standard capacity; if you don't have Spot instances, the right coverage depends on the cluster load.

With the reservation cancellation feature, you can be really aggressive on the reservation coverage, as the cash-flow break-even point is reached quickly (cancellation fees are only 3% of the remaining reservation charge). Check the Azure documentation for more information.

Non-production

AWS

If you don't run your cluster with only Spot nodes and don't scale your node pools down to 0 during nights/weekends, you could reserve the number of minimum running nodes. Check this page for insights about how to select the right option.

AZURE

As you cannot scale the default (system) node pool down to 0, you can buy Reserved Virtual Machines to cover it.

For the worker node pools, if you scale down to 0 during nights/weekends, you can't buy reservations. Otherwise, you could reserve the number of minimum running nodes.

As for production, the reservation cancellation feature lets you be aggressive on the reservation coverage (cancellation fees are only 3% of the remaining reservation charge). Check the Azure documentation for more information.

Specificities of AWS Fargate

Mandatory (with Fargate Spot):

  • interruptible workloads only

Limitations:

  • StatefulSets not supported (use EFS for persistent volumes)
  • not available in all AWS regions
  • only supports ALB as the Load Balancer type
  • max 4 vCPU and 30 GB of memory per pod

Options:

  • Fargate
  • Fargate Spot

Pricing model:

Per vCPU per hour and per GB of memory per hour.

Fargate is a good candidate for:

  • very small clusters
  • quick tests/PoCs

Operate

The Operate phase helps to continuously improve your FinOps strategy. Checks to perform include:

  • Global cost evolution
  • Cost evolution by dimension (e.g. workload)
  • Empower and incentivize developers to track their Kubernetes utilization
  • Use the latest Cloud provider features (e.g. new instance/VM generation availability)
  • Check whether new optimizations can be applied

This is a non-exhaustive list of things to do/check during the Operate phase, which MUST BE performed on production AND non-production environments.

Resources

Below is a list of useful resources about Kubernetes from a FinOps perspective:

  1. Google Cloud Best Practices for running cost-effective Kubernetes
  2. Whitepaper FinOps for Kubernetes from the FinOps Foundation
