Kubecost: Monitor Kubernetes Costs with kubectl

You already know you can take full control of all resources in a Kubernetes cluster using the kubectl client. Now a new open source kubectl plugin lets kubectl monitor costs too. Working in tandem with the open source Kubecost application, the cost plugin allows every engineering team to quickly determine the cost and efficiency of any Kubernetes workload.
Modern cloud infrastructure is increasingly complex, challenging teams beyond just operations and engineering. Financial controllers are under extreme pressure to allocate costs in order to monitor and improve financial performance of teams. They turn to engineering teams for answers. The collaboration between finance, operations and engineering enhances visibility of modern cloud workflows. As technology evolves, so does corporate culture. The cost plugin for kubectl is an answer to the challenges of modern enterprise infrastructure.
Shining Light on Infrastructure Costs
Kubernetes clusters are often shared across teams, microservices, applications, and even departments, making infrastructure simpler to manage. With a shared platform, teams often use labels and/or namespaces to organize deployments. A Kubernetes namespace is a logical separation inside a Kubernetes cluster which could be assigned to a particular team, application, or even a business unit.
Most organizations map a namespace to a specific workload type or purpose. For example, the fictitious e-commerce company VeryCoolStore runs a cluster with one namespace for monitoring and one namespace for logging for use by their DevOps teams who maintain the cluster. The customer-facing web frontend application, the search and the product suggestion applications are hosted in that same cluster in different namespaces.
Creating these logical divisions inside a cluster is convenient but doesn’t solve all problems. First, it still doesn’t allow accurate measurement of resource usage and allocation of costs to each tenant based on detailed billing data. More importantly, it doesn’t expose inefficiencies or wasted resources.
Waste is a huge problem whose effects pile up all the way to the final consumer, often with large impact on the price of units sold. No wonder that reducing waste is a corporate mandate for many managers.
To discover waste and improve efficiency, you need the appropriate tools. A Kubecost user leading an SRE team was constantly seeing a service fail. They noticed that the service was being evicted from the cluster for lack of resources. With Kubecost reports and kubectl cost they discovered the root cause: an application was requesting 30 pods that went largely unused.
The resources allocated to this overprovisioned application often forced the scheduler to evict other applications, making them fail. Reviewing cost and efficiency reports showed that those 30 pods were unnecessary. Once the application was reconfigured to request one pod instead of 30, things improved immediately, and the engineering team was happy to discover that one pod was indeed enough.
How to Use Kubectl Cost
The cost plugin can be installed in minutes. If you use Krew, just type the following:
kubectl krew install cost
Alternatively, check the installation instructions on GitHub for different options.
There are a number of supported subcommands, including the following:
- namespace
- deployment
- controller
- label
- pod
- tui (to be covered in a future post!)
Each subcommand by default displays the projected monthly cost based on the activity during the window. There is also a non-rate display mode (--historical) that shows the total cost for the duration of the window.
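To make the difference between the two display modes concrete, here is a minimal Python sketch. It is not the plugin's implementation: the `projected_monthly_rate` helper, the 730-hour month, and the sample figures are all assumptions chosen for illustration.

```python
# Assumption: an average month of 730 hours is used for the projection.
# The plugin may use a different constant internally.
HOURS_PER_MONTH = 730.0


def projected_monthly_rate(window_cost: float, window_hours: float) -> float:
    """Scale the cost observed during a window up to a full month."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return window_cost / window_hours * HOURS_PER_MONTH


# Hypothetical workload that cost 2.00 USD over a 24-hour window.
window_cost = 2.00
window_hours = 24.0

# Historical mode reports what was actually spent in the window.
print(f"historical total:       {window_cost:.2f} USD")
# Rate mode reports what a month would cost at the same level of activity.
print(f"projected monthly rate: {projected_monthly_rate(window_cost, window_hours):.2f} USD")
```

The rate view is handy for comparing workloads observed over different windows, while the historical view answers the question of what was actually spent.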
How to Get the Most out of Kubectl Cost
In any complex environment, it’s important to understand what drives costs and potential increases. Finance controllers have developed sophisticated models to assign costs in different scenarios: from factory production lines to hospital wards, finance and operations teams work together to find better ways to use their resources. With Kubecost, it’s possible to find inefficiencies and improve team performance, both in financial and operational terms.
Most engineering teams organize costs in Kubernetes by namespace or label. For example, VeryCoolStore, the company from the earlier example, might check the cost of the web frontend by listing costs per namespace:
kubectl cost namespace --show-all-resources
+-------------------+-----------+----------+-----------+-------------+----------+-----------+----------+-------------+--------------------+-----------------+
| NAMESPACE         | CPU       | CPU EFF. | MEMORY    | MEMORY EFF. | GPU      | PV        | NETWORK  | SHARED COST | MONTHLY RATE (ALL) | COST EFFICIENCY |
+-------------------+-----------+----------+-----------+-------------+----------+-----------+----------+-------------+--------------------+-----------------+
| kubecost          | 10.166745 | 0.101536 | 17.002068 | 0.173600    | 0.000000 | 14.200444 | 0.129462 | 0.000000    | 59.748720          | 0.146633        |
| kube-system       | 41.502550 | 0.036747 | 5.459635  | 0.363383    | 0.000000 | 0.000000  | 0.045637 | 0.000000    | 47.007822          | 0.074721        |
| kubecost-stage    | 5.298161  | 0.083082 | 1.424946  | 3.479493    | 0.000000 | 2.552889  | 0.000000 | 0.000000    | 27.525996          | 0.802943        |
| default           | 3.758165  | 0.000377 | 0.278833  | 0.404114    | 0.000000 | 0.000000  | 0.000000 | 0.000000    | 4.036998           | 0.028263        |
| logging           | 0.822894  | 0.003321 | 0.689348  | 0.363660    | 0.000000 | 0.000000  | 0.000000 | 0.000000    | 1.512241           | 0.167579        |
| frontend-services | 0.822894  | 0.003280 | 0.689348  | 0.365613    | 0.000000 | 0.000000  | 0.000000 | 0.000000    | 1.512241           | 0.168448        |
| data-science      | 0.000295  | 1.000000 | 0.010280  | 1.000000    | 0.000000 | 0.000000  | 0.000000 | 0.000000    | 0.010575           | 1.000000        |
+-------------------+-----------+----------+-----------+-------------+----------+-----------+----------+-------------+--------------------+-----------------+
| SUMMED            | 62.371704 |          | 25.554458 |             | 0.000000 | 16.753333 | 0.175098 | 0.000000    | USD 141.354594     |                 |
+-------------------+-----------+----------+-----------+-------------+----------+-----------+----------+-------------+--------------------+-----------------+
And to monitor the cost of each application in the cluster, as denoted by the “app” label:
kubectl cost label -l app
+----------------------------+--------------------+-----------------+
| LABEL                      | MONTHLY RATE (ALL) | COST EFFICIENCY |
+----------------------------+--------------------+-----------------+
| cost-analyzer              | 59.145248          | 0.137997        |
| prometheus                 | 7.940403           | 1.000000        |
| test-app1                  | 3.612244           | 0.167835        |
| nginx                      | 3.459811           | 0.006337        |
| stackdriver-metadata-agent | 1.981401           | 0.324363        |
| grafana                    | 0.847323           | 1.000000        |
| kubecost-network-costs     | 0.269981           | 1.000000        |
+----------------------------+--------------------+-----------------+
| SUMMED                     | USD 77.256411      |                 |
+----------------------------+--------------------+-----------------+
Measuring Spend Efficiency and Why It Matters
Spend efficiency in Kubecost is defined as the percentage of requested CPU & memory dollars utilized over the measured time window. Values range from 0 to above 100 percent. For example, consider the table below representing a namespace with two pods. Pod #1 runs on an on-demand node with a more expensive CPU; pod #2 runs on a spot node.
| Pod    | CPU Request | CPU Monthly Cost | CPUs Used | Utilization | Cost of Used | Cost-weighted Efficiency |
| Pod #1 | 1.0         | $20              | 0.20      | 20.0%       | $4.00        | 20.0%                    |
| Pod #2 | 1.0         | $2               | 0.80      | 80.0%       | $1.60        | 80.0%                    |
| Total  | 2.0         | $22              | 1.00      | 50.0%       | $5.60        | 25.5%                    |
The resulting efficiency for CPU cost is a mere 25.5% ($5.60 of used CPU out of $22 requested), even though CPU utilization is 50%. The expensive, mostly idle pod dominates the result. This measurement is important because it clearly shows where to focus in order to improve spending.
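The arithmetic behind the two-pod example can be sketched in a few lines of Python. This is an illustration of the cost-weighted calculation described above, not Kubecost's own code; the `PodCpu` class and function names are hypothetical. Note that the cost of used CPU for pod #2 is 0.80 × $2 = $1.60, which gives $5.60 in total.

```python
from dataclasses import dataclass


@dataclass
class PodCpu:
    """CPU figures for one pod (hypothetical structure for illustration)."""
    name: str
    cpu_request: float   # CPUs requested
    monthly_cost: float  # USD per month for the requested CPU
    cpus_used: float     # average CPUs actually used


def utilization(pods):
    """Plain utilization: CPUs used divided by CPUs requested."""
    return sum(p.cpus_used for p in pods) / sum(p.cpu_request for p in pods)


def cost_weighted_efficiency(pods):
    """Dollars of requested CPU actually used, divided by dollars requested."""
    cost_of_used = sum(p.monthly_cost * p.cpus_used / p.cpu_request for p in pods)
    total_cost = sum(p.monthly_cost for p in pods)
    return cost_of_used / total_cost


pods = [
    PodCpu("pod-1", cpu_request=1.0, monthly_cost=20.0, cpus_used=0.20),  # on-demand node
    PodCpu("pod-2", cpu_request=1.0, monthly_cost=2.0, cpus_used=0.80),   # spot node
]

print(f"utilization:              {utilization(pods):.1%}")               # 50.0%
print(f"cost-weighted efficiency: {cost_weighted_efficiency(pods):.1%}")  # 25.5%
```

Weighting by cost is what pulls the figure down from 50% to 25.5%: the idle capacity sits on the node where each CPU is ten times more expensive.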
To investigate further, let’s continue with the namespace example started above. Imagine one namespace is costing much more than expected and/or has very low efficiency. Next, we can drill into the namespace and see the cost and efficiency of every single workload:
kubectl cost controller -n kube-system
This shows you cost and efficiency for each deployment, replicaset, job, etc:
+-------------+-----------------------------------------------------+--------------------+-----------------+
| NAMESPACE   | CONTROLLER                                          | MONTHLY RATE (ALL) | COST EFFICIENCY |
+-------------+-----------------------------------------------------+--------------------+-----------------+
| kube-system | daemonset:fluentbit-gke                             | 12.712514          | 0.038815        |
|             | deployment:kube-dns                                 | 9.508486           | 0.023992        |
|             | daemonset:kube-proxy                                | 6.232683           | 0.015296        |
|             | deployment:coredns                                  | 2.680050           | 0.029732        |
|             | deployment:stackdriver-metadata-agent-cluster-level | 1.981401           | 0.324363        |
|             | deployment:metrics-server-v0.3.6                    | 1.024543           | 0.063730        |
|             | daemonset:gke-metrics-agent                         | 0.961624           | 0.444516        |
|             | daemonset:aws-node                                  | 0.941192           | 0.567099        |
|             | deployment:kube-dns-autoscaler                      | 0.342245           | 0.036308        |
|             | deployment:l7-default-backend                       | 0.202655           | 0.031027        |
|             | daemonset:prometheus-to-sd                          | 0.128543           | 1.000000        |
|             | deployment:event-exporter-gke                       | 0.035273           | 1.000000        |
+-------------+-----------------------------------------------------+--------------------+-----------------+
| SUMMED      |                                                     | USD 36.751209      |                 |
+-------------+-----------------------------------------------------+--------------------+-----------------+
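If you want to rank workloads by efficiency rather than read the table by eye, the text output can be post-processed. The following Python sketch assumes the table layout shown in this post (pipe-separated cells, `+---` border lines, a blank namespace cell on continuation rows, and a final SUMMED row); the layout may differ between plugin versions, and the `parse_controller_table` helper is hypothetical. The sample is a three-row excerpt of the output above, with the SUMMED value recomputed for the excerpt.

```python
# Excerpt of the controller table; the SUMMED value is recomputed for these three rows.
SAMPLE = """\
+-------------+----------------------+--------------------+-----------------+
| NAMESPACE   | CONTROLLER           | MONTHLY RATE (ALL) | COST EFFICIENCY |
+-------------+----------------------+--------------------+-----------------+
| kube-system | deployment:kube-dns  | 9.508486           | 0.023992        |
|             | daemonset:kube-proxy | 6.232683           | 0.015296        |
|             | daemonset:aws-node   | 0.941192           | 0.567099        |
+-------------+----------------------+--------------------+-----------------+
| SUMMED      |                      | USD 16.682361      |                 |
+-------------+----------------------+--------------------+-----------------+
"""


def parse_controller_table(table_text):
    """Turn the ASCII table into a list of dicts, one per controller."""
    rows, header, namespace = [], None, ""
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first pipe-delimited line holds the column names
            continue
        record = dict(zip(header, cells))
        if record["NAMESPACE"] == "SUMMED":
            continue  # totals row, not a controller
        # Continuation rows leave the namespace blank: carry the last one forward.
        namespace = record["NAMESPACE"] or namespace
        rows.append({
            "namespace": namespace,
            "controller": record["CONTROLLER"],
            "monthly_rate": float(record["MONTHLY RATE (ALL)"]),
            "efficiency": float(record["COST EFFICIENCY"]),
        })
    return rows


rows = parse_controller_table(SAMPLE)
# List the least efficient controllers first.
for row in sorted(rows, key=lambda r: r["efficiency"]):
    print(f'{row["controller"]:<22} {row["monthly_rate"]:>10.2f} USD  {row["efficiency"]:.1%}')
```

Sorting by efficiency alone can surface very cheap workloads, so in practice you may prefer to sort by wasted dollars, that is, monthly rate multiplied by one minus efficiency.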
Finally, we can drill into the cost details with the -A flag. This shows all components of cost to fully understand what is driving total costs.
kubectl cost controller -n kube-system -A
+-------------+-----------------------------------------------------+-----------+----------+----------+-------------+----------+----------+----------+-------------+--------------------+-----------------+
| NAMESPACE   | CONTROLLER                                          | CPU       | CPU EFF. | MEMORY   | MEMORY EFF. | GPU      | PV       | NETWORK  | SHARED COST | MONTHLY RATE (ALL) | COST EFFICIENCY |
+-------------+-----------------------------------------------------+-----------+----------+----------+-------------+----------+----------+----------+-------------+--------------------+-----------------+
| kube-system | daemonset:fluentbit-gke                             | 10.075020 | 0.021476 | 2.637494 | 0.105050    | 0.000000 | 0.000000 | 0.000000 | 0.000000    | 12.712514          | 0.038815        |
|             | deployment:kube-dns                                 | 8.977704  | 0.007332 | 0.530782 | 0.305785    | 0.000000 | 0.000000 | 0.000000 | 0.000000    | 9.508486           | 0.023992        |
|             | daemonset:kube-proxy                                | 6.137676  | 0.003714 | 0.072189 | 1.000000    | 0.000000 | 0.000000 | 0.022818 | 0.000000    | 6.232683           | 0.015296        |
|             | deployment:coredns                                  | 2.455070  | 0.020677 | 0.224980 | 0.128545    | 0.000000 | 0.000000 | 0.000000 | 0.000000    | 2.680050           | 0.029732        |
|             | deployment:stackdriver-metadata-agent-cluster-level | 1.560401  | 0.379148 | 0.421000 | 0.121307    | 0.000000 | 0.000000 | 0.000000 | 0.000000    | 1.981401           | 0.324363        |
|             | deployment:metrics-server-v0.3.6                    | 0.786997  | 0.020952 | 0.237545 | 0.205456    | 0.000000 | 0.000000 | 0.000000 | 0.000000    | 1.024543           | 0.063730        |
|             | daemonset:gke-metrics-agent                         | 0.302251  | 0.460528 | 0.659374 | 0.437175    | 0.000000 | 0.000000 | 0.000000 | 0.000000    | 0.961624           | 0.444516        |
|             | daemonset:aws-node                                  | 0.613768  | 0.352255 | 0.304606 | 1.000000    | 0.000000 | 0.000000 | 0.022818 | 0.000000    | 0.941192           | 0.567099        |
|             | deployment:kube-dns-autoscaler                      | 0.321223  | 0.008121 | 0.021022 | 0.467031    | 0.000000 | 0.000000 | 0.000000 | 0.000000    | 0.342245           | 0.036308        |
|             | deployment:l7-default-backend                       | 0.160612  | 0.007665 | 0.042043 | 0.120273    | 0.000000 | 0.000000 | 0.000000 | 0.000000    | 0.202655           | 0.031027        |
|             | daemonset:prometheus-to-sd                          | 0.033301  | 1.000000 | 0.095241 | 1.000000    | 0.000000 | 0.000000 | 0.000000 | 0.000000    | 0.128543           | 1.000000        |
|             | deployment:event-exporter-gke                       | 0.003508  | 1.000000 | 0.031765 | 1.000000    | 0.000000 | 0.000000 | 0.000000 | 0.000000    | 0.035273           | 1.000000        |
+-------------+-----------------------------------------------------+-----------+----------+----------+-------------+----------+----------+----------+-------------+--------------------+-----------------+
| SUMMED      |                                                     | 31.427531 |          | 5.278041 |             | 0.000000 | 0.000000 | 0.045637 | 0.000000    | USD 36.751209      |                 |
+-------------+-----------------------------------------------------+-----------+----------+----------+-------------+----------+----------+----------+-------------+--------------------+-----------------+
Each resource type can now be tuned for your business. Most of our customers aim for utilization in the following ranges:
- CPU: 50%-65%
- Memory: 45%-60%
- Storage: 65%-80%
Target figures are highly dependent on the predictability and distribution of your resource usage (e.g. P99 vs median), the impact of high utilization on your core product/business metrics, and more. Finding the ranges that work for you is a matter of balancing some trade-offs: too low resource utilization is wasteful; too high utilization can lead to latency increases, reliability issues, and other negative behavior. Looking at historical data can help strike the right balance.
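As a rough illustration of how these target ranges might be applied, the Python sketch below classifies a measured utilization against them. The ranges are the ones listed above; the `classify` helper, its labels, and the sample measurements are hypothetical, and your own targets may well differ.

```python
# Target utilization ranges from the list above, as fractions (low, high).
TARGET_RANGES = {
    "cpu": (0.50, 0.65),
    "memory": (0.45, 0.60),
    "storage": (0.65, 0.80),
}


def classify(resource: str, utilization: float) -> str:
    """Compare a measured utilization (0.0 to 1.0+) with its target range."""
    low, high = TARGET_RANGES[resource]
    if utilization < low:
        return "under-utilized"  # likely wasteful: consider lowering requests
    if utilization > high:
        return "over-utilized"   # risk of latency or reliability issues
    return "within target"


# Hypothetical measurements for one workload.
measurements = {"cpu": 0.02, "memory": 0.55, "storage": 0.90}
for resource, value in measurements.items():
    print(f"{resource:<8} {value:>6.0%}  {classify(resource, value)}")
```

A check like this is only a starting point: a workload with spiky usage may legitimately sit below the range on average and still need its headroom.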
Kubernetes keeps expanding its reach and, as it grows, it poses new challenges to finance and engineering teams. The collaboration between them is in its infancy, but it’s already showing a clear path forward where open source leads the way.