Kubernetes Infrastructure: Know the Inner Dev Loop
This article is a part of a series on Kubernetes and its ecosystem where we will dive deep into the infrastructure one piece at a time.
So far in this series, we have explored the principles, architectures and deployment strategies around the Cloud Native Kubernetes stack. Such complex, distributed systems can take a heavy toll on the productivity and agility of Engineering and DevOps teams if they are not set up right. A cloud native stack matters in today’s world, but it should be complemented with a clear path to a faster development workflow that gives teams the agility they need. That is exactly what we are going to discuss today.
According to Kubernetes-focused DevOps stack provider Ambassador, the “Inner Development Loop” is “The iterative process of writing, building and debugging code that a single developer performs before sharing the code, either publicly or with their team.”
Then there are “outer dev loops”, which start as soon as your code is checked into version control (or you send a PR for it). This is where you hand over most of the responsibilities to automated systems such as CI/CD pipelines, which do the job for you as per a typical GitOps workflow.
The inner dev loop is the more critical of the two because this is where developers do the actual work, be it enhancements, bug fixes, security issues or anything else you can imagine. This is where a rapid and continuous feedback loop really helps: developers can debug issues, make changes and immediately see the output in front of them without any context switching or delays. A faster inner dev loop therefore means greater productivity, happier developers and rapid shipping cycles, which ultimately translate into happy customers and better revenue growth.
Now that I have attempted to explain what the inner dev loop is and why it matters to both developers and the business, let us have a look at what it actually looks like.
This is what a typical inner and outer dev loop look like:
Since the inner dev loop is where you spend most of your time, it calls for a lot of optimization to take away every possible bottleneck in the development process. And let us be honest: working with Kubernetes has been hard so far, considering the amount of complexity we carry with us from both Kubernetes and the tools around it.
But no longer. This has been the most important focus of the Kubernetes and open source community especially for the past year with a lot of amazing tools now available to solve the issue of the inner dev loop. But before we dive into those, let us have a look at what we expect from such tools, and how we can make the inner dev loop much faster.
Setting the Stage:
- Avoid Image Builds Whenever Possible: Depending on what you are building, the caching you have in place and the builder you are using underneath, image builds can take anywhere from a few seconds to minutes. Added up over many iterations, that becomes very costly and unproductive. This is why you need a strategy to avoid image builds whenever possible.
- Seamless Networking: When we work with Kubernetes, our cluster may be running either locally or remotely, and the steps needed to send traffic to and receive traffic from our app can vary drastically between the two. This may include inter-service communication within the cluster, and if you have tight security constraints in place, you might also have more layers on top (like tunneling through bastion hosts, proxies, VPNs and so on). None of this should be your concern during development, where the focus should be on making changes, getting the output and iterating. This is why we need seamless networking, with all the networking complexity abstracted away from the user.
- Great IDE/Editor Integrations: The IDE/editor is where any developer does the major chunk of the work, be it writing code or using IntelliSense, language-specific features, extensions and snippets. Personally, I breathe through VSCode every second and use it to power all my workflows. This makes great integrations with our editor/IDE very important. While you can always use CLIs and shell scripts to get your job done, they are better suited to one-off operations or workflows that you run only a few times a day. In addition, context switching is very costly when it comes to productivity. This is why a good integration with our editor can help a lot.
- Support for Hybrid Development: When you are working with a microservices architecture, in Kubernetes or otherwise, you rely on a lot of services running on your Kubernetes clusters to fulfill requests. As the number of services increases, it is often not possible to run them all locally since they can require a lot of resources, which creates bottlenecks and slows down your development cycle. This calls for a hybrid development workflow, where you run only the service you are currently modifying locally while the rest of the tools and services run remotely, with requests proxied to and from the cluster. This speeds up your development cycle while drastically reducing the resources you need locally. As you see below, hybrid definitely fits most of the cases without the developer needing to worry about the scale of the architecture.
- Support for Collaboration with Isolation: This might sound strange, and you may wonder how collaboration is possible while also isolating your services. Kubernetes must be an enabler when multiple developers in your team want to work on the same microservice at the same time, without an operational nightmare. Every developer can have their own namespace or cluster and do all the development in isolation, but this becomes a nightmare for the Ops teams as it scales: OS/version upgrades, security patches and RBAC permissions multiply, and you also need a way to keep all the config in sync. It comes with added costs too, since you pretty much duplicate every tool/service for every developer. A separate cluster or namespace can be very powerful since it gives you complete control and a high degree of isolation, but considering the complexity it adds, not everyone in the team may need it. This is where support for header-based proxying and routing mechanisms can really help.
- Great Logging/Debugging/Alerting Support: Logs are without question one of the most critical parts of almost every developer’s workflow. If you are using an editor like VSCode, there is a high chance that you also use a debugger to walk through the code, inspect variables and watch for changes as you step through. Finally, alerting and distributed tracing can notify you when you are working with multiple services and there is an error in service C because of a change you made in service A. All of these are a critical part of the inner dev loop, so better logging/debugging/alerting support makes a developer’s life much better.
- Clear and Consistent Path to Production: While concentrating on the inner dev loop is really important, it would be even better if the tool also provided a clear path to production. You might say that is why we have CI/CD pipelines. True, and it will always stay that way. But developers only verify that the code runs properly in their own dev environment, and drift or duplication of configuration between dev and prod can cause trouble: dev may be working while prod still ends up failing at some point. This is why it would be great to have a bridge from development to production.
- Support for APIs/Plugins from the Community: While we can imagine a utopian tool that does the whole job for us, in reality every use case can be quite different. It is not practical for one tool to cater to the myriad developer workflows out there, and that isn’t expected of a tool either. What is expected instead is that the tool acts as a platform: it exposes the basic APIs and works well with plugins from the community, so that developers can build their own plugins on top to fit their use case.
- Secure by Default: While accelerating productivity is really important for every developer and the organization as a whole, it should not come at the cost of security. This can include the ability to run as non-root, avoiding privilege escalation, working with a restricted Pod Security Policy and more, all of which reduce the surface area for possible attacks.
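To make the last expectation a little more concrete, here is a minimal sketch in Python of the kind of check a dev tool could run before deploying a pod. It assumes the pod spec has already been parsed into a plain dictionary, and the function name insecure_containers is made up for this illustration. The only Kubernetes fields it looks at are securityContext.runAsNonRoot and securityContext.allowPrivilegeEscalation, and it treats a missing allowPrivilegeEscalation as "allowed", which is a conservative assumption on my part.

```python
from typing import Any, Dict, List


def insecure_containers(pod_spec: Dict[str, Any]) -> List[str]:
    """Return the names of containers that may run as root or escalate privileges.

    Hypothetical helper for illustration; not part of any tool discussed here.
    """
    pod_ctx = pod_spec.get("securityContext") or {}
    flagged = []
    for container in pod_spec.get("containers", []):
        ctx = container.get("securityContext") or {}
        # A container-level runAsNonRoot takes precedence over the pod-level value.
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        # allowPrivilegeEscalation is a container-level field; assume "allowed" when unset.
        no_escalation = ctx.get("allowPrivilegeEscalation", True) is False
        if not (run_as_non_root and no_escalation):
            flagged.append(container["name"])
    return flagged


# Sample pod spec: "api" is locked down, "sidecar" never disables privilege escalation.
pod_spec = {
    "securityContext": {"runAsNonRoot": True},
    "containers": [
        {"name": "api", "securityContext": {"allowPrivilegeEscalation": False}},
        {"name": "sidecar"},
    ],
}
print(insecure_containers(pod_spec))  # prints ['sidecar']
```

A check like this is cheap enough to run on every iteration of the inner dev loop, so security problems surface long before the outer loop.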
Now that we have made everything clear about our expectations and the various ways in which we can speed up the inner dev loop, let us look at some options we have out there today.
Telepresence from Ambassador (now part of the Cloud Native Computing Foundation) has been the pioneer for the inner dev loop for quite some time now and can be a really good fit if you already have isolation in place between developers (be it as separate clusters or namespaces). It offers various mechanisms to divert traffic between your local machine and the remote cluster using Proxy and Swap deployments, with or without Docker in place.
But this can become challenging if you would like multiple developers to share the same namespace, since that creates a lot of conflicts in the network traffic. It has no mechanism for collaboration without Telepresence (Proprietary, initially called Service Preview), which essentially runs a Telepresence proxy in your cluster and routes requests based on headers with the help of Traffic Manager from Ambassador.
The other fact you need to be mindful of is that it currently uses SSHD and you might run into security permission issues if you are running a restricted pod security policy in your cluster.
Tilt is a great tool to have in your stack if you do most of your work with local Kubernetes clusters, be it Kind, Minikube, MicroK8s, K3s or anything similar. Tilt uses Starlark for all its configuration, which makes it easy to work with, especially if you know Python, without fiddling too much with YAML files, and you can orchestrate multiple Tiltfiles as well. As they say, it is as simple as running tilt up and tilt down, which makes the experience pleasant when you work with local clusters. It supports port forwarding and multiple container engines/builders, and comes with a nice dashboard with logs, alerts and related info, which you can also use as your playbook even for all the shell scripts you may have.
And while it does support remote development with live sync, it is not well optimized for that and does not support hybrid workflows out of the box. Since the dashboard is a separate interface, you will also need to switch contexts between your editor and the dashboard (you will definitely want multiple monitors in this case), which can prove costly over time.
But considering it is in its very early stages as a startup, I do believe there are a lot of amazing things in the pipeline (they are currently working on Tilt Cloud), and it will be interesting to wait and watch what they have to offer.
If you are looking to have one namespace for every developer or a completely isolated environment, then Okteto can be a good choice. The Okteto CLI is open source and lets you work with any local or remote cluster, with support for file syncing, an SSH server and more, though it does not give you the complete capabilities of Okteto Cloud such as temporary dev or preview environments, automatic resource cleanup and more.
While it does offer a fast inner dev loop when working with your remote clusters, it does not support a hybrid workflow yet. But being a very new startup, we can expect a lot from Okteto as well soon.
While Garden advertises itself as a tool for making testing easy in Kubernetes, it is a really powerful tool when it comes to the rest of the development workflows as well. While it does not support hybrid workflows yet, it does support local and remote development workflows and makes use of the concept of a Stack graph where all the dependencies maintain their own configurations and are ultimately defined explicitly.
The issue here is that Garden can turn out to be overkill for a lot of applications. You have to learn and apply a considerable number of concepts, starting from Projects, Modules, Services, Tests, Tasks, Workflows and more, which makes it difficult to start off with. It also lacks support for hybrid workflows, and features like automatic environment cleanup are planned only for the enterprise versions of the product.
But if you are willing to spend the time to set up and maintain all the configurations you would need, Garden can be very much worth the effort.
Skaffold + Cloud Code
Skaffold is in a really interesting space today, helping developers not just across the inner dev loop but the outer dev loop as well: you can use it for your development workflows and also in your CI/CD pipelines. Even more interesting is its integration with the Cloud Code extension, which makes the development experience really pleasant. It has powerful features including file sync, port forwarding of services, automatic cleanup (for local clusters), log tailing, automatic tagging based on templates, support for multiple container builders and support for multiple profiles, along with support for Helm charts and other deployment mechanisms. That is a comprehensive feature set, and it is in active development with huge community support, backed by the amazing Google Container Tools team.
But as with many other tools in this list, it doesn’t support hybrid development workflows yet. It can also be tricky to use with microservices (they are working on support for Multiple Configs), remote development can be quite tricky, and Cloud Code is still in its early days of active development with a lot in the pipeline.
While it may not be relevant to all, Cloud Code also ships with integrations to GCP (like Cloud Run, APIs, Secret Manager, Project Explorer, etc.), helps beginners onboard quickly with Kubernetes, and comes with inbuilt debugging support for multiple languages, IntelliSense for Kubernetes YAMLs, a Log Viewer and more, making Skaffold and Cloud Code a great pair.
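File sync, listed above among the features, is the main way these tools let you avoid image builds: if a change only touches files that can be copied into the running container, there is no need to rebuild. The sketch below shows that decision in Python. It is a conceptual illustration only, not how Skaffold or any other tool implements it; the function name plan_update and the glob patterns are made up for the example.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def plan_update(changed_files: Iterable[str], sync_patterns: List[str]) -> str:
    """Decide how to react to a set of changed files.

    Returns "noop" when nothing changed, "sync" when every changed file
    matches a sync pattern, and "rebuild" otherwise.
    Hypothetical helper for illustration only.
    """
    files = list(changed_files)
    if not files:
        return "noop"
    for path in files:
        # A single file outside the sync patterns forces a full image rebuild.
        if not any(fnmatchcase(path, pattern) for pattern in sync_patterns):
            return "rebuild"
    return "sync"


# Assumed patterns: source and static assets can be copied straight into the container.
patterns = ["src/*.py", "static/*"]
print(plan_update(["src/app.py", "static/site.css"], patterns))  # prints sync
print(plan_update(["src/app.py", "requirements.txt"], patterns))  # prints rebuild
```

The second call falls back to a rebuild because a dependency file changed, which is exactly the case where copying files alone would leave the container out of date.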
Bridge to Kubernetes (Mindaro)
Bridge to Kubernetes is a very new tool from Microsoft which recently went GA. It aims to provide the hybrid workflow we have been talking about all through this blog, allowing developers to concentrate just on the service they are working on while it takes care of the rest of the routing based on the headers being sent. You can read more about how it works here: it runs an agent in your Kubernetes cluster, uses Envoy to route based on headers and does port forwarding locally. This allows multiple developers to work on the same service in the same namespace at the same time without affecting each other’s work, which is exactly what we are looking for here.
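The header-based routing idea can be sketched in a few lines of Python. This is only an illustration of the concept, not the actual Envoy configuration or any code from the tool: the header name x-dev-route, the function pick_backend and the addresses are all hypothetical.

```python
from typing import Dict, Mapping

# Hypothetical header name, chosen for this illustration only.
ROUTE_HEADER = "x-dev-route"


def pick_backend(
    headers: Mapping[str, str],
    dev_routes: Dict[str, str],
    default_backend: str,
) -> str:
    """Return the backend address a request should be forwarded to."""
    # HTTP header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    route_id = normalised.get(ROUTE_HEADER)
    # Unknown or missing route ids fall through to the shared in-cluster service.
    return dev_routes.get(route_id, default_backend)


# Assumed setup: Alice runs her copy of the service locally, everyone else uses the cluster.
dev_routes = {"alice": "alice-laptop:8080"}
shared = "orders.default.svc.cluster.local:80"

print(pick_backend({"X-Dev-Route": "alice"}, dev_routes, shared))  # prints alice-laptop:8080
print(pick_backend({}, dev_routes, shared))  # prints orders.default.svc.cluster.local:80
```

Requests without the header keep hitting the shared service, so one developer’s experiment does not affect anyone else in the namespace. For this to work across a chain of services, each service has to pass the header along on the calls it makes.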
But since it is in its very early days, I did face quite a lot of challenges running it, which I have documented here, like lack of support for VSCode Remote SSH, lack of sidecar support, problems running it with a restricted pod security policy, etc. All of those issues are definitely on their radar, though, and we can hopefully expect something great from them soon.
One thing to note, though, is that Bridge to Kubernetes does not satisfy the complete inner dev loop equation all by itself and leverages the ecosystem. In fact, it hooks into the Kubernetes VSCode extension and uses its API to do the rest behind the scenes. So it can be seen as an important piece of a bigger puzzle.
These are not the only tools in the show right now. In fact, we have not talked about tools like Ksync, Squash, Draft and more, all of which focus on the inner dev loop while attempting to solve the problem differently. The best way to choose the right tool for you is to actually try them out, keeping your use case in mind. I just wish I could have one tool with a mix of the best of all the tools I have listed, and that day is very near. The best way to speed up the process is to contribute back, and everything counts, small or big: contributing to the source, attending UX surveys, filing bug reports, writing blogs like this or contributing to the docs.
My vision for the Kubernetes ecosystem is that it fades away behind the scenes, allowing developers to concentrate on the most important thing they have at hand: the business logic and the problem to be solved. Considering all the developments we have been seeing in the ecosystem, we should get there soon.