What We Learned at PlatformCon
Platform engineering is constantly evolving. PlatformCon 2022, the virtual conference created by and for platform engineers, is proof.
Platform engineering is the “discipline of designing and building toolchains and workflows that enable self-service capabilities for software engineering organizations in the cloud native era. Platform engineers build what is often called an internal developer platform (or IDP), that covers the operational necessities of the entire life cycle of an application.”
The discipline rose from the ashes of “you build it, you run it” DevOps, which worked well for large companies but proved disastrous for organizations with less access to resources and talent. Platform engineering aims to reduce the cognitive load that this stage of the DevOps evolution placed on developers.
In his keynote, Puppet field CTO Nigel Kersten raised concerns that, like DevOps, platform engineering could become too ambiguous a term to provide real value. He urged the community to come up with a descriptive model of what platform engineering actually entails. In many ways, PlatformCon seemed to be the start of this process.
With 78 talks to dive into, it took me most of the summer to watch and absorb the insights from PlatformCon. Experts like Kersten, AWS’s Gregor Hohpe, DevOps consultant Manuel Pais, and consultancy OpenCredo’s Nicki Watt weighed in on the most important platform engineering topics. Speakers shared their platform stories, the choices they made and what they learned from the process.
Almost 7,000 platform engineers from around the globe tuned in to the livestream, chatted with speakers on the platform engineering Slack channel and chimed in on social media with their thoughts.
Whether you’re new to platform engineering or have built a platform from scratch, PlatformCon has a lot of insight to offer. If you missed it, here’s a recap.
The Core Problem: Cognitive Load
Cognitive load refers to the amount of information a person must process to complete a task. When this load exceeds our working memory capacity, we struggle to get the task done. In her talk, Syntasso COO Paula Kennedy illustrated how the evolution of DevOps in recent years has exponentially increased the cognitive load on developers. Developers are now expected not only to write code but also to run it in production.
Dealing with microservice architectures in a cloud native setup often requires knowledge of Kubernetes, infrastructure provisioning, deployment pipelines, configuration management and more.
With tons of new tools and frameworks for developers to learn and apply, it’s nearly impossible to keep up. All of this gets in the way of developers’ most important responsibility: delivering features.
Platform engineering tries to solve these problems by building platforms that enable and support developer self-service. Humanitec CEO Kaspar von Grünberg describes platforms as “the sum of all tech and tools that a platform engineering team binds together to pave golden paths for developers.”
Platforms are built to improve the developer experience, or DevEx, by reducing the cognitive load while maintaining an appropriate degree of freedom for developers. Most simply, DevEx is about creating an environment where developers are empowered to do their best work.
It covers providing proper tools, maintaining sufficient documentation, finding the right level of abstraction, automating where possible and enabling developer self-service.
DevEx is enhanced when platforms provide golden paths. A golden path is an opinionated and supported path to build something. It suggests tools and frameworks to use for day-to-day tasks, but maintains developers’ freedom to deviate from the path where needed.
Golden paths prevent engineers from reinventing the wheel and reduce the number of decisions developers have to make. That frees up mental capacity developers can put toward shipping code more quickly and efficiently. Combined with the freedom to step off the path when needed, this makes golden paths a crucial component of a successful platform.
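To make the idea more concrete, here is a minimal Python sketch of a golden path expressed as opinionated defaults that developers can override explicitly. The tool choices, the `GOLDEN_PATH_DEFAULTS` table and the `scaffold_service` helper are all hypothetical; no PlatformCon speaker presented this code. Recording each deviation is one possible design choice that gives the platform team a signal about where the path doesn't fit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical golden-path defaults for a new backend service. The tool
# choices are placeholders, not recommendations from any talk.
GOLDEN_PATH_DEFAULTS: Dict[str, str] = {
    "language": "python",
    "ci": "github-actions",
    "deploy_target": "kubernetes",
    "observability": "opentelemetry",
}


@dataclass
class ServiceSpec:
    name: str
    settings: Dict[str, str]
    deviations: List[str] = field(default_factory=list)


def scaffold_service(name: str, overrides: Optional[Dict[str, str]] = None) -> ServiceSpec:
    """Start from the opinionated defaults, allowing explicit, recorded deviations."""
    overrides = overrides or {}
    # Reject keys the golden path doesn't know about (e.g. typos) instead of ignoring them.
    unknown = set(overrides) - set(GOLDEN_PATH_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    settings = {**GOLDEN_PATH_DEFAULTS, **overrides}
    # A deviation is any override that actually differs from the default.
    deviations = sorted(k for k, v in overrides.items() if v != GOLDEN_PATH_DEFAULTS[k])
    return ServiceSpec(name=name, settings=settings, deviations=deviations)
```

For example, `scaffold_service("ml-scorer", {"language": "rust"})` keeps the default CI and deployment choices while reporting `["language"]` as its only deviation.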
Platform as a Product
In his talk, Manuel Pais, co-author of the book “Team Topologies,” discussed the concept of platform as a product. Like any other product, he explained, platforms are optional, are carefully designed to be easy to use and evolve as technology changes. As such, he argued, it is crucial to apply the same principles and processes to platforms as you would to products.
This involves conducting user research, creating a product roadmap, soliciting regular feedback, iterating, launching the platform and marketing it internally to your customers: the developers. Platforms that lack user buy-in, aren’t designed around their users’ needs and don’t evolve their capabilities over time are likely to fail.
Many PlatformCon speakers advocated starting with user research. As Hohpe, author of “Cloud Strategy,” put it, you can either “be smarter than everyone else and anticipate all of their needs” or take the approach more likely to be successful: “evolve the platform based on user needs.”
Doma’s director of platform engineering Michael Galloway argued that unless you gain a deep understanding of developers’ pain points and what they’re already doing to mitigate them, you probably won’t build a platform people will actually want to use. He suggested interviewing your platform’s users about what they’re working on, what tasks they do most often, what tools they use, what they do and don’t like, etc. (He shared the questionnaire his team at Doma used.) He also emphasized the importance of seeking many perspectives to gain a more thorough understanding.
From your research, you can map the journey your users take to complete common tasks like building or debugging features. You can also draw on existing sources such as onboarding documentation and other materials the team maintains.
SuperAwesome senior engineering manager Olga Sermon stressed that user research at the outset isn’t enough. She argued that platform teams should solicit regular feedback from users and use it to iterate on the platform. In her talk, she outlined how her team built a strong line of communication with users: they presented problems, goals, possible solutions and use cases, then invited comments and used them to improve the platform.
Developers aren’t your only stakeholders, though. Midokura principal software engineer Galo Navarro and HelloFresh platform product lead Jessica Ulyate discussed how to get buy-in from other areas of the organization. They explained the unique perspectives of each stakeholder group and why an understanding of their differences is crucial to a successful platform.
For example, CEOs and executives focus on strategic, long-term issues, so it helps to understand and communicate how your platform has a measurable impact on those goals. Managers are juggling a lot: fostering healthy collaboration within their team, delivering high-quality work and supporting business objectives. For them, you’ll want to understand which demand is creating the most pressure and find ways for the platform to relieve it. Sysadmins and DevOps engineers, on the other hand, want to know how the platform gives them more opportunities to create impact across the organization.
Hohpe also discussed the importance of communicating with stakeholders. He explained that, if you’re building a platform, it’s likely to be on top of a base platform (think something like Kubernetes) that is going to grow over time. He believes it’s important to decide what your platform will do when that happens and to communicate that to stakeholders.
One option is to keep the platform the same. This can help justify your investment in the features you built. However, as the base platform grows, your platform will start to duplicate functionality the base platform already covers. As the “water level” (the functionality of the base platform) rises, your platform stays in the same place and “sinks.”
The other option is to build a “floating” platform. As the base platform gains the capabilities you built, you can ditch the redundant functionality and devote your resources to iterating beyond what the base platform can offer.
Both are reasonable options, but it’s critical to make your choice clear to your stakeholders. For example, if you build a floating platform, stakeholders should be prepared for you to throw things away as the base platform’s capabilities catch up.
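The trade-off Hohpe described can be sketched as a simple set comparison: as the base platform’s capabilities grow, more of your features become duplicates. In this minimal Python sketch, the feature names and the `review_platform` helper are illustrative assumptions. A floating platform would retire the `redundant` set and reinvest in what remains, while a platform that stays put keeps maintaining the overlap.

```python
from typing import NamedTuple, Set


class PlatformReview(NamedTuple):
    redundant: Set[str]        # features the base platform now also provides
    differentiating: Set[str]  # features that still add value on top of the base
    duplicated_share: float    # fraction of our features that duplicate the base


def review_platform(our_features: Set[str], base_capabilities: Set[str]) -> PlatformReview:
    """Compare our platform's features with what the base platform offers today."""
    redundant = our_features & base_capabilities
    differentiating = our_features - redundant
    share = len(redundant) / len(our_features) if our_features else 0.0
    return PlatformReview(redundant, differentiating, share)


# Hypothetical feature names; the base platform's "water level" rises between releases.
ours = {"secrets-injection", "autoscaling", "preview-environments", "cost-reports"}
for release, base in [("v1", {"autoscaling"}), ("v2", {"autoscaling", "secrets-injection"})]:
    review = review_platform(ours, base)
    print(release, sorted(review.redundant), f"{review.duplicated_share:.0%}")
```

Run as written, the loop reports a quarter of the features duplicated at v1 and half at v2, which is the “sinking” a static platform experiences if nothing is retired.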
Platform as a product has many aspects, but one theme recurred: understand your customer. Avoid assumptions, ask good questions, iterate on feedback and focus on the most important things first.
Provide the Right Level of Abstraction
One of the biggest challenges in building a platform is choosing the right level of abstraction, which is crucial to reducing cognitive load on developers.
Here, we see two main directions that are different but not mutually exclusive. On one hand, there are code-based, declarative approaches inspired by GitOps workflows. On the other, there are UI-based solutions. Both aim to shield developers from complexity.
Mathieu Frenette, director of DevOps at mortgage site nesto, covered the first approach in his talk “On Growth Challenges, Generic Helm Charts and Golden Paths.” He explained how nesto’s growth created more complexity: more developers, more customers, more use cases, more microservices. This led to too many infrastructure repos and environment branches with loose Kubernetes manifests, which made it harder to oversee and manage everything.
To address these new challenges, the platform team created different levels of abstraction based on a new, generic Helm chart.
“Instead of the tempting approach of defining an individual chart for each microservice,” Frenette explained, “we went for a unique generic chart common to all microservices.” This enabled a better separation of concerns between developers and the platform team.
But this solution risked turning nesto’s golden paths into golden cages. To mitigate this issue, they defined different levels of abstraction for their developers. Complete recipes provided a path of least resistance with only the useful parameters exposed and best practices built in as a default.
Generic modules, which offered a bit more structure, provided reusable building blocks that developers could combine as needed. Custom extensions were available for exceptional cases in which developers needed raw resources or a custom chart. Where possible, those custom extensions were later refactored into golden paths.
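The layering Frenette described can be sketched in Python by treating chart values as nested dictionaries, roughly the way Helm treats values. This is a minimal sketch, not nesto’s chart: `CHART_DEFAULTS`, `RECIPE_PARAMS` and `render_values` are hypothetical, and the merge order (defaults, then generic modules, then recipe parameters, then custom extensions) is an assumption. It shows how a complete recipe can expose only a handful of parameters while still leaving room for reusable modules and an explicit escape hatch.

```python
import copy
from typing import Any, Dict, Iterable, Optional

Values = Dict[str, Any]

# Hypothetical defaults baked into one generic chart shared by all microservices,
# with best practices (probes, resource requests) switched on by default.
CHART_DEFAULTS: Values = {
    "replicas": 2,
    "image": {"repository": None, "tag": "latest"},
    "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}},
    "probes": {"liveness": {"path": "/healthz"}, "readiness": {"path": "/ready"}},
}

# Complete recipe: the only parameters developers may set on this path.
RECIPE_PARAMS = {"image.repository", "image.tag", "replicas"}


def deep_merge(base: Values, override: Values) -> Values:
    """Recursively merge `override` into a copy of `base`, similar in spirit to layered values files."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def set_path(values: Values, dotted: str, value: Any) -> None:
    """Set a nested key such as "image.tag" in place."""
    *parents, leaf = dotted.split(".")
    node = values
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def render_values(
    recipe_params: Optional[Dict[str, Any]] = None,
    modules: Iterable[Values] = (),
    custom: Optional[Values] = None,
) -> Values:
    """Layer chart defaults, generic modules, recipe parameters and custom extensions."""
    values = copy.deepcopy(CHART_DEFAULTS)
    for module in modules:  # generic modules: reusable building blocks
        values = deep_merge(values, module)
    for dotted, value in (recipe_params or {}).items():  # complete recipe
        if dotted not in RECIPE_PARAMS:
            raise KeyError(f"{dotted!r} is not exposed by the recipe")
        set_path(values, dotted, value)
    if custom:  # custom extension: escape hatch, candidate for a future golden path
        values = deep_merge(values, custom)
    return values
```

A developer on the complete recipe might call `render_values({"image.repository": "registry.example.com/orders", "replicas": 3})` and still get the default probes and resource requests, while trying to set `resources.requests.cpu` through the recipe raises a `KeyError`.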
Nils Balkow-Tychsen, principal engineer at Humanitec, demonstrated a platform-as-code approach using GitHub Actions for code scaffolding and Humanitec’s Platform Orchestrator. With this approach, the repository template has everything developers need to turn code into a running version of a service, fully automated in any environment — the code itself, the initial configuration, a list of dependencies, and the definition of the CI pipeline. (You can check it out for yourself using this Node.js application repository template.)
As Balkow-Tychsen pointed out, this approach helps enforce standards in the development process, provides fast and efficient automation and opens an opportunity to spin up fully provisioned dynamic environments.
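One way a platform team might enforce that standard is a small check, run in CI, confirming that a repository created from the template still contains every part Balkow-Tychsen listed. This Python sketch is hypothetical: apart from `.github/workflows/`, where GitHub Actions looks for workflow files, and `package.json`, the usual Node.js dependency manifest, the paths are assumptions about how such a template might be laid out.

```python
from typing import Dict, Iterable, List

# Hypothetical layout for a Node.js service template. Entries ending in "/"
# are directories that must contain at least one file.
TEMPLATE_REQUIREMENTS: Dict[str, str] = {
    "application code": "src/",
    "initial configuration": "config/",
    "dependency list": "package.json",
    "CI pipeline definition": ".github/workflows/",
}


def missing_parts(repo_files: Iterable[str]) -> List[str]:
    """Return the template parts absent from a list of repo-relative file paths."""
    files = list(repo_files)
    missing = []
    for part, path in TEMPLATE_REQUIREMENTS.items():
        if path.endswith("/"):
            present = any(f.startswith(path) for f in files)
        else:
            present = path in files
        if not present:
            missing.append(part)
    return missing
```

Taking a plain list of paths rather than walking the filesystem keeps the check easy to test; in practice the list could come from something like `git ls-files`.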
Frenette and Balkow-Tychsen’s approaches are examples of more code-based abstractions. However, other teams attempt to reduce cognitive load with UIs on top of their platform setups. Netflix senior software engineer Brian Leathem shared how its Platform Experiences and Design (PXD) team adopted Backstage, Spotify’s open source developer portal and service catalog, to reduce context switching and unify the developer experience on top of its GraphQL-based platform API.
The code- and UI-based approaches aren’t mutually exclusive. In his talk on architecting a dynamic internal developer platform, Frontside CEO Taras Mankovski demonstrated how the two can work together. Using Backstage as the service catalog and single source of truth, he combined a code-based declarative application model with a platform orchestrator to enable dynamic configuration management.
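The idea behind a declarative application model can be illustrated with a minimal Python sketch: developers describe what a workload needs in environment-agnostic terms, and an orchestrator resolves those needs against platform-owned definitions for each environment. The `Workload` schema, the `RESOURCE_DEFINITIONS` table and the `resolve` function are hypothetical and do not reflect the API of Backstage, Humanitec or any other product.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class Workload:
    """Developer-owned, environment-agnostic description of a service (hypothetical schema)."""
    name: str
    image: str
    dependencies: Mapping[str, str]  # logical name -> resource type, e.g. {"db": "postgres"}


# Platform-owned resource definitions per environment (illustrative values only).
RESOURCE_DEFINITIONS: Dict[str, Dict[str, Dict[str, str]]] = {
    "development": {
        "postgres": {"host": "postgres.dev.svc", "port": "5432"},
        "redis": {"host": "redis.dev.svc", "port": "6379"},
    },
    "production": {
        "postgres": {"host": "orders-db.prod.internal", "port": "5432"},
        "redis": {"host": "cache.prod.internal", "port": "6379"},
    },
}


def resolve(workload: Workload, environment: str) -> Dict[str, str]:
    """Turn abstract dependencies into concrete environment variables for one environment."""
    definitions = RESOURCE_DEFINITIONS[environment]
    config = {"IMAGE": workload.image}
    for logical_name, resource_type in sorted(workload.dependencies.items()):
        if resource_type not in definitions:
            raise LookupError(f"no {resource_type!r} definition for {environment!r}")
        for key, value in definitions[resource_type].items():
            config[f"{logical_name.upper()}_{key.upper()}"] = value
    return config
```

With `orders = Workload("orders", "registry.example.com/orders:1.4.0", {"db": "postgres"})`, the same description resolves to `DB_HOST=postgres.dev.svc` in development and `DB_HOST=orders-db.prod.internal` in production, so adding an environment means adding platform definitions rather than editing every service’s configuration.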
PlatformCon 2022 was packed with internal platform use cases, blueprints and stories from a wide range of companies and platform practitioners. Speakers revealed the beginnings of an exciting, albeit nuanced, roadmap to succeeding at platform engineering.
And while there is no definitive recipe that will work for all organizations, many speakers stressed that continuous maintenance and development of the platform are key ingredients.
This can require repaving golden paths for developers, conducting user research, marketing the platform to different types of stakeholders and pinpointing the right level of abstraction. It might not be exciting work, but it’s crucial.
Building, implementing and maintaining a platform is a long journey, but you can get there. And PlatformCon 2022 talks are still online to help lead the way.