Developer Empowerment Via Platform Engineering, Self-Service Tooling
As platform engineering gains wider adoption and evolves at many tech organizations, there’s still a lot of work to be done. While it often results in technical and procedural changes, like implementing an internal developer platform or portal, the important change is in culture and communication. That is something many teams are still struggling to grasp, as platform adoption wanes.
Platform engineering relies on a Platform as a Product mindset, with tight feedback loops that make your internal developer customers want to adopt your platform rather than forcing anything upon them.
Instead of a traditional talk, when Adriana Villela and Ana Margarita Medina took the stage at KubeCon + CloudNativeCon North America, the Cloud Native Computing Foundation ambassadors and developer advocates at ServiceNow Cloud Observability (formerly Lightstep) went for a role play. They exhibited the dynamics of the often complex relationship between internal developers and platform engineers as a way to find common ground and build empathy between these two sides of the same coin.
Learn from them how to take that empathy and find pathways for technology and communication, even when that tech is in constant flux, with continuously changing demands.
Empathy Goes Both Ways
“We do know that software has been changing, but the way that we work with software has been changing constantly as well,” Medina said. “And that’s actually been dependent on the size of your organization or the size of your customers.”
This includes the progression from DevOps for collaboration, to site reliability engineering (SRE) for uptime, and now to platform engineering for developer experience. But, she continued, “We also have to remember that platform engineering is not just about developer experience. It encapsulates security considerations, reliability and a few other things.”
Platform engineering is truly a cross-functional, sociotechnical endeavor.
As platform engineering grows, she continued, there’s a mix of setups: platform engineering departments that include the SRE teams, or a bigger SRE team that then has some engineers focusing on platform engineering. And then there are orgs further along in their journey, like Netflix with its complex developer productivity setup.
“That really breaks down the silos. It allows folks to come together and really collaborate,” Medina said. “And, at the end of the day, a lot of the stuff we’re trying to do is to codify things to make them more repeatable and reliable.”
But, even when silos have allegedly fallen, and even when we are talking about all engineers, do devs and platform engineers really speak the same language?
“As a developer the way we build, test and deploy has gotten more complex,” Medina said, in her role play as a developer, lamenting her loss of autonomy in this time of public cloud, serverless workloads and Kubernetes.
“Unfortunately that means that, as a developer, if I want to have access to the things that I need when I want them, I’m at the mercy of other teams to bring things up for me. I’m at the mercy of the platform engineering team and I hate waiting for people to do things for me,” she said.
Indeed, a platform engineering team is never short on backlog items. But often they are stuck performing the operations role so much that they aren’t able to build those golden paths and automation.
“OK, as platform engineers, we have the keys to the so-called cloud kingdom, but, listen, it’s not all about you. It’s not all about DevEx. We also have to maintain reliable systems. And it’s too much work and we are super stressed. We are at the point where we are drowning in Jira tickets,” Villela replied, wearing the hat of a platform engineer. “It’s gotten to the point where I’m not enjoying my job. I want to work on cool stuff,” like automations and making systems more reliable.
This results in wait time that can be hours or even days, leaving devs super frustrated. “And I have to explain to management that I’m being blocked by another team and I’m constantly waiting,” Medina countered. “I want the stuff that I need infrastructure-wise ready to go when I need it. You’re literally tying my hands here.”
And, she continued, these stable and secure environments should look the same no matter where they run. “I don’t really want to be concerned with what the underlying infrastructure is running. I just need things to work.”
This friction-filled dynamic is simply unhealthy for organizations. Add to this new pressure from the business side to release even faster, with smaller teams, all while these bottlenecks persist.
Codify the Delivery of Things
“I guess we are codifying what we deliver to you but we aren’t codifying the delivery of things,” admitted Villela, as she continued to play the role of platform engineer.
By aiming for delivery as code, teams achieve infrastructure that is:
- On demand.
- Secure, with guardrails built in.
- More efficient both in terms of cloud cost and the environmental impact.
While some organizations do build from scratch, there is already a plethora of internal developer platform and portal tooling that focuses on self-service provisioning interfaces.
It doesn’t have to be either/or, Villela later mentioned. Rather, two or more of these tools can work together to drive internal developer experience, something she later demoed. Value especially comes, Medina added, when it’s all done in declarative code, which increases repeatability and reliability.
Villela then proceeded to go through a demo of a typical internal developer platform (IDP) offering, including:
- OpenTelemetry: generates, ingests and transforms telemetry data and sends it to an observability backend for analysis.
- OpenTelemetry Collector: ingests data from infrastructure and/or code, then transforms it using processors to do things like mask, batch and sample data. An exporter then sends the result to an observability backend.
- OpenTelemetry Operator for Kubernetes: manages and deploys the OpenTelemetry Collector on Kubernetes.
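The Collector's receive, process and export flow described above can be illustrated with a small, library-free Python sketch. This is not the OpenTelemetry API or Collector configuration. Every name here (`mask`, `sample`, `batch`, `run_pipeline`, the `user.email` attribute) is a hypothetical stand-in chosen only to show the shape of the pipeline.

```python
from typing import Callable, Iterable

Span = dict  # hypothetical stand-in for one telemetry record


def mask(spans: list[Span], keys: Iterable[str] = ("user.email",)) -> list[Span]:
    """Processor: redact sensitive attribute values without mutating the input."""
    hidden = set(keys)
    return [{k: ("***" if k in hidden else v) for k, v in s.items()} for s in spans]


def sample(spans: list[Span], keep_every: int = 2) -> list[Span]:
    """Processor: keep every Nth record (a simple deterministic sampler)."""
    return spans[::keep_every]


def batch(spans: list[Span], size: int = 2) -> list[list[Span]]:
    """Processor: group records so the exporter sends fewer, larger payloads."""
    return [spans[i:i + size] for i in range(0, len(spans), size)]


def run_pipeline(received: list[Span], export: Callable[[list[Span]], None]) -> int:
    """Receiver output -> processors -> exporter. Returns the number of batches sent."""
    batches = batch(sample(mask(received)))
    for b in batches:
        export(b)
    return len(batches)


if __name__ == "__main__":
    # Pretend a receiver handed us six spans; "export" them by printing.
    received = [
        {"name": f"GET /item/{i}", "user.email": f"dev{i}@example.com"}
        for i in range(6)
    ]
    run_pipeline(received, export=print)
```

In a real deployment these stages are declared in the Collector's configuration rather than written by hand. The sketch only mirrors the order of operations.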
As the role-play continued, Medina the developer remarked that none of what was listed was the sort of tooling that her team was looking for. That was because Villela wasn’t treating her as a customer. Instead of naming the products that go into a platform, it’s better to sell the many platform changes, using a Platform as a Product approach to show the benefits in the context of the internal developers’ pains. And keep it small and iterative, closing that developer feedback loop.
Kratix’s ‘Promise’ to Do Better
What app development teams most often want are tools to enable their speed and autonomy: not only to build, test and deploy with less friction, but to be able to understand what’s going on in their apps and then more easily debug.
Villela’s internal sales pitch, which included a visualization of the workflow, pivots at this point to talk about how her platform team is packaging all of the aforementioned cloud native tooling behind Kratix, an open source platform framework which, she explained, delivers capabilities via Kubernetes YAML files, each defining a “Promise.”
A Promise is an encapsulated piece of functionality that allows developers to request resources that leverage a certain capability. Kratix doesn’t just deliver the capability but bundles it as a Kubernetes native API-as-a-Service. Remember that the preferred way developers want to access an internal developer platform is via the extensibility of an API.
The platform team owns a control cluster where Kratix is installed, and then the different developer teams each interact with their specific infrastructure, in this case, an additional Kubernetes cluster that has been provided per team. The platform team has ownership over these worker clusters too, so Kratix can then install the capabilities for the dev teams.
In the live demo, the developer only needs to submit a simplified YAML document as an API request to receive back what they really want: an example Go application as a Service, already configured to view observability data in a Jaeger user interface. Under the covers, the KubeCon audience saw that the platform provides this simple interface by managing both a cert-manager Promise and an OTel Operator Promise on the platform cluster, which then installs their necessary parts onto the relevant worker clusters.
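To make the idea of a simplified request concrete, here is a minimal Python sketch of a short developer-facing document and what a platform might derive from it. It does not use Kratix or reproduce its schema. The API group `example.platform.io/v1`, the kind `GoApp`, the `tracing` field and the one-cluster-per-team naming are all assumptions made for illustration. Only the cert-manager and OpenTelemetry Operator dependencies come from the demo as described.

```python
import json


def build_request(team: str, app_name: str) -> dict:
    """What the developer writes: a handful of Kubernetes-style fields.

    The apiVersion, kind and spec fields are hypothetical, not Kratix's schema.
    """
    return {
        "apiVersion": "example.platform.io/v1",  # hypothetical group/version
        "kind": "GoApp",                          # hypothetical kind
        "metadata": {"name": app_name, "namespace": team},
        "spec": {"tracing": True},
    }


def expand(request: dict) -> dict:
    """What the platform might derive before the app can run.

    Assumes one worker cluster per team, named '<team>-cluster'.
    """
    tracing = request["spec"].get("tracing", False)
    capabilities = ["cert-manager", "opentelemetry-operator"] if tracing else []
    return {
        "workerCluster": f"{request['metadata']['namespace']}-cluster",
        "capabilities": capabilities,
        "app": request["metadata"]["name"],
    }


if __name__ == "__main__":
    request = build_request(team="payments", app_name="checkout")
    # Kubernetes accepts JSON as well as YAML, so the standard library is enough here.
    print(json.dumps(request, indent=2))
    print(json.dumps(expand(request), indent=2))
```

The point of the sketch is the asymmetry: the developer states intent in a few lines, and the platform owns the knowledge of which capabilities that intent requires.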
For some organizations, asking application developers to write Kubernetes YAMLs is itself an antipattern when thinking about developer experience. This is why building a user-friendly developer platform API on top of the Kubernetes API using Kratix starts to pay dividends. Just as with any API, a platform API enables consumers to use the right interface for them, whether that be an internal developer portal like Port or Backstage, a CLI or scripting language, or even a chatbot.
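One way to picture a single platform API serving several interfaces is a shared request builder with thin adapters in front of it. The Python sketch below is illustrative only. The payload shape, the `platform` CLI flags and the `deploy <app> for <team>` chat phrasing are all hypothetical and do not come from Kratix, Port or Backstage.

```python
import argparse


def make_request(team: str, app: str) -> dict:
    """The single payload every front end must produce (hypothetical shape)."""
    if not team or not app:
        raise ValueError("team and app are required")
    return {"kind": "GoApp", "team": team, "app": app}


def from_cli(argv: list[str]) -> dict:
    """CLI adapter: platform --team payments --app checkout"""
    parser = argparse.ArgumentParser(prog="platform")
    parser.add_argument("--team", required=True)
    parser.add_argument("--app", required=True)
    args = parser.parse_args(argv)
    return make_request(args.team, args.app)


def from_chat(message: str) -> dict:
    """Chatbot adapter: expects exactly 'deploy <app> for <team>'."""
    words = message.split()
    if len(words) != 4 or words[0] != "deploy" or words[2] != "for":
        raise ValueError("expected: deploy <app> for <team>")
    return make_request(team=words[3], app=words[1])


if __name__ == "__main__":
    # Both front ends end up submitting the same request to the platform API.
    print(from_cli(["--team", "payments", "--app", "checkout"]))
    print(from_chat("deploy checkout for payments"))
```

Because validation lives in `make_request`, adding another interface, such as a portal form, means writing one more adapter rather than another copy of the rules.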
Tips for Developing a Platform as a Product Organization
The pair offered some further takeaways for how to nurture a platform as a product mindset within an engineering organization:
- Don’t just rename your DevOps team to a platform team and then not change processes and communication.
- Remember, as Atlassian’s developer experience research reminds us, every organization has different needs and a different experience.
- Platform success comes down to two-way communication and collaboration across development and platform teams.
- Try to codify as much as you can.
“We can see that, in scenarios like this, teams — like SREs, developers, platform engineers — can all come together and work together and self-serve tooling to make things more pleasurable, more reliable, and having more collaboration involved,” Medina concluded. “And let’s be honest, the less we use Jira, the better it is for everyone.”