How Spotify Adopted and Outsourced Its Platform Mindset
“The Platform team creates the technology that enables Spotify to learn quickly and scale easily, enabling rapid growth in our users and our business around the globe. Spanning many disciplines, we work to make the business work, creating the frameworks, capabilities and tools needed to welcome a billion customers. Join us and help to amplify productivity, quality and innovation across Spotify.”
This Spotify job ad for a role on its Platform Mission team is one of the best definitions of platform engineering, nay, platform enablement we’ve found. That’s not surprising, as the audio streaming service is known for its unique culture of collaboration, transparency and simplicity. Spotify also contributed Backstage, an open platform for building developer portals, to the Cloud Native Computing Foundation as an incubating project.
The New Stack sat down with Spotify colleagues Marcin Floryan and Helen Greul to learn more about the company’s unique ways of working alongside technology to enable developer productivity for its 6,000 engineers.
Effectively Networking 6,000 Engineers
Spotify’s Platform Mission team — under a few different names, structures and iterations — kicked off about 10 years ago, as a way to connect platform builders with platform consumers, or to connect colleague with colleague. It has since grown to about 600 developers, or about a 10th of Spotify’s technical staff.
With tens of thousands of microservices, Greul told The New Stack that the Platform Mission team doesn’t own all of Spotify’s services, but it facilitates the internal marketplace of platforms.
Greul, as senior engineering manager, works in the developer productivity department that is responsible for Backstage, the single pane of glass that she calls the “all-encompassing developer go-to portal,” including data, documentation and tooling. The department doesn’t own it, per se, but rather curates it. The Platform Mission team works to facilitate the discovery of those capabilities and the relationships needed for developer productivity.
Platform Mission achieves this because, as Greul continued, it is a “really beautiful cross-functional team” that includes not only technical roles, but marketing on the platform level to promote internal products, including the MegaCAB, the annual customer advisory board workshop that brings together its internal customers to highlight the biggest opportunities and gaps the team can fill.
“Even though we’re part of the Spotify organization, we’re trying to act as if we were like a separate company providing tools to all of the developers internally just to keep that mindset,” Greul said. “And, yeah, not become one of those teams that keeps their audience captive in a way so they have to use the internal tooling because there is no other choice.”
The Cultural Side of Platform Engineering
Platform engineering is a sociotechnical discipline. Yet, for decades, organizations have focused only on the technical side of platforms, which leaves most engineers across most organizations asking, “Who does what?” This can be harder to answer in an organization that embraces flux.
Part of the Spotify Platform Mission is to help colleagues navigate the organization. A lot of the job of management on the Platform Mission team is communicating who does what across the organization. However, Floryan said, an expectation of everyone’s job is creating a cross-organizational network, something Spotify facilitates through Intro Days and bootcamp teams when new employees start.
Pre-pandemic, Spotify was an intentionally co-located company. But as it has moved toward work-from-anywhere and hybrid, the organization has created a wiki and slide decks that outline the different parts of the company. From there, a culture of open questioning in Slack helps fill in gaps.
First Step: Ask about the Status Quo
Floryan, director of engineering at Spotify, has described his job over the last couple of years as that of either a tech Ops lead or an enterprise Agile coach for the Platform Mission team. A lot of his role focuses on the often-forgotten processes and ways of working that must drive platform teams.
Platform engineering should start with identifying where you are today and your company’s gaps, according to Manuel Pais, co-founder of Team Topologies, an organization built around an engineering management structure.
At Spotify, this organizational data includes GitHub activity, builds and deployments, system health and employee satisfaction surveys. There’s also ample data on human resources, like structure level, promotions and tenure. A large source of data Floryan pointed to was focused on uncovering networks and dependencies, understanding who works on which teams on which parts of the product. The key questions, he said, are:
- How heavy is each product team’s workload?
- Are they able to make progress?
- Are there teams that become the bottlenecks?
“The ultimate goal for this role was to understand the organizational ways of working, primarily through the lens of technology and see what works well and we can scale up, what can work better and we can improve, and how the hell do you actually understand what happens in an organization of thousands of people when there’s no easy way to just talk to everyone and get a feel for what’s happening in the org,” Floryan said. He looks for ways to leverage existing sources of organizational data to build that picture.
Like all good user experience research, his team has also spent a lot of time asking questions of their internal customers, including:
- What kind of tools do we give to our developers?
- How do we kickstart their work?
- How do we automate most of their tasks?
- How do we give them the right information to make the right decisions?
Also, Floryan added, “What tools do we build to make it basically as easy and seamless for them to deliver value to the company rather than getting bogged down in the technical details of spinning out instances in the cloud?”
Constant Change at Spotify
You might assume that Spotify will just organize around the Spotify Model, but Floryan says his role aims to be much more inquisitive. He investigates whether the company’s engineers have the right team structures and timely, relevant information so that they can align with product needs.
“Do they have the right information at the right time to make the decisions on how to plan, to how do they structure their delivery?” he asked. “Are they trying to do too many things at once or fewer things in a sequence? If they do the sequence, how do they sequence the things, how to sequence the things to avoid? You can think of the critical path analysis for example: How do we avoid blocking?”
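The critical path analysis Floryan mentions can be sketched in a few lines. This is a minimal illustration, not anything Spotify has described using: the task names, the durations in days and the `critical_path` function are all hypothetical. It finds the longest chain of dependent work, which is the chain where any delay blocks everything after it.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, depends_on):
    """Return (total_days, tasks) for the longest chain of dependent work.

    durations:  task name -> estimated days (hypothetical estimates)
    depends_on: task name -> set of tasks that must finish first
    """
    finish = {}    # earliest possible finish day for each task
    previous = {}  # the prerequisite that determines when each task can start
    for task in TopologicalSorter(depends_on).static_order():
        start, previous[task] = 0, None
        for dep in depends_on.get(task, ()):
            if finish[dep] > start:
                start, previous[task] = finish[dep], dep
        finish[task] = start + durations[task]

    # Walk back from the task that finishes last to recover the chain.
    task = max(finish, key=finish.get)
    total, path = finish[task], []
    while task is not None:
        path.append(task)
        task = previous[task]
    return total, path[::-1]


# Hypothetical plan: two teams build in parallel after a shared design step.
durations = {"design": 3, "api": 5, "client": 4, "launch": 1}
depends_on = {"api": {"design"}, "client": {"design"}, "launch": {"api", "client"}}
print(critical_path(durations, depends_on))  # (9, ['design', 'api', 'launch'])
```

In this made-up plan the API work sits on the critical path, so a delay there delays the launch, while the client work has a day of slack.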
With this information, he said, it’s time to then understand if the organization has the skills needed in the right spaces. “Are our experts working on the most impactful pieces of work across the company?”
Finally, to make software delivery teams more effective, Floryan’s work gauges how effective team collaboration is. “And then do we avoid collaboration? Because that always has cost and overhead.” This brings up inner-sourcing — like open sourcing, but within the organization, in which anyone can submit a solution to questions like, “How extendible are code components? How self-service is the platform?”
Whenever you have a team that owns a lot of components, that means its engineers are getting a lot of requests from all over the organization — often conflicting requests. This creates a bottleneck. “That’s a signal to invest in that team, [by] increasing headcount,” Floryan said, or it could even mean splitting that team into two or three teams.
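One rough way to spot that signal in dependency data is to count inbound requests per owning team and compare the count with team size. The sketch below assumes a simple list of (requesting team, owning team) pairs and an arbitrary threshold. The team names, the data shape and the `find_bottlenecks` helper are illustrative assumptions, not Spotify's tooling.

```python
from collections import Counter


def find_bottlenecks(requests, team_sizes, max_requests_per_engineer=1.5):
    """Flag owning teams whose inbound requests per engineer exceed a threshold.

    requests:   list of (requesting_team, owning_team) pairs (hypothetical data)
    team_sizes: owning team -> number of engineers
    """
    # Requests a team makes of itself are not cross-team load, so skip them.
    inbound = Counter(owner for requester, owner in requests if requester != owner)
    flagged = {}
    for team, count in inbound.items():
        load = count / team_sizes[team]
        if load > max_requests_per_engineer:
            flagged[team] = round(load, 2)
    # Heaviest load first, so the strongest investment signal is at the top.
    return dict(sorted(flagged.items(), key=lambda item: item[1], reverse=True))


requests = [
    ("search", "auth"), ("playlists", "auth"), ("ads", "auth"),
    ("podcasts", "auth"), ("ads", "billing"), ("auth", "auth"),
]
team_sizes = {"auth": 2, "billing": 4}
print(find_bottlenecks(requests, team_sizes))  # {'auth': 2.0}
```

A flagged team is only a prompt for a conversation. Whether the answer is more headcount, a team split or a more self-service platform still takes human judgment.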
This is as much Conway’s Law — designing systems that mirror an organization’s communications structure — as it is about very flexible organizational structures and processes. “A lot of this is just creating that expectation of some level of fluidity of an organization,” Floryan said. “And the fact that the structures aren’t too rigid, so these changes are possible to make.”
In fact, Spotify’s third stated value is: “Change is our constant.” So that’s a built-in expectation of a Spotify engineer’s career. Floryan called it organized chaos, where everyone is “having to navigate uncertainty and ambiguity,” helping everyone find the right solution when things change, to adjust the plans and/or streamline some of the processes.
Of course, change came rather abruptly last month when Spotify laid off 6% of its staff. While that’s unfortunate, Floryan said, he believes Spotify teams are better equipped than most to deal with the change.
What Successful Platform Enablement Looks Like
While Backstage is the most public accomplishment, some of the internal platform accomplishments are equally notable. The Platform Mission team recently announced a measurable project with FleetShift, Greul said, “a declarative way to manage your infrastructure and to make any improvements and upgrades — like [making] routine mundane upgrades automatic — as long as you keep your services up to a certain standard.”
At a high level, she added, FleetShift lets any engineer or team make a large refactoring, code or library change, leveraging GitHub bots to execute the changes.
“The big disclaimer is that the services need to be compliant with a certain standard. There has to be a certain level of testing, a certain level of [what] we call a golden standard. If your infrastructure is not on par with the latest and greatest,” she emphasized, then the changes cannot proceed.
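The gating idea Greul describes can be illustrated with a small sketch. FleetShift's actual checks and interfaces are not described here, so everything below is an assumption: the `Service` fields, the 80% coverage threshold and the function names are hypothetical stand-ins for "a certain level of testing" and being "on par" with current infrastructure.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    test_coverage: float      # fraction from 0 to 1 (hypothetical signal)
    on_supported_stack: bool  # hypothetical stand-in for up-to-date infrastructure


def eligible_for_automated_change(service, min_coverage=0.8):
    """A service qualifies only if it meets every part of the assumed standard."""
    return service.on_supported_stack and service.test_coverage >= min_coverage


def plan_rollout(services, min_coverage=0.8):
    """Split a fleet into services that get the automated change and those skipped."""
    apply_to, skipped = [], []
    for service in services:
        if eligible_for_automated_change(service, min_coverage):
            apply_to.append(service.name)
        else:
            skipped.append(service.name)
    return apply_to, skipped


fleet = [
    Service("playlist-api", test_coverage=0.91, on_supported_stack=True),
    Service("legacy-importer", test_coverage=0.95, on_supported_stack=False),
    Service("lyrics-sync", test_coverage=0.42, on_supported_stack=True),
]
print(plan_rollout(fleet))  # (['playlist-api'], ['legacy-importer', 'lyrics-sync'])
```

The point of the gate is that automation is safe only where tests and a current stack can catch a bad change, which is why non-compliant services are skipped rather than forced through.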
How do teams know if they are compliant with standards? The Platform Mission team has worked hard to gamify it. The open sourced Tech Radar plug-in within Backstage rates languages, frameworks, processes and infrastructure, placing each in one of four categories:
- Use: The approved golden pathways for the majority of teams with the specified use cases.
- Trial: Under evaluation for specific use cases, having already exhibited clear benefits.
- Assess: Being experimented with, so using it in production comes at a high cost and risk.
- Hold: No longer invested in, so teams should avoid it going forward and follow the recommended migration path to a supported alternative.
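To make the four categories concrete, here is a minimal sketch of how a team might check its stack against such a radar. It does not use the Backstage plug-in's API. The `Ring` enum, the placeholder technology names and the helper functions are assumptions made for illustration.

```python
from enum import Enum


class Ring(Enum):
    USE = "use"
    TRIAL = "trial"
    ASSESS = "assess"
    HOLD = "hold"


# Hypothetical radar entries; the names are placeholders, not real ratings.
RADAR = {
    "language-a": Ring.USE,
    "framework-b": Ring.TRIAL,
    "database-c": Ring.ASSESS,
    "queue-d": Ring.HOLD,
}


def review_stack(stack, radar):
    """Group a team's technologies by ring and list anything the radar omits."""
    report = {ring: [] for ring in Ring}
    unrated = []
    for tech in stack:
        ring = radar.get(tech)
        if ring is None:
            unrated.append(tech)
        else:
            report[ring].append(tech)
    return report, unrated


def on_golden_path(stack, radar):
    """True only when every technology in the stack is in the Use ring."""
    return all(radar.get(tech) is Ring.USE for tech in stack)


report, unrated = review_stack(["language-a", "queue-d", "tool-e"], RADAR)
print(report[Ring.HOLD], unrated)              # ['queue-d'] ['tool-e']
print(on_golden_path(["language-a"], RADAR))   # True
```

Anything in Hold, or missing from the radar altogether, is the kind of item a team would need to migrate or justify before claiming the golden standard.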
If your team achieves the golden standard, then it receives a coveted public badge as well as hoodies, stickers and, when the company was co-located, special cakes.
The rewards reflect the platform mindset that Spotify has embraced from the start, Greul said, rather than an attempt to enforce standards.
Everything is about standardizing, templatizing and automating as much of the mundane boilerplate as possible, so teams can focus on their niches of expertise.
One of the most common measurements of platform engineering is time to onboard. When Greul moved to the 120-person developer productivity unit about two years ago, it took, on average, 110 days from a developer’s start date until they felt a sense of productivity, marked by their 10th pull request. Now, after intentionally reducing competing standards, it’s merely 20 days.
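The metric itself is simple to compute once start dates and pull request dates are available. The sketch below assumes each developer is represented as a start date plus a list of pull request dates. The data shape, the function names and the sample dates are hypothetical, and developers who have not yet reached their 10th pull request are left out of the average.

```python
from datetime import date, timedelta
from statistics import mean


def days_to_nth_pr(start_date, pr_dates, n=10):
    """Days from a developer's start date to their nth pull request, or None."""
    if len(pr_dates) < n:
        return None  # not yet onboarded by this measure
    return (sorted(pr_dates)[n - 1] - start_date).days


def average_onboarding_days(developers, n=10):
    """Average days to the nth pull request across developers who reached it."""
    values = [
        days
        for days in (days_to_nth_pr(start, prs, n) for start, prs in developers)
        if days is not None
    ]
    return mean(values) if values else None


# Hypothetical sample: one PR every 2 days, one every 3 days, and a newcomer.
start = date(2023, 1, 2)
developers = [
    (start, [start + timedelta(days=2 * i) for i in range(1, 11)]),
    (start, [start + timedelta(days=3 * i) for i in range(1, 11)]),
    (start, [start + timedelta(days=5), start + timedelta(days=9)]),
]
print(average_onboarding_days(developers))  # 25
```

Excluding developers who have not reached the milestone keeps the average honest, but it also means the number lags. A very recent cohort only shows up once its members hit their 10th pull request.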
“We’re trying to make it so it’s easy to do the right thing,” she said. “Or it’s fun to do the right thing.”