Spotify: Bigger the Codebase, the More Challenging the Migration

Here’s yet another article about platform migrations. But this one is different, scout’s honor (full disclosure, though: it does include everyone’s favorite topic, technical debt). The information comes from Spotify, an excellent source because its mobile codebase is constantly growing: by 29% in 2019, 49% in 2020, and 23% in 2021. More features, more experiences, more engineers. And the site keeps on working.
A year and a half ago, the platform team began a platform migration initiative with the Mobile Engineering Strategy program, working on multiple migrations to allow client features to be developed in an isolated environment, similar to backend microservices.
“Some of these platform products will inherently require migrations and, potentially in a large scale and we should consider them part of the development lifecycle, together with testing and design,” asserted a blog post written by Spotify Staff Engineer Mariana Ardoino and Spotify Engineering Manager Raul Herbster.
The post captured what managing a platform migration looks like rather than what the codebase looks like.
Spotify believes that the bigger the codebase, the more challenging the migration. And since the organization is only going to get larger, engineers will find that challenging migrations are the new normal for delivering efficient platform solutions at scale. Having gone through a few of these, the team at Spotify has some wisdom to share.
Define the Scope
The before picture looks unmanageable. Many changes need to be made across several use cases, and that’s before the technical debt is even considered. Adding to the pressure are the stakeholders/upper management…
Spotify recommends starting small and building out from a manageable piece. Define the goals and values of the product and use them to create the product brief. Once the what, why, and how are settled, it’s much easier to address everyone else and share the ideas, plans, and projections.
Start small when it’s time to start. Create a proof of concept, validate it with stakeholders/upper management, and take the migration through the alpha, beta, and GA stages of the product lifecycle. Add in the use cases one by one as they are discovered. Find early adopters who are eager to try your solutions, collaborate with them, and get feedback.
Scale Up
How do you solve a traffic jam? Large numbers of teams are affected by the changes, there is much work to be done (including manual changes), the work is happening slooooowly, and once again, management is confused and frustrated by the changes…
A traffic jam is solved by finding a way to get things moving again, and there are a few ways to do that here. One option, if there is a lot of work, is to consider automation. Is it worth moving to a more automated system? If many of the manual migration changes can be made with a script, then the answer may be yes.
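As a rough illustration of what "changed with a script" can mean, here is a minimal sketch of a rename-style migration script in Python. It is not Spotify's tooling: the identifiers `LegacyPlayer` and `PlatformPlayer`, the `.kt` file suffix, and both function names are hypothetical, chosen only to show the shape of such a script. Real migrations often need syntax-aware tools rather than regular expressions.

```python
import re
from pathlib import Path

# Hypothetical rename, for illustration only.
OLD_API = "LegacyPlayer"
NEW_API = "PlatformPlayer"

# \b limits matches to whole identifiers, so "LegacyPlayerView" is left alone.
_PATTERN = re.compile(rf"\b{re.escape(OLD_API)}\b")


def migrate_source(text: str) -> tuple[str, int]:
    """Return the rewritten text and the number of replacements made."""
    return _PATTERN.subn(NEW_API, text)


def migrate_tree(root: Path, suffix: str = ".kt", dry_run: bool = True) -> dict[str, int]:
    """Scan files under root and report replacements per file.

    With dry_run=True (the default) nothing is written to disk, so the
    report can be reviewed before any file is changed.
    """
    report: dict[str, int] = {}
    for path in sorted(root.rglob(f"*{suffix}")):
        original = path.read_text(encoding="utf-8")
        updated, count = migrate_source(original)
        if count:
            report[str(path)] = count
            if not dry_run:
                path.write_text(updated, encoding="utf-8")
    return report
```

Running `migrate_tree(Path("some/module"))` first as a dry run gives a per-file count of what would change, which is also a cheap way to estimate whether automating the rest is worth the effort.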
Another option is to increase effort. Spotify suggested making time for spike weeks and partnering with other engineering teams to jointly dedicate time to work on the migration.
Education and training are also a solid approach. Getting the tech onboarded and adopted early and correctly will keep things moving forward as well. Training on the new tech, best practices, and new concepts is important for both manual and automated migrations.
Define Priorities
This is definitely not just a platform engineering migration problem. An entire article could be written about the balancing act between working on the long-running to-do list (i.e., the technical debt) and the never-ending new tasks that keep getting added.
The solution Spotify offered also isn’t specific to platform engineering migrations and has a great general-purpose feel: evaluate on an ongoing basis. One suggestion was quarterly checkpoints to weigh the positives of the migration in its current form against the goals for what’s next and the ideal timeline. Manage risk; the migration was decided upon for a reason. Should goals get streamlined? Would hiring contractors change anything? Nothing is set in stone and everything is always in motion, including new project priorities, migration priorities, and goals.
Be Accountable
Large, ongoing infrastructure changes might lose the interest of some teams, which may cause issues with education and adoption. The long-term effect could then be a longer migration, which is the goal of no one.
Infrastructure changes aren’t necessarily the most exciting part of coding although platform engineering is getting its moment in the sun these days. Keeping others engaged in the migration is similar to keeping people engaged in anything else. Include and inform them.
Working together and informing others starts with a solid roadmap for the platform engineering team. Once the internal goals and tasks are organized and clear, it’s easier to present a united front to others. Present progress with graphs and models; dashboards are a great tool as well. Share clear information about when the migration is happening and when it isn’t. Include specific start and end dates when possible.
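To make the dashboard idea concrete, here is a small sketch of the kind of summary a migration dashboard might show. Everything in it is assumed for illustration: the `TeamStatus` fields, the team names, and the idea of counting "modules" as the unit of migration are not taken from Spotify's post.

```python
from dataclasses import dataclass


@dataclass
class TeamStatus:
    team: str            # hypothetical team name
    total_modules: int   # modules the team owns that need migrating
    migrated: int        # modules already moved to the new platform


def percent_done(status: TeamStatus) -> float:
    """Share of a team's modules already migrated, as a percentage."""
    if status.total_modules == 0:
        return 100.0  # nothing to migrate counts as done
    return round(100 * status.migrated / status.total_modules, 1)


def render_report(statuses: list[TeamStatus], width: int = 20) -> str:
    """Text progress bars, least-complete team first, plus an overall line."""
    lines = []
    for s in sorted(statuses, key=percent_done):
        pct = percent_done(s)
        filled = int(width * pct / 100)
        bar = "#" * filled + "." * (width - filled)
        lines.append(f"{s.team:<10} [{bar}] {pct:5.1f}%")
    total = sum(s.total_modules for s in statuses)
    done = sum(s.migrated for s in statuses)
    overall = percent_done(TeamStatus("all", total, done))
    lines.append(f"{'overall':<10} {overall:5.1f}%")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_report([
        TeamStatus("search", 10, 5),
        TeamStatus("player", 4, 4),
    ]))
```

Sorting the least-complete teams to the top is a deliberate choice here: it puts the teams that may need help or attention where readers see them first.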
Conclusion
Large, complicated migrations are here to stay. Certain changes may be impossible without them. This is the reality of the world in which the tech community either lives or will live. New technologies and an ever-advancing ecosystem will certainly help to drive these migrations. They take a long time, a lot of patience, and oversight that involves the management of people, expectations, and understandings in addition to code.