Q&A: How Team Topologies Supports Platform Engineering
This year has kicked off with a lot of conversations about platform engineering. In every interview I had about this sociotechnical discipline, Team Topologies was mentioned.
This approach to software development and business that optimizes for flow was created by Matthew Skelton and Manuel Pais. First they co-authored a book, published in 2019; then they expanded it into an organization that runs an academy to teach the practice.
We talked to Pais about how to leverage Team Topologies and how it interacts with platform engineering. This interview has been edited for length and clarity.
The New Stack: Can you introduce yourself and tell us how Team Topologies came about?
Manuel Pais: I have had 20 years of experience in different roles, from starting with development, testing, being a team lead and then consulting, especially around DevOps and continuous delivery. I met Matthew [Skelton] in 2014, and we started working together in these consulting engagements with clients around the world, mostly starting from, “we want to adopt DevOps” or “we want to have something like continuous delivery.”
Often what we would find [were] problems in terms of the way teams interacted or didn’t interact, lack of clarity about the purpose of a team, teams [that] were overloaded. This was especially true of what we now call “platform teams,” which often had too many stakeholders, so they were spread very thin and not able to provide actual value to their customers.
As we were working with clients, we discovered different patterns, some good and some not-so-good. And we realized this is broader than DevOps. And that’s where “Team Topologies” came in.
So the book applies engineering principles to the way we organize and set up our teams and the way we set up interactions.
What is a “Team Topology”?
“Topology” means the ways different parts of a system are connected or arranged. So when we’re talking about Team Topologies, there is no single right topology or ideal operating model for an organization.
But it’s the fact that, if we have the tools, if we have the building blocks to understand how we can put them together, how can we arrange them in a way that fits our organization, that fits our goals and our challenges, then we’re probably in a much better place than, let’s say, just adopting some framework because that’s what we’ve been told or sold that is the right way to organize.
What’s important is to start with where we are today and figure out what are the gaps. Do we need to build some new capabilities in the organization? Do we need to have teams with more ownership of their products so they can go faster, they can experiment and they can make things better for customers much faster than today? If we need those things, what are the types of teams and interactions that we should have in place?
This might lead to: We need to make some changes. Maybe we need to create some new teams, like an enabling team or platform team or we need to just change responsibilities or bring someone else onto the existing team.
For me, Team Topologies is not a framework. It’s not a model. It’s just really, to me, a set of ways to think about the organization. And then some helpful patterns in terms of types of teams and interaction modes, and how do we evolve, how do we sense when we need to change the organization in a much more continuous way than we’ve seen in the past.
How do people get started? Does it kick off as a top-down activity or can it be done on a team-by-team basis?
Platform teams are not a new thing. I think what’s different about Team Topologies is we’re not talking about platforms in isolation. This is a network of teams that we need to get things done for the organization, for our customers.
Yes, you typically need some kind of platform. It might be very, very small, very thin. In a small company, you might just have a few people that help define what is this platform, [and] how do we do certain things in a more effective way? At larger organizations, this might be multiple teams – even multiple platforms – that are helping.
Why do we need them? Why do they exist? In Team Topologies, the first goal of a platform should be to reduce cognitive load on the stream-aligned teams that are developing customer-facing services or products.
I think that’s why you hear so much about Team Topologies within the context of platform engineering. We talk about the Platform as a Product because it needs to help their platform customers, who are the other teams, do their work more effectively, reduce their cognitive load [and] how much effort it takes to do certain things.
Whereas in the past, we just talked about platforms in isolation. We would be thinking more about the technological side: What should we build? What is happening in other organizations? So then we need to build the same thing or we need to use the same new technology. Yes, we need to take advantage of technology, but that shouldn’t be the starting point.
You called a software organization a network of teams like an “organism.” Is that because Team Topologies is constantly changing? How often should an organization review its Team Topologies?
What’s still common are big [reorganizations], where every two or three years some companies do a reorg, shifting people around and changing their roles. And then they’re surprised that people are not engaged or are burned out. It’s not an effective way.
Instead, we should be thinking about: What’s the smallest change I can make on an organizational side to get feedback and understand if this change is really helpful? I see so many times: “Let’s reorg to this new operating model.” And it’s based on assumptions that this is all going to be better. And many times it doesn’t get better — or not as much as we expected.
What we need is much smaller steps on a continuous basis. For example, we need help with data science or we need to have teams that have more data insights. Maybe we only have a centralized team of data scientists and they’re overloaded. It’s too slow. They’re a bottleneck. What can we do?
Maybe let’s start by having these data scientists do some kind of enabling work. Help the other teams learn the basics, so they can do some stuff by themselves. They won’t be experts from one day to another, but maybe the basic stuff they can do and not rely and depend on that centralized team anymore. That’s a very small change.
Ideally, you want a culture where the teams themselves are acting as sensors of, “We need to change something; this is not working. Maybe we have too many dependencies that are blocking us or we don’t have the skills that we need.”
We need to look at Conway’s Law. We need to look at Dunbar’s number for trust groups. Management needs to have a better understanding of the constraints that exist, that shape the way that teams are working together, and then provide a culture and autonomy for teams to also decide on the small changes that they need to make. At the end of the day, you need both a top-down and bottom-up approach to really succeed.
Tell us about the different types of teams, according to Team Topologies.
The stream-aligned teams are the teams that many would call cross-functional product teams. But today, for most products that I see, you cannot have just one team. It’s not enough. And so we need to identify value streams at multiple levels.
I was talking recently with a large pharmaceutical organization. At the top level, they really have two to four main value streams (drug development, diagnostics and something else), and then you have thousands and thousands of people working for those value streams. So you need to break it down: within this large value stream, what are the smaller streams of work that we have that might be aligned to different types of customers, different markets, different regions?
And then you go one level down. Within these different value streams, what are some of the products that we are building? Within the products, what are the different streams that might be different customer journeys or different sets of features that are fairly independent within the product, etc.?
So this kind of hierarchical value streaming within an organization can be quite helpful. Each stream-aligned team has end-to-end ownership of a stream of work, which might be one service that is part of a product, or it might be a set of features that is part of a product. But you want that team to have that ownership.
Relating to DevOps, these teams have build-and-run ownership. They also have ideation ownership. Ideally, they experiment, they get data from the users and they understand what we need to do next.
Then we talk about enabling teams, typically a small team of experts around some domain — like data science, security or user research.
Whatever the domain that requires expert knowledge, how can we bring this knowledge to the stream-aligned teams in an effective way, basically reducing the learning curve to get teams up to speed with what they need to do to make our application or service more secure? How do we get the data that we need and how do we analyze the data to get the insights that we need about our service?
So the enabling teams [are] going to do this with the stream-aligned teams by teaching, by mentoring, by helping them learn quickly.
And then we’re talking about the platform teams. It’s sort of an alias, right? In the sense [that] the platform team ends up being a stream-aligned team themselves, with end-to-end ownership, only that the service they are providing is internal for consumption of other teams in the organization, for internal customers.
What ends up happening in medium to large organizations is that the platform team is really a group of teams, where inside the platform you might have multiple stream-aligned teams, aligned to different services. Maybe one team is working on a monitoring service and other teams are working on some kind of [site reliability engineering]-related services and so on.
And inside the platform, you might also have enabling teams. At a startup or scale-up, by contrast, the platform might just be some sort of wiki page, where some people have gathered together some guidance on how to create some databases, how to do some stuff in an effective way for other teams to get started faster, something that reduces cognitive load.
[As you scale] you want each platform to, A, have sort of internal cohesion, [and] B, provide a consistent interface and way of being used by their customers internally.
For example, an infrastructure platform is different from a data services platform where maybe the data services consume the infrastructure platform, as well as provide data services to other teams in the organization.
You need to find the boundaries between different platforms, so that each platform has cohesion internally and its teams work together on how to provide this to customers in a consistent way that doesn’t confuse them. But you also need to decouple platforms: where these are effectively two different things, they should be separate, so they can evolve more independently.
And that’s also an anti-pattern I’ve seen very often, where the platform becomes a bucket for everything, and just becomes a huge mess with lack of ownership, lack of focus and a lot of waste, where teams are overloaded and working on a lot of stuff that’s not really a priority.
The tech industry is at a time of tech layoffs and hiring freezes. A lot of what you’ve described implies budgets for hiring and training. How would you adapt any of the Team Topologies to these leaner times of transition and burnout?
What’s missing is a team-first approach: considering teams as the smallest unit of delivery of value to customers. Meaning, we don’t focus on the individuals as much and we focus on the teams: what’s going to help teams be able to make better decisions, be more autonomous in their work and deliver value to customers faster?
This is often at odds with the way other parts of the organization work, like training budgets. We have this online academy with video-based training. We had a director of engineering who had a training budget per individual per quarter, maybe €100 per quarter. This director of engineering couldn’t purchase one of our courses because they didn’t have enough budget in the quarter.
I think this is a good example of what not to do. It’s very focused on the individual and it’s not supporting the idea that the team should decide. From a training perspective, in this case, you don’t need to increase the training budget, but instead could have a bucket where if we have 10 people in the team, then we have €1,000 and that’s something we can use on a quarterly basis.
That’s why we call it a team-first approach. It’s also about individuals on the team putting the team first and then kind of thinking [about] what’s helpful for the team and the organization.
Thinking about the team as a unit, the question of the layoffs is also interesting because, yes, we might have smaller teams because we have fewer people, unfortunately. But the team as a unit still remains and the team is better able to absorb the impact of layoffs.
But you don’t need to break up teams. You keep the team as a unit, even if there’s somewhat less capacity. Team Topologies remains valid in a layoff period or a downturn.
What you don’t want to do — and I’ve heard a lot of examples of companies — is let go of the people who have more expertise, possibly because the costs are higher. When you have these people in the organization, they’re very valuable.
What we say in Team Topologies is that you don’t want them to be bottlenecks. You don’t want to depend on the experts to get things done. You want the experts to enable others. And so you get the network effect of having experts who help other teams learn wherever we have gaps in the organization. Don’t let go of the people who are the experts; those are the ones who are going to be powerful if they embrace this idea of teaching and helping others.
The platform’s another area where organizations tend to start cutting budgets or even laying people off because they think, “Oh, this is expendable because it’s internal work on internal products.”
But you need to look at the big picture: If the platform is done right, in this kind of platform-as-a-product approach, then potentially you’re seeing gains in the teams that are developing customer-focused products that are much higher than what you’re going to gain by cutting some of the platform services and teams.
How does Team Topologies define teams?
We define “team” as a group of usually between five and nine individuals that have worked together [for some time] and identified how they like to work. They understand each other and have a common mission that is aligned to customer and organizational goals.
This is a long-lived or durable team. This is a unit that we keep together. It doesn’t mean that it’s static — of course, people are going to leave, some new people are going to join – but the team as a unit stays stable.
The work might change where maybe we need to stop investing in some product or some stream of work and we’re investing in some other stream of work. But we keep the team as the unit because that’s where there’s a lot of value. The team has gone through what some people know as the Tuckman model of team development, where team members take time to really know each other and trust each other, and then the team can reach higher levels of performance.
That’s different from what still happens [when] organizations create teams to deliver some project or some changes and then disband the team. Then you don’t have ownership of the services or capabilities that you build because you just move people around. And so that’s not very effective, because we know most of the cost of successful software is not in the development but in maintenance and support.
If you have a team that owns that service, that team can make sure it runs smoothly and that you integrate new approaches. You make sure it keeps providing value to customers.
Can that stagnate innovation? David Dame, Microsoft’s director of accessibility, said that a team that’s been together for 18 months or more has created such a common way of thinking that it decreases diversity and innovation.
Teams take around six months up to a year to actually gel, get to that point where we can call them a team that is improving and performing. And so after 18 months, that’s still quite early and you definitely don’t want to break up the team.
I quite like the work from Heidi Helfand, who wrote the book “Dynamic Reteaming,” talking about some of these patterns [and] how we change teams over time. It’s inevitable that teams are going to change. But if you break up the team, then you lose all the knowledge that the team has. We can rely on documentation about the decisions that the team has made, but nothing is going to replace that kind of storytelling inside the team.
There are different ways where you can bring innovation and new knowledge into the team. It’s not just bringing in new people. You can make sure people have the time, availability and budget to learn new skills, to do training or to be helped by other teams.
That’s very much something we talk about in Team Topologies: How do we build new capabilities into teams in a way that is effective? We are not saying just learn it on your own in your spare time. We should have mechanisms inside the organization to share knowledge. That’s where you have enabling teams and platform teams.
Regarding this idea of “team first” and the idea that the smallest unit of delivery of value is the team, does that mean, in the DevOps sense, that they are completely autonomous? And how does that fit in with platform engineering, that they should be able to build it and run it themselves?
The goal of the platform is to help these stream-aligned teams do their work more effectively. Some people talk about creating “golden paths” for these stream-aligned teams — meaning, they are the ones who own their service or product, and they are responsible for monitoring, deploying, etc.
But how can we help them reduce cognitive load of doing those things? The way we can do that is often with a platform, where the platform provides a very good monitoring service, good observability service, deployment service, continuous integration and continuous delivery pipelines – all these kinds of things can be provided as part of an infrastructure platform, for example.
What we need to be very focused on, as a platform team or a group of teams, is [whether] this is helping the stream-aligned teams do those things better and faster. Because there’s a real danger of the platform providing services that are not easy to use or not as reliable as they should be, or that are confusing or don’t support the use cases that the stream-aligned teams need.
And then you’re not helping them. You’re increasing their cognitive load, because now we have to use the service that is not a good fit for what we need to do, so it’s actually slowing us rather than helping us go faster.
Think about the Platform as a Product, understand your users [and] talk to them. Do fast iteration on what you’re providing in the platform, get feedback as quickly as possible from your users.
What [do] we need to prioritize? What is more valuable? Because you’re likely going to have thousands of requests when you’re working on a platform, because everyone has their own needs. So you need to have this product vision of where are we going. What is the broader value to the organization? Let’s not build things that only help one or two teams.
All of this is product thinking. Do a bit of user research, talk to your customers, see how they use your services. Provide the right level of documentation as self-service, so that people can use the platform and not depend on you as a platform team to answer their requests.
Why do you think there’s resistance to developing a team-first mindset? What happens if that resistance is outside the team?
Traditionally, there has always been this focus on the individual, some overrated ideas around this 10x engineer. We do need people with expertise and we do want people to get better. But I think this is over-reliance on some kind of hero culture and/or hustle culture.
All those things have contributed to organizations mostly focusing on individual performance. From the [human resources] point of view, they’re focused on the individuals, and, from a budget perspective, it’s still very much based on projects or deliveries, not on teams and allowing teams to deliver value.
A big part of the problem is trying to budget for capabilities or products or deliverables that we assume are going to be what our customers need. But, because things change so much so quickly around us, in three months, you will realize it was all wrong, but you have it funded for a whole year or even multiple years.