
How To Succeed at Failure with Microservices

8 Aug 2016 7:52am

The willingness to fail fast in the open is one of the tenets of today’s new distributed software paradigm. Moving from a monolith to a microservices architecture — often a key task in building a globally distributed application architecture — is challenging work, and brings inevitable misfires.

Failure at microservices is still rarely discussed. Even in The New Stack’s pages, there are only 25 search results for “failure” and “microservices architecture.” In many cases, this is because businesses want to keep their failures internal, reluctant to speak too publicly about how they grappled with the thornier side of architectural and infrastructure reorientation.

A deep exploration of failure at microservices can help enterprises succeed. After all, failure is feedback. Failure is a temporary state. Leadership author John C. Maxwell, in his book “Failing Forward,” encourages a new definition of failure, seeing it as “the price we pay to achieve success.”

Speaking at the recent CA Technologies API360 Summit in New York, Ronnie Mitra, co-author of “Microservice Architecture: Aligning Principles, Practices, and Culture,” said missteps in reorienting towards a microservices architecture are to be expected. After all, “when you make services small, the system around them becomes more complex.”

Mitra says that the essence of microservices is “speed and safety at scale and in harmony.” He points to three areas that become increasingly complex at scale:

  • when demand increases (there are now lots of users of your app),
  • with distance (code for the app is geographically dispersed across cloud infrastructure), and
  • amongst organizations (as the business grows, what worked for a 10-person company may no longer work when there are thousands of staff).

Here, Mitra is hinting at one of the most forgotten aspects of architectural reorientation: it is a three-pronged approach, with team organization as crucial as software architecture and cloud server infrastructure decisions.

Vijay Alagarasan, principal architect at Asurion, a global white-label insurance provider for mobile and telco supporting over 290 million customers, says moving to a microservices architecture was key to the company’s ability to reduce costs, enhance agility and provide a high-quality customer experience.

However, he says it is easy to fall into failure with microservices along the way, having identified seven anti-patterns that need to be avoided when mapping a new microservices architecture:

[Image: Alagarasan’s seven microservices anti-patterns]

What Alagarasan did not touch on, though, was how Asurion organized itself to implement these design patterns and avoid the anti-patterns. Cassandra Shum, lead developer/consultant at ThoughtWorks New York, said that essential to microservices success is how teams are organized: Microservices can work “when organizations have flexibility and buy-in,” she said. Shum argues that when an organization is hierarchical and can’t create feature-driven teams, then microservices won’t succeed.

Failure In Focus: Why Technological Revolution Is Not The Starting Point

Holger Reinhardt is the chief technology officer at Germany’s Haufe-Lexware Group, a LexisNexis-type enterprise in Europe that has reinvented itself from being a traditional publisher to a digital services provider. He bravely goes into great detail on what failure looks like, and the hard lessons he has learned as a new CTO who originally joined the company keen to rebuild its tech stack as a microservices architecture.

As with many traditional enterprises, Haufe-Lexware had its fair share of technical debt when Reinhardt started. There was one service platform, with one deployed artifact, so that it took 5 to 10 days to deploy any changes to production. It often took months to test anything. The code base was on just one instance of hosted hardware. Ideal for reorientation to microservices, right?


Nine months into a reorientation, Reinhardt had to cancel the project and admit it was a “complete failure.” Separating user management had led to zero test coverage. Scope creep meant the project team was often overwhelmed by all the new problems they had identified and wanted to fix. Agile had become fragile. Reinhardt warns others who are embarking on a similar challenge: “If you get to the point where you hear that the solution is ‘we just need more developers,’ then yellow flags should go up.”

Reinhardt says the outcome reminded him of the quote often attributed to Peter Drucker: “Culture eats strategy for breakfast.”

“Execution of strategy is a function of culture, technology and organizational structure,” Reinhardt said. “I sense that in any team, you need to have culture, technology and structure in alignment to succeed. And of the three, culture always wins and is going to snap any attempt at change back into the existing model.”

He gives an example that highlights how tech is not unbiased. “If you have a Java team that programs in Enterprise Java, the focus is on safety, stability and predictability. But if you need to get time-to-market speed up and you introduce JavaScript, you will have the programmers using JavaScript like Java, because the cultural model of that team is to focus on safety.”

After putting the nine-month project in the bin, Reinhardt’s job at hand became even more pressing: not so much to solve the migration problem, but to prevent a downward spiral of morale amongst the team charged with the task in the first place. “I needed to provide an external structure to get the team to believe in themselves again.”

Reinhardt said there were two key ideas he came back to after seeing the impact of the failure on his team, and that influenced him in plotting a new way forward. One was Martin Fowler’s metaphor “You have to be at least this tall to use microservices.” The second was the lean transformation model of Toyota: “In lean transformation, you must first stabilize, then optimize, then transform.”

“We tried to transform before stabilizing,” Reinhardt admitted.

Lessons from Failure: Three Tools to Prepare Teams for Microservices Orientation

Reinhardt shares three ways he steered his team towards a new success: the same team that had failed at the reorientation went on to introduce structural changes that have since brought their monolithic codebase deployments down from between 5 and 10 days to 30 minutes.

1. Communication: Reinhardt believes that in an enterprise, communication often only occurs within the middle management level. He shares a story from the moon program, whether it is real or not: when a reporter asked the person sweeping the launchpad what they were doing, the sweeper replied, “I’m helping send a man to the moon.” Reinhardt believes open and constant communication can help everyone in the team to see how their work aligns with the overall strategy. “We introduced Slack and Rocket into the organization. It democratizes communication: everyone on Rocket can just ping me. I am using that like an internal social media strategy where I continually feed points of reference and information into the organization.”

Reinhardt fostered a developer culture amongst his internal teams, leveraging IRC, a developer blog and meetups, and keeping an open-door policy to help build momentum.

2. Accept the cultural constraints you have: Given the organizational structure already in place, and the technology decisions that had been made over many years, moving to an agile approach ended up being too large a cultural shift for such a large project as microservices reorientation.

Reinhardt recalls the first two stages of lean transformation: first stabilize and then optimize. For the team that had failed, he chose what he calls a ‘lighthouse’ project, and he set a challenging timeline: three months. “If it was six months, the external perception of that team failing would have sunk in, so we challenged the team through shock and awe to get them back on track.”

Instead of a purely agile approach, a critical chain project management approach was put in place: “It is in between a strict waterfall approach and agile: you give an external expectation of time, and as long as the end date is being met, the team is fine.” Reinhardt says that providing this sort of external structure helped the team believe in themselves again.


“Agile works great for very mature teams,” Reinhardt said. “But you go to war with the army you have, and in a lot of teams you do not have that maturity. They have the notion of agile, but they don’t live agile. Teams started to get extremely unsure about themselves, and then you start to get estimates flowing back to you with lots of buffers built in because teams didn’t want to commit to something that they couldn’t achieve and so they swing to being really conservative.”

The answer came in imposing some top-down structure on those teams, he said.

“We had a hunch with the right resources the team could do the new project in three months. So we suspended agile for a little bit and put in almost a waterfall-ish approach to say they have to do these things in time,” Reinhardt said. “In the background, we had the confidence and had some critical people in place to support the team, if they needed it. So it stretched the team but also made sure the team didn’t lose faith in themselves. Then the magic happens, when the team is able to meet that stretch goal, that erases the stain of failure. They suddenly went from being the beaten dog to the top dog and were able to show what they did in three months.”

At this stage, Reinhardt says, echoing what Shum from ThoughtWorks had urged about removing hierarchical organizational structures, leaders need to step back: “As the team regained confidence, you need to start taking your hands off from the steering wheel and let the team drive again. The intervention isn’t a permanent fix.”

3. Consider microservices as a philosophy rather than an architecture: Reinhardt says that, on reflection, he should have encouraged a microservices mindset rather than tried to immediately implement a microservices architecture.

“Microservices architecture is the biggest misnomer since global warming,” Reinhardt said. “‘Global warming’ rolls off the tongue so much better than ‘climate change.’ The drawback is that every time you have a cold winter’s day, people say global warming doesn’t exist, whereas climate change just says that the frequency of weather events is more extreme. It’s the same with microservices: people by instinct immediately focus on the micro part.

“But microservices architecture is an architectural approach that takes into consideration the way we work and the way we organize. What I find so fascinating with this is that it is not just technology, it is cultural factors like DevOps and organizational factors like Conway’s Law. So in order to achieve something, you need those three in alignment. Depending on what product you want to build, you need to build the teams accordingly.”


Conway’s Law — introduced in 1968 by programmer Mel Conway — models much of the discussion here on how to learn from failure and implement microservices architecture: “organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.”

Speaking at API360, Conway reminded the audience, “every organization choice rules out a design choice. So how are you going to build an optimal system? With a flexible organization, experimentally.”

Now, almost fifty years later, Conway’s axiom in the era of microservices — as Reinhardt has proven — is truer than ever.

Feature image: Taken in Brazil by Edu Lauton. Licensed under CC0 1.0
