Two Times Integration Testing in Production Has Gone Wrong
Once upon a time, there was a developer who was simply running a test, but a little slip-up in the process inadvertently turned them into an internet meme.
When Portuguese supermarket chain Continente accidentally sent a test notification roughly translating as “Testing Mariana — Received?” to all users of its Continente app, it sparked an internet frenzy with huge brands getting in on the joke.
Honestly, we can sympathize with that awful sinking feeling in the pit of your stomach when you realize that your carefully orchestrated test has just gone hideously wrong. We’ve been there!
And it’s not just us and the developers at Continente. There are many very public testing fails out there. I wanted to share a few of my favorites:
HBO Max Integration Testing
In June 2021, HBO Max sent an integration test email to a portion of its subscriber mailing list, then quickly owned up to it in this viral tweet. What I love most about this testing fail is the huge number of senior engineers who replied, reassuring the unidentified intern that everybody breaks production and sharing some delightful mistakes of their own!
We mistakenly sent out an empty test email to a portion of our HBO Max mailing list this evening. We apologize for the inconvenience, and as the jokes pile in, yes, it was the intern. No, really. And we’re helping them through it. ❤️
— HBOMaxHelp (@HBOMaxHelp) June 18, 2021
My first FT gig as a FE was @Wayfair where my first deploy to production code was to create a shimmering animation for the Sale menu item. But there was a naming collision for the keyframe animation that I wasn’t aware of…
Recreated it below.
It happens :) pic.twitter.com/6L9hUJ8ae4
— Ali Rehmatullah (@Ali_Rehmatullah) June 18, 2021
It’s ok. I dropped a prod database when I was a senior engineer. These things happen more often than you might think. Building good systems is about having resilience against human mistakes. Because we, humans, always make mistakes.
— Jaana Dogan ヤナ ドガン (@rakyll) June 18, 2021
I once globally took down Spotify. It almost happened twice. My team was awesome about it and I’m still here. You managed to find something broken in the way integration tests are done. It’s a good thing and will help improve things. Good luck <3.
— Daenney (@daenney) June 18, 2021
Dear intern, I once set up a load test that accidentally sent 10,000 queries per second to https://t.co/mWSx1RtJMg.
Another time, a bug in my code caused the Google local business search index to drop 500K random locations.
I’m now an Engineering Director.
Own it, learn, & grow.
— Sameer Ajmani (@Sajma) June 19, 2021
You Mean You Don’t Want to Make Public Mistakes?
If you’d rather not see your tests accidentally go viral, which is something we should all aim for, what can you do to make integration testing a bit easier?
After experiencing our own spectacular testing failures when running Kubernetes in production, we created a framework that gives developers and testers a less error-prone testing workflow. Testkube is a Kubernetes-native testing framework that lets testers and engineers manage all the testing activities happening in their clusters.
Why not download the latest release from GitHub and give it a spin for yourself? If you’re interested in learning more, or just need some folks to commiserate with when your test goes comically wrong, join our Discord server, follow us on Twitter @Testkube_io, or email me directly at firstname.lastname@example.org. We’re looking forward to hearing from you!