The Inspection Paradox: How to Avoid Bias and Find Customer Metrics That Matter
Stackery sponsored this post.
I was reading an amazing article yesterday on “The Inspection Paradox,” a statistical illusion that occurs frequently in daily life.
Here is an example that illustrates this paradox: When we ask the dean what the average college class size is, she says 35. When we ask students, their answer is 95. No one here is lying.
The explanation is easy without doing any math: Students are much more likely to be in a large class. A random sample of students is far more likely to pick someone sitting in a 500-student survey lecture than one of the kids in a 10-person seminar.
The inspection paradox observes that when we try to find an average, it matters if we’re asking the observers or participants.
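If you do want the math, here is a minimal sketch of the dean's view versus the students' view. The class sizes are hypothetical numbers I made up for illustration, so they won't reproduce the 35 and 95 quoted above, but the gap shows up the same way.

```python
from statistics import mean

# Hypothetical enrollment figures, one entry per class (not real data).
class_sizes = [10, 20, 30, 40, 200]


def dean_average(sizes):
    """Average over classes: every class counts exactly once."""
    return mean(sizes)


def student_average(sizes):
    """Average over students: a class of n students gets reported n times."""
    reports = [n for n in sizes for _ in range(n)]
    return mean(reports)


dean = dean_average(class_sizes)
students = student_average(class_sizes)
print(f"Dean's average class size:     {dean:.1f}")
print(f"Students' average class size:  {students:.1f}")
```

Both numbers are honest averages of the same five classes. The only difference is who gets a vote: each class once, or each student once.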
How the Inspection Paradox Relates to Development
I was thinking about this as I considered the situation of a web developer who’s trying to describe what constitutes “good” performance for their service.
Okay, before we dive into this, I want to say the web performance scenarios in this post aren’t real-life examples of the inspection paradox. They’re really more closely related to selection or survivorship bias. But it was a fun entry point to the concept.
As Allen Downey describes in his article on the paradox, the inspection paradox will give you an odd feeling on the racetrack: everyone you pass seems to be going much slower and everyone who passes you seems to be really zooming. This is reasonable because when someone is going very close to your speed, it’s very unlikely you’ll pass each other on a racetrack. If you’re moving at identical speeds you never will!
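The racetrack effect can be sketched in a few lines. This assumes a simple model in which your chance of passing, or being passed by, another runner is proportional to the difference between their speed and yours. The speeds are hypothetical, and `encounter_weights` is just a name I picked for this illustration.

```python
# Hypothetical runner speeds in km/h (illustrative, not measured data).
speeds = [8.0, 9.0, 10.0, 11.0, 12.0]
my_speed = 10.0


def encounter_weights(speeds, my_speed):
    """Share of encounters per runner, assuming the chance of an
    encounter is proportional to the speed gap (a modeling assumption)."""
    gaps = [abs(v - my_speed) for v in speeds]
    total = sum(gaps)
    if total == 0:
        # Everyone runs at exactly my speed, so nobody ever passes anybody.
        return [0.0 for _ in speeds]
    return [g / total for g in gaps]


weights = encounter_weights(speeds, my_speed)
for v, w in zip(speeds, weights):
    print(f"{v:4.1f} km/h: {w:.0%} of encounters")
```

Under that assumption, the runner matching your pace never shows up in your sample at all, while the fastest and slowest runners dominate it, even though every speed is equally common on the track.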
Our Distorted View of Average
Upon learning about the inspection paradox, I was immediately reminded of what developers experience when comparing the performance of their web apps. It’s easy to think of examples of apps that deliver incredibly short response times or web services that really crawl. But it’s much harder to think of examples of apps whose performance is very similar to our own.
This is also evident when we go looking for Medium blog posts about performance. There are way more articles on “how we beat speed-of-light limitations in our latest service” or “how we stopped taking two weeks to deliver an email” than on “how we render our app in 7.2 seconds, which is fine.”
Okay, I said it already but I’m saying it again: I know this is selection bias.
The problem with focusing on extremely poor or unrealistically great examples is that it encourages a kind of paralysis.
“We’ll never get the responsiveness of Netflix, so why bother tweaking our CDN?” a developer might reasonably ask.
It’s very easy to feel daunted by examples where other dev teams seem to deliver the impossible. However, here is a more positive — and ultimately, rational — way to look at the performance gap:
“Yeah, we can be slow during peak usage, but at least we’re not Thingiverse!”
(I love you, Thingiverse. I’m sorry.) That comparison is comforting, but just because you’re beating the industry’s worst performers doesn’t mean you have reason to be complacent.
Above all, a focus on outliers tends to take the focus off incremental changes. Small improvements and small downgrades both seem tiny compared to extreme cases. But here’s the thing: No service gets to top performance in a single step. To make great improvements you must build on small ones. If you don’t take the time to celebrate a 50 ms average performance boost, you’ll accept changes that make for small slowdowns, and eventually, those slowdowns will accumulate until you find yourself heading to the bottom of the pack.
Why These Metrics Are Hiding the Real Contest
Finally, all of these number comparisons can be a distraction. At the end of the day, the most important thing is to listen to your users.
User expectations around site performance are subtle, and minor user interface (UI) tweaks can vastly change the perception of performance.
A Real Example
Were you annoyed that I didn’t use real examples of The Inspection Paradox? Okay, here’s a real one:
Back when I managed enterprise services at the successful startup New Relic, I regularly ran into a paradox that I’d describe as the gap between two generalized questions:
“What’s our average page load time?” vs. “How often do we risk $100,000 in revenue through poor performance?”
The average user of any software tool is going to be a small dev shop or lone developer. Collectively, these users account for most of your usage, meaning most times when your tool gets used it is under minimal load and performs pretty well.
But with high-value customers, it’s likely they’re using your tool to its very limits (huge amounts of data, unusual environments, etc). This often means the more a customer matters to your business, the worse your product performs.
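To put rough numbers on the two questions, here is a sketch that averages the same load times twice: once weighted by page loads, and once weighted by revenue. Every figure is hypothetical, and the `Customer` fields and the `weighted_load_time` helper are names invented for this example rather than anything from a real monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    page_loads: int    # page loads in the measurement window
    avg_load_s: float  # mean page load time for this customer, in seconds
    revenue: float     # annual contract value, in dollars


# Hypothetical customer base: 50 small accounts and one enterprise account.
customers = [Customer(f"small-{i}", 200, 0.9, 500) for i in range(50)]
customers.append(Customer("enterprise", 1000, 6.0, 100_000))


def weighted_load_time(customers, weight):
    """Average load time, weighting each customer by weight(customer)."""
    total = sum(weight(c) for c in customers)
    return sum(weight(c) * c.avg_load_s for c in customers) / total


per_page_load = weighted_load_time(customers, lambda c: c.page_loads)
per_dollar = weighted_load_time(customers, lambda c: c.revenue)

print(f"Average load time per page load:      {per_page_load:.2f} s")
print(f"Average load time per revenue dollar: {per_dollar:.2f} s")
```

With these made-up numbers, the dashboard average looks healthy because small accounts generate most of the page loads, while the revenue-weighted figure is dominated by the one account that is having a bad time.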
Measure, as Best You Can, Satisfaction Not Speed
The inspection paradox means that averaging user performance can get you a different answer than taking the average user response. But more importantly, your users know what counts as acceptable performance. If you measure their happiness either directly or with proxies like session time, you can see where you’re really succeeding. And always celebrate small improvements — they’re the building blocks for bigger wins.
New Relic is a sponsor of The New Stack.