TNS

Nextdoor’s Plan for Catching New Release Troubles Early on

Nextdoor built an App Release Anomaly Detection tool that stops client-side regressions before they have a severe impact on the application.
Feb 3rd, 2023 1:02pm by

Hyperlocal social networking service Nextdoor uses old-school statistics to isolate problems early on with new mobile app updates.

Nextdoor’s Android and iOS apps have tens of millions of weekly active users. To keep up with regular weekly releases, Nextdoor built an App Release Anomaly Detection tool that stops client-side regressions before they have a severe impact on the application. A blog post by Walt Leung, a Nextdoor software engineer, and Shane Butler, a Nextdoor data scientist, describes the tool in detail.

Nextdoor uses a phased rollout strategy for its weekly mobile releases. Each new version initially reaches only a limited set of users, which keeps deployments safe and scalable. But because a very small and specific subset of the total user base uses the latest version first, traditional observability methods are less effective in the early phases. Nextdoor started using “difference-in-differences” analysis, which identifies declines in app sessions 10 days earlier than week-over-week comparisons did.

The Problem

Phased rollouts aren’t the problem; observing them is. Out-of-the-box methods fall short in early phases for two reasons: early data is very specific to the early users, and it is very small. The first users to adopt a new version tend to be more active than the median user, which skews the overall data sample in one direction.

“Small” refers to the sample size. Consider a hypothetical new version, v1.234.5, released on March 4th. If it introduced a regression in which an app session went uncounted 5% of the time, then at a 1% rollout the aggregate impact would be roughly 0.05% of all iOS app sessions, a number far too small to detect with aggregate-level observability. Factor in the early adopters’ higher activity level and perhaps 0.06% or 0.07% of all sessions are affected. Either way, the signal is too faint to see or analyze with confidence.

…until full rollout, when the same 5% regression becomes “business critical.”

The top trend line shows app sessions. The bottom trend line shows the release adoption over time due to phased rollouts.
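The arithmetic behind the 0.05% figure can be sketched as a back-of-the-envelope model in Python. This is not Nextdoor’s code: `aggregate_impact` and its `activity_multiplier` parameter are hypothetical, and the 1.2 and 1.4 multipliers are assumed values chosen only to show how more-active early adopters push the estimate toward 0.06% or 0.07%.

```python
def aggregate_impact(regression_rate: float, rollout_fraction: float,
                     activity_multiplier: float = 1.0) -> float:
    """Approximate share of ALL app sessions hit by a release-specific regression.

    regression_rate: fraction of the new version's sessions affected (e.g. 0.05).
    rollout_fraction: fraction of users on the new version (e.g. 0.01).
    activity_multiplier: hypothetical factor for how much more active early
        adopters are than the average user (1.0 means average activity).
    """
    return regression_rate * rollout_fraction * activity_multiplier


# 5% of sessions lost at a 1% rollout with average activity
print(f"{aggregate_impact(0.05, 0.01):.4%}")       # 0.0500%
# Assumed: early adopters are 20% or 40% more active than average
print(f"{aggregate_impact(0.05, 0.01, 1.2):.4%}")  # 0.0600%
print(f"{aggregate_impact(0.05, 0.01, 1.4):.4%}")  # 0.0700%
# At full rollout the same regression affects 5% of all sessions
print(f"{aggregate_impact(0.05, 1.0):.4%}")        # 5.0000%
```

The model ignores the small shift in total session volume that a more active cohort causes, which is negligible at a 1% rollout.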

What Nextdoor needs to learn from the phased rollout’s early adopters is the answer to this question: “What is the difference between their actual app sessions after adoption and the app sessions they would have had if they had never adopted the release?” In statistics, that hypothetical second quantity is an unobserved counterfactual, and difference-in-differences analysis is a way to estimate it.
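Formally, in potential-outcomes notation (our labels, not Nextdoor’s), the quantity in question is the average effect of the release on the users who adopted it:

```latex
\tau_{\text{adopters}}
  = \mathbb{E}\left[\, Y_i(1) - Y_i(0) \mid \text{user } i \text{ adopted the release} \,\right]
```

Here Y_i(1) is user i’s app sessions after adopting, which is observed, and Y_i(0) is that user’s app sessions had they not adopted, which is never observed for adopters. That missing term is the counterfactual that has to be estimated.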

The Solution — Applying Difference-in-Differences Analysis

Nextdoor can’t directly compare users who adopted the new version with those who didn’t, because the two groups’ underlying behaviors are too different. Instead, it looks at how each group’s metrics trend over time, turning absolute metrics into relative ones.

Nextdoor estimated this effect with difference-in-differences analysis, which accounts for the separate time-varying behavior of users who have and haven’t adopted a release. For v1.234.5, this meant calculating, for both early adopters and non-adopters, the change in app sessions between the three days before the release period and the three days after. App sessions changed by -0.02 among early adopters and by +0.20 among non-adopters.
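The before-and-after comparison described above can be sketched in Python, assuming hypothetical daily average app sessions per user for each group and a single release day shared by all adopters (in a real phased rollout, each user’s adoption date differs). The numbers are invented so the per-group changes reproduce the article’s -0.02 and +0.20, and the function names are illustrative, not Nextdoor’s.

```python
from statistics import mean


def pre_post_change(daily_sessions: list[float], release_day: int,
                    window: int = 3) -> float:
    """Mean over `window` days from release_day on, minus mean over `window` days before it."""
    if release_day < window or release_day + window > len(daily_sessions):
        raise ValueError("not enough days on one side of release_day")
    before = daily_sessions[release_day - window:release_day]
    after = daily_sessions[release_day:release_day + window]
    return mean(after) - mean(before)


def diff_in_diff(adopters: list[float], non_adopters: list[float],
                 release_day: int, window: int = 3) -> float:
    """Adopters' before/after change minus non-adopters' before/after change."""
    return (pre_post_change(adopters, release_day, window)
            - pre_post_change(non_adopters, release_day, window))


# Hypothetical daily average app sessions per user: 3 days before, 3 days after.
adopters = [5.10, 5.12, 5.08, 5.06, 5.10, 5.08]
non_adopters = [3.00, 3.02, 2.98, 3.20, 3.18, 3.22]
RELEASE_DAY = 3  # index of the first post-release day

print(f"{pre_post_change(adopters, RELEASE_DAY):+.2f}")             # -0.02
print(f"{pre_post_change(non_adopters, RELEASE_DAY):+.2f}")         # +0.20
print(f"{diff_in_diff(adopters, non_adopters, RELEASE_DAY):+.2f}")  # -0.22
```

Before the release day, the two made-up series move in parallel (the gap between them stays at 2.10); only afterward do they diverge.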

It’s critical to confirm that both groups behaved similarly before adoption. If their trends moved in parallel before adoption, it is reasonable to assume they would have kept doing so without the release, so any divergence afterward can be attributed to the release (the pre-trend assumption).

For v1.234.5, the trends did not stay parallel after the release: app sessions changed by -0.02 among adopters and by +0.20 among non-adopters. Subtracting the non-adopters’ change from the adopters’ change gives the difference-in-differences, which estimates the release’s effect against the unobserved counterfactual.

(-0.02) - (+0.20) = -0.22, the decline in app sessions attributable to iOS release v1.234.5
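The calculation above is only meaningful if the pre-trend assumption holds, so it is worth checking first. The sketch below reflects one simple reading of the standard deviation bound the authors mention, not their published method: if both groups move in parallel before adoption, the daily gap between them stays roughly constant, so the gap’s standard deviation across the pre-adoption days should be small. A large but constant gap is fine, since early adopters are expected to be more active overall. The `pre_trends_parallel` function, its `max_gap_stdev` threshold, and the sample series are all hypothetical.

```python
from statistics import stdev


def pre_trends_parallel(adopters: list[float], non_adopters: list[float],
                        max_gap_stdev: float) -> bool:
    """Return True if the adopter/non-adopter gap is roughly constant pre-adoption.

    Parallel trends imply a near-constant daily gap, so its standard deviation
    over the pre-adoption days should stay under max_gap_stdev (a hypothetical
    tolerance, not a published Nextdoor value).
    """
    if len(adopters) != len(non_adopters) or len(adopters) < 2:
        raise ValueError("need two equal-length series of at least 2 days")
    gaps = [a - n for a, n in zip(adopters, non_adopters)]
    return stdev(gaps) <= max_gap_stdev


# Parallel: adopters are more active, but both groups move together.
print(pre_trends_parallel([5.10, 5.12, 5.08], [3.00, 3.02, 2.98], 0.05))  # True
# Diverging: adopters are already trending up before the release.
print(pre_trends_parallel([5.10, 5.30, 5.50], [3.00, 3.00, 3.00], 0.05))  # False
```

With only a few days of aggregated data, a check like this is crude; user-level data and a large sample, as the article describes, make the comparison far more reliable.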

With a sample of hundreds of thousands of users, Nextdoor can establish with high confidence whether the two groups’ pre-trends were similar, using a standard deviation bound over the few days preceding adoption. If the pre-trend assumption holds, Nextdoor fits a linear regression model that estimates the average effect of a release on any particular metric:

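y = β0 + β1 * Time_Period + β2 * Treated + β3 * (Time_Period * Treated) + e

In this standard difference-in-differences specification, Time_Period marks observations from after the release, Treated marks users who adopted it, e is the error term, and β3, the coefficient on their interaction, is the estimated effect of the release.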
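To show how this regression recovers the difference-in-differences, here is a minimal ordinary-least-squares fit in plain Python with no third-party libraries. The observations are hypothetical app session counts, arranged so the group means reproduce the article’s -0.02 and +0.20 changes, and the `fit_ols` helper is illustrative, not Nextdoor’s implementation. With binary Time_Period and Treated indicators, the model has one coefficient per group-period cell, so β3 equals the difference-in-differences of the four cell means.

```python
def fit_ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Solve the normal equations (X^T X) beta = X^T y by Gauss-Jordan elimination."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X^T X | X^T y]
    for col in range(p):
        # Partial pivoting: bring the row with the largest entry in this column up.
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Hypothetical observations: (time_period, treated, app_sessions)
observations = [
    (0, 0, 2.95), (0, 0, 3.05),  # non-adopters, before release (mean 3.00)
    (1, 0, 3.15), (1, 0, 3.25),  # non-adopters, after release  (mean 3.20)
    (0, 1, 5.05), (0, 1, 5.15),  # adopters, before release     (mean 5.10)
    (1, 1, 5.03), (1, 1, 5.13),  # adopters, after release      (mean 5.08)
]

# Design matrix columns: intercept, Time_Period, Treated, Time_Period * Treated
X = [[1.0, t, d, t * d] for t, d, _ in observations]
y = [sessions for _, _, sessions in observations]

b0, b1, b2, b3 = fit_ols(X, y)
print(f"beta0={b0:.2f} beta1={b1:+.2f} beta2={b2:+.2f} beta3={b3:+.2f}")
# beta0=3.00 beta1=+0.20 beta2=+2.10 beta3=-0.22
```

The regression form is useful because the same model applies to any metric and extends naturally to standard errors for significance testing, which this sketch does not compute.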

Results

Nextdoor can now measure statistically significant negative effects across multiple app session metrics.

The image above shows the average percentage lift for each metric on which Nextdoor ran App Release Anomaly Detection for v1.234.5.

Thanks to difference-in-differences analysis, phase 1 of the rollout is now highly informative. Nextdoor can flag a decline in app sessions 10 days earlier than before, trace the decline to a specific release, and catch the regression while it affects less than 1% of users. Because non-adopters experience the same seasonal and day-of-week effects as adopters, the comparison cancels those effects out, so the engineering team no longer needs to account for them separately.

Nextdoor credits App Release Anomaly Detection as one of the foundational elements that let its engineering team iterate quickly and effectively by preventing “nearly all severe critical client-side regressions,” the Nextdoor engineers write. They also credit the tool for the “peace of mind it gives us to release bigger changes at a more rapid pace.”
