The New Stack @ Scale Podcast Show 10: Platform Versus the Chaos Monkeys

23 Jun 2016 2:38pm

To the maintainers of platforms, developers can sometimes seem like living, breathing chaos monkeys, forever poking new holes into the infrastructure in enthusiastic, and sometimes utterly novel, ways.

“Passionate people try to use platforms in creative and unique ways,” admitted Charity Majors, who was the infrastructure tech lead of the Parse mobile platform, a technology that ultimately was acquired by Facebook. “The greatest source of chaos in the galaxy are humans, right?”

In this episode of The New Stack @ Scale podcast, Majors is interviewed by The New Stack founder Alex Williams and Fredric Paul, editor-in-chief at New Relic. They are joined by Tori Wieldt, a developer advocate for New Relic. Together they discuss the powers, and the limitations, of extensive application monitoring, and the immense benefits it can bring to the support engineers of platform providers.

Like any good platform provider, Parse kept engineers on duty to debug customer issues, which could range anywhere from deep infrastructure bugs to the customer’s Wi-Fi not working. As Parse became home to about 100,000 applications, the time it took to solve issues spiraled out of control.

“They were all different,” Majors said of the trouble tickets coming in. Company engineers were “starting from scratch for every single one, trying to figure out what [the customer’s] experience is. Engineers get really burned out when they are trying to track down things that are very difficult and time-consuming.”

When Parse was purchased by Facebook, however, the engineers could avail themselves of the social networking giant’s own Scuba monitoring tool, and that made quite a difference.

“This was literally life-changing for us,” Majors said. “If we couldn’t figure out what the root cause was, we just started dropping in key-value pairs, and you could immediately search on them, and aggregate on any dimension.”
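
To make the idea concrete, here is a minimal Python sketch of the workflow Majors describes: attach arbitrary key-value pairs to each event, then search and aggregate on any of them. It is an illustration only and does not reflect Scuba's actual interface; the field names (`app_id`, `sdk_version` and so on) and the `search` and `aggregate` helpers are hypothetical.

```python
from collections import Counter

# Hypothetical request events. Each one is a flat bag of key-value pairs,
# and new keys (such as sdk_version) can be dropped in at any time.
events = [
    {"app_id": "app-1", "endpoint": "/save", "status": 200, "latency_ms": 42},
    {"app_id": "app-2", "endpoint": "/query", "status": 500, "latency_ms": 910},
    {"app_id": "app-2", "endpoint": "/query", "status": 500, "latency_ms": 875,
     "sdk_version": "1.2"},
    {"app_id": "app-3", "endpoint": "/query", "status": 200, "latency_ms": 55,
     "sdk_version": "1.3"},
]


def search(events, **criteria):
    """Return the events whose fields match every given key-value pair."""
    return [e for e in events
            if all(e.get(k) == v for k, v in criteria.items())]


def aggregate(events, dimension):
    """Count events per value of one dimension.

    Events that lack the dimension are counted under None.
    """
    return Counter(e.get(dimension) for e in events)


errors = search(events, status=500)
by_app = aggregate(errors, "app_id")       # which apps are failing?
by_sdk = aggregate(errors, "sdk_version")  # slice the same errors another way
```

Because no schema is fixed in advance, a key added during an investigation becomes a dimension that can be filtered and grouped on right away, which is the property Majors credits for the faster diagnoses.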

The mean time to diagnose a serious problem went from “hours to impossible” to somewhere around three to ten minutes, Majors said.

Still, even at its best, monitoring has its limits.

“Everybody wants a silver bullet. They want New Relic not only to monitor, but they want it to predict and fix. That just doesn’t exist yet. If that were easy, it would already be an app on your phone, and I would be working somewhere else,” Wieldt said. “It takes investment of time. Look at the stuff while it’s running, not when it is an emergency.”

New Relic is a sponsor of The New Stack.

Feature image via Pixabay.
