
NGINX’s Sarah Novotny: Monitoring Microservices Means Embracing Failure

16 Oct 2015 12:56pm, by

The web browser has been the leverage point for web application performance monitoring ever since AJAX first became feasible. JavaScript execution relies upon events, and APM monitors leverage those events to determine whether online transactions are proceeding smoothly.

This could change dramatically, and soon. On Thursday at the Dynatrace Perform 2015 conference in Orlando, Florida (produced by Dynatrace, a long-time competitor to New Relic and AppDynamics), NGINX Head of Developer Relations Sarah Novotny revealed that the commercial version of the leading web server component for microservices is now capable of providing the key signals that Dynatrace, and tools like it, need to monitor the performance of any class of distributed application it hosts.


“NGINX+ has additional metrics and visibility down into the way it runs,” said Novotny, in response to a question from The New Stack. “If you’re running NGINX open source, there’s a ‘Stub Status’ page … and it shows a couple of monotonically increasing counters, nothing terribly exciting, and you’re like, ‘Yay, it’s up.’ With NGINX+, we have a much more robust status page, which is a functioning dashboard to interact with multiple NGINX hosts, or instances running for load-balancing situations.”

That status page, by default, has an .html filename extension. As Novotny told attendees here Thursday, removing that extension from its URL is all it takes for NGINX+ to return instead “a giant JSON object, with all of that information.”

That object can easily be imported into Dynatrace, she explained, or another APM system.
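For readers who want a feel for what consuming such an object might look like, here is a minimal Python sketch. It is not NGINX's or Dynatrace's code. The endpoint URL, the sample payloads, and the field names (`requests.total`, `connections.active`, `connections.dropped`) are assumptions made for illustration, so check them against the status document your own server returns. The sketch reduces a status document to a few numbers, then turns two readings of a monotonically increasing counter into a rate, which is the usual way such counters become useful.

```python
import json
import urllib.request

# Hypothetical endpoint; the real host, port, and path depend on your setup.
STATUS_URL = "http://localhost:8080/status"

# Hand-written sample payloads. The field names are assumptions for
# illustration, not output captured from a real server.
SAMPLE_T0 = '{"requests": {"total": 5000}, "connections": {"active": 12, "dropped": 0}}'
SAMPLE_T1 = '{"requests": {"total": 5250}, "connections": {"active": 15, "dropped": 1}}'


def fetch_status(url=STATUS_URL, timeout=5.0):
    """Download the raw JSON status document as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def summarize(raw_json):
    """Reduce a status document to the few numbers a monitor might track."""
    doc = json.loads(raw_json)
    return {
        "requests_total": doc["requests"]["total"],
        "active_connections": doc["connections"]["active"],
        "dropped_connections": doc["connections"]["dropped"],
    }


def request_rate(earlier_total, later_total, elapsed_seconds):
    """Turn two readings of a monotonic counter into requests per second.

    Returns None when no rate can be computed: a non-positive interval, or
    a counter that went backwards (for example after a restart).
    """
    if elapsed_seconds <= 0 or later_total < earlier_total:
        return None
    return (later_total - earlier_total) / elapsed_seconds


if __name__ == "__main__":
    before = summarize(SAMPLE_T0)
    after = summarize(SAMPLE_T1)
    print(request_rate(before["requests_total"], after["requests_total"], 10.0))
```

In practice a monitor would call `fetch_status()` on a fixed interval and feed consecutive readings to `request_rate()`. The sample strings stand in for those responses so the sketch runs without a server.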

What Agent, Where?

The implications of this may only seem tremendous to those whose job it is to monitor performance — and as my colleague Alex Williams and I learned here all this week, those people are more likely not to be software developers, and are not necessarily considered IT admins. But consider the quality of the instrumentation to which software testers would have immediate and reliable access, if those metrics were being computed from data provided by the servers on their own side rather than the clients on the users’ side.

In the case of web browsers, there are still considerable behavioral differences between Google Chrome (currently the usage share leader), Mozilla Firefox, Microsoft Internet Explorer, Apple Safari, Opera, and now Microsoft Edge — the entirely new chassis built for Windows 10. It’s not exactly accurate to say it’s difficult for performance-minded IT personnel to factor these differences out, because in most organizations they’re not factored out at all — the variables are so much in flux, that no one bothers.

With NGINX+ being directly integrated into containers that host microservices, it is now already possible (this is not a future feature but a present one, according to Novotny) for microservices to report critical performance data to monitoring agents. What NGINX has left open, and what Dynatrace has yet to announce, is where such an agent would need to reside in such a relationship.

Here at Perform 2015 this week, attendees have been shown what Dynatrace characterizes as the next generation of APM — while at the same time carefully positioning it as an extension, not a replacement, for the existing Dynatrace Agent. Called Ruxit, the new system utilizes auto-discovery to determine the components of a complex network, including the software that runs there. It can then measure the operating state and relative performance level of the components and, in many cases, the software.

Ruxit has been marketed as a cloud-based service, although on Thursday, the company announced the addition of Ruxit Managed — an on-premises version whose management can still be overseen by Dynatrace. In either case, Ruxit technology is based on what product engineers here have called the “One Agent.”

What we have yet to learn is how this “One Agent” will interact with NGINX+. But during the Thursday morning keynote session, NGINX CEO Gus Robertson’s presence was indicator enough that cooperation between NGINX and Dynatrace was already under way.


Robertson said one key to improving user experiences was the ongoing shift in application architectures.

“This architecture is very different from the monolith architecture of the past,” said Robertson. “The monolith architecture of the past was a much more simple architecture. You’d have a single binary, and you’d duplicate that binary across many machines if you wanted to scale … Now, you have many, many microservices.”

But Robertson didn’t exactly draw a correlation between that plurality of microservices and the greater efficiencies that users would enjoy, in the form of faster and more efficient distributed applications. That’s what led attendees to Sarah Novotny’s session later that day.

Refactoring the Elephant

The theme of Novotny’s talk was something many veteran performance monitors might have found a little foreign: embracing failure. Historically, APM has been used as a way to measure the integrity and maturity of an application, both before it’s deployed in the field and during the production phase. It’s been described as something like a rock-polishing tumbler, casting light on the hard surfaces and polishing them smooth and shiny.

That metaphor does not fit the microservices realm well. There, services are meant to be ephemeral, failure is expected and responded to rather than polished away, and outages are compensated for through redundancy.

“This is one of the things that microservices architecture really gives you,” Novotny advised attendees, “the ability to change your user experience, and degrade gracefully as components fail. It also gives you speed, in a lot of interesting ways. But speed isn’t just about how quickly your app loads, or how quickly your website loads. Speed is also, how quickly can you get new features out? Speed is, how quickly can you fix things when they break? Speed is also, how quickly can you adapt your business, and your business process, as a competitor does? Can you stay ahead of your competitors?”
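The idea of degrading gracefully can be made concrete with a small sketch. The Python below is not taken from Novotny's talk. The service names (`fetch_recommendations`, `default_recommendations`) and the `with_fallback` helper are hypothetical, invented here to illustrate one common pattern: when a component fails, serve a simpler result and record that the response was degraded, instead of failing the whole page.

```python
def with_fallback(primary, fallback):
    """Wrap a call so that a failure in `primary` degrades to `fallback`.

    The wrapped function returns (result, degraded), where `degraded` is
    True when the fallback had to be used. A monitor can count that flag.
    """
    def call(*args, **kwargs):
        try:
            return primary(*args, **kwargs), False
        except Exception:
            # Treat the failure as expected: answer with something simpler.
            return fallback(*args, **kwargs), True
    return call


def fetch_recommendations(user_id):
    """Hypothetical microservice call that is currently failing."""
    raise ConnectionError("recommendation service unavailable")


def default_recommendations(user_id):
    """Static, non-personalized list served while the service is down."""
    return ["bestseller-1", "bestseller-2"]


recommend = with_fallback(fetch_recommendations, default_recommendations)

if __name__ == "__main__":
    items, degraded = recommend(42)
    print(items, degraded)
```

Returning the `degraded` flag alongside the result is a design choice in this sketch: it keeps the failure visible to monitoring even though the user never sees an error.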

She was speaking to an audience, many of whom had not yet considered the concept of deployments and rollbacks being automated and orchestrated. In the monolithic realm, provisioning is a manual process, usually conducted during an unobtrusive time of the weekend when nobody else is at work. In microservices — as users of Kubernetes, Mesosphere and Marathon are aware — deployments are small events for small services.

These deployments become regular events, taking place throughout the day rather than throughout the year. And because they’re automated, like the rendering of a web page … the processes associated with those deployments may be measured for performance.

Novotny’s talk had a visible effect on the audience. They had the look that comes when people realize how different the new way can be.

What has changed is the scale of the jobs. When monitoring in an automated fashion, frequency increases by an order of magnitude but the sizes of the jobs decrease. Metrics are used as the way to observe the jobs and how they are faring.

Novotny compared monolithic applications and the processes that accompany them — including APM — to the different parts of the proverbial elephant in the room. She advised everyone to “refactor this elephant we now have.

“Anybody know the joke about how you eat an elephant? One bite at a time.”

Dynatrace and New Relic are sponsors of The New Stack.

Feature image: “pink elephant” by daddyboskeazy is licensed under CC BY 2.0.
