Four Node.js Gotchas that Operations Teams Should Know about
Node.js is without doubt one of the fastest-growing platforms today. It can be found at start-ups and enterprises across all industries, from high tech to healthcare.
A lot of people have written about the reasons for its popularity and why it has made sense in “digital transformation” efforts. But when you implement Node.js, do you have to replace your mainframes and legacy software with a shiny new Node.js-based microservice architecture?
Let’s zoom out and walk in the shoes of those who oversee the whole digital value chain: operations and performance teams. What challenges do these teams face when they begin to implement Node.js? Does it require gutting the entire system?
New Tier / New Paradigm / New Challenges
In many cases, Node.js acts as a new tier that augments the enterprise stack and connects it with new offerings. It’s the fast-moving technology at the edge of the system.
One often-cited benefit of Node.js is that it enables teams to move much faster. Add microservices, and suddenly there are multiple deployments per day instead of one every few weeks. For many enterprises, this introduces a new paradigm and requires process changes that affect other parts of the organization, particularly the teams in charge of availability and performance: the operations and performance teams.
These teams don’t consist of Node.js experts, and they don’t need to. They are driven by metrics like mean time to repair (MTTR), and their main concern is finding the root cause of performance degradations and outages fast. How can these teams make sure the transition to Node.js goes smoothly without hurting the metrics they are measured on? How can they keep their systems humming?
Below are a few common problems that arise when Node.js is introduced in the enterprise, along with how best to manage and solve them.
Top Node.js Problems and How to Track Them Down
Node.js applications in enterprise scenarios are usually fairly simple.
Common use cases include:
- Fetching data from backends.
- Performing authentication for incoming requests.
- Rendering views.
Still, some typical problem sources need to be watched closely.
1. Memory Leaks
In terms of runtime behavior, Node.js resembles Java: it runs as a long-running process, and because of this it is prone to memory leaks of all kinds. As on other platforms, a memory leak shows up as steadily growing heap usage, which causes a crash once the maximum allocatable heap is exhausted. This is often accompanied by high garbage collector churn while the runtime desperately tries to free memory.
Possible causes can be as simple as large objects that are attached to the root scope and hence never freed. There are also more difficult cases caused by closures (functions that capture variables from their enclosing scope), which can keep objects reachable longer than intended so that the garbage collector cannot free them. Finally, there are cases where the host simply has less memory than the heap is allowed to grow to, so the process runs out of memory before the garbage collector kicks in.
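To make the simplest cause concrete, here is a minimal sketch, assuming a hypothetical request handler that appends data to a module-level array. Heap usage is read with `process.memoryUsage()`, and `captureHeapSnapshot()` (a helper name made up for this example) wraps the built-in `v8.writeHeapSnapshot()`, which assumes Node.js 11.13 or later.

```javascript
const v8 = require('v8');

// Hypothetical leak: a module-level (root scope) array that only ever grows.
// Everything pushed here stays reachable from the module, so the garbage
// collector can never reclaim it.
const requestLog = [];

// Hypothetical request handler that attaches tens of kilobytes of on-heap
// data per request to requestLog and never removes it.
function handleRequest(id) {
  requestLog.push({ id, payload: new Array(10000).fill(id) });
}

// Current V8 heap usage in megabytes.
function heapUsedMB() {
  return process.memoryUsage().heapUsed / 1024 / 1024;
}

// Writes a .heapsnapshot file into the working directory and returns its
// name; the file can be opened in Chrome DevTools to see what retains memory.
// v8.writeHeapSnapshot() requires Node.js 11.13 or later.
function captureHeapSnapshot() {
  return v8.writeHeapSnapshot();
}

const before = heapUsedMB();
for (let i = 0; i < 200; i++) {
  handleRequest(i);
}
const after = heapUsedMB();
console.log(`heap used: ${before.toFixed(1)} MB -> ${after.toFixed(1)} MB`);
```

In a real service, `heapUsed` would be sampled periodically: a baseline that keeps climbing even after garbage collections is the signature of a leak, and comparing two heap snapshots taken some time apart shows which objects accumulate.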
2. CPU Problems
Node.js runs JavaScript in a single thread, so it’s not a good fit for CPU-heavy operations. If that thread is busy, for example because it’s transforming a large chunk of JSON, no other requests can be handled during that time.
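The effect can be shown with a minimal sketch: a timer scheduled for 10 ms cannot fire until a synchronous busy loop, standing in for a large JSON transformation, releases the thread. `busyWait` and `measureTimerDelay` are illustrative names, not part of any API.

```javascript
// Spins the CPU for `ms` milliseconds. While this loop runs, the single
// JavaScript thread is busy and no other callbacks can execute.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // intentionally empty: simulates CPU-heavy work such as a large JSON transform
  }
}

// Schedules a 10 ms timer, blocks the thread for `blockMs`, and resolves
// with the time that actually passed before the timer callback ran.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 10);
    busyWait(blockMs);
  });
}

measureTimerDelay(200).then((elapsed) => {
  // elapsed is at least 200 ms even though the timer asked for 10 ms
  console.log(`10 ms timer fired after ${elapsed} ms`);
});
```

Event-loop lag measured this way is a cheap health metric; newer Node.js versions also report it as a histogram through `perf_hooks.monitorEventLoopDelay()`.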
Netflix, a big Node.js shop, ran into such a problem when an automated script kept creating routes without disposing of the old ones, so the routing table grew over time. Eventually, finding the right function to call for an incoming request took so long that it severely affected performance. Netflix described the incident in a post on their tech blog.
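The failure mode in that incident can be illustrated with a hypothetical router that matches routes by scanning an array in registration order. This is not Express’s or Netflix’s actual code; `LinearRouter`, `refreshLeaky`, and `refreshFixed` are made up for the example. A path that matches late or not at all is compared against every entry, so each leaky refresh makes those lookups slower.

```javascript
// Hypothetical router: routes live in an array and are matched by scanning
// it in registration order.
class LinearRouter {
  constructor() {
    this.routes = [];
  }

  add(path, handler) {
    this.routes.push({ path, handler });
  }

  // Returns the first matching handler (or null) together with the number
  // of route entries that had to be inspected.
  match(path) {
    for (let i = 0; i < this.routes.length; i++) {
      if (this.routes[i].path === path) {
        return { handler: this.routes[i].handler, inspected: i + 1 };
      }
    }
    return { handler: null, inspected: this.routes.length };
  }
}

const paths = ['/users', '/orders', '/health'];

// Buggy refresh: appends the current routes without removing the old ones.
function refreshLeaky(router) {
  for (const p of paths) router.add(p, () => p);
}

// Fixed refresh: replaces the route table instead of growing it.
function refreshFixed(router) {
  router.routes = [];
  for (const p of paths) router.add(p, () => p);
}

const leaky = new LinearRouter();
const fixed = new LinearRouter();
for (let i = 0; i < 1000; i++) {
  refreshLeaky(leaky);
  refreshFixed(fixed);
}

// An unmatched path (e.g. a 404) has to scan the whole table.
console.log(leaky.match('/missing').inspected); // 3000
console.log(fixed.match('/missing').inspected); // 3
```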
Out of the box, Node.js comes with hooks to switch on CPU sampling; the data produced by the sampler can then be consumed by various tools. With this data, it is fairly easy to find out where the time is spent.
The v8-profiler module, for example, can collect CPU sampling data that shows what was on the CPU in each time slice.
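As a minimal sketch of collecting such data, the example below swaps in Node’s built-in `inspector` module for v8-profiler: it starts V8’s sampling profiler through an in-process `inspector.Session`, runs a workload, and writes a `.cpuprofile` file that Chrome DevTools can open. `profileWork` and `transformLargeJson` are hypothetical names chosen for the example.

```javascript
const inspector = require('inspector');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Runs `work` while V8's sampling CPU profiler is active and writes the
// result as a .cpuprofile file. Uses the built-in inspector module in
// place of v8-profiler.
function profileWork(work, outFile) {
  return new Promise((resolve, reject) => {
    const session = new inspector.Session();
    session.connect();

    // Disconnects the session before reporting an error.
    const fail = (err) => {
      session.disconnect();
      reject(err);
    };

    session.post('Profiler.enable', (enableErr) => {
      if (enableErr) return fail(enableErr);
      session.post('Profiler.start', (startErr) => {
        if (startErr) return fail(startErr);

        work(); // the code under measurement

        session.post('Profiler.stop', (stopErr, result) => {
          if (stopErr) return fail(stopErr);
          session.disconnect();
          fs.writeFileSync(outFile, JSON.stringify(result.profile));
          resolve(result.profile);
        });
      });
    });
  });
}

// Hypothetical CPU-heavy workload: round-trip a large array through JSON.
function transformLargeJson() {
  const data = Array.from({ length: 200000 }, (_, i) => ({ id: i, name: `item-${i}` }));
  return JSON.parse(JSON.stringify(data)).length;
}

const outFile = path.join(os.tmpdir(), 'transform.cpuprofile');
const demo = profileWork(transformLargeJson, outFile).then((profile) => {
  console.log(`wrote ${(profile.samples || []).length} samples to ${outFile}`);
  return profile;
});
```

Loading the file in the DevTools profiler shows which functions were on the CPU during the workload. On newer Node.js versions, the `--cpu-prof` command-line flag writes a similar profile without any code changes.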
3. Back Pressure
When Node.js acts as a glue tier connecting different parts of the stack, problems further down the stack may surface first in Node.js. Back pressure occurs when Node.js dispatches requests to slow backends. Node.js is very good at performing outbound requests, but when backends respond slowly, pending requests and the resources tied to them pile up while Node.js waits for the responses. The result can be degraded performance and even exceptions.
The metric to watch in this case is the number of dispatched versus returned requests at any given time.
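A minimal sketch of that metric, assuming promise-based backend calls: `BackendCallTracker` (a hypothetical helper) counts calls when they are dispatched and when they return, successfully or not, and `callBackend` is a hypothetical stand-in for a real HTTP or database client.

```javascript
// Counts backend calls when they are dispatched and when they return.
class BackendCallTracker {
  constructor() {
    this.dispatched = 0;
    this.returned = 0;
  }

  // Calls still waiting for a backend response. A value that keeps growing
  // is the back-pressure signal.
  get inFlight() {
    return this.dispatched - this.returned;
  }

  snapshot() {
    return { dispatched: this.dispatched, returned: this.returned, inFlight: this.inFlight };
  }

  // Wraps a promise-returning call; `returned` is incremented whether the
  // call succeeds or fails.
  async track(call) {
    this.dispatched++;
    try {
      return await call();
    } finally {
      this.returned++;
    }
  }
}

// Hypothetical backend that answers after `latencyMs` milliseconds.
function callBackend(latencyMs) {
  return new Promise((resolve) => setTimeout(() => resolve('ok'), latencyMs));
}

const tracker = new BackendCallTracker();
const fast = [1, 2, 3].map(() => tracker.track(() => callBackend(5)));
const slow = [1, 2].map(() => tracker.track(() => callBackend(200)));

// Sample after the fast calls have returned but before the slow ones have.
setTimeout(() => console.log(tracker.snapshot()), 100);
```

With three fast and two slow calls, the sample taken after 100 ms should report five dispatched, three returned, and two in flight. Kept per backend and exported to a monitoring system, a steadily growing in-flight count points at the backend causing the congestion.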
Such problems can only be tracked down to their root cause with a monitoring solution that traces transactions through all tiers and provides metrics about inter-tier communication. Every major vendor in the APM space today provides agents that monitor requests flowing into and out of Node.js.
4. Third-Party Modules
Node.js offers a huge repository of small, composable modules. With the Node.js package manager (npm), adding a module to a project takes seconds. Well-known frameworks like hapi and Express are built on these modules, and it would be highly inefficient to give them up completely.
Still, every installed module is third-party code. It can be poorly maintained and contain bugs that are never fixed or, even worse, security issues. Before using a module, developers should always check its quality and consider whether the functionality is simple enough to implement themselves.
To tackle the problem, many enterprises also run their own private npm registry that contains only packages that have gone through an auditing process.
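The core rule of such an auditing gate can be approximated in a few lines. This minimal sketch assumes a hypothetical in-house allowlist of reviewed package versions; the package names, versions, and the `findUnauditedDependencies` helper are illustrative only, and a private registry would enforce the same rule at install time.

```javascript
// Hypothetical allowlist: package name -> set of exact versions that passed review.
const auditedPackages = new Map([
  ['express', new Set(['4.14.0', '4.14.1'])],
  ['lodash', new Set(['4.17.4'])],
]);

// Returns "name@version" for every dependency that has not been audited.
// Only exact versions count; ranges like "^4.17.4" are reported because
// they could resolve to a version nobody reviewed.
function findUnauditedDependencies(pkg, allowlist) {
  const deps = { ...pkg.dependencies };
  return Object.entries(deps)
    .filter(([name, version]) => {
      const versions = allowlist.get(name);
      return !(versions && versions.has(version));
    })
    .map(([name, version]) => `${name}@${version}`);
}

// Hypothetical package.json contents.
const pkg = {
  name: 'checkout-service',
  dependencies: { express: '4.14.0', 'left-pad': '1.1.3', lodash: '^4.17.4' },
};

console.log(findUnauditedDependencies(pkg, auditedPackages));
// returns ['left-pad@1.1.3', 'lodash@^4.17.4']
```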
Tools like the Node Security Platform or Snyk can streamline this process by checking installed modules against vulnerability databases and helping to fix the issues they find.
What’s Next for Node.js Diagnostics
The Node.js Diagnostics and Post Mortem working groups focus solely on extending and unifying the tracing and debugging capabilities of Node.js.
A few highlights from them include:
- A new tracing facility is around the corner. It will allow low-overhead, process-level tracing.
- Initiatives are under way to unify the way core dumps are analyzed.
- With async_hooks, there will finally be a generic way to implement long stack traces and transaction tracing across callbacks.
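As an illustration of transaction tracing across callbacks, later Node.js releases added `AsyncLocalStorage` to the `async_hooks` module. This minimal sketch assumes Node.js 14 or later; the `log` and `handleRequest` helpers are hypothetical. Each simulated request carries its own transaction ID into a timer callback without passing it as an argument.

```javascript
const { AsyncLocalStorage } = require('async_hooks');

// Per-transaction store that follows a request through timers, promises
// and callbacks (AsyncLocalStorage is built on async_hooks).
const transactionContext = new AsyncLocalStorage();

// Hypothetical logger that tags each line with the current transaction ID.
function log(message) {
  const store = transactionContext.getStore();
  const id = store ? store.transactionId : 'none';
  return `[tx ${id}] ${message}`;
}

// Hypothetical request handler with an asynchronous backend call.
// run() returns the callback's return value, here a promise.
function handleRequest(transactionId, delayMs) {
  return transactionContext.run({ transactionId }, () =>
    new Promise((resolve) => {
      setTimeout(() => resolve(log('backend call returned')), delayMs);
    })
  );
}

// Two overlapping requests: B finishes first, yet each callback still
// sees its own transaction ID.
Promise.all([handleRequest('A', 20), handleRequest('B', 5)]).then((lines) => {
  console.log(lines);
  // lines: ['[tx A] backend call returned', '[tx B] backend call returned']
});
```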
Given the current pace of development and how actively the community is driving performance topics, Node.js enterprise capabilities will make another leap in 2017.
Conclusion
Node.js applications are often small and not very complex. Still, communication between tiers, memory leaks, and CPU congestion can cause issues. Luckily, the platform isn’t a black box: for every problem, there are ways to introspect running applications and find the root cause.
The Node.js project takes monitoring seriously, and upcoming releases will introduce additional ways to trace, debug, and monitor Node.js, adding even more capabilities to fix problems fast.
The next time your development team wants to implement Node.js, have no fear, Ops.