Software intelligence company Dynatrace is on the front lines of monitoring traffic surges and other potential network disruptions in the wake of the COVID-19 pandemic. By applying its AI-assisted platform, the company can track the unprecedented surges in network traffic that many organizations are experiencing. Alois Reitbauer, vice president and chief technology strategist for Dynatrace, shared his observations about what the company is seeing.
While Reitbauer usually splits his time between living and working in the United States and Europe, he spoke with The New Stack remotely from his home in Austria.
What traffic changes are your customers seeing due to the effects of the COVID-19 pandemic?
It’s definitely important to know we’re experiencing a perfect storm scenario right now. We all need to be on the same page for what’s going to happen.
We have certainly ramped up our monitoring of networks recently. So the way you can describe the situation for many websites now is it’s just like Black Friday, where all people go really wild on a certain number of sites. The only difference with Black Friday- or Super Bowl-like surges in traffic compared to the saturation COVID-19 might cause is that nobody knows when it’s happening.
We are seeing — especially for COVID-19-related content on websites — a close correlation of news announcements and spikes on specific sites. We see a lot of firms actually keeping up with traffic pretty well because many companies out there are accustomed to traffic spikes. They happen for different reasons, so they are familiar with this scenario.
However, we also started to notice by monitoring governmental sites that they’re unable to keep up. There are some websites that might not be designed for this type of load because they usually never get anywhere near this kind of traffic.
So, organizations with websites that are prepared for such traffic spikes are certainly getting them right now. The challenge for these organizations now is to try to know when they’re going to happen. For other events like Black Friday or Super Bowl, we generally know in advance when they are going to happen and prepare for these events.
How is automation playing a role?
It’s a really good practice for organizations to prepare for news announcements if they will affect traffic, because they need to begin to scale up their servers. A lot of customers are automating these processes. They see a spike in traffic and then they start up servers. But even in real time, scaling up usually takes a couple of minutes. You usually can’t spin up a server in under a second.
For those who have automation, preparation for spikes is relatively easy. For those that don’t have automation in place, preparation takes significantly longer. Many sites really are scaling up, but overall, what we also see from the sites that we’re actively monitoring through our traffic monitoring operation is that many sites get slower.
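The effect of that startup delay can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how Dynatrace or any particular autoscaler works: the function names, the 20% headroom, the per-server capacity and the three-minute startup delay are all hypothetical values chosen for illustration.

```python
import math


def servers_needed(requests_per_sec, capacity_per_server, headroom=0.2):
    """Servers required to carry the load plus a fractional safety margin."""
    return max(1, math.ceil(requests_per_sec * (1 + headroom) / capacity_per_server))


def simulate(traffic, capacity_per_server=100.0, startup_delay=3, initial_servers=2):
    """Replay per-minute traffic through a reactive autoscaler.

    Returns one (active_servers, overloaded) tuple per minute. All
    parameters are hypothetical; startup_delay models the minutes a
    new server needs before it can take traffic.
    """
    active = initial_servers
    pending = []  # (minute_ready, server_count) for servers still booting
    history = []
    for minute, rps in enumerate(traffic):
        # Servers whose startup has finished join the active pool.
        active += sum(count for ready, count in pending if ready <= minute)
        pending = [(ready, count) for ready, count in pending if ready > minute]
        booting = sum(count for _, count in pending)

        # React to the current load: order only what is not already booting.
        target = servers_needed(rps, capacity_per_server)
        if target > active + booting:
            pending.append((minute + startup_delay, target - active - booting))

        overloaded = rps > active * capacity_per_server
        history.append((active, overloaded))
    return history


if __name__ == "__main__":
    # A spike arrives at minute 2; new servers only help from minute 5.
    for minute, (active, overloaded) in enumerate(simulate([150, 150, 600, 600, 600, 600, 600])):
        print(minute, active, "OVERLOADED" if overloaded else "ok")
```

In this toy run the site stays overloaded for the three minutes between the spike and the new servers coming online, which is why scaling ahead of an expected announcement helps more than reacting to it.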
Our traditional customer segments are used to these huge amounts of traffic. Usually, their infrastructure is ready for it, and they have built it in a way that lets them scale up to this demand, because there are always other events that can have an effect similar to what they have experienced with COVID-19-related surges in traffic. I wouldn’t say it’s business as usual for them, but at least they know what to do in this situation.
How and why is Dynatrace offering users extended free trial access to your Software Intelligence Platform [through June 19, 2020] and Real User Monitoring (RUM) [through Sept. 19, 2020]?
For people out there who still don’t have a performance management solution, we created this specific offering. We extended our free trial offering on our vendor monitor for monitoring third-party operations in order to really enable companies that might not otherwise have the ability to understand what’s going on and act accordingly.
Because, honestly, the more you know about what’s going to happen and how your system is reacting, the more proactively you can act. I think the biggest problem right now might be having to put out fires when somebody calls you and you’re not even aware of a network issue.
So think of it like this: if you get a daily traffic spike and know exactly what’s happening with your infrastructure, it might be hard to manage one day and traffic becomes slower, but the next day you now know exactly what to do.
Especially with very complex sites, you really want to know what the issue is and which components you might need to scale up. So you might think of a web analytics site, for example, that might get super slow. But at the same time, if everybody’s searching for the same information, everything’s coming from cache, and so that might not be the reason for the site becoming too slow. I think the key is really for people to understand how their systems work, where they can tweak and tune them, what to expect, and which wheels to turn.
How might the infrastructure Dynatrace has in place right now be impacted by COVID-19 network traffic?
At Dynatrace, we run most of our infrastructure in the cloud, where it gets scaled automatically. So for us, it’s not such a big deal.
Obviously, we are always running with spare capacity for our customers for a number of reasons, mainly because we always want to make sure that we keep running throughout. We are in a unique situation in that we are running a monitoring system. So for us, it actually means we have to be up when, in the worst case, everything else at our customers’ sites is down, which so far hasn’t been the case, fortunately.
So, we have all this data available. We know how high traffic can go up for our customers, what to expect and how to react to these situations. So, we have systems that are scaled significantly higher than what might be needed immediately, and we are also able to scale them up in pretty much real-time to cope with incoming data.
We try to be ever-ready for multiple massive spikes in traffic. I think overall you have to have this mindset. Dynatrace employs massive automation processes, so we have what we call the NoOps mindset. A lot of Dynatrace’s operations tasks are thus automated so that we don’t have people sitting in a big room with screens looking at data and then typing something into their keyboard. That’s not how things work.
But if you look at most companies out there, they have prepared for heavy traffic. There is a large number of companies that are used to heavy traffic. That’s why I brought up the example of being ready for Black Friday events or Super Bowl events.
The first front is your website. That’s where people usually get into your system. The first thing you have to ensure is that everybody who gets on your website gets the information they need as fast as possible.
And then people must engage with your system. And then you really have to understand the way your system is working, which information has to be provided in real-time or where certain tasks are processed in the back end. So this is really what companies need to be super aware of, that is, bugs in their system as they occur.
If you think about what these customers have already seen before, traffic-wise, you’re totally fine, even if it’s 20% higher, because usually people test past their maximum user levels. So once that happens, everything is fine. You already know how your infrastructure is going to cope.
It’s really the companies that don’t have the information they need right now that more or less need to acquire it in real time and act accordingly. That is at this moment the biggest challenge for some organizations out there.
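The arithmetic behind that point can be sketched in a few lines. This is a minimal illustration only: the function names and the user counts are hypothetical, and the 20% surge simply mirrors the figure mentioned in the interview.

```python
def surge_is_covered(tested_peak_users, previous_peak_users, surge=0.20):
    """True if a surge above the previous peak stays within the load-tested level."""
    return previous_peak_users * (1 + surge) <= tested_peak_users


def headroom(tested_peak_users, observed_peak_users):
    """Fraction of load-tested capacity still unused at the observed peak."""
    if tested_peak_users <= 0:
        raise ValueError("tested_peak_users must be positive")
    return 1 - observed_peak_users / tested_peak_users


if __name__ == "__main__":
    # Hypothetical numbers: tested to 15,000 users, previous peak of 10,000.
    print(surge_is_covered(15_000, 10_000))
    print(round(headroom(15_000, 12_000), 2))
```

A site that was load-tested well beyond its previous peak can absorb a 20% surge without surprises, while a site tested only slightly above that peak cannot.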
Dynatrace is a sponsor of The New Stack.