
SaltStack’s CTO on Pandemics, the End of Empires and Software’s Future

9 Apr 2020 9:50am

It is too early to determine how much our lives will change once the coronavirus pandemic has run its full course. However, in the software industry, some possible outcomes are beginning to emerge, including consolidation and the potential for great changes, both good and bad.

As a harbinger of what may come, Thomas S. Hatch, founder and Chief Technology Officer of SaltStack, a leading network infrastructure automation provider, evoked historical examples of pandemics and plagues. He discussed the changes they wrought on ancient Egypt, the Roman Empire and the Renaissance era, while drawing parallels with the software industry. In this Q&A, Hatch also shared with The New Stack how software engineers’ lives have hardly changed, the folly of forcing workers to come to the office when they really do not need to, and his observations of network infrastructure saturation in the wake of the COVID-19 pandemic.

What are some contingencies we might experience in the software development sector in the wake of the COVID-19 pandemic?

I think there’s concern, from an economic perspective, about slowdowns and shutdowns, and it’s really a “where’s the buck going to stop?” issue. So, if we go back to the dot-com crash, one of the big reasons why it happened was that all these tech companies were just customers of each other. So you had this little microcosm. And so with the ripple effect, the pebble in the lake turned into a tsunami.

I remember that.

And we’re not like that anymore. We’re going to see startups fail. Don’t get me wrong. We’re going to see cutbacks at specific tech companies and at a lot of companies in general. And that’s going to have an economic ripple effect. I don’t think that’s going to cause the internet to shut down. But companies like telcos are going to remain strong. I worry about overconsolidation with really large companies. Historically speaking, that is what happens in these sorts of situations: when you have a famine or you have a war, when the four horsemen come, power tends to consolidate. And that’s not good for long-term economic growth.

Are you talking about the software industry — or in general?

In general. We’ll see it in general, but also in software. We’re going to come out of this and there are probably going to be more chain restaurants around than small restaurants, and a bunch of Silicon Valley startups are not going to be able to get the funding they need. A lot of VCs are out there very publicly saying, “We’re just going to wholesale say no to specific companies.” And they’re picking and choosing who will live and who will die.

And a lot of what we’re seeing there is that existing customers are more likely to stay because change is expensive and risky and now is not a time to take risks. Your risk threshold goes down, but then also that means that for smaller companies, it’s going to be harder for them to bring in new revenue. And so I would actually argue that for a lot of big companies that are big enterprise software companies, they’re probably going to pull out fine. But at the same time, the willingness for change is probably going to spike on the other side of this thing.

We get to the other side of this thing, and many of the conditions of today are very likely to shift. Which again, is an extremely common thing historically to see with pandemics.

Could you give me an example related to the software industry?

You’ll see a massive change. And sometimes the change is good and sometimes it’s bad, like when the plague in the mid-500s sealed the fate of the Roman Empire. That plague is what prevented the reunification of the Roman Empire. And so that, I would argue, ushered in the Dark Ages. You don’t know what’s going to happen on the other side of this thing, but you do see consolidation. When you see a plague or a famine in ancient Egypt, it is immediately followed by power consolidating in the house of the Pharaoh, while the regional governors see their power bases go down.

And same with the Black Death: you see small landowners die off and the ones who live come out with large, large landholdings. So I think we’re going to see this in software, because we are going to see small software companies die off or become weaker. They’ll be easy to snatch up and acquire for the companies that survive and do well, which will be the Googles and Microsofts. And so we’re going to see a whole new wave of consolidation, which is going to do two things. It’s going to wipe the slate clean, so to speak, for a lot of startups and a lot of that startup activity. But it’s also going to mean that there are going to be a lot of fresh business opportunities on the other side of this thing, where new ideas and new concepts are going to be able to flourish because there will be less grassroots competition.

You mentioned that COVID-19 cases are not saturating infrastructure as badly as other areas. So if tech companies had not made the switch to working from home, might they have faced personnel issues that far outweigh any work-from-home concerns management has dreamed up?

So, I’ve spent a fair amount of time reading articles about companies that are trying desperately to make their workers keep coming into the office when they just don’t need to come. And it blows my mind, because they always come back and say, “But people work better in an office,” and you go, “You don’t have any research that backs that up.” That’s just some CEO thinking that way. And I talked to friends who are managing a shipping company where you need feet on the ground, and it’s interesting, as they go through pretty serious measures so that if there is an outbreak, it’s only going to affect 25% of the workforce for two weeks, and hopefully they can then clean everything down and keep functioning when that happens.

But if you’re running a tech company and you’re still insisting on people coming into the office, I don’t think you’re going to be in business in a year — because you’re just out of touch.

That’s my favorite quote I’ve heard all week — I also agree. 

Don’t get me wrong, I miss going into the office. I miss talking to people face-to-face more. But then again, if I wasn’t on freaking lockdown, I would actually be able to talk to my neighbors face-to-face more. I’ve got the social circles. So you don’t need to stare at people while they work. And we really should be looking at the management of employees based on output. One of the things I like to say to people when I hire them is, “I don’t pay you to be here, I don’t pay you to look nice. I don’t pay you to smile. I appreciate it if you’re good to work with, don’t get me wrong, but you were paid to produce.”

How has the turmoil that the pandemic has caused affected SaltStack? 

It’s a shared conundrum. The move for SaltStack to go fully remote was very, very easy for almost everybody. There are only a few people who I’ve talked to who’ve struggled but clearly, we were already pretty much set up to go fully remote because 55% of us were remote before.

Before the confinement started, were your developers spread out all over the world?

Maybe about 50 to 55% of our team is here in Utah. That’s about half of engineering and about two-thirds of sales and 90% of marketing in the office. So, marketing and our corporate sales teams have struggled a little bit, but for the most part, they seem to have done a pretty good job of adjusting and engineering just doesn’t care.

Correct me if I’m wrong but I do this as well — you work, you program and you load stuff on git. You can be at home — or wherever. Sure, network saturation can cause you to have to wait a few seconds longer to upload code, but other than that, nothing really changes that much — or does it?

Nothing really changes there. And especially since we’re so used to working with people remote anyway because of all of our open source development.

You mentioned that many customers are seeing serious spikes in traffic. Could you describe what your customers are doing, how they’re using SaltStack’s platform and tools, and how that’s playing out with these traffic spikes?

A lot of our customers and users are using SaltStack to help in rolling out more deployments and more coverage. So that really classic use case is definitely still in force in a big way, in that a lot of our users are able to provision more systems very rapidly. A lot of them have come back and specified that they’re using a lot more minions; they’re just managing more systems to deal with that pipeline. One of the other things, which I touched on lightly in these questions, is that a lot of it has to do with network automation. Managing and automating the systems is a very natural thing that site reliability engineering [SRE] teams are doing and have been doing for a long time. And so a lot of our customers have big data centers, so they are able to provision those new VMs and provision new bare metal and expand in that way.

But one of the bottlenecks that they’ve been running into is network automation. So there’s a lot of work that has to be done around those network switches. So that’s one of the areas that a lot of our network management components have been really useful for people. And our teams have spent a lot more time, very recently helping to deploy more network management because again, a lot of it isn’t built out. Again, our server and app deployments are automated through the roof, but network and security infrastructures still just aren’t. And that’s one of the other sides of the coin — we are seeing more cyberattacks.

And the irony is we’re seeing those attacks for a couple of reasons. One, I think that attackers are now quarantined and they’ve got nothing better to do. But this is also a very opportune time, because of those infrastructures that are behind banking and online commerce. Frankly, a lot of those infrastructures traditionally are SaltStack customers, those big-scale guys.

Which you can’t name, I assume… 

A lot of them are banks. Something like two-thirds of American or North American banks are SaltStack customers. So we are probably popular there. And a few of them I shouldn’t name just because me saying that they can’t keep up with their network automation would get me in trouble.

We also manage the backend and are used by a number of major clouds. So those are the guys that, to be honest, are having a lot more trouble when it comes to network automation issues as well as cybersecurity issues.

Could you describe the cybersecurity issues in more detail? I would argue also that it’s not that they don’t have anything better to do and have more time to orchestrate attacks, but I would assume they’re trying to attack these networks when they are perceived as being more vulnerable. 

Yeah, from a military perspective, this is the time to strike. This is why [George] Washington hit the Hessians on Christmas night, right? Even when they’re drunk or busy, that’s what you do.

How would you explain what’s going on to a CTO, who might have a more general understanding of networks but with less detailed knowledge than an operations manager?

Well, when we look at saturation, there are so many different layers that you have to take into account.

Because you’ve got your WAN saturation. And for the most part, our WANs, they’ve got excess bandwidth. We built the wide-area networks that we’re working on with the intent that we would never be utilizing more than, say, 5 to 30% of that bandwidth. And so that’s held up really well. And I’m giving you a wide range there; I haven’t looked at those specific numbers for a few years. But that’s how they build, right? Because those WANs are much more focused on trying to ensure that latency is low.

That makes sense for the wide-area network connections, since you have to share the backend or the last mile network traffic with the consumers, Netflix, Zoom or what have you. 

There are a lot of areas where bottlenecks can happen. And one of the interesting things about banks, too, is that many of them have moved to the cloud. And you’re right when you’ve got a dedicated data center where you’ve got a dedicated pipe, and you’ve got a dedicated giant chunk of fiber coming into that thing, and you have a very specific contract with that telecom. But not all banks have contracts and assurances from data centers and clouds, sorry, from clouds, that they’re going to have that level of throughput. And so one of the struggles our cloud customers run into is that their contracts with major customers like banks dictate that they have to have a certain level of throughput, and that if latency rises above a certain threshold, then they are considered offline as far as the SLAs are concerned.

And those latencies are something like below 10 milliseconds. So very, very strict. And this is actually why you end up with clouds that do really well against Amazon Web Services: they focus on these more exclusive contracts. But there are also a lot of banks that are just on AWS. And we’ve got customers that are financial service providers. They’re just on AWS and they’re feeling it, because they still have bad-neighbor issues and they don’t have those dedicated pipes. And I do suspect that we’re going to see, on the other side of this, a stronger proclivity towards having those more exclusive, dedicated cloud resource deals with some of these other cloud providers for these specific rollouts.

But even then we have the next layer, which is just that rack-by-rack switch management. And those things aren’t automated for most companies. Most companies are still managing switches in a pretty manual way. So we’re partners with Juniper, Cisco and others. And it astonishes me how many joint customers we have with these guys that don’t really have any automation to roll out new switches, or to reconfigure or secure the ones they have. And so that’s also become a vulnerability point and a vulnerable edge point, because many of these edge devices and switches haven’t been updated to address CVEs that were published by the switch providers, sometimes as long as a decade ago.

SaltStack and Amazon Web Services (AWS) are sponsors of The New Stack.

Feature image: “The Battle of Trenton,” Published by U.S. Government Printing Office; painting by Hugh Charles McBarron, Jr. (1902-1992), Public Domain.
