Baptism by Fire: NS1 INS1GHTS Surveys Current DevOps Challenges

NS1 sponsored this post.
DevOps is many things: It is the need to deploy applications and updates at ever-increasing cadences. It is the necessity to shift left by integrating security processes and checks at the very beginning of the software production pipeline. But it is also critical not to neglect the shift right, by ensuring that code remains secure as applications are increasingly deployed in highly distributed multicloud and containerized environments. Then there is the last mile, which depends on the operations team managing the underlying infrastructure in a way that delivers the best possible final user experience.
Perhaps the most crucial element, on which the lifeblood of DevOps and of the organization depends, is continually striving to improve the well-being of all team members. This means doing everything possible to promote diversity, both in-house and in the wider community, something the tech community has not done nearly as much as it should.
But now, especially, the COVID-19 pandemic has compounded these challenges on an unprecedented scale, as the magnitude of security and technology needs of remote workers has exploded.
“The legacy infrastructure foundations that we built to support networks and traditional application delivery are coming suddenly under a ton of strain,” NS1’s Kris Beevers, CEO and co-founder, said. “This big dramatic shock event has driven what seems like pretty much everything online all at once.”
These and other themes of these tumultuous times were addressed, discussed and debated during NS1’s INS1GHTS online summit last week.
The Big Shock
The coronavirus pandemic has unleashed an explosion of teleworkers, especially in the IT sector. The resulting surge in connectivity requirements has placed unprecedented stress on the internet and network infrastructure, Beevers said during his keynote.
“We’ve learned through this coronavirus that our foundational infrastructure wasn’t really built to support the delivery and the access of all the tech,” Beevers said.
Prior to the coronavirus pandemic, many organizations, of course, were already undergoing a digital transformation, Beevers said. “All of these related sudden shifts are acting as a catalyst to accelerate this transformation from a timeline over the years to something that’s happening almost immediately. And this puts into stark relief the need for every single business to find ways to strengthen its tech posture to be resilient and scalable,” Beevers said. “But, of course, given the economic situation, businesses need to be cost-effective as well. And because customers and users aren’t really that forgiving, they demand, and deserve, even in this environment, really great digital experiences.”
The Last Mile
A great customer digital experience requires flawless last-mile delivery, whether it is for application delivery or streaming video games. In the case of this year’s Super Bowl, for example, NS1 was tasked with steering the live video traffic. “If you think about what that infrastructure looks like, it’s gigantic and real-time, with plenty of opportunity for everything to go wrong,” Beevers said, during the INS1GHTS2020 Fireside Chat session. “Our job in that situation is, first and foremost, to make sure nothing goes wrong and that when anyone goes to watch the Super Bowl, they don’t see an endless buffering.”
DevOps teams whose security and software deployment needs are significantly smaller in scale than those of the Super Bowl producers are now more dependent than ever on traffic-steering capabilities in the wake of the pandemic. “As you can imagine, a lot of enterprises out there weren’t necessarily prepared for the entire workforce to suddenly swing remote, and especially weren’t necessarily prepared to secure that kind of footprint,” Beevers said. These organizations typically had a “VPN footprint to service 5% to 10% of the workforce, and suddenly that is 100%.”
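To put the scale of that jump in rough numbers, the arithmetic can be sketched in a few lines of Python. This is only an illustration of the percentages Beevers cited; the function name `vpn_scale_factor` is hypothetical and is not drawn from any NS1 tooling.

```python
def vpn_scale_factor(provisioned_percent: float, required_percent: float = 100.0) -> float:
    """Return how many times larger a VPN footprint must become.

    provisioned_percent: share of the workforce the VPN was sized for.
    required_percent: share that now needs remote access (100 by default).
    Both names are hypothetical and used only for this illustration.
    """
    if not 0 < provisioned_percent <= 100:
        raise ValueError("provisioned_percent must be in the range (0, 100]")
    return required_percent / provisioned_percent


if __name__ == "__main__":
    # The 5% and 10% figures are the ones quoted in the article.
    for percent in (5, 10):
        factor = vpn_scale_factor(percent)
        print(f"Sized for {percent}% of staff: needs about {factor:.0f}x the capacity")
```

In other words, a footprint sized for 5% to 10% of staff would need roughly 10 to 20 times the capacity to serve everyone at once, before any allowance for headroom.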
Smart Cloud
Organizations seeking to modernize their infrastructure by migrating to the cloud and taking advantage of the opportunities microservices and container environments offer “are creating a foundation for the future of their company,” Jonathan Sullivan, NS1 chief technology officer and co-founder, said during his keynote. Businesses are going to be more resilient and prepared for the inevitable continued evolution of technology as “software continues to take everything,” Sullivan said.
However, it is critical to plan and build on the cloud wisely. “You’re going to need modern application delivery stack services in order to take advantage and leverage the investments you’ve made,” Sullivan said. “Without that, you can have this fantastic hybrid cloud strategy and if you have no way of intelligently orchestrating traffic across that, you’re just not going to see the ROI.”
Migrating to cloud environments built on containers and microservices is also never easy, of course. A digital transformation used to involve “migrating to the cloud” and determining “what that meant for your business,” Sullivan said. “What we’re finding today is that everything is a lot more complicated than that,” he said.
Organizations today, for example, might typically have resources on Google Cloud Platform, Amazon Web Services (AWS) or Microsoft Azure, Sullivan said at the INS1GHTS Fireside Chat session. “But it’s never going to get simplified down to, ‘we’re going to be able to migrate everything away on-premises and read the playbook and put it into one cloud,’” he said. “And so you’ve got to be ready with tools like Kubernetes or frameworks like VMware’s Tanzu, to put your stuff anywhere and just figure out how to make use of this complex infrastructure and complex substrates.”
Cloud provisioning has made increasing capacity easier “because we can sort of just go out and buy more compute,” but “it also makes things easier for us to ignore,” Heidi Waterhouse, senior developer advocate at LaunchDarkly, said during her talk “Breaking Strain: A Story About Capacities and Testing.” “We don’t get as many early warning signs that we’re running out of compute, and if everybody needs to buy compute at the same time, there’s only so much capacity. When I talk to my friends who do network provisioning for the backbones, it’s like, ‘we are still using computers that still have to come from wherever they’re manufactured and we still need the cables and the fiber, and the switches to run the cloud,’” Waterhouse said. “The cloud is just somebody else’s server in their server room. So, when we’re thinking about increasing capacity, we need to have a reasonable expectation that we can either build it or buy it.”
Elastic cloud provisioning can solve some, but not all, capacity problems in the future, Waterhouse said. “You also need to be able to respond nimbly when you’re building a robust system and it really matters that you can corner, because if you have a large system that can only go forward if there’s something in the path, you’re going to have a lot of trouble,” Waterhouse said.
The Production Pipeline
As the dictum that “all companies today are software companies” continues to hold true, the success or failure of any organization depends on whether it is filling the software pipeline with the applications and code its end users want or require. During his talk “CI/CD with Jenkins and Kubernetes — the Ugly Bits,” Liran Haimovitch, co-founder and chief technology officer for Rookout, spoke of the challenges his company faced when determining which tools to choose to develop its SaaS platform. For CI, Rookout relies on Kubernetes, as well as Docker, Helm and other tools and platforms, to realize its goal of being able to deliver applications and updates several times a day.
Delivering a robust experience for users who rely on Rookout’s platform requires a lot of work with a Jenkins pipeline. Haimovitch asked, “Why would you take something as complex as Jenkins and run it on a beast as complex as Kubernetes?” It can be difficult to configure Jenkins with configuration as code, for example, which means “you’re probably going to have to do some manual configuration,” Haimovitch said, adding that “those logs might be useful to keep track of.”
However, Jenkins provides Haimovitch’s team “with programmable pipelines, essentially configuration as code, but with actual code,” Haimovitch said.
DevOps vs. NetOps vs. SecOps
A recurring theme of NS1 INS1GHTS2020 was the importance of maintaining a collaborative spirit across DevOps, SecOps and NetOps. Prior to the summit, Warren Mead, NS1 vice president, channel and alliances, said NS1 does a “good job of bridging the gap between DevOps, NetOps and SecOps.”
“We are so intertwined between the three,” he said.
However, during his keynote, Sullivan said DevOps, NetOps and SecOps can have conflicts and described how they can be resolved. “Our software-defined and containerized managed private DNS and DDI software is able to play nicely with all of these groups. These are teams that are working on high-velocity deployment continuous integration/continuous delivery (CI/CD), and all of these teams are kind of at odds when it comes to managing DDI or DNS infrastructure because they all want different things from it,” Sullivan said. The containerized platform “that we have is able to sort of play nicely with all of these groups, and give them levers to pull and coexist.”
Diversity Work
Diversity and the Black Lives Matter movement in the context of recent events served as a major theme of NS1 INS1GHTS2020. In fact, the summit was delayed from its original date in the aftermath of George Floyd’s killing to support ongoing demonstrations for Black Lives Matter and equality.
The importance of maintaining a proper work-life balance was also discussed. Indeed, setting policies that support the well-being of the DevOps team, for example, is not only critical for the organization’s ability to function, but can also be considered an ethical duty, since it sets an example for the tech community.
“If you build a culture that is founded on empathy and supportiveness is core to the values of the business, then you create room and support in the team and the organization for the needs of really everybody,” Beevers said.
Ways to avoid alert fatigue and, eventually, burnout, especially among on-call teams, were the themes of a talk given by Bethany Abbott, a TechOps manager at NS1. She described processes she has learned and implemented in that role at NS1 and elsewhere.
Prior to the pandemic, a survey showed 50% of tech workers were already experiencing burnout, Bethany Abbott, a NS1 TechOps manager, noted, during her talk on processes she recommends to help on-call team members beat “alert fatigue.” #INS1GHTS2020 @BethanyRAbbott @thenewstack pic.twitter.com/3i8QhgvnCK
— BC Gain (@bcamerongain) June 25, 2020
During her talk, Abbott asked the summit’s online participants, most of whom were on call, she said, to “think about your on-call shift, and now think about if you feel burned out right now, if you’ve felt burnt out in your career or if you’re a manager and have had team members tell you that they feel burnt out.” “So, think about if that happened to you in your life and in your technology career,” Abbott said.
Abbott communicated a number of strategies that teams can adopt to prevent burnout. These included, in addition to promoting a culture of empathy, semantic changes to how we describe duties, such as the removal of the word “quarterback” when referring to responsibility for a project. “We don’t need sports metaphors in our incident-management process,” Abbott said. “Let’s just call it an ‘owner,’ the ‘point-of-contact’ or ‘incident commander.’”
Noting that alerts are “interrupt-based by nature,” Abbott described how runbooks can help to reduce stress and anxiety by succinctly outlining what needs to be done for each type of alert. When receiving an alert, it should be possible, for example, to click the runbook and a dashboard to see relevant monitoring information in order to know “this is what I need to do when I get paged for this,” Abbott said.
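One way to picture the idea of an alert that carries its own runbook and dashboard is the minimal Python sketch below. It is not Abbott’s or NS1’s implementation. The alert name, the `example.com` URLs and the `AlertGuide`, `GUIDES` and `enrich_alert` names are all hypothetical, chosen only to show the shape of the approach.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class AlertGuide:
    """Hypothetical record tying one alert type to its runbook and dashboard."""
    runbook_url: str
    dashboard_url: str
    first_steps: Tuple[str, ...]


# Hypothetical registry; the alert name and URLs are placeholders, not real endpoints.
GUIDES: Dict[str, AlertGuide] = {
    "dns_latency_high": AlertGuide(
        runbook_url="https://runbooks.example.com/dns-latency-high",
        dashboard_url="https://dashboards.example.com/dns-latency",
        first_steps=(
            "Open the dashboard and confirm which regions are affected.",
            "Follow the runbook section that matches the affected region.",
        ),
    ),
}


def enrich_alert(alert_name: str) -> Dict[str, object]:
    """Attach runbook and dashboard links to an alert before it pages someone."""
    guide: Optional[AlertGuide] = GUIDES.get(alert_name)
    if guide is None:
        # An alert without a runbook is flagged so the gap is visible to the team.
        return {
            "alert": alert_name,
            "runbook": None,
            "dashboard": None,
            "first_steps": ("No runbook yet; escalate and write one after the incident.",),
        }
    return {
        "alert": alert_name,
        "runbook": guide.runbook_url,
        "dashboard": guide.dashboard_url,
        "first_steps": guide.first_steps,
    }


if __name__ == "__main__":
    page = enrich_alert("dns_latency_high")
    print(page["runbook"])
    for step in page["first_steps"]:
        print("-", step)
```

The point of keeping the mapping in one place is that whoever is paged sees what to do in the page itself, rather than having to search for it mid-incident.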
Software also does not just run itself, despite major and ongoing advances in artificial intelligence (AI). It also certainly does not create itself, and as anyone in DevOps knows all too well, it breaks often once built. A number of team members, such as the site reliability engineer (SRE), will also get that dreaded call in the middle of the night when the code or network crashes.
During her talk “Cultivating Production Excellence,” Liz Fong-Jones, principal developer advocate for Honeycomb.io, drew on her more than 17 years of experience as either an SRE or a systems engineer. She discussed “some of the lessons I have learned… how it’s really impacted how we think about our socio-technical systems and how we have to design those systems that are technical, as well as our people systems, in order to make sure that we’re able to run them sustainably and reliably.” “So, I think that we try to create code in order to solve problems that we’re trying to solve, such as some kind of issue going on in the world, whether it is improving productivity or commerce, but the problem is that that code doesn’t exist in a vacuum,” Fong-Jones said.
The end goal should not be just to make systems more reliable, but to “also be friendlier to the people who operate them,” Fong-Jones said. “And you don’t get there by accident; we really have to develop a roadmap and plan to figure out how we get from where we are today, to the ideal world that we’d like to be in in the future.”
Amazon Web Services, Honeycomb and VMware are sponsors of The New Stack.
Feature image via Pixabay.