NS1: Avoid the Trap of DNS Single-Point-of-Failure

Third-party DNS providers have consolidated considerably over the past few years, leaving a smaller pool of providers to handle lookups for the world’s largest websites. Relying on just one of these few providers also heightens the risk in the event of a distributed denial of service (DDoS) attack or DNS outage.
According to a study by researchers at Carnegie Mellon University, 89.2% of the top 100,000 websites ranked by Alexa rely on a single provider for their DNS service instead of using their own service with added redundancy.
This finding underscores how the vast majority of organizations with the highest-traffic websites would be disrupted by an attack on, or an outage at, their DNS provider, since they lack redundancy in their DNS services.
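To make the idea of single-provider reliance concrete, here is a minimal sketch that groups a domain's nameserver hostnames by provider and flags domains served by only one. It is a naive heuristic and not the researchers' methodology: it assumes the last two labels of a nameserver hostname identify the provider, which breaks for providers that spread nameservers across several domains. The hostnames and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def provider_key(nameserver: str) -> str:
    """Naive provider label: the last two DNS labels of the hostname.

    Assumption for illustration only; real classification needs more
    than hostname matching.
    """
    labels = nameserver.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def group_by_provider(nameservers: Iterable[str]) -> Dict[str, List[str]]:
    """Group nameserver hostnames under their naive provider label."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ns in nameservers:
        groups[provider_key(ns)].append(ns)
    return dict(groups)


def has_redundant_dns(nameservers: Iterable[str]) -> bool:
    """True when the nameservers span at least two provider labels."""
    return len(group_by_provider(nameservers)) >= 2


# Hypothetical NS record sets (the .example TLD is reserved for documentation).
SINGLE_PROVIDER = ["ns1.dns-a.example.", "ns2.dns-a.example."]
DUAL_PROVIDER = ["ns1.dns-a.example.", "ns1.dns-b.example."]

if __name__ == "__main__":
    for label, ns_set in (("single", SINGLE_PROVIDER), ("dual", DUAL_PROVIDER)):
        print(label, group_by_provider(ns_set), has_redundant_dns(ns_set))
```

In practice the nameserver list would come from an NS lookup for the domain. The sketch takes it as input so the logic can be read and tested without network access.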
Additionally, other websites are indirectly impacted by popular CDN and DNS providers — domain names ending in the academia “edu,” for example, rely on Amazon Web Services (AWS) DNS service through CDN MaxCDN, the researchers noted. A DDoS attack in 2019 on Amazon’s DNS servers resulted in hours of network disruption for Amazon as well as for a “significant” number of popular websites, the researchers said. Also, since a DDoS attack disrupted DNS provider Dyn’s service in 2016, the dependency on single-source DNS providers has increased by 4.7%.
“DNS is a significant target because of the critical role it plays in modern infrastructure. Further, due to DNS’s central role in orchestrating all internet and application traffic, the damage malicious actors can do by carrying out attacks against DNS is greater relative to other attacks,” Shannon Weyrick, vice president of architecture at NS1, told The New Stack. “Although it has long been considered a best practice, this research shows that implementing redundancy at the DNS level is still not widespread. The need for DNS redundancy is the key takeaway from the paper.”
Aqsa Kashaf of Carnegie Mellon University discusses the results of “Analyzing Third Party Service Dependencies in Modern Web Services: Have We Learned from the Mirai-Dyn Incident?”
The main takeaway for SREs and DevOps teams is that building redundancy into DNS is key to reducing the single-point-of-failure risk faced by the vast majority of organizations running the world’s largest websites.
As DevOps teams seek to radically boost the cadence at which they deploy and update applications, the risk of a DNS failure represents a major yet avoidable source of disruption.
“DevOps teams are delivering code more than 40 times faster than traditional application development. It is important that DNS is available and able to keep up with that velocity,” Weyrick said. “It is equally important for DevOps teams to ensure DNS is resilient so that they are able to operate at optimal efficiency.”
Additionally, cloud architectures, microservices and infrastructure scale have often exposed unmet needs in DNS, such as the need for flexible traffic management, service discovery, rapid propagation and comprehensive API support, Weyrick said. In DevOps environments, application teams often implement network functions, such as load balancing, within their applications, he added.
DNS redundancy and other managed DNS and VPN connection services can help DevOps teams avoid the single-point-of-failure risks associated with DNS disruptions, Weyrick said. For site reliability engineers (SREs), for example, DNS has become “an important leverage tool” since it allows them to control and automate application traffic to ensure maximum performance and uptime, Weyrick said.
“We work with our customers to apply logic in order to steer or manipulate traffic based on business policies, driven by real-time data and telemetry. We are essentially ‘pulling levers’ to steer traffic to boost performance, control costs, or route around problems to avoid downtime,” Weyrick said. “For SREs to ensure their digital services, sites, and applications remain resilient, they need to eliminate single points of failure in the application delivery stack — in particular DNS. Application developers, site reliability engineers, and IT leaders who oversee digital applications should make DNS resilience a priority.”
According to Carnegie Mellon, NS1 customers have a higher rate of redundancy than many other providers, Weyrick said. “This is because NS1 offers separate managed and dedicated DNS services, which makes it easy for customers to leverage two independent DNS services without any cross-provider technical limitations,” Weyrick said. “NS1 has also focused its efforts in educating customers about the importance of redundancy throughout every layer of the application stack.”
Among DNS resilience best practices, Weyrick noted, are the following:
- Diversify for Added Resiliency: Use a DNS solution that is independent of your cloud, CDN or data center. If the provider goes down, you will still have a functioning DNS to direct users to your other facilities, which builds resiliency into the entire application delivery stack.
- Practice Redundancy: It is important to have redundancy at every level of infrastructure, including the DNS host. DNS redundancy ensures that if one DNS network comes under duress, the other will take over the queries for the pair so that none go unanswered. Organizations that deploy always-on, redundant DNS networks for their domains can prevent outages and recover much faster from DDoS and other attacks.
- Leverage an Anycast Network for DNS Resolution: Anycast is a highly resilient routing method. As soon as servers go down, are impacted by DDoS or become unavailable due to global connectivity issues (e.g. a cut fiber or congestion in a certain Internet segment), Anycast dynamically diverts DNS requests to an available region.
- DDoS Overage Protection: DDoS or other malicious traffic often leaves enterprises with surprise overage charges from DNS providers, sometimes up to hundreds of thousands of dollars. When selecting a DNS provider, consider whether it offers overage protection to reduce or eliminate attack mitigation and resolution costs related to DNS overages.
- Automate DNS Management Processes: Reduce manual errors and improve resiliency by automating DNS management and embedding intelligent decision-making and traffic steering capabilities within networking infrastructure.
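As one illustration of how the redundancy and automation practices above fit together: a second provider only helps if it serves the same records as the first, so an automated consistency check is a natural starting point. The sketch below compares two record sets and reports any drift. It assumes the records have already been exported from each provider as `(name, type, value)` tuples. The function names and sample data are hypothetical, and no real provider API is used.

```python
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, str, str]   # (name, record type, value)
Key = Tuple[str, str]           # (normalized name, record type)


def normalize(records: Iterable[Record]) -> Dict[Key, Set[str]]:
    """Index records by (lowercased name without trailing dot, upper-case type)."""
    indexed: Dict[Key, Set[str]] = {}
    for name, rtype, value in records:
        key = (name.rstrip(".").lower(), rtype.upper())
        indexed.setdefault(key, set()).add(value)
    return indexed


def zone_drift(primary: Iterable[Record],
               secondary: Iterable[Record]) -> Dict[Key, Dict[str, List[str]]]:
    """Return the record sets that differ between the two providers."""
    a, b = normalize(primary), normalize(secondary)
    drift: Dict[Key, Dict[str, List[str]]] = {}
    for key in sorted(a.keys() | b.keys()):
        in_a, in_b = a.get(key, set()), b.get(key, set())
        if in_a != in_b:
            drift[key] = {
                "only_primary": sorted(in_a - in_b),
                "only_secondary": sorted(in_b - in_a),
            }
    return drift


# Hypothetical exports; 192.0.2.0/24 is reserved for documentation.
PRIMARY = [
    ("www.example.com.", "A", "192.0.2.10"),
    ("example.com", "MX", "10 mail.example.com."),
]
SECONDARY = [
    ("WWW.example.com", "a", "192.0.2.10"),
    ("example.com", "MX", "10 mail.example.com."),
    ("api.example.com", "A", "192.0.2.20"),
]

if __name__ == "__main__":
    for key, diff in zone_drift(PRIMARY, SECONDARY).items():
        print(key, diff)
```

A check like this could run in a deployment pipeline after each DNS change, failing the run when the two providers disagree. How records are exported and how often the check runs depend on the providers involved.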
Amazon Web Services (AWS) and NS1 are sponsors of The New Stack.
Feature image via Pixabay.