Can Companies Really Self-Host at Scale?
There’s no such thing as free lunch, or in this case, free software. It’s a myth. Paul Vixie, vice president of security at Amazon Web Services, creator of the original Domain Name System (DNS), gave a compelling presentation at Open Source Summit Europe 2022 about this topic. His presentation included a comprehensive list of “dos and don’ts” for consumers of free software. Vixie’s docket included labor-intensive, often expensive, engineering work that ran the gamut of small routine upgrades to locally maintaining orphaned dependencies.
To sum the “dos and don’ts” up in one sentence though, engineer(s) are always working, monitoring, watching and ready for action. This “ready for action” engineer must have high-level expertise so that they can handle anything that comes their way. Free software isn’t inherently bad, and it definitely works. Identifying the hidden costs of selecting software also applies to the decision to self-host a database. Self-hosting is effective for many companies. But when is it time to let go and try the easier way?
What Is a Self-Hosted Database?
Self-hosted databases come in many forms. Locally hosted open source databases are the most obvious example. However, many commercial database products have tiered packages that include self-managed options. On-premises hosting comes with pros and cons: low security risk, the ability to work directly beside the data and complete control over the database are a few advantages. There is, of course, the problem with scaling. Self-hosting creates challenges for any business or developer team with spiky or unreliable traffic because on-demand scaling is impossible. Database engineers must always account for the highest amount of traffic with on-premises servers or otherwise risk an outage in the event of a traffic spike.
For businesses that want to self-host and scale on demand, self-hosting in the cloud is another option. This option allows businesses with spiky or less predictable traffic to scale alongside their needs. When self-hosting in the cloud, the cloud provider installs and hosts their database on a virtual machine in a traditional deployment model. When you’re hosting a commercial database in the cloud, support for cloud and the database is minimal because self-hosted always means your engineering resources helm the project. This extends to emergencies like outages and even security breaches.
The Skills Gap
There are many skilled professionals with experience managing databases at scale on-premises and in the cloud. SQL databases were the de facto database for decades. Now, with the rise of more purpose-built databases geared toward deriving maximum value from the data points they’re storing, the marketplace is shifting. Newer database types that are gaining a foothold within the community are columnar databases, search engine databases, graph databases and time series databases. Now developers familiar with these technologies can choose what they want to do with their expertise.
Time Series Data
Gradient Flow expects the global market for time series analysis software will grow at a compound annual rate of 11.5% from 2020 to 2027. Time series data is a vast category and includes any data with a timestamp. Businesses collect time series data from the physical world through items like consumer Internet of Things (IoT), industrial IoT and factory equipment. Time series data originating from online sources include observability metrics, logs, traces, security monitoring and DevOps performance monitoring. Time series data powers real-time dashboards, decision-making and statistical and machine learning models that heavily influence many artificial intelligence applications.
Bridging the Skills Gap
InfluxDB 3.0 is a purpose-built time series database that ingests, stores and analyzes all types of time series data in a single datastore, including metrics, events and traces. It’s built on top of Apache Arrow and optimized for scale and performance, which allows for real-time query responses. InfluxDB has native SQL support and open source extensibility and interoperability with data science tools.
InfluxDB Cloud Dedicated is a fully managed, single-tenant instance of InfluxDB created for customers who require privacy and customization without the challenges of self-hosting. The dedicated infrastructure is resilient and scalable with built-in, multi-tier data durability with 2x data replication. Managed services mean around-the-clock support, automated patches and version updates. A higher level of customization is also a characteristic of InfluxDB Cloud Dedicated. Customers choose the cluster tier that best matches their data and workloads for their dedicated private cloud resources. From the many customizable characteristics, increased query timeouts and in-memory caching are two.
It’s up to every organization to decide whether to self-manage or choose a managed database. Decision-makers and engineers must have a deep understanding of the organization’s needs, traffic flow patterns, engineering skills and resources and characteristics of the data before reaching the best decision.