Real-Time Databases: Who Is Using Them, and Why?
Almost every developer in need of a database will first turn to familiar favorites: Postgres, MySQL or MongoDB. These traditional databases are well-supported across a variety of programming languages, generally free or inexpensive to start and quite easy to implement and host thanks to decades of practical experience and community support.
But in today’s data-driven world, these databases won’t always work. The analytics pioneered by data science, data engineering and business intelligence are now being embedded into applications built by software developers, supporting real-time, user-facing features like in-product analytics, in-session personalization, anomaly detection and alerting, inventory management, sports betting optimization, usage-based pricing and more.
These so-called “real-time applications” require a new class of database, one better suited to support event-driven architectures, high-concurrency and low-latency connections, and the necessary speed and scale to handle sophisticated queries over large amounts of data.
That’s where real-time databases come in.
What Is a Real-Time Database?
At a high level, real-time databases are designed to support the following characteristics:
- High-frequency writes
- A high number of concurrent reads
- Complex analytical queries that typically involve filtering, joining and aggregating data
- Sub-second response times on those queries
By solving for these four factors, real-time databases fit neatly into event-driven architectures that support the user-facing features of real-time applications.
To draw a technical distinction, real-time databases typically use distributed columnar storage that minimizes read times when filtering and aggregating columns. In contrast, traditional relational databases rely on row-oriented storage that’s optimized for single-row look-up.
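To make the storage distinction concrete, here is a minimal Python sketch of the two layouts. It is an illustration only, not how ClickHouse, Pinot or Druid store data internally. The field names (event_id, region, sales_price) and the helper functions are hypothetical.

```python
from array import array

# Row-oriented layout: each record is kept together, which suits
# "fetch everything about event 2" style lookups.
rows = [
    {"event_id": 1, "region": "eu", "sales_price": 10.0},
    {"event_id": 2, "region": "us", "sales_price": 25.5},
    {"event_id": 3, "region": "eu", "sales_price": 4.5},
]

# Column-oriented layout: each column is kept contiguously, so an
# aggregate over one column never has to touch the others.
columns = {
    "event_id": array("q", [1, 2, 3]),
    "region": ["eu", "us", "eu"],
    "sales_price": array("d", [10.0, 25.5, 4.5]),
}


def sum_row_store(records):
    """Aggregate by walking every full record and picking out one field."""
    return sum(record["sales_price"] for record in records)


def sum_column_store(cols):
    """Aggregate by scanning only the single column that is needed."""
    return sum(cols["sales_price"])


row_total = sum_row_store(rows)
column_total = sum_column_store(columns)
```

Both functions return the same total. The difference is how much data each has to read to get there, which is what dominates once a table has many columns and billions of rows.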
The three most commonly used open source real-time databases are ClickHouse, Apache Pinot and Apache Druid. You can read more about these three databases, their use cases and how they compare to traditional relational databases here.
Who Uses Real-Time Databases?
ClickHouse was built from scratch by Yandex to power Yandex.Metrica, its Google Analytics alternative. If you, like many, have experienced the frustration of slow Google Analytics queries that lack real-time data, you’ll understand what prompted Yandex to invest in building a new database.
ClickHouse has become more popular over the last few years. Cloudflare famously adopted it to power its user-facing analytics for HTTP traffic, and companies like eBay and Uber continue to use ClickHouse to power more and more real-time analytical workloads.
Similarly, companies like Vercel and the Hotels Network have scaled with ClickHouse using Tinybird as a real-time data platform to support use cases like real-time session personalization, in-product analytics, anomaly detection and alerting, and usage-based billing.
Like Yandex, LinkedIn developed its own real-time database, Pinot, to power user-facing features like Who Viewed My Profile, eventually donating it to the Apache Software Foundation. Uber uses Pinot for a variety of internal and external analytics features, ranging from the Uber Eats Restaurant Manager to internal exploratory analytics for business and engineering teams.
Why Do Real-Time Databases Matter?
At a small scale, row-oriented databases like Postgres or MySQL could work as a real-time database. Remember, the definition of “real-time” depends on your context. If you have few records and simple queries that don’t demand highly concurrent reads with sub-second response latency, then you’d be just fine using them. For more information on when Postgres could be used as a real-time database, read this.
But if you’ve tried to use something like Postgres for real-time applications, then you understand that it fails at scale. Postgres simply cannot return the result of a query like SELECT sum(sales_price) FROM sales_events in under a second when sales_events has millions or billions of rows, regardless of its configuration, and caching provides little benefit when many new events are being generated every second.
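The caching point can be sketched in a few lines of Python. This example uses an in-memory SQLite database purely as a stand-in (it is neither Postgres nor ClickHouse). The table schema and the total_sales helper are assumptions made for illustration; only the query text comes from the paragraph above.

```python
import sqlite3

# In-memory SQLite stands in for the database; the schema is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_events (id INTEGER PRIMARY KEY, sales_price REAL)"
)
conn.executemany(
    "INSERT INTO sales_events (sales_price) VALUES (?)",
    [(10.0,), (20.0,), (30.0,)],
)

QUERY = "SELECT sum(sales_price) FROM sales_events"


def total_sales(connection):
    """Run the full aggregate; its cost grows with the number of rows."""
    return connection.execute(QUERY).fetchone()[0]


# Suppose the application caches this result to avoid re-running the query.
cached_total = total_sales(conn)

# A new event arrives, and the cached value no longer matches the table.
conn.execute("INSERT INTO sales_events (sales_price) VALUES (?)", (5.0,))
fresh_total = total_sales(conn)

cache_is_stale = cached_total != fresh_total
```

With new events arriving every second, a cached total is out of date almost as soon as it is stored, so the application ends up re-running the full aggregate anyway.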
Real-time databases like ClickHouse, on the other hand, can handle this problem, and quite easily.
Notably, real-time databases are the foundation for numerous real-time use cases across various industries, including customer care, fraud detection and prevention, monitoring and alerting systems, Internet of Things and sensor data management, social media and messaging applications, gaming, sports betting, location-based services and inventory management.
Should You Make the Switch to Real-Time Databases Now?
Large enterprises like Netflix, Uber, Cloudflare and LinkedIn have pioneered the use of real-time databases to offer a diverse range of previously unachievable features that are now integral to their platform performance and user experience.
But 99.9% of us don’t have the resources or expertise of Netflix or Uber to take the headlong plunge into real time, so the familiar comforts of Postgres, MySQL and MongoDB beckon.
That said, there has never been a better time to embrace real-time databases and the real-time applications they unlock. The costs have never been lower, the developer experience has never been easier, and real-time databases have never been more powerful.
So, while early adopters of real-time databases tend to be large enterprises far along on the data maturity scale, the next wave of real-time database users will be software developers or data engineers at small- to mid-sized companies.
It’s the classic tale we’ve seen play out many times in our industry: A sophisticated technology is made simpler over time by high-end users, and its access, costs and capabilities are democratized for everyone. It happened with Postgres, and it will happen with real-time databases.
How Do I Get Started with Real-Time Databases?
You may have considered real-time databases at some point but decided against them because of their cost and complexity. And added cost and complexity can make any engineer nervous.
To calm your nerves, you can move to real-time databases using a managed service. All the open source real-time databases listed above are available through managed services that let you leverage their power while abstracting some or most of their complexities.
However, even with these DBaaS (Database as a Service) providers, you must still understand how to scale each system to handle your workload and how to integrate with your source data feeds and applications.
Platforms like Tinybird, on the other hand, function as full-fledged development platforms on top of real-time databases. Tinybird, which is built on ClickHouse, transparently handles cluster operations such as scaling, load balancing and distributed storage through a serverless scale-up/down model. Additionally, Tinybird provides abstractions on top of the database like managed ETL (extract, transform, load) and streaming connectors, multiple development interfaces (UI, CLI and API), token management and a rapid API development framework.
Real-time data platforms not only simplify the complexity of the database but also make it much easier to integrate with existing technology and future architectures. As such, they serve an important role in the real-time database landscape by offering developers a low-friction pathway to adopting real-time mindsets and development patterns, unlocking new use cases faster and with less overhead.