The Fundamentals of Data API Design

Building a resilient data API that can handle the demands of high volumes of real-time data requires a journey back to first principles.
Jun 30th, 2023 11:25am by

APIs are a cornerstone of the modern internet, serving an endless variety of use cases and making it possible to integrate and communicate between systems and applications. A large subset of these APIs is what we call “data APIs”: APIs that enable access to a database through structured queries.

But building effective and robust data APIs is a complex and challenging task, one often underestimated. This is particularly true when your data API is built on top of high volumes of real-time streaming data, accessing billions of historical data records while handling a high number of concurrent client requests.

So building a resilient data API that can handle the demands of high volumes of real-time data requires a journey back to first principles. Let’s take a look at the fundamentals of data API design.

Design Comes First

Whether you’re building public data interfaces or internal data APIs for interconnected microservices, you absolutely must begin with a thoughtful design. An effective data API design sets the foundation for resilient endpoints and a good user experience. You must start by mapping and answering some important questions, like:

  1. What data should my API serve?
  2. How many systems will need concurrent access to the API?
  3. How fast must the API respond?
  4. What information might users pass to the API to get the desired response?

When it comes to data API design, prioritize simplicity and adherence to RESTful principles. Basic and perhaps obvious patterns lead to well-designed data APIs: intuitive endpoints with descriptive names that reflect their purpose, clearly defined data object models, diversified response formats and well-scoped request parameters with appropriate defaults.
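As a minimal sketch of well-scoped request parameters with appropriate defaults, the Python below validates the query string of a hypothetical `GET /v1/events` endpoint before any query reaches the database. The endpoint, the parameter names (`limit`, `offset`, `format`) and the bounds are assumptions chosen for illustration, not part of any particular framework.

```python
from dataclasses import dataclass

ALLOWED_FORMATS = {"json", "csv"}  # hypothetical set of response formats
MAX_LIMIT = 1000                   # hypothetical upper bound on page size


@dataclass(frozen=True)
class EventsQuery:
    """Validated parameters for a hypothetical GET /v1/events endpoint."""
    limit: int = 100
    offset: int = 0
    fmt: str = "json"


def parse_events_query(params: dict) -> EventsQuery:
    """Apply defaults and reject out-of-scope values before querying."""
    try:
        limit = int(params.get("limit", 100))
        offset = int(params.get("offset", 0))
    except (TypeError, ValueError):
        raise ValueError("limit and offset must be integers") from None
    fmt = params.get("format", "json")
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if offset < 0:
        raise ValueError("offset must be zero or greater")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    return EventsQuery(limit=limit, offset=offset, fmt=fmt)
```

A request with no parameters gets the defaults, while a request for `limit=5000` is rejected with a message that states the allowed range, so a client cannot accidentally ask for an unbounded scan.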

As with any API, a well-designed data API is discoverable and intuitive. It lets developers interact with it effortlessly, which saves time and effort and ultimately reduces the number of requests that tax your system. Organize your data in a way that makes sense to your users and create an API that is not only easy to use but also a pleasure to interact with.

Of course, don’t overlook errors and edge cases. Even with the most intuitive API design, errors will happen. Consider how to provide graceful, informative error handling that guides users when issues arise. Comprehensive documentation also plays a vital role in enabling developers to integrate your data API effectively. Do you have a list of best practices for making API requests more effective and efficient? Let your users know.
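One way to make errors informative is to return a consistent, machine-readable body alongside the HTTP status code. The sketch below is only an illustration: the `error_response` helper and the `code`, `message` and `hint` field names are assumptions, not an established standard.

```python
import json
from typing import Tuple


def error_response(status: int, code: str, message: str, hint: str = "") -> Tuple[int, str]:
    """Build a (status, JSON body) pair with a stable error shape.

    The field names are hypothetical; what matters is that every
    endpoint returns the same structure.
    """
    body = {"error": {"code": code, "message": message}}
    if hint:
        # An optional hint tells the caller how to fix the request.
        body["error"]["hint"] = hint
    return status, json.dumps(body)


status, body = error_response(
    400,
    "invalid_limit",
    "limit must be between 1 and 1000",
    hint="Request at most 1000 rows and paginate with offset.",
)
```

A stable error `code` gives client code something to branch on, while the `message` and `hint` are for the developer reading the logs.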

And don’t forget about rate limiting. Every non-cached request to your API will consume compute resources on your database cluster, so take care with access. Decide on the specific time periods for limiting requests, whether it’s per second, minute, day or a combination of these intervals. Given time, there will be that user or application that goes off the rails, either accidentally or maliciously. Protect your server resources against a DDoS (distributed denial-of-service) attack or a rogue junior developer by applying rate limits to avoid service disruptions or massive cloud bills. Include rate limit headers in your responses, and clearly document the limits.
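To make the idea concrete, here is a minimal in-memory sketch of a fixed-window rate limiter that also produces rate limit headers. It assumes a single process and a per-client identifier such as an API key; a production deployment would more likely use a shared store or an API gateway. The `X-RateLimit-*` header names follow a widely used convention rather than a formal standard, and the class name is hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each fixed time window."""

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        # client_id -> (window index, requests used in that window)
        self.state: Dict[str, Tuple[int, int]] = {}

    def check(self, client_id: str) -> Tuple[bool, Dict[str, str]]:
        now = self.clock()
        window_index = int(now // self.window)
        seen_window, used = self.state.get(client_id, (window_index, 0))
        if seen_window != window_index:
            used = 0  # a new window has started, so reset the counter
        allowed = used < self.limit
        if allowed:
            used += 1
        self.state[client_id] = (window_index, used)
        reset_in = int((window_index + 1) * self.window - now)
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - used),
            "X-RateLimit-Reset": str(reset_in),  # seconds until the window ends
        }
        return allowed, headers
```

When `check` returns `False`, the API would typically respond with HTTP 429 (Too Many Requests) and the same headers. Combining intervals, for example per second and per day, can be done by running two limiters and requiring both to allow the request.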

Choose the Right Database

Underpinning every data API is a database, and its performance directly affects the resiliency and usability of your API. The faster and more performant your database, the faster and more performant your data API.

If you are working specifically with high volumes of real-time data, you’ll need a real-time database designed for high write frequency and low-latency reads. If you’re building an API designed to perform complex aggregations at scale, you’ll likely need columnar storage. If you’re building something where latency really matters, like online fraud detection, you need a database that consistently hits performance benchmarks for your use case at scale.

Regardless of your data API’s use case, the database matters. Choose the one that best underpins the purpose your data API serves.

Beyond performance, resilient data APIs require data storage solutions that establish a reliable foundation. This includes support for critical functionalities such as data replication, backups and recovery mechanisms. These features not only ensure data durability but also provide safeguards against potential disruptions, offering peace of mind when it comes to the integrity and availability of your data.

Plan for Scale from the Start

The choice of database dovetails nicely into a discussion about scale. General-purpose databases like Postgres can handle a wide variety of cases at a small scale, but that variety shrinks considerably at a large scale.

Building an effective and maintainable data API means considering scale from the start. By doing so, you ensure that your data API can handle increased data and request volumes without sacrificing performance. This is your chance to live a little. Assume your API will be a smashing success, and then consider whether your chosen database will serve the API at the scale of success. You’ll avoid painful database migrations at a time when you should be thinking about growth.

Along with a database that has the required query performance, your data API also needs to scale horizontally to handle the ever-increasing number of requests that result from your inevitably wonderful design. Look for a database engine that makes it easy to implement sharding or parallel replication with distributed query engines to run queries across multiple machines.
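Databases that support sharding usually handle routing for you, but the underlying idea is simple enough to sketch. The hypothetical helper below assigns each record to one of several shards by hashing a key such as a customer ID; the function name and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index in the range 0..shard_count-1.

    A cryptographic hash is used only because it is stable across
    processes and machines, unlike Python's built-in hash() for strings.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# The same key always lands on the same shard.
shard = shard_for("customer-42", shard_count=8)
```

A plain modulo scheme like this one moves most keys when the shard count changes, which is one reason to prefer a database engine that manages resharding itself.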

Load balancing your data API on the frontend is equally important. By deploying API instances across multiple servers or leveraging containerization technologies, you can distribute the workload and allocate resources effectively. Proper load balancing helps maintain performance and responsiveness, even during sudden spikes in traffic. Automatic scaling based on traffic patterns further enhances the API’s ability to adapt to varying demands.

Don’t Overlook Observability

The best time to invest in observability is not when you face your first severe traffic spike. To build a resilient data API, you must choose effective monitoring and logging tools at the outset.

From the day your API goes public, you’ll need to understand how it’s performing. Good observability tools and practices let you see key metrics in real time, such as response time, error rates, and data and query throughput. You can also use these tools to measure data quality and ensure the underlying data models accessed by your API remain valid.
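Dedicated monitoring tools compute these metrics for you, but a small sketch shows what is being measured. The hypothetical `EndpointMetrics` class below records one latency and status code per request, then reports an error rate and a nearest-rank latency percentile. Counting only 5xx responses as errors is an assumption; some teams also track 4xx rates.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class EndpointMetrics:
    """In-memory request metrics for a single endpoint."""
    latencies_ms: List[float] = field(default_factory=list)
    errors: int = 0

    def record(self, latency_ms: float, status: int) -> None:
        self.latencies_ms.append(latency_ms)
        if status >= 500:  # assumption: only server errors count
            self.errors += 1

    def error_rate(self) -> float:
        total = len(self.latencies_ms)
        return self.errors / total if total else 0.0

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile of recorded latencies, p in (0, 100]."""
        if not self.latencies_ms:
            return 0.0
        ordered = sorted(self.latencies_ms)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]
```

Percentiles such as p95 are often more telling than averages, because a handful of slow queries can hide behind a healthy mean.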

By proactively tracking and analyzing these metrics, you can identify performance bottlenecks, optimize API behavior and ensure high availability and responsiveness. And ideally, your teams will know about system issues before your customers do.

Monitoring features are also important for your API consumers. The metadata provided to them about how they are using the API will help them make better design and implementation decisions. Help them understand the details and your entire platform will reap the benefits.

Putting It All Together

Of course, there are many other important considerations when designing and implementing a data API. You must decide which languages and frameworks to use, and all APIs demand authorization and authentication mechanisms to provide data privacy and security. You also need to decide whether you will host your API components yourself, build everything using cloud services or adopt a hybrid approach. Each option has its own trade-offs in terms of scalability, maintenance, cost and deployment flexibility.

There are many things to think about when building any API, but when it comes to data APIs, focus on design, choosing the right database, dynamic scaling and load balancing, and solid observability. If you keep these fundamentals in mind before you start building, your chances of success with data APIs will be much higher.

Stay focused, and build a reliable, scalable and performant data API.
