The API Bottleneck in SaaS Backup

Enterprise IT has crossed a threshold: most organizations now realize that they need to protect their SaaS data. In the 2021 Evolution of Data Protection Cloud Strategies report from analyst firm ESG, 64% of IT decision-makers surveyed said that they are partially or fully responsible for backing up the data they have in SaaS applications.
Of course, that leaves more than one-third (35%) who say they depend solely on their SaaS vendor to protect their organization’s data, which is far too high. While SaaS vendors do take great care to protect their infrastructure, they don’t typically back up customer data, so if data is accidentally or maliciously deleted, the customer is likely out of luck.
Plus, substantial SaaS outages do happen, even with well-known services. Just recently, Atlassian suffered an outage in April that wasn’t fully resolved for two weeks. Without backups, that data is completely inaccessible.
Backing up SaaS data, however, is an altogether different beast from traditional, on-premises data protection, in particular because SaaS backup depends on a limited resource: APIs. IT does not have full control over the application or the data, because the data sits in an offsite service on equipment that the IT team doesn’t manage.
So, to get that data, any SaaS backup system will need to access it via APIs. The problem is that API calls are capped, which means IT teams must ensure they’re choosing the right API for the job. To complicate matters, APIs change over time. This complexity means that enterprise IT teams must carefully consider how they use APIs and develop strategies for managing that use.
Hard API Caps
Most SaaS apps operate on a multi-tenant basis, so multiple customers share the same resources, and that includes the APIs. As a result, it’s common for SaaS providers to put a limit on how many API calls any single customer can make in a 24-hour period to ensure that adequate resources are available for everyone, and that individual customers don’t consume a disproportionate amount.
The upshot is that IT has a hard cap on how much it can use each API for backup. That’s an important consideration, because these APIs aren’t exclusive to backup: they’re also used for sharing information with other apps that IT has integrated with the SaaS application. And especially in the case of a core SaaS application, such as Salesforce or an ERP, there will be many other applications and services that depend on those APIs.
To optimize for speed, it’s critical to select the API best suited to the data being backed up or restored. For example, in Salesforce, the REST API can move 1 million records per hour, whereas the BULK API can move 10 million records per hour. And if IT makes parallel API calls to multiplex data out of Salesforce, REST can achieve a maximum of 10 million records per hour, while BULK can hit a maximum of 300 million, depending on the complexity and size of the object.
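To see what those rates mean for a backup window, here is a rough back-of-the-envelope calculation in Python. The rates are the ones quoted above; the function names and the 50-million-record data set are illustrative assumptions, and real throughput will vary with the complexity and size of the objects.

```python
# Records per hour, taken from the figures quoted in this article.
# Keys are (api, parallel) pairs; the names are illustrative only.
RECORDS_PER_HOUR = {
    ("REST", False): 1_000_000,
    ("BULK", False): 10_000_000,
    ("REST", True): 10_000_000,    # quoted as a maximum
    ("BULK", True): 300_000_000,   # quoted as a maximum
}


def hours_to_move(record_count, api, parallel=False):
    """Best-case hours needed to move record_count records."""
    return record_count / RECORDS_PER_HOUR[(api, parallel)]


def backups_per_day(record_count, api, parallel=False):
    """Whole number of full backups that fit into 24 hours."""
    rate = RECORDS_PER_HOUR[(api, parallel)]
    # Integer arithmetic avoids floating-point rounding surprises.
    return (24 * rate) // record_count


if __name__ == "__main__":
    records = 50_000_000  # hypothetical data set size
    for api in ("REST", "BULK"):
        print(api, hours_to_move(records, api), backups_per_day(records, api))
```

Under these assumptions, a 50-million-record data set needs about 50 hours over serial REST but about 5 hours over serial BULK: the difference between less than one full backup a day and four.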
The takeaway is that the choice of API can make an enormous difference to an organization’s recovery point objectives (RPOs). The faster you can perform backups, the more backups you can do in one day.
But that doesn’t mean an organization can depend on BULK APIs alone. In order to pull out and restore all of an enterprise’s data in a timely manner, backup systems must take full advantage of all of the APIs available, such as REST, BULK, BULK V2 and SOAP. After all, some kinds of objects cannot be accessed by BULK APIs, such as share objects, for which IT will need to use the REST API.
Plus, BULK APIs are precious to IT organizations, because they’ll almost certainly have other systems that use BULK APIs, and it can be easy to run up against that hard cap. So, IT will need to balance its use of APIs and manage how much it uses each.
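One way to picture that balancing act is a simple daily budget that sets aside a share of the cap for other integrations and lets backup jobs draw only on what is left. The sketch below is a minimal illustration, not how any particular backup product works; the class name, fields, and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DailyApiBudget:
    """Tracks backup usage of one API against a 24-hour cap (hypothetical)."""

    daily_cap: int            # calls allowed per 24 hours under the license
    reserved_for_others: int  # calls left untouched for other integrations
    used_by_backup: int = 0

    @property
    def backup_allowance(self):
        # Backup may only use what is not reserved for other systems.
        return max(self.daily_cap - self.reserved_for_others, 0)

    def remaining(self):
        return self.backup_allowance - self.used_by_backup

    def try_consume(self, calls):
        """Record the calls and return True, or return False if over budget."""
        if calls < 0:
            raise ValueError("calls must be non-negative")
        if calls > self.remaining():
            return False
        self.used_by_backup += calls
        return True


if __name__ == "__main__":
    budget = DailyApiBudget(daily_cap=100_000, reserved_for_others=60_000)
    print(budget.try_consume(30_000), budget.remaining())
    print(budget.try_consume(15_000), budget.remaining())
```

A backup scheduler built this way would defer or reroute a job when `try_consume` returns False rather than starving the other systems that share the cap.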
The Problem of Relying on a Single API
Additionally, just because an API can read data, that doesn’t mean it can write that data, which has enormous implications for restore. It may be that certain objects can be restored with another type of API, but unfortunately, there may very well be some data that cannot be restored due to architectural limitations within the SaaS application itself.
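In practice, this means a backup system needs some kind of capability map that records which API can read and which can write each object, and that falls back to a slower API when the preferred one is unsupported or exhausted. The sketch below assumes a hand-written table; the object entries and the preference order are illustrative assumptions that follow this article's description, not an authoritative list of what any Salesforce API supports.

```python
from typing import Optional

# Fastest first, per the throughput figures quoted in this article.
PREFERENCE = ["BULK", "REST"]

# Hypothetical capability table: object -> operation -> APIs that support it.
CAPABILITIES = {
    "Account": {"read": {"BULK", "REST"}, "write": {"BULK", "REST"}},
    # A share object, which this article says BULK cannot access.
    "AccountShare": {"read": {"REST"}, "write": {"REST"}},
    # Made-up object that can be read but not restored through any API.
    "ReadOnlyExample": {"read": {"BULK", "REST"}, "write": set()},
}


def pick_api(object_name, operation, exhausted=frozenset()) -> Optional[str]:
    """Return the fastest usable API, or None if the operation is impossible."""
    supported = CAPABILITIES.get(object_name, {}).get(operation, set())
    for api in PREFERENCE:
        if api in supported and api not in exhausted:
            return api
    return None


if __name__ == "__main__":
    print(pick_api("Account", "read"))
    print(pick_api("AccountShare", "read"))
    print(pick_api("ReadOnlyExample", "write"))
```

A result of None for a write operation is the case described above: data that was backed up but cannot be restored through any available API.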
The limits on how much IT can use each API are not set in stone, however. The actual API limits are based on the customer’s license agreement with the provider. So, it’s important to model out how much IT expects it will need to use each API. And don’t forget about API needs for restoring large amounts of data. Restoration can really eat up API resources, and no one wants to hit an API cap in the midst of a critical restore with business managers chomping at the bit to get back to working in their apps.
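A first-pass model of that need can be as simple as adding up the calls expected from other integrations, daily backups, and a worst-case restore, then comparing the total with the licensed cap. The sketch below is only a planning aid under stated assumptions: every number, including the records-per-call batch size, is hypothetical and would have to come from the organization's own measurements and license terms.

```python
def calls_needed(record_count, records_per_call):
    """Calls required to move record_count records (ceiling division)."""
    if records_per_call <= 0:
        raise ValueError("records_per_call must be positive")
    return -(-record_count // records_per_call)


def model_daily_usage(daily_cap, other_integration_calls,
                      backup_records, restore_records, records_per_call):
    """Estimate one day's calls when a large restore lands on a backup day."""
    backup_calls = calls_needed(backup_records, records_per_call)
    restore_calls = calls_needed(restore_records, records_per_call)
    total = other_integration_calls + backup_calls + restore_calls
    return {
        "backup_calls": backup_calls,
        "restore_calls": restore_calls,
        "total_calls": total,
        "headroom": daily_cap - total,  # negative means the cap is exceeded
        "fits": total <= daily_cap,
    }


if __name__ == "__main__":
    plan = model_daily_usage(
        daily_cap=100_000,               # hypothetical licensed cap
        other_integration_calls=60_000,  # hypothetical existing usage
        backup_records=40_000_000,
        restore_records=20_000_000,
        records_per_call=2_000,          # hypothetical batch size
    )
    print(plan)
```

If the model shows negative headroom for a realistic restore, that is the signal to renegotiate the license or rebalance which APIs the backup uses before an incident, not during one.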
Finally, APIs change. SaaS backup systems must adapt to these changes, which adds complexity to an already complicated scheme for managing and optimizing APIs for data protection.
While organizations are definitely coming to understand the need to back up their data in SaaS services, they may not be aware of how complex it is to build a SaaS backup system that protects all data in a way that meets their RTOs. Carefully considering API use and developing strategies for managing it has to be a critical part of building any SaaS backup solution.