
When ‘Cron Jobs’ Won’t Work in Your Favor

Cron jobs are not sufficiently stable for backup automation. In this article, we explore how best to back up data in terms of setup, frequency, data retention and general automation best practices.
Jan 5th, 2023 10:00am

Every developer has hit the point where a production launch breaks, or the database information gets corrupted, forcing a rollback to earlier data. If a bug has caused an unforeseen issue, the developer often has to scramble to figure out when the last backup was performed. Was it hours, days or even weeks ago? This is the moment when backup protocols either solve the problem or create security and compliance headaches that spin out of control.

While it is best practice to back up regularly and often, in reality, developers have so much going on that backups can go by the wayside. However, if too much time has passed, fully rolling back to the last backup becomes a risky game of security versus compliance, which can come with significant data loss. In many sectors, any data loss can be catastrophic. Health care and finance are clear examples of industries where data is highly regulated and where losing even one day of data can compromise compliance requirements.

Database admins typically build their own automation with scheduled Linux tasks, or "cron jobs," to run future backups as an added risk-reduction measure that complements their manual backup processes. However, configuring cron jobs to take backups for each environment is not always stable or convenient. In fact, when it comes to cron jobs, database admins often find themselves questioning whether they could reliably roll back to yesterday if they needed to.
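To make the cron-based approach concrete, here is what such scheduled backups typically look like in a crontab. The database names, dump command and paths are illustrative assumptions, not a recommendation:

```shell
# Illustrative crontab entries: a full pg_dump of an assumed "myapp" database
# at 02:00 every night. Note that cron requires "%" to be escaped as "\%".
0 2 * * * pg_dump --format=custom myapp > /var/backups/myapp-$(date +\%F).dump

# A second project means a second, separately maintained entry:
30 2 * * * pg_dump --format=custom otherapp > /var/backups/otherapp-$(date +\%F).dump
```

Every new project or environment needs another hand-written entry like this, which is exactly the per-project overhead described above.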

So what is the sweet spot of backing up data in terms of setup, frequency, data retention and general automation best practices? Let’s explore them in more detail.

A Full Backup, Not a Snapshot

First, let us define what we mean by backups. Although the concept of data backups dates back to the early days of computing, web and IT teams today still feel the pain of cumbersome backup options that take time to set up or develop in-house, only to prove disruptive or unstable when they run. When referencing backups, we are specifically referring to "full" backups: not the environment snapshots a CIS admin might take, but a backup that stores all of the information, including the data in the database at a specific moment.

When ‘Live’ Backups Make the Difference 

Different types of businesses will have different backup needs, but for larger companies, ensuring proper backups can be a big job. One enterprise customer we work with needed 30 rolling backups, each covering 24 hours of data, every day. This is a security requirement for the company, so missing even one backup over the last 30 days is a problem for them.

Recently, they realized they needed to automate their backups differently, as creating their own cron jobs wasn't an efficient practice. For one, the cron-driven backups introduced downtime, so they had to be scheduled within limited windows, which is tricky for a global company operating across multiple time zones. If any other maintenance was going on during that downtime, backups could be lost. Furthermore, with cron jobs they had to set up each job individually for each of their projects, and they usually have a few projects happening simultaneously. Creating a new cron job every time they set up a project was overwhelming. They needed live backups, and they needed them automated.

With automated live backups, the CIS admin does not have to initiate or schedule the backup, and most importantly, the backups can be taken without any downtime. Some businesses choose to combine live backups with standard (manual) backups at specific times, like before a big deployment, to capture everything exactly as it is at that moment. The latter causes a short downtime, but it adds an extra element of control and certainty over the process when necessary. For the enterprise customer above, automation not only provided 30 rolling daily backups, each covering the previous 24 hours, but live backups also let them take multiple backups throughout the day, synced and stored for each month going back a full year. The customer didn't realize how much peace of mind this backup process would give them until they tried it and compared it to what they had originally built on their own.

Daily Backups Are King, but Business Size Matters

When it comes to frequency, there's variation based on company size and market. Daily backups are the recommended minimum for most businesses. For larger businesses, especially ones that handle a significant amount of consumer data or operate in highly regulated industries, more frequent backups are optimal. For many larger enterprises, losing a day's worth of data is unacceptable, so the RPO (recovery point objective) may be as low as six hours: the maximum gap the organization can tolerate between the moment a failure occurs and the most recent full data backup. In the event of a catastrophic system failure, with everything else to manage, restoring the website from the backup of choice should be just the click of a button. The RTO (recovery time objective), or how long a site can be down before it is restored, varies with the size of the backup (restoring 5GB is faster than restoring 500GB), but the sooner the restore is started, the sooner the site is back.
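The RPO arithmetic above can be sketched in a few lines. The numbers here are illustrative assumptions, not the customer's actual figures: the worst-case data loss is simply the gap between backups, so a schedule meets an RPO target when the backup interval does not exceed it.

```shell
#!/bin/sh
# Hedged sketch: worst-case data loss equals the gap between backups, so a
# schedule satisfies an RPO target when the backup interval is no larger
# than the RPO window. Both values below are illustrative assumptions.
rpo_hours=6          # maximum tolerable data loss
interval_hours=4     # how often a full backup is taken

if [ "$interval_hours" -le "$rpo_hours" ]; then
  echo "schedule OK: worst-case loss is ${interval_hours}h (RPO ${rpo_hours}h)"
else
  echo "schedule misses RPO: ${interval_hours}h gap exceeds ${rpo_hours}h target"
fi
```

A six-hour RPO therefore implies at least four full backups per day; daily backups alone would leave up to 24 hours of data at risk.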

Backup Planning for Highly Regulated Industries 

In some markets, such as websites that manage health care or financial data, every data point is extremely important. If patient information or other data is recorded by the hour, losing an entire day's worth is catastrophic and a significant compliance issue. We've had customers come to us after experiencing slight but critical data loss, or after data failed to be recorded following a push, with serious complications in the aftermath.

E-commerce is another industry where rolling back too far can mean losing critical transaction data. Losing data about customers who visit your website to make a purchase sets a dangerous precedent for anyone running an e-commerce site. This is exactly the data project owners care about, so make sure it is retained in a few different ways or with a heightened level of redundancy. Another e-commerce scenario involves incoming traffic and ensuring that emails triggered by a shopper's engagement with the site go through appropriately. Any loss of data about who visited can hurt sales in the short term as well as the long term.

For these industries particularly, cron jobs are not sufficiently stable as a method of backup automation. The difference is mainly in the retention policy: for example, how long patient data must be kept, or how long to store data that isn't used right away but is retained for another reason. These scenarios should be automated via live backups in combination with standard manual backups, and potentially incremental backups as well to manage storage costs.

Default Policy Should Exist for Every Single Project

Many CIS admins are singlehandedly juggling multiple projects and can benefit from having a default policy that they can adjust based on storage needs, as each backup will be quite large but may differ in scale. When a business takes an inconsistent approach to backups, it can result in unnecessary time spent. And in some cases, when only taking backups before deployment, it can lead to unpredictable data loss.

Three Backup Frequency and Automation Recommendations

The businesses we work with typically fit into the following three categories regarding backup frequency and automation options and best practices:

At minimum:

  • A backup should be taken every day with each retained for two days.
  • Two manual backups should be retained until a new one is taken, which would then replace the oldest backup currently stored.

For project owners that need more backups or longer retention, the next step entails:

  • A live backup taken daily, each retained for one week.
  • A weekly backup, each retained for four weeks.
  • A monthly backup, each retained for one year.
  • Four manual backups, all of which are retained until a new one is created and replaces the oldest.

Lastly, for project owners who feel they need more backups and longer retention, we recommend:

  • One live backup every six hours, each retained for one day.
  • A live daily backup, each retained for one month.
  • Twelve monthly backups, each retained for one year.
  • Four manual backups, all of which are retained until a new one is created and replaces the oldest.
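As a sketch of how the minimum tier above might be enforced, a scheduled prune can delete daily backups once they age past the two-day retention window. The directory and filenames here are assumptions for a self-contained demo; in practice you would point this at your real backup location:

```shell
#!/bin/sh
# Hedged sketch of the minimum tier: daily backups, each retained two days.
# The demo directory and file ages are simulated assumptions.
set -e
BACKUP_DIR=$(mktemp -d)
RETENTION_DAYS=2

# simulate three daily backups, one of them past the retention window
touch "$BACKUP_DIR/backup-today.dump"
touch -d "1 day ago"  "$BACKUP_DIR/backup-yesterday.dump"
touch -d "5 days ago" "$BACKUP_DIR/backup-old.dump"

# delete any backup whose age exceeds the retention window
find "$BACKUP_DIR" -name 'backup-*.dump' -mtime +"$RETENTION_DAYS" -delete

ls "$BACKUP_DIR"
```

The richer tiers are the same idea repeated with different windows (one week for daily live backups, four weeks for weeklies, one year for monthlies), plus a separate rule that keeps the last few manual backups regardless of age.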

When it comes to full backups, being able to roll back without losing too much data is paramount. For this to be consistently successful, setup and scheduling should be an easy and seamless process. Automated live backups work best in combination with manual backups. The larger or more complex the business, the more challenging a "build it yourself" approach to backup automation becomes, often lacking stability and reliability across projects and schedules. The purpose of backing up, after all, is to give project owners peace of mind, and when that's not happening, it's time to look at how to standardize, simplify and scale backups across your organization.
