Which AWS Technology Should You Use for Your Open Source Database?
Many enterprises are moving to the cloud as a way of controlling costs, reducing complexity, meeting new business goals and getting products and services to market faster. New infrastructure paradigms like Platform as a Service (PaaS), Infrastructure as a Service (IaaS) and Software as a Service (SaaS) are not only becoming more common but are in fact becoming the normal way of operating.
Moving to the cloud allows enterprises to focus less on managing operations and more on developing and producing products.
One of the cloud services many enterprises are employing is Database as a Service (DBaaS), or offloading the infrastructure, management and maintenance of their database to a cloud provider.
One of the most well-known and trusted cloud providers is Amazon Web Services (AWS). AWS has a multitude of cloud service options, and the three most relevant when it comes to DBaaS are Amazon RDS for MySQL (or other open source database), Amazon Aurora, and Amazon EC2 (or, really, a roll-your-own arrangement).
In this article, we’ll look at these options and suggest what might be the reasons you would choose one over the other for various architectures and use case scenarios.
Amazon RDS for MySQL (or Other Open Source SQL Databases)
Amazon Relational Database Service (Amazon RDS) is a web service designed to set up, manage and scale a relational database in the cloud. Amazon RDS assumes responsibility for some of the day-to-day management and maintenance tasks associated with MySQL databases. This includes:
- Software patches
- Automatic failure detection
With a hardware purchase for hosting a server, you are buying a fixed amount of CPU, memory, storage, and IOPS. These will eventually become outdated (or break down over time). With Amazon RDS cloud offerings, resources can be scaled (up or down) independently.
Amazon RDS doesn’t provide shell access to the underlying database instances and restricts access to some of the system procedures and tables that require advanced (SUPER) privileges. You can manually create backup snapshots or automate your backup process so that you can restore your database in the event of an emergency.
With Amazon RDS, you can use open source database software other than MySQL (MariaDB, PostgreSQL) as well as commercial software (Oracle, Microsoft SQL Server). You can also use Amazon’s MySQL-compatible Amazon Aurora DB engine (discussed in further detail below).
In addition to the security in your database package, you can help control who can access your Amazon RDS databases by using AWS Identity and Access Management (IAM) to define users and permissions. You can also help protect your databases by putting them in a virtual private cloud.
Scaling Reads with Amazon RDS for MySQL
One of the bigger issues many enterprises face is scaling reads of the database to match application usage (without impacting performance). As application use increases, and more calls to the database are made, if the database isn’t properly configured to handle the increased load it can slow down application response times. This could impact the user experience.
For Amazon RDS, scaling is addressed by:
- Providing up to five read replicas
- Two tiers (each with 30 read replicas)
- Normal asynchronous replication
However, Amazon RDS doesn’t split reads and writes across separate instances for you, nor does it provide traffic load balancing. Percona has recommendations on how to do this using ProxySQL or HAProxy; see How to Implement ProxySQL with AWS Aurora.
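The idea behind read/write splitting can be illustrated with a small Python sketch. This is not how ProxySQL or HAProxy is implemented or configured; the `ReadWriteRouter` class, its method names, and the host names are hypothetical, and the statement classification is deliberately naive (a real proxy also has to deal with transactions, locking reads, and replica health).

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Toy router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        # Fall back to the primary for reads if no replicas are configured.
        self._read_targets = itertools.cycle(list(replicas) or [primary])

    def route(self, statement: str) -> str:
        """Return the host that should receive this SQL statement."""
        words = statement.split(None, 1)
        is_read = bool(words) and words[0].upper() == "SELECT"
        if is_read:
            return next(self._read_targets)  # round-robin over replicas
        return self.primary


# Hypothetical host names, for illustration only.
router = ReadWriteRouter(
    "primary.example.internal",
    ["replica-1.example.internal", "replica-2.example.internal"],
)
print(router.route("SELECT * FROM orders"))
print(router.route("UPDATE orders SET status = 'shipped' WHERE id = 1"))
```

Keeping the routing decision in one place, whether in a proxy or in application code, means replicas can be added or removed without touching the queries themselves.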
Amazon RDS Backups
Having a reliable backup strategy is key to ongoing business success. In Amazon RDS for MySQL, you can build secure and efficient backup and recovery solutions that leverage the scale of the AWS cloud. Amazon Simple Storage Service (Amazon S3), Amazon S3 Standard-Infrequent Access (S3 Standard–IA), and Amazon Glacier facilitate backup through automated policies and data recovery and reduce the need to provision and maintain on-premises infrastructure.
Amazon RDS High Availability
High availability is crucial for applications that are critical to your business success. These are applications that must always be available and accessible to customers. High availability is often measured in “9s” — which represent a percentage of total time in which the database is up: 99% is two nines, 99.9% is three nines, 99.99% is four nines, etc. The goal is as close to 100% uptime as possible (or necessary given the functionality of the applications; see MySQL High Availability On-Premises: A Geographically Distributed Scenario for more information on the “9s” uptime scale).
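To make the “9s” scale concrete, the arithmetic can be sketched in Python. This is a minimal illustration; the function name `allowed_downtime_minutes` and the 365-day year are assumptions made for the example, not part of any AWS tooling.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def allowed_downtime_minutes(nines: int) -> float:
    """Return the yearly downtime budget, in minutes, for a number of nines.

    Two nines is 99%, three nines is 99.9%, and so on.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailable_fraction = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailable_fraction


for n in range(2, 6):
    minutes = allowed_downtime_minutes(n)
    print(f"{n} nines: about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, three nines allows roughly 8.8 hours of downtime per year, while four nines allows a little under an hour.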
Your database can be set up with a high availability architecture as well. This involves setting up a primary database instance and a synchronous secondary instance that takes over should the primary fail.
Amazon RDS uses common failover techniques as well as the concept of Availability Zones (AZs) to ensure database uptime. Availability Zones are data centers in different geographic locations, so that another can take over should one or more go down (due to natural disasters, power outages, etc.). A Multi-AZ architecture is designed so that the database stays up and running in at least one location.
However, failover between replicas and AZs can take a minute or more.
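Because a failover window of this length is visible to the application, client code typically retries connections rather than failing on the first error. The sketch below shows one way to do that with exponential backoff. The function `connect_with_retry`, its parameters and defaults, and the use of `ConnectionError` are assumptions for illustration; a real application would catch its database driver’s own exception types.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    max_wait_seconds: float = 90.0,
    initial_delay: float = 1.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or the total wait budget is used up."""
    waited = 0.0
    delay = initial_delay
    while True:
        try:
            return connect()
        except ConnectionError:
            if waited + delay > max_wait_seconds:
                raise  # budget exhausted: surface the last error
            sleep(delay)
            waited += delay
            delay = min(delay * 2, max_delay)  # back off, capped at max_delay
```

The `sleep` parameter exists only so the wait can be replaced in tests; the 90-second default budget is an arbitrary choice meant to outlast a failover of about a minute.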
Amazon RDS Security
Security of your data becomes even more important when you move to the cloud. The AWS data centers and network architectures are built to meet the requirements of highly security-sensitive organizations. You have access to tools and services that enable you to achieve a better security posture than on many on-premises environments. AWS instances can scale up and down on demand, and by leveraging pay-as-you-go pricing, you get the secure environment that you need without any upfront hardware investments.
Amazon RDS Performance
Amazon RDS performance is comparable to standalone MySQL instances. The main difference is that in Amazon RDS, if you hit a performance limitation, you are restricted to what Amazon RDS allows you to configure in their instances for tuning performance. Your only other option is to expand the number of instances.
Amazon RDS Benefits, Limitations and Use Cases
Moving to Amazon RDS can provide many clear benefits for deploying a database in the cloud. You can easily turn instances on and off and scale them up as needed. It takes many of the manual day-to-day database tasks off the plate of your DBAs, including setup, management, maintenance (including patching and upgrading) and provisioning.
However, there are a few limitations as well. You don’t get SUPER privileges for configuring individual MySQL database instances and aren’t allowed to configure OS and other platform parameters to meet specific application requirements (in fact, there are over 70 parameters you can’t change).
You also don’t have direct access to instance logs for monitoring and accounting. While it can be a benefit that you aren’t responsible for patches and version upgrades in terms of overhead for your DBA teams, it also means that you are forced to be on AWS’s schedule for this type of maintenance.
However, there are good reasons and many specific use cases that make Amazon RDS a smart choice. Generally, if your applications won’t suffer from some database replication lag, failover times higher than one minute, or the loss of up to five minutes of transactions, then there aren’t significant immediate roadblocks to Amazon RDS.
Other qualifying factors for using Amazon RDS:
- You have a “simple” workload and it will remain simple
- You need to offload “operations”
- You don’t need to use tools with SUPER privileges requirements
- You need to quickly scale reads for specific events (e.g., Black Friday, Friendship Day, etc.)
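The rule of thumb above can be written down as a simple checklist. The following Python sketch is only one way to encode it: the thresholds (failover of about one minute, up to five minutes of lost transactions) are the rough figures from the text, while the `Tolerances` fields and the `rds_roadblocks` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tolerances:
    """What the application can live with (all names are illustrative)."""
    tolerates_replication_lag: bool
    max_failover_seconds: float   # longest acceptable failover
    max_data_loss_seconds: float  # longest acceptable window of lost transactions
    needs_super_privileges: bool


def rds_roadblocks(t: Tolerances) -> List[str]:
    """Return the reasons, if any, that Amazon RDS may be a poor fit."""
    reasons = []
    if not t.tolerates_replication_lag:
        reasons.append("cannot tolerate replication lag")
    if t.max_failover_seconds <= 60:
        reasons.append("needs failover within about one minute")
    if t.max_data_loss_seconds < 300:
        reasons.append("cannot lose up to five minutes of transactions")
    if t.needs_super_privileges:
        reasons.append("needs tools that require SUPER privileges")
    return reasons


# Example: a simple workload with relaxed requirements.
relaxed = Tolerances(True, 120, 600, False)
print(rds_roadblocks(relaxed) or "no immediate roadblocks")
```

An empty list only means none of the roadblocks named in this article apply; it is not a substitute for testing against your own workload.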
Amazon Aurora
Amazon Aurora is a relational database engine whose throughput can exceed that of both standard MySQL and PostgreSQL under the right conditions. Aurora is designed to be compatible with MySQL and PostgreSQL, which means that applications and tools that use or were developed with MySQL or PostgreSQL can generally run without modification. Since it runs on Amazon RDS, it also reduces time-consuming administrative tasks (as mentioned above).
Amazon Aurora was built with high availability in mind and is integrated with an SSD-backed virtualized storage layer. Amazon Aurora is designed to detect database crashes and restart without the need for crash recovery or to rebuild the database cache.
Amazon Aurora automatically scales storage and rebalances I/Os to provide consistent performance without the need for over-provisioning.
Scaling Reads with Amazon Aurora
Again, as with Amazon RDS, scaling is often an important consideration. With Amazon Aurora, your scaling capacity is up to fifteen instances. You can set up load balancing (as with RDS), but you also have reduced lag and some auto-scaling.
Amazon Aurora High Availability
Amazon Aurora also uses common failover techniques as well as the concepts of Availability Zones (AZ) to ensure database uptime. In addition, it has the following options:
- Connection endpoints
- Storage: 6 copies, 3 AZs
- Automatic failure detection
- Replica promoted
- Failover takes a few seconds
Amazon Aurora Performance
When it comes to performance, Amazon Aurora can exceed MySQL using the same benchmarks (500,000 SELECTs/sec and 100,000 UPDATEs/sec) under favorable conditions. This performance improvement becomes more pronounced on larger instances.
It’s important to note that workload hotspots can decrease performance.
Amazon Aurora Benefits/Limitations/Use
Many of the reasons for moving to Amazon Aurora are the same as for moving to Amazon RDS. It provides a stable, highly available database environment that takes much of the day-to-day upkeep off your DBAs. However, many of the same limitations also exist: with a decreased set of manual tasks comes an equal decrease in access to configuration options.
One notable caveat for Amazon Aurora is that while it is based on a fork of MySQL, it is NOT MySQL and doesn’t necessarily behave as MySQL would.
Below is a quick list of the benefits, limitations, and possible reasons why you might select Amazon Aurora over other options.
Additional Amazon Aurora benefits:
- Very low latency replication
- Data size up to 64 TB
- Especially high concurrency, large instances
Additional Amazon Aurora limitations:
- Slower (write latency) for small workloads
- Only InnoDB
- Only one logical copy of your data
- No performance_schema (for 5.7, at the moment)
Generally, the reasons for choosing Amazon Aurora are the same as for Amazon RDS, with the caveat that you have even fewer configuration options with Aurora than with RDS.
If your applications won’t suffer from some database replication lag, failover times higher than one minute, or the loss of up to five minutes of transactions, then there aren’t significant immediate roadblocks to Amazon Aurora.
Additional reasons to choose Amazon Aurora:
- You have a highly concurrent workload
- You need to offload even more “operations”
Amazon EC2 (Roll-Your-Own)
Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the AWS cloud. Amazon EC2 is really just buying infrastructure in the cloud, and it eliminates your need to invest in hardware up front. This allows you to focus on developing and deploying applications faster and getting products to market ahead of your competition.
Once you purchase the Amazon EC2 instances, you can use them to launch as many virtual servers as you need. You are responsible for configuring security, networking and storage. Amazon EC2 enables you to scale up or down to handle changes in requirements or spikes in popularity, reducing your need to forecast traffic.
Scaling Reads with Amazon EC2
Again, scaling is often a concern for any database deployment in the cloud, and EC2 is no exception. Since EC2 affords much more freedom in how you deploy your database environment, you can create and use just about any scaling solution available:
- Scaling depends on your implementation
- Create/add/remove as many replicas and levels as you want
- Relatively simple to add cluster nodes (this is not really “scaling” though)
- You can use third-party software, such as ProxySQL, for read and write splitting as well as load balancing
Amazon EC2 High Availability
Amazon EC2 by nature also allows you to set up your high availability options however you choose. It also allows you to employ the AZ configurations inherent to the entire AWS architecture. Other options include:
- Percona XtraDB Cluster (Galera cluster)
- Synchronous replication to other nodes
- No data loss for node failure
- Automatic failure detection with ProxySQL
- Orchestrator or MHA
Amazon EC2 Performance
Since you are able to set up your own environment and tune it as you want, EC2 performance measurements and comparisons are difficult to make against other AWS solutions. However, at Percona we have done some testing on like systems, in which Percona High Availability Practice Manager Marco Tusa looks at how Aurora compares to a custom setup.
For the rest of our discussion on Amazon EC2, we’ll use the following as a baseline configuration for our comparisons:
- Percona XtraDB Cluster on Amazon EC2
- Auto-provisioning with node AMI (SST)
- ProxySQL on app nodes
- Automatic failover
- Read/write splitting
- PMM for monitoring
- It’s not hard to automate backups
Amazon EC2 Benefits/Limitations/Use
Within an Amazon EC2 environment, you are responsible for many of the same aspects as an on-premises environment, except for hardware issues. You have autonomy over installing everything from the operating system onwards: you decide when to patch, upgrade or install security fixes. Things like backups and configuration changes are all within your control. You also have access to all the regular MySQL and OS configuration settings (as you would if your database environment were on-premises).
With this control comes the responsibility of maintaining, managing and configuring your environment. You can better fine-tune your database to meet the needs of your applications, but with that comes the overhead of day-to-day tasks that otherwise would be handled by Amazon RDS or Amazon Aurora.
Additional Amazon EC2 benefits:
- You can get much more visibility into the performance of the database software and the OS, as well as schema
- Access to OS metrics
Additional Amazon EC2 limitations:
- Harder than RDS or Aurora to set up and maintain
- Depends on your chosen architecture
Additional reasons to choose EC2:
- You don’t need to depend on external operational resources and expertise — your DBAs are knowledgeable and able to address and configure database performance
- You value flexibility
- You want to use tools that use the SUPER privileges
- You want to read binary logs
When it comes to using AWS cloud services with your open source database software, you have different options. The most common choices are Amazon RDS, Amazon Aurora, and Amazon EC2. All of them have advantages and disadvantages, but really these depend on your needs and application requirements.
Feature image by Pete Linforth from Pixabay.