Configuring the Google Cloud Platform for High Availability

Here is a guide for migrating an on-premises SQL Server infrastructure to Google Cloud Platform (GCP), in such a way that you can ensure (HA) or 99.99% uptime of your system, by way of multizone clustering.
One of the first questions you’ll need to answer is just what kind of virtual machines (VMs) you’ll need. Selecting infrastructure in GCP is a much more nuanced exercise than selecting physical servers for an on-prem deployment. Fortunately, Google employs a team of customer engineers who can help you determine just what combination of compute power, RAM, and storage your SQL Server loads will require.
Once you’ve determined what that infrastructure looks like if the application you are running requires high availability plan for HA clustering. For HA in GCP, you’ll want to build out your SQL Server infrastructure in two different GCP zones. Then you’ll need a way to ensure that if one SQL Server instance goes down — and that can happen even in the cloud — the other one can take over the load without missing a beat.
Two Paths to High Availability
There are basically two ways to configure SQL Server for HA in GCP. One involves creating an Always On Availability Group (AG), using technologies built into SQL Server itself. The other involves creating what is called a SANless Cluster, which combines technologies available from a third party with Windows Server Failover Clustering (WSFC).
Fundamentally, they work in similar ways: Once you’ve settled on the instance size you’ll need to support your SQL Server workloads in GCP, you will create a second instance of the same size in a separate but nearby GCP zone. Note that it’s not enough to have two instances of SQL Server for true HA; they need to reside in separate zones within GCP. Any event that compromises the operations of the entire zone — a data center-wide power failure, for example — would take out both your primary and secondary instances if both were located in the same zone.
By locating them in separate but nearby zones, you increase the likelihood that your SQL Server deployment will remain operational because the chances of separate GCP zones going offline at the same time are much lower. Indeed, if you place both your primary and secondary instances of SQL Server in the same zone, GCP will give you an availability service level agreement (SLA) of up to 99.95% availability. To get the full four nines — 99.99% availability — you need to place that second instance in a separate zone.
One of the great strengths of GCP is its internal communications infrastructure, and the connections linking the zones within a Google region offer sufficient throughput that your SQL Server instances, even in two separate zones, can communicate quite efficiently. That’s important because both the AG and SANless clustering solutions will synchronously replicate the data from your primary SQL Server to another instance in the secondary zone. This is also key to HA in the cloud. You can’t construct a true HA solution in GCP using any kind of shared storage for the same reason you can’t locate both your SQL Server systems in the same zone. Housing your critical SQL Server database on any kind of shared storage would create a single point of vulnerability, and HA is all about eliminating those single points of vulnerability.
So, both the AG and SANless Clustering approaches replicate your SQL Server databases so that the secondary server (or servers, if you configure more than one backup system) always has an up-to-date copy of the primary system’s database. If the active instance of SQL Server fails, the HA solution will automatically failover to the secondary instance, which will immediately take over the workloads and use a copy of the same data that had been running in the primary infrastructure.
AG and SANless Clustering Contrasted
So what are the differences between the AG and SANless Clustering approaches? The differences lie in the details of implementation. Both AGs and SANless Clustering solutions rely on Windows Server Failover Clustering (WSFC), the technology built into Windows Server that enables the rapid, automatic failover of a clustered application if a failure is detected. From there, though, the approaches diverge.
The AG functionality is built into SQL Server itself. This has two practical consequences: First, AGs only work with data files associated with SQL Server. If you have any non-SQL files in storage alongside your SQL Server database, the AG replication services will not replicate those files. Moreover, AGs replicate only the user-defined SQL Server databases. Any databases created by SQL Server itself — including the jobs and passwords databases — are not replicated. That’s something to consider if the infrastructure supporting your primary SQL Server system ends up being offline for any length of time. You’ll want to have backups of these system databases on your secondary GCP infrastructure, but you’ll have to manage replication of that data by some other means.
Second, the Basic AG functionality that comes with SQL Server Standard edition allows the replication of only one SQL Server database per AG. If you need to replicate more than one database per AG, or if you plan to replicate that data to more than one secondary system, you’ll need to deploy the Always On AG functionality built into SQL Server Enterprise Edition. If you’re otherwise accustomed to running SQL Server Standard Edition, this will be a costly upgrade.
The SANless Clustering approach, in contrast, is untethered to SQL Server. It is application-agnostic, focusing only on replicating blocks of data from one storage system to another. The practical consequence of this independence is that a SANless Clustering solution will replicate not only your user- and system-named SQL databases, but it will also replicate any other files stored on the disk where your SQL Server databases reside. There’s no need for a separate replication process to ensure the availability of that data.
Additionally, because SANless Clustering replication only focuses on ones and zeros, it does not count how many SQL Server databases it is replicating. It simply replicates blocks of data across GCP zones. If your workloads involve multiple SQL databases running on SQL Server Standard Edition, the SANless Clustering approach to HA enables you to replicate as many databases as you want — without requiring an upgrade to SQL Server Enterprise edition.