MongoDB Can Now Continuously Synchronize Data Across Clusters

MongoDB’s recently unveiled Cluster-to-Cluster Sync lets developers and administrators create a second cluster that accurately reflects their production-level data in real time.
Cluster-to-Cluster Sync moves data in one direction, from a source cluster to a destination cluster, and can run continuously, as a one-time copy, or in a stop/start fashion. It works across environments, including Atlas, MongoDB’s own multicloud managed database service, in the following combinations:
- On-premises cluster → Atlas cluster
- On-premises cluster → on-premises cluster
- Atlas cluster → Atlas cluster
- Atlas cluster → on-premises cluster
Cluster-to-Cluster Sync product manager Alan Zheng, who led a team of 12 engineers to build the feature, described the goal of the project as one “to build a continuous mechanism to continue the data flow from source to destination 24/7.”
MongoDB launched Cluster-to-Cluster Sync in preview at its MongoDB World conference last week, and the feature is expected to become generally available in July 2022.
Ways Cluster-to-Cluster Sync Makes the Developer’s Life a Little Easier
Data Migration: As more and more companies move to the cloud, Cluster-to-Cluster Sync can make those migrations more reliable. Built-in controls let operators pause and resume a sync, check its progress, and reverse its direction if needed.
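Those controls are exposed over HTTP by `mongosync`, the binary behind Cluster-to-Cluster Sync. The Python sketch below only builds the requests and sends nothing until you call `send`. It assumes the endpoint paths (`/api/v1/start`, `pause`, `resume`, `progress`, `reverse`), the default port 27182, and the `reversible`/`enableUserWriteBlocking` start options as described in current mongosync documentation; the preview discussed here may differ. `build_request`, `start_request`, and `send` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional

# Assumption: mongosync's API listens on localhost:27182 (its documented default).
MONGOSYNC_URL = "http://localhost:27182/api/v1"


def build_request(endpoint: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request for a mongosync API endpoint."""
    url = f"{MONGOSYNC_URL}/{endpoint}"
    if endpoint == "progress":
        # Checking progress is a read-only GET with no body.
        return urllib.request.Request(url, method="GET")
    # Control actions (start, pause, resume, reverse, commit) are POSTs with a JSON body.
    data = json.dumps(body or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_request(reversible: bool = False) -> urllib.request.Request:
    """Start syncing from cluster0 (source) to cluster1 (destination)."""
    # "cluster0"/"cluster1" name the connection strings mongosync was launched with.
    body = {"source": "cluster0", "destination": "cluster1"}
    if reversible:
        # Assumption: reversing later requires both flags to be set at start time.
        body["reversible"] = True
        body["enableUserWriteBlocking"] = True
    return build_request("start", body)


def send(req: urllib.request.Request) -> dict:
    """Send a request to a running mongosync and decode its JSON reply."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With mongosync running, a migration might call `send(start_request(reversible=True))`, then `send(build_request("pause"))` and `send(build_request("resume"))` as needed, `send(build_request("progress"))` to check status, and `send(build_request("reverse"))` to flip the direction.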
During the Software Development Lifecycle: Development teams now have access to accurate, real-time production data throughout the software development lifecycle without risk of interrupting production workflows. Blue/green data environments are also supported. Currently, there is no way to change the topology of the source or destination clusters, but Zheng said this is in the project’s pipeline.
Audit and Compliance: In general terms, this lets developers hand production data over to audit and compliance teams without handing over the production data keys. More specifically, some countries have regulations requiring companies to keep data locally, yet MongoDB Atlas is not available in some of those regions.
Analytics: As with audit and compliance, analytics teams no longer need production data keys to work with production data.
Stressed Exit: Australia, New Zealand, and Southeast Asia are just a few of the places with regulations requiring companies to have exit strategies in place that allow data to be pulled off the cloud and back on-premises within a set number of hours.
The Challenge of Sharding
One of the biggest challenges in building continuous synchronization has been sharding, the splitting of one data set across multiple servers, or shards.
“How do you support more complicated topologies? If you have sharded clusters, how do you make sure to move all the shards over to the destination?” Zheng asked, describing the challenge.
He explained that sharding brings chunk migration: while data is being copied from source to destination, chunks may also be moving between shards on the source, so the engineers had to make sure that data was captured as well.
The challenge appears to have been overcome, since sharded cluster support is one of the feature’s key capabilities. Currently, you can only sync between two clusters with the same topology (for example, a sharded cluster to a sharded cluster); syncing between clusters with different topologies is not yet supported. Zheng and his team are working on it, though no tentative release date has been set.
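Because both sides must share a topology, a team might run a pre-flight check before starting a sync. The sketch below is hypothetical and not part of Cluster-to-Cluster Sync: `topology_from_hello` and `check_sync_compatible` are illustrative names. It classifies each deployment from its reply to MongoDB’s `hello` command, relying on the facts that a `mongos` router reports `msg: "isdbgrid"` and a replica set member reports `setName`. With PyMongo, that reply can be obtained via `MongoClient(uri).admin.command("hello")`.

```python
def topology_from_hello(hello: dict) -> str:
    """Classify a deployment from its reply to the `hello` command."""
    if hello.get("msg") == "isdbgrid":
        # mongos routers identify themselves this way, so this is a sharded cluster.
        return "sharded"
    if "setName" in hello:
        # Replica set members report the name of their set.
        return "replicaSet"
    return "standalone"


def check_sync_compatible(source_hello: dict, dest_hello: dict) -> None:
    """Raise if the source and destination topologies differ."""
    src = topology_from_hello(source_hello)
    dst = topology_from_hello(dest_hello)
    if src != dst:
        raise ValueError(f"topology mismatch: source is {src}, destination is {dst}")
```

Failing fast like this surfaces a sharded-to-replica-set mismatch before any data moves, rather than partway through a sync.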
Cluster-to-Cluster Sync supports continuous, resumable, and one-time sync. During continuous sync the destination cluster may lag behind the production cluster, depending on throughput and network conditions, but it will remain nearly identical to it. Zheng described this as “eventual consistency.”
Currently, Cluster-to-Cluster Sync supports only a 1:1 ratio: one source cluster to one destination cluster. Zheng said support for additional clusters will be added in the future but did not give a tentative release date.
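For a one-time sync or a migration cutover, that eventual consistency means waiting for the destination to catch up before committing, whereas a continuous sync simply never commits. The sketch below polls mongosync’s progress endpoint until the sync looks ready. It assumes the reply is wrapped in a `progress` object with `state`, `canCommit`, and `lagTimeSeconds` fields, as in current mongosync documentation; these may not match the preview build. The 5-second lag threshold is an arbitrary illustrative choice, and the helper names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumption: mongosync's progress endpoint on its default port.
PROGRESS_URL = "http://localhost:27182/api/v1/progress"


def fetch_progress(url: str = PROGRESS_URL) -> dict:
    """Return the `progress` object from a running mongosync."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())["progress"]


def ready_to_commit(progress: dict, max_lag_seconds: int = 5) -> bool:
    """True once the destination has caught up closely enough to cut over."""
    if progress.get("state") != "RUNNING" or not progress.get("canCommit", False):
        return False
    # A missing lag figure is treated as "unknown", i.e. not ready.
    lag = progress.get("lagTimeSeconds")
    return lag is not None and lag <= max_lag_seconds


def wait_until_ready(
    get_progress: Callable[[], dict],
    poll_seconds: float = 10.0,
    max_polls: int = 360,
) -> bool:
    """Poll until ready_to_commit() holds; give up after max_polls attempts."""
    for _ in range(max_polls):
        if ready_to_commit(get_progress()):
            return True
        time.sleep(poll_seconds)
    return False
```

Passing the progress source in as a function keeps the polling logic testable without a live cluster; in practice it would be called as `wait_until_ready(fetch_progress)`, followed by a POST to the commit endpoint once it returns `True`.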
For more details on how to set up Cluster-to-Cluster Sync, please see the documentation.