Amazon Web Services Now Supports the Zettabyte File System
A decades-old file system, one with a storied history, may play a pivotal role in the future of high-volume cloud computing.
Although it is an older technology, ZFS can potentially offer great performance for very large, latency-sensitive workloads. Running on AWS Graviton ARM-based processors, ZFS can support up to 12.5 gigabytes per second (GB/s) of throughput and up to 1 million IOPS for frequently accessed cached data, and up to 4 GB/s and 160,000 IOPS directly from persistent storage, all with sub-millisecond latencies.
“It’s a wonderful file system, with tremendous capability,” enthused Wayne Duso, AWS vice president of engineering for storage, edge, and data governance services, in an interview with The New Stack.
This ZFS implementation, based on OpenZFS, is offered through Amazon FSx, a managed service AWS created to adapt third-party file systems to its cloud environment. FSx for OpenZFS volumes can be accessed from Linux, macOS, and Windows clients over the NFS protocol.
Sun Microsystems originally designed ZFS in the early 2000s, with the intent of making it the first 128-bit file system. A 128-bit address space is 2⁶⁴, or about 1.84 × 10¹⁹, times larger than a 64-bit one, enough room to manage a nearly unlimited amount of data.
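As a quick check on that arithmetic: the ratio between a 128-bit and a 64-bit address space is 2^128 / 2^64 = 2^64. The illustrative Python snippet below (the helper name is our own) just confirms the figure using Python's exact integer arithmetic.

```python
def address_space_ratio(wide_bits: int, narrow_bits: int) -> int:
    """Return how many times larger a wide_bits address space is than a narrow_bits one."""
    # 2**wide / 2**narrow simplifies to 2**(wide - narrow); Python ints are exact.
    return 2 ** wide_bits // 2 ** narrow_bits


ratio = address_space_ratio(128, 64)
print(ratio)           # 18446744073709551616
print(f"{ratio:.3g}")  # 1.84e+19
```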
As a parallel file system, ZFS can simultaneously serve thousands of clients, or deliver a very large volume of data to a single client. It works best at serving many small files in parallel, which makes it useful for workloads such as machine learning, EDA (Electronic Design Automation), media processing, and financial analytics, among other uses, according to AWS.
In the AWS environment, clients that can use FSx for OpenZFS include Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon EKS) clusters, Amazon WorkSpaces virtual desktops, and VMware Cloud on AWS.
Despite its great promise, ZFS has had a difficult time of it. It was initially designed for Sun's OpenSolaris Unix operating system, which later fell out of favor. Linus Torvalds has declined to merge it into the Linux kernel, citing licensing concerns: ZFS is released under Sun's CDDL, which is widely considered incompatible with the kernel's GPL. Apple tried to port it to the Macintosh but discontinued the effort after a few years.
Oracle acquired Sun Microsystems in 2010 and shuttered OpenSolaris shortly thereafter. Oracle continued to develop ZFS, but its updates were no longer open source, so the OpenZFS project was created to continue the work of porting the file system to other platforms.
A major advantage of using ZFS as a service is that end users don't have to worry about deployment and management, which historically have not been easy with a file system this complex, Duso said. The same has been true of Lustre, another parallel file system, aimed primarily at the high-performance computing market, for which AWS also announced support this week through FSx. AWS offers Windows File Server and NetApp ONTAP through FSx as well.
The idea with FSx is to “bring file systems to our customers that they’re using today,” Duso said. “They built their workflows around these file systems” and it can be precarious to move data from one file system to another.
“Customers said they loved the capabilities, but they didn’t want to allocate the staff and time to manage ZFS,” Duso said. As a managed service, ZFS can be easily deployed.
ZFS also offers built-in point-in-time snapshots, which let users restore previous versions of files. FSx itself also performs daily file system backups to Amazon S3. Each OpenZFS file system can contain multiple volumes, and each volume can have its own quotas on total volume storage, per-user storage, and per-group storage.
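FSx enforces these quotas for you, but the layering is easy to misread. The toy model below is a hypothetical sketch (the class and field names are our own, not part of any AWS or OpenZFS API, and real ZFS enforcement is more nuanced) of how volume, per-user, and per-group quotas combine: a write is accepted only if it fits under every limit that applies to it.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical toy model of one volume with layered quotas (not an AWS or OpenZFS API)."""

    volume_quota_gb: float
    user_quota_gb: dict[str, float] = field(default_factory=dict)   # user -> limit in GB
    group_quota_gb: dict[str, float] = field(default_factory=dict)  # group -> limit in GB
    used_gb: float = 0.0
    used_by_user_gb: dict[str, float] = field(default_factory=dict)
    used_by_group_gb: dict[str, float] = field(default_factory=dict)

    def can_write(self, user: str, group: str, size_gb: float) -> bool:
        """Allow a write only if it fits under every quota that applies to it."""
        if self.used_gb + size_gb > self.volume_quota_gb:
            return False
        user_limit = self.user_quota_gb.get(user)  # None means no per-user quota
        if user_limit is not None and self.used_by_user_gb.get(user, 0.0) + size_gb > user_limit:
            return False
        group_limit = self.group_quota_gb.get(group)  # None means no per-group quota
        if group_limit is not None and self.used_by_group_gb.get(group, 0.0) + size_gb > group_limit:
            return False
        return True

    def write(self, user: str, group: str, size_gb: float) -> bool:
        """Record the write and return True, or reject it and return False."""
        if not self.can_write(user, group, size_gb):
            return False
        self.used_gb += size_gb
        self.used_by_user_gb[user] = self.used_by_user_gb.get(user, 0.0) + size_gb
        self.used_by_group_gb[group] = self.used_by_group_gb.get(group, 0.0) + size_gb
        return True


vol = Volume(volume_quota_gb=100, user_quota_gb={"alice": 10}, group_quota_gb={"research": 50})
print(vol.write("alice", "research", 8))  # True: within all three limits
print(vol.write("alice", "research", 5))  # False: alice would reach 13 GB > 10 GB
print(vol.write("bob", "research", 40))   # True: bob has no user quota; group reaches 48 GB
print(vol.write("bob", "research", 5))    # False: group would reach 53 GB > 50 GB
```

The design choice worth noting is that quotas are checked independently and all must pass, so a user with room left in a personal quota can still be blocked by a full group or volume.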
AWS bills FSx for OpenZFS usage along three dimensions: storage capacity (per GB-month), SSD IOPS (per IOPS-month), and throughput capacity (per MBps-month).
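Since the bill is the sum of three independently priced dimensions, a rough monthly estimate is simple arithmetic. The sketch below is illustrative only: the `Rates` values are made-up placeholders rather than actual AWS prices, and it ignores anything else that could affect a real bill (any included baseline, regional differences, backups).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices in USD; real FSx for OpenZFS prices vary by region and over time."""

    per_gb_month: float    # storage capacity
    per_iops_month: float  # provisioned SSD IOPS
    per_mbps_month: float  # throughput capacity


def monthly_cost(storage_gb: float, ssd_iops: float, throughput_mbps: float, rates: Rates) -> float:
    """Estimate one month's bill as the sum of the three billing dimensions."""
    storage = storage_gb * rates.per_gb_month
    iops = ssd_iops * rates.per_iops_month
    throughput = throughput_mbps * rates.per_mbps_month
    return storage + iops + throughput


placeholder = Rates(per_gb_month=0.10, per_iops_month=0.01, per_mbps_month=0.50)
cost = monthly_cost(storage_gb=1000, ssd_iops=3000, throughput_mbps=128, rates=placeholder)
print(f"${cost:,.2f}")  # $194.00
```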
Other Storage News
The ZFS news was one of a number of storage announcements made at AWS re:Invent, being held this week in Las Vegas.
The company also launched a new “instant” retrieval tier for its Glacier long-term archival storage (Amazon S3 Glacier Instant Retrieval), optimized for objects that are accessed more often than the rarely accessed material in regular Glacier storage, but not often enough to warrant the expense of live storage. This option would be well suited to material accessed about four times a year, such as financial data, Duso observed.
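The trade-off described here is a break-even calculation: a colder tier charges less for storage but adds a per-retrieval fee, so it only pays off below some access frequency. The sketch below uses invented prices (not actual S3 or Glacier rates) and assumes each access reads the whole object once and the warmer tier has no retrieval fee; the function names are hypothetical.

```python
def break_even_accesses_per_year(warm_per_gb_month: float, cold_per_gb_month: float,
                                 cold_retrieval_per_gb: float) -> float:
    """Accesses per year at which the colder tier stops being cheaper than the warmer one.

    Cold monthly cost per GB = cold storage + retrieval fee * accesses / 12.
    """
    monthly_savings = warm_per_gb_month - cold_per_gb_month
    return 12 * monthly_savings / cold_retrieval_per_gb


def cheaper_tier(accesses_per_year: float, warm: float, cold: float, retrieval: float) -> str:
    """Pick the cheaper tier for data read accesses_per_year times (ties go to 'warm')."""
    cold_cost = cold + retrieval * accesses_per_year / 12
    return "cold" if cold_cost < warm else "warm"


# Made-up prices in USD per GB (not real S3 or Glacier rates).
WARM, COLD, RETRIEVAL = 0.02, 0.004, 0.03
print(f"{break_even_accesses_per_year(WARM, COLD, RETRIEVAL):.1f}")  # 6.4
print(cheaper_tier(4, WARM, COLD, RETRIEVAL))   # cold
print(cheaper_tier(12, WARM, COLD, RETRIEVAL))  # warm
```

With these invented numbers the colder tier wins for data read four times a year, but the crossover point depends entirely on the real prices in effect.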
A new EBS Snapshots Archive was revealed for those customers that need to keep their volume snapshots for longer than the time they are typically retained. This approach can save customers up to 75% of the cost for retaining Amazon EBS Snapshots for months or years.
The idea with all these new service offerings is to help customers make the most of their data at the most cost-effective price, Duso said. Deriving value from data requires that users be able to interact with it seamlessly.
“You don’t have to build tools to move data,” Duso said. “All you have to do is call an API to move that data to where it needs to be.”
So instead of thinking of ZFS as a legacy technology, it might be more accurate to say it was decades ahead of its time.