ChubaoFS: The Cloud Native Computing Foundation’s Speedy New Distributed File System

Adding to its growing arsenal of cloud native open source tools, the Cloud Native Computing Foundation has brought the distributed file system, ChubaoFS, into its Sandbox-level entry point for early-stage projects.
Chinese online retailing giant JD.com contributed the file system to CNCF. The team behind the project contends that ChubaoFS is uniquely suited among distributed file systems to support cloud native workloads, thanks to its virtually unlimited scalability and a robust metadata subsystem spread across the working memory of multiple nodes.
“ChubaoFS offers an innovative new option for general-purpose distributed/shared storage infrastructure regardless of file size or file access pattern,” said ChubaoFS creator, JD Chief Architect and Technical Vice President Haifeng Liu, in a statement. Created internally in 2017, the file system supports JD.com’s Kubernetes-based container platform, running more than 160 applications and services. ChubaoFS has been deployed on thousands of nodes and its p99 latency can reach 5 milliseconds, according to engineers at the company.
Cloud Native and S3 Compatible
Each ChubaoFS volume can be seen as a complete file system from the vantage point of one container, or of multiple containers sharing data (ChubaoFS can serve both containerized and non-containerized environments). Because ChubaoFS implements POSIX file system semantics, it can be mounted by the Linux OS just like a local file system. Volume sizes can range from a few gigabytes to several terabytes. A ChubaoFS cluster can have hundreds of thousands of separate volumes, or file systems, and each volume can have elastic, block-level storage space.
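Because a mounted volume behaves like a local file system, an application should need nothing beyond ordinary file calls to use it. The following is a minimal sketch of that idea, not code from the ChubaoFS project: the function name `write_and_read` is hypothetical, and a temporary directory stands in for a real mount point (a path such as `/mnt/chubaofs` would be an assumption about the deployment).

```python
import os
import tempfile
from pathlib import Path


def write_and_read(mount_point: Path, name: str, payload: bytes) -> bytes:
    """Write a file under mount_point, ask for it to be persisted, and read it back."""
    target = mount_point / name
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # standard POSIX-style request to persist the data
    return target.read_bytes()


# A temporary directory stands in for a mounted volume here (assumption);
# on a real host, mount_point would be wherever the volume is mounted.
with tempfile.TemporaryDirectory() as tmp:
    result = write_and_read(Path(tmp), "logs/app.log", b"hello from a container\n")
    print(result)
```

Nothing in the sketch is specific to ChubaoFS, which is the point: code written against POSIX file semantics should not need to change when the directory it writes to is backed by a distributed volume.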
ChubaoFS also provides a Simple Storage Service (S3)-compatible object storage interface that applications can program against.
ChubaoFS consists of a metadata subsystem, a data subsystem, and a resource manager. Providing both object and file storage, ChubaoFS offers strong replication consistency and is particularly well optimized for quickly handling small files, another favorable trait for supporting cloud native workloads. Because it separates compute from storage, ChubaoFS works well as the underlying storage infrastructure for Kubernetes. It has been integrated with the Container Storage Interface (CSI) and Helm.
ChubaoFS is one of a number of high-performance, highly scalable distributed file systems that are earmarked for cloud native workloads. For those who want to comparison shop, the FAQ section of the ChubaoFS site offers some critiques of other choices: Ceph can be difficult to learn and hard to optimize. Both HDFS and MooseFS suffer from single-node metadata bottlenecks. Like ChubaoFS, Facebook’s Haystack keeps the metadata in main memory for performance, though unlike Haystack, ChubaoFS stores in this memory the actual physical offsets of the file contents, instead of logical indices. ChubaoFS also doesn’t require garbage collection, as files are deleted in real time according to user requests.
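To make the physical-offset point concrete, here is a toy model of the general technique, not ChubaoFS’s actual data structures: many small files are packed into one large extent, and an in-memory index records the byte offset and length of each. All names (`ToyExtentStore`, `Location`) are hypothetical, and an in-memory buffer stands in for an on-disk extent.

```python
import io
from typing import Dict, NamedTuple


class Location(NamedTuple):
    offset: int  # byte position where the content starts inside the extent
    length: int  # number of bytes of content


class ToyExtentStore:
    """Toy model: small files packed into one extent, indexed in memory by offset."""

    def __init__(self) -> None:
        self._extent = io.BytesIO()  # stand-in for a large on-disk extent (assumption)
        self._index: Dict[str, Location] = {}  # in-memory metadata

    def put(self, name: str, content: bytes) -> Location:
        # Append at the end of the extent and remember exactly where it landed.
        offset = self._extent.seek(0, io.SEEK_END)
        self._extent.write(content)
        location = Location(offset, len(content))
        self._index[name] = location
        return location

    def get(self, name: str) -> bytes:
        # One in-memory lookup, then a single seek and read: no second
        # translation from a logical index to a physical position.
        location = self._index[name]
        self._extent.seek(location.offset)
        return self._extent.read(location.length)


store = ToyExtentStore()
first = store.put("a.txt", b"hello")
second = store.put("b.txt", b"world!")
```

In this model a read costs one dictionary lookup plus one seek. The sketch deliberately leaves out deletion, replication, and persistence, which the source describes only at a high level.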
According to GitHub stats, the project has, as of the time of this post, 419 commits from 19 contributors. Currently, ChubaoFS can be run on x86_64 and AMD platforms.
JD.com uses a range of open source and CNCF technologies in its ops, including Kubernetes, Vitess, Prometheus, Helm, and Harbor.