Containers and Microservices Spark a Search for Better File Systems

File systems usually keep a low profile, both at the operating system level and in tech discussions. But Red Hat’s recent deprecation of the Btrfs file system in its platform has sparked interest in the role of file systems in containerized environments.
As Linux distributions move toward container-based operations and microservices, they are running into new file system challenges. Linux vendors, including Red Hat, SUSE and Canonical, are major players in the container space. In addition to their traditional OSes, these companies have built container-as-a-service platforms to handle containerized workloads and microservices. Following in the footsteps of CoreOS’s Container Linux, Red Hat created Project Atomic, Canonical came out with Ubuntu Core, and SUSE released SUSE CaaS Platform and Kubic.
Namespaces, Dedup, Scheduling
“One of the biggest challenges that the containers ecosystem faces is that file systems are not currently namespace aware,” said Ben Breard, Red Hat senior technology product manager for Linux containers. Though there are several ways to create a namespace of sorts with existing file systems, this limitation creates challenges, particularly around security and usability, for features like user namespaces.
Raj Meel, SUSE global product and solution marketing manager, sees deduplication (“dedupe”) as one of the modern challenges.
Since SUSE supports multiple file systems, the dedupe problem can be solved by existing file systems such as XFS and Btrfs. But that creates new challenges. You can have “perfect” dedupe, where no space is ever “wasted” on duplicate data, but it comes with a hefty performance penalty that must be paid all the time, regardless of how related the data is, said Meel.
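To make Meel’s trade-off concrete, here is a minimal Python sketch of inline, block-level deduplication. It is a toy model built on our own assumptions (a fixed 4 KiB block size, SHA-256 fingerprints, an in-memory index, and the hypothetical `DedupStore` class), not how XFS or Btrfs actually implement dedupe. It only illustrates why “perfect” inline dedupe has an always-on cost: every block must be fingerprinted before the store can tell whether it is a duplicate.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this toy model


class DedupStore:
    """Hypothetical inline block-level dedup store (illustration only)."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # fingerprint -> unique block contents
        self.hash_ops = 0  # digests computed: the cost paid on every write

    def write(self, data: bytes) -> list[str]:
        """Split data into blocks, keep one copy of each unique block, return block references."""
        refs = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.hash_ops += 1  # paid whether or not the block turns out to be a duplicate
            self.blocks.setdefault(digest, block)  # store only the first copy
            refs.append(digest)
        return refs

    def read(self, refs: list[str]) -> bytes:
        """Reassemble the original data from its block references."""
        return b"".join(self.blocks[d] for d in refs)
```

Writing four identical blocks stores one unique block but still costs four hash computations; writing four distinct blocks costs the same four hashes and saves nothing. That is the “regardless of how related the data is” penalty in miniature.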
Josh Wood, DocOps at CoreOS, pointed out another problem in containerized environments: “Permanent storage has tended to locality,” he said. “To achieve the scaling and reliability aims of modern infrastructure, it has to become dynamic enough to gracefully support the automated scheduling and scaling of applications, as well as automated reprovisioning of the underlying compute resources where bits are actually stored on disk.”
Another, even more important, challenge is dividing limited I/O resources among containers: “Blkio throttling and the ability of IO schedulers to balance IO have both seen improvements, but there’s still a long way to go,” said Wood.
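The balancing problem Wood describes can be illustrated with a small calculation. The Python sketch below is a hypothetical model, loosely in the spirit of cgroup block I/O weights and throttle limits rather than the behavior of any kernel I/O scheduler: it splits a fixed bandwidth budget among containers in proportion to their weights, and when a container would exceed its hard cap, it is clamped and the leftover bandwidth is redistributed among the others. The function name, units and numbers are illustrative assumptions.

```python
from typing import Dict, Optional


def split_bandwidth(total: float, weights: Dict[str, int],
                    caps: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Weighted fair split of `total` bandwidth, honoring optional per-container caps."""
    caps = caps or {}
    alloc: Dict[str, float] = {}
    active = dict(weights)  # containers still sharing the remaining budget
    remaining = total
    while active:
        weight_sum = sum(active.values())
        shares = {name: remaining * w / weight_sum for name, w in active.items()}
        # Containers whose proportional share exceeds their throttle cap
        capped = {name for name, s in shares.items() if name in caps and s > caps[name]}
        if not capped:
            alloc.update(shares)
            break
        # Clamp capped containers and hand their unused share back to the pool
        for name in capped:
            alloc[name] = caps[name]
            remaining -= caps[name]
            del active[name]
    return alloc
```

For example, with 200 MB/s and weights of 500, 250 and 250, the split is 100, 50 and 50. Capping the first container at 40 frees 60 MB/s, so the other two get 80 each. Real schedulers must also cope with bursty, latency-sensitive workloads, which is where the “long way to go” lies.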
What’s the solution?
We have barely scratched the surface. We could keep going, digging up more problems with traditional file systems. So what’s the solution? Do we need new file systems to address these concerns? And realistically, can new ones do it?
“Our emphasis is not on writing new file systems, primarily because they take years to prove themselves worthy of storing critical customer data on,” said Mark Thacker, principal technology product manager, Red Hat. “Some new file systems never make it to that point.”
However, there are “relatively” newer file systems, like OverlayFS, that do solve some of these problems efficiently. OverlayFS is not that new, but it is a good example of how communities are innovating to solve new problems with existing solutions.
Necessity Is the Mother of Invention
“Containerization has resulted in several challenges for local file systems. Perhaps the most obvious example is in shared lower layers; OverlayFS has seen rapid development in no small part due to the needs of container runtimes,” said CoreOS’ Wood.
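The idea of shared lower layers can be illustrated with a small Python model. This is a simplified sketch, not the kernel’s OverlayFS code, and the `OverlayView` class and `WHITEOUT` sentinel are hypothetical names: each container gets a private writable upper layer stacked over read-only lower layers that many containers share. Lookups check the upper layer first and fall through to the lowers; writes and deletions touch only the upper layer, so the shared image is never modified. Real OverlayFS also copies a file up from a lower layer on first modification and represents deletions as special whiteout files, details this model glosses over.

```python
from typing import Dict, List, Optional

WHITEOUT = object()  # hypothetical marker meaning "deleted in the upper layer"


class OverlayView:
    """Toy union mount: one private upper layer over shared read-only lower layers."""

    def __init__(self, lowers: List[Dict[str, bytes]]) -> None:
        self.lowers = lowers  # ordered top-most first; shared and never modified
        self.upper: Dict[str, object] = {}  # private to this container

    def read(self, path: str) -> Optional[bytes]:
        """Return the top-most visible version of a file, or None if absent or deleted."""
        for layer in [self.upper, *self.lowers]:
            if path in layer:
                value = layer[path]
                return None if value is WHITEOUT else value
        return None

    def write(self, path: str, data: bytes) -> None:
        # Writes land only in this container's upper layer
        self.upper[path] = data

    def delete(self, path: str) -> None:
        # Hide the lower-layer file without touching the shared layer
        self.upper[path] = WHITEOUT
```

Two containers built on the same base layer can each modify or delete files without affecting each other or the base, which is why a single lower layer can back many containers on one host.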
“In certain cases, overlay2 is a great choice for a container root filesystem, for example,” said Anil Madhavapeddy, an engineer at Docker.
But when it comes to file systems, there is no silver bullet. “Each file system has its strengths and weaknesses, and the ‘best’ general-purpose file systems really just have less to differentiate on but are never outstanding at anything,” said Meel.
Most existing file systems do have the capabilities to handle modern workloads, and companies continue to add new capabilities to these file systems.
For example, the pending arrival of persistent memory devices (such as NVDIMMs) has challenged a lot of assumptions around capacity, latency and even what is memory versus storage. To address these challenges, both XFS and ext4 have been modified to take advantage of these devices.
But, Thacker warns, these are short-term solutions. In the long term, there may be more optimized workloads built around these devices, which will require more work and innovation. “Modern microservices architectures, with fast startup and shutdown requirements, and extreme image content re-use may be ideally suited to these new architectures,” Thacker said.
On the SUSE side, Meel strongly believes that Btrfs is the best bet for modern workloads. SUSE uses Btrfs in SUSE CaaS Platform and its open source cousin, Kubic. Btrfs creator Chris Mason told me that Facebook is using Btrfs for containers.
Red Hat puts its weight behind XFS, which is being used in RHEL and Red Hat Enterprise Linux Atomic Host, its containers-as-a-service platform. “In fact, XFS is the only file system we support OverlayFS on top of, which is now a requirement for modern containers,” said Breard.
While Red Hat is investing in XFS, it’s also working on a new project to solve file system-related problems, called Project Stratis. Red Hat introduced the project at the 2017 Linux Vault Conference.
Project Stratis is a proposal for managing local file systems and devices in a much simpler and more extensible manner. Stratis uses proven, existing technologies, such as XFS and device mapper, but integrates them with volume management, file system provisioning and data protection, much like a volume-managing file system would, but without the years of stability testing a new file system requires.
“While Stratis is in the early stages as a community project, it does have ambitious goals including: customer-defined SLA for data protection, SSD / NVDIMM caching, taking advantage of differing capacity drives, ease of use customer CLI, easy to integrate API, as well as active monitoring and management,” said Thacker.
It’s too early to predict whether Stratis will be adopted industry-wide, even by competitors like SUSE, Canonical or CoreOS. But it’s happened before: systemd is a good example of a Red Hat technology being used across the industry.
Stratis may or may not be the ultimate solution. Different companies may continue to build features around the file systems they know best. Companies will also continue to use a mix of technologies to cater to their customers.
“We are committed to promoting user choice and this philosophy applies to filesystems as well. While BTRFS is one supported option, we will also continue to support a variety of filesystems through our volume plugins,” said Madhavapeddy. “Ultimately, there are many choices, and each has advantages for particular types of workload – we want to encourage and enable this flexibility for users.”
Conclusion
Linux kernel maintainer Ted Ts’o has said that most file systems were written in an era when you needed a general-purpose file system. These days, many workloads don’t need features such as journaling, and companies like Google are removing or disabling them. Instead, companies are using a mix of file systems, each optimized for a particular workload.
But as these companies and communities move forward to create and improve technologies around file systems, they face a problem that is not technical but cultural.
“File system and container communities tend to move at a very different pace,” said Breard. “Encouraging these communities to engage with each other rather than avoid and work around each other will help everyone from a long-term perspective, and limit the types of workarounds that lead to security and usability concerns that then require additional effort on all parties.”
CoreOS and Red Hat are sponsors of The New Stack.
Feature image by Steven Ramon via Unsplash.