Roman Gushchin, a member of Facebook’s Linux kernel engineering team, has proposed a new slab memory controller for the Linux kernel. The new controller promises much better memory utilization across memory cgroups by letting them share slab pages.
What Is a Slab Page?
Slab allocation is a memory-management technique in the Linux kernel designed to make allocating kernel objects efficient. It reduces the fragmentation caused by repeated allocations and deallocations, retains freed memory for reuse by later allocations of similar objects, and lowers the cost of object initialization.
Slab allocation relies on a cache for each type or size of object. Each cache holds a number of pre-allocated “slabs” of memory, divided into fixed-size chunks suited to that object. The kernel’s slab allocator manages these chunks so that when the kernel receives a request to allocate memory for an object, it can satisfy the request with a free chunk from an existing slab.
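To make the idea concrete, here is a minimal toy model of a slab cache, written in Python for readability. It is only a sketch of the concept described above, not how the kernel implements it: the names SlabCache and Slab, the 4096-byte slab size, and the tuple used as an object handle are all assumptions made for illustration.

```python
class Slab:
    """One pre-allocated slab, divided into fixed-size chunks (toy model)."""

    def __init__(self, chunks_per_slab):
        # Indices of the chunks that are currently free in this slab.
        self.free_chunks = list(range(chunks_per_slab))


class SlabCache:
    """A cache for objects of a single size (hypothetical, illustrative only)."""

    def __init__(self, object_size, slab_size=4096):
        self.object_size = object_size
        self.chunks_per_slab = slab_size // object_size
        self.slabs = []

    def alloc(self):
        """Return a (slab index, chunk index) handle for a free chunk."""
        # Prefer a free chunk in an existing slab.
        for index, slab in enumerate(self.slabs):
            if slab.free_chunks:
                return (index, slab.free_chunks.pop())
        # No free chunk anywhere: set up a new slab and take a chunk from it.
        self.slabs.append(Slab(self.chunks_per_slab))
        return (len(self.slabs) - 1, self.slabs[-1].free_chunks.pop())

    def free(self, handle):
        """Return a chunk to its slab so a later alloc() can reuse it."""
        index, chunk = handle
        self.slabs[index].free_chunks.append(chunk)


# Usage: 256-byte objects in 4096-byte slabs give 16 chunks per slab.
cache = SlabCache(object_size=256)
handles = [cache.alloc() for _ in range(17)]  # the 17th object needs a second slab
cache.free(handles[0])                        # the chunk is kept for reuse
reused = cache.alloc()                        # served from the first slab again
```

The point of the sketch is the reuse path: a freed chunk goes back to its slab rather than to a general-purpose allocator, so the next request for the same kind of object is satisfied without setting up new memory.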
The New Slab Controller
Gushchin discovered what he considers to be a “serious flaw” in the current slab memory controller. According to Gushchin, “The real reason why the existing design leads to a low slab utilization is simple: slab pages are used exclusively by one memory cgroup. If there are only a few allocations of certain size made by a cgroup, or if some active objects (e.g. dentries) are left after the cgroup is deleted, or the cgroup contains a single-threaded application which is barely allocating any kernel objects, but does it every time on a new CPU: in all these cases the resulting slab utilization is very low. If kmem accounting is off, the kernel is able to use free space on slab pages for other allocations.”
Gushchin argues that this wasn’t an issue when the kmem controller was introduced as an opt-in feature, which had to be turned on for each memory cgroup. Now, however, the kmem controller is turned on by default for both cgroup v1 and v2. And since modern systemd-based systems tend to create a very large number of cgroups, the resulting loss in slab utilization adds up.
By sharing slab pages between multiple memory cgroups and performing accounting on a per-object basis rather than a per-page basis, the new slab memory controller aims for much higher slab utilization.
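A rough back-of-the-envelope model shows why sharing pages helps. The sketch below compares how many slab pages are needed when every cgroup gets its own pages versus when all cgroups share pages. The function names and the workload (100 cgroups holding 3 objects each, 16 objects per page) are hypothetical, and the model deliberately ignores per-CPU and per-node effects as well as any bookkeeping overhead that per-object accounting may add.

```python
import math


def pages_exclusive(objects_per_cgroup, objects_per_page):
    """Pages needed when each cgroup's objects live on that cgroup's own pages."""
    return sum(math.ceil(n / objects_per_page) for n in objects_per_cgroup)


def pages_shared(objects_per_cgroup, objects_per_page):
    """Pages needed when objects from all cgroups may share the same pages."""
    return math.ceil(sum(objects_per_cgroup) / objects_per_page)


def utilization(total_objects, pages, objects_per_page):
    """Fraction of the available chunks that actually hold an object."""
    return total_objects / (pages * objects_per_page)


# Hypothetical workload: 100 cgroups, each holding only 3 objects of one size.
workload = [3] * 100
per_page = 16

exclusive = pages_exclusive(workload, per_page)  # 100 pages, one per cgroup
shared = pages_shared(workload, per_page)        # 19 pages for all 300 objects

print(exclusive, utilization(sum(workload), exclusive, per_page))
print(shared, utilization(sum(workload), shared, per_page))
```

In this made-up case the exclusive layout leaves most of every page empty, while the shared layout packs the same 300 objects into 19 pages. Real savings depend on the workload, which is why the measured figures matter more than this toy calculation.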
According to Gushchin, the new patchset contains two semi-independent pieces:
- Subpage charging API, which can be used in the future for accounting of other non-page-sized objects, e.g. percpu allocations.
- mem_cgroup_ptr API (refcounted pointers to a memcg), which can be reused for the efficient reparenting of other objects, e.g. pagecache.
Gushchin’s new slab memory controller has been tested on numerous workloads. The results of the tests show significant savings in memory usage, including:
- Web frontend: 650-700 MB saved, ~42% of slab memory.
- Database cache: 750-800 MB saved, ~35% of slab memory.
- DNS server: 700 MB saved, ~36% of slab memory.
With the new controller, the tested workloads saved roughly 35-42% of their slab memory. Gushchin noted on the lkml.org thread that nothing in his testing was Facebook-specific. He indicated he’d tested the patchset on a Fedora 30 installation and found the numbers to be roughly the same.
Gushchin’s patchset is currently posted as a “request for comments” (RFC). Should it be accepted, it could find its way into the mainline kernel as early as 2020.