Roman Gushchin, a software engineer at Facebook, has posted to the Linux kernel mailing list a set of patches implementing a new slab memory controller.
The new controller stands out by moving slab accounting from the memory page level to the kernel object level, which makes it possible to share slab pages across different cgroups rather than allocating separate slab caches for each cgroup.
Roman found what he calls a "very serious flaw" in the existing slab memory controller that leads to low slab utilization when cgroups are in use.
“The real reason why the existing design leads to low slab utilization is simple: slab pages are used exclusively by one memory cgroup.
If there are only a few allocations of a certain size made by a cgroup, or if some active objects are left after the cgroup is removed, or the cgroup contains a single-threaded application that barely allocates any kernel objects but does so every time on a new CPU: in all these cases, the resulting slab utilization is very low.
If kmem accounting is disabled, the kernel can use free space on slab pages for other allocations.”
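The effect described in the quote can be illustrated with a small back-of-the-envelope model. This is only a sketch: the function names, the workload of 100 cgroups with 3 objects each, and the figure of 32 objects per slab page are assumptions chosen for illustration, not numbers from the patchset.

```python
import math


def pages_needed(num_objects, objects_per_page):
    """Slab pages required to hold num_objects of one size class."""
    return math.ceil(num_objects / objects_per_page)


def utilization(objects_per_cgroup, objects_per_page, shared):
    """Fraction of slab page slots that hold live objects.

    shared=False: each cgroup gets its own pages (page-level accounting).
    shared=True:  all cgroups pack objects into common pages
                  (object-level accounting).
    """
    total_objects = sum(objects_per_cgroup)
    if shared:
        pages = pages_needed(total_objects, objects_per_page)
    else:
        pages = sum(pages_needed(n, objects_per_page)
                    for n in objects_per_cgroup)
    return total_objects / (pages * objects_per_page)


# Hypothetical workload: 100 cgroups, each holding 3 objects of a size
# class that fits 32 objects per slab page.
workload = [3] * 100
exclusive = utilization(workload, 32, shared=False)
common = utilization(workload, 32, shared=True)
print(f"exclusive pages: {exclusive:.1%}, shared pages: {common:.1%}")
```

In this toy workload the exclusive layout needs 100 pages for 300 objects, while the shared layout needs 10, which is the kind of gap the quote attributes to pages being owned by a single cgroup.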
The slab memory controller proposed by Roman Gushchin last year looked quite promising: it improves slab utilization, reduces the memory used for slab by 30-45%, and significantly lowers total kernel memory consumption.
The patch notes also indicate that Facebook is already running the code in production on its servers, saving roughly 650-700 MB or more on front-end web servers, database caches, and DNS servers, among other gains.
Reducing the number of unmovable slab pages also has a positive effect on memory fragmentation. The new memory controller significantly simplifies the slab accounting code and does not require complicated algorithms for dynamically creating and destroying slab caches for each cgroup.
In the new implementation, all memory cgroups use a common set of slab caches, and the lifetime of slab caches is no longer tied to the lifetime of the memory limits set through cgroups.
The more precise resource accounting in the new slab controller should in theory cost more CPU time, but in practice the difference turned out to be negligible.
In particular, the new slab controller has been running for several months on production Facebook servers that handle different types of workloads, and so far no significant regressions have been detected.
The patchset contains a couple of semi-independent parts that may also find use outside the slab memory controller:
- The subpage charging API, which can be used in the future to account for other objects that are smaller than a page, for example percpu allocations.
- The mem_cgroup_ptr API (reference-counted pointers to a memcg), which can be reused for efficient reparenting of other objects, for example the page cache.
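The two ideas in the list above can be sketched with a toy model: charges are tracked in bytes rather than whole pages, and objects reach their cgroup through a shared reference-counted handle, so reparenting only retargets that one handle. The classes `Cgroup`, `CgroupRef`, and `KernelObject` and the function `reparent` are hypothetical stand-ins invented for this illustration; they are not the kernel's API or data structures.

```python
class Cgroup:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.charged_bytes = 0  # byte-granular charge, not whole pages


class CgroupRef:
    """Shared, reference-counted handle that objects point at instead of
    pointing at a Cgroup directly (hypothetical, for illustration only)."""

    def __init__(self, cgroup):
        self.cgroup = cgroup
        self.refcount = 0


class KernelObject:
    def __init__(self, size, ref):
        self.size = size
        self.ref = ref
        ref.refcount += 1
        ref.cgroup.charged_bytes += size  # charge only the object's size

    def free(self):
        # Uncharge whichever cgroup the handle currently points at.
        self.ref.cgroup.charged_bytes -= self.size
        self.ref.refcount -= 1


def reparent(ref):
    """When a cgroup goes away, move its outstanding charge to the parent
    by retargeting the single shared handle; no per-object walk."""
    old, new = ref.cgroup, ref.cgroup.parent
    new.charged_bytes += old.charged_bytes
    old.charged_bytes = 0
    ref.cgroup = new


root = Cgroup("root")
child = Cgroup("child", parent=root)
ref = CgroupRef(child)
objects = [KernelObject(64, ref) for _ in range(10)]

reparent(ref)        # child is removed while its objects are still alive
objects[0].free()    # later frees are uncharged from the parent
```

The point of the indirection is that the cost of reparenting does not grow with the number of live objects, since none of them has to be visited.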
At the same time, memory consumption drops significantly. On some hosts it was possible to save up to 1 GB of memory, but this figure depends heavily on the nature of the workload, the total amount of RAM, the number of CPUs, and the memory usage patterns.
Instead of creating a separate set of kmem_caches for each memory cgroup, two global sets are used: a root set for non-accounted allocations and allocations from the root cgroup, and a second set for all other allocations. This simplifies lifetime management of individual kmem_caches.
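A minimal sketch of how such a two-set layout might pick a cache is shown below. The size classes, the cache names, the `KmemCache` class, and the `select_cache` function are all assumptions made for this illustration and do not reproduce the kernel's code.

```python
from dataclasses import dataclass


@dataclass
class KmemCache:
    name: str
    object_size: int


SIZE_CLASSES = (64, 128, 256)  # hypothetical size classes

# Two global sets, created once and independent of any cgroup's lifetime.
ROOT_CACHES = {s: KmemCache(f"root-{s}", s) for s in SIZE_CLASSES}
ACCOUNTED_CACHES = {s: KmemCache(f"accounted-{s}", s) for s in SIZE_CLASSES}


def select_cache(size, cgroup, accounted):
    """Pick the cache for an allocation of `size` bytes.

    Non-accounted allocations and allocations from the root cgroup use the
    root set; everything else shares the second set, whichever cgroup asks.
    """
    for size_class in SIZE_CLASSES:
        if size <= size_class:
            break
    else:
        raise ValueError(f"no size class for {size} bytes")
    if not accounted or cgroup == "root":
        return ROOT_CACHES[size_class]
    return ACCOUNTED_CACHES[size_class]


# Two different cgroups end up in the same cache object.
web = select_cache(100, "web", accounted=True)
db = select_cache(100, "db", accounted=True)
```

Because no cache belongs to a particular cgroup, creating or removing a cgroup never requires creating or destroying caches in this model.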
Finally, if you are interested in the new set of 19 patches, it can be found on the kernel mailing list.