Clyso engineers achieve 1 TiB/s performance in a Ceph cluster

Clyso engineers recently announced an unprecedented result: they have achieved throughput of more than a terabyte per second on a storage cluster built on Ceph, the fault-tolerant distributed storage system.

Without a doubt, this is an achievement that marks the first time a Ceph-based cluster has reached such a figure, and getting there required overcoming a series of challenges.

The project began when Clyso received a request to build a cluster for a client; the engineers then set out to extract the best possible performance while still meeting the client's stated requirements.

When the client first approached Clyso, the engineers proposed a configuration of 34 2U dual-socket nodes spread across 17 racks, along with a couple of alternative configurations.

In the end, the client decided to go with a Dell architecture designed by Clyso, which was approximately 13% cheaper than the original configuration while offering several key advantages. The new configuration has less memory per OSD (still a comfortable 12 GiB each), but faster memory.

It also provides more aggregate CPU resources, significantly higher aggregate network throughput, a simpler single-socket configuration, and the latest generation of AMD processors and DDR5 RAM. By using smaller nodes, the design halves the impact of a node failure on cluster recovery.

The customer indicated that they would like to limit the additional power consumption per rack to around 1000-1500 watts. 

They found that putting the servers in maximum-performance mode and disabling C-states in the BIOS power-saving settings improved performance by 10-20%.
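The article does not spell out the exact knobs Clyso used, but on Linux the effect of that BIOS setting can also be approximated at runtime through the kernel's cpuidle sysfs interface. Below is a minimal sketch, assuming a Linux host and root privileges, that disables every idle state deeper than C1:

```python
# Minimal sketch (assumptions: Linux cpuidle sysfs interface, root
# privileges). Disables deep C-states at runtime; the persistent fix
# described in the article is done in the BIOS instead.
import glob
import pathlib

for state_dir in glob.glob("/sys/devices/system/cpu/cpu*/cpuidle/state*"):
    state = pathlib.Path(state_dir)
    name = (state / "name").read_text().strip()
    # Keep the shallow POLL/C1 states; disable deeper, higher-latency ones.
    if name not in ("POLL", "C1"):
        (state / "disable").write_text("1")
        print(f"disabled {name} on {state.parent.parent.name}")
```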

It also turned out that, when using NVMe drives, the Linux kernel spends a significant amount of time on spinlocks while updating IOMMU mappings. Disabling the IOMMU in the kernel brought a noticeable performance increase in the 4 MB block read and write tests, although it did not completely resolve the performance issues with random 4 KB writes.
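The exact kernel parameter they used is not quoted; on AMD platforms the IOMMU is commonly disabled by adding amd_iommu=off (or iommu=off) to the kernel command line and rebooting. A small sketch, under those assumptions, that reports whether the running kernel has an active IOMMU:

```python
# Sketch (assumption: Linux; the parameter Clyso actually used is not
# quoted in the article). Checks the boot command line for the usual
# IOMMU-disabling flags and lists any IOMMU units the kernel registered.
import os

cmdline = open("/proc/cmdline").read().split()
disabled = any(
    p in ("amd_iommu=off", "iommu=off", "intel_iommu=off") for p in cmdline
)
active = os.listdir("/sys/class/iommu") if os.path.isdir("/sys/class/iommu") else []

print("kernel cmdline disables IOMMU:", disabled)
print("active IOMMU units:", active or "none")
```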

They also mention that, while figuring out what was happening, the engineers found fixes in the Ceph build scripts belonging to the Gentoo and Ubuntu projects, which compile with the RelWithDebInfo option; this enables GCC's -O2 optimization level and significantly increases Ceph performance.

Compiling with the TCMalloc library also turned out to cause a performance drop. Changing the compile flags and removing the use of TCMalloc resulted in a threefold reduction in compaction time and a twofold increase in performance for random 4 KB writes.
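The article does not include Clyso's exact build invocation, but the two changes can be illustrated with Ceph's documented CMake cache variables (CMAKE_BUILD_TYPE and ALLOCATOR). Treat the sketch below as an illustration under those assumptions, not their actual build script:

```python
# Hypothetical sketch of the two build changes described above
# (assumptions: a Ceph source checkout in the current directory,
# CMake >= 3.13, and Ceph's ALLOCATOR cache variable).
import subprocess

subprocess.run(
    [
        "cmake",
        "-DCMAKE_BUILD_TYPE=RelWithDebInfo",  # compiles with -O2 plus debug info
        "-DALLOCATOR=libc",                   # build without TCMalloc
        "-B", "build",                        # out-of-tree build directory
        "-S", ".",                            # Ceph source root
    ],
    check=True,
)
```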

Additionally, applying Reef's RocksDB tuning and adjusting the placement group counts contributed to the overall system optimization.
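For the placement-group side, a common community heuristic is to target on the order of 100 PGs per OSD, divided by the replication factor and rounded to a power of two. The sketch below applies that rule of thumb to this cluster; it is a generic guideline, not Clyso's exact tuning:

```python
# Common community heuristic for sizing a pool's PG count
# (a rule of thumb, not Clyso's actual configuration).
def suggested_pg_count(num_osds: int, target_pgs_per_osd: int = 100,
                       replica_count: int = 3) -> int:
    raw = num_osds * target_pgs_per_osd / replica_count
    # Round to the nearest power of two, as Ceph recommends for PG counts.
    power = max(1, round(raw).bit_length() - 1)
    lower, upper = 2 ** power, 2 ** (power + 1)
    return lower if raw - lower < upper - raw else upper

# The cluster described here has 68 nodes x 10 NVMe drives = 680 OSDs.
print(suggested_pg_count(680))  # -> 16384
```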

The full system specifications are shown below:

Nodes: 68 x Dell PowerEdge R6615
CPU: 1 x AMD EPYC 9454P (48C/96T)
Memory: 192 GiB DDR5
Network: 2 x 100GbE Mellanox ConnectX-6
NVMe: 10 x Dell 15.36 TB Enterprise NVMe Read Intensive AG
OS Version: Ubuntu 20.04.6 (Focal)
Ceph Version: Quincy v17.2.7 (upstream Deb packages)

The results are impressive: sequential reads with 4 MB blocks reached 1025 GiB/s, while writes reached 270 GiB/s.

In random operations with 4 KB blocks, the cluster reached 25.5 million read operations per second and 4.9 million write operations per second. Enabling encryption reduced read performance to approximately 750 GiB/s.
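To put those aggregate numbers in perspective, here is a quick back-of-the-envelope calculation using only the figures quoted above: roughly 15 GiB/s of reads per node, which fits comfortably within each node's 2 x 100 GbE links (about 23 GiB/s raw):

```python
# Back-of-the-envelope division of the published aggregate results across
# the 68 nodes and 680 OSDs (68 nodes x 10 NVMe drives) described above.
nodes = 68
osds = nodes * 10
read_gib_s, write_gib_s = 1025, 270            # sequential 4 MB results
read_iops, write_iops = 25_500_000, 4_900_000  # random 4 KB results

print(f"per node: {read_gib_s / nodes:.1f} GiB/s read, "
      f"{write_gib_s / nodes:.1f} GiB/s write")           # ~15.1 / ~4.0
print(f"per OSD:  {read_iops / osds:,.0f} read IOPS, "
      f"{write_iops / osds:,.0f} write IOPS")             # ~37,500 / ~7,206
```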

This achievement not only represents a technical milestone for Clyso, but also highlights the continued evolution and improvement in distributed storage capabilities.

It is worth mentioning that in September CERN reached a similar milestone in its exabyte-scale storage cluster based on the EOS distributed storage system and the XRootD protocol.

Finally, if you are interested in learning more, you can check the details at the following link.

