Author: Michael Green, Field Engineer
E8 Storage recently published the Deploying E8 Storage with IBM Spectrum Scale white paper to demonstrate the most straightforward way to deploy E8 Storage as the underlying storage for an IBM Spectrum Scale cluster. In that paper, we covered how to enable RDMA between NSD servers and clients to improve the performance of a GPFS cluster with E8 Storage. Now we would like to show our customers how much improvement a few simple commands can deliver to the performance of a GPFS cluster.
E8 Storage leverages and greatly benefits from RDMA, either over InfiniBand or over Converged Ethernet (RoCE). RoCE is supported by all data-center grade Ethernet switches and by a wide variety of NICs (Mellanox, Cavium/QLogic, Broadcom). Having this Ethernet infrastructure already in place lets customers extract additional value from their hardware and software investments by moving the NSD server-client block communication away from traditional 1GbE networks and onto the fast, reliable and, most importantly, already paid for RDMA infrastructure. That alone provides a significant performance boost in the form of reduced latency and increased throughput.
It doesn’t stop there. By turning on support for Remote Direct Memory Access (RDMA) through the VERBS API for data transfer between an NSD server and client, customers can drive latency down and throughput up even further.
The steps to enable the VERBS API can be found in the Deploying E8 Storage with IBM Spectrum Scale white paper. To measure performance, we used a small three-node cluster consisting of 2 NSD servers and 1 client, each connected to the network at 50GbE. The client node had no local access to E8 Storage volumes, so all of its I/O had to go through one of the NSD servers. The only I/O load on the cluster came from the I/O generator and performance measuring tool, FIO v3.5.
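As a rough illustration of what those steps usually look like, the sketch below only builds the `mmchconfig` command lines that are commonly used to turn on VERBS RDMA in Spectrum Scale; it does not run them. The white paper remains the authoritative procedure. The port name `mlx5_0/1` and the node names `nsd1`, `nsd2` and `client1` are hypothetical placeholders, and the helper function itself is illustrative only.

```python
import shlex


def verbs_rdma_commands(ports, nodes=None, use_rdma_cm=True):
    """Build (but do not run) mmchconfig command lines that enable VERBS RDMA.

    ports       -- RDMA device/port names, e.g. ["mlx5_0/1"] (placeholder value)
    nodes       -- optional node names passed to mmchconfig via -N
    use_rdma_cm -- also set verbsRdmaCm, which RoCE setups generally need
    """
    node_args = ["-N", ",".join(nodes)] if nodes else []
    settings = ["verbsRdma=enable", "verbsPorts=" + " ".join(ports)]
    if use_rdma_cm:
        settings.append("verbsRdmaCm=enable")
    # One mmchconfig invocation per setting keeps each line easy to review.
    return [shlex.join(["mmchconfig", setting] + node_args) for setting in settings]


if __name__ == "__main__":
    # Hypothetical node and port names; substitute the ones from your cluster.
    for line in verbs_rdma_commands(["mlx5_0/1"], nodes=["nsd1", "nsd2", "client1"]):
        print(line)
```

Settings of this kind typically take effect only after the GPFS daemon is restarted on the affected nodes, so plan the change for a maintenance window.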
The testing methodology was simple: run random read jobs against the mounted GPFS file system /e8fs1. The workloads we used for the performance comparison were:
- 4k, 100% random read, 8 FIO threads, queue depth of 14 for a total queue depth of 112.
- 128k, 100% random read, 8 FIO threads, queue depth of 7 for a total queue depth of 56.
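For readers who want to reproduce something similar, here is a minimal sketch that turns the two workloads above into FIO command lines and checks the total queue depth arithmetic. The block sizes, thread counts, queue depths and the `/e8fs1` mount point come from this post; the I/O engine (`libaio`), direct I/O, per-job file size, runtime and job names are assumptions, since the exact job files used in the test are not shown here.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str          # job name (illustrative, not from the original test)
    block_size: str    # e.g. "4k" or "128k"
    threads: int       # "FIO threads" in the post, mapped to --numjobs here
    queue_depth: int   # per-thread queue depth, mapped to --iodepth

    @property
    def total_queue_depth(self):
        # Total outstanding I/Os = threads x per-thread queue depth.
        return self.threads * self.queue_depth

    def fio_command(self, directory="/e8fs1", runtime_s=60, size="1g"):
        # ioengine, direct, size and runtime are assumed values.
        args = [
            "fio",
            f"--name={self.name}",
            f"--directory={directory}",
            "--rw=randread",
            f"--bs={self.block_size}",
            f"--numjobs={self.threads}",
            f"--iodepth={self.queue_depth}",
            "--ioengine=libaio",
            "--direct=1",
            f"--size={size}",
            "--time_based",
            f"--runtime={runtime_s}",
            "--group_reporting",
        ]
        return shlex.join(args)


WORKLOADS = [
    Workload("randread-4k", "4k", threads=8, queue_depth=14),
    Workload("randread-128k", "128k", threads=8, queue_depth=7),
]

if __name__ == "__main__":
    for w in WORKLOADS:
        print(f"{w.name}: total queue depth {w.total_queue_depth}")
        print(w.fio_command())
```

Running the same commands on the client before and after enabling RDMA gives a like-for-like comparison, provided nothing else is loading the cluster.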
The results below speak for themselves. For client performance, we are talking about:
- Over 5x improvement in small block latency and throughput
- Over 2x improvement in large block latency and throughput
Note how large block I/O is now able to nearly max out the available bandwidth of the 50GbE connection, which has a potential maximum throughput of about 5.5 GB/s.
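To put the 5.5 GB/s figure in context, the short calculation below converts the 50 Gb/s link speed to bytes per second and compares it with the ceiling quoted above. It assumes decimal units (1 GB = 10^9 bytes); attributing the gap to protocol and encoding overhead is our reading, not a measured breakdown.

```python
def line_rate_gbytes_per_s(link_gbits_per_s):
    """Convert a link speed in Gb/s to GB/s (8 bits per byte, decimal units)."""
    return link_gbits_per_s / 8


raw = line_rate_gbytes_per_s(50)   # raw line rate of a 50GbE link
quoted_ceiling = 5.5               # practical ceiling quoted in this post, GB/s
fraction = quoted_ceiling / raw    # share of the raw line rate

if __name__ == "__main__":
    print(f"raw line rate: {raw:.2f} GB/s")
    print(f"quoted ceiling is {fraction:.0%} of the raw line rate")
```

In other words, the quoted ceiling is a little under nine tenths of the raw 6.25 GB/s line rate.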
E8 Storage customers are uniquely positioned to extract additional value from the fast, reliable and cost-effective solution they already have at their disposal: a few simple steps are enough to enable RDMA within their existing Ethernet or InfiniBand networks.
If you want to learn more, contact us!