Author: Ron Herrmann, Director of Sales Engineering
When E8 Storage recently attended BioIT World in Boston, we saw an impressive number of customers and vendors focused on a common theme: understanding the human genome as the basis for discovering cures for disease and saving lives. Key to that endeavor is speeding up genomic processing to accelerate scientific research, and E8 Storage's record-breaking performance for analytics is a perfect fit.
From our office in Ann Arbor, MI, we see a significant amount of genomic research, thanks to the nearby University of Michigan, large health sciences companies such as Pfizer, Perrigo, and Stryker, and dozens of smaller firms like Parabricks, Everist, Genomenon, Rubicon (Takara Bio), StrataOncology, and Swift Biosciences. These companies perform both primary and secondary genomic analysis, mapping individual genomes as well as analyzing across groups of genomes.
Secondary analysis in genomics processing examines the genomes of thousands of people and identifies variants in this data in order to test research hypotheses. This analysis can require 500TB or more of storage capacity and may take days to complete, especially if the data is stored on spinning disk or even first-generation all-flash arrays. Clearly, a faster solution is needed to accelerate secondary analysis.
Building a High Performance Genomics System
E8 Storage tackles this performance challenge head-on, integrating seamlessly into high-performance IBM Spectrum Scale (GPFS) clusters built on GPU processors and 100Gb Ethernet (RoCE) or InfiniBand networks. With shared read/write volumes, E8 Storage can host the entire secondary genomics workload on high-performance, low-latency NVMe storage, feeding data to GPU-accelerated applications faster than any other solution on the market and producing the fastest genomics results possible.
The drawing below illustrates a reference architecture for integrating E8 Storage into a high-performance, GPU-based cluster for genomic sequencing.
This solution maximizes the processing power of both the genomics applications and the file system and storage processing. Using idle CPU cores on the NVIDIA DGX-1 servers, E8 Storage can provide the massive number of I/O queues, at very low latency, needed to feed the applications running on the GPUs, which consist of thousands of small cores. With support for shared read/write volumes, every GPFS node in the GPU farm has direct access to the data, without the added latency of routing I/O through an NSD server.
The reference solution also scales in multiple dimensions, from compute to storage capacity. E8 Storage controllers support up to 126 host connections, and performance is maximized in clusters of 10 to 100 nodes. With up to 8 GPUs per node, E8 Storage can easily support a cluster of 1,000 GPUs. In the other direction, each node in the cluster can access up to 4 E8 Storage controllers, which allows the cluster to scale to over 500TB of genomics data for processing.
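As a back-of-envelope check on those scaling figures, the short Python sketch below multiplies the quoted 126 host connections by 8 GPUs per node and computes how many nodes a 1,000-GPU cluster needs. It is an illustration only, not an E8 Storage sizing tool; the constant and function names are hypothetical, and it models nothing beyond the two numbers given in the text.

```python
import math

# Figures quoted in the text; the names here are hypothetical.
MAX_HOST_CONNECTIONS = 126  # host connections per E8 Storage controller
GPUS_PER_NODE = 8           # up to 8 GPUs per node (e.g. an NVIDIA DGX-1)


def max_gpus(hosts: int = MAX_HOST_CONNECTIONS, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Upper bound on GPUs if every host connection is a fully populated GPU node."""
    return hosts * gpus_per_node


def nodes_needed(gpus: int, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Smallest number of GPU nodes that can hold `gpus` GPUs."""
    return math.ceil(gpus / gpus_per_node)


if __name__ == "__main__":
    print("max GPUs at 126 hosts:", max_gpus())
    print("nodes for 1,000 GPUs:", nodes_needed(1000))
    print("GPUs at 100 nodes:", max_gpus(100))
```

With all 126 connections used, the ceiling is 1,008 GPUs; a 1,000-GPU cluster needs 125 nodes, which is above the 10 to 100 node range where the text says performance is maximized.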
What are Customers Saying?
Several customers are using E8 Storage as a high-performance fast tier and scratch space behind GPFS to support genomic sequencing, and the results have been outstanding. In one case, E8 Storage was one of the elements that enabled the customer to speed up their genomic processing by 100x!
- “The E8 Storage appliance behaved flawlessly and the integration with InfiniBand was simpler than expected”
- After simple tuning, a single node on 100Gb EDR InfiniBand achieved 5GB/s throughput and 1.5M 4K IOPS, an impressive result!
- The customer used the E8 Storage API to integrate volume management with Grid Engine, allowing user-requestable scratch space to be automatically created and destroyed by a job. Hands-off storage administration that optimizes genomics processing jobs!
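For context on the single-node result, it helps to convert IOPS into bandwidth and compare it with the link. This sketch assumes "4K" means 4 KiB (4,096-byte) blocks and treats 100 Gb/s as the nominal EDR data rate; the source does not say whether the 5GB/s and 1.5M IOPS figures come from the same test, and the helper names are hypothetical.

```python
# Assumptions: "4K" = 4 KiB blocks; 100 Gb/s is the nominal EDR InfiniBand data rate.
NOMINAL_LINK_GBPS = 100   # gigabits per second
BLOCK_BYTES = 4 * 1024    # bytes per 4K I/O


def link_bytes_per_sec(gbps: float) -> float:
    """Convert a link rate in gigabits/s to bytes/s (1 byte = 8 bits)."""
    return gbps * 1e9 / 8


def iops_to_bytes_per_sec(iops: float, block_bytes: int) -> float:
    """Payload bandwidth implied by a given IOPS rate at a fixed block size."""
    return iops * block_bytes


if __name__ == "__main__":
    link = link_bytes_per_sec(NOMINAL_LINK_GBPS)
    payload = iops_to_bytes_per_sec(1.5e6, BLOCK_BYTES)
    print(f"link: {link / 1e9:.1f} GB/s, 4K IOPS payload: {payload / 1e9:.2f} GB/s")
```

This prints a 12.5 GB/s nominal link ceiling and about 6.14 GB/s of payload for 1.5M 4 KiB I/Os per second, so both reported figures sit well within what a single 100Gb port can carry.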
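The Grid Engine integration described in the last bullet follows a common create, use, destroy pattern for per-job scratch space. The Python sketch below shows that lifecycle under loud assumptions: `VolumeAPI`, `create_volume`, and `delete_volume` are hypothetical names standing in for whatever the E8 Storage API actually exposes, and `InMemoryVolumes` is a fake backend for local testing only.

```python
import os
from contextlib import contextmanager
from typing import Dict, Iterator, Protocol


class VolumeAPI(Protocol):
    """Hypothetical storage-management interface; NOT the real E8 Storage API."""

    def create_volume(self, name: str, size_gb: int) -> str: ...
    def delete_volume(self, volume_id: str) -> None: ...


@contextmanager
def job_scratch(api: VolumeAPI, job_id: str, size_gb: int) -> Iterator[str]:
    """Create a per-job scratch volume and delete it when the job body exits."""
    volume_id = api.create_volume(f"scratch-{job_id}", size_gb)
    try:
        yield volume_id
    finally:
        # Runs whether the job body returns normally or raises an exception.
        api.delete_volume(volume_id)


class InMemoryVolumes:
    """Fake backend that tracks volumes in a dict (hypothetical, for testing)."""

    def __init__(self) -> None:
        self.volumes: Dict[str, int] = {}
        self._counter = 0

    def create_volume(self, name: str, size_gb: int) -> str:
        self._counter += 1
        volume_id = f"{name}#{self._counter}"
        self.volumes[volume_id] = size_gb
        return volume_id

    def delete_volume(self, volume_id: str) -> None:
        del self.volumes[volume_id]


if __name__ == "__main__":
    backend = InMemoryVolumes()
    # Grid Engine sets JOB_ID in the job environment; fall back for local runs.
    job_id = os.environ.get("JOB_ID", "local")
    with job_scratch(backend, job_id, size_gb=500) as vol:
        print("running with", vol, backend.volumes)
    print("after job:", backend.volumes)
```

A context manager keeps cleanup tied to the job body even when it fails. In an actual Grid Engine deployment, creation and deletion would more likely be split across the queue's prolog and epilog scripts, which run as separate steps around each job.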
It's results and endorsements like these that really excite us! Not only customers but also the press are seeing the benefits E8 Storage can deliver, as you can see in this great article The Register wrote about E8 Storage's performance for genomics as well as AI/ML.
2018 is turning out to be the year of NVMe. Customers are starting to build large all-NVMe systems for genomics (as well as AI/ML, FinTech and other applications) and are turning to E8 Storage as a pioneer in shared NVMe storage. In fact, we recently designed a 2PB all-NVMe storage architecture for a genomics customer who specifically required vendors to bid all-NVMe, not first-generation SSD or hybrid. This is a good indication that all-NVMe solutions are being recognized as the go-to storage for genomics.