WekaIO is replacing Dell EMC Isilon at a genome sequencing installation where Isilon ran out of gas.
In 2015 Genomics England deployed 7PB of clustered NAS flash and disk drive storage from EMC Isilon for its 100,000 Genomes project. That project is expanding into five million genomes and the current Dell EMC Isilon storage set-up is unable to cope.
David Ardley, director of technology at Genomics England, said: “Our legacy storage system had already reached its limit and performance had deteriorated. We needed a modern storage solution that could scale to hundreds of petabytes while maintaining performance scaling, and it had to be simple to manage at that scale.”
100,000 Genomes Project
Originally, Genomics England (GEL) wanted to sequence 100,000 genomes from 70,000 people, including NHS patients and their families. Its goal is to provide better disease response by optimising medication for genomes – a person’s DNA structure – and identifying patients at risk from diseases linked to their genome types.
GEL works on large files and looks for common patterns. It requires parallelised access to a library of files, up to 240GB in size, held in network-attached storage (NAS). In 2015, Isilon provided the best kit for this task.
Backup services were provided by Dell EMC’s Data Domain and NetWorker. In September 2016 GEL decided to add an Isilon data lake to store all the data collected during the sequencing process so that it could be analysed.
The data lake was then sized at 17PB. GEL also bought 24 all-flash XtremIO X-Bricks to provide faster block storage for its applications. At that point it had sequenced 13,040 genomes.
GEL completed its 100,000th sequence in December 2018 and has amassed the world’s largest database of whole genome sequences with associated clinical data.
It is running a pilot project in which 20,000 babies will be given whole-genome sequencing to detect their liability for epilepsy, cystic fibrosis and other conditions. NHS England operates the national NHS Genomic Medicine Service (GMS) and intends to integrate genomic medicine with routine NHS care by 2025.
The NHS GMS will be deployed across England from April 2020 and comprises seven networked genomic laboratory hubs in an NHS genomic medicines centre infrastructure. A national genomic test directory and whole-genome sequencing will be available nationwide with an integrated clinical service.
In comes WekaIO
NHS England has now decided to sequence five million whole genomes by 2024. That means a genome library in the hundreds of petabytes. The Isilon system, now 25PB in size, can no longer cope and GEL has decided that WekaIO’s file system is the one to use.
A linear projection from 100,000 genome sequences at 25PB to five million sequences, a 50-fold increase, entails 1,250PB (1.25 exabytes) of data lake storage.
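The arithmetic behind that projection can be sketched in a few lines of Python. This is only an illustration of straight-line scaling using the figures quoted above; the function name is hypothetical, and the sketch assumes storage grows in direct proportion to the number of genomes, which real deployments may not do once compression, deduplication or changes in sequencing output are taken into account.

```python
def project_storage_pb(current_pb: float, current_genomes: int, target_genomes: int) -> float:
    """Scale storage linearly with the number of genomes sequenced.

    Assumption: capacity per genome stays constant as the library grows.
    """
    if current_genomes <= 0:
        raise ValueError("current_genomes must be positive")
    return current_pb * target_genomes / current_genomes


# Figures from the article: 25PB holds 100,000 genomes; the target is five million.
projected_pb = project_storage_pb(25, 100_000, 5_000_000)
print(f"{projected_pb:,.0f}PB")  # 1,250PB

# Implied average footprint per genome, in decimal gigabytes (1PB = 1,000,000GB).
per_genome_gb = 25 * 1_000_000 / 100_000
print(f"{per_genome_gb:,.0f}GB per genome")  # 250GB per genome
```

The implied average of roughly 250GB per genome covers everything held on the system, not just the sequence files themselves.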
Ardley said he likes WekaIO’s combination of flash for performance and object store for scale, with data tiered between the two.
WekaIO CEO Liran Zvibel said: “The Weka File System has delivered a 10x performance improvement over GEL’s legacy NFS-based NAS and is enabling more effective use of existing cloud infrastructure. [This will] improve overall productivity and empower researchers to become more efficient at analysing results.”
Genomics England Chief Scientist Professor Mark Caulfield said: “As the UK database expands to five million sequences and beyond, new insights will help to save many lives, both in the NHS and around the world.”