In 2003, when The New York Times announced that the human genome project was successfully complete two years ahead of its schedule (leave aside the conspiracy theory that the genome was never ‘completely’ sequenced), it heralded a new dawn in the history of modern science. The challenge thereafter was to make sense out of the staggering data that became available.
The High Throughput Sequencing technology came to revolutionize the processing of genomic data in a way, but had its own limitations (such as the high rate of erroneous base calls produced). Google has now launched an artificial intelligence tool, DeepVariant, to analyze the huge data resulting from the sequencing of the genome.
It took two years of research for Google to build DeepVariant. It’s a combined effort from Google’s Brain team, a group that focuses on developing and applying AI techniques, and Verily Life Sciences, another Alphabet subsidiary that is focused on the life sciences.
How the DeepVariant makes sense of your genome?
DeepVariant uses the latest deep learning techniques to turn high-throughput sequencing readouts into a picture of a full genome. It automatically identifies small insertion and deletion mutations and single-base-pair mutations in sequencing data.
Ever since the high-throughput sequencing made genome sequencing more accessible, the data produced has at best offered error-prone snapshot of a full genome. Researchers have found it challenging to distinguish small mutations from random errors generated during the sequencing process, especially in repetitive portions of a genome.
A number of tools and methods have come out to interpret these readouts (both public and private funded), but all of them have used simpler statistical and machine-learning approaches to identify mutations. Google claims DeepVariant offers significantly greater accuracy than all previous classical methods.
DeepVariant transforms the task of variant calling (the process to identify variants from sequence data) into an image classification problem well-suited to Google’s existing technology and expertise.
Google’s team collected millions of high-throughput reads and fully sequenced genomes from the Genome in a Bottle (GIAB) project, and fed the data to a deep-learning system that interpreted sequenced data with a high level of accuracy. “Using multiple replicates of GIAB reference genomes, we produced tens of millions of training examples in the form of multi-channel tensors encoding the HTS instrument data, and then trained a TensorFlow-based image classification model to identify the true genome sequence from the experimental data produced by the instruments.” Google said.
The result has been remarkable. Within a year, DeepVariant went on to win first place in the PrecisionFDA Truth Challenge, outperforming all state-of-the-art methods in accurate genetic sequencing. “Since then, we’ve further reduced the error rate by more than 50%,” the team claims.
Image Source: research.googleblog.com
“The success of DeepVariant is important because it demonstrates that in genomics, deep learning can be used to automatically train systems that perform better than complicated hand-engineered systems,” says Brendan Frey, CEO of Deep Genomics, one of the several companies using AI on genomics for potential drugs.
DeepVariant is ‘open’ for all
The best thing about DeepVariant is that it has been launched as an open source software. This will encourage enthusiastic researchers for collaboration and possibly accelerate its adoption to solve real world problems.
“To further this goal, we partnered with Google Cloud Platform (GCP) to deploy DeepVariant workflows on GCP, available today, in configurations optimized for low-cost and fast turnarounds using scalable GCP technologies like the Pipelines API,” Google said. This paired set of releases could facilitate a scalable, cloud-based solution to handle even the largest genomics datasets.
The road ahead: What DeepVariant means for future
According to Google, DeepVariant is the first of “what we hope will be many contributions that leverage Google’s computing infrastructure and Machine learning expertise” to better understand the genome and provide deep learning-based genomics tools to the community. This is, in fact, all part of a “broader goal” to apply Google technologies to healthcare and other scientific applications.
As AI starts to propel different branches of medicine take big leaps forward in coming years, there is a whole lot of medical data to mine and drive insights from. But with genomic medicine, the scale is huge. We are talking about an unprecedented set of data that is equally complex.
“For the first time in history, our ability to measure our biology, and even to act on it, has far surpassed our ability to understand it,” says Frey. “The only technology we have for interpreting and acting on these vast amounts of data is AI. That’s going to completely change the future of medicine.”
These are exciting times for medical research. In 1990, when the human genome project was initiated, it met with a lot of skepticism from many people, including scientists and non-scientists alike. But today, we have completely worked out each A, T, C, and G that makes up the DNA of all 23 pairs of human chromosomes. After high-throughput sequencing made the genomic data accessible, Google’s DeepVariant could just be the next big thing to take genetic sequencing to a whole new level.