Big Data Gets Personal with Genome Research

By Ami Gal

6.19.2014

“When people hear that data mining can keep a kid from going blind, they want to know more.” The quote comes from a Stanford Medical report on Big Data and the data algorithms now being used to drive medical advances.

Big Data Gets Personal

Did you know that Big Data and analytics are having a major impact on the healthcare sector? Genome research in particular is paving the way for Big Data in personalized medicine. Scientists working in genomics are using Big Data analytics to identify relevant data points that can potentially save lives, enable more personalized medical treatments and support earlier disease detection. So, how does all of this work?
Think of it like this: a genome is an individual's complete set of DNA, and many medical patterns can be revealed when the billions of base pairs in a single genome are sequenced. Sequencing a single genome generates hundreds of gigabytes of data. With over 1,000 sequencers already running 24/7, genomic research is generating more data every year. Analyzing such enormous data volumes requires state-of-the-art Big Data technology. Once analyzed, the genomic data can give researchers real medical insight.
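To give a sense of where "hundreds of gigabytes" per genome comes from, here is a rough back-of-the-envelope estimate in Python. Every parameter is an illustrative assumption, not a figure from this article: about 3.1 billion base pairs in one copy of the human genome, 30x average read coverage, two bytes per sequenced base in uncompressed FASTQ (one for the base call, one for its quality score) and a 10% allowance for read headers.

```python
# Back-of-the-envelope estimate of raw sequencing output for one human genome.
# All parameters below are illustrative assumptions, not measured figures.

HAPLOID_GENOME_BP = 3.1e9   # approx. base pairs in one copy of the human genome
COVERAGE = 30               # assumed average read depth (30x is a common target)
BYTES_PER_BASE = 2          # uncompressed FASTQ: one byte for the base, one for its quality score
HEADER_OVERHEAD = 1.1       # assumed ~10% extra for read names and separator lines


def raw_fastq_bytes(genome_bp: float, coverage: int,
                    bytes_per_base: int, overhead: float) -> float:
    """Estimate uncompressed FASTQ size in bytes for one sequenced genome."""
    bases_sequenced = genome_bp * coverage
    return bases_sequenced * bytes_per_base * overhead


def to_gigabytes(n_bytes: float) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 1e9


if __name__ == "__main__":
    size_gb = to_gigabytes(raw_fastq_bytes(HAPLOID_GENOME_BP, COVERAGE,
                                           BYTES_PER_BASE, HEADER_OVERHEAD))
    print(f"~{size_gb:.0f} GB of raw reads per genome")
```

Under these assumptions the estimate comes out at roughly 205 GB per genome, so about 5,000 genomes already add up to around a petabyte before any compression.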

Too Good To Be True?

A major reason scientists in genomics can analyze so much patient data is that genome sequencing has become much cheaper and faster than it used to be. DNA sequencing products are more readily available and affordable, and Big Data providers are now focused on building revolutionary databases capable of managing genomic data.
IBM's Watson, for example, is being used to help scientists at the New York Genome Center detect mutations in patients with glioblastoma, an aggressive brain cancer. Watson is meant to identify patterns in the sequencing data that point to personalized treatment options for each patient's cancer. SAP is also working with a variety of medical professionals and institutions to implement SAP HANA for Big Data analysis. Stanford, for example, recently signed a three-year contract with SAP to use HANA to analyze large volumes of data at the Stanford School of Medicine.

Technically Speaking…

While revolutionary, the solutions offered by major Big Data providers have a few drawbacks: they still tend to be expensive, and they cannot keep up with the volume of genomic data being produced. Many of these providers offer data platforms capable of analyzing “large volumes” of data at extreme speed, yet their databases cannot accommodate the petabytes of data generated by genome sequencing.
Most genome research is still done directly on files, and loading this information into a database makes the data easier to manage. Many database providers offer this benefit, but the sheer volume of genomic data often cannot fit into their databases all at once. This growing dilemma is slowing the adoption of Big Data in genome research. At SQream Technologies, however, we’ve come up with a simple solution to this problem.
The SQream Big Data database is fast, cost-effective and able to manage petabytes of genomic data. This is possible because SQream’s technology runs on GPUs rather than CPUs: the raw parallel processing power of GPUs lets us run queries and deliver insights up to 100X faster than CPU-based systems, at lower cost. Additionally, SQream DB stores all of its data on disk rather than requiring it to fit in memory, which keeps large-scale analysis fast and far less costly. Expensive storage arrays and multiple machines are not needed with our state-of-the-art technology.
SQream Technologies has now teamed up with INFINIDAT to provide biomedical researchers with a new, revolutionary genomic tool for Big Data analysis. The combination of INFINIDAT’s IZBox high-density NAS storage and SQream’s GPU-based analytics engine gives genomic researchers an easy-to-use solution to store, manage and analyze more data, more cost-effectively.

What Happens Next?

With the right technology, Big Data insights from genomic data can be used to detect a disease in its early stages or to pinpoint a specific treatment for someone suffering from a particular illness. For example, if genome data shows that an individual is at risk of developing diabetes, lifestyle changes such as better eating habits and regular exercise can be made in advance, before the disease develops or worsens. By helping determine which genes correspond to certain diseases, genome data could even help diagnose various types of cancer and, during treatment, show whether cells carrying “driver” mutations have been eliminated.
Medicine is also becoming much more personalized thanks to data analysis from genome sequencing. Data from something as simple as a single drop of blood may give scientists information that can predict whether an individual is healthy or diseased. This data can then be used to match patients with specific medicines that meet their exact needs.

Big Data “Saving Lives”

Bottom line: faster, more cost-effective Big Data solutions must be made accessible if genome research is to keep advancing. A SQL-based database, such as SQream’s, lets genomic researchers pinpoint the relevant data and then query it across millions of DNA samples. Running such a query is much easier, because there is no need to decompress and scan every file to find the specific region of interest. SQream DB offers much better value in both time and cost, delivering faster results that lead to better decisions in disease prediction and personalized medicine.
It’s probably safe to say that lives won’t be saved by Big Data insights alone, but those insights certainly bring medical professionals one step closer to extending life expectancy. As genomic data becomes more widely available, scientists will be able to predict diseases before they occur and determine which individuals are best suited to particular medicines and treatments.
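The difference between scanning compressed files and querying a database can be illustrated with a small Python sketch. It uses Python's built-in sqlite3 module purely as a stand-in for "an SQL-based database"; it is not SQream DB and does not show SQream's API. The table layout (`variants` with `sample_id`, `chrom`, `pos`, `ref`, `alt`), the helper function names and the toy records are all hypothetical and chosen only for illustration.

```python
import gzip
import sqlite3

# Hypothetical toy variant records: (sample_id, chrom, pos, ref, alt).
# Made-up rows for illustration only, not real genomic data.
RECORDS = [
    ("S1", "chr1", 1000, "T", "G"),
    ("S2", "chr1", 1000, "T", "G"),
    ("S2", "chr2", 5000, "C", "T"),
    ("S3", "chr1", 2500, "C", "A"),
]


def make_compressed_files(records):
    """Build one gzip-compressed, tab-separated 'file' per sample, held in memory."""
    per_sample = {}
    for sample_id, chrom, pos, ref, alt in records:
        per_sample.setdefault(sample_id, []).append(f"{chrom}\t{pos}\t{ref}\t{alt}\n")
    return {s: gzip.compress("".join(lines).encode()) for s, lines in per_sample.items()}


def scan_files(files, chrom, start, end):
    """File-based approach: every sample file must be decompressed and read in full."""
    hits = set()
    for sample_id, blob in files.items():
        for line in gzip.decompress(blob).decode().splitlines():
            c, pos, _ref, _alt = line.split("\t")
            if c == chrom and start <= int(pos) <= end:
                hits.add(sample_id)
    return hits


def load_database(records):
    """Database approach: load the records once into an indexed SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants (sample_id TEXT, chrom TEXT, pos INTEGER, ref TEXT, alt TEXT)"
    )
    conn.execute("CREATE INDEX idx_region ON variants (chrom, pos)")
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", records)
    return conn


def query_region(conn, chrom, start, end):
    """Ask only for the region of interest; the (chrom, pos) index lets the engine skip unrelated rows."""
    rows = conn.execute(
        "SELECT DISTINCT sample_id FROM variants WHERE chrom = ? AND pos BETWEEN ? AND ?",
        (chrom, start, end),
    )
    return {sample_id for (sample_id,) in rows}


if __name__ == "__main__":
    files = make_compressed_files(RECORDS)
    conn = load_database(RECORDS)
    print(sorted(scan_files(files, "chr1", 900, 1100)))
    print(sorted(query_region(conn, "chr1", 900, 1100)))
```

Both approaches return the same samples (S1 and S2 for the toy region), but the file scan has to decompress and read every sample file on every query, while the database pays the loading cost once and then answers region queries from the index. At the scale of millions of samples, that difference is what the paragraph above is pointing at.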