Data Scientists, Got a Clue about GPU? You Really Should…

By Gidon Ben-Zvi

3.31.2017

Python. R. GPU.
Wait… What?

While graphics processing units (GPUs) have long powered 3D games on computer displays, they have finally become potent enough to perform the high-end computations that are a staple of many engineering disciplines.

Many clever Big Data platform developers, desperate to store, process and analyze oceans of data, have sat up and taken notice.

Mining for Gold in a Sea of Data: Data Scientists and GPU

Today’s data scientist, the brains behind data interpretation, should get well acquainted with this promising new way to whittle massive amounts of data down into concise statistics for use in predictive and prescriptive modeling.

Need for Speed: GPUs, with their high-density parallel processing, have enabled deep learning computations and have come to dominate the field.
(Source: https://www.slideshare.net/VishalSingh405/cpu-vs-gpu-presentation-54700475)

After all, the sheer size of Big Data isn’t what is most impressive; it’s the gold mine of business insights it offers when analyzed.

This is where the data scientist enters the frame. One part statistician and one part software engineer, data scientists take massive amounts of data and turn them into valuable, actionable information.

GPU Databases: The New Lingua Franca?

Until recently, extracting insights and information from data meant having in-depth knowledge of tools like SAS, R or Python’s sklearn, as well as familiarity with data processing frameworks like Spark.

However, primarily due to the emergence of GPU computing, we now have far more power with less hardware required to run a query. Beyond boosting capacity by orders of magnitude, GPUs excel at the matrix operations that underpin backpropagation in neural networks. It’s this rise of neural networks in data science that is feeding the demand for smaller supercomputers, like GPU-enabled servers.
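To make that concrete, here’s a minimal sketch of the idea in Python using CuPy, a NumPy-compatible GPU array library. CuPy, the layer sizes and the variable names are our illustrative choices, not anything prescribed by a particular framework:

```python
# A dense layer's forward pass and its weight gradient are both plain
# matrix multiplies, exactly the operation GPUs parallelize well.
import cupy as cp  # assumes an NVIDIA GPU with CuPy installed

batch, n_in, n_out = 4096, 2048, 2048
x = cp.random.rand(batch, n_in, dtype=cp.float32)   # activations
w = cp.random.rand(n_in, n_out, dtype=cp.float32)   # weights

y = x @ w                  # forward pass, spread across thousands of cores

grad_y = cp.ones_like(y)   # stand-in for the gradient flowing back
grad_w = x.T @ grad_y      # backpropagation reuses the same primitive

print(cp.asnumpy(grad_w).shape)  # copy to host only when needed: (2048, 2048)
```

Because the forward pass and the gradients boil down to the same dense-matrix primitive, a GPU accelerates training end to end.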

The main difference between a GPU and a CPU lies in their cores: each CPU core has its own control unit, so a handful of powerful cores can execute different instructions on different data in parallel. A GPU, in contrast, has a massively parallel architecture consisting of thousands of smaller, more efficient cores designed to run the same operation across many data elements simultaneously.
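To see that execution model in action, here’s a hedged sketch using Numba’s CUDA support in Python. The article names no particular toolkit, so Numba and this toy scaling kernel are our assumptions:

```python
# Illustrative only: thousands of lightweight GPU threads all execute
# the same instructions, each on a different element of the array.
import numpy as np
from numba import cuda  # assumes an NVIDIA GPU and CUDA toolkit

@cuda.jit
def scale(out, arr, factor):
    i = cuda.grid(1)       # this thread's global index
    if i < arr.size:       # guard threads past the end of the data
        out[i] = arr[i] * factor

arr = np.arange(1_000_000, dtype=np.float32)
out = np.zeros_like(arr)

# Launch enough 256-thread blocks to cover every element; Numba copies
# the NumPy arrays to the device and back automatically.
threads_per_block = 256
blocks = (arr.size + threads_per_block - 1) // threads_per_block
scale[blocks, threads_per_block](out, arr, np.float32(2.0))
print(out[:3])  # [0. 2. 4.]
```

A CPU would chew through the million elements a few at a time; the GPU dispatches them to thousands of cores at once.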

In GPU computing, the GPU works alongside the CPU to accelerate scientific, analytics, engineering, consumer, and enterprise applications. According to one popular analogy, if the CPU is the brain, then the GPU is the soul of the computer.

What’s the Difference? While CPUs excel at executing a single calculation extremely quickly, GPUs excel at running many calculations at once.
(Source: http://www.e2matrix.com/blog/cpu-vs-gpu/)

Today’s GPUs pack up to three orders of magnitude more cores than a CPU. As a result, GPU computing can be considered the second wave of Big Data and Data Science.

Behind the Curtain: How GPUs Fuel Lightning-Fast Analysis

What’s the magic behind GPUs? A single GPU PCIe card may have up to 4992 processors. These processors are specifically designed to perform high-volume, high-velocity numerical computations on both fixed- and floating-point values.
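If you have a CUDA-capable card on hand, you can query those numbers yourself. This sketch uses Python’s Numba bindings, which are our choice of tool rather than something the article specifies:

```python
from numba import cuda  # assumes an NVIDIA GPU and driver are present

dev = cuda.get_current_device()
print("GPU:", dev.name)
print("Streaming multiprocessors (SMs):", dev.MULTIPROCESSOR_COUNT)
print("Compute capability:", dev.compute_capability)

# Total core count is SMs times cores-per-SM, which varies by
# architecture: e.g. 192 cores per SM on Kepler-class cards such as
# the Tesla K80, whose two on-board GPUs total the 4992 quoted above.
```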

Big Data vendors such as SQream Technologies are harnessing the power of GPUs for big data analytics SQL databases, gaining high performance while increasing cost efficiency. Imagine being able to run an analysis up to 100 times faster than you can today. SQream DB, with its patented technology that uses the GPU as a massively parallel processor to run complex SQL queries, is making this possible.
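For a feel of what querying such a database looks like from Python, here’s a rough sketch that assumes SQream’s pysqream connector exposes a standard DB-API-style interface; the connection details, table, and column names are all placeholders:

```python
import pysqream  # SQream's Python connector (installed separately)

# Placeholder connection details; substitute your own deployment's.
con = pysqream.connect(host="127.0.0.1", port=5000, database="analytics",
                       username="analyst", password="****")
cur = con.cursor()

# A typical analytical query: the GPU-backed engine parallelizes the
# scan and aggregation. The 'sales' table and its columns are hypothetical.
cur.execute("""
    SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM sales
    GROUP BY region
    ORDER BY revenue DESC
""")
for row in cur.fetchall():
    print(row)

cur.close()
con.close()
```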

Awesome Threesome: GPU, Data Scientists and Big Data 

With the demand for fast computing services on the rise and rapidly exceeding current capacity, data scientists are being called upon to hone their skills and contribute their unique talents to the development of GPU-accelerated computing. In a few short years, this powerful combination of demand, technology and data science could well result in the creation of a new class of computers that learn, see and perceive the world as humans do.

Meanwhile, developments in Big Data technology are being bolstered by businesses. Increasingly, firms require that data scientists communicate their findings with engaging, relevant stories that present problems and possible solutions in a clearer, more elegant manner.

Today is a particularly exciting time to be a data scientist, which may well be the reason that the Harvard Business Review called it the sexiest job of the 21st century.