By Arnon Shimoni
Put simply, a GPU database is a database, relational or non-relational, that uses a GPU (graphics processing unit) to perform some database operations. As a result, GPU databases are typically fast and geared towards analytics. The use of high-throughput devices like NVIDIA Tesla GPUs means that most GPU databases are more flexible in processing many different types of data, or much larger amounts of data. Let’s have a look at the two main types of GPU databases:
Some GPU databases are GPU-aware. GPU-aware databases offload some operations to the GPU, treating it like a co-processor. A similar approach was taken by Netezza TwinFin, which used FPGAs to calculate specific things. Examples of GPU-aware databases are IBM DB2 BLU and PG-Strom.
Most companies and individuals that have built a GPU database designed it around the capabilities of the GPU from the start. These GPU databases perform most operations on the GPU (“device”) and keep a small amount of work on the CPU (“host”). For example, the bulk of the relational operations, such as Project (π), Rename (ρ) and Join (⋈), are typically performed on the GPU. Let’s split the relational database category into two: in-memory and non-in-memory. The distinction comes down mostly to how the database architecture deals with data sets larger than physical memory.
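To make the three relational operators named above concrete, here is a minimal sketch of what each one computes. It is plain Python running on the CPU, not GPU code, and it does not represent how any particular GPU database implements these operators. The function names (`project`, `rename`, `join`) and the sample tables are hypothetical and exist only for illustration.

```python
# Illustrative CPU-only sketch of three relational operators.
# All names and data here are hypothetical; a GPU database would run
# column-oriented, massively parallel versions of these on the device.

def project(rows, columns):
    """Project (π): keep only the named columns of each row."""
    return [{c: row[c] for c in columns} for row in rows]

def rename(rows, mapping):
    """Rename (ρ): rename columns according to {old_name: new_name}."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]

def join(left, right, key):
    """Join (⋈): equi-join on a shared key column, using a hash table."""
    index = {}
    for r in right:  # build phase: hash the right-hand table by key
        index.setdefault(r[key], []).append(r)
    # probe phase: look up each left-hand row and merge matching rows
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

orders = [
    {"order_id": 1, "cust": 10, "total": 25.0},
    {"order_id": 2, "cust": 11, "total": 40.0},
    {"order_id": 3, "cust": 10, "total": 5.0},
]
customers = [{"id": 10, "name": "Ada"}, {"id": 11, "name": "Grace"}]

# Rename customers.id to cust so both tables share a key, join, then project.
customers_renamed = rename(customers, {"id": "cust"})
joined = join(orders, customers_renamed, "cust")
result = project(joined, ["order_id", "name"])
print(result)
```

The build and probe phases of the join apply the same logic to many rows independently, which is the kind of work that lends itself to a high-throughput device.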
Non-in-memory GPU databases can typically handle very large data sets. This category of databases is frequently used to analyze data sets larger than 10TB. Relational GPU databases use SQL as the main query language, but may have additional APIs for performing some operations through a variety of programming languages. Examples of these relational SQL GPU databases include SQream DB and Blazing DB.
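One general way to work with data larger than memory is to stream it in fixed-size chunks and keep only a small running result. The sketch below shows that idea with a grouped sum. It is an assumption-laden illustration in plain Python, not a description of how SQream DB or Blazing DB work internally. The names `read_chunks` and `chunked_sum_by_key` and the `chunk_size` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import islice

def read_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows from any iterable.

    Hypothetical stand-in for reading blocks from disk; only one chunk
    is materialized at a time.
    """
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

def chunked_sum_by_key(chunks):
    """Compute SUM(value) GROUP BY key, one chunk at a time.

    Only the running totals stay in memory, so the full data set never
    has to fit at once.
    """
    totals = defaultdict(float)
    for chunk in chunks:
        for key, value in chunk:
            totals[key] += value
    return dict(totals)

data = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0), ("b", 5.0)]
totals = chunked_sum_by_key(read_chunks(data, chunk_size=2))
print(totals)
```

In a real system the chunk size would be chosen to fit the available memory, and each chunk would be processed on the GPU before the partial results are combined.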
In-memory relational GPU databases are designed for fast response times. Consequently, these databases are designed for smaller data sets, typically around 1-10TB. Examples of these in-memory databases include MapD (RAM), Kinetica (RAM) and Brytlyt (GPU RAM).
Non-relational GPU databases can be graph databases or other NoSQL databases that do not use an underlying relational engine. These GPU databases are therefore designed for specialized purposes, like graph analytics. Examples of these include BlazeGraph.
Depending on your situation, different GPU databases could be right (or completely wrong!) for you. Refer to this article to find out which GPU database is right for you.