Why would you even build a database?

By Arnon Shimoni

12.13.2016

That’s crazy

We built the entire database from scratch
– You mean the plugins for the GPU?
No, we built the entire database
– But you have a parser from another database
We wrote that too.
– That’s crazy

This is a common conversation we have with prospects at SQream when we explain what we do. It seems illogical: there are so many databases around, so why would we build one from scratch? When I joined SQream about three years ago, I too did not understand why anyone would build a whole database rather than just an acceleration layer.

Let’s get parallel

Driven by the well-known paper “The Free Lunch Is Over” by Herb Sutter, the SQream co-founders decided to apply the new GPU technologies to building a database.

Applications will increasingly need to be concurrent

GPUs were originally designed for graphics processing and gaming. They weren’t bound by the same legacy that CPUs were, and were therefore free to innovate and invent new techniques to achieve high performance for games and rendering.
While engineers at IBM, Intel and AMD crammed more complex CPU cores together on the same die in what is known as SMP (symmetric multiprocessing), the engineers at NVIDIA created an array of simple processors, capable of performing the same operation many times over a huge amount of data.
In a previous blog post, I likened this to a coin-press:

Think of the GPU as a coin press machine, which can punch out 100 coins with one operation from a single sheet of metal, whereas a CPU is a coin press which can punch out 10 coins at a time from a strip of metal. While the CPU might have a faster ‘time between punches’, it also requires a faster feed rate of metal strips as well. This is the key difference between the GPU and CPU. The GPU is throughput oriented, while the CPU is latency oriented. 
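To put rough numbers on the analogy, here is a minimal Python sketch. The coins-per-punch figures (100 and 10) come from the quote above; the time between punches for each press is an assumption I made up purely for illustration, and the `Press` class is a hypothetical helper, not anything from SQream DB.

```python
import math
from dataclasses import dataclass


@dataclass
class Press:
    coins_per_punch: int            # how many coins one operation produces
    seconds_between_punches: float  # assumed value, for illustration only

    def throughput(self) -> float:
        """Coins per second once the press is running flat out."""
        return self.coins_per_punch / self.seconds_between_punches

    def time_for(self, coins: int) -> float:
        """Seconds needed to produce `coins` coins (whole punches only)."""
        punches = math.ceil(coins / self.coins_per_punch)
        return punches * self.seconds_between_punches


# Coins per punch follow the analogy; the punch intervals are made up.
cpu = Press(coins_per_punch=10, seconds_between_punches=1.0)
gpu = Press(coins_per_punch=100, seconds_between_punches=4.0)

# One coin: the press with the faster punch wins (latency).
print(cpu.time_for(1), gpu.time_for(1))

# Ten thousand coins: the press with the wider punch wins (throughput).
print(cpu.time_for(10_000), gpu.time_for(10_000))
```

With these assumed intervals, the narrow press finishes a single coin sooner, while the wide press finishes the large batch sooner. That is the latency versus throughput trade-off in miniature.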

 

[Figure: GPU large-scale SIMD compared with CPU small-scale SIMD]

So, we now understand that GPUs have proven applicable to a variety of industries: gaming (obviously), self-driving cars, virtual reality, machine (and deep) learning… But they were strangely absent from the database and analytics realm.
While building the SQream DB prototype, it became clear that you can’t take a regular database and make it GPU-aware. I mean, you could, but it’s not going to be fast. And you’ll likely have to start messing with bits that weren’t designed to be parallelized. It’s a little bit like strapping a rocket onto a bicycle. It’s technically possible, but it’s ugly and it will probably kill you.

The untapped market

There just aren’t any cost-effective solutions for analytics in the tens to hundreds of terabytes. Even classic, CPU based clustered solutions are very expensive. Sure, the base software might be free – but buying 500 computers, networking, power and the talent for setting these up – is not free.
Unlike programming for a GPU, some things just tend to ‘happen’ in parallel.
While the world produces exponentially growing amounts of data, we happened to develop a database that delivers the performance and agility needed for these huge data sets. So many game-changers, like Uber, Airbnb and Zillow, base their entire business model on data. These companies can benefit from a fast, flexible database that won’t break their budget.

It’s just so powerful

While building SQream DB we had many surprises. To be frank, we had no idea how fast the GPU could be:
We were surprised that the idea we had a few years earlier actually worked (a few very smart people told us that it would never work).
We keep getting surprised by just how fast it is. We also love seeing the reactions we get during our implementation projects (“No way – change these values and rerun the query! Change the JOIN condition, I don’t believe it! Show me the row-counts again?”).
Six years on, we now know what we didn’t know before: we have a winner. SQream DB has proven itself as an amazing GPU-accelerated database. Additionally, SQream DB has quite modest requirements to run. It can give pretty amazing results on a GPU-enabled laptop (albeit a heavy laptop). You can run SQream DB on your desktop PC and still get better results for complex queries than you would with almost any other database in existence.

Why should YOU use SQream DB?

Your new database stack


In a recent installation at an ad-tech company, getting the historical depth they wanted would have meant scaling out their in-memory database, which would have cost them millions of dollars. With ever-declining revenue, they opted to try out SQream DB instead. Not only was SQream DB as fast as their in-memory database, but it could handle more than 100TB, which their in-memory database couldn’t (at that price, anyway). The analysts were happy, IT were happy, and even the CFO was happy.
Because SQream DB plays well with tools like Tableau and Spotfire, and has a native Python connector as well as JDBC and ODBC, the uses are endless.
Some of our customers use SQream DB with custom machine learning code written in Python, others pull out data sets into their Java programs.
 

If scaling is an issue – too expensive or too complicated – take SQream DB for a test-drive.