By Arnon Shimoni
It's been almost five years since I joined SQream. I started as a Haskell developer, working on SQL features inside our seriously cool query compiler and optimizer. Since then, I've moved on to deploying SQream DB at a variety of companies, and to speaking at trade shows, meetups, and more. Through these engagements, I discovered that the features I worked on have a real, long-lasting effect on the product and our users. It's not that I didn't already know this; it's just very different seeing it with my own eyes. The topic of the GPU often comes up, but the fact is that SQream DB is an incredibly powerful RDBMS even with the GPU set aside. So, I thought it was about time I wrote about the coolest features I see in SQream DB today, and what they mean for database users like myself.
Full SQL support may seem trivial, but its significance becomes extremely apparent when you're forced to use non-SQL databases. As someone who came from using a bunch of different SQL databases, I found SQream DB a cinch to use. For more advanced analytics, SQream DB also has window functions, common table expressions, subqueries, regular expressions, and more. In most situations, moving queries over from an existing data warehouse to SQream DB is quite simple.
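To make that concrete, here's a small example combining a common table expression with a window function. The table and column names are hypothetical, invented purely for illustration, but the query itself is plain ANSI SQL of the kind that should run unchanged on SQream DB and most other SQL databases:

    -- Rank stores by revenue within each region, using a CTE and a window function.
    -- The "sales" table and its columns are hypothetical.
    WITH yearly_sales AS (
        SELECT region, store_id, SUM(amount) AS revenue
        FROM sales
        WHERE sale_date >= '2020-01-01'
        GROUP BY region, store_id
    )
    SELECT region,
           store_id,
           revenue,
           RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS revenue_rank
    FROM yearly_sales;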
Adaptive compression is quite an astounding feature, made possible by using a GPU. SQream DB automatically decides on a compression scheme for each 'chunk' of data that comes in: each column is split into GPU-manageable chunks, and each chunk is compressed according to its own data distribution. This adaptive compression saves anywhere between 65% and 85% of the space on disk, which in turn saves lots and lots of I/O; a scan that would otherwise read 1 TB touches only roughly 150-350 GB. By reading and writing less data, even meager hardware can go a long way. Because the compression is adaptive, it learns about your data: the more data that goes in, the better it becomes. Can your database do that, automatically, all of the time, on every column type?
Before using SQream DB, I mostly used MySQL, Postgres, SQL Server, and the occasional Oracle. The most tedious part of using those databases was the routine DBA work: working out the indexes and that kind of stuff. I'm by no means a DBA. The first thing that struck me when using SQream DB was the transparency of the optimizations. I didn't need to tell SQream DB anything, because it already knew. Our architect gave me a series of small sessions during my first few weeks, explaining how this transparency works and why SQream DB was designed this way. He called it "Load-and-Go": the system automatically and transparently collects lots of information about the raw data inside the database, as it comes in. SQream DB is designed for speed, so many elements are kept modular and simple. There is no explicit indexing, no explicit partitioning, no sharding, no projections, and no data preparation of any kind. The data is ready to be queried the moment it's ingested, and the compiler makes many informed decisions about subsequent queries even before they're executed. Eventually, I started writing my own optimizations. Today, SQream DB has upwards of 200 distinct compilation optimizations, which handle the most tedious DBA work. Having SQream DB do the optimizations is a bit like having an expert help you get the settings just right. There is still room for some DBA ingenuity in writing good queries, but SQream DB takes care of the performance in most of them.
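As an illustration of what Load-and-Go means in practice, here's a hypothetical session. The table name, file path, and bulk-load options are invented for illustration (the exact COPY syntax and options vary by SQream DB version, so check the documentation); the point is what's missing: no CREATE INDEX, no partitioning clause, no statistics job between loading and querying:

    -- Create a table: note the absence of index, partition, or distribution clauses.
    CREATE TABLE taxi_rides (
        ride_id     BIGINT,
        pickup_time DATETIME,
        fare        DOUBLE
    );

    -- Bulk-load raw data (illustrative path; exact COPY options vary by version).
    COPY taxi_rides FROM 'taxi_rides.csv';

    -- The data is immediately queryable: no index build, no ANALYZE step.
    SELECT COUNT(*) AS rides, AVG(fare) AS avg_fare
    FROM taxi_rides
    WHERE pickup_time >= '2020-01-01';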
I've actually written in the past about how SQream DB scales linearly, which is phenomenal in its own right. It's a pretty cool concept that's kind of hard to believe: a database that doesn't degrade, even though it's holding lots more data. This is part of the same Load-and-Go architecture, in which data is appended in a way that limits its effect on existing data and queries. Just a few weeks ago, Citihub Consulting released a whitepaper examining the performance of several databases in financial-industry tests. According to the authors, SQream DB actually scales sub-linearly as data sizes grow: when the data grew by a factor of six, queries ran only 4.3x slower, due in part to the excellent query compiler and optimizer. The report found that "SQream does not suffer from the query optimisation issues that we have seen elsewhere".
Fast joins on massive tables are, to me, hands-down SQream DB's #1 feature. But first, a short story… I was working on a university project that required me to join three tables, each with hundreds of millions of rows. After getting kicked out of the computer lab for crashing the Oracle cluster, and after the MySQL installation on my workstation made it clear the query would never finish, I decided to try something. I had just started working at SQream, and decided to get SQream DB (a version so young it didn't even have a version number) to run the same query: a three-table join, keyed on a VARCHAR(64) field. After 12 seconds, I had my results, on my desktop computer with a low-end NVIDIA GT 630 card. As a young developer, it was that moment that cemented my belief in the product; a sketch of the query's shape follows below. If you want to learn more about SQream DB, check out our SQream DB overview page, or read our architectural white paper to see what makes SQream DB tick.
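For the curious, here's a hypothetical reconstruction of that query's shape. The original schema is long gone, so every table and column name here is invented, but the structure is the same: three large tables joined on a VARCHAR(64) key:

    -- Illustrative three-table join on a VARCHAR(64) key (user_key);
    -- each table held hundreds of millions of rows.
    SELECT u.user_key, e.event_type, p.amount
    FROM users u
    JOIN events e   ON e.user_key = u.user_key
    JOIN payments p ON p.user_key = u.user_key;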