Handling hundreds of terabytes? Forget indexing

By Arnon Shimoni

4.8.2019

SQream DB is a columnar database designed for very large data sets. Our database engine has to manage hundreds of terabytes. As a result, we have to do things a bit differently than ordinary databases. In this blog post I will discuss two tightly connected concepts: chunking and zone-map metadata.

Chunking for hyper-partitioning the table

If we consider the columnar approach as a vertical partition, SQream DB adds another layer of horizontal partitioning, which we call ‘chunking’.
Roughly speaking, each ‘chunk’ holds some amount of data for part of a column.
All data in a SQream DB table is automatically split into chunks, each containing a few million data points. These chunks are then compressed, bringing each one down to just a few megabytes. This means that every group of rows in a table is divided into several chunks, with one set of chunks per column. But how do we know which chunks are relevant for a query? We’ll get to that shortly.
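
To make this concrete, here’s a rough sketch of the idea in Python. This is illustrative only, not SQream DB’s actual code; the chunk size and function names are assumptions.

```python
from typing import Iterator, List

CHUNK_SIZE = 1_000_000  # illustrative; real chunks hold "a few million" values

def chunk_column(values: List[int], chunk_size: int = CHUNK_SIZE) -> Iterator[List[int]]:
    """Split one column's values into fixed-size horizontal chunks."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]

# Every column of the table is chunked the same way, so row N of the table
# always lives in chunk N // CHUNK_SIZE of every column.
column = list(range(3_500_000))
chunks = list(chunk_column(column))
print([len(c) for c in chunks])  # [1000000, 1000000, 1000000, 500000]
```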

The SQream DB hyper-partitioned table is built up of columns, further split up into chunks

This method for horizontally partitioning the data provides several key benefits:

  • Unlike regular partitioning, SQream DB’s chunking is applied automatically across all data types and columns, requiring no intervention or maintenance
  • SQream DB chunks are small, which makes them efficient to transfer across the PCIe bus to the GPU for processing
  • SQream DB’s compression algorithms adapt more easily to different data, thanks to data locality. As your data changes, the automatic adaptive compression algorithms determine the best compression for each specific chunk (see the sketch below)
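
Here’s what adaptive, per-chunk compression can look like in miniature: try a few candidate codecs on each chunk and keep the smallest result. The candidate set and the pick-the-smallest rule are assumptions for illustration; SQream DB’s real compression schemes aren’t shown here.

```python
import zlib, bz2
from typing import Callable, Dict, Tuple

# Hypothetical candidate codecs; a real engine would also include schemes
# like run-length or delta encoding.
CODECS: Dict[str, Callable[[bytes], bytes]] = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
}

def compress_chunk(raw: bytes) -> Tuple[str, bytes]:
    """Pick whichever codec yields the smallest compressed chunk."""
    best_name, best_blob = "raw", raw  # fall back to storing uncompressed
    for name, compress in CODECS.items():
        blob = compress(raw)
        if len(blob) < len(best_blob):
            best_name, best_blob = name, blob
    return best_name, best_blob

# A highly repetitive chunk compresses extremely well:
name, blob = compress_chunk(b"2019-08-04" * 100_000)
print(name, len(blob))
```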

Metadata collection for zone-maps and data pruning

On ingest, SQream DB collects and stores metadata for all rows stored in a chunk. The most useful part of this metadata is the range of values (and other properties) of the data being ingested. This metadata is stored separately from the compressed chunk.
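
In sketch form, the idea looks like this: as each chunk is ingested, record its minimum and maximum in a catalog kept apart from the data. The names and types here are illustrative, not SQream DB’s internals.

```python
from typing import List, Tuple

ZoneMap = List[Tuple[int, int]]  # (min, max) per chunk, stored apart from the data

def build_zone_map(chunks: List[List[int]]) -> ZoneMap:
    """Collect min/max metadata for every chunk at ingest time."""
    return [(min(chunk), max(chunk)) for chunk in chunks]

chunks = [[3, 1, 4], [10, 12, 11], [25, 20, 22]]
print(build_zone_map(chunks))  # [(1, 4), (10, 12), (20, 25)]
```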
This feature traces back to the MonetDB/X100 project’s work, described in their paper “Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS”. The technique has also seen success in more mainstream databases like Oracle and Netezza, where it’s called a zone map, as well as in the BRIN indexing technique found in newer versions of Postgres.
The zone map method provides several key benefits:

  • Unlike standard indexing, the metadata collected on these chunks is automatic and transparent across all data types and columns, requiring no intervention or maintenance
  • The metadata collection is space-efficient compared to the column data itself, resulting in just around 1% storage overhead
  • Querying data becomes much faster, as the calculated zone-maps allow for efficient data pruning (also called data skipping), eliminating reads of irrelevant data
  • Older data can be easily deleted

When running a query, SQream DB looks at the metadata catalog, and excludes the chunks that the zone-map has identified as irrelevant for the query. Data that is out of the range of the query (for example, when filtering with a WHERE clause, or when joining on some keys) won’t be read from disk.
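
Continuing the toy zone map from above, the pruning step might look like this: given a WHERE-style range predicate, skip every chunk whose [min, max] range can’t contain a match. Again, this is an illustrative sketch, not SQream DB’s internals.

```python
from typing import List, Tuple

ZoneMap = List[Tuple[int, int]]

def chunks_to_read(zone_map: ZoneMap, lo: int, hi: int) -> List[int]:
    """Return indices of chunks whose [min, max] overlaps the predicate [lo, hi]."""
    return [i for i, (cmin, cmax) in enumerate(zone_map)
            if cmax >= lo and cmin <= hi]

zone_map = [(1, 4), (10, 12), (20, 25)]
# WHERE x BETWEEN 11 AND 21 -> only chunks 1 and 2 are read from disk
print(chunks_to_read(zone_map, 11, 21))  # [1, 2]
```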

“Why don’t you have indexes?”

SQream DB has fast parallel I/O, but even so, it helps a great deal if the database knows which data a query needs. SQream DB doesn’t support explicit indexes. At all. So, we are often asked – “why don’t you have indexes?”
Initially, we wanted to simplify usage and provide fast results with as little tweaking as possible. In fact, our research into this topic revealed early on that high-throughput analytic queries run dramatically slower when indexing isn’t done exactly right, so we decided to keep things as simple as possible for our users.
As we developed SQream DB further, it became apparent that the cost of maintaining indexes during bulk inserts and deletes is quite high, and that we’d have to maintain another, quite large, internal database just for these indexes.

Loading data with a typical RDBMS results in larger data on disk, due to indexes and other data preparation methods

“But you actually do have indexes, because your metadata zone-maps are the same thing”

Yes, and no. Indexes and our metadata zone-maps have one key thing in common: both collect information about the data in order to speed up queries.
However, SQream DB’s metadata zone-maps differ in that they are lightweight, requiring only around 1% overhead, which makes them far easier to store and keep in RAM.

Loading with SQream DB is straightforward, with just a lightweight metadata layer requiring less than 1% overhead

The zone-map system is not perfect. It’s a more brute-force approach, one that can cause SQream DB to scan more data than an index-based approach would. Rather than a perfect system like an index, it offers a more straightforward approach that works well most of the time.
More importantly, SQream DB captures data ranges for every chunk. Because every value is stored in some chunk, SQream DB can ignore any chunk whose recorded range falls entirely outside the query’s lower and upper bounds. The larger the chunk, the more data it holds, and the wider its range of values is likely to be, so this approach works best when data is clustered together or sorted.
In practice, this system works very well for real-world data in large quantities, largely because of the nature of timestamped data.

Timestamped data

SQream DB automatically collects metadata about all columns, but numerical data benefits the most. Most database tables have some sort of timestamp or date column, and this data usually arrives in roughly ascending order. As data is ingested into a SQream DB table, the system fills up chunks one after another. When a chunk fills up with rows for a certain date, the next chunk will contain the same or a later date.
For example, consider a query over a large historical data table spanning 10 years. If you wish to analyze a specific hour of the day, SQream DB will only scan, at most, 1/24th of the table. If you only wish to look at one hour during December 24th across 10 years, SQream DB will only scan 1/8760th of the table. By further selecting only the relevant columns, SQream DB can read a much smaller proportion of the table.
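
Here’s a toy demonstration of why ascending timestamps prune so well, reusing the sketches above: each chunk covers a narrow, mostly non-overlapping time range, so a one-hour filter eliminates nearly every chunk. The row counts and chunk size are made up for the example.

```python
from datetime import datetime, timedelta

CHUNK_SIZE = 10_000  # small toy chunks; real chunks hold millions of rows

# One row per second, ingested in ascending time order, for 30 days.
start = datetime(2019, 1, 1)
timestamps = [start + timedelta(seconds=i) for i in range(30 * 24 * 3600)]

# Zone map: (min, max) timestamp per chunk, collected on ingest. Because the
# data arrives sorted, min and max are simply the first and last values.
zone_map = [(chunk[0], chunk[-1])
            for chunk in (timestamps[i:i + CHUNK_SIZE]
                          for i in range(0, len(timestamps), CHUNK_SIZE))]

# WHERE ts BETWEEN '2019-01-15 09:00' AND '2019-01-15 10:00'
lo, hi = datetime(2019, 1, 15, 9), datetime(2019, 1, 15, 10)
survivors = [i for i, (cmin, cmax) in enumerate(zone_map)
             if cmax >= lo and cmin <= hi]

print(f"scanning {len(survivors)} of {len(zone_map)} chunks")  # 1 of 260
```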

SQream DB’s metadata zone-maps allow for pruning irrelevant data from a query

For time-series data, this level of pruning enables fast results for ranged queries even on very large tables, without the scourge of a big and heavy index.

To summarize

This blog post is meant to give you some understanding of how SQream DB manages large amounts of data in a straightforward way, with a small overhead that scales better for large data sets than traditional methods do.

  • SQream DB hyper-partitions tables both horizontally and vertically
  • SQream DB collects metadata for generating zone-maps automatically for every chunk
  • Zone-maps are much lighter-weight than indexes, but not as accurate
  • Joining or filtering on mostly sorted values like dates is likely to significantly accelerate your queries

We invite you to see for yourself how SQream DB functions for large-scale data. Click here to try SQream DB or learn more about our architecture.