By Ohad Shalev, 11.17.2022

SQream’s On-Cloud Performance with TPCx-BB 30TB Benchmark

How SQream’s data acceleration platform beats Snowflake, Google BigQuery and AWS Redshift at analyzing 30TB of data on cost, performance and Total Time to Insight

 

TPC Express Big Bench (TPCx-BB) is a benchmark developed to objectively compare Big Data Analytics System (BDAS) solutions. In September 2021, SQream’s big data analysts ran an internal field test derived from TPCx-BB to see how SQream performs in comparison to leading cloud analytics solutions (such as those from Amazon and Google). For more information regarding TPCx-BB, please see the official TPC website.

30TB is not enough? Check out our 300 TB performance benchmark


Platforms Analyzed

SQream (currently running only on private cloud), Google BigQuery, Amazon Redshift, Snowflake.

Scale Factor

Because SQream was designed to handle large datasets, we ran the benchmark with a scale factor of 30,000, which produces a dataset of roughly 30TB.

Hardware Used

The main consideration in configuring the hardware stack for each competing vendor was striking the right balance between cost and performance. We took into account each vendor’s recommendation for the chosen dataset size (30TB) and maintained an equal number of nodes for all participants.

| Platform | Environment | Configuration | Compute cost (per hour) | Storage cost (per TB) |
|---|---|---|---|---|
| Amazon Redshift | AWS | 8x ra3.4xlarge | $26.08 | $24 |
| Snowflake | AWS | Large | $16.00 | $40 (on-demand) |
| SQream | AWS | 8x g4dn.8xlarge | $17.40 | $23 |
| Google BigQuery | GCP | Flat-rate 400 slots | $16.00 | $20 |
| Snowflake | GCP | Large | $16.00 | $46 (on-demand) |
| SQream | GCP | 4x n1-standard-32 (with 2 additional GPUs each) | $16.88 | $20 |

Running the Field Test

After configuring the chosen cloud environment for the field test and generating the 30TB dataset, we were ready to begin. Of the 30 queries included in TPCx-BB, we tested only 18, reflecting the functionality that SQream’s platform supported as of September 2021: queries 5, 6, 7, 9, 11-17 and 20-26. While running these use cases, we focused on two metrics for comparison:

Performance:
  • Ingestion – time elapsed during the process of transporting the data from its source to the DB / DWH.
  • Query – time elapsed during the process of executing the 18 queries (using concurrent streams, aka ‘Throughput Test’).
  • Total Time To Insight (TTTI) – Ingestion + Query.
Cost:
  • Storage – the cost of storing the compressed data on the relevant cloud vendor service ($/TB).
  • Compute – the cost of resources used to ingest the raw data from its sources and complete the 18 queries ($/Hour).
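The query selection and the metrics above can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the field test: the `Run` class, its field names, and the example durations and compressed size are hypothetical, and it assumes compute cost is simply the hourly rate multiplied by the total hours spent on ingestion and queries. Only the query list and the SQream-on-AWS rates come from the text above.

```python
from dataclasses import dataclass


def expand_query_ranges(spec: str) -> list[int]:
    """Expand a spec such as "5, 6, 11-17" into a sorted list of query numbers."""
    numbers: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(p) for p in part.split("-"))
            numbers.update(range(start, end + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)


@dataclass
class Run:
    """One platform's measurements (hypothetical structure for illustration)."""
    ingestion_hours: float
    query_hours: float
    compute_rate_per_hour: float  # $/hour, as listed in the hardware table
    storage_rate_per_tb: float    # $/TB, as listed in the hardware table
    stored_tb: float              # compressed size held on cloud storage

    @property
    def ttti_hours(self) -> float:
        # Total Time To Insight = Ingestion + Query
        return self.ingestion_hours + self.query_hours

    @property
    def compute_cost(self) -> float:
        # Assumption: compute is billed for the whole ingestion + query window
        return self.ttti_hours * self.compute_rate_per_hour

    @property
    def storage_cost(self) -> float:
        return self.stored_tb * self.storage_rate_per_tb


# The 18 tested queries, as listed in the article
QUERIES = expand_query_ranges("5, 6, 7, 9, 11-17, 20-26")
print(len(QUERIES))  # 18

# Rates are SQream on AWS from the table; durations and stored_tb are
# made-up placeholders, NOT measured results.
example = Run(
    ingestion_hours=2.0,
    query_hours=1.0,
    compute_rate_per_hour=17.40,
    storage_rate_per_tb=23.0,
    stored_tb=10.0,
)
print(f"TTTI: {example.ttti_hours:.1f} h")
print(f"Compute: ${example.compute_cost:.2f}, storage: ${example.storage_cost:.2f}")
```

Keeping time and cost on the same record makes it easy to compare platforms on both axes at once, which is how the results are presented.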

The Results

The following chart shows the overall performance of each platform for the given workload, in terms of total time for Ingestion and Query in the TPCx-BB field test:

TPCx-BB 30TB Benchmark – Performance HH:MM (lower is better)

The results revealed several performance differentiators between the competing products. Overall, in both cloud environments, SQream delivered the best TTTI, between 1.5X and 9.5X faster than the other platforms. For the average execution time of the 18 queries, SQream was between 1.7X and 4.6X faster (212 seconds on AWS and 197 seconds on GCP). Even when segmenting the results into more specific use cases or data types, SQream maintained its advantage:

Query time performance (MM:SS) – per data type (lower is better)

Query time performance (MM:SS) – per use case (lower is better)
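The "NX faster" figures quoted above can be read as ratios of elapsed times. The sketch below assumes that reading (other platform's time divided by SQream's time). The competitor value is a made-up placeholder used only to show the arithmetic; only the 212-second AWS average comes from the text.

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


SQREAM_AWS_AVG_SECONDS = 212.0           # average query time quoted in the article
hypothetical_competitor_seconds = 424.0  # placeholder, NOT a measured result

factor = speedup(hypothetical_competitor_seconds, SQREAM_AWS_AVG_SECONDS)
print(f"{factor:.1f}X faster")  # 2.0X faster
```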

Machines with GPUs, which SQream relies on, usually carry a much higher compute cost. Even so, SQream’s performance during the field test made it the most cost-effective option:

TPCx-BB 30TB Benchmark – Cost (lower is better)