SQream Blue - Data Preparation Lakehouse

SQream Blue is a SQL data lakehouse that lets organizations transform and query complex, multi-terabyte-scale datasets to gain deeper, time-sensitive insights at half the cost and twice the speed of cloud warehouse and query engine solutions.

Go faster with SQream Blue

SQream Blue is a cloud-native, fully managed data lakehouse built for fast, reliable, and cost-effective data processing using a patented GPU-acceleration engine. The platform enables easy data preparation and transformation from and to the data lake, for faster analytics and AI/ML.

Data Preparation

Data preparation transforms raw data to make it ready for analytics (BI / ML), as part of a Medallion Architecture design pattern. It may involve denormalization, pre-aggregation, feature generation, data enrichment, or validation.
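
As a rough illustration of three of the steps named above (validation, denormalization, and pre-aggregation), here is a minimal Python sketch. The records, field names, and the bronze/silver/gold variable names are hypothetical and chosen only for this example; they do not describe how SQream Blue performs these steps.

```python
from collections import defaultdict

# Hypothetical raw ("bronze") records; names and values are illustrative only.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 120.0},
    {"order_id": 2, "customer_id": 11, "amount": 80.0},
    {"order_id": 3, "customer_id": 10, "amount": 50.0},
    {"order_id": 4, "customer_id": 12, "amount": None},  # fails validation
]
customers = {
    10: {"name": "Acme", "region": "EMEA"},
    11: {"name": "Globex", "region": "APAC"},
}


def validate(order):
    """Validation: keep orders with a known customer and a non-negative amount."""
    return (
        order["customer_id"] in customers
        and order["amount"] is not None
        and order["amount"] >= 0
    )


def denormalize(order):
    """Denormalization: copy customer attributes onto the order row."""
    return {**order, **customers[order["customer_id"]]}


def revenue_by_region(rows):
    """Pre-aggregation: total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = [denormalize(o) for o in orders if validate(o)]  # cleaned, widened rows
gold = revenue_by_region(silver)  # analytics-ready summary
print(gold)
```

The summary table is what a BI tool would typically query, so the join and the aggregation are paid for once during preparation rather than on every dashboard refresh.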

Query Engine

Analyze data stored in open-standard formats (ORC, Avro, Parquet, JSON) on cloud storage (data lake) with SQream Blue’s UI or with your favorite BI tool connected to Blue’s processing engine.

Product highlights

GPU processing engine

Blue’s performance relies on patented GPU acceleration, synchronizing all available resources (CPU, GPU, RAM) and using the brute force of the GPU for the most complex analytical tasks. Blue uses the GPU to process data in parallel. By splitting large tasks into smaller processes, SQream distributes operations across multiple GPU cores, while allowing admins to balance parallelism and concurrency according to their business needs.
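
The general idea of splitting a large task into smaller pieces and combining partial results can be sketched in a few lines of Python. This is only a CPU-based analogy using a thread pool, not GPU code and not SQream's engine; the function names and the max_workers knob are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Split one large task into smaller, independent pieces."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def parallel_sum(values, chunk_size=1000, max_workers=4):
    """Sum each chunk on its own worker, then combine the partial results.

    max_workers is a hypothetical knob standing in for the degree of parallelism.
    """
    chunks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


data = list(range(10_000))
total = parallel_sum(data)
print(total)
```

Giving a single task more workers speeds up that task, while leaving workers free lets other tasks run at the same time; that is the kind of trade-off between parallelism and concurrency the paragraph above refers to.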

Architecture

Blue doesn’t require ingestion or data movement and relies on direct access to data in open-standard formats. Throughout the entire data preparation cycle, all data remains in the customer’s low-cost cloud storage, maintaining privacy and ownership while preserving a single source of truth and eliminating the need for data duplication.

Connectivity

Blue easily integrates with common open-source workflow management and orchestration tools (Apache Airflow, Dagster, Prefect), along with support for industry-standard ODBC, JDBC, and Python connectors. Moreover, Blue’s cluster management has a REST API.

Columnar Optimization

Blue’s processing engine utilizes Apache Parquet’s column-oriented structure and metadata to avoid reading unnecessary data.
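
To show why a column-oriented layout with metadata saves reads, the following Python sketch models a columnar file as row groups with per-column min/max statistics, which is the kind of metadata Parquet keeps. The data structure and the scan function are a hypothetical in-memory stand-in, not Parquet's actual API and not Blue's implementation.

```python
# Hypothetical in-memory stand-in for a columnar file: each row group stores
# its columns separately, plus min/max statistics per column.
row_groups = [
    {
        "columns": {"year": [2019, 2020], "amount": [10.0, 20.0], "note": ["a", "b"]},
        "stats": {"year": (2019, 2020)},
    },
    {
        "columns": {"year": [2021, 2022], "amount": [30.0, 40.0], "note": ["c", "d"]},
        "stats": {"year": (2021, 2022)},
    },
]


def scan(groups, wanted_columns, filter_column, min_value):
    """Return rows where filter_column >= min_value, reading as little as possible."""
    rows, groups_read = [], 0
    for group in groups:
        _low, high = group["stats"][filter_column]
        if high < min_value:
            continue  # statistics prove no row can match: skip the whole group
        groups_read += 1
        filter_values = group["columns"][filter_column]
        # Column pruning: only the requested columns are touched.
        selected = {name: group["columns"][name] for name in wanted_columns}
        for i, value in enumerate(filter_values):
            if value >= min_value:
                rows.append({name: selected[name][i] for name in wanted_columns})
    return rows, groups_read


rows, groups_read = scan(row_groups, ["amount"], "year", 2021)
print(rows, groups_read)
```

In this sketch the first row group is skipped entirely based on its statistics, and the "note" column is never read because the query did not ask for it.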

SQream Blue - seamless integration into existing infrastructure

Watch Yaniv Leven, VP of Market Strategy at SQream, talk about the SQream Blue offering and the challenges of working with data in the cloud.

FAQ

What cloud platforms will Blue support?

The first cloud platform supported by Blue is GCP, while AWS, OCI and Azure will follow in the future.

What is SQream Blue’s cloud deployment model?

SQream Blue is a Software-as-a-Service (SaaS) product.

Can customers sign up right away and start using Blue?

During its open beta period, customers will be able to register for SQream Blue on the GCP Marketplace. A sales representative will reach out, evaluate the opportunity, and approve the creation of a dedicated environment.

Does SQream Blue offer a trial version?

While SQream Blue is still in open beta, we will not be offering a free tier or a trial version. Note that since the product works on a pay-as-you-go model and no subscription fee is required, customers can easily try the product on their own before they commit to working with it.

Is SQream Blue a Data Warehouse product?

SQream Blue is a Data Lakehouse, meaning it offers customers a Data Warehouse experience without having to move their data outside of their data lake. Data is stored exclusively in the customer’s own Google Cloud Storage using open formats (such as Parquet, Avro, CSV, JSON).

Is SQream Blue an ETL product?

Unlike Data Integration or ETL tools, SQream Blue can’t be used for consolidating data from various sources into one place. An ETL product and a Data Lakehouse take different approaches to transformation tasks. While ETL products conduct transformations as a stage in the Data Integration process, Data Lakehouses perform transformations from and into the data lake, without copying the data anywhere else.

What SQL functionality will be supported by SQream Blue?

SQream Blue will support querying external tables using the exact same capabilities supported by SQreamDB - including JOINs, aggregations, and window functions. In terms of data definition and manipulation commands (DDL & DML) - Blue will initially support creating external tables, inserting data into existing ones, truncating, and dropping. More statement types (e.g. DELETE and UPDATE) will be added in the future.
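
For readers less familiar with the query features named above, here is a short Python sketch of a JOIN with an aggregation and of a window function. It runs against SQLite from Python's standard library purely as a stand-in: SQLite has no external tables, the table names and data are hypothetical, and SQream Blue's exact SQL syntax may differ.

```python
import sqlite3

# SQLite stands in for the query engine here; table names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (10, 'EMEA'), (11, 'APAC');
    INSERT INTO orders VALUES (1, 10, 120.0), (2, 11, 80.0), (3, 10, 50.0);
""")

# JOIN + aggregation: total order amount per region.
totals = conn.execute("""
    SELECT c.region, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

# Window function: running total of order amounts within each customer.
running = conn.execute("""
    SELECT order_id,
           SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_id) AS running_total
    FROM orders
    ORDER BY order_id
""").fetchall()

conn.close()
print(totals)
print(running)
```

The window function keeps one output row per order while still computing a per-customer running total, which is the main difference from the GROUP BY aggregation in the first query.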

What are the file formats supported by SQream Blue?

SQream Blue supports the following file formats: CSV, Apache ORC, Apache Avro, Apache Parquet, and JSON. Blue’s patented processing engine is optimized for reading from and writing to Apache Parquet files.

What are the table formats supported by SQream Blue?

While still in open beta, Blue won’t support any open table format. Upon reaching General Availability, it will support Apache Iceberg during 2023 and Delta Lake during 2024.

Can customers store data in SQream Blue?

SQream Blue is not a database or a data warehouse, so it has no internal storage capabilities. With direct access to data stored in data lakes, Blue’s customers can avoid data duplication and transform their data on the fly.

Can SQream Blue access data stored in other products/databases?

SQream Blue can access data stored in data lakes (aka cloud object storage). While in open beta, Blue will be able to access data residing on Google Cloud Storage (GCS); S3 (AWS) and ADLS (Azure) will follow.

What billing model will SQream Blue use?

SQream Blue uses a per-usage billing model, in which customers pay only for the compute power they actually use, through a credit-based system (SQream GPU Units, SGU) with a per-minute credit rate based on the compute cluster size. Unused clusters automatically shut down and enter suspended mode in order to cut unnecessary costs. At the end of the billing period, the total amount of credits is converted to USD in order to create an invoice.
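
The arithmetic behind such a model can be sketched as follows. Every number here is a hypothetical placeholder: the cluster size names, the per-minute credit rates, and the USD value of a credit are assumptions made for the example, not published SQream pricing.

```python
# All rates below are hypothetical placeholders, not published SQream pricing.
CREDITS_PER_MINUTE = {"small": 1.0, "medium": 2.0, "large": 4.0}  # SGU per minute
USD_PER_CREDIT = 0.50


def credits_used(cluster_size, active_minutes):
    """Credits consumed: per-minute rate for the cluster size times active minutes.

    Suspended time is excluded by passing only the minutes the cluster was running.
    """
    return CREDITS_PER_MINUTE[cluster_size] * active_minutes


def invoice_usd(usage):
    """Convert the billing period's total credits to USD."""
    total_credits = sum(credits_used(size, minutes) for size, minutes in usage)
    return total_credits * USD_PER_CREDIT


usage = [("small", 90), ("large", 30)]  # (cluster size, active minutes)
print(invoice_usd(usage))
```

With these placeholder rates, 90 minutes on a small cluster and 30 minutes on a large one come to 210 credits, so a larger cluster that finishes sooner is not necessarily more expensive than a smaller one that runs longer.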
