SQream Platform
GPU Powered Data & Analytics Acceleration
Enterprise (Private Deployment) SQL on GPU for Large & Complex Queries
Public Cloud (GCP, AWS) GPU Powered Data Lakehouse
No Code Data Solution for Small & Medium Business
Scale your ML and AI with Production-Sized Models
By Ohad Shalev
Great news for database (DB) developers! A recent contribution to the Apache Calcite open-source project by Itiel Sadeh, a talented developer at SQream, brings a valuable new function to Calcite’s window function capabilities: “exclude”.
For those unfamiliar, Apache Calcite serves as the foundation for SQream’s next-generation compiler. This open-source framework acts as a behind-the-scenes engine powering many data systems. Calcite’s core function is parsing and optimizing SQL queries, ensuring efficient execution across various data sources, including databases, warehouses, and even real-time streams. This versatility makes Calcite a powerful tool for building modern data management solutions.
One of Calcite’s key features is its cost optimization engine, which empowers SQream’s GPU-based analytical engine for high-performance query execution.
While integrating Calcite into SQream’s compiler, our team identified a gap in functionality: the absence of an “exclude” option within window functions.
Window functions operate on a set of rows defined by a window clause. This clause specifies which rows to consider relative to the current row being processed. Common use cases include moving averages, cumulative sums, and ranking.
The “exclude” option allows for more refined window frames by selectively removing certain rows from the calculation. For instance, you might want to calculate a moving average that excludes the current row itself. This is particularly useful for avoiding self-referential bias in calculations.
Itiel’s contribution of the “exclude” option significantly enhances the versatility of window functions within Apache Calcite. Users can now achieve a wider range of calculations without needing to resort to workarounds or manipulating the data before applying the window function.
Let’s say you have sales data and you want to calculate the average sale amount for the previous 3 days excluding the current day’s sale. Here’s how you can achieve this:
{ SELECT sale_date, amount, AVG(amount) OVER ( ORDER BY sale_date ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING EXCLUDE CURRENT ROW ) AS moving_avg_excluding_current FROM sales ORDER BY sale_date; }
{
SELECT
sale_date,
amount,
AVG(amount) OVER (
ORDER BY sale_date
ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING
EXCLUDE CURRENT ROW
) AS moving_avg_excluding_current
FROM sales
ORDER BY sale_date;
}
In this example, the moving average is calculated over the past 3 days excluding the current day. This helps in scenarios where the current day’s sale might not be finalized or you want a clearer view of the trend without the current data point.
The EXCLUDE CURRENT ROW clause in SQL window functions provides more flexibility in defining window frames, allowing you to exclude the current row from calculations. This is useful for moving averages, cumulative sums, and other analytic functions where the influence of the current row needs to be omitted.