Understanding Photon Acceleration in Databricks

By Allison Foster

12.2.2024

Acceleration is thrilling. Whether you’re watching F1, NASCAR, or even the latest SpaceX launch, seeing acceleration in action is a powerful experience. When it comes to querying data, acceleration is no less important: it can be the difference between a slow-to-react organization that’s ultimately left behind and an agile, ever-improving one that is focused on being tomorrow’s leader.

For many organizations, acceleration comes in the form of Photon acceleration in Databricks.

We’ll explore this functionality, its key features and benefits, use cases, tips and more. If you have any questions about Photon acceleration in Databricks, this should answer them all. 

What is Photon Acceleration in Databricks?

Databricks originally announced Photon acceleration in 2021. Photon is a high-performance, Databricks-native vectorized query engine that runs SQL workloads and DataFrame API calls faster while reducing the overall cost per workload.

Essentially, Photon acceleration in Databricks handles complex queries on massive datasets and offers faster time-to-insight than running the same workloads on Databricks without Photon.

Key Features and Benefits of Photon Acceleration

Photon acceleration in Databricks offers a number of key features and benefits. These include:

  • Compatibility: Photon acceleration from Databricks offers support for SQL and equivalent DataFrame operations with Delta and Parquet tables
  • Speed: Faster query processing, including for queries with aggregations and joins
  • Caching: Enhanced performance when data is accessed repeatedly from the disk cache
  • Scalability: Strong scan performance on tables with many columns and many small files
  • Flexibility: Quicker Delta and Parquet writing, using UPDATE, DELETE, MERGE INTO, INSERT, and CREATE TABLE AS SELECT, including wide tables that contain thousands of columns
  • Optimization: Replaces sort-merge joins with hash-joins
  • Performance: For ML and AI workloads, Photon improves performance for applications using Spark SQL, Spark DataFrames, feature engineering, GraphFrames, and xgboost4j
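
To make the hash-join item above more concrete, here is a minimal Python sketch of a build-and-probe hash join. It illustrates the general technique only and is not Photon's implementation; the function name `hash_join` and the sample tables are hypothetical.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, key):
    """Inner-join two lists of dicts on `key` using a build/probe hash join."""
    # Build phase: index the (ideally smaller) input by join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)

    # Probe phase: look up each row of the other input in the hash table.
    joined = []
    for row in probe_rows:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined

# Hypothetical sample data
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]
orders = [{"id": 1, "total": 30}, {"id": 1, "total": 12}, {"id": 3, "total": 7}]

result = hash_join(customers, orders, key="id")
print(result)
```

A sort-merge join would first sort both inputs on the join key and then walk them in step. The hash join skips the sort and instead pays for building an in-memory table on one side, which is often the cheaper trade when that side fits in memory.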

How Photon Acceleration Enhances Data Processing

Photon acceleration in Databricks optimizes query execution end to end, from parsing through to the generation of results. It uses vectorized processing, which operates on batches of rows at once rather than one row at a time, and this speeds up execution.
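
As a rough illustration of the idea, the Python sketch below computes the same result two ways: once by interpreting an expression row by row, and once by applying it to batches of column values. This is a conceptual toy under our own assumptions (the table, column names, and batch size are hypothetical) and not how Photon is actually implemented. In pure Python the batched version is not faster; the point is only the shape of the computation, which a native engine can run as tight loops over contiguous column data.

```python
from array import array

# Hypothetical toy table stored row-wise: one dict per row.
rows = [
    {"price": 10.0, "qty": 2},
    {"price": 4.5, "qty": 4},
    {"price": 3.0, "qty": 1},
]

def revenue_row_at_a_time(rows):
    """Evaluate price * qty once per row, looking up fields each time."""
    total = 0.0
    for row in rows:
        total += row["price"] * row["qty"]
    return total

# The same table stored column-wise: one typed, contiguous array per column.
price = array("d", [10.0, 4.5, 3.0])
qty = array("q", [2, 4, 1])

def revenue_batched(price, qty, batch_size=2):
    """Evaluate price * qty over one batch of column values at a time."""
    total = 0.0
    for start in range(0, len(price), batch_size):
        p = price[start:start + batch_size]
        q = qty[start:start + batch_size]
        total += sum(a * b for a, b in zip(p, q))
    return total

print(revenue_row_at_a_time(rows), revenue_batched(price, qty))
```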

Photon is therefore a strong choice for compute-heavy workloads, including joins and aggregations. Its ability to process vast datasets efficiently also reduces infrastructure costs by completing tasks faster and with fewer resources. This means that the total cost of ownership can drop over time, even after accounting for the higher price of Photon-enabled compute.

Comparing Photon Acceleration to Other Databricks Features

Photon acceleration is one of several performance features offered by Databricks. Here’s how it compares with two related options.

  • Photon vs. standard Spark: Photon provides significantly better performance, especially for SQL-heavy workloads. Benchmark data shows that Photon can outperform traditional Spark processing by up to 10 times in specific scenarios.
  • Photon vs. Delta Engine: Both Photon and Delta Engine boost performance, though Photon’s C++ foundation and vectorized execution offer additional optimizations, especially for analytical queries.

Photon Acceleration Use Cases

Photon acceleration suits a wide variety of workloads. Common ones include:

  • Data warehousing: Photon accelerates analytical queries, making it well suited to data warehouse workloads
  • ETL processes: By improving the speed and efficiency of ETL operations, Photon enables faster data pipeline execution
  • Machine learning model preparation: Photon helps prepare large datasets for ML training, reducing preprocessing time

Now let’s explore some specific use cases for Photon acceleration with Databricks:

Retail

In retail, Photon acceleration can streamline data analysis for tasks such as personalized promotions. By processing large datasets of customer purchase histories and preferences, Photon enables advanced segmentation and faster deployment of targeted marketing campaigns. Retailers can quickly identify high-value customers, predict product preferences, and optimize inventory placement for seasonal trends.

Telecom

In the telecom sector, Photon acceleration can dramatically enhance the analysis of network performance data. By rapidly processing terabytes of operational logs and usage metrics, telecom providers can uncover patterns in network traffic, predict maintenance needs, and optimize resource allocation. 

Manufacturing

Photon acceleration in manufacturing can speed up quality control processes by enabling faster analysis of sensor data from production lines. Manufacturers can use this capability to quickly identify defects, optimize production efficiency, and reduce waste.

Almost any modern organization can benefit from acceleration when it comes to querying data. Typically, Photon acceleration in Databricks is used for CPU-heavy operations, with a claimed speedup of 3 times over the regular Databricks Runtime.

FAQ

What is the cost of using Photon Acceleration in Databricks?

Photon is available in select Databricks pricing tiers. The cost varies by cloud provider and Databricks plan, but DBUs with Photon can be up to twice as expensive as non-Photon DBUs. 
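
A quick way to reason about that trade-off is to compare the price premium with the speedup. The sketch below is back-of-the-envelope arithmetic under simplifying assumptions of our own: the cluster size stays the same, DBU consumption scales with runtime, and only DBU charges are counted. The function name `relative_dbu_cost` is hypothetical, and the 2x multiplier is the upper bound quoted above rather than a published price.

```python
def relative_dbu_cost(speedup, dbu_price_multiplier):
    """Photon DBU cost as a fraction of the non-Photon DBU cost for one job."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return dbu_price_multiplier / speedup

# Assumed 2x price premium, with a range of hypothetical speedups.
for speedup in (1.5, 2.0, 3.0):
    ratio = relative_dbu_cost(speedup, dbu_price_multiplier=2.0)
    print(f"{speedup:.1f}x faster -> {ratio:.2f}x the DBU cost")
```

Under these assumptions, a job has to run more than twice as fast before a 2x premium pays for itself in DBUs alone. Cloud instance charges are billed separately and shrink with runtime, so the real break-even point may be lower.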

How can I enable Photon Acceleration in Databricks?

To enable Photon, navigate to your cluster settings in Databricks, ensure it is running Databricks Runtime with Photon, and toggle the Photon setting.
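
If you create clusters programmatically rather than through the UI, the Databricks Clusters API accepts a `runtime_engine` field that can be set to `PHOTON`. The Python sketch below only builds and prints such a payload; it does not call the API. The helper name `photon_cluster_spec`, the cluster name, the runtime version, and the node type are placeholders we chose for illustration, so check what your own workspace and cloud provider offer.

```python
import json

def photon_cluster_spec(cluster_name, spark_version, node_type_id, num_workers):
    """Build a cluster-creation payload that requests the Photon engine."""
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "runtime_engine": "PHOTON",  # "STANDARD" requests the non-Photon engine
    }

spec = photon_cluster_spec(
    cluster_name="photon-demo",        # hypothetical name
    spark_version="15.4.x-scala2.12",  # placeholder; use a version your workspace lists
    node_type_id="i3.xlarge",          # placeholder; instance types vary by cloud
    num_workers=2,
)
print(json.dumps(spec, indent=2))
```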

Are there specific hardware requirements for Photon?

Photon works best on modern hardware optimized for vectorized instructions. Using Databricks with Photon on newer compute instances can maximize performance.

Does Photon support all file types in Databricks?

Photon supports widely used formats such as Parquet, ORC, Delta, JSON, and CSV.

Can Photon Acceleration improve ML model training speeds?

While Photon is primarily optimized for SQL workloads, it can accelerate the preprocessing and feature engineering stages of machine learning, indirectly benefiting model training. Per Databricks, “The query performance of Photon and the pre-built AI infrastructure of Databricks ML Runtime make it faster and easier to build machine learning models. Starting from Databricks Machine Learning Runtime 15.2 and above, users can create an ML Runtime cluster with Photon by selecting ‘Use Photon Acceleration’. Meanwhile, the native Spark version of point-in-time join comes with ML Runtime 15.4 LTS and above.”

Meet SQream: Industry-leading GPU Accelerated Data Processing

When it comes to accelerated data processing, SQream is a clear leader, consistently demonstrating performance and cost efficiency that far exceed those of other solutions.

Leveraging its patented GPU parallelism and advanced compression technologies, SQream empowers organizations to accelerate complex queries and gain powerful insights, even from petabytes of data. 

Its architecture ensures data is accessed directly in open-standard formats from low-cost cloud storage, avoiding the overhead of ingestion and duplication while maintaining a single source of truth.

SQream’s ability to handle dynamic workloads and optimize resource allocation means real benefits for users: from unbelievable results to significant cost savings, even in the most demanding scenarios.

Whether it’s analytics, machine learning, or other big data processing tasks, SQream offers unbeatable value and consistent ROI.

To see how you can benefit from the SQream solution, set up a call here.

Summary

Photon acceleration in Databricks is helpful for organizations looking to optimize their analytics and data processing workflows. 

Its ability to accelerate SQL workloads, improve resource usage, and integrate into existing environments makes Photon popular. 

When combined with solutions like SQream, organizations can achieve unmatched performance and scalability in their data operations, maximizing business outcomes.