What We Expect from Data Analytics in 2022

By Inbal Aharoni

3.30.2022

According to the 2021 NewVantage Partners Big Data and AI Executive Survey, 99% of enterprises are actively investing in big data and artificial intelligence in their quest for superior, data-driven business decisions. Nearly two-thirds (65%) of the firms have appointed a Chief Data Officer, and 96% report that they have enhanced their competitive edge and achieved measurable business outcomes.

Innovative data analytics use cases are being explored every day across a wide range of verticals, from manufacturing and supply chain management to financial services, healthcare, ecommerce, and retail. However, there are still major challenges in extracting the full value from an organization’s data: huge volumes and varieties of data, costly and time-consuming ETL, and complex hybrid infrastructures, not to mention making data analytics part of your organization’s DNA rather than the specialized domain of data scientists.

In this blog post, we look more closely at the issues surrounding data analytics and discuss how industry watchers expect those challenges to be addressed during 2022.

Barriers to Insightful Data Analytics

Organizations used to aspire to be data-centric. But today, we know that for an organization to get the full value from its data, it must take the next step and become information-centric, i.e., gain timely and reliable insights from the data. 

Below, we describe the key barriers to achieving this next level of data analytics maturity.

Drowning in Unstructured Data 

According to Statista, the volume of data created and consumed worldwide will rise to 181 zettabytes by 2025, up from an estimated 79 zettabytes in 2021. Most of this data is unstructured, coming in a wide range of forms, sizes, and shapes. And ingesting, storing, protecting, and making this unstructured data available for analytics and insights is both challenging and costly.

Poor Data Hygiene 

Clean, high-quality data sets matter more for insightful results than the choice of algorithm. But the sheer volume, variety, variability, and velocity of the data being collected can undermine the veracity of an organization’s data assets. Data analytics is no exception to the old rule: garbage in, garbage out.
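As a minimal illustration of basic data hygiene (the pandas workflow, file name, and column names below are illustrative assumptions, not a prescribed toolchain), a few routine checks such as deduplication, type coercion, and null handling can catch a lot of garbage before it ever reaches a model:

```python
import pandas as pd

# Hypothetical raw export; the file name and columns are illustrative only.
df = pd.read_csv("customers.csv")

# Drop exact duplicates that would inflate downstream counts.
df = df.drop_duplicates()

# Coerce a messy date column; unparseable values become NaT instead of failing.
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")

# Normalize free-text categories so "us", "US ", and "US" don't become three values.
df["country"] = df["country"].str.strip().str.upper()

# Flag rows missing critical fields rather than silently dropping them.
needs_review = df["customer_id"].isna() | df["signup_date"].isna()
print(f"{needs_review.sum()} of {len(df)} rows need review")

clean = df[~needs_review]
```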

Data Sprawl

Many organizations continue to keep their most sensitive data on-premises, while also maintaining data stores across multiple public clouds. Maintaining data pipelines at scale across these complex hybrid infrastructures is a key data engineering challenge.

Complex Models 

Deep learning models stand behind the real-time, interactive applications that are shaping the 21st century, from self-driving cars to natural language processing, virtual assistants, visual recognition, and much more.

To approach human-level performance, these models expose a large number of hyperparameters that govern their architecture and training. Choosing, optimizing, and tuning those hyperparameters is yet another major data science challenge.
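To make the tuning challenge concrete, here is a minimal sketch of automated hyperparameter search using scikit-learn’s RandomizedSearchCV; the model, parameter ranges, and synthetic data are illustrative assumptions, and a real deep learning model exposes far more knobs (learning rate schedules, batch size, layer widths, dropout, and so on):

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a real training set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# A small subset of the hyperparameters a neural model exposes.
param_distributions = {
    "hidden_layer_sizes": [(32,), (64,), (64, 32)],
    "alpha": loguniform(1e-5, 1e-1),            # L2 regularization strength
    "learning_rate_init": loguniform(1e-4, 1e-1),
}

search = RandomizedSearchCV(
    MLPClassifier(max_iter=300, random_state=0),
    param_distributions,
    n_iter=20,      # 20 sampled configurations; real searches run far more
    cv=3,
    n_jobs=-1,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Even this toy search performs 60 cross-validation fits (20 configurations across 3 folds, plus a final refit), which hints at why tuning deep models at production scale is so expensive.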

Lengthy Total Time-to-Insight (TTTI)

From the metaverse on down, modern applications demand reliable, real-time insights from time-sensitive data. Data that is stale, or insights that are obsolete by the time they arrive, just won’t cut it in verticals such as manufacturing, retail, and finance. But ETL is slow and costly, as is running the analytics itself. Data engineers themselves often become a bottleneck as they juggle an incessant stream of data pipeline requirements from a variety of stakeholders.

In a recent survey of data engineers, 91% reported that they frequently receive data pipeline requests that are unrealistic or unreasonable. Data lakes, ELT, and direct querying of raw data have emerged to accelerate time-to-insight, but major data engineering barriers still stand between organizations and the holy grail of real-time, interactive analytics.

What to Look Forward to in 2022

In this section, we summarize the data analytics trends foreseen for the coming year—trends that seek to address the challenges outlined above. We have organized them into three categories: data management and governance; operationalization and collaboration; and accelerated time-to-insight. 

Data Management and Governance

  • Algorithm-driven data set management: Instead of tedious and error-prone manual data engineering tasks, targeted algorithms will automatically determine which data sets organizations should include in a given data pipeline and which to discard.
  • Data sustainability: Tools and methods will emerge that promote data sustainability and model scalability by reusing and recombining data across different business problem statements. A more operational but related trend is “data as code” to dynamically clone, distribute, and destroy data copies on demand.
  • Data thinning/reduction at the edge: Back in 2020, Gartner predicted that by 2025, 75% of enterprise data will be created and processed at the edge, outside the data center or cloud. We will see more and more companies leveraging edge intelligence to ensure that only meaningful and relevant data is streamed into central repositories for analysis.
  • Modernized data lakes replacing data warehouses: By saving data in its native form and loading it immediately after extraction (ELT), a data lake can handle data analytics at greater scale and with more agility than a data warehouse. Transformation takes place on demand, and data lakes require less maintenance than data warehouses (see the sketch after this list).
  • Data fabric architectures: Organizations will integrate diverse data from complex, distributed infrastructures into a holding area for immediate use, with centralized security and governance policies.
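As a rough sketch of the ELT pattern behind the data lake trend above (the lake path, source file, and column names are hypothetical), raw data is landed in the lake unchanged, and transformation happens on demand at query time:

```python
from pathlib import Path

import pandas as pd

# Hypothetical lake location and source extract.
LAKE = Path("lake/raw/orders")
LAKE.mkdir(parents=True, exist_ok=True)

# Extract + Load: land the raw data as-is, with no upfront schema modeling.
raw = pd.read_json("orders_export.json", lines=True)
raw.to_parquet(LAKE / "orders_2022-03.parquet", index=False)

# Transform on demand, only when an analysis actually needs it.
orders = pd.read_parquet(LAKE / "orders_2022-03.parquet")
daily_revenue = (
    orders.assign(order_date=pd.to_datetime(orders["created_at"]).dt.date)
          .groupby("order_date")["amount"]
          .sum()
)
print(daily_revenue.head())
```

Because the raw extract is preserved unchanged, the same landed data can later be re-transformed for a different question without another round of extraction.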


Operationalization and Collaboration

  • MLOps, ModelOps: We will see less emphasis on data science and more on data engineering that transforms data science processes (development, training, testing, deployment, monitoring) into controlled, documented, and automated workflows.
  • Low-code/no-code tools: In response to the chronic shortage of data scientists, low-code and no-code tools allow non-experts to create models and applications with minimal input from a data professional. Line-of-business (LOB) experts with deep knowledge of business challenges can now become “citizen data scientists” (a term coined by Gartner) who independently perform detailed analyses and create machine learning models.
  • Cloud-native fully managed databases: More and more enterprises will migrate to cloud-native fully managed databases for greater agility, scalability, and availability. In addition, offloading DB infrastructure and administration tasks to a fully managed, on-demand service can reduce DB management costs.
  • Full-stack approach: One of the data pipeline development challenges faced by data engineers today is the multitude of data analytics tools deployed by different stakeholders. In 2022, we will see a continued trend towards a full-stack approach that creates a centralized data analytics ecosystem with single-pane visibility and control, along with high levels of automation.

Accelerated Time-to-Insight

  • Kubernetes, GPUs, and smart scale-out storage: These will enable organizations to streamline AI development and production performance, for faster time-to-market and time-to-insight.
  • “Collaboration data mining”: We will see silos being broken down to achieve better BI outcomes faster. The data fabric architecture mentioned above will also promote better and faster results by interweaving graph, document, and time-series databases.
  • Augmented dashboards: Shared across teams, these will go beyond metrics to deliver well-visualized insights that serve as a single source of BI truth for your entire enterprise.
  • Performant and scalable graph technology (real-time transactions at scale): Engineers will be able to replace relational DBs with graph DBs as the central, enterprise-grade system of record, with all the real-time analytic advantages that graph DBs have over relational and NoSQL databases (see the sketch after this list).
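To illustrate why graph models suit relationship-heavy questions, here is a toy traversal using the networkx library (an in-memory stand-in, not a graph database, with hypothetical node names); answering the same question over a normalized relational schema would typically require a chain of joins or a recursive query:

```python
import networkx as nx

# Toy supply-chain graph; in a graph database these would be persisted nodes and edges.
g = nx.DiGraph()
g.add_edges_from([
    ("supplier_a", "factory_1"),
    ("factory_1", "warehouse_eu"),
    ("warehouse_eu", "retailer_x"),
    ("supplier_b", "factory_1"),
])

# "Which retailers sit downstream of supplier_a?" is a traversal, not a join chain.
downstream = nx.descendants(g, "supplier_a")
print([node for node in downstream if node.startswith("retailer")])  # ['retailer_x']
```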

The GPU and the Future of Data Analytics

More than a decade ago, forward-looking data scientists started experimenting with GPU-accelerated databases in their quest to achieve real-time, interactive insights from very large and varied datasets. The advantages of the GPU’s massively parallel architecture over CPUs quickly became clear, and today, GPUs are utilized for their:

  • Very fast data ingestion rates, up to 13X faster than non-GPU platforms (see the SQream benchmarks below). 
  • Rapid time-to-insight, with GPUs typically 10-100X faster than CPUs at processing the same workloads (runtime gains in line with the 1.5X GPU-to-CPU bandwidth ratio, and sub-100ms latency).
  • A more compact footprint and lower cost, with GPU-based systems 6.5-20X smaller than their CPU-based equivalents. For example, 16 GPU-accelerated servers can deliver performance equivalent to a cluster of 1,000 CPU-based servers.
  • Real-time visualization of lightning-fast processing results, leveraging the GPU’s original application: a powerful graphics rendering engine.

The GPU’s massive data computing capabilities have thus played an integral role in speeding digital transformation and enabling a host of previously impossible use cases, from OFSAA financial reporting to network QoS to always-on marketing, in industries whose very survival depends on rapidly analyzing and gaining insights from their growing data.

Working together with technologies such as AI and ML, the GPU has gone from being the gamer’s graphics chip to being a go-to tool in the enterprise arsenal, helping telecom operators, manufacturers, banks, and others achieve reliable insights where and when they need them, optimizing performance, cutting costs, and increasing revenue.

You can learn more about the power of GPU-accelerated database architecture by clicking here.