Managing a “Big Data Overflow” Through Analytic Offloading

By Ami Gal

12.17.2013

We’ve reached an age in which technology can analyze virtually anything, including Big Data. While this is a major technological accomplishment, many enterprises now face the challenge of storing and analyzing volumes of structured, unstructured and semi-structured data that have simply outgrown their data warehouses. Data-driven businesses are being forced to seek smarter Big Data management solutions.

Offloading: A Simple Solution to Big Data Management

Offloading is described by IBM as “moving infrequently accessed data from data warehouses into enterprise-grade Hadoop.” Simply put, in order to deal with volumes of data coming from a variety of complex sources, enterprises are transferring large data sets from their data warehouses onto another analytic platform. An offloading solution gives companies a long-term storage and ETL tool, enabling them to combine their current data warehouse with a new technology.
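
As a rough illustration, here is a minimal sketch in Python of an age-based offload job: it exports “cold” rows from a warehouse table into flat files staged for Hadoop. The connection string, the events table and the 90-day cutoff are all hypothetical; in practice a bulk tool such as Apache Sqoop would typically do this job, with the exported files pushed into HDFS via hdfs dfs -put.

    # Offload sketch (illustrative): export infrequently accessed rows from a
    # warehouse table into a CSV file staged for loading into Hadoop/HDFS.
    # The DSN, table name and cutoff below are assumptions, not real values.
    import csv
    import psycopg2  # any DB-API 2.0 driver for your warehouse works the same way

    CUTOFF_DAYS = 90  # rows older than this count as "infrequently accessed"

    conn = psycopg2.connect("dbname=warehouse user=etl host=dw.example.com")
    cur = conn.cursor()
    cur.execute(
        "SELECT * FROM events WHERE event_date < CURRENT_DATE - %s",
        (CUTOFF_DAYS,),
    )

    with open("events_cold.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([desc[0] for desc in cur.description])  # header row
        for row in cur:  # stream rows instead of fetching everything at once
            writer.writerow(row)

    conn.close()
    # Outside this script: hdfs dfs -put events_cold.csv /warehouse/offload/
    # then, once verified, prune the offloaded rows from the warehouse.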

Preventing a “Big Data Overflow”

Data warehouses are well suited to storing volumes of historical or archived data that require little updating. An offloading solution should be applied when an enterprise is confronted with what I refer to as a “Big Data overflow”: the point at which a company’s data warehouse can no longer accommodate the volumes of data flowing in from social networks, online sources, customer records, machine-generated log files and other unstructured or semi-structured sources.

The Benefits of Offloading

Offloading enables unconstrained analytics and helps companies capture information that was once considered unobtainable due to data warehouse restrictions. When utilized correctly, offloading provides three main benefits:

  • Instant access to analytics – Offloading keeps data immediately accessible for running queries; data gets in and out quickly. This saves time and money and lets enterprises make faster, better-informed decisions.
  • Access to all data sources – Analysts can now view information coming from unstructured and semi-structured data sources that were once too complex for data warehouses, opening the door to more advanced algorithms and real-time capabilities.
  • Faster analytics – Where a single query might once have taken an entire day to run, data teams using an offloading solution can run multiple queries within a day, shortening time to value.

Finding the Right Solution – Comparing Key Players

Transferring data from one platform to another is easier said than done. Moving information out of a data warehouse can be extremely expensive and time-consuming, depending on the technology. Apache Hadoop, Amazon Redshift and SQream Technologies all offer offloading solutions, compared below:

  • Apache Hadoop – Hadoop can be used to analyze unstructured and semi-structured data sets without the ETL step needed to insert them into a traditional database, allowing users to load data in a single step (see the sketch after this list). In a report conducted by Hapyrus, Hadoop combined with Hive cost $210 to run a query every 30 minutes and took 1,491 seconds to run the test queries over 1.2 TB of data.
  • Amazon Redshift – Amazon Redshift is a fully managed data warehouse service in the cloud, meaning users can run queries extremely fast with the same SQL and BI tools they already use (also shown in the sketch below). In the same Hapyrus comparison, Redshift took only 155 seconds to run the test queries over 1.2 TB of data and cost only $20 to run a query every 30 minutes. According to these results, Amazon Redshift is roughly 10X faster and 10X more cost-effective than Hadoop.
  • SQream Technologies – SQream Technologies has developed an alternative solution that outperforms both Hadoop and Redshift in terms of cost and performance. We run queries 3X faster than Redshift, and our ongoing tests show that speed continuing to improve. In terms of load time, we outperform both Hadoop and Redshift by 17X during the offloading process, and we run standard SQL, so our customers do not have to rewrite their queries. The TCO of our system is also much lower than with Hadoop or Redshift – we consume less power and spend less on hardware, because we use GPU technology to pre-process data and accelerate analysis (combined with today’s best practices).
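
To make the “no ETL” and “same SQL” points concrete, here is a hedged sketch showing the same aggregate query both ways: a Hive external table defined directly over raw files in HDFS, and the identical SQL sent to Redshift over its PostgreSQL-compatible interface. Every name in it – the pageviews table, the HDFS path, the cluster endpoint, the credentials – is a made-up placeholder.

    # Illustrative only: table, path, host and credentials are placeholders.
    import psycopg2  # Redshift speaks the PostgreSQL wire protocol

    # --- Hadoop + Hive: query raw files in place, with no ETL step. ---
    # This DDL would be submitted through the Hive CLI or JDBC; it is shown
    # here as a string to illustrate the schema-on-read approach.
    HIVE_DDL = """
    CREATE EXTERNAL TABLE pageviews (ts STRING, url STRING, user_id STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    LOCATION '/logs/pageviews/';
    """
    QUERY = "SELECT url, COUNT(*) AS hits FROM pageviews GROUP BY url"

    # --- Amazon Redshift: the same SQL, run with tools you already use. ---
    conn = psycopg2.connect(
        host="examplecluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="analyst", password="...",
    )
    cur = conn.cursor()
    cur.execute(QUERY)
    for url, hits in cur.fetchall():
        print(url, hits)
    conn.close()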

Do the Math

Data only continues to get bigger as we approach 2014. The bottom line remains: enterprises that want to get the most from their Big Data must be equipped with a hassle-free and cost-effective data storage solution. Apache Hadoop, Amazon Redshift and SQream Technologies all offer strong options; the next step is finding the best fit for your company.
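
As a quick sanity check on the Hapyrus figures cited above, the “10X” claims fall out directly from the numbers:

    # Back-of-the-envelope check of the Hapyrus comparison cited above.
    hive_seconds, redshift_seconds = 1491, 155
    hive_cost, redshift_cost = 210, 20  # USD for a query every 30 minutes

    print(hive_seconds / redshift_seconds)  # ~9.6x faster
    print(hive_cost / redshift_cost)        # 10.5x cheaper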
Learn more about our technology! Follow SQream on Twitter for daily Big Data tweets, like our Facebook page, join us on LinkedIn and find us on Google+. Leave your comments below too – we can’t wait to hear what you have to say!