Reducing your analytical carbon footprint

By Ohad Shalev

11.21.2022

Environmental issues are becoming a major concern, for both economic and regulatory reasons, so companies are looking for ways to reduce their carbon emissions. ESG reporting is becoming mandatory on many stock exchanges, and taxes on carbon emissions are rising.

On the other hand…

When organizations talk about digital transformation, they are usually aware that it comes with a price tag. This cost reflects what it takes to build an agile, relevant, and timely data infrastructure that can truly support a business wishing to become data-driven. The cost formula is simple: the bigger your dataset, the higher the cost of turning it into insights.

 

With big data exploding and machine learning algorithms driving ever-greater computing needs, data centers have become the largest contributor to the tech sector's greenhouse gas emissions, growing from 33% of its footprint in 2010 to 45% in 2020. The problem keeps growing, as environmental impact is now a measure of every company's success: carbon footprint, energy efficiency, and compliance with local regulations. Training a single large machine learning model, for example, can emit as much as 284 tons of carbon dioxide into the atmosphere.

Common sustainability solutions organizations pursue include:

  • Moving analytical workloads into the cloud, where they enjoy newer and more efficient hardware.
  • Prioritizing more energy-efficient cloud regions, such as those powered by renewable energy or located in colder climates.
  • Using software that optimizes data center operations (shutting down idle servers, etc.).

 

However, there is another factor that most companies are less familiar with: the carbon footprint of the computation itself, which underlies all data analytics. The Green Algorithms project makes this calculation simple for anyone, and it also helped us understand SQreamDB's carbon footprint when working with big datasets. The calculation also supports the claim that better performance (less ingestion and query time, faster insights) is kinder to the environment.
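To make this concrete, here is a minimal sketch of a Green Algorithms-style estimate in Python. The structure of the formula (cores, memory, usage, PUE, and grid carbon intensity) follows the published calculator, but the power-draw and carbon-intensity numbers below are illustrative assumptions, not SQreamDB or Green Algorithms figures.

```python
# Minimal sketch of a Green Algorithms-style estimate.
# All numeric values used here are illustrative assumptions, not measured data.

def carbon_footprint_gco2e(
    runtime_hours: float,      # wall-clock runtime of the job
    n_cores: int,              # number of processing cores used
    power_per_core_w: float,   # assumed power draw per core, in watts
    usage_factor: float,       # fraction of core capacity actually used (0-1)
    memory_gb: float,          # memory allocated to the job
    power_per_gb_w: float,     # assumed memory power draw, watts per GB
    pue: float,                # data-center Power Usage Effectiveness overhead
    carbon_intensity: float,   # grid carbon intensity, gCO2e per kWh
) -> float:
    """Estimate a compute job's footprint in grams of CO2-equivalent."""
    power_w = n_cores * power_per_core_w * usage_factor + memory_gb * power_per_gb_w
    energy_kwh = runtime_hours * power_w * pue / 1000.0
    return energy_kwh * carbon_intensity


if __name__ == "__main__":
    # Hypothetical 8-hour analytics job on a 32-core, 256 GB server.
    grams = carbon_footprint_gco2e(
        runtime_hours=8, n_cores=32, power_per_core_w=12.0, usage_factor=0.9,
        memory_gb=256, power_per_gb_w=0.3725, pue=1.6, carbon_intensity=475,
    )
    print(f"Estimated footprint: {grams / 1000:.1f} kg CO2e")
```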

 

Knowing that adding processing cores or switching to a GPU-based database is not always better (in fact, it can be much worse, as not all GPUs have a good work-per-watt ratio), we tested SQreamDB anyway. We used our field test, derived from the TPCx-BB benchmark on 300TB of data on AWS, and the results were surprising:

[Figure: Green Algorithms calculator results]

SQreamDB produced roughly a tenth of the carbon emissions, showing that improved cost/performance in our case also comes with a reduced impact on the environment.

What can we learn from this test? 

  • The most important tool for every company is awareness. Start by calculating your existing carbon footprint, and predict how software or hardware changes will affect it. 
  • Choose vendors that both enable you to calculate your environmental impact and are transparent about the hardware they use (model of processor, number of cores, RAM available, etc.).
  • Reducing carbon footprint is mostly a result of computing efficiency. Faster time to business insights saves you carbon (and money). 
  • Better performance doesn't always mean fewer emissions. Choose GPU-based software that performs much more work for every unit of energy consumed (see the sketch after this list).
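To illustrate the last two points, here is a rough back-of-the-envelope comparison. The runtimes, power draws, PUE, and grid intensity below are hypothetical assumptions for illustration only, not SQreamDB measurements; the point is simply that a faster system can emit less in total even when it draws more power per hour.

```python
# Hypothetical comparison: a GPU cluster that draws more power per hour but
# finishes the job much faster can still emit less CO2e overall.

CARBON_INTENSITY = 475   # gCO2e per kWh (assumed world-average grid mix)
PUE = 1.5                # assumed data-center overhead factor


def job_emissions_kg(runtime_hours: float, avg_power_w: float) -> float:
    """Total emissions of one job in kg CO2e, given runtime and average power draw."""
    energy_kwh = runtime_hours * avg_power_w * PUE / 1000.0
    return energy_kwh * CARBON_INTENSITY / 1000.0


cpu_cluster = job_emissions_kg(runtime_hours=24.0, avg_power_w=4000.0)  # hypothetical
gpu_cluster = job_emissions_kg(runtime_hours=3.0, avg_power_w=6000.0)   # hypothetical

print(f"CPU cluster: {cpu_cluster:.1f} kg CO2e")
print(f"GPU cluster: {gpu_cluster:.1f} kg CO2e")
print(f"Reduction factor: {cpu_cluster / gpu_cluster:.1f}x")
```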

Interested in digging deeper into how big data analytics affects our carbon footprint? Check out our webinar, our CEO's take on this topic, or more of the author's thoughts.

If you want to learn more about SQreamDB's cost/performance, check out our benchmarks.