Check Your ML Carbon Footprint with the Machine Learning Emissions Calculator
Faced with dire reports of looming global catastrophe due to the ongoing climate emergency, many of us are taking a long, hard look at the carbon footprint of our daily lives, whether it’s from the food we eat, how much we drive or how often we fly. But sometimes it’s the most intangible things that pump out more carbon than we think: namely, the surprisingly large carbon footprint that can be associated with creating machine learning models, the same technology that underlies the apps on our smartphones, digital personal assistants and computers.
While using such tech might not necessarily emit all that much carbon, the cause for concern lies in the carbon impact of the computational processes that go into training AI, and in whether researchers and companies are well-informed enough to choose less carbon-intensive options.
Until now, artificial intelligence researchers have not really had an easily available method to quantify the carbon impact. But that’s changing, thanks to a team from Canada’s Montreal Institute for Learning Algorithms (MILA), Element AI and Polytechnique Montreal, which recently released a tool designed to help those working in the AI field estimate how much carbon is produced in training their machine learning models. The project aims to raise awareness, while also spurring further discussion about the environmental impact of developing such algorithms.
The carbon footprint of training AI models “is often something that gets overlooked, since accuracy is really the main factor people consider,” said Alexandra Luccioni, an AI researcher with MILA and one of the study’s co-authors. “But as models and datasets get bigger, the cost in energy (and environmental impact) is going to get bigger as well.”
Indeed, as the field of artificial intelligence research expands, increasingly powerful and power-hungry hardware like GPUs (graphics processing units) is being harnessed to train machine learning models for a diverse range of applications, from natural language processing to computer vision. And as artificial neural networks become more complex, more computational muscle (and therefore more energy) is required. It all comes at an environmental cost, one whose size is mostly unknown to the vast majority of AI experts.
Quantifying Machine Learning’s Carbon Impact
To tackle that question, the team’s Machine Learning Emissions Calculator takes into account several main factors: the energy consumed by the system’s hardware; the length of training time; the geographical location of the server used by the cloud computing provider; the CO2 emissions per unit of electricity produced in that particular region; and any carbon offsets that have been purchased by the cloud provider. Once these variables are entered, the calculator can estimate how much carbon is generated during training tasks.
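To make the arithmetic behind those factors concrete, here is a minimal Python sketch of how such an estimate could be put together. It is not the calculator's actual code: the class name, the function name, the field names and the choice to model offsets as a simple fraction are all assumptions made for illustration, and the numbers in the usage example are placeholders rather than real grid data.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    """Inputs for one training job (all field names are hypothetical)."""
    hardware_power_watts: float   # average power draw of the hardware
    training_hours: float         # length of training time
    grid_kg_co2_per_kwh: float    # CO2 emitted per kWh on the server's local grid
    offset_fraction: float = 0.0  # share of emissions offset by the provider (0.0 to 1.0)


def estimate_emissions_kg(run: TrainingRun) -> float:
    """Estimate net CO2 in kilograms for a training run.

    Assumed formula: energy (kWh) x grid carbon intensity, reduced by offsets.
    """
    if not 0.0 <= run.offset_fraction <= 1.0:
        raise ValueError("offset_fraction must be between 0.0 and 1.0")
    # Convert watts to kilowatts, then multiply by hours to get kWh.
    energy_kwh = run.hardware_power_watts / 1000.0 * run.training_hours
    gross_kg = energy_kwh * run.grid_kg_co2_per_kwh
    # Subtract the share the provider has offset.
    return gross_kg * (1.0 - run.offset_fraction)


if __name__ == "__main__":
    # Placeholder values only: the same job on two hypothetical grids.
    dirty_grid = TrainingRun(300.0, 100.0, grid_kg_co2_per_kwh=0.5)
    clean_grid = TrainingRun(300.0, 100.0, grid_kg_co2_per_kwh=0.005)
    print(f"Higher-carbon grid: {estimate_emissions_kg(dirty_grid):.2f} kg CO2")
    print(f"Lower-carbon grid:  {estimate_emissions_kg(clean_grid):.2f} kg CO2")
```

In this sketch, holding the hardware and training time fixed and changing only the grid's carbon intensity changes the result in direct proportion, which is why the server's location carries so much weight in the estimate.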
As the paper notes, it can be difficult to estimate exactly the amount of CO2 emitted by a cloud server, because that data is usually not made public. To get around that problem, the study assumes that servers are connected to their local power grids, and cross-references publicly available data from those grids with the known server locations of major cloud providers like Google Cloud Platform, Microsoft Azure and Amazon Web Services, thus allowing the calculator to make its evaluation. Not surprisingly, where a cloud server is physically located, and whether its local grid uses renewable energy sources, can make a huge difference in how much carbon is ultimately emitted.
“It’s really impressive how big a difference in terms of emissions there can be, just based on where your model is training,” Luccioni told us. “People often choose the server based on availability, or proximity, or personal preference, but choosing a low-carbon server in a location like Quebec or California can reduce the amount of carbon produced by a factor of 100.”
Beyond carefully choosing a cloud provider based on the location of its servers and whether it uses renewable energy sources or buys carbon offsets, another measure that could significantly lower carbon emissions is to avoid training AI models from scratch whenever possible, which previous studies have shown could push emissions much higher than if a pre-trained model were used. To that end, tech giants like Google could consider sharing their models so that people with fewer computational resources can build on top of them, rather than producing more carbon in training from scratch. As the team’s project continues to develop, it ultimately raises important questions about what the best practices for reducing AI’s carbon footprint might be.
“Being mindful of the amount of CO2 produced by a model and trying to do a trade-off (such as accuracy gained versus carbon produced), is a great first step for individuals,” said Luccioni. “On a company level, either offsetting or installing more efficient hardware can definitely go a long way.”
Nevertheless, there’s still a long way to go before AI’s carbon emissions can be more accurately predicted, says Luccioni: “There is a huge lack of transparency with regard to the CO2 produced by grids in different locations. So while we currently use the best publicly available data we could find, it’s definitely not 100 percent accurate, since we just don’t have that information. In order to improve our work, we call upon companies to disclose their emissions and energy consumption, so that we can improve our estimates. We hope that our work, along with others, will open the door for these conversations and debates to take place, to quantify the environmental impact of our field, and for positive changes that can be made to reduce it.”
Feature Image by marian anbu juwan from Pixabay.