Scaling Laws: AI and Energy: What do we know? What are we learning?
Published by The Lawfare Institute
in Cooperation With
Mosharaf Chowdhury, associate professor at the University of Michigan and director of the ML Energy Lab, and Dan Zhao, AI researcher at MIT, Google X, and Microsoft focused on AI for science and sustainable, energy-efficient AI, join Kevin Frazier, AI Innovation and Law Fellow at the University of Texas School of Law and a Senior Editor at Lawfare, to discuss the energy costs of AI.
They break down exactly how much energy fuels a single ChatGPT query, why this is difficult to figure out, how we might improve energy efficiency, and what kinds of policies might minimize AI's growing energy and environmental costs.
Leo Wu provided excellent research assistance on this podcast.
Read more from Mosharaf:
- https://ml.energy/
- https://www.technologyreview.com/2025/05/20/1116327/ai-energy-usage-climate-footprint-big-tech/
Read more from Dan:
Find Scaling Laws on the Lawfare website, and subscribe to never miss an episode.
This Scaling Laws episode ran as the October 17 Lawfare Daily episode.
To receive ad-free podcasts, become a Lawfare Material Supporter at www.patreon.com/lawfare. You can also support Lawfare by making a one-time donation at https://givebutter.com/
Click the button below to view a transcript of this podcast. Please note that the transcript was auto-generated and may contain errors.
Transcript
[Intro]
Alan Rozenshtein:
When the AI overlords take over, what are you most excited about?
Kevin Frazier: It's
not crazy, it's just smart.
Alan Rozenshtein: And
just this year, in the first six months, there have been something like a
thousand laws.
Kevin Frazier: Who's
actually building the scaffolding around how it's going to work, how everyday
folks are going to use it?
Alan Rozenshtein: AI
only works if society lets it work.
Kevin Frazier: There are so many questions that have to be figured out, and nobody came to my bonus class. Let's enforce the rules of the road.
[Main episode]
Welcome back to Scaling Laws, the podcast brought to you
by Lawfare and the University of Texas School of Law that explores the
intersection of AI, policy, and, of course, the law. I'm Kevin Frazier, the AI
Innovation and Law Fellow at Texas Law and a senior editor at Lawfare.
Artificial Intelligence is sometimes framed as a magic bullet
for solving big problems from discovering new drugs to planning smart cities. But
the infrastructure that powers these models uses electricity and water and a
lot of it.
A December 2024 Department of Energy report found that AI data centers already account for 4.4% of U.S. electricity consumption, a share that is estimated to double or triple by 2028. That's a trend that seems likely to continue even after this decade, given that OpenAI announced the construction of five new data centers as part of Project Stargate and other labs seem poised to follow suit.
So why exactly does AI use so much energy? And is it cause for alarm or merely a fact of technological advance? Today on Scaling Laws.
To explore these questions and more, we have Mosharaf Chowdhury, who's an associate professor at the University of Michigan and one of the directors of the ML Energy Lab, and Dan Zhao, an AI researcher at MIT, Google X, and Microsoft who focuses on AI for science and sustainable, energy-efficient AI.
Giddyup for quite the ride. To get in touch with us, email
scalinglaws@lawfaremedia.org. And with that, we hope you enjoy the show.
Thank you to both of you for coming on.
Mosharaf Chowdhury:
Thank you for having us.
Dan Zhao: Thank you for having me.
Kevin Frazier: Awesome. Mosharaf, what would you say is the common understanding of AI's energy consumption, to the extent there is one?
Mosharaf Chowdhury: So, AI's energy consumption has been in the news a lot for the last three or four years. And at the beginning there were no good tools to precisely measure how much energy these models consume for training and inference.

And as a placeholder, because of the lack of better tools, people were using estimations to get a rough order of magnitude of energy consumption. That essentially means taking how much a GPU consumes at its peak, then how many GPUs you might have, and then assuming that many GPUs are needed to do a training run or for inference. Then you just multiply all these big numbers and assume they are always running, all the time. Then you end up with a very large number, which has been reported as bigger than the Netherlands or Ireland and so on.

So [inaudible] in that sense, there is a lot of overestimation. It used to be, and that led to many news articles, which honestly, I don't know, make sense. If you see those numbers, people would be concerned. So I think that's what the common understanding is referring to.
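For readers who want to see the shape of that back-of-envelope math, here is a minimal sketch; the GPU count, peak power, and duration below are illustrative assumptions, not figures from the episode, and the point is only that multiplying worst-case numbers together produces the kind of headline totals Mosharaf describes.

```python
# Rough "peak power x GPU count x wall-clock time" estimate of the kind
# Mosharaf says was used before better measurement tools existed.
# All inputs below are illustrative assumptions, not reported figures.

NUM_GPUS = 10_000          # assumed size of the training cluster
PEAK_WATTS_PER_GPU = 700   # e.g., an H100-class accelerator at its rated peak
TRAINING_DAYS = 90         # assumed wall-clock duration of the run

hours = TRAINING_DAYS * 24
energy_wh = NUM_GPUS * PEAK_WATTS_PER_GPU * hours   # watt-hours
energy_gwh = energy_wh / 1e9                        # gigawatt-hours

print(f"Estimated training energy: {energy_gwh:.2f} GWh")
# This overestimates, as discussed: real GPUs rarely sit at peak power for
# the whole run, and idle or partially loaded time is counted at full draw.
```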
Kevin Frazier: Right.
So we've seen all these stats, as you mentioned, from the early days in
particular of AI, where seemingly every day we were learning, oh, now it's the energy use of a hundred thousand homes for three months at the height of summer, and now it's the amount of energy of Ireland, and now it's New York during the Super Bowl, and so on and so forth.
Those are great for tweets, and I'm sure they get a lot of viral traction, but maybe they're not the most empirically driven. Mosharaf pointed out the critical difference between training and inference, which comes into this picture and helps us get a sense of where this energy use is actually coming from.
Dan, can you kind of explain for listeners who perhaps aren't
as steeped in AI vocab as the three of us presumably are, what is the
difference between training and inference, and how is that relevant to this
energy conversation?
Dan Zhao: Sure. I
think that's a very good question. So, a while ago––and you know, when I say a while,
I contextualize that in AI space, so just a year or two ago, right––people were
starting to––
Kevin Frazier: So in
AI time, that's at least like 15 years ago. Exactly
Dan Zhao: Right. And
so basically, if you think about it this way, everyone was very occupied with
the energy costs of training, right?
These large behemoth models needed to undergo something called pre-training, which basically means you take a very large model, you feed it tokens, and essentially it trains very slowly, depending on your hardware, the generation of GPUs you have, the networking, the clusters you have. So nothing really accessible to your common folk with a single consumer-grade GPU, for example.
But these sort of large companies, large labs, they're
basically training these models on tons and tons of tokens, right? And
basically getting this sort of large language model out of it. So you can think
of your ChatGPT for example. You can think of your Llama that are released, for
example. And so during these times, you are actually trying to train your model
to get up to a certain level of performance, and then that model should be good
enough for all sorts of downstream tasks, from chatting, tool use, so on and so
forth.
Inference is a little different because inference has kind of
allowed people to think a bit more flexibly about taking these models, so they
don't necessarily need as many computational resources for as long or as
intensively. You basically think about taking a model that's already
pre-trained and then really setting it up for deployment downstream, right?
So when you think about inference, it's basically the thing that people nowadays are much more familiar with. When they engage with chat, for example, you're basically throwing in a query, and a response comes out the other end. It seems easy, but under the hood, there's a whole bunch of complicating factors there.

There are ways to route different requests. If you think about it, millions of requests maybe every five minutes go to ChatGPT. They need to find a way to route them through their servers, make sure their servers don't crash. They need to make sure that things are maintained properly, that quality doesn't degrade.

They need to think about toxic outputs, about harmful content they need to filter out. And so inference is really, you take a trained model, you throw things in, and then you hopefully get desirable things out. And that's much faster and, in some senses of the word, much easier than training, for example, is.
Kevin Frazier: And so
one of the things that I got called out on in preparation for this podcast, and
thanks to excellent research from Leo Wu was the assumption I assume many
people hold, one of those common understandings that the energy use has to be
this training function, right? Whenever we hear about training these large
language models, and in particular the frontier models here, we think about
these massive data centers being built in the middle of nowhere with tens of
thousands of GPUs and even more CPUs. They're running 24/7, 365 days a year and
just, they need water to stay cool. They need tons of energy sources. Surely
it's got to be the training that's driving these energy costs.
Mosharaf, why is that not the case? Where are we seeing the preponderance of energy use coming from?
Mosharaf Chowdhury: That's a very good question. Thank you. Training essentially happens, say, only once. It's not really once, but just to get a sense of it: you train a model once, and then when it gets deployed, as Dan was saying, millions of people, at this point hundreds of millions of people, are using the same model.

And of course you retrain the model, you fine-tune it many different ways, but none of that comes close to how many people are actually interacting with the model. So a single training instance consumes a lot of energy, tens of thousands of watt-hours, based on some of the smaller, open-source models that have been published. We don't really know how much energy is actually consumed by training ChatGPT-scale models.
It looks big because we can point to it: this model took this much energy to train. But it gets much, much bigger when you think about all the millions of people sending, you know, tens of queries per person. Each inference request consumes a small amount of energy, but when you multiply it by hundreds of millions, they add up to a very big number.

And all these big data centers, they are not necessarily just serving training. They're also serving inference requests. So the ratio between the energy consumption of training and inference depends on what company you are talking about and what model, but many of the numbers I have heard from different providers range from 30/70 to 40/60.

I mean, the smaller one being training and the bigger one being inference. So at the beginning, when these models were not very popular, you could easily think that training consumes the lion's share. But as more and more people use it, training easily gets dwarfed by all the people and all of their requests.
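To make that point concrete for readers, here is a small worked example; the per-query figure echoes the roughly 0.24 watt-hour median that Kevin cites from Google later in the episode, while the user and query counts are illustrative assumptions.

```python
# Compare one (small-model) training run against a day of aggregate inference.
# Numbers are illustrative; only the per-query figure echoes one cited later
# in the episode (Google's ~0.24 Wh per median Gemini query).

WH_PER_QUERY = 0.24          # median energy per query, in watt-hours
USERS = 100_000_000          # "hundreds of millions" of users, rounded down
QUERIES_PER_USER_PER_DAY = 10

TRAINING_RUN_WH = 50_000     # "tens of thousands of watt-hours" for a smaller model

daily_inference_wh = WH_PER_QUERY * USERS * QUERIES_PER_USER_PER_DAY
print(f"Daily inference : {daily_inference_wh / 1e6:.0f} MWh")
print(f"One training run: {TRAINING_RUN_WH / 1e6:.3f} MWh")
print(f"Ratio           : {daily_inference_wh / TRAINING_RUN_WH:,.0f}x")
# At this scale, a single day of queries dwarfs the training run, which is
# why the training/inference energy split tilts toward inference over time.
```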
Kevin Frazier: And I
think what's critical to point out about the fact that inference is driving so
much of this energy use, the calls you're making to the LLM, the prompts you're
sending. Presumably this is only going to skyrocket as we see AI agents become
more and more ingrained to our daily lives, these AI tools that are able to act
autonomously on our behalf if they are taking all these actions, pursuing all
these tasks for us.
The amount of inference energy costs should presumably only go
up. Which raises a lot of questions that we'll get to in a second, such as
whether we should say thank you to our AI or not, but I'll leave that aside for
just one second.
Dan, we have talked a little bit about the fact that we need a
lot of water. We need a lot of energy and electricity to make sure that we can
train these models and engage in inference as well. What does it actually look
like in terms of where the water's coming from, what sources of energy we're
relying on? And why might that be a concern for folks from a sustainability
standpoint?
Dan Zhao: Sure. So if you think about it, and I'll probably talk very simply without going into too many of the details, just so folks get the big picture, right?
Kevin Frazier: You’re
just doing it for me, which I appreciate.
Dan Zhao: I mean,
everyone appreciates the finer details, but I think the big picture here is
probably more important.
So if you think about, let's say, when I was at MIT, for example, at Lincoln Labs, at the Supercomputing Center, you walk in and you basically see racks among racks of these GPUs that are basically just sitting there. And when you walk in, the first thing you notice is that it's quite hot, right?

Like these GPUs, people can think about these GPUs as small black boxes. Essentially, they consume power because they need the power to run calculations, to do a huge number of operations. And basically these GPUs kind of run, and you know, there's this common trope for me that GPUs go brr. They do, they indeed do go brr, and to do so basically requires switching power through billions of transistors on these GPUs to carry out the mathematical operations for either training or inference or whatever it might be.
Right. So obviously the GPUs sit at the center of that. Now with the GPUs, even when the GPUs aren't active, when they're idle, these transistors still have some static power leakage. So there's still power being consumed there, even if it's not being used. And one thing I should note is also that not many people know how to get every single drop of efficiency out of GPU usage, right?

When people load a model and run it on a GPU, they don't know things like, oh, I need to basically overlap my communication costs with compute, I need to keep my GPUs busy. If you didn't know what that meant, forget I said it, it's not a big deal. But GPUs are an expensive resource and we need to keep them busy and running.

Other things involve, say, saturating the memory of your GPU; otherwise a lot of the power that's going to be consumed anyway is kind of going to waste. Right? And so there are these considerations that are specific to the GPU as well. So that's one aspect. Then there's also––
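As an aside for readers, "overlapping communication with compute" can be as simple as starting a data transfer on a separate CUDA stream while the GPU keeps doing math on the default stream; the sketch below is a minimal, generic PyTorch illustration of that idea, not code from either guest's lab, and the tensor sizes are arbitrary.

```python
import torch

# Minimal illustration of overlapping a host-to-device copy with compute,
# one of the "keep the GPU busy" habits mentioned above. Requires a CUDA GPU.
device = torch.device("cuda")
copy_stream = torch.cuda.Stream()

x_cpu = torch.randn(4096, 4096, pin_memory=True)   # pinned memory enables async copies
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

with torch.cuda.stream(copy_stream):
    x_gpu = x_cpu.to(device, non_blocking=True)    # copy runs on the side stream

c = a @ b                                          # compute proceeds on the default stream
torch.cuda.current_stream().wait_stream(copy_stream)  # wait before touching x_gpu
d = x_gpu @ c
torch.cuda.synchronize()
print("Overlapped copy and compute finished:", d.shape)
```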
Kevin Frazier: Pause
on that though.
Sorry. So you're essentially saying that because GPU time is so
scarce and so valuable, right? We hear all the time about new startups that are
just renting out GPU space, for example, just so that folks can train new tools
or allow for greater inference. So, the economic incentive here is to run these
as much as possible for as long as possible, rather than saying, oh, you know,
let's take a break and let them breathe and stretch before their next training.
Dan Zhao: There is a
human dynamic here as there is to all human systems, right. And as a former
economist, I tend to appreciate the frustrations that go with that. Let me put
it this way, democratizing AI is great, right? So, more people have access to
resources, to compute, to run experiments, and hopefully innovation will be
born out of those as well.
But on the flip side, you have more and more people using GPUs inefficiently, and those GPUs are going to be run one way or another. It's in some form a tragedy of the commons, if you think about it, right? Because suppose that, and this is what we did at MIT, for example, we came out with a system that basically capped the power draw of GPUs.

So what that basically meant was we limited the amount of power these GPUs could draw. And this is currently in practice at MIT's Supercomputing Center.
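For the curious, a power cap of the kind Dan describes can be set through NVIDIA's management library; the sketch below uses the pynvml bindings to read a GPU's allowed power-limit range and set a lower cap. It is a generic illustration, assuming an NVIDIA GPU and administrative privileges, not the actual system deployed at MIT.

```python
import pynvml

# Illustrative sketch: cap one GPU's power draw with NVML (needs admin rights).
# This mimics the kind of power capping described above, not MIT's exact system.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
print(f"Allowed power limit range: {min_mw/1000:.0f} W to {max_mw/1000:.0f} W")

target_mw = int(max_mw * 0.7)          # e.g., cap at 70% of the rated maximum
target_mw = max(target_mw, min_mw)     # stay within the card's allowed range
pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)
print(f"Power limit set to {target_mw/1000:.0f} W")

pynvml.nvmlShutdown()
```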
Now what this did was save a lot of energy in aggregate, and we found a sweet spot where users don't notice any sort of perceptible difference in their job performance.

The problem there is, well, if people realize that we've capped their power and they were still able to get their jobs done, you know, in short order, and they were still saving energy, well, people might still queue up more jobs to run anyway, just because, right?

And so that energy or that effort you spare might still get used up anyway, because people are going to say, oh, those GPUs are still going to be running, I might as well be running something on them. Right?
And so when more people come into the deep learning world and
they're running GPUs, you know, learning how to run GPUs efficiently is very
important.
And sometimes it's the job of big labs and other places that
kind of do the behind the scenes magic to make sure that you squeeze every drop
of efficiency out, if that makes sense.
Kevin Frazier: Right.
So essentially you could think of a novice farmer going off and trying to
gather all the wheat. They might do it in a horribly inefficient manner. They're
expending all these man hours, they're using the wrong tools, so on and so
forth, way more labor-intensive.
Whereas the trained farmer, the big labs, for example, they know how to harness these resources most efficiently. Is that a fair characterization?
Dan Zhao: It is fair,
but it's also clear that, you know, there's this knowledge and information
asymmetry, right?
Which is natural. You can't really blame one side or the other.
Like for example, even if you had folks who––if you gave folks full access, for
example, to the GPU, to the scheduler behind the GPU resourcing, if you gave
people access to the networking behind the cluster, like, they wouldn't really
know what to do with that anyway, right?
And so it's not necessarily a fault to be assigned to either side. It's just a natural consequence of any new technology that needs to be used, especially if this new technology comes with these sorts of costs that need to be borne by one side, inevitably, right, and that then spill over in aggregate into society.

So it is a bit of a natural circumstance as well.
Kevin Frazier: Right. So you have to learn how to farm before you start planting stuff. That's right. That's going to be a natural process.
So, Dan, I cut you off. You were going, you were mentioning
that we have the GPU kind of incentives in terms of running frequently or using
them to the greatest extent possible.
What's this second kind of driver of energy consumption?
Dan Zhao: Yeah, so when you think about a lot of these things that come up nowadays, and I think Mosharaf mentioned this as well, it's simply a matter of demand as well.

More and more people are wanting to use these things. More and more people might use them inefficiently. And as more people use these things, and as this technology and its span of capabilities grow, the more we can do with these models––I'm not sure if you saw Sora 2 come out the other day, its video generation capabilities are amazing––the more energy that's going to take. And then the capabilities––
I'll give you a more concrete example. So, as capabilities
grow, modalities grow. So for example, video generation is one example that we
didn't really think about as much as text generation back in the day. And
again, back in the day is like two years ago.
And so things are going to grow. And then this also plays into what we were mentioning earlier, Kevin, about agents, right? So for example, most recently, when I was a senior research scientist at Microsoft: when you think about agents, the number of iterations they take for a single action is far more than a single round of chat between you and ChatGPT.

And then you can also use images. Images also consume quite a few tokens, more so than text sometimes, depending on the number of images you're using to do a single task, which is much more complex than answering a single question, for example, on a computer. That's going to drive things.
So the increasing capabilities, the increase in demand, along
with what I mentioned at first, I think those three make a powerful combination.
Kevin Frazier:
Thanks, Dan. And you're affirming one of my bad jokes, and I have many of them,
which is if a picture is worth a thousand words, then a generated image is
worth 10,000 tokens. So you're welcome to steal that from me anytime you want.
Mosharaf, what are you doing about this? What is your team
doing to help solve this issue? Because we mentioned earlier at the outset that
there are these holes in this conversation. And yet, you know, Google released
a report saying that there's about 0.24 watts per median query that gets used
when you're using Gemini, and apparently that's the same as operating a
microwave for a second.
I don't really know what that means in terms of how I should
change or alter my approach to using Gemini or ChatGPT, but the labs are
sharing information. Sam Altman sent out a tweet about some of OpenAI's energy
uses. What gaps are you filling, or how is your team working on this issue?
Mosharaf Chowdhury: So, thank you for asking this question.

So yeah, we have to go back to earlier, when I said that there were gaps a few years ago because people simply didn't have tools. So, for the last almost five years, at the University of Michigan and with many of my colleagues across the U.S., I have been leading the ML Energy Initiative.

And we are trying to build tools to precisely measure and understand and then optimize the energy consumption of AI. So one of those tools is Zeus, which we built from the ground up. It can interact with different kinds of GPUs and CPUs and memory and collect, over time, how much energy is being consumed by the workload that is running, the workload being AI inference or training.
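As a pointer for readers, Zeus is open source (linked via ml.energy in the show notes); a measurement loop with it looks roughly like the sketch below. This is hedged as a sketch written from the project's documented usage pattern, with a stand-in workload, so the current API may differ in detail.

```python
# Rough sketch of measuring a workload's GPU energy with Zeus (ml.energy).
# Hedged: based on the project's documented usage pattern; check the current
# docs, as the API may have evolved.
import torch
from zeus.monitor import ZeusMonitor

monitor = ZeusMonitor(gpu_indices=[0])     # measure GPU 0

monitor.begin_window("matmul_workload")
x = torch.randn(8192, 8192, device="cuda")
for _ in range(100):                       # stand-in for a training/inference step
    x = x @ x
    x = x / x.norm()
torch.cuda.synchronize()
measurement = monitor.end_window("matmul_workload")

print(f"Elapsed time : {measurement.time:.2f} s")
print(f"Total energy : {measurement.total_energy:.1f} J "
      f"({measurement.total_energy / 3600:.3f} Wh)")
```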
And using Zeus, we have measured, as precisely as possible within the GPU, how many watt-hours or joules of energy are being consumed. And we produce this output, which I will refer to as the ML Energy Leaderboard, where we measure the energy consumption of open-source models. So it led to two things.
One is, it gave people a tool, but also a methodology for how to think about energy measurement. At the beginning of the podcast, when you mentioned this MIT Tech Review article, they actually used our tools and worked with us for almost six months to measure all the models we did, plus a couple of other questions they had.

So these tools allow us to actually help journalists collect all these numbers. And that also caught a lot of attention from different companies. For example, if you read this article or white paper from Google, you will see that they also referred to the ML Energy Benchmark, and they talk about how we measured and how they have expanded the methodology to make it even more accurate by considering idle machines in their data centers.

And they came up with this median number, which, by the way, should be watt-hours instead of watts, because it's energy. But that's okay. And so that's one part. What we are doing is making tools available so nobody can say that it's impossible, that it's very hard, that we don't have the means to measure it, that it's difficult.
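To unpack that correction for readers: the Google figure Kevin cited is best read as about 0.24 watt-hours per median query, and the microwave comparison follows from simple arithmetic; the microwave wattage below is an assumed typical value, not a number from the episode.

```python
# Why "0.24 watt-hours per query" lines up with "about a second of microwave time".
# The microwave's power rating is an assumed typical value (roughly 1,000 W).
QUERY_ENERGY_WH = 0.24
MICROWAVE_WATTS = 1000                             # assumed typical microwave power draw

microwave_wh_per_second = MICROWAVE_WATTS / 3600   # 1,000 W for 1 s = ~0.28 Wh
seconds_of_microwave = QUERY_ENERGY_WH / microwave_wh_per_second
print(f"One median query ~= {seconds_of_microwave:.1f} s of microwave use")
# ~0.9 seconds with these assumptions; a watt measures power (a rate),
# while a watt-hour measures energy, which is the distinction Mosharaf draws.
```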
The other part of what we are doing is in the optimization space. So Zeus measures, but Zeus also optimizes. The way it optimizes, part of it is figuring out the precise power cap to set in a job-specific way, unlike the approach mentioned earlier: at MIT, they were capping all of the servers. But Zeus has an optimization method that sets individual GPUs' power in a unique fashion that, together, works well for the training job and actually performs better.

And it can be applied to small models, where the model fits in a single GPU, but it can be applied perhaps even more effectively to bigger models because of the distributed computation structure of the model. So essentially, I will simplify it and give an overview of why it works for distributed, bigger models: because they're too big, they don't fit in a single device.
Oftentimes you need 16 or 32 GPUs just to hold one copy of the model to start training, fewer for inference. But essentially, what happens is that the computation flows through all of the devices. And that leads to multiple computation paths. Some of them, the ones that dictate the runtime, we will call critical, and some are non-critical.

So for all of the computation happening on the critical path, if it slows down even a little bit, your entire training run or your inference request will slow down. But all the computation outside the critical path, the non-critical work, can be slowed down as long as it doesn't become slower than the critical path.

And so what we have built is a tool that automatically finds this critical path and precisely computes how to slow down everything outside the critical path, by setting the precise frequency each GPU should be operating at, at precise points in time. So it's a coordinated dance across thousands of GPUs that happens at tens-of-milliseconds granularity.

And that allows us to save up to 20 or 30% of the energy consumption of training. So when you said 300,000 or however many homes, now you can get the same thing done with 200,000 of however many homes and save all of this energy. Which, as Dan mentioned again, will still be used for doing more training instead of being saved, but at least our work is making sure we are effectively using the energy we are paying for.
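For readers who want a feel for the knob being turned, GPU frequency can be pinned through NVML; the sketch below locks a "non-critical" GPU to a lower clock while leaving a "critical" one at full speed. It is a toy illustration of the general mechanism, with made-up GPU indices and clock values, not the actual critical-path algorithm Mosharaf's team built.

```python
import pynvml

# Toy illustration: slow down a GPU that is off the critical path by locking
# its clocks lower, while a critical-path GPU keeps its maximum clock.
# Indices and target clocks are made up; this is not Zeus's actual algorithm.
pynvml.nvmlInit()

CRITICAL_GPU = 0        # assumed: on the critical path, keep at full speed
NON_CRITICAL_GPU = 1    # assumed: has slack, can run slower without hurting the job

for index, max_mhz in [(CRITICAL_GPU, 1980), (NON_CRITICAL_GPU, 1200)]:
    handle = pynvml.nvmlDeviceGetHandleByIndex(index)
    # Lock the GPU's SM clock into [min, max]; needs admin privileges.
    pynvml.nvmlDeviceSetGpuLockedClocks(handle, 210, max_mhz)
    print(f"GPU {index}: SM clock locked to <= {max_mhz} MHz")

pynvml.nvmlShutdown()
```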
Kevin Frazier: Well,
Mosharaf, I'm still trying to get the two-step down here in Austin, so I think
the dance you just described is far beyond my skills. But I'm impressed that
we're seeing just how nuanced we can be in making these training runs more
efficient and making inference more efficient.
And I wonder how we can try to get more transparency around the
fact that these efficiency gains exist and what would it look like to sort of
mandate or encourage the adoption of those mechanisms.
But I'll get there in a second, because I think it's also
important to flag that your benchmarking work is pivotal in terms of getting a
more holistic picture of where and how energy is actually being used by these
models. But you noted something critical, which was that you were analyzing open-source models, and as I'm sure all good Scaling Laws listeners know, the big boys, for lack of a better phrase, with the exception of Llama from Meta, are all closed source. And so Dan, how are we trying to
remedy the lack of information and the lack of transparency around some of the
biggest companies and the biggest models and the energy uses here? And what
work is your team doing to amplify some of the efficiency gains that Mosharaf
was talking about?
Dan Zhao: Yeah, so our work at MIT focused a lot on energy efficiency from a very early time period. But you know, at the time we didn't think that this would actually go anywhere. This was back in 2021, 2022. We were looking at, like, measuring the energy of training and inference for, geez, back in the day, ResNet.

So these were old CNNs that still see some use nowadays for computer vision. We were doing things for simple BERT-like models, GNNs for molecular interatomic potentials, and things like that. We eventually moved on to work where we tried to benchmark the inference costs, or inference energy, for LLMs.
But back then, Llama had just come out and no one really was paying attention to inference energy. The main difficulty is that it's almost near impossible, unfortunately, because there are several factors at play here. The most reasonable way people have gone about it has been trying to find models of a similar size, be it in terms of the number of parameters or FLOPs, comparable to, you know, whispers on the wind as to what you would think the big companies are doing.

Or in some ways tracking alternative data, right, like measurements of certain emissions. But then again, energy does not equal emissions, because where the energy is drawn from will translate into different emissions. So if you're using energy from wind farms and windmills versus energy from coal versus energy from nuclear, those emissions are going to be vastly different, right?
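That distinction between energy and emissions is, for readers, just a multiplication by the grid's carbon intensity; the sketch below walks through it with illustrative intensity values (real figures vary by grid, year, and methodology).

```python
# Same energy, very different emissions, depending on the power source.
# Carbon intensities below are illustrative order-of-magnitude values in
# grams of CO2-equivalent per kWh, not authoritative figures.
ENERGY_KWH = 1000.0   # e.g., one hypothetical training or serving workload

carbon_intensity_g_per_kwh = {
    "wind":    15,
    "nuclear": 12,
    "gas":     450,
    "coal":    900,
}

for source, intensity in carbon_intensity_g_per_kwh.items():
    emissions_kg = ENERGY_KWH * intensity / 1000
    print(f"{source:>7}: {emissions_kg:8.1f} kg CO2e for {ENERGY_KWH:.0f} kWh")
# The spread spans nearly two orders of magnitude, which is why energy
# numbers alone don't pin down a model's footprint.
```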
So all these approaches are imperfect, but they try to get at a
scale of measurement. And so that makes things very difficult when it comes to
trying to understand it from the angle that you're describing. So what most
people have tended to do has been just to try and offer improvements, right?
Because at the end of the day, dollars and cents are more
likely stronger incentives to sort of move things around. So, advances in both
hardware as well as advances in the model algorithmic side, research which we
do, is just trying to put out there that hey, if you adopt this, you can reduce
energy, which means you can basically push in more throughput or save on
dollars and cents.
GPUs go brr, and basically everyone's happy, right? But like I
said at the beginning, there is this information asymmetry that does make
things difficult. And so the proxies we're using are open data, especially during our work at MIT. That's essentially what we can do at this point.
And we're basically using these open models as test beds. So for example, if you look at Llama 1, Llama 2, Llama 3, 3.1, 3.2, et cetera: using these techniques, adapting them and applying them to these models and saying, 'hey, look at how much we've reduced model FLOPs this way, look at how much latency we've saved this way,' or more specifically, benchmarking on a V100 or an A100 or whatever else.

Saying, okay, in our setting we show that energy efficiency increases this way. But only the large labs themselves, when they do it on their own hardware, on their own networking stacks, so on and so forth, may know the precise numbers. All we can do is offer solutions that we see work on open benchmarks and open models, and then it's basically taken from there.
So that's how I would think, or how I see the current state of
things today, at least.
Kevin Frazier: Yeah,
so this will come as no shock to regular listeners, which is to say I'm
generally pretty bullish on AI and notwithstanding my optimism around its use
cases and its potential to help ameliorate many social woes, I think there's a
sense economically, culturally, so on and so forth, AI's here to stay.
Obviously, it's not going anywhere. The billions, if not soon trillions, of dollars invested in this space suggest that the momentum will continue. And so I think that your point, Dan, of saying how are we going to help this process be more efficient, because we know it's going to continue, makes a heck of a lot of sense.
But if you don't mind, I want to personalize this for just a
second, because I think there are some folks you know, you can read the New
York Times or you can just observe it anytime there's an AI conversation, there
will be someone who will say something along the lines of, ‘I don't use AI or I
rarely use AI because I think of the environmental ramifications as being too
high to justify whatever Studio Ghibli meme I'm going to get out of this or
whatever silly new joke I'm going to have it generate.’
For the two of you, and Mosharaf, I'd love to start with you if
you don't mind answering, I'm guessing you use AI pretty regularly in your
day-to-day life, or are you an AI vegan who tries to limit your AI use as much as possible? What's your general approach?
Mosharaf Chowdhury: I
use AI when needed.
I was talking to my student yesterday and they were asking, like, what AI do I use? I said, I don't pay for AI. I just use it for very simple things, like when I need something to be quickly turned around or proofread, that type of work. But not too much in terms of idea generation.

Not because I am for or against it. It's just that, I don't know, I feel like ideation is one of the fun parts of being a professor, and I want to keep it to myself and take some time to think about ideas. But I mean, it sort of makes sense to use it as a tool, like any other tool, when you feel like it's going to make things faster.

At least that's the way I see it. To me, it's just a tool that is very good at doing certain things, and the things that I think it is good at, I'm going to use it for.
Kevin Frazier: Right. When you see a nail, you grab a hammer. When you need some improved editing, you grab ChatGPT. Makes total sense there. I get that.
Dan, how about yourself? Are you a frequent AI user and in what
senses?
Dan Zhao: I was going to say, technically I have to be, as a researcher. You kind of have to tinker with these models when you work on, for example, developing your agentic models or trying to make models more efficient. You kind of have to use it, although probably not in the way that most people would want to use it.
In terms of communicating solely via text, I was the largest holdout for a very long time. Simply because, one, I didn't want them to have my training data. That's number one. Perhaps a bit paranoid or futile in the end, but you tell me. Number two, the other reason was because I was always suspicious that, oh, if I use this and it does reduce friction and save time for me, I'm wondering whether or not certain sets of gears in my head will no longer click as quickly as they did in the past, right?

So I try to at least be very conservative in terms of my usage. Although I will admit, in certain things, like, for example, when I write papers, I use LaTeX, right? Overleaf has come in, that's helped. And now I don't have to spend half an hour trying to correct the table formatting because I can't figure out why the table won't render a certain way.

I'm sure Mosharaf knows exactly what I'm talking about as well. I give it to ChatGPT, it figures it out for me, I don't ask questions. Yeah, flame me for that. Feel free. But that's how I see it.
Kevin Frazier: I will
not filet you for that. And I will not detail all the images I use for my
students that try to make, you know, boring case law a little bit more
exciting.
Yeah. But it sounds like, for the two of you, the main driver behind your own personal habits isn't necessarily a concern about the energy consumption that comes with using AI. So can you steelman, and really bolster, the case for the people who say, 'I don't want to use AI because of these energy costs'?

What is the most persuasive argument for saying, yes, indeed, you should refrain or perhaps scale back your AI use because of its energy-intensive nature?
Dan, I know this is a toughie, but I'm going to start with you.
Dan Zhao: Sure. I would say that it all kind of harks back to what I was saying at the very beginning about the tragedy of the commons, right?

If you won't submit that single query, someone else will, right? And so that energy, that opportunity cost, is very small in that sort of aggregate setting, right? And so, from a purely economic point of view, if you're thinking about the productivity gain versus the very small sliver of energy cost––so, for example, one of our works on benchmarking large language model energy costs showed that a single query is not really worth a lot in terms of energy from a single person.

Over time, over demand, over usage and utilization, it will grow. Absolutely. But as an individual submitting a query once or twice in a session, it's a negligible difference, right? It's really the tragedy of the commons and the coordination issue that comes up in aggregating an externality that produces these effects in aggregate.
And that's sort of something that we worry about. And it's also something that can only really effectively be addressed in aggregate, at the higher level. So I would say, given the potential productivity benefits to you of using these, for me, for example, formatting the LaTeX tables will save me a whole bunch of time.

I'm not claiming I could use that time to save the world, but I can use that time to continue research on energy efficiency, like the work at MIT, right? To be able to do these things. That probably would be my first and foremost argument when I think about why I myself am not cutting back.

Am I just being a hypocrite? I'm sure I am. But at the same time, when I think about the actual concrete effects, that's the calculus that goes on in my head.
Kevin Frazier: And
Mosharaf, I wonder, when you pick up the paper and you see things like using AI
for this number of queries at this time of night is akin to turning on the
microwave for eight seconds, do you think, can we please stop doing this? This
is so annoying, this isn't really helpful to anything. How, if you were the AI
information czar for a day, how would you change how we're talking about AI and
energy?
What do you think are the most important things that policy
makers should be talking about that the public should be aware of? That labs
should be disclosing?
Mosharaf Chowdhury: It's a very hard question, because people want something they can relate to, so they can understand, okay, this is how much we are talking about. Instead of a microwave, some people have used how many light bulbs, some people have used how many miles in an electric vehicle, how many homes.

Sometimes I must have also used, I don't know, how many pounds of coal to burn. People just want something physical that they can relate to. And I think that's why things like the microwave and the teaspoon of water come up. They're looking for household things that everybody can think of: okay, this is how much it is.

And it's nothing. I'm opening the microwave so many times a day. It's just to get across the sense that, as Dan was saying, for individual users, one query is so small. It only is a problem because all of us are using it, and it would only be solved if most of us stopped using it. A few people adding and deleting is not going to make a dent one way or the other, because each individual query is, if not infinitesimally small, really, really, really small.
And so in terms of how to express this, one way of going about it would be, I think, changes in culture and the education system, I would say. Because it is very easy for us to understand one second versus ten seconds versus a hundred seconds, and what that means.

And in the U.S., of course, we use things like how many miles you have to drive. It doesn't have to be time; it can be the other way around, depending on who you talk to. So similarly for energy, we have to create and sort of cultivate this language that we can use and that everybody understands. I think that's why all of these different comparisons show up.

People are still searching for the right way of posing it so everybody can understand the same thing, and people keep looking for different examples.
Personally, I don't really have a suggestion. People have suggested many different ways to me, because every time I give this type of talk, and I have for years, I just say, okay, this many joules.

And they say, okay, you should use this or that, I don't know what these joules even mean, how can I relate to it? But none of them seem perfect, so I end up not using any of them. And unfortunately, as the information czar, I don't really have a solution that I can provide right away.
Kevin Frazier: Mosharaf,
I'm sorry, you're definitely fired from your job, but it's okay. You know, you
can put it on your LinkedIn still. You were information czar for all of two
minutes.
So, Dan, I wonder for the folks who are concerned about this
space and want to make sure that we are using and training AI models as
efficiently as possible, what are the primary bottle––excuse me, what are the primary bottlenecks that you and your team may be facing? Is it a matter of a lack of information from the labs? Is it a matter of resources for your own work? What's holding this research back?
Dan Zhao: Yeah, I'm probably going to give a rather unconventional answer compared to some of the answers I've given in similar places.

So, I think public education is probably going to be very important. Given the places I've been, for example, right? Like, in big research labs, in public academic labs, in industry, so on and so forth, everyone wants to do the sexy thing. So they want to be like, oh, I want to build the next agent that gets, you know, 99% performance on OSWorld or Windows Agent Arena, which are these agentic benchmarks built nowadays that determine what is state of the art in terms of, like, computer-use agent capabilities, right?
No one's really thinking, oh, how do I really save energy? And part of this is because, per Mosharaf's point, it's a bit abstract to think about what that necessarily means.

And also measuring it is so difficult as well, right? And so as a result it becomes very murky. So I think public education's important: having people understand what a GPU does, and how to act based on that understanding of how GPUs work, right?

And understanding, oh, this is why I want to fill up my memory, pick a batch size along with the model size just so that the GPU memory is filled or saturated. I think people tend to focus on the sexy things. They understand, oh, this is how LLMs work, these are the components, this is what self-attention is, so on and so forth.

But they rarely think about, oh, this is what's actually happening on a device, on the GPU, right? Or, oh, when this is getting sent to a data center, this is what's potentially happening. Or, oh, this is how a GPU architecture maps to, let's say, a loop in deep learning training or inference.
Like, a deeper understanding and appreciation of this will not only make sure that those individual effects add up in aggregate, but at the same time, people are also going to get a bigger bang for their buck when they run these GPUs, right? They're going to get more efficiency, they're going to get better performance. So it's a benefit from both sides.
It's just that it takes that fixed cost to overcome for people
to learn, because it's not easy. If it were easy, everyone would be doing it.
And there's also that initial cost of inefficiency that's required to actually
learn and get there, like I mentioned at the beginning. So, at least that's my
somewhat optimistic take at this point.
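As a tiny illustration for readers of the "saturate the memory" habit Dan mentions, here is a back-of-envelope batch-size picker; the per-sample memory figure is a made-up assumption that in practice you would measure for your own model, sequence length, and precision.

```python
import torch

# Back-of-envelope batch-size picker: fill most of the GPU's free memory,
# in the spirit of "saturate the memory" above. The per-sample cost is an
# assumed number; real activation memory depends on the model and framework.
BYTES_PER_SAMPLE = 50 * 1024**2   # assume ~50 MiB of activations/gradients per sample
SAFETY_FRACTION = 0.8             # leave headroom for fragmentation and temporaries

free_bytes, total_bytes = torch.cuda.mem_get_info()
usable = int(free_bytes * SAFETY_FRACTION)
batch_size = max(1, usable // BYTES_PER_SAMPLE)

print(f"Free GPU memory : {free_bytes / 1024**3:.1f} GiB of {total_bytes / 1024**3:.1f} GiB")
print(f"Suggested batch : {batch_size}")
```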
Kevin Frazier: Yeah
it's really interesting because for me, and again I'm just a lowly law
professor, the thing that comes to mind is, you don't need a Ford F-150 to
drive in downtown Austin, right?
The roads are stable. You're not going over any massive cliffs.
You're just driving on a paved road. You can get by with a Fiat or like I do,
you can ride around on your bike.
But in the context of using AI, there's this idea of, oh, well,
why not just use the reasoning model? Why not go to Deep Research to answer the
question of what should I eat for dinner tonight, right?
You're going to use that model in a far more energy-intensive
way. And if we're not talking about this in kind of layperson's terms, then
users may not understand the difference between how they're engaging with these
models.
But to your point, Dan, obviously there's also just an
education point on the startup side of, if you are getting into this space,
thinking how you can pick up on those more efficient training mechanisms sooner
rather than later is super fascinating.
So, Mosharaf, given that, you know, you all have the world to
save, I don't want to take up too much of your time. What is driving your
optimism in this space or your pessimism in this space? What is kind of top of
mind for you when you put your head on the pillow?
Mosharaf Chowdhury: I'm
optimistic. Let's start with that. I think there is a lot of efficiency to be
extracted. So I have this vision, what I call energy-optimal AI.
As I think it came up multiple times, if we don't do it,
someone else will. So AI is going to happen. What we want to do is to make the
AI happen within the same amount of time, getting to the same level of accuracy
or more, but figuring out what is the minimum amount of energy we need to sort
of spend to get to that point.
And to do that, we need what I call a full-stack approach, starting at the very top with all the models and algorithms that people are creating. There is a lot of innovation happening there.

In the middle, at the software layer that translates all of the models to be executed on the hardware, there's a lot of work we are doing, Dan is doing, and a lot of our other colleagues are doing.

And then at the bottom layer, at the hardware level, there's a massive amount of progress happening. There are new kinds of accelerators coming up, and existing ones are being updated and made more efficient, with more cores added and so on and so forth.
So, as AI becomes commoditized and democratized, it'll also get cheaper, because everybody at every layer is working to make it more efficient and cost-effective, even for these big boys. Because at the end of the day, everybody wants to, I don't know, make sure that they give better service for lower cost, either to make it cheaper for everybody or at least to make more, I dunno, profit for themselves.

So for the fixed amount of AI that we want, the energy cost, I think, will keep going down. The only thing that's happening here is that we are still at the beginning of AI. I think there is a lot more AI to be had. I think Dan mentioned earlier Sora 2, and then Google has Veo 3.

With all of these things, we are going toward world models, which are much bigger and more expensive than whatever text and other things we are doing. Soon we will say, back in the day we used to do text, it was so much better and didn't consume anything, whereas now we are living in this, I don't know, world-model era and whatever else comes up.

So that's what is going to drive the energy consumption. But as we go through all of these stages, I think there will always be these opportunities to optimize and make sure that we are getting as close as possible to energy optimality for that particular type of AI. So I am quite optimistic that we will continue to find different ways to keep the cost low so it doesn't sort of go out of bounds. Yeah. Sorry, go ahead.
Kevin Frazier: No,
that's great. Sounds like you sleep well at night, which is good to hear.
And Dan, I'll tee you up with one final question, which is,
we've talked quite a bit about the market being pretty effective here, where
there's a huge driver for the labs themselves to be more efficient. What is
your commentary for the state legislators who are introducing, for example, AI
energy-related bills?
Is this a moment of saying, hey, just let the market do its
thing, let's let this play out? Or do you think this is a time of saying no, we
need to mandate the sort of efficiency gains we're seeing by Lab A and make
sure that Labs B through Z are applying them as well?
Dan Zhao: Yeah, that's a very complex question there, right?

It's a very multifaceted question too, and it depends, because the differing incentives in this sort of complex whirlpool of things make having a clear-cut answer very difficult. But I'll give it a shot anyway.

So first and foremost, obviously, innovation is still going to be key. Pursuing energy efficiency gains blindly, taking those strategies and imposing them on places where they may not work, that probably won't take priority.

In reality, people are probably going to put energy efficiency considerations aside in favor of chasing pure performance improvements, be it measured from a model performance perspective, in terms of throughput, tokens per second, improved loss, or, for example, model FLOPs, so on and so forth.
Or in terms of more product metrics, when it comes to actually putting these LLMs and agents into products, be it, like, better user engagement, so on and so forth. So that's one consideration. The other consideration is that we probably don't want one strategy that fits all.

At this stage, I feel like we're still very early on when it comes to trying to understand what strategies work. Because again, everyone might be working with different model components, like the different LLMs that are out there. Architectures are still quite different, though a lot of them are still bounded in LLM land when it comes to all these sorts of autoregressive, decoder-style models, multimodal models, so on and so forth, common motifs.

But each one is most likely operating under different constraints: different power constraints, different networking infrastructure, different compute clusters, different data centers, so on and so forth. So these operations are very different. And OpEx and CapEx are likely very different too.
So when it comes to understanding, at least for state legislators, what's in their own backyard is probably the most important thing, right? If the places concentrated in your backyard have a specific profile, then you might want to target that profile first.

But that also requires understanding, I think, a lot of what's going on there. And that understanding, I think, might be lacking in state legislatures at this moment, especially because of the lack of subject matter expertise, for better or for worse. The other thing I'll also say, alluding back to an earlier point, is that in terms of further efficiency gains, I don't know how long LLMs will be the main focus.

We've seen slight differences in model architectures. A lot of the current ecosystem is indeed based on LLMs, these sorts of autoregressive, decoder-style models that are multimodal, et cetera, et cetera, as far as we know. But one thing that I do think is universal, or near universal, is probably human behavior.
Like if we can somehow induce human behavior to change in terms
of how they're able to operate or collectively work together towards something
that would be great.
I'll give you an example. So, machine learning conferences, I'm sure those are like the bread and butter that make everyone's day, right? Places I typically submit to are things like NeurIPS, ICML, ICLR, as many researchers do. Other folks prefer other venues, like IEEE venues, Supercomputing, so on and so forth. If you look, as we did at MIT, at usage, you'll see it spike around these deadlines, because everyone's panicking and everyone's delaying and everyone procrastinates. Fine.
But my point in saying this is that eventually we may fall off the LLM wagon and go toward new architectures, and then we'll have to redraw from scratch, or rebuild from basics, what efficiency techniques mean for these new model architecture types rather than LLMs. And we've already seen this happen a little bit: back in the day it was dense LLMs, now MoEs, or mixtures of experts, are a little more popular, and they've found their way into LLMs. Efficiency techniques that worked solely on dense LLMs and didn't account for these MoE additions don't really work as well. Some do.

But human behavior is always universal. So if we can get human behavior, you know, on board, then, you know, I'm probably happy with that. A long-winded way of saying it depends.
Kevin Frazier: And
all we need to do is change human nature, which, easy smeezy. Right? I'm sure.
I'm sure we're all just ready to do that right after we close this episode.
But, you all have some work to do. Thank you for doing it. I
find it fascinating. And surely I'm going to be sending you a note to please
come back soon. But for now, Mosharaf, Dan, thanks so much for joining.
Dan Zhao: Yeah, thank
you for having us.
Kevin Frazier: Scaling
Laws is a joint production of Lawfare and the University of Texas
School of Law. You can get an ad-free version of this and other Lawfare
podcasts by becoming a material subscriber at our website, lawfaremedia.org/support.
You'll also get access to special events and other content available only to
our supporters. Please rate and review us wherever you get your podcasts. Check
out our written work at lawfaremedia.org. You can also follow us on X and Bluesky.
This podcast was edited by Noam Osband of Goat Rodeo. Our music
is from ALIBI. As always, thanks for listening.
