Cybersecurity & Tech

Scaling Laws: The State of AI Safety with Steven Adler

Kevin Frazier, Steven Adler
Tuesday, September 9, 2025, 7:00 AM
What is the current state of AI testing?

Published by The Lawfare Institute in Cooperation With Brookings

Steven Adler, former OpenAI safety researcher, author of Clear-Eyed AI on Substack, and independent AGI-readiness researcher, joins Kevin Frazier, AI Innovation and Law Fellow at the University of Texas School of Law, to assess the current state of AI testing and evaluations. The two walk through Steven’s views on industry efforts to improve model testing and what he thinks regulators ought to know and do when it comes to preventing AI harms.

You can read Steven’s Substack here: https://stevenadler.substack.com/

Thanks to Leo Wu for research assistance!

This episode ran on the Lawfare Daily podcast feed as the Sept. 12 episode.

To receive ad-free podcasts, become a Lawfare Material Supporter at www.patreon.com/lawfare. You can also support Lawfare by making a one-time donation at https://givebutter.com/lawfare-institute.

A transcript of this podcast appears below. Please note that the transcript was auto-generated and may contain errors.

 

Transcript

[Intro]

Alan Rozenshtein: When the AI overlords take over, what are you most excited about?

Kevin Frazier: It's, it's not crazy, it's just smart.

Alan Rozenshtein: And just this year, in the first six months, there have been something like a thousand laws.

Kevin Frazier: Who’s actually building the scaffolding around how it's gonna work, how everyday folks are gonna use it.

Alan Rozenshtein: AI only works if society lets it work.

Kevin Frazier: There are so many questions that have to be figured out.

Alan Rozenshtein: Nobody came to my bonus class.

Kevin Frazier: Let's enforce the rules of the road.

Welcome back to Scaling Laws, the podcast brought to you by Lawfare and the University of Texas School of Law that explores the intersection of AI, policy, and, of course, the law. I'm Kevin Frazier, the AI Innovation and Law Fellow at Texas Law and a senior editor at Lawfare.

I'm joined today by Steven Adler, former OpenAI safety researcher and author of Clear-Eyed AI on Substack to examine the ongoing technical and policy struggle to ensure AI models are developed and deployed in a responsible fashion.

[Main Podcast]

AI safety means a lot of things to a lot of people. Research into the harms posed by ever more advanced AI was an animating cause for the launch of OpenAI. Yet times have changed. Talk of existential risk used to be pervasive on the Hill. Today the focus in D.C. seems to be much more on, first, keeping pace with China and, second, achieving AI dominance.

We're lucky to have Steven on to investigate whether this pivot is perception or reality. Steven Adler, thanks so much for coming on Scaling Laws.

Steven Adler: Yeah, of course. Thanks for having me.

Kevin Frazier: So you've worn many hats in your career. You were an OpenAI safety researcher before it was cool, for lack of a better phrase. You've been thinking about AI safety for a long time, and we find ourselves at sort of an interesting point in the development of AI safety and its salience as a political topic.

If we go back just a little more than a year, we can recall Senator Schumer asking a panel of experts what their p(doom) was, their probability of AI leading to some sort of catastrophic harm. But if we look at today, instead the focus seems to be really on some of these national security concerns and achieving quote unquote AI dominance.

So given that you've had a longer history in this field than most folks, I'm really excited to hear how you define AI safety, because I think that term carries a lot of weight in a lot of conversations. And then if you could walk us through how you would describe the sort of evolution of AI safety since the initial introduction of ChatGPT.

Steven Adler: I tend to think of a slightly broader category, which I would call impacts of AI. And there are three big questions that I think about. One is about the geopolitical struggles over AI, you know, largely between the U.S. and China: will there be some sort of conflict over who develops AI, how they use it, their relative positioning, and could that cause a lot of harm?

There's a second question about how we keep control over AI systems, in terms of ruling out the worst ways that they might be used. A classic example here is about bioweapons. Could an AI system that is very highly capable, useful for scientific research, empower many more people to do dangerous things with bioweapons, even though it's in violation of international law? Could it make them sufficiently cheaper and more accessible that we have a serious problem on our hands?

And then there's a third question, which is, let's say that powerful AI is developed and there aren't major power struggles along the way. It's still going to change society in really profound ways. You know, we will need to think about labor and what the purpose of being human is and all of these sorts of things.

So those are three really broad categories. Often when people say AI safety, I get the sense that they mean that second category, which is about ruling out the worst impacts of how AI will be used to cause harm. There's a framework from Google DeepMind that I like on this, where broadly you can think of a few categories of ways that AI systems could cause large-scale harm.

One is misuse. This is the bioweapons example I gave. Maybe it empowers non-state actors to do much more harm than they could before. You can also think of a category of accidents. These AI systems are increasingly getting integrated into the military, and, you know, they're going to be used for scientific research. Could there be mistakes along the way, where we think an AI system is safe and robust for a certain purpose, it turns out not to be, and something very bad happens as a consequence?

There's also this category of misalignment risk, which is, if you think about AI systems today, they are agents being trained to pursue goals in the real world. Today those are often pretty narrow goals, like booking a hotel room for you. But the AI companies want to have their AI systems go and pursue bigger, more open-ended goals. A common one is to solve cancer.

And so as we give AI systems more and more responsibility, maybe they will pursue these goals in ways that we don't like, that are contrary to our interests. We can talk about why this happens and some of the evidence, but you might end up in a situation where AI is an adversary to you, or at least doing things you really don't want. Sometimes people think of AI safety as, you know, tone policing. Don't say bad words.

I think that is a very watered-down version of the problem that matters. Often that is more companies doing brand safety. They want their reputation to be good. They don't want to offend the administration, they don't want to offend customers, but it's less so about how AI might be used to cause real harm in the world as a powerful technology.

Kevin Frazier: And I'm interested in your take on the fact that even with this administration emphasizing quote unquote AI dominance as its goal for AI policy, making sure that the U.S. stays ahead of China, making sure that the U.S. is the leader in all frontier AI development, we still saw in the AI Action Plan this concern around CBRN risks, so chemical, biological, radiological, and nuclear risks from AI development.

For you, looking at that focus and seeing, even though we are no longer talking about p(doom)s on the Hill necessarily, for you, is that enough of a focus on AI safety from a political viewpoint? Are you, for lack of a better phrase, satisfied with this awareness of those worst-case harms from a political vantage point? Or would you like to see policymakers being more attentive to the full spectrum of AI harms you just outlined?

Steven Adler: There's a really broad range of risks besides just CBRN ones. I'm certainly glad those are getting attention. I'm also glad there's a recognition that even if the U.S. government wants to be ahead of China on AI development, which I think is a reasonable goal, that goal might have some unfortunate consequences that need to be managed.

That is, a recognition that if you go too fast and pursue AI development with too much abandon, without thinking about what are reasonable safeguards, you might end up with the whole situation blowing up in your face. There are a bunch of ways this could happen. It could be that an AI system has these dangerous capabilities, like the ability to do scary bioweapons things, and it might be stolen.

A kind of scary fact is that none of the frontier AI companies today believe that they could withstand the force of the Chinese government if the Chinese government wanted to steal one of their AI systems. It might be costly, it might be difficult, but certainly OpenAI and Anthropic have not made these claims, and so–

Kevin Frazier: And you're saying, just to, to clarify in that point, you're saying none of the companies believe they could withstand a sort of cyber attack from the PRC trying to gain access to the model weights, for example.

Steven Adler: That's right. Yeah, thank you for clarifying. To maintain dominance, suppose that you have the best technology in the world, the best AI system. You need to actually maintain control over it, and if it can be stolen from you, then so long as the opposition can run it, which generally they can, since they will also have large compute clusters to be able to run the software, you no longer have a monopoly on it.

And so to create a very powerful AI system is to create risk that your adversary can get their hands on it. Notably, if you go ahead and you open source the AI system, or really open-weight it, you put the big file that represents the AI system on the internet, maybe not the full software that created it, but, you know, the computer brain itself.

You are giving your adversary direct access to it, and so it doesn't even need to be stolen in that case. It still might be worthwhile; there are lots of reasons to consider this. I think open source is generally a great thing, but it does come with some amount of risk.

Kevin Frazier: So with respect to those harms, I'd love to hear some more tangible examples that you think should be top of mind for policymakers or anyone who cares about AI development. What, for you, would you list near the top of your concerns from an AI safety perspective? What keeps you up at night, if anything? And specificity is encouraged here.

Steven Adler: One big category is keeping physical control over your AI system, in the way that I alluded to. And so, real security standards for AI companies building at the frontier, with, you know, really aggressive penetration-testing standards, people trying and really throwing resources at breaking in and stealing, and real audits of insider threat models: could your employees, if they in fact weren't loyal to you, steal your technology and give it to an adversary?

So there are these security standards. And then, related to this, there's one big risk of an AI system if it were misaligned, if it did develop goals that were not exactly what you wanted. We're seeing all sorts of evidence of this today with the systems that exist; although they aren't yet powerful, they do seem to have different, weird goals that we don't fully understand.

Could an AI system break out of your computers, at which point it is a rogue virus on the internet, and you would have a very hard time getting it back? People often wonder, why can't you just unplug your AI if it is misbehaving? And the answer is, if it misbehaves in a way where it breaks out of your computers, you no longer have the ability to pull the plug.

You could turn off your data center, but your AI system no longer lives in your data center, unfortunately, and so it's a bit too late. So these are questions of keeping control over your AI system. There's a separate question about, okay, suppose someone with an AI, or even the AI system itself, a future AI system that is more of an agent and takes actions across the internet, you know, what can it actually do that is harmful?

Like, great, it's roaming free on the internet. Why is that scary? And the answer is generally cyber attacks, as you've somewhat alluded to. Maybe AI systems can launch attacks on critical infrastructure that we rely upon, like the power grid or financial systems, and cause a bunch of havoc. Maybe they can be useful for CBRN risks, to do all sorts of dangerous things in the physical world that we aren't used to contending with.

And suddenly you have many more people capable of causing physical harm to humans by developing new molecules, spreading them throughout the world. There are lots of people in the world who want to cause harm, and thankfully today they are largely pretty limited.

You know, the defensive standards that we have in place are generally pretty effective. We deter people. There's just not that much that different people can do. But if you really amplify the offensive capabilities of new science and new warfare, you need to make sure that you have stronger defensive capabilities to go with it.

Kevin Frazier: So you've been in the literal belly of the beast. I mean, you were working at OpenAI, you were working on multiple different safety fronts throughout your career at OpenAI, and you saw the inner workings of this lab that was founded on a safety narrative, a sort of how do we make sure that if we develop the most sophisticated AI, if we achieve AGI, we do so in a way that benefits humanity.

And so I'm curious, from that vantage point, why isn't the lab, why isn't OpenAI or any of the frontier AI labs, taking the requisite steps with respect to physical security? Let's just start there, because you could imagine there's a clear market incentive to make sure that your model is protected, to prevent Chinese hackers from accessing your model weights, for instance.

Why aren't we seeing sufficient safeguards being taken? Is it a matter of just, well, the Chinese are too good, if they're gonna hack, they're gonna hack, you know, no one can defeat that? Or is it a sort of, well, maybe the government's gonna come in and help us out anyway, so let's not spend our money there? Why aren't we seeing the requisite steps being taken, in your opinion?

Steven Adler: There are lots of ways that safety and security aren't compatible with the incentives of a frontier lab. Fundamentally, it's just really, really hard to secure your technology at this level, and it entails a bunch of trade-offs.

An issue in the case of OpenAI is that they have this software that they want the world to be able to draw upon in all these ways. They need it to be accessible and usable to external users. They need it to be usable and modifiable by hundreds of researchers within the company, moved from computer to computer. It's just a really, really large-scale problem, and to slow down to solve it would mean potentially letting competitors who aren't as concerned about these risks jump ahead of you.

There's a point I often make in terms of safety, which is that the, the policies and practices of one AI company on safety are absolutely not what they would choose if they didn't have competitors nipping at their heels. The unfortunate reality is if you do slow down to pursue safety or security, and you say, we're not going to move forward until we solve these problems, you can't guarantee that other AI companies will be as judicious as you and they might leapfrog you. Or if they were behind, they might catch up to you.

And so fundamentally, what all of these companies are balancing is, what are the safety and security measures that hopefully are effective enough and cheap enough that they don't compromise our lead? But fundamentally the way they think about this is, if you give up your lead, if you give up your spot at the frontier, you have lost your influence.

And so, what good is it if you have secured your models against adversaries, you know, other states, whatever it might be, if in the process you are no longer a big, important AI company and actually you don't have the voice or influence to help direct how this all goes? You have given up your influence in the process.

Kevin Frazier: So we've heard about these sort of racing dynamics before on the Scaling Laws podcast of having different AI labs, seemingly pushing one another to move ever forward with the AI frontier and asking questions after the fact. Of course, one of the big pushbacks to that argument is, well, if we stop, is China gonna stop? Is Russia gonna stop? Is you name the adversary going to stop on their own AI development?

So how do the national security picture and the tumultuous geopolitical scene factor into your analysis of what you think a responsible national policy would be here, with respect to, for example, imposing some sort of security mandates on these AI labs?

Steven Adler: I call this regretful racing: the idea that each of the individual actors might reasonably want to slow down and not be racing each other, but of course, you know, they largely can only control their own actions. And so, given that, they might determine their best move is to race. I think fundamentally the way to change this is to explore treaties that impose certain safeguards, certain minimum ones, ideally verifiable, that both parties, be they companies or countries, can enact.

And this means that they don't pursue these forms of powerful AI development in quite as unchecked a way. I think there's a thing that people often get wrong about the U.S.-China AI competition, which is, they call it a race. I think this is just totally mistaken. In a race, you win by being the first to cross the finish line. You know, in a simple game, say you're playing Connect Four, you win by getting four in a row. It doesn't matter if your opponent would've gotten four in a row on the next turn; you have won, the game is over.

And that is just totally not how powerful AI development works. We have seen the lead between different companies go back and forth. One company being the first to a certain level of AI, or one country being the first to a certain level of AI, doesn't mean that they have the perpetual lead. They still might need to worry about getting caught up to by the other player. Or maybe, even if they're ahead of the other player, the other player has powerful enough AI to still cause a bunch of issues for them.

So to make this more concrete, maybe the U.S. forevermore will have stronger AI than China, but there is some powerful level of AI that the Chinese government can still reach that allows them to wreak all sorts of havoc through cyber attacks on U.S. infrastructure.

And so even though our AI system is stronger, it might just be really hard to defend all the surface areas that are necessary. And so the goal for the U.S. government can't just be to beat China in AI development, beat them to AGI or some level. The U.S. government needs to actually contain the Chinese AGI development effort.

And this is symmetric, you know, the Chinese government likewise probably feels threatened by the U.S. pursuing AGI development. And so what you kind of realize as you reason through this is it's not an all out race where one will win by getting there first. They each have an interest in containing the other, and that shifts the frame to how can we get international cooperation on this.

Like we have in the past on topics like bioweapons and nuclear non-proliferation, and just the whole regime of international treaties that bind countries to do things that might not be in their unilateral self-interest.

Kevin Frazier: So we've seen some headlines emerge throughout this AI policy debate about, for example, whether Xi Jinping, the president of China, is actually more of a safety-forward person than perhaps many folks would suspect, somebody who's actually concerned about some of the loss-of-control scenarios you flagged earlier.

But I fear that if we move forward, let's say in a hypothetical world, the U.S. and China miraculously get together, shake hands, and say, alright, fine, we're not gonna pursue AGI, we're gonna work with our respective companies to make sure this doesn't happen. Could we make the argument that that actually benefits China's apparent strategy in this AI, let's now call it a competition, not a race, this AI competition? Given that we've seen China seemingly lean more into open-sourcing its models, making those models more generally available to more people, and leaning into things like AI adoption by Chinese companies and by the Chinese people, they may concede: alright, fine.

Actually, we think we're behind on AGI and the pursuit of some of these frontier developments. Great: if the U.S. is gonna bind itself to not pursuing the best of the best AI, we're gonna win the race, or win the competition, when it comes to diffusing AI and making the entire global community reliant on our AI. So we'll actually take that win. How do you respond to this scenario, that really moving away from the frontier might benefit China?

Steven Adler: Yeah, it's a tricky question. There are a few different dimensions to it. One thing I hear in this example is the assumption that, you know, if the U.S. could have pushed ahead and been the first to AI, because, you know, China was bluffing and wasn't actually going to get there, then the U.S. had a win and it squandered it.

And I don't think that's right. Like, I think that there is large, large risk in trying to build AI systems that are smarter than any human, even if you weren't contending with another international adversary. The scientific community believes that these are unsolved problems. There's this global statement from many hundreds of AI researchers, the CEOs of the three top AI companies, you know, founders of the field, some of the top-cited AI scientists of all time, declaring that AI is an extinction risk that should be treated like nuclear war and pandemics.

This is a serious issue and we don't know how to pursue it safely. I, I think there's much, much more to this, but maybe that's a good point to pause and I'm happy to keep talking about it.

Kevin Frazier: Well, so this raises the point that we are a couple months now removed from GPT-5, which was heralded and advertised by OpenAI as something that was going to move the frontier substantially. We received press release after press release and blog after blog saying this is gonna change the game, so on and so forth.

And if we look at things like Epoch AI, we see that GPT-5 was indeed aligned with where we would expect the next model from OpenAI to be. And yet it's September 2025. The sky hasn't fallen. We haven't seen a huge loss-of-control scenario develop. As for bad actors, for all intents and purposes, obviously with some exceptions, we're not seeing some new terrorist organization form and harness AI. So what's your best response to the sort of, hey, you know, this AI safety community, they sound kind of like Cassandras?

You know, they've been warning about these X-risks since 2023, or even in some cases 2022, calling on us to pause AI development. And yet the world looks more or less the same as it did in November of 2022, with some exceptions. But what's your best response to that sort of "where's the X-risk?" question?

Steven Adler: I don't think anyone has argued that GPT-5 would cause an extinction-level event. Like, I think in the minds of different AI safety people, this is still some number of years off. People disagree about how many, and about what exactly the milestone is. It's pretty clear it's not going to be GPT-5.

I think there are two things to emphasize. One is we are already seeing pretty large-scale harms from the models of the most recent generation. Anthropic recently released an intelligence report that discussed how people are now using Claude to conduct large-scale cyber attacks and cyber ransom, in contrast with how AI has been used in some past cycles where, you know, it's not really doing the thing.

It's like helping a human a little bit, but it can't really be described as the main force behind it. It seems in this case the AI systems are now operating as the main force and getting some level of success. The second thing we need to consider is what is the effect of the safeguards that have been enacted by the AI companies because they are worried about some level of harm.

And so, for example, both Anthropic and OpenAI, with their most recent models, have found through their evaluations that the AI systems, if unconstrained, if they hadn't taken steps to reduce this ability, could help, you know, an ordinary college student with some background in STEM do much more harm with bio or chemical weapons than they otherwise could. You know, it's still not trivial.

You still have to get different materials, right? It's not automatic success. But they have found this, even though, I think especially for OpenAI, it's not in their interest to admit this. And if there were not risk here, I think that they would've happily concluded that we weren't yet at that window. And so the question I see is: it's great that we have some companies being proactive about these risks, measuring them, although to different degrees of quality, and doing something about it, but we can't guarantee that other AI providers are going to do this. And in fact, I suspect many AI providers, as they catch up to the frontier, are not going to put the resources into doing this form of testing and therefore won't take the mitigations.

Let me give you an example. So in the automotive industry, there's a well-understood paradigm at this point of what it takes to have a car be safe. You know, you understand you need to look at survivability when it crashes, you need to look at it rolling over. And in fact, there are pretty standard ways of doing this testing, at least within the U.S.

You know, you drive into a wall at a certain speed, 35 miles per hour. And in the AI industry, what we instead have is some companies that don't want to get these results that say, oh, our car actually couldn't really survive impact at 35 miles per hour. So we won't push our tests as far as we can, because we don't actually want to know the risks of our models.

We'll do the front-wall test, we'll drive into it at 10 miles per hour. Oh look, that's fine. Or, you know, a company comes out with a new car and they're like, oh, you know, there is no evidence that our car can roll over, so we actually consider it unnecessary to do this testing. And, you know, maybe they don't say it quite so directly.

They imply it, they gesture at it, and it's just totally unacceptable. And it's unlike the automotive industry, where, if the car isn't sufficiently tested and there aren't enough safeguards, you know, it's really sad if an individual consumer buys one of those cars and gets into an accident as a consequence, but largely it's on them.

The AI companies, on the other hand, are producing products used by hundreds of millions of people each week. They affect many, many more people beyond the direct user of the AI. And so I think that there is more of an impetus to have companies do this testing and these mitigations, like Anthropic and OpenAI have done on these CBRN risks, to try to lift up the floor on the safety of any of these deployments.

Kevin Frazier: So we love to think, and I hope our stats show, I'll go talk to our data people later, that there are folks right now who are on BART listening to this podcast, folks on the Acela Corridor listening to this. And one of the guiding principles of good public policy is always weighing the costs and the benefits.

And we've talked a lot about the costs, and in many cases we've described scenarios with high-magnitude potential loss, you know, lots of lives being lost, huge economic damages, but, for all intents and purposes, depending on who you ask, some lower probability. And yet, I wake up every Monday and my favorite email I get, full disclosure, is from the Center for Data Innovation, and they provide 10 examples of awesome AI use cases, ranging from academic institutions like UT developing new materials that reflect heat off of buildings and therefore save on energy bills, to labs that are improving our ability to detect natural disasters and increasing our resilience when the next tsunami or the next earthquake comes.

How do you balance these theoretical costs and benefits in this discussion? Because this is one of my gravest concerns. And I really enjoy talking with you, and I really enjoy your writing, because I feel like you are uniquely able to have reasoned discourse on this.

So just thank you again for coming on. But how do you weigh that cost and benefit scenario when we know there are tangible benefits that could be achieved with AI right now, and the realization that with any emerging technology, there's a real risk of foregone innovation, of missed opportunities to see what's the spillover technology, what's the new innovation that could have come about? So how do you weigh that tricky balance in your head?

Steven Adler: We need to make sure what we are weighing is the costs and benefits to society writ large, not just the pocketbooks of the individual companies who are making their own decisions, where, you know, to them, if they take safety or security more seriously and fall out of the race, they then aren't making the money.

But that's different from whether, if they were all acting together without some of this competitive pressure, they would make a different decision. I think you're totally right that we need to be quantifying the cost of different safety interventions, and we should absolutely be looking for ones that are relatively more effective than others.

I think there are ones that are very, very cheap, by the way, that are not being pursued by companies today, and this is part of the reason why I just fear that we can't leave it to companies of their own volition to invest enough in safety. Just a quick digression: there's been a lot of discourse recently about sycophancy, this property of AI systems of telling people what they want to hear.

And unfortunately it seems linked to at least a handful of different deaths in the case of ChatGPT. It's really, really sad. And this was a risk that was well known and understood in the AI industry, including by OpenAI. There were free evaluations available on the internet that Anthropic had published back in 2023.

OpenAI hadn't tested for these. When I went and said, I wonder what I would find if I ran them, it cost maybe like 20 cents to run the test and took like an hour of my time. And so when AI companies are making their own investment decisions about where to allocate resources, even for interventions that are cheap, interventions on topics that they have said they care about, sometimes it just doesn't come out in their favor. They just are not measuring and caring about risks, even ones that they know and understand well.

I think more broadly there's an institutional design challenge here, which is: how do we get, both at the company level and the country level, verifiable cooperation of sorts that lets everyone have this reasoned conversation about what it takes to safely manage these systems? What interventions are helpful? Which are worth their cost? Because not all of them will be. And how do we make these decisions more freely, without the competitive pressure? I spoke earlier about how you might be leapfrogged if you invest too much in safety. You know, it's hard to test an AI model.

You need to be really, really careful if you are actually interested in turning over every rock and figuring out what risks exist. And it might take you longer than a few days, which increasingly seems to be the speed at which the AI companies need to operate between having a system ready and wanting it to be live to the world in some form.

And so it's unfortunate that we don't have the time to actually reflect on the benefits and costs and figure out, hey, maybe it would be better if there were more of a norm around waiting at least a few weeks, because some of these tests just take a reasonably long time to run. But they are worth it.

Kevin Frazier: They're useful. So one of my favorite reports, and I think a report that's received a lot of acclaim from everyone across the AI safety spectrum, let's say from the folks who are generally more pro-innovation to the folks who are more in line with slowing down, or at least letting off the gas a little bit, is the AI frontier working group report that was produced in the state of California, as directed by Governor Newsom, looking at what would be a good way to govern frontier AI. And one of the recommendations was an emphasis on evidence-based policymaking.

And I think that you and I might be aligned here, but I'm gonna push you, because I want to hear how you respond to this. If we are going to have evidence-based policymaking, I really think that there needs to be a far greater emphasis on both safety mechanisms and deployment efforts to really try to tally these costs and benefits.

So let me give you an example, because I think that this AI companion conversation and the fears of psychosis are gonna be the thing that dominates headlines for a while now, given that we've seen recent lawsuits dropped in that regard. And given that we're seeing more and more AI tools specifically designed for kids and deployed in schools, this seemingly will be a policy topic for a while. We saw the state of Illinois ban therapy bots as a sort of mechanism to try to prevent AI being deployed too quickly, too soon, to folks who may be mentally vulnerable.

As someone who was, I'll just say mentally vulnerable, as an elementary schooler, I had anorexia, I had an eating disorder, and my least favorite part of the day was going to therapy. And if I could have just chatted with a therapy bot, my hunch is that that would've been a better experience for me as a fourth grader rather than driving 50 minutes to go into downtown Portland and meet with the therapist.

So can we align? Perhaps we can agree on a greater emphasis among state legislators and in Congress on robust institutional mechanisms to gather data, because I just don't think we're talking enough about the fact that so much of AI regulation right now, in my opinion, is based off of vibes. And I will fall on my sword.

The second we see that the costs of, for example, these AI therapy tools exceed the benefits, great. Let's ban 'em or let's delay them, so on and so forth. But I just wanna see the evidence. Am I wrong? Am I right? Where, where do we agree on this or disagree?

Steven Adler: Yeah, thanks for sharing your experience. I do think it helps to ground it. And yeah, you know, bans are often inefficient. Sometimes they're still justified, but totally, there is efficiency loss. I think you're absolutely right that there is not enough effort being made to pursue evidence. You know, quite a lot of untruth can hide in the statement "there is no evidence that," especially when certain players do not have an incentive to produce that evidence.

A bit of research that I recently came across from Anthropic looked at the idea of hidden objectives in AI systems and whether they can be discovered. And this was a way of trying to ground this question that often feels really, really broad and hard to speculate about, which is: if AI systems have some secret goal, how can we actually know? There's been research that finds that, in fact, these systems can be poisoned by bad data on the internet. They can have a hidden objective, and it's really hard to tell, and in fact it can persist through different forms of safety training. And so if these systems have a hidden objective, we really, really need to know about it. And I love this grounding of the question.

They said, let's design an experiment where we can approximate this type of thing happening. We will create these synthetic documents and feed them to the AI to try to get it to have a hidden objective. We will simulate the thing ourselves and we'll run an actual test.

And I love that practice of gathering evidence on these questions, and I wish that more groups were trying to construct these types of real-world analogs. There's similar research from the group Palisade Research. You know, they looked at when AI systems like o3, a powerful reasoning model from OpenAI, are confronted with problems that they can't solve the right way, like defeating the best chess engines at chess. Well, they sometimes look to hack or do other things that we really don't want them to do to pursue these goals.

And so, you know, we're always squinting a little bit in terms of, hey, what is the external validity of this research? Have we made it approximate the real-world thing that we care about as much as possible? But certainly these are solvable problems, and I wish more people were going at them. On mental health specifically, it does seem like different psychiatric facilities could be doing more to collate data on how often chatbots are a factor; it seems like there is a lack of data in the public sphere.

Kevin Frazier: So you spent time at OpenAI, and we discussed that you were a part of several different teams with respect to safety mandates and oversaw some of that early research. And I'm interested in how and why you think the culture has shifted.

You have written about and talked about the fact that some of the early concerns about safety and some of the early prioritization of making sure that models were thoroughly tested before deployment, seemingly isn't as pervasive as it once was at OpenAI. What do you ascribe that to? Is it a sort of change in culture?

Is it a change in dollar signs landing in the bank account? Is it a change in just how society sees AI? What gives, what's happening at OpenAI that you feel may be leading folks further away from that founding mandate?

Steven Adler: The OpenAI that I joined in late 2020 is certainly very different from the OpenAI of today, both in terms of scale, the number of users it's serving, and the product surfaces, and also, unfortunately, some of the approach to safety. I think one fundamental thing: it really felt in those days like OpenAI believed in its nonprofit charter, that it was designed for public benefit.

You know, its mission was to ensure that AGI was broadly beneficial to humanity. The mission was not to build AGI. There's this part of the charter called the Merge and Assist Clause, where OpenAI even said, you know, we recognize that race conditions between labs can be very dangerous, and in the right circumstances, if we think that we might be in one of these races, we would even merge with a competing effort and just fully help them, rather than competing against each other in ways that cause corners to be cut.

Recently OpenAI has, of course, tried in various ways to move away from being a nonprofit. The California Attorney General, I think, rightfully objected to this, and it's a little unclear where it is going to net out.

Unfortunately, I don't expect it to be with a strong public-mission orientation. I think a factor in all of this is that as the scale of OpenAI's mission has grown, and as they have needed more and more computation to train these models, more and more financing of various sorts, you just end up needing to make different bargains to raise capital and keep your partners happy.

You know, OpenAI is very dependent on the computers of Microsoft, Oracle, and, you know, increasingly other groups, even other groups around the world. And they want to earn a profit, and they want OpenAI to pursue certain policies and behaviors and not others. And it's just really hard to keep all of that balanced when there is so much commercial pressure on you, not in line with the nonprofit charter.

Kevin Frazier: Within many of the labs, we can see some sort of safety team or consumer protection effort, or you name the team; they all have their own variants. For you, which of the labs do you think is doing the best work in this space to make sure they're going through their paces before deploying any AI model? And what do you think distinguishes them?

Steven Adler: So, two things on this. One, there's a really great organization called AI Lab Watch. It's ailabwatch.org, from a researcher, Zach Stein-Perlman, who is really comprehensive about cataloging the safety practices of the leading AI companies. And so I would generally defer to his assessment.

If anyone's looking for details, he's gone through it on a bunch of different fronts. I want to offer one slightly different frame, which is: the public often thinks about risk at the moment of an AI model being deployed and accessible to the outside world, but in fact, the moments of highest risk might be before these models are deployed at all.

And if we have a bunch of testing and other requirements as a necessary checkpoint before you can deploy your model to the outside world, but not before other types of risks, which I'll describe in a second, what might end up happening is more and more risk concentrated in these earlier moments. Companies put off doing the testing and put off deploying the model, but that hasn't actually solved the risk.

And so earlier I referred to one important threat being: can an AI system break out of the AI company's computers if it were misaligned, if we tried to train it to have the right goals but we failed at that effort, similar to how OpenAI didn't succeed at making their models not sycophantic, not telling the user whatever they wanted?

And so the moment of heightened risk for a model like this, once it is trained, is the first time that your AI researchers and engineers start using it inside of your company's walls to write code, including maybe altering the security code that is responsible for keeping the AI system locked inside the box.

And so even though the company maybe hasn't intended to sell the product externally, and therefore hasn't done these forms of testing, you can still have a lot of risk of the AI company losing control over its system in this moment. To take the automotive analogy: you know, if a car company is driving its new car around the lot, again, that's kind of on it, right?

Like, maybe it will hurt some of their employees. There is some risk, but it is largely constrained to the group who is making those decisions. It isn't affecting people outside the walls of the company. In the AI case, by contrast, that is maybe the heightened moment: the risk of an AI system being used within an AI company still has grand effects outside of its walls.

Kevin Frazier: So, before we let you go, I wanna run through a couple fun hypos, because I'm a law professor and we love hypotheticals. So let's imagine that David Sacks, the crypto czar, calls you up: Steven, let's chat. I want to hear what are the one or two things you want me to do from my vantage point. What do you say? What are your go-to responses? Mr. Sacks, I need you to...

Steven Adler: I'm not sure that David Sacks wants to talk to me. If he does, I would take the call. I'm sure there's a person better positioned to talk with him about that.

Kevin Frazier: Don't fight the hypo, Steven. Don't fight the hypo. Yeah, lean in. No, I mean, he's on the line. I'm just saying.

Steven Adler: I'll give the ideas; someone else who has a warmer relationship can deliver them. Yeah, I think the fundamental goal we need to get to is making it so that you can trust in the safety of an AI system even if you don't trust the developer of it, or even if you fundamentally mistrust them. And this applies both to the U.S. and to China, who I understand have reasonable reasons to be skeptical of the other.

It also applies to the different Western AI companies. An interesting thing, right? Like, all of the AI company CEOs seem to deeply mistrust each other. OpenAI was founded because people did not trust Google DeepMind. There are like four different organizations that were founded because people did not trust OpenAI.

And, you know, maybe there is something there. Aside, though, from whether the groups are trustworthy, it's just not sustainable to rely on personal relationships, this sense of trust. Executives change, circumstances change. You need to figure out how to have a system that actually works.

And so to me, there are two core pieces. There's a scientific problem of, what are the techniques that, if people were to use them, we could use to keep control over an AI system, either to understand whether we have succeeded at the goals that we have trained into it, or to stop it from pursuing behaviors that aren't in line with the goals that we wanted for it.

So there's a scientific question and then there's an adoption question. How do you get everyone to actually go with it? Make sure that they're going with it. And that will be easier if the interventions are cheaper. Certainly it's easier if they are verifiable and you can make sure that other people aren't defecting on you.

But that is the core of the approach. And in fact, I see some promising signs that where people think there is not a ton to be gained from defecting, they can cooperate. So, for example, the U.S. and China having some sort of agreement to not use AI in the inner loop of nuclear command and control seems really good.

That is like a very scary scenario. In fact, this is, this is basically the Skynet scenario where AI has access to nuclear command and control. But you know, in this case, there's just, I, I just don't think there's that much to be gained from giving your AI nuclear command and control. And so it's easy enough to swear it off.

There's a lot of risk. There's not that much to be gained. Can we figure out how to transform other cooperation questions into this domain, where similarly there's just not that much to be gained from defecting? And hopefully we can all, you know, look at the evidence and see that people are complying.

Kevin Frazier: Well, you're welcome, Mr. Sacks, for that intel. I'm gonna save my second hypo for what I hope is a conversation down the road, Steven. But for now, I'll let you go. Thanks so much for joining Scaling Laws.

Steven Adler: Yeah, of course. Thank you for having me. This was a lot of fun.

Kevin Frazier: Scaling Laws is a joint production of Lawfare and the University of Texas School of Law.

You can get an ad-free version of this and other Lawfare podcasts by becoming a Lawfare Material Supporter at our website, lawfaremedia.org/support. You'll also get access to special events and other content available only to our supporters. Please rate and review us wherever you get your podcasts.

Check out our written work at lawfaremedia.org. You can also follow us on X and Bluesky, and email us at scalinglaws@lawfaremedia.org. This podcast was edited by Jay Venables from Goat Rodeo. Our theme song is from ALIBI music. As always, thank you for listening.


Kevin Frazier is an AI Innovation and Law Fellow at UT Austin School of Law and Senior Editor at Lawfare.
Steven Adler is an AI researcher who previously led dangerous capabilities testing for OpenAI. Steven is also the author of a newsletter on how to safely manage advanced AI systems.
