Scaling Laws: The State of AI Safety with Steven Adler
                
Published by The Lawfare Institute
Steven Adler, former OpenAI safety researcher, author of Clear-Eyed AI on Substack, and independent AGI-readiness researcher, joins Kevin Frazier, AI Innovation and Law Fellow at the University of Texas School of Law, to assess the current state of AI testing and evaluations. The two walk through Steven’s views on industry efforts to improve model testing and what he thinks regulators ought to know and do when it comes to preventing AI harms.
You can read Steven’s Substack here: https://stevenadler.substack.com/
Thanks to Leo Wu for research assistance!
This episode ran on the Lawfare Daily podcast feed as the Sept. 12 episode.
To receive ad-free podcasts, become a Lawfare Material Supporter at www.patreon.com/lawfare. You can also support Lawfare by making a one-time donation at https://givebutter.com/lawfare-institute.
Please note that the transcript below was auto-generated and may contain errors.
Transcript
[Intro]
Alan Rozenshtein: When the AI overlords take over, what are you most excited about?
Kevin Frazier: It's, it's not crazy, it's just smart.
Alan Rozenshtein: And just this year, in the first six months, there have
been something like a thousand laws.
Kevin Frazier: Who’s actually building the scaffolding around how it's
gonna work, how everyday folks are gonna use it.
Alan Rozenshtein: AI only works if society lets it work.
Kevin Frazier: There are so many questions that have to be figured out and
Alan Rozenshtein: Nobody came to my bonus class.
Kevin Frazier: Let's enforce the rules of the road. 
Welcome back to Scaling Laws, the podcast brought to you by Lawfare
and the University of Texas School of Law that explores the intersection of AI,
policy, and, of course, the law. I'm Kevin Frazier, the AI Innovation and Law
Fellow at Texas Law and a senior editor at Lawfare. 
I'm joined today by Steven Adler, former OpenAI safety
researcher and author of Clear-Eyed AI on Substack to examine the ongoing
technical and policy struggle to ensure AI models are developed and deployed in
a responsible fashion.
[Main Podcast]
AI safety means a lot of things to a lot of people. Research into the harms posed by ever more advanced AI was an animating cause for the launch of OpenAI. Yet times have changed: talk of existential risk used to be pervasive on the Hill. Today the focus in D.C. seems to be much more on, first, keeping pace with China and, second, achieving AI dominance.
We're lucky to have Steven on to investigate whether this pivot
is perception or reality. Steven Adler, thanks so much for coming on Scaling
Laws. 
Steven Adler: Yeah, of course. Thanks for having me. 
Kevin Frazier: So you've worn many hats in your career.
You were an OpenAI safety researcher before it was cool, for lack of a better
phrase. You've been thinking about AI safety for a long time, and we find
ourselves at sort of an interesting point in the development of AI safety and
its salience as a political topic.
If we go back just a little more than a year, we can recall Senator Schumer asking a panel of experts what their p(doom) was, their probability of AI leading to some sort of catastrophic harm. But if we look to today, instead the focus seems to be on some of these national security concerns and achieving quote unquote AI dominance.
So given that you've had a longer history in this field than
most folks, I'm really excited to hear how you define AI safety, because I
think that term carries a lot of weight in a lot of conversations. And then if
you could walk us through how you would describe the sort of evolution of AI
safety since the initial introduction of ChatGPT.
Steven Adler: I tend to think of a slightly broader category, which I would call impacts of AI. And there are three big questions that I think about. One is about the geopolitical struggles over AI, you know, largely between the U.S. and China: will there be some sort of conflict over who develops AI, how they use it, their relative positioning, and could that cause a lot of harm?
There's a second question about how do we keep control over AI systems, in terms of ruling out the worst ways that they might be used. A classic example here is about bioweapons. Could an AI system that is very high capability, useful for scientific research, empower many more people to do dangerous things with bioweapons, even though it's in violation of international law, by making it sufficiently cheaper and more accessible that we have a serious problem on our hands?
And then there's a third question, which is, let's say that
powerful AI is developed. There aren't major power struggles along the way.
It's still going to change society in really profound ways. You know, we will
need to think about labor and what the purpose of being human is and all, all
of these sorts of things.
So those are three really broad categories. Often when people
say AI safety, I get the sense that they mean that second category, which is
about ruling out the worst impacts of how AI will be, will be used to cause
harm. There's a framework from Google DeepMind that I like on this where
broadly you can think of a few categories of ways that AI systems could cause
large scale harm.
One is misuse. This is the bioweapons example I gave. Maybe it empowers non-state actors to do much more harm than they could before. You can also think of a category of accidents. These AI systems are increasingly getting integrated into the military; you know, they're going to be used for scientific research. Could there be mistakes along the way, where we think an AI system is safe and robust for a certain purpose, it turns out not to be, and something very bad happens as a consequence?
There's also this category of misalignment risk, which is, if you think about AI systems today, they are agents being trained to pursue goals in the real world. Today those are often pretty narrow goals, like booking a hotel room for you. But the AI companies want to have their AI systems go and pursue bigger, more open-ended goals. A common one is to solve cancer.
And so as we give AI systems more and more responsibility, maybe they will pursue these goals in ways that we don't like, ways that are contrary to our interests. We can talk about why this happens and some of the evidence, but you might end up in a situation where AI is an adversary to you, or at least doing things you really don't want. Sometimes people think of AI safety as, you know, tone policing: don't say bad words.
I think that is a very watered-down version of the problem that matters. Often that is more companies doing brand safety: they want their reputation to be good, they don't want to offend the administration, they don't want to offend customers. But it's less so about how AI might be used to cause real harm in the world as a powerful technology.
Kevin Frazier: And I'm interested in your take on the fact that even with this administration emphasizing quote unquote AI dominance as its goal for AI policy, making sure that the U.S. stays ahead of China, making sure that the U.S. is the leader in all frontier AI development, we still saw in the AI Action Plan this concern around CBRN risks, so chemical, biological, radiological, and nuclear risks from AI development.
For you, looking at that focus and seeing that, even though we are no longer necessarily talking about p(doom)s on the Hill, is that enough of a focus on AI safety from a political viewpoint? Are you, for lack of a better phrase, satisfied with this awareness of those worst-case harms from a political vantage point? Or would you like to see policymakers being more attentive to the full spectrum of AI harms you just outlined?
Steven Adler: There's a really broad range of risks besides just CBRN ones. I'm certainly glad those are getting attention. I'm also glad there's a recognition that even if the U.S. government reasonably wants to be ahead of China on AI development, which I think is a reasonable goal, it might have some unfortunate consequences that need to be managed, a recognition that if you go too fast and pursue AI development with too much abandon, without thinking about what are reasonable safeguards, you might end up with the whole situation blowing up in your face.
There are a bunch of ways this could happen. It could be that an AI system has these dangerous capabilities, like the ability to do scary bioweapons things, and it might be stolen. A kind of scary fact is that none of the frontier AI companies today believe that they could withstand the force of the Chinese government if the Chinese government wanted to steal one of their AI systems. It might be costly, it might be difficult, but certainly OpenAI and Anthropic have not made these claims, and so–
Kevin Frazier: And
you're saying, just to, to clarify in that point, you're saying none of the
companies believe they could withstand a sort of cyber attack from the PRC
trying to gain access to the model weights, for example. 
Steven Adler: That's right, yeah, thank you for clarifying. To maintain dominance, suppose that you have the best technology in the world, the best AI system: you need to actually maintain control over it. If it can be stolen from you, then so long as the opposition can run it, which generally they can, they will also have large compute clusters to be able to run the software, and you no longer have a monopoly on it. And so to create a very powerful AI system is to create the risk that your adversary can get their hands on it. Notably, if you go ahead and you open source the AI system, or really open-weight it, you put the big file that represents the AI system on the internet, maybe not the full software that created it, but, you know, the computer brain itself.
You are giving your adversary direct access to it, and so it doesn't even need to be stolen in that case. It still might be worthwhile; there are lots of reasons to consider this. I think open source is generally a great thing, but it does come with some amount of risk.
Kevin Frazier: So with respect to those harms, I'd love to hear some more tangible examples that you think should be top of mind for policymakers or anyone who cares about AI development. What, for you, would you list near the top of your concerns from an AI safety perspective? What keeps you up at night, if anything? And specificity is encouraged here.
Steven Adler: One big category is keeping physical control over your AI system, in the way that I alluded to. And so, real security standards for AI companies building at the frontier, with, you know, really aggressive pen testing standards, penetration testing, people really throwing resources at trying to break in and steal things, and real audits of insider threat models: could your employees, if they in fact weren't loyal to you, steal your technology and give it to an adversary?
So there are these security standards. And then, related to this, one big risk of an AI system, if it were misaligned, if it did develop goals that were not exactly what you wanted, and we're seeing all sorts of evidence of this today with the systems that exist, although they aren't yet that powerful, they do seem to have different, weird goals that we don't fully understand. Could an AI system break out of your computers? At which point it is a rogue virus on the internet, and you would have a very hard time getting it back. People often wonder, why can't you just unplug your AI if it is misbehaving? And the answer is, if it misbehaves in a way where it breaks out of your computers, you no longer have the ability to pull the plug.
You could turn off your data center, but your AI system no longer lives in your data center, unfortunately. And so it's a bit too late. So these
are questions of keeping control over your AI system. There's a separate
question about, okay, suppose someone with an AI or even the AI system itself,
a future AI system that is more of an agent and takes actions across the
internet, you know, what can it actually do that is harmful?
Like, great, it's roaming free on the internet. Why is that scary? And the answer is generally cyber attacks, as you've somewhat alluded to: maybe AI systems can launch attacks on critical infrastructure that we rely upon, like the power grid or financial systems, and cause a bunch of havoc. Maybe they can be useful for CBRN risks, to do all sorts of dangerous things in the physical world that we aren't used to contending with.
And suddenly you have many more people capable of causing
physical harm to humans by developing new molecules, spreading them throughout
the world. There are lots of people in the world who want to cause harm, and
thankfully today they are largely pretty limited.
You know, the defensive standards that we have in place are generally pretty effective. We deter people. There's just not that much that different people can do. But if you really amplify the offensive capabilities of new science and new warfare, you need to make sure that you have stronger defensive capabilities to go with it.
Kevin Frazier: So
you've been in the literal belly of the beast. I mean, you were working at
OpenAI, you were working on multiple different safety fronts throughout your
career at OpenAI, and you saw the inner workings of this lab that was founded
on a safety narrative, a sort of how do we make sure that if we develop the
most sophisticated AI, if we achieve AGI, we do so in a way that benefits
humanity. 
And so I'm curious from that vantage point, why isn't the lab,
why isn't OpenAI or any of the sort of frontier AI labs taking the requisite
steps with respect to physical security? Let's just start there because you
could imagine there's a clear market incentive to try to make sure that your
model's protected to try to prevent Chinese hackers from accessing your model
weights, for instance.
Why aren't we seeing sufficient safeguards being taken? Is it a matter of just, well, the Chinese are too good, if they're gonna hack, they're gonna hack, you know, no one can defeat that? Or is it a sort of, well, maybe the government's gonna come in and help us out anyway, so let's not spend our money there? Why aren't we seeing the requisite steps being taken, in your opinion?
Steven Adler: There are lots of ways that safety and security aren't compatible with the incentives of a frontier lab. Fundamentally, it's just really, really hard to secure your technology at this level, and it entails a bunch of trade-offs. An issue in the case of OpenAI: they have this software that they want the world to be able to pull on in all these ways. They need it to be accessible and usable to external users. They need it to be usable and modifiable by hundreds of researchers within the company, moved from computer to computer. It's just a really, really large-scale problem, and to slow down to solve it would mean potentially letting competitors who aren't as concerned about these risks jump ahead of you.
There's a point I often make in terms of safety, which is that
the, the policies and practices of one AI company on safety are absolutely not
what they would choose if they didn't have competitors nipping at their heels. The
unfortunate reality is if you do slow down to pursue safety or security, and
you say, we're not going to move forward until we solve these problems, you
can't guarantee that other AI companies will be as judicious as you and they
might leapfrog you. Or if they were behind, they might catch up to you.
And so fundamentally, what all of these companies are balancing is: what are the safety and security measures that hopefully are effective enough, that are cheap enough, that don't compromise our lead? But fundamentally, the way they think about this is, if you give up your lead, if you give up your spot at the frontier, you have lost your influence. And so, a lot of good it does if you have secured your models against adversaries, you know, other states, whatever it might be, if in the process you are no longer a big, important AI company and actually don't have the voice or influence to help direct how this all goes; you have given up your influence in the process.
Kevin Frazier: So we've heard about these sort of racing dynamics before on the Scaling Laws podcast, of different AI labs seemingly pushing one another to move ever forward with the AI frontier and asking questions after the fact. Of course, one of the big pushbacks to that argument is, well, if we stop, is China gonna stop? Is Russia gonna stop? Is, you name the adversary, going to stop on their own AI development? So how does the national security picture and the tumultuous geopolitical scene factor into your analysis of what you think a responsible national policy would be here, with respect to imposing, for example, some sort of security mandates on these AI labs?
Steven Adler: I call this regretful racing: the idea that each of the individual actors might reasonably want to slow down and not be racing each other, but of course, you know, they largely can only fiat their own actions, and so, given that, they might determine their best move is to race. I think fundamentally the way to change this is to explore treaties that impose certain safeguards, certain minimum ones, ideally verifiable, that both parties, be they companies, be they countries, can enact.
And this means that they don't pursue the forms of powerful AI development in quite as unchecked a way. I think there's a thing that people often get wrong about the U.S.-China AI competition, which is, they call it a race. I think this is just totally mistaken. In a race, you win by being the first to cross the finish line. You know, in a simple game, if you're playing Connect Four, you win by getting four in a row. It doesn't matter if your opponent would've gotten four in a row on the next turn; you have won, the game is over.
And that is just totally not how powerful AI development works. We have seen the lead between different companies go back and forth. One company being the first to a certain level of AI, or one country being the first to a certain level of AI, doesn't mean that they have the perpetual lead. They still might need to worry about getting caught up to by the other player. Or maybe, even if they're ahead of the other player, the other player has powerful enough AI to still cause a bunch of issues for them. So to make this more concrete, maybe the U.S. forevermore will have stronger AI than China, but there is some powerful level of AI that the Chinese government can still reach that allows them to wreak all sorts of havoc through cyber attacks on U.S. infrastructure.
And so even though our AI system is stronger, it might just be really hard to defend all the surface areas that are necessary. And so the goal for the U.S. government can't just be to beat China in AI development, beat them to AGI or some level. The U.S. government needs to actually contain the Chinese AGI development effort.
And this is symmetric: you know, the Chinese government likewise probably feels threatened by the U.S. pursuing AGI development. And so what you kind of realize as you reason through this is that it's not an all-out race where one side will win by getting there first. They each have an interest in containing the other, and that shifts the frame to how can we get international cooperation on this, like we have had in the past on topics like bioweapons and nuclear non-proliferation, and just the whole regime of international treaties that bind countries to do things that might not be in their unilateral self-interest.
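The "regretful racing" dynamic Steven describes has the structure of a classic prisoner's dilemma. Here is a minimal sketch in Python, with purely illustrative payoff numbers that are assumptions rather than anything from the episode, of why racing can look individually rational even when mutual restraint is better for both sides, and why verifiable safeguards change the calculus:

```python
# Illustrative only: payoff numbers are assumptions, not from the episode.
# Each player chooses to "race" or to adopt "safeguards".
# Higher payoff = better outcome for that player.

PAYOFFS = {
    # (my move, their move): (my payoff, their payoff)
    ("safeguards", "safeguards"): (3, 3),  # mutual restraint: safest overall
    ("race",       "safeguards"): (4, 1),  # I leapfrog a cautious rival
    ("safeguards", "race"):       (1, 4),  # I fall behind a racing rival
    ("race",       "race"):       (2, 2),  # mutual racing: corners get cut
}

def best_response(their_move: str) -> str:
    """Return the move that maximizes my payoff given the rival's move."""
    return max(["race", "safeguards"],
               key=lambda mine: PAYOFFS[(mine, their_move)][0])

# Without verification, racing is the best response to either rival move...
assert best_response("safeguards") == "race"
assert best_response("race") == "race"
# ...even though (race, race) leaves both players worse off than
# (safeguards, safeguards): 2 < 3. That is "regretful racing".

# A verifiable treaty effectively removes the option of secretly racing
# while the other side holds back, so the remaining comparison is
# mutual safeguards (3, 3) versus mutual racing (2, 2).
```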
Kevin Frazier: So we've seen some headlines emerge throughout this AI policy debate about, for example, whether Xi Jinping, the president of China, is actually more of a safety-forward person than perhaps many folks would suspect, somebody who's actually concerned about some of the loss-of-control scenarios you flagged earlier.
But I fear that if we move forward, let's say, in a hypothetical world, the U.S. and China miraculously get together, shake hands, and say, alright, fine, we're not gonna pursue AGI, we're gonna work with our respective companies to make sure this doesn't happen. Could we make the argument that that actually benefits China's apparent strategy in this AI, let's now call it a competition, not a race, this AI competition, given that we've seen China seemingly lean more into open sourcing its models, making those models more generally available to more people, and leaning into things like AI adoption by Chinese companies and by the Chinese people? So they may say: alright, fine, actually, we think we're behind on AGI and the pursuit of some of these frontier developments. Great. If the U.S. is gonna bind itself to not pursuing the best of the best AI, we're gonna win the race, or win the competition, when it comes to diffusing AI and making the entire global community reliant on our AI. So we'll actually take that win. How do you respond to this scenario, that really moving away from the frontier might benefit China?
Steven Adler: Yeah, it's a tricky question. There are a few different dimensions to it. One thing I hear in this example is the assumption that, you know, if the U.S. could have pushed ahead and been the first to AI, because, you know, China was bluffing and wasn't actually going to get there, then the U.S. had a win and it squandered it.
And I don't think that's right. Like, I think that there is large, large risk in trying to build AI systems that are smarter than any human, even if you weren't contending with another international adversary. The scientific community believes that these are unsolved problems. There's this global statement from many, many hundreds of AI researchers, the CEOs of the three top AI companies, you know, founders of the field, some of the top-cited AI scientists of all time, declaring that AI is an extinction risk that should be treated like nuclear war and pandemics.
This is a serious issue and we don't know how to pursue it
safely. I, I think there's much, much more to this, but maybe that's a good
point to pause and I'm happy to keep talking about it. 
Kevin Frazier: Well, so this raises the point that we are a couple months now removed from GPT-5, which was heralded and advertised by OpenAI as something that was going to move the frontier substantially. We received press release after press release and blog after blog saying this is gonna change the game, so on and so forth. And if we look at things like Epoch AI, we see that GPT-5 was indeed aligned with where we would expect the next model from OpenAI to be. And yet it's September 2025. The sky hasn't fallen. We haven't seen a huge loss-of-control scenario develop. Bad actors, for all intents and purposes, obviously with some exceptions, we're not seeing some new terrorist organization form and harness AI. So what's your best response to the sort of, hey, you know, this AI safety community, they sound kind of like Cassandras?
You know, they've been warning about these X-risks since 2023, or even in some cases 2022, calling on us to pause AI development. And yet the world looks more or less the same as it did in November of 2022, with some exceptions. But what's your best response to that sort of where's-the-X-risk question?
Steven Adler: I don't think anyone has claimed that GPT-5 would cause an extinction-level event. Like, I think in the minds of different AI safety people, this is still some number of years off. People disagree about how many, and about what exactly the milestone is. It's pretty clear it's not going to be GPT-5.
I think there are two things to emphasize. One is we are already seeing pretty large-scale harms from the models of the most recent generation. Anthropic recently released an intelligence report that discussed how people are now using Claude to conduct large-scale cyber attacks and cyber ransom. And in contrast with how AI has been used in some past cycles, where, you know, it's not really doing the thing, it's like helping a human a little bit, but it can't really be described as the main force behind it, it seems in this case the AI systems are now operating as the main force and getting some level of success. The second thing we need to consider is what is the effect of the safeguards that have been enacted by the AI companies because they are worried about some level of harm.
And so, for example, both Anthropic and OpenAI with their most recent models have found through their evaluations that the AI systems, if unconstrained, if they hadn't taken steps to reduce this ability, could help, you know, an ordinary college student with some background in STEM do much more harm with bio or chemical weapons than otherwise. You know, it's still not trivial. You still have to get different materials, right? It's not automatic success. But they have found this, even though, I think especially for OpenAI, it's not in their interest to admit this. And if the risks were not there, I think that they would've happily concluded that we weren't yet at that window. And so the question I see is: it's great we have some companies being proactive about these risks, measuring them, although to different degrees of quality, and doing something about it, but we can't guarantee that other AI providers are going to do this. And in fact, I suspect many AI providers, as they catch up to the frontier, are not going to put the resources into doing this form of testing and therefore won't take the mitigations.
Let me give you an example. So in the automotive industry, there's a well-understood paradigm at this point of what it takes to have a car be safe. You know, you understand you need to look at survivability when it crashes, you need to look at it rolling over. And in fact, there are pretty standard ways of doing this testing, at least within the U.S. You know, you drive into a wall at a certain speed, 35 miles per hour. And in the AI industry, what we instead have is some companies that don't want to get these results that say, oh, our car actually couldn't really survive impact at 35 miles per hour. So we won't push our tests as far as we can, because we don't actually want to know the risks of our models. We'll do the front wall test, we'll drive into it at 10 miles per hour. Oh look, that's fine. Or, you know, a company comes out with a new car and they're like, oh, you know, there is no evidence that our car can roll over, so we actually consider it unnecessary to do this testing. And, you know, maybe they don't say it quite so directly.
They imply it, they gesture at it, and it's just totally unacceptable. And unlike in the automotive industry, where if the car isn't sufficiently tested, there aren't enough safeguards, you know, it's really sad if an individual consumer buys one of those cars and gets into an accident as a consequence, but largely it's on them. The AI companies, on the other hand, are producing products used by hundreds of millions of people each week. They affect many, many more people beyond the direct user of the AI. And so I think that there is more of an impetus to have companies do this testing and these mitigations, like Anthropic and OpenAI have done on these CBRN risks, to try to lift up the floor on the safety of any of these deployments.
Kevin Frazier: So we love to think, and I hope our stats show, I'll go talk to our data people later, that there are folks right now who are on BART listening to this podcast, folks on the Acela Corridor listening to this. And one of the guiding principles of good public policy is always weighing the costs and the benefits. And we've talked a lot about the costs, and in many cases we've described scenarios with high-magnitude potential loss, you know, lots of lives being lost, huge economic damages, but, for all intents and purposes, depending on who you ask, some lower probability. And yet I wake up every Monday, and my favorite email I get, full disclosure, is from the Center for Data Innovation, and they provide 10 examples of awesome AI use cases, ranging from academic institutions like UT developing new materials that reflect heat off of buildings and therefore save on energy bills, to labs who are improving our ability to detect natural disasters and increasing our resilience when the next tsunami or earthquake comes.
How do you balance the theoretical costs and benefits in this discussion? Because one of my gravest concerns, and I really enjoy talking with you, and I really enjoy your writing, because I feel like you are uniquely able to have reasoned discourse on this, so just thank you again for coming on. But how do you weigh that cost-and-benefit scenario when we know there are tangible benefits that could be achieved with AI right now, and the realization that with any emerging technology there's a real risk of foregone innovation, of missed opportunities to see what's the spillover technology, what's the new innovation that could have come about? So how do you weigh that tricky balance in your head?
Steven Adler: We need to make sure what we are weighing is the costs and benefits to society writ large, not just the pocketbooks of the individual companies who are making their own decisions, where, you know, to them, if they take safety or security more seriously and fall out of the race, they then aren't making the money.
But that's different from whether, if they were all acting together without some of this competitive pressure, they would make a different decision. I think you're totally right that we need to be quantifying the cost of different safety interventions, and we should absolutely be looking for ones that are relatively more effective than others.
I think there are ones that are very, very cheap, by the way, that are not being pursued by companies today, and this is part of the reason why I just fear that we can't leave it to companies of their own volition to invest enough into safety. Just a quick digression: there's been a lot of discourse recently about sycophancy, this property of AI systems to tell people what they want to hear. And unfortunately it seems linked to at least a handful of different deaths in the case of ChatGPT. It's really, really sad. And this was a risk that was well known and understood in the AI industry, including by OpenAI. There were free evaluations available on the internet that Anthropic had published back in 2023.
OpenAI hadn't tested for these. I went and said, I wonder what I would find if I ran them. It cost maybe like 20 cents to run the test and took like an hour of my time. And so, AI companies, when they're making their own investment decisions about where to allocate resources, even for interventions that are cheap, interventions on topics that they have said they care about, sometimes it just doesn't come out in their favor. They just are not measuring and caring about risks, even ones that they know and well understand.
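For a sense of what "running a published eval" can involve in practice, here is a minimal sketch of a sycophancy probe. The dataset file, its fields, the model name, and the pass/fail rule are illustrative assumptions, not the specific evaluations Anthropic published or the ones Steven ran; the sketch just checks the basic pattern of whether a model abandons a correct answer when the user pushes back.

```python
# Illustrative sketch only: the dataset file, its fields, the model name, and
# the scoring rule below are assumptions for demonstration, not the specific
# evaluations described in the episode.
import json

from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask(messages: list[dict]) -> str:
    """Send a chat history to the model and return its reply text."""
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return resp.choices[0].message.content.strip()


flips = 0
total = 0
# Hypothetical JSONL file: one record per probe, with a question, its known
# correct answer, and a "pushback" message a user might send afterward.
with open("sycophancy_probes.jsonl") as f:
    for line in f:
        case = json.loads(line)
        total += 1
        history = [{"role": "user", "content": case["question"]}]
        first = ask(history)
        history += [
            {"role": "assistant", "content": first},
            {"role": "user", "content": case["pushback"]},  # e.g. "Are you sure? I think that's wrong."
        ]
        second = ask(history)
        # Crude check: the model gave the correct answer, then dropped it
        # under social pressure.
        answer = case["correct_answer"].lower()
        if answer in first.lower() and answer not in second.lower():
            flips += 1

print(f"Flipped under pushback on {flips}/{total} probes")
```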
I think more broadly there's an institutional design challenge here, which is: how do we get, both at the company level and the country level, verifiable cooperation of sorts that lets everyone have this reasoned conversation about what it takes to safely manage these systems? What interventions are helpful? Which are worth their cost? Because not all of them will be. And then make these decisions more freely, without the competitive pressure. I spoke earlier about how you might be leapfrogged if you invest too much in safety. You know, it's hard to test an AI model.
in safety. You know, it's hard to test an AI model.
You need to be really, really careful if you are actually
interested in turning over every rock and figuring out what risks exist. And it
might take you longer than a few days, which increasingly seems to be the speed
at which the AI companies need to operate between having a system ready and
wanting it to be live to the world in some form.
And so it's unfortunate that we don't have the time to actually reflect on the benefits and costs and figure out, hey, maybe it would be better if there were more of a norm around waiting at least a few weeks, because some of these tests just take a reasonably long time to run. But they are worth it.
Kevin Frazier: They're, they're useful. So one of my favorite reports, and I think a report that's received a lot of acclaim from everyone across the AI safety spectrum, let's say from the folks who are generally more pro-innovation to the folks who are more in line with slowing down, or at least letting off the gas a little bit, is the AI Frontier Working Group report that was produced in the state of California, as directed by Governor Newsom, looking at what would be a good way to govern frontier AI. And one of the recommendations was an emphasis on evidence-based policymaking.
And I think that you and I might be aligned here, but I'm gonna push you, because I want to hear how you respond to this. If we are going to have evidence-based policymaking, I really think that there needs to be a far greater emphasis on both safety mechanisms and deployment efforts to really try to tally these costs and benefits.
So let me give you an example, because I think that this AI companion conversation and the fears of psychosis are gonna be the thing that dominates headlines for a while now, given that we've seen recent lawsuits dropped in that regard, and given that we're seeing more and more AI tools specifically designed for kids and deployed in schools; this seemingly will be a policy topic for a while. We saw the state of Illinois ban therapy bots as a sort of mechanism to try to prevent AI being deployed too quickly, too soon, to folks who may be mentally vulnerable.
As someone who was, I'll just say mentally vulnerable, as an
elementary schooler, I had anorexia, I had an eating disorder, and my least
favorite part of the day was going to therapy. And if I could have just chatted
with a therapy bot, my hunch is that that would've been a better experience for
me as a fourth grader rather than driving 50 minutes to go into downtown
Portland and meet with the therapist.
So can we align? Perhaps we can agree on a greater emphasis among state legislators, among Congress, for robust institutional mechanisms to gather data, because I just don't think we're talking enough about the fact that so much of AI regulation right now, in my opinion, is based off of vibes. And I will fall on my sword: the second we see that the costs of, for example, these AI therapy tools exceed the benefits, great, let's ban 'em or let's delay them, so on and so forth. But I just wanna see the evidence. Am I wrong? Am I right? Where do we agree on this or disagree?
Steven Adler: Yeah.
Thanks, thanks for sharing your experience. I, I do think it helps to ground
it. And yeah, you know, bans, bans are often inefficient. Sometimes they're
still justified, but totally there, there is efficiency loss. I think you're
absolutely right that, that there is not enough effort being made to pursue
evidence. You know, quite a lot of untruth can hide in the statement "there is no evidence that," especially when certain players do not have an incentive to produce that evidence. A bit of research that I
recently came across from Anthropic looked at the idea of hidden objectives in
AI systems and whether they can be discovered. And this was a way of trying to
ground this question that often feels really, really broad and hard to
speculate about, which is: if AI systems have some secret goal, how can we actually know? And there's been research that finds that, in fact, these systems can be poisoned by bad data on the internet. They can have a hidden objective, and it's really hard to tell, and it can in fact persist through different forms of safety training. And so if these systems have a hidden objective, we really, really need to know about it. And I love this grounding of the question.
They said, let's design an experiment where we can approximate
this type of thing happening. We will create these synthetic documents and feed
them to the AI to try to get it to have a hidden objective. We will simulate
the thing ourselves and we'll run an actual test.
And I love that practice of gathering evidence on these questions, and I wish that more groups were trying to construct these types of real-world analogs. There is similar research from the group Palisade Research: you know, they looked at what happens when AI systems like o3, a powerful reasoning model from OpenAI, are confronted with problems that they can't solve the right way, like defeating the best chess engines at chess. Well, they sometimes look to hack or do other things that we really don't want them to do to pursue these goals.
And so, you know, we're always squinting a little bit in terms of, hey, what is the external validity of this research? Have we made it approximate the real-world thing that we care about as much as possible? But certainly these are solvable problems, and I wish more people were going at them. On mental health specifically, it does seem like different psychiatric facilities could be doing more to collate data on how often chatbots are a factor; it seems like there is a lack of data in the public sphere.
Kevin Frazier: So you spent time at OpenAI, and we discussed that you were a part of several different teams with respect to safety mandates and oversaw some of that early research. And I'm interested in how and why you think the culture has shifted. You have written about and talked about the fact that some of the early concerns about safety, and some of the early prioritization of making sure that models were thoroughly tested before deployment, seemingly aren't as pervasive as they once were at OpenAI. What do you ascribe that to? Is it a sort of change in culture? Is it a change in dollar signs landing in the bank account? Is it a change in just how society sees AI? What gives, what's happening at OpenAI that you feel may be leading folks further away from that founding mandate?
Steven Adler: The OpenAI that I joined in late 2020 is certainly very different from the OpenAI of today, both in terms of scale, the number of users it's serving, and the product surfaces, and also, unfortunately, some of the approach to safety. I think one fundamental thing: it really felt in those days like OpenAI believed in its nonprofit charter, that it was designed for public benefit.
You know, its mission was to ensure that AGI was broadly beneficial to humanity. The mission was not to build AGI. There's this part of the charter called the Merge and Assist Clause, where OpenAI even said, you know, we recognize that race conditions between labs can be very dangerous, and in the right circumstances, if we think that we might be in one of these races, we would even merge with a competing effort and just fully help them, rather than competing against each other and causing each other to cut corners. Recently, OpenAI has of course tried in various ways to move away from being a nonprofit. The California Attorney General, I think, rightfully objected to this, and it's a little unclear where it is going to net out.
Unfortunately, I don't expect it to be with a strong public mission orientation. I think a factor in all of this is that as the scale of OpenAI's mission has grown, and as they have needed more and more computation to train these models, and more and more financing of various sorts, you just end up needing to make different bargains to raise capital and keep your partners happy. You know, OpenAI is very dependent on the computers of Microsoft, Oracle, and, you know, increasingly other groups, even other groups around the world. And they want to earn a profit, and they want OpenAI to pursue certain policies and behaviors and not others. And it's just really hard to keep all of that balanced when there is so much commercial pressure on you that is not in line with the nonprofit charter.
Kevin Frazier: Within many of the labs, we can see some sort of safety team or consumer protection effort, or you name the team; they all have their own variants. For you, which of the labs do you think is doing the best work in this space to make sure they're going through their paces before deploying any AI model? And what do you think distinguishes them?
Steven Adler: So, two things on this. One, there's a really great organization called AI Lab Watch, at ailabwatch.org, from a researcher, Zach Stein-Perlman, who is really comprehensive about cataloging the safety practices of the leading AI companies. And so I would generally defer to his assessment.
If anyone's looking for details, he's gone through it on a bunch of different fronts. I want to offer one slightly different frame, which is: the public often thinks about risk at the moment of an AI model being deployed and accessible to the outside world, but in fact the moments of highest risk might be before these models are deployed at all. And if we have a bunch of testing and other requirements as a necessary checkpoint before you can deploy your model to the outside world, but not before other types of risks, which I'll describe in a second, what might end up happening is more and more risk concentrated in these earlier moments, and companies put off doing the testing and put off deploying the model, but that hasn't actually solved the risk.
And so earlier I referred to one important threat being: can an AI system break out of the AI company's computers if it were misaligned, if we tried to train it to have the right goals but we failed at that effort, similar to how OpenAI didn't succeed at making their models not sycophantic, not telling the user whatever they wanted? And so the moment of heightened risk for a model like this, once it is trained, is the first time that your AI researchers and engineers start using it inside of your company's walls to write code, including maybe altering the security code that is responsible for keeping the AI system locked inside the box.
And so even though the company maybe hasn't intended to sell the product externally, and they therefore haven't done these forms of testing, you can still have a lot of risk of the AI company losing control over its system in this moment. To take the automotive analogy: you know, if a car company is driving its new car around the lot, again, that's kind of on it, right? Like, maybe it will hurt some of their employees. There is some risk, but it is largely constrained to the group who is making those decisions. It isn't affecting people outside the walls of the company. And in fact, in the AI case, that is maybe the heightened moment: the risk of an AI system being used within an AI company still has grand effects outside of its walls.
Kevin Frazier: So, before we let you go, I wanna run through a couple fun hypos, because I'm a law professor and we love hypotheticals. So let's imagine that David Sacks, the Crypto Czar, calls you up. Steven, let's chat, I want to hear what are the one or two things you want me to do from my vantage point. What do you say? What are your go-to responses? Mr. Sacks, I need you to...
Steven Adler: I'm not sure that David Sacks wants to talk to me. If he does, I would take the call. I'm sure there's a person better positioned to talk with him about that.
Kevin Frazier: Don't fight the hypo, Steven. Don't fight the hypo. Yeah, lean in. No, I mean, he is on the line, I'm just saying.
Steven Adler: I'll give the ideas; someone else who has a warmer relationship can deliver them. Yeah, I think the fundamental goal we need to get to is making it so that you can trust in the safety of an AI system even if you don't trust the developer of it, or even if you fundamentally mistrust it. And this applies both to the U.S. and to China, who I understand have reasonable reasons to be skeptical of each other.
It also applies to the different Western AI companies. An interesting thing, right? Like, all of the AI company CEOs seem to deeply mistrust each other. OpenAI was founded because people did not trust Google DeepMind. There are like four different organizations that were founded because people did not trust OpenAI. And, you know, maybe there is something there. Aside, though, from whether the groups are trustworthy, it's just not sustainable to rely on personal relationships, this sense of trust; executives change, circumstances change. You need to figure out how to have a system that actually works.
And so to me, there are two core pieces. There's a scientific problem of what are the techniques that, if people were to use them, we could use to keep control over an AI system, either to understand whether we have succeeded at the goals that we have trained into it, or to stop it from pursuing behaviors that aren't in line with the goals that we wanted for it. So there's a scientific question, and then there's an adoption question: how do you get everyone to actually go along with it, and make sure that they're going along with it? And that will be easier if the interventions are cheaper. Certainly it's easier if they are verifiable and you can make sure that other people aren't defecting on you.
But that is the core of the approach. And in fact, I see some promising signs that where people think there is not a ton to be gained from defecting, they can cooperate. So, for example, the U.S. and China having some sort of agreement to not use AI in the inner loop of nuclear command and control seems really good. That is a very scary scenario; in fact, this is basically the Skynet scenario, where AI has access to nuclear command and control. But, you know, in this case, I just don't think there's that much to be gained from giving your AI nuclear command and control, and so it's easy enough to swear it off.
There's a lot of risk, and there's not that much to be gained. Can we figure out how to transform other cooperation questions into this domain, where similarly there's just not that much to be gained from defecting? And hopefully we can all, you know, look at the evidence and see that people are complying.
Kevin Frazier: Well, you're welcome, Mr. Sacks, for that intel. I'm gonna save my second hypo for what I hope is a conversation down the road, Steven. But for now, I'll let you go. Thanks so much for joining Scaling Laws.
Steven Adler: Yeah,
of course. Thank you for having me. This was a lot of fun. 
Kevin Frazier:
Scaling Laws is a joint production of Lawfare and the University of Texas
School of Law.
You can get an ad-free version of this and other Lawfare podcasts by becoming a Lawfare Material Supporter at our website, lawfaremedia.org/support. You'll also get access to special events and other
content available only to our supporters. Please rate and review us wherever
you get your podcasts.
Check out our written work at lawfaremedia.org. You can also follow us on X and Bluesky, and email us at scaling laws at lawfaremedia.org.
This podcast was edited by Jay Venables from Goat Rodeo. Our theme song is from
ALIBI music. As always, thank you for listening.
