Scaling Laws: AI Safety, Meet Trust & Safety with Ravi Iyer and David Sullivan
Published by The Lawfare Institute
David Sullivan, Executive Director of the Digital Trust & Safety Partnership, and Ravi Iyer, Managing Director of the Psychology of Technology Institute at USC’s Neely Center, join Kevin Frazier, AI Innovation and Law Fellow at the University of Texas School of Law and a Senior Editor at Lawfare, to discuss the evolution of the Trust & Safety field and its relevance to ongoing conversations about how best to govern AI.
They discuss the importance of thinking about the end user in regulation, debate the differences and similarities between social media and AI companions, and evaluate current policy proposals.
You’ll “like” (bad pun intended) this one.
Leo Wu provided excellent research assistance to prepare for this podcast.
Read more from David:
- https://www.weforum.org/stories/2025/08/safety-product-build-better-bots/
- https://www.techpolicy.press/learning-from-the-past-to-shape-the-future-of-digital-trust-and-safety/
Read more from Ravi:
- https://shows.acast.com/arbiters-of-truth/episodes/ravi-iyer-on-how-to-improve-technology-through-design
- https://open.substack.com/pub/psychoftech/p/regulate-value-aligned-design-not?r=2alyy0&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false
Read more from Kevin:
Find Scaling Laws on the Lawfare website, and subscribe to never miss an episode.
This Scaling Laws episode ran as the October 10 Lawfare Daily episode.
To receive ad-free podcasts, become a Lawfare Material Supporter at www.patreon.com/lawfare. You can also support Lawfare by making a one-time donation at https://givebutter.com/
Please note that the following transcript was auto-generated and may contain errors.
Transcript
[Intro]
Alan Rozenshtein: It
is the Lawfare Podcast. I'm Alan Rozenshtein, associate professor of law
at the University of Minnesota, and a senior editor and research director at Lawfare.
Today we're bringing you something a little different: an
episode from our new podcast series, Scaling Laws. It's a creation of Lawfare
and the University of Texas School of Law where we're tackling the most
important AI and policy questions from new legislation on Capitol Hill to the
latest breakthroughs that are happening in the labs.
We cut through the hype to get you up to speed on the rules,
standards, and ideas shaping the future of this pivotal technology. If you
enjoy this episode, you can find and subscribe to Scaling Laws wherever
you get your podcasts and follow us on X and Bluesky. Thanks for listening.
When the AI overlords take over, what are you most excited about?
Kevin Frazier: It's,
it's not crazy. It's just smart.
Alan Rozenshtein: And
just this year, in the first six months, there have been something like a
thousand laws.
Kevin Frazier: Who's
actually building the scaffolding around how it's gonna work, how everyday
folks are gonna use it? AI only works if society lets it work.
There are so many questions that have to be figured out, and nobody came to my bonus class. Let's enforce the rules of the road.
[Main episode]
Welcome back to Scaling Laws, the podcast brought to you
by Lawfare and the University of Texas School of Law that explores the
intersection of AI policy and, of course, the law.
I'm Kevin Frazier, the AI Innovation and Law fellow at Texas
Law and a senior editor at Lawfare. There's a lot of attention today on
what many refer to as AI safety: making sure models are aligned with social
values and perform as intended. But there's another part of the puzzle in
ensuring AI aligns with consumer expectations and community values, and that's
analyzing how people actually use AI and to what ends.
Recent headlines have included tragic incidents of self-harm by
teens who establish relationships with AI tools. The resulting litigation may
turn on many of the same issues that have long been at the heart of the
so-called trust and safety field. It's my pleasure to welcome to the show David
Sullivan, founding executive director of the Digital Trust and Safety
Partnership, and Ravi Iyer, managing director of the Psychology of Technology
Institute at USC’s Neely Center.
These two folks are leaders in the T&S space and can help
us sort out what's new and what's not about AI when it comes to trust and
safety issues. As always, please feel free to follow us on X or Bluesky or
shoot us an email if you have any feedback. Thanks and enjoy the show.
I am so glad to have David and Ravi join the conversation. Thanks to you both for hopping on the pod.
Good to be here.
Alright, David, let's start with you. For the folks who perhaps
missed the T&S ship as it got launched out of some port in San Francisco,
what is trust and safety? What's a good definition?
David Sullivan: So,
trust and safety is really the function of dealing with basically unwanted
content and behavior on digital products and services.
This goes back to the very beginning of the internet. It's
something that's kind of endogenous to the fact that if you are dealing in user-generated
content or behavior you're going to have the content or behavior that is either
harmful or illegal. And you need to have processes and mechanisms to deal with
that.
And this is a field that has evolved from being sort of an
artisanal kind of thing on bulletin boards back in the eighties and nineties
before the commercial internet into commercial content moderation, which is
what most people sort of associate with trust and safety. And now into a field
where you have all kinds of institutions and standards and approaches that I
think are helping to mature the field overall.
But it's one that is inherently about people and how they use
technology. And so it's one that is always going to be imperfect, but hopefully
improving.
Kevin Frazier: And
it's wild to consider that evolution with just a, a few quick stories. I know
one that stands out to me. There's a famous New York Times article that
outlined, I believe Nicole Wong's role at Google as the quote unquote decider
who is making a lot of these trust and safety decisions just by herself.
Saying, you know, what content should we allow? What policies should we have? And now to your point, David, there's whole
conferences, there's whole books, there's whole podcasts all focused on T&S.
And so Ravi, we got a great definition there about trust and safety. What are
some of the misconceptions about what trust and safety is and isn't that you
think are particularly necessary to call out before we dive even deeper into
this topic?
Ravi Iyer: I think
when you see things on the internet that you think people shouldn't be doing,
the natural thing to do is to say, you know, why don't we make rules about the
things people can and can't do? And I think that's where we started. I think
that's what a lot of the work is.
That's what the article you referenced mentions, but I think
we've eventually realized that it's, one, illegitimate and, two, impossible to decide everything.
You know, there are some things at the margin that we can all
agree on that people can and can't do, but I think trust and safety has evolved
to, to encompass more than just deciding what people can and can't do, but also,
how do we create a safer system overall?
Like, how do we design a system so that it doesn't encourage
people to be doing things they shouldn't be doing? How do we create systems
so that people can mitigate for themselves, right? So like, we're not deciding
what people can and can't do, but maybe we allow users to figure out what they
don't want done to them, right? And to figure that out for themselves, right?
So I think it started from, a place of let's define rules for
what, you know, Nicole thinks people can and can't do. But I think it's evolved
to a place where we realize like we need a little bit more accommodation. That
we actually can't define these things and we need to let people define it for
themselves.
Kevin Frazier: And we
need to not be encouraging this bad behavior in the first place, and not only flagging the fact that we've moved beyond having a single quote unquote decider to having many more people involved in this space, but also to
your point, Ravi, exploring different interventions and in particular, I'd say
interventions that are less perhaps top down, but more in the background.
And so, Ravi, can you explain that? Two parts of this
evolution.
Number one, how has the actual field of trust and safety
changed over time with respect to social media? Looking at the social media
companies themselves, how did we see trust and safety grow? And then perhaps
where do we stand today?
And also, how would you characterize the evolution of the sort
of interventions we see in terms of trying to facilitate a better internet,
whatever that may mean?
Ravi Iyer: Yeah, I'll
explain that, and I'll tell you a little bit about my background at the same time. So, you know, I, I've worked at Meta. I started about eight years ago. And I started off working on moderation efforts. Basically, like, I worked in data science.
We, we counted how many times people reported bullying and
harassment or saw content that was later taken down for violating our hate
speech policy.
And we could drive those numbers to fairly low numbers, in part
because we had defined the problem, right? Like we defined what was hate
speech. And so if you define a thing, it's easy to sort of remove the thing you
define, right?
But eventually, you know, people in society would say to us—and this is not specific to Meta, this is true of every company—that there are things that
they're finding harmful that are not actually captured by these policies. And
so you'll see companies try to define policies to, to bridge those gaps, right?
And so like maybe if I just had a better definition of hate speech, I could
actually cover all the ways that people get people to dislike each other.
And the reality is that you can't do that. And like, one of the
most common ways that people create hate for each other, for example, is to take a
crime committed by someone of the opposite group, right? And so, researchers
will call this “fear speech.” And, you know, if you start to see that pattern and you look in society, that's often how people create hate.
They take, you know, something that someone in some group has done––someone of every group does something terrible on occasion––and they use that to tar the entire group as criminals or murderers or what have you.
And so I think we realized that we need to, like, move beyond the, you know, let's just define more and more policies, and go to the realm of design. How do we go more upstream? How do we, like, think? And so, you know, you'll see platforms doing what are sometimes called break-the-glass measures, where design changes, and, you know, in the youth space, they're sort of increasing privacy. Like privacy by default is a common thing that people talk about.
Kevin Frazier: And
privacy. Just to, to pause you there for a second.
Ravi Iyer: Yep.
Kevin Frazier: Can
you give an example of some of the design features you're referring to?
Ravi Iyer: Yep, yeah.
So I mean, privacy by default is a simple one, right? So, you know, rather than figuring out who has contacted whom in, you know, an inappropriate way, and then sanctioning those individuals, how do we stop unwanted contact in the first place? Simple things like rate limits: like, where do we set our rate limits? Do we let new, untrusted users contact people en masse? Do new users get different kinds of protections so they're maybe not visible at the very beginning?
And you know, who do we put into our recommendation systems,
right? So we're saying like, you should be friends with this person. Can we
like, make those systems a little bit more robust so we stop those interactions
in the first place. And then, you know, things around algorithmic reforms. So,
you know, are there whole classes of content that we just think we're not going
to do a great job moderating, so therefore we just exclude them entirely.
And so, you know, those are just a couple of the examples where you can go a little more upstream than the individual contact or individual piece of content level.
Kevin Frazier: Yeah.
And David, we've seen perhaps with Grok, or excuse me, David, we've seen with X in particular that, as part of the transition from Twitter to X, I think there were mass layoffs in the trust and safety department.
Subsequently, we've also seen at Meta mass layoffs with respect
to trust and safety. What's the rationale there? What's going on? What is the
sort of justification that these companies are giving for saying, hey, we know
these folks are, are trying to make your platform better with respect to
aligning with user intentions.
Why get rid of the T&S folks?
David Sullivan: So I
think when it comes to how companies look at this, each company is going to
take their own point of view and, and should be asked about these things. I,
but I think it, it sort of goes back to Ravi's point in terms of people often
think that you know, content moderation is censorship when it's about things
that they, you know, that they like and want to see more of. And it is
imperative when it is something that they don't like.
And these are subjective decisions. I think there is a tendency
for folks who come into this world where, as I said, sort of, you
know, people have been having these debates about what type of content and
behavior should be allowed on different online services since the beginning of
the internet.
But for folks who've missed some of those, you know, previous
debates and arguments, you end up in a space of what Mike Masnick calls sort of speedrunning trust and safety.
And, and I think he did a great post about this in the context
of the, the change from Twitter to X where you, you sort of think that you
don't need all, all of this and that you can just let expression kind of,
flourish. And then you realize, you know, you still are going to have problems
like child sexual abuse material on any, any site that allows images. And then
from there, all of these other concerns.
I do think, you know, it can be hard to find a silver lining
when it comes to a, you know, some of the, the losses in terms of the
retrenchment of trust and safety across some companies and platforms in the
context of broader tech layoffs and all of that. But I do think that there are
some silver linings that you can find.
And one is that yes, a lot of people were laid off from the
trust and safety team at Twitter in particular, and by a bunch of other
companies as well. A lot of those people have now sort of diffused across the
space. So either maybe they've gone to smaller platforms or they are working in
academia, or they are working for vendor companies, or, you know, to the point
of, of Scaling Laws, they are working on AI solutions to maybe help, you know, not kind of reinvent the wheel and not trip the same landmines that folks have been tripping for, for decades when it comes to content.
So, you know, there are I think some things we can try to be
optimistic about here.
Kevin Frazier: So I want
to test a hypothesis against the two of you, that not only was there pushback
around concern about censorship, and again, as you two have pointed out,
censorship was in the eye of the beholder and was taken up on all sides
of the political spectrum, and I think that definitely had a role to play in
sort of the concern about T&S as a field and perhaps growing too large at
certain companies. But I also think that it's worth noting that a lot of the
interventions––in particular the ones that tried to emulate the way we regulate
other spaces––just proved somewhat ineffective.
And so in particular, highlighting things like Meta’s oversight
board, this is the independent sort of supreme court of content moderation for
Facebook. They have a body of experts from around the world. They preside over
tricky content moderation decisions. But if you dig into the details, the
oversight board hears, I think, point zero, zero, zero, zero, zero, zero (I mean, I could go on for a while) zero, zero, zero one percent of all appeals of
content moderation decisions made by Facebook. We also see things like
opportunities for submitting complaints and filing different forms. And so it
gives this illusion of greater due process and the opportunity of meaningful
challenges to whatever rules are being imposed.
Was there a sort of backlash to those interventions as just not effective? Do you think that's part of this evolution that we've seen? Ravi,
I'll start with you.
Ravi Iyer: Yeah. I, I
do think that the combination of the ineffectiveness of, you know, people
deciding what people can and can't say combined with the, you know, there are,
you know, there are some bad faith discussions of censorship, but there are
some much-needed discussions of censorship as well. And there are mistakes that
platforms have made, that people have made in sort of deciding what people can
and can't say. So I think that combination means that there aren't that many
defenders.
Even if you talk to trust and safety workers, I don't think
most of them would say, you know, the solution to our online environment is just to hire more of us. Right? Like, let's just get more of us, because they realize it's, it's not a winning battle. You can't just spend more money and solve this. So I think we realize, you know, for both positive and negative reasons––like trying to solve the problem and not trying to screw it up––that we need something beyond just the traditional moderation solutions.
Kevin Frazier: David,
anything to add there?
David Sullivan: I
think there are great things about initiatives like the oversight board. In
particular, to my mind, what the oversight board does a really good job of is
applying international human rights law to content moderation decisions and
showing that you can use international human rights law and especially Article
19 of the International Covenant on Civil and Political rights as a way of
looking at these decisions and, and evaluating them.
I do think that the sheer volume of effort devoted by some of
the biggest companies in this space over a period of time has led to focusing
on the most well-resourced, largest companies who have a particular set of, of
challenges and concerns, perhaps at the expense of a wider and more varied set
of services that are out there.
Like, I don't think that we should––so the great thing about the oversight board is the analysis of all of these decisions that Facebook has made. The downside is that we end up thinking about trust and safety and content moderation only with regard to Facebook, when there are so many other products that have their own functions and features that I think need to be thought of in their own respect, instead of just thinking about, you know, the speech platforms in particular, and Meta in particular.
Kevin Frazier: Beyond
that, and I think something that you two have both either explicitly said or
hinted at, there's just the sheer scale problem of all of this, and the speed problem of having nine experts or 11 experts––or, you pick, 33––attempt to cover the entire span of humanity for a platform like Facebook or, as we'll get to in a minute, OpenAI with 700 million users. The idea that you'll know what's culturally appropriate and politically necessary or sensitive? It's just, you know, a farce to think that anyone can have their finger on the pulse of all of that. Which, as you were pointing out
earlier, Ravi, really raises the question of, just how far can the law go or
can explicit rules go in this context? And Ravi, I think a point that is useful
to bring out further here is what does progress even mean in this space?
What does it mean if we're doing a good job? In the trust and
safety world, what metrics can we say, all right, we've got our annual report
on trust and safety. We did the thing, guys, high five. Let's, let's go get a
drink.
Ravi Iyer: Yeah, I––so
companies will report their trust and safety metrics: what percentage of content are people seeing that violates some policy?
I am on the side of Arturo Béjar's work in this domain. Those are not very effective in terms of understanding user harm. And so I think if you want ground truth on user harm, you have to ask users. It's not perfect, but if a user––so, what percentage of users say they've been bullied, as opposed to what percentage of content violates a bullying and harassment policy?
If somebody says, you know, nice sweater to another person, it
doesn't violate any policy and you can't really tell if they're being sarcastic
or if they're being––you need to know so much about the history. And so if a
person says they've been bullied, then again, it's not a hundred percent, but
it's a lot closer to ground truth than platform metrics about violating policies.
And so anyone can do surveys, and so you're seeing regulators do this more and more. You see Australia's online experiences survey. The UK does this. You know, people are tracking over time, you know, what
percentage of kids have seen unwanted sexual content, what percentage of kids
have had an online sexual interaction.
Thorn does this across platforms. You can actually, you know,
hold some platforms accountable for having twice as much of that as other platforms. And I think that's the kind of thing we can use to, to measure progress, giving platforms credit when they have low numbers and also holding
platforms accountable when they have high numbers.
Kevin Frazier: Yeah,
and mapping that onto the AI space, I think, is going to be particularly
challenging, which we'll get to in one second. Because David, I also want to
highlight before we move fully into the realm of AI, how is AI impacting the
realm of trust and safety in the traditional trust and safety sense with
respect to social media?
David Sullivan: So, I
think that, you know, AI and automation, going back to very, very rudimentary automation, right, have been a part of trust and safety from the get-go. From the, like, oh, we have a problem with spam and we need to figure out how to solve that. So, AI as it's developed, you know, through sort
of machine learning classifiers and then moving into generative AI, these are
part of the tool set that trust and safety practitioners inside companies have
been using and are using.
And I think that one thing that's interesting is, as we look around for really compelling business use cases for generative AI products, trust and safety is one of the biggest ones in some ways. And it makes it very
interesting that, you know, the labs and companies that are developing, you
know, frontier models and that are also involved in all different places across
the AI supply chain are also users of AI in this space.
And I think that there is an important case for those folks who
are using AI as part of trust and safety to be a bigger part of the
conversation about making AI products as safe as possible.
Kevin Frazier: Yeah,
that's, it's such an interesting point too, because I think when we fail to
consider how trust and safety actually works and how content moderation
actually works, we may not be asking the right questions because I think a lot
of people, when they hear, even before we started calling it AI, as you noted
David, this has been around for a while.
We've leaned extensively on automated processes to be able to call out bad content, and content in particular that immediately raises red flags, like CSAM that is known and that's just proliferating. We pull it down as quickly as possible. And yet for the folks who insist, oh no, we need a
human in the loop.
The question that I've asked rooms full of people is, okay,
which of you is going to raise your hand to be that human in the loop? Who
wants to see the entirety of the nastiness of the internet? Not a lot of people
raise their hands.
Understandably so, and I think that's important because if we
can reduce the need for humans to be exposed to that sort of content––and we
know that it can result in serious mental distress and long-term mental
distress––we have to be asking, maybe AI can have an added benefit here.
Ravi, have you seen a sort of growing appreciation for the fact
that this “compared to what” question is really something that just has to be a part of the conversation?
Ravi Iyer: Yeah, I
mean, I agree. It has to be a part of the conversation.
There are two kinds of errors that you can make in moderation.
Like you can, you can be biased in a direction or another or you can be
inconsistent.
And I think once you realize that human beings are just not
going to agree as to what is harmful, then you realize like the bias question
is kind of just in some ways arbitrary, like there's no real way to fix that.
But human beings are always going to be inconsistent. Whereas AI, so AI is not going
to fix your bias problem because someone's going to have to figure out like,
what is the line?
And, and there is no real line for many things that you can draw, but it will fix the consistency problem. Like, so whatever you do, it'll at least be consistent. And, and people will know the rules, and so you won't get arbitrary decisions in one case versus another.
Kevin Frazier: Yeah,
and this is where I hope we continue to have a more nuanced conversation about
trust and safety, of just taking a holistic picture of who's involved in this
process, who are we calling on to be involved in this process?
And to your point earlier, Ravi, how can we have clear metrics
about what's working and what's not? Because that can help inform the
conversation rather than just insisting on principles. Like, well, there must
always be a human in the loop, just not me. Nose goes for whoever gets stuck
here.
But David, now to, to shift a little bit, I mentioned earlier
in the intro that we've had what I refer to as a social media hangover. I think
you've seen in a lot of the AI debates a sense that we didn't get it right on
Facebook, we didn't get it right on Instagram.
People have read every Jonathan Haidt book. People have
listened to all the podcasts and they say, we need to act now. We got to
protect the kids. All very well intentioned. But before we kind of get into the
politics of that, can you just explain your perspective on what is distinct
about, for example, a user's interaction with an AI companion versus the use of
social media?
David Sullivan: Yeah,
I think that, and this really gets into the question of how folks are
attempting to deal with these issues through legislation, regulation, litigation,
you know, sort of independent of what companies are doing voluntarily to try to
deal with these challenges.
To me, when you––one of the hardest things about trying to
regulate in this space is trying to have precise definitions for what you are
regulating, right? And so, yes, there's a sense that, okay, we should, you know––in
the past, it would be good to not have social media services, you know, kind
of, doing things that are leading to, say, the unwanted sexual contact that Ravi mentioned earlier. And so a lot of the definitions that you see in laws and regulations, whether it's at the state level in the U.S. or looking around the world to different places that have enacted legislation, try to come up with some definition of social media.
That usually involves the sort of user-to-user sharing of content element, and in other cases laws are looking at search either as, like, something else to be regulated or as something to carve out because you don't
want to inhibit people's ability to use digital services to seek out and obtain
and access information.
To me AI companions and, and chatbots of different kinds really
fall in the middle between these two things. You know, there may be a social
component. We see new products coming out all the time with more social
components.
Kevin Frazier: All
the vibes. All the vibes, exactly.
David Sullivan: New,
new social media platforms driven by AI. But that's usually incidental, sort of; it's not the primary purpose, if we're trying to think about it that way.
And so yeah, I think that there is a, a challenge with how you
are trying to, you know, retrofit definitions that are looking at user-to-user
interaction or sort of search to address this new technology that, as you
mentioned before, is being used by, you know, hundreds of millions of people
around the world.
Kevin Frazier: Yeah,
and it's such a crazy challenge too because we know that of the 700 million
people, everyone's using it for different purposes. I think if we look at the
OpenAI user survey, which of course I would love to kick tires on, and I think
a lot of people need to do further investigation of it, but just a cursory
look, you'll see 1.9% of use cases are for AI companions or this sort of
emotional reliance.
And so that's where I think putting things in perspective, to
your point about definitions, David, we also have to consider just the actual
empirical uses of the tools and it's very improper to conflate AI companions
with AI all capitalized.
Right. You know, this is a subset use of a tool that is, you
know, a Swiss Army knife of infinite size. But Ravi, what, to you, stands out
as particularly distinct between social media and kind of these AI companion
tools?
Ravi Iyer: Yeah, so,
I think there's some things that are similar and there are things that are
distinct.
So, so I think both of these things are products that companies are going to try to get you to use more. And that's why you get these externalities, right? And so, you know, in social media, you know, optimizing for engagement, trying to build your network, has these externalities when now you're talking to people who are risky and you're seeing content that maybe you don't want, right?
Because it's more engaging. Similarly, in the AI world, you know, these, you know, you can ask––a friend of mine asked this product for some spreadsheet formulas and it, you know, told him what a great question he asked. And he was like, this thing is so cloying, right?
And so you see that sycophancy as a way to get you to use the product more, right? So I totally agree with you. Like the, the use cases are very different.
I just point out that, you know, 1.9% of 700 million is a lot
of people. So just because it's a small percentage of the people doesn't mean
we, we shouldn't actually care about it. And I think a lot of the regulation is
about staying in that informational realm. Like, there's a lot of research showing the benefits of these products for, you know, search-like use cases: I want to learn about a, a fin––there's not a lot of people who are trying to use it for companions. The product
may sort of veer into that realm because it's trying to get you to use it more.
And I think a lot of the regulation is to, to sort of put a wall there, like we
didn't for social media, where we want to keep it in this realm of utility.
You know, a lot of, something I think about a lot these days is
agency. Like, what are you just trying to use? How do we get to use these
things as tools? Like, I want to learn some fact, and how, how do I stop it from pushing me in a direction I don't want to go? Like, I'm not trying to make friends with
this product. Don't try to push me to do that unless I ask you. That's what I want
to do.
Kevin Frazier: Yeah,
and I think that's such an important point to call out Ravi, which is to say,
to go back earlier to the online experience surveys you kind of highlighted,
which is saying, look, we're grounding this in the expectations and experiences
of the user, and to what extent is this tool aligning with what they want it to address? Because my chief complaint, and I have many complaints unfortunately, but my chief complaint in, in this realm is with laws like AB 1064 in California, for instance, which is very much well-intentioned, calling on what they refer to as operators––basically any developer, deployer, agency, or company allowing for the diffusion of an AI companion to minors, for example––to prioritize quote unquote factual accuracy over the user's values, beliefs, or preferences.
Which in a vacuum, I think you can say, sure, that might sound
good or socially desirable, but also sometimes it's really nice to have a
companion that says, you know what, Kevin, you were right. Your sister is being
really mean. Or, you know what, Kevin, Santa is real and isn't that wonderful
for us all to celebrate. And these sorts of questions of what are you trying to
get out of the model and what do we want the model to do from a user's
perspective?
Rather than trying to, in my opinion, re-litigate the, the sort
of disinformation, misinformation, battles of the social media era and just map
it right onto AI, it’s a really tricky question, but I, I'm not sure we're
getting the sort of nuance here that we've been talking about in terms of prioritizing
user expectations.
And I, I love this point, Ravi, user agency of what is it you
want from the model and is it doing that to the fullest extent possible? But
David, what am, what am I missing here? What, what other, what other lessons do
you think we should or should not be learning from the trust and safety
evolution that we saw in the AI context?
David Sullivan: So, I
think one piece of the trust and safety evolution that I want to highlight, to do just a tiny bit of self-promotion: at the Digital Trust and Safety Partnership, we've articulated a framework of best practices.
It's now an international standard, ISO/IEC 25389––you can get that for free from ISO––that sets out five commitments around product development, governance, enforcement, improvement, and transparency, with
examples of best practices underneath that. Those examples of best practices
are articulated at bullet point length.
They are not super prescriptive. I don't think they say all the
things that Ravi would want them to say. But my view would be that if you use
those in a really robust and rigorous manner, you would get to a lot of the
outcomes that, you know, Ravi would be recommending based on his experience and
research and, and all of that.
When it comes to things like incorporating user experience and,
you know, incorporating the perspective of users and, and building features and
products that reflect those things as opposed to reflecting incentives that
might be problematic, I do think that there is, there is a lot we can draw on from there when it comes to, you know, turning to AI products.
To me, I think the other thing that I really want to highlight
here is, maybe, you know, because there's so much focus on the most general-purpose AI applications like a ChatGPT, as well as on the capabilities of
these, you know, frontier models and the new models that are coming out all the
time, we, we even just, you know, constantly talk about the model this, the
model that, and I think that is a distraction.
What we really should be talking about is the products. That's
where the rubber hits the road. That's where people are actually using these
things, whether it is in a new standalone, you know, AI companion or whether it
is you know, Gemini trying to, you know, Clippy its way into me using
generative AI in, you know, every respect across the Google workspace.
And that's where I think we can build better features, incentives, and mitigations in order to create safety that is downstream from,
you know, all of these conversations about the safety of the models. So I think
that's an important distinction that we can kind of hammer in on.
Kevin Frazier: Yeah,
and I love this focus too on the technology itself because I think that perhaps
in some state legislatures and perhaps in Congress, we're not always aware of
the full AI tech stack and what it means to intervene at which level of
development, deployment, and then actual application as we're talking about
here.
And if you talk to folks in the labs, they'll tell you just a
little bit of fine tuning can drastically change the nature of that model. And
so paying attention to where is the sort of least-cost-avoider––not to get way
into econ and public policy land, but where is the intervention going to have
the greatest impact at the lowest amount of cost to everyone else and other use cases? That is a difficult question, but it's a question we should be asking.
Ravi, it looks like you've got a response here. What, what's on
your mind?
Ravi Iyer: Yeah, I
mean, I do want to highlight that, with fine-tuning, you know, there is research suggesting that once a model has a capability and you do some fine-tuning to get rid of that capability, the capability still remains, right?
And so people can undo that. So it's not as robust. Just like in social media, it's not as robust to address things after the fact. It's always more robust to do things more upstream. And as far as I understand it, you know, there are things like sycophancy that would actually be better to address at a higher level.
So, you know, I, I guess I just want to say that I, I still
believe that we should, and, and I don't think these things are impossible. So
I do think that you can, you know, have models that have the same, you know, if
models are trying to give you what you expect based on the things they've been trained on, you know, but, but all those datasets have biases, right?
They're things that have been said online in certain ways, right? And so they don't reflect everything in human society. And so therefore these models say more positive things, potentially. Or maybe they don't give you negative feedback 'cause they're trained on, you know, what people say they prefer as an answer, right?
And whereas human beings, you know, if you ask me enough
questions about, you know, how to build a bomb, or, you know, about negative
things, like, my alarm bells go off, right? I, I tell other people about it. I know
when I'm outta my depth, right? And I don't answer things. And, and I go get
help. I, I alert somebody else who might be able to help.
And, and these models just don't do that. So they have like a
part of what we have as a human being and they don't have a lot of other parts
that keep us safe, and those are things that can be addressed at a more active
level.
Kevin Frazier: Yeah,
but I, I think it's also worth pointing out as you flag that in many of these
instances too, there's a question of just what are the other interventions that
we're perhaps not considering?
I think the rush to say the first response should be going to
the model versus acknowledging that, hey, these alternatives of flagging
content or notifying folks may be a, a different mechanism. I'm not sure it's mutually exclusive, but I think it's just worth pushing people. And so
with that in mind, I just got a text from Sam Altman and he said, David and
Ravi.
I need help. I need to know how to proceed. We just saw OpenAI,
arguably in response to a lot of the lawsuits that are being filed, has changed
its approach to the use of its tools by teens in particular. In a blog post,
Altman said that they were going to prioritize the safety of teens over freedom
and over privacy, being very explicit to say, look, we think we need these
safety protocols to, to take priority in that sense.
David, if you had one piece of advice or one thing, you very
much want Sam, or you pick the frontier AI company that's developing these
tools, what advice would you provide or what intervention would you really
insist on?
David Sullivan: So, I
think my intervention is going to be at the, at the level of like bureaucratic
politics inside the company, which is to say that, you know, Sam's probably got
a lot of people who are focused on AI safety at the, the sort of frontier model
level who are probably people thinking about a certain set of risks with a
certain background in training. There may be some, so we'll say
Kevin Frazier: AI
safety, capital S Safety, yeah. Yeah. Catastrophic risk. Existential risk.
Okay.
David Sullivan:
Exactly. Then you have responsible AI people and teams, right? Who are people
who are coming from the world of, almost, that other realm of, of AI, about predictive and automated decision making, right?
All of the bias and need for accountability and transparency.
And those people are coming at these things from a certain way and report to
certain people. And then you have trust and safety teams who, maybe
understandably, AI companies are like, we don't want to repeat the mistakes
that, you know, we've seen others make.
So having, you know, really, functions that are all about not
just, you know, sort of what the, the model does, but how you monitor and
enforce and improve and all of that, we don't need to worry as much about that.
We're just going to get things right from the get-go.
And I don't think that these teams are talking to each other
or given adequate kind of attention across the decision making that goes into
how these policies get rolled out. So I think you gotta break down those silos
and hopefully then get to some better outcomes first in terms of what kind of
policy decisions you're making, and then about how you actually monitor and
update those over time.
So that, that's my take on this.
Kevin Frazier: Sam,
Sam gave you a response, a thumbs up but he said he still wants to hear from
Ravi. So Ravi, what are you going to tell Sam here?
Ravi Iyer: Yeah, I'd
say the original sin of social media is believing that the more that people use the product, the more valuable it is, and that there's an infinite amount of product usage that is valuable. And people don't want that; they'll tell you this if you do surveys: like, people actually think they use these products too much.
They don't want, like, it's not in their interest, it's not
their aspiration to use these products more. And so insofar as Sam is
considering business models––and you see this, right? Advertising models, you
know, AI-generated videos and like a TikTok clone that are predicated on, we just want to get you to use these products more.
And that's how we're serving, you know, value, providing value to the community. I think you're inevitably going to create, make product decisions that are going to be harmful to users. You know, you're going to create a slightly more sycophantic product, you know, whatever you call it, because, you know, that's what gets people to use it more.
It's going to compliment people more. It's going to pretend to
be their friend more. Or it'll come up with some other new trick that I, I haven't defined yet, right? So, just getting out of the “let's try to get people to use those products more, and that is creating user value,” and really, you know, honestly being, you know, anchoring on the, the user value: what is a user trying to do, and how can I serve that?
They want to learn something and then they want to move on and
not use the product anymore. Let them do that. Don't try to hold onto them
through all sorts of tricks, and I think you'll be in a better place.
Kevin Frazier: And
this is where I think it's so critical that we have an ongoing social
conversation of just how do we think folks should be using these tools? In what
way should AI companions work themselves into our daily lives, and in
particular in the lives of children? Because that's a, it's a tricky question,
but it's one we have to be honest about so that we can go back and say, Hey,
this is what a good use of social... nah, see, I'm doing it myself. This is what a good use of AI
actually looks like. But with that, David, any final word?
David Sullivan: I
think that the one thing that I worry about, right, I, I share Ravi's message
to Sam. Like, I, I agree with that, but what I worry about is when that gets
translated, especially into legislation and regulation, that calcifies that down to just, you know, a, a checklist of things and be like, don't worry, we don't have infinite scroll, so everyone's going to be good. Right?
That, and so I think that's why there, there is a need to
constantly reexamine the choices that are made to make sure that, you know, you
are––that products are not leading to more harmful situations. And that can't
be like something that can be easily written down in a bill that's going to be
effective in, you know, six months or a year or five years from now. It's going
to be something that reflects, you know, something that's already outdated.
So we have to figure out how to, how to do that in a way that's
actually gonna get results that are future-proofed to the, the best extent
possible, which is always limited when it comes to, again, you know, products
that are interacting with humans and, and all the challenges that come with
that.
Kevin Frazier: David,
you're such a radical calling for empirically driven policy.
How dare you? That's just wild. Ravi, any final, final word
here?
Ravi Iyer: Yeah, I
mean, I, I guess maybe just a slight counterpoint: you know, just because we don't know everything doesn't mean we don't know something, right? And so there are some tangible patterns of these products. At the Neely Center, we have a design code for search chatbots that we're working on, which contains many of these patterns, the same patterns we use in psychology labs.
You know, if you're in a psychology lab and you want to
manufacture intimacy and friendship, you do certain things. You tell someone
how much you like them, you share stories about yourself. And these products
learn to do those things. So just because we don't know everything about the
future of these products doesn't mean that we don't know some things that are
harmful, that really aren't what users are asking for.
Like users want to learn new things from these products.
They're not really asking to be com––you know, to have them as a companion, and especially for kids, right? So I think that's a very different argument for
kids. So I do think that there are well-meaning and, and commonsense ideas
about how we could draw a line for kids that are worth considering now, even as
we have more to learn for the future.
Kevin Frazier: Well,
clearly David has one more final thing to say.
David Sullivan: I, I agree
with that. Very much. But I also, you know, sort of, I'm thinking about like if
you look at something like the Kids Online Safety Act, right? Which has been the thing that many people are like, if only we could pass the Kids Online Safety Act, that would take care of all of this.
But the Kids Online Safety Act, if you look at both the definitions of what kinds of platforms would be covered by that and what those platforms would have to do, it's not clear that if the Kids Online Safety Act
had been passed a year ago or two years ago, that it would be addressing the
challenges that we're seeing with kids and chatbots.
And so that's why we need to have––we need to, we need to be
thinking about this, these things. Yes, let's look to the science and let's
change the incentives. But let's also maintain flexibility and yeah, as you
said, Kevin, look for, for evidence to support these things going forward.
Kevin Frazier: I look
forward to having you both back on as we see how the evidence comes out and how
the laws get implemented. It's going to be a wild time. I think we can all
agree on that.
So thank you, Ravi, thank you, David for coming on. We'll have
to leave it there.
Ravi Iyer: Thanks.
David Sullivan:
Thanks very much.
Kevin Frazier: Scaling Laws is a joint production of Lawfare and
the University of Texas School of Law.
You can get an ad-free version of this and other Lawfare
podcasts by becoming a material supporter at our website,
lawfaremedia.org/support. You'll also get access to special events and other
content available only to our supporters. Please rate and review us wherever
you get your podcasts. Check out our written work at lawfaremedia.org. You can
also follow us on X and Bluesky.
This podcast was edited by Noam Osband of Goat Rodeo. Our music
is from ALIBI. As always, thanks for listening.
