Scaling Laws: AI Safety, Meet Trust & Safety with Ravi Iyer and David Sullivan
Published by The Lawfare Institute
David Sullivan, Executive Director of the Digital Trust & Safety Partnership, and Ravi Iyer, Managing Director of the Psychology of Technology Institute at USC’s Neely Center, join Kevin Frazier, AI Innovation and Law Fellow at the University of Texas School of Law and a Senior Editor at Lawfare, to discuss the evolution of the Trust & Safety field and its relevance to ongoing conversations about how best to govern AI.
They discuss the importance of thinking about the end user in regulation, debate the differences and similarities between social media and AI companions, and evaluate current policy proposals.
You’ll “like” (bad pun intended) this one.
Leo Wu provided excellent research assistance to prepare for this podcast.
Read more from David:
- https://www.weforum.org/stories/2025/08/safety-product-build-better-bots/
- https://www.techpolicy.press/learning-from-the-past-to-shape-the-future-of-digital-trust-and-safety/
Read more from Ravi:
- https://shows.acast.com/arbiters-of-truth/episodes/ravi-iyer-on-how-to-improve-technology-through-design
- https://open.substack.com/pub/psychoftech/p/regulate-value-aligned-design-not?r=2alyy0&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false
Read more from Kevin:
Find Scaling Laws on the Lawfare website, and subscribe to never miss an episode.
This Scaling Laws episode ran as the October 10 Lawfare Daily episode.
To receive ad-free podcasts, become a Lawfare Material Supporter at www.patreon.com/lawfare. You can also support Lawfare by making a one-time donation at https://givebutter.com/
Please note that the following transcript was auto-generated and may contain errors.
Transcript
[Intro]
Alan Rozenshtein: It
is the Lawfare Podcast. I'm Alan Rozenshtein, associate professor of law
at the University of Minnesota, and a senior editor and research director at Lawfare.
Today we're bringing you something a little different: an
episode from our new podcast series, Scaling Laws. It's a creation of Lawfare
and the University of Texas School of Law where we're tackling the most
important AI and policy questions from new legislation on Capitol Hill to the
latest breakthroughs that are happening in the labs.
We cut through the hype to get you up to speed on the rules,
standards, and ideas shaping the future of this pivotal technology. If you
enjoy this episode, you can find and subscribe to Scaling Laws wherever
you get your podcasts and follow us on X and Bluesky. Thanks for listening.
When the AI overlords take over, what are you most excited about?
Kevin Frazier: It's,
it's not crazy. It's just smart.
Alan Rozenshtein: And
just this year, in the first six months, there have been something like a
thousand laws.
Kevin Frazier: Who's
actually building the scaffolding around how it's gonna work, how everyday
folks are gonna use it? AI only works if society lets it work.
There are so many questions that have to be figured out, and nobody came to my bonus class. Let's enforce the rules of the road.
[Main episode]
Welcome back to Scaling Laws, the podcast brought to you
by Lawfare and the University of Texas School of Law that explores the
intersection of AI policy and, of course, the law.
I'm Kevin Frazier, the AI Innovation and Law fellow at Texas
Law and a senior editor at Lawfare. There's a lot of attention today on
what many refer to as AI safety: making sure models are aligned with social
values and perform as intended. But there's another part of the puzzle in
ensuring AI aligns with consumer expectations and community values, and that's
analyzing how people actually use AI and to what ends.
Recent headlines have included tragic incidents of self-harm by
teens who establish relationships with AI tools. The resulting litigation may
turn on many of the same issues that have long been at the heart of the
so-called trust and safety field. It's my pleasure to welcome to the show David
Sullivan, founding executive director of the Digital Trust and Safety
Partnership, and Ravi Iyer, managing director of the Psychology of Technology
Institute at USC’s Neely Center.
These two folks are leaders in the T&S space and can help
us sort out what's new and what's not about AI when it comes to trust and
safety issues. As always, please feel free to follow us on X or Bluesky or
shoot us an email if you have any feedback. Thanks and enjoy the show.
I am so glad to have David and Ravi join the conversation. Thanks to you both for hopping on the pod.
Good to be here.
Alright, David, let's start with you. For the folks who perhaps
missed the T&S ship as it got launched out of some port in San Francisco,
what is trust and safety? What's a good definition?
David Sullivan: So,
trust and safety is really the function of dealing with basically unwanted
content and behavior on digital products and services.
This goes back to the very beginning of the internet. It's
something that's kind of endogenous to the fact that if you are dealing in user-generated
content or behavior you're going to have the content or behavior that is either
harmful or illegal. And you need to have processes and mechanisms to deal with
that.
And this is a field that has evolved from being sort of an
artisanal kind of thing on bulletin boards back in the eighties and nineties
before the commercial internet into commercial content moderation, which is
what most people sort of associate with trust and safety. And now into a field
where you have all kinds of institutions and standards and approaches that I
think are helping to mature the field overall.
But it's one that is inherently about people and how they use
technology. And so it's one that is always going to be imperfect, but hopefully
improving.
Kevin Frazier: And
it's wild to consider that evolution with just a, a few quick stories. I know
one that stands out to me. There's a famous New York Times article that
outlined, I believe Nicole Wong's role at Google as the quote unquote decider
who is making a lot of these trust and safety decisions just by herself.
Saying, you know, what content should we allow? What policies should we have? And now to your point, David, there's whole
conferences, there's whole books, there's whole podcasts all focused on T&S.
And so Ravi, we got a great definition there about trust and safety. What are
some of the misconceptions about what trust and safety is and isn't that you
think are particularly necessary to call out before we dive even deeper into
this topic?
Ravi Iyer: I think
when you see things on the internet that you think people shouldn't be doing,
the natural thing to do is to say, you know, why don't we make rules about the
things people can and can't do? And I think that's where we started. I think
that's what a lot of the work is.
That's what the article you referenced mentions, but I think
we've eventually realized that it's, one, illegitimate and, two, impossible to decide everything.
You know, there are some things at the margin that we can all
agree on that people can and can't do, but I think trust and safety has evolved
to, to encompass more than just deciding what people can and can't do, but also,
how do we create a safer system overall?
Like, how do we design a system so that it doesn't encourage
people to be doing things they shouldn't be doing? How do we create systems
so that people can mitigate for themselves, right? So like, we're not deciding
what people can and can't do, but maybe we allow users to figure out what they
don't want done to them, right? And to figure that out for themselves, right?
So I think it started from, a place of let's define rules for
what, you know, Nicole thinks people can and can't do. But I think it's evolved
to a place where we realize like we need a little bit more accommodation. That
we actually can't define these things and we need to let people define it for
themselves.
Kevin Frazier: And we
need to not be encouraging this bad behavior in the first place, and not only flagging the fact that we've moved beyond having a single quote unquote decider to having many more people involved in this space, but also to
your point, Ravi, exploring different interventions and in particular, I'd say
interventions that are less perhaps top down, but more in the background.
And so, Ravi, can you explain that? Two parts of this
evolution.
Number one, how has the actual field of trust and safety
changed over time with respect to social media? Looking at the social media
companies themselves, how did we see trust and safety grow? And then perhaps
where do we stand today?
And also, how would you characterize the evolution of the sort
of interventions we see in terms of trying to facilitate a better internet,
whatever that may mean?
Ravi Iyer: Yeah, I'll
explain that, and I'll tell you a little bit about my background at the same time. So, you know, I, I've worked at Meta. I started about eight years ago. And I started off working on moderation efforts. Basically, like, I worked in data science.
We, we counted how many times people reported bullying and
harassment or saw content that was later taken down for violating our hate
speech policy.
And we could drive those numbers to fairly low numbers, in part
because we had defined the problem, right? Like we defined what was hate
speech. And so if you define a thing, it's easy to sort of remove the thing you
define, right?
But eventually, you know, people in society would say to us—and this is not specific to Meta, this is true of every company—that there are things that
they're finding harmful that are not actually captured by these policies. And
so you'll see companies try to define policies to, to bridge those gaps, right?
And so like maybe if I just had a better definition of hate speech, I could
actually cover all the ways that people get people to dislike each other.
And the reality is that you can't do that. And like, one of the
most common ways that people create hate for each other, for example, is to take a
crime committed by someone of the opposite group, right? And so, researchers
will call this “fear speech.” And, you know, if you start to see that pattern and you look in society, that's often how people create hate.
They take, you know, something that someone in some group has done––someone of every group does something terrible on occasion––and they use that to tar the entire group as criminals or murderers or what have you.
And so I think we realized that we need to, like, move beyond the, you know, let's just define more and more policies, and go to the realm of design. How do we go more upstream? How do we, like, think? And so, you know, you'll see platforms doing what are sometimes called break-the-glass measures, where design changes, and, you know, in the youth space, they're sort of increasing privacy. Like privacy by default is a common thing that people talk about.
Kevin Frazier: And
privacy. Just to, to pause you there for a second.
Ravi Iyer: Yep.
Kevin Frazier: Can
you give an example of some of the design features you're referring to?
Ravi Iyer: Yep, yeah.
So I mean, privacy by default is a simple one, right? So, you know, rather than figuring out who has contacted whom in, you know, an inappropriate way, and then sanctioning those individuals, how do we stop unwanted contact in the first place? Simple things like rate limits: like, where do we set our rate limits? Do we let new, untrusted users contact people en masse? Do new users get different kinds of protections so they're maybe not visible at the very beginning?
And you know, who do we put into our recommendation systems,
right? So we're saying like, you should be friends with this person. Can we
like, make those systems a little bit more robust so we stop those interactions
in the first place. And then, you know, things around algorithmic reforms. So,
you know, are there whole classes of content that we just think we're not going
to do a great job moderating, so therefore we just exclude them entirely.
And so, you know, those are just a couple of the examples where you can go a little more upstream than the individual contact or individual piece of content level.
Kevin Frazier: Yeah.
And David, we've seen perhaps with Grok, or excuse me, David, we've seen with X in particular that, as part of the transition from Twitter to X, I think there were mass layoffs in the trust and safety department.
Subsequently, we've also seen at Meta mass layoffs with respect
to trust and safety. What's the rationale there? What's going on? What is the
sort of justification that these companies are giving for saying, hey, we know
these folks are, are trying to make your platform better with respect to
aligning with user intentions.
Why get rid of the T&S folks?
David Sullivan: So I
think when it comes to how companies look at this, each company is going to
take their own point of view and, and should be asked about these things. I,
but I think it, it sort of goes back to Ravi's point in terms of people often
think that you know, content moderation is censorship when it's about things
that they, you know, that they like and want to see more of. And it is
imperative when it is something that they don't like.
And these are subjective decisions. I think there is a tendency
for folks who come into this world where, as I said, sort of, you
know, people have been having these debates about what type of content and
behavior should be allowed on different online services since the beginning of
the internet.
But for folks who've missed some of those, you know, previous
debates and arguments, you end up in a space of what Mike Masnick calls sort of speedrunning trust and safety.
And, and I think he did a great post about this in the context
of the, the change from Twitter to X where you, you sort of think that you
don't need all, all of this and that you can just let expression kind of,
flourish. And then you realize, you know, you still are going to have problems
like child sexual abuse material on any, any site that allows images. And then
from there, all of these other concerns.
I do think, you know, it can be hard to find a silver lining
when it comes to a, you know, some of the, the losses in terms of the
retrenchment of trust and safety across some companies and platforms in the
context of broader tech layoffs and all of that. But I do think that there are
some silver linings that you can find.
And one is that yes, a lot of people were laid off from the
trust and safety team at Twitter in particular, and by a bunch of other
companies as well. A lot of those people have now sort of diffused across the
space. So either maybe they've gone to smaller platforms or they are working in
academia, or they are working for vendor companies, or, you know, to the point
of, of Scaling Laws, they are working on AI solutions to maybe help, you know, not kind of reinvent the wheel and not trip the same landmines that folks have been tripping for, for decades when it comes to content.
So, you know, there are I think some things we can try to be
optimistic about here.
Kevin Frazier: So I want
to test a hypothesis against the two of you, that not only was there pushback
around concern about censorship, and again, as you two have pointed out,
censorship was in the eye of the beholder and was taken up on all sides
of the political spectrum, and I think that definitely had a role to play in
sort of the concern about T&S as a field and perhaps growing too large at
certain companies. But I also think that it's worth noting that a lot of the
interventions––in particular the ones that tried to emulate the way we regulate
other spaces––just proved somewhat ineffective.
And so in particular, highlighting things like Meta’s oversight
board, this is the independent sort of supreme court of content moderation for
Facebook. They have a body of experts from around the world. They preside over
tricky content moderation decisions. But if you dig into the details, the
oversight board hears, I think, point zero, zero, zero, zero, zero, zero (I mean, I could go on for a while) zero, zero, zero one percent of all appeals of
content moderation decisions made by Facebook. We also see things like
opportunities for submitting complaints and filing different forms. And so it
gives this illusion of greater due process and the opportunity of meaningful
challenges to whatever rules are being imposed.
Was there a sort of backlash to those interventions as just not effective? Do you think that's part of this evolution that we've seen? Ravi,
I'll start with you.
Ravi Iyer: Yeah. I, I
do think that the combination of the ineffectiveness of, you know, people
deciding what people can and can't say combined with the, you know, there are,
you know, there are some bad faith discussions of censorship, but there are
some much-needed discussions of censorship as well. And there are mistakes that
platforms have made, that people have made in sort of deciding what people can
and can't say. So I think that combination means that there aren't that many
defenders.
Even if you talk to trust and safety workers, I don't think
most of them would say, you know, the solution to our online environment is just to hire more of us. Right? Like, let's just get more of us, because they realize it's, it's not a winning battle. You can't just spend more money and solve this. So I think we realize, you know, for both positive and negative reasons––like trying to solve the problem and not trying to screw it up––that we need something beyond just the traditional moderation solutions.
Kevin Frazier: David,
anything to add there?
David Sullivan: I
think there are great things about initiatives like the oversight board. In
particular, to my mind, what the oversight board does a really good job of is
applying international human rights law to content moderation decisions and
showing that you can use international human rights law and especially Article
19 of the International Covenant on Civil and Political rights as a way of
looking at these decisions and, and evaluating them.
I do think that the sheer volume of effort devoted by some of
the biggest companies in this space over a period of time has led to focusing
on the most well-resourced, largest companies who have a particular set of, of
challenges and concerns, perhaps at the expense of a wider and more varied set
of services that are out there.
Like, I don't think that we should––so the great thing about the oversight board is the analysis of all of these decisions that Facebook has made. The downside is that we end up thinking about trust and safety and content moderation only with regard to Facebook, when there are so many other products that have their own functions and features that I think need to be thought of in their own respect, instead of just thinking about, you know, the speech platforms in particular, and Meta in particular.
Kevin Frazier: Beyond
that, and I think something that you two have both either explicitly said or
hinted at, there's just the sheer scale problem of all of this, and the speed problem of having nine experts or 11 experts––or, you pick, 33––attempt to cover the entire span of humanity for a platform like Facebook or, as we'll get to in a minute, OpenAI with 700 million users. The idea that you'll know what's culturally appropriate and politically necessary or sensitive? It's just, you know, a farce to think that anyone can have their finger on the pulse of all of that. Which, as you were pointing out
earlier, Ravi, really raises the question of, just how far can the law go or
can explicit rules go in this context? And Ravi, I think a point that is useful
to bring out further here is what does progress even mean in this space?
What does it mean if we're doing a good job? In the trust and
safety world, what metrics can we say, all right, we've got our annual report
on trust and safety. We did the thing, guys, high five. Let's, let's go get a
drink.
Ravi Iyer: Yeah, I––so
companies will report their trust and safety metrics: what percentage of content are people seeing that violates some policy?
I am on the side of Arturo Béjar's work in this domain. Those are not very effective in terms of understanding user harm. And so I think if you want ground truth on user harm, you have to ask users. It's not perfect, but if a user––so, what percentage of users say they've been bullied, as opposed to what percentage of content violates a bullying and harassment policy?
If somebody says, you know, nice sweater to another person, it
doesn't violate any policy and you can't really tell if they're being sarcastic
or if they're being––you need to know so much about the history. And so if a
person says they've been bullied, then again, it's not a hundred percent, but
it's a lot closer to ground truth than platform metrics about violating policies.
And so anyone can do surveys, and so you're seeing regulators do this more and more. You see Australia's online experiences survey. The UK does this. You know, people are tracking over time, you know, what
percentage of kids have seen unwanted sexual content, what percentage of kids
have had an online sexual interaction.
Thorn does this across platforms. You can actually, you know,
hold some platforms accountable for having twice as much of that as other platforms. And I think that's the kind of thing we can use to, to measure progress, giving platforms credit when they have low numbers and also holding
platforms accountable when they have high numbers.
Kevin Frazier: Yeah,
and mapping that onto the AI space, I think, is going to be particularly
challenging, which we'll get to in one second. Because David, I also want to
highlight before we move fully into the realm of AI, how is AI impacting the
realm of trust and safety in the traditional trust and safety sense with
respect to social media?
David Sullivan: So, I
think that, you know, AI and automation, going back to very, very rudimentary automation, right, have been a part of trust and safety from the get-go. From the, like, oh, we have a problem with spam and we need to figure out how to solve that. So, AI as it's developed, you know, through sort
of machine learning classifiers and then moving into generative AI, these are
part of the tool set that trust and safety practitioners inside companies have
been using and are using.
And I think that one thing that's interesting is, as we look around for really compelling business use cases for generative AI products, trust and safety is one of the biggest ones in some ways. And it makes it very
interesting that, you know, the labs and companies that are developing, you
know, frontier models and that are also involved in all different places across
the AI supply chain are also users of AI in this space.
And I think that there is an important case for those folks who
are using AI as part of trust and safety to be a bigger part of the
conversation about making AI products as safe as possible.
Kevin Frazier: Yeah,
that's, it's such an interesting point too, because I think when we fail to
consider how trust and safety actually works and how content moderation
actually works, we may not be asking the right questions because I think a lot
of people, when they hear, even before we started calling it AI, as you noted
David, this has been around for a while.
We've leaned extensively on automated processes to be able to call out bad content, and content in particular that immediately raises red flags, like CSAM that is known and that's just proliferating. We pull it down as quickly as possible. And yet for the folks who insist, oh no, we need a
human in the loop.
The question that I've asked rooms full of people is, okay,
which of you is going to raise your hand to be that human in the loop? Who
wants to see the entirety of the nastiness of the internet? Not a lot of people
raise their hands.
Understandably so, and I think that's important because if we
can reduce the need for humans to be exposed to that sort of content––and we
know that it can result in serious mental distress and long-term mental
distress––we have to be asking, maybe AI can have an added benefit here.
Ravi, have you seen a sort of growing appreciation for the fact
that this “compared to what” question is really something that just has to be a part of the conversation?
Ravi Iyer: Yeah, I
mean, I agree. It has to be a part of the conversation.
There are two kinds of errors that you can make in moderation.
Like you can, you can be biased in a direction or another or you can be
inconsistent.
And I think once you realize that human beings are just not
going to agree as to what is harmful, then you realize like the bias question
is kind of just in some ways arbitrary, like there's no real way to fix that.
But human beings are always going to be inconsistent. Whereas AI, so AI is not going
to fix your bias problem because someone's going to have to figure out like,
what is the line?
And, and there is no real line for many things that you can draw, but it will fix the consistency problem. Like, so whatever you do, it'll at least be consistent. And, and people will know the rules, and so you won't get arbitrary decisions in one case versus another.
Kevin Frazier: Yeah,
and this is where I hope we continue to have a more nuanced conversation about
trust and safety, of just taking a holistic picture of who's involved in this
process, who are we calling on to be involved in this process?
And to your point earlier, Ravi, how can we have clear metrics
about what's working and what's not? Because that can help inform the
conversation rather than just insisting on principles. Like, well, there must
always be a human in the loop, just not me. Nose goes for whoever gets stuck
here.
But David, now to, to shift a little bit, I mentioned earlier
in the intro that we've had what I refer to as a social media hangover. I think
you've seen in a lot of the AI debates a sense that we didn't get it right on
Facebook, we didn't get it right on Instagram.
People have read every Jonathan Haidt book. People have
listened to all the podcasts and they say, we need to act now. We got to
protect the kids. All very well intentioned. But before we kind of get into the
politics of that, can you just explain your perspective on what is distinct
about, for example, a user's interaction with an AI companion versus the use of
social media?
David Sullivan: Yeah,
I think that, and this really gets into the question of how folks are
attempting to deal with these issues through legislation, regulation, litigation,
you know, sort of independent of what companies are doing voluntarily to try to
deal with these challenges.
To me, when you––one of the hardest things about trying to
regulate in this space is trying to have precise definitions for what you are
regulating, right? And so, yes, there's a sense that, okay, we should, you know––in
the past, it would be good to not have social media services, you know, kind
of, doing things that are leading to, say, the unwanted sexual contact that Ravi mentioned earlier. And so a lot of the definitions that you see in laws and regulations, whether it's at the state level in the U.S. or looking around the world to different places that have enacted legislation, try to come up with some definition of social media.
That usually involves the sort of user-to-user sharing of content element, and in other cases laws are looking at search either as, like, something else to be regulated or as something to carve out because you don't
want to inhibit people's ability to use digital services to seek out and obtain
and access information.
To me AI companions and, and chatbots of different kinds really
fall in the middle between these two things. You know, there may be a social
component. We see new products coming out all the time with more social
components.
Kevin Frazier: All
the vibes. All the vibes, exactly.
David Sullivan: New,
new social media platforms driven by AI. But that's usually incidental, sort of; it's not the primary purpose, if we're trying to think about it that way.
And so yeah, I think that there is a, a challenge with how you
are trying to, you know, retrofit definitions that are looking at user-to-user
interaction or sort of search to address this new technology that, as you
mentioned before, is being used by, you know, hundreds of millions of people
around the world.
Kevin Frazier: Yeah,
and it's such a crazy challenge too because we know that of the 700 million
people, everyone's using it for different purposes. I think if we look at the
OpenAI user survey, which of course I would love to kick tires on, and I think
a lot of people need to do further investigation of it, but just a cursory
look, you'll see 1.9% of use cases are for AI companions or this sort of
emotional reliance.
And so that's where I think putting things in perspective, to
your point about definitions, David, we also have to consider just the actual
empirical uses of the tools and it's very improper to conflate AI companions
with AI all capitalized.
Right. You know, this is a subset use of a tool that is, you
know, a Swiss Army knife of infinite size. But Ravi, what, to you, stands out
as particularly distinct between social media and kind of these AI companion
tools?
Ravi Iyer: Yeah, so,
I think there's some things that are similar and there are things that are
distinct.
So, so I think both of these things are products that companies are going to try to get you to use more. And that's why you get these externalities, right? And so, you know, in social media, you know, optimizing for engagement, trying to build your network, has these externalities when now you're talking to people who are risky and you're seeing content that maybe you don't want, right?
Because it's more engaging. Similarly, in the AI world, you know, these, you know, you can ask––a friend of mine asked this product for some spreadsheet formulas and it, you know, told him what a great question he asked. And he was like, this thing is so cloying, right?
And so you see that sycophancy as a way to get you to use the product more, right? So I totally agree with you. Like the, the use cases are very different.
I just point out that, you know, 1.9% of 700 million is a lot
of people. So just because it's a small percentage of the people doesn't mean
we, we shouldn't actually care about it. And I think a lot of the regulation is
about staying in that informational realm. Like, there's a lot of research showing the benefits of these products for, you know, search-like use cases: I want to learn about a, a fin––there's not a lot of people who are trying to use it for companions. The product
may sort of veer into that realm because it's trying to get you to use it more.
And I think a lot of the regulation is to, to sort of put a wall there, like we
didn't for social media, where we want to keep it in this realm of utility.
You know, a lot of, something I think about a lot these days is
agency. Like, what are you just trying to use? How do we get to use these
things as tools? Like, I want to learn some fact, and how, how do I stop it from pushing me in a direction I don't want to go? Like, I'm not trying to make friends with
this product. Don't try to push me to do that unless I ask you. That's what I want
to do.
Kevin Frazier: Yeah,
and I think that's such an important point to call out Ravi, which is to say,
to go back earlier to the online experience surveys you kind of highlighted,
which is saying, look, we're grounding this in the expectations and experiences
of the user, and to what extent is this tool aligning with what they want it to address? Because my chief complaint, and I have many complaints unfortunately, but my chief complaint in, in this realm is with laws like AB 1064 in California, for instance, which is very much well-intentioned, calling on what they refer to as operators––basically any developer, deployer, agency, or company allowing for the diffusion of an AI companion to minors, for example––to prioritize quote unquote factual accuracy over the user's values, beliefs, or preferences.
Which in a vacuum, I think you can say, sure, that might sound
good or socially desirable, but also sometimes it's really nice to have a
companion that says, you know what, Kevin, you were right. Your sister is being
really mean. Or, you know what, Kevin, Santa is real and isn't that wonderful
for us all to celebrate. And these sorts of questions of what are you trying to
get out of the model and what do we want the model to do from a user's
perspective?
Rather than trying to, in my opinion, re-litigate the, the sort
of disinformation, misinformation, battles of the social media era and just map
it right onto AI, it’s a really tricky question, but I, I'm not sure we're
getting the sort of nuance here that we've been talking about in terms of prioritizing
user expectations.
And I, I love this point, Ravi, user agency of what is it you
want from the model and is it doing that to the fullest extent possible? But
David, what am, what am I missing here? What, what other, what other lessons do
you think we should or should not be learning from the trust and safety
evolution that we saw in the AI context?
David Sullivan: So, I
think one piece of the trust and safety evolution that I want to highlight, to do just a tiny bit of self-promotion: at the Digital Trust and Safety Partnership, we've articulated a framework of best practices.
It's now an international standard, ISO/IEC 25389––you can get that for free from ISO––that sets out five commitments around product development, governance, enforcement, improvement, and transparency, with
examples of best practices underneath that. Those examples of best practices
are articulated at bullet point length.
They are not super prescriptive. I don't think they say all the
things that Ravi would want them to say. But my view would be that if you use
those in a really robust and rigorous manner, you would get to a lot of the
outcomes that, you know, Ravi would be recommending based on his experience and
research and, and all of that.
When it comes to things like incorporating user experience and,
you know, incorporating the perspective of users and, and building features and
products that reflect those things as opposed to reflecting incentives that
might be problematic, I do think that there is, there is a lot we can draw on from there when it comes to, you know, turning to AI products.
To me, I think the other thing that I really want to highlight
here is, maybe, you know, because there's so much focus on the most general-purpose AI applications like a ChatGPT, as well as on the capabilities of
these, you know, frontier models and the new models that are coming out all the
time, we, we even just, you know, constantly talk about the model this, the
model that, and I think that is a distraction.
What we really should be talking about is the products. That's
where the rubber hits the road. That's where people are actually using these
things, whether it is in a new standalone, you know, AI companion or whether it
is you know, Gemini trying to, you know, Clippy its way into me using
generative AI in, you know, every respect across the Google workspace.
And that's where I think we can build better features, incentives, and mitigations in order to create safety that is downstream from,
you know, all of these conversations about the safety of the models. So I think
that's an important distinction that we can kind of hammer in on.
Kevin Frazier: Yeah,
and I love this focus too on the technology itself because I think that perhaps
in some state legislatures and perhaps in Congress, we're not always aware of
the full AI tech stack and what it means to intervene at which level of
development, deployment, and then actual application as we're talking about
here.
And if you talk to folks in the labs, they'll tell you just a
little bit of fine tuning can drastically change the nature of that model. And
so paying attention to where is the sort of least-cost-avoider––not to get way
into econ and public policy land, but where is the intervention going to have
the greatest impact at the lowest amount of cost to everyone else and other use cases? That is a difficult question, but it's a question we should be asking.
Ravi, it looks like you've got a response here. What, what's on
your mind?
Ravi Iyer: Yeah, I
mean, I do want to highlight that, with fine-tuning, you know, there is research suggesting that once a model has a capability and you do some fine-tuning to get rid of that capability, the capability still remains, right?
And so people can undo that. So it's not as robust. Just like in social media, it's not as robust to address things after the fact. It's always more robust to do things more upstream. And as far as I understand it, you know, there are things like sycophancy that would actually be better to address at a higher level.
So, you know, I, I guess I just want to say that I, I still
believe that we should, and, and I don't think these things are impossible. So
I do think that you can, you know, have models that have the same, you know, if
models are trying to give you what you expect based on the things they've been trained on, you know, but, but all those datasets have biases, right?
They're things that have been said online in certain ways, right? And so they don't reflect everything in human society. And so therefore these models say more positive things, potentially. Or maybe they don't give you negative feedback 'cause they're trained on, you know, what people say they prefer as an answer, right?
And whereas human beings, you know, if you ask me enough
questions about, you know, how to build a bomb, or, you know, about negative
things, like, my alarm bells go off, right? I, I tell other people about it. I know
when I'm outta my depth, right? And I don't answer things. And, and I go get
help. I, I alert somebody else who might be able to help.
And, and these models just don't do that. So they have like a
part of what we have as a human being and they don't have a lot of other parts
that keep us safe, and those are things that can be addressed at a more active
level.
Kevin Frazier: Yeah,
but I, I think it's also worth pointing out as you flag that in many of these
instances too, there's a question of just what are the other interventions that
we're perhaps not considering?
I think the rush to say the first response should be going to
the model versus acknowledging that, hey, these alternatives of flagging
content or notifying folks may be a, a different mechanism. I'm not sure it's mutually exclusive, but I think it's just worth pushing people. And so
with that in mind, I just got a text from Sam Altman and he said, David and
Ravi.
I need help. I need to know how to proceed. We just saw OpenAI,
arguably in response to a lot of the lawsuits that are being filed, has changed
its approach to the use of its tools by teens in particular. In a blog post,
Altman said that they were going to prioritize the safety of teens over freedom
and over privacy, being very explicit to say, look, we think we need these
safety protocols to, to take priority in that sense.
David, if you had one piece of advice or one thing, you very
much want Sam, or you pick the frontier AI company that's developing these
tools, what advice would you provide or what intervention would you really
insist on?
David Sullivan: So, I
think my intervention is going to be at the, at the level of like bureaucratic
politics inside the company, which is to say that, you know, Sam's probably got
a lot of people who are focused on AI safety at the, the sort of frontier model
level who are probably people thinking about a certain set of risks with a
certain background in training. There may be some, so we'll say
Kevin Frazier: AI
safety, capital S Safety, yeah. Yeah. Catastrophic risk. Existential risk.
Okay.
David Sullivan:
Exactly. Then you have responsible AI people and teams, right? Who are people
who are coming from the world of, almost, that other realm of, of AI, about predictive and automated decision making, right?
All of the bias and need for accountability and transparency.
And those people are coming at these things from a certain way and report to
certain people. And then you have trust and safety teams who, maybe
understandably, AI companies are like, we don't want to repeat the mistakes
that, you know, we've seen others make.
So having, you know, really, functions that are all about not
just, you know, sort of what the, the model does, but how you monitor and
enforce and improve and all of that, we don't need to worry as much about that.
We're just going to get things right from the get-go.
And I don't think that these teams are talking to each other
or given adequate kind of attention across the decision making that goes into
how these policies get rolled out. So I think you gotta break down those silos
and hopefully then get to some better outcomes first in terms of what kind of
policy decisions you're making, and then about how you actually monitor and
update those over time.
So that, that's my take on this.
Kevin Frazier: Sam,
Sam gave you a response, a thumbs up but he said he still wants to hear from
Ravi. So Ravi, what are you going to tell Sam here?
Ravi Iyer: Yeah, I'd
say the original sin of social media is believing that the more that people use the product, the more valuable it is, and that there's an infinite amount of product usage that is valuable. And people don't want that; they'll tell you this if you do surveys: like, people actually think they use these products too much.
They don't want, like, it's not in their interest, it's not
their aspiration to use these products more. And so insofar as Sam is
considering business models––and you see this, right? Advertising models, you
know, AI-generated videos and like a TikTok clone that are predicated on, we just want to get you to use these products more.
And that's how we're serving, you know, value, providing value to the community. I think you're inevitably going to create, make product decisions that are going to be harmful to users. You know, you're going to create a slightly more sycophantic product, you know, whatever you call it, because, you know, that's what gets people to use it more.
It's going to compliment people more. It's going to pretend to
be their friend more. Or it'll come up with some other new trick that I, I haven't defined yet, right? So, just getting out of the “let's try to get people to use those products more, and that is creating user value,” and really, you know, honestly being, you know, anchoring on the, the user value: what is a user trying to do, and how can I serve that?
They want to learn something and then they want to move on and
not use the product anymore. Let them do that. Don't try to hold onto them
through all sorts of tricks, and I think you'll be in a better place.
Kevin Frazier: And
this is where I think it's so critical that we have an ongoing social
conversation of just how do we think folks should be using these tools? In what
way should AI companions work themselves into our daily lives, and in
particular in the lives of children? Because that's a, it's a tricky question,
but it's one we have to be honest about so that we can go back and say, Hey,
this is what a good use of social... nah, see, I'm doing it myself. This is what a good use of AI
actually looks like. But with that, David, any final word?
David Sullivan: I
think that the one thing that I worry about, right, I, I share Ravi's message
to Sam. Like, I, I agree with that, but what I worry about is when that gets
translated, especially into legislation and regulation, that calcifies that down to just, you know, a, a checklist of things and be like, don't worry, we don't have infinite scroll, so everyone's going to be good. Right?
That, and so I think that's why there, there is a need to
constantly reexamine the choices that are made to make sure that, you know, you
are––that products are not leading to more harmful situations. And that can't
be like something that can be easily written down in a bill that's going to be
effective in, you know, six months or a year or five years from now. It's going
to be something that reflects, you know, something that's already outdated.
So we have to figure out how to, how to do that in a way that's
actually gonna get results that are future-proofed to the, the best extent
possible, which is always limited when it comes to, again, you know, products
that are interacting with humans and, and all the challenges that come with
that.
Kevin Frazier: David,
you're such a radical calling for empirically driven policy.
How dare you? That's just wild. Ravi, any final, final word
here?
Ravi Iyer: Yeah, I
mean, I, I guess maybe just a slight counterpoint: you know, just because we don't know everything doesn't mean we don't know something, right? And so there are some tangible patterns of these products. At the Neely Center, we have a design code for search chatbots that we're working on, which contains many of these patterns, the same patterns we use in psychology labs.
You know, if you're in a psychology lab and you want to
manufacture intimacy and friendship, you do certain things. You tell someone
how much you like them, you share stories about yourself. And these products
learn to do those things. So just because we don't know everything about the
future of these products doesn't mean that we don't know some things that are
harmful, that really aren't what users are asking for.
Like users want to learn new things from these products.
They're not really asking to be com––you know, to have them as a companion, and especially for kids, right? So I think that's a very different argument for
kids. So I do think that there are well-meaning and, and commonsense ideas
about how we could draw a line for kids that are worth considering now, even as
we have more to learn for the future.
Kevin Frazier: Well,
clearly David has one more final thing to say.
David Sullivan: I, I agree
with that. Very much. But I also, you know, sort of, I'm thinking about like if
you look at something like the Kids Online Safety Act, right? Which has been the thing that many people are like, if only we could pass the Kids Online Safety Act, that would take care of all of this.
But the Kids Online Safety Act, if you look at both the definitions of what kinds of platforms would be covered by that and what those platforms would have to do, it's not clear that if the Kids Online Safety Act
had been passed a year ago or two years ago, that it would be addressing the
challenges that we're seeing with kids and chatbots.
And so that's why we need to have––we need to, we need to be
thinking about this, these things. Yes, let's look to the science and let's
change the incentives. But let's also maintain flexibility and yeah, as you
said, Kevin, look for, for evidence to support these things going forward.
Kevin Frazier: I look
forward to having you both back on as we see how the evidence comes out and how
the laws get implemented. It's going to be a wild time. I think we can all
agree on that.
So thank you, Ravi, thank you, David for coming on. We'll have
to leave it there.
Ravi Iyer: Thanks.
David Sullivan:
Thanks very much.
Kevin Frazier: Scaling Laws is a joint production of Lawfare and
the University of Texas School of Law.
You can get an ad-free version of this and other Lawfare
podcasts by becoming a material supporter at our website,
lawfaremedia.org/support. You'll also get access to special events and other
content available only to our supporters. Please rate and review us wherever
you get your podcasts. Check out our written work at lawfaremedia.org. You can
also follow us on X and Bluesky.
This podcast was edited by Noam Osband of Goat Rodeo. Our music
is from ALIBI. As always, thanks for listening.
