Cybersecurity & Tech

Scaling Laws: AI Copyright Lawsuits with Pam Samuelson

Alan Z. Rozenshtein, Pamela Samuelson
Tuesday, September 16, 2025, 10:00 AM
Discussing the rapidly evolving legal landscape at the intersection of generative AI and copyright law. 

Published by The Lawfare Institute
in Cooperation With
Brookings

On today's Scaling Laws episode, Alan Rozenshtein sat down with Pam Samuelson, the Richard M. Sherman Distinguished Professor of Law at the University of California, Berkeley, School of Law, to discuss the rapidly evolving legal landscape at the intersection of generative AI and copyright law. They dove into the recent district court rulings in lawsuits brought by authors against AI companies, including Bartz v. Anthropic and Kadrey v. Meta. They explored how different courts are treating the core questions of whether training AI models on copyrighted data is a transformative fair use and whether AI outputs create a “market dilution” effect that harms creators. They also touched on other key cases to watch and the role of the U.S. Copyright Office in shaping the debate.

Mentioned in this episode:

This episode ran on the Lawfare Daily podcast feed as the Sept. 19 episode.

To receive ad-free podcasts, become a Lawfare Material Supporter at www.patreon.com/lawfare. You can also support Lawfare by making a one-time donation at https://givebutter.com/lawfare-institute.

Click the button below to view a transcript of this podcast. Please note that the transcript was auto-generated and may contain errors.

 

Transcript

[Intro]

Kevin Frazier: It’s the Lawfare Podcast. I'm Kevin Frazier, the AI Innovation and Law Fellow at the University of Texas School of Law, and a senior editor at Lawfare. Today we're bringing you something a little different. It's an episode from our new podcast series, Scaling Laws.

Scaling Laws is a creation of Lawfare and Texas Law. It has a pretty simple aim, but a huge mission. We cover the most important AI and law policy questions that are top of mind for everyone from Sam Altman, to senators on the Hill, to folks like you. We dive deep into the weeds of new laws, various proposals, and what the labs are up to, to make sure you're up to date on the rules and regulations, standards, and ideas that are shaping the future of this pivotal technology.

If that sounds like something you're gonna be interested in, and our hunch is it is, you can find Scaling Laws wherever you subscribe to podcasts. You can also follow us on X and Bluesky. Thank you.

Alan Rozenshtein: When the AI overlords take over, what are you most excited about?

Kevin Frazier: It's, it's not crazy, it's just smart.

Alan Rozenshtein: And just this year, in the first six months, there have been something like a thousand laws.

Kevin Frazier: Who's actually building the scaffolding around how it's gonna work? How everyday folks are gonna use it?

Alan Rozenshtein: AI only works if society lets it work.

Kevin Frazier: There are so many questions that have to be figured out, and nobody came to my bonus class. Let's enforce the rules of the road.

Alan Rozenshtein: Welcome to Scaling Laws, a podcast from Lawfare and the University of Texas School of Law that explores the intersection of AI, law, and policy. I'm Alan Rozenshtein, associate professor of law at the University of Minnesota and research director at Lawfare. Today I'm talking to Pam Samuelson, the Richard M. Sherman distinguished professor of law at the University of California Berkeley School of Law.

We discussed the flurry of recent court cases defining the future of AI and copyright, including the pivotal District Court rulings in the lawsuits against Anthropic and Meta, the emerging legal theory of market dilution, and the controversial report on AI and fair use by the US Copyright Office.

You can reach us at scalinglaws@lawfaremedia.org, and we hope you enjoy the show.

Pam Samuelson, welcome to Scaling Laws.

Pam Samuelson: Thanks very much. Glad to be here.

Alan Rozenshtein: So we are at, I think, a pretty pivotal moment for AI and copyright, and I'm delighted to have you on. I can't imagine a better guest to talk about these issues with. We have a lot to talk about in terms of cases and reports from the Copyright Office and policy recommendations.

But I think conceptually, at the heart of all of this is of course the idea of copyright, and in particular what's often called the fair use defense. And so, for those of our listeners for whom it may have been a few years since they took IP, or who have never taken it, just give a kind of high-level overview of why, when it comes to talking about LLMs and other generative AI systems, so much of the discussion is about potential copyright infringement.

Pam Samuelson: Well, copyright law gives authors of all manner of works of authorship a set of exclusive rights. That means a right to exclude other people from doing certain things that you don't approve of. And one of those exclusive rights is to reproduce the work in copies.

And so the reproduction right is one of the most powerful tools that copyright owners have to be able to control what people do with copies of their works. And there's a limitation that the statute recognizes when people make fair uses then even if it was a prima facie infringement, it's not an actual infringement because fair use defenses actually prevail in some cases.

And so that's why these issues are so important. If someone is, for example, using in-copyright works as training data for a generative AI model, then they are making reproductions of those works.

And they may think that it's a fair use, but there's a prima facie infringement just because copies are made both to collect the works in a dataset and then to make copies of it during the training process.

Alan Rozenshtein: So there are these four fair use factors. They're in the statute, and they are a little bit cryptic as to what they mean. My sense is that a lot of the debate in specific copyright cases comes down to, on the one hand, how transformative, as it's called, the work is, how much it differs from the copyrighted work that's at issue.

And then on the other hand––and I should say, the more transformative it is, the more likely it is that it's gonna be fair use––the more that the work in question can substitute for or harm the market for the copyrighted work, the less likely it is to be a fair use.

Is that––again, a huge oversimplification, but is that a rough cut at how courts try to think about these cases on a case-by-case basis?

Pam Samuelson: Yes. That's a really good summary. In general, the purpose of the defendant's use and what effect the challenged use has on the market are the two most significant factors.

It depends on the type of case. Sometimes even in non-transformative use cases, for example, making copies of television programs to watch 'em at a later time. That's a private non-commercial copy that the Supreme Court has said is fair use. And so it doesn't have to be transformative in order to be a fair use.

But when it is transformative, generally speaking, it's less likely to supplant demand for the original. And what copyright cares about more than anything is market substitution, right? So if the defendant's work is able to be in the market at the same time as the plaintiff's work, some people might buy the defendant's work instead of the plaintiff's work.

Even though what was valuable about the defendant's work really came from the plaintiff. And so it's not fair.

Alan Rozenshtein: And just to emphasize that last point, which I think is really important, because different parts of IP have different justifications. And correct me if I'm wrong, but the justification for copyright, at least in the United States, and I know the Europeans have a somewhat different conception of this, is that you want to incentivize people to create more work, and therefore you have to give them some sort of economic benefit, which is the copyright's time-limited monopoly. Which is essentially what it is.

And so you know, when we think about copyright, what we're trying to do is trying to maximize the sort of productive incentive for creators to create new work. Right?

And the reason I'm emphasizing this is because I think that hanging over all of these AI debates is this kind of nightmare scenario where AI essentially displaces all of this incentive to create new copyrighted work, 'cause you can't make money anymore because ChatGPT can do everything you can. And then, you know, whether that's morally good or bad for the authors, there's just a practical case that then there's no more raw material. So in a sense, you know, the well runs dry.

And this, to me at least, seems like the kind of worst-case scenario that a lot of people have in mind. I'm curious what you think about that framing.

Pam Samuelson: Well, one thing I'll say is that the only difference I have with what you just said is that the Supreme Court has repeatedly said that the primary beneficiary of copyright is actually the public, that the Constitution says that Congress is given the power to create laws like copyright to promote the progress of science, right?

And so some part of what we're trying to think about when we're thinking about fair use is whether or not allowing the defendant to do what the defendant has done is going to be, overall, in the public benefit. Or whether it's not. And obviously, if it ruins the market for the work, then that's gonna be a problem.

And we'll come back, I think, probably in a little while, to what is now called market dilution, which is this fear that if generative AI can essentially output books on mushrooms and on gardening and romance novels, then nobody who might otherwise write a book about mushrooms or gardens or romance novels will do it anymore.

I think that's a question that the courts are being faced with. So far, that hasn't happened. And whether it actually will is something that's, I think, got to be empirically tested, just because it kind of hits us where we live. It's also important to recognize that automation has changed a lot of jobs, right?

A lot of skilled people don't have jobs anymore because automation actually made it simpler to do the work. And so, you know, if you look at the studies that have been done about what AI is gonna do to the labor market, authors aren't at the top of the list.

Alan Rozenshtein: Fair. Before we get into the case, I wanna talk about, I wanna do one more bit of stage-setting, and that is to briefly discuss the last major Supreme Court statement on copyright, which is this 2023 case of Warhol v. Goldsmith.

And my sense, again, I'm not a copyright scholar, but my sense was that this was a reasonably big deal in the law, and that the Supreme Court seems to have simultaneously strengthened copyright in a meaningful way and also injected a decent amount of uncertainty into how to actually do the fair use analysis.

So if you wouldn't mind just giving sort of an overview of the Warhol case, because I think those issues are gonna be very helpful as we discuss the specific district court cases in a few minutes.

Pam Samuelson: Sure. So. Back in the 1980s, Lynn Goldsmith took a photograph or a series of photographs of Prince the singer. He was then kind of just starting to become famous.

And Vanity Fair decided that it wanted to publish a story about his rise to fame. So it contacted agents for photographers and asked for pictures of Prince. And Goldsmith had one that Vanity Fair liked, so it made an arrangement called an artist reference license.

And what that means is that the agent for Lynn Goldsmith agreed to allow Vanity Fair to use this one photograph for the purpose of having a third person actually make a work of art based on the photograph. And so Vanity Fair commissioned Andy Warhol to do it.

Andy Warhol made a purple Prince print that Vanity Fair used in the magazine. And then years later, when Prince died, Conde Nast decided that it wanted to do a special commemorative issue on Prince. And so it contacted the Warhol Foundation about licensing probably that same image for the commemorative issue, and it ended up doing a different one.

So it turns out that Warhol made more than one work based on the photograph. And they liked the orange print, and so they put the orange print on the front cover. And then Lynn Goldsmith, this is like now 2016, sees it, and she says, oh my God, that's based on one of my photographs.

And then she contacted the Warhol Foundation and said, that's an infringement of my copyright and you owe me money. And the Warhol Foundation said, Hmm. I think it's a fair use. And so they were so confident that it was a fair use that they sought a declaration from a court of non-infringement.

And the trial court agreed. The trial court said it's transformative because it has a different purpose and is a different image, and they've changed a whole bunch of things. And also, they compete in different markets, right? She competes only in the magazine article market, and Warhol's stuff goes on museum walls and stuff like that. Okay.

Alan Rozenshtein: So, just so I understand, because I think it's an important nuance: what it sounds like Warhol was arguing––and I think this will be relevant when we get to LLMs––is that even the original reference license agreement would've been unnecessary. Under that logic, Warhol could have just seen Lynn Goldsmith's photo, and even without a reference license converted it into some Warhol purple Prince.

Is that the import of that argument, and the implications of it?

Pam Samuelson: Well, it's certainly one of the arguments. I would say that the fact that there was a reference license means that he was authorized to create that particular work. And so, to me, that actually is part of the fair use argument. And my colleague Jessica Silbey actually says that the general understanding of an artist reference license is, once you get that, you know, then you have your own work of art.

And so the trial court took that into account. The court of appeals did not take it into account, and the issue just didn't come up in the Supreme Court. So I factor the fact that it was an artist reference license into my fair use analysis in that particular case. But appropriation art has in fact long treated photographs and other works as raw material for ongoing creation.

And so appropriation art was kind of at issue in the Warhol case. Now, what's different about the Warhol case when it was at the Supreme Court, as opposed to when it was at the Second Circuit and at the trial court, is that the parties had been arguing about whether it was fair use when Warhol created the print series in 1984.

And before the Supreme Court, the solicitor general said: Supreme Court, you don't have to think about 1984. Just think about that one commercial license that the foundation gave to Conde Nast. They got $10,000 for that, and Lynn Goldsmith didn't get anything. And the question is whether that's fair use, whether that was transformative.

And the Supreme Court said no. The Orange Prince was being licensed as a magazine illustration. That's kind of what Lynn Goldsmith does. So they had the same purpose, and they were commercial. And so the Court decided that that actually was a substitution for a legitimate market that belonged to Goldsmith.

Alan Rozenshtein: So it's always hard––and I always tell this to my students when I teach them a new case and they say, okay, what is the rule of the case? And I say, we don't know. The case is too new. Ask me in 10 years. Right? You never really know what a case stands for until the Supreme Court in some future time looks back and says, this is what we meant in this case.

So I'm asking you to prognosticate a little bit, but is it fair to say that, as the kids say, the vibes after this opinion are at least that the courts are going to really squint quite hard and look for any potential substitution effect? And that will weigh reasonably heavily on the fair use analysis, because I suspect that, you know, again, just thinking about these AI cases, conceptually the concept of transformativeness seems to me a very, very tricky question.

It has all these philosophical questions, I think, baked in, whereas the question of substitutability, market effects, market dilution––it's obviously a complicated empirical question, but it feels a little more tractable for a, for a court to decide.

And if that's the case, then the logic of a case like Warhol v. Goldsmith would suggest that it's gonna be marginally harder––we'll see how much harder––for the AI companies to say, no, no, we're doing something totally new, therefore we don't even have to worry about the economic effects on these creators whose work we are alleged to have infringed upon.

Pam Samuelson: So there are people who think that Warhol v. Goldsmith transformed copyright fair use doctrine, and that many fewer things will be fair uses in the future. I think that's baloney.

And I've actually been reading a lot of the cases that have been decided since then. And there's one case, decided I think by the Tenth Circuit, where Netflix had used part of a video of a funeral ceremony in a documentary about the Tiger King.

And the judge said, you didn't comment on it, you didn't criticize it, and therefore it's an infringement, not a fair use. And then Netflix came back and asked for reconsideration. And the court sort of said, oh, I guess Warhol doesn't mean that after all. And so it withdrew the opinion that said, if you don't criticize, if you don't comment, it's not fair use.

That's not right. Documentary films have been able to use snippets of things for many years, and I don't see that changing in the aftermath of the Goldsmith case. Another thing to realize is that the Supreme Court actually gave several examples of uses that the Warhol Foundation could make without Goldsmith's permission, such as hanging the print on a wall, such as licensing it for a book about Warhol's art.

That's not a market she's in, and therefore there's no market substitution.

Alan Rozenshtein: All right. I think that's good for background. Now let's jump into the, the recent spate of cases. So the first one I'll start with is Bartz v. Anthropic, which I think it's fair to say is probably the case that has gotten the most attention.

Here, you have three book authors, representing a much larger class, who allege that Anthropic, which is the company behind the popular Claude chatbot and LLM, used their materials unlawfully, in an infringing way, in the training of its models. This is a complicated case because it involves not just the ingesting of these books; many of these books apparently had been downloaded illegally from internet piracy websites, these big databases, and Anthropic just used those.

So there are lots of moving parts in this. So let's start with an overview of how Judge Alsup kind of split the difference between these. And then I'm curious whether you think in the end this is a win for Anthropic or a loss, because it's a bit of a split verdict. And I think there've been a lot of different perspectives on who quote unquote “won” in this district court opinion.

Pam Samuelson: So I think it's fair to say that a really important part of the ruling was that using in-copyright works as training data for constructing a model for a generative AI system, that's fair use because––

Alan Rozenshtein: It's transformative.

Pam Samuelson: Highly transformative. You took the whole thing, but what you did with it, you were using it for non-expressive purposes, right?

You were trying to construct a model rather than consuming the expression as you would if you were just reading the book. And although the authors wanted to say, you didn't get a license from me, the judge said, this is a market that authors have no right to control.

So it's important that there have been a number of cases, including Supreme Court cases, in which the courts say that copyright owners don't have an entitlement to control transformative markets. And Judge Alsup considered this to be a transformative market, and something that the authors didn't have any right to control.

Alan Rozenshtein: So let's actually––I wanna stand on this for a second, 'cause I think the transformative market point is important. So an analogy might be: you know, I'm a law professor. I read someone's book about law. I learn about the law, I get some ideas, I read some other people's books, and then I go and become a very successful lecturer.

Right? The idea here is that the author of the book that I used, you know, maybe it would be professionally bad of me not to cite him, maybe all these sorts of issues, but he or she does not have an entitlement to control the market for my lectures, right? That's the kind of analogy that's going on here.

Is that right?

Pam Samuelson: Yeah. As, again, as long as there is not a use of a substantial amount of the expression from the original, right? If you just recite directly from the book, then that might raise a different issue.

Alan Rozenshtein: But yeah, because then of course the market is for the book itself in a sense. I'm reciting the book.

Pam Samuelson: Right. Then that's consuming the book for its expressiveness. So Judge Alsup also said that when Anthropic went out and bought books and scanned the books, and then threw the physical books away but kept the contents in a database, that was transformative and fair use also. So those two things have been controversial.

But Judge Alsup felt comfortable with both the use of digital books that were lawfully made available and the scanning of purchased print books to use as training data. He thought that was just fine.

Alan Rozenshtein: And so, yeah. And so we'll get to the illegal books part in a second.

But I do wanna focus on this, because my sense––but again, I'm not the IP expert, so I really want your read on this––is that this is a massive deal, right? It is about as big a deal as it gets, right?

Because this logic really is kind of everything that the LLM companies have been asking for. I mean, their argument has always been: look, our models are learning in, you know, a way not exactly like humans learn––this is a kind of alien intelligence––but for purposes of copyright law, certainly analogically to how humans learn.

And copyright has never been thought to restrict the ability of people to learn from copyrighted material and then do stuff with that. So, you know, obviously, and I should have sort of emphasized this at the beginning, these are district court opinions. There's a long, long row to hoe here.

But if this becomes the law of the land, I mean, this is kind of the whole ball game, or this is a lot of the ball game for AI and copyright. Is, is that a fair statement?

Pam Samuelson: Yeah. I mean, it's important to sort of understand that the European Union has also adopted a text- and data-mining exception, so that at least nonprofit entities can use any in-copyright material that they lawfully acquire to engage in text and data mining.

And the European officials have said that training-data uses of works fall within this exception. And even profit-making entities can make text- and data-mining copies, although they have to give copyright owners the opportunity to opt out.

So, you know, we got started with the fair use conversation, but other countries have recognized that there is a legitimate interest in being able to use works for what are often called non-expressive or non-consumptive uses.

Because, you know, when you do text and data mining, you actually learn things that you can't learn any other way. So again, thinking back to what the constitutional purpose of copyright is: it is to promote knowledge, right?

And so there's this notion that what these generative AI companies are doing is promoting more access to knowledge. That's one of the things that they're gonna say.

Now, of course, from the standpoint of the authors, it's: that's fine with me, as long as you pay me. And that's gonna be the sort of thing where we'll see how it plays out.

But, but Judge Alsup didn't think that was necessary.

Alan Rozenshtein: Going back to the Warhol opinion for a second: is this kind of reasoning consistent with Warhol? And specifically in the following sense: it seems that what Judge Alsup did in this opinion was really focus on the question of transformativeness.

And he said, look, you know, books and neural nets, they're just too different, right? And so I'm just not gonna go down the question of market substitution. I'm just gonna say, these are so different that it's just too bad for the authors. Is that consistent with Warhol?

I mean, or in other words––and the other case we're gonna talk about is a little bit like that, actually––if you had a case in which the authors could show: sure, the LLM is doing something totally transformative, fine, fair enough, but the effect is going to be that no one's ever gonna buy my book again because they can just go to ChatGPT and get all the information that my book initially provided.

You can imagine that in the case of a textbook, you know. No one's gonna go to ChatGPT to read a novel. That's not how it works. But you can imagine going to ChatGPT to learn calculus and no longer reading calculus textbooks. The idea is that, well, if it's transformative enough, we don't even look at the economic effects.

Is that consistent with Warhol as you understand it?

Pam Samuelson: So again, the way that plaintiffs are looking at this is that you should think about the generative AI system in a more holistic way, right?

Yeah, we're talking about training data, but if you think about it holistically, then here's my work, which has this particular purpose, and then here's the model in between, and then here's the output.

And if the output essentially will satisfy demand for my work over here, then in fact it's a competing substitute. Now, what's hard about this for copyright purposes is that, generally speaking, in order for something to be an infringing copy or an infringing derivative work, it has to have substantial similarity in expression to the work that it came from.

So what is different about this is the notion that something over here, the output, that is not substantially similar in expression but that has a lot of the same information––well, copyright doesn't protect information. So, you know, it would be, shall I say, completely weird to say that this output over here is an infringement of this particular input when it says things completely differently. It just uses the information and the ideas.

Alan Rozenshtein: So, we've talked about, I think, the core of Alsup's ruling, but there's also the part about the fact that much of the Anthropic dataset was acquired illegally. It was downloaded from these pirated databases, and here, Judge Alsup did not think that was fair use. And he ruled against Anthropic there.

And then recently––though at the time that we're recording this, this is all still pending––Anthropic and the plaintiffs in the class have come to a proposed settlement. I believe it's about $1.5 billion.

I read somewhere that's about $3,000 per book, which is substantially less than the sort of statutory damages that are available, but, you know, substantially more than nothing.

But the judge seemed, at least in a preliminary hearing about this, very unhappy with it. So we'll see. By the time we release this episode, maybe he will have ruled on it.

But I'm curious about, kind of, two things. One, what you make of his quite critical comments, at least initially, about the first draft of the settlement. And two, whether this really matters. I mean, obviously, a billion and a half dollars is a lot of money to me. But, you know, Anthropic––I think its most recent valuation is like north of $150 billion, and the AI market in general is probably trillions of dollars at this point.

So, you know, these things seem like rounding errors. And I wonder, you know, in the grand scheme of AI and copyright, what your view on that is.

Pam Samuelson: Well, first of all, the other case that was decided within days of Judge Alsup's decision was Kadrey v. Meta.

Pretty much the same claims were being made against Meta as were being made against Anthropic. And there were pirated books in their dataset too. And Judge Chhabria just didn't care.

So there's this idea that if you use training data that comes from books that were downloaded from Books3 or some of the other shadow libraries, that necessarily is gonna taint the fair use defense.

I think that's an exaggeration, because Judge Chhabria basically said, you know, it doesn't cut either way. And from the standpoint of the people who were doing it, they were basically saying to themselves, look, we're not trying to exploit the expression, right? We're not gonna sell the book in competition with you.

We just want to extract knowledge from these works. And they thought that they were just doing research. Now, Judge Alsup had a very different view about these things. And what's different about this particular situation is that Judge Alsup certified a class that the plaintiffs and the defendants didn't ask for.

So he just did this on his own. And he certified the class as all of the legal or beneficial owners of copyright in books that had 1) been registered with the Copyright Office, and 2) an ISBN or an Amazon number associated with them. And the estimate is that about half a million books are within that class.

And then the question is sort of like, what do you do with $1.5 billion? Well, it's an exaggeration to say that authors are gonna get $3,000 each because the legal owners of many of the copyrights in that collection are the publishers.

And the authors may be beneficial owners, as in having a right to some royalties. They might share in this. But I think everybody also now kind of realizes that, oh my god, most of this money's gonna go to the publishers. And it's really the authors who brought this lawsuit, and what are the publishers doing in the lawsuit? They hadn't been in the lawsuit at all, right? So all of a sudden the publishers get a gazillion dollars, and the authors are gonna have to fight to get a share of it.

I think he's kind of been trying to push this thing along, right? The reason that this case didn't go up to the Ninth Circuit on the class certification and didn't go up on the fair use issue was because he said, I'm going to trial on the pirated books issue on December 1st. You better get ready.

And faced with the potential of a much bigger damage award at trial, a settlement actually sounded like a safe thing to do. And in some sense, it also puts Anthropic in a better competitive position than OpenAI. Because OpenAI has used pirated books too.

Alan Rozenshtein: This remedy question is interesting because as you point out, if you're trying to make authors whole or you're trying to help the authors, you may need to go quite a bit beyond $1.5 billion so that there's, you know, money left over.

But this then gets into the question of remedies. You've actually addressed this before: the last time you were on the podcast, and in this great piece you wrote for us a few years ago about remedies, you start getting into not just huge financial remedies, but also injunctive remedies and, you know, remedies to destroy data.

Realistically speaking, even if courts find that there has been copyright infringement in some of these cases, are they likely to impose the sorts of remedies that would be necessary to vindicate the injury, which is gonna be massive? Given that to do that would potentially mean destroying or really severely crippling these AI companies, which my sense is no one actually wants to do.

You know, at least by mid-2025, every judge has tried ChatGPT. Whether they like it or not, they realize it's a pretty big deal. I don't think any of them want to be the people that destroy the AI industry. So I'm sort of curious how you think the realpolitik of this is all gonna fall out.

Pam Samuelson: Yeah, I mean, one of the things that's interesting about Europe is that they have a very elaborate collecting-society culture.

And what I mean by that is that you can actually go to a collecting society, let's say in France or Germany, and get a license that covers, not all necessarily, but a big swath of the authors of a particular type of work.

And you get the license, and then basically you can do what you want with the works. You just have to pay something for it. So collective licensing is something that the Europeans are actually thinking about a lot in terms of generative AI. When the copyright directive that created the text- and data-mining exception was adopted, they didn't think about collective licensing.

But now that generative AI is here, there's a lot of talk about that. And so there are authors and author groups that would like to see collective licensing in the U.S. for things like the uses of works as training data. But, you know, when Anthropic hired an economic expert to think about the market effects of the use of the works as training data, he said, you know, you'd have to make transactions with millions and millions of authors, and the transaction costs just swamp that. And so you can't really do it.

And so even though there are a number of organizations, including the Copyright Clearance Center, that want to offer a collective license for uses of works as training data, they don't have the rights to do that. Okay? So, you know, they'll issue you a license and they'll collect some money from you, but they don't really have a repertoire of all of the authors' works to make that license happen.

And the general view is that if there is a training-data right, it belongs to the authors. And that's why these lawsuits have been author lawsuits, not publisher lawsuits.

Alan Rozenshtein: Let's talk about the Meta case, because I think it's worth digging into a little bit more. As you said, it's basically the same case, but it comes out somewhat differently because, as you said, the judge doesn't really care about the pirated-data issue––that's not his concern––and ultimately rules for Meta, but does so, as far as I can tell, because he's quite dissatisfied with how the plaintiffs argued their case.

So just explain why this case is, if you agree with this summary, less of a victory for Meta than it was initially portrayed as. And that'll let us get into this question of market dilution, what it means, and why it might be important going forward.

Pam Samuelson: Yeah. So there were two arguments that Kadrey’s lawyers made to Judge Chhabria about the market harm that they saw.

One was that they thought there was going to be undercutting of the actual market for the books. And two was the market for licenses for uses of works as training data. And they focused all of their briefing and all of their discovery evidence on those two theories. And Judge Chhabria was persuaded that Meta had actually done a good job putting guardrails in its system to stop the recitation or regurgitation of expression from the books, and therefore that lost sales was not a viable claim of harm.

Alan Rozenshtein: Let me jump in here, because I think it's worth noting that this question of memorization is important because it's also at the heart of other copyright lawsuits.

So yes. You know, probably the highest-profile copyright case right now is New York Times v. OpenAI. And a lot of that case is about this memorization question: that, you know, if you prompt OpenAI's model in sort of the right way, it'll spit out large pieces of a New York Times article, which seems bad.

But that always did strike me as a little beside the point, because sure, that's bad and maybe you should be punished for that, but that's gonna be fixed. That doesn't strike me as the fundamental, interesting question about LLMs.

So, going back to the Meta case, you know, once we clear out this question of memorization, there's still this more interesting lurking question of, okay, what about the effect of the truly transformative use? And that's, please continue, sort of where the judge then says, you know, this has not been argued very well.

Pam Samuelson: Yeah. So lost sales goes out the window, because Meta has done the guardrails thing. And then there's the lost licensing fee.

Here's a place where Judge Chhabria agrees entirely with Judge Alsup that a licensing market for uses of works as training data is just a market that the plaintiffs don't have any right to control, period. So those were the two arguments that Kadrey's lawyers were making. That's the evidence that they had been able to produce.

And the judge said, well, what about this market dilution thing? If somebody wrote a book about mushrooms, nobody's gonna wanna buy their book about mushrooms anymore, because you can go to ChatGPT or to Meta's Llama and get a book about mushrooms if you want. Or, you know, you can ask Llama to write a book about mushrooms. And so it undercuts that market.

So that is, as I said a few minutes ago, a very novel theory: that something that uses information from existing works and puts it in somewhat different words may compete, but it's not a direct market substitute.

And what Judge Chhabria thinks is that there's indirect market substitution, and that that's enough to say that there's a market effect. And he agrees that this is a novel theory. And of course he's just speculating about this. He got really excited about this theory. But then if you don't produce any evidence of market dilution, it's just a supposition. That's not good enough.

So what he's basically doing is signaling to the other plaintiffs in the other cases, why don't you amend your complaint to raise this market dilution theory?

Alan Rozenshtein: So I think this subtle distinction may be lost on me, but if the copyright holders do not have an entitlement to the transformative market, why do they have an entitlement to avoid market dilution?

I feel like we're slicing the bologna real thin here, which, you know, we're lawyers, that's literally what we're paid to do. But I'll admit, I'm having a little trouble following this distinction.

Pam Samuelson: Well, I hope Judge Chhabria has a chance to, like, help us understand it, because it's not just me.

Okay, good. My reaction was exactly the same. You know, they say you can't control this particular market. And then during the colloquy with the lawyers, he was saying, well, you guys can actually, you'll figure out a way to get a license to do this. And it's like, no, I don't think so.

And you know, this is actually a reason why authors who really are worried about market dilution would want the models destroyed. Because, seriously, okay, say I'm like, I trained on, let's say, 2 billion books. Okay, there isn't such a thing in the world, but let's just say I did that.

How much does every single author of every single book get? And how would you figure out what's a fair compensation? Now, a colleague of mine in Europe basically says, look, let everybody train as much as they want. And then the AI companies should put a big bunch of money into a pot, and that pot should be used to subsidize authorship.

Alan Rozenshtein: Kind of like the National Endowment for the Humanities or something like that. Pigouvian, I'm gonna mispronounce the name, apologies, but it's like Pigouvian taxation. I mean, this is what you do, right? You know, the pie gets bigger, and then you tax the pie, and then you redistribute. Which, you know, is always the economists' favorite way of doing it, because it avoids a lot of deadweight loss and stuff like that.

Of course it tends to work better in theory than in practice, right? It usually does not in fact accomplish the goal. But yeah, I agree with you, that does seem like the most conceptually tractable way forward. Okay. So we have these cases on the West Coast, and there are other cases that are trickling up.

I do wanna get to some other stuff in the policy space. But before we move to that, to what extent can we read into these cases, or how much do we know right now, in the middle of 2025? It seems to me that the momentum is kind of on the side of the companies, but again, these are district court cases. So are you changing your priors on how the law's gonna shake out, or is it just too early to tell?

And we're gonna have this conversation in two years, when we have a couple of Ninth Circuit opinions and a Third Circuit opinion, and we're trying to figure out the appropriate vehicle for cert.

Pam Samuelson: Yeah, it's gonna be a while. I actually got in touch with some people to find out, you know, what's the state of play, and nobody's planning to make summary judgment motions in any of the other generative AI cases, so far as I can tell, until 2026. Now, there are two AI training-data cases pending right now, one before the Ninth Circuit and one before the Third Circuit. The Doe v. GitHub case is one in which there's no copyright infringement claim. But the claim is that GitHub's Copilot was basically trained on lots and lots, some 5 billion lines, of open-source code.

And then you could use Copilot to say, I'm writing a program about X, I need a function, I need some code that will do this particular task. And then Copilot will just generate something for you. And so Does 1, 2, 3, 4, and I think 5 now say that you removed copyright information, such as my open-source license, from the works that were used as training data, and you spit out code that's nearly verbatim of some of my code, and therefore the removal of copyright information is something which is illegal.

And so the Ninth Circuit is being asked to review a decision that said that if the code that Copilot produces is not identical to the code that it was trained on, then there's no violation of this copyright management information removal claim. Now, this is important because the original claim in the Doe v. GitHub case was violations of what's called Section 1202, the copyright management information law.

They asked for $9 billion for that, and that's actually a lot of money for removing copyright information. But it comes with a statutory damages minimum of $2,500 per violation. And if you think about how many copyright management information notices might be affected by this, we're talking really large numbers.
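To see why those numbers get large so fast, here is a quick sketch of the statutory-damages arithmetic. The $2,500-per-violation minimum is the figure from the discussion of Section 1202; the violation counts used below are purely hypothetical illustrations, not anything from the case record:

```python
# Section 1202 claims carry a statutory damages minimum of $2,500 per
# violation, so total exposure scales linearly with the violation count.
# The counts below are hypothetical, chosen only to show the scaling.
STATUTORY_MINIMUM = 2_500  # dollars per violation

def minimum_exposure(violations: int) -> int:
    """Lower bound on statutory damages for a given number of violations."""
    return violations * STATUTORY_MINIMUM

for count in (1_000, 1_000_000, 3_600_000):
    print(f"{count:>9,} violations -> at least ${minimum_exposure(count):,}")
```

At the statutory minimum, a $9 billion demand corresponds to 3.6 million violations (9,000,000,000 / 2,500), which is why a per-violation floor turns large training corpora into very large damages claims.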

Okay? So that's before the Ninth Circuit. Right now, I've got a brief in that particular case agreeing with GitHub that identical should be required. But that's one case. The one that's closer and more significant, especially for the generative AI cases that are pending on the East Coast, is the Thomson Reuters v. Ross Intelligence case. So Ross basically got several thousand headnotes from a vendor that it dealt with, and trained not a generative AI, but a model, on the headnotes.

And Thomson Reuters is the owner of the Westlaw database and claims that use of the headnotes as training data is infringement. And at first the judge, the trial court judge, said, you know, you guys are disputing about the facts, this needs to go to trial. And then he changed his mind and said, you know, send me another set of briefs. And so then he decided that it wasn't fair use, and he was very influenced in that by his reading of the Warhol decision, because he thought that the Ross Intelligence AI program was offering the same thing as the Thomson Reuters tool, and therefore they had the same purpose and they were commercial, and therefore there was harm to the market.

So that had a kind of substitutive effect in his view. Now, that judge actually said, you know, I'm not really sure about this.

It may be that we should let the appellate court take a look at it. So he's––

Alan Rozenshtein: Unsurprisingly, that judge was a former law professor himself. So, yes, you can see his law professor side all over those opinions. Yeah.

Pam Samuelson: So that case is now pending before the Third Circuit, and the trial judge had to certify it.

Yeah, as in: appellate court, please take this. And then the appellate court has to say yes. So that happened. And the case is in the middle of its briefing schedule right now.

Alan Rozenshtein: I think it also probably helps that the trial court judge was an appellate court judge sitting by designation. So he asked his colleagues for some help, and they agreed.

Before we close out, I do wanna make sure that I get your opinion on some stuff that's been happening in D.C., in particular the truly wild story of the U.S. Copyright Office and this draft report that they issued setting out some thoughts about fair use. I think the report was an interesting read.

It was somewhat controversial when it came out, but it became really controversial because shortly thereafter, President Trump fired the head of the Copyright Office, Shira Perlmutter. Though recently a federal court has actually halted that and reinstated her. There's a very interesting separation-of-powers removal question that we're not gonna get into, obviously.

But, and again, I'm not a copyright person, so perhaps it's not surprising, I had never heard of the Copyright Office; I did not know it was such a big deal. So just explain a little bit what the Copyright Office is, what their role is, whether these reports are binding.

Are they effectively just law review articles? What did they say? What is your read of all of this? Especially, and I should say I'll sneak this in as well, given that the administration's recent AI Action Plan actually says nothing about copyright, which was a notable omission.

Even though, in releasing it, President Trump in the press conference kind of riffed a little bit about how copyright was crazy and you couldn't possibly build these models if everything was copyrighted. So I gave you a lot, a big kind of D.C. stew of stuff. Just to close out, what do you make of all of this?

Pam Samuelson: One of the roles that the Copyright Office has is as an advisor to Congress. So the studies that it conducts are, for the most part, studies that some member of Congress asked them to write. So members of Congress know that there is this big AI thing going on out there, and who are they gonna ask for advice about it?

The Copyright Office. So the training-data report was the third in a series of reports about AI-related IP-type issues. And the office worked really, really hard on this. And they got 10,000 comments on the questions that they posed about AI.

And so sifting through all of those was not an easy thing to do. And the bottom line in the report that the office issued, just before she got fired, was basically that some of these uses may be fair uses and some of them may not. That's basically what they said.

So sometimes it might be transformative. The more researchy and educational the use, the more likely it's gonna be transformative; the more commercial it is, the less transformative. And that was the report that actually introduced the phrase market dilution, for the issue that you were raising earlier about who will ever write a book again if generative AI will just, like, flood the market with all kinds of other AI-generated stuff.

Now, you know, the AI-generated stuff can't be copyrighted. So you can't really make a lot of money if you can't control a copyright. So the stuff is in the public domain. So authors actually have some benefits from copyright in the human-authored works. So I think that they in fact would do pretty well.

But you know, there's a real serious empirical question about which, if any, sectors of the copyright industries will be harmed. And you know, the motion picture industry kind of likes generative AI 'cause it comes up with some pretty cool things. And they've been using computer-generated stuff in their movies forever.

You know, as long as there have been computers, they've been using computers to do some of the kind of scenes with space aliens and stuff like that. But back to the question of, like, what about Perlmutter? Well, Trump didn't have the authority to fire her. Only the Librarian of Congress can fire the Register of Copyrights.

And there isn't a Librarian of Congress right now, because Trump fired her, and he tried to put somebody in place as her successor. And that person has to be Senate-confirmed in order to be Librarian of Congress. And so we are at a stalemate right now. And so Shira prevailed at the D.C. Circuit, and we'll see whether the government decides to take that up to the Supreme Court.

Shadow docket's been getting a lot of work lately. And so maybe they'll take this one on too.

Alan Rozenshtein: I could spend the next two hours talking to you about copyright, but I think we're gonna have to leave it here. Thanks so much, Pam. We'll have to get you on at the next turn of the screw on all of these cases.

Really appreciate it.

Pam Samuelson: Okay, sounds great. Okay, thanks. Bye.

Kevin Frazier: Scaling Laws is a joint production of Lawfare and the University of Texas School of Law. You can get an ad-free version of this and other Lawfare podcasts by becoming a Lawfare Material Supporter at our website, lawfaremedia.org/support. You'll also get access to special events and other content available only to our supporters.

Please rate and review us wherever you get your podcasts. Check out our written work at lawfaremedia.org. You can also follow us on X and Bluesky and email us at scalinglaws@lawfaremedia.org. This podcast was edited by Jay Venables from Goat Rodeo. Our theme song is from ALIBI music. As always, thank you for listening.


Alan Z. Rozenshtein is an Associate Professor of Law at the University of Minnesota Law School, Research Director and Senior Editor at Lawfare, a Nonresident Senior Fellow at the Brookings Institution, and a Term Member of the Council on Foreign Relations. Previously, he served as an Attorney Advisor with the Office of Law and Policy in the National Security Division of the U.S. Department of Justice and a Special Assistant United States Attorney in the U.S. Attorney's Office for the District of Maryland. He also speaks and consults on technology policy matters.
Pamela Samuelson is the Richard M. Sherman Distinguished Professor of Law and Information at the University of California, Berkeley.

Subscribe to Lawfare