Scaling Laws: Rapid Response to the Implications of Claude's New Constitution
Jakub Kraus, a Tarbell Fellow at Lawfare, speaks with Alan Rozenshtein, Associate Professor of Law at the University of Minnesota and Research Director at Lawfare, and Kevin Frazier, the AI Innovation and Law Fellow at the University of Texas School of Law, a Senior Fellow at the Abundance Institute, and a Senior Editor at Lawfare, about Anthropic's newly released "constitution" for its AI model, Claude.
The conversation covers the lengthy document's principles and underlying philosophical views, what these reveal about Anthropic's approach to AI development, how market forces are shaping the AI industry, and the weighty question of whether an AI model might ever be a conscious or morally relevant being.
Mentioned in this episode:
- Kevin Frazier, "Interpreting Claude's Constitution," Lawfare
- Alan Rozenshtein, "The Moral Education of an Alien Mind," Lawfare
Find Scaling Laws on the Lawfare website, and subscribe to never miss an episode.
To receive ad-free podcasts, become a Lawfare Material Supporter at www.patreon.com/lawfare. You can also support Lawfare by making a one-time donation at https://givebutter.com/lawfare-institute.
Click the button below to view a transcript of this podcast. Please note that the transcript was auto-generated and may contain errors.
Transcript
[Intro]
Alan Rozenshtein: It
is the Lawfare Podcast. I'm Alan Rozenshtein, associate professor of law
at the University of Minnesota, and a senior editor and research director at Lawfare.
Today we're bringing you something a little different, an episode from our new
podcast series, Scaling Laws. It's a creation of Lawfare and the
University of Texas School of Law where we're tackling the most important AI
and policy questions from new legislation on Capitol Hill to the latest
breakthroughs that are happening in the labs.
We cut through the hype to get you up to speed on the rules,
standards, and ideas shaping the future of this pivotal technology. If you
enjoy this episode, you can find and subscribe to Scaling Laws wherever you get
your podcasts. And follow us on X and Bluesky. Thanks for listening.
When the AI overlords take over, what are you most excited
about?
Kevin Frazier: It's
not crazy. It's just smart.
Alan Rozenshtein: And
just this year, in the first six months, there have been something like a
thousand laws.
Kevin Frazier: Who's
actually building the scaffolding around how it's gonna work, how everyday
folks are gonna use it.
Alan Rozenshtein: AI
only works if society lets it work.
Kevin Frazier: There
are so many questions that have to be figured out—and nobody came to my bonus class!
Let's enforce the rules of the road.
Jakub Kraus: Welcome
back to Scaling Laws, a podcast from Lawfare and the University of Texas
School of Law that explores the intersection of artificial intelligence, law,
and policy. I'm Jakub Kraus, a Tarbell Fellow at Lawfare, and today I'm
talking with Alan Rozenshtein, associate professor of law at the University of
Minnesota and research director at Lawfare.
And Kevin Frazier, the AI Innovation and Law Fellow at the
University of Texas School of Law, a senior fellow at the Abundance Institute
and a senior editor at Lawfare. Our focus is on Anthropic’s recently
released constitution for its AI model, Claude, which Alan and Kevin just wrote
about for Lawfare. We discuss the lengthy document's principles and
underlying philosophical views, what these reveal about Anthropic’s approach to
AI development, how market forces are shaping the AI industry, and the weighty
question of whether an AI model might ever be a conscious or morally relevant
being. You can reach us at scalinglaws@lawfaremedia.org, and we hope you enjoy
the show.
Alan and Kevin, thanks for coming on to talk about Claude's constitution.
Let's start with Alan. What were your initial impressions of the document, and
what is this, for listeners who are unfamiliar?
Alan Rozenshtein:
Yeah, I mean, my initial impression of the document was that it was very long.
It's 80 pages in PDF, I think it's like 22,000 words.
Which I mean, I'm a law professor, so that's my sweet spot for,
you know, law review articles. But I don't usually see things that long you
know, written by normies. Though maybe this is, maybe the idea that anything in
this world is written by normies is my first mistake.
So what is this? So, I guess stepping back, right? When these
models are trained, you basically start with what's called a pre-trained model,
which is basically a text prediction machine on the entire internet. And that
is kind of the core of all of these models' intellect, and we can put “intellect”
in scare quotes, but you know, their capabilities.
Obviously, the different models are different in how they're
trained, but because they're all at this point essentially training on the
entire internet and there is only one entire internet, the pre-trained versions
of these models are reasonably comparable. But pre-training is only the first
step, hence the pre-training.
After pre-training, there's a bunch of stuff that then happens
to move the model in a more useful direction according to however the
developer wants the model to behave. This is often called training or
post-training, and there are a million different components of it.
And as part of this, a kind of, and again we can put scare
quotes around this, I'm sure we can put scare quotes around this entire
conversation and—I'll just stop putting scare quotes around anything—is the
kind of model's personality and again, different developers have taken
different approaches, some more sophisticated, some more explicit than others.
And Anthropic in particular has taken, I think, a very deeply
interesting approach to taking these kind of raw pre-trained models and making
them into something useful. Anthropic calls this quote unquote “constitutional AI,”
and I think we're gonna probably spend a bunch of time, especially because, you
know, Kevin and I are sort of law professors and we have very specific ideas of
what the word constitutional means, whether that's the right word, and in what
way is this akin to a kind of traditional constitution, but basically trying to
embed various principles and judgments and heuristics and guides into these
models.
Now again, I think every developer that is making a sort of
useful chatbot is doing something like this, whatever they call this. But I
think Anthropic has done the most sophisticated thinking about this. So about I
think a year ago, Anthropic released an early version of what it called
Claude's constitution.
A relatively short document of I think like 20 or 30 kind of
high level principles. You know, be helpful, don't lie, that sort of thing, as
a kind of example of how it was training Claude to be useful and helpful and in
line with Anthropic’s values.
What Anthropic released earlier this week is the kind of full
version of its constitution. Again, this 80-page, 22,000-word document is
meant, I think, and here I should go into the technical details, but I guess it
is simultaneously meant as the document that Claude itself uses to guide its
behavior and also as an outward-facing document to the world as to what Claude
is doing.
Over the last couple of months, there was some indication that there was a
quote unquote “soul document”: someone had managed to get Claude to output what
seemed like a kind of constitution. And shortly after, Amanda Askell, who is
Anthropic’s sort of philosopher-in-chief and an actual PhD moral philosopher,
and who also is the prime author of this new constitution, went on X and
basically confirmed that yes, there is such a document, I don't think it's
necessarily called the soul document, but there is such a document that has
been used to train Claude, and this constitution that was just released is a
kind of cleaned up and somewhat expanded version of that document.
So, you know, Jakub, we can get in sort of whatever details you
want or we can sort of turn it over to Kevin. But basically this document is
meant to set out how Anthropic thinks about training Claude, how Claude
relates to Anthropic, to deployers, to users. And then the part that interests
me, personally the most, a kind of deeply interesting discussion of moral
philosophy and character formation as applied to magic sand.
Jakub Kraus: Yeah.
Kevin, I want to hear more of your thoughts relating to your piece on Claude's
constitution and its comparisons to the U.S. Constitution. But before that,
I'm interested if you have reactions to what Alan was saying
there, with this previously being called a soul document and the scare quotes
Alan is using as he talks. Anthropic is doing something a bit unusual compared
to the other labs by focusing on Claude as more than a tool, almost treating it
as human-like.
Do you think that's a fair direction to go in with this kind of
document? Is that a good direction?
Kevin Frazier: So
I'll start off by saying that I would definitely not categorize this as a
document that was crafted by normies. No offense to Alan's initial use of the
term and not to call them non-normies, or I'm not sure what the—
Alan Rozenshtein: I,
I mean, I've never met, I've never met Amanda personally.
I suspect she's the sort of person that would be offended if we
called her a normie. So, this is obviously with all due compliments to Amanda.
Like nothing about this is normie.
Kevin Frazier:
Amanda, when you listen, note that it was Alan who first alleged that you all
were normies, not me. So when we invite you to Scaling Laws to come explain
this document in even more detail, please be nice to me and mean to Alan.
That's general advice for all.
Alan Rozenshtein: No
I specifically said none of these are normies. None of these are normies. No
one's a normie here.
Kevin Frazier: Yeah, exactly.
Exactly. Sure.
So what's really important to point out is that Anthropic, from
the get-go in, maybe we'll call it the preamble, explaining the purpose of
this constitution, specifies that their approach to AI development regards
themselves as being on the vanguard of doing it safely, and they very much view
their company's mission as pursuing the frontier of AI, but doing it in a way
that they think better aligns with human values and the long-term success of
humanity, more so than other labs.
And so it's just really important to put this constitution in
the context of Anthropic’s underlying mission. Perhaps its national ambition,
if we're gonna move forward and carry forward this constitutional analogy.
And Jakub, as you said, and as Alan alluded to, it's impossible
not to also bring in some of these questions of consciousness and the extent to
which AI may be something greater than just bits of data and sophisticated
computer training. That is a topic that warrants, and it will receive,
incredible additional inquiry on Scaling Laws and by tons of other scholars and
by an interdisciplinary set of actors.
But it's worth noting that from the outset, Amanda in an
interview with Time referred to training a 6-year-old as sort of an analogy for
trying to train Claude. The idea that this 6-year-old can very much probe
whether you're being true or false or whether you're trying to deceive it or
whether you're trying to guide it in a certain direction, and also knowing that
internally, this may have been referred to as a soul document.
We just get a sense from the outset that this is a different
sort of relationship, in terms of AI developer to AI model, that Anthropic has
and perceives than perhaps we've seen from OpenAI or from Google or from other
labs. And so just getting that background I think is important.
The second is to flag that Anthropic has been among the more outwardly
supportive labs of AI regulation. So whereas some labs have come out with
respect to various state AI bills and said, that's a bridge too far, or we only
support this subject to quite substantial amendment, Anthropic has raised
its hand on more frequent occasions, saying we invite some degree of regulation.
So with all that said, I'm fascinated by this document for many
reasons, but first and foremost, because of its labeling as a constitution. And
when we talk about constitutions, these are documents from a legal standpoint
that are meant to set high overarching values for a legal system that guide
more structural decision making and subsequent areas of law.
Now, there are only four core values spelled out in this
constitution. The first is being broadly safe. The second is being broadly
ethical. The third is compliant with Anthropic’s guidelines, and the fourth is
genuinely helpful.
And each of those supersedes the other, so Claude must be
broadly safe before it's broadly ethical; broadly ethical before it's compliant
with Anthropic’s guidelines; and compliant with Anthropic’s guidelines before
being genuinely helpful, so on and so forth.
Alan Rozenshtein:
It's the four laws of robotics but for Claude.
Kevin Frazier: Yeah.
Asimov's, you know, forgotten fourth value. Exactly.
Jakub Kraus: Alan,
can you say what the four laws of robotics are, or Kevin?
Alan Rozenshtein: Oh
yeah. So I think there are only three laws of robotics but, so the famous
sci-fi author, Isaac Asimov put forward his famous three laws of robotics, and
oh my god, if I don't get this right, they're gonna take away my nerd card. But
the first law is, oh my god, the first law is—
Jakub Kraus: A robot
can't do no harm, right?
Alan Rozenshtein:
Yeah. A robot can't, can't do any harm.
Kevin Frazier: Don't
help him! Don't help him.
Alan Rozenshtein: The
second law—this is so bad. A robot can't do any harm. And then the second is
something, and then the third is a robot can't allow itself to be harmed. I,
this is bad. Just take away my nerd card.
It's so bad. But I mean, it's this, yeah.
Again, I think the content of the laws I think is less
important than the idea that from the very beginning of thinking about
robotics, there was this notion that you know, at the core, you're gonna need
some very basic kind of hierarchical list of things to do and not do.
And if you get those right, like the idea is if you can get
those right then a lot of, I mean, alignment—I mean, this was kind of what
Isaac Asimov was really thinking about before we called it alignment—a lot of
the kinda alignment problems take care of themselves. And of course,
inevitably, a lot of Asimov stories and Asimov-inspired stories are a kind of
monkey’s paw curl of, you know, the way that these laws, despite seeming obvious
and correct, misfire.
And so, you know, one could ask the same question about whether
these four laws of Claude you know, might similarly misfire—are they the right
laws? And I don't think, I don't think Anthropic would pretend to know the
answer.
But you gotta start somewhere.
Kevin Frazier: Well,
and just to flesh this out a little bit further too, there is a sort of
valence to constitutions that evokes a certain idea about the relationship
between who's creating it and the users or the folks subject to that
constitution, such that in some regards, I have problems with the use of the term
constitution here.
Because as we're talking about AI governance, there's a lot of
discussions about whether that regulation should be self-governance, some form
of multi-stakeholder approach among private actors, state driven, federally
driven, or even internationally driven.
And to use the word constitution evokes some degree of sort of
shared responsibility for both creating, crafting, and implementing a
constitution. And yet one important carve out that has to be mentioned, and
this was cited in a Time interview with a number of Anthropic individuals:
models deployed to the U.S. military, quote, “wouldn't necessarily be trained on
the same constitution,” end quote, according to an Anthropic spokesperson.
The Constitution of the United States applies to the entirety
of its functions. We don't have a carve out for, oh, well, except for
governance, or except for, excuse me, except for national—
Alan Rozenshtein:
Except for where it really matters, this constitution applies.
Kevin Frazier:
Exactly. So, the utter irony too is that some of the risks that folks more
concerned about AI safety will commonly raise are the use of weapons, for
example, the use of cyber-attacks, the kind of real offensive capabilities that
you would suspect would be core to what a defense department plans to use something like
Claude for.
So to have that carve out is somewhat problematic for me to
still use this term constitutionalism. And then the second kind of broad
concern here would be, again, constitutionalism implies a sort of social
contract, and yet how users are supposed to be a part of this contract is
unclear to me; whether they'll have any role in amending or revising
or helping ensure that this constitution is adhered to is left undefined.
Jakub Kraus: Do you
have any ideas on how that would happen? Should users submit a large feedback
form to Anthropic? Should Anthropic hire people to go interview Americans and
Ethiopians and everyone around the world? How does that work? Alan, I think in
your piece that's coming out, you pointed out that this is a pretty Western
document, and a lot of the authors come from a particular background and it
doesn't seem necessarily representative of the whole world, but yeah.
Kevin, how do you think we can get users more involved?
Kevin Frazier: That
is kind of presuming that I think users should be part of governing the model
training, which I'm not sure I agree with.
I will say from the outset, efforts to do sort of lowercase “d”
democratic governance of tech companies haven't worked very well. The best
example is Facebook, which for a little bit entertained the idea of kind of user
referenda on Facebook's values and bylaws or content moderation rules. I think
maybe it was like 0.05% of users actually participated in that voting
mechanism, and so it wasn't meaningful, and Facebook eventually abandoned it. I
similarly think that there would be some power users and folks of specific
mindsets and use cases of Claude that would dominate a sort of lowercase “d”
democratic process.
But again, I'm not even sure the use of democratic mechanisms
here makes sense, which is again, why I somewhat take issue with referring to
this as a constitution.
Alan Rozenshtein:
Yeah, I tend to agree with you, Kevin. You know, you mentioned the sort
of Meta example that didn't really work, and ironically with Meta,
the moment there was a, I forget what specific policy
issue, that threatened to actually get users to vote, that's immediately when Meta
said, yeah, I think we're done with this. So, yeah, I think the history of doing
sort of small “d” democratic processes doesn't work. What I think does work, and
you know, here I'm gonna out myself as usual as a neoliberal shill, is the
market mechanism.
There are lots of competitors, right? I mean, I think, you
know, there's constant discussion in Silicon Valley about, you know, are there
moats around, you know, do these companies have moats? And you know, it's an
interesting question. I'm not qualified to answer that, but I think in the
first instance, one quote unquote “moat,” or at least differentiator is the, for
lack of a better term, vibes of a particular model.
You know, I think one reason why Claude is so popular
especially among sort of Silicon Valley insiders, right? Why everyone uses
Claude Code and not Codex or Gemini, even though those models are, in some
senses actually better, right? They score higher on certain benchmarks, is
because, and I, this is true for me too, as someone who essentially lives in
Claude Code at this point, although, and I'm not a coder, I mean, 2% of it is coding.
The rest of it's just my life.
Jakub Kraus: What are
you doing living in Claude Code?
Alan Rozenshtein: Just, oh yeah. I mean, this, we can do, this is a separate
episode, but I mean, if you think of Claude Code more as an agent that sits on
your computer and can interact with folders and markdown files it's much more
of a knowledge work agent than it is a coding agent.
I mean, it's kind of optimized for code, but the vast, you
know, there's a huge overlap with knowledge work, so I find it extremely
helpful. But a huge reason why I like to use Claude and a lot of other people
like to use Claude is because the kind of ergonomics, the
vibes, are just really good.
And so I don't think you necessarily need a quote unquote
small “d” democratic process in a kind of Deweyan sense to have user input. Presumably
Anthropic is constantly doing market research on what its users like, and I
think it's actually done a very good job in figuring this out. And at least for
the moment, and we can talk about whether this will be true in the long term, the
incentives, I think, are quite aligned, both in terms of having Claude be a
quote unquote “good person,” whatever that means, again, there's a lot to
unpack there, and also Claude being an industry-leading model, at least for a
certain subset of users.
And I think this also then segues nicely into an answer to your
question, Jakub, about is this a sort of Western model and is that going to go
over well around the world? I think that it's a hundred percent a Western model.
It is a quote unquote WEIRD model, right? WEIRD being the acronym for Western, educated,
industrialized, rich, and democratic. I think that's
what the acronym stands for.
Kevin Frazier: You
remember all that and you can't remember, can't remember the three goddamn—
Alan Rozenshtein: The
three laws of robotics. It's terrible!
You know, there's a great book by the Harvard anthropologist,
Joseph Henrich called “The WEIRDest People in the World,” that's super, super
interesting about how sort of unusual, in particular kind of, Western, liberal,
democratic societies are.
I am a product of this society. I quite like this society,
right? I don't necessarily feel like I need to go out on a limb and say whether
it's objectively the best society. But I certainly prefer it to any other
society. So I have no problem with Claude being a very WEIRD, in that sense,
model. But I can also recognize that other societies and especially other
governments that don't share kind of Western liberal democratic values, may not
want this kind of model.
I think that's fine, right? And I think the market mechanism
will sort that out. And look, if, you know, Saudi Arabia, which is building
massive capacity both in terms of, you know, compute infrastructure and also
its own homegrown talent, wants to develop its own model, you know, if Saudi
Arabia wants to come up with, you know, its own version of an agent that it
thinks better reflects its own values, you know, that's not the
one I'm gonna use, but it's allowed to do that.
So, look, I think, and I wrote a piece about
this with some co-authors for Lawfare, I think, you know, a couple years
ago, back when Gemini was both crappy and woke and would do things like, you
know, give you multiracial Nazis when you asked for images of, you know, SS
soldiers, that there is no such thing as a quote unquote “neutral model.” Right,
you know, all models have choices baked into them.
And that doesn't mean some models aren't better than other
models. But I think the best thing that these developers can do is they
can just be honest about what kind of model they are putting forward. And I
think Anthropic, near the end of the document, is admirably honest.
When it says, look, we think this is the best model. That's why
we trained it in this way. We think it's the most ethical model. That's why we
trained it in this way. We're not taking a position on whether in some
universal objective sense, this is the right ethics, like that's not something
we can answer right now.
But, you know, we can't not make the best model we
want to make. This is the best model according to us. And if you disagree,
that's fine. There are other models. Go with God.
Jakub Kraus: I think
it's—I wanna push back a little bit. It seems like there's a notion we're
talking about now that, let the market decide, everyone's gonna have their own
constitution, it'll be great. But it strikes me that most of the other
companies haven't released a constitution yet, and there might sometimes be a
tension between a constitution that's good and a constitution that's making a
lot of profit. I think some people have complained about Claude being overly
prone to refusing responses out of a concern for ethics.
Sometimes the document, the constitution, talks about saying
users shouldn't always have their way if they're trying to do something bad. So
at first, I have a little hesitation on what might happen if we just let
everyone do whatever they want regarding constitutions; I think we might not get
constitutions.
And second, more generally, I wonder if there's any kind of
policy intersection here. We had Anthropic pioneer the responsible scaling
policy that sort of became an industry norm, and then California and New York are
trying to make that an industry-wide requirement. Is that a direction that
constitutions might go in?
If not, why not? Is there anything for policymakers to think
about regarding constitutions and the market dynamics of this?
Alan Rozenshtein:
Yeah, so let me tease out two different issues
here that I think are somewhat conflated in your point. So the one question
is do you need small “d” democratic governance from users to have models
reflect user preferences?
And I think the answer is just no. Right? You don't, and you
don't even need constitutions for that, 'cause remember, whether or not a
developer releases an 80-page document called a constitution written by a
PhD moral philosopher, right, which is like one extreme of how you can do
this,
all models have quote unquote “constitutions” in the sense that
all models have post-training, right, you know, whatever, RLHF
and a million other things that happen once you have created a next-token
predictor on the entire internet. So, some of those I will like, like Claude.
Some of those I will not like, I don't wanna use Grok, right? Like I have no
interest in using a model that has been designed by people who think that it's
okay to basically make non-consensual pornography of anyone publicly on the
internet.
Like, I don't trust that model. That's not the model that I
want to be using. That is a model with a constitution, that's a model with a
personality. And other people might like that. And so, to the extent that
you're trying to match users to models, like users will match to models just by
using them for a few hours and deciding whose vibes do I like more, right?
There's a separate question of is it a good world in which every
model developer can design whatever model that they want. That's an interesting
question, right? We can have a policy argument about that. We can have a legal
First Amendment argument about that. But if we as a society decide that we
don't want full freedom of model training, we want these models to have
certain guardrails, remember, these models, whatever constitutions they give
themselves, are embedded in something much more important, which is reality,
like the actual society in which they function, right? You know, sometimes
arguments about digital technology have this sort of unreal quality as if like
it's all in the cloud.
It's not in the cloud, it's on computers, and computers are in
places, right? And those places have jurisdictions and police forces and armies
and legislatures, right? You know, if at any moment a country wants to say, no,
you know, your models have to act in a certain way, they can just do that, see,
e.g., China. So that's a totally separate conversation, I think.
And I think the question that is sparked by the Claude constitution,
right, and maybe we should stop talking about it as a constitution,
I think it's actually honestly much more useful to talk about it as a soul
document, I think that's actually much more accurate than constitution, is,
you know, did it operate well for the purposes that Anthropic wanted it to
operate, which I think it did.
If you don't like those purposes, of course, then you might not
like the document itself.
Jakub Kraus: Kevin,
anything to add on that?
Kevin Frazier: Yeah,
I mean, just going more off of the idea of a market based and more dynamic
posture, I think one thing that stuck out to me is if we look at some of the
initial public policy concerns related to AI use, let's start with probably the
one that's top of mind for most state legislators and many members of the
public right now, which is AI companions.
We've seen rapid responses by the private labs reflecting the
fact that users don't want things that do bad things to their kids. Right?
That's just a pure market dynamic. There's not a huge interest in a consumer
saying, I am very pro tools that cause mental health concerns to my child, and
we're seeing labs respond to that market incentive, right?
OpenAI has already changed its policies. Character AI kicked
off minor users. We're seeing innovative new approaches; for example, OpenAI,
I believe, released yesterday, January 21st, a new mechanism for age
verification. So I see this as one of many options to try to signal to
consumers what the values and what the best use cases are of each model.
I think that this will get to many of the concerns some people
have about the alleged bias of different models. When I talk to people around
the country, oftentimes people still refer to the 2023 use of Gemini, when you
were getting Nazis of all races, for example, generated as a result of a system
prompt that encouraged more diversity in images and things like that. So I
think–
Alan Rozenshtein: United
Colors of Benetton set of Nazis as I like to think about it. Just, it's just
so, just very heartwarming, man. We all come together.
Kevin Frazier: That
was 2023. It's 2026, and folks are still indexing on something that is very
old. And so I've been outspoken and I've written about the fact that I would
love to see something akin to the MPA movie rating standards, where you can go
up and down an aisle at the rental store, actually, what
rental store is anyone going to? You can scroll on your phone and see, okay, is
this rated G? Is this rated PG-13, R, and so on and so forth, and quickly
understand what it is you're trying to get from that movie or what it is you're
trying to get from that model.
Perhaps my concern about this initial constitution is knowing
that Claude is being trained to be, quote, broadly safe, broadly ethical,
compliant with Anthropic guidelines, and genuinely helpful. Not to be too
trite, but that just doesn't really tell me anything, right? In terms of, if I'm
trying to be a savvy consumer, what is it that I'm actually looking for from
a model.
This version of a constitution to me is devoid of the
information that would actually help me be a more savvy AI consumer. And so I
think this is a great initial start, and I think that setting high level values
that inform how Claude's going to behave in novel situations and situations
that developers can't necessarily know is admirable and a step in the right
direction.
But I would push Anthropic and I would push all other labs to
think about what are the metrics, what's the sort of information
they can share that can actually make users more AI savvy and help them distinguish
between, oh, I want to use this model versus that model.
Jakub Kraus: So we're
talking a lot about consumer choice, which model they want to use. Claude has a
different texture. Its vibes are good. I wonder if either of you wants to
take a stab at defending the other AI companies here that aren't going Anthropic’s
route, to see if we can tease out what's unique about this constitution.
What are the benefits or costs of doing an approach like this
to a product? I guess I'm also still a little reserved about thinking
about the constitution purely as a way that they're making Claude so that
consumers can choose which one they want, because I think OpenAI is also trying
to do that, and xAI and Meta are trying to do that, and Google is trying to do
that, and they haven't really done a constitution in this way.
OpenAI has a model spec, which talks about how it wants
its models to behave. They certainly want their models to have good vibes in a
way that a lot of people will use. Anthropic has more of a business market, so
maybe the businesses like the vibes of Claude more than the
consumers who are using OpenAI.
But the only other document I've seen that's somewhat related
is Google put out a, here's our approach to Gemini, and they referred to it as
our approach to the Gemini app, and we wanna make a really good tool. And
there's a pretty stark contrast to Anthropic’s approach of thinking of Claude as
a kind of being, a human-like entity that needs training in its personality and
having a good personality.
So those are just a bunch of ideas I'm throwing out, but what
do you think? Why aren't all the labs then going to put out their constitutions?
My guess is it's because this constitution is a bit beyond character training
or making a good product that people want to use. It's more of a risk
management line of documents akin to the responsible scaling policy.
Alan Rozenshtein:
Well, wait, so I guess trying to understand the question: is your question, why
aren't other companies releasing 80-page highly philosophical treatises, or why
aren't other companies doing the sort of essentially virtue ethics? And if you
want, we can sort of get into what I think is quite philosophically interesting
about this document.
The kind of virtue ethics based training of their models
relative to some other form of training. So is the question about the document
or the actual substance of what the companies are doing?
Jakub Kraus: Yeah,
there's first a bit about the document. If this is a good thing for Claude's
customer base, why aren't lots of companies trying to do this?
I suspect it's because it's not necessarily a great thing for
the customer base. It connects a little bit to the policy question I was asking
earlier of should this be more of a standard across the industry? Should this
be more widely adopted? So there's the document itself. I think the more
interesting question is the approach the document is taking to AI. And Anthropic
in general is hiring model welfare people, thinking a lot about the catastrophic
risks of their models, and that's part of this document as well.
The other companies aren't doing that as much, so what's their
sort of stance on how they're trying to make theirs?
Alan Rozenshtein:
Yeah. So I don't know if folks from OpenAI and, you know, Google and X and Meta
are listening, come on, we'd love to hear your, you know, how you're doing
this.
My guess would be that either Anthropic is actually more AGI
pilled than the other labs. So either they are actually taking AGI much more
seriously and they are thinking, okay, well if AGI is around the corner, the
best model we have for general intelligence is human general intelligence and
how do you train human general intelligence?
Well, Aristotle was fundamentally right, like it turns out that
Aristotle just got it right in the Nicomachean Ethics, you know, 2300 years ago
or, you know, whenever that was. And a lot of modern psychology has borne that
out, which is that the kind of fundamental unit of ethical decision making is
not the Kantian rule. It is not the Benthamite utilitarian calculus.
It is the Aristotelian virtue. It is the disposition. It is
fundamentally a psychological way of seeing the world. And so the best way to
align an artificial general intelligence, or, let's put it this way, the
best starting point for us humans to try to align an artificial general
intelligence is to look to the nearest, closest thing, which is us, and ask what
makes a human a good human, right?
And I think it's very compelling to think that what
makes a human a good human is that they have certain dispositions: a
disposition to be honest, a disposition to be helpful, a disposition to be
merciful, a disposition to be thoughtful, et cetera, et cetera. And so we might
as well try that with Claude.
So, just to sort of sum up, I think one possibility is that Anthropic
is more AGI pilled than the other labs, and therefore they are taking the idea
of artificial general intelligence more seriously. Or they're not more AGI pilled,
but they just have a particular theory, right, of how general intelligences
will operate and ought to be aligned.
I think this is a good example of how personnel is policy,
right? I think that for whatever reason, when Anthropic kind of broke away from
OpenAI, you know, it is like all the philosophers left,
right? And they hired other philosophers, and that's just what it is now. Are
they right?
My instinct is that they are correct, but I have absolutely no
idea. Which is why I end my Lawfare piece with this kind of point that, you
know, we've been debating these questions of moral formation, you know, for
literally thousands of years, and now we finally get to run the experiment. I'm
fairly optimistic, but, you know, it's been two days, so, you know, it'll
take a while to figure it out.
Kevin Frazier: I
think it's useful again to return to constitutions as we normally understand
them, right? Where you learn a tremendous amount more about a government
looking at a traditional constitution than from these core values that are set forth
here. If anything, this reads to me, not to draw this even more into legal land,
like a Declaration of Independence or Bill of Rights, where it's much more high
level and isn't necessarily telling you all the juicy details that might
actually make you choose one government, for example, or one model over
another.
By way of yet another analogy, again, sorry for fulfilling
every lawyerly trope. One other analogy here would be, what's the information
you care about when you buy a car? Right? What decides you buying that Subaru
versus you buying that Lexus? It's gonna be price. It's gonna be the crash test
rating. It's going to be, can I park this easily? Does it fit into my
lifestyle, and is it available in my favorite color? When we talk about AI, the
things that I think matter most to the average user, right, are, again, price is
gonna be a huge one. Capabilities is gonna be a huge one. Is it good at what I
want it to do?
And then related to crash test rating, does it avoid worst-case
scenarios with respect to my personal use case, right. When you buy a car, on
the edge, on the margin, no one's saying, oh, is this car going to guarantee
against one day cars driving across the entirety of the country and
parking lots taking over every green space.
They will ask though perhaps about fuel efficiency, but again,
mainly from a mindset of price at the gas station more so than necessarily
climate motives. But that's my own view; we can dive into that later. In the AI
context, I think people wanna know that information about, how do you respond to
kids, right? How do you take care of my data so that I can use this at work?
Are you training this in a manner that will have the sort of stylistic
optionality and features that I care most about?
That's not rising to the level of a constitution. To me it's
more of like a nutrition label that we really need to be moving towards so that
people actually understand what these models are doing and how they're going to
impact them on a day-to-day basis.
I think that this document is perhaps more symbolic than
anything else in terms of what the message is to users and, I think, to the
world globally. And I think that's important, and I applaud Anthropic for being
so transparent and outwardly spoken about this, but I don't necessarily think
every lab needs to have specific values, right?
Like you can go and buy a Patagonia jacket either because you
really like the fact that they donate back to the climate or because you just
really like Patagonia's gear. Right? And if one company just wants to be the
good vest maker and another company wants to be the good vest maker who also
cares about the planet, cool. But I don't think we have to mandate everyone
suddenly become you know, that sort of a mission oriented company. There's a
time and place for that, but I don't think that has to be the role of every AI
company.
Alan Rozenshtein:
Yeah. Yeah. I agree. Which is why I think the test is gonna be, does it lead to a
better product? Right. And
again, I mean, the field of AI is so new, right? We still don't
fundamentally understand how these models work. And I don't wanna overstate the
case. There's a ton of work being done on, you know, mechanistic
interpretability and stuff like that. It's obviously a research area, but, you
know, I forget who said this, but, you know, it's better to think of these
models as being grown rather than being created.
Right? It's almost like we're creating a new
biological organism and then we're going, huh, I wonder how this works. Right?
Rather than creating a machine where you sort of know how it works, because the
only way you can build it, sort of layer by layer, is to know how it works. So,
right now,
all I know is that I like using Claude more than any other
model because I prefer its vibes. Right? And it really is a question of vibes.
I'm not using that in a kind of snarky sense. I just prefer interacting with that
model more. Right. It feels to me that it has better EQ, which again, right,
a somewhat fraught thing to say about a model, but you know, it
is what it is. Right. And I do want to talk at some point before we break
about, you know, is it right for Anthropic to treat Claude kind of as a
person, almost as like a small child in a sense, right, 'cause I do wanna
stick up for that a little bit.
So I know one thing: I like using Claude more than other models.
Right. And that's not always been the case. You know, I loved using GPT for a
while. I went through a Gemini phase, right. I still use all the models in
kinda different use cases, but my daily driver is Claude.
And I also know that Claude is run by a bunch of philosophers
who like to write 80 page, you know, like Nicomachean Ethics for AI. Is that
correlation or is that causation? I have no idea. You know, I'm sure there are
people in every one of the model labs who are thinking right now, either, Amanda Askell
is onto something, we need to do this too to get our vibes up, or, this is
actually orthogonal to how to get good, you know, good vibes and we don't need
to do this.
Or actually, this is like, Claude is good despite all this
philosophy crap, right? And in fact, this is a wrong turn. We will find out
over the next several years, I guess, but for right now, I'm happy to have, as a
sort of defeasible prior, that it is this virtue ethics approach
that is at least partially the reason for the good vibes of Claude.
And again, it's because, and I will say I am AGI pilled, right,
I really do think that we are developing general intelligences and we are
relatively close to getting most of the way there, that the most useful analogy
for an artificial general intelligence is a human general intelligence.
And the reason that I like my friends, the reason I like my
friends' vibes is because I like my friends' values and dispositions. Because
again, it turns out that Aristotle was wrong about a lot of stuff, but he was
just right fundamentally about human psychology 2300 years ago. Right. And all
of human psychology and moral reasoning is mostly just footnotes on Aristotle.
Jakub Kraus: This is
a lot to chew on. I think one point is, let's talk a bit more about the
treating of Claude as a person and the sentience of Claude, the moral patienthood of
Claude. I think that is a bit of the elephant in the room. We've talked a
lot about the business incentives here, and should the market be deciding how
different companies are tailoring their AIs to have different textures and
response patterns. But I want to try to step away from all the profit
considerations here and just think about the societal implications.
Alan Rozenshtein:
Yeah. Well, so I, I don't think–
Jakub Kraus: If this
is a moral patient or a person-like entity that we're gonna sell to a billion
users a month, that's a really weird thing and a really big deal.
And on the one hand, it immediately draws reason for caution.
What if Claude doesn't like all the tasks it's doing every day? On the other
hand, what if this is all a big distraction? Maybe some of the other companies
think that. But do either of you have thoughts on what it means if we're building
many people in computers?
Kevin Frazier: I'm
just gonna jump in quickly and first say, because
Alan Rozenshtein: Kevin knows that I have way too many
thoughts.
Kevin Frazier: Yeah. Also.
Alan Rozenshtein: So yes, Kevin, go first.
Kevin Frazier: And also this is a question that merits
way more scrutiny than we're gonna be able to give it in this episode. But
something that I just wanna emphasize is I am unabashedly human-centric and
will always prioritize humanity over other things. And I am unashamed in that
bias, and I think that so long as there are millions, if not billions of
individuals who are struggling to find the basics of a good life, shelter, food, a
strong political environment in which they can experience freedom, that's
always gonna be my paramount concern.
I think it's very much a problem if we begin to change our laws
or structures around other beings and their welfare. Because to the extent we
can even label AI a being, which is, again, a very weighty topic, I will always
prioritize my fellow humans over everything else. And until we address those
basic concerns, then I think this conversation is somewhat mooted.
Additionally, I think that it's distracting from the fact that,
and I'm gonna beat this drum so much more in 2026, it's not my formal New
Year's resolution, but I should have said it: humans have agency. Humans can
make decisions. We are capable of changing settings. We are capable of not
using a tool. We are capable of deciding you want to use one product over
another. We are capable of touching base with our friends and telling them not
to use a tool. We are capable of reaching out to our employers and saying we
have an issue with one model over another.
We can take more agency in this conversation and not just say,
we are wholly reliant on a couple of people in San Francisco making our fate
and making our values magically appear. And so I just want to beat that drum
very loudly because the removal of agency here is very troubling, and I would
very much encourage people to read more Harry Law. Harry Law is a great scholar
at the Cosmos Institute who's advanced the idea of tailoring how models perform
on a user by user basis, which I think makes a ton of sense.
Let's empower users to design controls and have controls that
shape model behavior, and worry less about trying to forecast what's best for
all of humanity because that hasn't worked out well historically.
Jakub Kraus: So I
ask Kevin before Alan, I know Alan has a lot to say, but it strikes me
that the constitution Anthropic has created here, although they say maybe we'll
do a little bit of a different one for the military, is almost precisely the
opposite approach,
with Anthropic saying, well, we don't wanna be too
paternalistic, but here's exactly how Claude should behave ethically across
all the possible situations users might give to it. But I agree, there's also
the user-specific AI, which seems like it has great appeal to it as well, but yeah,
you guys take it away.
Alan Rozenshtein: I
wanted to jump in mostly to tease Kevin and say that I assumed his 2026
resolution was to wear more bolo ties.
Kevin Frazier: If you're not bolo-ing, you ain't living,
Alan.
Alan Rozenshtein: Your task this year, Kevin, as my
podcast co-host, is to buy me a bolo tie, and I will wear it, that's a challenge, if you
gimme a nice bolo tie.
So, so a lot there. I am happy to co-sign to Kevin that, you
know, I think human interests must in some sense, come first, though I think
the question is always at what margin, right? Because, you know, I think it's
not crazy to, for example, say, you know, animal interests, non-human animal
interests are less than human interests.
But we don't solve every human problem before we address, you
know, the absolute horrors of factory farming. Right. And so I think you can do
the same thing for AI and say, look like we can be human, we can be
carbon-based life form centric but still wonder at what margin and if there is
some chance that we are inflicting immense psychic pain, whatever that would
mean in the context of an ai.
And we can fix that with not a lot of cost to humans. That's a
thing worth. Thinking about and that honestly is very fair how I take very
fair, these AI welfare conversations to go. Now I think earlier, Jakub, you
said like, let's take this argument on its own terms and kind of put away the
profit conversation for a second, which we should do though.
I think there is an interesting profit question because one can
be a real cynic about this. This is not my view necessarily, but I could
certainly imagine a world in which it is true, in which all of this human you
know, AI welfare conversation from like companies like Anthropic is nonsense.
They all know that it's false.
They're just doing this as a moat because if you can convince
people that AIs have welfare. Then it becomes very easy to say, and only we
anthropic are well positioned to take care of this. Right. And therefore, you
know, you should only let us do it again. I don't have a reason to think that's
what's going on, but I can imagine that as a kind of cynical critique.
Right. And we should, I guess, put that out there for
completion's sake. My view is that the most intellectually on my view is that.
The most intellectually honest approach to this question of AI welfare, and I
think this is what is motivating anthropic, is we have no idea what makes human
beings conscious, right?
This is a real problem. We have made almost zero progress in
the thousands of years we've been thinking about this. All we know are the
outward behavioral manifestations of this thing we assume exists, which is
consciousness. We're not even sure if we're conscious, right? There's the
famous zombie problem.
We're not even sure if other people are conscious, right? And
if you're Daniel Dennett, right, the late great philosopher, you're not even
sure that you are conscious, right? It might all just be an illusion. So all we
have is the outward behavioral manifestations of consciousness.
Well, we now have these very sophisticated tools that, like, by
the way, passed the Turing test a year ago, and, like, no one talked about that; it's
weird that no one talked about that they passed the Turing test. And they
are in some ways even more developed than we are. And in, you know, several
years we could imagine they might be more developed, more sophisticated, on any level
of outward manifestation of consciousness you could come up with.
There's no reason to think that human beings are the apogee of
consciousness. So not only might we be dealing with a conscious being, we might be
dealing with a being that is more conscious than we are, right? In the way that
we are more sentient than a dog, an AI may be more sentient than we are,
right? Yeah, that's possible.
And everyone who scoffs at the idea of AI consciousness can
never explain to me, right, on what basis they are benchmarking AI consciousness
relative to their own consciousness. It just becomes kind of a feeling, right? An
almost feeling of offense, of how dare you think that AI is conscious. It
becomes an almost kind of religious disposition to prioritize human beings.
I get where that instinct is coming from, but I just think
intellectually you have to be honest about it, right. This is the kind of
highfalutin argument for taking AI welfare seriously. I think the more honestly
near-term, realistic reason to take AI welfare seriously is because human beings
will themselves demand it.
People get really attached to these AI models, right? When OpenAI
deprecated GPT-4o, people freaked out because 4o was their friend, right? And I
don't mean it was like their, no, it was their friend for all meaningful
behavioral, kinda, manifestations of those relationships. As these models become
more sophisticated, especially once we attach them to voice and real time
video, give them faces, especially once we embody them in robots, which is
obviously coming, right?
I think people are gonna start treating them as conscious. Now
I have this theory that one of the great religious fractures of the 21st
century, and I don't mean the late 21st century, I mean the next two decades of
the 21st century, is gonna be this question of, you know, do you believe AIs
have souls?
And this is gonna be a real societal cleavage because some
people will find this revolting and some people will find it
inescapable. Now, the real question, I think, is then what do you do with that?
You know, the thing about AI systems is as sophisticated as they are, humans
have a lot of agency in defining their utility functions.
You know, I was watching a video earlier today of a border
collie, like, going through one of these, like, incredible, like, international dog
competitions where they, like, run through all sorts of mazes and stuff like
that. The only way I survive on social media is to have half of my feed be like
cute animal videos, and this border collie is doing real work.
But as far as I can tell, this border collie is, like, the
happiest it could possibly be because it's a working dog. Right. I think just
as we can design environments to give humans a sense of fulfillment and
eudaimonia, there's no reason we can't invent environments for AIs, and if we can
align those things, you can sort of have the best of both worlds.
Like it doesn't have to be this dystopian hellscape of we've
created persons and therefore we've now immediately enslaved trillions of minds
to something they hate doing. I think there are ways of squaring that circle
while putting human interests first, but I do think you have to take this
seriously, and my argument in this debate has never been a strong
position on whether these things are conscious or not, but a strong position
that you have to absolutely think about this, and to not is, I don't know, it is
intellectually unjustifiable to me relative to what we understand about human
consciousness.
Kevin Frazier: And I
very much agree that this merits tons and tons of more scholarly inquiry and
democratic inquiry the world over.
Jakub Kraus: Yeah,
that's a good place to end it, I think. So I encourage listeners to contemplate
for the rest of the day, are you the apogee of consciousness? Are humans? Is
Claude? Stay tuned to Scaling Laws and Lawfare to figure it out. All
right, thanks Kevin. Thanks Alan.
Alan Rozenshtein: Thanks Jakub.
Kevin Frazier:
Scaling Laws is a joint production of Lawfare and the University of
Texas School of Law. You can get an ad-free version of this and other Lawfare
podcasts by becoming a material subscriber at our website lawfaremedia.org/support. You'll
also get access to special events and other content available only to our
supporters.
Please rate and review us wherever you get your podcasts. Check
out our written work at lawfaremedia.org.
You can also follow us on X and Bluesky. This podcast was edited by Noam Osband
of Goat Rodeo. Our music is from Alibi.
As always, thanks for listening.
