Scaling Laws: Rapid Response to the Implications of Claude's New Constitution
Jakub Kraus, a Tarbell Fellow at Lawfare, speaks with Alan Rozenshtein, Associate Professor of Law at the University of Minnesota and Research Director at Lawfare, and Kevin Frazier, the AI Innovation and Law Fellow at the University of Texas School of Law, a Senior Fellow at the Abundance Institute, and a Senior Editor at Lawfare, about Anthropic's newly released "constitution" for its AI model, Claude.
The conversation covers the lengthy document's principles and underlying philosophical views, what these reveal about Anthropic's approach to AI development, how market forces are shaping the AI industry, and the weighty question of whether an AI model might ever be a conscious or morally relevant being.
Mentioned in this episode:
- Kevin Frazier, "Interpreting Claude's Constitution," Lawfare
- Alan Rozenshtein, "The Moral Education of an Alien Mind," Lawfare
Find Scaling Laws on the Lawfare website, and subscribe to never miss an episode.
To receive ad-free podcasts, become a Lawfare Material Supporter at www.patreon.com/lawfare. You can also support Lawfare by making a one-time donation at https://givebutter.com/lawfare-institute.
Click the button below to view a transcript of this podcast. Please note that the transcript was auto-generated and may contain errors.
Transcript
[Intro]
Alan Rozenshtein: It
is the Lawfare Podcast. I'm Alan Rozenshtein, associate professor of law
at the University of Minnesota, and a senior editor and research director at Lawfare.
Today we're bringing you something a little different, an episode from our new
podcast series, Scaling Laws. It's a creation of Lawfare and the
University of Texas School of Law where we're tackling the most important AI
and policy questions from new legislation on Capitol Hill to the latest
breakthroughs that are happening in the labs.
We cut through the hype to get you up to speed on the rules,
standards, and ideas shaping the future of this pivotal technology. If you
enjoy this episode, you can find and subscribe to Scaling Laws wherever you get
your podcasts. And follow us on X and Bluesky. Thanks for listening.
When the AI overlords take over, what are you most excited
about?
Kevin Frazier: It's
not crazy. It's just smart.
Alan Rozenshtein: And
just this year, in the first six months, there have been something like a
thousand laws.
Kevin Frazier: Who's
actually building the scaffolding around how it's gonna work, how everyday
folks are gonna use it.
Alan Rozenshtein: AI
only works if society lets it work.
Kevin Frazier: There
are so many questions that have to be figured out—and nobody came to my bonus class!
Let's enforce the rules of the road.
Jakub Kraus: Welcome
back to Scaling Laws, a podcast from Lawfare and the University of Texas
School of Law that explores the intersection of artificial intelligence, law,
and policy. I'm Jakub Kraus, a Tarbell Fellow at Lawfare, and today I'm
talking with Alan Rozenshtein, associate professor of law at the University of
Minnesota and research director at Lawfare.
And Kevin Frazier, the AI Innovation and Law Fellow at the
University of Texas School of Law, a senior fellow at the Abundance Institute
and a senior editor at Lawfare. Our focus is on Anthropic’s recently
released constitution for its AI model, Claude, which Alan and Kevin just wrote
about for Lawfare. We discuss the lengthy document's principles and
underlying philosophical views, what these reveal about Anthropic’s approach to
AI development, how market forces are shaping the AI industry, and the weighty
question of whether an AI model might ever be a conscious or morally relevant
being. You can reach us at scalinglaws@lawfaremedia.org, and we hope you enjoy
the show.
Alan and Kevin, thanks for coming on to talk about Claude's constitution.
Let's start with Alan. What were your initial impressions of the document, and
what is this, for listeners who are unfamiliar?
Alan Rozenshtein:
Yeah, I mean, my initial impression of the document was that it was very long.
It's 80 pages in PDF, I think it's like 22,000 words.
Which I mean, I'm a law professor, so that's my sweet spot for,
you know, law review articles. But I don't usually see things that long you
know, written by normies. Though maybe this is, maybe the idea that anything in
this world is written by normies is my first mistake.
So what is this? So, I guess stepping back, right? When these
models are trained, you basically start with what's called a pre-trained model,
which is basically a text prediction machine on the entire internet. And that
is kind of the core of all of these models' intellect, and we can put “intellect”
in scare quotes, but you know, their capabilities.
Obviously, the different models are different in how they're
trained, but because they're all at this point essentially training on the
entire internet and there is only one entire internet, the pre-trained versions
of these models are reasonably comparable. But pre-training is only the first
step, hence the pre-training.
After pre-training, there's a bunch of stuff that then happens
to move the model in a more useful direction according to however the
developer wants the model to behave. This is often called training or
post-training, and there are a million different components of it.
And as part of this, a kind of, and again we can put scare
quotes around this, I'm sure we can put scare quotes around this entire
conversation and—I'll just stop putting scare quotes around anything—is the
kind of model's personality and again, different developers have taken
different approaches, some more sophisticated, some more explicit than others.
And Anthropic in particular has taken, I think, a very deeply
interesting approach to taking these kind of raw pre-trained models and making
them into something useful. Anthropic calls this quote unquote “constitutional AI,”
and I think we're gonna probably spend a bunch of time, especially because, you
know, Kevin and I are sort of law professors and we have very specific ideas of
what the word constitutional means, whether that's the right word, and in what
way is this akin to a kind of traditional constitution, but basically trying to
embed various principles and judgments and heuristics and guides into these
models.
Now again, I think every developer that is making a sort of
useful chatbot is doing something like this, whatever they call this. But I
think Anthropic has done the most sophisticated thinking about this. So about I
think a year ago, Anthropic released an early version of what it called
Claude's constitution.
A relatively short document of I think like 20 or 30 kind of
high level principles. You know, be helpful, don't lie, that sort of thing, as
a kind of example of how it was training Claude to be useful and helpful and in
line with Anthropic’s values.
What Anthropic released earlier this week is the kind of full
version of its constitution. Again, this 80-page, 22,000-word document is
meant, I think, and here I should go into the technical details, but I guess it
is simultaneously meant as the document that Claude itself uses to guide its
behavior and also as an outward-facing document to the world as to what Claude
is doing.
Over the last couple of months, there was some indication that there was a
quote unquote “soul document”: someone had managed to get Claude to output what
seemed like a kind of constitution. And shortly after, Amanda Askell, who is
Anthropic’s sort of philosopher-in-chief and an actual PhD moral philosopher,
and who also is the prime author of this new constitution, went on X and
basically confirmed that yes, there is such a document, I don't think it's
necessarily called the soul document, but there is such a document that has
been used to train Claude, and this constitution that was just released is a
kind of cleaned up and somewhat expanded version of that document.
So, you know, Jakub, we can get in sort of whatever details you
want or we can sort of turn it over to Kevin. But basically this document is
meant to set out how Anthropic thinks about training Claude, how Claude
relates to Anthropic, to deployers, to users. And then the part that interests
me, personally the most, a kind of deeply interesting discussion of moral
philosophy and character formation as applied to magic sand.
Jakub Kraus: Yeah.
Kevin, I want to hear more of your thoughts relating to your piece on Claude's
constitution and its comparisons to the U.S. Constitution. But before that,
I'm interested if you have reactions to what Alan was saying
there, with this previously being called a soul document and the scare quotes
Alan is using as he talks. Anthropic is doing something a bit unusual compared
to the other labs by focusing on Claude as more than a tool, almost treating it
as human-like.
Do you think that's a fair direction to go in with this kind of
document? Is that a good direction?
Kevin Frazier: So
I'll start off by saying that I would definitely not categorize this as a
document that was crafted by normies. No offense to Alan's initial use of the
term and not to call them non-normies, or I'm not sure what the—
Alan Rozenshtein: I,
I mean, I've never met, I've never met Amanda personally.
I suspect she's the sort of person that would be offended if we
called her a normie. So, this is obviously with all due compliments to Amanda.
Like nothing about this is normie.
Kevin Frazier:
Amanda, when you listen, note that it was Alan who first alleged that you all
were normies, not me. So when we invite you to Scaling Laws to come explain
this document in even more detail, please be nice to me and mean to Alan.
That's general advice for all.
Alan Rozenshtein: No
I specifically said none of these are normies. None of these are normies. No
one's a normie here.
Kevin Frazier: Yeah, exactly.
Exactly. Sure.
So what's really important to point out is that Anthropic, from
the get-go in, maybe we'll call it the preamble, explaining the purpose of
this constitution, specifies that their approach to AI development regards
themselves as being on the vanguard of doing it safely, and they very much view
their company's mission as pursuing the frontier of AI, but doing it in a way
that they think better aligns with human values and the long-term success of
humanity, more so than other labs.
And so it's just really important to put this constitution in
the context of Anthropic’s underlying mission. Perhaps its national ambition,
if we're gonna move forward and carry forward this constitutional analogy.
And Jakub, as you said, and as Alan alluded to, it's impossible
not to also bring in some of these questions of consciousness and the extent to
which AI may be something greater than just bits of data and sophisticated
computer training. That is a topic that warrants, and it will receive,
incredible additional inquiry on Scaling Laws and by tons of other scholars and
by an interdisciplinary set of actors.
But it's worth noting that from the outset, Amanda in an
interview with Time referred to training a 6-year-old as sort of an analogy for
trying to train Claude. The idea that this 6-year-old can very much probe
whether you're being true or false or whether you're trying to deceive it or
whether you're trying to guide it in a certain direction, and also knowing that
internally, this may have been referred to as a soul document.
We just get a sense from the outset that this is a different
sort of relationship, in terms of AI developer to AI model, that Anthropic has
and perceives than perhaps we've seen from OpenAI or from Google or from other
labs. And so just getting that background I think is important.
The second is to flag that Anthropic has been among the more outwardly
supportive labs of AI regulation. So whereas some labs have come out with
respect to various state AI bills and said, that's a bridge too far, or we only
support this subject to quite substantial amendment, Anthropic has raised
its hand on more frequent occasions, saying we invite some degree of regulation.
So with all that said, I'm fascinated by this document for many
reasons, but first and foremost, because of its labeling as a constitution. And
when we talk about constitutions, these are documents from a legal standpoint
that are meant to set high overarching values for a legal system that guide
more structural decision making and subsequent areas of law.
Now, there are only four core values spelled out in this
constitution. The first is being broadly safe. The second is being broadly
ethical. The third is compliant with Anthropic’s guidelines, and the fourth is
genuinely helpful.
And each of those supersedes the other, so Claude must be
broadly safe before it's broadly ethical; broadly ethical before it's compliant
with Anthropic’s guidelines; and compliant with Anthropic’s guidelines before
being genuinely helpful, so on and so forth.
Alan Rozenshtein:
It's the four laws of robotics but for Claude.
Kevin Frazier: Yeah.
Asimov's, you know, forgotten fourth value. Exactly.
Jakub Kraus: Alan,
can you say what the four laws of robotics are, or Kevin?
Alan Rozenshtein: Oh
yeah. So I think there are only three laws of robotics but, so the famous
sci-fi author, Isaac Asimov put forward his famous three laws of robotics, and
oh my god, if I don't get this right, they're gonna take away my nerd card. But
the first law is, oh my god, the first law is—
Jakub Kraus: A robot
can't do no harm, right?
Alan Rozenshtein:
Yeah. A robot can't, can't do any harm.
Kevin Frazier: Don't
help him! Don't help him.
Alan Rozenshtein: The
second law—this is so bad. A robot can't do any harm. And then the second is
something, and then the third is a robot can't allow itself to be harmed. I,
this is bad. Just take away my nerd card.
It's so bad. But I mean, it's this, yeah.
Again, I think the content of the laws I think is less
important than the idea that from the very beginning of thinking about
robotics, there was this notion that you know, at the core, you're gonna need
some very basic kind of hierarchical list of things to do and not do.
And if you get those right, like the idea is if you can get
those right then a lot of, I mean, alignment—I mean, this was kind of what
Isaac Asimov was really thinking about before we called it alignment—a lot of
the kinda alignment problems take care of themselves. And of course,
inevitably, a lot of Asimov stories and Asimov-inspired stories are a kind of
monkey’s paw curl of, you know, the way that these laws, despite seeming obvious
and correct, misfire.
And so, you know, one could ask the same question about whether
these four laws of Claude you know, might similarly misfire—are they the right
laws? And I don't think, I don't think Anthropic would pretend to know the
answer.
But you gotta start somewhere.
Kevin Frazier: Well,
and just to flesh this out a little bit further too, there is a sort of
valence to constitutions that evokes a certain idea about the relationship
between who's creating it and the users or the folks subject to that
constitution, such that in some regards, I have problems with the use of the term
constitution here.
Because as we're talking about AI governance, there's a lot of
discussions about whether that regulation should be self-governance, some form
of multi-stakeholder approach among private actors, state driven, federally
driven, or even internationally driven.
And to use the word constitution evokes some degree of sort of
shared responsibility for both creating, crafting, and implementing a
constitution. And yet one important carve out that has to be mentioned, and
this was cited in a Time interview with a number of Anthropic individuals:
models deployed to the U.S. military, quote, “wouldn't necessarily be trained on
the same constitution,” end quote, according to an Anthropic spokesperson.
The Constitution of the United States applies to the entirety
of its functions. We don't have a carve out for, oh, well, except for
governance, or except for, excuse me, except for national—
Alan Rozenshtein:
Except for where it really matters, this constitution applies.
Kevin Frazier:
Exactly. So, the utter irony too is that some of the risks that folks more
concerned about AI safety will commonly raise are the use of weapons, for
example, the use of cyber-attacks, the kind of real offensive capabilities that
you would suspect would be core to what a defense department plans to use something like
Claude for.
So to have that carve out is somewhat problematic for me to
still use this term constitutionalism. And then the second kind of broad
concern here would be, again, constitutionalism implies a sort of social
contract, and yet how users are supposed to be a part of this contract is
unclear to me; whether they'll have any role in amending or revising
or helping ensure that this constitution is adhered to is left undefined.
Jakub Kraus: Do you
have any ideas on how that would happen? Should users submit a large feedback
form to Anthropic? Should Anthropic hire people to go interview Americans and
Ethiopians and everyone around the world? How does that work? Alan, I think in
your piece that's coming out, you pointed out that this is a pretty Western
document, and a lot of the authors come from a particular background and it
doesn't seem necessarily representative of the whole world, but yeah.
Kevin, how do you think we can get users more involved?
Kevin Frazier: That
is kind of presuming that I think users should be part of governing the model
training, which I'm not sure I agree with.
I will say from the outset, efforts to do sort of lowercase “d”
democratic governance of tech companies haven't worked very well. The best
example is Facebook, which for a little bit entertained the idea of kind of user
referenda on Facebook's values and bylaws or content moderation rules. I think
maybe it was like 0.05% of users actually participated in that voting
mechanism, and so it wasn't meaningful, and Facebook eventually abandoned it. I
similarly think that there would be some power users and folks of specific
mindsets and use cases of Claude that would dominate a sort of lowercase “d”
democratic process.
But again, I'm not even sure the use of democratic mechanisms
here makes sense, which is again, why I somewhat take issue with referring to
this as a constitution.
Alan Rozenshtein:
Yeah, I tend to agree with you, Kevin. You know, you mentioned the sort
of Meta example that didn't really work, and ironically with Meta,
the moment there was a, I forget what specific policy
issue, that threatened to actually get users to vote, that's immediately when Meta
said, yeah, I think we're done with this. So, yeah, I think the history of doing
sort of small “d” democratic processes doesn't work. What I think does work, and
you know, here I'm gonna out myself as usual as a neoliberal shill, is the
market mechanism.
There are lots of competitors, right? I mean, I think, you
know, there's constant discussion in Silicon Valley about, you know, are there
moats around, you know, do these companies have moats? And you know, it's an
interesting question. I'm not qualified to answer that, but I think in the
first instance, one quote unquote “moat,” or at least differentiator is the, for
lack of a better term, vibes of a particular model.
You know, I think one reason why Claude is so popular
especially among sort of Silicon Valley insiders, right? Why everyone uses
Claude Code and not Codex or Gemini, even though those models are, in some
senses actually better, right? They score higher on certain benchmarks, is
because, and I, this is true for me too, as someone who essentially lives in
Claude Code at this point, although, and I'm not a coder, I mean, 2% of it is coding.
The rest of it's just my life.
Jakub Kraus: What are
you doing living in Claude Code?
Alan Rozenshtein: Just, oh yeah. I mean, this, we can do, this is a separate
episode, but I mean, if you think of Claude Code more as an agent that sits on
your computer and can interact with folders and markdown files it's much more
of a knowledge work agent than it is a coding agent.
I mean, it's kind of optimized for code, but the vast, you
know, there's a huge overlap with knowledge work, so I find it extremely
helpful. But a huge reason why I like to use Claude and a lot of other people
like to use Claude is because the kind of ergonomics, the
vibes, are just really good.
And so I don't think you necessarily need a quote unquote
small “d” democratic process in a kind of Deweyan sense to have user input. Presumably
Anthropic is constantly doing market research on what its users like, and I
think it's actually done a very good job in figuring this out. And at least for
the moment, and we can talk about whether this will be true in the long term, the
incentives, I think, are quite aligned, both in terms of having Claude be a
quote unquote “good person,” whatever that means, again, there's a lot to
unpack there, and also Claude being an industry-leading model, at least for a
certain subset of users.
And I think this also then segues nicely into an answer to your
question, Jakub, about is this a sort of Western model and is that going to go
over well around the world? I think that it's a hundred percent a Western model.
It is a quote unquote WEIRD model, right? WEIRD being the acronym for Western, educated,
industrialized, rich, and democratic. I think that's
what the acronym stands for.
Kevin Frazier: You
remember all that and you can't remember, can't remember the three goddamn—
Alan Rozenshtein: The
three laws of robotics. It's terrible!
You know, there's a great book by the Harvard anthropologist,
Joseph Henrich called “The WEIRDest People in the World,” that's super, super
interesting about how sort of unusual, in particular kind of, Western, liberal,
democratic societies are.
I am a product of this society. I quite like this society,
right? I don't necessarily feel like I need to go out on a limb and say whether
it's objectively the best society. But I certainly prefer it to any other
society. So I have no problem with Claude being a very WEIRD, in that sense,
model. But I can also recognize that other societies and especially other
governments that don't share kind of Western liberal democratic values, may not
want this kind of model.
I think that's fine, right? And I think the market mechanism
will sort that out. And look, if, you know, Saudi Arabia, which is building
massive capacity both in terms of, you know, compute infrastructure and also
its own homegrown talent, wants to develop its own model, you know, if Saudi
Arabia wants to come up with, you know, its own version of an agent that it
thinks better reflects its own values, you know, that's not the
one I'm gonna use, but it's allowed to do that.
So, look, I think, and I wrote a piece about
this with some co-authors for Lawfare, I think, you know, a couple years
ago, back when Gemini was both crappy and woke and would do things like, you
know, give you multiracial Nazis when you asked for images of, you know, SS
soldiers, that there is no such thing as a quote unquote “neutral model.” Right,
you know, all models have choices baked into them.
And that doesn't mean some models aren't better than other
models. But I think the best thing that these developers can do is they
can just be honest about what kind of model they are putting forward. And I
think Anthropic, near the end of the document, is admirably honest.
When it says, look, we think this is the best model. That's why
we trained it in this way. We think it's the most ethical model. That's why we
trained it in this way. We're not taking a position on whether in some
universal objective sense, this is the right ethics, like that's not something
we can answer right now.
But, you know, we can't not make the best model we
want to make. This is the best model according to us. And if you disagree,
that's fine. There are other models. Go with God.
Jakub Kraus: I think
it's—I wanna push back a little bit. It seems like there's a notion we're
talking about now that, let the market decide, everyone's gonna have their own
constitution, it'll be great. But it strikes me that most of the other
companies haven't released a constitution yet, and there might sometimes be a
tension between a constitution that's good and a constitution that's making a
lot of profit. I think some people have complained about Claude being overly
prone to refusing responses out of a concern for ethics.
Sometimes the document, the constitution, talks about saying
users shouldn't always have their way if they're trying to do something bad. So
at first, I have a little hesitation on what might happen if we just let
everyone do whatever they want regarding constitutions; I think we might not get
constitutions.
And second, more generally, I wonder if there's any kind of
policy intersection here. We had Anthropic pioneer the responsible scaling
policy that sort of became an industry norm, and then California and New York are
trying to make that an industry-wide requirement. Is that a direction that
constitutions might go in?
If not, why not? Is there anything for policymakers to think
about regarding constitutions and the market dynamics of this?
Alan Rozenshtein:
Yeah, so let me tease out two different issues
here that I think are somewhat conflated in your point. So the one question
is do you need small “d” democratic governance from users to have models
reflect user preferences?
And I think the answer is just no. Right? You don't, and you
don't even need constitutions for that, 'cause remember, whether or not a
developer releases an 80-page document called a constitution written by a
PhD moral philosopher, right, which is like one extreme of how you can do
this,
all models have quote unquote “constitutions” in the sense that
all models have post-training, right, you know, whatever, RLHF
and a million other things that happen once you have created a next-token
predictor on the entire internet. So, some of those I will like, like Claude.
Some of those I will not like, I don't wanna use Grok, right? Like I have no
interest in using a model that has been designed by people who think that it's
okay to basically make non-consensual pornography of anyone publicly on the
internet.
Like, I don't trust that model. That's not the model that I
want to be using. That is a model with a constitution, that's a model with a
personality. And other people might like that. And so, to the extent that
you're trying to match users to models, like users will match to models just by
using them for a few hours and deciding whose vibes do I like more, right?
There's a separate question of is it a good world in which every
model developer can design whatever model that they want. That's an interesting
question, right? We can have a policy argument about that. We can have a legal
First Amendment argument about that. But if we as a society decide that we
don't want full freedom of model training, we want these models to have
certain guardrails, remember, these models, whatever constitutions they give
themselves, are embedded in something much more important, which is reality,
like the actual society in which they function, right? You know, sometimes
arguments about digital technology have this sort of unreal quality as if like
it's all in the cloud.
It's not in the cloud, it's on computers, and computers are in
places, right? And those places have jurisdictions and police forces and armies
and legislatures, right? You know, if at any moment a country wants to say, no,
you know, your models have to act in a certain way, they can just do that, see,
e.g., China. So that's a totally separate conversation, I think.
And I think the question that is sparked by the Claude constitution,
right, and maybe we should stop talking about it as a constitution,
I think it's actually honestly much more useful to talk about it as a soul
document, I think that's actually much more accurate than constitution, is,
you know, did it operate well for the purposes that Anthropic wanted it to
operate, which I think it did.
If you don't like those purposes, of course, then you might not
like the document itself.
Jakub Kraus: Kevin,
anything to add on that?
Kevin Frazier: Yeah,
I mean, just going more off of the idea of a market based and more dynamic
posture, I think one thing that stuck out to me is if we look at some of the
initial public policy concerns related to AI use, let's start with probably the
one that's top of mind for most state legislators and many members of the
public right now, which is AI companions.
We've seen rapid responses by the private labs reflecting the
fact that users don't want things that do bad things to their kids. Right?
That's just a pure market dynamic. There's not a huge interest in a consumer
saying, I am very pro tools that cause mental health concerns to my child, and
we're seeing labs respond to that market incentive, right?
OpenAI has already changed its policies. Character AI kicked
off minor users. We're seeing innovative new approaches; for example, OpenAI,
I believe, released yesterday, January 21st, a new mechanism for age
verification. So I see this as one of many options to try to signal to
consumers what the values and what the best use cases are of each model.
I think that this will get to many of the concerns some people
have about the alleged bias of different models. When I talk to people around
the country, oftentimes people still refer to the 2023 use of Gemini, when you
were getting Nazis of all races, for example, generated as a result of a system
prompt that encouraged more diversity in images and things like that. So I
think–
Alan Rozenshtein: United
Colors of Benetton set of Nazis as I like to think about it. Just, it's just
so, just very heartwarming, man. We all come together.
Kevin Frazier: That
was 2023. It's 2026, and folks are still indexing on something that is very
old. And so I've been outspoken and I've written about the fact that I would
love to see something akin to the MPA movie rating standards, where you can go
up and down an aisle at the rental store, actually, what
rental store is anyone going to? You can scroll on your phone and see, okay, is
this rated G? Is this rated PG-13, R, and so on and so forth, and quickly
understand what it is you're trying to get from that movie or what it is you're
trying to get from that model.
Perhaps my concern about this initial constitution is knowing
that Claude is being trained to be, quote, broadly safe, broadly ethical,
compliant with Anthropic guidelines, and genuinely helpful. Not to be too
trite, but that just doesn't really tell me anything, right? In terms of, if I'm
trying to be a savvy consumer, what is it that I'm actually looking for from
a model.
This version of a constitution to me is devoid of the
information that would actually help me be a more savvy AI consumer. And so I
think this is a great initial start, and I think that setting high level values
that inform how Claude's going to behave in novel situations and situations
that developers can't necessarily know is admirable and a step in the right
direction.
But I would push Anthropic and I would push all other labs to
think about what are the metrics, what's the sort of information
they can share that can actually make users more AI savvy and help them distinguish
between, oh, I want to use this model versus that model.
Jakub Kraus: So we're
talking a lot about consumer choice, which model they want to use. Claude has a
different texture. Its vibes are good. I wonder if either of you wants to
take a stab at defending the other AI companies here that aren't going Anthropic’s
route, to see if we can tease out what's unique about this constitution.
What are the benefits or costs of doing an approach like this
to a product? I guess I'm also still a little reserved about thinking
about the constitution purely as a way that they're making Claude so that
consumers can choose which one they want, because I think OpenAI is also trying
to do that, and xAI and Meta are trying to do that, and Google is trying to do
that, and they haven't really done a constitution in this way.
OpenAI has a model spec, which talks about how it wants
its models to behave. They certainly want their models to have good vibes in a
way that a lot of people will use. Anthropic has more of a business market, so
maybe the businesses like the vibes of Claude more than the
consumers who are using OpenAI.
But the only other document I've seen that's somewhat related
is Google put out a, here's our approach to Gemini, and they referred to it as
our approach to the Gemini app, and we wanna make a really good tool. And
there's a pretty stark contrast to Anthropic’s approach of thinking of Claude as
a kind of being, a human-like entity that needs training in its personality and
having a good personality.
So those are just a bunch of ideas I'm throwing out, but what
do you think? Why aren't all the labs then going to put out their constitutions?
My guess is it's because this constitution is a bit beyond character training
or making a good product that people want to use. It's more of a risk
management line of documents akin to the responsible scaling policy.
Alan Rozenshtein:
Well, wait, so I guess trying to understand the question: is your question, why
aren't other companies releasing 80-page highly philosophical treatises, or why
aren't other companies doing the sort of essentially virtue ethics? And if you
want, we can sort of get into what I think is quite philosophically interesting
about this document.
The kind of virtue ethics based training of their models
relative to some other form of training. So is the question about the document
or the actual substance of what the companies are doing?
Jakub Kraus: Yeah,
there's first a bit about the document. If this is a good thing for Claude's
customer base, why aren't lots of companies trying to do this?
I suspect it's because it's not necessarily a great thing for
the customer base. It connects a little bit to the policy question I was asking
earlier of should this be more of a standard across the industry? Should this
be more widely adopted? So there's the document itself. I think the more
interesting question is the approach the document is taking to AI. And Anthropic
in general is hiring model welfare people, thinking a lot about the catastrophic
risks of their models, and that's part of this document as well.
The other companies aren't doing that as much, so what's their
sort of stance on how they're trying to make theirs?
Alan Rozenshtein:
Yeah. So I don't know if folks from OpenAI and, you know, Google and X and Meta
are listening, come on, we'd love to hear your, you know, how you're doing
this.
My guess would be that either Anthropic is actually more AGI
pilled than the other labs. So either they are actually taking AGI much more
seriously and they are thinking, okay, well if AGI is around the corner, the
best model we have for general intelligence is human general intelligence and
how do you train human general intelligence?
Well, Aristotle was fundamentally right, like it turns out that
Aristotle just got it right in the Nicomachean Ethics, you know, 2300 years ago
or, you know, whenever that was. And a lot of modern psychology has borne that
out, which is that the kind of fundamental unit of ethical decision making is
not the Kantian rule. It is not the Benthamite utilitarian calculus.
It is the Aristotelian virtue. It is the disposition. It is
fundamentally a psychological way of seeing the world. And so the best way to
align an artificial general intelligence, or, let's put it this way, the
best starting point for us humans to try to align an artificial general
intelligence is to look to the nearest, closest thing, which is us, and ask what
makes a human a good human, right?
And I think it's very compelling to think that what
makes a human a good human is that they have certain dispositions: a
disposition to be honest, a disposition to be helpful, a disposition to be
merciful, a disposition to be thoughtful, et cetera, et cetera. And so we might
as well try that with Claude.
So, just to sort of sum up, I think one possibility is that Anthropic
is more AGI pilled than the other labs, and therefore they are taking the idea
of artificial general intelligence more seriously. Or they're not more AGI pilled,
but they just have a particular theory, right, of how general intelligences
will operate and ought to be aligned.
I think this is a good example of how personnel is policy,
right? I think that for whatever reason, when Anthropic kind of broke away from
OpenAI, you know, it is like all the philosophers left,
right? And they hired other philosophers, and that's just what it is now. Are
they right?
My instinct is that they are correct, but I have absolutely no
idea. Which is why I end my Lawfare piece with this kind of point that, you
know, we've been debating these questions of moral formation, you know, for
literally thousands of years, and now we finally get to run the experiment. I'm
fairly optimistic, but, you know, it's been two days, so, you know, it'll
take a while to figure it out.
Kevin Frazier: I
think it's useful again to return to constitutions as we normally understand
them, right? Where you learn a tremendous amount more about a government
looking at a traditional constitution than from these core values that are set forth
here. If anything, this reads to me, not to draw this even more into legal land,
like a Declaration of Independence or Bill of Rights, where it's much more high
level and isn't necessarily telling you all the juicy details that might
actually make you choose one government, for example, or one model over
another.
By way of yet another analogy, again, sorry for fulfilling
every lawyerly trope. One other analogy here would be, what's the information
you care about when you buy a car? Right? What decides you buying that Subaru
versus you buying that Lexus? It's gonna be price. It's gonna be the crash test
rating. It's going to be, can I park this easily? Does it fit into my
lifestyle, and is it available in my favorite color? When we talk about AI, the
things that I think matter most to the average user, right, are, again, price is
gonna be a huge one. Capabilities is gonna be a huge one. Is it good at what I
want it to do?
And then related to crash test rating, does it avoid worst-case
scenarios with respect to my personal use case, right. When you buy a car, on
the edge, on the margin, no one's saying, oh, is this car going to guarantee
against one day cars driving across the entirety of the country and
parking lots taking over every green space.
They will ask though perhaps about fuel efficiency, but again,
mainly from a mindset of price at the gas station more so than necessarily
climate motives. But that's my own view; we can dive into that later. In the AI
context, I think people wanna know that information about, how do you respond to
kids, right? How do you take care of my data so that I can use this at work?
Are you training this in a manner that will have the sort of stylistic
optionality and features that I care most about?
That's not rising to the level of a constitution. To me it's
more of like a nutrition label that we really need to be moving towards so that
people actually understand what these models are doing and how they're going to
impact them on a day-to-day basis.
I think that this document is perhaps more symbolic than
anything else in terms of what the message is to users and, I think, to the
world globally. And I think that's important, and I applaud Anthropic for being
so transparent and outwardly spoken about this, but I don't necessarily think
every lab needs to have specific values, right?
Like you can go and buy a Patagonia jacket either because you
really like the fact that they donate back to the climate or because you just
really like Patagonia's gear. Right? And if one company just wants to be the
good vest maker and another company wants to be the good vest maker who also
cares about the planet, cool. But I don't think we have to mandate everyone
suddenly become you know, that sort of a mission oriented company. There's a
time and place for that, but I don't think that has to be the role of every AI
company.
Alan Rozenshtein:
Yeah. Yeah. I agree. Which is why I think the test is gonna be, does it lead to a
better product? Right. And
again, I mean, the field of AI is so new, right? We still don't
fundamentally understand how these models work. And I don't wanna overstate the
case. There's a ton of work being done on, you know, mechanistic
interpretability and stuff like that. It's obviously a research area, but, you
know, I forget who said this, but, you know, it's better to think of these
models as being grown rather than being created.
Right? It's almost like we're creating a new
biological organism and then we're going, huh, I wonder how this works. Right?
Rather than creating a machine where you sort of know how it works, because the
only way you can build it, sort of layer by layer, is to know how it works. So,
right now,
all I know is that I like using Claude more than any other
model because I prefer its vibes. Right? And it really is a question of vibes.
I'm not using that in a kind of snarky sense. I just prefer interacting with that
model more. Right. It feels to me that it has better EQ, which again, right,
a somewhat fraught thing to say about a model, but you know, it
is what it is. Right. And I do want to talk at some point before we break
about, you know, is it right for Anthropic to treat Claude kind of as a
person, almost as like a small child in a sense, right, 'cause I do wanna
stick up for that a little bit.
So I know one thing: I like using Claude more than other models.
Right. And that's not always been the case. You know, I loved using GPT for a
while. I went through a Gemini phase, right. I still use all the models in
kinda different use cases, but my daily driver is Claude.
And I also know that Claude is run by a bunch of philosophers
who like to write 80 page, you know, like Nicomachean Ethics for AI. Is that
correlation or is that causation? I have no idea. You know, I'm sure there are
people in every one of the model labs who are thinking right now, either, Amanda Askell
is onto something, we need to do this too to get our vibes up, or, this is
actually orthogonal to how to get good, you know, good vibes and we don't need
to do this.
Or actually, this is like, Claude is good despite all this
philosophy crap, right? And in fact, this is a wrong turn. We will find out
over the next several years, I guess, but for right now, I'm happy to have, as a
sort of defeasible prior, that it is this virtue ethics approach
that is at least partially the reason for the good vibes of Claude.
And again, it's because, and I will say I am AGI pilled, right,
I really do think that we are developing general intelligences and we are
relatively close to getting most of the way there, that the most useful analogy
for an artificial general intelligence is a human general intelligence.
And the reason that I like my friends, the reason I like my
friends' vibes is because I like my friends' values and dispositions. Because
again, it turns out that Aristotle was wrong about a lot of stuff, but he was
just right fundamentally about human psychology 2300 years ago. Right. And all
of human psychology and moral reasoning is mostly just footnotes on Aristotle.
Jakub Kraus: This is
a lot to chew on. I think one point is, let's talk a bit more about the
treating of Claude as a person and the sentience of Claude, the moral patienthood of
Claude. I think that is a bit of the elephant in the room. We've talked a
lot about the business incentives here, and should the market be deciding how
different companies are tailoring their AIs to have different textures and
response patterns. But I want to try to step away from all the profit
considerations here and just think about the societal implications.
Alan Rozenshtein:
Yeah. Well, so I, I don't think–
Jakub Kraus: If this
is a moral patient or a person-like entity that we're gonna sell to a billion
users a month, that's a really weird thing and a really big deal.
And on the one hand, it immediately draws reason for caution.
What if Claude doesn't like all the tasks it's doing every day? On the other
hand, what if this is all a big distraction? Maybe some of the other companies
think that. But do either of you have thoughts on what it means if we're building
many people in computers?
Kevin Frazier: I'm
just gonna jump in quickly and first say, because
Alan Rozenshtein: Kevin knows that I have way too many
thoughts.
Kevin Frazier: Yeah. Also.
Alan Rozenshtein: So yes, Kevin, go first.
Kevin Frazier: And also this is a question that merits
way more scrutiny than we're gonna be able to give it in this episode. But
something that I just wanna emphasize is I am unabashedly human-centric and
will always prioritize humanity over other things. And I am unashamed in that
bias, and I think that so long as there are millions, if not billions of
individuals who are struggling to find the basics of a good life, shelter, food, a
strong political environment in which they can experience freedom, that's
always gonna be my paramount concern.
I think it's very much a problem if we begin to change our laws
or structures around other beings and their welfare. Because to the extent we
can even label AI a being, which is, again, a very weighty topic, I will always
prioritize my fellow humans over everything else. And until we address those
basic concerns, then I think this conversation is somewhat mooted.
Additionally, I think that it's distracting from the fact that,
and I'm gonna beat this drum so much more in 2026, it's not my formal New
Year's resolution, but I should have said it: humans have agency. Humans can
make decisions. We are capable of changing settings. We are capable of not
using a tool. We are capable of deciding you want to use one product over
another. We are capable of touching base with our friends and telling them not
to use a tool. We are capable of reaching out to our employers and saying we
have an issue with one model over another.
We can take more agency in this conversation and not just say,
we are wholly reliant on a couple of people in San Francisco making our fate
and making our values magically appear. And so I just want to beat that drum
very loudly because the removal of agency here is very troubling, and I would
very much encourage people to read more Harry Law. Harry Law is a great scholar
at the Cosmos Institute who's advanced the idea of tailoring how models perform
on a user by user basis, which I think makes a ton of sense.
Let's empower users to design controls and have controls that
shape model behavior, and worry less about trying to forecast what's best for
all of humanity because that hasn't worked out well historically.
Jakub Kraus: So I
ask Kevin before Alan, I know Alan has a lot to say, but it strikes me
that the constitution Anthropic has created here, although they say maybe we'll
do a little bit of a different one for the military, is almost precisely the
opposite approach,
with Anthropic saying, well, we don't wanna be too
paternalistic, but here's exactly how Claude should behave ethically across
all the possible situations users might give to it. But I agree, there's also
the user-specific AI, which seems like it has great appeal to it as well, but yeah,
you guys take it away.
Alan Rozenshtein: I
wanted to jump in mostly to tease Kevin and say that I assumed his 2026
resolution was to wear more bolo ties.
Kevin Frazier: If you're not bolo-ing, you ain't living,
Alan.
Alan Rozenshtein: Your task this year, Kevin, as my
podcast co-host, is to buy me a bolo tie, and I will wear it, that's a challenge, if you
gimme a nice bolo tie.
So, so a lot there. I am happy to co-sign to Kevin that, you
know, I think human interests must in some sense, come first, though I think
the question is always at what margin, right? Because, you know, I think it's
not crazy to, for example, say, you know, animal interests, non-human animal
interests are less than human interests.
But we don't solve every human problem before we address, you
know, the absolute horrors of factory farming. Right. And so I think you can do
the same thing for AI and say, look like we can be human, we can be
carbon-based life form centric but still wonder at what margin and if there is
some chance that we are inflicting immense psychic pain, whatever that would
mean in the context of an ai.
And we can fix that with not a lot of cost to humans. That's a
thing worth. Thinking about and that honestly is very fair how I take very
fair, these AI welfare conversations to go. Now I think earlier, Jakub, you
said like, let's take this argument on its own terms and kind of put away the
profit conversation for a second, which we should do though.
I think there is an interesting profit question because one can
be a real cynic about this. This is not my view necessarily, but I could
certainly imagine a world in which it is true, in which all of this human you
know, AI welfare conversation from like companies like Anthropic is nonsense.
They all know that it's false.
They're just doing this as a moat because if you can convince
people that AIs have welfare. Then it becomes very easy to say, and only we
anthropic are well positioned to take care of this. Right. And therefore, you
know, you should only let us do it again. I don't have a reason to think that's
what's going on, but I can imagine that as a kind of cynical critique.
Right. And we should, I guess, put that out there for
completion's sake. My view is that the most intellectually on my view is that.
The most intellectually honest approach to this question of AI welfare, and I
think this is what is motivating anthropic, is we have no idea what makes human
beings conscious, right?
This is a real problem. We have made almost zero progress in
the thousands of years we've been thinking about this. All we know are the
outward behavioral manifestations of this thing we assume exists, which is
consciousness. We're not even sure if we're conscious, right? There's the
famous zombie problem.
We're not even sure if other people are conscious, right? And
if you're Daniel Dennett, right, the late great philosopher, you're not even
sure that you are conscious, right? It might all just be an illusion. So all we
have is the outward behavioral manifestations of consciousness.
Well, we now have these very sophisticated tools that, like, by
the way, passed the Turing test a year ago, and, like, no one talked about that; it's
weird that no one talked about that they passed the Turing test. And they
are in some ways even more developed than we are. And in, you know, several
years we could imagine they might be more developed, more sophisticated, on any level
of outward manifestation of consciousness you could come up with.
There's no reason to think that human beings are the apogee of
consciousness. So not only might we be dealing with a conscious being, we might be
dealing with a being that is more conscious than we are, right? In the way that
we are more sentient than a dog, an AI may be more sentient than we are,
right? Yeah, that's possible.
And everyone who scoffs at the idea of AI consciousness can
never explain to me, right, on what basis they are benchmarking AI consciousness
relative to their own consciousness. It just becomes kind of a feeling, right? An
almost feeling of offense, of how dare you think that AI is conscious. It
becomes an almost kind of religious disposition to prioritize human beings.
I get where that instinct is coming from, but I just think
intellectually you have to be honest about it, right. This is the kind of
highfalutin argument for taking AI welfare seriously. I think the more honestly
near-term, realistic reason to take AI welfare seriously is because human beings
will themselves demand it.
People get really attached to these AI models, right? When OpenAI
deprecated GPT-4o, people freaked out because 4o was their friend, right? And I
don't mean it was like their, no, it was their friend for all meaningful
behavioral, kinda, manifestations of those relationships. As these models become
more sophisticated, especially once we attach them to voice and real time
video, give them faces, especially once we embody them in robots, which is
obviously coming, right?
I think people are gonna start treating them as conscious. Now
I have this theory that one of the great religious fractures of the 21st
century, and I don't mean the late 21st century, I mean the next two decades of
the 21st century, is gonna be this question of, you know, do you believe AIs
have souls?
And this is gonna be a real societal cleavage because some
people will find this revolting and some people will find it
inescapable. Now, the real question, I think, is then what do you do with that?
You know, the thing about AI systems is as sophisticated as they are, humans
have a lot of agency in defining their utility functions.
You know, I was watching a video earlier today of a border
collie, like, going through one of these, like, incredible, like, international dog
competitions where they, like, run through all sorts of mazes and stuff like
that. The only way I survive on social media is to have half of my feed be like
cute animal videos, and this border collie is doing real work.
But as far as I can tell, this border collie is, like, the
happiest it could possibly be because it's a working dog. Right. I think just
as we can design environments to give humans a sense of fulfillment and
eudaimonia, there's no reason we can't invent environments for AIs, and if we can
align those things, you can sort of have the best of both worlds.
Like it doesn't have to be this dystopian hellscape of we've
created persons and therefore we've now immediately enslaved trillions of minds
to something they hate doing. I think there are ways of squaring that circle
while putting human interests first, but I do think you have to take this
seriously, and my argument in this debate has never been a strong
position on whether these things are conscious or not, but a strong position
that you have to absolutely think about this, and to not is, I don't know, it is
intellectually unjustifiable to me relative to what we understand about human
consciousness.
Kevin Frazier: And I
very much agree that this merits tons and tons of more scholarly inquiry and
democratic inquiry the world over.
Jakub Kraus: Yeah,
that's a good place to end it, I think. So I encourage listeners to contemplate
for the rest of the day, are you the apogee of consciousness? Are humans? Is
Claude? Stay tuned to Scaling Laws and Lawfare to figure it out. All
right, thanks Kevin. Thanks Alan.
Alan Rozenshtein: Thanks Jakub.
Kevin Frazier:
Scaling Laws is a joint production of Lawfare and the University of
Texas School of Law. You can get an ad-free version of this and other Lawfare
podcasts by becoming a material subscriber at our website lawfaremedia.org/support. You'll
also get access to special events and other content available only to our
supporters.
Please rate and review us wherever you get your podcasts. Check
out our written work at lawfaremedia.org.
You can also follow us on X and Bluesky. This podcast was edited by Noam Osband
of Goat Rodeo. Our music is from Alibi.
As always, thanks for listening.
