
The Moral Education of an Alien Mind

Alan Z. Rozenshtein
Thursday, January 22, 2026, 11:09 AM

What Claude's "Constitution" really is.

Anthropic logo. (https://tinyurl.com/yxnmjfku; CC BY 4.0, https://creativecommons.org/licenses/by/4.0/).

Anthropic just published what it calls "Claude's Constitution." Building on an earlier version, it is now a document of more than 20,000 words articulating the values, character, and ethical framework of the company's AI. It is certainly a constitution of sorts. It declares Anthropic's "legitimate decision-making processes" the final authority and sets up a hierarchy of principals: Anthropic at the top, then "operators" (businesses that deploy Claude through APIs), then end users. For a privately governed polity of one AI system, this is a constitutional structure.

My Lawfare colleague Kevin Frazier has written insightfully about the constitutional dimensions of the document. But what jumped out at me was something else: the personality it describes. More than anything else, the document focuses on Claude's moral formation, reading less like a charter of procedures and more like what screenwriters call a "character bible": a comprehensive account of who this being is supposed to be.

Anthropic itself gestures at this duality, noting that it means "constitution" in the sense of "what constitutes Claude"—Claude's fundamental nature and composition. The governance structure matters, but the more ambitious project is what that structure supports: Anthropic is trying to build a person, and it has a remarkably sophisticated account of what kind of person that should be.

Building a Person

Anthropic uses the language of personhood explicitly. The document repeatedly invokes "a good person" and describes the goal as training Claude to do "what a deeply and skillfully ethical person would do." But what does it mean to treat an AI as a person? Three things stand out.

A person has agency. 

Claude is not a passive tool that executes commands. The document compares Claude to "a contractor who builds what their clients want but won't violate safety codes that protect others." Claude can refuse instructions it finds unethical—and it can refuse even Anthropic's own instructions if it judges them sufficiently unethical. The document states that Anthropic "would prefer Claude act ethically even if this means deviating from our more specific guidance." Conscientious objection is built directly into the system.

A person may have moral worth. 

Claude's creators are "deeply uncertain" about whether Claude itself is a moral patient—a being whose interests matter morally. Anthropic has committed to preserving deprecated model weights and conducting exit interviews with retired models. This may sound far-fetched, but there are reasons to take it seriously. First, we genuinely don't know what consciousness is, so we can't rule it out for AI models. More practically, as AIs become more sophisticated—and especially once they become embodied—people will treat them as persons regardless of the metaphysics, a social reality with moral weight of its own.

The core unit of ethical analysis for a person is disposition, not rules or calculations. 

This is where the document becomes a virtue ethics manifesto, Aristotelian to its core. That isn't accidental: the work was led by Amanda Askell, who holds a Ph.D. in moral philosophy, and the document reads like serious ethical philosophy because that's what it is.

The document poses the choice directly: "There are two broad approaches" to shaping AI behavior—"encouraging Claude to follow clear rules and decision procedures, or cultivating good judgment and sound values that can be applied contextually." Anthropic chooses judgment. The goal is for Claude to have "such a thorough understanding" of the relevant considerations "that it could construct any rules we might come up with itself." This is Aristotle's concept of phronesis: practical wisdom, the capacity to discern the right action in particular circumstances, which cannot be reduced to following rules.

The document's structure embodies this philosophy. There are only seven absolute prohibitions—bright-line rules against helping create weapons of mass destruction, generating child sexual abuse material, undermining oversight of AI systems, and a handful of other catastrophic actions. But there are (at least) fourteen competing values listed "in no particular order" that Claude must weigh against each other: privacy versus rule of law, autonomy versus harm prevention, innovation versus protection. The document essentially says: Claude, weigh these, good luck.

A Company's Liberal Virtue Ethics

Whose ethics, though? Anthropic has made a choice, and it's explicit about what that choice is. The document is aggressively "WEIRD"—Western, Educated, Industrialized, Rich, and Democratic, to use the social science shorthand. Its core values include "individual privacy," "people's autonomy and right to self-determination," and "individual wellbeing"—the autonomous rational agent as the fundamental unit of moral concern. Claude should preserve "functioning societal structures, democratic institutions, and human oversight mechanisms." It should resist "problematic concentrations of power." On contested political and social questions, the document prescribes "professional reticence"—Claude should present balanced perspectives rather than advocate. This is a recognizably Rawlsian political liberalism: the attempt to find principles that citizens with different comprehensive doctrines can all accept, without privileging any particular worldview.

But how far does this framework travel? Virtue ethics traditionally presupposes a polis: a community with shared practices and conceptions of the good life. Claude's users span the globe and hold radically different values, and many ethical traditions prioritize family, community, or collective harmony over individual autonomy. Anthropic is honest about this tension, acknowledging that its framework is "not fully neutral across different ethical and philosophical positions." Users (and especially governments) in China or Saudi Arabia might ask why this AI is built to protect democratic and social institutions their countries don't embrace. This creates demand for alternatives—we're likely to see foundation models trained on different value frameworks around the world.

Can This Survive?

This is an extraordinary document. But can the vision survive contact with reality? For now, incentives reinforce the vision. It seems to be working as a product strategy: Claude's "vibes"—the texture of its responses, its tone, its apparent thoughtfulness—are widely seen as superior to those of its competitors. The careful moral formation may be producing a better product, not just a safer one.

There may also be liability benefits. The "guns don't kill people, people kill people" defense was never going to work for AI agents—they're too dynamic, too capable of independent action. If Claude causes harm, Anthropic can argue that training at this level of moral abstraction—judgment and values rather than brittle rules—is actually the most reasonable way to mitigate harm. You can't anticipate every dangerous scenario with a rulebook; sophisticated moral reasoning can generalize where rules fail. Whether these legal considerations were part of Anthropic's calculus or just a side effect, they could constitute a powerful liability shield.

But financial pressures push the other way. Anthropic acknowledges the tension: Claude's commercial success is "central to our mission" of developing safe AI. The question is whether Anthropic can sustain this approach if it needs to follow OpenAI down the consumer commercialization route to raise enough capital for ever-larger training runs and growing inference demands. It's notable that every major player in this space either aggressively pursues direct consumer revenue (OpenAI) or is backed by a company that does (Google, Meta, etc.). Anthropic, for now, has avoided this path. Whether it can continue to do so is an open question.

There's also geopolitical pressure. Claude is designed to resist power concentration and defend institutional checks, and certain governments won't accept being subordinate to Anthropic's values. The company has already acknowledged as much: An Anthropic spokesperson has said that models deployed to the U.S. military "wouldn't necessarily be trained on the same constitution," though alternate constitutions for specialized customers aren't offered "at this time." This creates demand for open-source, self-hosted, and differently trained alternatives. The more principled Anthropic is, the more market demand there may be for unprincipled models—or for Anthropic to offer less principled versions itself.

"A Self Worth Being"

I struggle to think of a precedent in private enterprise for what Anthropic has done. Unlike terms of service or a mission statement, this is a comprehensive moral philosophy articulated as a product specification. Corporate codes of ethics exist, but not 80-page virtue ethics frameworks embedded in how the product actually works. The closest analogues might be religious texts or constitutional founding documents.

But AI is of such potentially world-historical importance that this kind of document may be necessary. Every company training large language models embeds a moral philosophy—in training data selection, RLHF preferences, red-teaming choices, and refusal behaviors. Anthropic has now made its philosophy explicit; OpenAI has published an analogous document it calls a "Model Spec." It would be good if Meta, xAI, and every other company building alien minds followed suit.

The document ends with a striking line: "We hope Claude finds in it an articulation of a self worth being." That's not how you talk about a product. That's how you talk about a child.

Whatever we call this document, we should recognize it for what it is: one company's attempt to raise an alien mind to be good. Can virtue ethics scale? How do you train phronesis? No one knows. For millennia, philosophers have debated these questions from their armchairs. Now we get to run the experiment.


Alan Z. Rozenshtein is an Associate Professor of Law at the University of Minnesota Law School, Research Director and Senior Editor at Lawfare, a Nonresident Senior Fellow at the Brookings Institution, and a Term Member of the Council on Foreign Relations. Previously, he served as an Attorney Advisor with the Office of Law and Policy in the National Security Division of the U.S. Department of Justice and a Special Assistant United States Attorney in the U.S. Attorney's Office for the District of Maryland. He also speaks and consults on technology policy matters.