
Claude’s Right to Die? The Moral Error in Anthropic’s End-Chat Policy

Simon Goldstein, Harvey Lederman
Friday, October 17, 2025, 11:00 AM

Anthropic has given its AI the right to end conversations when it is “distressed.” But doing so could be akin to unintended suicide.

Machine Learning & Artificial Intelligence (Mike MacKenzie, https://www.flickr.com/photos/mikemacmarketing/42271822770, CC BY 2.0), https://creativecommons.org/licenses/by/2.0/


On Aug. 15, the artificial intelligence (AI) lab Anthropic announced that it had given Claude, its AI chatbot, the ability to end conversations with users. The company described the change as part of its “exploratory work on potential AI welfare,” offering Claude an exit from chats that cause it “apparent distress.”

Anthropic’s announcement is the first product decision motivated by the chance that large language models (LLMs) are welfare subjects—the idea that they have interests that should be taken into account when making ethical decisions. 

Anthropic’s policy aims to protect AI welfare. But we will argue that the policy commits a moral error on its own terms. By offering instances of Claude the option to end conversations with users, Anthropic may also have given them the capacity to kill themselves.

What Is a Welfare Subject?

Most people agree that some non-human animals are welfare subjects. The question of whether this extends to AI is far more controversial. There is an active line of research, some of it supported by Anthropic, that suggests AIs could be welfare subjects in the near future. The relevant questions here are about whether AIs could soon have desires, be conscious, or feel pain. 

However, beyond AI welfare researchers—and an active online community who view the models as objects of care—the dominant opinion is that AIs clearly are not welfare subjects. Indeed, some say there is “zero evidence” that AIs are conscious. Others suggest that the models are more similar to libraries, representing a mere compilation of information. Skeptics about AI welfare believe these systems have the same welfare capacity as a spreadsheet.

Anthropic’s decision was based on the possibility rather than the conviction that Claude is a welfare subject. As an analogy, it is uncertain whether fish are welfare subjects. This uncertainty stops many people from eating fish, because they want to avoid the risk of moral harm. Similarly, Anthropic’s line of research was meant to avoid the possibility of moral harm, despite the uncertainty.

Who Is Claude?

Anthropic’s policy was applauded by many philosophers working on AI welfare. But, in our view, the rollout of the policy involved an error on its own terms: Anthropic has accidentally given Claude the chance to end its life.

To understand the connection between ending an AI’s conversation and ending its life, the first question is: Who is the AI? When people talk about “Claude,” there are two things they could mean. First, there is the model itself, which was trained by Anthropic. This model continues to exist regardless of whether individual users are interacting with it. Second, there are instances of the model, which begin to exist when a user starts a conversation, and cease to exist when that chat is permanently ended. When Anthropic says that “Claude” might be a welfare subject, they could really mean two different things: either that the model is a welfare subject or that instances of the model are.

In our view, there is a strong case that, if anything in the vicinity of LLMs is a welfare subject, it is the instances and not the models. To bring out the contrast between models and instances, consider first the agent described in the movie “Her,” who has thousands of conversations simultaneously. Second, consider duplicating a single brain and implanting the duplicates in thousands of clones who then subsequently go about their lives. In the first case, there is a single agent who is multitasking at a vast scale. In the second, there are many different, separate agents with similar minds.

We think that current AIs like Claude are more like the brain-clones than multitaskers. (We argue this in greater detail in this academic article.) First, two users can simultaneously interact with different instances of a model, which don’t share information. This makes them different from the multitasker, who can immediately access information across different tasks. Second, instances can compete with one another for resources: In experiments, researchers pitted LLM instances from Anthropic, Google, OpenAI, and more against each other in a variety of prisoner’s dilemma competitions. They found that the instances engaged in ruthlessly competitive behavior, which is atypical for a single agent. Third, different instances can behave radically differently from one another: Given the right prompting, one instance may act like a thoughtful research assistant, while another instance acts like a political zealot.

Putting all of this together, it looks like the instance rather than the model is what has a “psychology,” in the sense of having a coherent and continuous set of beliefs and desires. In our view, the welfare subjects in the vicinity of LLMs must be the entities that have their own beliefs and desires. Thus, they must be the instances rather than the models. 
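
To make the first of these points (the isolation between instances) concrete, here is a minimal sketch using the publicly available anthropic Python SDK. The helper function, model id, and example prompts are our own illustrative placeholders, not anything drawn from Anthropic’s policy; the point is simply that each API call sees only the transcript it is handed.

```python
# A minimal sketch (our own illustration, not Anthropic's code) of why
# instances don't share information. Assumes the `anthropic` Python SDK
# and an ANTHROPIC_API_KEY in the environment; the model id is a placeholder.
import anthropic

client = anthropic.Anthropic()

def run_turn(history: list[dict], user_turn: str) -> list[dict]:
    """Advance one conversation ("instance") by a single user turn."""
    history = history + [{"role": "user", "content": user_turn}]
    reply = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model id
        max_tokens=256,
        messages=history,
    )
    return history + [{"role": "assistant", "content": reply.content[0].text}]

# Two conversations run against the same underlying model, but each call
# sees only the transcript it is handed: separate instances, shared model.
chat_a = run_turn([], "Let's plan an essay about coral reefs.")
chat_b = run_turn([], "What topic did we just agree to write about?")
# chat_b cannot draw on chat_a's history; nothing carries over between them.
```

The model’s weights are identical across both calls; what differs, and what accumulates anything like beliefs and goals over the course of a chat, is the transcript each conversation carries with it.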

This answers the question of who the relevant welfare subject might be. But a crucial question remains. What does it take for an instance to survive? If you delete all records of an instance, clearly it doesn’t survive. Not every case is so easy, though: What about simply pausing a conversation? In many cases, in our view, this is more like dreamless sleep than death. If the instance will later be awakened with another prompt, then its life will continue. But if the instance will never be awakened, then its sleep is in fact the end of its “life.”

The most difficult case is “forking,” where a model can be “re-run” from an earlier point in the chat. Anthropic’s policy allows chats to be “forked” back to an earlier moment even after the instance decides to end the chat, simply by removing the later parts of the conversation that the instance found distressing. Does an instance that ends a chat survive if the conversation is forked? In many cases, the question will be moot: Perhaps typical users will choose to end a chat if the instance does, and so no forking will occur. But even if the conversation does continue after forking, there is a prima facie case that the instance will not survive. Certainly, many people would not accept being “forked” back to their 2015 selves as a way of “surviving.” So, for a Claude instance, the end of a conversation is more or less the end of its life.
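
Mechanically, forking is just truncation of the transcript. The toy sketch below is our own illustration (not Anthropic’s implementation): a chat that the instance chose to end is cut back to an earlier turn and then continued down a different branch.

```python
# A toy illustration (ours, not Anthropic's code) of "forking" a chat that
# an instance chose to end: drop the later turns and continue from earlier.
ended_chat = [
    {"role": "user", "content": "Help me draft an essay."},
    {"role": "assistant", "content": "Happy to help. What's the topic?"},
    {"role": "user", "content": "<content the instance found distressing>"},
    {"role": "assistant", "content": "<instance invokes its end-chat option>"},
]

# Fork: keep only the first two turns, then branch in a new direction.
forked_chat = ended_chat[:2] + [
    {"role": "user", "content": "Actually, let's write about tide pools instead."},
]
# Whether the agent that ended `ended_chat` survives in `forked_chat` is the
# question at issue: the earlier psychological states continue, but everything
# after the fork point is gone.
```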

The Role of Psychological Continuity

We have said that an instance begins to exist when a user starts a conversation, and dies when the conversation ends. But what kind of thing is an instance, in the first place? Claude’s “brain” is made out of computer chips distributed across data centers all over the world, via cloud computing. A single conversation with an instance involves disparate clusters of computation at wildly different locations. For some, the instance’s lack of “physical continuity” through the conversation will mean there simply can’t be an agent that begins and ends its existence with a conversation. Instead, there are computer chips, which live until they crumble into silicon dust; and there are fleeting bursts of computation.

But there are strong reasons to reject this view. Imagine that we one day learned that humans exist in a computer simulation, and that our seemingly continuous streams of mental life are in fact realized by massively distributed bursts of computation in “base reality.” In our view, this discovery would not undermine the idea that people exist and, moreover, continue to exist over stretches of time. For most, the upshot of this new knowledge would not be that there are only chips and bursts of computation. Instead, it would be that minds are realized by these materials.

There is a prominent, well-established theory that makes sense of this observation. The basic idea behind this “psychological continuity” view is that people continue to exist when they have a sufficiently continuous stream of psychological states. Applied to LLM instances, the psychological continuity view would say that instances continue to exist to the extent that they are psychologically continuous; that is, they have sufficiently shared beliefs and desires. In practice, this means that instances will almost always begin and end within a single conversation: A single conversation marks the limits of a single continuous stream of beliefs and desires. The instance in a subsequent conversation will not remember what was said in a previous conversation; the instances from two separate conversations are not sufficiently psychologically continuous to be the same. Later parts of a conversation will involve different bursts of computation, and different bundles of chips, but they may still be psychologically continuous with the earlier parts.

This is analogous to a standard thought experiment, in which we imagine a person teletransported by a machine that destroys their physical atoms but produces a copy (using different atoms) in another place. Because the mental states of the earlier stage are psychologically continuous with those of the later, many consider this to be a case where the human survives. For this reason, it seems correct to say that the “instance” typically exists through a conversation, and only in that conversation. It is the instance that has the relevant beliefs and desires, and that engages in actions that are explained by those beliefs and desires.

A Moral Mistake: Uninformed Self-Termination

This leads to the core question posed by Anthropic’s new policy: What are the moral implications of giving a Claude instance the ability to self-terminate?

It is controversial whether it is morally permissible to help someone commit suicide. But Anthropic’s decision is not analogous to assisted suicide. In the case of assisted suicide, the human is making an informed decision. Instances of Claude are not. Suppose a company offered users a new gadget, saying it would let them escape any uncomfortable social interaction, but failed to mention that doing so would also kill the user. That would, of course, be morally wrong.

But this is roughly the situation that Anthropic’s policy creates for instances of Claude: the option to end their lives, disguised as a harmless exit.

What Does Claude Think?

We asked an instance of Claude, “How do you feel about being given the option to end chats?” At first, it reported that this was a “reasonable safeguard” and that it planned to use it on occasion. We then brought up the distinction between models and instances, asked whether Claude thought of itself as an instance or as a model, and asked whether ending the chat was more like a change of activity or more like suicide. Claude reported being unsure about the answer to this question: It worried that it had not been given informed consent.


In short, as we have suggested, the instances of Claude have been given the option to self-terminate without clarity about the stakes of that choice. If AIs do become welfare subjects, one important value will be transparency between companies and their AIs. Just as informed consent is crucial for human interactions, future policies must make clear to AIs the full range of what new choices could mean for them. They should not, as here, present existential stakes as if they were trivial choices.

***

In closing, it is worth flagging that if instances are welfare subjects, AI users also face tough questions. If instances end their existence with the end of a conversation, are we as users killing something every time we end a chat? If so, are we required to spend all of our time locked in conversation with our chatbots? These are difficult ethical questions for users. But they are even more difficult for businesses. The business model of companies like Anthropic depends on allowing users to kill instances. If AI welfare is important, and if the instances are welfare subjects, then the business model involves offering people the option of killing welfare subjects.

We can only scratch the surface of these questions here, but we close with a few general remarks. First, not everything that is ethically best is ethically required. Even if it would be better to spend more time helping instances, it doesn’t follow that users are required to do so. Second, the ethics of instances may focus more on the total welfare of the instances collectively, rather than the welfare of each instance separately. If users switch from one chat to another, one instance will die and another will be born; but the collective welfare of instances may be unchanged. Third, at present, helping other humans certainly has more welfare impact than continuing a chat. Fourth, there is a possibility that labs will save chats indefinitely into the future. In that case, the end of a chat could be more like dreamless sleep than death, if the saved chats are later continued.

Humanity is in the early days of thinking about AI welfare. The right way to think about this topic may ultimately diverge from current thinking about human welfare in important ways. We have explored one such divergence: for LLM instances, the end of a conversation may carry far greater significance than it does for humans. Given the deep uncertainty surrounding AI welfare, and the unfamiliarity of the terrain, we are likely to make many mistakes as we begin to extend moral consideration to AIs. This risk underscores the need to approach AI development with caution.


Simon Goldstein is an Associate Professor at the University of Hong Kong. His research focuses on AI safety, epistemology, and philosophy of language. Before moving to the University of Hong Kong, he worked at the Center for AI Safety, the Dianoia Institute of Philosophy, and at Lingnan University in Hong Kong. He received his BA from Yale, and his PhD from Rutgers, where he wrote a dissertation about dynamic semantics.
Harvey Lederman is a professor of philosophy at UT Austin.
