OpenAI No Longer Takes Safety Seriously

Peter N. Salib
Wednesday, May 22, 2024, 10:09 AM

OpenAI and its competitors are racing as fast as they can to develop systems that are as capable and as autonomous as possible.

OpenAI logo with magnifying glass (Jernej Furman; CC BY 2.0 DEED)

Published by The Lawfare Institute

Until last week, OpenAI had a team dedicated to making sure its products did not destroy humanity. The team was called Superalignment. It was co-headed by Ilya Sutskever, OpenAI’s co-founder and chief scientist, and Jan Leike, a pioneering AI researcher in his own right. Last week, both Sutskever and Leike resigned from the company (Leike explicitly in protest), and OpenAI disbanded Superalignment. 

This is very bad news. Even if you do not believe that near-future artificial intelligence (AI) systems could pose an existential threat to humanity, OpenAI does. Or at least it has said so over and over. Since its founding, OpenAI has frankly professed the riskiness of its stated goal of creating “AGI”: machines as smart as or smarter than humans. And as recently as 2023, Sam Altman, OpenAI’s co-founder and CEO, signed a statement asserting that “the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

In the past, OpenAI took actions designed to demonstrate that it was acting responsibly to mitigate the risks its business was imposing on the rest of us. At its founding, for example, the company chose an unusual governance structure. It is a for-profit entity wholly controlled by a nonprofit. The majority of the nonprofit’s board members are “independent,” having no financial stake in the for-profit entity. Rather than shareholder value, their primary charge is to ensure “safe AGI.” The board also has plenary authority to declare that OpenAI has “attained AGI,” at which point no outside partners, including Microsoft, will have any further right to use OpenAI’s technology. The OpenAI nonprofit board can also do normal corporate board things, like hiring and firing the CEO. At least in theory.

Superalignment, too, was supposed to be evidence of OpenAI’s sober approach to the dangers of building smarter-than-human machines. AI scientists broadly agree that much of the potential danger from advanced AI comes from “misalignment.” A misaligned AI is simply one that does not reliably act in ways that promote, rather than conflict with, human intentions and well-being. Every advanced AI system available today is misaligned to some degree. OpenAI’s GPT-4, for example, cannot be reliably prevented from hacking secure computer systems, attempting to assist bioterrorism, or breaking the law and then lying about it. Misaligned AI is not yet an existential threat to humans, but only because GPT-4 is still a mediocre hacker and bioterrorist. But OpenAI and its competitors are racing as fast as they can to develop systems that are as capable and as autonomous as possible. The worry, then, is that someone will soon create a system that is both very unsafe and also powerful enough to be ungovernable. 

Superalignment’s job was to solve the alignment problem before such a highly dangerous and uncontrollable AI system arrived. Upon creating Superalignment, OpenAI publicly pledged to dedicate 20 percent of its computing resources to the team. Not a trivial commitment for a company whose main expense is computing power.

But Superalignment is gone now, dissolved almost instantly upon Sutskever’s and Leike’s departures. 

The best explanation, it would seem, is that OpenAI is no longer taking catastrophic AI risk seriously. Leike has said as much. In a Twitter thread immediately following his resignation, he wrote that in recent months, his team “ha[d] been sailing against the wind … struggling for compute” needed to “get th[eir] crucial research done.” Leike lamented that, at OpenAI, “safety culture and processes have taken a backseat to shiny products.”

While they were certainly the highest-profile departures, Sutskever and Leike are not the only OpenAI employees with safety-focused job descriptions to have left in recent months. Daniel Kokotajlo and Cullen O’Keefe, members of the company’s AI governance team, also quit recently. Like Leike, Kokotajlo has publicly stated that he has “lost trust in OpenAI leadership and their ability to responsibly handle AGI.” 

These departures follow last November’s failed attempt by OpenAI’s independent board members to remove Altman as CEO. The board, too, cited a lack of trust that Altman would candidly share safety-relevant information. In the end, it was the board’s safety-focused members, rather than Altman, who were fired.

At the time, it was perhaps possible to read the board’s dust-up with Altman as pure corporate politics: A board of directors bristling at what it perceived as the excessive power of a charismatic founder and CEO. That kind of thing could happen at any company, not just one that’s developing a technology it admits could threaten humanity. 

But these recent departures make that narrative harder to believe. For top-tier AI researchers who, like Sutskever and Leike, are seriously concerned about AI risk, OpenAI is arguably the best place in the world to work. It is the leading lab pursuing AGI. It currently has the best AI systems, which are crucial objects of study for safety work. It has top-notch talent and immense resources. For an engineer hoping to help solve the alignment problem and ensure that advanced AI benefits, rather than harms, humans, a job at OpenAI is hard to beat. It is also arguably the place to be for someone who hopes to be well-positioned to sound the alarm if and when a truly dangerous AI system arrives.

Thus, if Leike, Kokotajlo, and O’Keefe are being honest when they say that AI risk is deadly serious, then their departures should themselves constitute alarm bells. By leaving OpenAI, they are forgoing perhaps the best opportunity to make substantive progress on the problem that, by their lights, is the most important. The only reason for them to leave would be if it were clear that OpenAI really had abandoned its commitment to supporting safety work.

Maybe you think the departed employees aren’t being honest about their belief in AI risk. Perhaps OpenAI has already figured out how to make arbitrarily intelligent AI systems safe. Or maybe it has become clear inside OpenAI that the company will never be able to create systems smart enough to pose a global threat. In either case, the Superalignment team might no longer be necessary. Then all of this would be a commonplace story of a company rationally downsizing an obsolete division, with the heads of that division trying desperately to stay relevant. 

That story does not seem plausible. Both Sutskever and Leike are extremely talented AI engineers, with skill sets that transcend safety work. Leike is a co-inventor of reinforcement learning from human feedback, a technique credited with making language models useful in “chat” form. And Sutskever, to state the obvious, is the co-founder and chief scientist of the world’s leading AI company. He was the co-creator of GPT-2, OpenAI’s first successful language model. If safety work like Superalignment had become obsolete, both men could have surely shifted to similarly high-paying and prestigious work making AI more capable. They could have done it at OpenAI.

Maybe there are instead evolving internal views, and reasonable disagreement, inside OpenAI about the riskiness of AGI. But even then, the departures would be disturbing. It has been less than a year since Superalignment was launched. If you run a company that, 10 months ago, regarded its work as an existential threat to humanity, disbanding the team dedicated to mitigating that risk is wildly irresponsible. It is irresponsible no matter how much you think you’ve learned in 10 months. And it is doubly so if your co-founder and chief scientist, along with your other safety experts, remain convinced of the dangers.

This leaves just two possibilities: Either the departures of Sutskever, Leike, Kokotajlo, O’Keefe, and the ousted board members signal a seismic shift inside OpenAI against taking safety seriously. Or all of those people are delusional. Either the technology OpenAI hopes to build remains extraordinarily risky, and the company has—like many companies before it—simply abandoned public safety in favor of profit. Or OpenAI was just kidding all along about the risk stuff. Under this theory, the weird nonprofit ownership structure, the financially uninterested board, the statements about existential risk, Superalignment, and the 20 percent compute pledge would have to be written off as bizarre pieces of performance art. 

Which seems more likely? Certain commenters have long derided proclamations of AI risk as mere corporate “hype.” But telling your customers that your products might kill them, their families, their friends, and everyone they have ever known is not normal corporate marketing. It is not what cigarette companies or opioid manufacturers—whose products are dangerous—do. Just the opposite. When companies with highly dangerous products begin to face the threat of accountability for those dangers, they tend to obfuscate, conceal, and even lie. OpenAI’s previous candid acknowledgments of risk made it an outlier. But perhaps now that its user counts are skyrocketing, its valuation is ballooning, and the threat of regulation is looming, it is simply beginning to act like a normal company. If so, we should all be very concerned. 

Peter N. Salib is an Assistant Professor of Law at the University of Houston Law Center and Affiliated Faculty at the Hobby School of Public Affairs. He thinks and writes about constitutional law, economics, and artificial intelligence. His scholarship has been published in, among others, the University of Chicago Law Review, the Northwestern University Law Review, and the Texas Law Review.
