
Content Moderation’s Legalism Problem

Yoel Roth
Monday, July 24, 2023, 1:30 PM
Could a public editor solve social media’s crisis of trust?
Twitter headquarters in San Francisco, March 6, 2015. (Runner1928; CC BY-SA 4.0)

Published by The Lawfare Institute

On Sept. 8, 2019, model and media personality Chrissy Teigen posted a tweet calling President Donald Trump a “pussy ass bitch.”

Recounting the incident during a hearing before the House Oversight Committee, former Twitter safety policy team member Anika Collier Navaroli testified that, in response to Teigen’s post, representatives from the Trump White House reached out to Twitter to request the post’s removal.

At the time of Teigen’s post, I was working at Twitter as the company’s head of site integrity, leading a team responsible for content moderation and election security issues. Within Twitter, Teigen’s post created a challenge—not only because of the White House’s request, but also because staffers on the moderation team couldn’t come to a consensus about whether the post constituted a violation of the company’s policies. The site’s public-facing Abusive Behavior Policy prohibited “behavior that harasses or intimidates, or is otherwise intended to shame or degrade others.” Teigen’s post, plausibly, could be interpreted as falling within its penumbra. 

Internally, Twitter had a more detailed standard for determining potential violations of the Abusive Behavior Policy: the so-called three-insult rule. The rule required the presence of multiple mild insults in a post (or across several posts) before Twitter would step in. The rationale behind the rule was to avoid arbitrating schoolyard disputes between users; the company’s top-down content moderation procedures would only kick in past a certain point of hostility. The three-insult rule was one of hundreds of granular guidelines arbitrating the finer points of online speech—a set of detailed guidelines that exist at nearly every tech company but only rarely see the light of day.

And so Twitter’s policy teams set about determining whether Teigen’s post violated the three-insult rule—a question that required an exacting analysis not just of the post’s sentiment (clearly hostile) and context (provoked; the president had referred to Teigen as the “filthy mouthed wife” of musician John Legend), but also the grammatical specifics of its construction. Did “pussy,” “ass,” and “bitch” each constitute separate insults, thereby breaking the three-insult rule and warranting a Trump-pleasing removal of the post? (This clearly would have been the politically pragmatic interpretation for a company already beleaguered by accusations of bias against conservatives.) Or—as I concluded at the time, drawing on the sentence diagramming skills I learned in elementary school—did “pussy” modify “ass,” which in turn modified “bitch,” resulting in only a single, adjective-laden insult? 

In the end, the company decided not to remove Teigen’s tweet, both for grammatical reasons and because of a long-standing practice of erring on the side of permitting even quite aggressive criticism of public figures.

It’s hard to look at Kafkaesque deliberations like these with anything other than bemusement—or, perhaps, disdain. Here was a billion-dollar company, investing a significant amount of time in parsing the sentence structure of a crude insult. I was aware of the absurdity at the time. Long before I worked on content moderation at Twitter, I lambasted the hair-splitting guidelines imposed by Facebook, Apple, and the then-new industry of mobile apps in my Ph.D. research. As Cornell University social media researcher Tarleton Gillespie put it in 2012, writing about the leak of a detailed copy of Facebook’s content moderation guidelines, “The absurdity here is really an artifact of having to draw this many lines in this much sand.”

Perhaps it’s this absurdity that keeps companies from talking about the messy business of making content moderation decisions unless they are forced to do so. New advances in the Metaverse, for example, make for better marketing than the sausage-making of speech governance. And yet, if policymakers and technologists hope to address the fundamental issues ailing social media—a profound failure of public trust in companies whose products billions of people engage with every day—it’s exactly this kind of work that needs to be brought into public view. Social media companies can do so by drawing on a strategy employed by newsrooms for decades: the role of public editor.

Social media content moderation discussions often invite comparisons to the work of the American judiciary. Law professor Kate Klonick dubbed platforms like Facebook and Twitter “the New Governors” of speech: in one light, behaving as software companies; in another, carrying out many of the lawmaking and law enforcement responsibilities traditionally associated with governments themselves. Within companies, trust and safety discussions commonly reference an eclectic mix of concepts drawn from American First Amendment law, international human rights approaches, and criminology. Company staffers often see themselves as not just managing a website, but constructing a system of justice. Facebook’s 2020 decision to launch the Oversight Board as a sort of “Supreme Court” (in Mark Zuckerberg’s words) for the platform’s moderation decisions makes the comparisons even more apt.

Writing about the earliest days of content moderation at Facebook and YouTube, Klonick describes the companies’ shift from a set of loosely articulated standards (“Feel bad? Take it down”) to a more exhaustively documented set of rules. The goal of this transformation, she writes, was to help companies manage their explosive growth by reducing the amount of individual and subjective decision-making required of moderators. But it also transformed the work of trust and safety from the enforcement of loose, vibes-based guidelines living in the heads of a small cohort of Silicon Valley staffers into a form of judicial scrutiny. An entire field of work emerged to engage in the Talmudic study of platform policies. 

Perhaps unsurprisingly, many of the American lawyers responsible for leading platforms’ early policy and content moderation efforts brought the traditions they learned in law school to social media content governance. The principle of stare decisis—abiding by established precedent—became a cardinal rule within companies. Twitter’s trust and safety staffers would agonize over key decisions, ensuring that the handling of emergent issues was wholly consistent with past applications of the company’s policies. As happened with Teigen’s tweet about President Trump, content moderation decisions—especially those involving prominent figures—often involve a painstaking parsing and documentation of user behavior reminiscent of offline procedures for handling criminal evidence. And, as Facebook’s Oversight Board began its work in 2021, the group found itself rehashing one of the central debates of American constitutional law: whether the role of a judiciary is to adhere closely to the text of the laws it interprets (a doctrine of interpretation known as textualism or originalism), or to treat that text as a “living” document subject to evolution and change through interpretation. In this world of new governors, harried platform staffers codifying rules in a haphazard assortment of internal documents, guidelines, PowerPoint presentations, and emails became the framers of bizarre, modern-day constitutions.

These legalistic approaches to policymaking and enforcement are not just abstract questions of interpretation; they directly inform how platforms handle—or fail to handle—the harmful behavior they’re responsible for governing. Understanding them can help reveal strategies for building a safer and more trustworthy internet.

Here is another instructive example: In August 2022, the Twitter account “Libs of TikTok,” a popular pseudonymous account operated by the former New York real estate agent Chaya Raichik, posted a series of tweets purporting to expose child abuse at hospitals providing care to transgender kids and teens. The claims in the posts were plainly false. Their consequence, however, was immediate and predictable: Hospitals were inundated with hostile social media activity, harassment of their staff, and even bomb threats.

Within Twitter, the account’s activity stymied textualist interpretations of the platform’s rules. The company’s Hateful Conduct Policy prohibited posts that include “slurs, tropes or other content that intends to dehumanize, degrade or reinforce negative or harmful stereotypes about a protected category”—a category of speech that included, for example, suggestions that LGBTQ people are child molesters or “groomers.” But the posts by Libs of TikTok took a different approach, instead attacking doctors providing services to transgender youth, and accusing them of being “groomers.” Doctors, notably, are not a “protected category.” Should the Hateful Conduct Policy apply to them?

Many staffers on the company’s trust and safety team, myself included, suggested adopting a flexible approach to arbitrate the case: An attack on doctors providing health care services to transgender youth is, in essence, an attack on transgender youth themselves. Stigmatizing health care for trans people is rooted in the discriminatory viewpoint that there’s something wrong with being trans. Not to mention, the account’s conduct had already plainly emerged as dangerous, with significant offline consequences; surely no one can argue that a bomb threat disrupting the operations of a major children’s hospital is “just” harmless online trolling.

But a strict, literal application of the company’s rules to the account’s posts pulled in the opposite direction. Looking “within the four walls of the tweet,” as one executive liked to put it, what could we point to that violated the letter of the law? Even as we agreed that the posts contravened the spirit of the company’s policies, Raichik’s carefully constructed posts stopped short of clearly breaking the Twitter rules. I was encouraged by company executives to prioritize a fair application of the policies, and not assign subjective intent to Raichik’s actions. Doing so, they worried, would lead to a snowballing application of Twitter’s rules to borderline content—and, ultimately, to the very censorship we claimed to stand against. My team’s recommendation to ban Raichik was overruled.

The tensions around this case are not new, nor are they specific to Libs of TikTok’s abuse of transgender people. Platforms (and other institutions, as in the recent case of the harassment of a University of Chicago professor) have long struggled to address hate and harassment when the conduct in question is vague, veiled, or distributed across a mob. Twitter’s three-insult rule tells moderators what to do when one person repeatedly attacks another; but what if one person encourages other members of their community to do the harassment instead, each posting below the threshold for company intervention? 

For the victims of this abuse, the impacts of harassment by a mob can be the same as, or even greater than, those of abuse from a single account. But platforms often find it impossible to prove definitively that such actions are intentional and coordinated. Lacking the ability to peer into users’ minds and understand their motivations, content moderation teams are often left to operate only within the four walls of the evidence available to them. Critics of social media, like law professor Evelyn Douek, rightfully call for platforms to do better than “trust us, we know it when we see it” standards for moderation; but these calls are a double-edged sword. Legalistic formalism and rigid evidentiary standards protect users from platform caprices; but they also make it harder for platforms to respond to what researcher Nina Jankowicz has termed “malign creativity”: the purposeful deployment of coded speech by abusive users to evade social media moderation policies. And, in an effort to avoid disproportionately and unjustifiably punishing users for what appear to be isolated disagreements, platforms can miss the forest for the trees.

Going back to incidents like Gamergate in 2015, platform policies have been mired in these debates. As a new generation of social media services emerges—like Mastodon, Bluesky, T2, or even Meta’s Twitter competitor Threads—what should these upstart platforms do to try to move past them?

One option is to have simpler rules that require less constitutional agonizing. In contrast to Facebook’s gargantuan (18,882 words) community standards, the “server rules” for the largest instance of decentralized social media service Mastodon total just 45 words. The rules state the service’s expectations simply and unequivocally: “No racism, sexism, homophobia, transphobia, xenophobia, or casteism.” (If more detailed guidelines exist, the server’s administrators do not share them publicly.) Facebook’s public policy on the distribution of firearms is more than three times the length of all of that instance’s external-facing guidelines.

Given the opportunity to start with a blank canvas, platforms have chosen to draft rules that at least appear to provide their moderators with greater flexibility. A recurring theme in discussions about new platform policies is a retreat from what some have described as the “pseudo-legalism” of Twitter and Facebook’s approach, and a return to common sense—or, as Denise “rahaeli” Paolucci, a longtime content moderation expert, put it, a mindset that responds to even creatively disguised abuse with, “You know what, I was not fucking born yesterday.” Certainly, this approach would have been a welcome alternative to moderation failures such as Twitter’s handling of Libs of TikTok. But it’s not obvious that moderation by common sense and “good vibes” could withstand the scrutiny from hostile congressional committees and global regulators that larger platforms have to contend with. And controversies have already sprung up on these newer platforms about bans that users view as inexplicable or unjust; in response, pushed to explain their decisions, platform staffers have gravitated toward ever-more-detailed policies and guidelines—proceeding, as journalist Mike Masnick has put it, along the content moderation learning curve, to predictable and familiar outcomes.

Another approach has been to expand the aperture of evidence considered by platforms as they arbitrate user conduct. Upstart social platform Spill, which became one of the most-downloaded social networking apps in the Apple App Store following a series of outages on Twitter in July, explicitly states that the service will consider “off-platform evidence” in determining rules violations. (Livestreaming platform Twitch has said it will take a similar approach.) These guidelines are an important first step toward broadening the aperture of content moderation to reflect that what happens on one social media service is only a slice of the broader human dynamics that can contribute to on- and offline harm. To return to legal analogies, these policies are like the discretion left to judges in determining sentencing for a crime: They do not subvert a rules-based order so much as give the people responsible for implementing it the discretion to recognize that context unavoidably matters.

Both of these solutions have merit and help companies respond more robustly to harmful behavior that stops short of the bright lines of platform policies. But neither gets at the fundamental issue plaguing social media: a failure of trust. 

In the context of technology platforms, “trust” is an enigmatic and challenging concept. How, exactly, can people be convinced to trust institutions—such as tech companies—that appear purpose-built to resist outside influence and accountability? To date, the answer has been legalistic formalism. From detailed policies to the transparency measures of the Santa Clara Principles to provisions in the European Digital Services Act that require platforms to make it abundantly clear to users how moderation decisions were made, tech companies and their regulators have built an ever-more-formalized set of requirements around platform conduct, in the hope that these requirements will contribute to a social web that people understand, trust, and want to use.

But trust is not simply a product of rigid adherence to a set of written rules. In their book “The Trusted Advisor,” management consultants David Maister, Charles Green, and Robert Galford describe a “trust equation” that characterizes trust as the product of multiple variables, including credibility (someone’s expertise and professional presence), reliability (confidence that someone will do what they say they will), intimacy (one’s affective comfort level with another person or group), and self-orientation (the degree to which someone is seen as acting principally in their self-interest). As Maister, Green, and Galford describe it, genuine trust cannot be achieved without finding a balance of each of these factors. And yet, content moderation has gone all-in on reliability via legalistic policy and enforcement approaches—creating institutions that, for the most part, do what they say they will, even if those actions are harmful to their users and damaging to trust in the institution itself.

Episodes like Twitter’s failure to address harmful conduct by Libs of TikTok demonstrate that platforms charged with governing human behavior need the discretion to make difficult decisions in service of their communities, even when written policies and established precedent are imperfect. Reliability needs to have its limits. But the alternative can’t be mob rule or freewheeling governance-by-vibes, either. So what now?

Journalism suggests one possible solution. Newsrooms, like platforms, are charged with making impossible decisions about how to cover global events. Editorial selection is not purely subjective; most reputable papers have clear, codified ethical standards that inform their decisions. But controversy and disagreement emerge nevertheless, and news outlets are charged with making decisions about what counts as news, or what perspectives are in the public’s interest to publish, in the absence of obviously right or wrong answers. In response, some news outlets have created the role of public editor: an outsider within a newsroom who, as communication professor Mike Ananny describes it, “combine[s] access and wisdom to thoughtfully critique their paper, help readers trust it, hold journalists accountable to the professionalism they know first-hand, and explain to readers how and why journalism works as it does.” In other words, an effective public editor is empowered to serve as a liaison between the public and the newsroom—bringing insider perspectives to bear on public outrage and disagreement, in service of holding the newsroom accountable to readers, and helping readers understand the complex factors contributing to decisions they individually may disagree with. This work begins from a premise that newsrooms won’t get things right 100 percent of the time, and that trust has to be built and earned in the face of repeated, inevitable institutional failures.

This approach can apply directly to content moderation. On Jan. 8, 2021, Twitter made the fateful decision to ban President Trump from the platform—a hotly contested choice that sparked polarized condemnation: for having done too little too late or, alternatively, for overstepping the appropriate bounds of a platform’s authority to restrict free speech. During internal discussions, then-CEO Jack Dorsey made a demand of the team advocating for the president’s suspension: If we were going to ban the account, we needed to make the reasoning for doing so public. When the company banned the president later that day, we did so alongside a blog post that detailed, for the first time, the full internal rationale behind the decision. The post was an only lightly edited version of the document the trust and safety team had compiled to make the decision. Agree or disagree, our reasons for making the choice were clear.

Companies are usually allergic to getting into this level of detail about their decisions. For years, Twitter’s response to inquiries about why an account had (or hadn’t) been suspended was a brusque, “We do not comment on individual accounts for privacy and safety reasons.” Twitter’s position, like that of much of the industry, was that content moderation decisions were between the company and the account being moderated—no one else. In some cases, privacy is genuinely a factor; but there are pragmatic sides to this, too. Keeping moderation decisions opaque shields the company from the discomfort of having to admit and explain the inevitable mistakes. (It also helpfully protects companies from having to deal with the possibility of lawsuits that follow even the faintest hint of an admission of error.) And, in the perennially adversarial domain of content moderation—wherein platforms are locked in a battle with dedicated bad actors who persistently try to walk right up to the limit of allowable conduct, while stopping just short of what it would take to get banned—more explicit, detailed guidelines could present a road map for future malfeasance.

These concerns have merit, and they have informed how content moderation has operated at most platforms for more than a decade. But it’s time to acknowledge, once and for all, that this approach of trying to keep content moderation in a space of obscurity through formality has failed. The decisions here are too complex, too contextual, and too situated in the specifics of individual utterances and patterns of interactions and cultural nuance. Companies tie themselves in knots trying to produce constitution-esque policies that can accommodate any emergent situation without requiring additional subjective reasoning or explanation. The result is a web of rules that are poorly understood, poorly enforced, and, ultimately, fail to achieve their goals of building robust, trustworthy communities.

Creating a dedicated touchpoint between companies and their constituencies would represent an alternate, and fundamentally different, path to building trust. Instead of standing alone, rules would become a foundation for an ongoing conversation between platforms and users about the challenging decisions that go along with content moderation and governance. Instead of simply telling users what they’ve done after the fact and treating trust-building like a public relations exercise, companies could ask, “What would you do instead?”—and, critically, have someone on staff with a mandate to either act on the feedback or have a conversation about why the company arrived at a different conclusion. This goes well beyond just standing up a “Supreme Court,” like the Facebook Oversight Board, to adjudicate appeals of individual cases long after the fact; it requires companies to engage, in real time, in the messy business of governance. Doing so turns platforms from unaccountable black boxes into active, present participants in the communities they host—communities that, at core, they ought to be members of.

The most common objection to this approach—and, perhaps, why no company has made a genuine effort at it, beyond a scattered handful of corporate communiques—is scalability. In the first quarter of 2023, Meta took action on nearly 2.2 billion accounts and pieces of content across Facebook and Instagram; surely it’s impossible to directly explain each of those choices. Newsrooms have struggled with the same issue: When it eliminated the role of public editor in 2017, the New York Times explained that “the responsibility of the public editor … has outgrown that one office”—ironically, arguing that feedback pouring in via social media was both too much for a single person to handle, and in any event obviated the need to have a dedicated public editor on staff (because, presumably, reporters can just read their Twitter mentions and walk away with the same degree of accountability). Talking about the idea of a social media public editor with Techdirt’s Mike Masnick, former New York Times public editor Margaret Sullivan made a similar argument: that the volume a social media “complaints department” would have to handle would be nearly insurmountable. But this expectation that a public editor’s responsibility is to address complaints at scale misses the core failures plaguing social media. First, exposing the inner logic of social media content moderation doesn’t require 2 billion explanations—it requires thoughtful, detailed engagement with the key, benchmark decisions exposing how and why companies act the way they do. Not every content moderation decision is as high-stakes as banning Donald Trump; perhaps platforms could start with the cases that are, though, and see how it goes. Second, and more critically, the key function of a public editor is not merely marshaling the tidal wave of complaints, a job already allocated to operations and trust and safety teams—it’s overseeing a process of providing meaningful transparency about what happened at pivotal moments, and driving internal accountability for what ought to happen next.

The work of a public editor is not easy; explaining tough calls and demanding accountability from large institutions for their mistakes is pretty thankless work. Within companies (or newsrooms), a dedicated user advocate can quickly spiral into an antagonist for already overwhelmed internal teams; it’s challenging to navigate the interpersonal and organizational dynamics of an employee whose entire job description is to publicly call out and diagnose a colleague’s failures. From the outside, when representatives of social media companies have spoken publicly about controversial decisions in the past, they’ve been heckled, harassed, and doxxed; I’ve experienced these consequences firsthand. The degree of editorial independence required for a public editor to be successful—as Sullivan put it, a public editor needs to be assured that no one will tell them what they can or can’t say about an incident—is at odds with the risk-averse legal postures common in the tech industry. And, as the Times in part concluded when they abolished the role of public editor, in an increasingly cost-sensitive environment, standing up a function just to explain what other parts of the business are doing isn’t necessarily how companies might opt to spend their scarce resources. 

But, nearly two decades into the mainstream social web, technology companies—and the users of their services—need to recognize that existing solutions to establish trust aren’t working. There will likely never be rules good enough or perfectly enforced enough to obviate the need for a genuine touchpoint between platforms and their publics—and even perfect rules run into the shifting, dynamic limits of user behavior. People will push boundaries, and platforms need a way to meet that activity when and where it emerges, without devolving into ad-hoc totalitarian rule. Where legalism has failed, perhaps a more human, less rigid approach might succeed.

Yoel Roth is a Technology Policy Fellow at UC Berkeley, a Non-Resident Scholar at the Carnegie Endowment for International Peace, and the former Head of Trust & Safety at Twitter. For more than 7 years, he led the teams responsible for Twitter’s content moderation, integrity, and platform security efforts, including policy development, threat investigation, product, design, research, and operations. His current research and writing focus on the trust and safety industry and how technology companies manage the conflicting values and incentives built into content moderation at scale.
