
White House Takes Aim at Biased AI in Government, Leaves Key Gaps

Merve Hickok
Wednesday, March 25, 2026, 12:00 PM

OMB’s latest memo promotes neutral AI in government but allows vendor self-evaluation and weak scrutiny of existing contracts.

Eisenhower Executive Office building. (Daderot/Wikimedia Commons, https://commons.wikimedia.org/wiki/File:Eisenhower_Executive_Office_Building_-_Washington,_DC_-_DSC09999.JPG; Public Domain).

On Dec. 11, the White House Office of Management and Budget (OMB) issued a new procurement memorandum aimed at ensuring federal agencies purchase artificial intelligence (AI) systems that are trustworthy and free from bias. The memo marks the latest effort to guide the federal government’s rapid expansion of AI adoption.

The new OMB memo, entitled M-26-04, “Increasing Public Trust in Artificial Intelligence Through Unbiased AI Principles” (referred to as the Public Trust AI Memo from here onward), specifically targets large language models (LLMs). According to the Government Accountability Office, federal agencies are increasingly deploying LLMs in a range of operational contexts, including benefit and service delivery, improvements to government communications, expanded access to information, summarization of reports, and internal program tracking. For example, the Food and Drug Administration is reportedly using an LLM to support clinical protocol reviews, streamline scientific evaluations, and assist in identifying inspection priorities. In addition, platforms such as Google’s “Gemini for Government” provide federal agencies with infrastructure to develop and deploy AI-enabled workflows.

More and more agencies will use LLMs for their mission operations, and each deployment will affect the efficiency, quality, and reliability of public services. The Public Trust AI Memo is a good starting point for demanding that these systems be trustworthy and free of bias. However, the memo fails to put in place controls to ensure that outcomes actually reflect the aspiration: It gives vendors flexibility in what information they must provide and allows them to self-evaluate their performance. These shortcomings will limit how effectively the new requirements protect Americans’ rights. More worryingly, the Trump administration gives a free pass to existing LLM contracts and is hesitant to hold current vendors accountable.

What the OMB Public Trust AI Memo Requires

The new Public Trust AI Memo complements an earlier memorandum, issued by OMB in April 2025, entitled M-25-22, “Driving Efficient Acquisition of Artificial Intelligence in Government.” That April memo set out minimum requirements for procurement of AI across the federal government.

The Public Trust AI Memo provides that LLMs “shall be truthful in responding to user prompts seeking factual information or analysis.” It is unclear, however, how OMB expects an LLM to determine the factual accuracy of an analysis. LLMs do not actually assess accuracy when they respond; they reproduce language patterns found in their training data, and they are not capable of causal reasoning. At best, an LLM may point to a website where relevant information can be found. There is no mechanism by which an LLM determines factuality.

The memo also provides that outputs must “prioritize historical accuracy, scientific inquiry, and objectivity, and shall acknowledge uncertainty where reliable information is incomplete or contradictory.” The systems are also required to function as “neutral, nonpartisan tools.”

On the surface, these are great principles to aim for. It is good that the government is pushing for its AI systems to be nonpartisan and unbiased. But enforcing the standard is no simple task. LLMs are stochastic systems, meaning they incorporate randomness in order to produce varied outputs. This is part of what enables LLM-based applications, such as chatbots, to produce, say, 1,000 unique responses to the same prompt asking for a story. LLMs also hallucinate, which often leads to outputs that are nonsensical or inaccurate while appearing highly plausible. And LLMs are designed to keep users engaged rather than to acknowledge uncertainty.
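To make the stochasticity point concrete, the minimal sketch below illustrates temperature-based sampling, the mechanism that lets the same prompt produce different outputs on different runs. It is purely illustrative: The candidate words, scores, and temperature value are invented for this example and do not describe any vendor’s actual model or decoding code.

    # Toy illustration only: how temperature-based sampling makes the same
    # prompt yield different outputs. Candidate words and scores are invented.
    import numpy as np

    rng = np.random.default_rng()

    def sample_next_token(logits, temperature=0.8):
        """Sample one next-token index from raw model scores (logits)."""
        scaled = np.array(logits, dtype=float) / temperature  # higher temperature flattens the distribution
        probs = np.exp(scaled - scaled.max())                 # numerically stable softmax
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    # Hypothetical scores for four candidate next words after the same prompt.
    candidates = ["accurate", "disputed", "unknown", "confirmed"]
    logits = [2.1, 1.9, 0.5, 1.7]

    # Ten runs of the "same prompt" produce different continuations.
    print([candidates[sample_next_token(logits)] for _ in range(10)])

The point is not the specific numbers but the mechanism: The output is drawn from a probability distribution, so identical inputs do not guarantee identical, let alone verified, answers.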

Troublingly, bias is also inherent in the data and design of all AI systems, and it amplifies the views most represented in the data. Generated outputs do not reflect the accuracy or objectivity of information; they reflect how often certain views or data points appear in the training data. Choices about where data is sourced directly shape the outcomes: If most of the data comes from politically polarized sources, the LLM’s responses will be similarly biased and polarized. Because of these limitations, it is important to have meaningful transparency about how the data was sourced, how the systems were trained and evaluated, and what safeguards vendors are putting in place to minimize these problems.

While the White House seems to care about ideological bias on paper, its actions say otherwise. The Trump administration has consistently shown a lack of concern about the presence of bias in current LLMs; it even seems to accept certain biases. For example, a coalition of civil society organizations, including my own, recently warned OMB that deployment or procurement of xAI’s model Grok violates a White House executive order requiring the use of neutral and objective AI in government. More specifically, we provided examples of how Grok is “neither objective nor neutral,” including its generation of conspiratorial and inflammatory content (such as claims that South Africans were committing a “white genocide”) and its apparent skepticism of the historical consensus on the Holocaust death toll, echoing Holocaust denial talking points. Other policy analysts have explained how many countries took immediate action against Grok, including launching an investigation into the company in Europe and removing it from the public AI market in Asia. The administration, however, keeps expanding Grok’s use. Grok is still widely deployed across the federal government, including in the Department of Defense’s GenAI.mil platform, which is accessed by more than 3 million military and civilian personnel. xAI’s model is only one example. There are likely others: Members of the public don’t have insight into what “government versions” of AI other contracted companies might have provided to the administration, or into the results of the testing and evaluations the government should be conducting before deploying these systems.

The administration requires LLMs to prioritize historical accuracy and objectivity, yet it is not reassessing these systems when evidence to the contrary is provided. In his testimony to the Senate Commerce Subcommittee on Science, Manufacturing, and Competitiveness, White House Office of Science and Technology Policy (OSTP) Director Michael Kratsios stated that “repercussions for selling a model to the U.S. government that isn’t truth seeking and accurate are pretty harsh.” Yet the administration has still not taken action to impose these so-called repercussions. Federal agencies have a duty to ensure the use of AI is safe and secure for the American people. When a system is not performing as required, agencies need to bring it into compliance or discontinue its use if compliance is not possible. The American public should expect the government to follow its own guidance—as outlined in the Public Trust AI Memo—to ensure such trust and to be transparent about its actions.

Transparency Documentation

Transparency is critical for government contracts and public infrastructure.

At the beginning of 2025, there were 2,133 AI use cases reported across the U.S. government (not including military or intelligence). While federal agencies are revising and consolidating their AI use case inventories, the public does not know the full extent of AI diffusion across the government. News reports suggest the administration might have doubled the number of AI uses in the federal government since the last reporting. The few updated inventories (at the Department of Health and Human Services and the Department of Homeland Security, for example) show a significant increase in LLM use cases.

Specifically for LLMs deployed across the U.S. government and their data connections to federal databases, the public does not have much information beyond the names of the models. Even the main U.S. government website currently does not list which models or versions are provided by either xAI or OpenAI—companies already contracted with the General Services Administration.

The OMB Public Trust AI Memo requires that vendors provide federal agencies with sufficient documentation to assess risk management practices at the model, system, and application levels. At minimum, this transparency requirement includes disclosing inappropriate use cases and providing “model, system, and/or data cards.” Over the years, researchers have designed these cards to explain the provenance of training datasets, the capabilities and limitations of the model, foreseeable risks and mitigations, and the performance of AI models and systems. These cards establish the first line of transparency for the users of an AI model and help them make informed decisions about a model’s suitability for use.
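For illustration, the sketch below shows the kind of structured disclosure a standardized card could require. The field names and values are hypothetical, invented for this example; they are not drawn from the OMB memo, from any vendor’s published card, or from any existing standard.

    # Hypothetical model card fields, for illustration only. None of these
    # entries describe a real model, vendor, or government requirement.
    example_model_card = {
        "model": {
            "name": "example-llm-v1",        # invented identifier
            "version": "1.0",
            "release_date": "2026-01-15",
        },
        "training_data": {
            "sources": ["licensed corpora", "public web crawl", "user-provided text"],
            "provenance_documented": True,   # can each source be traced?
            "known_gaps_or_skews": "over-representation of English-language web forums",
        },
        "evaluations": {
            "benchmarks": {"truthfulness_suite": 0.81, "bias_audit": 0.74},
            "independent_evaluation": False, # vendor-run only, or third-party verified?
        },
        "risks_and_mitigations": [
            {"risk": "hallucinated citations", "mitigation": "retrieval grounding"},
            {"risk": "political skew in outputs", "mitigation": "documented red-teaming"},
        ],
        "inappropriate_uses": ["sole basis for benefits determinations"],
    }

Requiring fields at this level of specificity, rather than leaving vendors to decide what a “summary card” contains, would let agencies compare offerings on a like-for-like basis.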

However, these transparency documents are only as good as the details they include. Because the government has failed to specify criteria and a minimum level of detail for these documents, this requirement can become a high-level bureaucratic exercise with generic answers. Vendors’ responses may differ vastly in detail, making it difficult for agencies to assess offerings for risk. For example, regarding model, system, and/or data cards, the Public Trust AI Memo simply says “information provided often includes summaries of the training process, identified risks and mitigations, and model evaluation scores on LLM benchmarks. It is uncommon for a vendor to produce all three types (model, system, data) of summary cards.” The memo does not request information on data sources, specify training process steps, or even name the types of assessments the government should require. Furthermore, it leaves the door open for vendors to simply not provide information at all three levels. If the government does not have standardized, specific guidelines on the types of information to be disclosed, or does not make public what information it may have, the American people are left in the dark about the AI systems deployed for government decision-making.

When Grok 4 was first released publicly, xAI did not release a system card. The model card published later acknowledged deception, sycophancy, and political bias but provided no meaningful detail on mitigation actions. The recently updated model card for Grok 4.1 does not even mention the word “bias.” For other LLM vendors, independent transparency assessments show that training data and model information is increasingly opaque.

The Public Trust AI Memo claims to connect transparency with public trust. Just as federal agencies need sufficient documentation to assess risks, the public also needs meaningful transparency to protect civil rights and enable accountability and contestability. But, again, the public is kept in the dark about which systems are deployed by which agencies. Important details such as the extent of LLM use cases in government agencies, the terms of procurement, and the risks identified for each deployment are left out of view. The government offers no reason why these details are not disclosed.

Even for non-LLM systems, the use case disclosure process has been slow and delayed. Over the years, my organization, the Center for AI and Digital Policy (CAIDP), has provided several recommendations to OMB to ensure safeguards for AI procurement and transparency for AI products deployed across the government. The first Trump administration’s Executive Order 13960 and the AI in Government Act of 2020 required OMB to seek public input, develop guidance for federal procurement and use of AI, and disclose AI use cases publicly. OMB was mandated to complete this by June 2021, but the deadline passed without action. For the next few years, CAIDP was one of a handful of organizations actively pressing OMB to act.

Specifically for procurement of AI, CAIDP proposed that OMB require impact assessments from vendors prior to procurement, establish public registries of procured AI systems, and ensure independent evaluation of vendor claims regarding effectiveness and risk management for all AI systems. We argued that such registries should enable vendor transparency, accountability, and contestability; prevent adverse human rights impacts; and avoid conflict of interest. Providing the public with meaningful transparency also enables the public to make judgments about government and vendor actions, and to hold them accountable if harm occurs. It is critical to have risk and impact assessments before a system is procured and deployed. Systems that may impact Americans’ rights and freedoms, or systems that conflict with the public interest, should simply not be deployed.

In my 2023 testimony to the U.S. Congress—two years after Executive Order 13960’s deadline for OMB guidance on AI procurement—I noted the delay and argued in favor of such guidance. At that point, the Biden administration had published only basic information on use case inventories. Finally, in 2024, OMB published its first guidance on AI. All the federal agencies also updated their AI use case inventories and provided more detail on the systems. This meant that the public and researchers had a better idea of what kind of AI systems the government was using to make high-impact decisions. However, soon after Trump returned to office, all of this information was swiftly removed from the AI.gov website—undoing years’ worth of progress on transparency. CAIDP and several other civil society organizations urged the new administration to continue publishing the inventory. Eventually, OMB’s April 2025 Memo M-25-21 kept the requirement for annual public disclosure of use cases. But by the time the November 2025 deadline had passed, the government had still not published any use cases. Some agencies have recently started publishing updated inventories on their websites, but the consolidated inventory is still not complete.

Some might argue that assessment and documentation requirements may be too burdensome for procurement professionals, adding time and cost to the process. But this “extra” step is necessary because AI is an unprecedented and evolving technology. Unlike traditional software, AI systems can introduce significant civil rights, security, and safety risks. For example, some AI systems may result in dragnet surveillance of Americans. Others may aggregate previously separate information (for example, combining voting records, Social Security information, and social media) and enable the government to have granular information on individuals. Some systems (such as biased facial recognition systems) can result in wrongful arrests or erroneous termination of government benefits. Worries about “burdensome” documentation should not take precedence over Americans’ rights and safety.

In my research, I have interviewed more than 20 public procurement professionals across different agencies and jurisdictions on the subject, and they uniformly agreed on the need for meaningful transparency and robust evaluations in AI procurement. They noted that AI systems should be evaluated differently than traditional software, that evaluations should be independent, and that risks should be disclosed and mitigated at the procurement stage—not left for after deployment. OMB has acknowledged this need across multiple administrations. CAIDP’s proposals laid the groundwork for the minimum risk management requirements now embedded in OMB guidelines. However, there is still a gap between requirements and enforcement.

Feedback Mechanisms

The Public Trust AI Memo also mandates that vendors establish feedback mechanisms allowing users to report outputs that violate the principles. These could be hotlines, dedicated support emails, or online forms. The requirement aims to address growing concerns about AI systems producing false information, exhibiting political bias, or generating discriminatory outcomes, and establishing such channels is a step in the right direction for communication and trust.

The issue, however, is that this feedback most often goes only to the vendor, not to the government or the public. When a person reports problematic LLM behavior, it is the vendor—not the government—that receives the complaint and decides what changes, if any, should be made to address the issue. Without public transparency (such as a discussion forum), other users may not be aware that they share the same issue. A commonsense expectation would be to share concerns with a party that can hold the LLM vendor accountable. For example, if you, as a user, have an incident with xAI’s government model, you should not be expected to deal directly with the vendor. When a government-deployed model provides nonfactual responses, or readily generates nonconsensual sexual images of women and children, the complaint should be made publicly or to the federal government, not to a vendor with a clear conflict of interest.

CAIDP recommends a public reporting mechanism that feeds into responsible sharing of information about AI issues and incidents. The Trump administration’s AI Action Plan recommends AI incident reporting and says that the “ability to learn from vulnerabilities is critical for secure systems.” The plan also calls for establishing AI incident standards, response frameworks, best practices, and technical capabilities, as well as developing an “AI Procurement Toolbox.” It would be a critical miss if the Action Plan’s incident reporting recommendation were not linked to this LLM feedback mechanism and did not enable a publicly accessible reporting channel.

New Federal Procurement Rules Require “Unbiased” AI Systems, yet Major Existing Contracts May Escape Scrutiny

While the Public Trust AI Memo is a step in the right direction, it leaves key gaps in accountability that will undermine its effectiveness.

The memo states that agencies should modify current contracts with AI vendors only “to the extent practicable.” This is a massive loophole. Most of the foundation model providers are already under contract. (USAI.gov lists current vendors as OpenAI, Anthropic, Amazon, Meta, Gemini for Government, and Microsoft.) All existing contracts should be reviewed for compliance with the memo, without waiting for future contract extensions. If LLM providers cannot comply (that is, provide neutral and unbiased products), their contracts with the government should be suspended until they can meet OMB requirements. If they cannot, the contracts should be canceled.

xAI signed an 18-month contract in September 2025 to provide Grok to the federal government for $0.42 per agency. OpenAI’s $1 contract runs until August 2026. The public cannot be expected to wait for bias incidents to accumulate until a contract ends. The government must ensure its procurement rules are followed now, not later. The federal IT infrastructure is too critical to leave major LLM provider contracts intact without ongoing review. LLM developers must be held accountable for the outcomes of their products. Without retroactive application of the OMB standards, some of the government’s most substantial AI deployments may continue operating without the transparency and accountability measures the memo requires for future procurements.

While OMB Memo M-25-22 requires independent evaluation for acquisition of AI systems, the Public Trust AI Memo does not mention such a requirement for LLM acquisition. The government has already stated that for LLMs currently contracted under USAI.gov, it does not “have guardrails beyond those provided by our model providers.” In other words, the government leaves current LLM governance to vendors’ discretion. As described above, this results in situations where vendors such as xAI or OpenAI decide whether their models should create pornographic or nonconsensual sexualized images.

It is common practice for vendors to run their LLMs against standard tests and evaluations and to benchmark their performance scores against other LLMs. These tests measure how well a model responds to a task or answers multiple-choice questions (much like standardized tests). While these metrics can serve as starting points, industry benchmarks have well-documented limitations and are often developed by the AI industry itself, creating conflicts of interest. AI companies can optimize their systems to perform well on popular benchmarks, gaming the results, while still exhibiting problems in real-world applications. OMB’s reliance on vendor-provided benchmark scores therefore raises questions. The government cannot outsource its testing and evaluation to the very parties who benefit from the results. Such conflicts of interest can allow systems that do not meet federal requirements to be deployed anyway. There should be independent, objective, and scientific evaluations of how these LLMs perform. The public sector cannot afford to let biased, non-neutral AI systems become entrenched in public infrastructure. Not holding current contracts to the same standards will undermine public trust and create national security vulnerabilities.

Third-Party Components Overlooked

Another missed opportunity in the OMB Public Trust AI Memo is transparency requirements for third-party components incorporated into LLMs. This is a significant oversight, given the complex supply chains underlying modern AI systems and the administration’s focus on a full American AI stack and on open-source models.

Large language models often incorporate datasets, code libraries, tools, interfaces, or infrastructure from multiple sources. For example, OpenAI’s GPT-5 system card simply says its training data includes “diverse datasets, including information that is publicly available on the internet, information that we partner with third parties to access, and information that our users or human trainers and researchers provide or generate.” This vague description tells us that datasets (whether text, code, or images) include user prompts, copyrighted material, and whatever else can be scraped from the web. It does not tell us anything further about data provenance, representation, quality, or security. It does not disclose the exact third-party tools used to process and prepare the data for training or to tune the models later.

These components can significantly affect the security, safety, and bias of the final system. Yet without specific details, it is impossible to verify whether the models were trained on safe and trusted sources or on politically biased, polarized, or historically inaccurate content. It is also impossible to tell whether the tools used were foreign-made. And the OMB memo does not explicitly require vendors to disclose information about these third-party elements within LLM systems. While FedRAMP authorization would catch security vulnerabilities, it would not help detect bias.

This gap could allow vendors to meet the letter of the transparency requirements while obscuring critical information about their AI systems’ actual composition and risk profile.

*          *          *

Having written the only book available on public-sector procurement of AI, I repeatedly warn that the procurement process should verify and validate necessary safeguards to ensure AI systems are trustworthy. I urge standards to ensure AI systems in the federal government are transparent and robust, and that they produce fair outcomes. In other assessments, I also explain that especially for public-sector procurement, vendors cannot hide behind trade secrecy to deny requests that might lead to critical assessment of their products—as public services may impact fundamental rights and access to opportunities. Trustworthiness requires an AI system to be reliable and function as intended. As per the first Trump administration’s Executive Order 13960, agencies should use AI in a way that fosters “trust and confidence while protecting privacy, civil rights, civil liberties, and American values” and requires the AI systems to be “lawful, purposeful, performance-driven, accurate, reliable, effective, safe, secure, resilient, understandable, responsible, traceable, transparent, accountable and monitored.”

The new Public Trust AI Memo arrives as federal AI adoption accelerates and concerns about AI risks and transparency intensify. The “to the extent practicable” language for existing contracts will likely prove contentious, with debates over which modifications are truly impractical versus simply inconvenient or costly for vendors or agencies. Commercial AI models are evolving rapidly, and there is simply no reason that an AI vendor providing services to federal agencies should not be expected to provide fixes or updates just as rapidly. Not holding vendors to the standards that the administration itself established (through Executive Order 13960 and other OMB memos) undermines trust and confidence. Although White House OSTP Director Kratsios told the Senate that “[h]aving truth seeking and accurate AI is something the president wrote about it explicitly in the woke AI executive order, and that’s something that we take seriously no matter what type of bias may be,” the administration has yet to take serious action. When asked specifically about xAI’s Grok and whether the administration would commit to ensuring federal agencies do not use LLMs that promote anti-Semitic conspiracy responses, Kratsios responded unequivocally “that that’s the type of behavior that the president very rightfully signed executive order to help avoid,” and “those would be subject to the procurement restrictions.” He further advised that the administration will “commit to continue to execute the president’s executive order to ensure that models procured by the U.S. government are truth seeking and accurate.” Words are easy; action is what counts. Lack of action de facto legitimizes the deployed models of the existing vendors.

Federal agencies are responsible to the American people for the tools they deploy. Public funds must serve the public’s interest, not the interests of those who are politically close to the administration. That responsibility does not disappear just because a contract was signed before these guidelines were issued. Vendor assurances are a starting point, but the government needs to operate on a “trust but verify” basis. The government must ensure that all AI systems that serve the public, or make determinations about the public, meet these standards, not just future purchases. A memo that carries the weight of “public trust” in its title should leave no room for conflicts of interest or opacity. The government should ensure AI systems meet the trustworthiness bar it set several years ago and not undermine its own credibility.


Merve Hickok is the president and policy director of the Center for AI and Digital Policy.
