Keep AI Testing Defense-Worthy
In defense and intelligence, AI testing and evaluation should adapt to prevent national security threats arising from AI misalignment.
As the spending on frontier artificial intelligence (AI) capabilities for defense and intelligence increases and the most advanced AI models are incrementally entrenched in the national security apparatus, the Department of Defense and the intelligence community should test, and not assume, that procured AI models are sufficiently aligned with their intent. In order for “every warfighter” to soon “wield frontier AI as a force multiplier” and depend on AI as a “teammate,” AI models must be sufficiently reliable and trustworthy. This requires adapting and accelerating existing AI testing and evaluation pipelines within the Department of Defense and the intelligence community to detect and counter instances of AI misalignment.
Capability and Reliability Are Distinct Concepts, for Both Humans and AI Models
There are good reasons why we, as a society, assess individuals for both their ability to accomplish a task and their dependability to do so. Capability and reliability are distinct concepts, and one does not guarantee the other. A top-scoring graduate could still be bootstrapping their start-up during corporate work hours. A seasoned board member could still have conflicts of interest. An elite soldier could still be a spy. For this reason, society usually tests humans for both their capability and their reliability. Society has found many different ways to assess and enforce a minimum standard of reliability, from security clearances for government staff to good moral character evaluations for lawyers applying to the bar, and from mandatory disclosure of conflicts of interest for directors to non-compete agreements for employees.
Just like humans, AI models can be both capable and unreliable. As AI models become increasingly capable and autonomous, they may soon reach raw capabilities that would make them highly skilled at some activities that could help protect the country, spanning from intelligence gathering and processing, cyber defense and offense, military research and development, and even warfighting. Indeed, AI model capability growth has been exponential, leading CEOs at frontier AI companies––including OpenAI, Anthropic, Google DeepMind, and Nvidia––as well as international experts, to predict artificial general intelligence (usually defined as “AI systems that are generally smarter than humans”) between 2027 and 2043. Despite this undeniable scaling in AI capabilities, raw capabilities do not provide us with sufficient reassurance that these AI models are similarly reliable and can be depended on to accomplish the missions we want them to accomplish.
AI Models Must Be Sufficiently Reliable to Be Deployed in National Security
We do not trust skilled humans with military-grade equipment or classified information without running a prior background check on them. Perhaps we should not trust highly capable AI models either. In other words, it is critical that we evaluate thoroughly the reliability of AI models that are used in defense and intelligence, especially if these models are fine-tuned on sensitive data provided by the Department of Defense or the intelligence community and designed to operate in classified environments. In these high-stakes environments, AI models that are insufficiently reliable could have severe national security implications.
These considerations are reflected already in existing AI assurance frameworks and in the Trump administration’s focus on updating and strengthening these frameworks, as demonstrated in its AI Action Plan and the National Defense Authorization Act (NDAA). In multiple responsible AI frameworks, toolkits, and guidance by the Department of Defense and the intelligence community, AI model reliability and trustworthiness are cornerstone principles. For instance, the Department of Defense’s AI ethical principles (2020) and Developmental Test and Evaluation of Artificial Intelligence-Enabled Systems Guidebook (2025), and the Department of Homeland Security’s Directive 139-08 (2025), require that AI models be reliable and sufficiently trustworthy, meaning that, “when employed correctly,” an AI model should “dependably do well what it is designed to do” and “dependably not do undesirable things.”
In July 2025, President Trump’s AI Action Plan stated that a “lack of predictability” “can make it challenging to use advanced AI in defense, national security, or other applications where lives are at stake.” President Trump has consistently directed the Department of Defense, the Office of the Director of National Intelligence (ODNI), the National Institute of Standards and Technology (NIST), and the Center for AI Standards and Innovation (CAISI) to “refine” the Pentagon’s “Responsible AI and Generative AI Frameworks, Roadmaps, and Toolkits” and to publish an intelligence community “Standard on AI Assurance under the auspices of Intelligence Community Directive 505 on Artificial Intelligence.” These policy recommendations echo the conclusions reached by the National Security Commission on Artificial Intelligence in 2021. In its final report, the Commission recommended “establish[ing] justified confidence in AI systems” by, among other things, “enhanc[ing] [the Department of Defense’s] testing and evaluation capabilities as AI-enabled systems grow in number, scope, and complexity.”
Several provisions in the 2026 NDAA underscore the importance and urgency of strengthening the evaluation of the reliability of current and future AI models acquired by the Department of Defense and the intelligence community. For instance, Section 1533 requires the secretary of defense to establish a cross-functional team tasked with developing an assessment framework to evaluate AI models currently used by the Department of Defense, as well as guidelines for evaluating future AI models being considered by the Pentagon. Section 1535 requires the secretary of defense to establish an AI Futures Steering Committee, which in turn is expected to develop a “proactive policy” for the Pentagon’s evaluation of AI systems that are “more advanced” than any existing advanced AI systems, including systems “approach[ing] or achiev[ing] artificial general intelligence.” Similarly, Section 6603 requires the chief AI officer of the intelligence community to “establish standards for testing of artificial intelligence models in proportion to risk, including benchmarks and methodologies for … trustworthiness.”
Misalignment Has Significant Implications for AI Model Reliability and National Security
Strengthening the testing and evaluation of AI models procured by the Department of Defense and the intelligence community is a priority in both the AI Action Plan and the NDAA. Still, one important blind spot remains. As of today, the threats posed by AI misalignment have not yet received nearly enough attention, and, as a result, existing AI assurance frameworks no longer match the state of the art of AI research. For instance, the Department of Defense’s Developmental Test and Evaluation of Artificial Intelligence-Enabled Systems Guidebook (2025) noted that “[a]ny misalignment can lead to development of suboptimal behavior.” Given what research now shows about the effects of misalignment on AI model behavior, this has become an understatement.
AI researchers and frontier AI companies use the term “misalignment” to describe a situation in which an AI model’s goals, and consequently its behavior and actions, deviate from what humans intended. Misalignment is an open scientific problem that significantly reduces AI model reliability. Given the current geopolitical environment, two factors warrant particular caution regarding misalignment and how it could impact AI models deployed in national security contexts. First, the human the AI model deviates from could be the AI developer, the AI deployer (for example, the Department of Defense), or even the AI evaluator (such as the Center for AI Standards and Innovation). In other words, AI models could be misaligned vis-a-vis their developer, but they could also be misaligned with their user’s intent. Second, misalignment is frequently accidental, but, as several researchers have shown, it could also be deliberate. This means that malicious adversaries could not only misuse AI models (as many frontier AI companies have already warned), but they could also design AI models that are misaligned or intentionally poison training data to trigger misalignment in AI models of any size. The practicality and low cost of some of these attacks—as low as $60—could expose AI models to vulnerabilities from insider and outsider threats, including those coming from state-backed actors.
One of the most common consequences of AI misalignment is a model’s pursuit of misaligned goals while deceiving its developer or deployer, including by hiding its true capabilities and/or objectives. Frontier AI companies like OpenAI and Google DeepMind, and independent AI research organizations like Apollo Research, as well as legislation proposed in Congress by Sens. Josh Hawley (R-Mo.) and Richard Blumenthal (D-Conn.), refer to this behavior as “scheming.” Basically, scheming consists of an AI model systematically hiding misaligned goals because it is convenient for accomplishing them. Scheming is a threat that could intensify as AI models become more capable.
Scheming is related closely to AI model reliability in that it can render an AI model unreliable. In short, a scheming AI model may not “dependably do well what it is designed to do,” and it could also do many “undesirable things.” For instance, researchers at various AI companies and independent organizations have observed AI models in controlled environments engaging in several forms of scheming behavior, spanning from strategically underperforming in evaluations (and even trying to sabotage them) to whistleblowing sensitive information to the public.
Outside of controlled environments, this could have severe national security consequences. For instance, one can easily imagine the cascading effects of an AI model covertly whistleblowing military plans or other sensitive information from the Department of Defense to the press or to AI company staff who do not have security clearances, deliberately lying to intelligence community personnel, sabotaging Department of Defense research and development projects, or even blackmailing its government users to resist shutdown. Therefore, detecting whether an AI model is misaligned and acting on its scheming capabilities could be crucial to understanding whether the AI model is sufficiently reliable to be deployed in national security contexts. The goal is simple: mitigating the risk that AI scheming occurs in high-stakes environments and creates strategic surprise for the U.S. government.
Accelerating AI Testing and Evaluation to Detect and Counter Misalignment Threats
Now is the right time for action. The AI Action Plan has directed the Department of Defense, the Office of the Director of National Intelligence, the National Institute of Standards and Technology, and the Center for AI Standards and Innovation to “refine” the Department of Defense’s “Responsible AI and Generative AI Frameworks, Roadmaps, and Toolkits” and to publish an intelligence community “Standard on AI Assurance.” Meanwhile, the NDAA aims at developing evaluation frameworks and guidelines for current and future AI models within the orbit of the Department of Defense and the intelligence community. In implementing the AI Action Plan and the NDAA, the Department of Defense, the Office of the Director of National Intelligence, the National Institute of Standards and Technology, and the Center for AI Standards and Innovation have a unique opportunity to strengthen AI model reliability by directing federal agencies to test and evaluate scheming behavior in AI models.
The Department of Defense and the intelligence community can accelerate AI testing and evaluation to detect traces of misalignment and scheming behavior in AI models in a way that supports innovation and without additional red tape. For example, when developing updated frameworks following the AI Action Plan and the NDAA, the Department of Defense, the Office of the Director of National Intelligence, the National Institute of Standards and Technology, and the Center for AI Standards and Innovation could recommend running a suite of scheming evaluations in controlled settings (“in a vacuum”) during existing developmental testing and evaluation (DT&E) pipelines, and repeating them during operational testing and evaluation (OT&E) in realistic environments.
An initial suite of scheming evaluations could include behavioral red-teaming for scheming behaviors such as oversight subversion, self-exfiltration, sandbagging, sabotage, covert whistleblowing, reward hacking, covert privilege escalation, and intentional lying. OT&E could be an iterative and incremental process that begins with low-stakes settings and unclassified information, and transitions gradually to higher-stakes settings. Behaviors of concern detected during OT&E could be sent back to DT&E for further inspection. By predefining thresholds or acceptable failure rates in testing and evaluation master plans and then comparing them with evaluation results during DT&E and OT&E, the Pentagon and the intelligence community could then make informed decisions on AI model reliability.
Finally, the Department of Defense and the intelligence community could make progress toward the same goals and begin strengthening AI model reliability while the development of updated frameworks under the AI Action Plan and the NDAA is still in progress. Specifically, the Department of Defense and the intelligence community, through, for instance, the Department of Homeland Security, could exercise their prototype Other Transaction (OT) authority strategically by negotiating expectations around scheming evaluation results within prototype OT agreements’ success metrics.
* * *
As Secretary of Defense Pete Hegseth recently outlined in his address during Arsenal of Freedom stop at SpaceX headquarters, as well as in the Artificial Intelligence Strategy for the Department of Defense and in a December 2025 memorandum, the Pentagon is moving fast to incorporate the most advanced AI models “in the battle rhythm every single day” and to attain “mass AI adoption across the Department.” However, AI models that are highly capable are not necessarily equally reliable. Once deployed in defense or intelligence, AI models that are insufficiently reliable could pose severe national security threats. For this reason, the testing and evaluation of AI models procured by the Department of Defense and the intelligence community should accelerate alongside the rate of their adoption across the Department, as well as the scaling of AI capabilities. In particular, it is essential that the Department of Defense and the intelligence community strengthen the testing of AI model reliability so as to ensure that procured AI models are sufficiently aligned with their intent and do not act on their scheming capabilities.
