
Integrating the Art and Science of Wargaming

Bethany Goldblum, Andrew Reddie
Wednesday, March 15, 2023, 12:10 PM

Lt. Gen. Paul Van Riper, pictured on the right, commanded “Red” forces in the Millennium Challenge 2002 war game exercise. (U.S. National Archives; Public Domain Mark 1.0)

Published by The Lawfare Institute

How can the United States deter a Chinese attack on Taiwan? Will Russia use tactical nuclear weapons in Ukraine? Do hypersonic weapons make the world less stable? Should military planners integrate decision support algorithms into nuclear command and control?

Though they address a wide range of issues in strategy, security, and foreign policy, each of these questions poses a common challenge for analysts and policymakers: In the absence of observational data, how can they make a prediction about the state of the world?

To address this challenge, policymakers often rely on inferences drawn from wargames: strategy and analytical games that simulate tactical, operational, and strategic aspects of conflict. While wargames are useful in studying potential conflicts, one must consider whether they are appropriate for drawing these sorts of inferences. Currently, wargames are designed and executed predominantly as descriptive or correlational studies and thus are not suitable for uncovering the kinds of causal relationships required to inform these sorts of policy challenges. Wargamers can and should do better.

The Utility of Wargaming Methods

There is a great deal to like about wargames. They represent an immersive synthetic environment in which players engage in strategic interaction and must live with the consequences of their decisions. Wargames also allow players—whether current or former policymakers or other populations of interest—to engage with tactical, operational, and strategic problems in all of their complexity (or, at least, more complexity than formal approaches or survey methods). Researchers are also able to craft games that push directly on problems of interest, offering them control over the players’ experience. 

Strategically building environments that leverage each of these traits represents the “art of wargaming.” These traits set wargaming methods apart from some of the alternative synthetic data-generating processes that analysts often turn to (such as formal approaches and survey experiments). These traits have also driven recent interest in the application of wargaming in broader national security contexts.

There are three primary applications of wargaming methods—teaching, exploration, and analysis. In brief, the first use case—teaching—might involve making classroom concepts (such as deterrence, defense strategy, and democratic peace) “real” for scholars and military professionals. In some cases, wargames might also be used to teach complex topics, reinforce fundamental lessons, and measure performance before and after taking a course (though it’s important to consider whether there is a “right” way to play).

The second use case, exploration, involves engaging current or former policymakers, subject matter experts, and scholars to consider current and previous problems. For example, how might the introduction of missile defense capabilities alter the strategic calculus in the Baltic Sea region? In these cases, the data stem from conversations over the table or a record of decisions made by players. These wargames can yield useful results for understanding what information is necessary for policymakers to come to a decision or what resources or capabilities are germane to a particular scenario of interest. This exploratory use case might subsequently pose additional research questions for examination as part of a broader cycle of research. Much of the traditional wargaming that takes place within the defense community falls into this category, either creating conditions for policymakers to contend with potential scenarios of interest or seeking to understand the broader context in which a national security challenge might play out.

Unfortunately, this second use case is often confused with the third: wargames used for analytical purposes, that is, wargames designed to collect data to answer a specific question, whether that question is driven by current events, the direction of policymakers, or academic debates. This article is focused primarily on this third use case.

The Analysis Problem

Both traditional wargame designers and participants tend to believe that wargame play yields meaningful insights about generalizable outcomes. This phenomenon has been described elsewhere as a “reluctance to accept both the power and limitations of wargames.”

The problem with using existing or “traditional” wargame designs for analytical purposes is rooted in their lack of adherence to the scientific method. While a single traditional seminar-based game can be used to explore a problem space or understand the contexts in which players (including groups of luminaries) might use nuclear weapons or choose restraint, these games have inherent methodological flaws, and thus it is not appropriate to draw analytical conclusions from them. 

Another particularly problematic aspect of traditional wargames is that they are generally played only once. Because the games are not repeated, the lack of a control case or comparator precludes objective analysis of the particular variables of interest. For example, a comparison of conflict escalation in the synthetic environment with and without decision support algorithms is required to assess the escalation risks posed by introducing decision support algorithms into nuclear command-and-control systems. Put simply, without a control case, observers cannot be certain that the outcomes of their experiments—in this case, wargames—are the result of the subject matter under study. A study that relies on a collection of samples allows researchers to generalize conclusions about a population, which reduces error and increases the reliability of an analytical assessment. A single measurement does not yield sufficient statistics to assess behaviors that are inherently probabilistic in nature. And even those games that occur in a series—such as DEGRE—often integrate changes that preclude repeatability and prevent meaningful comparison across games. At best, one-shot wargames represent “a” potential future, not “the” or even “the most likely” future as might be determined from assessing a statistical collection of repeated wargame trials. 
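The statistical point above can be made concrete with a short sketch. The function below computes an approximate 95 percent Wilson score interval for the rate at which some outcome of interest (say, nuclear use) occurs across plays of the same game design. The function name and all tallies are hypothetical and chosen purely for illustration; they are not drawn from any real wargame.

```python
import math


def wilson_interval(events, games, z=1.96):
    """Approximate 95% Wilson score interval for an event rate.

    events: number of games in which the outcome of interest occurred
    games:  number of independent plays of the same game design
    """
    if games <= 0:
        raise ValueError("at least one game is required")
    p_hat = events / games
    denom = 1 + z**2 / games
    center = (p_hat + z**2 / (2 * games)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / games + z**2 / (4 * games**2)) / denom
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical tallies, for illustration only.
one_shot = wilson_interval(events=1, games=1)    # a single game ending in nuclear use
repeated = wilson_interval(events=12, games=40)  # forty plays of the same design
print("one-shot game:", one_shot)
print("repeated games:", repeated)
```

Under these assumptions, a single game that ends in nuclear use is consistent with an underlying rate anywhere from roughly 0.21 to 1.0, which is another way of saying that it describes “a” potential future rather than the most likely one. The interval narrows only as games are repeated under the same conditions.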

Traditional wargames also often rely on a white cell (a group of nonplayers tasked to move the game forward by ingesting orders from the players and creating the context for the next round of play). This introduces a bias because gameplay is driven not only by player actions but also by an external group ostensibly outside the synthetic environment. From the perspective of scientific inquiry, observers should be particularly concerned about the adjudication of gameplay by the game designers via the white cell, which could lead to the manipulation of game outcomes. In the worst case, this could influence game results toward outcomes preordained by a particular sponsor to justify the creation or maintenance of a particular program.

The Solution

Though challenging, these limitations should not lead wargamers to throw up their hands in resignation. Applying the scientific method to wargaming—specifically, through the execution of a controlled experiment—both rescues its analytical utility and provides a significant opportunity to answer the difficult questions detailed above that policymakers are most interested in. Our approach—described as “analytical” or “experimental” wargaming—takes concerns about generalizability seriously by enforcing a control-treatment design while still leveraging the benefits of wargaming as a tool for human-centered data collection.

Take the question of whether hypersonic weapons make the world less stable. Analysts seeking to test this hypothesis might design a wargame featuring a multilateral conflict scenario with two cases—one with hypersonic weapons available in the players’ arsenals (the treatment) and one without (the control case). Multiple games would then be executed for each case while maintaining a record of game state and play to develop a body of evidence on the variable of interest. These data would then be analyzed to examine, for example, the likelihood of nuclear first strike with and without hypersonic weapons available to the players. Such an approach, where the wargame design and execution are tailored for the hypothesis under study and games are repeated for control and treatment cases, allows for the application of the scientific method across a number of policy-relevant scenarios of interest. 
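As a rough illustration of the analysis step described above, the sketch below compares how often a nuclear first strike occurs in treatment games (hypersonic weapons available) and control games (not available) using a two-sided Fisher exact test. The choice of test, the function name, and every count are assumptions made for illustration; a real study would select its analysis plan in advance and would likely model additional factors.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows:    treatment games, control games
    Columns: games with a first strike, games without one
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x first strikes among treatment games,
        # holding the row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical game records, for illustration only.
treatment_strikes, treatment_games = 14, 40  # hypersonic weapons available
control_strikes, control_games = 6, 40       # hypersonic weapons not available

p = fisher_exact_two_sided(
    treatment_strikes,
    treatment_games - treatment_strikes,
    control_strikes,
    control_games - control_strikes,
)
rate_difference = treatment_strikes / treatment_games - control_strikes / control_games
print(f"difference in first-strike rate: {rate_difference:.2f}, p-value: {p:.3f}")
```

The comparison is only meaningful because both sets of games share the same scenario and rules and differ only in the treatment variable.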

Experimental wargames have various defining characteristics that set them apart from wargames designed for teaching or exploration. They are conducted using a control-treatment approach, where the game is run at least twice and all conditions within the two games or sets of games are held constant aside from the treatment variable (which is the phenomenon whose impact analysts are trying to assess) and the characteristics associated with each player. Experimental wargames are designed to give researchers control over the variables under examination and are instrumented for data collection. Experimental wargames must be repeatable to sample stochastic processes and reproducible so that different teams can conduct the same experiment for themselves to determine if they reach the same conclusion as other researchers (within statistical error, of course). Experimental wargame participants are also assigned to control or treatment groups randomly to avoid bias introduced by differing individual characteristics. As the field matures, analysts might also consider player characteristics as an independent variable impacting game outcomes—which is one of a variety of laboratory effects that the scholarly community is only beginning to identify and investigate.
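The random assignment step described above can be sketched in a few lines. The helper name, the participant identifiers, and the seed are hypothetical; fixing the seed is one possible way to let another team reproduce the same assignment, not a requirement of the method.

```python
import random


def assign_groups(participants, seed):
    """Randomly split participants into control and treatment groups.

    A dedicated random generator with a recorded seed makes the
    assignment reproducible without touching global random state.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}


# Hypothetical participant identifiers, for illustration only.
players = [f"player_{i:02d}" for i in range(1, 21)]
groups = assign_groups(players, seed=2023)
print(len(groups["control"]), "control players;", len(groups["treatment"]), "treatment players")
```

Because assignment depends only on the shuffle, individual characteristics such as rank, expertise, or risk tolerance are, on average, balanced across the two groups rather than systematically concentrated in one of them.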

The challenge, of course, is making sure that adherence to these principles does not detract from contextual realism and player agency in the synthetic environment, where players can make meaningful decisions under real-world dynamics and then benefit from, or suffer, the consequences.

A Nuclear Use Case

Our recent work examining the effects of low-yield nuclear capabilities on escalation and the likelihood of nuclear use serves as an example of taking experimental design principles seriously in the application of wargames for social science inquiry. In the process, we engaged with a theoretical debate between Vipin Narang and Austin Long about whether low-yield nuclear weapons are destabilizing. Currently, and fortunately, there is little to no empirical data against which to weigh the theories posited by the two scholars. In our view, experimental wargaming could play a role in filling this gap.

Alongside colleagues from the national laboratories in the Project on Nuclear Gaming, we built the SIGNAL wargaming platform—in table-top exercise, board, and online formats. Unlike other platforms, SIGNAL allows researchers to control for the introduction of specific military capabilities by manipulating the capability cards available to each player at the outset of the game. Using this platform, we eventually found some support for the theory that low-yield nuclear weapons increase the likelihood of nuclear use and act as a substitute for high-yield systems. 

During the game design process, we had to manage the exigencies of capturing the core dynamics of nuclear decision-making to provide contextual realism that might allow players to draw externally valid conclusions. We also had to create a synthetic environment that allowed for engaging play—creating the immersion and consequential decision-making that are foundational to wargaming methods. Finally, we had to ensure that the art of the game design was weighed against its analytical utility to answer the research question of interest. While these requirements limited some creativity, they significantly helped to enable data collection and answer our specific research questions related to nuclear use.

Perhaps as important as the findings are the game manual and the account of the game design decisions, which were published alongside our work in the hope that other scholars and analysts will examine the methods, replicate the study, or tweak the game design as they deem appropriate. Indeed, if researchers are to understand and interrogate the laboratory effects of using wargames for analysis and advance this approach as a means for drawing quantitative inference, sharing data and design information is essential.

Of course, SIGNAL was designed to answer a specific research question, but the broader lessons concerning the potential of integrating experimental methods and more traditional wargaming design principles have proved valuable and sparked considerable interest among academics looking for alternatives to survey and simulation approaches.

The Future of Wargaming for Analysis

There remains a lively debate as to whether wargaming represents an art or a science.

Our answer to this question is that it is, and should be, both. Game design requires artistry (recognizing that engaged play is intrinsic to the value of the medium), but if a game is going to be applied to a problem of national security or foreign policy, then science has to play a role. And taking on this task has never been more important. The United States faces a changing, complex, and challenging security environment, made more complicated by emerging technologies (such as artificial intelligence, quantum computing, human-machine interfaces, networked sensors, and space technologies) that are altering that environment in fundamentally new ways. There are limited to no data on the impact of any of these technologies on the warfighter and the national security environment. Wargaming done well has a crucial role to play in filling this gap and strengthening U.S. national security now and in the long run.

Bethany Goldblum is an associate professor in the Department of Nuclear Engineering and member of the Faculty Advisory Board at the Berkeley Risk and Security Lab at the University of California, Berkeley.
Andrew W. Reddie is an associate research professor at the University of California, Berkeley’s Goldman School of Public Policy and founder of the Berkeley Risk and Security Lab, where he examines how technology shapes international order, with a focus on nuclear weapons policy, cybersecurity, AI governance, and innovation.
