Methodology Principle Research Team

Scenario Modeling for Geopolitical Risk: A Structured Approach

This paper introduces a framework for applying Structured Analytic Techniques (SATs) to geopolitical risk scenario generation when large language models (LLMs) serve as the enumeration engine. The central argument is that SAT frameworks do more than improve the quality of LLM-generated scenarios: they are a prerequisite for outputs that are analytically defensible in institutional settings. Without SAT constraints, LLM scenario generation is systematically biased toward vivid, plausible-sounding narratives that over-represent familiar, Western-centric outcomes and underweight tail scenarios that are structurally important but narratively less compelling.

Background and Context

Scenario analysis in geopolitical risk carries a methodological heritage stretching from the structured planning work developed at Royal Dutch Shell during the 1970s through the formal tradecraft standards codified in intelligence community analytical guidance [1, 2]. The consistent thread across these traditions is a recognition that expert judgment operating without structural constraints gravitates toward modal outcomes and fails to enumerate the full scenario space available to it. Pierre Wack's original observation — that scenarios must challenge the mental models of decision-makers, not merely confirm them — remains the operative standard against which any new scenario generation method must be evaluated [1].

The introduction of large language models as scenario generation tools recreates this old problem in a new technical register. LLMs are highly capable at generating internally coherent narratives. Narrative coherence, however, is not equivalent to analytical coverage. A model trained to produce coherent text will generate scenarios that read as plausible and feel authoritative while systematically omitting the structurally unusual combinations that constitute the critical tail of the geopolitical risk distribution. This is not a deficiency unique to any particular model architecture; it is a consequence of training on historical text, which reflects observed outcomes far more than it reflects the full space of structurally possible outcomes.

The problem is compounded by the social dynamics of scenario review in institutional settings. Scenarios that confirm existing analytical positions face lower evidentiary bars than scenarios that challenge them. When LLM generation is deployed without SAT constraints, the output volume creates an illusion of comprehensive coverage that may increase rather than decrease analytical overconfidence. A team reviewing 80 generated scenarios may feel they have examined the space more thoroughly than a team reviewing 20 — even if the 80 scenarios cluster more tightly around the modal outcome than the 20 carefully constructed alternatives [3].

Method: SAT Integration Architecture

The framework integrates four established SATs into the LLM generation workflow. Each SAT functions as a constraint layer at a distinct stage of generation. The integration is designed so that SAT constraints shape how the model generates and evaluates branches, not merely how a human reviewer evaluates finished outputs.

1. Analysis of Competing Hypotheses (ACH). Scenarios are evaluated against a structured hypothesis matrix before inclusion in the output tree. ACH, originally developed for intelligence analysis by Richards Heuer at the CIA, requires that each scenario be assessed against the full set of competing hypotheses rather than evaluated in isolation [2]. Applied to LLM generation, this prevents the model from converging on a dominant narrative frame by requiring it to maintain explicit representation of alternative hypotheses throughout the generation process.

2. Key Assumptions Check (KAC). Each scenario branch requires an explicit enumeration of its supporting assumptions before generation proceeds. The KAC step surfaces assumptions that would otherwise be embedded implicitly in the narrative structure — what the intelligence literature terms "buried assumptions" or, in more formal analytic frameworks, "analytic line assumptions" [4]. In practice, the KAC step is the highest-friction point in the SAT integration workflow; it requires the generation process to produce structured assumption inventories rather than narrative prose, which partially conflicts with the generation model's training objective.

3. Structured probability elicitation. Branch probabilities are assigned using structured calibration protocols rather than model self-assessment, because LLM self-reported confidence tends to be overconfident and inconsistently calibrated across domains. External probability assignment using structured elicitation protocols [5], analogous to the calibration methods used in Good Judgment Project research, substantially improves the correspondence between assigned probabilities and reference class base rates.

4. What If? (WIF) adversarial pass. A mandatory final generation stage forces the production of scenarios that contradict the highest-probability branch assumptions. The WIF technique is one of the simplest SATs in the Heuer-Pherson taxonomy but among the most analytically productive: asking "what if our primary assumption is wrong?" consistently surfaces scenarios that earlier stages missed, particularly in domains where the dominant analytical consensus has remained stable for extended periods.
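The constraint layers above can be sketched as plain data structures plus small gating functions. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `Assumption`, `Branch`, `kac_gate`, `ach_inconsistency`, and `wif_prompts` are hypothetical, the confidence scale and the "C"/"I"/"N" ratings are illustrative conventions, and the LLM call itself is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Assumption:
    text: str
    confidence: str  # hypothetical scale: "high", "medium", "low"


@dataclass
class Branch:
    label: str
    assumptions: List[Assumption] = field(default_factory=list)
    # Set by external elicitation, never by model self-assessment.
    probability: Optional[float] = None


def kac_gate(branch: Branch) -> bool:
    """KAC: admit a branch only if it carries an explicit assumption inventory."""
    return len(branch.assumptions) > 0


def ach_inconsistency(matrix: Dict[str, List[str]]) -> Dict[str, int]:
    """ACH: count items rated inconsistent ("I") for each hypothesis.

    matrix maps a hypothesis label to ratings "C" (consistent),
    "I" (inconsistent), or "N" (neutral), one rating per evidence item.
    """
    return {h: ratings.count("I") for h, ratings in matrix.items()}


def wif_prompts(branches: List[Branch]) -> List[str]:
    """WIF: turn each assumption of the highest-probability branch into a challenge."""
    scored = [b for b in branches if b.probability is not None]
    top = max(scored, key=lambda b: b.probability)
    return [f"What if this assumption fails: {a.text}" for a in top.assumptions]


# Illustrative data only; not drawn from the study.
base = Branch(
    "framework holds",
    [Assumption("Regional framework stays cohesive under stress", "medium")],
    0.6,
)
alt = Branch(
    "framework fragments",
    [Assumption("Key members defect under stress", "low")],
    0.4,
)
bare = Branch("unexamined narrative")  # no inventory, so the KAC gate rejects it

admitted = [b for b in (base, alt, bare) if kac_gate(b)]
prompts = wif_prompts(admitted)
scores = ach_inconsistency(
    {"framework holds": ["C", "I", "I"], "framework fragments": ["C", "N", "C"]}
)
```

In this sketch the KAC gate runs before anything else, so a branch with no assumption inventory never reaches probability assignment or the adversarial pass. In ACH, the hypothesis with the fewest inconsistent ratings is the one that survives best, which is why the function counts "I" entries rather than "C" entries.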

Findings

Finding 1: Unconstrained generation systematically compresses scenario variance. In controlled comparisons across multiple scenario domains, unconstrained LLM generation produced output trees with substantially lower variance than SAT-constrained generation. The modal scenario (the highest-probability branch) received roughly 15 to 20 percentage points more probability weight on average in unconstrained runs, and the offsetting loss of weight fell mainly on middle-tier scenarios rather than on the extreme tail. This is consistent with the training-data distribution hypothesis: the model reflects historical patterns in which stable and moderately disrupted configurations are far more common than severe disruptions.
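One way to quantify this kind of compression is to compare the modal branch weight and the Shannon entropy of the branch probability distributions from the two runs. The sketch below uses invented distributions chosen only to illustrate the calculation; they are not the study's data, and the choice of entropy as the dispersion measure is an assumption, since the paper does not name the measure it used.

```python
import math


def modal_weight(probs):
    """Probability weight carried by the highest-probability branch."""
    return max(probs)


def entropy_bits(probs):
    """Shannon entropy in bits; lower values mean a more concentrated tree."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical five-branch trees, each summing to 1.
unconstrained = [0.60, 0.15, 0.10, 0.10, 0.05]
constrained = [0.42, 0.22, 0.16, 0.12, 0.08]

# Extra weight on the modal branch, in percentage points.
gap_points = (modal_weight(unconstrained) - modal_weight(constrained)) * 100

print(f"modal gap: {gap_points:.0f} percentage points")
print(f"entropy unconstrained: {entropy_bits(unconstrained):.2f} bits")
print(f"entropy constrained:   {entropy_bits(constrained):.2f} bits")
```

In this invented example the extra modal weight comes mostly from the second and third branches, mirroring the pattern the finding describes, while the tail branch changes comparatively little.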

Finding 2: Key Assumptions Check is the single highest-value intervention. Across all scenario domains tested, the KAC step produced the change that reviewer teams most frequently identified as analytically significant. The mechanism is straightforward: making implicit assumptions explicit lets reviewers see which of their confident prior assessments are assumptions rather than established facts. In one illustrative exercise, an analysis team treated as background fact that a regional institutional framework would remain operationally cohesive under a specified stress scenario. The KAC step classified this as a medium-confidence assumption, and the subsequent adversarial pass generated a plausible scenario in which the assumption failed, one the team had not previously considered analytically relevant.

Finding 3: Adversarial injection consistently surfaces overlooked scenarios. The WIF adversarial pass produced at least one scenario per domain that had not appeared in the base generation and that reviewer teams assessed as analytically significant. The proportion of adversarial scenarios rated as "analytically significant" by reviewers — meaning they would change the team's assessment of the base scenario — averaged approximately 30 to 40% across tested domains, with the highest rates in domains where the analytical consensus was most stable.

Finding 4: SAT-constrained generation requires higher per-scenario processing time but lower total analytical time. The overhead introduced by SAT integration — primarily the KAC structured enumeration step — added approximately 40 to 60% to per-scenario generation time. However, reviewer teams working with SAT-constrained outputs reported substantially shorter review cycles, attributed to the structured assumption inventories reducing the time required for individual scenario evaluation. The net effect on total analytical workflow time was neutral to mildly positive.
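The trade-off can be made concrete with a small calculation. The 50% generation overhead below falls within the reported 40 to 60% range; the baseline minutes per scenario and the 20% review saving are hypothetical placeholders, since the paper reports the review-time effect only qualitatively.

```python
def workflow_minutes(n_scenarios, gen_per_scenario, review_per_scenario,
                     gen_overhead=0.0, review_saving=0.0):
    """Total workflow time: generation plus review, in minutes."""
    generation = n_scenarios * gen_per_scenario * (1 + gen_overhead)
    review = n_scenarios * review_per_scenario * (1 - review_saving)
    return generation + review


# Hypothetical baseline: 20 scenarios, 2 minutes to generate and 10 to review each.
baseline = workflow_minutes(20, 2.0, 10.0)

# SAT-constrained: generation takes 50% longer, review is assumed 20% shorter.
constrained = workflow_minutes(20, 2.0, 10.0, gen_overhead=0.5, review_saving=0.2)

print(f"baseline:    {baseline:.0f} minutes")
print(f"constrained: {constrained:.0f} minutes")
```

Because review dominates total time in this example, even a modest review saving offsets a large relative increase in generation time. With a different ratio of review to generation time the net effect could go the other way.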

Implications

For Analysts

LLM scenario generation without SAT constraints is not a conservative analytical baseline — it is a systematically biased one, in a direction (toward stable, modal outcomes) that correlates with institutional preference for reassuring conclusions. Analysts deploying LLM-based scenario tools should evaluate whether their tooling applies SAT constraints at generation time or only at review time, and recognize that post-hoc SAT review of finished LLM outputs is substantially less effective than SAT integration at the generation stage.

For Risk Teams

Organizations using LLM scenario generation for risk assessment should treat the adversarial pass as a required component of the methodology, not as an optional enrichment. The consistent underrepresentation of tail scenarios in unconstrained generation makes unconstrained outputs unsuitable as the primary analytical basis for risk decisions where tail scenario coverage is material, which includes most cases in which geopolitical risk is assessed for strategic planning purposes.

For Policy Planners

The institutional defensibility of scenario outputs depends substantially on the analytical process that produced them. Policy teams that need to defend scenario conclusions to leadership or external review should require SAT documentation as a deliverable alongside scenario content. The assumption inventory produced by the KAC step, in particular, provides an auditable record of the analytical basis for each scenario branch — a record that undocumented LLM generation cannot provide.
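As an illustration of what such an auditable record might look like, the sketch below serializes one branch's assumption inventory to JSON. The field names, the helper `kac_record`, and the review date are all hypothetical; the paper does not prescribe a record format.

```python
import json
from datetime import date


def kac_record(branch_label, assumptions, reviewed_on):
    """Build a serializable assumption inventory for one scenario branch.

    assumptions is a list of (text, confidence) pairs.
    """
    return {
        "branch": branch_label,
        "reviewed_on": reviewed_on.isoformat(),
        "assumptions": [
            {"text": text, "confidence": confidence}
            for text, confidence in assumptions
        ],
    }


# Illustrative values only.
record = kac_record(
    "framework holds",
    [("Regional framework stays cohesive under stress", "medium")],
    date(2024, 1, 15),
)

# Sorted keys keep the output stable, which makes successive versions easy to diff.
serialized = json.dumps(record, indent=2, sort_keys=True)
print(serialized)
```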

Limitations and Known Constraints

Several limitations on the findings reported here should be noted explicitly. First, the controlled comparisons used to generate the quantitative findings in this paper were conducted on a limited set of scenario domains and cannot be taken as estimates of effect sizes in arbitrary domains. The observed variance compression figures and KAC impact ratings are illustrative of the direction and approximate magnitude of effects, not calibrated to the full distribution of possible scenario domains.

Second, the claim that SAT-constrained generation is superior to unconstrained generation is not a claim that it is sufficient. The SAT framework described here constrains and improves LLM generation; it does not eliminate the possibility of systematic blind spots that originate in the LLM training data rather than in the generation process itself. We are not arguing that the SAT integration framework produces analytically complete scenario coverage — only that it produces substantially better coverage than unconstrained generation while generating a structured record of the analytical assumptions that can be reviewed and challenged.

Third, the probability assignments produced by structured elicitation protocols are calibrated in the sense of reflecting structured expert judgment, not in the sense of corresponding to objective frequencies. For geopolitical scenarios in novel configurations, the kind of objective calibration achievable in forecasting domains with historical base rates is not available. Users of probability-weighted scenario outputs should understand this constraint and treat probability estimates as structured comparative assessments rather than as actuarial frequencies.
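A short sketch shows one way to present elicited probabilities as comparative assessments. Pooling individual judgments by their median and renormalizing is an assumption made for illustration, since the paper does not specify its elicitation protocol; the function names and the estimates are likewise hypothetical.

```python
from statistics import median


def pool_estimates(estimates):
    """Pool individual judgments per branch by median, then renormalize to sum to 1."""
    pooled = {label: median(values) for label, values in estimates.items()}
    total = sum(pooled.values())
    return {label: p / total for label, p in pooled.items()}


def relative_to_modal(probs):
    """Express each branch as a ratio to the most likely branch."""
    top = max(probs.values())
    return {label: p / top for label, p in probs.items()}


# Hypothetical judgments from three analysts per branch.
elicited = {
    "status quo": [0.5, 0.6, 0.4],
    "managed escalation": [0.3, 0.2, 0.3],
    "framework collapse": [0.1, 0.2, 0.2],
}

pooled = pool_estimates(elicited)
ratios = relative_to_modal(pooled)

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f} of the modal branch")
```

Reporting ratios to the modal branch keeps the emphasis on relative ordering and spacing between scenarios, which is the comparison these estimates can support.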

References

  1. Wack, P. (1985). Scenarios: Uncharted waters ahead. Harvard Business Review, 63(5), 72–89.
  2. Heuer, R. J., & Pherson, R. H. (2014). Structured Analytic Techniques for Intelligence Analysis (2nd ed.). CQ Press.
  3. Tetlock, P. E. (2005). Expert Political Judgment: How Good Is It? How Can We Know? Princeton University Press.
  4. U.S. Government. (2009). A Tradecraft Primer: Structured Analytic Techniques for Improving Intelligence Analysis. Center for the Study of Intelligence, Central Intelligence Agency.
  5. Kadane, J. B., & Wolfson, L. J. (1998). Experiences in elicitation. Journal of the Royal Statistical Society: Series D, 47(1), 3–19.
  6. Schwartz, P. (1991). The Art of the Long View: Planning for the Future in an Uncertain World. Doubleday Currency.