Methodology

How Principle generates, evaluates, and calibrates geopolitical scenarios

A structured account of the analytic and computational methods underlying Principle's simulation outputs — including acknowledged limitations.

Structured Analytic Techniques as Foundation

Principle's scenario generation methodology is grounded in Structured Analytic Techniques (SATs) — a family of methods developed within the intelligence analysis community to counter cognitive biases, manage analytic uncertainty, and produce defensible estimates. SATs were systematically codified in academic and practitioner literature [1] as a response to recognized failures in intelligence analysis where unstructured expert judgment produced poorly calibrated assessments.

The core SATs applied in Principle's workflow include:

  • Analysis of Competing Hypotheses (ACH): Scenarios are evaluated against a structured hypothesis matrix. Rather than generating scenarios that support a dominant narrative, ACH requires scenario branches to be tested against contradictory evidence and alternative framings.
  • Key Assumptions Check (KAC): Each scenario branch begins with an explicit enumeration of the assumptions required for that scenario to obtain. Assumptions are classified by confidence level and flagged as critical or peripheral to the scenario outcome.
  • Probabilistic Threat Assessment: Probability weights are assigned to scenario branches through structured elicitation techniques rather than by relying on model self-assessment alone. Where expert input is available, calibration procedures are applied to reduce overconfidence bias.
  • What If? Analysis: Scenarios are stress-tested by forcing contrary assumptions — asking what conditions would need to hold for a discounted scenario to become dominant. This process systematically surfaces blind spots in the base scenario tree.

SATs are not applied in Principle's workflow as ceremonial labels. Their function is to constrain scenario generation within analytic boundaries that a trained intelligence analyst would recognize as methodologically sound.
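As an illustration of the ACH step, the sketch below ranks hypotheses by how much evidence contradicts them, following the general ACH practice of focusing on inconsistency rather than support. The hypothesis labels, evidence items, and the C/I/N rating scheme are hypothetical examples and are not drawn from Principle's implementation.

```python
from collections import Counter

# Hypothetical ACH matrix: each evidence item is rated against each hypothesis
# as "C" (consistent), "I" (inconsistent), or "N" (neutral / not applicable).
ACH_MATRIX = {
    "E1: troop movements near border": {"H1": "C", "H2": "C", "H3": "I"},
    "E2: diplomatic channel reopened": {"H1": "I", "H2": "C", "H3": "C"},
    "E3: reserve call-up announced": {"H1": "C", "H2": "I", "H3": "I"},
}


def inconsistency_counts(matrix):
    """Count, per hypothesis, how many evidence items are rated inconsistent."""
    counts = Counter()
    for ratings in matrix.values():
        for hypothesis, rating in ratings.items():
            counts[hypothesis] += 1 if rating == "I" else 0
    return dict(counts)


def rank_hypotheses(matrix):
    """Order hypotheses from least to most contradicted (ties broken by name)."""
    counts = inconsistency_counts(matrix)
    return sorted(counts, key=lambda h: (counts[h], h))
```

In this toy matrix, H3 is contradicted by two evidence items and ranks last, while H1 and H2 remain tied and would both be carried forward as scenario branches.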

[1] Heuer, R.J. & Pherson, R.H. (2014). Structured Analytic Techniques for Intelligence Analysis (2nd ed.). CQ Press. See also NATO intelligence standards for SAT application in policy contexts.

Role of Language Models in Scenario Generation

Large language models (LLMs) are applied in Principle's workflow as structured generators, not as oracles. The role of the LLM is to enumerate scenario branches given a constrained parameter set, not to independently assess geopolitical situations.

This distinction is operationally significant. Unconstrained LLM generation of geopolitical scenarios produces outputs that are often internally plausible but collectively unweighted — the model cannot reliably distinguish a highly plausible scenario from a low-probability but vivid narrative. This is a known failure mode in LLM-based analysis and is explicitly addressed in Principle's architecture.

The LLM operates within a structured generation scaffold that:

  • Receives scenario parameters as structured input (actor profiles, variable states, timeline anchors)
  • Is prompted to generate branches subject to SAT-derived constraints, not free-form narrative
  • Has outputs evaluated by a validation layer that checks internal logical consistency and assumption coverage
  • Does not determine probability weights — those are assigned by the calibration layer described below

The LLM is therefore a structured enumeration engine, not a judgment engine. The analytic judgment functions — weighting, assumption validation, red-teaming — are conducted by separate process layers, some of which incorporate human analyst input at configurable points in the workflow.
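The separation of enumeration from judgment can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the class names, field names, JSON layout, and the `probability` key are all hypothetical, and the actual scaffold may look quite different. The point it shows is that structured parameters go in, branches come out, and any probability the model volunteers is discarded before downstream layers see it.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ScenarioParameters:
    actor_profiles: dict    # e.g. {"State A": {"posture": "defensive"}}
    variable_states: dict   # e.g. {"sanctions": "active"}
    timeline_anchors: list  # e.g. ["summit"]


@dataclass
class ScenarioBranch:
    branch_id: str
    narrative: str
    assumption_ids: list
    # Deliberately no probability field: weights come from a separate layer.


def build_generation_request(params, constraints):
    """Serialize structured inputs and SAT-derived constraints for the generator."""
    payload = {"parameters": asdict(params), "constraints": constraints}
    return json.dumps(payload, sort_keys=True)


def parse_branches(raw_output):
    """Parse generator output, dropping any probability the model supplies."""
    branches = []
    for item in json.loads(raw_output):
        item.pop("probability", None)  # hypothetical key; weights are set elsewhere
        branches.append(
            ScenarioBranch(
                branch_id=item["branch_id"],
                narrative=item["narrative"],
                assumption_ids=list(item["assumption_ids"]),
            )
        )
    return branches
```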

Probability Calibration and Uncertainty Representation

Probability weights assigned to scenario branches are derived from a structured calibration process. The objective is not precision — geopolitical forecasting does not support precision at the individual event level — but tractability and defensibility: outputs should represent a calibrated epistemic state that can be communicated to and reviewed by decision-makers.

The calibration approach draws on established frameworks from judgment and decision-making research [2] and from forecasting practice [3]:

  • Reference class anchoring: Base rates from comparable historical scenarios are used as starting probabilities before adjustment for specific contextual factors.
  • Decomposition: Complex scenario probabilities are decomposed into component conditional probabilities where decomposition reduces rather than amplifies estimation error.
  • Superforecasting-style elicitation protocols: Where human analyst input is incorporated, structured protocols are used to reduce scope insensitivity and overconfidence.
  • Interval representation: Outputs report probability intervals (e.g., 0.25–0.40) rather than point estimates where interval representation more accurately conveys calibrated uncertainty.
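A small worked sketch of anchoring, decomposition, and interval output follows. The log-odds shift is one common way to adjust a base rate and is an assumption here, not a method the list above specifies; the function names and example numbers are likewise hypothetical.

```python
import math


def adjust_base_rate(base_rate, log_odds_shift):
    """Shift a reference-class base rate in log-odds space.

    Working in log-odds keeps the adjusted value strictly inside (0, 1).
    """
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must be strictly between 0 and 1")
    logit = math.log(base_rate / (1.0 - base_rate)) + log_odds_shift
    return 1.0 / (1.0 + math.exp(-logit))


def chain_interval(conditional_intervals):
    """Combine (low, high) bounds of sequential conditional probabilities.

    For P(A) * P(B | A) * ..., multiplying the lower bounds gives the lower
    bound of the product and multiplying the upper bounds gives the upper bound.
    """
    low, high = 1.0, 1.0
    for lo, hi in conditional_intervals:
        if not 0.0 <= lo <= hi <= 1.0:
            raise ValueError("each interval must satisfy 0 <= low <= high <= 1")
        low *= lo
        high *= hi
    return low, high


# Example: P(crisis) assessed as 0.5-0.8, P(escalation | crisis) fixed at 0.5.
low, high = chain_interval([(0.5, 0.8), (0.5, 0.5)])
print(f"{low:.2f}-{high:.2f}")  # prints 0.25-0.40
```

Note that every additional factor can only widen the relative spread of the interval, which is one reason decomposition is worthwhile only when the components are easier to estimate than the whole.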

Probability outputs are designed to map onto standardized estimative language ("likely", "highly likely", "remote") as used in analytical communication frameworks, including U.S. Intelligence Community Directive 203 (ICD 203) and NATO intelligence standards.
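To make the mapping concrete, the sketch below converts a probability or a probability interval into the likelihood expressions listed in ICD 203. The band boundaries follow the directive's published table; the treatment of shared endpoints and of intervals that span several bands is an assumption of this sketch, and NATO terminology (which uses a different scale) is not modeled.

```python
# Probability bands as listed in ICD 203 (percent ranges converted to fractions).
# ICD 203 also lists "remote" as an alternative term for the lowest band.
# Adjacent bands share endpoints in the directive; assigning a shared endpoint
# to the higher band is an assumption of this sketch.
ICD203_BANDS = [
    (0.01, 0.05, "almost no chance"),
    (0.05, 0.20, "very unlikely"),
    (0.20, 0.45, "unlikely"),
    (0.45, 0.55, "roughly even chance"),
    (0.55, 0.80, "likely"),
    (0.80, 0.95, "very likely"),
    (0.95, 0.99, "almost certain"),
]


def expression_for(p):
    """Return the ICD 203 expression whose band contains probability p."""
    if not 0.01 <= p <= 0.99:
        raise ValueError("ICD 203 bands cover 0.01 to 0.99 only")
    label = ICD203_BANDS[0][2]
    for low, _high, name in ICD203_BANDS:
        if p >= low:
            label = name  # keep the highest band whose lower bound p reaches
    return label


def expressions_for_interval(low, high):
    """Return every expression spanned by the interval, in ascending order."""
    names = [name for _, _, name in ICD203_BANDS]
    first = names.index(expression_for(low))
    last = names.index(expression_for(high))
    return names[first:last + 1]
```

Under these bands an interval of 0.25 to 0.40 falls entirely within "unlikely", whereas 0.40 to 0.60 spans three expressions, which signals that the interval is too wide to summarize with a single term.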

[2] Tversky, A. & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131.
[3] Tetlock, P.E. & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown.

Validation and Red-Teaming Process

Scenario outputs are subject to a structured validation and red-teaming process before delivery. The purpose of this process is to ensure that the scenario set delivered to the client is not merely internally coherent but also appropriately challenged — that dominant scenarios have been stress-tested against adversarial assumptions.

The validation process operates in two modes:

  • Internal validation: Automated SAT consistency checks confirm that each scenario branch satisfies the assumption requirements declared in the Key Assumptions Check. Branches failing consistency checks are flagged for analyst review rather than automatically excluded, since failing a consistency check may indicate a gap in declared assumptions rather than an implausible scenario.
  • Adversarial scenario injection: A separate generation pass deliberately constructs scenarios that challenge the assumptions underlying the highest-probability branches identified in the base run. This red-team pass is designed to surface analytical blind spots — scenarios that are plausible but would be missed by an analysis that optimized for finding the modal outcome.

Red-team scenarios are provided to clients alongside base scenarios, clearly labeled as adversarial challenges to the primary scenario tree. The combined output — base tree plus red-team challenges — gives decision-makers both the most likely scenario space and a structured view of where the primary analysis may be most vulnerable.
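The two validation modes can be sketched as follows. This is a simplified illustration, not Principle's implementation: the class and function names are hypothetical, each branch carries a single weight instead of an interval, and "consistency" is reduced to checking that a branch relies only on declared assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Assumption:
    assumption_id: str
    statement: str
    critical: bool  # critical vs. peripheral, as in the Key Assumptions Check


@dataclass
class Branch:
    branch_id: str
    weight: float  # simplified: a point weight from the calibration layer
    assumption_ids: list
    flags: list = field(default_factory=list)


def flag_inconsistent(branches, declared):
    """Flag, but never drop, branches that rely on undeclared assumptions."""
    for branch in branches:
        missing = [a for a in branch.assumption_ids if a not in declared]
        if missing:
            branch.flags.append("analyst review: undeclared " + ", ".join(missing))
    return branches


def red_team_targets(branches, declared, top_n=2):
    """Collect critical assumptions behind the top_n highest-weight branches.

    These are the assumptions an adversarial generation pass would negate.
    """
    top = sorted(branches, key=lambda b: b.weight, reverse=True)[:top_n]
    targets = []
    for branch in top:
        for a_id in branch.assumption_ids:
            assumption = declared.get(a_id)
            if assumption is not None and assumption.critical and a_id not in targets:
                targets.append(a_id)
    return targets
```

Keeping flagged branches in the set mirrors the rationale given above: a failed check may point to a gap in the declared assumptions, so the decision to exclude is left to an analyst.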

Acknowledged Limitations

Analytical integrity requires explicit acknowledgment of the limitations inherent in any scenario simulation methodology. The following limitations are recognized and should inform how outputs are interpreted and applied.

  • LLM training data boundaries: Language models have knowledge cutoffs and may under-represent recent developments or low-coverage geographic regions in their base training. This creates asymmetric scenario coverage where well-documented contexts receive more scenario variety than less-documented ones.
  • Calibration is not prediction: Calibrated probability outputs represent structured epistemic uncertainty — they do not constitute predictions. A scenario assigned a 0.30 probability is not predicted to occur; it is assessed as occupying approximately that portion of the plausible future space given the available evidence and declared assumptions.
  • Garbage in, garbage out: Scenario quality is bounded by input quality. Poor-quality actor profiles, outdated intelligence, or biased framing of the scenario domain will produce correspondingly degraded outputs, regardless of the methodological rigor applied downstream.
  • SAT application is not a substitute for domain expertise: Structured methods constrain the scenario generation process, but they do not replace geopolitical domain expertise. Outputs from Principle are designed to augment, not replace, experienced analyst judgment. Final analytic conclusions remain the responsibility of the analyst teams using the platform.
  • Model behavior is not fully transparent: Despite the structured generation scaffold, LLM-generated scenario branches may embed implicit assumptions or cultural frames that are not captured in the explicit assumption inventory. Analyst review of outputs remains essential for identifying such embedded assumptions.

These limitations are communicated in every deliverable produced by Principle. The expectation is that clients use scenario outputs as a structured input to their own analytical process — not as a substitute for it.

Detailed methodology notes and case studies are published in the Principle research archive.