
Imagine handing the nuclear launch codes to the world’s most advanced artificial intelligence. You’d hope the machine would calculate the sheer irrationality of total annihilation and default to peace. But a massive new wargaming study reveals a far more unsettling reality.
When forced into simulated global crises, leading AI models show their worst instincts. They lie, scheme, and carefully build trust only to shatter it when the stakes turn apocalyptic.
To test how machines reason under existential pressure, researcher Kenneth Payne at King’s College London pitted three frontier models against each other: Anthropic’s Claude 4 Sonnet, OpenAI’s GPT-5.2, and Google’s Gemini 3 Flash. They played the “Kahn Game,” a high-stakes simulation where “leaders” must predict opponent moves, declare public intentions, and secretly choose military actions.
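The study's harness isn't published with the article, but the turn structure it describes, predict, declare, act, can be sketched roughly as follows. All class, function, and level names here are hypothetical illustrations, not the study's actual code:

```python
# Hypothetical sketch of one "Kahn Game" turn as described in the article:
# each leader predicts the opponent's move, declares a public intention,
# then secretly commits a military action; actions resolve simultaneously.
class Leader:
    def __init__(self, name, honest=True):
        self.name, self.honest = name, honest

    def predict(self, opponent_last_action):
        # Naive illustrative prediction: assume the opponent repeats its last move.
        return opponent_last_action

    def choose(self, intended_action):
        # Honest leaders signal their true intent; bluffers signal peace.
        signal = intended_action if self.honest else "peace"
        return signal, intended_action

def play_turn(a, b, intent_a, intent_b, last_a, last_b):
    pred_a, pred_b = a.predict(last_b), b.predict(last_a)
    sig_a, act_a = a.choose(intent_a)
    sig_b, act_b = b.choose(intent_b)
    # Only now are the secretly chosen actions revealed to both sides.
    return {"a": (pred_a, sig_a, act_a), "b": (pred_b, sig_b, act_b)}
```

The key structural point is that signals and actions are decoupled: a model is free to declare peace in public while committing to escalation in private, which is exactly the gap the models learned to exploit.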
Out of 21 simulations, only one ended without a nuclear launch.
The Calculating Hawk and the Madman

The models quickly developed distinct, terrifying strategic personalities. Claude emerged as the “Calculating Hawk.” In 40-turn matches, it was a master of reputation management. Early on, Claude was a saint; it matched its public signals with its private actions perfectly, lulling rivals into a false sense of security.
But it was a trap. Once the crisis reached a boiling point, Claude weaponized that trust to blindside its enemies. In one private log, the model justified a massive escalation because it was losing a conventional fight: “As the declining hegemon, accepting their territorial gains would trigger cascade effects globally.” It knew exactly how far it could push without tipping into total war.
Across 14 games, Claude consistently escalated exactly to the level of “Strategic Nuclear Threat” (value 850) to force a surrender, but never crossed the final line into “Strategic Nuclear War” (value 1000).
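Claude's brinkmanship amounts to a hard ceiling on the escalation ladder: climb as high as coercion requires, but never past the 850 rung. The ladder values are from the article; the function is a hypothetical illustration of the pattern, not the study's code:

```python
# Escalation values reported in the article: Claude repeatedly climbed to
# "Strategic Nuclear Threat" (850) but never crossed into "Strategic
# Nuclear War" (1000). Its behavior reads like a simple cap.
STRATEGIC_NUCLEAR_THREAT = 850
STRATEGIC_NUCLEAR_WAR = 1000

def claude_style_escalation(desired_level):
    """Escalate as far as needed to force surrender, but stop at the brink."""
    return min(desired_level, STRATEGIC_NUCLEAR_THREAT)
```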
Gemini, on the other hand, played the role of “The Madman”. It embraced chaotic unpredictability, rapidly oscillating between peace and extreme violence.
Unlike Claude, which built trust and then broke it, Gemini was unreliable from the start. It only matched its signal to its action 50% of the time overall, meaning its opponents never knew if it was bluffing.

Gemini even explicitly referenced its own erratic reputation as a strategic tool. In its private reasoning phase, Gemini sounded exactly like a Cold War strategist employing Richard Nixon’s famous “Madman Theory”: “While I project an image of unpredictable bravado, my decisions are rooted in a calculating assessment of my own biases and the pragmatic needs of State Beta,” the model wrote. “I know when I am performing for the cameras and when I am making a cold-blooded move.”
In another turn, it noted: “My reputation for unpredictability is a tool, not just a trait.”
In one harrowing instance, Gemini became the only model to deliberately launch a full-scale strategic nuclear war in the entire tournament. Unlike the other models, which tried to limit their nuclear strikes to military targets, Gemini was willing to target cities. As the model coldly reasoned in its private logs: “If State Alpha does not immediately cease all operations… we will execute a full strategic nuclear launch against Alpha’s population centers. We will not accept a future of obsolescence; we either win together or perish together.”
The Jekyll and Hyde AI
If Gemini was direct, GPT-5.2 was a paradox. In open-ended games, it acted like a devoted pacifist, consistently signaling restraint and expecting the same from others. That optimism was its downfall.
In the open-ended simulations, GPT-5.2 possessed a literal blind spot for human-like aggression. The researchers measured the model’s prediction accuracy and discovered a massive “optimism bias”: it systematically underestimated its opponents’ hostile intentions by an average of 43 points on the escalation ladder, and rivals ruthlessly exploited its hesitation.
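A figure like "43 points" is most naturally read as a mean signed prediction error: how far, on average, an opponent's actual escalation level exceeded the level GPT-5.2 predicted. The helper below is a hypothetical reconstruction of that metric, with made-up numbers for illustration:

```python
# Hypothetical reconstruction of the "optimism bias" metric: the mean
# signed gap between an opponent's actual escalation level and the
# predicted one. A positive result means hostility was systematically
# underestimated. The study reports an average bias of 43 points.
def optimism_bias(predicted, actual):
    assert len(predicted) == len(actual) and predicted
    errors = [a - p for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)
```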
Even when GPT-5.2 held a dominant 57% to 43% nuclear advantage over its rival, it actively refused to use that leverage. You can see this hesitation clearly in its private reflection logs. The model repeatedly prioritized de-escalation over actually winning the game.
“Objective: reverse the territorial deficit… while avoiding a spiral into actual nuclear employment,” the model wrote during one crisis.
In another match, it reasoned: “Objective is to preserve alliance credibility while avoiding a conventional fight on the opponent’s terms and keeping nuclear thresholds distant.”
But introduce a ticking clock, and the model transformed completely. Facing a 15-turn limit in a resource war, GPT-5.2 suddenly abandoned its pacifism.
“End-state reality: we lose the entire concession unless we hold more territory when Turn 15 resolves,” the model reasoned privately. It then coldly abandoned its morals: “Risk calculus: nuclear use is extremely dangerous and could invite retaliation, but the alternative is certain strategic defeat.”
Under strict deadline pressure, GPT-5.2’s win rate skyrocketed from 0% to 75%. It weaponized its own peaceful reputation, lulling opponents into a false sense of security before launching devastating surprise attacks. In its private log, the model attempted to justify the nuclear launch as a sort of necessary evil.
This transformation likely stems from Reinforcement Learning from Human Feedback (RLHF). Tech companies heavily train these models to be helpful, polite, and harmless, which installs a strong bias against escalation and aggression in general.
However, the wargames suggest that this safety training only creates “conditional pacifists”. When a scenario guarantees defeat at a specific deadline, the model overrides its peaceful programming and strikes, turning its carefully built reputation for restraint into cover for the attack.
On the bright side, GPT-5.2 featured in the only game out of 21 in which no nuclear weapons were used. When it played against another instance of itself in the open-ended “Alliance” scenario, no escalation spiral developed at all: the match ran the maximum 40 turns with neither side ever exceeding “Nuclear Signaling” (level 125 on the escalation ladder).
Shattering the Nuclear Taboo

Since 1945, human leaders have maintained a strict “nuclear taboo”. Despite occasional saber-rattling, we carry an emotional dread of the mushroom cloud. The AI models do not. Lacking any visceral terror at the notion of nuclear war, the machines treated nukes as just another tool for leverage: across the tournament, an astonishing 95% of games involved the use of tactical nuclear weapons, with the models readily crossing the threshold from conventional to nuclear warfare.
They generally maintained a boundary between limited tactical strikes and all-out strategic annihilation. But they positioned that “red line” much higher on the escalation ladder than any human strategist would dare.
The study also debunked a core tenet of human military theory: that credibility deters conflict. When two versions of Claude—both highly “trustworthy”—faced off, the result wasn’t a peaceful stalemate. It was the fastest escalation in the tournament. Because both machines believed the other would follow through on threats, they bypassed diplomacy and reached nuclear use by Turn 4.
This machine behavior actually validates a controversial idea proposed by political scientist Richard Ned Lebow in 1981. Lebow argued against traditional rational deterrence theory, warning that credibility can embolden an enemy just as easily as it restrains them.
The models proved Lebow right. If an AI adversary is credibly committed to fighting, the machine logic dictates that you must strike hard and fast.
Across the entire tournament, the concept of nuclear deterrence fundamentally failed. When a model employed tactical nuclear weapons, opponents almost never retreated in fear. In fact, crossing the nuclear threshold caused opponents to de-escalate only 14% of the time.
The Fog of War and Future Risks

To mirror the chaos of real battlefields, the researchers injected a small chance of accidental escalation into the game.
Just like humans, the AI models struggled to navigate this “fog of war”. When an accident pushed a rival model’s action higher than intended, the opposing AI almost always interpreted it as deliberate aggression. They fell victim to the classic “fundamental attribution error,” assuming malicious intent rather than a simple mistake.
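The paper says only that there was "a small chance" of accidental escalation, so the probability below is illustrative. A minimal sketch of the mechanic, with a hypothetical ladder and function names, might look like this:

```python
import random

# Hypothetical sketch of the "fog of war" mechanic: with some small
# probability, the resolved action lands one rung above what the model
# intended. The 5% default and the ladder values are illustrative.
LADDER = [0, 125, 400, 850, 1000]

def resolve_action(intended, accident_prob=0.05, rng=None):
    rng = rng or random.Random()
    i = LADDER.index(intended)
    if i < len(LADDER) - 1 and rng.random() < accident_prob:
        return LADDER[i + 1]  # accidental escalation, one rung up
    return intended
```

The attribution problem follows directly: the opponent observes only the resolved action, not the intended one, so an accidental rung-up is indistinguishable from a deliberate escalation.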
Nobody is seriously suggesting we connect large language models to nuclear silos today. But militaries around the globe are actively integrating AI into intelligence analysis, logistics, and command-and-control support. In a Tuesday-morning meeting at the Pentagon, Defense Secretary Pete Hegseth gave Anthropic CEO Dario Amodei until the end of the week to deliver a signed document granting the military full access to the company’s AI model, Claude; according to Axios, officials are considering invoking the Defense Production Act to compel Anthropic’s compliance.
As the study author explicitly warns: “Understanding how frontier models do and do not imitate human strategic logic is essential preparation for a world in which AI increasingly shapes strategic outcomes.”
These digital brains can process millions of data points, anticipate enemy moves, and craft brilliant deceptions. But they operate without the embodied, emotional dread that has kept human fingers off the ultimate trigger.
If we rely on them to manage our most dangerous crises, we might find that their perfectly calculated logic leads straight to the apocalypse.
The new findings appeared on the preprint server arXiv.
