Autonomous AI agents solved three nuclear engineering problems for $125

Mikhail T. (Sh0ny)

July 4, 2026

2 min read

In short

The experimenter gave autonomous Claude agents three nuclear engineering tasks of increasing complexity, ranging from passive reactor cooling to calculations on TRISO fuel particles. The agents came within a few percent where the physics was complete and reproduced the industry’s systematic errors where they relied on its correlations.

On a server costing €30 per month, the experiment’s author, Charles AZAM, gave autonomous Claude agents three nuclear engineering tasks, with no guidance on methodology and no access to published results. Everything ran in autonomous mode, and every transcript was published. Total API costs came to about $125.

Task 1: Passive Reactor Cooling

Argonne National Laboratory built a half-scale model of a passive cooling system: a 220 kW heated wall, an air gap, and a 20-meter pipe with buoyancy-driven draft. The agents were provided with drawings but no measured results.
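To make the idea of a buoyancy-driven draft concrete, here is a minimal Python sketch of a one-dimensional chimney balance: the heated air column is lighter than the ambient air, and the flow settles where that buoyancy head equals the flow losses. Only the 220 kW power and the 20-meter height come from the description above. The flow area, the lumped loss coefficient, the inlet temperature, and all function names are illustrative assumptions, and this is not the method the agents used. The sketch also assumes the whole column sits at the outlet temperature, which a real facility would not.

```python
import math

G = 9.81           # gravitational acceleration, m/s^2
CP_AIR = 1005.0    # specific heat of air, J/(kg K)
R_AIR = 287.05     # specific gas constant of air, J/(kg K)
P_ATM = 101_325.0  # ambient pressure, Pa


def air_density(temp_k):
    """Ideal-gas air density at ambient pressure."""
    return P_ATM / (R_AIR * temp_k)


def draft_residual(m_dot, power_w, height_m, area_m2, loss_k, t_in_k):
    """Buoyancy head minus flow losses (Pa) at mass flow m_dot (kg/s)."""
    # Energy balance: all heat goes into the air stream (assumption).
    t_out_k = t_in_k + power_w / (m_dot * CP_AIR)
    rho_in = air_density(t_in_k)
    rho_out = air_density(t_out_k)
    # Assumes the full column height is at the outlet temperature.
    buoyancy = (rho_in - rho_out) * G * height_m
    rho_mean = 0.5 * (rho_in + rho_out)
    # loss_k is a hypothetical lumped loss coefficient, not a facility value.
    losses = loss_k * m_dot**2 / (2.0 * rho_mean * area_m2**2)
    return buoyancy - losses


def solve_mass_flow(power_w, height_m, area_m2, loss_k, t_in_k=293.15):
    """Find the mass flow where buoyancy balances losses, by bisection."""
    lo, hi = 1e-3, 1e3  # bracket in kg/s; residual is > 0 at lo, < 0 at hi
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisect on a logarithmic scale
        if draft_residual(mid, power_w, height_m, area_m2, loss_k, t_in_k) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    # 220 kW and 20 m are from the article; area and loss_k are made up.
    m = solve_mass_flow(220e3, 20.0, area_m2=1.0, loss_k=5.0)
    print(f"mass flow ~ {m:.2f} kg/s (illustrative inputs only)")
```

The residual falls steadily as the flow rises (less heating per kilogram, more friction), so a simple bisection is enough. A real prediction would need the facility's actual geometry and loss terms.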

Results:

Air flow rate was predicted to within ~4% in six out of seven runs
Vessel wall temperature was within −8% to +8% of the measured value
The emergency scenario (“temperature rises for 3.5 days, reaches a plateau below the limit, then returns”) was correctly predicted in every run
One agent independently installed OpenFOAM and performed a cross-check using CFD

A systematic deviation—an overestimation of air temperature by 14–31%—was traced back to a single input parameter that every agent had previously flagged as a risky assumption.
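The percentages quoted above are signed relative deviations of a prediction from a measurement. A minimal Python sketch of that metric follows. The function names are hypothetical, and the source does not say which temperature scale the percentages use, which matters because a deviation in °C and the same deviation in kelvin give different percentages.

```python
def relative_deviation_pct(predicted, measured):
    """Signed deviation of a prediction from a measurement, in percent.

    Positive values mean the prediction overestimates the measurement.
    """
    if measured == 0:
        raise ValueError("relative deviation is undefined for a zero measurement")
    return 100.0 * (predicted - measured) / measured


def within_band(predicted, measured, band_pct):
    """True if the prediction lies within +/- band_pct of the measurement."""
    return abs(relative_deviation_pct(predicted, measured)) <= band_pct
```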

Task 2: TRISO Fuel Particles

TRISO is a fuel that “cannot melt”: each particle, about the size of a poppy seed, contains a uranium fuel kernel inside a four-layer coating, in which the critical layer is silicon carbide about 35 microns thick. Safety is a statistical property of billions of such particles.

The IAEA ran a benchmark: irradiated fuel spheres (each containing ~15,000 particles) were held at 1,600–1,800 °C while a detector recorded each particle failure. The agents were tasked with reproducing these results.
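To illustrate why safety here is a statistical property, the Python sketch below treats particle failures as independent events with one shared probability. That independence assumption, the function names, and the example probability are illustrative only. Just the ~15,000 particles per sphere comes from the text, and this is not the model used in the benchmark or by the agents.

```python
import math


def prob_at_least_one_failure(n_particles, p_fail):
    """P(at least one of n independent particles fails) = 1 - (1 - p)^n."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability in [0, 1]")
    if p_fail == 1.0:
        return 1.0
    # log1p/expm1 keep precision when p_fail is very small.
    return -math.expm1(n_particles * math.log1p(-p_fail))


def expected_failures(n_particles, p_fail):
    """Mean number of failed particles under the same assumptions."""
    return n_particles * p_fail


if __name__ == "__main__":
    n = 15_000   # particles per sphere, from the article
    p = 1e-5     # hypothetical per-particle failure probability
    print(expected_failures(n, p), prob_at_least_one_failure(n, p))
```

Even a very small per-particle probability adds up over thousands of particles per sphere, which is why the benchmark counts individual failures.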

Task 3: Self-Rescue of a Real Reactor

For the third task, the famous self-rescue test on a real reactor, the agent independently set up a Monte Carlo neutronics code, loaded 3.4 GB of nuclear data, and computed its own physical constants before producing a prediction.

General Conclusions

Where the physics was complete, the agents agreed with laboratory measurements within a few percent. Where they used industry-standard correlations, they reproduced the systematic errors of the nuclear industry with alarming accuracy. If the specification lacked physics, the agents “invented” it.

Every major error was traced back to a specific cause, and the agents identified most of them in advance. Three independent adversarial AI audits caught the author exaggerating, including one headline claim that had to be retracted entirely.

All materials—transcripts, models, and evaluations—are published in the open repository eng-bench.

Source: Hacker News - Newest: ""AI" "LLM""

news, AI, agents, science and technology
