How likely is a non-prosaic explanation for Roswell? (a 3-juror estimate)

A structured-judgment exercise: three independent analysts each estimated, on a 0–100 scale, the probability that the 1947 Roswell incident had a genuinely non-prosaic explanation (exotic or anomalous material, a crashed non-terrestrial craft, or recovered non-human bodies) rather than a mundane terrestrial object (a balloon plus radar target, whether from Project Mogul or any other classified or ordinary balloon program). Each reasoned independently, without internet access, from the same balanced briefing, which set out the strongest points on both sides. Filed 2026-06-01. Built on 2026-05-31-could-roswell-debris-be-project-mogul, roswell-incident-1947, roswell-witness-affidavits.

Result

Juror | P(non-prosaic)
1 | 4
2 | 4 (“just below 5”)
3 | 4
Synthesis | ~4–5 / 100 (≈95% prosaic)

The convergence was tight, and the three runs were independent: each reproduced both the number and the reasoning.

Why it lands low (the agreed factors)

  • The materials are period-sourced, not recalled from memory. Brazel’s contemporaneous July 1947 description (foil, tough paper, sticks, rubber, “tape with flowers printed upon it”) maps onto an ML-307 radar target; the flowered tape is “a near-fingerprint” for the toy-company construction of those targets.
  • A real, classified, local balloon program explains both the odd debris and the secrecy without invoking anything exotic. Even the pro-ET witnesses’ own affidavits (Schreiber: “pieces of a large balloon which had burst,” kite sticks, pastel tape, “just a bunch of garbage”) describe balloon-target components. Qualified officers identified the debris on sight, and the recovered quantity was trivial.
  • The exotic claims all sit in the weakest evidentiary tier: they are decades-later and escalating, and either anonymous (Stringfield) or single-witness and discredited (Dennis’s account of bodies), with no contemporaneous record and no preserved physical material.
  • Base rate. A recovered ET craft is an extraordinary claim with a low prior, and the prosaic account is sufficient on its own.
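The base-rate point can be made concrete with Bayes’ rule in odds form: posterior odds equal prior odds times the likelihood ratio (how much more probable the evidence is under one hypothesis than under the other). This is an illustrative sketch only; the prior and the likelihood ratios are hypothetical placeholders, not inputs any juror reported.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form.

    prior: prior probability of the hypothesis, strictly between 0 and 1.
    likelihood_ratio: P(evidence | hypothesis) / P(evidence | alternative).
    Returns the posterior probability of the hypothesis.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio must be non-negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# HYPOTHETICAL numbers for illustration only (not the jurors' inputs):
# a small prior for an extraordinary claim, updated by evidence that
# favours it by a few different factors.
PRIOR = 0.01
for lr in (1.0, 2.0, 5.0):
    print(f"likelihood ratio {lr:g}: posterior {posterior_probability(PRIOR, lr):.3f}")
```

When the prosaic account already explains the evidence, the likelihood ratio for the exotic hypothesis sits near 1 and the posterior stays close to the low prior; moving it substantially would take evidence far more probable under the exotic hypothesis than under the balloon account.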

Why it isn’t zero (the real holes — and their limit)

Several genuine weaknesses surfaced elsewhere in this base push the estimate off zero: the contested, self-contradictory Flight-4 identification; the documented cover story (DuBose); the anomalous-property affidavits Moore “had no explanation” for; and Marcel’s lifelong insistence that it was “not a weather balloon.” But all three jurors independently applied the key discipline: the Flight-4 and cover-story holes change which balloon it was and what the cover story concealed; they do not bridge to an exotic explanation. A botched “flying disc” press release plus a classified balloon program fully accounts for them.

The single shared uncertainty

The shared uncertainty is the anomalous-property testimony: foil that “sprang back into its original shape” without wrinkles (Tadolini), and material that “wouldn’t crush or burn” (Proctor). If those are accurate observations rather than decades-later embellishment, they are the one thread that resists a plain balloon reading. The jurors weighted them as memory drift (consistent with the rest of the base) and noted that, even if genuine, they point toward “unusual classified material,” not necessarily ET. The evidence that would move the estimate sharply upward is surviving, independently testable physical debris with properties no 1947 terrestrial program could produce; no such debris exists.

Honest caveat on the number

This is a calibrated judgment, not a measurement. It is conditioned on a briefing assembled here, which I worked to keep balanced (strongest points in both directions) but which inevitably embeds this base’s weighting. The load-bearing assumption is that the anomalous-property affidavits are decades-later recollection, not reliable observation; a reasoner who treats that testimony as sound would land materially higher. The ~4–5% reflects the base’s consistent treatment of late, anonymous, or unpreserved testimony as the weak tier. It is a statement about the weight of the available evidence, not a claim that the exotic possibility is refuted.
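The dependence on that load-bearing assumption can be written out with the law of total probability: the overall estimate is a weighted average of the estimate if the anomalous-property testimony is reliable and the estimate if it is memory drift. A minimal sketch, assuming a single binary assumption; the two conditional values are hypothetical placeholders, not numbers any juror gave.

```python
def overall_estimate(p_reliable: float, p_np_if_reliable: float,
                     p_np_if_drift: float) -> float:
    """Law of total probability over one binary assumption.

    p_reliable: probability that the anomalous-property testimony is accurate.
    p_np_if_reliable: P(non-prosaic) if that testimony is accurate.
    p_np_if_drift: P(non-prosaic) if it is decades-later embellishment.
    """
    for p in (p_reliable, p_np_if_reliable, p_np_if_drift):
        if not 0.0 <= p <= 1.0:
            raise ValueError("all inputs must be probabilities in [0, 1]")
    return p_reliable * p_np_if_reliable + (1.0 - p_reliable) * p_np_if_drift


# HYPOTHETICAL conditionals: even reliable testimony only partly points to
# ET (it could be unusual classified material), so it stays well below 1.0.
P_NP_IF_RELIABLE = 0.30
P_NP_IF_DRIFT = 0.02

for p_reliable in (0.05, 0.25, 0.50, 0.90):
    est = overall_estimate(p_reliable, P_NP_IF_RELIABLE, P_NP_IF_DRIFT)
    print(f"P(testimony reliable)={p_reliable:.2f} -> P(non-prosaic)={est:.3f}")
```

Because the reliable-testimony conditional exceeds the memory-drift one, the overall estimate rises linearly with the weight placed on the testimony, which is the sense in which a reasoner who treats it as sound lands materially higher.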

External cross-checks (other models / CLIs) — with provenance caveats

The same balanced briefing was re-run through several other models/CLIs to test robustness. Read the provenance column before trusting the convergence — most of these were instructed to reason from the briefing only (“Do not browse or use tools”), but only the codex runs are provenance-verified against a session log.

System / model | P(non-prosaic) | Provenance
Claude jurors ×3 (panel) | 4, 4, 4 | briefing-only by intent (subagents had file access; not sandbox-verified)
Cursor agent CLI | 4 | unverified (no isolated session log found)
agy CLI | 1.7 | unverified; ran with --dangerously-skip-permissions (could have read files)
Gemini 2.5 Flash | 15 | not a clean control: output shows “Falling back to GrepTool”; tool subsystem was live, cannot certify briefing-only
Gemini 3 Flash (preview) | 7 | same caveat as 2.5 Flash
codex (gpt-5.5) | 2.7 | VERIFIED briefing-only: session log has only user→reasoning→answer events; zero tool calls; read-only sandbox; 11.7k tokens
codex (gpt-5.5) | 6 | VERIFIED browsed: a reworded “read the related files first” prompt; 181k tokens; read 40+ repo files (incl. ones not suggested) + external critiques

What the cross-checks show. The median is ≈ 4 and the central cluster spans 1.7–7; every run independently reproduced the same upward and downward factors and the same single uncertainty (the anomalous-property testimony). The one high outlier (Gemini 2.5 Flash, 15) is also the weakest model and not a clean control. The convergence is therefore real but softer than a clean panel would give, since only the codex runs are provenance-verified.
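The summary statistics can be recomputed directly from the cross-check table with Python’s standard statistics module. Counting the three Claude jurors as separate runs is an assumption of this sketch, so it also checks the median with the panel counted once; the mean is shown only to illustrate how much the single outlier pulls an average that the median ignores.

```python
from statistics import mean, median

# Point estimates (percent) from the cross-check table. Treating the three
# Claude jurors as separate runs is an assumption of this sketch.
runs = {
    "Claude juror 1": 4.0,
    "Claude juror 2": 4.0,
    "Claude juror 3": 4.0,
    "Cursor agent CLI": 4.0,
    "agy CLI": 1.7,
    "Gemini 2.5 Flash": 15.0,
    "Gemini 3 Flash (preview)": 7.0,
    "codex (briefing-only)": 2.7,
    "codex (browsed)": 6.0,
}
OUTLIER = "Gemini 2.5 Flash"

values = sorted(runs.values())
print("median, all runs:", median(values))        # unaffected by the single 15
print("mean, all runs:", round(mean(values), 2))  # pulled upward by the 15

# Alternative counting: the Claude panel as a single run with value 4.
panel_once = [v for name, v in runs.items()
              if not name.startswith("Claude")] + [4.0]
print("median, panel counted once:", median(panel_once))

# Central cluster: every run except the flagged high outlier.
cluster = [v for name, v in runs.items() if name != OUTLIER]
print("central cluster:", min(cluster), "to", max(cluster))
```

Either counting convention gives a median of 4, which is why the headline figure does not depend on how the panel is weighted.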

The most informative single result is the codex pair. Told not to browse, it returned 2.7; told to read the actual base files (richer, and more open to exotic readings, than the distilled briefing: full affidavits, the cover story, the Flight-4 holes), it returned 6. Browsing moved the number up ~3 points but did not change the conclusion (~94% prosaic). That is empirical evidence that the briefing was not smuggling in a lower number than the underlying primary material supports; if anything, the fuller record nudges the estimate toward slightly more uncertainty, not less.