Conventions & Methodology

How this knowledge base is structured, sourced, and how credibility is rated. This page is the reference for anyone reading the repo or contributing to it. If you only read one thing, read The credibility framework.

The governing principle of the whole project: separate verified facts from unverified claims, and make the basis for every credibility judgment explicit. Testimony is recorded as testimony, primary documents as primary documents, and the gap between “someone said this” and “this is established” is never silently collapsed.


Directory structure

| Directory | What lives here | Analytical level |
| --- | --- | --- |
| sources/ | Source-of-record files: one analytical writeup per person, program, document, event, or report. The curated, interpreted layer. | High (synthesis + judgment) |
| topics/ | Cross-cutting analyses that connect multiple sources: patterns, debates, timelines, the credibility framework itself. | High (synthesis across sources) |
| raw/ | Primary material, captured verbatim. Subdivided by medium (below). The evidentiary substrate that sources/ and topics/ cite. | Low (capture + light framing) |
| queries/ | Dated question-answer notes: a specific question worked through against the sources at a point in time (e.g. “how good was the AARO report?”). Named YYYY-MM-DD-slug.md. | Medium (focused reasoning) |
| scratch/ | Working notes, audits, in-progress lists not meant for the published site. | n/a |
| scripts/ | Tooling: Reddit triage (triage.py) and the build-time content checks (check-person-ratings.mjs, check-links.mjs). | n/a |

raw/ subdirectories

| Subdir | Contents |
| --- | --- |
| raw/articles/ | News articles, blog posts, Wikipedia captures, web pages (extracted to markdown) |
| raw/transcripts/ | Podcast / YouTube / video transcripts (with timestamps where available) |
| raw/reports/ | Government reports, FOIA productions, hearing transcripts, declassified PDFs (+ extracted text) |
| raw/reddit/ | Reddit post + comment captures (JSON + analytical markdown), plus the triage DB |
| raw/papers/ | Academic / self-published papers (PDF + extracted text) |
| raw/extracts/ | Standalone primary extracts saved with the extract tools |
| raw/data/ | Datasets (CSV, structured data) |
| raw/media/ | Images and other media |

The three-layer model

The repo distinguishes three altitudes, and the distinction is load-bearing:

  1. raw/ — what was said/written. Verbatim capture. A transcript, an article, a PDF’s text. No judgment beyond a header documenting provenance. When you source something new, the raw extract is saved here even if you also write it up in sources/ — the raw layer is not optional.

  2. sources/ — what a specific person/program/document is and how much weight it carries. One file per entity. Synthesizes the raw material about that entity and renders an explicit credibility judgment.

  3. topics/ — what the sources mean together. Patterns across entities: the credibility framework, the 2017 watershed, the amnesty debate, the contactee tradition, etc.

A claim should be traceable downward: a topics/ assertion cites sources/ files, which cite raw/ primaries. Don’t let a sources/ or topics/ claim float without a raw/ anchor.


File header convention

Every sources/ file opens with a tags: frontmatter block (for entity-kind identification — see below) followed by a metadata block:

---
tags: [person]
---

# <Title>

- Type: <testimony | report | article | named-figure source-of-record | ...>
- Author / Subject: <who>
- Date: <ISO dates; incident vs. publication distinguished>
- Credibility: <rating — see framework below>
- <primary URLs, archive links, related wikilinks>

raw/ files open with a lighter header documenting provenance: source URL, date, author/outlet, extraction method (e.g. pymupdf4llm, Gemini CLI OCR, requests+readability), date sourced, and [[wikilinks]] to the analytical files that cite it.

Entity-kind tags

Every sources/ file carries a tags: frontmatter line classifying what kind of entity it documents. Quartz auto-generates a browsable index page per tag (e.g. /tags/person lists every person). This is the canonical way to identify people (and every other kind) — both by browsing the live site and by grep "^tags:" sources/*.md.

Controlled vocabulary (one or more per file):

| Tag | Use for | Count |
| --- | --- | --- |
| person | An individual whose claims/credibility are the subject | 18 |
| report | Government/official reports | 3 |
| document | Articles, compilations, leaked documents, papers | 3 |
| media | Films, video evidence, fiction | 3 |
| case | A specific sighting/incident | 3 |
| organization | An entity/archive/group | 2 |
| law | Legislation | 2 |
| event | A hearing or discrete happening | 2 |
| program | A government program | 1 |
A file may carry more than one when it genuinely spans kinds (e.g. [person, case] for Fravor/Nimitz, [person, organization] for Graves/ASA). When adding a new source, tag it before anything else — it is the entity’s primary classification.
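
The grep one-liner can be extended into a small census that also flags files with no tags or with tags outside the controlled vocabulary. This is a minimal sketch, not part of the repo's tooling: it assumes the inline-list form tags: [a, b] shown in the header convention, and the function names parse_tags and tag_census are hypothetical.

```python
import re
from collections import Counter

# Controlled vocabulary, copied from the table above.
VOCAB = {"person", "report", "document", "media", "case",
         "organization", "law", "event", "program"}

# Assumption: tags are written as an inline list, e.g. `tags: [person, case]`.
TAGS_LINE = re.compile(r"^tags:\s*\[(.*?)\]\s*$", re.MULTILINE)


def parse_tags(text):
    """Return the tags from the first inline `tags: [...]` line, or [] if absent."""
    match = TAGS_LINE.search(text)
    if not match:
        return []
    return [tag.strip() for tag in match.group(1).split(",") if tag.strip()]


def tag_census(files):
    """files maps filename -> file text. Returns (per-tag counts, list of problems)."""
    counts = Counter()
    problems = []
    for name, text in files.items():
        tags = parse_tags(text)
        if not tags:
            problems.append((name, "no tags"))
        for tag in tags:
            if tag in VOCAB:
                counts[tag] += 1
            else:
                problems.append((name, f"unknown tag: {tag}"))
    return counts, problems
```

To run it against the repo, build the mapping from pathlib.Path("sources").glob("*.md") with read_text(). A file carrying two tags is counted once under each.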


The credibility framework

This is the spine of the project. The full role-grouped roster lives in community-credibility-assessment; the conventions for applying it are here.

The scale

Ratings are ~0–100, expressed with a tilde (~35) to signal they are judgments, not measurements. Rough bands:

| Band | Meaning | Example |
| --- | --- | --- |
| ~70–85 | Credentialed insiders / operators making narrow, testable, institutionally costly claims | Gallaudet (~75), Mellon (~72) |
| ~50–70 | Real credentials or real access, but advocacy posture or unverified specifics | Grusch (~50), Coulthart (~45) |
| ~30–50 | Mixed: real background, but pattern of low-evidence or escalating claims | Elizondo (~35), Davis (~30) |
| ~10–30 | Discredited or fabulist, or claims with no falsification mechanism | Doty (~25), Greer (~10) |
| ~0–10 | Fabricated biography / fantasist | Schneider (~5) |
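
For tooling that needs to read the tilde notation, a lookup along these lines may help. It is a sketch under stated assumptions: the bands share their boundary values (30, 50, 70), and this version assigns a boundary value to the higher band, which is a choice made here, not a rule from the framework. The table defines no band above 85, so the lookup returns None there. The names parse_rating and band_for are hypothetical.

```python
import re

# (low, high) bounds from the table above, highest band first.
BANDS = [(70, 85), (50, 70), (30, 50), (10, 30), (0, 10)]


def parse_rating(text):
    """Parse a tilde-prefixed rating such as '~35' into an int in 0..100."""
    match = re.fullmatch(r"~(\d{1,3})", text.strip())
    if not match:
        raise ValueError(f"not a tilde rating: {text!r}")
    value = int(match.group(1))
    if value > 100:
        raise ValueError(f"rating out of range: {text!r}")
    return value


def band_for(value):
    """Return the (low, high) band containing value, or None above the top band.

    Assumption: because BANDS is scanned highest-first with inclusive bounds,
    a shared boundary value (e.g. 30) lands in the higher band.
    """
    for low, high in BANDS:
        if low <= value <= high:
            return (low, high)
    return None
```

For example, band_for(parse_rating("~75")) gives (70, 85) and band_for(parse_rating("~5")) gives (0, 10). The bands are described as rough, so treat the result as a label, not a threshold.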

The core principle

People making the narrowest, most testable, most institutionally costly claims are the most credible. People making the broadest, most narrative-shaped, most career-aligned claims are the least.

A claim’s evidentiary weight scales with the claimant’s willingness to substantiate it. Withholding (“I know things I can’t share”) is a credibility-deferring move, not a credibility-enhancing one.

Where the rating lives

The numerical rating lives in the entity’s own sources/ file (in the Credibility: field of the header metadata block and/or a ## Credibility assessment section), and every person-tagged source page must also appear in the community-credibility-assessment roster, both as a roster entry and in the at-a-glance index. The roster is the full set of rated people, not a curated subset.

This invariant is enforced at build time. scripts/check-person-ratings.mjs (wired into npm run build, deploy, and check via check:content) scans every person-tagged file under content/sources/ and fails the build if any is not wikilinked from the roster. So a newly-added person page that hasn’t been rated-and-rostered will block deployment. (Roster entries that have no source page — e.g. Lacatski, Doty, politicians — are fine; the check only runs source-page → roster, never the reverse.)
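
The invariant itself is simple enough to state in a few lines. The sketch below is a Python rendering of the rule as described above, not the code of scripts/check-person-ratings.mjs, whose implementation is not shown on this page. It assumes inline-list tags and that roster links may carry a path prefix, a #heading, or a |display part; all function names are hypothetical.

```python
import re

# Assumption: tags are written as an inline list, e.g. `tags: [person, case]`.
TAGS_LINE = re.compile(r"^tags:\s*\[(.*?)\]\s*$", re.MULTILINE)
# Captures the target of [[target]], [[target|text]] or [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def is_person(text):
    """True if the file's tags line includes `person`."""
    match = TAGS_LINE.search(text)
    if not match:
        return False
    return "person" in [tag.strip() for tag in match.group(1).split(",")]


def linked_basenames(roster_text):
    """Basenames of every wikilink target found in the roster."""
    return {target.strip().split("/")[-1] for target in WIKILINK.findall(roster_text)}


def unrostered_people(sources, roster_text):
    """sources maps basename (no .md) -> file text.

    Returns the person-tagged basenames that the roster does not link to.
    The check is one-directional: roster links with no source page are ignored.
    """
    linked = linked_basenames(roster_text)
    return sorted(name for name, text in sources.items()
                  if is_person(text) and name not in linked)
```

A build gate would exit non-zero whenever unrostered_people returns a non-empty list.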

Build-time content gates

Two checks run before every build/deploy (and in check), via npm run check:content:

  • check:roster — the credibility-roster invariant above.
  • check:links (scripts/check-links.mjs) — fails the build on any broken internal wikilink. It resolves [[target]] by basename against all of content/, and excludes: links inside code spans (illustrative examples), [[x]](url) markdown-link artifacts, Quartz-generated tags/* pages, and an explicit ALLOW list of intentional forward-links (pages deliberately not-yet-created — currently odni-annual-uap-report-2023 and blackvault-wipe-2026-02-23). To add a deliberate forward-link, add its basename to ALLOW; otherwise fix the link. This is what catches orphaned links like the ones left over from the infobase→ufopedia split.
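
The exclusion rules for check:links can be illustrated as follows. This is a Python sketch of the behaviour described in the bullet above, not the code of scripts/check-links.mjs; the regular expressions and the function name broken_links are assumptions made for illustration. The two ALLOW entries are the ones named above.

```python
import re

# Intentional forward-links, as listed above.
ALLOW = {"odni-annual-uap-report-2023", "blackvault-wipe-2026-02-23"}

# Fenced blocks and inline code spans; links inside them are illustrative.
CODE = re.compile(r"```.*?```|`[^`\n]*`", re.DOTALL)
# Group 1 is the link target; group 2 is a "(" directly after "]]",
# which marks a [[x]](url) markdown-link artifact.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\](\()?")


def broken_links(text, known_basenames):
    """Return the sorted wikilink targets in text that resolve to no known page."""
    prose = CODE.sub("", text)
    broken = set()
    for match in WIKILINK.finditer(prose):
        target = match.group(1).strip()
        if match.group(2):              # [[x]](url) artifact
            continue
        if target.startswith("tags/"):  # Quartz-generated tag pages
            continue
        basename = target.split("/")[-1]
        if basename in ALLOW or basename in known_basenames:
            continue
        broken.add(target)
    return sorted(broken)
```

Here known_basenames would be the set of file stems under content/. Resolving by basename means [[sources/foo]] and [[foo]] are treated as the same target.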

The ## Credibility assessment section format

Match this structure (see davis-career-and-claims and buchanan-stargate-career-and-claims for worked examples):

  1. What raises X’s credibility — numbered list
  2. What lowers X’s credibility — numbered list
  3. Net assessment — the numerical rating + one-paragraph justification
  4. Position relative to other UAP figures — above/below comparisons to anchor the number
  5. Role-category placement — which category from community-credibility-assessment applies

Bimodal / component ratings

When credibility varies dramatically by claim, a single number is misleading — break it down by component. This is the strongly preferred treatment for split-track-record figures.

The composite number is a convenience; the component breakdown is the honest representation.

Ratings move

Ratings carry their history when they change: ~42, down from ~48, originally ~55 (Kirkpatrick). Record the direction and the trigger for the update.
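
If the history needs to be read by a script, the notation is regular enough to parse. The sketch below assumes the field is written newest-first, as in the Kirkpatrick example; it recovers only the numbers, so the trigger for each change still has to be recorded in prose. The function names are hypothetical.

```python
import re


def rating_history(field):
    """Return ratings oldest-first from a field written newest-first,
    e.g. '~42, down from ~48, originally ~55'."""
    values = [int(value) for value in re.findall(r"~(\d{1,3})", field)]
    return list(reversed(values))


def net_change(field):
    """Current rating minus original rating (negative means it has fallen)."""
    history = rating_history(field)
    if not history:
        return 0
    return history[-1] - history[0]
```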


Sourcing workflow

  1. Resolve the primary. For Reddit share links, resolve to the canonical post; for videos, get channel + title + date; for paywalled/blocked pages, use the extract fallback chain (requests+readability → Playwright Firefox → Gemini CLI OCR for image-scanned PDFs → manual paste).
  2. Save the raw extract into the appropriate raw/ subdir, with a provenance header. Do this even when you also write an analytical file — the rule is that the raw version is saved with the extract tools, not just embedded inside the writeup.
  3. Write or update the analytical file in sources/ (entity) or augment a topics/ file (pattern).
  4. Cross-link both directions — the raw/ file links up to its analytical writeup; the analytical file links down to the primary.
  5. Flag followup items — list under-sourced threads explicitly (a ## Followup items section) rather than silently dropping them.
  6. Pull cited primaries. If a source references peer-reviewed work, pull the actual arXiv/journal primary, not just the secondary characterization.
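
The fallback chain in step 1 is an ordered try-each-method loop. The sketch below shows that control flow only: the extractors are hypothetical callables that take a URL and return extracted markdown (or raise, or return nothing, on failure), and no real extraction library is called.

```python
def extract_with_fallback(url, extractors):
    """Try each (name, extractor) pair in order.

    Returns (name, text) for the first extractor that yields non-empty text.
    Raises RuntimeError listing every failure if none succeeds.
    """
    errors = []
    for name, extractor in extractors:
        try:
            text = extractor(url)
        except Exception as exc:  # blocked or failed: fall through to the next method
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError("all extraction methods failed: " + "; ".join(errors))
```

The returned name is the value to record as the extraction method in the raw/ provenance header, which keeps the header honest about which step of the chain actually produced the text.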

Reddit triage

Triage is a production workflow, not a classification workflow: when triaging posts, source the relevant followup items and update/add topics in the same session, rather than only marking posts reviewed. Status values: untriaged | reviewed | followup | sourced.


Linking conventions

  • Internal links use [[path/to/file]] or [[path/to/file|display text]] (Quartz/Obsidian style, no .md).
  • A [[link]] to a not-yet-created file is acceptable; it marks something worth writing.

Follow Wikipedia’s linking discipline (MOS:DUPLINK + MOS:OVERLINK + WP:SEEALSO):

  1. First-occurrence inline linking. Link the first prose mention of a relevantly-related entity (any figure/program/event/document with a page), inline, in the body — even if the current page isn’t about it. Subsequent mentions of the same entity stay plain text. One link per entity per page (front-matter, headers, and a final Related list don’t count toward the prose link).
  2. Relevant, not just central. Link an entity that’s genuinely relevant in context, not only the page’s main subject — but not loose/trivial co-mentions (a name dropped purely for contrast). “Discussed here” → link; “named in passing as a contrast” → skip.
  3. Don’t over-link. No linking the same entity on every mention; no linking trivially-related items.
  4. Related / “See also” is curated and non-duplicative. The trailing ## Related list is for pages not already linked inline in the body (primaries the page cites, sibling pages worth surfacing). If an entity is linked inline, it should not also appear in Related.

In short: inline-link the first relevant mention once; keep Related for what the prose didn’t already reach. Avoid the surname-collision trap when applying this (e.g. Harry Reid vs. Garry Reid; Eric Davis vs. other Davises) — match the specific person, not the bare surname.
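
The first-occurrence rule can be sketched mechanically. The function below links only the first plain mention of each entity and matches full names, so a bare shared surname never triggers a link. It is an illustration, not repo tooling: the basenames in the usage example are hypothetical, and it does not decide relevance or curate the Related list, both of which remain editorial judgments.

```python
import re


def link_first_mentions(body, entities):
    """Wikilink the first prose mention of each entity; later mentions stay plain.

    entities maps a full display name -> page basename. An entity that is
    already linked anywhere in body is skipped, so the function is idempotent.
    """
    for name, basename in entities.items():
        if f"[[{basename}" in body:
            continue
        # Lookbehinds avoid matching text that already sits inside a wikilink.
        pattern = re.compile(r"(?<!\[)(?<!\|)\b" + re.escape(name) + r"\b")
        body = pattern.sub(
            lambda match: f"[[{basename}|{match.group(0)}]]", body, count=1
        )
    return body
```

Given the entity map {"Harry Reid": "reid-harry"}, a paragraph that mentions Harry Reid twice and Garry Reid once comes back with a single link on the first Harry Reid mention and Garry Reid untouched.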


Two homepages

  • index.md (lowercase) is the Quartz site homepage (has the title: front-matter). Keep its curated entry-point links current.
  • INDEX.md (uppercase) is the full organized index of sources by category, linked from the homepage as “the full index.”

What this base does NOT do

  • It does not collapse testimony into fact.
  • It does not present a single credibility number where the claim-by-claim reality is bimodal.
  • It does not treat aggregation of testimony as physical evidence (cf. the Age of Disclosure “34 named officials” analysis in community-credibility-assessment).
  • It does not silently truncate coverage — if something is partially sourced or a gap remains, that is stated (see the Gaps section of INDEX).