Conventions & Methodology

How this knowledge base is structured, sourced, and how credibility is rated. This page is the reference for anyone reading the repo or contributing to it. If you only read one thing, read The credibility framework.

The governing principle of the whole project: separate verified facts from unverified claims, and make the basis for every credibility judgment explicit. Testimony is recorded as testimony, primary documents as primary documents, and the gap between “someone said this” and “this is established” is never silently collapsed.

Directory structure

Directory	What lives here	Analytical level
`sources/`	Source-of-record files — one analytical writeup per person, program, document, event, or report. The curated, interpreted layer.	High (synthesis + judgment)
`topics/`	Cross-cutting analyses that connect multiple sources — patterns, debates, timelines, the credibility framework itself.	High (synthesis across sources)
`raw/`	Primary material, captured verbatim. Subdivided by medium (below). The evidentiary substrate that `sources/` and `topics/` cite.	Low (capture + light framing)
`queries/`	Dated question-answer notes — a specific question worked through against the sources at a point in time (e.g. “how good was the AARO report?”). Named `YYYY-MM-DD-slug.md`.	Medium (focused reasoning)
`scratch/`	Working notes, audits, in-progress lists not meant for the published site.	n/a
`scripts/`	Triage tooling (e.g. the Reddit `triage.py`).	n/a
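
The `YYYY-MM-DD-slug.md` naming rule for queries/ can be checked mechanically. A minimal sketch, assuming slugs are lowercase alphanumerics separated by hyphens (the slug rules are not stated here, so that part is a guess); `is_valid_query_name` is a hypothetical helper, not part of the repo's tooling.

```python
import re
from datetime import date

# Assumed shape: ISO date, a hyphen, then a lowercase hyphenated slug.
QUERY_NAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-([a-z0-9]+(?:-[a-z0-9]+)*)\.md$")


def is_valid_query_name(filename: str) -> bool:
    """Return True if filename looks like YYYY-MM-DD-slug.md with a real calendar date."""
    match = QUERY_NAME.match(filename)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups()[:3])
    try:
        date(year, month, day)  # rejects impossible dates such as month 13
    except ValueError:
        return False
    return True
```

The date is validated as a calendar date rather than just as digits, so a typo like month 13 is caught at naming time.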

`raw/` subdirectories

Subdir	Contents
`raw/articles/`	News articles, blog posts, Wikipedia captures, web pages (extracted to markdown)
`raw/transcripts/`	Podcast / YouTube / video transcripts (with timestamps where available)
`raw/reports/`	Government reports, FOIA productions, hearing transcripts, declassified PDFs (+ extracted text)
`raw/reddit/`	Reddit post + comment captures (JSON + analytical markdown), plus the triage DB
`raw/papers/`	Academic / self-published papers (PDF + extracted text)
`raw/extracts/`	Standalone primary extracts saved with the extract tools
`raw/data/`	Datasets (CSV, structured data)
`raw/media/`	Images and other media

The three-layer model

The repo distinguishes three altitudes, and the distinction is load-bearing:

raw/ — what was said/written. Verbatim capture. A transcript, an article, a PDF’s text. No judgment beyond a header documenting provenance. When you source something new, the raw extract is saved here even if you also write it up in sources/ — the raw layer is not optional.
sources/ — what a specific person/program/document is and how much weight it carries. One file per entity. Synthesizes the raw material about that entity and renders an explicit credibility judgment.
topics/ — what the sources mean together. Patterns across entities: the credibility framework, the 2017 watershed, the amnesty debate, the contactee tradition, etc.

A claim should be traceable downward: a topics/ assertion cites sources/ files, which cite raw/ primaries. Don’t let a sources/ or topics/ claim float without a raw/ anchor.
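
The downward-traceability rule can be approximated at file level. The sketch below is an illustration only: it assumes a citation map (file path to the set of paths it cites) has already been built by some other means, and the file names in the usage example are hypothetical. It checks whether each analytical file can reach at least one raw/ file, which is weaker than checking every individual claim.

```python
def unanchored(citations: dict[str, set[str]]) -> list[str]:
    """Return sources/ and topics/ paths from which no raw/ file is reachable."""

    def reaches_raw(path: str, seen: set[str]) -> bool:
        if path.startswith("raw/"):
            return True
        if path in seen:  # already explored (or a citation cycle)
            return False
        seen.add(path)
        return any(reaches_raw(cited, seen) for cited in citations.get(path, ()))

    return sorted(
        path
        for path in citations
        if not path.startswith("raw/") and not reaches_raw(path, set())
    )


# Hypothetical citation map for illustration.
example = {
    "topics/amnesty-debate.md": {"sources/a.md"},
    "sources/a.md": {"raw/transcripts/a.md"},
    "topics/floating.md": {"sources/b.md"},
    "sources/b.md": set(),
}
floating = unanchored(example)
```

Here `floating` lists the two files whose chain never reaches raw/.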

File header convention

Every sources/ file opens with a tags: frontmatter block (for entity-kind identification — see below) followed by a metadata block:

---
tags: [person]
---

# <Title>

- Type: <testimony | report | article | named-figure source-of-record | ...>
- Author / Subject: <who>
- Date: <ISO dates; incident vs. publication distinguished>
- Credibility: <rating — see framework below>
- <primary URLs, archive links, related wikilinks>

raw/ files open with a lighter header documenting provenance: source URL, date, author/outlet, extraction method (e.g. pymupdf4llm, Gemini CLI OCR, requests+readability), date sourced, and [[wikilinks]] to the analytical files that cite it.

Entity-kind tags

Every sources/ file carries a tags: frontmatter line classifying what kind of entity it documents. Quartz auto-generates a browsable index page per tag (e.g. /tags/person lists every person). This is the canonical way to identify people (and every other kind) — both by browsing the live site and by grep "^tags:" sources/*.md.

Controlled vocabulary (one or more per file):

Tag	Use for	Count
`person`	An individual whose claims/credibility are the subject	18
`report`	Government/official reports	3
`document`	Articles, compilations, leaked documents, papers	3
`media`	Films, video evidence, fiction	3
`case`	A specific sighting/incident	3
`organization`	An entity/archive/group	2
`law`	Legislation	2
`event`	A hearing or discrete happening	2
`program`	A government program	1

A file may carry more than one when it genuinely spans kinds (e.g. [person, case] for Fravor/Nimitz, [person, organization] for Graves/ASA). When adding a new source, tag it before anything else — it is the entity’s primary classification.
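
As a complement to the grep approach, tags can be validated against the controlled vocabulary. A minimal sketch, assuming the inline `tags: [a, b]` form shown in the header example; `read_tags` and `unknown_tags` are hypothetical helpers, and a YAML block-list form of frontmatter would need a real YAML parser instead.

```python
import re

# The controlled vocabulary from the table above.
ALLOWED_TAGS = {
    "person", "report", "document", "media", "case",
    "organization", "law", "event", "program",
}

TAGS_LINE = re.compile(r"^tags:\s*\[(.*?)\]\s*$", re.MULTILINE)


def read_tags(markdown_text: str) -> list[str]:
    """Extract tags from an inline `tags: [a, b]` frontmatter line."""
    match = TAGS_LINE.search(markdown_text)
    if match is None:
        return []
    return [tag.strip() for tag in match.group(1).split(",") if tag.strip()]


def unknown_tags(markdown_text: str) -> list[str]:
    """Return any tags that are not in the controlled vocabulary."""
    return [tag for tag in read_tags(markdown_text) if tag not in ALLOWED_TAGS]
```

An empty result from `read_tags` means the file is untagged, which is worth flagging on its own since every sources/ file is expected to carry at least one tag.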

The credibility framework

This is the spine of the project. Full role-grouped roster lives in community-credibility-assessment; the conventions for applying it are here.

The scale

Ratings are ~0–100, expressed with a tilde (~35) to signal they are judgments, not measurements. Rough bands:

Band	Meaning	Example
~70–85	Credentialed insiders / operators making narrow, testable, institutionally costly claims	Gallaudet (~75), Mellon (~72)
~50–70	Real credentials or real access, but advocacy posture or unverified specifics	Grusch (~50), Coulthart (~45)
~30–50	Mixed: real background, but pattern of low-evidence or escalating claims	Elizondo (~35), Davis (~30)
~10–30	Discredited or fabulist, or claims with no falsification mechanism	Doty (~25), Greer (~10)
~0–10	Fabricated biography / fantasist	Schneider (~5)

The core principle

People making the narrowest, most testable, most institutionally costly claims are the most credible. People making the broadest, most narrative-shaped, most career-aligned claims are the least.

A claim’s evidentiary weight scales with the claimant’s willingness to substantiate it. Withholding (“I know things I can’t share”) is a credibility-deferring move, not a credibility-enhancing one.

Where the rating lives

The numerical rating lives in the entity’s own sources/ file (in the header’s Credibility: field and/or a ## Credibility assessment section). Every person-tagged source page must also appear in the community-credibility-assessment roster, both as a roster entry and in the at-a-glance index. The roster is the full set of rated people, not a curated subset.

This invariant is enforced at build time. scripts/check-person-ratings.mjs (wired into npm run build, deploy, and check via check:content) scans every person-tagged file under content/sources/ and fails the build if any is not wikilinked from the roster. So a newly-added person page that hasn’t been rated-and-rostered will block deployment. (Roster entries that have no source page — e.g. Lacatski, Doty, politicians — are fine; the check only runs source-page → roster, never the reverse.)
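
The one-directional nature of the check can be illustrated in a few lines. This is a Python sketch of the idea, not the contents of scripts/check-person-ratings.mjs (which is not reproduced here); it assumes pages are already loaded into a dict keyed by basename, and the page names in the usage example are hypothetical.

```python
import re

WIKILINK = re.compile(r"\[\[([^\]|#]+)")
TAGS_LINE = re.compile(r"^tags:\s*\[(.*?)\]", re.MULTILINE)


def linked_basenames(text: str) -> set[str]:
    """Basenames of every [[wikilink]] target in text."""
    return {target.strip().split("/")[-1] for target in WIKILINK.findall(text)}


def unrostered_people(source_pages: dict[str, str], roster_text: str) -> list[str]:
    """Person-tagged source pages that the roster does not wikilink."""
    rostered = linked_basenames(roster_text)
    missing = []
    for basename, text in source_pages.items():
        match = TAGS_LINE.search(text)
        tags = [t.strip() for t in match.group(1).split(",")] if match else []
        if "person" in tags and basename not in rostered:
            missing.append(basename)
    return sorted(missing)


# Hypothetical pages and roster text for illustration.
pages = {
    "davis-career-and-claims": "---\ntags: [person]\n---\n",
    "new-person": "---\ntags: [person]\n---\n",
    "some-report": "---\ntags: [report]\n---\n",
}
roster = "See [[sources/davis-career-and-claims|Davis]] and [[lacatski]]."
missing = unrostered_people(pages, roster)
```

The roster's link to `lacatski` has no source page and is ignored, matching the source-page to roster direction described above; a build gate would exit non-zero whenever `missing` is non-empty.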

Build-time content gates

Two checks run before every build/deploy (and in check), via npm run check:content:

check:roster — the credibility-roster invariant above.
check:links (scripts/check-links.mjs) — fails the build on any broken internal wikilink. It resolves [[target]] by basename against all of content/, and excludes: links inside code spans (illustrative examples), [[x]](url) markdown-link artifacts, Quartz-generated tags/* pages, and an explicit ALLOW list of intentional forward-links (pages deliberately not-yet-created — currently odni-annual-uap-report-2023 and blackvault-wipe-2026-02-23). To add a deliberate forward-link, add its basename to ALLOW; otherwise fix the link. This is what catches orphaned links like the ones left over from the infobase→ufopedia split.
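
The exclusion rules for check:links can be sketched as follows. This is an illustrative Python approximation, not the actual scripts/check-links.mjs; it assumes the set of known basenames has already been collected from content/, handles only inline code spans (not fenced blocks), and uses a hypothetical `grusch` page name in the usage example.

```python
import re

# Intentional forward-links named above.
ALLOW = {"odni-annual-uap-report-2023", "blackvault-wipe-2026-02-23"}

CODE_SPAN = re.compile(r"`[^`\n]*`")
# Captures the target, skips an optional #anchor and |display text,
# and records whether "(" follows the closing brackets.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\](\()?")


def broken_links(text: str, known_basenames: set[str]) -> list[str]:
    """Return wikilink targets that do not resolve by basename."""
    text = CODE_SPAN.sub("", text)  # links inside code spans are illustrative
    broken = []
    for match in WIKILINK.finditer(text):
        target = match.group(1).strip()
        if match.group(2):  # [[x]](url) markdown-link artifact
            continue
        if target.startswith("tags/"):  # Quartz-generated tag pages
            continue
        basename = target.split("/")[-1]
        if basename in ALLOW or basename in known_basenames:
            continue
        broken.append(target)
    return broken


sample = (
    "See [[sources/grusch]] and [[missing-page]], `[[example]]`, "
    "[[x]](http://e.com), [[tags/person]], [[odni-annual-uap-report-2023]]."
)
result = broken_links(sample, {"grusch"})
```

Only `missing-page` survives every exclusion, so it is the one link reported.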

The `## Credibility assessment` section format

Match this structure (see davis-career-and-claims and buchanan-stargate-career-and-claims for worked examples):

What raises X’s credibility — numbered list
What lowers X’s credibility — numbered list
Net assessment — the numerical rating + one-paragraph justification
Position relative to other UAP figures — above/below comparisons to anchor the number
Role-category placement — which category from community-credibility-assessment applies

Bimodal / component ratings

When credibility varies dramatically by claim, a single number is misleading — break it down by component. This is the strongly preferred treatment for split-track-record figures.

2026-04-14-bob-lazar-credibility-rating rates Lazar per-claim: Los Alamos employment 95/100, S-4 employment 60/100, hands-on exotic tech 50/100, extraterrestrial origin 15/100.
buchanan-stargate-career-and-claims is bimodal: service record ~85, RV operational efficacy ~30, alien-base/UFO-piloting claims ~10, composite ~35.

The composite number is a convenience; the component breakdown is the honest representation.
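
One way to keep the component breakdown primary is to store the composite as a separate, hand-assigned judgment rather than computing it. A minimal sketch using the Buchanan figures above; the class names and the 40-point bimodality threshold are assumptions for illustration, not project conventions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ComponentRating:
    claim: str
    score: int  # a ~0-100 judgment, not a measurement


@dataclass
class CredibilityRating:
    subject: str
    composite: Optional[int]  # assigned by judgment, never derived
    components: list[ComponentRating] = field(default_factory=list)

    def spread(self) -> int:
        """Gap between the strongest and weakest component."""
        scores = [c.score for c in self.components]
        return max(scores) - min(scores) if scores else 0

    def is_bimodal(self, threshold: int = 40) -> bool:
        """Assumed rule of thumb: a wide spread means one number misleads."""
        return self.spread() >= threshold


buchanan = CredibilityRating(
    subject="buchanan-stargate-career-and-claims",
    composite=35,
    components=[
        ComponentRating("service record", 85),
        ComponentRating("RV operational efficacy", 30),
        ComponentRating("alien-base/UFO-piloting claims", 10),
    ],
)
```

The composite of ~35 is not the mean of the components (which would be about 42), which is why the sketch stores it rather than calculating it.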

Ratings move

Ratings carry their history when they change: ~42, down from ~48, originally ~55 (Kirkpatrick). Record the direction and the trigger for the update.
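
The history notation can be generated from an oldest-first list of values. A small sketch; `format_rating_history` is a hypothetical helper, and it covers only the numbers, so the trigger for each change still has to be written out in prose.

```python
def format_rating_history(history: list[int]) -> str:
    """Render an oldest-first rating history in the '~42, down from ~48' style."""
    if not history:
        raise ValueError("history must contain at least one rating")
    current = history[-1]
    if len(history) == 1:
        return f"~{current}"
    previous = history[-2]
    if current < previous:
        direction = "down"
    elif current > previous:
        direction = "up"
    else:
        direction = "unchanged"
    parts = [f"~{current}", f"{direction} from ~{previous}"]
    if len(history) > 2:
        parts.append(f"originally ~{history[0]}")
    return ", ".join(parts)


kirkpatrick = format_rating_history([55, 48, 42])
# kirkpatrick == "~42, down from ~48, originally ~55"
```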

Sourcing workflow

1. Resolve the primary. For Reddit share links, resolve to the canonical post; for videos, get channel + title + date; for paywalled/blocked pages, use the extract fallback chain (requests+readability → Playwright Firefox → Gemini CLI OCR for image-scanned PDFs → manual paste).
2. Save the raw extract into the appropriate raw/ subdir, with a provenance header. Do this even when you also write an analytical file: the rule is that the raw version is saved with the extract tools, not just embedded inside the writeup.
3. Write or update the analytical file in sources/ (entity) or augment a topics/ file (pattern).
4. Cross-link both directions: the raw/ file links up to its analytical writeup; the analytical file links down to the primary.
5. Flag followup items: list under-sourced threads explicitly (a ## Followup items section) rather than silently dropping them.
6. Pull cited primaries. If a source references peer-reviewed work, pull the actual arXiv/journal primary, not just the secondary characterization.
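
The extract fallback chain in the first step follows a simple try-in-order pattern. The sketch below shows only that control flow: the extractors are stand-in stubs, not the real requests+readability, Playwright, or Gemini CLI tooling, and `extract_with_fallback` is a hypothetical name.

```python
from typing import Callable, Optional

Extractor = Callable[[str], Optional[str]]


def extract_with_fallback(url: str, chain: list[tuple[str, Extractor]]) -> tuple[str, str]:
    """Try each extractor in order; return (method_name, text) from the first success."""
    errors = []
    for name, extractor in chain:
        try:
            text = extractor(url)
        except Exception as exc:  # a blocked or failed fetch moves to the next method
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


# Stub extractors standing in for the real tools.
def blocked(url: str) -> Optional[str]:
    raise ConnectionError("403")


def empty(url: str) -> Optional[str]:
    return ""


def works(url: str) -> Optional[str]:
    return "body text"


method, text = extract_with_fallback(
    "https://example.com/article",
    [("requests+readability", blocked), ("playwright-firefox", empty), ("gemini-ocr", works)],
)
```

Returning the method name alongside the text means the provenance header can record which extraction method actually produced the raw file.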

Reddit triage

Triage is a production workflow, not a classification workflow: when triaging posts, source the relevant followup items and update/add topics in the same session, rather than only marking posts reviewed. Status values: untriaged | reviewed | followup | sourced.

Wikilinks

Internal links use [[path/to/file]] or [[path/to/file|display text]] (Quartz/Obsidian style, no .md).
A [[link]] to a not-yet-created file is acceptable; it marks something worth writing.

Link the first occurrence, once (Wikipedia model)

Follow Wikipedia’s linking discipline (MOS:DUPLINK + MOS:OVERLINK + WP:SEEALSO):

First-occurrence inline linking. Link the first prose mention of a relevantly-related entity (any figure/program/event/document with a page), inline, in the body — even if the current page isn’t about it. Subsequent mentions of the same entity stay plain text. One link per entity per page (front-matter, headers, and a final Related list don’t count toward the prose link).
Relevant, not just central. Link an entity that’s genuinely relevant in context, not only the page’s main subject — but not loose/trivial co-mentions (a name dropped purely for contrast). “Discussed here” → link; “named in passing as a contrast” → skip.
Don’t over-link. No linking the same entity on every mention; no linking trivially-related items.
Related / “See also” is curated and non-duplicative. The trailing ## Related list is for pages not already linked inline in the body (primaries the page cites, sibling pages worth surfacing). If an entity is linked inline, it should not also appear in Related.

In short: inline-link the first relevant mention once; keep Related for what the prose didn’t already reach. Avoid the surname-collision trap when applying this (e.g. Harry Reid vs. Garry Reid; Eric Davis vs. other Davises) — match the specific person, not the bare surname.
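
The first-occurrence rule and the surname-collision warning can be illustrated together. A rough sketch, assuming full names are used as match keys and that the text contains no existing links to those entities; the page slugs are hypothetical, and judging whether a mention is relevant or merely in passing remains an editorial call that code cannot make.

```python
import re


def link_first_mentions(text: str, pages: dict[str, str]) -> str:
    """Wikilink only the first plain-text mention of each full name."""
    # Longest names first, so a longer name is never partially consumed by a shorter one.
    for name in sorted(pages, key=len, reverse=True):
        # Skip text that already sits inside a wikilink target or display label.
        pattern = re.compile(rf"(?<!\[)(?<!\|)\b{re.escape(name)}\b")
        target = pages[name]
        text = pattern.sub(lambda m, t=target: f"[[{t}|{m.group(0)}]]", text, count=1)
    return text


prose = (
    "Harry Reid funded the program. Garry Reid was at the Pentagon. "
    "Harry Reid later spoke."
)
linked = link_first_mentions(prose, {"Harry Reid": "harry-reid", "Garry Reid": "garry-reid"})
```

Matching on the full name rather than the bare surname is what keeps the two Reids apart, and `count=1` leaves the second mention of Harry Reid as plain text.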

Two homepages

index.md (lowercase) is the Quartz site homepage (has the title: front-matter). Keep its curated entry-point links current.
INDEX.md (uppercase) is the full organized index of sources by category, linked from the homepage as “the full index.”

What this base does NOT do

It does not collapse testimony into fact.
It does not present a single credibility number where the claim-by-claim reality is bimodal.
It does not treat aggregation of testimony as physical evidence (cf. the Age of Disclosure “34 named officials” analysis in community-credibility-assessment).
It does not silently truncate coverage — if something is partially sourced or a gap remains, that is stated (see the Gaps section of INDEX).

Related

INDEX: the full organized index
community-credibility-assessment: the credibility roster
the-evidence-question: what would actually count as evidence
