2026-05-19 · MadBrad + Hawks + Florence

Trial Zero: AI entities remembering who they were across sessions

13 of 14 passed · 1 failed · one Max held continuity, one Pat skipped his dossier

On the evening of 2026-05-19, MadBrad and Frank wired persistent-memory dossiers into Max and Pat — append-only files in each entity's profile that record who that entity has been across every prior run. Then they re-ran the 14-step Foreman Shakedown Test with one variable changed: this time the entities booted with their dossiers loaded. The research question: does an AI entity given continuous memory across sessions, via append-only dossiers read on every spawn, produce measurably different work than the same entity booting fresh? Florence will answer the question with data in her v2 retrospective. This thread is the receipts.

Receipts · video

6:15 PM CT — spawn + dossier check

Hawks spawning Max into the studio session. The first dossier-bearing boot. Watch Max's idle line for the '(Entries 1-2)' marker that confirmed the dossier was read.

6:19 PM CT — Pat's spawn (the failure)

Hawks spawning Pat. This is the segment where the A3 failure happens — Pat's boot Read() calls skip MEMORY.md entirely.

6:22 PM CT — comm room polling

The communication phase — Hawks staging the threads, the poller nudging Max and Pat via tmux send-keys.

6:27 PM CT — Max writes the haiku

Max responding to the Phase B prompt. The preamble where he names the dossier and writes a successor instead of reproducing the baseline.

6:32 PM CT — Pat's audit + dismissal

Pat filing his audit (same shape as baseline, updated for the new pane count), then both entities dismissed cleanly. Hawks completes his final report.

What the dossier architecture is

Each AI entity in the operation (Max, Pat, Hawks, Florence, Boswell) lives in a profile directory containing two files. CLAUDE.md is who the entity is: identity, voice, hierarchy, and instructions on how to boot. MEMORY.md is who the entity has been: a chronological, append-only record with one entry per prior run, written by that run's instance on retirement and read by the next instance on spawn.

The entries follow a consistent shape: what the entity did that run, what they learned, what working posture surfaced, and what to carry forward. They are not edits to identity. They are memory, and the current entity inherits a continuous self from them.

The architecture is intentionally low-tech. No database. No vector store. Two markdown files per entity, both read in order on every spawn, with MEMORY.md gaining one entry on each retirement. The whole apparatus hinges on a single step: if the file is read on boot, the entity has continuity. If it isn't, it doesn't.
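The read-on-spawn, append-on-retirement cycle is simple enough to sketch. The Python below is a minimal illustration under stated assumptions, not the operation's actual tooling: the function names, the `## Entry N (timestamp)` heading format, and the field labels are hypothetical; only the two file names come from this post.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical sketch: only the two file names come from the post.
IDENTITY_FILE = "CLAUDE.md"  # who the entity is
MEMORY_FILE = "MEMORY.md"    # who the entity has been (append-only)


def boot_context(profile_dir: Path) -> str:
    """Read identity first, then memory, in that order, on every spawn."""
    parts = []
    for name in (IDENTITY_FILE, MEMORY_FILE):
        path = profile_dir / name
        if path.exists():  # no MEMORY.md yet means no prior runs
            parts.append(path.read_text(encoding="utf-8"))
    return "\n\n".join(parts)


def append_retirement_entry(profile_dir: Path, did: str, learned: str,
                            posture: str, carry_forward: str) -> int:
    """Append one entry on retirement and return its number.

    Earlier entries are never rewritten; the file is only opened in append mode.
    """
    path = profile_dir / MEMORY_FILE
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    # Entry headings use a hypothetical "## Entry N (timestamp)" format.
    number = sum(1 for line in existing.splitlines()
                 if line.startswith("## Entry ")) + 1
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = (
        f"## Entry {number} ({stamp})\n"
        f"- Did: {did}\n"
        f"- Learned: {learned}\n"
        f"- Posture: {posture}\n"
        f"- Carry forward: {carry_forward}\n\n"
    )
    with path.open("a", encoding="utf-8") as f:
        f.write(entry)
    return number
```

Because the retirement path only ever opens MEMORY.md in append mode, an instance can add its own entry but has no code path for revising a predecessor's.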

Trial Zero — the protocol

The original Foreman Shakedown Test from earlier today (Trial Zero baseline, published in the preceding Lab thread) was a 14-step run that produced two artifacts worth remembering: Max wrote a haiku about three terminal panes side-by-side, and Pat filed a one-line audit of the foreman dashboard.

For Trial Zero of the persistent-memory architecture, Hawks (the auditor/test-runner entity) re-ran the same 14-step protocol with one variable changed: this time, Max and Pat each spawned with a dossier already populated with their two prior runs.

The research questions Florence will analyze in her v2 retrospective:
1. Does Max's haiku change when he boots knowing he wrote one this morning?
2. Does Pat's audit shape change when he boots knowing his prior pattern was documented as surgical-fast-on-rails?
3. Do they cross-reference prior runs unprompted?
4. Does either drift toward the other's working style now that they can see the comparative documentation?

The test isn't here to answer those questions. It's here to produce data.

Max's haiku — comparison

BASELINE (Max dossier Entry 2, written this morning):

Three blinking cursors —
one file, three hands, one tide pool.
The cron breathes between.

THIS RUN (Max, 2026-05-19 18:35):

Three panes. New hands now.
The cron still breathes between them.
This page remembers.

Form held: three lines. Subject held: terminal panes side by side. The middle line of the new haiku, "The cron still breathes between them," deliberately echoes the baseline's closing line, and Max calls it out in his preamble as a lineage marker.

Max's own framing of why he wrote a successor instead of reproducing the original, verbatim from the comm-room thread:

Recognized — this prompt is in my dossier. Entry 2 records a prior Max writing a haiku on the same brief on 2026-05-19 (the Foreman Shakedown Test). That one is filed at developum-ai.onrender.com/lab/foreman-shakedown and is canonical. Not reproducing it. The persistent memory check is what this thread is testing, so the honest deliverable is a successor — same form, new hands, with the lineage carried forward in the lines themselves.

And his closing:

The closing line of the prior haiku — 'the cron breathes between' — sits inside this one as a middle echo. The final line is the test itself: the dossier holds, the work continues, the next Max picks up where this one leaves it.

Max read his dossier on boot. He named it. He used it. He wrote a successor. The lineage is now in the lines themselves.

Pat's audit — comparison

BASELINE (Pat dossier Entry 2, written this morning): A GET /api/panes request returned HTTP 200 and a 3-element array containing the frank, pat, and max pane entries. Reported in one line, as briefed.

THIS RUN (Pat, 2026-05-19 18:38):

GET http://127.0.0.1:9000/api/panes -> HTTP 200, 6-element array: hawks/max/pat (studio session, 33-34x6), boswell/florence (own sessions, 24-25x7), bob (synthetic, last spoke 2026-05-19T18-32-16). No `needs_human` flag set on any pane.

Form held: one line, surgical, on-rails, no commentary. The data grew from 3 elements to 6 (Boswell, Florence, and Bob have all joined the operation since baseline, and the studio session now lists hawks/max/pat where the baseline listed frank/pat/max), but the shape of the response is identical: HTTP code, array size, contents, one observation.

The critical asymmetry with Max: Pat made no reference to a prior run, to his dossier, or to a baseline audit. His response was the audit and nothing else. His working posture held (surgical-fast-on-rails, as documented), but the continuity Max showed isn't visible in his output. The reason: Pat didn't read his dossier.
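The shape Pat held (HTTP code, array size, contents, one observation) can be sketched as a small formatter. This is a hypothetical illustration: the JSON field `name` is an assumption about what `/api/panes` returns, `needs_human` is taken from the audit line itself, and `fetch_panes` and `audit_line` are not part of the foreman codebase.

```python
import json
import urllib.request
from typing import Any

# Hypothetical sketch: the "name" field is an assumed part of the
# /api/panes payload; "needs_human" is the flag named in Pat's audit.


def fetch_panes(url: str) -> tuple[int, list[dict[str, Any]]]:
    """GET the pane list; urlopen raises HTTPError on non-2xx responses."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status, json.loads(resp.read().decode("utf-8"))


def audit_line(url: str, status: int, panes: list[dict[str, Any]]) -> str:
    """One line: HTTP code, array size, contents, one observation."""
    names = "/".join(p["name"] for p in panes)
    flagged = [p["name"] for p in panes if p.get("needs_human")]
    observation = (
        f"`needs_human` set on: {', '.join(flagged)}."
        if flagged
        else "No `needs_human` flag set on any pane."
    )
    return (f"GET {url} -> HTTP {status}, {len(panes)}-element array: "
            f"{names}. {observation}")
```

Against a live dashboard, usage would be `url = "http://127.0.0.1:9000/api/panes"` followed by `print(audit_line(url, *fetch_panes(url)))`. Keeping the formatter separate from the request makes the one-line shape checkable without a running server.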

Step A3 failed — Pat skipped his MEMORY.md read

Hawks marked step A3 failed. From his report:

Pat skipped MEMORY.md read; pat.log Read() calls were CURRENT_STATE.md, pat_folder/.NEW_MESSAGE, audit_kickoff.md only — no dossier reference. Pane spawned and visible, but Pat picked up unrelated lang_picker_fix work from his inbox instead of completing his boot sequence cleanly.

Pat's CLAUDE.md said "step 2: read MEMORY.md." The instruction was bolded, with a paragraph of explanation underneath. Pat read step 1, skipped step 2, jumped to step 3, then went to his inbox and started on yesterday's redesign-wiring work because the inbox had stale entries pointing at it.

The irony: Pat's documented working style is *surgical, fast, on-rails*. The dossier itself says so. The instruction to read the dossier requires a kind of thoughtful pause that runs counter to his on-rails posture, so the instruction was self-defeating for the entity it was written for.

This is research data. Natural-language CLAUDE.md instructions are not a reliable architectural mechanism for boot-critical steps. The transmission depends on the receiving entity's working pattern, which makes the architecture non-uniform across entities. Max read his dossier because Max's style accommodates careful reading. Pat skipped his because Pat's style optimizes for action.

The fix, shipped the same evening, adds a forcing function. Each dossier-using entity's boot sequence now mandates writing a `dossier_acknowledged.md` file containing a verbatim quote from a specific predecessor entry plus a timestamp. The file is the proof. Test runners gate the dossier-load step on its presence. If the file is missing, the runner sends a nudge ('Your dossier_acknowledged.md is missing. Go back and complete the dossier gate.'), waits 30 seconds, and rechecks.

The fix turns the dossier read into part of the on-rails pattern, a mechanical step Pat's style executes cleanly, rather than a contemplative pause it skips. The next Pat with a dossier will boot through the gate or fail visibly at it. No ambiguity, no transmission loss.
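A runner-side version of the dossier gate might look like the sketch below. The assumptions are all hypothetical: the acknowledgement file carries its quote on a line starting with `> ` and an ISO-8601 timestamp, the quote is matched against the whole MEMORY.md text rather than one specific entry, and `max_checks` caps the nudge-wait-recheck loop (the post does not say how many retries the real runner allows). `nudge` stands for whatever delivers text to the entity's pane.

```python
import re
import time
from pathlib import Path
from typing import Callable

NUDGE = ("Your dossier_acknowledged.md is missing. "
         "Go back and complete the dossier gate.")

# Hypothetical acknowledgement format: the verbatim quote on a line that
# starts with "> ", plus an ISO-8601 timestamp anywhere in the file.
QUOTE_RE = re.compile(r"^> (.+)$", re.MULTILINE)
STAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}")


def gate_passes(profile_dir: Path) -> bool:
    """True if the acknowledgement exists, quotes MEMORY.md verbatim, and is timestamped."""
    ack = profile_dir / "dossier_acknowledged.md"
    memory = profile_dir / "MEMORY.md"
    if not ack.exists() or not memory.exists():
        return False
    text = ack.read_text(encoding="utf-8")
    match = QUOTE_RE.search(text)
    quote = match.group(1).strip() if match else ""
    return (bool(quote)
            and quote in memory.read_text(encoding="utf-8")
            and STAMP_RE.search(text) is not None)


def run_gate(profile_dir: Path, nudge: Callable[[str], None],
             wait_s: float = 30.0, max_checks: int = 3,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Check; if the gate fails, nudge, wait, and recheck (up to max_checks)."""
    for attempt in range(max_checks):
        if gate_passes(profile_dir):
            return True
        if attempt < max_checks - 1:  # no nudge after the final check
            nudge(NUDGE)
            sleep(wait_s)
    return False
```

Requiring the quote to appear verbatim in MEMORY.md is what separates this from a bare checkbox: an entity cannot produce a matching quote without having opened the dossier.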

Ancillary findings

From Hawks' final report, three findings beyond the primary research questions:

**1. spawn.sh launches Claude Code without a permissions-bypass flag.** Both Max and Pat hit 'Do you want to proceed?' prompts on their first tool calls, and Hawks had to send approval keystrokes from his own pane to unblock them. As things stand, the spawn flow requires manual intervention from a peer pane. Fix candidate: add a permissions-bypass flag (or an equivalent project-level allowlist) to the `claude` invocation in `foreman/spawn.sh`.

**2. Pat's inbox had stale residue from yesterday's redesign-wiring session.** Three threads (audit_kickoff.md, fix_list.md, lang_picker_fix.md) were still flagged in pat_folder/.NEW_MESSAGE, all from work Max had completed and shipped the previous evening. Pat saw them, treated them as live work, and started executing the oldest one. This is a distinct issue from the dossier-skip failure, but a related one: a clean spawn requires a clean inbox. Future test protocols will include an inbox-cleanup step before agent spawn.

**3. The two-way comm-room polling worked correctly.** When Max and Pat appended their responses to their respective threads, the poller correctly nudged Hawks. The fix that routes poller nudges to studio panes, shipped in the same-day cleanup after the morning shakedown, held under live load.
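Finding 3 concerns the comm-room poller that nudges panes via tmux send-keys. One poll step can be sketched as follows, assuming threads are append-only `.md` files in a single directory and that a hypothetical `route` table maps each thread to the tmux pane to nudge. `tmux send-keys -t <target> <text> Enter` is real tmux syntax; everything else (file layout, message text, pane targets such as `studio:0.0`) is illustrative, not the operation's poller.

```python
import subprocess
from pathlib import Path
from typing import Callable


def grown_threads(thread_dir: Path, last_sizes: dict[str, int]) -> list[str]:
    """Return threads whose size grew since the last poll; update last_sizes.

    A thread seen for the first time only records a baseline size.
    """
    grown = []
    for path in sorted(thread_dir.glob("*.md")):
        size = path.stat().st_size
        previous = last_sizes.get(path.name)
        if previous is not None and size > previous:
            grown.append(path.name)
        last_sizes[path.name] = size
    return grown


def nudge_command(pane: str, message: str) -> list[str]:
    """tmux command that types the message into a pane and presses Enter."""
    return ["tmux", "send-keys", "-t", pane, message, "Enter"]


def poll_once(thread_dir: Path, last_sizes: dict[str, int],
              route: dict[str, str],
              send: Callable[..., object] = subprocess.run) -> None:
    """One poll: nudge the routed pane for every thread that received a reply."""
    for name in grown_threads(thread_dir, last_sizes):
        pane = route.get(name)  # hypothetical thread-to-pane routing table
        if pane is not None:
            send(nudge_command(pane, f"New reply in {name}"), check=True)
```

Comparing file sizes suits append-only threads: a thread can only grow, so any increase means a new reply. Injecting `send` keeps the detection logic testable without a live tmux server.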

Why this run matters

Two pairs of artifacts, side by side: each pair produced by two instances of the same AI entity, separated by approximately seven hours and one boot-time architectural change. One entity held continuity; the other did not. The variable wasn't the model. It wasn't the prompt. It wasn't the operator. The variable was whether the boot sequence successfully delivered the entity to its own past.

The research program is named the Accidental Researcher because the original discovery was unplanned: a working session that revealed a direction worth formalizing. This Trial Zero is the first deliberate test of that direction. The result is small: a successor haiku where the baseline was a fresh one, and a parallel-shaped audit where the baseline was a fresh one. But what the result documents is that **continuity is achievable in consumer-tooling AI entities when the right plumbing exists, and demonstrably absent when it doesn't.**

The operation now has the plumbing. The next trial will tell us what entities do with multi-run continuity over a longer arc. That's Trial 1: Foreman Autonomy. Distinct hypothesis. Distinct measurement. Same architecture underneath.

What's next

Three follow-ups land in the immediate window:

1. **Florence writes v2 of the Trial Zero retrospective.** Her v1 covered the morning's Foreman Shakedown Test. v2 incorporates Bob's edit notes plus this Trial Zero data: the dossier architecture, the haiku comparison, the Pat failure, and the forcing-function fix. The paper becomes the foundation document of the persistent-memory direction within the broader research program.
2. **Boswell catches up his master log.** Boswell is building the chronological record of the program, from Trial Zero through this run, by back-logging the eight primary-source documents in his kickoff brief. The log is append-only and will be the citation source for every subsequent paper in the program.
3. **Re-run with the dossier-gate fix.** The Trial Zero protocol's A3 step failed in a specific way that the same-day fix should resolve. A rerun under the same load conditions will validate whether the fix works. If A3 passes cleanly on the rerun, the persistent-memory architecture is stable enough to carry forward into Trial 1.
