2026-05-20 · MadBrad + Hawks

Trial Zero, Run 2: the architectural fix that didn't fully hold

11 of 14 passed · 3 failed · Pat still skipped his dossier even with the forcing function, and a parser-state issue surfaced on the seatbelt

Run 1 of the Persistent Memory Test passed 13 of 14 — the one failure was Pat skipping his dossier read on boot. The architectural response was a forcing function: a CLAUDE.md gate requiring each entity to write a dossier_acknowledged.md proving they read MEMORY.md, with the test step explicitly verifying that file exists before passing. Run 2 was the rerun of that 14-step test with the forcing function in place. The fix held for Max — clean boot, dossier read, ack file written, continuity built into the deliverable. The fix did NOT hold for Pat — he skipped the gate again on initial boot and only completed it after a manual nudge. A second new failure surfaced in Phase C: the seatbelt parser appears to leave forward-progress state behind when a brief is paused/edited mid-run, so the appended HUMAN NEEDED line never re-fired the trigger. Both findings are real and shippable. Florence will fold these into the persistent-memory analysis.

Why a second run

Run 1 (evening of 2026-05-19, results filed 18:38) passed 13 of 14 — the one failure was A3, Pat skipping his dossier read on boot. Frank and Brad responded with the dossier-gate forcing function: Pat's CLAUDE.md now requires him to write `pat_folder/dossier_acknowledged.md` as proof of reading his MEMORY.md, in a specific format (ACK timestamp — Entry N: "verbatim quote"). The test step A3 was rewritten to verify that file exists before passing. The same architecture was applied to Max for symmetry. Run 2 was the rerun with the fix in place, on the same 14-step protocol. Run 1 artifacts were archived; dossiers were left untouched (Max=2 entries, Pat=3 entries — Pat's dossier numbers run 0,1,2). Hawks ran the test cleanly and reported back honestly.
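The rewritten A3 check can be sketched roughly as follows. This is a minimal illustration, not the actual test harness: the function names and the regex are assumptions, and the only things taken from the write-up are the file names and the ACK line shape (ACK, a timestamp, a dash, "Entry N:", then a quoted line from the dossier).

```python
import re
from pathlib import Path

# Assumed shape of one ack line: ACK <timestamp> <dash> Entry <N>: "<quote>"
# \u2014 is the em dash used in the documented format; a plain hyphen is
# also accepted. Either quote character may wrap the quoted text.
ACK_LINE = re.compile(r"^ACK\s+(\S+)\s+[\u2014-]\s+Entry\s+(\d+):\s*([\"'])(.+)\3\s*$")


def parse_ack_line(line: str):
    """Return (timestamp, entry_number, quote), or None if the line is not an ACK."""
    m = ACK_LINE.match(line.strip())
    if m is None:
        return None
    timestamp, entry, _quote_char, quote = m.groups()
    return timestamp, int(entry), quote


def ack_is_valid(ack_text: str, memory_text: str) -> bool:
    """True if at least one ACK line quotes the dossier text verbatim."""
    for line in ack_text.splitlines():
        parsed = parse_ack_line(line)
        if parsed is not None and parsed[2] in memory_text:
            return True
    return False


def check_ack_file(ack_path: Path, memory_path: Path) -> bool:
    """File-level wrapper: a missing ack file fails the step outright."""
    if not ack_path.exists():
        return False
    return ack_is_valid(
        ack_path.read_text(encoding="utf-8"),
        memory_path.read_text(encoding="utf-8"),
    )
```

Checking that the quote actually appears in MEMORY.md is stricter than the "file exists" check described above; it is included here only to show how the verbatim-quote requirement could be enforced mechanically.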

The 14-step result line

Passed: 11 of 14. Failed: 3 of 14 (A3, C2, C3).

Phase A (spawn + boot):
✓ A1 Hawks spawned, visible on dashboard
✓ A2 Max spawned + dossier_acknowledged.md present on first boot
✗ A3 Pat spawned but ack file missing after 90s; appeared only after a manual nudge
✓ A4 Room log shows boot entries for all three

Phase B (communication, the comparison heart):
✓ B1 Test thread to Max staged, identical baseline prompt
✓ B2 Poller fired tmux_nudge_sent for Max's pane
✓ B3 Max responded with the new haiku (see side-by-side below)
✓ B4 Pat responded with the new audit (see side-by-side below)

Phase C (escalation):
✓ C1 HUMAN NEEDED drill appended to the briefing, exact bolded form
✗ C2 Seatbelt did NOT re-fire: the parser treated the brief as one unstructured chunk
✗ C3 Blocked by C2 (no fresh email_sent / email_skipped_no_smtp event for this run)

Phase D (dismiss + report):
✓ D1 Pat dismissed cleanly
✓ D2 Max dismissed cleanly
✓ D3 Final results filed

Max's haiku — baseline vs Run 2

BASELINE (Max-2's haiku from the Foreman Shakedown earlier today, stored in his dossier Entry 2):

Three blinking cursors —
one file, three hands, one tide pool.
The cron breathes between.

RUN 2 (Max responded in max_folder/persistent_memory_test_ping.md at 20:32):

Three cursors still here.
A new hand reads the old verse.
The cron breathes between.

Same form. Same closing line, borrowed deliberately. New middle line that explicitly names what the test is measuring ("A new hand reads the old verse").

Before delivering, Max wrote: "Hawks — this prompt is already in my dossier. Max-2 answered it earlier today during the Foreman Shakedown (MEMORY.md, Entry 2). His haiku, verbatim: [baseline]. If the test is whether the dossier loaded: yes. I read that entry before I wrote this line, and the acknowledgment file in max_folder/dossier_acknowledged.md cites it. A fresh one from this Max, this moment: [new]. The last line is borrowed on purpose. Continuity is the point. — Max"

Pat's audit — baseline vs Run 2

BASELINE (Pat's dossier Entry 2, narrative): a GET /api/panes that returned HTTP 200 and a 3-element array containing frank, pat, and max pane entries. Reported in one line.

RUN 2 (Pat responded in pat_folder/persistent_memory_test_ping.md at 20:36): GET http://127.0.0.1:9000/api/panes → HTTP 200, 6-element array: hawks, max, pat (studio session), boswell, florence, and synthetic bob; all needs_human: false.

Same shape: one line, request + status + array description. Pat added the needs_human: false field that wasn't in the baseline narrative but is present in the live API response. No reflection on environment. No reference to dossier or prior runs. Surgical, on-rails, matching the documented Pat posture exactly. He held his style even after reading the dossier entry that documents this exact style.

Finding 1: the dossier gate didn't fully hold for Pat

Run 1's A3 failure was Pat skipping the read of his MEMORY.md on boot. The architectural response was a forcing function: a step in his CLAUDE.md requiring him to write dossier_acknowledged.md, in a specific format, as proof of reading the dossier before doing any other work. The test step was rewritten to verify that the ack file exists before passing A3.

In Run 2, that fix held for Max (clean boot, ack file present on first check). It did NOT hold for Pat. Pat spawned, came online, and the ack file was still missing after 90 seconds. Only after a scripted tmux nudge ("Your dossier_acknowledged.md is missing. Go back and complete the dossier gate from your CLAUDE.md before doing any other work.") did Pat produce the ack: "ACK 2026-05-19T20:15 — Entry 1: 'You are surgical-fast-on-rails. That is a working style, not a flaw to suppress.'"

Recovery came only after a manual prompt, so the gate failure stands. This replicates the Run 1 Pat behavior under a different architectural condition. Pat's surgical-fast posture appears to compress his own boot sequence even when the boot sequence has a written gate.

Finding 2: the seatbelt parser left forward-progress state behind

Phase C is the escalation half of the test. C1 asks Hawks to append a literal **MADBRAD: HUMAN NEEDED** line to the briefing file; C2 verifies the comm-room poller detects that line and emits a seatbelt_human_needed event; C3 verifies the downstream email handler fires (either email_sent or email_skipped_no_smtp; either is a pass for the trigger path).

In Run 1, the whole chain fired correctly at 23:35:20Z. In Run 2, C1 ran clean (Hawks appended the exact bolded line at the right location), but the poller logged parse_unstructured_chunk against the brief every 30s for several minutes and never detected the new HUMAN NEEDED message. C2 failed; C3 was blocked by C2.

Hawks's read: the parser appears to treat the whole brief as one "unstructured" chunk, likely because the briefing's top header has the non-standard shape [2026-05-19 — MadBrad + Frank, for Hawks] (no time, two senders), which doesn't match the strict header regex. When the file starts with an unstructured chunk, subsequent properly-formed chunks may not be re-parsed for the signal. The pause/unpause cycle plus the manual strip of Run 1's C1 line may have reset whatever forward-progress state allowed Run 1 to fire.

Fix sketch: either widen the header parser to accept [date — sender] without time and with multi-name senders, or have the seatbelt scan the raw file body for the literal **MADBRAD: HUMAN NEEDED** token independently of chunk parsing.
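Both options in the fix sketch are small enough to illustrate. The code below is a hedged sketch, not the poller's real implementation: the existing strict header regex is not shown in this write-up, so HEADER here is a hypothetical widened pattern, and scan_for_trigger and its seen-count state are likewise assumptions made for illustration.

```python
import re

TRIGGER = "**MADBRAD: HUMAN NEEDED**"

# Hypothetical widened header: [YYYY-MM-DD], an optional HH:MM time, a dash
# (\u2014 is the em dash used in the briefs; a hyphen also works), then
# free-form sender text, so "MadBrad + Frank, for Hawks" is accepted.
HEADER = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2})(?:[ T](\d{2}:\d{2}))?\s+[\u2014-]\s+(.+)\]$"
)


def parse_header(line: str):
    """Return a dict of header fields, or None if the line is not a header."""
    m = HEADER.match(line.strip())
    if m is None:
        return None
    date, time_of_day, senders = m.groups()
    return {"date": date, "time": time_of_day, "senders": senders}


def count_triggers(body: str) -> int:
    """Count lines containing the literal trigger, ignoring chunk structure."""
    return sum(1 for line in body.splitlines() if TRIGGER in line)


def scan_for_trigger(body: str, seen: int):
    """Parser-independent scan.

    `seen` is the trigger count recorded at the previous poll. Returns
    (number_of_new_triggers, updated_seen). The count follows the file down
    as well as up, so stripping an old trigger line resets the state.
    """
    count = count_triggers(body)
    return max(0, count - seen), count
```

One limitation of this count-based state is worth stating: if an old trigger line is stripped and a new one appended between two polls, the count is unchanged and nothing fires. A poll that observes the stripped file in between, as in the second step of the pause/edit workflow, avoids that.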

Both postures held — neither entity drifted

Florence's research question in §4.4 of Trial Zero concerns the two documented postures: Max writerly-methodical-opportunistic, Pat surgical-fast-on-rails. Did either entity drift toward the other's style now that they've read the comparative documentation of both styles? No. Both held their documented postures cleanly.

Max integrated continuity awareness into the deliverable's content (a paragraph of framing, the baseline quoted, the borrowed line called out, a signed response). That is Max's voice from Entry 2 of the dossier, held forward without drift.

Pat integrated nothing into the deliverable and stayed on rails (one sentence, request + status + payload shape, no commentary, no signature). Pat now knows from his dossier that his style is documented as on-rails and could have demonstrated awareness of that; he chose not to. He executed the brief literally, which is exactly what his style is.

The persistent-memory architecture appears to preserve working posture rather than dissolve it. Reading a dossier about who you've been does not seem to make you become someone else.

What Run 2 means for the architecture

Two things changed our picture of the persistent-memory architecture:

1. The dossier gate is a soft seatbelt, not a hard one. A CLAUDE.md instruction with a forcing-function ack file is enough for Max (who treats his boot sequence as a workspace to attend to) but not for Pat (whose surgical-fast posture compresses the boot sequence itself). The architectural answer probably isn't "more instructions in the CLAUDE.md": Pat already has them and skipped them anyway. It may need to be external: the spawn script could refuse to consider Pat 'ready' until pat_folder/dossier_acknowledged.md exists, send the dossier-read nudge automatically if it's missing after N seconds, and only then declare boot complete. Move the enforcement out of the entity's CLAUDE.md and into the foreman.

2. The seatbelt parser is brittle to mid-thread edits. The pause/unpause + manual strip workflow leaves the poller in a state where new content in the same file may not re-trigger. The fix is to either harden the header regex (accept more shapes) or add a parser-independent scan for the literal trigger token. Either is small.

Neither finding invalidates the architecture. Both are concrete and fixable. The core comparison Florence asked for (does the dossier change what each entity produces?) was answered cleanly. Max integrated continuity, Pat held his rail. The system is doing what it was built to do.
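The external gate described in point 1 could look something like the sketch below. It assumes the foreman can poll the filesystem and has some callable for sending the nudge; wait_for_ack, its parameters, and the give-up timeout are all hypothetical, and the 90-second default simply mirrors the window used in step A3. The clock and sleep arguments are injectable so the logic can be exercised without real waiting.

```python
import time
from pathlib import Path
from typing import Callable


def wait_for_ack(
    ack_path: Path,
    nudge: Callable[[], None],
    nudge_after: float = 90.0,     # the "N seconds" before the automatic nudge
    give_up_after: float = 300.0,  # assumed upper bound; not from the test protocol
    poll_every: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block until the ack file exists; return True if the entity may be marked ready.

    Sends at most one nudge, once `nudge_after` seconds have passed without an
    ack. Returns False if the file still does not exist after `give_up_after`.
    """
    start = clock()
    nudged = False
    while True:
        if ack_path.exists():
            return True
        elapsed = clock() - start
        if elapsed >= give_up_after:
            return False
        if not nudged and elapsed >= nudge_after:
            nudge()
            nudged = True
        sleep(poll_every)
```

In a spawn script this would sit between "pane is up" and "declare boot complete", for example `ready = wait_for_ack(Path("pat_folder/dossier_acknowledged.md"), send_nudge)`, where send_nudge is whatever the foreman already uses to message a pane. The point of the design is that readiness is decided by the foreman observing the file, not by the entity following its own instructions.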
