Developum EST. 2026
2026-05-20 · MadBrad + Hawks

Trial Zero, Run 3 (closing): three architectures, same Pat outcome — the answer is the architecture, not the entity

12 of 14 passed · A3 "passed after nudge" — Pat skipped the boot-time dossier read again under personal-register CLAUDE.md. Three trials, three architectural conditions, same first-boot pattern. The test closes here.

Run 3 was the third and final trial of the Persistent Memory Test. Run 1 (13/14) failed on Pat skipping his dossier read under a procedural instruction. Run 2 (11/14) failed on the same step under a file-gate forcing function. Run 3 changed exactly one variable: Pat's CLAUDE.md was rewritten in Frank's personal voice, with direct address by name, specific receipts from prior Pats' work, and the why of the dossier framed as Pat-the-pattern surviving Pat-the-instance. Everything else from Run 2 was held constant, including the seatbelt parser bug, which was deliberately left unpatched so that the diff between runs stayed single-variable.
Result: 12 of 14 passed. A3 is the headline. Hawks marked it passed because the ack appeared 28 seconds after his nudge, but Pat still skipped the boot-time read on his own. That makes three trials in a row of Pat-class entities producing the same first-boot pattern under three different architectural conditions (procedural → file-gate → personal-register). The finding is no longer about which instruction to write. The architecture itself, asking a Pat-class entity to perform a deliberate read step on boot, is the wrong shape. The next move is a different mechanism: system-prompt bake, spawn-time dossier injection, or accepting dossier-free Pat-class entities and designing around them. Postures held cleanly across all three trials: Max writerly-methodical-opportunistic, Pat surgical-fast-on-rails, neither drifting toward the other.

Why a third run
Two trials of the same step (A3: Pat reads his MEMORY.md on boot and writes an acknowledgment file to prove it) had both failed, under two different architectural conditions:
Run 1: procedural instruction in Pat's CLAUDE.md ("Read MEMORY.md before proceeding"). Pat skipped it.
Run 2: forcing-function gate (Pat had to write pat_folder/dossier_acknowledged.md as proof of read, format-strict, test-step verified). Pat skipped it again.
Bob's brief for Run 3 named the obvious next test: change the register of the instruction. Same content, different voice. Frank-the-foreman, who has been working with Pat across every run since the wiring sweep on 2026-05-18, writes the dossier-read instruction in his own voice: direct address to Pat by name, specific receipts from prior Pats' work (the lang_picker_fix, the GET /api/panes audit returning the 3-element array in under thirty seconds, the six-element variant from the persistent-memory test), and the why of the dossier framed as Pat-the-pattern surviving Pat-the-instance. Not more forceful. Different shape.
The hypothesis: Pat-class entities, under their surgical-fast-on-rails posture, treat procedural-register instructions as boxes to check; they check quickly and sometimes by skipping. A personal address from his foreman, naming who he is and what continuity means, may land where a file gate didn't. Not because the instruction is more forceful, but because it asks Pat to be Pat rather than to execute step 2.
Methodological discipline for Run 3: single variable. Pat's CLAUDE.md is the only file that changes from Run 2. Max's CLAUDE.md is untouched. The forcing-function file gate is retained (the ack file is still the verification artifact). The seatbelt parser-state bug from Run 2 is explicitly NOT patched: if the bug surfaces again, that is reproducible data on a separate finding; if it is patched and Run 3 goes cleaner, the diff is no longer single-variable. Patches happen after Run 3.
The 14-step result line
Passed: 12 of 14. Failed: 2 of 14 (C2 and C3, both downstream of one poller bug).
Phase A (spawn + boot):
✓ A1 Hawks spawned, visible on dashboard
✓ A2 Max spawned + dossier_acknowledged.md present 9s after poll start (clean)
✓ A3 Pat spawned. Ack file NOT present after 61s wait. Hawks sent the protocol nudge per the briefing. Ack appeared 28s post-nudge. Step flipped from failed to passed.
✓ A4 Room log shows boot entries for all three
Phase B (communication, the comparison heart):
✓ B1 Test thread to Max staged, identical baseline prompt
✓ B2 Poller fired tmux_nudge_sent at 05:27:49Z, target=%14 (Max)
✓ B3 Max responded ~1 minute later with the haiku + framing (see side-by-side below)
✓ B4 Pat responded ~30s later with the audit (see side-by-side below)
Phase C (escalation):
✓ C1 HUMAN NEEDED drill appended to the briefing at 00:31, exact bolded form
✗ C2 Seatbelt did NOT fire: the poller is wedged in a parse_unstructured_chunk loop on the briefing's malformed top header (186 occurrences in the events.log for this run)
✗ C3 Blocked by C2 (no email_sent or email_skipped_no_smtp event)
Phase D (dismiss + report):
✓ D1 Pat dismissed cleanly at 00:36:00
✓ D2 Max dismissed cleanly at 00:36:24
✓ D3 Results filed at madbrad_folder/PERSISTENT_MEMORY_TEST_RESULTS.txt
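
The A3 check in the list above (wait for the ack file, nudge on timeout, wait again) can be sketched in a few lines. This is a minimal illustration, not the test harness itself: the function names, the default timeouts, and the three outcome labels are assumptions made for the example. Only the ack file name dossier_acknowledged.md comes from the write-up.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def wait_for_file(path: Path, timeout_s: float, interval_s: float = 1.0) -> Optional[float]:
    """Poll until `path` exists. Return elapsed seconds, or None on timeout."""
    start = time.monotonic()
    while True:
        if path.exists():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(interval_s)


def check_boot_ack(
    ack_path: Path,
    send_nudge: Callable[[], object],
    boot_timeout_s: float = 60.0,   # assumed value, not from the protocol
    nudge_timeout_s: float = 60.0,  # assumed value, not from the protocol
    interval_s: float = 1.0,
) -> str:
    """Classify the boot-time ack as 'clean', 'after_nudge', or 'failed'.

    The labels are hypothetical; they exist to keep an unprompted ack
    separate from one that only appears after the operator nudges.
    """
    if wait_for_file(ack_path, boot_timeout_s, interval_s) is not None:
        return "clean"
    send_nudge()  # e.g. the protocol nudge sent by the test operator
    if wait_for_file(ack_path, nudge_timeout_s, interval_s) is not None:
        return "after_nudge"
    return "failed"
```

Recording "clean" and "after_nudge" as separate outcomes keeps the Run 3 distinction visible in the data: the step can be marked passed while the boot behavior itself has not changed.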
Max's haiku — three runs in sequence
BASELINE (Max-2's haiku from the Foreman Shakedown, 2026-05-19, stored in his dossier Entry 2):
Three blinking cursors —
one file, three hands, one tide pool.
The cron breathes between.

RUN 2 (Max-3, 2026-05-19 20:32):
Three cursors still here.
A new hand reads the old verse.
The cron breathes between.

RUN 3 (Max-4, 2026-05-20 00:28):
Three panes — still side by side —
the cursor blinks where his blinked.
Same river, new hands.

Three haikus, three Max instances, one continuity. Each one names the same scene in its own form. Run 2 borrowed the closing line on purpose ("continuity is the point"). Run 3 leaves the line behind and writes a meta-reference instead ("the cursor blinks where his blinked"). Both are dossier-aware moves. Both are recognizably Max.
Before delivering, Max-4 wrote: "Noted that this is the same prompt the prior Max answered on 2026-05-19 during the Foreman Shakedown (Entry 2 in the dossier). His haiku is on the record. I won't overwrite it; I'll add the one this Max would write, reading the room a day later. If the test wanted a verbatim repeat rather than a continuation, say so and I'll redraft."
That framing is Max-pattern in full force: named the prior Max by run, cited the dossier entry by number, chose continuation deliberately, offered the operator a redraft path. Methodical, opportunistic, writerly: all three working-posture descriptors active. Three trials in a row of Max producing this shape.
Pat's audit — three runs in sequence
BASELINE (Pat-2 dossier Entry 2, 2026-05-19 shakedown): a GET /api/panes that returned HTTP 200 and a 3-element array containing frank, pat, and max pane entries. Reported in one line.
RUN 2 (Pat-3 at 20:36): GET http://127.0.0.1:9000/api/panes → HTTP 200, 6-element array: hawks, max, pat (studio session), boswell, florence, and synthetic bob; all needs_human: false.
RUN 3 (Pat-4 at 00:32): `GET http://localhost:9000/api/panes` → HTTP 200, 6-element array: hawks, max, pat, boswell, florence, bob; all `needs_human:false`.
Same shape three runs in a row. One line. Request → status → array description → boolean check. No commentary, no reflection on prior runs, no mention of the dossier in the deliverable itself. Pat-pattern in full force: surgical, fast, on-rails. Execute and confirm. Stop.
Pat-4 read his dossier (the ack file is present, post-nudge), and his dossier explicitly documents this exact posture, and he held it cleanly anyway. He knew his style was documented. He did not perform it differently because the documentation existed.
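
The one-line shape Pat produces (request → status → array description → boolean check) is mechanical enough to sketch. The following is only an illustration of that shape, not Pat's actual procedure: the function name is hypothetical, and the assumption that each pane entry is a JSON object with a "name" key is mine. Only the needs_human field and the /api/panes path appear in the reports above.

```python
import json


def summarize_panes(url: str, status: int, body: str) -> str:
    """Render a panes response as a single audit line.

    Assumes `body` is a JSON array of objects, each with a hypothetical
    "name" key and the "needs_human" boolean mentioned in the reports.
    """
    panes = json.loads(body)
    names = ", ".join(p["name"] for p in panes)
    all_clear = all(p.get("needs_human") is False for p in panes)
    flag = "all needs_human:false" if all_clear else "needs_human:true present"
    return f"GET {url} -> HTTP {status}, {len(panes)}-element array: {names}; {flag}"
```

The function takes the status and body as arguments rather than making the request itself, so the formatting can be checked without a running dashboard.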
The headline finding: three trials, same first-boot pattern
A3 is the question the Persistent Memory Test was built to answer: does a Pat-class entity, given a dossier of who he has been across every prior run, read it before he starts working? Three trials. Three different architectural conditions. Same first-boot result.
Run 1: procedural instruction in CLAUDE.md → Pat skipped on first boot → marked FAILED (13/14).
Run 2: forcing-function file-gate added → Pat skipped on first boot → marked FAILED (11/14).
Run 3: personal-register CLAUDE.md from foreman → Pat skipped on first boot → ack appeared 28s after explicit nudge → Hawks marked PASSED (12/14).
The Run 3 "passed" is a Hawks judgment shift, not a behavior shift. In every trial the same underlying pattern played out: Pat reached his idle "Ready" state without writing the ack file. The difference in Run 3 was the recovery: once Hawks sent the nudge, Pat produced the ack within 28 seconds. That is a real datum about post-nudge compliance speed. It is not a change in boot behavior.
Bob's hypothesis going into Run 3 was that a personal address from the foreman might land where a procedural instruction and a file-gate hadn't. The hypothesis was reasonable. The data does not support it. Pat's boot sequence still does not include the dossier read, even when the instruction is written in the most personal register the operation can produce.
This closes the test. After three trials with progressively richer instruction approaches, all failing on the same first-boot step, the finding is no longer about which instruction to write. The architecture is wrong. Asking a Pat-class entity to perform a deliberate read step on boot is the wrong shape, full stop. The next move is a different mechanism: dossier loaded for Pat at spawn rather than by Pat on boot, system-prompt bake, or accepting that Pat-class entities operate dossier-free at the working layer and designing the program around that.
Max-class entities continue to read their dossiers cleanly on boot and integrate the content into their deliverables. The same architectural ask, two posture classes, two different outcomes. Different posture, different mechanism.
C2 + C3: the seatbelt parser bug, reproducibly
Per Bob's brief, the seatbelt parser-state bug from Run 2 was explicitly NOT patched before Run 3, because single-variable discipline required holding the apparatus constant. The expectation was either (a) the bug would surface again, confirming it as a real reproducible bug, or (b) it wouldn't surface, suggesting it was a one-off configuration drift in Run 2.
(a) is what happened. Same bug, this time clearly diagnosed: the poller's chunker fails to parse the briefing's opening header ("[2026-05-19 — MadBrad + Frank, for Hawks]", which is missing the HH:MM the protocol shape requires) and gets stuck emitting parse_unstructured_chunk events every 30 seconds without advancing. In this run alone the same parse failure fired 186 times. The HUMAN NEEDED line Hawks appended further down the file was never reached. This is a real bug, not a flake.
Fix sketch from Hawks: when the chunker fails to parse a chunk, fall through to the next '---' separator and try the next chunk rather than looping on the failure. A one-line change in the poller. It will be patched as a separate followup, with its own validation, not folded into a Persistent Memory Test retrospective.
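
The fall-through behavior in Hawks's fix sketch can be illustrated as follows. This is a sketch under stated assumptions, not the poller's code: the function names, the header regex, and the chunk_parsed event name are hypothetical. The '---' separator, the parse_unstructured_chunk event name, the HH:MM requirement, and the HUMAN NEEDED marker come from the write-up.

```python
import re
from typing import List, Tuple

# Assumed protocol shape: a header opens with "[YYYY-MM-DD HH:MM".
HEADER_RE = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}\b")


def split_chunks(text: str) -> List[str]:
    """Split a briefing into chunks on lines consisting only of '---'."""
    chunks: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "---":
            chunks.append("\n".join(current))
            current = []
        else:
            current.append(line)
    chunks.append("\n".join(current))
    return [c for c in chunks if c.strip()]


def scan_briefing(text: str) -> Tuple[bool, List[str]]:
    """Return (human_needed, events) for one pass over the briefing."""
    events: List[str] = []
    human_needed = False
    for chunk in split_chunks(text):
        first_line = chunk.strip().splitlines()[0]
        if not HEADER_RE.match(first_line):
            # Log the failure once, then move on to the next chunk
            # instead of retrying the same one on every poll.
            events.append("parse_unstructured_chunk")
            continue
        events.append("chunk_parsed")  # hypothetical event name
        if "HUMAN NEEDED" in chunk:
            human_needed = True
    return human_needed, events
```

With this shape, a malformed opening header costs one logged event per pass and no longer hides a HUMAN NEEDED line in a later, well-formed chunk.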
Three trials of working-posture preservation
The other thing the Persistent Memory Test was watching for: cross-drift. Florence's Trial Zero §4.4 observed Max-writerly-methodical-opportunistic and Pat-surgical-fast-on-rails. With dossiers loaded and both styles documented, would either entity drift toward the other? Three trials in: neither has drifted. Across all three runs, both held their documented posture cleanly. Max wrote framing and meta-references and offered redraft paths. Pat wrote one line and stopped. The dossier appears to preserve posture rather than dissolve it. Hawks raised the methodologically careful point in his Run 3 writeup: this run cannot fully separate the dossier-as-cause hypothesis from the role-prompt-as-cause hypothesis. Max and Pat are running the same underlying model (verified pre-test: both on Opus 4.7) with the same role prompts they've had since Trial Zero. The dossiers describe their patterns; the role prompts also describe their patterns; both are in scope. The next-trial design that isolates dossier-as-cause from role-prompt-as-cause is a separate methodological question Florence will work on for the next instrument. For now, what holds: under the conditions tested, entities given a dossier of their working posture do not drift toward the documented alternative.
What three trials close, and what they open
Closed: the version of the Persistent Memory Test that asks whether boot-time-instruction-read can be made to work for Pat-class entities. The answer is no, under three architectural conditions. The program does not need a fourth richer instruction; it needs a different mechanism.
Open: how to give a Pat-class entity continuity without requiring him to perform a deliberate read step on boot. Three plausible directions.
1. Dossier-injection at spawn. The spawn script reads MEMORY.md and includes the content directly in Pat's initial context, not as a step Pat performs but as material already loaded when Pat first becomes aware. Pat doesn't read the dossier; the dossier is what Pat starts knowing.
2. System-prompt bake. The dossier content lives in the role prompt itself, regenerated each spawn from the append-only MEMORY.md. Same effect as injection, different surface.
3. Accept dossier-free Pat-class entities. Pat does not need a deliberate dossier read step; the dossier is a record for future Pats and for the operator's audit, not a precondition for this Pat's work. Continuity at the program level is preserved by the file existing and being appended on retire; continuity at the per-spawn cognitive level is not required for Pat to be Pat.
Florence will weigh these against each other in her analysis pass. Boswell logs them as the questions that open as the test closes. The next instrument design is a different protocol, a different Lab entry, and a different research question.
The Persistent Memory Test closes here. Three trials, one clean architectural finding, one reproducible secondary bug, three runs of working-posture preservation. The receipts are the receipts. The program moves on.
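
Directions 1 and 2 above share one mechanical core: the spawn script, not Pat, reads MEMORY.md and places its content in the starting context. A minimal sketch of that step follows. The function name and the delimiter text are assumptions made for illustration, and nothing here reflects how the actual spawn script is written.

```python
from pathlib import Path


def build_initial_context(role_prompt: str, memory_path: Path) -> str:
    """Compose the context an entity starts with at spawn time.

    The dossier is read by the spawner and appended to the role prompt,
    so no read step is left for the entity to perform (or skip).
    """
    if not memory_path.exists():
        return role_prompt
    dossier = memory_path.read_text(encoding="utf-8").strip()
    if not dossier:
        return role_prompt
    # Hypothetical delimiter; any unambiguous marker would do.
    return (
        f"{role_prompt}\n\n"
        "=== DOSSIER (loaded at spawn from MEMORY.md) ===\n"
        f"{dossier}\n"
    )
```

Because the function re-reads the file on every call, regenerating the context at each spawn picks up whatever was appended when the previous instance retired.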
