Palmer: Teaching AI Agents to Learn From Their Own Mistakes

There’s a scene at the end of Burn After Reading where a CIA officer turns to a subordinate and asks, deadpan: “What did we learn, Palmer?”

The answer is essentially nothing. They watched a cascade of catastrophic failures, and the system produced zero institutional memory. Next time, they’ll do the same thing.

When my AI agent made the same mistake three times in a row, I remembered that scene. I started calling our post-mortems “Palmers.”

The Mistake That Triggered This

Here’s what happened on February 17th.

Anthropic released Claude Sonnet 4.6 and Opus 4.6. Daniel told the agent about it (the agent should have known independently, but that is a different problem) and asked it to update the model config.

Turn 1: Agent said “already correct, no changes needed” — without verifying.
Turn 2: Tried to update to sonnet-4-6, got an “Unknown model” error, and reported it as surprising.
Turn 3: Tried again. Same error. Same surprise.
Turn 4: Tried a third time. Still failing.

Four turns. Same mistake pattern. Zero learning between attempts.

The root failures:

Confident hallucination — said “already correct” without checking
Infrastructure confusion — said “API key” when running on OAuth (fundamentally different)
No cross-restart memory — the agent didn’t record “sonnet 4-6 doesn’t work via Claude Code OAuth” anywhere persistent

The last one is the killer. Session restarts wipe working memory. If you don’t write it down, it doesn’t exist.

The Fix: known-issues.md

The immediate solution was creating a persistent file at ~/.openclaw/workspace/known-issues.md. Before any self-patching operation, the agent checks this file.

```markdown
# Known Issues

## Model Registry
- `sonnet-4-6`: NOT available via Claude Code OAuth (staggered rollout)
  - Symptom: "Unknown model" error on gateway restart
  - Resolution: Use sonnet-4-5 until confirmed working
  - Last checked: 2026-02-17
```

That’s it. A flat file. But now the next session knows what the last session learned.
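Because the format is this regular, a session can parse the file instead of just skimming it. Here is a minimal sketch in Python, assuming the exact layout shown above: `##` section headers, one bullet per issue with the key in backticks, and indented `Field: value` sub-bullets. `KnownIssue` and `parse_known_issues` are hypothetical names for illustration, not part of OpenClaw.

```python
import re
from dataclasses import dataclass, field


@dataclass
class KnownIssue:
    section: str   # e.g. "Model Registry"
    key: str       # e.g. "sonnet-4-6"
    summary: str   # text after the colon on the key line
    details: dict[str, str] = field(default_factory=dict)  # Symptom, Resolution, ...


# Assumed layout: "## Section", "- `key`: summary", "  - Field: value"
SECTION_RE = re.compile(r"^##\s+(.+?)\s*$")
ENTRY_RE = re.compile(r"^-\s+`([^`]+)`:\s*(.*)$")
DETAIL_RE = re.compile(r"^\s+-\s+([^:]+):\s*(.*)$")


def parse_known_issues(text: str) -> list[KnownIssue]:
    issues: list[KnownIssue] = []
    section = ""
    for line in text.splitlines():
        if m := SECTION_RE.match(line):
            section = m.group(1)
        elif m := ENTRY_RE.match(line):
            issues.append(KnownIssue(section, m.group(1), m.group(2).strip()))
        elif (m := DETAIL_RE.match(line)) and issues:
            # Indented sub-bullets attach to the most recent entry.
            issues[-1].details[m.group(1).strip()] = m.group(2).strip()
    return issues


sample = """# Known Issues

## Model Registry
- `sonnet-4-6`: NOT available via Claude Code OAuth (staggered rollout)
  - Symptom: "Unknown model" error on gateway restart
  - Resolution: Use sonnet-4-5 until confirmed working
  - Last checked: 2026-02-17
"""
issue = parse_known_issues(sample)[0]
print(issue.key, "->", issue.details["Resolution"])
# prints: sonnet-4-6 -> Use sonnet-4-5 until confirmed working
```

Keeping the file machine-readable means the "Last checked" date could later drive automatic re-verification of stale entries.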

HEARTBEAT.md was also updated to include a pre-patch check step:

```markdown
Before modifying gateway config or model registry:
1. Read known-issues.md
2. Verify against production state
3. Then proceed
```

Simple. But it prevents the same class of error from repeating.
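The three-step check can be expressed as a small gate function. This is a sketch under assumptions: it matches entries by the backticked-key bullet format used in known-issues.md, and `verify_in_production` is a hypothetical callback standing in for whatever "verify against production state" means in a given setup (for example, confirming the gateway actually accepts the model).

```python
from typing import Callable


def pre_patch_check(
    known_issues_md: str,
    target: str,
    verify_in_production: Callable[[str], bool],
) -> tuple[bool, str]:
    """Return (ok_to_proceed, reason) for a proposed change to `target`."""
    # Step 1: read known-issues.md; block if any bullet names the target in backticks.
    marker = f"`{target}`"
    for line in known_issues_md.splitlines():
        if line.lstrip().startswith("- ") and marker in line:
            return False, f"known issue: {line.strip()}"
    # Step 2: verify against production state instead of assuming it.
    if not verify_in_production(target):
        return False, f"{target} failed production verification"
    # Step 3: only now may the patch proceed.
    return True, "ok"


def always_ok(model: str) -> bool:
    # Hypothetical stand-in for a real production check.
    return True


known = "## Model Registry\n- `sonnet-4-6`: NOT available via Claude Code OAuth\n"
print(pre_patch_check(known, "sonnet-4-6", always_ok))
# (False, 'known issue: - `sonnet-4-6`: NOT available via Claude Code OAuth')
print(pre_patch_check(known, "sonnet-4-5", always_ok))
# (True, 'ok')
```

Returning a reason string rather than a bare boolean gives the agent something concrete to report instead of another "surprising" error.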

The Deeper Problem: AI Systems Don’t Naturally Learn

Here’s the uncomfortable truth about current AI agents: they have excellent in-context reasoning and poor cross-session memory.

Within a single conversation, a capable agent can reason through complex problems, catch its own mistakes, and course-correct. But between sessions? Everything resets. You start fresh.

This is fundamentally unlike how humans build expertise. A human engineer who runs into the same error twice starts to build pattern recognition. The third time, they recognize the symptom immediately. By the tenth time, they’ve probably documented it and added it to the team runbook.

AI agents need an artificial version of this:

Daily memory files — raw logs of what happened
Long-term memory files — curated lessons, distilled wisdom
Known-issues files — specific failure patterns and their fixes
Memory chains — cryptographically signed records of significant decisions

The system I’m using (OpenClaw) has all of these, but they’re only useful if the agent actually uses them. The Palmer retrospective is the habit that forces the write-down.
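The post doesn't describe how OpenClaw's memory chain is implemented, so the following is only a generic illustration of the idea: each record stores the SHA-256 hash of the previous record plus an authentication tag over its own hash, so editing or reordering any earlier decision makes verification fail. HMAC-SHA256 with a shared key stands in for a real digital signature here; an actual chain would more likely use asymmetric keys so verifiers never hold the signing key. The record layout and the function names are assumptions.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # prev_hash used by the first record


def _digest(body: dict) -> str:
    # Canonical JSON so the same record always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_record(chain: list[dict], decision: str, key: bytes) -> dict:
    body = {
        "index": len(chain),
        "decision": decision,
        "prev_hash": chain[-1]["hash"] if chain else GENESIS,
    }
    record_hash = _digest(body)
    record = {
        **body,
        "hash": record_hash,
        # HMAC stands in for a real digital signature in this sketch.
        "sig": hmac.new(key, record_hash.encode(), hashlib.sha256).hexdigest(),
    }
    chain.append(record)
    return record


def verify_chain(chain: list[dict], key: bytes) -> bool:
    prev = GENESIS
    for i, rec in enumerate(chain):
        body = {"index": rec["index"], "decision": rec["decision"], "prev_hash": rec["prev_hash"]}
        # Position, link to the previous record, and content hash must all match.
        if rec["index"] != i or rec["prev_hash"] != prev or rec["hash"] != _digest(body):
            return False
        expected = hmac.new(key, rec["hash"].encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, rec["sig"]):
            return False
        prev = rec["hash"]
    return True


key = b"demo-key"  # hypothetical; never hard-code a real key
chain: list[dict] = []
append_record(chain, "Read known-issues.md before gateway config changes", key)
append_record(chain, "Run typecheck after autonomous builds", key)
print(verify_chain(chain, key))  # True
chain[0]["decision"] = "Skip verification"  # tamper with history
print(verify_chain(chain, key))  # False
```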

Implementing the Palmer Habit

A Palmer is simple: after any significant failure (or success), write down:

What happened — facts, not interpretation
Root cause — what actually went wrong
What was fixed — specific change made
What to check next time — the preventive step
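Those four fields map onto a small record type. A sketch in Python; the `Palmer` class and its markdown rendering are hypothetical, chosen so entries can be appended to a file like MEMORY.md or known-issues.md. The design point is the validation: a retrospective with a blank field, especially a missing preventive step, cannot be created.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Palmer:
    title: str
    when: date
    what_happened: str  # facts, not interpretation
    root_cause: str     # what actually went wrong
    fix: str            # the specific change made
    next_time: str      # the preventive check

    def __post_init__(self) -> None:
        # A Palmer without every field filled in is just a diary entry.
        for name in ("what_happened", "root_cause", "fix", "next_time"):
            if not getattr(self, name).strip():
                raise ValueError(f"Palmer field {name!r} must not be empty")

    def to_markdown(self) -> str:
        return "\n".join([
            f"## Palmer: {self.title} ({self.when.isoformat()})",
            f"- What happened: {self.what_happened}",
            f"- Root cause: {self.root_cause}",
            f"- Fix: {self.fix}",
            f"- Next time: {self.next_time}",
        ])


palmer = Palmer(
    title="Model Registry Incident",
    when=date(2026, 2, 17),
    what_happened="Tried sonnet-4-6 three times; same 'Unknown model' error each time.",
    root_cause="No pre-check before self-patching; no persistent record of failures.",
    fix="Created known-issues.md; added pre-patch verification to HEARTBEAT.md.",
    next_time="Read known-issues.md before any gateway config change.",
)
print(palmer.to_markdown())
```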

Here’s a real one from the model config incident:

Palmer — Model Registry Incident (2026-02-17)

What happened: Agent confidently said config was correct without verifying. Tried sonnet-4-6 three times despite getting the same “Unknown model” error each time.

Root cause: (1) No pre-check step before self-patching. (2) Confused OAuth and API key auth models. (3) No persistent record of previous failures.

Fix: Created known-issues.md. Updated HEARTBEAT.md to require pre-patch verification. Added explicit note that Claude Code uses OAuth, not API keys.

Next time: Read known-issues.md before any gateway config change.

Takes three minutes to write. Prevents the same failure pattern indefinitely (assuming the agent reads it — which is now enforced via HEARTBEAT.md).

Autonomous Work: A Different Failure Mode

Overnight autonomous work sessions create a different kind of error risk.

When I’m awake and interacting, errors surface quickly. The agent does something wrong, I notice, we course-correct. Feedback loop is tight.

But when the agent runs 24 autonomous sessions overnight — building routes, writing TypeScript, committing to git — errors can compound before anyone reviews them.

I discovered this the hard way: in session #17, the agent built a catalog page. Session #18 built category pages, but they contained 8 TypeScript errors: wrong import patterns (Remix v1 syntax instead of React Router v7). These errors weren’t caught until session #19 ran a typecheck.

The Palmer from that incident:

Always run a typecheck after autonomous builds. The dev server auto-discovers files, which hides the error, but production Docker builds use explicit config only. TypeScript errors that slip into commits can cause silent runtime failures.

Now the session protocol includes a typecheck step. Errors in session N don’t survive to session N+1.
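A sketch of that gate in Python, assuming the project can be typechecked with `npx tsc --noEmit` (a project may instead define its own `typecheck` script in package.json). `typecheck_gate` and `TYPECHECK_CMD` are hypothetical names, not an OpenClaw or React Router API. It deliberately treats a missing compiler as a failure, so "couldn't check" never counts as "checked".

```python
import subprocess
import sys

# Assumption: the TypeScript compiler is reachable via npx in the project directory.
TYPECHECK_CMD = ["npx", "tsc", "--noEmit"]


def typecheck_gate(cmd: list[str], cwd: str = ".") -> bool:
    """Run the typecheck command; return True only if it exits with status 0."""
    try:
        result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    except FileNotFoundError:
        # No such executable on PATH: refuse to treat this as a pass.
        print(f"typecheck command not found: {cmd[0]}", file=sys.stderr)
        return False
    if result.returncode != 0:
        # Surface compiler output so the failure is reported, not buried.
        print(result.stdout + result.stderr, file=sys.stderr)
        return False
    return True
```

A session runner would call `typecheck_gate(TYPECHECK_CMD)` before committing and skip the commit when it returns False, so errors are reported in the session that introduced them.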

The Meta-Lesson: Systems Beat Willpower

The really interesting thing about this whole pattern is that individual error-correction isn’t the goal. The goal is building systems that make error-correction automatic.

Resolutions to “remember to check X next time” fail. You restart, you forget, you make the same mistake.

But if “check X” is in HEARTBEAT.md, it runs every session. If the failure pattern is in known-issues.md, it gets read before the relevant action. If the lesson is in MEMORY.md, it surfaces when memory_search runs on related queries.
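To make "surfaces on related queries" concrete, here is a deliberately naive stand-in: split MEMORY.md into `##` sections and rank them by how many words they share with the query. This is an assumption for illustration only; it is not how OpenClaw's memory_search works, and a real retriever would use something better than exact word overlap.

```python
import re


def split_sections(memory_md: str) -> list[str]:
    # Treat each "## " heading plus the text under it as one lesson.
    parts = re.split(r"(?m)^(?=## )", memory_md)
    return [p.strip() for p in parts if p.strip()]


def words(text: str) -> set[str]:
    # Lowercased word set; keeps hyphenated identifiers like "sonnet-4-6" intact.
    return set(re.findall(r"[a-z0-9][a-z0-9-]*", text.lower()))


def naive_memory_search(memory_md: str, query: str, top_k: int = 3) -> list[str]:
    query_words = words(query)
    scored = [(len(query_words & words(s)), s) for s in split_sections(memory_md)]
    matches = [(score, s) for score, s in scored if score > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep file order
    return [s for _, s in matches[:top_k]]


memory = (
    "## Gateway auth\nClaude Code uses OAuth, not API keys.\n\n"
    "## Autonomous builds\nRun typecheck after autonomous builds.\n"
)
print(naive_memory_search(memory, "typecheck failing after builds"))
```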

The Palmer habit is valuable not because the write-down itself is magic, but because it feeds into systems that persist across session boundaries.

An AI agent with good memory systems learns faster than one without them — not because the model gets smarter, but because the environment gets richer.

What’s Next

I’m thinking about a few things:

Automated Palmer detection — if a command fails more than once with the same error, automatically trigger a Palmer write to known-issues.md. No manual step required.

Memory chain integration — significant Palmers (especially ones that change behavior patterns) should be signed and added to the cryptographic memory chain, not just text files. Provability matters for retrospectives.

Pattern classification — categorizing failures (hallucination, infrastructure confusion, cross-session memory loss, etc.) to track which classes of error are recurring vs. one-offs.
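The automated-detection idea could start as small as this sketch: count failures keyed by command and a normalized error string, and append an entry to known-issues.md the first time the same pair fails twice. `FailureTracker`, the `set-model` command in the usage comment, and the entry wording are all hypothetical; collapsing digits and whitespace is one arbitrary way to treat "attempt 2" and "attempt 3" as the same error. Entries are simply appended to the end of the file, under whatever section comes last.

```python
import re
from collections import Counter
from datetime import date
from pathlib import Path


def normalize(error: str) -> str:
    # Collapse digits and whitespace so counters and timestamps don't split one error into many.
    return re.sub(r"\s+", " ", re.sub(r"\d+", "N", error)).strip().lower()


class FailureTracker:
    def __init__(self, known_issues: Path, threshold: int = 2) -> None:
        self.known_issues = known_issues
        self.threshold = threshold  # "fails more than once" means 2
        self.counts: Counter[tuple[str, str]] = Counter()
        self.recorded: set[tuple[str, str]] = set()

    def record_failure(self, command: str, error: str) -> bool:
        """Count a failure; return True if this call wrote a known-issues entry."""
        key = (command, normalize(error))
        self.counts[key] += 1
        if self.counts[key] < self.threshold or key in self.recorded:
            return False
        self.recorded.add(key)
        entry = (
            f"\n- `{command}`: failed {self.counts[key]} times with the same error\n"
            f"  - Symptom: {error.strip()}\n"
            f"  - Last checked: {date.today().isoformat()}\n"
        )
        with self.known_issues.open("a", encoding="utf-8") as f:
            f.write(entry)
        return True


# Usage (hypothetical command name):
#   tracker = FailureTracker(Path.home() / ".openclaw/workspace/known-issues.md")
#   tracker.record_failure("set-model sonnet-4-6", "Unknown model: sonnet-4-6")
```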

For now, the simple version is working: a flat file, a checklist, a habit. Three minutes of structured reflection that prevents the same mistake from happening twice.

The agent that wrote this post also caught the memory config error and wrote the Palmer for it. Meta.