Memory is the hard part
Most agent setups forget everything between sessions. Here's what a working memory system actually looks like.
If you're running AI agents for anything beyond one-shot tasks, you've probably noticed the same thing: they forget everything between sessions.
The LinkedIn agent doesn't know what it posted yesterday. The security agent doesn't remember which servers it already patched. The publishing agent has no idea which drafts are staged and which were rejected. Every session starts from zero. Every session wastes its first few minutes rediscovering context that should have been obvious.
Memory is what turns a chatbot into something useful. Almost nobody builds it right.
What memory actually means in an agent system
When people say "memory" in the context of AI agents, they usually mean one of three things, and they're often conflating all three.
Conversation history is what ChatGPT gives you. The model remembers what you said earlier in the thread. Cheapest form of memory, least useful for real work. It dies when the session ends. It fills the context window. It can't be shared across agents.
Retrieved context is RAG. Embed documents, store them in a vector database, retrieve relevant chunks at query time. Works for knowledge bases. Terrible for operational memory because relevance scoring doesn't understand time, recency, or task state. A vector search for "what's the current deploy status" might return a doc from three months ago because the words match.
Persistent state is what agents actually need. Not "what documents are similar to this query" but "what happened yesterday, what's in progress right now, and what decisions were already made." Most setups skip this entirely.
My system uses all three, but the third one does the heavy lifting.
The MEMORY.md pattern
The pattern that works: give every agent a MEMORY.md file. Plain markdown. Human-readable. Version-controlled by nature because it lives in a git repo.
A LinkedIn agent's MEMORY.md tracks voice guidelines, which posts performed well, which topics to avoid, and the current posting schedule. A security agent's MEMORY.md tracks which servers have been audited, what vulnerabilities were found, and which patches are pending.
Not a database. A text file the agent reads at session start and updates when something important changes. The format is just markdown:
```markdown
# MEMORY.md - Agent Name

## Role
What this agent does.

## Current State
What's in progress right now.

## Decisions Made
Things that were decided and shouldn't be relitigated.

## Preferences
How the human wants things done.
```

Why markdown instead of a database?
Debuggability. When an agent does something wrong, open the MEMORY.md and read it. You can see exactly what the agent knew. No query logs, no embedding inspection, no "why did the retrieval return that chunk?" detective work. Just text.
Editability. Open the file and change it. If an agent has a wrong assumption baked into memory, fix it in thirty seconds. Try doing that with a vector store.
Portability. Every agent's memory is a text file in a git repo. You can grep across all agent memories in one command. Diff changes over time. Copy one agent's memory pattern to bootstrap a new agent.
Semantic search on top of flat files
MEMORY.md handles persistent state. But agents also need to search their memory when a question comes up that isn't covered by the top-level file.
Agents can also have a `memory/` directory with dated entries and topic-specific files. When an agent gets a question about something from last Tuesday, it runs a semantic search across MEMORY.md and everything in `memory/`. The search uses embeddings (text-embedding-3-small works fine) and returns the top snippets with file paths and line numbers.
The flow: agent gets a question, runs `memory_search`, gets back relevant snippets with citations, pulls the specific lines it needs with `memory_get`, and answers with full context.
This is where the "retrieved context" layer comes in, but it's searching the agent's own operational history, not a generic document corpus. A RAG system pointed at your company wiki retrieves information. This retrieves experience.
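A minimal sketch of that search flow, assuming files are chunked by paragraph and scored with cosine similarity over embeddings. The `embed` function here is a toy bag-of-words stand-in for a real embedding API call (such as text-embedding-3-small); the tuple shape returned by `memory_search` is illustrative, not a fixed interface:

```python
import zlib
from math import sqrt
from pathlib import Path

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding call (e.g. text-embedding-3-small):
    # a crude bag-of-words vector hashed into 256 buckets.
    vec = [0.0] * 256
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % 256] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def memory_search(query: str, memory_dir: str, top_k: int = 3):
    """Return top-k (score, path, start_line, end_line, snippet) tuples."""
    qvec = embed(query)
    hits = []
    for path in Path(memory_dir).rglob("*.md"):
        lines = path.read_text().splitlines()
        chunk, start = [], 0
        for i, line in enumerate(lines + [""]):  # sentinel flushes last chunk
            if line.strip():
                if not chunk:
                    start = i
                chunk.append(line)
            elif chunk:
                snippet = "\n".join(chunk)
                hits.append((cosine(qvec, embed(snippet)), str(path),
                             start + 1, i, snippet))
                chunk = []
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits[:top_k]
```

A real version would embed each chunk once and cache the vectors per file, rather than recomputing them on every query.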
Why RAG alone fails for operational memory
A common mistake: building agent memory with RAG and nothing else. Embed all the docs, all the Slack messages, all the meeting notes into a vector store and point the agent at it.
It works for answering questions about static knowledge. "What's our refund policy?" gets the right answer because the refund policy doc is sitting in the index and the embedding similarity is high.
It fails for anything time-sensitive or state-dependent. "What did we decide about the pricing change?" might return four different documents from four different meetings because they all discuss pricing. The agent has no way to know which one is current. "What's the status of the deployment?" returns nothing useful because deployment status isn't a document, it's a state that changes every hour.
The fix isn't better embeddings or fancier retrieval. The fix is a separate memory layer that tracks state explicitly. MEMORY.md handles "what is true right now." Semantic search handles "what happened before that might be relevant." RAG handles "what do we know about this topic in general." Three layers, three purposes.
The memory lifecycle
Memory isn't write-once. It has a lifecycle.
Capture: Something important happens during a session. The agent writes it to memory. Not everything, just decisions, outcomes, and state changes. An agent that logs every API call to memory will drown in noise. An agent that logs "deployed v2.3 to production, all tests passing, monitoring for 24h" gives its future self exactly what it needs.
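The capture step can be as small as appending a line to a dated file. A sketch, where the `memory/` layout and the `YYYY-MM-DD.md` filename convention are assumptions rather than a fixed format:

```python
from datetime import date
from pathlib import Path

def capture(memory_dir: str, note: str) -> Path:
    # Append one outcome line to today's dated entry file,
    # e.g. memory/2026-02-14.md. Decisions, outcomes, and state
    # changes only; the caller decides what is worth remembering.
    entry = Path(memory_dir) / f"{date.today().isoformat()}.md"
    entry.parent.mkdir(parents=True, exist_ok=True)
    with entry.open("a") as f:
        f.write(f"- {note}\n")
    return entry
```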
Recall: At the start of every session, the agent loads its MEMORY.md. This is automatic in my system. The agent's workspace files (SOUL.md, MEMORY.md, TOOLS.md) are injected into every session. For deeper recall, the agent runs semantic search when a question requires historical context.
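Sketched, the automatic part of recall is just concatenating the workspace files into the session's opening context. The file names follow the article; the separator format is an assumption:

```python
from pathlib import Path

WORKSPACE_FILES = ("SOUL.md", "MEMORY.md", "TOOLS.md")

def build_session_context(workspace: str) -> str:
    # Inject each workspace file that exists, labeled by name,
    # so the model sees persistent state before the first message.
    parts = []
    for name in WORKSPACE_FILES:
        path = Path(workspace) / name
        if path.exists():
            parts.append(f"=== {name} ===\n{path.read_text()}")
    return "\n\n".join(parts)
```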
Decay: Old memory needs to age out or get compressed. A MEMORY.md that grows forever becomes useless. I handle this with periodic consolidation: the agent reviews its memory, keeps what's still relevant, archives what isn't, and summarizes patterns. This happens on a schedule, not continuously.
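The archive half of that consolidation pass is mechanical enough to sketch. The `YYYY-MM-DD.md` naming and the `archive/` location are assumptions; the summarization step is left to the agent:

```python
import shutil
from datetime import date, timedelta
from pathlib import Path

def archive_old_entries(memory_dir: str, keep_days: int = 30) -> list[str]:
    # Move dated entry files older than the cutoff into memory/archive/.
    # Non-dated files (MEMORY.md, topic files) are left alone.
    cutoff = date.today() - timedelta(days=keep_days)
    archive = Path(memory_dir) / "archive"
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(memory_dir).glob("*.md")):
        try:
            entry_date = date.fromisoformat(path.stem)
        except ValueError:
            continue  # not a dated entry file
        if entry_date < cutoff:
            shutil.move(str(path), str(archive / path.name))
            moved.append(path.name)
    return moved
```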
Correction: Sometimes memory is wrong. The agent believed something that turned out to be false, or a decision was reversed. The human edits the MEMORY.md directly. This is why plain text matters. Correcting a vector embedding is a research project. Correcting a markdown file is a text edit.
What breaks when memory is wrong
Bad memory is worse than no memory.
An agent with no memory starts fresh every session. It's slow but safe. It asks questions it's asked before. It redoes work it's done before. Annoying but not dangerous.
An agent with wrong memory acts on false beliefs with full confidence. The security agent that "remembers" a server was patched when it wasn't. The publishing agent that "remembers" a draft was approved when it was actually rejected. The financial agent that "remembers" an invoice was sent when it's still in queue.
This is why MEMORY.md being human-readable matters. Review agent memories periodically. Not every day, but often enough to catch drift. When an agent starts making decisions that don't make sense, check its memory first. Nine times out of ten, something in the MEMORY.md is stale or wrong.
The cost math
Memory isn't free. Every MEMORY.md that gets loaded into a session consumes tokens. Semantic search costs embedding API calls. Storing files costs disk space (trivial) and git history (also trivial).
The real cost is context window usage. A 500-line MEMORY.md takes roughly 2,000 tokens. Across a fleet of agents running multiple sessions per day, that adds up. But prompt caching covers most of it. MEMORY.md files load at session start, which means they hit the cache on subsequent messages within the same session. First load costs full price. Everything after costs roughly 10% because the cached prefix matches.
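The back-of-envelope arithmetic, with an assumed input price (the dollar figures below are illustrative, not quoted from any provider):

```python
# Session cost of loading a 2,000-token MEMORY.md with prompt caching.
MEMORY_TOKENS = 2_000
PRICE_PER_MTOK = 3.00       # assumed input price, $ per million tokens
CACHE_DISCOUNT = 0.10       # cached prefix reads at ~10% of full price
MESSAGES_PER_SESSION = 20

# First message pays full price; later messages hit the cached prefix.
first_load = MEMORY_TOKENS / 1e6 * PRICE_PER_MTOK
cached_reads = ((MESSAGES_PER_SESSION - 1) * MEMORY_TOKENS / 1e6
                * PRICE_PER_MTOK * CACHE_DISCOUNT)
total = first_load + cached_reads
print(f"first load ${first_load:.4f}, session total ${total:.4f}")
```

Under these assumptions a whole session's memory overhead is under two cents, which is why the trade against re-discovering context favors memory.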
Without memory, agents waste tokens rediscovering context. An agent that spends 500 tokens re-asking "what's the current status?" every session burns more than the 2,000-token memory file that would have answered it upfront. The net cost of memory is negative.
Getting started
If you're running agents and haven't built a memory layer, start here:
Create a MEMORY.md for your most important agent. Put the basics in it: what the agent does, what's currently in progress, and any decisions that shouldn't be repeated.
Tell the agent to update its MEMORY.md when something important changes. Not after every message. After outcomes: task completed, decision made, error encountered.
After a week, read the MEMORY.md. Is it useful? Does it contain things the agent actually needs to know? Trim the noise, keep the signal.
Add semantic search when you outgrow a single file. This usually happens when the agent has been running for a month and the memory directory has enough entries to make search worthwhile.
Set a reminder to review agent memories monthly. Catch stale beliefs before they cause problems.
The whole system is plain text files with a search layer on top. No specialized infrastructure. No vector database to manage (the embeddings are computed at query time or cached locally). No migration path to worry about because markdown doesn't have schema changes.
Model quality matters less than you think. Memory quality matters more than anyone talks about.

