Your AI Forgets Everything. Here Is the Plain-Text Fix I Built.

Every time you open a fresh AI chat, you are hiring a brilliant new employee with total amnesia. You re-explain who you are. You re-explain the project. You re-explain the decision you already made last Tuesday and do not want to relitigate. Ten minutes later it is useful, and tomorrow it remembers none of it. That is not the model being dumb, it is context window limits: it only knows what fits in its window, and every new chat starts that window empty.

I got tired of paying that tax. So I built my agents a memory. Not a vector database, not a paid "memory layer" from one of the dozen startups selling them this year. Plain text files on my own machine, plus one small local index. That is the whole trick. This is how it works, and just as important, how I decided whether it was worth the upkeep.

First, an honest number

A few weeks ago I asked an AI to count my memory layers. It told me I had sixteen, and that this put me in the top one percent of knowledge workers. I run on the assumption that any machine praising me is malfunctioning, so I counted again by hand.

The real number was seven.

The inflation came from two places. It had counted app databases (the Postgres behind an app I ship, a browser's local storage) as if they were part of how I think. They are not. And it had counted each agent's private copy of the same structure as a separate layer. Strip that out and you get seven surfaces, one retrieval mechanism over them, and one contract governing them. Smaller than the brochure, which is the good news: you can build this yourself in an afternoon.

The architecture: seven surfaces, all markdown

Every layer here is a markdown file or a folder of them. That is deliberate. Plain text syncs anywhere, diffs cleanly in git, opens in any editor, and will still be readable in twenty years when today's note-taking app is a dead login screen.

Startup surface. One file the agent reads first, every session. Mine is working-memory.md. It holds only what is active right now: three "hot" projects, a handful of "warm" ones, an archive line, and the exact next action for each. It is a whiteboard, not a filing cabinet, and I keep it brutally short. Orientation in seconds is the entire job of this file.
The index. A catalog of every durable note, one line each, loaded at session start. Mine is MEMORY.md. It is a router. The agent does not need to know a fact, it needs to know which file the fact lives in.
Durable memories. One fact per file, typed. I use four types: a preference (how I like things done), a decision (a call I made and why, so it stays made), a project note (state that is not in the code), and a reference (a pointer to something external). One fact per file sounds fussy until you try to update or delete just one thing in a giant notes document. The boundaries are not sacred: they are chosen so each file holds exactly one kind of thing, which is what makes any layer cheap to grep, cheap to prune, and safe to delete. Split a layer only when one file starts answering two different questions; merge two the moment you never query them apart.
Conversation summaries. After a meaningful session, the agent writes a short checkpoint: what got built, what got verified, what broke, the next step. This is the episodic layer, the time-stamped log of what happened. It is how I reconstruct why a decision was made six weeks later.
Project handoffs. Per-project resume state that lives with the project, not in the central store. What is done, what is blocked, what not to touch. When I reopen a repo I have not seen in a month, this is the file that catches me up.
Shared long-term store. The one place cross-cutting decisions and operating rules live, readable by every agent. Think of it as the constitution rather than the daily news.
A human wiki. The layer I read, not the agent. Curated, selective, in Obsidian. This is where a thought graduates from "machine state" to "something I actually want to keep and browse." Most people conflate this with the agent memory. Keeping them separate is what stops either one from turning into a swamp.

Over all seven sits a small local search index (SQLite full-text search) that can return a cited snippet from any of them. That is not a layer, it is a lens. Which brings me to the part that actually matters.

The one idea worth stealing: gate your retrieval

Every memory guide tells you the same thing: always search your memory before you answer. I did that, and noticed something annoying: searching often cost more than reading the file would have. You pay to build the index, pay to run the query, pay to read the snippets, and half the time the answer was already sitting in the context you loaded at startup.

So I stopped searching by default. Now retrieval runs through a gate, cheapest tier first:

Tier 0, free: answer from what is already loaded. The startup surface and the index are in context from the first message. If the answer is there, done, zero extra cost.
Tier 1, cheap: if I know which file holds it, read that one file. The index tells me where. No search required.
Tier 2, medium: only when I do not know where it lives do I hit the full-text search, then follow it to the source.
Tier 3, last: read an entire file end to end, and only after Tier 2 pointed me at it.

Searching first is a habit, not a law, and it is an expensive habit. Reaching for the cheapest tier that answers the question is the difference between a memory system that saves you tokens and one that quietly burns them. If you take one thing from this, take this.

If you run more than one agent, write a contract

I run two coding agents and a chat agent across three machines. The moment two of them can write to the same memory, you have invented a merge conflict with feelings. A contract fixes it, and the contract is boring on purpose:

Each agent owns its own directory and stays in its lane.
Any agent may read another's memory. None may edit it unless I say so.
When an agent writes to a shared file, it tags who wrote the line, so I can always tell.
One store is the single source of truth for shared long-term decisions.
Secrets are never written down, only referred to by their environment-variable name.

That is the least common piece of any personal setup I have seen, and probably the most transferable. The rules are the reusable asset, not my particular notes.

The question nobody writes about: does it earn its keep?

Here is the part the how-to posts skip. A memory system is not free. It has a maintenance tax: writing the summaries, pruning the stale notes, keeping the surface short. And a stale memory is not neutral, it is dangerous. A confident wrong note is worse than no note at all. I have opened a memory that was three weeks out of date and briefly believed it.

So I will be honest about my own system: I feel faster with it, and I have not actually measured that it nets positive. Feeling is not evidence.

If you build one, run the cheap experiment I am now running on myself. For two weeks, jot a line every time a memory clearly saved you a re-explanation or stopped a repeated mistake, and roughly what the upkeep cost you that week. Then read the ledger. If it is green, keep the system and grow it. If it is thin, cut back to the three or four layers doing the real work and delete the rest without ceremony. More layers is not automatically better. Most people who run a simpler system than mine did the math and were right to.

Start smaller than you think

You do not need seven layers on day one. Start with two files: a working-memory.md your agent reads first, and a MEMORY.md index it can route from. Add a durable note the next time you catch yourself explaining the same preference twice. Practice the retrieval gate as a habit before you automate anything. Add the contract only when a second agent shows up.

The forgetting is the default. Continuity is something you build, one plain-text file at a time, and it will outlive every app you were tempted to trust it to.

Back to writing