We gave AI memory. It still makes the same mistakes.
You can stuff a vector database with every conversation you've ever had, and the AI will dutifully retrieve the most "similar" chunks when you ask a question — then give you the same bad advice it gave you last month, the advice you explicitly told it was wrong.
The Retrieval Trap
Here's a scenario: You're debugging a Python import error. Your AI assistant suggests reinstalling the package. You try it. Doesn't work. You tell it so. You eventually figure out it was a PATH issue.
Two weeks later, same error. The AI searches its memory, finds the previous conversation, and confidently suggests... reinstalling the package.
It remembered everything. It learned nothing.
The memory was there. The correction was there. But retrieval doesn't distinguish between "advice I gave" and "advice that actually helped."
What Learning Actually Looks Like
The difference isn't storage. It's feedback.
When I say "thanks, that worked," that memory gets promoted. When I say "no, that's wrong," it gets demoted. The AI reads my reaction and scores its own memories accordingly, with no manual tagging and no thumbs-up buttons I'll forget to click.
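To make that concrete, here is a minimal sketch of turning a reply into a feedback signal. It is not Roampal's implementation. A real system would presumably let the model judge the reaction. The `classify_outcome` function and the phrase lists below are hypothetical stand-ins that only show the shape of the idea.

```python
# Hypothetical phrase lists: assumptions for illustration, not Roampal's logic.
NEGATIVE = ("didn't work", "doesn't work", "still broken", "that's wrong", "not working")
PARTIAL = ("partly", "partially", "sort of", "kind of")
POSITIVE = ("that worked", "that fixed", "thanks", "perfect")


def classify_outcome(reply: str) -> str:
    """Map a user's follow-up message to worked / failed / partial / unknown."""
    text = reply.lower()
    # Check negatives first so "thanks, but that didn't work" counts as failed.
    if any(phrase in text for phrase in NEGATIVE):
        return "failed"
    if any(phrase in text for phrase in PARTIAL):
        return "partial"
    if any(phrase in text for phrase in POSITIVE):
        return "worked"
    # No recognizable signal: don't guess.
    return "unknown"


if __name__ == "__main__":
    for reply in ("Thanks, that worked", "No, that's wrong", "ok, moving on"):
        print(reply, "->", classify_outcome(reply))
```

The point is that the signal comes from the conversation itself, so nothing extra is asked of the user.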
The Numbers
Benchmarked on LoCoMo, the system outperformed raw ingestion by over 20 percentage points. Learning takes time — at zero uses, both approaches perform the same. Over repeated interactions, the gap opens.
How It Works
The system tracks what advice was given, what the user said afterward, and whether the outcome was positive or negative. Each memory gets a score based on that outcome: worked, failed, partial, or unknown. That score, together with the usage count, drives promotion from working (24h) to history (30d) to patterns (permanent). A memory that crosses the demotion threshold moves back down.
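Here is a rough sketch of that lifecycle. The tier names come from the description above. Everything numeric is an assumption made up for illustration: the per-outcome deltas, the promotion and demotion thresholds, and the minimum usage count. The `Memory` class and `record_outcome` function are hypothetical too.

```python
from dataclasses import dataclass

# Illustrative values only; the real thresholds are not stated in this post.
OUTCOME_DELTA = {"worked": 2, "partial": 1, "unknown": 0, "failed": -3}
TIERS = ["working", "history", "patterns"]  # 24h, 30d, permanent
PROMOTE_SCORE = 4
PROMOTE_USES = 2
DEMOTE_SCORE = -3


@dataclass
class Memory:
    text: str
    tier: str = "working"
    score: int = 0
    uses: int = 0


def record_outcome(mem: Memory, outcome: str) -> Memory:
    """Update score and usage count, then move the memory up or down one tier."""
    mem.uses += 1
    mem.score += OUTCOME_DELTA[outcome]
    level = TIERS.index(mem.tier)
    can_promote = level < len(TIERS) - 1
    if can_promote and mem.score >= PROMOTE_SCORE and mem.uses >= PROMOTE_USES:
        mem.tier = TIERS[level + 1]
    elif level > 0 and mem.score <= DEMOTE_SCORE:
        mem.tier = TIERS[level - 1]
    return mem


if __name__ == "__main__":
    fix = Memory("Check PATH before reinstalling")
    record_outcome(fix, "worked")
    record_outcome(fix, "worked")
    print(fix.tier, fix.score, fix.uses)
```

With these made-up numbers, a fix that works twice moves from working to history, and a single failure is enough to push an unproven history memory back down.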
Retrieval itself is a tag-first cascade plus a cross-encoder reranker. The score doesn't weight retrieval. It decides which memories survive long enough to be candidates next time.
The design keeps the two apart: retrieval finds matches, lifecycle decides which ones are still around to be found. Two separate jobs, both necessary.
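A small sketch shows what that separation looks like in practice. The tier lifetimes are the ones stated above. The rest is assumed: the `alive`, `relevance`, and `retrieve` names are hypothetical, and plain word overlap stands in for the tag-first cascade and cross-encoder reranker. Note that `score` never appears in the ranking.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Tier lifetimes as described in the post; None means permanent.
TTL = {"working": timedelta(hours=24), "history": timedelta(days=30), "patterns": None}


@dataclass
class Memory:
    text: str
    tier: str
    created: datetime
    score: int = 0  # used by the lifecycle, deliberately ignored by ranking


def alive(mem: Memory, now: datetime) -> bool:
    """Lifecycle job: has this memory outlived its tier's time-to-live?"""
    ttl = TTL[mem.tier]
    return ttl is None or now - mem.created <= ttl


def relevance(query: str, mem: Memory) -> int:
    """Retrieval job: a word-overlap stand-in for the real matcher."""
    return len(set(query.lower().split()) & set(mem.text.lower().split()))


def retrieve(query: str, memories: list[Memory], now: datetime, k: int = 3) -> list[Memory]:
    # Lifecycle filters the candidate pool; relevance alone orders it.
    candidates = [m for m in memories if alive(m, now)]
    return sorted(candidates, key=lambda m: relevance(query, m), reverse=True)[:k]


if __name__ == "__main__":
    now = datetime(2025, 1, 31)
    two_weeks_ago = now - timedelta(days=14)
    memories = [
        Memory("reinstall the package for import error", "working", two_weeks_ago),
        Memory("import error fixed by correcting PATH", "patterns", two_weeks_ago),
    ]
    for m in retrieve("python import error", memories, now):
        print(m.tier, "|", m.text)
```

In this toy version, the reinstall advice never got promoted out of working, so two weeks later it has aged out and only the PATH fix is left to match.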
Why This Matters
An AI that remembers your mistakes is useful. One that stops repeating them is better.
Want to learn more? GitHub | Benchmarks
OpenCode:
pip install roampal-core
roampal init --opencode
Uses a sidecar LLM that handles extraction, summarization, and tagging automatically.
Claude Code:
pip install roampal-core
roampal init --claude-code
Uses an MCP tool where the main LLM manages extraction, summarization, and tagging.
Want it all in one app? Get Roampal Desktop
Your memories stay local. Your learning stays yours.