LLMs forget. The standard fix is RAG — retrieve chunks, stuff them in. It works until it doesn't: irrelevant chunks waste tokens, summaries lose structure, and nothing actually models how memory works.

Breathe-memory takes a different approach: associative injection. Before each LLM call, it extracts anchors from the user's message (entities, temporal references, emotional signals), traverses a concept graph via BFS, runs optional vector search, and injects only what's relevant — typically in under 60ms.

When context fills up, instead of summarizing, it extracts a structured graph: topics, decisions, open questions, artifacts. This preserves the semantic structure that summaries destroy.

The whole thing is ~1500 lines of Python, interface-based, with zero mandatory dependencies. Plug in any database, any LLM, any vector store. The reference implementation uses PostgreSQL + pgvector.

https://github.com/tkenaz/breathe-memory

We've been running this in production for several months. We're open-sourcing it because we think the approach (injection over retrieval) is underexplored and deserves more attention.

We've also written an article about memory injection in a more human-readable form, if you want to see the thinking under the hood: https://medium.com/towards-artificial-intelligence/beyond-ra...
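To make the injection step concrete, here is a minimal sketch of the anchor-extraction → BFS → inject pipeline. All names, the toy graph, and the keyword-based anchor extraction are illustrative assumptions, not breathe-memory's actual API:

```python
from collections import deque

# Hypothetical concept graph and memory store; the real system backs
# these with PostgreSQL + pgvector.
CONCEPT_GRAPH = {
    "deploy": ["ci", "rollback"],
    "ci": ["tests"],
    "rollback": ["incident-2024-03"],
    "tests": [],
    "incident-2024-03": [],
}
MEMORIES = {
    "deploy": "We deploy from main on merge.",
    "rollback": "Rollbacks go through the ops channel.",
    "incident-2024-03": "The March incident was caused by a bad migration.",
}

def extract_anchors(message: str) -> list[str]:
    # Toy anchor extraction: substring match against known concepts.
    # The real system also uses entities, temporal references, and
    # emotional signals.
    return [c for c in CONCEPT_GRAPH if c in message.lower()]

def traverse(anchors: list[str], max_hops: int = 2) -> list[str]:
    # BFS outward from each anchor, bounded by hop count so the
    # injected context stays small and fast to assemble.
    seen = set(anchors)
    queue = deque((a, 0) for a in anchors)
    order = list(anchors)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for nxt in CONCEPT_GRAPH.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order

def inject(message: str) -> str:
    # Prepend only the memories reachable from the message's anchors.
    concepts = traverse(extract_anchors(message))
    relevant = [MEMORIES[c] for c in concepts if c in MEMORIES]
    return "\n".join(relevant + [message])

print(inject("can we deploy today?"))
```

With in-memory dicts the traversal is a few dictionary lookups per hop, which is why a bounded BFS (plus an optional vector search) can stay under a tens-of-milliseconds budget.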
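The graph extracted at context overflow can be pictured as a small structured record rather than a blob of summary text. This is a hypothetical schema following the categories named in the post (topics, decisions, open questions, artifacts), not breathe-memory's actual types:

```python
from dataclasses import dataclass, field

@dataclass
class ContextGraph:
    # Hypothetical shape of the structured graph extracted when the
    # context window fills up; field names follow the post's description.
    topics: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    artifacts: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        # Render the graph back into a compact context block when the
        # conversation resumes, skipping empty sections.
        sections = [
            ("Topics", self.topics),
            ("Decisions", self.decisions),
            ("Open questions", self.open_questions),
            ("Artifacts", self.artifacts),
        ]
        return "\n".join(
            f"{name}: {'; '.join(items)}" for name, items in sections if items
        )

g = ContextGraph(
    topics=["memory injection"],
    decisions=["use BFS over the concept graph"],
    open_questions=["how to score anchor relevance?"],
)
print(g.to_prompt())
```

Unlike a free-text summary, the record keeps decisions and open questions as separate, queryable fields, so later turns can re-inject exactly the category they need.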