Most AI agents are amnesiac by default: every conversation starts from zero. MemoryMesh solves this by providing a production-grade memory layer that any agent can plug into via MCP or REST API.
It stores memories across four cognitive types, retrieves them with a hybrid semantic + recency + importance ranking, compresses old memories using LLMs (Groq for single-memory summarisation, Mistral for cluster synthesis), and models forgetting using the Ebbinghaus retention curve so memories fade realistically over time.
Built from scratch in Python 3.13 · 35 files · 51 tests · 0 TODOs
graph TB
subgraph Clients["🖥️ Clients"]
CC["Claude Code / Cursor"]
AG["Custom Agents"]
DB["Streamlit Dashboard"]
end
subgraph Transport["Transport Layer"]
MCP["MCP stdio server<br/>(7 tools)"]
REST["FastAPI REST API<br/>(12 endpoints)"]
end
subgraph Core["⚙️ MemoryMesh Core"]
STORE["MemoryStore<br/>CRUD + Embeddings"]
RET["Retriever<br/>Hybrid Scoring"]
COMP["Compressor<br/>Tier-1 Groq · Tier-2 Mistral"]
DECAY["DecayEngine<br/>Ebbinghaus Curve"]
EMB["Embedder<br/>bge-large-en-v1.5 · 1024-dim"]
end
subgraph Storage["💾 Persistence"]
SQL["SQLite WAL<br/>memories · embeddings<br/>compression_log · access_log"]
FAISS["FAISS IndexFlatIP<br/>(numpy fallback)"]
end
CC -->|JSON-RPC stdio| MCP
AG -->|HTTP| REST
DB -->|HTTP| REST
MCP --> STORE
MCP --> RET
REST --> STORE
REST --> RET
REST --> COMP
STORE --> SQL
STORE --> EMB
RET --> FAISS
RET --> EMB
COMP -->|Groq llama-3.1-8b-instant| STORE
COMP -->|Mistral mistral-small| STORE
DECAY -->|asyncio background task| STORE
| Type | Description | Real-world Analogy | Decay Speed |
|---|---|---|---|
| episodic | Time-stamped events | "I talked to Alice about X yesterday" | Fast (×1.0) |
| semantic | Permanent facts | "Alice is a Python engineer at Google" | Slow (×2.0) |
| procedural | Skills & how-tos | "To deploy FastAPI: uvicorn main:app..." | Slowest (×3.0) |
| preference | User patterns | "Alice prefers concise bullet-point answers" | Medium-slow (×2.5) |
Each type has a tuned stability multiplier in the Ebbinghaus forgetting curve, so skills outlast events and facts outlast episodes, just like human memory.
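As a rough illustration of how the four types and their multipliers might be represented, here is a minimal sketch. The names `MemoryType` and `STABILITY_MULTIPLIER` are hypothetical and not taken from the MemoryMesh source; only the type names and multiplier values come from the table above. The `str` + `Enum` mixin is used so the sketch also runs on Python versions without `StrEnum`.

```python
from enum import Enum


class MemoryType(str, Enum):
    """Hypothetical enum of the four cognitive memory types."""
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"
    PREFERENCE = "preference"


# Stability multipliers as listed in the table; a larger value
# means a larger stability S and therefore slower forgetting.
STABILITY_MULTIPLIER = {
    MemoryType.EPISODIC: 1.0,
    MemoryType.SEMANTIC: 2.0,
    MemoryType.PROCEDURAL: 3.0,
    MemoryType.PREFERENCE: 2.5,
}


def multiplier_for(memory_type: str) -> float:
    """Look up the multiplier from a plain string such as 'procedural'."""
    return STABILITY_MULTIPLIER[MemoryType(memory_type)]
```

Because the enum members are also strings, a value such as `"episodic"` arriving in a JSON payload can be converted with `MemoryType("episodic")` and an unknown type raises `ValueError`.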
flowchart LR
Q["Query"] --> EMB["Embed with\nbge-large-en-v1.5"]
EMB --> FAISS["FAISS / numpy\ncosine similarity"]
FAISS --> SEM["Semantic\nScore (×0.5)"]
Q --> TIME["Time since\nlast access"]
TIME --> REC["Recency\nScore (×0.3)"]
Q --> IMP["importance field\n+ access_count boost"]
IMP --> IMPS["Importance\nScore (×0.2)"]
SEM --> BLEND["Weighted\nBlend"]
REC --> BLEND
IMPS --> BLEND
BLEND --> RANK["Re-rank & Return\nTop-K Results"]
RANK --> LOG["Increment\naccess_count"]
Final score formula:
score = 0.5 × cosine_sim + 0.3 × exp(-λ·days) + 0.2 × (importance + min(0.02·accesses, 0.3))
Weights are configurable per-query via semantic_weight, recency_weight, importance_weight.
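The formula above can be sketched as a small function. This is an illustration under stated assumptions, not the project's Retriever: the function name `hybrid_score` is hypothetical, and λ is assumed to default to 0.1 (the DECAY_RATE value shown in the configuration section). The weight parameter names mirror the per-query options mentioned above.

```python
import math


def hybrid_score(cosine_sim, days_since_access, importance, accesses,
                 decay_rate=0.1,          # assumed default for lambda
                 semantic_weight=0.5,
                 recency_weight=0.3,
                 importance_weight=0.2):
    """Blend semantic, recency, and importance signals into one score."""
    # Recency decays exponentially with days since last access.
    recency = math.exp(-decay_rate * days_since_access)
    # Each access adds 0.02 to importance, capped at a 0.3 boost.
    boosted_importance = importance + min(0.02 * accesses, 0.3)
    return (semantic_weight * cosine_sim
            + recency_weight * recency
            + importance_weight * boosted_importance)


if __name__ == "__main__":
    # A fresh, fairly important memory with a good semantic match.
    print(round(hybrid_score(0.72, 0.0, 0.8, 1), 4))
```

Note that, as written in the formula, the boosted importance term is not clamped, so a memory with importance 0.8 and 15 or more accesses contributes 0.2 × 1.1 to the score.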
Memories are automatically compressed on a nightly schedule to keep the store lean and token-efficient:
Days 1-7   [Fresh memories: full content stored]
    │
    ▼  Tier 1 (age > 7 days, not recently accessed)
Week 1+    [llama-3.1-8b-instant via Groq]
           "Compress to 2-3 sentences preserving key facts"
           → original preserved in compression_log
           → is_compressed = True
    │
    ▼  Tier 2 (cluster merge, ≥ 3 episodic memories)
           [mistral-small-latest via Mistral AI]
           "Synthesize N episodic memories → 1 semantic memory"
           → source memories marked is_compressed
           → new semantic memory created with importance = 0.8
The compression_log table records every compression event with original content, timestamp, and model used, making compression fully auditable and reversible.
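The eligibility rules in the pipeline above could be expressed roughly as follows. This is a sketch under assumptions: `MemoryRecord`, `tier1_candidates`, and `tier2_ready` are hypothetical names, the field names are illustrative rather than the project's schema, and the one-day window used for "not recently accessed" is a guess because the text does not define it. No LLM call is made here; the sketch only selects what would be compressed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MemoryRecord:
    """Hypothetical record; fields are illustrative only."""
    id: str
    memory_type: str
    created_at: datetime
    last_accessed_at: datetime
    is_compressed: bool = False


def tier1_candidates(memories, now, age_days=7.0, recent_access_days=1.0):
    """Uncompressed memories older than age_days and not accessed recently."""
    age_cutoff = now - timedelta(days=age_days)
    access_cutoff = now - timedelta(days=recent_access_days)  # assumed window
    return [m for m in memories
            if not m.is_compressed
            and m.created_at < age_cutoff
            and m.last_accessed_at < access_cutoff]


def tier2_ready(memories, min_cluster=3):
    """True when an agent has enough episodic memories for a cluster merge."""
    episodic = [m for m in memories if m.memory_type == "episodic"]
    return len(episodic) >= min_cluster
```

Whether already-compressed memories count toward the tier-2 threshold is not stated in the text, so this sketch counts every episodic memory.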
Each memory has a decay_score updated every 6 hours by the background DecayEngine:
R = exp(-t / S)
Where:
- R = retention score ∈ [0, 1]
- t = days since last access
- S = stability = (1/λ) × type_multiplier × (1 + importance) × (1 + log(1 + accesses) × 0.5)
[Chart: retention (0.0 to 1.0) against days since last access (0, 7, 14, 30, 60, 90), with one curve per memory type: procedural (skills, multiplier=3.0), semantic (facts, multiplier=2.0), preference (patterns, multiplier=2.5), and episodic (events, multiplier=1.0).]
High-importance + frequently-accessed memories gain extra stability, so your "important things" stick around.
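To make the stability definition concrete, here is a minimal sketch that assumes the standard Ebbinghaus form R = exp(-t / S), a natural logarithm in the access term, and λ = 0.1 (the DECAY_RATE default from the configuration section). The function names are hypothetical, and the real DecayEngine may use different constants, so its outputs need not match the figures produced here.

```python
import math

# Type multipliers as listed in the memory-type table.
TYPE_MULTIPLIER = {
    "episodic": 1.0,
    "semantic": 2.0,
    "procedural": 3.0,
    "preference": 2.5,
}


def stability(memory_type, importance, accesses, decay_rate=0.1):
    """S = (1/lambda) * type_multiplier * (1 + importance) * (1 + log(1 + accesses) * 0.5)."""
    return ((1.0 / decay_rate)
            * TYPE_MULTIPLIER[memory_type]
            * (1.0 + importance)
            * (1.0 + math.log(1.0 + accesses) * 0.5))  # natural log assumed


def retention(days, memory_type, importance, accesses, decay_rate=0.1):
    """R = exp(-t / S), where t is days since last access."""
    return math.exp(-days / stability(memory_type, importance, accesses, decay_rate))


if __name__ == "__main__":
    for kind in ("episodic", "semantic", "preference", "procedural"):
        print(kind, round(retention(30, kind, 0.5, 0), 3))
```

With these assumptions, an episodic memory with importance 0.5 and no accesses has S = 15 days, so its retention after 30 days is exp(-2).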
# 1. Clone & install
git clone https://github.com/0DevDutt0/MemoryMesh.git
cd MemoryMesh
pip install -e ".[dev]"
# 2. Configure
cp .env.example .env
# Add your GROQ_API_KEY and MISTRAL_API_KEY
# 3. Start the REST API
uvicorn memorymesh.api.main:app --reload
# → http://localhost:8000/docs (interactive Swagger UI)
# 4. Start the MCP server (for Claude Code / Cursor)
python -m memorymesh.mcp.server
# 5. Launch the dashboard
streamlit run dashboard/app.py
# → http://localhost:8501

import asyncio

import httpx

API = "http://localhost:8000/v1"

async def main():
    async with httpx.AsyncClient() as c:
        # Store a fact about the user
        await c.post(f"{API}/memories/", json={
            "content": "User prefers FastAPI over Flask for async workloads.",
            "agent_id": "my-agent",
            "memory_type": "preference",
            "importance": 0.9,
        })
        # Retrieve it semantically
        results = (await c.post(f"{API}/memories/search", json={
            "query": "what web framework does the user prefer?",
            "agent_id": "my-agent",
            "k": 3,
        })).json()
        # Each result carries its rank, blended score, and the stored memory
        for r in results:
            print(f"[{r['rank']}] {r['score']:.3f} - {r['memory']['content']}")

asyncio.run(main())

[1] 0.874 - User prefers FastAPI over Flask for async workloads.
Start the API server, then run the end-to-end demo in one command:
# Terminal 1: start the REST API
uvicorn memorymesh.api.main:app --reload
# Terminal 2: full lifecycle demo (store · search · graph · update · stats)
python Demo/quickstart.py

1. Health Check
{"status": "ok", "ts": "2026-06-15T14:00:00.123456"}

2. Storing 7 Memories
✓ [semantic   ] id=a1b2c3d4… importance=0.95
✓ [episodic   ] id=3f8a1c2d… importance=0.90
✓ [procedural ] id=b2c3d4e5… importance=0.85
✓ [preference ] id=c3d4e5f6… importance=0.80
✓ [semantic   ] id=d4e5f6a7… importance=0.75
✓ [episodic   ] id=e5f6a7b8… importance=0.70
✓ [procedural ] id=f6a7b8c9… importance=0.80

4. Semantic Search - 'user communication style'
#1 score=0.8214 [preference]
   The user prefers concise answers, dark mode UI, Python over JavaScript…
#2 score=0.7123 [semantic]
   Devdutt S is a software engineer from Kochi, Kerala. GitHub: 0DevDutt0…

6. Memory Graph (threshold=0.7)
Nodes: 7 | Edges: 3
Edge weight=0.8123 a1b2c3d4… ↔ 3f8a1c2d…
Edge weight=0.7891 b2c3d4e5… ↔ f6a7b8c9…
Edge weight=0.7234 d4e5f6a7… ↔ a1b2c3d4…

9. Final Stats
Total memories : 7
Compressed     : 0
Avg decay score: 0.9981
By type        : {semantic: 2, episodic: 2, procedural: 2, preference: 1}

✓ Demo complete - all operations succeeded.
The core pattern: store a memory, then retrieve it semantically, with no keyword overlap required.
1. Store a preference:
curl -X POST http://localhost:8000/v1/memories/ \
-H "Content-Type: application/json" \
-d '{
"content": "The user prefers concise answers, dark mode UI, Python over JavaScript, asyncio over threading.",
"agent_id": "demo-agent",
"memory_type": "preference",
"importance": 0.8
}'

{
"id": "c3d4e5f6-a7b8-9012-cdef-333444555666",
"agent_id": "demo-agent",
"memory_type": "preference",
"content": "The user prefers concise answers, dark mode UI, Python over JavaScript, asyncio over threading.",
"importance": 0.8,
"access_count": 0,
"decay_score": 1.0,
"is_compressed": false,
"created_at": "2026-06-15T14:04:00.000000"
}

2. Search with a semantically different query (almost no keyword overlap):
curl -X POST http://localhost:8000/v1/memories/search \
-H "Content-Type: application/json" \
-d '{
"query": "what coding language does the user enjoy and how do they like responses formatted?",
"agent_id": "demo-agent",
"k": 3
}'

[
{
"memory": {
"id": "c3d4e5f6-a7b8-9012-cdef-333444555666",
"memory_type": "preference",
"content": "The user prefers concise answers, dark mode UI, Python over JavaScript, asyncio over threading.",
"importance": 0.8,
"access_count": 1,
"decay_score": 0.9981
},
"score": 0.821406,
"rank": 1
}
]

The query "what coding language does the user enjoy…" matched with a hybrid score of 0.821 despite sharing almost no keywords with the stored memory (only "the user"). The match comes from cosine similarity in the 1024-dimensional bge-large embedding space rather than from keyword overlap.
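The semantic part of that match can be illustrated with a small stand-in for the vector search. FAISS `IndexFlatIP` ranks by inner product, which equals cosine similarity when vectors are L2-normalised; the sketch below reproduces that idea in pure Python for clarity. It is not the project's Retriever or its numpy fallback, and the function names are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(query, vectors, k=3):
    """Return (index, cosine similarity) pairs for the k closest vectors."""
    q = normalize(query)
    scored = [(i, inner_product(q, normalize(v))) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    stored = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
    print(top_k([1.0, 0.0], stored, k=2))
```

An exhaustive scan like this is exact but linear in the number of stored vectors, which is the same trade-off a flat index makes.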
The same memory stored under different types and access patterns decays at very different rates:
| Days since access | Memory Type | Importance | Accesses | Retention |
|---|---|---|---|---|
| 0 | any | any | any | 1.000 |
| 7 | episodic | 0.5 | 0 | 0.703 |
| 7 | semantic | 0.5 | 0 | 0.837 |
| 30 | episodic | 0.5 | 0 | 0.135 |
| 30 | procedural | 0.9 | 10 | 0.912 |
| 90 | semantic | 0.9 | 20 | 0.831 |
| 90 | episodic | 0.3 | 1 | 0.003 |
A frequently-accessed skill (procedural, importance=0.9, 10 accesses) retains 91% after 30 days.
A one-off low-importance event (episodic, 1 access) fades to 0.3% after 90 days, modelling human forgetting mathematically.
| File | What's inside |
|---|---|
| Demo/quickstart.py | End-to-end Python script: stores 7 memories across all four types, semantic search, type-filtered search, memory graph, update, list, and stats |
| Demo/SAMPLE_INPUTS.md | 15 annotated curl examples: health check, all 4 memory types, batch store, semantic search, type-filtered search, get/update/delete, graph, stats, tier-1 & tier-2 compression, MCP tool calls |
| Demo/sample_outputs.json | Canonical JSON responses for every operation, useful as an API contract reference or test fixture |
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/memories/ | Store a single memory |
| POST | /v1/memories/search | Hybrid semantic search |
| POST | /v1/memories/batch | Batch store (up to 100) |
| GET | /v1/memories/{id} | Get memory by ID |
| PATCH | /v1/memories/{id} | Update content / importance / metadata |
| DELETE | /v1/memories/{id} | Delete permanently |
| GET | /v1/memories/agent/{id} | List all memories for an agent |
| POST | /v1/memories/agent/{id}/graph | Semantic similarity graph (for viz) |
| GET | /v1/stats | Global memory statistics |
| POST | /v1/compress/trigger | Run auto-compression now |
| POST | /v1/compress/memory/{id}/tier1 | Compress single memory (Groq) |
| POST | /v1/compress/agent/{id}/tier2 | Cluster merge (Mistral) |
| GET | /v1/compress/log | Compression audit history |
| GET | /health | Liveness probe |
Full interactive docs at http://localhost:8000/docs (Swagger UI) and /redoc (ReDoc).
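Since the batch endpoint accepts at most 100 memories per request, a client with a larger backlog has to split it first. The helper below is a hedged sketch of that step only; `chunk_memories` is a hypothetical name, and the exact request body expected by /v1/memories/batch is not shown in this README, so the sketch stops short of building or sending the request.

```python
def chunk_memories(memories, max_batch=100):
    """Split a list of memory payloads into batches of at most max_batch items."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    return [memories[i:i + max_batch]
            for i in range(0, len(memories), max_batch)]


if __name__ == "__main__":
    # 250 illustrative payloads using the single-memory fields shown earlier.
    backlog = [
        {
            "content": f"Note {n}",
            "agent_id": "demo-agent",
            "memory_type": "episodic",
            "importance": 0.5,
        }
        for n in range(250)
    ]
    print([len(batch) for batch in chunk_memories(backlog)])
```

Each resulting batch can then be posted in turn, using whatever body shape the Swagger UI documents for the batch route.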
Add to your Claude Code / Cursor MCP config:
{
"mcpServers": {
"memorymesh": {
"command": "python",
"args": ["-m", "memorymesh.mcp.server"],
"cwd": "/path/to/MemoryMesh"
}
}
}

7 tools exposed to the LLM:
| Tool | What it does |
|---|---|
| store_memory | Save a new memory (type + importance + metadata) |
| retrieve_memories | Semantic search with optional type filter |
| delete_memory | Hard delete by ID |
| update_memory | Edit content / importance / metadata in-place |
| list_memories | Browse all memories for an agent |
| get_memory_stats | Token-efficient stats snapshot |
| compress_agent_memories | Trigger cluster merge for an agent |
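For orientation, this is roughly what a client sends over stdio when it invokes one of these tools. MCP uses JSON-RPC 2.0 with the `tools/call` method; the helper name `build_tool_call` is hypothetical, and the argument names passed to store_memory are assumed to mirror the REST fields, which this README does not confirm.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    message = build_tool_call(
        1,
        "store_memory",
        {
            # Argument names are assumptions borrowed from the REST payload.
            "content": "User prefers FastAPI over Flask for async workloads.",
            "agent_id": "my-agent",
            "memory_type": "preference",
            "importance": 0.9,
        },
    )
    print(json.dumps(message, indent=2))
```

In practice Claude Code or Cursor builds these messages itself; the sketch is only meant to show the shape of the exchange.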
Four interactive pages accessible at http://localhost:8501:
| Page | What you see |
|---|---|
| Search Explorer | Live hybrid retrieval with weight sliders · Store new memories |
| Memory Graph | pyvis semantic network · colour-coded by type · edge weight = cosine similarity |
| Compression Monitor | Timeline of compressions · Token savings · Manual trigger |
| Decay Visualizer | Interactive Plotly retention curves · Adjust λ, importance, access count |
| Layer | Technology | Why |
|---|---|---|
| Language | Python 3.13 | Async-native, type hints, StrEnum |
| API | FastAPI + uvicorn | Auto-docs, async, Pydantic v2 validation |
| Database | aiosqlite (SQLite WAL) | Zero-dependency, async, ACID, BLOB storage |
| Embeddings | BAAI/bge-large-en-v1.5 | Best open-source embedding (1024-dim, MTEB top-5) |
| Vector Search | FAISS IndexFlatIP | Exact inner-product search (cosine similarity on normalised vectors), optional numpy fallback |
| LLM Tier-1 | Groq llama-3.1-8b-instant | Sub-second summarisation, free tier |
| LLM Tier-2 | Mistral mistral-small | High-quality cluster synthesis |
| MCP | mcp SDK 1.27 | stdio JSON-RPC, works with Claude Code / Cursor |
| Dashboard | Streamlit + pyvis + Plotly | Interactive memory exploration |
| Testing | pytest-asyncio, MagicMock | 51 tests, in-memory SQLite, no GPU in CI |
| CI | GitHub Actions | lint (ruff) + test matrix on ubuntu |
pytest tests/ -v

test_compressor.py    8 tests   LLM compression (mocked Groq + Mistral clients)
test_decay.py        13 tests   Ebbinghaus formula + DecayEngine lifecycle
test_retriever.py    14 tests   Search, filters, agent isolation, graph
test_store.py        16 tests   CRUD, access tracking, embedding round-trip
─────────────────────────────────
51 passed in 0.63s
Key design decisions:
- No GPU required in CI: Embedder is mocked with a deterministic hash-seeded numpy vector
- No real LLM calls in tests: AsyncGroq and Mistral clients are patched via unittest.mock
- Isolated databases: every test fixture uses aiosqlite :memory:, with automatic teardown
- asyncio_mode = "auto": all async def tests run automatically without @pytest.mark.asyncio
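The first design decision, a deterministic hash-seeded mock embedder, can be sketched as follows. This is an illustration of the idea rather than the project's fixture: it uses only the standard library instead of numpy, and the name `mock_embed` is hypothetical. The same text always maps to the same unit-length vector, so tests get stable similarity scores without loading a model.

```python
import hashlib
import math
import random


def mock_embed(text, dim=1024):
    """Return a deterministic unit-length pseudo-embedding for text."""
    # Derive a stable seed from the text so runs are reproducible.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Normalise so inner product behaves like cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]
```

These vectors carry no semantic meaning, so tests built on them can check plumbing (storage, ranking order for identical text, agent isolation) but not retrieval quality.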
Copy .env.example → .env and fill in your keys:
# Required for compression
GROQ_API_KEY=gsk_...
MISTRAL_API_KEY=...
# Tunable parameters
EMBEDDING_MODEL=BAAI/bge-large-en-v1.5 # swap for lighter all-MiniLM-L6-v2
DECAY_RATE=0.1 # λ in the forgetting curve
COMPRESS_AGE_DAYS=7.0 # memories older than this get tier-1 compressed
COMPRESS_CLUSTER_SIZE=20 # episodic memories per tier-2 merge
DECAY_RUN_INTERVAL_HOURS=6.0 # background decay update frequency

memorymesh/
├── core/      config · logging · aiosqlite database
├── memory/    types · embedder · store · retriever · decay · compressor
├── schemas/   Pydantic request / response models
├── api/       FastAPI app · 3 routers (memories, compress, health)
└── mcp/       MCP stdio server (7 tools)
dashboard/           Streamlit 4-page UI
tests/               51 async tests, mock embedder, in-memory DB
Demo/                Runnable quickstart · curl examples · sample JSON
.github/workflows/   CI: lint (ruff) + pytest matrix
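The tunable parameters from the .env example could be read with defaults along these lines. This is a standard-library sketch and not the project's core/config module, which may well use a settings library instead; `load_settings` and the dictionary keys are hypothetical, while the variable names and default values are the ones listed in the .env example.

```python
import os


def load_settings(env=None):
    """Read MemoryMesh-style tunables from a mapping, falling back to defaults."""
    env = os.environ if env is None else env
    return {
        "embedding_model": env.get("EMBEDDING_MODEL", "BAAI/bge-large-en-v1.5"),
        "decay_rate": float(env.get("DECAY_RATE", "0.1")),
        "compress_age_days": float(env.get("COMPRESS_AGE_DAYS", "7.0")),
        "compress_cluster_size": int(env.get("COMPRESS_CLUSTER_SIZE", "20")),
        "decay_run_interval_hours": float(env.get("DECAY_RUN_INTERVAL_HOURS", "6.0")),
    }


if __name__ == "__main__":
    # Override one value and keep the rest at their defaults.
    print(load_settings({"DECAY_RATE": "0.05"}))
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise in tests without touching the real environment.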
| Feature | MemoryMesh | mem0 | Zep | ChromaDB alone |
|---|---|---|---|---|
| Four semantic memory types | β | β | β | β |
| MCP-native (Claude / Cursor) | β | β | β | β |
| Ebbinghaus forgetting curve | β | β | β | β |
| Hierarchical LLM compression | β | Partial | Partial | β |
| REST API + dashboard | β | β | β | β |
| Zero infrastructure (SQLite) | β | β | β | β |
| Open source, no usage fees | β | Partial | Partial | β |