0DevDutt0/MemoryMesh


What is MemoryMesh?

Most AI agents are amnesiac by default: every conversation starts from zero. MemoryMesh solves this by providing a production-grade memory layer that any agent can plug into via MCP or a REST API.

It stores memories across four cognitive types, retrieves them with a hybrid semantic + recency + importance ranking, compresses old memories using LLMs (Groq for single-memory summarisation, Mistral for cluster synthesis), and models forgetting using the Ebbinghaus retention curve so memories fade realistically over time.

Built from scratch in Python 3.13 · 35 files · 51 tests · 0 TODOs


Architecture

graph TB
    subgraph Clients["🖥️ Clients"]
        CC["Claude Code / Cursor"]
        AG["Custom Agents"]
        DB["Streamlit Dashboard"]
    end

    subgraph Transport["🔌 Transport Layer"]
        MCP["MCP stdio server<br/>(7 tools)"]
        REST["FastAPI REST API<br/>(12 endpoints)"]
    end

    subgraph Core["⚙️ MemoryMesh Core"]
        STORE["MemoryStore<br/>CRUD + Embeddings"]
        RET["Retriever<br/>Hybrid Scoring"]
        COMP["Compressor<br/>Tier-1 Groq · Tier-2 Mistral"]
        DECAY["DecayEngine<br/>Ebbinghaus Curve"]
        EMB["Embedder<br/>bge-large-en-v1.5 · 1024-dim"]
    end

    subgraph Storage["💾 Persistence"]
        SQL["SQLite WAL<br/>memories · embeddings<br/>compression_log · access_log"]
        FAISS["FAISS IndexFlatIP<br/>(numpy fallback)"]
    end

    CC -->|JSON-RPC stdio| MCP
    AG -->|HTTP| REST
    DB -->|HTTP| REST
    MCP --> STORE
    MCP --> RET
    REST --> STORE
    REST --> RET
    REST --> COMP
    STORE --> SQL
    STORE --> EMB
    RET --> FAISS
    RET --> EMB
    COMP -->|Groq llama-3.1-8b-instant| STORE
    COMP -->|Mistral mistral-small| STORE
    DECAY -->|asyncio background task| STORE

The Four Memory Types

| Type | Icon | Description | Real-world Analogy | Decay Speed |
| --- | --- | --- | --- | --- |
| episodic | 🕐 | Time-stamped events | "I talked to Alice about X yesterday" | Fast (×1.0) |
| semantic | 📚 | Permanent facts | "Alice is a Python engineer at Google" | Medium-slow (×2.0) |
| procedural | ⚙️ | Skills & how-tos | "To deploy FastAPI: uvicorn main:app..." | Slowest (×3.0) |
| preference | ❤️ | User patterns | "Alice prefers concise bullet-point answers" | Slow (×2.5) |

Each type has a tuned stability multiplier in the Ebbinghaus forgetting curve, so skills outlast events and facts outlast episodes, much like human memory.


Hybrid Retrieval Pipeline

flowchart LR
    Q["🔍 Query"] --> EMB["Embed with\nbge-large-en-v1.5"]
    EMB --> FAISS["FAISS / numpy\ncosine similarity"]
    FAISS --> SEM["Semantic\nScore (×0.5)"]

    Q --> TIME["Time since\nlast access"]
    TIME --> REC["Recency\nScore (×0.3)"]

    Q --> IMP["importance field\n+ access_count boost"]
    IMP --> IMPS["Importance\nScore (×0.2)"]

    SEM --> BLEND["Weighted\nBlend"]
    REC --> BLEND
    IMPS --> BLEND
    BLEND --> RANK["Re-rank & Return\nTop-K Results"]
    RANK --> LOG["Increment\naccess_count"]

Final score formula:

score = 0.5 × cosine_sim  +  0.3 × exp(-λ·days)  +  0.2 × (importance + min(0.02·accesses, 0.3))

Weights are configurable per-query via semantic_weight, recency_weight, importance_weight.
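
As a rough illustration of the formula above, the blend can be written as a small pure function. This is a sketch, not the project's Retriever: the function name and argument names are hypothetical, and it assumes that λ here is the same value as DECAY_RATE from the configuration (0.1).

```python
import math


def hybrid_score(
    cosine_sim: float,
    days_since_access: float,
    importance: float,
    access_count: int,
    *,
    decay_rate: float = 0.1,        # λ; assumed to match DECAY_RATE
    semantic_weight: float = 0.5,
    recency_weight: float = 0.3,
    importance_weight: float = 0.2,
) -> float:
    """Blend semantic, recency, and importance signals into one score."""
    recency = math.exp(-decay_rate * days_since_access)
    # The access boost grows by 0.02 per access and is capped at 0.3.
    importance_term = importance + min(0.02 * access_count, 0.3)
    return (
        semantic_weight * cosine_sim
        + recency_weight * recency
        + importance_weight * importance_term
    )


fresh = hybrid_score(0.82, days_since_access=0, importance=0.8, access_count=0)
stale = hybrid_score(0.82, days_since_access=30, importance=0.8, access_count=0)
print(f"fresh={fresh:.3f} stale={stale:.3f}")
```

With identical similarity and importance, the only difference between the two calls is the recency term, which is why the 30-day-old memory ranks lower.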


Hierarchical Compression

Memories are automatically compressed on a nightly schedule to keep the store lean and token-efficient:

Day 1-7     [Fresh memories: full content stored]
              │
              ▼  Tier 1 (age > 7 days, not recently accessed)
Week 1+     [llama-3.1-8b-instant via Groq]
              "Compress to 2-3 sentences preserving key facts"
              → original preserved in compression_log
              → is_compressed = True
              │
              ▼  Tier 2 (cluster merge, ≥ 3 episodic memories)
              [mistral-small-latest via Mistral AI]
              "Synthesize N episodic memories → 1 semantic memory"
              → source memories marked is_compressed
              → new semantic memory created with importance = 0.8

The compression_log table records every compression event with the original content, timestamp, and model used, making compression fully auditable and reversible.
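
The Tier 1 selection rule (older than 7 days, not recently accessed, not already compressed) might be sketched as below. The row shape, field names, and the one-day "recently accessed" window are assumptions made for illustration; the README does not specify them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class MemoryRow:
    # Hypothetical row shape; field names are illustrative only.
    id: str
    created_at: datetime
    last_accessed_at: datetime
    is_compressed: bool = False


def tier1_candidates(
    rows: list[MemoryRow],
    now: datetime,
    *,
    compress_age_days: float = 7.0,   # mirrors COMPRESS_AGE_DAYS
    recent_access_days: float = 1.0,  # assumed meaning of "recently accessed"
) -> list[MemoryRow]:
    """Return uncompressed memories that are old enough and not recently read."""
    age_cutoff = now - timedelta(days=compress_age_days)
    access_cutoff = now - timedelta(days=recent_access_days)
    return [
        r for r in rows
        if not r.is_compressed
        and r.created_at < age_cutoff
        and r.last_accessed_at < access_cutoff
    ]


now = datetime(2026, 6, 15, tzinfo=timezone.utc)
rows = [
    MemoryRow("old-idle", now - timedelta(days=10), now - timedelta(days=9)),
    MemoryRow("old-hot", now - timedelta(days=10), now - timedelta(hours=2)),
    MemoryRow("fresh", now - timedelta(days=2), now - timedelta(days=2)),
    MemoryRow("done", now - timedelta(days=30), now - timedelta(days=20), True),
]
print([r.id for r in tier1_candidates(rows, now)])  # ['old-idle']
```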


The Ebbinghaus Forgetting Curve

Each memory has a decay_score updated every 6 hours by the background DecayEngine:

$$R = e^{-t/S}$$

Where:

  • R = retention score ∈ [0, 1]
  • t = days since last access
  • S = stability = (1/λ) × type_multiplier × (1 + importance) × (1 + log(1 + accesses) × 0.5)

Retention
   1.0 ─
   0.9 ─·····  ← procedural (skills, multiplier=3.0)
   0.8 ─    ·····  ← preference (patterns, multiplier=2.5)
   0.7 ─        ·····  ← semantic (facts, multiplier=2.0)
   0.5 ─              ·····
   0.3 ─                   ·····  ← episodic (events, multiplier=1.0)
   0.1 ─                         ·····
   0.0 ───────────────────────────────── days
       0    7    14   30   60   90

High-importance, frequently accessed memories gain extra stability, so your "important things" stick around.
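
The retention formula can be tried out directly. The sketch below is a literal transcription of the definitions of R and S given above, under two assumptions: log is the natural logarithm, and λ defaults to 0.1 as in the sample configuration. The function names are hypothetical, and the exact figures depend on those assumptions.

```python
import math

# Stability multipliers as listed in the memory-type table.
TYPE_MULTIPLIER = {
    "episodic": 1.0,
    "semantic": 2.0,
    "preference": 2.5,
    "procedural": 3.0,
}


def stability(memory_type: str, importance: float, accesses: int,
              decay_rate: float = 0.1) -> float:
    """S = (1/λ) × type_multiplier × (1 + importance) × (1 + log(1 + accesses) × 0.5)."""
    return (
        (1.0 / decay_rate)
        * TYPE_MULTIPLIER[memory_type]
        * (1.0 + importance)
        * (1.0 + math.log(1 + accesses) * 0.5)  # natural log is an assumption
    )


def retention(days_since_access: float, memory_type: str, importance: float,
              accesses: int, decay_rate: float = 0.1) -> float:
    """R = exp(-t / S), a value in [0, 1]."""
    s = stability(memory_type, importance, accesses, decay_rate)
    return math.exp(-days_since_access / s)


for kind in TYPE_MULTIPLIER:
    print(f"{kind:<11} R(30 days) = {retention(30, kind, 0.5, 0):.3f}")
```

For an episodic memory with importance 0.5 and no accesses, S works out to 15 days, so retention after 30 days is exp(-2), roughly 0.135.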


Quick Start

# 1. Clone & install
git clone https://github.com/0DevDutt0/MemoryMesh.git
cd MemoryMesh
pip install -e ".[dev]"

# 2. Configure
cp .env.example .env
# Add your GROQ_API_KEY and MISTRAL_API_KEY

# 3. Start the REST API
uvicorn memorymesh.api.main:app --reload
# → http://localhost:8000/docs (interactive Swagger UI)

# 4. Start the MCP server (for Claude Code / Cursor)
python -m memorymesh.mcp.server

# 5. Launch the dashboard
streamlit run dashboard/app.py
# → http://localhost:8501

Store & Retrieve in 10 Lines

import httpx, asyncio

API = "http://localhost:8000/v1"

async def main():
    async with httpx.AsyncClient() as c:
        # Store a fact about the user
        await c.post(f"{API}/memories/", json={
            "content": "User prefers FastAPI over Flask for async workloads.",
            "agent_id": "my-agent",
            "memory_type": "preference",
            "importance": 0.9,
        })

        # Retrieve it semantically
        results = (await c.post(f"{API}/memories/search", json={
            "query": "what web framework does the user prefer?",
            "agent_id": "my-agent",
            "k": 3,
        })).json()

        for r in results:
            print(f"[{r['rank']}] {r['score']:.3f} - {r['memory']['content']}")

asyncio.run(main())

Output:

[1] 0.874 - User prefers FastAPI over Flask for async workloads.

Demo β€” See It In Action

Start the API server, then run the end-to-end demo in one command:

# Terminal 1: start the REST API
uvicorn memorymesh.api.main:app --reload

# Terminal 2: full lifecycle demo (store · search · graph · update · stats)
python Demo/quickstart.py

Quickstart Terminal Output

┌────────────────────┐
│  1. Health Check   │
└────────────────────┘
{"status": "ok", "ts": "2026-06-15T14:00:00.123456"}

┌──────────────────────────┐
│  2. Storing 7 Memories   │
└──────────────────────────┘
  ✓ [semantic     ] id=a1b2c3d4…  importance=0.95
  ✓ [episodic     ] id=3f8a1c2d…  importance=0.90
  ✓ [procedural   ] id=b2c3d4e5…  importance=0.85
  ✓ [preference   ] id=c3d4e5f6…  importance=0.80
  ✓ [semantic     ] id=d4e5f6a7…  importance=0.75
  ✓ [episodic     ] id=e5f6a7b8…  importance=0.70
  ✓ [procedural   ] id=f6a7b8c9…  importance=0.80

┌──────────────────────────────────────────────────────┐
│  4. Semantic Search - 'user communication style'     │
└──────────────────────────────────────────────────────┘
  #1 score=0.8214  [preference]
     The user prefers concise answers, dark mode UI, Python over JavaScript…
  #2 score=0.7123  [semantic]
     Devdutt S is a software engineer from Kochi, Kerala. GitHub: 0DevDutt0…

┌──────────────────────────────────────────────┐
│  6. Memory Graph (threshold=0.7)             │
└──────────────────────────────────────────────┘
  Nodes: 7  |  Edges: 3
  Edge  weight=0.8123  a1b2c3d4… ↔ 3f8a1c2d…
  Edge  weight=0.7891  b2c3d4e5… ↔ f6a7b8c9…
  Edge  weight=0.7234  d4e5f6a7… ↔ a1b2c3d4…

┌──────────────────────┐
│  9. Final Stats      │
└──────────────────────┘
  Total memories : 7
  Compressed     : 0
  Avg decay score: 0.9981
  By type        : {semantic: 2, episodic: 2, procedural: 2, preference: 1}

✅  Demo complete - all operations succeeded.

Store → Search JSON Round-Trip

The core pattern: store a memory, then retrieve it semantically, with no keyword overlap required.

1. Store a preference:

curl -X POST http://localhost:8000/v1/memories/ \
  -H "Content-Type: application/json" \
  -d '{
    "content": "The user prefers concise answers, dark mode UI, Python over JavaScript, asyncio over threading.",
    "agent_id": "demo-agent",
    "memory_type": "preference",
    "importance": 0.8
  }'
{
  "id": "c3d4e5f6-a7b8-9012-cdef-333444555666",
  "agent_id": "demo-agent",
  "memory_type": "preference",
  "content": "The user prefers concise answers, dark mode UI, Python over JavaScript, asyncio over threading.",
  "importance": 0.8,
  "access_count": 0,
  "decay_score": 1.0,
  "is_compressed": false,
  "created_at": "2026-06-15T14:04:00.000000"
}

2. Search with a differently worded query (almost no keyword overlap):

curl -X POST http://localhost:8000/v1/memories/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "what coding language does the user enjoy and how do they like responses formatted?",
    "agent_id": "demo-agent",
    "k": 3
  }'
[
  {
    "memory": {
      "id": "c3d4e5f6-a7b8-9012-cdef-333444555666",
      "memory_type": "preference",
      "content": "The user prefers concise answers, dark mode UI, Python over JavaScript, asyncio over threading.",
      "importance": 0.8,
      "access_count": 1,
      "decay_score": 0.9981
    },
    "score": 0.821406,
    "rank": 1
  }
]

The query "what coding language does the user enjoy…" matched with score 0.821 even though it shares no keywords with the stored memory beyond "the user". The match is driven by cosine similarity in the 1024-dimensional bge-large embedding space.
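
The semantic half of that match is a nearest-neighbour search over normalized vectors. A minimal numpy-only sketch of the idea follows. It is in the spirit of the numpy fallback mentioned in the architecture, not the project's actual code: the function name is hypothetical and random vectors stand in for real bge-large embeddings.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, matrix: np.ndarray, k: int = 3):
    """Return (row_index, cosine_similarity) pairs for the k closest rows.

    Assumes no vector has zero norm.
    """
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    sims = m @ q  # inner product of unit vectors equals cosine similarity
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]


rng = np.random.default_rng(0)
stored = rng.standard_normal((5, 1024)).astype(np.float32)
# A query that is a slightly noisy copy of row 2.
query = stored[2] + 0.1 * rng.standard_normal(1024).astype(np.float32)
hits = top_k_cosine(query, stored, k=2)
print(hits)
```

Because both sides are L2-normalized first, the same inner-product search is what an index such as FAISS IndexFlatIP computes when it is fed unit vectors.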


Ebbinghaus Decay In Action

The same memory stored under different types and access patterns decays at very different rates:

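| Days since access | Memory Type | Importance | Accesses | Retention |
| --- | --- | --- | --- | --- |
| 0 | any | any | any | 1.000 |
| 7 | episodic | 0.5 | 0 | 0.703 |
| 7 | semantic | 0.5 | 0 | 0.837 |
| 30 | episodic | 0.5 | 0 | 0.135 |
| 30 | procedural | 0.9 | 10 | 0.912 |
| 90 | semantic | 0.9 | 20 | 0.831 |
| 90 | episodic | 0.3 | 1 | 0.003 |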

A frequently-accessed skill (procedural, importance=0.9, 10 accesses) retains 91% after 30 days.
A one-off low-importance event (episodic, 1 access) fades to 0.3% after 90 days, modelling human forgetting mathematically.


Demo Files

| File | What's inside |
| --- | --- |
| Demo/quickstart.py | End-to-end Python script: stores 7 memories across all four types, semantic search, type-filtered search, memory graph, update, list, and stats |
| Demo/SAMPLE_INPUTS.md | 15 annotated curl examples: health check, all 4 memory types, batch store, semantic search, type-filtered search, get/update/delete, graph, stats, tier-1 & tier-2 compression, MCP tool calls |
| Demo/sample_outputs.json | Canonical JSON responses for every operation, useful as an API contract reference or test fixture |

API Reference

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v1/memories/ | Store a single memory |
| POST | /v1/memories/search | Hybrid semantic search |
| POST | /v1/memories/batch | Batch store (up to 100) |
| GET | /v1/memories/{id} | Get memory by ID |
| PATCH | /v1/memories/{id} | Update content / importance / metadata |
| DELETE | /v1/memories/{id} | Delete permanently |
| GET | /v1/memories/agent/{id} | List all memories for an agent |
| POST | /v1/memories/agent/{id}/graph | Semantic similarity graph (for viz) |
| GET | /v1/stats | Global memory statistics |
| POST | /v1/compress/trigger | Run auto-compression now |
| POST | /v1/compress/memory/{id}/tier1 | Compress single memory (Groq) |
| POST | /v1/compress/agent/{id}/tier2 | Cluster merge (Mistral) |
| GET | /v1/compress/log | Compression audit history |
| GET | /health | Liveness probe |

Full interactive docs at http://localhost:8000/docs (Swagger UI) and /redoc (ReDoc).
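
Since the batch endpoint accepts at most 100 memories per call, a client holding more than that has to split its payload first. The helper below is a hypothetical client-side sketch. The memory dicts reuse the field names from the single-store example, but how the batch request body wraps the list is not shown in this README, so check the Swagger UI before sending.

```python
from itertools import islice
from typing import Iterable, Iterator

MAX_BATCH = 100  # limit stated in the endpoint table


def batches(memories: Iterable[dict], size: int = MAX_BATCH) -> Iterator[list[dict]]:
    """Yield consecutive chunks of at most `size` memories."""
    it = iter(memories)
    while chunk := list(islice(it, size)):
        yield chunk


memories = [
    {
        "content": f"note {i}",
        "agent_id": "my-agent",
        "memory_type": "episodic",
        "importance": 0.5,
    }
    for i in range(250)
]
sizes = [len(chunk) for chunk in batches(memories)]
print(sizes)  # [100, 100, 50]
```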


MCP Integration

Add to your Claude Code / Cursor MCP config:

{
  "mcpServers": {
    "memorymesh": {
      "command": "python",
      "args": ["-m", "memorymesh.mcp.server"],
      "cwd": "/path/to/MemoryMesh"
    }
  }
}
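
If the MCP config file already lists other servers, the entry should be merged in rather than pasted over the file. A small sketch of that merge is below. The helper name is hypothetical, and where the config file lives depends on the client, so the example works on an in-memory dict only.

```python
import json


def add_memorymesh_server(config: dict, repo_path: str) -> dict:
    """Return a copy of `config` with the memorymesh entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep any existing servers
    servers["memorymesh"] = {
        "command": "python",
        "args": ["-m", "memorymesh.mcp.server"],
        "cwd": repo_path,
    }
    merged["mcpServers"] = servers
    return merged


existing = {"mcpServers": {"other-tool": {"command": "node", "args": ["server.js"]}}}
merged = add_memorymesh_server(existing, "/path/to/MemoryMesh")
print(json.dumps(merged, indent=2))
```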

7 tools exposed to the LLM:

| Tool | What it does |
| --- | --- |
| store_memory | Save a new memory (type + importance + metadata) |
| retrieve_memories | Semantic search with optional type filter |
| delete_memory | Hard delete by ID |
| update_memory | Edit content / importance / metadata in-place |
| list_memories | Browse all memories for an agent |
| get_memory_stats | Token-efficient stats snapshot |
| compress_agent_memories | Trigger cluster merge for an agent |

Streamlit Dashboard

Four interactive pages accessible at http://localhost:8501:

| Page | What you see |
| --- | --- |
| 🔍 Search Explorer | Live hybrid retrieval with weight sliders · Store new memories |
| 🕸️ Memory Graph | pyvis semantic network · colour-coded by type · edge weight = cosine similarity |
| 🗜️ Compression Monitor | Timeline of compressions · Token savings · Manual trigger |
| 📉 Decay Visualizer | Interactive Plotly retention curves · Adjust λ, importance, access count |
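
The Memory Graph draws an edge wherever two memories are similar enough. One plausible way to derive those edges is sketched below, under the assumption that an edge exists when pairwise cosine similarity meets a threshold (the demo output uses threshold=0.7). The function name and the toy 2-dimensional vectors are illustrative only.

```python
from itertools import combinations

import numpy as np


def similarity_edges(ids: list[str], embeddings: np.ndarray, threshold: float = 0.7):
    """Return (id_a, id_b, weight) for every pair at or above the threshold."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T  # full pairwise cosine-similarity matrix
    return [
        (ids[i], ids[j], float(sims[i, j]))
        for i, j in combinations(range(len(ids)), 2)
        if sims[i, j] >= threshold
    ]


ids = ["a", "b", "c"]
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
edges = similarity_edges(ids, vectors)
print(edges)
```

Only the pair (a, b) clears the threshold here; c is orthogonal to a and nearly orthogonal to b, so it stays an isolated node.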

Tech Stack

| Layer | Technology | Why |
| --- | --- | --- |
| Language | Python 3.13 | Async-native, type hints, StrEnum |
| API | FastAPI + uvicorn | Auto-docs, async, Pydantic v2 validation |
| Database | aiosqlite (SQLite WAL) | Zero-dependency, async, ACID, BLOB storage |
| Embeddings | BAAI/bge-large-en-v1.5 | Open-source embedding model (1024-dim, MTEB top-5) |
| Vector Search | FAISS IndexFlatIP | Exact inner-product search (cosine on normalized vectors), optional numpy fallback |
| LLM Tier-1 | Groq llama-3.1-8b-instant | Sub-second summarisation, free tier |
| LLM Tier-2 | Mistral mistral-small | High-quality cluster synthesis |
| MCP | mcp SDK 1.27 | stdio JSON-RPC, works with Claude Code / Cursor |
| Dashboard | Streamlit + pyvis + Plotly | Interactive memory exploration |
| Testing | pytest-asyncio, MagicMock | 51 tests, in-memory SQLite, no GPU in CI |
| CI | GitHub Actions | lint (ruff) + test matrix on ubuntu |

Testing

pytest tests/ -v

✓ test_compressor.py   8 tests  - LLM compression (mocked Groq + Mistral clients)
✓ test_decay.py       13 tests  - Ebbinghaus formula + DecayEngine lifecycle
✓ test_retriever.py   14 tests  - Search, filters, agent isolation, graph
✓ test_store.py       16 tests  - CRUD, access tracking, embedding round-trip
─────────────────────────────────
51 passed in 0.63s

Key design decisions:

  • No GPU required in CI: Embedder is mocked with a deterministic hash-seeded numpy vector
  • No real LLM calls in tests: AsyncGroq and Mistral clients are patched via unittest.mock
  • Isolated databases: every test fixture uses aiosqlite :memory:, with automatic teardown
  • asyncio_mode = "auto": all async def tests run automatically without @pytest.mark.asyncio
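
A deterministic hash-seeded mock embedder of the kind described in the first bullet could look roughly like this. It is a sketch of the idea, not the repository's test fixture. The function name is hypothetical, and hashlib is used instead of the built-in hash() because the latter is salted per process and would not be reproducible across runs.

```python
import hashlib

import numpy as np

DIM = 1024  # matches the bge-large-en-v1.5 embedding size


def fake_embed(text: str, dim: int = DIM) -> np.ndarray:
    """Map text to a reproducible unit vector without loading any model."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")  # stable across processes
    vec = np.random.default_rng(seed).standard_normal(dim).astype(np.float32)
    return vec / np.linalg.norm(vec)


a = fake_embed("User prefers FastAPI")
b = fake_embed("User prefers FastAPI")
c = fake_embed("Something else entirely")
print(np.array_equal(a, b), np.array_equal(a, c))
```

The vectors carry no semantic meaning, so such a mock suits tests of storage, ranking mechanics, and agent isolation rather than tests of retrieval quality.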

Configuration

Copy .env.example to .env and fill in your keys:

# Required for compression
GROQ_API_KEY=gsk_...
MISTRAL_API_KEY=...

# Tunable parameters
EMBEDDING_MODEL=BAAI/bge-large-en-v1.5   # swap for lighter all-MiniLM-L6-v2
DECAY_RATE=0.1                            # λ in the forgetting curve
COMPRESS_AGE_DAYS=7.0                     # memories older than this get tier-1 compressed
COMPRESS_CLUSTER_SIZE=20                  # episodic memories per tier-2 merge
DECAY_RUN_INTERVAL_HOURS=6.0             # background decay update frequency

Project Layout

memorymesh/
├── core/           config · logging · aiosqlite database
├── memory/         types · embedder · store · retriever · decay · compressor
├── schemas/        Pydantic request / response models
├── api/            FastAPI app · 3 routers (memories, compress, health)
└── mcp/            MCP stdio server (7 tools)
dashboard/          Streamlit 4-page UI
tests/              51 async tests, mock embedder, in-memory DB
Demo/               Runnable quickstart Β· curl examples Β· sample JSON
.github/workflows/  CI: lint (ruff) + pytest matrix

Why MemoryMesh?

| Feature | MemoryMesh | mem0 | Zep | ChromaDB alone |
| --- | --- | --- | --- | --- |
| Four cognitive memory types | ✅ | ❌ | ❌ | ❌ |
| MCP-native (Claude / Cursor) | ✅ | ❌ | ❌ | ❌ |
| Ebbinghaus forgetting curve | ✅ | ❌ | ❌ | ❌ |
| Hierarchical LLM compression | ✅ | Partial | Partial | ❌ |
| REST API + dashboard | ✅ | ✅ | ✅ | ❌ |
| Zero infrastructure (SQLite) | ✅ | ❌ | ❌ | ✅ |
| Open source, no usage fees | ✅ | Partial | Partial | ✅ |

Built with curiosity and production instincts by Devdutt S · Kochi, India
