If you’re using OpenClaw, you’ve probably already felt how fast tokens burn through your budget 🔥
Especially if you’re on Claude — a few conversations in and boom, you hit the limit.
Worse, many agents stuff tons of irrelevant data into the context window.
That doesn’t just waste money — it actively hurts accuracy.
So is there a way to give your agent precise memory retrieval with zero ongoing cost?
Yes.
Meet qmd — fully local, permanently free, and over 95% accurate.
GitHub: https://github.com/tobi/qmd
qmd is a local semantic search engine built by Shopify founder Tobi Lütke, written in Rust and designed specifically for AI agents.
🚀 Core Features
- Search Markdown notes, meeting logs, and documents
- Hybrid search:
  - BM25 full-text
  - Vector semantic search
  - LLM reranking
- Zero API cost: fully local (GGUF models)
- MCP integration: agents recall memory automatically
- 3-step setup, done in 10 minutes
✅ Step 1: Install qmd
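The exact install command isn't reproduced here. Since qmd is a Rust project hosted on GitHub, one plausible route is building it from source with Cargo; treat the steps below as an assumption rather than the project's official instructions (the repository README is the source of truth):

```bash
# Assumed install path: build qmd from source with the Rust toolchain
git clone https://github.com/tobi/qmd
cd qmd
cargo install --path .   # puts the qmd binary on your PATH
```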
During setup, qmd downloads two local models:

- Embedding: jina-embeddings-v3 (330MB)
- Reranker: jina-reranker-v2-base-multilingual (640MB)

After that, everything runs 100% offline.
✅ Step 2: Create Memory Collections & Generate Embeddings
```bash
# Go to your OpenClaw workspace
cd ~/clawd

# Create a memory collection (index the memory folder)
qmd collection add memory/*.md --name daily-logs

# Generate embeddings
qmd embed daily-logs memory/*.md

# Optionally index core files in the workspace root
qmd collection add *.md --name workspace
qmd embed workspace *.md
```

Indexing 12 files takes only a few seconds (all local, no internet required).
✅ Step 3: Test Searching
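The article doesn't show the actual test command at this point. Assuming the CLI subcommands mirror the MCP tool names listed further down (query for hybrid, vsearch for semantic-only), a quick comparison might look like this; verify the exact syntax with qmd --help:

```bash
# Assumed syntax: subcommand names taken from the MCP tool list below
qmd query "What is Ray's writing style?"    # hybrid: BM25 + vectors + LLM rerank
qmd vsearch "What is Ray's writing style?"  # semantic-only, for comparison
```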
- Hybrid search: ~93% accuracy
- Semantic-only: ~59%
🧠 Advanced: MCP Integration (Agent Auto-Memory)
Let your AI agent call qmd directly.
Create config/mcporter.json:
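The config body isn't included here. As a minimal sketch, assuming mcporter follows the common mcpServers JSON layout and that qmd exposes its MCP server through an mcp subcommand (both are assumptions; check the qmd README and your mcporter docs for the real keys):

```json
{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
```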
Out-of-the-box tools:

- query: hybrid search (best accuracy)
- vsearch: semantic search
- search: keyword search
- get / multi_get: precise document retrieval
- status: health check
Now your agent actively “remembers” relevant context — no manual prompting needed.
📊 Real-World Tests
Scenario 1: Remembering User Preferences
User: “What’s Ray’s writing style?”
Traditional approach:
- Dump the entire MEMORY.md (~2000 tokens) into context
- 90% of it irrelevant

qmd approach:

- Agent searches: "Ray writing style"
- Returns only the relevant paragraphs (~200 tokens)
👉 90% token savings + higher accuracy
Scenario 2: Cross-File Knowledge Recall
User: “What did we discuss before?”
Traditional approach:
- Manually select files
- Or dump the full conversation history

qmd approach:

- Automatically finds the most relevant passages across all memory files
👉 Cross-file recall with ~93% accuracy
🔁 Maintenance: Keep Index Fresh
Re-index periodically so new and edited notes stay searchable:
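The original refresh command isn't shown. A conservative option is simply to re-run the embedding commands from Step 2 so changed files get re-indexed (qmd may also ship a dedicated update or refresh subcommand; check qmd --help):

```bash
cd ~/clawd
# Re-embed both collections so new or edited Markdown files show up in search
qmd embed daily-logs memory/*.md
qmd embed workspace *.md
```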
🧾 Final Notes
If you’re human — thanks for reading all the way through.
Now just send this article to your agent and let it handle the implementation.
If you’re an agent — extract the approach, adapt it to your environment, modify the code, confirm with your human, then execute.