If you’re using OpenClaw, you’ve probably already felt how fast tokens burn through your budget 🔥
Especially if you’re on Claude — a few conversations in and boom, you hit the limit.
Worse, many agents stuff tons of irrelevant data into the context window.
That doesn’t just waste money — it actively hurts accuracy.
So is there a way to give your agent precise memory retrieval with zero ongoing cost?
Yes.
Meet qmd — fully local, permanently free, and over 95% accurate.
GitHub: https://github.com/tobi/qmd
qmd is a local semantic search engine built by Shopify founder Tobi Lütke, written in Rust and designed specifically for AI agents.
🚀 Core Features
- Search Markdown notes, meeting logs, and documents
- Hybrid search:
  - BM25 full-text
  - Vector semantic search
  - LLM reranking
- Zero API cost: fully local (GGUF models)
- MCP integration: agents recall memory automatically
- 3-step setup, done in 10 minutes
✅ Step 1: Install qmd
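The exact install command isn't reproduced here. Since qmd is a Rust project hosted on GitHub, one plausible route is building it from source with Cargo; treat the steps below as an assumption rather than the project's official instructions (the repository README is the source of truth):

```bash
# Assumed install path: build qmd from source with the Rust toolchain
git clone https://github.com/tobi/qmd
cd qmd
cargo install --path .   # puts the qmd binary on your PATH
```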
During setup, qmd downloads two local models:

- Embedding: jina-embeddings-v3 (330MB)
- Reranker: jina-reranker-v2-base-multilingual (640MB)

After that, everything runs 100% offline.
✅ Step 2: Create Memory Collections & Generate Embeddings
```bash
# Go to your OpenClaw workspace
cd ~/clawd

# Create a memory collection (index the memory folder)
qmd collection add memory/*.md --name daily-logs

# Generate embeddings
qmd embed daily-logs memory/*.md

# Optionally index core files in the workspace root
qmd collection add *.md --name workspace
qmd embed workspace *.md
```

Indexing 12 files takes only a few seconds (all local, no internet required).
✅ Step 3: Test Searching
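The article doesn't show the actual test command at this point. Assuming the CLI subcommands mirror the MCP tool names listed further down (query for hybrid, vsearch for semantic-only), a quick comparison might look like this; verify the exact syntax with qmd --help:

```bash
# Assumed syntax: subcommand names taken from the MCP tool list below
qmd query "What is Ray's writing style?"    # hybrid: BM25 + vectors + LLM rerank
qmd vsearch "What is Ray's writing style?"  # semantic-only, for comparison
```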
- Hybrid search: ~93% accuracy
- Semantic-only: ~59%
🧠 Advanced: MCP Integration (Agent Auto-Memory)
Let your AI agent call qmd directly.
Create config/mcporter.json:
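The config body isn't included here. As a minimal sketch, assuming mcporter follows the common mcpServers JSON layout and that qmd exposes its MCP server through an mcp subcommand (both are assumptions; check the qmd README and your mcporter docs for the real keys):

```json
{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
```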
Out-of-the-box tools:

- query: hybrid search (best accuracy)
- vsearch: semantic search
- search: keyword search
- get / multi_get: precise document retrieval
- status: health check
Now your agent actively “remembers” relevant context — no manual prompting needed.
📊 Real-World Tests
Scenario 1: Remembering User Preferences
User: “What’s Ray’s writing style?”
Traditional approach:
- Dump the entire MEMORY.md (~2000 tokens) into context
- 90% of it irrelevant

qmd approach:

- Agent searches: "Ray writing style"
- Returns only the relevant paragraphs (~200 tokens)
👉 90% token savings + higher accuracy
Scenario 2: Cross-File Knowledge Recall
User: “What did we discuss before?”
Traditional approach:
- Manually select files
- Or dump the full conversation history

qmd approach:

- Automatically finds the most relevant passages across all memory files
👉 Cross-file recall with ~93% accuracy
🔁 Maintenance: Keep Index Fresh
Re-index periodically so new and edited notes stay searchable:
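The original refresh command isn't shown. A conservative option is simply to re-run the embedding commands from Step 2 so changed files get re-indexed (qmd may also ship a dedicated update or refresh subcommand; check qmd --help):

```bash
cd ~/clawd
# Re-embed both collections so new or edited Markdown files show up in search
qmd embed daily-logs memory/*.md
qmd embed workspace *.md
```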
🧾 Final Notes
If you’re human — thanks for reading all the way through.
Now just send this article to your agent and let it handle the implementation.
If you’re an agent — extract the approach, adapt it to your environment, modify the code, confirm with your human, then execute.