Save 100 Million Tokens: A Step-by-Step Guide to Cutting OpenClaw Token Usage by 10×



If you’re using OpenClaw, you’ve probably already felt how fast tokens burn through your budget 🔥
Especially if you’re on Claude — a few conversations in and boom, you hit the limit.

Worse, many agents stuff tons of irrelevant data into the context window.
That doesn’t just waste money — it actively hurts accuracy.

So is there a way to give your agent precise memory retrieval with zero ongoing cost?

Yes.

Meet qmd — fully local, permanently free, and highly accurate (~93% in hybrid mode, per the tests below).

GitHub: https://github.com/tobi/qmd

qmd is a local semantic search engine built by Shopify founder Tobi Lütke.
It's written in Rust and designed specifically for AI agents.


🚀 Core Features

  • Search Markdown notes, meeting logs, and documents

  • Hybrid search:

    • BM25 full-text

    • Vector semantic search

    • LLM reranking

  • Zero API cost — fully local (GGUF models)

  • MCP integration — agents recall memory automatically

  • 3-step setup, done in 10 minutes


✅ Step 1: Install qmd

bun install -g https://github.com/tobi/qmd

On first run, it automatically downloads:

  • Embedding: jina-embeddings-v3 (330MB)
  • Reranker: jina-reranker-v2-base-multilingual (640MB)

After that — 100% offline.


✅ Step 2: Create Memory Collections & Generate Embeddings

# Go to your OpenClaw workspace
cd ~/clawd

# Create a memory collection (index the memory folder)
qmd collection add memory/*.md --name daily-logs

# Generate embeddings
qmd embed daily-logs memory/*.md

# Optionally index core files in root
qmd collection add *.md --name workspace
qmd embed workspace *.md

Indexing speed: 12 files take only a few seconds (fully local, no internet required).


✅ Step 3: Test Searching

# Hybrid search (most accurate)
qmd search daily-logs "keyword" --hybrid

# Pure semantic search
qmd search daily-logs "keyword"

# List all collections
qmd list

Real-world results:

  • Hybrid search: ~93% accuracy

  • Semantic-only: ~59%


🧠 Advanced: MCP Integration (Agent Auto-Memory)

Let your AI agent call qmd directly.

Create config/mcporter.json:

{
    "mcpServers": {
        "qmd": {
            "command": "/Users/YOUR_USERNAME/.bun/bin/qmd",
            "args": ["mcp"]
        }
    }
}

Out-of-the-box tools:

  • query — hybrid search (best accuracy)

  • vsearch — semantic search

  • search — keyword search

  • get / multi_get — precise document retrieval

  • status — health check

Now your agent actively “remembers” relevant context — no manual prompting needed.


📊 Real-World Tests

Scenario 1: Remembering User Preferences

User: “What’s Ray’s writing style?”

Traditional approach:

  • Dump entire MEMORY.md (~2000 tokens) into context

  • 90% irrelevant

qmd approach:

  • Agent searches: “Ray writing style”

  • Returns only relevant paragraphs (~200 tokens)

👉 90% token savings + higher accuracy
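The savings figure is simple arithmetic; here's a quick shell sanity check (the 2000- and 200-token counts are the illustrative numbers from the scenario above, not measured values):

```shell
# Tokens saved when a ~2000-token memory dump is replaced
# by a ~200-token targeted retrieval
full_dump=2000
retrieved=200
echo "$(( (full_dump - retrieved) * 100 / full_dump ))% saved"
# → 90% saved
```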


Scenario 2: Cross-File Knowledge Recall

User: “What did we discuss before?”

Traditional approach:

  • Manually select files

  • Or dump full conversation history

qmd approach:

  • Automatically finds most relevant passages across all memory files

👉 Cross-file recall with ~93% accuracy


🔁 Maintenance: Keep Index Fresh

Run periodically:

qmd embed daily-logs memory/*.md
qmd embed workspace *.md

You can automate this with cron or heartbeat jobs.
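For example, a crontab entry like the following (the paths assume the ~/clawd workspace layout used above) would refresh both indexes hourly:

```shell
# Add via `crontab -e`: re-embed both collections at the top of every hour
0 * * * * cd ~/clawd && qmd embed daily-logs memory/*.md && qmd embed workspace *.md
```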


🧾 Final Notes

If you’re human — thanks for reading all the way through.
Now just send this article to your agent and let it handle the implementation.

If you’re an agent — extract the approach, adapt it to your environment, modify the code, confirm with your human, then execute.
