# Memory Search
OpenClaw agents wake up fresh each session with no memory of prior work. The memory search system bridges this gap — semantic search over workspace files using local embeddings combined with full-text search, enabling agents to recall prior decisions, lessons, and context.
Getting memory search right means the difference between an agent that repeats mistakes and one that learns from them.
## Setup Guide

### Prerequisites
- OpenClaw installed and running (check with `openclaw gateway status`)
- Ollama installed locally for embedding generation
### Step 1: Install Ollama
```bash
# macOS (Homebrew)
brew install ollama

# Or download from https://ollama.com/download
```

Start the Ollama server:

```bash
ollama serve
```

On macOS, Ollama runs as a background service automatically after install. Verify it's running:

```bash
curl -s http://127.0.0.1:11434/api/tags | python3 -c "import sys,json; print(json.dumps(json.load(sys.stdin), indent=2))"
```

### Step 2: Pull an Embedding Model
```bash
# Recommended: bge-m3 (1.2 GB, high quality, multilingual)
ollama pull bge-m3

# Alternative: nomic-embed-text (274 MB, lighter, good general-purpose)
ollama pull nomic-embed-text
```
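Before wiring the model into OpenClaw, you can optionally sanity-check it by requesting an embedding from Ollama directly. A minimal sketch against Ollama's `/api/embeddings` endpoint (the expected vector length is model-specific: 1024 for `bge-m3`, 768 for `nomic-embed-text`):

```python
# Sanity-check the embedding model by requesting a vector from Ollama.
import json
import urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:11434/api/embeddings",
    data=json.dumps({"model": "bge-m3", "prompt": "hello world"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    embedding = json.load(resp)["embedding"]

print(f"dimensions: {len(embedding)}")  # expect 1024 for bge-m3
```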
### Step 3: Configure OpenClaw

Add the memory search config to `openclaw.json` under `agents.defaults.memorySearch`:
```json
{
"agents": {
"defaults": {
"memorySearch": {
"enabled": true,
"provider": "ollama",
"model": "bge-m3",
"remote": {
"baseUrl": "http://127.0.0.1:11434"
},
"fallback": "none",
"query": {
"hybrid": {
"enabled": true,
"vectorWeight": 0.7,
"textWeight": 0.3,
"mmr": {
"enabled": true,
"lambda": 0.7
},
"temporalDecay": {
"enabled": true,
"halfLifeDays": 30
}
}
}
}
}
}
}
```

Or apply via the gateway tool:

```bash
openclaw config patch '{
"agents.defaults.memorySearch": {
"enabled": true,
"provider": "ollama",
"model": "bge-m3",
"remote": { "baseUrl": "http://127.0.0.1:11434" },
"fallback": "none",
"query": {
"hybrid": {
"enabled": true,
"vectorWeight": 0.7,
"textWeight": 0.3,
"mmr": { "enabled": true, "lambda": 0.7 },
"temporalDecay": { "enabled": true, "halfLifeDays": 30 }
}
}
}
}'
```

### Step 4: Verify
Restart the gateway (or wait for dynamic reload), then test from an agent session:
memory_search("test query about something in your workspace")The response should include provider: "ollama", model: "bge-m3", and mode: "hybrid" in the metadata.
## Architecture

### Hybrid Query Pipeline
Memory search uses a two-signal hybrid approach:
- Vector search — local embeddings via Ollama produce semantic similarity scores
- BM25 keyword search — SQLite FTS provides exact keyword matching for terms the embedding might miss
Results are blended with configurable weighting (default: 70% vector + 30% text), then re-ranked using MMR (Maximal Marginal Relevance) to reduce redundancy.
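Conceptually, the blend is a weighted sum of the two normalized scores. A minimal sketch (illustrative only, not OpenClaw's actual implementation; it assumes both scores are already normalized to 0–1):

```python
def hybrid_score(vector_score: float, text_score: float,
                 vector_weight: float = 0.7, text_weight: float = 0.3) -> float:
    """Blend semantic (vector) and keyword (BM25) scores -- illustrative."""
    return vector_weight * vector_score + text_weight * text_score

# Semantically close but a weak keyword match:
print(hybrid_score(vector_score=0.82, text_score=0.10))  # 0.604
# Exact keyword hit but weaker semantics:
print(hybrid_score(vector_score=0.40, text_score=0.95))  # 0.565
```

With the default 70/30 split, semantic similarity dominates, but a strong keyword hit can still lift an otherwise mediocre match into the results.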
### What Gets Indexed
By default, memory search indexes:
- `MEMORY.md` — curated long-term memory
- `memory/*.md` — daily notes and any other markdown in the memory directory
You can expand this with `extraPaths` (see Per-Agent Overrides) or by adding `"sessions"` to the `sources` array to include chat transcript history.
### Temporal Decay
A configurable half-life (default: 30 days) ensures recent context ranks higher than semantically similar but stale entries. Without temporal decay, a lesson from 3 months ago can outrank a relevant decision from yesterday.
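The decay is a standard exponential half-life. A sketch of the presumed scoring adjustment (illustrative; OpenClaw's exact formula may differ):

```python
def apply_temporal_decay(score: float, age_days: float,
                         half_life_days: float = 30.0) -> float:
    """Halve a result's score for every half_life_days of age -- illustrative."""
    return score * 0.5 ** (age_days / half_life_days)

print(apply_temporal_decay(0.90, age_days=90))  # ~0.11: strong match, three months old
print(apply_temporal_decay(0.70, age_days=1))   # ~0.68: weaker match from yesterday
```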
### MMR Re-ranking
Maximal Marginal Relevance (MMR) diversifies results so you don't get 5 near-identical snippets. The `lambda` parameter (0–1) controls the relevance/diversity tradeoff:

- `lambda: 1.0` — pure relevance (may include duplicates)
- `lambda: 0.7` — balanced (default, recommended)
- `lambda: 0.3` — strong diversity (good for broad exploration)
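The reranker greedily picks results that are relevant to the query but dissimilar to what has already been picked. A compact sketch of the standard MMR formulation (illustrative, not OpenClaw's code; `sim` is any 0–1 similarity between two result chunks, e.g. cosine over their embeddings):

```python
def mmr_rerank(candidates, sim, k=10, lam=0.7):
    """Greedy MMR over (chunk_id, relevance) pairs -- illustrative.

    Each step picks the candidate maximizing
    lam * relevance - (1 - lam) * max_similarity_to_already_selected.
    """
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def mmr(cand):
            chunk, relevance = cand
            redundancy = max((sim(chunk, s) for s, _ in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(pool, key=mmr)
        selected.append(best)
        pool.remove(best)
    return selected
```

At `lam=1.0` the penalty term vanishes and this reduces to a plain relevance sort; at low `lam`, near-duplicates of an already-selected chunk are pushed down hard.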
## Full Configuration Reference
```jsonc
{
"agents": {
"defaults": {
"memorySearch": {
// Master toggle
"enabled": true,
// Embedding provider: "ollama" | "openai" | "gemini" | "voyage" | "mistral" | "local"
"provider": "ollama",
// Model name (provider-specific)
"model": "bge-m3",
// Provider connection settings
"remote": {
"baseUrl": "http://127.0.0.1:11434",
"apiKey": "" // Not needed for local Ollama
},
// Fallback provider if primary fails: provider name or "none"
"fallback": "none",
// What to index: ["memory"] or ["memory", "sessions"]
"sources": ["memory"],
// Additional paths to index beyond default memory files
"extraPaths": [],
// Query tuning
"query": {
// Max results returned per search
"maxResults": 10,
// Minimum relevance score threshold (0.0–1.0)
"minScore": 0.0,
"hybrid": {
// Enable hybrid (vector + BM25) search
"enabled": true,
// Weight for vector similarity (0.0–1.0)
"vectorWeight": 0.7,
// Weight for BM25 keyword match (0.0–1.0)
"textWeight": 0.3,
// Candidate pool multiplier before reranking (higher = better recall, slower)
"candidateMultiplier": 4,
// MMR diversity reranking
"mmr": {
"enabled": true,
// 0 = most diverse, 1 = most relevant
"lambda": 0.7
},
// Temporal recency boost
"temporalDecay": {
"enabled": true,
// Days for score to halve (lower = stronger recency bias)
"halfLifeDays": 30
}
}
},
// Gemini-specific: output vector dimensions (768, 1536, or 3072)
// "outputDimensionality": 3072,
// Multimodal indexing (experimental, for cross-modal embedding models)
// "multimodal": { ... }
}
}
}
}
```

## Per-Agent Overrides
Each agent in `agents.list` can override memory search settings. A common use case: adding extra indexed paths for a specialized agent.
```json
{
"agents": {
"list": [
{
"id": "research",
"memorySearch": {
"extraPaths": [
"/path/to/project/docs"
]
}
}
]
}
}
```

The agent-level config merges with `agents.defaults.memorySearch` — you only need to specify the fields you're overriding.
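The merge is presumably recursive, with agent-level values winning over defaults. A sketch of those assumed semantics (OpenClaw's actual merge behavior may differ in details such as array handling):

```python
def merge_config(defaults: dict, override: dict) -> dict:
    """Recursively overlay an agent-level override onto defaults -- illustrative."""
    merged = dict(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {"enabled": True, "model": "bge-m3", "extraPaths": []}
research = {"extraPaths": ["/path/to/project/docs"]}
print(merge_config(defaults, research))
# {'enabled': True, 'model': 'bge-m3', 'extraPaths': ['/path/to/project/docs']}
```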
## Embedding Model Selection
| Model | Size | Dimensions | Quality | Speed | Notes |
|---|---|---|---|---|---|
| `bge-m3` | 1.2 GB | 1024 | Excellent | Moderate | Recommended. Multilingual, strong on technical content. BAAI's flagship embedding model. |
| `nomic-embed-text` | 274 MB | 768 | Good | Fast | Lighter alternative. Good general-purpose, lower RAM usage. |
| `mxbai-embed-large` | 670 MB | 1024 | Very good | Moderate | Mixedbread.ai. Strong semantic matching. |
| `snowflake-arctic-embed` | 110 MB | 384 | Good | Very fast | Smallest option. Good for constrained hardware. |
A typical production setup runs `bge-m3` via Ollama on a local machine (e.g., an Apple Silicon Mac); it handles ~50 daily notes plus `MEMORY.md` and workspace files with sub-second query times.
To switch models:

- `ollama pull <new-model>`
- Update `agents.defaults.memorySearch.model` in config
- Restart the gateway — OpenClaw will re-index with the new model automatically
**Warning:** Changing embedding models requires a full re-index because vector dimensions differ between models. The gateway handles this automatically on restart, but the first few queries after a model switch may be slower.
## Cloud Embedding Providers
If you don't want to run Ollama locally, OpenClaw supports cloud providers:
| Provider | `provider` value | Example model | Notes |
|---|---|---|---|
| OpenAI | `"openai"` | `text-embedding-3-small` | Requires API key in auth profile |
| Google Gemini | `"gemini"` | `embedding-001` | Supports `outputDimensionality` |
| Voyage AI | `"voyage"` | `voyage-2` | Strong on code/technical |
| Mistral | `"mistral"` | `mistral-embed` | EU-hosted option |
For cloud providers, set `remote.apiKey` or configure via auth profiles. For resilience, consider pairing providers: `fallback: "ollama"` on a cloud primary, or a cloud fallback on a local Ollama primary.
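The fallback is effectively a try-primary-then-secondary chain. A sketch of the pattern (the provider callables here are hypothetical stand-ins, not OpenClaw APIs):

```python
def embed_with_fallback(text: str, primary, fallback=None) -> list[float]:
    """Try the primary embedding provider; on any failure, use the fallback.

    `primary` and `fallback` are hypothetical callables mapping text -> vector.
    """
    try:
        return primary(text)
    except Exception:
        if fallback is None:
            raise
        return fallback(text)
```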
## Troubleshooting

### Memory search returns empty results
- Check Ollama is running: `curl http://127.0.0.1:11434/api/tags`
- Check the model is pulled: `ollama list` should show your configured model
- Check config: `openclaw config get agents.defaults.memorySearch` — verify `enabled: true` and the correct model name
- Check the workspace has memory files: at minimum, `MEMORY.md` or files in `memory/` must exist
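The first and last checks can be scripted as a quick one-shot diagnostic (illustrative; `MODEL` and the `WORKSPACE` path are placeholders to adjust for your setup):

```python
# One-shot diagnostic for empty memory_search results.
import json
import pathlib
import urllib.request

MODEL = "bge-m3"
WORKSPACE = pathlib.Path("~/workspace").expanduser()  # hypothetical path

try:
    with urllib.request.urlopen("http://127.0.0.1:11434/api/tags", timeout=5) as resp:
        models = [m["name"] for m in json.load(resp).get("models", [])]
    print("ollama: up")
    # Pulled models are tagged (e.g. "bge-m3:latest"), so match on the prefix.
    print(f"model pulled: {any(name.startswith(MODEL) for name in models)}")
except OSError as err:
    print(f"ollama: DOWN ({err})")

has_memory = (WORKSPACE / "MEMORY.md").exists() or any((WORKSPACE / "memory").glob("*.md"))
print(f"memory files present: {has_memory}")
```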
### Results are low quality / irrelevant
- Increase `candidateMultiplier` (default 4) for broader initial retrieval
- Adjust weights: if exact terms matter more, increase `textWeight`; if paraphrase matching matters, increase `vectorWeight`
- Lower `minScore` to include weaker-but-still-relevant matches
- Check temporal decay: if old content outranks recent, lower `halfLifeDays`
- Try a larger model: `bge-m3` generally outperforms `nomic-embed-text` on technical content
### Silent failures (search works but misses things)
With `fallback: "none"`, if Ollama is down, `memory_search` returns empty results with no error. The agent proceeds as if there's no relevant memory.
Mitigations:
- Set `fallback: "fts"` to fall back to text-only search when embeddings are unavailable
- Add a health-check query at session start (search for a known term, verify non-empty results)
- Monitor Ollama uptime independently, e.g. with a sentinel watcher on `http://127.0.0.1:11434/api/tags` (see the sketch below)
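A minimal sentinel sketch for that last mitigation: poll the Ollama API and flag when it stops responding (illustrative; wire the alert into whatever monitoring you already run):

```python
# Sentinel watcher: alert when Ollama becomes unreachable.
import time
import urllib.request

def ollama_up(url: str = "http://127.0.0.1:11434/api/tags", timeout: int = 5) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:
        return False

while True:
    if not ollama_up():
        print("ALERT: Ollama unreachable; memory_search will silently return empty results")
    time.sleep(60)
```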
### High memory usage
Each embedding model loads into RAM when first queried. Budget:
- `bge-m3`: ~1.2 GB RAM
- `nomic-embed-text`: ~300 MB RAM
If memory is tight, use `nomic-embed-text` or `snowflake-arctic-embed`. Ollama unloads models after an idle timeout (default 5 minutes).
## Memory Maintenance Best Practices
The quality of memory search depends entirely on the quality of what's stored:
- Daily notes are raw logs; MEMORY.md is curated wisdom. Periodic review and distillation prevents noise from drowning signal.
- Stale entries pollute the vector space. An outdated decision that's still in memory files will surface as a relevant match, potentially misleading the agent.
- Structured tags (e.g., `[governance]`, `[defi]`, `[ops]`) in daily notes help both vector and keyword search.
- Pruning cadence: review memory files every few days. Remove outdated entries and promote durable lessons to `MEMORY.md`.
- Keep `MEMORY.md` focused. It is loaded in every main session — a bloated `MEMORY.md` means wasted tokens and diluted search quality.
## Production Status
Running in production with `bge-m3` via Ollama (hybrid mode, temporal decay, MMR). Previously used `nomic-embed-text`; switched to `bge-m3` for better multilingual and technical-content recall.