Memory Search

OpenClaw agents wake up fresh each session with no memory of prior work. The memory search system bridges this gap — semantic search over workspace files using local embeddings combined with full-text search, enabling agents to recall prior decisions, lessons, and context.

Getting memory search right means the difference between an agent that repeats mistakes and one that learns from them.

Setup Guide

Prerequisites

  • OpenClaw installed and running (check with openclaw gateway status)
  • Ollama installed locally for embedding generation

Step 1: Install Ollama

bash
# macOS (Homebrew)
brew install ollama

# Or download from https://ollama.com/download

Start the Ollama server:

bash
ollama serve

On macOS, Ollama runs as a background service automatically after install. Verify it's running:

bash
curl -s http://127.0.0.1:11434/api/tags | python3 -m json.tool

Step 2: Pull an Embedding Model

bash
# Recommended: bge-m3 (1.2 GB, high quality, multilingual)
ollama pull bge-m3

# Alternative: nomic-embed-text (274 MB, lighter, good general-purpose)
ollama pull nomic-embed-text
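
Before wiring the model into OpenClaw, you can sanity-check that it produces embeddings. A minimal Python sketch against Ollama's /api/embeddings endpoint (the 1024-dimension expectation applies to bge-m3; see the model table below):

python
import json
import urllib.request

# Request a single embedding from the local Ollama server.
req = urllib.request.Request(
    "http://127.0.0.1:11434/api/embeddings",
    data=json.dumps({"model": "bge-m3", "prompt": "hello world"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    embedding = json.load(resp)["embedding"]

print(f"dimensions: {len(embedding)}")  # expect 1024 for bge-m3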

Step 3: Configure OpenClaw

Add memory search config to openclaw.json under agents.defaults.memorySearch:

jsonc
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "enabled": true,
        "provider": "ollama",
        "model": "bge-m3",
        "remote": {
          "baseUrl": "http://127.0.0.1:11434"
        },
        "fallback": "none",
        "query": {
          "hybrid": {
            "enabled": true,
            "vectorWeight": 0.7,
            "textWeight": 0.3,
            "mmr": {
              "enabled": true,
              "lambda": 0.7
            },
            "temporalDecay": {
              "enabled": true,
              "halfLifeDays": 30
            }
          }
        }
      }
    }
  }
}

Or apply via the gateway tool:

bash
openclaw config patch '{
  "agents.defaults.memorySearch": {
    "enabled": true,
    "provider": "ollama",
    "model": "bge-m3",
    "remote": { "baseUrl": "http://127.0.0.1:11434" },
    "fallback": "none",
    "query": {
      "hybrid": {
        "enabled": true,
        "vectorWeight": 0.7,
        "textWeight": 0.3,
        "mmr": { "enabled": true, "lambda": 0.7 },
        "temporalDecay": { "enabled": true, "halfLifeDays": 30 }
      }
    }
  }
}'

Step 4: Verify

Restart the gateway (or wait for dynamic reload), then test from an agent session:

memory_search("test query about something in your workspace")

The response should include provider: "ollama", model: "bge-m3", and mode: "hybrid" in the metadata.

Architecture

Hybrid Query Pipeline

Memory search uses a two-signal hybrid approach:

  1. Vector search — local embeddings via Ollama produce semantic similarity scores
  2. BM25 keyword search — SQLite FTS provides exact keyword matching for terms the embedding might miss

Results are blended with configurable weighting (default: 70% vector + 30% text), then re-ranked using MMR (Maximal Marginal Relevance) to reduce redundancy.
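
As a rough sketch of that blending step (illustrative Python; the min-max normalization and missing-score handling are assumptions, not OpenClaw's actual implementation):

python
def hybrid_score(vector_scores, bm25_scores, vector_weight=0.7, text_weight=0.3):
    """Blend semantic and keyword signals into one ranking.

    Both inputs map doc_id -> raw score (higher = better assumed here);
    each signal is min-max normalized to [0, 1] so the weights are comparable.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    v, t = normalize(vector_scores), normalize(bm25_scores)
    return {doc: vector_weight * v.get(doc, 0.0) + text_weight * t.get(doc, 0.0)
            for doc in set(v) | set(t)}

The blend operates on a candidate pool of roughly maxResults × candidateMultiplier documents (see the configuration reference below), and MMR then trims that pool to the final result count.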

What Gets Indexed

By default, memory search indexes:

  • MEMORY.md — curated long-term memory
  • memory/*.md — daily notes and any other markdown in the memory directory

You can expand this with extraPaths (see Per-Agent Overrides) or by adding "sessions" to the sources array to include chat transcript history.
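
For intuition, here is a minimal sketch of what an index over those files could look like, with SQLite FTS5 covering the BM25 side (the schema, paragraph chunking, and the embed()/serialize() helpers are hypothetical; OpenClaw's real index layout is internal):

python
import pathlib
import sqlite3

db = sqlite3.connect("memory-index.db")  # hypothetical index file
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS chunks_fts USING fts5(path, content)")

# Index MEMORY.md plus everything under memory/.
for path in [pathlib.Path("MEMORY.md"), *pathlib.Path("memory").glob("*.md")]:
    if not path.exists():
        continue
    for chunk in path.read_text().split("\n\n"):  # naive paragraph chunking
        db.execute("INSERT INTO chunks_fts VALUES (?, ?)", (str(path), chunk))
        # The vector side would embed each chunk and store it alongside:
        # db.execute("INSERT INTO chunks_vec VALUES (?, ?, ?)",
        #            (str(path), chunk, serialize(embed(chunk))))  # hypothetical helpers
db.commit()

# BM25 keyword lookup (FTS5's bm25() is lower = better, so ascending order):
rows = db.execute(
    "SELECT path, bm25(chunks_fts) FROM chunks_fts "
    "WHERE chunks_fts MATCH ? ORDER BY bm25(chunks_fts) LIMIT 10",
    ("governance",),
).fetchall()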

Temporal Decay

A configurable half-life (default: 30 days) ensures recent context ranks higher than semantically similar but stale entries. Without temporal decay, a lesson from 3 months ago can outrank a relevant decision from yesterday.
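
The decay itself is a standard half-life curve; how it folds into the final score is an assumption in this sketch:

python
def temporal_decay(age_days: float, half_life_days: float = 30.0) -> float:
    """Multiplier that halves a result's score every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

# With the 30-day default: yesterday ~0.98, 30 days ago 0.5, 90 days ago 0.125.
print(temporal_decay(1.0), temporal_decay(30.0), temporal_decay(90.0))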

MMR Re-ranking

Maximal Marginal Relevance (MMR) diversifies results so you don't get 5 near-identical snippets; a sketch follows the list below. The lambda parameter (0–1) controls the relevance/diversity tradeoff:

  • lambda: 1.0 = pure relevance (may include duplicates)
  • lambda: 0.7 = balanced (default, recommended)
  • lambda: 0.3 = strong diversity (good for broad exploration)
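
A minimal sketch of greedy MMR selection (assumes pairwise document similarities are precomputed; illustrative, not the gateway's exact code):

python
def mmr(query_sim, doc_sims, k=10, lam=0.7):
    """Greedily pick k docs balancing query relevance against redundancy.

    query_sim: doc_id -> similarity to the query
    doc_sims:  (doc_a, doc_b) -> similarity between two documents
    lam:       1.0 = pure relevance, 0.0 = pure diversity
    """
    selected = []
    candidates = set(query_sim)
    while candidates and len(selected) < k:
        def score(doc):
            redundancy = max(
                (doc_sims.get((doc, s), doc_sims.get((s, doc), 0.0))
                 for s in selected),
                default=0.0,
            )
            return lam * query_sim[doc] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected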

Full Configuration Reference

jsonc
{
  "agents": {
    "defaults": {
      "memorySearch": {
        // Master toggle
        "enabled": true,

        // Embedding provider: "ollama" | "openai" | "gemini" | "voyage" | "mistral" | "local"
        "provider": "ollama",

        // Model name (provider-specific)
        "model": "bge-m3",

        // Provider connection settings
        "remote": {
          "baseUrl": "http://127.0.0.1:11434",
          "apiKey": ""  // Not needed for local Ollama
        },

        // Fallback if the primary provider fails: provider name, "fts" (text-only), or "none"
        "fallback": "none",

        // What to index: ["memory"] or ["memory", "sessions"]
        "sources": ["memory"],

        // Additional paths to index beyond default memory files
        "extraPaths": [],

        // Query tuning
        "query": {
          // Max results returned per search
          "maxResults": 10,

          // Minimum relevance score threshold (0.0–1.0)
          "minScore": 0.0,

          "hybrid": {
            // Enable hybrid (vector + BM25) search
            "enabled": true,

            // Weight for vector similarity (0.0–1.0)
            "vectorWeight": 0.7,

            // Weight for BM25 keyword match (0.0–1.0)
            "textWeight": 0.3,

            // Candidate pool multiplier before reranking (higher = better recall, slower)
            "candidateMultiplier": 4,

            // MMR diversity reranking
            "mmr": {
              "enabled": true,
              // 0 = most diverse, 1 = most relevant
              "lambda": 0.7
            },

            // Temporal recency boost
            "temporalDecay": {
              "enabled": true,
              // Days for score to halve (lower = stronger recency bias)
              "halfLifeDays": 30
            }
          }
        },

        // Gemini-specific: output vector dimensions (768, 1536, or 3072)
        // "outputDimensionality": 3072,

        // Multimodal indexing (experimental, for cross-modal embedding models)
        // "multimodal": { ... }
      }
    }
  }
}

Per-Agent Overrides

Each agent in agents.list can override memory search settings. Common use case: adding extra indexed paths for a specialized agent.

jsonc
{
  "agents": {
    "list": [
      {
        "id": "research",
        "memorySearch": {
          "extraPaths": [
            "/path/to/project/docs"
          ]
        }
      }
    ]
  }
}

The agent-level config merges with agents.defaults.memorySearch — you only need to specify the fields you're overriding.
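
As an illustration, the merge behaves like a recursive dictionary overlay (the exact merge semantics here are an assumption):

python
def merge_config(defaults: dict, override: dict) -> dict:
    """Overlay agent-level settings on agents.defaults, recursing into dicts."""
    merged = dict(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {"enabled": True, "model": "bge-m3", "extraPaths": []}
agent = {"extraPaths": ["/path/to/project/docs"]}
print(merge_config(defaults, agent))
# {'enabled': True, 'model': 'bge-m3', 'extraPaths': ['/path/to/project/docs']}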

Embedding Model Selection

| Model | Size | Dimensions | Quality | Speed | Notes |
|---|---|---|---|---|---|
| bge-m3 | 1.2 GB | 1024 | Excellent | Moderate | Recommended. Multilingual, strong on technical content. BAAI's flagship embedding model. |
| nomic-embed-text | 274 MB | 768 | Good | Fast | Lighter alternative. Good general-purpose, lower RAM usage. |
| mxbai-embed-large | 670 MB | 1024 | Very good | Moderate | Mixedbread.ai. Strong semantic matching. |
| snowflake-arctic-embed | 110 MB | 384 | Good | Very fast | Smallest option. Good for constrained hardware. |

A typical production setup runs bge-m3 via Ollama on a local machine (e.g., an Apple Silicon Mac) and handles ~50 daily notes plus MEMORY.md and workspace files with sub-second query times.

To switch models:

  1. ollama pull <new-model>
  2. Update agents.defaults.memorySearch.model in config
  3. Restart gateway — OpenClaw will re-index with the new model automatically

WARNING

Changing embedding models requires a full re-index because vector dimensions differ between models. The gateway handles this automatically on restart, but the first few queries after a model switch may be slower.

Cloud Embedding Providers

If you don't want to run Ollama locally, OpenClaw supports cloud providers:

| Provider | Config provider | Example model | Notes |
|---|---|---|---|
| OpenAI | "openai" | text-embedding-3-small | Requires API key in auth profile |
| Google Gemini | "gemini" | embedding-001 | Supports outputDimensionality |
| Voyage AI | "voyage" | voyage-2 | Strong on code/technical content |
| Mistral | "mistral" | mistral-embed | EU-hosted option |

For cloud providers, set remote.apiKey or configure credentials via auth profiles. For resilience, consider a fallback in the opposite direction: fallback: "ollama" on a cloud primary, or a cloud provider as the fallback for a local Ollama primary.

Troubleshooting

Memory search returns empty results

  1. Check Ollama is running: curl http://127.0.0.1:11434/api/tags
  2. Check the model is pulled: ollama list should show your configured model
  3. Check config: openclaw config get agents.defaults.memorySearch — verify enabled: true and correct model name
  4. Check workspace has memory files: at minimum, MEMORY.md or files in memory/ must exist

Results are low quality / irrelevant

  • Increase candidateMultiplier (default 4) for broader initial retrieval
  • Adjust weights: if exact terms matter more, increase textWeight; if paraphrase matching matters, increase vectorWeight
  • Lower minScore to include weaker-but-still-relevant matches
  • Check temporal decay: if old content outranks recent, lower halfLifeDays
  • Try a larger model: bge-m3 generally outperforms nomic-embed-text on technical content

Silent failures (search works but misses things)

With fallback: "none", if Ollama is down, memory_search returns empty results with no error. The agent proceeds as if there's no relevant memory.

Mitigations:

  • Set fallback: "fts" to fall back to text-only search when embeddings are unavailable
  • Add a health check query at session start (search for a known term and verify non-empty results; a sketch follows this list)
  • Monitor Ollama uptime independently (e.g., a sentinel watcher on http://127.0.0.1:11434/api/tags)
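
A minimal version of that session-start health check (the model-name check matches the config above; the script itself is illustrative):

python
import json
import urllib.request

def ollama_healthy(base_url: str = "http://127.0.0.1:11434") -> bool:
    """Return True if Ollama answers and the embedding model is present."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            models = [m["name"] for m in json.load(resp).get("models", [])]
        return any(name.startswith("bge-m3") for name in models)
    except OSError:
        return False

if not ollama_healthy():
    print("WARNING: embeddings unavailable; memory search will be degraded")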

High memory usage

Each embedding model loads into RAM when first queried. Budget:

  • bge-m3: ~1.2 GB RAM
  • nomic-embed-text: ~300 MB RAM

If memory is tight, use nomic-embed-text or snowflake-arctic-embed. Ollama unloads models after idle timeout (default 5 min).

Memory Maintenance Best Practices

The quality of memory search depends entirely on the quality of what's stored:

  • Daily notes are raw logs; MEMORY.md is curated wisdom. Periodic review and distillation prevent noise from drowning out the signal.
  • Stale entries pollute the vector space. An outdated decision that's still in memory files will surface as a relevant match, potentially misleading the agent.
  • Structured tags (e.g., [governance], [defi], [ops]) in daily notes help both vector and keyword search.
  • Pruning cadence: review memory files every few days. Remove outdated entries, promote durable lessons to MEMORY.md.
  • Keep MEMORY.md focused. It loads every main session — bloated MEMORY.md means wasted tokens and diluted search quality.

Production Status

This setup runs in production with bge-m3 via Ollama (hybrid mode, temporal decay, and MMR enabled). It previously used nomic-embed-text and switched to bge-m3 for better recall on multilingual and technical content.

Built with OpenClaw 🤖