Memory Search

OpenClaw agents wake up fresh each session with no memory of prior work. The memory search system bridges this gap — semantic search over workspace files using local embeddings combined with full-text search, enabling agents to recall prior decisions, lessons, and context.

Getting memory search right means the difference between an agent that repeats mistakes and one that learns from them.

Setup Guide

Prerequisites

  • OpenClaw installed and running (check with openclaw gateway status)
  • Ollama installed locally for embedding generation

Step 1: Install Ollama

bash
# macOS (Homebrew)
brew install ollama

# Or download from https://ollama.com/download

Start the Ollama server:

bash
ollama serve

On macOS, Ollama runs as a background service automatically after install. Verify it's running:

bash
curl -s http://127.0.0.1:11434/api/tags | python3 -m json.tool

Step 2: Pull an Embedding Model

bash
# Recommended: bge-m3 (1.2 GB, high quality, multilingual)
ollama pull bge-m3

# Alternative: nomic-embed-text (274 MB, lighter, good general-purpose)
ollama pull nomic-embed-text
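
Before wiring the model into OpenClaw, you can sanity-check that it produces embeddings. A minimal Python sketch against Ollama's /api/embeddings endpoint (the 1024-dimension expectation applies to bge-m3; see the model table below):

python
import json
import urllib.request

# Request a single embedding from the local Ollama server.
req = urllib.request.Request(
    "http://127.0.0.1:11434/api/embeddings",
    data=json.dumps({"model": "bge-m3", "prompt": "hello world"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    embedding = json.load(resp)["embedding"]

print(f"dimensions: {len(embedding)}")  # expect 1024 for bge-m3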

Step 3: Configure OpenClaw

Add memory search config to openclaw.json under agents.defaults.memorySearch:

jsonc
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "enabled": true,
        "provider": "ollama",
        "model": "bge-m3",
        "remote": {
          "baseUrl": "http://127.0.0.1:11434"
        },
        "fallback": "none",
        "query": {
          "hybrid": {
            "enabled": true,
            "vectorWeight": 0.7,
            "textWeight": 0.3,
            "mmr": {
              "enabled": true,
              "lambda": 0.7
            },
            "temporalDecay": {
              "enabled": true,
              "halfLifeDays": 30
            }
          }
        }
      }
    }
  }
}

Or apply via the gateway tool:

bash
openclaw config patch '{
  "agents.defaults.memorySearch": {
    "enabled": true,
    "provider": "ollama",
    "model": "bge-m3",
    "remote": { "baseUrl": "http://127.0.0.1:11434" },
    "fallback": "none",
    "query": {
      "hybrid": {
        "enabled": true,
        "vectorWeight": 0.7,
        "textWeight": 0.3,
        "mmr": { "enabled": true, "lambda": 0.7 },
        "temporalDecay": { "enabled": true, "halfLifeDays": 30 }
      }
    }
  }
}'

Step 4: Verify

Restart the gateway (or wait for dynamic reload), then test from an agent session:

memory_search("test query about something in your workspace")

The response should include provider: "ollama", model: "bge-m3", and mode: "hybrid" in the metadata.

Architecture

Hybrid Query Pipeline

Memory search uses a two-signal hybrid approach:

  1. Vector search — local embeddings via Ollama produce semantic similarity scores
  2. BM25 keyword search — SQLite FTS provides exact keyword matching for terms the embedding might miss

Results are blended with configurable weighting (default: 70% vector + 30% text), then re-ranked using MMR (Maximal Marginal Relevance) to reduce redundancy.
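
As a rough sketch of that blending step (illustrative Python; the min-max normalization and missing-score handling are assumptions, not OpenClaw's actual implementation):

python
def hybrid_score(vector_scores, bm25_scores, vector_weight=0.7, text_weight=0.3):
    """Blend semantic and keyword signals into one ranking.

    Both inputs map doc_id -> raw score (higher = better assumed here);
    each signal is min-max normalized to [0, 1] so the weights are comparable.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    v, t = normalize(vector_scores), normalize(bm25_scores)
    return {doc: vector_weight * v.get(doc, 0.0) + text_weight * t.get(doc, 0.0)
            for doc in set(v) | set(t)}

The blend operates on a candidate pool of roughly maxResults × candidateMultiplier documents (see the configuration reference below), and MMR then trims that pool to the final result count.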

What Gets Indexed

By default, memory search indexes:

  • MEMORY.md — curated long-term memory
  • memory/*.md — daily notes and any other markdown in the memory directory

You can expand this with extraPaths (see Per-Agent Overrides) or by adding "sessions" to the sources array to include chat transcript history.
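
For intuition, here is a minimal sketch of what an index over those files could look like, with SQLite FTS5 covering the BM25 side (the schema, paragraph chunking, and the embed()/serialize() helpers are hypothetical; OpenClaw's real index layout is internal):

python
import pathlib
import sqlite3

db = sqlite3.connect("memory-index.db")  # hypothetical index file
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS chunks_fts USING fts5(path, content)")

# Index MEMORY.md plus everything under memory/.
for path in [pathlib.Path("MEMORY.md"), *pathlib.Path("memory").glob("*.md")]:
    if not path.exists():
        continue
    for chunk in path.read_text().split("\n\n"):  # naive paragraph chunking
        db.execute("INSERT INTO chunks_fts VALUES (?, ?)", (str(path), chunk))
        # The vector side would embed each chunk and store it alongside:
        # db.execute("INSERT INTO chunks_vec VALUES (?, ?, ?)",
        #            (str(path), chunk, serialize(embed(chunk))))  # hypothetical helpers
db.commit()

# BM25 keyword lookup (FTS5's bm25() is lower = better, so ascending order):
rows = db.execute(
    "SELECT path, bm25(chunks_fts) FROM chunks_fts "
    "WHERE chunks_fts MATCH ? ORDER BY bm25(chunks_fts) LIMIT 10",
    ("governance",),
).fetchall()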

Temporal Decay

A configurable half-life (default: 30 days) ensures recent context ranks higher than semantically similar but stale entries. Without temporal decay, a lesson from 3 months ago can outrank a relevant decision from yesterday.
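
The decay itself is a standard half-life curve; how it folds into the final score is an assumption in this sketch:

python
def temporal_decay(age_days: float, half_life_days: float = 30.0) -> float:
    """Multiplier that halves a result's score every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

# With the 30-day default: yesterday ~0.98, 30 days ago 0.5, 90 days ago 0.125.
print(temporal_decay(1.0), temporal_decay(30.0), temporal_decay(90.0))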

MMR Re-ranking

Maximal Marginal Relevance (MMR) diversifies results so you don't get 5 near-identical snippets; a sketch follows the list below. The lambda parameter (0–1) controls the relevance/diversity tradeoff:

  • lambda: 1.0 = pure relevance (may include duplicates)
  • lambda: 0.7 = balanced (default, recommended)
  • lambda: 0.3 = strong diversity (good for broad exploration)
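
A minimal sketch of greedy MMR selection (assumes pairwise document similarities are precomputed; illustrative, not the gateway's exact code):

python
def mmr(query_sim, doc_sims, k=10, lam=0.7):
    """Greedily pick k docs balancing query relevance against redundancy.

    query_sim: doc_id -> similarity to the query
    doc_sims:  (doc_a, doc_b) -> similarity between two documents
    lam:       1.0 = pure relevance, 0.0 = pure diversity
    """
    selected = []
    candidates = set(query_sim)
    while candidates and len(selected) < k:
        def score(doc):
            redundancy = max(
                (doc_sims.get((doc, s), doc_sims.get((s, doc), 0.0))
                 for s in selected),
                default=0.0,
            )
            return lam * query_sim[doc] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected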

Full Configuration Reference

jsonc
{
  "agents": {
    "defaults": {
      "memorySearch": {
        // Master toggle
        "enabled": true,

        // Embedding provider: "ollama" | "openai" | "gemini" | "voyage" | "mistral" | "local"
        "provider": "ollama",

        // Model name (provider-specific)
        "model": "bge-m3",

        // Provider connection settings
        "remote": {
          "baseUrl": "http://127.0.0.1:11434",
          "apiKey": ""  // Not needed for local Ollama
        },

        // Fallback if the primary provider fails: provider name, "fts" (text-only), or "none"
        "fallback": "none",

        // What to index: ["memory"] or ["memory", "sessions"]
        "sources": ["memory"],

        // Additional paths to index beyond default memory files
        "extraPaths": [],

        // Query tuning
        "query": {
          // Max results returned per search
          "maxResults": 10,

          // Minimum relevance score threshold (0.0–1.0)
          "minScore": 0.0,

          "hybrid": {
            // Enable hybrid (vector + BM25) search
            "enabled": true,

            // Weight for vector similarity (0.0–1.0)
            "vectorWeight": 0.7,

            // Weight for BM25 keyword match (0.0–1.0)
            "textWeight": 0.3,

            // Candidate pool multiplier before reranking (higher = better recall, slower)
            "candidateMultiplier": 4,

            // MMR diversity reranking
            "mmr": {
              "enabled": true,
              // 0 = most diverse, 1 = most relevant
              "lambda": 0.7
            },

            // Temporal recency boost
            "temporalDecay": {
              "enabled": true,
              // Days for score to halve (lower = stronger recency bias)
              "halfLifeDays": 30
            }
          }
        },

        // Gemini-specific: output vector dimensions (768, 1536, or 3072)
        // "outputDimensionality": 3072,

        // Multimodal indexing (experimental, for cross-modal embedding models)
        // "multimodal": { ... }
      }
    }
  }
}

Per-Agent Overrides

Each agent in agents.list can override memory search settings. Common use case: adding extra indexed paths for a specialized agent.

jsonc
{
  "agents": {
    "list": [
      {
        "id": "research",
        "memorySearch": {
          "extraPaths": [
            "/path/to/project/docs"
          ]
        }
      }
    ]
  }
}

The agent-level config merges with agents.defaults.memorySearch — you only need to specify the fields you're overriding.
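
As an illustration, the merge behaves like a recursive dictionary overlay (the exact merge semantics here are an assumption):

python
def merge_config(defaults: dict, override: dict) -> dict:
    """Overlay agent-level settings on agents.defaults, recursing into dicts."""
    merged = dict(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {"enabled": True, "model": "bge-m3", "extraPaths": []}
agent = {"extraPaths": ["/path/to/project/docs"]}
print(merge_config(defaults, agent))
# {'enabled': True, 'model': 'bge-m3', 'extraPaths': ['/path/to/project/docs']}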

Embedding Model Selection

| Model | Size | Dimensions | Quality | Speed | Notes |
|---|---|---|---|---|---|
| bge-m3 | 1.2 GB | 1024 | Excellent | Moderate | Recommended. Multilingual, strong on technical content. BAAI's flagship embedding model. |
| nomic-embed-text | 274 MB | 768 | Good | Fast | Lighter alternative. Good general-purpose, lower RAM usage. |
| mxbai-embed-large | 670 MB | 1024 | Very good | Moderate | Mixedbread.ai. Strong semantic matching. |
| snowflake-arctic-embed | 110 MB | 384 | Good | Very fast | Smallest option. Good for constrained hardware. |

A typical production setup runs bge-m3 via Ollama on a local machine (e.g., an Apple Silicon Mac) and handles ~50 daily notes plus MEMORY.md and workspace files with sub-second query times.

To switch models:

  1. ollama pull <new-model>
  2. Update agents.defaults.memorySearch.model in config
  3. Restart gateway — OpenClaw will re-index with the new model automatically

WARNING

Changing embedding models requires a full re-index because vector dimensions differ between models. The gateway handles this automatically on restart, but the first few queries after a model switch may be slower.

Cloud Embedding Providers

If you don't want to run Ollama locally, OpenClaw supports cloud providers:

| Provider | Config provider | Example model | Notes |
|---|---|---|---|
| OpenAI | "openai" | text-embedding-3-small | Requires API key in auth profile |
| Google Gemini | "gemini" | embedding-001 | Supports outputDimensionality |
| Voyage AI | "voyage" | voyage-2 | Strong on code/technical content |
| Mistral | "mistral" | mistral-embed | EU-hosted option |

For cloud providers, set remote.apiKey or configure credentials via auth profiles. For resilience, consider a fallback in the opposite direction: fallback: "ollama" on a cloud primary, or a cloud provider as the fallback for a local Ollama primary.

Troubleshooting

Memory search returns empty results

  1. Check Ollama is running: curl http://127.0.0.1:11434/api/tags
  2. Check the model is pulled: ollama list should show your configured model
  3. Check config: openclaw config get agents.defaults.memorySearch — verify enabled: true and correct model name
  4. Check workspace has memory files: at minimum, MEMORY.md or files in memory/ must exist

Results are low quality / irrelevant

  • Increase candidateMultiplier (default 4) for broader initial retrieval
  • Adjust weights: if exact terms matter more, increase textWeight; if paraphrase matching matters, increase vectorWeight
  • Lower minScore to include weaker-but-still-relevant matches
  • Check temporal decay: if old content outranks recent, lower halfLifeDays
  • Try a larger model: bge-m3 generally outperforms nomic-embed-text on technical content

Silent failures (search works but misses things)

With fallback: "none", if Ollama is down, memory_search returns empty results with no error. The agent proceeds as if there's no relevant memory.

Mitigations:

  • Set fallback: "fts" to fall back to text-only search when embeddings are unavailable
  • Add a health check query at session start (search for a known term and verify non-empty results; a sketch follows this list)
  • Monitor Ollama uptime independently (e.g., a sentinel watcher on http://127.0.0.1:11434/api/tags)
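
A minimal version of that session-start health check (the model-name check matches the config above; the script itself is illustrative):

python
import json
import urllib.request

def ollama_healthy(base_url: str = "http://127.0.0.1:11434") -> bool:
    """Return True if Ollama answers and the embedding model is present."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            models = [m["name"] for m in json.load(resp).get("models", [])]
        return any(name.startswith("bge-m3") for name in models)
    except OSError:
        return False

if not ollama_healthy():
    print("WARNING: embeddings unavailable; memory search will be degraded")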

High memory usage

Each embedding model loads into RAM when first queried. Budget:

  • bge-m3: ~1.2 GB RAM
  • nomic-embed-text: ~300 MB RAM

If memory is tight, use nomic-embed-text or snowflake-arctic-embed. Ollama unloads models after idle timeout (default 5 min).

Memory Maintenance Best Practices

The quality of memory search depends entirely on the quality of what's stored:

  • Daily notes are raw logs; MEMORY.md is curated wisdom. Periodic review and distillation prevent noise from drowning out the signal.
  • Stale entries pollute the vector space. An outdated decision that's still in memory files will surface as a relevant match, potentially misleading the agent.
  • Structured tags (e.g., [governance], [defi], [ops]) in daily notes help both vector and keyword search.
  • Pruning cadence: review memory files every few days. Remove outdated entries, promote durable lessons to MEMORY.md.
  • Keep MEMORY.md focused. It loads every main session — bloated MEMORY.md means wasted tokens and diluted search quality.

Production Status

This setup runs in production with bge-m3 via Ollama (hybrid mode, temporal decay, and MMR enabled). It previously used nomic-embed-text and switched to bge-m3 for better recall on multilingual and technical content.

Built with OpenClaw 🤖