OpenClaw Context Management: How It Remembers Everything Without Breaking the Bank
Most AI agents have the memory of a goldfish. After a hundred messages, they forget what you said at the start of the conversation. OpenClaw's LCM (Lossless Context Management) system is the reason it doesn't — and why your token bill doesn't spiral out of control the longer you use it. Here's the full breakdown.
Every AI model has a context window — a hard ceiling on how many tokens it can hold in active memory at once. Claude Sonnet 4.6 maxes out at 200,000 tokens. Gemini 2.5 Pro hits 1 million. These numbers sound enormous until you start actually using an AI agent daily: a month of conversations, uploaded documents, web search results, and code snippets will blow past even a million-token window without breaking a sweat.
The naive solution — keeping everything in context — fails in two directions. First, it gets astronomically expensive. Every API call sends the full conversation history as input tokens. Second, it gets slower. Larger context = longer time-to-first-token. At 500k tokens of history, you're waiting several seconds before the model starts responding.
OpenClaw solves this with LCM — Lossless Context Management. It's the system that decides what to keep in full, what to compress, and what to archive — all without you manually managing any of it. Combined with the MEMORY.md persistent memory file, it creates an AI agent that genuinely improves with time instead of forgetting everything when you clear chat.
The Context Window Problem (For Real)
Let's put numbers on it. A typical OpenClaw cron job — a daily briefing that checks email, searches the web, and summarizes the morning — might consume 8,000–15,000 tokens per run. Run that once a day for a month and you've accumulated 250,000–450,000 tokens of conversation history. If your agent context naively grew to include all of that, every single new message would cost $0.75–$1.35 in input tokens alone (at Sonnet 4.6 pricing), before the model even starts generating a response.
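The arithmetic behind those per-message figures can be sanity-checked in a few lines. A sketch, assuming the $3 per million input tokens rate implied by the article's own numbers:

```python
# Back-of-envelope check of the naive "resend everything" cost.
# The $3 / 1M input-token rate for Sonnet 4.6 is an assumption
# inferred from the article's figures, not quoted pricing.
INPUT_PRICE_PER_MTOK = 3.00

def cost_per_message(history_tokens: int) -> float:
    """Input-token cost of one message if the full history is resent."""
    return history_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK

# A month of daily cron runs: 250k-450k tokens of accumulated history
print(f"${cost_per_message(250_000):.2f}")  # $0.75
print(f"${cost_per_message(450_000):.2f}")  # $1.35
```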
The alternative most tools use is simpler but worse: truncation. Keep the last N messages, drop everything older. This is why most AI chat tools feel like they have amnesia after a few hundred exchanges. You can't ask "what did we decide about the database architecture three weeks ago?" because those messages have been silently discarded.
LCM takes a third path. Instead of keeping everything (expensive) or dropping everything old (lossy), it compresses old context into semantic summaries while preserving searchable archives. The agent doesn't have 400,000 tokens of raw history in memory — but it can retrieve any specific piece of that history on demand when it's relevant.
How LCM Actually Works
LCM operates in three layers: active context, summary DAG, and searchable archive.
Active context is what you think of as "the current conversation." LCM keeps this lean — typically 20,000–40,000 tokens. This includes recent messages, the current task in progress, and any files or tool outputs from the last few exchanges. This is what gets sent to the model on every turn.
The summary DAG (directed acyclic graph) is where the magic happens. As older messages fall out of active context, LCM doesn't delete them — it compresses batches of messages into structured summaries tagged with semantic metadata. Each summary links to its source messages and to higher-level summaries that roll up multiple conversation sessions. You can think of it as a hierarchical memory: individual exchanges → session summaries → topic summaries → long-term project summaries.
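The hierarchy described above can be sketched as a simple node structure. This is illustrative only, with field names that are assumptions rather than OpenClaw's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class SummaryNode:
    id: str
    text: str        # the compressed summary itself
    level: int       # 0 = session, 1 = topic, 2 = long-term project
    source_ids: list = field(default_factory=list)  # child summaries or raw message ids
    tags: list = field(default_factory=list)        # semantic metadata for retrieval

# Individual exchanges roll up into a session summary...
session = SummaryNode("sum_abc", "April 4 morning: discussed API keys, cron setup",
                      level=0, source_ids=["msg_101", "msg_102"], tags=["api-keys", "cron"])

# ...and session summaries roll up into topic summaries. A node can be
# referenced by several parents, which is what makes this a DAG, not a tree.
topic = SummaryNode("sum_def", "Week of Mar 28: Telegram bot config, OpenAI setup",
                    level=1, source_ids=["sum_abc"], tags=["telegram", "setup"])
```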
The searchable archive is the full text of all past messages stored locally. When the agent needs to answer a question that requires historical context — "what API key did I configure for that Telegram bot last month?" — it uses lcm_grep and lcm_expand to search and retrieve from the archive without loading the full history into context.
lcm architecture

```
Active Context (20-40k tokens)
├── Current message
├── Recent messages (last ~15 turns)
└── Injected tool results

Summary DAG (compressed, indexed)
├── sum_abc → "April 4 morning: discussed API keys, cron setup"
├── sum_def → "Week of Mar 28: Telegram bot config, OpenAI setup"
└── sum_ghi → "March overall: initial setup, model routing, costs"

Searchable Archive (full history, local disk)
└── All raw messages, searchable via lcm_grep + lcm_expand
```
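The retrieval flow can be mimicked with a minimal sketch: search the archive and summary index, then expand a matching summary back into its raw messages. Function names mirror the lcm_grep/lcm_expand tools, but the implementation and data here are illustrative assumptions:

```python
import re

# Illustrative in-memory stand-ins for the searchable archive and summary index.
ARCHIVE = {
    "msg_101": "Configured the Telegram bot token for @Klawdik_bot",
    "msg_102": "Set up the daily briefing cron job",
}
SUMMARIES = {
    "sum_abc": {"text": "April 4 morning: discussed API keys, cron setup",
                "source_ids": ["msg_101", "msg_102"]},
}

def lcm_grep(pattern: str) -> list:
    """Return ids of archived messages or summaries matching the pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = [mid for mid, text in ARCHIVE.items() if rx.search(text)]
    hits += [sid for sid, s in SUMMARIES.items() if rx.search(s["text"])]
    return hits

def lcm_expand(summary_id: str) -> list:
    """Retrieve the raw messages behind a summary, on demand."""
    return [ARCHIVE[mid] for mid in SUMMARIES[summary_id]["source_ids"]]

print(lcm_grep("telegram"))   # matches the archived message, not the summary
print(lcm_expand("sum_abc"))  # both raw messages, without loading full history
```

The point of the two-step design: the grep touches only the index, and full message text enters the context window only when a summary is explicitly expanded.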
MEMORY.md: The Persistent Layer
LCM handles conversation history automatically. But there's a second, simpler memory layer that works alongside it: MEMORY.md.
This is a plain markdown file in your workspace that the agent updates manually when something important happens. New API key configured? Written to MEMORY.md. Important architectural decision made? Written to MEMORY.md. Project completed? Written to MEMORY.md. Unlike LCM (which compresses and archives), MEMORY.md is always loaded at the start of every session as part of active context. It's the agent's permanent long-term memory.
The key difference between LCM and MEMORY.md is intentionality. LCM is automatic — it manages the full conversation history without you thinking about it. MEMORY.md is curated — the agent (and you) decide what rises to the level of "permanently worth remembering." The two work together: LCM handles "what happened in our conversations," MEMORY.md handles "what permanently matters."
A well-maintained MEMORY.md typically runs 1,000–3,000 tokens. It's cheap to include in every context and dramatically reduces the frequency of the agent needing to dig into LCM archives to answer basic continuity questions. Think of it as a fast, always-available cache in front of the deeper historical search.
example memory.md structure

```markdown
# MEMORY.md — Persistent Agent Memory

## Projects
- claw.mobile blog → /Users/you/.openclaw/workspace/clawbot-website
- Main Telegram bot → @Klawdik_bot (token in TOOLS.md)

## API Keys & Tools
- Anthropic: direct API, sk-ant-... (console.anthropic.com)
- Gemini: configured in environment, used for image gen
- OpenAI: DALL-E 3 for banner images when Gemini unavailable

## Decisions Made
- [2026-03-15] Chose Vercel for deployment, not Netlify
- [2026-03-28] Using Gemini 2.5 Flash as cheap model for cron summaries
- [2026-04-01] Blog posts: 1200-1600 words, builder voice, no corporate fluff

## Completed Milestones
- [2026-04-04] Published oracle-layoffs-ai-2026 (morning article)
- [2026-04-04] Published anthropic-blocks-claude-code-openclaw
```
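The dated "Decisions Made" entries lend themselves to a tiny append-only helper. A sketch, with the path and entry format assumed from the example above:

```python
from datetime import date
from pathlib import Path

def log_decision(memory_file: Path, decision: str) -> None:
    """Append a dated entry in the '- [YYYY-MM-DD] ...' format shown above."""
    entry = f"- [{date.today().isoformat()}] {decision}\n"
    with memory_file.open("a") as f:   # creates the file if it doesn't exist
        f.write(entry)

# e.g. log_decision(Path("~/.openclaw/workspace/MEMORY.md").expanduser(),
#                   "Chose Vercel for deployment, not Netlify")
```

Append-only keeps the file easy to diff and easy to prune later; quarterly cleanup is just deleting stale lines.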
Tuning Your Context Config
The default LCM settings work well for most users, but if you're running a heavy OpenClaw setup — multiple cron jobs, frequent large file reads, multi-step agent tasks — understanding the knobs gives you meaningful cost and latency improvements.
Context budget is the primary lever. This controls how large active context is allowed to grow before LCM starts aggressively compressing older messages into summaries. Lower this if you're on a cost-sensitive plan or using a smaller context model. Raise it if you're doing complex multi-session tasks where mid-conversation retrieval from summaries would be disruptive.
Compaction threshold determines when LCM runs its compression pass. By default it kicks in when active context hits about 70% of the configured budget. You can push this higher (80–90%) if you want longer uninterrupted context stretches, at the cost of slightly less aggressive cost optimization.
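The budget-and-threshold interaction is a one-line check. A sketch using the default values stated above:

```python
# Sketch of the compaction trigger described above, with the
# defaults from the text (40k budget, compress at 70% full).
CONTEXT_BUDGET = 40_000       # tokens
COMPACTION_THRESHOLD = 0.70   # fraction of budget

def should_compact(active_tokens: int,
                   budget: int = CONTEXT_BUDGET,
                   threshold: float = COMPACTION_THRESHOLD) -> bool:
    """True when active context has grown past the compaction threshold."""
    return active_tokens >= budget * threshold

print(should_compact(25_000))  # False: still under ~28,000 tokens
print(should_compact(30_000))  # True: time to summarize the oldest messages
```

Raising the threshold to 0.85 widens the stretch of uninterrupted full-fidelity context at the cost of running compression closer to the ceiling.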
Summary model is often overlooked but important. LCM uses a model to generate summaries when compressing old context. This doesn't need to be your highest-quality model — Gemini 2.5 Flash works excellently for summarization and costs a fraction of Sonnet or Pro. Set your summary model to your cheapest capable provider and let your expensive model focus on actual tasks.
openclaw.config.yaml — context tuning

```yaml
lcm:
  # How large active context can grow before compression kicks in
  contextBudget: 40000          # tokens; default ~40k

  # Compression threshold (% of budget)
  compactionThreshold: 0.70     # compress when 70% full

  # Model used for generating summaries
  summaryModel: google/gemini-2.5-flash  # cheap + fast

  # Keep full session transcripts on disk for lcm_grep
  archiveToLocal: true

  # How many summary levels to maintain (more = better recall)
  summaryDepth: 3
```
One more config worth knowing: workspaceFiles. This controls which files in your workspace are automatically injected into the context at session start. MEMORY.md, SOUL.md, USER.md, and TOOLS.md are typically included here. If these files are growing large — TOOLS.md especially can get bloated with API key docs and notes — consider splitting them. Large workspace files are loaded on every turn; keep them under 2,000 tokens each for best performance. Check our full OpenClaw setup guide for workspace file best practices.
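A quick way to spot bloat is to estimate token counts for the always-loaded files. A sketch, using the common (but rough) four-characters-per-token approximation:

```python
from pathlib import Path

# Files assumed to be injected at session start, per the text.
WORKSPACE_FILES = ["MEMORY.md", "SOUL.md", "USER.md", "TOOLS.md"]
TOKEN_BUDGET = 2_000  # suggested per-file ceiling from the text

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English prose."""
    return len(text) // 4

def oversized_files(workspace: Path) -> list:
    """Return workspace files whose estimated token count exceeds the budget."""
    flagged = []
    for name in WORKSPACE_FILES:
        path = workspace / name
        if path.exists() and estimate_tokens(path.read_text()) > TOKEN_BUDGET:
            flagged.append(name)
    return flagged
```

Run it against your workspace occasionally; anything flagged is a candidate for splitting or pruning.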
What the Community Is Saying
Context management is consistently the topic that separates OpenClaw power users from casual users in community discussions. On the OpenClaw Discord, the #tips-and-tricks channel is full of users who discovered that tuning contextBudget and switching to Flash for summaries cut their monthly costs in half without any noticeable quality drop. The most common complaint from new users is also context-related: "why does my agent seem to forget things I told it last week?" — usually the answer is that MEMORY.md isn't being maintained, or the user cleared chat history manually and wiped the LCM archive along with it.
On Reddit's r/LocalLLaMA, OpenClaw's LCM architecture is frequently cited as a meaningful differentiator compared to building your own agent setup. The summary DAG approach — hierarchical compression rather than flat truncation — handles long-running sessions far better than most DIY implementations. Several developers have reverse-engineered OpenClaw's approach to build similar systems in their own agents, which is a strong signal that the design is sound.
In today's HN thread about Anthropic blocking Claude Code subscriptions, a recurring sub-thread is about exactly this: people debating whether Claude Code's context handling is better or worse than OpenClaw's LCM. The consensus among developers who've used both is that they're solving different problems. Claude Code's context is optimized for within-session continuity on a single codebase. OpenClaw's LCM is optimized for across-session continuity across an entire personal workflow. Neither is strictly better; they're different tools.
Real Commands That Work
The LCM tools are exposed as first-class agent capabilities. When you ask your OpenClaw agent to look something up from past sessions, it uses these internally — but you can also reference them explicitly to control behavior.
search past conversations

```
# Tell your agent to search history
"Search my past conversations for anything about the Telegram bot setup"

# The agent will use lcm_grep internally:
# lcm_grep({ pattern: "telegram bot", scope: "both" })

# Then expand relevant summaries:
# lcm_expand({ summaryIds: ["sum_abc123"], includeMessages: true })
```

force memory update

```
# Add something critical to permanent memory
"Update MEMORY.md: we decided to use Vercel for all deployments going forward"

# The agent writes directly to ~/workspace/MEMORY.md
# This persists across all future sessions
```
inspect lcm summaries

```
# Ask the agent what it remembers from a specific period
"What did we work on in the last week? Give me a summary from your context history"

# Internally uses:
# lcm_expand({ query: "last week work projects", maxDepth: 2 })
```

One pattern worth establishing early: end significant sessions with an explicit memory update. "Before we finish, update MEMORY.md with anything important from today." The agent will write relevant decisions, configurations, and completions to permanent memory. This takes ten seconds and prevents the "I can't find that thing we set up three weeks ago" problem entirely. It also pairs well with the cost calculator — a well-maintained MEMORY.md means cheaper future sessions because the agent finds answers there instead of doing expensive LCM searches.
The Cost Impact
Numbers, because they matter. A user running OpenClaw with default context settings for 30 days (one daily cron job plus roughly 20 manual interactions) will accumulate approximately 600,000–900,000 tokens of raw conversation history. Without compression, naively sending all of that with every request would cost around $1.80–$2.70 per message in input tokens alone at Sonnet 4.6 pricing.
With LCM active and context budget set to 40,000 tokens, that same usage pattern costs roughly $0.03–$0.08 per message in input tokens. The compression overhead (running Flash to generate summaries periodically) adds maybe $2–4 per month across those 600,000 archived tokens. Total monthly input token cost: under $10 for a moderately active setup. Without LCM it would be $300+.
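Under stated assumptions (the same $3 per million input-token rate and roughly 150 agent turns a month, a figure extrapolated from the usage pattern above), the gap is easy to reproduce:

```python
# With/without-LCM monthly input-token cost, using the article's figures.
# Both the price and the turn count are assumptions, not quoted numbers.
INPUT_PRICE_PER_MTOK = 3.00
TURNS_PER_MONTH = 150  # assumed: daily cron turns plus manual interactions

def monthly_input_cost(context_tokens_per_turn: int) -> float:
    return context_tokens_per_turn / 1_000_000 * INPUT_PRICE_PER_MTOK * TURNS_PER_MONTH

naive = monthly_input_cost(750_000)  # midpoint of the 600-900k raw history
lcm = monthly_input_cost(20_000)     # typical active context under a 40k budget
print(f"naive: ${naive:.0f}/mo, with LCM: ${lcm:.2f}/mo")  # roughly a 35x gap
```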
That's not a minor optimization. It's the difference between OpenClaw being financially viable for daily use and not. The design of the system — treating the context window as a hot cache rather than a complete record — is what makes long-running AI agents economically realistic in 2026.
The broader lesson applies beyond OpenClaw. Any agentic AI system that survives long-term use needs this kind of hierarchical memory architecture. The models with 1M+ token windows make it tempting to just throw everything in context and skip the engineering. But the economics are brutal: at Gemini 2.5 Pro pricing, a 1M token context costs $1.25 per call. Do that five times a day for a month and you've spent $187 just on context — and the model still doesn't "remember" anything when you start a new session. See our full cost reduction guide for model-by-model pricing breakdowns.
Quick Setup Checklist
- ✅ Set `summaryModel` to your cheapest provider (Gemini Flash)
- ✅ Keep MEMORY.md under 3,000 tokens — prune it quarterly
- ✅ End important sessions with an explicit memory update request
- ✅ Set `contextBudget: 30000` for cost-sensitive setups
- ✅ Never manually delete your LCM archive — it's your long-term recall layer