How to Cut Your OpenClaw API Costs by 80% Without Losing Quality
Most people run OpenClaw on expensive models for tasks that don't need them. Smart model routing, prompt caching, and context pruning can cut your monthly AI bill by 60 to 80%. Here's exactly how.
Model Comparison: Cost vs Quality
Model choice is the single highest-impact decision you'll make for API costs. At the list prices below, Opus costs roughly 19x more per token than Haiku, and 200x or more than Gemini Flash Lite.
| Model | Input ($/1M) | Output ($/1M) | Context | Speed |
|---|---|---|---|---|
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K | Fast |
| Claude Opus 4.6 | $15.00 | $75.00 | 200K | Slower |
| Claude Haiku 4 | $0.80 | $4.00 | 200K | Very Fast |
| Gemini Flash 2.5 | $0.15 | $0.60 | 1M | Very Fast |
| Gemini Flash Lite | $0.075 | $0.30 | 1M | Fastest |
| Kimi K2.5 | $2.00 | $8.00 | 2M | Fast |
| Grok 4 | $3.00 | $15.00 | 256K | Fast |
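To see what these prices mean for a single request, here is a minimal sketch that computes per-request cost and the multiplier between two models. The prices are copied from the table above; the function names and the 10,000-in / 1,000-out request shape are illustrative assumptions, not part of OpenClaw.

```python
# USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.6": (15.00, 75.00),
    "Claude Haiku 4": (0.80, 4.00),
    "Gemini Flash 2.5": (0.15, 0.60),
    "Gemini Flash Lite": (0.075, 0.30),
    "Kimi K2.5": (2.00, 8.00),
    "Grok 4": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_multiplier(expensive: str, cheap: str,
                    input_tokens: int, output_tokens: int) -> float:
    """Return how many times more the first model costs for the same request."""
    return (request_cost(expensive, input_tokens, output_tokens)
            / request_cost(cheap, input_tokens, output_tokens))


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens, 1,000 output tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 10_000, 1_000):.5f}")
    print(cost_multiplier("Claude Opus 4.6", "Claude Haiku 4", 10_000, 1_000))
```

Because Opus is priced at the same multiple of Haiku for both input and output, the Opus-to-Haiku multiplier does not depend on the request shape. For other pairs it shifts with the input/output mix.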
When to Use Each Model
Claude Haiku 4: The Workhorse
~70% of tasks
- Web research and content summarization
- Email triage and classification
- Data extraction from pages
- Simple code generation (under 100 lines)
- Sub-agent workers doing focused tasks
- Context compaction (LCM compaction model)
Claude Sonnet 4.6: The Daily Driver
Main agent default
- Main agent (primary conversational agent)
- Complex coding and debugging
- Multi-step reasoning tasks
- Writing long-form content
- Code review with nuanced feedback
Gemini Flash Lite: Infrastructure Only
Cheapest option
- Heartbeat model (keep-alive pings)
- Context compaction (LCM)
- Background monitors and watchers
- Simple yes/no classification tasks
- High-volume sub-agents with simple jobs
Claude Opus 4.6: For Elite Tasks Only
Use sparingly
- Critical system architecture decisions
- Complex legal or financial document analysis
- Tasks where the quality difference is measurable and matters
Don't Use Opus as Your Default
We've seen users set Opus as their default model thinking "best = always better." That's how you hit $500/month bills on tasks a $5/month Haiku setup would handle equally well.
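The routing guidance above can be sketched as a simple lookup from task type to model. This is a standalone illustration, not OpenClaw's routing API: the task category names are hypothetical, and the Opus identifier is an assumption modeled on the other IDs used in this article's configs.

```python
# Hypothetical task categories mapped to the model tiers described above.
ROUTES = {
    # Haiku: the workhorse for focused, simple tasks
    "summarize": "anthropic/claude-haiku-4",
    "classify_email": "anthropic/claude-haiku-4",
    "extract_data": "anthropic/claude-haiku-4",
    # Flash Lite: infrastructure only
    "heartbeat": "google/gemini-flash-lite",
    "compaction": "google/gemini-flash-lite",
    # Sonnet: complex coding, reasoning, long-form writing
    "debug_code": "anthropic/claude-sonnet-4-6",
    "long_form_writing": "anthropic/claude-sonnet-4-6",
    # Opus: opt-in only (model ID below is an assumption)
    "architecture_review": "anthropic/claude-opus-4-6",
}

# Unknown task types fall back to Sonnet, never to Opus.
DEFAULT_MODEL = "anthropic/claude-sonnet-4-6"


def pick_model(task_type: str) -> str:
    """Return the model ID for a task type, defaulting to Sonnet."""
    return ROUTES.get(task_type, DEFAULT_MODEL)


if __name__ == "__main__":
    for task in ("summarize", "heartbeat", "debug_code", "something_new"):
        print(task, "->", pick_model(task))
```

The point of the design is that Opus is only reachable through an explicit entry, so an unclassified task can never land on the most expensive tier by accident.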
Prompt Caching (cacheRetention: long)
Prompt caching is one of the most underused cost features. When enabled, Anthropic caches your system prompt, so you pay full price for it only once. Subsequent calls that read the cached prompt are billed about 90% less for those tokens.
{
  "model": {
    "default": "anthropic/claude-sonnet-4-6",
    "cacheRetention": "long",
    "cacheSystemPrompt": true,
    "cachePrefill": true
  }
}
Real Savings Example
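A typical OpenClaw system prompt is 8,000 to 15,000 tokens. At Sonnet's $3.00 per 1M input tokens, resending a 10,000-token system prompt across a 50-message conversation costs about $1.50 without caching. With caching, the first message pays full price and the other 49 pay 10%, for about $0.18 in total. That's close to 90% off that portion of your bill.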
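To check this arithmetic for your own prompt size, here is a minimal sketch. It assumes the pricing model described above (first message at full price, later messages at a 90% discount) and ignores any cache-write surcharge your provider may apply. The function name and parameters are illustrative.

```python
def system_prompt_cost(prompt_tokens: int, messages: int,
                       input_price_per_million: float,
                       use_cache: bool = False,
                       cached_discount: float = 0.90) -> float:
    """Return the USD cost of resending the system prompt on every message.

    Assumes the first message pays full price and each later message pays
    (1 - cached_discount) of it when caching is on. Cache-write surcharges
    are not modeled.
    """
    full_price = prompt_tokens * input_price_per_million / 1_000_000
    if not use_cache or messages == 0:
        return full_price * messages
    return full_price + full_price * (1 - cached_discount) * (messages - 1)


if __name__ == "__main__":
    # 10,000-token prompt, 50 messages, Sonnet input price from the table.
    uncached = system_prompt_cost(10_000, 50, 3.00)
    cached = system_prompt_cost(10_000, 50, 3.00, use_cache=True)
    print(f"uncached: ${uncached:.3f}")
    print(f"cached:   ${cached:.3f}")
    print(f"savings:  {1 - cached / uncached:.1%}")
```

The savings approach the full 90% as the conversation gets longer, because the one full-price message becomes a smaller share of the total.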
Gemini Flash Lite for Compaction and Heartbeats
OpenClaw has two background processes that run constantly: the heartbeat (session keep-alive) and LCM compaction (context compression). By default, these use your main model, which is expensive and unnecessary.
{
  "model": {
    "default": "anthropic/claude-sonnet-4-6",
    "heartbeat": "google/gemini-flash-lite",
    "compaction": "google/gemini-flash-lite",
    "compactionThreshold": 0.75
  },
  "heartbeat": {
    "intervalMs": 60000,
    "model": "google/gemini-flash-lite"
  }
}
Switching these two background tasks from Sonnet to Flash Lite alone can cut 20 to 30% off your monthly bill if you run OpenClaw all day.
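To get a feel for why the heartbeat model matters, here is a rough sketch of monthly heartbeat cost at the 60-second interval from the config above. The tokens-per-ping figures are placeholders, not measured OpenClaw values, so substitute numbers from your own usage logs. Prices come from the model table.

```python
def monthly_heartbeat_cost(interval_ms: int,
                           input_tokens_per_ping: int,
                           output_tokens_per_ping: int,
                           input_price_per_million: float,
                           output_price_per_million: float,
                           hours_per_day: int = 24,
                           days: int = 30) -> float:
    """Return the estimated USD cost of heartbeat pings over one month."""
    pings_per_day = hours_per_day * 3_600_000 // interval_ms
    cost_per_ping = (input_tokens_per_ping * input_price_per_million
                     + output_tokens_per_ping * output_price_per_million) / 1_000_000
    return pings_per_day * days * cost_per_ping


if __name__ == "__main__":
    # Placeholder assumption: 100 input and 5 output tokens per ping.
    sonnet = monthly_heartbeat_cost(60_000, 100, 5, 3.00, 15.00)
    flash_lite = monthly_heartbeat_cost(60_000, 100, 5, 0.075, 0.30)
    print(f"Sonnet heartbeats:     ${sonnet:.2f}/mo")
    print(f"Flash Lite heartbeats: ${flash_lite:.2f}/mo")
```

A 60-second interval running around the clock works out to 43,200 pings over 30 days, so even a small per-ping cost adds up on a premium model.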
Context Pruning and TTL
Long conversations accumulate thousands of tokens that get sent to the API on every new message, even if that context is weeks old and irrelevant. Configure TTL to automatically prune stale context:
{
  "lcm": {
    "enabled": true,
    "compaction": {
      "model": "google/gemini-flash-lite",
      "threshold": 0.75,
      "targetRatio": 0.5
    },
    "ttl": {
      "messages": "30d",
      "summaries": "90d"
    }
  },
  "context": {
    "maxTokens": 80000,
    "pruneOlderThan": "7d"
  }
}
Real Cost Breakdown
Typical monthly costs for a power user running OpenClaw all day:
- Unoptimized (Opus default): $120 to $300/mo
- Partially optimized (Sonnet default): $30 to $80/mo
- Optimized (Haiku + caching): $8 to $25/mo
- Fully optimized (all techniques): $3 to $12/mo
Full Optimized Config Example
{
  "model": {
    "default": "anthropic/claude-sonnet-4-6",
    "heartbeat": "google/gemini-flash-lite",
    "compaction": "google/gemini-flash-lite",
    "fallback": [
      "anthropic/claude-haiku-4",
      "google/gemini-flash-2.5"
    ],
    "cacheRetention": "long",
    "cacheSystemPrompt": true
  },
  "lcm": {
    "enabled": true,
    "compaction": {
      "model": "google/gemini-flash-lite",
      "threshold": 0.75
    }
  },
  "context": {
    "maxTokens": 80000
  },
  "heartbeat": {
    "intervalMs": 60000,
    "model": "google/gemini-flash-lite",
    "silent": true
  }
}
Ready to Set This Up?
Follow the complete setup guide to get OpenClaw running with the right config from day one.