Cost & Performance

How to Cut Your OpenClaw API Costs by 80% Without Losing Quality

Most people run OpenClaw on expensive models for tasks that don't need them. Smart model routing, prompt caching, and context pruning can slash your monthly AI bill by 60–80%. Here's exactly how.

🦞 claw.mobile Editorial · March 22, 2026 · 16 min read

Model Comparison: Cost vs Quality

This is the single highest-impact decision you'll make for API costs. At the list prices below, Opus costs about 19x as much per token as Haiku, and 200x or more as much as Gemini Flash Lite.

Model               Input ($/1M)   Output ($/1M)   Context   Speed
Claude Sonnet 4.6   $3.00          $15.00          200K      Fast
Claude Opus 4.6     $15.00         $75.00          200K      Slower
Claude Haiku 4      $0.80          $4.00           200K      Very Fast
Gemini Flash 2.5    $0.15          $0.60           1M        Very Fast
Gemini Flash Lite   $0.075         $0.30           1M        Fastest
Kimi K2.5           $2.00          $8.00           2M        Fast
Grok 4              $3.00          $15.00          256K      Fast
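
To see what those per-token prices mean per request, here's a minimal Python sketch. The prices are copied from the table; the `request_cost` helper and the 10,000-input / 1,000-output request size are illustrative assumptions, not anything OpenClaw ships.

```python
# Prices in USD per 1M tokens, copied from the table above: (input, output).
PRICES = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.6": (15.00, 75.00),
    "Claude Haiku 4": (0.80, 4.00),
    "Gemini Flash 2.5": (0.15, 0.60),
    "Gemini Flash Lite": (0.075, 0.30),
    "Kimi K2.5": (2.00, 8.00),
    "Grok 4": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the table's list prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request size: 10,000 input tokens, 1,000 output tokens.
    opus = request_cost("Claude Opus 4.6", 10_000, 1_000)
    for model in sorted(PRICES, key=lambda m: request_cost(m, 10_000, 1_000)):
        cost = request_cost(model, 10_000, 1_000)
        print(f"{model:<18} ${cost:.5f}  (Opus costs {opus / cost:.1f}x this)")
```

At these prices, that request costs $0.225 on Opus and $0.012 on Haiku. Swap in your own typical token counts to see where your bill actually comes from.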

When to Use Each Model

Claude Haiku 4: The Workhorse

~70% of tasks
  • Web research and content summarization
  • Email triage and classification
  • Data extraction from pages
  • Simple code generation (under 100 lines)
  • Sub-agent workers doing focused tasks
  • Context compaction (LCM compaction model)

Claude Sonnet 4.6: The Daily Driver

Main agent default
  • Main agent (primary conversational agent)
  • Complex coding and debugging
  • Multi-step reasoning tasks
  • Writing long-form content
  • Code review with nuanced feedback

Gemini Flash Lite: Infrastructure Only

Cheapest option
  • Heartbeat model (keep-alive pings)
  • Context compaction (LCM)
  • Background monitors and watchers
  • Simple yes/no classification tasks
  • High-volume sub-agents with simple jobs

Claude Opus 4.6: For Elite Tasks Only

Use sparingly
  • Critical system architecture decisions
  • Complex legal or financial document analysis
  • Tasks where the quality difference is measurable and matters

Don't Use Opus as Your Default

We've seen users set Opus as their default model thinking "best = always better." That's how you hit $500/month bills on tasks a $5/month Haiku setup would handle equally well.
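
If you dispatch tasks from your own scripts, the tiers above boil down to a lookup table. This is only a sketch of the idea, not how OpenClaw routes internally: the task category names are made up for illustration, and `anthropic/claude-opus-4-6` is an assumed model ID written by analogy with the other IDs in this article.

```python
# Hypothetical task categories mapped to the model tiers described above.
# The Opus ID is an assumption; check your provider's actual model names.
ROUTES = {
    "summarize": "anthropic/claude-haiku-4",
    "email_triage": "anthropic/claude-haiku-4",
    "extract": "anthropic/claude-haiku-4",
    "heartbeat": "google/gemini-flash-lite",
    "yes_no": "google/gemini-flash-lite",
    "debug": "anthropic/claude-sonnet-4-6",
    "long_form": "anthropic/claude-sonnet-4-6",
    "architecture": "anthropic/claude-opus-4-6",
}

# Anything unclassified falls back to the main-agent default, never to Opus.
DEFAULT_MODEL = "anthropic/claude-sonnet-4-6"


def pick_model(task_type: str) -> str:
    """Return the model for a task type, or the default for unknown types."""
    return ROUTES.get(task_type, DEFAULT_MODEL)
```

The design choice that matters is the fallback: unknown work lands on Sonnet, and Opus is only reachable by naming a task type that explicitly asks for it.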

Prompt Caching (cacheRetention: long)

Prompt caching is one of the most underused cost features. When enabled, Anthropic caches your system prompt, so you only pay full price for it once. On subsequent calls, the cached tokens are billed about 90% cheaper than regular input tokens.

openclaw.json: Enable Prompt Caching
{
  "model": {
    "default": "anthropic/claude-sonnet-4-6",
    "cacheRetention": "long",
    "cacheSystemPrompt": true,
    "cachePrefill": true
  }
}

Real Savings Example

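A typical OpenClaw system prompt is 8,000–15,000 tokens, so a 50-message conversation resends 400,000–750,000 system prompt tokens. Without caching, that costs ~$0.50 at Haiku's input rate and roughly $1.20–$2.25 at Sonnet's. With caching, the repeated reads cost about a tenth as much: ~$0.05 instead of ~$0.50. That's 90% off that portion of your bill.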
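
Here's the arithmetic as a small Python sketch so you can plug in your own numbers. The 90% read discount comes from this article; the helper name and the `write_multiplier` parameter are assumptions for illustration (set it above 1.0 if your provider charges a premium for writing the cache).

```python
def system_prompt_cost(
    prompt_tokens: int,
    messages: int,
    input_price_per_m: float,
    cached: bool,
    read_discount: float = 0.90,    # cached reads ~90% cheaper (per the article)
    write_multiplier: float = 1.0,  # assumption: no cache-write premium
) -> float:
    """USD spent on system prompt tokens across one conversation."""
    if messages <= 0:
        return 0.0
    full = prompt_tokens * input_price_per_m / 1_000_000
    if not cached:
        # The whole prompt is billed at full price on every message.
        return full * messages
    # The first message writes the cache; the rest read from it at a discount.
    first = full * write_multiplier
    rest = full * (1 - read_discount) * (messages - 1)
    return first + rest


if __name__ == "__main__":
    # Example: 12,500-token prompt, 50 messages, Haiku input price ($0.80/1M).
    before = system_prompt_cost(12_500, 50, 0.80, cached=False)
    after = system_prompt_cost(12_500, 50, 0.80, cached=True)
    print(f"uncached ${before:.3f}, cached ${after:.3f}")
```

With those example inputs the uncached total is $0.50 and the cached total is about $0.06, slightly above a pure 90% cut because the first message still pays full price.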

Gemini Flash Lite for Compaction and Heartbeats

OpenClaw has two background processes that run constantly: the heartbeat (session keep-alive) and LCM compaction (context compression). By default, these use your main model, which is expensive and unnecessary.

openclaw.json: Cheap Background Models
{
  "model": {
    "default": "anthropic/claude-sonnet-4-6",
    "heartbeat": "google/gemini-flash-lite",
    "compaction": "google/gemini-flash-lite",
    "compactionThreshold": 0.75
  },
  "heartbeat": {
    "intervalMs": 60000,
    "model": "google/gemini-flash-lite"
  }
}

Switching these two background tasks from Sonnet to Flash Lite alone can cut 20–30% off your monthly bill if you run OpenClaw all day.
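
To estimate what the heartbeat alone costs you, here's a rough Python sketch. The 60,000 ms interval matches the config above and the prices come from the comparison table, but the tokens-per-ping figures are placeholders: measure your own before trusting the output.

```python
def monthly_heartbeat_cost(
    interval_ms: int,
    input_tokens_per_ping: int,
    output_tokens_per_ping: int,
    input_price_per_m: float,
    output_price_per_m: float,
    hours_per_day: float = 24.0,
    days: int = 30,
) -> float:
    """Estimate USD per month spent on heartbeat pings."""
    pings = hours_per_day * 3_600_000 / interval_ms * days
    per_ping = (
        input_tokens_per_ping * input_price_per_m
        + output_tokens_per_ping * output_price_per_m
    ) / 1_000_000
    return pings * per_ping


if __name__ == "__main__":
    # Placeholder ping size: 100 input tokens, 5 output tokens (an assumption).
    sonnet = monthly_heartbeat_cost(60_000, 100, 5, 3.00, 15.00)
    flash_lite = monthly_heartbeat_cost(60_000, 100, 5, 0.075, 0.30)
    print(f"Sonnet: ${sonnet:.2f}/mo, Flash Lite: ${flash_lite:.2f}/mo")
```

Whatever your real ping size is, the ratio between the two models stays roughly the same, which is why moving background traffic off your main model pays off.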

Context Pruning and TTL

Long conversations accumulate thousands of tokens that get sent to the API on every new message, even if that context is weeks old and irrelevant. Configure TTL to automatically prune stale context:

openclaw.json: Context TTL Settings
{
  "lcm": {
    "enabled": true,
    "compaction": {
      "model": "google/gemini-flash-lite",
      "threshold": 0.75,
      "targetRatio": 0.5
    },
    "ttl": {
      "messages": "30d",
      "summaries": "90d"
    }
  },
  "context": {
    "maxTokens": 80000,
    "pruneOlderThan": "7d"
  }
}
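
Conceptually, pruning means "drop what's too old, then keep the newest messages that fit the budget." The Python sketch below illustrates that idea with the same 7-day and 80,000-token limits as the config above. It's an assumption about what those settings mean, not OpenClaw's actual implementation, and the `Message` type is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Message:
    sent_at: datetime
    tokens: int
    text: str


def prune_context(
    messages: list[Message],
    now: datetime,
    max_age: timedelta = timedelta(days=7),
    max_tokens: int = 80_000,
) -> list[Message]:
    """Drop messages older than max_age, then keep the newest that fit max_tokens."""
    fresh = [m for m in messages if now - m.sent_at <= max_age]
    kept: list[Message] = []
    used = 0
    # Walk newest-first so the most recent context survives the token cap.
    for m in sorted(fresh, key=lambda m: m.sent_at, reverse=True):
        if used + m.tokens > max_tokens:
            break
        kept.append(m)
        used += m.tokens
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    now = datetime(2026, 3, 22, tzinfo=timezone.utc)
    history = [
        Message(now - timedelta(days=10), 1_000, "stale"),
        Message(now - timedelta(days=1), 40_000, "yesterday"),
        Message(now - timedelta(hours=1), 30_000, "recent"),
    ]
    print([m.text for m in prune_context(history, now)])
```

Every token this kind of pruning drops is a token you stop paying for on each subsequent message.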

Real Cost Breakdown

Typical monthly costs for a power user running OpenClaw all day:

  • Unoptimized (Opus default): $120–$300/mo
  • Partially optimized (Sonnet default): $30–$80/mo
  • Optimized (Haiku + caching): $8–$25/mo
  • Fully optimized (all techniques): $3–$12/mo
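
To turn those ranges into percentage savings, a few lines of Python are enough. The dollar figures are copied from the breakdown above; the `reduction_pct` helper is just illustrative arithmetic.

```python
# Monthly cost ranges in USD from the breakdown above: (low, high).
TIERS = {
    "Unoptimized (Opus default)": (120, 300),
    "Partially optimized (Sonnet default)": (30, 80),
    "Optimized (Haiku + caching)": (8, 25),
    "Fully optimized (all techniques)": (3, 12),
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage saved when a bill drops from `before` to `after`."""
    return (1 - after / before) * 100


if __name__ == "__main__":
    base_low, base_high = TIERS["Partially optimized (Sonnet default)"]
    for name in ("Optimized (Haiku + caching)", "Fully optimized (all techniques)"):
        low, high = TIERS[name]
        print(
            f"{name}: {reduction_pct(base_high, high):.0f}% to "
            f"{reduction_pct(base_low, low):.0f}% below a Sonnet default"
        )
```

Starting from a Sonnet default, the Haiku + caching tier works out to roughly 69–73% less, and the fully optimized tier to 85–90% less.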

Full Optimized Config Example

openclaw.json: Full Optimized Config
{
  "model": {
    "default": "anthropic/claude-sonnet-4-6",
    "heartbeat": "google/gemini-flash-lite",
    "compaction": "google/gemini-flash-lite",
    "fallback": [
      "anthropic/claude-haiku-4",
      "google/gemini-flash-2.5"
    ],
    "cacheRetention": "long",
    "cacheSystemPrompt": true
  },
  "lcm": {
    "enabled": true,
    "compaction": {
      "model": "google/gemini-flash-lite",
      "threshold": 0.75
    }
  },
  "context": {
    "maxTokens": 80000
  },
  "heartbeat": {
    "intervalMs": 60000,
    "model": "google/gemini-flash-lite",
    "silent": true
  }
}

Ready to Set This Up?

Follow the complete setup guide to get OpenClaw running with the right config from day one.

Full Setup Guide