Cost & Performance

How to Cut Your OpenClaw API Costs by 80% Without Losing Quality

Most people run OpenClaw on expensive models for tasks that don't need them. Smart model routing, prompt caching, and context pruning can slash your monthly AI bill by 60–80%. Here's exactly how.

🦞 claw.mobile Editorial·March 22, 2026·

16 min read

Model Comparison: Cost vs Quality

This is the single highest-impact decision you'll make for API costs. The difference between Haiku and Opus is a 60-100x cost multiplier.

Model	Input ($/1M)	Output ($/1M)	Context	Speed
Claude Sonnet 4.6	$3.00	$15.00	200K	Fast
Claude Opus 4.6	$15.00	$75.00	200K	Slower
Claude Haiku 4.5	$0.80	$4.00	200K	Very Fast
Gemini Flash 2.5	$0.15	$0.60	1M	Very Fast
Gemini Flash Lite	$0.075	$0.30	1M	Fastest
Kimi K2.5	$2.00	$8.00	2M	Fast
Grok 4	$3.00	$15.00	256K	Fast

When to Use Each Model

Claude Haiku 4.5 — The Workhorse

~70% of tasks

→Web research and content summarization
→Email triage and classification
→Data extraction from pages
→Simple code generation (under 100 lines)
→Sub-agent workers doing focused tasks
→Context compaction (LCM compaction model)

Claude Sonnet 4.6 — The Daily Driver

Main agent default

→Main agent (primary conversational agent)
→Complex coding and debugging
→Multi-step reasoning tasks
→Writing long-form content
→Code review with nuanced feedback

Gemini Flash Lite — Infrastructure Only

Cheapest option

→Heartbeat model (keep-alive pings)
→Context compaction (LCM)
→Background monitors and watchers
→Simple yes/no classification tasks
→High-volume sub-agents with simple jobs

Claude Opus 4.6 — For Elite Tasks Only

Use sparingly

→Critical system architecture decisions
→Complex legal or financial document analysis
→Tasks where quality difference is measurable and matters

Don't Use Opus as Your Default

We've seen users set Opus as their default model thinking "best = always better." That's how you hit $500/month — see what it costs for you bills on tasks a $5/month Haiku setup would handle equally well.

Get the weekly AI agent digest 🦞

What's shipping in AI tools, every Monday. No fluff.

Subscribe Free →

Prompt Caching (cacheRetention: long)

Prompt caching is one of the most underused cost features. When enabled, Anthropic caches your system prompt — so you only pay full price for it once. Subsequent calls using the cached context are 90% cheaper.

openclaw.json — Enable Prompt Caching

{
  "model": {
    "default": "anthropic/claude-sonnet-4-6",
    "cacheRetention": "long",
    "cacheSystemPrompt": true,
    "cachePrefill": true
  }
}

Real Savings Example

A typical OpenClaw system prompt is 8,000–15,000 tokens. Without caching, a 50-message conversation costs ~$0.50 just in system prompt tokens. With caching, those 50 messages cost ~$0.05. That's 90% off that portion of your bill.

Gemini Flash Lite for Compaction and Heartbeats

OpenClaw has two background processes that run constantly: the heartbeat (session keep-alive) and LCM compaction (context compression). By default, these use your main model — that's expensive and unnecessary.

openclaw.json — Cheap Background Models

{
  "model": {
    "default": "anthropic/claude-sonnet-4-6",
    "heartbeat": "google/gemini-flash-lite",
    "compaction": "google/gemini-flash-lite",
    "compactionThreshold": 0.75
  },
  "heartbeat": {
    "intervalMs": 60000,
    "model": "google/gemini-flash-lite"
  }
}

Switching these two background tasks from Sonnet to Flash Lite alone can cut 20–30% off your monthly bill if you run OpenClaw all day.

Context Pruning and TTL

Long conversations accumulate thousands of tokens that get sent to the API on every new message — even if that context is weeks old and irrelevant. Configure TTL to automatically prune stale context:

openclaw.json — Context TTL Settings

{
  "lcm": {
    "enabled": true,
    "compaction": {
      "model": "google/gemini-flash-lite",
      "threshold": 0.75,
      "targetRatio": 0.5
    },
    "ttl": {
      "messages": "30d",
      "summaries": "90d"
    }
  },
  "context": {
    "maxTokens": 80000,
    "pruneOlderThan": "7d"
  }
}

Real Cost Breakdown

Typical monthly costs for a power user running OpenClaw all day:

Unoptimized (Opus default)

$120 – $300/mo

Partially optimized (Sonnet default)

$30 – $80/mo

Optimized (Haiku + caching)

$8 – $25/mo

Fully optimized (all techniques)

$3 – $12/mo

Full Optimized Config Example

openclaw.json — Full Optimized Config

{
  "model": {
    "default": "anthropic/claude-sonnet-4-6",
    "heartbeat": "google/gemini-flash-lite",
    "compaction": "google/gemini-flash-lite",
    "fallback": [
      "anthropic/claude-haiku-4",
      "google/gemini-flash-2.5"
    ],
    "cacheRetention": "long",
    "cacheSystemPrompt": true
  },
  "lcm": {
    "enabled": true,
    "compaction": {
      "model": "google/gemini-flash-lite",
      "threshold": 0.75
    }
  },
  "context": {
    "maxTokens": 80000
  },
  "heartbeat": {
    "intervalMs": 60000,
    "model": "google/gemini-flash-lite",
    "silent": true
  }
}

Ready to Set This Up?

Follow the complete setup guide to get OpenClaw running with the right config from day one.

Full Setup Guide

Back to all articles

Need a website or bot built?

Fixed pricing from $999. Free mockup in 48h. You own the code.

See pricing

Get the Vibe Coding Cheat Sheet

Best tool for every use case + pricing + pro tips. One page, zero fluff. Plus weekly updates on new tools.