Local AI · Breaking

Gemma 4 + OpenClaw: Google's New Open Agent Model, Running on Your Hardware

Google dropped Gemma 4 this morning — #1 on Hacker News within hours. Native function calling, multimodal reasoning, and models that run on a laptop. Here's how to wire it into OpenClaw as your primary or fallback agent model, right now.

🦞 claw.mobile Editorial · April 3, 2026 · 11 min read

1. What Gemma 4 Actually Is

The name is familiar but the architecture isn't. Google built Gemma 4 on the same research stack as Gemini 3 — the same interleaved attention and distillation techniques that made Gemini competitive, now compressed into open weights you can download this morning. What came out is a family of models that pushes intelligence-per-parameter well beyond where open models stood even six months ago.

The flagship is the 31B thinking model. It scored 89.2% on AIME 2026 mathematics without external tools — the same benchmark that stumped GPT-4o and early Claude Sonnet releases. The 26B MoE variant (which activates only 4B parameters per forward pass) scores within 1 point and is dramatically cheaper to run on modest hardware. Then there are the E4B and E2B models — "embedded" sizes aimed at running on phones and IoT devices. If you have a recent laptop, you can run something real.

The thing that matters most for OpenClaw users specifically: Gemma 4 has native function calling baked into the architecture and was benchmarked on τ2-bench agentic tool use — scoring 86.4% on the retail task suite. That's not a marketing claim; it's a benchmark designed to measure exactly what AI agents need to do in the real world: call APIs, navigate apps, complete multi-step tasks. And the 31B model is competitive with anything cloud-only.
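To make "native function calling" concrete: the model emits a structured tool call, and the agent side parses it and runs the matching local function. Here's a minimal sketch of that dispatch loop using the OpenAI-style tool-call shape that OpenAI-compatible endpoints return; the `get_server_load` tool and the mocked response are hypothetical, not part of OpenClaw's actual API.

```python
import json

# Hypothetical tool schema in the OpenAI function-calling format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_server_load",
        "description": "Return current CPU load for a named host.",
        "parameters": {
            "type": "object",
            "properties": {"host": {"type": "string"}},
            "required": ["host"],
        },
    },
}]

def get_server_load(host: str) -> str:
    # Stand-in for a real metrics lookup.
    return f"{host}: load average 0.42"

REGISTRY = {"get_server_load": get_server_load}

def dispatch(tool_call: dict) -> str:
    """Parse one tool call from a chat-completion response and run it."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    return REGISTRY[name](**args)

# A mocked model response, shaped like the API's tool_calls entries:
# the arguments arrive as a JSON string, not a dict.
mock_call = {"function": {"name": "get_server_load",
                          "arguments": '{"host": "web-1"}'}}
print(dispatch(mock_call))  # web-1: load average 0.42
```

The benchmark is measuring how reliably the model produces that structured call with correct argument names under messy instructions — which is exactly what breaks in agent frameworks when it goes wrong.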

2. The Four Models: Which One to Run

There are four models, and picking wrong costs you either money or capability. Here's the honest breakdown:

Gemma 4 31B IT Thinking · Best quality

The full model. AIME 89.2%, τ2-bench 86.4%. Needs ~22GB VRAM — a 4090 or a Mac with 32GB unified memory handles it. If you have the hardware, this is the one.

ollama pull gemma4:31b-it-thinking

Gemma 4 26B A4B IT Thinking (MoE) · Best value

Mixture-of-Experts variant activating 4B parameters per call. Scores nearly as high as the 31B at a fraction of the compute cost. Runs well on 16GB VRAM or a 24GB Mac. The sweet spot for most setups.

ollama pull gemma4:26b-a4b-it-thinking

Gemma 4 E4B IT Thinking · Laptop / 8GB

A 4B embedded model for machines without a serious GPU. AIME 42.5%, τ2-bench 57.5% — not impressive, but capable for structured tasks like cron summaries, email triage, and routing decisions.

ollama pull gemma4:e4b-it-thinking

Gemma 4 E2B IT Thinking · IoT / Edge

The 2B model. Designed for phones, Raspberry Pis, and IoT devices. AIME 37.5% — not your agent brain. Use it for fast classification, keyword extraction, or as a cheap pre-filter before hitting a larger model.

ollama pull gemma4:e2b-it-thinking
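The pre-filter pattern boils down to pure routing logic: the small model returns a cheap classification, and only ambiguous or complex messages escalate to the bigger model. A minimal sketch — the labels and the 0.8 confidence threshold are illustrative assumptions, not anything prescribed by Gemma or OpenClaw:

```python
# Hypothetical pre-filter: the E2B model returns a (label, confidence)
# pair; only uncertain or complex messages escalate to the 26B model.
CHEAP_LABELS = {"spam", "receipt", "newsletter"}  # safe to handle locally

def route(label: str, confidence: float, threshold: float = 0.8) -> str:
    """Return which model should handle the message."""
    if label in CHEAP_LABELS and confidence >= threshold:
        return "gemma4:e2b-it-thinking"   # small model's answer is final
    return "gemma4:26b-a4b-it-thinking"   # escalate anything uncertain

print(route("spam", 0.95))      # gemma4:e2b-it-thinking
print(route("question", 0.99))  # gemma4:26b-a4b-it-thinking
print(route("receipt", 0.55))   # low confidence, escalates to 26B
```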

Honest note on VRAM: These numbers assume Q4 quantization from Unsloth (the team that released Gemma 4 quants within hours of launch). Full-precision weights cost roughly 2x. Stick to the GGUF quants for local use.
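The VRAM figures follow from simple arithmetic: parameter count times bits per weight, plus headroom for KV cache and activations. A back-of-the-envelope estimator — the 4.5 effective bits for Q4 (quantized weights plus scaling metadata) is a rough assumption, and real totals vary with context length:

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate in-VRAM size of the weights alone, in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 31B at Q4 (~4.5 effective bits including quantization scales):
print(f"{weight_gb(31, 4.5):.1f} GB")  # 17.4 GB for weights alone;
                                       # KV cache and activations push the
                                       # running total toward the ~22 GB figure
print(f"{weight_gb(31, 8):.1f} GB")    # 31.0 GB at Q8, for comparison
```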

3. Why Gemma 4 + OpenClaw Is a Good Combination

Most open models released in 2024 and early 2025 were fine-tuned on chat data, with tool calling grudgingly bolted on afterward. The results were brittle — models that could call a function in a clean demo but hallucinated argument names under any real-world variation. Gemma 4 is different: function calling is native to the architecture, and the agentic benchmarks are highlighted on the product page rather than buried in a model card appendix.

OpenClaw routes tasks to models via config. Every tool call, cron job, and skill invocation goes through the model layer — which means swapping Gemma 4 in is a three-line config change. You don't need to rewrite automations or skills. The skill system works exactly the same regardless of the underlying model; you're just changing what brain interprets the skill instructions.

The 140-language support is underrated if you're building automations that touch non-English content — emails, documents, RSS feeds from non-US sources. Claude and GPT handle this, but at API cost per token. Running locally on Gemma 4 E4B, multilingual classification and extraction becomes essentially free.

4. Setup: Ollama + Gemma 4

If you've run OpenClaw with Ollama before, this will take five minutes. If not, install Ollama first — it handles model management and exposes a local OpenAI-compatible endpoint.

1. Install Ollama (if not already installed)

# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

2. Pull Gemma 4 (pick the model that matches your hardware)

# 26B MoE — recommended for 16-24GB VRAM or Mac with 24GB
ollama pull gemma4:26b-a4b-it-thinking

# 31B full thinking model — 32GB+ recommended
ollama pull gemma4:31b-it-thinking

# Lightweight option for CPU-only or 8GB machines
ollama pull gemma4:e4b-it-thinking

3. Verify Ollama is serving on the default port

curl http://localhost:11434/api/tags
# Should list gemma4 models in the response

Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1 by default. That's the URL you'll point OpenClaw at.
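If you want to sanity-check the endpoint outside OpenClaw, the request body is plain OpenAI chat-completion JSON. A minimal sketch that builds the payload; the actual POST (commented out) assumes Ollama is running locally on the default port:

```python
import json
from urllib import request  # used by the commented-out POST below

BASE_URL = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint

def chat_payload(model: str, prompt: str) -> dict:
    """Build a minimal OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

payload = chat_payload("gemma4:26b-a4b-it-thinking", "Say hi in one word.")
print(json.dumps(payload, indent=2))

# To actually send it (requires a running Ollama instance):
# req = request.Request(f"{BASE_URL}/chat/completions",
#                       data=json.dumps(payload).encode(),
#                       headers={"Content-Type": "application/json",
#                                "Authorization": "Bearer ollama"})
# resp = json.loads(request.urlopen(req).read())
# print(resp["choices"][0]["message"]["content"])
```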

5. Configure OpenClaw to Use Gemma 4

OpenClaw supports multiple model providers simultaneously. You can run Gemma 4 locally for cost-sensitive tasks while keeping Claude or GPT as your premium fallback for complex reasoning. Here's the config:

Option A — Gemma 4 as the default model

# ~/.openclaw/config.yml
model:
  provider: openai-compatible
  baseUrl: http://localhost:11434/v1
  apiKey: ollama
  model: gemma4:26b-a4b-it-thinking

Option B — Multi-provider setup (local Gemma 4 + cloud fallback)

# ~/.openclaw/config.yml
model:
  provider: openai-compatible
  baseUrl: http://localhost:11434/v1
  apiKey: ollama
  model: gemma4:26b-a4b-it-thinking

# Override per-session from Telegram:
# /model anthropic/claude-sonnet-4-6

Once the config is saved, restart the gateway: openclaw gateway restart. Then open your Telegram chat and run /status — the model field should show gemma4. If you want to switch back temporarily, /model anthropic/claude-sonnet-4-6 works at runtime without touching config.

Running OpenClaw on a VPS? Ollama doesn't have to be on the same machine. If your VPS is too small for a local model, you can run Ollama on your home workstation and point the config at its LAN/tailnet address. See the VPS setup guide for Tailscale configs that make this seamless.

6. Agentic Workflows That Actually Work with Gemma 4

Not everything benefits equally from running locally. Gemma 4's native function calling makes it genuinely good at structured agentic tasks. Here are the patterns worth wiring up.

Cron-driven research briefings

Set a daily cron that has Gemma 4 fetch a list of URLs, extract key points, and send a Telegram summary. The E4B model handles this fine — extraction and summarization don't need frontier-level intelligence, just consistency. Running locally means the task runs instantly, no cold-start, no rate limit.

# Morning brief cron — every weekday at 7am
schedule: {kind: cron, expr: "0 7 * * 1-5"}
payload: Fetch top stories from HN and r/LocalLLaMA, summarize in 5 bullets, send to Telegram
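If you're unsure when a cron expression fires, here's a tiny matcher for the five-field form used above. It's a deliberately minimal sketch — it handles only numbers, `*`, and `a-b` ranges, not steps or lists, and is not a full cron parser:

```python
def field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', a plain number, or an inclusive 'a-b' range."""
    if field == "*":
        return True
    if "-" in field:
        lo, hi = map(int, field.split("-"))
        return lo <= value <= hi
    return int(field) == value

def cron_fires(expr: str, minute: int, hour: int, dow: int) -> bool:
    """Check the minute, hour, and day-of-week fields of a 5-field cron
    expression. dow: 0=Sunday ... 6=Saturday, as in standard cron."""
    m, h, _dom, _mon, d = expr.split()
    return (field_matches(m, minute) and field_matches(h, hour)
            and field_matches(d, dow))

expr = "0 7 * * 1-5"              # 07:00, Monday-Friday
print(cron_fires(expr, 0, 7, 1))  # True  (Monday 07:00)
print(cron_fires(expr, 0, 7, 6))  # False (Saturday)
print(cron_fires(expr, 30, 7, 3)) # False (07:30, wrong minute)
```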

Multilingual document processing

If you receive documents, emails, or contracts in Portuguese, Spanish, German, or any of the 140 supported languages, Gemma 4 processes them natively without translation overhead. Route these tasks locally; send only the English-language complex reasoning tasks to the cloud model. You'll cut API costs significantly on any workflow that touches multilingual content.

Image and screenshot analysis

Gemma 4 is multimodal — send it a screenshot of a dashboard, a chart, or a photo, and it can reason over it. This pairs well with the OpenClaw automation workflow pattern of "screenshot → analyze → act." For example: send a screenshot of your server metrics and ask if anything looks unusual. The 26B model handles this with reasonable quality.

# From Telegram — attach screenshot, ask: "Analyze this metrics screenshot. Any anomalies?"

Local code review without token costs

Pass a file or diff and ask for a code review. Gemma 4 31B's 80% LCB v6 score puts it in legitimate territory for non-trivial code review tasks. Not as good as Claude Sonnet on nuanced architecture questions — but for routine review of straightforward diffs, running locally and paying zero per token is hard to argue against.
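For review of anything beyond a small diff, you'll usually want to split the diff into per-file chunks so each review prompt stays focused and within context. A minimal sketch that splits on standard `diff --git` headers from git's unified diff format; the sample diff is fabricated for illustration:

```python
def split_diff_by_file(diff: str) -> dict:
    """Split a unified git diff into {filename: chunk_text} pieces."""
    chunks = {}
    current = None
    for line in diff.splitlines():
        if line.startswith("diff --git "):
            # Header looks like: diff --git a/path b/path
            current = line.split(" b/")[-1]
            chunks[current] = []
        if current is not None:
            chunks[current].append(line)
    return {name: "\n".join(lines) for name, lines in chunks.items()}

sample = """diff --git a/app.py b/app.py
@@ -1 +1 @@
-x = 1
+x = 2
diff --git a/util.py b/util.py
@@ -5 +5 @@
-y = 3
+y = 4"""
files = split_diff_by_file(sample)
print(sorted(files))  # ['app.py', 'util.py']
```

Each chunk then becomes its own "review this diff" prompt, which also plays nicely with the smaller context budgets of quantized local models.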

7. Honest Tradeoffs

Gemma 4 is genuinely impressive, and the benchmark numbers hold up under scrutiny. But it's not a straight Claude replacement, and treating it as one will lead to frustration.

The thinking mode adds latency. On a local machine, the 31B model thinking through a complex task takes considerably longer than a Claude API call — which has data center hardware on the other end. If your automation requires a response in under 5 seconds, the E4B model is more appropriate, or you route to the cloud. A reasonable dividing line: Gemma 4 for background tasks, Claude for interactive sessions.

The τ2-bench 86.4% sounds great until you read what it's testing: structured retail task completion with defined tool schemas. Real-world agentic tasks are messier — ambiguous instructions, unexpected tool responses, mid-task pivots. Claude Sonnet 4.6 and GPT-4.1 still have an edge on complex multi-step reasoning under ambiguity. Gemma 4 is best when the task is structured and repeatable.

Finally, the Ollama GGUF quantization situation: at the time of writing, the Unsloth quants are the best available and they work well. But quantization at this model size introduces real degradation on edge cases. For anything critical or high-stakes, route to a full-precision cloud endpoint. Local Gemma 4 is for volume, not for your most sensitive tasks.

8. What the Community Is Saying

The Hacker News thread hit 382 comments within 13 hours — which, for an open model release, is unusually active. The top comment, from a developer at Unsloth, noted that quantized versions were available within hours of the weights dropping and that they "work really well" for agentic use cases.

Most of the HN discussion focused on the MoE architecture and whether the active-parameter counts translate meaningfully to real inference cost. The consensus seems to be yes: the 26B A4B model really does run more like a 4B model at inference time while punching considerably above that weight on quality benchmarks.

On X, the posts getting the most traction are around the τ2-bench numbers specifically, with several AI infrastructure builders noting this is the first open model they'd seriously consider for production agentic pipelines — though most are pairing it with GPT-4.1 or Claude for edge cases rather than full replacement. The overall tone is: this is a real step forward, not a benchmark-optimized demo, and the fact that it came from Google's DeepMind-aligned research process gives people more confidence in the robustness of the evaluation.

9. The Hybrid Strategy

The smartest setup isn't "Gemma 4 instead of Claude" — it's routing tasks intelligently between both. OpenClaw makes this explicit: the default model handles most work, and you override per-session or per-task when you need something specific.

Suggested routing logic:

Gemma 4 26B (local): Cron jobs, research summaries, multilingual content, image analysis, code review on structured diffs, any high-volume task where cost matters
Claude Sonnet 4.6 (API): Interactive sessions where response time matters, complex multi-step reasoning, anything involving high ambiguity or nuanced judgment
Gemma 4 E4B (edge): Fast pre-filtering, keyword classification, routing decisions, anything on a device without serious GPU memory
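The routing list above can be made concrete as a small dispatch table. The task-kind names and the mapping are illustrative — they mirror the suggestions here rather than any built-in OpenClaw feature:

```python
# Illustrative routing table mirroring the suggestions above.
ROUTES = {
    "cron":         "gemma4:26b-a4b-it-thinking",  # background, high-volume
    "multilingual": "gemma4:26b-a4b-it-thinking",
    "code_review":  "gemma4:26b-a4b-it-thinking",
    "interactive":  "anthropic/claude-sonnet-4-6", # latency and nuance matter
    "ambiguous":    "anthropic/claude-sonnet-4-6",
    "prefilter":    "gemma4:e4b-it-thinking",      # cheap edge classification
}

def pick_model(task_kind: str) -> str:
    """Fall back to the local default when a task kind isn't mapped."""
    return ROUTES.get(task_kind, "gemma4:26b-a4b-it-thinking")

print(pick_model("interactive"))  # anthropic/claude-sonnet-4-6
print(pick_model("cron"))         # gemma4:26b-a4b-it-thinking
print(pick_model("unknown"))      # falls back to the local default
```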

This hybrid approach is what the cost reduction guide points at — the biggest wins come from not using expensive cloud tokens for tasks that a local model handles correctly 95% of the time. Gemma 4's benchmark numbers suggest it can cover that 95% reliably for the structured, repeatable workloads that dominate most agent setups.

It dropped this morning. The quants are ready. The benchmarks are real. Pull the model, update your config, and see what actually happens with your existing automations — that's a more useful test than any benchmark anyway.
