Run OpenClaw with Ollama: Fully Offline AI on Your Own Hardware
No API keys. No data leaving your machine. No monthly bill. Here's how to wire OpenClaw into Ollama and run a capable AI agent that works even when the internet is down.
Something shifted in 2026. Running a capable large language model locally stopped being a nerd experiment and became genuinely practical. Models got smaller and smarter. Hardware got cheaper. And tools like Ollama made spinning up a local LLM about as hard as installing a Node package.
If you're already running OpenClaw, your personal AI agent, the natural next question is: can I cut out the API entirely? Run everything on my own machine, keep my data private, pay zero recurring fees?
The answer is yes. And this guide shows you exactly how.
Why Run Fully Offline?
Before we get into setup, let's be honest about the tradeoffs. Running locally isn't strictly better; it's a different profile. Here's when it makes sense:
Privacy First
Your prompts never leave the machine. Medical notes, legal documents, personal journals: none of it touches a cloud API. Zero telemetry.
Cost at Scale
Cloud LLMs charge per token. If you run thousands of cron jobs or heavy automation, local inference costs nothing beyond electricity once the hardware is paid for.
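As a rough sanity check, you can estimate the break-even point in a couple of lines of shell. The numbers here are hypothetical placeholders (a $600 machine, cloud pricing of roughly $3 per million tokens); plug in your own:

```shell
# Hypothetical break-even: how many tokens before local hardware pays for itself
hardware_cost=600        # assumed one-time hardware spend, USD
cloud_price_per_m=3      # assumed cloud price, USD per 1M tokens
breakeven_m=$((hardware_cost / cloud_price_per_m))
echo "Break-even after ~${breakeven_m}M tokens of cloud-equivalent usage"
```

At heavy automation volumes that threshold arrives surprisingly quickly; at light chat volumes it may never arrive, which is the honest version of "cost at scale."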
True Independence
API outages, model deprecations, pricing changes: none of that affects you. Your agent works on a plane, in a cabin, on an air-gapped network.
Honest caveat: Local models are still not as capable as Claude Sonnet or GPT-4o for complex reasoning tasks. For most daily automations, cron jobs, and information retrieval? Totally fine. For advanced code review, multi-step reasoning chains, or nuanced writing? You'll notice the gap. See the Hybrid Mode section for the best-of-both-worlds approach.
Hardware Requirements
Ollama runs on Mac (Apple Silicon and Intel), Linux, and Windows. The main constraint is RAM: the model has to fit in memory.
| RAM | Recommended Models | Use Case | Speed |
|---|---|---|---|
| 8 GB | Llama 3.2 3B, Phi-3 Mini | Light tasks, quick lookups | Fast |
| 16 GB | Llama 3.1 8B, Mistral 7B, Qwen 2.5 7B | Most daily automations | Good |
| 32 GB | Llama 3.1 70B (Q4), DeepSeek R1 32B | Complex reasoning, coding | Moderate |
| 64 GB+ | Llama 3.3 70B, Qwen 2.5 72B | Near-frontier quality | Moderate |
Apple Silicon (M1/M2/M3/M4) has a significant advantage here: unified memory means the GPU and CPU share RAM, so a 32 GB Mac mini runs 70B models better than a 32 GB Linux box with a discrete GPU. Running on a Mac mini? You're well positioned.
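A quick back-of-the-envelope shows why the RAM tiers above look the way they do: weight memory is roughly parameter count times bits per weight, divided by 8. A sketch, ignoring KV cache and runtime overhead:

```shell
# Rough weight footprint: params (billions) x bits per weight / 8 = GB
params_b=70    # a 70B-parameter model
bits=4         # Q4 quantization (~4 bits per weight)
gb=$((params_b * bits / 8))
echo "~${gb} GB of RAM for weights alone (KV cache and overhead come on top)"
```

That is why a Q4 70B model sits in the 32 GB tier with little headroom, while the same model at FP16 would need well over 64 GB.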
Install Ollama
Ollama is one command on macOS and Linux. It exposes a local HTTP server on port 11434 that speaks the OpenAI Chat Completions API format, which is exactly what OpenClaw can point to.
macOS / Linux
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model (Llama 3.1 8B is a solid starting point)
ollama pull llama3.1:8b

# Verify it's running
curl http://localhost:11434/api/tags
Test it works
# Quick inference test
ollama run llama3.1:8b "What is the capital of France?"
# Or via HTTP (OpenAI-compatible)
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.1:8b",
"messages": [{"role": "user", "content": "Hello!"}]
  }'
If you see a JSON response with a message, Ollama is running. On macOS, Ollama also runs as a menubar app and starts automatically on login, so there's no daemon management needed.
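If you'd rather script the check, here's a sketch that pulls the assistant's reply out of the OpenAI-style response shape. It uses a canned sample payload rather than a live call, so it runs even without Ollama up; swap in the real `curl` output in practice:

```shell
# Canned sample of an OpenAI-compatible response body (not a live call)
response='{"choices":[{"message":{"role":"assistant","content":"Hi there!"}}]}'

# Extract choices[0].message.content with python3 (avoids a jq dependency)
reply=$(echo "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])')
echo "$reply"
```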
Pick Your Local Model
Model choice matters more locally than in the cloud, because you're constrained by RAM and inference speed. My practical recommendations for OpenClaw use cases:
Llama 3.1 8B: Best All-Rounder
Recommended. Fast and capable, with excellent instruction following. Handles cron automation, file operations, web summaries, and basic coding. Runs well on 16 GB RAM.
ollama pull llama3.1:8b
Qwen 2.5 7B: Best for Coding
Code tasks. Alibaba's Qwen 2.5 punches well above its size for coding tasks. If your OpenClaw automations involve a lot of code generation or file manipulation, this is the one.
ollama pull qwen2.5:7b
DeepSeek R1 32B: Best for Reasoning
Needs 32 GB RAM. DeepSeek's R1 architecture adds chain-of-thought reasoning. If you have 32+ GB and want local reasoning quality approaching frontier models, this is the pick for complex automations.
ollama pull deepseek-r1:32b
Configure OpenClaw to Use Ollama
OpenClaw supports custom model providers. Since Ollama speaks the OpenAI API format, you can point OpenClaw directly at your local Ollama instance.
Open your OpenClaw config (usually ~/.openclaw/config.yaml) and add or update the model provider section:
~/.openclaw/config.yaml
# Point OpenClaw at your local Ollama instance
model:
  provider: openai-compatible
  baseUrl: http://localhost:11434/v1
  apiKey: ollama            # Ollama doesn't need a real key
  default: llama3.1:8b      # Your primary local model
  # Optional: set a fallback to Claude for complex tasks
  fallback:
    provider: anthropic
    model: claude-sonnet-4-6
    triggerOnError: true    # Falls back if local model fails

After saving the config, restart OpenClaw:
openclaw gateway restart
Send a test message via Telegram (or whatever channel you use) and you should see responses coming from your local model. Check the logs to confirm:
# Tail OpenClaw logs to confirm the local model is being used
# (-E enables extended regex so | works as alternation)
tail -f ~/.openclaw/logs/gateway.log | grep -iE "model|ollama"
Hybrid Mode: Local + Cloud
Fully offline is great for privacy-sensitive tasks. But for heavy lifting (complex analysis, writing full blog posts, multi-step reasoning) you probably still want access to frontier models. The smart approach is hybrid mode: local by default, cloud when needed.
You can achieve this in OpenClaw using per-session model overrides. From your Telegram chat, you can switch models on the fly:
Model switching commands
# Switch to local Llama for the session
/model ollama/llama3.1:8b

# Switch to Claude for a complex task
/model anthropic/claude-sonnet-4-6

# Reset to default (whatever config.yaml says)
/model default
For cron jobs specifically, you can set the model per-job in the cron definition. Route lightweight recurring tasks (daily briefings, feed summaries, simple data transforms) to the local model, and save the cloud API for jobs that genuinely need it.
Cron job with local model
# In your cron job payload config
payload:
  kind: agentTurn
  message: "Summarize today's news from my feeds"
  model: ollama/llama3.1:8b   # local, free
  timeoutSeconds: 120
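If you want routing decided automatically rather than hand-picked per job, a keyword heuristic is one cheap option. This is a hypothetical sketch, not an OpenClaw feature: `route_model` and its keyword list are invented here for illustration.

```shell
# Hypothetical router: keyword-match "heavy" prompts to cloud, default to local
route_model() {
  case "$1" in
    *analyze*|*review*|*draft*) echo "anthropic/claude-sonnet-4-6" ;;  # heavy lifting
    *)                          echo "ollama/llama3.1:8b" ;;           # cheap local default
  esac
}

route_model "Summarize today's news from my feeds"   # routes to the local model
route_model "analyze this quarter's spending data"   # routes to the cloud model
```

A keyword list is crude; in practice you'd tune it against the prompts your cron jobs actually send, and err toward local when in doubt since the fallback config can still rescue failures.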
Performance Tips
Local inference is slower than cloud APIs, especially on consumer hardware. A few things that make a real difference:
Use quantized models (Q4_K_M)
Most Ollama models come in quantized variants. Q4_K_M is the sweet spot: 4-bit quantization with minimal quality loss at roughly a quarter of the FP16 footprint.
ollama pull llama3.1:8b-instruct-q4_K_M
Keep Ollama warm
Ollama unloads models from memory after a timeout. For responsiveness, set a long keep-alive so the model stays loaded between requests.
OLLAMA_KEEP_ALIVE=24h ollama serve
Tune context window
Larger context windows use more memory. For simple tasks, reducing from 128K to 8K context cuts memory usage significantly.
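To see why context length costs memory, here's a rough KV-cache estimate. It assumes full multi-head attention with hypothetical 8B-class dimensions (32 layers, 4096 hidden size, FP16 cache); grouped-query attention, which most modern models use, shrinks this several-fold, so treat it as an upper-bound sketch:

```shell
# Rough KV-cache upper bound: 2 (K and V) x layers x context x hidden x bytes
layers=32      # hypothetical 8B-class layer count
ctx=8192       # context window in tokens
hidden=4096    # hypothetical hidden size (full MHA, no GQA)
bytes=2        # FP16 cache entries
kv_bytes=$((2 * layers * ctx * hidden * bytes))
kv_mb=$((kv_bytes / 1024 / 1024))
echo "KV cache up to ~${kv_mb} MB at ${ctx}-token context"
```

Quadruple the context and this grows linearly with it, which is why dropping from 128K to 8K frees so much memory on constrained machines.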
OLLAMA_NUM_CTX=8192 ollama serve
Use GPU if available
On Apple Silicon, Ollama automatically uses the Metal GPU. On Linux, ensure CUDA or ROCm drivers are installed; GPU inference is a 5-10x speedup over CPU.
ollama run llama3.1:8b --verbose   # shows GPU layers
Real-World Benchmarks
To give you concrete numbers, here's what I benchmarked running Llama 3.1 8B via Ollama on a Mac mini M4 Pro (24 GB unified memory), a typical OpenClaw setup:
| Task | Llama 3.1 8B (local) | Claude Sonnet 4.6 (cloud) | Winner |
|---|---|---|---|
| First token latency | ~0.8s | ~1.2s | Local |
| Summarize 500-word article | ~6s | ~4s | Cloud |
| Write a cron job config | ~8s | ~5s | Cloud |
| Answer factual question | ~3s | ~3s | Tie |
| Complex data analysis | ~45s (degraded) | ~15s | Cloud |
| Cost per 1M tokens | $0.00 | ~$3.00 | Local |
Mac mini M4 Pro, 24 GB unified memory. Results will vary with different hardware and quantization levels.
Limitations to Know
I'd be doing you a disservice if I didn't call these out clearly:
- Tool use (function calling) quality varies a lot between local models. Llama 3.1 handles it reasonably well, but smaller models like Phi-3 Mini can be unreliable. Test your automations thoroughly before relying on them.
- Context window management is trickier locally. OpenClaw's LCM (context compression) helps, but large context loads hit local models harder than cloud APIs.
- Model updates require manual action. When a new model version ships, you pull it yourself; there's no automatic upgrade, so you can fall behind frontier capability over time.
- Your machine has to be on. Cloud OpenClaw on a VPS runs 24/7 without you. Local OpenClaw depends on your hardware being awake: fine for a home server, less ideal for on-the-go mobile use.
For most people, the answer is hybrid: local Ollama for privacy-sensitive and high-volume tasks, cloud API for the complex stuff. It's not either/or. See the full cloud vs self-hosted breakdown for more on this decision.
Getting Started
Here's the five-minute path to a working local setup:
1. Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
2. Pull a model: ollama pull llama3.1:8b
3. Verify it's running: curl http://localhost:11434/api/tags
4. Update your OpenClaw config to point at http://localhost:11434/v1
5. Restart OpenClaw: openclaw gateway restart
6. Send a test message; you should see local inference in action.
If you don't have OpenClaw set up yet, the complete setup guide covers everything from scratch. Once you have it running, layering in Ollama takes under ten minutes.
The direction is clear: AI that runs locally, privately, and without a monthly bill is no longer a science project. It's a production-grade option. The question isn't whether you should explore it, but which workloads to run where. Start local, expand to cloud where it matters, and build the hybrid stack that works for you.