OpenClaw integrates with Ollama’s native API (/api/chat) for hosted cloud models and local/self-hosted Ollama servers. You can use Ollama in three modes: Cloud + Local through a reachable Ollama host, Cloud only against https://ollama.com, or Local only against a reachable Ollama host.
Ollama provider config uses baseUrl as the canonical key. OpenClaw also accepts baseURL for compatibility with OpenAI SDK-style examples, but new config should prefer baseUrl.
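A minimal provider entry as a sketch, assuming a JSON5-style OpenClaw config; the host value is just an example:

```json5
{
  models: {
    providers: {
      ollama: {
        // baseUrl is the canonical key; baseURL is accepted for compatibility
        baseUrl: "http://127.0.0.1:11434",
      },
    },
  },
}
```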
Auth rules

Local and LAN hosts
Use the ollama-local marker only for loopback, private-network, .local, and bare-hostname Ollama base URLs.

Remote and Ollama Cloud hosts
Remote hosts (including https://ollama.com) require a real credential through OLLAMA_API_KEY, an auth profile, or the provider’s apiKey.

Custom provider ids
Custom providers with api: "ollama" follow the same rules. For example, an ollama-remote provider that points at a private LAN Ollama host can use apiKey: "ollama-local" and sub-agents will resolve that marker through the Ollama provider hook instead of treating it as a missing credential. Memory search can also set agents.defaults.memorySearch.provider to that custom provider id so embeddings use the matching Ollama endpoint.

Auth profiles
auth-profiles.json stores the credential for a provider id. Put endpoint settings (baseUrl, api, model ids, headers, timeouts) in models.providers.<id>. Older flat auth-profile files such as { "ollama-windows": { "apiKey": "ollama-local" } } are not a runtime format; run openclaw doctor --fix to rewrite them to the canonical ollama-windows:default API-key profile with a backup. baseUrl in that file is compatibility noise and should be moved to provider config.

Memory embedding scope
- A provider-level key is sent only to that provider’s Ollama host.
- An agents.*.memorySearch.remote.apiKey value is sent only to its remote embedding host.
- A pure OLLAMA_API_KEY env value is treated as the Ollama Cloud convention and is not sent to local or self-hosted hosts by default.
Getting started
Choose your preferred setup method and mode.
- Onboarding (recommended)
- Manual setup
Choose your mode
- Cloud + Local — local Ollama host plus cloud models routed through that host
- Cloud only — hosted Ollama models via https://ollama.com
- Local only — local models only
Select a model
Cloud only prompts for OLLAMA_API_KEY and suggests hosted cloud defaults. Cloud + Local and Local only ask for an Ollama base URL, discover available models, and auto-pull the selected local model if it is not available yet. When Ollama reports an installed :latest tag such as gemma4:latest, setup shows that installed model once instead of showing both gemma4 and gemma4:latest or pulling the bare alias again. Cloud + Local also checks whether that Ollama host is signed in for cloud access.

Non-interactive mode
Cloud models
- Cloud + Local
- Cloud only
- Local only
Cloud + Local uses a reachable Ollama host as the control point for both local and cloud models. This is Ollama’s preferred hybrid flow. Use Cloud + Local during setup: OpenClaw prompts for the Ollama base URL, discovers local models from that host, and checks whether the host is signed in for cloud access with ollama signin. When the host is signed in, OpenClaw also suggests hosted cloud defaults such as kimi-k2.5:cloud, minimax-m2.7:cloud, and glm-5.1:cloud. If the host is not signed in yet, OpenClaw keeps the setup local-only until you run ollama signin.

Model discovery (implicit provider)
When you set OLLAMA_API_KEY (or an auth profile) and do not define models.providers.ollama or another custom remote provider with api: "ollama", OpenClaw discovers models from the local Ollama instance at http://127.0.0.1:11434.
| Behavior | Detail |
|---|---|
| Catalog query | Queries /api/tags |
| Capability detection | Uses best-effort /api/show lookups to read contextWindow, expanded num_ctx Modelfile parameters, and capabilities including vision/tools |
| Vision models | Models with a vision capability reported by /api/show are marked as image-capable (input: ["text", "image"]), so OpenClaw auto-injects images into the prompt |
| Reasoning detection | Uses /api/show capabilities when available, including thinking; falls back to a model-name heuristic (r1, reasoning, think) when Ollama omits capabilities |
| Token limits | Sets maxTokens to the default Ollama max-token cap used by OpenClaw |
| Costs | Sets all costs to 0 |
A pulled local model appears as ollama/<pulled-model>:latest in local infer model run; OpenClaw resolves that installed model from Ollama’s live catalog without requiring a hand-written models.json entry.
You can run openclaw infer model run with a full Ollama model ref. For a quick vision probe, pass an image to infer model run; this sends the prompt and image directly to the selected Ollama vision model without loading chat tools, memory, or prior session context.
model run --file accepts files detected as image/*, including common PNG,
JPEG, and WebP inputs. Non-image files are rejected before Ollama is called.
For speech recognition, use openclaw infer audio transcribe instead.
When you switch a conversation with /model ollama/<model>, OpenClaw treats
that as an exact user selection. If the configured Ollama baseUrl is
unreachable, the next reply fails with the provider error instead of silently
answering from another configured fallback model.
Isolated cron jobs do one extra local safety check before they start the agent
turn. If the selected model resolves to a local, private-network, or .local
Ollama provider and /api/tags is unreachable, OpenClaw records that cron run
as skipped with the selected ollama/<model> in the error text. The endpoint
preflight is cached for 5 minutes, so multiple cron jobs pointed at the same
stopped Ollama daemon do not all launch failing model requests.
Live-verify the local text path, native stream path, and embeddings against
local Ollama with:
If you define models.providers.ollama explicitly, or configure a custom remote provider such as models.providers.ollama-cloud with api: "ollama", auto-discovery is skipped and you must define models manually. Loopback custom providers such as http://127.0.0.2:11434 are still treated as local. See the explicit config section below.
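For example, a sketch of a custom remote provider; the provider id, credential, and model ids are placeholders, and the exact shape of the models array may differ in your version:

```json5
{
  models: {
    providers: {
      "ollama-cloud": {
        api: "ollama",                  // use the native Ollama adapter
        baseUrl: "https://ollama.com",  // remote hosts need a real credential
        apiKey: "YOUR_OLLAMA_API_KEY",  // or OLLAMA_API_KEY / an auth profile
        models: [
          // auto-discovery is skipped for explicit providers, so list models here
          { id: "kimi-k2.5:cloud" },
        ],
      },
    },
  },
}
```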
Vision and image description

The bundled Ollama plugin registers Ollama as an image-capable media-understanding provider. This lets OpenClaw route explicit image-description requests and configured image-model defaults through local or hosted Ollama vision models. For local vision, pull a model that supports images. The --model flag for openclaw infer image describe must be a full <provider/model> ref; when it is set, openclaw infer image describe runs that model directly instead of skipping description because the model supports native vision.
Use infer image describe when you want OpenClaw’s image-understanding provider flow, configured agents.defaults.imageModel, and image-description output shape. Use infer model run --file when you want a raw multimodal model probe with a custom prompt and one or more images.
To make Ollama the default image-understanding model for inbound media, configure agents.defaults.imageModel:
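A sketch of that default, assuming imageModel takes a plain provider-prefixed model ref; the model id is an example used elsewhere on this page:

```json5
{
  agents: {
    defaults: {
      // full ollama/<model> ref; a bare id is only normalized when it is unambiguous
      imageModel: "ollama/qwen2.5vl:7b",
    },
  },
}
```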
Use a full ollama/<model> ref. If the same model is listed under models.providers.ollama.models with input: ["text", "image"] and no other configured image provider exposes that bare model ID, OpenClaw also normalizes a bare imageModel ref such as qwen2.5vl:7b to ollama/qwen2.5vl:7b. If more than one configured image provider has the same bare ID, use the provider prefix explicitly.
Slow local vision models can need a longer image-understanding timeout than cloud models. They can also crash or stop when Ollama tries to allocate the full advertised vision context on constrained hardware. Set a capability timeout, and cap num_ctx on the model entry when you only need a normal image-description turn. The capability timeout applies to the explicit image tool the agent can call during a turn; provider-level models.providers.ollama.timeoutSeconds still controls the underlying Ollama HTTP request guard for normal model calls.
Live-verify the explicit image tool against local Ollama with:
If you define models.providers.ollama.models manually, mark vision models with image input support:
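A sketch of a manually defined vision-capable entry; the model id and numbers are placeholders, and the exact field shape may differ from a generated config:

```json5
{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://127.0.0.1:11434",
        models: [
          {
            id: "qwen2.5vl:7b",
            input: ["text", "image"], // marks the model as image-capable
            contextWindow: 32768,
          },
        ],
      },
    },
  },
}
```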
This matches what auto-discovery does when /api/show reports a vision capability.
Configuration
- Basic (implicit discovery)
- Explicit (manual models)
- Custom base URL
Common recipes
Use these as starting points and replace model IDs with the exact names from ollama list or openclaw models list --provider ollama.
Local model with auto-discovery
No models.providers.ollama block is needed unless you want to define models manually.
LAN Ollama host with manual models
Use the native base URL without /v1. contextWindow is the OpenClaw-side context budget; params.num_ctx is sent to Ollama for the request. Keep them aligned when your hardware cannot run the model’s full advertised context.
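A sketch, assuming a LAN host at 192.168.1.20 and one manually defined model; the ids and sizes are placeholders:

```json5
{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://192.168.1.20:11434", // native API, no /v1 suffix
        models: [
          {
            id: "qwen3:32b",
            contextWindow: 16384,        // OpenClaw-side prompt/compaction budget
            maxTokens: 4096,
            params: { num_ctx: 16384 },  // sent to Ollama; keep aligned with contextWindow
          },
        ],
      },
    },
  },
}
```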
Ollama Cloud only
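The recipe body below is a sketch; the hosted model ids are examples from this page and the exact field shape may differ:

```json5
{
  models: {
    providers: {
      ollama: {
        baseUrl: "https://ollama.com",  // talk to the hosted API directly
        apiKey: "YOUR_OLLAMA_API_KEY",  // or OLLAMA_API_KEY / an auth profile
        models: [
          { id: "kimi-k2.5:cloud" },
          { id: "glm-5.1:cloud" },
        ],
      },
    },
  },
}
```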
Cloud plus local through a signed-in daemon
The host must be signed in with ollama signin and should serve both local models and :cloud models.
Multiple Ollama hosts
Each provider keeps its own prefix, so ollama-large/qwen3.5:27b reaches Ollama as qwen3.5:27b.
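A sketch with two providers; the hosts and model ids are placeholders:

```json5
{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://127.0.0.1:11434",
        models: [{ id: "qwen3:8b" }],
      },
      "ollama-large": {
        api: "ollama",                        // custom provider id, same adapter
        baseUrl: "http://192.168.1.40:11434",
        apiKey: "ollama-local",               // LAN host marker instead of a real key
        models: [{ id: "qwen3.5:27b" }],      // selected as ollama-large/qwen3.5:27b
      },
    },
  },
}
```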
Lean local model profile
Set compat.supportsTools: false only when the model or server reliably fails on tool schemas. It trades agent capability for stability.
localModelLean removes the browser, cron, and message tools from the agent surface, but it does not change Ollama’s runtime context or thinking mode. Pair it with explicit params.num_ctx and params.thinking: false for small Qwen-style thinking models that loop or spend their response budget on hidden reasoning.
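A sketch of a lean model entry with those knobs; the model id is a placeholder and the localModelLean wiring itself is not shown:

```json5
{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://127.0.0.1:11434",
        models: [
          {
            id: "qwen3:4b",
            compat: { supportsTools: false }, // only if tool schemas reliably fail
            params: {
              num_ctx: 8192,    // explicit runtime context
              thinking: false,  // disable API-level thinking for small thinking models
            },
          },
        ],
      },
    },
  },
}
```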
Model selection

Once configured, all your Ollama models are available. If you select a prefixed model such as ollama-spark/qwen3:32b, OpenClaw strips only that prefix before calling Ollama so the server receives qwen3:32b.
For slow local models, prefer provider-scoped request tuning before raising the
whole agent runtime timeout:
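A provider-scoped sketch; the values are placeholders:

```json5
{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://127.0.0.1:11434",
        timeoutSeconds: 600, // guards the whole model HTTP request for this provider
        models: [
          {
            id: "qwen3:32b",
            params: { keep_alive: "30m" }, // forwarded as top-level keep_alive on /api/chat
          },
        ],
      },
    },
  },
}
```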
timeoutSeconds applies to the model HTTP request, including connection setup,
headers, body streaming, and the total guarded-fetch abort. params.keep_alive
is forwarded to Ollama as top-level keep_alive on native /api/chat requests;
set it per model when first-turn load time is the bottleneck.
Quick verification
Replace 127.0.0.1 with the host used in baseUrl. If curl works but OpenClaw does not, check whether the Gateway runs on a different machine, container, or service account.
Ollama Web Search
OpenClaw supports Ollama Web Search as a bundled web_search provider.
| Property | Detail |
|---|---|
| Host | Uses your configured Ollama host (models.providers.ollama.baseUrl when set, otherwise http://127.0.0.1:11434); https://ollama.com uses the hosted API directly |
| Auth | Key-free for signed-in local Ollama hosts; OLLAMA_API_KEY or configured provider auth for direct https://ollama.com search or auth-protected hosts |
| Requirement | Local/self-hosted hosts must be running and signed in with ollama signin; direct hosted search requires baseUrl: "https://ollama.com" plus a real Ollama API key |
Enable it with openclaw onboard or openclaw configure --section web, or set the web search provider directly in your config.
Local and self-hosted hosts are queried through the daemon’s /api/experimental/web_search proxy. For https://ollama.com, it calls the hosted /api/web_search endpoint directly.
Advanced configuration
Legacy OpenAI-compatible mode
api: "openai-completions" explicitly:params: { streaming: false } in model config.When api: "openai-completions" is used with Ollama, OpenClaw injects options.num_ctx by default so Ollama does not silently fall back to a 4096 context window. If your proxy/upstream rejects unknown options fields, disable this behavior:Context windows
Context windows
Auto-discovery reads expanded PARAMETER num_ctx values from custom Modelfiles. Otherwise it falls back to the default Ollama context window used by OpenClaw.

You can set provider-level contextWindow, contextTokens, and maxTokens defaults for every model under that Ollama provider, then override them per model when needed. contextWindow is OpenClaw’s prompt and compaction budget. Native Ollama requests leave options.num_ctx unset unless you explicitly configure params.num_ctx, so Ollama can apply its own model, OLLAMA_CONTEXT_LENGTH, or VRAM-based default. To cap or force Ollama’s per-request runtime context without rebuilding a Modelfile, set params.num_ctx; invalid, zero, negative, and non-finite values are ignored. The OpenAI-compatible Ollama adapter still injects options.num_ctx by default from the configured params.num_ctx or contextWindow; disable that with injectNumCtxForOpenAICompat: false if your upstream rejects options.

Native Ollama model entries also accept the common Ollama runtime options under params, including temperature, top_p, top_k, min_p, num_predict, stop, repeat_penalty, num_batch, num_thread, and use_mmap. OpenClaw forwards only Ollama request keys, so OpenClaw runtime params such as streaming are not leaked to Ollama. Use params.think or params.thinking to send top-level Ollama think; false disables API-level thinking for Qwen-style thinking models.

Setting agents.defaults.models["ollama/<model>"].params.num_ctx works too. If both are configured, the explicit provider model entry wins over the agent default.
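A sketch combining a provider-level default with a per-model override; the numbers are placeholders:

```json5
{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://127.0.0.1:11434",
        contextWindow: 32768, // default OpenClaw budget for models under this provider
        maxTokens: 8192,
        models: [
          {
            id: "qwen3:32b",
            contextWindow: 16384, // per-model override of the budget
            params: { num_ctx: 16384, temperature: 0.7 }, // forwarded to Ollama
          },
        ],
      },
    },
  },
}
```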
Thinking control
Ollama thinking control uses top-level think, not options.think. Auto-discovered models whose /api/show response includes the thinking capability expose /think low, /think medium, /think high, and /think max; non-thinking models expose only /think off. params.think or params.thinking can disable or force Ollama API thinking for a specific configured model. OpenClaw preserves those explicit model params when the active run only has the implicit default off; non-off runtime commands such as /think medium still override the active run.
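A sketch that pins API-level thinking off for one model; the model id is a placeholder:

```json5
{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://127.0.0.1:11434",
        models: [
          {
            id: "qwen3:4b",
            // sent as top-level think: false on native /api/chat requests
            params: { think: false },
          },
        ],
      },
    },
  },
}
```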
Reasoning models
Model names that include deepseek-r1, reasoning, or think are marked as reasoning-capable by default.
Model costs
Memory embeddings
Memory embeddings use Ollama’s native /api/embed endpoint, and OpenClaw batches multiple memory chunks into one input request when possible.

| Property | Value |
|---|---|
| Default model | nomic-embed-text |
| Auto-pull | Yes — the embedding model is pulled automatically if not present locally |
Known-good embedding models include nomic-embed-text, qwen3-embedding, and mxbai-embed-large. Memory document batches stay raw so existing indexes do not need a format migration. To select Ollama as the memory search embedding provider:
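A sketch of that selection, assuming the provider id matches your Ollama provider entry:

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "ollama", // embeddings use this provider's Ollama endpoint
      },
    },
  },
}
```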
Streaming configuration
OpenClaw talks to Ollama’s native chat endpoint (/api/chat) by default, which fully supports streaming and tool calling simultaneously. No special configuration is needed.

For native /api/chat requests, OpenClaw also forwards thinking control directly to Ollama: /think off and openclaw agent --thinking off send top-level think: false unless an explicit model params.think/params.thinking value is configured, while /think low|medium|high send the matching top-level think effort string. /think max maps to Ollama’s highest native effort, think: "high".

Troubleshooting
WSL2 crash loop (repeated reboots)
A typical Linux Ollama install registers an ollama.service systemd unit with Restart=always. If that service autostarts and loads a GPU-backed model during WSL2 boot, Ollama can pin host memory while the model loads. Hyper-V memory reclaim cannot always reclaim those pinned pages, so Windows can terminate the WSL2 VM, systemd starts Ollama again, and the loop repeats.

Common evidence:
- repeated WSL2 reboots or terminations from the Windows side
- high CPU in app.slice or ollama.service shortly after WSL2 startup
- SIGTERM from systemd rather than a Linux OOM-killer event
- ollama.service enabled with Restart=always, and visible CUDA markers

Mitigation: adjust %USERPROFILE%\.wslconfig on the Windows side, then run wsl --shutdown.
Ollama not detected
Confirm that you set OLLAMA_API_KEY (or an auth profile), and that you did not define an explicit models.providers.ollama entry.
No models available
Pull at least one model with ollama pull, or define models manually under models.providers.ollama.
Connection refused
Remote host works with curl but not OpenClaw
Common causes:
- baseUrl points at localhost, but the Gateway runs in Docker or on another host.
- The URL uses /v1, which selects OpenAI-compatible behavior instead of native Ollama.
- The remote host needs firewall or LAN binding changes on the Ollama side.
- The model is present on your laptop’s daemon but not on the remote daemon.
Model outputs tool JSON as text
Set compat.supportsTools: false on that model entry and retest.
Kimi or GLM returns garbled symbols
Switch to Cloud + Local or Cloud only, then try a fresh session and a fallback model.
Cold local model times out
Raise timeoutSeconds on the Ollama provider; timeoutSeconds also extends the guarded Undici connect timeout for this provider.
Large-context model is too slow or runs out of memory
Lower contextWindow and params.num_ctx. Cap both OpenClaw’s budget and Ollama’s request context when you want predictable first-token latency. Lower contextWindow first if OpenClaw is sending too much prompt. Lower params.num_ctx if Ollama is loading a runtime context that is too large for the machine. Lower maxTokens if generation runs too long.