vLLM can serve open-source (and some custom) models via an OpenAI-compatible HTTP API. OpenClaw connects to vLLM using the openai-completions API.
OpenClaw can also auto-discover available models from vLLM when you opt in with VLLM_API_KEY (any value works if your server does not enforce auth) and you do not define an explicit models.providers.vllm entry.
OpenClaw treats vllm as a local OpenAI-compatible provider that supports
streamed usage accounting, so status/context token counts can update from
stream_options.include_usage responses.
| Property | Value |
|---|---|
| Provider ID | vllm |
| API | openai-completions (OpenAI-compatible) |
| Auth | VLLM_API_KEY environment variable |
| Default base URL | http://127.0.0.1:8000/v1 |
Getting started
Start vLLM with an OpenAI-compatible server
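For example, vLLM's OpenAI-compatible server can be launched with the vllm serve entry point. The model id and flags here are illustrative; use the model and parser options your deployment needs:

```shell
# Illustrative: serve a model with vLLM's OpenAI-compatible server.
# The model id is an example; host/port match the default base URL below.
vllm serve Qwen/Qwen2.5-Coder-32B-Instruct \
  --host 127.0.0.1 \
  --port 8000
```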
Your base URL should expose /v1 endpoints (e.g. /v1/models, /v1/chat/completions). vLLM commonly runs on http://127.0.0.1:8000/v1.
Model discovery (implicit provider)
When VLLM_API_KEY is set (or an auth profile exists) and you do not define models.providers.vllm, OpenClaw queries the vLLM server for its available models.
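Assuming the standard OpenAI-compatible surface (the exact discovery call is an OpenClaw implementation detail), you can inspect what the server advertises yourself:

```shell
# Illustrative check of the OpenAI-compatible model listing.
# Any key value works if your server does not enforce auth.
curl -s -H "Authorization: Bearer $VLLM_API_KEY" \
  http://127.0.0.1:8000/v1/models
```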
If you set models.providers.vllm explicitly, auto-discovery is skipped and you must define models manually.
Explicit configuration (manual models)
Use explicit config when:
- vLLM runs on a different host or port
- You want to pin contextWindow or maxTokens values
- Your server requires a real API key (or you want to control headers)
- You connect to a trusted loopback, LAN, or Tailscale vLLM endpoint
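As a sketch only (baseUrl, contextWindow, and maxTokens are the fields named above; the surrounding model-entry shape is an assumption), an explicit provider entry might look like:

```jsonc
{
  "models": {
    "providers": {
      "vllm": {
        // Point at your vLLM server's OpenAI-compatible /v1 root.
        "baseUrl": "http://127.0.0.1:8000/v1",
        // Hypothetical model-entry shape: pin context/output limits per model.
        "models": [
          {
            "id": "Qwen/Qwen2.5-Coder-32B-Instruct",
            "contextWindow": 32768,
            "maxTokens": 8192
          }
        ]
      }
    }
  }
}
```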
Advanced configuration
Proxy-style behavior
vLLM is treated as a proxy-style OpenAI-compatible /v1 backend, not a native OpenAI endpoint. This means:

| Behavior | Applied? |
|---|---|
| Native OpenAI request shaping | No |
| service_tier | Not sent |
| Responses store | Not sent |
| Prompt-cache hints | Not sent |
| OpenAI reasoning-compat payload shaping | Not applied |
| Hidden OpenClaw attribution headers | Not injected on custom base URLs |
Qwen thinking controls
For Qwen models served through vLLM, set params.qwenThinkingFormat: "chat-template" on the model entry when the server expects Qwen chat-template kwargs. OpenClaw then maps /think off into the chat-template kwargs, and non-off thinking levels send enable_thinking: true. If your endpoint expects DashScope-style top-level flags instead, use params.qwenThinkingFormat: "top-level" to send enable_thinking at the request root. Snake-case params.qwen_thinking_format is also accepted.
Nemotron 3 thinking controls
vLLM/Nemotron 3 can use chat-template kwargs to control whether reasoning is
returned as hidden reasoning or visible answer text. When an OpenClaw session
uses vllm/nemotron-3-* with thinking off, the bundled vLLM plugin sends the
matching chat-template kwargs. To customize these values, set
chat_template_kwargs under the model params.
If you also set params.extra_body.chat_template_kwargs, that value has
final precedence because extra_body is the last request-body override.
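A sketch of such a per-model override (the kwarg name below is a placeholder; substitute the keys your Nemotron 3 chat template actually reads):

```jsonc
{
  "params": {
    // Placeholder kwargs: substitute the keys your chat template expects.
    "chat_template_kwargs": {
      "enable_thinking": false
    }
  }
}
```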
Qwen tool calls appear as text
Symptoms:
- skills or tools never run
- the assistant prints raw JSON/XML such as {"name":"read","arguments":...}
- vLLM returns an empty tool_calls array when OpenClaw sends tool_choice: "auto"

First make sure vLLM was started with the right tool-call parser and chat
template for the model. For example, vLLM documents hermes for Qwen2.5
models and qwen3_xml for Qwen3-Coder models.

If tool_choice: "auto" still fails, force the OpenAI-compatible request field
tool_choice: "required" on those model entries with params.extra_body.
Replace Qwen-Qwen2.5-Coder-32B-Instruct with the exact id returned by
openclaw models list --provider vllm. You can apply the same override from
the CLI.

This is an opt-in compatibility workaround. It makes every model turn with
tools require a tool call, so use it only for a dedicated local model entry
where that behavior is acceptable. Do not use it as a global default for all
vLLM models, and do not use a proxy that blindly converts arbitrary
assistant text into executable tool calls.
Custom base URL
If your vLLM server runs on a non-default host or port, set baseUrl in the explicit provider config.
Troubleshooting
Slow first response or remote server timeout
For large local models, remote LAN hosts, or tailnet links, set a
provider-scoped request timeout:
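A sketch, assuming timeoutSeconds sits directly on the provider entry (the placement and the 600-second value are illustrative):

```jsonc
{
  "models": {
    "providers": {
      "vllm": {
        // Generous timeout for large models or slow links; tune to your setup.
        "timeoutSeconds": 600
      }
    }
  }
}
```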
timeoutSeconds applies to vLLM model HTTP requests only, including
connection setup, response headers, body streaming, and the total
guarded-fetch abort. Prefer this before increasing
agents.defaults.timeoutSeconds, which controls the whole agent run.
Server not reachable
Check that the vLLM server is running and accessible. If you see a connection error, verify the host, port, and that vLLM started with the OpenAI-compatible server mode.
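A quick manual check, assuming the default host and port:

```shell
# Minimal reachability check; a JSON model list means the /v1 server is up.
curl -s http://127.0.0.1:8000/v1/models
```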
For explicit loopback, LAN, or Tailscale endpoints, also set
models.providers.vllm.request.allowPrivateNetwork: true; provider
requests block private-network URLs by default unless the provider is
explicitly trusted.
Auth errors on requests
If requests fail with auth errors, set a real VLLM_API_KEY that matches your server configuration, or configure the provider explicitly under models.providers.vllm.
No models discovered
Auto-discovery requires VLLM_API_KEY to be set and no explicit models.providers.vllm config entry. If you have defined the provider manually, OpenClaw skips discovery and uses only your declared models.
Tools render as raw text
If a Qwen model prints JSON/XML tool syntax instead of executing a skill,
check the Qwen guidance in Advanced configuration above. The usual fix is:
- start vLLM with the correct parser/template for that model
- confirm the exact model id with openclaw models list --provider vllm
- add a dedicated per-model params.extra_body.tool_choice: "required" override only if tool_choice: "auto" still returns empty or text-only tool calls
Related
Model selection
Choosing providers, model refs, and failover behavior.
OpenAI
Native OpenAI provider and OpenAI-compatible route behavior.
OAuth and auth
Auth details and credential reuse rules.
Troubleshooting
Common issues and how to resolve them.