Providers

Inferrs

Edit source

inferrs can serve local models behind an OpenAI-compatible /v1 API. OpenClaw works with inferrs through the generic openai-completions path.

Property Value
Provider id inferrs (custom; configure under models.providers.inferrs)
Plugin none — inferrs is not a bundled OpenClaw provider plugin
Auth env var Optional. Any value works if your inferrs server has no auth
API OpenAI-compatible (openai-completions)
Suggested base URL http://127.0.0.1:8080/v1 (or wherever your inferrs server lives)

Getting started

  • Start inferrs with a model

    bash
    inferrs serve google/gemma-4-E2B-it \  --host 127.0.0.1 \  --port 8080 \  --device metal
  • Verify the server is reachable

    bash
    curl http://127.0.0.1:8080/healthcurl http://127.0.0.1:8080/v1/models
  • Add an OpenClaw provider entry

    Add an explicit provider entry and point your default model at it. See the full config example below.

  • Full config example

    This example uses Gemma 4 on a local inferrs server.

    json5
    {  agents: {    defaults: {      model: { primary: "inferrs/google/gemma-4-E2B-it" },      models: {        "inferrs/google/gemma-4-E2B-it": {          alias: "Gemma 4 (inferrs)",        },      },    },  },  models: {    mode: "merge",    providers: {      inferrs: {        baseUrl: "http://127.0.0.1:8080/v1",        apiKey: "inferrs-local",        api: "openai-completions",        models: [          {            id: "google/gemma-4-E2B-it",            name: "Gemma 4 E2B (inferrs)",            reasoning: false,            input: ["text"],            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },            contextWindow: 131072,            maxTokens: 4096,            compat: {              requiresStringContent: true,            },          },        ],      },    },  },}

    On-demand startup

    Inferrs can also be started by OpenClaw only when an inferrs/... model is selected. Add localService to the same provider entry:

    json5
    {  models: {    providers: {      inferrs: {        baseUrl: "http://127.0.0.1:8080/v1",        apiKey: "inferrs-local",        api: "openai-completions",        timeoutSeconds: 300,        localService: {          command: "/opt/homebrew/bin/inferrs",          args: [            "serve",            "google/gemma-4-E2B-it",            "--host",            "127.0.0.1",            "--port",            "8080",            "--device",            "metal",          ],          healthUrl: "http://127.0.0.1:8080/v1/models",          readyTimeoutMs: 180000,          idleStopMs: 0,        },        models: [          {            id: "google/gemma-4-E2B-it",            name: "Gemma 4 E2B (inferrs)",            reasoning: false,            input: ["text"],            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },            contextWindow: 131072,            maxTokens: 4096,            compat: {              requiresStringContent: true,            },          },        ],      },    },  },}

    command must be absolute. Use which inferrs on the Gateway host and put that path in config. For the full field reference, see Local model services.

    Advanced configuration

    Why requiresStringContent matters

    Some inferrs Chat Completions routes accept only string messages[].content, not structured content-part arrays.

    json5
    compat: {  requiresStringContent: true}

    OpenClaw will flatten pure text content parts into plain strings before sending the request.

    Gemma and tool-schema caveat

    Some current inferrs + Gemma combinations accept small direct /v1/chat/completions requests but still fail on full OpenClaw agent-runtime turns.

    If that happens, try this first:

    json5
    compat: {  requiresStringContent: true,  supportsTools: false}

    That disables OpenClaw's tool schema surface for the model and can reduce prompt pressure on stricter local backends.

    If tiny direct requests still work but normal OpenClaw agent turns continue to crash inside inferrs, the remaining issue is usually upstream model/server behavior rather than OpenClaw's transport layer.

    Manual smoke test

    Once configured, test both layers:

    bash
    curl http://127.0.0.1:8080/v1/chat/completions \  -H 'content-type: application/json' \  -d '{"model":"google/gemma-4-E2B-it","messages":[{"role":"user","content":"What is 2 + 2?"}],"stream":false}'
    bash
    openclaw infer model run \  --model inferrs/google/gemma-4-E2B-it \  --prompt "What is 2 + 2? Reply with one short sentence." \  --json

    If the first command works but the second fails, check the troubleshooting section below.

    Proxy-style behavior

    inferrs is treated as a proxy-style OpenAI-compatible /v1 backend, not a native OpenAI endpoint.

    • Native OpenAI-only request shaping does not apply here
    • No service_tier, no Responses store, no prompt-cache hints, and no OpenAI reasoning-compat payload shaping
    • Hidden OpenClaw attribution headers (originator, version, User-Agent) are not injected on custom inferrs base URLs

    Troubleshooting

    curl /v1/models fails

    inferrs is not running, not reachable, or not bound to the expected host/port. Make sure the server is started and listening on the address you configured.

    messages[].content expected a string

    Set compat.requiresStringContent: true in the model entry. See the requiresStringContent section above for details.

    Direct /v1/chat/completions calls pass but openclaw infer model run fails

    Try setting compat.supportsTools: false to disable the tool schema surface. See the Gemma tool-schema caveat above.

    inferrs still crashes on larger agent turns

    If OpenClaw no longer gets schema errors but inferrs still crashes on larger agent turns, treat it as an upstream inferrs or model limitation. Reduce prompt pressure or switch to a different local backend or model.

    Was this useful?