

The Google plugin provides access to Gemini models through Google AI Studio, plus image generation, media understanding (image/audio/video), text-to-speech, and web search via Gemini Grounding.
  • Provider: google
  • Auth: GEMINI_API_KEY or GOOGLE_API_KEY
  • API: Google Gemini API
  • Runtime option: agents.defaults.agentRuntime.id: "google-gemini-cli" reuses Gemini CLI OAuth while keeping model refs canonical as google/*.
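For example, to route default agent turns through the Gemini CLI runtime while keeping canonical google/* model refs (a config sketch in the same shape as the examples below):

```
{
  agents: {
    defaults: {
      agentRuntime: { id: "google-gemini-cli" },
      model: { primary: "google/gemini-3.1-pro-preview" },
    },
  },
}
```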

Getting started

Choose your preferred auth method and follow the setup steps.
Best for: standard Gemini API access through Google AI Studio.
1. Run onboarding

openclaw onboard --auth-choice gemini-api-key
Or pass the key directly:
openclaw onboard --non-interactive \
  --mode local \
  --auth-choice gemini-api-key \
  --gemini-api-key "$GEMINI_API_KEY"
2. Set a default model

{
  agents: {
    defaults: {
      model: { primary: "google/gemini-3.1-pro-preview" },
    },
  },
}
3. Verify the model is available

openclaw models list --provider google
The environment variables GEMINI_API_KEY and GOOGLE_API_KEY are both accepted. Use whichever you already have configured.

Capabilities

Capability | Supported
Chat completions | Yes
Image generation | Yes
Music generation | Yes
Text-to-speech | Yes
Realtime voice | Yes (Google Live API)
Image understanding | Yes
Audio transcription | Yes
Video understanding | Yes
Web search (Grounding) | Yes
Thinking/reasoning | Yes (Gemini 2.5+ / Gemini 3+)
Gemma 4 models | Yes
The bundled gemini web-search provider uses Gemini's Google Search grounding. Configure a dedicated search key under plugins.entries.google.config.webSearch, or let it fall back to GEMINI_API_KEY and then models.providers.google.apiKey:
{
  plugins: {
    entries: {
      google: {
        config: {
          webSearch: {
            apiKey: "AIza...", // optional if GEMINI_API_KEY or models.providers.google.apiKey is set
            baseUrl: "https://generativelanguage.googleapis.com/v1beta", // falls back to models.providers.google.baseUrl
            model: "gemini-2.5-flash",
          },
        },
      },
    },
  },
}
Credential precedence is dedicated webSearch.apiKey, then GEMINI_API_KEY, then models.providers.google.apiKey. webSearch.baseUrl is optional and exists for operator proxies or compatible Gemini API endpoints; when omitted, Gemini web search reuses models.providers.google.baseUrl. See Gemini search for the provider-specific tool behavior.
Gemini 3 models use thinkingLevel rather than thinkingBudget.
  • OpenClaw maps Gemini 3, Gemini 3.1, and gemini-*-latest alias reasoning controls to thinkingLevel, so default/low-latency runs do not send disabled thinkingBudget values.
  • /think adaptive keeps Google’s dynamic thinking semantics instead of choosing a fixed OpenClaw level.
  • Gemini 3 and Gemini 3.1 omit a fixed thinkingLevel so Google can choose the level; Gemini 2.5 sends Google’s dynamic sentinel thinkingBudget: -1.
  • Gemma 4 models (for example gemma-4-26b-a4b-it) support thinking mode. OpenClaw rewrites thinkingBudget to a supported Google thinkingLevel for Gemma 4. Setting thinking to off keeps thinking disabled instead of mapping it to MINIMAL.

Image generation

The bundled google image-generation provider defaults to google/gemini-3.1-flash-image-preview.
  • Also supports google/gemini-3-pro-image-preview
  • Generate: up to 4 images per request
  • Edit mode: enabled, up to 5 input images
  • Geometry controls: size, aspectRatio, and resolution
To use Google as the default image provider:
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "google/gemini-3.1-flash-image-preview",
      },
    },
  },
}
See Image Generation for shared tool parameters, provider selection, and failover behavior.

Video generation

The bundled google plugin also registers video generation through the shared video_generate tool.
  • Default video model: google/veo-3.1-fast-generate-preview
  • Modes: text-to-video, image-to-video, and single-video reference flows
  • Supports aspectRatio, resolution, and audio
  • Current duration clamp: 4 to 8 seconds
To use Google as the default video provider:
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "google/veo-3.1-fast-generate-preview",
      },
    },
  },
}
See Video Generation for shared tool parameters, provider selection, and failover behavior.

Music generation

The bundled google plugin also registers music generation through the shared music_generate tool.
  • Default music model: google/lyria-3-clip-preview
  • Also supports google/lyria-3-pro-preview
  • Prompt controls: lyrics and instrumental
  • Output format: mp3 by default, plus wav on google/lyria-3-pro-preview
  • Reference inputs: up to 10 images
  • Session-backed runs detach through the shared task/status flow, including action: "status"
To use Google as the default music provider:
{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
      },
    },
  },
}
See Music Generation for shared tool parameters, provider selection, and failover behavior.

Text-to-speech

The bundled google speech provider uses the Gemini API TTS path with gemini-3.1-flash-tts-preview.
  • Default voice: Kore
  • Auth: messages.tts.providers.google.apiKey, models.providers.google.apiKey, GEMINI_API_KEY, or GOOGLE_API_KEY
  • Output: WAV for regular TTS attachments, Opus for voice-note targets, PCM for Talk/telephony
  • Voice-note output: Google PCM is wrapped as WAV and transcoded to 48 kHz Opus with ffmpeg
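The PCM-to-WAV wrapping step can be illustrated with Python's stdlib wave module (an illustrative sketch, not OpenClaw's actual code; the Opus transcode itself is delegated to ffmpeg, and the 24 kHz default here is an assumption about the upstream PCM rate):

```python
import io
import wave

def wrap_pcm_as_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```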
To use Google as the default TTS provider:
{
  messages: {
    tts: {
      auto: "always",
      provider: "google",
      providers: {
        google: {
          model: "gemini-3.1-flash-tts-preview",
          voiceName: "Kore",
          audioProfile: "Speak professionally with a calm tone.",
        },
      },
    },
  },
}
Gemini API TTS uses natural-language prompting for style control. Set audioProfile to prepend a reusable style prompt before the spoken text. Set speakerName when your prompt text refers to a named speaker. Gemini API TTS also accepts expressive square-bracket audio tags in the text, such as [whispers] or [laughs]. To keep tags out of the visible chat reply while sending them to TTS, put them inside a [[tts:text]]...[[/tts:text]] block:
Here is the clean reply text.

[[tts:text]][whispers] Here is the spoken version.[[/tts:text]]
A Google Cloud Console API key restricted to the Gemini API is valid for this provider. This is not the separate Cloud Text-to-Speech API path.
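The split between the visible reply and the spoken text can be sketched with a regex over the tag syntax shown above (hypothetical helper; not the plugin's actual parser):

```python
import re

_TTS_BLOCK = re.compile(r"\[\[tts:text\]\](.*?)\[\[/tts:text\]\]", re.DOTALL)

def split_tts(reply: str) -> tuple[str, str]:
    """Return (visible_text, spoken_text). If [[tts:text]] blocks are present,
    they are removed from the visible reply and concatenated as the spoken text;
    otherwise the visible reply is spoken as-is."""
    blocks = _TTS_BLOCK.findall(reply)
    visible = _TTS_BLOCK.sub("", reply).strip()
    spoken = " ".join(b.strip() for b in blocks) if blocks else visible
    return visible, spoken
```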

Realtime voice

The bundled google plugin registers a realtime voice provider backed by the Gemini Live API for backend audio bridges such as Voice Call and Google Meet.
Setting | Config path | Default
Model | plugins.entries.voice-call.config.realtime.providers.google.model | gemini-2.5-flash-native-audio-preview-12-2025
Voice | ...google.voice | Kore
Temperature | ...google.temperature | (unset)
VAD start sensitivity | ...google.startSensitivity | (unset)
VAD end sensitivity | ...google.endSensitivity | (unset)
Silence duration | ...google.silenceDurationMs | (unset)
Activity handling | ...google.activityHandling | Google default, start-of-activity-interrupts
Turn coverage | ...google.turnCoverage | Google default, only-activity
Disable auto VAD | ...google.automaticActivityDetectionDisabled | false
API key | ...google.apiKey | Falls back to models.providers.google.apiKey, GEMINI_API_KEY, or GOOGLE_API_KEY
Example Voice Call realtime config:
{
  plugins: {
    entries: {
      "voice-call": {
        enabled: true,
        config: {
          realtime: {
            enabled: true,
            provider: "google",
            providers: {
              google: {
                model: "gemini-2.5-flash-native-audio-preview-12-2025",
                voice: "Kore",
                activityHandling: "start-of-activity-interrupts",
                turnCoverage: "only-activity",
              },
            },
          },
        },
      },
    },
  },
}
Google Live API uses bidirectional audio and function calling over a WebSocket. OpenClaw adapts telephony/Meet bridge audio to Gemini’s PCM Live API stream and keeps tool calls on the shared realtime voice contract. Leave temperature unset unless you need sampling changes; OpenClaw omits non-positive values because Google Live can return transcripts without audio for temperature: 0. Gemini API transcription is enabled without languageCodes; the current Google SDK rejects language-code hints on this API path.
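The bridge-audio adaptation can be illustrated with a naive 2x upsampler (for example, 8 kHz telephony PCM toward a 16 kHz input stream; a deliberately simple sketch using linear interpolation, not OpenClaw's actual resampler):

```python
import struct

def upsample_pcm16_2x(pcm: bytes) -> bytes:
    """Upsample 16-bit little-endian mono PCM by 2x via linear interpolation."""
    samples = struct.unpack(f"<{len(pcm) // 2}h", pcm)
    out = []
    for i, s in enumerate(samples):
        out.append(s)
        # Interpolate halfway toward the next sample (repeat the last one).
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append((s + nxt) // 2)
    return struct.pack(f"<{len(out)}h", *out)
```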
Control UI Talk supports Google Live browser sessions with constrained one-use tokens. Backend-only realtime voice providers can also run through the generic Gateway relay transport, which keeps provider credentials on the Gateway.
For maintainer live verification, run OPENAI_API_KEY=... GEMINI_API_KEY=... node --import tsx scripts/dev/realtime-talk-live-smoke.ts. The Google leg mints the same constrained Live API token shape used by Control UI Talk, opens the browser WebSocket endpoint, sends the initial setup payload, and waits for setupComplete.

Advanced configuration

For direct Gemini API runs (api: "google-generative-ai"), OpenClaw passes a configured cachedContent handle through to Gemini requests.
  • Configure per-model or global params with either cachedContent or legacy cached_content
  • If both are present, cachedContent wins
  • Example value: cachedContents/prebuilt-context
  • Gemini cache-hit usage is normalized into OpenClaw cacheRead from upstream cachedContentTokenCount
{
  agents: {
    defaults: {
      models: {
        "google/gemini-2.5-pro": {
          params: {
            cachedContent: "cachedContents/prebuilt-context",
          },
        },
      },
    },
  },
}
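The cache-hit normalization can be sketched as a small mapping over the Gemini usageMetadata response fields (promptTokenCount and candidatesTokenCount are standard Gemini API usage fields; the helper name and output shape are illustrative, not OpenClaw's actual code):

```python
def normalize_gemini_usage(usage_metadata: dict) -> dict:
    """Map Gemini usageMetadata into OpenClaw-style usage,
    folding cachedContentTokenCount into cacheRead."""
    return {
        "input": usage_metadata.get("promptTokenCount", 0),
        "output": usage_metadata.get("candidatesTokenCount", 0),
        "cacheRead": usage_metadata.get("cachedContentTokenCount", 0),
    }
```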
When using the google-gemini-cli OAuth provider, OpenClaw normalizes the CLI JSON output as follows:
  • Reply text comes from the CLI JSON response field.
  • Usage falls back to stats when the CLI leaves usage empty.
  • stats.cached is normalized into OpenClaw cacheRead.
  • If stats.input is missing, OpenClaw derives input tokens from stats.input_tokens - stats.cached.
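The fallback rules above can be sketched as follows (field names taken from the bullets; an illustrative sketch, not the plugin's actual code):

```python
def normalize_cli_usage(stats: dict) -> dict:
    """Normalize Gemini CLI `stats` into OpenClaw-style usage fields."""
    cached = stats.get("cached", 0)
    # Prefer an explicit input count; otherwise derive it as input_tokens - cached.
    input_tokens = stats.get("input")
    if input_tokens is None:
        input_tokens = stats.get("input_tokens", 0) - cached
    return {
        "input": input_tokens,
        "output": stats.get("output", 0),
        "cacheRead": cached,  # stats.cached maps to OpenClaw cacheRead
    }
```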
If the Gateway runs as a daemon (launchd/systemd), make sure GEMINI_API_KEY is available to that process (for example, in ~/.openclaw/.env or via env.shellEnv).

Related pages

  • Model selection: choosing providers, model refs, and failover behavior.
  • Image generation: shared image tool parameters and provider selection.
  • Video generation: shared video tool parameters and provider selection.
  • Music generation: shared music tool parameters and provider selection.