The Google plugin provides access to Gemini models through Google AI Studio, plus image generation, media understanding (image/audio/video), text-to-speech, and web search via Gemini Grounding.
- Provider: google
- Auth: GEMINI_API_KEY or GOOGLE_API_KEY
- API: Google Gemini API
- Runtime option: agents.defaults.agentRuntime.id: "google-gemini-cli" reuses Gemini CLI OAuth while keeping model refs canonical as google/*.
Getting started
Choose your preferred auth method and follow the setup steps.
- API key
- Gemini CLI (OAuth)
Capabilities
| Capability | Supported |
|---|---|
| Chat completions | Yes |
| Image generation | Yes |
| Music generation | Yes |
| Text-to-speech | Yes |
| Realtime voice | Yes (Google Live API) |
| Image understanding | Yes |
| Audio transcription | Yes |
| Video understanding | Yes |
| Web search (Grounding) | Yes |
| Thinking/reasoning | Yes (Gemini 2.5+ / Gemini 3+) |
| Gemma 4 models | Yes |
Web search
The bundled gemini web-search provider uses Gemini Google Search grounding.
Configure a dedicated search key under plugins.entries.google.config.webSearch,
or let it reuse models.providers.google.apiKey after GEMINI_API_KEY.
Credential precedence is webSearch.apiKey, then GEMINI_API_KEY,
then models.providers.google.apiKey. webSearch.baseUrl is optional and
exists for operator proxies or compatible Gemini API endpoints; when omitted,
Gemini web search reuses models.providers.google.baseUrl. See
Gemini search for the provider-specific tool behavior.
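The credential and base URL fallback described above can be sketched as a small resolver. This is an illustrative sketch, not OpenClaw's implementation: the interfaces, the function names, and the choice to treat blank strings as unset are assumptions; only the apiKey/baseUrl fields and the precedence order come from the text.

```typescript
// Hypothetical shapes; only the apiKey and baseUrl field names come from the docs.
interface WebSearchSettings {
  apiKey?: string;
  baseUrl?: string;
}

interface GoogleProviderSettings {
  apiKey?: string;
  baseUrl?: string;
}

interface ResolvedSearchAuth {
  apiKey: string | undefined;
  baseUrl: string | undefined;
}

// Returns the first candidate that is defined and not blank.
// Assumption: a blank string counts as "not configured".
function firstNonEmpty(candidates: Array<string | undefined>): string | undefined {
  for (const candidate of candidates) {
    if (candidate !== undefined && candidate.trim() !== "") {
      return candidate;
    }
  }
  return undefined;
}

function resolveGeminiSearchAuth(
  webSearch: WebSearchSettings,
  provider: GoogleProviderSettings,
  env: Record<string, string | undefined>,
): ResolvedSearchAuth {
  return {
    // Precedence: webSearch.apiKey, then GEMINI_API_KEY, then the provider key.
    apiKey: firstNonEmpty([webSearch.apiKey, env["GEMINI_API_KEY"], provider.apiKey]),
    // webSearch.baseUrl is optional; when omitted, reuse the provider base URL.
    baseUrl: firstNonEmpty([webSearch.baseUrl, provider.baseUrl]),
  };
}
```

Passing the environment in as a plain record keeps the sketch testable without touching the real process environment.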
Image generation
The bundled google image-generation provider defaults to
google/gemini-3.1-flash-image-preview.
- Also supports google/gemini-3-pro-image-preview
- Generate: up to 4 images per request
- Edit mode: enabled, up to 5 input images
- Geometry controls: size, aspectRatio, and resolution
See Image Generation for shared tool parameters, provider selection, and failover behavior.
Video generation
The bundled google plugin also registers video generation through the shared
video_generate tool.
- Default video model: google/veo-3.1-fast-generate-preview
- Modes: text-to-video, image-to-video, and single-video reference flows
- Supports aspectRatio, resolution, and audio
- Current duration clamp: 4 to 8 seconds
See Video Generation for shared tool parameters, provider selection, and failover behavior.
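The duration clamp listed above amounts to bounding the requested length to the 4 to 8 second range. A minimal sketch follows; the constant and function names are hypothetical, and the NaN fallback is an assumption rather than documented behavior.

```typescript
// Bounds taken from the list above; names are illustrative only.
const MIN_VIDEO_SECONDS = 4;
const MAX_VIDEO_SECONDS = 8;

function clampVideoDuration(requestedSeconds: number): number {
  // Assumption: NaN input falls back to the lower bound.
  if (isNaN(requestedSeconds)) {
    return MIN_VIDEO_SECONDS;
  }
  // Values below 4 become 4, values above 8 become 8, others pass through.
  return Math.min(MAX_VIDEO_SECONDS, Math.max(MIN_VIDEO_SECONDS, requestedSeconds));
}
```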
Music generation
The bundled google plugin also registers music generation through the shared
music_generate tool.
- Default music model: google/lyria-3-clip-preview
- Also supports google/lyria-3-pro-preview
- Prompt controls: lyrics and instrumental
- Output format: mp3 by default, plus wav on google/lyria-3-pro-preview
- Reference inputs: up to 10 images
- Session-backed runs detach through the shared task/status flow, including action: "status"
See Music Generation for shared tool parameters, provider selection, and failover behavior.
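The output-format rule above (mp3 everywhere, wav only on google/lyria-3-pro-preview) can be expressed as a small lookup. This is a sketch with hypothetical helper names; it does not describe how OpenClaw reacts to an unsupported format request, which the text does not specify.

```typescript
type MusicFormat = "mp3" | "wav";

// The only model the docs list as wav-capable.
const WAV_CAPABLE_MODEL = "google/lyria-3-pro-preview";

// Formats a given model ref can produce, per the list above.
function supportedMusicFormats(modelRef: string): MusicFormat[] {
  return modelRef === WAV_CAPABLE_MODEL ? ["mp3", "wav"] : ["mp3"];
}

function isMusicFormatSupported(modelRef: string, format: MusicFormat): boolean {
  return supportedMusicFormats(modelRef).indexOf(format) !== -1;
}
```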
Text-to-speech
The bundled google speech provider uses the Gemini API TTS path with
gemini-3.1-flash-tts-preview.
- Default voice: Kore
- Auth: messages.tts.providers.google.apiKey, models.providers.google.apiKey, GEMINI_API_KEY, or GOOGLE_API_KEY
- Output: WAV for regular TTS attachments, Opus for voice-note targets, PCM for Talk/telephony
- Voice-note output: Google PCM is wrapped as WAV and transcoded to 48 kHz Opus with ffmpeg
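Wrapping raw PCM as WAV, as mentioned in the voice-note bullet above, means prepending a 44-byte RIFF header. The sketch below shows that step in isolation. The function name is hypothetical, and the default layout (16-bit little-endian mono at 24 kHz) is an assumption to verify against the actual TTS response, not something this page states.

```typescript
// Assumed PCM layout: 16-bit signed little-endian, mono, 24 kHz.
// Check the real response format before relying on these defaults.
function wrapPcmAsWav(
  pcm: Uint8Array,
  sampleRate: number = 24000,
  channels: number = 1,
  bitsPerSample: number = 16,
): Uint8Array {
  const headerSize = 44;
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign; // bytes per second
  const wav = new Uint8Array(headerSize + pcm.length);
  const view = new DataView(wav.buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // RIFF chunk size = total size - 8
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for plain PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true); // data chunk size in bytes
  wav.set(pcm, headerSize); // PCM samples follow the header unchanged
  return wav;
}
```

The resulting file could then be transcoded with something like ffmpeg -i voice.wav -c:a libopus -ar 48000 voice.ogg; the file names are placeholders, and the exact ffmpeg arguments OpenClaw uses are not given here.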
Set audioProfile to prepend a reusable style prompt before the spoken text. Set
speakerName when your prompt text refers to a named speaker.
Gemini API TTS also accepts expressive square-bracket audio tags in the text,
such as [whispers] or [laughs]. To keep tags out of the visible chat reply
while sending them to TTS, put them inside a [[tts:text]]...[[/tts:text]]
block.
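To illustrate the idea of keeping audio tags out of the visible reply, the sketch below separates the [[tts:text]] block bodies from the rest of a message. It is an assumption-laden illustration: the function and interface names are hypothetical, and how OpenClaw actually combines block bodies with the remaining text for speech is not specified here.

```typescript
interface TtsSplit {
  visible: string; // text the chat reply would show
  ttsText: string[]; // block bodies intended only for speech
}

function splitTtsTextBlocks(reply: string): TtsSplit {
  // Non-greedy match so multiple blocks in one reply are handled separately.
  const blockPattern = /\[\[tts:text\]\]([\s\S]*?)\[\[\/tts:text\]\]/g;
  const ttsText: string[] = [];
  const visible = reply
    .replace(blockPattern, (_match: string, body: string): string => {
      ttsText.push(body.trim());
      return ""; // drop the whole block from the visible text
    })
    .replace(/[ \t]{2,}/g, " ") // collapse gaps left by removed blocks
    .trim();
  return { visible, ttsText };
}

// Example: the [whispers] tag reaches TTS but never appears in chat.
const example = splitTtsTextBlocks(
  "Here it is. [[tts:text]][whispers] Here it is.[[/tts:text]]",
);
```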
A Google Cloud Console API key restricted to the Gemini API is valid for this
provider. This is not the separate Cloud Text-to-Speech API path.
Realtime voice
The bundled google plugin registers a realtime voice provider backed by the
Gemini Live API for backend audio bridges such as Voice Call and Google Meet.
| Setting | Config path | Default |
|---|---|---|
| Model | plugins.entries.voice-call.config.realtime.providers.google.model | gemini-2.5-flash-native-audio-preview-12-2025 |
| Voice | ...google.voice | Kore |
| Temperature | ...google.temperature | (unset) |
| VAD start sensitivity | ...google.startSensitivity | (unset) |
| VAD end sensitivity | ...google.endSensitivity | (unset) |
| Silence duration | ...google.silenceDurationMs | (unset) |
| Activity handling | ...google.activityHandling | Google default, start-of-activity-interrupts |
| Turn coverage | ...google.turnCoverage | Google default, only-activity |
| Disable auto VAD | ...google.automaticActivityDetectionDisabled | false |
| API key | ...google.apiKey | Falls back to models.providers.google.apiKey, GEMINI_API_KEY, or GOOGLE_API_KEY |
Google Live API uses bidirectional audio and function calling over a WebSocket.
OpenClaw adapts telephony/Meet bridge audio to Gemini’s PCM Live API stream and
keeps tool calls on the shared realtime voice contract. Leave temperature
unset unless you need sampling changes; OpenClaw omits non-positive values
because Google Live can return transcripts without audio for temperature: 0.
Gemini API transcription is enabled without languageCodes; the current Google
SDK rejects language-code hints on this API path.
Control UI Talk supports Google Live browser sessions with constrained one-use
tokens. Backend-only realtime voice providers can also run through the generic
Gateway relay transport, which keeps provider credentials on the Gateway.
For a live smoke test, run
OPENAI_API_KEY=... GEMINI_API_KEY=... node --import tsx scripts/dev/realtime-talk-live-smoke.ts.
The Google leg mints the same constrained Live API token shape used by Control
UI Talk, opens the browser WebSocket endpoint, sends the initial setup payload,
and waits for setupComplete.
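The temperature handling described earlier in this section (leave it unset, and drop non-positive values) can be sketched as follows. The interface and function names are hypothetical; only the rule that non-positive temperatures are omitted comes from the text.

```typescript
// Hypothetical config shape; only the temperature field is modeled.
interface LiveGenerationConfig {
  temperature?: number;
}

function buildLiveGenerationConfig(temperature?: number): LiveGenerationConfig {
  const config: LiveGenerationConfig = {};
  // Only forward strictly positive values. Zero, negatives, NaN, and
  // undefined all leave the field absent so the provider default applies.
  if (temperature !== undefined && temperature > 0) {
    config.temperature = temperature;
  }
  return config;
}
```

Omitting the key entirely, rather than sending 0, is what avoids the transcript-without-audio case mentioned above.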
Advanced configuration
Direct Gemini cache reuse
For direct Gemini API runs (api: "google-generative-ai"), OpenClaw
passes a configured cachedContent handle through to Gemini requests.
- Configure per-model or global params with either cachedContent or legacy cached_content
- If both are present, cachedContent wins
- Example value: cachedContents/prebuilt-context
- Gemini cache-hit usage is normalized into OpenClaw cacheRead from upstream cachedContentTokenCount
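The key precedence and usage mapping in the list above can be sketched like this. The interfaces and function names are hypothetical, and defaulting cacheRead to 0 when the upstream count is absent is an assumption. How per-model and global params are merged is not covered, since the text does not describe it.

```typescript
// Hypothetical param shape covering the current and legacy key spellings.
interface GeminiCacheParams {
  cachedContent?: string;
  cached_content?: string;
}

// Hypothetical slice of the upstream usage metadata.
interface GeminiUsageMetadata {
  cachedContentTokenCount?: number;
}

// cachedContent wins when both spellings are present.
function resolveCachedContent(params: GeminiCacheParams): string | undefined {
  return params.cachedContent ?? params.cached_content;
}

// Map the upstream cache-hit count onto a cacheRead value.
// Assumption: a missing count is reported as 0.
function toCacheRead(usage: GeminiUsageMetadata): number {
  return usage.cachedContentTokenCount ?? 0;
}
```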
Gemini CLI JSON usage notes
When using the google-gemini-cli OAuth provider, OpenClaw normalizes
the CLI JSON output as follows:
- Reply text comes from the CLI JSON response field.
- Usage falls back to stats when the CLI leaves usage empty.
- stats.cached is normalized into OpenClaw cacheRead.
- If stats.input is missing, OpenClaw derives input tokens from stats.input_tokens - stats.cached.
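The stats fallback rules above can be sketched as a small normalizer. Only the field names response, stats.input, stats.input_tokens, and stats.cached come from the text; the interfaces, the function names, the zero defaults, and the clamp that keeps the derived input from going negative are assumptions.

```typescript
// Hypothetical slice of the CLI JSON output.
interface CliStats {
  input?: number;
  input_tokens?: number;
  cached?: number;
}

interface CliJsonOutput {
  response?: string;
  stats?: CliStats;
}

interface NormalizedUsage {
  input: number;
  cacheRead: number;
}

// Reply text comes from the response field (empty string if absent: assumption).
function replyTextFromCli(output: CliJsonOutput): string {
  return output.response ?? "";
}

function usageFromStats(stats: CliStats): NormalizedUsage {
  const cacheRead = stats.cached ?? 0;
  // Prefer stats.input; otherwise derive it as input_tokens minus cached.
  // Assumption: clamp at 0 so inconsistent stats never yield a negative count.
  const input = stats.input ?? Math.max(0, (stats.input_tokens ?? 0) - cacheRead);
  return { input, cacheRead };
}
```

For example, stats of input_tokens 1000 and cached 400 with no input field would yield 600 input tokens and a cacheRead of 400.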
Environment and daemon setup
If the Gateway runs as a daemon (launchd/systemd), make sure GEMINI_API_KEY
is available to that process (for example, in ~/.openclaw/.env or via
env.shellEnv).
Related
Model selection
Choosing providers, model refs, and failover behavior.
Image generation
Shared image tool parameters and provider selection.
Video generation
Shared video tool parameters and provider selection.
Music generation
Shared music tool parameters and provider selection.