# Talk Mode

Talk mode is a continuous voice conversation loop:

1. Listen for speech
2. Send the transcript to the model (`chat.send` on the `main` session)
3. Wait for the response
4. Speak it via ElevenLabs (streaming playback)
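
The four steps above can be pictured as a simple loop. The sketch below is illustrative only: `listen`, `send`, and `speak` are hypothetical callables standing in for speech recognition, `chat.send`, and ElevenLabs playback, and the `None` stop signal is an assumption, not part of the app.

```python
from typing import Callable, Optional


def talk_loop(
    listen: Callable[[], Optional[str]],
    send: Callable[[str], str],
    speak: Callable[[str], None],
) -> int:
    """Run listen -> send -> speak until listen() returns None.

    Returns the number of completed turns. All three callables are
    hypothetical stand-ins, not OpenClaw APIs.
    """
    turns = 0
    while True:
        transcript = listen()        # 1. listen for speech
        if transcript is None:       # assumed stop signal (Talk mode exited)
            return turns
        if not transcript.strip():
            continue                 # nothing was said; keep listening
        reply = send(transcript)     # 2-3. send the transcript, wait for the reply
        speak(reply)                 # 4. speak the reply
        turns += 1
```
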

## Behavior (macOS)

* **Always-on overlay** while Talk mode is enabled.
* **Listening → Thinking → Speaking** phase transitions.
* On a **short pause** (silence window), the current transcript is sent.
* Replies are **written to WebChat** (same as typing).
* **Interrupt on speech** (default on): if the user starts talking while the assistant is speaking, playback stops and the interruption timestamp is recorded for the next prompt.
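
The phase transitions described above can be sketched as a small state machine. This is a minimal model, not the app's implementation: the event names (`silence`, `reply`, `done`, `speech`) are hypothetical labels for the triggers in the list.

```python
from enum import Enum


class Phase(Enum):
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


# (current phase, event) -> next phase. Event names are illustrative.
TRANSITIONS = {
    (Phase.LISTENING, "silence"): Phase.THINKING,  # pause elapsed, transcript sent
    (Phase.THINKING, "reply"): Phase.SPEAKING,     # response arrived
    (Phase.SPEAKING, "done"): Phase.LISTENING,     # playback finished
    (Phase.SPEAKING, "speech"): Phase.LISTENING,   # user interrupted playback
}


def next_phase(phase: Phase, event: str, interrupt_on_speech: bool = True) -> Phase:
    """Return the phase after `event`; unknown combinations leave it unchanged."""
    if phase is Phase.SPEAKING and event == "speech" and not interrupt_on_speech:
        return phase  # interruption disabled: keep speaking
    return TRANSITIONS.get((phase, event), phase)
```
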

## Voice directives in replies

The assistant may prefix its reply with a **single JSON line** to control voice:

```json
{ "voice": "<voice-id>", "once": true }
```

Rules:

* First non-empty line only.
* Unknown keys are ignored.
* `once: true` applies to the current reply only.
* Without `once`, the voice becomes the new default for Talk mode.
* The JSON line is stripped before TTS playback.
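
As a rough illustration of these rules, the following sketch splits a reply into its directive and the text to speak, then applies the `once` rule. The function names are hypothetical, and only the `voice` key is handled here; treating a non-object JSON line as ordinary text is an assumption.

```python
import json
from typing import Any, Dict, Optional, Tuple


def split_voice_directive(reply: str) -> Tuple[Dict[str, Any], str]:
    """Return (directive, text_to_speak) for a model reply."""
    lines = reply.splitlines()
    for i, line in enumerate(lines):
        if not line.strip():
            continue
        # Only the first non-empty line may carry the directive.
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            return {}, reply
        if not isinstance(parsed, dict):
            return {}, reply  # assumption: non-object JSON is ordinary text
        # Strip the JSON line so it is never spoken.
        return parsed, "\n".join(lines[i + 1:]).lstrip("\n")
    return {}, reply


def apply_voice(
    directive: Dict[str, Any], default_voice: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (voice for this reply, default voice for later replies)."""
    voice = directive.get("voice")
    if voice is None:
        return default_voice, default_voice
    if directive.get("once") is True:
        return voice, default_voice  # current reply only
    return voice, voice              # becomes the new default
```
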

Supported keys:

* `voice` / `voice_id` / `voiceId`
* `model` / `model_id` / `modelId`
* `speed`, `rate` (WPM), `stability`, `similarity`, `style`, `speakerBoost`
* `seed`, `normalize`, `lang`, `output_format`, `latency_tier`
* `once`
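
One way to handle the aliases is to fold them into a single canonical key and drop anything unrecognized. The sketch below assumes `voiceId` and `modelId` as the canonical names (matching the config keys) and lets the last alias win when several are present; the actual precedence is not documented here.

```python
from typing import Any, Dict

# Alias -> canonical key. The canonical names are an assumption.
ALIASES = {
    "voice": "voiceId", "voice_id": "voiceId", "voiceId": "voiceId",
    "model": "modelId", "model_id": "modelId", "modelId": "modelId",
}

# Keys from the supported list that have no aliases.
PASSTHROUGH = {
    "speed", "rate", "stability", "similarity", "style", "speakerBoost",
    "seed", "normalize", "lang", "output_format", "latency_tier", "once",
}


def normalize_directive(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map aliases to canonical keys and ignore unknown keys."""
    out: Dict[str, Any] = {}
    for key, value in raw.items():
        if key in ALIASES:
            out[ALIASES[key]] = value
        elif key in PASSTHROUGH:
            out[key] = value
        # Unknown keys are ignored.
    return out
```
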

## Config (`~/.openclaw/openclaw.json`)

```json5
{
  talk: {
    voiceId: "elevenlabs_voice_id",
    modelId: "eleven_v3",
    outputFormat: "mp3_44100_128",
    apiKey: "elevenlabs_api_key",
    silenceTimeoutMs: 1500,
    interruptOnSpeech: true,
  },
}
```

Defaults:

* `interruptOnSpeech`: true
* `silenceTimeoutMs`: when unset, Talk keeps the platform default pause window before sending the transcript (700 ms on macOS and Android, 900 ms on iOS)
* `voiceId`: falls back to `ELEVENLABS_VOICE_ID` / `SAG_VOICE_ID` (or the first ElevenLabs voice when an API key is available)
* `modelId`: defaults to `eleven_v3` when unset
* `apiKey`: falls back to `ELEVENLABS_API_KEY` (or the gateway shell profile, if available)
* `outputFormat`: defaults to `pcm_44100` on macOS/iOS and `pcm_24000` on Android (set `mp3_*` to force MP3 streaming)
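
The fallback chain above can be summarized in code. This is a sketch under stated assumptions: `resolve_talk_config` is a hypothetical helper, `ELEVENLABS_VOICE_ID` is assumed to take precedence over `SAG_VOICE_ID` because it is listed first, and the "first ElevenLabs voice" and gateway shell profile fallbacks are left out.

```python
from typing import Any, Dict, Mapping

# Platform defaults taken from the list above.
PLATFORM_OUTPUT_FORMAT = {"macos": "pcm_44100", "ios": "pcm_44100", "android": "pcm_24000"}
PLATFORM_SILENCE_MS = {"macos": 700, "android": 700, "ios": 900}


def resolve_talk_config(
    talk: Mapping[str, Any], env: Mapping[str, str], platform: str
) -> Dict[str, Any]:
    """Fill unset `talk` keys from the environment and platform defaults."""
    silence = talk.get("silenceTimeoutMs")
    return {
        # Env precedence is an assumption based on the order in the docs.
        "voiceId": talk.get("voiceId")
        or env.get("ELEVENLABS_VOICE_ID")
        or env.get("SAG_VOICE_ID"),
        "modelId": talk.get("modelId") or "eleven_v3",
        "apiKey": talk.get("apiKey") or env.get("ELEVENLABS_API_KEY"),
        "outputFormat": talk.get("outputFormat") or PLATFORM_OUTPUT_FORMAT[platform],
        "silenceTimeoutMs": PLATFORM_SILENCE_MS[platform] if silence is None else silence,
        "interruptOnSpeech": talk.get("interruptOnSpeech", True),
    }
```

For example, an empty `talk` block on iOS with only `SAG_VOICE_ID` set would resolve to that voice, `eleven_v3`, `pcm_44100`, and a 900 ms pause window.
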

## macOS UI

* Menu bar toggle: **Talk**
* Config tab: **Talk Mode** group (voice id + interrupt toggle)
* Overlay:
  * **Listening**: cloud pulses with mic level
  * **Thinking**: sinking animation
  * **Speaking**: radiating rings
  * Click cloud: stop speaking
  * Click X: exit Talk mode

## Notes

* Requires Speech + Microphone permissions.
* Uses `chat.send` against session key `main`.
* TTS uses the ElevenLabs streaming API with `ELEVENLABS_API_KEY`, with incremental playback on macOS/iOS/Android for lower latency.
* `stability` for `eleven_v3` is validated to `0.0`, `0.5`, or `1.0`; other models accept `0..1`.
* `latency_tier` is validated to `0..4` when set.
* Android supports `pcm_16000`, `pcm_22050`, `pcm_24000`, and `pcm_44100` output formats for low-latency AudioTrack streaming.
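
The validation rules in these notes can be expressed as simple checks. The helpers below are hypothetical and only report whether a value is acceptable; whether the app rejects, clamps, or snaps invalid values is not stated here, so that behavior is left out.

```python
from typing import Optional

# PCM formats listed above for Android AudioTrack streaming.
ANDROID_PCM_FORMATS = {"pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100"}


def is_valid_stability(model_id: str, stability: float) -> bool:
    """`eleven_v3` accepts only 0.0, 0.5, or 1.0; other models accept 0..1."""
    if model_id == "eleven_v3":
        return stability in (0.0, 0.5, 1.0)
    return 0.0 <= stability <= 1.0


def is_valid_latency_tier(tier: Optional[int]) -> bool:
    """`latency_tier` is optional; when set it must be an integer in 0..4."""
    if tier is None:
        return True
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(tier, int) and not isinstance(tier, bool) and 0 <= tier <= 4


def is_android_pcm_format(output_format: str) -> bool:
    """True when the format is one of the listed Android PCM formats."""
    return output_format in ANDROID_PCM_FORMATS
```
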

