Deepgram is a speech-to-text API. OpenClaw uses it in two places: inbound audio and voice-note transcription through tools.media.audio, and Voice Call streaming STT through plugins.entries.voice-call.config.streaming.

For batch transcription, OpenClaw uploads the complete audio file to Deepgram and injects the transcript into the reply pipeline ({{Transcript}} + [Audio] block).

For Voice Call streaming, OpenClaw forwards live G.711 u-law frames to Deepgram’s WebSocket listen endpoint and emits partial or final transcripts as Deepgram returns them.

| Detail | Value |
| --- | --- |
| Website | deepgram.com |
| Docs | developers.deepgram.com |
| Auth | DEEPGRAM_API_KEY |
| Default model | nova-3 |

Getting started

1. Set your API key

Add your Deepgram API key to the environment:
DEEPGRAM_API_KEY=dg_...
2. Enable the audio provider

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [{ provider: "deepgram", model: "nova-3" }],
      },
    },
  },
}
3. Send a voice note

Send an audio message through any connected channel. OpenClaw transcribes it via Deepgram and injects the transcript into the reply pipeline.
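
To make the batch step concrete, the sketch below shows how a transcript could be pulled out of a Deepgram pre-recorded response (a POST to https://api.deepgram.com/v1/listen with an `Authorization: Token <key>` header). It is not OpenClaw's implementation: the `DeepgramResponse` type is a hand-written subset of the response, and `extractTranscript` is a hypothetical helper name used only for illustration.

```typescript
// Assumption: a minimal, hand-written subset of Deepgram's pre-recorded
// response. Only the fields needed to reach the transcript are modelled.
interface DeepgramAlternative {
  transcript: string;
  confidence?: number;
}

interface DeepgramResponse {
  results?: {
    channels?: Array<{ alternatives?: DeepgramAlternative[] }>;
  };
}

// Hypothetical helper: return the top alternative of the first channel,
// or an empty string when the response carries no transcript.
function extractTranscript(response: DeepgramResponse): string {
  const alternative = response.results?.channels?.[0]?.alternatives?.[0];
  return alternative?.transcript.trim() ?? "";
}

const sample: DeepgramResponse = {
  results: {
    channels: [{ alternatives: [{ transcript: " hello world ", confidence: 0.98 }] }],
  },
};
const transcript = extractTranscript(sample);
```

An empty result is treated here as "no transcript" rather than an error; how OpenClaw itself handles that case is not stated in this page.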

Configuration options

| Option | Path | Description |
| --- | --- | --- |
| model | tools.media.audio.models[].model | Deepgram model id (default: nova-3) |
| language | tools.media.audio.models[].language | Language hint (optional) |
| detect_language | tools.media.audio.providerOptions.deepgram.detect_language | Enable language detection (optional) |
| punctuate | tools.media.audio.providerOptions.deepgram.punctuate | Enable punctuation (optional) |
| smart_format | tools.media.audio.providerOptions.deepgram.smart_format | Enable smart formatting (optional) |

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [{ provider: "deepgram", model: "nova-3", language: "en" }],
      },
    },
  },
}
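
The option names in the table above match query parameters of Deepgram's listen endpoint. As a rough illustration of how such options could end up on a request, the sketch below builds a query string from them. `DeepgramAudioOptions` and `toDeepgramQuery` are hypothetical names, and the one-to-one mapping is an assumption rather than a description of OpenClaw internals.

```typescript
// Assumption: a hypothetical shape mirroring the option names in the table.
interface DeepgramAudioOptions {
  model?: string;
  language?: string;
  detect_language?: boolean;
  punctuate?: boolean;
  smart_format?: boolean;
}

// Build a query string, skipping unset options so that Deepgram applies
// its own defaults for anything not configured.
function toDeepgramQuery(options: DeepgramAudioOptions): string {
  const pairs: Array<[string, string | boolean | undefined]> = [
    ["model", options.model ?? "nova-3"], // documented default model
    ["language", options.language],
    ["detect_language", options.detect_language],
    ["punctuate", options.punctuate],
    ["smart_format", options.smart_format],
  ];
  return pairs
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => `${key}=${encodeURIComponent(String(value))}`)
    .join("&");
}

const query = toDeepgramQuery({ language: "en", smart_format: true });
// query === "model=nova-3&language=en&smart_format=true"
```

Leaving unset options out of the query, instead of sending explicit false values, keeps the request minimal and defers to the provider's defaults.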

Voice Call streaming STT

The bundled deepgram plugin also registers a realtime transcription provider for the Voice Call plugin.

| Setting | Config path | Default |
| --- | --- | --- |
| API key | plugins.entries.voice-call.config.streaming.providers.deepgram.apiKey | Falls back to DEEPGRAM_API_KEY |
| Model | ...deepgram.model | nova-3 |
| Language | ...deepgram.language | (unset) |
| Encoding | ...deepgram.encoding | mulaw |
| Sample rate | ...deepgram.sampleRate | 8000 |
| Endpointing | ...deepgram.endpointingMs | 800 |
| Interim results | ...deepgram.interimResults | true |

{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          streaming: {
            enabled: true,
            provider: "deepgram",
            providers: {
              deepgram: {
                apiKey: "${DEEPGRAM_API_KEY}",
                model: "nova-3",
                endpointingMs: 800,
                language: "en-US",
              },
            },
          },
        },
      },
    },
  },
}
Voice Call receives telephony audio as 8 kHz G.711 u-law. The Deepgram streaming provider defaults to encoding: "mulaw" and sampleRate: 8000, so Twilio media frames can be forwarded directly.
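
For orientation, the streaming defaults could translate into a Deepgram WebSocket URL roughly as sketched below. The endpoint (wss://api.deepgram.com/v1/listen) and the query parameter names are Deepgram's; the `DeepgramStreamingConfig` type, the `buildStreamingUrl` helper, and the mapping from OpenClaw setting names to those parameters are assumptions made for illustration.

```typescript
// Assumption: a hypothetical config shape mirroring the streaming settings
// documented for plugins.entries.voice-call.config.streaming.providers.deepgram.
interface DeepgramStreamingConfig {
  model?: string;
  language?: string;
  encoding?: string;
  sampleRate?: number;
  endpointingMs?: number;
  interimResults?: boolean;
}

// Hypothetical helper: apply the documented defaults and build the listen URL.
function buildStreamingUrl(config: DeepgramStreamingConfig = {}): string {
  const params: Array<[string, string]> = [
    ["model", config.model ?? "nova-3"],
    ["encoding", config.encoding ?? "mulaw"], // G.711 u-law
    ["sample_rate", String(config.sampleRate ?? 8000)], // 8 kHz telephony audio
    ["endpointing", String(config.endpointingMs ?? 800)], // milliseconds
    ["interim_results", String(config.interimResults ?? true)],
  ];
  // Language has no default, so it is only sent when configured.
  if (config.language !== undefined) {
    params.push(["language", config.language]);
  }
  const query = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `wss://api.deepgram.com/v1/listen?${query}`;
}

const defaultUrl = buildStreamingUrl();
const localizedUrl = buildStreamingUrl({ language: "en-US" });
```

Because the declared encoding and sample rate match the telephony audio, a client using a URL like this can send the raw u-law bytes as binary WebSocket messages without transcoding.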

Notes

Authentication follows the standard provider auth order. DEEPGRAM_API_KEY is the simplest path.
Override endpoints or headers with tools.media.audio.baseUrl and tools.media.audio.headers when using a proxy.
Output follows the same audio rules as other providers (size caps, timeouts, transcript injection).

Media tools

Audio, image, and video processing pipeline overview.

Configuration

Full config reference including media tool settings.

Troubleshooting

Common issues and debugging steps.

FAQ

Frequently asked questions about OpenClaw setup.