The music_generate tool lets the agent create music or audio through the
shared music-generation capability with configured providers — ComfyUI,
fal, Google, MiniMax, and OpenRouter today.
For session-backed agent runs, OpenClaw starts music generation as a
background task, tracks it in the task ledger, then wakes the agent again
when the track is ready so the agent can tell the user and attach the
finished audio. Generated-media completions are delivered by the agent through
the message tool; OpenClaw does not auto-post the file as a fallback if the
completion agent writes only a private final reply. The completion wake
explicitly warns the agent that normal final replies are private for this
route.
The built-in shared tool only appears when at least one music-generation
provider is available. If you do not see music_generate in your agent’s
tools, configure agents.defaults.musicGenerationModel or set up a
provider API key.
Quick start
Example prompts:
Supported providers
| Provider | Default model | Reference inputs | Supported controls | Auth |
|---|---|---|---|---|
| ComfyUI | workflow | Up to 1 image | Workflow-defined music or audio | COMFY_API_KEY, COMFY_CLOUD_API_KEY |
| fal | fal-ai/minimax-music/v2.6 | None | lyrics, instrumental, durationSeconds, format | FAL_KEY or FAL_API_KEY |
| Google | lyria-3-clip-preview | Up to 10 images | lyrics, instrumental, format | GEMINI_API_KEY, GOOGLE_API_KEY |
| MiniMax | music-2.6 | None | lyrics, instrumental, durationSeconds, format=mp3 | MINIMAX_API_KEY or MiniMax OAuth |
| OpenRouter | google/lyria-3-pro-preview | Up to 1 image | lyrics, instrumental, durationSeconds, format | OPENROUTER_API_KEY |
Capability matrix
The explicit mode contract used by music_generate, contract tests, and the
shared live sweep:
| Provider | generate | edit | Edit limit | Shared live lanes |
|---|---|---|---|---|
| ComfyUI | ✓ | ✓ | 1 image | Not in the shared sweep; covered by extensions/comfy/comfy.live.test.ts |
| fal | ✓ | — | None | generate |
| Google | ✓ | ✓ | 10 images | generate, edit |
| MiniMax | ✓ | — | None | generate |
| OpenRouter | ✓ | ✓ | 1 image | generate, edit |
Use action: "list" to inspect available shared providers and models at
runtime.
Use action: "status" to inspect the active session-backed music task.
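As a rough illustration of the three actions, the argument shapes might look like the sketch below. The type name MusicGenerateArgs, the helper startsGeneration, and the idea that an omitted action means "generate" are assumptions made for illustration. Only the field names action, prompt, model, lyrics, instrumental, durationSeconds, and format come from this page, and the real tool schema may differ.

```typescript
// Hypothetical argument shapes for music_generate calls.
// Field names come from this page; the types are assumptions.
type MusicGenerateArgs =
  | {
      action?: "generate"; // assumed default when omitted
      prompt: string; // required for action: "generate"
      model?: string; // provider/model override, e.g. "google/lyria-3-pro-preview"
      lyrics?: string;
      instrumental?: boolean;
      durationSeconds?: number;
      format?: string;
    }
  | { action: "list" } // inspect available providers and models
  | { action: "status" }; // inspect the active session-backed task

const generateCall: MusicGenerateArgs = {
  action: "generate",
  prompt: "Warm lo-fi beat with soft piano",
  instrumental: true,
  durationSeconds: 30,
};
const listCall: MusicGenerateArgs = { action: "list" };
const statusCall: MusicGenerateArgs = { action: "status" };

// Returns true when the call asks for a new generation rather than an inspection.
function startsGeneration(args: MusicGenerateArgs): boolean {
  return args.action === undefined || args.action === "generate";
}
```

Only the first call asks for audio; the other two are read-only inspections.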
Tool parameters
Music generation prompt. Required for action: "generate".
"status" returns the current session task; "list" inspects providers.
Provider/model override (e.g. google/lyria-3-pro-preview, comfy/workflow).
Optional lyrics when the provider supports explicit lyric input.
Request instrumental-only output when the provider supports it.
Single reference image path or URL.
Multiple reference images (up to 10 on supporting providers).
Target duration in seconds when the provider supports duration hints.
Output format hint when the provider supports it.
Output filename hint.
Not all providers support all parameters. OpenClaw still validates hard
limits such as input counts before submission. When a provider supports
duration but uses a shorter maximum than the requested value, OpenClaw
clamps to the closest supported duration. Truly unsupported optional hints
are ignored with a warning when the selected provider or model cannot honor
them. Tool results report applied settings; details.normalization
captures any requested-to-applied mapping.
Provider requests use agents.defaults.musicGenerationModel.timeoutMs when
configured, with values below 120000ms raised to 120000ms, and otherwise
default to 300000ms.
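The timeout and duration rules above can be sketched as two small functions. This is a minimal illustration, not OpenClaw's implementation: the names resolveMusicTimeoutMs, normalizeDuration, and DurationNormalization are hypothetical, the duration rule is simplified to a single provider maximum, and the real shape of details.normalization is not shown on this page.

```typescript
// Constants taken from the text above; everything else is hypothetical.
const MIN_TIMEOUT_MS = 120_000;
const DEFAULT_TIMEOUT_MS = 300_000;

// Configured values below the floor are raised; no configured value means the default.
function resolveMusicTimeoutMs(configuredMs?: number): number {
  if (configuredMs === undefined) return DEFAULT_TIMEOUT_MS;
  return Math.max(configuredMs, MIN_TIMEOUT_MS);
}

// Assumed shape for a requested-to-applied mapping.
interface DurationNormalization {
  requested: number;
  applied: number;
}

// Clamp a requested duration to the provider maximum and record the mapping
// only when the applied value differs from the requested one.
function normalizeDuration(
  requested: number,
  providerMaxSeconds: number,
): { applied: number; normalization?: DurationNormalization } {
  const applied = Math.min(requested, providerMaxSeconds);
  return applied === requested
    ? { applied }
    : { applied, normalization: { requested, applied } };
}
```

For example, a configured timeout of 60000 resolves to 120000, and a 90-second request against a provider capped at 60 seconds is applied as 60 with the mapping recorded.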
Async behavior
Session-backed music generation runs as a background task:
- Background task: music_generate creates a background task, returns a started/task response immediately, and posts the finished track later in a follow-up agent message.
- Duplicate prevention: while a task is queued or running, later music_generate calls in the same session return task status instead of starting another generation. Use action: "status" to check explicitly.
- Status lookup: openclaw tasks list or openclaw tasks show <taskId> inspects queued, running, and terminal status.
- Completion wake: OpenClaw injects an internal completion event back into the same session so the model can write the user-facing follow-up itself.
- Prompt hint: later user/manual turns in the same session get a small runtime hint when a music task is already in flight, so the model does not blindly call music_generate again.
- No-session fallback: direct/local contexts without a real agent session run inline and return the final audio result in the same turn.
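The duplicate-prevention rule can be pictured with a small in-memory model. This is only a sketch under stated assumptions: MusicTaskLedger, MusicTask, and MusicCallResult are hypothetical names, and the real task ledger is not described on this page beyond the four states and the per-session behavior.

```typescript
type MusicTaskState = "queued" | "running" | "succeeded" | "failed";

interface MusicTask {
  taskId: string;
  sessionId: string;
  state: MusicTaskState;
}

type MusicCallResult =
  | { kind: "started"; task: MusicTask }
  | { kind: "status"; task: MusicTask };

// In-memory stand-in for the task ledger (hypothetical).
class MusicTaskLedger {
  private readonly bySession = new Map<string, MusicTask>();
  private nextId = 1;

  // While a task is queued or running, return its status instead of starting another.
  requestGeneration(sessionId: string): MusicCallResult {
    const existing = this.bySession.get(sessionId);
    if (existing && (existing.state === "queued" || existing.state === "running")) {
      return { kind: "status", task: existing };
    }
    const task: MusicTask = { taskId: `task-${this.nextId++}`, sessionId, state: "queued" };
    this.bySession.set(sessionId, task);
    return { kind: "started", task };
  }

  // Simulates the provider moving the session's task to a new state.
  setState(sessionId: string, state: MusicTaskState): void {
    const task = this.bySession.get(sessionId);
    if (task) task.state = state;
  }
}
```

A second call in the same session returns the in-flight task, and a new generation starts only after that task reaches a terminal state.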
Task lifecycle
| State | Meaning |
|---|---|
| queued | Task created, waiting for the provider to accept it. |
| running | Provider is processing (typically 30 seconds to 3 minutes depending on provider and duration). |
| succeeded | Track ready; the agent wakes and posts it to the conversation. |
| failed | Provider error or timeout; the agent wakes with error details. |
Configuration
Model selection
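The configuration keys named on this page can be pictured as a typed object. This is a hedged sketch: the interface names are hypothetical, the nesting is inferred from the dotted paths agents.defaults.musicGenerationModel and agents.defaults.mediaGenerationAutoProviderFallback, and the provider/model references in the example values are assumed to follow the provider/model form shown elsewhere on this page. Check the configuration reference for the actual file format.

```typescript
// Hypothetical typing of the config keys named on this page.
interface MusicGenerationModelConfig {
  primary?: string; // preferred provider/model reference
  fallbacks?: string[]; // tried in order after primary
  timeoutMs?: number; // per-request timeout
}

interface MusicConfigSketch {
  agents: {
    defaults: {
      musicGenerationModel?: MusicGenerationModelConfig;
      mediaGenerationAutoProviderFallback?: boolean;
    };
  };
}

// Example values are illustrative only.
const exampleMusicConfig: MusicConfigSketch = {
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
        fallbacks: ["minimax/music-2.6"],
        timeoutMs: 300_000,
      },
    },
  },
};
```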
Provider selection order
OpenClaw tries providers in this order:
1. model parameter from the tool call (if the agent specifies one).
2. musicGenerationModel.primary from config.
3. musicGenerationModel.fallbacks in order.
4. Auto-detection using auth-backed provider defaults only:
   - current default provider first;
   - remaining registered music-generation providers in provider-id order.
Set agents.defaults.mediaGenerationAutoProviderFallback: false to use only
explicit model, primary, and fallbacks entries.
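The selection order above can be sketched as a function that builds the candidate list. This is an illustration under assumptions, not the actual resolver: SelectionInput and buildCandidateOrder are hypothetical names, auto-detection is assumed to be on unless the flag is explicitly false, and duplicate references are assumed to be skipped.

```typescript
interface SelectionInput {
  modelOverride?: string; // model parameter from the tool call
  primary?: string; // musicGenerationModel.primary
  fallbacks?: string[]; // musicGenerationModel.fallbacks
  autoProviderFallback?: boolean; // mediaGenerationAutoProviderFallback (assumed on by default)
  defaultProvider?: string; // current default provider id
  // provider id -> default model reference, auth-backed providers only
  authedProviderDefaults: Record<string, string>;
}

function buildCandidateOrder(input: SelectionInput): string[] {
  const ordered: string[] = [];
  // Append a reference once; skipping duplicates is an assumption.
  const push = (ref?: string): void => {
    if (ref && !ordered.includes(ref)) ordered.push(ref);
  };

  push(input.modelOverride);
  push(input.primary);
  for (const ref of input.fallbacks ?? []) push(ref);

  if (input.autoProviderFallback !== false) {
    const ids = Object.keys(input.authedProviderDefaults).sort();
    // Current default provider first, then the rest in provider-id order.
    if (input.defaultProvider !== undefined && ids.includes(input.defaultProvider)) {
      push(input.authedProviderDefaults[input.defaultProvider]);
    }
    for (const id of ids) push(input.authedProviderDefaults[id]);
  }
  return ordered;
}
```

With the flag set to false, only the explicit model, primary, and fallbacks entries remain in the list.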
Provider notes
ComfyUI
Workflow-driven and depends on the configured graph plus node mapping
for prompt/output fields. The bundled comfy plugin plugs into the
shared music_generate tool through the music-generation provider
registry.
fal
Uses fal model endpoints through the shared provider auth path. The
bundled provider defaults to fal-ai/minimax-music/v2.6 and also exposes
fal-ai/ace-step/prompt-to-audio and
fal-ai/stable-audio-25/text-to-audio for prompt-to-audio requests.
Google (Lyria 3)
Uses Lyria 3 batch generation. The current bundled flow supports
prompt, optional lyrics text, and optional reference images.
MiniMax
Uses the batch music_generation endpoint. Supports prompt, optional
lyrics, instrumental mode, duration steering, and mp3 output through
either minimax API-key auth or minimax-portal OAuth.
OpenRouter
Uses OpenRouter chat completions audio output with streaming enabled. The
bundled provider defaults to google/lyria-3-pro-preview and also exposes
openrouter/google/lyria-3-clip-preview.
Choosing the right path
- Shared provider-backed when you want model selection, provider failover, and the built-in async task/status flow.
- Plugin path (ComfyUI) when you need a custom workflow graph or a provider that is not part of the shared bundled music capability.
Provider capability modes
The shared music-generation contract supports explicit mode declarations:
- generate for prompt-only generation.
- edit when the request includes one or more reference images.
maxInputImages, supportsLyrics, and
supportsFormat are not enough to advertise edit support. Providers
should declare generate and edit explicitly so live tests, contract
tests, and the shared music_generate tool can validate mode support
deterministically.
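To make the explicit-mode idea concrete, here is a minimal sketch of mode inference and validation. The names MusicModeCapabilities, declaredCapabilities, inferMode, and validateRequest are hypothetical, and the real contract types may look different. The per-provider values mirror the capability matrix on this page, keyed by the provider ids used in the live-test list.

```typescript
type MusicMode = "generate" | "edit";

interface MusicModeCapabilities {
  generate: boolean;
  edit: boolean;
  maxInputImages: number; // edit limit; 0 when edit is not declared
}

// Values mirror the capability matrix on this page.
const declaredCapabilities: Record<string, MusicModeCapabilities> = {
  comfy: { generate: true, edit: true, maxInputImages: 1 },
  fal: { generate: true, edit: false, maxInputImages: 0 },
  google: { generate: true, edit: true, maxInputImages: 10 },
  minimax: { generate: true, edit: false, maxInputImages: 0 },
  openrouter: { generate: true, edit: true, maxInputImages: 1 },
};

// edit when the request includes one or more reference images, else generate.
function inferMode(referenceImageCount: number): MusicMode {
  return referenceImageCount > 0 ? "edit" : "generate";
}

// Returns an error message, or null when the request fits the declared modes.
function validateRequest(providerId: string, referenceImageCount: number): string | null {
  const caps = declaredCapabilities[providerId];
  if (!caps) return `unknown provider: ${providerId}`;
  const mode = inferMode(referenceImageCount);
  if (!caps[mode]) return `${providerId} does not declare ${mode} mode`;
  if (mode === "edit" && referenceImageCount > caps.maxInputImages) {
    return `${providerId} accepts at most ${caps.maxInputImages} reference image(s)`;
  }
  return null;
}
```

Because the mode flags are declared rather than inferred from maxInputImages, a provider with generate only rejects any request that carries reference images.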
Live tests
Opt-in live coverage for the shared bundled providers runs generate and
declared edit coverage when the provider enables edit mode. Coverage today:
- google: generate plus edit
- fal: generate only
- minimax: generate only
- openrouter: generate plus edit
- comfy: separate Comfy live coverage, not the shared provider sweep
Related
- Background tasks: task tracking for detached music_generate runs
- ComfyUI
- Configuration reference: musicGenerationModel config
- Google (Gemini)
- MiniMax
- Models: model configuration and failover
- Tools overview