Music generation
The music_generate tool creates music or audio through the shared
music-generation capability, backed by ComfyUI, fal, Google, MiniMax, and
OpenRouter.
For session-backed agent runs, music_generate starts as a background task,
tracks progress in the task ledger, then wakes the agent when the track is
ready so it can tell the user and attach the finished audio. The completion
agent follows the session's visible-reply contract: automatic final reply
when configured, or message(action="send") when the session requires the
message tool. If the requester session is inactive or its wake fails and
generated audio is still missing from the reply, OpenClaw sends an
idempotent direct fallback with just the missing audio.
Quick start
Shared provider-backed
Configure auth
Set an API key for at least one provider — for example
GEMINI_API_KEY or MINIMAX_API_KEY.
Pick a default model (optional)
    {
      agents: {
        defaults: {
          musicGenerationModel: {
            primary: "google/lyria-3-clip-preview",
          },
        },
      },
    }
Ask the agent
"Generate an upbeat synthpop track about a night drive through a neon city."
The agent calls music_generate automatically. No tool
allow-listing needed.
Without a session-backed agent run (direct/local contexts), the tool runs inline and returns the final media path in the same tool result.
ComfyUI workflow
Configure the workflow
Configure plugins.entries.comfy.config.music with a workflow
JSON and prompt/output nodes.
Cloud auth (optional)
For Comfy Cloud, set COMFY_API_KEY or COMFY_CLOUD_API_KEY.
Call the tool
    /tool music_generate prompt="Warm ambient synth loop with soft tape texture"
Example prompts:
    Generate a cinematic piano track with soft strings and no vocals.
    Generate an energetic chiptune loop about launching a rocket at sunrise.
Use action: "list" to inspect available providers/models, and
action: "status" to inspect the active session-backed music task:
    /tool music_generate action=list
    /tool music_generate action=status
Direct generation example:
    /tool music_generate prompt="Dreamy lo-fi hip hop with vinyl texture and gentle rain" instrumental=true
Supported providers
| Provider | Default model | Reference inputs | Supported controls | Auth |
|---|---|---|---|---|
| ComfyUI | workflow | Up to 1 image | Workflow-defined music or audio | COMFY_API_KEY, COMFY_CLOUD_API_KEY |
| fal | fal-ai/minimax-music/v2.6 | None | lyrics, instrumental, durationSeconds, format | FAL_KEY or FAL_API_KEY |
| Google | lyria-3-clip-preview | Up to 10 images | lyrics, instrumental, format | GEMINI_API_KEY, GOOGLE_API_KEY |
| MiniMax | music-2.6 | None | lyrics, instrumental, format (mp3 only) | MINIMAX_API_KEY or MiniMax OAuth |
| OpenRouter | google/lyria-3-pro-preview | Up to 1 image | lyrics, instrumental, durationSeconds, format | OPENROUTER_API_KEY |
MiniMax registers two provider ids sharing the same models: minimax for
API-key auth and minimax-portal for OAuth. Model refs follow the auth path
(minimax/music-2.6 vs minimax-portal/music-2.6); see
MiniMax.
fal also exposes fal-ai/ace-step/prompt-to-audio (wav, no lyrics, no
instrumental toggle) and fal-ai/stable-audio-25/text-to-audio (wav,
prompt-only) alongside its default MiniMax-backed model. Google's default
lyria-3-clip-preview outputs mp3 only; lyria-3-pro-preview also supports
wav. MiniMax also exposes music-2.6-free, music-cover, and
music-cover-free. OpenRouter also exposes google/lyria-3-clip-preview.
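The model refs above (minimax/music-2.6, minimax-portal/music-2.6, fal/fal-ai/minimax-music/v2.6) suggest that the provider id is the part before the first slash and that the remainder is the provider's own model name, which may itself contain slashes. The following is a minimal TypeScript sketch of that split under this assumption. parseModelRef and ModelRef are hypothetical names for illustration, not OpenClaw APIs.

```typescript
// Hypothetical helper: split a "provider/model" ref at the FIRST slash.
// Assumption: provider ids never contain "/", while model names may.
interface ModelRef {
  provider: string;
  model: string;
}

function parseModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  // Reject refs with no slash, an empty provider, or an empty model name.
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error('Expected "provider/model", got "' + ref + '"');
  }
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}
```

Under this reading, minimax/music-2.6 and minimax-portal/music-2.6 resolve to the same model name under two different provider ids, which matches the API-key versus OAuth split described above.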
Capability matrix
The explicit mode contract used by music_generate, contract tests, and the
shared live sweep:
| Provider | generate | edit | Edit limit | Shared live lanes |
|---|---|---|---|---|
| ComfyUI | ✓ | ✓ | 1 image | Not in the shared sweep; covered by extensions/comfy/comfy.live.test.ts |
| fal | ✓ | — | None | generate |
| Google | ✓ | ✓ | 10 images | generate, edit |
| MiniMax | ✓ | — | None | generate |
| OpenRouter | ✓ | ✓ | 1 image | generate, edit |
Tool parameters
- prompt (string, required): Music generation prompt. Required for action: "generate".
- action ("generate" | "status" | "list", default: "generate"): "status" returns the current session task; "list" inspects providers.
- model (string): Provider/model override (e.g. google/lyria-3-pro-preview, comfy/workflow).
- lyrics (string): Optional lyrics when the provider supports explicit lyric input.
- instrumental (boolean): Request instrumental-only output when the provider supports it.
- image (string): Single reference image path or URL.
- images (string[]): Multiple reference images (up to 10 on supporting providers).
- durationSeconds (number): Target duration in seconds when the provider supports duration hints.
- format ("mp3" | "wav"): Output format hint when the provider supports it.
- filename (string)

Provider request timeouts are operator configuration only. OpenClaw uses
agents.defaults.musicGenerationModel.timeoutMs when configured, raises
values below 120000ms to 120000ms, and otherwise defaults provider requests
to 300000ms.
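The timeout rule above amounts to a floor plus a default. A minimal sketch of that arithmetic follows. resolveProviderTimeoutMs and the two constant names are hypothetical; only the 120000ms floor and the 300000ms default come from the text.

```typescript
// Values taken from the paragraph above; the names are illustrative only.
const MIN_TIMEOUT_MS = 120000;
const DEFAULT_TIMEOUT_MS = 300000;

// configuredMs stands in for agents.defaults.musicGenerationModel.timeoutMs.
function resolveProviderTimeoutMs(configuredMs?: number): number {
  if (configuredMs === undefined) {
    return DEFAULT_TIMEOUT_MS; // nothing configured: use the default
  }
  return Math.max(configuredMs, MIN_TIMEOUT_MS); // raise too-small values to the floor
}
```

So a configured 30000 would be treated as 120000, a configured 600000 is kept as is, and an unset value yields 300000.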
Async behavior
Session-backed music generation runs as a background task:
- Background task: music_generate creates a background task, returns a started/task response immediately, and posts the finished track later in a follow-up agent message.
- Duplicate prevention: while a task is queued or running, later music_generate calls in the same session return task status instead of starting another generation. Use action: "status" to check explicitly. A recently completed matching request is also deduplicated for 2 minutes.
- Status lookup: openclaw tasks list or openclaw tasks show <taskId> inspects queued, running, and terminal status.
- Completion wake: OpenClaw injects an internal completion event back into the same session so the model can write the user-facing follow-up itself.
- Prompt hint: later user/manual turns in the same session get a small runtime hint when a music task is already in flight, so the model does not blindly call music_generate again.
- No-session fallback: direct/local contexts without a real agent session run inline and return the final audio result in the same turn.
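The duplicate-prevention rule can be pictured as a small decision function. This is only a sketch under stated assumptions, not OpenClaw's implementation: the SessionMusicTask shape, the requestKey fingerprint, and decideMusicCall are hypothetical; "recently completed" is read here as a succeeded task; and returning the earlier task's status for such a match is also an assumption, since the text only says the request is deduplicated.

```typescript
// Hypothetical task record; field names are illustrative only.
interface SessionMusicTask {
  taskId: string;
  requestKey: string; // assumed fingerprint of prompt + options
  state: "queued" | "running" | "succeeded" | "failed";
  finishedAtMs?: number;
}

type Decision = { kind: "start" } | { kind: "status"; taskId: string };

const DEDUPE_WINDOW_MS = 2 * 60 * 1000; // "deduplicated for 2 minutes"

function decideMusicCall(tasks: SessionMusicTask[], requestKey: string, nowMs: number): Decision {
  // Any queued or running task in the session blocks a new generation.
  for (const t of tasks) {
    if (t.state === "queued" || t.state === "running") {
      return { kind: "status", taskId: t.taskId };
    }
  }
  // A matching request that succeeded within the window is not re-run.
  for (const t of tasks) {
    const recent = t.finishedAtMs !== undefined && nowMs - t.finishedAtMs < DEDUPE_WINDOW_MS;
    if (t.state === "succeeded" && t.requestKey === requestKey && recent) {
      return { kind: "status", taskId: t.taskId };
    }
  }
  return { kind: "start" };
}
```

Note that the in-flight check ignores the request contents, matching the statement that any later call in the same session returns status, while the 2-minute check applies only to a matching request.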
Task lifecycle
The music task surfaces the same states as the general task registry (see
Background tasks for the full state
machine, including timed_out, cancelled, and lost). Most music runs
move through:
| State | Meaning |
|---|---|
| queued | Task created, waiting for the provider to accept it. |
| running | Provider is processing (typically 30 seconds to 3 minutes depending on provider and duration). |
| succeeded | Track ready; the agent wakes and posts it to the conversation. |
| failed | Provider error or timeout; the agent wakes with error details. |
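For code that consumes task status, the states named in this section can be modeled as a string union. The sketch below assumes that every state other than queued and running is terminal. MusicTaskState and isTerminalState are hypothetical names, and the authoritative state machine is the one described in Background tasks.

```typescript
// States named in this section; the type and helper names are illustrative.
type MusicTaskState =
  | "queued"
  | "running"
  | "succeeded"
  | "failed"
  | "timed_out"
  | "cancelled"
  | "lost";

// Assumption: only queued and running are non-terminal.
function isTerminalState(state: MusicTaskState): boolean {
  return state !== "queued" && state !== "running";
}

// Short meanings for the four states most music runs move through.
const commonStateMeaning: Record<"queued" | "running" | "succeeded" | "failed", string> = {
  queued: "waiting for the provider to accept the task",
  running: "provider is processing",
  succeeded: "track ready",
  failed: "provider error or timeout",
};
```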
Check status from the CLI:
    openclaw tasks list
    openclaw tasks show <taskId>
    openclaw tasks cancel <taskId>
Configuration
Model selection
    {
      agents: {
        defaults: {
          musicGenerationModel: {
            primary: "google/lyria-3-clip-preview",
            fallbacks: ["fal/fal-ai/minimax-music/v2.6", "minimax/music-2.6"],
          },
        },
      },
    }
Provider selection order
OpenClaw tries providers in this order:
1. model parameter from the tool call (if the agent specifies one).
2. musicGenerationModel.primary from config.
3. musicGenerationModel.fallbacks in order.
4. Auto-detection using auth-backed provider defaults only:
   - current default text-model provider first, if it also offers music generation;
   - remaining registered music-generation providers, alphabetically by provider id.
If a provider fails, the next candidate is tried automatically. If all fail, the error includes details from each attempt.
Set agents.defaults.mediaGenerationAutoProviderFallback: false to use only
explicit model, primary, and fallbacks entries.
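The selection order above can be sketched as a function that builds an ordered, de-duplicated candidate list. This is an illustration under assumptions rather than the actual resolver: SelectionInput and buildCandidates are hypothetical names, the auto-fallback flag is assumed to default to enabled, and authBackedDefaults stands in for whatever registry maps an authenticated provider id to its default model ref.

```typescript
// Hypothetical input shape; only the ordering rules come from the text.
interface SelectionInput {
  toolModel?: string; // model parameter from the tool call
  primary?: string; // musicGenerationModel.primary
  fallbacks?: string[]; // musicGenerationModel.fallbacks
  autoProviderFallback?: boolean; // mediaGenerationAutoProviderFallback (assumed default: true)
  defaultTextProvider?: string; // provider id of the current default text model
  authBackedDefaults: Record<string, string>; // provider id -> default model ref
}

function buildCandidates(input: SelectionInput): string[] {
  const ordered: string[] = [];
  const push = (ref?: string): void => {
    if (ref !== undefined && ordered.indexOf(ref) === -1) ordered.push(ref);
  };

  push(input.toolModel);
  push(input.primary);
  for (const ref of input.fallbacks ?? []) push(ref);

  if (input.autoProviderFallback !== false) {
    const ids = Object.keys(input.authBackedDefaults).sort();
    const textProvider = input.defaultTextProvider;
    // Default text-model provider first, if it also offers music generation.
    if (textProvider !== undefined && ids.indexOf(textProvider) !== -1) {
      push(input.authBackedDefaults[textProvider]);
    }
    // Then the remaining providers, alphabetically by provider id.
    for (const id of ids) push(input.authBackedDefaults[id]);
  }
  return ordered;
}
```

A caller would then try each candidate in turn and collect the per-attempt errors, so that a total failure can report details from every attempt.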
Provider notes
ComfyUI
Workflow-driven and depends on the configured graph plus node mapping
for prompt/output fields. The bundled comfy plugin plugs into the
shared music_generate tool through the music-generation provider
registry.
fal
Uses fal model endpoints through the shared provider auth path. The
bundled provider defaults to fal-ai/minimax-music/v2.6 and also exposes
fal-ai/ace-step/prompt-to-audio and
fal-ai/stable-audio-25/text-to-audio for prompt-to-audio requests.
Lyrics and instrumental mode are MiniMax-model-only; the other two
models are prompt-only.
Google (Lyria 3)
Uses Lyria 3 batch generation. The current bundled flow supports
prompt, optional lyrics text, and optional reference images. The
default lyria-3-clip-preview model outputs mp3 only; the
lyria-3-pro-preview model also supports wav.
MiniMax
Uses the batch music_generation endpoint. Supports prompt, optional
lyrics, instrumental mode, and mp3 output through either minimax
API-key auth or minimax-portal OAuth. Also exposes music-2.6-free,
music-cover, and music-cover-free models.
OpenRouter
Uses OpenRouter chat completions audio output with streaming enabled. The
bundled provider defaults to google/lyria-3-pro-preview and also exposes
openrouter/google/lyria-3-clip-preview.
Choosing the right path
- Shared provider-backed when you want model selection, provider failover, and the built-in async task/status flow.
- Plugin path (ComfyUI) when you need a custom workflow graph or a provider that is not part of the shared bundled music capability.
If you are debugging ComfyUI-specific behavior, see ComfyUI. If you are debugging shared provider behavior, start with fal, Google (Gemini), MiniMax, or OpenRouter.
Provider capability modes
The shared music-generation contract supports explicit mode declarations:
- generate for prompt-only generation.
- edit when the request includes one or more reference images.
New provider implementations should prefer explicit mode blocks:
    capabilities: {
      generate: {
        maxTracks: 1,
        supportsLyrics: true,
        supportsFormat: true,
      },
      edit: {
        enabled: true,
        maxTracks: 1,
        maxInputImages: 1,
        supportsFormat: true,
      },
    }
Legacy flat fields such as maxInputImages, supportsLyrics, and
supportsFormat are not enough to advertise edit support. Providers
should declare generate and edit explicitly so live tests, contract
tests, and the shared music_generate tool can validate mode support
deterministically.
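To illustrate why explicit mode blocks make validation deterministic, here is a small sketch that picks the mode from the request and checks it against a provider's declared capabilities. The field names mirror the capabilities example above. The interfaces and resolveMode are hypothetical, and the real contract may validate more than this.

```typescript
// Shapes mirror the capabilities example; the type names are illustrative.
interface GenerateCapability {
  maxTracks: number;
  supportsLyrics?: boolean;
  supportsFormat?: boolean;
}

interface EditCapability extends GenerateCapability {
  enabled: boolean;
  maxInputImages: number;
}

interface MusicCapabilities {
  generate: GenerateCapability;
  edit?: EditCapability;
}

type MusicMode = "generate" | "edit";

function resolveMode(caps: MusicCapabilities, referenceImages: string[]): MusicMode {
  // No reference images: prompt-only generation.
  if (referenceImages.length === 0) return "generate";
  // One or more reference images: the provider must declare edit explicitly.
  const edit = caps.edit;
  if (edit === undefined || !edit.enabled) {
    throw new Error("Provider does not declare edit mode");
  }
  if (referenceImages.length > edit.maxInputImages) {
    throw new Error("Too many reference images: limit is " + edit.maxInputImages);
  }
  return "edit";
}
```

With this shape, a provider that declares only generate rejects any request carrying a reference image, instead of relying on legacy flat fields to infer edit support.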
Live tests
Opt-in live coverage for the shared bundled providers (fal, Google, MiniMax, OpenRouter):
    OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts
Equivalent repo wrapper, which drives the same test file:
    pnpm test:live:media:music
This live file uses already-exported provider env vars ahead of stored auth
profiles by default, and runs both generate and declared edit coverage when
the provider enables edit mode. Coverage today:
- google: generate plus edit
- fal: generate only
- minimax: generate only
- openrouter: generate plus edit
- comfy: separate Comfy live coverage, not the shared provider sweep
Opt-in live coverage for the bundled ComfyUI music path:
    OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts
The Comfy live file also covers comfy image and video workflows when those sections are configured.
Related
- Background tasks: task tracking for detached music_generate runs
- ComfyUI
- Configuration reference: musicGenerationModel config
- Google (Gemini)
- MiniMax
- Models: model configuration and failover
- Tools overview