Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.openclaw.ai/llms.txt

Use this file to discover all available pages before exploring further.

The music_generate tool lets the agent create music or audio through the shared music-generation capability with configured providers — ComfyUI, fal, Google, MiniMax, and OpenRouter today. For session-backed agent runs, OpenClaw starts music generation as a background task, tracks it in the task ledger, then wakes the agent again when the track is ready so the agent can tell the user and attach the finished audio. Generated-media completions are delivered by the agent through the message tool; OpenClaw does not auto-post the file as a fallback if the completion agent writes only a private final reply. The completion wake explicitly warns the agent that normal final replies are private for this route.
The built-in shared tool only appears when at least one music-generation provider is available. If you do not see music_generate in your agent’s tools, configure agents.defaults.musicGenerationModel or set up a provider API key.

Quick start

1

Configure auth

Set an API key for at least one provider — for example GEMINI_API_KEY or MINIMAX_API_KEY.
2

Pick a default model (optional)

{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
      },
    },
  },
}
3

Ask the agent

“Generate an upbeat synthpop track about a night drive through a neon city.”The agent calls music_generate automatically. No tool allow-listing needed.
For direct synchronous contexts without a session-backed agent run, the built-in tool still falls back to inline generation and returns the final media path in the tool result.
Example prompts:
Generate a cinematic piano track with soft strings and no vocals.
Generate an energetic chiptune loop about launching a rocket at sunrise.

Supported providers

ProviderDefault modelReference inputsSupported controlsAuth
ComfyUIworkflowUp to 1 imageWorkflow-defined music or audioCOMFY_API_KEY, COMFY_CLOUD_API_KEY
falfal-ai/minimax-music/v2.6Nonelyrics, instrumental, durationSeconds, formatFAL_KEY or FAL_API_KEY
Googlelyria-3-clip-previewUp to 10 imageslyrics, instrumental, formatGEMINI_API_KEY, GOOGLE_API_KEY
MiniMaxmusic-2.6Nonelyrics, instrumental, durationSeconds, format=mp3MINIMAX_API_KEY or MiniMax OAuth
OpenRoutergoogle/lyria-3-pro-previewUp to 1 imagelyrics, instrumental, durationSeconds, formatOPENROUTER_API_KEY

Capability matrix

The explicit mode contract used by music_generate, contract tests, and the shared live sweep:
ProvidergenerateeditEdit limitShared live lanes
ComfyUI1 imageNot in the shared sweep; covered by extensions/comfy/comfy.live.test.ts
falNonegenerate
Google10 imagesgenerate, edit
MiniMaxNonegenerate
OpenRouter1 imagegenerate, edit
Use action: "list" to inspect available shared providers and models at runtime:
/tool music_generate action=list
Use action: "status" to inspect the active session-backed music task:
/tool music_generate action=status
Direct generation example:
/tool music_generate prompt="Dreamy lo-fi hip hop with vinyl texture and gentle rain" instrumental=true

Tool parameters

prompt
string
required
Music generation prompt. Required for action: "generate".
action
"generate" | "status" | "list"
default:"generate"
"status" returns the current session task; "list" inspects providers.
model
string
Provider/model override (e.g. google/lyria-3-pro-preview, comfy/workflow).
lyrics
string
Optional lyrics when the provider supports explicit lyric input.
instrumental
boolean
Request instrumental-only output when the provider supports it.
image
string
Single reference image path or URL.
images
string[]
Multiple reference images (up to 10 on supporting providers).
durationSeconds
number
Target duration in seconds when the provider supports duration hints.
format
"mp3" | "wav"
Output format hint when the provider supports it.
filename
string
Output filename hint.
Not all providers support all parameters. OpenClaw still validates hard limits such as input counts before submission. When a provider supports duration but uses a shorter maximum than the requested value, OpenClaw clamps to the closest supported duration. Truly unsupported optional hints are ignored with a warning when the selected provider or model cannot honor them. Tool results report applied settings; details.normalization captures any requested-to-applied mapping.
Provider request timeouts are operator configuration only. OpenClaw uses agents.defaults.musicGenerationModel.timeoutMs when configured, raises values below 120000ms to 120000ms, and otherwise defaults provider requests to 300000ms.

Async behavior

Session-backed music generation runs as a background task:
  • Background task: music_generate creates a background task, returns a started/task response immediately, and posts the finished track later in a follow-up agent message.
  • Duplicate prevention: while a task is queued or running, later music_generate calls in the same session return task status instead of starting another generation. Use action: "status" to check explicitly.
  • Status lookup: openclaw tasks list or openclaw tasks show <taskId> inspects queued, running, and terminal status.
  • Completion wake: OpenClaw injects an internal completion event back into the same session so the model can write the user-facing follow-up itself.
  • Prompt hint: later user/manual turns in the same session get a small runtime hint when a music task is already in flight, so the model does not blindly call music_generate again.
  • No-session fallback: direct/local contexts without a real agent session run inline and return the final audio result in the same turn.

Task lifecycle

StateMeaning
queuedTask created, waiting for the provider to accept it.
runningProvider is processing (typically 30 seconds to 3 minutes depending on provider and duration).
succeededTrack ready; the agent wakes and posts it to the conversation.
failedProvider error or timeout; the agent wakes with error details.
Check status from the CLI:
openclaw tasks list
openclaw tasks show <taskId>
openclaw tasks cancel <taskId>

Configuration

Model selection

{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
        fallbacks: ["fal/fal-ai/minimax-music/v2.6", "minimax/music-2.6"],
      },
    },
  },
}

Provider selection order

OpenClaw tries providers in this order:
  1. model parameter from the tool call (if the agent specifies one).
  2. musicGenerationModel.primary from config.
  3. musicGenerationModel.fallbacks in order.
  4. Auto-detection using auth-backed provider defaults only:
    • current default provider first;
    • remaining registered music-generation providers in provider-id order.
If a provider fails, the next candidate is tried automatically. If all fail, the error includes details from each attempt. Set agents.defaults.mediaGenerationAutoProviderFallback: false to use only explicit model, primary, and fallbacks entries.

Provider notes

Workflow-driven and depends on the configured graph plus node mapping for prompt/output fields. The bundled comfy plugin plugs into the shared music_generate tool through the music-generation provider registry.
Uses fal model endpoints through the shared provider auth path. The bundled provider defaults to fal-ai/minimax-music/v2.6 and also exposes fal-ai/ace-step/prompt-to-audio and fal-ai/stable-audio-25/text-to-audio for prompt-to-audio requests.
Uses Lyria 3 batch generation. The current bundled flow supports prompt, optional lyrics text, and optional reference images.
Uses the batch music_generation endpoint. Supports prompt, optional lyrics, instrumental mode, duration steering, and mp3 output through either minimax API-key auth or minimax-portal OAuth.
Uses OpenRouter chat completions audio output with streaming enabled. The bundled provider defaults to google/lyria-3-pro-preview and also exposes openrouter/google/lyria-3-clip-preview.

Choosing the right path

  • Shared provider-backed when you want model selection, provider failover, and the built-in async task/status flow.
  • Plugin path (ComfyUI) when you need a custom workflow graph or a provider that is not part of the shared bundled music capability.
If you are debugging ComfyUI-specific behavior, see ComfyUI. If you are debugging shared provider behavior, start with fal, Google (Gemini), MiniMax, or OpenRouter.

Provider capability modes

The shared music-generation contract supports explicit mode declarations:
  • generate for prompt-only generation.
  • edit when the request includes one or more reference images.
New provider implementations should prefer explicit mode blocks:
capabilities: {
  generate: {
    maxTracks: 1,
    supportsLyrics: true,
    supportsFormat: true,
  },
  edit: {
    enabled: true,
    maxTracks: 1,
    maxInputImages: 1,
    supportsFormat: true,
  },
}
Legacy flat fields such as maxInputImages, supportsLyrics, and supportsFormat are not enough to advertise edit support. Providers should declare generate and edit explicitly so live tests, contract tests, and the shared music_generate tool can validate mode support deterministically.

Live tests

Opt-in live coverage for the shared bundled providers:
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts
Repo wrapper:
pnpm test:live:media music
This live file uses already-exported provider env vars ahead of stored auth profiles by default, and runs both generate and declared edit coverage when the provider enables edit mode. Coverage today:
  • google: generate plus edit
  • fal: generate only
  • minimax: generate only
  • openrouter: generate plus edit
  • comfy: separate Comfy live coverage, not the shared provider sweep
Opt-in live coverage for the bundled ComfyUI music path:
OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts
The Comfy live file also covers comfy image and video workflows when those sections are configured.