Tools

Music generation

The music_generate tool creates music or audio through the shared music-generation capability, backed by ComfyUI, fal, Google, MiniMax, and OpenRouter.

For session-backed agent runs, music_generate starts as a background task, tracks progress in the task ledger, then wakes the agent when the track is ready so it can tell the user and attach the finished audio. The completion agent follows the session's visible-reply contract: automatic final reply when configured, or message(action="send") when the session requires the message tool. If the requester session is inactive or its wake fails and generated audio is still missing from the reply, OpenClaw sends an idempotent direct fallback with just the missing audio.

Quick start

Shared provider-backed

  • Configure auth

    Set an API key for at least one provider — for example GEMINI_API_KEY or MINIMAX_API_KEY.

  • Pick a default model (optional)

    json5
    {  agents: {    defaults: {      musicGenerationModel: {        primary: "google/lyria-3-clip-preview",      },    },  },}
  • Ask the agent

    "Generate an upbeat synthpop track about a night drive through a neon city."

    The agent calls music_generate automatically. No tool allow-listing needed.

  • Without a session-backed agent run (direct/local contexts), the tool runs inline and returns the final media path in the same tool result.

    ComfyUI workflow

  • Configure the workflow

    Configure plugins.entries.comfy.config.music with a workflow JSON and prompt/output nodes.

  • Cloud auth (optional)

    For Comfy Cloud, set COMFY_API_KEY or COMFY_CLOUD_API_KEY.

  • Call the tool

    text
    /tool music_generate prompt="Warm ambient synth loop with soft tape texture"
  • Example prompts:

    text
    Generate a cinematic piano track with soft strings and no vocals.
    text
    Generate an energetic chiptune loop about launching a rocket at sunrise.

    Use action: "list" to inspect available providers/models, and action: "status" to inspect the active session-backed music task:

    text
    /tool music_generate action=list/tool music_generate action=status

    Direct generation example:

    text
    /tool music_generate prompt="Dreamy lo-fi hip hop with vinyl texture and gentle rain" instrumental=true

    Supported providers

    Provider Default model Reference inputs Supported controls Auth
    ComfyUI workflow Up to 1 image Workflow-defined music or audio COMFY_API_KEY, COMFY_CLOUD_API_KEY
    fal fal-ai/minimax-music/v2.6 None lyrics, instrumental, durationSeconds, format FAL_KEY or FAL_API_KEY
    Google lyria-3-clip-preview Up to 10 images lyrics, instrumental, format GEMINI_API_KEY, GOOGLE_API_KEY
    MiniMax music-2.6 None lyrics, instrumental, format (mp3 only) MINIMAX_API_KEY or MiniMax OAuth
    OpenRouter google/lyria-3-pro-preview Up to 1 image lyrics, instrumental, durationSeconds, format OPENROUTER_API_KEY

    MiniMax registers two provider ids sharing the same models: minimax for API-key auth and minimax-portal for OAuth. Model refs follow the auth path (minimax/music-2.6 vs minimax-portal/music-2.6); see MiniMax.

    fal also exposes fal-ai/ace-step/prompt-to-audio (wav, no lyrics, no instrumental toggle) and fal-ai/stable-audio-25/text-to-audio (wav, prompt-only) alongside its default MiniMax-backed model. Google's default lyria-3-clip-preview outputs mp3 only; lyria-3-pro-preview also supports wav. MiniMax also exposes music-2.6-free, music-cover, and music-cover-free. OpenRouter also exposes google/lyria-3-clip-preview.

    Capability matrix

    The explicit mode contract used by music_generate, contract tests, and the shared live sweep:

    Provider generate edit Edit limit Shared live lanes
    ComfyUI 1 image Not in the shared sweep; covered by extensions/comfy/comfy.live.test.ts
    fal None generate
    Google 10 images generate, edit
    MiniMax None generate
    OpenRouter 1 image generate, edit

    Tool parameters

    promptstringrequired

    Music generation prompt. Required for action: "generate".

    action"generate" | "status" | "list"default: generate

    "status" returns the current session task; "list" inspects providers.

    modelstring

    Provider/model override (e.g. google/lyria-3-pro-preview, comfy/workflow).

    lyricsstring

    Optional lyrics when the provider supports explicit lyric input.

    instrumentalboolean

    Request instrumental-only output when the provider supports it.

    imagestring

    Single reference image path or URL.

    imagesstring[]

    Multiple reference images (up to 10 on supporting providers).

    durationSecondsnumber

    Target duration in seconds when the provider supports duration hints.

    format"mp3" | "wav"

    Output format hint when the provider supports it.

    filenamestring

    Provider request timeouts are operator configuration only. OpenClaw uses agents.defaults.musicGenerationModel.timeoutMs when configured, raises values below 120000ms to 120000ms, and otherwise defaults provider requests to 300000ms.

    Async behavior

    Session-backed music generation runs as a background task:

    • Background task: music_generate creates a background task, returns a started/task response immediately, and posts the finished track later in a follow-up agent message.
    • Duplicate prevention: while a task is queued or running, later music_generate calls in the same session return task status instead of starting another generation. Use action: "status" to check explicitly. A recently completed matching request is also deduplicated for 2 minutes.
    • Status lookup: openclaw tasks list or openclaw tasks show <taskId> inspects queued, running, and terminal status.
    • Completion wake: OpenClaw injects an internal completion event back into the same session so the model can write the user-facing follow-up itself.
    • Prompt hint: later user/manual turns in the same session get a small runtime hint when a music task is already in flight, so the model does not blindly call music_generate again.
    • No-session fallback: direct/local contexts without a real agent session run inline and return the final audio result in the same turn.

    Task lifecycle

    The music task surfaces the same states as the general task registry (see Background tasks for the full state machine, including timed_out, cancelled, and lost). Most music runs move through:

    State Meaning
    queued Task created, waiting for the provider to accept it.
    running Provider is processing (typically 30 seconds to 3 minutes depending on provider and duration).
    succeeded Track ready; the agent wakes and posts it to the conversation.
    failed Provider error or timeout; the agent wakes with error details.

    Check status from the CLI:

    bash
    openclaw tasks listopenclaw tasks show <taskId>openclaw tasks cancel <taskId>

    Configuration

    Model selection

    json5
    {  agents: {    defaults: {      musicGenerationModel: {        primary: "google/lyria-3-clip-preview",        fallbacks: ["fal/fal-ai/minimax-music/v2.6", "minimax/music-2.6"],      },    },  },}

    Provider selection order

    OpenClaw tries providers in this order:

    1. model parameter from the tool call (if the agent specifies one).
    2. musicGenerationModel.primary from config.
    3. musicGenerationModel.fallbacks in order.
    4. Auto-detection using auth-backed provider defaults only:
      • current default text-model provider first, if it also offers music generation;
      • remaining registered music-generation providers, alphabetically by provider id.

    If a provider fails, the next candidate is tried automatically. If all fail, the error includes details from each attempt.

    Set agents.defaults.mediaGenerationAutoProviderFallback: false to use only explicit model, primary, and fallbacks entries.

    Provider notes

    ComfyUI

    Workflow-driven and depends on the configured graph plus node mapping for prompt/output fields. The bundled comfy plugin plugs into the shared music_generate tool through the music-generation provider registry.

    fal

    Uses fal model endpoints through the shared provider auth path. The bundled provider defaults to fal-ai/minimax-music/v2.6 and also exposes fal-ai/ace-step/prompt-to-audio and fal-ai/stable-audio-25/text-to-audio for prompt-to-audio requests. Lyrics and instrumental mode are MiniMax-model-only; the other two models are prompt-only.

    Google (Lyria 3)

    Uses Lyria 3 batch generation. The current bundled flow supports prompt, optional lyrics text, and optional reference images. The default lyria-3-clip-preview model outputs mp3 only; the lyria-3-pro-preview model also supports wav.

    MiniMax

    Uses the batch music_generation endpoint. Supports prompt, optional lyrics, instrumental mode, and mp3 output through either minimax API-key auth or minimax-portal OAuth. Also exposes music-2.6-free, music-cover, and music-cover-free models.

    OpenRouter

    Uses OpenRouter chat completions audio output with streaming enabled. The bundled provider defaults to google/lyria-3-pro-preview and also exposes openrouter/google/lyria-3-clip-preview.

    Choosing the right path

    • Shared provider-backed when you want model selection, provider failover, and the built-in async task/status flow.
    • Plugin path (ComfyUI) when you need a custom workflow graph or a provider that is not part of the shared bundled music capability.

    If you are debugging ComfyUI-specific behavior, see ComfyUI. If you are debugging shared provider behavior, start with fal, Google (Gemini), MiniMax, or OpenRouter.

    Provider capability modes

    The shared music-generation contract supports explicit mode declarations:

    • generate for prompt-only generation.
    • edit when the request includes one or more reference images.

    New provider implementations should prefer explicit mode blocks:

    typescript
    capabilities: {  generate: {    maxTracks: 1,    supportsLyrics: true,    supportsFormat: true,  },  edit: {    enabled: true,    maxTracks: 1,    maxInputImages: 1,    supportsFormat: true,  },}

    Legacy flat fields such as maxInputImages, supportsLyrics, and supportsFormat are not enough to advertise edit support. Providers should declare generate and edit explicitly so live tests, contract tests, and the shared music_generate tool can validate mode support deterministically.

    Live tests

    Opt-in live coverage for the shared bundled providers (fal, Google, MiniMax, OpenRouter):

    bash
    OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts

    Equivalent repo wrapper, which drives the same test file:

    bash
    pnpm test:live:media:music

    This live file uses already-exported provider env vars ahead of stored auth profiles by default, and runs both generate and declared edit coverage when the provider enables edit mode. Coverage today:

    • google: generate plus edit
    • fal: generate only
    • minimax: generate only
    • openrouter: generate plus edit
    • comfy: separate Comfy live coverage, not the shared provider sweep

    Opt-in live coverage for the bundled ComfyUI music path:

    bash
    OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts

    The Comfy live file also covers comfy image and video workflows when those sections are configured.

    Was this useful?
    On this page

    On this page