Audio

Overview

The Audio node executes a lens-backed AI call that produces an audio artifact — such as a generated voiceover, music clip, or sound effect — using an audio-capable provider (ElevenLabs, Suno, OpenAI TTS). Setting nodeType: 'audio' on the node config causes the execution engine to route the call through the provider pipeline, enforce the modality guard (blocking execution if the governing agent policy forbids audio output), and emit a NodeOutputEnvelope with artifactKind: 'audio' and a media.url pointing to the stored asset. Use this node when a workflow step must produce audio from a lens prompt, rather than using purpose-built utility nodes such as text_to_speech or image_to_audio, which target specific fixed transformations.

Configuration

Field	Type	Required	Description
`modelId`	string	Yes	AI model key for the audio-capable provider, e.g. `elevenlabs:eleven_multilingual_v2` or `openai:gpt-4o-mini-tts`. Must be registered in the model registry and resolve to a provider with an audio adapter.
`nodeType`	enum	Yes	Must be set to `audio`. Signals the execution engine to apply the audio modality guard and write `artifactKind: 'audio'` on the output envelope.
`funding_source`	enum	No	Funding mode for execution. One of `platform_credit`, `user_byok_cloud`, or `user_byok_local`. Defaults to `platform_credit`. Audio generation via fal-ai adapters requires `platform_credit` or `user_byok_local`.
`param_overrides`	object	No	Static key-value overrides applied to the lens template params after edge resolution — for example `{ voice_id: '21m00Tcm4TlvDq8ikWAM', speed: '1.0' }`.
`retry`	object	No	Retry policy override for this node. Shape: `{ attempts: number, backoffMs: number, retryOn: string[] }`. Defaults to 2 attempts with 1 s back-off, retrying on `timeout`, `provider_error`, and `rate_limit`.
`timeoutMs`	number	No	Per-call timeout in milliseconds before the node is cancelled and retried (if configured). Audio generation is typically slower than text; set at least 60000 ms for music-generation models.
`moderation`	enum	No	Moderation policy applied to the rendered input prompt before the provider call. One of `none`, `basic`, or `strict`. Defaults to the run-level moderation setting.

Inputs

Port	Type	Description
`input`	text	Rendered prompt text wired from an upstream node or root inputs. Passed verbatim to the audio provider after lens template resolution and moderation checks.
`attachments`	any	Optional multimodal context inferred from upstream edge outputs (e.g. an image URL resolved by `inferAttachmentsFromRendered`). Passed as `ExecutionInput.attachments` to the provider when present.

Outputs

Port	Type	Description
`output`	audio	Primary output envelope with `artifactKind: 'audio'`. The `media.url` field holds the URL of the stored audio asset; `media.mime` carries the MIME type (e.g. `audio/mpeg`). The `output` string field contains the URL as a plain string for downstream `[[label]]` template interpolation.
`error`	error	Emitted when all retry attempts are exhausted or a non-retriable provider error occurs. Contains `message`, `nodeId`, and the last error cause (`timeout`, `rate_limit`, or `provider_error`). Route to an `error_catch` node to handle gracefully.

Example

json

{
  "nodeType": "audio",
  "config": {
    "nodeType": "audio",
    "modelId": "elevenlabs:eleven_multilingual_v2",
    "funding_source": "platform_credit",
    "param_overrides": {
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "speed": "1.0",
      "format": "mp3"
    },
    "timeoutMs": 60000,
    "retry": {
      "attempts": 2,
      "backoffMs": 3000,
      "retryOn": [
        "timeout",
        "rate_limit"
      ]
    }
  }
}

Audio ​

Overview ​

Configuration ​

Inputs ​

Outputs ​

Example ​