Skip to content

Audio

Overview

The Audio node executes a lens-backed AI call that produces an audio artifact — such as a generated voiceover, music clip, or sound effect — using an audio-capable provider (ElevenLabs, Suno, OpenAI TTS). Setting nodeType: 'audio' on the node config causes the execution engine to route the call through the provider pipeline, enforce the modality guard (blocking execution if the governing agent policy forbids audio output), and emit a NodeOutputEnvelope with artifactKind: 'audio' and a media.url pointing to the stored asset. Use this node when a workflow step must produce audio from a lens prompt, rather than using purpose-built utility nodes such as text_to_speech or image_to_audio, which target specific fixed transformations.

Configuration

FieldTypeRequiredDescription
modelIdstringYesAI model key for the audio-capable provider, e.g. elevenlabs:eleven_multilingual_v2 or openai:gpt-4o-mini-tts. Must be registered in the model registry and resolve to a provider with an audio adapter.
nodeTypeenumYesMust be set to audio. Signals the execution engine to apply the audio modality guard and write artifactKind: 'audio' on the output envelope.
funding_sourceenumNoFunding mode for execution. One of platform_credit, user_byok_cloud, or user_byok_local. Defaults to platform_credit. Audio generation via fal-ai adapters requires platform_credit or user_byok_local.
param_overridesobjectNoStatic key-value overrides applied to the lens template params after edge resolution — for example { voice_id: '21m00Tcm4TlvDq8ikWAM', speed: '1.0' }.
retryobjectNoRetry policy override for this node. Shape: { attempts: number, backoffMs: number, retryOn: string[] }. Defaults to 2 attempts with 1 s back-off, retrying on timeout, provider_error, and rate_limit.
timeoutMsnumberNoPer-call timeout in milliseconds before the node is cancelled and retried (if configured). Audio generation is typically slower than text; set at least 60000 ms for music-generation models.
moderationenumNoModeration policy applied to the rendered input prompt before the provider call. One of none, basic, or strict. Defaults to the run-level moderation setting.

Inputs

PortTypeDescription
inputtextRendered prompt text wired from an upstream node or root inputs. Passed verbatim to the audio provider after lens template resolution and moderation checks.
attachmentsanyOptional multimodal context inferred from upstream edge outputs (e.g. an image URL resolved by inferAttachmentsFromRendered). Passed as ExecutionInput.attachments to the provider when present.

Outputs

PortTypeDescription
outputaudioPrimary output envelope with artifactKind: 'audio'. The media.url field holds the URL of the stored audio asset; media.mime carries the MIME type (e.g. audio/mpeg). The output string field contains the URL as a plain string for downstream [[label]] template interpolation.
errorerrorEmitted when all retry attempts are exhausted or a non-retriable provider error occurs. Contains message, nodeId, and the last error cause (timeout, rate_limit, or provider_error). Route to an error_catch node to handle gracefully.

Example

json
{
  "nodeType": "audio",
  "config": {
    "nodeType": "audio",
    "modelId": "elevenlabs:eleven_multilingual_v2",
    "funding_source": "platform_credit",
    "param_overrides": {
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "speed": "1.0",
      "format": "mp3"
    },
    "timeoutMs": 60000,
    "retry": {
      "attempts": 2,
      "backoffMs": 3000,
      "retryOn": [
        "timeout",
        "rate_limit"
      ]
    }
  }
}