Text To Speech

Overview

The Text To Speech node converts a text string into a spoken audio file using a configured voice and speech provider. It is used in workflows where battle results, narration, or AI-generated responses need to be delivered as audio output. The node requires a provider credential and emits a URL or binary audio payload on success; if synthesis fails, execution routes through the error port. Audio format and voice characteristics are configurable per invocation.

Configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| provider | enum | Yes | Speech synthesis provider to use. Supported values: openai, elevenlabs, google. |
| voice | string | Yes | Voice identifier as defined by the selected provider (e.g. 'alloy', 'nova', 'en-US-Neural2-F'). |
| model | string | No | Provider model or engine variant to use for synthesis (e.g. 'tts-1-hd'). Defaults to the provider's standard model. |
| outputFormat | enum | No | Audio output format. Supported values: mp3, wav, ogg. Defaults to mp3. |
| speed | number | No | Speaking rate multiplier. Accepts values from 0.25 to 4.0. Defaults to 1.0. |
| credentialId | string | Yes | ID of the stored credential used to authenticate with the selected provider. |
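
The constraints listed for these fields can be checked before a workflow is deployed. The following is a minimal sketch in Python, not the node's actual validation logic: `TextToSpeechConfig` and `validate_config` are hypothetical names introduced for illustration, and only the field names, allowed values, speed range, and defaults come from the field descriptions.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values as listed in the configuration fields.
PROVIDERS = {"openai", "elevenlabs", "google"}
OUTPUT_FORMATS = {"mp3", "wav", "ogg"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


@dataclass
class TextToSpeechConfig:
    """Hypothetical container for the node's config fields."""
    provider: str
    voice: str
    credentialId: str
    model: Optional[str] = None  # None: fall back to the provider's standard model
    outputFormat: str = "mp3"
    speed: float = 1.0


def validate_config(raw: dict) -> TextToSpeechConfig:
    """Check a raw config dict against the documented constraints."""
    # Required fields must be present and non-empty.
    for name in ("provider", "voice", "credentialId"):
        if not raw.get(name):
            raise ValueError(f"missing required field: {name}")

    # Optional fields fall back to the documented defaults.
    cfg = TextToSpeechConfig(
        provider=raw["provider"],
        voice=raw["voice"],
        credentialId=raw["credentialId"],
        model=raw.get("model"),
        outputFormat=raw.get("outputFormat", "mp3"),
        speed=float(raw.get("speed", 1.0)),
    )
    if cfg.provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {cfg.provider}")
    if cfg.outputFormat not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported outputFormat: {cfg.outputFormat}")
    if not MIN_SPEED <= cfg.speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return cfg
```

The voice identifier is deliberately not validated here, since the set of valid voices depends on the selected provider.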

Inputs

| Port | Type | Description |
| --- | --- | --- |
| text | string | The text content to synthesize into speech. Accepts plain text or SSML markup if supported by the selected provider. |
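
When the selected provider does accept SSML, plain text has to be XML-escaped before it is wrapped in a `<speak>` element. The sketch below shows one way an upstream step might prepare the `text` input; `to_ssml` is a hypothetical helper, and whether a given provider accepts SSML should be confirmed in that provider's documentation.

```python
from xml.sax.saxutils import escape


def to_ssml(text: str) -> str:
    """Wrap plain text in a minimal SSML document, escaping XML special characters."""
    stripped = text.strip()
    if stripped.startswith("<speak"):
        # Input already looks like SSML; pass it through unchanged.
        return stripped
    # escape() replaces &, <, and > so the text is safe inside an XML element.
    return f"<speak>{escape(stripped)}</speak>"
```

For example, `to_ssml("Tom & Jerry")` returns `<speak>Tom &amp; Jerry</speak>`.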

Outputs

| Port | Type | Description |
| --- | --- | --- |
| output | object | Emitted on successful synthesis. Contains audioUrl (string) pointing to the generated audio file and mimeType (string) describing the format. |
| error | object | Emitted when synthesis fails. Contains message (string) and code (string) describing the failure reason. |
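
A downstream step typically branches on which port fired. The following sketch assumes the workflow hands a consumer the port name together with its payload; that calling convention and the `describe_result` function are assumptions made for illustration, while the payload keys are the ones documented for each port.

```python
def describe_result(port: str, payload: dict) -> str:
    """Summarize a Text To Speech result based on the port that emitted it."""
    if port == "output":
        # Success payload: audioUrl and mimeType.
        return f"audio ready: {payload['audioUrl']} ({payload['mimeType']})"
    if port == "error":
        # Failure payload: code and message.
        return f"synthesis failed [{payload['code']}]: {payload['message']}"
    raise ValueError(f"unknown port: {port}")


# Sample payloads; the URL, MIME type, and error code are made-up values.
print(describe_result("output", {"audioUrl": "https://example.com/a.mp3", "mimeType": "audio/mpeg"}))
print(describe_result("error", {"code": "EXAMPLE_CODE", "message": "example failure"}))
```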

Example

```json
{
  "nodeType": "text_to_speech",
  "config": {
    "provider": "openai",
    "voice": "nova",
    "model": "tts-1-hd",
    "outputFormat": "mp3",
    "speed": 1,
    "credentialId": "cred_openai_prod"
  }
}
```