Text To Speech

Overview

The Text To Speech node converts a text string into a spoken audio file using a configured voice and speech provider. It is used in workflows where battle results, narration, or AI-generated responses need to be delivered as audio output. The node requires a provider credential and emits a URL or binary audio payload on success; if synthesis fails, execution routes through the error port. Audio format and voice characteristics are configurable per invocation.

Configuration

Field	Type	Required	Description
`provider`	enum	Yes	Speech synthesis provider to use. Supported values: openai, elevenlabs, google.
`voice`	string	Yes	Voice identifier as defined by the selected provider (e.g. 'alloy', 'nova', 'en-US-Neural2-F').
`model`	string	No	Provider model or engine variant to use for synthesis (e.g. 'tts-1-hd'). Defaults to the provider's standard model.
`outputFormat`	enum	No	Audio output format. Supported values: mp3, wav, ogg. Defaults to mp3.
`speed`	number	No	Speaking rate multiplier. Accepts values from 0.25 to 4.0. Defaults to 1.0.
`credentialId`	string	Yes	ID of the stored credential used to authenticate with the selected provider.

Inputs

Port	Type	Description
`text`	string	The text content to synthesize into speech. Accepts plain text or SSML markup if supported by the selected provider.

Outputs

Port	Type	Description
`output`	object	Emitted on successful synthesis. Contains audioUrl (string) pointing to the generated audio file and mimeType (string) describing the format.
`error`	object	Emitted when synthesis fails. Contains message (string) and code (string) describing the failure reason.

Example

json

{
  "nodeType": "text_to_speech",
  "config": {
    "provider": "openai",
    "voice": "nova",
    "model": "tts-1-hd",
    "outputFormat": "mp3",
    "speed": 1,
    "credentialId": "cred_openai_prod"
  }
}

Text To Speech ​

Overview ​

Configuration ​

Inputs ​

Outputs ​

Example ​