Generative Media Capability Matrix

All 8 media node types and their current implementation status across both execution surfaces.


Node Type Matrix

| Node type | Input source | Output | Sync/Async | Worker path | DAG runner | Notes |
|---|---|---|---|---|---|---|
| text_to_image | resolvedPrompt | image URL | Sync | Via modelKind='image' branch | TextToImageRunner | DALL-E 3, Stable Diffusion, FAL |
| text_to_speech | resolvedPrompt | audio URL | Sync | Via modelKind='audio' branch | TextToSpeechRunner | ElevenLabs, OpenAI TTS |
| text_to_video | resolvedPrompt | video URL or taskId | Async | Via modelKind='video' branch | TextToVideoRunner | Kling, FAL; polled via cron |
| image_to_image | Upstream image URL | image URL | Sync | Not yet on worker path | ImageToImageRunner | Requires upstream node with mediaType: 'image' |
| speech_to_text | Upstream audio URL | text transcript | Sync | Not yet on worker path | SpeechToTextRunner | Requires upstream node with mediaType: 'audio' |
| image_to_audio | Upstream image URL | audio URL | Sync | Not yet on worker path | ImageToAudioRunner | Multimodal: image → ambient audio |
| image_upscale | Upstream image URL | image URL | Sync | Not yet on worker path | ImageUpscaleRunner | Super-resolution; scale param |
| media_convert | Upstream media URL | any | | Not yet | throws NotImplementedError | No conversion service wired |
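
The worker-path column can be read as a small lookup, sketched below. This is an illustration under assumptions rather than code from the repository: the `MediaNodeType` union, `WORKER_MODEL_KIND` table, and both helper functions are hypothetical names; only the node type strings and the `'image'`, `'audio'`, and `'video'` kinds come from the matrix.

```typescript
// The eight node types listed in the matrix.
type MediaNodeType =
  | 'text_to_image' | 'text_to_speech' | 'text_to_video'
  | 'image_to_image' | 'speech_to_text' | 'image_to_audio'
  | 'image_upscale' | 'media_convert'

type WorkerKind = 'image' | 'audio' | 'video'

// Hypothetical table: only the three prompt-driven node types
// currently have a worker-path branch.
const WORKER_MODEL_KIND: Partial<Record<MediaNodeType, WorkerKind>> = {
  text_to_image: 'image',
  text_to_speech: 'audio',
  text_to_video: 'video',
}

// Returns the worker branch for a node type, or undefined when the
// node type is only reachable through the DAG runner (or not at all).
function workerModelKind(nodeType: MediaNodeType): WorkerKind | undefined {
  return WORKER_MODEL_KIND[nodeType]
}

// Per the matrix, text_to_video is the only node type marked Async.
function isAsyncNode(nodeType: MediaNodeType): boolean {
  return nodeType === 'text_to_video'
}
```

A caller could use `workerModelKind(nodeType) === undefined` to decide that a node must run on the DAG surface.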

Provider Support by Modality

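| Provider | text→text | text→image | text→audio | text→video | text→music |
|---|---|---|---|---|---|
| OpenAI | | ✅ DALL-E 3 | ✅ TTS | | |
| Anthropic | | | | | |
| Google | | ✅ Imagen | | | |
| Mistral | | | | | |
| Ollama | ✅ (local) | | | | |
| ElevenLabs | | | | | |
| FAL | | | | | |
| Kling | | | | | |
| Suno | | | | | |
| Stability AI | | | | | |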

Async Media Flow

For text_to_video and text_to_music the provider returns immediately with a task ID:

Worker / DAG runner
  └─ callGenerativeMedia(provider, modality, key, model, prompt)
       └─ returns { status: 'pending', providerTaskId: 'xxx' }
            └─ stored as artifact_id in submissions (is_final: false)

poll-async-executions edge function (pg_cron every 30s)
  └─ reads submissions WHERE is_final = false
       └─ polls provider status
            └─ on 'completed': updates media_url, is_final = true, mime_type
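
One polling pass can be sketched as follows. This is a minimal illustration, not the edge function's actual code: the `Submission` and `ProviderStatus` shapes and the `applyStatus` and `pollPending` helpers are hypothetical, and the status fetcher is injected so the sketch stays provider-agnostic. Only the column names `artifact_id`, `is_final`, `media_url`, and `mime_type` come from the flow above.

```typescript
// Hypothetical row shape; only the column names come from the flow above.
interface Submission {
  id: string
  artifact_id: string // holds the provider task ID while the job is pending
  is_final: boolean
  media_url: string | null
  mime_type: string | null
}

// Assumed shape of a provider status response.
type ProviderStatus =
  | { status: 'pending' }
  | { status: 'completed'; mediaUrl: string; mimeType: string }

// Pure transition: finalize the row only when the provider reports 'completed'.
function applyStatus(row: Submission, result: ProviderStatus): Submission {
  if (result.status !== 'completed') return row
  return { ...row, media_url: result.mediaUrl, mime_type: result.mimeType, is_final: true }
}

// One pass over the rows: already-final rows are passed through untouched,
// pending rows are checked against the provider via the injected fetcher.
async function pollPending(
  rows: Submission[],
  fetchStatus: (taskId: string) => Promise<ProviderStatus>,
): Promise<Submission[]> {
  const updated: Submission[] = []
  for (const row of rows) {
    if (row.is_final) {
      updated.push(row)
      continue
    }
    updated.push(applyStatus(row, await fetchStatus(row.artifact_id)))
  }
  return updated
}
```

Keeping `applyStatus` pure makes the state transition easy to test without a database or a live provider.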

Upstream-Input Runners

ImageToImageRunner, SpeechToTextRunner, ImageToAudioRunner, and ImageUpscaleRunner extract their input from ctx.upstreamOutputs:

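```typescript
// Use the first upstream output that carries an image URL.
let imageUrl: string | undefined
for (const output of ctx.upstreamOutputs.values()) {
  if (output.mediaType === 'image' && output.url) {
    imageUrl = output.url
    break
  }
}
```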

If no matching upstream output is found, the runner throws instead of falling back gracefully, so the failure surfaces as a clear diagnostic rather than a silent error.
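
The lookup and the throw-on-missing behaviour can be combined in one helper, sketched below. The `UpstreamOutput` and `RunnerContext` shapes and the `requireUpstreamUrl` name are assumptions made for this example; the real context type likely carries more fields.

```typescript
// Hypothetical minimal shapes for illustration only.
interface UpstreamOutput {
  mediaType: 'image' | 'audio' | 'video' | 'text'
  url?: string
}

interface RunnerContext {
  upstreamOutputs: Map<string, UpstreamOutput>
}

// Returns the first upstream URL of the requested media type.
// Throws with the runner name when nothing matches, so a miswired
// graph fails loudly instead of producing an empty result.
function requireUpstreamUrl(
  ctx: RunnerContext,
  mediaType: UpstreamOutput['mediaType'],
  runnerName: string,
): string {
  for (const output of ctx.upstreamOutputs.values()) {
    if (output.mediaType === mediaType && output.url) return output.url
  }
  throw new Error(`${runnerName}: no upstream output with mediaType '${mediaType}' and a url`)
}
```

Under these assumptions, a runner such as SpeechToTextRunner would call `requireUpstreamUrl(ctx, 'audio', 'SpeechToTextRunner')` before invoking the provider.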


Adding a New Media Provider

  1. Add the model to libs/providers/src/lib/model-registry.ts with the correct outputModality
  2. Implement the provider call in callGenerativeMedia() in libs/providers/src/lib/generative-media.ts
  3. The worker path picks it up automatically via modelKind(job.model_key)
  4. The DAG runner path works via ctx.executeProvider (engine closes over the provider)
  5. For async providers: ensure poll-async-executions edge function handles the new provider's polling API
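
Steps 1 and 3 can be pictured with the sketch below. It is not the contents of model-registry.ts: the `ModelEntry` shape, the `MODEL_REGISTRY` table, the `example-provider/video-v1` key, and the `lookupModelKind` function are all hypothetical stand-ins, assuming that the real `modelKind` resolves a model key to its registered output modality.

```typescript
// Assumed set of modalities, taken from the provider matrix columns.
type OutputModality = 'text' | 'image' | 'audio' | 'video' | 'music'

// Hypothetical registry entry shape.
interface ModelEntry {
  provider: string
  outputModality: OutputModality
  isAsync?: boolean // true when the provider returns a task ID to poll
}

// Hypothetical registry with one made-up async video model.
const MODEL_REGISTRY: Record<string, ModelEntry> = {
  'example-provider/video-v1': {
    provider: 'example-provider',
    outputModality: 'video',
    isAsync: true,
  },
}

// Stand-in for modelKind(job.model_key): once the entry is registered,
// the lookup routes the job to the matching branch with no further wiring.
function lookupModelKind(modelKey: string): OutputModality {
  const entry = MODEL_REGISTRY[modelKey]
  if (!entry) throw new Error(`Unknown model key: ${modelKey}`)
  return entry.outputModality
}
```

In this sketch, registering the entry is the only change needed for routing; the provider call itself and any polling support still have to be implemented as described in steps 2 and 5.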