T2V · I2V · R2V Explained

Video models support several generation modes: text-to-video (T2V), image-to-video (I2V), and reference-to-video (R2V). Picking the right one is your biggest lever on quality and control. This page covers what each mode does and when to use it.

Text-to-Video (T2V)

You describe a scene in words; the model invents the rest: subject, environment, motion, camera, and (on most 2026 models) audio. No input image anchors the result, so you trade control for creative freedom.

Pro move: If you need a specific look, generate a still image first with text-to-image (T2I), then feed it into I2V. You get the imagination of text plus the lock-in of an image.
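The two-stage idea can be sketched as a small pipeline. This is only an illustration: `two_stage_video`, `fake_t2i`, and `fake_i2v` are hypothetical names, and the stand-in functions return placeholder bytes rather than calling any real image or video service. In practice you would pass in whatever T2I and I2V calls your tool provides.

```python
from typing import Callable


def two_stage_video(
    generate_image: Callable[[str], bytes],
    animate_image: Callable[[bytes, str], bytes],
    look_prompt: str,
    motion_prompt: str,
) -> bytes:
    """Run T2I first, then hand the resulting still to I2V."""
    still = generate_image(look_prompt)         # stage 1: lock in the look
    return animate_image(still, motion_prompt)  # stage 2: prompt drives motion only


# Hypothetical stand-ins so the sketch runs without any real service.
def fake_t2i(prompt: str) -> bytes:
    return f"IMAGE[{prompt}]".encode()


def fake_i2v(image: bytes, prompt: str) -> bytes:
    return b"VIDEO[" + image + b" | " + prompt.encode() + b"]"


clip = two_stage_video(
    fake_t2i,
    fake_i2v,
    look_prompt="film-noir detective, 35mm still",
    motion_prompt="slow push-in, smoke drifting",
)
print(clip.decode())
```

Keeping the look prompt and the motion prompt separate mirrors the split described above: the image fixes appearance, and the second prompt only has to describe movement.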

Image-to-Video (I2V)

You supply a starting image (and sometimes an ending frame); the model animates it. Appearance, composition, lighting, and identity are largely "locked" by the image, so your prompt mostly controls motion and camera.

First/last frame: Kling 3, Seedance 2.0, and Wan 2.7 support setting both a start and end frame. This is the most reliable way to choreograph a precise transformation (pose A → pose B).
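As a rough sketch of how a first/last-frame job might be represented, the snippet below models an I2V request with an optional end frame. The class, field names, and payload keys are all hypothetical and do not correspond to any real API; the only thing taken from this page is the list of models that accept an end frame.

```python
from dataclasses import dataclass
from typing import Optional

# Models this page lists as supporting both a start and an end frame.
FIRST_LAST_FRAME_MODELS = {"Kling 3", "Seedance 2.0", "Wan 2.7"}


@dataclass
class I2VRequest:
    """Hypothetical I2V request; field names are illustrative only."""
    model: str
    prompt: str                      # mostly motion and camera direction
    start_frame: str                 # path to the starting image
    end_frame: Optional[str] = None  # optional path to the final image

    def to_payload(self) -> dict:
        """Validate the request and return it as a plain dict."""
        if self.end_frame is not None and self.model not in FIRST_LAST_FRAME_MODELS:
            raise ValueError(f"{self.model} is not listed as supporting an end frame")
        payload = {
            "model": self.model,
            "prompt": self.prompt,
            "start_frame": self.start_frame,
        }
        if self.end_frame is not None:
            payload["end_frame"] = self.end_frame
        return payload


request = I2VRequest(
    model="Kling 3",
    prompt="she rises from the chair and turns to the window",
    start_frame="pose_a.png",
    end_frame="pose_b.png",
)
print(request.to_payload())
```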

Reference-to-Video (R2V)

The most powerful and most misunderstood mode. Instead of one start frame, you feed the model a library of references (character images, a motion clip, a style board, an audio track) and you tell it, in your prompt, what to take from each. The model extracts those elements and builds a new video.

This is where Seedance 2.0 and Wan 2.7 shine. A single generation can combine, for example: the character from Image 1, the camera move from Video 1, the lighting style from Image 2, and the voice timbre from Audio 1.

@Image1 = character. @Video1 = camera movement. Generate: she walks toward the lens through a rain-soaked alley at night, neon reflections on the pavement.
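If you assemble prompts like this programmatically, a small helper keeps the role declarations consistent. The sketch below is a minimal illustration; `build_r2v_prompt` is a hypothetical name, and the "@Tag = role." layout simply copies the example prompt above rather than any documented syntax requirement.

```python
def build_r2v_prompt(roles: dict[str, str], instruction: str) -> str:
    """Declare what to take from each reference, then append the instruction."""
    if not roles:
        raise ValueError("R2V needs at least one reference")
    # One "@Tag = role." declaration per reference, in insertion order.
    declarations = " ".join(f"@{tag} = {role}." for tag, role in roles.items())
    return f"{declarations} Generate: {instruction}"


prompt = build_r2v_prompt(
    {"Image1": "character", "Video1": "camera movement"},
    "she walks toward the lens through a rain-soaked alley at night, "
    "neon reflections on the pavement.",
)
print(prompt)
```

Run as written, this reproduces the example prompt above word for word.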

R2V sub-tasks (Seedance terminology)

ByteDance's own guide splits reference workflows into three task types, a framing that is useful whichever model you work with:

Task | What it does | Prompt pattern
Reference | Extract elements (subject, style, motion, sound) to make a new video. | "Refer to the [action/style/sound] in @Video1 to generate…"
Edit | Modify part of an existing video; everything unmentioned stays the same. | "Strictly edit @Video1, changing its [original feature] to [new feature]…"
Extend | Continue a clip forward (or backward) with consistent identity and style. | "Extend @Video1, generate…"

When editing or extending, reference the clip directly (e.g. @Video1) rather than saying "Reference Video 1," which the model can misread as a new referencing task.
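The three prompt patterns can be captured as templates so that edit and extend prompts always name the clip directly. This is a sketch under assumptions: `TEMPLATES`, `task_prompt`, and the placeholder names (`aspect`, `old`, `new`, `rest`) are hypothetical, and the wording is lifted from the patterns above, not from any official schema.

```python
# Templates follow the three prompt patterns above; placeholder names are illustrative.
TEMPLATES = {
    "reference": "Refer to the {aspect} in @{clip} to generate {rest}",
    "edit": "Strictly edit @{clip}, changing its {old} to {new}",
    "extend": "Extend @{clip}, generate {rest}",
}


def task_prompt(task: str, clip: str, **fields: str) -> str:
    """Fill the template for a task, always addressing the clip as @<clip>."""
    if task not in TEMPLATES:
        raise ValueError(f"unknown task: {task!r}")
    # str.format raises KeyError if a placeholder the template needs is missing.
    return TEMPLATES[task].format(clip=clip, **fields)


print(task_prompt("reference", "Video1", aspect="camera move", rest="a chase through a market"))
print(task_prompt("edit", "Video1", old="red coat", new="blue coat"))
print(task_prompt("extend", "Video1", rest="she opens the door and steps outside"))
```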

Watch: Reference-to-Video in Venice Studio

Full library: Video Guides.

Which mode should I start with?

T2V · Text only
No source material, exploring ideas, or generating from pure imagination.

I2V · One image
You have (or can make) the exact subject as a still and want it animated faithfully.

R2V · Many refs
You need consistency, motion transfer, multiple characters, or audio/voice control.
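The same rule of thumb can be written as a tiny helper that picks a starting mode from the inputs you have on hand. It is a deliberately crude heuristic with a hypothetical function name: it treats anything beyond a single still as R2V and does not model the first/last-frame variant of I2V, which uses two images.

```python
def choose_mode(images: int = 0, videos: int = 0, audio: int = 0) -> str:
    """Suggest a starting mode from the number of references available."""
    total = images + videos + audio
    if total == 0:
        return "T2V"  # text only: nothing anchors the result
    if images == 1 and total == 1:
        return "I2V"  # exactly one still to animate
    return "R2V"      # several references, or any video/audio reference


print(choose_mode())                    # T2V
print(choose_mode(images=1))            # I2V
print(choose_mode(images=2, videos=1))  # R2V
```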