Prompting Fundamentals

2026 video models take direction, so write prompts the way a director briefs a shot. A keyword salad ("woman, beautiful, 4k, cinematic") gives the model nothing to stage. A shot description ("handheld camera drifts behind her as she…") gets results. This page is the shared foundation; each model page adds its own quirks.

The universal formula

Kling, Seedance, Wan, and Happy Horse respond to the same underlying structure. Order matters: lead with the camera, end with mood/audio.

[Camera shot & movement] + [Subject & specific action] + [Environment & lighting] + [Texture & physical detail] + [Mood / audio]
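As a minimal sketch, the formula can be treated as five ordered slots joined into one prompt string. The `ShotPrompt` class and its field names are hypothetical, chosen for illustration; no model requires this structure, and joining the layers as separate sentences is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """Hypothetical container for the five layers of the formula."""
    camera: str       # shot type and movement
    subject: str      # subject and specific action
    environment: str  # environment and lighting
    texture: str      # texture and physical detail
    mood_audio: str   # mood and audio

    def render(self) -> str:
        # Keep the order from the formula: camera first, mood/audio last.
        names = ("camera", "subject", "environment", "texture", "mood_audio")
        layers = [self.camera, self.subject, self.environment,
                  self.texture, self.mood_audio]
        missing = [name for name, value in zip(names, layers) if not value.strip()]
        if missing:
            raise ValueError(f"empty layers: {', '.join(missing)}")
        # Assumption: each layer becomes its own sentence.
        return " ".join(layer.strip().rstrip(".") + "." for layer in layers)


prompt = ShotPrompt(
    camera="Handheld shoulder-cam drifts behind her with subtle sway",
    subject="She walks at a steady pace, each foot landing heel-first, weight rolling forward",
    environment="Flickering neon casts magenta and cyan on the wet asphalt",
    texture="Condensation on the glass, visible breath in the cold air, fabric sheen",
    mood_audio="Melancholic; distant traffic hum and soft rain on the awning",
).render()
print(prompt)
```

Keeping the layers as separate fields also makes it easy to change one of them between runs while the rest of the prompt stays fixed.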

Weak vs. strong, element by element

Layer	Weak ❌	Strong ✅
Camera	"camera follows her"	"handheld shoulder-cam drifts behind her with subtle sway"
Action	"a woman walking"	"she walks at a steady pace, each foot landing heel-first, weight rolling forward"
Lighting	"cinematic lighting"	"flickering neon casts magenta and cyan on the wet asphalt"
Texture	"looks realistic"	"condensation on the glass, visible breath in the cold air, fabric sheen"
Mood/audio	"nice music"	"melancholic; distant traffic hum and soft rain on the awning"
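One way to catch the weak column before spending a generation is a quick check for vague phrases. The sketch below is a crude substring match, and the phrase list is illustrative only: it is taken from the weak examples on this page, not from any model's documentation. `WEAK_PHRASES` and `find_weak_phrases` are hypothetical names.

```python
# Illustrative list only, taken from the weak examples on this page.
WEAK_PHRASES = (
    "camera follows",
    "cinematic lighting",
    "looks realistic",
    "nice music",
    "beautiful",
    "4k",
)


def find_weak_phrases(prompt: str) -> list[str]:
    """Return the weak phrases that appear in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    return [phrase for phrase in WEAK_PHRASES if phrase in lowered]


print(find_weak_phrases("woman, beautiful, 4k, cinematic lighting"))
print(find_weak_phrases("flickering neon casts magenta and cyan on the wet asphalt"))
```

A hit is a cue to replace the phrase with something concrete from the strong column, not proof that the prompt will fail.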

Ten rules that work everywhere

Lead with the camera. Shot type (wide / medium / close-up / macro) + movement (dolly push, orbit, tracking, whip-pan, FPV drone, locked-off tripod). This sets the entire feel.
Use real cinematography verbs. "Dolly push," "rack focus," "crash zoom," "crane up." Generic words like "moves" give the model nothing.
Describe physics, not adjectives. "Heel-first," "weight transfer," "hair lags then settles," "fabric ripples in the wind." Physics language kills sliding feet and floaty motion.
Give the shot a timeline. Beginning → middle → end. For precise control, use beats: 0–2s … 2–4s … 4–5s. This helps prevent "frozen moment" outputs and identity drift.
One main action per clip. Don't cram three scene changes into a 5-second generation. Use multi-shot mode (Kling/Seedance) for sequences.
Match prompt to references. In I2V/R2V, don't contradict the image. If the subject in the reference image wears red, don't write "blue dress."
Set aspect ratio up front. 16:9 for YouTube, 9:16 for TikTok/Reels, 1:1 for feed. Re-cropping later throws away resolution and can break the framing.
Iterate one variable at a time. Close but not quite? Change a single element and re-run. Don't rewrite the whole prompt.
Draft cheap, finish expensive. Nail composition and motion at 720p / 4s, then render the keeper at full res / full length.
Know your model's negative-prompt support. Kling and Wan accept negative prompts ("no warped fingers, no jitter"). Seedance does not, so on Seedance state what you want instead ("clear sunny sky," not "no rain").
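The last rule lends itself to a small guard in whatever script assembles your requests. The sketch below records only the support stated on this page (Kling and Wan yes, Seedance no) and treats every other model as unknown. The `build_request` function and the `prompt`, `negative_prompt`, and `needs_positive_rewrite` keys are hypothetical placeholders, not fields of any real API.

```python
# Support flags as stated on this page; models not listed are treated as unknown.
NEGATIVE_PROMPT_SUPPORT = {"kling": True, "wan": True, "seedance": False}


def build_request(model: str, prompt: str, negatives: list[str]) -> dict:
    """Attach negatives only where the model is known to accept them.

    The dictionary keys are placeholders, not real API fields.
    """
    supported = NEGATIVE_PROMPT_SUPPORT.get(model.lower())
    if supported is None:
        raise ValueError(f"negative-prompt support for {model!r} is not recorded")
    request = {"model": model, "prompt": prompt}
    if negatives and supported:
        request["negative_prompt"] = ", ".join(negatives)
    elif negatives:
        # No negative prompt available: the caller must restate these positively,
        # e.g. "clear sunny sky" instead of "no rain".
        request["needs_positive_rewrite"] = list(negatives)
    return request


print(build_request("Kling", "slow dolly push-in toward the face",
                    ["no warped fingers", "no jitter"]))
print(build_request("Seedance", "wide shot of a street, clear sunny sky", ["no rain"]))
```

Surfacing the unsupported negatives, rather than silently dropping them, keeps the rewrite step visible.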

Camera & motion vocabulary cheat-sheet

Goal	Phrasing that works
Intimacy / realization	"slow dolly push-in toward the face"
Energy / action	"dynamic FPV drone shot, whips and rolls 360°"
Reveal	"slow pan from the face to reveal the figure behind"
Documentary feel	"handheld, slight imperfect sway, quick reframes"
Product hero	"locked-off tripod, then 360° orbit, dramatic side light"
Emphasis shift	"rack focus from foreground to background"
Stylized speed	"speed ramp from 40% to 100% as the action peaks"

Multi-shot prompting

Kling 3 (up to 6 shots) and Seedance 2.0 generate coherent multi-shot sequences in one pass. Label each shot and keep character names consistent so identity carries across cuts.

Shot 1 (0–4s): Medium. [Character A] at a café window, rain outside, dull light.
Shot 2 (4–7s): Close-up of her hand setting down the cup.
Shot 3 (7–12s): Wide. She stands and walks out; warm light spills in.
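Shot labels with back-to-back time ranges are easy to get wrong by hand, so here is a minimal sketch that derives them from per-shot durations. `render_shots` is a hypothetical helper; the six-shot default is the Kling 3 limit quoted above, and putting one shot per line is an assumption about layout rather than a documented requirement.

```python
MAX_SHOTS_KLING_3 = 6  # limit quoted on this page for Kling 3


def render_shots(shots: list[tuple[int, str]],
                 max_shots: int = MAX_SHOTS_KLING_3) -> str:
    """Turn (duration_seconds, description) pairs into labeled, back-to-back shots."""
    if not shots:
        raise ValueError("at least one shot is required")
    if len(shots) > max_shots:
        raise ValueError(f"{len(shots)} shots exceeds the limit of {max_shots}")
    lines, start = [], 0
    for number, (duration, description) in enumerate(shots, start=1):
        if duration <= 0:
            raise ValueError(f"shot {number} needs a positive duration")
        end = start + duration
        lines.append(f"Shot {number} ({start}–{end}s): {description}")
        start = end  # the next shot begins where this one ends
    return "\n".join(lines)


sequence = render_shots([
    (4, "Medium. [Character A] at a café window, rain outside, dull light."),
    (3, "Close-up of [Character A]'s hand setting down the cup."),
    (5, "Wide. [Character A] stands and walks out; warm light spills in."),
])
print(sequence)
```

This version repeats the character label in every shot, which is one way to follow the advice about keeping names consistent across cuts.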

Prompting audio

Most 2026 models generate sound in the same pass as the video. You can direct it:

Ambient: "distant city traffic, soft rain on the awning."
SFX timing: "a single door creak as she enters."
Dialogue + lip-sync: supply the line; Seedance and Happy Horse sync phonemes across multiple languages.
Music/beat-sync: on Seedance, feed a track as a reference and ask it to cut to the beat.
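The text-based directions above (ambient, SFX, dialogue) can be collected into one clause and appended as the final layer of a prompt. The sketch below is illustrative: `audio_clause` is a hypothetical helper, and the `Ambient:` / `SFX:` / `Dialogue:` labels and the quoting of the spoken line are assumed conventions, not syntax documented by any model. Beat-sync is left out because it depends on a reference track rather than prompt text.

```python
from typing import Optional


def audio_clause(ambient: Optional[str] = None,
                 sfx: Optional[str] = None,
                 dialogue: Optional[str] = None) -> str:
    """Join the audio directions that were supplied into one prompt clause."""
    parts = []
    if ambient:
        parts.append(f"Ambient: {ambient}.")
    if sfx:
        parts.append(f"SFX: {sfx}.")
    if dialogue:
        # Quoting the spoken line is an assumed convention, not documented syntax.
        parts.append(f'Dialogue: "{dialogue}"')
    return " ".join(parts)


print(audio_clause(
    ambient="distant city traffic, soft rain on the awning",
    sfx="a single door creak as she enters",
))
```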