Official Guide — Translated from Mandarin

Seedance 2.0 Official Prompt Guide

Complete Multimodal Creative Experience — The definitive user manual for ByteDance's Seedance 2.0 AI video generation model.

Translated from ByteDance Lark Office. Last updated: February 2026

🌈 For the best experience, try Seedance 2.0 on the official Dreamina website: jimeng.jianying.com

Seedance 2.0 is now fully available. If the server is under heavy load, try the Seedance 2.0 Fast model for a faster experience.

🌀 Seedance 2.0 is Now Live on Dreamina!

Ever since text and images were the only ways to "tell stories," we've wanted to build a video model that truly understands you. Today, it's finally here!

Seedance 2.0 now supports four input modalities: image, video, audio, and text. The combination methods are richer than ever, giving you world-class controllability.

You can use a single image to set the visual style, a video to define character actions and camera movements, and a few seconds of audio to establish rhythm and atmosphere. Combined with text prompts, this makes the creative process more natural and more sophisticated, and closer to working as a real "director."

In this upgrade, "reference capabilities" are the biggest highlight:

  • 🎨 Reference Images can precisely reproduce visual details and character features
  • 🎬 Reference Videos support camera movements, complex action choreography, and creative effect replication
  • 🎵 Audio Support for background music and rhythm-synced generation: not just creating, but "continuing to shoot"
  • 🎤 Extension Capabilities enhanced simultaneously, supporting character pose, emotion, and continuity across existing videos

We know that video creation has never been just about "generating" — it's about precise control over every detail. Seedance 2.0 isn't just multimodal; it's a truly flexible creative approach.

Seedance 2.0. Multimodal creation. It starts here.

Example: A girl hanging laundry — generated from a single reference image + text prompt. The action of shaking clothes and the swaying of hung garments are realistically reproduced.

1. Parameter Preview

| Input Type | Limit | Supported Formats | Max Size |
| --- | --- | --- | --- |
| Images | ≤ 9 | jpeg, png, webp, bmp, tiff, gif | 30 MB each |
| Videos | ≤ 3 | mp4, mov | 50 MB each, total duration 2–15s |
| Audio | ≤ 3 | mp3, wav | 15 MB each, total ≤ 15s |
| Text | Natural language | - | - |
| Total Files | ≤ 12 combined | - | - |
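The limits in the table above can be checked before uploading. The following is a minimal sketch, not part of the platform: the `Asset` class and `check_assets` function are hypothetical names, and it assumes the size limits are inclusive (a 30 MB image passes).

```python
from dataclasses import dataclass

# Limits copied from the parameter table above.
LIMITS = {
    "image": {"max_count": 9, "formats": {"jpeg", "png", "webp", "bmp", "tiff", "gif"}, "max_mb": 30},
    "video": {"max_count": 3, "formats": {"mp4", "mov"}, "max_mb": 50},
    "audio": {"max_count": 3, "formats": {"mp3", "wav"}, "max_mb": 15},
}
MAX_TOTAL_FILES = 12


@dataclass
class Asset:
    """Hypothetical description of one file to upload (not a platform type)."""
    kind: str                # "image", "video", or "audio"
    fmt: str                 # file extension without the dot, e.g. "png"
    size_mb: float
    duration_s: float = 0.0  # only meaningful for video and audio


def check_assets(assets):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(assets) > MAX_TOTAL_FILES:
        problems.append(f"too many files: {len(assets)} > {MAX_TOTAL_FILES}")
    for kind, rule in LIMITS.items():
        group = [a for a in assets if a.kind == kind]
        if len(group) > rule["max_count"]:
            problems.append(f"too many {kind} files: {len(group)} > {rule['max_count']}")
        for a in group:
            if a.fmt.lower() not in rule["formats"]:
                problems.append(f"unsupported {kind} format: {a.fmt}")
            if a.size_mb > rule["max_mb"]:
                problems.append(f"{kind} file exceeds {rule['max_mb']} MB")
    # Duration limits apply to the combined length of all files of a type.
    video_total = sum(a.duration_s for a in assets if a.kind == "video")
    if any(a.kind == "video" for a in assets) and not 2 <= video_total <= 15:
        problems.append(f"total video duration {video_total}s is outside 2-15s")
    audio_total = sum(a.duration_s for a in assets if a.kind == "audio")
    if audio_total > 15:
        problems.append(f"total audio duration {audio_total}s exceeds 15s")
    return problems
```

For example, `check_assets([Asset("video", "avi", 10, 5)])` reports a single problem, the unsupported format.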

Output Parameters

  • Duration: 4–15 seconds (user-selectable)
  • Audio: Includes auto-generated sound effects and background music
  • Resolution: 480p (640×640) to 720p (834×1112), up to 2K output

2. Interaction Format — The @ Reference System

Seedance 2.0 uses @ to assign roles to each uploaded asset. This is the most critical part of prompt writing.

How to Reference

@Image1    @Image2    @Image3   ... (up to @Image9)
@Video1    @Video2    @Video3
@Audio1    @Audio2    @Audio3
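Because the reference names follow a fixed pattern, a prompt can be scanned for them before submission. This is an illustrative sketch only: `find_references` and `out_of_range` are hypothetical helpers, and the regular expression assumes references are written exactly as listed above (`@Image`, `@Video`, or `@Audio` followed by a number).

```python
import re

# Highest index allowed per reference type, per the list above.
MAX_INDEX = {"Image": 9, "Video": 3, "Audio": 3}
REF_PATTERN = re.compile(r"@(Image|Video|Audio)(\d+)")


def find_references(prompt):
    """Return the distinct @ references in a prompt, in order of first use."""
    seen = []
    for kind, number in REF_PATTERN.findall(prompt):
        ref = f"@{kind}{int(number)}"
        if ref not in seen:
            seen.append(ref)
    return seen


def out_of_range(prompt):
    """Return references whose index is 0 or above the per-type maximum."""
    bad = []
    for kind, number in REF_PATTERN.findall(prompt):
        if not 1 <= int(number) <= MAX_INDEX[kind]:
            bad.append(f"@{kind}{number}")
    return bad
```

A prompt that mentions `@Image10` or `@Video4` would be flagged by `out_of_range`, since only nine images and three videos can be referenced.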

Assigning Roles to References

Always explicitly state what each reference is for:

| Purpose | Example Syntax |
| --- | --- |
| First frame | @Image1 as the first frame |
| Last frame | @Image2 as the last frame |
| Character appearance | @Image1's character as the subject |
| Scene / background | scene references @Image3 |
| Camera movement | reference @Video1's camera movement |
| Action / motion | reference @Video1's action choreography |
| Visual effects | completely reference @Video1's effects and transitions |
| Rhythm / tempo | video rhythm references @Video1 |
| Voice / tone | narration voice references @Video1 |
| Background music | BGM references @Audio1 |
| Sound effects | sound effects reference @Video3's audio |
| Outfit / clothing | wearing the outfit from @Image2 |
| Product appearance | product details reference @Image3 |

Multi-Reference Combinations

You can combine multiple references in a single prompt:

@Image1's character as the subject, reference @Video1's camera movement
and action choreography, BGM references @Audio1, scene references @Image2
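If prompts are assembled programmatically, the role phrasings from the syntax table can be kept as templates. The sketch below is one possible way to do that, not an official tool. The role keys (`subject`, `camera`, and so on) and the function name are invented for illustration; only the phrasings come from this guide.

```python
# Phrasings taken from the "Example Syntax" column; the dictionary keys are
# hypothetical labels chosen for this sketch.
ROLE_TEMPLATES = {
    "first_frame": "{ref} as the first frame",
    "last_frame": "{ref} as the last frame",
    "subject": "{ref}'s character as the subject",
    "scene": "scene references {ref}",
    "camera": "reference {ref}'s camera movement",
    "action": "reference {ref}'s action choreography",
    "bgm": "BGM references {ref}",
}


def build_reference_clause(assignments):
    """Join (role, reference) pairs into one comma-separated clause."""
    parts = []
    for role, ref in assignments:
        if role not in ROLE_TEMPLATES:
            raise KeyError(f"unknown role: {role}")
        parts.append(ROLE_TEMPLATES[role].format(ref=ref))
    return ", ".join(parts)


clause = build_reference_clause([
    ("subject", "@Image1"),
    ("camera", "@Video1"),
    ("bgm", "@Audio1"),
    ("scene", "@Image2"),
])
```

Keeping each role as an explicit pair makes it harder to upload an asset and forget to say what it is for.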

Example: Multiple reference images + video input generating a fashion showcase with precise character, outfit, and visual effect replication.

⚠️ Important Notice: About Uploading Realistic Human Face Materials

The platform does not allow uploading images or videos containing realistic, clearly identifiable human faces. Such materials will be automatically blocked by the system. Please use illustrated, stylized, or non-photorealistic character references instead.

Seedance 2.0 Capabilities & Enhancement Preview

1. Significantly Enhanced Base Capabilities: More Stable, Smoother, More Realistic!

Seedance 2.0 delivers a fundamental leap in video generation quality. Motion is more stable and natural, transitions are smoother, and the output is dramatically more realistic compared to previous versions. The model excels at producing videos with natural physics, coherent movements, and cinematic visual quality.

2. Comprehensive Multimodal Upgrade: Video Creation Enters the "Free Combination" Era!

Seedance 2.0 introduces a truly multimodal approach to video creation. By accepting text, images, videos, and audio simultaneously, it enables creators to combine inputs freely — unlocking creative possibilities that were previously impossible with single-modality systems.

2.1 Seedance 2.0 Multimodal Overview

Prompt Structure Blueprint

A well-structured Seedance 2.0 prompt follows this formula:

[Subject/Character Setup] + [Scene/Environment] + [Action/Motion] +
[Camera Movement] + [Timing Breakdown] + [Transitions/Effects] +
[Audio/Sound Design] + [Style/Mood]

Time-Segmented Prompts (Recommended for 10s+ Videos)

For precise control, break your prompt into timed segments:

0–3s: [opening scene description, camera, action]
3–6s: [mid-section development]
6–10s: [climax or key action]
10–15s: [resolution, ending shot, final text/branding]
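The timed-segment layout above lends itself to a small formatter that also catches overlapping or overlong segments. This is a sketch under stated assumptions: `build_timed_prompt` is a hypothetical helper, segment boundaries are whole seconds, and the 4–15 second bound is the output duration range given in the Parameter Preview.

```python
def build_timed_prompt(segments, total_s):
    """Format (start_s, end_s, description) tuples as timed prompt lines.

    Adjacent segments may share a boundary (0-3s, 3-6s) or leave a gap
    (0-3s, 4-8s); both styles appear in this guide.
    """
    if not 4 <= total_s <= 15:
        raise ValueError("generation length must be 4-15 seconds")
    lines = []
    previous_end = 0
    for start, end, text in segments:
        if start < previous_end or end <= start:
            raise ValueError(f"segment {start}-{end}s overlaps or is empty")
        lines.append(f"{start}–{end}s: {text}")
        previous_end = end
    if previous_end > total_s:
        raise ValueError("segments run past the selected duration")
    return "\n".join(lines)


prompt = build_timed_prompt(
    [
        (0, 3, "opening scene, slow push in"),
        (3, 6, "mid-section development"),
        (6, 10, "key action, orbit shot"),
    ],
    total_s=10,
)
```

The check against `total_s` reflects the advice to match prompt complexity to the selected generation length.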

Camera Language Reference

| Camera Term | Description |
| --- | --- |
| Push in / Slow push | Camera moves toward subject |
| Pull back / Pull away | Camera moves away from subject |
| Pan left / right | Camera rotates horizontally |
| Tilt up / down | Camera rotates vertically |
| Track / Follow shot | Camera follows subject movement |
| Orbit / Revolve | Camera circles around subject |
| One-take / Oner | Continuous shot with no cuts |
| Hitchcock zoom (dolly zoom) | Push in + zoom out (vertigo effect) |
| Fisheye lens | Ultra-wide distorted lens |
| Whip pan | Very fast horizontal pan creating motion blur |
| Crane shot | Vertical movement like a crane arm |

Shot Sizes

| Shot Size | Description |
| --- | --- |
| Extreme close-up | Eyes, mouth, or small detail only |
| Close-up | Face fills frame |
| Medium close-up | Head and shoulders |
| Medium shot | Waist up |
| Full shot | Entire body |
| Wide / Establishing shot | Full environment |

2.2 Special Usage Patterns (No Limits — For Reference Only)

  • First/last frame + reference video actions: @Image1 as first frame, reference @Video1's fight choreography
  • Extend existing video: Extend @Video1 by 5s (set generation length to 5s too)
  • Merge multiple videos: Add a scene between @Video1 and @Video2, content: xxx
  • No audio file? Reference video sound: You can directly reference the audio from a video input
  • Continuous action generation: Character transitions from jumping to rolling, maintaining fluid motion + multiple reference images

2.3 Those Previously Impossible Video Problems? Now They're Actually Solvable!

2.3.1 Comprehensive Consistency Improvements

Keep the same character consistent across multiple shots by anchoring to a reference image. Character identity, clothing, features, and style are maintained throughout the video.

Case Example:

The man in @Image1 walks tiredly down the hallway, slowing his steps,
finally stopping at his front door. Close-up on his face — he takes a
deep breath, adjusts his emotions, replaces the weariness with a relaxed
expression. Close-up of him finding his keys, inserting into the lock.
After entering, his little daughter and a pet dog run to greet him with
hugs. The interior is warm and cozy. Natural dialogue throughout.

2.3.2 High-Difficulty / Controllable Camera Movement & Precise Action Replication

Reference a video's exact camera work and replicate complex movements including Hitchcock zooms, orbit shots, mechanical arm tracking, and more.

Example: A man in a black suit flees through city streets — with multi-cut camera changes, maintained clothing consistency, and natural sound design.

Case Example:

Reference @Image1's male character. He is in @Image2's elevator.
Completely reference @Video1's camera movements and the protagonist's
facial expressions. Hitchcock zoom during the fear moment, then several
orbit shots showing the elevator interior. Elevator doors open, follow
shot walking out. Exterior scene references @Image3. The man looks
around, referencing @Video1's mechanical arm multi-angle tracking of
the character's gaze.

2.3.3 Creative Templates / Complex VFX — Precise Replication

Replicate transitions, advertising templates, and cinematic visual effects from reference videos. Swap characters while keeping the creative effects intact.

Case Example:

Replace @Video1's character with @Image1. @Image1 as the first frame.
Character puts on VR sci-fi glasses. Reference @Video1's camera work —
close orbit shot transitions from third-person to character's subjective
POV. Travel through the VR glasses into @Image2's deep blue universe.
Several spaceships shuttle toward the distance. Camera follows ships
into @Image3's pixel world. Low-altitude flyover of pixel mountains
where trees grow procedurally. Then upward angle, rapid shuttle to
@Image4's pale green textured planet, camera skims the planet surface.

2.3.4 Model Creativity & Story Completion

Seedance 2.0 can auto-generate storylines from storyboard images or scripts, filling in creative details and narrative elements.

Case — Manga / Storyboard Interpretation:

Interpret @Image1 as a manga sequence from left to right, top to bottom.
Keep the character dialogue consistent with the text in the images.
Add special sound effects for panel transitions and key plot moments.
Overall style: humorous and witty. Performance style references @Video1.

Case — Storyboard Script Generation:

Reference @Image1's documentary storyboard script. Follow @Image1's
framing, shot sizes, camera movements, visuals, and copy to create a
15-second healing-style opening sequence about "Childhood Through the
Four Seasons."

2.3.5 Video Extension

Extend existing videos forward or backward with smooth continuity. Set the generation duration to match the extension length.

Case — Forward Extension:

Extend @Video1 by 15 seconds.
1–5s: Light and shadow slowly slide across the wooden table and cup
through venetian blinds. Tree branches sway gently.
6–10s: A coffee bean gently drifts down from the top of frame. Camera
pushes toward the bean until the screen goes black.
11–15s: English text gradually appears — first line "Lucky Coffee",
second line "Breakfast", third line "AM 7:00-10:00".

Case — Backward Extension (Prepending):

Extend backward 10s. In warm afternoon light, the camera starts from
the corner with the awning fluttering in the breeze, slowly tilting
down to daisies peeking out at the wall base...

Tip: When extending, set the generation duration to match the new portion (e.g., extend by 5s → select 5s generation length).

2.3.6 More Accurate Tone, More Realistic Voice

Native audio-video synchronization supports lip-sync dialogue in 8+ languages (including Chinese, English, Japanese, Korean, and Chinese dialects). Voice cloning, dialogue generation, and sound effect design are all built in.

Case — Comedy Dialogue:

In the "Cat & Dog Roast Show" — an emotionally expressive comedy segment:

Cat host (licking paw, rolling eyes): "Who understands my suffering?
This one next to me does nothing but wag his tail, destroy sofas, and
con humans out of treats with those 'pet me I'm adorable' eyes..."

Dog host (head tilted, tail wagging): "You're one to talk? You sleep 18
hours a day, wake up just to rub against humans' legs for canned food..."

Sound: Comedy background music, exaggerated expressions, talk show pacing.

2.3.7 Stronger Shot Continuity (One-Take)

Generate seamless long takes that flow continuously across scenes with no cuts — following characters through multiple environments in a single unbroken shot.

Case — Spy Thriller One-Take:

Spy thriller style. @Image1 as the first frame. Camera tracks a woman
in a red trench coat walking forward from the front. Full-shot follow.
Passersby intermittently block the woman in red. She reaches a corner
— reference @Image2's corner building. Fixed camera as the woman exits
frame and disappears around the corner. A masked girl lurks at the
corner, glaring menacingly — masked girl references @Image3. Camera
pans forward to the woman in red as she walks into a mansion and
vanishes — mansion references @Image4. No cuts throughout. One continuous
take from start to finish.

2.3.8 Highly Usable Video Editing

Modify existing videos with precise control — swap characters, alter plots, add or remove elements while preserving the rest of the scene.

Case — Plot Subversion:

Subvert @Video1's plot — the man's expression shifts from tenderness to
icy cruelty. In an unguarded moment, he shoves the female lead off the
bridge into the water. The action is decisive, premeditated, without
hesitation. The female lead falls with no scream, only disbelief in her
eyes. She surfaces and screams: "You've been lying to me from the start!"
The man stands on the bridge with a sinister smile, murmuring: "This is
what your family owes mine."

Case — Character Replacement:

Replace @Video1's female lead singer with @Image1's male lead singer.
Actions completely mimic the original video. No shot cuts. Band performs
the music throughout.

Case — Element Addition:

Change the woman's hairstyle in @Video1 to long red hair. @Image1's
great white shark slowly surfaces — half its head visible — lurking
behind her.

2.3.9 Music Beat Matching

Synchronize visual rhythm precisely with music beats — matching keyframe positions, transitions, and character movements to the audio.

Case Example:

@Image1 @Image2 @Image3 @Image4 @Image5 @Image6 @Image7 — match the
keyframe positions and overall rhythm of @Video1 for beat-synced cuts.
Characters should have more dynamic movement. Overall visual style more
dreamlike with strong visual tension. Adjust shot sizes and add lighting
changes based on music and visual needs.

2.3.10 Better Emotional Performance

Characters can now express nuanced emotions — from subtle tenderness to explosive rage, from quiet contemplation to joyful surprise. The model produces more natural and emotionally resonant performances.

Case — Dramatic Scene (Short Drama):

Scene (0–5s): Close-up on the character's reddened eyes, finger
pointing accusingly, tears streaming down. Emotion on the edge of
collapse.

Dialogue 1 (Character A, choking with rage): "What exactly are you
trying to take from me?"

Scene (6–10s): The other character trembles, holding up evidence,
red-eyed, stepping forward. Camera sweeps past background details.

Dialogue 2 (Character B, urgent and choked): "I'm not deceiving you!
This is what he entrusted to me!"

Scene (11–15s): Evidence is revealed, Character A freezes — expression
shifts from anger to shock, hands slowly rise.

Sound: Urgent piano + static interference, sobbing, ending with a
muffled voice blending in.
Duration: Precise 15 seconds, every frame tight.

Prompt Template Library

Template: Product Ad (15s)

Reference @Video1's editing style and camera transitions. Replace
@Video1's product with @Image1 as the hero product. Create a 15-second
product showcase video.

0–3s: Product enters frame with dynamic rotation, close-up on surface
texture and logo details.
4–8s: Multiple angle transitions — front, side, back — with product
highlight scanning light effects.
9–12s: Product in lifestyle context showing usage scenario.
13–15s: Hero shot with brand tagline appearing, background music builds
to resolution.

Sound: Reference @Video1's background music. Add product interaction
sound effects.

Template: E-Commerce Product Showcase

Deconstruct the reference image. Static camera. Hamburger suspended and
rotating mid-air. Ingredients gently and precisely separate while
maintaining shape and proportion. Smooth motion, no extra effects.

Hamburger splits apart — golden sesame bun top, fresh green lettuce,
dewy red tomato slices, two thick juicy beef patties with melting golden
cheddar cheese, and soft bun base — all slowly descend and perfectly
reassemble into a complete deluxe double cheeseburger.

Throughout, cheese continues to melt and drip slowly, lettuce and tomato
dewdrops glisten, maintaining ultimate appetizing food aesthetics.

Template: Dance Video (13s)

Have the character in @Image1 replicate the dance moves and beat-synced
music from @Video1. Generate a 13-second video. Movements should be
smooth with no stuttering or freezing.

Template: Fantasy / Xianxia Action (15s)

15-second xianxia high-intensity battle scene, warm gold-red tones.

0–3s: Low angle close-up on the protagonist's blue robe hem billowing
in heat waves. Both hands grip a thunder-patterned greatsword, blade
flashing crimson lightning. Molten lava churns on the ground. Demon
soldiers roar and charge from the distance. Protagonist growls: "Today,
with this blade, I vanquish your evil!" Sword chime and lava bubbling.

4–8s: Orbiting quick-cut, protagonist spins and swings the sword.
Blade tears through air, releasing red shockwaves. Front-line demons
are blasted apart into ash. Sword energy whistling and demon wailing.

9–12s: Low angle pull-back with slow motion, protagonist leaps skyward,
blade condensing a massive thunderbolt arc striking down at the demon
horde, lava splashing where the arc sweeps.

13–15s: Slow push close-up of protagonist landing and sheathing the
sword. Robe settles with residual energy. Blade still flickering with
lightning. Cold voice: "This realm's gate — you shall not cross."
Freeze on a gate silhouette, audio fading to resonant tremor and wind.

Template: Science / Medical Education (15s)

Ultra-realistic 4K medical CGI, semi-transparent blue human upper body
with clearly visible vascular system. Camera slowly pushes in, entering
a clean artery. Blood flows smoothly, cool-toned clinical lighting
creates a calming atmosphere.

Mid-sequence: Symbolic sugar and fat particles from milk tea dissolve
into the bloodstream. Camera tracks the blood flow. As blood viscosity
increases, yellow lipid deposits gradually form on vessel walls.

Final segment: Blood flow speed decreases, vessel lumen visibly narrows.
Lighting shifts to slightly dimmer tones, creating an educational and
cautionary atmosphere. 15-second health education clip.

Template: Scenery Montage with Music (15s)

@Image1 @Image2 @Image3 @Image4 @Image5 @Image6 — landscape scene
images. Reference @Video1's visual rhythm, inter-scene transitions,
visual style, and music tempo for beat-synced editing.

Template: Video Fusion / Continuation

The particle-composed horse in @Video1 gradually becomes concrete —
particles densify, transitioning into @Video2. The running horse in
@Video2 gradually transforms into @Video3 and slowly dissolves away.
Ethereal visual atmosphere. Background audio: hoofbeats and futuristic
particle sound effects.

Template: Long Video (>15s) — Multi-Segment

For videos over 15 seconds, generate in segments, keeping continuity from each segment to the next:

## Long Video Prompt (Total ~30 seconds)

Theme: Sword cultivator on misty mountain vs. demons
Total segments: 2
Aspect ratio: 16:9

--- Segment 1 (0–15s) — Normal generation ---
Duration: 15s

15-second xianxia shot. 0–5s: Overhead view of churning cloud sea over
immortal mountains, camera slowly pushes down through the clouds.
6–10s: Sword cultivator stands at the cliff edge, back to camera, robes
flowing in the wind. Dark energy rises in the distance.
11–15s: Cultivator slowly turns to face camera, draws sword, blade
glows golden. Steely eyes, low voice: "They're here." Freeze on the
cultivator holding the sword facing camera.

Handoff point: Cultivator facing camera with sword, dark energy churning
behind.

--- Segment 2 (15–30s) — Video extension ---
Upload Segment 1 as @Video1
Duration: 15s

Extend @Video1 by 15s. 0–5s: Continuing from the cultivator holding
the sword, dozens of shadow beasts surge from the dark energy and dive
toward him. Cultivator leaps to meet them.
6–10s: Aerial combat, sword energy criss-crossing, beasts slashed into
ash particles. Orbiting quick-cuts.
11–15s: Cultivator lands, sheathes sword. Golden explosion particles
drift behind him. Slow push to close-up on his profile. Audio fades.
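The segment arithmetic in the template above (a first normal generation, then extensions) can be planned ahead. The following sketch uses a hypothetical `plan_segments` function with whole-second lengths. It also assumes the 4–15 second generation range applies to extension steps, which this guide does not state explicitly.

```python
def plan_segments(total_s, max_s=15, min_s=4):
    """Split a total length into generation steps of min_s..max_s seconds.

    Returns (start_s, end_s, mode) tuples. The first step is a normal
    generation; each later step extends the previous output.
    """
    if total_s < min_s:
        raise ValueError("total is shorter than the minimum generation length")
    lengths = []
    remaining = total_s
    while remaining > 0:
        step = min(max_s, remaining)
        # Avoid leaving a tail shorter than min_s: shorten this step so the
        # final step is exactly min_s long.
        if 0 < remaining - step < min_s:
            step = remaining - min_s
        lengths.append(step)
        remaining -= step
    plan, start = [], 0
    for i, length in enumerate(lengths):
        mode = "normal generation" if i == 0 else "video extension"
        plan.append((start, start + length, mode))
        start += length
    return plan


thirty_second_plan = plan_segments(30)
```

For a 30-second total this gives two 15-second steps, matching the two-segment template above. For each extension step, the generation length should be set to that step's length.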

Style & Quality Modifiers

Append these to your prompts to enhance output quality:

Visual Style

  • Cinematic quality, film grain, shallow depth of field
  • 2.35:1 widescreen, 24fps
  • Ink wash painting / Anime / Photorealistic
  • High saturation neon colors, cool-warm contrast
  • 4K medical CGI, semi-transparent visualization
  • Ultra-fine CG animation

Mood & Atmosphere

  • Tense and suspenseful
  • Warm and healing
  • Epic and grand
  • Comedy with exaggerated expressions
  • Documentary tone, restrained narration
  • Dark fantasy / High-intensity xianxia

Audio Direction

  • Background music: grand and majestic
  • Sound effects: footsteps, crowd noise, car sounds
  • Voice tone reference @Video1
  • Beat-synced transitions matching music rhythm
  • Footsteps, breathing, fabric rustling must be clear and beat-aligned

Common Mistakes to Avoid

1. Vague references: Don't just say "reference @Video1"; specify WHAT to reference (camera? action? effects? rhythm?).
2. Conflicting instructions: Don't ask for "static camera" and "orbit shot" in the same segment.
3. Overloading: Don't try to pack too many scenes into 4–5 seconds; keep it physically plausible.
4. Missing @ assignments: If you upload 5 images, make sure each one is referenced with a clear purpose.
5. Ignoring audio: Sound design dramatically improves output, so always include audio direction.
6. Duration mismatch: Match your prompt complexity to the selected generation length.
7. Realistic faces: Don't upload clear, identifiable real human photos; the system will block them.
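Several of the mistakes listed above can be caught with a rough text check before generating. The sketch below is a heuristic only. `lint_prompt` is a hypothetical helper, and its keyword tests are simple assumptions: the camera-conflict check looks at the whole prompt rather than at each segment, so its warnings are prompts to review, not errors.

```python
import re

REF = re.compile(r"@((?:Image|Video|Audio)\d+)")


def lint_prompt(prompt, uploaded):
    """Return warnings for a prompt, given uploaded names like "Image1"."""
    warnings = []
    # Mistake 4: an uploaded asset that the prompt never mentions.
    mentioned = set(REF.findall(prompt))
    for name in uploaded:
        if name not in mentioned:
            warnings.append(f"@{name} is uploaded but never referenced")
    # Mistake 1: a clause that is nothing but "reference @X".
    for clause in re.split(r"[.,;\n]", prompt):
        text = clause.strip()
        if re.fullmatch(r"reference @\w+", text, flags=re.IGNORECASE):
            warnings.append(f"vague reference: '{text}'")
    lowered = prompt.lower()
    # Mistake 2 (coarse): both terms appear somewhere in the prompt.
    if "static camera" in lowered and "orbit" in lowered:
        warnings.append("both 'static camera' and 'orbit' appear; check for a conflict")
    # Mistake 5: no audio-related wording at all.
    if not re.search(r"sound|audio|music|bgm", lowered):
        warnings.append("no audio direction found")
    return warnings
```

A prompt such as "Reference @Video1. Static camera, then an orbit shot." with `Video1` and `Image1` uploaded would produce four warnings, one for each check.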

Platform Specifications

| Specification | Details |
| --- | --- |
| Image Input | jpeg/png/webp/bmp/tiff/gif, up to 9 images, each <30MB |
| Video Input | mp4/mov, up to 3 videos, total 2–15s, each <50MB, 480p–720p |
| Audio Input | mp3/wav, up to 3 files, total ≤15s, each <15MB |
| Text Input | Natural language description |
| File Limit | Max 12 files total (images + videos + audio combined) |
| Duration | 4–15 seconds per generation |
| Sound Output | Built-in sound effects and music |
| Resolution | Up to 2K output |

Demo Videos

Image-to-Video: Realistic Action Generation

A single reference image + text prompt produces realistic human actions with natural cloth physics.

Multi-Cut Scene Generation

Automatic scene changes with maintained clothing and color consistency across cuts.

Multiple Reference Inputs

Up to 9 images and 3 videos as simultaneous inputs — with precise control over characters, outfits, and visual effects.

Reference Video Input Example

An example of video reference input used to control camera movement and action choreography.

Final Words

Seedance 2.0 represents a new paradigm in AI video generation — one where creators have true multimodal control over every aspect of the creative process. From a single text prompt to complex multi-reference compositions with precise camera work, sound design, and character consistency, the possibilities are vast.

We encourage you to experiment freely with different combinations of inputs and reference styles. The model is designed to understand creative intent and expand on your ideas while maintaining precise control where you need it.

Start creating at jimeng.jianying.com and bring your creative vision to life.

This guide is an English translation of ByteDance's official Seedance 2.0 user manual, originally published in Mandarin Chinese on ByteDance Lark Office. Content has been faithfully translated and adapted for English readers. All original media assets are preserved where available.