Seedance 2.0 Official Prompt Guide
Complete Multimodal Creative Experience: The definitive user manual for ByteDance's Seedance 2.0 AI video generation model.
For the best experience, try Seedance 2.0 on the official Dreamina website: jimeng.jianying.com
Seedance 2.0 is now fully available. If the server is under heavy load, try the Seedance 2.0 Fast model for a faster experience.
Seedance 2.0 is Now Live on Dreamina!
From the day we could only use text and images to "tell stories," we've wanted to build a video model that truly understands you. Today, it's finally here!
Seedance 2.0 now supports four input modalities: image, video, audio, and text. The combination methods are richer than ever, giving you world-class controllability.
You can use a single image to set the visual style, a video to define character actions and camera movements, a few seconds of audio to establish rhythm and atmosphere... combined with text prompts, the creative process becomes more natural, more sophisticated, and more like being a real "director."
In this upgrade, "reference capabilities" are the biggest highlight:
- Reference Images can precisely reproduce visual details and character features
- Reference Videos support camera movements, complex action choreography, and creative effect replication
- Audio Support for background music and rhythm-synced generation; not just creating, but "continuing to shoot"
- Extension Capabilities enhanced simultaneously, supporting character pose, emotion, and continuity across existing videos
We know that video creation has never been just about "generating"; it's about precise control over every detail. Seedance 2.0 isn't just multimodal; it's a truly flexible creative approach.
Seedance 2.0. Multimodal creation. It starts here.
Example: A girl hanging laundry, generated from a single reference image + text prompt. The action of shaking clothes and the swaying of hung garments are realistically reproduced.
1. Parameter Preview
| Input Type | Limit | Supported Formats | Max Size |
|---|---|---|---|
| Images | ≤ 9 | jpeg, png, webp, bmp, tiff, gif | 30 MB each |
| Videos | ≤ 3 | mp4, mov | 50 MB each, total duration 2-15s |
| Audio | ≤ 3 | mp3, wav | 15 MB each, total ≤ 15s |
| Text | Natural language | - | - |
| Total Files | ≤ 12 combined | - | - |
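The limits in the table above can be checked before uploading. The following is a minimal sketch, not part of the platform: the `Asset` class and `validate_assets` function are hypothetical names, and the code assumes you already know each file's format, size, and duration.

```python
from dataclasses import dataclass

# Limits copied from the table above; the data model itself is hypothetical.
LIMITS = {
    "image": {"max_count": 9, "formats": {"jpeg", "png", "webp", "bmp", "tiff", "gif"}, "max_mb": 30},
    "video": {"max_count": 3, "formats": {"mp4", "mov"}, "max_mb": 50},
    "audio": {"max_count": 3, "formats": {"mp3", "wav"}, "max_mb": 15},
}
MAX_TOTAL_FILES = 12


@dataclass
class Asset:
    kind: str                # "image", "video", or "audio"
    fmt: str                 # format name as listed in the table, e.g. "png"
    size_mb: float
    duration_s: float = 0.0  # ignored for images


def validate_assets(assets):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(assets) > MAX_TOTAL_FILES:
        problems.append(f"too many files: {len(assets)} > {MAX_TOTAL_FILES}")
    for kind, rule in LIMITS.items():
        group = [a for a in assets if a.kind == kind]
        if len(group) > rule["max_count"]:
            problems.append(f"too many {kind} files: {len(group)} > {rule['max_count']}")
        for a in group:
            if a.fmt.lower() not in rule["formats"]:
                problems.append(f"unsupported {kind} format: {a.fmt}")
            if a.size_mb > rule["max_mb"]:
                problems.append(f"{kind} file exceeds {rule['max_mb']} MB")
    # Durations are limits on the combined length per type, as in the table.
    video_total = sum(a.duration_s for a in assets if a.kind == "video")
    if any(a.kind == "video" for a in assets) and not 2 <= video_total <= 15:
        problems.append(f"total video duration {video_total}s is outside 2-15s")
    audio_total = sum(a.duration_s for a in assets if a.kind == "audio")
    if audio_total > 15:
        problems.append(f"total audio duration {audio_total}s exceeds 15s")
    return problems
```

The sketch only accepts the format names exactly as the table lists them, so a common alias such as `jpg` would be reported as unsupported; whether the platform accepts such aliases is not stated in this guide.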
Output Parameters
- Duration: 4-15 seconds (user-selectable)
- Audio: Includes auto-generated sound effects and background music
- Resolution: 480p (640×640) to 720p (834×1112), up to 2K output
2. Interaction Format: The @ Reference System
Seedance 2.0 uses @ to assign roles to each uploaded asset. This is the most critical part of prompt writing.
How to Reference
@Image1 @Image2 @Image3 ... (up to @Image9)
@Video1 @Video2 @Video3
@Audio1 @Audio2 @Audio3
Assigning Roles to References
Always explicitly state what each reference is for:
| Purpose | Example Syntax |
|---|---|
| First frame | @Image1 as the first frame |
| Last frame | @Image2 as the last frame |
| Character appearance | @Image1's character as the subject |
| Scene / background | scene references @Image3 |
| Camera movement | reference @Video1's camera movement |
| Action / motion | reference @Video1's action choreography |
| Visual effects | completely reference @Video1's effects and transitions |
| Rhythm / tempo | video rhythm references @Video1 |
| Voice / tone | narration voice references @Video1 |
| Background music | BGM references @Audio1 |
| Sound effects | sound effects reference @Video3's audio |
| Outfit / clothing | wearing the outfit from @Image2 |
| Product appearance | product details reference @Image3 |
Multi-Reference Combinations
You can combine multiple references in a single prompt:
@Image1's character as the subject, reference @Video1's camera movement and action choreography, BGM references @Audio1, scene references @Image2
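Because every uploaded asset should be given an explicit role, it can help to check a prompt's @ tags against what was actually uploaded. The sketch below is illustrative only: `check_references` is a hypothetical helper, and the tag pattern is inferred from the `@Image1`, `@Video1`, `@Audio1` naming shown above rather than from any published grammar.

```python
import re

# Maximum index per tag type, taken from the "How to Reference" list above.
MAX_INDEX = {"Image": 9, "Video": 3, "Audio": 3}
TAG_PATTERN = re.compile(r"@(Image|Video|Audio)(\d+)")


def check_references(prompt, uploaded):
    """Compare @ tags in a prompt with the number of uploaded files per type.

    `uploaded` maps "Image"/"Video"/"Audio" to a count, e.g. {"Image": 2}.
    Returns (invalid, unused): tags that point at nothing, and uploaded
    assets that the prompt never mentions.
    """
    used = {(kind, int(num)) for kind, num in TAG_PATTERN.findall(prompt)}
    invalid = sorted(
        f"@{kind}{num}"
        for kind, num in used
        if num < 1 or num > min(MAX_INDEX[kind], uploaded.get(kind, 0))
    )
    unused = [
        f"@{kind}{i}"
        for kind in MAX_INDEX
        for i in range(1, uploaded.get(kind, 0) + 1)
        if (kind, i) not in used
    ]
    return invalid, unused
```

For the combined prompt above with two images, one video, and one audio file uploaded, both returned lists are empty. An unused asset is not necessarily an error, but it usually means a role was never assigned.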
Example: Multiple reference images + video input generating a fashion showcase with precise character, outfit, and visual effect replication.
⚠️ Important Notice: About Uploading Realistic Human Face Materials
The platform does not allow uploading images or videos containing realistic, clearly identifiable human faces. Such materials will be automatically blocked by the system. Please use illustrated, stylized, or non-photorealistic character references instead.
Seedance 2.0 Capabilities & Enhancement Preview
1. Significantly Enhanced Base Capabilities: More Stable, Smoother, More Realistic!
Seedance 2.0 delivers a fundamental leap in video generation quality. Motion is more stable and natural, transitions are smoother, and the output is dramatically more realistic compared to previous versions. The model excels at producing videos with natural physics, coherent movements, and cinematic visual quality.
2. Comprehensive Multimodal Upgrade: Video Creation Enters the "Free Combination" Era!
Seedance 2.0 introduces a truly multimodal approach to video creation. By accepting text, images, videos, and audio simultaneously, it enables creators to combine inputs freely, unlocking creative possibilities that were previously impossible with single-modality systems.
2.1 Seedance 2.0 Multimodal Overview
Prompt Structure Blueprint
A well-structured Seedance 2.0 prompt follows this formula:
[Subject/Character Setup] + [Scene/Environment] + [Action/Motion] + [Camera Movement] + [Timing Breakdown] + [Transitions/Effects] + [Audio/Sound Design] + [Style/Mood]
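One way to keep prompts in this order is to assemble them from named parts. This is a minimal sketch under stated assumptions: the field names and the `assemble_prompt` function are hypothetical, and joining parts with periods is a formatting choice, not a requirement of the model.

```python
# Field order follows the formula above; the field names are hypothetical.
FIELD_ORDER = [
    "subject", "scene", "action", "camera",
    "timing", "transitions", "audio", "style",
]


def assemble_prompt(**parts):
    """Join the supplied parts in formula order, skipping any left out."""
    unknown = set(parts) - set(FIELD_ORDER)
    if unknown:
        raise ValueError(f"unknown prompt fields: {sorted(unknown)}")
    pieces = [parts[name].strip().rstrip(".") for name in FIELD_ORDER if parts.get(name)]
    return ". ".join(pieces) + "." if pieces else ""
```

Parts can be passed in any order; the output always follows the blueprint, and omitted parts are simply skipped.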
Time-Segmented Prompts (Recommended for 10s+ Videos)
For precise control, break your prompt into timed segments:
0-3s: [opening scene description, camera, action]
3-6s: [mid-section development]
6-10s: [climax or key action]
10-15s: [resolution, ending shot, final text/branding]
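The timed layout above can be generated and sanity-checked from a list of segments. The sketch below assumes contiguous segments that start at 0 and a total inside the 4-15 second output range from the Output Parameters section; `format_timed_prompt` is a hypothetical helper, and the contiguity rule is an assumption (several examples later in this guide use 0-5s, 6-10s, 11-15s instead).

```python
def format_timed_prompt(segments, min_total=4, max_total=15):
    """Render (start, end, description) tuples as a time-segmented prompt.

    Segments must start at 0, be contiguous, and end within the 4-15 s
    output duration stated in the Output Parameters section.
    """
    expected_start = 0
    lines = []
    for start, end, description in segments:
        if start != expected_start:
            raise ValueError(f"segment starts at {start}s, expected {expected_start}s")
        if end <= start:
            raise ValueError(f"segment {start}-{end}s has no length")
        lines.append(f"{start}-{end}s: {description}")
        expected_start = end
    # expected_start now holds the end of the last segment, i.e. the total.
    if not min_total <= expected_start <= max_total:
        raise ValueError(f"total duration {expected_start}s is outside {min_total}-{max_total}s")
    return "\n".join(lines)
```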
Camera Language Reference
| Camera Term | Description |
|---|---|
| Push in / Slow push | Camera moves toward subject |
| Pull back / Pull away | Camera moves away from subject |
| Pan left / right | Camera rotates horizontally |
| Tilt up / down | Camera rotates vertically |
| Track / Follow shot | Camera follows subject movement |
| Orbit / Revolve | Camera circles around subject |
| One-take / Oner | Continuous shot with no cuts |
| Hitchcock zoom (dolly zoom) | Push in + zoom out, creating a vertigo effect |
| Fisheye lens | Ultra-wide distorted lens |
| Whip pan | Very fast horizontal pan creating motion blur |
| Crane shot | Vertical movement like a crane arm |
Shot Sizes
| Shot Size | Description |
|---|---|
| Extreme close-up | Eyes, mouth, or small detail only |
| Close-up | Face fills frame |
| Medium close-up | Head and shoulders |
| Medium shot | Waist up |
| Full shot | Entire body |
| Wide / Establishing shot | Full environment |
2.2 Special Usage Patterns (No Limits; For Reference Only)
- First/last frame + reference video actions: @Image1 as first frame, reference @Video1's fight choreography
- Extend existing video: Extend @Video1 by 5s (set the generation length to 5s as well)
- Merge multiple videos: Add a scene between @Video1 and @Video2, content: xxx
- No audio file? Reference video sound: You can directly reference the audio from a video input
- Continuous action generation: Character transitions from jumping to rolling, maintaining fluid motion + multiple reference images
2.3 Those Previously Impossible Video Problems? Now They're Actually Solvable!
2.3.1 Comprehensive Consistency Improvements
Keep the same character consistent across multiple shots by anchoring to a reference image. Character identity, clothing, features, and style are maintained throughout the video.
Case Example:
The man in @Image1 walks tiredly down the hallway, slowing his steps, finally stopping at his front door. Close-up on his face: he takes a deep breath, adjusts his emotions, and replaces the weariness with a relaxed expression. Close-up of him finding his keys and inserting them into the lock. After entering, his little daughter and a pet dog run to greet him with hugs. The interior is warm and cozy. Natural dialogue throughout.
2.3.2 High-Difficulty / Controllable Camera Movement & Precise Action Replication
Reference a video's exact camera work and replicate complex movements including Hitchcock zooms, orbit shots, mechanical arm tracking, and more.
Example: A man in a black suit flees through city streets, with multi-cut camera changes, maintained clothing consistency, and natural sound design.
Case Example:
Reference @Image1's male character. He is in @Image2's elevator. Completely reference @Video1's camera movements and the protagonist's facial expressions. Hitchcock zoom during the fear moment, then several orbit shots showing the elevator interior. Elevator doors open, follow shot walking out. Exterior scene references @Image3. The man looks around, referencing @Video1's mechanical arm multi-angle tracking of the character's gaze.
2.3.3 Creative Templates / Complex VFX: Precise Replication
Replicate transitions, advertising templates, and cinematic visual effects from reference videos. Swap characters while keeping the creative effects intact.
Case Example:
Replace @Video1's character with @Image1. @Image1 as the first frame. Character puts on VR sci-fi glasses. Reference @Video1's camera work: close orbit shot transitions from third-person to character's subjective POV. Travel through the VR glasses into @Image2's deep blue universe. Several spaceships shuttle toward the distance. Camera follows ships into @Image3's pixel world. Low-altitude flyover of pixel mountains where trees grow procedurally. Then upward angle, rapid shuttle to @Image4's pale green textured planet, camera skims the planet surface.
2.3.4 Model Creativity & Story Completion
Seedance 2.0 can auto-generate storylines from storyboard images or scripts, filling in creative details and narrative elements.
Case (Manga / Storyboard Interpretation):
Interpret @Image1 as a manga sequence from left to right, top to bottom. Keep the character dialogue consistent with the text in the images. Add special sound effects for panel transitions and key plot moments. Overall style: humorous and witty. Performance style references @Video1.
Case (Storyboard Script Generation):
Reference @Image1's documentary storyboard script. Follow @Image1's framing, shot sizes, camera movements, visuals, and copy to create a 15-second healing-style opening sequence about "Childhood Through the Four Seasons."
2.3.5 Video Extension
Extend existing videos forward or backward with smooth continuity. Set the generation duration to match the extension length.
Case (Forward Extension):
Extend @Video1 by 15 seconds. 1-5s: Light and shadow slowly slide across the wooden table and cup through venetian blinds. Tree branches sway gently. 6-10s: A coffee bean gently drifts down from the top of frame. Camera pushes toward the bean until the screen goes black. 11-15s: English text gradually appears: first line "Lucky Coffee", second line "Breakfast", third line "AM 7:00-10:00".
Case (Backward Extension / Prepending):
Extend backward 10s. In warm afternoon light, the camera starts from the corner with the awning fluttering in the breeze, slowly tilting down to daisies peeking out at the wall base...
2.3.6 More Accurate Tone, More Realistic Voice
Native audio-video synchronization supports lip-sync dialogue in 8+ languages (including Chinese, English, Japanese, Korean, and Chinese dialects). Voice cloning, dialogue generation, and sound effect design are all built in.
Case (Comedy Dialogue):
In the "Cat & Dog Roast Show", an emotionally expressive comedy segment: Cat host (licking paw, rolling eyes): "Who understands my suffering? This one next to me does nothing but wag his tail, destroy sofas, and con humans out of treats with those 'pet me I'm adorable' eyes..." Dog host (head tilted, tail wagging): "You're one to talk? You sleep 18 hours a day, wake up just to rub against humans' legs for canned food..." Sound: Comedy background music, exaggerated expressions, talk show pacing.
2.3.7 Stronger Shot Continuity (One-Take)
Generate seamless long takes that flow continuously across scenes with no cuts, following characters through multiple environments in a single unbroken shot.
Case (Spy Thriller One-Take):
Spy thriller style. @Image1 as the first frame. Camera tracks a woman in a red trench coat walking forward from the front. Full-shot follow. Passersby intermittently block the woman in red. She reaches a corner (reference @Image2's corner building). Fixed camera as the woman exits frame and disappears around the corner. A masked girl lurks at the corner, glaring menacingly (masked girl references @Image3). Camera pans forward to the woman in red as she walks into a mansion and vanishes (mansion references @Image4). No cuts throughout. One continuous take from start to finish.
2.3.8 Highly Usable Video Editing
Modify existing videos with precise control: swap characters, alter plots, add or remove elements while preserving the rest of the scene.
Case (Plot Subversion):
Subvert @Video1's plot: the man's expression shifts from tenderness to icy cruelty. In an unguarded moment, he shoves the female lead off the bridge into the water. The action is decisive, premeditated, without hesitation. The female lead falls with no scream, only disbelief in her eyes. She surfaces and screams: "You've been lying to me from the start!" The man stands on the bridge with a sinister smile, murmuring: "This is what your family owes mine."
Case (Character Replacement):
Replace @Video1's female lead singer with @Image1's male lead singer. Actions completely mimic the original video. No shot cuts. Band performs the music throughout.
Case (Element Addition):
Change the woman's hairstyle in @Video1 to long red hair. @Image1's great white shark slowly surfaces, half its head visible, lurking behind her.
2.3.9 Music Beat Matching
Synchronize visual rhythm precisely with music beats, matching keyframe positions, transitions, and character movements to the audio.
Case Example:
@Image1 @Image2 @Image3 @Image4 @Image5 @Image6 @Image7: match the keyframe positions and overall rhythm of @Video1 for beat-synced cuts. Characters should have more dynamic movement. Overall visual style more dreamlike with strong visual tension. Adjust shot sizes and add lighting changes based on music and visual needs.
2.3.10 Better Emotional Performance
Characters can now express nuanced emotions, from subtle tenderness to explosive rage, from quiet contemplation to joyful surprise. The model produces more natural and emotionally resonant performances.
Case (Dramatic Scene, Short Drama):
Scene (0-5s): Close-up on the character's reddened eyes, finger pointing accusingly, tears streaming down. Emotion on the edge of collapse. Dialogue 1 (Character A, choking with rage): "What exactly are you trying to take from me?" Scene (6-10s): The other character trembles, holding up evidence, red-eyed, stepping forward. Camera sweeps past background details. Dialogue 2 (Character B, urgent and choked): "I'm not deceiving you! This is what he entrusted to me!" Scene (11-15s): Evidence is revealed, Character A freezes; expression shifts from anger to shock, hands slowly rise. Sound: Urgent piano + static interference, sobbing, ending with a muffled voice blending in. Duration: Precise 15 seconds, every frame tight.
Prompt Template Library
Template: Product Ad (15s)
Reference @Video1's editing style and camera transitions. Replace @Video1's product with @Image1 as the hero product. Create a 15-second product showcase video. 0-3s: Product enters frame with dynamic rotation, close-up on surface texture and logo details. 4-8s: Multiple angle transitions (front, side, back) with product highlight scanning light effects. 9-12s: Product in lifestyle context showing usage scenario. 13-15s: Hero shot with brand tagline appearing, background music builds to resolution. Sound: Reference @Video1's background music. Add product interaction sound effects.
Template: E-Commerce Product Showcase
Deconstruct the reference image. Static camera. Hamburger suspended and rotating mid-air. Ingredients gently and precisely separate while maintaining shape and proportion. Smooth motion, no extra effects. Hamburger splits apart into golden sesame bun top, fresh green lettuce, dewy red tomato slices, two thick juicy beef patties with melting golden cheddar cheese, and soft bun base; all slowly descend and perfectly reassemble into a complete deluxe double cheeseburger. Throughout, cheese continues to melt and drip slowly, lettuce and tomato dewdrops glisten, maintaining ultimate appetizing food aesthetics.
Template: Dance Video (13s)
Have the character in @Image1 replicate the dance moves and beat-synced music from @Video1. Generate a 13-second video. Movements should be smooth with no stuttering or freezing.
Template: Fantasy / Xianxia Action (15s)
15-second xianxia high-intensity battle scene, warm gold-red tones. 0-3s: Low angle close-up on the protagonist's blue robe hem billowing in heat waves. Both hands grip a thunder-patterned greatsword, blade flashing crimson lightning. Molten lava churns on the ground. Demon soldiers roar and charge from the distance. Protagonist growls: "Today, with this blade, I vanquish your evil!" Sword chime and lava bubbling. 4-8s: Orbiting quick-cut, protagonist spins and swings the sword. Blade tears through air, releasing red shockwaves. Front-line demons are blasted apart into ash. Sword energy whistling and demon wailing. 9-12s: Low angle pull-back with slow motion, protagonist leaps skyward, blade condensing a massive thunderbolt arc striking down at the demon horde, lava splashing where the arc sweeps. 13-15s: Slow push close-up of protagonist landing and sheathing the sword. Robe settles with residual energy. Blade still flickering with lightning. Cold voice: "This realm's gate, you shall not cross." Freeze on a gate silhouette, audio fading to resonant tremor and wind.
Template: Science / Medical Education (15s)
Ultra-realistic 4K medical CGI, semi-transparent blue human upper body with clearly visible vascular system. Camera slowly pushes in, entering a clean artery. Blood flows smoothly, cool-toned clinical lighting creates a calming atmosphere. Mid-sequence: Symbolic sugar and fat particles from milk tea dissolve into the bloodstream. Camera tracks the blood flow. As blood viscosity increases, yellow lipid deposits gradually form on vessel walls. Final segment: Blood flow speed decreases, vessel lumen visibly narrows. Lighting shifts to slightly dimmer tones, creating an educational and cautionary atmosphere. 15-second health education clip.
Template: Scenery Montage with Music (15s)
@Image1 @Image2 @Image3 @Image4 @Image5 @Image6: landscape scene images. Reference @Video1's visual rhythm, inter-scene transitions, visual style, and music tempo for beat-synced editing.
Template: Video Fusion / Continuation
The particle-composed horse in @Video1 gradually becomes concrete as its particles densify, transitioning into @Video2. The running horse in @Video2 gradually transforms into @Video3 and slowly dissolves away. Ethereal visual atmosphere. Background audio: hoofbeats and futuristic particle sound effects.
Template: Long Video (>15s), Multi-Segment
## Long Video Prompt (Total ~30 seconds)
Theme: Sword cultivator on misty mountain vs. demons
Total segments: 2
Aspect ratio: 16:9
--- Segment 1 (0-15s): Normal generation ---
Duration: 15s
15-second xianxia shot.
0-5s: Overhead view of churning cloud sea over immortal mountains, camera slowly pushes down through the clouds.
6-10s: Sword cultivator stands at the cliff edge, back to camera, robes flowing in the wind. Dark energy rises in the distance.
11-15s: Cultivator slowly turns to face camera, draws sword, blade glows golden. Steely eyes, low voice: "They're here." Freeze on the cultivator holding the sword facing camera.
Handoff point: Cultivator facing camera with sword, dark energy churning behind.
--- Segment 2 (15-30s): Video extension ---
Upload Segment 1 as @Video1
Duration: 15s
Extend @Video1 by 15s.
0-5s: Continuing from the cultivator holding the sword, dozens of shadow beasts surge from the dark energy and dive toward him. Cultivator leaps to meet them.
6-10s: Aerial combat, sword energy criss-crossing, beasts slashed into ash particles. Orbiting quick-cuts.
11-15s: Cultivator lands, sheathes sword. Golden explosion particles drift behind him. Slow push to close-up on his profile. Audio fades.
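For targets longer than one generation, the segment lengths can be planned up front. The following is a rough sketch, not an official workflow: `plan_segments` is a hypothetical helper, it takes a whole number of seconds, and it assumes each extension is subject to the same 4-15 second range as a normal generation, which this guide does not state explicitly.

```python
import math


def plan_segments(total_s, max_len=15, min_len=4):
    """Split a target length into near-equal chunks of at most max_len seconds.

    The first chunk is a normal generation; each later chunk is a video
    extension of the previous result. With the default limits every chunk
    also stays at or above min_len.
    """
    if total_s < min_len:
        raise ValueError(f"total must be at least {min_len}s")
    count = math.ceil(total_s / max_len)
    base, extra = divmod(total_s, count)
    # Spread the remainder over the first chunks so lengths differ by at most 1s.
    lengths = [base + 1 if i < extra else base for i in range(count)]
    plan, start = [], 0
    for i, length in enumerate(lengths):
        mode = "normal generation" if i == 0 else "video extension"
        plan.append({"segment": i + 1, "start": start, "end": start + length,
                     "duration": length, "mode": mode})
        start += length
    return plan
```

For a 30-second target this yields two 15-second segments, matching the template above. Each segment still needs its own prompt and a clear handoff point, since the extension continues from the last frame of the previous clip.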
Style & Quality Modifiers
Append these to your prompts to enhance output quality:
Visual Style
- Cinematic quality, film grain, shallow depth of field
- 2.35:1 widescreen, 24fps
- Ink wash painting / Anime / Photorealistic
- High saturation neon colors, cool-warm contrast
- 4K medical CGI, semi-transparent visualization
- Ultra-fine CG animation
Mood & Atmosphere
- Tense and suspenseful
- Warm and healing
- Epic and grand
- Comedy with exaggerated expressions
- Documentary tone, restrained narration
- Dark fantasy / High-intensity xianxia
Audio Direction
- Background music: grand and majestic
- Sound effects: footsteps, crowd noise, car sounds
- Voice tone reference @Video1
- Beat-synced transitions matching music rhythm
- Footsteps, breathing, fabric rustling must be clear and beat-aligned
Common Mistakes to Avoid
Platform Specifications
| Specification | Details |
|---|---|
| Image Input | jpeg/png/webp/bmp/tiff/gif, up to 9 images, each <30MB |
| Video Input | mp4/mov, up to 3 videos, total 2-15s, each <50MB, 480p-720p |
| Audio Input | mp3/wav, up to 3 files, total ≤15s, each <15MB |
| Text Input | Natural language description |
| File Limit | Max 12 files total (images + videos + audio combined) |
| Duration | 4-15 seconds per generation |
| Sound Output | Built-in sound effects and music |
| Resolution | Up to 2K output |
Demo Videos
Image-to-Video: Realistic Action Generation
A single reference image + text prompt produces realistic human actions with natural cloth physics.
Multi-Cut Scene Generation
Automatic scene changes with maintained clothing and color consistency across cuts.
Multiple Reference Inputs
Up to 9 images and 3 videos as simultaneous inputs, with precise control over characters, outfits, and visual effects.
Reference Video Input Example
An example of video reference input used to control camera movement and action choreography.
Final Words
Seedance 2.0 represents a new paradigm in AI video generation, one where creators have true multimodal control over every aspect of the creative process. From a single text prompt to complex multi-reference compositions with precise camera work, sound design, and character consistency, the possibilities are vast.
We encourage you to experiment freely with different combinations of inputs and reference styles. The model is designed to understand creative intent and expand on your ideas while maintaining precise control where you need it.
Start creating at jimeng.jianying.com and bring your creative vision to life.
This guide is an English translation of ByteDance's official Seedance 2.0 user manual, originally published in Mandarin Chinese on ByteDance Lark Office. Content has been faithfully translated and adapted for English readers. All original media assets are preserved where available.