LTX-Video Prompting Guide: How to Use Lightricks’ Real-Time Open-Source AI Video Model (LTX2)

LTX-Video from Lightricks generates 30fps video, complete with dialogue, singing, and sound, faster than you can watch it. Here's how to prompt one of the most capable open-source video models.

LTX-Video is Lightricks' open-source, DiT-based (diffusion transformer) video model, which Lightricks describes as the first capable of generating high-quality video in real time. It produces 30fps output at 1216×704 resolution faster than playback speed. With models ranging from 2B to 13B parameters, built-in voice and dialogue generation, and ControlNet support (depth, pose, canny), it is one of the most feature-rich open-source AI video models available.

Developer: Lightricks
Available on Splice: Yes — splice.film.fun (as Lightricks LTX2).
Resolution: 1080p with audio.
Aspect ratios: 1:1 (square), 4:3, 16:9 (widescreen), 21:9 (ultrawide), 9:16 (vertical/stories), 3:4 (portrait).
Duration: 2, 3, 4, 5, 6, 7, 8, 9, or 10 seconds.
Features: First Frame input (image-to-video), Advanced settings, audio generation.

Prompt Structure

LTX-Video works best with a single flowing paragraph, not bullet points or fragmented instructions. Write it like a scene description, in present tense, flowing naturally from beginning to end. Aim for 4-8 descriptive sentences, fewer for very short clips.

The 6-Element Framework

Every prompt should cover these elements, woven into natural prose:

Element	What to Include	Key Vocabulary
1. Shot & Camera	Shot scale, camera movement, end position	"Slow dolly in," "handheld tracking," "camera pans right to reveal"
2. Scene & Setting	Location, time, lighting, atmosphere	"Dimly lit jazz club," "golden-hour light," "fog at ground level"
3. Subject & Character	Age, hairstyle, clothing, distinguishing features	Physical details, not abstract labels
4. Action	What happens, as a natural sequence	Present tense, flowing from beginning to end
5. Visual Style	Color palette, textures, film characteristics	"Film noir palette," "warm amber," "film grain"
6. Audio	Ambient sound, music, dialogue, voice quality	Sound descriptions, dialogue in quotes
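The framework can also be expressed as a small template. The Python below is a hypothetical sketch (the `ScenePrompt` class and its field names are illustrative assumptions, not part of any Lightricks or Splice tool): you write one or more present-tense sentences per element, and it joins them into a single paragraph with no labels or line breaks. The field order follows Example 3 (setting, subject, action, camera, style, audio) rather than the table's numbering, because that is how the guide's own prompts tend to read.

```python
from dataclasses import dataclass, fields


@dataclass
class ScenePrompt:
    """Hypothetical container for the six framework elements.

    Each field holds present-tense prose; leave a field empty to skip it.
    Field order sets the order of sentences in the final paragraph.
    """
    scene: str = ""    # 2. Scene & Setting
    subject: str = ""  # 3. Subject & Character
    action: str = ""   # 4. Action
    camera: str = ""   # 1. Shot & Camera (include the end-state)
    style: str = ""    # 5. Visual Style
    audio: str = ""    # 6. Audio

    def to_paragraph(self) -> str:
        # Join the elements in field order, then collapse all whitespace
        # (including line breaks and gaps left by empty fields) so the
        # result is exactly one flowing paragraph.
        text = " ".join(getattr(self, f.name) for f in fields(self))
        return " ".join(text.split())


jazz = ScenePrompt(
    scene="A dimly lit jazz club in 1950s New York. Smoke curls through amber spotlights.",
    subject="A saxophone player in a wrinkled suit plays with eyes closed.",
    action="He sways gently, then opens his eyes mid-phrase.",
    camera="Medium shot, slightly low angle. The camera pushes in slowly, settling on his face.",
    style="Warm amber spotlight, deep blue shadows, film noir palette.",
    audio="A slow saxophone melody mixes with the clink of glasses and low murmured conversation.",
)
print(jazz.to_paragraph())
```

The template only guarantees the shape of the prompt; the quality still comes from the physical, present-tense detail you put into each field.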

The Golden Rules

Present tense for everything: "walks" not "walked," "the camera pushes in" not "push the camera in"
Physical cues over emotional labels: "Her jaw tightens and she looks away" not "She feels sad"
Match detail to shot scale: Close-ups need specific textures and micro-expressions; wide shots need environment and atmosphere
Describe camera end-states: "The camera pushes in, settling on a close-up of her face" helps the model complete the motion accurately
One flowing paragraph: Not a shot list, not bullet points
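Some of these rules are mechanical enough to check automatically before you spend a generation. The Python below is a hypothetical linter sketch: the emotion-word list and the patterns are assumptions chosen for illustration, and it deliberately does not try to detect past tense or missing camera end-states, which need a human read.

```python
import re

# Hypothetical starter list of emotion labels; extend it for your own prompts.
EMOTION_LABELS = {"sad", "happy", "angry", "confused", "nervous", "scared", "lonely"}
# "Setting: ..." style fields, as in a structured shot list.
FIELD_LABEL = re.compile(r"\b(?:Setting|Subject|Action|Camera|Style|Audio|Lighting)\s*:", re.I)
# Lines that start with a bullet or a list number.
BULLET_LINE = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s", re.M)
# Instructions to the model ("Make her walk") instead of narration.
COMMAND = re.compile(r"\b(?:make|let)\s+(?:her|him|them|it)\b", re.I)


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for rule violations a simple pattern check can catch."""
    warnings = []
    # Single hard-wrapped line breaks are fine; blank lines or bullets are not.
    if re.search(r"\n\s*\n", prompt) or BULLET_LINE.search(prompt):
        warnings.append("use one flowing paragraph, not a list or several paragraphs")
    if FIELD_LABEL.search(prompt):
        warnings.append("drop 'Label:' fields and write the scene as prose")
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    for word in sorted(words & EMOTION_LABELS):
        warnings.append(f"'{word}' is an emotional label; describe physical cues instead")
    if COMMAND.search(prompt):
        warnings.append("use present-tense narration ('she walks'), not commands")
    return warnings


print(lint_prompt("A sad woman sits alone."))
```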

Prompt Examples

Example 1: Dialogue Scene (Screenplay Style)

LTX-Video excels at screenplay-format prompts for dialogue scenes:

EXT. SMALL TOWN STREET – MORNING – LIVE NEWS BROADCAST. The shot 
opens on a news reporter standing in front of cordoned-off cars, 
yellow caution tape fluttering behind him. Warm early sun reflects 
off the camera lens. The reporter, composed but visibly excited, 
looks directly into the camera, microphone in hand. "Thank you, 
Sylvia — this morning, here in the quiet town of New Castle, 
Vermont, black gold has been found!" He gestures toward the field 
behind him. The camera pans right, slowly revealing a construction 
site. With a sudden roar, a geyser of oil erupts from the ground. 
Workers cheer, the black stream glistening in the morning light. 
The camera shakes slightly through the chaos.

Why this works: Screenplay format (EXT. LOCATION – TIME), dialogue in quotes, camera direction embedded naturally, physical action described sequentially, audio implied by the scene.

Example 2: Animation with Voice

The camera opens in a calm, sunlit frog yoga studio. Warm morning 
light washes over the wooden floor as incense smoke drifts lazily 
in the air. The senior frog instructor sits cross-legged at the 
center, eyes closed, voice deep and calm. "We are one with the 
pond." All the frogs answer softly: "Ommm..." He smiles faintly. 
"We are one with the flies." A pause. The camera pans to one frog 
who twitches, eyes darting. Suddenly its tongue snaps out, catching 
a fly mid-air. The master exhales slowly, still serene. "But we 
do not chase the flies... not during class." The guilty frog lowers 
its head in shame, folding its hands back into a meditative pose.

Why this works: Animated style implied by subjects (talking frogs), dialogue drives the story, physical comedy described through action not labels, camera movement serves the comedic reveal.

Example 3: Cinematic Atmosphere

A dimly lit jazz club in 1950s New York. A saxophone player in 
a wrinkled suit plays a slow melody with eyes closed, swaying 
gently. Smoke curls through amber spotlights behind him. A few 
patrons at small tables cradle drinks in the blue shadows. Medium 
shot, slightly low angle. The camera pushes in slowly, settling 
on his face as he opens his eyes mid-phrase. Warm amber spotlight 
on subject, deep blue shadows, film noir palette. The saxophone 
melody fills the room, mixing with the clink of glasses and low 
murmured conversation. Intimate, melancholic.

Example 4: Commercial Energy

A sunlit rooftop pool overlooking a city skyline at golden hour. 
A woman in an orange swimsuit runs toward the edge and dives into 
crystal-clear water. The camera tracks the arc of her dive in 
slight slow motion, water droplets catching the light. City 
buildings and palm trees glow behind her under a clear sky. Low 
golden sun creates specular highlights dancing on the pool surface. 
Vivid saturated colors. The splash echoes against the rooftop 
walls, followed by the ambient hum of the city below. Energetic, 
aspirational.

Example 5: Horror Atmosphere

An abandoned Victorian hospital corridor stretches into darkness. 
An empty wheelchair sits in the center of the hallway. It begins 
to slowly roll forward on its own, wheels creaking on the tile 
floor. Peeling wallpaper lines the walls, broken ceiling tiles 
hang overhead, and a single fluorescent light flickers at the far 
end. Long shot, symmetrical composition. Camera completely static, 
locked off. Sickly green tint, deep black shadows. The only sound 
is the wheelchair's wheels and the electric buzz of the dying 
light. Dread, isolation.

Example 6: Nature Documentary

Dense Amazon canopy dappled with morning sunlight. A brilliant 
blue morpho butterfly lands delicately on a wet leaf, its wings 
slowly opening and closing. Water droplets glisten on the leaf 
surface. Extreme close-up, macro lens feel. Camera holds perfectly 
still. Rich saturated greens and electric blue. Natural diffused 
daylight filtering through the canopy. The ambient sound of 
distant birds and dripping water. Serene, detailed.

Voice and Dialogue

LTX-Video generates native audio, including spoken dialogue, singing, ambient sound, and music. Native audio is one of its standout features, shared with models such as Google's Veo 3.

Dialogue Rules

Place spoken words in quotation marks within the scene description
Specify language and accent if needed
Describe voice quality with physical terms, not just emotion

Voice Style Vocabulary

Style	Prompt Term
Authoritative	"Resonant voice with gravitas"
Excited	"Energetic announcer voice"
Creepy	"Distorted radio-style voice"
Robotic	"Flat robotic monotone"
Innocent	"Childlike curiosity in her voice"
Quiet	"She whispers" / "He mutters under his breath"
Loud	"He shouts" / "She screams"
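Because dialogue goes in quotation marks inside otherwise normal prose, it can help to build those sentences consistently. The helper below is hypothetical (`dialogue_sentence` is an illustrative name, not a Splice or Lightricks function): it combines a speaker, a voice phrase from the table above, an optional language, and the spoken line. The comma-separated sentence shape is only one reasonable pattern; the guide's own examples also weave dialogue directly into surrounding action.

```python
def dialogue_sentence(speaker: str, voice: str, line: str, language: str = "") -> str:
    """Build one prompt sentence that quotes a spoken line.

    speaker  -- who talks, described physically ("The old captain")
    voice    -- a voice-quality phrase, e.g. from the table above
    line     -- the words to be spoken; inner double quotes become single
                quotes so the outer quotation marks stay balanced
    language -- optional language or accent, e.g. "French"
    """
    spoken = line.strip().replace('"', "'")
    how = f"says in {language}" if language else "says"
    return f'{speaker}, {voice}, {how}: "{spoken}"'


print(dialogue_sentence("The old captain", "resonant voice with gravitas", "Hold the line."))
```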

Audio Prompting

Describe the soundscape as part of your scene:

✅ "The saxophone melody fills the room, mixing with clinking 
   glasses and low murmured conversation"

✅ "Wind howls through the broken windows. Distant thunder. 
   Rain patters on the concrete floor."

✅ "Upbeat electronic music pulses in the background as 
   the crowd cheers"

Characters can sing. LTX-Video handles musical performance:

A street musician sits on stone steps in a narrow European alley, 
strumming an acoustic guitar and singing a warm, gravelly folk 
melody. His voice echoes off the old stone walls. Golden afternoon 
light catches dust motes in the air. Medium shot, static camera.

What LTX-Video Excels At

Strength	Details
Cinematic compositions	Wide, medium, and close-up shots with thoughtful lighting, shallow depth of field, natural motion
Emotive human moments	Strong single-subject emotional expressions, subtle gestures, facial nuance
Atmosphere & setting	Fog, mist, golden-hour light, rain, reflections, ambient textures
Clear camera language	Explicit instructions: "slow dolly in," "handheld tracking," "camera pans right"
Stylized aesthetics	Painterly, noir, analog film, fashion editorial, pixelated animation, claymation, stop-motion
Lighting & mood	Backlighting, color palettes, rim light, flickering lamps, neon glow
Voice & dialogue	Characters talk and sing, multiple languages, ambient sound, music
Speed	Real-time generation enables rapid iteration: you can test many variations in the time slower models take to render one

What to Avoid

Avoid	Why	Do This Instead
Emotional labels	"Sad" or "confused" don't translate to visual output	"Eyes lowered, jaw tightened, she turns away"
Text and logos	Readable text is not currently reliable	Use text-in-video models like Wan 2.1 for this
Complex physics	Chaotic multi-body motion introduces artifacts	Keep physics simple; dancing is OK, demolition derby is not
Overloaded scenes	Too many characters or simultaneous actions reduce quality	One primary subject, one clear action
Conflicting lighting	Mixed light logic confuses scene interpretation	Commit to one lighting setup per generation
Overcomplicated prompts	Piling on detail all at once makes results harder to steer	Start simple: begin with 3-4 sentences and add detail only if needed
Past tense or commands	Model responds best to present tense narration	"She walks" not "She walked" or "Make her walk"

Style Vocabulary

Type	Styles
Animation	Stop-motion, 2D / 3D animation, claymation, hand-drawn
Stylized	Comic book, cyberpunk, 8-bit pixel, surreal, minimalist, painterly, illustrated
Cinematic	Period drama, film noir, fantasy, epic space opera, thriller, modern romance, experimental film, arthouse, documentary

Visual Details

Element	Terms
Lighting	Flickering candles, neon glow, natural sunlight, dramatic shadows, rim light, backlighting
Textures	Rough stone, smooth metal, worn fabric, glossy surfaces, wet pavement
Color	Vibrant, muted, monochromatic, high contrast, warm amber, cool blue
Atmosphere	Fog, rain, dust, smoke, particles, heat haze, mist
Film	Film grain, lens flares, pixelated edges, jittery stop-motion, shallow depth of field
Pacing	Slow motion, time-lapse, lingering shot, continuous shot, freeze-frame, fade-in/out
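Fast generation makes it practical to explore this vocabulary systematically instead of by hand. As a sketch, the Python below (the base prompt and option lists are illustrative assumptions) appends one choice from each visual dimension to a fixed base prompt and yields every combination. Because each dimension contributes exactly one option, every variant still commits to a single lighting setup.

```python
from itertools import product
from typing import Iterator, Sequence


def style_variants(base: str, *dimensions: Sequence[str]) -> Iterator[str]:
    """Yield the base prompt plus one option from each dimension, for every combination."""
    for combo in product(*dimensions):
        yield " ".join([base, *combo])


BASE = ("A street musician sits on stone steps in a narrow European alley, "
        "strumming an acoustic guitar. Medium shot, static camera.")
LIGHTING = ["Golden afternoon light catches dust motes in the air.",
            "Neon glow reflects off wet pavement."]
COLOR = ["Warm amber palette.", "Muted, cool blue palette."]

for prompt in style_variants(BASE, LIGHTING, COLOR):
    print(prompt)  # 2 lighting options x 2 color options = 4 prompts
```

Each extra dimension multiplies the number of prompts, so two or three dimensions with two or three options each is usually enough for one round of comparison.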

Using LTX2 on Splice

On Splice, LTX2 is available as Lightricks LTX2 with these settings:

Setting	Options
Resolution	1080p (always on)
Audio	Enabled — dialogue, sound, music all generated
Aspect ratio	1:1, 4:3, 16:9, 21:9, 9:16, 3:4
Duration	2–10 seconds (2, 3, 4, 5, 6, 7, 8, 9, 10)
First Frame	Toggle on to use an image as the starting frame (I2V)
Advanced	Additional generation settings

Choosing Your Aspect Ratio

Ratio	Use Case
16:9	Cinematic widescreen — films, YouTube, presentations
21:9	Ultra-cinematic — letterbox feel, epic landscapes
9:16	Vertical — TikTok, Instagram Reels, Stories
1:1	Square — Instagram feed, social thumbnails
4:3	Classic — retro film feel, documentary
3:4	Portrait — character-focused, mobile-friendly

Tip: Match aspect ratio to your shot type. Wide establishing shots work best in 16:9 or 21:9. Close-up portraits work in 3:4 or 1:1. Vertical content (9:16) needs composition designed for tall frames — stack elements vertically, not horizontally.
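The tip above amounts to a small lookup table. The Python below is a hypothetical sketch (the shot-type keys are labels invented for this example) that returns the suggested ratios for a shot type and classifies any ratio as landscape, portrait, or square, which is a quick reminder of whether you need to compose horizontally or stack elements vertically.

```python
# Shot-type guidance from the tip above; the keys are hypothetical labels.
RATIOS_BY_SHOT = {
    "wide": ("16:9", "21:9"),    # establishing shots, landscapes
    "close-up": ("3:4", "1:1"),  # portraits, character focus
    "vertical": ("9:16",),       # TikTok, Reels, Stories
}


def orientation(ratio: str) -> str:
    """Classify a 'W:H' ratio string as landscape, portrait, or square."""
    w, h = (int(x) for x in ratio.split(":"))
    if w > h:
        return "landscape"
    if w < h:
        return "portrait"
    return "square"


def suggest_ratios(shot_type: str) -> tuple[str, ...]:
    """Return the ratios suggested for a shot type."""
    try:
        return RATIOS_BY_SHOT[shot_type]
    except KeyError:
        raise ValueError(
            f"unknown shot type {shot_type!r}; try one of {sorted(RATIOS_BY_SHOT)}"
        ) from None


for r in suggest_ratios("close-up"):
    print(r, orientation(r))
```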

First Frame (Image-to-Video)

Toggle First Frame on to upload a starting image. LTX2 will animate from that frame.

When using First Frame, your prompt should describe motion and action only — don't redescribe what's already in the image:

✅ "She turns to camera and smiles. Wind catches her hair. 
   Warm afternoon light."

❌ "A woman with brown hair wearing a blue dress standing 
   in a garden..." (the image already shows this)

Choosing Your Duration

Duration	Best For	Prompt Complexity
2–3s	Micro-moments — a glance, a gesture, a reaction	2-3 sentences, one action
4–5s	Single scene beats — a short dialogue line, one camera move	4-5 sentences
6–8s	Full scene moments — action + reaction, camera choreography	5-7 sentences, full 6-element
9–10s	Extended takes — multi-beat scenes, dialogue exchanges	6-8 sentences, screenplay style
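The sentence ranges in this table can be turned into a rough length check. This Python sketch is hypothetical: the budgets are copied from the table, while the sentence counter is a crude heuristic that counts terminal punctuation, so abbreviations like "Dr." or quoted dialogue containing several sentences will inflate the count.

```python
import re

# (shortest clip, longest clip) in seconds -> (min, max) sentences, from the table above
SENTENCE_BUDGETS = {
    (2, 3): (2, 3),
    (4, 5): (4, 5),
    (6, 8): (5, 7),
    (9, 10): (6, 8),
}

# A run of . ! or ?, optionally closed by a quote mark, then whitespace or end of text.
_SENTENCE_END = re.compile(r"[.!?]+[\"'\u201d]?(?=\s|$)")


def sentence_budget(seconds: int) -> tuple[int, int]:
    for (shortest, longest), budget in SENTENCE_BUDGETS.items():
        if shortest <= seconds <= longest:
            return budget
    raise ValueError(f"expected a whole number of seconds from 2 to 10, got {seconds}")


def count_sentences(prompt: str) -> int:
    return len(_SENTENCE_END.findall(prompt))


def length_feedback(prompt: str, seconds: int) -> str:
    low, high = sentence_budget(seconds)
    n = count_sentences(prompt)
    if n < low:
        return f"{n} sentences: consider adding detail (aim for {low}-{high})"
    if n > high:
        return f"{n} sentences: consider trimming (aim for {low}-{high})"
    return f"{n} sentences: within the {low}-{high} range"


print(length_feedback("She turns. She smiles.", 2))
```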

Tips:

Match prompt length to duration — a 2s clip doesn't need 8 sentences; a 10s clip needs the detail
Shorter = sharper — 2-3s clips tend to have higher visual quality and coherence
Longer = more narrative — 8-10s gives room for dialogue, camera movement, and action sequences
Start mid-action for punchy short clips: "She's already mid-stride" not "She starts to walk"
End states always matter: "Then holds gaze" prevents motion from feeling unfinished
Build sequences by generating multiple clips and editing them together in Splice

Common Mistakes

❌ Writing bullet-point prompts

Bad: "Setting: jazz club. Subject: saxophone player. 
Action: playing. Camera: push in."

Good: "A dimly lit jazz club in 1950s New York. A saxophone 
player in a wrinkled suit plays a slow melody with eyes closed, 
swaying gently. The camera pushes in slowly, settling on his 
face as he opens his eyes mid-phrase."

LTX-Video reads prompts as flowing narrative, not structured data.

❌ Using emotional labels instead of physical cues

Bad: "A sad woman sits alone."
Good: "A woman sits alone at the table, eyes lowered, fingers 
tracing the rim of an empty glass. She exhales slowly and 
looks toward the rain-streaked window."

❌ Forgetting the audio

Bad: "A street performer plays guitar."
Good: "A street performer strums an acoustic guitar, the warm 
melody echoing off stone walls. Passersby murmur in the 
background. A distant church bell rings."

❌ Skipping camera end-states

Bad: "The camera pushes in."
Good: "The camera pushes in slowly, settling on a close-up of 
her face as her expression shifts."

Describing where the camera ends up helps LTX-Video complete the motion cleanly.
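This mistake is concrete enough to catch with a simple pattern check. The Python below is a hypothetical heuristic: the camera-move and end-state phrase lists are assumptions drawn from this guide's own vocabulary, so it will miss noun phrasings such as "slow dolly in" and any end-state wording it does not list.

```python
import re

# Hypothetical phrase lists built from this guide's vocabulary.
CAMERA_MOVES = re.compile(
    r"\bcamera\s+(?:pushes|pulls|pans|tilts|dollies|tracks|orbits|cranes)\b", re.I
)
END_STATES = re.compile(
    r"\b(?:settl(?:es|ing)|comes? to rest|holds? on|ending on|reveal(?:s|ing)?)\b", re.I
)


def missing_end_state(prompt: str) -> bool:
    """True when the prompt moves the camera but never says where it stops."""
    return bool(CAMERA_MOVES.search(prompt)) and not END_STATES.search(prompt)


print(missing_end_state("The camera pushes in."))
```

A static, locked-off camera never triggers the warning, since there is no motion to complete.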

Pro Tips

Write a scene, not a shot list — One flowing paragraph in present tense outperforms structured prompts
Physical cues over emotion words — "Jaw tightens, eyes narrow" beats "angry"
Dialogue in quotes works — Characters can talk, argue, joke, and sing
Camera end-states improve motion — "Settles on," "comes to rest at," "holds on" tells the model where to stop
Screenplay format for dialogue scenes — EXT. LOCATION – TIME with character lines works surprisingly well
Mood word at the end — A single word like "melancholic" or "triumphant" shifts the entire generation
First Frame for maximum control — Generate your perfect still with an image model, then animate it with LTX2
Describe the soundscape — Ambient audio, music, voice quality — LTX2 generates it all
Match prompt length to duration — 2s clips need 2-3 sentences; 10s clips need 6-8
Use short durations for quality, long for narrative — 2-3s is sharper; 8-10s gives room for dialogue and camera moves

Ready to put these techniques into practice? Try Splice — film.fun's AI Creator Studio. Generate video, edit in the browser, and bring your stories to life. Learn more at academy.
