LTX-Video Prompting Guide: Open-Source Cinematic Video with Voice and Real-Time Speed

Film Fun Academy

February 23, 2026

LTX-Video from Lightricks generates 30fps video — with dialogue, singing, and sound — faster than you can watch it. Here's how to prompt the most capable open-source video model.

LTX-Video is Lightricks' open-source DiT-based video model and the first capable of generating high-quality video in real time. It produces 30fps output at 1216×704 resolution faster than playback speed. With models ranging from 2B to 13B parameters, built-in voice/dialogue generation, and ControlNet support (depth, pose, canny), it's the most feature-rich open-source AI video model available.

Developer: Lightricks
Available on Splice: Yes — splice.film.fun (as Lightricks LTX2).
Resolution: 1080p with audio.
Aspect ratios: 1:1 (square), 4:3, 16:9 (widescreen), 21:9 (ultrawide), 9:16 (vertical/stories), 3:4 (portrait).
Duration: 2, 3, 4, 5, 6, 7, 8, 9, or 10 seconds.
Features: First Frame input (image-to-video), Advanced settings, audio generation.


Prompt Structure

LTX-Video works best with a single flowing paragraph — not bullet points or fragmented instructions. Write it like a scene description, present tense, flowing naturally from beginning to end. Aim for 4-8 descriptive sentences.

The 6-Element Framework

Every prompt should cover these elements, woven into natural prose:

Element | What to Include | Key Vocabulary
1. Shot & Camera | Shot scale, camera movement, end position | "Slow dolly in," "handheld tracking," "camera pans right to reveal"
2. Scene & Setting | Location, time, lighting, atmosphere | "Dimly lit jazz club," "golden-hour light," "fog at ground level"
3. Subject & Character | Age, hairstyle, clothing, distinguishing features | Physical details, not abstract labels
4. Action | What happens, as a natural sequence | Present tense, flowing from beginning to end
5. Visual Style | Color palette, textures, film characteristics | "Film noir palette," "warm amber," "film grain"
6. Audio | Ambient sound, music, dialogue, voice quality | Sound descriptions, dialogue in quotes
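
One way to keep all six elements in view while still ending up with a single paragraph is to fill them in separately and join them afterwards. The sketch below is only an illustration: the `ScenePrompt` class and its field names are hypothetical and not part of any LTX-Video or Splice API, and the sentence order simply mirrors the jazz-club example later in this guide.

```python
from dataclasses import dataclass


@dataclass
class ScenePrompt:
    """Hypothetical container for the six elements (not an LTX-Video API)."""
    shot_and_camera: str
    scene_and_setting: str
    subject: str
    action: str
    visual_style: str
    audio: str

    def to_paragraph(self) -> str:
        # Order is an assumption: setting, subject, action, camera, style, audio.
        parts = [
            self.scene_and_setting,
            self.subject,
            self.action,
            self.shot_and_camera,
            self.visual_style,
            self.audio,
        ]
        sentences = []
        for part in parts:
            text = part.strip()
            if not text:
                continue  # skip elements that were left empty
            if text[-1] not in '.!?"':
                text += "."  # make sure every element ends as a sentence
            sentences.append(text)
        # Join with single spaces so the result is one flowing paragraph.
        return " ".join(sentences)


jazz = ScenePrompt(
    shot_and_camera="Medium shot, slightly low angle; the camera pushes in slowly, settling on his face",
    scene_and_setting="A dimly lit jazz club in 1950s New York",
    subject="A saxophone player in a wrinkled suit stands under an amber spotlight",
    action="He plays a slow melody with eyes closed, swaying gently",
    visual_style="Warm amber light, deep blue shadows, film noir palette",
    audio="The saxophone melody mixes with the clink of glasses and low murmured conversation",
)
print(jazz.to_paragraph())
```

The joined text is a starting draft; read it aloud and smooth the transitions by hand so it flows like a scene description rather than six stitched fragments.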

The Golden Rules

  • Present-tense narration for everything, never past tense or commands: "walks" not "walked," "the camera pushes in" not "push the camera in"
  • Physical cues over emotional labels: "Her jaw tightens and she looks away" not "She feels sad"
  • Match detail to shot scale: Close-ups need specific textures and micro-expressions; wide shots need environment and atmosphere
  • Describe camera end-states: "The camera pushes in, settling on a close-up of her face" helps the model complete the motion accurately
  • One flowing paragraph: Not a shot list, not bullet points
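
The rules above can be turned into a rough pre-flight check before you spend a generation. This is a minimal heuristic sketch, not a tool that ships with LTX-Video: the function name `lint_prompt`, the word list, and the end-state phrases are all assumptions chosen for illustration, and simple pattern matching will miss plenty of cases.

```python
import re

# Illustrative word list (an assumption); extend it for your own prompts.
EMOTION_LABELS = {"sad", "happy", "angry", "confused", "scared", "nervous"}

# Lines that start like list items, or "Field:" style labels.
BULLET_PATTERN = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+", re.MULTILINE)
FIELD_PATTERN = re.compile(r"\b(?:Setting|Subject|Action|Camera|Style|Audio):", re.IGNORECASE)

# Phrases that suggest the camera move has a stated end position.
END_STATE_PATTERN = re.compile(
    r"\b(?:settl\w+|comes? to rest|holds?|static|locked|reveal\w*)\b", re.IGNORECASE
)


def lint_prompt(prompt: str) -> list:
    """Return a list of warnings; an empty list means no rule was tripped."""
    warnings = []
    if BULLET_PATTERN.search(prompt) or FIELD_PATTERN.search(prompt):
        warnings.append("Looks like a shot list; rewrite as one flowing paragraph.")
    if "\n\n" in prompt.strip():
        warnings.append("Multiple paragraphs; merge into one.")
    words = set(re.findall(r"[a-z']+", prompt.lower()))
    labels = sorted(words & EMOTION_LABELS)
    if labels:
        warnings.append(
            "Emotional labels (" + ", ".join(labels) + "); describe physical cues instead."
        )
    mentions_camera = re.search(r"\bcamera\b", prompt, re.IGNORECASE)
    if mentions_camera and not END_STATE_PATTERN.search(prompt):
        warnings.append("Camera move has no end-state; say where it settles.")
    return warnings


for warning in lint_prompt("A sad woman sits alone. The camera pushes in."):
    print(warning)
```

Treat the warnings as reminders rather than verdicts; a prompt that passes can still be weak, and a flagged one may be fine in context.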

Prompt Examples

Example 1: Dialogue Scene (Screenplay Style)

LTX-Video excels at screenplay-format prompts for dialogue scenes:

EXT. SMALL TOWN STREET – MORNING – LIVE NEWS BROADCAST. The shot 
opens on a news reporter standing in front of cordoned-off cars, 
yellow caution tape fluttering behind him. Warm early sun reflects 
off the camera lens. The reporter, composed but visibly excited, 
looks directly into the camera, microphone in hand. "Thank you, 
Sylvia — this morning, here in the quiet town of New Castle, 
Vermont, black gold has been found!" He gestures toward the field 
behind him. The camera pans right, slowly revealing a construction 
site. With a sudden roar, a geyser of oil erupts from the ground. 
Workers cheer, the black stream glistening in the morning light. 
The camera shakes slightly through the chaos.

Why this works: Screenplay format (EXT. LOCATION – TIME), dialogue in quotes, camera direction embedded naturally, physical action described sequentially, audio implied by the scene.
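
If you write many dialogue scenes, a tiny helper can keep the scene heading consistent. The sketch below is purely illustrative: `slugline` and `screenplay_prompt` are hypothetical names, and the heading layout is inferred from the example above rather than from any documented LTX-Video requirement.

```python
def slugline(location: str, time_of_day: str, interior: bool = False, note: str = "") -> str:
    """Build a heading such as 'EXT. SMALL TOWN STREET – MORNING.'"""
    prefix = "INT." if interior else "EXT."
    parts = [location.upper(), time_of_day.upper()]
    if note:
        parts.append(note.upper())
    # The en dash separator matches the example prompt above.
    return prefix + " " + " – ".join(parts) + "."


def screenplay_prompt(heading: str, beats: list) -> str:
    """Join the heading and the scene beats into one paragraph."""
    cleaned = [beat.strip() for beat in beats if beat.strip()]
    return " ".join([heading] + cleaned)


heading = slugline("Small town street", "morning", note="Live news broadcast")
print(screenplay_prompt(heading, [
    "The shot opens on a news reporter standing in front of cordoned-off cars.",
    '"Thank you, Sylvia."',
    "The camera pans right, slowly revealing a construction site.",
]))
```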

Example 2: Animation with Voice

The camera opens in a calm, sunlit frog yoga studio. Warm morning 
light washes over the wooden floor as incense smoke drifts lazily 
in the air. The senior frog instructor sits cross-legged at the 
center, eyes closed, voice deep and calm. "We are one with the 
pond." All the frogs answer softly: "Ommm..." He smiles faintly. 
"We are one with the flies." A pause. The camera pans to one frog 
who twitches, eyes darting. Suddenly its tongue snaps out, catching 
a fly mid-air. The master exhales slowly, still serene. "But we 
do not chase the flies... not during class." The guilty frog lowers 
its head in shame, folding its hands back into a meditative pose.

Why this works: Animated style implied by subjects (talking frogs), dialogue drives the story, physical comedy described through action not labels, camera movement serves the comedic reveal.

Example 3: Cinematic Atmosphere

A dimly lit jazz club in 1950s New York. A saxophone player in 
a wrinkled suit plays a slow melody with eyes closed, swaying 
gently. Smoke curls through amber spotlights behind him. A few 
patrons at small tables cradle drinks in the blue shadows. Medium 
shot, slightly low angle. The camera pushes in slowly, settling 
on his face as he opens his eyes mid-phrase. Warm amber spotlight 
on subject, deep blue shadows, film noir palette. The saxophone 
melody fills the room, mixing with the clink of glasses and low 
murmured conversation. Intimate, melancholic.

Example 4: Commercial Energy

A sunlit rooftop pool overlooking a city skyline at golden hour. 
A woman in an orange swimsuit runs toward the edge and dives into 
crystal-clear water. The camera tracks the arc of her dive in 
slight slow motion, water droplets catching the light. City 
buildings and palm trees glow behind her under a clear sky. Low 
golden sun creates specular highlights dancing on the pool surface. 
Vivid saturated colors. The splash echoes against the rooftop 
walls, followed by the ambient hum of the city below. Energetic, 
aspirational.

Example 5: Horror Atmosphere

An abandoned Victorian hospital corridor stretches into darkness. 
An empty wheelchair sits in the center of the hallway. It begins 
to slowly roll forward on its own, wheels creaking on the tile 
floor. Peeling wallpaper lines the walls, broken ceiling tiles 
hang overhead, and a single fluorescent light flickers at the far 
end. Long shot, symmetrical composition. Camera completely static, 
locked off. Sickly green tint, deep black shadows. The only sound 
is the wheelchair's wheels and the electric buzz of the dying 
light. Dread, isolation.

Example 6: Nature Documentary

Dense Amazon canopy dappled with morning sunlight. A brilliant 
blue morpho butterfly lands delicately on a wet leaf, its wings 
slowly opening and closing. Water droplets glisten on the leaf 
surface. Extreme close-up, macro lens feel. Camera holds perfectly 
still. Rich saturated greens and electric blue. Natural diffused 
daylight filtering through the canopy. The ambient sound of 
distant birds and dripping water. Serene, detailed.

Voice and Dialogue

LTX-Video generates native audio including spoken dialogue, singing, ambient sound, and music. Native audio is one of its standout features, a capability it shares with Veo 3.

Dialogue Rules

  • Place spoken words in quotation marks within the scene description
  • Specify language and accent if needed
  • Describe voice quality with physical terms, not just emotion

Voice Style Vocabulary

Style | Prompt Term
Authoritative | "Resonant voice with gravitas"
Excited | "Energetic announcer voice"
Creepy | "Distorted radio-style voice"
Robotic | "Flat robotic monotone"
Innocent | "Childlike curiosity in her voice"
Quiet | "She whispers" / "He mutters under his breath"
Loud | "He shouts" / "She screams"
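
The dialogue rules and voice vocabulary combine naturally into a small formatter. The following is a sketch under stated assumptions: `spoken_line` and the `VOICE_STYLES` mapping are hypothetical helpers, only four of the table's terms are included, and the "speaks in ..." phrasing is just one plausible way to word the sentence.

```python
from typing import Optional

# Prompt terms copied from the voice style table; keys are illustrative.
VOICE_STYLES = {
    "authoritative": "resonant voice with gravitas",
    "excited": "energetic announcer voice",
    "creepy": "distorted radio-style voice",
    "robotic": "flat robotic monotone",
}


def spoken_line(
    speaker: str,
    line: str,
    style: Optional[str] = None,
    language: Optional[str] = None,
) -> str:
    """Return one sentence with the spoken words in quotation marks."""
    sentence = speaker + " speaks"
    if language:
        sentence += " in " + language
    if style:
        if style not in VOICE_STYLES:
            raise ValueError("Unknown voice style: " + repr(style))
        sentence += ", " + VOICE_STYLES[style]
    # Strip any quotes the caller already added so they are not doubled.
    clean = line.strip().strip('"')
    return sentence + ': "' + clean + '"'


print(spoken_line("The reporter", "Black gold has been found!", style="excited", language="English"))
```

Drop the returned sentence into the scene paragraph at the moment the character speaks, so the dialogue stays embedded in the action.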

Audio Prompting

Describe the soundscape as part of your scene:

✅ "The saxophone melody fills the room, mixing with clinking 
   glasses and low murmured conversation"

✅ "Wind howls through the broken windows. Distant thunder. 
   Rain patters on the concrete floor."

✅ "Upbeat electronic music pulses in the background as 
   the crowd cheers"

Characters can sing. LTX-Video handles musical performance:

A street musician sits on stone steps in a narrow European alley, 
strumming an acoustic guitar and singing a warm, gravelly folk 
melody. His voice echoes off the old stone walls. Golden afternoon 
light catches dust motes in the air. Medium shot, static camera.

What LTX-Video Excels At

Strength | Details
Cinematic compositions | Wide, medium, and close-up shots with thoughtful lighting, shallow depth of field, natural motion
Emotive human moments | Strong single-subject emotional expressions, subtle gestures, facial nuance
Atmosphere & setting | Fog, mist, golden-hour light, rain, reflections, ambient textures
Clear camera language | Explicit instructions: "slow dolly in," "handheld tracking," "camera pans right"
Stylized aesthetics | Painterly, noir, analog film, fashion editorial, pixelated animation, claymation, stop-motion
Lighting & mood | Backlighting, color palettes, rim light, flickering lamps, neon glow
Voice & dialogue | Characters talk and sing, multiple languages, ambient sound, music
Speed | Real-time generation enables rapid iteration; test 10 variations in the time other models take to produce one

What to Avoid

Avoid | Why | Do This Instead
Emotional labels | "Sad" or "confused" don't translate to visual output | "Eyes lowered, jaw tightened, she turns away"
Text and logos | Readable text is not currently reliable | Use text-in-video models like Wan 2.1 for this
Complex physics | Chaotic multi-body motion introduces artifacts | Keep physics simple; dancing is OK, demolition derby is not
Overloaded scenes | Too many characters or simultaneous actions reduce quality | One primary subject, one clear action
Conflicting lighting | Mixed light logic confuses scene interpretation | Commit to one lighting setup per generation
Overcomplicated prompts | Start simple and layer complexity gradually | Begin with 3-4 sentences, add detail only if needed
Past tense or commands | Model responds best to present tense narration | "She walks" not "She walked" or "Make her walk"

Style Vocabulary

Categories

Type | Styles
Animation | Stop-motion, 2D / 3D animation, claymation, hand-drawn
Stylized | Comic book, cyberpunk, 8-bit pixel, surreal, minimalist, painterly, illustrated
Cinematic | Period drama, film noir, fantasy, epic space opera, thriller, modern romance, experimental film, arthouse, documentary

Visual Details

Element | Terms
Lighting | Flickering candles, neon glow, natural sunlight, dramatic shadows, rim light, backlighting
Textures | Rough stone, smooth metal, worn fabric, glossy surfaces, wet pavement
Color | Vibrant, muted, monochromatic, high contrast, warm amber, cool blue
Atmosphere | Fog, rain, dust, smoke, particles, heat haze, mist
Film | Film grain, lens flares, pixelated edges, jittery stop-motion, shallow depth of field
Pacing | Slow motion, time-lapse, lingering shot, continuous shot, freeze-frame, fade-in/out

Using LTX2 on Splice

On Splice, LTX2 is available as Lightricks LTX2 with these settings:

Setting | Options
Resolution | 1080p (always on)
Audio | Enabled — dialogue, sound, music all generated
Aspect ratio | 1:1, 4:3, 16:9, 21:9, 9:16, 3:4
Duration | 2–10 seconds (2, 3, 4, 5, 6, 7, 8, 9, 10)
First Frame | Toggle on to use an image as the starting frame (I2V)
Advanced | Additional generation settings

Choosing Your Aspect Ratio

Ratio | Use Case
16:9 | Cinematic widescreen — films, YouTube, presentations
21:9 | Ultra-cinematic — letterbox feel, epic landscapes
9:16 | Vertical — TikTok, Instagram Reels, Stories
1:1 | Square — Instagram feed, social thumbnails
4:3 | Classic — retro film feel, documentary
3:4 | Portrait — character-focused, mobile-friendly

Tip: Match aspect ratio to your shot type. Wide establishing shots work best in 16:9 or 21:9. Close-up portraits work in 3:4 or 1:1. Vertical content (9:16) needs composition designed for tall frames — stack elements vertically, not horizontally.
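
The tip above can be written down as a small lookup, which is handy if you plan shots in a script or spreadsheet before generating. This is a minimal sketch: the shot-type labels and the function names are assumptions made for illustration, and the ratio list is simply the set of options listed for Splice in this guide.

```python
from fractions import Fraction

# Aspect ratios listed for LTX2 on Splice in this guide.
SPLICE_RATIOS = ["1:1", "4:3", "16:9", "21:9", "9:16", "3:4"]

# Shot-type labels are hypothetical; the pairings follow the tip above.
SHOT_TO_RATIOS = {
    "wide establishing": ["16:9", "21:9"],
    "close-up portrait": ["3:4", "1:1"],
    "vertical social": ["9:16"],
}


def orientation(ratio: str) -> str:
    """Classify a 'W:H' ratio as landscape, portrait, or square."""
    width, height = (int(value) for value in ratio.split(":"))
    shape = Fraction(width, height)
    if shape > 1:
        return "landscape"
    if shape < 1:
        return "portrait"
    return "square"


def suggest_ratios(shot_type: str) -> list:
    """Return the suggested ratios for a shot type, or raise if unknown."""
    if shot_type not in SHOT_TO_RATIOS:
        raise ValueError("Unknown shot type: " + repr(shot_type))
    return SHOT_TO_RATIOS[shot_type]


for ratio in suggest_ratios("wide establishing"):
    print(ratio, orientation(ratio))
```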

First Frame (Image-to-Video)

Toggle First Frame on to upload a starting image. LTX2 will animate from that frame.

When using First Frame, your prompt should describe motion and action only — don't redescribe what's already in the image:

✅ "She turns to camera and smiles. Wind catches her hair. 
   Warm afternoon light."

❌ "A woman with brown hair wearing a blue dress standing 
   in a garden..." (the image already shows this)

Choosing Your Duration

Duration | Best For | Prompt Complexity
2–3s | Micro-moments — a glance, a gesture, a reaction | 2-3 sentences, one action
4–5s | Single scene beats — a short dialogue line, one camera move | 4-5 sentences
6–8s | Full scene moments — action + reaction, camera choreography | 5-7 sentences, full 6-element
9–10s | Extended takes — multi-beat scenes, dialogue exchanges | 6-8 sentences, screenplay style

Tips:

  • Match prompt length to duration — a 2s clip doesn't need 8 sentences; a 10s clip needs the detail
  • Shorter = sharper — 2-3s clips tend to have higher visual quality and coherence
  • Longer = more narrative — 8-10s gives room for dialogue, camera movement, and action sequences
  • Start mid-action for punchy short clips: "She's already mid-stride" not "She starts to walk"
  • End states always matter: "Then holds gaze" prevents motion from feeling unfinished
  • Build sequences by generating multiple clips and editing them together in Splice
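
To check whether a prompt's length suits the clip length, the duration table can be encoded directly. The sketch below assumes a deliberately naive sentence counter (it counts runs of `.`, `!`, or `?`, so abbreviations such as "EXT." and quoted dialogue will inflate the number); the function names are hypothetical.

```python
import re

# (min seconds, max seconds, min sentences, max sentences), from the duration table.
DURATION_GUIDE = [
    (2, 3, 2, 3),
    (4, 5, 4, 5),
    (6, 8, 5, 7),
    (9, 10, 6, 8),
]


def recommended_sentences(seconds: int) -> tuple:
    """Return the (min, max) sentence range suggested for a clip duration."""
    for low, high, min_sentences, max_sentences in DURATION_GUIDE:
        if low <= seconds <= high:
            return (min_sentences, max_sentences)
    raise ValueError("Duration must be between 2 and 10 seconds")


def count_sentences(prompt: str) -> int:
    """Naive count: terminal punctuation followed by whitespace or end of text."""
    return len(re.findall(r"[.!?]+[\"')]*(?:\s|$)", prompt.strip()))


def fits_duration(prompt: str, seconds: int) -> bool:
    """True when the sentence count falls inside the suggested range."""
    low, high = recommended_sentences(seconds)
    return low <= count_sentences(prompt) <= high


short_prompt = "She turns to camera and smiles. Wind catches her hair. Warm afternoon light."
print(count_sentences(short_prompt), fits_duration(short_prompt, 3), fits_duration(short_prompt, 10))
```

The ranges are guidance from the table, not hard limits, so use the result as a nudge to trim or expand rather than a pass/fail gate.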

Common Mistakes

❌ Writing bullet-point prompts

Bad: "Setting: jazz club. Subject: saxophone player. 
Action: playing. Camera: push in."

Good: "A dimly lit jazz club in 1950s New York. A saxophone 
player in a wrinkled suit plays a slow melody with eyes closed, 
swaying gently. The camera pushes in slowly, settling on his 
face as he opens his eyes mid-phrase."

LTX-Video reads prompts as flowing narrative, not structured data.

❌ Using emotional labels instead of physical cues

Bad: "A sad woman sits alone."
Good: "A woman sits alone at the table, eyes lowered, fingers 
tracing the rim of an empty glass. She exhales slowly and 
looks toward the rain-streaked window."

❌ Forgetting the audio

Bad: "A street performer plays guitar."
Good: "A street performer strums an acoustic guitar, the warm 
melody echoing off stone walls. Passersby murmur in the 
background. A distant church bell rings."

❌ Skipping camera end-states

Bad: "The camera pushes in."
Good: "The camera pushes in slowly, settling on a close-up of 
her face as her expression shifts."

Describing where the camera ends up helps LTX-Video complete the motion cleanly.


Pro Tips

  1. Write a scene, not a shot list — One flowing paragraph in present tense outperforms structured prompts
  2. Physical cues over emotion words — "Jaw tightens, eyes narrow" beats "angry"
  3. Dialogue in quotes works — Characters can talk, argue, joke, and sing
  4. Camera end-states improve motion — "Settles on," "comes to rest at," "holds on" tells the model where to stop
  5. Screenplay format for dialogue scenes — EXT. LOCATION – TIME with character lines works surprisingly well
  6. Mood word at the end — A single word like "melancholic" or "triumphant" shifts the entire generation
  7. First Frame for maximum control — Generate your perfect still with an image model, then animate it with LTX2
  8. Describe the soundscape — Ambient audio, music, voice quality — LTX2 generates it all
  9. Match prompt length to duration — 2s clips need 2-3 sentences; 10s clips need 6-8
  10. Use short durations for quality, long for narrative — 2-3s is sharper; 8-10s gives room for dialogue and camera moves

Ready to put these techniques into practice? Try Splice — film.fun's AI Creator Studio. Generate video, edit in the browser, and bring your stories to life. Learn more at academy.