Mode: "precision". Best for: high-quality final delivery, talking-head videos, and content where accurate lip-sync is critical.
How Precision Mode Works
Precision mode uses avatar inference and multiple models to re-render the speaker’s mouth movements to match the translated audio, producing significantly more realistic lip-sync than Speed mode. It requires longer processing time and is recommended for polished, client-facing, or broadcast-quality output.
Quick Start
1. List Supported Languages
Before translating, fetch the available target language codes.
2. Submit a Translation (Single Language)
Batch (Multiple Languages)
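A minimal sketch covering both the single-language and the batch submission, assuming a JSON POST to /v3/video-translations. That path is inferred from the GET route used for polling and is not confirmed here. The base URL, the X-Api-Key header, and the field names video, output_language, and output_languages are hypothetical placeholders; the source object follows the Source Video Input table, and the language codes should come from step 1.

```python
import json
import urllib.request
from typing import Any

# Hypothetical base URL; substitute the real API host.
API_BASE = "https://api.example.com"


def build_translation_request(source: dict[str, Any], languages: list[str],
                              **options: Any) -> dict[str, Any]:
    """Build a Precision-mode request body for one or more target languages."""
    if not languages:
        raise ValueError("at least one target language code is required")
    # Explicit fields are placed last so they win over anything passed in **options.
    body: dict[str, Any] = {**options, "video": source, "mode": "precision"}  # "video" is hypothetical
    if len(languages) == 1:
        body["output_language"] = languages[0]        # hypothetical single-language field
    else:
        body["output_languages"] = list(languages)    # hypothetical batch field
    return body


def make_submit_request(body: dict[str, Any], api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request."""
    return urllib.request.Request(
        f"{API_BASE}/v3/video-translations",          # assumed path
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},  # header name assumed
        method="POST",
    )
```

Passing the prepared request to urllib.request.urlopen sends it; confirm the shape of the JSON response (and where the translation ID or IDs appear) against the API reference.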
3. Poll for Status
| Status | Meaning |
|---|---|
| pending | Queued |
| running | Avatar inference in progress |
| completed | Done; video_url is available |
| failed | Failed; check failure_message |
Precision mode takes longer than Speed mode — plan polling intervals accordingly (e.g. every 30–60 seconds for longer videos).
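The polling loop can be sketched as follows. fetch_status is a hypothetical caller-supplied function that wraps GET /v3/video-translations/<id> and returns the parsed JSON; the sketch assumes status, video_url, and failure_message sit at the top level of that JSON, which may not match the real response envelope. The interval defaults to 45 seconds, inside the 30 to 60 second range suggested above, and the timeout is an arbitrary bound.

```python
import time
from typing import Any, Callable

IN_PROGRESS = {"pending", "running"}


def wait_for_translation(
    translation_id: str,
    fetch_status: Callable[[str], dict[str, Any]],
    interval_s: float = 45.0,          # within the suggested 30-60 s range
    timeout_s: float = 6 * 3600.0,     # arbitrary upper bound; tune for your video lengths
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job finishes; return video_url or raise on failure or timeout."""
    waited = 0.0
    while True:
        job = fetch_status(translation_id)  # e.g. wraps GET /v3/video-translations/<id>
        status = job.get("status")
        if status == "completed":
            return job["video_url"]
        if status == "failed":
            raise RuntimeError(
                f"translation {translation_id} failed: {job.get('failure_message')}")
        if status not in IN_PROGRESS:
            raise ValueError(f"unexpected status {status!r}")
        if waited >= timeout_s:
            raise TimeoutError(
                f"translation {translation_id} still {status} after {waited:.0f} s")
        sleep(interval_s)
        waited += interval_s
```

Injecting fetch_status and sleep keeps the loop testable without network calls; setting callback_url avoids polling altogether.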
Source Video Input
| Type | Example |
|---|---|
| URL | { "type": "url", "url": "https://example.com/video.mp4" } |
| Asset ID | { "type": "asset_id", "asset_id": "<asset_id>" } |
The URL must be publicly accessible (test by opening in an incognito browser).
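A small helper can build either source object from the table and reject obviously wrong input. This is an illustrative sketch (make_source is a hypothetical name): it only checks the URL scheme and cannot confirm that the URL is publicly reachable, so the incognito check still applies.

```python
from typing import Optional
from urllib.parse import urlparse


def make_source(url: Optional[str] = None, asset_id: Optional[str] = None) -> dict[str, str]:
    """Return a source object of type "url" or "asset_id", per the table above."""
    # Exactly one of the two inputs must be given.
    if (url is None) == (asset_id is None):
        raise ValueError("provide exactly one of url or asset_id")
    if url is not None:
        # Only a scheme check; public accessibility must be verified separately.
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"expected an http(s) URL, got {url!r}")
        return {"type": "url", "url": url}
    return {"type": "asset_id", "asset_id": asset_id}
```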
Precision Mode Options
These parameters are particularly relevant for Precision mode:
| Parameter | Default | Description |
|---|---|---|
| mode | "speed" | Set to "precision" to enable avatar inference |
| speaker_num | auto | Number of speakers |
| translate_audio_only | false | When true, skips avatar inference and only dubs audio (negates the Precision benefit) |
| enable_dynamic_duration | true | Allows output duration to vary to match natural speech pacing |
| disable_music_track | false | Strips background music from output |
| enable_speech_enhancement | false | Improves speech audio quality |
| enable_caption | false | Generates captions alongside the video |
| brand_voice_id | — | Apply a custom brand voice (requires setup) |
| srt | — | Custom subtitle file (Enterprise plan only) |
| srt_role | — | "input" or "output": which video the SRT applies to (Enterprise plan only) |
| callback_url | — | Webhook URL notified on completion or failure |
| callback_id | — | Your own ID, echoed back in the webhook payload |
Tip: Setting speaker_num is especially important in Precision mode — accurate speaker separation directly improves the quality of avatar inference per speaker.
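Because several of these options interact with Precision mode, a pre-flight check can catch settings that would waste a long render. The function below is a hypothetical lint, not part of the API: it merges the caller's options over the documented defaults and flags a non-precision mode, translate_audio_only, a missing speaker_num, an invalid srt_role, and a callback_id without a callback_url.

```python
from typing import Any

# Documented defaults; speaker_num is omitted because it defaults to auto-detection.
DEFAULTS: dict[str, Any] = {
    "mode": "speed",
    "translate_audio_only": False,
    "enable_dynamic_duration": True,
    "disable_music_track": False,
    "enable_speech_enhancement": False,
    "enable_caption": False,
}


def check_precision_options(opts: dict[str, Any]) -> list[str]:
    """Return human-readable warnings for settings that undercut Precision mode."""
    merged = {**DEFAULTS, **opts}
    issues: list[str] = []
    if merged["mode"] != "precision":
        issues.append('mode is not "precision"; avatar inference will not run')
    if merged["translate_audio_only"]:
        issues.append("translate_audio_only=true skips avatar inference")
    if "speaker_num" not in opts:
        issues.append("speaker_num not set; an explicit count helps per-speaker quality")
    if "srt_role" in opts and opts["srt_role"] not in ("input", "output"):
        issues.append('srt_role must be "input" or "output"')
    if "callback_id" in opts and "callback_url" not in opts:
        issues.append("callback_id is only echoed in the webhook; set callback_url too")
    return issues
```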
Captions
To enable captions, set enable_caption: true in the translation request. Once the translation has completed, download the captions in either supported format: srt or vtt.
Proofread Before Finalizing
Precision mode fully supports the proofread workflow: review and edit subtitles before committing to the full avatar inference render. This is especially valuable in Precision mode because generation takes longer and costs more.
Step 1 — Create Proofread Session
The response returns proofread_ids, one per language.
Step 2 — Poll Until completed
Step 3 — Download & Edit the SRT
Download the file at srt_url, edit it locally, then upload the revised version.
Step 4 — Generate Final Video
The response returns a video_translation_id; poll it via GET /v3/video-translations/<id>.
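The four steps can be chained as below. ProofreadClient and its methods (create_proofread, get_proofread, upload_srt, generate_video) are hypothetical wrappers around the proofread endpoints, whose paths are not given here; download and edit_srt are caller-supplied. The sketch assumes a proofread session reports status and srt_url fields and treats "failed" as a terminal status, which is also an assumption. Languages are processed one at a time for simplicity.

```python
import time
from typing import Any, Callable, Protocol


class ProofreadClient(Protocol):
    """Hypothetical wrapper around the proofread endpoints; method names are illustrative."""
    def create_proofread(self, request: dict[str, Any]) -> list[str]: ...
    def get_proofread(self, proofread_id: str) -> dict[str, Any]: ...
    def upload_srt(self, proofread_id: str, srt_text: str) -> None: ...
    def generate_video(self, proofread_id: str) -> str: ...


def run_proofread(
    client: ProofreadClient,
    request: dict[str, Any],
    download: Callable[[str], str],        # fetches the text at srt_url
    edit_srt: Callable[[str], str],        # your review/edit step
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Run steps 1-4 for every language; map proofread_id -> video_translation_id."""
    results: dict[str, str] = {}
    for proofread_id in client.create_proofread(request):        # Step 1: one ID per language
        while True:                                               # Step 2: poll until completed
            session = client.get_proofread(proofread_id)
            if session["status"] == "completed":
                break
            if session["status"] == "failed":                    # assumed failure status
                raise RuntimeError(f"proofread {proofread_id} failed")
            sleep(interval_s)
        edited = edit_srt(download(session["srt_url"]))           # Step 3: download and edit
        client.upload_srt(proofread_id, edited)
        results[proofread_id] = client.generate_video(proofread_id)  # Step 4: final render
    return results
```

Each returned video_translation_id can then be polled like any other translation.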
Other Operations
List All Translations
The response includes has_more and next_token for pagination.
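Paging with has_more and next_token follows the usual cursor pattern. In this sketch fetch_page is a hypothetical caller-supplied function that requests one page (passing next_token when it is not None; the query parameter name is an assumption) and returns the parsed JSON. The key holding the list of translations is assumed to be data.

```python
from typing import Any, Callable, Iterator, Optional


def iter_translations(
    fetch_page: Callable[[Optional[str]], dict[str, Any]],
    items_key: str = "data",   # assumed name of the list field
) -> Iterator[dict[str, Any]]:
    """Yield every translation across pages using has_more / next_token."""
    token: Optional[str] = None          # the first request carries no token
    while True:
        page = fetch_page(token)
        yield from page.get(items_key, [])
        if not page.get("has_more"):
            return
        token = page.get("next_token")
        if not token:
            raise ValueError("has_more is true but next_token is missing")
```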
Delete a Translation
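Assuming the delete call follows the REST convention of DELETE on the same resource path used for polling (an unconfirmed assumption, as are the base URL and the X-Api-Key header), the request could be prepared like this:

```python
import urllib.parse
import urllib.request

# Hypothetical base URL; substitute the real API host.
API_BASE = "https://api.example.com"


def make_delete_request(translation_id: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a DELETE for one translation."""
    # Percent-encode the ID so it cannot alter the path.
    path = "/v3/video-translations/" + urllib.parse.quote(translation_id, safe="")
    return urllib.request.Request(API_BASE + path, headers={"X-Api-Key": api_key}, method="DELETE")
```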
When to Use Speed vs. Precision
| | Speed | Precision |
|---|---|---|
| Processing Time | Faster | Slower |
| Translation | Adequate | Context- and Gender-Aware |
| Lip-Sync Quality | Standard | High |
| Best For | Faces with little movement, quick drafts | Faces with significant movement, side angles, or occlusions; final delivery videos |
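The table's guidance can be encoded as a small decision helper. choose_mode and its flags are illustrative names; the rule simply mirrors the table: final delivery or a difficult face (significant movement, side angles, occlusions) favors Precision, and anything else can use Speed.

```python
def choose_mode(final_delivery: bool,
                significant_face_movement: bool = False,
                side_angle: bool = False,
                occlusions: bool = False) -> str:
    """Return "precision" or "speed" following the comparison table."""
    # Any one of these conditions favors Precision's higher-quality lip-sync.
    if final_delivery or significant_face_movement or side_angle or occlusions:
        return "precision"
    # Quick drafts of faces with little movement: Speed is faster.
    return "speed"
```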

