Overview
POST https://api.heygen.com/v2/video/generate
Generates videos using the AI Studio backend with support for avatars, voices, and dynamic backgrounds. You can create videos using either your photo avatar or digital twin. This endpoint supports Avatar III and Avatar IV.
Each video is composed of one or more scenes (up to 50), where each scene defines its own avatar, voice, background, and on-screen text.
Request Body
Top-Level Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| video_inputs | array | Yes | Array of scene objects (1–50). Each scene defines an avatar, voice, background, and optional text. |
| caption | boolean | No | Enable captions in the video. Only supported for text-based voice input. Default: false. |
| caption_mode | string | No | file_only or burn_in. When set, takes precedence over caption. |
| title | string | No | Title of the video. |
| callback_id | string | No | Custom ID for callback/webhook tracking. |
| callback_url | string | No | URL to notify when video rendering is complete. |
| dimension | object | No | Custom output dimensions. Defaults to 1920×1080. Width and height must be even, between 128 and 4096. |
| dimension.width | integer | No | Width of the output video. Default: 1920. |
| dimension.height | integer | No | Height of the output video. Default: 1080. |
| fps | float | No | Output frame rate. Default: 25.0. |
| folder_id | string | No | Folder ID where the video is stored. |
| enable_watermark | boolean | No | Apply the HeyGen watermark to the output. Default: false. |
| subtitles | object | No | Burned-in subtitle overlay settings (see Subtitles below). |
| test | boolean | No | Render in test mode (lower quality, no quota deduction). Default: false. |
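The top-level constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: the helper names `validate_dimension` and `build_request` are hypothetical, and only the limits stated in the table (1–50 scenes, even dimensions between 128 and 4096, the two `caption_mode` values) are enforced.

```python
def validate_dimension(width: int = 1920, height: int = 1080) -> dict:
    """Return a dimension object, enforcing the documented size constraints."""
    for name, value in (("width", width), ("height", height)):
        if not 128 <= value <= 4096:
            raise ValueError(f"{name} must be between 128 and 4096, got {value}")
        if value % 2 != 0:
            raise ValueError(f"{name} must be even, got {value}")
    return {"width": width, "height": height}


def build_request(video_inputs: list, **options) -> dict:
    """Assemble the top-level request body around a list of scene objects."""
    if not 1 <= len(video_inputs) <= 50:
        raise ValueError("video_inputs must contain between 1 and 50 scenes")
    caption_mode = options.get("caption_mode")
    if caption_mode is not None and caption_mode not in ("file_only", "burn_in"):
        raise ValueError("caption_mode must be 'file_only' or 'burn_in'")
    body = {"video_inputs": video_inputs}
    body.update(options)  # optional fields such as title, fps, dimension
    return body
```

A call such as `build_request(scenes, title="Demo", dimension=validate_dimension(1280, 720))` then yields a dictionary ready to be serialized as the JSON request body.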
Each item in the video_inputs array represents a scene and can contain the following:
character
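Defines the avatar or talking photo for the scene. The type field selects between avatar and talking_photo, and some fields apply to one type only. In the tables below, Yes* marks a field that is required only under the condition stated in its description.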
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | avatar or talking_photo. |
| avatar_id | string | Yes* | Unique avatar identifier. Required when type is avatar. |
| talking_photo_id | string | Yes* | Unique talking photo identifier. Required when type is talking_photo. |
| avatar_style | string | No | normal, closeUp, or circle. Applies only to avatar type. Default: normal. |
| talking_photo_style | string | No | circle or square. Applies only to talking_photo type. |
| talking_style | string | No | stable or expressive. Applies only to talking_photo type. Default: stable. |
| expression | string | No | default or happy. Applies only to talking_photo type. Default: default. |
| scale | float | No | Avatar size. Range: 0.0–5.0. Default: 1.0. |
| offset | object | No | Position adjustment: { "x": 0.0, "y": 0.0 }. Each axis range: -1.0–1.0. |
| fit | string | No | How the character fits inside the scene: contain or cover. Default: contain. |
| use_avatar_iv_model | boolean | No | Whether to use Avatar IV. See Avatar engine default change. |
| model | string | No | Avatar IV model version (e.g. 4.3, 4.3_turbo, 4.3_turbo_edge). Applies when use_avatar_iv_model is true. |
| resolution | string | No | Avatar IV output resolution: 720p, 1080p, or 4k. Applies when use_avatar_iv_model is true. |
| prompt | string | No | Avatar IV motion prompt. Applies when use_avatar_iv_model is true. |
| keep_original_prompt | boolean | No | Preserve the motion prompt as-is (skip enhancement). Applies when use_avatar_iv_model is true. |
| alpha | float | No | Avatar IV expressiveness level. Range: -0.5–0.5. Lower values are more expressive. |
| matting | boolean | No | Remove the photo background. |
| super_resolution | boolean | No | Enhance image quality. Applies only to talking_photo type. |
| circle_background_color | string | No | Hex color for circle/square style background (e.g., #FFFFFF). Default: #F6F6FC. |
| use_legacy_photo_avatar_model | boolean | No | Force the deprecated Avatar 3 photo avatar model. Applies only to talking_photo type. Not recommended for new requests. |
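Because type acts as a discriminator, a client can catch mismatched fields before calling the endpoint. The sketch below is illustrative only: `build_character` is a hypothetical helper, and rejecting fields that belong to the other type is a local strictness choice made here. The table does not say whether the API itself rejects or ignores such fields.

```python
# Fields the table marks as applying to one character type only.
AVATAR_ONLY = {"avatar_id", "avatar_style"}
TALKING_PHOTO_ONLY = {
    "talking_photo_id", "talking_photo_style", "talking_style",
    "expression", "super_resolution", "use_legacy_photo_avatar_model",
}


def build_character(char_type: str, **fields) -> dict:
    """Build a character object, checking type-specific requirements."""
    if char_type == "avatar":
        required, other_type_fields = "avatar_id", TALKING_PHOTO_ONLY
    elif char_type == "talking_photo":
        required, other_type_fields = "talking_photo_id", AVATAR_ONLY
    else:
        raise ValueError("type must be 'avatar' or 'talking_photo'")
    if required not in fields:
        raise ValueError(f"{required} is required when type is {char_type}")
    misplaced = other_type_fields.intersection(fields)
    if misplaced:
        raise ValueError(f"fields not applicable to {char_type}: {sorted(misplaced)}")
    if not 0.0 <= fields.get("scale", 1.0) <= 5.0:
        raise ValueError("scale must be between 0.0 and 5.0")
    offset = fields.get("offset", {})
    for axis in ("x", "y"):
        if not -1.0 <= offset.get(axis, 0.0) <= 1.0:
            raise ValueError(f"offset.{axis} must be between -1.0 and 1.0")
    return {"type": char_type, **fields}
```

For example, `build_character("avatar", avatar_id="YOUR_AVATAR_ID", avatar_style="closeUp")` produces the character object used in a scene, while passing `talking_style` together with type avatar raises a `ValueError`.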
voice
Defines what the avatar says in this scene.
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | text, audio, or silence. |
| voice_id | string | Yes* | Voice identifier. Required for text type. |
| input_text | string | Yes* | Text the avatar will speak. Required for text type. |
| speed | float | No | Voice speed. Range: 0.5–1.5. Default: 1.0. Applies to text type. |
| pitch | float | No | Voice pitch. Range: -50.0–50.0. Default: 0.0. Applies to text type. |
| volume | float | No | Voice audio volume. Range: 0.0–1.0 (0.0 silent, 1.0 full). Applies to text type. |
| emotion | string | No | Excited, Friendly, Serious, Soothing, Broadcaster, or Angry. Applies to text type. |
| locale | string | No | Voice accent/locale (e.g., en-US, pt-BR). Applies to text type. |
| audio_url | string | Yes* | URL of uploaded audio. Required for audio type (provide either this or audio_asset_id). |
| audio_asset_id | string | Yes* | Asset ID of uploaded audio. Required for audio type (provide either this or audio_url). |
| duration | float | No | Silence duration in seconds. Range: 1.0–100.0. Default: 1.0. Applies to silence type. |
| elevenlabs_settings | object | No | Advanced ElevenLabs voice settings (see below). Applies to text type. |
ElevenLabs Settings:
| Parameter | Type | Description |
|---|---|---|
| model | string | ElevenLabs model: eleven_monolingual_v1, eleven_multilingual_v1, eleven_multilingual_v2, eleven_turbo_v2, eleven_turbo_v2_5, or eleven_v3. |
| similarity_boost | float | Similarity to original voice. Range: 0.0–1.0. |
| stability | float | Voice consistency. Range: 0.0–1.0. For eleven_v3, default is 1.0 and allowed values are 0, 0.5, 1.0. |
| style | float | Style intensity. Range: 0.0–1.0. |
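The three voice types carry different required fields, which can be sketched as a small builder. This is an assumption-laden illustration rather than part of the API: `build_voice` is a hypothetical name, and for the audio type it accepts either source field without checking whether both may be sent together, since the table does not say.

```python
def _check_range(name: str, value: float, low: float, high: float) -> None:
    """Raise ValueError when value falls outside the inclusive range."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


def build_voice(voice_type: str, **fields) -> dict:
    """Build a voice object for the text, audio, or silence type."""
    if voice_type == "text":
        for key in ("voice_id", "input_text"):
            if not fields.get(key):
                raise ValueError(f"{key} is required for text type")
        _check_range("speed", fields.get("speed", 1.0), 0.5, 1.5)
        _check_range("pitch", fields.get("pitch", 0.0), -50.0, 50.0)
        if "volume" in fields:
            _check_range("volume", fields["volume"], 0.0, 1.0)
        # eleven_v3 only allows the stability values 0, 0.5, and 1.0.
        eleven = fields.get("elevenlabs_settings", {})
        if eleven.get("model") == "eleven_v3" and "stability" in eleven:
            if eleven["stability"] not in (0, 0.5, 1.0):
                raise ValueError("eleven_v3 stability must be 0, 0.5, or 1.0")
    elif voice_type == "audio":
        if not (fields.get("audio_url") or fields.get("audio_asset_id")):
            raise ValueError("audio type needs audio_url or audio_asset_id")
    elif voice_type == "silence":
        _check_range("duration", fields.get("duration", 1.0), 1.0, 100.0)
    else:
        raise ValueError("type must be 'text', 'audio', or 'silence'")
    return {"type": voice_type, **fields}
```

A two-second pause between scenes would then be `build_voice("silence", duration=2.0)`.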
background
Defines the scene background.
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | color, image, or video. |
| value | string | Yes* | Hex color code (e.g., #FFFFFF). Required for color type. Default: #f6f6fc. |
| url | string | Yes* | URL of uploaded image/video. Required for image/video type (provide either this or the corresponding asset ID). |
| image_asset_id | string | Yes* | Asset ID for image background. Provide either this or url. |
| video_asset_id | string | Yes* | Asset ID for video background. Provide either this or url. |
| play_style | string | Yes* | Playback mode: freeze, loop, or fit_to_scene. Required for video type. |
| fit | string | No | How the background fits the screen: cover, contain, crop, or none. Default: cover. |
| volume | float | No | Volume for video backgrounds. Range: 0.0–1.0. Applies to video type. |
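The background rules follow the same pattern: the type decides which source field and which extra settings are needed. A minimal sketch, assuming a hypothetical helper named `build_background` that fills in the documented default color when none is given:

```python
PLAY_STYLES = ("freeze", "loop", "fit_to_scene")
FIT_MODES = ("cover", "contain", "crop", "none")


def build_background(bg_type: str, **fields) -> dict:
    """Build a background object for the color, image, or video type."""
    if bg_type == "color":
        fields.setdefault("value", "#f6f6fc")  # documented default color
    elif bg_type in ("image", "video"):
        asset_key = f"{bg_type}_asset_id"  # image_asset_id or video_asset_id
        if not (fields.get("url") or fields.get(asset_key)):
            raise ValueError(f"{bg_type} type needs url or {asset_key}")
        if bg_type == "video" and fields.get("play_style") not in PLAY_STYLES:
            raise ValueError(f"play_style must be one of {PLAY_STYLES}")
    else:
        raise ValueError("type must be 'color', 'image', or 'video'")
    if "fit" in fields and fields["fit"] not in FIT_MODES:
        raise ValueError(f"fit must be one of {FIT_MODES}")
    if "volume" in fields and not 0.0 <= fields["volume"] <= 1.0:
        raise ValueError("volume must be between 0.0 and 1.0")
    return {"type": bg_type, **fields}
```

For a looping video background, a call might look like `build_background("video", url="https://example.com/bg.mp4", play_style="loop")`, where the URL is a placeholder.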
text
Optional on-screen text overlay.
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Must be text. |
| text | string | Yes | Text content to display. |
| font_family | string | No | Font family. Default: Arial. |
| font_size | float | No | Font size in points. Default: 24.0. |
| font_weight | string | No | Font weight (e.g., bold). Default: bold. |
| color | string | No | Text color in hex (e.g., #FFFFFF). Default: #FFFFFF. |
| position | object | No | Position offset: { "x": 0.0, "y": 0.0 }. Each axis range: -1.0–1.0. |
| text_align | string | No | left, center, or right. Default: center. |
| line_height | float | Yes | Line height / spacing between lines. Must be > 0. |
| width | float | No | Text container width. |
| height | float | No | Text container height. |
| rotate | float | No | Rotation angle in degrees. Range: 0–360. |
| scale_x | float | No | Horizontal scale. Must be >= 0. |
| scale_y | float | No | Vertical scale. Must be >= 0. |
| transform_scale_x | float | No | Additional horizontal transform scale. Must be >= 0. |
| transform_scale_y | float | No | Additional vertical transform scale. Must be >= 0. |
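A text overlay has three required fields (type, text, and line_height) plus numeric limits on the optional ones. The following sketch, built around a hypothetical `build_text_overlay` helper, applies only the limits listed in the table:

```python
def build_text_overlay(text: str, line_height: float, **fields) -> dict:
    """Build a text overlay object, enforcing the documented numeric limits."""
    if not text:
        raise ValueError("text is required")
    if line_height <= 0:
        raise ValueError("line_height must be > 0")
    position = fields.get("position", {})
    for axis in ("x", "y"):
        if not -1.0 <= position.get(axis, 0.0) <= 1.0:
            raise ValueError(f"position.{axis} must be between -1.0 and 1.0")
    if "rotate" in fields and not 0 <= fields["rotate"] <= 360:
        raise ValueError("rotate must be between 0 and 360 degrees")
    if fields.get("text_align", "center") not in ("left", "center", "right"):
        raise ValueError("text_align must be 'left', 'center', or 'right'")
    for key in ("scale_x", "scale_y", "transform_scale_x", "transform_scale_y"):
        if key in fields and fields[key] < 0:
            raise ValueError(f"{key} must be >= 0")
    return {"type": "text", "text": text, "line_height": line_height, **fields}
```

The line height value in a call such as `build_text_overlay("Chapter 1", line_height=1.2, text_align="left")` is an arbitrary illustration, since the table gives no default for it.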
audio
Optional secondary audio track for this scene (in addition to voice). Useful for background music or sound effects.
| Parameter | Type | Required | Description |
|---|---|---|---|
| audio_url | string | Yes* | URL of the uploaded audio. Provide either this or audio_asset_id. |
| audio_asset_id | string | Yes* | Asset ID of the uploaded audio. Provide either this or audio_url. |
| volume | float | No | Audio volume. Range: 0.0–1.0. Default: 1.0. |
| duration | float | No | Audio duration in seconds. |
| timeline | object | No | Timeline placement: { "start": 0.0, "duration": 0.0 }. |
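A secondary audio track needs one audio source and, optionally, a timeline placement. As a rough sketch (the function name `build_audio_track` and its keyword arguments are hypothetical, and the timeline is treated as seconds from the scene start, which the table does not state explicitly):

```python
from typing import Optional


def build_audio_track(
    audio_url: Optional[str] = None,
    audio_asset_id: Optional[str] = None,
    volume: float = 1.0,
    timeline_start: Optional[float] = None,
    timeline_duration: Optional[float] = None,
) -> dict:
    """Build the optional secondary audio object for a scene."""
    if not (audio_url or audio_asset_id):
        raise ValueError("provide audio_url or audio_asset_id")
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be between 0.0 and 1.0")
    track = {"volume": volume}
    if audio_url:
        track["audio_url"] = audio_url
    if audio_asset_id:
        track["audio_asset_id"] = audio_asset_id
    # Only emit a timeline object when the caller asked for placement.
    if timeline_start is not None or timeline_duration is not None:
        track["timeline"] = {
            "start": timeline_start or 0.0,
            "duration": timeline_duration or 0.0,
        }
    return track
```

Background music at reduced volume could then be expressed as `build_audio_track(audio_asset_id="YOUR_AUDIO_ASSET_ID", volume=0.3)`.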
Subtitles
| Parameter | Type | Required | Description |
|---|---|---|---|
| preset_name | string | Yes | Subtitle preset name. |
| alignment | integer | No | Subtitle alignment code. Default: 2. |
| disable_highlight | boolean | No | Override the preset’s highlight style. Default: false. |
| font_size | integer | No | Font size override. |
| position | object | No | Subtitle position: { "x": 0.0, "y": 0.0 }. |
Example Request
{
  "title": "My Studio Video",
  "caption": false,
  "dimension": {
    "width": 1920,
    "height": 1080
  },
  "video_inputs": [
    {
      "character": {
        "type": "avatar",
        "avatar_id": "YOUR_AVATAR_ID",
        "avatar_style": "normal"
      },
      "voice": {
        "type": "text",
        "voice_id": "YOUR_VOICE_ID",
        "input_text": "Welcome to the first scene of this video.",
        "speed": 1.0
      },
      "background": {
        "type": "color",
        "value": "#1a1a2e"
      }
    },
    {
      "character": {
        "type": "avatar",
        "avatar_id": "YOUR_AVATAR_ID",
        "avatar_style": "closeUp"
      },
      "voice": {
        "type": "text",
        "voice_id": "YOUR_VOICE_ID",
        "input_text": "And here is the second scene with a different style."
      },
      "background": {
        "type": "color",
        "value": "#16213e"
      }
    }
  ]
}
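To show how a body like the one above might be submitted, here is a minimal sketch using only the Python standard library. The endpoint URL comes from this page; the `X-Api-Key` header name is an assumption, because this page does not describe authentication, and the helper names are hypothetical.

```python
import json
import urllib.request

GENERATE_URL = "https://api.heygen.com/v2/video/generate"


def make_generate_request(api_key: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the JSON body."""
    return urllib.request.Request(
        GENERATE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Api-Key": api_key,  # assumed header name, not stated on this page
        },
        method="POST",
    )


def send_generate_request(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting preparation from sending keeps the request inspectable in tests without touching the network. In real use, `send_generate_request(make_generate_request(api_key, body))` performs the call.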
Response
200 — Success
{
  "error": null,
  "data": {
    "video_id": "af273759c9xa47369e05418c69drq174"
  }
}
| Field | Type | Description |
|---|---|---|
| error | string \| null | Error message if the request fails; null on success. |
| data.video_id | string | Unique identifier of the generated video. |
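The two response fields suggest a simple handling pattern: treat a non-null error as a failure and otherwise read data.video_id. A small sketch, assuming the response has already been decoded into a dictionary and using a hypothetical helper name:

```python
def extract_video_id(payload: dict) -> str:
    """Return data.video_id, or raise if the response reports an error."""
    error = payload.get("error")
    if error is not None:
        raise RuntimeError(f"video generation request failed: {error}")
    data = payload.get("data") or {}
    video_id = data.get("video_id")
    if not video_id:
        raise RuntimeError("response did not include data.video_id")
    return video_id
```

Applied to the sample response above, the function returns the string `af273759c9xa47369e05418c69drq174`.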
Full API Reference
For complete details, see the Create Avatar Video (V2) endpoint documentation.