Legacy Endpoint — This is a legacy endpoint that supports scene-by-scene video generation. The V3 APIs do not offer scene-by-scene generation.

Overview

POST https://api.heygen.com/v2/video/generate

Generates videos using the AI Studio backend with support for avatars, voices, and dynamic backgrounds. You can create videos using either your photo avatar or digital twin. This endpoint supports Avatar III and Avatar IV. Each video is composed of one or more scenes (up to 50), where each scene defines its own avatar, voice, background, and on-screen text.

Authentication

Include your API key in the request header:
| Header | Value |
| --- | --- |
| x-api-key | Your HeyGen API key |
| Content-Type | application/json |
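
As a minimal sketch, the two headers above can be assembled in Python as follows. The helper name `build_headers` and the environment variable `HEYGEN_API_KEY` are illustrative assumptions, not part of the API.

```python
import os


def build_headers(api_key):
    """Return the two request headers listed in the table above."""
    if not api_key:
        raise ValueError("API key must be a non-empty string")
    return {
        "x-api-key": api_key,
        "Content-Type": "application/json",
    }


# HEYGEN_API_KEY is a hypothetical environment variable name;
# the fallback string is a placeholder, not a working key.
headers = build_headers(os.environ.get("HEYGEN_API_KEY", "YOUR_API_KEY"))
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.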

Request Body

Top-Level Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| video_inputs | array | Yes | Array of scene objects (1–50). Each scene defines an avatar, voice, background, and optional text. |
| caption | boolean | No | Enable captions in the video. Only supported for text-based voice input. Default: false. |
| title | string | No | Title of the video. |
| callback_id | string | No | Custom ID for callback/webhook tracking. |
| dimension | object | No | Custom output dimensions. Defaults to 1920×1080. |
| dimension.width | integer | No | Width of the output video. Default: 1920. |
| dimension.height | integer | No | Height of the output video. Default: 1080. |
| folder_id | string | No | Folder ID where the video is stored. |
| callback_url | string | No | URL to notify when video rendering is complete. |
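
The top-level parameters can be assembled with a small helper like the sketch below. The function name `build_payload` is hypothetical, and the client-side check on the scene count simply mirrors the 1–50 limit stated above; the server may enforce it differently.

```python
def build_payload(video_inputs, title=None, caption=False,
                  width=1920, height=1080,
                  callback_id=None, callback_url=None, folder_id=None):
    """Assemble the top-level request body from the documented parameters."""
    if not 1 <= len(video_inputs) <= 50:
        raise ValueError("video_inputs must contain between 1 and 50 scenes")
    payload = {
        "video_inputs": list(video_inputs),
        "caption": caption,
        "dimension": {"width": width, "height": height},
    }
    # Include optional string fields only when the caller supplied them.
    optional = {
        "title": title,
        "callback_id": callback_id,
        "callback_url": callback_url,
        "folder_id": folder_id,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Omitting unset optional fields, rather than sending null values, leaves the documented defaults to the server.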

Scene Object (video_inputs[])

Each item in the video_inputs array represents a scene and can contain the following:

character

Defines the avatar or talking photo for the scene.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | avatar or talking_photo. |
| avatar_id | string | Yes* | Unique avatar identifier. Required when type is avatar. |
| talking_photo_id | string | Yes* | Unique talking photo identifier. Required when type is talking_photo. |
| avatar_style | string | No | normal, closeUp, or circle. Applies only to avatar type. Default: normal. |
| talking_photo_style | string | No | circle. Applies only to talking_photo type. |
| talking_style | string | No | stable or expressive. Applies only to talking_photo type. Default: stable. |
| expression | string | No | default or happy. Applies only to talking_photo type. |
| scale | float | No | Avatar size. Range: 0.0–5.0. Default: 1. |
| offset | object | No | Position adjustment: { "x": 0.0, "y": 0.0 }. |
| use_avatar_iv_model | boolean | No | Whether to use Avatar IV. |
| prompt | string | No | Avatar IV motion prompt. Applies to talking_photo type when use_avatar_iv_model is true. |
| keep_original_prompt | boolean | No | Preserve motion prompt as-is (skip enhancement). Applies when use_avatar_iv_model is true. |
| matting | boolean | No | Remove photo background. |
| super_resolution | boolean | No | Enhance image quality. Applies only to talking_photo type. |
| circle_background_color | string | No | Hex color for circle style background (e.g., #FFFFFF). |
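
The conditional requirement between type and the two ID fields can be checked client-side before sending a request. The sketch below is illustrative only: `build_character` is a hypothetical helper, and it assumes the scale range is 0.0 to 5.0 inclusive.

```python
# Map each documented character type to the ID field it requires.
ID_FIELD = {"avatar": "avatar_id", "talking_photo": "talking_photo_id"}
AVATAR_STYLES = {"normal", "closeUp", "circle"}


def build_character(kind, identifier, scale=1.0, **options):
    """Build a character object, enforcing the type-specific ID field."""
    if kind not in ID_FIELD:
        raise ValueError("type must be 'avatar' or 'talking_photo'")
    # Assumption: both ends of the scale range are allowed.
    if not 0.0 <= scale <= 5.0:
        raise ValueError("scale must be within 0.0 to 5.0")
    style = options.get("avatar_style")
    if style is not None and (kind != "avatar" or style not in AVATAR_STYLES):
        raise ValueError("avatar_style applies only to avatar type "
                         "and must be normal, closeUp, or circle")
    return {"type": kind, ID_FIELD[kind]: identifier, "scale": scale, **options}
```

For example, `build_character("avatar", "YOUR_AVATAR_ID", avatar_style="closeUp")` yields an object with `avatar_id` set and no `talking_photo_id` key.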

voice

Defines what the avatar says in this scene.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | text, audio, or silence. |
| voice_id | string | Yes* | Voice identifier. Required for text type. |
| input_text | string | Yes* | Text the avatar will speak. Required for text type. |
| speed | float | No | Voice speed. Range: 0.5–1.5. Default: 1. Applies to text type. |
| pitch | integer | No | Voice pitch. Range: -50 to 50. Default: 0. Applies to text type. |
| emotion | string | No | Excited, Friendly, Serious, Soothing, or Broadcaster. Applies to text type. |
| locale | string | No | Voice accent/locale (e.g., en-US, pt-BR). Applies to text type. |
| audio_url | string | Yes* | URL of uploaded audio. Required for audio type (provide either this or audio_asset_id). |
| audio_asset_id | string | Yes* | Asset ID of uploaded audio. Required for audio type (provide either this or audio_url). |
| duration | string | No | Silence duration in seconds. Range: 1.0–100.0. Default: 1. Applies to silence type. |
| elevenlabs_settings | object | No | Advanced ElevenLabs voice settings (see below). Applies to text type. |
ElevenLabs Settings:
| Parameter | Type | Description |
| --- | --- | --- |
| model | string | ElevenLabs model: eleven_monolingual_v1, eleven_multilingual_v1, eleven_multilingual_v2, eleven_turbo_v2, eleven_turbo_v2_5, or eleven_v3. |
| similarity_boost | float | Similarity to original voice. Range: 0.0–1.0. |
| stability | float | Voice consistency. Range: 0.0–1.0. For eleven_v3, default is 1.0 and allowed values are 0, 0.5, 1.0. |
| style | float | Style intensity. Range: 0.0–1.0. |
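
The three voice types take different required fields, which the following sketch makes explicit with one hypothetical builder per type. It assumes the documented ranges are inclusive and reads "provide either this or" as "exactly one of"; neither assumption is confirmed by the reference.

```python
def text_voice(voice_id, input_text, speed=1.0, pitch=0):
    """Voice of type 'text': requires voice_id and input_text."""
    if not voice_id or not input_text:
        raise ValueError("text voices require voice_id and input_text")
    if not 0.5 <= speed <= 1.5:
        raise ValueError("speed must be within 0.5 to 1.5")
    if not -50 <= pitch <= 50:
        raise ValueError("pitch must be within -50 to 50")
    return {"type": "text", "voice_id": voice_id, "input_text": input_text,
            "speed": speed, "pitch": pitch}


def audio_voice(audio_url=None, audio_asset_id=None):
    """Voice of type 'audio': takes a URL or an asset ID."""
    # Assumption: exactly one of the two sources should be supplied.
    if (audio_url is None) == (audio_asset_id is None):
        raise ValueError("provide exactly one of audio_url or audio_asset_id")
    if audio_url is not None:
        return {"type": "audio", "audio_url": audio_url}
    return {"type": "audio", "audio_asset_id": audio_asset_id}


def silence_voice(duration=1.0):
    """Voice of type 'silence': duration is sent as a string, per the table."""
    if not 1.0 <= duration <= 100.0:
        raise ValueError("duration must be within 1.0 to 100.0 seconds")
    return {"type": "silence", "duration": str(duration)}
```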

background

Defines the scene background.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | color, image, or video. |
| value | string | Yes* | Hex color code (e.g., #FFFFFF). Required for color type. |
| url | string | Yes* | URL of uploaded image/video. Required for image/video type (provide either this or the corresponding asset ID). |
| image_asset_id | string | Yes* | Asset ID for image background. Provide either this or url. |
| video_asset_id | string | Yes* | Asset ID for video background. Provide either this or url. |
| play_style | string | No | Playback mode: freeze, loop, or fit_to_scene. Applies to video type. |
| fit | string | No | How background fits the screen: crop, cover, contain, or none. Default: cover. |
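
A sketch of how the type-dependent background fields fit together is shown below. `build_background` is a hypothetical helper, and treating url and the asset ID as mutually exclusive is an assumption drawn from the "provide either" wording.

```python
def build_background(kind, value=None, url=None, asset_id=None, **options):
    """Build a background object for the color, image, or video type."""
    if kind == "color":
        if not value:
            raise ValueError("color backgrounds require a hex value")
        return {"type": "color", "value": value, **options}
    if kind in ("image", "video"):
        # Assumption: supply exactly one of url or the matching asset ID.
        if (url is None) == (asset_id is None):
            raise ValueError("provide exactly one of url or asset_id")
        background = {"type": kind, **options}
        if url is not None:
            background["url"] = url
        else:
            # Resolves to image_asset_id or video_asset_id.
            background[kind + "_asset_id"] = asset_id
        return background
    raise ValueError("type must be color, image, or video")
```

For example, `build_background("video", asset_id="YOUR_ASSET_ID", play_style="loop")` produces an object keyed by `video_asset_id`.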

text

Optional on-screen text overlay.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Must be text. |
| text | string | Yes | Text content to display. |
| font_family | string | No | Font family (e.g., Arial). |
| font_size | float | No | Font size in points. |
| font_weight | string | No | bold. |
| color | string | No | Text color in hex (e.g., #FFFFFF). |
| position | object | No | Position: { "x": 0.0, "y": 0.0 }. |
| text_align | string | No | left, center, or right. |
| line_height | float | Yes | Line height / spacing between lines. |
| width | number | No | Text container width. |
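
As an illustration, a text overlay with its three required fields (type, text, line_height) might be built like this. The helper name `build_text_overlay` is hypothetical, and the sample values are placeholders rather than recommended settings.

```python
TEXT_ALIGNMENTS = {"left", "center", "right"}


def build_text_overlay(text, line_height, text_align=None, **options):
    """Build a text overlay; type is always 'text' per the table above."""
    if not text:
        raise ValueError("text must be a non-empty string")
    overlay = {"type": "text", "text": text, "line_height": line_height}
    if text_align is not None:
        if text_align not in TEXT_ALIGNMENTS:
            raise ValueError("text_align must be left, center, or right")
        overlay["text_align"] = text_align
    # Remaining optional fields (font_family, color, position, ...) pass through.
    overlay.update(options)
    return overlay


overlay = build_text_overlay(
    "Quarterly results", line_height=1.2,
    text_align="center", font_family="Arial", color="#FFFFFF",
)
```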

Example Request

{
  "title": "My Legacy Video",
  "caption": false,
  "dimension": {
    "width": 1920,
    "height": 1080
  },
  "video_inputs": [
    {
      "character": {
        "type": "avatar",
        "avatar_id": "YOUR_AVATAR_ID",
        "avatar_style": "normal"
      },
      "voice": {
        "type": "text",
        "voice_id": "YOUR_VOICE_ID",
        "input_text": "Welcome to the first scene of this video.",
        "speed": 1.0
      },
      "background": {
        "type": "color",
        "value": "#1a1a2e"
      }
    },
    {
      "character": {
        "type": "avatar",
        "avatar_id": "YOUR_AVATAR_ID",
        "avatar_style": "closeUp"
      },
      "voice": {
        "type": "text",
        "voice_id": "YOUR_VOICE_ID",
        "input_text": "And here is the second scene with a different style."
      },
      "background": {
        "type": "color",
        "value": "#16213e"
      }
    }
  ]
}
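
A request body like the one above could be submitted with Python's standard library, as in the sketch below. The function names `make_request` and `generate_video` and the 30-second timeout are illustrative choices; only the URL, method, and headers come from this page.

```python
import json
import urllib.request

API_URL = "https://api.heygen.com/v2/video/generate"


def make_request(payload, api_key):
    """Build the POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )


def generate_video(payload, api_key, timeout=30):
    """Send the request and return the decoded JSON response body."""
    request = make_request(payload, api_key)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx status codes.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload and headers easy to inspect or test without a network call.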

Response

200 — Success

{
  "error": null,
  "data": {
    "video_id": "af273759c9xa47369e05418c69drq174"
  }
}

| Field | Type | Description |
| --- | --- | --- |
| error | string or null | Error message if the request fails; null on success. |
| data.video_id | string | Unique identifier of the generated video. |
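
A caller would typically check the error field before reading data.video_id. The sketch below assumes the response has already been decoded into a dictionary; `extract_video_id` is a hypothetical helper name.

```python
def extract_video_id(response):
    """Return data.video_id, or raise if the error field is populated."""
    error = response.get("error")
    if error is not None:
        raise RuntimeError("video generation failed: {}".format(error))
    return response["data"]["video_id"]


# The success response shown above, as a Python dictionary.
example = {
    "error": None,
    "data": {"video_id": "af273759c9xa47369e05418c69drq174"},
}
video_id = extract_video_id(example)
```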

Full API Reference

For complete details, see the Create Avatar Video (V2) endpoint documentation.