  • Endpoint: POST https://api.heygen.com/v3/voices/speech
  • Purpose: Synthesize speech audio from text using a specified voice. Returns a URL to the generated audio file along with duration and optional word-level timestamps.

Authentication

| Header | Value |
| --- | --- |
| X-Api-Key | Your HeyGen API key |
| Authorization | Bearer YOUR_ACCESS_TOKEN (OAuth) |

Quick Example

curl -X POST "https://api.heygen.com/v3/voices/speech" \
  -H "X-Api-Key: $HEYGEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from HeyGen!",
    "voice_id": "voice_abc123"
  }'
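
For readers calling the endpoint from Python, the curl command above can be sketched with the standard library alone. The function names `build_speech_request` and `send_request` are illustrative and not part of any HeyGen SDK. Building the request is kept separate from sending it so the payload can be inspected first.

```python
import json
import urllib.request

API_URL = "https://api.heygen.com/v3/voices/speech"


def build_speech_request(api_key: str, text: str, voice_id: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = json.dumps({"text": text, "voice_id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send_request(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A typical call would be `send_request(build_speech_request(os.environ["HEYGEN_API_KEY"], "Hello from HeyGen!", "voice_abc123"))`, after `import os`.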

Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| text | string | Yes | | Text to synthesize (1–5,000 characters). |
| voice_id | string | Yes | | Voice ID to use. Get available IDs from GET /v3/voices. |
| input_type | string | No | "text" | "text" for plain text or "ssml" for SSML markup. |
| speed | number | No | 1.0 | Speed multiplier (0.5–2.0). |
| language | string | No | auto-detected | Base language code (e.g. "en", "pt", "zh"). Auto-detected from text when omitted. |
| locale | string | No | | BCP-47 locale tag (e.g. "en-US", "pt-BR"). When set, language is inferred from locale. |
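
The documented limits can be checked on the client before a request is sent. The sketch below is a hypothetical helper, not part of the API. It assumes the 1–5,000 limit counts Python string characters; the page does not say whether SSML tags count toward the limit.

```python
from typing import List


def validate_speech_params(
    text: str,
    voice_id: str,
    input_type: str = "text",
    speed: float = 1.0,
) -> List[str]:
    """Return a list of problems; an empty list means the values look valid."""
    errors = []
    if not 1 <= len(text) <= 5000:
        errors.append("text must be 1-5,000 characters")
    if not voice_id:
        errors.append("voice_id is required")
    if input_type not in ("text", "ssml"):
        errors.append('input_type must be "text" or "ssml"')
    if not 0.5 <= speed <= 2.0:
        errors.append("speed must be between 0.5 and 2.0")
    return errors
```

Returning every problem at once, rather than raising on the first, lets a caller report all issues in a single pass.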

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| audio_url | string | URL of the generated audio file. |
| duration | number | Duration of the audio in seconds. |
| request_id | string or null | Unique identifier for this generation request. |
| word_timestamps | array or null | Word-level timing data (see below). |

Each entry in word_timestamps:

| Field | Type | Description |
| --- | --- | --- |
| word | string | The word. |
| start | number | Start time in seconds. |
| end | number | End time in seconds. |
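
The response fields map naturally onto typed records. The sketch below assumes the fields listed above sit at the top level of the dictionary passed in; if the API wraps them in an envelope object, unwrap it first. The class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordTimestamp:
    word: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class SpeechResult:
    audio_url: str
    duration: float  # seconds
    request_id: Optional[str]
    word_timestamps: Optional[List[WordTimestamp]]


def parse_speech_result(payload: dict) -> SpeechResult:
    """Convert a decoded JSON response into a SpeechResult."""
    raw = payload.get("word_timestamps")
    # word_timestamps may be null, so keep None instead of an empty list.
    words = None if raw is None else [
        WordTimestamp(w["word"], float(w["start"]), float(w["end"])) for w in raw
    ]
    return SpeechResult(
        audio_url=payload["audio_url"],
        duration=float(payload["duration"]),
        request_id=payload.get("request_id"),
        word_timestamps=words,
    )
```

Keeping `None` distinct from an empty list preserves the difference between "no timing data returned" and "audio with zero words".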

SSML Support

For finer control over pronunciation, pauses, and emphasis, set input_type to "ssml" and pass SSML markup in the text field. Check support_pause on the voice object (from GET /v3/voices) to confirm the voice supports SSML pause/break tags.
curl -X POST "https://api.heygen.com/v3/voices/speech" \
  -H "X-Api-Key: $HEYGEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "<speak>Welcome to HeyGen. <break time=\"500ms\"/> Let us get started.</speak>",
    "voice_id": "voice_abc123",
    "input_type": "ssml"
  }'
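
When SSML is assembled from user-supplied text, characters such as `&` and `<` must be escaped or the markup becomes invalid. The following is a minimal sketch of a hypothetical `build_ssml` helper that joins sentences with `<break>` tags. It covers only the `<speak>` and `<break>` elements used in the example above.

```python
from typing import Iterable
from xml.sax.saxutils import escape


def build_ssml(segments: Iterable[str], pause_ms: int = 500) -> str:
    """Join text segments with <break> tags, escaping XML special characters."""
    pause = f' <break time="{pause_ms}ms"/> '
    return "<speak>" + pause.join(escape(s) for s in segments) + "</speak>"
```

Pass the result as `text` with `input_type` set to "ssml". Serializing the payload with `json.dumps` takes care of escaping the double quotes inside the tag.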

Speed and Locale

Adjust speaking speed and accent with speed and locale:
{
  "text": "This is a demonstration of slower speech.",
  "voice_id": "voice_abc123",
  "speed": 0.75,
  "locale": "en-GB"
}
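
Optional fields are easiest to manage by adding them to the body only when set, so the API's defaults apply otherwise. In the sketch below, `build_payload` is a hypothetical helper. Omitting `language` whenever `locale` is given is a design choice based on the note that language is inferred from locale, not a documented requirement.

```python
from typing import Optional


def build_payload(
    text: str,
    voice_id: str,
    speed: Optional[float] = None,
    language: Optional[str] = None,
    locale: Optional[str] = None,
) -> dict:
    """Build the JSON body, leaving out optional fields that were not set."""
    payload = {"text": text, "voice_id": voice_id}
    if speed is not None:
        payload["speed"] = speed
    if locale is not None:
        # Language is inferred from locale, so sending both is redundant.
        payload["locale"] = locale
    elif language is not None:
        payload["language"] = language
    return payload
```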

Finding a Voice

Use GET /v3/voices to browse available voices. To get only TTS-compatible voices, filter by the starfish engine:
curl -X GET "https://api.heygen.com/v3/voices?engine=starfish&language=English&gender=female&limit=5" \
  -H "X-Api-Key: $HEYGEN_API_KEY"
See Voices for full filtering and pagination details.
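
The query string in the curl command above can be built programmatically so that values are URL-encoded correctly. This sketch uses only the four query parameters shown in that example; `build_voices_url` is an illustrative name, and any other filters should be taken from the Voices reference.

```python
from typing import Optional
from urllib.parse import urlencode

VOICES_URL = "https://api.heygen.com/v3/voices"


def build_voices_url(
    engine: str = "starfish",
    language: Optional[str] = None,
    gender: Optional[str] = None,
    limit: Optional[int] = None,
) -> str:
    """Build a GET /v3/voices URL, skipping filters that were not set."""
    params = {"engine": engine, "language": language, "gender": gender, "limit": limit}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{VOICES_URL}?{query}"
```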

Summary

| Step | Endpoint | Purpose |
| --- | --- | --- |
| 1 | GET /v3/voices | List voices → get a voice_id |
| 2 | POST /v3/voices/speech | Send text + voice_id → get audio URL |