- Endpoint: POST https://api.heygen.com/v3/voices/speech
- Purpose: Synthesize speech audio from text using a specified voice. Returns a URL to the generated audio file along with duration and optional word-level timestamps.
Authentication
| Header | Value |
|---|---|
| X-Api-Key | Your HeyGen API key |
| Authorization | Bearer YOUR_ACCESS_TOKEN (OAuth) |
Quick Example
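The following is a minimal sketch of a synthesis call using only the Python standard library. It assumes the request body is JSON and that the API key is sent in the X-Api-Key header. The helper names (build_speech_request, synthesize) are illustrative, and unwrapping a top-level "data" envelope is an assumption about the response shape, not something this page confirms.

```python
import json
import urllib.request

API_URL = "https://api.heygen.com/v3/voices/speech"


def build_speech_request(api_key: str, text: str, voice_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request with the two required parameters."""
    payload = {"text": text, "voice_id": voice_id}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(api_key: str, text: str, voice_id: str) -> dict:
    """Send the request and return the parsed response fields."""
    request = build_speech_request(api_key, text, voice_id)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # Assumption: the fields may be nested under a "data" key; fall back to the body itself.
    return body.get("data") or body


# Example usage (performs a real network call, so it is left commented out):
# result = synthesize("YOUR_API_KEY", "Hello from the speech endpoint.", "YOUR_VOICE_ID")
# print(result["audio_url"], result["duration"])
```

If the call succeeds, the returned dictionary should contain the fields listed under Response Fields.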
Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | — | Text to synthesize (1–5,000 characters). |
| voice_id | string | Yes | — | Voice ID to use. Get available IDs from GET /v3/voices. |
| input_type | string | No | "text" | "text" for plain text or "ssml" for SSML markup. |
| speed | number | No | 1.0 | Speed multiplier (0.5–2.0). |
| language | string | No | auto-detected | Base language code (e.g. "en", "pt", "zh"). Auto-detected from text when omitted. |
| locale | string | No | — | BCP-47 locale tag (e.g. "en-US", "pt-BR"). When set, language is inferred from locale. |
Response Fields
| Field | Type | Description |
|---|---|---|
| audio_url | string | URL of the generated audio file. |
| duration | number | Duration of the audio in seconds. |
| request_id | string or null | Unique identifier for this generation request. |
| word_timestamps | array or null | Word-level timing data (see below). |
word_timestamps:
| Field | Type | Description |
|---|---|---|
| word | string | The word. |
| start | number | Start time in seconds. |
| end | number | End time in seconds. |
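
One common use of word-level timing is caption generation. The sketch below groups word_timestamps entries into SubRip (SRT) cues. It relies only on the word, start, and end fields described above. The function names and the words_per_cue grouping are illustrative choices, not part of the API.

```python
def format_srt_time(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(word_timestamps, words_per_cue: int = 5) -> str:
    """Group consecutive words into numbered SRT cues.

    Returns an empty string when word_timestamps is None or empty,
    since the field is documented as nullable.
    """
    if not word_timestamps:
        return ""
    cues = []
    for i in range(0, len(word_timestamps), words_per_cue):
        chunk = word_timestamps[i:i + words_per_cue]
        start = format_srt_time(chunk[0]["start"])
        end = format_srt_time(chunk[-1]["end"])
        text = " ".join(entry["word"] for entry in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

Each cue spans from the start of its first word to the end of its last word, so pauses between cues are preserved.
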
SSML Support
For finer control over pronunciation, pauses, and emphasis, set input_type to "ssml" and pass SSML markup in the text field. Check support_pause on the voice object (from GET /v3/voices) to confirm the voice supports SSML pause/break tags.
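As a rough sketch, an SSML request body could be assembled as shown here. Several details are assumptions rather than documented behavior: that the markup is wrapped in a speak root element, that pauses use the standard SSML break tag with a time attribute, and that the voice object exposes its ID under a voice_id key. Only support_pause and input_type come from this page.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences, pause_ms: int = 500) -> str:
    """Join sentences with SSML break tags, escaping XML special characters."""
    pause = f'<break time="{pause_ms}ms"/>'
    body = pause.join(escape(sentence) for sentence in sentences)
    # Assumption: the endpoint accepts a standard <speak> root element.
    return f"<speak>{body}</speak>"


def build_ssml_payload(voice: dict, sentences, pause_ms: int = 500) -> dict:
    """Build the request body, refusing voices that do not report pause support."""
    if not voice.get("support_pause"):
        raise ValueError("This voice does not report support for SSML pause/break tags.")
    return {
        "text": build_ssml(sentences, pause_ms),
        "voice_id": voice["voice_id"],  # Assumption: key name on the voice object.
        "input_type": "ssml",
    }
```

Escaping the sentence text keeps characters such as & and < from producing invalid markup.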
Speed and Locale
Adjust speaking speed and accent with the speed and locale parameters.
Finding a Voice
Use GET /v3/voices to browse available voices. To get only TTS-compatible voices, filter by the starfish engine.
Summary
| Step | Endpoint | Purpose |
|---|---|---|
| 1 | GET /v3/voices | List voices → get a voice_id |
| 2 | POST /v3/voices/speech | Send text + voice_id → get audio URL |

