Text-to-Speech API
High-quality neural text-to-speech with 320+ voices in 75+ languages. Convert any text to natural-sounding audio in seconds. Features: SSML markup support, real-time streaming, batch processing (up to 10 texts), word-level timestamps, volume/speed/pitch control. Filter voices by gender, personality, category, or search by name. 6 endpoints: voices listing, text-to-speech, streaming, batch,…
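Because batch processing is capped at 10 texts, a longer list has to be split before it is submitted. The following is a minimal sketch of that splitting step only. The helper name `chunk_texts` is hypothetical, and the batch request body is not shown because the listing does not document its shape.

```python
from typing import Iterator, List, Sequence

# From the description above: batch processing accepts up to 10 texts.
MAX_BATCH_ITEMS = 10


def chunk_texts(texts: Sequence[str], size: int = MAX_BATCH_ITEMS) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` texts (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])


if __name__ == "__main__":
    lines = [f"Sentence number {i}." for i in range(23)]
    for group in chunk_texts(lines):
        # Each group would go into one batch call.
        print(len(group))
```

Each yielded group is small enough for a single batch call; how the group is wrapped into a request depends on the API's documented schema.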
Text-to-Speech API endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/tts/batch | Submit up to 10 TTS requests in a single call. Returns base64-encoded audio for each item. Requires Pro subscription. Supports SSML per item. |
| POST | /v1/tts/stream | Stream TTS audio in real-time as chunked data. Lower latency than /v1/tts for large texts. Returns binary audio stream (audio/mpeg). Supports SSML markup when ssml=true. Fields:… |
| POST | /v1/tts/timestamps | Convert text to speech with word-level timestamps. Returns audio as base64 and an array of word boundaries with offset/duration in ms. Fields: - text (string, required): Text to… |
| POST | /v1/tts | Convert text to speech. Returns binary audio data. Fields: - text (string, required): Text to convert (1-5000 chars) - voice (string): Voice name, default "en-US-AriaNeural" (see… |
| GET | /v1/health | |
| GET | /v1/voices | List all available voices. Optionally filter by language code (e.g. 'fr', 'en', 'es'). |
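
As an illustration of the `/v1/tts` entry, here is a rough sketch that validates the documented `text` length (1-5000 chars), applies the documented default voice, and builds the POST request without sending it. Several things are assumptions, not taken from the listing: the host `https://api.example.com` is a placeholder, the JSON request body is a guess at the encoding, and the function name `build_tts_request` is hypothetical. Authentication headers are omitted because the listing does not describe them.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host (assumption), not the real API host
DEFAULT_VOICE = "en-US-AriaNeural"    # default voice named in the endpoint table
MAX_TEXT_CHARS = 5000                 # documented upper bound for the text field


def build_tts_request(text: str, voice: str = DEFAULT_VOICE,
                      base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/tts (hypothetical helper)."""
    if not 1 <= len(text) <= MAX_TEXT_CHARS:
        raise ValueError(f"text must be 1-{MAX_TEXT_CHARS} characters, got {len(text)}")
    # Assumption: the API accepts a JSON body with these two documented fields.
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_tts_request("Hello, world.")
    print(req.get_method(), req.full_url)
    # Sending would be urllib.request.urlopen(req).read(), which per the table
    # should return binary audio; it needs a real host and credentials.
```

Validating the length client-side avoids a round trip for inputs the endpoint would reject anyway.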