Speech-to-Text (STT) API
Our Accurate Speech-to-Text (STT) API uses the Coqui STT engine, the successor to Mozilla's DeepSpeech, to produce accurate transcriptions of your audio files. It is designed for long-form audio and accepts the most common formats: WAV, MP3, FLAC, and OGG.
Speech-to-Text (STT) API endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | Speech to Text (`/speech_to_text`) | Transcribes spoken audio from a file into text using the Coqui STT (DeepSpeech-based) engine. Supported file formats: WAV, MP3, FLAC, OGG. |
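The listing documents only the method and path of this endpoint. The following is a minimal sketch, assuming that the audio is uploaded as a multipart form field named `file` and that the key is sent in an `X-API-Key` header. Both names are hypothetical, as is the host `example-stt-host.invalid`. With those assumptions, a Python client can check the file extension against the supported formats and then build the request using only the standard library:

```python
import urllib.request
import uuid
from pathlib import Path

# Hypothetical host: replace with the base URL your provider gives you.
BASE_URL = "https://example-stt-host.invalid"

# The formats the endpoint lists as supported, mapped to their MIME types.
SUPPORTED_FORMATS = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".flac": "audio/flac",
    ".ogg": "audio/ogg",
}


def build_transcription_request(filename: str, audio: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /speech_to_text."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {suffix!r}; expected one of {sorted(SUPPORTED_FORMATS)}")

    boundary = uuid.uuid4().hex
    # Assumption: the audio goes in a multipart field named "file" (hypothetical).
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{Path(filename).name}"\r\n'
        f"Content-Type: {SUPPORTED_FORMATS[suffix]}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()

    return urllib.request.Request(
        url=f"{BASE_URL}/speech_to_text",
        data=head + audio + tail,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            # Assumption: key-based auth in a header named X-API-Key (hypothetical).
            "X-API-Key": api_key,
        },
    )
```

The function only builds the request. To send it with `urllib.request.urlopen(req)`, you need the real host, form field name, and authentication header from your provider's documentation. The response format is not documented here either, so inspect the response body before you parse it.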
Speech-to-Text (STT) API pricing
| Plan | Price | Rate limit | Quotas |
|---|---|---|---|
| BASIC | Free | — | |
| PRO | $15 / month | 1,000 requests / hour | |
| ULTRA | $59 / month | 1,000 requests / hour | |
| MEGA | $149 / month | 1,000 requests / hour | |
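All paid plans are capped at 1,000 requests per hour. The listing does not say how the hour is counted, so this sketch makes no assumption beyond the cap itself. `HourlyRateLimiter` is a hypothetical client-side helper and is not part of the API. It is a sliding-window limiter that keeps every rolling 3,600-second span at or below the limit. Because every clock hour is one such span, the limiter also stays within a fixed per-hour count. The server's own enforcement remains authoritative.

```python
import time
from collections import deque
from typing import Callable, Deque


class HourlyRateLimiter:
    """Hypothetical client-side sliding-window limiter: at most `limit` calls per `window` seconds."""

    def __init__(self, limit: int = 1000, window: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent: Deque[float] = deque()  # timestamps of requests still inside the window

    def _evict(self, now: float) -> None:
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        self._evict(now)
        if len(self._sent) < self.limit:
            return 0.0
        # The oldest request in the window frees a slot once it ages out.
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Register that a request was just sent."""
        self._sent.append(self.clock())

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until a slot is free, then record the request."""
        delay = self.wait_time()
        while delay > 0:
            sleep(delay)
            delay = self.wait_time()
        self.record()
```

To use it, call `limiter.acquire()` before each upload. The clock and sleep functions are injectable, which lets you test the limiter without waiting an hour. The default of 1,000 requests per hour matches the paid tiers. The BASIC plan lists no rate limit, so set `limit` to whatever your account actually allows.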