Audio
Text-to-speech, speech-to-text, music, and voice cloning.
Tagline | Category | Segment | Platform | Plan | Monthly Price USD | Pricing Model | Free Tier / Trial | Included Credits / Usage | Audio Modes | STT / TTS / Voice Support | API Access | Commercial Rights / Privacy | Models / Quality | Latency / Realtime | Team / Collaboration | Best Fit | Main Limits / Caveats |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
No tagline | Audio | AI voice suite | ElevenLabs | Free | $0 | Monthly credit plan | ✓ | 10k credits/month; about 10 TTS minutes listed | TTS, STT, sound effects, voice design, music, productions | TTS and STT included; voice cloning/design limited | ✓ | No commercial license on free plan; account terms apply | High-quality hosted voice models | Streaming and low-latency options exist, but free is limited | Individual account | Testing ElevenLabs voice quality before paying | Credits reset monthly; commercial use requires paid tier |
No tagline | Audio | AI voice suite | ElevenLabs | Starter | $6/mo | Monthly subscription with credits | No; paid plan | 30k credits/month; about 30 TTS minutes listed | TTS, STT, voice cloning, dubbing, music | TTS/STT plus instant voice cloning | ✓ | Commercial license included | High-quality hosted voice models | API and streaming support; credit limits still apply | Individual creator account | Lowest paid ElevenLabs tier with commercial license | Small credit pool; professional voice cloning starts higher |
No tagline | Audio | AI voice suite | ElevenLabs | Creator | $22/mo list; $11 first-month promo displayed | Monthly subscription with credits | No; paid plan | 121k credits/month; about 121 TTS minutes listed | TTS, STT, professional voice cloning, dubbing | TTS/STT plus professional voice cloning | ✓ | Commercial license included; paid credits can roll over up to two months while active | Higher creator-grade voice quality and cloning | Good for creator workflows; 128kbps output listed | Individual creator account | Main creator tier for cloning and recurring content | Displayed promo can differ from ongoing list price; credits vary by model |
No tagline | Audio | AI voice suite | ElevenLabs | Pro | $99/mo | Monthly subscription with credits | No; paid plan | 600k credits/month; about 600 TTS minutes listed | TTS, STT, cloning, studio/API audio | Higher-quality API output and creator features | ✓ | Commercial license included | 192kbps and 44.1kHz PCM API output listed | Higher production throughput than Creator | Individual/pro creator account | Professional narration, apps and voice products needing more quota | Still credit-based; no team seats until Scale |
No tagline | Audio | AI voice suite | ElevenLabs | Scale | $299/mo | Monthly subscription with team seats | No; paid plan | 1.8M credits/month; 3 seats; 3 professional voice clones | TTS, STT, voice cloning, productions, team workspace | Fuller production voice suite | ✓ | Commercial license included | High-volume hosted voice models | Team/project workflows and higher included credits | 3 workspace seats | Small teams producing recurring AI audio | Business/enterprise features and additional seats cost more |
No tagline | Audio | Realtime TTS/STT API | Cartesia | Free | $0 | Monthly included credits | ✓ | Sonic-3.5 TTS about 27 minutes/month; 2 concurrent requests listed | TTS, STT and voice agent building blocks | Sonic TTS; Ink STT on paid plan table | ✓ | Commercial/data terms depend on Cartesia account terms | Sonic-3.5 low-latency TTS | Designed for realtime voice; low-latency positioning | Individual developer account | Trying low-latency TTS before committing | Free usage is small and concurrency-limited |
No tagline | Audio | Realtime TTS/STT API | Cartesia | Starter | $5/mo | Monthly subscription with credits | No; paid plan | Sonic-3.5 about 133 minutes/month; 3 concurrent requests listed | TTS plus basic voice cloning workflows | TTS and selected voice features | ✓ | Commercial/data terms depend on Cartesia account terms | Sonic-3.5; instant voice cloning listed | Realtime-oriented TTS | Individual developer account | Low-cost TTS API for prototypes and small products | STT hours and advanced agent features sit in higher plan areas |
No tagline | Audio | Realtime TTS/STT API | Cartesia | Pro | $49/mo | Monthly subscription with larger credits | No; paid plan | Sonic-3.5 about 1,667 minutes/month; 5 concurrent requests listed | TTS, voice cloning and voice API workflows | TTS with voice cloning and broader API usage | ✓ | Commercial/data terms depend on Cartesia account terms | Sonic-3.5 production TTS | Higher quota/concurrency for realtime voice apps | Individual/pro developer account | Production TTS with predictable monthly included minutes | Exact STT/agent usage may consume credits differently |
No tagline | Audio | Realtime TTS/STT API | Cartesia | Scale | $299/mo | Monthly subscription with high-volume credits | No; paid plan | Sonic-3.5 about 10,667 minutes/month; 15 concurrent requests listed | High-volume TTS/STT/voice agent stack | TTS plus broader speech platform features | ✓ | Commercial/data terms depend on Cartesia account terms | Sonic-3.5 and Ink-2 platform models | Higher concurrency and realtime throughput | Team/production workflow | High-volume voice products needing predictable TTS minutes | Custom enterprise still required for larger terms |
No tagline | Audio | Speech API | Deepgram | PAYG | $0 base | Usage-based API with free credit | ✓ | $200 free credit; then pay-as-you-go | STT, TTS, voice agent API, audio intelligence | Nova/Flux STT, Aura TTS, Voice Agent API | ✓ | Deepgram account terms apply; enterprise terms available separately | Nova-3 STT, Flux STT, Aura TTS | Realtime streaming and voice-agent latency support | Developer/startup account; concurrency capped at the listed public-model limits | Production speech API with strong free credit | The $200 free credit is a fixed balance drawn down by usage, not a recurring monthly allowance |
No tagline | Audio | Speech API | Deepgram | Growth | $4K+/year | Annual prepaid credits with discount | No; paid annual commitment | Prepaid credits redeemed against usage; up to 20% savings listed | STT, TTS, voice agent API, audio intelligence | Same public model endpoints with higher WSS concurrency | ✓ | Deepgram account terms apply | Nova-3/Flux/Aura public models | Higher WSS concurrency than PAYG on pricing page | Growing application plan | Teams with predictable speech volume | Annual commitment; not a casual monthly plan |
No tagline | Audio | Speech-to-text API | AssemblyAI | Free | $0 | Free API tier | ✓ | Up to 185 hours of pre-recorded audio and 333 hours of streaming audio listed as free | Pre-recorded STT, realtime STT, speech understanding | STT-focused; voice agent API also listed | ✓ | AssemblyAI account terms apply | Universal STT models | Realtime endpoint available on free tier | Developer account | Testing transcription and speech understanding at meaningful volume | Free tier scope and rate limits can differ from production needs |
No tagline | Audio | Speech-to-text API | AssemblyAI | PAYG | $0 base | Usage-based API | After free tier | Universal-3 Pro $0.21/hr; Universal-2 $0.15/hr; Whisper-Streaming $0.30/hr listed | Pre-recorded STT, realtime STT, speech understanding, voice agents | STT and speech intelligence APIs | ✓ | AssemblyAI account terms apply | Universal-3 Pro and Universal-2 transcription models | Realtime STT available; pricing by audio hour | Developer/product account | Accurate transcription and speech intelligence without subscription | TTS is not the main product; add-ons may cost extra |
No tagline | Audio | Realtime voice API | OpenAI | GPT-Realtime-2 | $0 base | Usage-based audio/text/image tokens | No included free tier on pricing page | Audio $32/1M input tokens, $0.40/1M cached input, $64/1M output tokens | Speech-to-speech realtime voice agents | Native realtime voice model with text/image input support | ✓ | API data and privacy governed by OpenAI API terms | Most capable OpenAI realtime voice model listed | Realtime conversational voice with interruption handling | Developer/API account | Native voice agents inside OpenAI stack | Token-based audio billing is harder to estimate than per-minute STT/TTS |
No tagline | Audio | Realtime STT / translation API | OpenAI | Realtime Whisper / Translate | $0 base | Per-minute realtime audio pricing | No included free tier on pricing page | GPT-Realtime-Whisper $0.017/min; GPT-Realtime-Translate $0.034/min | Live transcription and live speech translation | Realtime STT and translation | ✓ | API data and privacy governed by OpenAI API terms | OpenAI realtime transcription and translation models | Designed for live speech streams | Developer/API account | Live captions, translation and voice products using OpenAI | Separate from full speech-to-speech GPT-Realtime-2 billing |
No tagline | Audio | Text-to-speech API | OpenAI | GPT-4o mini TTS | $0 base | Usage-based TTS tokens | No included free tier on model page | Text input $0.60/1M tokens; audio output $12/1M tokens | TTS generation | TTS only for this model row | ✓ | API data and privacy governed by OpenAI API terms | Steerable OpenAI TTS model | API latency depends on request size and endpoint | Developer/API account | Programmatic TTS when already using OpenAI APIs | 2,000 input token maximum listed on model page |
No tagline | Audio | Fast STT API | Groq | Whisper V3 Large | $0 base | Usage-based API; free console access via limits | Free API key / free plan limits listed separately | $0.111/hour on pricing page; audio minimum 10 seconds/request | Speech-to-text transcription and translation | Whisper-family STT on GroqCloud | ✓ | GroqCloud terms apply; model license/terms apply | Whisper Large V3 hosted on Groq | High speed factor listed; very fast transcription | Developer/API account | Fast, cheap Whisper transcription | Free usage is capped by rate limits; not a monthly credit plan |
No tagline | Audio | Fast STT API | Groq | Whisper V3 Turbo | $0 base | Usage-based API; free console access via limits | Free API key / free plan limits listed separately | $0.04/hour on pricing page; audio minimum 10 seconds/request | Speech-to-text transcription and translation | Whisper-family STT on GroqCloud | ✓ | GroqCloud terms apply; model license/terms apply | Whisper V3 Turbo for lower-cost STT | High-speed hosted inference | Developer/API account | Lowest-cost hosted Whisper-style transcription in this set | Accuracy/features differ from larger model; free plan has rate limits |
No tagline | Audio | Cloud speech API | Google Cloud | Speech-to-Text | $0 base | Usage-based per processed minute | Yes for V1 standard | V1 standard: 60 minutes/month free; then $0.016/min with data logging or $0.024/min without data logging; V2 standard starts $0.016/min | Speech recognition and transcription | STT only | ✓ | Google Cloud terms apply; data logging choice affects pricing | Google standard/chirp speech recognition models | Batch and realtime modes; dynamic batch $0.003/min for V2 standard | Google Cloud project/account | Google Cloud teams needing STT and cloud billing controls | Free tier details differ between V1/V2 and data logging mode |
No tagline | Audio | Cloud speech API | Google Cloud | Text-to-Speech | $0 base | Usage-based per character/token | Yes for several voice classes | Standard/WaveNet 4M chars free; Neural2/Studio/Polyglot 1M free; Chirp 3 HD 1M free; Gemini TTS has no free usage listed | Text-to-speech synthesis | TTS only | ✓ | Google Cloud terms apply | Standard, WaveNet, Neural2, Chirp 3 HD, Gemini TTS | Cloud TTS latency depends on voice/model | Google Cloud project/account | Broad multilingual TTS with large monthly free character pools | Gemini TTS is token-priced and has no free limit on pricing page |
No tagline | Audio | Cloud TTS API | Amazon Polly | PAYG | $0 base | Usage-based per 1M characters | Yes for first 12 months | Free tier: 5M Standard, 1M Neural, 500k Long-Form, 100k Generative chars/month for first 12 months; then Standard $4/1M, Neural $16/1M, Generative $30/1M | Text-to-speech and speech marks | TTS only | ✓ | AWS terms apply | Standard, Neural, Long-Form and Generative voices | Cloud TTS; realtime app latency depends on integration | AWS account/project | Cheap predictable TTS in AWS environments | Free tier is time-limited for new accounts |
No tagline | Audio | Cloud STT API | Amazon Transcribe | PAYG | $0 base | Usage-based per audio minute | Yes for first 12 months | Free tier: 60 minutes/month for 12 months; pay-as-you-go after | Batch and streaming speech transcription | STT only | ✓ | AWS terms apply | Amazon Transcribe speech recognition | Streaming and batch transcription available | AWS account/project | AWS-native transcription and call analytics workflows | Free tier is small and time-limited; exact paid per-minute rate depends on region/tier |
No tagline | Audio | Cloud speech suite | Azure AI Speech | Free F0 | $0 | Free Azure Speech tier | ✓ | 5 audio hours/month STT; 0.5M neural TTS chars/month; 5 audio hours speech translation/month | STT, TTS, speech translation and speech services | STT/TTS/translation; custom limits on F0 | ✓ | Microsoft Azure terms apply | Azure neural speech and transcription services | Realtime transcription free hours listed | Azure account/resource | Testing Azure Speech before pay-as-you-go | Free tier has concurrency and batch restrictions; some pay-as-you-go prices were not visible on the public pricing page |
No tagline | Audio | Local STT | Whisper | Open source | $0 software | Free local/self-hosted software | Yes; unlimited local use if you provide hardware | No hosted credits; limited by local CPU/GPU and model size | Speech-to-text and translation | STT/ASR; no TTS | Local CLI/library; no hosted API included | MIT-licensed software; audio can stay on the local machine | Whisper models including large variants | Hardware-dependent; not realtime by default without extra tooling | Individual/local self-host | Private offline transcription and batch processing | Requires setup, compute and operational maintenance |
No tagline | Audio | Local TTS | Piper | Open source | $0 software | Free local/self-hosted software | Yes; unlimited local use if you provide hardware | No hosted credits; limited by hardware and selected voices | Text-to-speech | TTS only | Local CLI/library; no hosted API included | MIT-licensed software; voice/model licenses may vary | Fast neural TTS aimed at local use | Fast on local hardware; suitable for offline assistants | Individual/local self-host | Lightweight offline TTS with predictable cost | Voice quality and language coverage depend on available voices |
No tagline | Audio | Local TTS / voice cloning | F5-TTS | Open source | $0 software | Free local/self-hosted software | Yes; unlimited local use if you provide hardware | No hosted credits; limited by local GPU/CPU and model license | Text-to-speech and voice cloning/reconstruction workflows | TTS; voice cloning depends on model/use | Local/self-hosted; no official hosted API included | Project/model license terms apply; audio can stay local | Modern open-source TTS model | Hardware-dependent generation speed | Individual/local self-host | Experimenting with open-source voice cloning/TTS | Requires model setup and careful rights/consent handling for voice cloning |
No tagline | Audio | Local expressive TTS | Bark | Open source | $0 software | Free local/self-hosted software | Yes; unlimited local use if you provide hardware | No hosted credits; limited by local hardware | Text-to-audio / expressive speech generation | TTS-like generative audio; no STT | Local/self-hosted; no official hosted API included | License/model terms apply; audio can stay local | Expressive local generative speech/audio | Slower and more experimental than production APIs | Individual/local self-host | Creative local speech/audio experiments | Less predictable than API TTS; setup and model management required |
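
The rows above mix three billing units (monthly plans with included TTS minutes, per-hour STT rates, and per-minute STT rates), which makes them hard to compare at a glance. The sketch below is one rough way to normalize them, assuming the listed figures are taken at face value. The names `TtsPlan`, `cheapest_tts_plan`, and `stt_cost` are hypothetical helpers, not part of any vendor SDK, and the numbers are copied from the table rows. It ignores overage pricing, free tiers, per-request minimums, and the fact that credits can vary by model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsPlan:
    """A paid subscription row with an approximate included TTS-minute figure."""
    vendor: str
    plan: str
    monthly_usd: float
    included_minutes: float

    @property
    def usd_per_minute(self) -> float:
        # Effective price if the whole monthly allowance is used.
        return self.monthly_usd / self.included_minutes


# Figures copied from the table; "about N minutes" is treated as exact here.
TTS_PLANS = [
    TtsPlan("ElevenLabs", "Starter", 6, 30),
    TtsPlan("ElevenLabs", "Creator", 22, 121),
    TtsPlan("ElevenLabs", "Pro", 99, 600),
    TtsPlan("Cartesia", "Starter", 5, 133),
    TtsPlan("Cartesia", "Pro", 49, 1667),
    TtsPlan("Cartesia", "Scale", 299, 10667),
]

# STT list prices normalized to USD per audio hour.
# Per-minute rates from the table are multiplied by 60.
STT_USD_PER_HOUR = {
    "AssemblyAI Universal-3 Pro": 0.21,
    "AssemblyAI Universal-2": 0.15,
    "Groq Whisper V3 Large": 0.111,
    "Groq Whisper V3 Turbo": 0.04,
    "OpenAI GPT-Realtime-Whisper": 0.017 * 60,
    "Google Cloud STT V1 standard (data logging)": 0.016 * 60,
}


def cheapest_tts_plan(minutes_needed, plans=TTS_PLANS):
    """Return the lowest-priced plan whose allowance covers the need, or None."""
    candidates = [p for p in plans if p.included_minutes >= minutes_needed]
    return min(candidates, key=lambda p: p.monthly_usd, default=None)


def stt_cost(hours, rates=STT_USD_PER_HOUR):
    """Return {service: USD} for the given audio hours, cheapest first."""
    ordered = sorted(rates.items(), key=lambda kv: kv[1])
    return {name: round(rate * hours, 2) for name, rate in ordered}


if __name__ == "__main__":
    for p in TTS_PLANS:
        print(f"{p.vendor} {p.plan}: ${p.usd_per_minute:.3f}/min")
    print(cheapest_tts_plan(500))
    print(stt_cost(100))
```

Comparing on price alone leaves out the other columns: commercial rights, voice cloning, concurrency, and latency can matter more than the per-minute figure, so the output is best read as a first filter rather than a recommendation.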