LLM APIs

API providers for large language models, including free tiers and local options.

Tool	Category	Segment	Provider	Plan	Monthly Price USD	Billing Model	Free Tier / Trial	Included Usage / Credits	Overages / Top-ups	API Compatibility	Model Access	Reference Model Price Input USD / 1M Tokens	Reference Model Price Output USD / 1M Tokens	Cheap Model Price Input USD / 1M Tokens	Cheap Model Price Output USD / 1M Tokens	Context / Rate Limits	Data Privacy / Training	Best Fit	Main Limits / Caveats
OpenAI API pay-as-you-go	LLM APIs	Frontier model API	OpenAI API	Pay-as-you-go	$0 subscription	Token-based API usage	No standing free tier	No monthly included usage published on pricing page	Prepaid/API billing by model and feature	Native OpenAI API; broad SDK support	GPT-5.5, GPT-5.1, GPT-5.4 mini, GPT-4.1 family, realtime/audio/image tools	$5.00 in for GPT-5.5	$30.00 out for GPT-5.5	$0.75 in for GPT-5.4 mini	$4.50 out for GPT-5.4 mini	Rate limits depend on account tier and model	API data is governed by OpenAI API data controls; verify org retention settings	Default choice for broad SDK support and frontier models	No free API quota; costs can rise quickly with long context/tool calls
OpenAI Batch API discounted usage	LLM APIs	Frontier model API	OpenAI API	Batch API	$0 subscription	50 percent lower token price for async batch jobs	✕	No monthly included usage	Same API billing, discounted for batch-compatible workloads	Native OpenAI Batch API	Same supported batchable OpenAI models/features	$2.50 in equivalent for GPT-5.5 batch	$15.00 out equivalent for GPT-5.5 batch	$0.375 in equivalent for GPT-5.4 mini batch	$2.25 out equivalent for GPT-5.4 mini batch	Async batch window; not for interactive latency	Same API data controls as standard OpenAI API	Offline evals, document processing, synthetic data, backfills	Not suitable for realtime app UX
Anthropic Claude API pay-as-you-go	LLM APIs	Frontier model API	Anthropic Claude API	Pay-as-you-go	$0 subscription	Token-based API usage	Console credits/trials vary by account	No fixed monthly included usage published on pricing page	Pay by model; prompt caching and batch discounts available	Anthropic Messages API; SDKs; many gateways support Claude	Claude Opus 4.8, Claude Sonnet 4.5, Claude Haiku 4.5	$5.00 in for Claude Opus 4.8	$25.00 out for Claude Opus 4.8	$1.00 in for Claude Haiku 4.5	$5.00 out for Claude Haiku 4.5	Rate limits depend on API tier; prompt caching available	Anthropic API data policy; verify retention and zero-retention eligibility	Claude-native apps, reasoning, coding, long-context workflows	Model access and limits vary by account and region
Anthropic Message Batches	LLM APIs	Frontier model API	Anthropic Claude API	Batch API	$0 subscription	50 percent discount for batch processing	✕	No fixed monthly included usage	Batch jobs billed at discounted token prices	Anthropic Message Batches API	Batchable Claude models	$2.50 in for Claude Opus 4.8 batch	$12.50 out for Claude Opus 4.8 batch	$0.50 in for Claude Haiku 4.5 batch	$2.50 out for Claude Haiku 4.5 batch	Async batch processing; not low latency	Same Anthropic API policy; verify retention settings	Large non-interactive processing and eval workloads	Batch output is delayed; not for chat UX
Gemini API Free Tier	LLM APIs	Frontier model API	Google Gemini API	Free	$0	Free quota by model	✓	Free tier available for selected Gemini API models; limits vary by model and region	Upgrade to paid tier through Google AI Studio / Google Cloud billing	Google Gemini API; OpenAI-compatible endpoint available for some workflows	Gemini 3 Pro Preview, Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Flash-Lite	Free where listed for eligible models	Free where listed for eligible models	$0 on free tier	$0 on free tier	Free tier rate limits are lower and vary by model	Free tier prompts/responses may be used to improve Google products; paid tier not used for training per pricing page	No-cost prototyping with strong models	Free tier quotas can change and are not for sensitive data unless policy is acceptable
Gemini API Paid Tier	LLM APIs	Frontier model API	Google Gemini API	Paid tier	$0 subscription	Token-based API usage	✕	No monthly included usage; pay per token	Billed by model, context length and modality	Google Gemini API / Google Cloud billing	Gemini 3 Pro Preview, Gemini 2.5 Pro, Flash, Flash-Lite	$2.00 in for Gemini 3 Pro Preview	$12.00 out for Gemini 3 Pro Preview	$0.10 in for Gemini 2.5 Flash	$0.40 out for Gemini 2.5 Flash	Paid tier has higher rate limits than free tier; exact quotas by model	Paid tier inputs/outputs are not used to improve Google products per pricing page	Production apps that need Gemini pricing and Google ecosystem	Long-context, grounding and modality prices differ by model
Mistral La Plateforme API pay-as-you-go	LLM APIs	European model API	Mistral La Plateforme	Pay-as-you-go	$0 subscription	Token-based API usage	Free tier may require opt-in/data-training settings; verify account	No fixed monthly included usage on pricing table	Pay by model and feature	Mistral API; OpenAI-compatible integrations available through clients/gateways	Mistral Large, Medium, Small, Codestral, Magistral, embedding/OCR/audio models	$2.00 in for Ministral 3 14B on pricing table	$6.00 out for Ministral 3 14B on pricing table	$0.10 in for Mistral Small 3.2	$0.30 out for Mistral Small 3.2	Rate limits depend on workspace/tier	EU-focused provider; verify free-tier training opt-in and enterprise privacy needs	Developers wanting European provider and strong open/proprietary models	Model list and prices move often; some products are non-text modalities
GroqCloud Free	LLM APIs	Fast inference API	GroqCloud	Free	$0	Free developer quota	✓	Free limits by model, e.g. requests/day and tokens/minute in Groq rate-limit docs	Upgrade to Dev Tier / paid usage for higher limits	OpenAI-compatible API surface for many chat workflows	Llama, Qwen, DeepSeek, GPT-OSS, Whisper and other fast-hosted models	$0 on free quota	$0 on free quota	$0 on free quota	$0 on free quota	Free limits are model-specific; examples include RPM/TPM/RPD/TPD limits	Verify Groq data processing and retention terms for production	Ultra-low-latency open model inference and prototypes	Free quota is generous but not guaranteed for production
GroqCloud paid API usage	LLM APIs	Fast inference API	GroqCloud	Paid usage	$0 subscription	Token-based API usage	✕	No fixed monthly included usage	Pay by model; higher limits through paid tiers	OpenAI-compatible API surface for many chat workflows	Llama, Qwen, DeepSeek, GPT-OSS, Whisper and other hosted models	Model-specific pricing	Model-specific pricing	Model-specific pricing	Model-specific pricing	Paid limits depend on usage tier	Verify Groq data processing and retention terms for production	Production low-latency open-model apps	Official pricing is model-specific and may require console context
OpenRouter Free Models	LLM APIs	Model gateway	OpenRouter	Free models	$0	Free model quota	✓	Free models and shared quota; local resource noted 20 RPM and many free models	Buy credits for paid models or BYOK routing	OpenAI-compatible gateway API	Hundreds of hosted models from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek and others	$0 for models marked :free	$0 for models marked :free	$0 for models marked :free	$0 for models marked :free	Free model limits and availability vary by provider	Provider routing and data policy depend on selected model/provider	Trying many models without separate provider accounts	Free models can disappear or throttle; production should pin fallback models
OpenRouter pay-as-you-go credits	LLM APIs	Model gateway	OpenRouter	Pay-as-you-go	$0 subscription	Prepaid credits / token-based routing	✕	No monthly included usage	Buy credits; model prices pass through with OpenRouter routing	OpenAI-compatible gateway API	Commercial and open models from many providers	Model-specific pass-through pricing	Model-specific pass-through pricing	Model-specific pass-through pricing	Model-specific pass-through pricing	Limits depend on model/provider and account balance	Data handling depends on routed provider; check per-model provider policy	One API key for multi-model routing and fallback	Adds gateway dependency and per-model policy complexity
Hugging Face Free Inference Credits	LLM APIs	Model gateway	Hugging Face Inference Providers	Free user	$0	Monthly included credits	✓	$0.10 monthly inference credits for free users	Pay-as-you-go after credits where available	Hugging Face routed provider APIs; provider SDKs and HF libraries	Many open models across supported inference providers	Provider/model-specific	Provider/model-specific	Provider/model-specific	Provider/model-specific	Credits and provider availability vary by model/provider	Data policy depends on provider routed through Hugging Face	Light experimentation with open models	Tiny free credit amount; not enough for serious production
Hugging Face PRO Inference Credits	LLM APIs	Model gateway	Hugging Face Inference Providers	PRO	$9/user	Subscription with monthly credits	✕	$2 monthly inference credits included for PRO users	Pay-as-you-go beyond included credits	Hugging Face routed provider APIs; provider SDKs and HF libraries	Many open models across supported inference providers	Provider/model-specific	Provider/model-specific	Provider/model-specific	Provider/model-specific	Provider quotas and availability vary	Data policy depends on selected provider	Developers already using Hugging Face Hub	Credits are modest; high volume still pay-as-you-go
Hugging Face Team Organization Inference Credits	LLM APIs	Model gateway	Hugging Face Inference Providers	Team	$20/user	Team subscription with org credits	✕	$2 monthly inference credits per seat included	Pay-as-you-go beyond included credits	Hugging Face routed provider APIs; provider SDKs and HF libraries	Many open models across supported inference providers	Provider/model-specific	Provider/model-specific	Provider/model-specific	Provider/model-specific	Org billing and quotas depend on provider	Data policy depends on selected provider	Small teams using HF org workflows	Enterprise custom options excluded
DeepSeek API pay-as-you-go	LLM APIs	Low-cost model API	DeepSeek API	Pay-as-you-go	$0 subscription	Token-based API usage	✕	No fixed monthly included usage on pricing page	Pay by model; off-peak discounts may apply	OpenAI-compatible API	deepseek-chat, deepseek-reasoner and related models	$0.56 in for deepseek-chat (standard cache-miss pricing)	$1.68 out for deepseek-chat	$0.028 in cache-hit for deepseek-chat	$1.68 out	Rate limits depend on account and model	Verify DeepSeek data/security policy before proprietary workloads	Very low-cost reasoning/chat API	Availability and regional policy may matter for commercial use
Cloudflare Workers AI Free	LLM APIs	Edge/serverless model API	Cloudflare Workers AI	Free	$0	Free daily allocation	✓	10,000 neurons/day free allocation	Upgrade to Workers Paid for higher allocation and pay-as-you-go neurons	Workers AI REST/API bindings; runs inside Cloudflare Workers	Cloudflare-hosted open models including Llama, Qwen, Mistral, Gemma, Whisper and embeddings	Neuron-based, model-specific	Neuron-based, model-specific	Neuron-based, model-specific	Neuron-based, model-specific	Free allocation resets daily	Cloudflare account data/security terms apply	Edge apps and prototypes already on Cloudflare	Pricing unit is neurons, not simple token price
Cloudflare Workers Paid + Workers AI	LLM APIs	Edge/serverless model API	Cloudflare Workers AI	Paid	$5 account minimum for Workers Paid	Subscription plus usage	✕	Higher Workers platform limits; Workers AI charged by neurons	Pay-as-you-go beyond free allocation	Workers AI REST/API bindings; Cloudflare Workers integration	Cloudflare-hosted open models	Neuron-based, model-specific	Neuron-based, model-specific	Neuron-based, model-specific	Neuron-based, model-specific	Account/platform limits depend on Workers plan	Cloudflare account data/security terms apply	Production edge AI workloads	Neuron pricing is harder to compare against token APIs
Fireworks AI trial credits	LLM APIs	Open model inference API	Fireworks AI	Trial credits	$0	Signup credit / no payment method path	✓	Local resource and pricing page indicate free/trial credits for new accounts	Move to pay-as-you-go or monthly Fire Pass	OpenAI-compatible API for many serverless models	Open-source and partner models, image/audio and fine-tune options	$0 until credits are exhausted	$0 until credits are exhausted	$0 until credits are exhausted	$0 until credits are exhausted	Limits depend on account and selected model	Verify model/provider data policy and Fireworks retention terms	Testing hosted open models quickly	Trial credit amount can change; verify account console
Fireworks AI serverless pay-as-you-go	LLM APIs	Open model inference API	Fireworks AI	Pay-as-you-go	$0 subscription	Token/usage-based serverless inference	✕	No fixed monthly included usage	Pay by model; serverless and dedicated deployments available	OpenAI-compatible API for many serverless models	Llama, DeepSeek, Qwen, Mixtral and many open models	Model-specific pricing	Model-specific pricing	Model-specific pricing	Model-specific pricing	Limits depend on account and deployment type	Verify model/provider data policy and Fireworks retention terms	Open model production inference with good latency	Dedicated deployments and enterprise options excluded
Fireworks AI Fire Pass 1 month subscription	LLM APIs	Open model inference API	Fireworks AI	Fire Pass	$49/user	Monthly subscription / access pass	✕	Fire Pass gives access to Fireworks app/API benefits listed on pricing page	Usage and premium models may still have limits depending on account	Fireworks API and app surfaces	Fireworks-hosted models	Plan-specific	Plan-specific	Plan-specific	Plan-specific	Plan details vary by account/product	Verify data policy for selected model and deployment	Users who want a monthly Fireworks bundle	Not as transparent as pure token PAYG
Together AI serverless pay-as-you-go	LLM APIs	Open model inference API	Together AI	Pay-as-you-go	$0 subscription	Token-based serverless inference	Trial/promotional credits may vary by account	No fixed public monthly allowance in pricing docs	Pay by model; dedicated endpoints available	OpenAI-compatible API and Together SDK	Meta Llama, Qwen, DeepSeek, Mistral, FLUX and other open models	Model-specific pricing	Model-specific pricing	Model-specific pricing	Model-specific pricing	Rate limits and quotas depend on account/model	Verify Together data retention/training terms for production	Hosted open model inference and fine-tuning ecosystem	Trial credits are account-dependent; dedicated endpoints excluded
Vercel AI Gateway Free	LLM APIs	Model gateway	Vercel AI Gateway	Free	$0	Monthly included credits	✓	$5/month in included AI Gateway credits on Free per docs	Buy credits / upgrade Vercel plan for more	AI Gateway routes to model providers; Vercel AI SDK friendly	OpenAI, Anthropic, Google, xAI, Groq, Mistral and other supported providers	Provider/model-specific	Provider/model-specific	Provider/model-specific	Provider/model-specific	Usage limited by included credits and provider routing	Data policy depends on Vercel gateway and selected provider	Next.js/Vercel projects needing one AI gateway	Best when already on Vercel; provider policies still matter
Vercel AI Gateway Pro	LLM APIs	Model gateway	Vercel AI Gateway	Pro plan credits	$20/user Vercel Pro base	Vercel plan plus AI Gateway usage credits	✕	$15/month in included AI Gateway credits on Pro per docs	Buy additional credits; provider/model-specific charges	AI Gateway routes to model providers; Vercel AI SDK friendly	OpenAI, Anthropic, Google, xAI, Groq, Mistral and other supported providers	Provider/model-specific	Provider/model-specific	Provider/model-specific	Provider/model-specific	Usage limited by credits, plan and model/provider routing	Data policy depends on Vercel gateway and selected provider	Production apps deployed on Vercel	Enterprise custom tier excluded
AI21 Studio Free Trial	LLM APIs	Model API	AI21 Studio	Free Trial	$0	Trial credits	✓	$10 trial credits for 3 months listed on pricing page	Move to pay-as-you-go after credits expire/exhaust	AI21 API	Jamba, Jurassic/AI21 models and task-specific endpoints	$2.00 in for Jamba Large 1.7	$8.00 out for Jamba Large 1.7	$0.20 in for Jamba Mini 1.7	$0.40 out for Jamba Mini 1.7	Rate limits depend on account and model	Verify AI21 data handling terms	Trying Jamba models and AI21 task APIs	Trial expires after stated period
AI21 Studio pay-as-you-go	LLM APIs	Model API	AI21 Studio	Pay-as-you-go	$0 subscription	Token-based API usage	✕	No fixed monthly included usage	Pay by model after trial	AI21 API	Jamba, Jurassic/AI21 models and task-specific endpoints	$2.00 in for Jamba Large 1.7	$8.00 out for Jamba Large 1.7	$0.20 in for Jamba Mini 1.7	$0.40 out for Jamba Mini 1.7	Rate limits depend on account and model	Verify AI21 data handling terms	Apps needing AI21/Jamba models	Smaller ecosystem than OpenAI/Anthropic/Gemini
Perplexity Sonar API pay-as-you-go	LLM APIs	Search-grounded LLM API	Perplexity API	Pay-as-you-go	$0 subscription	Token + search/request pricing	✕	No fixed monthly included usage	Pay by model plus search/context features	Perplexity API	Sonar, Sonar Pro, Sonar Reasoning, Sonar Deep Research	$1.00 in for Sonar Pro text token pricing	$1.00 out for Sonar Pro text token pricing, plus search/request costs	$1.00 in for Sonar	$1.00 out for Sonar	Limits depend on account tier and model	Search data and provider terms apply; verify citations/privacy needs	Grounded answers, research assistants, search-heavy apps	Pricing includes request/search components, not only tokens
Replicate pay-as-you-go	LLM APIs	Hosted model marketplace	Replicate	Pay-as-you-go	$0 subscription	Usage-based compute/model pricing	Limited free usage may vary by account/model	No fixed monthly included usage	Pay by model runtime/prediction; some models have per-second or per-run pricing	Replicate API and client libraries	Open-source text, image, video, audio and multimodal models	Model/runtime-specific	Model/runtime-specific	Model/runtime-specific	Model/runtime-specific	Limits vary by account and model hardware	Replicate/model owner policies apply	Trying many open models across modalities	Text LLM costs are harder to normalize than pure token APIs
Cohere Trial API Key	LLM APIs	Enterprise-friendly model API	Cohere	Trial	$0	Trial key / trial limits	✓	Trial API key is limited and non-commercial per local resource; official docs route pricing by model	Upgrade to production/API billing	Cohere API	Command, Embed, Rerank, Aya and related models	$0 until trial quota exhausted	$0 until trial quota exhausted	$0 until trial quota exhausted	$0 until trial quota exhausted	Trial rate limits and monthly request limits apply	Verify Cohere trial/commercial data terms	Testing Command and Rerank APIs	Trial may be non-commercial and quota-limited
Cohere Production API	LLM APIs	Enterprise-friendly model API	Cohere	Pay-as-you-go	$0 subscription	Token/request-based API usage	✕	No fixed monthly included usage	Pay by model/task	Cohere API	Command, Embed, Rerank, Aya and related models	Model/task-specific	Model/task-specific	Model/task-specific	Model/task-specific	Production limits depend on account/model	Cohere enterprise/privacy posture; verify exact retention setting	RAG apps needing rerank/embedding plus chat	Pricing differs by task; not just chat tokens
NVIDIA NIM API Catalog free credits	LLM APIs	Accelerated inference API	NVIDIA NIM API Catalog	Free credits	$0	Signup credits / hosted API catalog	✓	Signup credits for NVIDIA-hosted NIM API catalog; local resource notes 1K credits signup	Buy/upgrade through NVIDIA ecosystem or self-host NIM	NVIDIA-hosted API endpoints and NIM containers	Llama, Mistral, Qwen, Nemotron and other NIM-hosted models	$0 until credits exhausted	$0 until credits exhausted	$0 until credits exhausted	$0 until credits exhausted	Credit, RPM and verification requirements apply	NVIDIA terms and selected model policy apply	Trying optimized NIM-hosted open models	Credit system is less transparent than token-price APIs
Cerebras Inference Free	LLM APIs	Accelerated inference API	Cerebras Inference	Free	$0	Free developer quota	✓	Free usage tier and model rate limits shown in Cerebras pricing/rate-limit docs	Upgrade to Developer / paid usage for higher limits	OpenAI-compatible API	Llama, Qwen, GPT-OSS and Cerebras-hosted fast inference models	$0 on free quota	$0 on free quota	$0 on free quota	$0 on free quota	Free rate limits are model-specific	Verify Cerebras data terms for production	Fast open-model inference experiments	Free quota is not production capacity
Cerebras Inference Developer	LLM APIs	Accelerated inference API	Cerebras Inference	Developer	$0 subscription	Token-based paid API usage	✕	No fixed monthly included usage	Pay by model/token once paid usage is enabled	OpenAI-compatible API	Llama, Qwen, GPT-OSS and Cerebras-hosted fast inference models	Model-specific pricing	Model-specific pricing	Model-specific pricing	Model-specific pricing	Paid limits higher than free where available	Verify Cerebras data terms for production	Low-latency open-model inference at scale	Exact model prices and limits require current pricing table/console
Ollama local API	LLM APIs	Local/self-hosted API	Ollama	Local	$0 + hardware	Free local software	✓	Unlimited local usage subject to local hardware	No vendor overage; pay hardware/electricity/cloud GPU if used	Ollama API; OpenAI-compatible endpoint support documented	Local open models such as Llama, Qwen, Mistral, Gemma and custom Modelfiles	$0 software cost	$0 software cost	$0 software cost	$0 software cost	Limited by local CPU/GPU/RAM and model size	Local-first; provider training does not apply unless using remote models	Private prototyping and offline/local workflows	Requires hardware and model management; quality depends on local model
LM Studio local server	LLM APIs	Local/self-hosted API	LM Studio	Local	$0 + hardware	Free local app/server	✓	Unlimited local usage subject to local hardware	No vendor overage; pay hardware/electricity/cloud GPU if used	OpenAI-like local server API	Local GGUF/open models downloadable through LM Studio	$0 software cost	$0 software cost	$0 software cost	$0 software cost	Limited by local CPU/GPU/RAM and model size	Local-first; no provider training for local inference	Non-technical local API and model testing	Desktop app dependency; production self-hosting needs care
LocalAI self-hosted OpenAI-compatible API	LLM APIs	Local/self-hosted API	LocalAI	Self-hosted	$0 + hardware	Free open-source software	✓	Unlimited local/self-hosted usage subject to infrastructure	No vendor overage; pay infrastructure only	OpenAI-compatible local API	Runs local LLMs, image/audio models and embeddings depending on setup	$0 software cost	$0 software cost	$0 software cost	$0 software cost	Limited by server hardware and model backend	Self-hosted; data stays on your infrastructure if configured correctly	Teams needing OpenAI-compatible local/private endpoints	Ops burden and performance tuning are on you
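
Worked example: the per-1M-token prices in the table can be turned into a rough bill estimate. The sketch below is illustrative only. The function name estimate_cost and the token volumes are assumptions made up for this example; the prices are the GPT-5.5 reference prices and the 50 percent batch discount listed in the OpenAI rows above. It ignores caching, tool calls, search fees and any other per-request charges, so treat the result as a lower bound and check the provider's current pricing page.

```python
def estimate_cost(input_tokens, output_tokens,
                  price_in_per_m, price_out_per_m, discount=0.0):
    """Estimate USD cost from token counts and per-1M-token prices.

    discount is a fraction between 0 and 1 (e.g. 0.5 for a 50 percent
    batch discount). Hypothetical helper, not part of any provider SDK.
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    # Prices are quoted per one million tokens, so scale the counts first.
    base = (input_tokens / 1_000_000) * price_in_per_m \
         + (output_tokens / 1_000_000) * price_out_per_m
    return base * (1.0 - discount)


if __name__ == "__main__":
    # Assumed workload: 2M input tokens and 0.5M output tokens per month.
    standard = estimate_cost(2_000_000, 500_000, 5.00, 30.00)
    batch = estimate_cost(2_000_000, 500_000, 5.00, 30.00, discount=0.5)
    print(f"standard: ${standard:.2f}")  # standard: $25.00
    print(f"batch:    ${batch:.2f}")     # batch:    $12.50
```

The same function works for any row that lists token prices; rows priced in neurons, credits or runtime seconds need a different conversion and are not covered by this sketch.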