LLM APIs

API providers for large language models, including free tiers and local options.

Tool
Category
Segment
Provider
Plan
Monthly Price USD
Billing Model
Free Tier / Trial
Included Usage / Credits
Overages / Top-ups
API Compatibility
Model Access
Reference Model Price Input USD / 1M Tokens
Reference Model Price Output USD / 1M Tokens
Cheap Model Price Input USD / 1M Tokens
Cheap Model Price Output USD / 1M Tokens
Context / Rate Limits
Data Privacy / Training
Best Fit
Main Limits / Caveats
No tagline
LLM APIsFrontier model APIOpenAI APIPay-as-you-go$0 subscriptionToken-based API usageNo standing free tierNo monthly included usage published on pricing pagePrepaid/API billing by model and featureNative OpenAI API; broad SDK supportGPT-5.5, GPT-5.1, GPT-5.4 mini, GPT-4.1 family, realtime/audio/image tools$5.00 in / $30.00 out for GPT-5.5$30.00$0.75 in for GPT-5.4 mini$4.50 out for GPT-5.4 miniRate limits depend on account tier and modelAPI data is governed by OpenAI API data controls; verify org retention settingsDefault choice for broad SDK support and frontier modelsNo free API quota; costs can rise quickly with long context/tool calls
No tagline
LLM APIsFrontier model APIOpenAI APIBatch API$0 subscription50 percent lower token price for async batch jobsNo monthly included usageSame API billing, discounted for batch-compatible workloadsNative OpenAI Batch APISame supported batchable OpenAI models/features$2.50 in / $15.00 out equivalent for GPT-5.5 batch$15.00$0.375 in equivalent for GPT-5.4 mini batch$2.25 out equivalent for GPT-5.4 mini batchAsync batch window; not for interactive latencySame API data controls as standard OpenAI APIOffline evals, document processing, synthetic data, backfillsNot suitable for realtime app UX
No tagline
LLM APIsFrontier model APIAnthropic Claude APIPay-as-you-go$0 subscriptionToken-based API usageConsole credits/trials vary by accountNo fixed monthly included usage published on pricing pagePay by model; prompt caching and batch discounts availableAnthropic Messages API; SDKs; many gateways support ClaudeClaude Opus 4.8, Claude Sonnet 4.5, Claude Haiku 4.5$5.00 in / $25.00 out for Claude Opus 4.8$25.00$1.00 in for Claude Haiku 4.5$5.00 out for Claude Haiku 4.5Rate limits depend on API tier; prompt caching availableAnthropic API data policy; verify retention and zero-retention eligibilityClaude-native apps, reasoning, coding, long-context workflowsModel access and limits vary by account and region
No tagline
LLM APIsFrontier model APIAnthropic Claude APIBatch API$0 subscription50 percent discount for batch processingNo fixed monthly included usageBatch jobs billed at discounted token pricesAnthropic Message Batches APIBatchable Claude models$2.50 in / $12.50 out for Claude Opus 4.8 batch$12.50$0.50 in for Claude Haiku 4.5 batch$2.50 out for Claude Haiku 4.5 batchAsync batch processing; not low latencySame Anthropic API policy; verify retention settingsLarge non-interactive processing and eval workloadsBatch output is delayed; not for chat UX
No tagline
LLM APIsFrontier model APIGoogle Gemini APIFree$0Free quota by modelFree tier available for selected Gemini API models; limits vary by model and regionUpgrade to paid tier through Google AI Studio / Google Cloud billingGoogle Gemini API; OpenAI-compatible endpoint available for some workflowsGemini 3 Pro Preview, Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Flash-LiteFree where listed for eligible modelsFree where listed for eligible models$0 on free tier$0 on free tierFree tier rate limits are lower and vary by modelFree tier prompts/responses may be used to improve Google products; paid tier not used for training per pricing pageNo-cost prototyping with strong modelsFree tier quotas can change and are not for sensitive data unless policy is acceptable
No tagline
LLM APIsFrontier model APIGoogle Gemini APIPaid tier$0 subscriptionToken-based API usageNo monthly included usage; pay per tokenBilled by model, context length and modalityGoogle Gemini API / Google Cloud billingGemini 3 Pro Preview, Gemini 2.5 Pro, Flash, Flash-Lite$2.00 in / $12.00 out for Gemini 3 Pro Preview$12.00$0.10 in for Gemini 2.5 Flash$0.40 out for Gemini 2.5 FlashPaid tier has higher rate limits than free tier; exact quotas by modelPaid tier inputs/outputs are not used to improve Google products per pricing pageProduction apps that need Gemini pricing and Google ecosystemLong-context, grounding and modality prices differ by model
No tagline
LLM APIsEuropean model APIMistral La PlateformePay-as-you-go$0 subscriptionToken-based API usageFree tier may require opt-in/data-training settings; verify accountNo fixed monthly included usage on pricing tablePay by model and featureMistral API; OpenAI-compatible integrations available through clients/gatewaysMistral Large, Medium, Small, Codestral, Magistral, embedding/OCR/audio models$2.00 in / $6.00 out for Ministral 3 14B on pricing table$6.00$0.10 in for Mistral Small 3.2$0.30 out for Mistral Small 3.2Rate limits depend on workspace/tierEU-focused provider; verify free-tier training opt-in and enterprise privacy needsDevelopers wanting European provider and strong open/proprietary modelsModel list and prices move often; some products are non-text modalities
No tagline
LLM APIsFast inference APIGroqCloudFree$0Free developer quotaFree limits by model, e.g. requests/day and tokens/minute in Groq rate-limit docsUpgrade to Dev Tier / paid usage for higher limitsOpenAI-compatible API surface for many chat workflowsLlama, Qwen, DeepSeek, GPT-OSS, Whisper and other fast-hosted models$0 on free quota$0 on free quota$0 on free quota$0 on free quotaFree limits are model-specific; examples include RPM/TPM/RPD/TPD limitsVerify Groq data processing and retention terms for productionUltra-low-latency open model inference and prototypesFree quota is generous but not guaranteed for production
No tagline
LLM APIsFast inference APIGroqCloudPaid usage$0 subscriptionToken-based API usageNo fixed monthly included usagePay by model; higher limits through paid tiersOpenAI-compatible API surface for many chat workflowsLlama, Qwen, DeepSeek, GPT-OSS, Whisper and other hosted modelsModel-specific pricingModel-specific pricingModel-specific pricingModel-specific pricingPaid limits depend on usage tierVerify Groq data processing and retention terms for productionProduction low-latency open-model appsOfficial pricing is model-specific and may require console context
No tagline
LLM APIsModel gatewayOpenRouterFree models$0Free model quotaFree models and shared quota; local resource noted 20 RPM and many free modelsBuy credits for paid models or BYOK routingOpenAI-compatible gateway APIHundreds of hosted models from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek and others$0 for models marked :free$0 for models marked :free$0 for models marked :free$0 for models marked :freeFree model limits and availability vary by providerProvider routing and data policy depend on selected model/providerTrying many models without separate provider accountsFree models can disappear or throttle; production should pin fallback models
No tagline
LLM APIsModel gatewayOpenRouterPay-as-you-go$0 subscriptionPrepaid credits / token-based routingNo monthly included usageBuy credits; model prices pass through with OpenRouter routingOpenAI-compatible gateway APICommercial and open models from many providersModel-specific pass-through pricingModel-specific pass-through pricingModel-specific pass-through pricingModel-specific pass-through pricingLimits depend on model/provider and account balanceData handling depends on routed provider; check per-model provider policyOne API key for multi-model routing and fallbackAdds gateway dependency and per-model policy complexity
No tagline
LLM APIsModel gatewayHugging Face Inference ProvidersFree user$0Monthly included credits$0.10 monthly inference credits for free usersPay-as-you-go after credits where availableHugging Face routed provider APIs; provider SDKs and HF librariesMany open models across supported inference providersProvider/model-specificProvider/model-specificProvider/model-specificProvider/model-specificCredits and provider availability vary by model/providerData policy depends on provider routed through Hugging FaceLight experimentation with open modelsTiny free credit amount; not enough for serious production
No tagline
LLM APIsModel gatewayHugging Face Inference ProvidersPRO$9/userSubscription with monthly credits$2 monthly inference credits included for PRO usersPay-as-you-go beyond included creditsHugging Face routed provider APIs; provider SDKs and HF librariesMany open models across supported inference providersProvider/model-specificProvider/model-specificProvider/model-specificProvider/model-specificProvider quotas and availability varyData policy depends on selected providerDevelopers already using Hugging Face HubCredits are modest; high volume still pay-as-you-go
No tagline
LLM APIsModel gatewayHugging Face Inference ProvidersTeam$20/userTeam subscription with org credits$2 monthly inference credits per seat includedPay-as-you-go beyond included creditsHugging Face routed provider APIs; provider SDKs and HF librariesMany open models across supported inference providersProvider/model-specificProvider/model-specificProvider/model-specificProvider/model-specificOrg billing and quotas depend on providerData policy depends on selected providerSmall teams using HF org workflowsEnterprise custom options excluded
No tagline
LLM APIsLow-cost model APIDeepSeek APIPay-as-you-go$0 subscriptionToken-based API usageNo fixed monthly included usage on pricing pagePay by model; off-peak discounts may applyOpenAI-compatible APIdeepseek-chat, deepseek-reasoner and related models$0.56 in / $1.68 out for deepseek-chat standard cache-miss pricing$1.68$0.028 in cache-hit for deepseek-chat$1.68 outRate limits depend on account and modelVerify DeepSeek data/security policy before proprietary workloadsVery low-cost reasoning/chat APIAvailability and regional policy may matter for commercial use
No tagline
LLM APIsEdge/serverless model APICloudflare Workers AIFree$0Free daily allocation10,000 neurons/day free allocationUpgrade to Workers Paid for higher allocation and pay-as-you-go neuronsWorkers AI REST/API bindings; runs inside Cloudflare WorkersCloudflare-hosted open models including Llama, Qwen, Mistral, Gemma, Whisper and embeddingsNeuron-based, model-specificNeuron-based, model-specificNeuron-based, model-specificNeuron-based, model-specificFree allocation resets dailyCloudflare account data/security terms applyEdge apps and prototypes already on CloudflarePricing unit is neurons, not simple token price
No tagline
LLM APIsEdge/serverless model APICloudflare Workers AIPaid$5 account minimum for Workers PaidSubscription plus usageHigher Workers platform limits; Workers AI charged by neuronsPay-as-you-go beyond free allocationWorkers AI REST/API bindings; Cloudflare Workers integrationCloudflare-hosted open modelsNeuron-based, model-specificNeuron-based, model-specificNeuron-based, model-specificNeuron-based, model-specificAccount/platform limits depend on Workers planCloudflare account data/security terms applyProduction edge AI workloadsNeuron pricing is harder to compare against token APIs
No tagline
LLM APIsOpen model inference APIFireworks AITrial credits$0Signup credit / no payment method pathLocal resource and pricing page indicate free/trial credits for new accountsMove to pay-as-you-go or monthly Fire PassOpenAI-compatible API for many serverless modelsOpen-source and partner models, image/audio and fine-tune options$0 until credits are exhausted$0 until credits are exhausted$0 until credits are exhausted$0 until credits are exhaustedLimits depend on account and selected modelVerify model/provider data policy and Fireworks retention termsTesting hosted open models quicklyTrial credit amount can change; verify account console
No tagline
LLM APIsOpen model inference APIFireworks AIPay-as-you-go$0 subscriptionToken/usage-based serverless inferenceNo fixed monthly included usagePay by model; serverless and dedicated deployments availableOpenAI-compatible API for many serverless modelsLlama, DeepSeek, Qwen, Mixtral and many open modelsModel-specific pricingModel-specific pricingModel-specific pricingModel-specific pricingLimits depend on account and deployment typeVerify model/provider data policy and Fireworks retention termsOpen model production inference with good latencyDedicated deployments and enterprise options excluded
No tagline
LLM APIsOpen model inference APIFireworks AIFire Pass$49/userMonthly subscription / access passFire Pass gives access to Fireworks app/API benefits listed on pricing pageUsage and premium models may still have limits depending on accountFireworks API and app surfacesFireworks-hosted modelsPlan-specificPlan-specificPlan-specificPlan-specificPlan details vary by account/productVerify data policy for selected model and deploymentUsers who want a monthly Fireworks bundleNot as transparent as pure token PAYG
No tagline
LLM APIsOpen model inference APITogether AIPay-as-you-go$0 subscriptionToken-based serverless inferenceTrial/promotional credits may vary by accountNo fixed public monthly allowance in pricing docsPay by model; dedicated endpoints availableOpenAI-compatible API and Together SDKMeta Llama, Qwen, DeepSeek, Mistral, FLUX and other open modelsModel-specific pricingModel-specific pricingModel-specific pricingModel-specific pricingRate limits and quotas depend on account/modelVerify Together data retention/training terms for productionHosted open model inference and fine-tuning ecosystemTrial credits are account-dependent; dedicated endpoints excluded
No tagline
LLM APIsModel gatewayVercel AI GatewayFree$0Monthly included credits$5/month in included AI Gateway credits on Free per docsBuy credits / upgrade Vercel plan for moreAI Gateway routes to model providers; Vercel AI SDK friendlyOpenAI, Anthropic, Google, xAI, Groq, Mistral and other supported providersProvider/model-specificProvider/model-specificProvider/model-specificProvider/model-specificUsage limited by included credits and provider routingData policy depends on Vercel gateway and selected providerNext.js/Vercel projects needing one AI gatewayBest when already on Vercel; provider policies still matter
No tagline
LLM APIsModel gatewayVercel AI GatewayPro plan credits$20/user Vercel Pro baseVercel plan plus AI Gateway usage credits$15/month in included AI Gateway credits on Pro per docsBuy additional credits; provider/model-specific chargesAI Gateway routes to model providers; Vercel AI SDK friendlyOpenAI, Anthropic, Google, xAI, Groq, Mistral and other supported providersProvider/model-specificProvider/model-specificProvider/model-specificProvider/model-specificUsage limited by credits, plan and model/provider routingData policy depends on Vercel gateway and selected providerProduction apps deployed on VercelEnterprise custom tier excluded
No tagline
LLM APIsModel APIAI21 StudioFree Trial$0Trial credits$10 trial credits for 3 months listed on pricing pageMove to pay-as-you-go after credits expire/exhaustAI21 APIJamba, Jurassic/AI21 models and task-specific endpoints$2.00 in / $8.00 out for Jamba Large 1.7$8.00$0.20 in for Jamba Mini 1.7$0.40 out for Jamba Mini 1.7Rate limits depend on account and modelVerify AI21 data handling termsTrying Jamba models and AI21 task APIsTrial expires after stated period
No tagline
LLM APIsModel APIAI21 StudioPay-as-you-go$0 subscriptionToken-based API usageNo fixed monthly included usagePay by model after trialAI21 APIJamba, Jurassic/AI21 models and task-specific endpoints$2.00 in / $8.00 out for Jamba Large 1.7$8.00$0.20 in for Jamba Mini 1.7$0.40 out for Jamba Mini 1.7Rate limits depend on account and modelVerify AI21 data handling termsApps needing AI21/Jamba modelsSmaller ecosystem than OpenAI/Anthropic/Gemini
No tagline
LLM APIsSearch-grounded LLM APIPerplexity APIPay-as-you-go$0 subscriptionToken + search/request pricingNo fixed monthly included usagePay by model plus search/context featuresPerplexity APISonar, Sonar Pro, Sonar Reasoning, Sonar Deep Research$1.00 in / $1.00 out for Sonar Pro text token pricing$1.00 plus search/request costs$1.00 in for Sonar$1.00 out for SonarLimits depend on account tier and modelSearch data and provider terms apply; verify citations/privacy needsGrounded answers, research assistants, search-heavy appsPricing includes request/search components, not only tokens
No tagline
LLM APIsHosted model marketplaceReplicatePay-as-you-go$0 subscriptionUsage-based compute/model pricingLimited free usage may vary by account/modelNo fixed monthly included usagePay by model runtime/prediction; some models have per-second or per-run pricingReplicate API and client librariesOpen-source text, image, video, audio and multimodal modelsModel/runtime-specificModel/runtime-specificModel/runtime-specificModel/runtime-specificLimits vary by account and model hardwareReplicate/model owner policies applyTrying many open models across modalitiesText LLM costs are harder to normalize than pure token APIs
No tagline
LLM APIsEnterprise-friendly model APICohereTrial$0Trial key / trial limitsTrial API key is limited and non-commercial per local resource; official docs route pricing by modelUpgrade to production/API billingCohere APICommand, Embed, Rerank, Aya and related models$0 until trial quota exhausted$0 until trial quota exhausted$0 until trial quota exhausted$0 until trial quota exhaustedTrial rate limits and monthly request limits applyVerify Cohere trial/commercial data termsTesting Command and Rerank APIsTrial may be non-commercial and quota-limited
No tagline
LLM APIsEnterprise-friendly model APICoherePay-as-you-go$0 subscriptionToken/request-based API usageNo fixed monthly included usagePay by model/taskCohere APICommand, Embed, Rerank, Aya and related modelsModel/task-specificModel/task-specificModel/task-specificModel/task-specificProduction limits depend on account/modelCohere enterprise/privacy posture; verify exact retention settingRAG apps needing rerank/embedding plus chatPricing differs by task; not just chat tokens
No tagline
LLM APIsAccelerated inference APINVIDIA NIM API CatalogFree credits$0Signup credits / hosted API catalogSignup credits for NVIDIA-hosted NIM API catalog; local resource notes 1K credits signupBuy/upgrade through NVIDIA ecosystem or self-host NIMNVIDIA-hosted API endpoints and NIM containersLlama, Mistral, Qwen, Nemotron and other NIM-hosted models$0 until credits exhausted$0 until credits exhausted$0 until credits exhausted$0 until credits exhaustedCredit, RPM and verification requirements applyNVIDIA terms and selected model policy applyTrying optimized NIM-hosted open modelsCredit system is less transparent than token-price APIs
No tagline
LLM APIsAccelerated inference APICerebras InferenceFree$0Free developer quotaFree usage tier and model rate limits shown in Cerebras pricing/rate-limit docsUpgrade to Developer / paid usage for higher limitsOpenAI-compatible APILlama, Qwen, GPT-OSS and Cerebras-hosted fast inference models$0 on free quota$0 on free quota$0 on free quota$0 on free quotaFree rate limits are model-specificVerify Cerebras data terms for productionFast open-model inference experimentsFree quota is not production capacity
No tagline
LLM APIsAccelerated inference APICerebras InferenceDeveloper$0 subscriptionToken-based paid API usageNo fixed monthly included usagePay by model/token once paid usage is enabledOpenAI-compatible APILlama, Qwen, GPT-OSS and Cerebras-hosted fast inference modelsModel-specific pricingModel-specific pricingModel-specific pricingModel-specific pricingPaid limits higher than free where availableVerify Cerebras data terms for productionLow-latency open-model inference at scaleExact model prices and limits require current pricing table/console
No tagline
LLM APIsLocal/self-hosted APIOllamaLocal$0 + hardwareFree local softwareUnlimited local usage subject to local hardwareNo vendor overage; pay hardware/electricity/cloud GPU if usedOllama API; OpenAI-compatible endpoint support documentedLocal open models such as Llama, Qwen, Mistral, Gemma and custom Modelfiles$0 software cost$0 software cost$0 software cost$0 software costLimited by local CPU/GPU/RAM and model sizeLocal-first; provider training does not apply unless using remote modelsPrivate prototyping and offline/local workflowsRequires hardware and model management; quality depends on local model
No tagline
LLM APIsLocal/self-hosted APILM StudioLocal$0 + hardwareFree local app/serverUnlimited local usage subject to local hardwareNo vendor overage; pay hardware/electricity/cloud GPU if usedOpenAI-like local server APILocal GGUF/open models downloadable through LM Studio$0 software cost$0 software cost$0 software cost$0 software costLimited by local CPU/GPU/RAM and model sizeLocal-first; no provider training for local inferenceNon-technical local API and model testingDesktop app dependency; production self-hosting needs care
No tagline
LLM APIsLocal/self-hosted APILocalAISelf-hosted$0 + hardwareFree open-source softwareUnlimited local/self-hosted usage subject to infrastructureNo vendor overage; pay infrastructure onlyOpenAI-compatible local APIRuns local LLMs, image/audio models and embeddings depending on setup$0 software cost$0 software cost$0 software cost$0 software costLimited by server hardware and model backendSelf-hosted; data stays on your infrastructure if configured correctlyTeams needing OpenAI-compatible local/private endpointsOps burden and performance tuning are on you