Embeddings RAG

Tool	Category	Segment	Platform / Tool	Plan / License	Price (USD)	Pricing Model	Free Tier / OSS	Included Usage / Limits	Embedding Models / Dimensions	Reranking / Retrieval	RAG / Search Features	Integrations / Frameworks	Deployment / Hosting	Security / Privacy	Team / Governance	Best Fit	Main Limits / Caveats
RAGFlow OSS	Embeddings RAG	Full RAG application platform	RAGFlow	Apache-2.0 / open source	$0 software	Open-source app/platform; hosting/model costs separate	✓	Open-source RAG engine/app that combines document understanding, retrieval and generation workflows	Uses external/local embedding models depending on configuration	Retrieval pipeline, document parsing and answer-generation workflow; rerank support depends on setup	Document Q&A, knowledge base apps, parsing-heavy RAG and UI workflows	Docker/self-host, document parsers, vector/search backends and LLM providers	Self-hosted OSS; commercial/cloud options should be checked separately	Data stays in self-hosted deployment unless external model APIs are used	Self-managed governance in OSS	Teams wanting a full RAG app surface instead of only a library	Heavier than a library; production deployment and model/provider costs still apply
text-embedding-3-small	Embeddings RAG	Embedding API	OpenAI	API pay-as-you-go	$0.02 / 1M tokens	Usage-based per input token; Batch API pricing may differ	No durable free tier captured on pricing page	Small embedding model; official model page lists $0.02 per 1M tokens	1536 dimensions by default; supports shortening via dimensions parameter per OpenAI embeddings docs	No native reranker endpoint in the OpenAI API; pair with a vector DB/search provider	Semantic search, clustering, recommendations, anomaly detection and classification	OpenAI SDKs, LangChain, LlamaIndex, vector DB integrations	Hosted OpenAI API	OpenAI API data handling terms apply; enterprise/data settings should be checked per org	Organization/API-key governance in OpenAI platform	Cost-sensitive general-purpose semantic search and RAG	Embeddings endpoint has no built-in vector storage or reranking; no free quota captured for production API use
text-embedding-3-large	Embeddings RAG	Embedding API	OpenAI	API pay-as-you-go	$0.13 / 1M tokens	Usage-based per input token; Batch API pricing may differ	No durable free tier captured on pricing page	Most capable current OpenAI embedding model; official page lists $0.13 per 1M tokens	3072 dimensions by default; supports shortening via dimensions parameter per OpenAI embeddings docs	No native reranker endpoint in the OpenAI API; pair with a vector DB/search provider	Higher-quality multilingual retrieval, clustering and classification	OpenAI SDKs, LangChain, LlamaIndex, vector DB integrations	Hosted OpenAI API	OpenAI API data handling terms apply; enterprise/data settings should be checked per org	Organization/API-key governance in OpenAI platform	Higher-quality retrieval when embedding cost is still secondary to answer-model cost	6.5x the per-token price of text-embedding-3-small; embeddings endpoint has no built-in vector storage or reranking
Gemini Embedding	Embeddings RAG	Embedding API	Google Gemini API	Free / Paid	$0 free tier; $0.15 / 1M tokens paid; $0.075 / 1M batch	Free tier plus paid per-token pricing	✓	Free tier input is free of charge; paid tier $0.15 per 1M tokens; batch paid tier $0.075 per 1M tokens	gemini-embedding-001; flexible output size 128 to 3072, recommended 768/1536/3072; 2,048 input token limit in embeddings docs	No standalone reranker captured; Gemini File Search charges embeddings plus regular model tokens for retrieved document tokens	Embeddings for semantic search, classification, clustering and RAG; File Search is available as a Gemini tool	Google GenAI SDK, REST, LangChain/LlamaIndex integrations, Google AI Studio	Gemini Developer API; Vertex AI for enterprise deployment path	Free tier content may be used to improve products; the pricing table says paid-tier content is not used to improve products	Google project/API-key governance; enterprise via Vertex AI	Developers wanting a free-start embedding API with strong Google ecosystem integration	Free tier rate limits and product-improvement terms matter; paid production needs billing
Voyage Text Embeddings	Embeddings RAG	Embedding and rerank API	Voyage AI	API pay-as-you-go with free allocation	$0 platform fee; usage billed after free allocation	Per-token embedding pricing by model	✓	Pricing page lists free token allocations by model; multimodal row explicitly gives 200M free text tokens and 150B pixels for voyage-multimodal-3.5/3	Voyage embedding family for text, code, multilingual, law and finance; dimensions vary by model	Separate Voyage reranker endpoint available	High-quality domain-specific retrieval and RAG embeddings	Python/REST APIs, LangChain/LlamaIndex/vector DB integrations	Hosted Voyage API	Data handling terms depend on account/enterprise contract	API-key/account governance; enterprise options by sales	Teams optimizing retrieval quality in specific domains like code, law or finance	Free allocations and exact per-model prices vary; check the pricing table before high-volume use
Voyage Rerank / Multimodal	Embeddings RAG	Rerank and multimodal retrieval API	Voyage AI	API pay-as-you-go with free allocation	$0 platform fee; rerank billed after first 200M processed tokens	Rerank billed by processed tokens; multimodal billed by text tokens and pixels	✓	First 200M processed rerank tokens free for rerank-2.5/rerank-2.5-lite/rerank-2/rerank-2-lite; multimodal has 200M text tokens and 150B pixels free	voyage-multimodal-3.5/3 priced for text/image/video retrieval; text embedding models priced separately	Rerank endpoint counts processed tokens as (query tokens × number of documents) + total document tokens	Second-stage reranking, multimodal retrieval and RAG quality improvement	API usage with vector DBs, LangChain/LlamaIndex and custom RAG stacks	Hosted Voyage API	Data handling terms depend on account/enterprise contract	API-key/account governance; enterprise options by sales	RAG teams needing strong reranking or multimodal retrieval	Rerank billing scales with candidate document count; multimodal image/video pixel billing needs estimation
Cohere Embed	Embeddings RAG	Embedding API	Cohere	Trial / Production API	Usage-based; current prices on Cohere pricing page	Embedding models billed by tokens embedded	Yes, via evaluation/trial keys	Docs distinguish limited evaluation keys from paid production keys; embed rate limit examples list 100/min evaluation and 2,000/min production	Embed 4 and other Cohere embed models; dimensions/model details vary	Rerank is a separate Cohere model family	Enterprise-grade multilingual semantic search and retrieval	Cohere API, LangChain, LlamaIndex, vector DB integrations; Cohere Compass for managed search	Hosted API or Model Vault private deployment	Enterprise/private deployment via Cohere Model Vault; data/security terms by plan	Production keys and enterprise contracts; Model Vault dedicated deployment options	Companies wanting Cohere retrieval models with enterprise deployment options	Public pricing page emphasizes enterprise/private deployments; exact hosted API unit prices should be rechecked at checkout/docs
Cohere Rerank / Model Vault	Embeddings RAG	Rerank API	Cohere	Enterprise / Model Vault	$3,250/mo for Rerank 3.5 or Rerank 4 Fast (Medium Model Vault); $6,500/mo for Rerank 4 Pro (Large)	Dedicated instance pricing; hosted API pricing separate	Trial/evaluation keys exist for API access	Pricing page lists Model Vault hourly/monthly rates for Rerank 3.5, Rerank 4 Fast and Rerank 4 Pro; rate-limit docs list rerank eval and production key limits	Not an embedding model; pairs with Cohere Embed or third-party embeddings	Rerank 3.5/4 models for reordering retrieved candidates; Rerank docs price the hosted endpoint per search	Two-stage RAG, enterprise search and relevance tuning	Cohere API, Compass, vector DB and framework integrations	Hosted API or dedicated Model Vault deployment	Model Vault is dedicated/fully managed with no shared resources per pricing page	Enterprise procurement, dedicated deployment and support	Enterprise search teams that need reranking and private deployment controls	Monthly Model Vault rates are high for small teams; hosted API unit prices still need a current pricing-table check
Jina Free API Key	Embeddings RAG	Embedding, rerank and search API	Jina AI	Free API key	$0	Free token quota plus rate limits	✓	API docs say new users receive 10M free tokens; Embedding/Reranker free-key limits show 100 RPM and 100,000 TPM	Jina embeddings include text and multimodal models; dimensions/model choice vary	Reranker API is available with the same free-key rate-limit shape as the embedding API	Embeddings, reranking, classification, reader/search APIs and batch embeddings	REST API, OpenAI-compatible metadata endpoint, LangChain/LlamaIndex style integrations	Hosted Jina API; local/open model usage depends on model license	Commercial model license and API terms apply; some model licenses are not fully open	API-key tiers: free, paid, premium; dashboard key manager	Developers needing a generous free multilingual/multimodal retrieval API	Free tokens are finite; commercial model license details can differ by model
Jina Paid / Premium API Key	Embeddings RAG	Embedding, rerank and search API	Jina AI	Paid / Premium	Usage-based; rate limits scale by tier	Token-counted API usage	Free tier exists	Paid-key limits for Embedding/Reranker show 500 RPM and 2,000,000 TPM; Premium shows 5,000 RPM and 50,000,000 TPM	Text, multimodal, multi-vector/ColBERT and classifier model families	Reranker endpoint and search foundation API	Search AI for multilingual and multimodal data, batching and high-throughput retrieval	REST/OpenAPI, batch jobs and common RAG frameworks	Hosted Jina API	Commercial terms and data handling by Jina account/tier	Tiered API keys; premium/enterprise support path	Teams scaling retrieval workloads after free quota validation	Exact per-token price is not visible in the captured rate-limit table; confirm dashboard billing before production
Mistral Embed	Embeddings RAG	Embedding API	Mistral AI	API pay-as-you-go	$0.10 / 1M tokens	Per-token API pricing	No free tier captured in official model card	Model card lists $0.10 per 1M tokens and 8k context	mistral-embed; text/code semantic representations; 8k context on model card	No first-party reranker captured	Semantic search, clustering, classification and RAG quickstarts	Mistral SDK/API, LangChain/LlamaIndex and Mistral knowledge/RAG toolkit	Hosted Mistral API; enterprise deployment options should be checked separately	Mistral API legal/privacy terms apply	Workspace/API-key governance in Mistral console	Teams already using Mistral for generation and wanting same-vendor embeddings	Single embedding model family; no native reranker captured
HF Inference Providers	Embeddings RAG	Inference provider for embeddings/ranking	Hugging Face Inference Providers	Free credits / PRO / Team	$0.10/mo credits (Free); $2/mo credits (PRO); $2/seat/mo credits (Team/Enterprise)	Monthly credits plus pay-as-you-go by provider/hardware	✓	Pricing docs list $0.10 monthly credits for free users, $2 for PRO, and $2/seat for Team or Enterprise organizations	Access to many embedding and text-ranking models through providers; hf-inference focuses mostly on CPU tasks including embedding/text-ranking	Text-ranking models available depending on provider/model	Model hub, widgets, inference playground and serverless provider routing	Hugging Face Hub, transformers, sentence-transformers, LangChain/LlamaIndex integrations	Hosted Inference Providers or custom provider key; self-host models separately	Data handling depends on the selected routed provider or custom provider key	User, PRO, Team and Enterprise org billing/governance	Experimenting across many OSS embedding/rerank models without separate provider setup	Free credits are very small; production cost depends on model/provider compute time
Nomic Developer API / Business	Embeddings RAG	Developer API and domain search platform	Nomic	Business / Enterprise	$40/user/mo annual, 25-seat minimum; $1,000/mo minimum	Seat subscription with included AI usage	No free developer API tier captured on pricing page	Each $40 seat includes $20 of AI usage; usage can apply to Developer API, document ingestion and platform tools	Nomic Embed model family; developer API tools built on top of Nomic	Search/research queries and document ingestion are included AI-usage categories	Project data search, document indexing, workflows and domain-specific retrieval	Developer API, Nomic platform, document/project data sources	Cloud SaaS; Enterprise adds VPC/on-prem options	Org-wide privacy controls on Business; Enterprise adds SCIM, audit logs and deployment controls	Business has SAML/OIDC SSO; Enterprise custom governance	Architecture/engineering firms and teams using Nomic's document-search platform plus API	Annual 25-seat minimum makes it a poor fit for hobby embedding-only usage
Pinecone Hosted Embeddings	Embeddings RAG	Hosted model inference inside vector DB	Pinecone	Starter / paid plans	$0 Starter minimum; hosted embedding usage priced per model	Plan minimums plus per-token inference/model pricing	✓	Starter has $0/month minimum; Pinecone limits page lists 5M embedding tokens/month/model on Starter; model gallery lists hosted model prices such as multilingual-e5-large $0.08/1M tokens and llama-text-embed-v2 $0.16/1M tokens	Hosted embedding models include llama-text-embed-v2, multilingual-e5-large and sparse encoder rows; dimensions depend on model	Hosted rerank models are offered separately	Integrated inference with upsert/query, dense/sparse retrieval and vector search	Pinecone SDKs, LangChain/LlamaIndex, hosted vector index integrations	Managed Pinecone cloud	Pinecone account/project security and enterprise controls by plan	Starter/Builder/Standard/Enterprise plan governance	Teams wanting embeddings and vector storage/search managed in one place	Starter monthly token limit is low; hosted model pricing and vector DB usage both contribute to cost
Pinecone Hosted Rerank	Embeddings RAG	Hosted rerank inside vector DB	Pinecone	Starter / paid plans	$2.00 / 1k rerank requests for listed models	Per-request rerank pricing plus plan limits	Yes, selected rerank models on Starter	Model gallery lists cohere-rerank-3.5, bge-reranker-v2-m3 and pinecone-rerank-v0 at $2.00 per 1k requests; limits page shows 60 RPM on Starter for bge/pinecone rerank; cohere-rerank not available on Starter	Not an embedding model; pairs with Pinecone hosted or external embeddings	Reranks candidate documents after vector, keyword or hybrid retrieval	Two-stage retrieval, integrated search and relevance improvement	Pinecone SDK inference.rerank, vector DB query flows, framework integrations	Managed Pinecone cloud	Pinecone account/project security and enterprise controls by plan	Plan-based rate limits and enterprise governance	RAG teams already using Pinecone who want one vendor for vector search and rerank	Rerank priced per request, so candidate count and query volume need monitoring
Mixedbread Starter	Embeddings RAG	Embedding, rerank and vector store API	Mixedbread	Free	$0 with $5 one-time credits	Free credits plus request limits	✓	Starter includes $5 one-time credits, 3 workspace users, 10 stores and 100 requests/min; pricing page separately advertises up to $250 in free credits	Mixedbread embedding models and vector/store platform; exact dimensions depend on selected model	Rerank listed at $7.50 per 1k queries in pricing snippet	Embeddings, reranking, vector stores and retrieval APIs	API, stores, RAG integrations and custom apps	Hosted Mixedbread platform	Data/security terms depend on plan; Enterprise has dedicated infrastructure/BYOC	3 users on Starter; Enterprise custom	Developers testing embedding/rerank/store workflows without a credit card	One-time free credits are limited; exact model prices should be checked for the selected route
Mixedbread Scale / Enterprise	Embeddings RAG	Embedding, rerank and vector store API	Mixedbread	Scale / Enterprise	$20/mo Scale; Enterprise custom	Subscription with included credits plus pay-as-you-go usage	Starter free plan exists	Scale includes $20/month credits, unlimited workspace users, 10,000 stores, 1,200 queries/min and 360 ingestion/min; Enterprise adds custom limits and BYOC	Embedding models and store-backed retrieval workflows	Rerank price listed as $7.50 per 1k queries; higher rate limits on Scale/Enterprise	Managed stores, ingestion, search, rerank and retrieval workflows	API and platform integrations	Hosted platform; Enterprise dedicated infrastructure/BYOC	Enterprise offers dedicated infrastructure and white-glove support	Unlimited users on Scale; Enterprise custom	Teams needing an integrated retrieval store plus embeddings/reranking	Store and query limits are plan-specific; watch overage pricing
Fireworks Embeddings	Embeddings RAG	Hosted OSS embedding/rerank models	Fireworks AI	Serverless pay-as-you-go	$1 free credits; embeddings from $0.008 to $0.10 / 1M input tokens	Per-token serverless inference	✓	Pricing page says get started with $1 in free credits; embedding price table lists up to 150M params at $0.008/1M tokens, 150M-350M at $0.016, Qwen3 8B at $0.10	Hosts Qwen3 embedding models and other embedding/rerank models; context/model dimensions vary	Fireworks docs include embeddings and reranking service with OpenAI-compatible embeddings endpoint	Semantic search, RAG and reranking using hosted open models	OpenAI-compatible endpoint, Python/REST, LangChain/LlamaIndex style integrations	Serverless API; on-demand deployments for dedicated GPUs	Fireworks account/API-key terms; enterprise deployments available	Project/API key governance; enterprise custom	Teams wanting cheap hosted open embedding models without managing GPUs	Model-library pages can show prices that differ from the pricing page; check both before production
Titan / Nova Embeddings	Embeddings RAG	Cloud embedding API	Amazon Bedrock	AWS pay-as-you-go	$0.02 / 1M tokens commonly listed for Titan Text Embeddings V2; Nova multimodal pricing differs	Bedrock on-demand token pricing	No always-free Bedrock model tier captured	Bedrock pricing docs list Amazon embedding models including Titan Text Embeddings V2, Titan Multimodal and Nova Multimodal; current region/model pricing should be checked in the AWS pricing table	Titan Text Embeddings V2 and multimodal embedding models; dimensions/model capabilities vary	No first-party reranker captured; can pair with OpenSearch/Kendra/vector DBs	AWS-native RAG, semantic search and knowledge base ingestion	Bedrock Knowledge Bases, OpenSearch, Aurora pgvector, LangChain/LlamaIndex, AWS SDK	AWS Bedrock regional service	AWS IAM/VPC/compliance controls; data terms by Bedrock model/provider	AWS account/IAM/org governance	AWS-heavy teams needing embeddings inside existing cloud/compliance perimeter	Pricing varies by region/model and AWS pricing tables are hard to parse; verify the exact region before cost modeling
Sentence Transformers OSS	Embeddings RAG	Open-source embedding library	Sentence Transformers	Apache-2.0 / open source	$0 software	Self-hosted open-source library; compute/model hosting separate	✓	Python framework for state-of-the-art sentence, text and image embeddings; no hosted quota because it runs locally or on your infrastructure	Supports many pretrained embedding models and cross-encoders; dimensions vary by model	Cross-encoder reranking supported through sentence-transformers/cross-encoder workflows	Semantic search, clustering, retrieval, paraphrase mining and model fine-tuning	Hugging Face models, PyTorch, transformers, LangChain/LlamaIndex integrations	Local, server, GPU, HF Inference or custom hosting	Data stays local if self-hosted; model licenses vary	No SaaS governance unless wrapped by your platform	Teams that want maximum model choice and control over embeddings	Requires infra, batching, monitoring and model-license checks
FastEmbed OSS	Embeddings RAG	Open-source lightweight embedding/rerank library	FastEmbed	Apache-2.0 / open source	$0 software	Self-hosted open-source library; compute/model hosting separate	✓	Lightweight ONNX Runtime-based library; default examples use BAAI/bge-small-en-v1.5; supports dense, sparse, late-interaction, multimodal and reranker models	Dense/sparse/late-interaction embeddings; dimensions vary by model; example BGE small vector is 384 dimensions	TextCrossEncoder rerankers supported	Local semantic retrieval, Qdrant integration, serverless-friendly embeddings	Qdrant, Python, ONNX Runtime, custom HF model sources	Local/self-hosted; can run in serverless runtimes more easily than heavier PyTorch stacks	Data stays local; model licenses vary	No SaaS governance unless wrapped by your platform	Developers wanting fast local embeddings/rerank without the weight of a PyTorch stack	Model support is curated; still needs vector DB/storage and production ops
Open Embedding Models	Embeddings RAG	Open-source embedding models	BAAI BGE / E5 / Qwen Embeddings	Open model licenses vary	$0 software; hosting/API costs separate	Local or hosted through HF/Fireworks/Together/etc.	Yes, if the open-weights license permits local use	Local resources highlight BGE-Large-EN-v1.5, E5-Mistral and Nomic Embed as free/local choices; HF model cards define exact licenses and dimensions	BGE, E5, Qwen and Nomic families; dimensions/context vary by model	Open rerankers such as BGE reranker or Qwen reranker can be paired separately	Custom semantic search, multilingual retrieval and domain-tuned RAG	Sentence Transformers, FastEmbed, TEI, vLLM, HF Inference, vector DBs	Local, cloud GPU, HF Inference or provider APIs	Data privacy depends on local vs hosted execution; licenses vary per model	Governance is self-managed unless using a platform	Cost-controlled teams willing to manage models for retrieval quality	Licenses, quantization quality, pooling strategy and hardware requirements vary widely
LangChain OSS	Embeddings RAG	RAG orchestration framework	LangChain	MIT / open source	$0 software	Open-source framework; provider/vector DB costs separate	✓	Official LangChain page says LangChain is MIT-licensed open source and free to use	Supports many embedding providers and vector stores through integrations	Retriever, contextual compression and reranker integrations through ecosystem	Chains, retrievers, tools, agents, document loaders and RAG app patterns	OpenAI, Gemini, Cohere, Hugging Face, vector DBs, LangGraph and LangSmith	Local/server app framework; LangSmith/hosting separate	Data handling depends on providers and whether LangSmith is used	No SaaS governance in OSS; LangSmith adds team governance/pricing	Developers wanting the broadest RAG integration ecosystem	Framework complexity and version churn can be nontrivial; observability/hosted features are separate
Haystack OSS	Embeddings RAG	RAG pipeline framework	Haystack by deepset	Apache-2.0 / open source	$0 software	Open-source framework; deepset platform custom/priced separately	✓	Local resources list Haystack as an open-source RAG framework; commercial deepset platform handles infrastructure/collaboration separately	Integrates embedding retrievers and document stores; model dimensions depend on provider	Supports rankers/retrievers and pipeline components for retrieval quality	Composable pipelines for search, Q&A, RAG, agents and evaluation	Elasticsearch/OpenSearch, vector DBs, model APIs and Python ecosystem	Self-hosted framework; deepset enterprise/cloud platform available	Self-hosting keeps data in your infra; enterprise platform terms separate	OSS has no SaaS governance; enterprise platform adds roles/collaboration	Python teams building production retrieval pipelines with explicit components	Cloud/platform pricing is not public in captured official/local sources; OSS requires ops
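The price cells in this table mix per-token, per-request and per-seat units, so comparing rows takes a little arithmetic. What follows is a minimal, hypothetical Python cost estimator. It assumes the list prices captured in this table are still current, that billing is linear with no tiered discounts beyond those listed, and that token counts are already known. The helper names (`embedding_cost`, `voyage_rerank_processed_tokens`, `per_request_rerank_cost`) and the dictionary keys are illustrative only, not any vendor's SDK. The Voyage helper follows the processed-token formula described in the Voyage Rerank / Multimodal row.

```python
"""Rough cost arithmetic for embedding/rerank rows (hypothetical helpers, not a vendor SDK)."""

# USD list prices per 1M input tokens, copied from the table; re-check before use.
PRICE_PER_1M_TOKENS = {
    "openai/text-embedding-3-small": 0.02,
    "openai/text-embedding-3-large": 0.13,
    "gemini/gemini-embedding-001 (paid)": 0.15,
    "gemini/gemini-embedding-001 (batch)": 0.075,
    "mistral/mistral-embed": 0.10,
    "pinecone/multilingual-e5-large": 0.08,
    "pinecone/llama-text-embed-v2": 0.16,
}

# USD per 1k rerank requests for the Pinecone-hosted rerank models listed in the table.
PINECONE_RERANK_PER_1K_REQUESTS = 2.00


def embedding_cost(tokens: int, price_per_1m: float, free_tokens: int = 0) -> float:
    """Per-token cost after subtracting a free allocation (assumes strictly linear billing)."""
    billable = max(0, tokens - free_tokens)
    return billable / 1_000_000 * price_per_1m


def voyage_rerank_processed_tokens(query_tokens: int, doc_tokens: list[int]) -> int:
    """Processed tokens for one rerank call, per the Voyage row:
    (query tokens x number of documents) + total document tokens."""
    return query_tokens * len(doc_tokens) + sum(doc_tokens)


def per_request_rerank_cost(requests: int, price_per_1k: float) -> float:
    """Cost for request-priced rerank endpoints such as Pinecone's hosted rerankers."""
    return requests / 1_000 * price_per_1k


if __name__ == "__main__":
    corpus_tokens = 500_000_000  # hypothetical corpus size to embed once
    for name, price in PRICE_PER_1M_TOKENS.items():
        print(f"{name}: ${embedding_cost(corpus_tokens, price):,.2f}")

    # Hypothetical query: 20 query tokens against 50 candidates of 300 tokens each.
    per_query = voyage_rerank_processed_tokens(20, [300] * 50)
    print(f"Voyage processed tokens per rerank call: {per_query:,}")

    # Hypothetical volume: 100k rerank requests on a Pinecone-hosted reranker.
    cost = per_request_rerank_cost(100_000, PINECONE_RERANK_PER_1K_REQUESTS)
    print(f"Pinecone rerank: ${cost:,.2f}")
```

The Voyage helper makes that row's caveat concrete: 20 query tokens against 50 documents of 300 tokens gives 20 × 50 + 15,000 = 16,000 processed tokens. Because the query is counted once per candidate, doubling the candidate count (with equal document sizes) doubles the processed tokens per call, which is why candidate count matters as much as query volume.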