Embeddings RAG
Tool | Category | Segment | Platform / Tool | Plan / License | Monthly Price USD | Pricing Model | Free Tier / OSS | Included Usage / Limits | Embedding Models / Dimensions | Reranking / Retrieval | RAG / Search Features | Integrations / Frameworks | Deployment / Hosting | Security / Privacy | Team / Governance | Best Fit | Main Limits / Caveats |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
No tagline | Embeddings RAG | Full RAG application platform | RAGFlow | Apache-2.0 / open source | $0 software | Open-source app/platform; hosting/model costs separate | ✓ | Open-source RAG engine/app that combines document understanding, retrieval and generation workflows | Uses external/local embedding models depending on configuration | Retrieval pipeline, document parsing and answer generation workflow; rerank support depends on setup | Document Q&A, knowledge base apps, parsing-heavy RAG and UI workflows | Docker/self-host, document parsers, vector/search backends and LLM providers | Self-hosted OSS; commercial/cloud options should be checked separately | Data stays in self-hosted deployment unless external model APIs are used | Self-managed governance in OSS | Teams wanting a full RAG app surface instead of only a library | Heavier than a library; production deployment and model/provider costs remain |
No tagline | Embeddings RAG | Embedding API | OpenAI | API pay-as-you-go | $0.02 / 1M tokens | Usage-based per input token; Batch API pricing may differ | No durable free tier captured on pricing page | text-embedding-3-small (small embedding model); official model page lists $0.02 per 1M tokens | 1536 dimensions by default; supports shortening via dimensions parameter per OpenAI embeddings docs | No native reranker endpoint in the OpenAI API; pair with vector DB/search provider | Semantic search, clustering, recommendations, anomaly detection and classification | OpenAI SDKs, LangChain, LlamaIndex, vector DB integrations | Hosted OpenAI API | OpenAI API data handling terms apply; enterprise/data settings should be checked per org | Organization/API-key governance in OpenAI platform | Cost-sensitive general-purpose semantic search and RAG | No built-in vector store or reranker; no free quota captured for production API use |
No tagline | Embeddings RAG | Embedding API | OpenAI | API pay-as-you-go | $0.13 / 1M tokens | Usage-based per input token; Batch API pricing may differ | No durable free tier captured on pricing page | text-embedding-3-large, listed as the most capable current OpenAI embedding model; official page lists $0.13 per 1M tokens | 3072 dimensions by default; supports shortening via dimensions parameter per OpenAI embeddings docs | No native reranker endpoint in the OpenAI API; pair with vector DB/search provider | Higher-quality multilingual retrieval, clustering and classification | OpenAI SDKs, LangChain, LlamaIndex, vector DB integrations | Hosted OpenAI API | OpenAI API data handling terms apply; enterprise/data settings should be checked per org | Organization/API-key governance in OpenAI platform | Higher-quality retrieval when embedding cost is still secondary to answer model cost | More expensive than the small model; no built-in vector storage/reranking |
No tagline | Embeddings RAG | Embedding API | Google Gemini API | Free / Paid | $0 free tier; $0.15 / 1M tokens paid; $0.075 / 1M batch | Free tier plus paid per-token pricing | ✓ | Free tier input price is free of charge; paid tier $0.15 per 1M tokens; batch paid tier $0.075 per 1M tokens | gemini-embedding-001; flexible output size 128 to 3072, recommended 768/1536/3072; 2,048 input token limit in embeddings docs | No standalone reranker captured; Gemini File Search charges embeddings plus regular model tokens for retrieved document tokens | Embeddings for semantic search, classification, clustering and RAG; File Search is available as a Gemini tool | Google GenAI SDK, REST, LangChain/LlamaIndex integrations, Google AI Studio | Gemini Developer API; Vertex AI for enterprise deployment path | Free tier content may be used to improve products; paid tier says content not used to improve products on pricing table | Google project/API-key governance; enterprise via Vertex AI | Developers wanting a free-start embedding API with strong Google ecosystem integration | Free tier rate limits and product-improvement terms matter; paid production needs billing |
No tagline | Embeddings RAG | Embedding and rerank API | Voyage AI | API pay-as-you-go with free allocation | $0 platform fee; usage after free allocation | Per-token embedding pricing by model | ✓ | Pricing page lists free token allocations by model; multimodal row explicitly gives 200M free text tokens and 150B pixels for voyage-multimodal-3.5/3 | Voyage embedding family for text, code, multilingual, law and finance; dimensions vary by model | Separate Voyage reranker endpoint available | High-quality domain-specific retrieval and RAG embeddings | Python/REST APIs, LangChain/LlamaIndex/vector DB integrations | Hosted Voyage API | Data handling terms depend on account/enterprise contract | API-key/account governance; enterprise options by sales | Teams optimizing retrieval quality in specific domains like code, law or finance | Free allocations and exact per-model prices vary; check the pricing table before high-volume use |
No tagline | Embeddings RAG | Rerank and multimodal retrieval API | Voyage AI | API pay-as-you-go with free allocation | $0 platform fee; rerank after first 200M processed tokens | Rerank billed by processed tokens; multimodal billed by text tokens and pixels | ✓ | First 200M processed rerank tokens free for rerank-2.5/rerank-2.5-lite/rerank-2/rerank-2-lite; multimodal has 200M text tokens and 150B pixels free | voyage-multimodal-3.5/3 pricing row covers text/image/video retrieval; text embedding models are priced separately | Rerank endpoint calculates processed tokens as query tokens times document count plus document tokens | Second-stage reranking, multimodal retrieval and RAG quality improvement | API usage with vector DBs, LangChain/LlamaIndex and custom RAG stacks | Hosted Voyage API | Data handling terms depend on account/enterprise contract | API-key/account governance; enterprise options by sales | RAG teams needing strong reranking or multimodal retrieval | Rerank billing scales with candidate document count; multimodal image/video pixel billing needs estimation |
No tagline | Embeddings RAG | Embedding API | Cohere | Trial / Production API | Usage-based; current prices on Cohere pricing page | Embedding models billed by tokens embedded | Yes, via evaluation/trial keys | Docs distinguish limited evaluation keys from paid production keys; embed rate limit examples list 100/min evaluation and 2,000/min production | Embed 4 and other Cohere embed models; dimensions/model details vary | Rerank is a separate Cohere model family | Enterprise-grade multilingual semantic search and retrieval | Cohere API, LangChain, LlamaIndex, vector DB integrations; Cohere Compass for managed search | Hosted API or Model Vault private deployment | Enterprise/private deployment via Cohere Model Vault; data/security terms by plan | Production keys and enterprise contracts; Model Vault dedicated deployment options | Companies wanting Cohere retrieval models with enterprise deployment options | Public pricing page emphasizes enterprise/private deployments; exact hosted API unit prices should be rechecked at checkout/docs |
No tagline | Embeddings RAG | Rerank API | Cohere | Enterprise / Model Vault | $3,250/mo for Rerank 3.5 or Rerank 4 Fast Medium Model Vault; $6,500/mo Rerank 4 Pro Large | Dedicated instance pricing; hosted API pricing separate | Trial/evaluation keys exist for API access | Pricing page lists Model Vault hourly/monthly rates for Rerank 3.5, Rerank 4 Fast and Rerank 4 Pro; rate-limit docs list rerank eval and production key limits | Not an embedding row; pairs with Cohere Embed or third-party embeddings | Rerank 3.5/4 models for reordering retrieved candidates; Rerank docs price hosted endpoint by searches | Two-stage RAG, enterprise search and relevance tuning | Cohere API, Compass, vector DB and framework integrations | Hosted API or dedicated Model Vault deployment | Model Vault is dedicated/fully managed with no shared resources per pricing page | Enterprise procurement, dedicated deployment and support | Enterprise search teams that need reranking and private deployment controls | Monthly Model Vault rates are high for small teams; hosted API unit prices still need current pricing-table check |
No tagline | Embeddings RAG | Embedding, rerank and search API | Jina AI | Free API key | $0 | Free token quota plus rate limits | ✓ | API docs say new users receive 10M free tokens; Embedding/Reranker free-key limits show 100 RPM and 100,000 TPM | Jina embeddings include text and multimodal models; dimensions/model choice vary | Reranker API is available with same free-key rate-limit shape as embedding API | Embeddings, reranking, classification, reader/search APIs and batch embeddings | REST API, OpenAI-compatible metadata endpoint, LangChain/LlamaIndex style integrations | Hosted Jina API; local/open model usage depends on model license | Commercial model license and API terms apply; some model licenses are not fully open | API-key tiers: free, paid, premium; dashboard key manager | Developers needing generous free multilingual/multimodal retrieval API | Free tokens are finite; commercial model license details can differ by model |
No tagline | Embeddings RAG | Embedding, rerank and search API | Jina AI | Paid / Premium | Usage-based; rate limits scale by tier | Token-counted API usage | Free tier exists | Paid-key limits for Embedding/Reranker show 500 RPM and 2,000,000 TPM; Premium shows 5,000 RPM and 50,000,000 TPM | Text, multimodal, multi-vector/ColBERT and classifier model families | Reranker endpoint and search foundation API | Search AI for multilingual and multimodal data, batching and high-throughput retrieval | REST/OpenAPI, batch jobs and common RAG frameworks | Hosted Jina API | Commercial terms and data handling by Jina account/tier | Tiered API keys; premium/enterprise support path | Teams scaling retrieval workloads after free quota validation | Exact per-token price is not visible in the captured rate-limit table; confirm dashboard billing before production |
No tagline | Embeddings RAG | Embedding API | Mistral AI | API pay-as-you-go | $0.10 / 1M tokens | Per-token API pricing | No free tier captured in official model card | Model card lists $0.1 per million tokens and 8k context | mistral-embed; text/code semantic representations; 8k context on model card | No first-party reranker captured in this row | Semantic search, clustering, classification and RAG quickstarts | Mistral SDK/API, LangChain/LlamaIndex and Mistral knowledge/RAG toolkit | Hosted Mistral API; enterprise deployment options should be checked separately | Mistral API legal/privacy terms apply | Workspace/API-key governance in Mistral console | Teams already using Mistral for generation and wanting same-vendor embeddings | Single embedding model family; no native reranker row captured |
No tagline | Embeddings RAG | Inference provider for embeddings/ranking | Hugging Face Inference Providers | Free credits / PRO / Team | $0.10 monthly credits Free; $2/mo credits PRO; $2/seat/mo credits Team/Enterprise | Monthly credits plus pay-as-you-go by provider/hardware | ✓ | Pricing docs list $0.10 monthly credits for free users, $2 for PRO, and $2/seat for Team or Enterprise organizations | Access to many embedding and text-ranking models through providers; hf-inference focuses mostly on CPU tasks including embedding/text-ranking | Text-ranking models available depending on provider/model | Model hub, widgets, inference playground and serverless provider routing | Hugging Face Hub, transformers, sentence-transformers, LangChain/LlamaIndex integrations | Hosted Inference Providers or custom provider key; self-host models separately | Data/provider handling depends on selected routed provider or custom provider key | User, PRO, Team and Enterprise org billing/governance | Experimenting across many OSS embedding/rerank models without separate provider setup | Free credits are tiny; production cost depends on model/provider compute time |
No tagline | Embeddings RAG | Developer API and domain search platform | Nomic | Business / Enterprise | $40/user/mo annual, 25-seat minimum; $1,000/mo minimum | Seat subscription with included AI usage | No free developer API tier captured on pricing page | Each $40 seat includes $20 of included AI usage; usage can apply to Developer API, document ingestion and platform tools | Nomic Embed model family; developer API tools built on top of Nomic | Search/research queries and document ingestion are included AI-usage categories | Project data search, document indexing, workflows and domain-specific retrieval | Developer API, Nomic platform, document/project data sources | Cloud SaaS; Enterprise adds VPC/on-prem options | Org-wide privacy controls on Business; Enterprise adds SCIM, audit logs and deployment controls | Business has SAML/OIDC SSO; Enterprise custom governance | Architecture/engineering firms and teams using Nomic's document-search platform plus API | Annual 25-seat minimum makes it poor fit for hobby embedding-only usage |
No tagline | Embeddings RAG | Hosted model inference inside vector DB | Pinecone | Starter / paid plans | $0 Starter minimum; hosted embedding usage priced per model | Plan minimums plus per-token inference/model pricing | ✓ | Starter has $0/month minimum; Pinecone limits page lists 5M embedding tokens/month/model on Starter; model gallery lists hosted model prices such as multilingual-e5-large $0.08/1M tokens and llama-text-embed-v2 $0.16/1M tokens | Hosted embedding models include llama-text-embed-v2, multilingual-e5-large and sparse encoder rows; dimensions depend on model | Pinecone has hosted rerank models separately | Integrated inference with upsert/query, dense/sparse retrieval and vector search | Pinecone SDKs, LangChain/LlamaIndex, hosted vector index integrations | Managed Pinecone cloud | Pinecone account/project security and enterprise controls by plan | Starter/Builder/Standard/Enterprise plan governance | Teams wanting embeddings and vector storage/search managed in one place | Starter monthly token limit is low; hosted model pricing and vector DB usage both contribute to cost |
No tagline | Embeddings RAG | Hosted rerank inside vector DB | Pinecone | Starter / paid plans | $2.00 / 1k rerank requests for listed models | Per-request rerank pricing plus plan limits | Yes, selected rerank models on Starter | Model gallery lists cohere-rerank-3.5, bge-reranker-v2-m3 and pinecone-rerank-v0 at $2.00 per 1k requests; limits page shows 60 RPM Starter for bge/pinecone rerank, cohere-rerank not available on Starter | Not an embedding row; pairs with Pinecone hosted or external embeddings | Reranks candidate documents after vector, keyword or hybrid retrieval | Two-stage retrieval, integrated search and relevance improvement | Pinecone SDK inference.rerank, vector DB query flows, framework integrations | Managed Pinecone cloud | Pinecone account/project security and enterprise controls by plan | Plan-based rate limits and enterprise governance | RAG teams already using Pinecone who want one vendor for vector search and rerank | Rerank priced per request, so candidate-count and query volume need monitoring |
No tagline | Embeddings RAG | Embedding, rerank and vector store API | Mixedbread | Free | $0 with $5 one-time credits | Free credits plus request limits | ✓ | Starter includes $5 one-time credits, 3 workspace users, 10 stores and 100 requests/min; pricing page advertises up to $250 in free credits separately | Mixedbread embedding models and vector/store platform; exact dimensions depend on selected model | Rerank listed at $7.50 per 1k queries in pricing snippet | Embeddings, reranking, vector stores and retrieval APIs | API, stores, RAG integrations and custom apps | Hosted Mixedbread platform | Data/security terms depend on plan; Enterprise has dedicated infrastructure/BYOC | 3 users on Starter; Enterprise custom | Developers testing embedding/rerank/store workflows without a card | One-time free credits are limited; exact model prices should be checked for selected route |
No tagline | Embeddings RAG | Embedding, rerank and vector store API | Mixedbread | Scale / Enterprise | $20/mo Scale; Enterprise custom | Subscription with included credits plus pay-as-you-go usage | Starter free plan exists | Scale includes $20/month credits, unlimited workspace users, 10,000 stores, 1,200 queries/min and 360 ingestion/min; Enterprise adds custom limits and BYOC | Embedding models and store-backed retrieval workflows | Rerank price listed as $7.50 per 1k queries; higher rate limits on Scale/Enterprise | Managed stores, ingestion, search, rerank and retrieval workflows | API and platform integrations | Hosted platform; Enterprise dedicated infrastructure/BYOC | Enterprise offers dedicated infrastructure and white-glove support | Unlimited users on Scale; Enterprise custom | Teams needing integrated retrieval store plus embeddings/reranking | Stores and query limits are plan-specific; overage model should be watched |
No tagline | Embeddings RAG | Hosted OSS embedding/rerank models | Fireworks AI | Serverless pay-as-you-go | $1 free credits; embeddings from $0.008 to $0.10 / 1M input tokens | Per-token serverless inference | ✓ | Pricing page says get started with $1 in free credits; embedding price table lists up to 150M params at $0.008/1M tokens, 150M-350M at $0.016, Qwen3 8B at $0.10 | Hosts Qwen3 embedding models and other embedding/rerank models; context/model dimensions vary | Fireworks docs include embeddings and reranking service with OpenAI-compatible embeddings endpoint | Semantic search, RAG and reranking using hosted open models | OpenAI-compatible endpoint, Python/REST, LangChain/LlamaIndex style integrations | Serverless API; on-demand deployments for dedicated GPUs | Fireworks account/API-key terms; enterprise deployments available | Project/API key governance; enterprise custom | Teams wanting cheap hosted open embedding models without managing GPUs | Model page snippets can show inconsistent library prices; use pricing page and model page before production |
No tagline | Embeddings RAG | Cloud embedding API | Amazon Bedrock | AWS pay-as-you-go | $0.02 / 1M tokens for Titan Text Embeddings V2 commonly listed; Nova multimodal differs | Bedrock on-demand token pricing | No always-free Bedrock model tier captured | Bedrock pricing docs list Amazon embedding models including Titan Text Embeddings V2, Titan Multimodal and Nova Multimodal; current region/model pricing should be checked in AWS pricing table | Titan Text Embeddings V2 and multimodal embedding models; dimensions/model capabilities vary | No first-party reranker row captured; can pair with OpenSearch/Kendra/vector DBs | AWS-native RAG, semantic search and knowledge base ingestion | Bedrock Knowledge Bases, OpenSearch, Aurora pgvector, LangChain/LlamaIndex, AWS SDK | AWS Bedrock regional service | AWS IAM/VPC/compliance controls; data terms by Bedrock model/provider | AWS account/IAM/org governance | AWS-heavy teams needing embeddings inside existing cloud/compliance perimeter | Pricing varies by region/model and AWS tables are harder to scrape; verify exact region before cost modeling |
No tagline | Embeddings RAG | Open-source embedding library | Sentence Transformers | Apache-2.0 / open source | $0 software | Self-hosted open-source library; compute/model hosting separate | ✓ | Python framework for state-of-the-art sentence, text and image embeddings; no hosted quota because it runs locally or on your infrastructure | Supports many pretrained embedding models and cross-encoders; dimensions vary by model | Cross-encoder reranking supported through sentence-transformers/cross-encoder workflows | Semantic search, clustering, retrieval, paraphrase mining and model fine-tuning | Hugging Face models, PyTorch, transformers, LangChain/LlamaIndex integrations | Local, server, GPU, HF Inference or custom hosting | Data stays local if self-hosted; model licenses vary | No SaaS governance unless wrapped by your platform | Teams that want maximum model choice and control over embeddings | Requires infra, batching, monitoring and model-license checks |
No tagline | Embeddings RAG | Open-source lightweight embedding/rerank library | FastEmbed | Apache-2.0 / open source | $0 software | Self-hosted open-source library; compute/model hosting separate | ✓ | Lightweight ONNX Runtime-based library; default examples use BAAI/bge-small-en-v1.5 and support dense, sparse, late-interaction multimodal and rerankers | Dense/sparse/late-interaction embeddings; dimensions vary by model; example BGE small vector is 384 dimensions | TextCrossEncoder rerankers supported | Local semantic retrieval, Qdrant integration, serverless-friendly embeddings | Qdrant, Python, ONNX Runtime, custom HF model sources | Local/self-hosted; can run in serverless runtimes more easily than heavier PyTorch stacks | Data stays local; model licenses vary | No SaaS governance unless wrapped by your platform | Developers wanting fast local embeddings/rerank without PyTorch weight | Model support is curated; still needs vector DB/storage and production ops |
No tagline | Embeddings RAG | Open-source embedding models | BAAI BGE / E5 / Qwen Embeddings | Open model licenses vary | $0 software; hosting/API costs separate | Local or hosted through HF/Fireworks/Together/etc. | Yes, if local/open weights license permits | Local resource highlights BGE-Large-EN-v1.5, E5-Mistral and Nomic Embed as free/local choices; HF model cards define exact licenses and dimensions | BGE, E5, Qwen and Nomic families; dimensions/context vary by model | Open rerankers such as BGE reranker or Qwen reranker can be paired separately | Custom semantic search, multilingual retrieval and domain-tuned RAG | Sentence Transformers, FastEmbed, TEI, vLLM, HF Inference, vector DBs | Local, cloud GPU, HF Inference or provider APIs | Data privacy depends on local vs hosted execution; licenses vary per model | Governance is self-managed unless using a platform | Cost-controlled teams willing to manage models for retrieval quality | Licenses, quantization quality, pooling strategy and hardware requirements vary widely |
No tagline | Embeddings RAG | RAG orchestration framework | LangChain | MIT / open source | $0 software | Open-source framework; provider/vector DB costs separate | ✓ | Official LangChain page says LangChain is MIT-licensed open source and free to use | Supports many embedding providers and vector stores through integrations | Retriever, contextual compression and reranker integrations through ecosystem | Chains, retrievers, tools, agents, document loaders and RAG app patterns | OpenAI, Gemini, Cohere, Hugging Face, vector DBs, LangGraph and LangSmith | Local/server app framework; LangSmith/hosting separate | Data handling depends on providers and whether LangSmith is used | No SaaS governance in OSS; LangSmith adds team governance/pricing | Developers wanting the broadest RAG integration ecosystem | Framework complexity and version churn can be nontrivial; observability/hosted features are separate |
No tagline | Embeddings RAG | RAG pipeline framework | Haystack by deepset | Apache-2.0 / open source | $0 software | Open-source framework; deepset platform custom/priced separately | ✓ | Local resources list Haystack as open-source RAG framework; commercial deepset platform handles infrastructure/collaboration separately | Integrates embedding retrievers and document stores; model dimensions depend on provider | Supports rankers/retrievers and pipeline components for retrieval quality | Composable pipelines for search, Q&A, RAG, agents and evaluation | Elasticsearch/OpenSearch, vector DBs, model APIs and Python ecosystem | Self-hosted framework; deepset enterprise/cloud platform available | Self-host keeps data in your infra; enterprise platform terms separate | OSS has no SaaS governance; enterprise platform adds roles/collaboration | Python teams building production retrieval pipelines with explicit components | Cloud/platform pricing is not public in captured official/local sources; OSS requires ops |
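
The per-token prices and the Voyage rerank billing rule in the table can be turned into a rough cost estimate. The sketch below is illustrative only: the function names and the workload figures (a 50M-token corpus, a 20-token query, 100 candidates of 300 tokens each) are hypothetical. The prices and the 200M-token free allocation are copied from the rows above and should be rechecked against each vendor's current pricing page.

```python
def embedding_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of embedding `tokens` input tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million_usd


def rerank_processed_tokens(query_tokens: int, doc_token_counts: list[int]) -> int:
    """Processed tokens as described in the Voyage rerank row:
    query tokens x document count + total document tokens."""
    return query_tokens * len(doc_token_counts) + sum(doc_token_counts)


def billable_after_free(processed_tokens: int, free_tokens: int = 200_000_000) -> int:
    """Tokens left to bill once a free allocation is used up (never negative)."""
    return max(0, processed_tokens - free_tokens)


if __name__ == "__main__":
    corpus_tokens = 50_000_000  # hypothetical corpus size

    # Per-1M-token prices as listed in the table rows above.
    for label, price in [("$0.02 / 1M row", 0.02), ("$0.13 / 1M row", 0.13)]:
        print(f"{label}: ${embedding_cost_usd(corpus_tokens, price):.2f}")

    # One hypothetical rerank call: 20-token query, 100 candidates of 300 tokens.
    per_query = rerank_processed_tokens(20, [300] * 100)
    print(f"processed tokens per rerank call: {per_query}")

    # Hypothetical monthly volume of 10,000 such calls against the free allocation.
    print(f"billable tokens: {billable_after_free(per_query * 10_000)}")
```

Because the query tokens are counted once per candidate document, halving the candidate list roughly halves the processed tokens, which is why the rerank rows flag candidate count as a cost driver.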