How do you select an embedding model for RAG in 2027?
Direct Answer
In 2027, embedding model selection for RAG and semantic search comes down to four criteria: (1) task-specific quality on your domain, (2) dimension count and cost-per-query trade-off, (3) multilingual support if needed, and (4) enterprise availability (API + compliance).
The 2027 default short-list:
- OpenAI text-embedding-3-large: 3072 dim, $0.13/M tokens, strong general.
- Cohere embed-v4: 1024 dim, $0.10/M, strong multilingual.
- Voyage AI voyage-3-large: 1024 dim, $0.18/M, strong code and retrieval.
- Google Gemini Embedding 2: 768 dim, $0.025/M, cheapest.
- Anthropic embed: when available; expected 2027.
- bge-large-en-v1.5: open-source, self-hosted, 1024 dim.
1. Task-Specific Quality
Public benchmarks (MTEB — Massive Text Embedding Benchmark) measure general quality. Always re-evaluate on your task.
MTEB 2026 leaders: Voyage AI voyage-3-large, OpenAI text-embedding-3-large, Cohere embed-v4, Google Gemini Embedding 2. Their average scores usually differ by less than 2%; on your own task the gap can be 10–20%.
Task patterns:
- General text retrieval: OpenAI text-embedding-3-large or Voyage voyage-3-large.
- Code retrieval: Voyage voyage-code-3 or text-embedding-3-large.
- Multilingual: Cohere embed-multilingual-v4.
- Legal, medical, financial domains: Voyage AI domain-specific variants.
1.1 Evaluation Method
Build a labeled relevance set (200+ query-document pairs). Run every candidate model over it, measure NDCG@10 and MRR for each, and pick the winner. The gap between candidates is often 5–10% NDCG.
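The two metrics can be computed with a few lines of standard-library Python. This is a minimal sketch, not a full evaluation harness: the function names and the `run` / `qrels` dictionary layout are assumptions made for illustration, and the ideal ranking is taken from the labeled documents only.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k graded relevances."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k for one query. `relevance` maps doc id -> graded label (0 = irrelevant)."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids]
    ideal_dcg = dcg_at_k(sorted(relevance.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def reciprocal_rank(ranked_ids, relevance):
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if relevance.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0

def evaluate(run, qrels, k=10):
    """Average NDCG@k and MRR over every labeled query.

    run:   {query_id: [doc ids in ranked order]} produced by one candidate model
    qrels: {query_id: {doc_id: relevance}} from the labeled relevance set
    """
    n = len(qrels)
    ndcg = sum(ndcg_at_k(run.get(q, []), rels, k) for q, rels in qrels.items()) / n
    mrr = sum(reciprocal_rank(run.get(q, []), rels) for q, rels in qrels.items()) / n
    return {"ndcg": ndcg, "mrr": mrr}
```

Call `evaluate` once per candidate model with that model's ranked results and compare the returned scores side by side on the same `qrels`.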
2. Dimension Count and Cost
Higher dimensions usually mean higher quality but more storage and slower search.
- 3072-dim (OpenAI text-embedding-3-large): strongest quality; 6x storage cost vs 512-dim.
- 1024-dim (Voyage, Cohere): standard sweet spot.
- 768-dim (Gemini Embedding 2, base-size open models such as all-mpnet-base-v2; MiniLM variants are smaller still at 384-dim): strong cost optimization.
- 512-dim (custom or distilled): edge deployments.
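Matryoshka embeddings (OpenAI text-embedding-3 family) can be truncated to a shorter prefix with only a modest quality loss. Truncate stored and query vectors to the same length (for example 512-dim) and re-normalize them. Storing the full 3072-dim vectors and truncating only at query time reduces search compute but not storage; to cut storage, store the truncated vectors.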
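Truncation itself is simple: keep the first k components and re-normalize to unit length so cosine and dot-product scores stay comparable. The sketch below uses only the standard library; `truncate_embedding` is a hypothetical helper name, and the approach only makes sense for models trained so that vector prefixes remain meaningful. With the OpenAI text-embedding-3 models you can alternatively request shorter vectors directly through the API's `dimensions` parameter.

```python
import math

def truncate_embedding(vector, dims):
    """Keep the first `dims` components and L2-normalize the result."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]

def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))
```

Apply the same `dims` to document vectors at index time and to query vectors at search time; mixing lengths in one index will not work.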
2.1 Storage Cost at Scale
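Raw vector storage for 100M vectors, before index overhead and replication:
- 3072-dim float32: about 1.23 TB
- 1024-dim float32: about 410 GB
- 768-dim float32: about 307 GB
Managed vector databases (Pinecone, Qdrant, Weaviate) price largely on stored data or the resources needed to hold it, so dimension starts to matter at 10M+ vectors.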
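The figures above follow from vectors × dimensions × 4 bytes per float32 value. A small sketch of that arithmetic, counting raw vector bytes only (the function name is an assumption, and real indexes add overhead on top):

```python
def raw_vector_storage_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for dense vectors; float32 uses 4 bytes per value."""
    return num_vectors * dims * bytes_per_value

for dims in (3072, 1024, 768, 512):
    gb = raw_vector_storage_bytes(100_000_000, dims) / 1e9
    print(f"{dims}-dim float32: {gb:,.1f} GB")
# 3072-dim float32: 1,228.8 GB
# 1024-dim float32: 409.6 GB
# 768-dim float32: 307.2 GB
# 512-dim float32: 204.8 GB
```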
3. Multilingual Support
For multilingual products, Cohere embed-multilingual-v4 is the default: it supports 100+ languages with consistent quality. OpenAI text-embedding-3-large is strong but English-leaning. Voyage voyage-multilingual-2 is competitive.
3.1 Cross-Lingual Retrieval
Multilingual models map text from different languages into one shared vector space, so a query in language A can retrieve relevant documents written in language B without translation. This is critical for global products.
4. Enterprise Availability
For regulated workloads:
- OpenAI — SOC 2 Type II, HIPAA BAA, GDPR DPA.
- Cohere — SOC 2, GDPR, FedRAMP via AWS Bedrock partnership.
- Voyage AI — SOC 2; growing enterprise posture.
- Google Vertex AI Embeddings — full Google Cloud compliance stack.
- Self-hosted bge-large on AWS Bedrock, Azure ML, or owned infrastructure — full control.
5. Self-Hosted vs API
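API embedding is usually the better choice below roughly 10B tokens monthly. Self-hosted (bge-large, jina-embeddings, custom fine-tuned) clearly wins at 50B+ tokens monthly if you have GPU capacity. Between the two, the answer depends on GPU utilization and operations overhead.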
Cost crossover: OpenAI text-embedding-3-large at $0.13/M tokens; self-hosted bge-large on a single H100 GPU runs ~$0.02/M tokens at full utilization. Crossover happens around 10B tokens monthly.
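The crossover arithmetic can be sketched as a fixed-cost comparison. The API price comes from the text; `FIXED_MONTHLY_COST` is a hypothetical all-in monthly figure for one GPU node, chosen only for illustration, so substitute your own hardware, hosting, and staffing numbers.

```python
API_PRICE_PER_M = 0.13       # $/M tokens, OpenAI text-embedding-3-large (from the text)
FIXED_MONTHLY_COST = 1300.0  # ASSUMPTION: hypothetical all-in monthly cost of one GPU node

def api_cost(tokens_m, price_per_m=API_PRICE_PER_M):
    """Monthly API bill in dollars for `tokens_m` million tokens."""
    return tokens_m * price_per_m

def self_hosted_cost_per_m(tokens_m, fixed_monthly_cost=FIXED_MONTHLY_COST):
    """Effective $/M tokens when a fixed monthly bill is spread over actual volume."""
    return fixed_monthly_cost / tokens_m

def break_even_tokens_m(fixed_monthly_cost=FIXED_MONTHLY_COST, price_per_m=API_PRICE_PER_M):
    """Monthly volume (millions of tokens) at which API spend equals the fixed bill."""
    return fixed_monthly_cost / price_per_m

for tokens_m in (1_000, 10_000, 50_000):
    print(
        f"{tokens_m:>6,} M tokens/month: API ${api_cost(tokens_m):,.0f}, "
        f"self-hosted ${self_hosted_cost_per_m(tokens_m):.3f}/M effective"
    )
print(f"break-even: {break_even_tokens_m():,.0f} M tokens/month")
```

Under this assumed fixed cost the break-even lands at about 10B tokens monthly, and the effective self-hosted price only approaches $0.02/M at much higher volume. That is why the answer between 10B and 50B tokens depends on how busy the GPU actually is.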
5.1 Fine-Tuning Embeddings
Domain-specific fine-tuning (legal, medical, code) can lift retrieval quality 10–25%. The usual recipe is the Sentence-Transformers framework, a GPU, and 1,000+ in-domain triplets (query, positive, negative).
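To make the triplet format concrete, here is a generic triplet margin objective in plain Python. It is an illustration of the idea, not Sentence-Transformers' own implementation (the library ships its own loss classes), and the margin of 0.2 is an arbitrary assumed value.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def triplet_margin_loss(query, positive, negative, margin=0.2):
    """Zero once the positive is closer to the query than the negative by `margin`."""
    gap = cosine_distance(query, positive) - cosine_distance(query, negative)
    return max(0.0, gap + margin)
```

Fine-tuning minimizes this kind of loss over all triplets, pulling each query toward its positive document and away from its negative one.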
FAQ
OpenAI or Voyage as default? OpenAI text-embedding-3-large for ubiquity; Voyage voyage-3-large if your own evaluation favors it.
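Should we use Matryoshka truncation? Yes if storage or search cost matters. Embed at 3072-dim, then store and query at 512 or 768 dim, re-normalizing after truncation. Keeping the full 3072-dim vectors in the index does not save storage.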
Cohere or OpenAI for multilingual? Cohere embed-multilingual-v4. It is significantly stronger on non-English retrieval.
Self-hosted bge-large or API? API under roughly 10B tokens/month; self-hosted clearly wins at 50B+. In between, it depends on GPU utilization.
How often should we re-evaluate? Quarterly on the same labeled relevance set. Vendor models update; your data drifts.
Bottom Line
Embedding selection in 2027 is a task-specific decision. OpenAI text-embedding-3-large and Voyage voyage-3-large are the general defaults; Cohere embed-multilingual-v4 for multilingual; Gemini Embedding 2 for cost; bge-large self-hosted for scale. Always re-evaluate on your task — public benchmarks tell you nothing definitive about your domain.
Sources
- MTEB — Massive Text Embedding Benchmark (Hugging Face)
- OpenAI — text-embedding-3-large Documentation
- Cohere — embed-v4 and embed-multilingual-v4 Documentation
- Voyage AI — voyage-3-large and voyage-code-3 Reference
- Google — Gemini Embedding 2 Documentation
- BAAI — bge-large-en-v1.5 Open-Source Model Reference
- Sentence-Transformers — Fine-Tuning Reference
- Pinecone — Embedding Model Comparison Reference
- LlamaIndex — Embedding Provider Comparison Documentation
- AWS Bedrock — Embedding Model Catalog