Vector Embeddings Guide
Source: https://qdrant.tech/articles/what-are-embeddings/
Core Functions
• Data Transformation
▪ Convert raw text into numerical vectors
▪ Maintain semantic relationships in vector space
▪ Create consistent representations for similar inputs
• Semantic Preservation
▪ Ensure similar meanings result in similar vectors
▪ Capture contextual nuances and relationships
▪ Enable mathematical operations on language
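To make "mathematical operations on language" concrete, here is a minimal sketch of cosine similarity over toy vectors; the 4-dimensional values are illustrative stand-ins for real embeddings, which typically have hundreds to thousands of dimensions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: ~1.0 for near-identical directions, ~0 for unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for real model outputs.
v_cat    = np.array([0.9, 0.1, 0.0, 0.2])
v_kitten = np.array([0.8, 0.2, 0.1, 0.3])
v_stock  = np.array([0.0, 0.9, 0.8, 0.1])

print(cosine_similarity(v_cat, v_kitten))  # high: related meanings
print(cosine_similarity(v_cat, v_stock))   # low: unrelated meanings
```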
• Google Models:
▪ text-embedding-004: Earlier model with generous free-tier availability
▪ gemini-embedding-001: Latest Gemini embedding model with state-of-the-art
performance across English, multilingual, and code tasks. It unifies the
previously specialized models text-embedding-005 and
text-multilingual-embedding-002 and outperforms each in its respective domain
1. Supports 250+ languages
2. Input limit of 2,048 tokens; 3,072-dimensional output by default
3. Limited free tier available
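A minimal sketch of calling one of these models, assuming the google-generativeai Python package and an API key in the GOOGLE_API_KEY environment variable:

```python
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

result = genai.embed_content(
    model="models/text-embedding-004",   # or "models/gemini-embedding-001"
    content="Vector embeddings map text into a semantic space.",
    task_type="retrieval_document",      # hint for retrieval-oriented use
)
vector = result["embedding"]             # list of floats
print(len(vector))                       # e.g. 768 for text-embedding-004
```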
Image Embeddings
• OpenAI CLIP:
▪ Links text and images in unified vector space
▪ Enables cross-modal search and understanding
▪ Foundation for many multimodal applications
• Other Options:
▪ Google Vision API embeddings
▪ Microsoft Azure Computer Vision
▪ Open-source alternatives like OpenCLIP
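As an illustration of CLIP's unified text-image space, a sketch using the Hugging Face transformers package and the widely used openai/clip-vit-base-patch32 checkpoint ("photo.jpg" is a hypothetical local file):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # hypothetical local file
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True,
)
outputs = model(**inputs)
# Text and image land in the same vector space, so their match can be scored:
probs = outputs.logits_per_image.softmax(dim=1)
print(probs)  # higher probability = caption better matches the image
```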
Multi-modal Embeddings
• Amazon Titan Multimodal Embeddings:
▪ Handles text and images simultaneously
▪ AWS ecosystem integration
▪ Enterprise-focused features
Inference Phase
• Text Processing: Convert new input text into tokens
Retrieval-focused Models
• Specifically optimized for search and retrieval
• Often use dual-encoder architectures
• Examples: DPR, BGE models
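A dual-encoder retrieval sketch, assuming the sentence-transformers package with BAAI/bge-small-en-v1.5 (one of the BGE models mentioned above); queries and documents are encoded separately and compared by cosine similarity:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

docs = [
    "Qdrant is a vector database for similarity search.",
    "The recipe calls for two cups of flour.",
]
doc_emb = model.encode(docs, normalize_embeddings=True)

query_emb = model.encode("How do I search embeddings at scale?",
                         normalize_embeddings=True)
scores = util.cos_sim(query_emb, doc_emb)  # same encoder for both sides
print(scores)  # the vector-database sentence should score higher
```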
3. Vector Databases
What are Vector Databases?
Vector databases are specialized storage systems optimized for storing, indexing, and
querying high-dimensional vector embeddings. They form the backbone of modern AI
applications by enabling fast similarity search at scale.
Key Features
High-Dimensional Storage
• Efficient Storage: Optimized compression for vector data
• Metadata Support: Store additional information alongside vectors
• Scalability: Handle millions to billions of vectors
• Data Types: Support various embedding dimensions and formats
Query Processing
• Query Vectorization: Convert search query to embedding
• Similarity Search: Use ANN algorithms for fast retrieval
• Filtering: Apply metadata filters if needed
• Result Ranking: Sort by similarity scores
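A sketch of this query pipeline with the qdrant-client package (fitting, given the guide's Qdrant source); the 4-dimensional vectors are toy stand-ins for real embeddings:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (Distance, FieldCondition, Filter,
                                  MatchValue, PointStruct, VectorParams)

client = QdrantClient(":memory:")  # in-process instance for experimentation
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
client.upsert("docs", points=[
    PointStruct(id=1, vector=[0.9, 0.1, 0.0, 0.2], payload={"lang": "en"}),
    PointStruct(id=2, vector=[0.0, 0.9, 0.8, 0.1], payload={"lang": "de"}),
])

# 1) vectorize query (stubbed), 2) ANN search, 3) metadata filter, 4) ranked results
hits = client.search(  # newer client versions also offer query_points()
    collection_name="docs",
    query_vector=[0.8, 0.2, 0.1, 0.3],
    query_filter=Filter(must=[FieldCondition(key="lang",
                                             match=MatchValue(value="en"))]),
    limit=3,
)
for hit in hits:
    print(hit.id, hit.score)  # sorted by similarity score
```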
Optimization Strategies
• Index Selection: Choose appropriate algorithms (HNSW, IVF)
• Sharding: Distribute data across multiple nodes
• Caching: Store frequently accessed results
• Load Balancing: Distribute query load efficiently
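To illustrate the index-selection point above, a sketch building an HNSW index with the faiss-cpu package; the parameters M=32 and efSearch=64 are illustrative starting points, not tuned values:

```python
import faiss
import numpy as np

dim = 128
xb = np.random.rand(10_000, dim).astype("float32")  # database vectors
xq = np.random.rand(5, dim).astype("float32")       # query vectors

index = faiss.IndexHNSWFlat(dim, 32)  # M=32 graph links per node
index.hnsw.efSearch = 64              # higher = more accurate, slower
index.add(xb)

distances, ids = index.search(xq, 5)  # top-5 approximate neighbors per query
print(ids)
```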
Semantic Search
• Find similar products, articles, or media content
• Support natural language queries
• Improve search relevance over keyword matching
• Enable cross-lingual search capabilities
Recommendation Systems
• Store user and item embeddings
• Find similar users or products
• Power personalization engines
• Support real-time recommendations
Anomaly Detection
• Identify outliers in vector space
• Detect unusual patterns or behaviors
• Security and fraud prevention
• Quality control in manufacturing
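A minimal outlier-flagging sketch: embeddings far from the centroid of typical data are flagged for review. The synthetic data and 3-sigma threshold are illustrative assumptions, not a production rule:

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 0.1, size=(500, 64))  # cluster of typical embeddings
outlier = rng.normal(2.0, 0.1, size=(3, 64))   # unusual patterns
vectors = np.vstack([normal, outlier])

centroid = vectors.mean(axis=0)
dists = np.linalg.norm(vectors - centroid, axis=1)

threshold = dists.mean() + 3 * dists.std()     # simple 3-sigma rule
print(np.where(dists > threshold)[0])          # indices of flagged vectors
```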
Integration Factors
• Existing Infrastructure: Compatibility with current systems
• API Preferences: REST, GraphQL, or native client libraries
• Programming Languages: SDK availability
• Deployment Options: Cloud, on-premise, or hybrid
Performance Requirements
• Query Latency: Sub-second vs seconds acceptable
• Accuracy: Exact vs approximate search needs
• Update Frequency: Real-time vs batch updates
• Consistency: Strong vs eventual consistency requirements
Cost Considerations
• Hosting Costs: Infrastructure and operational expenses
• Licensing: Open source vs commercial solutions
• Scaling Costs: Cost growth with data and usage
• Maintenance: Support and administrative overhead
Limitations to Consider:
• Self-reported results may have optimistic bias
• Public datasets may not reflect your specific domain
• Benchmark gaming through overfitting to test sets
• Limited coverage of domain-specific tasks
Task-Specific Scores:
• Classification accuracy for content categorization
• Clustering quality for document organization
• Similarity correlation for recommendation systems
Technical Specifications
Model Size:
• Parameter count affects memory requirements
• Larger models typically perform better but cost more
• Consider deployment constraints (edge vs cloud)
Max Token Length:
• OpenAI (text-embedding-3 series): Supports up to 8,191 tokens per input.
• Google Gemini (gemini-embedding-001): Supports up to 2,048 tokens per input.
• IBM Watsonx.ai: Maximum context length varies from 4,096 to 131,072 tokens,
depending on the foundation model.
• LlamaIndex: Handles embeddings with a maximum token length of 512 tokens.
• LangChain (OpenAIEmbeddings): Configurable embedding_ctx_length, default is
8,191 tokens.
• Jina Embeddings v3: Supports context lengths up to 8,192 tokens.
• ChEmbed: Tailored for chemical literature, supports context lengths up to 8,192
tokens.
• Hugging Face Open Source Models:
o BGE-M3: Supports up to 8,192 tokens per input.
o nomic-embed-text-v2-moe: Maximum sequence length is 512 tokens.
o sentence-transformers/bert-large-nli-max-tokens: Maximum input length is
512 tokens.
o all-MiniLM-L6-v2: Maximum input length is 512 tokens.
o instructor-xl: Maximum input length is 512 tokens.
o gte-multilingual-base: Maximum input length is 8,192 tokens.
o Qwen3-Embedding-8B: Maximum context length is 32,768 tokens.
o NV-Embed-v1: Maximum sequence length is 8,192 tokens.
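Staying under these limits usually means counting tokens before embedding. A sketch using the tiktoken package with the cl100k_base encoding used by OpenAI's embedding models (hard truncation shown here; chunking is usually preferable for long documents):

```python
import tiktoken

MAX_TOKENS = 8191  # OpenAI text-embedding-3 limit from the list above
enc = tiktoken.get_encoding("cl100k_base")

def truncate_to_limit(text: str, limit: int = MAX_TOKENS) -> str:
    tokens = enc.encode(text)
    if len(tokens) <= limit:
        return text
    return enc.decode(tokens[:limit])  # drops everything past the limit

doc = "some very long document " * 2000
print(len(enc.encode(doc)), len(enc.encode(truncate_to_limit(doc))))
```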
Embedding Dimensions:
• OpenAI (text-embedding-3-small): Default is 1,536 dimensions.
• OpenAI (text-embedding-3-large): Default is 3,072 dimensions.
• Google Gemini (gemini-embedding-001): Default is 3,072 dimensions, with the
ability to truncate to 768, 1,536, or 3,072 dimensions using the
output_dimensionality parameter.
• IBM Watsonx.ai: Embedding models have 768 dimensions.
• LlamaIndex: Embedding dimensions vary; for instance, vdr-2b-multi-v1 has 768
dimensions.
• LangChain (OpenAIEmbeddings): Supports specifying embedding dimensions;
default varies by model.
• Jina Embeddings v3: Default is 1,024 dimensions, with flexibility to reduce to as low
as 32 dimensions using Matryoshka Representation Learning.
• ChEmbed: Embedding dimensions are not explicitly stated; however, the model is
optimized for chemical literature retrieval.
• Hugging Face Open Source Models:
o BGE-M3: Dense embedding dimension is 1,024.
o nomic-embed-text-v2-moe: Supports flexible dimensions from 768 down to 256
through Matryoshka representation learning.
o sentence-transformers/bert-large-nli-max-tokens: Maps sentences &
paragraphs to a 1,024-dimensional dense vector space.
o all-MiniLM-L6-v2: Maps sentences & paragraphs to a 384-dimensional dense
vector space.
o instructor-xl: Embedding dimensions are not explicitly stated; however, the
model is optimized for instruction-based tasks.
o gte-multilingual-base: Embedding dimension is 768.
o Qwen3-Embedding-8B: Embedding dimension is up to 4,096, supports user-
defined output dimensions ranging from 32 to 4,096.
o NV-Embed-v1: Embedding dimension is 4,096.
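For models trained with Matryoshka representation learning, shrinking a vector is just truncation plus re-normalization. A sketch, assuming the model actually supports MRL (truncating a plain model's vectors this way degrades accuracy badly):

```python
import numpy as np

def shrink(vec: np.ndarray, k: int) -> np.ndarray:
    """Keep the first k dimensions, then re-normalize for cosine search."""
    truncated = vec[:k]  # in MRL models, leading dims carry most information
    return truncated / np.linalg.norm(truncated)

full = np.random.rand(1024).astype("float32")  # e.g. Jina v3 default size
small = shrink(full, 256)                      # 4x storage savings
print(small.shape)                             # (256,)
```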
Inference Speed:
• Processing time per input
• Critical for real-time applications
• Varies significantly between models
Cost Considerations
API Pricing:
• Cost per million tokens for commercial models
• OpenAI: $0.00002 (text-embedding-3-small) to $0.00013 (text-embedding-3-large) per 1K tokens
• Google: Free tier with usage limits
• Consider volume discounts
Infrastructure Costs:
• Self-hosting requirements for open-source models
• GPU memory and computational needs
• Scaling costs with usage growth
Operational Expenses:
• Monitoring and maintenance overhead
• Support and troubleshooting costs
• Update and migration expenses
Deployment Factors
Privacy Requirements:
• Data sovereignty and compliance needs
• On-premise vs cloud deployment options
• GDPR, HIPAA, or other regulatory requirements
Integration Complexity:
• API compatibility with existing systems
• SDK availability for your programming language
• Documentation and community support quality
Scalability Needs:
• Expected growth in data volume and users
• Auto-scaling capabilities
• Geographic distribution requirements
Selection Strategy
Phase 1: Initial Screening
1. Identify Requirements: Define your specific use case and constraints
2. Review Benchmarks: Check MTEB scores for relevant tasks
3. Technical Filtering: Eliminate models that don't meet basic requirements
4. Cost Analysis: Assess budget implications for shortlisted models
Intrinsic Evaluation
• Clustering Quality:
▪ Evaluate how well embeddings group similar documents
▪ Metrics: silhouette score, adjusted rand index
▪ Test on domain-specific document collections
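A clustering-quality sketch with scikit-learn on synthetic stand-in embeddings; silhouette score needs only the predicted clusters, while adjusted Rand index also requires ground-truth labels:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score

rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.2, (50, 32)),   # stand-in embeddings for
                 rng.normal(3, 0.2, (50, 32))])  # two synthetic topics
true_labels = np.array([0] * 50 + [1] * 50)

pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
print("silhouette:", silhouette_score(emb, pred))      # cohesion vs. separation
print("ARI:", adjusted_rand_score(true_labels, pred))  # agreement with truth
```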
Extrinsic Evaluation
• Downstream Task Performance:
▪ Test embeddings on your specific application
▪ Measure end-to-end system performance
▪ Examples: search relevance, classification accuracy
Metrics Collection
Similarity Scores:
• Query-document similarity distributions
• Inter-document similarity patterns
• Outlier detection and analysis
Retrieval Metrics:
• Precision@K: Accuracy of top K results
• Recall@K: Coverage of relevant results in top K
• Mean Average Precision (MAP): Overall ranking quality
• NDCG: Ranking quality with graded relevance
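Precision@K and Recall@K are simple enough to implement directly. A sketch for a single query, where ranked_ids and relevant are hypothetical example data:

```python
def precision_at_k(ranked_ids, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked_ids[:k] if doc in relevant) / k

def recall_at_k(ranked_ids, relevant, k):
    """Fraction of all relevant documents found in the top k."""
    return sum(1 for doc in ranked_ids[:k] if doc in relevant) / len(relevant)

ranked_ids = ["d3", "d7", "d1", "d9", "d4"]  # hypothetical ranked results
relevant = {"d1", "d3", "d5"}                # hypothetical gold set
print(precision_at_k(ranked_ids, relevant, 5))  # 0.4: 2 of top 5 relevant
print(recall_at_k(ranked_ids, relevant, 5))     # ~0.67: 2 of 3 relevant found
```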
Comparative Analysis
Cross-Model Comparison:
• Side-by-side performance on same datasets
• Statistical significance testing
• Error analysis and failure mode identification
Ablation Studies:
• Impact of different parameters
• Effect of preprocessing choices
• Influence of context length and chunking
Robustness Testing
• Performance with typos and grammatical errors
• Handling of different text formats (bullets, tables)
• Resilience to adversarial inputs
Scalability Assessment
• Performance degradation with increased data volume
• Query latency under load
• Memory usage growth patterns
Enterprise Solutions:
• Google Vertex AI: Integrated enterprise features
Multilingual Applications
Top Choices:
• Google text-embedding-004: 100+ languages, consistent quality
• multilingual-e5-large: Open source, strong cross-lingual performance
• BGE-M3: Excellent multilingual capabilities
• Considerations: Language-specific performance variations
Implementation Guidelines
Getting Started Checklist
1. Define Requirements: Specify your use case, scale, and constraints
2. Choose Initial Model: Start with Google Gemini (free tier) or a Hugging Face
open-source embedding model for testing
3. Prepare Test Data: Create representative evaluation dataset
4. Set Up Infrastructure: Choose vector database and deployment method
5. Implement Evaluation: Build metrics and monitoring systems
6. Test and Compare: Evaluate multiple models on your data
7. Deploy and Monitor: Launch with performance tracking
Best Practices
Data Preprocessing:
• Consistent text cleaning and normalization
• Appropriate chunking strategies for long documents
• Handling of special characters and formatting
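A minimal overlapping-window chunker sketch; it splits on words for simplicity, though token-based splitting aligned with the embedding model's tokenizer is more precise:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40):
    """Split text into overlapping word windows for embedding."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = "embedding " * 1000
print(len(chunk_text(doc)))  # 200-word windows with 40 words of overlap
```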
Model Management:
• Version control for model updates
• A/B testing infrastructure for model comparison
• Rollback procedures for performance issues
Performance Optimization:
• Batch processing for efficiency
• Caching strategies for repeated queries
• Load balancing for high-traffic applications
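A sketch combining batching and caching: only unseen texts go to the model, in one batched call; embed_batch is a hypothetical stand-in for any provider's batch endpoint:

```python
from typing import Dict, List

cache: Dict[str, List[float]] = {}

def embed_batch(texts: List[str]) -> List[List[float]]:
    # Placeholder: replace with a real batched model/API call.
    return [[float(len(t))] for t in texts]

def embed_with_cache(texts: List[str]) -> List[List[float]]:
    missing = [t for t in texts if t not in cache]
    if missing:
        for text, vec in zip(missing, embed_batch(missing)):
            cache[text] = vec  # repeated queries become free
    return [cache[t] for t in texts]

print(embed_with_cache(["hello", "world", "hello"]))
```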
Premature Optimization:
• Start with simple, proven solutions
• Optimize based on actual performance bottlenecks
• Don't over-engineer initial implementations
Insufficient Testing:
• Test edge cases and error conditions
• Evaluate performance under realistic load
• Monitor for drift in embedding quality over time
Migration and Scaling Strategies
Model Migration
• Gradual Rollout: Phase in new models with careful monitoring
Scaling Considerations
• Horizontal Scaling: Distribute load across multiple instances
7. Implementation Considerations
Technical Architecture
System Design Patterns
Microservices Architecture:
Embedding Generation:
• Batch processing for large volumes
Performance Optimization
Inference Optimization
• Batch Processing: Group multiple texts for efficient processing
Storage Optimization
• Vector Compression: Reduce storage requirements
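One common compression approach is scalar quantization of float32 embeddings to int8, roughly a 4x storage reduction at a small accuracy cost. A minimal sketch (real vector databases implement this internally with more care):

```python
import numpy as np

def quantize(vec: np.ndarray):
    """Map float32 values onto int8 with a per-vector scale factor."""
    scale = np.abs(vec).max() / 127.0
    return (vec / scale).round().astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

vec = np.random.rand(768).astype(np.float32)
q, scale = quantize(vec)
err = np.abs(vec - dequantize(q, scale)).max()
print(q.nbytes, "bytes vs", vec.nbytes, "| max error:", err)
```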
Query Optimization
• Approximate Search: Trade accuracy for speed when appropriate
Privacy Considerations
• Data Minimization: Only store necessary information
Model Security
• Input Validation: Sanitize inputs to prevent injection attacks
Research Papers
• MTEB Paper: "MTEB: Massive Text Embedding Benchmark"
• Sentence-BERT: "Sentence-BERT: Sentence Embeddings using Siamese BERT-
Networks"
• E5 Models: "Text Embeddings by Weakly-Supervised Contrastive Pre-training"
Evaluation Tools
• BEIR: Information retrieval evaluation
• SentEval: Sentence embedding evaluation
• MTEB: Comprehensive embedding benchmark
Technical Specifications
• Model Size: 308MB (EmbeddingGemma) to 28GB+ (NV-Embed-v2), most production
models 1-7GB
• Max Tokens: 2,048 (Google Gemini) and 8,191 (OpenAI) up to 32,768 (Qwen3-Embedding-8B); trend toward longer contexts
• Embedding Dimensions: 384-3,072+ dimensions, many support Matryoshka
representation
• Inference Speed: <50ms typical for production workloads, varies by model
complexity
Cost Considerations
• OpenAI: $0.00002 (3-small) to $0.00013 (3-large) per 1K tokens
• Google Gemini: Free tier available, competitive paid pricing for production
• Self-hosting: $200-15,000+ monthly depending on scale and model choice
• Managed Vector DBs: $50-1,000+ per million vectors monthly, varies by features
needed