Qdrant Resource Optimization Guide
A Complete Guide to Resource Optimization

Resource optimization isn't just a choice; it's a necessity.
Contents:
Configure Indexing for Faster Searches
Multitenancy
Custom Sharding
Reduce Overhead With Batch Processing
Conclusion
What’s in This Guide? Insider tips to help you fine-tune your vector database for peak performance.
Is your AI application resource-hungry and hard to scale? You're not alone. Many organizations struggle with unoptimized implementations, leading to wasted resources and limited real-world impact. But here's the good news: by fine-tuning your setup, you can transform your vector database into a powerhouse of efficiency and performance.
Highlights Include:

01 When You Scale Up
As data grows and requests surge, optimizing resource usage ensures your systems stay responsive and cost-efficient, even under heavy loads.

02 If You're Facing Budget Constraints
Strike the perfect balance between performance and cost, cutting unnecessary expenses while maintaining essential capabilities.

03 You Need Better Performance
If you're noticing slow query speeds, latency issues, or frequent timeouts, it's time to fine-tune your resource allocation.

04 When System Stability Is Paramount
To manage high-traffic environments, you will need to prevent crashes or failures caused by resource exhaustion.
Configure Indexing for Faster Searches

What is it?

A vector index is the central location where Qdrant calculates the nearest neighbors of a query vector.

Figure 2: A sample HNSW vector index with three layers. Follow the blue arrow on the top layer to see how a query travels throughout the database index. The closest result is on the bottom level, nearest to the gray query point.
Optimization Parameters

m (Edges per Node)
This controls the number of edges in the graph. A higher value enhances search accuracy but demands more memory and build time. Fine-tune this to balance memory usage and precision.

ef (Search Range)
This determines how many neighbors are evaluated during a search query. You can adjust this to balance query speed and accuracy.
Scalar Quantization
Scalar quantization strikes an excellent balance between compression and performance, making it the go-to choice for most workloads.

Binary Quantization
Binary quantization is ideal for high-dimensional datasets and compatible embedding models, where compression and speed are paramount.

Precision management: Consider rescoring or oversampling to offset precision loss.

Figure 4: Binary quantization of an original float32 vector (component range -1 to +1, e.g. 0.98, -0.08, 0.72, -0.43, -0.12, 0.01; size 6144 bytes). This method causes maximum compression: it reduces memory usage by 32x and speeds up searches by up to 40x.
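To make the compression figures concrete, here is a small plain-Python sketch of the storage math behind both techniques. This is an illustration of the idea, not Qdrant's internal implementation:

```python
# Illustrative sketch of the storage arithmetic behind quantization.

def scalar_quantize(vector, lo=-1.0, hi=1.0):
    """Map float32 components in [lo, hi] to uint8 (4 bytes -> 1 byte: 4x smaller)."""
    return [round((x - lo) / (hi - lo) * 255) for x in vector]

def binary_quantize(vector):
    """Keep only the sign bit (32 bits -> 1 bit: 32x smaller)."""
    return [1 if x > 0 else 0 for x in vector]

vector = [0.98, -0.08, 0.72, -0.43, -0.12, 0.01]  # sample values from Figure 4

float32_bytes = len(vector) * 4      # 4 bytes per component
scalar_bytes = len(vector) * 1       # 1 byte per component after scalar quantization
binary_bits = len(vector)            # 1 bit per component after binary quantization

# A 1536-dimensional embedding occupies 6144 bytes as float32,
# and 6144 / 32 = 192 bytes once binary-quantized: the 32x reduction cited above.
full_size = 1536 * 4
binary_size = 1536 // 8
```

Running the arithmetic confirms the 32x figure: 6144 bytes shrinks to 192 bytes for a 1536-dimensional vector.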
Multitenancy

A single collection can handle the complexity of global multi-user systems.

Learn more about Multitenancy

Why Multitenancy?

Logical Isolation: Ensures each tenant's data remains separate while residing in the same collection.

Minimized Overhead: Reduces resource consumption compared to maintaining a separate collection per tenant.

Scalability: Handles high user volumes without compromising performance.

Figure 5: Each individual vector is assigned a specific payload that denotes which tenant it belongs to (e.g. id=1, payload={"group_id": "tenant_1"}, vector=[0.9, 0.1, 0.1]; id=2, payload={"group_id": "tenant_1"}, vector=[0.1, 0.9, 0.1]; id=3, payload={"group_id": "tenant_2"}, vector=[0.1, 0.1, 0.9]). This is how a large number of different tenants can share a single Qdrant collection.
Custom Sharding

Sharding is a critical strategy in Qdrant for splitting collections into smaller units, called shards, to efficiently distribute data across multiple nodes. It's a powerful tool for improving scalability and maintaining performance in large-scale systems.

Figure 6: Users can both upsert and query the shards that are relevant to them (shard 1: canada, shard 2: germany, shard 3: usa, shard 4: india), all within the same collection. Regional sharding can help avoid cross-continental traffic.
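A hedged sketch of the regional setup in Figure 6 with the Python client. Custom sharding requires a multi-node Qdrant cluster, so treat this as a configuration fragment rather than something runnable locally; the URL, collection name, and shard count are illustrative:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, ShardingMethod, VectorParams

client = QdrantClient(url="http://localhost:6333")  # illustrative cluster URL

# Opt into user-defined shard keys instead of automatic hash-based placement.
client.create_collection(
    collection_name="regional_data",  # illustrative name
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
    sharding_method=ShardingMethod.CUSTOM,
    shard_number=1,  # shards created per shard key
)

# One shard key per region, as in Figure 6.
for region in ["canada", "germany", "usa", "india"]:
    client.create_shard_key(collection_name="regional_data", shard_key=region)

# Writes and reads are routed only to the relevant regional shard,
# avoiding cross-continental traffic.
client.upsert(
    collection_name="regional_data",
    points=[PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={})],
    shard_key_selector="canada",
)
client.search(
    collection_name="regional_data",
    query_vector=[0.1, 0.2, 0.3, 0.4],
    shard_key_selector="canada",
    limit=5,
)
```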
Optimize Queries by Properly Introducing Filters

When dealing with large datasets, it's impractical and inefficient to search through every data point. Instead, you can significantly improve performance by applying filters on specific payload fields. Filtering allows you to narrow down the search space by excluding irrelevant data points, thereby reducing the computational load and focusing only on the most relevant subset of your dataset.

Figure 7: This technique builds additional links (orange) between leftover data points. The filtered points which stay behind are now traversable once again. Qdrant uses special category-based methods to connect these data points.

Qdrant's filterable vector index is fast and reliably captures all available results.
Hybrid Search

Hybrid search combines keyword filtering with vector similarity search, enabling faster and more precise results. Keywords help narrow down the dataset quickly, while vector similarity ensures semantic accuracy. Hybrid search in Qdrant combines results from two types of data:

Dense vector search: finds results based on semantic similarity using vector embeddings.

Sparse vector search: finds results based on keyword matches.

The dense and sparse result lists are normalized and merged into a single ranking.

Considerations
Reranking can be computationally expensive, so aim for a balance between relevance and speed. First retrieve the relevant documents, then apply reranking.
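One common way to merge the dense and sparse rankings is reciprocal rank fusion (RRF). The sketch below is a minimal plain-Python illustration of the idea, with made-up document ids, not Qdrant's internal fusion code:

```python
# Minimal reciprocal-rank-fusion sketch (illustrative; document ids are hypothetical).

def rrf_fuse(result_lists, k=60):
    """Score each document by the sum of 1 / (k + rank) over all result lists."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)

dense_results = ["D1", "D2", "D3", "D4"]   # ordered by vector similarity
sparse_results = ["D3", "D1", "D5"]        # ordered by keyword relevance

fused = rrf_fuse([dense_results, sparse_results])
# D1 and D3 appear high in both lists, so they rise to the top of the fused ranking.
```

Rank-based fusion like this avoids having to normalize the incompatible score scales of dense and sparse search directly.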
Reduce Overhead With Batch Processing

Batch processing groups many operations into a single request-response cycle. It's an effective strategy for both data insertion and query execution.

Similarly, you can batch multiple queries together rather than executing them one by one. This reduces the number of round trips to the database, optimizing performance and reducing latency. Batch queries are particularly useful when processing a large number of similar queries or when handling multiple user requests simultaneously.

Learn more about Batch Queries

Figure 10: Batch Processing Example. Batched data is distributed across cluster peers and their shards.
Storage Management

As your data scales, effective resource management becomes crucial to keeping costs low while ensuring your application remains reliable and performant. Qdrant supports two main methods for storing vectors and payloads: InMemory and OnDisk/Memmap.

Learn more about Storage

InMemory

How it works: All data is stored in RAM, providing the fastest access times for queries and operations.

When to use it: This setup is ideal for applications where performance is critical and your RAM capacity can accommodate all vectors and their payloads.

Advantages: You can reach the maximum speed for vector/payload queries and updates.

Limitations: RAM usage can become a bottleneck as your dataset grows.

OnDisk / Memmap

How it works: Instead of loading all data into memory, memmap storage maps data files directly to a virtual address space on disk. The system's page cache handles data access, making it highly efficient.

When to use it: Perfect for storing large collections that exceed your available RAM while still maintaining near in-memory performance when enough RAM is available. For larger datasets or scenarios where memory is limited, OnDisk storage is more suitable; it significantly reduces memory usage by storing data on disk.

Advantages: Balances performance and memory usage, allowing you to work with datasets larger than your physical RAM.

Limitations: Slightly slower than pure in-memory storage but significantly more scalable.

Disk: Suitable for less frequently accessed data, such as payloads and non-critical information. Disk-backed storage reduces memory demands but can introduce slight latency.

HDD: While more cost-effective, HDDs are slower and can negatively impact performance, especially for large datasets or applications requiring high-speed access.
Optimization Cheatsheet

Connecting the techniques discussed earlier to common challenges in managing vector databases can provide practical strategies to optimize performance and scalability.

Reduce Excessive Memory Consumption
Large datasets, particularly vector data, can lead to significant memory usage, straining resources and degrading system performance.
- Offload load from RAM to SSDs: Extend memory capacity by storing data on SSDs, which offer a cost-effective way to reduce RAM reliance while maintaining reasonable access speeds.
- Vector quantization techniques: Compress vectors to lower memory requirements without compromising accuracy significantly. Techniques like scalar or product quantization can be particularly effective.

Manage Large Datasets in Distributed Systems
As the volume of vector data increases, storing and managing it across distributed nodes becomes challenging, often leading to slower data retrieval times and scaling issues.
- Partition and distribute the dataset across multiple nodes in a cluster to enhance scalability and improve retrieval performance.
- Load balancing of requests: Distribute incoming requests evenly across nodes to ensure consistent performance and avoid overloading any single node.

Fix Slow Performance and Query Timeouts
Performance bottlenecks during querying and retrieval can result in slow responses or timeouts, particularly in systems with poor optimization or inefficient storage methods.
- Optimize and refine query structures: Leverage approximate nearest neighbor (ANN) algorithms to reduce query latency and enhance response times.
- Cache frequently accessed data in memory: This minimizes the load on the database and speeds up query execution.

Avoid High Costs by Reducing Data Overlap
In multi-user environments, isolating user-specific data is crucial to ensure data security and prevent interference between users. However, managing this isolation across multiple nodes can lead to increased operational costs.
- Implement a multitenant architecture: This allows user data to be efficiently isolated while sharing a common infrastructure. This approach minimizes redundancy by storing user data logically separated within the same system, reducing both resource usage and costs.
- Build access control mechanisms: This ensures that user data remains isolated, even within a shared infrastructure.
Working with 10,000,000+ vectors and want to make sure you are set up for success? Talk to sales
About Qdrant:
Qdrant is the leading, high-performance, scalable, open-source vector database and search engine. It is able to handle billions of vectors, supports the matching of semantically complex objects, and is implemented in Rust for performance, memory safety, and scale. Recently, the company was recognized among the top 10 startups on Sifted's 2024 B2B SaaS Rising 100, which annually ranks Europe's most promising B2B SaaS startups valued under $1bn.
https://qdrant.tech/
https://discord.com/invite/qdrant
https://github.com/qdrant/qdrant