Performance Tuning Guide

This guide provides recommendations for optimizing the performance of your Qdrant multi-node cluster deployment. Fine-tuning your configuration can significantly improve search speed, throughput, and resource utilization.

Cluster Configuration

Optimizing Node Count

The number of nodes in your Qdrant cluster affects both performance and availability:

  * More nodes let shards and replicas spread out, increasing fault tolerance and parallel query throughput.
  * Fewer nodes are simpler to operate and reduce inter-node network overhead, but concentrate load and risk.
  * For high availability, run at least three nodes so the cluster can tolerate the loss of one.

To adjust the node count, modify the docker-compose.yml file by adding or removing node definitions.

Shard Configuration

Sharding is critical for distributing data across nodes:

# In settings.py
SHARD_NUMBER = 4  # Default value

Recommended shard numbers:

  * Start with the default of 4 shards for small to medium collections.
  * Use more shards for very large collections to allow finer-grained distribution, at the cost of some per-shard overhead.

Relation to nodes:

  * Choose a shard count that is a multiple of the node count so data spreads evenly across nodes.
  * The shard number is set at collection creation time, so plan for future cluster growth.

To change the shard count:

  1. Modify SHARD_NUMBER in src/qdrant_demo/config/settings.py
  2. Update the collection creation in QdrantClusterDemo.setup_collection()
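The steps above come together at collection creation time; a minimal sketch (the collection name and URL are illustrative, and assume a node reachable locally):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # assumed local node URL

# shard_number is fixed at creation time; replication_factor controls
# how many copies of each shard the cluster keeps.
client.create_collection(
    collection_name="demo_collection",
    vectors_config=models.VectorParams(
        size=768,
        distance=models.Distance.COSINE,
    ),
    shard_number=4,
    replication_factor=2,
)
```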

Vector Indexing Optimization

HNSW Parameters

Qdrant uses HNSW (Hierarchical Navigable Small World) for vector indexing. Key parameters:

client.create_collection(
    collection_name="optimized_collection",
    vectors_config=models.VectorParams(
        size=768,
        distance=models.Distance.COSINE,
        hnsw_config=models.HnswConfigDiff(
            m=16,               # Number of edges per node
            ef_construct=100,   # Size of the dynamic candidate list during construction
            full_scan_threshold=10000  # Threshold for using HNSW vs exhaustive search
        )
    ),
    # ... other parameters
)

Parameter recommendations:

  * m: Higher values (e.g. 32-64) improve recall at the cost of more memory and slower index construction; 16 is a reasonable default.
  * ef_construct: Higher values build a more accurate index but slow down indexing; 100-200 suits most workloads.
  * full_scan_threshold: Below this threshold Qdrant skips the HNSW index and scans vectors exhaustively, which is faster for small segments.

Optimization Thresholds

Configure when optimization processes occur:

client.create_collection(
    # ... other parameters
    optimizers_config=models.OptimizersConfigDiff(
        indexing_threshold=20000,    # When to start building index
        memmap_threshold=50000,      # When to switch to disk-based storage
        default_segment_number=5     # Target number of segments
    ),
)

Recommended values:

  * indexing_threshold: Lower values build the index sooner, improving search speed earlier at the cost of more frequent background work.
  * memmap_threshold: Lower values move segments to memory-mapped storage sooner, reducing RAM usage with a modest search-speed penalty.
  * default_segment_number: A value close to the number of CPU cores lets the optimizer parallelize work across segments.

Vector Storage Configuration

Memory vs. Disk

Configure on-disk storage for better scalability:

client.create_collection(
    collection_name="my_collection",
    vectors_config=models.VectorParams(
        size=768,
        distance=models.Distance.COSINE,
        on_disk=False        # Store vectors in memory (faster searches)
    ),
    on_disk_payload=True     # Store payload on disk
)

Guidelines:

  * Keep vectors in memory (on_disk=False) whenever RAM allows; this gives the fastest searches.
  * Enable on_disk=True for vectors once the collection no longer fits in memory; searches slow down but scale further.
  * on_disk_payload=True is usually a safe default, since payload is read less often than vectors.

Memory Allocation

For Docker-based deployments, allocate sufficient memory:

qdrant_node1:
  # ... other configuration
  deploy:
    resources:
      limits:
        memory: 4G

Memory sizing guidelines:

  * A common rule of thumb: memory ≈ number_of_vectors × vector_dimension × 4 bytes × 1.5, where the 1.5 factor covers index and metadata overhead.
  * Leave headroom for the operating system and for optimization processes, which temporarily duplicate segments.
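The common sizing rule of thumb (vector count × dimension × 4 bytes for float32, times roughly 1.5 for index overhead) can be turned into a quick estimate. A sketch; the 1.5 overhead factor is an approximation, not an exact figure:

```python
def estimate_memory_bytes(num_vectors: int, dim: int, overhead: float = 1.5) -> int:
    """Rough RAM estimate for in-memory float32 vectors plus index overhead."""
    bytes_per_vector = dim * 4  # float32 components
    return int(num_vectors * bytes_per_vector * overhead)

# Example: 1 million 768-dimensional vectors
estimate = estimate_memory_bytes(1_000_000, 768)
print(f"{estimate / 1024**3:.1f} GiB")  # roughly 4.3 GiB
```

Round the result up when setting the Docker memory limit, since optimizers need temporary extra space.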

Query Optimization

Filtering Strategies

Combine vector search with efficient filtering:

client.search(
    collection_name="my_collection",
    query_vector=query_vector,
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="category",
                match=models.MatchValue(value="electronics")
            )
        ]
    ),
    limit=10
)

Performance tips:

  * Create a payload index for every field you filter on; unindexed filter fields force Qdrant to scan payloads.
  * Keep filters selective: a highly selective condition on an indexed field narrows the search space dramatically.
  * Qdrant applies filters during index traversal rather than post-filtering results, so indexed filters stay fast even when few points match.
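Payload indexes are created per field. A minimal sketch, assuming the collection from the example above and a locally reachable node:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # assumed local node URL

# Index the "category" field so filtered searches can use it efficiently.
client.create_payload_index(
    collection_name="my_collection",
    field_name="category",
    field_schema=models.PayloadSchemaType.KEYWORD,
)
```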

Search Parameters

Fine-tune search accuracy vs. speed:

client.search(
    collection_name="my_collection",
    query_vector=query_vector,
    limit=10,
    params=models.SearchParams(
        hnsw_ef=128,             # Size of the dynamic candidate list
        exact=False              # Use approximate search
    )
)

Guidelines:

  * hnsw_ef: Higher values improve recall but slow searches; keep it at least as large as limit, and try values in the 64-512 range.
  * exact=True forces a full exhaustive search; use it to establish ground truth when measuring recall, not in production queries.

Batch Operations

For better throughput when inserting data:

# Instead of inserting one point at a time, accumulate points
# (generate_random_vector and generate_payload are placeholder helpers)
points = []
for i in range(1000):
    points.append(models.PointStruct(
        id=i,
        vector=generate_random_vector(768),
        payload=generate_payload()
    ))

# Insert in a single batch operation
client.upsert(
    collection_name="my_collection",
    points=points
)

Batch operation tips:

  * Upsert points in batches of a few hundred to a few thousand rather than one at a time; each request carries fixed network and consensus overhead.
  * Pass wait=False to upsert if you do not need the operation confirmed before continuing; Qdrant applies it asynchronously.
  * For very large imports, split the data into chunks and upload them from several parallel workers.
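Splitting a large upload into fixed-size batches can be sketched with a small helper (the batch size of 500 is illustrative):

```python
from typing import Iterator, List

def chunked(items: List, size: int) -> Iterator[List]:
    """Yield successive fixed-size chunks from a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Usage with Qdrant (assumes `client` and `points` from the example above):
# for batch in chunked(points, 500):
#     client.upsert(collection_name="my_collection", points=batch, wait=False)

batches = list(chunked(list(range(1200)), 500))
print([len(b) for b in batches])  # [500, 500, 200]
```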

Monitoring Performance

Use the included Prometheus and Grafana setup to monitor:

  1. Query latency: Track p95/p99 search times
  2. CPU/Memory usage: Watch for resource bottlenecks
  3. Disk I/O: Important when using on-disk storage

Key metrics to watch:

  * Search latency percentiles (p95/p99) rather than averages, which hide tail behavior.
  * Per-node memory (RSS) trends, to catch collections outgrowing their nodes.
  * Disk read/write throughput when vectors or payload live on disk.
  * Segment count and optimizer activity, since long-running optimizations compete with queries for CPU.

Production Deployment Recommendations

Hardware Recommendations

For production deployments, consider:

  * Enough RAM to hold in-memory vectors plus headroom (see Memory Allocation above).
  * Fast NVMe SSDs when using on-disk vectors or payload; disk latency feeds directly into query latency.
  * Multi-core CPUs, since Qdrant parallelizes search across segments.

Cloud Deployment

When deploying to cloud environments:

  * Prefer memory-optimized instance types for in-memory vector storage.
  * Place all nodes in the same region to keep inter-node latency low.
  * Use persistent volumes for Qdrant storage so data survives instance replacement.

Load Balancing

For high-availability setups:

  * Put a load balancer in front of the cluster; any Qdrant node can accept a request and route it to the shard that owns the data.
  * Set replication_factor to at least 2 so every shard survives the loss of a node.
  * Configure health checks against each node's HTTP endpoint so failed nodes drop out of rotation.
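Without an external load balancer, a simple client-side round-robin over node URLs can spread queries across the cluster. A sketch; the node URLs are placeholders for your own deployment:

```python
from itertools import cycle

# Placeholder node URLs; any Qdrant node can serve a search request
# and will route it to the shard that owns the data.
NODE_URLS = [
    "http://qdrant-node1:6333",
    "http://qdrant-node2:6333",
    "http://qdrant-node3:6333",
]

_node_iter = cycle(NODE_URLS)

def next_node() -> str:
    """Return the next node URL in round-robin order."""
    return next(_node_iter)

# Each call rotates through the nodes, wrapping back to the first:
print([next_node() for _ in range(4)])
```

A real deployment would pair this with per-node health checks so unhealthy nodes are skipped.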