---
title: Query Plan Caching
subtitle: Configure in-memory and distributed caching for query plans
description: Configure query plan caching to improve router performance by storing generated query plans in memory or Redis.
---
import RedisTls from '../../../shared/redis-tls.mdx';
Whenever your router receives an incoming GraphQL operation, it generates a query plan to determine which subgraphs it needs to query to resolve that operation.
By caching previously generated query plans, your router can skip generating them again if a client later sends the exact same operation—improving your router's responsiveness.
The router is a highly scalable and low-latency runtime. Even with all caching disabled, the time to process operations and generate query plans is minimal (nanoseconds to milliseconds) compared to the overall supergraph request, except in edge cases of extremely large operations and supergraphs.
Caching offers stability for those running a large graph: the overhead for a given operation stays consistent, rather than dramatically improving. To validate the performance wins of operation caching, use the router's traces and metrics to take measurements before and after enabling it.
In extreme edge cases, though, the cache can save 2–10x the time needed to create the query plan, which is still a small part of the overall request.
GraphOS Router enables query plan caching by default using an in-memory LRU cache. In your router's YAML config file, you can configure the maximum number of query plan entries in the cache:
```yaml
supergraph:
  query_planning:
    cache:
      in_memory:
        limit: 512 # This is the default value.
```

When loading a new schema, a query plan might change for some queries, so cached query plans cannot be reused.
To prevent increased latency upon query plan cache invalidation, the router precomputes query plans for the most used queries from the cache when a new schema is loaded.
Precomputed plans are cached before the router switches traffic over to the new schema.
You can also send the header `Apollo-Expose-Query-Plan: dry-run` to generate query plans at runtime, which you can use to warm up your cache instances with a custom-defined operation list.
By default, the router warms up the cache with 30% of the queries already in cache, but you can configure it as follows:
```yaml
supergraph:
  query_planning:
    # Pre-plan the 100 most used operations when the supergraph changes
    warmed_up_queries: 100
```

Additionally, the router can use the persisted query list to prewarm the cache. By default, the router prewarms the cache when loading a new schema but not on startup. You can configure the router to change these defaults.
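As an illustration, prewarming from the persisted query list can be tuned with the `persisted_queries` options. The `experimental_prewarm_query_plan_cache` settings shown here are experimental and may change between router versions, so treat this as a sketch:

```yaml
persisted_queries:
  enabled: true
  # Control when the persisted query list prewarms the query plan cache
  experimental_prewarm_query_plan_cache:
    on_startup: true         # also prewarm when the router starts (off by default)
    on_reconfiguration: true # prewarm when a new schema is loaded (the default)
```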
With router v1.61.0 and later (including v2.x), if you have enabled exposing query plans via `--dev` mode or `plugins.experimental.expose_query_plan: true`, you can pass the `Apollo-Expose-Query-Plan` header to return query plans in the GraphQL response extensions. You must set the header to one of the following values:
- `true`: Returns a human-readable string and a JSON blob of the query plan while still executing the query to fetch data.
- `dry-run`: Generates the query plan and aborts without executing the query.
After using `dry-run`, query plans are saved to your configured cache locations. Using real, mirrored, or similar-to-production operations is a great way to warm up the caches before transitioning traffic to new router instances.
To get more information on the planning and warm-up process, use the following metrics (where `<storage>` can be `redis` for the distributed cache or `memory`):

- `apollo.router.cache.hit.time.count{kind="query planner", storage="<storage>"}`
- `apollo.router.cache.miss.time.count{kind="query planner", storage="<storage>"}`
- `apollo.router.query_planning.plan.duration`: time spent planning queries, with the following attributes:
  - `planner`: the query planner implementation used (`rust` or `js`)
  - `outcome`: the outcome of the query planning process (`success`, `timeout`, `cancelled`, or `error`)
- `apollo.router.schema.load.duration`: time spent loading a schema
- `apollo.router.cache.hit.time{kind="query planner", storage="<storage>"}`: time to get a value from the cache
- `apollo.router.cache.miss.time{kind="query planner", storage="<storage>"}`: time spent on a cache miss
- `apollo.router.cache.size{kind="query planner", storage="memory"}`: current size of the cache (in-memory cache only)
- `apollo.router.cache.storage.estimated_size{kind="query planner", storage="memory"}`: estimated storage size of the cache (in-memory query planner cache only)
To define the right size of the in-memory cache, monitor `apollo.router.cache.size` and the cache hit rate. Then examine `apollo.router.schema.load.duration` and `apollo.router.query_planning.plan.duration` to decide how much time to spend warming up queries.
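For example, one way to inspect these metrics is to enable the router's Prometheus exporter. This is a minimal sketch; the listen address and path are illustrative:

```yaml
telemetry:
  exporters:
    metrics:
      prometheus:
        enabled: true
        listen: 127.0.0.1:9090 # example address
        path: /metrics
```

With this enabled, the cache and query-planning metrics above are exposed on the `/metrics` endpoint in Prometheus text format.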
<PlanRequired plans={["Free", "Developer", "Standard", "Enterprise"]}>
Rate limits apply on the Free plan. Performance pricing applies on Developer and Standard plans. Developer and Standard plans require Router v2.6.0 or later.
If you have multiple GraphOS Router instances, those instances can share a Redis-backed cache for their query plans. This means that if any of your router instances caches a particular value, all of your instances can look up that value to significantly improve responsiveness.
To use distributed caching:
- You must have a Redis cluster (or single instance) that your router instances can communicate with.
- You must have a GraphOS Enterprise plan and connect your router to GraphOS.
Whenever a router instance requires a query plan to resolve a client operation:
1. The router instance checks its own in-memory cache for the required value and uses it if found.
2. If not found, the router instance checks the distributed Redis cache for the required value and uses it if found. It also replicates the found value in its own in-memory cache.
3. If not found, the router instance generates the required query plan.
4. The router instance stores the obtained value in both the distributed cache and its in-memory cache.
The distributed caching configuration must contain one or more URLs using different schemes depending on the expected deployment:
- `redis` — TCP connection to a centralized server.
- `rediss` — TLS connection to a centralized server.
- `redis-cluster` — TCP connection to a cluster.
- `rediss-cluster` — TLS connection to a cluster.
- `redis-sentinel` — TCP connection to a centralized server behind a sentinel layer.
- `rediss-sentinel` — TLS connection to a centralized server behind a sentinel layer.
The URLs must have the following format:
```
redis|rediss :// [[username:]password@] host [:port][/database]
```

Example: `redis://localhost:6379`
```
redis|rediss[-cluster] :// [[username:]password@] host [:port][?[node=host1:port1][&node=host2:port2][&node=hostN:portN]]
```

or, if configured with multiple URLs:

```
[
  "redis|rediss[-cluster] :// [[username:]password@] host [:port]",
  "redis|rediss[-cluster] :// [[username:]password@] host1 [:port1]",
  "redis|rediss[-cluster] :// [[username:]password@] host2 [:port2]"
]
```
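Following the cluster URL format above, a configuration might look like this sketch (hostnames and ports are illustrative):

```yaml
supergraph:
  query_planning:
    cache:
      redis:
        urls:
          # Example cluster node hostnames; replace with your own
          - "redis-cluster://redis-node-1:6379?node=redis-node-2:6379&node=redis-node-3:6379"
```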
```
redis|rediss[-sentinel] :// [[username1:]password1@] host [:port][/database][?[node=host1:port1][&node=host2:port2][&node=hostN:portN]
[&sentinelServiceName=myservice][&sentinelUsername=username2][&sentinelPassword=password2]]
```

or, if configured with multiple URLs:

```
[
  "redis|rediss[-sentinel] :// [[username:]password@] host [:port][/database][?[&sentinelServiceName=myservice][&sentinelUsername=username2][&sentinelPassword=password2]]",
  "redis|rediss[-sentinel] :// [[username1:]password1@] host [:port][/database][?[&sentinelServiceName=myservice][&sentinelUsername=username2][&sentinelPassword=password2]]"
]
```
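Similarly, following the sentinel URL format, a sketch with illustrative hostnames and a hypothetical service name `mymaster`:

```yaml
supergraph:
  query_planning:
    cache:
      redis:
        urls:
          # Example sentinel endpoints; replace with your own
          - "redis-sentinel://sentinel-1:26379/0?sentinelServiceName=mymaster"
          - "redis-sentinel://sentinel-2:26379/0?sentinelServiceName=mymaster"
```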
In your router's YAML config file, you should specify your Redis URLs via environment variables and variable expansion. This prevents your Redis URLs from being committed to version control, which is especially dangerous if they include authentication information like a username and/or password.
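For example, using the router's variable expansion with an environment variable of your choosing (here, a hypothetical `QUERY_PLAN_REDIS_URL`):

```yaml
supergraph:
  query_planning:
    cache:
      redis:
        # Expanded from the environment at startup; keeps credentials out of version control
        urls: ["${env.QUERY_PLAN_REDIS_URL}"]
```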
Cached query plans are not evicted on schema refresh, which can quickly lead to distributed cache overflow when combined with cache warm-up and frequent schema publishes.
Test your cache configuration with expected queries and consider decreasing the TTL to prevent cache overflow.
To enable distributed caching of query plans, add the following to your router's YAML config file:
```yaml
supergraph:
  query_planning:
    cache:
      redis:
        urls: ["redis://..."]
```

The value of `urls` is a list of URLs for all Redis instances in your cluster.
All query plan cache entries are prefixed with `plan.` within the distributed cache.
```yaml
supergraph:
  query_planning:
    cache:
      redis:
        urls: ["redis://..."]
        username: admin # Optional, can be part of the urls directly. This field takes precedence over the username in the URL.
        password: admin/123 # Optional, can be part of the urls directly; mainly useful if your password contains characters like '/' that don't work in a URL. This field takes precedence over the password in the URL.
        timeout: 2s # Optional, by default: 500ms
        ttl: 24h # Optional
        namespace: "prefix" # Optional
        #tls:
        required_to_start: false # Optional, defaults to false
        reset_ttl: true # Optional, defaults to true
        pool_size: 4 # Optional, defaults to 1
```

Connecting and sending commands to Redis have a timeout of 500ms by default, which you can override.
The `ttl` option defines the default global expiration for Redis entries. For query plan caching, the default expiration is set to 30 days.
When enabling distributed caching, consider how frequently you publish new schemas and configure the TTL accordingly. When new schemas are published, the router pre-warms the in-memory and distributed caches but doesn't invalidate existing cached query plans in the distributed cache, creating an additive effect on cache utilization.
To prevent cache overflow, consider decreasing the TTL to 24 hours or twice the median publish interval (whichever is less), and monitor cache utilization in your environment, especially during schema publish events.
Also note that when cache warm-up is enabled, each router instance will warm the distributed cache with query plans from its own in-memory cache. In the worst case, a schema publish will increase the number of query plans in the distributed cache by the number of router instances multiplied by the number of warmed-up queries per instance, which may noticeably increase the total cache utilization.
Be sure to test your configuration with expected queries and during schema publish events to understand the impact of distributed caching on cache utilization.

When using the same Redis instance for multiple purposes, the `namespace` option defines a prefix for all the keys defined by the router.
Nest this `tls` configuration under `supergraph.query_planning.cache.redis`.
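As a sketch, assuming a CA certificate file at an illustrative path, a TLS setup for the query plan cache might look like:

```yaml
supergraph:
  query_planning:
    cache:
      redis:
        urls: ["rediss://..."] # TLS scheme
        tls:
          certificate_authorities: "${file./path/to/ca.crt}" # path is illustrative
```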
When active, the `required_to_start` option prevents the router from starting if it cannot connect to Redis. By default, the router still starts without a connection to Redis, in which case it uses only the in-memory cache for query planning.
When the `reset_ttl` option is active, accessing a cache entry in Redis resets its expiration.
The `pool_size` option defines the number of connections to Redis that the router opens. By default, the router opens a single connection. If there is a lot of traffic between the router and Redis, or some latency in those requests, increasing the pool size can reduce that latency.
If the router uses distributed caching for query plans, the warm-up phase also stores the new query plans in Redis. Since all router instances might have the same distribution of queries in their in-memory caches, the list of queries is shuffled before warm-up, so each router instance plans queries in a different order and shares its results through the cache.