---
title: Demand Control
subtitle: Protect your graph from high-cost GraphQL operations
description: Protect your graph from malicious or demanding clients with Apollo GraphOS Router's demand control features. Estimate, calculate, observe, and reject high-cost GraphQL operations.
minVersion: Router v1.48.0
---

<PlanRequired plans={["Free", "Developer", "Standard", "Enterprise"]}>

Rate limits apply on the Free plan. Developer and Standard plans require Router v2.6.0 or later.

What is demand control?

Demand control provides a way to secure your supergraph from overly complex operations, based on the IBM GraphQL Cost Directive specification.

Application clients can send overly costly operations that overload your supergraph infrastructure. These operations may be costly due to their complexity and/or their need for expensive resolvers. In either case, demand control can help you protect your infrastructure from these expensive operations. When your router receives a request, it calculates a cost for that operation. If the cost is greater than your configured maximum, the operation is rejected.

Calculating cost

When calculating the cost of an operation, the router sums the costs of the sub-requests that it plans to send to your subgraphs.

  • For each operation, the cost is the sum of its base cost plus the costs of its fields.
  • For each field, the cost is defined recursively as its own base cost plus the cost of its selections. In the IBM specification, this is called field cost.

The cost of each operation type:

| | Mutation | Query | Subscription |
| --- | --- | --- | --- |
| type | 10 | 0 | 0 |

The cost of each GraphQL element type, per operation type:

| | Mutation | Query | Subscription |
| --- | --- | --- | --- |
| Object | 1 | 1 | 1 |
| Interface | 1 | 1 | 1 |
| Union | 1 | 1 | 1 |
| Scalar | 0 | 0 | 0 |
| Enum | 0 | 0 | 0 |

Using these defaults, the following operation would have a cost of 4.

query BookQuery {
  book(id: 1) {
    title
    author {
      name
    }
    publisher {
      name
      address {
        zipCode
      }
    }
  }
}
1 Query (0) + 1 book object (1) + 1 author object (1) + 1 publisher object (1) + 1 address object (1) = 4 total cost
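
The arithmetic above can be reproduced with a small recursive function. This is a minimal sketch and not the router's implementation: the `Field` class, the two weight tables, and the function names are hypothetical, and the selection set is written out by hand instead of being parsed from GraphQL.

```python
from dataclasses import dataclass, field

# Default weights copied from the two tables above.
OPERATION_COST = {"mutation": 10, "query": 0, "subscription": 0}
KIND_COST = {"object": 1, "interface": 1, "union": 1, "scalar": 0, "enum": 0}


@dataclass
class Field:
    """A selected field, the kind of type it returns, and its nested selections."""
    name: str
    kind: str
    selections: list = field(default_factory=list)


def field_cost(f: Field) -> int:
    """Base cost of the field plus the cost of everything selected beneath it."""
    return KIND_COST[f.kind] + sum(field_cost(child) for child in f.selections)


def operation_cost(operation_type: str, selections: list) -> int:
    """Base cost of the operation type plus the cost of its top-level fields."""
    return OPERATION_COST[operation_type] + sum(field_cost(f) for f in selections)


# BookQuery, written as a tree of selected fields.
book_query = [
    Field("book", "object", [
        Field("title", "scalar"),
        Field("author", "object", [Field("name", "scalar")]),
        Field("publisher", "object", [
            Field("name", "scalar"),
            Field("address", "object", [Field("zipCode", "scalar")]),
        ]),
    ]),
]

print(operation_cost("query", book_query))  # 4
```

With the same selections, a mutation would score 14 in this sketch, because only the operation's base cost changes.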

Customizing cost

Since version 1.53.0, the router supports customizing the cost calculation with the @cost directive. The @cost directive has a single argument, weight, which overrides the default weights from the table above.

The Apollo Federation @cost directive differs from the IBM specification in that the weight argument is of type Int! instead of String!.

Annotating your schema with the @cost directive customizes how the router scores operations. For example, imagine that the Address resolver for an example query is particularly expensive. We can annotate the schema with the @cost directive with a larger weight:

type Query {
  book(id: ID): Book
}

type Book {
  title: String
  author: Author
  publisher: Publisher
}

type Author {
  name: String
}

type Publisher {
  name: String
  address: Address
}

type Address
  @cost(weight: 5) { #highlight-line
  zipCode: Int!
}

This increases the cost of BookQuery from 4 to 8.

1 Query (0) + 1 book object (1) + 1 author object (1) + 1 publisher object (1) + 1 address object (5) = 8 total cost
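
To see how a single @cost annotation changes the total, the sketch below looks up a weight for every field selected by BookQuery. It is illustrative only: `COST_OVERRIDES` is a hypothetical stand-in for the annotations in the schema, and the field list is written out by hand.

```python
# Default weight per kind of GraphQL element (from the defaults table).
KIND_COST = {"object": 1, "interface": 1, "union": 1, "scalar": 0, "enum": 0}

# Hypothetical stand-in for @cost annotations in the schema: type name -> weight.
COST_OVERRIDES = {"Address": 5}


def weight(type_name: str, kind: str) -> int:
    """Use the @cost weight when the type is annotated, else the default for its kind."""
    return COST_OVERRIDES.get(type_name, KIND_COST[kind])


# (returned type, kind) for every field selected by BookQuery.
book_query_fields = [
    ("Book", "object"),       # book
    ("String", "scalar"),     # title
    ("Author", "object"),     # author
    ("String", "scalar"),     # author.name
    ("Publisher", "object"),  # publisher
    ("String", "scalar"),     # publisher.name
    ("Address", "object"),    # address, annotated with @cost(weight: 5)
    ("Int", "scalar"),        # zipCode
]

# The query operation itself has a base cost of 0, so the total is the sum of field weights.
total = sum(weight(name, kind) for name, kind in book_query_fields)
print(total)  # 8
```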

Handling list fields

During the static analysis phase of demand control, the router doesn't know the size of the list fields in a given query. It must use estimates for list sizes. The closer the estimated list size is to the actual list size for a field, the closer the estimated cost will be to the actual cost.

The difference between estimated and actual operation cost calculations is due only to the difference between assumed and actual sizes of list fields.

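You can indicate expected list sizes to the router with the list_size settings in your router configuration (see Configuring demand control) or with the @listSize directive in your schema.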

The @listSize directive supports field-level granularity in setting list size. By using its assumedSize argument, you can set a statically defined list size for a field. If you are using paging parameters which control the size of the list, use the slicingArguments argument.

Continuing with our example above, let's add two queryable fields. First, we will add a field which returns the top five best selling books:

type Query {
  book(id: ID): Book
  bestsellers: [Book] @listSize(assumedSize: 5)
}

With this schema, the following query has a cost of 40:

query BestsellersQuery {
  bestsellers {
    title
    author {
      name
    }
    publisher {
      name
      address {
        zipCode
      }
    }
  }
}
1 Query (0) + 5 book objects (5 * (1 book object (1) + 1 author object (1) + 1 publisher object (1) + 1 address object (5))) = 40 total cost

The second field we will add is a paginated resolver. It returns the latest additions to the inventory:

type Query {
  book(id: ID): Book
  bestsellers: [Book] @listSize(assumedSize: 5)
  #highlight-start
  newestAdditions(after: ID, limit: Int!): [Book]
    @listSize(slicingArguments: ["limit"])
  #highlight-end
}

The number of books returned by this resolver is determined by the limit argument.

query NewestAdditions {
  newestAdditions(limit: 3) {
    title
    author {
      name
    }
    publisher {
      name
      address {
        zipCode
      }
    }
  }
}

The router will estimate the cost of this query as 24. If the limit was increased to 7, then the cost would increase to 56.

When requesting 3 books:
1 Query (0) + 3 book objects (3 * (1 book object (1) + 1 author object (1) + 1 publisher object (1) + 1 address object (5))) = 24 total cost

When requesting 7 books:
1 Query (0) + 7 book objects (7 * (1 book object (1) + 1 author object (1) + 1 publisher object (1) + 1 address object (5))) = 56 total cost
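
The three estimates above (40, 24, and 56) follow from multiplying the per-book cost by the expected list size. The sketch below is a simplified, hypothetical model of that choice, not the router's code: the function names are invented, and the fallback to a configured list size of 10 is an assumption based on the list_size setting described under Configuring demand control.

```python
# Cost of one Book selection in the queries above:
# book (1) + author (1) + publisher (1) + address (@cost weight 5).
PER_BOOK_COST = 1 + 1 + 1 + 5


def list_size(arguments, assumed_size=None, slicing_arguments=(), configured_list_size=10):
    """Choose a list size for a field (simplified, hypothetical model of @listSize)."""
    if assumed_size is not None:
        return assumed_size
    for name in slicing_arguments:
        if name in arguments:
            return arguments[name]
    # Assumption: with no directive information, use the configured list_size.
    return configured_list_size


def estimated_cost(size):
    """Query base cost (0) plus the per-book cost for every expected list item."""
    return 0 + size * PER_BOOK_COST


print(estimated_cost(list_size({}, assumed_size=5)))                          # bestsellers: 40
print(estimated_cost(list_size({"limit": 3}, slicing_arguments=["limit"])))  # 24
print(estimated_cost(list_size({"limit": 7}, slicing_arguments=["limit"])))  # 56
```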

List-type arguments in slicingArguments

When a slicingArguments entry references a list-type argument, the router uses the length of that list as the list size. This is useful when your resolver returns one result per input ID or tag, instead of using a numeric paging argument.

type Query {
  book(id: ID): Book
  bestsellers: [Book] @listSize(assumedSize: 5)
  newestAdditions(after: ID, limit: Int!): [Book]
    @listSize(slicingArguments: ["limit"])
  #highlight-start
  booksByIds(ids: [ID!]!): [Book]
    @listSize(slicingArguments: ["ids"])
  #highlight-end
}

With this schema, the router uses the length of the ids argument as the list size:

query BooksByIds {
  booksByIds(ids: ["abc", "def", "ghi"]) {
    title
    author {
      name
    }
  }
}
1 Query (0) + 3 book objects (3 * (1 book object (1) + 1 author object (1))) = 6 total cost

This also works when the list is passed using query variables.

query BooksByIds($ids: [ID!]!) {
  booksByIds(ids: $ids) {
    title
    author {
      name
    }
  }
}
{
  "ids": ["abc", "def", "ghi", "jkl", "mno"]
}
1 Query (0) + 5 book objects (5 * (1 book object (1) + 1 author object (1))) = 10 total cost
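
Both BooksByIds estimates can be reproduced by resolving the slicing argument to a value and taking its length. This is a sketch under stated assumptions: the `Variable` marker class and `slicing_size` function are hypothetical names used only to model a `$ids` reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Hypothetical marker for a `$name` variable reference in a query argument."""
    name: str


def slicing_size(argument_value, variables):
    """Return the list size implied by a slicing argument (literal or variable)."""
    if isinstance(argument_value, Variable):
        value = variables[argument_value.name]
    else:
        value = argument_value
    # List-type arguments contribute their length; numeric arguments are used directly.
    return len(value) if isinstance(value, list) else value


PER_BOOK_COST = 1 + 1  # book (1) + author (1), as selected by BooksByIds

inline = slicing_size(["abc", "def", "ghi"], variables={})
from_variables = slicing_size(Variable("ids"), {"ids": ["abc", "def", "ghi", "jkl", "mno"]})

print(inline * PER_BOOK_COST)          # 6
print(from_variables * PER_BOOK_COST)  # 10
```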

Nested paths in slicingArguments

When the list size comes from an argument nested in an input object, use a dot-separated path in slicingArguments. The router resolves the path against the query arguments and variable values to determine the size.

input PaginationInput {
  first: Int
  after: String
}

input SearchInput {
  pagination: PaginationInput
  query: String
}

type Query {
  search(input: SearchInput!): [Book]
    @listSize(slicingArguments: ["input.pagination.first"], requireOneSlicingArgument: true)
}

For a query such as search(input: { pagination: { first: 10 }, query: "fiction" }), the router uses 10 as the list size. You can use deeper paths (for example, input.level1.level2.count).
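
Resolving a dot-separated path amounts to walking nested argument values one key at a time. The helper below is a minimal sketch of that idea; `resolve_path` is a hypothetical name, and returning `None` for a missing step is a choice made for this example, not documented router behavior.

```python
def resolve_path(path, arguments):
    """Walk a dot-separated path through nested argument values.

    Returns None if any step of the path is missing.
    """
    current = arguments
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Arguments of: search(input: { pagination: { first: 10 }, query: "fiction" })
arguments = {"input": {"pagination": {"first": 10}, "query": "fiction"}}

print(resolve_path("input.pagination.first", arguments))  # 10
```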

The sizedFields argument

When a field returns an object containing one or more list fields, use the sizedFields argument to specify which of that object's list fields use the list size to calculate cost. Only selections matching those paths are scaled by the size.

type Query {
  container(first: Int): ResultContainer
    @listSize(slicingArguments: ["first"], 
      sizedFields: ["page"], 
      requireOneSlicingArgument: false)
}

type ResultContainer {
  page: [Book]     # in sizedFields; uses list size from slicingArguments "first"
  recent: [Book]   # not in sizedFields; uses default list size
  metadata: String # not in sizedFields; scalar, so cost is 0
}

For nested lists, use a path in sizedFields such as "results { page }" so that the list size applies to the inner list when the query selects deepContainer { results { page { ... } } }:

type Query {
  deepContainer(first: Int): DeepContainer
    @listSize(slicingArguments: ["first"], sizedFields: ["results { page }"], requireOneSlicingArgument: false)
}

type DeepContainer {
  results: ResultContainer
}

type ResultContainer {
  page: [Book]
}

Subgraph-level demand control

Subgraph-level demand control lets you enforce per-subgraph query cost limits in Apollo Router, in addition to the existing global cost limit for the whole supergraph. This helps you protect specific backend services that have different capacity or cost profiles from being overwhelmed by expensive operations.

When a subgraph‑specific cost limit is exceeded, the router:

  • Still runs the rest of the operation, including other subgraphs whose cost is within limits.
  • Skips calls to only the over‑budget subgraph, and composes the response as if that subgraph had returned null, instead of rejecting the entire query.

Per‑subgraph limits apply to the total work for that subgraph in a single operation. For each request, the router tracks the aggregate estimated cost per subgraph across the entire query plan. If the same subgraph is fetched multiple times (for example, through entity lookups, nested fetches, or conditional branches), those costs are summed together and the subgraph’s limit is enforced against that total.
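
The aggregation described above can be sketched as a sum per subgraph followed by a comparison against each limit. The function name, the fetch list, and the cost numbers are all made up for illustration; the router derives the real values from its query plan.

```python
from collections import defaultdict


def over_budget_subgraphs(fetches, limits):
    """Sum estimated cost per subgraph and return the subgraphs that exceed their limit.

    fetches: list of (subgraph name, estimated cost) pairs, one per planned fetch.
    limits:  mapping of subgraph name to its maximum accepted cost.
    """
    totals = defaultdict(float)
    for subgraph, cost in fetches:
        totals[subgraph] += cost
    return sorted(name for name, total in totals.items()
                  if name in limits and total > limits[name])


# Illustrative plan: each subgraph is fetched twice in the same operation.
fetches = [("products", 120), ("users", 150), ("products", 90), ("users", 80)]
limits = {"products": 800, "users": 200}

print(over_budget_subgraphs(fetches, limits))  # ['users']
```

Here neither users fetch exceeds 200 on its own, but their total of 230 does, so only the users subgraph is treated as over budget.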

Configuring demand control

To enable demand control in the router, configure the demand_control option in router.yaml:

demand_control:
  enabled: true
  mode: measure
  strategy:
    static_estimated:
      list_size: 10
      max: 1000
      subgraph:
        all:
          list_size: 10
          max: 800 # any subgraph can receive operations totaling at most 800
        subgraphs:
          products:
            list_size: 20 # overrides list_size = 10 from `subgraph.all`
          users:
            max: 200 # overrides max = 800 from `subgraph.all`

When demand_control is enabled, the router measures the cost of each operation and can enforce operation cost limits, based on additional configuration.
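
The comments in the configuration above suggest that each entry under subgraphs overrides only the values it sets and inherits the rest from subgraph.all. The sketch below models that reading with a dictionary merge. It is an assumption drawn from those comments, and `effective_settings` is a hypothetical helper, not a router API.

```python
# The `subgraph` section of the configuration above, as a Python dictionary.
subgraph_config = {
    "all": {"list_size": 10, "max": 800},
    "subgraphs": {
        "products": {"list_size": 20},
        "users": {"max": 200},
    },
}


def effective_settings(name, config):
    """Start from subgraph.all and apply any overrides listed for this subgraph."""
    return {**config.get("all", {}), **config.get("subgraphs", {}).get(name, {})}


print(effective_settings("products", subgraph_config))  # {'list_size': 20, 'max': 800}
print(effective_settings("users", subgraph_config))     # {'list_size': 10, 'max': 200}
print(effective_settings("reviews", subgraph_config))   # {'list_size': 10, 'max': 800}
```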

Customize demand_control with the following settings:

| Option | Valid values | Default value | Description |
| --- | --- | --- | --- |
| enabled | boolean | false | Set to true to measure operation costs or enforce operation cost limits. |
| mode | measure, enforce | -- | - measure collects information about the cost of operations.<br/>- enforce rejects operations exceeding configured cost limits. |
| strategy | static_estimated | -- | static_estimated estimates the cost of an operation before it is sent to a subgraph. |
| static_estimated.actual_cost_mode | by_subgraph, by_response_shape | by_subgraph | - by_subgraph calculates the cost of an operation as the sum of the cost of each subgraph response.<br/>- by_response_shape calculates the cost based on only the final shape of the response. |
| static_estimated.list_size | integer | -- | The assumed maximum size of a list for fields that return lists. |
| static_estimated.max | integer | -- | The maximum cost of an accepted operation. An operation with a higher cost than this is rejected. |
| static_estimated.subgraph | object (optional) | -- | Subgraph-level demand control (requires router >v2.12.0). |
| static_estimated.subgraph.all.list_size | integer (optional) | -- | The assumed maximum size of a list for fields that return lists. |
| static_estimated.subgraph.all.max | float (optional) | -- | The maximum cost accepted by a subgraph. |
| static_estimated.subgraph.subgraphs.*.list_size | integer (optional) | -- | The assumed maximum size of a list for fields that return lists, for this subgraph. |
| static_estimated.subgraph.subgraphs.*.max | float (optional) | -- | The maximum cost accepted by this subgraph. |

When enabling demand_control for the first time, set it to measure mode. This will allow you to observe the cost of your operations before setting your maximum cost.
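
The difference between the two modes can be sketched as a single comparison whose outcome is only acted on in enforce mode. This is a hedged illustration: `evaluate` is a hypothetical function, and it assumes that measure mode still records a COST_ESTIMATED_TOO_EXPENSIVE result for operations over the maximum while letting them through.

```python
def evaluate(estimated_cost, max_cost, mode):
    """Return (cost result, rejected?) for one operation.

    Assumption: the result code is recorded in both modes, but only
    enforce mode turns an over-limit result into a rejection.
    """
    too_expensive = estimated_cost > max_cost
    result = "COST_ESTIMATED_TOO_EXPENSIVE" if too_expensive else "COST_OK"
    rejected = too_expensive and mode == "enforce"
    return result, rejected


print(evaluate(1200, 1000, "measure"))  # ('COST_ESTIMATED_TOO_EXPENSIVE', False)
print(evaluate(1200, 1000, "enforce"))  # ('COST_ESTIMATED_TOO_EXPENSIVE', True)
print(evaluate(1000, 1000, "enforce"))  # ('COST_OK', False)
```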

Actual cost calculation mode

Use the actual_cost_mode option to choose how the actual cost of an operation is calculated.

  • by_subgraph (default for Router versions v2.12.0 and later) sums the cost of each subgraph response. This method reflects the total work done per operation and more closely mirrors the cost estimation strategy. Use this method unless you have pre-existing plugins or coprocessors that make decisions using the actual cost.
  • by_response_shape uses only the final shape of the response. It doesn't consider the intermediate work required to attain that shape, such as fields that were fetched to support federated lookups that were not included in the client response. This behavior was the only option in Router versions v2.11.0 and earlier.

Telemetry for demand control

New to router telemetry? See Router Telemetry.

You can define router telemetry to gather cost information and gain insights into the cost of operations sent to your router:

  • Generate histograms of operation costs by operation name, where the estimated cost is greater than an arbitrary value.
  • Attach cost information to spans.
  • Generate log messages whenever the cost delta between estimated and actual is greater than an arbitrary value.

Instruments

| Instrument | Description |
| --- | --- |
| cost.actual | The actual cost of an operation, measured after execution. |
| cost.estimated | The estimated cost of an operation before execution. |
| cost.delta | The difference between the actual and estimated cost. |

Attributes

Attributes for cost can be applied to instruments, spans, and events—anywhere supergraph attributes are used.

| Attribute | Value | Description |
| --- | --- | --- |
| cost.actual | boolean | The actual cost of an operation, measured after execution. |
| cost.estimated | boolean | The estimated cost of an operation before execution. |
| cost.delta | boolean | The difference between the actual and estimated cost. |
| cost.result | boolean | The return code of the cost calculation: COST_OK or an error code. |

Selectors

Selectors for cost can be applied to instruments, spans, and events—anywhere supergraph attributes are used.

| Key | Value | Default | Description |
| --- | --- | --- | --- |
| cost | estimated, actual, delta, result | | The estimated, actual, or delta cost values, or the result string. |

Examples

Example instrument

Enable a cost.estimated instrument with the cost.result attribute:

telemetry:
  instrumentation:
    instruments:
      supergraph:
        cost.estimated:
          attributes:
            cost.result: true
            graphql.operation.name: true

Example span

Enable the cost.estimated attribute on supergraph spans:

telemetry:
  instrumentation:
    spans:
      supergraph:
        attributes:
          cost.estimated: true

Example event

Log an error when cost.delta is greater than 1000:

telemetry:
  instrumentation:
    events:
      supergraph:
        COST_DELTA_TOO_HIGH:
          message: "cost delta high"
          on: event_response
          level: error
          condition:
            gt:
              - cost: delta
              - 1000
          attributes:
            graphql.operation.name: true
            cost.delta: true

Filtering by cost result

In router telemetry, you can customize instruments that filter their output based on cost results.

For example, you can record the estimated cost when cost.result is COST_ESTIMATED_TOO_EXPENSIVE:

telemetry:
  instrumentation:
    instruments:
      supergraph:
        # custom instrument
        cost.rejected.operations:
          type: histogram
          value:
            # Estimated cost is used to populate the histogram
            cost: estimated
          description: "Estimated cost per rejected operation."
          unit: delta
          condition:
            eq:
              # Only show rejected operations.
              - cost: result
              - "COST_ESTIMATED_TOO_EXPENSIVE"
          attributes:
            graphql.operation.name: true # Graphql operation name is added as an attribute

Configuring instrument output

When analyzing the costs of operations, if your histograms are not granular enough or don't cover a sufficient range, you can modify the views in your telemetry configuration:

telemetry:
  exporters:
    metrics:
      common:
        views:
          # Define a custom view because cost is different than the default latency-oriented view of OpenTelemetry
          - name: cost.*
            aggregation:
              histogram:
                buckets:
                  - 0
                  - 10
                  - 100
                  - 1000
                  - 10000
                  - 100000
                  - 1000000

Example Prometheus output using these buckets:

# TYPE cost_actual histogram
cost_actual_bucket{otel_scope_name="apollo/router",le="0"} 0
cost_actual_bucket{otel_scope_name="apollo/router",le="10"} 3
cost_actual_bucket{otel_scope_name="apollo/router",le="100"} 5
cost_actual_bucket{otel_scope_name="apollo/router",le="1000"} 11
cost_actual_bucket{otel_scope_name="apollo/router",le="10000"} 19
cost_actual_bucket{otel_scope_name="apollo/router",le="100000"} 20
cost_actual_bucket{otel_scope_name="apollo/router",le="1000000"} 20
cost_actual_bucket{otel_scope_name="apollo/router",le="+Inf"} 20
cost_actual_sum{otel_scope_name="apollo/router"} 1097
cost_actual_count{otel_scope_name="apollo/router"} 20
# TYPE cost_delta histogram
cost_delta_bucket{otel_scope_name="apollo/router",le="0"} 0
cost_delta_bucket{otel_scope_name="apollo/router",le="10"} 2
cost_delta_bucket{otel_scope_name="apollo/router",le="100"} 9
cost_delta_bucket{otel_scope_name="apollo/router",le="1000"} 7
cost_delta_bucket{otel_scope_name="apollo/router",le="10000"} 19
cost_delta_bucket{otel_scope_name="apollo/router",le="100000"} 20
cost_delta_bucket{otel_scope_name="apollo/router",le="1000000"} 20
cost_delta_bucket{otel_scope_name="apollo/router",le="+Inf"} 20
cost_delta_sum{otel_scope_name="apollo/router"} 21934
cost_delta_count{otel_scope_name="apollo/router"} 1
# TYPE cost_estimated histogram
cost_estimated_bucket{cost_result="COST_OK",otel_scope_name="apollo/router",le="0"} 0
cost_estimated_bucket{cost_result="COST_OK",otel_scope_name="apollo/router",le="10"} 5
cost_estimated_bucket{cost_result="COST_OK",otel_scope_name="apollo/router",le="100"} 5
cost_estimated_bucket{cost_result="COST_OK",otel_scope_name="apollo/router",le="1000"} 9
cost_estimated_bucket{cost_result="COST_OK",otel_scope_name="apollo/router",le="10000"} 11
cost_estimated_bucket{cost_result="COST_OK",otel_scope_name="apollo/router",le="100000"} 20
cost_estimated_bucket{cost_result="COST_OK",otel_scope_name="apollo/router",le="1000000"} 20
cost_estimated_bucket{cost_result="COST_OK",otel_scope_name="apollo/router",le="+Inf"} 20
cost_estimated_sum{cost_result="COST_OK",otel_scope_name="apollo/router"}
cost_estimated_count{cost_result="COST_OK",otel_scope_name="apollo/router"} 20
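
Prometheus histogram buckets are cumulative: each `le` bucket counts every observation at or below its bound. To see how many operations fall into each cost range, subtract the previous bucket's count. The sketch below does this for the cost_actual values shown above; the function name is hypothetical.

```python
# (upper bound, cumulative count) pairs copied from the cost_actual buckets above.
cost_actual_cumulative = [
    ("0", 0), ("10", 3), ("100", 5), ("1000", 11),
    ("10000", 19), ("100000", 20), ("1000000", 20), ("+Inf", 20),
]


def per_bucket_counts(cumulative):
    """Convert cumulative bucket counts into the count that falls in each range."""
    counts = []
    previous = 0
    for bound, count in cumulative:
        counts.append((bound, count - previous))
        previous = count
    return counts


for bound, count in per_bucket_counts(cost_actual_cumulative):
    print(f"le={bound}: {count}")
```

In this sample, most operations have an actual cost between 100 and 10000 (6 in the 1000 bucket and 8 in the 10000 bucket).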

An example chart of a histogram:

You can also chart the percentage of operations that would be allowed or rejected with the current configuration:

Accessing programmatically

You can programmatically access demand control cost data using Rhai scripts or Coprocessors. This can be useful for custom logging, decision making, or exposing cost data to clients.

Exposing cost in response headers

You can expose cost information in the HTTP response headers returned to clients, which can be useful for debugging. To do so, add a Rhai script on the supergraph_service hook:

fn supergraph_service(service) {
  service.map_response(|response| {
    if response.is_primary() {
      try {
        // Get cost estimation values from context
        let estimated_cost = response.context[Router.APOLLO_COST_ESTIMATED_KEY];
        let actual_cost = response.context[Router.APOLLO_COST_ACTUAL_KEY];
        let strategy = response.context[Router.APOLLO_COST_STRATEGY_KEY];
        let result = response.context[Router.APOLLO_COST_RESULT_KEY];

        // Add them as response headers
        if estimated_cost != () {
          response.headers["apollo-cost-estimate"] = estimated_cost.to_string();
        }

        if actual_cost != () {
          response.headers["apollo-cost-actual"] = actual_cost.to_string();
        }

        if strategy != () {
          response.headers["apollo-cost-strategy"] = strategy.to_string();
        }

        if result != () {
          response.headers["apollo-cost-result"] = result.to_string();
        }
      } catch(err) {
        log_debug(`Could not add cost headers: ${err}`);
      }
    }
  });
}