
all: DNS request amplification with default settings #7812

@vitaminmoo

Description

Client

I believe this affects everything, but I am primarily testing with bigtable and have personally verified that spanner is also impacted.

Environment

Kubernetes, including GKE

Go Environment

$ go version
go version go1.20.3 linux/amd64

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN="/workspace/gobin"
GOCACHE="/builder/home/.cache/go-build"
GOENV="/builder/home/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY="github.com/lytics"
GONOSUMDB="github.com/lytics"
GOOS="linux"
GOPATH="/go"
GOPRIVATE="github.com/<redacted>"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.3"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="0"
GOMOD="/workspace/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2036783142=/tmp/go-build -gno-record-gcc-switches"

Code

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"cloud.google.com/go/bigtable"
)

func main() {
	ctx := context.Background()
	client, err := bigtable.NewClient(ctx, "project", "instance")
	if err != nil {
		log.Fatalf("creating client: %v", err)
	}
	tbl := client.Open("table")
	for {
		_, err := tbl.SampleRowKeys(ctx)
		if err != nil {
			log.Fatalf("sampling: %v", err)
		}
		fmt.Println("sampled")
		time.Sleep(10 * time.Second)
	}
}

Expected behavior

A couple of DNS requests are made to initiate connections.

Actual behavior

70 extra DNS requests are made, and they are refreshed regularly. This number grows if option.WithGRPCConnectionPool is set higher than its default of four.

Screenshots

CPU usage of the node-local-dns daemonset before and after the mitigations described below were put in place, just before Apr 20:


Additional context

In GKE with the default DNS policy (ndots 5, five search suffixes, four connections), initializing the code above causes 75 DNS requests, only five of which are useful, and most of those are duplicated.

Counts of lookups during initialization, from running tcpdump on port 53 in a sidecar container:

      4 SRV? _grpclb._tcp.bigtable.googleapis.com.svc.cluster.local.
      4 SRV? _grpclb._tcp.bigtable.googleapis.com.google.internal.
      4 SRV? _grpclb._tcp.bigtable.googleapis.com.default.svc.cluster.local.
      4 SRV? _grpclb._tcp.bigtable.googleapis.com.c.lyticsstaging.internal.
      4 SRV? _grpclb._tcp.bigtable.googleapis.com.cluster.local.
      4 SRV? _grpclb._tcp.bigtable.googleapis.com.
      4 A? bigtable.googleapis.com.svc.cluster.local.
      4 A? bigtable.googleapis.com.google.internal.
      4 A? bigtable.googleapis.com.default.svc.cluster.local.
      4 A? bigtable.googleapis.com.c.lyticsstaging.internal.
      4 A? bigtable.googleapis.com.cluster.local.
      4 A? bigtable.googleapis.com. // this is 4x larger than it should be
      4 AAAA? bigtable.googleapis.com.svc.cluster.local.
      4 AAAA? bigtable.googleapis.com.google.internal.
      4 AAAA? bigtable.googleapis.com.default.svc.cluster.local.
      4 AAAA? bigtable.googleapis.com.c.lyticsstaging.internal.
      4 AAAA? bigtable.googleapis.com.cluster.local.
      4 AAAA? bigtable.googleapis.com. // this is 4x larger than it should be
      1 PTR? 10.16.0.10.in-addr.arpa. // this is fine
      1 A? metadata.google.internal. // this is fine
      1 AAAA? metadata.google.internal. // this is fine
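The endpoint-lookup counts above multiply out exactly: six name variants (five search suffixes plus the bare name, since the un-dotted hostname has fewer dots than the ndots threshold), times three record types, times the connection pool size. A small sketch of that arithmetic:

```go
package main

import "fmt"

func main() {
	// Name variants the resolver tries: five search suffixes plus the
	// bare name, because "bigtable.googleapis.com" has fewer dots than
	// the ndots:5 threshold and no trailing dot.
	searchSuffixes := 5
	nameVariants := searchSuffixes + 1

	recordTypes := 3 // SRV (grpclb), A, and AAAA
	poolSize := 4    // option.WithGRPCConnectionPool default

	total := nameVariants * recordTypes * poolSize
	fmt.Println(total) // 72 endpoint lookups; the 3 one-off PTR and
	// metadata queries bring it to the 75 observed above
}
```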

  • All SRV lookups, at least for bigtable, result in NXDOMAIN 100% of the time (and as such are cached by node-local-dns for only two seconds)
  • All DNS requests to .local or .internal domains are pointless, caused by the default ndots:5 and search suffix list, and also produce NXDOMAINs
  • There are 4x more valid DNS requests than necessary due to the default pool size; increasing the pool size beyond four increases the DNS requests proportionally

NXDOMAIN responses are generally not cached for long, so for large pool sizes and large numbers of pods the traffic generated can be immense, causing timeouts talking to kube-dns or high CPU usage in node-local-dns.

These queries are also repeated once per hour by default, though as mentioned below we were seeing it happen /much/ more aggressively.

In our production environment, this was causing node-local-dns to have high CPU usage (20-50 cores just for node-local-dns) and to log millions of timeouts talking to kube-dns in the form of [ERROR] plugin/errors: 2 <hostname> A: read tcp <node IP:port>-><kubedns IP>:53: i/o timeout, as documented here, but happening with modern GKE.

The factors at play appear to be:

  • The local resolver cache isn't shared between connections, multiplying lookups by the size of your connection pool (4 by default, but ours in production are much larger than that)
  • grpclb features are not disabled, causing SRV lookups that are never going to return anything
  • The hostnames of some or all of the API endpoints lack the trailing periods that would make them fully qualified, so the search suffixes are appended under Kubernetes' default ndots config of 5
  • Additionally, it looks like Go's LookupSRV tries the search suffixes after receiving an NXDOMAIN even if ndots is set to something sane like 1, so mitigating this outside of code changes is difficult

As for mitigation, you can

  • Set ndots:1, which drops the non-SRV search-suffix lookups
  • Set the undocumented GOOGLE_CLOUD_DISABLE_DIRECT_PATH environment variable, which causes a different resolver to be used that doesn't attempt grpclb lookups - this seems like a hack, though
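For reference, the ndots mitigation can be applied per pod via the standard Kubernetes dnsConfig field (names and image here are illustrative, not from our manifests):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: bigtable-client        # illustrative name
spec:
  containers:
    - name: app
      image: example/app:latest  # illustrative image
  dnsConfig:
    options:
      - name: ndots
        value: "1"             # kubelet's default is 5
```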

With both of these set, DNS requests are still 2 * option.WithGRPCConnectionPool, since the A and AAAA records are each queried per connection.

With our large connection pools in our production environment, we also saw constant re-resolution of all these names on a rather rapid cadence, causing up to hundreds or thousands of DNS requests per second per pod, indefinitely.

Proposed changes to solve this properly:

  • Share a global resolver cache process-wide so DNS lookups happen once per process, not once per connection
  • Disable grpclb features for services that do not implement them
  • Change hostnames to fully qualified with a trailing dot to avoid search suffix amplification
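As a sketch of the third item, a small helper (hypothetical name ensureFQDN, not an existing library function) could qualify endpoint hostnames with a trailing dot before they reach the dialer, so the resolver skips search-suffix expansion entirely:

```go
package main

import (
	"fmt"
	"strings"
)

// ensureFQDN appends a trailing dot so the resolver treats the name as
// fully qualified and never applies the ndots/search-suffix expansion.
func ensureFQDN(host string) string {
	if strings.HasSuffix(host, ".") {
		return host
	}
	return host + "."
}

func main() {
	fmt.Println(ensureFQDN("bigtable.googleapis.com")) // bigtable.googleapis.com.
}
```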

Please let me know if a more thorough example of reproduction and measurement is desired.

Labels

priority: p3 (Desirable enhancement or fix. May not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
