
all: DNS request amplification with default settings #7812

@vitaminmoo

Description

Client

I believe this affects everything, but I am primarily testing with bigtable and have personally verified that spanner is also impacted.

Environment

Kubernetes, including GKE

Go Environment

$ go version
go version go1.20.3 linux/amd64

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN="/workspace/gobin"
GOCACHE="/builder/home/.cache/go-build"
GOENV="/builder/home/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY="github.com/lytics"
GONOSUMDB="github.com/lytics"
GOOS="linux"
GOPATH="/go"
GOPRIVATE="github.com/<redacted>"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.3"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="0"
GOMOD="/workspace/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2036783142=/tmp/go-build -gno-record-gcc-switches"

Code

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"cloud.google.com/go/bigtable"
)

func main() {
	ctx := context.Background()
	client, err := bigtable.NewClient(ctx, "project", "instance")
	if err != nil {
		log.Fatalf("creating client: %v", err)
	}
	tbl := client.Open("table")
	for {
		_, err := tbl.SampleRowKeys(ctx)
		if err != nil {
			log.Fatalf("sampling: %v", err)
		}
		fmt.Println("sampled")
		time.Sleep(10 * time.Second)
	}
}

Expected behavior

A couple of DNS requests are made to initiate connections.

Actual behavior

70 extra DNS requests are made, and they are refreshed regularly. This number grows if option.WithGRPCConnectionPool is set higher than its default of four.

Screenshots

CPU usage of the node-local-dns daemonset before and after the mitigations described below were put in place, just before Apr 20:


Additional context

In GKE with the default DNS policy (ndots 5, five search suffixes, four connections), initializing the code above causes 75 DNS requests, only five of which are useful, and most of those are duplicated.

Counts of lookups during initialization, from running tcpdump on port 53 in a sidecar container:

      4 SRV? _grpclb._tcp.bigtable.googleapis.com.svc.cluster.local.
      4 SRV? _grpclb._tcp.bigtable.googleapis.com.google.internal.
      4 SRV? _grpclb._tcp.bigtable.googleapis.com.default.svc.cluster.local.
      4 SRV? _grpclb._tcp.bigtable.googleapis.com.c.lyticsstaging.internal.
      4 SRV? _grpclb._tcp.bigtable.googleapis.com.cluster.local.
      4 SRV? _grpclb._tcp.bigtable.googleapis.com.
      4 A? bigtable.googleapis.com.svc.cluster.local.
      4 A? bigtable.googleapis.com.google.internal.
      4 A? bigtable.googleapis.com.default.svc.cluster.local.
      4 A? bigtable.googleapis.com.c.lyticsstaging.internal.
      4 A? bigtable.googleapis.com.cluster.local.
      4 A? bigtable.googleapis.com. // this is 4x larger than it should be
      4 AAAA? bigtable.googleapis.com.svc.cluster.local.
      4 AAAA? bigtable.googleapis.com.google.internal.
      4 AAAA? bigtable.googleapis.com.default.svc.cluster.local.
      4 AAAA? bigtable.googleapis.com.c.lyticsstaging.internal.
      4 AAAA? bigtable.googleapis.com.cluster.local.
      4 AAAA? bigtable.googleapis.com. // this is 4x larger than it should be
      1 PTR? 10.16.0.10.in-addr.arpa. // this is fine
      1 A? metadata.google.internal. // this is fine
      1 AAAA? metadata.google.internal. // this is fine
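The endpoint-lookup counts above multiply out exactly: six name variants (five search suffixes plus the bare name, since the un-dotted hostname has fewer dots than the ndots threshold), times three record types, times the connection pool size. A small sketch of that arithmetic:

```go
package main

import "fmt"

func main() {
	// Name variants the resolver tries: five search suffixes plus the
	// bare name, because "bigtable.googleapis.com" has fewer dots than
	// the ndots:5 threshold and no trailing dot.
	searchSuffixes := 5
	nameVariants := searchSuffixes + 1

	recordTypes := 3 // SRV (grpclb), A, and AAAA
	poolSize := 4    // option.WithGRPCConnectionPool default

	total := nameVariants * recordTypes * poolSize
	fmt.Println(total) // 72 endpoint lookups; the 3 one-off PTR and
	// metadata queries bring it to the 75 observed above
}
```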

  • All SRV lookups, at least for bigtable, result in NXDOMAIN 100% of the time (and as such are cached by node-local-dns for only two seconds)
  • All DNS requests to .local or .internal domains are pointless, caused by the default ndots:5 and search suffix list, and also produce NXDOMAINs
  • There are 4x more valid DNS requests than necessary due to the default pool size; increasing the pool size beyond four increases the DNS requests proportionally

NXDOMAIN responses are generally not cached for long, so for large pool sizes and large numbers of pods the traffic generated can be immense, causing timeouts talking to kube-dns or high CPU usage in node-local-dns.

These queries are also repeated once per hour by default, though as mentioned below we were seeing it happen /much/ more aggressively.

In our production environment, this was causing node-local-dns to have high CPU usage (20-50 cores just for node-local-dns) and to log millions of timeouts talking to kube-dns in the form of [ERROR] plugin/errors: 2 <hostname> A: read tcp <node IP:port>-><kubedns IP>:53: i/o timeout, as documented here, but happening with modern GKE.

The factors at play appear to be:

  • The local resolver cache isn't shared between connections, multiplying lookups by the size of your connection pool (4 by default, but ours in production are much larger than that)
  • grpclb features are not disabled, causing SRV lookups that are never going to return anything
  • The hostnames of some or all of the API endpoints lack the trailing periods that would make them fully qualified, so the search suffixes are appended under Kubernetes' default ndots config of 5
  • Additionally, it looks like Go's LookupSRV tries the search suffixes after receiving an NXDOMAIN even if ndots is set to something sane like 1, so mitigating this outside of code changes is difficult

As for mitigation, you can

  • Set ndots:1, which drops the non-SRV search-suffix lookups
  • Set the undocumented GOOGLE_CLOUD_DISABLE_DIRECT_PATH environment variable, which causes a different resolver to be used that doesn't attempt grpclb lookups - this seems like a hack, though
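For reference, the ndots mitigation can be applied per pod via the standard Kubernetes dnsConfig field (names and image here are illustrative, not from our manifests):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: bigtable-client        # illustrative name
spec:
  containers:
    - name: app
      image: example/app:latest  # illustrative image
  dnsConfig:
    options:
      - name: ndots
        value: "1"             # kubelet's default is 5
```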

With both of these set, DNS requests are still 2 * option.WithGRPCConnectionPool, since the A and AAAA records are each queried per connection.

With our large connection pools in our production environment, we also saw constant re-resolution of all these names on a rather rapid cadence, causing up to hundreds or thousands of DNS requests per second per pod, indefinitely.

Proposed changes to solve this properly:

  • Share a global resolver cache process-wide so DNS lookups happen once per process, not once per connection
  • Disable grpclb features for services that do not implement them
  • Change hostnames to fully qualified with a trailing dot to avoid search suffix amplification
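As a sketch of the third item, a small helper (hypothetical name ensureFQDN, not an existing library function) could qualify endpoint hostnames with a trailing dot before they reach the dialer, so the resolver skips search-suffix expansion entirely:

```go
package main

import (
	"fmt"
	"strings"
)

// ensureFQDN appends a trailing dot so the resolver treats the name as
// fully qualified and never applies the ndots/search-suffix expansion.
func ensureFQDN(host string) string {
	if strings.HasSuffix(host, ".") {
		return host
	}
	return host + "."
}

func main() {
	fmt.Println(ensureFQDN("bigtable.googleapis.com")) // bigtable.googleapis.com.
}
```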

Please let me know if a more thorough example of reproduction and measurement is desired.

Labels

priority: p3 (Desirable enhancement or fix. May not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
