Replace shelling out with kubernetes proxy by siggy · Pull Request #249 · linkerd/linkerd2

siggy · 2018-02-01T03:31:12Z

The conduit dashboard command asychronously shells out and runs "kubectl
proxy".

This change replaces the shelling out with calls to kubernetes proxy
APIs. It also allows us to enable race detection in our go tests, as the
shell out code tests did not pass race detection.

Fixes #173

Signed-off-by: Andrew Seigner [email protected]

pcalcado

TIOLI but what about splitting these into two PRs, one with just the -race change? I think it makes it more cohesive

pcalcado · 2018-02-01T18:31:10Z

cli/cmd/dashboard.go

+		api, err := k8s.NewK8sAPI(shell.NewUnixShell().HomeDir(), kubeconfigPath)
 		if err != nil {
-			log.Fatalf("Failed to start kubectl: %v", err)
+			log.Fatalf("NewK8sAPI() failed with: %+v", err)


If this is meant to be seen by the user, I'd keep the original message and avoid leaking method names. Even if not part of the UX, I think that this can go stale quickly as we refactor rename/extract methods so I'd avoid it

pcalcado · 2018-02-01T18:31:37Z

cli/cmd/dashboard.go

-
-		if err != nil {
-			log.Fatalf("Failed to generate URL for dashboard: %v", err)
+			log.Fatalf("Failed to generate URL for dashboard: %+v", err)


Should we at some point have a blanket PR adding + to all %v?

Hmm, yeah, we use this pattern all over the place, but my preference is to use "%s" when printing errors. We don't gain anything with "%v" or "%+v", and those aren't type checked by go vet. I'd be 👍 👍 👍 for a branch that converts all printed errors to "%s". https://play.golang.org/p/KqJ4PezUhGs

Hmm, I generally lean towards blanket %+v everywhere, particularly if %s and %+v yield the same output for error types, it's one less thing for me to think about. But, I had not considered the go vet angle.

pcalcado · 2018-02-01T18:46:25Z

cli/cmd/dashboard.go

 		}

 		fmt.Printf("Opening [%s] in the default browser\n", url)
 		err = browser.OpenURL(url.String())


Is this opening the browser before starting the proxy? Won't it 404?

Yeah it's initially a 404, but Chrome and Safari do a quick reload, so it Just Works.

I think we have a few options:

leave it as-is

print the URL instead of opening a browser

add a delay before opening the browser

add some asynchronous signaling around the StartProxy() API to let us know when the proxy is ready

2 is the most platform-agnostic but less convenient, 3 is a bit hacky and not reliable, 4 is more complexity. i'm inclined to leave it as-is.

@siggy I'm ok with leaving it it it works but maybe we should do 2 anyway just in case the browser open fails?

that sounds good. as i write this i'm realizing we already do 2:

$ bin/go-run cli dashboard -p 0 github.com/runconduit/conduit/cli Opening [http://127.0.0.1:56026/api/v1/namespaces/conduit/services/web:http/proxy/] in the default browser Starting to serve on 127.0.0.1:56026

klingerf

Great, this seems like a good improvement. Just had some miscellaneous comments below.

klingerf · 2018-02-01T18:44:09Z

controller/Dockerfile

@@ -1,5 +1,5 @@
 ## compile controller services
-FROM gcr.io/runconduit/go-deps:dac3fae6 as golang
+FROM gcr.io/runconduit/go-deps:f39a1eed as golang


For consistency, would you want to add the -race flag in this file too, on line 8?

klingerf · 2018-02-01T19:13:57Z

pkg/k8s/kubectl.go

 	KubernetesPods                      = "pods"
-	kubectlDefaultProxyPort             = 8001
 	kubectlDefaultTimeout               = 10 * time.Second
 	portWhenProxyNotRunning             = -1


I think you can remove kubectlDefaultTimeout and portWhenProxyNotRunning too.

klingerf · 2018-02-01T19:19:13Z

cli/cmd/dashboard.go

-
-		if err != nil {
-			log.Fatalf("Failed to generate URL for dashboard: %v", err)
+			log.Fatalf("Failed to generate URL for dashboard: %+v", err)


Hmm, yeah, we use this pattern all over the place, but my preference is to use "%s" when printing errors. We don't gain anything with "%v" or "%+v", and those aren't type checked by go vet. I'd be 👍 👍 👍 for a branch that converts all printed errors to "%s". https://play.golang.org/p/KqJ4PezUhGs

klingerf · 2018-02-01T19:21:55Z

cli/Dockerfile-bin

-RUN CGO_ENABLED=0 GOOS=windows go build -a -installsuffix cgo -o /out/conduit-windows -ldflags "-X github.com/runconduit/conduit/pkg/version.Version=$CONDUIT_VERSION" ./cli
+RUN CGO_ENABLED=0 GOOS=darwin go build -race -a -installsuffix cgo -o /out/conduit-darwin -ldflags "-X github.com/runconduit/conduit/pkg/version.Version=$CONDUIT_VERSION" ./cli
+RUN CGO_ENABLED=0 GOOS=linux go build -race -a -installsuffix cgo -o /out/conduit-linux -ldflags "-X github.com/runconduit/conduit/pkg/version.Version=$CONDUIT_VERSION" ./cli
+RUN CGO_ENABLED=0 GOOS=windows go build -race -a -installsuffix cgo -o /out/conduit-windows -ldflags "-X github.com/runconduit/conduit/pkg/version.Version=$CONDUIT_VERSION" ./cli


Just thinking out loud here -- I wonder how much additional time the -race flag adds to the build. Building this docker image is already pretty slow, since we produce three static binaries, one for each platform. I wouldn't want to slow it down more if race detection is expensive. Also, since we're building the same target with all three build commands, we'd get the same results if only 1 of the 3 built with race detection.

per @pcalcado, i'm going to add race detection in a separate branch.

related to this specific Dockerfile, i'm only going to add -race back for ci and testing, per https://blog.golang.org/race-detector:

race-enabled binaries can use ten times the CPU and memory, so it is impractical to enable the race detector all the time

klingerf · 2018-02-01T19:27:17Z

pkg/k8s/api.go

+func (kubeapi *kubernetesApi) StartProxy(proxyPort int) error {
+	server, err := proxyCreate(kubeapi.Config)
+	if err != nil {
+		log.Fatalf("proxyCreate() failed with: %+v", err)


This relates to #17. I don't think we want to call log.Fatalf here. That will cause an error message to be logged and then the process will immediately exit with a non-zero exit code. Since the method signature of this method allows you to return an error, I think it would be better to return an error, and let the caller decide how they want to log / exit.

It's also worth noting that the previous implementation of this method used return fmt.Errorf instead of log.Fatalf.

klingerf · 2018-02-01T19:27:42Z

pkg/k8s/api.go

 	return &kubernetesApi{Config: config}, nil
 }
+
+func proxyCreate(config *rest.Config) (*proxy.Server, error) {


Same goes for this method -- it should return errors instead of aborting.

klingerf · 2018-02-01T19:27:47Z

pkg/k8s/api.go

+	return server, nil
+}
+
+func proxyListen(server *proxy.Server, proxyPort int) error {


Same goes for this method -- it should return errors instead of aborting.

adleong

🍴 🐚 ⭐

adleong · 2018-02-01T23:40:30Z

Gopkg.toml

  name = "k8s.io/apimachinery"
  version = "kubernetes-1.9.1"

+[[constraint]]


I'm sad that we have to pull this into all of our Go projects when it's only the CLI that needs it 😢

Yeah, I looked at implementing this strictly in terms of client-go, but I'd prefer to leverage https://github.com/kubernetes/kubernetes/blob/master/pkg/kubectl/proxy/proxy_server.go#L182.

The good news is this should mostly slow the build the first time around.

adleong · 2018-02-01T23:42:33Z

cli/cmd/dashboard.go

+		// blocks until killed
+		err = api.StartProxy(proxyPort)
+		if err != nil {
+			fmt.Fprintf(os.Stderr, "Error starting proxy via kubectl: %s", err)


I think kubectl can be taken out of this error message now

olix0r · 2018-02-01T23:52:58Z

cli/cmd/dashboard.go

-				os.Exit(1)
-			}
+		// blocks until killed
+		err = api.StartProxy(proxyPort)


is it possible to use a port of 0 so it binds an ephemeral port? If it doesn't return, then I'm guessing it's not practical to get the port that way...

If not, can we manually try to choose an ephemeral port until there are no port conflicts? (i.e. by handling the port-conflict error and retrying on a new port)

klingerf · 2018-02-02T01:00:40Z

controller/telemetry/server.go

 	pods, err := s.pods.GetPodsByIndex(ipStr)
 	if err != nil {
-		log.Debugf("Cannot get pod for IP %s: %s", ipStr, err)
+		log.Errorf("Cannot get pod for IP %s: %s", ipStr, err)


Can I bug you to revert the log level changes in this file? Most of these logs are not indicative of a larger problem. For instance, any external request will have a source IP that does not belong to a pod, so with this change we'd end up error logging on every request served.

The conduit dashboard command asychronously shells out and runs "kubectl proxy". This change replaces the shelling out with calls to kubernetes proxy APIs. It also allows us to enable race detection in our go tests, as the shell out code tests did not pass race detection. Fixes #173 Signed-off-by: Andrew Seigner <[email protected]>

pcalcado · 2018-02-02T17:01:23Z

Thanks for looking at the initial feedback ✨ I'm testing this on my machine and will report results soon

pcalcado

Testing this on my machine against some scenarios, I see that the following error scenarios:

No running k8s
Running k8s but conduit not installed
Conduit had an error on installing and its containers are error'd

End up with the browser opening as 404 and the CLI never terminates. Should we add a status check before? I am also happy having this as a separate issue to keep this PR moving.

siggy · 2018-02-02T18:31:27Z

@pcalcado i've filed #258 to track handling those error scenarios. i'm going to merge this as it's quite a big dependency change.

We previously did not have race detection enabled because our tests would fail. Following #249, this is no longer the case. Enable race detection in ci and build instructions. Fixes #173

We previously did not have race detection enabled because our tests would fail. Following #249, this is no longer the case. Enable race detection in ci and build instructions. Fixes #173 Signed-off-by: Andrew Seigner <[email protected]>

We previously did not have race detection enabled because our tests would fail. Following #249, this is no longer the case. Enable race detection in ci and build instructions. This change also fixes client_test.go attempting to allocate a 2GB buffer due to bad test input. Fixes #173 Signed-off-by: Andrew Seigner <[email protected]>

commit 5f89351081eff47a4ab8cd88e2e1a69a04f86541 Author: Oliver Gould <[email protected]> Date: Thu May 9 16:39:24 2019 -0700 Upgrade tower dependencies (#249) Tower must be updated in order to pickup tower-rs/tower#281 to address #2804. This adopts released crates where possible. commit 5d5eed6f8180b8db4090d995e71fdf7b0890c647 Author: Zahari Dichev <[email protected]> Date: Thu May 9 01:08:34 2019 +0300 Assert that TLS connection is refused if identity is not certified yet (#243) This branch adds tls capability to the support cient used in tests. In addition to that it adds two tests verifying that a TLS connection is refused in case the identity is not certified yet. This attempts to fix ##2598 and provide facility to write tests for #2676. As these are still some of my first lines of Rust code, it is advised to approach everything with a healthy dose of doubt :) Signed-off-by: Zahari Dichev <[email protected]> commit 1b9bb3745e44c959d1d41d14fed2b2822c82b5ba Author: Oliver Gould <[email protected]> Date: Wed May 8 14:28:37 2019 -0700 Introduce dispatch timeouts around buffers (#246) The proxy has several buffers, especially where it routes requests over shared stacks. If any of these routes is unavailable, then a request may remain buffered indefinitely. Previously, before service profiles were introduced, there was a default _response_ timeout that would cause these requests to fail; but since this response timeout is now optional (and is only applied once the request has been routed within a proxy), then we need a new mechanism to prevent requests from getting "stuck". This change does the following: - all proxied requests are annotated with a dispatch deadline; - each time a request is bufered, a timeout is registered. - if the timeout fires, the response exception fails, a 503 is returned, and the request is dropped. - if the request is processed into the inner stack, the timeout is ignored. The dispatch timeout limits the _time a request is buffered in a proxy_. This is distinct from the response timeout, as the server's response may naturally be delayed for any number of (non-proxy-related) reasons. The `insert_target` module has been generalized to `insert` to support setting the DispatchDeadline extension. The `buffer` module has been augmented with generic deadline-extraction logic. The `svc` module now exposes its own builder type that notably adds a `buffer_pending` helper. It's helpful to pull a builder type into the proxy to assist debugging type errors when modifying stacks. Fixes #2779 #2795 commit caf899557c3b041190f63544da865396231b3e30 Author: Oliver Gould <[email protected]> Date: Fri May 3 15:55:32 2019 -0700 router: Fail requests when the route is not ready (#241) In #2779, we plan to expire requests while they are buffered. However, the router _implicitly_ buffers requests in the executor when the inner service is not ready. This change alters the route to wrap all inner layers in a `LoadShed` so it can expect all services to `poll_ready()` immediately. commit 587bad101d9e5daeacb24b6733097c350a798356 Author: Eliza Weisman <[email protected]> Date: Fri May 3 14:18:08 2019 -0700 Remove Destination service query concurrency limit (#244) Currently, the proxy enforces a limit on the number of concurrent queries (i.e., the number of gRPC streams) to the Destination service. This limit was added based on information about the behaviour of the Destination service that is now known to be incorrect. This branch removes the limit on concurrent queries from the proxy's `control::destination` module. Although it should now be possible to simplify this code as a result of this change, I've refrained from doing any major refactoring in this branch --- my intention is to do this after the DNS fallback behaviour has also been removed, as together with this change, that will result in a _significant_ simplification of the module. Additionally, I've removed the tests for the concurrency limit, as they are no longer relevant. The `LINKERD2_PROXY_DESTINATION_CLIENT_CONCURRENCY_LIMIT` environment variable was also removed; this is not a breaking change as neither the CLI nor the proxy injector will currently set this env var. Signed-off-by: Eliza Weisman <[email protected]> commit cbdf45b44f7e4d852dc0497716062167ab9539fb Author: Sean McArthur <[email protected]> Date: Thu May 2 11:47:48 2019 -0700 Remove h2::Error requirement from metrics Signed-off-by: Sean McArthur <[email protected]> commit 3276949d4608dc4344b7bed3de2fc4b3080c2c6e Author: Sean McArthur <[email protected]> Date: Thu May 2 09:44:00 2019 -0700 delete unused proxy::http::metrics::class module Signed-off-by: Sean McArthur <[email protected]> Signed-off-by: Eliza Weisman <[email protected]>

siggy added area/cli area/build refactoring review/ready Issue has a reviewable PR labels Feb 1, 2018

siggy self-assigned this Feb 1, 2018

pcalcado reviewed Feb 1, 2018

View reviewed changes

klingerf reviewed Feb 1, 2018

View reviewed changes

siggy force-pushed the siggy/no-shells branch 3 times, most recently from bc6b93f to 7866010 Compare February 1, 2018 23:23

adleong approved these changes Feb 1, 2018

View reviewed changes

olix0r reviewed Feb 1, 2018

View reviewed changes

klingerf reviewed Feb 2, 2018

View reviewed changes

siggy force-pushed the siggy/no-shells branch from 7866010 to 05115b1 Compare February 2, 2018 02:18

pcalcado approved these changes Feb 2, 2018

View reviewed changes

siggy mentioned this pull request Feb 2, 2018

conduit dashboard command should fail if conduit not ready #258

Closed

siggy merged commit 9a40d98 into master Feb 2, 2018

siggy deleted the siggy/no-shells branch February 2, 2018 18:32

siggy removed the review/ready Issue has a reviewable PR label Feb 2, 2018

siggy added a commit that referenced this pull request Feb 2, 2018

Enable race detection in ci

8a441c1

We previously did not have race detection enabled because our tests would fail. Following #249, this is no longer the case. Enable race detection in ci and build instructions. Fixes #173

siggy mentioned this pull request Feb 2, 2018

Enable race detection in ci #259

Merged

This was referenced Feb 7, 2018

Stop using port 8001 as the default port for conduit dashboard #291

Closed

"conduit check" fails with kubectl client 1.9.x #286

Closed

klingerf mentioned this pull request Feb 20, 2018

conduit dashboard does not use the port provided to open the dashboard url when -p is used #359

Closed

Conversation

siggy commented Feb 1, 2018

Uh oh!

pcalcado left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

klingerf left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adleong left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pcalcado commented Feb 2, 2018

Uh oh!

pcalcado left a comment

Choose a reason for hiding this comment

Uh oh!

siggy commented Feb 2, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants