Conversation
pcalcado
left a comment
There was a problem hiding this comment.
TIOLI but what about splitting these into two PRs, one with just the -race change? I think it makes it more cohesive
cli/cmd/dashboard.go
Outdated
| api, err := k8s.NewK8sAPI(shell.NewUnixShell().HomeDir(), kubeconfigPath) | ||
| if err != nil { | ||
| log.Fatalf("Failed to start kubectl: %v", err) | ||
| log.Fatalf("NewK8sAPI() failed with: %+v", err) |
There was a problem hiding this comment.
If this is meant to be seen by the user, I'd keep the original message and avoid leaking method names. Even if not part of the UX, I think that this can go stale quickly as we refactor rename/extract methods so I'd avoid it
cli/cmd/dashboard.go
Outdated
|
|
||
| if err != nil { | ||
| log.Fatalf("Failed to generate URL for dashboard: %v", err) | ||
| log.Fatalf("Failed to generate URL for dashboard: %+v", err) |
There was a problem hiding this comment.
Should we at some point have a blanket PR adding + to all %v?
There was a problem hiding this comment.
Hmm, yeah, we use this pattern all over the place, but my preference is to use "%s" when printing errors. We don't gain anything with "%v" or "%+v", and those aren't type checked by go vet. I'd be 👍 👍 👍 for a branch that converts all printed errors to "%s". https://play.golang.org/p/KqJ4PezUhGs
There was a problem hiding this comment.
Hmm, I generally lean towards blanket %+v everywhere, particularly if %s and %+v yield the same output for error types, it's one less thing for me to think about. But, I had not considered the go vet angle.
| } | ||
|
|
||
| fmt.Printf("Opening [%s] in the default browser\n", url) | ||
| err = browser.OpenURL(url.String()) |
There was a problem hiding this comment.
Is this opening the browser before starting the proxy? Won't it 404?
There was a problem hiding this comment.
Yeah it's initially a 404, but Chrome and Safari do a quick reload, so it Just Works.
I think we have a few options:
- leave it as-is
- print the URL instead of opening a browser
- add a delay before opening the browser
- add some asynchronous signaling around the
StartProxy()API to let us know when the proxy is ready
2 is the most platform-agnostic but less convenient, 3 is a bit hacky and not reliable, 4 is more complexity. i'm inclined to leave it as-is.
There was a problem hiding this comment.
@siggy I'm ok with leaving it it it works but maybe we should do 2 anyway just in case the browser open fails?
There was a problem hiding this comment.
that sounds good. as i write this i'm realizing we already do 2:
$ bin/go-run cli dashboard -p 0
github.com/runconduit/conduit/cli
Opening [http://127.0.0.1:56026/api/v1/namespaces/conduit/services/web:http/proxy/] in the default browser
Starting to serve on 127.0.0.1:56026
klingerf
left a comment
There was a problem hiding this comment.
Great, this seems like a good improvement. Just had some miscellaneous comments below.
controller/Dockerfile
Outdated
| @@ -1,5 +1,5 @@ | |||
| ## compile controller services | |||
| FROM gcr.io/runconduit/go-deps:dac3fae6 as golang | |||
| FROM gcr.io/runconduit/go-deps:f39a1eed as golang | |||
There was a problem hiding this comment.
For consistency, would you want to add the -race flag in this file too, on line 8?
pkg/k8s/kubectl.go
Outdated
| KubernetesPods = "pods" | ||
| kubectlDefaultProxyPort = 8001 | ||
| kubectlDefaultTimeout = 10 * time.Second | ||
| portWhenProxyNotRunning = -1 |
There was a problem hiding this comment.
I think you can remove kubectlDefaultTimeout and portWhenProxyNotRunning too.
cli/cmd/dashboard.go
Outdated
|
|
||
| if err != nil { | ||
| log.Fatalf("Failed to generate URL for dashboard: %v", err) | ||
| log.Fatalf("Failed to generate URL for dashboard: %+v", err) |
There was a problem hiding this comment.
Hmm, yeah, we use this pattern all over the place, but my preference is to use "%s" when printing errors. We don't gain anything with "%v" or "%+v", and those aren't type checked by go vet. I'd be 👍 👍 👍 for a branch that converts all printed errors to "%s". https://play.golang.org/p/KqJ4PezUhGs
cli/Dockerfile-bin
Outdated
| RUN CGO_ENABLED=0 GOOS=windows go build -a -installsuffix cgo -o /out/conduit-windows -ldflags "-X github.com/runconduit/conduit/pkg/version.Version=$CONDUIT_VERSION" ./cli | ||
| RUN CGO_ENABLED=0 GOOS=darwin go build -race -a -installsuffix cgo -o /out/conduit-darwin -ldflags "-X github.com/runconduit/conduit/pkg/version.Version=$CONDUIT_VERSION" ./cli | ||
| RUN CGO_ENABLED=0 GOOS=linux go build -race -a -installsuffix cgo -o /out/conduit-linux -ldflags "-X github.com/runconduit/conduit/pkg/version.Version=$CONDUIT_VERSION" ./cli | ||
| RUN CGO_ENABLED=0 GOOS=windows go build -race -a -installsuffix cgo -o /out/conduit-windows -ldflags "-X github.com/runconduit/conduit/pkg/version.Version=$CONDUIT_VERSION" ./cli |
There was a problem hiding this comment.
Just thinking out loud here -- I wonder how much additional time the -race flag adds to the build. Building this docker image is already pretty slow, since we produce three static binaries, one for each platform. I wouldn't want to slow it down more if race detection is expensive. Also, since we're building the same target with all three build commands, we'd get the same results if only 1 of the 3 built with race detection.
There was a problem hiding this comment.
per @pcalcado, i'm going to add race detection in a separate branch.
related to this specific Dockerfile, i'm only going to add -race back for ci and testing, per https://blog.golang.org/race-detector:
race-enabled binaries can use ten times the CPU and memory, so it is impractical to enable the race detector all the time
pkg/k8s/api.go
Outdated
| func (kubeapi *kubernetesApi) StartProxy(proxyPort int) error { | ||
| server, err := proxyCreate(kubeapi.Config) | ||
| if err != nil { | ||
| log.Fatalf("proxyCreate() failed with: %+v", err) |
There was a problem hiding this comment.
This relates to #17. I don't think we want to call log.Fatalf here. That will cause an error message to be logged and then the process will immediately exit with a non-zero exit code. Since the method signature of this method allows you to return an error, I think it would be better to return an error, and let the caller decide how they want to log / exit.
It's also worth noting that the previous implementation of this method used return fmt.Errorf instead of log.Fatalf.
pkg/k8s/api.go
Outdated
| return &kubernetesApi{Config: config}, nil | ||
| } | ||
|
|
||
| func proxyCreate(config *rest.Config) (*proxy.Server, error) { |
There was a problem hiding this comment.
Same goes for this method -- it should return errors instead of aborting.
pkg/k8s/api.go
Outdated
| return server, nil | ||
| } | ||
|
|
||
| func proxyListen(server *proxy.Server, proxyPort int) error { |
There was a problem hiding this comment.
Same goes for this method -- it should return errors instead of aborting.
bc6b93f to
7866010
Compare
| name = "k8s.io/apimachinery" | ||
| version = "kubernetes-1.9.1" | ||
|
|
||
| [[constraint]] |
There was a problem hiding this comment.
I'm sad that we have to pull this into all of our Go projects when it's only the CLI that needs it 😢
There was a problem hiding this comment.
Yeah, I looked at implementing this strictly in terms of client-go, but I'd prefer to leverage https://github.com/kubernetes/kubernetes/blob/master/pkg/kubectl/proxy/proxy_server.go#L182.
The good news is this should mostly slow the build the first time around.
cli/cmd/dashboard.go
Outdated
| // blocks until killed | ||
| err = api.StartProxy(proxyPort) | ||
| if err != nil { | ||
| fmt.Fprintf(os.Stderr, "Error starting proxy via kubectl: %s", err) |
There was a problem hiding this comment.
I think kubectl can be taken out of this error message now
cli/cmd/dashboard.go
Outdated
| os.Exit(1) | ||
| } | ||
| // blocks until killed | ||
| err = api.StartProxy(proxyPort) |
There was a problem hiding this comment.
is it possible to use a port of 0 so it binds an ephemeral port? If it doesn't return, then I'm guessing it's not practical to get the port that way...
If not, can we manually try to choose an ephemeral port until there are no port conflicts? (i.e. by handling the port-conflict error and retrying on a new port)
controller/telemetry/server.go
Outdated
| pods, err := s.pods.GetPodsByIndex(ipStr) | ||
| if err != nil { | ||
| log.Debugf("Cannot get pod for IP %s: %s", ipStr, err) | ||
| log.Errorf("Cannot get pod for IP %s: %s", ipStr, err) |
There was a problem hiding this comment.
Can I bug you to revert the log level changes in this file? Most of these logs are not indicative of a larger problem. For instance, any external request will have a source IP that does not belong to a pod, so with this change we'd end up error logging on every request served.
The conduit dashboard command asychronously shells out and runs "kubectl proxy". This change replaces the shelling out with calls to kubernetes proxy APIs. It also allows us to enable race detection in our go tests, as the shell out code tests did not pass race detection. Fixes #173 Signed-off-by: Andrew Seigner <[email protected]>
7866010 to
05115b1
Compare
|
Thanks for looking at the initial feedback ✨ I'm testing this on my machine and will report results soon |
pcalcado
left a comment
There was a problem hiding this comment.
Testing this on my machine against some scenarios, I see that the following error scenarios:
- No running k8s
- Running k8s but conduit not installed
- Conduit had an error on installing and its containers are error'd
End up with the browser opening as 404 and the CLI never terminates. Should we add a status check before? I am also happy having this as a separate issue to keep this PR moving.
We previously did not have race detection enabled because our tests would fail. Following #249, this is no longer the case. Enable race detection in ci and build instructions. Fixes #173 Signed-off-by: Andrew Seigner <[email protected]>
We previously did not have race detection enabled because our tests would fail. Following #249, this is no longer the case. Enable race detection in ci and build instructions. This change also fixes client_test.go attempting to allocate a 2GB buffer due to bad test input. Fixes #173 Signed-off-by: Andrew Seigner <[email protected]>
We previously did not have race detection enabled because our tests would fail. Following #249, this is no longer the case. Enable race detection in ci and build instructions. This change also fixes client_test.go attempting to allocate a 2GB buffer due to bad test input. Fixes #173 Signed-off-by: Andrew Seigner <[email protected]>
commit 5f89351081eff47a4ab8cd88e2e1a69a04f86541 Author: Oliver Gould <[email protected]> Date: Thu May 9 16:39:24 2019 -0700 Upgrade tower dependencies (#249) Tower must be updated in order to pickup tower-rs/tower#281 to address #2804. This adopts released crates where possible. commit 5d5eed6f8180b8db4090d995e71fdf7b0890c647 Author: Zahari Dichev <[email protected]> Date: Thu May 9 01:08:34 2019 +0300 Assert that TLS connection is refused if identity is not certified yet (#243) This branch adds tls capability to the support cient used in tests. In addition to that it adds two tests verifying that a TLS connection is refused in case the identity is not certified yet. This attempts to fix ##2598 and provide facility to write tests for #2676. As these are still some of my first lines of Rust code, it is advised to approach everything with a healthy dose of doubt :) Signed-off-by: Zahari Dichev <[email protected]> commit 1b9bb3745e44c959d1d41d14fed2b2822c82b5ba Author: Oliver Gould <[email protected]> Date: Wed May 8 14:28:37 2019 -0700 Introduce dispatch timeouts around buffers (#246) The proxy has several buffers, especially where it routes requests over shared stacks. If any of these routes is unavailable, then a request may remain buffered indefinitely. Previously, before service profiles were introduced, there was a default _response_ timeout that would cause these requests to fail; but since this response timeout is now optional (and is only applied once the request has been routed within a proxy), then we need a new mechanism to prevent requests from getting "stuck". This change does the following: - all proxied requests are annotated with a dispatch deadline; - each time a request is bufered, a timeout is registered. - if the timeout fires, the response exception fails, a 503 is returned, and the request is dropped. - if the request is processed into the inner stack, the timeout is ignored. The dispatch timeout limits the _time a request is buffered in a proxy_. This is distinct from the response timeout, as the server's response may naturally be delayed for any number of (non-proxy-related) reasons. The `insert_target` module has been generalized to `insert` to support setting the DispatchDeadline extension. The `buffer` module has been augmented with generic deadline-extraction logic. The `svc` module now exposes its own builder type that notably adds a `buffer_pending` helper. It's helpful to pull a builder type into the proxy to assist debugging type errors when modifying stacks. Fixes #2779 #2795 commit caf899557c3b041190f63544da865396231b3e30 Author: Oliver Gould <[email protected]> Date: Fri May 3 15:55:32 2019 -0700 router: Fail requests when the route is not ready (#241) In #2779, we plan to expire requests while they are buffered. However, the router _implicitly_ buffers requests in the executor when the inner service is not ready. This change alters the route to wrap all inner layers in a `LoadShed` so it can expect all services to `poll_ready()` immediately. commit 587bad101d9e5daeacb24b6733097c350a798356 Author: Eliza Weisman <[email protected]> Date: Fri May 3 14:18:08 2019 -0700 Remove Destination service query concurrency limit (#244) Currently, the proxy enforces a limit on the number of concurrent queries (i.e., the number of gRPC streams) to the Destination service. This limit was added based on information about the behaviour of the Destination service that is now known to be incorrect. This branch removes the limit on concurrent queries from the proxy's `control::destination` module. Although it should now be possible to simplify this code as a result of this change, I've refrained from doing any major refactoring in this branch --- my intention is to do this after the DNS fallback behaviour has also been removed, as together with this change, that will result in a _significant_ simplification of the module. Additionally, I've removed the tests for the concurrency limit, as they are no longer relevant. The `LINKERD2_PROXY_DESTINATION_CLIENT_CONCURRENCY_LIMIT` environment variable was also removed; this is not a breaking change as neither the CLI nor the proxy injector will currently set this env var. Signed-off-by: Eliza Weisman <[email protected]> commit cbdf45b44f7e4d852dc0497716062167ab9539fb Author: Sean McArthur <[email protected]> Date: Thu May 2 11:47:48 2019 -0700 Remove h2::Error requirement from metrics Signed-off-by: Sean McArthur <[email protected]> commit 3276949d4608dc4344b7bed3de2fc4b3080c2c6e Author: Sean McArthur <[email protected]> Date: Thu May 2 09:44:00 2019 -0700 delete unused proxy::http::metrics::class module Signed-off-by: Sean McArthur <[email protected]> Signed-off-by: Eliza Weisman <[email protected]>
commit 5f89351081eff47a4ab8cd88e2e1a69a04f86541 Author: Oliver Gould <[email protected]> Date: Thu May 9 16:39:24 2019 -0700 Upgrade tower dependencies (#249) Tower must be updated in order to pickup tower-rs/tower#281 to address #2804. This adopts released crates where possible. commit 5d5eed6f8180b8db4090d995e71fdf7b0890c647 Author: Zahari Dichev <[email protected]> Date: Thu May 9 01:08:34 2019 +0300 Assert that TLS connection is refused if identity is not certified yet (#243) This branch adds tls capability to the support cient used in tests. In addition to that it adds two tests verifying that a TLS connection is refused in case the identity is not certified yet. This attempts to fix ##2598 and provide facility to write tests for #2676. As these are still some of my first lines of Rust code, it is advised to approach everything with a healthy dose of doubt :) Signed-off-by: Zahari Dichev <[email protected]> commit 1b9bb3745e44c959d1d41d14fed2b2822c82b5ba Author: Oliver Gould <[email protected]> Date: Wed May 8 14:28:37 2019 -0700 Introduce dispatch timeouts around buffers (#246) The proxy has several buffers, especially where it routes requests over shared stacks. If any of these routes is unavailable, then a request may remain buffered indefinitely. Previously, before service profiles were introduced, there was a default _response_ timeout that would cause these requests to fail; but since this response timeout is now optional (and is only applied once the request has been routed within a proxy), then we need a new mechanism to prevent requests from getting "stuck". This change does the following: - all proxied requests are annotated with a dispatch deadline; - each time a request is bufered, a timeout is registered. - if the timeout fires, the response exception fails, a 503 is returned, and the request is dropped. - if the request is processed into the inner stack, the timeout is ignored. The dispatch timeout limits the _time a request is buffered in a proxy_. This is distinct from the response timeout, as the server's response may naturally be delayed for any number of (non-proxy-related) reasons. The `insert_target` module has been generalized to `insert` to support setting the DispatchDeadline extension. The `buffer` module has been augmented with generic deadline-extraction logic. The `svc` module now exposes its own builder type that notably adds a `buffer_pending` helper. It's helpful to pull a builder type into the proxy to assist debugging type errors when modifying stacks. Fixes #2779 #2795 commit caf899557c3b041190f63544da865396231b3e30 Author: Oliver Gould <[email protected]> Date: Fri May 3 15:55:32 2019 -0700 router: Fail requests when the route is not ready (#241) In #2779, we plan to expire requests while they are buffered. However, the router _implicitly_ buffers requests in the executor when the inner service is not ready. This change alters the route to wrap all inner layers in a `LoadShed` so it can expect all services to `poll_ready()` immediately. commit 587bad101d9e5daeacb24b6733097c350a798356 Author: Eliza Weisman <[email protected]> Date: Fri May 3 14:18:08 2019 -0700 Remove Destination service query concurrency limit (#244) Currently, the proxy enforces a limit on the number of concurrent queries (i.e., the number of gRPC streams) to the Destination service. This limit was added based on information about the behaviour of the Destination service that is now known to be incorrect. This branch removes the limit on concurrent queries from the proxy's `control::destination` module. Although it should now be possible to simplify this code as a result of this change, I've refrained from doing any major refactoring in this branch --- my intention is to do this after the DNS fallback behaviour has also been removed, as together with this change, that will result in a _significant_ simplification of the module. Additionally, I've removed the tests for the concurrency limit, as they are no longer relevant. The `LINKERD2_PROXY_DESTINATION_CLIENT_CONCURRENCY_LIMIT` environment variable was also removed; this is not a breaking change as neither the CLI nor the proxy injector will currently set this env var. Signed-off-by: Eliza Weisman <[email protected]> commit cbdf45b44f7e4d852dc0497716062167ab9539fb Author: Sean McArthur <[email protected]> Date: Thu May 2 11:47:48 2019 -0700 Remove h2::Error requirement from metrics Signed-off-by: Sean McArthur <[email protected]> commit 3276949d4608dc4344b7bed3de2fc4b3080c2c6e Author: Sean McArthur <[email protected]> Date: Thu May 2 09:44:00 2019 -0700 delete unused proxy::http::metrics::class module Signed-off-by: Sean McArthur <[email protected]> Signed-off-by: Eliza Weisman <[email protected]>
The conduit dashboard command asychronously shells out and runs "kubectl
proxy".
This change replaces the shelling out with calls to kubernetes proxy
APIs. It also allows us to enable race detection in our go tests, as the
shell out code tests did not pass race detection.
Fixes #173
Signed-off-by: Andrew Seigner [email protected]