Skip to content

Replace shelling out with kubernetes proxy#249

Merged
siggy merged 1 commit intomasterfrom
siggy/no-shells
Feb 2, 2018
Merged

Replace shelling out with kubernetes proxy#249
siggy merged 1 commit intomasterfrom
siggy/no-shells

Conversation

@siggy
Copy link
Member

@siggy siggy commented Feb 1, 2018

The conduit dashboard command asychronously shells out and runs "kubectl
proxy".

This change replaces the shelling out with calls to kubernetes proxy
APIs. It also allows us to enable race detection in our go tests, as the
shell out code tests did not pass race detection.

Fixes #173

Signed-off-by: Andrew Seigner [email protected]

@siggy siggy self-assigned this Feb 1, 2018
Copy link
Contributor

@pcalcado pcalcado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIOLI but what about splitting these into two PRs, one with just the -race change? I think it makes it more cohesive

api, err := k8s.NewK8sAPI(shell.NewUnixShell().HomeDir(), kubeconfigPath)
if err != nil {
log.Fatalf("Failed to start kubectl: %v", err)
log.Fatalf("NewK8sAPI() failed with: %+v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is meant to be seen by the user, I'd keep the original message and avoid leaking method names. Even if not part of the UX, I think that this can go stale quickly as we refactor rename/extract methods so I'd avoid it


if err != nil {
log.Fatalf("Failed to generate URL for dashboard: %v", err)
log.Fatalf("Failed to generate URL for dashboard: %+v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we at some point have a blanket PR adding + to all %v?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, we use this pattern all over the place, but my preference is to use "%s" when printing errors. We don't gain anything with "%v" or "%+v", and those aren't type checked by go vet. I'd be 👍 👍 👍 for a branch that converts all printed errors to "%s". https://play.golang.org/p/KqJ4PezUhGs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I generally lean towards blanket %+v everywhere, particularly if %s and %+v yield the same output for error types, it's one less thing for me to think about. But, I had not considered the go vet angle.

}

fmt.Printf("Opening [%s] in the default browser\n", url)
err = browser.OpenURL(url.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this opening the browser before starting the proxy? Won't it 404?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's initially a 404, but Chrome and Safari do a quick reload, so it Just Works.

I think we have a few options:

  1. leave it as-is
  2. print the URL instead of opening a browser
  3. add a delay before opening the browser
  4. add some asynchronous signaling around the StartProxy() API to let us know when the proxy is ready

2 is the most platform-agnostic but less convenient, 3 is a bit hacky and not reliable, 4 is more complexity. i'm inclined to leave it as-is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siggy I'm ok with leaving it it it works but maybe we should do 2 anyway just in case the browser open fails?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds good. as i write this i'm realizing we already do 2:

$ bin/go-run cli dashboard -p 0
github.com/runconduit/conduit/cli
Opening [http://127.0.0.1:56026/api/v1/namespaces/conduit/services/web:http/proxy/] in the default browser
Starting to serve on 127.0.0.1:56026

Copy link
Contributor

@klingerf klingerf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, this seems like a good improvement. Just had some miscellaneous comments below.

@@ -1,5 +1,5 @@
## compile controller services
FROM gcr.io/runconduit/go-deps:dac3fae6 as golang
FROM gcr.io/runconduit/go-deps:f39a1eed as golang
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, would you want to add the -race flag in this file too, on line 8?

KubernetesPods = "pods"
kubectlDefaultProxyPort = 8001
kubectlDefaultTimeout = 10 * time.Second
portWhenProxyNotRunning = -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove kubectlDefaultTimeout and portWhenProxyNotRunning too.


if err != nil {
log.Fatalf("Failed to generate URL for dashboard: %v", err)
log.Fatalf("Failed to generate URL for dashboard: %+v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, we use this pattern all over the place, but my preference is to use "%s" when printing errors. We don't gain anything with "%v" or "%+v", and those aren't type checked by go vet. I'd be 👍 👍 👍 for a branch that converts all printed errors to "%s". https://play.golang.org/p/KqJ4PezUhGs

RUN CGO_ENABLED=0 GOOS=windows go build -a -installsuffix cgo -o /out/conduit-windows -ldflags "-X github.com/runconduit/conduit/pkg/version.Version=$CONDUIT_VERSION" ./cli
RUN CGO_ENABLED=0 GOOS=darwin go build -race -a -installsuffix cgo -o /out/conduit-darwin -ldflags "-X github.com/runconduit/conduit/pkg/version.Version=$CONDUIT_VERSION" ./cli
RUN CGO_ENABLED=0 GOOS=linux go build -race -a -installsuffix cgo -o /out/conduit-linux -ldflags "-X github.com/runconduit/conduit/pkg/version.Version=$CONDUIT_VERSION" ./cli
RUN CGO_ENABLED=0 GOOS=windows go build -race -a -installsuffix cgo -o /out/conduit-windows -ldflags "-X github.com/runconduit/conduit/pkg/version.Version=$CONDUIT_VERSION" ./cli
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud here -- I wonder how much additional time the -race flag adds to the build. Building this docker image is already pretty slow, since we produce three static binaries, one for each platform. I wouldn't want to slow it down more if race detection is expensive. Also, since we're building the same target with all three build commands, we'd get the same results if only 1 of the 3 built with race detection.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per @pcalcado, i'm going to add race detection in a separate branch.

related to this specific Dockerfile, i'm only going to add -race back for ci and testing, per https://blog.golang.org/race-detector:

race-enabled binaries can use ten times the CPU and memory, so it is impractical to enable the race detector all the time

pkg/k8s/api.go Outdated
func (kubeapi *kubernetesApi) StartProxy(proxyPort int) error {
server, err := proxyCreate(kubeapi.Config)
if err != nil {
log.Fatalf("proxyCreate() failed with: %+v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relates to #17. I don't think we want to call log.Fatalf here. That will cause an error message to be logged and then the process will immediately exit with a non-zero exit code. Since the method signature of this method allows you to return an error, I think it would be better to return an error, and let the caller decide how they want to log / exit.

It's also worth noting that the previous implementation of this method used return fmt.Errorf instead of log.Fatalf.

pkg/k8s/api.go Outdated
return &kubernetesApi{Config: config}, nil
}

func proxyCreate(config *rest.Config) (*proxy.Server, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same goes for this method -- it should return errors instead of aborting.

pkg/k8s/api.go Outdated
return server, nil
}

func proxyListen(server *proxy.Server, proxyPort int) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same goes for this method -- it should return errors instead of aborting.

@siggy siggy force-pushed the siggy/no-shells branch 3 times, most recently from bc6b93f to 7866010 Compare February 1, 2018 23:23
Copy link
Member

@adleong adleong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍴 🐚 ⭐

name = "k8s.io/apimachinery"
version = "kubernetes-1.9.1"

[[constraint]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sad that we have to pull this into all of our Go projects when it's only the CLI that needs it 😢

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I looked at implementing this strictly in terms of client-go, but I'd prefer to leverage https://github.com/kubernetes/kubernetes/blob/master/pkg/kubectl/proxy/proxy_server.go#L182.

The good news is this should mostly slow the build the first time around.

// blocks until killed
err = api.StartProxy(proxyPort)
if err != nil {
fmt.Fprintf(os.Stderr, "Error starting proxy via kubectl: %s", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think kubectl can be taken out of this error message now

os.Exit(1)
}
// blocks until killed
err = api.StartProxy(proxyPort)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to use a port of 0 so it binds an ephemeral port? If it doesn't return, then I'm guessing it's not practical to get the port that way...

If not, can we manually try to choose an ephemeral port until there are no port conflicts? (i.e. by handling the port-conflict error and retrying on a new port)

pods, err := s.pods.GetPodsByIndex(ipStr)
if err != nil {
log.Debugf("Cannot get pod for IP %s: %s", ipStr, err)
log.Errorf("Cannot get pod for IP %s: %s", ipStr, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I bug you to revert the log level changes in this file? Most of these logs are not indicative of a larger problem. For instance, any external request will have a source IP that does not belong to a pod, so with this change we'd end up error logging on every request served.

The conduit dashboard command asychronously shells out and runs "kubectl
proxy".

This change replaces the shelling out with calls to kubernetes proxy
APIs. It also allows us to enable race detection in our go tests, as the
shell out code tests did not pass race detection.

Fixes #173

Signed-off-by: Andrew Seigner <[email protected]>
@pcalcado
Copy link
Contributor

pcalcado commented Feb 2, 2018

Thanks for looking at the initial feedback ✨ I'm testing this on my machine and will report results soon

Copy link
Contributor

@pcalcado pcalcado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing this on my machine against some scenarios, I see that the following error scenarios:

  • No running k8s
  • Running k8s but conduit not installed
  • Conduit had an error on installing and its containers are error'd

End up with the browser opening as 404 and the CLI never terminates. Should we add a status check before? I am also happy having this as a separate issue to keep this PR moving.

@siggy
Copy link
Member Author

siggy commented Feb 2, 2018

@pcalcado i've filed #258 to track handling those error scenarios. i'm going to merge this as it's quite a big dependency change.

@siggy siggy merged commit 9a40d98 into master Feb 2, 2018
@siggy siggy deleted the siggy/no-shells branch February 2, 2018 18:32
@siggy siggy removed the review/ready Issue has a reviewable PR label Feb 2, 2018
siggy added a commit that referenced this pull request Feb 2, 2018
We previously did not have race detection enabled because our tests
would fail. Following #249, this is no longer the case.

Enable race detection in ci and build instructions.

Fixes #173
@siggy siggy mentioned this pull request Feb 2, 2018
siggy added a commit that referenced this pull request Feb 2, 2018
We previously did not have race detection enabled because our tests
would fail. Following #249, this is no longer the case.

Enable race detection in ci and build instructions.

Fixes #173

Signed-off-by: Andrew Seigner <[email protected]>
siggy added a commit that referenced this pull request Feb 2, 2018
We previously did not have race detection enabled because our tests
would fail. Following #249, this is no longer the case.

Enable race detection in ci and build instructions. This change also
fixes client_test.go attempting to allocate a 2GB buffer due to bad test
input.

Fixes #173

Signed-off-by: Andrew Seigner <[email protected]>
siggy added a commit that referenced this pull request Feb 2, 2018
We previously did not have race detection enabled because our tests
would fail. Following #249, this is no longer the case.

Enable race detection in ci and build instructions. This change also
fixes client_test.go attempting to allocate a 2GB buffer due to bad test
input.

Fixes #173

Signed-off-by: Andrew Seigner <[email protected]>
hawkw added a commit that referenced this pull request May 10, 2019
commit 5f89351081eff47a4ab8cd88e2e1a69a04f86541
Author: Oliver Gould <[email protected]>
Date:   Thu May 9 16:39:24 2019 -0700

    Upgrade tower dependencies (#249)

    Tower must be updated in order to pickup tower-rs/tower#281
    to address #2804.

    This adopts released crates where possible.

commit 5d5eed6f8180b8db4090d995e71fdf7b0890c647
Author: Zahari Dichev <[email protected]>
Date:   Thu May 9 01:08:34 2019 +0300

    Assert that TLS connection is refused if identity is not certified yet (#243)

    This branch adds tls capability to the support cient used in tests. In addition to that it adds two tests verifying that a TLS connection is refused in case the identity is not certified yet. This attempts to fix ##2598 and provide facility to write tests for #2676.

    As these are still some of my first lines of Rust code, it is advised to approach everything with a healthy dose of doubt :)

    Signed-off-by: Zahari Dichev <[email protected]>

commit 1b9bb3745e44c959d1d41d14fed2b2822c82b5ba
Author: Oliver Gould <[email protected]>
Date:   Wed May 8 14:28:37 2019 -0700

    Introduce dispatch timeouts around buffers (#246)

    The proxy has several buffers, especially where it routes requests over
    shared stacks. If any of these routes is unavailable, then a request may
    remain buffered indefinitely. Previously, before service profiles were
    introduced, there was a default _response_ timeout that would cause
    these requests to fail; but since this response timeout is now optional
    (and is only applied once the request has been routed within a proxy),
    then we need a new mechanism to prevent requests from getting "stuck".

    This change does the following:
    - all proxied requests are annotated with a dispatch deadline;
    - each time a request is bufered, a timeout is registered.
    - if the timeout fires, the response exception fails, a 503 is returned,
      and the request is dropped.
    - if the request is processed into the inner stack, the timeout is
      ignored.

    The dispatch timeout limits the _time a request is buffered in a proxy_.
    This is distinct from the response timeout, as the server's response may
    naturally be delayed for any number of (non-proxy-related) reasons.

    The `insert_target` module has been generalized to `insert` to support
    setting the DispatchDeadline extension.

    The `buffer` module has been augmented with generic deadline-extraction
    logic.

    The `svc` module now exposes its own builder type that notably adds
    a `buffer_pending` helper. It's helpful to pull a builder type into the
    proxy to assist debugging type errors when modifying stacks.

    Fixes #2779 #2795

commit caf899557c3b041190f63544da865396231b3e30
Author: Oliver Gould <[email protected]>
Date:   Fri May 3 15:55:32 2019 -0700

    router: Fail requests when the route is not ready (#241)

    In #2779, we plan to expire requests while they are
    buffered. However, the router _implicitly_ buffers requests in the
    executor when the inner service is not ready.

    This change alters the route to wrap all inner layers in a `LoadShed`
    so it can expect all services to `poll_ready()` immediately.

commit 587bad101d9e5daeacb24b6733097c350a798356
Author: Eliza Weisman <[email protected]>
Date:   Fri May 3 14:18:08 2019 -0700

    Remove Destination service query concurrency limit (#244)

    Currently, the proxy enforces a limit on the number of concurrent
    queries (i.e., the number of gRPC streams) to the Destination service.
    This limit was added based on information about the behaviour of the
    Destination service that is now known to be incorrect.

    This branch removes the limit on concurrent queries from the proxy's
    `control::destination` module. Although it should now be possible to
    simplify this code as a result of this change, I've refrained from doing
    any major refactoring in this branch --- my intention is to do this
    after the DNS fallback behaviour has also been removed, as together with
    this change, that will result in a _significant_ simplification of the
    module. Additionally, I've removed the tests for the concurrency limit,
    as they are no longer relevant.

    The `LINKERD2_PROXY_DESTINATION_CLIENT_CONCURRENCY_LIMIT`
    environment variable was also removed; this is not a breaking change as
    neither the CLI nor the proxy injector will currently set this env var.

    Signed-off-by: Eliza Weisman <[email protected]>

commit cbdf45b44f7e4d852dc0497716062167ab9539fb
Author: Sean McArthur <[email protected]>
Date:   Thu May 2 11:47:48 2019 -0700

    Remove h2::Error requirement from metrics

    Signed-off-by: Sean McArthur <[email protected]>

commit 3276949d4608dc4344b7bed3de2fc4b3080c2c6e
Author: Sean McArthur <[email protected]>
Date:   Thu May 2 09:44:00 2019 -0700

    delete unused proxy::http::metrics::class module

    Signed-off-by: Sean McArthur <[email protected]>

Signed-off-by: Eliza Weisman <[email protected]>
hawkw added a commit that referenced this pull request May 10, 2019
commit 5f89351081eff47a4ab8cd88e2e1a69a04f86541
Author: Oliver Gould <[email protected]>
Date:   Thu May 9 16:39:24 2019 -0700

    Upgrade tower dependencies (#249)

    Tower must be updated in order to pickup tower-rs/tower#281
    to address #2804.

    This adopts released crates where possible.

commit 5d5eed6f8180b8db4090d995e71fdf7b0890c647
Author: Zahari Dichev <[email protected]>
Date:   Thu May 9 01:08:34 2019 +0300

    Assert that TLS connection is refused if identity is not certified yet (#243)

    This branch adds tls capability to the support cient used in tests. In addition to that it adds two tests verifying that a TLS connection is refused in case the identity is not certified yet. This attempts to fix ##2598 and provide facility to write tests for #2676.

    As these are still some of my first lines of Rust code, it is advised to approach everything with a healthy dose of doubt :)

    Signed-off-by: Zahari Dichev <[email protected]>

commit 1b9bb3745e44c959d1d41d14fed2b2822c82b5ba
Author: Oliver Gould <[email protected]>
Date:   Wed May 8 14:28:37 2019 -0700

    Introduce dispatch timeouts around buffers (#246)

    The proxy has several buffers, especially where it routes requests over
    shared stacks. If any of these routes is unavailable, then a request may
    remain buffered indefinitely. Previously, before service profiles were
    introduced, there was a default _response_ timeout that would cause
    these requests to fail; but since this response timeout is now optional
    (and is only applied once the request has been routed within a proxy),
    then we need a new mechanism to prevent requests from getting "stuck".

    This change does the following:
    - all proxied requests are annotated with a dispatch deadline;
    - each time a request is bufered, a timeout is registered.
    - if the timeout fires, the response exception fails, a 503 is returned,
      and the request is dropped.
    - if the request is processed into the inner stack, the timeout is
      ignored.

    The dispatch timeout limits the _time a request is buffered in a proxy_.
    This is distinct from the response timeout, as the server's response may
    naturally be delayed for any number of (non-proxy-related) reasons.

    The `insert_target` module has been generalized to `insert` to support
    setting the DispatchDeadline extension.

    The `buffer` module has been augmented with generic deadline-extraction
    logic.

    The `svc` module now exposes its own builder type that notably adds
    a `buffer_pending` helper. It's helpful to pull a builder type into the
    proxy to assist debugging type errors when modifying stacks.

    Fixes #2779 #2795

commit caf899557c3b041190f63544da865396231b3e30
Author: Oliver Gould <[email protected]>
Date:   Fri May 3 15:55:32 2019 -0700

    router: Fail requests when the route is not ready (#241)

    In #2779, we plan to expire requests while they are
    buffered. However, the router _implicitly_ buffers requests in the
    executor when the inner service is not ready.

    This change alters the route to wrap all inner layers in a `LoadShed`
    so it can expect all services to `poll_ready()` immediately.

commit 587bad101d9e5daeacb24b6733097c350a798356
Author: Eliza Weisman <[email protected]>
Date:   Fri May 3 14:18:08 2019 -0700

    Remove Destination service query concurrency limit (#244)

    Currently, the proxy enforces a limit on the number of concurrent
    queries (i.e., the number of gRPC streams) to the Destination service.
    This limit was added based on information about the behaviour of the
    Destination service that is now known to be incorrect.

    This branch removes the limit on concurrent queries from the proxy's
    `control::destination` module. Although it should now be possible to
    simplify this code as a result of this change, I've refrained from doing
    any major refactoring in this branch --- my intention is to do this
    after the DNS fallback behaviour has also been removed, as together with
    this change, that will result in a _significant_ simplification of the
    module. Additionally, I've removed the tests for the concurrency limit,
    as they are no longer relevant.

    The `LINKERD2_PROXY_DESTINATION_CLIENT_CONCURRENCY_LIMIT`
    environment variable was also removed; this is not a breaking change as
    neither the CLI nor the proxy injector will currently set this env var.

    Signed-off-by: Eliza Weisman <[email protected]>

commit cbdf45b44f7e4d852dc0497716062167ab9539fb
Author: Sean McArthur <[email protected]>
Date:   Thu May 2 11:47:48 2019 -0700

    Remove h2::Error requirement from metrics

    Signed-off-by: Sean McArthur <[email protected]>

commit 3276949d4608dc4344b7bed3de2fc4b3080c2c6e
Author: Sean McArthur <[email protected]>
Date:   Thu May 2 09:44:00 2019 -0700

    delete unused proxy::http::metrics::class module

    Signed-off-by: Sean McArthur <[email protected]>

Signed-off-by: Eliza Weisman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants