Skip to content

Update web server to use tap APIService#3208

Merged
kleimkuhler merged 10 commits intomasterfrom
kleimkuhler/web-tap-apiserver
Aug 8, 2019
Merged

Update web server to use tap APIService#3208
kleimkuhler merged 10 commits intomasterfrom
kleimkuhler/web-tap-apiserver

Conversation

@kleimkuhler
Copy link
Contributor

Motivation

PR #3167 introduced the tap APIService and migrated linkerd tap to use it.
Subsequent PRs (#3186 and #3187) updated linkerd top and linkerd profile --tap to use the tap APIService. This PR moves the web's Go server to now also
use the tap APIService instead of the public API. It also ensures an error
banner is shown to the user when unauthorized taps fail via linkerd top
command in Overview and Top, and linkerd tap command in Tap.

Details

The majority of these changes are focused around piping through the HTTP error
that occurs and making sure the error banner generated displays the error
message explaining to view the tap RBAC docs.

httpError is now public (HTTPError) and the error message generated is short
enough to fit in a control frame (explained here).

Testing

The error we are testing for only occurs when the linkerd-web service account is
not authorzied to tap resources. Unforutnately that is not the case on Docker
For Mac (assuming that is what you use locally), so you'll need to test on a
different cluster. I chose a GKE cluster made through the GKE console--not made
through cluster-utils because it adds cluster-admin.

Checkout the branch locally and bin/docker-build or ares-build if you have
it setup. It should produce a linkerd with the version git-04e61786. I have
already pushed the dependent components, so you won't need to bin/docker-push git-04e61786.

Install linkerd on this GKE cluster and try to run tap or top commands via
the web. You should see the following errors:

Tap

web-tap-unauthorized

Top

web-top-unauthorized

Signed-off-by: Kevin Leimkuhler [email protected]

@kleimkuhler kleimkuhler requested review from scottcarol and siggy August 7, 2019 22:12
@kleimkuhler kleimkuhler self-assigned this Aug 7, 2019
@grampelberg
Copy link
Contributor

A Websocket error: internal error sounds like we're broken and it isn't expected. Can we have a specific message for auth issues? Like Missing authorization, see https://linkerd.io/tap-rbac to remedy? It'd be nice if the link was clickable and formatted that way as well.

@siggy should this be a specific section on that page for the dashboard? We'll need to explain the web service account and such.

@l5d-bot
Copy link
Collaborator

l5d-bot commented Aug 7, 2019

Copy link
Member

@siggy siggy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works!

@siggy
Copy link
Member

siggy commented Aug 8, 2019

@grampelberg Once #3203 ships the dashboard should just work, and this error will be more exception than rule, but I do think some special treatment here is warranted.

Recommend in increasing order of effort (and descending order of whether we should include it for 2.5, I think we should just do the first two):

  1. Document linkerd-web RBAC management, filed Add mock timers for timing-dependent tests #455 to track.
  2. Update the Websocket error: internal error... message to be more RBAC-specific.
  3. Render the error message with a clickable URL.
  4. Render the error message differently on the dashboard from other error messages.

@kleimkuhler
Copy link
Contributor Author

@siggy Thanks for the comments I addressed them all above.

@grampelberg I changed the error message so that it now only displays: Missing authorization, visit https://linkerd.io/tap-rbac to remedy as shown in the picture below. I worked with @scottcarol for a little bit to try and get the error banner to render the URL as a link, but ultimately was unable to after trying to get react-linkify to render it. As @siggy mentioned in the above comment, we can address this in follow-up PRs in time for a stable 2.5.

The more significant change in this commit is that tap errors now return a websocket.ClosePolicyViolation status code so that we can branch on that here and not display the Websocket close error [${e.code}: ${wsCloseCodes[e.code]}] prefix. Branching on a unique code was chosen over branching on a regex such as looking for unauthorized in the reason.

ClosePolicyViolation was chosen because in the HTTP standard it notes:

This is a generic status code, used when codes 1003 and 1009 are not suitable.

@kleimkuhler
Copy link
Contributor Author

It looks like this line in the CI logs indicates something in protohttp is failing: FAIL github.com/linkerd/linkerd2/pkg/protohttp [build failed]. I changed the HTTPError type name in here to make it public and added a comment, but nothing else. Any idea what's up? I don't see this error locally.

@l5d-bot
Copy link
Collaborator

l5d-bot commented Aug 8, 2019

@siggy
Copy link
Member

siggy commented Aug 8, 2019

@kleimkuhler re: the ci failure, locally i get:

$ go test ./...
...
# github.com/linkerd/linkerd2/pkg/protohttp [github.com/linkerd/linkerd2/pkg/protohttp.test]
pkg/protohttp/protohttp_test.go:106:26: undefined: httpError
pkg/protohttp/protohttp_test.go:153:16: undefined: httpError
...

Signed-off-by: Kevin Leimkuhler <[email protected]>
@kleimkuhler
Copy link
Contributor Author

Ah jeez thank you. I was looking for build issues with go build and not thinking it was just in the tests... It is fixed now

Copy link
Member

@siggy siggy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm modulo @scottcarol's eyes, and bin/lint and l5d-bot passing. 👍 🚢

Screen Shot 2019-08-07 at 10 05 27 PM

Signed-off-by: Kevin Leimkuhler <[email protected]>
@kleimkuhler
Copy link
Contributor Author

Yep just fixed that lint. Sounds good!

@l5d-bot
Copy link
Collaborator

l5d-bot commented Aug 8, 2019

Copy link
Contributor

@scottcarol scottcarol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! Thanks @siggy for help demoing it.

To linkify the URL in the error message, you want to add react-linkify to package.json:

     "react-iframe": "1.7.16",
+    "react-linkify": "1.0.0-alpha",
     "react-router": "4.3.1",```

then surround line 72 of ErrorBanner.jsx with Linkify:

-            { !error ? null : <div>{error}</div> }
+            <Linkify>{ !error ? null : <div>{error}</div> }</Linkify>

to style it so it's readable, add a class to styles.css:

 .tapGrayed {
   color: lightgray;
 }
+
+.errorMessage a {
+  color: white;
+}

and change line 67 of ErrorBanner.jsx:

-          <div id="message-id" >
+          <div id="message-id" className="errorMessage">

this will render the below box with the URL linked.
Screen Shot red

@kleimkuhler
Copy link
Contributor Author

@scottcarol Oh awesome thanks for looking more into that! I'll get this change in now.

Signed-off-by: Kevin Leimkuhler <[email protected]>
@kleimkuhler
Copy link
Contributor Author

Ok the link is clickable now!

linkify-error

@l5d-bot
Copy link
Collaborator

l5d-bot commented Aug 8, 2019

@scottcarol
Copy link
Contributor

Your updated branch works for me, the error message was rendered and linked properly. 👍

I'm still seeing that l5d-bot is failing, but it failed in a different place this time:

Checking if there is a Kubernetes cluster available...Starting to serve on 127.0.0.1:8080

Failed to connect to Kubernetes cluster
cleaning up control-plane namespaces in k8s-context []
cleaning up data-plane namespaces in k8s-context []
no control-plane namespaces found
no data-plane namespaces found
no clusterrolebindings found
no clusterroles found
no mutatingwebhookconfigurations found
no validatingwebhookconfigurations found
no podsecuritypolicies found
no customresourcedefinitions found
no apiservices found
cleaning up rolebindings in kube-system namespace in k8s-context []

@kleimkuhler
Copy link
Contributor Author

Hm yea not sure why it says it failed. It passed CI this time.

Signed-off-by: Kevin Leimkuhler <[email protected]>
@l5d-bot
Copy link
Collaborator

l5d-bot commented Aug 8, 2019

@kleimkuhler
Copy link
Contributor Author

Okay I ran the integration tests locally and everything passed. I'm going to merge this.

➜  linkerd2 git:(kleimkuhler/web-tap-apiserver) bin/test-run $PWD/target/cli/darwin/linkerd l5d-integration
Checking the linkerd binary...[ok]
Checking if there is a Kubernetes cluster available...[ok]
==================RUNNING ALL TESTS==================
Testing Linkerd version [git-96c7c435] namespace [l5d-integration] k8s-context []
Test script: [install_test.go] Params: [-failfast --upgrade-from-version=stable-2.4.0 --linkerd-namespace=l5d-integration-upgrade]
ok      command-line-arguments  128.514s
Test script: [install_test.go] Params: [-failfast --linkerd-namespace=l5d-integration]
ok      command-line-arguments  117.239s
Test script: [serviceprofiles_test.go] Params: [--linkerd-namespace=l5d-integration]
ok      command-line-arguments  56.900s
Test script: [egress_test.go] Params: [--linkerd-namespace=l5d-integration]
ok      command-line-arguments  19.422s
Test script: [inject_test.go] Params: [--linkerd-namespace=l5d-integration]
ok      command-line-arguments  12.854s
Test script: [get_test.go] Params: [--linkerd-namespace=l5d-integration]
ok      command-line-arguments  16.492s
Test script: [tap_test.go] Params: [--linkerd-namespace=l5d-integration]
ok      command-line-arguments  43.629s
Test script: [stat_test.go] Params: [--linkerd-namespace=l5d-integration]
ok      command-line-arguments  10.234s
Test script: [routes_test.go] Params: [--linkerd-namespace=l5d-integration]
ok      command-line-arguments  0.820s

=== PASS: all tests passed

@kleimkuhler kleimkuhler merged commit 5d7662f into master Aug 8, 2019
@kleimkuhler kleimkuhler deleted the kleimkuhler/web-tap-apiserver branch August 8, 2019 17:18
@scottcarol
Copy link
Contributor

Thanks for doing the extra tests! 💪

@adleong
Copy link
Member

adleong commented Aug 8, 2019

This is awesome!

I may have missed the boat on the high level UX discussion, but what do folks think about replacing the content of the "Live requests" and "Tap" sections with a friendly message rather than using error banners? Especially since the error banner can be dismissed and then you're left with a UI that doesn't work. I think replacing the section entirely communicates "This feature is disabled" rather than "This feature is broken". WDYT?

@scottcarol
Copy link
Contributor

@adleong I agree that dashboard-wide we could do a better job explaining the reasons why no data exists and whether or not that is a concern. This aligns with #2998 which came out of conversation with @grampelberg about what to do if no data is found (currently in the dashboard we show an empty table if there are no rows to display, with no additional message).

@adleong
Copy link
Member

adleong commented Aug 8, 2019

Something along the lines of:

Screen Shot 2019-08-08 at 11 05 35 AM

@grampelberg
Copy link
Contributor

I personally prefer @adleong's suggestion. Should we get a ticket to track that or just update #2998?

@scottcarol
Copy link
Contributor

Is it possible for a dashboard user to have Top/Tap access for certain traffic streams and not others?

If the user would be blocked from all traffic streams when using the dashboard and we could check with just one query, the above suggestion seems simple to implement.

If the user could have access to some streams but not others, we'd need to allow the user to formulate their desired query before telling them "yes" or "no." I think a separate ticket would make sense for this while keeping in mind #2998. Happy to make the ticket later today.

@grampelberg
Copy link
Contributor

I would like to always fire off the request and handle the 403 response in-panel instead of as part of a popup.

@kleimkuhler
Copy link
Contributor Author

Wrote up the issue here: #3220

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants