Introduce Discovery API and endpoints command#2195
Conversation
888cbe5 to
3d3ec9c
Compare
dadjeibaah
left a comment
There was a problem hiding this comment.
This is looking great! I had a couple of comments. I had one question about the new discovery protobuf. What was the reasoning behind this decision? Could we have extended the public API and add the new Endpoints RPC to public API's protobuf? Asking out of curiosity.
| ```bash | ||
| bin/go-run controller/script/discovery-api-client | ||
| ``` | ||
|
|
cli/cmd/endpoints.go
Outdated
| return errors.New("--output currently only supports table and json") | ||
| } | ||
|
|
||
| var ( |
There was a problem hiding this comment.
I think this var needs to move right above func validate() to follow the golang style.
controller/cmd/public-api/main.go
Outdated
| log.Fatal(err.Error()) | ||
| } | ||
| defer proxyAPIConn.Close() | ||
| proxyAPIClient := discovery.NewApiClient(proxyAPIConn) |
There was a problem hiding this comment.
I think the naming within this line could be a little more consistent. It's a little confusing to see a proxyAPIClient being created but then it gets its initialization for the discovery package. Maybe we could change this name to discoveryClient or something like that so that it is clear that we will be interfacing with the discovery gRPC API.
| endpointResource = "endpoints" | ||
| ) | ||
|
|
||
| // TODO: prom metrics for all the queues/caches |
There was a problem hiding this comment.
We probably should turn this into an issue if this doesn't already exist. I can see this being very useful for debugging.
controller/api/proxy/server.go
Outdated
| rsp.ServicePorts[serviceID.String()] = &discoverySP | ||
| } | ||
|
|
||
| s.log.Debugf("Endpoints(%+v): %+v", params, rsp) |
There was a problem hiding this comment.
I think we might need to format these logs just a little bit. When I turn up logging for the proxy api this is what I get
{ServicePorts:map[web-svc.emojivoto:port_endpoints:<key:80 value:<pod_addresses:<addr:<ip:<ipv4:2886795278 > port:80 > pod:<name:\"emojivoto/web-5975764fbb-56k5d\" podIP:\"172.17.0.14\" deployment:\"emojivoto/web\" status:\"Running\" controllerNamespace:\"linkerd\" proxyReady:true proxyVersion:\"git-bae61845\" resourceVersion:\"458977\" > > > > books.default:port_endpoints:<key:7002 value:<pod_addresses:<addr:<ip:<ipv4:2886795274 > port:7002 > pod:<name:\"default/books-558b45bbb-zxhmt\" podIP:\"172.17.0.10\" deployment:\"default/books\" status:\"Running\" controllerNamespace:\"linkerd\" proxyReady:true proxyVersion:\"stable-2.1.0\" resourceVersion:\"462326\" > > > > authors.default:port_endpoints:<key:7001 value:<pod_addresses:<addr:<ip:<ipv4:2886795275 > port:7001 > pod:<name:\"default/authors-7dbc46647b-2p8j2\" podIP:\"172.17.0.11\" deployment:\"default/authors\" status:\"Running\" controllerNamespace:\"linkerd\" proxyReady:true proxyVersion:\"stable-2.1.0\" resourceVersion:\"462614\" > > > > emoji-svc.emojivoto:port_endpoints:<key:8080 value:<pod_addresses:<addr:<ip:<ipv4:2886795280 > port:8080 > pod:<name:\"emojivoto/emoji-5ltnc\" podIP:\"172.17.0.16\" daemon_set:\"emojivoto/emoji\" status:\"Running\" controllerNamespace:\"linkerd\" proxyReady:true proxyVersion:\"git-bae61845\" resourceVersion:\"458980\" > > > > voting-svc.emojivoto:port_endpoints:<key:8080 value:<pod_addresses:<addr:<ip:<ipv4:2886795281 > port:8080 > pod:<name:\"emojivoto/voting-96ccbb7b4-xcg6b\" podIP:\"172.17.0.17\" deployment:\"emojivoto/voting\" status:\"Running\" controllerNamespace:\"linkerd\" proxyReady:true proxyVersion:\"git-bae61845\" resourceVersion:\"459011\" > > > > webapp.default:port_endpoints:<key:7000 value:<pod_addresses:<addr:<ip:<ipv4:2886795266 > port:7000 > pod:<name:\"default/webapp-75c7795bf7-f6788\" podIP:\"172.17.0.2\" deployment:\"default/webapp\" status:\"Running\" controllerNamespace:\"linkerd\" proxyReady:true proxyVersion:\"stable-2.1.0\" resourceVersion:\"462205\" > > pod_addresses:<addr:<ip:<ipv4:2886795270 > port:7000 > pod:<name:\"default/webapp-75c7795bf7-cqg5k\" podIP:\"172.17.0.6\" deployment:\"default/webapp\" status:\"Running\" controllerNamespace:\"linkerd\" proxyReady:true proxyVersion:\"stable-2.1.0\" resourceVersion:\"462602\" > > pod_addresses:<addr:<ip:<ipv4:2886795272 > port:7000 > pod:<name:\"default/webapp-75c7795bf7-nzww9\" podIP:\"172.17.0.8\" deployment:\"default/webapp\" status:\"Running\" controllerNamespace:\"linkerd\" proxyReady:true proxyVersion:\"stable-2.1.0\" resourceVersion:\"462200\" > > > > ] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" addr=":8086" component=serverWe could add in debug log statements within each iteration of each loop to show the results of getState(). That way we don't have to run another loop to format EndpointsResponse
3d3ec9c to
9b30d68
Compare
|
@dadjeibaah Good question re: the new discovery protobuf. The decision to add this as a new interface, implemented by both
A few alternatives, in decreasing order of viability:
|
adleong
left a comment
There was a problem hiding this comment.
This looks good. Every time I come back to this, I have to reconstruct why we can't just use the existing destination.Get API for this (we want it to be non-side-effecting) so it might be good to call that out in a comment somewhere.
| e.addEndpoints(newObj) | ||
| } | ||
|
|
||
| func (e *endpointsWatcher) getState() servicePorts { |
There was a problem hiding this comment.
What is this method doing that's different from just returning e.servicePorts? Is it effectively making a deep copy of the map? Is that necessary?
There was a problem hiding this comment.
oh, we copy the data out so that we can release the read mutex? I think this could probably use a comment
cli/cmd/endpoints.go
Outdated
| Port uint32 `json:"port"` | ||
| Pod string `json:"pod"` | ||
| Version string `json:"version"` | ||
| Weight uint32 `json:"weight"` |
There was a problem hiding this comment.
Given that we don't do anything with weights yet, we may not want to display them for now.
cli/cmd/endpoints.go
Outdated
| Pod string `json:"pod"` | ||
| Version string `json:"version"` | ||
| Weight uint32 `json:"weight"` | ||
| Identity string `json:"identity"` |
There was a problem hiding this comment.
Should this be called "service" or "authority"? I think "identity" will soon come to mean TLS identity which will correspond to service account.
| listener := newEndpointListener( | ||
| mockGetServer, | ||
| ownerKindAndName, | ||
| true, |
There was a problem hiding this comment.
I find explicit field names to be very helpful for boolean fields like this. mind bringing them back?
There was a problem hiding this comment.
Agree I prefer explicit field names. In this case, we're replacing:
listener := &endpointListener{...}with:
listener := newEndpointListener(...)This change was motivated by the instantiation of endpointListener.log in newEndpointListener(). We could alternatively instantiate the log here via &endpointListener{log: ...}, but calling newEndpointListener() gives slightly more test coverage.
|
It may also be a good idea to (somewhere?) describe all the places that this data transits and is cached and how this piece fits into the bigger debugging picture:
This does a good job of allowing us to introspect the endpoints watcher cache. How do we decide which other layers also require introspection? What classes of bugs cause data inconsistency at which layers? |
|
@grampelberg: updated help output to be more descriptive: $ bin/go-run cli endpoints -h
Introspect Linkerd's service discovery state.
This command provides debug information about the internal state of the
control-plane's proxy-api container. Note that this cache of service discovery
information is populated on-demand via linkerd-proxy requests. This command
will return "No endpoints found." until a linker-proxy begins routing
requests.
Usage:
linkerd endpoints [flags]
Examples:
# get all endpoints
linkerd endpoints
# get endpoints in the emojivoto namespace
linkerd endpoints -n emojivoto
# get all endpoints in json
linkerd endpoints -o json |
|
@siggy 👍 from me, let's give it a try and see how it works out. What do you think about a follow up to docs around "I don't have any metrics" or "my deployment isn't going the right place"? |
cli/cmd/endpoints.go
Outdated
| This command provides debug information about the internal state of the | ||
| control-plane's proxy-api container. Note that this cache of service discovery | ||
| information is populated on-demand via linkerd-proxy requests. This command | ||
| will return "No endpoints found." until a linker-proxy begins routing |
There was a problem hiding this comment.
is this linker-proxy supposed to be linkerd-proxy?
29ed1cb to
54b7a80
Compare
klingerf
left a comment
There was a problem hiding this comment.
This is awesome! The linkerd endpoints implementation is super slick, and the output looks great. I just had some organizational suggestions, intended to make our public API implementation easier to manage going forward.
controller/api/util/api_utils.go
Outdated
| ControlPlane: controllerComponent != "", | ||
| ProxyReady: proxyReady, | ||
| ProxyVersion: proxyVersion, | ||
| ResourceVersion: pod.GetObjectMeta().GetResourceVersion(), |
There was a problem hiding this comment.
Super minor, but since ObjectMeta is an embedded struct, you can write this as:
| ResourceVersion: pod.GetObjectMeta().GetResourceVersion(), | |
| ResourceVersion: pod.ResourceVersion, |
That's equivalent to how you're handling pod.Namespace and pod.Name on line 539.
| Ip: &public.IPAddress{Ip: &public.IPAddress_Ipv4{Ipv4: 1}}, | ||
| Port: 1234, | ||
| }, | ||
| }, |
There was a problem hiding this comment.
Consider adding a test case for IPv6 here as well?
| // This is a throwaway script for testing the proxy-api service | ||
|
|
||
| func main() { | ||
| rand.Seed(time.Now().UnixNano()) |
There was a problem hiding this comment.
Ah, interesting -- I see that the destination-client script also seeds random, but I don't think either of these scripts need to do that, since they're not actually using the rand package. My guess is this is a holdover from the simulate-proxy script. Mind removing the rand call here and in the destination-client script?
| @@ -0,0 +1,44 @@ | |||
| package main | |||
There was a problem hiding this comment.
Mind if we call this script dir discovery-client, instead of discovery-api-client? That feels more consistent with destination-client
| log.Fatal(err.Error()) | ||
| } | ||
|
|
||
| log.Infof("%+v", rsp) |
There was a problem hiding this comment.
This output might be more readable if you use the jsonpb marshaler on the response. Something like:
diff --git a/controller/script/discovery-api-client/main.go b/controller/script/discovery-api-client/main.go
index 53157750..a0c2f8c3 100644
--- a/controller/script/discovery-api-client/main.go
+++ b/controller/script/discovery-api-client/main.go
@@ -3,7 +3,9 @@ package main
import (
"context"
"flag"
+ "os"
+ "github.com/golang/protobuf/jsonpb"
"github.com/linkerd/linkerd2/controller/gen/controller/discovery"
log "github.com/sirupsen/logrus"
"google.golang.org/grpc"
@@ -26,7 +28,11 @@ func main() {
log.Fatal(err.Error())
}
- log.Infof("%+v", rsp)
+ marshaler := jsonpb.Marshaler{EmitDefaults: true, Indent: " "}
+ err = marshaler.Marshal(os.Stdout, rsp)
+ if err != nil {
+ log.Fatal(err.Error())
+ }
}
// newClient creates a new gRPC client to the Proxy API service.That will produce output like:
$ bin/go-run controller/script/discovery-api-client
{
"servicePorts": {
"linkerd-controller-api.linkerd": {
"portEndpoints": {
"8085": {
"podAddresses": [
{
"addr": {
"ip": {
"ipv4": 167838113
},
"port": 8085
},
"pod": {
"name": "linkerd/linkerd-controller-7fbb6dc457-4q8pq",
"podIP": "10.1.1.161",
"deployment": "linkerd/linkerd-controller",
"status": "Running",
"added": false,
"sinceLastReport": null,
"controllerNamespace": "linkerd",
"controlPlane": true,
"uptime": null,
"proxyReady": true,
"proxyVersion": "dev-e570f1d6-kl",
"resourceVersion": "13102"
}
}
]
}
}
},
...
proto/controller/discovery.proto
Outdated
|
|
||
| option go_package = "github.com/linkerd/linkerd2/controller/gen/controller/discovery"; | ||
|
|
||
| service Api { |
There was a problem hiding this comment.
I think it would be clearer if we named this proto service "Discovery", to match the filename. That will also make it easier to distinguish this service from the public API service, which is already called "Api", and it matches how we've named the services in the tap and destination protobuf.
| type APIClient interface { | ||
| pb.ApiClient | ||
| discovery.ApiClient | ||
| } |
| // 2) controller/discovery.Api | ||
| grpcServer pb.ApiServer | ||
| discoveryServer discoveryPb.ApiServer | ||
| } |
There was a problem hiding this comment.
Rather than having this handler reference two separate grpc server implementations, my preference is to create an APIServer interface, to parallel the APIClient interface that you've added, and then update the existing grpcServer struct to serve that interface. That will allow us to define all public API endpoints in one place.
So I'd modify grpc_server.go as follows:
diff --git a/controller/api/public/grpc_server.go b/controller/api/public/grpc_server.go
index d069b43c..b088a1d7 100644
--- a/controller/api/public/grpc_server.go
+++ b/controller/api/public/grpc_server.go
@@ -10,6 +10,7 @@ import (
"github.com/golang/protobuf/ptypes/duration"
"github.com/linkerd/linkerd2/controller/api/util"
healthcheckPb "github.com/linkerd/linkerd2/controller/gen/common/healthcheck"
+ "github.com/linkerd/linkerd2/controller/gen/controller/discovery"
tapPb "github.com/linkerd/linkerd2/controller/gen/controller/tap"
pb "github.com/linkerd/linkerd2/controller/gen/public"
"github.com/linkerd/linkerd2/controller/k8s"
@@ -28,6 +29,7 @@ type (
grpcServer struct {
prometheusAPI promv1.API
tapClient tapPb.TapClient
+ discoveryClient discovery.DiscoveryClient
k8sAPI *k8s.API
controllerNamespace string
ignoredNamespaces []string
@@ -48,18 +50,25 @@ const (
promClientCheckDescription = "control plane can talk to Prometheus"
)
+type APIServer interface {
+ pb.ApiServer
+ discovery.DiscoveryServer
+}
+
func newGrpcServer(
promAPI promv1.API,
tapClient tapPb.TapClient,
+ discoveryClient discovery.DiscoveryClient,
k8sAPI *k8s.API,
controllerNamespace string,
ignoredNamespaces []string,
singleNamespace bool,
-) *grpcServer {
+) APIServer {
grpcServer := &grpcServer{
prometheusAPI: promAPI,
tapClient: tapClient,
+ discoveryClient: discoveryClient,
k8sAPI: k8sAPI,
controllerNamespace: controllerNamespace,
ignoredNamespaces: ignoredNamespaces,
@@ -268,3 +277,15 @@ func (s *grpcServer) ListServices(ctx context.Context, req *pb.ListServicesReque
return &pb.ListServicesResponse{Services: svcs}, nil
}
+
+func (s *grpcServer) Endpoints(ctx context.Context, params *discovery.EndpointsParams) (*discovery.EndpointsResponse, error) {
+ log.Debugf("Endpoints request %+v", params)
+
+ rsp, err := s.discoveryClient.Endpoints(ctx, params)
+ if err != nil {
+ log.Errorf("endpoints request to proxy API failed: %s", err)
+ return nil, err
+ }
+
+ return rsp, nil
+}(note that this diff reflects the "Api" => "Discovery" protobuf change that I suggested)
And then I'd change this file as follows:
diff --git a/controller/api/public/http_server.go b/controller/api/public/http_server.go
index 0f96d104..6eb7ff9f 100644
--- a/controller/api/public/http_server.go
+++ b/controller/api/public/http_server.go
@@ -29,10 +29,7 @@ var (
)
type handler struct {
- // 1) public.Api
- // 2) controller/discovery.Api
- grpcServer pb.ApiServer
- discoveryServer discoveryPb.ApiServer
+ grpcServer APIServer
}
func (h *handler) ServeHTTP(w http.ResponseWriter, req *http.Request) {
@@ -247,7 +244,7 @@ func fullURLPathFor(method string) string {
}
func (h *handler) handleEndpoints(w http.ResponseWriter, req *http.Request) {
- rsp, err := h.discoveryServer.Endpoints(req.Context(), &discoveryPb.EndpointsParams{})
+ rsp, err := h.grpcServer.Endpoints(req.Context(), &discoveryPb.EndpointsParams{})
if err != nil {
writeErrorToHTTPResponse(w, err)
return
@@ -264,7 +261,7 @@ func NewServer(
addr string,
prometheusClient promApi.Client,
tapClient tapPb.TapClient,
- discoveryClient discoveryPb.ApiClient,
+ discoveryClient discoveryPb.DiscoveryClient,
k8sAPI *k8s.API,
controllerNamespace string,
ignoredNamespaces []string,
@@ -274,12 +271,12 @@ func NewServer(
grpcServer: newGrpcServer(
promv1.NewAPI(prometheusClient),
tapClient,
+ discoveryClient,
k8sAPI,
controllerNamespace,
ignoredNamespaces,
singleNamespace,
),
- discoveryServer: newDiscoveryServer(discoveryClient),
}
instrumentedHandler := prometheus.WithTelemetry(baseHandler)That will allow you to remove the controller/api/public/discovery_server.go and controller/api/public/discovery_server_test.go files that you're adding.
The Proxy API service lacked introspection of its internal state. Introduce a new gRPC Discovery API, implemented by two servers: 1) Proxy API Server: returns a snapshot of discovery state 2) Public API Server: pass-through to the Proxy API Server Also wire up a new `linkerd endpoints` command. Fixes #2165 Signed-off-by: Andrew Seigner <[email protected]>
Signed-off-by: Andrew Seigner <[email protected]>
Signed-off-by: Andrew Seigner <[email protected]>
Signed-off-by: Andrew Seigner <[email protected]>
also add comments Signed-off-by: Andrew Seigner <[email protected]>
Signed-off-by: Andrew Seigner <[email protected]>
Signed-off-by: Andrew Seigner <[email protected]>
e1ed658 to
5e5c8d3
Compare
klingerf
left a comment
There was a problem hiding this comment.
⭐️ Looks great! Thanks for addressing my feedback.
In #2195 we introduced `linkerd endpoints` on the CLI. I would like similar information to be on the web. This PR adds an api endpoint at `/api/endpoints`, and introduces a new debugging pagethat shows a table of endpoints, available at `/debug`
The Proxy API service lacked introspection of its internal state.
Introduce a new gRPC Discovery API, implemented by two servers:
Also wire up a new
linkerd endpointscommand.Fixes #2165
Signed-off-by: Andrew Seigner [email protected]