Adding missing filters (http fault, redis, mongo, tcp, rate limits, etc..)#192
Adding missing filters (http fault, redis, mongo, tcp, rate limits, etc..)#192mattklein123 merged 34 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
api/filter/http_fault.proto
Outdated
| import "google/protobuf/wrappers.proto"; | ||
|
|
||
| // Delay specification is used to inject latency into the rpc operations. | ||
| message Delay { |
There was a problem hiding this comment.
Most of this is implemented, except for the exponential faults.. It was there in Istio spec. Just thought I would add it here as well. LMK if you want me to remove it.
There was a problem hiding this comment.
Either remove or add a tracking issue at https://github.com/envoyproxy/envoy for implementation side (and reference here).
api/filter/mongo_proxy.proto
Outdated
| // applied to the following MongoDB operations: Query, Insert, GetMore, | ||
| // and KillCursors. Once an active delay is in progress, all incoming | ||
| // data up until the timer event fires will be a part of the delay. | ||
| Delay delay = 3; |
There was a problem hiding this comment.
Its already implemented.. Just needs remapping config fields in the code..
api/filter/tcp_fault.proto
Outdated
| // connection establishment. | ||
| google.protobuf.Duration terminate_after_period = 2; | ||
| } | ||
| } |
There was a problem hiding this comment.
this does not exist at all :P. Thought I would get this for free when I move the mongo fault into TCP, but then Mongo does L7 style fault (on operations, instead of raw bytes as this filter proposes). My next task is to streamline mongo fault, implement TCP fault and possibly redis fault if its easy.
There was a problem hiding this comment.
And for tcp, I am signing up (currently) only for delay fault, and possibly simple abort fault. Not the fancy abort after period.
api/filter/BUILD
Outdated
| api_proto_library( | ||
| name = "mongo_proxy", | ||
| srcs = ["mongo_proxy.proto"], | ||
| deps = [ |
There was a problem hiding this comment.
deps = [":http_fault"], for singleton. Please run buildifier over this file.
api/filter/http_fault.proto
Outdated
| // Delay specification is used to inject latency into the rpc operations. | ||
| message Delay { | ||
| // Delay type to use (fixed|exponential|..). Currently, only fixed delay (step function) is supported. | ||
| string type = 1; |
There was a problem hiding this comment.
I was going by what was defined in the configs (where type was a string). I guess Enum seems more sensible given we are doing protos.
api/filter/http_fault.proto
Outdated
| import "google/protobuf/wrappers.proto"; | ||
|
|
||
| // Delay specification is used to inject latency into the rpc operations. | ||
| message Delay { |
There was a problem hiding this comment.
Either remove or add a tracking issue at https://github.com/envoyproxy/envoy for implementation side (and reference here).
api/filter/http_fault.proto
Outdated
| google.protobuf.UInt32Value percent = 2; | ||
|
|
||
| oneof http_delay_type { | ||
| // Add a fixed delay before forwarding the operation upstream. Format: 1h/1m/1s/1ms. MUST be >=1ms. |
There was a problem hiding this comment.
There is no format for Durations..
There was a problem hiding this comment.
Yup. But this was intentional. For the doc-gen.. Coz it does not tell people that Duration is just a string and that people can specify 1h/1m/1s. When we generated docs for Istio, this was one of the issues. So, I went back to all protos, and manually added this, so that people reading docs knew what values to put.
There was a problem hiding this comment.
Duration is only a string in JSON/YAML form. We will eventually be generating docs for {proto3, JSON, YAML}, so this interpretation only applies to some readings. I'm surprised that docgen can't inline the documentations from Duration, which is where you would expect to find this kind of explanation.
I think a better way to put it might be "See https://developers.google.com/protocol-buffers/docs/proto3#json for the JSON/YAML Duration mapping.".
api/filter/http_fault.proto
Outdated
|
|
||
| oneof http_delay_type { | ||
| // Add a fixed delay before forwarding the operation upstream. Format: 1h/1m/1s/1ms. MUST be >=1ms. | ||
| google.protobuf.Duration fixed_delay = 2; |
There was a problem hiding this comment.
This has field number 2, but so does percent above. oneof fields share the same number space as the fields in the message around them.
api/filter/http_fault.proto
Outdated
| // network issues, overloaded upstream service, etc. | ||
| Delay delay = 1; | ||
|
|
||
| // Abort Http request attempts and return error codes back to downstream |
api/filter/http_fault.proto
Outdated
| // gRPC status code to use to abort a gRPC request. NOT IMPLEMENTED | ||
| string grpc_status = 2 ; | ||
| // HTTP2 error code used to abort a Http2 request. NOT IMPLEMENTED | ||
| string http2_error = 3 ; |
There was a problem hiding this comment.
Why aren't these error codes? Why don't HTTP/HTTP2 share a code (since they have the same status code space)?
api/filter/tcp_fault.proto
Outdated
| // percentage of connections to throttle. | ||
| float percent = 1; | ||
| // bandwidth limit in "bits" per second between downstream and Envoy | ||
| int64 downstream_limit_bps = 2; |
There was a problem hiding this comment.
Why are these signed? Why aren't they wrapped, e.g. google.protobuf.UInt32Value?
api/filter/tcp_fault.proto
Outdated
|
|
||
| // Alternatively, we could wait for a certain number of bytes to be | ||
| // transferred to upstream before throttling the bandwidth. | ||
| double throttle_after_bytes = 5; |
There was a problem hiding this comment.
You're waiting for a fractional byte? :)
There was a problem hiding this comment.
You gots no idea on the number of eyes this got.. We looked into a time machine and decided to design for the next 4000 years of prosperity
api/filter/tcp_fault.proto
Outdated
| // established, emulating remote server crash or link failure. | ||
| message Terminate { | ||
| // percentage of established Tcp connections to be terminated/reset | ||
| float percent = 1; |
There was a problem hiding this comment.
I think you should be consistent across filters in how percentages are represented (stick with what we do in eds.proto).
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
htuch
left a comment
There was a problem hiding this comment.
Looks mostly good, a few more things.
api/filter/fault.proto
Outdated
| import "google/protobuf/struct.proto"; | ||
| import "google/protobuf/wrappers.proto"; | ||
|
|
||
| // Delay specification is used to inject latency into the rpc/TCP proxy operations. |
There was a problem hiding this comment.
Nit: s/rpc/request/ (consistency with abort below).
There was a problem hiding this comment.
I used RPC explicitly coz this might be used in Mongo/Redis/TCP/HTTP. May be I should use a different term than generalizing a query as an RPC
api/filter/fault.proto
Outdated
| // Applicable only for HTTP connections. | ||
| oneof error_type { | ||
| // HTTP status code to use to abort the HTTP request. | ||
| int32 http_status = 2; |
api/filter/fault.proto
Outdated
| // HTTP status code to use to abort the HTTP request. | ||
| int32 http_status = 2; | ||
| // gRPC status code to use to abort the gRPC request. | ||
| int32 grpc_status = 3; |
There was a problem hiding this comment.
Has this been implemented? Have you some description of how we decide to do a gRPC fault rather than an HTTP one?
There was a problem hiding this comment.
This has not been implemented. I thought it would be easy to add. May be I should take it off for now..
api/filter/fault.proto
Outdated
|
|
||
| // If specified, the filter will abort requests based on the values in | ||
| // the object. At least abort or delay must be specified. | ||
| Abort abort = 2; |
There was a problem hiding this comment.
Is there ever a situation in which this or delay would be a repeated?
There was a problem hiding this comment.
Good question. My answer would be no. The filters can be stacked (i.e. multiple fault filters), such that the matching filter will activate. Changing this into an array would require changes to the filter implementation to process a sequence of faults rather than assume that each filter is responsible for one delay + abort.
api/filter/fault.proto
Outdated
| // injection filter can be applied selectively to requests that match a | ||
| // set of headers specified in the fault filter config. The chances of | ||
| // actual fault injection further depend on the values of abort_percent | ||
| // and fixed_delay_percent parameters.The filter will check the request’s |
api/filter/redis_proxy.proto
Outdated
| // exception to this behavior is when a connection to a backend is not | ||
| // yet established. In that case, the connect timeout on the cluster | ||
| // will govern the timeout until the connection is ready. | ||
| google.protobuf.Duration op_timeout = 3; |
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
|
Once we define these, is someone going to go through and do the work to wire them up so that we use them? |
|
Yea, I will do it. I just wanted all configs to be in proto form, as it makes our v2 migration easier in Istio. |
|
OK, cool. Do you want to just add them all then? We are missing router, buffer, etc. |
api/filter/fault.proto
Outdated
| import "google/protobuf/struct.proto"; | ||
| import "google/protobuf/wrappers.proto"; | ||
|
|
||
| // Delay specification is used to inject latency into the rpc/TCP proxy operations. |
| // yet established. In that case, the connect timeout on the cluster | ||
| // will govern the timeout until the connection is ready. | ||
| google.protobuf.Duration op_timeout = 1; | ||
| } |
There was a problem hiding this comment.
Nit: It's more readable if you put this above settings.
There was a problem hiding this comment.
+1 (I think there are a few other instances of this)
api/filter/fault.proto
Outdated
| // Applicable only for HTTP connections. | ||
| oneof error_type { | ||
| // HTTP status code to use to abort the HTTP request. | ||
| uint32 http_status = 2; |
There was a problem hiding this comment.
google.protobuf.UInt32Value?
There was a problem hiding this comment.
I've been in two minds on this, and have stopped objecting to unwrapped types (and in some cases doing it myself) when either: it's (1) a 100% compulsory field and we don't need defaulting behavior or (2) it's embedded under something like oneof where we can test presence/absence.
It's pretty ugly in proto IMHO that we don't just have aliases for wrapped types that express this, e.g. optional_uint32, rather than the full google.protobuf.UInt32Value.
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
|
@htuch / @mattklein123 I have added a bunch of missing filters (buffer, router, ip tagging, client_ssl auth, grpc-json transcoder). These filters have configuration. Others didn't have any configuration. Also, the missing pieces: rate limit filter proto seems different from the ones in http/tcp. Health check filter is not defined, as I am not sure of its status in the v2 api. Apart from these, all other filters have been defined. |
|
We need all the filters that have JSON schema configs, including TCP/HTTP RL and HC. Please take a look at the JSON schemas. |
Signed-off-by: Shriram Rajagopalan <[email protected]>
|
Added RL, and HC. |
api/filter/fault.proto
Outdated
| // Fixed delay (step function). | ||
| FIXED = 0; | ||
| // Exponential delay. | ||
| EXPONENTIAL = 1; |
There was a problem hiding this comment.
Are you going to implement this soon? If not, can we just remove for now?
api/filter/ip_tagging.proto
Outdated
| // x-envoy-internal is set to true. If x-envoy-internal is not set or | ||
| // false, a request is considered external. The filter defaults to both, | ||
| // and it will apply to all request types. | ||
| string request_type = 1; |
api/filter/rate_limit.proto
Outdated
| @@ -2,6 +2,19 @@ syntax = "proto3"; | |||
|
|
|||
| package envoy.api.v2.filter; | |||
There was a problem hiding this comment.
Should we put network filters in a network subdirectory and http filters in an http subdirectory? Seems like it would be cleaner? If you think that is good do you want to just move stuff around and then do the cleanup required in envoy for HTTP connection manager?
There was a problem hiding this comment.
PTAL.. shuffled stuff around
There was a problem hiding this comment.
I think changing package names can be considered an API breaking change. I don't think we can do this with HTTP connection manager..
There was a problem hiding this comment.
if its just envoymanager using it, and may be @timperret, we could do the right thing now than later?
There was a problem hiding this comment.
@htuch is it really breaking? Since we aren't using Any yet, and just Struct, it seems like this would be a compile only issue and not a wire issue? If so we should clean it up?
There was a problem hiding this comment.
It's breaking in the sense that a .proto for the filter config that wasn't encoded via Struct would break I think, but OK, I don't mind relaxing this one.
| // yet established. In that case, the connect timeout on the cluster | ||
| // will govern the timeout until the connection is ready. | ||
| google.protobuf.Duration op_timeout = 1; | ||
| } |
There was a problem hiding this comment.
+1 (I think there are a few other instances of this)
api/filter/transcoder.proto
Outdated
| // to Envoy over HTTP and get proxied to a gRPC service. The HTTP mapping | ||
| // for the gRPC service has to be defined by custom options, defined in | ||
| // https://cloud.google.com/service-management/reference/rpc/google.api#http | ||
| message gRPCJSONTranscoder { |
There was a problem hiding this comment.
nit: This is hard to read. I would do gRPCJsonTranscoder
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
|
addressed all comments. @mattklein123 / @htuch |
| ConnPoolSettings settings = 3; | ||
|
|
||
| // Redis connection pool settings | ||
| message ConnPoolSettings { |
There was a problem hiding this comment.
move message definition before usage?
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
|
|
||
| // Control options for response json. These options are passed directly | ||
| // to JsonPrintOptions. | ||
| message PrintOptions { |
There was a problem hiding this comment.
same here, message before usage
ci/ci_steps.sh
Outdated
|
|
||
| # Lint travis file. | ||
| travis lint .travis.yml --skip-completion-check | ||
| #travis lint .travis.yml --skip-completion-check |
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
|
@rshriram sorry can you quickly add the new option here: https://github.com/envoyproxy/envoy/pull/1931/files |
Signed-off-by: Shriram Rajagopalan <[email protected]>
|
done |
Signed-off-by: Shriram Rajagopalan <[email protected]>
| // | ||
| // * Note:* The fault injection filter must be inserted before any | ||
| // other filter, including the router filter. | ||
| message HTTPFault { |
There was a problem hiding this comment.
Can we move the common stuff in this file into a common fault.proto? Maybe just api/filter/fault.proto? It's a bit weird that we import http/fault.proto into mongo filter, and will eventually have a L4 fault filter also, etc.
api/filter/README.md
Outdated
| These specifications will be added in the near future. In the interim, you can still supply plain JSON configuration objects | ||
| for these missing filters by setting the `"deprecated_v1"` field to true in the filter's configuration. For example, | ||
| If a filter configuration is not captured in the proto specification, you can still supply plain JSON configuration objects | ||
| for such filters by setting the `"deprecated_v1"` field to true in the filter's configuration. For example, |
| // Specifies whether the filter operates in pass through mode or not. REQUIRED. | ||
| google.protobuf.BoolValue pass_through_mode = 1; | ||
| // Specifies the incoming HTTP endpoint that should be considered the | ||
| // health check endpoint. For example /healthcheck. |
There was a problem hiding this comment.
required? if not what is default?
There was a problem hiding this comment.
This was @htuch’s suggestion. The required term is only for internal tracking, when we have wrapper types and the field is required. It’s not a doc level indicator of required fields. I can do that but better done in separate PR as one full sweep, with no proto change.
There was a problem hiding this comment.
OK that's fine we can fix this in a follow up.
mattklein123
left a comment
There was a problem hiding this comment.
LGTM after small comments. There are probably small issues but we can fix them when you start to implement / we can add constraints when constraints are ready.
Signed-off-by: Shriram Rajagopalan <[email protected]>
|
@rshriram per offline discussion something here is busted with Travis. We can wait till tomorrow or just go ahead and convert to Circle which should be very fast. (You can copy envoy Circle config mostly). I can turn on Circle builds for this repo if you want to switch it. |
Signed-off-by: Shriram Rajagopalan <[email protected]>
|
@mattklein123 added circle CI config. We can remove travis once circle CI is running smoothly. |
|
@rshriram I turned Circle on but tests failed on mater. Can you take a look and fix in a new PR? |
HTTP Fault filter, Redis filter, tcp fault filter are new.
Also added faults to Mongo filter.
Signed-off-by: Shriram Rajagopalan [email protected]
cc @htuch @danielhochman (verify if stuff in Redis looks correct).
I don't know the conventions you folks have been following. So, let me know if I missed something..