Skip to content

Adding missing filters (http fault, redis, mongo, tcp, rate limits, etc..)#192

Merged
mattklein123 merged 34 commits intoenvoyproxy:masterfrom
rshriram:master
Oct 25, 2017
Merged

Adding missing filters (http fault, redis, mongo, tcp, rate limits, etc..)#192
mattklein123 merged 34 commits intoenvoyproxy:masterfrom
rshriram:master

Conversation

@rshriram
Copy link
Copy Markdown
Member

HTTP Fault filter, Redis filter, tcp fault filter are new.
Also added faults to Mongo filter.

Signed-off-by: Shriram Rajagopalan [email protected]

cc @htuch @danielhochman (verify if stuff in Redis looks correct).
I don't know the conventions you folks have been following. So, let me know if I missed something..

Shriram Rajagopalan added 2 commits October 13, 2017 16:14
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
import "google/protobuf/wrappers.proto";

// Delay specification is used to inject latency into the rpc operations.
message Delay {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this is implemented, except for the exponential faults.. It was there in Istio spec. Just thought I would add it here as well. LMK if you want me to remove it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either remove or add a tracking issue at https://github.com/envoyproxy/envoy for implementation side (and reference here).

// applied to the following MongoDB operations: Query, Insert, GetMore,
// and KillCursors. Once an active delay is in progress, all incoming
// data up until the timer event fires will be a part of the delay.
Delay delay = 3;
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its already implemented.. Just needs remapping config fields in the code..

// connection establishment.
google.protobuf.Duration terminate_after_period = 2;
}
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not exist at all :P. Thought I would get this for free when I move the mongo fault into TCP, but then Mongo does L7 style fault (on operations, instead of raw bytes as this filter proposes). My next task is to streamline mongo fault, implement TCP fault and possibly redis fault if its easy.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for tcp, I am signing up (currently) only for delay fault, and possibly simple abort fault. Not the fancy abort after period.

api/filter/BUILD Outdated
api_proto_library(
name = "mongo_proxy",
srcs = ["mongo_proxy.proto"],
deps = [
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deps = [":http_fault"], for singleton. Please run buildifier over this file.

// Delay specification is used to inject latency into the rpc operations.
message Delay {
// Delay type to use (fixed|exponential|..). Currently, only fixed delay (step function) is supported.
string type = 1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not enum?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going by what was defined in the configs (where type was a string). I guess Enum seems more sensible given we are doing protos.

import "google/protobuf/wrappers.proto";

// Delay specification is used to inject latency into the rpc operations.
message Delay {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either remove or add a tracking issue at https://github.com/envoyproxy/envoy for implementation side (and reference here).

google.protobuf.UInt32Value percent = 2;

oneof http_delay_type {
// Add a fixed delay before forwarding the operation upstream. Format: 1h/1m/1s/1ms. MUST be >=1ms.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no format for Durations..

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. But this was intentional. For the doc-gen.. Coz it does not tell people that Duration is just a string and that people can specify 1h/1m/1s. When we generated docs for Istio, this was one of the issues. So, I went back to all protos, and manually added this, so that people reading docs knew what values to put.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duration is only a string in JSON/YAML form. We will eventually be generating docs for {proto3, JSON, YAML}, so this interpretation only applies to some readings. I'm surprised that docgen can't inline the documentations from Duration, which is where you would expect to find this kind of explanation.

I think a better way to put it might be "See https://developers.google.com/protocol-buffers/docs/proto3#json for the JSON/YAML Duration mapping.".


oneof http_delay_type {
// Add a fixed delay before forwarding the operation upstream. Format: 1h/1m/1s/1ms. MUST be >=1ms.
google.protobuf.Duration fixed_delay = 2;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has field number 2, but so does percent above. oneof fields share the same number space as the fields in the message around them.

// network issues, overloaded upstream service, etc.
Delay delay = 1;

// Abort Http request attempts and return error codes back to downstream
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/Http/HTTP/

// gRPC status code to use to abort a gRPC request. NOT IMPLEMENTED
string grpc_status = 2 ;
// HTTP2 error code used to abort a Http2 request. NOT IMPLEMENTED
string http2_error = 3 ;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't these error codes? Why don't HTTP/HTTP2 share a code (since they have the same status code space)?

// percentage of connections to throttle.
float percent = 1;
// bandwidth limit in "bits" per second between downstream and Envoy
int64 downstream_limit_bps = 2;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these signed? Why aren't they wrapped, e.g. google.protobuf.UInt32Value?


// Alternatively, we could wait for a certain number of bytes to be
// transferred to upstream before throttling the bandwidth.
double throttle_after_bytes = 5;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're waiting for a fractional byte? :)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You gots no idea on the number of eyes this got.. We looked into a time machine and decided to design for the next 4000 years of prosperity

// established, emulating remote server crash or link failure.
message Terminate {
// percentage of established Tcp connections to be terminated/reset
float percent = 1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should be consistent across filters in how percentages are represented (stick with what we do in eds.proto).

Shriram Rajagopalan added 3 commits October 13, 2017 17:47
Signed-off-by: Shriram Rajagopalan <[email protected]>
nit
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
nit
Signed-off-by: Shriram Rajagopalan <[email protected]>
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good, a few more things.

import "google/protobuf/struct.proto";
import "google/protobuf/wrappers.proto";

// Delay specification is used to inject latency into the rpc/TCP proxy operations.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/rpc/request/ (consistency with abort below).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used RPC explicitly coz this might be used in Mongo/Redis/TCP/HTTP. May be I should use a different term than generalizing a query as an RPC

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then capitalize it RPC :)

// Applicable only for HTTP connections.
oneof error_type {
// HTTP status code to use to abort the HTTP request.
int32 http_status = 2;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint32

// HTTP status code to use to abort the HTTP request.
int32 http_status = 2;
// gRPC status code to use to abort the gRPC request.
int32 grpc_status = 3;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been implemented? Have you some description of how we decide to do a gRPC fault rather than an HTTP one?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has not been implemented. I thought it would be easy to add. May be I should take it off for now..


// If specified, the filter will abort requests based on the values in
// the object. At least abort or delay must be specified.
Abort abort = 2;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there ever a situation in which this or delay would be a repeated?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. My answer would be no. The filters can be stacked (i.e. multiple fault filters), such that the matching filter will activate. Changing this into an array would require changes to the filter implementation to process a sequence of faults rather than assume that each filter is responsible for one delay + abort.

// injection filter can be applied selectively to requests that match a
// set of headers specified in the fault filter config. The chances of
// actual fault injection further depend on the values of abort_percent
// and fixed_delay_percent parameters.The filter will check the request’s
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s. The

// exception to this behavior is when a connection to a backend is not
// yet established. In that case, the connect timeout on the cluster
// will govern the timeout until the connection is ready.
google.protobuf.Duration op_timeout = 3;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be numbered 1.

Shriram Rajagopalan added 2 commits October 17, 2017 00:39
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
@mattklein123
Copy link
Copy Markdown
Member

Once we define these, is someone going to go through and do the work to wire them up so that we use them?

@rshriram
Copy link
Copy Markdown
Member Author

Yea, I will do it. I just wanted all configs to be in proto form, as it makes our v2 migration easier in Istio.

@mattklein123
Copy link
Copy Markdown
Member

OK, cool. Do you want to just add them all then? We are missing router, buffer, etc.

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits then LGTM.

import "google/protobuf/struct.proto";
import "google/protobuf/wrappers.proto";

// Delay specification is used to inject latency into the rpc/TCP proxy operations.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then capitalize it RPC :)

// yet established. In that case, the connect timeout on the cluster
// will govern the timeout until the connection is ready.
google.protobuf.Duration op_timeout = 1;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It's more readable if you put this above settings.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 (I think there are a few other instances of this)

// Applicable only for HTTP connections.
oneof error_type {
// HTTP status code to use to abort the HTTP request.
uint32 http_status = 2;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

google.protobuf.UInt32Value?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been in two minds on this, and have stopped objecting to unwrapped types (and in some cases doing it myself) when either: it's (1) a 100% compulsory field and we don't need defaulting behavior or (2) it's embedded under something like oneof where we can test presence/absence.

It's pretty ugly in proto IMHO that we don't just have aliases for wrapped types that express this, e.g. optional_uint32, rather than the full google.protobuf.UInt32Value.

Shriram Rajagopalan added 2 commits October 23, 2017 10:54
Signed-off-by: Shriram Rajagopalan <[email protected]>
@rshriram
Copy link
Copy Markdown
Member Author

@htuch / @mattklein123 I have added a bunch of missing filters (buffer, router, ip tagging, client_ssl auth, grpc-json transcoder). These filters have configuration. Others didn't have any configuration.

Also, the missing pieces: rate limit filter proto seems different from the ones in http/tcp. Health check filter is not defined, as I am not sure of its status in the v2 api. Apart from these, all other filters have been defined.

@mattklein123
Copy link
Copy Markdown
Member

We need all the filters that have JSON schema configs, including TCP/HTTP RL and HC. Please take a look at the JSON schemas.

Signed-off-by: Shriram Rajagopalan <[email protected]>
@rshriram rshriram changed the title Adding fault filters (http, redis, mongo, tcp) Adding missing filters (http fault, redis, mongo, tcp, rate limits, etc..) Oct 23, 2017
@rshriram
Copy link
Copy Markdown
Member Author

Added RL, and HC.

// Fixed delay (step function).
FIXED = 0;
// Exponential delay.
EXPONENTIAL = 1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you going to implement this soon? If not, can we just remove for now?

// x-envoy-internal is set to true. If x-envoy-internal is not set or
// false, a request is considered external. The filter defaults to both,
// and it will apply to all request types.
string request_type = 1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this an enum

@@ -2,6 +2,19 @@ syntax = "proto3";

package envoy.api.v2.filter;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put network filters in a network subdirectory and http filters in an http subdirectory? Seems like it would be cleaner? If you think that is good do you want to just move stuff around and then do the cleanup required in envoy for HTTP connection manager?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL.. shuffled stuff around

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think changing package names can be considered an API breaking change. I don't think we can do this with HTTP connection manager..

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if its just envoymanager using it, and may be @timperret, we could do the right thing now than later?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch is it really breaking? Since we aren't using Any yet, and just Struct, it seems like this would be a compile only issue and not a wire issue? If so we should clean it up?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's breaking in the sense that a .proto for the filter config that wasn't encoded via Struct would break I think, but OK, I don't mind relaxing this one.

// yet established. In that case, the connect timeout on the cluster
// will govern the timeout until the connection is ready.
google.protobuf.Duration op_timeout = 1;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 (I think there are a few other instances of this)

// to Envoy over HTTP and get proxied to a gRPC service. The HTTP mapping
// for the gRPC service has to be defined by custom options, defined in
// https://cloud.google.com/service-management/reference/rpc/google.api#http
message gRPCJSONTranscoder {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This is hard to read. I would do gRPCJsonTranscoder

Shriram Rajagopalan added 3 commits October 23, 2017 22:13
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
Shriram Rajagopalan added 4 commits October 24, 2017 17:03
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
@rshriram
Copy link
Copy Markdown
Member Author

addressed all comments. @mattklein123 / @htuch

ConnPoolSettings settings = 3;

// Redis connection pool settings
message ConnPoolSettings {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move message definition before usage?

Shriram Rajagopalan added 3 commits October 24, 2017 17:28
Signed-off-by: Shriram Rajagopalan <[email protected]>
fix
Signed-off-by: Shriram Rajagopalan <[email protected]>

// Control options for response json. These options are passed directly
// to JsonPrintOptions.
message PrintOptions {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, message before usage

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

ci/ci_steps.sh Outdated

# Lint travis file.
travis lint .travis.yml --skip-completion-check
#travis lint .travis.yml --skip-completion-check
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Shriram Rajagopalan added 3 commits October 24, 2017 17:56
fix
Signed-off-by: Shriram Rajagopalan <[email protected]>
Signed-off-by: Shriram Rajagopalan <[email protected]>
fix
Signed-off-by: Shriram Rajagopalan <[email protected]>
@mattklein123
Copy link
Copy Markdown
Member

Signed-off-by: Shriram Rajagopalan <[email protected]>
@rshriram
Copy link
Copy Markdown
Member Author

done

Signed-off-by: Shriram Rajagopalan <[email protected]>
//
// * Note:* The fault injection filter must be inserted before any
// other filter, including the router filter.
message HTTPFault {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move the common stuff in this file into a common fault.proto? Maybe just api/filter/fault.proto? It's a bit weird that we import http/fault.proto into mongo filter, and will eventually have a L4 fault filter also, etc.

These specifications will be added in the near future. In the interim, you can still supply plain JSON configuration objects
for these missing filters by setting the `"deprecated_v1"` field to true in the filter's configuration. For example,
If a filter configuration is not captured in the proto specification, you can still supply plain JSON configuration objects
for such filters by setting the `"deprecated_v1"` field to true in the filter's configuration. For example,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: reflow to 100 cols

// Specifies whether the filter operates in pass through mode or not. REQUIRED.
google.protobuf.BoolValue pass_through_mode = 1;
// Specifies the incoming HTTP endpoint that should be considered the
// health check endpoint. For example /healthcheck.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

required? if not what is default?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was @htuch’s suggestion. The required term is only for internal tracking, when we have wrapper types and the field is required. It’s not a doc level indicator of required fields. I can do that but better done in separate PR as one full sweep, with no proto change.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK that's fine we can fix this in a follow up.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after small comments. There are probably small issues but we can fix them when you start to implement / we can add constraints when constraints are ready.

Signed-off-by: Shriram Rajagopalan <[email protected]>
@mattklein123
Copy link
Copy Markdown
Member

@rshriram per offline discussion something here is busted with Travis. We can wait till tomorrow or just go ahead and convert to Circle which should be very fast. (You can copy envoy Circle config mostly). I can turn on Circle builds for this repo if you want to switch it.

Signed-off-by: Shriram Rajagopalan <[email protected]>
@rshriram
Copy link
Copy Markdown
Member Author

@mattklein123 added circle CI config. We can remove travis once circle CI is running smoothly.

@mattklein123 mattklein123 merged commit f2c1dc9 into envoyproxy:master Oct 25, 2017
@mattklein123
Copy link
Copy Markdown
Member

@rshriram I turned Circle on but tests failed on mater. Can you take a look and fix in a new PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants