New iomgr implementation backed by the EventEngine API#26026
New iomgr implementation backed by the EventEngine API#26026drfloob merged 159 commits intogrpc:masterfrom
Conversation
This required modifying code to avoid redefinition of `struct iovec` on platforms that already offer it.
This may be temporary. iOS isn't officially supported by libuv. There is code in libuv for iOS, and a few PRs to make it build.
This macro may outlive iomgr, whereas the existing one's may not. Eventually, maybe the alts/crypt code can be refactored.
ctiller
left a comment
There was a problem hiding this comment.
Reviewed 1 of 30 files at r13, 1 of 52 files at r15, 7 of 21 files at r16.
Reviewable status: all files reviewed, 39 unresolved discussions (waiting on @drfloob and @markdroth)
include/grpc/event_engine/event_engine.h, line 282 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please document.
+1, and hoping we don't actually need it.
(I'd much rather change whatever is using this than add this to the EE API)
drfloob
left a comment
There was a problem hiding this comment.
Thanks Mark! I've left a few pieces of feedback unaddressed, and I'm replying now with what I have since this PR is being debated on both Reviewable and Github. I'd like to note that some of these changes will essentially need to be done twice when we merge them with the uv-ee impl branch, so we should consider which things may be deferred for the next PR to make life easier.
Reviewable status: all files reviewed, 39 unresolved discussions (waiting on @drfloob, @markdroth, and @nicolasnoble)
include/grpc/event_engine/event_engine.h, line 282 at r17 (raw file):
Previously, ctiller (Craig Tiller) wrote…
+1, and hoping we don't actually need it.
(I'd much rather change whatever is using this than add this to the EE API)
Pinging @nicolasnoble for libuv implementation details.
include/grpc/event_engine/event_engine.h, line 286 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This first sentence is not necessary. The
EndpointConfigAPI doesn't even allow the EE impl to tell if that's the case, so it can't possibly care.
Removed. See below.
include/grpc/event_engine/event_engine.h, line 291 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Why is this method needed? Why not just have
CreateListener()orConnect()return a failure if they receive invalid settings?In general, it's not clear to me how this API can work, since the implementation of this method will not know where the
EndpointConfigis going to be used. For example, let's say that an EE impl needs one argument when creating client connections but a different argument when creating a listener on the server side. This method doesn't know which of those things theEndpointConfigis going to be used for, so it can't know whether the config is valid for the use-case it's going to be used for.
Removed. The idea behind this method is to be an isolated, self-documenting bit of code that asserts which config settings some specific EE implementation supports. Given that the EventEngine API is public, this gives EE authors a way to customize their engine's behavior for use outside gRPC internals (alternatively, as we've discussed, they could add methods like CustomConnect and CustomCreateListener). It's not a robust solution for config validation, as you pointed out, but it would replace some code comments with executable code that can be tested and whatnot. I'm not married to it, it does expand the API surface for something that is not absolutely necessary ... I'm not sure we'd use it in gRPC internal code. If that reasoning changes your mind at all, let me know, otherwise it'll stay gone.
include/grpc/event_engine/event_engine.h, line 336 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This needs to be documented.
Also, I suggest renaming it to something like
CreateDefaultEventEngine().
Pinging @nicolasnoble . Nico felt this was a more appropriate name. I believe the main argument was that this method could be creating engines, returning a singleton, or more creative things (returning some shared engine based on the thread id, for example).
This part of the API has evolved a bit since we started the uv-ee implementation. We don't have a final story for how custom default EventEngines will be provided by users, which is why I removed the documentation. We discussed having application code call something like SetDefaultEventEngineFactory before grpc init to override the gRPC-provided default libuv-ee factory. Do you think we need to firm up that story now, or can we revisit this once we've moved on from focusing on the iomgr and reference impl code?
include/grpc/impl/codegen/port_platform.h, line 679 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think this needs to be a C-style comment, since this is a public header for C-core.
Interesting. I've made the change, but there are a handful of C++-style comments in this file. I'm guessing nobody has needed to compile gRPC with a compiler that lacks C99 support. Shall we consider dropping support for C89, enforcing C99 compatibility?
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_ev_driver_event_engine.cc, line 24 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I'll resolve this for now, but as we've discussed, I think we'll need to move the c-ares code into iomgr before we can migrate to EventEngine.
Totally agreed. Thanks!
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 531 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Why is this additional variable needed?
ev_driver is an output argument, so it must be here. The code beautification below is possible because of the driver variable.
src/core/lib/event_engine/endpoint_config_internal.h, line 32 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for the trailing semicolon.
Also, please add a space before the
{.
Done. Interesting that no linter/tidy/formatter complained.
src/core/lib/event_engine/endpoint_config_internal.h, line 33 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This isn't necessary.
Which part isn't necessary? Reviewable only highlights the whole line. If you mean the const on the return type, I had already removed that, but must not have pushed the change.
src/core/lib/event_engine/endpoint_config_internal.h, line 37 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need to initialize to nullptr, since it will always be set by the ctor.
Removed. However, I feel it's a friendly safety net for other developers if we set default values for objects that might otherwise be garbage. For example, someone may alter this code and unknowingly invalidate the assumption that args_ is always set on construction.
src/core/lib/event_engine/event_engine.cc, line 76 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please remove blank lines within functions.
Removed this function.
src/core/lib/event_engine/slice_allocator.cc, line 51 at r13 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Seems like this TODO is still needed. :)
I implemented this in the uv-ee branch where we're working on the libuv implementation, since it was needed there. PR for that work will follow this one.
src/core/lib/gprpp/sync.h, line 228 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Maybe this should be moved into the EE code for now, since it isn't a general-purpose thing? That way, we don't risk other code starting to depend on this, making it harder to introduce the real thing later.
Good idea. Done.
src/core/lib/iomgr/event_engine/closure.h, line 14 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
It looks like there are a bunch of header files in this PR where you've moved this include outside of the
#ifndefguard for the header file. Why is that needed? It seems inconsistent with how we've always done this.
My mistake, it isn't necessary. port_platform.h needed to be included before we test for #ifdef GRPC_USE_EVENT_ENGINE, and I reordered these as well. Fixed.
src/core/lib/iomgr/event_engine/closure.cc, line 22 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should be declared in a header file and included here, rather than hard-coding a forward declaration like this.
Done.
src/core/lib/iomgr/event_engine/iomgr.cc, line 35 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please use C++-style comments.
Done
src/core/lib/iomgr/event_engine/iomgr.cc, line 35 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Why is this tracer needed here? Shouldn't this be supported only for the posix iomgr, since it's the only one that supports polling engines?
Required by lockfree_event.cc. It's defined in the Windows iomgr as well, probably for the same reason. LockFreeEvent could likely be cleaned up to avoid this.
src/core/lib/iomgr/event_engine/pollset.cc, line 72 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
If this symbol is going to be visible outside of this file, then it needs to have a
grpc_prefix or be in thegrpc_corenamespace.
Done.
src/core/lib/iomgr/event_engine/resolver.cc, line 89 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please use
grpc_error_handleinstead ofgrpc_error*.
Done.
src/core/lib/iomgr/event_engine/resolver.cc, line 95 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please use
grpc_error_handleinstead ofgrpc_error*.
Done.
src/core/lib/iomgr/event_engine/tcp.cc, line 41 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should be declared in a header file rather than being forward-declared everywhere.
Done.
src/core/lib/iomgr/event_engine/tcp.cc, line 109 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please call this
endpoint_config.
Done.
src/core/lib/iomgr/event_engine/tcp.cc, line 124 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please call this
endpoint_config.
Done.
src/core/lib/surface/init.h, line 36 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
The style guide prohibits global variables of types that are not trivially destructible, which I believe includes
std::shared_ptr<>.
Thanks. This was temporary code used in the uv-ee branch.
src/core/lib/surface/init.h, line 37 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't think this should be in this file. I think it belongs in src/core/lib/iomgr/event_engine/iomgr.cc.
I've removed the definition here and punted to DefaultEventEngineFactory. We need to iron out the appropriate semantics around default EventEngines. Having a singleton EventEngine was a quick way to get the uv-ee impl up and running, but is not a long-term solution as is. We can continue to evolve that in the uv-ee branch, if my fix here is sufficient for now.
src/core/lib/surface/init.cc, line 152 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should be done in
iomgr_platform_init()in src/core/lib/iomgr/event_engine/iomgr.cc, not here.
Removed. See my comment regarding default EE singletons.
src/core/lib/surface/init.cc, line 191 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Similarly, this code should be moved into
iomgr_platform_shutdown()in src/core/lib/iomgr/event_engine/iomgr.cc.
Removed. See my comment regarding default EE singletons.
test/core/event_engine/endpoint_config_test.cc, line 32 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Can just say:
grpc_channel_args args = {1, &arg};Then there's no need to call
grpc_channel_args_destroy()below.
Excellent, done.
include/grpc/event_engine/endpoint_config.h, line 27 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
a/An/A/
Doh. Fixed.
include/grpc/event_engine/endpoint_config.h, line 27 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think this comment needs to be a bit more detailed. I suggest something like this:
A set of parameters used to configure an endpoint, either when initiating a new connection on the client side or when listening for incoming connections on the server side. An EndpointConfig contains a set of zero or more Settings. Each setting has a unique name, which can be used to fetch that Setting via the Get() method. Each Setting has a value, which can be an integer, string, or void pointer. Each EE impl should define the set of Settings that it supports being passed into it, along with the corresponding type.
Added verbatim. Thanks.
include/grpc/event_engine/endpoint_config.h, line 29 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't understand what this comment means. There is nowhere in this interface definition where we are passing any pointers to the class. (That may happen in implementations of this interface, but it has nothing to do with the interface definition itself.)
Is this trying to say that the caller will not take ownership of a
Settingof typevoid*when theSettingis returned fromGet()? If so, just say that (and move the comment to be right above theGet()method).
Right, that comment was more meaningful when there was a Set method. The intention was to document that the EndpointConfig itself is not responsible for cleanup of any of the objects pointed to by the void* values it contains. I've changed it to say that, let me know what you think.
include/grpc/event_engine/endpoint_config.h, line 32 at r17 (raw file):
Do we need absl::monostate here? I think the other types are default-constructible, so it's not clear to me why we need this.
Right, absl::monostate lets us differentiate between "setting does not exist" and "setting exists with default value". I've updated the documentation. Also note that in endpoint_config_test, this is exercised in the ReturnsMonostateForMissingKeys test.
Is this approach better than having Get() return absl::optional<absl::variant<int, std::string, void*>>?
I believe using monostate is the canonical way to accomplish an "optional/empty variant", that's its purpose. We may not run into any of these use cases that make much difference (std::visit usage, for example), but I believe we should stick with monostate.
include/grpc/event_engine/endpoint_config.h, line 32 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should probably use
absl::string_viewinstead ofstd::string, to avoid making unnecessary copies.
Since grpc_channel_args are immutable, they may be destroyed when updated. The grpc_channel_args behind our ChannelArgsEndpointConfig stores char* values, and I had decided to return a string to avoid lifetime issues. That may not be a problem in practice, so I've switched it to using a string_view here, and we can see if asan finds anything funky as we develop the engine. Reconsidering it a bit, I think that if the underlying grpc_channel_args is updated in any way after the ChannelArgsEndpointConfig obj is created, we'll either be using a garbage pointer within ChannelArgsEndpointConfig, or it may be referring to stale channel args values. So in any case, a string_view may suffice here.
include/grpc/event_engine/endpoint_config.h, line 34 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
The dtor should be defined before methods, as per the style guide.
Thanks, Done.
ctiller
left a comment
There was a problem hiding this comment.
Reviewable status: 37 of 70 files reviewed, 39 unresolved discussions (waiting on @ctiller, @drfloob, @markdroth, and @nicolasnoble)
include/grpc/impl/codegen/port_platform.h, line 679 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
Interesting. I've made the change, but there are a handful of C++-style comments in this file. I'm guessing nobody has needed to compile gRPC with a compiler that lacks C99 support. Shall we consider dropping support for C89, enforcing C99 compatibility?
Could our c89 compatibility test be broken?
nicolasnoble
left a comment
There was a problem hiding this comment.
Reviewable status: 37 of 70 files reviewed, 39 unresolved discussions (waiting on @ctiller, @drfloob, @markdroth, and @nicolasnoble)
include/grpc/event_engine/event_engine.h, line 282 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
Pinging @nicolasnoble for libuv implementation details.
Basically temporary measure for
grpc/src/core/lib/surface/init.cc
Lines 220 to 222 in d7954e8
nicolasnoble
left a comment
There was a problem hiding this comment.
Reviewable status: 37 of 70 files reviewed, 39 unresolved discussions (waiting on @ctiller, @drfloob, and @markdroth)
include/grpc/event_engine/event_engine.h, line 336 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
Pinging @nicolasnoble . Nico felt this was a more appropriate name. I believe the main argument was that this method could be creating engines, returning a singleton, or more creative things (returning some shared engine based on the thread id, for example).
This part of the API has evolved a bit since we started the uv-ee implementation. We don't have a final story for how custom default EventEngines will be provided by users, which is why I removed the documentation. We discussed having application code call something like
SetDefaultEventEngineFactorybefore grpc init to override the gRPC-provided default libuv-ee factory. Do you think we need to firm up that story now, or can we revisit this once we've moved on from focusing on the iomgr and reference impl code?
I think we coined the word during one of our weekly meetings. This was still in flux as we were discussing how external users can handle creating the same event engine as the one and only we'd ship, and how they'd pass it down during or before grpc_init.
nicolasnoble
left a comment
There was a problem hiding this comment.
Reviewable status: 37 of 70 files reviewed, 39 unresolved discussions (waiting on @ctiller, @drfloob, and @markdroth)
include/grpc/impl/codegen/port_platform.h, line 679 at r17 (raw file):
Previously, ctiller (Craig Tiller) wrote…
Could our c89 compatibility test be broken?
I just checked; yes it's been broken for a little while now.
markdroth
left a comment
There was a problem hiding this comment.
I'm fine with making additional changes in the follow-up PR, but we should keep things in a consistent state in this PR, so there are still a few things I'd like to see cleaned up before we can merge this.
Please let me know if you have any questions. Thanks!
Reviewed 33 of 33 files at r18.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @ctiller, @drfloob, and @nicolasnoble)
include/grpc/event_engine/event_engine.h, line 282 at r17 (raw file):
Previously, nicolasnoble (Nicolas Noble) wrote…
Basically temporary measure for
until we have a better way of doing this. This still needs to be designed appropriately.grpc/src/core/lib/surface/init.cc
Lines 220 to 222 in d7954e8
Craig, is there a particular reason why you wouldn't want this here? I think even our internal EventManager API has something like this, so it seems fairly common. And it's not immediately obvious to me how it could be abused.
include/grpc/event_engine/event_engine.h, line 291 at r17 (raw file):
The idea behind this method is to be an isolated, self-documenting bit of code that asserts which config settings some specific EE implementation supports.
Why would a user care about that? If they pass in a setting that is not supported by the EE impl, then that setting will just be ignored, because the EE impl will not even have any way of knowing that it's there. And that would be equally true regardless of whether the validation is done in Connect() and CreateListener() or whether it's in a separate IsValidEndpointConfig() method, so I don't see how adding the latter method would actually accomplish anything.
Given that the EventEngine API is public, this gives EE authors a way to customize their engine's behavior for use outside gRPC internals (alternatively, as we've discussed, they could add methods like CustomConnect and CustomCreateListener).
Users can customize the EE behavior for callers by passing in whatever settings are supported by the EE impl. I don't see how the presence of this method would help with that. I also don't see why we'd ever need separate CustomConnect() or CustomCreateListener() methods, given that the existing Connect() and CreateListener() methods already accept an EndpointConfig.
My main point here is that I cannot actually see any use-case where a user would want to use this method. What would a user gain from calling this method that they would not get from just passing the same EndpointConfig to Connect() or CreateListener() and getting back an error? If you can give me a concrete example of where this would help, I'm happy to reconsider, but right now I don't see one.
include/grpc/event_engine/event_engine.h, line 336 at r17 (raw file):
Previously, nicolasnoble (Nicolas Noble) wrote…
I think we coined the word during one of our weekly meetings. This was still in flux as we were discussing how external users can handle creating the same event engine as the one and only we'd ship, and how they'd pass it down during or before grpc_init.
If you want to defer finalizing this, that's fine; in that case, just add a TODO here about finalizing the API and documenting it.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 531 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
ev_driveris an output argument, so it must be here. The code beautification below is possible because of thedrivervariable.
I don't think the existing code was that hard to read. And in any case, this kind of code cleanup should be done in a separate PR.
Please revert the changes to this file.
src/core/lib/event_engine/endpoint_config_internal.h, line 33 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
Which part isn't necessary? Reviewable only highlights the whole line. If you mean the
conston the return type, I had already removed that, but must not have pushed the change.
I mean the whole line is not needed. If you're not actually doing anything in the dtor, there's no need to override it.
src/core/lib/event_engine/slice_allocator.cc, line 51 at r13 (raw file):
Previously, drfloob (AJ Heller) wrote…
I implemented this in the uv-ee branch where we're working on the libuv implementation, since it was needed there. PR for that work will follow this one.
Sure, but in this PR, it's still a TODO. You can remove the TODO in the follow-up PR where you actually implement this.
src/core/lib/iomgr/event_engine/iomgr.cc, line 35 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
Required by lockfree_event.cc. It's defined in the Windows iomgr as well, probably for the same reason. LockFreeEvent could likely be cleaned up to avoid this.
Okay. Probably not worth cleaning up at this point, since all of the iomgr code is eventually going to go away.
For now, though, please add a comment indicating that this needs to be defined by all iomgr impls, even if not actually used.
src/core/lib/surface/init.h, line 36 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
Thanks. This was temporary code used in the uv-ee branch.
Note that one easy way to avoid this problem would be to do something like this in src/core/lib/iomgr/event_engine/iomgr.cc:
namespace {
// Note: This is a pointer to a shared_ptr, so it's trivially destructible.
std::shared_ptr<EventEngine>* g_event_engine;
void iomgr_platform_init() {
g_event_engine = new std::shared_ptr<EventEngine>(DefaultEventEngineFactory());
}
void iomgr_platform_shutdown() {
grpc_core::Promise<absl::Status> shutdown_status_promise;
(*g_event_engine)->Shutdown([&shutdown_status_promise](absl::Status status) {
shutdown_status_promise.Set(std::move(status));
});
auto shutdown_status = shutdown_status_promise.Get();
GPR_ASSERT(shutdown_status.ok());
delete g_event_engine;
}
} // namespace
// This can be called anywhere in the EE-based iomgr impl where we need to access the global EE instance.
EventEngine* grpc_iomgr_event_engine() {
return g_event_engine->get();
}
src/core/lib/surface/init.h, line 37 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
I've removed the definition here and punted to
DefaultEventEngineFactory. We need to iron out the appropriate semantics around default EventEngines. Having a singleton EventEngine was a quick way to get the uv-ee impl up and running, but is not a long-term solution as is. We can continue to evolve that in the uv-ee branch, if my fix here is sufficient for now.
I think having a singleton EventEngine for the EE-based iomgr impl is exactly the right solution. But that should be done in the EE-based iomgr impl, because it's the EE-based iomgr impl that needs the singleton, not the EE API itself.
I think this should be done by having the EE-based iomgr impl call DefaultEventEngineFactory() once at startup, storing the value in a global somehow, and then using that global everywhere in the EE-based iomgr impl. I don't think it makes sense to call DefaultEventEngineFactory() everywhere in the EE-based iomgr impl, because the name of that function implies that it creates a new instance, not that it returns a new ref to a singleton global instance.
I understand that the story around DefaultEventEngineFactory() may evolve before we finalize this API, and I'm fine with that. But at any given point in that evolution, we should be in a consistent state. In this PR, we should do one of two things:
- If we're currently requiring that
DefaultEventEngineFactory()always returns the same global instance, then let's rename it accordingly. - Otherwise, let's keep
DefaultEventEngineFactory()returning a new instance each time, and change the EE-based iomgr impl to store its own global instance. (See the next comment for an example of how to do this.)
include/grpc/event_engine/endpoint_config.h, line 29 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
Right, that comment was more meaningful when there was a
Setmethod. The intention was to document that the EndpointConfig itself is not responsible for cleanup of any of the objects pointed to by the void* values it contains. I've changed it to say that, let me know what you think.
I suggest removing this comment and instead adding a comment above the Get() method that just says "Caller does not take ownership of resulting value."
include/grpc/event_engine/endpoint_config.h, line 32 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
Since grpc_channel_args are immutable, they may be destroyed when updated. The grpc_channel_args behind our ChannelArgsEndpointConfig stores char* values, and I had decided to return a string to avoid lifetime issues. That may not be a problem in practice, so I've switched it to using a string_view here, and we can see if asan finds anything funky as we develop the engine. Reconsidering it a bit, I think that if the underlying grpc_channel_args is updated in any way after the ChannelArgsEndpointConfig obj is created, we'll either be using a garbage pointer within ChannelArgsEndpointConfig, or it may be referring to stale channel args values. So in any case, a string_view may suffice here.
We've already agreed that the caller does not take ownership of the returned value, so I think that's already true. For example, for a void* value, the EE impl cannot assume that the object it points to will continue to exist, unless it has taken steps to (e.g.) take a new ref to the object. I think strings can have the same semantics: the EE impl can make a copy if it needs it, but we shouldn't force the creation of a copy via the API, since there may be some cases where the EE impl doesn't need a copy.
Anyway, this looks good.
ctiller
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @drfloob, @markdroth, and @nicolasnoble)
include/grpc/event_engine/event_engine.h, line 282 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Craig, is there a particular reason why you wouldn't want this here? I think even our internal EventManager API has something like this, so it seems fairly common. And it's not immediately obvious to me how it could be abused.
Decisions based on it are going to be pretty subtle and potentially hard to debug, so if we can move the system away from needing it, I'd prefer we did that.
This code path is only for global shutdown of iomgr, which by design shouldn't interact with EventEngine, nor should it be able to access such - though I appreciate there may be an interim time where such access is necessary.
To handle this case, I'd suggest we simply launch the background cleanup thread always for event engine until we get a chance to refine that decision later?
nicolasnoble
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @drfloob, @markdroth, and @nicolasnoble)
include/grpc/event_engine/event_engine.h, line 282 at r17 (raw file):
Previously, ctiller (Craig Tiller) wrote…
Decisions based on it are going to be pretty subtle and potentially hard to debug, so if we can move the system away from needing it, I'd prefer we did that.
This code path is only for global shutdown of iomgr, which by design shouldn't interact with EventEngine, nor should it be able to access such - though I appreciate there may be an interim time where such access is necessary.
To handle this case, I'd suggest we simply launch the background cleanup thread always for event engine until we get a chance to refine that decision later?
So I tried that originally (always launching the background cleanup thread), and things started to blow up elsewhere in grpc in lots of subtle and horrifying tsan-related manners, which I suspect means it's (1) the right thing to do and (2) requires further debugging / cleanup in the rest of grpc.
I don't mind always doing the background cleanup, but the race conditions would first need to be solved at HEAD, which is totally orthogonal.
drfloob
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @drfloob and @markdroth)
include/grpc/event_engine/event_engine.h, line 336 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
If you want to defer finalizing this, that's fine; in that case, just add a TODO here about finalizing the API and documenting it.
Sounds good, I've added a TODO to address it.
src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc, line 531 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't think the existing code was that hard to read. And in any case, this kind of code cleanup should be done in a separate PR.
Please revert the changes to this file.
Reverted.
src/core/lib/event_engine/endpoint_config_internal.h, line 33 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I mean the whole line is not needed. If you're not actually doing anything in the dtor, there's no need to override it.
Dtor removed. I saw this comment attached to the Setting Get(absl::string_view key) const override; line in Reviewable. My mistake, Reviewable doesn't automatically switch to the appropriate revision when you navigate to the next discussion awaiting a response.
src/core/lib/event_engine/slice_allocator.cc, line 51 at r13 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Sure, but in this PR, it's still a TODO. You can remove the TODO in the follow-up PR where you actually implement this.
TODO added.
src/core/lib/iomgr/exec_ctx.cc, line 32 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I am really uncomfortable with this being exposed like this, since it's really an internal implementation detail of
ExecCtx. Why is this needed?Also, note that if this is needed, then we need to give it a
grpc_prefix or put it in thegrpc_corenamespace, or else we're exporting a public symbol in the global namespace that may conflict with applications.
Understood, it made me a bit uncomfortable as well. The idea is to avoid having to call ExecCtx::Run in the GrpcClosureToCallback lambda:
EventEngine::Callback GrpcClosureToCallback(grpc_closure* closure,
grpc_error_handle error) {
return [closure, error](absl::Status status) {
grpc_error_handle new_error =
grpc_error_add_child(error, absl_status_to_grpc_error(status));
exec_ctx_run(closure, new_error);
grpc_pollset_ee_broadcast_event();
};
}
When not in a debug build, exec_ctx_run just calls the closure and unrefs the error. That's trivial enough to duplicate here, instead of reusing the exec_ctx_run method. In a debug build however, exec_ctx_run also adds logic to closure.scheduled that we'd have to otherwise replicate in GrpcClosureToCallback to avoid aborts on repeated closure scheduling for example, and it adds some handy telemetry collection and logging. It seemed worth reusing that code, rather than copying it.
Caveat: this is untested, this branch hasn't been merged to the uv-ee impl branch for over a week due to the ongoing refactoring of the listener.
src/core/lib/iomgr/event_engine/endpoint.cc, line 35 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
As per my comments elsewhere, this should be declared in a header file and included here, rather than hard-coding a forward declaration like this.
Done
src/core/lib/iomgr/event_engine/endpoint.cc, line 190 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Why are we setting this here instead of letting it be handled in
endpoint_get_peer()?
It solves a lifetime issue: the resource_user doesn't make a copy of the name passed to it (see the line below this one).
src/core/lib/iomgr/event_engine/iomgr.cc, line 35 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Okay. Probably not worth cleaning up at this point, since all of the iomgr code is eventually going to go away.
For now, though, please add a comment indicating that this needs to be defined by all iomgr impls, even if not actually used.
Done.
src/core/lib/iomgr/event_engine/pollset.cc, line 26 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This need to be of type
gpr_cvinstead ofgpr_mu. (Looks like sync_abseil.h defines them both as intptr_t, which is why the compiler isn't complaining.)
ooh, good catch. Done.
src/core/lib/surface/init.h, line 36 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Note that one easy way to avoid this problem would be to do something like this in src/core/lib/iomgr/event_engine/iomgr.cc:
namespace { // Note: This is a pointer to a shared_ptr, so it's trivially destructible. std::shared_ptr<EventEngine>* g_event_engine; void iomgr_platform_init() { g_event_engine = new std::shared_ptr<EventEngine>(DefaultEventEngineFactory()); } void iomgr_platform_shutdown() { grpc_core::Promise<absl::Status> shutdown_status_promise; (*g_event_engine)->Shutdown([&shutdown_status_promise](absl::Status status) { shutdown_status_promise.Set(std::move(status)); }); auto shutdown_status = shutdown_status_promise.Get(); GPR_ASSERT(shutdown_status.ok()); delete g_event_engine; } } // namespace // This can be called anywhere in the EE-based iomgr impl where we need to access the global EE instance. EventEngine* grpc_iomgr_event_engine() { return g_event_engine->get(); }
Thanks. I've added something nearly identical, with the main difference being the signature change to std::shared_ptr<EventEngine> grpc_iomgr_event_engine(). Even though we have a singleton default EventEngine, the iomgr code expects to be working with shared_ptrs; see iomgr/event_engine/tcp.cc, particularly when endpoints need to take refs on user-supplied EventEngines ... once we have a mechanism to support them, that is.
src/core/lib/surface/init.h, line 37 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think having a singleton EventEngine for the EE-based iomgr impl is exactly the right solution. But that should be done in the EE-based iomgr impl, because it's the EE-based iomgr impl that needs the singleton, not the EE API itself.
I think this should be done by having the EE-based iomgr impl call
DefaultEventEngineFactory()once at startup, storing the value in a global somehow, and then using that global everywhere in the EE-based iomgr impl. I don't think it makes sense to callDefaultEventEngineFactory()everywhere in the EE-based iomgr impl, because the name of that function implies that it creates a new instance, not that it returns a new ref to a singleton global instance.I understand that the story around
DefaultEventEngineFactory()may evolve before we finalize this API, and I'm fine with that. But at any given point in that evolution, we should be in a consistent state. In this PR, we should do one of two things:
- If we're currently requiring that
DefaultEventEngineFactory()always returns the same global instance, then let's rename it accordingly.- Otherwise, let's keep
DefaultEventEngineFactory()returning a new instance each time, and change the EE-based iomgr impl to store its own global instance. (See the next comment for an example of how to do this.)
Good points. Responded in the next comment.
include/grpc/event_engine/endpoint_config.h, line 29 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I suggest removing this comment and instead adding a comment above the
Get()method that just says "Caller does not take ownership of resulting value."
Done.
include/grpc/event_engine/endpoint_config.h, line 32 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
We've already agreed that the caller does not take ownership of the returned value, so I think that's already true. For example, for a
void*value, the EE impl cannot assume that the object it points to will continue to exist, unless it has taken steps to (e.g.) take a new ref to the object. I think strings can have the same semantics: the EE impl can make a copy if it needs it, but we shouldn't force the creation of a copy via the API, since there may be some cases where the EE impl doesn't need a copy.Anyway, this looks good.
Thanks!
drfloob
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @drfloob and @markdroth)
src/core/lib/iomgr/exec_ctx.cc, line 32 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
Understood, it made me a bit uncomfortable as well. The idea is to avoid having to call ExecCtx::Run in the GrpcClosureToCallback lambda:
EventEngine::Callback GrpcClosureToCallback(grpc_closure* closure, grpc_error_handle error) { return [closure, error](absl::Status status) { grpc_error_handle new_error = grpc_error_add_child(error, absl_status_to_grpc_error(status)); exec_ctx_run(closure, new_error); grpc_pollset_ee_broadcast_event(); }; }When not in a debug build,
exec_ctx_runjust calls the closure and unrefs the error. That's trivial enough to duplicate here, instead of reusing theexec_ctx_runmethod. In a debug build however,exec_ctx_runalso adds logic toclosure.scheduledthat we'd have to otherwise replicate in GrpcClosureToCallback to avoid aborts on repeated closure scheduling for example, and it adds some handy telemetry collection and logging. It seemed worth reusing that code, rather than copying it.Caveat: this is untested, this branch hasn't been merged to the uv-ee impl branch for over a week due to the ongoing refactoring of the listener.
Correction: " ... to properly abort on repeated closure scheduling for example ... ".
LMKWYT. If we do keep this, I'll rename it as you suggested.
drfloob
left a comment
There was a problem hiding this comment.
Reviewable status: 48 of 71 files reviewed, 14 unresolved discussions (waiting on @drfloob and @markdroth)
include/grpc/event_engine/event_engine.h, line 121 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should not be exposed in the API. I think we already discussed this at one point: I think we need to change the chttp2 code to track this internally so that it does not need to be passed through the EE endpoint object.
Ignoring the EventEngine for a second: given the "RU-per-Endpoint" model we have today, and given that iomgr implementations create RUs and RQs without the transport's knowledge (for example, endpoints that are created within tcp servers are a few steps removed from any chttp2 code), if we want the transport to track Endpoint-specific RUs without accessing them through the endpoints themselves, we'd probably want to invert the RU ownership and creation responsibilities in iomgr before we introduce an EventEngine at all. We'd be deprecating grpc_endpoint_get_resource_user, and likely making changes to the iomgr API around endpoint creation (grpc_tcp_client_connect and grpc_tcp_server_create at first glance) to accept RUs and some flavor of RU-tracking RQ wrapper that allow those resource objects to be managed externally and injected. This is in addition to changing the chttp2 code to track RUs/RQs.
There's more to think through, I'm happy to brainstorm it if you have ideas. Rethinking the "RU-per-Endpoint" model might be worthwhile.
As an alternative to the iomgr refactor, how would you feel about having Endpoint::GetSliceAllocator(), and adding a friend to the SliceAllocator to access the internal resource_user? It's really not ideal either, I'm aware. Chances are we couldn't remove that GetSliceAllocator method after iomgr is gone, since it'd break EventEngine implementations at that point ... I'm not sure if it'll still be experimental at that point. But that'd be a fairly small change compared to the iomgr refactor.
|
If we removed RU stuff, who would break?
…On Wed, Jun 16, 2021, 3:36 PM AJ Heller ***@***.***> wrote:
***@***.**** commented on this pull request.
*Reviewable <https://reviewable.io/reviews/grpc/grpc/26026>* status: 48
of 71 files reviewed, 14 unresolved discussions (waiting on @drfloob
<https://github.com/drfloob> and @markdroth <https://github.com/markdroth>
)
------------------------------
*include/grpc/event_engine/event_engine.h, line 121 at r17
<https://reviewable.io/reviews/grpc/grpc/26026#-McGVnK-BbgdmlIK9zas:-McLejGRA_Fk7pZ0vKW4:bnhkbbh>
(raw file
<https://github.com/grpc/grpc/blob/eb1850ae41c5ea629cb2a6641d7205bbd6dc3660/include/grpc/event_engine/event_engine.h#L121>):*
*Previously, markdroth (Mark D. Roth) wrote…*
This should not be exposed in the API. I think we already discussed this
at one point: I think we need to change the chttp2 code to track this
internally so that it does not need to be passed through the EE endpoint
object.
Ignoring the EventEngine for a second: given the "RU-per-Endpoint" model
we have today, and given that iomgr implementations create RUs and RQs
without the transport's knowledge (for example, endpoints that are created
within tcp servers are a few steps removed from any chttp2 code), if we
want the transport to track Endpoint-specific RUs without accessing them
*through* the endpoints themselves, we'd probably want to invert the RU
ownership and creation responsibilities in iomgr before we introduce an
EventEngine at all. We'd be deprecating grpc_endpoint_get_resource_user,
and likely making changes to the iomgr API around endpoint creation (
grpc_tcp_client_connect and grpc_tcp_server_create at first glance) to
accept RUs and some flavor of RU-tracking RQ wrapper that allow those
resource objects to be managed externally and injected. This is in addition
to changing the chttp2 code to track RUs/RQs.
There's more to think through, I'm happy to brainstorm it if you have
ideas. Rethinking the "RU-per-Endpoint" model might be worthwhile.
As an alternative to the iomgr refactor, how would you feel about having
Endpoint::GetSliceAllocator(), and adding a friend to the SliceAllocator
to access the internal resource_user? It's really not ideal either, I'm
aware. Chances are we couldn't remove that GetSliceAllocator method after
iomgr is gone, since it'd break EventEngine implementations at that point
... I'm not sure if it'll still be experimental at that point. But that'd
be a fairly small change compared to the iomgr refactor.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#26026 (review)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACNG45OWH4BAFG2IWJBM5W3TTERNZANCNFSM43IZTJBA>
.
|
The public C++ ResourceQuota feature would stop working. https://github.com/grpc/grpc/blob/master/include/grpcpp/resource_quota.h. It relies on the transport's reclaimers, which rely on the ability to get RUs from the endpoints. |
markdroth
left a comment
There was a problem hiding this comment.
I don't think we need to remove the RU stuff. I think we can easily support it without it being visible to EE implementations.
Please let me know if you have any questions. Thanks!
Reviewed 23 of 23 files at r19.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @ctiller, @drfloob, and @nicolasnoble)
include/grpc/event_engine/event_engine.h, line 121 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
Ignoring the EventEngine for a second: given the "RU-per-Endpoint" model we have today, and given that iomgr implementations create RUs and RQs without the transport's knowledge (for example, endpoints that are created within tcp servers are a few steps removed from any chttp2 code), if we want the transport to track Endpoint-specific RUs without accessing them through the endpoints themselves, we'd probably want to invert the RU ownership and creation responsibilities in iomgr before we introduce an EventEngine at all. We'd be deprecating
grpc_endpoint_get_resource_user, and likely making changes to the iomgr API around endpoint creation (grpc_tcp_client_connectandgrpc_tcp_server_createat first glance) to accept RUs and some flavor of RU-tracking RQ wrapper that allow those resource objects to be managed externally and injected. This is in addition to changing the chttp2 code to track RUs/RQs.There's more to think through, I'm happy to brainstorm it if you have ideas. Rethinking the "RU-per-Endpoint" model might be worthwhile.
As an alternative to the iomgr refactor, how would you feel about having
Endpoint::GetSliceAllocator(), and adding a friend to the SliceAllocator to access the internal resource_user? It's really not ideal either, I'm aware. Chances are we couldn't remove that GetSliceAllocator method after iomgr is gone, since it'd break EventEngine implementations at that point ... I'm not sure if it'll still be experimental at that point. But that'd be a fairly small change compared to the iomgr refactor.
The chttp2 code controls creating the TCP connection. That happens here for the server:
And here for the client:
The chttp2 code will therefore be able to provide its own SliceAllocator implementation, which can contain the associated RU. And on the server side, it can provide its own SliceAllocatorFactory implementation, which knows how to create its SliceAllocator objects.
I am not aware of any case that would require this to be visible in any way to EE implementations.
include/grpc/event_engine/event_engine.h, line 282 at r17 (raw file):
Previously, nicolasnoble (Nicolas Noble) wrote…
So I tried that originally (always launching the background cleanup thread), and things started to blow up elsewhere in grpc in lots of subtle and horrifying tsan-related manners, which I suspect means it's (1) the right thing to do and (2) requires further debugging / cleanup in the rest of grpc.
I don't mind always doing the background cleanup, but the race conditions would first need to be solved at HEAD, which is totally orthogonal.
For now, let's put a TODO here that we should consider whether we can remove this method before we de-experimentalize this API.
src/core/lib/iomgr/exec_ctx.cc, line 32 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
Correction: " ... to properly abort on repeated closure scheduling for example ... ".
LMKWYT. If we do keep this, I'll rename it as you suggested.
If the goal here is to replicate the debug-only guards to prevent closures from being rescheduled, I suggest just duplicating that logic. It's not a huge amount of code anyway, and it avoids tightly coupling the ExecCtx and the EE-based iomgr impl.
(I might not suggest the above if any of this code was going to stick around in the long term, but I think it's fine for now.)
src/core/lib/iomgr/event_engine/endpoint.h, line 50 at r19 (raw file):
/// established. grpc_endpoint* grpc_tcp_create(const grpc_channel_args* channel_args, const char* peer_address);
Why change this from absl::string_view to const char*?
Also, I don't see the same change in the .cc file, so I don't think this will build.
src/core/lib/iomgr/event_engine/endpoint.cc, line 190 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
It solves a lifetime issue: the resource_user doesn't make a copy of the name passed to it (see the line below this one).
Please add a comment explaining this, since it's not obvious to the reader.
src/core/lib/iomgr/event_engine/iomgr.cc, line 64 at r19 (raw file):
auto shutdown_status = shutdown_status_promise.Get(); GPR_ASSERT(shutdown_status.ok()); delete g_event_engine;
Probably also want to reset g_event_engine to null, just in case some lingering callback tries to use it.
src/core/lib/surface/init.h, line 36 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
Thanks. I've added something nearly identical, with the main difference being the signature change to
std::shared_ptr<EventEngine> grpc_iomgr_event_engine(). Even though we have a singleton default EventEngine, the iomgr code expects to be working with shared_ptrs; see iomgr/event_engine/tcp.cc, particularly when endpoints need to take refs on user-supplied EventEngines ... once we have a mechanism to support them, that is.
There is no need for the code in the EE-based iomgr impl to ever hold a ref here. We will not be able to add support for user-supplied EventEngines until after we have eliminated the iomgr code. The iomgr API is inherently not a per-channel thing; it is global.
It does not make sense to have grpc_iomgr_event_engine() return a std::shared_ptr<>, because none of its callers ever need to hold a ref. So forcing this to be a shared_ptr means that we are needlessly taking a new ref and then immediately releasing that ref every time grpc_iomgr_event_engine() is called.
drfloob
left a comment
There was a problem hiding this comment.
Thanks as always, Mark! Regarding the the iomgr/chttp2 refactoring around resource users, I've added a bit more to the discussion. I'll plan to get started with the refactoring, let me know if my arguments below change your mind.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @drfloob and @markdroth)
include/grpc/event_engine/event_engine.h, line 121 at r17 (raw file):
Well said, no argument!
The chttp2 code will therefore be able to provide its own SliceAllocator implementation, which can contain the associated RU. And on the server side, it can provide its own SliceAllocatorFactory implementation, which knows how to create its SliceAllocator objects.
That will have to happen when me migrate the chttp2 code away from using iomgr, if not before then. The chttp2 code will ultimately be responsible for RU and RQ creation and ownership, based on the current EE design.
I am not aware of any case that would require this to be visible in any way to EE implementations.
True, the EE itself doesn't need it, that's what I was trying to convey in the documentation's (for internal use) bit. Presuming the EE interface remains experimental even after we've removed iomgr, the GetResourceUser method should very well be temporary. Allowing the method to exist temporarily would avoid us having to do the iomgr refactor / chttp2 ownership change now, which would push the immediate EventEngine timeline back a few days, maybe a week (I need to spend time familiarizing myself with the transport code).
Having slept on it, even given the timeline slip, I'm inclined to do the refactoring anyway. Let me know if you've changed your mind on that.
include/grpc/event_engine/event_engine.h, line 282 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
For now, let's put a TODO here that we should consider whether we can remove this method before we de-experimentalize this API.
TODO added.
src/core/lib/iomgr/exec_ctx.cc, line 32 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
If the goal here is to replicate the debug-only guards to prevent closures from being rescheduled, I suggest just duplicating that logic. It's not a huge amount of code anyway, and it avoids tightly coupling the
ExecCtxand the EE-based iomgr impl.(I might not suggest the above if any of this code was going to stick around in the long term, but I think it's fine for now.)
Sounds good, I've copied the code over.
src/core/lib/iomgr/event_engine/endpoint.h, line 50 at r19 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Why change this from
absl::string_viewtoconst char*?Also, I don't see the same change in the .cc file, so I don't think this will build.
It was a mistake, leftover from a (partially) reverted change.
src/core/lib/iomgr/event_engine/endpoint.cc, line 190 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please add a comment explaining this, since it's not obvious to the reader.
Correction, the resource_user name is an std::string. I've removed this line.
src/core/lib/iomgr/event_engine/iomgr.cc, line 64 at r19 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Probably also want to reset
g_event_engineto null, just in case some lingering callback tries to use it.
Done.
src/core/lib/surface/init.h, line 36 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
There is no need for the code in the EE-based iomgr impl to ever hold a ref here. We will not be able to add support for user-supplied EventEngines until after we have eliminated the iomgr code. The iomgr API is inherently not a per-channel thing; it is global.
It does not make sense to have
grpc_iomgr_event_engine()return astd::shared_ptr<>, because none of its callers ever need to hold a ref. So forcing this to be a shared_ptr means that we are needlessly taking a new ref and then immediately releasing that ref every timegrpc_iomgr_event_engine()is called.
That's right, thanks. I've removed the need for a shared_ptr in the EE iomgr tcp cod, grpc_iomgr_event_engine now returns an EventEngine*
markdroth
left a comment
There was a problem hiding this comment.
This looks really good! The only really blocking issue is to remove the GetResourceUser() method.
Please let me know if you have any questions. Thanks!
Reviewed 8 of 8 files at r20.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @drfloob)
include/grpc/event_engine/event_engine.h, line 121 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
Well said, no argument!
The chttp2 code will therefore be able to provide its own SliceAllocator implementation, which can contain the associated RU. And on the server side, it can provide its own SliceAllocatorFactory implementation, which knows how to create its SliceAllocator objects.
That will have to happen when me migrate the chttp2 code away from using iomgr, if not before then. The chttp2 code will ultimately be responsible for RU and RQ creation and ownership, based on the current EE design.
I am not aware of any case that would require this to be visible in any way to EE implementations.
True, the EE itself doesn't need it, that's what I was trying to convey in the documentation's
(for internal use)bit. Presuming the EE interface remains experimental even after we've removed iomgr, theGetResourceUsermethod should very well be temporary. Allowing the method to exist temporarily would avoid us having to do the iomgr refactor / chttp2 ownership change now, which would push the immediate EventEngine timeline back a few days, maybe a week (I need to spend time familiarizing myself with the transport code).Having slept on it, even given the timeline slip, I'm inclined to do the refactoring anyway. Let me know if you've changed your mind on that.
I don't see any reason not to go ahead and do this refactoring now. I think this is actually a useful code cleanup in its own right, because it centralizes RQ and RU usage to the transport code, where it belongs, rather than sharing responsibility between iomgr and the transport code.
To be clear, though, I'm not saying that we need to block on this; I'm just saying that I don't want to add methods to the EE API that we don't think are a good idea, even temporarily, because the more we do that, the more we have to remember to clean up before we can de-experimentalize it. But we can still defer this refactoring for a bit if you want without having to add this method here: instead, add the method to the libuv-based EE impl subclass, and have the EE-based iomgr code down-cast to the subclass.
src/core/lib/iomgr/event_engine/tcp.cc, line 108 at r20 (raw file):
grpc_millis_to_timespec(deadline, GPR_CLOCK_MONOTONIC)); ChannelArgsEndpointConfig endpoint_config(channel_args); // TODO(hork): Establish how to take a user-supplied EventEngine here instead
This TODO can go away. As we've discussed, this code (the EE-based iomgr impl) will never need to do this.
src/core/lib/iomgr/event_engine/tcp.cc, line 129 at r20 (raw file):
rq = grpc_resource_quota_create(nullptr); } // TODO(hork): Establish how to take a user-supplied EventEngine here instead
Same here.
drfloob
left a comment
There was a problem hiding this comment.
Thanks for the suggestions! Just a bit of discussion left.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @markdroth)
include/grpc/event_engine/event_engine.h, line 121 at r17 (raw file):
add the method to the libuv-based EE impl subclass, and have the EE-based iomgr code down-cast to the subclass.
That sounds like a great temporary solution. I'll do that for now (one detail left to discuss in another thread here), and will plan to do the RU refactoring at a better time.
include/grpc/event_engine/slice_allocator.h, line 50 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
As I mentioned elsewhere, this should not be exposed in this API. We should instead fix the chttp2 code to track this internally.
If we move the GetResourceUser method from the EventEngine API into the uvEngine subclass as you suggested above, we'd still benefit from having this to retrieve the resource user from the SliceAllocator. Alternatively, we could overload or create custom Connect methods on the subclass, and do something like this in the tcp iomgr code so that the uvEngine has direct knowledge of the RU:
void tcp_connect(...) {
...
uvEngine* engine = dynamic_cast<uvEngine*>(grpc_iomgr_event_engine());
SliceAllocator sa(ee_endpoint->ru);
absl::Status connected = engine->ConnectWithRu(
ee_on_connect, ra, endpoint_config, std::move(sa), ee_deadline, ee_endpoint->ru); // extra RU argument at the end
WDYT? I think I'd prefer the latter, but with an overload.
src/core/lib/iomgr/event_engine/tcp.cc, line 108 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This TODO can go away. As we've discussed, this code (the EE-based iomgr impl) will never need to do this.
Done.
src/core/lib/iomgr/event_engine/tcp.cc, line 129 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Same here.
Done.
drfloob
left a comment
There was a problem hiding this comment.
I think I've wrapped up everything. Please let me know if there's anything else that you think needs to be resolved in this PR. Thanks!
Reviewable status: 68 of 71 files reviewed, 4 unresolved discussions (waiting on @markdroth)
include/grpc/event_engine/event_engine.h, line 121 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
add the method to the libuv-based EE impl subclass, and have the EE-based iomgr code down-cast to the subclass.
That sounds like a great temporary solution. I'll do that for now (one detail left to discuss in another thread here), and will plan to do the RU refactoring at a better time.
This is the solution we'll put in the uv-ee branch, I think. For now, I've left the endpoint's ru unset in the EE iomgr code, since it doesn't much matter at the moment. I left a TODO which I'll take care of as soon as this work is pushed through.
include/grpc/event_engine/slice_allocator.h, line 50 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
If we move the
GetResourceUsermethod from the EventEngine API into theuvEnginesubclass as you suggested above, we'd still benefit from having this to retrieve the resource user from the SliceAllocator. Alternatively, we could overload or create custom Connect methods on the subclass, and do something like this in the tcp iomgr code so that the uvEngine has direct knowledge of the RU:void tcp_connect(...) { ... uvEngine* engine = dynamic_cast<uvEngine*>(grpc_iomgr_event_engine()); SliceAllocator sa(ee_endpoint->ru); absl::Status connected = engine->ConnectWithRu( ee_on_connect, ra, endpoint_config, std::move(sa), ee_deadline, ee_endpoint->ru); // extra RU argument at the endWDYT? I think I'd prefer the latter, but with an overload.
I've removed the SliceAllocator's GetResourceUser method here since it doesn't much matter at the moment, and we can figure out which way to go in the uv-ee branch.
include/grpc/impl/codegen/port_platform.h, line 679 at r17 (raw file):
Previously, nicolasnoble (Nicolas Noble) wrote…
I just checked; yes it's been broken for a little while now.
Nico, did you want to take the AI to look into that?
markdroth
left a comment
There was a problem hiding this comment.
This looks great!
Reviewed 3 of 3 files at r21, 1 of 1 files at r22.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @drfloob)
include/grpc/event_engine/slice_allocator.h, line 50 at r17 (raw file):
Previously, drfloob (AJ Heller) wrote…
I've removed the SliceAllocator's GetResourceUser method here since it doesn't much matter at the moment, and we can figure out which way to go in the uv-ee branch.
Having an alternative version of Connect() is one option. Another option is to have the EE impl know how to downcast the SliceAllocator impl and have the SliceAllocator impl provide the GetResourceUser() method.
Whichever of those options we choose, this is just a temporary solution until we change the chttp2 code to track the resource user itself, so I don't really care which one we go with.
drfloob
left a comment
There was a problem hiding this comment.
Thanks Mark!
Reviewable status:
complete! all files reviewed, all discussions resolved (waiting on @drfloob)
include/grpc/event_engine/slice_allocator.h, line 50 at r17 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Having an alternative version of
Connect()is one option. Another option is to have the EE impl know how to downcast theSliceAllocatorimpl and have theSliceAllocatorimpl provide theGetResourceUser()method.Whichever of those options we choose, this is just a temporary solution until we change the chttp2 code to track the resource user itself, so I don't really care which one we go with.
SGTM
This code adds an iomgr implementation that's driven by an EventEngine.
See also drfloob#1: @nicolasnoble has a pull request against this branch, implementing the libuv-based EventEngine. One goal here is to implement the iomgr code such that it can be merged independently without affecting normal builds.
This implementation can be built using
bazel build --cxxopt='-DGRPC_USE_EVENT_ENGINE' :allSome shortcuts are being taken to get a working, testable version of the engine. EventEngines are not pluggable, for example.
This change is