Skip to content

Add Timer::enableHRTimer()#9155

Merged
htuch merged 9 commits intoenvoyproxy:masterfrom
oschaaf:hrtimer
Dec 3, 2019
Merged

Add Timer::enableHRTimer()#9155
htuch merged 9 commits intoenvoyproxy:masterfrom
oschaaf:hrtimer

Conversation

@oschaaf
Copy link
Copy Markdown
Member

@oschaaf oschaaf commented Nov 27, 2019

Adds a way to set sub-millisecond timers. This generates an opportunity for Nighthawk to improve accuracy of request-release timings. The fault filter extension might leverage this as well to inject sub millisecond delays.

Implemented as an extra call instead of an overload or change to the existing one, because that way caller intent explicitly carries information about accuracy expectations. Also, adding an extra call minimizes changes needed to accomplish this.

For best results, libevent needs to be made aware too that we'd like to use precise timers.
One way to do that is by setting an environment variable called EVENT_PRECISE_TIMER.

EVENT_PRECISE_TIMER=1 bazel-bin/nighthawk_client ...

One thing to mind is that the max sleep time will go down from std::chrono:millliseconds::duration::max() to std::chrono::microseconds::duration::max() with this. AFAICT we will still be able to sleep for ~292277 years with that.

Description:
Risk Level: Low
Testing: By proxy, via existing tests

```
Use --sandbox_debug to see verbose messages from the sandbox
external/envoy/source/common/http/async_client_impl.cc:20:73: error: call to implicitly-deleted default constructor of 'const AsyncStreamImpl::NullRetryPolicy'
const AsyncStreamImpl::NullRetryPolicy AsyncStreamImpl::RouteEntryImpl::retry_policy_;
                                                                        ^
bazel-out/k8-dbg/bin/external/envoy/source/common/http/_virtual_includes/async_client_lib/common/http/async_client_impl.h:147:33: note: default constructor of 'NullRetryPolicy' is implicitly deleted because field 'retriable_status_codes_' of const-qualified type 'const std::vector<uint32_t>' (aka 'const vector<unsigned int>') would not be initialized
    const std::vector<uint32_t> retriable_status_codes_;
                                ^
1 error generated.
```

Signed-off-by: Otto van der Schaaf <[email protected]>
Signed-off-by: Otto van der Schaaf <[email protected]>
@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Nov 27, 2019

/cc @htuch

Signed-off-by: Otto van der Schaaf <[email protected]>
This reverts commit eba4053.

Signed-off-by: Otto van der Schaaf <[email protected]>
@yanavlasov
Copy link
Copy Markdown
Contributor

If the hi-res timers are used in the nighthawk for now, would it make sense to have this functionality just there?
Also would it make sense to make hi-res time enabled in the build files by default, instead of relying on users to remember to set on the command line during build?

@htuch
Copy link
Copy Markdown
Member

htuch commented Nov 27, 2019

@yanavlasov I think it's reasonable to add this to Envoy (I suggest this to @oschaaf). Why not make these the default though, rather than requiring specific enablement? Is there a performance or stability implication?

@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Nov 27, 2019

If the hi-res timers are used in the nighthawk for now, would it make sense to have this functionality just there?

So I thought this might be nice to have here as it seemed the change wouldn't add a significant maintenance burden; as the implementation can be shared across the regular and non-HR versions (at least today).. and potentially Envoy extensions might benefit from it?

Also would it make sense to make hi-res time enabled in the build files by default, instead of relying on users to remember to set on the command line during build?

Yes sure; alternatively this can also be set when initializing libevent instead of via setting an environment variable. But I was trying to minimize the change;

Why not make these the default though, rather than requiring specific enablement? Is there a performance or stability implication?

I was thinking that in Nighthawk this should be enabled by default but with an opt-out escape-latch for those who seek absolute highest throughput, as I remember there is a trade-off involved [1]. IMHO Envoy would probably better be off with an opt-in for those who run specific extensions with strict latency requirements..or maybe it would be nice to facilitate extensions that need it to enable the feature at server boot time automatically; e.g. a new interface call for extensions that Envoy can query as it starts to determine if any of those has strict latency requirements.

[1] From https://sourceforge.net/p/levent/mailman/message/29143607/

There are a bunch of backends that can give us a reasonably good
monotonic timer quickly, or a very precise monotonic timer less
quickly.  For example, Linux has CLOCK_MONOTONIC_COARSE (1msec
precision), which is over twice as fast as CLOCK_MONOTONIC.  Since
epoll only lets you wait with 1msec precision,
CLOCK_MONOTONIC_COARSE is a clear win.

@htuch
Copy link
Copy Markdown
Member

htuch commented Nov 27, 2019

@oschaaf this makes sense, but the implementation seems to be basically doing HR anyway for the regular timer case.

@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Nov 27, 2019

@htuch this makes sense, but the implementation seems to be basically doing HR anyway for the regular timer case.

correct, but imho consumers shouldn't care about how the two calls are implemented internally; from the outside my idea is that one of them is to express the desire for a reasonably precise but efficient timer, and the other one is to express the desire of favoring high precision.

(the doxygen comments might be a good place to clarify this in case we want to take this forward)

I was thinking that this distinction might some day be useful, perhaps knowledge about caller intent might be leveraged some day to optimize with coarse grained timer coalescing when favoring efficiency

@htuch
Copy link
Copy Markdown
Member

htuch commented Dec 2, 2019

@oschaaf yeah, that's reasonable. Can you add some tests for the new method?

@htuch htuch added the waiting label Dec 2, 2019
Signed-off-by: Otto van der Schaaf <[email protected]>
@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Dec 2, 2019

@oschaaf yeah, that's reasonable. Can you add some tests for the new method?

I took a stab at that. 9f00ad8 explicitly shares some test code between enableTimer() and enableHRTimer().

yanavlasov
yanavlasov previously approved these changes Dec 3, 2019
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.. one thing I'd be curious about is whether it's possible to use simulated time to actually capture the timing precision? Not sure how feasible that is, LGTM otherwise.

@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Dec 3, 2019

Thanks.. one thing I'd be curious about is whether it's possible to use simulated time to actually capture the timing precision? Not sure how feasible that is, LGTM otherwise.

Nice, that works, and the new tests also functionally covers theoretical precision for the regular enableTimer() implementation: 2dca213

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@htuch htuch merged commit 8db1682 into envoyproxy:master Dec 3, 2019
@lizan lizan mentioned this pull request Dec 3, 2019
oschaaf added a commit to oschaaf/envoy that referenced this pull request Dec 4, 2019
This reverts commit 8db1682.

Signed-off-by: Otto van der Schaaf <[email protected]>
htuch pushed a commit that referenced this pull request Dec 4, 2019
This reverts commit 8db1682.

server_fuzz_test flakes in CI on super large timeout values
backing this out to stabilize that test while sorting this out offline

Signed-off-by: Otto van der Schaaf <[email protected]>
oschaaf added a commit to oschaaf/envoy that referenced this pull request Dec 4, 2019
This reverts commit a693496.

Signed-off-by: Otto van der Schaaf <[email protected]>
oschaaf added a commit to oschaaf/envoy that referenced this pull request Dec 4, 2019
- Updates TimerUtils::durationToTimeval to use templating to avoid
implicit conversions of durations when passing argument, thereby
generating an opportunity to bound the input before doing conversions
on it. Throws when passed a negative duration.

- Adds some more tests around this

- Slightly refactored version modified of:
 envoyproxy#9155,

Signed-off-by: Otto van der Schaaf <[email protected]>
htuch pushed a commit that referenced this pull request Dec 12, 2019
- Updates TimerUtils::durationToTimeval to use templating to avoid implicit conversions of durations when passing argument, thereby generating an opportunity to bound the input before doing conversions on it. Throws when passed a negative duration.

- Adds some more tests around this

Slightly refactored version of #9155

Risk Level: medium
Testing: unit tests

Signed-off-by: Otto van der Schaaf <[email protected]>
prakhag1 pushed a commit to prakhag1/envoy that referenced this pull request Jan 3, 2020
- Updates TimerUtils::durationToTimeval to use templating to avoid implicit conversions of durations when passing argument, thereby generating an opportunity to bound the input before doing conversions on it. Throws when passed a negative duration.

- Adds some more tests around this

Slightly refactored version of envoyproxy#9155

Risk Level: medium
Testing: unit tests

Signed-off-by: Otto van der Schaaf <[email protected]>
Signed-off-by: Prakhar <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants