Skip to content

Add TimerImpl::enableHRTimer - take two#9229

Merged
htuch merged 7 commits intoenvoyproxy:masterfrom
oschaaf:hrtimer-attempt-two
Dec 12, 2019
Merged

Add TimerImpl::enableHRTimer - take two#9229
htuch merged 7 commits intoenvoyproxy:masterfrom
oschaaf:hrtimer-attempt-two

Conversation

@oschaaf
Copy link
Copy Markdown
Member

@oschaaf oschaaf commented Dec 4, 2019

  • Updates TimerUtils::durationToTimeval to use templating to avoid
    implicit conversions of durations when passing argument, thereby
    generating an opportunity to bound the input before doing conversions
    on it. Throws when passed a negative duration.
  • Adds some more tests around this
  • Slightly refactored version of Add Timer::enableHRTimer() #9155

Risk Level: medium
Testing: unit tests

@oschaaf oschaaf force-pushed the hrtimer-attempt-two branch from 93c4522 to 649815b Compare December 4, 2019 21:21
This reverts commit a693496.

Signed-off-by: Otto van der Schaaf <[email protected]>
- Updates TimerUtils::durationToTimeval to use templating to avoid
implicit conversions of durations when passing argument, thereby
generating an opportunity to bound the input before doing conversions
on it. Throws when passed a negative duration.

- Adds some more tests around this

- Slightly refactored version modified of:
 envoyproxy#9155,

Signed-off-by: Otto van der Schaaf <[email protected]>
@oschaaf oschaaf force-pushed the hrtimer-attempt-two branch from 649815b to 6bb6555 Compare December 4, 2019 21:31
Signed-off-by: Otto van der Schaaf <[email protected]>
Signed-off-by: Otto van der Schaaf <[email protected]>
@oschaaf oschaaf marked this pull request as ready for review December 5, 2019 11:07
@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Dec 5, 2019

@htuch attempt two at this, hopefully the fuzzer doesn't object this time. This mitigates potential overflow issues, but unfortunately I have not been able to reproduce the issue locally -- even given the provided corpus that teased it out on CI. Is it possible to respin the fuzzer tests in CI a couple of times for this change?

* @param tv output parameter that will be updated
*/
template <typename Duration> static void durationToTimeval(const Duration& d, timeval& tv) {
if (d.count() < 0) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional thought: one thing that might be worth attempting here, is to typedef the duration args to enableXXXTimer(). E.g.

using TimerDuration = std::chrono::duration<uint64_t, std::milli>;

Doing that would void the need for this sanity check here, but would also bloat this PR a lot as call sites would have to be audited / updated.

Copy link
Copy Markdown
Contributor

@yanavlasov yanavlasov Dec 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it is a big win. Perhaps this should be an ASSERT? Or treat negative duration as 0? ASSERT may however slow down fuzzers considerably if they go to negative duration. So probably would an exception here.
Also the comment says negative duration causes ASSERT, not exception. One of the two needs to change.

/**
* Intended for consumption by enable(HR)Timer, this method is templated method to avoid implicit
* duration conversions for its input arguments. This lets us have an opportunity to check bounds
* before doing any conversions. When the passed in duration exceeds UINT32_T max days, the output
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this read "INT32_T max seconds"? As per the code?

throw EnvoyException(
fmt::format("Negative duration passed to durationToTimeval(): {}", d.count()));
};
constexpr int64_t clip_to = INT32_MAX; // 136.102208 years
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I certainly think this is more than enough. Out of curiosity though, why 32 bits? I think timeval uses time_t for seconds, which is 64 bit?

@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Dec 6, 2019

@yanavlasov thanks for your feedback! I will push an update to make comments consistent shortly.

Sharing some thoughts that further motivate some of the changes in here, as well addresses as some of your questions/feedback.

  • So .. the first concern here are the substructures of timeval.
    It looks like time_t isn't well defined [1] (but probably it will be 64 bits most of the time).

  • Another concern are the chrono durations [2]; they only specify minimums for the underlying types in the specs. So -for example- an implementation may use a full 64 bits range for seconds, and when doing so converting from ::seconds to, say, ::milliseconds may overflow if they also use the same underlying size. I can do this and trigger signedness to flip on my dev box.
    I think that in CI this triggered the fuzzer complaints, as a sanitizer caught it.

  • Then the third concern I had is that possibly unreasonably large timeouts may be indicative of programming mistakes (as we're not building a calendar or some such :-)). So maybe we should assert on that in debug mode, and emit an error/warning in release mode builds (thisprobably will be a low frequency event, so logging should be fine, and hopefulle such events will be noticed and reported back here). It's worth noting here that the current api allows implicit (narrowing) conversions. So there are places where enableTimer(std::chrono::seconds(x)) is used - which is fine if x isn't huge but does leave wiggle room for mistakes - this motivates switching to a typedef [3].

It might be a little paranoid, but I'm hoping that capping the inbound number of seconds to INT32_MAX keeps things within bounds for most systems.

[1]
https://en.cppreference.com/w/c/chrono/time_t

https://en.wikipedia.org/w/index.php?title=Time_t&oldid=450752800
On win32 you can request it to be 32 bits via _USE_32BIT_TIME_T
[2]
https://en.cppreference.com/w/cpp/chrono/duration

std::chrono::nanoseconds	duration</*signed integer type of at least 64 bits*/, std::nano>
std::chrono::microseconds	duration</*signed integer type of at least 55 bits*/, std::micro>
std::chrono::milliseconds	duration</*signed integer type of at least 45 bits*/, std::milli>
std::chrono::seconds	duration</*signed integer type of at least 35 bits*/>
std::chrono::minutes	duration</*signed integer type of at least 29 bits*/, std::ratio<60>>
std::chrono::hours	duration</*signed integer type of at least 23 bits*/, std::ratio<3600>>
std::chrono::days (since C++20)	duration</*signed integer type of at least 25 bits*/, std::ratio<86400>>
std::chrono::weeks (since C++20)	duration</*signed integer type of at least 22 bits*/, std::ratio<604800>>
std::chrono::months (since C++20)	duration</*signed integer type of at least 20 bits*/, std::ratio<2629746>>
std::chrono::years (since C++20)	duration</*signed integer type of at least 17 bits*/, std::ratio<31556952>>

[3]
Some of this can be mitigated by using an typedef which resolves to a fixed size unsigned int; that would provide safety at build time. The cost is that the api would be less user friendly, being able to pass 60s is rather nice. Ease of use might be brought back by bubbling up the templating to the enableTimer() calls instead doing it in the duration translation helper we have now.

/**
* Intended for consumption by enable(HR)Timer, this method is templated method to avoid implicit
* duration conversions for its input arguments. This lets us have an opportunity to check bounds
* before doing any conversions. When the passed in duration exceeds INT32_T max days, the output
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think this should just say "INT32_MAX seconds", otherwise it is hard to interpret. Same below, the "INT32_MAX days worth of seconds" may be hard to understand.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - that was a confusing comment indeed; amended: 0c57cfc

Signed-off-by: Otto van der Schaaf <[email protected]>
yanavlasov
yanavlasov previously approved these changes Dec 7, 2019
Signed-off-by: Otto van der Schaaf <[email protected]>
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@htuch
Copy link
Copy Markdown
Member

htuch commented Dec 11, 2019

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 3 pipeline(s).

@htuch htuch merged commit a78311f into envoyproxy:master Dec 12, 2019
prakhag1 pushed a commit to prakhag1/envoy that referenced this pull request Jan 3, 2020
- Updates TimerUtils::durationToTimeval to use templating to avoid implicit conversions of durations when passing argument, thereby generating an opportunity to bound the input before doing conversions on it. Throws when passed a negative duration.

- Adds some more tests around this

Slightly refactored version of envoyproxy#9155

Risk Level: medium
Testing: unit tests

Signed-off-by: Otto van der Schaaf <[email protected]>
Signed-off-by: Prakhar <[email protected]>
@antoniovicente
Copy link
Copy Markdown
Contributor

Q: in what cases do you require microsecond resolution? Something to keep in mind is that syscalls like epoll_wait which I think we are used by libevent on linux take arguments at millisecond resolution.

@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Jun 10, 2020

@antoniovicente this NH commit introduces using microsecond resolution for scheduling wakeups on the dispatcher in NH.

Context: In Nighthawk, when the client has in-flight requests, we can't spin the event loop to poll the clock for determining the time of the next request release, as timers take priority over i/o events. We wouldn't want to block processing of i/o events. So we use sleeps to schedule wakeups instead. Without this change, the smallest unit of time we could use to poll the clock would then be approximately 1ms when there are requests in-flight. There are scenarios where that seems suboptimal; the simplest example I can come up with is that when 2000 qps is requested, we'd probably end up batching two requests every 1 ms, whereas we actually ought to release one every 500 us.

With the enableHRTimer() call introduced here, Nighthawk is able to schedule sub ms wake-ups for checking the clock. This results in more accurate request-release timings -- especially when there are overlapping/in-flight requests. (I manually verified this).

@antoniovicente
Copy link
Copy Markdown
Contributor

There's several other cases where the TimerImpl 0msec timeout behavior has come up. I'm looking into it but it's going slowly. Thanks for the explanation for your use case and problems you have seen.

I do think that you may see some desirable behavior when adding a usec timer based on the iteration's time reference and the real clock has advanced past the absolute time that the timer was set to since you do poll for fd events but with an epoll_wait of 0msecs.

libevent rounds timeouts up to nearest 1ms when scheduling epoll_wait
https://github.com/libevent/libevent/blob/master/evutil_time.c#L133
https://github.com/libevent/libevent/blob/master/epoll.c#L505

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants