-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Use getrandom(2), if available, to implement Interop.GetRandomBytes() #37906
Conversation
| { | ||
| int32_t offset = 0; | ||
| do | ||
| if (errno == EINTR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the docs, getrandom will return -1 and errno will be EAGAIN when GRND_NONBLOCK is used and the entropy pool is not available. Should this include EAGAIN to try again?
I suspect that is probably not likely as I would think that would only happen only very early in the boot process, but, perhaps worth considering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's often the case during early boot scenarios, or on embedded systems. However, calling getrandom() in a tight loop if it keeps returning EAGAIN might turn into an infinite loop in such cases; I'm not sure it's advisable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have at least one VM that can take minutes to fully initialize.
Ubuntu 16.04.6
$ dmesg | grep random
[ 1.844973] random: udevadm: uninitialized urandom read (16 bytes read, 2 bits of entropy available)
[ 1.846829] random: udevadm: uninitialized urandom read (16 bytes read, 2 bits of entropy available)
[ 1.849528] random: systemd-udevd: uninitialized urandom read (16 bytes read, 2 bits of entropy available)
[ 1.850996] random: systemd-udevd: uninitialized urandom read (16 bytes read, 2 bits of entropy available)
[ 1.852320] random: systemd-udevd: uninitialized urandom read (16 bytes read, 2 bits of entropy available)
[ 1.864731] random: systemd-udevd: uninitialized urandom read (16 bytes read, 2 bits of entropy available)
[ 1.866058] random: systemd-udevd: uninitialized urandom read (16 bytes read, 2 bits of entropy available)
[ 1.867459] random: systemd-udevd: uninitialized urandom read (16 bytes read, 2 bits of entropy available)
[ 1.870948] random: systemd-udevd: uninitialized urandom read (16 bytes read, 2 bits of entropy available)
[ 1.872280] random: systemd-udevd: uninitialized urandom read (16 bytes read, 2 bits of entropy available)
[ 180.310815] random: nonblocking pool is initialized
Newer kernels trust the CPU, so it's less likely:
Ubuntu 19.04:
$ dmesg | grep random
[ 0.017218] random: get_random_bytes called from start_kernel+0x97/0x516 with crng_init=0
[ 0.536646] random: crng done (trusting CPU's manufacturer)
Though not necessarily:
Ubuntu 18.04:
$ dmesg | grep random
[ 0.000000] random: get_random_bytes called from start_kernel+0x99/0x4fd with crng_init=0
[ 1.452697] random: fast init done
[ 1.454713] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[ 1.462508] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[ 1.466497] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[ 10.935718] random: crng init done
[ 10.935721] random: 7 urandom warning(s) missed due to ratelimiting
So the EAGAIN either needs to fail over (e.g. to "dirty reads" from /dev/urandom), or the read should just be a blocking read. But it's definitely an important case to handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Possibly it's already being handled OK, just that the assert needs to be removed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assert was just a copy from the read(/dev/urandom) codepath. Maybe a good idea to remove it from there.
BTW, not sure if you're able to set this up in your VM, but there is a virtio-rng device that some hypervisors implement (qemu-kvm being one of them) that provide entropy from the host to the guest; the Linux kernel is able to use it if detected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bartonjs Is the "no entropy available" message you're receiving on a fresh VM, or on a VM that has been rebooted? Per http://man7.org/linux/man-pages/man4/random.4.html:
If a seed file is saved across reboots as recommended below (all
major Linux distributions have done this since 2000 at least), the
output is cryptographically secure against attackers without local
root access as soon as it is reloaded in the boot sequence, and
perfectly adequate for network encryption session keys.
IMO blocking is the preferred behavior, especially if it happens only very rarely. If somebody's calling our APIs expecting high-entropy random values then we need to give them high-entropy random values. We can't pull a sleight of hand and give them low-entropy values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO blocking is the preferred behavior
This caused significant problems when we were initially bringing up .NET on Linux. We initially had a blocking implementation, and start-up of the process could end up getting blocked for seconds, until the user wildly moved their mouse around to introduce entropy into the system.
Maybe what we're talking about here is a special case somehow, or maybe things have been improved on all distros/versions we now support, but I don't believe blocking waiting for sufficient entropy is viable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the Python crew considered this same problem a few years ago. They took a breaking change between Python 3.5.2 and Python 3.6 in order to make the os.urandom() function block on low-entropy scenarios because of the security risks inherent in the original behavior. See https://www.python.org/dev/peps/pep-0524/ for more information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.urandom returns bytes suitable for cryptographic use (https://docs.python.org/2/library/os.html#os.urandom).
SystemNative_GetNonCryptographicallySecureRandomBytes is not meant to be suitable for cryptographic use as the name suggests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this API isn't intended to produce strong randomness, we should reconsider having Guid.NewGuid depend on it. We know from experience that for better or worse, devs use Guid.NewGuid to produce random identifiers and expect those identifiers to have strong randomness.
|
(Some of the build break is because this uses the ifunc attribute that's not supported by some of the older compilers used in CI. I'll address this with a different strategy, but I still want input on the approach.) |
6ba47dc to
c7d1517
Compare
|
Thanks everybody for the review -- I force-pushed a new version addressing most of the comments. |
|
Force-pushed a new version with the requested changes:
|
b73f370 to
4d2df52
Compare
|
/azp run corefx-ci |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
There was failure on Windows x86_Release, so I am retriggering the CI. @lpereira thank you for the PR.
We are planning to reduce the number of open PRs in corefx and only keep the active ones open. Thanks again and please let me know how you see the timeline and actionable items for the open PRs. |
|
@maryamariyan Comments were resolved already, just not marked as resolved here on GitHub. I've marked them as such here. I don't think there are any impediments to having this merged (with the exception of the Windows x86_Release, which we'll know soon enough if there's something else to be done). |
bartonjs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The race condition between srand48 and potential parallel calls to lrand48 needs to be addressed (potentially including explaining why it isn't a problem)
|
@lpereira what's the current state of this PR, do you have any plans on addressing feedback soon? |
Feedback will be addressed shortly. Notifications for this item ended up falling through the cracks of my email client. |
The getrandom(2) system call, introduced in Linux 3.17, allows one to obtain numbers from the same pool exported as /dev/urandom (if called with the GRND_NONBLOCK flag). The main advantage is that there's no need to keep a file descriptor open, but this also means that a good source of entropy is always available, even if /dev isn't present (such as in a tight chroot environment). Refactor how the method to obtain random numbers by using an ifunc, which will continue to prefer arc4random_buf(), if available, or pick between getrandom(2) or reading from /dev/urandom. The behavior of XOR-ing the results with lrand48(3) is still there as a workaround broken hardware RNGs. This PRNG is still initialized with time(NULL), which doesn't have a lot of entropy, and #38034 has been filed to address that in the future.
b310584 to
c9ec28a
Compare
|
Rebased on top of current master and pushed addressing comments. |
| // Another thread has already set the rand_des | ||
| close(fd); | ||
| } | ||
| sched_yield(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that a normal priority thread won the compare exchange, but a high priority thread gets stuck in here? Will that just deadlock, or will the high priority thread eventually yield anyways? (the description of sched_yield suggests that a single high priority thread just ignores the yield call).
|
The rebase (well, the force-push) made it difficult to review what was different, but it looks like the srand race condition has been alleviated (modulo one thread priority question). Any other feedback, @jkotas? |
This: #37906 (comment) |
| { | ||
| int fd; | ||
| return fd; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no issues with the thread-safety around the reads of this field? e.g. is there any situation where the compiler/hardware might introduce a read such that it effectively became int a = fd; if (fd >= 0) return a;? (If that could happen we could end up returning -1.)
|
Has any perf testing been done on this? What is the impact, positive or negative? |
|
@lpereira, are you planning to continue working on this? |
I am. I'll address these comments this week; been busy with other things. |
|
Thank you for your contribution. As announced in dotnet/coreclr#27549 this repository will be moving to dotnet/runtime on November 13. If you would like to continue working on this PR after this date, the easiest way to move the change to dotnet/runtime is:
|
|
Hey @lpereira still planning on finishing this? The cutoff for the migration is tomorrow. |
|
Thank you for your contribution. As announced in #27549 the dotnet/runtime repository will be used going forward for changes to this code base. Closing this PR as no more changes will be accepted into master for this repository. If you’d like to continue working on this change please move it to dotnet/runtime. |
(These changes are based on similar changes I made in coreclr.)
The getrandom(2) system call, introduced in Linux 3.17, allows one to obtain numbers from the same pool exported as /dev/urandom (if called with the GRND_NONBLOCK flag). The main advantage is that there's no need to keep a file descriptor open, but this also means that a good source of entropy is always available, even if /dev isn't present (such as in a tight chroot environment).
Refactor how the method to obtain random numbers by using an ifunc, which will continue to prefer arc4random_buf(), if available, and pick between getrandom(2) or reading from /dev/urandom. Falling back to lrand48(3) happens if either of these last two methods fail for whatever reason.
It's important to note that the entropy quality for /dev/urandom and whatever getrandom(2) returns is already pretty good, so it might not be a good idea to XOR it with values from lrand48(3). Since we're not dealing with a CSPRNG here, though, it's good to leave that as the fallback mechanism.