
CRYPTO_atomic_read #4414

Closed
paulidale wants to merge 2 commits into openssl:master from paulidale:atomic_read

Conversation

@paulidale
Contributor

Add a CRYPTO_atomic_read call which allows an int variable to be read in an atomic fashion.

  • documentation is added or updated
  • tests are added or updated

@paulidale paulidale mentioned this pull request Sep 25, 2017
Contributor

shouldn't that second read mention be atomic add?

Contributor Author

Yeah :(

Contributor Author

And removed.

Contributor

I'm not an expert in this matter, just googled it, but wouldn't this implementation here be more natural?

int CRYPTO_atomic_read(int *val, int *ret, CRYPTO_RWLOCK *lock)
{
    *ret = InterlockedCompareExchange(val, 0, 0);
    return 1;
}

See https://msdn.microsoft.com/en-us/library/ms686355(VS.85).aspx. Unless OpenSSL still supports Windows XP, evenInterlockedCompareExchangeAcquire() could be used.

Contributor Author

I know nothing about Windows programming :)
I'll switch this over.

@mspncp
Contributor

mspncp commented Sep 25, 2017

Looks like there are already some existing use cases for CRYPTO_atomic_read(), which should be replaced:

~/src/openssl$ grep -r -nH 'CRYPTO_atomic_add.*, 0'  *
crypto/lhash/lh_stats.c:75:    CRYPTO_atomic_add(&lh_mut->num_hash_calls, 0, &ret,
crypto/lhash/lh_stats.c:78:    CRYPTO_atomic_add(&lh_mut->num_comp_calls, 0, &ret,
crypto/lhash/lh_stats.c:85:    CRYPTO_atomic_add(&lh_mut->num_retrieve, 0, &ret, lh->retrieve_stats_lock);
crypto/lhash/lh_stats.c:87:    CRYPTO_atomic_add(&lh_mut->num_retrieve_miss, 0, &ret,
crypto/lhash/lh_stats.c:90:    CRYPTO_atomic_add(&lh_mut->num_hash_comps, 0, &ret,

Member

Is acquire the order you want to use here? Maybe acquire release?

Contributor Author

Acquire/release isn't legal for an atomic load. The valid memory orders are: relaxed, seq_cst, acquire, consume. Because this is reading but not writing, I think acquire is the correct ordering.

@dot-asm
Contributor

dot-asm commented Sep 25, 2017

As was discussed already [internally], you can't make any sensible decision without modifying an atomic variable. Or in other words, how is an atomic read different from a plain read? Indeed, consider

    if (!CRYPTO_THREAD_write_lock(lock))
        return 0;

    *ret  = *val;

    if (!CRYPTO_THREAD_unlock(lock))
        return 0;

If there is contention for *val, then it can be different by the time the caller regains execution control after CRYPTO_THREAD_unlock. So what does locking give you? Well, one can still argue that the suggested atomic read would prevent the compiler from moving it as a result of optimization, but so would a cast to volatile. As for the use of atomic add with a 0 increment in lh_stats.c, it's terrible and not right. If a coherent view of the stats is an absolute requirement, then one shouldn't use atomic_add on structure fields elsewhere, but lock the whole structure. Currently the view is incoherent, and no atomic add with 0 increment would fix that.

@dot-asm
Contributor

dot-asm commented Sep 25, 2017

Well, there is a chance that I've let the "practitioner" in myself take over the "theoretician". I mean, is it possible that the notion of an atomic load is meaningful on platforms where a load has to be performed with multiple instructions? And the "practitioner" says there are no such platforms. Is this it? Yet we do make assumptions about the platform, for example that it's [[I]L]P(32|64) and two's complement, so we don't actually aim for a purely abstract platform. So is it reasonable to assume that a conventional load is actually atomic?

@kaduk
Contributor

kaduk commented Sep 25, 2017

I, for one, am not confident enough to assert that we can assume that a conventional load is atomic on all platforms we are used on. (Note that I say "platforms we are used on", not "supported platforms", because there are no doubt people pushing the envelope.)
And if I understand correctly, here "platform" includes not just the processor but also the memory model and the potential for unaligned accesses.

@paulidale
Contributor Author

This came out of #4402. I don't think that an atomic read is required there. The most likely outcome of a race is an extra reseeding (or possibly a delayed reseed).

I have seen processors where int loads took several instructions. Some small micros do this, and MIPS does for constant loads (which aren't directly relevant). However, on systems where loads are atomic, I'd expect these calls to be replaced by a plain load/store sequence. This happens on x86/64. There is still a cost: the memory write, the function call and the return.

If people don't think this PR is worthwhile, speak up. I'm happy for it to not go in. The current use case doesn't need it and the other places in lhash where it could be used should be done differently.

Locking is the solution for the lhash cases. In a new PR.

@paulidale
Contributor Author

I've dealt with the lhash concerns in #4418

@dot-asm
Contributor

dot-asm commented Sep 26, 2017

"platform" includes not just processor but also memory model and potential for unaligned accesses.

Trick question. There is ambiguity in "memory model" when it's used in the same sentence as "platform" and "processor", in the sense that the processor itself upholds a certain discipline referred to as a "memory model", but on the other hand there is the language "memory model", i.e. how the compiler schedules loads and stores :-) Anyway, I'd say that we can take unaligned accesses out of the equation, for the following reason. We are not talking about reading variables in general, but specifically about base types that match a processor register: int, long, void *. And we can expect them to be aligned; the compiler is obliged to do that for us. In the context of an "atomic" load we can take the processor's memory model out of the equation too, because there is ambiguity in either case, i.e. did you get the value before or after the modification. I mean the processor's memory model affects the ambiguity, but doesn't eliminate it, and it doesn't matter which ambiguity it is. As for the language "memory model", it's tricky, isn't it? Our requirement for language standard compliance predates the notion of a "memory model" in C. It should be noted however that two disjoint subroutines accessing the same shared variable, and called from a disjoint translation unit, do constitute the desirable "memory model", don't they? [I suppose this is also an argument against LTO.]

@dot-asm
Contributor

dot-asm commented Sep 26, 2017

I have seen processors where int loads took several instructions. Some small micros do this, MIPS does for constant loads

This can't be viewed as a load, because there is no separately addressable memory cell; the pieces of the constant are encoded into the instructions themselves. As already mentioned, we are not talking about loads in the most general sense, but specifically about aligned loads of shared variables that fit into a processor register.

@dot-asm
Contributor

dot-asm commented Sep 26, 2017

two disjoint subroutines accessing the same shared variable, and called from a disjoint translation unit, do constitute the desirable "memory model", don't they?

And so would a variable declared volatile...

@dot-asm
Contributor

dot-asm commented Sep 26, 2017

OK, would it be appropriate to say it like this? It's desirable to use atomic_load because it conveys information to the compiler about what it can do in terms of optimization. It will be compiled as just a load [on supported platforms], but the compiler won't be as free to move it around as a plain reference. It should be noted that essentially the same effect can be achieved by declaring the variable volatile. And that would actually be an acceptable approach for compilers that don't specify a "memory model". One can also point out that the CRYPTO_atomic_* functions reside in a separate translation unit, which actually limits such a compiler's ability to screw things up. [On the other hand one can note that, formally speaking, adhering to atomic_load should make LTO safe, at least with respect to the involved variables.]

If that's an adequate summary, then it might be appropriate to implement CRYPTO_atomic_read. But the suggestion is to adhere to volatile where atomic_load is not an option. The concern here is that synchronization costs would be off-the-charts disproportionate to the [zero] benefit. And "might" in the first sentence means that one can still wonder whether there is really a legitimate reason for it. I mean, since you can't really take a sensible decision based on the value of a reference counter without actually modifying it... I suppose I have to double-check #4402.

@mspncp
Contributor

mspncp commented Sep 26, 2017

The counter drbg->restart_count in #4402 is not a reference counter, it's more like a modification timestamp. A chained drbg checks whether drbg->restart_count != drbg->parent->restart_count. If this is the case, it knows that the parent reseeded more recently than itself, and will auto-reseed. As @paulidale already noted, a race condition will in the worst case have the effect that the drbg is reseeded more than once, or the reseed occurs one or more generate() requests later than intended.

@paulidale
Contributor Author

@mspncp summed it up well. The instance in question doesn't strictly require this functionality.
Hence the offer to close this without merging.

@paulidale
Contributor Author

It seems that volatile isn't a good answer: https://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming

It guarantees going to memory, not that the memory access is atomic.

@dot-asm
Contributor

dot-asm commented Sep 27, 2017

Once again, volatile is suggested as a compromise for a specific case: the load of an aligned integer. There is no contradiction with the referred article, because it too recognizes that any load of an aligned integer, be it volatile or not, is atomic. And what is the compromise? Pre-C11[?] C doesn't make any promises about how and when variables are loaded, written or cached in registers, except for volatile ones. So by adhering to a volatile load we simply prevent the compiler from caching the value in a register. And the referred article effectively recognizes that. And once again, the reason for seeking a compromise is that synchronization doesn't give you any advantage, in the sense that you can get an outdated value even with mutexes around it.

@dot-asm
Contributor

dot-asm commented Sep 27, 2017

Pre-C11[?] C doesn't make any promises about how and when variables are loaded, written or cached in registers.

Well, it's not like C11 makes such promises in general. It's just that in addition to volatiles it adds the notion of atomic variables and specifies their behaviour with respect to memory references.

@mspncp
Contributor

mspncp commented Sep 27, 2017

synchronization doesn't give you any advantage, in the sense that you can get an outdated value even with mutexes around it.

As far as I understand it, an atomic read/write is meant to prevent me from reading inconsistent values (partially updated by another thread). This has nothing to do with the question whether the value is current or might become outdated immediately afterwards. So isn't this discussion comparing apples and oranges? Or did I miss something?

@dot-asm
Contributor

dot-asm commented Sep 27, 2017

by adhering to a volatile load we simply prevent the compiler from caching the value in a register.

Just to clarify: you have to imagine some super-aggressive LTO that inlines all functions and for whatever reason chooses to cache the variable's value in a register, like, forever.

@dot-asm
Contributor

dot-asm commented Sep 27, 2017

an atomic read/write is meant to prevent me from reading inconsistent values (partially updated by another thread).

This is a misleading formulation. An atomic variable can't be partially updated, by definition. For this reason the compiler won't even support atomic access for types that the hardware doesn't guarantee it for. I.e. you are not free to declare an arbitrary type [base or not] as atomic.

Or did I miss something?

Well, the spirit of the quote is rather "[provided that we recognize that load of aligned integer is actually atomic (on all known ILP32 and better platforms)] synchronization doesn't give you any advantage."

@dot-asm
Contributor

dot-asm commented Sep 27, 2017

[provided that we recognize that load of aligned integer is actually atomic (on all known ILP32 and better platforms)]

And of course so is the store in the thread that is updating the variable in question, be it declared atomic or not.

@mspncp
Contributor

mspncp commented Sep 27, 2017

Well, the spirit of the quote is rather "[provided that we recognize that load of aligned integer is actually atomic (on all known ILP32 and better platforms)] synchronization doesn't give you any advantage."

Ok, I think you convinced me now to keep things simple in PR #4402 and just do a naive compare without worrying too much ;-).

@paulidale
Contributor Author

I can't see a simple compare hurting for #4402. A delayed or extra reseed is harmless. I don't think a reseed can reasonably be missed, which is the problem case (2^32 reseeds between generations would be a problem, but that's essentially impossible).

What synchronisation does give you is a guarantee that the two halves weren't read separately, e.g. if the structure is misaligned, or you're running on an 8-bit CPU that requires four loads, or, perhaps, the value crosses a page boundary.

Anyway, I'll close this unmerged tomorrow if I don't hear any objections.

@paulidale
Contributor Author

No concerns about keeping this have been raised so I'm closing without merging.

@paulidale paulidale closed this Sep 27, 2017
@paulidale paulidale deleted the atomic_read branch September 27, 2017 20:57
@kaduk
Contributor

kaduk commented Sep 27, 2017

It would probably be good to document the conclusions in the tree somewhere, in terms of what behavior we have decided to rely upon.

@paulidale paulidale restored the atomic_read branch September 27, 2017 21:12
@paulidale
Contributor Author

Reopening in light of #4467

@richsalz
Contributor

richsalz commented Oct 8, 2017

Now that it's re-opened because it seems useful (see the atomic-add calls with 0), are @dot-asm and others satisfied with it? Another option is to define atomic-read as atomic-add with zero and "fix" it later.

@paulidale
Contributor Author

Fixed a defined() check to use the correct macro (__ATOMIC_ACQUIRE instead of __ATOMIC_ACQ_REL).

@paulidale
Contributor Author

Should an atomic write also be supported?

int CRYPTO_atomic_write(int *val, int n, CRYPTO_RWLOCK *lock)

Again, atomic_add can do this by adding zero, but a write would be more efficient on many platforms since it avoids half of the read/write cycle.

@paulidale
Contributor Author

CRYPTO_atomic_write added.

@paulidale
Contributor Author

ping?

levitte pushed a commit that referenced this pull request Oct 10, 2017
in an atomic fashion.

Reviewed-by: Rich Salz <[email protected]>
(Merged from #4414)
levitte pushed a commit that referenced this pull request Oct 10, 2017
Reviewed-by: Rich Salz <[email protected]>
(Merged from #4414)
@paulidale
Contributor Author

Merged, thanks.

@paulidale paulidale closed this Oct 10, 2017
@paulidale
Contributor Author

Merged, thanks.

@paulidale paulidale deleted the atomic_read branch October 10, 2017 02:58
@dot-asm
Contributor

dot-asm commented Oct 10, 2017

Generally speaking it's rather a compiler-specific thing than a threading-model-specific one. For example, there is gcc for Windows, you know. To extend the argument, it might use the C11 coverage (see refcount.h for inspiration). In other words it might be appropriate to have it as a separate module as opposed to threading-module-specific...


int CRYPTO_atomic_read(int *val, int *ret, CRYPTO_RWLOCK *lock)
{
    InterlockedCompareExchange(val, 0, 0);
Contributor

And who updates *ret?

Contributor

Besides, if one argues that atomic_load is better than atomic_add with 0 because it translates to an instruction with no lock prefix, then how come this one? This is just as expensive an instruction...

Contributor Author

I'm open to other suggestions as to how to go about this on Windows.
I don't know enough about the operating system.

Contributor

@paulidale, you forgot the assignment to *ret, see my original suggestion.

I have no idea about the cost of these functions; @dot-asm seems to be better informed. If he says that the compare exchange is much more expensive than a simple atomic_add with 0, then you could use InterlockedAdd() instead.

Contributor

Ok, sorry, I overlooked #4517.

Contributor Author

Yeah, I missed the assignment part of your suggestion @mspncp.
Final versions are best done by someone with substantial Windows experience.

    return 1;
}
# endif
    if (!CRYPTO_THREAD_write_lock(lock))
Contributor

Why write lock?

Contributor Author

Copy paste :(

@dot-asm
Contributor

dot-asm commented Oct 10, 2017

are @dot-asm and others satisfied with it?

I'm not, because I still think that it gives you the wrong impression. But I have to collect my thoughts...

@paulidale
Contributor Author

I've made the changes to the above problems in #4517
