Conversation
doc/man3/CRYPTO_THREAD_run_once.pod
Outdated
There was a problem hiding this comment.
shouldn't that second read mention be atomic add?
5d3475a to
35b0b16
Compare
crypto/threads_win.c
Outdated
There was a problem hiding this comment.
I'm not an expert in this matter, just googled it, but wouldn't this implementation here be more natural?
int CRYPTO_atomic_read(int *val, int *ret, CRYPTO_RWLOCK *lock)
{
*ret = InterlockedCompareExchange(val, 0, 0);
return 1;
}See https://msdn.microsoft.com/en-us/library/ms686355(VS.85).aspx. Unless OpenSSL still supports Windows XP, evenInterlockedCompareExchangeAcquire() could be used.
There was a problem hiding this comment.
I know nothing about Windows programming :)
I'll switch this over.
|
Looks like there are already some existing use cases for |
crypto/threads_pthread.c
Outdated
There was a problem hiding this comment.
Is acquire the order you want to use here? Maybe acquire release?
There was a problem hiding this comment.
Acquire/release isn't legal for an atomic load. The valid memory orders are: relaxed, seq_cst, acquire, consume. Because this is reading but not writing, I think acquire is the correct ordering.
|
As was discussed already [internally], you can't make any sensible decision without modifying an atomic variable. Or in other words how atomic read is different from just read? Indeed, consider If there is contention for *val, then it can be different by the time caller gains execution control after CRYPTO_THREAD_unlock. So what does locking give you? Well, one can still argue that suggested atomic read would prevent compiler from moving it as result of optimization, but so would cast to volatile. As for use of atomic add with 0 increment in lh_stats.c it's terrible and is not right. If coherent view in stats is absolute requirement, then one shouldn't use atomic_add on structure fields elsewhere, but lock the whole structure. Currently view is incoherent and |
|
Well, there is a chance that I've let "practitioner" in myself take over "theoretician". I mean is it possible that notion of atomic load is meaningful on platforms where load has to be performed with multiple instructions? And "practitioner" says there are no such platforms? Is this it? Yet, we do make assumption about platform, for example that it's [[I]L]P(32|64), two's complement, so we don't actually aim for pure abstract platform. So is it reasonable to assume that conventional load is actually atomic? |
|
I, for one, am not confident enough to assert that we can assume that conventional load is atomic on all platforms we are used on. (Note that I say "platforms we are used on", not "supported platforms", because there are no doubt people pushing the envelope.) |
|
This came out of #4402 I don't think that an atomic read is required there. The most likely outcome of a race is an extra reseeding (or possibly a delayed reseed). I have seen processors where int loads took several instructions. Some small micros do this, MIPS does for constant loads (which aren't directly relevant). However, on systems where loads are atomic, I'd expect them to be replaced by a plain load/store sequence. This happens on x86/64. There is still a cost -- the memory write, the function call and return. If people don't think this PR is worthwhile, speak up. I'm happy for it to not go in. The current use case doesn't need it and the other places in lhash where it could be used should be done differently. Locking is the solution for the lhash cases. In a new PR. |
35b0b16 to
9a310bd
Compare
|
I've dealt with the lhash concerns in #4418 |
Trick question. There is ambiguity in "memory model" when it's used in same sentence with "platform" and "processor". In sense that processor itself upholds certain discipline referred to as "memory model", but on the other hand there is language "memory model" or how compiler schedules loads and stores :-) Anyway, I'd say that we can take unaligned accesses out of equation. For following reason. We are not talking about reading variables in general, but specifically about base types that match processor register, int, long, void *. And we can expect them to be aligned, compiler is obliged to do it for us. In the context of "atomic" load we can take processor's memory model out of equation too, because there is ambiguity in either case, i.e. did you get value before or after modification. I mean processor's memory model affect ambiguity, but doesn't eliminate it, and it doesn't matter which ambiguity is it. As for language "memory model" it's tricky, isn't it? Our requirement for language standard compliance predates notion of "memory model" in C. It should be noted however that two disjoint subroutines accessing same shared variable, and called from disjoint translation unit do constitute desirable "memory model", don't they? [I suppose this is also argument against LTO.] |
This can't be viewed as load, because there is no separately addressable memory cell. Pieces of constant are encoded into instructions themselves. As already mentioned, we are not talking about loads in most general sense, but specifically about aligned loads of shared variables that fit into processor register. |
And so would variable declared volatile... |
|
OK, would it be appropriate to say like this? It's desirable to use atomic_load, because it conveys information to compiler about what it can do in terms of optimization. It will be compiled as just load [on supported platforms], but compiler won't be as free to move it around as just reference. It should be noted that essentially same effect can be achieved by declaring variable volatile. And it would actually be acceptable approach for compilers that don't specify "memory model". One can also point out that CRYPTO_atomic_* reside in separate translation unit, which actually limits such compiler's ability to screw things up. [On the other hand one can note that formally speaking adhering to atomic_load should make LTO safe, at least in respect to involved variables.] If it's adequate summary, then it might be appropriate to implement CRYPTO_atomic_read. But suggestion is to adhere to volatile where atomic_load is not an option. Concern here is that synchronization costs would be off-charts disproportionate to [zero] benefit. And "might" in first sentence means that one can still wonder if there is really legitimate reason for it. I mean since you can't really take sensible decision based on value of reference counter without actually modifying it... I suppose I have to double-check #4402. |
|
The counter |
|
@mspncp summed it up well. The instance in question doesn't strictly require this functionality. |
|
It seems that volatile isn't a good answer: https://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming It guarantees going to memory, not that the memory access is atomic. |
|
Once again, volatile is suggested as compromise for specific case, load of aligned integer. There is no contradiction with referred article, because it too recognizes that any load of aligned integer, be it volatile or not, is atomic. And what is the compromise? Pre-C11[?] doesn't make any promises about how and when variables are loaded, written or cached in registers. Except for volatile ones. So by adhering to volatile load we simply prevent compiler from caching the value in register. And referred article effectively recognizes that. And once again, the reason for seeking compromise is that synchronization doesn't give you any advantage, in sense that you can |
Well, it's not like C11 makes such promises in general. It's just in addition to volatiles it adds notion of atomic variables and specifies their behaviour in respect to memory references. |
As far as I understand it, atomic read/write is meant to prevent me from reading inconsistent values (partiallly updated by another thread). This has nothing to do with the question whether the value is current or might become outdated immediately afterwards. So isn't this discussion comparing apples and oranges? Or did I miss something? |
Just in case to clarify. |
This is misleading formulation. Atomic variable can't be partially updated, per its definition. For this reason compiler won't even support types that hardware doesn't guarantee atomic access for. I.e. you are not free to declare arbitrary type [base or not] as atomic.
Well, the spirit of the quote is rather "[provided that we recognize that load of aligned integer is actually atomic (on all known ILP32 and better platforms)] synchronization doesn't give you any advantage." |
And of course so is store in thread that is updating the variable in question of course. Be it one declared atomic or not. |
Ok, I think you convinced me now to keep things simple in PR #4402 and just do a naive compare without worrying too much ;-). |
|
I can't see a simple compare hurting for #4402 A delayed or extra reseed is harmless. I don't think a reseed can reasonably be missed which is the problem case (2^32 reseeds between generations is a problem but essentially impossible). What synchronisation does give you is a guarantee that the two halves weren't read separately. E.g. the structure is misaligned or you're running on an 8 bit CPU that requires four loads or, perhaps, crosses a page boundary. Anyway, I'll close this unmerged tomorrow if I don't hear any objections. |
|
No concerns about keeping this have been raised so I'm closing without merging. |
|
It would probably be document the conclusions in the tree somewhere, in terms of what behavior we have decided to rely upon. |
|
Reopening in light of #4467 |
|
Now that it's re-opened because it seems useful (see the atomic-add calls with 0), are @dot-asm and others satisfied with it? Another option is to define atomic-read as atomic-add with zero and "fix" it later. |
|
Fixed a |
|
Should an atomic write also be supported? Again, atomic_add can do this by adding zero but a write would be more efficient on many platforms since it avoids half of the read/write cycle. |
|
CRYPTO_atomic_write added. |
|
ping? |
in an atomic fashion. Reviewed-by: Rich Salz <[email protected]> (Merged from #4414)
Reviewed-by: Rich Salz <[email protected]> (Merged from #4414)
|
Merged, thanks. |
|
Merged, thanks. |
|
Generally speaking it's rather a compiler-specific thing than threading-model-specific. For example there is gcc for Windows, you know. To extend the argument it might use C11 coverage (see refcount.h for inspiration). In other words it might be appropriate to have it as separate module as apposite to threading-module-specific... |
|
|
||
| int CRYPTO_atomic_read(int *val, int *ret, CRYPTO_RWLOCK *lock) | ||
| { | ||
| InterlockedCompareExchange(val, 0, 0); |
There was a problem hiding this comment.
Besides, if one argues that atomic_load is better than atomic_add with 0, because it translates to instruction with no lock prefix, then how come this one? This is as expensive instruction...
There was a problem hiding this comment.
I'm open to other suggestions as to how to go about this one Windows.
I don't know enough about the operating system.
There was a problem hiding this comment.
@paulidale, you forget the assignment to *ret, see my original suggestion.
I have no idea about the cost of these functions, @dot-asm seems to be better informed. If he says that the compare exchange is much more expensive than a simple attomic_add with 0, then you could use InterlockedAdd() instead.
There was a problem hiding this comment.
Yeah, I missed the assignment part of your suggestion @mspncp
Final versions are best done by someone with substantial Windows experience.
| return 1; | ||
| } | ||
| # endif | ||
| if (!CRYPTO_THREAD_write_lock(lock)) |
I'm not, because I still think that it gives you wrong impression. But I have to collect my thoughts.... |
|
I've made the changes to the above problems in #4517 |
Add a CRYPTO_atomic_read call which allows an int variable to be read in an atomic fashion.