
Pull/78 COMPRESS-485 + Substituting 'synchronized' with faster and fully thread-safe collections 'ConcurrentLinkedDeque' and iterators #79

Closed
Tibor17 wants to merge 2 commits into apache:master from Tibor17:pull/78

Conversation

Contributor

@Tibor17 Tibor17 commented May 11, 2019

One ArrayList was synchronized and the other was not, yet both were accessed concurrently.
Given the weaknesses of the Java memory model, this may produce sporadic NPEs (a Future retrieved in one thread may not be visible in another thread).
Using synchronization would avoid it, but why use it in such simple critical sections today when we have had concurrent collections since Java 1.5?
So I used ConcurrentLinkedDeque and that's it. Easy.
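The substitution the comment describes can be sketched with plain JDK classes; the class and variable names below are illustrative, not the actual Commons Compress code:

```java
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DequeDemo {
    public static void main(String[] args) throws InterruptedException {
        // Many threads add concurrently; ConcurrentLinkedDeque needs no external lock.
        Deque<Integer> futures = new ConcurrentLinkedDeque<>();
        ExecutorService es = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 100; i++) {
            final int n = i;
            es.submit(() -> futures.add(n));
        }
        es.shutdown();
        es.awaitTermination(10, TimeUnit.SECONDS);

        int sum = 0;
        for (int n : futures) { // weakly consistent iterator, no synchronized block
            sum += n;
        }
        System.out.println(sum); // 0 + 1 + ... + 99 = 4950
    }
}
```

Because every add happens-before the iteration here (the executor has terminated), the iterator is guaranteed to see all 100 elements.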

hboutemy and others added 2 commits May 8, 2019 17:34
this will ease Reproducible Builds when creating zip or jar archives

thanks to Arnaud Nauwynck for the great help
@coveralls

Coverage Status

Coverage increased (+0.07%) to 86.594% when pulling a618b46 on Tibor17:pull/78 into e127b13 on apache:master.

Member

bodewig commented Aug 7, 2019

Thanks @Tibor17. Could you please do me a favor and rebase this now that I have merged #78?

Member

bodewig commented Aug 7, 2019

Before #78 the synchronized would ensure you couldn't spin off new threads and add entries to them while writing or closing. With your change it would be possible to add new threads after writing has started and the result is not really defined (the iterator may see the new stream or it may not), right?

Actually I looked through the history and initially we didn't synchronize the iterator when writing, this has been added as "recommended practice" via https://issues.apache.org/jira/browse/COMPRESS-430 - so we never really cared whether adding new threads while writing would lead to a defined result.

Contributor Author

Tibor17 commented Aug 7, 2019

Hi @bodewig, I have to check the code myself again and rebase the branch, since it is a 3-month-old change.

Contributor Author

Tibor17 commented Aug 8, 2019

@bodewig
You raised two points, but I have to say they are not related, and here is my explanation why:

  1. locking within private/public methods
  2. synchronized iterators

From here on, I am referring to the state before #78.
Regarding (1) you mentioned:

synchronized would ensure you couldn't spin off new threads and add entries to them while writing or closing

The methods addArchiveEntry and submit on the one hand and writeTo on the other were not mutually exclusive.
If we want them to be mutually exclusive, we should open another PR for that.

If you want to prevent a double or parallel call of writeTo, we can add a check such as
if (es.isShutdown()) throw ...Exception.
Guarding all three methods can be done with an AtomicBoolean, but that is better shown with an example.
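A minimal sketch of the AtomicBoolean guard mentioned above; the class and method names are hypothetical, not the actual Commons Compress code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class WriteGuard {
    private final AtomicBoolean writeStarted = new AtomicBoolean(false);

    void writeTo() {
        // Only the first caller wins the CAS; later or parallel calls fail fast.
        if (!writeStarted.compareAndSet(false, true)) {
            throw new IllegalStateException("writeTo may only be called once");
        }
        System.out.println("written");
    }

    public static void main(String[] args) {
        WriteGuard g = new WriteGuard();
        g.writeTo(); // first call succeeds and prints "written"
        try {
            g.writeTo();
        } catch (IllegalStateException e) {
            System.out.println("rejected"); // second call is refused
        }
    }
}
```

The same compareAndSet check at the top of addArchiveEntry and submit (inverted: fail if the write has started) would make all three methods mutually exclusive without any blocking.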

Regarding (2):
It is the old style to use collections wrapped in synchronizedList() and to iterate over them inside synchronized (streams) {} blocks. Among the modern collections, ConcurrentLinkedDeque is beneficial because it does not require any external locking.
There was also one issue because the variable futures was not a synchronized collection.
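The stylistic difference can be shown side by side; this is an illustrative snippet, not the PR's actual code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

public class IterationStyles {
    public static void main(String[] args) {
        // Old style: Collections.synchronizedList requires manual locking while iterating.
        List<String> streams = Collections.synchronizedList(new ArrayList<>());
        streams.add("a");
        streams.add("b");
        StringBuilder oldStyle = new StringBuilder();
        synchronized (streams) { // mandated by the synchronizedList javadoc
            for (String s : streams) {
                oldStyle.append(s);
            }
        }

        // Modern style: a ConcurrentLinkedDeque iterator needs no external lock.
        Deque<String> deque = new ConcurrentLinkedDeque<>(streams);
        StringBuilder modern = new StringBuilder();
        for (String s : deque) {
            modern.append(s);
        }
        System.out.println(oldStyle + " " + modern); // ab ab
    }
}
```

Note the trade-off: the synchronized block gives a consistent snapshot, while the deque's iterator is only weakly consistent, which is the guarantee discussed later in this thread.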

Member

bodewig commented Aug 8, 2019

Thanks @Tibor17

Actually I've more or less been thinking out loud and not raising issues.

I was trying to figure out whether the synchronized usage was giving the API users - or our code - any extra guarantees that the non-blocking collection code didn't. Keep in mind that I'm not the author of the original code either. I am the one who added the synchronized around the iteration over streams - but completely overlooked the iteration over futures before that. Doesn't sound as if I was the most qualified person to comment ;-)

What I meant with the first part was that if you added new threads once writeTo was underway then whatever your new threads contributed would not be part of the result - while it now is undefined. Looking at the current code in master I see I've been wrong as writeTo does quite a few things before entering the synchronized block and there is enough leeway anyway.

No, I don't think we need to exclude the methods from each other. The class has a very clear usage pattern of two distinct phases:

  1. add all the things you want to add
  2. call writeTo

and it should be clear that the result of calling writeTo before you are done with the first phase is a dubious idea. In particular as the javadocs of writeTo state it will shut down the executor.
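The two-phase lifecycle described above can be modeled with a plain ExecutorService, since writeTo shuts the executor down; the demo class below is hypothetical, not the Commons Compress implementation:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

public class TwoPhaseDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService es = Executors.newFixedThreadPool(2);

        // Phase 1: add all the things you want to add.
        es.submit(() -> System.out.println("phase 1 task"));

        // Phase 2: the write shuts the executor down.
        es.shutdown();
        es.awaitTermination(10, TimeUnit.SECONDS);

        // Adding after phase 2 has started is refused by the executor itself.
        try {
            es.submit(() -> System.out.println("late task"));
        } catch (RejectedExecutionException e) {
            System.out.println("rejected");
        }
    }
}
```

So even without extra locking, late additions fail loudly rather than silently producing an undefined archive.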

I am aware of the iteration guarantees of ConcurrentLinkedDeque. Back when Kristian added the parallel zip support Commons Compress' baseline has been Java 5 (we bumped that to 6 with Compress 1.12 and 7 in Compress 1.13, which is where we are today). If it had been Java 7 back then, then I'm sure Kristian would have used non-blocking collections instead.

Member

bodewig commented Aug 8, 2019

Oh, you don't need to rebase at all. I just realized all your changes have been part of a distinct commit that I could pick out of the PR and apply separately. Sorry.

Member

bodewig commented Aug 8, 2019

merged, many thanks again.

@bodewig bodewig closed this Aug 8, 2019
Contributor Author

Tibor17 commented Aug 8, 2019

@bodewig
I am thankful to you.
btw, great that you mentioned Kristian, he is my colleague in Apache. We have one common friend ;-)
Cheers Tibor

Member

bodewig commented Aug 8, 2019

s/non-blocking/lock-free/g 🤦‍♂️
