Conversation

@PastaPastaPasta
Member

I ran some benchmarks, and according to my data, using the std-based ctpl makes the benchmarks roughly 2% faster on total time, min time, max time, and median time. See https://docs.google.com/spreadsheets/d/1tw43VBd50fcyNPZ5sUV8F-GpRFgDmiSOO1qdV1wMpqw/edit?usp=sharing for the data.

PastaPastaPasta added this to the 17.1 milestone Apr 28, 2021
@UdjinM6

UdjinM6 commented Apr 29, 2021

Can you reproduce these results in multiple runs? For me it fluctuates a couple of percent up and down from one run to another, and it's not clear if there are any performance gains or not, tbh. On the bright side, there is also no clear performance loss either :) Also, I like the idea of getting rid of Boost dependencies if there is no difference in performance, but we might want to take it even further; please see (and test) 14d095000c.

@PastaPastaPasta
Member Author

Yeah, I don't know; my "long tests", where the benchmarking took about an hour total for each run, show the 2% gains, but I'm not sure if those are real or spurious gains :)

Getting rid of Boost is enough of a justification on its own.

@sidhujag

This doesn't look like a lock-free queue mechanism; I used moodycamel's ConcurrentQueue in mine: https://github.com/cameron314/concurrentqueue

@PastaPastaPasta
Member Author

The ctpl::detail::Queue is not lock-free (it uses mutexes), but I think that's probably fine. Locking data structures can sometimes even be faster than lock-free ones.

Is there a reason why we should exclusively look for a non-locking data structure here?
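
For context, here is a minimal sketch of the kind of mutex-guarded queue being discussed; the class and member names are illustrative, not the actual ctpl code:

```cpp
#include <mutex>
#include <queue>

// Minimal sketch of a mutex-guarded task queue in the spirit of
// ctpl::detail::Queue: every push/pop takes the lock.
template <typename T>
class LockingQueue {
public:
    void push(T const& value) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_queue.push(value);
    }

    // Returns false if the queue was empty, true if a value was popped.
    bool pop(T& value) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_queue.empty()) return false;
        value = std::move(m_queue.front());
        m_queue.pop();
        return true;
    }

private:
    std::queue<T> m_queue;
    std::mutex m_mutex;
};
```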

@sidhujag

sidhujag commented Apr 30, 2021

> The ctpl::detail::Queue is not lock-free (it uses mutexes), but I think that's probably fine. Locking data structures can sometimes even be faster than lock-free ones.
>
> Is there a reason why we should exclusively look for a non-locking data structure here?

Lock-free is preferred if producers don't need to coordinate and neither consistency nor linearity of results is required.

#3797

I think overall there are lots of other bottlenecks, so as a result I only see a marginal improvement, but I also had other issues with the Boost one, documented in the issue.

The best results, I think, will come from bulk enqueuing and dequeuing (sketched below).

If the requirements allow it, you will have far fewer busy-wait cycles with lock-free and fewer CAS operations per context acquire/release. If designed right you should see 50-100% performance improvements, so you won't get that here (you would start with lock-free and build around it), but lock-free is desirable where the parallelization allows it (no communication or coordination required, as for things like signature checks).
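
As an illustration of the bulk enqueue/dequeue pattern, here is a minimal sketch using moodycamel::ConcurrentQueue; the task type and batch size are placeholders, not a proposal for the actual pool:

```cpp
#include <cstdio>
#include <vector>
#include "concurrentqueue.h"  // moodycamel::ConcurrentQueue

int main()
{
    // Lock-free MPMC queue of "tasks" (plain ints here as placeholders).
    moodycamel::ConcurrentQueue<int> queue;

    // Producer side: enqueue a whole batch with one call instead of
    // paying the per-item enqueue overhead.
    std::vector<int> batch{1, 2, 3, 4, 5, 6, 7, 8};
    queue.enqueue_bulk(batch.begin(), batch.size());

    // Consumer side: drain up to batch.size() items at once.
    std::vector<int> out(batch.size());
    std::size_t n = queue.try_dequeue_bulk(out.begin(), out.size());

    std::printf("dequeued %zu items\n", n);
    return 0;
}
```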
