-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Optimize Elkan K-means with OpenMP #3764
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@MarkFischinger looks good to me, I will approve it after the CI checks out. In the meantime, Would you please show the performance on another test with a higher order of data points? such as 10M with at least 500 dimensions? |
This seems a bit low given that the code is now parallelised. A bog-standard kmeans algorithm should be around twice as fast on a 2 core machine. The Elkan approach is clearly not the same, but I'd expect there'd be more of a speedup. Image extracted from https://doi.org/10.1109/ICSPCS.2017.8270510 (arXiv version: https://arxiv.org/abs/1707.09094) |
@conradsnicta I reran the benchmarks, but the speed remained the same. After making some additional adjustments (also mentioned by @rcurtin) to the elkan kmeans, here are the updated benchmark results: Settings: const size_t numPoints = 100000; // Number of data points.
const size_t numDimensions = 50; // Number of dimensions.
const size_t numClusters = 10; // Number of clusters.
const size_t maxIterations = 100; // Maximum number of iterations.Benchmark: ~/gsoc/mlpack-benchmarks/src/kmeans$ ./elkan
Running benchmark...
Benchmark completed.
Total time: 4.71125 secondsSo I got approximately the results we wanted and fixed some issues. Thank you for your feedback/comment :) |
rcurtin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this one has sat so long---my schedule is finally clearing up and I can get back to looking into these. I think this one is pretty close and the speedup is nice! I just have a few simple structural comments and similar; let me know if I can clarify anything. 👍
| lowerBounds.fill(0); | ||
| upperBounds.fill(DBL_MAX); | ||
| assignments.fill(0); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this back up to where it originally was? (again with the idea of reducing the diff size)
| #pragma omp single | ||
| numThreads = omp_get_num_threads(); | ||
| } | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I take a look at the naive strategy, the idea there is to just create a local centroid object inside each thread. If you applied that strategy here too (and just used a #pragma omp critical when summing centroids) I think the code would be much simpler and you wouldn't need numThreads. Let me know what you think.
| // Now, normalize and calculate the distance each cluster has moved. | ||
| arma::vec moveDistances(centroids.n_cols); | ||
| double cNorm = 0.0; // Cluster movement for residual. | ||
| #ifdef MLPACK_USE_OPENMP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the ifdef; the compiler will ignore the pragma if OpenMP is not enabled. 👍
|
@rcurtin @shrit I removed the OpenMP import and all the Ryan's suggestion did yield a better execution time: So the code before executed in this time: My variant before ryan's suggestion (called "optimized" in the graph below): I used this script for benchmarking. |
rcurtin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one looks pretty much ready to go to me, only a couple tiny comments. I also mentioned the custom reduction idea that I mentioned in the Hamerly k-means PR again; we don't have to do that in this PR, but if you find the idea is worthwhile in the other PR and quick enough to do, might as well do it here too---let me know what you think.
| } | ||
| else | ||
| // Create local centroid and count objects | ||
| arma::mat localNewCentroid(centroids.n_rows, centroids.n_cols, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably call this localNewCentroids since it holds more than one. :) Also the arma::fill::zeros is unnecessary here.
rcurtin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new custom reductions make this really nice and clean! I think it will have to wait until #3761 gets merged (so that omp_reductions.hpp is available). Just a couple small comments, then from my side things look good. 👍
| for (size_t c = 0; c < centroids.n_cols; ++c) | ||
| { | ||
| // Step 3: for all remaining points x and centers c such that c != c(x), | ||
| // u(x) > l(x, c) and u(x) > 0.5 d(c(x), c)... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to change this comment (I imagine it is because of the extra indentation that later got reverted with the custom OpenMP reductions); can you revert it to the original?
Co-authored-by: Ryan Curtin <[email protected]>
rcurtin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a merge and pushing a couple style fixes, everything seems to work here. Thanks again @MarkFischinger 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second approval provided automatically after 24 hours. 👍


I've now applied OpenMP to the Elkan K-means implementation. Here are the results.
I ran a benchmark with the following settings:
Here's what I found:
This represents a 23.6% decrease in execution time.