Skip to content

Conversation

@MarkFischinger
Copy link
Contributor

I've now applied OpenMP to the Elkan K-means implementation. Here are the results.

I ran a benchmark with the following settings:

Number of data points: 100,000
Number of dimensions: 50
Number of clusters: 10
Maximum iterations: 100

Here's what I found:

Original version: 9.42804 seconds
OpenMP version: 7.19841 seconds

This represents a 23.6% decrease in execution time.

@shrit
Copy link
Member

shrit commented Jul 10, 2024

@MarkFischinger looks good to me, I will approve it after the CI checks out. In the meantime, Would you please show the performance on another test with a higher order of data points? such as 10M with at least 500 dimensions?

@conradsnicta
Copy link
Contributor

conradsnicta commented Jul 11, 2024

This represents a 23.6% decrease in execution time.

This seems a bit low given that the code is now parallelised. A bog-standard kmeans algorithm should be around twice as fast on a 2 core machine. The Elkan approach is clearly not the same, but I'd expect there'd be more of a speedup.

Image extracted from https://doi.org/10.1109/ICSPCS.2017.8270510 (arXiv version: https://arxiv.org/abs/1707.09094)

8270510-fig-2-source-large

@MarkFischinger
Copy link
Contributor Author

This seems a bit low given that the code is now parallelised [...]

@conradsnicta I reran the benchmarks, but the speed remained the same. After making some additional adjustments (also mentioned by @rcurtin) to the elkan kmeans, here are the updated benchmark results:

Settings:

  const size_t numPoints = 100000;  // Number of data points.
  const size_t numDimensions = 50;  // Number of dimensions.
  const size_t numClusters = 10;    // Number of clusters.
  const size_t maxIterations = 100; // Maximum number of iterations.

Benchmark:

  ~/gsoc/mlpack-benchmarks/src/kmeans$ ./elkan
  Running benchmark...
  Benchmark completed.
  Total time: 4.71125 seconds

So I got approximately the results we wanted and fixed some issues. Thank you for your feedback/comment :)

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this one has sat so long---my schedule is finally clearing up and I can get back to looking into these. I think this one is pretty close and the speedup is nice! I just have a few simple structural comments and similar; let me know if I can clarify anything. 👍

lowerBounds.fill(0);
upperBounds.fill(DBL_MAX);
assignments.fill(0);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this back up to where it originally was? (again with the idea of reducing the diff size)

#pragma omp single
numThreads = omp_get_num_threads();
}
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I take a look at the naive strategy, the idea there is to just create a local centroid object inside each thread. If you applied that strategy here too (and just used a #pragma omp critical when summing centroids) I think the code would be much simpler and you wouldn't need numThreads. Let me know what you think.

// Now, normalize and calculate the distance each cluster has moved.
arma::vec moveDistances(centroids.n_cols);
double cNorm = 0.0; // Cluster movement for residual.
#ifdef MLPACK_USE_OPENMP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the ifdef; the compiler will ignore the pragma if OpenMP is not enabled. 👍

@MarkFischinger
Copy link
Contributor Author

@rcurtin @shrit I removed the OpenMP import and all the MLPACK_USE_OPENMP tags, fixed the line lengths and implemented ryan's idea that included local centroids for each thread.

Ryan's suggestion did yield a better execution time:

Running benchmark for ElkanKMeans with 1000 points, 10 clusters, and 50 dimensions...
Total time: 0.04244 seconds
Running benchmark for ElkanKMeans with 5000 points, 10 clusters, and 50 dimensions...
Total time: 0.190035 seconds
Running benchmark for ElkanKMeans with 10000 points, 10 clusters, and 50 dimensions...
Total time: 0.353643 seconds
Running benchmark for ElkanKMeans with 50000 points, 10 clusters, and 50 dimensions...
Total time: 2.18762 seconds
Running benchmark for ElkanKMeans with 100000 points, 10 clusters, and 50 dimensions...
Total time: 4.59366 seconds

So the code before executed in this time:

Running benchmark for ElkanKMeans with 1000 points, 10 clusters, and 50 dimensions...
Total time: 0.0752948 seconds
Running benchmark for ElkanKMeans with 5000 points, 10 clusters, and 50 dimensions...
Total time: 0.356647 seconds
Running benchmark for ElkanKMeans with 10000 points, 10 clusters, and 50 dimensions...
Total time: 0.751239 seconds
Running benchmark for ElkanKMeans with 50000 points, 10 clusters, and 50 dimensions...
Total time: 4.68356 seconds
Running benchmark for ElkanKMeans with 100000 points, 10 clusters, and 50 dimensions...
Total time: 10.187 seconds

My variant before ryan's suggestion (called "optimized" in the graph below):

Running benchmark for ElkanKMeans with 1000 points, 10 clusters, and 50 dimensions...
Total time: 0.055524 seconds
Running benchmark for ElkanKMeans with 5000 points, 10 clusters, and 50 dimensions...
Total time: 0.178682 seconds
Running benchmark for ElkanKMeans with 10000 points, 10 clusters, and 50 dimensions...
Total time: 1.70825 seconds
Running benchmark for ElkanKMeans with 50000 points, 10 clusters, and 50 dimensions...
Total time: 3.66519 seconds
Running benchmark for ElkanKMeans with 100000 points, 10 clusters, and 50 dimensions...
Total time: 4.70217 seconds

I used this script for benchmarking.
Here is the data visualized:

benchmark_comparison

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one looks pretty much ready to go to me, only a couple tiny comments. I also mentioned the custom reduction idea that I mentioned in the Hamerly k-means PR again; we don't have to do that in this PR, but if you find the idea is worthwhile in the other PR and quick enough to do, might as well do it here too---let me know what you think.

}
else
// Create local centroid and count objects
arma::mat localNewCentroid(centroids.n_rows, centroids.n_cols,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably call this localNewCentroids since it holds more than one. :) Also the arma::fill::zeros is unnecessary here.

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new custom reductions make this really nice and clean! I think it will have to wait until #3761 gets merged (so that omp_reductions.hpp is available). Just a couple small comments, then from my side things look good. 👍

for (size_t c = 0; c < centroids.n_cols; ++c)
{
// Step 3: for all remaining points x and centers c such that c != c(x),
// u(x) > l(x, c) and u(x) > 0.5 d(c(x), c)...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change this comment (I imagine it is because of the extra indentation that later got reverted with the custom OpenMP reductions); can you revert it to the original?

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a merge and pushing a couple style fixes, everything seems to work here. Thanks again @MarkFischinger 👍

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second approval provided automatically after 24 hours. 👍

@shrit shrit merged commit 3657f42 into mlpack:master Sep 16, 2024
This was referenced Sep 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants