Conversation

@MarkFischinger
Contributor

I've been working on optimizing the Hamerly K-means implementation, and I'm happy to share the results with you.

Benchmark Results
I ran a benchmark with the following settings:

Number of data points: 100,000
Number of dimensions: 50
Number of clusters: 10
Maximum iterations: 100

Here's what I found:

Original version: 12.1609 seconds
OpenMP version: 6.98171 seconds

That's a 42.6% decrease in execution time!
Looking forward to your feedback! Let me know if you need any clarification or have any questions.

Member

@rcurtin rcurtin left a comment

Nice improvement! Just a couple small comments.

double centroidMovement = 0.0;

#pragma omp parallel for reduction(+:distanceCalculations, centroidMovement) \
reduction(max:furthestMovement, secondFurthestMovement)
Member

What about furthestMovingCluster? That is a little trickier to handle...
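For context on why this is trickier: OpenMP's built-in `max` reduction tracks only the maximum value, not which iteration produced it, so an argmax-style variable like `furthestMovingCluster` cannot go in a plain `reduction` clause. A common pattern is to keep per-thread candidates and merge them in a critical section; a minimal sketch (the `ArgMaxMovement` helper is hypothetical, not the PR's actual code):

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the largest movement. A plain reduction(max:...)
// would lose the index, so each thread tracks its own (value, index) pair
// and the pairs are merged atomically-as-a-unit in a critical section.
std::size_t ArgMaxMovement(const std::vector<double>& movements)
{
  double furthestMovement = -1.0;
  std::size_t furthestMovingCluster = 0;

  #pragma omp parallel
  {
    double localMax = -1.0;
    std::size_t localArg = 0;

    #pragma omp for nowait
    for (std::size_t c = 0; c < movements.size(); ++c)
    {
      if (movements[c] > localMax)
      {
        localMax = movements[c];
        localArg = c;
      }
    }

    // Merge each thread's candidate; the critical section keeps the value
    // and its index consistent with each other.
    #pragma omp critical
    {
      if (localMax > furthestMovement)
      {
        furthestMovement = localMax;
        furthestMovingCluster = localArg;
      }
    }
  }

  return furthestMovingCluster;
}
```

(OpenMP 5.x also offers user-defined reductions over a value/index pair, which would avoid the critical section entirely.)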

// First bound test.
if (upperBounds(i) <= m)
{
#pragma omp atomic
Member

Does this need to be atomic if we already have the reduction for hamerlyPruned?
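For reference, a variable named in a `reduction` clause gets a private per-thread copy, so increments to it never race and no `atomic` is needed; OpenMP combines the copies after the loop. A minimal sketch of the pattern (the `CountPruned` helper is hypothetical, not the PR's code):

```cpp
#include <cstddef>

// Count points whose upper bound passes the first Hamerly-style test.
// Each thread increments its own private copy of `pruned`; OpenMP sums
// the copies when the parallel region ends.
std::size_t CountPruned(const double* upperBounds,
                        const std::size_t n,
                        const double m)
{
  std::size_t pruned = 0;
  #pragma omp parallel for reduction(+ : pruned)
  for (std::size_t i = 0; i < n; ++i)
  {
    if (upperBounds[i] <= m)
      ++pruned;  // Private per-thread copy; no #pragma omp atomic required.
  }
  return pruned;
}
```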

@MarkFischinger
Contributor Author

I fixed the error and made sure to use #ifdef MLPACK_USE_OPENMP in the correct places. Here are the new benchmarks:

mark@mark:~/gsoc/mlpack-benchmarks/src/kmeans$ ./hamerly_kmeans
Running benchmark...
Benchmark completed.
Total time: 4.95009 seconds

Member

@rcurtin rcurtin left a comment

I'm not yet 100% sure everything here is correct (I am out of time for now, will come back later), but I think the general idea works and the speedup is nice. I have some structure and cleanliness comments; let me know if I can clarify them. 👍

{
// Nothing to do.
}

Member

No need to remove the empty line.

{
const double halfDist = dist / 2.0;
localMinClusterDistances(i) = std::min(localMinClusterDistances(i), halfDist);
localMinClusterDistances(j) = std::min(localMinClusterDistances(j), halfDist);
Member

Can you explain the reasoning behind these changes? I don't understand the purpose or how this improves the implementation. Unless I have missed something, this makes the algorithm significantly incorrect.

}
else if (dist < lowerBounds(i))

// The bounds failed. So test against all other clusters.
Member

The original comment said significantly more than this for a reason. Please don't remove comments unless there is a very good reason.


// Normalize centroids and calculate cluster movement (contains parts of
// Move-Centers() and Update-Bounds()).
// Normalize centroids and calculate cluster movement
Member

There is no reason to change this comment.

// Calculate movement.
const double movement = distance.Evaluate(centroids.col(c),
newCentroids.col(c));
const double movement = std::sqrt(arma::sum(arma::square(centroids.col(c) - newCentroids.col(c))));
Member

This is incorrect for any distance metric that is not the Euclidean distance. Please revert to the original implementation (even if it is slower due to the issue with arma::norm() that was discovered in the naive k-means PR).


// Now update bounds (lines 3-8 of Update-Bounds()).
for (size_t i = 0; i < dataset.n_cols; ++i)
// Now update bounds
Member

Revert the comment please. It is extremely important for maintenance to be able to connect the code to the parts of the original paper.

#pragma omp critical
{
Log::Warn << "Invalid assignment for point " << i << std::endl;
}
Member

But can this ever happen? I don't think that it can (at least it couldn't in the original implementation).


return std::sqrt(centroidMovement);
}

Member

No need to remove the empty line separating the end of the function from the closing of the namespace.

} // namespace mlpack

#endif
#endif
Member

No need to remove the newline from the end of the file.

@MarkFischinger
Contributor Author

@rcurtin I simplified the minClusterDistance update logic, made the OpenMP usage more efficient (for example, by moving the centroid normalization and movement calculation into a parallel loop), and fixed the calculation issue you pointed out.

Before the changes these were the execution time results:

DatasetSize,Time
1000,0.00677065
10000,0.148302
100000,1.62585
1000000,17.6833

After:

DatasetSize,Time
1000,0.004081
10000,0.0428959
100000,1.04919
1000000,3.32927

Tested with this benchmark script.

I also created this graph:

(graph: benchmark_comparison)

Member

@rcurtin rcurtin left a comment

Awesome, thanks for the cleanups @MarkFischinger. I think this one is basically ready to go but I had a thought about custom reductions that I want to see what you think about. If you can address or respond to the handful of comments I've left, I think we can get this merged shortly. 👍

Comment on lines 70 to 71
arma::mat threadNewCentroids(centroids.n_rows, centroids.n_cols,
arma::fill::zeros);
Member

Suggested change
arma::mat threadNewCentroids(centroids.n_rows, centroids.n_cols,
arma::fill::zeros);
arma::mat threadNewCentroids(centroids.n_rows, centroids.n_cols);

(Armadillo defaults to fill::zeros, so we don't need to include it here.)

if (upperBounds(i) <= m)
arma::mat threadNewCentroids(centroids.n_rows, centroids.n_cols,
arma::fill::zeros);
arma::Col<size_t> threadCounts(centroids.n_cols, arma::fill::zeros);
Member

Suggested change
arma::Col<size_t> threadCounts(centroids.n_cols, arma::fill::zeros);
arma::Col<size_t> threadCounts(centroids.n_cols);

}

for (size_t i = 0; i < dataset.n_cols; ++i)
#pragma omp parallel
Member

I was thinking to myself, "wouldn't it be nice if we could just use #pragma omp parallel for reduction(+: centroids) or something like this?" That would be much nicer code-wise than making the threadNewCentroids object and threadCounts objects and then updating them. Then I read about OpenMP custom reductions:

https://stackoverflow.com/questions/59034114/custom-omp-reduction-on-stdmaps

Do you want to give that a try and see if it works? Don't feel obligated, we could also do it in another PR or another time. But it could be nice for simplifying not just this code but other code that uses similar patterns for the other k-means strategies. I guess that we could define Armadillo-specific custom reductions in some core header file, and then just use it where relevant.
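The declare-reduction idea from that answer looks roughly like this. A minimal sketch using `std::vector<double>` as a stand-in for `arma::mat` (the `vecAdd`, `VecAdd`, and `SumColumns` names are hypothetical, not mlpack API):

```cpp
#include <cstddef>
#include <vector>

// Combiner used by the custom reduction: element-wise accumulation.
static void VecAdd(std::vector<double>& out, const std::vector<double>& in)
{
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] += in[i];
}

// omp_out/omp_in are the two partial results being merged; omp_priv is each
// thread's private copy, sized to match the original variable (omp_orig).
#pragma omp declare reduction(vecAdd : std::vector<double> : \
    VecAdd(omp_out, omp_in)) \
    initializer(omp_priv = std::vector<double>(omp_orig.size(), 0.0))

// Sum `dim`-dimensional rows in parallel; the custom reduction replaces the
// hand-rolled threadNewCentroids/threadCounts pattern.
std::vector<double> SumColumns(const std::vector<std::vector<double>>& rows,
                               const std::size_t dim)
{
  std::vector<double> total(dim, 0.0);
  #pragma omp parallel for reduction(vecAdd : total)
  for (std::size_t r = 0; r < rows.size(); ++r)
    for (std::size_t d = 0; d < dim; ++d)
      total[d] += rows[r][d];
  return total;
}
```

The same `declare reduction` shape should carry over to `arma::mat` (combine with `omp_out += omp_in`, initialize `omp_priv` from `omp_orig.n_rows`/`n_cols`), as the later commits in this PR do.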

arma::fill::zeros);
arma::Col<size_t> threadCounts(centroids.n_cols, arma::fill::zeros);
size_t threadHamerlyPruned = 0;
size_t threadDistanceCalculations = 0;
Member

I think for these two variables you could directly use an OpenMP reduction? Let me know if I overlooked something and there's a better reason to do it this way.


const double dist = distance.Evaluate(dataset.col(i), centroids.col(c));

// Is this a better cluster? At this point, upperBounds[i] = d(i, c(i))
Member

Suggested change
// Is this a better cluster? At this point, upperBounds[i] = d(i, c(i))
// Is this a better cluster? At this point, upperBounds[i] = d(i, c(i)).

The original period seems to have gotten lost 😄

Contributor Author

I removed it on purpose because it was over the 80-character limit: due to the indent, the period was the 81st character. I thought removing it would be better than breaking the line and causing a larger diff. Let me know what is best here :)

Comment on lines 152 to 153
#pragma omp parallel for reduction(+:distanceCalculations,centroidMovement) \
schedule(static)
Member

Suggested change
#pragma omp parallel for reduction(+:distanceCalculations,centroidMovement) \
schedule(static)
#pragma omp parallel for reduction(+: distanceCalculations, centroidMovement) \
schedule(static)

Just some small pedantic style fixes. 👍

@MarkFischinger
Contributor Author

@rcurtin After reading the Stack Overflow answer, I implemented the custom reduction idea. The code is now cleaner and has similar or slightly better performance:

Running benchmark for 1000 points...
Dataset size: 1000, Time: 0.00392151 seconds
Running benchmark for 10000 points...
Dataset size: 10000, Time: 0.0348763 seconds
Running benchmark for 100000 points...
Dataset size: 100000, Time: 0.352331 seconds
Running benchmark for 1000000 points...
Dataset size: 1000000, Time: 3.49524 seconds
Benchmark completed. Results saved to hamerly_kmeans_benchmark.csv

@rcurtin
Member

rcurtin commented Sep 1, 2024

Awesome that the custom reduction worked! That really simplifies the changes. Do you think you would be up for opening a PR to make that change to use a custom reduction in the naive strategy too? It should be a quick simplification (and I guess provide a slight amount of speedup too, given your results here).

Member

@rcurtin rcurtin left a comment

Nice! I think this is just about ready. Do you want to add a note in HISTORY.md about the OpenMP k-means improvements? You could reference this PR and the naive k-means PR, and then we could update it as we get the other PRs in. 👍

I'm excited about getting this merged, the speedups for this one are really nice. Once you handle the couple of comments from my side everything seems good to merge. 🚀


// Custom reduction for arma::mat
#pragma omp declare reduction(matAdd : arma::mat : omp_out += omp_in) \
initializer(omp_priv = arma::mat(omp_orig.n_rows, omp_orig.n_cols).zeros())
Member

I think the .zeros() is unnecessary here since Armadillo will automatically initialize the matrix to zeros in recent versions. (That should provide a little additional speedup too.)

Member

Also, some other little thoughts. This is really nice, so it would be great if we could make it available to other techniques. Can you move it to a new header file that we can put in, say, mlpack/core/util/ and include in mlpack/base.hpp? I guess you could call it omp_reductions.hpp or something like that. I'm not totally tied to that so if you think there is a better place that works too. But basically the idea is just that these reductions can be available for any mlpack technique.

++counts(assignments[i]);
}

distanceCalculations += threadDistanceCalculations;
Member

If we're reducing threadDistanceCalculations, then couldn't we just use distanceCalculations directly?

Member

@rcurtin rcurtin left a comment

Thanks for handling the issues! It looks good to me now. There are a couple other tiny comments I have, all very minor. I can handle them before merge if you don't get to it, it's no big deal either way.

Contributor

@github-actions github-actions bot left a comment

Second approval provided automatically after 24 hours. 👍

@shrit shrit merged commit 620a770 into mlpack:master Sep 3, 2024