Description
Birch doesn't perform in-place operations (at least not on the input array), so the copy parameter is useless and should be deprecated. It's even detrimental, because by default it makes an unnecessary copy.
The only place where an inplace operation happens is in the update method of _CFSubcluster:
scikit-learn/sklearn/cluster/_birch.py, lines 315 to 320 in 11e8c21:

```python
def update(self, subcluster):
    self.n_samples_ += subcluster.n_samples_
    self.linear_sum_ += subcluster.linear_sum_
    self.squared_sum_ += subcluster.squared_sum_
    self.centroid_ = self.linear_sum_ / self.n_samples_
    self.sq_norm_ = np.dot(self.centroid_, self.centroid_)
```
However, update is called in 2 places. The first one is in the _split_node function, but there we first create 2 new _CFSubcluster objects, so update performs in-place operations on newly created data and the input data is not modified. The second one is in the insert_cf_subcluster method of _CFNode, but it is only triggered if the subcluster has a child, which can only come from split subclusters (i.e. after _split_node), so again we're not modifying the input data.
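To make the aliasing argument concrete, here is a minimal sketch. `SubclusterSketch` and its `empty` constructor are hypothetical stand-ins written for this illustration, not scikit-learn's `_CFSubcluster`; the sketch assumes a subcluster built from a sample keeps a reference to that row without copying. It shows that accumulating into a freshly allocated subcluster leaves the input untouched, whereas calling update on a subcluster that aliases an input row would modify it (the case that, per the reasoning above, never happens).

```python
import numpy as np


class SubclusterSketch:
    """Hypothetical, simplified stand-in; not scikit-learn's _CFSubcluster."""

    def __init__(self, linear_sum):
        self.n_samples_ = 1
        # Assumption: the array is stored by reference, without a copy.
        self.linear_sum_ = linear_sum
        self.squared_sum_ = float(np.dot(linear_sum, linear_sum))

    @classmethod
    def empty(cls, n_features):
        # A new subcluster that owns a freshly allocated buffer.
        obj = cls(np.zeros(n_features))
        obj.n_samples_ = 0
        return obj

    def update(self, other):
        self.n_samples_ += other.n_samples_
        self.linear_sum_ += other.linear_sum_  # in place on self's buffer
        self.squared_sum_ += other.squared_sum_
        self.centroid_ = self.linear_sum_ / self.n_samples_
        self.sq_norm_ = float(np.dot(self.centroid_, self.centroid_))


X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_before = X.copy()

# Each row is a view into X, so these subclusters alias the input.
leaves = [SubclusterSketch(row) for row in X]

# Split-like pattern: accumulate into a new subcluster.
merged = SubclusterSketch.empty(X.shape[1])
for leaf in leaves:
    merged.update(leaf)
input_untouched = np.array_equal(X, X_before)

# Counter-example: updating a subcluster that aliases an input row.
leaves[0].update(leaves[1])
input_modified = not np.array_equal(X, X_before)
```

In this sketch only the counter-example writes into `X`; the split-like pattern mutates nothing but the new buffer, which is why a defensive copy of the input would add no protection there.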