Description
DBSCAN seems not to use multiple processors (n_jobs argument ignored)
It looks like DBSCAN hands the n_jobs argument off to NearestNeighbors, but NearestNeighbors only uses n_jobs for certain algorithm choices (presumably not the ones DBSCAN ends up calling by default). It would be good to document how to set up the input so that n_jobs actually takes effect, and possibly to change the defaults so the parameter is useful.
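As a hedged sketch of a possible workaround (not confirmed by the maintainers in this issue): since the neighbor search is the expensive part, one can precompute the eps-neighborhood graph with NearestNeighbors.radius_neighbors_graph, which does accept n_jobs, and then run DBSCAN on the resulting sparse distance matrix with metric='precomputed'. The parameter values below mirror the repro but with a smaller sample count.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=3000, centers=centers, cluster_std=0.4,
                  random_state=0)
X = StandardScaler().fit_transform(X)

eps = 0.3
# Do the expensive radius query in parallel; radius_neighbors_graph
# builds a sparse matrix of distances to all neighbors within eps.
nn = NearestNeighbors(radius=eps, n_jobs=-1).fit(X)
D = nn.radius_neighbors_graph(X, mode='distance')

# DBSCAN then only has to scan the precomputed sparse neighborhoods.
db = DBSCAN(eps=eps, min_samples=10, metric='precomputed').fit(D)
labels = db.labels_
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

Whether this actually uses multiple cores for the radius query depends on the algorithm NearestNeighbors selects internally, so it is a sketch rather than a guaranteed fix.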
Steps/Code to Reproduce
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.datasets.samples_generator import make_blobs
from sklearn.preprocessing import StandardScaler
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=1000000, centers=centers, cluster_std=0.4,
                            random_state=0)
X = StandardScaler().fit_transform(X)
db = DBSCAN(eps=0.3, min_samples=10, n_jobs=-1).fit(X)
Expected Results
The answer is correct, but the work should be split across processors and the time consumed should be significantly lower.
Actual Results
It seems to run on only one processor.
Versions
import platform; print(platform.platform())
Linux-3.13.0-101-generic-x86_64-with-Ubuntu-14.04-trusty
import sys; print("Python", sys.version)
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4]
import numpy; print("NumPy", numpy.__version__)
NumPy 1.11.2
import scipy; print("SciPy", scipy.__version__)
SciPy 0.18.1
import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.18.1