Hi,
I am using DBSCAN to cluster some points of mine and I am running into what appear to be memory-related issues. If I try to cluster roughly 50k two-dimensional points with the haversine metric, I run out of memory on a machine with 16 GB of RAM, and I am not sure this is supposed to happen. I mean, 50k isn't that many points, or am I wrong?
The last two times I tried, I also got this line repeated over and over again:
File "sklearn/cluster/_dbscan_inner.pyx", line 14, in sklearn.cluster._dbscan_inner.push (sklearn/cluster/_dbscan_inner.cpp:1243)
Followed by
MemoryError: std::bad_alloc
I call DBSCAN like this:
from sklearn.cluster import DBSCAN

eps = 50 / (1000 * 6378.137)  # 50 m expressed in radians (Earth radius 6378.137 km), since haversine works on the unit sphere
db = DBSCAN(eps=eps, min_samples=3, algorithm='ball_tree', metric='haversine').fit(x)
I am on Python 2.7 and scikit-learn 0.16.
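In case it helps, here is a rough sketch of a workaround I am considering: building the radius neighborhoods once as a sparse distance graph with NearestNeighbors and then running DBSCAN with metric='precomputed'. This is untested on my real data; the synthetic points, the radius_neighbors_graph call, and the assumption that DBSCAN accepts a sparse precomputed matrix (I believe newer versions do, I am not sure about 0.16) are my guesses, not something I have verified avoids the bad_alloc.

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN

eps = 50 / (1000 * 6378.137)  # 50 m in radians, as above

# Synthetic (lat, lon) pairs in radians as a stand-in for my actual data.
lat = np.radians(np.random.uniform(-90.0, 90.0, 50000))
lon = np.radians(np.random.uniform(-180.0, 180.0, 50000))
x = np.column_stack([lat, lon])

# Precompute only the neighborhoods within eps as a sparse CSR matrix of distances.
nn = NearestNeighbors(radius=eps, algorithm='ball_tree', metric='haversine').fit(x)
dist = nn.radius_neighbors_graph(x, mode='distance')

# Run DBSCAN on the precomputed sparse distances instead of the raw points.
db = DBSCAN(eps=eps, min_samples=3, metric='precomputed').fit(dist)
labels = db.labels_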