DBSCAN memory consumption

Hi,

I am using DBSCAN to cluster some points and am running into what look like memory-related issues. When I try to cluster roughly 50k two-dimensional points with the haversine metric, I run out of memory on a machine with 16 GB of RAM, and I am not sure this is supposed to happen. 50k isn't that many points, or am I wrong?

The last two times I tried, I also got this line repeated over and over again:

```
File "sklearn/cluster/_dbscan_inner.pyx", line 14, in sklearn.cluster._dbscan_inner.push (sklearn/cluster/_dbscan_inner.cpp:1243) 
```

Followed by:

```
MemoryError: std::bad_alloc 
```

I call DBSCAN like this:

```
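from sklearn.cluster import DBSCAN
# eps: 50 m converted to radians (Earth radius 6378.137 km); haversine expects x as [lat, lon] in radians
eps = 50 / (1000 * 6378.137)
db = DBSCAN(eps=eps, min_samples=3, algorithm='ball_tree', metric='haversine').fit(x)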
```

I am on Python 2.7 and scikit-learn 0.16.
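
One way to check whether the neighborhoods themselves could be what fills the memory: as far as I understand, scikit-learn's DBSCAN computes the eps-neighborhoods of all points up front, so memory grows with the total number of neighbor pairs rather than with the number of points alone. The sketch below only counts neighbors per point with `BallTree.query_radius(..., count_only=True)` and turns the total into a rough size estimate. The helper name `estimate_neighborhood_bytes`, the 8 bytes per stored index, and the synthetic coordinates are assumptions for illustration, not part of my actual setup.

```python
import numpy as np
from sklearn.neighbors import BallTree


def estimate_neighborhood_bytes(x, eps, bytes_per_index=8):
    """Count eps-neighbors per point and roughly estimate the size of all neighbor lists.

    bytes_per_index=8 is an assumption (one 64-bit integer per stored neighbor).
    """
    tree = BallTree(x, metric='haversine')
    # count_only=True returns just the number of neighbors per point,
    # so the neighbor lists themselves are never materialized here.
    counts = tree.query_radius(x, r=eps, count_only=True)
    total_pairs = int(counts.sum())
    return counts, total_pairs * bytes_per_index


# Synthetic stand-in for the real data: [lat, lon] in radians (hypothetical area).
rng = np.random.RandomState(0)
x_demo = np.radians(rng.uniform([48.0, 11.0], [48.1, 11.1], size=(2000, 2)))

eps = 50 / (1000 * 6378.137)  # 50 m in radians, same as in the call above
counts, approx_bytes = estimate_neighborhood_bytes(x_demo, eps)
print("max neighbors: %d, approx MB: %.3f" % (counts.max(), approx_bytes / 1e6))
```

If the maximum count is in the thousands for many points (for example because of many duplicate or near-duplicate coordinates), the total can approach n², which for 50k points would be on the order of gigabytes.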

DBSCAN memory consumption #5275