Conversation

@tomMoral
Contributor

Related to #13213

Add n_jobs to t-SNE and parallelize the computation of the nearest neighbors (with threads) and the gradient (with prange).
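
For illustration, here is a minimal sketch of the prange pattern (a toy per-row gradient stands in for the real t-SNE kernel; toy_gradient and its signature are made up for this example, and the extension must be compiled with OpenMP support):

# cython: language_level=3
# Toy sketch of the prange pattern; not the PR's actual kernel.
from cython.parallel import prange

def toy_gradient(double[:, :] pos, double[:, :] grad, int num_threads):
    cdef Py_ssize_t i, j
    cdef Py_ssize_t n = pos.shape[0]
    cdef Py_ssize_t d = pos.shape[1]
    # Rows are independent, so the outer loop can be distributed
    # over OpenMP threads.
    for i in prange(n, nogil=True, num_threads=num_threads):
        for j in range(d):
            grad[i, j] = 2.0 * pos[i, j]  # gradient of ||pos[i]||^2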

@tomMoral tomMoral changed the title ENH add prange for gradient computation in t-SNE [WIP] ENH add prange for gradient computation in t-SNE Feb 26, 2019
@tomMoral
Contributor Author

Here is a small benchmark of the performance gain with the current implementation. The benchmark was run using the suite from jeremiedbb/scikit-learn_benchmarks#6. The impact on peak memory is small, and we see a roughly 30% runtime gain when using 2 cores.

Runtime
============ ========= ========= =========
--                 n_jobs           
------------ -----------------------------
   method        1         2         4    
============ ========= ========= =========
   exact      7.58±0s   7.47±0s   7.49±0s 
 barnes_hut   8.44±0s   5.70±0s   4.35±0s 
============ ========= ========= =========

Peak memory
============ ======= ======= =======
--                    n_jobs        
------------ -----------------------
    method       1       2       4   
============ ======= ======= =======
   exact      68.8M   68.8M   68.8M 
 barnes_hut   72.7M   74.1M   77.8M 
============ ======= ======= =======

Member

@jeremiedbb jeremiedbb left a comment

looks good.

Needs #13297 to replace effective_n_jobs with effective_n_jobs_openmp (but don't merge it into this PR, it would make the review much harder).

@jnothman
Member

jnothman commented May 1, 2019

This is still current, merely awaiting review, yeah?

The what's new entry needs to be moved to 0.22.rst.

@tomMoral
Contributor Author

This PR is still valid, but I think it stalled because there was no clear consensus on how to control the number of OpenMP threads.

I will move the what's new entry.

@rth
Member

rth commented Jun 20, 2019

This PR is still valid, but I think it stalled because there was no clear consensus on how to control the number of OpenMP threads.

What's wrong with the approach of passing n_jobs to prange as done in this PR?

Maybe remove the "WIP" tag?

@jeremiedbb
Member

jeremiedbb commented Jun 20, 2019

What's wrong with the approach of passing n_jobs to prange as done in this PR?

prange(num_threads=n_jobs) does not work if n_jobs is None

@rth
Member

rth commented Jun 20, 2019

prange(num_threads=n_jobs) does not work if n_jobs is None

Doesn't work in what way?

From https://cython.readthedocs.io/en/latest/src/userguide/parallelism.html#cython.parallel.prange

cython.parallel.prange([start,] stop[, step][, nogil=False][, schedule=None[, chunksize=None]][, num_threads=None])
[..]
num_threads – The num_threads argument indicates how many threads the team should consist of. If not given, OpenMP will decide how many threads to use. Typically this is the number of cores available on the machine. However, this may be controlled through the omp_set_num_threads() function, or through the OMP_NUM_THREADS environment variable.

Though I imagine the detection of the actual number of cores is not as good as in joblib.

@rth
Member

rth commented Jun 20, 2019

Aww, you mean that None doesn't map to 1 but rather to -1?

@jeremiedbb
Member

I mean prange(num_threads=None) does compile, but prange(num_threads=n_jobs) where n_jobs is None doesn't.

@rth
Member

rth commented Jun 20, 2019

I mean prange(num_threads=None) does compile, but prange(num_threads=n_jobs) where n_jobs is None doesn't.

Possibly related to cython/cython#1957, do we want to report it upstream?

In any case, don't we want to manually map n_jobs=None to num_threads=1, since num_threads=None actually means n_jobs=-1?
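
For concreteness, a sketch of that mapping (resolve_num_threads is a hypothetical helper, not an existing scikit-learn function; the negative-value handling mimics joblib's convention):

import os

def resolve_num_threads(n_jobs):
    # Hypothetical helper: turn n_jobs semantics into a concrete int
    # that can be passed to prange's num_threads.
    if n_jobs is None:
        return 1  # sequential by default, unlike OpenMP's "use all cores"
    if n_jobs < 0:
        # joblib-style mapping: -1 -> all cores, -2 -> all but one, ...
        return max(1, (os.cpu_count() or 1) + 1 + n_jobs)
    return n_jobs

prange(n, nogil=True, num_threads=resolve_num_threads(n_jobs)) would then always receive a plain int.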

@tomMoral
Contributor Author

I think we were also discussing the default value, and how to handle environment variables such as OMP_NUM_THREADS and n_jobs=-1 (which is not valid).

@rth
Member

rth commented Jun 20, 2019

Maybe let's open an issue about this, as it would also affect all other PRs using OpenMP?

@NicolasHug
Member

+1 for opening an issue

@tomMoral
Contributor Author

tomMoral commented Sep 2, 2019

This PR is stalled, as it needs #14196 to be merged before we finalize it. I can make the required updates once the _openmp_effective_n_threads helper is merged into master.

@jnothman
Member

The dependency has been merged.

@jeremiedbb
Member

jeremiedbb commented Oct 21, 2019

@tomMoral It's been decided to use as many threads as possible by default and not to expose an n_jobs parameter for that. #14196 is meant to be used as follows:

cdef int num_threads = _openmp_effective_n_threads()
for i in prange(n, nogil=True, num_threads=num_threads):
    ...

If the prange is an inner loop of nested loops, it's more efficient to call _openmp_effective_n_threads outside the outer loop, because it involves Python interaction.
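
A sketch of that hoisting (toy loop body; assumes the helper is importable from sklearn.utils._openmp_helpers, where #14196 put it):

# cython: language_level=3
from cython.parallel import prange
from sklearn.utils._openmp_helpers import _openmp_effective_n_threads

def accumulate(double[:] out, Py_ssize_t n_outer, Py_ssize_t n_inner):
    cdef Py_ssize_t i, j
    # Resolved once before the loops: the helper touches Python objects,
    # so calling it inside the outer loop would pay that cost repeatedly.
    cdef int num_threads = _openmp_effective_n_threads()
    for i in range(n_outer):
        for j in prange(n_inner, nogil=True, num_threads=num_threads):
            out[j] += <double> (i + j)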

@tomMoral tomMoral changed the title [WIP] ENH add prange for gradient computation in t-SNE [MRG] ENH add prange for gradient computation in t-SNE Oct 21, 2019
@tomMoral
Contributor Author

I rebased the PR on master and now use the helper to set the number of threads.
I think it is now ready for merge.

@rth rth added Waiting for Reviewer and removed Needs Decision Requires decision labels Oct 21, 2019
Member

@jeremiedbb jeremiedbb left a comment

A first pass. Here are some comments.

Member

@NicolasHug NicolasHug left a comment

Some minor comments and questions from me.

Any chance you have access to a machine with lots of cores, so we can check whether this suffers from the same performance drop as the GBDTs when the number of CPUs gets high?
Maybe @ogrisel can try on the same machine?

      long stop,
-     bint compute_error) nogil:
+     bint compute_error,
+     int num_threads) nogil:
Member

Nit: our convention is to use n_threads.

Contributor Author

The argument in the prange is called num_threads, and this is a private function. I would rather keep it that way, as it seems clearer to me (it is linked to *_NUM_THREADS and all the OpenMP naming), but if you strongly disagree, I will make the change.

Member

@jeremiedbb jeremiedbb left a comment

LGTM!

It still needs a benchmark on a big machine (I'll do it when I have time, if you haven't done it before then).

@tomMoral
Contributor Author

Here is a benchmark on a machine with 44 physical cores (88 logical cores). We can see that up to 44 threads there is "some" improvement, but there is a slowdown after that. Note that by default, TSNE would use 88 threads, as reported by cpu_count.

$ for i in {1,4,16,32,64,88}; do
    OMP_NUM_THREADS=$i /usr/bin/nice -5 python3 benchmarks/bench_tsne_mnist.py
done

Used number of threads: 1
PCA preprocessing down to 50 dimensions took 3.459s
Fitting sklearn TSNE on 100 samples...
Fitting sklearn TSNE on 100 samples took 0.411s in 849 iterations, nn accuracy: 0.690
Fitting sklearn TSNE on 500 samples...
Fitting sklearn TSNE on 500 samples took 2.594s in 999 iterations, nn accuracy: 0.654
Fitting sklearn TSNE on 1000 samples...
Fitting sklearn TSNE on 1000 samples took 7.097s in 999 iterations, nn accuracy: 0.642
Fitting sklearn TSNE on 5000 samples...
Fitting sklearn TSNE on 5000 samples took 37.475s in 999 iterations, nn accuracy: 0.560
Fitting sklearn TSNE on 10000 samples...
Fitting sklearn TSNE on 10000 samples took 88.579s in 999 iterations, nn accuracy: 0.506

Used number of threads: 4
PCA preprocessing down to 50 dimensions took 2.104s
Fitting sklearn TSNE on 100 samples...
Fitting sklearn TSNE on 100 samples took 0.286s in 999 iterations, nn accuracy: 0.790
Fitting sklearn TSNE on 500 samples...
Fitting sklearn TSNE on 500 samples took 1.192s in 999 iterations, nn accuracy: 0.660
Fitting sklearn TSNE on 1000 samples...
Fitting sklearn TSNE on 1000 samples took 2.481s in 999 iterations, nn accuracy: 0.634
Fitting sklearn TSNE on 5000 samples...
Fitting sklearn TSNE on 5000 samples took 14.933s in 999 iterations, nn accuracy: 0.557
Fitting sklearn TSNE on 10000 samples...
Fitting sklearn TSNE on 10000 samples took 37.520s in 999 iterations, nn accuracy: 0.515

Used number of threads: 16
PCA preprocessing down to 50 dimensions took 1.730s
Fitting sklearn TSNE on 100 samples...
Fitting sklearn TSNE on 100 samples took 0.317s in 949 iterations, nn accuracy: 0.680
Fitting sklearn TSNE on 500 samples...
Fitting sklearn TSNE on 500 samples took 1.040s in 999 iterations, nn accuracy: 0.686
Fitting sklearn TSNE on 1000 samples...
Fitting sklearn TSNE on 1000 samples took 1.909s in 999 iterations, nn accuracy: 0.633
Fitting sklearn TSNE on 5000 samples...
Fitting sklearn TSNE on 5000 samples took 10.437s in 999 iterations, nn accuracy: 0.561
Fitting sklearn TSNE on 10000 samples...
Fitting sklearn TSNE on 10000 samples took 25.551s in 999 iterations, nn accuracy: 0.510

Used number of threads: 32
PCA preprocessing down to 50 dimensions took 1.647s
Fitting sklearn TSNE on 100 samples...
Fitting sklearn TSNE on 100 samples took 0.345s in 999 iterations, nn accuracy: 0.730
Fitting sklearn TSNE on 500 samples...
Fitting sklearn TSNE on 500 samples took 0.977s in 999 iterations, nn accuracy: 0.656
Fitting sklearn TSNE on 1000 samples...
Fitting sklearn TSNE on 1000 samples took 1.825s in 999 iterations, nn accuracy: 0.625
Fitting sklearn TSNE on 5000 samples...
Fitting sklearn TSNE on 5000 samples took 10.254s in 999 iterations, nn accuracy: 0.563
Fitting sklearn TSNE on 10000 samples...
Fitting sklearn TSNE on 10000 samples took 22.869s in 999 iterations, nn accuracy: 0.510

Used number of threads: 64
PCA preprocessing down to 50 dimensions took 1.802s
Fitting sklearn TSNE on 100 samples...
Fitting sklearn TSNE on 100 samples took 0.408s in 999 iterations, nn accuracy: 0.710
Fitting sklearn TSNE on 500 samples...
Fitting sklearn TSNE on 500 samples took 1.323s in 999 iterations, nn accuracy: 0.662
Fitting sklearn TSNE on 1000 samples...
Fitting sklearn TSNE on 1000 samples took 1.934s in 999 iterations, nn accuracy: 0.624
Fitting sklearn TSNE on 5000 samples...
Fitting sklearn TSNE on 5000 samples took 10.913s in 999 iterations, nn accuracy: 0.560
Fitting sklearn TSNE on 10000 samples...
Fitting sklearn TSNE on 10000 samples took 27.168s in 999 iterations, nn accuracy: 0.504

Used number of threads: 88
PCA preprocessing down to 50 dimensions took 1.761s
Fitting sklearn TSNE on 100 samples...
Fitting sklearn TSNE on 100 samples took 0.819s in 999 iterations, nn accuracy: 0.720
Fitting sklearn TSNE on 500 samples...
Fitting sklearn TSNE on 500 samples took 1.953s in 999 iterations, nn accuracy: 0.676
Fitting sklearn TSNE on 1000 samples...
Fitting sklearn TSNE on 1000 samples took 2.960s in 999 iterations, nn accuracy: 0.628
Fitting sklearn TSNE on 5000 samples...
Fitting sklearn TSNE on 5000 samples took 13.847s in 999 iterations, nn accuracy: 0.558
Fitting sklearn TSNE on 10000 samples...
Fitting sklearn TSNE on 10000 samples took 29.670s in 999 iterations, nn accuracy: 0.508

@tomMoral
Contributor Author

From master on the same machine:

/usr/bin/nice -5 python3 benchmarks/bench_tsne_mnist.py
PCA preprocessing down to 50 dimensions took 1.852s
Fitting sklearn TSNE on 100 samples...
Fitting sklearn TSNE on 100 samples took 0.355s in 999 iterations, nn accuracy: 0.690
Fitting sklearn TSNE on 500 samples...
Fitting sklearn TSNE on 500 samples took 2.236s in 999 iterations, nn accuracy: 0.658
Fitting sklearn TSNE on 1000 samples...
Fitting sklearn TSNE on 1000 samples took 5.367s in 999 iterations, nn accuracy: 0.638
Fitting sklearn TSNE on 5000 samples...
Fitting sklearn TSNE on 5000 samples took 33.667s in 999 iterations, nn accuracy: 0.560
Fitting sklearn TSNE on 10000 samples...
Fitting sklearn TSNE on 10000 samples took 83.715s in 999 iterations, nn accuracy: 0.514

@thomasjpfan thomasjpfan added this to the 0.22 milestone Oct 26, 2019
Member

@NicolasHug NicolasHug left a comment

Nits but LGTM!

@tomMoral
Contributor Author

I fixed the last comments; let me know if more cleanup is needed.

@NicolasHug NicolasHug changed the title [MRG] ENH add prange for gradient computation in t-SNE ENH Parallelize gradient computation in t-SNE Oct 30, 2019
@NicolasHug NicolasHug merged commit 908ded8 into scikit-learn:master Oct 30, 2019
@NicolasHug
Member

Thanks @tomMoral!

@tomMoral tomMoral deleted the ENH_prange_tsne branch October 30, 2019 14:34
@dkobak dkobak mentioned this pull request Mar 9, 2021