Intel(R) Extension for Scikit-learn*

Intel(R) Extension for Scikit-learn is a seamless way to speed up your scikit-learn applications. The acceleration comes from the Intel(R) oneAPI Data Analytics Library (oneDAL). Patching scikit-learn this way makes it well suited to real-life machine learning problems.

⚠️ Intel(R) Extension for Scikit-learn contains the scikit-learn patching functionality that was originally available in the daal4py package. All future updates to the patches will be available only in Intel(R) Extension for Scikit-learn. We recommend using the scikit-learn-intelex package instead of daal4py. You can learn more about daal4py in the daal4py documentation.
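For code that still calls the daal4py patching entry point, migration is usually a one-line import change from `daal4py.sklearn` to `sklearnex`. The sketch below is a hypothetical transition helper (`load_patcher` is not part of either package) that prefers sklearnex and only falls back to the legacy `daal4py.sklearn.patch_sklearn` if sklearnex is missing; it assumes both modules expose a `patch_sklearn` callable.

```python
import importlib
from typing import Callable, Optional


def load_patcher() -> Optional[Callable[..., object]]:
    """Return a patch_sklearn callable, preferring scikit-learn-intelex.

    Hypothetical transition helper: tries the sklearnex package first and
    falls back to the legacy daal4py.sklearn module. Returns None when
    neither package is importable, so stock scikit-learn stays in use.
    """
    for module_name in ("sklearnex", "daal4py.sklearn"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # not installed: try the next candidate
        return getattr(module, "patch_sklearn", None)
    return None


patcher = load_patcher()
if patcher is not None:
    patcher()  # patch before importing any scikit-learn estimators
```

Once every environment has scikit-learn-intelex installed, the helper can be replaced by a plain `from sklearnex import patch_sklearn`.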


👀 Follow us on Medium

We publish blogs on Medium, so follow us to learn tips and tricks for more efficient data analysis with the help of Intel(R) Extension for Scikit-learn. Here are our latest blogs:

🔗 Important links

💬 Support

Report issues, ask questions, and provide suggestions using:

You may reach out to project maintainers privately at [email protected]

🛠 Installation

Intel(R) Extension for Scikit-learn is available on the Python Package Index (PyPI) and on Anaconda Cloud in the Conda-Forge and Intel channels.

# PyPI (recommended by default)
pip install scikit-learn-intelex

# Anaconda Cloud from Conda-Forge channel (recommended for conda users by default)
conda install scikit-learn-intelex -c conda-forge

# Anaconda Cloud from Intel channel (recommended for Intel® Distribution for Python users)
conda install scikit-learn-intelex -c intel
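After installing, you can check from Python which version of the distribution is present. This is a minimal sketch using the standard-library `importlib.metadata` module (Python 3.8 or newer); it assumes the distribution name `scikit-learn-intelex` used in the install commands, and `installed_version` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


for dist in ("scikit-learn-intelex", "scikit-learn"):
    print(dist, installed_version(dist) or "not installed")
```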

ℹ️ Supported configurations

📦 PyPI channel

OS / Python version	Python 3.6	Python 3.7	Python 3.8	Python 3.9
Linux	[CPU, GPU]	[CPU, GPU]	[CPU, GPU]	❌
Windows	[CPU, GPU]	[CPU, GPU]	[CPU, GPU]	❌
macOS	[CPU]	[CPU]	[CPU]	❌

📦 Anaconda Cloud: Conda-Forge channel

OS / Python version	Python 3.6	Python 3.7	Python 3.8	Python 3.9
Linux	[CPU]	[CPU]	[CPU]	[CPU]
Windows	[CPU]	[CPU]	[CPU]	[CPU]
macOS	[CPU]	[CPU]	[CPU]	[CPU]

📦 Anaconda Cloud: Intel channel

OS / Python version	Python 3.6	Python 3.7	Python 3.8	Python 3.9
Linux	[CPU, GPU]	[CPU, GPU]	[CPU, GPU]	❌
Windows	[CPU, GPU]	[CPU, GPU]	[CPU, GPU]	❌
macOS	[CPU]	[CPU]	[CPU]	❌
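To see which devices the PyPI table lists for the current machine, the table can be transcribed into a lookup. This is a sketch under stated assumptions: `PYPI_SUPPORT` and `pypi_devices` are hypothetical names, the data is copied by hand from the PyPI table for this release, and any combination the table does not list (for example Python 3.9 or newer) is treated as having no wheel.

```python
import platform
import sys
from typing import Dict, Optional, Tuple

# Hand transcription of the PyPI table: (OS, Python minor) -> devices.
PYPI_SUPPORT: Dict[Tuple[str, Tuple[int, int]], Tuple[str, ...]] = {}
for py in [(3, 6), (3, 7), (3, 8)]:
    PYPI_SUPPORT[("Linux", py)] = ("CPU", "GPU")
    PYPI_SUPPORT[("Windows", py)] = ("CPU", "GPU")
    PYPI_SUPPORT[("macOS", py)] = ("CPU",)

# platform.system() reports macOS as "Darwin".
_OS_NAMES = {"Linux": "Linux", "Windows": "Windows", "Darwin": "macOS"}


def pypi_devices(system: Optional[str] = None,
                 python: Optional[Tuple[int, int]] = None) -> Tuple[str, ...]:
    """Return the devices the table lists, or () if the combo is unlisted."""
    system = system or platform.system()
    python = python or tuple(sys.version_info[:2])
    os_name = _OS_NAMES.get(system)
    return PYPI_SUPPORT.get((os_name, tuple(python)), ())


print(pypi_devices())
```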

You can also build the package from source.

⚡️ Get Started

Intel CPU optimizations patching

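import numpy as np
from sklearnex import patch_sklearn

# Patch scikit-learn before importing any estimators from it
patch_sklearn()

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
# These DBSCAN parameters are within the supported set, so oneDAL is used
clustering = DBSCAN(eps=3, min_samples=2).fit(X)
print(clustering.labels_)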
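In the example, `patch_sklearn()` runs before `from sklearn.cluster import DBSCAN`. The toy sketch below illustrates why that order can matter with module-attribute patching in general: `from module import name` binds whatever object the module holds at that moment, so a later patch does not affect names already imported. It uses a stand-in module (`fake_cluster`, with hypothetical `StockDBSCAN` and `AcceleratedDBSCAN` classes); it does not reproduce sklearnex internals, and assuming the extension patches in exactly this way is a simplification.

```python
import sys
import types

# A stand-in module holding a "stock" estimator class.
fake_cluster = types.ModuleType("fake_cluster")


class StockDBSCAN:
    backend = "stock"


class AcceleratedDBSCAN:
    backend = "accelerated"


fake_cluster.DBSCAN = StockDBSCAN
sys.modules["fake_cluster"] = fake_cluster  # make it importable by name

# Imported BEFORE patching: the name is bound to the stock class.
from fake_cluster import DBSCAN as EarlyDBSCAN

# The "patch": replace the attribute on the module object.
fake_cluster.DBSCAN = AcceleratedDBSCAN

# Imported AFTER patching: the name is bound to the replacement.
from fake_cluster import DBSCAN as LateDBSCAN

print(EarlyDBSCAN.backend)  # stock
print(LateDBSCAN.backend)   # accelerated
```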

Intel GPU optimizations patching

import numpy as np
from sklearnex import patch_sklearn
from daal4py.oneapi import sycl_context

# Patch scikit-learn before importing any estimators from it
patch_sklearn()

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
# Computations inside this context are offloaded to the GPU device
with sycl_context("gpu"):
    clustering = DBSCAN(eps=3, min_samples=2).fit(X)

🚀 Scikit-learn patching

Speedups of Intel(R) Extension for Scikit-learn over the original Scikit-learn

Technical details: float type: float64; HW: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz, 2 sockets, 28 cores per socket; SW: scikit-learn 0.23.1, Intel® oneDAL (2021.1 Beta 10), benchmark code
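A speedup here is the ratio of stock run time to patched run time. To estimate it on your own data, a minimal timing sketch with the standard-library `timeit` module follows; `measure_speedup` is a hypothetical helper, and the two callables stand in for fitting the same estimator without and with patching. Because patching changes scikit-learn for the whole process, one straightforward approach is to time each configuration in a separate Python process.

```python
import timeit
from typing import Callable


def measure_speedup(baseline: Callable[[], object],
                    candidate: Callable[[], object],
                    repeat: int = 5) -> float:
    """Return best baseline time divided by best candidate time.

    A value above 1.0 means the candidate ran faster. Taking the minimum
    over several repeats reduces noise from other processes.
    """
    base = min(timeit.repeat(baseline, number=1, repeat=repeat))
    cand = min(timeit.repeat(candidate, number=1, repeat=repeat))
    return base / cand


if __name__ == "__main__":
    # Placeholder workloads; replace with stock and patched estimator fits.
    slow = lambda: sum(i * i for i in range(200_000))
    fast = lambda: sum(i * i for i in range(20_000))
    print(f"speedup: {measure_speedup(slow, fast):.1f}x")
```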

Intel(R) Extension for Scikit-learn patching affects the performance of the specific scikit-learn functionality listed below. When unsupported parameters are used, the package falls back to the original scikit-learn implementation. These limitations are described below. If the patching does not cover your scenarios, submit an issue on GitHub.
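The fallback behavior amounts to a per-call dispatch: take the accelerated path when the parameters are in the supported set, otherwise call the original implementation. This toy sketch shows that pattern only; `dispatch`, `fast_pca`, `stock_pca`, and `pca_supported` are hypothetical names and do not mirror the extension's internal code. The predicate follows the PCA row of the table, where only `svd_solver`='full' is accelerated.

```python
from typing import Any, Callable, Dict


def dispatch(params: Dict[str, Any],
             is_supported: Callable[[Dict[str, Any]], bool],
             accelerated: Callable[..., Any],
             original: Callable[..., Any]) -> Any:
    """Run `accelerated` if `is_supported(params)`, else fall back."""
    impl = accelerated if is_supported(params) else original
    return impl(**params)


# Hypothetical stand-ins for an accelerated and a stock PCA fit.
def fast_pca(svd_solver: str) -> str:
    return f"oneDAL path ({svd_solver})"


def stock_pca(svd_solver: str) -> str:
    return f"stock scikit-learn path ({svd_solver})"


def pca_supported(params: Dict[str, Any]) -> bool:
    # PCA row: every parameter except svd_solver != 'full' is supported.
    return params.get("svd_solver") == "full"


print(dispatch({"svd_solver": "full"}, pca_supported, fast_pca, stock_pca))
print(dispatch({"svd_solver": "randomized"}, pca_supported, fast_pca, stock_pca))
```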

🔥 Applying the patch affects the following scikit-learn algorithms:

Task	Functionality	Parameters support	Data support
Classification	SVC	All parameters except `kernel` = 'poly' or 'sigmoid'.	No limitations.
	RandomForestClassifier	All parameters except `warm_start` = True, `ccp_alpha` != 0, and `criterion` != 'gini'.	Multi-output and sparse data are not supported.
	KNeighborsClassifier	All parameters except `metric` other than 'euclidean' or 'minkowski' with `p` = 2.	Multi-output and sparse data are not supported.
	LogisticRegression / LogisticRegressionCV	All parameters except `solver` other than 'lbfgs' or 'newton-cg', `class_weight` != None, and `sample_weight` != None.	Only dense data is supported.
Regression	RandomForestRegressor	All parameters except `warm_start` = True, `ccp_alpha` != 0, and `criterion` != 'mse'.	Multi-output and sparse data are not supported.
	KNeighborsRegressor	All parameters except `metric` other than 'euclidean' or 'minkowski' with `p` = 2.	Sparse data is not supported.
	LinearRegression	All parameters except `normalize` != False and `sample_weight` != None.	Only dense data is supported; `#observations` should be >= `#features`.
	Ridge	All parameters except `normalize` != False, `solver` != 'auto', and `sample_weight` != None.	Only dense data is supported; `#observations` should be >= `#features`.
	ElasticNet	All parameters except `sample_weight` != None.	Multi-output and sparse data are not supported; `#observations` should be >= `#features`.
	Lasso	All parameters except `sample_weight` != None.	Multi-output and sparse data are not supported; `#observations` should be >= `#features`.
Clustering	KMeans	All parameters except `precompute_distances` and `sample_weight` != None.	No limitations.
	DBSCAN	All parameters except `metric` other than 'euclidean' or 'minkowski' with `p` = 2, and `algorithm` other than 'brute' or 'auto'.	Only dense data is supported.
Dimensionality reduction	PCA	All parameters except `svd_solver` != 'full'.	No limitations.
	TSNE	All parameters except `metric` other than 'euclidean' or 'minkowski' with `p` = 2.	Sparse data is not supported.
Unsupervised	NearestNeighbors	All parameters except `metric` other than 'euclidean' or 'minkowski' with `p` = 2.	Sparse data is not supported.
Other	train_test_split	All parameters are supported.	Only dense data is supported.
	assert_all_finite	All parameters are supported.	Only dense data is supported.
	pairwise_distances	Only with `metric`='cosine' or 'correlation'.	Only dense data is supported.
	roc_auc_score	Parameters `average`, `sample_weight`, `max_fpr` and `multi_class` are not supported.	No limitations.
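Before running a workload, you can check whether a configuration falls inside the documented limits. The sketch below encodes two rows of the table (KNeighborsClassifier and LinearRegression) as plain predicates; the function names are hypothetical, the extension does not expose them, and its actual dispatch logic may check more conditions than the table states.

```python
from typing import Optional


def _euclidean(metric: str, p: Optional[float]) -> bool:
    # The table's metric condition: 'euclidean', or 'minkowski' with p == 2.
    return metric == "euclidean" or (metric == "minkowski" and p == 2)


def knn_classifier_accelerated(metric: str = "minkowski", p: float = 2,
                               sparse: bool = False,
                               multi_output: bool = False) -> bool:
    """KNeighborsClassifier row: metric condition, dense single-output data."""
    return _euclidean(metric, p) and not sparse and not multi_output


def linear_regression_accelerated(n_samples: int, n_features: int,
                                  normalize: bool = False,
                                  has_sample_weight: bool = False,
                                  sparse: bool = False) -> bool:
    """LinearRegression row: no normalize/sample_weight, dense, tall data."""
    return (not normalize and not has_sample_weight and not sparse
            and n_samples >= n_features)


print(knn_classifier_accelerated())                    # True: scikit-learn defaults are minkowski, p=2
print(knn_classifier_accelerated(metric="manhattan"))  # False: falls back
print(linear_regression_accelerated(100, 500))         # False: fewer rows than features
```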

⚠️ We support optimizations for the last four versions of scikit-learn. The latest release of Intel(R) Extension for Scikit-learn, 2021.2.X, supports scikit-learn 0.21.X, 0.22.X, 0.23.X, and 0.24.X.
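A quick way to confirm that an environment matches this support window is to compare the installed scikit-learn major and minor version against the listed ones. This is a minimal sketch; `SUPPORTED_SKLEARN`, `major_minor`, and `sklearn_supported` are hypothetical names, and the version set reflects only the 2021.2.X release described here.

```python
import re
from typing import Tuple

# scikit-learn minor versions listed for the 2021.2.X release line.
SUPPORTED_SKLEARN = {(0, 21), (0, 22), (0, 23), (0, 24)}


def major_minor(version: str) -> Tuple[int, int]:
    """Parse the leading 'X.Y' of a version string, e.g. '0.24rc1' -> (0, 24)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def sklearn_supported(version: str) -> bool:
    return major_minor(version) in SUPPORTED_SKLEARN


if __name__ == "__main__":
    from importlib import metadata

    try:
        installed = metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        print("scikit-learn is not installed")
    else:
        status = "supported" if sklearn_supported(installed) else "not in the supported list"
        print(installed, status)
```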