Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application. The acceleration is achieved through the use of the Intel(R) oneAPI Data Analytics Library (oneDAL). Patching scikit-learn makes it a well-suited machine learning framework for dealing with real-life problems.
Running the latest scikit-learn test suite with Intel(R) Extension for Scikit-learn:
We publish blogs on Medium, so follow us to learn tips and tricks for more efficient data analysis with the help of Intel(R) Extension for Scikit-learn. Here are our latest blogs:
- Intel Gives Scikit-Learn the Performance Boost Data Scientists Need
- From Hours to Minutes: 600x Faster SVM
- Improve the Performance of XGBoost and LightGBM Inference
- Accelerate Kaggle Challenges Using Intel AI Analytics Toolkit
- Accelerate Your scikit-learn Applications
- Accelerate Linear Models for Machine Learning
- Accelerate K-Means Clustering
- Documentation
- scikit-learn API and patching
- Benchmark code
- Building from Sources
- About Intel(R) oneAPI Data Analytics Library
- About Intel(R) daal4py
Report issues, ask questions, and provide suggestions using:
You may reach out to project maintainers privately at [email protected]
Intel(R) Extension for Scikit-learn is available at the Python Package Index, on Anaconda Cloud in Conda-Forge channel and in Intel channel.

    # PyPI (recommended by default)
    pip install scikit-learn-intelex

    # Anaconda Cloud from Conda-Forge channel (recommended for conda users by default)
    conda install scikit-learn-intelex -c conda-forge

    # Anaconda Cloud from Intel channel (recommended for Intel® Distribution for Python users)
    conda install scikit-learn-intelex -c intel

ℹ️ Supported configurations

PyPI channel:

| OS / Python version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 |
|---|---|---|---|---|
| Linux | [CPU, GPU] | [CPU, GPU] | [CPU, GPU] | ❌ |
| Windows | [CPU, GPU] | [CPU, GPU] | [CPU, GPU] | ❌ |
| macOS | [CPU] | [CPU] | [CPU] | ❌ |

Anaconda Cloud, Conda-Forge channel:

| OS / Python version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 |
|---|---|---|---|---|
| Linux | [CPU] | [CPU] | [CPU] | [CPU] |
| Windows | [CPU] | [CPU] | [CPU] | [CPU] |
| macOS | [CPU] | [CPU] | [CPU] | [CPU] |

Anaconda Cloud, Intel channel:

| OS / Python version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 |
|---|---|---|---|---|
| Linux | [CPU, GPU] | [CPU, GPU] | [CPU, GPU] | ❌ |
| Windows | [CPU, GPU] | [CPU, GPU] | [CPU, GPU] | ❌ |
| macOS | [CPU] | [CPU] | [CPU] | ❌ |

You can build the package from sources as well.
Intel CPU optimizations patching

    import numpy as np
    from sklearnex import patch_sklearn
    patch_sklearn()  # apply the patch before importing scikit-learn estimators

    from sklearn.cluster import DBSCAN

    X = np.array([[1., 2.], [2., 2.], [2., 3.],
                  [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
    clustering = DBSCAN(eps=3, min_samples=2).fit(X)

Intel GPU optimizations patching

    import numpy as np
    from sklearnex import patch_sklearn
    from daal4py.oneapi import sycl_context
    patch_sklearn()  # apply the patch before importing scikit-learn estimators

    from sklearn.cluster import DBSCAN

    X = np.array([[1., 2.], [2., 2.], [2., 3.],
                  [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
    # Run the fit inside a GPU context
    with sycl_context("gpu"):
        clustering = DBSCAN(eps=3, min_samples=2).fit(X)

| Speedups of Intel(R) Extension for Scikit-learn over the original Scikit-learn |
|---|
| Technical details: float type: float64; HW: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz, 2 sockets, 28 cores per socket; SW: scikit-learn 0.23.1, Intel® oneDAL (2021.1 Beta 10), benchmark code |
Intel(R) Extension for Scikit-learn patching affects the performance of the specific Scikit-learn functionality listed below. When unsupported parameters are used, the package falls back to the original Scikit-learn. These limitations are described below. If the patching does not cover your scenarios, submit an issue on GitHub.
🔥 Applying the patching will impact the following existing scikit-learn algorithms:

| Task | Functionality | Parameters support | Data support |
|---|---|---|---|
| Classification | SVC | All parameters except kernel = 'poly' and 'sigmoid'. | No limitations. |
| Classification | RandomForestClassifier | All parameters except warm_start = True and ccp_alpha != 0, criterion != 'gini'. | Multi-output and sparse data are not supported. |
| Classification | KNeighborsClassifier | All parameters except metric != 'euclidean' or 'minkowski' with p != 2. | Multi-output and sparse data are not supported. |
| Classification | LogisticRegression / LogisticRegressionCV | All parameters except solver != 'lbfgs' or 'newton-cg', class_weight != None, sample_weight != None. | Only dense data is supported. |
| Regression | RandomForestRegressor | All parameters except warm_start = True and ccp_alpha != 0, criterion != 'mse'. | Multi-output and sparse data are not supported. |
| Regression | KNeighborsRegressor | All parameters except metric != 'euclidean' or 'minkowski' with p != 2. | Sparse data is not supported. |
| Regression | LinearRegression | All parameters except normalize != False and sample_weight != None. | Only dense data is supported, #observations should be >= #features. |
| Regression | Ridge | All parameters except normalize != False, solver != 'auto' and sample_weight != None. | Only dense data is supported, #observations should be >= #features. |
| Regression | ElasticNet | All parameters except sample_weight != None. | Multi-output and sparse data are not supported, #observations should be >= #features. |
| Regression | Lasso | All parameters except sample_weight != None. | Multi-output and sparse data are not supported, #observations should be >= #features. |
| Clustering | KMeans | All parameters except precompute_distances and sample_weight != None. | No limitations. |
| Clustering | DBSCAN | All parameters except metric != 'euclidean' or 'minkowski' with p != 2, algorithm != 'brute' or 'auto'. | Only dense data is supported. |
| Dimensionality reduction | PCA | All parameters except svd_solver != 'full'. | No limitations. |
| Dimensionality reduction | TSNE | All parameters except metric != 'euclidean' or 'minkowski' with p != 2. | Sparse data is not supported. |
| Unsupervised | NearestNeighbors | All parameters except metric != 'euclidean' or 'minkowski' with p != 2. | Sparse data is not supported. |
| Other | train_test_split | All parameters are supported. | Only dense data is supported. |
| Other | assert_all_finite | All parameters are supported. | Only dense data is supported. |
| Other | pairwise_distances | With metric = 'cosine' and 'correlation'. | Only dense data is supported. |
| Other | roc_auc_score | Parameters average, sample_weight, max_fpr and multi_class are not supported. | No limitations. |

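
As a reading aid for the table, the DBSCAN row can be written out as a small predicate. This is only an illustrative sketch: `dbscan_row_allows_acceleration` is a hypothetical helper that restates the table row in code. It is not part of the package, and the package's actual dispatch logic may check more than this.

```python
def dbscan_row_allows_acceleration(metric="euclidean", p=None,
                                   algorithm="auto", is_sparse=False):
    """Restate the DBSCAN table row as a boolean check (hypothetical helper).

    Returns True when the parameters and data fall inside what the table
    lists as supported, False when a fallback would be expected.
    """
    # Supported metrics: 'euclidean', or 'minkowski' with p == 2.
    # Assumption: p=None with 'minkowski' is treated conservatively as
    # unsupported here, because the table only names p == 2.
    metric_ok = metric == "euclidean" or (metric == "minkowski" and p == 2)
    # Supported algorithms: 'brute' or 'auto'.
    algorithm_ok = algorithm in ("brute", "auto")
    # Only dense data is supported.
    return metric_ok and algorithm_ok and not is_sparse


print(dbscan_row_allows_acceleration())                       # defaults
print(dbscan_row_allows_acceleration(metric="manhattan"))     # unsupported metric
print(dbscan_row_allows_acceleration(algorithm="kd_tree"))    # unsupported algorithm
```

The same pattern applies to the other rows; the verbose mode described below remains the reliable way to see which implementation actually ran.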
To find out which implementation of the algorithm is currently used (Intel(R) Extension for Scikit-learn or original Scikit-learn), set the environment variable:
- On Linux and Mac OS: `export SKLEARNEX_VERBOSE=INFO`
- On Windows: `set SKLEARNEX_VERBOSE=INFO`
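
The variable can also be set from inside a Python script instead of the shell. The sketch below assumes that setting it before `sklearnex` is imported is early enough for the package to pick it up; that ordering is a conservative assumption, not documented behavior. The imports are guarded so the snippet still runs where the packages are not installed.

```python
import os

# Assumption: set the variable before importing sklearnex, so it is already
# in place whenever the package reads it.
os.environ["SKLEARNEX_VERBOSE"] = "INFO"

try:
    import numpy as np
    from sklearnex import patch_sklearn
except ImportError:
    patch_sklearn = None  # packages not installed; skip the demonstration

if patch_sklearn is not None:
    patch_sklearn()
    from sklearn.cluster import DBSCAN

    X = np.array([[1., 2.], [2., 2.], [2., 3.],
                  [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
    # With verbose mode on, this call should report which implementation ran.
    DBSCAN(eps=3, min_samples=2).fit(X)
```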
For example, for DBSCAN you get one of these print statements depending on which implementation is used:

    SKLEARNEX INFO: sklearn.cluster.DBSCAN.fit: running accelerated version on CPU
    SKLEARNEX INFO: sklearn.cluster.DBSCAN.fit: fallback to original Scikit-learn
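
When a script produces many of these lines, it can help to summarize them. The following is a minimal sketch that classifies captured output lines using only the two message formats shown above; `classify_sklearnex_line` is a hypothetical helper, and other message wordings (which may exist in other versions) are reported as `"unknown"`.

```python
import re

# Matches lines of the form "SKLEARNEX INFO: <call>: <message>",
# based only on the two example messages shown in this README.
_LINE = re.compile(r"^SKLEARNEX INFO: (?P<call>[\w.]+): (?P<message>.+)$")


def classify_sklearnex_line(line):
    """Return (call, backend) for a verbose line, or None if it does not match.

    backend is "accelerated", "original", or "unknown".
    """
    match = _LINE.match(line.strip())
    if match is None:
        return None
    message = match.group("message")
    if message.startswith("running accelerated version"):
        backend = "accelerated"
    elif message.startswith("fallback to original"):
        backend = "original"
    else:
        backend = "unknown"
    return match.group("call"), backend


captured = [
    "SKLEARNEX INFO: sklearn.cluster.DBSCAN.fit: running accelerated version on CPU",
    "SKLEARNEX INFO: sklearn.cluster.DBSCAN.fit: fallback to original Scikit-learn",
]
for line in captured:
    print(classify_sklearnex_line(line))
```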
