fschlimb/daal4py

Intel(R) Extension for Scikit-learn*

Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application. The acceleration is achieved through the use of the Intel(R) oneAPI Data Analytics Library (oneDAL). Patching scikit-learn makes it a well-suited machine learning framework for dealing with real-life problems.

⚠️ Intel(R) Extension for Scikit-learn contains the scikit-learn patching functionality that was originally available in the daal4py package. All future updates to the patches will be available only in Intel(R) Extension for Scikit-learn. We recommend that you use the scikit-learn-intelex package instead of daal4py. You can learn more about daal4py in the daal4py documentation.

Running the latest scikit-learn test suite with Intel(R) Extension for Scikit-learn: CircleCI

👀 Follow us on Medium

We publish blogs on Medium, so follow us to learn tips and tricks for more efficient data analysis with the help of Intel(R) Extension for Scikit-learn. Here are our latest blogs:

🔗 Important links

💬 Support

Report issues, ask questions, and provide suggestions using:

You may reach out to project maintainers privately at [email protected]

🛠 Installation

Intel(R) Extension for Scikit-learn is available from the Python Package Index and from Anaconda Cloud, in both the Conda-Forge channel and the Intel channel.

# PyPI (recommended by default)
pip install scikit-learn-intelex
# Anaconda Cloud from Conda-Forge channel (recommended for conda users by default)
conda install scikit-learn-intelex -c conda-forge
# Anaconda Cloud from Intel channel (recommended for Intel® Distribution for Python users)
conda install scikit-learn-intelex -c intel
ℹ️ Supported configurations

📦 PyPI channel

OS / Python version Python 3.6 Python 3.7 Python 3.8 Python 3.9
Linux [CPU, GPU] [CPU, GPU] [CPU, GPU]
Windows [CPU, GPU] [CPU, GPU] [CPU, GPU]
OsX [CPU] [CPU] [CPU]

📦 Anaconda Cloud: Conda-Forge channel

OS / Python version Python 3.6 Python 3.7 Python 3.8 Python 3.9
Linux [CPU] [CPU] [CPU] [CPU]
Windows [CPU] [CPU] [CPU] [CPU]
OsX [CPU] [CPU] [CPU] [CPU]

📦 Anaconda Cloud: Intel channel

OS / Python version Python 3.6 Python 3.7 Python 3.8 Python 3.9
Linux [CPU, GPU] [CPU, GPU] [CPU, GPU]
Windows [CPU, GPU] [CPU, GPU] [CPU, GPU]
OsX [CPU] [CPU] [CPU]

You can build the package from sources as well.
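
If the same script also has to run on machines where the extension is not installed, the patch can be made optional. The sketch below is one possible way to do that, not an official recipe. `sklearnex_available` and `enable_acceleration` are hypothetical helper names introduced here; only `patch_sklearn` comes from the package.

```python
import importlib.util


def sklearnex_available() -> bool:
    """Return True if the sklearnex module can be found in this environment."""
    return importlib.util.find_spec("sklearnex") is not None


def enable_acceleration() -> bool:
    """Patch scikit-learn if the extension is installed.

    Returns True when the patch was applied, False when the script
    will keep using stock scikit-learn.
    """
    if not sklearnex_available():
        return False
    from sklearnex import patch_sklearn  # imported lazily, only when present
    patch_sklearn()
    return True


accelerated = enable_acceleration()
print("accelerated" if accelerated else "stock scikit-learn")
```

Call `enable_acceleration()` before importing any estimator from `sklearn`, in the same way `patch_sklearn()` is called first in the Get Started examples.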

⚡️ Get Started

Intel CPU optimizations patching

import numpy as np
from sklearnex import patch_sklearn

# Apply the patch before importing from sklearn so the accelerated classes are used.
patch_sklearn()

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
clustering = DBSCAN(eps=3, min_samples=2).fit(X)

Intel GPU optimizations patching

import numpy as np
from sklearnex import patch_sklearn
from daal4py.oneapi import sycl_context

# Apply the patch before importing from sklearn so the accelerated classes are used.
patch_sklearn()

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
# Computations inside this context are offloaded to the GPU device.
with sycl_context("gpu"):
    clustering = DBSCAN(eps=3, min_samples=2).fit(X)

🚀 Scikit-learn patching

Speedups of Intel(R) Extension for Scikit-learn over the original Scikit-learn
Technical details: float type: float64; HW: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz, 2 sockets, 28 cores per socket; SW: scikit-learn 0.23.1, Intel® oneDAL (2021.1 Beta 10), benchmark code

Intel(R) Extension for Scikit-learn patching affects the performance of the specific Scikit-learn functionality listed below. When unsupported parameters are used, the package falls back to the original Scikit-learn. These limitations are described below. If the patching does not cover your scenario, submit an issue on GitHub.

🔥 Applying the patching will impact the following existing scikit-learn algorithms:

| Task | Functionality | Parameters support | Data support |
|---|---|---|---|
| Classification | SVC | All parameters except kernel = 'poly' and 'sigmoid'. | No limitations. |
| | RandomForestClassifier | All parameters except warm_start = True and ccp_alpha != 0, criterion != 'gini'. | Multi-output and sparse data is not supported. |
| | KNeighborsClassifier | All parameters except metric != 'euclidean' or 'minkowski' with p != 2. | Multi-output and sparse data is not supported. |
| | LogisticRegression / LogisticRegressionCV | All parameters except solver != 'lbfgs' or 'newton-cg', class_weight != None, sample_weight != None. | Only dense data is supported. |
| Regression | RandomForestRegressor | All parameters except warm_start = True and ccp_alpha != 0, criterion != 'mse'. | Multi-output and sparse data is not supported. |
| | KNeighborsRegressor | All parameters except metric != 'euclidean' or 'minkowski' with p != 2. | Sparse data is not supported. |
| | LinearRegression | All parameters except normalize != False and sample_weight != None. | Only dense data is supported, #observations should be >= #features. |
| | Ridge | All parameters except normalize != False, solver != 'auto' and sample_weight != None. | Only dense data is supported, #observations should be >= #features. |
| | ElasticNet | All parameters except sample_weight != None. | Multi-output and sparse data is not supported, #observations should be >= #features. |
| | Lasso | All parameters except sample_weight != None. | Multi-output and sparse data is not supported, #observations should be >= #features. |
| Clustering | KMeans | All parameters except precompute_distances and sample_weight != None. | No limitations. |
| | DBSCAN | All parameters except metric != 'euclidean' or 'minkowski' with p != 2, algorithm != 'brute' or 'auto'. | Only dense data is supported. |
| Dimensionality reduction | PCA | All parameters except svd_solver != 'full'. | No limitations. |
| | TSNE | All parameters except metric != 'euclidean' or 'minkowski' with p != 2. | Sparse data is not supported. |
| Unsupervised | NearestNeighbors | All parameters except metric != 'euclidean' or 'minkowski' with p != 2. | Sparse data is not supported. |
| Other | train_test_split | All parameters are supported. | Only dense data is supported. |
| | assert_all_finite | All parameters are supported. | Only dense data is supported. |
| | pairwise_distances | With metric = 'cosine' and 'correlation'. | Only dense data is supported. |
| | roc_auc_score | Parameters average, sample_weight, max_fpr and multi_class are not supported. | No limitations. |
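
To make the rules in the table concrete, here is a minimal sketch that encodes the DBSCAN row as a plain predicate. The function `dbscan_is_accelerated` is a hypothetical helper written for illustration only. It is not part of the package, which performs its own checks internally, and it conservatively treats `p=None` with `metric='minkowski'` as unsupported because the table does not cover that case.

```python
def dbscan_is_accelerated(metric="euclidean", p=None, algorithm="auto", sparse=False):
    """Return True if the settings fall inside the DBSCAN row of the table.

    Illustration only: the supported combinations are copied from the table,
    and the real decision is made inside the extension.
    """
    # Supported metrics: 'euclidean', or 'minkowski' with p == 2.
    metric_ok = metric == "euclidean" or (metric == "minkowski" and p == 2)
    # Supported algorithms: 'brute' or 'auto'.
    algorithm_ok = algorithm in ("brute", "auto")
    # Only dense data is supported.
    return metric_ok and algorithm_ok and not sparse


print(dbscan_is_accelerated())                          # default settings
print(dbscan_is_accelerated(metric="manhattan"))        # unsupported metric
print(dbscan_is_accelerated(metric="minkowski", p=2))   # equivalent to euclidean
print(dbscan_is_accelerated(algorithm="kd_tree"))       # unsupported algorithm
```

A combination for which this predicate returns False corresponds to the case where, according to the text above, the package falls back to the original Scikit-learn.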

⚠️ We support optimizations for the last four versions of scikit-learn. The latest release of Intel(R) Extension for Scikit-learn 2021.2.X supports scikit-learn 0.21.X, 0.22.X, 0.23.X and 0.24.X.
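
A script can check its installed scikit-learn version against the list above before relying on the optimizations. The following is a minimal sketch under that assumption. `SUPPORTED_SKLEARN_SERIES` and `sklearn_series_supported` are hypothetical names, and the set reflects only the 2021.2.X release mentioned here.

```python
# Release series listed above for Intel(R) Extension for Scikit-learn 2021.2.X.
SUPPORTED_SKLEARN_SERIES = {(0, 21), (0, 22), (0, 23), (0, 24)}


def sklearn_series_supported(version: str) -> bool:
    """Return True if a version string such as '0.23.1' is in a listed series."""
    parts = version.split(".")
    try:
        series = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        # Not a recognizable MAJOR.MINOR version string.
        return False
    return series in SUPPORTED_SKLEARN_SERIES


print(sklearn_series_supported("0.23.1"))
print(sklearn_series_supported("0.20.4"))
```

In practice the argument would be `sklearn.__version__`.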

📜 Intel(R) Extension for Scikit-learn verbose

To find out which implementation of the algorithm is currently used (Intel(R) Extension for Scikit-learn or original Scikit-learn), set the environment variable:

  • On Linux and Mac OS: export SKLEARNEX_VERBOSE=INFO
  • On Windows: set SKLEARNEX_VERBOSE=INFO

For example, for DBSCAN you get one of these print statements depending on which implementation is used:

  • SKLEARNEX INFO: sklearn.cluster.DBSCAN.fit: running accelerated version on CPU
  • SKLEARNEX INFO: sklearn.cluster.DBSCAN.fit: fallback to original Scikit-learn

Read more in the documentation.
