Scikit-Learn Tutorial: How to Install & Scikit-Learn Examples
What is Scikit-learn?
Scikit-learn is an open-source Python library for machine learning. It provides widely used algorithms such as KNN, random forest, and SVM, and it works with libraries that follow its estimator API, such as XGBoost. It is built on top of NumPy. Scikit-learn is widely used in Kaggle competitions as well as at prominent tech companies. It helps with preprocessing, dimensionality reduction (feature selection), classification, regression, clustering, and model selection.
Scikit-learn has some of the best documentation of any open-source library. It also provides an interactive chart at https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html.

Scikit-learn is not difficult to use and gives excellent results. However, its support for parallel computation is limited: some estimators can use several CPU cores through the n_jobs parameter, but the library is not designed for GPU or distributed training. It is possible to run a deep learning algorithm with it, but it is not the best option, especially if you know how to use TensorFlow.
How to Download and Install Scikit-learn
In this Python Scikit-learn tutorial, we will learn how to download and install Scikit-learn:
Option 1: AWS
scikit-learn can be used on AWS. Please refer to the docker image that has scikit-learn preinstalled.
To use the developer version, run the following command in Jupyter:
import sys
!{sys.executable} -m pip install git+https://github.com/scikit-learn/scikit-learn.git
Option 2: Mac or Windows using Anaconda
To learn about installing Anaconda, refer to https://www.guru99.com/download-install-tensorflow.html
When this tutorial was written, the scikit-learn developers had released a development version that fixed common problems found in the stable version of the time. We found it more convenient to use the developer version instead of the stable version.
How to Install scikit-learn with Conda Environment
If you installed scikit-learn in a conda environment, follow these steps to update to version 0.20
Step 1) Activate the tensorflow environment
source activate hello-tf
Step 2) Remove scikit-learn using the conda command
conda remove scikit-learn
Step 3) Install the developer version.
Install the developer version of scikit-learn along with the necessary libraries.
conda install -c anaconda git
pip install Cython
pip install h5py
pip install git+https://github.com/scikit-learn/scikit-learn.git
NOTE: Windows users need to install Microsoft Visual C++ 14 first, which is available from Microsoft.
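After installing, it can help to confirm which version is active in the current environment. The snippet below is a minimal sketch: the 0.20 threshold comes from this tutorial, and the variable name has_column_transformer is only illustrative.

```python
import sklearn

# Print the version that the current Python environment actually imports.
print(sklearn.__version__)

# Compare only the leading (major, minor) numbers, ignoring suffixes such as "dev0".
major, minor = (int(part) for part in sklearn.__version__.split(".")[:2])
has_column_transformer = (major, minor) >= (0, 20)
print("ColumnTransformer available:", has_column_transformer)
```

If the last line prints False, the rest of this tutorial will not run as written, because make_column_transformer is missing.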
Scikit-Learn Example with Machine Learning
This Scikit tutorial is divided into two parts:
- Machine learning with scikit-learn
- How to trust your model with LIME
The first part details how to build a pipeline, create a model, and tune the hyperparameters, while the second part presents the state of the art in model selection.
Step 1) Import the data
During this Scikit-learn tutorial, you will use the adult dataset.
For background on this dataset, and if you want to know more about its descriptive statistics, use the Dive and Overview tools.
Refer to this tutorial to learn more about Dive and Overview
You import the dataset with Pandas. Note that you need to convert the continuous variables to float format.
This dataset includes eight categorical variables:
The categorical variables are listed in CATE_FEATURES
- workclass
- education
- marital
- occupation
- relationship
- race
- sex
- native_country
In addition, there are six continuous variables:
The continuous variables are listed in CONTI_FEATURES
- age
- fnlwgt
- education_num
- capital_gain
- capital_loss
- hours_week
Note that we fill in the lists by hand so that you have a better idea of which columns we are using. A faster way to build the list of categorical or continuous columns is:
## List Categorical
CATE_FEATURES = df_train.iloc[:,:-1].select_dtypes('object').columns
print(CATE_FEATURES)
## List continuous
CONTI_FEATURES = df_train.select_dtypes('number').columns
print(CONTI_FEATURES)
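To see the dtype-based selection in isolation, here is a minimal sketch on a made-up three-row frame. The values and the names df, conti, and cate are illustrative only; the column names mirror the adult dataset.

```python
import pandas as pd

# Toy frame with two numeric columns, one string feature and a string label.
df = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", "Private", "Private"],
    "hours_week": [40, 13, 40],
    "label": ["<=50K", "<=50K", ">50K"],
})

# Drop the last column (the label), then split the remaining columns by dtype.
features_only = df.iloc[:, :-1]
conti = features_only.select_dtypes(include="number").columns.tolist()
cate = features_only.select_dtypes(exclude="number").columns.tolist()

print(conti)  # ['age', 'hours_week']
print(cate)   # ['workclass']
```

Selecting with include/exclude "number" avoids depending on how a given pandas version labels string columns.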
Here is the code to import the data:
# Import dataset
import pandas as pd
## Define path data
COLUMNS = ['age','workclass', 'fnlwgt', 'education', 'education_num', 'marital',
'occupation', 'relationship', 'race', 'sex', 'capital_gain', 'capital_loss',
'hours_week', 'native_country', 'label']
### Define continuous list
CONTI_FEATURES = ['age', 'fnlwgt','capital_gain', 'education_num', 'capital_loss', 'hours_week']
### Define categorical list
CATE_FEATURES = ['workclass', 'education', 'marital', 'occupation', 'relationship', 'race', 'sex', 'native_country']
## Prepare the data
features = ['age','workclass', 'fnlwgt', 'education', 'education_num', 'marital',
'occupation', 'relationship', 'race', 'sex', 'capital_gain', 'capital_loss',
'hours_week', 'native_country']
PATH = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
df_train = pd.read_csv(PATH, skipinitialspace=True, names = COLUMNS, index_col=False)
df_train[CONTI_FEATURES] =df_train[CONTI_FEATURES].astype('float64')
df_train.describe()
| | age | fnlwgt | education_num | capital_gain | capital_loss | hours_week |
|---|---|---|---|---|---|---|
| count | 32561.000000 | 3.256100e+04 | 32561.000000 | 32561.000000 | 32561.000000 | 32561.000000 |
| mean | 38.581647 | 1.897784e+05 | 10.080679 | 1077.648844 | 87.303830 | 40.437456 |
| std | 13.640433 | 1.055500e+05 | 2.572720 | 7385.292085 | 402.960219 | 12.347429 |
| min | 17.000000 | 1.228500e+04 | 1.000000 | 0.000000 | 0.000000 | 1.000000 |
| 25% | 28.000000 | 1.178270e+05 | 9.000000 | 0.000000 | 0.000000 | 40.000000 |
| 50% | 37.000000 | 1.783560e+05 | 10.000000 | 0.000000 | 0.000000 | 40.000000 |
| 75% | 48.000000 | 2.370510e+05 | 12.000000 | 0.000000 | 0.000000 | 45.000000 |
| max | 90.000000 | 1.484705e+06 | 16.000000 | 99999.000000 | 4356.000000 | 99.000000 |
You can check the count of unique values of the native_country feature. You can see that only one household comes from Holand-Netherlands. This household does not bring us any information, and it can raise an error during training: if its only row ends up outside the data the encoder was fitted on, the encoder meets a category it has never seen.
df_train.native_country.value_counts()
United-States 29170
Mexico 643
? 583
Philippines 198
Germany 137
Canada 121
Puerto-Rico 114
El-Salvador 106
India 100
Cuba 95
England 90
Jamaica 81
South 80
China 75
Italy 73
Dominican-Republic 70
Vietnam 67
Guatemala 64
Japan 62
Poland 60
Columbia 59
Taiwan 51
Haiti 44
Iran 43
Portugal 37
Nicaragua 34
Peru 31
France 29
Greece 29
Ecuador 28
Ireland 24
Hong 20
Cambodia 19
Trinadad&Tobago 19
Thailand 18
Laos 18
Yugoslavia 16
Outlying-US(Guam-USVI-etc) 14
Honduras 13
Hungary 13
Scotland 12
Holand-Netherlands 1
Name: native_country, dtype: int64
You can exclude this uninformative row from the dataset
## Drop Netherland, because only one row
df_train = df_train[df_train.native_country != "Holand-Netherlands"]
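The same filter can be written without hard-coding the country name. This is only a sketch on made-up rows; the threshold of two occurrences and the names rare and df_filtered are assumptions for illustration, not part of the original tutorial.

```python
import pandas as pd

# Made-up sample with one country that appears a single time.
df = pd.DataFrame({
    "native_country": ["United-States", "Mexico", "United-States",
                       "Holand-Netherlands", "Mexico"],
})

# Count the occurrences of each category and collect those seen fewer than twice.
counts = df["native_country"].value_counts()
rare = counts[counts < 2].index

# Keep only the rows whose category is not in the rare set.
df_filtered = df[~df["native_country"].isin(rare)]
print(df_filtered["native_country"].tolist())
```

On the real dataset this would remove the same single Holand-Netherlands row as the explicit comparison above.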
Next, you store the position of the continuous features in a list. You will need it in the next step to build the pipeline.
The code below loops over all column names in CONTI_FEATURES, gets the location of each one (i.e., its number), and appends it to a list called conti_features
## Get the column index of the continuous features
conti_features = []
for i in CONTI_FEATURES:
    position = df_train.columns.get_loc(i)
    conti_features.append(position)
print(conti_features)
[0, 2, 10, 4, 11, 12]
The code below does the same job as above, but for the categorical features.
## Get the column index of the categorical features
categorical_features = []
for i in CATE_FEATURES:
    position = df_train.columns.get_loc(i)
    categorical_features.append(position)
print(categorical_features)
[1, 3, 5, 6, 7, 8, 9, 13]
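Both loops can also be written as list comprehensions. The sketch below rebuilds the column index from the COLUMNS list instead of loading the data, so it runs on its own; the names columns, conti_positions, and cate_positions are illustrative.

```python
import pandas as pd

COLUMNS = ['age', 'workclass', 'fnlwgt', 'education', 'education_num', 'marital',
           'occupation', 'relationship', 'race', 'sex', 'capital_gain', 'capital_loss',
           'hours_week', 'native_country', 'label']
CONTI_FEATURES = ['age', 'fnlwgt', 'capital_gain', 'education_num', 'capital_loss', 'hours_week']
CATE_FEATURES = ['workclass', 'education', 'marital', 'occupation', 'relationship',
                 'race', 'sex', 'native_country']

# A pandas Index offers the same get_loc lookup as df_train.columns.
columns = pd.Index(COLUMNS)
conti_positions = [columns.get_loc(name) for name in CONTI_FEATURES]
cate_positions = [columns.get_loc(name) for name in CATE_FEATURES]

print(conti_positions)  # [0, 2, 10, 4, 11, 12]
print(cate_positions)   # [1, 3, 5, 6, 7, 8, 9, 13]
```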
You can have a look at the dataset. Note that each categorical feature is a string. You cannot feed a model with string values, so you need to transform the dataset using dummy variables.
df_train.head(5)
In fact, you need to create one column for each group in the feature. First, you can run the code below to compute the total number of columns needed.
print(df_train[CATE_FEATURES].nunique(),
'There are',sum(df_train[CATE_FEATURES].nunique()), 'groups in the whole dataset')
workclass 9
education 16
marital 7
occupation 15
relationship 6
race 5
sex 2
native_country 41
dtype: int64 There are 101 groups in the whole dataset
The whole dataset contains 101 groups, as shown above. For instance, the workclass feature has nine groups. You can display the names of the groups with the following code
unique() returns the unique values of each categorical feature.
for i in CATE_FEATURES:
    print(df_train[i].unique())
['State-gov' 'Self-emp-not-inc' 'Private' 'Federal-gov' 'Local-gov' '?' 'Self-emp-inc' 'Without-pay' 'Never-worked']
['Bachelors' 'HS-grad' '11th' 'Masters' '9th' 'Some-college' 'Assoc-acdm' 'Assoc-voc' '7th-8th' 'Doctorate' 'Prof-school' '5th-6th' '10th' '1st-4th' 'Preschool' '12th']
['Never-married' 'Married-civ-spouse' 'Divorced' 'Married-spouse-absent' 'Separated' 'Married-AF-spouse' 'Widowed']
['Adm-clerical' 'Exec-managerial' 'Handlers-cleaners' 'Prof-specialty' 'Other-service' 'Sales' 'Craft-repair' 'Transport-moving' 'Farming-fishing' 'Machine-op-inspct' 'Tech-support' '?' 'Protective-serv' 'Armed-Forces' 'Priv-house-serv']
['Not-in-family' 'Husband' 'Wife' 'Own-child' 'Unmarried' 'Other-relative']
['White' 'Black' 'Asian-Pac-Islander' 'Amer-Indian-Eskimo' 'Other']
['Male' 'Female']
['United-States' 'Cuba' 'Jamaica' 'India' '?' 'Mexico' 'South' 'Puerto-Rico' 'Honduras' 'England' 'Canada' 'Germany' 'Iran' 'Philippines' 'Italy' 'Poland' 'Columbia' 'Cambodia' 'Thailand' 'Ecuador' 'Laos' 'Taiwan' 'Haiti' 'Portugal' 'Dominican-Republic' 'El-Salvador' 'France' 'Guatemala' 'China' 'Japan' 'Yugoslavia' 'Peru' 'Outlying-US(Guam-USVI-etc)' 'Scotland' 'Trinadad&Tobago' 'Greece' 'Nicaragua' 'Vietnam' 'Hong' 'Ireland' 'Hungary']
Therefore, the transformed training dataset will contain 101 + 6 = 107 columns: 101 for the categorical groups and six for the continuous features.
Scikit-learn can take care of the conversion. Conceptually, it is done in two steps:
- First, you convert each string to an integer ID. For instance, State-gov gets one ID, Self-emp-not-inc another, and so on. LabelEncoder does this for you
- Then, you turn each ID into a new column. As mentioned before, the dataset has 101 groups. Therefore, there will be 101 columns capturing all the groups of the categorical features. Scikit-learn has a transformer called OneHotEncoder that performs this operation
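The two steps can be illustrated on a handful of made-up workclass values. This is a sketch, not the tutorial's pipeline code: it shows the integer IDs that LabelEncoder assigns (in sorted order, starting at 0) and then the one-hot columns, which OneHotEncoder can build directly from the strings in scikit-learn 0.20 and later.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Four made-up rows of a single categorical feature.
workclass = np.array(["State-gov", "Private", "State-gov", "Self-emp-not-inc"])

# Step 1: map every string to an integer ID.
label_encoder = LabelEncoder()
ids = label_encoder.fit_transform(workclass)
print(list(label_encoder.classes_))  # sorted group names
print(ids.tolist())                  # [2, 0, 2, 1]

# Step 2: one column per group. The encoder expects a 2-D input and
# returns a sparse matrix by default, hence reshape and toarray.
onehot = OneHotEncoder().fit_transform(workclass.reshape(-1, 1)).toarray()
print(onehot.shape)                  # (4, 3)
```

Each row of onehot contains a single 1, in the column of the group that the row belongs to.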
Step 2) Create the train/test set
Now that the dataset is ready, we can split it 80/20:
80 percent for the training set and 20 percent for the test set.
You can use train_test_split. The first argument is the dataframe of features and the second argument is the label column. You can specify the size of the test set with test_size.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df_train[features],
df_train.label,
test_size = 0.2,
random_state=0)
X_train.head(5)
print(X_train.shape, X_test.shape)
(26048, 14) (6512, 14)
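The effect of test_size is easy to check on a tiny array. The following sketch uses ten made-up rows, so an 80/20 split leaves eight rows for training and two for testing; the toy variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up rows with two features each, and alternating labels.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 1] * 5)

# random_state fixes the shuffle so the split is reproducible.
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=0)

print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)
```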
Step 3) Build the pipeline
The pipeline makes it easier to feed the model with consistent data.
The idea is to put the raw data into a 'pipeline' that performs the operations.
For instance, with the current dataset, you need to standardize the continuous variables and convert the categorical data. Note that you can perform any operation inside the pipeline. For instance, if you have missing values ('NA') in the dataset, you can replace them with the mean or the median. You can also create new variables.
You have a choice: hard-code the two processes or create a pipeline. The first choice can lead to data leakage and create inconsistencies over time. A better option is to use the pipeline.
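As an illustration of the missing-value case, the sketch below chains a median imputer and a scaler. The data is made up and SimpleImputer is not used elsewhere in this tutorial; the point is that the median and the scaling statistics are learned from the training rows only, which is how a pipeline avoids data leakage.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up single-feature data with one missing value in each split.
X_train_toy = np.array([[1.0], [2.0], [np.nan], [3.0]])
X_test_toy = np.array([[np.nan], [4.0]])

# Impute with the median first, then standardize.
prep = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())

# fit learns the median (2.0) and the mean/std from the training rows only.
prep.fit(X_train_toy)

# transform reuses those training statistics on the test rows.
X_test_prepared = prep.transform(X_test_toy)
print(X_test_prepared)
```

The missing test value is replaced by the training median, which is also the training mean here, so it maps to 0 after scaling.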
from sklearn.preprocessing import StandardScaler, OneHotEncoder, LabelEncoder
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
The pipeline will perform two operations before feeding the logistic classifier:
- Standardize the continuous variables: StandardScaler()
- Convert the categorical features: OneHotEncoder(sparse=False) (the argument is named sparse_output from scikit-learn 1.2 onward)
You can perform the two steps with make_column_transformer. This function was introduced in scikit-learn 0.20 and was not available in version 0.19, the stable release when this tutorial was written. With 0.19 it was not possible to run a label encoder and a one-hot encoder inside the pipeline, which is one reason we decided to use the developer version.
make_column_transformer is easy to use. You need to define which columns to transform and which transformation to apply. For instance, to standardize the continuous features, you pass:
- a pair made of StandardScaler() and conti_features inside make_column_transformer.
- conti_features: list with the positions of the continuous variables
- StandardScaler: standardizes the variables
The OneHotEncoder object inside make_column_transformer encodes the categorical features automatically.
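preprocess = make_column_transformer(
    # Released versions (0.20.1 and later) expect (transformer, columns) pairs
    (StandardScaler(), conti_features),
    # Columns are given by numeric position here, not by name
    # The argument was named `sparse` before scikit-learn 1.2
    (OneHotEncoder(sparse_output=False), categorical_features)
)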
You can check that the transformer works with fit_transform. The transformed dataset should have the following shape: 26048, 107
preprocess.fit_transform(X_train).shape
(26048, 107)
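The column count can be reproduced on a small made-up frame: the output has one column per continuous feature plus one column per categorical group. The sketch below assumes a released scikit-learn (0.20.1 or later), where each pair is written as (transformer, columns), and it selects columns by name rather than by position.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows: two continuous features, two categorical features.
df_toy = pd.DataFrame({
    "age": [39.0, 50.0, 38.0, 53.0],
    "hours_week": [40.0, 13.0, 40.0, 40.0],
    "sex": ["Male", "Male", "Female", "Male"],
    "race": ["White", "Black", "White", "Other"],
})

preprocess_toy = make_column_transformer(
    (StandardScaler(), ["age", "hours_week"]),
    (OneHotEncoder(), ["sex", "race"]),
)

transformed = preprocess_toy.fit_transform(df_toy)
n_columns = transformed.shape[1]
print(n_columns)  # 2 continuous + 2 sex groups + 3 race groups = 7
```

The same arithmetic gives 6 + 101 = 107 columns on the adult dataset.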
The data transformer is ready to use. You can create the pipeline with make_pipeline. Once the data are transformed, they are fed to the logistic regression.
model = make_pipeline(
preprocess,
LogisticRegression())
Training a model with scikit-learn is straightforward. You call the fit method on the pipeline, i.e., model. You can print the accuracy with the score method
model.fit(X_train, y_train)
print("logistic regression score: %f" % model.score(X_test, y_test))
logistic regression score: 0.850891
Finally, you can predict the class probabilities with predict_proba. It returns the probability of each class for every row. Note that each row sums to one.
model.predict_proba(X_test)
array([[0.83576663, 0.16423337],
[0.94582765, 0.05417235],
[0.64760587, 0.35239413],
...,
[0.99639252, 0.00360748],
[0.02072181, 0.97927819],
[0.56781353, 0.43218647]])
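The sum-to-one property can be verified directly. This is a minimal sketch on synthetic data from make_classification, not on the adult dataset; the names clf and proba are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification problem.
X_syn, y_syn = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X_syn, y_syn)

# One row per sample, one column per class.
proba = clf.predict_proba(X_syn[:5])
print(proba.shape)
print(np.allclose(proba.sum(axis=1), 1.0))

# The columns follow the order of clf.classes_.
print(clf.classes_)
```

predict returns, for each row, the class whose column holds the largest probability.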
Step 4) Using our pipeline in a grid search
Tuning hyperparameters (settings fixed before training, such as the regularization strength or, in a neural network, the number of hidden units) can be tedious and exhausting.
One way to evaluate the model could be to change the size of the training set and evaluate the performance.
You can repeat this method ten times to see the score metrics. However, it is too much work.
Instead, scikit-learn provides a function that carries out parameter tuning and cross-validation.
Cross-validation
Cross-validation means that the training set is split into n folds, and the model is then trained and evaluated n times. For instance, if cv is set to 10, the model is trained and evaluated ten times. In each round, nine folds are used to train the model and the remaining fold is held out for evaluation, so every fold serves as the evaluation set exactly once.
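The fold rotation can be made visible with KFold on ten made-up samples. This is a sketch with five folds instead of ten to keep the output short; the list held_out is an illustrative name.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten made-up samples, identified by their row number.
X_rows = np.arange(10).reshape(-1, 1)

kf = KFold(n_splits=5)
held_out = []
for train_idx, test_idx in kf.split(X_rows):
    # Each round trains on 8 rows and evaluates on the 2 held-out rows.
    print("train:", train_idx.tolist(), "evaluate:", test_idx.tolist())
    held_out.extend(test_idx.tolist())

# Every row has been held out exactly once.
print(sorted(held_out) == list(range(10)))
```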
Grid search
Each classifier has hyperparameters to tune. You can try different values by hand, or you can define a parameter grid. If you go to the official scikit-learn website, you can see that the logistic classifier has several parameters to tune. To make the training faster, you choose to tune only the C parameter. It controls the regularization: C is the inverse of the regularization strength and must be positive. A small value gives more weight to the regularizer.
You can use the GridSearchCV object. You need to create a dictionary containing the hyperparameters to tune.
You list the hyperparameters followed by the values you want to try. For instance, to tune the C parameter, you use:
- 'logisticregression__C': [0.001, 0.01, 0.1, 1.0]: the parameter name is preceded by the name of the classifier, in lower case, and two underscores.
The model will try four different values: 0.001, 0.01, 0.1 and 1.
You train the model using 10 folds: cv=10
from sklearn.model_selection import GridSearchCV
# Construct the parameter grid
param_grid = {
'logisticregression__C': [0.001, 0.01,0.1, 1.0],
}
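If you are unsure about the exact key to use in the grid, you can ask the pipeline for its parameter names. The sketch below uses a small stand-in pipeline (a scaler plus a logistic regression) rather than the tutorial's model; pipe and tunable are illustrative names.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), LogisticRegression())

# make_pipeline names each step after its class, in lower case.
print(list(pipe.named_steps))  # ['standardscaler', 'logisticregression']

# Parameters of a step are exposed as <step name>__<parameter name>.
tunable = [name for name in pipe.get_params() if name.startswith("logisticregression__")]
print("logisticregression__C" in tunable)  # True
```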
You can train the model using GridSearchCV with the parameters param_grid and cv.
# Train the model
# The iid argument was removed in scikit-learn 0.24, so it is not passed here
grid_clf = GridSearchCV(model,
                        param_grid,
                        cv=10)
grid_clf.fit(X_train, y_train)
OUTPUT
GridSearchCV(cv=10, error_score='raise-deprecating',
estimator=Pipeline(memory=None,
steps=[('columntransformer', ColumnTransformer(n_jobs=1, remainder='drop', transformer_weights=None,
transformers=[('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True), [0, 2, 10, 4, 11, 12]), ('onehotencoder', OneHotEncoder(categorical_features=None, categories=None,...ty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False))]),
fit_params=None, iid=False, n_jobs=1,
param_grid={'logisticregression__C': [0.001, 0.01, 0.1, 1.0]},
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring=None, verbose=0)
To access the best parameters, you use best_params_
grid_clf.best_params_
OUTPUT
{'logisticregression__C': 1.0}
After training the model with four different regularization values, the best parameter is C = 1.0. Its accuracy on the test set is:
print("best logistic regression from grid search: %f" % grid_clf.best_estimator_.score(X_test, y_test))
best logistic regression from grid search: 0.850891
To access the predicted probabilities:
grid_clf.best_estimator_.predict_proba(X_test)
array([[0.83576677, 0.16423323],
[0.9458291 , 0.0541709 ],
[0.64760416, 0.35239584],
...,
[0.99639224, 0.00360776],
[0.02072033, 0.97927967],
[0.56782222, 0.43217778]])
XGBoost Model with scikit-learn
Let's try Scikit-learn examples to train one of the best classifiers available. XGBoost is a gradient-boosted tree ensemble and is often more accurate than a random forest. The theoretical background of the classifier is outside the scope of this Python Scikit tutorial. Keep in mind that XGBoost has won many Kaggle competitions. On a medium-sized dataset, it can perform as well as a deep learning algorithm or even better.
The classifier is challenging to train because it has a large number of parameters to tune. You can, of course, use GridSearchCV to choose the parameters for you.
Instead, let's see a better way to find good parameters. GridSearchCV can be tedious and very slow if you pass many values, because the search space grows with the number of parameters. A preferable solution is to use RandomizedSearchCV. This method draws the value of each hyperparameter at random at every iteration. For instance, if the search runs for 1000 iterations, then 1000 combinations are evaluated. Otherwise, it works more or less like GridSearchCV.
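To show the mechanics without the long XGBoost training time, here is a minimal sketch of a randomized search over the C parameter of a logistic regression on synthetic data. The log-uniform distribution from scipy.stats, the range 0.001 to 100, and the five iterations are assumptions chosen for illustration only.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_syn, y_syn = make_classification(n_samples=300, n_features=8, random_state=0)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

search = RandomizedSearchCV(
    pipe,
    # Values of C are sampled from a distribution instead of a fixed list.
    param_distributions={"logisticregression__C": loguniform(1e-3, 1e2)},
    n_iter=5,          # number of sampled combinations
    cv=3,
    random_state=0,
)
search.fit(X_syn, y_syn)

# One entry per sampled combination.
print(len(search.cv_results_["params"]))
print(search.best_params_)
```

The cost of the search is set by n_iter and cv, not by the size of the search space, which is what makes it practical when there are many hyperparameters.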
You need to import xgboost. If the library is not installed, use pip3 install xgboost or, in a Jupyter environment, run
import sys
!{sys.executable} -m pip install xgboost
Next,
import xgboost
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import StratifiedKFold
The next step in this Scikit Python tutorial is to specify the parameters to tune. You can refer to the official documentation to see all the tunable parameters. For the sake of this tutorial, you only choose two hyperparameters with two values each. XGBoost takes a long time to train: the more hyperparameters in the grid, the longer you need to wait.
params = {
'xgbclassifier__gamma': [0.5, 1],
'xgbclassifier__max_depth': [3, 4]
}
You build a new pipeline with the XGBoost classifier. You choose to define 600 estimators. Note that n_estimators is a parameter that you can tune. A high value can lead to overfitting. You can try different values yourself, but be aware that it can take hours. You use the default values for the other parameters
model_xgb = make_pipeline(
preprocess,
xgboost.XGBClassifier(
n_estimators=600,
objective='binary:logistic',
silent=True,
nthread=1)
)
You can improve the cross-validation with the Stratified K-Folds cross-validator. You build only three folds here to speed up the computation, at the cost of lower quality. Increase this value to 5 or 10 at home to improve the results.
You choose to evaluate four parameter combinations (n_iter=4). Since the grid above holds 2 x 2 = 4 combinations, every one of them is tried.
skf = StratifiedKFold(n_splits=3,
shuffle = True,
random_state = 1001)
random_search = RandomizedSearchCV(model_xgb,
param_distributions=params,
n_iter=4,
scoring='accuracy',
n_jobs=4,
cv=skf.split(X_train, y_train),
verbose=3,
random_state=1001)
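What "stratified" means can be checked on a small made-up label vector: every fold keeps the class proportions of the full set. The sketch below uses nine samples of one class and three of the other, so each of the three folds should hold three and one respectively; fold_counts is an illustrative name.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up imbalanced labels: nine of class 0, three of class 1.
y_toy = np.array([0] * 9 + [1] * 3)
X_toy = np.zeros((len(y_toy), 1))

skf_toy = StratifiedKFold(n_splits=3, shuffle=True, random_state=1001)

fold_counts = []
for _, test_idx in skf_toy.split(X_toy, y_toy):
    # Count how many samples of each class fall in the held-out fold.
    fold_counts.append(np.bincount(y_toy[test_idx], minlength=2).tolist())

print(fold_counts)
```

A plain KFold gives no such guarantee, which matters for labels as unbalanced as the income classes in the adult dataset.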
The randomized search is ready to use; you can train the model
#grid_xgb = GridSearchCV(model_xgb, params, cv=10, iid=False)
random_search.fit(X_train, y_train)
Fitting 3 folds for each of 4 candidates, totalling 12 fits
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=0.5 ............
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=0.5 ............
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=0.5 ............
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=0.5 ............
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=0.5, score=0.8759645283888057, total= 1.0min
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=0.5 ............
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=0.5, score=0.8729701715996775, total= 1.0min
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=0.5, score=0.8706519235199263, total= 1.0min
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=0.5 ............
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=1 ..............
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=0.5, score=0.8735460094437406, total= 1.3min
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=1 ..............
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=1, score=0.8722791661868018, total= 57.7s
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=1 ..............
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=1, score=0.8753886905447426, total= 1.0min
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=1 ..............
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=0.5, score=0.8697304768486523, total= 1.3min
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=1 ..............
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=0.5, score=0.8740066797189912, total= 1.4min
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=1 ..............
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=1, score=0.8707671043538355, total= 1.0min
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=1, score=0.8729701715996775, total= 1.2min
[Parallel(n_jobs=4)]: Done 10 out of 12 | elapsed: 3.6min remaining: 43.5s
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=1, score=0.8736611770125533, total= 1.2min
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=1, score=0.8692697535130154, total= 1.2min
[Parallel(n_jobs=4)]: Done 12 out of 12 | elapsed: 3.6min finished
/Users/Thomas/anaconda3/envs/hello-tf/lib/python3.6/site-packages/sklearn/model_selection/_search.py:737: DeprecationWarning: The default of the `iid` parameter will change from True to False in version 0.22 and will be removed in 0.24. This will change numeric results when test-set sizes are unequal.
DeprecationWarning)
RandomizedSearchCV(cv=<generator object _BaseKFold.split at 0x1101eb830>,
error_score='raise-deprecating',
estimator=Pipeline(memory=None,
steps=[('columntransformer', ColumnTransformer(n_jobs=1, remainder='drop', transformer_weights=None,
transformers=[('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True), [0, 2, 10, 4, 11, 12]), ('onehotencoder', OneHotEncoder(categorical_features=None, categories=None,...
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
silent=True, subsample=1))]),
fit_params=None, iid='warn', n_iter=4, n_jobs=4,
param_distributions={'xgbclassifier__gamma': [0.5, 1], 'xgbclassifier__max_depth': [3, 4]},
pre_dispatch='2*n_jobs', random_state=1001, refit=True,
return_train_score='warn', scoring='accuracy', verbose=3)
As you can see, XGBoost has a better score than the previous logistic regression.
print("Best parameter", random_search.best_params_)
print("best XGBoost from random search: %f" % random_search.best_estimator_.score(X_test, y_test))
Best parameter {'xgbclassifier__max_depth': 3, 'xgbclassifier__gamma': 0.5}
best XGBoost from random search: 0.873157
random_search.best_estimator_.predict(X_test)
array(['<=50K', '<=50K', '<=50K', ..., '<=50K', '>50K', '<=50K'], dtype=object)
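The parameter names printed above follow the pattern step name, two underscores, parameter name. The sketch below reproduces that pattern on a small scale. It is only an illustration: it swaps in LogisticRegression and synthetic data from make_classification (both assumptions, chosen so the example runs without XGBoost or the census file).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the census data (an assumption for illustration).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# make_pipeline names each step after its lowercased class name,
# so the C parameter of LogisticRegression becomes "logisticregression__C".
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_distributions = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

# Try 3 of the 4 candidate values, each evaluated with 3-fold cross-validation.
search = RandomizedSearchCV(pipe, param_distributions, n_iter=3, cv=3,
                            scoring="accuracy", random_state=1001)
search.fit(X_train, y_train)

test_score = search.best_estimator_.score(X_test, y_test)
print("Best parameter", search.best_params_)
print("Test accuracy: %f" % test_score)
```

Because refit=True is the default, best_estimator_ is already retrained on the full training set and can be scored or used for prediction directly.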
Create DNN with MLPClassifier in scikit-learn
Finally, you can train a deep learning algorithm with scikit-learn. The method is the same as for the other classifiers. The classifier is available as MLPClassifier.
from sklearn.neural_network import MLPClassifier
You define the following deep learning algorithm:
- Adam solver
- ReLU activation function
- Alpha = 0.0001
- Batch size of 150
- Two hidden layers with 200 and 100 neurons respectively
model_dnn = make_pipeline(
preprocess,
MLPClassifier(solver='adam',
alpha=0.0001,
activation='relu',
batch_size=150,
hidden_layer_sizes=(200, 100),
random_state=1))
You can change the number of layers to improve the model.
model_dnn.fit(X_train, y_train)
print("DNN accuracy score: %f" % model_dnn.score(X_test, y_test))
DNN accuracy score: 0.821253
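To see what changing the number of layers does in practice, the sketch below trains two small networks and prints the shape of each weight matrix. The data is synthetic and the layer sizes are deliberately small so it runs quickly; both are assumptions for illustration, not the settings used on the census data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with 10 features (an assumption for illustration).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

shapes = {}
for layers in [(20,), (20, 10)]:
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(solver="adam", activation="relu", alpha=0.0001,
                      batch_size=50, hidden_layer_sizes=layers,
                      max_iter=500, random_state=1))
    model.fit(X_train, y_train)
    mlp = model.named_steps["mlpclassifier"]
    # coefs_ holds one weight matrix per transition between layers.
    shapes[layers] = [w.shape for w in mlp.coefs_]
    print(layers, shapes[layers],
          "accuracy: %f" % model.score(X_test, y_test))
```

Each entry of hidden_layer_sizes adds one hidden layer, so a tuple of length two yields three weight matrices: input to first hidden layer, first to second hidden layer, and second hidden layer to output.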
LIME: Trust your Model
Now that you have a good model, you need a tool to trust it. Machine learning algorithms, especially random forests and neural networks, are known to be black-box algorithms. Put differently, they work but no one knows why.
Three researchers have come up with a great tool to see how the computer makes a prediction. The paper is called Why Should I Trust You?
They developed an algorithm named Local Interpretable Model-Agnostic Explanations (LIME).
Take an example:
Sometimes you do not know if you can trust a machine learning prediction:
A doctor, for example, cannot trust a diagnosis just because a computer said so. You also need to know if you can trust the model before putting it into production.
Imagine we can understand why any classifier makes a prediction, even incredibly complicated models such as neural networks, random forests or SVMs with any kernel. It becomes easier to trust a prediction if we understand the reasons behind it. In the example with the doctor, if the model told him which symptoms are essential, he would be more inclined to trust it. It is also easier to figure out when the model should not be trusted.
Lime can tell you which features affect the decisions of the classifier.
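The idea of a local explanation can be sketched in a few lines: perturb one row, ask the black box for predictions on the perturbed rows, weight them by closeness to the original, and fit a simple linear model. The code below is a simplified illustration of that idea under stated assumptions (synthetic data, Gaussian perturbations, a Ridge surrogate, a hypothetical helper named explain_locally). It is not the lime library's implementation, which also discretizes features and selects a subset of them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

# Black-box model trained on synthetic data (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
black_box = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)


def explain_locally(instance, predict_proba, scale, n_samples=1000,
                    kernel_width=3.0, seed=0):
    """Hypothetical helper: fit a weighted linear surrogate around one row."""
    rng = np.random.RandomState(seed)
    # 1. Perturb the row with Gaussian noise scaled per feature.
    noise = rng.normal(size=(n_samples, instance.shape[0]))
    neighbours = instance + noise * scale
    # 2. Ask the black box for the probability of the positive class.
    target = predict_proba(neighbours)[:, 1]
    # 3. Give nearby perturbations more weight (exponential kernel).
    distances = np.sqrt((noise ** 2).sum(axis=1))
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # 4. Fit an interpretable linear model on the standardized offsets.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(noise, target, sample_weight=weights)
    return surrogate.coef_


coefs = explain_locally(X[0], black_box.predict_proba, X.std(axis=0))
for j, c in enumerate(coefs):
    print("feature %d: local weight %+.3f" % (j, c))
```

A positive local weight means that increasing the feature near this row pushes the prediction toward the positive class; the weights describe only the neighbourhood of this one row, not the model as a whole.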
Data Preparation
There are a couple of things you need to change to run LIME with Python. First of all, you need to install lime from the terminal. You can use pip install lime
Lime uses the LimeTabularExplainer object to approximate the model locally. This object requires:
- A dataset in numpy format
- The names of the features: feature_names
- The names of the classes: class_names
- The indices of the columns holding categorical features: categorical_features
- The names of the groups for each categorical feature: categorical_names
Create numpy train set
You can copy df_train and later convert it from pandas to numpy very easily.
df_train.head(5)
# Copy the training frame so the encoding below does not modify df_train
df_lime = df_train.copy()
df_lime.head(3)
Get the class name
The label values are accessible with the unique() method. You should see:
- '<=50K'
- '>50K'
# Get the class name
class_names = df_lime.label.unique()
class_names
array(['<=50K', '>50K'], dtype=object)
Index of the column of the categorical features
You can use the method you learned before to get the names of the groups. You encode each feature with LabelEncoder and repeat the operation on all the categorical features.
import sklearn.preprocessing as preprocessing

# Map each categorical feature to the class names found by its encoder
categorical_names = {}
for feature in CATE_FEATURES:
    le = preprocessing.LabelEncoder()
    le.fit(df_lime[feature])
    df_lime[feature] = le.transform(df_lime[feature])
    categorical_names[feature] = le.classes_
print(categorical_names)
{'workclass': array(['?', 'Federal-gov', 'Local-gov', 'Never-worked', 'Private',
'Self-emp-inc', 'Self-emp-not-inc', 'State-gov', 'Without-pay'],
dtype=object), 'education': array(['10th', '11th', '12th', '1st-4th', '5th-6th', '7th-8th', '9th',
'Assoc-acdm', 'Assoc-voc', 'Bachelors', 'Doctorate', 'HS-grad',
'Masters', 'Preschool', 'Prof-school', 'Some-college'],
dtype=object), 'marital': array(['Divorced', 'Married-AF-spouse', 'Married-civ-spouse',
'Married-spouse-absent', 'Never-married', 'Separated', 'Widowed'],
dtype=object), 'occupation': array(['?', 'Adm-clerical', 'Armed-Forces', 'Craft-repair',
'Exec-managerial', 'Farming-fishing', 'Handlers-cleaners',
'Machine-op-inspct', 'Other-service', 'Priv-house-serv',
'Prof-specialty', 'Protective-serv', 'Sales', 'Tech-support',
'Transport-moving'], dtype=object), 'relationship': array(['Husband', 'Not-in-family', 'Other-relative', 'Own-child',
'Unmarried', 'Wife'], dtype=object), 'race': array(['Amer-Indian-Eskimo', 'Asian-Pac-Islander', 'Black', 'Other',
'White'], dtype=object), 'sex': array(['Female', 'Male'], dtype=object), 'native_country': array(['?', 'Cambodia', 'Canada', 'China', 'Columbia', 'Cuba',
'Dominican-Republic', 'Ecuador', 'El-Salvador', 'England',
'France', 'Germany', 'Greece', 'Guatemala', 'Haiti', 'Honduras',
'Hong', 'Hungary', 'India', 'Iran', 'Ireland', 'Italy', 'Jamaica',
'Japan', 'Laos', 'Mexico', 'Nicaragua',
'Outlying-US(Guam-USVI-etc)', 'Peru', 'Philippines', 'Poland',
'Portugal', 'Puerto-Rico', 'Scotland', 'South', 'Taiwan',
'Thailand', 'Trinadad&Tobago', 'United-States', 'Vietnam',
'Yugoslavia'], dtype=object)}
df_lime.dtypes
age float64
workclass int64
fnlwgt float64
education int64
education_num float64
marital int64
occupation int64
relationship int64
race int64
sex int64
capital_gain float64
capital_loss float64
hours_week float64
native_country int64
label object
dtype: object
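The encoding step above can be tried on a tiny frame to see how the column indices and group names relate. The sketch below is an illustration under assumptions: the three-row frame is made up, and the dictionary is keyed by column index rather than by feature name, because the lime documentation describes categorical_names as a map from column index to a list of names. That keying is worth checking against the version of lime you have installed.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Tiny made-up frame standing in for df_lime (an assumption for illustration).
df = pd.DataFrame({
    "age": [39.0, 50.0, 38.0],
    "workclass": ["State-gov", "Private", "Private"],
    "sex": ["Male", "Female", "Male"],
})
features = list(df.columns)
cate_features = ["workclass", "sex"]

# Position of each categorical column in the feature list.
categorical_features = [features.index(name) for name in cate_features]

# Encode each categorical column and remember its class names by column index.
categorical_names = {}
for name in cate_features:
    le = LabelEncoder()
    df[name] = le.fit_transform(df[name])
    categorical_names[features.index(name)] = list(le.classes_)

print(categorical_features)
print(categorical_names)
print(df.to_numpy())
```

LabelEncoder sorts the classes, so the integer code of a value is its position in the sorted list of names; the final array contains numbers only, which is the form a tabular explainer needs.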
Now that the dataset is ready, you can construct the train and test sets as shown in the Scikit-learn examples below. You transform the data outside of the pipeline in order to avoid errors with LIME. The training set passed to LimeTabularExplainer should be a numpy array without strings. With the method above, you already have a converted training dataset.
from sklearn.model_selection import train_test_split
X_train_lime, X_test_lime, y_train_lime, y_test_lime = train_test_split(df_lime[features],
df_lime.label,
test_size = 0.2,
random_state=0)
X_train_lime.head(5)
You can build the pipeline with the optimal parameters found for XGBoost.
model_xgb = make_pipeline(
preprocess,
xgboost.XGBClassifier(max_depth = 3,
gamma = 0.5,
n_estimators=600,
objective='binary:logistic',
silent=True,
nthread=1))
model_xgb.fit(X_train_lime, y_train_lime)
/Users/Thomas/anaconda3/envs/hello-tf/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py:351: FutureWarning: The handling of integer data will change in version 0.22. Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the unique values. If you want the future behavior and silence this warning, you can specify "categories='auto'."In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly. warnings.warn(msg, FutureWarning)
Pipeline(memory=None,
steps=[('columntransformer', ColumnTransformer(n_jobs=1, remainder='drop', transformer_weights=None,
transformers=[('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True), [0, 2, 10, 4, 11, 12]), ('onehotencoder', OneHotEncoder(categorical_features=None, categories=None,...
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
silent=True, subsample=1))])
You get a warning. The warning explains that you do not need to run a label encoder before the OneHotEncoder in the pipeline. If you do not want to use LIME, you can use the method from the first part of this Machine Learning with Scikit-learn tutorial. Otherwise, keep this method: first create a label-encoded dataset, then apply the one-hot encoder within the pipeline.
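For the case where LIME is not needed, the alternative mentioned in the warning can be sketched as follows: OneHotEncoder receives the string column directly, with no label-encoding step. The six-row frame and the choice of LogisticRegression are assumptions made so the example runs on its own.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows in the spirit of the census data (an assumption).
df = pd.DataFrame({
    "age": [25, 38, 52, 46, 29, 61],
    "workclass": ["Private", "State-gov", "Private",
                  "Self-emp-inc", "Private", "State-gov"],
    "label": ["<=50K", "<=50K", ">50K", ">50K", "<=50K", ">50K"],
})
X = df[["age", "workclass"]]

# Scale the numeric column and one-hot encode the string column directly.
preprocess = ColumnTransformer([
    ("scale", StandardScaler(), ["age"]),
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["workclass"]),
])
model = make_pipeline(preprocess, LogisticRegression())
model.fit(X, df["label"])

# One scaled column plus one column per workclass category.
encoded = model.named_steps["columntransformer"].transform(X)
print(encoded.shape)

# A category never seen in training is encoded as all zeros, not an error.
new_row = pd.DataFrame({"age": [40], "workclass": ["Never-worked"]})
prediction = model.predict(new_row)
print(prediction)
```

With handle_unknown="ignore", unseen categories at prediction time do not raise an exception, which is usually what you want once the pipeline is deployed.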
print("XGBoost pipeline score: %f" % model_xgb.score(X_test_lime, y_test_lime))
XGBoost pipeline score: 0.873157
model_xgb.predict_proba(X_test_lime)
array([[7.9646105e-01, 2.0353897e-01],
[9.5173013e-01, 4.8269872e-02],
[7.9344827e-01, 2.0655173e-01],
...,
[9.9031430e-01, 9.6856682e-03],
[6.4581633e-04, 9.9935418e-01],
[9.7104281e-01, 2.8957171e-02]], dtype=float32)
Before using LIME in action, let's build a table with the features of the wrongly classified rows. You can use it later to get an idea of what misled the classifier.
temp = pd.concat([X_test_lime, y_test_lime], axis= 1)
temp['predicted'] = model_xgb.predict(X_test_lime)
temp['wrong']= temp['label'] != temp['predicted']
temp = temp.query('wrong==True').drop('wrong', axis=1)
temp= temp.sort_values(by=['label'])
temp.shape
(826, 16)
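The same filtering logic can be checked on a handful of rows. The sketch below uses a made-up four-row frame (an assumption) and a boolean mask instead of query; it also adds a cross-tabulation, which shows in which direction the errors go.

```python
import pandas as pd

# Made-up predictions for four households (an assumption for illustration).
results = pd.DataFrame({
    "age": [40, 27, 58, 33],
    "label": ["<=50K", ">50K", "<=50K", ">50K"],
    "predicted": ["<=50K", "<=50K", ">50K", ">50K"],
})

# Keep only rows where the prediction differs from the true label.
wrong = results[results["label"] != results["predicted"]]
wrong = wrong.sort_values(by="label")
print(wrong)

# Count the errors by true label and predicted label.
print(pd.crosstab(wrong["label"], wrong["predicted"]))
```

The original row index is preserved by the filter, so each misclassified row can be traced back to the test set.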
You create a lambda function that returns the model's predicted probabilities for new data. You will need it soon.
predict_fn = lambda x: model_xgb.predict_proba(x).astype(float)
X_test_lime.dtypes
age float64
workclass int64
fnlwgt float64
education int64
education_num float64
marital int64
occupation int64
relationship int64
race int64
sex int64
capital_gain float64
capital_loss float64
hours_week float64
native_country int64
dtype: object
predict_fn(X_test_lime)
array([[7.96461046e-01, 2.03538969e-01],
[9.51730132e-01, 4.82698716e-02],
[7.93448269e-01, 2.06551731e-01],
...,
[9.90314305e-01, 9.68566816e-03],
[6.45816326e-04, 9.99354184e-01],
[9.71042812e-01, 2.89571714e-02]])
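The prediction function is the only link between the explainer and the model: it takes a 2D array of rows and returns one probability per class for each row. The sketch below spells out that contract with a named function instead of a lambda. The LogisticRegression model and the synthetic data are assumptions so that the example runs without XGBoost.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in classifier on synthetic data (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)


def predict_fn(rows):
    """Return class probabilities with shape (n_rows, n_classes)."""
    rows = np.asarray(rows, dtype=float)
    if rows.ndim == 1:
        # Accept a single row by turning it into a one-row matrix.
        rows = rows.reshape(1, -1)
    return clf.predict_proba(rows).astype(float)


proba = predict_fn(X[:3])
print(proba.shape)
print(proba.sum(axis=1))
```

Each row of the result sums to 1, and the column order follows the classes_ attribute of the classifier.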
You convert the pandas dataframes to numpy arrays.
X_train_lime = X_train_lime.values
X_test_lime = X_test_lime.values
X_test_lime
array([[4.00000e+01, 5.00000e+00, 1.93524e+05, ..., 0.00000e+00,
4.00000e+01, 3.80000e+01],
[2.70000e+01, 4.00000e+00, 2.16481e+05, ..., 0.00000e+00,
4.00000e+01, 3.80000e+01],
[2.50000e+01, 4.00000e+00, 2.56263e+05, ..., 0.00000e+00,
4.00000e+01, 3.80000e+01],
...,
[2.80000e+01, 6.00000e+00, 2.11032e+05, ..., 0.00000e+00,
4.00000e+01, 2.50000e+01],
[4.40000e+01, 4.00000e+00, 1.67005e+05, ..., 0.00000e+00,
6.00000e+01, 3.80000e+01],
[5.30000e+01, 4.00000e+00, 2.57940e+05, ..., 0.00000e+00,
4.00000e+01, 3.80000e+01]])
model_xgb.predict_proba(X_test_lime)
array([[7.9646105e-01, 2.0353897e-01],
[9.5173013e-01, 4.8269872e-02],
[7.9344827e-01, 2.0655173e-01],
...,
[9.9031430e-01, 9.6856682e-03],
[6.4581633e-04, 9.9935418e-01],
[9.7104281e-01, 2.8957171e-02]], dtype=float32)
print(features,
class_names,
categorical_features,
categorical_names)
['age', 'workclass', 'fnlwgt', 'education', 'education_num', 'marital', 'occupation', 'relationship', 'race', 'sex', 'capital_gain', 'capital_loss', 'hours_week', 'native_country']
['<=50K' '>50K']
[1, 3, 5, 6, 7, 8, 9, 13]
{'workclass': array(['?', 'Federal-gov', 'Local-gov', 'Never-worked', 'Private',
'Self-emp-inc', 'Self-emp-not-inc', 'State-gov', 'Without-pay'],
dtype=object), 'education': array(['10th', '11th', '12th', '1st-4th', '5th-6th', '7th-8th', '9th',
'Assoc-acdm', 'Assoc-voc', 'Bachelors', 'Doctorate', 'HS-grad',
'Masters', 'Preschool', 'Prof-school', 'Some-college'],
dtype=object), 'marital': array(['Divorced', 'Married-AF-spouse', 'Married-civ-spouse',
'Married-spouse-absent', 'Never-married', 'Separated', 'Widowed'],
dtype=object), 'occupation': array(['?', 'Adm-clerical', 'Armed-Forces', 'Craft-repair',
'Exec-managerial', 'Farming-fishing', 'Handlers-cleaners',
'Machine-op-inspct', 'Other-service', 'Priv-house-serv',
'Prof-specialty', 'Protective-serv', 'Sales', 'Tech-support',
'Transport-moving'], dtype=object), 'relationship': array(['Husband', 'Not-in-family', 'Other-relative', 'Own-child',
'Unmarried', 'Wife'], dtype=object), 'race': array(['Amer-Indian-Eskimo', 'Asian-Pac-Islander', 'Black', 'Other',
'White'], dtype=object), 'sex': array(['Female', 'Male'], dtype=object), 'native_country': array(['?', 'Cambodia', 'Canada', 'China', 'Columbia', 'Cuba',
'Dominican-Republic', 'Ecuador', 'El-Salvador', 'England',
'France', 'Germany', 'Greece', 'Guatemala', 'Haiti', 'Honduras',
'Hong', 'Hungary', 'India', 'Iran', 'Ireland', 'Italy', 'Jamaica',
'Japan', 'Laos', 'Mexico', 'Nicaragua',
'Outlying-US(Guam-USVI-etc)', 'Peru', 'Philippines', 'Poland',
'Portugal', 'Puerto-Rico', 'Scotland', 'South', 'Taiwan',
'Thailand', 'Trinadad&Tobago', 'United-States', 'Vietnam',
'Yugoslavia'], dtype=object)}
import lime
import lime.lime_tabular
### Train should be label encoded not one hot encoded
explainer = lime.lime_tabular.LimeTabularExplainer(X_train_lime ,
feature_names = features,
class_names=class_names,
categorical_features=categorical_features,
categorical_names=categorical_names,
kernel_width=3)
Let's choose a random household from the test set and see the model's prediction and how the computer made its choice.
import numpy as np
np.random.seed(1)
i = 100
print(y_test_lime.iloc[i])
>50K
X_test_lime[i]
array([4.20000e+01, 4.00000e+00, 1.76286e+05, 7.00000e+00, 1.20000e+01,
2.00000e+00, 4.00000e+00, 0.00000e+00, 4.00000e+00, 1.00000e+00,
0.00000e+00, 0.00000e+00, 4.00000e+01, 3.80000e+01])
You can use the explainer with explain_instance to check the explanation behind the model's prediction.
exp = explainer.explain_instance(X_test_lime[i], predict_fn, num_features=6)
exp.show_in_notebook(show_all=False)
We can see that the classifier predicted the household correctly. The income is, indeed, above 50k.
The first thing we can say is that the classifier is not that sure about the predicted probabilities. The machine predicts that the household has an income over 50k with a probability of 64%. This 64% is driven mainly by capital gain and marital status. Blue bars contribute negatively to the positive class and orange bars contribute positively.
The classifier is hesitant because the capital gain of this household is zero, while capital gain is usually a good predictor of wealth. Besides, the household works no more than 40 hours per week. Age, occupation, and sex contribute positively to the prediction.
If the marital status were single, the classifier would have predicted an income below 50k (0.64 - 0.18 = 0.46).
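The arithmetic in the last sentence can be written out as a short check. The two numbers come from the explanation above; the 0.5 decision threshold is an assumption (the usual default for a binary classifier), and the subtraction is only approximate, because LIME weights describe a local linear fit rather than the model itself.

```python
# Values read from the LIME explanation discussed above.
predicted_proba = 0.64       # probability of the ">50K" class
marital_contribution = 0.18  # local weight attributed to marital status

# Assumed decision threshold for a binary classifier.
threshold = 0.5

# Remove the marital contribution and see which side of the threshold we land on.
without_marital = predicted_proba - marital_contribution
label = ">50K" if without_marital > threshold else "<=50K"
print("%.2f -> %s" % (without_marital, label))  # prints "0.46 -> <=50K"
```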
We can try another household that was wrongly classified.
temp.head(3)
temp.iloc[1,:-2]
age 58
workclass 4
fnlwgt 68624
education 11
education_num 9
marital 2
occupation 4
relationship 0
race 4
sex 1
capital_gain 0
capital_loss 0
hours_week 45
native_country 38
Name: 20931, dtype: object
i = 1
print('This observation is', temp.iloc[i,-2:])
This observation is label <=50K
predicted >50K
Name: 20931, dtype: object
# Convert the pandas row to a numeric numpy array before passing it to LIME
exp = explainer.explain_instance(temp.iloc[1,:-2].values.astype(float), predict_fn, num_features=6)
exp.show_in_notebook(show_all=False)
The classifier predicted an income above 50k, but the true label is below 50k. This household seems odd. It has neither a capital gain nor a capital loss. The row printed above shows a 58-year-old with marital code 2 (Married-civ-spouse in the encoding above) and an education_num of 9. According to the overall pattern, this household should have an income below 50k, which is indeed its true label.
Try to play around with LIME. You will notice gross mistakes from the classifier.
You can check the GitHub repository of the library's owner. It provides extra documentation for image and text classification.
Summary
Below is a list of some useful commands with scikit-learn version >=0.20
| Objective | Function |
| --- | --- |
| Create train/test dataset | train_test_split |
| Build a pipeline | |
| Select the column and apply the transformation | make_column_transformer |
| Type of transformation | |
| Standardize | StandardScaler |
| Min-max | MinMaxScaler |
| Normalize | Normalizer |
| Impute missing values | SimpleImputer |
| Convert categorical | OneHotEncoder |
| Fit and transform the data | fit_transform |
| Make the pipeline | make_pipeline |
| Basic model | |
| Logistic regression | LogisticRegression |
| XGBoost | XGBClassifier |
| Neural net | MLPClassifier |
| Grid search | GridSearchCV |
| Randomized search | RandomizedSearchCV |
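The three scaling transformers in the table are easy to confuse, so here is a minimal side-by-side sketch on a made-up 3x2 array (an assumption for illustration). StandardScaler and MinMaxScaler work column by column, while Normalizer rescales each row.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler

# Small made-up matrix: 3 rows, 2 columns.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Column-wise: zero mean and unit variance per column.
standardized = StandardScaler().fit_transform(X)

# Column-wise: map each column's minimum to 0 and maximum to 1.
min_max = MinMaxScaler().fit_transform(X)

# Row-wise: divide each row by its Euclidean (L2) norm.
normalized = Normalizer().fit_transform(X)

print(standardized)
print(min_max)
print(normalized)
```

Because Normalizer looks at rows rather than columns, it does not learn anything from the training set; calling fit on it is a no-op kept for pipeline compatibility.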


