Scikit-Learn Tutorial: How to Install & Scikit-Learn Examples

What is Scikit-learn?

Scikit-learn is an open-source Python library for machine learning. It implements widely used algorithms such as KNN, random forest, and SVM, and gradient-boosting libraries such as XGBoost plug into it through a scikit-learn-compatible interface. It is built on top of NumPy. Scikit-learn is widely used in Kaggle competitions as well as at prominent tech companies. It helps with preprocessing, dimensionality reduction (feature selection), classification, regression, clustering, and model selection.

Scikit-learn is known for thorough documentation. It provides an interactive chart for choosing an estimator at https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html.

How Scikit-learn Works

Scikit-learn is not very difficult to use and provides excellent results. However, scikit-learn does not support GPU computation; parallelism is limited to multiple CPU cores through the n_jobs argument that some estimators and utilities accept. It is possible to run a deep learning algorithm with it, but it is not an optimal solution, especially if you know how to use TensorFlow.

How to Download and Install Scikit-learn

Now in this Python Scikit-learn tutorial, we will learn how to download and install Scikit-learn:

Option 1: AWS

scikit-learn can be used on AWS. Please refer to the Docker image that has scikit-learn preinstalled.

To use the developer version, run the following command in Jupyter:

import sys
!{sys.executable} -m pip install git+https://github.com/scikit-learn/scikit-learn.git

Option 2: Mac or Windows using Anaconda

To learn about the Anaconda installation, refer to https://www.guru99.com/download-install-tensorflow.html

Recently, the developers of scikit-learn released a development version that tackles common problems faced with the current version. We found it more convenient to use the developer version instead of the current version.

How to Install scikit-learn with a Conda Environment

If you installed scikit-learn with the conda environment, follow these steps to update to version 0.20

Step 1) Activate the tensorflow environment

source activate hello-tf

Step 2) Remove scikit-learn using the conda command

conda remove scikit-learn

Step 3) Install the developer version.
Install the scikit-learn developer version along with the necessary libraries.

conda install -c anaconda git
pip install Cython
pip install h5py
pip install git+https://github.com/scikit-learn/scikit-learn.git

NOTE: Windows users will need to install Microsoft Visual C++ 14.
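Once the installation finishes, a quick check can confirm which release the interpreter actually picked up. This is only a minimal sketch; the printed version depends on your environment.

```python
import sklearn

# Print the installed release so it can be compared with the version the tutorial expects
print("scikit-learn version:", sklearn.__version__)
```

If the version printed is not the one you just installed, the notebook kernel is probably using a different environment than the one you activated.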

Scikit-Learn Example with Machine Learning

This Scikit tutorial is divided into two parts:

  1. Machine learning with scikit-learn
  2. How to trust your model with LIME

The first part details how to build a pipeline, create a model, and tune the hyperparameters, while the second part presents the state of the art in terms of model selection.

Step 1) Import the data

During this Scikit-learn tutorial, you will be using the adult dataset.

If you are interested in the background of this dataset or want to know more about its descriptive statistics, use the Dive and Overview tools.

Refer to this tutorial to learn more about Dive and Overview.

You import the dataset with Pandas. Note that you need to convert the type of the continuous variables to float format.

This dataset includes eight categorical variables:

The categorical variables are listed in CATE_FEATURES

  • workclass
  • education
  • marital
  • occupation
  • relationship
  • race
  • sex
  • native_country

Moreover, there are six continuous variables:

The continuous variables are listed in CONTI_FEATURES

  • age
  • fnlwgt
  • education_num
  • capital_gain
  • capital_loss
  • hours_week

Note that we fill the lists by hand so that you have a better idea of which columns we are using. A faster way to construct a list of categorical or continuous columns is to use:

## List Categorical
CATE_FEATURES = df_train.iloc[:,:-1].select_dtypes('object').columns
print(CATE_FEATURES)

## List continuous
CONTI_FEATURES = df_train.select_dtypes('number').columns
print(CONTI_FEATURES)

Here is the code to import the data:

# Import dataset
import pandas as pd

## Define path data
COLUMNS = ['age','workclass', 'fnlwgt', 'education', 'education_num', 'marital',
           'occupation', 'relationship', 'race', 'sex', 'capital_gain', 'capital_loss',
           'hours_week', 'native_country', 'label']
### Define continuous list
CONTI_FEATURES  = ['age', 'fnlwgt','capital_gain', 'education_num', 'capital_loss', 'hours_week']
### Define categorical list
CATE_FEATURES = ['workclass', 'education', 'marital', 'occupation', 'relationship', 'race', 'sex', 'native_country']

## Prepare the data
features = ['age','workclass', 'fnlwgt', 'education', 'education_num', 'marital',
           'occupation', 'relationship', 'race', 'sex', 'capital_gain', 'capital_loss',
           'hours_week', 'native_country']

PATH = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"

df_train = pd.read_csv(PATH, skipinitialspace=True, names = COLUMNS, index_col=False)
df_train[CONTI_FEATURES] =df_train[CONTI_FEATURES].astype('float64')
df_train.describe()
                age        fnlwgt  education_num  capital_gain  capital_loss    hours_week
count  32561.000000  3.256100e+04   32561.000000  32561.000000  32561.000000  32561.000000
mean      38.581647  1.897784e+05      10.080679   1077.648844     87.303830     40.437456
std       13.640433  1.055500e+05       2.572720   7385.292085    402.960219     12.347429
min       17.000000  1.228500e+04       1.000000      0.000000      0.000000      1.000000
25%       28.000000  1.178270e+05       9.000000      0.000000      0.000000     40.000000
50%       37.000000  1.783560e+05      10.000000      0.000000      0.000000     40.000000
75%       48.000000  2.370510e+05      12.000000      0.000000      0.000000     45.000000
max       90.000000  1.484705e+06      16.000000  99999.000000   4356.000000     99.000000

You can check the count of unique values of the native_country feature. Only one row comes from Holand-Netherlands. This row carries no usable information and can raise an error during training: a category with a single row lands in either the training set or the test set, and an encoder fitted on the training set fails on a category it has never seen.

df_train.native_country.value_counts()
United-States                 29170
Mexico                          643
?                               583
Philippines                     198
Germany                         137
Canada                          121
Puerto-Rico                     114
El-Salvador                     106
India                           100
Cuba                             95
England                          90
Jamaica                          81
South                            80
China                            75
Italy                            73
Dominican-Republic               70
Vietnam                          67
Guatemala                        64
Japan                            62
Poland                           60
Columbia                         59
Taiwan                           51
Haiti                            44
Iran                             43
Portugal                         37
Nicaragua                        34
Peru                             31
France                           29
Greece                           29
Ecuador                          28
Ireland                          24
Hong                             20
Cambodia                         19
Trinadad&Tobago                  19
Thailand                         18
Laos                             18
Yugoslavia                       16
Outlying-US(Guam-USVI-etc)       14
Honduras                         13
Hungary                          13
Scotland                         12
Holand-Netherlands                1
Name: native_country, dtype: int64

You can exclude this uninformative row from the dataset

## Drop Netherland, because only one row
df_train = df_train[df_train.native_country != "Holand-Netherlands"]
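The same idea can be generalized so that no category name has to be typed by hand. The sketch below is an assumption-laden illustration: df_toy stands in for df_train, and MIN_ROWS is a hypothetical threshold that does not appear in the tutorial.

```python
import pandas as pd

# Hypothetical toy frame standing in for df_train
df_toy = pd.DataFrame({
    "native_country": ["United-States"] * 4 + ["Mexico"] * 2 + ["Holand-Netherlands"]
})

MIN_ROWS = 2  # assumed threshold: keep categories with at least this many rows

# Count rows per category, then collect the categories that are too rare
counts = df_toy["native_country"].value_counts()
rare_categories = counts[counts < MIN_ROWS].index

# Keep only the rows whose category is not in the rare list
df_filtered = df_toy[~df_toy["native_country"].isin(rare_categories)]
print(df_filtered["native_country"].value_counts())
```

With a threshold of 2, only the single Holand-Netherlands row is dropped, which matches the manual filter above.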

Next, you store the positions of the continuous features in a list. You will need it in the next step to build the pipeline.

The code below loops over all column names in CONTI_FEATURES, gets the position of each one (i.e., its column number), and appends it to a list called conti_features

## Get the column index of the continuous features
conti_features = []
for i in CONTI_FEATURES:
    position = df_train.columns.get_loc(i)
    conti_features.append(position)
print(conti_features)  
[0, 2, 10, 4, 11, 12]

The code below does the same job as above, but for the categorical variables.

## Get the column index of the categorical features
categorical_features = []
for i in CATE_FEATURES:
    position = df_train.columns.get_loc(i)
    categorical_features.append(position)
print(categorical_features)  
[1, 3, 5, 6, 7, 8, 9, 13]
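Both loops can also be written as a list comprehension. The following is a minimal sketch on a shortened, hypothetical column index (toy_columns and toy_conti are illustration names), using the same get_loc call as the loops above.

```python
import pandas as pd

# Hypothetical, shortened stand-ins for df_train.columns and CONTI_FEATURES
toy_columns = pd.Index(["age", "workclass", "fnlwgt", "education", "education_num"])
toy_conti = ["age", "fnlwgt", "education_num"]

# One position per requested column name, in the order of toy_conti
positions = [toy_columns.get_loc(name) for name in toy_conti]
print(positions)
```

The result keeps the order of the name list, not the order of the dataframe columns, which is why the tutorial output for conti_features is not sorted.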

You can have a look at the dataset. Note that each categorical feature is a string. You cannot feed a model with a string value. You need to transform the dataset using dummy variables.

df_train.head(5)

In fact, you need to create one column for each group in the feature. First, you can run the code below to compute the total number of columns needed.

print(df_train[CATE_FEATURES].nunique(),
      'There are',sum(df_train[CATE_FEATURES].nunique()), 'groups in the whole dataset')
workclass          9
education         16
marital            7
occupation        15
relationship       6
race               5
sex                2
native_country    41
dtype: int64 There are 101 groups in the whole dataset

The whole dataset contains 101 groups, as shown above. For instance, the workclass feature has nine groups. You can display the names of the groups with the following code

unique() returns the unique values of the categorical features.

for i in CATE_FEATURES:
    print(df_train[i].unique())
['State-gov' 'Self-emp-not-inc' 'Private' 'Federal-gov' 'Local-gov' '?'
 'Self-emp-inc' 'Without-pay' 'Never-worked']
['Bachelors' 'HS-grad' '11th' 'Masters' '9th' 'Some-college' 'Assoc-acdm'
 'Assoc-voc' '7th-8th' 'Doctorate' 'Prof-school' '5th-6th' '10th'
 '1st-4th' 'Preschool' '12th']
['Never-married' 'Married-civ-spouse' 'Divorced' 'Married-spouse-absent'
 'Separated' 'Married-AF-spouse' 'Widowed']
['Adm-clerical' 'Exec-managerial' 'Handlers-cleaners' 'Prof-specialty'
 'Other-service' 'Sales' 'Craft-repair' 'Transport-moving'
 'Farming-fishing' 'Machine-op-inspct' 'Tech-support' '?'
 'Protective-serv' 'Armed-Forces' 'Priv-house-serv']
['Not-in-family' 'Husband' 'Wife' 'Own-child' 'Unmarried' 'Other-relative']
['White' 'Black' 'Asian-Pac-Islander' 'Amer-Indian-Eskimo' 'Other']
['Male' 'Female']
['United-States' 'Cuba' 'Jamaica' 'India' '?' 'Mexico' 'South'
 'Puerto-Rico' 'Honduras' 'England' 'Canada' 'Germany' 'Iran'
 'Philippines' 'Italy' 'Poland' 'Columbia' 'Cambodia' 'Thailand' 'Ecuador'
 'Laos' 'Taiwan' 'Haiti' 'Portugal' 'Dominican-Republic' 'El-Salvador'
 'France' 'Guatemala' 'China' 'Japan' 'Yugoslavia' 'Peru'
 'Outlying-US(Guam-USVI-etc)' 'Scotland' 'Trinadad&Tobago' 'Greece'
 'Nicaragua' 'Vietnam' 'Hong' 'Ireland' 'Hungary']

Therefore, the training dataset will contain 101 + 6 = 107 columns: 101 one-hot columns plus the six continuous features.

Scikit-learn can take care of the conversion. It is done in two steps:

  • First, you need to convert the string to an ID. For instance, State-gov will have the ID 1, Self-emp-not-inc the ID 2, and so on. The LabelEncoder class does this for you
  • Second, you transpose each ID into a new column. As mentioned before, the dataset has 101 group IDs. Therefore, there will be 101 columns capturing all the groups of the categorical features. Scikit-learn has a class called OneHotEncoder that carries out this operation
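The two steps can be sketched on a handful of workclass values. This is only an illustration on toy data; note that LabelEncoder actually numbers the groups from 0 in sorted order, so the IDs differ from the ones used in the explanation above.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Toy sample of the workclass column
workclass = np.array(["State-gov", "Private", "State-gov", "Self-emp-inc"])

# Step 1: string -> integer ID (IDs start at 0 and follow the sorted group names)
label_encoder = LabelEncoder()
ids = label_encoder.fit_transform(workclass)

# Step 2: one column per ID; toarray() turns the sparse result into a dense array
one_hot = OneHotEncoder().fit_transform(ids.reshape(-1, 1)).toarray()

print(label_encoder.classes_)
print(ids)
print(one_hot)
```

Each row of one_hot contains a single 1 in the column of its group, so three groups produce three columns.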

Step 2) Create the train/test set

Now that the dataset is ready, we can split it 80/20.

80 percent goes to the training set and 20 percent to the test set.

You can use train_test_split. The first argument is the dataframe of features and the second argument is the label column. You can specify the size of the test set with test_size.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df_train[features],
                                                    df_train.label,
                                                    test_size = 0.2,
                                                    random_state=0)
X_train.head(5)
print(X_train.shape, X_test.shape)
(26048, 14) (6512, 14)

Step 3) Build the pipeline

The pipeline makes it easier to feed the model with consistent data.

The idea behind it is to put the raw data into a "pipeline" to perform operations.

For instance, with the current dataset, you need to standardize the continuous variables and convert the categorical data. Note that you can perform any operation inside the pipeline. For instance, if you have NAs in the dataset, you can replace them with the mean or the median. You can also create new variables.

You have a choice: hard-code the two processes or create a pipeline. The first choice can lead to data leakage and create inconsistencies over time. A better option is to use the pipeline.
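To illustrate the missing-value case, here is a minimal sketch that chains a median imputer and a scaler. SimpleImputer is not used elsewhere in this tutorial, and the toy arrays are made up; the point is that the statistics are learned from the training data only and then reused on the test data, which is how a pipeline avoids leakage.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data with missing values (hypothetical, not the adult dataset)
X_train_toy = np.array([[1.0], [np.nan], [3.0], [5.0]])
X_test_toy = np.array([[np.nan], [3.0]])

# Impute with the median, then standardize
numeric_steps = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())

# fit() learns the median and the scaling statistics from the training data only
numeric_steps.fit(X_train_toy)

# transform() reuses those training statistics on the test data
X_test_clean = numeric_steps.transform(X_test_toy)
print(X_test_clean)
```

The missing test value is replaced with the training median (3.0), which is also the training mean after imputation, so both test rows are mapped to zero.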

from sklearn.preprocessing import StandardScaler, OneHotEncoder, LabelEncoder
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression

The pipeline will perform two operations before feeding the logistic classifier:

  1. Standardize the continuous variables: StandardScaler()
  2. Convert the categorical features: OneHotEncoder(sparse=False), where the argument is named sparse_output from scikit-learn 1.2 onward

You can perform the two steps using make_column_transformer. This function is not available in scikit-learn 0.19, the stable release at the time of writing, which cannot run the label encoder and the one-hot encoder inside a pipeline. That is one reason we decided to use the developer version.

make_column_transformer is easy to use. You need to define which columns to apply the transformation to and which transformation to apply. For instance, to standardize the continuous features, you can do:

  • Pair conti_features with StandardScaler() inside make_column_transformer.
    • conti_features: list with the positions of the continuous variables
    • StandardScaler: standardizes the variables
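What the standardization does to a single column can be sketched in plain Python. This is an illustration of the formula z = (x - mean) / std with the population standard deviation, which is what StandardScaler uses; the function name and the sample values are made up.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Center the values on their mean and scale by the population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


ages = [20.0, 30.0, 40.0]
z = standardize(ages)
print(z)
```

After the transformation the column has mean 0 and standard deviation 1, so features measured on very different scales, such as age and fnlwgt, contribute comparably to the model.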

The OneHotEncoder object inside make_column_transformer automatically encodes the labels.

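preprocess = make_column_transformer(
    # Recent scikit-learn releases expect (transformer, columns) pairs
    (StandardScaler(), conti_features),
    # Columns are given by integer position here, not by name;
    # on scikit-learn older than 1.2, use sparse=False instead
    (OneHotEncoder(sparse_output=False), categorical_features)
)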

You can test whether the pipeline works with fit_transform. The dataset should have the following shape: 26048, 107

preprocess.fit_transform(X_train).shape
(26048, 107)

The data transformer is ready to use. You can create the pipeline with make_pipeline. Once the data are transformed, you can feed the logistic regression.

model = make_pipeline(
    preprocess,
    LogisticRegression())

Training a model with scikit-learn is trivial. You call the fit method on the pipeline object, i.e., model. You can print the accuracy with the score method

model.fit(X_train, y_train)
print("logistic regression score: %f" % model.score(X_test, y_test))
logistic regression score: 0.850891

Finally, you can predict the class probabilities with predict_proba. It returns the probability for each class. Note that each row sums to one.

model.predict_proba(X_test)
array([[0.83576663, 0.16423337],
       [0.94582765, 0.05417235],
       [0.64760587, 0.35239413],
       ...,
       [0.99639252, 0.00360748],
       [0.02072181, 0.97927819],
       [0.56781353, 0.43218647]])
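To go from probabilities back to class labels, you can pick the most probable column for each row. The sketch below uses a tiny made-up dataset and a plain LogisticRegression rather than the tutorial pipeline; the column order of predict_proba follows the classes_ attribute.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy one-feature problem with the two labels of the adult dataset
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array(["<=50K", "<=50K", ">50K", ">50K"])

clf = LogisticRegression().fit(X_toy, y_toy)
proba = clf.predict_proba(X_toy)

# The column with the highest probability gives the predicted class
labels_from_proba = clf.classes_[np.argmax(proba, axis=1)]

print(clf.classes_)
print(proba.sum(axis=1))
print(labels_from_proba)
```

This matches what predict returns, and it makes it easy to apply a custom decision threshold to the second column if the default of 0.5 is not suitable.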

Step 4) Using our pipeline in a grid search

Tuning the hyperparameters (variables that determine the structure of the model, such as the hidden units of a network) can be tedious and exhausting.

One way to evaluate the model could be to change the size of the training set and evaluate the performance.

You can repeat this method ten times to see the score metrics. However, it is too much work.

Instead, scikit-learn provides a function to carry out parameter tuning and cross-validation.

Cross-validation

Cross-validation means that during training, the training set is split into n folds and the model is then trained and evaluated n times. For instance, if cv is set to 10, the model is trained and evaluated ten times. In each round, nine folds are used to train the model and the remaining fold is held out for evaluation, so every fold serves as the evaluation set exactly once.
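The fold rotation can be sketched in plain Python. This is a simplified illustration without shuffling; k_fold_indices is a hypothetical helper, not a scikit-learn function, and for classifiers scikit-learn additionally stratifies the folds by label when cv is an integer.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, valid_idx) pairs so that each fold validates exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so the sizes differ by at most one
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        valid_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, valid_idx
        start += size


splits = list(k_fold_indices(n_samples=10, n_folds=5))
for train_idx, valid_idx in splits:
    print("train:", train_idx, "validate:", valid_idx)
```

With ten samples and five folds, each round trains on eight samples and validates on the two that were left out.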

Grid search

Each classifier has hyperparameters to tune. You can try different values, or you can set a parameter grid. If you go to the official scikit-learn website, you can see that the logistic classifier has different parameters to tune. To make the training faster, you choose to tune the C parameter. It controls the regularization strength and should be positive. A small value gives more weight to the regularizer.

You can use the GridSearchCV object. You need to create a dictionary containing the hyperparameters to tune.

You list the hyperparameters followed by the values you want to try. For instance, to tune the C parameter, you use:

  • 'logisticregression__C': [0.001, 0.01, 0.1, 1.0]: The parameter is preceded by the name of the classifier, in lower case, and two underscores.
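The prefix comes from the step names that make_pipeline generates. The sketch below builds a small stand-in pipeline (toy_model is an illustration name, not the tutorial model) and lists the step names and a few of the tunable keys.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in pipeline; make_pipeline names each step after its lowercased class name
toy_model = make_pipeline(StandardScaler(), LogisticRegression())

step_names = [name for name, _ in toy_model.steps]
print(step_names)

# Every tunable parameter is exposed as <step name>__<parameter name>
tunable = sorted(
    key for key in toy_model.get_params() if key.startswith("logisticregression__")
)
print(tunable[:5])
```

Any key printed this way can be used in the parameter grid, so get_params() is a convenient way to check the exact spelling before launching a long search.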

The model will try four different values: 0.001, 0.01, 0.1 and 1.

You train the model using 10 folds: cv=10

from sklearn.model_selection import GridSearchCV
# Construct the parameter grid
param_grid = {
    'logisticregression__C': [0.001, 0.01,0.1, 1.0],
    }

You can train the model using GridSearchCV with the parameters param_grid and cv.

# Train the model
grid_clf = GridSearchCV(model,
                        param_grid,
                        cv=10)  # the iid argument was removed in scikit-learn 0.24
grid_clf.fit(X_train, y_train)

OUTPUT

GridSearchCV(cv=10, error_score='raise-deprecating',
       estimator=Pipeline(memory=None,
     steps=[('columntransformer', ColumnTransformer(n_jobs=1, remainder='drop', transformer_weights=None,
         transformers=[('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True), [0, 2, 10, 4, 11, 12]), ('onehotencoder', OneHotEncoder(categorical_features=None, categories=None,...ty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False))]),
       fit_params=None, iid=False, n_jobs=1,
       param_grid={'logisticregression__C': [0.001, 0.01, 0.1, 1.0]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=0)

To access the best parameters, you use best_params_

grid_clf.best_params_

OUTPUT

{'logisticregression__C': 1.0}

After training the model with four different regularization values, the optimal parameter is C = 1.0. You can score the best estimator on the test set:

print("best logistic regression from grid search: %f" % grid_clf.best_estimator_.score(X_test, y_test))

best logistic regression from grid search: 0.850891

To access the predicted probabilities:

grid_clf.best_estimator_.predict_proba(X_test)
array([[0.83576677, 0.16423323],
       [0.9458291 , 0.0541709 ],
       [0.64760416, 0.35239584],
       ...,
       [0.99639224, 0.00360776],
       [0.02072033, 0.97927967],
       [0.56782222, 0.43217778]])
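Besides the best parameters, the fitted search object keeps the cross-validated score of every candidate in cv_results_. The following sketch runs a small search on synthetic data from make_classification (an assumption for illustration, not the adult dataset) and prints one line per candidate.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data
X_toy, y_toy = make_classification(n_samples=200, n_features=5, random_state=0)

toy_grid = GridSearchCV(
    make_pipeline(StandardScaler(), LogisticRegression()),
    {"logisticregression__C": [0.01, 1.0]},
    cv=5,
)
toy_grid.fit(X_toy, y_toy)

# One entry per candidate: the parameter dict and its mean validation score
results = toy_grid.cv_results_
for params, mean_score in zip(results["params"], results["mean_test_score"]):
    print(params, round(float(mean_score), 3))

print("best:", toy_grid.best_params_)
```

Looking at all the scores, rather than only the winner, shows whether the candidates differ meaningfully or whether the choice of C barely matters for the data at hand.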

XGBoost Model with scikit-learn

Let's try Scikit-learn examples to train one of the best classifiers on the market. XGBoost is a gradient-boosted tree ensemble that often improves on random forest. The theoretical background of the classifier is out of the scope of this Python Scikit tutorial. Keep in mind that XGBoost has won lots of Kaggle competitions. With an average dataset size, it can perform as well as a deep learning algorithm or even better.

The classifier is challenging to train because it has a high number of parameters to tune. You can, of course, use GridSearchCV to choose the parameters for you.

Instead, let's see how to use a better way to find the optimal parameters. GridSearchCV can be tedious and very slow to train if you pass many values. The search space grows along with the number of parameters. A preferable solution is to use RandomizedSearchCV. This method draws a random combination of hyperparameter values at each iteration. For instance, if the search runs for 1000 iterations, then 1000 combinations are evaluated. Otherwise, it works more or less like GridSearchCV.
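The difference between the two strategies can be sketched in plain Python. This is a conceptual illustration only, not the RandomizedSearchCV implementation: the search space is hypothetical (gamma and max_depth appear later in this tutorial, subsample is added for illustration), and n_iter plays the role of the iteration budget.

```python
import itertools
import random

# Hypothetical search space
search_space = {
    "gamma": [0.5, 1, 1.5, 2, 5],
    "max_depth": [3, 4, 5],
    "subsample": [0.6, 0.8, 1.0],
}

# Grid search evaluates every combination: 5 * 3 * 3 = 45 candidates
all_combinations = [
    dict(zip(search_space, values))
    for values in itertools.product(*search_space.values())
]

# Random search evaluates only n_iter combinations drawn without replacement
n_iter = 10
rng = random.Random(0)  # fixed seed so the draw is reproducible
sampled = rng.sample(all_combinations, n_iter)

print(len(all_combinations), "grid candidates")
print(len(sampled), "random candidates")
```

The cost of the random search is fixed by n_iter regardless of how many hyperparameters or values are added, whereas the grid grows multiplicatively.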

You need to import xgboost. If the library is not installed, run pip3 install xgboost in a terminal or, in a Jupyter environment, use:

import sys
!{sys.executable} -m pip install xgboost
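If you are not sure whether a package is already available in the current environment, a small check such as the following can help before you run the install command. This is only a sketch; the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the package can be found in the current environment."""
    return importlib.util.find_spec(package_name) is not None


for name in ("xgboost", "sklearn"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```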

Next, import the modules you need:

import xgboost
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import StratifiedKFold

The next step in this Scikit-learn Python tutorial is to specify the parameters to tune. You can refer to the official documentation to see all the tunable parameters. For the sake of this tutorial, you choose only two hyperparameters with two values each. XGBoost takes a long time to train: the more hyperparameters in the grid, the longer you need to wait.

params = {
        'xgbclassifier__gamma': [0.5, 1],
        'xgbclassifier__max_depth': [3, 4]
        }
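The keys in params carry the prefix xgbclassifier__ because make_pipeline names each step after the lowercased class name, and a double underscore separates the step name from the parameter. The sketch below shows the same convention with a LogisticRegression step, used here only as a lightweight stand-in for the XGBoost classifier.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), LogisticRegression())

# Step names are generated automatically from the class names.
step_names = list(pipe.named_steps)
print(step_names)

# Every tunable parameter of a step is exposed as <step name>__<parameter>.
tunable = sorted(p for p in pipe.get_params() if p.startswith("logisticregression__"))
print(tunable[:3])
```

Calling get_params() on your own pipeline is a quick way to check the exact key names before you write the search dictionary.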

You construct a new pipeline with the XGBoost classifier. You choose to define 600 estimators. Note that n_estimators is a parameter you can tune; a high value can lead to overfitting. You can try different values yourself, but be aware that it can take hours. You use the default values for the other parameters.

model_xgb = make_pipeline(
    preprocess,
    xgboost.XGBClassifier(
                          n_estimators=600,
                          objective='binary:logistic',
                          silent=True,
                          nthread=1)
)
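To see how the number of estimators affects training and test accuracy, you can track the score after each boosting stage. The sketch below uses scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost (it is a different implementation of gradient boosting) and a synthetic, deliberately noisy dataset; both are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data with 20% label noise, so that overfitting can show up.
X, y = make_classification(n_samples=400, n_features=10, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
gbm.fit(X_tr, y_tr)

# staged_predict yields the predictions after 1, 2, ..., n_estimators trees.
train_acc = [accuracy_score(y_tr, pred) for pred in gbm.staged_predict(X_tr)]
test_acc = [accuracy_score(y_te, pred) for pred in gbm.staged_predict(X_te)]

for n_trees in (10, 50, 100):
    print(n_trees, round(train_acc[n_trees - 1], 3), round(test_acc[n_trees - 1], 3))
```

A training score that keeps rising while the test score stalls or drops is the usual sign that more estimators are only fitting noise.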

You can improve the cross-validation with the Stratified K-Folds cross-validator. You construct only three folds here, which speeds up the computation at the cost of a less reliable estimate. Increase this value to 5 or 10 at home to improve the results.

You choose to train the model over four iterations.

skf = StratifiedKFold(n_splits=3,
                      shuffle = True,
                      random_state = 1001)

random_search = RandomizedSearchCV(model_xgb,
                                   param_distributions=params,
                                   n_iter=4,
                                   scoring='accuracy',
                                   n_jobs=4,
                                   cv=skf.split(X_train, y_train),
                                   verbose=3,
                                   random_state=1001)
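Stratification means that every fold keeps roughly the same class proportions as the full dataset. The following sketch checks this on hypothetical imbalanced labels (90 negatives and 30 positives), with the same splitter settings as above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels: 90 negatives, 30 positives.
y = np.array([0] * 90 + [1] * 30)
X = np.zeros((len(y), 1))  # the feature values do not matter for the split

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=1001)
fold_positive_counts = []
for train_idx, test_idx in skf.split(X, y):
    # Each test fold should hold one third of each class.
    fold_positive_counts.append(int(y[test_idx].sum()))
    print(len(test_idx), y[test_idx].mean())
```

Note that skf.split(...) returns a generator; passing the splitter itself (cv=skf) to a search object is an alternative that scikit-learn also accepts.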

The randomized search is ready to use; you can train the model:

#grid_xgb = GridSearchCV(model_xgb, params, cv=10, iid=False)
random_search.fit(X_train, y_train)
Fitting 3 folds for each of 4 candidates, totalling 12 fits
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=0.5 ............
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=0.5 ............
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=0.5 ............
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=0.5 ............
[CV]  xgbclassifier__max_depth=3, xgbclassifier__gamma=0.5, score=0.8759645283888057, total= 1.0min
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=0.5 ............
[CV]  xgbclassifier__max_depth=3, xgbclassifier__gamma=0.5, score=0.8729701715996775, total= 1.0min
[CV]  xgbclassifier__max_depth=3, xgbclassifier__gamma=0.5, score=0.8706519235199263, total= 1.0min
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=0.5 ............
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=1 ..............
[CV]  xgbclassifier__max_depth=4, xgbclassifier__gamma=0.5, score=0.8735460094437406, total= 1.3min
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=1 ..............
[CV]  xgbclassifier__max_depth=3, xgbclassifier__gamma=1, score=0.8722791661868018, total=  57.7s
[CV] xgbclassifier__max_depth=3, xgbclassifier__gamma=1 ..............
[CV]  xgbclassifier__max_depth=3, xgbclassifier__gamma=1, score=0.8753886905447426, total= 1.0min
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=1 ..............
[CV]  xgbclassifier__max_depth=4, xgbclassifier__gamma=0.5, score=0.8697304768486523, total= 1.3min
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=1 ..............
[CV]  xgbclassifier__max_depth=4, xgbclassifier__gamma=0.5, score=0.8740066797189912, total= 1.4min
[CV] xgbclassifier__max_depth=4, xgbclassifier__gamma=1 ..............
[CV]  xgbclassifier__max_depth=3, xgbclassifier__gamma=1, score=0.8707671043538355, total= 1.0min
[CV]  xgbclassifier__max_depth=4, xgbclassifier__gamma=1, score=0.8729701715996775, total= 1.2min
[Parallel(n_jobs=4)]: Done  10 out of  12 | elapsed:  3.6min remaining:   43.5s
[CV]  xgbclassifier__max_depth=4, xgbclassifier__gamma=1, score=0.8736611770125533, total= 1.2min
[CV]  xgbclassifier__max_depth=4, xgbclassifier__gamma=1, score=0.8692697535130154, total= 1.2min
[Parallel(n_jobs=4)]: Done  12 out of  12 | elapsed:  3.6min finished
/Users/Thomas/anaconda3/envs/hello-tf/lib/python3.6/site-packages/sklearn/model_selection/_search.py:737: DeprecationWarning: The default of the `iid` parameter will change from True to False in version 0.22 and will be removed in 0.24. This will change numeric results when test-set sizes are unequal. DeprecationWarning)
RandomizedSearchCV(cv=<generator object _BaseKFold.split at 0x1101eb830>,
          error_score='raise-deprecating',
          estimator=Pipeline(memory=None,
     steps=[('columntransformer', ColumnTransformer(n_jobs=1, remainder='drop', transformer_weights=None,
         transformers=[('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True), [0, 2, 10, 4, 11, 12]), ('onehotencoder', OneHotEncoder(categorical_features=None, categories=None,...
       reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
       silent=True, subsample=1))]),
          fit_params=None, iid='warn', n_iter=4, n_jobs=4,
          param_distributions={'xgbclassifier__gamma': [0.5, 1], 'xgbclassifier__max_depth': [3, 4]},
          pre_dispatch='2*n_jobs', random_state=1001, refit=True,
          return_train_score='warn', scoring='accuracy', verbose=3)
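The log reports 4 candidates, which is also the size of the full 2 x 2 grid. When every entry of the search space is a list, scikit-learn samples combinations without replacement, so with n_iter=4 the randomized search here visits every combination once. The sketch below checks this with ParameterGrid and ParameterSampler, without training any model.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

params = {
    "xgbclassifier__gamma": [0.5, 1],
    "xgbclassifier__max_depth": [3, 4],
}

# Number of combinations a grid search would evaluate.
grid_size = len(ParameterGrid(params))

# Combinations a randomized search with n_iter=4 would draw.
sampled = list(ParameterSampler(params, n_iter=4, random_state=1001))
unique = {tuple(sorted(candidate.items())) for candidate in sampled}

print(grid_size, len(sampled), len(unique))
```

The randomized search only saves time once the grid holds more combinations than n_iter.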

As you can see, XGBoost has a better score than the previous logistic regression.

print("Best parameters:", random_search.best_params_)
print("XGBoost test score from randomized search: %f" % random_search.best_estimator_.score(X_test, y_test))
Best parameters: {'xgbclassifier__max_depth': 3, 'xgbclassifier__gamma': 0.5}
XGBoost test score from randomized search: 0.873157
random_search.best_estimator_.predict(X_test)
array(['<=50K', '<=50K', '<=50K', ..., '<=50K', '>50K', '<=50K'],      dtype=object)

Create a DNN with MLPClassifier in scikit-learn

Finally, you can train a deep learning algorithm with scikit-learn. The method is the same as for the other classifiers. The classifier is available as MLPClassifier.

from sklearn.neural_network import MLPClassifier

You define the following deep learning algorithm:

  • Adam solver
  • ReLU activation function
  • Alpha = 0.0001
  • Batch size of 150
  • Two hidden layers with 200 and 100 neurons respectively
model_dnn = make_pipeline(
    preprocess,
    MLPClassifier(solver='adam',
                  alpha=0.0001,
                  activation='relu',
                  batch_size=150,
                  hidden_layer_sizes=(200, 100),
                  random_state=1))

You can change the number of layers to try to improve the model.

model_dnn.fit(X_train, y_train)
print("DNN accuracy score: %f" % model_dnn.score(X_test, y_test))

DNN accuracy score: 0.821253
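To check how hidden_layer_sizes translates into the network's structure, you can inspect the fitted weight matrices. The sketch below trains the same kind of MLPClassifier on a small synthetic dataset. The dataset and the low max_iter are assumptions made to keep the example fast, so the convergence warning is silenced on purpose.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data with 20 features.
X, y = make_classification(n_samples=300, n_features=20, random_state=1)

mlp = MLPClassifier(solver="adam", alpha=0.0001, activation="relu",
                    batch_size=150, hidden_layer_sizes=(200, 100),
                    max_iter=30, random_state=1)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    mlp.fit(X, y)

# One weight matrix per connection between consecutive layers.
layer_shapes = [w.shape for w in mlp.coefs_]
print(layer_shapes)
print(mlp.n_layers_)  # input layer + 2 hidden layers + output layer
```

Adding a third entry to hidden_layer_sizes adds one more weight matrix and one more layer to the count.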

LIME: Trust Your Model

Now that you have a good model, you need a tool to trust it. Machine learning algorithms, especially random forests and neural networks, are known to be black-box algorithms. Put differently, they work, but no one knows why.

Three researchers came up with a great tool to see how the computer makes a prediction. The paper is called "Why Should I Trust You?".

They developed an algorithm named Local Interpretable Model-agnostic Explanations (LIME).

Take an example:

Sometimes you do not know whether you can trust a machine learning prediction:

A doctor, for example, cannot trust a diagnosis just because a computer said so. You also need to know whether you can trust the model before putting it into production.

Imagine being able to understand why any classifier makes a prediction, even for incredibly complicated models such as neural networks, random forests or SVMs with any kernel. It becomes easier to trust a prediction if you understand the reasons behind it. In the doctor example, if the model reported which symptoms were decisive, the doctor would be more inclined to trust it; it is also easier to figure out when you should not trust the model.

LIME can tell you which features affect the decisions of the classifier.

Data Preparation

There are a couple of things you need to change to run LIME with Python. First of all, you need to install lime from the terminal. You can use pip install lime

LIME uses the LimeTabularExplainer object to approximate the model locally. This object requires:

  • A dataset in numpy format
  • The names of the features: feature_names
  • The names of the classes: class_names
  • The column indexes of the categorical features: categorical_features
  • The names of the groups for each categorical feature: categorical_names

Create a numpy train set

You can copy df_train and convert it from pandas to numpy very easily.

df_train.head(5)
# Work on a copy so that the encoding below does not modify df_train
df_lime = df_train.copy()
df_lime.head(3)

Get the class names. The labels are accessible with the unique() method. You should see:

  • '<=50K'
  • '>50K'
# Get the class name
class_names = df_lime.label.unique()
class_names
array(['<=50K', '>50K'], dtype=object)

Index of the columns of the categorical features

You can use the method you learned before to get the names of the groups. You encode each categorical feature with LabelEncoder and repeat the operation on all the categorical features.

# Label-encode each categorical feature and keep its class names
import sklearn.preprocessing as preprocessing
categorical_names = {}
for feature in CATE_FEATURES:
    le = preprocessing.LabelEncoder()
    le.fit(df_lime[feature])
    df_lime[feature] = le.transform(df_lime[feature])
    categorical_names[feature] = le.classes_
print(categorical_names)    
{'workclass': array(['?', 'Federal-gov', 'Local-gov', 'Never-worked', 'Private',
       'Self-emp-inc', 'Self-emp-not-inc', 'State-gov', 'Without-pay'],
      dtype=object), 'education': array(['10th', '11th', '12th', '1st-4th', '5th-6th', '7th-8th', '9th',
       'Assoc-acdm', 'Assoc-voc', 'Bachelors', 'Doctorate', 'HS-grad',
       'Masters', 'Preschool', 'Prof-school', 'Some-college'],
      dtype=object), 'marital': array(['Divorced', 'Married-AF-spouse', 'Married-civ-spouse',
       'Married-spouse-absent', 'Never-married', 'Separated', 'Widowed'],
      dtype=object), 'occupation': array(['?', 'Adm-clerical', 'Armed-Forces', 'Craft-repair',
       'Exec-managerial', 'Farming-fishing', 'Handlers-cleaners',
       'Machine-op-inspct', 'Other-service', 'Priv-house-serv',
       'Prof-specialty', 'Protective-serv', 'Sales', 'Tech-support',
       'Transport-moving'], dtype=object), 'relationship': array(['Husband', 'Not-in-family', 'Other-relative', 'Own-child',
       'Unmarried', 'Wife'], dtype=object), 'race': array(['Amer-Indian-Eskimo', 'Asian-Pac-Islander', 'Black', 'Other',
       'White'], dtype=object), 'sex': array(['Female', 'Male'], dtype=object), 'native_country': array(['?', 'Cambodia', 'Canada', 'China', 'Columbia', 'Cuba',
       'Dominican-Republic', 'Ecuador', 'El-Salvador', 'England',
       'France', 'Germany', 'Greece', 'Guatemala', 'Haiti', 'Honduras',
       'Hong', 'Hungary', 'India', 'Iran', 'Ireland', 'Italy', 'Jamaica',
       'Japan', 'Laos', 'Mexico', 'Nicaragua',
       'Outlying-US(Guam-USVI-etc)', 'Peru', 'Philippines', 'Poland',
       'Portugal', 'Puerto-Rico', 'Scotland', 'South', 'Taiwan',
       'Thailand', 'Trinadad&Tobago', 'United-States', 'Vietnam',
       'Yugoslavia'], dtype=object)}

df_lime.dtypes
age               float64
workclass           int64
fnlwgt            float64
education           int64
education_num     float64
marital             int64
occupation          int64
relationship        int64
race                int64
sex                 int64
capital_gain      float64
capital_loss      float64
hours_week        float64
native_country      int64
label              object
dtype: object
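The encoding loop above can be tried on a tiny DataFrame to see what each LIME input looks like. The data below is hypothetical. The lime documentation describes categorical_names as a map from column index to the list of names, so this sketch keys the dictionary by index rather than by feature name; treat that as an assumption to check against your installed version.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical miniature version of the income dataset.
df = pd.DataFrame({
    "age": [39.0, 50.0, 38.0, 53.0],
    "workclass": ["State-gov", "Private", "Private", "Self-emp-inc"],
    "sex": ["Male", "Female", "Male", "Female"],
})
features = list(df.columns)
cate_features = ["workclass", "sex"]

# Column positions of the categorical features.
categorical_features = [features.index(col) for col in cate_features]

# Replace strings by integer codes and remember the name behind each code.
categorical_names = {}
for col in cate_features:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    categorical_names[features.index(col)] = list(le.classes_)

print(categorical_features)
print(categorical_names)
print(df.dtypes)
```

After the loop, every column is numeric, which is the form the explainer's training array needs.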

Now that the dataset is ready, you can construct the different datasets as shown in the Scikit-learn examples below. You transform the data outside of the pipeline in order to avoid errors with LIME. The training set passed to LimeTabularExplainer should be a numpy array without strings. With the method above, you already have a converted training dataset.

from sklearn.model_selection import train_test_split
X_train_lime, X_test_lime, y_train_lime, y_test_lime = train_test_split(df_lime[features],
                                                    df_lime.label,
                                                    test_size = 0.2,
                                                    random_state=0)
X_train_lime.head(5)

You can build the pipeline with the optimal parameters for XGBoost:

model_xgb = make_pipeline(
    preprocess,
    xgboost.XGBClassifier(max_depth = 3,
                          gamma = 0.5,
                          n_estimators=600,
                          objective='binary:logistic',
                          silent=True,
                          nthread=1))

model_xgb.fit(X_train_lime, y_train_lime)
/Users/Thomas/anaconda3/envs/hello-tf/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py:351: FutureWarning: The handling of integer data will change in version 0.22. Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the unique values.
If you want the future behavior and silence this warning, you can specify "categories='auto'."In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.
  warnings.warn(msg, FutureWarning)
Pipeline(memory=None,
     steps=[('columntransformer', ColumnTransformer(n_jobs=1, remainder='drop', transformer_weights=None,
         transformers=[('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True), [0, 2, 10, 4, 11, 12]), ('onehotencoder', OneHotEncoder(categorical_features=None, categories=None,...
       reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
       silent=True, subsample=1))])

You get a warning. The warning explains that you do not need to apply a label encoder before the pipeline. If you do not want to use LIME, you are fine using the method from the first part of this Machine Learning with Scikit-learn tutorial. Otherwise, you can keep this method: first create an encoded dataset, then apply the one-hot encoder within the pipeline.

print("XGBoost test score: %f" % model_xgb.score(X_test_lime, y_test_lime))
XGBoost test score: 0.873157
model_xgb.predict_proba(X_test_lime)
array([[7.9646105e-01, 2.0353897e-01],
       [9.5173013e-01, 4.8269872e-02],
       [7.9344827e-01, 2.0655173e-01],
       ...,
       [9.9031430e-01, 9.6856682e-03],
       [6.4581633e-04, 9.9935418e-01],
       [9.7104281e-01, 2.8957171e-02]], dtype=float32)

Before using LIME in action, let's create a DataFrame with the features of the misclassified observations. You can use it later to get an idea of what misleads the classifier.

temp = pd.concat([X_test_lime, y_test_lime], axis=1)
temp['predicted'] = model_xgb.predict(X_test_lime)
temp['wrong'] = temp['label'] != temp['predicted']
temp = temp.query('wrong==True').drop('wrong', axis=1)
temp = temp.sort_values(by=['label'])
temp.shape

(826, 16)
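The same filtering step can be tried on a tiny hypothetical results table. This sketch uses a boolean mask instead of a helper column, which is an equivalent way to keep only the rows where the label and the prediction disagree.

```python
import pandas as pd

# Hypothetical predictions for four households.
results = pd.DataFrame({
    "hours_week": [40, 45, 20, 60],
    "label": ["<=50K", "<=50K", ">50K", ">50K"],
    "predicted": ["<=50K", ">50K", "<=50K", ">50K"],
})

# Keep the misclassified rows and group them by true label.
wrong = results[results["label"] != results["predicted"]].sort_values(by="label")
print(wrong)
print(wrong.shape)
```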

You create a lambda function to retrieve the predicted probabilities from the model for new data. You will need it soon.

predict_fn = lambda x: model_xgb.predict_proba(x).astype(float)
X_test_lime.dtypes
age               float64
workclass           int64
fnlwgt            float64
education           int64
education_num     float64
marital             int64
occupation          int64
relationship        int64
race                int64
sex                 int64
capital_gain      float64
capital_loss      float64
hours_week        float64
native_country      int64
dtype: object
predict_fn(X_test_lime)
array([[7.96461046e-01, 2.03538969e-01],
       [9.51730132e-01, 4.82698716e-02],
       [7.93448269e-01, 2.06551731e-01],
       ...,
       [9.90314305e-01, 9.68566816e-03],
       [6.45816326e-04, 9.99354184e-01],
       [9.71042812e-01, 2.89571714e-02]])
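The explainer calls this function on many perturbed rows, so it has to accept a 2D numpy array and return one row of class probabilities per input row. The sketch below checks that contract with a LogisticRegression trained on synthetic data, used here as a stand-in for the XGBoost pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

# Same shape of wrapper as in the tutorial: probabilities cast to float64.
predict_fn = lambda x: model.predict_proba(x).astype(float)

proba = predict_fn(X[:3])
print(proba.shape)        # one row per sample, one column per class
print(proba.sum(axis=1))  # each row sums to 1
```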

You convert the pandas DataFrames to numpy arrays:

X_train_lime = X_train_lime.values
X_test_lime = X_test_lime.values
X_test_lime
array([[4.00000e+01, 5.00000e+00, 1.93524e+05, ..., 0.00000e+00,
        4.00000e+01, 3.80000e+01],
       [2.70000e+01, 4.00000e+00, 2.16481e+05, ..., 0.00000e+00,
        4.00000e+01, 3.80000e+01],
       [2.50000e+01, 4.00000e+00, 2.56263e+05, ..., 0.00000e+00,
        4.00000e+01, 3.80000e+01],
       ...,
       [2.80000e+01, 6.00000e+00, 2.11032e+05, ..., 0.00000e+00,
        4.00000e+01, 2.50000e+01],
       [4.40000e+01, 4.00000e+00, 1.67005e+05, ..., 0.00000e+00,
        6.00000e+01, 3.80000e+01],
       [5.30000e+01, 4.00000e+00, 2.57940e+05, ..., 0.00000e+00,
        4.00000e+01, 3.80000e+01]])
model_xgb.predict_proba(X_test_lime)
array([[7.9646105e-01, 2.0353897e-01],
       [9.5173013e-01, 4.8269872e-02],
       [7.9344827e-01, 2.0655173e-01],
       ...,
       [9.9031430e-01, 9.6856682e-03],
       [6.4581633e-04, 9.9935418e-01],
       [9.7104281e-01, 2.8957171e-02]], dtype=float32)
print(features,
      class_names,
      categorical_features,
      categorical_names)
['age', 'workclass', 'fnlwgt', 'education', 'education_num', 'marital', 'occupation', 'relationship', 'race', 'sex', 'capital_gain', 'capital_loss', 'hours_week', 'native_country'] ['<=50K' '>50K'] [1, 3, 5, 6, 7, 8, 9, 13] {'workclass': array(['?', 'Federal-gov', 'Local-gov', 'Never-worked', 'Private',
       'Self-emp-inc', 'Self-emp-not-inc', 'State-gov', 'Without-pay'],
      dtype=object), 'education': array(['10th', '11th', '12th', '1st-4th', '5th-6th', '7th-8th', '9th',
       'Assoc-acdm', 'Assoc-voc', 'Bachelors', 'Doctorate', 'HS-grad',
       'Masters', 'Preschool', 'Prof-school', 'Some-college'],
      dtype=object), 'marital': array(['Divorced', 'Married-AF-spouse', 'Married-civ-spouse',
       'Married-spouse-absent', 'Never-married', 'Separated', 'Widowed'],
      dtype=object), 'occupation': array(['?', 'Adm-clerical', 'Armed-Forces', 'Craft-repair',
       'Exec-managerial', 'Farming-fishing', 'Handlers-cleaners',
       'Machine-op-inspct', 'Other-service', 'Priv-house-serv',
       'Prof-specialty', 'Protective-serv', 'Sales', 'Tech-support',
       'Transport-moving'], dtype=object), 'relationship': array(['Husband', 'Not-in-family', 'Other-relative', 'Own-child',
       'Unmarried', 'Wife'], dtype=object), 'race': array(['Amer-Indian-Eskimo', 'Asian-Pac-Islander', 'Black', 'Other',
       'White'], dtype=object), 'sex': array(['Female', 'Male'], dtype=object), 'native_country': array(['?', 'Cambodia', 'Canada', 'China', 'Columbia', 'Cuba',
       'Dominican-Republic', 'Ecuador', 'El-Salvador', 'England',
       'France', 'Germany', 'Greece', 'Guatemala', 'Haiti', 'Honduras',
       'Hong', 'Hungary', 'India', 'Iran', 'Ireland', 'Italy', 'Jamaica',
       'Japan', 'Laos', 'Mexico', 'Nicaragua',
       'Outlying-US(Guam-USVI-etc)', 'Peru', 'Philippines', 'Poland',
       'Portugal', 'Puerto-Rico', 'Scotland', 'South', 'Taiwan',
       'Thailand', 'Trinadad&Tobago', 'United-States', 'Vietnam',
       'Yugoslavia'], dtype=object)}
import lime
import lime.lime_tabular
### Train should be label encoded not one hot encoded
explainer = lime.lime_tabular.LimeTabularExplainer(X_train_lime ,
                                                   feature_names = features,
                                                   class_names=class_names,
                                                   categorical_features=categorical_features, 
                                                   categorical_names=categorical_names,
                                                   kernel_width=3)

Let's choose a household from the test set and see the model's prediction and how the computer made its choice.

import numpy as np
np.random.seed(1)
i = 100
print(y_test_lime.iloc[i])
>50K
X_test_lime[i]
array([4.20000e+01, 4.00000e+00, 1.76286e+05, 7.00000e+00, 1.20000e+01,
       2.00000e+00, 4.00000e+00, 0.00000e+00, 4.00000e+00, 1.00000e+00,
       0.00000e+00, 0.00000e+00, 4.00000e+01, 3.80000e+01])

You can use the explainer with explain_instance to check the explanation behind the model's prediction:

exp = explainer.explain_instance(X_test_lime[i], predict_fn, num_features=6)
exp.show_in_notebook(show_all=False)
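To build intuition for what such an explanation computes, here is a minimal sketch of the idea behind a local surrogate model. It is not the lime library's implementation: it perturbs the instance with Gaussian noise, queries a black-box probability function, weights the samples by proximity, and fits a weighted linear model. The black box, the sampling scheme and the function names are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge


def black_box_proba(X):
    """Hypothetical stand-in for predict_fn: columns are P(class 0), P(class 1)."""
    z = 3.0 * X[:, 0] + 0.5 * X[:, 2]  # feature 1 is deliberately ignored
    p1 = 1.0 / (1.0 + np.exp(-z))
    return np.column_stack([1.0 - p1, p1])


def local_linear_explanation(x, predict_fn, n_samples=2000, kernel_width=3.0, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with Gaussian noise.
    Z = x + rng.normal(size=(n_samples, x.shape[0]))
    # 2. Ask the black box for the positive-class probability of each sample.
    target = predict_fn(Z)[:, 1]
    # 3. Give more weight to samples close to the instance.
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.sqrt(np.exp(-(dist ** 2) / kernel_width ** 2))
    # 4. Fit a weighted linear surrogate; its coefficients are the explanation.
    surrogate = Ridge(alpha=1.0).fit(Z, target, sample_weight=weights)
    return surrogate.coef_


x = np.array([0.2, -1.0, 0.5])
coefs = local_linear_explanation(x, black_box_proba)
for name, weight in zip(["x0", "x1", "x2"], coefs):
    print(name, round(float(weight), 3))
```

The feature the black box relies on most (x0) receives the largest weight, while the ignored feature (x1) stays close to zero, which is the kind of ranking the bars in a LIME plot convey.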

[Figure: LIME explanation for the selected household]

We can see that the classifier predicted the household correctly. The income is, indeed, above 50K.

The first thing we can say is that the classifier is not that sure about its prediction. The machine predicts that the household has an income over 50K with a probability of 64%. This 64% is made up mostly of the capital gain and marital contributions. Blue bars contribute negatively to the positive class and orange bars contribute positively.

The classifier is confused because the capital gain of this household is zero, while capital gain is usually a good predictor of wealth. Besides, the household works no more than 40 hours per week. Age, occupation, and sex contribute positively to the prediction.

If the marital status were single, the classifier would have predicted an income below 50K (0.64 - 0.18 = 0.46).

We can try another household that has been wrongly classified:

temp.head(3)
temp.iloc[1,:-2]
age                  58
workclass             4
fnlwgt            68624
education            11
education_num         9
marital               2
occupation            4
relationship          0
race                  4
sex                   1
capital_gain          0
capital_loss          0
hours_week           45
native_country       38
Name: 20931, dtype: object
i = 1
print('This observation is', temp.iloc[i,-2:])
This observation is label        <=50K
predicted     >50K
Name: 20931, dtype: object
exp = explainer.explain_instance(temp.iloc[1,:-2], predict_fn, num_features=6)
exp.show_in_notebook(show_all=False)

[Figure: LIME explanation for the misclassified household]

The classifier predicted an income above 50K, but the true label is below 50K. This household seems odd. It has neither a capital gain nor a capital loss. According to the printed row, the person is 58 years old, has marital code 2 (Married-civ-spouse in the encoding above), works 45 hours per week, and has education_num = 9. Such a profile resembles the overall pattern of households above 50K, which would explain why the classifier gets this one wrong.

Try playing around with LIME. You will notice gross mistakes from the classifier.

You can check the GitHub repository of the library's owner. It provides extra documentation for image and text classification.

Summary

Below is a list of some useful commands with scikit-learn version >= 0.20

Objective                                         Function
Create train/test dataset                         train_test_split
Build a pipeline
Select the columns and apply the transformation   make_column_transformer
Type of transformation
Standardize                                       StandardScaler
Min-max scaling                                   MinMaxScaler
Normalize                                         Normalizer
Impute missing values                             SimpleImputer
Convert categorical features                      OneHotEncoder
Fit and transform the data                        fit_transform
Make the pipeline                                 make_pipeline
Basic model
Logistic regression                               LogisticRegression
XGBoost                                           XGBClassifier
Neural network                                    MLPClassifier
Grid search                                       GridSearchCV
Randomized search                                 RandomizedSearchCV
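As a compact recap, the sketch below chains several of the functions listed above on a hypothetical toy dataset: a train/test split, a column transformer, a pipeline, and a basic model. The column names and values are invented for illustration and the score it prints has no meaning beyond showing that the pieces fit together.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: 40 rows, one numeric and one categorical column.
df = pd.DataFrame({
    "hours_week": [20, 35, 40, 45, 50, 60, 25, 55] * 5,
    "workclass": ["Private", "State-gov", "Private", "Self-emp",
                  "Self-emp", "Private", "State-gov", "Self-emp"] * 5,
    "label": ["<=50K", "<=50K", "<=50K", ">50K",
              ">50K", ">50K", "<=50K", ">50K"] * 5,
})

X_train, X_test, y_train, y_test = train_test_split(
    df[["hours_week", "workclass"]], df["label"],
    test_size=0.25, random_state=0, stratify=df["label"])

# Scale the numeric column and one-hot encode the categorical one.
preprocess = make_column_transformer(
    (StandardScaler(), ["hours_week"]),
    (OneHotEncoder(handle_unknown="ignore"), ["workclass"]),
)
model = make_pipeline(preprocess, LogisticRegression())
model.fit(X_train, y_train)

score = model.score(X_test, y_test)
print(list(model.named_steps), score)
```

Swapping LogisticRegression for another classifier, or wrapping the pipeline in GridSearchCV or RandomizedSearchCV, follows the same pattern.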
