
SKLearn Tutorial: DNN on Boston Data

This tutorial closely follows two other good tutorials and merges elements from both:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/skflow/boston.py

http://bigdataexaminer.com/uncategorized/how-to-run-linear-regression-in-python-scikit-learn/

D. Thiebaut
August 2016.

Get the Boston Data


This part is basically taken directly from the bigdataexaminer tutorial
(http://bigdataexaminer.com/uncategorized/how-to-run-linear-regression-in-python-scikit-learn/).

Imports, first!

In [170]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from sklearn import cross_validation
from sklearn import metrics
from sklearn import preprocessing
import tensorflow as tf
from tensorflow.contrib import learn
import pandas as pd
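
Note that two of these imports have since been deprecated. As a sketch only (assuming
scikit-learn >= 0.18, where cross_validation moved to model_selection, and TensorFlow >= 1.3,
where tf.estimator.DNNRegressor superseded tf.contrib.learn), the equivalent imports today
would look like this; the rest of the tutorial keeps the original 2016 API:

    # Modern equivalents (sketch only; the notebook below still uses the 2016 API)
    from sklearn.model_selection import train_test_split  # replaces sklearn.cross_validation
    from sklearn import metrics, preprocessing
    import tensorflow as tf         # tf.estimator.DNNRegressor replaces tf.contrib.learn
    import pandas as pd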

Get the data...

In [171]: from sklearn.datasets import load_boston


boston = load_boston()
print( "type of boston = ", type(boston))

type of boston = <class 'sklearn.datasets.base.Bunch'>


In [172]: boston.keys()

Out[172]: ['data', 'feature_names', 'DESCR', 'target']

In [173]: boston.data.shape

Out[173]: (506, 13)

In [174]: print( boston.feature_names )

['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
'B' 'LSTAT']

In [175]: print( boston.DESCR )

Boston House Prices dataset

Notes
------
Data Set Characteristics:

:Number of Instances: 506

:Number of Attributes: 13 numeric/categorical predictive

:Median Value (attribute 14) is usually the target

:Attribute Information (in order):

- CRIM     per capita crime rate by town
- ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS    proportion of non-retail business acres per town
- CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX      nitric oxides concentration (parts per 10 million)
...

In [176]: print( "target = ",
",".join( str(k) for k in boston.target[0:5] ),
"...",
", ".join( str(k) for k in boston.target[-5:] ) )

target = 24.0,21.6,34.7,33.4,36.2 ... 22.4, 20.6, 23.9, 22.0, 11.9

Convert the boston data into a panda data-frame


In [177]: bostonDF = pd.DataFrame( boston.data )
bostonDF.head()

Out[177]:
   0        1   2     3  4      5      6     7       8  9    10    11      12
0  0.00632  18  2.31  0  0.538  6.575  65.2  4.0900  1  296  15.3  396.90  4.98
1  0.02731  0   7.07  0  0.469  6.421  78.9  4.9671  2  242  17.8  396.90  9.14
2  0.02729  0   7.07  0  0.469  7.185  61.1  4.9671  2  242  17.8  392.83  4.03
3  0.03237  0   2.18  0  0.458  6.998  45.8  6.0622  3  222  18.7  394.63  2.94
4  0.06905  0   2.18  0  0.458  7.147  54.2  6.0622  3  222  18.7  396.90  5.33

Add column names

In [178]: bostonDF.columns = boston.feature_names
          bostonDF.head()

Out[178]:
   CRIM     ZN  INDUS  CHAS  NOX    RM     AGE   DIS     RAD  TAX  PTRATIO  B       LSTAT
0  0.00632  18  2.31   0     0.538  6.575  65.2  4.0900  1    296  15.3     396.90  4.98
1  0.02731  0   7.07   0     0.469  6.421  78.9  4.9671  2    242  17.8     396.90  9.14
2  0.02729  0   7.07   0     0.469  7.185  61.1  4.9671  2    242  17.8     392.83  4.03
3  0.03237  0   2.18   0     0.458  6.998  45.8  6.0622  3    222  18.7     394.63  2.94
4  0.06905  0   2.18   0     0.458  7.147  54.2  6.0622  3    222  18.7     396.90  5.33

In [179]: print( "Number of features = ", len( bostonDF.columns ) )

Number of features = 13

In [180]: X = bostonDF
y = boston.target
print( "shape of X = ", X.shape, " shape of y = ", y.shape )

shape of X = (506, 13) shape of y = (506,)

Split the data into training and test data.

In [181]: # hold out 20% of the data for testing; random_state fixes the shuffle
          # so the split is reproducible
          X_train, X_test, y_train, y_test = cross_validation.train_test_split(
              X, y, test_size=0.2, random_state=42 )

Scale the X data to 0 mean and unit standard deviation


In [182]: scaler = preprocessing.StandardScaler( )
X_train = scaler.fit_transform( X_train )
X_train

Out[182]: array([[ 1.29133866, -0.50032012,  1.03323679, ...,  0.84534281,
         -0.07433689,  1.75350503],
[-0.3338103 , -0.50032012, -0.41315956, ..., 1.20474139,
0.4301838 , -0.5614742 ],
[-0.40072291, 1.01327135, -0.71521823, ..., -0.63717631,
0.06529747, -0.65159505],
...,
[-0.40294118, 2.95931752, -1.30336132, ..., -0.59225149,
0.37901005, -0.91069248],
[ 0.85524904, -0.50032012, 1.03323679, ..., 0.84534281,
-2.69458597, 1.52257036],
[-0.37881118, -0.50032012, -0.35216694, ..., 1.15981657,
-3.12158061, -0.25731635]])
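
Note that the scaler is fitted only on the training data; the test set must be transformed
with this same fitted scaler before any prediction, as done in the predict cells below. As a
small convenience (a sketch; the name X_test_scaled is introduced here and does not appear
in the original notebook), the test features could be transformed once up front:

    # Reuse the scaler fitted on X_train; fitting a new scaler on X_test would
    # leak test-set statistics into the preprocessing.
    X_test_scaled = scaler.transform( X_test )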

In [183]: y_train

Out[183]: array([ ..., 22. ,  7.2, 20.4, 13.8, 13. , 18.4, 23.1, 21.2, 23.1,
23.5, 50. , 26.6, 22.2, 50. , 8.3, 23.3, 21.7, 18.9,
18.4, 17.4, 13.4, 12.1, 26.6, 21.7, 28.4, 20.5, 22. ,
13.9, 11.3, 29.9, 26.6, 10.5, 23.2, 24.4, 46. , 21.9,
7.5, 36.2, 44. , 17.8, 27.5, 37.6, 14.1, 28.1, 10.2,
19.1, 43.8, 27.9, 25. , 16. , 16.6, 13.2, 50. , 22.2,
32.9, 15.2, 14.8, 13.8, 24.3, 33.8, 22.3, 50. , 9.5,
13.3, 22.2, 18.1, 18. , 25. , 16.5, 23. , 20.1, 33. ,
24.8, 18.2, 13.1, 34.9, 10.2, 19.9, 27.9, 23.3, 35.1,
12.8, 22. , 18.5, 25.1, 22.5, 22.4, 28.6, 19.5, 24.8,
24.5, 21.4, 33.1, 22.9, 20.7, 24.1, 50. , 24.7, 28.7,
7.2, 37. , 20.3, 30.1, 19.5, 23.4, 11.5, 21.6, 14.9,
15.2, 19.4, 8.4, 28. , 22.6, 13.5, 14.5, 31. , 10.9,
21.9, 22. , 19. , 21.4, 25. , 17.5, 36.5, 20.1, 20.4,
16.2, 23.6, 7.4, 35.2, 50. , 19.3, 21.2, 15.6, 33.4,
19.1, 21. , 23.7, 18.9, 16.8, 19.7, 17.7, 22.6, 11.8,
34.9, 20.6, 20.2, 32. , 22.3, 23.3, 14.4, 31.2, 24. ,
29.6, 19.6, 21.6, 20. , 27. , 33.2, 15.4, 30.5, 7.2,
23.9, 16.3, 23.9, 50. , 22.8, 15.4, 19.2, 19.6, 22.6,
33.2, 50. , 22.2, 14.9, 19.8, 23.7, 19. , 20.3, 11.9,
13.6, 29.8, 21.7, 19.5, 21.1, 24.5, 13.4, 18.6])

Building a [10,10] Neural Net

In [184]: # Build a fully connected DNN with two hidden layers of 10 units each;
          # adding model_dir='/tmp/tf' here would log the graph for TensorBoard.
          regressor = learn.DNNRegressor( feature_columns=None,
                                          hidden_units=[10, 10] )
          regressor.fit( X_train, y_train, steps=5000, batch_size=1 )

Out[184]: DNNRegressor()
Predict and score

In [185]: y_predicted = regressor.predict( scaler.transform( X_test ) )
          score = metrics.mean_squared_error( y_predicted, y_test )
          print( 'MSE: {0:f}'.format( score ) )

MSE: 14.098925
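
Since the Boston targets are median home values in units of $1000, the square root of the
MSE is easier to interpret. A quick check (not part of the original notebook):

    import numpy as np

    # RMSE in the target's own units ($1000s): sqrt(14.098925) is about 3.75,
    # i.e. a typical prediction error of roughly $3,750.
    rmse = np.sqrt( score )
    print( 'RMSE: {0:f}'.format( rmse ) )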

In [186]: import matplotlib.pyplot as plt
          %matplotlib inline
          plt.scatter( y_predicted, y_test, s=5 )
          plt.xlabel( "Predicted Prices" )
          plt.ylabel( "Real Prices" )
          plt.title( "Real vs Predicted Housing Prices" )
Out[186]: <matplotlib.text.Text at 0x124891e10>

Generating the graph of the NN with tensorboard

In order to generate the graph of the network, we simply add model_dir='/tmp/tf' to the
DNNRegressor() instantiation:

In [187]: '''
          regressor = learn.DNNRegressor( feature_columns=None,
                                          hidden_units=[10, 10],
                                          model_dir='/tmp/tf' )
          '''

Out[187]: "\nregressor = learn.DNNRegressor( feature_columns=None,\n                                hidden_units=[10, 10],\n                                model_dir='/tmp/tf' )\n"

This might generate an error when we try to fit, but that happens only after the useful
information has been written to /tmp/tf, so we can safely run tensorboard from the command
line:

tensorboard --logdir /tmp/tf
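
TensorBoard serves on port 6006 by default, so the resulting graph should then be viewable
at http://localhost:6006.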

And here's the graph we get for the 10x10 NN:

Building a [13,13] Neural Net


In [188]: # Build a fully connected DNN with two hidden layers of 13 units each;
          # note the larger batch_size of 10 this time.
          regressor = learn.DNNRegressor( feature_columns=None,
                                          hidden_units=[13, 13] )
          regressor.fit( X_train, y_train, steps=5000, batch_size=10 )

Out[188]: DNNRegressor()

Predict and score

In [189]: y_test_predicted = regressor.predict( scaler.transform( X_test ) )
          scoreTest = metrics.mean_squared_error( y_test_predicted, y_test )
          print( 'MSE Test Data: {0:f}'.format( scoreTest ) )

          y_train_predicted = regressor.predict( X_train )
          scoreTrain = metrics.mean_squared_error( y_train_predicted, y_train )
          print( 'MSE Train Data: {0:f}'.format( scoreTrain ) )

MSE Test Data: 10.162460
MSE Train Data: 5.946833
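
The gap between the training MSE (5.95) and the test MSE (10.16) suggests the [13,13]
network is overfitting the training data somewhat, which is not surprising on a dataset of
only 506 samples.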

In [190]: import matplotlib.pyplot as plt
          %matplotlib inline
          # plot the predictions of the [13,13] model (y_test_predicted),
          # not the stale y_predicted left over from the [10,10] model
          plt.scatter( y_test_predicted, y_test, s=5 )
          plt.xlabel( "Predicted Prices" )
          plt.ylabel( "Real Prices" )
          plt.title( "Real vs Predicted Housing Prices" )

Out[190]: <matplotlib.text.Text at 0x1281e5a10>

Building a [13,13,10] Neural Net


In [191]: # Build a fully connected DNN with three hidden layers: 13, 13, and 10 units
          regressor = learn.DNNRegressor( feature_columns=None,
                                          hidden_units=[13, 13, 10],
                                          model_dir='/tmp/tf/' )
          regressor.fit( X_train, y_train, steps=5000, batch_size=1 )

Out[191]: DNNRegressor()

Predict and score

In [192]: y_test_predicted = regressor.predict( scaler.transform( X_test ) )
          scoreTest = metrics.mean_squared_error( y_test_predicted, y_test )
          print( 'MSE Test Data: {0:f}'.format( scoreTest ) )

          y_train_predicted = regressor.predict( X_train )
          scoreTrain = metrics.mean_squared_error( y_train_predicted, y_train )
          print( 'MSE Train Data: {0:f}'.format( scoreTrain ) )

MSE Test Data: 11.432168
MSE Train Data: 10.082671
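
Interestingly, the deeper [13,13,10] network scores worse than the [13,13] one on both the
training and the test data; note, however, that it was trained with batch_size=1 rather
than 10, so the two runs differ in more than just depth.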

In [193]: import matplotlib.pyplot as plt
          %matplotlib inline
          plt.scatter( y_test_predicted, y_test, s=5 )   # predictions of the [13,13,10] model
          plt.xlabel( "Predicted Prices" )
          plt.ylabel( "Real Prices" )
          plt.title( "Real vs Predicted Housing Prices" )

Out[193]: <matplotlib.text.Text at 0x128ec5bd0>

Plotting Residuals

In [194]: # residuals (predicted minus actual): training set in blue, test set in green
          plt.scatter( y_train_predicted, y_train_predicted - y_train,
                       c='b', s=30, alpha=0.4 )
          plt.scatter( y_test_predicted, y_test_predicted - y_test,
                       c='g', s=30 )
          plt.hlines( y=0, xmin=-5, xmax=55 )
          plt.title( "Residuals" )
          plt.ylabel( "Residuals" )

Out[194]: <matplotlib.text.Text at 0x128f50110>
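
The blue points are the training residuals and the green points the test residuals; for a
well-behaved model both should scatter randomly around the zero line, with no obvious
structure or trend.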
