CS6923 Machine Learning, Fall 2017
Prof. Hellerstein, NYU School of Engineering
Homework 3
Submit on NYU Classes by Fri. Oct. 20 at noon. You may work together
with one other person on this homework. If you do that, hand in JUST ONE
homework for the two of you, with both of your names on it. You may *discuss*
this homework with other students but YOU MAY NOT SHARE WRITTEN
ANSWERS OR CODE WITH ANYONE BUT YOUR PARTNER.
IMPORTANT SUBMISSION INSTRUCTIONS: Please submit your
solutions in 3 separate files: one file for your written answers to Part I, one file
for your written answers/output for the questions in Part II, and one file with
your code (a zip file if your code requires more than one file).
Part I: Written Exercises
1. A movie company wants to predict which of its new movies will be suc-
cessful. It has data about the previous films it has released.
(a) Suppose we want to formulate this as a regression problem. What
value should be predicted? It should measure success in a mean-
ingful way, and be something that could be computed automatically
from data (rather than by asking a person). There is more than one
possible correct answer to this question.
(b) Suppose we want to predict based on only one input attribute, where
the attribute is real-valued. Describe an attribute (in English) that
might be helpful in making our predictions.
(c) Now consider the problem of training a predictor that uses only the
one attribute you have described. Do you think that constructing a
linear model (i.e., fitting a line) would be a reasonable choice? Why
or why not? If so, do you think the slope of the learned line would
be positive or negative? Explain your answer.
2. Consider the following training set
x r
3 5
2 9
7 10
1 4
(a) Using the closed-form formula we covered in class, compute the co-
efficients of the line y = w1 x + w0 that minimizes squared error.
1
(b) Using the following formula for squared error, compute the squared
error of your line on the above training set.
N
X
E= (y t − rt )2
t=1
where y t is the predicted value for example xt .
3. A researcher studying a certain radioactive isotope believes that its mass
decays exponentially with time, according to the following model (for-
mula):
m(x) ≈ αe−βx
where x is the time variable, α and β are constants, and m(x) is the mass
at time x.
The researcher wants to learn the constants α and β from training data.
Because the above model is non-linear, the researcher cannot use linear
regression directly on the training data.
(a) Taking logs, show that you can transform the above model into one
that is linear in x.
(b) Give a formula for computing the values of α and β from a training
set {xt , rt }N
t=1 so as to minimize the squared error between the logs
t
of the predicted values P y andt the given values rt , as given in the
t 2
following expression: t (log y − log r ) .
(c) Write a few lines of Python or Matlab code that would compute the
values of α and β, according to the formula you just specified. (You
will use this in Part 2.)
4. Consider a classification problem with real valued attributes, and two
classes, C1 and C2 .
If we apply logistic regression to this problem, we learn an expression for
1
P [C1 |x] of the form f (x) = T , where x and w are d-dimensional
1+e−(w x+w0 )
vectors. Usually, we predict class C1 for x if f (x) ≥ 0.5. This is true iff
wT x + w0 ≥ 0. This is a linear function of x. Therefore, our prediction
uses the linear discriminant function g(x) = wT x + w0 ≥ 0.
If we want to avoid false positives, we might instead predict class C1 if
f (x) ≥ .75. Which linear discriminant function would be used for predic-
tion in this case?
Part 2: Programming Exercises
2
5. Write a Python or Matlab program that reads in a csv file called census-
data.csv, which is a regression dataset. The file has two columns. The first
column corresponds to the first attribute, year. The second corresponds
to population. Each row is an example, giving the population (in millions)
in the stated year (starting from year 1790).
The program should fit a model of the form y = αe−βx where x is the year,
and y is the predicted population, by finding the α and β that minimize
t t 2
P
t (log y − log r ) , To do this, use the Python or Matlab lines of code
you wrote for this purpose in Part I.
Your program should output theP values of α and β, the actual sum squared
error t (y t − rt )2 , the value of t (log y t − log rt )2 , and a plot of the data
P
with the learned curve.
were chosen to minimize t (log y t −
P
Notes: The α and β that you computed
t 2
) , and may not minimize t (y t −rt )2 . (In fact, the value you obtain
P
log rP
for t (y t − rt )2 may be very large.) Nevertheless, we hope the computed
α and β will produce a curve that fits the original data well.
In MATLAB, if the training data is in column vectors x and r, and the
predicted values for x are in column vector y, the plot can be formed using
the commands:
scatter(x,r)
hold on
plot(x,y)
6. Repeat the previous exercise, but this time, subtract 1790 from each entry
in column 2 so that the years begin with
P 0. Again, output theP values of α
and β, the actual sum squared error t (y t − rt )2 , the value of t (log y t −
log rt )2 , and a plot of the data with the learned curve.
Did this improve the fit? Explain your answer.
7. In Matlab and Python, there are tools that allow you to fit polynomials to
data, to minimize squared error. For example, in Matlab there is polyfit,
and in numpy there is numpy.polyfit. Use one of these two and try and
fit polynomials of degree 1 and 2 to the data in censusdata.csv. Show the
plots of the two resulting curves on the same graph.
(For Matlab, see https://www.mathworks.com/help/matlab/ref/polyfit.
html)
Then try to fit a curve of degree 3. Report what happens.
8. Suppose you are given a choice of predicting future population growth by
using one of the curves that you have learned above. Which one would you
choose and why? Do you think you have enough information to make a
good choice? You can answer this question either by giving an explanation
in English, or by giving some experimental evidence, or both.