Case Study Regression

AI-generated Abstract

This case study explores multiple linear regression as a predictive modeling technique in data mining applications. It emphasizes the importance of dividing data into training and validation sets and discusses relaxing the assumption that the errors are normally distributed. The study also covers subset selection methods for improving predictive performance, highlighting the role of Mallows's C_p criterion in identifying effective models when the number of observations is limited.

Key takeaways

  • After this review, we introduce methods for identifying subsets of the independent variables to improve predictions.
  • There is a continuous random variable called the dependent variable, Y, and a number of independent variables, x_1, x_2, ...
  • These coefficient estimates are used to make predictions for each case in the validation data.
  • A frequent problem in data mining is that of using a regression equation to predict the value of a dependent variable when we have a number of variables available to choose as independent variables in our model.
  • This suggests that we may be better off in terms of MSE of predictions if we use this subset rather than the full model of size 7 (all six variables plus the constant term).
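
The workflow summarized in these takeaways (fit on training data, predict the validation cases, then compare subsets of the independent variables) can be sketched as follows. This is a minimal illustration, not the study's own computation. The data are synthetic, the helper names (`design`, `fit_ols`, `sse`, `all_subsets_cp`, `validation_mse`) are hypothetical, and the criterion is assumed to take the common textbook form C_p = SSE_k / s^2 - n + 2k, where k is the model size counting the constant term and s^2 is the error variance estimated from the full model.

```python
import itertools
import numpy as np

def design(X):
    """Prepend a column of ones so the model includes a constant term."""
    return np.column_stack([np.ones(X.shape[0]), X])

def fit_ols(X, y):
    """Least-squares coefficient estimates for y regressed on [1, X]."""
    beta, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return beta

def sse(beta, X, y):
    """Sum of squared prediction errors for the given coefficients."""
    resid = y - design(X) @ beta
    return float(resid @ resid)

def all_subsets_cp(X, y):
    """Return (Cp, size, subset) for every subset of columns, sorted by Cp.

    size counts the constant term, so the full model has size p + 1.
    """
    n, p = X.shape
    # Error variance estimated from the full model (assumed form of Cp).
    sigma2_full = sse(fit_ols(X, y), X, y) / (n - (p + 1))
    rows = []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            Xs = X[:, list(subset)]
            size = k + 1
            cp = sse(fit_ols(Xs, y), Xs, y) / sigma2_full - n + 2 * size
            rows.append((cp, size, subset))
    return sorted(rows)

# Synthetic data (an assumption for illustration): six candidate variables,
# of which only the first two actually influence Y.
rng = np.random.default_rng(0)
n, p = 60, 6
X = rng.normal(size=(n, p))
y = 2.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

# Divide the cases into training and validation sets.
perm = rng.permutation(n)
train, valid = perm[:40], perm[40:]

# Rank all subsets by Cp on the training data.
results = all_subsets_cp(X[train], y[train])
best_cp, best_size, best_subset = results[0]
full_cp = next(cp for cp, size, _ in results if size == p + 1)

def validation_mse(subset):
    """Fit on training cases, then measure MSE of predictions on validation cases."""
    cols = list(subset)
    beta = fit_ols(X[train][:, cols], y[train])
    return sse(beta, X[valid][:, cols], y[valid]) / len(valid)

print("best subset:", best_subset, "size:", best_size, "Cp:", round(best_cp, 2))
print("full model Cp:", round(full_cp, 2))
print("validation MSE, best subset:", round(validation_mse(best_subset), 3))
print("validation MSE, full model: ", round(validation_mse(tuple(range(p))), 3))
```

Under this form of the criterion the full model's C_p always equals its size, so the full model serves as a reference point: a smaller subset whose C_p is close to its own size is a candidate for better predictions. Exhaustive enumeration is practical here only because six variables give 64 subsets.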