5.
Correlation and regression
DATE LEARNED @October 25, 2023
RETENTION 🟧🟧🟧🟧🟧🟧
NEXT REP. October 29, 2023
subject data analysis
notes
R-STUDIO COMMANDS
1. Scatterplot
>plot(Y~X) where Y is the dependent variable and X the explanatory variable
2. Correlation coefficient (measures the strength and direction of the linear
association between the variables; it ranges from -1 to 1)
>cor(X,Y)
3. Regression line:
fit<-lm(Y~X)
* (Intercept) → value of Y when X = 0; the coefficient of X → slope of the line
To get the regression line on the scatter plot:
>abline(fit)
or you can just do:
>abline(lm(Y~X))
To predict the score of a student who got a 70 on the midterm, for example:
>predict(fit, data.frame(xvariable=70)) (xvariable must be the name of the X
variable used in lm; you can also do this manually: intercept + slope * 70)
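Steps 1 to 3 can be sketched end to end as follows. This is only an illustration: the data frame exam, the columns midterm and final, and all the values are made up, not taken from the course data.

```r
# Hypothetical scores, invented for illustration only
exam <- data.frame(
  midterm = c(50, 60, 70, 80, 90),
  final   = c(55, 62, 74, 79, 92)
)

plot(final ~ midterm, data = exam)        # scatterplot: Y ~ X
r <- cor(exam$midterm, exam$final)        # correlation coefficient

fit <- lm(final ~ midterm, data = exam)   # regression line
abline(fit)                               # draw it on the scatterplot

# Prediction for midterm = 70; the column name must match the predictor
pred <- predict(fit, data.frame(midterm = 70))

# Same prediction by hand: intercept + slope * 70
b <- coef(fit)
manual <- unname(b["(Intercept)"] + b["midterm"] * 70)
```

Both pred and manual should hold the same number, which is a quick way to check that the manual calculation was done correctly.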
Residuals and more information:
>summary(lm(Y~X))
Coefficient of determination (R²) → the proportion (as a percentage) of the
variation in the y variable that is explained by the x variable.
* taking the square root of R² gives the absolute value of the correlation
coefficient r (the sign of r is the sign of the slope)
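A small check of how R² relates to the correlation coefficient in a regression with a single X variable. The vectors midterm and final below are hypothetical values used only for illustration.

```r
# Hypothetical data, invented for illustration only
midterm <- c(50, 60, 70, 80, 90)
final   <- c(55, 62, 74, 79, 92)

fit <- lm(final ~ midterm)

r2 <- summary(fit)$r.squared   # coefficient of determination
r  <- cor(midterm, final)      # correlation coefficient

# With one explanatory variable, sqrt(R^2) equals |r|
same <- isTRUE(all.equal(sqrt(r2), abs(r)))
```

Multiplying r2 by 100 gives the percentage of the variation in final explained by midterm.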
RESIDUAL PLOT
>fit.res <- resid(fit)
>plot(fit.res~midterm, ylab="Residuals", main="Residual Plot")
(midterm = explanatory variable)
>abline(0,0) → to plot a horizontal line at zero
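A self-contained version of the residual plot, assuming made-up data (midterm and final are hypothetical vectors, with midterm as the explanatory variable):

```r
# Hypothetical data, invented for illustration only
midterm <- c(50, 60, 70, 80, 90)
final   <- c(55, 62, 74, 79, 92)

fit <- lm(final ~ midterm)
fit.res <- resid(fit)                     # residuals = observed - fitted

# Residuals on the vertical axis, explanatory variable on the horizontal axis
plot(fit.res ~ midterm, ylab = "Residuals", main = "Residual Plot")
abline(h = 0)                             # same line as abline(0, 0)
```

If the linear model is adequate, the points should scatter around the horizontal line with no clear pattern.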
Analysing influential cases
FIRST WAY
>identify(Y~X) (then click with the mouse on the cases in the scatter plot that
you want to identify as outliers). You will see clearly which are the outliers.
Press "esc" TWICE to get out of the screen and it will show the row numbers of
the cases that you have clicked on.
SECOND WAY
>plot(Y~X, col="lightblue")
>text(Y~X, labels=rownames(dataset))
To eliminate the cases, assign the data without the outliers to a new data
frame:
>exam_new <- exam[-c(2,18),]
(exam_new = the new dataset, exam = your previous dataset)
(2 and 18 are the outliers you identified previously)
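A sketch of the second way from start to finish, assuming a made-up data frame in which row 6 has been planted as an obvious outlier (all names and values are hypothetical):

```r
# Hypothetical data; row 6 is deliberately far from the trend
exam <- data.frame(
  midterm = c(50, 60, 70, 80, 90, 55),
  final   = c(55, 62, 74, 79, 92, 98)
)

# Label each point with its row name to spot the outlier
plot(final ~ midterm, data = exam, col = "lightblue")
text(exam$midterm, exam$final, labels = rownames(exam), pos = 3)

# Drop the outlier and refit
exam_new <- exam[-c(6), ]
fit_all <- lm(final ~ midterm, data = exam)
fit_new <- lm(final ~ midterm, data = exam_new)

coef(fit_all)   # coefficients with the outlier
coef(fit_new)   # coefficients without it
```

Comparing the two sets of coefficients shows how much the removed case was influencing the regression line.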