RESEARCH III
CONCEPT PAPER
SIMPLE LINEAR REGRESSION
KATHLEN MAE E. MAROLLANO
SELWYN KEITH VICENTE
CLINT JOSH ACOSTA
MR. JADE B. MONTEJO
RESEARCH ADVISER
SPECIAL SCIENCE PROGRAM
VALENCIA NATIONAL HIGH SCHOOL
VALENCIA CITY, BUKIDNON
INTRODUCTION
Simple linear regression is a statistical method for summarizing and studying the
relationship between two continuous (quantitative) variables. The independent variable,
denoted x, is also called the predictor or explanatory variable; the dependent variable,
denoted y, is also called the response or outcome variable. The alternative terms are mentioned
here so that they are recognizable when encountered elsewhere. Simple linear regression gets its
adjective "simple" because it concerns the study of only one predictor variable. If there are two
or more predictor variables, the method is referred to as “multiple linear regression”.
Simple linear regression can be used when you want to understand the association between
two variables (one continuous dependent variable and one independent variable). It has three
major uses: determining the strength of a predictor, forecasting an effect, and forecasting
trends. It does not, however, by itself establish a cause-and-effect relationship between the
variables.
Linear regression is nevertheless a very simple method that is easy and intuitive to use
and understand. A person with only a knowledge of high school mathematics can understand and
apply it. In addition, it works in most cases. Even when it does not fit the data exactly, it can
still be used to find the nature of the relationship between the two variables (D. Jain,
2009).
However, this statistical tool only captures relationships between the dependent and
independent variables that are linear. It assumes a straight-line relationship between them,
which is sometimes incorrect. Linear regression is also very sensitive to anomalies in the data
(outliers). Suppose, for example, that most of your data lie in the range 0-10. If for any reason
a single data item falls outside that range, say at 15, it can significantly influence the
regression coefficients. Another disadvantage is that if the number of parameters is greater
than the number of samples available, the model starts to represent the noise rather than the
relationship between the variables (D. Jain, 2009).
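The sensitivity to outliers described above can be illustrated with a minimal sketch. The data here are hypothetical (made up for illustration only), and the helper name fit_line is our own; it simply applies the usual least-squares formulas for the slope and intercept.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: x runs from 0 to 10 and y is exactly 2 * x.
xs = list(range(11))
ys = [2 * x for x in xs]
clean_intercept, clean_slope = fit_line(xs, ys)

# Same data plus one anomalous item outside the 0-10 range (x = 15, y = 0).
xs_out = xs + [15]
ys_out = ys + [0]
out_intercept, out_slope = fit_line(xs_out, ys_out)

print("slope without outlier:", clean_slope)
print("slope with outlier:   ", out_slope)
```

With these made-up numbers, the slope of the fitted line is 2 before the extra point is added and falls below 1 afterwards, even though only one of the twelve data items is anomalous.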
ASSUMPTIONS
When using simple linear regression, you have to consider the following:
1. Your two variables should be measured at the continuous level.
2. There needs to be a linear relationship between the two variables.
3. There should be no significant outliers. An outlier is an observed data point whose
dependent variable value is very different from the value predicted by the regression
equation.
4. Your observations should be independent of one another.
5. Your data need to show homoscedasticity, which means that the variance of the points
around the line of best fit remains similar as you move along the line.
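As a rough illustration of assumption 3, the sketch below fits a line, computes each residual (observed value minus predicted value), and flags points whose residual is large compared with the residual standard error. The data, the function names, and the cutoff of 2 standard errors are assumptions made for this example; they are not part of the SPSS procedure and the cutoff is only a rule of thumb.

```python
import math


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def flag_outliers(xs, ys, threshold=2.0):
    """Return the indices of points whose residual exceeds threshold * s,
    where s is the residual standard error (n - 2 degrees of freedom)."""
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    n = len(residuals)
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * s]


# Hypothetical study hours and exam scores; the ninth score (95) is
# deliberately far above the trend followed by the other points.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scores = [52, 55, 57, 60, 62, 65, 67, 70, 95, 75]

print("flagged indices:", flag_outliers(hours, scores))
```

The same residuals can also be plotted against x to judge assumption 5: if their spread stays roughly the same from left to right, the data are consistent with homoscedasticity.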
SAMPLE PROBLEM
Example of simple linear regression
In a statistics course, we want to see if there is any relationship between study time and scores in
the mid-semester exam.
STEPS IN HYPOTHESIS TESTING:
1. Identify the Independent and Dependent Variable:
The independent variable is the study time while the dependent variable is the exam
scores.
2. State the Null Hypothesis (Ho) and Alternative Hypothesis (Ha):
Ho: There is no significant relationship between study time and exam scores.
Ha: There is a significant relationship between study time and exam scores.
3. Level of Significance: α = 0.05 (obtained p = .000, p < 0.05, N = 20)
4. Test Statistics: Simple linear regression
5. Computation using SPSS:
First, we entered the sample data into SPSS.
Then, we graphed the data using a scatter plot to check whether the relationship is linear.
Step 1
Select "Analyze -> Regression -> Linear".
A new window opens.
Step 2
From the list on the left, select the variable "Exam score" as "Dependent" and the variable
"Hours" as the "Independent(s)".
Step 3
After clicking “OK”, the results now pop out in the "Output" window.
Step 4
The results show up, and now we can interpret them.
From B in the third table, since the p-value is reported as .000 (that is, p < .001), the
relationship between study hours and exam scores is significant. From A in the second table, the
correlation coefficient, R, is 0.827. Therefore, we can conclude that study hours are positively
correlated with exam scores and that the relationship is strong (R is positive and close to 1).
From C in the last table, we can conclude that, on average, for every additional hour a student
studies, he/she gets 2.391 more marks in the exam.
Our alternative hypothesis, which states that there is a significant relationship
between study time and scores in the mid-semester exam, is accepted. The null hypothesis, which
states otherwise, is rejected.
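The quantities interpreted above (the slope, the correlation coefficient R, and the test of significance) can also be computed by hand. The sketch below is only an illustration under stated assumptions: the 20 data points are hypothetical and are not the data entered in SPSS, so the numbers it prints will differ from those in the output tables. Instead of a p-value, it compares the t statistic of the slope with 2.101, the two-tailed critical value of t at α = 0.05 with N - 2 = 18 degrees of freedom.

```python
import math


def simple_linear_regression(xs, ys):
    """Return the slope, intercept, correlation r, and t statistic of the slope.
    Assumes the points are not perfectly collinear (r is not exactly 1 or -1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    # In simple linear regression, the t statistic for the slope can be
    # written in terms of r and the degrees of freedom n - 2.
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return {"slope": slope, "intercept": intercept, "r": r, "t": t}


# Hypothetical study hours and exam scores for N = 20 students.
hours = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11]
scores = [50, 54, 51, 58, 55, 60, 63, 61, 66, 64,
          70, 68, 73, 71, 76, 75, 80, 78, 83, 85]

# Two-tailed critical t value for alpha = 0.05 and 18 degrees of freedom.
T_CRITICAL = 2.101

result = simple_linear_regression(hours, scores)
significant = abs(result["t"]) > T_CRITICAL

print("slope (marks per hour):", round(result["slope"], 3))
print("correlation r:", round(result["r"], 3))
print("reject Ho at alpha = 0.05:", significant)
```

Reading the output follows the same logic as the interpretation of the SPSS tables: the sign and size of r describe the direction and strength of the relationship, the slope gives the average change in score per additional hour, and Ho is rejected when the absolute t statistic exceeds the critical value.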