
Regression

The document provides an overview of regression analysis, a statistical technique for predicting the behavior of a dependent variable (Y) based on an independent variable (X). It explains the importance of correctly identifying dependent and independent variables, the use of scatter plots to assess data fit, and the calculation of regression lines and coefficients. Additionally, it distinguishes between simple and multiple linear regression, highlighting their applications in various fields such as business and social research.

REGRESSION ANALYSIS

Classified as Internal
Regression

⊹ is a statistical technique used in predicting the behavior of a variable.

⊹ is essentially used to model a CAUSE-AND-EFFECT relationship.

⊹ is the process of predicting variable Y (dependent variable) using variable X (independent variable).

DEPENDENT (Y) & INDEPENDENT (X) VARIABLES

Independent Variable (Predictor)
- is a value that the researcher can freely change to test its effect on the dependent variable.

Dependent Variable (Response Variable)
- is a value that is affected when the value of the independent variable changes.

EXAMPLES:
- Time spent in studying (INDEPENDENT VARIABLE) causes a change in test score (DEPENDENT VARIABLE).
- Nutrient intake (INDEPENDENT VARIABLE) influences growth of an infant (DEPENDENT VARIABLE).

• In correlation and regression analyses, it is important that independent and dependent variables are correctly identified to make correct conclusions. Interchanging these 2 variables most of the time renders the scatter plots and regression coefficients meaningless.
Remember the acronym: DRY-MIX

D - dependent variable
R - responding variable
Y - is the axis on which the dependent or responding variable is graphed (the vertical axis)

M - manipulated variable, or the one that is changed in an experiment
I - independent variable
X - is the axis on which the independent variable is graphed (the horizontal axis)

EXERCISE: Determine the dependent variable (DV) and the independent variable (IV).

VARIABLES | ANSWER
1. study time (a) and grades (b) | a. IV, b. DV
2. salary (a) and educational attainment (b) | a. DV, b. IV
3. sales (a) and advertising & marketing (b) | a. DV, b. IV
4. effect of soda (a) on blood sugar level (b) | a. IV, b. DV
5. test score (a) and tutoring (b) | a. DV, b. IV
6. investment choices (a) and risk appetite (b) | a. DV, b. IV
7. fasting (a) and body weight (b) | a. IV, b. DV
8. effect of phone usage (a) before bedtime on number of hours of sleep (b) | a. IV, b. DV
9. employment rate (a) and minimum wage (b) | a. DV, b. IV
10. effect of caffeine (a) on sleep (b) | a. IV, b. DV

SCATTER PLOTS

• Regression analysis requires interval or ratio-level data.

• To see if your data FITS the models of regression, it is wise to conduct a scatter plot analysis.

• Reason: Regression analysis assumes a linear relationship. If you have a curvilinear relationship or no relationship, regression analysis is of little use.
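
A numeric companion to the scatter plot check can be sketched as follows. This is a minimal illustration, not part of the original slides: the data are invented, and the helper name `pearson_r` is an assumption. It shows that a perfectly U-shaped (curvilinear) pattern can have a linear correlation of zero, so a straight line would describe it poorly.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
xs = [-3, -2, -1, 0, 1, 2, 3]
linear_ys = [2 * x + 1 for x in xs]   # straight-line pattern
curved_ys = [x ** 2 for x in xs]      # U-shaped (curvilinear) pattern

r_linear = pearson_r(xs, linear_ys)
r_curved = pearson_r(xs, curved_ys)
print(round(r_linear, 3), round(r_curved, 3))
```

A correlation near zero does not prove that no relationship exists; it only says a straight line is a poor description, which is why the plot itself is still worth looking at.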

Types of Lines

Regression Line
⊹ is the best straight-line description of the plotted points and can be used to describe the association between the variables. If all the data points fall exactly on the line, then the error is 0 and you have a perfect relationship.

Regression
• Calculates the “best-fit” line (regression line) for a certain set of
data.
• The regression line makes the sum of the squares of the residuals
smaller than for any other line
• Regression minimizes residuals
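
The least-squares property can be checked numerically. The sketch below uses invented data and assumed helper names (`fit_line`, `sum_squared_residuals`); it fits the line and then compares its sum of squared residuals with that of a few nearby lines.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)

# Shift the intercept and slope slightly: every alternative line does worse.
others = [
    sum_squared_residuals(xs, ys, a + da, b + db)
    for da in (-0.5, 0.5)
    for db in (-0.2, 0.2)
]
print(best, min(others))
```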
Regression
⊹ we are able to construct a best-fitting straight line to the scatter diagram points and then formulate a regression equation in the form of:

y = a + bx

Where:
y = dependent variable
x = independent variable
a = intercept
b = slope
THINGS TO REMEMBER:

⊹ Regression still focuses on association, not causation.

⊹ Association is a necessary prerequisite for inferring causation, but also:
   ⊹ The independent variable must precede the dependent variable in time.
   ⊹ The two variables must be plausibly linked by a theory.
   ⊹ Competing independent variables must be eliminated.

Regression Coefficient

⊹ is the slope of the regression line and tells you what the nature of the relationship between the variables is.
⊹ tells you how much change in the dependent variable is associated with a one-unit change in the independent variable.
⊹ the larger the regression coefficient, the more change.
⊹ however, the regression coefficient is not a good indicator of the strength of the relationship, because two scatter plots with very different dispersions could produce the same regression line.
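
The last point can be illustrated with a small sketch. The data are invented, and the offsets are deliberately chosen so that both data sets share the fitted line y = 2 + 3x; the helper names are assumptions. The slopes come out equal while the correlation, which measures strength, differs.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def pearson_r(xs, ys):
    """Pearson correlation coefficient, used here as a measure of strength."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: the offsets sum to zero and are uncorrelated with x,
# so scaling them changes the scatter but not the fitted line.
xs = [1, 2, 3, 4, 5, 6]
offsets = [1, -2, 1, 1, -2, 1]
tight_ys = [2 + 3 * x + 0.1 * e for x, e in zip(xs, offsets)]  # small dispersion
wide_ys = [2 + 3 * x + 2.0 * e for x, e in zip(xs, offsets)]   # large dispersion

_, tight_slope = fit_line(xs, tight_ys)
_, wide_slope = fit_line(xs, wide_ys)
r_tight = pearson_r(xs, tight_ys)
r_wide = pearson_r(xs, wide_ys)
print(tight_slope, wide_slope, r_tight, r_wide)
```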
2 BASIC FORMS OF REGRESSION ANALYSIS
FORMS OF REGRESSION ANALYSIS
Example: You want to know or predict what influences a person’s salary.

Independent Variables (Predictors): EDUCATIONAL ATTAINMENT, WORKING HOURS, AGE
Dependent Variable (Criterion): SALARY

PURPOSE:
• Measurement of the influence of one or more variables on another variable.
• Prediction of a variable by one or more other variables.

Source: DATAtab. (2021, February 8). Simple and Multiple Linear Regression [Video]. https://www.youtube.com/watch?v=29rjWClT_3U
LINEAR & MULTIPLE LINEAR REGRESSION

Simple Linear
• Does the weekly working time have an influence on the hourly salary of the employees?

Multiple Linear
• Do the weekly working hours and the age of employees have an influence on their hourly salary?

LINEAR REGRESSION
- also called simple linear regression.
- considers one quantitative independent variable (x) to predict another quantitative, dependent variable (y).

MULTIPLE LINEAR REGRESSION
- considers more than one quantitative or qualitative independent variable to predict a quantitative dependent variable (y).

Source: DATAtab. (2021, February 8). Simple and Multiple Linear Regression [Video]. https://www.youtube.com/watch?v=29rjWClT_3U
SIMPLE LINEAR REGRESSION

⊹ The goal of simple linear regression is to predict the value of a dependent variable based on an independent variable.

⊹ The stronger the linear relationship between the independent variable and the dependent variable, the more accurate the prediction.

⊹ What makes linear regression a powerful statistical tool is that it allows you to quantify by how much the response/dependent variable varies when the explanatory/independent variable increases by one unit.
SIMPLE LINEAR REGRESSION
Example: A teacher wants to know if there is a relationship between the amount of time her students spent working on a social studies report and the grade each student received. She surveyed 10 students and recorded the data below.

1. CONDUCT A SCATTER PLOT ANALYSIS

Student | No. of hours worked | Grade
1 | 5 | 90
2 | 3 | 80
3 | 3.5 | 80
4 | 1 | 60
5 | 4.5 | 90
6 | 1 | 70
7 | 3 | 75
8 | 4 | 85
9 | 2 | 70
10 | 2.5 | 75

2. COMPUTE FOR X & Y VALUES

Student | No. of hours worked (X) | Grade (Y) | XY | X^2 | Y^2
1 | 5 | 90 | 450 | 25 | 8100
2 | 3 | 80 | 240 | 9 | 6400
3 | 3.5 | 80 | 280 | 12.25 | 6400
4 | 1 | 60 | 60 | 1 | 3600
5 | 4.5 | 90 | 405 | 20.25 | 8100
6 | 1 | 70 | 70 | 1 | 4900
7 | 3 | 75 | 225 | 9 | 5625
8 | 4 | 85 | 340 | 16 | 7225
9 | 2 | 70 | 140 | 4 | 4900
10 | 2.5 | 75 | 187.5 | 6.25 | 5625
Σ | 29.50 | 775.00 | 2,397.50 | 103.75 | 60875
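
The Σ row of the table can be reproduced in a few lines. This is a small sketch using the ten (hours, grade) pairs from the table; the variable names are assumptions made for illustration.

```python
# Data from the teacher example: hours worked (X) and grade (Y) for 10 students.
hours = [5, 3, 3.5, 1, 4.5, 1, 3, 4, 2, 2.5]
grades = [90, 80, 80, 60, 90, 70, 75, 85, 70, 75]

sum_x = sum(hours)                                  # ΣX
sum_y = sum(grades)                                 # ΣY
sum_xy = sum(x * y for x, y in zip(hours, grades))  # ΣXY
sum_x2 = sum(x ** 2 for x in hours)                 # ΣX^2
sum_y2 = sum(y ** 2 for y in grades)                # ΣY^2

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
# prints 29.5 775 2397.5 103.75 60875
```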

3. Find the Values of a & b.

$$a = \frac{(775)(103.75) - (29.50)(2{,}397.50)}{10(103.75) - (29.50)^2} = \frac{9{,}680.00}{167.25} = 57.88$$

$$b = \frac{10(2{,}397.50) - (29.50)(775)}{10(103.75) - (29.50)^2} = \frac{1{,}112.50}{167.25} = 6.65$$

Therefore, y = 57.88 + 6.65x

What if the student made her report in SC for 6 hours? What would be her estimated grade?
Y = 57.88 + 6.65 (6)
Answer: 97.78%

What if a student wanted to get a grade of 95? How many hours should she spend making her report?
95 = 57.88 + 6.65x
95 - 57.88 = 6.65x
X = 5.58
Answer: 5.58 hours
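
The same computation can be sketched in code, starting from the column totals of the table. The function names `predict_grade` and `hours_needed` are assumptions made for illustration. Because the code keeps the unrounded coefficients, its predictions can differ in the last decimal place from the hand computation, which rounds a and b to two decimals first.

```python
# Column totals from the teacher example (n = 10 students).
n = 10
sum_x, sum_y, sum_xy, sum_x2 = 29.5, 775.0, 2397.5, 103.75

denominator = n * sum_x2 - sum_x ** 2              # 167.25
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # intercept
b = (n * sum_xy - sum_x * sum_y) / denominator       # slope

def predict_grade(hours):
    """Estimated grade for a given number of hours worked."""
    return a + b * hours

def hours_needed(grade):
    """Hours needed to reach a target grade, by solving y = a + bx for x."""
    return (grade - a) / b

print(round(a, 2), round(b, 2))   # prints 57.88 6.65
print(predict_grade(6), hours_needed(95))
```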
SIMPLE LINEAR REGRESSION
• Let's say a hospital asks you to give them an estimate, based on the age of the person, of how long the person will stay in the hospital after a surgery.

[Figure: estimated length of stay plotted against age]

ỹ = a + bx
ỹ = 1.2 + 0.14x
ỹ = 1.2 + 0.14 (33)
ỹ = 5.82 days
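
As a small sketch, the hospital equation can be wrapped in a function. The coefficients 1.2 and 0.14 are the ones given on the slide; the function name `estimated_stay` is an assumption. The example also shows that each additional year of age adds the slope, 0.14 days, to the estimate.

```python
INTERCEPT = 1.2   # a, from the slide
SLOPE = 0.14      # b, from the slide

def estimated_stay(age):
    """Estimated length of hospital stay in days for a patient of the given age."""
    return INTERCEPT + SLOPE * age

stay_33 = estimated_stay(33)
one_year_effect = estimated_stay(34) - estimated_stay(33)
print(round(stay_33, 2), round(one_year_effect, 2))   # prints 5.82 0.14
```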

Formula to calculate value of a & b is as follows:
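
One standard least-squares form is sketched here. It is the form that the computation in the teacher example follows; the summation notation and n (the number of data pairs) are assumptions of this sketch rather than a reproduction of the original slide.

```latex
b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^{2} - \left(\sum x\right)^{2}},
\qquad
a = \frac{\left(\sum y\right)\left(\sum x^{2}\right) - \left(\sum x\right)\left(\sum xy\right)}{n\sum x^{2} - \left(\sum x\right)^{2}}
  = \bar{y} - b\,\bar{x}
```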

Source: DATAtab. (2021, February 8). Simple and Multiple Linear Regression [Video]. https://www.youtube.com/watch?v=29rjWClT_3U
SIMPLE LINEAR REGRESSION

Regression error

y = a + bx + ε

- ε is the difference between the true value and the estimated value

Source: DATAtab. (2021, February 8). Simple and Multiple Linear Regression [Video]. https://www.youtube.com/watch?v=29rjWClT_3U
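
The regression error can be computed point by point. The sketch below reuses the teacher example data (hours worked and grades) and an assumed helper `fit_line`; each residual is the true grade minus the estimated grade. For a least-squares line with an intercept, the residuals sum to zero up to floating-point rounding.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return (sum_y - slope * sum_x) / n, slope

# Teacher example: hours worked and grade for 10 students.
hours = [5, 3, 3.5, 1, 4.5, 1, 3, 4, 2, 2.5]
grades = [90, 80, 80, 60, 90, 70, 75, 85, 70, 75]

a, b = fit_line(hours, grades)
estimated = [a + b * x for x in hours]
residuals = [y - y_hat for y, y_hat in zip(grades, estimated)]  # true minus estimated

for x, y, e in zip(hours, grades, residuals):
    print(x, y, round(e, 2))
print(round(sum(residuals), 6))
```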
MULTIPLE LINEAR REGRESSION

⊹ Unlike simple linear regression, multiple linear regression can include two or more independent variables.
⊹ The goal is to estimate one variable based on several other variables.
⊹ The variable to be estimated is called the dependent variable (y, the criterion). The variables that are used for prediction are called independent variables (x, the predictors).
⊹ Multiple linear regression is often used in empirical social research, where it is of interest to find out what influence different factors have on a variable.

MULTIPLE LINEAR REGRESSION

SIMPLE LINEAR REGRESSION
ỹ = a + bx

MULTIPLE LINEAR REGRESSION
ỹ = a + b1x1 + b2x2 + … + bkxk

• The coefficients are interpreted similarly to those of the simple linear regression equation.

• If all independent variables are 0, the value a is obtained.

• If an independent variable changes by one unit while the others are held constant, the associated coefficient b indicates by how much the dependent variable changes.

Source: DATAtab. (2021, February 8). Simple and Multiple Linear Regression [Video]. https://www.youtube.com/watch?v=29rjWClT_3U
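
These interpretation rules can be made concrete with a short sketch. The coefficients below are hypothetical values invented for illustration (an hourly salary predicted from weekly working hours and age), and the function name `predict` is an assumption.

```python
def predict(intercept, coefficients, values):
    """Evaluate a + b1*x1 + b2*x2 + ... for one observation."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))

# Hypothetical coefficients, invented for illustration only.
a = 5.0
bs = [0.5, 0.25]   # b1 for weekly working hours, b2 for age

at_zero = predict(a, bs, [0, 0])        # all predictors 0 gives the intercept
base = predict(a, bs, [40, 30])
one_more_hour = predict(a, bs, [41, 30])  # hours up by one unit, age held constant

print(at_zero, one_more_hour - base)   # prints 5.0 0.5
```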
USE IN ORGANIZATION

⊹ In the field of business, regression is widely used. Businessmen are interested in predicting future production, consumption, investment, prices, profits, sales, etc. So, the success of a businessman depends on the correctness of the various estimates that he is required to make. It is also used in sociological study and economic planning to find the projections of population, birth rates, death rates, etc.

HYPOTHESIS in regression

⊹ Problem Statement: Is there a relationship between the education and income of respondents and the number of children in their families?

⊹ Null Hypothesis: There is no relationship between the education and income of respondents and the number of children in their families.

LITERATURE SAMPLES-REGRESSION

Linear Regression
Example Problem
Simple Linear regression equation
Y = β0 + β1X

Example 1:
You have to study the relationship between monthly e-commerce
sales and online advertising costs. You have the survey results for 7
online stores for the last year.

Online Store | Monthly E-commerce Sales (in 1000s), Y | Online Advertising Dollars (in 1000s), X
1 | 368 | 1.7
2 | 340 | 1.5
3 | 665 | 2.8
4 | 954 | 5
5 | 331 | 1.3
6 | 556 | 2.2
7 | 376 | 1.3
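
As a cross-check on the spreadsheet and SPSS methods, the same line can be fitted by hand in code. This is a minimal sketch using the seven stores from the table; `fit_line` and the variable names are assumptions made for illustration, and the result is a plain least-squares fit rather than the output of either tool.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return (sum_y - slope * sum_x) / n, slope

# Example 1 data: advertising dollars (X) and monthly sales (Y), both in 1000s.
ad_dollars = [1.7, 1.5, 2.8, 5, 1.3, 2.2, 1.3]
sales = [368, 340, 665, 954, 331, 556, 376]

intercept, slope = fit_line(ad_dollars, sales)
print(f"Y = {intercept:.2f} + {slope:.2f}X")
```

The slope is read as the estimated change in monthly sales (in 1000s) for each additional 1000 dollars of online advertising.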

1. Method #1 - Scatter Chart with a Trendline

a. Select the two columns of the dataset (x and y), including headers.
b. Click on ‘Insert’, expand the dropdown for ‘Scatter Chart’, and select the ‘Scatter’ thumbnail (first one).
c. A scatter plot will now appear. Under ‘Design’, click ‘Select Data’. When the ‘Select Data Source’ dialog appears, click ‘Add’, then input a Series name and the X-values and Y-values (don’t include the label). Click OK, then OK again.
d. Right-click on any data point and select ‘Add Trendline’. In the ‘Format Trendline’ pane on the right, select ‘Linear Trendline’.
e. Select ‘Display Equation on chart’ and ‘Display R-squared value on chart’.

Method #2: Using Data Analysis

a. Click on ‘Data Analysis’ in the ‘Data’ tab and select ‘Regression’.
b. A regression dialog box will appear. Select the Input Y range and Input X range. In the case of multiple linear regression, we can select more columns of independent variables.
c. Check the ‘Labels’ box to include headers.
d. Choose the desired ‘output’ option.
e. Select the ‘residuals’ checkbox (optional) and click ‘OK’.

Now our regression analysis output will be created in a new
worksheet, stating the Regression Statistics, ANOVA, residuals
and coefficients.

Steps to perform this linear regression in SPSS.

Step 1: Enter the data.

Step 2: Perform linear regression.
Click Analyze > Regression > Linear... on the top menu.

Drag the variable Monthly E-commerce into the box labelled Dependent.
Drag the variable Online Advertising into the box labelled Independent(s).
Then click OK.

Result

Example 2:
You have to examine the relationship between the age and price for used cars
sold in the last year by a car dealership company.
Car Age (in years) Price (in dollars)
4 6300
4 5800
5 5700
5 4500
7 4500
7 4200
8 4100
9 3100
10 2100
11 2500
12 2200
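
Since the spreadsheet method reports an R-squared value, a sketch of how that number can be computed for the car data is given here. It uses the eleven (age, price) pairs from the table; `fit_line` and the variable names are assumptions, and R-squared is computed as 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    return (sum_y - slope * sum_x) / n, slope

# Example 2 data: car age in years and price in dollars.
ages = [4, 4, 5, 5, 7, 7, 8, 9, 10, 11, 12]
prices = [6300, 5800, 5700, 4500, 4500, 4200, 4100, 3100, 2100, 2500, 2200]

intercept, slope = fit_line(ages, prices)

mean_price = sum(prices) / len(prices)
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ages, prices))
ss_total = sum((y - mean_price) ** 2 for y in prices)
r_squared = 1 - ss_residual / ss_total

print(f"price = {intercept:.2f} + ({slope:.2f}) * age, R^2 = {r_squared:.3f}")
```

The negative slope reflects that, in this sample, older cars sold for less.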

Steps to perform this linear regression in SPSS.

Step 1: Enter the data.

Step 2: Perform linear regression.
Click Analyze > Regression > Linear... on the top menu.

Drag the variable Car Price into the box labelled Dependent.
Drag the variable Car Age into the box labelled Independent(s).
Then click OK.

Result:

Result using Excel

MULTIPLE REGRESSION

formula: y = b0 + b1*x1 + b2*x2 + …

Example Problem:

The ABC Corporation is opening new retail sales outlets and they want to staff these stores with
employees most likely to be successful at selling the products. To meet this goal, ABC decides
to study the sales staff at existing stores to determine if intelligence and extroversion (i.e., a
friendly and outgoing personality) predict the sales performance of current employees. ABC's
logic is that if intelligence and extroversion predict sales performance, then a good strategy for
new stores is to hire intelligent extroverts for the sales positions.

To conduct the study, all current retail sales employees at existing stores take psychological tests
designed to measure intelligence and extroversion. Also, past sales performance data is checked
for each employee. In the end, there are three scores for each salesperson:
1. an intelligence score (on a scale of 50-low intelligence to 150-high intelligence),
2. an extroversion score (on a scale of 15-low extroversion to 30-high extroversion), and
3. a sales performance score (expressed as the average dollar amount sold per week).

Sales Person Intelligence Extroversion $ Sales/Week
1 89 21 2625
2 93 24 2700
3 91 21 3100
4 122 23 3150
5 115 27 3175
6 100 18 3100
7 98 19 2700
8 105 16 2475
9 112 23 3625
10 109 28 3525
11 130 20 3225
12 104 25 3450
13 104 20 2425
14 111 26 3025
15 97 28 3625
16 115 29 2750
17 113 25 3150
18 88 23 2600
19 108 19 2525
20 101 16 2650
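
For readers without Excel or SPSS, the fit y = b0 + b1*x1 + b2*x2 for this table can be sketched in plain Python. This is one possible approach (solving the normal equations by Gaussian elimination), not necessarily what either tool does internally; the function names are assumptions made for illustration.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution

def fit_multiple(rows, ys):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]   # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Data from the table: 20 salespeople.
intelligence = [89, 93, 91, 122, 115, 100, 98, 105, 112, 109,
                130, 104, 104, 111, 97, 115, 113, 88, 108, 101]
extroversion = [21, 24, 21, 23, 27, 18, 19, 16, 23, 28,
                20, 25, 20, 26, 28, 29, 25, 23, 19, 16]
sales = [2625, 2700, 3100, 3150, 3175, 3100, 2700, 2475, 3625, 3525,
         3225, 3450, 2425, 3025, 3625, 2750, 3150, 2600, 2525, 2650]

b0, b1, b2 = fit_multiple(list(zip(intelligence, extroversion)), sales)
predicted = [b0 + b1 * i + b2 * e for i, e in zip(intelligence, extroversion)]
residuals = [y - p for y, p in zip(sales, predicted)]

print(f"sales = {b0:.2f} + {b1:.2f}*intelligence + {b2:.2f}*extroversion")
```

The printed coefficients can be compared with the coefficients table in the Excel or SPSS output.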

Method #2: Using Data Analysis

a. Click on ‘Data Analysis’ in the ‘Data’ tab and select ‘Regression’.
b. A regression dialog box will appear. Select the Input Y range (Sales/Week) and Input X range (Intelligence and Extroversion). In the case of multiple linear regression, we can select more columns of independent variables.
c. Check the ‘Labels’ box to include headers.
d. Choose the desired ‘output’ option.

DATA ANALYSIS RESULT IN EXCEL

Steps to perform this multiple linear regression in
SPSS.

Step 1: Enter the data. Step 2: Perform multiple linear regression.

Drag the variable Sales/week into the box labelled Dependent.
Drag the variables intelligence and Extroversion into the box
labelled Independent(s). Then click OK.

Step 3: Interpret the output.

THANK YOU!!!!
