Task 3 Multiple Linear Regression

This document describes multiple linear regression, which analyzes the relationship between a dependent variable and multiple independent variables through equations. It explains that multiple linear regression is a statistical technique for testing hypotheses and causal relationships between variables. It also describes the conditions that must be met to apply this method and the methodological process that includes parameter estimation, hypothesis testing, and model fit evaluation.

MULTIPLE LINEAR REGRESSION

What is multiple linear regression?

It is possible to analyze the relationship between two or more variables through an equation; this is what is called multiple regression or multiple linear regression.

Multiple linear regression is the statistical technique par excellence for testing hypotheses and causal relationships.

Conditions that must be met to apply multiple linear regression

1. The dependent variable (outcome) must be ordinal or scale, that is, the categories of the variable must have an internal order or hierarchy, for example: income level, weight, number of children, etc.
2. The independent variables (causes) must also be ordinal or scale.
3. There are other conditions, such as: the independent variables must not be highly correlated with each other, the relationships between the causes and the outcome must be linear, all variables must follow the normal distribution, and they must have equal variances. These conditions are not so strict, and there are ways to handle the data if any of them is violated.

Other criteria that must be met are the following:

• The variables must make numerical sense.
• There should be no repeated or redundant variables.
• The variables introduced in the model must have some theoretical justification.
• The ratio of explanatory variables to cases must be at least 1 to 10, that is, at least ten cases per explanatory variable.
• The relationship of the explanatory variables with the dependent variable must be linear, that is, proportional.

Multiple Linear Regression Analysis

This analysis allows us to establish the relationship between a dependent variable Y and a set of independent variables (X1, X2, ..., Xk).

Multiple linear regression analysis, unlike simple regression, comes closer to real analysis situations, since social phenomena, facts, and processes are by definition complex and, consequently, must be explained, as far as possible, by the set of variables that directly and indirectly take part in producing them.

With multiple linear regression analysis we can:

• Identify which independent variables (causes) explain a dependent variable (result).
• Compare and verify causal models.
• Predict values of a variable, that is, approximately predict a behavior or state from certain characteristics.

The multiple linear regression model

The multiple linear regression model is identical to the simple linear regression model, the only difference being that more explanatory variables appear:

Simple regression model: Y = β0 + β1X + ε

Multiple regression model: Y = β0 + β1X1 + β2X2 + ... + βkXk + ε

Another way of writing it, in matrix form: Y = Xβ + ε
Meaning of the parameters:

β0 = Mean value of the response variable when X1 = ... = Xk = 0.

Very often, the parameter β0 does not have an intuitive interpretation of interest.

βj = Measures the average change in the response variable when Xj increases by one unit, holding the other explanatory variables fixed (j = 1, ..., k).

The intuitive interpretation of βj (j = 1, ..., k), by contrast, is always of interest.
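This interpretation can be made explicit by writing the conditional mean of the response (a standard formulation, added here only as an illustration):

E[Y | X1 = x1, ..., Xk = xk] = β0 + β1x1 + ... + βkxk

so that βj is the change in E[Y] per unit increase in Xj, with the remaining regressors held fixed.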

Hypothesis
In order to obtain and use statistical tools that allow us to make objective and well-founded decisions, we need the model to satisfy certain hypotheses. These initial hypotheses of the model are the following:

Normality: The observations Yi follow a Normal distribution.

Linearity: The mean values of the response variable depend linearly on the values of X1, ..., Xk: E[Yi] = β0 + β1x1i + ... + βjxji + ... + βkxki.

Homogeneity or equality of variances (homoscedasticity): V(Yi) = σ².


Independence: The observations Yi are independent.

All these hypotheses can be briefly expressed in the following way:

Yi ∼ N(β0 + β1x1i + ... + βjxji + ... + βkxki, σ²), independent.

Absence of multicollinearity: There are no linear relationships between the explanatory variables X1, ..., Xk.

The absence of multicollinearity is a completely new hypothesis, and its meaning is as follows:

On the one hand, if any of the explanatory variables were a linear combination of the others, the model could obviously be simplified. But that is not the most important point.

The practical importance of requiring the absence of multicollinearity comes from the fact that, if any of the explanatory variables is strongly correlated with the others, distortions can appear in the results.

It is important that these initial hypotheses of the model are (approximately) satisfied so that the conclusions we obtain are not nonsense.
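As an illustration only, these hypotheses are usually checked in R on a fitted model; the sketch below assumes a model object mod produced by lm() on a data set mydata (both names are hypothetical) and the optional car package for the multicollinearity and independence checks:

> # mod <- lm(y ~ x1 + x2, data = mydata)   # assumed, previously fitted model
> plot(mod)                      # residual plots: linearity and homoscedasticity
> shapiro.test(residuals(mod))   # normality of the residuals
> library(car)                   # assumes the car package is installed
> vif(mod)                       # variance inflation factors: multicollinearity
> durbinWatsonTest(mod)          # autocorrelation check for independence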

At this point we can ask whether we have enough data (sufficient sample information) to tackle the statistical analysis of this model. The basic rule for answering this is very easy to remember (and to understand): in general, we need at least as many data points as parameters to be estimated in the model. In this model we have:

Number of data points = n
Number of parameters = k + 2 (the coefficients β0, β1, ..., βk plus the variance σ²)

Therefore, we need at least n = k + 2 data points. For example, with k = 2 explanatory variables we estimate 4 parameters, so at least 4 observations are required.


Methodology:
The methodology or work plan that we will follow in the statistical analysis of a multiple regression model is as follows:

(1) Diagnosis of the initial hypotheses of the model.

(2) Point estimation of the model parameters.

(3) Confidence intervals for the model parameters.

(4) Hypothesis tests.

(5) Analysis of variance.

(6) Evaluation of the fit provided by the adjusted regression model.

Another type of methodology: selection of multiple linear regression models based on multi-objective methods

The proposal for selecting multiple linear regression models (MLRM).

Algorithm.

The MERLIND algorithm (Non-Dominated Linear Regression Models) is based on the operating principles of compromise programming, using the L1 and L∞ metrics because they allow the efficient set to be narrowed down by providing, respectively, a range of non-balanced and balanced solutions; it also uses the lexicographic goal programming approach to ensure that the solutions do not deteriorate below certain achievement levels. The steps of the algorithm are the following.

Step 0. Initialization.
a. Start from a dependent variable Y and a set of independent variables Xk suggested by at least one theory that explains the phenomenon.
b. Ask the analyst:
b1. What significance level is to be used for the statistical tests: 1% or 5%?
b2. What are the expected signs in the multiple regression for the k coefficients of the independent variables?
b3. Is there any theoretical restriction to be satisfied among the coefficients? If the answer is affirmative, state the restriction(s).

Go to step 1.

Step 1. Model generation.

a. Optionally, generate a set of transformed variables (logarithms, differences, lags, or combinations of these) based on the original data.

b. Generate the 2^k − 1 possible combinations of variables, including the transformed variables if applicable.

c. Estimate the corresponding regression models, with and without the intercept.

d. For each generated model, record the observed results for the following selection criteria, according to the following indicators (a minimal R sketch of computing several of them follows the list):

• Observed signs of the coefficients: positive or negative.
• Hypothesis tests on individual coefficients: p-value of the t statistic.
• Global significance test: p-value of the F statistic.
• Hypothesis test for restricted models: global F and critical F.
• Adjusted coefficient of determination: adjusted R² value.
• Durbin-Watson test: DW value and p-value.
• AC and PAC bars outside or inside the confidence bands.
• White heteroscedasticity test with error term: n·R² value and p-value.
• Jarque-Bera skewness and kurtosis test: JB statistic value and p-value.
• Normality test (Kolmogorov-Smirnov): K-S statistic value and p-value.
• Multicollinearity test: variance inflation factor value.
• Multicollinearity test: condition index.
• Mallows' Cp criterion: Cp value.
• Schwarz's SIC criterion: SIC value.
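A minimal R sketch of how several of these indicators can be obtained for one fitted model; the model mod, the data mydata and the packages lmtest, tseries and car are assumptions used for illustration, not part of the original text:

> library(lmtest); library(tseries); library(car)   # assumed to be installed
> mod <- lm(y ~ x1 + x2, data = mydata)      # hypothetical model
> summary(mod)                               # t and F p-values, adjusted R-squared
> dwtest(mod)                                # Durbin-Watson test
> bptest(mod)                                # Breusch-Pagan test (close to White's test)
> jarque.bera.test(residuals(mod))           # Jarque-Bera skewness/kurtosis test
> ks.test(scale(residuals(mod)), "pnorm")    # Kolmogorov-Smirnov normality test
> vif(mod)                                   # variance inflation factors
> BIC(mod)                                   # Schwarz criterion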

Go to step 2.

Step 2. Decision matrix.

a. For each model, record the score associated with its degree of compliance with each criterion; for this, use the point conversion matrix shown in Table 3.
b. Organize the model scores in a matrix Zij, as shown in Table 2.

Step 3. Normalized distances.

a. Set the ideal value at 3 points and the anti-ideal at 1 point.

b. Calculate the normalized distances (see the sketch after this step).

c. Add up the normalized distances for the seven blocks of criteria:

c.1. Theoretical coherence: for i = 1, 2, ..., m and j = 1.

c.2. Statistical coherence: for i = 1, 2, ..., m and j = 2, 3, ..., 5.

c.3. Autocorrelation: for i = 1, 2, ..., m and j = 6, 7.

c.4. Heteroscedasticity: for i = 1, 2, ..., m and j = 8.

c.5. Normality: for i = 1, 2, ..., m and j = 9, 10.

c.6. Multicollinearity: for i = 1, 2, ..., m and j = 11, 12.

c.7. Other criteria.

Go to step 4.
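The distance formulas themselves were lost in this copy. As an assumption, compromise programming with an ideal of 3 points and an anti-ideal of 1 point would typically use something of the form

dij = (3 − Zij) / (3 − 1),  i = 1, 2, ..., m,

with each block distance obtained by summing dij over the j indices listed above (for example, statistical coherence: di2 + di3 + di4 + di5).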

Step 4. Compromise set.

a. For the L1 set, the following lexicographic achievement function is defined: these indices yield the multiple regression equations that satisfy theoretical coherence, have a minimal global distance and, at the same time, provide a balanced solution for the rest of the criteria.

Examples:

1. We want to estimate the food expenditure of a family from the information provided by the predictor variables X1, "monthly income", and X2, "number of family members". To do this, a simple random sample of 15 families is collected, whose results appear in the attached table (expenditure and income are given in hundreds of thousands of pesetas).

Expenditure  Income  Size        Expenditure  Income  Size
0.43         2.1     3           1.29         8.9     3
0.31         1.1     4           0.35         2.4     2
0.32         0.9     5           0.35         1.2     4
0.46         1.6     4           0.78         4.7     3
1.25         6.2     4           0.43         3.5     2
0.44         2.3     3           0.47         2.9     3
0.52         1.8     6           0.38         1.4     4
0.29         1.0     5
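As an illustration (not part of the original document), the example can be reproduced in R; the data follow the decimal reconstruction in the table above:

> expenditure <- c(0.43, 0.31, 0.32, 0.46, 1.25, 0.44, 0.52, 0.29,
+                  1.29, 0.35, 0.35, 0.78, 0.43, 0.47, 0.38)
> income <- c(2.1, 1.1, 0.9, 1.6, 6.2, 2.3, 1.8, 1.0,
+             8.9, 2.4, 1.2, 4.7, 3.5, 2.9, 1.4)
> size <- c(3, 4, 5, 4, 4, 3, 6, 5, 3, 2, 4, 3, 2, 3, 4)
> fit <- lm(expenditure ~ income + size)
> summary(fit)   # coefficient estimates and individual t tests
> anova(fit)     # sums of squares used in the ANOVA table below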

From the data in matrix form, the matrices X'X and X'y are obtained and, from them, the estimates β̂ = (X'X)⁻¹X'y.

The linear regression model obtained is:

ŷ = −0.160 + 0.149·x1 + 0.077·x2

From this equation, the predictions ŷi and the associated residuals ei = yi − ŷi are obtained for the sample observations. The fitted value and residual are computed for the first observation and, reasoning in the same way for all the sample points, the full set of predictions and residuals is obtained.

Calculation of scR

The scR can also be calculated in the following way:

scR = Σ yi² − β̂0 Σ yi − β̂1 Σ yi x1i − β̂2 Σ yi x2i = 5.7733 + 0.160 · 8.070 − 0.149 · 32.063 − 0.077 · 28.960 = 0.0721

(small differences appear if the rounded coefficient values are used instead of the full-precision estimates).
The 90% confidence intervals for the model parameters are calculated next.

For the variance σ², using scR/σ² = 0.0721/σ² ~ χ²12 and the quantiles χ²12;0.95 ≈ 21.03 and χ²12;0.05 ≈ 5.23:

0.0034 < σ² < 0.0138

The variances of the model estimators are 0.00816, 0.000099 and 0.00040, from which the standard errors are deduced:

σ̂(β̂0) = √0.00816 = 0.0903
σ̂(β̂1) = √0.000099 = 0.0099
σ̂(β̂2) = √0.00040 = 0.0201

Confidence interval for β0:

−t12;0.05 · 0.0903 < −0.160 − β0 < t12;0.05 · 0.0903

−0.321 < β0 < 0.001


Confidence interval for β1 (income):

−t12;0.05 · 0.0099 < 0.149 − β1 < t12;0.05 · 0.0099

0.1314 < β1 < 0.1666

Test of H0: β1 = 0, "the income variable has no influence" (individual t test): H0 is rejected, since the interval does not contain zero.

Confidence interval for β2 (size):

−t12;0.05 · 0.0201 < 0.077 − β2 < t12;0.05 · 0.0201

0.0412 < β2 < 0.1128

Test of H0: β2 = 0, "the size variable has no influence" (individual t test): H0 is rejected, since the interval does not contain zero.

The ANOVA table is:

Source of variation   Sum of squares   Degrees of freedom   Variances
scE (model)           1.3595           2                    se² = 0.6797
scR (residual)        0.0721           12                   sR² = 0.0060
scG (global)          1.4316           14                   sy² = 0.1023
With these data, the joint F test for the model is obtained.
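As a worked illustration (the value itself is not shown in this copy), the F statistic follows directly from the ANOVA table above:

F = se² / sR² = 0.6797 / 0.0060 ≈ 113.3,

far larger than the critical value F2,12;0.05 ≈ 3.89, so the model is clearly significant.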

The joint F test clearly indicates the influence of the model on the response variable. Therefore, from the individual tests and the joint test we deduce the influence of each of the two regressor variables and the joint influence of the model.
Now the individual F test for the variable x2, size, is calculated; this test is equivalent to the individual t test. To do this, the regression of the expenditure variable on the income variable alone is fitted.

The ANOVA table of this model is:

Source of variation   Sum of squares   Degrees of freedom   Variances
scE (income)          1.2716           1                    se² = 1.2716
scR (residual)        0.1600           13                   sR² = 0.0123
scG (global)          1.4316           14                   sy² = 0.1022

The incremental variability due to the size variable is scE(size | income) = 1.3595 − 1.2716 = 0.0879; this value indicates how much the variability explained by the model increases when the size variable is introduced.

To test whether or not this variable has influence, the incremental F statistic is used, which gives the same p-value as the individual t test (there are small differences due to rounding).
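As a worked check (these figures are computed from the two ANOVA tables above rather than quoted from the original):

F = (1.3595 − 1.2716) / 0.0060 ≈ 14.7,

which essentially coincides with the square of the individual t statistic, t = 0.077 / 0.0201 ≈ 3.83, t² ≈ 14.7.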

Calculation of correlation coefficients:

The coefficient of determination: R² = scE/scG = 1.3595/1.4316 = 0.950.

The multiple correlation coefficient: R = √0.950 = 0.975.

The coefficient of determination adjusted for the degrees of freedom: adjusted R² = 1 − sR²/sy² = 1 − 0.0060/0.1023 = 0.941.

The simple correlation coefficient between the expenditure and income variables is a measure of the linear relationship between expenditure and income. It can also be calculated from the coefficient of determination of the regression of expenditure on income alone, r = √(1.2716/1.4316) = 0.942.

The ANOVA table of this model is:

Source of variation   Sum of squares   Degrees of freedom   Variances
scE (income)          1.2716           1                    se² = 1.2716
scR (residual)        0.1600           13                   sR² = 0.0123
scG (global)          1.4316           14                   sy² = 0.1022
Similarly, the simple correlation coefficient between the expenditure and size variables is obtained.

Partial correlation coefficient between the expenditure and income variables.

Another, more laborious way to calculate this coefficient is the following: the regressions below are fitted and their residuals are saved,

expenditure = 0.6713 − 0.0363 · size + e(expenditure·size)

income = 5.5923 − 0.7615 · size + e(income·size)

The partial correlation coefficient between expenditure and income is then obtained as the simple correlation coefficient between the residuals e(expenditure·size) and e(income·size).

This coefficient measures the relationship between the expenditure and income variables free of the influence of the size variable. Similarly, the partial correlation between expenditure and size is obtained.

Estimation of the conditional mean.

Estimate the average food expenditure of families with an income of x1 = 3.0 and a size of x2 = 4, that is, the conditional mean m_h = E[Y | x1 = 3.0, x2 = 4].

Applying the regression model: m̂_h = −0.160 + 0.149 · 3.0 + 0.077 · 4 ≈ 0.59.

The leverage (influence) value associated with this point is h = 0.07649, so the equivalent sample size is n_h = 1/h = 13.073.

The variance of the estimator is sR² · h = 0.0060 · 0.07649 ≈ 0.00046.

And a 90% confidence interval for m_h is m̂_h ± t12;0.05 · √(sR² · h).

Prediction of an observation.
The Pérez family has an income of x1 = 3.0 and a size of x2 = 4. What will its food expenditure be?
Applying the estimated regression model gives the same point value as before, ŷ_h ≈ 0.59.

The variance of the prediction is

sR² · (1 + h) = 0.0060 · (1 + 0.07649) ≈ 0.0065,

so its standard deviation is √0.0065 ≈ 0.0803.

And a 90% prediction interval is ŷ_h ± t12;0.05 · 0.0803.
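Putting in numbers, with t12;0.05 = 1.782 and the point prediction reconstructed above (these figures are illustrative, since the original values were lost in this copy): 0.59 ± 1.782 · 0.0803 ≈ 0.59 ± 0.14, that is, roughly (0.45, 0.73) hundreds of thousands of pesetas.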

Some charts that help in analyzing the problem (not reproduced in this copy) are the component (partial) plots and the residual plots.
2. Measurements on 12 individuals provide data on their weight, height, waist circumference (in cm), and age.

We are going to fit a new linear regression model (multiple, in this case) that incorporates the information from these new variables. First, we create two numerical vectors, one for each new variable (the weight and height vectors are assumed to have been created in the earlier, simple regression example):

> waist <- c(62, 75, 60, 71, 66, 62, 79, 74, 70, 66, 71, 69)
> age <- c(25, 31, 29, 64, 44, 41, 37, 35, 34, 29, 19, 50)

And we group the information on the 4 variables into a data frame that we will call 'datos':

> datos <- data.frame(weight, height, waist, age)

Let us check that the data frame we have created does indeed contain the information on the 4 variables:

> head(datos)
  weight height waist age
1     74    168    62  25
2     92    196    75  31
3     63    170    60  29
4     72    175    71  64
5     58    162    66  44
6     78    169    62  41

Next, we fit the multiple linear regression model:

> reg_lin_mul <- lm(weight ~ height + waist + age)


> summary(reg_lin_mul)

Call:
lm(formula = weight ~ height + waist + age)

Residuals:
    Min      1Q  Median      3Q     Max
-7.5822 -2.8758 -0.6746  2.6828  9.9842

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -78.03017   35.37744  -2.206   0.0585 .
height        0.93629    0.34941   2.680   0.0279 *
waist        -0.13261    0.60578  -0.219   0.8322
age          -0.09672    0.15806  -0.612   0.5576
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.024 on 8 degrees of freedom
Multiple R-squared: 0.7464, Adjusted R-squared: 0.6513
F-statistic: 7.85 on 3 and 8 DF, p-value: 0.009081

The model could therefore be written as:

weight = −78.03 + 0.936 · height − 0.133 · waist − 0.097 · age
Both the interpretation and the checking of the significance of the parameters are carried out in a way similar to the case with a single independent variable. Likewise, the validation is carried out in the same way as for simple linear regression.
As for graphical representations, scatterplots of the dependent variable against each of the independent variables can be produced with the plot command, as shown earlier.
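For instance, a minimal sketch of such plots (the exact figures shown in the original are not reproduced here):

> pairs(datos)                      # scatterplot matrix of all four variables
> plot(datos$height, datos$weight)  # weight against one of the predictors
> plot(reg_lin_mul)                 # standard residual diagnostics for the fitted model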
