Regression analysis
In statistical modeling, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables; that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution. A related but distinct approach is necessary condition analysis (NCA),[1] which estimates the maximum (rather than average) value of the dependent variable for a given value of the independent variable (ceiling line rather than central line) in order to identify what value of the independent variable is necessary but not sufficient for a given value of the dependent variable.
Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. However, this can lead to illusions or false relationships, so caution is advisable;[2] for example, correlation does not imply causation.
Many techniques for carrying out regression analysis have been developed. Familiar methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data. Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional.

The performance of regression analysis methods in practice depends on the form of the data-generating process, and how it relates to the regression approach being used. Since the true form of the data-generating process is generally not known, regression analysis often depends to some extent on making assumptions about this process. These assumptions are sometimes testable if a sufficient quantity of data is available. Regression models for prediction are often useful even when the assumptions are moderately violated, although they may not perform optimally. However, in many applications, especially with small effects or questions of causality based on observational data, regression methods can give misleading results.[3][4]
In a narrower sense, regression may refer specifically to the estimation of continuous response variables, as opposed to the discrete response variables used in classification.[5] The case of a continuous output variable may be more specifically referred to as metric regression to distinguish it from related problems.[6]
1 History
The earliest form of regression was the method of least squares, which was published by Legendre in 1805,[7] and by Gauss in 1809.[8] Legendre and Gauss both applied the method to the problem of determining, from astronomical observations, the orbits of bodies about the Sun (mostly comets, but also later the then newly discovered minor planets). Gauss published a further development of the theory of least squares in 1821,[9] including a version of the Gauss–Markov theorem.
The term "regression" was coined by Francis Galton in the nineteenth century to describe a biological phenomenon. The phenomenon was that the heights of descendants of tall ancestors tend to regress down towards a normal average (a phenomenon also known as regression toward the mean).[10][11] For Galton, regression had only this biological meaning,[12][13] but his work was later extended by Udny Yule and Karl Pearson to a more general statistical context.[14][15]
In the work of Yule and Pearson, the joint distribution of the response and explanatory variables is assumed to be Gaussian. This assumption was weakened by R.A. Fisher in his works of 1922 and 1925.[16][17][18] Fisher assumed that the conditional distribution of the response variable is Gaussian, but the joint distribution need not be. In this respect, Fisher's assumption is closer to Gauss's formulation of 1821.

In the 1950s and 1960s, economists used electromechanical desk calculators to calculate regressions. Before 1970, it sometimes took up to 24 hours to receive the result from one regression.[19]
Regression methods continue to be an area of active research. In recent decades, new methods have been developed for robust regression, regression involving correlated responses such as time series and growth curves, regression in which the predictor (independent variable) or response variables are curves, images, graphs, or other complex data objects, regression methods accommodating various types of missing data, nonparametric regression, Bayesian methods for regression, regression in which the predictor variables are measured with error, regression with more predictor variables than observations, and causal inference with regression.
2 Regression models
Regression models involve the following variables:

- The unknown parameters, denoted as β, which may represent a scalar or a vector.
- The independent variables, X.
- The dependent variable, Y.

In various fields of application, different terminologies are used in place of dependent and independent variables.

A regression model relates Y to a function of X and β:

Y ≈ f(X, β).

The approximation is usually formalized as E(Y | X) = f(X, β). To carry out regression analysis, the form of the function f must be specified. Sometimes the form of this function is based on knowledge about the relationship between Y and X that does not rely on the data. If no such knowledge is available, a flexible or convenient form for f is chosen.

Assume now that the vector of unknown parameters β is of length k. In order to perform a regression analysis the user must provide information about the dependent variable Y:

- If N data points of the form (Y, X) are observed, where N < k, most classical approaches to regression analysis cannot be performed: since the system of equations defining the regression model is underdetermined, there are not enough data to recover β.

- If exactly N = k data points are observed, and the function f is linear, the equations Y = f(X, β) can be solved exactly rather than approximately. This reduces to solving a set of N equations with N unknowns (the elements of β), which has a unique solution as long as the X are linearly independent. If f is nonlinear, a solution may not exist, or many solutions may exist.
- The most common situation is where N > k data points are observed. In this case, there is enough information in the data to estimate a unique value for β that best fits the data in some sense, and the regression model when applied to the data can be viewed as an overdetermined system in β.
In the last case, the regression analysis provides the tools for:

1. Finding a solution for the unknown parameters β that will, for example, minimize the distance between the measured and predicted values of the dependent variable Y (also known as the method of least squares; see the sketch below).

2. Under certain statistical assumptions, the regression analysis uses the surplus of information to provide statistical information about the unknown parameters β and the predicted values of the dependent variable Y.
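In the common N > k case, the least-squares fit described in point 1 can be carried out with standard numerical routines. The following minimal Python/NumPy sketch is illustrative only: the synthetic data, the linear form chosen for f(X, β), and the parameter values are assumptions made for the example, not part of the article.

```python
import numpy as np

# Illustrative setup: N = 50 observations, k = 3 unknown parameters.
rng = np.random.default_rng(0)
N, k = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])  # N x k design matrix
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=0.1, size=N)            # Y ≈ f(X, β) = Xβ plus noise

# With N > k the system is overdetermined; least squares chooses the β estimate
# that minimizes the sum of squared differences between Y and f(X, β).
beta_hat, rss, rank, _ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)  # close to beta_true
```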
2.1 Necessary number of independent measurements
Consider a regression model which has three unknown parameters, β0, β1, and β2. Suppose an experimenter performs 10 measurements all at exactly the same value of the independent variable vector X (which contains the independent variables X1, X2, and X3). In this case, regression analysis fails to give a unique set of estimated values for the three unknown parameters; the experimenter did not provide enough information. The best one can do is to estimate the average value and the standard deviation of the dependent variable Y. Similarly, measuring at two different values of X would give enough data for a regression with two unknowns, but not for three or more unknowns.

If the experimenter had performed measurements at three different values of the independent variable vector X, then regression analysis would provide a unique set of estimates for the three unknown parameters in β.

In the case of general linear regression, the above statement is equivalent to the requirement that the matrix XᵀX is invertible.
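A minimal NumPy sketch of this identifiability problem (the particular numbers are invented for illustration): when all 10 measurements are taken at the same value of X, the matrix XᵀX is singular, whereas measurements spread over three linearly independent values of X make it invertible.

```python
import numpy as np

# 10 measurements, all at exactly the same value of (X1, X2, X3).
X_same = np.tile([1.0, 2.0, 3.0], (10, 1))
print(np.linalg.matrix_rank(X_same.T @ X_same))      # 1, not 3: X'X is singular

# 10 measurements spread over three linearly independent values of (X1, X2, X3).
X_spread = np.array([[1.0, 2.0, 3.0],
                     [2.0, 1.0, 0.5],
                     [0.0, 1.0, 2.0]] * 4)[:10]
print(np.linalg.matrix_rank(X_spread.T @ X_spread))  # 3: X'X is invertible
```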
2.2 Statistical assumptions
When the number of measurements, N, is larger than the number of unknown parameters, k, and the measurement errors are normally distributed, then the excess of information contained in (N − k) measurements is used to make statistical predictions about the unknown parameters. This excess of information is referred to as the degrees of freedom of the regression.
3 Underlying assumptions
Classical assumptions for regression analysis include:

- The sample is representative of the population for the inference prediction.
- The error is a random variable with a mean of zero conditional on the explanatory variables.
- The independent variables are measured with no error. (Note: if this is not so, modeling may be done instead using errors-in-variables model techniques.)
- The independent variables (predictors) are linearly independent, i.e. it is not possible to express any predictor as a linear combination of the others.
- The errors are uncorrelated, that is, the variance–covariance matrix of the errors is diagonal and each non-zero element is the variance of the error.
- The variance of the error is constant across observations (homoscedasticity). If not, weighted least squares or other methods might instead be used.

These are sufficient conditions for the least-squares estimator to possess desirable properties; in particular, these assumptions imply that the parameter estimates will be unbiased, consistent, and efficient in the class of linear unbiased estimators. It is important to note that actual data rarely satisfy the assumptions. That is, the method is used even though the assumptions are not true. Variation from the assumptions can sometimes be used as a measure of how far the model is from being useful. Many of these assumptions may be relaxed in more advanced treatments. Reports of statistical analyses usually include analyses of tests on the sample data and methodology for the fit and usefulness of the model.
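Some of these assumptions can be probed informally once a model has been fitted, by inspecting the residuals. The sketch below uses NumPy only, with made-up data, and is meant as an illustration rather than a formal test: it checks that the residuals have roughly zero mean, are roughly uncorrelated with the predictor, and have a similar spread across the range of x (homoscedasticity).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 3.0 + 1.5 * x + rng.normal(scale=1.0, size=200)   # synthetic data

# Fit a straight line and form the residuals e_i = y_i - yhat_i.
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

print(resid.mean())                              # close to 0 (zero-mean errors)
print(np.corrcoef(x, resid)[0, 1])               # close to 0 (errors unrelated to x)
print(resid[x < 5].std(), resid[x >= 5].std())   # similar spread suggests homoscedasticity
```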
Assumptions include the geometrical support of the variables.[20] Independent and dependent variables often refer to values measured at point locations. There may be spatial trends and spatial autocorrelation in the variables that violate statistical assumptions of regression. Geographically weighted regression is one technique to deal with such data.[21] Also, variables may include values aggregated by areas. With aggregated data the modifiable areal unit problem can cause extreme variation in regression parameters.[22] When analyzing data aggregated by political boundaries, postal codes or census areas, results may be very distinct with a different choice of units.
4 Linear regression

Main article: Linear regression
See simple linear regression for a derivation of these formulas and a numerical example.

[Figure: Illustration of linear regression on a data set.]

In linear regression, the model specification is that the dependent variable yi is a linear combination of the parameters (but need not be linear in the independent variables). For example, in simple linear regression for modeling n data points there is one independent variable, xi, and two parameters, β0 and β1:

yi = β0 + β1xi + εi,  i = 1, …, n.

In multiple linear regression, there are several independent variables or functions of independent variables. Adding a term in xi² to the preceding regression gives:

yi = β0 + β1xi + β2xi² + εi,  i = 1, …, n.

This is still linear regression; although the expression on the right hand side is quadratic in the independent variable xi, it is linear in the parameters β0, β1 and β2. In both cases, εi is an error term and the subscript i indexes a particular observation.

Returning our attention to the straight line case: given a random sample from the population, we estimate the population parameters and obtain the sample linear regression model

ŷi = β̂0 + β̂1xi.

The residual, ei = yi − ŷi, is the difference between the value of the dependent variable predicted by the model, ŷi, and the true value of the dependent variable, yi. One method of estimation is ordinary least squares. This method obtains parameter estimates that minimize the sum of squared residuals, SSE,[23][24] also sometimes denoted RSS:

SSE = Σ ei²  (summed over i = 1, …, n).

Minimization of this function results in a set of normal equations, a set of simultaneous linear equations in the parameters, which are solved to yield the parameter estimators, β̂0 and β̂1. In the case of simple regression, the formulas for the least squares estimates are

β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²  and  β̂0 = ȳ − β̂1x̄,

where x̄ is the mean (average) of the x values and ȳ is the mean of the y values.
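As a worked sketch of these closed-form estimates (Python with NumPy; the five data points are invented purely for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # roughly y = 2x

x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope estimate
b0 = y_bar - b1 * x_bar                                            # intercept estimate
print(b0, b1)   # fitted line: yhat = b0 + b1 * x
```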
Under the assumption that the population error term has a constant variance, the estimate of that variance is given by

σ̂ε² = SSE / (n − 2).

This is called the mean square error (MSE) of the regression. The denominator is the sample size reduced by the number of model parameters estimated from the same data: (n − p) for p regressors, or (n − p − 1) if an intercept is used.[25] In this case, p = 1 so the denominator is n − 2.

The standard errors of the parameter estimates are given by

se(β̂0) = σ̂ε √( 1/n + x̄² / Σ(xi − x̄)² )  and  se(β̂1) = σ̂ε √( 1 / Σ(xi − x̄)² ).

Under the further assumption that the population error term is normally distributed, the researcher can use these estimated standard errors to create confidence intervals and conduct hypothesis tests about the population parameters.

4.1 General linear model

For a derivation, see linear least squares. For a numerical example, see linear regression.

In the more general multiple regression model, there are p independent variables:

yi = β1xi1 + β2xi2 + … + βpxip + εi,

where xij is the i-th observation on the j-th independent variable. If the first independent variable takes the value 1 for all i (xi1 = 1), then β1 is called the regression intercept.

The least squares parameter estimates are obtained from p normal equations. The residual can be written as

εi = yi − β̂1xi1 − … − β̂pxip.

The normal equations are

Σi Σk xij xik β̂k = Σi xij yi,  for j = 1, …, p  (sums over i = 1, …, n and k = 1, …, p).

In matrix notation, the normal equations are written as

(XᵀX) β̂ = XᵀY,

where the ij element of X is xij, the i element of the column vector Y is yi, and the j element of β̂ is β̂j. Thus X is n×p, Y is n×1, and β̂ is p×1. The solution is

β̂ = (XᵀX)⁻¹ XᵀY.
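A minimal NumPy sketch of the matrix form (data invented for illustration). Forming (XᵀX)⁻¹ explicitly is shown only to mirror the formula above; in practice a numerically stabler routine such as np.linalg.lstsq is usually preferred.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # first column of 1s -> intercept
beta = np.array([0.5, 2.0, -1.0])
Y = X @ beta + rng.normal(scale=0.2, size=n)

# Solve the normal equations (X'X) beta_hat = X'Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)

# Equivalent result via a least-squares solver (better conditioned).
beta_hat2, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat2)
```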
4.2 Diagnostics

Main article: Regression diagnostics
See also: Category:Regression diagnostics

Once a regression model has been constructed, it may be important to confirm the goodness of fit of the model and the statistical significance of the estimated parameters. Commonly used checks of goodness of fit include the R-squared, analyses of the pattern of residuals and hypothesis testing. Statistical significance can be checked by an F-test of the overall fit, followed by t-tests of individual parameters.

Interpretations of these diagnostic tests rest heavily on the model assumptions. Although examination of the residuals can be used to invalidate a model, the results of a t-test or F-test are sometimes more difficult to interpret if the model's assumptions are violated. For example, if the error term does not have a normal distribution, in small samples the estimated parameters will not follow normal distributions and complicate inference. With relatively large samples, however, a central limit theorem can be invoked such that hypothesis testing may proceed using asymptotic approximations.

4.3 Limited dependent variables

The phrase "limited dependent" is used in econometric statistics for categorical and constrained variables.
The response variable may be non-continuous (limited to lie on some subset of the real line). For binary (zero or one) variables, if analysis proceeds with least-squares linear regression, the model is called the linear probability model. Nonlinear models for binary dependent variables include the probit and logit model. The multivariate probit model is a standard method of estimating a joint relationship between several binary dependent variables and some independent variables. For categorical variables with more than two values there is the multinomial logit. For ordinal variables with more than two values, there are the ordered logit and ordered probit models. Censored regression models may be used when the dependent variable is only sometimes observed, and Heckman correction type models may be used when the sample is not randomly selected from the population of interest. An alternative to such procedures is linear regression based on polychoric correlation (or polyserial correlations) between the categorical variables. Such procedures differ in the assumptions made about the distribution of the variables in the population. If the variable is positive with low values and represents the repetition of the occurrence of an event, then count models like the Poisson regression or the negative binomial model may be used instead.
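As an illustration of one such model, the sketch below fits a logit model to a synthetic binary response using scikit-learn. The library choice and the data-generating values are assumptions made for the example; the article does not prescribe any particular software.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))                                      # two predictors (synthetic)
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * X[:, 0] - 1.0 * X[:, 1])))   # true success probabilities
y = rng.binomial(1, p)                                             # binary (0/1) dependent variable

model = LogisticRegression().fit(X, y)
print(model.intercept_, model.coef_)   # estimated logit coefficients
print(model.predict_proba(X[:3]))      # fitted probabilities, bounded in (0, 1)
```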
5 Interpolation and extrapolation

Regression models predict a value of the Y variable given known values of the X variables. Prediction within the range of values in the dataset used for model-fitting is known informally as interpolation. Prediction outside this range of the data is known as extrapolation. Performing extrapolation relies strongly on the regression assumptions. The further the extrapolation goes outside the data, the more room there is for the model to fail due to differences between the assumptions and the sample data or the true values.

It is generally advised that when performing extrapolation, one should accompany the estimated value of the dependent variable with a prediction interval that represents the uncertainty. Such intervals tend to expand rapidly as the values of the independent variable(s) move outside the range covered by the observed data.

For such reasons and others, some tend to say that it might be unwise to undertake extrapolation.[26] However, this does not cover the full set of modelling errors that may be being made: in particular, the assumption of a particular form for the relation between Y and X. A properly conducted regression analysis will include an assessment of how well the assumed form is matched by the observed data, but it can only do so within the range of values of the independent variables actually available. This means that any extrapolation is particularly reliant on the assumptions being made about the structural form of the regression relationship. Best-practice advice here is that a linear-in-variables and linear-in-parameters relationship should not be chosen simply for computational convenience, but that all available knowledge should be deployed in constructing a regression model. If this knowledge includes the fact that the dependent variable cannot go outside a certain range of values, this can be made use of in selecting the model even if the observed dataset has no values particularly near such bounds. The implications of this step of choosing an appropriate functional form for the regression can be great when extrapolation is considered. At a minimum, it can ensure that any extrapolation arising from a fitted model is realistic (or in accord with what is known).

6 Nonlinear regression

Main article: Nonlinear regression

When the model function is not linear in the parameters, the sum of squares must be minimized by an iterative procedure. This introduces many complications, which are summarized in Differences between linear and non-linear least squares.
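As an illustration of such an iterative fit, the sketch below uses scipy.optimize.curve_fit on an exponential-decay model; the model form, the data, and the starting values are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b, c):
    # A model that is nonlinear in the parameter b.
    return a * np.exp(-b * x) + c

rng = np.random.default_rng(4)
x = np.linspace(0, 4, 60)
y = f(x, 2.5, 1.3, 0.5) + rng.normal(scale=0.05, size=x.size)

# curve_fit iteratively minimizes the sum of squared residuals from an initial guess p0.
params, cov = curve_fit(f, x, y, p0=[1.0, 1.0, 0.0])
print(params)   # approximately (2.5, 1.3, 0.5)
```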
7 Power and sample size calculations
There are no generally agreed methods for relating the number of observations versus the number of independent variables in the model. One rule of thumb suggested by Good and Hardin is N = mⁿ, where N is the sample size, n is the number of independent variables and m is the number of observations needed to reach the desired precision if the model had only one independent variable.[27] For example, a researcher is building a linear regression model using a dataset that contains 1000 patients (N). If the researcher decides that five observations are needed to precisely define a straight line (m), then the maximum number of independent variables the model can support is 4, because log 1000 / log 5 ≈ 4.29.
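Equivalently, the largest number of predictors supported by this rule of thumb is the largest n with mⁿ ≤ N, i.e. n = ⌊log N / log m⌋. A quick check of the worked example above:

```python
import math

N, m = 1000, 5                                  # sample size; observations needed per predictor
print(math.log(N) / math.log(m))                # ≈ 4.29
print(math.floor(math.log(N) / math.log(m)))    # at most 4 independent variables
```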
8 Other methods

Although the parameters of a regression model are usually estimated using the method of least squares, other methods which have been used include:

- Bayesian methods, e.g. Bayesian linear regression
- Percentage regression, for situations where reducing percentage errors is deemed more appropriate[28]
- Least absolute deviations, which is more robust in the presence of outliers, leading to quantile regression (see the sketch after this list)
- Nonparametric regression, which requires a large number of observations and is computationally intensive
- Distance metric learning, which is learned by the search of a meaningful distance metric in a given input space[29]
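As a rough sketch of the least-absolute-deviations idea from the list above (SciPy assumed available; the data, including the injected outliers, are invented for illustration), a general-purpose optimizer can minimize the sum of absolute residuals instead of the sum of squares:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(size=50)
y[::10] += 15.0                       # a few large outliers

def sad(beta):
    # Sum of absolute deviations for the line y = beta[0] + beta[1] * x.
    return np.sum(np.abs(y - (beta[0] + beta[1] * x)))

lad = minimize(sad, x0=[0.0, 0.0], method="Nelder-Mead")
print(lad.x)                 # [intercept, slope], relatively insensitive to the outliers
print(np.polyfit(x, y, 1))   # [slope, intercept] from least squares, pulled toward the outliers
```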
9 Software

Main article: List of statistical packages

All major statistical software packages perform least squares regression analysis and inference. Simple linear regression and multiple regression using least squares can be done in some spreadsheet applications and on some calculators. While many statistical software packages can perform various types of nonparametric and robust regression, these methods are less standardized; different software packages implement different methods, and a method with a given name may be implemented differently in different packages. Specialized regression software has been developed for use in fields such as survey analysis and neuroimaging.
10 See also

- Curve fitting
- Estimation Theory
- Forecasting
- Fraction of variance unexplained
- Function approximation
- Generalized linear models
- Kriging (a linear least squares estimation algorithm)
- Local regression
- Modifiable areal unit problem
- Multivariate adaptive regression splines
- Multivariate normal distribution
- Pearson product-moment correlation coefficient
- Prediction interval
- Regression validation
- Robust regression
- Segmented regression
- Signal processing
- Stepwise regression
- Trend estimation

11 References

[1] Necessary Condition Analysis

[2] Armstrong, J. Scott (2012). "Illusions in Regression Analysis". International Journal of Forecasting (forthcoming). 28 (3): 689. doi:10.1016/j.ijforecast.2012.02.001.

[3] David A. Freedman, Statistical Models: Theory and Practice, Cambridge University Press (2005)

[4] R. Dennis Cook; Sanford Weisberg, "Criticism and Influence Analysis in Regression", Sociological Methodology, Vol. 13 (1982), pp. 313–361
[5] Christopher M. Bishop (2006). Pattern Recognition and
Machine Learning. Springer. p. 3. Cases [...] in which
the aim is to assign each input vector to one of a finite
number of discrete categories, are called classification
problems. If the desired output consists of one or more
continuous variables, then the task is called regression.
[6] Waegeman, …; …, Luc (2008). "… learning". doi:10.1016/j.patrec.2007.07.019.
[7] A.M. Legendre. Nouvelles méthodes pour la détermination des orbites des comètes, Firmin Didot, Paris, 1805. "Sur la Méthode des moindres quarrés" appears as an appendix.
[8] C.F. Gauss. Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientum. (1809)

[9] C.F. Gauss. Theoria combinationis observationum erroribus minimis obnoxiae. (1821/1823)
[10] Mogull, Robert G. (2004). Second-Semester
Applied Statistics. Kendall/Hunt Publishing
Company. p. 59. ISBN 0-7575-1181-3.
[11] Galton, Francis (1989). "…tion (reprinted 1989)". Institute of Mathematical Statistics. doi:10.1214/ss/1177012581. JSTOR 2245330.
[12] Francis Galton. "Typical laws of heredity", Nature 15 (1877), 492–495, 512–514, 532–533. (Galton uses the term "reversion" in this paper, which discusses the size of peas.)
[13] Francis Galton. Presidential address, Section H, Anthropology. (1885) (Galton uses the term regression in this
paper, which discusses the height of humans.)
[14] Yule, G. Udny (1897). "On the Theory of Correlation". Journal of the Royal Statistical Society. Blackwell Publishing. 60 (4): 812–54. doi:10.2307/2979746. JSTOR 2979746.

[15] Pearson, Karl; Yule, G.U.; Blanchard, Norman; Lee, Alice (1903). "The Law of Ancestral Heredity". Biometrika. Biometrika Trust. 2 (2): 211–236. doi:10.1093/biomet/2.2.211. JSTOR 2331683.

[16] Fisher, R.A. (1922). "The goodness of fit of regression formulae, and the distribution of regression coefficients". Journal of the Royal Statistical Society. Blackwell Publishing. 85 (4): 597–612. doi:10.2307/2341124. JSTOR 2341124.
[17] Ronald A. Fisher (1954). Statistical Methods for
Research Workers (Twelfth ed.). Edinburgh:
Oliver and Boyd. ISBN 0-05-002170-2.
[18] Aldrich, John (2005). "Fisher and Regression". doi:10.1214/088342305000000331. JSTOR 20061201.
[19] Rodney Ramcharan. "Regressions: Why Are Economists Obsessed with Them?" March 2006. Accessed 2011-12-03.
[20] N. Cressie (1996). "Change of Support and the Modifiable Areal Unit Problem". Geographical Systems 3: 159–180.
[21] Fotheringham, A. Stewart; Brunsdon, Chris;
Charlton, Martin (2002). Geographically weighted
regression: the analysis of spatially varying
relationships (Reprint ed.). Chichester, England:
John Wiley. ISBN 978-0-471-49616-8.
[22] Fotheringham, A.S.; Wong, D.W.S. (1 January 1991). "The modifiable areal unit problem in multivariate statistical analysis". Environment and Planning A. 23 (7): 1025–1044. doi:10.1068/a231025.
[23] M. H. Kutner, C. J. Nachtsheim, and J. Neter
(2004), Applied Linear Regression Models, 4th
ed., McGraw-Hill/Irwin, Boston (p. 25)
[24] N. Ravishankar and D. K. Dey (2002), A First
Course in Linear Model Theory, Chapman and
Hall/CRC, Boca Raton (p. 101)
[25] Steel, R.G.D., and Torrie, J.H., Principles and Procedures of Statistics with Special Reference to the Biological Sciences, McGraw Hill, 1960, page 288.
[26] Chiang, C.L. (2003). Statistical methods of analysis, World Scientific. ISBN 981-238-310-7, page 274, section 9.7.4 "interpolation vs extrapolation".
[27] Good, P. I.; Hardin, J. W. (2009). Common Errors in
Statistics (And How to Avoid Them) (3rd ed.). Hoboken,
New Jersey: Wiley. p. 211. ISBN 978-0-470-45798-6.
[28] Tofallis, C. (2009). "Least Squares Percentage Regression". Journal of Modern Applied Statistical Methods. 7: 526–534. doi:10.2139/ssrn.1406472.

[29] YangJing Long (2009). "Human age estimation by metric learning for regression problems" (PDF). Proc. International Conference on Computer Analysis of Images and Patterns: 74–82.
12 Further reading
William H. Kruskal and Judith M. Tanur, ed. (1978), "Linear Hypotheses", International Encyclopedia of Statistics. Free Press, v. 1:
  Evan J. Williams, "I. Regression", pp. 523–41.
  Julian C. Stanley, "II. Analysis of Variance", pp. 541–554.

Lindley, D.V. (1987). "Regression and correlation analysis", New Palgrave: A Dictionary of Economics, v. 4, pp. 120–23.

Birkes, David and Dodge, Y., Alternative Methods of Regression. ISBN 0-471-56881-3

Chatfield, C. (1993) "Calculating Interval Forecasts", Journal of Business and Economic Statistics, 11, pp. 121–135.
Draper, N.R.; Smith, H. (1998). Applied
Regression Analysis (3rd ed.). John Wiley.
ISBN 0-471-17082-8.
Fox, J. (1997). Applied Regression Analysis,
Linear Models and Related Methods. Sage
Härdle, W., Applied Nonparametric Regression (1990), ISBN 0-521-42950-1
Meade, N. and T. Islam (1995) "Prediction Intervals for Growth Curve Forecasts", Journal of Forecasting, 14, pp. 413–430.

A. Sen, M. Srivastava, Regression Analysis: Theory, Methods, and Applications, Springer-Verlag, Berlin, 2011 (4th printing).
T. Strutz: Data Fitting and Uncertainty (A practical
introduction to weighted least squares and beyond).
Vieweg+Teubner, ISBN 978-3-8348-1022-9.
Malakooti, B. (2013). Operations and
Production Systems with Multiple Objectives.
John Wiley & Sons.
13 External links
Hazewinkel, Michiel, ed. (2001), "Regression analysis", Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4

Earliest Uses: Regression (basic history and references)

Regression of Weakly Correlated Data (how linear regression mistakes can appear when the Y-range is much smaller than the X-range)