Basic Econometrics Notes
The two-variable linear model, or simple regression analysis, is used for testing hypotheses about the
relationship between a dependent variable Y and an independent or explanatory variable X and for
prediction. Since the points are unlikely to fall precisely on the line, the exact linear relationship
includes a random disturbance, error, or stochastic term, ui. This results in the following equation:
Yi = b0 + b1Xi + ui
A key assumption of the model is homoskedasticity: the error term has constant variance given any
value of the explanatory variable.
The ordinary least-squares (OLS) method is a technique for fitting the "best" straight line to the sample
of XY observations. It involves minimizing the sum of the squared (vertical) deviations of the points from
the line:
Σ ei2 = Σ (Yi − Y^i)2
where Yi refers to the actual observations, and Y^i refers to the corresponding fitted values. Their
difference is the residual, ei = Yi − Y^i.
Solving this minimization problem with respect to b0 and b1, we get the following results:
b^1 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)2,   b^0 = Ȳ − b^1 X̄
where X̄ and Ȳ denote the sample means of X and Y.
Examples
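As an illustration, the regression can be estimated numerically. A minimal sketch in Python, assuming the numpy and statsmodels packages are available; the data, variable names and values below are hypothetical:

import numpy as np
import statsmodels.api as sm

# Hypothetical data: X = years of schooling, Y = hourly wage
X = np.array([8, 10, 11, 12, 12, 13, 14, 16, 16, 18])
Y = np.array([6.5, 7.1, 8.0, 8.2, 9.1, 9.0, 10.4, 11.8, 12.1, 13.5])

X_design = sm.add_constant(X)          # adds the intercept column
results = sm.OLS(Y, X_design).fit()    # minimizes the sum of squared residuals

print(results.params)     # b0 (intercept) and b1 (slope)
print(results.resid)      # residuals ei = Yi - Y^i
print(results.summary())  # standard errors, t statistics, p-values, R-squared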
Significance Testing
We need to test whether the obtained parameters are statistically significant, i.e., whether the estimated
effects are unlikely to be due to chance alone. In order to test for the statistical significance of the
parameter estimates of the regression, the variance of the estimates of b0 and b1 is
required:
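Under the classical assumptions these take the standard textbook form, with σ2 denoting the error variance (estimated by σ^2 = Σ ei2 / (n − 2)):
var(b^1) = σ2 / Σ (Xi − X̄)2
var(b^0) = σ2 Σ Xi2 / [n Σ (Xi − X̄)2]
The standard errors used below are the square roots of these variances with σ2 replaced by its estimate.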
A t test is a statistical test that is used to compare the means of two groups. It is often used in
hypothesis testing to determine whether a process or treatment actually has an effect on the population
of interest, or whether two groups are different from one another.
To perform a t-test of significance on a parameter estimate of the regression equation, the following
steps can be performed:
1) Calculate the t-statistic: t = (β^ − β0) / SE(β^), where β^ is the estimated parameter, β0 is the
value we are testing against, and SE(β^) is the standard error of the estimate. In most cases β0 is
taken as 0, to test whether the variable should be included in the model or not.
2) The t-score is compared against the critical values of the t-distribution at the chosen significance
level. A confidence interval is a range of values that is likely to contain an unknown population
parameter: if you draw a random sample many times, a certain percentage of the resulting confidence
intervals will contain the true parameter value. This percentage is the confidence level, equal to
1 − α, where α is the significance level.
3) If the absolute value of the t-score is greater than the critical value, then the null hypothesis (that the
parameter equals β0) is rejected. Two types of tests can be conducted: a two-tail test,
where H1: β ≠ β0, and a single-tail test, where H1: β > β0 or H1: β < β0.
Example
Significance Testing using P-values
For the p-value approach, the p-value of the observed test statistic is compared to
the specified significance level (α) of the hypothesis test. The p-value is the probability of
observing sample data at least as extreme as the test statistic actually obtained. Small p-values provide
evidence against the null hypothesis. The smaller (closer to 0) the p-value, the stronger is the evidence
against the null hypothesis.
If the p-value is less than or equal to the specified significance level α, the null hypothesis is rejected;
otherwise, the null hypothesis is not rejected. In other words, if p ≤ α, reject H0; otherwise, if p > α do not
reject H0.
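As a small illustration of this rule, the two-sided p-value for a given t-score can be computed directly. A sketch in Python, assuming scipy is available; the t-score and degrees of freedom below are made-up numbers:

from scipy import stats

t_score = 2.40   # hypothetical t-statistic for a coefficient
df = 18          # hypothetical degrees of freedom
alpha = 0.05

p_value = 2 * stats.t.sf(abs(t_score), df)   # two-sided p-value
print(p_value)
print("reject H0" if p_value <= alpha else "do not reject H0")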
In consequence, by knowing the p-value any desired level of significance may be assessed. For example, if
the p-value of a hypothesis test is 0.01, the null hypothesis can be rejected at any significance level larger
than or equal to 0.01. It is not rejected at any significance level smaller than 0.01. Thus, the p-value is
commonly used to evaluate the strength of the evidence against the null hypothesis without reference to
significance level.
The following guidelines are commonly used for assessing the evidence against the null hypothesis:
p > 0.10: weak or no evidence against H0
0.05 < p ≤ 0.10: moderate evidence against H0
0.01 < p ≤ 0.05: strong evidence against H0
p ≤ 0.01: very strong evidence against H0
The closer the observations fall to the regression line (i.e., the smaller the residuals), the greater is the
variation in Y "explained" by the estimated regression equation. The total variation in Y is equal to
the explained plus the residual variation:
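In symbols, writing Ȳ for the sample mean of Y:
Σ (Yi − Ȳ)2 = Σ (Y^i − Ȳ)2 + Σ ei2   (total variation = explained + residual variation)
and the coefficient of determination, R2 = Σ (Y^i − Ȳ)2 / Σ (Yi − Ȳ)2 = 1 − Σ ei2 / Σ (Yi − Ȳ)2, measures the proportion of the total variation in Y explained by the regression.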
Example
Properties of OLS
Under the classical assumptions, the OLS estimators are unbiased, are efficient in the sense of having the
smallest variance among all linear unbiased estimators (they are BLUE, by the Gauss-Markov theorem),
and are consistent.
Multiple regression analysis is used for testing hypotheses about the relationship between a dependent
variable Y and two or more independent variables X1, X2, … and for prediction. The three-variable linear
regression model can be written as:
Yi = b0 + b1X1i + b2X2i + ui
Assumptions: The assumptions of SLR also apply here. The additional assumption is that there
is no exact linear relationship between the X variables, i.e., no perfect collinearity.
Example
Significance Testing
The method for testing significance is the same as the one for simple linear regression. In order to test for
the statistical significance of the parameter estimates of the multiple regression, the variance of the
estimates is required.
Once the standard errors are calculated, the t-test or p-value approach can be used to test for significance.
Similar to the R2 value in SLR, the coefficient of multiple determination is defined as the proportion of
total variation in Y "explained" by the multiple regression of Y on X1 and X2. It can be calculated as:
R2 = Σ (Y^i − Ȳ)2 / Σ (Yi − Ȳ)2 = 1 − Σ ei2 / Σ (Yi − Ȳ)2
Note: the inclusion of more independent variables never decreases the value of R2, so a high value is
not necessarily a sign of a good fit. The adjusted R2 penalizes the addition of more independent variables
and is a better measure in such cases:
adjusted R2 = 1 − (1 − R2)(n − 1) / (n − k)
where n is the number of observations and k is the number of parameters estimated.
Example
Test of Overall Significance of the Regression
While the other tests of significance are run on individual parameters of the regression, we also need to
test the joint significance of the parameters. In some cases the parameters are
individually insignificant but have a jointly significant effect on the dependent variable, and thus should
not be excluded from the model.
Joint significance testing is done using the F-statistic, the ratio of the explained to the
unexplained variance:
F = [Σ (Y^i − Ȳ)2 / (k − 1)] / [Σ ei2 / (n − k)]
This follows an F distribution with k − 1 and n − k degrees of freedom, where n is the number of
observations and k is the number of parameters estimated (including the intercept).
The procedure for conducting the test using the F-statistic is the same as the one explained previously for
the t-statistic.
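In practice the overall F-test is reported automatically by regression software. A minimal sketch in Python, assuming statsmodels and a fitted OLS results object like the one in the earlier sketch:

# results: a fitted OLS results object, e.g. from sm.OLS(Y, X_design).fit()
print(results.fvalue)    # F-statistic for the joint significance of all slope coefficients
print(results.f_pvalue)  # its p-value; reject joint insignificance if below the chosen alpha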
The partial-correlation coefficient measures the net correlation between the dependent variable and one
independent variable after excluding the common influence of (i.e., holding constant) the other
independent variables in the model. For example, rYX1.X2 is the partial correlation between Y and X1,
after removing the influence of X2 from both Y and X1:
rYX1.X2 = (rYX1 − rYX2 rX1X2) / sqrt[(1 − rYX2²)(1 − rX1X2²)]
where rYX1 = simple-correlation coefficient between Y and X1, and rYX2 and rX1X2 are analogously
defined. Partial-correlation coefficients range in value from -1 to +1 (as do simple-correlation
coefficients), have the sign of the corresponding estimated parameter, and are used to determine the
relative importance of the different explanatory variables in a multiple regression.
Matrix Notation
Calculations increase substantially as the number of independent variables increases. Matrix notation
can aid in solving larger regressions algebraically: writing the model as Y = Xb + u, the OLS solution is
b^ = (X'X)−1 X'Y,   with var(b^) = σ2 (X'X)−1
This solution works with any number of independent variables and is therefore extremely flexible.
Functional Forms
The OLS method fits linear relationships only. Therefore, in cases where there is a non-linear
relationship between the variables, certain transformations can be made to linearize it before
performing the regression analysis. Accordingly, the interpretation of the regression coefficients also
changes. Some common transformations are:
Level-level: Y = b0 + b1X; a one-unit change in X changes Y by b1 units.
Level-log: Y = b0 + b1 ln(X); a 1% change in X changes Y by about b1/100 units.
Log-level: ln(Y) = b0 + b1X; a one-unit change in X changes Y by about 100·b1 percent (semi-elasticity).
Log-log: ln(Y) = b0 + b1 ln(X); a 1% change in X changes Y by about b1 percent (elasticity).
Example
Interaction Terms
Sometimes it is natural for the partial effect, elasticity, or semi-elasticity of the dependent variable with
respect to an explanatory variable to depend on the magnitude of yet another explanatory variable. For
example, in the model
price = b0 + b1·sqrft + b2·bdrms + b3·sqrft·bdrms + u
the partial effect of bdrms on price (holding all other variables fixed) is b2 + b3·sqrft.
Example
Dummy Variables
Qualitative explanatory variables (such as wartime vs. peacetime, strike vs. non-strike periods, males
vs. females, etc.) can be introduced into regression analysis by assigning the value 1 to one
classification (e.g., wartime) and 0 to the other (e.g., peacetime). These are called dummy variables
and are treated like any other variable. Dummy variables can be used to capture changes (shifts) in the
intercept, changes in slope, and changes in both intercept and slope:
Y = b0 + b1X + b2D + u (intercept shift)
Y = b0 + b1X + b2DX + u (slope shift)
Y = b0 + b1X + b2D + b3DX + u (both)
where D is 1 for one classification and 0 otherwise and X is the usual quantitative explanatory variable.
Dummy variables can also be used to capture differences among more than two classifications, such as
seasons and regions:
Y = b0 + b1X + b2D1 + b3D2 + b4D3 + u
where b0 is the intercept for the first season or region and D1, D2, and D3 refer, respectively, to season
or region 2, 3, and 4. Note that for any number of classifications k, k − 1 dummies are required.
Example
Binary Choice Models
These models are used when the dependent/outcome variable is binary, i.e., it takes only two values (yes
or no). To estimate the model, we first set up an underlying (latent) model for which Y acts like a dummy
variable:
Yi* = b0 + b1Xi + ui
Here, Yi* is considered an underlying propensity for the dummy variable to take the value of 1 and is
a continuous variable, so that:
Yi = 1 if Yi* > 0, and Yi = 0 otherwise.
Instead of using OLS, another estimation technique is used: the maximum-likelihood estimates of the
coefficients are calculated by setting up the log-likelihood function:
ln L = Σ1 ln F(b0 + b1Xi) + Σ0 ln[1 − F(b0 + b1Xi)]
where F is the cumulative distribution function used, Σ1 and Σ0 indicate sums over the data points where
Yi = 1 and Yi = 0, respectively, and b^0 and b^1 are chosen to maximize the log-likelihood function.
If the standard normal distribution
is used to find the probabilities, it is a probit model; if the logistic distribution is used, it is a logit model.
Since these functions are nonlinear, estimation by computer is usually required.
The interpretation of b1 changes in a binary choice model: b1 is the effect of X on the latent variable Y*.
The marginal effect of X on P(Y = 1) is easier to interpret and is given by:
∂P(Y = 1)/∂X = f(b0 + b1X) · b1
where f is the density function of the distribution used (standard normal for probit, logistic for logit).
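A minimal sketch of estimating such a model in Python, assuming statsmodels; the data and variable interpretation below are hypothetical:

import numpy as np
import statsmodels.api as sm

# Hypothetical data: y = 1 if a household owns a car, x = income
x = np.array([1.2, 1.8, 2.5, 3.0, 3.6, 4.1, 4.8, 5.5, 6.0, 7.2])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])

X = sm.add_constant(x)
logit_res = sm.Logit(y, X).fit()     # logistic distribution
probit_res = sm.Probit(y, X).fit()   # standard normal distribution

print(logit_res.params)                   # b^0 and b^1 (effects on Y*)
print(logit_res.get_margeff().summary())  # average marginal effects on P(Y = 1)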
Multicollinearity
Multicollinearity refers to the case in which two or more explanatory variables in the regression model
are highly correlated, making it difficult or impossible to isolate their individual effects on the
dependent variable. With multicollinearity, the estimated OLS coefficients may be statistically
insignificant (and may even have the wrong sign) even though R2 may be "high." Multicollinearity can
sometimes be overcome or reduced by extending the size of the sample, utilizing a priori information,
transforming the functional relationship, or dropping one of the highly collinear variables.
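One common way to detect multicollinearity is the variance inflation factor (VIF). A sketch in Python, assuming statsmodels and a design matrix X_design (a NumPy array whose first column is the constant, as produced by sm.add_constant):

from statsmodels.stats.outliers_influence import variance_inflation_factor

# One VIF per column of the design matrix; values above roughly 10 are often
# taken as a sign of severe multicollinearity.
vifs = [variance_inflation_factor(X_design, i) for i in range(X_design.shape[1])]
print(vifs)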
Heteroskedasticity
If the OLS assumption of homoskedasticity (that the variance of the error term is constant for all
observations) does not hold, we face the problem of heteroskedasticity. This leads to unbiased but
inefficient (i.e., larger than minimum variance) estimates of the coefficients, as well as biased estimates
of the standard errors (and, thus, incorrect statistical tests and confidence intervals). Graphically, the
scatter plot becomes more (or less) dispersed as the value of X increases.
One test for heteroskedasticity (the Goldfeld-Quandt test) involves arranging the data from small to large
values of the independent variable X and running two regressions, one for small values of X and one for
large values, omitting, say, one-fifth of the middle observations. Then we test whether the ratio of the
error sum of squares (ESS) of the second regression to that of the first is significantly greater than 1,
using the F table with (n − d − 2k)/2 degrees of freedom in the numerator and denominator, where n is
the total number of observations, d is the number of omitted observations, and k is the number of
estimated parameters. If the error variance is proportional to X2 (often the case), heteroskedasticity can
be overcome by dividing every term of the model by X and then re-estimating the regression using the
transformed variables.
Regression techniques such as Weighted Least Squares (WLS), or Generalized Least Squares (GLS)
can also be used in place of OLS, for cases where heteroskedasticity is present.
(Self-study: Tests for heteroskedasticity: Breusch-Pagan Test, Lagrange Multiplier, White Test)
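As one example from that list, the Breusch-Pagan test is available in statsmodels. A sketch, assuming a fitted OLS results object as in the earlier sketches:

from statsmodels.stats.diagnostic import het_breuschpagan

# The test is based on regressing the squared residuals on the explanatory variables.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(results.resid, results.model.exog)
print(lm_pvalue)   # a small p-value is evidence of heteroskedasticity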
Autocorrelation
When the error term in one time period is positively correlated with the error term in the previous time
period, we face the problem of (positive first-order) autocorrelation. This is common in time-series
analysis and leads to downward-biased standard errors (and, thus, to incorrect statistical tests and
confidence intervals). (Will be discussed further in Time Series Analysis)
Errors in Variables
Errors in variables refer to the case in which the variables in the regression model include measurement
errors. Measurement errors in the dependent variable are incorporated into the disturbance term and do
not create any special problem. However, errors in the explanatory variables lead to biased and
inconsistent parameter estimates. One method of obtaining consistent OLS parameter estimates is to
replace the explanatory variable subject to measurement errors with another variable (called an
instrumental variable) that is highly correlated with the original explanatory variable but is independent
of the error term. This is often difficult to do and somewhat arbitrary. The simplest instrumental variable
is usually the lagged explanatory variable in question. Another method used when only X is subject to
measurement errors involves regressing X on Y.
Proxy Variables
These are used in place of variables for which there is insufficient data to run a regression analysis. The
proxy variable (Z) is hypothesized to be linearly related to the missing explanatory variable (say X2):
X2 = a + cZ
Therefore, the regression equation is rewritten as:
Y = b0 + b1X1 + b2(a + cZ) + … + bkXk + u
So only the intercept and the coefficient on the proxy variable change; the coefficients on the remaining
variables remain unbiased and their standard errors remain the same.
However, if a poor proxy variable is chosen, it can introduce measurement error, and the other explanatory
variables may act as proxies for the missing variable, which can lead to omitted variable bias (OVB).
Variable Misspecifications in Regression Analysis
This can occur when a variable that should have been included in the model is omitted. This makes the
coefficients biased (called Omitted Variable Bias or OVB) and the standard errors become invalid.
Consequently, all the tests of significance also become invalid.
Reason for the bias: the included variable (say X2) that is most closely related to the omitted variable
(say X3) starts acting as a mimic for it. The strength of this proxy effect depends on (i) the strength of the
effect of the omitted variable on Y and (ii) the ability of X2 to mimic X3.
The RESET test can be used to test for OVB.
Example
Inclusion of an irrelevant variable
Here, the coefficients remain unbiased, but the estimates become inefficient: the variances (and standard
errors) of the estimators are larger than they need to be. The standard errors are still valid, but the loss of
precision means a loss of efficiency.
Time Series Analysis
Time Series Data
The first step to understanding time series analysis is to understand the data that we’ll be working on.
The data used until now was cross-sectional data, i.e., data on a number of units at a given point in time.
In time series analysis, however, we use time series data, which is data collected over time on one
particular unit. Due to its nature, the ordering of the data matters in time series, since it is defined
chronologically. Shuffling observations in cross-sectional data does not result in a loss of information for
estimation; in time series data, however, it can obscure the possible existence of dynamic relationships
between the variables.
Time Series data have the following components: trend, cyclical, seasonal and random.
A univariate time series is a sequence of measurements of the same variable collected over time. Most
often, the measurements are made at regular time intervals. The basic objective usually is to determine
a model that describes the pattern of the time series. Uses for such a model are:
1. To describe the important features of the time series pattern.
2. To explain how the past affects the future or how two time series can “interact”.
3. To forecast future values of the series.
4. To possibly serve as a control standard for a variable that measures the quality of product in
some manufacturing situations.
Some important features of time series data are:
• Trend: on average, do the measurements tend to increase (or decrease) over time?
• Seasonality: is there a regularly repeating pattern of highs and lows related to calendar time
such as seasons, quarters, months, days of the week, and so on?
• Outliers: In regression, outliers are far away from your line. With time series data, your outliers
are far away from your other data.
Stationarity
A series is (weakly) stationary if it has a constant mean, a constant variance, and autocovariances that
depend only on the lag between two observations and not on time itself. A white noise process is a series
satisfying three conditions: a constant mean, a constant variance, and zero autocovariance between
observations at different times.
Thus, a white noise process has constant mean and variance, and zero autocovariances, except at lag
zero. Another way to state this last condition is to say that each observation is uncorrelated with
all other values in the sequence. Hence the autocorrelation function for a white noise process will be
zero apart from a single peak of 1 at lag s = 0. If μ = 0, and the three conditions hold, the process is known
as zero-mean white noise.
In the following models, the error terms are assumed to follow a white noise process.
For convenience in writing and solving time series equations, which can involve many lagged terms,
a lag operator (B or L) is used. When applied to a time series variable, it lags the variable by the number
of periods equal to the power of the operator. For example:
BXt = Xt-1
B2Xt = Xt-2
So, in general, BkXt = Xt-k.
Thus, an AR or MA model equation can be expressed in the form of a polynomial of B, which can be
solved to find the roots.
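For example, an AR(1) equation Xt = φXt−1 + εt can be written with the lag operator as (1 − φB)Xt = εt. The root of the lag polynomial 1 − φz is z = 1/φ, which lies outside the unit circle (|z| > 1) exactly when |φ| < 1, the familiar stationarity condition for an AR(1) process.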
Moving Average (MA) Model
This is the simplest class of time series models. The notation MA(q) refers to the moving average model
of order q:
Xt = µ + εt + θ1εt−1 + θ2εt−2 + … + θqεt−q
where µ is the mean of the series, the θi are the parameters of the model, and the ε are white noise error
terms. The value of q is called the order of the MA model.
A moving-average model is conceptually a linear regression of the current value of the series against
current and previous (observed) white noise error terms or random shocks. The random shocks at each
point are assumed to be mutually independent and to come from the same distribution, typically a
normal distribution, with location at zero and constant scale.
Example
Invertibility of MA Models
Invertibility is a restriction programmed into time series software used to estimate the coefficients of
models with MA terms. It’s not something that we check for in the data analysis.
Autoregressive (AR) Model
Here, the current value of a variable depends only on the values that the variable took in previous periods,
plus an error term. The notation AR(p) indicates an autoregressive model of order p. The AR(p) model
is defined as:
Xt = c + φ1Xt−1 + φ2Xt−2 + … + φpXt−p + εt
Stationarity is a desirable property in AR models. For this condition to be satisfied, the roots of the
model's lag polynomial, 1 − φ1z − φ2z2 − … − φpzp = 0, must lie outside the unit circle.
Assumptions:
• Error terms are independently distributed with a normal distribution that has 0 mean and
constant variance.
• Properties of the error terms are independent of xt.
• The series Xt is (weakly) stationary.
ARMA and ARIMA Models
Combining the AR and MA models, we get the ARMA model. The notation ARMA(p, q) refers to the model
with p autoregressive terms and q moving-average terms; it contains the AR(p) and MA(q) models:
Xt = c + φ1Xt−1 + … + φpXt−p + εt + θ1εt−1 + … + θqεt−q
If this model does not satisfy the stationarity condition, stationarity can sometimes be achieved by
differencing the series, i.e., replacing Xt by Xt − Xt−1. Differencing removes changes in the level of a time
series, eliminating trend and seasonality and consequently stabilizing the mean of the time series.
If this results in a stationary series, then the series is Integrated of order 1. Similarly, if this process
needs to be repeated ‘d’ number of times before stationarity is achieved, it is Integrated of order d. Such
a series is then denoted by ARIMA(p,d,q).
Autocorrelation Function (ACF)
The sample autocorrelation function (ACF) for a series gives the correlations between the series xt and
lagged values of the series at lags 1, 2, 3, and so on. The lagged values can be written as xt−1, xt−2,
xt−3, and so on. The ACF thus gives the correlation between xt and xt−1, between xt and xt−2, and so on.
The ACF can be used to identify the possible structure of time series data. It can be used to test the
significance of each lag so that it can be determined whether they should be included in the model or
not. It can also be used to test the autocorrelation between error terms (or residuals). Since there should
be no autocorrelation, they should be insignificant at all lags.
The sample ACF at lag k can be calculated as:
rk = Σ (xt − x̄)(xt+k − x̄) / Σ (xt − x̄)2
where the numerator sums over t = 1, …, n − k, the denominator sums over all n observations, and x̄ is the
sample mean.
Many stationary series have recognizable ACF patterns. Most series that we encounter in practice,
however, are not stationary. A continual upward trend, for example, is a violation of the requirement that
the mean is the same for all t. Distinct seasonal patterns also violate that requirement.
ACF of AR model
The ACF of an AR model decays exponentially toward 0 as the lag increases.
ACF of MA Model
The ACF of an MA model shows a "sudden death" of significance after a certain lag (the order q of the
model) instead of an exponential decline.
Partial Autocorrelation Function (PACF)
The PACF measures the correlation between the current observation and the observation k periods ago
after controlling for the observations at intermediate lags. Equivalently, the PACF at lag k is the
coefficient on xt−k in a regression of xt on xt−1, xt−2, …, xt−k.
The plot of PACF values is very convenient for identifying an AR process: for an AR(p) series, the PACF
shows a "sudden death" after lag p, the order of the model. For an MA model, the PACF instead tapers off,
so its identification is best done with the ACF plot as discussed previously.
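These patterns can be checked by simulating a series and plotting its ACF and PACF. A sketch in Python, assuming numpy, matplotlib and statsmodels are available; the AR coefficient and sample size are arbitrary:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Simulate an AR(1) process with coefficient 0.7; the lag polynomials are given
# with a leading 1 and the AR coefficients negated.
ar_poly = np.array([1, -0.7])
ma_poly = np.array([1])
x = ArmaProcess(ar_poly, ma_poly).generate_sample(nsample=500)

plot_acf(x, lags=20)    # should tail off gradually (exponential decay)
plot_pacf(x, lags=20)   # should cut off after lag 1
plt.show()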
Ljung-Box Test
This is an important test of significance for checking how many lags should be included in the model. In
general, the model should contain as few terms as possible (a parsimonious model). The Ljung-Box test
statistic is given by:
Q = n(n + 2) Σ ρk2 / (n − k), summing over k = 1, …, h
where n is the sample size, ρk is the sample autocorrelation at lag k, and h is the number of lags
being tested.
The decision rule works like that of other significance tests: if the Q-statistic is greater than the critical
value (from a chi-squared distribution with h degrees of freedom) at the chosen significance level, the
null hypothesis is rejected in favour of the alternative hypothesis; if not, the null hypothesis cannot be
rejected. For this test, the null and alternative hypotheses are:
H0: The data are independently distributed (i.e., the correlations in the population from
which the sample is taken are 0, so that any observed correlations in the data result from
randomness of the sampling process).
Ha: The data are not independently distributed; they exhibit serial correlation.
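A sketch of the test in Python, assuming statsmodels and a vector of residuals (resid) from a fitted model:

from statsmodels.stats.diagnostic import acorr_ljungbox

# resid: residuals from an estimated time series model
lb = acorr_ljungbox(resid, lags=[10], return_df=True)
print(lb)   # Q statistic ('lb_stat') and its p-value ('lb_pvalue') at lag 10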
Model Building
• Plot the data: this does not give any information about the correct model or its parameters, but
it gives an indication of whether the series is stationary, whether there is a trend or seasonality,
and what further steps should be taken.
• Plot the ACF and PACF of the series: since MA and AR processes both have identifiable plots, this step
is used to decide which model fits the series best. It is possible that neither a pure MA nor a pure AR
model fits best, in which case an ARMA or ARIMA model may be appropriate.
• Model diagnostics: this involves a series of checks on whether the model specified and
estimated is adequate. One useful method is to compare information criteria values.
Information Criteria
Information criteria embody two factors: a term which is a function of the residual sum of squares
(RSS), and some penalty for the loss of degrees of freedom from adding extra parameters. So, adding a
new variable or an additional lag to a model will have two competing effects on the information criteria:
the residual sum of squares will fall but the value of the penalty term will increase.
There are several types of IC, the difference between them being how heavily they penalize the addition
of extra parameters. To build a parsimonious model, the one with the least value of IC is chosen.
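A sketch of comparing candidate models by information criteria in Python, assuming statsmodels and a stationary series x; the candidate orders below are illustrative only:

from statsmodels.tsa.arima.model import ARIMA

# Fit a few candidate specifications and compare their information criteria.
for order in [(1, 0, 0), (0, 0, 1), (1, 0, 1)]:
    fit = ARIMA(x, order=order).fit()
    print(order, round(fit.aic, 1), round(fit.bic, 1))
# Other things being equal, the specification with the lowest AIC/BIC is preferred.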
Seasonality
Seasonality in a time series is a regular pattern of changes that repeats over S time periods, where S
defines the number of time periods until the pattern repeats again.
For example, there is seasonality in monthly data for which high values tend always to occur in some
particular months and low values tend always to occur in other particular months. In this case, S = 12
(months per year) is the span of the periodic seasonal behaviour. For quarterly data, S = 4 time periods
per year.
In a seasonal ARIMA model, seasonal AR and MA terms predict xt using data values and errors at
times with lags that are multiples of S (the span of the seasonality).
• With monthly data (and S = 12), a seasonal first order autoregressive model would use xt-12 to
predict xt. For instance, if we were selling cooling fans we might predict this August’s sales
using last August’s sales. (This relationship of predicting using last year’s data would hold for
any month of the year.)
• A seasonal second order autoregressive model would use xt-12 and xt-24 to predict xt. Here we
would predict this August’s values from the past two Augusts.
• A seasonal first order MA(1) model (with S = 12) would use wt-12 as a predictor. A seasonal
second order MA(2) model would use wt-12 and wt-24.
Differencing
Almost by definition, it may be necessary to examine differenced data when we have seasonality.
Seasonality usually causes the series to be nonstationary because the average values at some particular
times within the seasonal span (months, for example) may be different than the average values at other
times. For instance, our sales of cooling fans will always be higher in the summer months.
Seasonal Differencing
Seasonal differencing is defined as a difference between a value and a value with lag that is a multiple
of S. The differences (from the previous year) may be about the same for each month of the year giving
us a stationary series.
With S = 12, which may occur with monthly data, a seasonal difference is:
(1-B12)Xt = Xt – Xt-12
Where B is the lag operator.
Seasonal differencing removes seasonal trend and can also get rid of seasonal random walk type of
nonstationarity.
Non-seasonal differencing
If trend is present in the data, we may also need non-seasonal differencing. Often (not always) a first
difference (non-seasonal) will “detrend” the data. That is, we use (1-B)Xt = Xt – Xt-1 in the presence of
trend.
When both trend and seasonality are present, we may need to apply both a non-seasonal first difference
and a seasonal difference. That is, we may need to examine the ACF and PACF of:
(1 − B)(1 − B12)Xt = (Xt − Xt−1) − (Xt−12 − Xt−13)
Seasonal ARIMA Models
The seasonal ARIMA model incorporates both non-seasonal and seasonal factors in a multiplicative
model. One shorthand notation for the model is
ARIMA(p,d,q) x (P,D,Q)S
with p = non-seasonal AR order, d = non-seasonal differencing, q = non-seasonal MA order, P =
seasonal AR order, D = seasonal differencing, Q = seasonal MA order, and S = time span of repeating
seasonal pattern.
Without differencing operations, the model can be written more formally as
Φ(B^S) φ(B) (Xt − µ) = Θ(B^S) θ(B) εt
where φ(B) and θ(B) are the non-seasonal AR and MA polynomials in the lag operator and Φ(B^S) and
Θ(B^S) are the corresponding seasonal polynomials.
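A sketch of fitting such a model in Python, assuming statsmodels and a monthly series y (S = 12); the orders used here are illustrative only:

from statsmodels.tsa.statespace.sarimax import SARIMAX

# ARIMA(1,0,0) x (1,1,0)_12: one non-seasonal AR term, one seasonal AR term,
# and one seasonal difference.
model = SARIMAX(y, order=(1, 0, 0), seasonal_order=(1, 1, 0, 12))
res = model.fit(disp=False)
print(res.summary())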
Decomposition Models
Decomposition procedures are used in time series to describe the trend and seasonal factors in a time
series. More extensive decompositions might also include long-run cycles, holiday effects, day of week
effects and so on. Here, we’ll only consider trend and seasonal decompositions.
One of the main objectives for a decomposition is to estimate seasonal effects that can be used to create
and present seasonally adjusted values. A seasonally adjusted value removes the seasonal effect from a
value so that trends can be seen more clearly. For instance, in many regions of the U.S. unemployment
tends to decrease in the summer due to increased employment in agricultural areas. Thus, a drop in the
unemployment rate in June compared to May doesn’t necessarily indicate that there’s a trend toward
lower unemployment in the country. To see whether there is a real trend, we should adjust for the fact
that unemployment is always lower in June than in May.
Basic Structures
The following two structures are considered for basic decomposition models:
1. Additive: Xt = Trend + Seasonal + Random
2. Multiplicative: Xt = Trend * Seasonal * Random
The additive model is useful when the seasonal variation is relatively constant over time. The
multiplicative model is useful when the seasonal variation increases over time.
A time series that would benefit from an additive decomposition shows seasonal swings of roughly
constant size over time, while one that would benefit from a multiplicative decomposition shows seasonal
swings that grow (or shrink) with the level of the series.
The seasonal effects are usually adjusted so that they average to 0 for an additive decomposition or they
average to 1 for a multiplicative decomposition.
Basic Steps in Decomposition
1. The first step is to estimate the trend. Two different approaches could be used for this (with
many variations of each).
• One approach is to estimate the trend with a smoothing procedure such as moving
averages.
• The second approach is to model the trend with a regression equation.
2. The second step is to “de-trend” the series. For an additive decomposition, this is done by
subtracting the trend estimates from the series. For a multiplicative decomposition, this is done
by dividing the series by the trend values.
3. The third step is to estimate the seasonal factors from the de-trended series, typically by averaging
the de-trended values for each season (e.g., each month) and then adjusting the factors so that they
average to 0 (additive) or 1 (multiplicative).
4. The final step is to determine the random (irregular) component. For an additive decomposition,
random = series − trend − seasonal; for a multiplicative decomposition, random = series /
(trend × seasonal).
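A sketch of a basic decomposition in Python, assuming statsmodels and a monthly series stored in a pandas Series called sales (a hypothetical name):

from statsmodels.tsa.seasonal import seasonal_decompose

# Classical decomposition: trend via a centred moving average, seasonal factors
# averaged by period, and the remainder as the random component.
result = seasonal_decompose(sales, model='additive', period=12)
print(result.trend.head())
print(result.seasonal.head())
print(result.resid.head())
result.plot()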
Smoothing Time Series
Smoothing is usually done to help us better see patterns (trends, for example) in a time series. Generally,
we smooth out the irregular roughness to see a clearer signal. For seasonal data, we might smooth out the
seasonality so that we can identify the trend. Smoothing doesn't provide us with a model, but it can be
a good first step in describing various components of the series.
The term filter is sometimes used to describe a smoothing procedure. For instance, if the smoothed
value for a particular time is calculated as a linear combination of observations for surrounding times,
it might be said that we’ve applied a linear filter to the data (not the same as saying the result is a straight
line, by the way).
Moving Averages
The traditional use of the term moving average is that at each point in time we determine (possibly
weighted) averages of observed values that surround a particular time.
To take away seasonality from a series so we can better see trend, we would use a moving average with
a length = seasonal span. Thus, in the smoothed series, each smoothed value has been averaged across
all seasons. This might be done by looking at a “one-sided” moving average in which you average all
values for the previous years’ worth of data or a centred moving average in which you use values both
before and after the current time.
Example of a one-sided filter (for monthly data, S = 12): x^t = (xt + xt−1 + … + xt−11) / 12, the average of
the current value and the previous eleven months.
A centred moving average creates a bit of a difficulty when we have an even number of time periods in
the seasonal span (as we usually do).
Example of a centred filter (for monthly data, S = 12): x^t = (0.5xt−6 + xt−5 + … + xt+5 + 0.5xt+6) / 12,
where the two end values receive half weight so that the thirteen terms span exactly one seasonal cycle.
Single Exponential Smoothing
The basic forecasting equation for single exponential smoothing is often given as
x^t+1 = αxt + (1 − α)x^t
where x^t denotes the smoothed (forecast) value at time t.
The value of α is called the smoothing constant. With a relatively small value of α, the smoothing will
be relatively more extensive. With a relatively large value of α, the smoothing is relatively less extensive
as more weight will be put on the observed value.
Double Exponential Smoothing
Double exponential smoothing might be used when there is a trend (either long run or short run), but no
seasonality.
Essentially the method creates a forecast by combining exponentially smoothed estimates of the trend
(slope of a straight line) and the level (basically, the intercept of a straight line).
Two different weights, or smoothing parameters, are used to update these two components at each time.
The smoothed “level” is more or less equivalent to a simple exponential smoothing of the data values
and the smoothed trend is more or less equivalent to a simple exponential smoothing of the first
differences.
The procedure is equivalent to fitting an ARIMA(0,2,2) model with no constant, so it can also be carried
out via an ARIMA(0,2,2) fit.
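A sketch of both procedures in Python, assuming statsmodels and a series y; the smoothing constant in the first fit is fixed arbitrarily for illustration:

from statsmodels.tsa.holtwinters import SimpleExpSmoothing, Holt

# Single exponential smoothing with a fixed smoothing constant alpha = 0.2
ses_fit = SimpleExpSmoothing(y).fit(smoothing_level=0.2, optimized=False)

# Double exponential smoothing (Holt's method): smoothed level plus smoothed trend
holt_fit = Holt(y).fit()

print(ses_fit.fittedvalues[:5])
print(holt_fit.forecast(5))   # forecasts extrapolating the smoothed level and trend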
The Periodogram
Any time series can be expressed as a combination of cosine and sine waves with differing periods (how
long it takes to complete a full cycle) and amplitudes (maximum/minimum value during the cycle). This
fact can be utilized to examine the periodic (cyclical) behavior in a time series.
A periodogram is used to identify the dominant periods (or frequencies) of a time series. This can be a
helpful tool for identifying the dominant cyclical behavior in a series, particularly when the cycles are
not related to the commonly encountered monthly or quarterly seasonality.
Suppose that we have observed data at n distinct time points, and for convenience we assume that n is
even. Our goal is to identify important frequencies in the data. To pursue the investigation, we consider
the set of possible frequencies wj = j/n for j = 1, 2,…, n/2. These are called the harmonic frequencies.
To do so, consider the harmonic regression model
xt = Σj [β1(j/n) cos(2πt·j/n) + β2(j/n) sin(2πt·j/n)]
a sum of sine and cosine functions at the harmonic frequencies.
Think of the β1(j/n) and β2(j/n) as regression parameters. Then there is a total of n parameters because
we let j move from 1 to n/2. This means that we have n data points and n parameters, so the fit of this
regression model will be exact.
The first step in the creation of the periodogram is the estimation of the β1(j/n) and β2(j/n) parameters.
It's actually not necessary to carry out this regression to estimate the β1(j/n) and β2(j/n) parameters.
Instead, a mathematical device called the Fast Fourier Transform (FFT) is used.
After the parameters have been estimated, we define:
P(j/n) = β^1(j/n)2 + β^2(j/n)2
This is the value of the sum of squared “regression” coefficients at the frequency j/n.
A relatively large value of P(j/n) indicates relatively more importance for the frequency j/n (or near j/n)
in explaining the oscillation in the observed series. P(j/n) is proportional to the squared correlation
between the observed series and a cosine wave with frequency j/n. The dominant frequencies might be
used to fit cosine (or sine) waves to the data or might be used simply to describe the important
periodicities in the series.
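A sketch of computing a periodogram in Python, assuming scipy and a series x sampled at regular intervals:

from scipy.signal import periodogram

freqs, power = periodogram(x)             # frequencies in cycles per observation
dominant = freqs[1:][power[1:].argmax()]  # skip the zero frequency
print("dominant frequency:", dominant)
print("corresponding period:", 1.0 / dominant)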
Regression with Autocorrelated Errors
When we do regressions using time series variables, it is common for the errors (residuals) to have a
time series structure. This violates the usual assumption of independent errors made in ordinary least
squares regression. The consequence is that the estimates of coefficients and their standard errors will
be wrong if the time series structure of the errors is ignored.
It is possible, though, to adjust estimated regression coefficients and standard errors when the errors
have an AR structure. More generally, we will be able to make adjustments when the errors have a
general ARIMA structure.
Suppose that yt and xt are time series variables. A simple linear regression model with autoregressive
errors can be written as:
yt = β0 + β1xt + εt, with εt = φ1εt−1 + wt
where the errors εt follow an AR(1) process and wt is white noise.
To adjust for this, first estimate the model by OLS, estimate φ1 from the residuals, form the transformed
variables y*t = yt − φ^1yt−1 and x*t = xt − φ^1xt−1, and re-estimate the regression using them. This
procedure is attributed to Cochrane and Orcutt (1949) and is repeated until the estimates converge, that
is, until we observe a very small difference in the estimates between iterations. When the errors exhibit
an AR(1) pattern, the cochrane.orcutt function found within the orcutt package in R iterates this
procedure.
For a higher-order AR, the adjustment variables are calculated in the same manner with more lags. For
instance, suppose the residuals were found to have an AR(2) structure with estimated coefficients 0.9
and −0.2. Then the y- and x-variables for the adjustment regression would be
y*t = yt − 0.9yt−1 + 0.2yt−2 and x*t = xt − 0.9xt−1 + 0.2xt−2
Here, the purpose is to adjust the regression estimates for the fact that the residuals have an ARIMA
structure.
The basic steps are:
1. Use OLS regression to estimate the model:
yt = β0 + β1xt + β2t + εt
(the β2t term represents the possibility of a trend component, and hence the series may
require de-trending)
2. Examine the ARIMA structure (if any) of the sample residuals from the model in step 1.
3. Re-estimate the regression, this time specifying the ARIMA structure identified in step 2 for the
errors (i.e., fit a regression model with ARIMA errors).
4. Examine the ARIMA structure (if any) of the sample residuals from the model in step 3. If
white noise is present, then the model is complete. If not, continue to adjust the ARIMA model
for the errors until the residuals are white noise.
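One convenient way to carry out these steps is to estimate the regression and the ARIMA error structure jointly by maximum likelihood. A sketch in Python, assuming statsmodels, a response y and a regressor x, with an illustrative AR(1) error structure:

from statsmodels.tsa.statespace.sarimax import SARIMAX

# Regression of y on x with AR(1) errors, estimated in one step
res = SARIMAX(y, exog=x, order=(1, 0, 0)).fit(disp=False)
print(res.summary())   # regression coefficient on x plus the AR(1) error coefficient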
Lagged Regression and the Cross-Correlation Function (CCF)
The basic problem we're considering is the description and modelling of the relationship between two
time series.
In the relationship between two time series (yt and xt), the series yt may be related to past lags of the x-
series. The sample cross correlation function (CCF) is helpful for identifying lags of the x-variable
that might be useful predictors of yt.
The sample CCF is defined as the set of sample correlations between xt+h and yt for h = 0, ±1, ±2, ±3,
and so on. A negative value for h is a correlation between the x-variable at a time before t and the y-
variable at time t. For instance, consider h = −2. The CCF value would give the correlation between
xt−2 and yt.
• When one or more xt+h, with h negative, are predictors of yt, it is sometimes said that x leads y.
• When one or more xt+h, with h positive, are predictors of yt, it is sometimes said that x lags y.
In some problems, the goal may be to identify which variable is leading and which is lagging. In many
of the problems we consider, though, we treat the x-variable(s) as leading variables of the y-variable,
because we want to use values of the x-variable to predict future values of y.
The CCF pattern is affected by the underlying time series structures of the two variables and the trend
each series has. It often (perhaps most often) is helpful to de-trend and/or take into account the
univariate ARIMA structure of the x-variable before graphing the CCF.
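A sketch of computing sample cross-correlations directly in Python, assuming numpy and two equal-length series x and y:

import numpy as np

def ccf_at(x, y, h):
    """Sample correlation between x at time t+h and y at time t."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if h < 0:
        return np.corrcoef(x[:h], y[-h:])[0, 1]   # pairs (x[t+h], y[t])
    if h > 0:
        return np.corrcoef(x[h:], y[:-h])[0, 1]
    return np.corrcoef(x, y)[0, 1]

# Negative lags pair past x with current y; large values there suggest x leads y.
print([round(ccf_at(x, y, h), 2) for h in range(-3, 4)])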
ARCH/GARCH Models
An ARCH (autoregressive conditionally heteroscedastic) model is a model for the variance of a time
series. ARCH models are used to describe a changing, possibly volatile variance. Although an ARCH
model could possibly be used to describe a gradually increasing variance over time, most often it is
used in situations in which there may be short periods of increased variation. (Gradually increasing
variance connected to a gradually increasing mean level might be better handled by transforming the
variable.)
ARCH models were created in the context of econometric and finance problems having to do with the
amount that investments or stocks increase (or decrease) per time period, so there’s a tendency to
describe them as models for that type of variable. For that reason, the authors of our text suggest that
the variable of interest in these problems might either be yt = (xt – xt-1)/xt-1, the proportion gained or lost
since the last time, or log((xt – xt-1)/xt-1) =log((xt – xt-1)) – log(xt-1), the logarithm of the ratio of this
time’s value to last time’s value. It’s not necessary that one of these be the primary variable of
interest. An ARCH model could be used for any series that has periods of increased or decreased
variance. This might, for example, be a property of residuals after an ARIMA model has been fit to the
data.
Suppose that we are modelling the variance of a series yt. The ARCH(1) model for the variance of yt is
that, conditional on yt−1, the variance at time t is:
Var(yt | yt−1) = σt2 = α0 + α1 y2t−1
The variance at time t is connected to the value of the series at time t − 1. A relatively large value of
y2t−1 gives a relatively large variance at time t, so yt is less predictable following a large value of
y2t−1 than following a small one.
If we assume that the series has mean 0 (this can always be arranged by centering), the ARCH(1) model
can be written as:
yt = σt εt, with σt2 = α0 + α1 y2t−1
where the εt are independent with mean 0 and variance 1.
For inference (and maximum likelihood estimation) we would also assume that the εt are normally
distributed.
Two potentially useful theoretical properties of the ARCH(1) model as written above are the following:
the squared series y2t follows an AR(1) process, and (for α1 < 1) the unconditional variance of yt is
α0 / (1 − α1).
An ARCH(m) process is one for which the variance at time t is conditional on observations at the
previous m times, and the relationship is
σt2 = α0 + α1 y2t−1 + α2 y2t−2 + … + αm y2t−m
With certain constraints imposed on the coefficients, the yt series squared will theoretically be AR(m).
A GARCH (generalized autoregressive conditionally heteroscedastic) model uses values of the past
squared observations and past variances to model the variance at time t. As an example, a GARCH(1,1)
model is:
σt2 = α0 + α1 y2t−1 + β1 σ2t−1
In the GARCH notation, the first subscript refers to the order of the y2 terms on the right side, and the
second subscript refers to the order of the σ2 terms.
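A sketch of fitting a GARCH(1,1) model in Python, assuming the third-party arch package and a series of (percentage) returns, here called returns:

from arch import arch_model

# Constant mean with GARCH(1,1) conditional variance
am = arch_model(returns, mean='Constant', vol='GARCH', p=1, q=1)
res = am.fit(disp='off')
print(res.summary())                     # omega (alpha_0), alpha[1] and beta[1] estimates
print(res.conditional_volatility[:5])    # fitted sigma_t values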
Vector Autoregressive (VAR) Models
VAR models (vector autoregressive models) are used for multivariate time series. The structure is that
each variable is a linear function of past lags of itself and past lags of the other variables.
As an example, suppose that we measure three different time series variables, denoted by x1,t, x2,t,
and x3,t.
In a VAR(1) model, each variable is a linear function of the lag 1 values of all variables in the set.
In a VAR(2) model, the lag 2 values of all variables are added to the right-hand sides of the equations. In
the case of three variables (or time series), there would be six predictors on the right-hand side of each
equation: three lag 1 terms and three lag 2 terms.
In general, for a VAR(p) model, the first p lags of each variable in the system would be used as
regression predictors for each variable.
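A sketch of estimating a VAR in Python, assuming statsmodels, pandas and three series x1, x2, x3 (hypothetical names) collected in a DataFrame:

import pandas as pd
from statsmodels.tsa.api import VAR

data = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3})   # three time series
res = VAR(data).fit(2)   # VAR(2): two lags of every variable in each equation
print(res.summary())
print(res.forecast(data.values[-2:], steps=4))        # forecasts from the last 2 observations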
VAR models are a specific case of more general VARMA models. VARMA models for multivariate
time series include the VAR structure above along with moving average terms for each variable. More
generally yet, these are special cases of ARMAX models that allow for the addition of other predictors
that are outside the multivariate set of principal interest.