Structmodel
JEAN-LOUIS BRILLET.
CONTENTS
1 FOREWORD 14
INTRODUCTION 15
1.2.3 PARAMETERS 24
1.2.6 FORMULATIONS 27
1.2.8 LINEARITY 32
1.2.12 CONCLUSION 51
2.1.2 ADVANTAGES OF MODELS 52
3.3.3 A CLASSIFICATION 58
4.1.2 ESTIMATION 60
5.2.6 UPDATES 82
5.2.7 SUPPRESSIONS 82
6.2.2 THE CONSTANT TERM 150
6.3.2 INVESTMENT: THE NECESSITY TO ESTABLISH A CONSISTENT THEORETICAL EQUATION PRIOR TO ESTIMATION 163
7 CHAPTER 7: TESTING THE MODEL THROUGH SIMULATIONS OVER THE PAST 214
8 CHAPTER 8: TESTING THE MODEL OVER THE FUTURE 286
8.7.2 CONSEQUENCES FOR MODEL SIMULATIONS 306
9.1.6 CHANGING MODEL SPECIFICATIONS 340
10.3.8 CHANGE IN INVENTORIES 472
13.3 RESPECIFICATION OF ENDOGENOUS AND EXOGENOUS VARIABLES 555
14 A LIST OF USEFUL SERIES FOR PRODUCING A SINGLE COUNTRY, SINGLE PRODUCT MODEL 559
14.3.1 THE SUPPLY - DEMAND VARIABLES AT CONSTANT PRICES (WHOLE ECONOMY) 560
15.3.1 THE SUPPLY - DEMAND VARIABLES AT CONSTANT PRICES (WHOLE ECONOMY) 564
18 INDEX 629
1 FOREWORD
A first remark concerns its subject: structural econometric modelling no longer looks so fashionable, having lost ground to Computable General Equilibrium models, and in particular to their Dynamic Stochastic versions.
We will contend that while this might be true in the academic field (you just have to look at the program of congresses
and symposiums) there is still a lot of room for structural models. Indeed, many institutions are still using them and
even building new ones, both in developed and developing countries. We shall try to show that this position is quite
justified, and that for a large part of the modelling applications, in particular the analysis and interpretation of
macroeconomic interactions, the call for structural models remains a good strategy, arguably the best one.
But we shall not stop at proving the usefulness of these models. For the people we have convinced, or who were so already, we will provide a set of tools facilitating all the tasks in the modelling process. Starting from elementary elements, it will lead the user by stages to a level at which he should be able to build, manage and use his own professional, operational model.
This means this book will, as its title says, focus essentially on applied and even technical features, which does not mean it will be simplistic.
After a necessary description of the field, we shall use the largest part of the book to show the reader how to build his
own model, from general strategies to technical details. For this we shall rely on a specific example, presented at the
beginning, and which we will follow through all the steps of model development. When the situation becomes more
complex (with the addition of product and international dimensions), we shall still find this model at the core of the
cases.
This model will also be present in the background, when we address new directions, which we think are quite
compatible with our approach: Quasi-Accounting, and Stock-Flow Consistent models.
Our examples will be based on the EViews package, the most popular modelling product presently available. This will allow us to be more helpful to EViews users, concentrating on its practice (including some tricks).
Finally, just as important if not more so, we shall provide a set of files allowing readers to practice modelling (either alone or as part of a course). And for more advanced users, we shall give access to files allowing them to produce operational (if small) models, which they can adapt to their own ideas, with the tedious tasks (producing the data, defining the accounting framework and organizing simulations over the future) already prepared.
All these elements are provided for free, and downloadable on the EViews site, at the address:
https://www.eviews.com/StructModel/structmodel.html
This version of the book takes into account the features of the latest version of EViews, EViews 13. However, most of the text is valid for earlier versions. The main differences come from improvements in user-friendliness.
INTRODUCTION
Since an early date in the twentieth century, economists have tried to produce mathematical tools which, applied to a
given practical problem, formalized a given economic theory to produce a reliable numerical picture. The most natural
application is of course to forecast the future, and indeed this goal was present from the first. But one can also
consider learning the consequences of an unforeseen event, or measuring the efficiency of a change in the present
policy, or even improving the understanding of a set of mechanisms too complex to be grasped by the human mind.
In the beginning (let us say from the 1930s) the field was occupied by the "structural" models. They start from a given economic framework, defining the behaviors of the individual agents according to some globally consistent economic theory. They use the available data to associate reliable formulas to these behaviors, which are linked by identities guaranteeing the consistency of the whole set. These models can thus be placed halfway between purely statistical and purely theoretical tools: they rely on statistics, and also on theory. For a formula to be accepted, it must respect both types of criteria.
The use of this kind of model, which occupied the whole field at the beginning, is now mostly restricted to policy analysis and medium-term forecasting. For the latter, they show huge advantages: the full theoretical formulations provide a clear and understandable picture, including the measurement of individual influences. They also allow the introduction of stability constraints leading to identified long-term equilibria, and the separation of this equilibrium from the dynamic fluctuations which lead to it.
In the last decades, new kinds of models have emerged, which share the present market.
• The "VAR" models. They try to give the most reliable image of the near future, using a complex estimated structure of lagged elements, based essentially on statistical quality, although economic theory can be introduced, mostly through constraints on the specifications. The main use of this tool is to produce short-term assessments.
• The Quasi-Accounting models, which rely on very basic behaviors, most of the time calibrated. This makes it possible to treat cases where data is available for extremely limited sample periods, or where the fine detail (generally in products) makes it impossible to apply econometrics with a good chance of global success.
• Stock-Flow Consistent models, which answer two criticisms addressed to structural models: producing incomplete and formally unbalanced models, and not taking sufficient account of stocks, in particular of financial assets. By detailing these assets by agent and category, SFCs allow the consideration of sophisticated financial behaviors, sometimes at the expense of the "real" properties.
• And last (but not least) Computable General Equilibrium models. They use a detailed structure with a priori formulations and calibrated coefficients to solve a generally local problem, through the application of one or several optimizing behaviors. The issues typically addressed are optimizing resource allocations, or describing the consequences of trade agreements. The mechanisms described generally contain little dynamics.
This is no longer true for the Dynamic Stochastic General Equilibrium models, which dominate the current field. They include dynamic behaviors and take into account the uncertainty in economic evolutions. Compared to the traditional models (see later) they formalize explicitly the optimizing equilibria, based on the aggregated behavior of individual agents. This means that they allow agents to adapt their behavior to changes in the rules governing the behaviors of others, including the State, in principle escaping the Lucas critique. As the model does not rely on traditionally estimated equations, calibration is required for most parameters.
Compared to CGEs and DSGEs, structural models do include optimization behaviors (as we shall see later), introduced in the estimated equations. But they are frozen there, in a state associated with a period and with the behavior of other agents at the time. If these conditions do not change, the statistical validation is an important advantage. But the sensitivity to shocks is flawed, in a way which is difficult to measure.
A very good (and objective) description of the issue can be found in:
http://en.wikipedia.org/wiki/Dynamic_stochastic_general_equilibrium
http://en.wikipedia.org/wiki/Macroeconomic_model#Empirical_forecasting_models
It seems to us that the main criterion in the choice between DSGEs and traditional structural models lies in the tradeoff between statistical validation and adaptability of behaviors.
In recent years, the popularity of structural econometric modelling seems to have stabilized. A personal hint of this (if not an actual proof) is the sustained demand for participation in structural modelling projects, observed on the sites of companies devoted to international cooperation.
Another issue is that, being the first tool produced (in the thirties of the last century), structural modelling was immediately applied to the ambitious task of producing reliable forecasts. The complexity of the economy, and the presence of many random shocks, make this completely unrealistic (and this is even more true today). During the golden years of structural modelling, when the economy was growing at a regular (and high) rate, forecasting was as easy as riding a tame horse on a straight path: anybody could do it. But when the horse turned into a wild one, the quality of the rider showed, and he did not stay in the saddle very long. Failing at a task too difficult for any tool (including VAR and CGE models, which do not have to forecast the medium term) brought discredit to structural models and all their uses, including policy analysis and even the understanding and interpretation of complex economic mechanisms, applications for which neither VAR nor CGE models can compete, in our opinion.
Also, the role of financial issues has grown, which the initial structural models were not well equipped to address. But
Stock-Flow Consistent versions can be an answer to this problem.
Anyway, even with limited ambitions, producing a sound econometric structural model is not a simple task. Even a professional economist, with an excellent knowledge of both economic theory (but not necessarily a complete and consistent picture) and econometric techniques (but not necessarily of their practical application), will find it quite difficult to produce a reliable and operational econometric model.
The purpose of this book is to shorten the learning process, in several ways.
We shall describe how to organize the sequence of model building tasks, from data production and framework
specification to actual operational studies.
For each task, we shall give all the necessary elements of methodology.
We shall present the main economic options available, with some theoretical explanations.
All these explanations will be based on a practical example, the production of a very small model of the French economy. Its small size will not prevent us from addressing most of the problems encountered in the process.
The methods, techniques and solutions proposed will be based on the EViews software. This will allow us to present
some useful features and tricks, and to provide a sequence of complete programs, which the user can modify at will,
but not necessarily too heavily, as all the models of this type share a number of common elements. The main issue is
of course the estimation process, each case leading generally to an original version of each behavioral equation.
A set of documented programs, following the above principles, is available on demand.
In each case, we shall present programs which actually work. An econometric solution will be found, reliable both in
statistical and economic terms. And the properties of the models will be rather satisfying, with a long-term solution
and reasonable dynamics leading to it.
Finally, we shall address the more complex problems: multi-sector and multi-country models (and both options
combined). The specific issues will be described, and a framework for a three-product model will be provided,
following the same lines as the previous example.
The goal of this book is therefore both limited and ambitious. Without getting into theoretically complex features, it
should give readers all the elements required to construct their own model. Being relieved of the more technical (and
tedious) tasks, they will be allowed to concentrate on the more intelligent (and interesting) ones.
Readers must be aware they will find here neither a full description of econometric and statistical methods, nor a
course in economic theory. We shall give basic elements on these fields, and rather focus on their links with the
modelling process itself. For more detailed information, one can refer to the list of references provided at the end of
the volume.
Concerning Quasi-Accounting and even more Stock-Flow Consistent models, for which our experience is much more
limited, we will be even less directive.
THE EXAMPLE: A VERY BASIC MODEL
To present the elements and the framework of a structural econometric model, we shall use a specific example, which
we shall address permanently during our presentation. In spite of its limited size, we think it remains quite
representative of the class of models we are considering in this manual.
At the start of any model building process, one has to specify in a broad manner the logic of his model, and the
behaviors he wants his model to describe. No equation needs to be established at this time. We shall place ourselves
in this situation.
In our example, an economist has decided to build a very simple model of the French economy. As our tests will be
based on actual data, a country had to be chosen, but the principles apply to any medium sized industrialized country.
• Based on their production expectations and the productivity of factors, firms invest and hire workers to adapt their productive capacity. However, they exert some caution in this process, as they do not want to be stuck with unused elements.
• The levels reached in practice define potential production.
• Firms also build up inventories.
• Households obtain wages, based on total employment (including civil servants), but also a share of Gross Domestic Product. They consume part of this revenue.
• Final demand is defined as the sum of its components: consumption, productive investment, housing investment, the change in inventories, and government demand.
• Imports are a share of local demand («domestic demand»). But the fewer capacities remain available, the more an increase in demand will call for imports.
• Exports follow world demand, but producers are limited by available capacities, and their priority is satisfying local demand.
• Supply is equal to demand.
• Productive capital grows with investment, but is subject to depreciation.
The above framework looks rather straightforward, and certainly simplistic. Obviously, it lacks many elements, such as prices, financial concepts, and taxes. These will be addressed in later developments.
Let us not go further for the time being. One can observe that even if we have not built a single equation yet, a few are already implicit in the above text.
2 CHAPTER 1: NOTATIONS AND DEFINITIONS
Before we start presenting the process of model building, we must define the concepts we shall use. They will be
based on individual examples taken from our (future) model.
In a general way, a model will be defined as a set of fully defined formulas describing the links between a set of concepts.
f( ... ) = 0
Obviously, a model will be used to measure economic concepts, depending on other elements. Its variables will therefore separate into:
• Endogenous variables, or results, whose value will be obtained by solving the system of equations.
• Exogenous variables, or assumptions, whose value is known from outside considerations, and which obviously condition the solution.
If the model is solved over past periods, this value should be known. But in forecasting operations, it will have to be
chosen by the model builder (or user).
For the system to be solved, the number of endogenous variables must correspond to the number of equations.
f(y, x) = 0
with y the vector of endogenous and x the vector of exogenous variables. In our example:
• Imports will be endogenous, as they depend on local demand. Exports too, depending on world demand.
• World demand will be exogenous, as we are building a model for a single country, and we are going to neglect the impact of local variables on the world economy. Of course, this impact exists, as France is (still) an important country, and its growth has some influence on the world economy. But the relatively limited improvement can only be obtained at the very high cost of building a world model. This simplification would be less acceptable for a model of the USA, or China, or the European Union as a whole (we shall address this issue in detail later).
Technically, one can dispute the fact that exports are endogenous. As we make them depend only on an exogenous
world demand, they are de facto predetermined, apart from an unforecastable error. But they have to be considered
endogenous. Our model describes the local economy, and one of its features is the exports of local firms, allowed by
the external assumption on foreign demand, but following a local behavior.
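To make the endogenous/exogenous distinction concrete, here is a minimal sketch (in Python rather than EViews, with purely invented coefficients) of a model written as f(y, x) = 0 and solved numerically: the four endogenous variables are obtained from the four equations, while world demand WD and government demand G are set beforehand as assumptions.

import numpy as np
from scipy.optimize import fsolve

# Exogenous variables (assumptions): world demand and government demand
WD, G = 100.0, 40.0

# Purely hypothetical coefficients, for illustration only
a_m, a_x = 0.2, 0.3   # import share of demand, export sensitivity to world demand

def model(y):
    """Residuals of f(y, x) = 0; endogenous y = (Q, FD, M, X)."""
    Q, FD, M, X = y
    return [
        Q + M - (FD + X),    # supply-demand equilibrium
        FD - (0.6 * Q + G),  # final demand depends on output and government demand
        M - a_m * FD,        # imports are a share of domestic demand
        X - a_x * WD,        # exports follow world demand
    ]

# As many equations (4) as endogenous variables (4): the system can be solved
solution = fsolve(model, x0=np.ones(4))
print(dict(zip(["Q", "FD", "M", "X"], solution.round(2))))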
As to Government demand, models of the present type will also keep it exogenous, but for different reasons:
• The main goal of this model is to show its user (which can be the State, a State advising agency, an independent economist playing the role of the State, or even a student answering a test on applied economics) the consequences of its decisions. So these decisions must be left free, and not forced on him.
• The behavior of the State is almost impossible to formalize, as it has few targets (mostly growth, inflation, unemployment, budget and trade balances) and a much larger number of available instruments1. Even if their base values are more or less fixed, it can deviate from them arbitrarily, without too much delay. To achieve the same goal, past French governments have used different global approaches, calling for different panels of individual instruments.2
• The State alone has enough individual power to influence significantly the national economy.
Each of the two exogenous elements is characteristic of a broader category:
• Variables considered as external to the modeled area, on which economic agents considered by the model have no or little influence. In addition to the world environment for a national model, this can mean population3, or meteorological conditions4, or the area available for farming. The theoretical framework of the model can also suppose exogenous structural elements, such as the real interest rate, the evolution of factor productivity, or the depreciation rate of capital.
1 Not here, but in the general case.
2 For instance, to decrease unemployment, a government can increase demand or reduce firms' taxes, and the tax instrument can be social security contributions, or subsidies.
3 In long-term models, growth might affect the death and birth rates, and thus population.
4 Which can depend on growth (greenhouse effect).
• Variables controlled by an agent, but whose decision process the model does not describe. Even if it was formally possible, the model builder wants to master their value, to measure their consequences on the economic balance. These will be referred to as decision variables or «instruments».
Changing the assumptions on these two types of variables, therefore, will relate to questions of a very different spirit. Sometimes the two approaches can also be combined, by considering first the consequences of an evolution of uncontrolled elements, and then supposing a reaction of the State, for instance a change in policy that would return the situation to normal. For instance, the State could use its own tools to compensate losses in external trade due to a drop in world demand.
From one model to another, the field described can change, but also the separation between endogenous and exogenous variables. The real interest rate can change its nature depending on the endogeneity of the financial sector, technical progress can be assumed to follow a trend or to depend on growth, and the level of population can depend on revenue.
1.2.2.1 Behaviors
The first role of the model is to describe "behaviors": the model builder, following most of the time an existing economic theory, will establish a functional form describing the behavior of a given agent, and will use econometrics to choose a precise formulation, with estimated parameters.
In describing consumption, one might suppose that its share in household income is determined by:
• The level of income (a higher income will make consumption less attractive or necessary, compared to savings6).
• Recent variations of income (consumers take time in adapting their habits to their new status).
• The evolution of unemployment: if it grows, the prospect of losing a job will lead households to increase reserves.
• Inflation: it defines the contribution required to maintain the purchasing power of financial savings.
Once identified, all these elements will be united in a formula, or rather a set of possible formulas (households can
consider present inflation, or the average over the last year; the increase in unemployment can use its level or
percentage change). These formulas will be confronted with the available data, to find a specification statistically
acceptable on the whole, each element participating significantly in the explanation, and presenting coefficient values
5 Provided the EU commission will allow it.
6 Let us recall that investment in housing is considered as savings.
consistent with economic theory. Once parameters are estimated, each element of the resulting formulation will
contribute to the logical behavior of the associated agent.
But the process is not always so straightforward. Two other cases can be considered.
• The behavior can be formalized, but not directly as estimation-ready formulas. A framework has first to be formalized, then processed through analytical transformations, possibly including derivations and maximizations, leading finally to the equation (or set of equations) to estimate. This will be the case for our Cobb-Douglas production function (page 105), for which we compute the combination of labor and capital which maximizes profits for a given production level, according to a set of formulas obtained outside the model. Or for the definition of the wage rate as the result of negotiations between workers' unions and firm managers, based on their respective negotiating power.
• Often the model builder will not be able to formulate the equation precisely, but will consider a set of potential explanatory elements, waiting for econometric diagnoses to make a final choice between formulations (generally linear). For instance, the exchange rate might depend on the comparison of local and foreign inflation, and on the trade balance.
In any case, even if the exact intensity of influences is unknown to the model builder7, economic theory generally defines an interval of validity, and especially a sign. Whatever the significance of the statistical explanation, it will be rejected if its sign does not correspond to theory. In the wage example above, an increase in labor demand must generate gains in the purchasing power of the wage rate.
Anyway, the less the theoretical value of the estimated coefficient is known, the more care must be applied to the
numerical properties of the model, at simulation time.
The formulation of these theoretical equations often makes use of specific operators, allowing alternative calculations: Boolean variables, operators for maximum and minimum. For instance, in disequilibrium models, the theoretical equation can include a constraint. We can also consider the case of a production function with complementary factors, where the level of each factor determines an individual constraint:
CAP = min(pl · L, pk · K)
with CAP production capacity, L employment, K capital, and pl, pk the associated productivities.
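As a side illustration (a Python one-liner with hypothetical productivities, not taken from the model itself), the min operator expresses this complementary-factors constraint directly:

# Capacity with complementary factors: each factor sets its own ceiling
def capacity(L, K, pl=2.0, pk=0.5):   # pl, pk: hypothetical productivities
    return min(pl * L, pk * K)

print(capacity(L=100, K=500))   # labor allows 200, capital allows 250 -> capacity is 200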
1.2.2.2 Identities
A model composed only of behavioral equations cannot generally be used as such. Additional equations will be needed, this time describing undisputable and exact relationships.
• Some concepts are linked by an accounting formula, and we need to ensure their numerical coherence. For example, once the model has defined household revenue, it cannot estimate savings and consumption
7 Otherwise he would not have estimated it.
separately as the sum of the two is known8. A single element will be estimated: it can be savings,
consumption, the savings ratio or the consumption ratio, and the other elements will follow, using an
identity.
• Some concepts are linked by a causal sequence of elements, and some elements in the chain are not defined by behaviors. For example, if we estimate firms' employment and household consumption, we must formalize household revenue (as a sum including wages) to make job creation improve consumption. And in our example, defining final demand (as a sum of its components) ensures that imports will follow consumption.
Of course, one can consider eliminating these identities by replacing each element they compute by the corresponding formula. This is not always technically possible, and in any case it would mean eliminating:
• Intermediate variables simplifying formulations (and speeding up computations). Even if the growth rate of the real wage rate, which uses a slightly complex expression, was not considered interesting as an economic quantity, it will be useful to define it, if it appears as an explanatory element in many equations.
• Purely descriptive elements: the ratio of the Government balance to GDP is a crucial element in evaluating the financial health of the State (and one of the «Maastricht» criteria for entering the European Monetary Union).
• Finally, economic theory is not always absent from this type of equation: the supply-demand equilibrium has to be enforced:
Q (supply from local producers) + M (foreign supply to the country) = FD (demand from local agents) + X (foreign demand to the country)
And the choice of the variable which balances it has a strong theoretical impact on model properties.
o If exports and imports come from behaviors, and demand from the sum of its components, we need to
compute Q as:
Q (local output) = (FD-M) (local demand supplied by local producers) +X (foreign demand supplied by local producers)
This means that production will adapt to demand (which itself can depend on the availability of products).
The producers choose to limit their output at a level actually lower than demand, because additional production would bring negative marginal profits. In this case Q will be fixed, and we could have:
M = FD + X − Q
8 This would also be absurd in terms of household behavior.
o Or the country can only import in foreign currency, which it obtains through exports.
1.2.3 PARAMETERS
Parameters can be defined as scalars with a varying value. The only formal difference with exogenous variables is that
they lack a time dimension9.
Two types of parameters can be considered, according to the way their value is established:
• Those estimated by reference to the past: starting from a theoretical but fully defined formula including unknown parameters, the model builder will seek the values which provide the formulation closest to observed reality, according to a certain distance. This means using "econometrics".
• Those decided by the model builder: economic theory or technical considerations can supply a priori assumptions concerning a particular behavior. For instance, if a Central Bank uses a standard Taylor rule to decide its interest rate, its sensitivity to the inflation level should be 0.5 (see the sketch below). A special case will be represented by a control variable, giving (without changing the formulation) a choice between several types of independent behaviors.
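A minimal sketch of such a calibrated behavior (a standard Taylor rule, written in Python for illustration; the 0.5 sensitivities and the other numbers are conventional values, not estimates):

def taylor_rate(inflation, output_gap, neutral_real_rate=2.0, target_inflation=2.0):
    """Interest rate from a standard Taylor rule with calibrated sensitivities of 0.5."""
    return (neutral_real_rate + inflation
            + 0.5 * (inflation - target_inflation)
            + 0.5 * output_gap)

print(taylor_rate(inflation=3.0, output_gap=-1.0))   # 5.0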
The distinction is not as clear as it may seem: in particular, if estimation fails to provide an economically coherent
result, the model builder can be driven to accept the value of a parameter, if it is consistent with economic theory.
And even if not, to choose the nearest value within the theoretical range. For instance, an indexation of wages on
inflation by 1.1 can lead the modeler to apply 1, if the difference is not significant.
Introducing the vector of parameters a, the general formulation becomes:
f(y, x, a) = 0
And in our example, one could estimate the influence of world demand on exports, for example by supposing that relative variations are proportional (or equivalently that the elasticity of exports to world demand is constant).
ΔX/X = a · ΔWD/WD
9 In EViews, modifying a parameter value applies to the current model, and taking it into account calls for a new compilation, making the new version operational. This is both tedious and error-prone. One might consider replacing parameters by series with a constant value, which gives access to the much more manageable "scenario" feature.
where a should be close to unity, if the share of the country on the world market is stable10.
But if the estimated coefficient is not significantly different from unity, we can get back to:
ΔX/X = ΔWD/WD
This choice could also have been made from the start for theoretical reasons, or to ensure the long-term stability of
the model.
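The following sketch shows this logic in Python (the book itself works under EViews): it estimates the elasticity a on simulated export and world demand series, then checks whether a differs significantly from unity before deciding to impose the unit value. The series and the true elasticity of 1.05 are invented for the example.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 80   # say, twenty years of quarterly data

# Simulated series: world demand grows about 1% per quarter,
# exports follow it with a true elasticity of 1.05 plus noise
wd = 100 * np.cumprod(1 + 0.01 + 0.005 * rng.standard_normal(n))
x = 50 * (wd / wd[0]) ** 1.05 * np.exp(0.02 * rng.standard_normal(n))

# Relative variations approximated by log differences
d_log_x, d_log_wd = np.diff(np.log(x)), np.diff(np.log(wd))

ols = sm.OLS(d_log_x, sm.add_constant(d_log_wd)).fit()
a_hat, a_se = ols.params[1], ols.bse[1]
print(f"estimated elasticity a = {a_hat:.3f} (s.e. {a_se:.3f})")

# If a is not significantly different from 1, one may impose the unit elasticity
print("impose a = 1?", abs((a_hat - 1.0) / a_se) < 2)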
In practice, the behavior of agents does not follow exactly formalized functions, and the formulation obtained by estimation will not reproduce reality exactly. It will only approximate this behavior, using elements which conform to some economic theory, each of them providing a significant contribution to the changes in the explained variable. The number of estimated parameters will then generally be much lower than the size of the sample, or the number of observed values. In practice, adding elements to the explanation can:
• In the favorable cases, improve the quality of the explanation given by the elements already present, which can now concentrate on their natural role, instead of trying to participate in the explanation of other mechanisms in which their efficiency is limited11.
• But the new element can compete with the others in explaining a mechanism in which they all have some competence, limiting the improvement and leaving the sharing of the explanation rather undetermined (and therefore limiting the significance of the coefficients).12 13
In practice, these correlation problems will always appear, sometimes very early, and generally before the fifth or
sixth element. Beyond that figure, the precision of individual coefficients will decrease, and global quality will improve
less and less.
This means that a typical econometric equation will contain a maximum of four parameters, while variables will be
known on fifty to one hundred quarters.
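This degradation is easy to reproduce on simulated data. In the hedged Python sketch below (all series invented), a second regressor carrying almost the same information as the first adds nothing to the fit but inflates the standard errors of both coefficients:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 80

x1 = rng.standard_normal(n)
x2 = x1 + 0.05 * rng.standard_normal(n)          # almost collinear with x1
y = 1.0 * x1 + 0.5 * rng.standard_normal(n)      # only x1 really matters

fit_one = sm.OLS(y, sm.add_constant(x1)).fit()
fit_two = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

print(f"one regressor : coef {fit_one.params[1]:.2f}, s.e. {fit_one.bse[1]:.2f}")
print(f"two regressors: coefs {np.round(fit_two.params[1:], 2)}, s.e. {np.round(fit_two.bse[1:], 2)}")
print(f"gain in R2    : {fit_two.rsquared - fit_one.rsquared:.4f}")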
10 In our model WD stands for world trade (including its expansion), not the aggregate demand of countries.
11 Just like a worker who has to divide his time between two tasks, and is really qualified for only one. For example, if an excellent musician but average lyricist is teamed with a good lyricist, the quality of the songs (both music and lyrics) will improve.
12 This can be a problem for the model if the two competing elements have a different sensitivity to a particular variable. For instance, if one is sensitive to a tax rate and the other is not, the role of the tax rate will be undetermined.
13 If two workers with the same profile complete a task together, it is difficult to evaluate their individual contribution. One might have rested the whole period.
It will therefore be necessary, to formulate an exact model, to accept the presence of non-zero additional terms (residuals). If one believes in the model, this residual should be interpreted as a random perturbation without economic meaning14. But if the equation is badly specified, it will also come from other sources: omitting a relevant variable, replacing it by another less relevant one, or choosing the wrong form for the equation15.
The fault will not always lie with the model builder, who might not have been able to apply his original ideas. The
variables he needs may not be precisely measured, or only with a slightly different definition, or they may not be
available at all, as in, for example, the goals or expectations of a given agent.
Practically speaking, one will often suppose that this residual follows a random distribution, with a null average, a
constant standard error, and residuals independent across periods.
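For instance, estimating a simple behavioral equation on simulated series (standing in for actual national accounts data), one can check these properties of the residual: a mean close to zero and no obvious autocorrelation (Durbin-Watson statistic close to 2). A hedged Python sketch:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)
n = 80

Q = 100 * np.cumprod(1 + 0.005 + 0.01 * rng.standard_normal(n))   # production
CO = 10 + 0.6 * Q + 2.0 * rng.standard_normal(n)                  # consumption

fit = sm.OLS(CO, sm.add_constant(Q)).fit()
u = fit.resid

print(f"a = {fit.params[1]:.3f}, b = {fit.params[0]:.2f}")
print(f"mean of residual: {u.mean():.3f}")
print(f"Durbin-Watson   : {durbin_watson(u):.2f}   (close to 2 if no autocorrelation)")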
Our formulation becomes therefore, in the general case, noting u the vector of residuals:
f(y, x, a, u) = 0
In the example, if we want to represent changes in household consumption as a constant share of total production variations, we will write:
CO = a · Q + b + u
CO/Q = a + u
14 One can also take into account that the relationship is not exact. For instance, that the value of an elasticity is only very close to constant.
15 Of course, as we have said before, one is never able to estimate the «true» equation. This remark should apply to a large conceptual error, leading to behaviors distinctly different from an acceptable approximation of reality.
It is probably time to raise an important issue about the nature of econometrics. When he considers a behavioral equation, the economist can take two extreme positions.
• He believes the behavior can be exactly specified according to a formula, which is affected by an error term with a given distribution (maybe a white noise, or a normal law). With an infinite number of observations, we would get an exact measurement of the parameters, and therefore of the error (which remains, of course) and of its distribution.
• He thinks that the concept he wants to describe is linked to some other economic elements, but the relation is only a mapping, of which any formula represents only an approximation. To this mapping a random term can also be added, if one believes that the replication of the same explanatory elements will bring a different result. Additional observations will only provide a better approximation.
The debate is made more complex by several facts:
• The data on which he wants to base his estimation is not measured correctly. One cannot expect the statisticians to produce error-free information, for many reasons: measurement errors, inappropriate samples, mistaken concepts...
• Even if measured correctly, the concepts he is going to use are not necessarily the right ones. For instance, a given behavior should be applied only to the firms which do make profits, a separation which is not available at the macroeconomic level.
• The discrete lags which he will apply to these concepts are not the right ones either. For instance, it might be known that an agent considers the price index of the last month, but only quarterly data is available.
• The estimation period is not homogenous, and this cannot be explained by the data. For instance, the mood of consumers (and their consumption behavior) can evolve without any link to measurable economic elements.
From the above issues, the logical conclusion should be:
• The first position is illusory, and to a point which is impossible to measure (of course).
• But we have to take it if we want to apply econometric methods.
This means that in the following text we shall put ourselves in the first position, but we will always keep in mind the true situation, and give to the difference between the concept and its estimation the less ambitious name of "residual".
1.2.6 FORMULATIONS
We shall now consider the form of the equations. Let us first approach the time dimension.
Variables in economic models have generally a time dimension, which means they are known through discrete values,
almost always with a constant periodicity: generally annual, quarterly or monthly series. This means we will consider
models in discrete time.
There are exceptions, however. The most frequent one applies to micro-economic models, describing the behavior of a panel of individual firms or households, where the dimension will correspond to items in a set. Sometimes they will be ordered, using the level of one variable, such as the income level for a set of households. Time can be introduced as an additional dimension, but possibly with a varying interval, either predetermined (phases of the moon) or unpredictable (periods of intense cold).
1.2.7.1 Consequences of the discretization
The time discretization of variables will be introduced in several ways, leading to:
• really instantaneous variables, measured at a given point in time: the capital on the 31st of December at midnight, in an annual model (defined as a stock variable).
• averages: the average level of employment observed during a period.
• flows: the goods produced during a period.
The same economic concept might appear under several forms: inflation and price level, stock of debt and balance for
the period, average and end-of-period employment levels, plus net job creations. For one household, we can consider
the revenue, its yearly change, and the global revenue accumulated during its existence.
When models have a less than yearly periodicity, some series can present a specific distortion depending on the sub-
period inside the year. This can come from changes in the climate: in winter the consumption of electricity will
increase due to heating and lighting, but construction will be mostly stopped. It can be due to social issues: the
concentration of holidays in the summer months can reduce production, and the coming of Christmas will increase
consumption (in Christian countries). We are going here to provide a basic sketch of the problem, leaving a more
serious description to specialized books like Ladiray and Quenneville (2001).
Using unprocessed data can lead to problems: for instance, the level of production in the second quarter will be lower
than what we could expect from labor and capital levels. This will disturb estimations and make model solutions more
difficult to interpret.
Two methods can be considered: introducing corrective (dummy) elements in the estimated equations, or working with seasonally adjusted series. The second method will be favored, as it also solves the interpretation problem.
Several techniques are available, the most well-known being Census-X13 ARIMA, developed by the US Census Bureau
and Statistics Canada16. But TRAMO-SEATS17 is also a common choice. Both are available under EViews.
One must be aware that this process often reduces the statistical quality of estimations. For instance, if demand is
particularly high in the last quarter of each year, and imports follow, seasonally adjusting both series will make the link
less clear, bringing less precise results. Even more obviously, the relation between demand for heating and
temperature will lose power from seasonal adjustment.
These examples show the main issue: in a one-equation model, three situations are possible:
The dependent variable contains a seasonal component, in addition to truly economic features. For instance,
agricultural production will be lower in winter, even if the same level of labor, land, fertilizer, machinery is available....
16 https://www.census.gov/srd/www/x13as/
17 http://www.bde.es/webbde/es/secciones/servicio/software/econom.html
True, at the same time, the use of fertilizer will decrease, and probably that of labor too, but to a lesser extent. One can either adjust this variable or introduce dummy elements. The internal quality of the relationship should be the same, but the statistical one will improve in appearance, through the correlation of the unadjusted dependent variable with the explanatory dummies.
On the contrary, if all the seasonal explanation comes from the seasonality of explanatory elements, seasonally
adjusting is not necessary, and even reduces the quality of estimations (with the variability of elements). One could
use raw series to estimate an imports equation, using demand, rate of use of capacities and price competitiveness as
explanatory elements.
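As an illustration of the mechanics (much cruder than Census X-13 or TRAMO-SEATS, and applied to an invented quarterly series), the Python sketch below extracts an additive seasonal component and subtracts it to obtain a seasonally adjusted series:

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(3)
dates = pd.period_range("2000Q1", periods=80, freq="Q").to_timestamp()

# Simulated quarterly consumption: trend + marked fourth-quarter peak + noise
trend = np.linspace(100, 160, 80)
seasonal = np.tile([-2.0, -1.0, -3.0, 6.0], 20)   # "Christmas" effect in Q4
series = pd.Series(trend + seasonal + rng.standard_normal(80), index=dates)

decomposition = seasonal_decompose(series, model="additive", period=4)
adjusted = series - decomposition.seasonal        # seasonally adjusted series

print(adjusted.head(8).round(1))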
But what is true for one equation does not apply to the whole model. One cannot mix the two types of series, and this
means seasonally adjusting will prevail in practice.
To determine the equilibrium for a given period, some models will use only variables from this period: we shall call them static models. They correspond to the formulation
f_t(x_t, y_t, a, u_t) = 0
The most frequent case is that of input-output models, which use a matrix of "technical coefficients" to compute the detailed production associated to a given decomposition of demand into categories of goods, which itself depends only on instantaneous elements.
Q = A · FD
FD = f(Q)
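Taking these two formulas literally (with an invented 3-product coefficient matrix and a simple affine demand function), such a static system can be solved by plain successive substitutions, as in this Python sketch:

import numpy as np

# Hypothetical structure linking production to the decomposition of demand
A = np.array([[0.5, 0.1, 0.0],
              [0.2, 0.4, 0.1],
              [0.1, 0.2, 0.6]])

def demand(Q):
    """FD = f(Q): final demand by product, an affine function of total output."""
    return np.array([20.0, 30.0, 15.0]) + 0.1 * Q.sum() * np.array([0.5, 0.3, 0.2])

# Solve Q = A . FD, FD = f(Q) by successive substitutions
Q = np.zeros(3)
for _ in range(100):
    FD = demand(Q)
    Q_new = A @ FD
    if np.max(np.abs(Q_new - Q)) < 1e-8:
        break
    Q = Q_new

print("FD =", FD.round(2), " Q =", Q_new.round(2))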
But most models will also use variables from previous periods, for several reasons:
• institutional: the income tax paid by households can be based on their income of the previous period (this was the case in France until 2019).
• technical: if a model considers a variable and its growth rate, computing one from the other calls for the previous level.
One observes that each of these justifications supposes that influences come only from previous periods: one will
speak of (negatively) lagged influences.
Let us go back to our model. We can already observe an indisputable lagged influence: most of present capital will come from the remaining part of its previous level. Any other case is still undecided. However, without going too deep into economic theory, one can think of several lagged influences:
• For household consumption, we have already considered that adapting to a new level of revenue takes some time. This means it will depend on previous levels. If we detailed it into products, the previous level could have a positive influence (some consumptions are habit-forming) or a negative one (generally, people do not buy a new car every quarter).
• Firms invest to adapt their productive capacities to the level of production needed in the future. We can suppose that they build their expectations on past values.
It is interesting to note that the previous formulation could be simplified, eliminating any lag larger than one by the addition of intermediate variables:
f_t(y_{i,t}, y_{j,t−k}) = 0
is equivalent to
f_t(y_{i,t}, z_{j,t}) = 0
z_{1,t} = y_{1,t−1}
...
in which a lag of k periods on a single variable has been replaced by k one-period lags on as many variables (including new ones).
z1_t = y_{t−1}
z2_t = z1_{t−1}
z3_t = z2_{t−1}
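On actual series the transformation is mechanical; a short pandas sketch (illustrative figures only):

import pandas as pd

# Replace a 3-period lag on y by a chain of one-period lags on new variables
data = pd.DataFrame({"y": [100, 102, 105, 103, 108, 110, 112, 115]})

data["z1"] = data["y"].shift(1)    # z1(t) = y(t-1)
data["z2"] = data["z1"].shift(1)   # z2(t) = z1(t-1) = y(t-2)
data["z3"] = data["z2"].shift(1)   # z3(t) = z2(t-1) = y(t-3)

assert data["z3"].equals(data["y"].shift(3))   # same information as y lagged 3 periods
print(data)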
But if this method simplifies the theoretical formulation, it has the obvious disadvantage of artificially increasing the
size of the model and reducing its readability, without producing additional information. Its interest is reserved to
specific studies. For instance, assessing model dynamics can call for the linearization of the model according to
present and lagged variables. The above transformation will limit the matrix system to two elements (with lags 0 and
1), which will make further formal computations easier, and independent from the number of lags.
f_t(y_t, y_{t−1}, x_t, a, u_t) = 0
1.2.7.4 A particular case: rational expectations
It has appeared natural, in the previous examples, to consider only negative lags. This will happen if we suppose that the anticipation of agents relies only on the observation of the past (and the present).18 But one can also suppose:
• That agents have the possibility, by their present decisions, to determine the future values of some variables (and the associated behavior can be formalized).
• That agents perfectly anticipate the future (perfect expectations).
• That the expectation by agents of specific evolutions has for consequence the realization of these evolutions (self-fulfilling expectations).
• That agents build their expectations on the behaviors of the other agents19, for which they know the properties (rational expectations). Basically, this means that they are able to apply the model controlling the economy (but not necessarily know its formulas), and the decision process defining its assumptions. For instance, they can forecast the investment program of the Government (depending on economic conditions), they know how firms and households will react, and they know the links between these elements (they are able to consider the supply-demand equilibrium). Actually, this is rather called "model consistent expectations".
• However, they do not necessarily know the unexplained part of the behaviors (which can be associated with the random term). If they know only its distribution, we shall speak of stochastic rational expectations. EViews does not provide this feature at present (only one or the other), although this should appear in a future version. They also do not have to know the actual formulas, just be able to compute them.
You do not have to believe in rational expectations to apply them. Producing alternate simulations with different assumptions on expectations will greatly improve the insight into one particular model, or into economic mechanisms in general. We shall present this later on a specific case.
This also is a very specific area: some theoretical models will be formulated as a system of equations where variables
appear as a function of continuous time, and variations (or growth rates) become exact derivatives. One ends up then
with a system of differential equations, which one can be led to integrate.
These models seldom evolve beyond a theoretical stage, if only for lack of statistical information.
But some operational models, describing for instance the stock exchange, can reduce their periodicity to a daily or
even shorter value.
1.2.8 LINEARITY
We will consider here the linearity relative to variables. The linearity relative to coefficients will appear in the chapter
on estimation.
18 This use of proxies is made necessary by the absence of direct measurement of anticipations. Exceptionally, they can be obtained by surveys, leading to a specific estimation.
19 Including the State.
The potential linearity of a model represents a very important property for its analysis as well as its solution. But first we must define the notion of linearity, which can be more or less strict.
The most restrictive definition will suppose linearity relative to all variables, with constant coefficients:
A · y_t + B · y_{t−1} + C · x_t + d + u_t = 0
A less restrictive definition will allow the coefficients to depend on the period:
A_t · y_t + B_t · y_{t−1} + C_t · x_t + d_t + u_t = 0
a definition again less restrictive will suppose linearity relative to the sole endogenous variables:
G(y_{t−1}, x_t, a) · y_t + H(y_{t−1}, x_t, a) + u_t = 0
Using the multiplier as an example, we can already show that these properties affect the computation of derivatives of model solutions. We will detail later the consequences on convergence properties.
The first property tells us that the multiplier does not depend on the initial equilibrium, or on the period considered. Multiplying the shock by a given factor will have a proportional effect. It is enough to compute it once to know it once and for all.
In the second case, the multiplier will depend only on the period. Starting from different base assumptions will not
change the consequences of a given change.
In the third case, the multiplier will depend also on the exogenous values (and the coefficients). It has to be re-
computed each time these elements change (or have changed in the past except for one period ahead solutions), but
can be stored until the next time they do.
The last case is similar to the third one. But convergence will be affected (see later).
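These statements can be checked numerically: the sketch below (Python, on a small invented model with a mildly non-linear import equation) computes the multiplier of government demand by solving the model twice, with and without a shock; changing the baseline assumptions changes the result, which would not happen for a fully linear model.

import numpy as np
from scipy.optimize import fsolve

def solve(G, WD, a=0.2, b=0.3):
    """Solve a small, mildly non-linear model for given exogenous G and WD."""
    def residuals(y):
        Q, FD, M, X = y
        return [
            Q + M - (FD + X),                                   # supply-demand equilibrium
            FD - (0.6 * Q + G),                                 # final demand
            M - a * FD * (np.maximum(Q, 1.0) / 100.0) ** 0.1,   # non-linear import equation
            X - b * WD,                                         # exports
        ]
    return fsolve(residuals, x0=np.full(4, 100.0))

def multiplier(G, WD, dG=1.0):
    """Effect on output Q of a unit increase in government demand G."""
    base, shocked = solve(G, WD), solve(G + dG, WD)
    return (shocked[0] - base[0]) / dG

# For a non-linear model the multiplier depends on the baseline assumptions
print(f"{multiplier(G=40.0, WD=100.0):.3f}")
print(f"{multiplier(G=80.0, WD=150.0):.3f}")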
It is obvious enough that a single non-linear equation makes the whole model non-linear, according to one of the previous definitions. Reasons for non-linearity are multiple; one will find in particular:
• Expressions measured in growth rates (therefore possibly linear relative to the endogenous of the period). For example, the growth rate of wages can depend on inflation.
• Expressions formulated as elasticities (generally integrated into logarithms). One will suppose for example that imports and domestic demand show proportional relative variations.
• Ratios entering behavioral equations.
• Equations using elements at current prices, computed as the product of a quantity by a deflator (which shows the evolution of the price compared to a base year). For example, the trade balance will be obtained as the difference between the products of exports and imports at constant prices by their respective deflators.
Sometimes this distinction is purely formal, and an adequate variable change will allow the return to a linear
formulation. However, if we consider the whole model, replacing by its logarithm a variable computed in elasticities
will only transfer the problem if the level appears also in the model.
Thus, in our general example, if one uses for the exports equation the formulation:
Log(X) = a · Log(WD) + b
one can very well introduce the variables LX = Log(X) and LWD = Log(WD), which will make the equation linear:
LX = a · LWD + b
But the level of X still appears in the supply-demand equilibrium, which calls for an additional (non-linear) equation:
Q + M = FD + X
X = Exp(LX)
Therefore, most economic models presenting a minimum of realism will not be linear. But numerical computations will generally show that even for models including many formal non-linearities, the approximation by a linearized form around a model solution (denoted by an asterisk) is generally quite acceptable. On the other hand, the stability of the derivatives with time is much more questionable.
For instance, the import equation
Log(M_t) = a · Log(FD_t) + b
can be linearized around the base solution as
M_t − M_t* ≈ a · (M_t* / FD_t*) · (FD_t − FD_t*)
which will represent an adequate linear approximation of the connection between M and FD, provided that M and FD
do not move too far away from their base value20. This base value might represent a reference path, from which
actual values differ due to a change in assumptions.
But, if we restrict further the expression to a constant influence (linearity to constant coefficients),
20 In other words, if the terms of the derivative are negligible beyond the first order.
the approximation can be accepted only if the ratio M / FD does not change too much with time. This is not generally
true: the expansion of international trade has led, and still leads, to a sustained growth of the share of imports in
domestic demand, for most countries. The ratio M*/FD* will grow strongly with time, and the last formulation will
be quite inadequate for forecasts.
1.2.9.1 Continuity
We consider here the continuity of the whole set of endogenous variables relative to assumptions (exogenous
variables, parameters). It is almost never verified formally, but should only be considered within the set of acceptable
solutions (and assumptions).
For instance, most models use ratios, which is acceptable if the denominator can never become null (like the
productivity of labor measured as the ratio of production to employment). Or using logarithms to link imports to
demand requires (logically) that those elements are strictly positive. In other words, a fully linear model can produce a
negative GDP, but this does not make it less operational if this value is associated with absurd assumptions or
coefficients.
So even if all models show a potential for non-continuity, it should never occur in practice. We can think of only three cases:
• The model framework is correct, but something is wrong with its elements: the numerical assumptions, or the estimated coefficients.
• The algorithm used for solving the model leads to absurd values (more on this later).
• The behavioral equations are wrongly specified. As we shall also see later, it can be dangerous to put together elements without a previous assessment of the associated mechanisms (for instance using logarithms as a natural solution).
It is necessary, however, to distinguish these absurd cases from those where the discontinuity applies to the derivative
of a variable differentiable by pieces, as we are going to see in the following paragraph.
1.2.9.2 Differentiability
It is less necessary, but its absence can lead to problems in the system solving phase, as well as in the interpretation of
results.
Separating from the previous criteria is not always straightforward, as the non-derivability of one variable can
correspond to the discontinuity of another: a discontinuous marginal productivity can make the associated production
non-differentiable at points of discontinuity.
Returning to the example, we could formalize household consumption in the following manner:
CO_t = c0 · a · Q_t + (c1 − c0) · max(0, a · Q_t − R_t)
At the point Q = R / a, CO is not differentiable (the derivative to the left is c0 · a, to the right c1 · a), and the sensitivity of consumption to income is not continuous.
This derivative is not purely formal: it defines the marginal propensity to consume (consumption associated to a
unitary income increase), which can appear itself in the model, at least as a descriptive element.
At the household level, the evolution of income tax as a function of revenue (with rates associated to brackets) would
represent another example, determining disposable household income.
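A short Python sketch of this kink (coefficients c0, c1, a and the threshold R are hypothetical): the one-sided numerical derivatives of CO with respect to Q jump from c0 · a to c1 · a at Q = R / a.

import numpy as np

def consumption(Q, c0=0.8, c1=0.4, a=0.7, R=70.0):
    """CO = c0*a*Q + (c1 - c0)*max(0, a*Q - R); the kink is at Q = R/a."""
    return c0 * a * Q + (c1 - c0) * np.maximum(0.0, a * Q - R)

def one_sided_derivatives(Q, eps=1e-6):
    left = (consumption(Q) - consumption(Q - eps)) / eps
    right = (consumption(Q + eps) - consumption(Q)) / eps
    return left, right

print(one_sided_derivatives(70.0 / 0.7))   # about (0.56, 0.28): c0*a on the left, c1*a on the right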
The existence of a solution is obviously necessary, at least when the model is provided with acceptable assumptions21. But the potential absence of a solution is present in many formal systems, including linear models. This absence of solution is generally logically equivalent to the existence of an absurd solution, as one can illustrate on the following case.
Let us consider a model with n+1 endogenous variables: X (dimension n) and x (a single variable). We shall describe it as f, a vector of formulas (dimension n+1), in which x appears as an argument of a logarithm. If none of the positive values of x ensures the solution of the complete model, it has no solution: replacing this argument of the logarithm by a parameter
and making it vary in R+, solving the associated model on x and X will never provide a value of x equal to this
parameter.
21 Refusing to provide a solution for absurd assumptions should rather be considered as a quality.
But if the model builder has used a formulation in logarithms, he has probably not considered letting the argument
take negative values. By replacing the logarithm by some other expression giving similar values, we would probably
have obtained a solution. But if the variable remains negative, this solution would have been unacceptable.
To illustrate this case, we are going to reduce the usual model to a three equations version.
Production adapts to demand corrected by imports and exports, the last being exogenous:
[1] Q + M = FD + X
as for demand, one supposes that its relative variations are proportional to those of production:
[2] Log(FD) = a · Log(Q) + b
and imports are supposed to represent a share of demand:
[3] M = c · FD
Let us suppose that one has obtained by estimation in the past: a = 1.05 and b > 0, justified by a level and growth of
demand generally superior to production, obviously associated to imports greater (and growing faster) than exports.
[1'] FD = 1/(1 − c) · (Q − X)
[2'] FD = Q^a · exp(b)
from (2)
and
Obviously, if Q grows (as a − 1 = 0.05), the negative element will eventually become higher than the positive one, which means that Q can only be negative, which is impossible as it enters in a logarithm in equation (2). The model has no solution.
Of course, these mathematical observations have an economic counterpart. In the long run, final demand cannot
grow continuously faster than production, if imports are a share of demand and exports are fixed. Assumptions,
therefore, are not consistent with the estimated formula.
One will notice that the absence of solution is due here to the implicit adoption of a condition verified numerically on
the past, but not guaranteed in general. This will be in practice the most frequent case.
The uniqueness of the solution, for given (and reasonable) values of parameters and assumptions, is also very
important. Indeed, we do not see how one could use a model which leaves the choice between several solutions,
except maybe if this freedom has a precise economic meaning.
In practice, most models are highly nonlinear if you look at the equations, but the linear approximation is rather accurate within the domain of economically acceptable solutions. This limits the possibility of multiple equilibria: if the system was fully linear, and the associated matrices regular, there would indeed be a single solution. However, as we move away from this domain, the quasi-linearity disappears, and we cannot eliminate the possibility of alternate solutions, probably far enough from the reasonable solution to appear quite absurd. Fortunately, if we start computations inside the domain, an efficient algorithm will converge to the acceptable equilibrium, and we will never even know about any other.
The most significant exception will be that of optimization models, which look for values of variables giving the best
result for a given objective (for example the set of tax decreases which will produce the highest decrease in
unemployment, given a certain cost): if several combinations of values give a result equal in quality 22, this lack of
determination will not undermine the significance of the solution. The existence of several (or an infinity of) solutions
will represent an economic diagnosis, which will have to be interpreted in economic terms23.
Another case appears when the formula represents the inversion of another formula giving a constant value, at least on a certain interval. For example, if over a certain threshold of income households save all of it:
22
For instance, if the model is too simple to differentiate the role of two taxes.
23
provided the algorithm used for solving the model is able to manage this indetermination.
CO = min( f(R), CO* )

(with R the income level). Then the income level associated with CO* is not unique: the whole set of values higher than the threshold gives the same consumption.
In the general case, the main danger appears in sensitivity studies: if one wants to measure and interpret the
economic effects of a modification of assumptions, the existence of a unique reference simulation is an absolute
necessity.
Finally, finding several solutions very close to each other might come from purely numerical problems, due to the
imprecision of the algorithm: any element of the set can then be accepted, if the difference is sufficiently low.
The convexity of the system, which is the convexity of the evolution of each endogenous variable with respect to each exogenous variable and parameter taken individually (or to a linear combination of them), can be requested by some algorithms, especially in optimization. In practice it is very difficult to establish, and actually rarely verified. At any rate, this characteristic is linked to the definition of variables, and a single change of variables might make it disappear.
In addition to its theoretical validity, the model will have to meet a set of more technical constraints.
a - on the endogenous between themselves: one cannot let the model compute variables independently if they are
linked by a logical relationship, accounting or theoretical. For example, if the consumer price enters in the
determination of the wage rate, it also will have to be influenced directly by the (estimated) price of local production.
Or the employment level has to affect household revenue and consumption through a sequence of links.
Accounting balances must be verified: once household revenue has been computed as a sum of elements, an increase
in consumption must produce the associated decrease in savings.
Maybe the most important issue lies with the « supply = demand » identity, which will have to be enforced both at constant and at current prices. This can lead either to using one of its elements to balance the equation, or to distributing the residual over the whole set of elements on one side. Formulating total supply and demand as:

O = Σ O_i  (i = 1, …, n)

and

D = Σ D_j  (j = 1, …, m)

one will either compute the last demand element as the balance:

D_m = O − Σ D_j  (j = 1, …, m−1)

or correct the set of demand variables by multiplying each of them by the ratio O / D computed before the correction.
In most cases the equilibrium at constant prices will be enforced automatically. It can be written as:
Local GDP + Intermediate consumption + Imports = Local final demand + Intermediate consumption + Exports
• With only one product, intermediate consumption can be discarded, and one will generally use the equation to compute GDP, checking that it does not get higher than productive capacity24.
β’ With several products, we must consider as many equilibrium equations, in which the supply of intermediate
consumption goods sums inputs needed for production of the good, and the demand for intermediate
consumption goods sums the intermediate uses of the good itself.
24
This can be obtained by a share of imports growing with constraints on local productive capacity.
If we suppose that returns to scale are constant, the vector of value added by good will come from a matrix
transformation. The constraint on capacity will be achieved in the same way as above (provided a capacity equation
can be obtained).
Defining c(i,j) as the quantity of good i needed to produce one unit of good j, C as the matrix of these coefficients and tC as its transpose, we get in matrix terms:

Q + tC · Q + M = FD + C · Q + X

or

Q = (I − C + tC)⁻¹ · (FD + X − M)
Using this framework will automatically enforce the supply-demand equilibrium for all goods.
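As an illustration, the following Python sketch applies this computation to a purely invented three-product example; the coefficients and the demand figures are arbitrary and only meant to show the mechanics.

import numpy as np

# c[i, j]: quantity of good i needed to produce one unit of good j (invented values)
C = np.array([[0.10, 0.05, 0.00],
              [0.20, 0.15, 0.10],
              [0.05, 0.10, 0.20]])

FD = np.array([100.0, 80.0, 60.0])   # final demand by product
X  = np.array([ 20.0, 10.0,  5.0])   # exports by product
M  = np.array([ 15.0, 25.0, 10.0])   # imports by product

# Q = (I - C + tC)^-1 (FD + X - M)
I = np.eye(3)
Q = np.linalg.solve(I - C + C.T, FD + X - M)
print("value added by product:", Q.round(2))

# the supply-demand identity then holds for every product
supply = Q + C.T @ Q + M
demand = FD + C @ Q + X
print("equilibrium enforced for all goods:", bool(np.allclose(supply, demand)))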
In practice, most of the problem comes from the equilibrium at current prices. If demand prices are computed
individually using behavioral equations, there is no chance the equilibrium will be met. The process described earlier
will in practice correct the prices. With S and D as supply and demand elements at constant prices, ps and pd as the
associated deflators, we can compute the global values as:
SV = Σ ps_i · S_i  (i = 1, …, n)

DV = Σ pd_j · D_j  (j = 1, …, m)

where the "pd" elements are the independently computed demand prices. One then computes the ratio:

r = SV / Σ pd_j · D_j  (j = 1, …, m)

and corrects the demand deflators, the "pd'" elements being the corrected values:

pd'_j = r · pd_j

which gives a set of equations ensuring the equilibrium. As "r" measures the potential discrepancy between supply and demand, one must check that it is not too different from one.
• The first method requires choosing the element which will balance the equation. It can be:

o A small and unimportant variable, to reduce the consequences for model properties; perhaps even a variable which has absolutely no influence on the rest of the model.

o A variable with a large value, to reduce the size of the correcting factor.

• The second method represents an extreme application of the first one, in which all variables on one side are affected in the same proportional way.

Actually, none of the solutions dominates clearly, the worst being in our sense the very first, which amounts to accepting de facto an imbalance, hidden but with potentially damaging consequences. Also, the second could be associated with a converging economic process, while the first can have no economic interpretation whatsoever.
In fact, one should concentrate on limiting the size of the correction itself. One could represent the problem as
eliminating dust: instead of storing it in a specific location (under a carpet), or spreading it all over the place, the best
solution is clearly to reduce its production as much as possible. This means that the initially computed prices should be
designed to give naturally close values to global supply and demand.
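To make the correction process concrete, here is a small Python sketch on invented figures: it computes the ratio r and applies it to the independently computed demand deflators, checking that the corrected demand value matches total supply.

import numpy as np

# invented supply and demand elements at constant prices, with their deflators
S,  ps = np.array([120.0, 40.0]),       np.array([1.02, 1.10])
D,  pd = np.array([90.0, 35.0, 30.0]),  np.array([1.05, 1.00, 1.12])

SV = ps @ S                  # total supply at current prices
r  = SV / (pd @ D)           # discrepancy with the uncorrected demand value
pd_corrected = r * pd        # pd'_j = r * pd_j

print("correcting ratio r:", round(float(r), 4))   # should stay close to one
print("equilibrium enforced:", bool(np.isclose(pd_corrected @ D, SV)))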
b - on exogenous-> endogenous connections: Connections must be formulated with care. For example, if the social
contributions rate is defined as an exogenous variable in the model, it has to enter in all computations of contribution
levels. In particular, it cannot coexist with an exogenous representation of contributions, or one using an estimated
coefficient.
To avoid this type of error, a systematic study of model properties must be undertaken before any operational
application: in our example, this would mean checking that an increase of the social contribution rate has all expected
effects on State revenues as well as on the accounts and behaviors of concerned agents.
Also, the true exogenous concept should be decided. Concerning contributions, the decision variable is clearly its rate,
while the associated revenue is influenced by endogenous prices and employment.
c - on the exogenous between themselves: one should avoid defining two variables as exogenous if they are linked (in
any direction) by a logical relationship. If possible, one should endogenize one of them by formalizing this connection.
Let us suppose for example that a model for France uses two exogenous measures of the prices established by its
foreign competitors: in foreign currency and in Euros (with a fixed exchange rate). To take into account an increase of
foreign inflation, these two variables will have to be modified simultaneously. This is at best more complex, and can lead to errors if one is not careful enough, while it can be avoided simply by endogenizing the price in Euros as the product of the price in foreign currency by the (exogenous) exchange rate.
However, establishing such links is not always possible. For instance, in a national model, foreign prices and foreign
production are exogenous, but also clearly influenced by each other. But the nature and importance of the link are
highly variable. For instance, a decrease in foreign production can produce world deflation,25 while inflation can
reduce exports and production. To describe them completely one should have to resort to a foreign or world model.
An intermediary solution could be to establish a set of linear multipliers linking these elements, but generally the
model builder himself will take care of the problem by producing a set of consistent assumptions (with perhaps some
help from specialists of the world economy, or from a separate model).
25
This is the case for the MacSim world model we shall present later.
d - on endogenous-> exogenous connections: they are obviously proscribed, because contrary to the preceding links
the model builder cannot master them. They will be found in some models, however, through the presence of the
following exogenous:
β’ Elements measured in constant terms, while they should change with economic activity.
β’ Deflators, which should depend on other deflators.
β’ Elements measured in current terms, for both reasons.
Even if the associated model can produce correct estimates and even forecasts, it runs the risk of showing abnormal sensitivity properties. Let us take an example, in which household income is the sum:

• of the wage revenue, computed as the product of employment by the wage rate: LT · W;

• of other revenues, named HIQ, kept exogenous in current terms.
Salaries will be indexed perfectly on prices:

W = w0 · CPI

This equation might perform well in forecasts. But if a change in the assumptions makes prices increase, the purchasing power of total wages will remain unchanged, while for the complement HIQ it will be reduced in the same proportion as the price rise:

Δ(HIQ / p) = −(Δp / p) · (HIQ / p)

One can question this assumption. Some elements in non-wage revenue (social benefits, rents, firm owners' profits, independent workers' revenue) are more or less indexed, and can even be over-indexed in the case of interest payments (the interest rate should increase with inflation). Others, associated with deferred payments (dividends, income tax), will not change immediately. The global sensitivity to prices is not clear, but a null value is obviously not correct.

The same applies to economic activity: we cannot suppose that this revenue does not change (grow) with it. Some elements do not, or show a limited sensitivity (pensions), but dividends and the revenue of owners of small firms certainly do.
In conclusion, even when a variable measured at current prices has no theoretical content, it should not be kept exogenous, especially if it can be supposed to grow at constant prices. It is generally better to consider as exogenous its ratio to another variable supposed to follow the same trend (in the absence of a better idea, one can use plain GDP). The model equation will then compute the variable by applying the exogenous ratio, and the introduction of Q links the additional revenue to the global growth of the economy. This can also be valid for variables at constant prices (which generally increase with production), with the exception of decision variables identified as such.
1.2.10.2 Homogeneity
If some equations in a model do not meet homogeneity constraints, this endangers its properties, particularly its
sensitivity to shocks. Let us quote some cases:
CO (consumption at constant prices) = a · HRI (current income) + b

is not only absurd from a theoretical viewpoint, but will lead in the long term to an absurd evolution of savings. Likewise:

CO = a · Log(HRI) + b

(this time with the two elements measured in quantities) makes the ratio CO / HRI decrease to 0, and therefore the savings rate grow to 1, when HRI grows indefinitely.
This last example shows however a limit to the argument: on short periods the equation can present a satisfactory
adjustment, as the consumption to income ratio (propensity to consume, complement to 1 of the savings rate)
decreases effectively with income. It is the speed of the decrease, and its long-term evolution, which is questioned
here.
The problem is identical to that of the exogenous with dimension. It invites a careful study of the theoretical content
of the constant. Furthermore, as most variables grow with time, the influence of the constant will generally decrease
or even disappear in practice. We shall address this issue later, on a practical case.
Once equations have been estimated, the problem of normalization remains. We have seen that very often the estimated formula will not explain a variable, but an expression (logarithm, growth rate, ratio, or a more complex expression). But some simulation algorithms require the model to take a specific form, called "identified", in which a single untransformed variable appears on the left-hand side:

y_t = f_t ( y_t , y_{t−1} , x_t , θ , u_t )
This means the model builder might have, after estimation, to transform the formulation: this operation is called the
normalization of the model.
β’ The application of some solution algorithms is made easier. In some cases (Gauss-Seidel), this form is actually
requested.
β’ This type of formulation allows a better interpretation of the process determining the equilibrium, provided
each equation can be interpreted as a causal relation. If the equation describes a behavior, the economist
should have placed to the left the element it is supposed to determine, conditional on the elements on the
right. This is what we can (and will) do naturally in our example. For instance, the equation describing the
choice by households of their consumption level will place naturally the variable "consumption" to the left.
The vast majority of equations will take naturally an identified form. Sometimes, a simple transformation will be
necessary, however. Perhaps the most frequent nonlinear operator is the logarithm, associated with the integration of
a formula in elasticities.
dx / x = f(. . .)

represents

Log(x) = ∫ f(. . .)

and one will normalize

Log(x) = g(. . .)

into

x = exp( g(. . .) )
If you use EViews26, the software will do it for you. You can write the equation using the first form, and the software will normalize the equation itself, computing x. This is also true if the left-hand element contains several variables but allows a straightforward normalization; the most frequent cases are the logarithm, the growth rate and the ratio to another variable.
To choose which variable to compute, EViews will take the first variable in the specification of the equation. This simple method will be applied even if that variable has already been identified as computed by a previous equation. For instance, in our model, if we introduce the estimation of imports M, then state:

Q + M = FD + X
26
Or most packages of the same type.
Moreover, when an equation is forecasted individually, one can choose between computing the left-hand term and the element which determines it, for instance M or Δlog(M) for our imports equation.

However, EViews does not solve analytically any equation for the variable. For instance:

M / (Q + M) = f(. . . .)

will have to be rewritten by hand as:

M = (Q + M) · f(. . . .)

In the same way, an implicit equation:

f(y, . . . .) = 0

can be given the identified form:

y = y + f(y, . . . .)

However, the convergence of a model defined in this manner is often difficult to obtain (for instance if "f" is positively linked to y). In that case, one can use (the value for "a" can be negative):

y = y + a · f(y, . . . .)
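A small Python sketch, on an invented scalar equation rather than an actual model, shows why this relaxation parameter matters: with a = 1 the iteration diverges, while a small negative value makes it converge when f is positively linked to y.

def f(y):
    return 3.0 * y - 10.0          # invented implicit equation f(y) = 0, with root y = 10/3

def iterate(a, y=0.0, n=30):
    for _ in range(n):
        y = y + a * f(y)           # identified form  y = y + a * f(y, ...)
    return y

print("a = 1.0  ->", iterate(1.0))    # diverges
print("a = -0.2 ->", iterate(-0.2))   # converges towards 10/3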
Stronger simplifications are sometimes possible and will be approached with the numerical solution process.
Identification is not always economically straightforward: in our example, when balancing demand and supply, we can observe that the three last variables (final demand, exports and imports) are going to be determined by their own equation (the sum of its elements for the first, estimated equations for the others). This means that balancing must be done through GDP, and we must write the equation as:

Q + M = FD + X

or

Q = (FD − M) + X

which makes its theoretical content clearer: production must (and can) satisfy both exports and the non-imported part of domestic demand.
1.2.12 CONCLUSION
It must be clear by now that the formal definition of the whole set of equations represents, together with the estimation of behavioral equations, an iterative and simultaneous process:
β’ Behavioral equations start from an initial theoretical formulation to evolve gradually to their final form by
reconciling this theory with data and estimation results.
β’ Accounting equations have been defined as precisely as possible in the preliminary phase, to establish a
coherent framework, but they often will have to adapt to the evolution of behavioral equations. Let us
suppose for example that the first estimation results suggest excluding from the econometric explanation of
exports their agricultural component, setting it as exogenous: a new equation and variable will appear, and
the equation for total exports will become an identity.
2 CHAPTER 2: MODEL APPLICATIONS
We shall now give a panorama of applications using models. Comments will be centered on the example of economic
models, and more particularly on the macro-economic ones. But most of the observations can be transposed to the
general case.
For each of these applications, technical details shall be left to the "implementation" part (chapter 7). To understand
these practical aspects of the use of models, one must first know about the way they are built, described later in
chapters 4 to 8.
The most natural use of a model seems to be the evaluation of the economic future, whether as its most probable
evolution or as the consequences of some decisions. Assumptions concerning the future will be injected into the
model, and its solution will produce the requested diagnosis. Thus, one will seek to anticipate the evolution of the
main aggregates of the French economy until the year 2020, taking into account assumptions on the evolution of
international economy.
• In a scenario, one is interested in absolute results, associating to a full set of assumptions a future evolution of the economic equilibrium. One might seek to obtain, for instance, the most probable image of the future.
• On the contrary, with a shock, one starts from a base simulation (often called "reference forecast" or "baseline"), or a simulation on the historical period, and measures the sensitivity of the economic equilibrium to a change of assumptions. Two economic paths will then be compared (on the past, one of them can be the historical one).
These shocks can be more or less complex, from the modification of a single assumption to the testing of a new
economic policy27.
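The mechanics of a shock can be illustrated by a small Python sketch, using a deliberately tiny and invented linear model (not the model built in this book): a baseline is solved, one assumption is changed, and the difference between the two equilibria measures the response.

def solve(g, x, c1=0.6, m=0.25):
    # toy model:  FD = c1*Q + g ;  M = m*FD ;  Q = FD + x - M  (solved analytically)
    q = ((1.0 - m) * g + x) / (1.0 - (1.0 - m) * c1)
    fd = c1 * q + g
    return {"Q": q, "FD": fd, "M": m * fd}

baseline = solve(g=100.0, x=50.0)
shocked  = solve(g=101.0, x=50.0)      # +1 on the exogenous demand component g

for var in ("Q", "FD", "M"):
    print(var, "response to the shock:", round(shocked[var] - baseline[var], 3))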
These two techniques, scenarios and shocks, before the production of any operational policy diagnosis, will play an
important role in the model validation process.
Now that we have described the characteristics of models and their basic use, we shall discuss the advantages they
bring (and their failings too).
Relative to the diagnosis provided by a human expert, the advantages common to all models are that they will:
27
However, this new policy should stay within the economic framework of the original model.
β’ Guarantee the accounting coherence of the resulting equilibrium.
β’ Consider a practically unlimited number of interdependent influences.
β’ Provide an explicit formalization of behaviors, allowing an external user to interpret them.
β’ Produce an exact and instantaneous computation of associated formulas.
β’ Adapt immediately the full system to a local change of theoretical formulation.
They will also benefit from:
β’ The progress of economic theory, allowing the formalization of more sophisticated mechanisms, better
adapted to the observed reality.
β’ The progress of econometrics, giving access to the statistical method that will produce the most reliable
formulation associated with a given problem, and to test more complex assumptions.
β’ The improvement of numerical algorithms, both for computation speed, and solving more complex systems.
β’ The simultaneous improvement of computation hardware allowing to process problems of growing size, by
increasingly complex methods.
β’ The progress of modelling science, in producing models better adapted to the original problem, facilitating
the production of assumptions, and reducing the cost of reaching acceptable solutions.
β’ The production of computer software specialized in model building, increasingly efficient, user-friendly, and
connected with other packages.
β’ The improvement of the reliability of data, and the growth of the available sample, regarding both the scope
of series and the number of observations (years and periodicity)28.
β’ The easier communication between modelers, through direct contact and forums, allowing to communicate
ideas, programs and methods, and to get the solution to small and large problems already addressed by
others.
However, the use of models has engendered criticism from the start, often using the term « black box », which describes the difficulty in controlling and understanding a set of mechanisms often individually simple but globally very complex. In recent decades criticism has mounted to the point of calling for a global rejection of traditional ("structural") models. Surprisingly, critics often find their arguments in the above improvements. One can find:
A utilitarian critique: models have proven unable to correctly anticipate the future. If this observation appeared at the beginning of the eighties, it is obviously not because the quality of models had declined. But information on model performance is more accessible (some systematic studies have been produced), and the fluctuations following
28
However, the size of samples does not necessarily grow with time. In a system of national accounts, the base year
has to be changed from time to time, and the old data is not necessarily converted.
the first oil shock have made forecasting more difficult. In periods of sustained and regular growth, extrapolating a
tendency is very easy for experts as well as for models. Paradoxically, the emergence of this criticism has followed,
rather than preceded, the increasingly direct intervention of model builders and their partners in forecasting results.
An econometric critique: modern techniques require a quantity and a quality of observations that available samples do not provide. A gap has opened between the estimation methods judged by econometrics theoreticians as the only acceptable ones, and the methods really applicable to a model29.
A theoretical critique: the development of economic theory often leads to sophisticated formulations that available information has difficulty validating. And in any event many areas present several alternate theories, between which any choice runs the risk of being criticized by a majority of economists. Thus, in the monetary area, going beyond a basic framework leads to relying on information unavailable in practice, or on formulations too complex to be estimated.
A mixed critique: users of models are no longer passive clients. They criticize formulations, as to their estimated specification or their numerical properties. This evolution is paradoxically favored by the improvement of the logical interpretation of economic mechanisms, itself fostered essentially by economic knowledge (even economic magazine articles use implicit macroeconomic relations) and by modelling practice (the population of clients includes more and more former model builders, or at least followers of courses on modelling). One could say that model users ask the tool to go beyond their own spontaneous diagnosis, and they want this additional information to be justified.
It is clear that these criticisms grow in relevance as the goal grows in ambition. Forecasts are more vulnerable than
simple indicative projections, which seek to cover the field of the possible evolutions. As for policy shock studies, they
are not prone to errors on the baseline assumptions, if we discount non-linearities30.
This relevance will also depend on credit granted to results. One can use figures as such, or be content with orders of
magnitude, or even simply seek to better understand global economic mechanisms by locating the most influential
interactions (possibly involving complex causal chains). In our sense, it is in this last aspect that the use of models is
the most fruitful and the least disputable31.
Contrary to previous models, theoretical models may be built for the single purpose of formalizing an economic
theory. It may be sufficient to write their equations, associating to a theoretical behavior a coherent and complete
system. Reproducing the observed reality is not the main goal of these models, and it is not mandatory to estimate
parameters: one can choose an arbitrary value, often dictated by the theory itself. In fact, this estimation will often be
technically impossible, when some of the variables used are not observed statistically (the goals or expectations of
agents for example).
29
Actually, the sample size required by present techniques (50 or better 100 observations) limits the possibility of
estimating equations using deflators or variables at constant prices. Even using quarterly data, separating values into
prices and volumes is quite questionable 15 years from the base period.
30
With a linear model, the consequence of a shock depends only on its size, not on the simulation it starts from.
31
One example is the impact of a decrease in local tariffs. Ex-ante it increases imports (a negative demand shock). Ex-
post it decreases local factor costs (with cheaper investment and cheaper labor, indexed on a lower consumption
price). This leads to more local capacity and competitiveness, both on the local scene (limiting the imports increase),
and the foreign one. In most models, GDP decreases then grows.
The full interpretation of such a shock provides a lot of information, even if one remains at the non-quantitative level.
However, even based on an artificial series and arbitrary parameters, the numerical simulation of these models can be
interesting. Actually, the formulas are often so complex that solving the model numerically will be necessary to
observe its solutions as well as properties (such as the sensitivity of solutions to assumptions and to coefficients).
These models represent an intermediate case. One seeks a realistic representation of the economy, adapted to observed reality, but sufficiently simple to allow the application of complex analysis methods (and the interpretation of their results). In addition to scientific research, this study can be done to measure and analyze the properties of an operational model on a simplified representation (in the eighties Mini-DMS, then Micro-DMS, have been used to characterize the Dynamic Multi Sectorial model of INSEE).
β’ βExternalβ methods will use model simulations to observe its quantitative properties, and infer a descriptive
comment, both statistical and economic.
β’ βInternalβ methods seek to explain properties of the model by its structural characteristics, using
mathematical tools. This does not necessarily call for actual simulations.
Although often of the same type as the ones above, these models try to present economic mechanisms as completely as possible, based on real data, under an interpretable and concise form. If necessary, one will favor the message contained in the presentation over the respect of statistical criteria.

This is the case of the MacSim package, which allows students to interpret international mechanisms and interactions.
3 CHAPTER 3: MODEL TYPES
We shall now try to establish a classification of models, focusing on the link between the modelβs characteristics and
the goal for which it has been built.
The field described by a model is characterized by the variables it computes, but also by assumptions it takes into
account.
β’ A geographical field: national models, multinational models, world models. These last can be built in two
ways: by putting together preexisting national models, with potentially quite different structures, or by
building simultaneously country models of identical structure, possibly with a single team of modelers. We
shall deal with this later.
β’ A theoretical field: the theory used for the formalization of the model may or may not approach specific
economic aspects. A Keynesian model might limit the treatment of monetary aspects. A short-term model
will not formalize demographic evolutions.
β’ A field of units: a model might present only variables at constant prices, or physical quantities like barrels of
oil or number of pigs.
• A field of agents: a model might describe the behavior of a single agent: households, the State, firms.
β’ A field of goods: a model might consider only the production and the consumption of one good, for example
energy. An energy model can use physical units.
There are other types of fields. However, the distinction is not always easy: some models will describe summarily a
global field, except for a certain aspect on which it will concentrate. An energy model, to consider interactions with
the rest of the economy, will have to model it also, but not in the same detail. And it can mix physical units (barrels of
oil or gigawatts) with national accounts elements, with obvious conversion problems.
On the other hand, it will always be possible, and made easier by some modelling packages, to change (actually to
restrict) at the time of simulation the scope of the model. The distinction is then no longer permanent: a multi-
national model can be used to simulate a complete evolution of the world economy, but its user can also restrict
calculations to the evolution of a group of countries or even a single one, the other elements being fixed. One can
simulate a model of the real economy with or without additional monetary features. Or a model using normally
rational expectation elements can drop them to become purely backward looking.
The history of modelling shows that for a long period new models generally have seen their size grow, for the reasons
cited earlier: the progress of model-building techniques, the increased availability of data, the faster computer
computations. Additionally, for any given model, size increases regularly in the course of its existence, as new team
members want to add their contribution.
However, the last decades have seen a trend favoring a return to models of limited size. Productivity improvements,
requested from teams of model builders, are less and less compatible with the management of a large model. Despite
the progress of model-building techniques, the desire to reduce costs and delays conflicts with the size, especially (but
not only) regarding human operations: elaboration of assumptions and interpretation of results.
Also, the use of a very detailed model can make individual estimations and specifications look too expensive. The
attractiveness of a calibrated CGE model will increase.
Finally, the desire to reply to critics comparing models to "black boxes" leads model builders to look for more explicit
and manageable instruments.
However, paradoxically, the need for detailed explanations, the availability of more detailed data, and the increased
power of computers (both in speed and size of manageable problems) has led to the development of more detailed
(often extremely detailed) tools: the Quasi-Accounting versions, considering generally a large number of products
(possibly hundreds), with a limit depending only on the availability of data. The framework is generally a full input-
output table.
Of course, econometrics is no longer applicable, and formulations are most often rather crude, with behaviors established as exogenous ratios. But this also makes specifications clearer and more manageable, and the properties easier to control. Also, the need for large samples is less pressing.
This issue will be treated in a specific part. Among the cases we will present, we will consider a model of more than
15 000 equations, which can be summarized by collapsing the dimensions into a 50 equations presentation.
The degree of aggregation will not be inevitably uniform: an energy model will use a particularly fine detail for energy
products.
In fact the same model can appear under several versions of different size, depending especially on the degree of
aggregation. Each version has then its proper area of utilization: detailed forecasts, quick simulations, mathematical
analysis, and educational uses.
Thus at the end of the 1980s, the 3000-equation DMS model (Dynamic Multi Sectorial) used by INSEE for its medium-term forecasts had two companion versions of reduced size: Mini-DMS (200 equations), used for some operational projections and analyses which did not require detailed products, and Micro-DMS (45 equations), with an essentially educational purpose.
This distinction has lost most of its validity, however, following the reduction of the size of operational models.
3.3 THE HORIZON
If a model is designed for forecasting, its horizon will be defined at the construction of the model. It will be strongly
linked to its general philosophy and to the set of mechanisms it implements. A long-term model will be little
interested in circumstantial phenomena (such as the lags in the adjustment of wages to prices), while a short-term
one will not consider the longest trends (such as the influence of the economic situation on demography).
These differences seem to discard elaborating a model that can be used for both short - and long-term projections.
But we shall see that strong reasons, in particular econometric, have made this option appear as the most natural in
the present context. We will develop them when we address periodicity, in paragraph 3.4.
In any case, one can find a certain asymmetry in the relevance of this observation. While long-term models can neglect intermediate periods if these do not show significant fluctuations, simulating the periods beyond the operational horizon of a short-term model can evidence future problems, already present but not yet visible.
It is clear:
β’ That treating medium or even short-term problems calls for a model with stabilizing properties, which can
only be controlled through long-term simulations. This includes in particular controlling the existence and
speed of numerical convergence and evidencing cyclical properties.
β’ That observing long-term properties has to be complemented by the intermediary profile. Again, a long-term
stabilization can be obtained through a monotonous or cyclical process.
Here, the horizon depends on the type of analysis one wants to produce. Often, to analyze a model built with a given
forecasting horizon, simulation over a longer period must be obtained. Even more than for forecasts, analytic shocks
will show and explain anomalies that were not apparent in the normal projection period, but had already a
significantly harmful influence. We shall stress these issues later.
32
One shall notice that we can use several words to characterize these exercises: forecasts, projections, scenarios, simulations. It all depends on the purpose for which the test was made, and perhaps the trust allowed to the results. We favor the last term, which unfortunately has to be completed into: « simulation over future periods ».
3.4 THE PERIODICITY

The periodicity of a model is linked to the mechanisms it seeks to study and therefore to its horizon.
Short-term models require a short periodicity to consider circumstantial phenomena: delays in the wage indexation
on prices, progressive adjustment of the consumption level to an increase of income.
Long-term models can use a sparser periodicity, less for theoretical reasons (long-term behavior can be described by a
short-periodicity model), than for technical ones: this choice will reduce constraints on the availability of series,
facilitate the production of assumptions, and limit simulation costs.
However, we shall see that the use of "modern" econometric methods calls for a short periodicity, for all kinds of models, as soon as estimations are considered.
This means that the main determinant of model periodicity comes from the data. Countries which produce quarterly
national accounts use quarterly models, which allow them to apply modern techniques with some comfort, and
produce both short and long-term studies. Of course, results can be summarized later in yearly tables.
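On that last point, a small Python sketch (using pandas, on an invented quarterly series) shows one way of summarizing quarterly results into yearly tables; whether one averages or sums the quarters depends on how the flows are measured.

import numpy as np
import pandas as pd

# invented quarterly series (e.g. GDP at constant prices, measured at annual rate)
quarters = pd.period_range("2015Q1", "2020Q4", freq="Q")
gdp = pd.Series(np.linspace(100.0, 130.0, len(quarters)), index=quarters, name="gdp")

yearly = gdp.groupby(gdp.index.year).mean()   # use .sum() if the quarters are not at annual rate
print(yearly.round(2))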
When only yearly accounts are available, the techniques become more simplistic, and true short-term applications are not possible. Unfortunately, this applies most often to countries with a short history of statistics, which makes the problem even harder.
We have essentially concentrated on the macro-economic model case. One can also find:

• Micro-economic models, describing the behavior of individual agents (households, firms). These models will sometimes be more theoretical, calling for optimization computations (such as cost minimization) or for elements of strategy (game theory). They will often be estimated on survey data, with very large samples.
β’ Non-economic models: they can apply to biology, physics, chemistry, astronomy, meteorology, ecology,
process control, and so on.... and can be used to evaluate the consequences of building a dam, controlling a
manufacturing process, looking for the best organization of a project, describing a biological process. These
models will often be conceived not as a formalized equation system, but as the maximization of a criterion
under some constraints, or as a system of propositions connected by logical operators.
4 CHAPTER 4: GENERAL ELEMENTS
This part of the book describes the process of development, use and management of a model, taking special interest
in technical aspects and particularly computer-oriented features. Application to EViews will be presented in detail, but
most of the teachings can be applied to other packages, including those which are not dedicated to econometric
structural modelling.
But let us give first a quick description of the organization of the model building process.
The first step in the building of any model is producing a draft which ensures some compatibility between available
data (wherever it might come from) and the type of model its builder has in mind (goal, field, nature of the variables,
underlying theory).
Knowing the scope of available data, the economist will define a model framework for which values can be attributed to all variables, either by using available elements, by computation, or as a last resort by relying on expert advice (including the modeler himself). This means that a first decision has to be made as to the field described by the model, the variables
used as assumptions, and the variables it shall compute. Moreover, he must divide the equations into identities, which
set indisputable links between variables, and equations describing the behavior of agents, for which the final
formulation will be based on past evolutions of the associated elements. In the course of model building, this status
can change.
The first task will be to gather, by reading from files and transforming the data, the full set of variables needed by the
model, to define the form of the identities, and give a first assessment of the behaviors he intends to describe. He
shall check for which periods the necessary data is known, and that on these periods identities hold true. If some
elements are not available, he will use the best proxies he can get. And if this also fails, he will use his imagination.
He can also make a first economic analysis of the framework implied by model specifications. This is greatly helped by
EViews which can give essential information on the modelβs logic, even in the absence of any data.
4.1.2 ESTIMATION
The second phase will look for a satisfying description of the behavior of agents, by checking economic theory against
available data. The modeler shall define a set of formulations with unknown parameters, compute for each
formulation the values which give the best explanation of past evolutions, and make his selection, using as criteria
both statistical tests and compliance to economic theory. This process can call for the introduction of new variables, or
changes in some definitions, which will mean reformulating some identities.
Of course, both individual and global consistencies must be applied. For instance, using a Cobb-Douglas production
function implies considering the global cost in the equation for the output deflator.
Once the full model is defined, one can try to solve it.
β’ One shall first check for consistency the set of equations, data and parameters, by applying each formula
separately on the sample period. If the estimation residuals have been introduced as additional elements, the
process should give the historical values in all cases.
β’ One shall then simulate dynamically the full model on the same period, setting (temporarily) the residuals to
zero. This will show if considering current and lagged interactions does not amplify too much the estimation
errors, both on the current period and with time. Using an error correction framework should limit the risk of
divergence.
• Finally, the reactions of the equilibrium to a change in assumptions, for instance the exogenous component of demand, will be measured. The results will be compared with the teachings of economic theory, and with what is known of the values given by other models, moderated by the characteristics of the country. However, one should not spend too much time here, as simulations over the future will provide a much better context. (A small sketch of the first two checks is given below.)
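The sketch below (in Python, on an invented two-equation example rather than an actual estimated model) contrasts the first two checks: applying each formula separately with its stored residual reproduces history exactly, while a dynamic simulation with residuals set to zero lets errors accumulate.

import numpy as np

# invented "historical" data and a toy model:  C(t) = a*Y(t-1) + residual ;  Y(t) = C(t) + G(t)
a = 0.8
G = np.array([20.0, 21.0, 22.0, 23.0, 24.0])
Y_hist = np.array([100.0, 102.0, 105.0, 107.0, 110.0])
C_hist = Y_hist - G
res = C_hist[1:] - a * Y_hist[:-1]             # residuals of the behavioral equation

# 1) static check: each equation applied separately, with its residual -> history is recovered
C_static = a * Y_hist[:-1] + res
print("static check ok:", bool(np.allclose(C_static + G[1:], Y_hist[1:])))

# 2) dynamic simulation: residuals set to zero, computed values fed back -> errors accumulate
Y_sim = [Y_hist[0]]
for t in range(1, len(Y_hist)):
    C_t = a * Y_sim[-1]
    Y_sim.append(C_t + G[t])
print("dynamic simulation error by period:", np.round(np.array(Y_sim) - Y_hist, 2))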
Discovering discrepancies can lead to changes in some elements of the model, including the set of its variables. This
means going back to step 1 or 2.
Once the model has passed all tests on the past, further tests will be conducted, under conditions more
representative of its actual use: on the future. For this, values will have to be established for future assumptions.
Again, the sensitivity of the model to shocks will be studied, this time with a longer and smoother base, better
associated with future use. As to the reliability of baseline results, one can rely this time on stochastic simulations.
The results of this step can of course show the necessity to revert to a previous stage, including the introduction of
new data, changing causalities, or re-estimation. To limit the number of backward steps, one should introduce in the
original data set all potential variables, and decide on behavioral equations considering the global properties.
Finally, the model will be considered as fit for economic studies: forecasts and economic policy analysis.
From now on, we shall suppose we are using a dedicated package like EViews. But for people who still model through a spreadsheet, most of our observations will still apply.
β’ Methodical option:
o Specifies completely a coherent model (including accounting equations), precisely separating assumptions
from results.
o Looks for the necessary series.
o Estimates behavioral equations.
o Uses the resulting model.
Applying such a framework is obviously illusory, as many backtrackings will be necessary in practice:
o Some series will show up as unavailable, and it will be necessary to replace them or to eliminate them from
formulations. Thus, in the absence of series for interests paid by firms one will have to be content with profits
before interests.
o Some estimations will give unsatisfactory results: it will be necessary to change formulations, to use
additional or alternate series. Thus, a formulation in levels might have to be replaced by a formulation in
logarithms (constant elasticities), or in growth rates. Or one will be led to explain the average monthly wage
instead of the hourly wage, and to introduce in this last explanation the evolution of the minimal wage. For
an oil producing country, it will appear necessary to identify oil (and non-oil products) in both production and
exports.33
o New ideas will appear during estimation. For example, a recent article on the role of Foreign Direct
Investment might lead to test an original formulation.
o Formal errors are going to be identified. Thus, an element (a type of pension) might have been forgotten
from householdsβ income.
o Some variables defined as assumptions are going to appear sufficiently influenced by results to see their
status modified.
o Some causalities will be questioned when observing numerical properties.
o Simultaneities will have to be replaced by lagged influences.
o The size (or even the sign) of the response to changes in the assumptions will be inconsistent with theory.
o The model will not converge, and the specifications will be considered as the cause.
β’ Improvisation
o establish general options for the model structure and theoretical framework,
o produce some formulations independent from each other,
o estimate them by accessing to separate series,
o And gradually connect selected elements by completing the model with linking identities and the data set
with the necessary exogenous variables.
This framework will be even less effective, if only because the number of single operations on equations and series
will present a prohibitive cost. Furthermore, enforcing the accounting and theoretical coherence of the model could
prove difficult, and the modelling process might never converge at all to a satisfying version.
The most efficient organization lies probably in between. One will rather:

o Define as precisely as possible the field and the classification of the model.
o Define its general theoretical options and its goal.
o Obtain, create and store the total set of presumably useful series, with no limitations.
o Establish the domains to estimate, specify the associated variables and set the formal connections, especially accounting relations.
o Undertake estimations.
o And go through changes (hopefully limited) until an acceptable form is obtained.
It is clear that this type of organization is all the easier to implement if:
β’ The size of the model is small: it is possible to memorize the total set of variable names for a thirty equations
model, but for a large model a formal documentation will be necessary, produced from the start and updated
33
Actually, this should have been evident from the start.
regularly. This framework should be discussed in detail by the modelling team and as many outsiders as
possible.
β’ The number of concerned persons is small (the distinction comes essentially between one and several): for a
team project, the role of each participant and his area of responsibility have to be clearly defined. Especially,
physical changes (on both data and model specifications) should be the responsibility of one individual, who
will centralize requests and apply them. And modifications must be clearly stated and documented, through a
formal process.
Individual modifications of the model can be allowed, however, provided a base version is preserved. Thus several
members of a team of model builders can test, one a new production function, another an extended description of
the financial sector. But even in this case updates will often interfere, at the time modifications generated in separate
test versions are applied to the base one. For instance, a new definition of the costs of wages and investment, which
define the optimal shares of labor and capital in the production function, will influence the target in the price
equation.
5 CHAPTER 5: PREPARING THE MODEL
We shall now start with the first (and probably most important task): preparing the production of the model.
One might be tempted to start actual model production as soon as possible. But it is extremely important to spend
enough time at the start evaluating the options and choosing a strategy. Realizing much later that he has chosen the
wrong options, the modeler is faced by two bad solutions: continuing a process leading to a subpar model, or
backtracking to the point where the choice was made.
These choices can concern:

• The organization of tasks, like producing at first single country models, for a world modelling project.
β’ Economic issues, like choosing the complexity of the production function.
β’ Accounting issues, like deciding the decomposition of products, or the distinction into agents.
β’ Technical ones, like the number of letters identifying the country in a world model series name.
At the start of the model building process, the modeler (or the team) has at least some elements available:
β’ The data can be directly available, as a computer file, but not necessarily in the format needed by the
modelling package. Many databases (like the World Bankβs World Development Indicators) are stored on the
producerβs website in Excel or CSV format. In more and more cases, access can be provided from inside
EViews, but this is not necessarily the best option.
β’ Equations may have already been established, either as formulas or even estimated items, if the modeling is
the continuation of an econometric study, produced by the modeler or another economist.
In any case, the first stage in the process should lead to:
β’ A fully defined set of equations, except for the actual estimated formulas.
β’ The corresponding set of potentially relevant data.
Obviously, these two tasks are linked, as equations are established on the basis of available data, and the data is
produced to fit the model equations. This means that they are normally processed in parallel. However, it is quite
possible:
β’ To produce most of the data before the equations are defined. Some concepts (the supply - demand
equilibrium at constant and current prices, employment, the interest rates) will certainly appear in the
model. But some model-specific variables might have to wait.
• To produce the model specification before any data is available. Of course, writing an identity, or stating the equation to be estimated, does not require data. It is only the application (checking that the identity is consistent, or estimating the equation) which does. But one must be reasonably sure that the data will be available, or that there will be a reasonable technique to estimate it.
One can even produce a first version of the program transforming into model concepts the original data, once these
concepts are completely defined, but before any data is technically available (one just needs their definition).
One can compare the situation with the building of a house: one can draw the plans before the equipment is
purchased, but its eventual availability (at the right time) must be certain. And the goods can be obtained before the
plans are completely drawn (but the chance of having to use them must be reasonably high)34.
One can even generate the data using a random process, and apply an estimation program, without considering the results but checking for the presence of technical mistakes.
These options are not optimal in the general case, but they can help to gain time. Most modelling projects have a
deadline, and once the work force is available, the tasks should be processed as soon as possible, if one wants to have
the best chance of meeting it.
One can question the feasibility of producing a full set of equations before any estimation. What we propose is to replace the future formulations by a "declaration of intent" which states only the dependent variable on the left, and the explanatory elements on the right. For each equation, the format should be as close as possible to:

variable = f * (sum of the explanatory elements)

For instance, for exports X depending on world demand WD, the rate of use of capacities UR and price competitiveness COMPX, one will use:
scalar f
X = f*(WD+UR+COMPX)
β’ The modeler will be able to check by sight the logic of his model
β’ The text can be provided to other economists for advice
β’ The full list of requested variables can be established, allowing to produce a complete transfer program
• Processing the equations through EViews will give interesting advice on several elements. Double-clicking on the "model" item, one will get:

o The equations (from "Equations" or "Print View").
34
As there is a cost to the goods. For free or quasi-free data, the chance can be lowered.
▪ The grammatical acceptability of equations will be checked: for instance, whether the number of left and right parentheses is indeed the same. Erroneous equations will appear in red in "Equations".

▪ Also, the fact that each endogenous variable is computed only once. A second occurrence will also appear in red.
o The variables.
Note: if a variable is currently overridden, its name will appear in red.
▪ The most important information will come from the list of exogenous variables: one might find elements which should have been determined by the model, according to its logic. In general, this will mean one has forgotten to state the associated equation. Also, some elements might appear which should not belong to the model. Normally this corresponds to typing errors.
o The block structure. For our example, EViews displays:

Number of equations: 14
Number of independent blocks: 3
Number of simultaneous blocks: 1
Number of recursive blocks: 2

prêt(3) x(13)
k(14)
It decomposes the set of equations into a sequence of blocks, either recursive (each variable depends only on
preceding elements) or simultaneous (variables are used before they are computed). If one is going to succeed in
estimating equations which follow the same logic as intended in the preliminary version, the block structure described
at this stage will be already fully representative of the future one. One can detect:
β’ Abnormal simultaneities: a causal loop might appear, which is not supported by economic theory behind the
model.
β’ Abnormal recursive links: a block of equations containing a theoretical loop (the wage price loop, the
Keynesian cross) can appear as recursive. This can come from a forgotten equation, a typing errorβ¦
Practical operational examples will be given later.
In any case, observing the causal structure of the model will give some preliminary information about its general logic,
and its potential properties.
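For readers curious about what such a decomposition involves, here is a minimal Python sketch; it assumes the networkx package is available and uses an invented four-variable dependency list. Simultaneous blocks correspond to the strongly connected components of the graph linking each endogenous variable to the endogenous variables used to compute it.

import networkx as nx

# invented dependencies: each endogenous variable -> the endogenous variables on its right-hand side
rhs = {"fd": ["q"], "m": ["fd"], "q": ["fd", "m"], "k": ["q"]}

G = nx.DiGraph()
for var, deps in rhs.items():
    G.add_node(var)
    for d in deps:
        G.add_edge(d, var)        # d must be known to compute var

for block in nx.strongly_connected_components(G):
    kind = "simultaneous" if len(block) > 1 else "recursive"
    print(kind, "block:", sorted(block))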
To check on the model specifications, you can use "View > Print View", which displays the corresponding window. One will note that it is possible to decide on the number of significant digits, which produces clearer displays in a document (the default is 8).
5.2 PREPARING THE MODEL: SPECIFIC DATA ISSUES
In the case of a national macroeconomic model, the needed data can be:
β’ National Accounts elements: operations on goods and services, transfers between agents, measured in value,
at constant prices, or at the prices of the previous year. The producer will generally be the national statistical
office. For France it would be INSEE (the National Institute for Statistics and Economic Studies).
β’ The corresponding deflators.
• Their foreign equivalents, using the accounting system and the corresponding base year of the particular country, or rather a synthesis produced by an international organization (OECD, International Monetary Fund, Eurostat...).
β’ Variables in a greater detail, possibly measured in physical quantities (oil barrels, tons of rice). They can come
from a public or private agency, or from the producers themselves. In France energy elements would come
from the Observatory of Energy.
• Monetary and financial data, coming mostly from the national central bank (in France the Bank of France or the European Central Bank...), or from an international institution (OECD, World Bank, International Monetary Fund, EBRD, ADB…).
• Data on employment or unemployment. One can get detailed labor statistics (by age, qualification, sex...) from the US Bureau of Labor Statistics or the French "Ministère du Travail".
• Demographic data: population, working-age population, age classes (INSEE in France).
β’ Survey data: growth and investment prospects according to firm managers, productive capacity, living
conditions of households (coming from public or private institutes).
β’ Qualitative elements: the fact of belonging to a specific set, meeting a specific constraint.
β’ Micro economic models will generally use survey data (households, firms) with sometimes a time dimension
(panels, cohorts) and possibly include some of the above elements as global indicators.
As the area of application of models is unlimited, the field of potentially relevant data is also. A model on the economy
of transportation would include technical data on the railway system and on distances between cities, an agricultural
model meteorological data and information on varieties of species.
The medium through which data can be obtained will also play an important role. Several options are available for accessing the necessary data and transferring it to the model.
Data can be obtained from a physical support, either commercially produced or created for the purpose. This can be a CD or DVD-ROM, or another rewritable medium such as a USB key or a memory card. For instance, INSEE provides CD-ROMs containing the full National Accounts.
One can share files using Google Drive, Dropbox, or other means.
The advantage is that participants in a project can share elements in real time. Of course, one must be careful with their status, between read-only and read+write; this requires some organization between team members.
In any case, one can transfer the shared file to his own computer, allowing any changes.
Files can be downloaded from a website, commercial or not.
An extensive survey of the data available online for free, compiled by John Sloman at the Economics Network, will be found at the address:
https://www.economicsnetwork.ac.uk/data_sets
In our experience of building single or multi-country macro econometric models, we are mainly using:
OECD Economies
Austria, Belgium, Canada, Chile, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Korea, Latvia, Lithuania, Luxembourg, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom, United States, Euro area (17 countries), OECD Total.
World
Non-OECD Economies
Argentina, Brazil, Bulgaria, China (People's Republic of), Colombia, Costa Rica, India, Indonesia, Romania, Russia, South
Africa
Each set contains 122 quarterly series and 221 yearly ones. They begin in 1962 and, at present (April 2020), end in 2017.
The data is available in CSV format, easily converted to Excel and then transferred to EViews.
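As an illustration (the file name is purely ours), recent EViews versions can usually also open such a delimited text file directly, with a statement along the lines of:
wfopen oecd_eo.csv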
• The World Bank covers a much larger set of countries (267 including groups), and series (1429), but:
o The periodicity is annual, starting in 1960 at the earliest.
o Some important elements are lacking, in particular employment and capital (available in principle in the
OECD set).
The reason for the large number is that the fields covered are much wider, with a focus on sociological issues,
including for instance (potentially):
o People using safely managed drinking water services, rural (% of rural population)
o Women who believe a husband is justified in beating his wife when she burns the food (%)
https://datacatalog.worldbank.org/dataset/world-development-indicators
Clicking on "Download" creates a Zip file containing a complete Excel sheet (110 Mb).
Creating an EViews file containing pages associated with countries is rather straightforward, but involves a little
programming (50 lines). The text is available from the author.
Of course, the number of actual values can vary and can be zero.
http://data.imf.org/
Very interesting for financial data, of course, such as the countriesβ Balance of Payments. But most of this information
is duplicated on the World Bank site (in annual terms, however).
https://ilostat.ilo.org/data/
for detailed labor series, and for the employment series missing from the World Bank set.
• And for French data, one can access the INSEE site www.insee.fr.
If documentation is attached to series, it can be imported along with the values and the series names.
Now you can access the World Bank data from inside a session, through the menu options.
First use:
EViews displays this menu in which you select: World Bank Database.
5.2.3.7 Accessing INSEE series
For French users (or those interested in the French economy), the official INSEE series can be downloaded.
The information can be obtained at:
https://www.insee.fr/en/information/2868055
In less and less frequent cases, some data will not be available in electronic form: series will be found in printed or faxed documents, or obtained directly from other experts, or fixed by the user (who then plays the role of expert).
This data will generally have to be entered by hand, although direct interpretation by the computer through optical character recognition (OCR) is quite operational (but this technique calls for documents of good quality).
In this case it is essential not to enter figures directly into the model file, but to create a preliminary file (such as an Excel sheet or even an ASCII file) from which the information will be read. This separates the modelling process from the production of "official" information.
It is now quite easy to share files through the Internet.
Although this feature can look unrelated to modelling, it can be quite useful for researchers communicating on a common project. In particular the files can be shared on a Google Drive.
For instance, for a Google Drive, you can work through a Cloud directory, using:
5.2.3.11 Change of format
As indicated above, the original data format is generally different from the one used by the model-building software.
In the worst cases, transfer from one software program to another will call for the creation of an intermediate file in a
given format, which the model-building software can interpret. The Excel format is the most natural intermediary, as it
is read and produced by all packages. In that case, it is not necessary to own a copy of the package to use its format.
In the very worst cases, it is always possible to ask the first program to produce a character file (in the ASCII standard)
which can, with minimal editing, be interpreted by the second program as the sequence of statements allowing the
creation of the transferred series, including data and definitions35.
However, the situation has improved in the last years, as more and more packages provide direct access to the following formats:
Access
Aremos-TSD
Binary
dBase
Excel (through 2003)
Excel 2007 (xml)
EViews Workfile
Gauss Dataset
GiveWin/PcGive
HTML
Lotus 1-2-3
ODBC Dsn File
ODBC Query File
35
For instance, the sequence:
ODBC Data Source
MicroTSP Workfile
MicroTSP Mac Workfile
RATS 4.x
RATS Portable / TROLL
SAS Program
SAS Transport
SPSS
SPSS Portable
Stata
Text / ASCII
TSP Portable
Of course, one must also consider the relationship between the data producing and modelling institutions. The most
technically complex transfers do not necessarily occur between separate institutions. A commercial contract might
give the modelling institution direct access (through one of the above means) to information managed by a data
producing firm, under the same software format, while a large institution might still use CD-ROMs as a medium
between separate units.
However, one must also consider the cost of establishing contracts, including perhaps some bartering between data
producing and study producing institutions.
As a general principle, one should favor using a single source. But this is not always possible. In that case, one should define a primary source, and take only the additional series from the alternate ones; the main problems will then come from inconsistencies between the sources.
In addition, for operational models designed to produce official studies and in particular forecasts, it is essential that
the results concur with the official local statistics. As forecasts are presented mostly as growth rates (GDP and inflation
for example), but also provide the last statistical (official) level, the first value in the forecast must be consistent with
both. If the model is built on an outside source, the forecast must be corrected accordingly. This issue will be
developed when we present the forecasting task.
For instance, let us suppose that the limits on the availability of local statistics force the modeler to use an external source (like the WDI from the World Bank) to produce a full model, while the local statistical office provides basic elements like GDP. If the model forecast starts in 2020, its first values must then be made consistent with the latest official figures.
Let us now define the best organization for transferring data from the original source to the software (we shall use
EViews as an example).
• Insert if necessary a line of series names above the first period data (or a column to the left of the first column).
It does not matter if the matrix does not start in cell B2. Just insert as asked.
In recent versions, the import statement memorizes the reference to the original (Excel) file. EViews will detect if a
change is made and propose updating the page in the workfile accordingly.
Very often the nature of available series is not really adapted to the needs of the model. A preliminary processing is
then necessary. This can apply to several features.
Most of the time the series the model builder will access have the right periodicity. Individual exceptions can occur.
New series will have to be computed (inside the modelling package).
36
For instance, quarterly data can appear in yearly lines of four columns.
5.2.5.1.1 Aggregation
The easiest case happens if the available periodicity is too short. The nature of the variable will lead naturally to a
method of aggregation, giving the exact value of the series:
If we call X(t) the aggregated variable in period t, and x(t,i) its value for sub-period i of period t, we can consider the following techniques:
• The sum over the n sub-periods (the natural choice for flows):
X(t) = x(t,1) + x(t,2) + ... + x(t,n)
• The average over the n sub-periods (the natural choice for rates and deflators):
X(t) = (1/n) . (x(t,1) + x(t,2) + ... + x(t,n))
• First or last value, for a level at a given date (for example the capital on the first day of a year will come from the first day of the first quarter). This will apply to stock variables.
X(t) = x(t,1) or X(t) = x(t,n)
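In EViews these conversions are performed automatically when a series is copied to a page of lower frequency, the method being set by the "c=" option of the copy statement. A minimal sketch, assuming the workfile contains a yearly page called "year" and that the commands are issued from the quarterly page (the series names are purely illustrative):
copy(c=s) year\x
copy(c=a) year\p
copy(c=l) year\k
which would respectively sum a flow x over the four quarters, average a deflator or rate p, and keep the last quarterly value of a stock k.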
5.2.5.1.2 Disaggregation
When moving to a shorter periodicity, EViews provides a large list of options, depending on the nature (flow, level or
stock) of the variable.
The following table is copied from the EViews Help and applies to the "c=" modifier in the copy statement.
For instance:
copy(c=q) quart\x
can copy the yearly series X into the quarterly page quart using a quadratic smoothing, in such a way that the average
of the quarterly values matches the yearly one.
arg Low to high conversion methods: "r" or "repeata" (constant match average), "d" or "repeats" (constant match sum), "q" or "quada" (quadratic match average), "t" or "quads" (quadratic match sum), "linearf" (linear match first), "i" or "linearl" (linear match last), "cubicf" (cubic match first), "c" or "cubicl" (cubic match last), "pointf" (point first), "pointl" (point last), "dentonf" (Denton first), "dentonl" (Denton last), "dentona" (Denton average), "dentons" (Denton sum), "chowlinf" (Chow-Lin first), "chowlinl" (Chow-Lin last), "chowlina" (Chow-Lin average), "chowlins" (Chow-Lin sum), "litmanf" (Litterman first), "litmanl" (Litterman last), "litmana" (Litterman average), "litmans" (Litterman sum).
rho=arg Autocorrelation coefficient (for Chow-Lin and Litterman conversions). Must be between 0 and 1,
inclusive.
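For instance, assuming again a yearly source page and a quarterly page named quart, a Chow-Lin interpolation matching the yearly average, with an autocorrelation coefficient of 0.7 (a purely illustrative value), could be requested from the yearly page with:
copy(c=chowlina,rho=0.7) quart\x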
5.2.5.1.3 Smoothing
Smoothing represents a particular case: preserving the same periodicity as the original series, but with the constraint
of a regular evolution, for example a constant growth rate. Instead of n free values, the choice is reduced to the value
of one (or maybe two) parameters.
EViews provides a large set of methods, some very sophisticated, the most popular being the Hodrick-Prescott and
Holt-Winters methods. The methodology and syntax are explained in detail in the User's Manual.
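For instance, a Hodrick-Prescott filter with the usual quarterly smoothing parameter can be applied to a series Q through (the name of the output series is ours):
Q.hpf(lambda=1600) Q_HP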
As we explained before, one method for dealing with variables presenting a seasonality is to eliminate it, and work
with seasonally adjusted series.
Several algorithms can be considered, the best known being probably Census X-13-ARIMA and TRAMO-SEATS, both
available in EViews.
Obviously, one should not mix original and adjusted series in the same set of model variables.
We have already considered this problem when we addressed the fields of models.
Changing categories will usually correspond to aggregation. In the case of economic models, this will apply essentially
to:
• Economic agents: one might separate more or less precisely households' categories (following their income, their occupation, their size...), types of firms (according to their size, the nature of their production...), Government institutions (central and local, social security, agencies...).
• Products (production can be described in more or less detail).
• Operations (one can separate social benefits by risk, by controlling agency, or consider only the global value).
• Geographical units (a world model can aggregate countries into zones).
5.2.6 UPDATES
Once adapted to the needs of the model builder, series will often have to be modified, for several reasons:
• Correcting a formal error, made by the model builder or the producer of series: typing errors, or errors of concept.
• Lengthening the available sample: new observations will keep showing up for the most recent periods.
• Improving information: for the last known years, series in the French National Accounts appear in succession as provisional, semi-final and final.
• Changing the definition of some variables. For instance, the work of private doctors in State hospitals can move from the market to the non-market sector or vice-versa.
One can also add a completely new series to the data set.
This multiplicity of possible changes prevents the global set of series used by the model from remaining the same even for a short period. Constantly adapting the model specifications (in particular the estimated equations) to this evolution would ask a lot from the model builder, to the detriment of more productive tasks. This means one should limit the frequency of reconstitutions of the operational data set (for example once or twice per year for an annual model, or every quarter for a quarterly one), with few exceptions: correcting serious mistakes or introducing really important information.
Without doubt, the best solution is actually to manage two sets of data, one updated frequently enough with the last
values, the other built at longer intervals (the periodicity of the model for example). This solution allows studying in
advance, by estimations based on the first set, the consequences of the integration of new values on the
specifications and properties of the next model version.
5.2.7 SUPPRESSIONS
It is beneficial to delete from the bank those series which have become useless.
For EViews, this presents an additional interest: the elements in the workfile will be displayed in a single window, and it is essential for this window to concentrate as many interesting elements as possible.
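As a minimal illustration, assuming the series to be discarded were given a common prefix such as tmp_ (the prefix is ours), the whole set can be removed in a single statement:
delete(noerr) tmp_*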
Similarly, investment in the documentation of series produces quick returns. It can concern:
• The definition, possibly on two levels: a short one to display titles in tables or graphs, and a long one to fully describe the concept used.
• The source: original file (and sheet), producing institution, and maybe how to contact the producer.
• The units in which the series is measured.
• Additional remarks, such as the quality and status (final, provisional, estimated) of each observation.
• The date of production and last update (hours and even minutes can also be useful to determine exactly which set of values an application has used). This information is often recorded automatically by the software.
• If pertinent, the formula used to compute it.
Example: Wage rate = Wages paid / (employment x Number of weeks of work x weekly work duration).
EViews allows specifying the first four types, using the label command, and produces the last two automatically.
For example, a series called GDP can be documented through a sequence such as (the definition, units and remarks shown are illustrative):
GDP.label(c)
GDP.label(d) Gross Domestic Product at constant prices
GDP.label(u) Millions of Euros
GDP.label(s) from the Excel file accounts.xls produced by the Statistical Office
GDP.label(r) the last two observations are provisional
which clears the contents, gives the definition, describes the units and the source, and adds remarks.
• In addition, from version 8, EViews allows introducing one's own labels, for instance the country for a multinational model, the agent for an accounting one, or the fact that a series belongs to a particular model.
HI.label(agent) Households
MARG.label(agent) Firms
• If the workfile window screen is in "Display+" mode, you can sort the elements according to their characteristics. In addition to the name, the type and the time of last modification (or creation), you have access to the description.
Moreover, if you right-click on one of the column headings and choose "Edit Columns", you can display additional columns for any of the label types, including the ones you have created.
This can prove quite useful, as it allows you to filter and sort on any criterion, provided you have introduced it as a
label.
Once the display is produced, it can be transferred to a table, which can be edited (lines, fonts…) and used for presentations.
For instance, one can produce a table for a model, with columns for type, agent, units, source, identity / behavior…
This table can be sorted using any of the criteria.
These new functions allow table production to be integrated in the modelling process, a very powerful information
tool for both model development and documentation.
U_MARG.label(d) Margins
F_HDI.label(agent) Households
U_MARG.label(agent) Firms
37
You can also use the "source"
One of the main interests of this feature is to create a table (using "freeze"). This table can then be sorted according to any of the criteria.
These definitions follow the series as they are moved through the workfile, or even to an external file.
In the general case, the model builder will be confronted with a large set of series of more or less various origins.
Optimal management strategy might appear to vary with each case, but in fact it is unique in its main feature: one
must produce a file, in the standard of the model building software, and containing the series having a chance to be
useful for the model.
This is true even if the global set of necessary series is produced and managed on the same computer or computer
network, using the same software (the task of transfer will be simply made easier): it is essential that the model
builder has control over the series he uses, and especially that he manages changes (in particular updates of series in
current use). When interpreting a change in model properties (simulations, estimations), one must be able to rule out a change in the data as the source, unless this change has been introduced knowingly by the model builder himself38.
Such an organization also makes the management of series easier. In particular, limiting the amount of series in the
bank, apart from the fact that it will save computer time and space, will make the set easier to handle intellectually.
Concerning the scope of the series, two extreme options can however be considered:
• Transferring into the model bank the whole set of series that have a chance (even if a small one) of becoming useful at some time to the development of the model39.
• Transferring the minimum, then adding to the set according to needs.
Even if a median solution can be considered, the choice leans strongly in favor of the first option. It might be more expensive initially, in human time and in size of files, but it will generally prove a good investment, as it often avoids a costly number of limited transfers, and gives some stability to the bank as well as to its management procedures.
For models managed by institutions or research groups, the most frequent organization is a team working on computers connected through the Internet, where storage and synchronization services like Google Drive allow sharing files inside a project. A rigorous work organization is needed to manage the elements of the project,
between work in progress for which current information must be provided, and documented final versions, which can
38
This remark is a particular application of the general principle "let us avoid potential problems which can prove expensive in thinking time".
39
Even if they are not considered for actual model variables. For instance, one can be interested in comparing the
capital-output ratio of the modelled country with those of other countries.
only be unfrozen by a controlling manager. Not meeting these principles can lead very quickly to a confusing and
unmanageable situation.
The final version can be made available online to followers, along with the associated documentation and examples. If
the follower has access to the relevant modeling software, direct access to the files can be provided.
In the case of an operational project (like allowing Government economists to produce forecasts and studies) access
can be provided through a front end, which does not require any knowledge of the model management software. This
is the case for EViews.
As to the original data, it can come from distant sources like the website of the World Bank, or of the statistical office of the model's country. One might in some cases access directly the data sets of the provider from inside a model-building session (this is the case in EViews for the World Bank's WDI). The producers of modelling packages are giving
a high priority to this type of option.
One must however pay attention to format incompatibilities, especially if the operating system is different (Windows
and its versions, Linux, UNIX, Macintosh...)40.
In this chapter, we describe the use of data provided by international organizations and institutes.
We will not try to be comprehensive: this is a formidable task, certainly more time consuming than the production of
the present book.
This is done much better by various institutes, which the reader can find easily with a simple Google search. One very
good instance is:
https://www.economicsnetwork.ac.uk/data_sets
https://data.un.org/
where you can get access to all the main sites, and also some specific domains, like agriculture and rural development.
Our purpose will rather be the following: to help the producer of a new model to complete the data base he needs for
this task.
So we will rather focus on the technical process, limited to the most promising options in our opinion.
• The model applies to a single country, which represents the interest of the builder.
40
Most modelling packages actually work under Windows, except freeware like R.
This is particularly relevant if he belongs to an official organization, and is responsible for providing studies (maybe
forecasts) applied to the countryβs economy. In this case, he will have access to the official data for this country, and
his model must conform to it, if only to provide results in the official format for data and concepts.
But this is also true if the builder is a local independent (maybe a PhD student). He will be more familiar with the
context and have more direct access to local resources.
However, he will need some foreign information, if only to produce some assumptions on foreign demand and prices.
In this case, any source of information is adequate. Detail will only be useful for building detailed scenarios on the
evolution of the world economy, and its consequences on local growth. For instance, a Vietnamese modeler might
require a description of external trade identifying Chinese growth, to establish the model assumptions.
But the classifications, base years and accounting systems can remain different.
• The model applies to a group (like the European Union) or the world (a set of groupings covering the whole world).
In this case compatibility between models requires access to a single source. The choice must be made at the start,
based on a detailed study of the advantages of all solutions available.
We will rather select the most promising ones and focus on the technical access to the main sources, with practical examples. A very comprehensive list (giving access to the various sites) can be found at the Economics Network address given above.
We will focus on the World Bank "World Development Indicators" under EViews.
One can wonder why we did not use a single source, which would have simplified the process, in particular the
programming.
The reasons are practical, and the choice has been obvious:
The OECD data uses a quarterly periodicity, essential for describing the dynamics of the MacSim developed economies and for applying sophisticated econometric techniques such as cointegration.
The countries in the ECOWAS model are not considered developed enough to be described in the OECD data set.
As to the additional sources:
The ILO data set is extremely detailed, but limited to the field of labor: employment, unemployment, revenue and
costs. However, it fills gaps on wages, a problem for the WDI and less so for the OECD data.
The IMF data set is logically more detailed on financial series, although many of them can be found in the WDI, with
an annual periodicity, however.
At this time (22nd June 2020), the World Bank makes the WDI data set available on the site:
https://datatopics.worldbank.org/world-development-indicators/
The series are contained in the page "Data", unfortunately as a single sheet, with all the series (1442 of them) for all the countries (266 of them, including a number of groupings) and the periods 1960 to 2020 (in principle).
If the original series codes are used as such, EViews will (conveniently) replace the dots by underscores ("_").
The lists of the countries and series are given in annex, as separate Excel tables.
• Locate the country series in the page "Data" of the WDI data set.
• Copy the set into a separate Excel file.
• Create a one-page EViews workfile.
• Import the data into the page by menu or program.
In addition, you can attach the definitions to the series, using the elements of column C of the same page. One can use
the first 1431 cells, which apply to all countries. This calls for a little editing.
We will now present our method to produce a set of EViews workfiles, in which each page contains the whole set of 1431 WDI series associated with a single country or group. Each page will:
• Use as its name the three-letter acronym for the country or group, used by the World Bank.
• Contain all the WDI series for that country or region.
• Give access to the definition of the variables (both short and long ones).
This is what we have done, and what the provided programs will do. As we have considered that the size of a complete file was too big, we divided the countries into 8 regions, following the World Bank's own partition (column H of sheet Countries).
For instance, region 6 (SAS) corresponds to South Asia.
In addition, the region groupings (like the page for South Asia as a whole) will be found in the corresponding regional
data set.
The list of countries and sub-groups will be found in the related annex.
Although this is not really needed (after all, the files are there), we shall briefly describe the method.
• We create a file called indic.xls with 263 lines (one for each country or group) and two columns: the acronym and the number. We read it as a matrix.
• We modify the original World Bank WDI Excel file by separating the "Data" sheet into 6 parts, to meet a constraint on the number of lines. At this moment, only 65536 (= 2^16) lines can be read, which means 45 countries. Our pages will contain only 40.
• We choose a region.
• We create a workfile for the region.
• We run an EViews program which checks for the presence of the region number, in each of the 6 pages in sequence.
• If the acronym meets the number in indic:
o We create a page with the acronym name.
o We read the following 1343 series into the page.
β’ We repeat the process until the end of the sheet is met.
This sequence is available to any user, after editing the program for any changes in the number of countries and
series.
The four elements appear when a single series is displayed (using the "sheet" option).
The "Display+" option presents one series per line, including the short description. The other elements can be displayed too, by right-clicking on the top bar, selecting "Edit columns" and clicking in the appropriate boxes (the "Type" and "Last Update" columns can be dropped at the same time).
5.3.4.3 The problems
Although the World Bank provides a lot of information, sometimes in great detail, some very important elements are not present, although they are clearly required if you want to build a general econometric model.
• Employment and wages.
These series are clearly required, as they enter the wage-price loop, the production function and the households and firms accounts.
• Capital.
This is required too, but not readily available in most data banks. There are ways to compute it, depending on the related information available.
• Intermediate consumption.
This is needed to compute total demand (which defines imports) and the production price (which defines the trade prices and competitiveness).
• Housing investment.
• Social contributions.
This affects the revenue of all agents, and the cost of labor (thus the value-added deflator and the capital-labor ratio in case of substitution).
5.3.4.4 The program
cd "d:\eviews\__world_bank_2020"
' This workfile contains the single page "countries" with artificial series for all the acronyms (each with the "NA" value);
' This can be done easily using the following list
close wb_all
close wb_eap_2020
open wb_all
save wb_eap_2020
' We include a file with a subprogram for creating the series characteristic (short definition, long definition, topic,
source)
include def_2020
pageselect countries
delete(noerr) *
group g_countries
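' Loop over the full list of World Bank country and group acronyms,
' creating a placeholder series (and a group entry) for each one not already present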
for %1 ABW AFG AGO ALB ANO ARB ARE ARG ARM ASM ATG AUS AUT AZE BDI BEL BEN BFA BGD BGR BHR
BHS BIH BLR BLZ BMU BOL BRA BRB BRN BTN BWA CAF CAN CEB CHE CHI CHL CHN CIV CMR COD COG COL
COM CPV CRI CSS CUB CUW CYM CYP CZE DEU DJI DMA DNK DOM DZA EAP EAR EAS ECA ECS ECU EGY
EMU ERI ESP EST ETH EUU FCS FIN FJI FRA FRO FSM GAB GBR GEO GHA GIB GIN GMB GNB GNQ GRC GRD
GRL GTM GUM GUY HIC HKG HND HPC HRV HTI HUN IBD IBT IDA IDB IDN IDX IMN IND IRL IRN IRQ ISL ISR ITA
JAM JOR JPN KAZ KEN KGZ KHM KIR KNA KOR KWT LAC LAO LBN LBR LBY LCA LCN LDC LIC LIE LKA LMC LMY
LSO LTE LTU LUX LVA MAC MAF MAR MCO MDA MDG MDV MEA MEX MHL MIC MKD MLI MLT MMR MNA MNE
MNG MNP MOZ MRT MUS MWI MYS NAC NAM NCL NER NGA NIC NLD NOR NPL NRU NZL OED OMN OSS PAK
PAN PER PHL PLW PNG POL PRE PRI PRK PRT PRY PSE PSS PST PYF QAT ROU RUS RWA SAS SAU SDN SEN
SGP SLB SLE SLV SMR SOM SRB SSA SSD SSF SST STP SUR SVK SVN SWE SWZ SXM SYC SYR TCA TCD TEA
TEC TGO THA TJK TKM TLA TLS TMN TON TSA TSS TTO TUN TUR TUV TZA UGA UKR UMC URY USA UZB VCT
VEN VGB VIR VNM VUT WLD WSM XKX YEM ZAF ZMB ZWE
if @isobject(%1)=0 then
genr {%1}=na
g_countries.add {%1}
endif
pagedelete(noerr) {%1}
next
vector(263) indic
indic.read(type=excel,b1,s=indic) indic.xls
' We now loop over the Excel sheets, processing the countries in groups of 40
!j=1
!n=263
!p=40
!l=1
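' Counters: !j = current Excel sheet, !n = countries still to be processed,
' !p = countries read per sheet, !l = position of the current country in the list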
while !n>0
!k= 2
'
for !i=1 to !p
pageselect countries
%1=g_countries.@seriesname(!l)
pagedelete(noerr) {%1}
if indic(!l)=1 then
' We read the 1431 series from the Excel sheet number !j
' starting in cell e{!k}
call def
close {%1}
endif
pageselect countries
!k=!k+1442
!l=!l+1
next
!j=!j+1
!n=!n-!p
wend
wfsave wb_eap_2020
The simplest and most accurate way to fill the gaps in the WDI data is access to an alternate source. We shall concentrate on the first case, clearly the most important.
The explanations are given in the last file:
https://www.ilo.org/ilostat-files/Documents/ILOSTAT_BulkDownload_Guidelines.pdf
If you are currently accessing the ILO data through the Internet, the Excel menu will be modified automatically to include an "ILOSTAT" item, as follows (sorry for the French):
As you can see, you can access data for a country or a subject.
which leads to the menu:
This will start a download for all available series, for all countries for which at least one series is present.
The results can appear as:
This is a very interesting source (stats.oecd.org). It provides data from 2000 to 2018, sometimes 2019 (at present).
After logging in, you will have access to the following menu, the Economic Outlook being the most interesting element for macro modelers.
https://stats.oecd.org/viewhtml.aspx?datasetcode=EO107_INTERNET_2&lang=en
Using "Customize" you can decide on the countries, the topics and the time span.
Then you can ask for the selected elements to be downloaded as an Excel or CSV file.
Remember that to read a CSV file in Excel you should not try to open it directly, but rather open a blank sheet, use the text import facility, and select "Comma" as a delimiter ("Virgule" in French) to organize the information into columns.
5.3.5.3 The United Nations, in particular the Economic Commission for Europe.
The site:
https://w3.unece.org/PXWeb/en
The countries are (63 including groupings):
European Union-28, Euro area-19, EECCA, CIS-11, North America-2, UNECE-52, Western Balkans-6, Albania, Andorra,
Armenia, Austria, Azerbaijan, Belarus, Belgium, Bosnia and Herzegovina, Bulgaria, Canada, Croatia, Cyprus, Czechia,
Denmark, Estonia, Finland, France, Georgia, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Kazakhstan,
Kyrgyzstan, Latvia, Liechtenstein, Lithuania, Luxembourg, Malta, Monaco, Montenegro, Netherlands, North
Macedonia, Norway, Poland, Portugal, Republic of Moldova, Romania, Russian Federation, San Marino, Serbia,
Slovakia, Slovenia, Spain, Sweden, Switzerland, Tajikistan, Turkey, Turkmenistan, Ukraine, United Kingdom, United
States, Uzbekistan.
Once you have made your selection, you can save it in different formats (through "Save as"), the most practical being probably Excel .xlsx.
5.3.5.4 The International Monetary Fund.
After signing in (free) on the site:
https://www.imf.org/en/Data
you are given access to the following files, for 193 countries and 11 groups:
The reference will be created as something like:
https://data.imf.org/?sk=388DFA60-1D26-4ADE-B505-A05A558D9A42&sId=1479329328660
You can ask for the download of a given file. It will be sent using your registered e-mail address, for instance as:
If you want data for a specific country (like Gabon) and table (like the Balance of Payments), you can use:
which you can export to Excel (not CSV!) using the corresponding menu item.
5.3.6 BACK TO OUR EXAMPLE
Now that we know the principles, let us see how to apply them to the case we have defined earlier. To avoid switching
between distant pages, we shall repeat its presentation.
1. In the example, our economist has decided to build a very simple model of a countryβs economy, which includes
the following elements: Based on their production expectations and productivity of factors, firms invest and hire
workers to adapt productive capacity. However, they exert some caution in this process, as they do not want to be
stuck with unused elements.
2. Productive capital grows with investment but is subject to depreciation.
3. The levels actually reached for labor and capital define potential GDP.
4. They also need intermediate products (proportional to actual GDP), and adapt inventories, from the previous
level.
5. Households obtain wages, based on total employment (including civil servants) and a share of Gross Domestic
Product. They consume part of this revenue and invest another (in housing).
6. Final demand is the sum of its components: consumption, productive investment, housing investment, inventories, and government demand. Total demand also includes intermediate consumption.
7. Imports are a share of local total demand, final or intermediate. But the less capacity remains available, the more imports will be called for.
8. Exports follow world demand, but the priority of local firms is satisfying local demand. They are also affected by
capacity constraints.
9. Supply is equal to demand.
We have voluntarily kept the framework simple, as our purpose is only explanatory at this time. However, the model
we are building has some economic consistency, and can actually represent the nucleus for further extensions which
we shall present later.
We shall also suppose that the following data is available in an Excel file called FRA.XLS, selected from OECD's
Economic Perspectives data set. Series are available from the first quarter of 1962 to the last of 2010. However, the
set contains a forecast, and the historical data ends in 2004.
A note: the reason for using older data is not laziness in updating the statistics. The period we are going to consider is
the most interesting as it includes years of steady growth (1962 to 1973), followed by uncertainty following the first oil
shock. This justifies too using French data: the main point here is using actual data for a non-exceptional country.
When we move to a more operational case our data set will include later periods.
The reason for the "FRA" prefix is to identify series for France in a large set of countries, representing all the OECD
members as well as some groupings.
Units: values are measured in Euros, populations in persons.
It should be clear that this will have to be done through a set of stored statements in a readable language (a program), rather than typed individually in the command window41. This option will allow, among other things, documenting the whole process and reproducing it at will.
41
However, one can copy the sequence of statements entered in the command window into a program file.
This obvious choice is further supported by three features provided by recent versions:
• You can run part of a program, by selecting it with the mouse (in the usual Windows way), clicking on the right button, and choosing "Run Selected".
This is generally more efficient than the previous method of copying the selected part into a blank program and running it. However, the new method does not allow editing, useful when one wants to run a selected AND modified set.
• Symmetrically, one can temporarily exclude part of a program from execution, by "commenting it out". To do this, one should select the relevant part, click on the right button, and choose "Comment Selection". To reactivate the statements, one should select them again and use "Uncomment Selection". One can also type a single quote (') before the statement.
This can be a little dangerous, especially if you (like myself) have the reflex of saving the program before each
execution. To avoid destroying the original, one can save first the modified program under another name42.
• Finally, one can ask for a column of numbers to be displayed left of the program lines. This is particularly efficient if you use the "Go To Line" statement43.
This is done by first unwrapping the command lines. Each command will then use one line regardless of its length, which can be a little annoying for very long ones. Then (and only then) one can ask for the line numbers.
• So actually, the only option is the one we proposed above: defining a program producing all the necessary
information, and the framework of equations which is going to use it. But the ordering of the tasks can be
questioned, as we have started explaining earlier. Until both are completed, the job is not done, but they are
technically independent: one does not need the physical model to create the data, or series filled with values
to specify the equations. This means that one can consider two extreme methods:
• Creating all the data needed by the model, then specifying the model.
• Specifying all the model equations, and then producing the associated data.
The criterion is the intellectual feasibility of the ordered sequence of tasks.
Clearly the first option is not realistic, as writing down the equations will surely evidence the need for additional elements. The second is more feasible, as one does not need actual series to write an equation. But as the definition of the equations proceeds, one has to check that all the addressed elements are or will be available in the required form, either as actual concepts (GDP) or transformations of actual concepts (the budget deficit in GDP points calls for the deficit and GDP series). If a concept appears to be lacking, one will have to: use an alternate available element (a "proxy"), establish an assumption, look in alternate bases not yet accessed, or simply eliminate the element from the model.
This shows that if producing both sets can be done in any order, there is a preference for specifying the equations
first, with a general knowledge on data availability. If the data set is not ready, but its contents are known, it is
possible to write down the equations and ask the software to process the text. The user will be told about possible
42
Only once of course.
43
However, you have to be careful to update the numbers when the program changes.
syntax errors, about the nature of the variables (endogenous / exogenous), and the architecture of his model. This will
lead to early model corrections, saving time and avoiding wrong directions later. And if the model
specifications are still discussed, it is possible to build a first version of the associated data set, which will be updated
when the model is complete.
In practice, especially in the simplest cases, one can also start defining the program with two blank paragraphs, and fill
them with data and equation creating statements until both are complete. The eight original paragraphs in our model
specifications can be treated one by one (not necessarily in the numerical order) filling separately the data and
equation generating blocks with the associated elements.
• Model then data: Specifying first the full model, checking that all elements used can be produced either directly or through a formula. Then producing the full set of data, preferably through a direct transfer or a transformation.
• Model and data: Producing the equations in sequence, or related block by related block, and establishing simultaneously the statements which create all the series they need.
Let us now show on our example how the process can be conducted using the second method, probably more
adapted to such a small model (one cannot expect to switch between the two processes too many times).
We shall first present the process in general (non-EViews) terms, treating each case in sequence, and presenting both
the equations and the statements generating the associated variables. To make things clearer, the equations will be numbered, and the creation statements will start with ">>".
Also, the endogenous variables will use uppercase characters, the exogenous lowercase ones. This has no impact on the treatment by EViews, but will make interpretation clearer for the model builder and especially for his readers.
(1) Based on their production expectations and productivity of factors, firms invest and hire workers.
This defines two behavioral equations for factor demand, in which employment (let us call it LE) and Investment
(called I) depend on GDP, called Q.
(1) LE=f(Q)
(2) I=f(Q)
We need:
>> IP=FRA_IBV
>> Q =FRA_GDPV
But for LE, we face our first problem. Private employment is not directly available. However, we have supposed that
total employment contained only public (government) and private. This means we can use:
>> LE=FRA_ET-FRA_EG
In another case, private and public employment could have been available, but not the total, which would have been
computed as a sum. This highlights the fact that computation and economic causality need not be related.
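In EViews terms, these ">>" statements will simply become series-generating statements; a minimal sketch, using the genr command:
genr Q=FRA_GDPV
genr LE=FRA_ET-FRA_EG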
Capital K, measured at the end of the period, is defined by an identity. Starting from the initial level, we apply a depreciation rate (called dr) and add investment. The equation is written as:
(3) K = K(-1) . (1 - dr) + I
>> K=FRA_KBV
The depreciation rate dr is not directly available, and will be obtained by inverting the identity:
>> dr=(K(-1)+I-K)/K(-1)
In other words, dr will be the ratio, to the initial capital level, of the difference between two levels of capital: the value we would have obtained without depreciation, and the actual one.
(4) CAP(t)=f(LE(t), K(t))
>> CAP=FRA_GDPVTR
which rather represents a "normal" GDP value considering the present level of factors.
The direct availability of this concept as a series represents the best case, not often met in practice. Later in the text
we shall address the alternate techniques available in less favorable situations.
Intermediate consumption can be defined as proportional to GDP, using the actual value. This means that at any level
of production, each unit produced will need the same amount of intermediary products.
(5) IC = r_icq . Q
Firms also adapt their inventories to the evolution of activity, which gives a behavioral equation for the change in inventories CI:
(6) CI=f(Q)
>> IC=FRA_ISKV
>> CI=FRA_CIV
(5) Households obtain wages, based on total employment (including civil servants) and a share of Gross
Domestic Product. They consume part of this revenue.
Now we need to define total employment, by adding government employment (called lg) to LE.
(7) LT=LE+lg
>> LT=FRA_ET
>> lg=FRA_EG
Now we have to compute household revenue, which we shall call RHI. We shall suppose that the same wage applies to all workers, and that the non-wage part of household revenue is a given share of Gross Domestic Product, a series called r_rhiq. This gives:
(8) RHI = wr . LT + r_rhiq . Q
Actually the above assumption, while simplistic, is probably not too far from the truth. The sensitivity to GDP of the
elements included in this heterogeneous concept can be low (such as pensions, or interests from long-term bonds),
high (the revenue of small firm owners, with fixed costs and variable output), or medium (self-employed working in
the same capacity as wage earners).
Household consumption is given by applying to RHI the complement to 1 of a savings rate which we shall call sr. For the time being, the savings rate is exogenous. Housing investment is supposed to represent a given share of revenue, called r_ih:
(9) CO = RHI . (1 - sr)
(10) IH = r_ih . RHI
The new variables are RHI, wr, r_rhiq, sr, IH and r_ih.
RHI is given simply by:
>> RHI=FRA_YDRH
Let us now compute the real wage rate wr. This is done through the following computation.
Dividing FRA_WSSS by FRA_ET gives the individual nominal value, which we divide again by FRA_CPI/10044 to get the
real value45.
>> wr=(FRA_WSSS/FRA_ET)/(FRA_CPI/100)
r_rhiq will be obtained as the ratio to GDP of household revenue minus wages:
>> r_rhiq=(RHI-wr*LT)/Q
44
The OECD deflators are measured as 100 in the year 1995. This means that in 1995 the values and the volumes are the same on average.
45
Considering the above list of available series, one can observe that other options are possible.
Consumption and housing investment will be obtained directly:
>> CO=FRA_CPV
>> IH=FRA_IHV
Computing the savings rate and r_ih will use the inversion of the associated equation:
>> sr=(RHI-CO)/RHI
or
>> sr=(FRA_YDRH-FRA_CPV)/FRA_YDRH
>> r_ih=IH/RHI
or
>> r_ih=FRA_IHV/FRA_YDRH
(6) Final demand is the sum of its components: consumption, productive investment, housing investment,
inventories, and government demand. Total demand includes also intermediate consumption.
(11) FD=IP+CO+IH+gd+CI
(12) TD = FD + r_icq . Q
We need to compute gd as the sum of FRA_IGV and FRA_CGV.
>> gd = FRA_IGV+FRA_CGV
>> FD = FRA_TDDV
(7) Imports are a share of local demand ("domestic demand"). But the less capacity is still available, the more an increase in demand will have to be imported.
(13) UR=Q/CAP
(14) M=f(FD+IC,UR)
We need to compute:
>> M=FRA_MGSV
(8) Exports will essentially depend on World demand. But we shall also suppose that if tensions appear
(through UR) local firms will switch some of their output to local demand, and be less dynamic in their
search for foreign contracts.
(15) X=f(WD,UR)
We need:
>> X=FRA_XGSV
>> WD=FRA_XGVTR
(9) Supply is equal to demand.
The supply-demand equation will for the moment use the following implicit form:
(16) Q + M = FD + X
We can now reorder the framework of our model into the following elements:
[1] LE =f(Q)
[2] IP=f(Q)
[5] IC=r_icq . Q
[6] CI=f(Q)
[7] LT=LE+lg
[11] FD = CO + IH + IP + CI + gd
[12] TD = FD + r_icq . Q
[13] UR = Q/CAP
[16] Q + M = FD + X
Endogenous variables
I Firms investment.
LE Firms employment.
LT Total employment.
CI Change in inventories
IC Intermediate consumption
IH Housing investment.
CO Household consumption.
M French Imports.
X French Exports.
Exogenous variables
lg Public employment
One observes that the equations separate into identities and behavioral equations (those written with an "f").
This distinction is normal. As we have already indicated, identities generally represent a mandatory formal connection, while conforming behavioral equations to economic theory is not so restrictive.
• Computing formulas
By considering the formulas we have obtained, we can see that most of the data needed is available directly, so a
simple transfer should be enough. We might even have considered using the original names. But as our model will
apply only to France, there is no reason to keep the prefix, which helped to identify the French data inside a much
larger multi-country file. And one might decide (rightly in our view) that our names are clearer.
Q = FRA_GDPV
CAP = FRA_GDPVTR
CI = FRA_ISKV
LT = FRA_ET
LG = FRA_EG
FD = FRA_TDDV
CO = FRA_CPV
RHI = FRA_YDRH
I = FRA_IBV
IH = FRA_IHV
WD = FRA_XGVMKT
X = FRA_XGSV
M = FRA_MGSV
Only eight elements are lacking, seven of them exogenous variables.
In real cases, this kind of computation will be used often. One must be aware of one important issue:
The use of these formulas is logically distinct from the definition of model equations. The only reason we need them is
to produce the historical values of series not yet available. If the statisticians had made a comprehensive job (and if
they knew the requirements of the model) they would have provided the full set, and no computation would have
been necessary (just a change in names).
• Applying the computation statements ensures that all the requested data is available. By associating formulas with missing elements, they allow producing the full set required for simulation and estimation. If the
data was already available in the right format, and the names given to the variables were acceptable, no
statement would be necessary. And one can check that in our case, most of the computations are actually
direct transfers, which allow to create a model element while retaining the original series.
Actually, one could question the necessity of having a full set of historical values for endogenous variables. These will
be computed by the model, which will be simulated on the future anyway. The reasons for producing a full set are the
following:
These formulas can include original data, transformed data computed earlier in the program, or simply assumptions.
• The model equations establish a logical link between elements, which will be used by the model to produce a consistent equilibrium. This means that if the formula for computing variable A contains variable B, variable A is supposed to depend on B, in economic terms.
This is obviously true for estimated equations. For instance, the wage rate can depend on inflation, or exports on
world demand. But this is also true for identities:
Household revenue is the sum of its elements. If one grows, revenue changes in the same way (ex-ante, of course).
Basically, we suppose that some behaviors apply in the same way to every element of revenue, whatever its source.
If household consumption is estimated, savings are the difference between revenue and consumption.
It is extremely important to understand this issue, at the start of any modeling project.
It is quite possible however that the same formula is present in both sets. For instance, we might not have values for FD, and we believe that CO, IP, IH and gd represent the whole set of its components. In this case the formula:
FD = CO + IP + IH + gd
will be used both to compute historical values of FD and to define FD in the model.
This introduces an obvious problem: if we make a mistake in the formula, or we use the wrong data, there is no way to
detect it.
5.3.6.2 The EViews program
Let us now consider how the above tasks can be performed in practice.
• First, we need a work file. In EViews, all tasks are conducted in memory, but they apply to the image of a file which will contain all the elements managed at a given moment.
We can create the file right now (as a memory image) or start from a pre-existing one, in which case the file will be
transferred from its device into memory.
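If we create it now, a minimal creation statement (for a quarterly file named small, covering 1962Q1 to 2010Q4, with a single page called model, as in our example) could be:
wfcreate(wf=small, page=model) q 1962Q1 2010Q4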
o First, only one version of the file must be open in memory. As we state elsewhere, EViews allows the user to
open a second version (or even third, and fourthβ¦) of a file already opened. Then changes can be applied
only to one of the memory versions, such as series generation and estimations.
This is obviously46 very dangerous. At the least, one will lose one of the set of changes, as there is no way to transfer
elements from an image to the other. Of course, each file can be saved under a different name, but this does not
allow merging the changes47. At the worst, one will forget the allocation of changes to the files, and one or both will
become inconsistent, the best option being to avoid saving any of them, and to start afresh.
• In command mode, check that no file of the same name is opened, and close it if necessary.
• In program mode (the case here) make sure that no file is open at first. This calls for an initial "CLOSE" statement, which will not succeed most of the time48 but will guarantee that we are in the required situation.
o Second, a new project must start from a clean (or empty) workfile. For an initial file to contain elements is at
best confusing, at worst dangerous. For instance, series with the same name as elements in our project can
already be present with a different meaning (GDP for a different country?), and available for a larger period.
Allowing EViews to estimate equations over the largest period available will introduce in the sample
irrelevant values.
A simple way to solve the problem is to delete any existing element, through the statement:
46
This is only a personal opinion.
47
Providing this option does not look impossible.
48
With fortunately no error message.
DELETE *
which will destroy any pre-existing item, except for the generic C (generic vector of coefficients) and RESID (generic
series of residuals) which are created automatically with the work file and cannot be deleted.
There is only one acceptable case for pre-existing elements: if the work file contains some original information,
provided to the user by an external source. But even in this case the file has to be saved first, to allow tracing back the
steps to the very beginning in which only this original information was present, in its original form.
In any case, in EViews, the possibility to define separate pages (sheets) inside the work file solves the problem. As we
have seen earlier, one can just store the original data in one page and start building the transformed data in a blank
one, logically linked to the original.
First principle of modeling: always organize your work in such a way that if step n fails, you can always get back to
the result of step n-1.
First principle of modeling (alternate version): Always organize your programs in such a way that you can produce
again all the elements associated with the present situation.
CLOSE small
DELETE *
Together with the workfile creation statement, this guarantees:
• That the file small.wf1 is open in memory with the needed characteristics, for a page called "model".
• That only one version is open (provided no more than one was open previously, of course, but we shall suppose you are going to follow our suggestions).
• That the page is empty (actually it contains only C and RESID).
Now that we have a work file, we must fill it with the necessary information.
The original information is represented by the 72 series in the FRA.XLS49 Excel file. We shall import them using the READ statement, which is quite simple (see the User's Manual for detailed options):
49
EViews also allows reading Excel 2010 .xlsx files (but not producing them).
READ fra.xls 72
But beware: even if the Excel file contains dates (in the first column or line) this information is not taken into account.
What is used is rather the current sample, defined by the last SMPL statement. Fortunately, in our case, the current
sample, defined at workfile creation, is the same as the one in the Excel file. But this will not always be the case:
better to state the SMPL before the READ.
SMPL 1962Q1 2010Q4
READ fra.xls 72
Second principle of modelling: if introducing a (cheap) statement can be useful, even extremely seldom, do it now.
One also has to be careful about the orientation of series: normally they appear as columns, and data starts from cell
B2 (second line, second column). Any other case has to be specified, as well as the name of the sheet for a multi-sheet
file.
If we follow the above method, all the data will be transferred to the "model" page. This makes things easier in a way, as all information will be immediately available. But:
• The separation between original and model data will not be clear.
Instead of loading the original series in the model page, a specific page is created (named for instance "oecd") in which the data is imported.
Then in the model page the model variables are declared as "linked", and a link is defined with the original series in the "oecd" page.
Now, we need to define the model on which we shall work. Producing a model starts with the statement:
MODEL modelname
Let us call our model _fra_1.
A trick: starting the name of important elements with an underscore allows them to be displayed at the top of the workfile screen, avoiding tedious scrolling if the number of elements is large. For very important elements (like the model itself) you can even use a double underscore.
The statement
MODEL __fra_1
will either create a new (blank) model with that name, or refer to an already existing one.
The second option is dangerous in our case, as we want to start from scratch. To make sure of that, the most efficient
(and brutal) technique is to delete the model first, which puts us in the first case.
DELETE _fra_1
MODEL _fra_1
This introduces a slight problem, however. In most cases (including right now) the model does not exist, and the
DELETE statement will fail. No problem, as what we wanted is to make sure no model preexisted, and this is indeed
the situation we obtain. But EViews will complain, as it could not perform the required task. And if the maximum
number of accepted errors is 1 (the default option) the program will stop.
It is better to specify the "noerr" option, which accepts failure of the statement without an error message:
DELETE(noerr) _fra_1
MODEL _fra_1
Another way to avoid this situation is obviously to set the maximum number of errors to more than 1. This is done by changing the number in the "Maximum errors before halting" box in the "Run program" menu. If you want this option to apply to all subsequent runs, you have to tick the "Save options as default" box.
Actually, if you have followed the principle above, there is no risk in proceeding with a program which produced error messages, even valid ones. You have saved the elements associated with the initial situation, and even if you forgot to do that, you can always repeat the steps which led to it.
Now, which number should we specify? In my opinion, depending on the model size, from 1000 to 10000. The number has to be higher than the number of potential errors, as you want to get as close as possible to the end of the program. Of course, you will never make 10000 logical errors. But the count is made on the number of error messages. And in a 2000-equation model, if you have set all the endogenous to zero and you compute their growth rates, this single mistake will generate 2000 messages.
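If we remember correctly, the same limit can also be set from inside a program, which avoids relying on the menu (the exact statement should be checked in the manual):
setmaxerrs 10000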
The only drawback is that if your program uses a loop on the number of elements of a group, and this group could not
be created, the loop will run indefinitely with the message:
50
The message associated with a real error will locate it between the preceding and following artificial errors.
•	Introducing the equations.
Now that we have a blank model, we can introduce the equations one by one. The text of these equations has already
been defined; we just need to state the EViews commands.
_fra_1.append IP=f*(Q)
o At this moment, we expect the model to explain the decision on investment by the evolution of GDP. This
seems quite logical, but we have not decided between the possible forms of the theoretical equation, and we
have not checked that at least one of these equations is validated by all required econometric tests.
o But at the same time we want EViews to give us as much information as possible on the structure of our
model: simultaneities, exogenous parts...
o The best compromise is clearly to produce a model which, although devoid of any estimated equation,
nevertheless presents the same causal relationships as the (future) model we consider.
The simplest choice should be, as if we were writing model specifications in a document or on a blackboard, to state:
IP=f(Q)
Unfortunately, EViews does not accept an equation written in this way. It will consider we are using a function called f,
with the argument Q. As this function does not exist, the equation will be rejected.
The trick we propose is to put an asterisk between "f" and the first parenthesis, which gives:
IP=f*(Q)
CAP=f*(LE,K)
will not be accepted either (the parenthesis containing a comma is not a valid expression), so we shall write:
CAP=f*(LE+K)
One just has to state one's conventions, and you are welcome to use your own. For instance, a simple sum such as:
M=FD+TD
will work too, but the equation can be confused with an actual identity, quite misleading in this case.
_fra_1.append LE =f*(Q)
_fra_1.append I=f*(Q)
_fra_1.append K = K(-1)*(1-depr) + I
_fra_1.append CAP=f*(LE+ K)
_fra_1.append IC=r_icq * Q
_fra_1.append CI=f*(Q)
_fra_1.append LT=LE+lg
_fra_1.append TD = FD + r_ic * Q
_fra_1.append UR = Q/CAP
_fra_1.append M=f*(TD+UR)
_fra_1.append X=f*(wd+UR)
_fra_1.append Q + M = FD + X
They produce a 16-equation model called _fra_1. After running the statements, an item will be created in the workfile, with the name "_fra_1" and the symbol "M" (in blue).
Double-clicking on this item will open a special window, with the list of equations:
•	Variables: shows the variables (endogenous in blue with "En", exogenous in yellow with "X"). For the endogenous, the number of the equation is given. This allows locating the equation in the model text, which is useful for large models.
The "dependencies" button gives access to a sub-menu, which allows identifying the variables depending on the current one (Up) and those which influence it (Down).
For instance, for FD, "Up" will give TD and Q, "Down" will give CO, I, G and IH.
The "Filter" option allows selecting variables using a specific "mask". For instance, in a multi-country model the French variables can be identified with FRA_*, provided one has used such a convention.
•	Source text: this is basically the text of the model code. We shall see that this changes with estimated equations.
•	Block structure: this gives information on the logical structure of the model (detailed later in Chapter 7).
We get:
For the time being, let us only say that a simultaneous block contains interdependent elements. For any couple of
elements in the block, a path can lead from the first to the second, and vice-versa. Of course, this property does not
depend on the ordering of equations inside the block.
EViews also gives the number of feedback variables (this will be explained later too).
On the contrary, a recursive block can be ordered (and EViews can do it) in such a way that each variable depends only
(for the present period) on previously defined variables.
This information is useful to improve the understanding of the model, to locate inconsistencies and to correct
technical problems.
o Normally endogenous elements appear as exogenous: the equation for the variable has been forgotten, or
written incorrectly.
o Elements foreign to the model appear: variables have been misspelled.
o A loop appears where there should be none.
o Or (more likely) an expected loop does not appear: for instance a Keynesian model is described as recursive,
or a model for two countries trading with each other can be solved as two independent blocks.
All these errors can be detected (and corrected) without calling for the data. This can speed up the building process,
especially if the data is not yet produced.
If the original and model series share the same page, one will simply use the "genr" statement, in the sequence:
genr Q=FRA_GDPV
genr CI =FRA_ISKV
genr IC=FRA_ICV
genr LT =FRA_ET
genr LG =FRA_EG
genr FD = FRA_TDDV
genr CO =FRA_CPV
genr IP = FRA_IBV
genr IH = FRA_IHV
genr WD = FRA_XGVMKT
genr GD = FRA_IGV+FRA_CGV
genr X =FRA_XGSV
genr M =FRA_MGSV
genr r_rhiq=(RHI-WR*LT)/Q
genr sr=(RHI-CO)/RHI
genr UR=Q/CAP
genr rdep=((K(-1)+IP)-K)/K(-1)
If the original series are managed in their own page (a better option in our opinion), one will use:
for %1 Q CAP CI IC LT LG FD CO RHI I IH WD X M
link {%1}
next
Q.linkto oecd\FRA_GDPV
CAP.linkto oecd\FRA_GDPVTR
CI.linkto oecd\FRA_ISKV
IC.linkto oecd\FRA_ICV
LT.linkto oecd\FRA_ET
LG.linkto oecd\FRA_EG
FD.linkto oecd\FRA_TDDV
CO.linkto oecd\FRA_CPV
RHI.linkto oecd\FRA_YDRH
I.linkto oecd\FRA_IBV
IH.linkto oecd\FRA_IHV
WD.linkto oecd\FRA_XGVMKT
X.linkto oecd\FRA_XGSV
M.linkto oecd\FRA_MGSV
GD.linkto oecd\FRA_IGV+FRA_CGV
However, a problem remains for GD, the sum of the two original variables FRA_IGV and FRA_CGV. The LINK function only allows referring to single variables, not to expressions (as Excel does). Until EViews 8 you had two options.
The first one is to link the two original series into the model page, then compute their sum there:
LINK FRA_IGV
LINK FRA_CGV
FRA_IGV.linkto oecd\FRA_IGV
FRA_CGV.linkto oecd\FRA_CGV
genr gd=FRA_IGV+FRA_CGV
The second one is to compute the sum in the "oecd" page:
genr FRA_GDV=FRA_IGV+FRA_CGV
and to link the result:
LINK GD
GD.linkto oecd\FRA_GDV
But it is also possible to refer directly to variables in a different page, using the syntax:
page_name\variable_name
Of course, the same method could have been used for single variables:
genr Q=oecd\FRA_GDPV
genr CAP=oecd\FRA_GDPVTR
genr CI=oecd\FRA_ISKV
genr IC=oecd\FRA_ICV
genr LT=oecd\FRA_ET
genr LG=oecd\FRA_EG
genr FD=oecd\FRA_TDDV
genr CO=oecd\FRA_CPV
genr I=oecd\FRA_IBV
genr IH=oecd\FRA_IHV
genr WD=oecd\FRA_XGVMKT
genr X=oecd\FRA_XGSV
genr M=oecd\FRA_MGSV
genr GD =oecd\FRA_IGV+oecd\FRA_CGV
Choosing between the two methods depends on whether you want changes in the original series to be applied automatically, or to control the process through GENR51. But if the series is not present in the original data (like GD), a GENR statement is called for anyway.
Now we have produced a first version of the model, and the associated data. As the behaviors have not been
established, we obviously cannot solve it. But we can check two important things:
These conditions are needed to start estimation, the next stage in the process. The first one is obvious, the second less so. But inconsistencies in identities can come from using a wrong concept for a variable, or from computing it wrongly. If this variable is going to be used in estimation, whether as dependent or explanatory, the whole process will be based on wrong elements.
This test can be conducted through a very simple technique: the residual check.
At this point, asking for a solution of the model cannot be considered. However, some controls can be conducted, which do call for a very specific "simulation". This technique is called the "residual check".
This method will compute each formula in the model using the historical values of the variables. This can be done by creating for each equation a formula giving the value of the right-hand side expression (using the GENR statement in EViews). However, there is a much simpler method, provided by EViews.
We can perform a very specific "simulation", in which each equation is computed separately using historical values. This amounts to:
•	Breaking down the model into single-equation models, as many as there are equations.
51
Of course, this will also increase the size of the workfile.
•	Solving each of these models at the same time but separately, using as explanatory values the historical ones.
If we call these historical values y⁰, it means we shall compute, for each endogenous variable:
ŷt = f(y⁰t , y⁰t-1 , xt , α̂) + ut
The interest of this method is obvious: if the residual in the equation is not zero, it means that there is at least one
error in that particular equation. Of course, the problem is not solved, but its location is identified. We shall see later
that this method is even more efficient for a fully estimated model, and we shall extend our discussion at that time.
It would be illusory, however, to hope to obtain a correct model immediately: some error diagnoses might have been badly interpreted, and corrections badly performed. And correcting one error can change the diagnosis for other equations. Suppose for instance that we state:
genr IH=FRA_IH
_fra_1.append FD = CO + IH + IP + CI + gd
If we correct the error on IH without correcting r_ih, the IH equation will now appear as wrong, while its actual number of errors has decreased from 2 to 1.
This means achieving a set of all zero residuals might take a little time, and a few iterations, but should converge
regularly until all errors have disappeared52.
Problems can take several forms:
•	Failure to solve:
o	series with the right name, but unavailable, either completely (they have not been obtained) or partially (some periods are lacking).
o	bad spelling (call to a non-existent series).
•	Non-zero residuals.
•	Non-verified behavioral equations (or with erroneous residual). This issue will become applicable (and will be addressed) later.
Observing the period where the error appears can help locate its source:
o	At the base year (where elements at constant and current prices are identical): the price indexes could be mistaken for one another, or values could be mistaken for volumes.
o	Otherwise, it could come from a typing error (made by the user or the data producer).
o	Or, if it appears in the last periods, the provisional elements could be inconsistent.
•	Observing the magnitude of the error can also be useful: a residual exceeding the normal economic magnitude (1000% for example) should come from a specification error: bad operator, inversion of coefficients, mistaking values for values per capita. A low residual will often come from confusion between two close concepts (the consumption price deflator excluding or including VAT).
•	For additive equations, a missing or extra element may be identified by comparing the residual to the actual values of variables: for instance, if the error on final demand for 2010Q1 is 56734 and this is the actual value of housing investment.
52
Unless the modeler allows some identities to hold true only approximately.
•	If the sign of the error is constant (and especially if the order of magnitude is similar across periods), the error could come from the absence of an element, a multiplication by a wrong factor, or a missing positive influence.
•	If several errors have identical values, they should have the same origin. This is the case when values are mistaken for volumes, if they share the same deflator.
•	If two variables show roughly identical errors with the opposite sign, this can come from the fact that one of them has erroneous values and explains the other.
For instance, if historical values for Q are overestimated, the relative errors on UR and Q will be similar with different signs:
UR = Q/CAP
Q + M = FD + X
Diagnosing errors in the residual check phase can lead back to different phases of the modelling process:
•	Data management: the data obtained from external producers is not consistent, for a full series or for specific observations (this happens!).
•	Production of model data: using the wrong original series, or using a wrong computation.
Example: using a variable at current prices instead of constant prices, or forgetting an element in a sum.
•	Model specification: writing a wrong identity.
Example: forgetting the term for housing investment in the definition of demand. But if the same error was made when computing the series, the two errors will compensate each other.
•	Estimation: an equation is no longer consistent with the present data.
Example: an error in the imports equation shows that the explanatory series for domestic demand has been changed since estimation.
Applying this process a number of times will be necessary to produce a coherent model.
5.3.6.3.3 Back to the example
Producing a residual check is quite easy in EViews: one just has to specify the option "d=f" in the SOLVE statement:
_fra_1.solve(d=f)
Of course, as all equations will be computed separately, all information must be available over the current sample period, including the values of the endogenous variables (which should be explanatory somewhere else). Contrary to computations and estimations, EViews does not adapt the simulation process to the feasible period (this seems rather logical).
As the model is recursive (super-recursive?) computation gives the result directly, and no element describing the
solving method is needed (we shall see them later).
However:
Every time EViews has to solve a model, the name given to the results will be built from the original name of the variable, with the addition of a suffix (a prefix is also possible but less manageable in our opinion). This avoids destroying the original information, and allows comparing alternate solutions.
The suffix is defined by appending an "assign" statement to the model (remember: append adds text to the model, and an identity equation is only a special case of text):
_fra_1.append assign @all _C
The equation for FD will give FD_C, which we can compare with the actual values of FD.
Computing the differences between actual and computed values can be done in a loop, using the syntax described later. The elements in the loop can be defined "by hand", but it is more efficient to use the "makegroup" statement:
model_name.makegroup(a,n) group_name @endog
(the same statement with @exog creates the group of exogenous variables). In our case:
_fra_1.makegroup(a,n) g_vendo @endog
Two remarks:
•	You surely wonder about the reason for the (a,n). This modifies the default options of the "makegroup" statement, which would produce a group with the baseline names (in our case with _C added) and leave out the actual names. Stating (a,n) requests the actual names and drops the baseline ones.
It would be best to restrict the computations to the identities. The residuals on the "estimated" equations have no meaning: as the "f" scalar is null, the right-hand side will be computed as zero, and the percentage error as 100%, as 100*(value - 0)/value. But being able to compute the whole model proves that estimations can be conducted on that period.
One option is to create the group of behavioral variables by hand:
group g_vbeha CI I LE M X
Or to start from the full set of endogenous and drop the estimated ones:
_fra_1.makegroup(a,n) g_viden @endog
g_viden.drop CI I LE M X
This creates first a full group g_viden, then eliminates the estimated variables from it.
This last technique is clearly inefficient here, but will be much more efficient with a 500-equation model with 50 estimated ones (a more usual situation).
However, both techniques call for a user-defined list, which will have to be updated each time the variable set is modified, something we want to avoid: we propose using a more complex, but automatic, technique.
A tip: a visual check is made difficult by the relative imprecision of EViews, which often produces small non-zero residuals for exact equations. In scientific format, these residuals appear as numbers with high negative exponents, which are hard to identify. One solution is to move to a fixed decimal presentation, by selecting a zone (in the "spreadsheet" view), then using the right mouse button to access "Display format" then "Fixed decimal".
A simpler way to check that there is no error is to display all the residuals on a single graph and to look, not at the series (they should move around in some Brownian motion), but at the scale: both maximum and minimum must be very small.
Another idea is to transfer the residuals to Excel and sort the sheet (unfortunately EViews does not sort a sheet across series on the values at a given period). The non-negligible elements will appear at the top and the bottom, according to their sign and the sorting order. Then one can delete the small errors in the middle (less than 0.001%?). As error correction progresses, the number of remaining lines should decrease.
This technique takes more time but allows identifying immediately and fully the faulty elements.
You certainly have realized by now (and you probably knew it before anyway) that one should avoid as much as possible having to edit the text of modelling programs each time changes have been made earlier in the process. This represents at best extra work, at worst a source of error. We have just violated this principle, by separating the endogenous into behavioral and identity variables by hand.
This will introduce problems, in particular in large models: the initial process will be tedious and error-prone, and one will have to remember to update the list every time the model structure changes.
We propose a simple technique to avoid this, and make the initial separation and its updating automatic. It is based on the presence of the "f" scalar in the behavioral equations.
o	Simulate the model with the option "d=f" and f=1, saving the results under a given suffix.
o	Set f to 2 and update the model (this is necessary for EViews to take the change into account).
o	Simulate the model again, with f=2 and another suffix.
o	Create empty groups of estimated and identity variables.
o	Run a loop over the whole group of endogenous, and test each time whether the results of the two simulations are different.
Note: when we move to actual estimated formulas, we will introduce a residual appending the suffix "_ec" to the name of the variable. We will use the same technique, applying a change to this element.
We can use the following program (for the period 2000 - 2002). We suppose that any percentage error higher than 0.00001 denotes a true difference (and thus an estimated equation).
smpl 2000Q1 2002Q4
' the group of endogenous variables, and the (empty) groups to be filled
_fra_1.makegroup(a,n) g_vendo @endog
group g_vbeha
group g_viden
scalar f=1
solve(d=f) _fra_1
' save this first solution (suffix _c, see the assign statement above) under another suffix
for !i=1 to g_vendo.@count
%1=g_vendo.@seriesname(!i)
series {%1}_d={%1}_c
next
scalar f=2
_fra_1.update
solve(d=f) _fra_1
for !i=1 to g_vendo.@count
%1=g_vendo.@seriesname(!i)
series pf_{%1}=100*({%1}_d-{%1}_c)/({%1}_c+({%1}_c=0))
if @max(@abs(pf_{%1}))>1e-5 then
g_vbeha.add {%1}
else
g_viden.add {%1}
endif
next
This sequence calls for some explanation.
•	The loop ("for" to "next") is repeated for each variable in the group g_vendo. The number of these variables is g_vendo.@count (for EViews, x.@count is an integer scalar containing the number of elements in the group x).
For regular users of EViews, or people familiar with programming, the above was probably clear. For others, this is the
time to give very basic information about EViews programming (even if this is not the purpose of this book).
In the programs we are going to present, intensive use is made of two elements: groups and loops.
5.3.7.1 Groups
Groups are named elements which refer to a set of objects (which can be series, series expressions but also other
objects), allowing to treat them either as a whole or in sequence.
For instance
group g x y z x/y
will create a group named g containing the three series x, y and z and the ratio of x to y.
The elements must be series or expressions, but one can cheat by creating artificial series with the name of the requested element.
One can:
•	Group groups.
•	Add and drop elements from groups:
g.add a
g.drop x
g.@seriesname is a character vector which contains the names of the series in group g.
group g_fra fra_*	will create a group from all the elements starting with FRA followed by an underscore.
group g_GDP ???_GDP	will create a group from all the GDPs of OECD countries (using three characters as a label).
group g_3 ???_*	will create a group from all the elements starting with three characters followed by an underscore.
Groups can be used to display a list of series, as a spreadsheet or a graph, by double-clicking on the group's name in the workfile window (where groups appear with a blue "G" symbol) or calling for it.
The default display is a spreadsheet format, but one can move to graphs using the "View" button then "Graph", or even edit the list of elements through "View" + "Group members".
Managing groups has been made more flexible and organized in the last versions. The Preview function (see later) applies to the group as a set.
However, for very simple tasks (like adding an element to a group), using the command window (in which the previous group specification is available) can actually prove faster.
5.3.7.2 Loops
Loops can be defined in two ways:
•	By element (over a list or a group):
for %parameter list-of-variables or group-name
(block of statements using %parameter)
next
The block of statements will be repeated in sequence for each element in the list, which will then replace the
parameter.
The presence of brackets around the parameter changes its status. With brackets, the associated characters are included in the statements, then the brackets are dropped. Without brackets, the parameter is considered as a character string variable.
For instance, after:
%1="111"
the statement
genr xxx={%1}
will create a series xxx with all its values equal to 111, while the statement
genr xxx=%1
will be illegal, as it tries to transfer a character string to a series.
Conversely, for strings:
%2=%1+"333"
is legal (it concatenates the two strings into "111333"), while
%2={%1}+"333"
becomes %2=111+"333" after substitution, which tries to add a number to a character string.
•	By integer number:
for !parameter=first-integer to second-integer [step third-integer]
(block of statements using !parameter)
next
The block of statements will be repeated in sequence from first-integer to second-integer, incrementing if necessary by third-integer, the current value replacing the parameter.
This type of loop can also be applied to a group, using inside the loop:
%1=group-name.@seriesname(!integer)
to retrieve the name of the element of rank !integer.
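For instance, a sketch combining the two elements, computing (purely illustrative) growth-rate series for every variable of a group named g_vendo:
for !i=1 to g_vendo.@count
%1=g_vendo.@seriesname(!i)
series tc_{%1}=100*({%1}/{%1}(-1)-1)
next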
During the modelling process, you often have to compare two sets of information.
o Make sure that two sets of data are identical. This applies to the results of a program you are running again,
maybe after a long delay.
o Control the evolution of historical values for a model data set, showing for instance which equations will have
to be estimated again.
o Summarize the results of a residual check, showing for which equations the right-hand side (using historical
values of the explained variable) is different from the right hand side (the result of the computation). By
setting a tolerance level slightly higher than zero (for instance 0.0001) one can restrict the display to the
errors deemed significant.
o Or you just might want to know which elements of a set are present in another set, for instance which
available series are actually used by one model.
You can compare elements between workfiles, and between pages inside the same workfile. EViews will display one line per element, stating its status among: unchanged, modified (numerically), added, deleted, or replaced (logically; the last case applies for instance to a linked variable which has been modified). A filter can be applied.
For series, a tolerance level can be set, under which the series are not considered modified. The display will tell how
many periods show a higher difference.
By default, all elements will be displayed, but one can restrict the cases shown (for instance, to all variables present in both pages with a difference higher than the criterion).
Equations and models are not compared but appear in the list.
wfcompare(tol=criterion,list=comparison_type) list_of_compared_series list_of_reference_series
For instance, if you want to compare all French series (starting with "FRA_") between the pages "base" and "updated", for a tolerance level of 0.00001, one will state:
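A possible statement, as a sketch following the syntax above (the page names "updated" and "base" are of course an assumption):
wfcompare(tol=0.00001) updated\fra_* base\fra_*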
6 CHAPTER 6 THE ESTIMATION OF EQUATIONS
We now have
•	A full description of the framework of the model, in which all the identities are completely specified, and the intents in terms of behaviors are described as clearly as possible.
•	A full database containing all the series in the present model, endogenous and exogenous, with their description.
We have also checked that:
The next stage is obviously to replace each of the tentative behaviors by actual ones, validated both by economic
theory and statistical criteria.
What we are proposing is not a book on econometrics, and anyway we will never be as knowledgeable, by far, as the EViews team and their collaborators, both in terms of theory and the ability to teach it (remember that one of them is Robert Engle…).
This means we will not approach the theoretical aspects of the subject, referring the reader instead to the books we propose in our bibliography, or even to the EViews Help manuals, which can actually be used as teaching tools, as they are both comprehensive and very progressive in their approach.
But once the modeler is familiar with the concepts, their application to an actual case53 is not straightforward at all.
This means we think this book can bring a very important contribution: showing how these methods can be used in
the process of building our models. The reader will learn how, in very specific cases, first very basic then more
operational econometrics can be used (or not used), considering the information he has and the goal he is pursuing.
We shall also show the role econometrics takes in the process, not as a single task between data gathering and simulations, but as a recurrent partner in the iterative process of building a working model.
We shall not only give examples working smoothly from the start, but show also how econometrics can be set aside,
and how, in some cases, an initial failure can be transformed into success, with some imagination 54.
53
One in which he is not playing with data, but is actually obliged to succeed.
54
Remember David Hendry's four golden rules of econometrics: 1. Think brilliantly, 2. Be infinitely creative, 3. Be outstandingly lucky, 4. Otherwise, stick to being a theorist.
6.2 SPECIFIC ISSUES
Nevertheless, we feel it will be useful to start with two cases, which are not generally treated by manuals, and can
lead to wrong decisions, or wrongly evaluating the results of tests.
The statistic called "R2" or "R-squared" is the most commonly used to judge the global quality of an estimation. It is defined by the following formula:
R2 = Σt=1..T (x̂t - x̄)² / Σt=1..T (xt - x̄)²
This statistic can therefore be interpreted as the share of the variance of the observed variable x explained by the
estimated formula.
A geometrical explanation also can be used: if we consider the space of variables (dimension T = number of
observations), the estimation method will consist in minimizing the distance between the explained variable and the
space (the plane or hyper plane) generated by the vectors of explanatory series, using combinations of parameter
values.
Especially, if the formula is linear relative to the estimated parameters and contains a constant term, we can consider that the estimation is based on the deviations of the variables (explained and explanatory) from their means. In this case, minimizing the Euclidian distance will lead (as can be seen on the graph) the vector (ŷt - yt) to be orthogonal to the space, and therefore to the vector (ŷt - ȳ). These two elements represent the non-explained and explained parts of (yt - ȳ), the variance of which is the sum of their squares. The R2 can be interpreted as the square of the cosine of the angle between the observed and adjusted series: the closer the R2 is to 1, the smaller the angle will be and the higher the share of the estimated variable in the explanation of the total variance. The explanation will be perfect if y - ȳ belongs to the space, and null if the perpendicular meets the space at the origin.
(Figure: the space of observations (axes obs 1, obs 2, obs 3), with the plane generated by the explanatory vectors x1 - x̄1 and x2 - x̄2, the explained vector y - ȳ and its projection on the plane.)
If the equation presents no constant term, the same reasoning can be applied, but this time the mean is not
subtracted. However, the R2 no longer has the same meaning: instead of variances, direct sum of squares will be used.
We will not go further in the explanation of this test, concentrating instead on its practical properties.
The R2 statistic will be all the higher as the explained variable and at least one of the explanatory variables present a time trend. Thus the components of each of these variables on the axes of observations will grow in the same or opposite direction (from highly negative to highly positive or the reverse), giving the associated vectors very close orientations. In the above graph, the components of the variables on the axes will be more or less ordered according to the numbering of the axes themselves. The first observations will be the most negative, then values will grow through zero and reach the most positive ones in the end. The same goes if the orderings follow opposite directions: the estimation will evidence a negative link.
In this case, even in the absence of a true causal relationship between the variables, the orientations of the vectors will be similar up to a multiplicative factor, and the R2 test will seem to validate the formulation. And most time series (like values, quantities or prices) generally present a growing trend, giving this phenomenon a good chance to happen. For example, let us apply the previous equation to French imports:
(1) Log(Mt) = a · Log(TDt) + b + ut
Replacing TD by any steadily growing (or decreasing) variable55 will give a "good" R2, maybe better than actual French demand.
Actually, it can be shown that, when testing for each OECD country the estimation of its imports as a function of the demand of any country, the "true" equation does not always come out as the best, although it is never far from it.
This happens in particular when we explain the same concept using a different transformation, for instance the growth rate instead of the level:
(2) ΔLog(Mt) = a · ΔLog(TDt) + b + ut
We can see that the time trend has disappeared from both series, and any correlation will come from common deviations around this trend (or rather common changes in the value from one period to another). This is of course a much better proof of a link between the two elements (independently from autocorrelation).
To put the two formulations on equal grounds, they must explain the same element. For this, one can just modify the new equation into:
Log(Mt) = Log(Mt-1) + a · ΔLog(TDt) + b + ut
Compared to the initial formula, this transformation will not change the explanation56, as obviously the minimization of the sum of squared residuals represents the same process. The only modified statistic will be the R2, which will increase a lot, as an identical element with a high variance (compared to that of ΔLog(Mt)) has been added on both sides.
55
Like Australian demand, or the price of a pack of cigarettes in Uzbekistan.
56
Before estimation EViews will move the lagged term to the left.
The choice between the two formulations should not rely on the R2 but on the autocorrelation of the residual: if ut is not correlated one should use (1), if it is one should try (2). But in any case the issue will be solved by error correction models and cointegration, which we shall address later.
The following two formulations are equivalent and indeed give exactly the same results, except for the R-squared statistic.
When observing the validity of individual influences, one element plays a very specific role: the constant term.
•	To manage the fact that the equation does not consider elements as such, but the deviations from their means. In ordinary least squares, even if the final result is a linear formulation of the variables and a constant term, the process actually:
o	computes the means,
o	subtracts them from the variables,
o	uses the deviations to estimate a formula with no constant57,
o	recombines estimated coefficients and means into a constant.
This constant is an integral part of the process. It should be included every time at least one of the explanatory elements does not have a zero mean.
Let us give an example for the first case: if imports have a constant elasticity to demand, we will estimate:
Mt = exp(b) · TDt^a
or
Log(Mt) = a · Log(TDt) + b
but the estimation process will first use the deviations from the averages to get "a", then compute "b" as:
b = mean(Log(M)) - a · mean(Log(TD))
57
As all elements in the formula have zero mean, the sum of the residuals will also be null.
We can see in particular the consequences of a change in units (thousands, millions, billions...). The constant term will absorb it, leaving "a" unchanged. In the absence of "b", "a" would get a different value, for no economic reason.
Of course, the more significant "b" is, the more its absence will damage the quality of the estimation (and the more "a" will be affected). But this is no reason to judge "b" on its significance. We can compare this to weighing an object with a balance: the two platters never have exactly the same weight, and even if the damage decreases with the difference, it is always useful to correct it. And in our case there is no cost (actually it makes things cheaper, as the cost of the decision process disappears).
It is not frequent for the constant term to have a theoretical meaning. The majority of such cases come from a formula
in growth rates or variations, where the constant term will be associated with a trend.
The only justification for the absence of a constant term is when this theoretical constant is null. In this case,
observing a significant value becomes a problem, as it contradicts the theory. We shall give an example soon.
In our model, we have to estimate five equations (the change in inventories, investment, employment, imports and exports), for which we already have ideas about their logic. But we should be aware that:
•	Other simple formulations could probably be built on the same sample, with equivalent or maybe better quality.
•	Using another sample (another country for instance), the same economic ideas could lead to different formulations (not only different coefficient values).
•	Other economic ideas could be applied, with the same complexity.
•	To produce a truly operational model, the present framework would have to be developed in a large way. We will present such developments later.
However the model we are building represents in our sense a simplified but consistent summary of the general class
of models of this type. Reading descriptive documents for any operational structural model, one will meet many of the
ideas we are going to develop.
We shall use this simplest estimation to present the basic features of EViews estimation, and also stress the necessity
for homoscedasticity.
Our formulation will simply suppose that firms want to achieve a level of inventories proportional to their production (or GDP). For a particular producer, this should be true both for the goods he produces and for the ones he is going to use for production. For instance, a car manufacturer will allow for a given delay between production and sale (maybe three months, which will lead to an inventory level of 1/4th of annual production). And to be sure of the availability of intermediary goods (like steel, tires, electronic components and fuel for machines in this case), he will buy the necessary quantity (proportional to production) some time in advance.
We shall suppose that firms had achieved, at the previous period, an inventory level IL representing a given number of periods of production:
ILt-1 = r · Qt-1
IL*t = r · Qt
ILt = IL*t
This means that, contrary to the general case, this equation should not include a constant term. Its presence would call for a trend (and a constant) in the equation in levels, with no economic justification. It would also introduce a problem: adding a constant to an explanation in constant Euros would make the equation non-homogenous.
Even then, the equation faces a problem concerning the residual: between 1963 and 2004, French GDP has been multiplied by 4. We can suppose the level of inventories has too (maybe a little less, with economies of scale and improved management techniques).
It is difficult to suppose that the unexplained part of the change in inventories is not affected by this evolution. As the
variable grows, the error should grow. But to apply the method (OLS), we need the residual to have a constant
standard error. Something must be done.
The simplest idea is to suppose that the error grows at the same rate as GDP, which means that if we measure the change in inventories in proportion to GDP, we should get a concept for which the error remains stable. Of course, we shall have to apply the same change to the right-hand side, which becomes the relative change in GDP.
To avoid causality problems (for a given period, the demand for CI is partly satisfied by Q itself), we shall use the previous value of Q.
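This gives, as a sketch, the equation we shall actually estimate (r is the coefficient to be estimated, u the residual):
CIt / Qt-1 = r · (Qt - Qt-1) / Qt-1 + ut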
As this is our first example, we shall use it to present the basic estimation features.
Actually, the technique will differ according to the stage in the estimation process: whether we are still exploring several individual formulations, looking for the best option both in statistical and economic terms, or we have already selected the best one and want to merge it into our model.
The simplest way to estimate an equation under EViews is through the menus, using in succession Quick > Estimate Equation, then entering the specification in the dialog box.
In the case of ordinary least squares, this specification can be a list of elements separated by blanks, or an explicit formula; in our case:
CI/Q(-1)=c(1)*D(Q)/Q(-1)
Of course, the two methods give exactly the same results (in the first case, the "c" vector will also be filled with the estimated coefficient).
The default method will be Least Squares, appropriate in our case. If the equation was not linear in the coefficients, the second presentation would be automatically called for.
smpl 1962q1 2004q4
which means that we consider data from the first quarter of 1962 to the last of 2004.
If the equation is linear in coefficients, EViews recognizes this property, and does not try to iterate on the coefficients,
as it knows the values found are the right ones.
We can see that EViews gives the sample used (the relevant periods of our sample). Estimation starts in 1963q2, (with
data starting in 1963Q1) and ends in 2004q4.
•	We also get the number of periods, and the time and date.
•	The other elements are the usual statistics, described earlier. The most important are:
o	The R-squared, the Durbin-Watson test and the Standard Error of regression for the global elements.
o	For each coefficient: its value, its t-Statistic, and the probability of reaching this value if the true coefficient is null, given the estimated standard error.
In our case:
•	The R-squared is very low, even if the extreme variability and the absence of trend of the left-hand element plead in favor of the explanation58.
However, as with almost all homogenous estimations, a simple interpretation is available through the standard error: as the explained variable is measured in points of GDP, the average error represents 0.68 points of GDP.
58
If we knew the values for IL, its estimation would get a better R2 (due to the colinearity of IL and Q). But we would be led to estimate an error correction model on IL anyway. We have seen the advantage of this formulation, but for the quality to extend to the whole model, all equations must be of this type.
•	The coefficient is very significant. The probability of reaching 0.107 for a normal law with mean 0 and standard error 0.00126 is measured as zero. Of course, it is not exactly null, but it is lower than 0.00005, and probably much so.
•	But the Durbin-Watson test takes an unacceptable value, even if the absence of a constant term (and the subsequent non-zero average of residuals) makes its use questionable.
•	The graph of residuals is the second important element for diagnosis. It shows the evolution of the actual and estimated series (top of the graph, using the right-hand scale) and of the residual (bottom, using the left-hand scale, with lines at plus and minus one standard error). This means that inside the band residuals are lower than average, and higher outside it. Of course, it gives only a relative diagnosis.
(Graph: actual and estimated change in inventories as a share of GDP, with the residual and its one-standard-error band, 1963-2004.)
The graph shows (in our opinion) that the equation provides some explanation, but some periods (1975-1980 in
particular) present a large and persistent error
In addition to the display of estimation results and the graph of residuals, EViews creates several objects:
•	A vector of coefficients, contained in the "C" vector. The zero values or the results from the previous regression are replaced59.
•	A series for the residuals, contained in the "RESID" variable. The "NA" values or the results from the previous regression are replaced60.
•	A tentative equation, called "Untitled" for the moment, and containing the developed formula, with "C" as the vector of coefficients, with numbers starting from 1. In our case, the formula is obviously:
CI/Q(-1)=c(1)*D(Q)/Q(-1)
59
But if the present regression contains fewer coefficients than the previous ones, the additional elements are not put
to zero.
60
But this time, residuals from previous equations are given either computed values or "NA".
Any subsequent estimation will replace this equation by the new "Untitled" version.
•	EViews also provides several options, accessed from the menu, which can be useful:
o	"View" gives three representations of the equation: the original statement, and two formulas including the coefficients as parameters (the above "c" type) or as values.
o	"Print" allows printing the current window: to a printer, to a text file (using characters, which saves space but reduces readability, especially for graphs), or to a graphics RTF file. This last option might call for a monochrome presentation, which is obtained through the "Monochrome" template (the last of the general Graph options).
o	"Name" allows creating the equation as a named item in the workfile, with an attached comment. It is important to use it immediately after the estimation, as the temporary equation (named "Untitled") will be replaced by the next estimation.
However, inserting an underscore ("_") before the proposed name will place the equation in the first positions of the workfile window.
EViews proposes as a standard name "EQ" followed by a two-digit number, using the lowest one unused at the moment. One can either:
-	Give a name representative of the equation (like "EQ_X3U" for the third equation estimating X as influenced by the rate of use).
-	Accept the EViews suggestion and rely on the attached comment for the explanation.
Actually the item saved is more complex than the actual formula. Double-clicking on it shows that it contains the full
representation, including the residual (and actually the standard errors of the coefficients, even if they are not
displayed).
o Forecast produces a series for the estimated variable (or the estimated left-hand expression, generally less
interesting), and an associated graph with an error band (and a box with the statistics).
Instead of using Quick>Estimate, one can work directly through the command window. One just has to add "ls" before the formula:
ls CI/Q(-1)=c(1)*D(Q)/Q(-1)
•	By copying and editing the current equation on the next line of the command box, entering changes is made much easier.
•	After a session of estimations, the set can be copied into a program file and reused at will. Managing a set of alternate versions is much easier.
•	One can control the size of characters. This is quite interesting when working with a team, or making a presentation, as the normal font is generally quite small.
•	The only drawback is sample definition: it has to be entered as a command, not as an item in the "Estimate" panel.
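For instance, such a program could group a set of alternative specifications (the second one, adding a hypothetical lagged term, is only there for illustration):
smpl 1962Q1 2004Q4
ls CI/Q(-1)=c(1)*D(Q)/Q(-1)
ls CI/Q(-1)=c(1)*D(Q)/Q(-1)+c(2)*D(Q(-1))/Q(-2)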
Let us go back to our estimated formula. If we are not satisfied with the previous results, we can try alternate options,
without changing the economic background.
First, one can observe a very clear outlier for the second quarter of 1968. Economists familiar with French history are aware that this corresponds to the May 1968 "revolution", when student demonstrations turned into general strikes, paralyzing the economy: not only did factories close, but transportation came to a standstill, and available goods could not be delivered to the firms which needed them.
As production was reduced much more than demand, in an unexpected way, satisfying demand had to call on inventories; and even a lower production needed intermediate goods (in particular oil and coal) which were not available due to these transportation problems.
This will lead us to introduce a "dummy" variable, taking the value 1 in the second quarter of 1968 only. We can already observe the gain from our "time" variable: we can introduce the dummy explicitly, without having to create a specific series which would have to be managed in every simulation, including forecasts.
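For instance (a sketch, assuming the time variable t is defined so that it takes the value 1968.25 in the second quarter of 1968):
ls CI/Q(-1)=c(1)*D(Q)/Q(-1)+c(2)*(t=1968.25)
The expression (t=1968.25) is evaluated as 1 for that quarter and 0 elsewhere, playing the role of the dummy without creating a separate series.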
Dependent Variable: CI/Q(-1)
Method: Least Squares
Date: 03/21/20 Time: 23:00
Sample (adjusted): 1963Q2 2004Q4
Included observations: 167 after adjustments
One could assume that the change in inventories does not depend on the present change in GDP only, but rather on the sequence of past changes, due both to inertia in firms' behavior and to technical difficulties in implementing decisions.
Let us first consider the last five periods, leaving the coefficients free. We get:
Not only are most of the explanations not significant, but the value of the first one (maybe the most important) takes the wrong sign.
To make the set of lagged coefficients smoother, we can constrain them to follow a polynomial in the lags.
The syntax for this element is:
PDL(variable, number of lags, degree of the polynomial, conditions).
PDL(@pch(Q),4,3,2)
which implies:
•	A maximum lag of 4.
•	A polynomial of degree 3.
•	A zero value for the coefficient just beyond the last lag.
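A possible full statement, as a sketch (using the list form of ls, and the same assumed dummy term as above):
ls CI/Q(-1) PDL(@pch(Q),4,3,2) (t=1968.25)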
The results are rather satisfactory, with a nice profile for the reconstructed coefficients, and generally significant explanations. The dummy element also provides a much better explanation.
(Graph: actual and estimated change in inventories as a share of GDP with the PDL specification, residual and one-standard-error band, 1963-2004.)
Once an equation has been selected for introduction in the model, a different strategy should be used. With the above method:
•	It is not simple to link the equation name with its purpose, which makes the process unclear and forbids any automated and systematic processing.
•	The C vector, used by all equations, is only consistent with the last estimated one.
•	The residuals cannot be managed simply.
Instead, we propose the following organization, deriving all elements from the name of the dependent variable through a systematic transformation:
•	Naming the equation after the estimated variable.
•	Creating a vector of coefficients specific to the equation, also named after the estimated variable:
coef(10) c_ci
•	Introducing an additive explicit residual, named after the estimated variable. The reason is the following:
o	It is essential for a model to estimate and simulate the same equation. Of course, two versions could be maintained, one being copied into the other after each new estimation. This is:
-	Tedious.
-	Difficult to manage.
-	Error-prone.
It is much better to use a single item. However, this faces a problem: one wants access to the residual, in particular for forecasts, as we shall see later. And the estimation calls for no residual.
The solution is quite simple: introduce a formal residual (here ec_ci), but set it to zero before any estimation:
genr ec_ci=0
then, once the equation has been estimated, store the observed residual in it:
genr ec_ci=resid
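Putting the pieces together for the change in inventories, a possible sequence is the following sketch (the equation name is ours, and the specification is the one discussed above):
coef(10) c_ci
smpl 1962Q1 2004Q4
genr ec_ci=0
' estimate the equation, naming it after the variable
equation eq_ci.ls CI/Q(-1)=c_ci(1)*D(Q)/Q(-1)+ec_ci
' store the residual for later use (forecasts, shocks)
genr ec_ci=resid
' merge the estimated equation into the model
_fra_1.merge eq_ci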
In this estimation, we shall stress the importance of establishing a sound economic framework before any estimation. In our sense, trying for the best estimation without considering the economics behind the formula, and especially its consequences for model properties, is rather irresponsible. For instance, using the logarithm of investment is quite dangerous. Its value can change in very high proportions, and if we go back to the microeconomic foundations of this behavior, it could very well be negative, as some firms are led to disinvest from time to time, by selling more capital than they buy.
(Graph: actual and estimated series for a first investment equation in logarithms, with the residual and its one-standard-error band, 1965-2004.)
Everything seems to go well: the statistics are quite good (even the Durbin-Watson test), the signs are right, the graph shows a really strong fit. However, when we merge the equation into a model, its simulation properties will be affected by the base solution: even a very high increase in GDP will have a low impact on the absolute level of investment if this level was very low in the previous period. And investment can show huge variations, as it represents the change in a stable element, capital.
One can guess that although linking investment (a change in capital) to the change in production seems a natural idea, jumping to the above formulation was moving a little too fast. In general, one should be naturally reticent about taking the logarithm of a growth rate, itself a derivative.
The right starting approach is to clarify the economic process through a full logical formalization.
Let us suppose that production follows a "complementary factors" function, which means that to reach a given level of productive capacity, fixed levels of capital and employment are required, and a reduction in one factor cannot be compensated by an increase in the other. This means obviously that the less costly (optimal) process is the one which respects exactly these conditions.
(Figure: combinations of K and L giving the same capacity CAP1; the optimal combination is (L1, K1).)
(The "t-1" index in the formula below means that we shall use the level of capital reached at the end of the previous period.)
Actually, for a given level of employment, there is always some short-term leverage on production, at least at the macroeconomic level. Temporarily increasing labor productivity by 1% can easily be achieved through extra hours, fewer vacations, cancelled training courses...
This means capital will be the only limiting factor in the short term.
CAPt = pkt · Kt-1
URt = Qt / CAPt
Now let us suppose firms actually want to reach a constant target utilization rate UR*, and expect a production level Qat+1. Then by definition:
This means that the target growth rate of capital can be decomposed as the sum of three terms, one with a positive influence (the expected growth rate of production) and two with a negative one:
•	The target growth rate of the rate of use: if the firms feel their capacities are 1% too high for the present level of production, they can reach the target by decreasing capital by 1%, even if production is not expected to change.
•	The growth rate of capital productivity: if it increases by 1%, 1% less capital will be needed.
But the element we need is investment. To get it we shall use the definition of capital accumulation:
Kt = Kt-1 · (1 - drt) + It
In other words:
•	If firms expect production to grow by 2.5%, capacities should adapt to that growth.
•	But if they feel their capacities are under-used by 1%, their desired capacity will only increase by 1.5%.
•	If capital productivity is going to increase by 0.5%, they will need 0.5% less capital.
•	But once the growth of capital has been defined, they also have to compensate for depreciation (5% is a reasonable value).
In summary, the accumulation rate (the ratio of investment to the previous level of capital) would be:
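As a sketch of this decomposition (the exact dating of each term is left open here):
It / Kt-1 ≈ expected growth rate of Q + Log(URt / UR*) - growth rate of pk + drt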
If we suppose:
•	That the depreciation rate is constant, as well as the rate of growth of capital productivity,
•	That production growth expectations are based on an average of the previous rates,
and if we consider as the rate of use the ratio of actual GDP to a value obtained under normal utilization of factors (which leads to a unitary target), we get a simplified formula in which expected growth is replaced by a weighted average of past growth rates of production, with:
Σ(i=0 to n) αi = 1
Finally, we can suppose, as we shall also do for employment, that the desired growth of capital is only partially reached in practice, either because firms react cautiously to fluctuations of demand, or because they are constrained by investment programs covering more than one period, from the decision to the actual installation of the investment goods.61
The results are rather satisfactory, with the right signs and acceptable statistics for all explanatory elements. This was not obvious, as their strong correlation (both use Q in the numerator) could have made it difficult for the estimation process to separate their roles.
61
In this model, we suppose there is no delay between the acquisition of investment (with impact on demand and the
supply-demand equilibrium) and the participation of this investment to the production process.
(Graph: actual and estimated accumulation rate, with the residual and its one-standard-error band, 1978-2004.)
The graph of residuals shows that the quality of the explanation grows with time, and is especially good for the last
periods. This is rather important for simulations over the future, and one can wonder what we would have done if the
sample had been reversed, and the initial residuals had applied to the last periods.
We will deal with this problem of growing errors on recent periods when we address forecasts.
The equation we have built is not only satisfactory by itself, but we can expect it to provide the model with adequate properties. In particular, the long-term elasticity of capital to production is now unitary by construction. Starting from a base simulation, a 1% permanent shock on Q will leave the long-run value of UR unchanged62. This gives the same relative variations to production, capacity and (with a constant capital productivity) capital.
The coefficients "a" and "b" determine only the dynamics of the convergence to this target.
Actually we have estimated a kind of error-correction equation, in which the error is the gap between actual and
target capacity (the rate of use).
We hope to have made clear that to produce a consistent formulation, in particular in a modelling context, one must
start by establishing a sound economic background.
Of course, the employment equation should follow also a complementary factors framework.
In the previous paragraph, we have shown that in this framework the element determining capacity is capital alone, while firms can ask from workers a temporary increase in productivity, high enough to ensure the needed level of production63. Adapting employment to the level required to obtain a "normal" productivity will be done in steps.
62
As the left-hand side represents the (fixed) long-term growth rate of capital.
63
This is true in our macroeconomic framework, in which the changes in production are limited, and part of growth is compensated by increases in structural productivity (due for instance to more capital-intensive processes). At the firm level, employment can produce bottlenecks. This will be the case if a sudden fashion appears for particular goods requiring specialized craftspeople, even if the tools and machines are available for buying.
This means that estimating employment will allow us to apply the notion of error correction models in a very simple framework.
But they do not adapt the actual employment level to this target, and this for:
β’ Technical reasons: between the conclusion that more employees are needed and the actual hiring 64, firms
have to decide on the type of jobs called for, set up their demands, conduct interviews, negotiate wages,
establish contracts, get authorizations if they are foreign citizens, maybe ask prospective workers to train...
Of course this delay depends heavily on the type of job. And this goes also for laying off workers.
β’ Behavioral reasons: if, facing a hike in production, firms adapted immediately their employment level to a higher
target, they might later be faced with overemployment if the hike proved only temporary. The workers they have
trained, maybe at a high cost, have no usefulness at the time they become potentially efficient. And laying them off will generally call for compensation.
We should realize that we are facing an error correction framework, which we can materialize as follows.
"Normal" labor productivity does not depend on economic conditions. It might follow a constant trend over the period, such as:
log(pl*(t)) = a + b · t
LE*(t) = Q(t) / pl*(t)
64
But not the start of actual work: what we measure is the number of workers employed, even if they are still training
for instance.
Δlog(LE(t)) = α · Δlog(LE*(t)) + β · log(LE*(t-1)/LE(t-1)) + γ + u(t)
We recognize here the error correction framework presented earlier, which requires a positive correction coefficient β (so that the gap to the target is actually reduced). As to α, it does not have to be unitary. However, if we follow the above reasoning, its value should be between 0 and 1, and probably significantly far from each of these bounds.
To estimate this system we face an obvious problem: pl* is not an actual series (LE* either, but if we know one we
know the other).
But if we call "pl" the actual level of productivity (Q/LE) we can observe that:
log(LE*(t-1)/LE(t-1)) = log(pl(t-1)/pl*(t-1))
Now it should be obvious that if pl* and pl have a trend, it must be the same, actually the trend defining completely
pl*. If not, they will diverge over the long run, and we will face infinite under or over employment. So target
productivity can be identified using the trend in the actual value, if it exists.
This means we can test the stationarity of the ratio as the stationarity of actual productivity around a trend, a test
provided directly by EViews.
We can expect a framework in which actual productivity fluctuates around a regularly growing target, with cycles
which we do not expect to be too long, but can last for several periods65.
genr PROD = Q / LE
65
Which will create (acceptable) autocorrelation in the difference to the trend.
and regress it on time:
ls log(PROD) c t
Results are quite bad. Of course productivity shows a significant growth, but the standard error is quite high (more
than 5 %). More important, the graph of residuals and the auto-correlation test show that we are not meeting the
condition we have set: that observed productivity fluctuates around a trend, with potential but not unreasonably long
cycles.
The problem apparently lies in the fact that the average growth rate is consistently higher in the first part of the
period, and lower later. Seen individually, each sub-period might seem to meet the above condition.
From the graph, we clearly need one, and probably two breaks. One will observe that the first period follows the first
oil shock, and the beginning of a lasting world economic slowdown. The reason for the second break is less clear
(some countries like the US and Scandinavia show a break in the opposite direction).
For choosing the most appropriate dates, we can use two methods:
β’ A visual one: 1973 and 1990 could be chosen, possibly plus or minus 1 year.
β’ A statistical one: the most appropriate test is the Chow breakpoint test, which diagnoses whether the introduction
of one or more breaks improves the explanation. To make our choice automatic, we shall consider three
intervals, and apply the test to all reasonably possible combinations of dates from those intervals (a program
sketch is given below). As we could expect, all the tests conclude in favor of a break. But we shall select the
combination associated with the lowest probability (of no break), which means the highest likelihood ratio66. Of
course, this criterion works only because the sample and the number of breaks remain the same.
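A minimal program sketch of this automated search (the names eq_try, best_logl, best_t1 and best_t2 are hypothetical, and the candidate date ranges are only illustrative). Rather than calling the Chow view itself, it ranks the combinations by the log-likelihood of the corresponding regressions, which gives the same ordering here since the sample and the number of breaks do not change:
' loop over candidate break dates, T being the numeric time variable
' (years, with quarters as .25 steps), and keep the best pair
scalar best_logl=-1000000
scalar best_t1=0
scalar best_t2=0
for !t1=1971.75 to 1974 step 0.25
  for !t2=1990 to 1993.75 step 0.25
    equation eq_try.ls log(prod) c t (t-!t1)*(t<!t1) (t-!t2)*(t<!t2)
    if eq_try.@logl>best_logl then
      best_logl=eq_try.@logl
      best_t1=!t1
      best_t2=!t2
    endif
  next
next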
The best result actually corresponds to 1972q3 and 1992q4, as shown by the comparison of log-likelihood ratios.
66
the highest F gives the same conclusion
The first element is quite logical: what we are estimating for the model is not actual productivity (this is given in the
model by an identity, dividing actual GDP by employment). We are looking for the exact formula for target
productivity, prone to error only because we have not enough information to produce the true value. If the sample
grew, or the periodicity shortened, the precision would improve constantly. The residual might not decrease, but it
does not represent an error, rather the gap between the actual and βnormalβ values of labor productivity. Whereas, in
a normal behavioral equation, the residual corresponds to an error on the variable, and cannot be decreased
indefinitely, as the identification of the role of explanatory elements becomes less reliable with their number.
The reason here is purely technical. Our model is designed to be used on the future. So it is essential to make the
forecasting process as easy as possible.
If the partial trends are still active in the future, we shall have to manage them simultaneously. We can expect that we
want to control the global trend of labor productivity, if only to make it consistent with our long-term evolutions of
GDP (which should follow world growth) and employment (which should follow population trends). Obviously,
controlling a single global trend is easier than a combination of three trends.
Also, the last trend is the most important for interpretation of model properties, and it is better to make it the easiest
to observe.
On the other hand, our technique has no bad points, once it has been understood.
Finally, the reason for breaking the trend in 2004 is also associated with handling of its future values. If the global
coefficient is changed, this will be the period for a new break, and this is the best period to introduce it.
[Graph: the three trend variables T-2004, (T-1972.50)*(T<1972.50) and (T-1992.75)*(T<1992.75), over 1965-2010]
We can see that in the beginning three trends apply, then two, then after 2004 only the global (blue) trend is
maintained.
The results look quite good, both in the validation of coefficients and the graphs. We are presenting the program version, which will be introduced in the model (as an identity)67.
However, we observe a very high residual in the second quarter of 1968. As we are estimating a trend, this is not the
place for considering a one-period outlier. We will come back to this problem when we estimate employment itself.
Now we must test the stationarity of the residual. We shall use the Dickey-Fuller test (or Phillips-Perron).
First we need to generate from the current RESID a variable containing the residual (the test is going to compute its
own RESID, so it is not possible to test on a variable with this name).
genr res_prle=resid
uroot(1,p) res_prle
uroot(h,p) res_prle
67
This is not absolutely needed, as a variable depending only on time can be considered exogenous and computed
outside the model. But we want to be able to change the assumption in forecasts, and this is the easiest way.
Null Hypothesis: RES has a unit root
Exogenous: Constant
Lag Length: 1 (Automatic - based on SIC, maxlag=13)
t-Statistic Prob.*

Null Hypothesis: RES_PRLE has a unit root
Exogenous: Constant
Lag Length: 1 (Automatic - based on SIC, maxlag=11)
t-Statistic Prob.*
Augmented Dickey-Fuller test statistic -5.191811 0.0000
Test critical values: 1% level -3.522887
5% level -2.901779
10% level -2.588280
The values of target productivity and desired employment are given by:
genr log(prle_t)= c_prle(1)+c_prle(2)*t+c_prle(3)*(t-1972.50)*(t<1972.50)+c_prle(4)*(t-
1992.75)*(t<1992.75)
genr led=q/prle_t
Now, as to the estimation of employment itself, LE will be estimated (using here the developed form) by:
where LED is equal to Q/prle_t, the trend obtained in the previous equation.
It is now time to consider the 1968 residual. With such a high value, there should be some economic explanation.
Indeed, the behavior of firms did not follow normal lines. They believed (rightly) that the decrease in production they
were facing was quite temporary. For them, laying off the corresponding number of workers was not reasonable, as it
would cost severance payments, and when things went back to normal there was no reason they would find as
efficient and firm-knowledgeable workers as before.
This means labor productivity decreased, then increased to get back to normal.
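For reference, a sketch of the corresponding program statements (the equation name eq_le is an assumption; the estimated residual will be stored afterwards with the genr ec_le=resid statement shown later):
' declare the coefficient vector, then estimate the error correction equation
coef(10) c_le
equation eq_le.ls dlog(le)=c_le(1)*dlog(led)+c_le(2)*log(led(-1)/le(-1))+c_le(3)+c_le(4)*((t=1968.25)-(t=1968.50))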
Dependent Variable: DLOG(LE)
Method: Least Squares (Gauss-Newton / Marquardt steps)
Date: 03/22/20 Time: 16:35
Sample (adjusted): 1963Q2 2004Q4
Included observations: 167 after adjustments
DLOG(LE)=C_LE(1)*DLOG(LED)+C_LE(2)*LOG(LED(-1)/LE(-1))
+C_LE(3)+C_LE(4)*((T=1968.25)-(T=1968.50))+EC_LE
The results are rather significant, except for the last coefficient.
[Graph: LE over 1965-2000]
Following the reasoning made earlier, c_le (3) (or rather c_le(3)/c_le(2)) should represent the logarithm of the long-
term gap between the target employment and the level reached. This gap will be significant if both:
β’ Employment shows a trend (the target is moving), which means that GDP and target productivity show
different trends.
β’ A difference between the growths of GDP and target productivity is not compensated immediately (the value
of c_le(1) is different from one)
The second condition is clearly met, but for the first the answer is dubious. Instead of a trend, one rather observes a break in the level. Nevertheless, the coefficient is diagnosed as significant.
As to the first coefficients, they are quite significant, maybe lower than expected.
This will not be true in the US case. We will use another data base, this time bi-yearly.
First, the β1992β break exists too, but it is now positive, as shown here:
Second, employment has grown substantially over the sample period, which means that a constant term is called for:
Reverting to the French case, the outlier observed earlier (and in the equation for the change in inventories) remains.
The year 1968 presents a strong negative residual in the first semester, and a negative one for the last. As we are now
considering a dynamic behavior, this is the time for treating this problem.
As stated earlier, French people (and people familiar with French post-war history) will certainly recall the May 1968
βstudent revolutionβ which lasted roughly from March to June. During that period, the French economic process was
heavily disturbed, in particular the transportation system, and GDP decreased (by 7.6% for the quarter). If the
equation had worked, employment would have decreased too, especially as productivity growth was quite high. On
the contrary, it remained almost stable.
The explanation is obvious: firms expected the slump to be purely temporary, and activity to start back after a while
(actually they were right, and GDP grew by 7.5% in the next semester, due in part to the higher consumption allowed
by "Grenelle68" wage negotiations, very favorable to workers). They did not want to lay off (at a high cost) workers
whom they would need back later with no guarantee to find the same individuals, familiar with the firmsβ techniques.
So the employment level was very little affected.
This means that the global behavior does not apply here, and the period has to be either eliminated from the sample,
or rather treated through a specific dummy variable, taking the value 1 in the first semester and β1 in the second
(when employment increased less than the growth in GDP would call for).
This case is rather interesting: some economists could be tempted to introduce dummies just because the equation
does not work, and indeed the results will be improved (including in general the statistics for the explanatory
variables). This can probably be called cheating. On the contrary, not introducing the present dummy can be
considered incorrect: we know that the local behavior did not follow the formulation we have selected, so it has to be
modified accordingly.
68
From the location of the Ministry of Employment where negotiations were conducted.
The global results are slightly improved, and the first coefficient increases significantly, meaning that the adaptation of
employment to the target is more effective at first. The introduction of the element was not a negligible issue.
We shall use:
+c_prle(4)*(t-1992)*(t<1992)
+c_le(4)*((t=1968.5)-(t=1968))+ec_le
genr ec_le=resid
Note: the reason for the initial βdropβ statement is to avoid the duplication of the elements inside the group, in case
the procedure is repeated. If the elements are not present, nothing happens.
Estimating exports will be simpler from the theoretical side. We shall use it as an example for introducing first
autoregressive processes, then cointegration.
Let us first start with the simplest idea: exports show a constant elasticity to world demand. In other words:
(ΔX/X) / (ΔWD/WD) = a
or by integration:
log(X) = a · log(WD) + b
Estimation should give a value close to unity, as world demand is measured as the normal demand addressed to France by its clients, taking into account the geographical and product structure of French exports.
Dependent Variable: LOG(X)
Method: Least Squares
Date: 03/22/20 Time: 17:39
Sample (adjusted): 1974Q1 2004Q4
Included observations: 124 after adjustments
However the low value of the Durbin-Watson test indicates a strongly positive autocorrelation of residuals, and
invalidates the formulation. The graph shows indeed long periods with a residual of the same sign, even though the
two variables quite often share common evolutions.
e(t) = ρ · e(t-1) + u(t)
where ρ should be significant (positive here), and u(t) independent across time.
log(X(t)) = a · log(WD(t)) + b + e(t)
log(X(t-1)) = a · log(WD(t-1)) + b + e(t-1)
We can multiply the second equation by ρ, and subtract it from the first:
log(X(t)) - ρ · log(X(t-1)) = a · (log(WD(t)) - ρ · log(WD(t-1))) + b · (1 - ρ) + u(t)
To estimate the above formula, it is not necessary to establish the full equation (which calls for a full non-OLS
specification, as it is not linear in the coefficients).
One can very well use the same presentation as for ordinary least squares, introducing in the estimation window the
additional term AR(n), n representing the autocorrelation lag, in our case 1:
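For instance, with the list presentation (the equation name eq_x1 is an assumption):
equation eq_x1.ls log(x) c log(wd) ar(1)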
Dependent Variable: LOG(X)
Method: ARMA Conditional Least Squares (Marquardt - EViews legacy)
Date: 03/22/20 Time: 17:50
Sample (adjusted): 1974Q2 2004Q4
Included observations: 123 after adjustments
Convergence achieved after 1 iteration
LOG(X)=C_X(1)*LOG(WD)+C_X(2)+[AR(1)=C_X(3)]
The results are rather satisfactory: the first coefficient retains the theoretical value, the new coefficient is significant,
the global precision is much improved (see also the graph) and the DW test is closer to satisfactory.
However our formulation is a little too simplistic. We want exports to decrease with the rate of use of capacities, representing the fact that if firms are already selling most of their potential production, they will tend to be less dynamic in their search for foreign markets (more on this later).
We get:
Dependent Variable: LOG(X)
Method: ARMA Conditional Least Squares (Marquardt - EViews legacy)
Date: 03/22/20 Time: 18:00
Sample: 1978Q1 2004Q4
Included observations: 108
Convergence achieved after 13 iterations
LOG(X)=C_X(1)*LOG(WD)+C_X(2)*LOG(UR)+C_X(3)+[AR(1)=C_X(4)]
The relevant coefficients are significant, the average error is lower69 (1.5%), and the Durbin-Watson test is acceptable, but there is a problem: the sign for the new element is wrong (and unsurprisingly, the coefficient for world demand is now much too low).
Let us not despair. If this old-fashioned tool did not work, let us try a more up-to-date one: cointegration.
Just as for stationarity, we will not develop the theory behind the method, leaving it to actual econometricians, an excellent source of information being actually the EViews manual. We will also rely on basic cointegration theory, and not on any of the recent developments.
Let us just say that cointegration is actually a simple extension of stationarity to a set of two or more variables. To
establish cointegration between two elements, one has to prove that in the long run these elements move together,
maintaining a bounded βdistanceβ (or rather that a linear combination is bounded), while the value of each of the two
elements is unbounded (a necessary condition).
For a group of more than two elements to be cointegrated, no subset of this group must already have this property (no single element may be stationary, and no subset may be cointegrated).
If we want to go beyond intuition, the reason for the last condition is that if a cointegrating relation is evidenced
between elements, some of which are already cointegrated, one can always recompose the encompassing equation
into the true cointegrating equation (considered as a new stationary variable) and other variables.
69
As the logarithm measures relative evolutions, an absolute error on a logarithm is equivalent to a relative error on
the variable itself.
For instance, if
a · x + b · y + c · z
is stationary, and
a · x + b' · y
is too (we can use the same a, as a cointegrating equation is known up to a given factor), then
a · x + b · y + c · z
is equivalent to:
(a · x + b' · y) + (b - b') · y + c · z
three new elements, one of which is stationary, which forbids us to test cointegration on the three.
So the two properties must be checked: moving together means both βmovingβ and βtogetherβ.
Using images rather related to stationarity (as they apply to the actual difference of two elements, without weighting
coefficients) we can illustrate the concept as
β’ Astral bodies moving in outer space and linked together by gravity. Their distance is bounded but their
position relative to each other is unknown within those bounds, and we do not know if one is leading the
other.
β’ Human beings: if they are always close to each other, one can decide that they are related (love, hate, professional relationship). But only if they move: if they are in jail, a small distance means nothing.
In our example, the first idea could be to test cointegration between X, WD and UR. But to ensure the stability of our
long-term simulations, we need exports to have a unitary elasticity to WD. If this is not the case, when X reaches a
constant growth rate, it will be different from that of WD: either France will become the only exporter in the world (in
relative terms) or the role of France in the world market will become infinitely negligible. Both prospects are
unacceptable (the first more than the second, admittedly).
This constraint can be enforced very easily by considering only in the long run (cointegrating) equation the ratio of X
to WD, which we shall link to the rate of use. We will test cointegration between these two elements.
Let us first test their stationarity. We know how to do it (from the estimation of employment).
UROOT(1,p) Log(X/WD)
UROOT(1,p) Log(UR)
Note: if we use menus, we should first display the group (either by selecting elements in the workfile window or
creating a group). The default display mode is βspreadsheetβ but the βViewβ item proposes other modes, among them
βUnit root testβ and βcointegration testβ (if more than one series is displayed at the same time).
Null Hypothesis: LOG(X/WD) has a unit root
Exogenous: Constant, Linear Trend
Lag Length: 1 (Automatic - based on SIC, maxlag=12)
t-Statistic Prob.*
Null Hypothesis: LOG(UR) has a unit root
Exogenous: Constant, Linear Trend
Lag Length: 10 (Automatic - based on SIC, maxlag=12)
t-Statistic Prob.*
These first tests show that both UR and the ratio of exports to world demand cannot be considered stationary, even
around a trend: the t-statistic is too low, and the estimated probability for the coefficient to be zero is too high 70.71
70
Not extremely high, however.
Let us now see if the two elements are cointegrated, using the Johansen test.
a No deterministic trend in the data, and no intercept or trend in the cointegrating equation.
b No deterministic trend in the data, and an intercept but no trend in the cointegrating equation.
c Linear trend in the data, and an intercept but no trend in the cointegrating equation.
d Linear trend in the data, and both an intercept and a trend in the cointegrating equation.
e Quadratic trend in the data, and both an intercept and a trend in the cointegrating equation.
In our case, we shall use option d (trend in the cointegrating equation, no trend in the VAR)
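In a program, the test could be run as follows (the group name is an assumption; the letter d refers to the option above, and 4 is the number of lags in first differences, as shown in the output below):
group g_x_ur log(x/wd) log(ur)
g_x_ur.coint(d,4)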
71
We can observe that in a traditional least squares estimation, the same T value would give the opposite diagnosis.
Date: 03/22/20 Time: 19:09
Sample (adjusted): 1979Q1 2004Q4
Included observations: 104 after adjustments
Trend assumption: Linear deterministic trend (restricted)
Series: LOG(X/WD) LOG(UR)
Lags interval (in first differences): 1 to 4
EViews tests first if there is no cointegration. If this is accepted (if it shows a high enough probability), the process
stops. But here this is refused, as the probability (that there is no cointegration) is too low.
In this case, there is at least one cointegrating equation, and EViews proceeds to testing if there is more than one. This
cannot be refused here, as this time the probability is too high.
If the second assumption (at most 1 relation) had not been rejected, there would be at least two and we would have
to continue (there cannot be more relations than variables, however).
Evidencing more than one relation is problematic, maybe worse for the model builder than finding no relation (in
which case we can always proceed with adding new elements). Even if a cointegrating equation has no implications on
causality between elements, we generally intend to include it in a single dynamic formula (a VAR), which does explain
a given variable. With two equations, we are stuck with a parasite one, which will be difficult if not impossible to
manage in the context of the model (if we stop at econometrics, the problem is smaller).
We can also observe whether the existence (or the rejection) of at least one relation is barely or strongly accepted (and likewise for the existence of only one).
The equation introduces a tradeoff between several concepts (here the share of French exports in world demand and
their rate of use). We always have an idea of the sign of the relationship, and also of an interval of economic validity.
There is no guarantee that the value will follow these constraints. It can even happen that the right sign obtained by
Ordinary Least Squares will become wrong when cointegration is tested on the same relation.
First, the sign is right: exports go down when the rate of use goes up (the sign is positive but both elements are on the
same side of the formula).
The size of the coefficient is more difficult to judge. The derivative of the equation relative to Q gives:
Let us suppose an increase in Q of 1 billion Euros coming only from local demand FD. In 2004, the share of exports in
French GDP was 32%. The exports target will decrease by
ΔX = 0.665 * 0.32 = 0.213 billion, i.e. 213 million Euros.
Of course:
β’ In the long run, capacities will build up, UR will get back to its base value (we know that from the investment
equation) and the loss will disappear.
β’ The changes in Q can come also from X, introducing a loop. This means UR might not be the best
representative element. Perhaps we should restrict UR to the satisfaction of local demand (but this looks
difficult to formulate).
β’ This requires using the same variables in both forms, in other words extending the unitary elasticity
assumption to the dynamic equation, which is neither needed statistically nor realistic from an economic
point of view (as we have seen when estimating employment).
β’ The output does not provide information on the quality of the cointegration, an essential element in our
process.
But the good point is that by estimating a VAR using the same elements as the tested cointegration, we get the same
cointegrating coefficients, and this time they are stored in a vector! We can then specify the cointegrating equation
using these elements.
This has the extremely high advantage of allowing us to establish a program which will adapt automatically to any change
in the data, an essential element in all operational modelling projects.
We shall use the following statements to create and estimate the VAR, and to store the cointegrating coefficients in a vector, by accessing the first line of the matrix _var_x.b (not displayed in the workfile window):
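The creation statement itself is not shown above; a plausible form, assuming EViews' standard vector error correction syntax (same deterministic option d as in the cointegration test, one cointegrating equation, lagged differences from 1 to 1), would be:
' hypothetical reconstruction of the VAR creation statement
var _var_x.ec(d,1) 1 1 log(x/wd) log(ur)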
vector(10) p_x
p_x(1)=_var_x.b(1,1)
p_x(2)=_var_x.b(1,2)
p_x(3)=_var_x.b(1,3)
0= p_x(1)*log(x/wd)+p_x(2)*log(ur)+p_x(3)*@trend(60S1)
Actually the first parameter is not really needed, as it is equal to one by construction. We think using it makes the equation clearer.
Estimating the dynamic equation calls for the computation of the residual in the cointegrating equation:
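A sketch of that computation, simply mirroring the cointegrating equation written above with the stored p_x coefficients:
genr res_x=p_x(1)*log(x/wd)+p_x(2)*log(ur)+p_x(3)*@trend(60S1)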
Then we estimate the VAR, releasing the constraint on the unitary elasticity of X to WD. In principle, the coefficient
should be positive for WD, negative for UR, and we can introduce a lag structure for both elements.
Unfortunately, the coefficient for UR is not significant here, maybe because of a delay in its influence.
72
The figures 1 and 1 indicate the scope of lagged variations of the left-hand side variable in the VAR which will be
added to the right-hand side. Here it will be 1 to 1 (or 1 with lag 1). The options used are consistent with the ones we
have used in coint (which have been determined automatically by EViews).
Dependent Variable: DLOG(X)
Method: Least Squares (Gauss-Newton / Marquardt steps)
Date: 03/22/20 Time: 19:18
Sample (adjusted): 1978Q3 2004Q4
Included observations: 106 after adjustments
DLOG(X)=C_X(4)*DLOG(WD)+C_X(6)*RES_X(-1)+C_X(7)
In a single country model, the rest of the world is exogenous, and imports and exports have to be estimated
separately, following of course the same guidelines:
β’ Imports will depend on demand and capacity utilization, through constant elasticities.
However, the definition of demand is not straightforward. For exports, we just had to consider global imports from
each partner country in each product, and compute an average using a double weighting: the role of these partners in
French exports, and the structure of products exported by France.
This was possible because we considered the rest of the world as exogenous, and did not try to track the origin of its
imports.
β’ We can consider final demand and exports. Obviously, they do not have the same impact on imports (due to
the absence of re-exports). We can generate a global demand by applying to exports a correcting factor.
Under rather standard assumptions (same import share in all uses, unitary ratio of intermediate consumption
to value added), this factor can be set at 0.5.
β’ We can also define intermediate consumption, and add it to final demand to get the total demand of the
country, a share of which will be imported. This method is obviously more acceptable from the economic
point of view. Unfortunately it relies on the computation of intermediate consumption, a variable less
accurately measured, and sensitive to categories and the integration of the productive process73.
We have chosen this last method nevertheless, favoring economic properties over statistical reliability.
73
For instance, if good A (say cotton) is used to produce good B (unprinted fabric) which gives good C (printed fabric), both A and B will be counted as intermediary consumption. If the fabric is printed at the same time it is produced, only A will be counted. If we consider value added, the total amount will not change, just the number of elements.
Dependent Variable: LOG(M)
Method: ARMA Maximum Likelihood (BFGS)
Date: 04/25/20 Time: 12:18
Sample: 1962Q1 2004Q4
Included observations: 172
Convergence achieved after 8 iterations
Coefficient covariance computed using outer product of gradients
The fit is quite good including the sensitivity to fluctuations (which was the least we could expect), and the
autocorrelation eliminated. But we face a problem: the high value of the coefficient.
Now the question is: should the growth of demand be the only explanation for the growth of imports? In other words,
is there not an autonomous force increasing the weight of foreign trade, independently from growth itself? Or: if
demand did not grow for a given period, would imports stay stable, or keep part of their momentum? The associated
formula would present:
Dependent Variable: LOG(M)
Method: Least Squares
Date: 04/25/20 Time: 12:32
Sample (adjusted): 1962Q1 2004Q4
Included observations: 172 after adjustments
This does not work. Either we get autocorrelation, or a negative (but not significant) trend.
The problem with our formulation is actually very clear: in terms of model properties, it is reasonable to suppose that
in the short run, an increase in final demand will increase imports beyond their normal share, by generating local
bottlenecks on domestic supply. But the explanatory element should be the rate of use of capacities. As it is fixed in
the long run, the share should go back to normal with time.
A rate of use of 85% (a normal value over the whole economy) does not mean that all firms work at 85% of capacity, in which case they could easily move to 86% if needed. It means that the rates of use follow a given distribution, some lower than 85%, some higher, some at 99%, and a finite share at 100% (see graph).
[Graph: "Demand and the rate of use" - probability distribution of the rate of use (in 10000ths), baseline versus an increase in demand]
An increase in demand will move the curve to the right, and more firms will face the limit: an increase of 1% will meet
it halfway for firms starting from a 99.5% rate of use. The additional demand, if clients do not accept local substitutes,
will have to be supplied by imports.
However, local firms will react to this situation, and try to gain back lost market shares by increasing their capacities
through investment: this is the mechanism we have described earlier. In our small model, the long-term rate of use is
fixed: the sharing of the additional demand will come back to the base values. These values can increase with time
due to the expansion of world trade.
Our formula will make imports depend on total demand and the rate of use:
Where
IC = tc · Q
TD = FD + IC
And tc is the quantity of intermediary consumption units required to produce one unit of GDP.
By integration, we get
Dependent Variable: LOG(M)
Method: ARMA Maximum Likelihood (BFGS)
Date: 04/25/20 Time: 12:45
Sample: 1963Q2 2004Q4
Included observations: 167
Convergence achieved after 9 iterations
Coefficient covariance computed using outer product of gradients
No formulation is acceptable on all counts, i.e. with no autocorrelation, a significant contribution of the rate of use, and a significant positive trend (to say nothing of the demand coefficient).
Actually, we are facing a usual problem: the two explanations (deviations of demand from its trend, and the output gap)
are strongly correlated.
One idea can be to force the first coefficient to unity, and estimate the ratio of imports to demand.
Dependent Variable: LOG(M/TD)
Method: ARMA Maximum Likelihood (BFGS)
Date: 04/25/20 Time: 13:07
Sample: 1963Q2 2004Q4
Included observations: 167
Convergence achieved after 7 iterations
Coefficient covariance computed using outer product of gradients
This works quite well on all counts, and ensures that the growth of imports and demand will converge in the long run,
once the trend has been suppressed, and the rate of use has stabilized through the investment equation.
We could stop here. But as in the exports case, we can try to separate the behavior into short-term and long-term, in
other words to apply an error correction framework. To represent this process, we need a formula which:
β’ Enforces a long-term unitary elasticity of imports to the total demand variable, with a positive additional
effect of the rate of use.
β’ Allows free elasticities in the short run.
We shall start by testing the cointegration between the share of imports in demand: M/TD and the rate of use.
Before, we test the stationarity of M/TD, or rather its logarithm (UR has already been tested):
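Mirroring the statements used for exports, the test can be applied directly to the expression:
uroot(1,p) log(m/td)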
It is strongly contradicted by the Dickey Fuller test:
t-Statistic Prob.*
It fails!
Date: 03/22/20 Time: 20:12
Sample (adjusted): 1979Q1 2010Q4
Included observations: 128 after adjustments
Trend assumption: Linear deterministic trend (restricted)
Series: LOG(M/TD) LOG(UR)
Lags interval (in first differences): 1 to 4
Let us not despair. Diagnosing an absence of cointegration is not such bad news74, as it allows us to proceed further. If a
set of two variables does not work, why not a set of three?
Now which additional element could we consider? The natural candidate comes both from theory and from the data:
74
Identifying two equations would be much worse, especially in a modelling framework.
If demand is present but local producers have no capacity problems, how can foreign exporters penetrate a market?
Of course, through price competitiveness, in other words by decreasing the import price compared to the local one.
This observation is confirmed by the data. Let us regress the import-demand ratio over the rate of use, consider the
residual (the unexplained part) and compare it to the ratio of import to local prices: we observe a clearly negative
relation.
[Graph, 1978-2002: residual of log(m/td) over log(ur), compared with the log of import price competitiveness]
Having also checked:
β’ the non-stationarity of Log(COMPM),
β’ the non-cointegration of Log(COMPM) with Log(UR) and with log(M/TD) taken individually,
we can test the cointegration of the three elements:
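In program form (the group name is an assumption; 3 is the number of lags in first differences and d the trend option, as shown in the output below):
group g_m3 log(m/(fd+ct*q)) log(ur) log(compm)
g_m3.coint(d,3)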
It works!!
Date: 03/22/20 Time: 20:17
Sample (adjusted): 1979Q3 2004Q4
Included observations: 102 after adjustments
Trend assumption: Linear deterministic trend (restricted)
Series: LOG(M/(FD+CT*Q)) LOG(UR) LOG(COMPM)
Lags interval (in first differences): 1 to 3
β’ An apparently high sensitivity of imports to the rate of use (but remember the investment equation will
stabilize it in the end).
However the true effect is not so high. If the equation was applied to the short-term, with a share of imports in total
demand of 0.15 (the value for 2004), we would get:
Dependent Variable: DLOG(M)
Method: Least Squares (Gauss-Newton / Marquardt steps)
Date: 03/22/20 Time: 20:26
Sample (adjusted): 1978Q4 2004Q4
Included observations: 105 after adjustments
DLOG(M)=C_M(5)*DLOG(FD+CT*Q)+C_M(6)*DLOG(UR)+C_M(7)
*DLOG(COMPM)+C_M(8)*RES_M(-1)+C_M(9)+EC_M
The results are rather acceptable, including the graph. By the way, it shows that rejecting an equation on the basis of a low R-squared is not justified when the dependent element shows a high variability.
The method we are using for storing equations has an additional advantage. Now that the residuals have been
introduced with their estimated values, all the equations should hold true. The checking process can now be extended
to all the endogenous variables.
Theoretically the estimated equations should be consistent with the data, as merging with the model the actual
estimated equations ensures the consistency. However:
β’ If the package does not provide this direct storing, or if the equation had to be normalized by the modeler,
editing the formulation could introduce errors.
β’ The storing of coefficients may have been done badly.
β’ The text, series or coefficients may have been modified by the user after estimation.
β’ One could have accessed other series or coefficients than the ones used by the estimation (for example one
can seek them in another bank including series of similar names).
The reasons for a non-zero residual are less numerous than for identities. They can come only from the fact that
equation elements have changed since the last estimation.
Obviously, the main suspect is the data. New data series are bound to be inconsistent with the previous estimation,
whether it has been updated (moving to a more precise version) or corrected (suppressing an error).
Actually, in EViews, applying a new version of an equation to a model requires, in addition to its estimation, to actually
merge it again into the model. This will create a new compiled version, without need to explicitly update the model.
Anyway, in our opinion, applying a new estimation should call for a full model re-creation. This is the only way to
guarantee a clear and secure update.
For our model, the statements for the residual check will be the following:
solve(d=f) _fra_1
for !i=1 to g_vendo.@count
%2=g_vendo.@seriesname(!i)
genr dc_{%2}={%2}-{%2}_c
genr pc_{%2}=100*dc_{%2}/({%2}+({%2}=0))
next
The solution series will have the suffix "_c", and the residuals the prefix "dc_" for the errors in levels, and "pc_" for the relative errors.
[Diagram: architecture of the small model. It links GDP, productive capacity and the rate of use, investment and capital (with the scrapping rate), the change in inventories, firms' employment, civil servants and total employment, household revenue (real wage rate, share of GDP), consumption and the savings rate, housing investment, final demand, government demand, exports (driven by world demand) and imports (driven by price competitiveness).]
[2] ur = q / cap
[3] q + m = fd + x
[5] ic = ct * q
[6] log(prle_t) = 9.8398+ 0.009382* (t - 2002) + 0.02278* (t - t1) * (t<t1) + 0.01540* (t - t2) * (t<t2)
[9] lt = le + lg
[13] ci/q( - 1) = -0.02680*(T = 1968.25) + 0.6128*ci( - 1)/q( - 2) - 0.0021490 + 0.2193*@PCH(q) + 0.1056*@PCH(q( - 1))
+ 0.05492*@PCH(q( - 2)) + 0.03918*@PCH(q( - 3)) + 0.03026*@PCH(q( - 4)) + ec_ci
[14] fd = co + i + gd + ci + ih
[15] td = fd + ic
[16] res_m = log(m / (fd + ct * q)) - 1.517 * log(ur) - 0.552 * log(compm) + 0.00579 * (@trend(60:1) * (t<=2004) + @elem(@trend(60:1) , "2004q4") * (t>2004))
[18] res_x = p_x(1) * log(x / wd) + 0.9815 * log(ur) + 0.001441 * (@trend(60:1) * (t<=2004) + @elem(@trend(60:1) ,
"2004q4") * (t>2004))
7 CHAPTER 7: TESTING THE MODEL THROUGH SIMULATIONS OVER THE PAST
Even if every equation has been accepted individually, assembling them into a model can reveal new problems:
β’ The consistency of the data might hide errors compensating each other.
β’ Some complex equations might have hidden wrong individual properties.
β’ Connections between concepts might have been forgotten or wrongly specified.
β’ The growth rates provided naturally by equations could be the wrong ones.
β’ Some error correction processes could be diverging instead of converging.
β’ Assembling individually acceptable formulations might create a system with unacceptable properties.
β’ Exogenous elements might be linked with each other.
β’ Theoretical balances might not have been specified.
And finally:
Another modeler would have obtained a different model, with potentially different properties.
Let us just give an example for a problem: if in the short run increasing government demand by 1000 creates 800
consumption and 600 investment, while exports do not change and imports increase by 300, the individual equations
might look acceptable, but the model will diverge immediately through an explosive multiplier effect (800 + 600 - 300 = 1100).
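A minimal way to see the divergence: with these figures, each unit of additional demand generates 1.1 units of demand at the next round, so the cumulated effect behaves like the geometric sum 1 + 1.1 + 1.1^2 + ..., which grows without bound instead of converging to a finite multiplier.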
β’ Is there some indication that the model is unsuitable for forecasts, and for policy analysis?
In our opinion, it is only later, by simulations over the future (its normal field of operation, actually) that we can really
validate the use of a model. But as usual, problems should be diagnosed as soon as possible. And the availability of
actual data increases strongly the testing possibilities.
75
In some cases we might have been obliged to calibrate the values.
Finally, the errors evidenced at this stage might help to build a better forecasting structure, before any attempt on the
future.
To solve the model we need to apply a method (an βalgorithmβ). Let us present the different options.
7.1.1 GAUSS-SEIDEL
This is the most natural algorithm: one often uses Gauss-Seidel without knowing it, like M. Jourdain (the Bourgeois
Gentilhomme) makes prose.
The method starts from initial values. They can be the historical values on the past, on the future the values computed
for the previous period or for an alternate base simulation. The whole set of equations will be applied in a given order,
using as input the most recent information (computations replace the initial values). This gives a new set of starting
values. The process is repeated, using always the last information available, until the distance between the two last
solutions is small enough to be considered negligible. One will then consider that the solution has been reached.
As only present values will change during computation, we will not consider the other elements, and will drop the time index:
y = f(y)
We will use a subscript i to identify the particular endogenous variable, and an exponent to denote the iteration count.
a - We start from y0, value before any computation, setting the number of iterations to zero.
b - We add 1 to the number of iterations (which we shall note k); this gives to the first iteration the number 1.
c - We compute y_i^k from i = 1 to n, taking into account the i-1 values we have just produced. This means we compute:
y_i^k = f_i(y_1^k , ..., y_(i-1)^k , y_i^(k-1) , ..., y_n^(k-1))
(at the first iteration, explanatory elements will take the starting value y0 if their index is higher than that of the computed variable)76.
d - At the end of the process, we compare y^k and y^(k-1): if the distance is small enough for every element (using a
criterion we shall present) we stop the process, and use the last value as a solution. If not, we check if we have
reached the maximum number of iterations, in which case we accept the failure of the algorithm, and stop. Otherwise
we resume the process at step b.
Clearly, this algorithm requires an identified model, with a single variable on the left-hand side (or an expression
containing a single simultaneous variable).
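Purely as an illustration (this is not how one solves an actual EViews model, which is left to the solve command), a toy program applying Gauss-Seidel to a hypothetical two-equation system could read:
' hypothetical system: y1 = 0.5*y2 + 10 ; y2 = 0.3*y1 + 5
!y1=0
!y2=0
!crit=0.000001
for !iter=1 to 100
  !y1_old=!y1
  !y2_old=!y2
  !y1=0.5*!y2+10
  ' the computation of y2 already uses the y1 just obtained: the Gauss-Seidel principle
  !y2=0.3*!y1+5
  if @abs(!y1-!y1_old)<!crit and @abs(!y2-!y2_old)<!crit then
    exitloop
  endif
next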
7.1.2 RITZ-JORDAN
The Ritz-Jordan method is similar to the one above: it simply abstains from using values computed at the current iteration:
y^k = f(y^(k-1))
Refusing to consider the last information, it looks less efficient than Gauss-Seidel. In our opinion, its only interest
appears when the model faces convergence problems: it makes their interpretation easier by reducing the
interferences between variables.
Contrary to the two above, the Newton method applies naturally to non-identified formulations. It represents
actually a generalization to an n-dimensional case of the well-known method using a sequence of linearizations to
solve a single equation.
76
This means only variables which are used before they are computed must be given values for initialization. We shall
come back to this later.
that we will simplify as above into:
f(y) = 0
The linearization of f around a starting solution y0 gives, by calling "fl" the value of f linearized:
fl(y) = f(y0) + (∂f/∂y)(y=y0) · (y - y0)
and solving fl(y) = 0 gives:
y = y0 - [(∂f/∂y)(y=y0)]^(-1) · f(y0)
Applied to a model written in the identified form
y - f(y) = 0
this becomes:
y = y0 - [I - (∂f/∂y)(y=y0)]^(-1) · (y0 - f(y0))
[Figure: The Newton method (one equation) - successive linearizations at y0, y1, y2, with the corresponding values f(y0), f(y1), f(y2)]
Linearizing the model again, around the new solution y 1, and solving the new linearized version of the model, we
define an iterative process which, as the preceding, will stop when the distance between the two last values gets small
enough. Implementing this method is more complex: in addition to inverting a matrix, each iteration involves the
computation of a Jacobian. This can be done in practice in two ways:
β’ Analytically, by determining from the start the formal expressions of derivatives. At each iteration, we shall
compute them again from the present values of variables. This method supposes either undertaking the
derivation "by hand" with a high probability of errors, or having access to an automatic formal processor, a
program analyzing the text of equations to produce the formal expression of their derivatives. To a high initial
cost, this method opposes a simplification of computations during the iterative process 77.
β’ By finite differences, determining separately each column of the Jacobian by the numerical computation of a
limited first order development, applied in turn to each variable. One computes the y vector using the base
values, then for starting values differing only by a single variable, and compares the two results to get a
column of the Jacobian. One will speak then of a method of secants, or pseudo-Newton.
77
However, changing some model specifications calls for a new global derivation (or a dangerous manual updating).
78
Unfortunately, the associated code is not apparently available to the user, which would allow interesting
computations.
Σ(j=1 to n) { [ f_i(y^k + e_j · Δy_j) - f_i(y^k) ] / Δy_j } · (y_j - y_j^k) = fl_i(y) - f_i(y^k)
One will have only to compute the y vector n+1 times: one time with no modification and one time for each of the
endogenous variables.
The expensive part of this algorithm being clearly the computation of the Jacobian and its inversion, a variant will
consist in computing it only every m iterations. The convergence will be slower in terms of number of iterations, but
the global cost might decrease.
EViews provides another alternative: Broyden's method, which uses a secant approach and does not require computing the Jacobian at each step. As we shall see later, this method proves often very efficient.
y = f(y)
y - f(y) = 0
y = y0 - [I - (∂f/∂y)(y=y0)]^(-1) · (y0 - f(y0))
or
y = [I - (∂f/∂y)(y=y0)]^(-1) · (f(y0) - (∂f/∂y)(y=y0) · y0)
Broyden's method (also called the secant method) computes the Jacobian only once, in the same way as Newton's, and computes a new value of the variable accordingly.
After that, it updates the Jacobian, not by derivation, but by considering the difference with the previous solution, and the direction leading from the previous solution to the new one:
J(k+1) = J(k) + [ (F(x(k+1)) - F(x(k)) - J(k) · Δx(k)) · Δx(k)' ] / ( Δx(k)' · Δx(k) )
where J is the Jacobian, F the function which should reach zero, x the vector of unknown variables, and Δx(k) = x(k+1) - x(k) the last step.
Let us clarify all this with a graph based on the single equation case.
We can see that the direction improves with each iteration, less than Newton but more than Gauss-Seidel (for
which it does not improve at all).
Otherwise the method shares all the characteristics of Newton's, in particular its independence from equation
ordering. It takes generally more iterations, but each of them is cheaper (except for the first).
We shall see on a set of practical examples that on average it looks like the most efficient option on the whole,
both in terms of speed and probability of convergence 79. But the diagnosis is not so clear cut.
Methods described above have a common feature: starting from initial values, they apply formulations to get a new
set. The process is repeated until the two last sets are sufficiently close to be considered as the solution of the system.
One cannot identify the difference between two iterations with the precision actually reached (or the difference to
the solution). This is valid only for alternate processes. For monotonous ones, it actually can be the reverse: the slower
the convergence, the smaller the change in the criterion from one iteration to the other, and the higher the chance
that the criterion will be reached quite far from the solution. As to cyclical processes, they can reach convergence
mistakenly at the top or bottom of a cycle.
79
The most important feature in our opinion.
[Graph: convergence of the values across iterations]
So one could question the use of this type of method, by stressing that the relative stability of values does not mean
that the solution has been found. However, one can observe that if the values do not change, it means that the
computation which gave a variable would give the same result with the new values of its explanatory variables: it
means also that the equation holds almost true (though very different values might share this property).
In this case, it is clear that we do not get the exact solution. This criticism should not be stretched too much: the
precision of models is in any event limited, and even supposedly exact algorithms are limited by the precision of
computers.
For the algorithm to know at which moment to stop computations, we shall have to establish a test.
In fact, the only criterion used in practice will consider the variation of the whole set of variables in the solution, from
an iteration to the other.
in relative values:
d_i = | (y_i^k - y_i^(k-1)) / y_i^(k-1) |
or in levels:
d_i = | y_i^k - y_i^(k-1) |
The test is applied variable by variable: d_i < c_i , for every i
Generally one will choose a criterion in relative value, each error being compared with a global criterion. This value
will have to be small compared to the expected model precision (so that the simulation error will not really contribute
to the global error), and to the number of digits used for results interpretation.
The most frequent exception should correspond to variables which, like the trade balance, fluctuate strongly and can even change sign: here the choice of a criterion in level seems a natural solution, which will avoid a non-convergence diagnosis due to negligible fluctuations of a variable around a solution which happens (by pure chance) to be very small.
For example, if the convergence threshold is 0.0001 in relative value, convergence will be refused if solutions for the
US trade balance alternate by chance between - 1 billion current US Dollars and - 1.0002 billion80, while a difference of
200 000 Dollars, at the scale of US foreign trade, is obviously very small. And this difference, which represents less
than one millionth of US exports and imports, might never be reduced if the computer precision guarantees only 8
significant figures81.
In practice we shall see that the test could be restricted to a subset of variables in the model, the convergence of
which extends mathematically to global convergence.
o In case of Gauss-Seidel, each additional digit bears roughly the same cost. The convergence is qualified as
linear.
o In case of Newton, the number of digits gained increases with the iterations: beyond the minimum level (say
0.01%) a given gain is cheaper and cheaper (this will be developed later). The convergence is called quadratic
80
There is no risk for this in present times.
81
Exports and imports will be precise to the 8th digit, but the difference, a million times smaller, to the 2nd only.
β’ On the type of simulation:
o For a forecast, one will not be too strict, as we all know the precision is quite low anyway. Forecasting growth
of 2.05% and 2.07% three years from now delivers the same message, especially as the actual growth might
materialize as 1% (more on forecast precision later).
o For a shock analysis, especially if the shock is small, the evaluation of the difference between the two
simulations is obviously more affected by the error: decisions increasing GDP by 0.07% and 0.09% will not
be given the same efficiency.
In a stochastic simulation, it is essential that the consequences for the solution of introducing small random residuals can be attributed precisely to these shocks, and not to the simulation process.
As to the number of iterations, it will be used as a limit, after which we suppose that the model has no chance to
converge. In practice one never considers stopping an apparently converging process, just because it has taken too
much time. So the only relevant case is when the process is not progressing, because it is oscillating between two or
more solutions, and the deadlock has to be broken. Reverting to the use of damping factors (described later) should
solve the problem in the Gauss-Seidel case.
Testing convergence under EViews is not very flexible: the only option allowed is the level of the (relative)
convergence criterion, and it will apply to all variables.
One can also decide on the maximum number of iterations. For most models, after 1000 iterations, convergence
becomes rather improbable. But just to make sure, one can set an initial high number. Observing the actual number
required can allow to improve the figure.
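For instance (the option letters are assumed to follow the standard solve syntax, and the values are only illustrative):
' o= algorithm (g Gauss-Seidel, n Newton, b Broyden), c= convergence criterion,
' m= maximum number of iterations, d=d dynamic deterministic simulation
solve(o=b, c=1e-6, m=1000, d=d) _fra_1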
We are now going to show how the choice of the algorithm affects the convergence process. For this, let us define the incidence matrix A of the model, such that:
β’ Ai,j = 1 if the variable yj appears formally, through its unlagged value, in the equation of rank i.
β’ Ai,j = 0 otherwise.
We will suppose the model to be normalized, therefore put under the form:
π¦ β π(π¦) = 0
where the variable yi will appear naturally to the left of the equation of rank i: the main diagonal of the matrix will be
composed of 1s.
The definition of the incidence matrix, as one can see, depends not only on the model, but also on the ordering of
equations, actually the one in which they are going to be computed.
The formal presence of a variable in an equation does not necessarily mean a numerical influence: it could be affected
by a potentially null coefficient, or intervene only in a branch of an alternative. Often we will not be able to associate
to a model a unique incidence matrix, nor a matrix constant with time, except if one considers the total set of
potential influences (the matrix will be then the Boolean sum of individual Boolean matrices).
One will also notice that defining the incidence matrix does not require the full formulations, or the knowledge of
variable values. We simply need to know the list of variables which appear in each explanation, as well as their
instantaneous or lagged character 82.
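As a purely illustrative example (not our model), consider a three-equation model in which, with the chosen ordering, y1 depends on y3, y2 on y1, and y3 on y2. Its incidence matrix is:
A = | 1 0 1 |
    | 1 1 0 |
    | 0 1 1 |
The diagonal is made of 1s (each variable appears in its own normalized equation), and the single element above the diagonal shows that y3 is used, in the first equation, before it is computed.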
To apply this technique to our model, we can rely on the block structure provided by EViews, through access to:
(double-click)>View>Block structure,
Number of equations: 20
Number of independent blocks: 3
Number of simultaneous blocks: 1
Number of recursive blocks: 2
Block 1: 3 Recursive Equations
cap(1) prle_t(6) x(19)
Block 2: 14 Simultaneous Equations (1 feedback var)
ic(5) ci(13) led(7) le(8)
td(15) m(17) q(3)
Block 3: 3 Recursive Equations
res_m(16) res_x(18) k(20)
82
Following our methodology, the incidence matrix can be produced before any estimation.
First, we can use the above separation to move the three predetermined variables to the beginning, and the three post-determined ones to the end, which gives the following matrix:
We can see that the model has been divided into three parts:
β’ A three equation block, with elements which do not depend on the complement, or on subsequent variables
in the same block. The variables in this can then be computed once and for all, in a single iteration, at the
beginning of the solving process. Actually they do not depend on any variable in the same block, but this is
not required.
This property is called recursiveness, and the block is usually named the prologue.
We can see that variables can belong to this block for various reasons:
o prle_t depends only on time. The only reason for introducing its equation is to allow easy modification in
forecasts.
o cap depends on an exogenous and a predetermined variable.
o x should depend on the rest of the equilibrium (through UR) but this link has not been evidenced
statistically, leaving only the instantaneous influence of the exogenous WD.
In practice, however, respecting the convergence threshold will need two iterations, the starting value being different
from the solution found, unless the recursivity is known from the start, and the first solution accepted without
control83.
83
Which is of course the case for EViews.
• A three-equation block, whose elements do not affect the rest of the model, and do not depend on subsequent variables in the same block. These variables can be computed after all the others, once and for all in one pass. Again, they do not depend on any variable in the same block, but this is not necessary. The only condition is that they do not depend on elements computed later (or, equivalently, that the corresponding part of the matrix is lower triangular).
o The residuals for the cointegration equations, which will only be corrected at the next period.
o The end-of-period capital, which obviously cannot affect the equilibrium for the period.
We shall see later another important category: variables with a purely descriptive role, like the government deficit in GDP points.
• The rest of the model is simultaneous, and sometimes called the heart. We can check on the model graph that for any given pair of variables in the set, there is at least one sequence of direct causal relationships leading from the first to the second, and vice versa. This also means that exogenizing any element (at a value different from the model solution, of course) will affect all the other elements (see the sketch below).
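This three-part decomposition can be obtained mechanically from the dependency graph, by grouping variables into strongly connected components and ordering the components. The following minimal sketch (in Python, assuming the networkx package, with illustrative dependency lists rather than our model's actual equations) shows the idea:

import networkx as nx

# For each variable, the unlagged endogenous variables its equation uses.
# Purely illustrative.
unlagged_deps = {
    "cap": [],               # depends only on exogenous / lagged variables
    "x":   [],
    "q":   ["fd", "m"],
    "fd":  ["co", "i"],
    "co":  ["q"],
    "i":   ["q"],
    "m":   ["fd"],
    "k":   ["i"],            # end-of-period capital: affects nothing this period
}

G = nx.DiGraph()
G.add_nodes_from(unlagged_deps)
for var, deps in unlagged_deps.items():
    G.add_edges_from((dep, var) for dep in deps)     # dep influences var

# A strongly connected component with several members (or a self-loop)
# is a simultaneous block; single, loop-free components form recursive blocks.
cond = nx.condensation(G)                            # the DAG of components
for node in nx.topological_sort(cond):
    members = sorted(cond.nodes[node]["members"])
    simultaneous = len(members) > 1 or G.has_edge(members[0], members[0])
    print("simultaneous" if simultaneous else "recursive", members)

On this stylized example, the output isolates cap and x at the beginning, a single simultaneous block in the middle, and k at the end, mirroring the prologue / heart / epilogue structure described above.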
We can now try to better interpret the simultaneity in the heart. The first stage is observing the presence of loop variables.
The incidence matrix allows us to define loop variables, as variables that enter an equation of rank lower than the one that defines them, or in other words that will be used before they are computed. In matrix notation: y_j is a loop variable if A_{i,j} = 1 for some i < j, that is, if column j of the strictly upper-triangular part of the incidence matrix contains a non-zero element.
The variables appearing as an explanatory factor in their own equation of definition will also have to be added to this set, but in practice this case is rather rare.
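As a minimal sketch, this test can be written directly on the incidence matrix (the tiny example below is illustrative, with q placed first and using FD and M, which are computed later):

import numpy as np

def loop_variables(A, order):
    # A loop variable has a 1 strictly above the main diagonal in its column:
    # it is used in an equation of lower rank than the one that defines it.
    upper = np.triu(A, k=1)
    return [name for j, name in enumerate(order) if upper[:, j].any()]

order = ["q", "fd", "m"]
A = np.array([[1, 1, 1],                     # q uses fd and m, computed later
              [0, 1, 0],
              [0, 0, 1]])
print(loop_variables(A, order))              # ['fd', 'm']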
Let us look at our incidence matrix. Two loop variables are present: FD and M. The reason is that they are used to
compute Q, in an equation which appears at the beginning (of the heart).
Actually X should also be present, but as UR appears only through its lagged value, and WD is exogenous, its exact
value can be computed immediately, which means it is located in the prologue. In a way it is now technically
exogenous (considering only same period relationships).
Of course, a model can contain a sequence of non-recursive blocks. This will happen, for instance, with two successive non-recursive blocks if elements of the second depend on elements of the first, but not vice versa. Between the two blocks, a recursive one can be introduced.
We shall see examples of this situation when we deal with more complex models.
The set of loop variables is interesting for the following reason: if it is empty, the model is recursive, which means that the sequential calculation of each of the equations (in the chosen order) gives the solution of the system. The values obtained at the end of the first iteration satisfy the full set of equations, none of them being put into question by a later modification of an explanatory element, and a second iteration will not modify the result.
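This one-pass property is easy to visualize on a stylized recursive system (a minimal sketch; the equations and coefficients are illustrative, not those of our model):

def solve_recursive(equations, exog):
    # equations: list of (name, function of the current solution dict),
    # given in an order where each right-hand side only uses variables
    # already computed (or exogenous ones), so one pass is enough.
    y = dict(exog)
    for name, f in equations:
        y[name] = f(y)
    return y

equations = [
    ("x",  lambda y: 0.2 * y["wd"]),
    ("m",  lambda y: 0.3 * y["g"]),
    ("fd", lambda y: y["g"] + y["x"]),
    ("q",  lambda y: y["fd"] - y["m"]),
]
print(solve_recursive(equations, {"wd": 1000.0, "g": 200.0}))

Running the loop a second time would leave every value unchanged, which is precisely the point.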
This favorable case is quite rare84. However, one can often identify simultaneous subsets (or « blocks ») with a recursive structure relative to each other, such that the first p equations of the model are not influenced by the last n − p (as we have shown on our example). The simulation process can then be improved, as it suffices to solve in sequence two systems of reduced size; this saves time, as the cost of solution grows more than proportionally with the number of equations. This property is obvious for Newton, where the costs of both Jacobian computation and inversion decrease, less so for Broyden, and even less for Gauss-Seidel, where the only proof comes from practice.
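A back-of-the-envelope illustration of this cost argument for Newton (a sketch, using the usual cubic cost of factorizing or inverting a dense Jacobian; the block sizes are arbitrary):

n, p = 14, 7
print(n**3, p**3 + (n - p)**3)     # 2744 versus 686: about four times cheaper

This only illustrates the order of magnitude; the actual gain depends on the algorithm, as the text notes for Broyden and Gauss-Seidel.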
Discovering the above properties and performing the associated reordering are clearly of interest to the model builder, as they improve the organization of the solution process and therefore reduce computation time. This process also helps detect logical errors, for example by revealing the recursive determination of an element known to belong to a loop (such as investment in the Keynesian loop). Most software packages, including EViews, take care of this search and the associated reorganization, but further improvement may be sought in the solving process by modifying by hand the order in which equations are computed.
The second task, reordering the equations inside the simultaneous block itself, is much less obvious and in any case more complex. One will generally seek to minimize the number of loop variables. The cost of this technique depends on the ambition: searching for one set of loop variables from which no element can be removed (Nepomiaschy and Ravelli) is cheaper than searching over all orderings for the smallest possible number of elements (Gilli and Rossier). The first type of set is called minimal, the second minimum. In fact, minimizing the number of loop variables might not be a good preparation for the use of the Gauss-Seidel algorithm, as we will see later.
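On a very small block, the minimum can be found by brute force over all orderings (a sketch with illustrative dependency lists; real models require the dedicated algorithms cited above):

from itertools import permutations

def count_loop_vars(order, deps):
    rank = {name: k for k, name in enumerate(order)}
    # a variable is a loop variable if some equation uses it before it is computed
    return sum(
        any(rank[name] > rank[user] for user, d in deps.items() if name in d)
        for name in order
    )

heart = {"q": ["fd", "m"], "fd": ["co", "i"], "co": ["q"],
         "i": ["q"], "m": ["fd"]}
best = min(permutations(heart), key=lambda o: count_loop_vars(o, heart))
print(best, count_loop_vars(best, heart))     # one loop variable, with q last

On this stylized block the minimum is one loop variable, obtained by placing the equation for q at the end, which is exactly the kind of reordering discussed below.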
EViews determines the block structure of the model automatically (which is de facto optimal, even if other organizations exist). As for reordering inside the simultaneous blocks, it does not apply an optimization algorithm: it determines the loop variables associated with a given ordering (actually the initial one) and places the corresponding equations at the end of the block. The efficiency of this last action is questionable, as it means that in a given iteration all computations use the previous value of the loop variables, delaying somewhat the impact of "new" information.
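The mechanics can be illustrated on a stylized simultaneous block (a minimal sketch; equations and coefficients are purely illustrative, not those of our model):

def gauss_seidel(equations, y, max_iter=100, tol=1e-8):
    # one pass per iteration, each equation using the most recent values
    for _ in range(max_iter):
        largest_change = 0.0
        for name, f in equations:
            new = f(y)
            largest_change = max(largest_change, abs(new - y[name]))
            y[name] = new                    # immediately available downstream
        if largest_change < tol:
            break
    return y

g = 200.0
equations = [
    ("co", lambda y: 0.6 * y["q"]),
    ("i",  lambda y: 0.2 * y["q"]),
    ("fd", lambda y: y["co"] + y["i"] + g),
    ("m",  lambda y: 0.3 * y["fd"]),
    ("q",  lambda y: y["fd"] - y["m"]),      # the loop variable, placed last
]
start = {name: 0.0 for name, _ in equations}
print(round(gauss_seidel(equations, start)["q"], 1))   # converges to about 318.2

With the equation for q placed at the end of the block, the equations for co and i only see the value of q from the previous iteration, which is exactly the delay described above.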
For instance, in our model, we can reduce the number of loop variables by transferring the equation for Q to the end
of the heart:
84 And the associated model is probably quite poor.
[Incidence matrix of the reordered model: rows and columns in the order cap, prle_t, x (prologue), ur, ic, ci, i, led, le, lt, rhi, ih, co, fd, td, m, q (heart, with q moved to the end), res_x, res_m, k (epilogue); a 1 marks the unlagged presence of the column variable in the equation of the row variable.]
Now Q is the only loop variable.