Structmodel
JEAN-LOUIS BRILLET.
CONTENTS
1 FOREWORD 14
INTRODUCTION 15
1.2.3 PARAMETERS 24
1.2.6 FORMULATIONS 27
1.2.8 LINEARITY 32
1.2.12 CONCLUSION 51
2.1.2 ADVANTAGES OF MODELS 52
3.3.3 A CLASSIFICATION 58
4.1.2 ESTIMATION 60
5.2.6 UPDATES 82
5.2.7 SUPPRESSIONS 82
6.2.2 THE CONSTANT TERM 150
6.3.2 INVESTMENT: THE NECESSITY TO ESTABLISH A CONSISTENT THEORETICAL EQUATION PRIOR TO ESTIMATION 163
7 CHAPTER 7: TESTING THE MODEL THROUGH SIMULATIONS OVER THE PAST 214
8 CHAPTER 8: TESTING THE MODEL OVER THE FUTURE 286
8.7.2 CONSEQUENCES FOR MODEL SIMULATIONS 306
9.1.6 CHANGING MODEL SPECIFICATIONS 340
10.3.8 CHANGE IN INVENTORIES 472
13.3 RESPECIFICATION OF ENDOGENOUS AND EXOGENOUS VARIABLES 555
14 A LIST OF USEFUL SERIES FOR PRODUCING A SINGLE COUNTRY, SINGLE PRODUCT MODEL 559
14.3.1 THE SUPPLY - DEMAND VARIABLES AT CONSTANT PRICES (WHOLE ECONOMY) 560
15.3.1 THE SUPPLY - DEMAND VARIABLES AT CONSTANT PRICES (WHOLE ECONOMY) 564
18 INDEX 629
1 FOREWORD
A first remark concerns its subject: structural econometric modelling no longer looks so fashionable, having lost ground to Computable General Equilibrium models, and in particular to their Dynamic Stochastic versions.
We will contend that while this might be true in the academic field (you just have to look at the program of congresses
and symposiums) there is still a lot of room for structural models. Indeed, many institutions are still using them and
even building new ones, both in developed and developing countries. We shall try to show that this position is quite
justified, and that for a large part of the modelling applications, in particular the analysis and interpretation of
macroeconomic interactions, the call for structural models remains a good strategy, arguably the best one.
But we shall not stop at proving the usefulness of these models. For the people we have convinced, or who were so already, we will provide a set of tools facilitating all the tasks in the modelling process. Starting from elementary elements, it will lead the user by stages to a level at which he should be able to build, manage and use his own professional, operational model.
This means this book will, as its title says, focus essentially on applied and even technical features, which does not mean it will be simplistic.
After a necessary description of the field, we shall use the largest part of the book to show the reader how to build his
own model, from general strategies to technical details. For this we shall rely on a specific example, presented at the
beginning, and which we will follow through all the steps of model development. When the situation becomes more
complex (with the addition of product and international dimensions), we shall still find this model at the core of the
cases.
This model will also be present in the background, when we address new directions, which we think are quite
compatible with our approach: Quasi-Accounting, and Stock-Flow Consistent models.
Our examples will be based on the EViews package, the most popular modelling product presently available. This will allow us to be more helpful to EViews users, concentrating on its practice (including some tricks).
Finally, just as important if not more so, we shall provide a set of files allowing readers to practice modelling (either alone or as part of a course). And for more advanced users, we shall give access to files allowing them to produce operational (if small) models, which they can adapt to their own ideas, with the tedious tasks (producing the data, defining the accounting framework and organizing simulations over the future) already prepared.
All these elements are provided for free, and downloadable on the EViews site, at the address:
https://www.eviews.com/StructModel/structmodel.html
This version of the book takes into account the features of the latest version of EViews, EViews 13. However, most of the text is valid for earlier versions. The main differences come from improvements in user-friendliness.
INTRODUCTION
Since an early date in the twentieth century, economists have tried to produce mathematical tools which, applied to a
given practical problem, formalized a given economic theory to produce a reliable numerical picture. The most natural
application is of course to forecast the future, and indeed this goal was present from the first. But one can also
consider learning the consequences of an unforeseen event, or measuring the efficiency of a change in the present
policy, or even improving the understanding of a set of mechanisms too complex to be grasped by the human mind.
In the beginning (let us say from the 1930s) the field was occupied by the "structural" models. They start from a given economic framework, defining the behaviors of the individual agents according to some globally consistent economic theory. They use the available data to associate reliable formulas to these behaviors, which are linked by identities guaranteeing the consistency of the whole set. These models can thus be placed halfway between purely statistical and purely theoretical tools: they rely on statistics, and also on theory. For a formula to be accepted, it must respect both types of criteria.
The use of this kind of model, which occupied the whole field at the beginning, is now mostly restricted to policy analysis and medium-term forecasting. For the latter, they show huge advantages: the full theoretical formulations provide a clear and understandable picture, including the measurement of individual influences. They also allow the introduction of stability constraints leading to identified long-term equilibria, and the separation of this equilibrium from the dynamic fluctuations which lead to it.
In the last decades, new kinds of models have emerged, which share the present market.
• The "VAR" models. They try to give the most reliable image of the near future, using a complex estimated structure of lagged elements, based essentially on statistical quality, although economic theory can be introduced, mostly through constraints on the specifications. The main use of this tool is to produce short-term assessments.
• The Quasi-Accounting models, which rely on very basic behaviors, most of the time calibrated. This makes it possible to treat cases where data is available for extremely limited sample periods, or where the fine detail (generally in products) makes it impossible to apply econometrics with a good chance of global success.
• Stock-Flow Consistent models, which answer two criticisms addressed to structural models: producing incomplete and formally unbalanced models, and not taking sufficient account of stocks, in particular of financial assets. By detailing these assets by agent and category, SFCs allow the consideration of sophisticated financial behaviors, sometimes at the expense of the "real" properties.
• And last (but not least) Computable General Equilibrium models. They use a detailed structure with a priori formulations and calibrated coefficients to solve a generally local problem, through the application of one or several optimizing behaviors. The issues typically addressed are optimizing resource allocations, or describing the consequences of trade agreements. The mechanisms described generally contain little dynamics.
This is no longer true for the Dynamic Stochastic General Equilibrium models, which dominate the current field. They include dynamic behaviors and take into account the uncertainty in economic evolutions. Compared to the traditional models (see later) they formalize explicitly the optimizing equilibria, based on the aggregated behavior of individual agents. This means that they allow agents to adapt their behavior to changes in the rules governing the behaviors of others, including the State, in principle escaping the Lucas critique. As the model does not rely on traditionally estimated equations, calibration is required for most parameters.
Compared to CGEs and DSGEs, structural models do include optimization behaviors (as we shall see later), introduced in the estimated equations. But they are frozen there, in a state associated with a period and with the behavior of other agents at the time. If these conditions do not change, the statistical validation is an important advantage. But the sensitivity to shocks is flawed, in a way which is difficult to measure.
A very good (and objective) description of the issue can be found in:
http://en.wikipedia.org/wiki/Dynamic_stochastic_general_equilibrium
http://en.wikipedia.org/wiki/Macroeconomic_model#Empirical_forecasting_models
It seems to us that the main criterion in the choice between DSGEs and traditional structural models lies in the tradeoff between statistical validation and adaptability of behaviors.
In recent years, the popularity of structural econometric modelling seems to have stabilized. A personal hint of this (if not an actual proof) is the sustained demand for participation in structural modelling projects, observed on the sites of companies devoted to international cooperation.
Another issue is that, being the first tool produced (in the thirties of the last century), structural modelling was immediately applied to the ambitious task of producing reliable forecasts. The complexity of the economy, and the presence of many random shocks, make this completely unrealistic (and this is even more true today). During the golden years of structural modelling, when the economy was growing at a regular (and high) rate, forecasting was as easy as riding a tame horse on a straight path: anybody could do it. But when the horse turned into a wild one, the quality of the rider showed, and he did not stay in the saddle very long. Failing at a task too difficult for any tool (including VAR and CGE models, which do not have to forecast the medium term) brought discredit to structural models and all their uses, including policy analysis and even the understanding and interpretation of complex economic mechanisms, applications for which neither VAR nor CGE models can compete, in our opinion.
Also, the role of financial issues has grown, which the initial structural models were not well equipped to address. But
Stock-Flow Consistent versions can be an answer to this problem.
Anyway, even with limited ambitions, producing a sound econometric structural model is not a simple task. Even a professional economist, with an excellent knowledge of both economic theory (but not necessarily a complete and consistent picture) and econometric techniques (but not necessarily of their practical application), will find it quite difficult to produce a reliable and operational econometric model.
The purpose of this book is to shorten the learning process, in several ways.
We shall describe how to organize the sequence of model building tasks, from data production and framework
specification to actual operational studies.
For each task, we shall give all the necessary elements of methodology.
We shall present the main economic options available, with some theoretical explanations.
All these explanations will be based on a practical example, the production of a very small model of the French economy. Its small size will not prevent us from addressing most of the problems encountered in the process.
The methods, techniques and solutions proposed will be based on the EViews software. This will allow us to present
some useful features and tricks, and to provide a sequence of complete programs, which the user can modify at will,
but not necessarily too heavily, as all the models of this type share a number of common elements. The main issue is
of course the estimation process, each case leading generally to an original version of each behavioral equation.
A set of documented programs, following the above principles, is available on demand.
In each case, we shall present programs which actually work. An econometric solution will be found, reliable both in
statistical and economic terms. And the properties of the models will be rather satisfying, with a long-term solution
and reasonable dynamics leading to it.
Finally, we shall address the more complex problems: multi-sector and multi-country models (and both options
combined). The specific issues will be described, and a framework for a three-product model will be provided,
following the same lines as the previous example.
The goal of this book is therefore both limited and ambitious. Without getting into theoretically complex features, it
should give readers all the elements required to construct their own model. Being relieved of the more technical (and
tedious) tasks, they will be allowed to concentrate on the more intelligent (and interesting) ones.
Readers must be aware they will find here neither a full description of econometric and statistical methods, nor a
course in economic theory. We shall give basic elements on these fields, and rather focus on their links with the
modelling process itself. For more detailed information, one can refer to the list of references provided at the end of
the volume.
Concerning Quasi-Accounting and even more Stock-Flow Consistent models, for which our experience is much more
limited, we will be even less directive.
THE EXAMPLE: A VERY BASIC MODEL
To present the elements and the framework of a structural econometric model, we shall use a specific example, which
we shall address permanently during our presentation. In spite of its limited size, we think it remains quite
representative of the class of models we are considering in this manual.
At the start of any model building process, one has to specify in a broad manner the logic of his model, and the
behaviors he wants his model to describe. No equation needs to be established at this time. We shall place ourselves
in this situation.
In our example, an economist has decided to build a very simple model of the French economy. As our tests will be
based on actual data, a country had to be chosen, but the principles apply to any medium sized industrialized country.
• Based on their production expectations and the productivity of factors, firms invest and hire workers to adapt their productive capacity. However, they exert some caution in this process, as they do not want to be stuck with unused elements.
• The levels reached in practice define potential production.
• Firms also build up inventories.
• Households obtain wages, based on total employment (including civil servants), but also a share of Gross Domestic Product. They consume part of this revenue.
• Final demand is defined as the sum of its components: consumption, productive investment, housing investment, the change in inventories, and government demand.
• Imports are a share of local demand («domestic demand»). But the fewer capacities remain available, the more an increase in demand will call for imports.
• Exports follow world demand, but producers are limited by available capacities, and their priority is satisfying local demand.
• Supply is equal to demand.
• Productive capital grows with investment, but is subject to depreciation.
The above framework looks rather straightforward, and certainly simplistic. Obviously, it lacks many elements, such as prices, financial concepts, and taxes. These will be addressed in later developments.
Let us not go further for the time being. One can observe that even if we have not built a single equation yet, a few are already implicit in the above text.
2 CHAPTER 1: NOTATIONS AND DEFINITIONS
Before we start presenting the process of model building, we must define the concepts we shall use. They will be
based on individual examples taken from our (future) model.
In a general way, a model will be defined as a set of fully defined formulas describing the links between a set of concepts.
f( ... ) = 0
Obviously, a model will be used to measure economic concepts, depending on other elements. Its variables will therefore separate into:
• Endogenous variables, or results, whose value will be obtained by solving the system of equations.
• Exogenous variables, or assumptions, whose value is known from outside considerations, and which obviously condition the solution.
If the model is solved over past periods, this value should be known. But in forecasting operations, it will have to be
chosen by the model builder (or user).
For the system to be solved, the number of endogenous variables must correspond to the number of equations.
f(y, x) = 0
with y the vector of endogenous and x the vector of exogenous variables. In our example:
• Imports will be endogenous, as they depend on local demand. Exports too, depending on world demand.
• World demand will be exogenous, as we are building a model for a single country, and we are going to neglect the impact of local variables on the world economy. Of course, this impact exists, as France is (still) an important country, and its growth has some influence on the world economy. But the relatively limited improvement can only be obtained at the very high cost of building a world model. This simplification would be less acceptable for a model of the USA, or China, or the European Union as a whole (we shall address this issue in detail later).
Technically, one can dispute the fact that exports are endogenous. As we make them depend only on an exogenous
world demand, they are de facto predetermined, apart from an unforecastable error. But they have to be considered
endogenous. Our model describes the local economy, and one of its features is the exports of local firms, allowed by
the external assumption on foreign demand, but following a local behavior.
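To make the endogenous/exogenous distinction concrete, here is a minimal sketch (in Python rather than EViews, with purely invented coefficients) of a model written as f(y, x) = 0 and solved numerically: the four endogenous variables are obtained from the four equations, while world demand WD and government demand G are set beforehand as assumptions.

import numpy as np
from scipy.optimize import fsolve

# Exogenous variables (assumptions): world demand and government demand
WD, G = 100.0, 40.0

# Purely hypothetical coefficients, for illustration only
a_m, a_x = 0.2, 0.3   # import share of demand, export sensitivity to world demand

def model(y):
    """Residuals of f(y, x) = 0; endogenous y = (Q, FD, M, X)."""
    Q, FD, M, X = y
    return [
        Q + M - (FD + X),    # supply-demand equilibrium
        FD - (0.6 * Q + G),  # final demand depends on output and government demand
        M - a_m * FD,        # imports are a share of domestic demand
        X - a_x * WD,        # exports follow world demand
    ]

# As many equations (4) as endogenous variables (4): the system can be solved
solution = fsolve(model, x0=np.ones(4))
print(dict(zip(["Q", "FD", "M", "X"], solution.round(2))))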
As to Government demand, models of the present type will also keep it exogenous, but for different reasons:
• The main goal of this model is to show its user (which can be the State, a State advising agency, an independent economist playing the role of the State, or even a student answering a test on applied economics) the consequences of its decisions. So these decisions must be left free, and not forced on him.
• The behavior of the State is almost impossible to formalize, as it has few targets (mostly growth, inflation, unemployment, budget and trade balances) and a much larger number of available instruments1. Even if their base values are more or less fixed, it can deviate from them arbitrarily, without too much delay. To achieve the same goal, past French governments have used different global approaches, calling for different panels of individual instruments.2
• The State alone has enough individual power to influence significantly the national economy.
Each of the two exogenous elements is characteristic of a broader category:
• Variables considered as external to the modeled area, on which economic agents considered by the model have no or little influence. In addition to the world environment for a national model, this can mean population3, or meteorological conditions4, or the area available for farming. The theoretical framework of the model can also suppose exogenous structural elements, such as the real interest rate, the evolution of factor productivity, or the depreciation rate of capital.
1 Not here, but in the general case.
2 For instance, to decrease unemployment, a government can increase demand or reduce firms' taxes, and the tax instrument can be social security contributions, or subsidies.
3 In long-term models, growth might affect the death and birth rates, and thus population.
4 Which can depend on growth (greenhouse effect).
• Variables controlled by an agent, but whose decision process the model does not describe. Even if it was formally possible, the model builder wants to master their value, to measure their consequences on the economic balance. These will be referred to as decision variables or «instruments».
Changing the assumptions on these two types of variables, therefore, will relate to questions of a very different spirit. Sometimes the two approaches can also be combined, by considering first the consequences of an evolution of uncontrolled elements, and then supposing a reaction of the State, for instance a change in policy that would return the situation to normal. For instance, the State could use its own tools to compensate losses in external trade due to a drop in world demand.
From one model to another, the field described can change, but also the separation between endogenous and exogenous variables. The real interest rate can change its nature depending on the endogeneity of the financial sector, technical progress can be assumed to follow a trend or to depend on growth, and the level of population can depend on revenue.
1.2.2.1 Behaviors
The first role of the model is to describe "behaviors": the model builder, following most of the time an existing economic theory, will establish a functional form describing the behavior of a given agent, and will use econometrics to choose a precise formulation, with estimated parameters.
In describing consumption, one might suppose that its share in household income is determined by:
• The level of income (a higher income will make consumption less attractive or necessary, compared to savings6).
• Recent variations of income (consumers take time in adapting their habits to their new status).
• The evolution of unemployment: if it grows, the prospect of losing a job will lead households to increase reserves.
• Inflation: it defines the contribution required to maintain the purchasing power of financial savings.
Once identified, all these elements will be united in a formula, or rather a set of possible formulas (households can
consider present inflation, or the average over the last year; the increase in unemployment can use its level or
percentage change). These formulas will be confronted with the available data, to find a specification statistically
acceptable on the whole, each element participating significantly in the explanation, and presenting coefficient values
5 Provided the EU commission will allow it.
6 Let us recall that investment in housing is considered as savings.
consistent with economic theory. Once parameters are estimated, each element of the resulting formulation will
contribute to the logical behavior of the associated agent.
But the process is not always so straightforward. Two other cases can be considered.
• The behavior can be formalized, but not directly as estimation-ready formulas. A framework has first to be formalized, then processed through analytical transformations, possibly including derivations and maximizations, leading finally to the equation (or set of equations) to estimate. This will be the case for our Cobb-Douglas production function (page 105), for which we compute the combination of labor and capital which maximizes profits for a given production level, according to a set of formulas obtained outside the model. Or for the definition of the wage rate as the result of negotiations between workers' unions and firm managers, based on their respective negotiating power.
• Often the model builder will not be able to formulate the equation precisely, but will consider a set of potential explanatory elements, waiting for econometric diagnoses to make a final choice between formulations (generally linear). For instance, the exchange rate might depend on the comparison of local and foreign inflation, and on the trade balance.
In any case, even if the exact intensity of influences is unknown to the model builder7, economic theory generally defines an interval of validity, and especially a sign. Whatever the significance of the statistical explanation, it will be rejected if its sign does not correspond to theory. In the wage example above, an increase in labor demand must generate gains in the purchasing power of the wage rate.
Anyway, the less the theoretical value of the estimated coefficient is known, the more care must be applied to the
numerical properties of the model, at simulation time.
The formulation of these theoretical equations often makes use of specific operators, allowing alternative calculations: Boolean variables, operators for maximum and minimum. For instance, in disequilibrium models, the theoretical equation can include a constraint. We can also consider the case of a production function with complementary factors, where the level of each factor determines an individual constraint:
CAP = min(pl · L, pk · K)
with CAP production capacity, L employment, K capital, and pl, pk the associated productivities.
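As a side illustration (a Python one-liner with hypothetical productivities, not taken from the model itself), the min operator expresses this complementary-factors constraint directly:

# Capacity with complementary factors: each factor sets its own ceiling
def capacity(L, K, pl=2.0, pk=0.5):   # pl, pk: hypothetical productivities
    return min(pl * L, pk * K)

print(capacity(L=100, K=500))   # labor allows 200, capital allows 250 -> capacity is 200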
1.2.2.2 Identities
A model composed only of behavioral equations cannot generally be used as such. Additional equations will be needed, this time describing undisputable and exact relationships.
• Some concepts are linked by an accounting formula, and we need to ensure their numerical coherence. For example, once the model has defined household revenue, it cannot estimate savings and consumption
7 Otherwise he would not have estimated it.
separately as the sum of the two is known8. A single element will be estimated: it can be savings,
consumption, the savings ratio or the consumption ratio, and the other elements will follow, using an
identity.
• Some concepts are linked by a causal sequence of elements, and some elements in the chain are not defined by behaviors. For example, if we estimate firms' employment and household consumption, we must formalize household revenue (as a sum including wages) to make job creation improve consumption. And in our example, defining final demand (as a sum of its components) ensures that imports will follow consumption.
Of course, one can consider eliminating these identities by replacing each element they compute by the corresponding formula. This is not always technically possible, and in any case it would mean eliminating:
• Intermediate variables simplifying formulations (and speeding up computations). Even if the growth rate of the real wage rate, which uses a slightly complex expression, was not considered interesting as an economic quantity, it will be useful to define it, if it appears as an explanatory element in many equations.
• Purely descriptive elements: the ratio of the Government balance to GDP is a crucial element in evaluating the financial health of the State (and one of the «Maastricht» criteria for entering the European Monetary Union).
• Finally, economic theory is not always absent from this type of equation: the supply-demand equilibrium has to be enforced:
Q (supply from local producers) + M (foreign supply to the country) = FD (demand from local agents) + X (foreign demand to the country)
And the choice of the variable which balances it has a strong theoretical impact on model properties.
o If exports and imports come from behaviors, and demand from the sum of its components, we need to
compute Q as:
Q (local output) = (FD-M) (local demand supplied by local producers) +X (foreign demand supplied by local producers)
This means that production will adapt to demand (which itself can depend on the availability of products).
The producers choose to limit their output at a level actually lower than demand, because additional production would bring negative marginal profits. In this case Q will be fixed, and we could have:
M = FD + X − Q
8 This would also be absurd in terms of household behavior.
o Or the country can only import in foreign currency, which it obtains through exports.
1.2.3 PARAMETERS
Parameters can be defined as scalars with a varying value. The only formal difference with exogenous variables is that
they lack a time dimension9.
Two types of parameters can be considered, according to the way their value is established:
• Those estimated by reference to the past: starting from a theoretical but fully defined formula including unknown parameters, the model builder will seek the values which provide the formulation closest to observed reality, according to a certain distance. This means using "econometrics".
• Those decided by the model builder: economic theory or technical considerations can supply a priori assumptions concerning a particular behavior. For instance, if a Central Bank uses a standard Taylor rule to decide its interest rate, its sensitivity to the inflation level should be 0.5 (see the sketch below). A special case will be represented by a control variable, giving (without changing the formulation) a choice between several types of independent behaviors.
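A minimal sketch of such a calibrated behavior (a standard Taylor rule, written in Python for illustration; the 0.5 sensitivities and the other numbers are conventional values, not estimates):

def taylor_rate(inflation, output_gap, neutral_real_rate=2.0, target_inflation=2.0):
    """Interest rate from a standard Taylor rule with calibrated sensitivities of 0.5."""
    return (neutral_real_rate + inflation
            + 0.5 * (inflation - target_inflation)
            + 0.5 * output_gap)

print(taylor_rate(inflation=3.0, output_gap=-1.0))   # 5.0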
The distinction is not as clear as it may seem: in particular, if estimation fails to provide an economically coherent
result, the model builder can be driven to accept the value of a parameter, if it is consistent with economic theory.
And even if not, to choose the nearest value within the theoretical range. For instance, an indexation of wages on
inflation by 1.1 can lead the modeler to apply 1, if the difference is not significant.
Introducing the vector of parameters a, the general formulation becomes:
f(y, x, a) = 0
And in our example, one could estimate the influence of world demand on exports, for example by supposing that relative variations are proportional (or equivalently that the elasticity of exports to world demand is constant).
ΔX/X = a · ΔWD/WD
9 In EViews, modifying a parameter value applies to the current model, and taking it into account calls for a new compilation, making the new version operational. This is both tedious and error-prone. One might consider replacing parameters by series with a constant value, which gives access to the much more manageable "scenario" feature.
where a should be close to unity, if the share of the country on the world market is stable10.
But if the estimated coefficient is not significantly different from unity, we can get back to:
ΔX/X = ΔWD/WD
This choice could also have been made from the start for theoretical reasons, or to ensure the long-term stability of
the model.
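The following sketch shows this logic in Python (the book itself works under EViews): it estimates the elasticity a on simulated export and world demand series, then checks whether a differs significantly from unity before deciding to impose the unit value. The series and the true elasticity of 1.05 are invented for the example.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 80   # say, twenty years of quarterly data

# Simulated series: world demand grows about 1% per quarter,
# exports follow it with a true elasticity of 1.05 plus noise
wd = 100 * np.cumprod(1 + 0.01 + 0.005 * rng.standard_normal(n))
x = 50 * (wd / wd[0]) ** 1.05 * np.exp(0.02 * rng.standard_normal(n))

# Relative variations approximated by log differences
d_log_x, d_log_wd = np.diff(np.log(x)), np.diff(np.log(wd))

ols = sm.OLS(d_log_x, sm.add_constant(d_log_wd)).fit()
a_hat, a_se = ols.params[1], ols.bse[1]
print(f"estimated elasticity a = {a_hat:.3f} (s.e. {a_se:.3f})")

# If a is not significantly different from 1, one may impose the unit elasticity
print("impose a = 1?", abs((a_hat - 1.0) / a_se) < 2)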
In practice, the behavior of agents does not follow exactly formalized functions, and the formulation obtained by estimation will not reproduce reality exactly. It will only approximate this behavior, using elements which conform to some economic theory, each of them providing a significant contribution to the changes in the explained variable. The number of estimated parameters will then generally be much lower than the size of the sample, or the number of observed values. In practice, adding elements to the explanation can:
• In the favorable cases, improve the quality of the explanation given by the elements already present, which can now concentrate on their natural role, instead of trying to participate in the explanation of other mechanisms in which their efficiency is limited11.
• But the new element can compete with the others in explaining a mechanism in which they all have some competence, limiting the improvement and leaving the sharing of the explanation rather undetermined (and therefore limiting the significance of the coefficients).12 13
In practice, these correlation problems will always appear, sometimes very early, and generally before the fifth or
sixth element. Beyond that figure, the precision of individual coefficients will decrease, and global quality will improve
less and less.
This means that a typical econometric equation will contain a maximum of four parameters, while variables will be
known on fifty to one hundred quarters.
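This degradation is easy to reproduce on simulated data. In the hedged Python sketch below (all series invented), a second regressor carrying almost the same information as the first adds nothing to the fit but inflates the standard errors of both coefficients:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 80

x1 = rng.standard_normal(n)
x2 = x1 + 0.05 * rng.standard_normal(n)          # almost collinear with x1
y = 1.0 * x1 + 0.5 * rng.standard_normal(n)      # only x1 really matters

fit_one = sm.OLS(y, sm.add_constant(x1)).fit()
fit_two = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

print(f"one regressor : coef {fit_one.params[1]:.2f}, s.e. {fit_one.bse[1]:.2f}")
print(f"two regressors: coefs {np.round(fit_two.params[1:], 2)}, s.e. {np.round(fit_two.bse[1:], 2)}")
print(f"gain in R2    : {fit_two.rsquared - fit_one.rsquared:.4f}")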
10 In our model WD stands for world trade (including its expansion), not the aggregate demand of countries.
11 Just like a worker who has to divide his time between two tasks, and is really qualified for only one. For example, if an excellent musician but average lyricist is teamed with a good lyricist, the quality of the songs (both music and lyrics) will improve.
12 This can be a problem for the model if the two competing elements have a different sensitivity to a particular variable. For instance, if one is sensitive to a tax rate and the other is not, the role of the tax rate will be undetermined.
13 If two workers with the same profile complete a task together, it is difficult to evaluate their individual contribution. One might have rested the whole period.
It will therefore be necessary, to formulate an exact model, to accept the presence of non-zero additional terms (residuals). If one believes in the model, this residual should be interpreted as a random perturbation without economic meaning14. But if the equation is badly specified, it will also come from other sources: omitting a relevant variable, replacing it by another less relevant one, or choosing the wrong form for the equation15.
The fault will not always lie with the model builder, who might not have been able to apply his original ideas. The
variables he needs may not be precisely measured, or only with a slightly different definition, or they may not be
available at all, as in, for example, the goals or expectations of a given agent.
Practically speaking, one will often suppose that this residual follows a random distribution, with a null average, a
constant standard error, and residuals independent across periods.
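For instance, estimating a simple behavioral equation on simulated series (standing in for actual national accounts data), one can check these properties of the residual: a mean close to zero and no obvious autocorrelation (Durbin-Watson statistic close to 2). A hedged Python sketch:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)
n = 80

Q = 100 * np.cumprod(1 + 0.005 + 0.01 * rng.standard_normal(n))   # production
CO = 10 + 0.6 * Q + 2.0 * rng.standard_normal(n)                  # consumption

fit = sm.OLS(CO, sm.add_constant(Q)).fit()
u = fit.resid

print(f"a = {fit.params[1]:.3f}, b = {fit.params[0]:.2f}")
print(f"mean of residual: {u.mean():.3f}")
print(f"Durbin-Watson   : {durbin_watson(u):.2f}   (close to 2 if no autocorrelation)")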
Our formulation becomes therefore, in the general case, noting u the vector of residuals:
f(y, x, a, u) = 0
In the example, if we want to represent changes in household consumption as a constant share of total production variations, we will write:
CO = a · Q + b + u
CO/Q = a + u
14 One can also take into account that the relationship is not exact. For instance, that the value of an elasticity is only very close to constant.
15 Of course, as we have said before, one is never able to estimate the «true» equation. This remark should apply to a large conceptual error, leading to behaviors distinctly different from an acceptable approximation of reality.
It is probably time to raise an important issue about the nature of econometrics. When he considers a behavioral equation, the economist can take two extreme positions.
• He believes the behavior can be exactly specified according to a formula, which is affected by an error term with a given distribution (maybe a white noise, or a normal law). With an infinite number of observations, we would get an exact measurement of the parameters, and therefore of the error (which remains, of course) and of its distribution.
• He thinks that the concept he wants to describe is linked to some other economic elements, but the relation is only a mapping, of which any formula represents only an approximation. To this mapping a random term can also be added, if one believes that the replication of the same explanatory elements will bring a different result. Additional observations will only provide a better approximation.
The debate is made more complex by several facts:
• The data on which he wants to base his estimation is not measured correctly. One cannot expect the statisticians to produce error-free information, for many reasons: measurement errors, inappropriate samples, mistaken concepts...
• Even if measured correctly, the concepts he is going to use are not necessarily the right ones. For instance, a given behavior should be applied only to the firms which do make profits, a separation which is not available at the macroeconomic level.
• The discrete lags which he will apply to these concepts are not the right ones either. For instance, it might be known that an agent considers the price index of the last month, but only quarterly data is available.
• The estimation period is not homogenous, and this cannot be explained by the data. For instance, the mood of consumers (and their consumption behavior) can evolve without any link to measurable economic elements.
From the above issues, the logical conclusion should be:
• The first position is illusory, and to a point which is impossible to measure (of course).
• But we have to take it if we want to apply econometric methods.
This means that in the following text we shall put ourselves in the first position, but we will always keep in mind the true situation, and give to the difference between the concept and its estimation the less ambitious name of "residual".
1.2.6 FORMULATIONS
We shall now consider the form of the equations. Let us first approach the time dimension.
Variables in economic models have generally a time dimension, which means they are known through discrete values,
almost always with a constant periodicity: generally annual, quarterly or monthly series. This means we will consider
models in discrete time.
There are exceptions, however. The most frequent one applies to micro-economic models, describing the behavior of a panel of individual firms or households, where the dimension will correspond to items in a set. Sometimes they will be ordered, using the level of one variable, such as the income level for a set of households. Time can be introduced as an additional dimension, but possibly with a varying interval, either predetermined (phases of the moon) or unpredictable (periods of intense cold).
1.2.7.1 Consequences of the discretization
The time discretization of variables will be introduced in several ways, leading to:
• really instantaneous variables, measured at a given point in time: the capital on the 31st of December at midnight, in an annual model (defined as a stock variable).
• averages: the average level of employment observed during a period.
• flows: the goods produced during a period.
The same economic concept might appear under several forms: inflation and price level, stock of debt and balance for
the period, average and end-of-period employment levels, plus net job creations. For one household, we can consider
the revenue, its yearly change, and the global revenue accumulated during its existence.
When models have a less than yearly periodicity, some series can present a specific distortion depending on the sub-
period inside the year. This can come from changes in the climate: in winter the consumption of electricity will
increase due to heating and lighting, but construction will be mostly stopped. It can be due to social issues: the
concentration of holidays in the summer months can reduce production, and the coming of Christmas will increase
consumption (in Christian countries). We are going here to provide a basic sketch of the problem, leaving a more
serious description to specialized books like Ladiray and Quenneville (2001).
Using unprocessed data can lead to problems: for instance, the level of production in the second quarter will be lower
than what we could expect from labor and capital levels. This will disturb estimations and make model solutions more
difficult to interpret.
Two methods can be considered: introducing corrective (dummy) elements in the estimated equations, or working with seasonally adjusted series. The second method will be favored, as it also solves the interpretation problem.
Several techniques are available, the most well-known being Census-X13 ARIMA, developed by the US Census Bureau
and Statistics Canada16. But TRAMO-SEATS17 is also a common choice. Both are available under EViews.
One must be aware that this process often reduces the statistical quality of estimations. For instance, if demand is
particularly high in the last quarter of each year, and imports follow, seasonally adjusting both series will make the link
less clear, bringing less precise results. Even more obviously, the relation between demand for heating and
temperature will lose power from seasonal adjustment.
These examples show the main issue: in a one-equation model, three situations are possible:
The dependent variable contains a seasonal component, in addition to truly economic features. For instance,
agricultural production will be lower in winter, even if the same level of labor, land, fertilizer, machinery is available....
16 https://www.census.gov/srd/www/x13as/
17 http://www.bde.es/webbde/es/secciones/servicio/software/econom.html
True, at the same time, the use of fertilizer will decrease, and probably that of labor too, but to a lesser extent. One can either adjust this variable or introduce dummy elements. The internal quality of the relationship should be the same, but the statistical one will improve in appearance, through the correlation of the unadjusted dependent variable with the explanatory dummies.
On the contrary, if all the seasonal explanation comes from the seasonality of explanatory elements, seasonally
adjusting is not necessary, and even reduces the quality of estimations (with the variability of elements). One could
use raw series to estimate an imports equation, using demand, rate of use of capacities and price competitiveness as
explanatory elements.
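As an illustration of the mechanics (much cruder than Census X-13 or TRAMO-SEATS, and applied to an invented quarterly series), the Python sketch below extracts an additive seasonal component and subtracts it to obtain a seasonally adjusted series:

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(3)
dates = pd.period_range("2000Q1", periods=80, freq="Q").to_timestamp()

# Simulated quarterly consumption: trend + marked fourth-quarter peak + noise
trend = np.linspace(100, 160, 80)
seasonal = np.tile([-2.0, -1.0, -3.0, 6.0], 20)   # "Christmas" effect in Q4
series = pd.Series(trend + seasonal + rng.standard_normal(80), index=dates)

decomposition = seasonal_decompose(series, model="additive", period=4)
adjusted = series - decomposition.seasonal        # seasonally adjusted series

print(adjusted.head(8).round(1))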
But what is true for one equation does not apply to the whole model. One cannot mix the two types of series, and this
means seasonally adjusting will prevail in practice.
To determine the equilibrium for a given period, some models will use only variables from this period: we shall call them static models. They correspond to the formulation
f_t(x_t, y_t, a, u_t) = 0
The most frequent case is that of input-output models, which use a matrix of "technical coefficients" to compute the detailed production associated to a given decomposition of demand into categories of goods, which itself depends only on instantaneous elements.
Q = A · FD
FD = f(Q)
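Taking these two formulas literally (with an invented 3-product coefficient matrix and a simple affine demand function), such a static system can be solved by plain successive substitutions, as in this Python sketch:

import numpy as np

# Hypothetical structure linking production to the decomposition of demand
A = np.array([[0.5, 0.1, 0.0],
              [0.2, 0.4, 0.1],
              [0.1, 0.2, 0.6]])

def demand(Q):
    """FD = f(Q): final demand by product, an affine function of total output."""
    return np.array([20.0, 30.0, 15.0]) + 0.1 * Q.sum() * np.array([0.5, 0.3, 0.2])

# Solve Q = A . FD, FD = f(Q) by successive substitutions
Q = np.zeros(3)
for _ in range(100):
    FD = demand(Q)
    Q_new = A @ FD
    if np.max(np.abs(Q_new - Q)) < 1e-8:
        break
    Q = Q_new

print("FD =", FD.round(2), " Q =", Q_new.round(2))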
But most models will also use variables from previous periods, for several reasons:
• institutional: the income tax paid by households can be based on their income of the previous period (this was the case in France until 2019).
• technical: if a model considers a variable and its growth rate, computing one from the other calls for the previous level.
One observes that each of these justifications supposes that influences come only from previous periods: one will
speak of (negatively) lagged influences.
Let us go back to our model. We can already observe an indisputable lagged influence: most of present capital will come from the remaining part of its previous level. Any other case is still undecided. However, without going too deep into economic theory, one can think of several lagged influences:
• For household consumption, we have already considered that adapting to a new level of revenue takes some time. This means it will depend on previous levels. If we detailed it into products, the previous level could have a positive influence (some consumptions are habit-forming) or a negative one (generally, people do not buy a new car every quarter).
• Firms invest to adapt their productive capacities to the level of production needed in the future. We can suppose that they build their expectations on past values.
It is interesting to note that the previous formulation could be simplified, eliminating any lag larger than one by the addition of intermediate variables:
f_t(y_{i,t}, y_{j,t−k}) = 0
is equivalent to
f_t(y_{i,t}, z_{j,t}) = 0
z_{1,t} = y_{1,t−1}
...
in which a lag of k periods on a single variable has been replaced by k one-period lags on as many variables (including new ones).
z1_t = y_{t−1}
z2_t = z1_{t−1}
z3_t = z2_{t−1}
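On actual series the transformation is mechanical; a short pandas sketch (illustrative figures only):

import pandas as pd

# Replace a 3-period lag on y by a chain of one-period lags on new variables
data = pd.DataFrame({"y": [100, 102, 105, 103, 108, 110, 112, 115]})

data["z1"] = data["y"].shift(1)    # z1(t) = y(t-1)
data["z2"] = data["z1"].shift(1)   # z2(t) = z1(t-1) = y(t-2)
data["z3"] = data["z2"].shift(1)   # z3(t) = z2(t-1) = y(t-3)

assert data["z3"].equals(data["y"].shift(3))   # same information as y lagged 3 periods
print(data)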
But if this method simplifies the theoretical formulation, it has the obvious disadvantage of artificially increasing the
size of the model and reducing its readability, without producing additional information. Its interest is reserved to
specific studies. For instance, assessing model dynamics can call for the linearization of the model according to
present and lagged variables. The above transformation will limit the matrix system to two elements (with lags 0 and
1), which will make further formal computations easier, and independent from the number of lags.
f_t(y_t, y_{t−1}, x_t, a, u_t) = 0
1.2.7.4 A particular case: rational expectations
It has appeared natural, in the previous examples, to consider only negative lags. This will happen if we suppose that the anticipation of agents relies only on the observation of the past (and the present).18 But one can also suppose:
• That agents have the possibility, by their present decisions, to determine the future values of some variables (and the associated behavior can be formalized).
• That agents perfectly anticipate the future (perfect expectations).
• That the expectation by agents of specific evolutions has for consequence the realization of these evolutions (self-fulfilling expectations).
• That agents build their expectations on the behaviors of the other agents19, for which they know the properties (rational expectations). Basically, this means that they are able to apply the model controlling the economy (but not necessarily know its formulas), and the decision process defining its assumptions. For instance, they can forecast the investment program of the Government (depending on economic conditions), they know how firms and households will react, and they know the links between these elements (they are able to consider the supply-demand equilibrium). Actually, this is rather called "model consistent expectations".
• However, they do not necessarily know the unexplained part of the behaviors (which can be associated with the random term). If they know only its distribution, we shall speak of stochastic rational expectations. EViews does not provide this feature at present (only one or the other), although this should appear in a future version. They also do not have to know the actual formulas, just be able to compute them.
You do not have to believe in rational expectations to apply them. Producing alternate simulations with different assumptions on expectations will greatly improve the insight into one particular model, or into economic mechanisms in general. We shall present this later on a specific case.
This also is a very specific area: some theoretical models will be formulated as a system of equations where variables
appear as a function of continuous time, and variations (or growth rates) become exact derivatives. One ends up then
with a system of differential equations, which one can be led to integrate.
These models seldom evolve beyond a theoretical stage, if only for lack of statistical information.
But some operational models, describing for instance the stock exchange, can reduce their periodicity to a daily or
even shorter value.
1.2.8 LINEARITY
We will consider here the linearity relative to variables. The linearity relative to coefficients will appear in the chapter
on estimation.
18 This use of proxies is made necessary by the absence of direct measurement of anticipations. Exceptionally, they can be obtained by surveys, leading to a specific estimation.
19 Including the State.
The potential linearity of a model represents a very important property for its analysis as well as its solution. But first we must define the notion of linearity, which can be more or less strict.
The most restrictive definition will suppose linearity relative to all variables, with constant coefficients:
A · y_t + B · y_{t−1} + C · x_t + d + u_t = 0
A less restrictive definition will allow the coefficients to depend on the period:
A_t · y_t + B_t · y_{t−1} + C_t · x_t + d_t + u_t = 0
a definition again less restrictive will suppose linearity relative to the sole endogenous variables:
G(y_{t−1}, x_t, a) · y_t + H(y_{t−1}, x_t, a) + u_t = 0
Using the multiplier as an example, we can already show that these properties affect the computation of derivatives of model solutions. We will detail later the consequences on convergence properties.
The first property tells us that the multiplier does not depend on the initial equilibrium, or on the period considered. Multiplying the shock by a given factor will have a proportional effect. It is enough to compute it once to know it once and for all.
In the second case, the multiplier will depend only on the period. Starting from different base assumptions will not
change the consequences of a given change.
In the third case, the multiplier will depend also on the exogenous values (and the coefficients). It has to be re-
computed each time these elements change (or have changed in the past except for one period ahead solutions), but
can be stored until the next time they do.
The last case is similar to the third one. But convergence will be affected (see later).
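These statements can be checked numerically: the sketch below (Python, on a small invented model with a mildly non-linear import equation) computes the multiplier of government demand by solving the model twice, with and without a shock; changing the baseline assumptions changes the result, which would not happen for a fully linear model.

import numpy as np
from scipy.optimize import fsolve

def solve(G, WD, a=0.2, b=0.3):
    """Solve a small, mildly non-linear model for given exogenous G and WD."""
    def residuals(y):
        Q, FD, M, X = y
        return [
            Q + M - (FD + X),                                   # supply-demand equilibrium
            FD - (0.6 * Q + G),                                 # final demand
            M - a * FD * (np.maximum(Q, 1.0) / 100.0) ** 0.1,   # non-linear import equation
            X - b * WD,                                         # exports
        ]
    return fsolve(residuals, x0=np.full(4, 100.0))

def multiplier(G, WD, dG=1.0):
    """Effect on output Q of a unit increase in government demand G."""
    base, shocked = solve(G, WD), solve(G + dG, WD)
    return (shocked[0] - base[0]) / dG

# For a non-linear model the multiplier depends on the baseline assumptions
print(f"{multiplier(G=40.0, WD=100.0):.3f}")
print(f"{multiplier(G=80.0, WD=150.0):.3f}")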
It is obvious enough that a single non-linear equation makes the whole model non-linear, according to one of the previous definitions. Reasons for non-linearity are multiple; one will find in particular:
• Expressions measured in growth rates (therefore possibly linear relative to the endogenous of the period). For example, the growth rate of wages can depend on inflation.
• Expressions formulated as elasticities (generally integrated into logarithms). One will suppose for example that imports and domestic demand show proportional relative variations.
• Ratios entering behavioral equations.
• Equations using elements at current prices, computed as the product of a quantity by a deflator (which shows the evolution of the price compared to a base year). For example, the trade balance will be obtained as the difference between the products of exports and imports at constant prices by their respective deflators.
Sometimes this distinction is purely formal, and an adequate variable change will allow the return to a linear
formulation. However, if we consider the whole model, replacing by its logarithm a variable computed in elasticities
will only transfer the problem if the level appears also in the model.
Thus, in our general example, if one uses for the exports equation the formulation:
Log(X) = a · Log(WD) + b
one can very well introduce the variables LX = Log(X) and LWD = Log(WD), which will make the equation linear:
LX = a · LWD + b
But the level of X still appears in the supply-demand equilibrium, which calls for an additional (non-linear) equation:
Q + M = FD + X
X = Exp(LX)
Therefore, most economic models presenting a minimum of realism will not be linear. But numerical computations will generally show that even for models including many formal non-linearities, the approximation by a linearized form around a model solution (denoted by an asterisk) is generally quite acceptable. On the other hand, the stability of the derivatives with time is much more questionable.
For instance, the import equation
Log(M_t) = a · Log(FD_t) + b
can be linearized around the base solution as
M_t − M_t* ≈ a · (M_t* / FD_t*) · (FD_t − FD_t*)
which will represent an adequate linear approximation of the connection between M and FD, provided that M and FD
do not move too far away from their base value20. This base value might represent a reference path, from which
actual values differ due to a change in assumptions.
But, if we restrict further the expression to a constant influence (linearity to constant coefficients),
20 In other words, if the terms of the derivative are negligible beyond the first order.
the approximation can be accepted only if the ratio M / FD does not change too much with time. This is not generally
true: the expansion of international trade has led, and still leads, to a sustained growth of the share of imports in
domestic demand, for most countries. The ratio M*/FD* will grow strongly with time, and the last formulation will
be quite inadequate for forecasts.
1.2.9.1 Continuity
We consider here the continuity of the whole set of endogenous variables relative to assumptions (exogenous
variables, parameters). It is almost never verified formally, but should only be considered within the set of acceptable
solutions (and assumptions).
For instance, most models use ratios, which is acceptable if the denominator can never become null (like the
productivity of labor measured as the ratio of production to employment). Or using logarithms to link imports to
demand requires (logically) that those elements are strictly positive. In other words, a fully linear model can produce a
negative GDP, but this does not make it less operational if this value is associated with absurd assumptions or
coefficients.
So even if all models show a potential for non-continuity, it should never occur in practice. We can think of only three cases:
• The model framework is correct, but something is wrong with its elements: the numerical assumptions, or the estimated coefficients.
• The algorithm used for solving the model leads to absurd values (more on this later).
• The behavioral equations are wrongly specified. As we shall also see later, it can be dangerous to put together elements without a previous assessment of the associated mechanisms (for instance using logarithms as a natural solution).
It is necessary, however, to distinguish these absurd cases from those where the discontinuity applies to the derivative
of a variable differentiable by pieces, as we are going to see in the following paragraph.
1.2.9.2 Differentiability
It is less necessary, but its absence can lead to problems in the system solving phase, as well as in the interpretation of
results.
Separating from the previous criteria is not always straightforward, as the non-derivability of one variable can
correspond to the discontinuity of another: a discontinuous marginal productivity can make the associated production
non-differentiable at points of discontinuity.
Returning to the example, we could formalize household consumption in the following manner:
CO_t = c0 · a · Q_t + (c1 − c0) · max(0, a · Q_t − R_t)
At the point Q = R / a, CO is not differentiable (the derivative to the left is c0 · a, to the right c1 · a), and the sensitivity of consumption to income is not continuous.
This derivative is not purely formal: it defines the marginal propensity to consume (consumption associated to a
unitary income increase), which can appear itself in the model, at least as a descriptive element.
At the household level, the evolution of income tax as a function of revenue (with rates associated to brackets) would
represent another example, determining disposable household income.
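A short Python sketch of this kink (coefficients c0, c1, a and the threshold R are hypothetical): the one-sided numerical derivatives of CO with respect to Q jump from c0 · a to c1 · a at Q = R / a.

import numpy as np

def consumption(Q, c0=0.8, c1=0.4, a=0.7, R=70.0):
    """CO = c0*a*Q + (c1 - c0)*max(0, a*Q - R); the kink is at Q = R/a."""
    return c0 * a * Q + (c1 - c0) * np.maximum(0.0, a * Q - R)

def one_sided_derivatives(Q, eps=1e-6):
    left = (consumption(Q) - consumption(Q - eps)) / eps
    right = (consumption(Q + eps) - consumption(Q)) / eps
    return left, right

print(one_sided_derivatives(70.0 / 0.7))   # about (0.56, 0.28): c0*a on the left, c1*a on the right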
The existence of a solution is obviously necessary, at least when the model is provided with acceptable assumptions21. But the potential absence of a solution is present in many formal systems, including linear models. This absence of solution is generally logically equivalent to the existence of an absurd solution, as one can illustrate on the following case.
Let us consider a model with n+1 endogenous variables: X (dimension n) and x (a single variable). We shall describe it as f, a vector of formulas (dimension n+1), in which x appears as an argument of a logarithm. If none of the positive values of x ensures the solution of the complete model, it has no solution: replacing this argument of the logarithm by a parameter
and making it vary in R+, solving the associated model on x and X will never provide a value of x equal to this
parameter.
21 Refusing to provide a solution for absurd assumptions should rather be considered as a quality.
But if the model builder has used a formulation in logarithms, he has probably not considered letting the argument
take negative values. By replacing the logarithm by some other expression giving similar values, we would probably
have obtained a solution. But if the variable remains negative, this solution would have been unacceptable.
To illustrate this case, we are going to reduce the usual model to a three equations version.
Production adapts to demand corrected by imports and exports, the last being exogenous:
[1] Q + M = FD + X
as for demand, one supposes that its relative variations are proportional to those of production:
[2] Log(FD) = a · Log(Q) + b
and imports are supposed to represent a share of demand:
[3] M = c · FD
Let us suppose that one has obtained by estimation in the past: a = 1.05 and b > 0, justified by a level and growth of
demand generally superior to production, obviously associated to imports greater (and growing faster) than exports.
[1'] FD = 1/(1 − c) · (Q − X)
[2'] FD = Q^a · exp(b)
from (2)
and
Obviously, if Q grows (as a − 1 = 0.05), the negative element will eventually become higher than the positive one, which means that Q can only be negative, which is impossible as it enters in a logarithm in equation (2). The model has no solution.
Of course, these mathematical observations have an economic counterpart. In the long run, final demand cannot
grow continuously faster than production, if imports are a share of demand and exports are fixed. Assumptions,
therefore, are not consistent with the estimated formula.
One will notice that the absence of solution is due here to the implicit adoption of a condition verified numerically on
the past, but not guaranteed in general. This will be in practice the most frequent case.
The uniqueness of the solution, for given (and reasonable) values of parameters and assumptions, is also very
important. Indeed, we do not see how one could use a model which leaves the choice between several solutions,
except maybe if this freedom has a precise economic meaning.
In practice, most models are highly nonlinear if you look at the equations, but the linear approximation is rather accurate within the domain of economically acceptable solutions. This limits the possibility of multiple equilibria: if the system was fully linear, and the associated matrices regular, there would indeed be a single solution. However, as we move away from this domain, the quasi-linearity disappears, and we cannot eliminate the possibility of alternate solutions, probably far enough from the reasonable solution to appear quite absurd. Fortunately, if we start computations inside the domain, an efficient algorithm will converge to the acceptable equilibrium, and we will never even know about any other.
The most significant exception will be that of optimization models, which look for values of variables giving the best
result for a given objective (for example the set of tax decreases which will produce the highest decrease in
unemployment, given a certain cost): if several combinations of values give a result equal in quality 22, this lack of
determination will not undermine the significance of the solution. The existence of several (or an infinity of) solutions
will represent an economic diagnosis, which will have to be interpreted in economic terms23.
Another case appears when the formula represents the inversion of another formula giving a constant value, at least on a certain interval. For example, if over a certain threshold of income households save all of it:
22
For instance, if the model is too simple to differentiate the role of two taxes.
23
provided the algorithm used for solving the model is able to manage this indetermination.
CO = min( f(R), CO* )

(with R the income level). Then the income level associated with CO* is not unique: the whole set of values higher than the threshold gives the same consumption.
In the general case, the main danger appears in sensitivity studies: if one wants to measure and interpret the
economic effects of a modification of assumptions, the existence of a unique reference simulation is an absolute
necessity.
Finally, finding several solutions very close to each other might come from purely numerical problems, due to the
imprecision of the algorithm: any element of the set can then be accepted, if the difference is sufficiently low.
The convexity of the system, which is the convexity of the evolution of each endogenous variable with respect to each exogenous variable and parameter taken individually (or to a linear combination of them), can be requested by some algorithms, especially in optimization. In practice it is very difficult to establish, and actually rarely verified. At any rate, this characteristic is linked to the definition of variables, and a single change of variables might make it disappear.
In addition to its theoretical validity, the model will have to meet a set of more technical constraints.
a - on the endogenous between themselves: one cannot let the model compute variables independently if they are
linked by a logical relationship, accounting or theoretical. For example, if the consumer price enters in the
determination of the wage rate, it also will have to be influenced directly by the (estimated) price of local production.
Or the employment level has to affect household revenue and consumption through a sequence of links.
Accounting balances must be verified: once household revenue has been computed as a sum of elements, an increase
in consumption must produce the associated decrease in savings.
Maybe the most important issue lies with the « supply = demand » identity, which will have to be enforced both at constant and at current prices. This can lead either to using one of its elements to balance the equation, or to distributing the residual over the whole set of elements on one side. Formulating total supply and demand as:

O = Σ O_i  (i = 1, …, n)

and

D = Σ D_j  (j = 1, …, m)

one will either compute the last demand element as the balance:

D_m = O − Σ D_j  (j = 1, …, m−1)

or correct the set of demand variables by multiplying each of them by the ratio O / D computed before the correction.
In most cases the equilibrium at constant prices will be enforced automatically. It can be written as:
Local GDP + Intermediate consumption + Imports = Local final demand + Intermediate consumption + Exports
• With only one product, intermediate consumption can be discarded, and one will generally use the equation to compute GDP, checking that it does not get higher than productive capacity24.
β’ With several products, we must consider as many equilibrium equations, in which the supply of intermediate
consumption goods sums inputs needed for production of the good, and the demand for intermediate
consumption goods sums the intermediate uses of the good itself.
24
This can be obtained by a share of imports growing with constraints on local productive capacity.
If we suppose that returns to scale are constant, the vector of value added by good will come from a matrix
transformation. The constraint on capacity will be achieved in the same way as above (provided a capacity equation
can be obtained).
Defining c(i,j) as the quantity of good i needed to produce one unit of good j, C as the matrix of these coefficients and tC as its transpose, we get in matrix terms:

Q + tC · Q + M = FD + C · Q + X

or

Q = (I − C + tC)⁻¹ · (FD + X − M)
Using this framework will automatically enforce the supply-demand equilibrium for all goods.
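As an illustration, the following Python sketch applies this computation to a purely invented three-product example; the coefficients and the demand figures are arbitrary and only meant to show the mechanics.

import numpy as np

# c[i, j]: quantity of good i needed to produce one unit of good j (invented values)
C = np.array([[0.10, 0.05, 0.00],
              [0.20, 0.15, 0.10],
              [0.05, 0.10, 0.20]])

FD = np.array([100.0, 80.0, 60.0])   # final demand by product
X  = np.array([ 20.0, 10.0,  5.0])   # exports by product
M  = np.array([ 15.0, 25.0, 10.0])   # imports by product

# Q = (I - C + tC)^-1 (FD + X - M)
I = np.eye(3)
Q = np.linalg.solve(I - C + C.T, FD + X - M)
print("value added by product:", Q.round(2))

# the supply-demand identity then holds for every product
supply = Q + C.T @ Q + M
demand = FD + C @ Q + X
print("equilibrium enforced for all goods:", bool(np.allclose(supply, demand)))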
In practice, most of the problem comes from the equilibrium at current prices. If demand prices are computed
individually using behavioral equations, there is no chance the equilibrium will be met. The process described earlier
will in practice correct the prices. With S and D as supply and demand elements at constant prices, ps and pd as the
associated deflators, we can compute the global values as:
SV = Σ ps_i · S_i  (i = 1, …, n)

DV = Σ pd_j · D_j  (j = 1, …, m)

where the "pd" elements are the independently computed demand prices. One then computes the ratio:

r = SV / Σ pd_j · D_j  (j = 1, …, m)

and corrects the demand deflators, the "pd'" elements being the corrected values:

pd'_j = r · pd_j

which gives a set of equations ensuring the equilibrium. As "r" measures the potential discrepancy between supply and demand, one must check that it is not too different from one.
• The first method requires choosing the element which will balance the equation. It can be:

o A small and unimportant variable, to reduce the consequences for model properties; perhaps even a variable which has absolutely no influence on the rest of the model.

o A variable with a large value, to reduce the size of the correcting factor.

• The second method represents an extreme application of the first one, in which all variables on one side are affected in the same proportional way.

Actually, none of the solutions dominates clearly, the worst being in our sense the very first, which amounts to accepting de facto an imbalance, hidden but with potentially damaging consequences. Also, the second could be associated with a converging economic process, while the first can have no economic interpretation whatsoever.
In fact, one should concentrate on limiting the size of the correction itself. One could represent the problem as
eliminating dust: instead of storing it in a specific location (under a carpet), or spreading it all over the place, the best
solution is clearly to reduce its production as much as possible. This means that the initially computed prices should be
designed to give naturally close values to global supply and demand.
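To make the correction process concrete, here is a small Python sketch on invented figures: it computes the ratio r and applies it to the independently computed demand deflators, checking that the corrected demand value matches total supply.

import numpy as np

# invented supply and demand elements at constant prices, with their deflators
S,  ps = np.array([120.0, 40.0]),       np.array([1.02, 1.10])
D,  pd = np.array([90.0, 35.0, 30.0]),  np.array([1.05, 1.00, 1.12])

SV = ps @ S                  # total supply at current prices
r  = SV / (pd @ D)           # discrepancy with the uncorrected demand value
pd_corrected = r * pd        # pd'_j = r * pd_j

print("correcting ratio r:", round(float(r), 4))   # should stay close to one
print("equilibrium enforced:", bool(np.isclose(pd_corrected @ D, SV)))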
b - on exogenous-> endogenous connections: Connections must be formulated with care. For example, if the social
contributions rate is defined as an exogenous variable in the model, it has to enter in all computations of contribution
levels. In particular, it cannot coexist with an exogenous representation of contributions, or one using an estimated
coefficient.
To avoid this type of error, a systematic study of model properties must be undertaken before any operational
application: in our example, this would mean checking that an increase of the social contribution rate has all expected
effects on State revenues as well as on the accounts and behaviors of concerned agents.
Also, the true exogenous concept should be decided. Concerning contributions, the decision variable is clearly its rate,
while the associated revenue is influenced by endogenous prices and employment.
c - on the exogenous between themselves: one should avoid defining two variables as exogenous if they are linked (in
any direction) by a logical relationship. If possible, one should endogenize one of them by formalizing this connection.
Let us suppose for example that a model for France uses two exogenous measures of the prices established by its
foreign competitors: in foreign currency and in Euros (with a fixed exchange rate). To take into account an increase of
foreign inflation, these two variables will have to be modified simultaneously. This is at best more complex, and can lead to errors if one is not careful enough, while it can be avoided simply by endogenizing the price in Euros as the product of the price in foreign currency by the (exogenous) exchange rate.
However, establishing such links is not always possible. For instance, in a national model, foreign prices and foreign
production are exogenous, but also clearly influenced by each other. But the nature and importance of the link are
highly variable. For instance, a decrease in foreign production can produce world deflation,25 while inflation can
reduce exports and production. To describe them completely one should have to resort to a foreign or world model.
An intermediary solution could be to establish a set of linear multipliers linking these elements, but generally the
model builder himself will take care of the problem by producing a set of consistent assumptions (with perhaps some
help from specialists of the world economy, or from a separate model).
25
This is the case for the MacSim world model we shall present later.
d - on endogenous-> exogenous connections: they are obviously proscribed, because contrary to the preceding links
the model builder cannot master them. They will be found in some models, however, through the presence of the
following exogenous:
β’ Elements measured in constant terms, while they should change with economic activity.
β’ Deflators, which should depend on other deflators.
β’ Elements measured in current terms, for both reasons.
Even if the associated model can produce correct estimates and even forecasts, it runs the risk of showing abnormal sensitivity properties. Let us take an example, in which household income is the sum:

• of the wage revenue, computed as the product of employment by the wage rate: LT · W;

• of other revenues, named HIQ, kept exogenous in current terms.
Salaries will be indexed perfectly on prices:

W = w0 · CPI

This equation might perform well in forecasts. But if a change in the assumptions makes prices increase, the purchasing power of total wages will remain unchanged, while for the complement HIQ it will be reduced in the same proportion as the price rise:

Δ(HIQ / p) = −(Δp / p) · (HIQ / p)

One can question this assumption. Some elements in non-wage revenue (social benefits, rents, firm owners' profits, independent workers' revenue) are more or less indexed, and can even be over-indexed in the case of interest payments (the interest rate should increase with inflation). Others, associated with deferred payments (dividends, income tax), will not change immediately. The global sensitivity to prices is not clear, but a null value is obviously not correct.

The same applies to economic activity: we cannot suppose that this revenue does not change (grow) with it. Some elements do not, or show a limited sensitivity (pensions), but dividends and the revenue of owners of small firms certainly do.
In conclusion, even when a variable measured at current prices has no theoretical content, it should not be kept exogenous, especially if it can be supposed to grow at constant prices. It is generally better to consider as exogenous its ratio to another variable supposed to follow the same trend (in the absence of a better idea, one can use plain GDP). The model equation will then compute the variable by applying the exogenous ratio, and the introduction of Q links the additional revenue to the global growth of the economy. This can also be valid for variables at constant prices (which generally increase with production), with the exception of decision variables identified as such.
1.2.10.2 Homogeneity
If some equations in a model do not meet homogeneity constraints, this endangers its properties, particularly its
sensitivity to shocks. Let us quote some cases:
CO (consumption at constant prices) = a · HRI (current income) + b

is not only absurd from a theoretical viewpoint, but will lead in the long term to an absurd evolution of savings. Likewise:

CO = a · Log(HRI) + b

(this time with the two elements measured in quantities) makes the ratio CO / HRI decrease to 0, and therefore the savings rate grow to 1, when HRI grows indefinitely.
This last example shows however a limit to the argument: on short periods the equation can present a satisfactory
adjustment, as the consumption to income ratio (propensity to consume, complement to 1 of the savings rate)
decreases effectively with income. It is the speed of the decrease, and its long-term evolution, which is questioned
here.
The problem is identical to that of the exogenous with dimension. It invites a careful study of the theoretical content
of the constant. Furthermore, as most variables grow with time, the influence of the constant will generally decrease
or even disappear in practice. We shall address this issue later, on a practical case.
Once equations have been estimated, the problem of normalization remains. We have seen that very often the estimated formula will not explain a variable, but an expression (logarithm, growth rate, ratio, or a more complex expression). But some simulation algorithms require the model to take a specific form, called "identified", in which a single untransformed variable appears on the left-hand side:

y_t = f_t ( y_t , y_{t−1} , x_t , θ , u_t )
This means the model builder might have, after estimation, to transform the formulation: this operation is called the
normalization of the model.
β’ The application of some solution algorithms is made easier. In some cases (Gauss-Seidel), this form is actually
requested.
β’ This type of formulation allows a better interpretation of the process determining the equilibrium, provided
each equation can be interpreted as a causal relation. If the equation describes a behavior, the economist
should have placed to the left the element it is supposed to determine, conditional on the elements on the
right. This is what we can (and will) do naturally in our example. For instance, the equation describing the
choice by households of their consumption level will place naturally the variable "consumption" to the left.
The vast majority of equations will take naturally an identified form. Sometimes, a simple transformation will be
necessary, however. Perhaps the most frequent nonlinear operator is the logarithm, associated with the integration of
a formula in elasticities.
dx / x = f(. . .)

represents

Log(x) = ∫ f(. . .)

and one will normalize

Log(x) = g(. . .)

into

x = exp( g(. . .) )
If you use EViews26, the software will do it for you. You can write the equation using the first form, and the software will normalize the equation itself, computing x. This is also true if the left-hand element contains several variables but allows a straightforward normalization; the most frequent cases are the logarithm, the growth rate and the ratio to another variable.
To choose which variable to compute, EViews will take the first variable in the specification of the equation. This simple method will be applied even if that variable has already been identified as computed by a previous equation. For instance, in our model, if we introduce the estimation of imports M, then state:

Q + M = FD + X
26
Or most packages of the same type.
Moreover, when an equation is forecasted individually, one can choose between computing the left-hand term and the element which determines it, for instance M or Δlog(M) for our imports equation.

However, EViews does not solve analytically any equation for the variable. For instance:

M / (Q + M) = f(. . . .)

will have to be rewritten by hand as:

M = (Q + M) · f(. . . .)

In the same way, an implicit equation:

f(y, . . . .) = 0

can be given the identified form:

y = y + f(y, . . . .)

However, the convergence of a model defined in this manner is often difficult to obtain (for instance if "f" is positively linked to y). In that case, one can use (the value for "a" can be negative):

y = y + a · f(y, . . . .)
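A small Python sketch, on an invented scalar equation rather than an actual model, shows why this relaxation parameter matters: with a = 1 the iteration diverges, while a small negative value makes it converge when f is positively linked to y.

def f(y):
    return 3.0 * y - 10.0          # invented implicit equation f(y) = 0, with root y = 10/3

def iterate(a, y=0.0, n=30):
    for _ in range(n):
        y = y + a * f(y)           # identified form  y = y + a * f(y, ...)
    return y

print("a = 1.0  ->", iterate(1.0))    # diverges
print("a = -0.2 ->", iterate(-0.2))   # converges towards 10/3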
Stronger simplifications are sometimes possible and will be approached with the numerical solution process.
Identification is not always economically straightforward: in our example, when balancing demand and supply, we can observe that the three last variables (final demand, exports and imports) are going to be determined by their own equation (the sum of its elements for the first, estimated equations for the others). This means that balancing must be done through GDP, and we must write the equation as:

Q + M = FD + X

or

Q = (FD − M) + X

which makes its theoretical content clearer: production must (and can) satisfy both exports and the non-imported part of domestic demand.
1.2.12 CONCLUSION
It must be clear by now that the formal definition of the whole set of equations represents, together with the estimation of behavioral equations, an iterative and simultaneous process:
β’ Behavioral equations start from an initial theoretical formulation to evolve gradually to their final form by
reconciling this theory with data and estimation results.
β’ Accounting equations have been defined as precisely as possible in the preliminary phase, to establish a
coherent framework, but they often will have to adapt to the evolution of behavioral equations. Let us
suppose for example that the first estimation results suggest excluding from the econometric explanation of
exports their agricultural component, setting it as exogenous: a new equation and variable will appear, and
the equation for total exports will become an identity.
2 CHAPTER 2: MODEL APPLICATIONS
We shall now give a panorama of applications using models. Comments will be centered on the example of economic
models, and more particularly on the macro-economic ones. But most of the observations can be transposed to the
general case.
For each of these applications, technical details shall be left to the "implementation" part (chapter 7). To understand
these practical aspects of the use of models, one must first know about the way they are built, described later in
chapters 4 to 8.
The most natural use of a model seems to be the evaluation of the economic future, whether as its most probable
evolution or as the consequences of some decisions. Assumptions concerning the future will be injected into the
model, and its solution will produce the requested diagnosis. Thus, one will seek to anticipate the evolution of the
main aggregates of the French economy until the year 2020, taking into account assumptions on the evolution of
international economy.
• In a scenario, one is interested in absolute results, associating to a full set of assumptions a future evolution of the economic equilibrium. One might seek to obtain, for instance, the most probable image of the future.
• On the contrary, with a shock, one starts from a base simulation (often called "reference forecast" or "baseline"), or a simulation on the historical period, and measures the sensitivity of the economic equilibrium to a change of assumptions. Two economic paths will then be compared (on the past, one of them can be the historical one).
These shocks can be more or less complex, from the modification of a single assumption to the testing of a new
economic policy27.
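The mechanics of a shock can be illustrated by a small Python sketch, using a deliberately tiny and invented linear model (not the model built in this book): a baseline is solved, one assumption is changed, and the difference between the two equilibria measures the response.

def solve(g, x, c1=0.6, m=0.25):
    # toy model:  FD = c1*Q + g ;  M = m*FD ;  Q = FD + x - M  (solved analytically)
    q = ((1.0 - m) * g + x) / (1.0 - (1.0 - m) * c1)
    fd = c1 * q + g
    return {"Q": q, "FD": fd, "M": m * fd}

baseline = solve(g=100.0, x=50.0)
shocked  = solve(g=101.0, x=50.0)      # +1 on the exogenous demand component g

for var in ("Q", "FD", "M"):
    print(var, "response to the shock:", round(shocked[var] - baseline[var], 3))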
These two techniques, scenarios and shocks, before the production of any operational policy diagnosis, will play an
important role in the model validation process.
Now that we have described the characteristics of models and their basic use, we shall discuss the advantages they
bring (and their failings too).
Relative to the diagnosis provided by a human expert, the advantages common to all models are that they will:
27
However, this new policy should stay within the economic framework of the original model.
β’ Guarantee the accounting coherence of the resulting equilibrium.
β’ Consider a practically unlimited number of interdependent influences.
β’ Provide an explicit formalization of behaviors, allowing an external user to interpret them.
β’ Produce an exact and instantaneous computation of associated formulas.
β’ Adapt immediately the full system to a local change of theoretical formulation.
They will also benefit from:
β’ The progress of economic theory, allowing the formalization of more sophisticated mechanisms, better
adapted to the observed reality.
β’ The progress of econometrics, giving access to the statistical method that will produce the most reliable
formulation associated with a given problem, and to test more complex assumptions.
β’ The improvement of numerical algorithms, both for computation speed, and solving more complex systems.
β’ The simultaneous improvement of computation hardware allowing to process problems of growing size, by
increasingly complex methods.
β’ The progress of modelling science, in producing models better adapted to the original problem, facilitating
the production of assumptions, and reducing the cost of reaching acceptable solutions.
β’ The production of computer software specialized in model building, increasingly efficient, user-friendly, and
connected with other packages.
β’ The improvement of the reliability of data, and the growth of the available sample, regarding both the scope
of series and the number of observations (years and periodicity)28.
β’ The easier communication between modelers, through direct contact and forums, allowing to communicate
ideas, programs and methods, and to get the solution to small and large problems already addressed by
others.
However, the use of models has engendered criticism from the start, often using the term « black box », which describes the difficulty in controlling and understanding a set of mechanisms often individually simple but globally very complex. In recent decades criticism has mounted to the point of calling for a global rejection of traditional ("structural") models. Surprisingly, critics often find their arguments in the above improvements. One can find:
A utilitarian critique: models have proven unable to correctly anticipate the future. If this observation appeared at the beginning of the eighties, it is obviously not because the quality of models had declined. But information on model performance is more accessible (some systematic studies have been produced), and the fluctuations following
28
However, the size of samples does not necessarily grow with time. In a system of national accounts, the base year
has to be changed from time to time, and the old data is not necessarily converted.
the first oil shock have made forecasting more difficult. In periods of sustained and regular growth, extrapolating a
tendency is very easy for experts as well as for models. Paradoxically, the emergence of this criticism has followed,
rather than preceded, the increasingly direct intervention of model builders and their partners in forecasting results.
An econometric critique: modern techniques require a quantity and a quality of observations that available samples do not provide. A gap has opened between the estimation methods judged by econometrics theoreticians as the only acceptable ones, and the methods really applicable to a model29.
A theoretical critique: the development of economic theory often leads to sophisticated formulations that available information has difficulty validating. And in any event many areas present several alternate theories, between which any choice runs the risk of being criticized by a majority of economists. Thus, in the monetary area, going beyond a basic framework leads to relying on information unavailable in practice, or on formulations too complex to be estimated.
A mixed critique: users of models are no longer passive clients. They criticize formulations, as to their estimated specification or their numerical properties. This evolution is paradoxically favored by the improvement of the logical interpretation of economic mechanisms, itself fostered essentially by economic knowledge (even economic magazine articles use implicit macroeconomic relations) and by modelling practice (the population of clients includes more and more former model builders, or at least followers of courses on modelling). One could say that model users ask the tool to go beyond their own spontaneous diagnosis, and they want this additional information to be justified.
It is clear that these criticisms grow in relevance as the goal grows in ambition. Forecasts are more vulnerable than
simple indicative projections, which seek to cover the field of the possible evolutions. As for policy shock studies, they
are not prone to errors on the baseline assumptions, if we discount non-linearities30.
This relevance will also depend on credit granted to results. One can use figures as such, or be content with orders of
magnitude, or even simply seek to better understand global economic mechanisms by locating the most influential
interactions (possibly involving complex causal chains). In our sense, it is in this last aspect that the use of models is
the most fruitful and the least disputable31.
Contrary to previous models, theoretical models may be built for the single purpose of formalizing an economic
theory. It may be sufficient to write their equations, associating to a theoretical behavior a coherent and complete
system. Reproducing the observed reality is not the main goal of these models, and it is not mandatory to estimate
parameters: one can choose an arbitrary value, often dictated by the theory itself. In fact, this estimation will often be
technically impossible, when some of the variables used are not observed statistically (the goals or expectations of
agents for example).
29
Actually, the sample size required by present techniques (50 or better 100 observations) limits the possibility of
estimating equations using deflators or variables at constant prices. Even using quarterly data, separating values into
prices and volumes is quite questionable 15 years from the base period.
30
With a linear model, the consequence of a shock depends only on its size, not on the simulation it starts from.
31
One example is the impact of a decrease in local tariffs. Ex-ante it increases imports (a negative demand shock). Ex-
post it decreases local factor costs (with cheaper investment and cheaper labor, indexed on a lower consumption
price). This leads to more local capacity and competitiveness, both on the local scene (limiting the imports increase),
and the foreign one. In most models, GDP decreases then grows.
The full interpretation of such a shock provides a lot of information, even if one remains at the non-quantitative level.
However, even based on an artificial series and arbitrary parameters, the numerical simulation of these models can be
interesting. Actually, the formulas are often so complex that solving the model numerically will be necessary to
observe its solutions as well as properties (such as the sensitivity of solutions to assumptions and to coefficients).
These models represent an intermediate case. One seeks a realistic representation of the economy, adapted to observed reality, but sufficiently simple to allow the application of complex analysis methods (and the interpretation of their results). In addition to scientific research, this study can be done to measure and analyze the properties of an operational model on a simplified representation (in the eighties Mini-DMS, then Micro-DMS, have been used to characterize the Dynamic Multi Sectorial model of INSEE).
β’ βExternalβ methods will use model simulations to observe its quantitative properties, and infer a descriptive
comment, both statistical and economic.
β’ βInternalβ methods seek to explain properties of the model by its structural characteristics, using
mathematical tools. This does not necessarily call for actual simulations.
Although often of the same type as the ones above, these models try to present economic mechanisms as completely as possible, based on real data, under an interpretable and concise form. If necessary, one will favor the message contained in the presentation over the respect of statistical criteria.

This is the case of the MacSim package, which allows students to interpret international mechanisms and interactions.
3 CHAPTER 3: MODEL TYPES
We shall now try to establish a classification of models, focusing on the link between the modelβs characteristics and
the goal for which it has been built.
The field described by a model is characterized by the variables it computes, but also by assumptions it takes into
account.
β’ A geographical field: national models, multinational models, world models. These last can be built in two
ways: by putting together preexisting national models, with potentially quite different structures, or by
building simultaneously country models of identical structure, possibly with a single team of modelers. We
shall deal with this later.
β’ A theoretical field: the theory used for the formalization of the model may or may not approach specific
economic aspects. A Keynesian model might limit the treatment of monetary aspects. A short-term model
will not formalize demographic evolutions.
β’ A field of units: a model might present only variables at constant prices, or physical quantities like barrels of
oil or number of pigs.
• A field of agents: a model might describe the behavior of a single agent: households, the State, firms.
β’ A field of goods: a model might consider only the production and the consumption of one good, for example
energy. An energy model can use physical units.
There are other types of fields. However, the distinction is not always easy: some models will describe summarily a
global field, except for a certain aspect on which it will concentrate. An energy model, to consider interactions with
the rest of the economy, will have to model it also, but not in the same detail. And it can mix physical units (barrels of
oil or gigawatts) with national accounts elements, with obvious conversion problems.
On the other hand, it will always be possible, and made easier by some modelling packages, to change (actually to
restrict) at the time of simulation the scope of the model. The distinction is then no longer permanent: a multi-
national model can be used to simulate a complete evolution of the world economy, but its user can also restrict
calculations to the evolution of a group of countries or even a single one, the other elements being fixed. One can
simulate a model of the real economy with or without additional monetary features. Or a model using normally
rational expectation elements can drop them to become purely backward looking.
The history of modelling shows that for a long period new models generally have seen their size grow, for the reasons
cited earlier: the progress of model-building techniques, the increased availability of data, the faster computer
computations. Additionally, for any given model, size increases regularly in the course of its existence, as new team
members want to add their contribution.
However, the last decades have seen a trend favoring a return to models of limited size. Productivity improvements,
requested from teams of model builders, are less and less compatible with the management of a large model. Despite
the progress of model-building techniques, the desire to reduce costs and delays conflicts with the size, especially (but
not only) regarding human operations: elaboration of assumptions and interpretation of results.
Also, the use of a very detailed model can make individual estimations and specifications look too expensive. The
attractiveness of a calibrated CGE model will increase.
Finally, the desire to reply to critics comparing models to "black boxes" leads model builders to look for more explicit
and manageable instruments.
However, paradoxically, the need for detailed explanations, the availability of more detailed data, and the increased
power of computers (both in speed and size of manageable problems) has led to the development of more detailed
(often extremely detailed) tools: the Quasi-Accounting versions, considering generally a large number of products
(possibly hundreds), with a limit depending only on the availability of data. The framework is generally a full input-
output table.
Of course, econometrics is no longer applicable, and formulations are most often rather crude, with behaviors established as exogenous ratios. But this also makes specifications clearer and more manageable, and the properties easier to control. Also, the need for large samples is less pressing.
This issue will be treated in a specific part. Among the cases we will present, we will consider a model of more than
15 000 equations, which can be summarized by collapsing the dimensions into a 50 equations presentation.
The degree of aggregation will not be inevitably uniform: an energy model will use a particularly fine detail for energy
products.
In fact the same model can appear under several versions of different size, depending especially on the degree of
aggregation. Each version has then its proper area of utilization: detailed forecasts, quick simulations, mathematical
analysis, and educational uses.
Thus at the end of the 1980s, the 3000-equation DMS model (Dynamic Multi Sectorial) used by INSEE for its medium-term forecasts had two companion versions of reduced size: Mini-DMS (200 equations), used for some operational projections and analyses which did not require detailed products, and Micro-DMS (45 equations), with an essentially educational purpose.
This distinction has lost most of its validity, however, following the reduction of the size of operational models.
3.3 THE HORIZON
If a model is designed for forecasting, its horizon will be defined at the construction of the model. It will be strongly
linked to its general philosophy and to the set of mechanisms it implements. A long-term model will be little
interested in circumstantial phenomena (such as the lags in the adjustment of wages to prices), while a short-term
one will not consider the longest trends (such as the influence of the economic situation on demography).
These differences seem to discard elaborating a model that can be used for both short - and long-term projections.
But we shall see that strong reasons, in particular econometric, have made this option appear as the most natural in
the present context. We will develop them when we address periodicity, in paragraph 3.4.
In any case, one can find a certain asymmetry in the relevance of this observation. While long-term models can neglect intermediate periods if these do not show significant fluctuations, simulating the periods beyond the operational horizon of a short-term model can evidence future problems, already present but not yet visible.
It is clear:
β’ That treating medium or even short-term problems calls for a model with stabilizing properties, which can
only be controlled through long-term simulations. This includes in particular controlling the existence and
speed of numerical convergence and evidencing cyclical properties.
β’ That observing long-term properties has to be complemented by the intermediary profile. Again, a long-term
stabilization can be obtained through a monotonous or cyclical process.
Here, the horizon depends on the type of analysis one wants to produce. Often, to analyze a model built with a given
forecasting horizon, simulation over a longer period must be obtained. Even more than for forecasts, analytic shocks
will show and explain anomalies that were not apparent in the normal projection period, but had already a
significantly harmful influence. We shall stress these issues later.
32
One shall notice that we can use several words to characterize these exercises: forecasts, projections, scenarios, simulations. It all depends on the purpose for which the test was made, and perhaps the trust allowed to the results. We favor the last term, which unfortunately has to be completed into: « simulation over future periods ».
3.4 THE PERIODICITY

The periodicity of a model is linked to the mechanisms it seeks to study and therefore to its horizon.
Short-term models require a short periodicity to consider circumstantial phenomena: delays in the wage indexation
on prices, progressive adjustment of the consumption level to an increase of income.
Long-term models can use a sparser periodicity, less for theoretical reasons (long-term behavior can be described by a
short-periodicity model), than for technical ones: this choice will reduce constraints on the availability of series,
facilitate the production of assumptions, and limit simulation costs.
However, we shall see that the use of "modern" econometric methods calls for a short periodicity, for all kinds of models, as soon as estimations are considered.
This means that the main determinant of model periodicity comes from the data. Countries which produce quarterly
national accounts use quarterly models, which allow them to apply modern techniques with some comfort, and
produce both short and long-term studies. Of course, results can be summarized later in yearly tables.
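On that last point, a small Python sketch (using pandas, on an invented quarterly series) shows one way of summarizing quarterly results into yearly tables; whether one averages or sums the quarters depends on how the flows are measured.

import numpy as np
import pandas as pd

# invented quarterly series (e.g. GDP at constant prices, measured at annual rate)
quarters = pd.period_range("2015Q1", "2020Q4", freq="Q")
gdp = pd.Series(np.linspace(100.0, 130.0, len(quarters)), index=quarters, name="gdp")

yearly = gdp.groupby(gdp.index.year).mean()   # use .sum() if the quarters are not at annual rate
print(yearly.round(2))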
When only yearly accounts are available, the techniques become more simplistic, and true short-term applications are not possible. Unfortunately, this applies most often to countries with a short history of statistics, which makes the problem even harder.
We have essentially concentrated on the macro-economic model case. One can also find:

• Micro-economic models, describing the behavior of individual agents (households, firms). These models will sometimes be more theoretical, calling for optimization computations (such as cost minimization) or for elements of strategy (game theory). They will often be estimated on survey data, with very large samples.
β’ Non-economic models: they can apply to biology, physics, chemistry, astronomy, meteorology, ecology,
process control, and so on.... and can be used to evaluate the consequences of building a dam, controlling a
manufacturing process, looking for the best organization of a project, describing a biological process. These
models will often be conceived not as a formalized equation system, but as the maximization of a criterion
under some constraints, or as a system of propositions connected by logical operators.
4 CHAPTER 4: GENERAL ELEMENTS
This part of the book describes the process of development, use and management of a model, taking special interest
in technical aspects and particularly computer-oriented features. Application to EViews will be presented in detail, but
most of the teachings can be applied to other packages, including those which are not dedicated to econometric
structural modelling.
But let us give first a quick description of the organization of the model building process.
The first step in the building of any model is producing a draft which ensures some compatibility between available
data (wherever it might come from) and the type of model its builder has in mind (goal, field, nature of the variables,
underlying theory).
Knowing the scope of available data, the economist will define a model framework for which values can be attributed to all variables, either by using available elements, by computation, or as a last resort by relying on expert advice (including the modeler himself). This means that a first decision has to be made as to the field described by the model, the variables
used as assumptions, and the variables it shall compute. Moreover, he must divide the equations into identities, which
set indisputable links between variables, and equations describing the behavior of agents, for which the final
formulation will be based on past evolutions of the associated elements. In the course of model building, this status
can change.
The first task will be to gather, by reading from files and transforming the data, the full set of variables needed by the
model, to define the form of the identities, and give a first assessment of the behaviors he intends to describe. He
shall check for which periods the necessary data is known, and that on these periods identities hold true. If some
elements are not available, he will use the best proxies he can get. And if this also fails, he will use his imagination.
He can also make a first economic analysis of the framework implied by model specifications. This is greatly helped by
EViews which can give essential information on the modelβs logic, even in the absence of any data.
4.1.2 ESTIMATION
The second phase will look for a satisfying description of the behavior of agents, by checking economic theory against
available data. The modeler shall define a set of formulations with unknown parameters, compute for each
formulation the values which give the best explanation of past evolutions, and make his selection, using as criteria
both statistical tests and compliance to economic theory. This process can call for the introduction of new variables, or
changes in some definitions, which will mean reformulating some identities.
Of course, both individual and global consistencies must be applied. For instance, using a Cobb-Douglas production
function implies considering the global cost in the equation for the output deflator.
Once the full model is defined, one can try to solve it.
β’ One shall first check for consistency the set of equations, data and parameters, by applying each formula
separately on the sample period. If the estimation residuals have been introduced as additional elements, the
process should give the historical values in all cases.
β’ One shall then simulate dynamically the full model on the same period, setting (temporarily) the residuals to
zero. This will show if considering current and lagged interactions does not amplify too much the estimation
errors, both on the current period and with time. Using an error correction framework should limit the risk of
divergence.
• Finally, the reactions of the equilibrium to a change in assumptions, for instance the exogenous component of demand, will be measured. The results will be compared with the teachings of economic theory, and with what is known of the values given by other models, moderated by the characteristics of the country. However, one should not spend too much time here, as simulations over the future will provide a much better context. (A small sketch of the first two checks is given below.)
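The sketch below (in Python, on an invented two-equation example rather than an actual estimated model) contrasts the first two checks: applying each formula separately with its stored residual reproduces history exactly, while a dynamic simulation with residuals set to zero lets errors accumulate.

import numpy as np

# invented "historical" data and a toy model:  C(t) = a*Y(t-1) + residual ;  Y(t) = C(t) + G(t)
a = 0.8
G = np.array([20.0, 21.0, 22.0, 23.0, 24.0])
Y_hist = np.array([100.0, 102.0, 105.0, 107.0, 110.0])
C_hist = Y_hist - G
res = C_hist[1:] - a * Y_hist[:-1]             # residuals of the behavioral equation

# 1) static check: each equation applied separately, with its residual -> history is recovered
C_static = a * Y_hist[:-1] + res
print("static check ok:", bool(np.allclose(C_static + G[1:], Y_hist[1:])))

# 2) dynamic simulation: residuals set to zero, computed values fed back -> errors accumulate
Y_sim = [Y_hist[0]]
for t in range(1, len(Y_hist)):
    C_t = a * Y_sim[-1]
    Y_sim.append(C_t + G[t])
print("dynamic simulation error by period:", np.round(np.array(Y_sim) - Y_hist, 2))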
Discovering discrepancies can lead to changes in some elements of the model, including the set of its variables. This
means going back to step 1 or 2.
Once the model has passed all tests on the past, further tests will be conducted, under conditions more
representative of its actual use: on the future. For this, values will have to be established for future assumptions.
Again, the sensitivity of the model to shocks will be studied, this time with a longer and smoother base, better
associated with future use. As to the reliability of baseline results, one can rely this time on stochastic simulations.
The results of this step can of course show the necessity to revert to a previous stage, including the introduction of
new data, changing causalities, or re-estimation. To limit the number of backward steps, one should introduce in the
original data set all potential variables, and decide on behavioral equations considering the global properties.
Finally, the model will be considered as fit for economic studies: forecasts and economic policy analysis.
From now on, we shall suppose we are using a dedicated package like EViews. But for people who still model through a spreadsheet, most of our observations will still apply.
β’ Methodical option:
o Specifies completely a coherent model (including accounting equations), precisely separating assumptions
from results.
o Looks for the necessary series.
o Estimates behavioral equations.
o Uses the resulting model.
Applying such a framework is obviously illusory, as many backtrackings will be necessary in practice:
o Some series will show up as unavailable, and it will be necessary to replace them or to eliminate them from
formulations. Thus, in the absence of series for interests paid by firms one will have to be content with profits
before interests.
o Some estimations will give unsatisfactory results: it will be necessary to change formulations, to use
additional or alternate series. Thus, a formulation in levels might have to be replaced by a formulation in
logarithms (constant elasticities), or in growth rates. Or one will be led to explain the average monthly wage
instead of the hourly wage, and to introduce in this last explanation the evolution of the minimal wage. For
an oil producing country, it will appear necessary to identify oil (and non-oil products) in both production and
exports.33
o New ideas will appear during estimation. For example, a recent article on the role of Foreign Direct
Investment might lead to test an original formulation.
o Formal errors are going to be identified. Thus, an element (a type of pension) might have been forgotten
from householdsβ income.
o Some variables defined as assumptions are going to appear sufficiently influenced by results to see their
status modified.
o Some causalities will be questioned when observing numerical properties.
o Simultaneities will have to be replaced by lagged influences.
o The size (or even the sign) of the response to changes in the assumptions will be inconsistent with theory.
o The model will not converge, and the specifications will be considered as the cause.
β’ Improvisation
o establish general options for the model structure and theoretical framework,
o produce some formulations independent from each other,
o estimate them by accessing to separate series,
o And gradually connect selected elements by completing the model with linking identities and the data set
with the necessary exogenous variables.
This framework will be even less effective, if only because the number of single operations on equations and series
will present a prohibitive cost. Furthermore, enforcing the accounting and theoretical coherence of the model could
prove difficult, and the modelling process might never converge at all to a satisfying version.
The most efficient organization lies probably in between. One will rather:

o Define as precisely as possible the field and the classification of the model.
o Define its general theoretical options and its goal.
o Obtain, create and store the total set of presumably useful series, with no limitations.
o Establish the domains to estimate, specify the associated variables and set the formal connections, especially accounting relations.
o Undertake estimations.
o And go through changes (hopefully limited) until an acceptable form is obtained.
It is clear that this type of organization is all the easier to implement if:
β’ The size of the model is small: it is possible to memorize the total set of variable names for a thirty equations
model, but for a large model a formal documentation will be necessary, produced from the start and updated
33
Actually, this should have been evident from the start.
regularly. This framework should be discussed in detail by the modelling team and as many outsiders as
possible.
β’ The number of concerned persons is small (the distinction comes essentially between one and several): for a
team project, the role of each participant and his area of responsibility have to be clearly defined. Especially,
physical changes (on both data and model specifications) should be the responsibility of one individual, who
will centralize requests and apply them. And modifications must be clearly stated and documented, through a
formal process.
Individual modifications of the model can be allowed, however, provided a base version is preserved. Thus several
members of a team of model builders can test, one a new production function, another an extended description of
the financial sector. But even in this case updates will often interfere, at the time modifications generated in separate
test versions are applied to the base one. For instance, a new definition of the costs of wages and investment, which
define the optimal shares of labor and capital in the production function, will influence the target in the price
equation.
5 CHAPTER 5: PREPARING THE MODEL
We shall now start with the first (and probably most important task): preparing the production of the model.
One might be tempted to start actual model production as soon as possible. But it is extremely important to spend
enough time at the start evaluating the options and choosing a strategy. Realizing much later that he has chosen the
wrong options, the modeler is faced by two bad solutions: continuing a process leading to a subpar model, or
backtracking to the point where the choice was made.
These choices can concern:

• The organization of tasks, like producing at first single country models, for a world modelling project.
β’ Economic issues, like choosing the complexity of the production function.
β’ Accounting issues, like deciding the decomposition of products, or the distinction into agents.
β’ Technical ones, like the number of letters identifying the country in a world model series name.
At the start of the model building process, the modeler (or the team) has at least some elements available:
β’ The data can be directly available, as a computer file, but not necessarily in the format needed by the
modelling package. Many databases (like the World Bankβs World Development Indicators) are stored on the
producerβs website in Excel or CSV format. In more and more cases, access can be provided from inside
EViews, but this is not necessarily the best option.
β’ Equations may have already been established, either as formulas or even estimated items, if the modeling is
the continuation of an econometric study, produced by the modeler or another economist.
In any case, the first stage in the process should lead to:
β’ A fully defined set of equations, except for the actual estimated formulas.
β’ The corresponding set of potentially relevant data.
Obviously, these two tasks are linked, as equations are established on the basis of available data, and the data is
produced to fit the model equations. This means that they are normally processed in parallel. However, it is quite
possible:
β’ To produce most of the data before the equations are defined. Some concepts (the supply - demand
equilibrium at constant and current prices, employment, the interest rates) will certainly appear in the
model. But some model-specific variables might have to wait.
• To produce the model specification before any data is available. Of course, writing an identity, or stating the equation to be estimated, does not require data. It is only the application (checking that the identity is consistent, or estimating the equation) which does. But one must be reasonably sure that the data will be available, or that there will be a reasonable technique to estimate it.
One can even produce a first version of the program transforming into model concepts the original data, once these
concepts are completely defined, but before any data is technically available (one just needs their definition).
One can compare the situation with the building of a house: one can draw the plans before the equipment is
purchased, but its eventual availability (at the right time) must be certain. And the goods can be obtained before the
plans are completely drawn (but the chance of having to use them must be reasonably high)34.
One can even generate the data using a random process, and apply an estimation program, without considering the results but checking for the presence of technical mistakes.
These options are not optimal in the general case, but they can help to gain time. Most modelling projects have a
deadline, and once the work force is available, the tasks should be processed as soon as possible, if one wants to have
the best chance of meeting it.
One can question the feasibility of producing a full set of equations before any estimation. What we propose is to replace the future formulations by a "declaration of intent" which states only the dependent variable on the left, and the explanatory elements on the right. For each equation, the format should be as close as possible to:

variable = f * (sum of the explanatory elements)

For instance, for exports X depending on world demand WD, the rate of use of capacities UR and price competitiveness COMPX, one will use:
scalar f
X = f*(WD+UR+COMPX)
β’ The modeler will be able to check by sight the logic of his model
β’ The text can be provided to other economists for advice
β’ The full list of requested variables can be established, allowing to produce a complete transfer program
• Processing the equations through EViews will give interesting advice on several elements. Double-clicking on the "model" item, one will get:

o The equations (from "Equations" or "Print View").
34
As there is a cost to the goods. For free or quasi-free data, the chance can be lowered.
▪ The grammatical acceptability of equations will be checked: for instance, whether the number of left and right parentheses is indeed the same. Erroneous equations will appear in red in "Equations".

▪ Also, the fact that each endogenous variable is computed only once. A second occurrence will also appear in red.
o The variables.
Note: if a variable is currently overridden, its name will appear in red.
▪ The most important information will come from the list of exogenous variables: one might find elements which should have been determined by the model, according to its logic. In general, this will mean one has forgotten to state the associated equation. Also, some elements might appear which should not belong to the model. Normally this corresponds to typing errors.
o The block structure. For our example, EViews displays:

Number of equations: 14
Number of independent blocks: 3
Number of simultaneous blocks: 1
Number of recursive blocks: 2

prêt(3) x(13)
k(14)
It decomposes the set of equations into a sequence of blocks, either recursive (each variable depends only on
preceding elements) or simultaneous (variables are used before they are computed). If one is going to succeed in
estimating equations which follow the same logic as intended in the preliminary version, the block structure described
at this stage will be already fully representative of the future one. One can detect:
β’ Abnormal simultaneities: a causal loop might appear, which is not supported by economic theory behind the
model.
β’ Abnormal recursive links: a block of equations containing a theoretical loop (the wage price loop, the
Keynesian cross) can appear as recursive. This can come from a forgotten equation, a typing errorβ¦
Practical operational examples will be given later.
In any case, observing the causal structure of the model will give some preliminary information about its general logic,
and its potential properties.
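For readers curious about what such a decomposition involves, here is a minimal Python sketch; it assumes the networkx package is available and uses an invented four-variable dependency list. Simultaneous blocks correspond to the strongly connected components of the graph linking each endogenous variable to the endogenous variables used to compute it.

import networkx as nx

# invented dependencies: each endogenous variable -> the endogenous variables on its right-hand side
rhs = {"fd": ["q"], "m": ["fd"], "q": ["fd", "m"], "k": ["q"]}

G = nx.DiGraph()
for var, deps in rhs.items():
    G.add_node(var)
    for d in deps:
        G.add_edge(d, var)        # d must be known to compute var

for block in nx.strongly_connected_components(G):
    kind = "simultaneous" if len(block) > 1 else "recursive"
    print(kind, "block:", sorted(block))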
To check on the model specifications, you can use "View > Print View", which displays the corresponding window. One will note that it is possible to decide on the number of significant digits, which produces clearer displays in a document (the default is 8).
5.2 PREPARING THE MODEL: SPECIFIC DATA ISSUES
In the case of a national macroeconomic model, the needed data can be:
β’ National Accounts elements: operations on goods and services, transfers between agents, measured in value,
at constant prices, or at the prices of the previous year. The producer will generally be the national statistical
office. For France it would be INSEE (the National Institute for Statistics and Economic Studies).
β’ The corresponding deflators.
• Their foreign equivalents, using the accounting system and the corresponding base year of the particular country, or rather a synthesis produced by an international organization (OECD, International Monetary Fund, Eurostat...).
β’ Variables in a greater detail, possibly measured in physical quantities (oil barrels, tons of rice). They can come
from a public or private agency, or from the producers themselves. In France energy elements would come
from the Observatory of Energy.
• Monetary and financial data, coming mostly from the national central bank (in France the Bank of France or the European Central Bank...), or from an international institution (OECD, World Bank, International Monetary Fund, EBRD, ADB…).
• Data on employment or unemployment. One can get detailed labor statistics (by age, qualification, sex...) from the US Bureau of Labor Statistics or the French "Ministère du Travail".
• Demographic data: population, working-age population, age classes (INSEE in France).
β’ Survey data: growth and investment prospects according to firm managers, productive capacity, living
conditions of households (coming from public or private institutes).
β’ Qualitative elements: the fact of belonging to a specific set, meeting a specific constraint.
β’ Micro economic models will generally use survey data (households, firms) with sometimes a time dimension
(panels, cohorts) and possibly include some of the above elements as global indicators.
As the area of application of models is unlimited, the field of potentially relevant data is also. A model on the economy
of transportation would include technical data on the railway system and on distances between cities, an agricultural
model meteorological data and information on varieties of species.
The medium through which data can be obtained will also play an important role. Several options are available for accessing the necessary data and transferring it to the model.
Data can be obtained from a physical support, either commercially produced or created for the purpose. This can be a CD or DVD-ROM, or another rewritable medium such as a USB key or a memory card. For instance, INSEE provides CD-ROMs containing the full National Accounts.
One can share files using Google Drive, Dropbox, or other means.
The advantage is that participants in a project can share elements in real time. Of course, one must be careful with their status, between read-only and read+write; this requires some organization between team members.
In any case, one can transfer the shared file to his own computer, allowing any changes.
Files can be downloaded from a website, commercial or not.
An extensive survey of the data available online for free, compiled by John Sloman at the Economics Network, will be found at the address:
https://www.economicsnetwork.ac.uk/data_sets
In our experience of building single or multi-country macro econometric models, we are mainly using:
OECD Economies
Austria, Belgium, Canada, Chile, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Korea, Latvia, Lithuania, Luxembourg, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom, United States, Euro area (17 countries), OECD Total.
World
Non-OECD Economies
Argentina, Brazil, Bulgaria, China (People's Republic of), Colombia, Costa Rica, India, Indonesia, Romania, Russia, South
Africa
Each set contains 122 quarterly series and 221 yearly ones. They begin in 1962 and, at present (April 2020), end in 2017.
The data is available in CSV format, easily converted to Excel and then transferred to EViews.
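As an illustration (the file name is purely ours), recent EViews versions can usually also open such a delimited text file directly, with a statement along the lines of:
wfopen oecd_eo.csv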
• The World Bank covers a much larger set of countries (267 including groups), and series (1429), but:
o The periodicity is annual, starting in 1960 at the earliest.
o Some important elements are lacking, in particular employment and capital (available in principle in the
OECD set).
The reason for the large number is that the fields covered are much wider, with a focus on sociological issues,
including for instance (potentially):
o People using safely managed drinking water services, rural (% of rural population)
o Women who believe a husband is justified in beating his wife when she burns the food (%)
https://datacatalog.worldbank.org/dataset/world-development-indicators
Clicking on "Download" creates a Zip file containing a complete Excel sheet (110 Mb).
Creating an EViews file containing pages associated with countries is rather straightforward, but involves a little
programming (50 lines). The text is available from the author.
Of course, the number of actual values can vary and can be zero.
http://data.imf.org/
Very interesting for financial data, of course, such as the countriesβ Balance of Payments. But most of this information
is duplicated on the World Bank site (in annual terms, however).
https://ilostat.ilo.org/data/
for detailed labor series, and for the employment series missing from the World Bank set.
• And for French data, one can access the INSEE site www.insee.fr.
If documentation is attached to series, it can be imported along with the values and the series names.
Now you can access the World Bank data from inside a session, through the menu options.
First use:
EViews displays this menu in which you select: World Bank Database.
5.2.3.7 Accessing INSEE series
For French users (or those interested in the French economy), the official INSEE series can be downloaded.
The information can be obtained at:
https://www.insee.fr/en/information/2868055
In less and less frequent cases, some data will not be available in electronic form: series will be found in printed or faxed documents, or obtained directly from other experts, or fixed by the user (who then plays the role of expert).
This data will generally have to be entered by hand, although direct interpretation by the computer through optical character recognition (OCR) is quite operational (but this technique calls for documents of good quality).
In this case it is essential not to enter figures directly into the model file, but to create a preliminary file (such as an Excel sheet or even an ASCII file) from which the information will be read. This separates the modelling process from the production of "official" information.
It is now quite easy to share files through the Internet.
Although this feature can look unrelated to modelling, it can be quite useful for researchers communicating on a common project. In particular the files can be shared on a Google Drive.
For instance, for a Google Drive, you can work through a Cloud directory, using:
5.2.3.11 Change of format
As indicated above, the original data format is generally different from the one used by the model-building software.
In the worst cases, transfer from one software program to another will call for the creation of an intermediate file in a
given format, which the model-building software can interpret. The Excel format is the most natural intermediary, as it
is read and produced by all packages. In that case, it is not necessary to own a copy of the package to use its format.
In the very worst cases, it is always possible to ask the first program to produce a character file (in the ASCII standard)
which can, with minimal editing, be interpreted by the second program as the sequence of statements allowing the
creation of the transferred series, including data and definitions35.
However, the situation has improved in the last years, as more and more packages provide direct access to the following formats:
Access
Aremos-TSD
Binary
dBase
Excel (through 2003)
Excel 2007 (xml)
EViews Workfile
Gauss Dataset
GiveWin/PcGive
HTML
Lotus 1-2-3
ODBC Dsn File
ODBC Query File
35
For instance, the sequence:
ODBC Data Source
MicroTSP Workfile
MicroTSP Mac Workfile
RATS 4.x
RATS Portable / TROLL
SAS Program
SAS Transport
SPSS
SPSS Portable
Stata
Text / ASCII
TSP Portable
Of course, one must also consider the relationship between the data producing and modelling institutions. The most
technically complex transfers do not necessarily occur between separate institutions. A commercial contract might
give the modelling institution direct access (through one of the above means) to information managed by a data
producing firm, under the same software format, while a large institution might still use CD-ROMs as a medium
between separate units.
However, one must also consider the cost of establishing contracts, including perhaps some bartering between data
producing and study producing institutions.
As a general principle, one should favor using a single source. But this is not always possible. In that case, one should define a primary source, and take only the additional series from the alternate ones; the main problems will then come from inconsistencies between the sources.
In addition, for operational models designed to produce official studies and in particular forecasts, it is essential that
the results concur with the official local statistics. As forecasts are presented mostly as growth rates (GDP and inflation
for example), but also provide the last statistical (official) level, the first value in the forecast must be consistent with
both. If the model is built on an outside source, the forecast must be corrected accordingly. This issue will be
developed when we present the forecasting task.
For instance, let us suppose that the limits on the availability of local statistics force the modeler to use an external source (like the WDI from the World Bank) to produce a full model, while the local statistical office provides basic elements like GDP. If the model forecast starts in 2020, its first values must then be made consistent with the latest official figures.
Let us now define the best organization for transferring data from the original source to the software (we shall use
EViews as an example).
• Insert if necessary a line of series names above the first period data (or a column to the left of the first column).
It does not matter if the matrix does not start in cell B2. Just insert as asked.
In recent versions, the import statement memorizes the reference to the original (Excel) file. EViews will detect if a
change is made and propose updating the page in the workfile accordingly.
Very often the nature of available series is not really adapted to the needs of the model. A preliminary processing is
then necessary. This can apply to several features.
Most of the time the series the model builder will access have the right periodicity. Individual exceptions can occur.
New series will have to be computed (inside the modelling package).
36
For instance, quarterly data can appear in yearly lines of four columns.
5.2.5.1.1 Aggregation
The easiest case happens if the available periodicity is too short. The nature of the variable will lead naturally to a
method of aggregation, giving the exact value of the series:
If we call X(t) the aggregated variable in period t, and x(t,i) its value for sub-period i of period t, we can consider the following techniques:
• The sum over the n sub-periods (the natural choice for flows):
X(t) = x(t,1) + x(t,2) + ... + x(t,n)
• The average over the n sub-periods (the natural choice for rates and deflators):
X(t) = (1/n) . (x(t,1) + x(t,2) + ... + x(t,n))
• First or last value, for a level at a given date (for example the capital on the first day of a year will come from the first day of the first quarter). This will apply to stock variables.
X(t) = x(t,1) or X(t) = x(t,n)
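In EViews these conversions are performed automatically when a series is copied to a page of lower frequency, the method being set by the "c=" option of the copy statement. A minimal sketch, assuming the workfile contains a yearly page called "year" and that the commands are issued from the quarterly page (the series names are purely illustrative):
copy(c=s) year\x
copy(c=a) year\p
copy(c=l) year\k
which would respectively sum a flow x over the four quarters, average a deflator or rate p, and keep the last quarterly value of a stock k.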
5.2.5.1.2 Disaggregation
When moving to a shorter periodicity, EViews provides a large list of options, depending on the nature (flow, level or
stock) of the variable.
The following table is copied from the EViews Help and applies to the "c=" modifier in the copy statement.
For instance:
copy(c=q) quart\x
can copy the yearly series X into the quarterly page quart using a quadratic smoothing, in such a way that the average
of the quarterly values matches the yearly one.
arg Low to high conversion methods: "r" or "repeata" (constant match average), "d" or "repeats" (constant match sum), "q" or "quada" (quadratic match average), "t" or "quads" (quadratic match sum), "linearf" (linear match first), "i" or "linearl" (linear match last), "cubicf" (cubic match first), "c" or "cubicl" (cubic match last), "pointf" (point first), "pointl" (point last), "dentonf" (Denton first), "dentonl" (Denton last), "dentona" (Denton average), "dentons" (Denton sum), "chowlinf" (Chow-Lin first), "chowlinl" (Chow-Lin last), "chowlina" (Chow-Lin average), "chowlins" (Chow-Lin sum), "litmanf" (Litterman first), "litmanl" (Litterman last), "litmana" (Litterman average), "litmans" (Litterman sum).
rho=arg Autocorrelation coefficient (for Chow-Lin and Litterman conversions). Must be between 0 and 1,
inclusive.
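For instance, assuming again a yearly source page and a quarterly page named quart, a Chow-Lin interpolation matching the yearly average, with an autocorrelation coefficient of 0.7 (a purely illustrative value), could be requested from the yearly page with:
copy(c=chowlina,rho=0.7) quart\x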
5.2.5.1.3 Smoothing
Smoothing represents a particular case: preserving the same periodicity as the original series, but with the constraint
of a regular evolution, for example a constant growth rate. Instead of n free values, the choice is reduced to the value
of one (or maybe two) parameters.
EViews provides a large set of methods, some very sophisticated, the most popular being the Hodrick-Prescott and
Holt-Winters methods. The methodology and syntax are explained in detail in the User's Manual.
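For instance, a Hodrick-Prescott filter with the usual quarterly smoothing parameter can be applied to a series Q through (the name of the output series is ours):
Q.hpf(lambda=1600) Q_HP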
As we explained before, one method for dealing with variables presenting a seasonality is to eliminate it, and work
with seasonally adjusted series.
Several algorithms can be considered, the best known being probably Census X-13-ARIMA and TRAMO-SEATS, both
available in EViews.
Obviously, one should not mix original and adjusted series in the same set of model variables.
We have already considered this problem when we addressed the fields of models.
Changing categories will usually correspond to aggregation. In the case of economic models, this will apply essentially
to:
• Economic agents: one might separate more or less precisely households' categories (following their income, their occupation, their size...), types of firms (according to their size, the nature of their production...), Government institutions (central and local, social security, agencies...).
• Products (production can be described in more or less detail).
• Operations (one can separate social benefits by risk, by controlling agency, or consider only the global value).
• Geographical units (a world model can aggregate countries into zones).
5.2.6 UPDATES
Once adapted to the needs of the model builder, series will often have to be modified, for several reasons:
• Correcting a formal error, made by the model builder or the producer of series: typing errors, or errors of concept.
• Lengthening the available sample: new observations will keep showing up for the most recent periods.
• Improving information: for the last known years, series in the French National Accounts appear in succession as provisional, semi-final and final.
• Changing the definition of some variables. For instance, the work of private doctors in State hospitals can move from the market to the non-market sector or vice-versa.
One can also add a completely new series to the data set.
This multiplicity of possible changes prevents the global set of series used by the model from remaining the same even for a short period. Constantly adapting the model specifications (in particular the estimated equations) to this evolution would ask a lot from the model builder, to the detriment of more productive tasks. This means one should limit the frequency of reconstitutions of the operational data set (for example once or twice per year for an annual model, or every quarter for a quarterly one), with few exceptions: correcting serious mistakes or introducing really important information.
Without doubt, the best solution is actually to manage two sets of data, one updated frequently enough with the last
values, the other built at longer intervals (the periodicity of the model for example). This solution allows studying in
advance, by estimations based on the first set, the consequences of the integration of new values on the
specifications and properties of the next model version.
5.2.7 SUPPRESSIONS
It is beneficial to delete from the bank those series which have become useless.
For EViews, this presents an additional interest: the elements in the workfile will be displayed in a single window, and it is essential for this window to concentrate as many interesting elements as possible.
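As a minimal illustration, assuming the series to be discarded were given a common prefix such as tmp_ (the prefix is ours), the whole set can be removed in a single statement:
delete(noerr) tmp_*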
Similarly, investment in the documentation of series produces quick returns. It can concern:
• The definition, possibly on two levels: a short one to display titles in tables or graphs, and a long one to fully describe the concept used.
• The source: original file (and sheet), producing institution, and maybe how to contact the producer.
• The units in which the series is measured.
• Additional remarks, such as the quality and status (final, provisional, estimated) of each observation.
• The date of production and last update (hours and even minutes can also be useful to determine exactly which set of values an application has used). This information is often recorded automatically by the software.
• If pertinent, the formula used to compute it.
Example: Wage rate = Wages paid / (employment x Number of weeks of work x weekly work duration).
EViews allows specifying the first four types, using the label command, and produces the last two automatically.
For example, a series called GDP can be documented through a sequence such as (the definition, units and remarks shown are illustrative):
GDP.label(c)
GDP.label(d) Gross Domestic Product at constant prices
GDP.label(u) Millions of Euros
GDP.label(s) from the Excel file accounts.xls produced by the Statistical Office
GDP.label(r) the last two observations are provisional
which clears the contents, gives the definition, describes the units and the source, and adds remarks.
• In addition, from version 8, EViews allows introducing one's own labels, for instance the country for a multinational model, the agent for an accounting one, or the fact that a series belongs to a particular model.
HI.label(agent) Households
MARG.label(agent) Firms
• If the workfile window screen is in "Display+" mode, you can sort the elements according to their characteristics. In addition to the name, the type and the time of last modification (or creation), you have access to the description.
Moreover, if you right-click on one of the column headings and choose "Edit Columns", you can display additional columns for any of the label types, including the ones you have created.
This can prove quite useful, as it allows you to filter and sort on any criterion, provided you have introduced it as a
label.
Once the display is produced, it can be transferred to a table, which can be edited (lines, fonts…) and used for presentations.
For instance, one can produce a table for a model, with columns for type, agent, units, source, identity / behavior…
This table can be sorted using any of the criteria.
These new functions allow table production to be integrated in the modelling process, a very powerful information
tool for both model development and documentation.
U_MARG.label(d) Margins
F_HDI.label(agent) Households
U_MARG.label(agent) Firms
37
You can also use the "source"
One of the main interests of this feature is to create a table (using "freeze"). This table can then be sorted according to any of the criteria.
These definitions follow the series as they are moved through the workfile, or even to an external file.
In the general case, the model builder will be confronted with a large set of series of more or less various origins.
Optimal management strategy might appear to vary with each case, but in fact it is unique in its main feature: one
must produce a file, in the standard of the model building software, and containing the series having a chance to be
useful for the model.
This is true even if the global set of necessary series is produced and managed on the same computer or computer
network, using the same software (the task of transfer will be simply made easier): it is essential that the model
builder has control over the series he uses, and especially that he manages changes (in particular updates of series in
current use). When interpreting a change in model properties (simulations, estimations), one must be able to rule out a change in the data as the source, unless this change has been introduced knowingly by the model builder himself38.
Such an organization also makes the management of series easier. In particular, limiting the amount of series in the
bank, apart from the fact that it will save computer time and space, will make the set easier to handle intellectually.
Concerning the scope of the series, two extreme options can however be considered:
• Transferring into the model bank the whole set of series that have a chance (even if a small one) of becoming useful at some time to the development of the model39.
• Transferring the minimum, then adding to the set according to needs.
Even if a median solution can be considered, the choice leans strongly in favor of the first option. It might be more expensive initially, in human time and in size of files, but it will generally prove a good investment, as it often avoids a costly number of limited transfers, and gives some stability to the bank as well as to its management procedures.
For models managed by institutions or research groups, the most frequent organization is a team working on computers connected through the Internet, where storage and synchronization services like Google Drive allow sharing files inside a project. A rigorous work organization is needed to manage the elements of the project,
between work in progress for which current information must be provided, and documented final versions, which can
38
This remark is a particular application of the general principle "let us avoid potential problems which can prove expensive in thinking time".
39
Even if they are not considered for actual model variables. For instance, one can be interested in comparing the
capital-output ratio of the modelled country with those of other countries.
only be unfrozen by a controlling manager. Not meeting these principles can lead very quickly to a confusing and
unmanageable situation.
The final version can be made available online to followers, along with the associated documentation and examples. If
the follower has access to the relevant modeling software, direct access to the files can be provided.
In the case of an operational project (like allowing Government economists to produce forecasts and studies) access
can be provided through a front end, which does not require any knowledge of the model management software. This
is the case for EViews.
As to the original data, it can come from distant sources like the website of the World Bank, or of the statistical office of the model's country. One might in some cases access directly the data sets of the provider from inside a model-building session (this is the case in EViews for the World Bank's WDI). The producers of modelling packages are giving
a high priority to this type of option.
One must however pay attention to format incompatibilities, especially if the operating system is different (Windows
and its versions, Linux, UNIX, Macintosh...)40.
In this chapter, we describe the use of data provided by international organizations and institutes.
We will not try to be comprehensive: this is a formidable task, certainly more time consuming than the production of
the present book.
This is done much better by various institutes, which the reader can find easily with a simple Google search. One very
good instance is:
https://www.economicsnetwork.ac.uk/data_sets
https://data.un.org/
where you can get access to all the main sites, and also some specific domains, like agriculture and rural development.
Our purpose will rather be the following: to help the producer of a new model to complete the data base he needs for
this task.
So we will rather focus on the technical process, limited to the most promising options in our opinion.
• The model applies to a single country, which represents the interest of the builder.
40
Most modelling packages actually work under Windows, except freeware like R.
This is particularly relevant if he belongs to an official organization, and is responsible for providing studies (maybe
forecasts) applied to the countryβs economy. In this case, he will have access to the official data for this country, and
his model must conform to it, if only to provide results in the official format for data and concepts.
But this is also true if the builder is a local independent (maybe a PhD student). He will be more familiar with the
context and have more direct access to local resources.
However, he will need some foreign information, if only to produce some assumptions on foreign demand and prices.
In this case, any source of information is adequate. Detail will only be useful for building detailed scenarios on the
evolution of the world economy, and its consequences on local growth. For instance, a Vietnamese modeler might
require a description of external trade identifying Chinese growth, to establish the model assumptions.
But the classifications, base years and accounting systems can remain different.
• The model applies to a group (like the European Union) or the world (a set of groupings covering the whole world).
In this case compatibility between models requires access to a single source. The choice must be made at the start,
based on a detailed study of the advantages of all solutions available.
We will rather select the most promising ones and focus on the technical access to the main sources, with practical examples. A very comprehensive list (giving access to the various sites) can be found at the Economics Network address given above.
We will focus on the World Bank "World Development Indicators" under EViews.
One can wonder why we did not use a single source, which would have simplified the process, in particular the
programming.
The reasons are practical, and the choice has been obvious:
The OECD data uses a quarterly periodicity, essential for describing the dynamics of the MacSim developed economies and for applying sophisticated econometric techniques such as cointegration.
The countries in the ECOWAS model are not considered developed enough to be described in the OECD data set.
As to the additional sources:
The ILO data set is extremely detailed, but limited to the field of labor: employment, unemployment, revenue and
costs. However, it fills gaps on wages, a problem for the WDI and less so for the OECD data.
The IMF data set is logically more detailed on financial series, although many of them can be found in the WDI, with
an annual periodicity, however.
At this time (22nd June 2020), the World Bank makes the WDI data set available on the site:
https://datatopics.worldbank.org/world-development-indicators/
The series are contained in the page "Data", unfortunately as a single sheet, with all the series (1442 of them) for all the countries (266 of them, including a number of groupings) and the periods 1960 to 2020 (in principle).
If the original series codes are used as such, EViews will (conveniently) replace the dots by underscores ("_").
The lists of the countries and series are given in annex, as separate Excel tables.
• Locate the country series in the page "Data" of the WDI data set.
• Copy the set into a separate Excel file.
• Create a one-page EViews workfile.
• Import the data into the page by menu or program.
In addition, you can attach the definitions to the series, using the elements of column C of the same page. One can use
the first 1431 cells, which apply to all countries. This calls for a little editing.
We will now present our method to produce a set of EViews workfiles, in which each page contains the whole set of 1431 WDI series associated with a single country or group. Each page will:
• Use as its name the three-letter acronym for the country or group, used by the World Bank.
• Contain all the WDI series for that country or region.
• Give access to the definition of the variables (both short and long ones).
This is what we have done, and what the provided programs will do. As we have considered that the size of a complete file was too big, we divided the countries into 8 regions, following the World Bank's own partition (column H of sheet Countries).
For instance, region 6 (SAS) corresponds to South Asia.
In addition, the region groupings (like the page for South Asia as a whole) will be found in the corresponding regional
data set.
The list of countries and sub-groups will be found in the related annex.
Although this is not really needed (after all, the files are there), we shall briefly describe the method.
• We create a file called indic.xls with 263 lines (one for each country or group) and two columns: the acronym and the number. We read it as a matrix.
• We modify the original World Bank WDI Excel file by separating the "Data" sheet into 6 parts, to meet a constraint on the number of lines. At this moment, only 65536 (= 2^16) lines can be read, which means 45 countries. Our pages will contain only 40.
• We choose a region.
• We create a workfile for the region.
• We run an EViews program which checks for the presence of the region number, in each of the 6 pages in sequence.
• If the acronym meets the number in indic:
o We create a page with the acronym name.
o We read the following 1343 series into the page.
β’ We repeat the process until the end of the sheet is met.
This sequence is available to any user, after editing the program for any changes in the number of countries and
series.
The four elements appear when a single series is displayed (using the "sheet" option).
The "Display+" option presents one series per line, including the short description. The other elements can be displayed too, by right-clicking on the top bar, selecting "Edit columns" and clicking in the appropriate boxes (the "Type" and "Last Update" columns can be dropped at the same time).
5.3.4.3 The problems
Although the World Bank provides a lot of information, sometimes in great detail, some very important elements are not present, although they are clearly required if you want to build a general econometric model.
• Employment and wages.
These series are clearly required, as they enter the wage-price loop, the production function and the households and firms accounts.
• Capital.
This is required too, but not readily available in most data banks. There are ways to compute it, depending on the related information available.
• Intermediate consumption.
This is needed to compute total demand (which defines imports) and the production price (which defines the trade prices and competitiveness).
• Housing investment.
• Social contributions.
This affects the revenue of all agents, and the cost of labor (thus the value-added deflator and the capital-labor ratio in case of substitution).
5.3.4.4 The program
cd "d:\eviews\__world_bank_2020"
' This workfile contains the single page "countries" with artificial series for all the acronyms (each with the "NA" value);
' This can be done easily using the following list
close wb_all
close wb_eap_2020
open wb_all
save wb_eap_2020
' We include a file with a subprogram for creating the series characteristic (short definition, long definition, topic,
source)
include def_2020
pageselect countries
delete(noerr) *
group g_countries
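' Loop over the full list of World Bank country and group acronyms,
' creating a placeholder series (and a group entry) for each one not already present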
for %1 ABW AFG AGO ALB ANO ARB ARE ARG ARM ASM ATG AUS AUT AZE BDI BEL BEN BFA BGD BGR BHR
BHS BIH BLR BLZ BMU BOL BRA BRB BRN BTN BWA CAF CAN CEB CHE CHI CHL CHN CIV CMR COD COG COL
COM CPV CRI CSS CUB CUW CYM CYP CZE DEU DJI DMA DNK DOM DZA EAP EAR EAS ECA ECS ECU EGY
EMU ERI ESP EST ETH EUU FCS FIN FJI FRA FRO FSM GAB GBR GEO GHA GIB GIN GMB GNB GNQ GRC GRD
GRL GTM GUM GUY HIC HKG HND HPC HRV HTI HUN IBD IBT IDA IDB IDN IDX IMN IND IRL IRN IRQ ISL ISR ITA
JAM JOR JPN KAZ KEN KGZ KHM KIR KNA KOR KWT LAC LAO LBN LBR LBY LCA LCN LDC LIC LIE LKA LMC LMY
LSO LTE LTU LUX LVA MAC MAF MAR MCO MDA MDG MDV MEA MEX MHL MIC MKD MLI MLT MMR MNA MNE
MNG MNP MOZ MRT MUS MWI MYS NAC NAM NCL NER NGA NIC NLD NOR NPL NRU NZL OED OMN OSS PAK
PAN PER PHL PLW PNG POL PRE PRI PRK PRT PRY PSE PSS PST PYF QAT ROU RUS RWA SAS SAU SDN SEN
SGP SLB SLE SLV SMR SOM SRB SSA SSD SSF SST STP SUR SVK SVN SWE SWZ SXM SYC SYR TCA TCD TEA
TEC TGO THA TJK TKM TLA TLS TMN TON TSA TSS TTO TUN TUR TUV TZA UGA UKR UMC URY USA UZB VCT
VEN VGB VIR VNM VUT WLD WSM XKX YEM ZAF ZMB ZWE
if @isobject(%1)=0 then
genr {%1}=na
g_countries.add {%1}
endif
pagedelete(noerr) {%1}
next
vector(263) indic
indic.read(type=excel,b1,s=indic) indic.xls
' We now loop over the Excel sheets, processing the countries in groups of 40
!j=1
!n=263
!p=40
!l=1
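' Counters: !j = current Excel sheet, !n = countries still to be processed,
' !p = countries read per sheet, !l = position of the current country in the list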
while !n>0
!k= 2
'
for !i=1 to !p
pageselect countries
%1=g_countries.@seriesname(!l)
pagedelete(noerr) {%1}
if indic(!l)=1 then
' We read the 1431 series from the Excel sheet number !j
' starting in cell e{!k}
call def
close {%1}
endif
pageselect countries
!k=!k+1442
!l=!l+1
next
!j=!j+1
!n=!n-!p
wend
wfsave wb_eap_2020
The simplest and most accurate way to fill the gaps in the WDI data is access to an alternate source. We shall concentrate on the first case, clearly the most important.
The explanations are given in the last file:
https://www.ilo.org/ilostat-files/Documents/ILOSTAT_BulkDownload_Guidelines.pdf
If you are currently accessing the ILO data through the Internet, the Excel menu will be modified automatically to include an "ILOSTAT" item, as follows (sorry for the French):
As you can see, you can access data for a country or a subject.
which leads to the menu:
This will start a download for all available series, for all countries for which at least one series is present.
The results can appear as:
This is a very interesting source (stats.oecd.org). It provides data from 2000 to 2018, sometimes 2019 (at present).
After logging in, you will have access to the following menu, the Economic Outlook being the most interesting element for macro modelers.
https://stats.oecd.org/viewhtml.aspx?datasetcode=EO107_INTERNET_2&lang=en
Using "Customize" you can decide on the countries, the topics and the time span.
Then you can ask for the selected elements to be downloaded as an Excel or CSV file.
Remember that to read a CSV file in Excel you should not try to open it directly, but rather open a blank sheet, use the text import facility, and select "Comma" as a delimiter ("Virgule" in French) to organize the information into columns.
5.3.5.3 The United Nations, in particular the Economic Commission for Europe.
The site:
https://w3.unece.org/PXWeb/en
The countries are (63 including groupings):
European Union-28, Euro area-19, EECCA, CIS-11, North America-2, UNECE-52, Western Balkans-6, Albania, Andorra,
Armenia, Austria, Azerbaijan, Belarus, Belgium, Bosnia and Herzegovina, Bulgaria, Canada, Croatia, Cyprus, Czechia,
Denmark, Estonia, Finland, France, Georgia, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Kazakhstan,
Kyrgyzstan, Latvia, Liechtenstein, Lithuania, Luxembourg, Malta, Monaco, Montenegro, Netherlands, North
Macedonia, Norway, Poland, Portugal, Republic of Moldova, Romania, Russian Federation, San Marino, Serbia,
Slovakia, Slovenia, Spain, Sweden, Switzerland, Tajikistan, Turkey, Turkmenistan, Ukraine, United Kingdom, United
States, Uzbekistan.
Once you have made your selection, you can save it in different formats (through "Save as"), the most practical being probably Excel .xlsx.
5.3.5.4 The International Monetary Fund.
After signing in (free) on the site:
https://www.imf.org/en/Data
you are given access to the following files, for 193 countries and 11 groups:
The reference will be created as something like:
https://data.imf.org/?sk=388DFA60-1D26-4ADE-B505-A05A558D9A42&sId=1479329328660
You can ask for the download of a given file. It will be sent using your registered e-mail address, for instance as:
If you want data for a specific country (like Gabon) and table (like the Balance of Payments), you can use:
which you can export to Excel (not CSV!) using the corresponding menu item.
5.3.6 BACK TO OUR EXAMPLE
Now that we know the principles, let us see how to apply them to the case we have defined earlier. To avoid switching
between distant pages, we shall repeat its presentation.
1. In the example, our economist has decided to build a very simple model of a countryβs economy, which includes
the following elements: Based on their production expectations and productivity of factors, firms invest and hire
workers to adapt productive capacity. However, they exert some caution in this process, as they do not want to be
stuck with unused elements.
2. Productive capital grows with investment but is subject to depreciation.
3. The levels actually reached for labor and capital define potential GDP.
4. They also need intermediate products (proportional to actual GDP), and adapt inventories, from the previous
level.
5. Households obtain wages, based on total employment (including civil servants) and a share of Gross Domestic
Product. They consume part of this revenue and invest another (in housing).
6. Final demand is the sum of its components: consumption, productive investment, housing investment, inventories, and government demand. Total demand also includes intermediate consumption.
7. Imports are a share of local total demand, final or intermediate. But the less capacity remains available, the more imports will be called for.
8. Exports follow world demand, but the priority of local firms is satisfying local demand. They are also affected by
capacity constraints.
9. Supply is equal to demand.
We have voluntarily kept the framework simple, as our purpose is only explanatory at this time. However, the model
we are building has some economic consistency, and can actually represent the nucleus for further extensions which
we shall present later.
We shall also suppose that the following data is available in an Excel file called FRA.XLS, selected from OECD's
Economic Perspectives data set. Series are available from the first quarter of 1962 to the last of 2010. However, the
set contains a forecast, and the historical data ends in 2004.
A note: the reason for using older data is not laziness in updating the statistics. The period we are going to consider is
the most interesting as it includes years of steady growth (1962 to 1973), followed by uncertainty following the first oil
shock. This justifies too using French data: the main point here is using actual data for a non-exceptional country.
When we move to a more operational case our data set will include later periods.
The reason for the "FRA" prefix is to identify series for France in a large set of countries, representing all the OECD
members as well as some groupings.
Units: values are measured in Euros, populations in persons.
It should be clear that this will have to be done through a set of stored statements in a readable language (a program), rather than typed individually in the command window41. This option will allow, among other things, documenting the whole process and reproducing it at will.
41
However, one can copy the sequence of statements entered in the command window into a program file.
This obvious choice is further supported by three features provided by recent versions:
• You can run part of a program, by selecting it with the mouse (in the usual Windows way), clicking on the right button, and choosing "Run Selected".
This is generally more efficient than the previous method of copying the selected part into a blank program and running it. However, the new method does not allow editing, useful when one wants to run a selected AND modified set.
• Symmetrically, one can temporarily exclude part of a program from execution, by "commenting it out". To do this, one should select the relevant part, click on the right button, and choose "Comment Selection". To reactivate the statements, one should select them again and use "Uncomment Selection". One can also type a single quote (') before the statement.
This can be a little dangerous, especially if you (like myself) have the reflex of saving the program before each
execution. To avoid destroying the original, one can save first the modified program under another name42.
• Finally, one can ask for a column of numbers to be displayed left of the program lines. This is particularly efficient if you use the "Go To Line" statement43.
This is done by first unwrapping the command lines. Each command will then use one line regardless of its length, which can be a little annoying for very long ones. Then (and only then) one can ask for the line numbers.
• So actually, the only option is the one we proposed above: defining a program producing all the necessary
information, and the framework of equations which is going to use it. But the ordering of the tasks can be
questioned, as we have started explaining earlier. Until both are completed, the job is not done, but they are
technically independent: one does not need the physical model to create the data, or series filled with values
to specify the equations. This means that one can consider two extreme methods:
• Creating all the data needed by the model, then specifying the model.
• Specifying all the model equations, and then producing the associated data.
The criterion is the intellectual feasibility of the ordered sequence of tasks.
Clearly the first option is not realistic, as writing down the equations will surely evidence the need for additional elements. The second is more feasible, as one does not need actual series to write an equation. But as the definition of the equations proceeds, one has to check that all the addressed elements are or will be available in the required form, either as actual concepts (GDP) or transformations of actual concepts (the budget deficit in GDP points calls for the deficit and GDP series). If a concept appears to be lacking, one will have to: use an alternate available element (a "proxy"), establish an assumption, look in alternate bases not yet accessed, or simply eliminate the element from the model.
This shows that if producing both sets can be done in any order, there is a preference for specifying the equations
first, with a general knowledge on data availability. If the data set is not ready, but its contents are known, it is
possible to write down the equations and ask the software to process the text. The user will be told about possible
42
Only once of course.
43
However, you have to be careful to update the numbers when the program changes.
syntax errors, about the nature of the variables (endogenous / exogenous), and the architecture of his model. This will
lead to early model corrections, saving time and avoiding wrong directions later. And if the model
specifications are still discussed, it is possible to build a first version of the associated data set, which will be updated
when the model is complete.
In practice, especially in the simplest cases, one can also start defining the program with two blank paragraphs, and fill
them with data and equation creating statements until both are complete. The eight original paragraphs in our model
specifications can be treated one by one (not necessarily in the numerical order) filling separately the data and
equation generating blocks with the associated elements.
• Model then data: Specifying first the full model, checking that all elements used can be produced either directly or through a formula. Then producing the full set of data, preferably through a direct transfer or a transformation.
• Model and data: Producing the equations in sequence, or related block by related block, and establishing simultaneously the statements which create all the series they need.
Let us now show on our example how the process can be conducted using the second method, probably more
adapted to such a small model (one cannot expect to switch between the two processes too many times).
We shall first present the process in general (non-EViews) terms, treating each case in sequence, and presenting both
the equations and the statements generating the associated variables. To make things clearer, the equations will be numbered, and the creation statements will start with ">>".
Also, the endogenous variables will use uppercase characters, the exogenous lowercase ones. This has no impact on the treatment by EViews, but will make interpretation clearer for the model builder and especially for his readers.
(1) Based on their production expectations and productivity of factors, firms invest and hire workers.
This defines two behavioral equations for factor demand, in which employment (let us call it LE) and Investment
(called I) depend on GDP, called Q.
(1) LE=f(Q)
(2) I=f(Q)
We need:
>> IP=FRA_IBV
>> Q =FRA_GDPV
But for LE, we face our first problem. Private employment is not directly available. However, we have supposed that
total employment contained only public (government) and private. This means we can use:
>> LE=FRA_ET-FRA_EG
In another case, private and public employment could have been available, but not the total, which would have been
computed as a sum. This highlights the fact that computation and economic causality need not be related.
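In EViews terms, these ">>" statements will simply become series-generating statements; a minimal sketch, using the genr command:
genr Q=FRA_GDPV
genr LE=FRA_ET-FRA_EG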
Capital K, measured at the end of the period, is defined by an identity. Starting from the initial level, we apply a depreciation rate (called dr) and add investment. The equation is written as:
(3) K = K(-1) . (1 - dr) + I
>> K=FRA_KBV
The depreciation rate dr is not directly available, and will be obtained by inverting the identity:
>> dr=(K(-1)+I-K)/K(-1)
In other words, dr will be the ratio, to the initial capital level, of the difference between two levels of capital: the value we would have obtained without depreciation, and the actual one.
(4) CAP(t)=f(LE(t), K(t))
>> CAP=FRA_GDPVTR
which rather represents a "normal" GDP value considering the present level of factors.
The direct availability of this concept as a series represents the best case, not often met in practice. Later in the text
we shall address the alternate techniques available in less favorable situations.
Intermediate consumption can be defined as proportional to GDP, using the actual value. This means that at any level
of production, each unit produced will need the same amount of intermediary products.
(5) IC = r_icq . Q
Firms also adapt their inventories to the evolution of activity, which gives a behavioral equation for the change in inventories CI:
(6) CI=f(Q)
>> IC=FRA_ISKV
>> CI=FRA_CIV
(5) Households obtain wages, based on total employment (including civil servants) and a share of Gross
Domestic Product. They consume part of this revenue.
Now we need to define total employment, by adding government employment (called lg) to LE.
(7) LT=LE+lg
>> LT=FRA_ET
>> lg=FRA_EG
Now we have to compute household revenue, which we shall call RHI. We shall suppose that the same wage applies to all workers, and that the non-wage part of household revenue is a given share of Gross Domestic Product, a series called r_rhiq. This gives:
(8) RHI = wr . LT + r_rhiq . Q
Actually the above assumption, while simplistic, is probably not too far from the truth. The sensitivity to GDP of the
elements included in this heterogeneous concept can be low (such as pensions, or interests from long-term bonds),
high (the revenue of small firm owners, with fixed costs and variable output), or medium (self-employed working in
the same capacity as wage earners).
Household consumption is given by applying to RHI the complement to 1 of a savings rate which we shall call sr. For the time being, the savings rate is exogenous. Housing investment is supposed to represent a given share of revenue, called r_ih:
(9) CO = RHI . (1 - sr)
(10) IH = r_ih . RHI
The new variables are RHI, wr, r_rhiq, sr, IH and r_ih.
RHI is given simply by:
>> RHI=FRA_YDRH
Let us now compute the real wage rate wr. This is done through the following computation.
Dividing FRA_WSSS by FRA_ET gives the individual nominal value, which we divide again by FRA_CPI/10044 to get the
real value45.
>> wr=(FRA_WSSS/FRA_ET)/(FRA_CPI/100)
r_rhiq will be obtained as the ratio to GDP of household revenue minus wages:
>> r_rhiq=(RHI-wr*LT)/Q
44
The OECD deflators are measured as 100 in the year 1995. This means that in 1995 the values and the volumes are the same on average.
45
Considering the above list of available series, one can observe that other options are possible.
Consumption and housing investment will be obtained directly:
>> CO=FRA_CPV
>> IH=FRA_IHV
Computing the savings rate and r_ih will use the inversion of the associated equation:
>> sr=(RHI-CO)/RHI
or
>> sr=(FRA_YDRH-FRA_CPV)/FRA_YDRH
>> r_ih=IH/RHI
or
>> r_ih=FRA_IHV/FRA_YDRH
(6) Final demand is the sum of its components: consumption, productive investment, housing investment,
inventories, and government demand. Total demand includes also intermediate consumption.
(11) FD=IP+CO+IH+gd+CI
(12) TD = FD + r_icq . Q
We need to compute gd as the sum of FRA_IGV and FRA_CGV.
>> gd = FRA_IGV+FRA_CGV
>> FD = FRA_TDDV
(7) Imports are a share of local demand ("domestic demand"). But the less capacity is still available, the more an increase in demand will have to be imported.
(13) UR=Q/CAP
(14) M=f(FD+IC,UR)
We need to compute:
>> M=FRA_MGSV
(8) Exports will essentially depend on World demand. But we shall also suppose that if tensions appear
(through UR) local firms will switch some of their output to local demand, and be less dynamic in their
search for foreign contracts.
(15) X=f(WD,UR)
We need:
>> X=FRA_XGSV
>> WD=FRA_XGVTR
(9) Supply is equal to demand.
The supply-demand equation will for the moment use the following implicit form:
(16) Q + M = FD + X
We can now reorder the framework of our model into the following elements:
[1] LE =f(Q)
[2] IP=f(Q)
[5] IC=r_icq . Q
[6] CI=f(Q)
[7] LT=LE+lg
[11] FD = CO + IH + IP + CI + gd
[12] TD = FD + r_icq . Q
[13] UR = Q/CAP
[16] Q + M = FD + X
Endogenous variables
I Firms investment.
LE Firms employment.
LT Total employment.
CI Change in inventories
IC Intermediate consumption
IH Housing investment.
CO Household consumption.
M French Imports.
X French Exports.
Exogenous variables
lg Public employment
One observes that the equations separate into identities and behavioral equations (those written with an "f").
This distinction is normal. As we have already indicated, identities generally represent a mandatory formal connection, while conforming behavioral equations to economic theory is not so restrictive.
• Computing formulas
By considering the formulas we have obtained, we can see that most of the data needed is available directly, so a
simple transfer should be enough. We might even have considered using the original names. But as our model will
apply only to France, there is no reason to keep the prefix, which helped to identify the French data inside a much
larger multi-country file. And one might decide (rightly in our view) that our names are clearer.
Q = FRA_GDPV
CAP = FRA_GDPVTR
CI = FRA_ISKV
LT = FRA_ET
LG = FRA_EG
FD = FRA_TDDV
CO = FRA_CPV
RHI = FRA_YDRH
I = FRA_IBV
IH = FRA_IHV
WD = FRA_XGVMKT
X = FRA_XGSV
M = FRA_MGSV
Only eight elements are lacking, seven of them exogenous variables.
In real cases, this kind of computation will be used often. One must be aware of one important issue:
The use of these formulas is logically distinct from the definition of model equations. The only reason we need them is
to produce the historical values of series not yet available. If the statisticians had made a comprehensive job (and if
they knew the requirements of the model) they would have provided the full set, and no computation would have
been necessary (just a change in names).
• Applying the computation statements ensures that all the requested data is available. By associating formulas with missing elements, they allow producing the full set required for simulation and estimation. If the
data was already available in the right format, and the names given to the variables were acceptable, no
statement would be necessary. And one can check that in our case, most of the computations are actually
direct transfers, which allow to create a model element while retaining the original series.
Actually, one could question the necessity of having a full set of historical values for endogenous variables. These will
be computed by the model, which will be simulated on the future anyway. The reasons for producing a full set are the
following:
These formulas can include original data, transformed data computed earlier in the program, or simply assumptions.
• The model equations establish a logical link between elements, which will be used by the model to produce a consistent equilibrium. This means that if the formula for computing variable A contains variable B, variable A is supposed to depend on B, in economic terms.
This is obviously true for estimated equations. For instance, the wage rate can depend on inflation, or exports on
world demand. But this is also true for identities:
Household revenue is the sum of its elements. If one grows, revenue changes in the same way (ex-ante, of course).
Basically, we suppose that some behaviors apply in the same way to every element of revenue, whatever its source.
If household consumption is estimated, savings are the difference between revenue and consumption.
It is extremely important to understand this issue, at the start of any modeling project.
It is quite possible however that the same formula is present in both sets. For instance, we might not have values for FD, and we believe that CO, IP, IH and gd represent the whole set of its components. In this case the formula:
FD = CO + IP + IH + gd
will be used both to compute historical values of FD and to define FD in the model.
This introduces an obvious problem: if we make a mistake in the formula, or we use the wrong data, there is no way to
detect it.
5.3.6.2 The EViews program
Let us now consider how the above tasks can be performed in practice.
• First, we need a work file. In EViews, all tasks are conducted in memory, but they apply to the image of a file which will contain all the elements managed at a given moment.
We can create the file right now (as a memory image) or start from a pre-existing one, in which case the file will be
transferred from its device into memory.
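If we create it now, a minimal creation statement (for a quarterly file named small, covering 1962Q1 to 2010Q4, with a single page called model, as in our example) could be:
wfcreate(wf=small, page=model) q 1962Q1 2010Q4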
o First, only one version of the file must be open in memory. As we state elsewhere, EViews allows the user to
open a second version (or even third, and fourthβ¦) of a file already opened. Then changes can be applied
only to one of the memory versions, such as series generation and estimations.
This is obviously46 very dangerous. At the least, one will lose one of the set of changes, as there is no way to transfer
elements from an image to the other. Of course, each file can be saved under a different name, but this does not
allow merging the changes47. At the worst, one will forget the allocation of changes to the files, and one or both will
become inconsistent, the best option being to avoid saving any of them, and to start afresh.
• In command mode, check that no file of the same name is opened, and close it if necessary.
• In program mode (the case here) make sure that no file is open at first. This calls for an initial "CLOSE" statement, which will not succeed most of the time48 but will guarantee that we are in the required situation.
o Second, a new project must start from a clean (or empty) workfile. For an initial file to contain elements is at
best confusing, at worst dangerous. For instance, series with the same name as elements in our project can
already be present with a different meaning (GDP for a different country?), and available for a larger period.
Allowing EViews to estimate equations over the largest period available will introduce in the sample
irrelevant values.
A simple way to solve the problem is to delete any existing element, through the statement:
46
This is only a personal opinion.
47
Providing this option does not look impossible.
48
With fortunately no error message.
DELETE *
which will destroy any pre-existing item, except for the generic C (generic vector of coefficients) and RESID (generic
series of residuals) which are created automatically with the work file and cannot be deleted.
There is only one acceptable case for pre-existing elements: if the work file contains some original information,
provided to the user by an external source. But even in this case the file has to be saved first, to allow tracing back the
steps to the very beginning in which only this original information was present, in its original form.
In any case, in EViews, the possibility to define separate pages (sheets) inside the work file solves the problem. As we
have seen earlier, one can just store the original data in one page and start building the transformed data in a blank
one, logically linked to the original.
First principle of modeling: always organize your work in such a way that if step n fails, you can always get back to
the result of step n-1.
First principle of modeling (alternate version): Always organize your programs in such a way that you can produce
again all the elements associated with the present situation.
CLOSE small
DELETE *
Together with the workfile creation statement, this guarantees:
• That the file small.wf1 is open in memory with the needed characteristics, for a page called "model".
• That only one version is open (provided no more than one was open previously, of course, but we shall suppose you are going to follow our suggestions).
• That the page is empty (actually it contains only C and RESID).
Now that we have a work file, we must fill it with the necessary information.
The original information is represented by the 72 series in the FRA.XLS49 Excel file. We shall import them using the READ statement, which is quite simple (see the User's Manual for detailed options):
49
EViews also allows reading Excel 2010 .xlsx files (but not producing them).
READ fra.xls 72
But beware: even if the Excel file contains dates (in the first column or line) this information is not taken into account.
What is used is rather the current sample, defined by the last SMPL statement. Fortunately, in our case, the current
sample, defined at workfile creation, is the same as the one in the Excel file. But this will not always be the case:
better to state the SMPL before the READ.
SMPL 1962Q1 2010Q4
READ fra.xls 72
Second principle of modelling: if introducing a (cheap) statement can be useful, even extremely seldom, do it now.
One also has to be careful about the orientation of series: normally they appear as columns, and data starts from cell
B2 (second line, second column). Any other case has to be specified, as well as the name of the sheet for a multi-sheet
file.
If we follow the above method, all the data will be transferred to the "model" page. This makes things easier in a way, as all information will be immediately available. But:
• The separation between original and model data will not be clear.
Instead of loading the original series in the model page, a specific page is created (named for instance "oecd") in which the data is imported.
Then in the model page the model variables are declared as "linked", and a link is defined with the original series in the "oecd" page.
Now, we need to define the model on which we shall work. Producing a model starts with the statement:
MODEL modelname
Let us call our model _fra_1.
A trick: starting the name of important elements with an underscore allows them to be displayed at the top of the workfile screen, avoiding tedious scrolling if the number of elements is large. For very important elements (like the model itself) you can even use a double underscore.
The statement
MODEL __fra_1
will either create a new (blank) model with that name, or refer to an already existing one.
The second option is dangerous in our case, as we want to start from scratch. To make sure of that, the most efficient
(and brutal) technique is to delete the model first, which puts us in the first case.
DELETE _fra_1
MODEL _fra_1
This introduces a slight problem, however. In most cases (including right now) the model does not exist, and the
DELETE statement will fail. No problem, as what we wanted is to make sure no model preexisted, and this is indeed
the situation we obtain. But EViews will complain, as it could not perform the required task. And if the maximum
number of accepted errors is 1 (the default option) the program will stop.
It is better to specify the "noerr" option, which accepts failure of the statement without an error message:
DELETE(noerr) _fra_1
MODEL _fra_1
Another way to avoid this situation is obviously to set the maximum number of errors to more than 1. This is done by changing the number in the "Maximum errors before halting" box in the "Run program" menu. If you want this option to apply to all subsequent runs, you have to tick the "Save options as default" box.
Actually, if you have followed the principle above, there is no risk in proceeding with a program which produced error messages, even valid ones. You have saved the elements associated with the initial situation, and even if you forgot to do that, you can always repeat the steps which led to it.
Now, which number should we specify? In my opinion, depending on the model size, from 1000 to 10000. The number has to be higher than the number of potential errors, as you want to get as close as possible to the end of the program. Of course, you will never make 10000 logical errors. But the count is made on the number of error messages. And in a 2000-equation model, if you have set all the endogenous to zero and you compute their growth rates, this single mistake will generate 2000 messages.
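If we remember correctly, the same limit can also be set from inside a program, which avoids relying on the menu (the exact statement should be checked in the manual):
setmaxerrs 10000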
The only drawback is that if your program uses a loop on the number of elements of a group, and this group could not
be created, the loop will run indefinitely with the message:
50
The message associated with a real error will locate it between the preceding and following artificial errors.
•	Introducing the equations.
Now that we have a blank model, we can introduce the equations one by one. The text of these equations has already
been defined; we just need to state the EViews commands.
_fra_1.append IP=f*(Q)
o At this moment, we expect the model to explain the decision on investment by the evolution of GDP. This
seems quite logical, but we have not decided between the possible forms of the theoretical equation, and we
have not checked that at least one of these equations is validated by all required econometric tests.
o But at the same time we want EViews to give us as much information as possible on the structure of our
model: simultaneities, exogenous parts...
o The best compromise is clearly to produce a model which, although devoid of any estimated equation,
nevertheless presents the same causal relationships as the (future) model we consider.
The simplest choice should be, as if we were writing model specifications in a document or on a blackboard, to state:
IP=f(Q)
Unfortunately, EViews does not accept an equation written in this way. It will consider we are using a function called f,
with the argument Q. As this function does not exist, the equation will be rejected.
The trick we propose is to put an asterisk between "f" and the first parenthesis, which gives:
IP=f*(Q)
CAP=f*(LE,K)
will not be accepted either (the parenthesis containing a comma is not a valid expression), so we shall write:
CAP=f*(LE+K)
One just has to state one's conventions, and you are welcome to use your own. For instance, a simple sum such as:
M=FD+TD
will work too, but the equation can be confused with an actual identity, quite misleading in this case.
_fra_1.append LE =f*(Q)
_fra_1.append I=f*(Q)
_fra_1.append K = K(-1)*(1-depr) + I
_fra_1.append CAP=f*(LE+ K)
_fra_1.append IC=r_icq * Q
_fra_1.append CI=f*(Q)
_fra_1.append LT=LE+lg
_fra_1.append TD = FD + r_ic * Q
_fra_1.append UR = Q/CAP
_fra_1.append M=f*(TD+UR)
_fra_1.append X=f*(wd+UR)
_fra_1.append Q + M = FD + X
They produce a 16-equation model called _fra_1. After running the statements, an item will be created in the workfile, with the name "_fra_1" and the symbol "M" (in blue).
Double-clicking on this item will open a special window, with the list of equations:
•	Variables: shows the variables (endogenous in blue with "En", exogenous in yellow with "X"). For the endogenous, the number of the equation is given. This allows locating the equation in the model text, which is useful for large models.
The "dependencies" button gives access to a sub-menu, which allows identifying the variables depending on the current one (Up) and those which influence it (Down).
For instance, for FD, "Up" will give TD and Q, "Down" will give CO, I, G and IH.
The "Filter" option allows selecting variables using a specific "mask". For instance, in a multi-country model the French variables can be identified with FRA_*, provided one has used such a convention.
•	Source text: this is basically the text of the model code. We shall see that this changes with estimated equations.
•	Block structure: this gives information on the logical structure of the model (detailed later in Chapter 7).
We get:
For the time being, let us only say that a simultaneous block contains interdependent elements. For any couple of
elements in the block, a path can lead from the first to the second, and vice-versa. Of course, this property does not
depend on the ordering of equations inside the block.
EViews also gives the number of feedback variables (this will be explained later too).
On the contrary, a recursive block can be ordered (and EViews can do it) in such a way that each variable depends only
(for the present period) on previously defined variables.
This information is useful to improve the understanding of the model, to locate inconsistencies and to correct
technical problems.
o Normally endogenous elements appear as exogenous: the equation for the variable has been forgotten, or
written incorrectly.
o Elements foreign to the model appear: variables have been misspelled.
o A loop appears where there should be none.
o Or (more likely) an expected loop does not appear: for instance a Keynesian model is described as recursive,
or a model for two countries trading with each other can be solved as two independent blocks.
All these errors can be detected (and corrected) without calling for the data. This can speed up the building process,
especially if the data is not yet produced.
If the original and model series share the same page, one will simply use the "genr" statement, in the sequence:
genr Q=FRA_GDPV
genr CI =FRA_ISKV
genr IC=FRA_ICV
genr LT =FRA_ET
genr LG =FRA_EG
genr FD = FRA_TDDV
genr CO =FRA_CPV
genr IP = FRA_IBV
genr IH = FRA_IHV
genr WD = FRA_XGVMKT
genr GD = FRA_IGV+FRA_CGV
genr X =FRA_XGSV
genr M =FRA_MGSV
genr r_rhiq=(RHI-WR*LT)/Q
genr sr=(RHI-CO)/RHI
genr UR=Q/CAP
genr rdep=((K(-1)+IP)-K)/K(-1)
If the original series are managed in their own page (a better option in our opinion), one will use:
for %1 Q CAP CI IC LT LG FD CO RHI I IH WD X M
link {%1}
next
Q.linkto oecd\FRA_GDPV
CAP.linkto oecd\FRA_GDPVTR
CI.linkto oecd\FRA_ISKV
IC.linkto oecd\FRA_ICV
LT.linkto oecd\FRA_ET
LG.linkto oecd\FRA_EG
FD.linkto oecd\FRA_TDDV
CO.linkto oecd\FRA_CPV
RHI.linkto oecd\FRA_YDRH
I.linkto oecd\FRA_IBV
IH.linkto oecd\FRA_IHV
WD.linkto oecd\FRA_XGVMKT
X.linkto oecd\FRA_XGSV
M.linkto oecd\FRA_MGSV
GD.linkto oecd\FRA_IGV+FRA_CGV
However, a problem remains for GD, the sum of the two original variables FRA_IGV and FRA_CGV. The LINK function only allows referring to single variables, not to expressions (as Excel does). Until EViews 8 you had two options.
The first one is to link the two original series into the model page, then compute their sum there:
LINK FRA_IGV
LINK FRA_CGV
FRA_IGV.linkto oecd\FRA_IGV
FRA_CGV.linkto oecd\FRA_CGV
genr gd=FRA_IGV+FRA_CGV
The second one is to compute the sum in the "oecd" page:
genr FRA_GDV=FRA_IGV+FRA_CGV
and to link the result:
LINK GD
GD.linkto oecd\FRA_GDV
But it is also possible to refer directly to variables in a different page, using the syntax:
page_name\variable_name
Of course, the same method could have been used for single variables:
genr Q=oecd\FRA_GDPV
genr CAP=oecd\FRA_GDPVTR
genr CI=oecd\FRA_ISKV
genr IC=oecd\FRA_ICV
genr LT=oecd\FRA_ET
genr LG=oecd\FRA_EG
genr FD=oecd\FRA_TDDV
genr CO=oecd\FRA_CPV
genr I=oecd\FRA_IBV
genr IH=oecd\FRA_IHV
genr WD=oecd\FRA_XGVMKT
genr X=oecd\FRA_XGSV
genr M=oecd\FRA_MGSV
genr GD =oecd\FRA_IGV+oecd\FRA_CGV
Choosing between the two methods depends on whether you want changes in the original series to be applied automatically, or to control the process through GENR51. But if the series is not present in the original data (like GD), a GENR statement is called for anyway.
Now we have produced a first version of the model, and the associated data. As the behaviors have not been
established, we obviously cannot solve it. But we can check two important things:
These conditions are needed to start estimation, the next stage in the process. The first one is obvious, the second less so. But inconsistencies in identities can come from using a wrong concept for a variable, or from computing it wrongly. If this variable is going to be used in estimation, whether as dependent or explanatory, the whole process will be based on wrong elements.
This test can be conducted through a very simple technique: the residual check.
At this point, asking for a solution of the model cannot be considered. However, some controls can be conducted, which do call for a very specific "simulation". This technique is called the "residual check".
This method will compute each formula in the model using the historical values of the variables. This can be done by creating for each equation a formula giving the value of the right-hand side expression (using the GENR statement in EViews). However, there is a much simpler method, provided by EViews.
We can perform a very specific "simulation", in which each equation is computed separately using historical values. This amounts to:
•	Breaking down the model into single-equation models, as many as there are equations.
51
Of course, this will also increase the size of the workfile.
•	Solving each of these models at the same time but separately, using as explanatory values the historical ones.
If we call these historical values y⁰, it means we shall compute, for each endogenous variable:
ŷt = f(y⁰t , y⁰t-1 , xt , α̂) + ut
The interest of this method is obvious: if the residual in the equation is not zero, it means that there is at least one
error in that particular equation. Of course, the problem is not solved, but its location is identified. We shall see later
that this method is even more efficient for a fully estimated model, and we shall extend our discussion at that time.
It would be illusory, however, to hope to obtain a correct model immediately: some error diagnoses might have been badly interpreted, and corrections badly performed. And correcting one error can change the diagnosis for other equations. Suppose for instance that we state:
genr IH=FRA_IH
_fra_1.append FD = CO + IH + IP + CI + gd
If we correct the error on IH without correcting r_ih, the IH equation will now appear as wrong, while its actual number of errors has decreased from 2 to 1.
This means achieving a set of all zero residuals might take a little time, and a few iterations, but should converge
regularly until all errors have disappeared52.
Problems can take several forms:
•	Failure to solve:
o	series with the right name, but unavailable, either completely (they have not been obtained) or partially (some periods are lacking).
o	bad spelling (call to a non-existent series).
•	Non-zero residuals.
•	Non-verified behavioral equations (or with erroneous residual). This issue will become applicable (and will be addressed) later.
Observing the period where the error appears can help locate its source:
o	At the base year (where elements at constant and current prices are identical): the price indexes could be mistaken for one another, or values could be mistaken for volumes.
o	Otherwise, it could come from a typing error (made by the user or the data producer).
o	Or, if it appears in the last periods, the provisional elements could be inconsistent.
•	Observing the magnitude of the error can also be useful: a residual exceeding the normal economic magnitude (1000% for example) should come from a specification error: bad operator, inversion of coefficients, mistaking values for values per capita. A low residual will often come from confusion between two close concepts (the consumption price deflator excluding or including VAT).
•	For additive equations, a missing or extra element may be identified by comparing the residual to the actual values of variables: for instance, if the error on final demand for 2010Q1 is 56734 and this is the actual value of housing investment.
52
Unless the modeler allows some identities to hold true only approximately.
•	If the sign of the error is constant (and especially if the order of magnitude is similar across periods), the error could come from the absence of an element, a multiplication by a wrong factor, or a missing positive influence.
•	If several errors have identical values, they should have the same origin. This is the case when values are mistaken for volumes, if they share the same deflator.
•	If two variables show roughly identical errors with the opposite sign, this can come from the fact that one of them has erroneous values and explains the other.
For instance, if historical values for Q are overestimated, the relative errors on UR and Q will be similar with different signs:
UR = Q/CAP
Q + M = FD + X
Diagnosing errors in the residual check phase can lead back to different phases of the modelling process:
•	Data management: the data obtained from external producers is not consistent, for a full series or for specific observations (this happens!).
•	Production of model data: using the wrong original series, or using a wrong computation.
Example: using a variable at current prices instead of constant prices, or forgetting an element in a sum.
•	Model specification: writing a wrong identity.
Example: forgetting the term for housing investment in the definition of demand. But if the same error was made when computing the series, the two errors will compensate each other.
•	Estimation: an equation is no longer consistent with the present data.
Example: an error in the imports equation shows that the explanatory series for domestic demand has been changed since estimation.
Applying this process a number of times will be necessary to produce a coherent model.
5.3.6.3.3 Back to the example
Producing a residual check is quite easy in EViews: one just has to specify the option "d=f" in the SOLVE statement:
_fra_1.solve(d=f)
Of course, as all equations will be computed separately, all information must be available over the current sample period, including the values of the endogenous variables (which should be explanatory somewhere else). Contrary to computations and estimations, EViews does not adapt the simulation process to the feasible period (this seems rather logical).
As the model is recursive (super-recursive?) computation gives the result directly, and no element describing the
solving method is needed (we shall see them later).
However:
Every time EViews has to solve a model, the name given to the results will be built from the original name of the variable, with the addition of a suffix (a prefix is also possible but less manageable in our opinion). This avoids destroying the original information, and allows comparing alternate solutions.
The suffix is defined by appending an "assign" statement to the model (remember: append adds text to the model, and an identity equation is only a special case of text):
_fra_1.append assign @all _C
The equation for FD will give FD_C, which we can compare with the actual values of FD.
Computing the differences between actual and computed values can be done in a loop, using the syntax described later. The elements in the loop can be defined "by hand", but it is more efficient to use the "makegroup" statement:
model_name.makegroup(a,n) group_name @endog
(the same statement with @exog creates the group of exogenous variables). In our case:
_fra_1.makegroup(a,n) g_vendo @endog
Two remarks:
•	You surely wonder about the reason for the (a,n). This modifies the default options of the "makegroup" statement, which would produce a group with the baseline names (in our case with _C added) and leave out the actual names. Stating (a,n) requests the actual names and drops the baseline ones.
It would be best to restrict the computations to the identities. The residuals on the "estimated" equations have no meaning: as the "f" scalar is null, the right-hand side will be computed as zero, and the percentage error as 100%, as 100*(value - 0)/value. But being able to compute the whole model proves that estimations can be conducted on that period.
One option is to create the group of behavioral variables by hand:
group g_vbeha CI I LE M X
Or to start from the full set of endogenous and drop the estimated ones:
_fra_1.makegroup(a,n) g_viden @endog
g_viden.drop CI I LE M X
This creates first a full group g_viden, then eliminates the estimated variables from it.
This last technique is clearly inefficient here, but will be much more efficient with a 500-equation model with 50 estimated ones (a more usual situation).
However, both techniques call for a user-defined list, which will have to be updated each time the variable set is modified, something we want to avoid: we propose using a more complex, but automatic, technique.
A tip: a visual check is made difficult by the relative imprecision of EViews, which often produces small non-zero residuals for exact equations. In scientific format, these residuals appear as numbers with high negative exponents, which are hard to identify. One solution is to move to a fixed decimal presentation, by selecting a zone (in the "spreadsheet" view), then using the right mouse button to access "Display format" then "Fixed decimal".
A simpler way to check that there is no error is to display all the residuals on a single graph and to look, not at the series (they should move around in some Brownian motion), but at the scale: both maximum and minimum must be very small.
Another idea is to transfer the residuals to Excel and sort the sheet (unfortunately EViews does not sort a sheet across series on the values at a given period). The non-negligible elements will appear at the top and the bottom, according to their sign and the sorting order. Then one can delete the small errors in the middle (less than 0.001%?). As error correction progresses, the number of remaining lines should decrease.
This technique takes more time but allows identifying immediately and fully the faulty elements.
You certainly have realized by now (and you probably knew it before anyway) that one should avoid as much as possible having to edit the text of modelling programs each time changes have been made earlier in the process. This represents at best extra work, at worst a source of error. We have just violated this principle, by separating the endogenous into behavioral and identity variables by hand.
This will introduce problems, in particular in large models: the initial process will be tedious and error-prone, and one will have to remember to update the list every time the model structure changes.
We propose a simple technique to avoid this, and make the initial separation and its updating automatic. It is based on the presence of the "f" scalar in the behavioral equations.
o	Simulate the model with the option "d=f" and f=1, saving the results under a given suffix.
o	Set f to 2 and update the model (this is necessary for EViews to take the change into account).
o	Simulate the model again, with f=2 and another suffix.
o	Create empty groups of estimated and identity variables.
o	Run a loop over the whole group of endogenous, and test each time whether the results of the two simulations are different.
Note: when we move to actual estimated formulas, we will introduce a residual appending the suffix "_ec" to the name of the variable. We will use the same technique, applying a change to this element.
We can use the following program (for the period 2000 - 2002). We suppose that any percentage error higher than 0.00001 denotes a true difference (and thus an estimated equation).
smpl 2000Q1 2002Q4
' the group of endogenous variables, and the (empty) groups to be filled
_fra_1.makegroup(a,n) g_vendo @endog
group g_vbeha
group g_viden
scalar f=1
solve(d=f) _fra_1
' save this first solution (suffix _c, see the assign statement above) under another suffix
for !i=1 to g_vendo.@count
%1=g_vendo.@seriesname(!i)
series {%1}_d={%1}_c
next
scalar f=2
_fra_1.update
solve(d=f) _fra_1
for !i=1 to g_vendo.@count
%1=g_vendo.@seriesname(!i)
series pf_{%1}=100*({%1}_d-{%1}_c)/({%1}_c+({%1}_c=0))
if @max(@abs(pf_{%1}))>1e-5 then
g_vbeha.add {%1}
else
g_viden.add {%1}
endif
next
This sequence calls for some explanation.
•	The loop ("for" to "next") is repeated for each variable in the group g_vendo. The number of these variables is g_vendo.@count (for EViews, x.@count is an integer scalar containing the number of elements in the group x).
For regular users of EViews, or people familiar with programming, the above was probably clear. For others, this is the
time to give very basic information about EViews programming (even if this is not the purpose of this book).
In the programs we are going to present, intensive use is made of two elements: groups and loops.
5.3.7.1 Groups
Groups are named elements which refer to a set of objects (which can be series, series expressions but also other
objects), allowing to treat them either as a whole or in sequence.
For instance
group g x y z x/y
will create a group named g containing the three series x, y and z and the ratio of x to y.
The elements must be series or expressions, but one can cheat by creating artificial series with the name of the requested element.
One can:
•	Group groups.
•	Add and drop elements from groups:
g.add a
g.drop x
g.@seriesname is a character vector which contains the names of the series in group g.
group g_fra fra_*	will create a group from all the elements starting with FRA followed by an underscore.
group g_GDP ???_GDP	will create a group from all the GDPs of OECD countries (using three characters as a label).
group g_3 ???_*	will create a group from all the elements starting with three characters followed by an underscore.
Groups can be used to display a list of series, as a spreadsheet or a graph, by double-clicking on the group's name in the workfile window (where groups appear with a blue "G" symbol) or calling for it.
The default display is a spreadsheet format, but one can move to graphs using the "View" button then "Graph", or even edit the list of elements through "View" + "Group members".
Managing groups has been made more flexible and organized in the last versions. The Preview function (see later) applies to the group as a set.
However, for very simple tasks (like adding an element to a group), using the command window (in which the previous group specification is available) can actually prove faster.
5.3.7.2 Loops
Loops can be defined in two ways:
•	By element (over a list or a group):
for %parameter list-of-variables or group-name
(block of statements using %parameter)
next
The block of statements will be repeated in sequence for each element in the list, which will then replace the
parameter.
The presence of brackets around the parameter changes its status. With brackets, the associated characters are included in the statements, then the brackets are dropped. Without brackets, the parameter is considered as a character string variable.
For instance, after:
%1="111"
the statement
genr xxx={%1}
will create a series xxx with all its values equal to 111, while the statement
genr xxx=%1
will be illegal, as it tries to transfer a character string to a series.
Conversely, for strings:
%2=%1+"333"
is legal (it concatenates the two strings into "111333"), while
%2={%1}+"333"
becomes %2=111+"333" after substitution, which tries to add a number to a character string.
•	By integer number:
for !parameter=first-integer to second-integer [step third-integer]
(block of statements using !parameter)
next
The block of statements will be repeated in sequence from first-integer to second-integer, incrementing if necessary by third-integer, the current value replacing the parameter.
This type of loop can also be applied to a group, using inside the loop:
%1=group-name.@seriesname(!integer)
to retrieve the name of the element of rank !integer.
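For instance, a sketch combining the two elements, computing (purely illustrative) growth-rate series for every variable of a group named g_vendo:
for !i=1 to g_vendo.@count
%1=g_vendo.@seriesname(!i)
series tc_{%1}=100*({%1}/{%1}(-1)-1)
next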
During the modelling process, you often have to compare two sets of information.
o Make sure that two sets of data are identical. This applies to the results of a program you are running again,
maybe after a long delay.
o Control the evolution of historical values for a model data set, showing for instance which equations will have
to be estimated again.
o Summarize the results of a residual check, showing for which equations the right-hand side (using historical
values of the explained variable) is different from the right hand side (the result of the computation). By
setting a tolerance level slightly higher than zero (for instance 0.0001) one can restrict the display to the
errors deemed significant.
o Or you just might want to know which elements of a set are present in another set, for instance which
available series are actually used by one model.
You can compare elements between workfiles, and between pages inside the same workfile. EViews will display one line per element, stating its status among: unchanged, modified (numerically), added, deleted, or replaced (logically; the last case applies for instance to a linked variable which has been modified). A filter can be applied.
For series, a tolerance level can be set, under which the series are not considered modified. The display will tell how
many periods show a higher difference.
By default, all elements will be displayed, but one can restrict the cases shown (for instance, to all variables present in both pages with a difference higher than the criterion).
Equations and models are not compared but appear in the list.
wfcompare(tol=criterion,list=comparison_type) list_of_compared_series list_of_reference_series
For instance, if you want to compare all French series (starting with "FRA_") between the pages "base" and "updated", for a tolerance level of 0.00001, one will state:
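A possible statement, as a sketch following the syntax above (the page names "updated" and "base" are of course an assumption):
wfcompare(tol=0.00001) updated\fra_* base\fra_*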
6 CHAPTER 6 THE ESTIMATION OF EQUATIONS
We now have
•	A full description of the framework of the model, in which all the identities are completely specified, and the intents in terms of behaviors are described as clearly as possible.
•	A full database containing all the series in the present model, endogenous and exogenous, with their description.
We have also checked that:
The next stage is obviously to replace each of the tentative behaviors by actual ones, validated both by economic
theory and statistical criteria.
What we are proposing is not a book on econometrics, and anyway we will never be as knowledgeable, by far, as the EViews team and their collaborators, both in terms of theory and the ability to teach it (remember that one of them is Robert Engle…).
This means we will not approach the theoretical aspects of the subject, referring the reader instead to the books we propose in our bibliography, or even to the EViews Help manuals, which can actually be used as teaching tools, as they are both comprehensive and very progressive in their approach.
But once the modeler is familiar with the concepts, their application to an actual case53 is not straightforward at all.
This means we think this book can bring a very important contribution: showing how these methods can be used in
the process of building our models. The reader will learn how, in very specific cases, first very basic then more
operational econometrics can be used (or not used), considering the information he has and the goal he is pursuing.
We shall also show the role econometrics takes in the process, not as a single task between data gathering and simulations, but as a recurrent partner in the iterative process of building a working model.
We shall not only give examples working smoothly from the start, but show also how econometrics can be set aside,
and how, in some cases, an initial failure can be transformed into success, with some imagination 54.
53
One in which he is not playing with data, but is actually obliged to succeed.
54
Remember David Hendry's four golden rules of econometrics: 1. Think brilliantly, 2. Be infinitely creative, 3. Be outstandingly lucky, 4. Otherwise, stick to being a theorist.
6.2 SPECIFIC ISSUES
Nevertheless, we feel it will be useful to start with two cases, which are not generally treated by manuals, and can
lead to wrong decisions, or wrongly evaluating the results of tests.
The statistic called "R2" or "R-squared" is the most commonly used to judge the global quality of an estimation. It is defined by the following formula:
R2 = Σt=1..T (x̂t - x̄)² / Σt=1..T (xt - x̄)²
This statistic can therefore be interpreted as the share of the variance of the observed variable x explained by the
estimated formula.
A geometrical explanation also can be used: if we consider the space of variables (dimension T = number of
observations), the estimation method will consist in minimizing the distance between the explained variable and the
space (the plane or hyper plane) generated by the vectors of explanatory series, using combinations of parameter
values.
Especially, if the formula is linear relative to the estimated parameters and contains a constant term, we can consider that the estimation is based on the deviations of the variables (explained and explanatory) from their means. In this case, minimizing the Euclidian distance will lead (as can be seen on the graph) the vector (ŷt - yt) to be orthogonal to the space, and therefore to the vector (ŷt - ȳ). These two elements represent the non-explained and explained parts of (yt - ȳ), the variance of which is the sum of their squares. The R2 can be interpreted as the square of the cosine of the angle between the observed and adjusted series: the closer the R2 is to 1, the smaller the angle will be and the higher the share of the estimated variable in the explanation of the total variance. The explanation will be perfect if y - ȳ belongs to the space, and null if the perpendicular meets the space at the origin.
(Figure: the space of observations (axes obs 1, obs 2, obs 3), with the plane generated by the explanatory vectors x1 - x̄1 and x2 - x̄2, the explained vector y - ȳ and its projection on the plane.)
If the equation presents no constant term, the same reasoning can be applied, but this time the mean is not
subtracted. However, the R2 no longer has the same meaning: instead of variances, direct sum of squares will be used.
We will not go further in the explanation of this test, concentrating instead on its practical properties.
The R2 statistic will be all the higher as the explained variable and at least one of the explanatory variables present a time trend. Thus the components of each of these variables on the axes of observations will grow in the same or opposite direction (from highly negative to highly positive or the reverse), giving the associated vectors very close orientations. In the above graph, the components of the variables on the axes will be more or less ordered according to the numbering of the axes themselves. The first observations will be the most negative, then values will grow through zero and reach the most positive ones in the end. The same goes if the orderings follow opposite directions: the estimation will evidence a negative link.
In this case, even in the absence of a true causal relationship between the variables, the orientations of the vectors will be similar up to a multiplicative factor, and the R2 test will seem to validate the formulation. And most time series (like values, quantities or prices) generally present a growing trend, giving this phenomenon a good chance to happen. For example, let us apply the previous equation to French imports:
(1) Log(Mt) = a · Log(TDt) + b + ut
Replacing TD by any steadily growing (or decreasing) variable55 will give a "good" R2, maybe better than actual French demand.
Actually, it can be shown that, when testing for each OECD country the estimation of its imports as a function of the demand of any country, the "true" equation does not always come out as the best, although it is never far from it.
This happens in particular when we explain the same concept using a different transformation, for instance the growth rate instead of the level:
(2) ΔLog(Mt) = a · ΔLog(TDt) + b + ut
We can see that the time trend has disappeared from both series, and any correlation will come from common deviations around this trend (or rather common changes in the value from one period to another). This is of course a much better proof of a link between the two elements (independently from autocorrelation).
To put the two formulations on equal grounds, they must explain the same element. For this, one can just modify the new equation into:
Log(Mt) = Log(Mt-1) + a · ΔLog(TDt) + b + ut
Compared to the initial formula, this transformation will not change the explanation56, as obviously the minimization of the sum of squared residuals represents the same process. The only modified statistic will be the R2, which will increase a lot, as an identical element with a high variance (compared to that of ΔLog(Mt)) has been added on both sides.
55
Like Australian demand, or the price of a pack of cigarettes in Uzbekistan.
56
Before estimation EViews will move the lagged term to the left.
The choice between the two formulations should not rely on the R2 but on the autocorrelation of the residual: if ut is not correlated one should use (1), if it is one should try (2). But in any case the issue will be solved by error correction models and cointegration, which we shall address later.
The following two formulations are equivalent and indeed give exactly the same results, except for the R-squared statistic.
When observing the validity of individual influences, one element plays a very specific role: the constant term.
•	To manage the fact that the equation does not consider elements as such, but the deviations from their means. In ordinary least squares, even if the final result is a linear formulation of the variables and a constant term, the process actually:
o	computes the means,
o	subtracts them from the variables,
o	uses the deviations to estimate a formula with no constant57,
o	recombines estimated coefficients and means into a constant.
This constant is an integral part of the process. It should be included every time at least one of the explanatory elements does not have a zero mean.
Let us give an example for the first case: if imports have a constant elasticity to demand, we will estimate:
Mt = exp(b) · TDt^a
or
Log(Mt) = a · Log(TDt) + b
but the estimation process will first use the deviations from the averages to get "a", then compute "b" as:
b = mean(Log(M)) - a · mean(Log(TD))
57
As all elements in the formula have zero mean, the sum of the residuals will also be null.
We can see in particular the consequences of a change in units (thousands, millions, billions...). The constant term will absorb it, leaving "a" unchanged. In the absence of "b", "a" would get a different value, for no economic reason.
Of course, the more significant "b" is, the more its absence will damage the quality of the estimation (and the more "a" will be affected). But this is no reason to judge "b" on its significance. We can compare this to weighing an object with a balance: the two platters never have exactly the same weight, and even if the damage decreases with the difference, it is always useful to correct it. And in our case there is no cost (actually it makes things cheaper, as the cost of the decision process disappears).
It is not frequent for the constant term to have a theoretical meaning. The majority of such cases come from a formula
in growth rates or variations, where the constant term will be associated with a trend.
The only justification for the absence of a constant term is when this theoretical constant is null. In this case,
observing a significant value becomes a problem, as it contradicts the theory. We shall give an example soon.
In our model, we have to estimate five equations (the change in inventories, investment, employment, imports and exports), for which we already have ideas about their logic. But we should be aware that:
•	Other simple formulations could probably be built on the same sample, with equivalent or maybe better quality.
•	Using another sample (another country for instance), the same economic ideas could lead to different formulations (not only different coefficient values).
•	Other economic ideas could be applied, with the same complexity.
•	To produce a truly operational model, the present framework would have to be developed in a large way. We will present such developments later.
However the model we are building represents in our sense a simplified but consistent summary of the general class
of models of this type. Reading descriptive documents for any operational structural model, one will meet many of the
ideas we are going to develop.
We shall use this simplest estimation to present the basic features of EViews estimation, and also stress the necessity
for homoscedasticity.
Our formulation will simply suppose that firms want to achieve a level of inventories proportional to their production (or GDP). For a particular producer, this should be true both for the goods he produces and for the ones he is going to use for production. For instance, a car manufacturer will allow for a given delay between production and sale (maybe three months, which will lead to an inventory level of 1/4th of annual production). And to be sure of the availability of intermediary goods (like steel, tires, electronic components and fuel for machines in this case), he will buy the necessary quantity (proportional to production) some time in advance.
We shall suppose that firms had achieved, at the previous period, an inventory level IL representing a given number of periods of production:
ILt-1 = r · Qt-1
IL*t = r · Qt
ILt = IL*t
This means that, contrary to the general case, this equation should not include a constant term. Its presence would call for a trend (and a constant) in the equation in levels, with no economic justification. It would also introduce a problem: adding a constant to an explanation in constant Euros would make the equation non-homogenous.
Even then, the equation faces a problem concerning the residual: between 1963 and 2004, French GDP has been multiplied by 4. We can suppose the level of inventories has too (maybe a little less, with economies of scale and improved management techniques).
It is difficult to suppose that the unexplained part of the change in inventories is not affected by this evolution. As the
variable grows, the error should grow. But to apply the method (OLS), we need the residual to have a constant
standard error. Something must be done.
The simplest idea is to suppose that the error grows at the same rate as GDP, which means that if we measure the change in inventories in proportion to GDP, we should get a concept for which the error remains stable. Of course, we shall have to apply the same change to the right-hand side, which becomes the relative change in GDP.
To avoid causality problems (for a given period, the demand for CI is partly satisfied by Q itself), we shall use the previous value of Q.
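This gives, as a sketch, the equation we shall actually estimate (r is the coefficient to be estimated, u the residual):
CIt / Qt-1 = r · (Qt - Qt-1) / Qt-1 + ut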
As this is our first example, we shall use it to present the basic estimation features.
Actually, the technique will differ according to the stage in the estimation process: whether we are still exploring several individual formulations, looking for the best option both in statistical and economic terms, or we have already selected the best one and want to merge it into our model.
The simplest way to estimate an equation under EViews is through the menus, using in succession Quick > Estimate Equation, then entering the specification in the dialog box.
In the case of ordinary least squares, this specification can be a list of elements separated by blanks, or an explicit formula; in our case:
CI/Q(-1)=c(1)*D(Q)/Q(-1)
Of course, the two methods give exactly the same results (in the first case, the "c" vector will also be filled with the estimated coefficient).
The default method will be Least Squares, appropriate in our case. If the equation was not linear in the coefficients, the second presentation would be automatically called for.
smpl 1962q1 2004q4
which means that we consider data from the first quarter of 1962 to the last of 2004.
If the equation is linear in coefficients, EViews recognizes this property, and does not try to iterate on the coefficients,
as it knows the values found are the right ones.
We can see that EViews gives the sample used (the relevant periods of our sample). Estimation starts in 1963q2, (with
data starting in 1963Q1) and ends in 2004q4.
•	We also get the number of periods, and the time and date.
•	The other elements are the usual statistics, described earlier. The most important are:
o	The R-squared, the Durbin-Watson test and the Standard Error of regression for the global elements.
o	For each coefficient: its value, its t-Statistic, and the probability of reaching this value if the true coefficient is null, given the estimated standard error.
In our case:
•	The R-squared is very low, even if the extreme variability and the absence of trend of the left-hand element plead in favor of the explanation58.
However, as with almost all homogenous estimations, a simple interpretation is available through the standard error: as the explained variable is measured in points of GDP, the average error represents 0.68 points of GDP.
58
If we knew the values for IL, its estimation would get a better R2 (due to the colinearity of IL and Q). But we would be led to estimate an error correction model on IL anyway. We have seen the advantage of this formulation, but for the quality to extend to the whole model, all equations must be of this type.
•	The coefficient is very significant. The probability of reaching 0.107 for a normal law with mean 0 and standard error 0.00126 is measured as zero. Of course, it is not exactly null, but it is lower than 0.00005, and probably much so.
•	But the Durbin-Watson test takes an unacceptable value, even if the absence of a constant term (and the subsequent non-zero average of residuals) makes its use questionable.
•	The graph of residuals is the second important element for diagnosis. It shows the evolution of the actual and estimated series (top of the graph, using the right-hand scale) and of the residual (bottom, using the left-hand scale, with lines at plus and minus one standard error). This means that inside the band residuals are lower than average, and higher outside it. Of course, it gives only a relative diagnosis.
(Graph: actual and estimated change in inventories as a share of GDP, with the residual and its one-standard-error band, 1963-2004.)
The graph shows (in our opinion) that the equation provides some explanation, but some periods (1975-1980 in
particular) present a large and persistent error
In addition to the display of estimation results and the graph of residuals, EViews creates several objects:
•	A vector of coefficients, contained in the "C" vector. The zero values or the results from the previous regression are replaced59.
•	A series for the residuals, contained in the "RESID" variable. The "NA" values or the results from the previous regression are replaced60.
•	A tentative equation, called "Untitled" for the moment, and containing the developed formula, with "C" as the vector of coefficients, with numbers starting from 1. In our case, the formula is obviously:
CI/Q(-1)=c(1)*D(Q)/Q(-1)
59
But if the present regression contains fewer coefficients than the previous ones, the additional elements are not put
to zero.
60
But this time, residuals from previous equations are given either computed values or "NA".
Any subsequent estimation will replace this equation by the new "Untitled" version.
•	EViews also provides several options, accessed from the menu, which can be useful:
o	"View" gives three representations of the equation: the original statement, and two formulas including the coefficients as parameters (the above "c" type) or as values.
o	"Print" allows printing the current window: to a printer, to a text file (using characters, which saves space but reduces readability, especially for graphs), or to a graphics RTF file. This last option might call for a monochrome presentation, which is obtained through the "Monochrome" template (the last of the general Graph options).
o	"Name" allows creating the equation as a named item in the workfile, with an attached comment. It is important to use it immediately after the estimation, as the temporary equation (named "Untitled") will be replaced by the next estimation.
However, inserting an underscore ("_") before the proposed name will place the equation in the first positions of the workfile window.
EViews proposes as a standard name "EQ" followed by a two-digit number, using the lowest one unused at the moment. One can either:
-	Give a name representative of the equation (like "EQ_X3U" for the third equation estimating X as influenced by the rate of use).
-	Accept the EViews suggestion and rely on the attached comment for the explanation.
Actually the item saved is more complex than the actual formula. Double-clicking on it shows that it contains the full
representation, including the residual (and actually the standard errors of the coefficients, even if they are not
displayed).
o Forecast produces a series for the estimated variable (or the estimated left-hand expression, generally less
interesting), and an associated graph with an error band (and a box with the statistics).
Instead of using Quick>Estimate, one can work directly through the command window. One just has to add "ls" before the formula:
ls CI/Q(-1)=c(1)*D(Q)/Q(-1)
•	By copying and editing the current equation on the next line of the command box, entering changes is made much easier.
•	After a session of estimations, the set can be copied into a program file and reused at will. Managing a set of alternate versions is much easier.
•	One can control the size of characters. This is quite interesting when working with a team, or making a presentation, as the normal font is generally quite small.
•	The only drawback is sample definition: it has to be entered as a command, not as an item in the "Estimate" panel.
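For instance, such a program could group a set of alternative specifications (the second one, adding a hypothetical lagged term, is only there for illustration):
smpl 1962Q1 2004Q4
ls CI/Q(-1)=c(1)*D(Q)/Q(-1)
ls CI/Q(-1)=c(1)*D(Q)/Q(-1)+c(2)*D(Q(-1))/Q(-2)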
Let us go back to our estimated formula. If we are not satisfied with the previous results, we can try alternate options,
without changing the economic background.
First, one can observe a very clear outlier for the second quarter of 1968. Economists familiar with French history are aware that this corresponds to the May 1968 "revolution", when student demonstrations turned into general strikes, paralyzing the economy: not only did factories close, but transportation came to a standstill, and available goods could not be delivered to the firms which needed them.
As production was reduced much more than demand, in an unexpected way, satisfying demand had to call on inventories; and even a lower production needed intermediate goods (in particular oil and coal) which were not available due to these transportation problems.
This will lead us to introduce a "dummy" variable, taking the value 1 in the second quarter of 1968 only. We can already observe the gain from our "time" variable: we can introduce the dummy explicitly, without having to create a specific series which would have to be managed in every simulation, including forecasts.
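For instance (a sketch, assuming the time variable t is defined so that it takes the value 1968.25 in the second quarter of 1968):
ls CI/Q(-1)=c(1)*D(Q)/Q(-1)+c(2)*(t=1968.25)
The expression (t=1968.25) is evaluated as 1 for that quarter and 0 elsewhere, playing the role of the dummy without creating a separate series.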
Dependent Variable: CI/Q(-1)
Method: Least Squares
Date: 03/21/20 Time: 23:00
Sample (adjusted): 1963Q2 2004Q4
Included observations: 167 after adjustments
One could assume that the change in inventories does not depend on the present change in GDP only, but rather on the sequence of past changes, due both to inertia in firms' behavior and to technical difficulties in implementing decisions.
Let us first consider the last five periods, leaving the coefficients free. We get:
Not only are most of the explanations not significant, but the value of the first one (maybe the most important) takes the wrong sign.
To make the set of lagged coefficients smoother, we can constrain them to follow a polynomial in the lags.
The syntax for this element is:
PDL(variable, number of lags, degree of the polynomial, conditions).
PDL(@pch(Q),4,3,2)
which implies:
•	A maximum lag of 4.
•	A polynomial of degree 3.
•	A zero value for the coefficient just beyond the last lag.
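A possible full statement, as a sketch (using the list form of ls, and the same assumed dummy term as above):
ls CI/Q(-1) PDL(@pch(Q),4,3,2) (t=1968.25)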
The results are rather satisfactory, with a nice profile for the reconstructed coefficients, and generally significant explanations. The dummy element also provides a much better explanation.
(Graph: actual and estimated change in inventories as a share of GDP with the PDL specification, residual and one-standard-error band, 1963-2004.)
Once an equation has been selected for introduction in the model, a different strategy should be used. With the above method:
•	It is not simple to link the equation name with its purpose, which makes the process unclear and forbids any automated and systematic processing.
•	The C vector, used by all equations, is only consistent with the last estimated one.
•	The residuals cannot be managed simply.
Instead, we propose the following organization, deriving all elements from the name of the dependent variable through a systematic transformation:
•	Naming the equation after the estimated variable.
•	Creating a vector of coefficients specific to the equation, also named after the estimated variable:
coef(10) c_ci
•	Introducing an additive explicit residual, named after the estimated variable. The reason is the following:
o	It is essential for a model to estimate and simulate the same equation. Of course, two versions could be maintained, one being copied into the other after each new estimation. This is:
-	Tedious.
-	Difficult to manage.
-	Error-prone.
It is much better to use a single item. However, this faces a problem: one wants access to the residual, in particular for forecasts, as we shall see later. And the estimation calls for no residual.
The solution is quite simple: introduce a formal residual (here ec_ci), but set it to zero before any estimation:
genr ec_ci=0
then, once the equation has been estimated, store the observed residual in it:
genr ec_ci=resid
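Putting the pieces together for the change in inventories, a possible sequence is the following sketch (the equation name is ours, and the specification is the one discussed above):
coef(10) c_ci
smpl 1962Q1 2004Q4
genr ec_ci=0
' estimate the equation, naming it after the variable
equation eq_ci.ls CI/Q(-1)=c_ci(1)*D(Q)/Q(-1)+ec_ci
' store the residual for later use (forecasts, shocks)
genr ec_ci=resid
' merge the estimated equation into the model
_fra_1.merge eq_ci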
In this estimation, we shall stress the importance of establishing a sound economic framework before any estimation. In our sense, trying for the best estimation without considering the economics behind the formula, and especially its consequences for model properties, is rather irresponsible. For instance, using the logarithm of investment is quite dangerous. Its value can change in very high proportions, and if we go back to the microeconomic foundations of this behavior, it could very well be negative, as some firms are led to disinvest from time to time, by selling more capital than they buy.
(Graph: actual and estimated series for a first investment equation in logarithms, with the residual and its one-standard-error band, 1965-2004.)
Everything seems to go well: the statistics are quite good (even the Durbin-Watson test), the signs are right, the graph shows a really strong fit. However, when we merge the equation into a model, its simulation properties will be affected by the base solution: even a very high increase in GDP will have a low impact on the absolute level of investment if this level was very low in the previous period. And investment can show huge variations, as it represents the change in a stable element, capital.
One can guess that although linking investment (a change in capital) to the change in production seems a natural idea, jumping to the above formulation was moving a little too fast. In general, one should be naturally reticent about taking the logarithm of a growth rate, itself a derivative.
The right starting approach is to clarify the economic process through a full logical formalization.
Let us suppose that production follows a "complementary factors" function, which means that to reach a given level of productive capacity, fixed levels of capital and employment are required, and a reduction in one factor cannot be compensated by an increase in the other. This means obviously that the less costly (optimal) process is the one which respects exactly these conditions.
(Figure: combinations of K and L giving the same capacity CAP1; the optimal combination is (L1, K1).)
(The "t-1" index in the formula below means that we shall use the level of capital reached at the end of the previous period.)
Actually, for a given level of employment, there is always some short-term leverage on production, at least at the macroeconomic level. Temporarily increasing labor productivity by 1% can easily be achieved through extra hours, fewer vacations, cancelled training courses...
This means capital will be the only limiting factor in the short term.
CAPt = pkt · Kt-1
URt = Qt / CAPt
Now let us suppose firms actually want to reach a constant target utilization rate UR*, and expect a production level Qat+1. Then by definition:
This means that the target growth rate of capital can be decomposed as the sum of three terms, one with a positive influence (the expected growth rate of production) and two with a negative one:
•	The target growth rate of the rate of use: if the firms feel their capacities are 1% too high for the present level of production, they can reach the target by decreasing capital by 1%, even if production is not expected to change.
•	The growth rate of capital productivity: if it increases by 1%, 1% less capital will be needed.
But the element we need is investment. To get it we shall use the definition of capital accumulation:
Kt = Kt-1 · (1 - drt) + It
In other words:
•	If firms expect production to grow by 2.5%, capacities should adapt to that growth.
•	But if they feel their capacities are under-used by 1%, their desired capacity will only increase by 1.5%.
•	If capital productivity is going to increase by 0.5%, they will need 0.5% less capital.
•	But once the growth of capital has been defined, they also have to compensate for depreciation (5% is a reasonable value).
In summary, the accumulation rate (the ratio of investment to the previous level of capital) would be:
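As a sketch of this decomposition (the exact dating of each term is left open here):
It / Kt-1 ≈ expected growth rate of Q + Log(URt / UR*) - growth rate of pk + drt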
If we suppose:
•	That the depreciation rate is constant, as well as the rate of growth of capital productivity,
•	That production growth expectations are based on an average of the previous rates,
and if we consider as the rate of use the ratio of actual GDP to a value obtained under normal utilization of factors (which leads to a unitary target), we get a simplified formula in which expected growth is replaced by a weighted average of past growth rates of production, with:
Σ(i=0 to n) αi = 1
Finally, we can suppose, as we shall also do for employment, that the desired growth of capital is only partially reached in practice, either because firms react cautiously to fluctuations of demand, or because they are constrained by investment programs covering more than one period, from the decision to the actual installation of the investment goods.61
The results are rather satisfactory, with the right signs and acceptable statistics for all explanatory elements. This was not obvious, as their strong correlation (both use Q in the numerator) could have made it difficult for the estimation process to separate their roles.
61
In this model, we suppose there is no delay between the acquisition of investment (with impact on demand and the
supply-demand equilibrium) and the participation of this investment to the production process.
(Graph: actual and estimated accumulation rate, with the residual and its one-standard-error band, 1978-2004.)
The graph of residuals shows that the quality of the explanation grows with time, and is especially good for the last
periods. This is rather important for simulations over the future, and one can wonder what we would have done if the
sample had been reversed, and the initial residuals had applied to the last periods.
We will deal with this problem of growing errors on recent periods when we address forecasts.
The equation we have built is not only satisfactory by itself, but we can expect it to provide the model with adequate properties. In particular, the long-term elasticity of capital to production is now unitary by construction. Starting from a base simulation, a 1% permanent shock on Q will leave the long-run value of UR unchanged62. This gives the same relative variations to production, capacity and (with a constant capital productivity) capital.
The coefficients "a" and "b" determine only the dynamics of the convergence to this target.
Actually we have estimated a kind of error-correction equation, in which the error is the gap between actual and
target capacity (the rate of use).
We hope to have made clear that to produce a consistent formulation, in particular in a modelling context, one must
start by establishing a sound economic background.
Of course, the employment equation should follow also a complementary factors framework.
In the previous paragraph, we have shown that in this framework the element determining capacity is capital alone, while firms can ask from workers a temporary increase in productivity, high enough to ensure the needed level of production63. Adapting employment to the level required to obtain a "normal" productivity will be done in steps.
62
As the left-hand side represents the (fixed) long-term growth rate of capital.
63
This is true in our macroeconomic framework, in which the changes in production are limited, and part of growth is compensated by increases in structural productivity (due for instance to more capital-intensive processes). At the firm level, employment can produce bottlenecks. This will be the case if a sudden fashion appears for particular goods requiring specialized craftspeople, even if the tools and machines are available for buying.
This means that estimating employment will allow us to apply the notion of error correction models in a very simple framework.
But they do not adapt the actual employment level to this target, and this for:
β’ Technical reasons: between the conclusion that more employees are needed and the actual hiring 64, firms
have to decide on the type of jobs called for, set up their demands, conduct interviews, negotiate wages,
establish contracts, get authorizations if they are foreign citizens, maybe ask prospective workers to train...
Of course this delay depends heavily on the type of job. And this goes also for laying off workers.
β’ Behavioral reasons: if, facing a hike in production, firms adapted immediately their employment level to a higher
target, they might later be faced with overemployment if the hike proved only temporary. The workers they have
trained, maybe at a high cost, have no usefulness at the time they become potentially efficient. And laying them off will generally call for compensation.
We should realize that we are facing an error correction framework, which we can materialize as follows.
"Normal" labor productivity does not depend on economic conditions. It might follow a constant trend over the period, such as:
log(pl*(t)) = a + b · t
LE*(t) = Q(t) / pl*(t)
64
But not the start of actual work: what we measure is the number of workers employed, even if they are still training
for instance.
Δlog(LE(t)) = α · Δlog(LE*(t)) + β · log(LE*(t-1)/LE(t-1)) + γ + u(t)
We recognize here the error correction framework presented earlier, which requires a positive correction coefficient β (so that the gap to the target is actually reduced). As to α, it does not have to be unitary. However, if we follow the above reasoning, its value should be between 0 and 1, and probably significantly far from each of these bounds.
To estimate this system we face an obvious problem: pl* is not an actual series (LE* either, but if we know one we
know the other).
But if we call "pl" the actual level of productivity (Q/LE) we can observe that:
log(LE*(t-1)/LE(t-1)) = log(pl(t-1)/pl*(t-1))
Now it should be obvious that if pl* and pl have a trend, it must be the same, actually the trend defining completely
pl*. If not, they will diverge over the long run, and we will face infinite under or over employment. So target
productivity can be identified using the trend in the actual value, if it exists.
This means we can test the stationarity of the ratio as the stationarity of actual productivity around a trend, a test
provided directly by EViews.
We can expect a framework in which actual productivity fluctuates around a regularly growing target, with cycles
which we do not expect to be too long, but can last for several periods65.
genr PROD = Q / LE
65
Which will create (acceptable) autocorrelation in the difference to the trend.
and regress it on time:
ls log(PROD) c t
Results are quite bad. Of course productivity shows a significant growth, but the standard error is quite high (more
than 5 %). More important, the graph of residuals and the auto-correlation test show that we are not meeting the
condition we have set: that observed productivity fluctuates around a trend, with potential but not unreasonably long
cycles.
The problem apparently lies in the fact that the average growth rate is consistently higher in the first part of the
period, and lower later. Seen individually, each sub-period might seem to meet the above condition.
From the graph, we clearly need one, and probably two breaks. One will observe that the first period follows the first
oil shock, and the beginning of a lasting world economic slowdown. The reason for the second break is less clear
(some countries like the US and Scandinavia show a break in the opposite direction).
For choosing the most appropriate dates, we can use two methods:
β’ A visual one: 1973 and 1990 could be chosen, possibly plus or minus 1 year.
β’ A statistical one: the most appropriate test is the Chow breakpoint test, which diagnoses whether the introduction
of one or more breaks improves the explanation. To make our choice automatic, we shall consider three
intervals, and apply the test to all reasonably possible combinations of dates from those intervals (a program
sketch is given below). As we could expect, all the tests conclude in favor of a break. But we shall select the
combination associated with the lowest probability (of no break), which means the highest likelihood ratio66. Of
course, this criterion works only because the sample and the number of breaks remain the same.
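A minimal program sketch of this automated search (the names eq_try, best_logl, best_t1 and best_t2 are hypothetical, and the candidate date ranges are only illustrative). Rather than calling the Chow view itself, it ranks the combinations by the log-likelihood of the corresponding regressions, which gives the same ordering here since the sample and the number of breaks do not change:
' loop over candidate break dates, T being the numeric time variable
' (years, with quarters as .25 steps), and keep the best pair
scalar best_logl=-1000000
scalar best_t1=0
scalar best_t2=0
for !t1=1971.75 to 1974 step 0.25
  for !t2=1990 to 1993.75 step 0.25
    equation eq_try.ls log(prod) c t (t-!t1)*(t<!t1) (t-!t2)*(t<!t2)
    if eq_try.@logl>best_logl then
      best_logl=eq_try.@logl
      best_t1=!t1
      best_t2=!t2
    endif
  next
next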
The best result actually corresponds to 1972q3 and 1992q4, as shown by the comparison of log-likelihood ratios.
66
the highest F gives the same conclusion
The first element is quite logical: what we are estimating for the model is not actual productivity (this is given in the
model by an identity, dividing actual GDP by employment). We are looking for the exact formula for target
productivity, prone to error only because we have not enough information to produce the true value. If the sample
grew, or the periodicity shortened, the precision would improve constantly. The residual might not decrease, but it
does not represent an error, rather the gap between the actual and βnormalβ values of labor productivity. Whereas, in
a normal behavioral equation, the residual corresponds to an error on the variable, and cannot be decreased
indefinitely, as the identification of the role of explanatory elements becomes less reliable with their number.
The reason here is purely technical. Our model is designed to be used on the future. So it is essential to make the
forecasting process as easy as possible.
If the partial trends are still active in the future, we shall have to manage them simultaneously. We can expect that we
want to control the global trend of labor productivity, if only to make it consistent with our long-term evolutions of
GDP (which should follow world growth) and employment (which should follow population trends). Obviously,
controlling a single global trend is easier than a combination of three trends.
Also, the last trend is the most important for interpretation of model properties, and it is better to make it the easiest
to observe.
On the other hand, our technique has no bad points, once it has been understood.
Finally, the reason for breaking the trend in 2004 is also associated with handling of its future values. If the global
coefficient is changed, this will be the period for a new break, and this is the best period to introduce it.
[Graph: the three trend variables T-2004, (T-1972.50)*(T<1972.50) and (T-1992.75)*(T<1992.75), over 1965-2010]
We can see that in the beginning three trends apply, then two, then after 2004 only the global (blue) trend is
maintained.
The results look quite good, both in the validation of coefficients and the graphs. We are presenting the program version, which will be introduced in the model (as an identity)67.
However, we observe a very high residual in the second quarter of 1968. As we are estimating a trend, this is not the
place for considering a one-period outlier. We will come back to this problem when we estimate employment itself.
Now we must test the stationarity of the residual. We shall use the Dickey-Fuller test (or Phillips-Perron).
First we need to generate from the current RESID a variable containing the residual (the test is going to compute its
own RESID, so it is not possible to test on a variable with this name).
genr res_prle=resid
uroot(1,p) res_prle
uroot(h,p) res_prle
67
This is not absolutely needed, as a variable depending only on time can be considered exogenous and computed
outside the model. But we want to be able to change the assumption in forecasts, and this is the easiest way.
Null Hypothesis: RES has a unit root
Exogenous: Constant
Lag Length: 1 (Automatic - based on SIC, maxlag=13)
t-Statistic Prob.*

Null Hypothesis: RES_PRLE has a unit root
Exogenous: Constant
Lag Length: 1 (Automatic - based on SIC, maxlag=11)
t-Statistic Prob.*
Augmented Dickey-Fuller test statistic -5.191811 0.0000
Test critical values: 1% level -3.522887
5% level -2.901779
10% level -2.588280
The values of target productivity and desired employment are given by:
genr log(prle_t)= c_prle(1)+c_prle(2)*t+c_prle(3)*(t-1972.50)*(t<1972.50)+c_prle(4)*(t-
1992.75)*(t<1992.75)
genr led=q/prle_t
Now, as to the estimation of employment itself, LE will be estimated (using here the developed form) by:
where LED is equal to Q/prle_t, the trend obtained in the previous equation.
It is now time to consider the 1968 residual. With such a high value, there should be some economic explanation.
Indeed, the behavior of firms did not follow normal lines. They believed (rightly) that the decrease in production they
were facing was quite temporary. For them, laying off the corresponding number of workers was not reasonable, as it
would cost severance payments, and when things went back to normal there was no reason they would find as
efficient and firm-knowledgeable workers as before.
This means labor productivity decreased, then increased to get back to normal.
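For reference, a sketch of the corresponding program statements (the equation name eq_le is an assumption; the estimated residual will be stored afterwards with the genr ec_le=resid statement shown later):
' declare the coefficient vector, then estimate the error correction equation
coef(10) c_le
equation eq_le.ls dlog(le)=c_le(1)*dlog(led)+c_le(2)*log(led(-1)/le(-1))+c_le(3)+c_le(4)*((t=1968.25)-(t=1968.50))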
Dependent Variable: DLOG(LE)
Method: Least Squares (Gauss-Newton / Marquardt steps)
Date: 03/22/20 Time: 16:35
Sample (adjusted): 1963Q2 2004Q4
Included observations: 167 after adjustments
DLOG(LE)=C_LE(1)*DLOG(LED)+C_LE(2)*LOG(LED(-1)/LE(-1))
+C_LE(3)+C_LE(4)*((T=1968.25)-(T=1968.50))+EC_LE
The results are rather significant, except for the last coefficient.
[Graph: LE over 1965-2000]
Following the reasoning made earlier, c_le (3) (or rather c_le(3)/c_le(2)) should represent the logarithm of the long-
term gap between the target employment and the level reached. This gap will be significant if both:
β’ Employment shows a trend (the target is moving), which means that GDP and target productivity show
different trends.
β’ A difference between the growths of GDP and target productivity is not compensated immediately (the value
of c_le(1) is different from one)
The second condition is clearly met, but for the first the answer is dubious. Instead of a trend, one rather observes a break in the level. Nevertheless, the coefficient is diagnosed as significant.
As to the first coefficients, they are quite significant, maybe lower than expected.
This will not be true in the US case. We will use another data base, this time bi-yearly.
First, the β1992β break exists too, but it is now positive, as shown here:
Second, employment has grown substantially over the sample period, which means that a constant term is called for:
Reverting to the French case, the outlier observed earlier (and in the equation for the change in inventories) remains.
The year 1968 presents a strong negative residual in the first semester, and a negative one for the last. As we are now
considering a dynamic behavior, this is the time for treating this problem.
As stated earlier, French people (and people familiar with French post-war history) will certainly recall the May 1968
βstudent revolutionβ which lasted roughly from March to June. During that period, the French economic process was
heavily disturbed, in particular the transportation system, and GDP decreased (by 7.6% for the quarter). If the
equation had worked, employment would have decreased too, especially as productivity growth was quite high. On
the contrary, it remained almost stable.
The explanation is obvious: firms expected the slump to be purely temporary, and activity to start back after a while
(actually they were right, and GDP grew by 7.5% in the next semester, due in part to the higher consumption allowed
by "Grenelle68" wage negotiations, very favorable to workers). They did not want to lay off (at a high cost) workers
whom they would need back later with no guarantee to find the same individuals, familiar with the firmsβ techniques.
So the employment level was very little affected.
This means that the global behavior does not apply here, and the period has to be either eliminated from the sample,
or rather treated through a specific dummy variable, taking the value 1 in the first semester and β1 in the second
(when employment increased less than the growth in GDP would call for).
This case is rather interesting: some economists could be tempted to introduce dummies just because the equation
does not work, and indeed the results will be improved (including in general the statistics for the explanatory
variables). This can probably be called cheating. On the contrary, not introducing the present dummy can be
considered incorrect: we know that the local behavior did not follow the formulation we have selected, so it has to be
modified accordingly.
68
From the location of the Ministry of Employment where negotiations were conducted.
The global results are slightly improved, and the first coefficient increases significantly, meaning that the adaptation of
employment to the target is more effective at first. The introduction of the element was not a negligible issue.
We shall use:
+c_prle(4)*(t-1992)*(t<1992)
+c_le(4)*((t=1968.5)-(t=1968))+ec_le
genr ec_le=resid
Note: the reason for the initial βdropβ statement is to avoid the duplication of the elements inside the group, in case
the procedure is repeated. If the elements are not present, nothing happens.
Estimating exports will be simpler from the theoretical side. We shall use it as an example for introducing first
autoregressive processes, then cointegration.
Let us first start with the simplest idea: exports show a constant elasticity to world demand. In other words:
(ΔX/X) / (ΔWD/WD) = a
or by integration:
log(X) = a · log(WD) + b
Estimation should give a value close to unity, as world demand is measured as the normal demand addressed to France by its clients, taking into account the geographical and product structure of French exports.
Dependent Variable: LOG(X)
Method: Least Squares
Date: 03/22/20 Time: 17:39
Sample (adjusted): 1974Q1 2004Q4
Included observations: 124 after adjustments
However the low value of the Durbin-Watson test indicates a strongly positive autocorrelation of residuals, and
invalidates the formulation. The graph shows indeed long periods with a residual of the same sign, even though the
two variables quite often share common evolutions.
e(t) = ρ · e(t-1) + u(t)
where ρ should be significant (positive here), and u(t) independent across time.
log(X(t)) = a · log(WD(t)) + b + e(t)
log(X(t-1)) = a · log(WD(t-1)) + b + e(t-1)
We can multiply the second equation by ρ, and subtract it from the first:
log(X(t)) - ρ · log(X(t-1)) = a · (log(WD(t)) - ρ · log(WD(t-1))) + b · (1 - ρ) + u(t)
To estimate the above formula, it is not necessary to establish the full equation (which calls for a full non-OLS
specification, as it is not linear in the coefficients).
One can very well use the same presentation as for ordinary least squares, introducing in the estimation window the
additional term AR(n), n representing the autocorrelation lag, in our case 1:
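For instance, with the list presentation (the equation name eq_x1 is an assumption):
equation eq_x1.ls log(x) c log(wd) ar(1)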
Dependent Variable: LOG(X)
Method: ARMA Conditional Least Squares (Marquardt - EViews legacy)
Date: 03/22/20 Time: 17:50
Sample (adjusted): 1974Q2 2004Q4
Included observations: 123 after adjustments
Convergence achieved after 1 iteration
LOG(X)=C_X(1)*LOG(WD)+C_X(2)+[AR(1)=C_X(3)]
The results are rather satisfactory: the first coefficient retains the theoretical value, the new coefficient is significant,
the global precision is much improved (see also the graph) and the DW test is closer to satisfactory.
However our formulation is a little too simplistic. We want exports to decrease with the rate of use of capacities, representing the fact that if firms are already selling most of their potential production, they will tend to be less dynamic in their search for foreign markets (more on this later).
We get:
Dependent Variable: LOG(X)
Method: ARMA Conditional Least Squares (Marquardt - EViews legacy)
Date: 03/22/20 Time: 18:00
Sample: 1978Q1 2004Q4
Included observations: 108
Convergence achieved after 13 iterations
LOG(X)=C_X(1)*LOG(WD)+C_X(2)*LOG(UR)+C_X(3)+[AR(1)=C_X(4)]
The relevant coefficients are significant, the average error is lower69 (1.5%), and the Durbin-Watson test is acceptable, but there is a problem: the sign for the new element is wrong (and unsurprisingly, the coefficient for world demand is now much too low).
Let us not despair. If this old-fashioned tool did not work, let us try a more up-to-date one: cointegration.
Just as for stationarity, we will not develop the theory behind the method, leaving it to actual econometricians, an excellent source of information being actually the EViews manual. We will also rely on basic cointegration theory, and not on any of the recent developments.
Let us just say that cointegration is actually a simple extension of stationarity to a set of two or more variables. To
establish cointegration between two elements, one has to prove that in the long run these elements move together,
maintaining a bounded βdistanceβ (or rather that a linear combination is bounded), while the value of each of the two
elements is unbounded (a necessary condition).
For a group of more than two elements to be cointegrated, no subset of this group must already have this property (no single element may be stationary, and no subset may be cointegrated).
If we want to go beyond intuition, the reason for the last condition is that if a cointegrating relation is evidenced
between elements, some of which are already cointegrated, one can always recompose the encompassing equation
into the true cointegrating equation (considered as a new stationary variable) and other variables.
69
As the logarithm measures relative evolutions, an absolute error on a logarithm is equivalent to a relative error on
the variable itself.
For instance, if
a · x + b · y + c · z
is stationary, and
a · x + b' · y
is too (we can use the same a, as a cointegrating equation is known up to a given factor), then
a · x + b · y + c · z
is equivalent to:
(a · x + b' · y) + (b - b') · y + c · z
three new elements, one of which is stationary, which forbids us to test cointegration on the three.
So the two properties must be checked: moving together means both βmovingβ and βtogetherβ.
Using images rather related to stationarity (as they apply to the actual difference of two elements, without weighting
coefficients) we can illustrate the concept as
β’ Astral bodies moving in outer space and linked together by gravity. Their distance is bounded but their
position relative to each other is unknown within those bounds, and we do not know if one is leading the
other.
β’ Human beings: if they are always close to each other, one can decide that they are related (love, hate, professional relationship). But only if they move: if they are in jail, a small distance means nothing.
In our example, the first idea could be to test cointegration between X, WD and UR. But to ensure the stability of our
long-term simulations, we need exports to have a unitary elasticity to WD. If this is not the case, when X reaches a
constant growth rate, it will be different from that of WD: either France will become the only exporter in the world (in
relative terms) or the role of France in the world market will become infinitely negligible. Both prospects are
unacceptable (the first more than the second, admittedly).
This constraint can be enforced very easily by considering only in the long run (cointegrating) equation the ratio of X
to WD, which we shall link to the rate of use. We will test cointegration between these two elements.
Let us first test their stationarity. We know how to do it (from the estimation of employment).
UROOT(1,p) Log(X/WD)
UROOT(1,p) Log(UR)
Note: if we use menus, we should first display the group (either by selecting elements in the workfile window or
creating a group). The default display mode is βspreadsheetβ but the βViewβ item proposes other modes, among them
βUnit root testβ and βcointegration testβ (if more than one series is displayed at the same time).
Null Hypothesis: LOG(X/WD) has a unit root
Exogenous: Constant, Linear Trend
Lag Length: 1 (Automatic - based on SIC, maxlag=12)
t-Statistic Prob.*
Null Hypothesis: LOG(UR) has a unit root
Exogenous: Constant, Linear Trend
Lag Length: 10 (Automatic - based on SIC, maxlag=12)
t-Statistic Prob.*
These first tests show that both UR and the ratio of exports to world demand cannot be considered stationary, even
around a trend: the t-statistic is too low, and the estimated probability for the coefficient to be zero is too high 70.71
70
Not extremely high, however.
Let us now see if the two elements are cointegrated, using the Johansen test.
a No deterministic trend in the data, and no intercept or trend in the cointegrating equation.
b No deterministic trend in the data, and an intercept but no trend in the cointegrating equation.
c Linear trend in the data, and an intercept but no trend in the cointegrating equation.
d Linear trend in the data, and both an intercept and a trend in the cointegrating equation.
e Quadratic trend in the data, and both an intercept and a trend in the cointegrating equation.
In our case, we shall use option d (trend in the cointegrating equation, no trend in the VAR)
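In a program, the test could be run as follows (the group name is an assumption; the letter d refers to the option above, and 4 is the number of lags in first differences, as shown in the output below):
group g_x_ur log(x/wd) log(ur)
g_x_ur.coint(d,4)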
71
We can observe that in a traditional least squares estimation, the same T value would give the opposite diagnosis.
Date: 03/22/20 Time: 19:09
Sample (adjusted): 1979Q1 2004Q4
Included observations: 104 after adjustments
Trend assumption: Linear deterministic trend (restricted)
Series: LOG(X/WD) LOG(UR)
Lags interval (in first differences): 1 to 4
EViews tests first if there is no cointegration. If this is accepted (if it shows a high enough probability), the process
stops. But here this is refused, as the probability (that there is no cointegration) is too low.
In this case, there is at least one cointegrating equation, and EViews proceeds to testing if there is more than one. This
cannot be refused here, as this time the probability is too high.
If the second assumption (at most 1 relation) had not been rejected, there would be at least two and we would have
to continue (there cannot be more relations than variables, however).
Evidencing more than one relation is problematic, maybe worse for the model builder than finding no relation (in
which case we can always proceed with adding new elements). Even if a cointegrating equation has no implications on
causality between elements, we generally intend to include it in a single dynamic formula (a VAR), which does explain
a given variable. With two equations, we are stuck with a parasite one, which will be difficult if not impossible to
manage in the context of the model (if we stop at econometrics, the problem is smaller).
We can also observe whether the existence (or the rejection) of at least one relation is barely or strongly accepted (and likewise for the existence of only one).
The equation introduces a tradeoff between several concepts (here the share of French exports in world demand and
their rate of use). We always have an idea of the sign of the relationship, and also of an interval of economic validity.
There is no guarantee that the value will follow these constraints. It can even happen that the right sign obtained by
Ordinary Least Squares will become wrong when cointegration is tested on the same relation.
First, the sign is right: exports go down when the rate of use goes up (the sign is positive but both elements are on the
same side of the formula).
The size of the coefficient is more difficult to judge. The derivative of the equation relative to Q gives:
Let us suppose an increase in Q of 1 billion Euros coming only from local demand FD. In 2004, the share of exports in
French GDP was 32%. The exports target will decrease by
ΔX = 0.665 * 0.32 = 0.213 billion, i.e. 213 million Euros.
Of course:
β’ In the long run, capacities will build up, UR will get back to its base value (we know that from the investment
equation) and the loss will disappear.
β’ The changes in Q can come also from X, introducing a loop. This means UR might not be the best
representative element. Perhaps we should restrict UR to the satisfaction of local demand (but this looks
difficult to formulate).
β’ This requires using the same variables in both forms, in other words extending the unitary elasticity
assumption to the dynamic equation, which is neither needed statistically nor realistic from an economic
point of view (as we have seen when estimating employment).
β’ The output does not provide information on the quality of the cointegration, an essential element in our
process.
But the good point is that by estimating a VAR using the same elements as the tested cointegration, we get the same
cointegrating coefficients, and this time they are stored in a vector! We can then specify the cointegrating equation
using these elements.
This has the extremely high advantage of allowing us to establish a program which will adapt automatically to any change
in the data, an essential element in all operational modelling projects.
We shall use the following statements to create and estimate the VAR, and to store the cointegrating coefficients in a vector, by accessing the first line of the matrix _var_x.b (not displayed in the workfile window):
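The creation statement itself is not shown above; a plausible form, assuming EViews' standard vector error correction syntax (same deterministic option d as in the cointegration test, one cointegrating equation, lagged differences from 1 to 1), would be:
' hypothetical reconstruction of the VAR creation statement
var _var_x.ec(d,1) 1 1 log(x/wd) log(ur)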
vector(10) p_x
p_x(1)=_var_x.b(1,1)
p_x(2)=_var_x.b(1,2)
p_x(3)=_var_x.b(1,3)
0= p_x(1)*log(x/wd)+p_x(2)*log(ur)+p_x(3)*@trend(60S1)
Actually the first parameter is not really needed, as it is equal to one by construction. We think using it makes the equation clearer.
Estimating the dynamic equation calls for the computation of the residual in the cointegrating equation:
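A sketch of that computation, simply mirroring the cointegrating equation written above with the stored p_x coefficients:
genr res_x=p_x(1)*log(x/wd)+p_x(2)*log(ur)+p_x(3)*@trend(60S1)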
Then we estimate the VAR, releasing the constraint on the unitary elasticity of X to WD. In principle, the coefficient
should be positive for WD, negative for UR, and we can introduce a lag structure for both elements.
Unfortunately, the coefficient for UR is not significant here, maybe because of a delay in its influence.
72
The figures 1 and 1 indicate the scope of lagged variations of the left-hand side variable in the VAR which will be
added to the right-hand side. Here it will be 1 to 1 (or 1 with lag 1). The options used are consistent with the ones we
have used in coint (which have been determined automatically by EViews).
Dependent Variable: DLOG(X)
Method: Least Squares (Gauss-Newton / Marquardt steps)
Date: 03/22/20 Time: 19:18
Sample (adjusted): 1978Q3 2004Q4
Included observations: 106 after adjustments
DLOG(X)=C_X(4)*DLOG(WD)+C_X(6)*RES_X(-1)+C_X(7)
In a single country model, the rest of the world is exogenous, and imports and exports have to be estimated
separately, following of course the same guidelines:
β’ Imports will depend on demand and capacity utilization, through constant elasticities.
However, the definition of demand is not straightforward. For exports, we just had to consider global imports from
each partner country in each product, and compute an average using a double weighting: the role of these partners in
French exports, and the structure of products exported by France.
This was possible because we considered the rest of the world as exogenous, and did not try to track the origin of its
imports.
β’ We can consider final demand and exports. Obviously, they do not have the same impact on imports (due to
the absence of re-exports). We can generate a global demand by applying to exports a correcting factor.
Under rather standard assumptions (same import share in all uses, unitary ratio of intermediate consumption
to value added), this factor can be set at 0.5.
β’ We can also define intermediate consumption, and add it to final demand to get the total demand of the
country, a share of which will be imported. This method is obviously more acceptable from the economic
point of view. Unfortunately it relies on the computation of intermediate consumption, a variable less
accurately measured, and sensitive to categories and the integration of the productive process73.
We have chosen this last method nevertheless, favoring economic properties over statistical reliability.
73
For instance, if good A (say cotton) is used to produce good B (unprinted fabric) which gives good C (printed fabric), both A and B will be counted as intermediary consumption. If the fabric is printed at the same time it is produced, only A will be counted. If we consider value added, the total amount will not change, just the number of elements.
Dependent Variable: LOG(M)
Method: ARMA Maximum Likelihood (BFGS)
Date: 04/25/20 Time: 12:18
Sample: 1962Q1 2004Q4
Included observations: 172
Convergence achieved after 8 iterations
Coefficient covariance computed using outer product of gradients
The fit is quite good including the sensitivity to fluctuations (which was the least we could expect), and the
autocorrelation eliminated. But we face a problem: the high value of the coefficient.
Now the question is: should the growth of demand be the only explanation for the growth of imports? In other words,
is there not an autonomous force increasing the weight of foreign trade, independently from growth itself? Or: if
demand did not grow for a given period, would imports stay stable, or keep part of their momentum? The associated
formula would present:
Dependent Variable: LOG(M)
Method: Least Squares
Date: 04/25/20 Time: 12:32
Sample (adjusted): 1962Q1 2004Q4
Included observations: 172 after adjustments
This does not work. Either we get autocorrelation, or a negative (but not significant) trend.
The problem with our formulation is actually very clear: in terms of model properties, it is reasonable to suppose that
in the short run, an increase in final demand will increase imports beyond their normal share, by generating local
bottlenecks on domestic supply. But the explanatory element should be the rate of use of capacities. As it is fixed in
the long run, the share should go back to normal with time.
A rate of use of 85% (a normal value over the whole economy) does not mean that all firms work at 85% of capacity, in which case they could easily move to 86% if needed. It means that the rates of use follow a given distribution, some lower than 85%, some higher, some at 99%, and a finite share at 100% (see graph).
[Graph: "Demand and the rate of use" - probability distribution of the rate of use (in 10000ths), baseline versus an increase in demand]
An increase in demand will move the curve to the right, and more firms will face the limit: an increase of 1% will meet
it halfway for firms starting from a 99.5% rate of use. The additional demand, if clients do not accept local substitutes,
will have to be supplied by imports.
However, local firms will react to this situation, and try to gain back lost market shares by increasing their capacities
through investment: this is the mechanism we have described earlier. In our small model, the long-term rate of use is
fixed: the sharing of the additional demand will come back to the base values. These values can increase with time
due to the expansion of world trade.
Our formula will make imports depend on total demand and the rate of use:
Where
IC = tc · Q
TD = FD + IC
And tc is the quantity of intermediary consumption units required to produce one unit of GDP.
By integration, we get
Dependent Variable: LOG(M)
Method: ARMA Maximum Likelihood (BFGS)
Date: 04/25/20 Time: 12:45
Sample: 1963Q2 2004Q4
Included observations: 167
Convergence achieved after 9 iterations
Coefficient covariance computed using outer product of gradients
No formulation is acceptable on all counts, i.e. with no autocorrelation, a significant contribution of the rate of use, and a significant positive trend (to say nothing of the demand coefficient).
Actually, we are facing a usual problem: the two explanations (deviations of demand from its trend, and the output gap)
are strongly correlated.
One idea can be to force the first coefficient to unity, and estimate the ratio of imports to demand.
Dependent Variable: LOG(M/TD)
Method: ARMA Maximum Likelihood (BFGS)
Date: 04/25/20 Time: 13:07
Sample: 1963Q2 2004Q4
Included observations: 167
Convergence achieved after 7 iterations
Coefficient covariance computed using outer product of gradients
This works quite well on all counts, and ensures that the growth of imports and demand will converge in the long run,
once the trend has been suppressed, and the rate of use has stabilized through the investment equation.
We could stop here. But as in the exports case, we can try to separate the behavior into short-term and long-term, in
other words to apply an error correction framework. To represent this process, we need a formula which:
β’ Enforces a long-term unitary elasticity of imports to the total demand variable, with a positive additional
effect of the rate of use.
β’ Allows free elasticities in the short run.
We shall start by testing the cointegration between the share of imports in demand: M/TD and the rate of use.
Before, we test the stationarity of M/TD, or rather its logarithm (UR has already been tested):
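Mirroring the statements used for exports, the test can be applied directly to the expression:
uroot(1,p) log(m/td)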
It is strongly contradicted by the Dickey Fuller test:
t-Statistic Prob.*
It fails!
Date: 03/22/20 Time: 20:12
Sample (adjusted): 1979Q1 2010Q4
Included observations: 128 after adjustments
Trend assumption: Linear deterministic trend (restricted)
Series: LOG(M/TD) LOG(UR)
Lags interval (in first differences): 1 to 4
Let us not despair. Diagnosing an absence of cointegration is not such bad news74, as it allows us to proceed further. If a
set of two variables does not work, why not a set of three?
Now which additional element could we consider? The natural candidate comes both from theory and from the data:
74
Identifying two equations would be much worse, especially in a modelling framework.
If demand is present but local producers have no capacity problems, how can foreign exporters penetrate a market?
Of course, through price competitiveness, in other words by decreasing the import price compared to the local one.
This observation is confirmed by the data. Let us regress the import-demand ratio over the rate of use, consider the
residual (the unexplained part) and compare it to the ratio of import to local prices: we observe a clearly negative
relation.
[Graph, 1978-2002: residual of log(m/td) over log(ur), compared with the log of import price competitiveness]
Having also checked:
β’ the non-stationarity of Log(COMPM),
β’ the non-cointegration of Log(COMPM) with Log(UR) and with log(M/TD) taken individually,
we can test the cointegration of the three elements:
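In program form (the group name is an assumption; 3 is the number of lags in first differences and d the trend option, as shown in the output below):
group g_m3 log(m/(fd+ct*q)) log(ur) log(compm)
g_m3.coint(d,3)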
It works!!
Date: 03/22/20 Time: 20:17
Sample (adjusted): 1979Q3 2004Q4
Included observations: 102 after adjustments
Trend assumption: Linear deterministic trend (restricted)
Series: LOG(M/(FD+CT*Q)) LOG(UR) LOG(COMPM)
Lags interval (in first differences): 1 to 3
β’ An apparently high sensitivity of imports to the rate of use (but remember the investment equation will
stabilize it in the end).
However the true effect is not so high. If the equation was applied to the short-term, with a share of imports in total
demand of 0.15 (the value for 2004), we would get:
Dependent Variable: DLOG(M)
Method: Least Squares (Gauss-Newton / Marquardt steps)
Date: 03/22/20 Time: 20:26
Sample (adjusted): 1978Q4 2004Q4
Included observations: 105 after adjustments
DLOG(M)=C_M(5)*DLOG(FD+CT*Q)+C_M(6)*DLOG(UR)+C_M(7)
*DLOG(COMPM)+C_M(8)*RES_M(-1)+C_M(9)+EC_M
The results are rather acceptable, including the graph. By the way, it shows that rejecting an equation on the basis of a low R-squared is not justified when the dependent element shows a high variability.
The method we are using for storing equations has an additional advantage. Now that the residuals have been
introduced with their estimated values, all the equations should hold true. The checking process can now be extended
to all the endogenous variables.
Theoretically the estimated equations should be consistent with the data, as merging with the model the actual
estimated equations ensures the consistency. However:
β’ If the package does not provide this direct storing, or if the equation had to be normalized by the modeler,
editing the formulation could introduce errors.
β’ The storing of coefficients may have been done badly.
β’ The text, series or coefficients may have been modified by the user after estimation.
β’ One could have accessed other series or coefficients than the ones used by the estimation (for example one
can seek them in another bank including series of similar names).
The reasons for a non-zero residual are less numerous than for identities. They can come only from the fact that
equation elements have changed since the last estimation.
Obviously, the main suspect is the data. New data series are bound to be inconsistent with the previous estimation,
whether it has been updated (moving to a more precise version) or corrected (suppressing an error).
Actually, in EViews, applying a new version of an equation to a model requires, in addition to its estimation, to actually
merge it again into the model. This will create a new compiled version, without need to explicitly update the model.
Anyway, in our opinion, applying a new estimation should call for a full model re-creation. This is the only way to
guarantee a clear and secure update.
For our model, the statements for the residual check will be the following:
solve(d=f) _fra_1
for !i=1 to g_vendo.@count
%2=g_vendo.@seriesname(!i)
genr dc_{%2}={%2}-{%2}_c
genr pc_{%2}=100*dc_{%2}/({%2}+({%2}=0))
next
The solution series will have the suffix "_c", and the residuals the prefix "dc_" for the errors in levels, and "pc_" for the relative errors.
[Diagram: architecture of the small model. It links GDP, productive capacity and the rate of use, investment and capital (with the scrapping rate), the change in inventories, firms' employment, civil servants and total employment, household revenue (real wage rate, share of GDP), consumption and the savings rate, housing investment, final demand, government demand, exports (driven by world demand) and imports (driven by price competitiveness).]
[2] ur = q / cap
[3] q + m = fd + x
[5] ic = ct * q
[6] log(prle_t) = 9.8398+ 0.009382* (t - 2002) + 0.02278* (t - t1) * (t<t1) + 0.01540* (t - t2) * (t<t2)
[9] lt = le + lg
[13] ci/q( - 1) = -0.02680*(T = 1968.25) + 0.6128*ci( - 1)/q( - 2) - 0.0021490 + 0.2193*@PCH(q) + 0.1056*@PCH(q( - 1))
+ 0.05492*@PCH(q( - 2)) + 0.03918*@PCH(q( - 3)) + 0.03026*@PCH(q( - 4)) + ec_ci
[14] fd = co + i + gd + ci + ih
[15] td = fd + ic
[16] res_m = log(m / (fd + ct * q)) - 1.517 * log(ur) - 0.552 * log(compm) + 0.00579 * (@trend(60:1) * (t<=2004) + @elem(@trend(60:1) , "2004q4") * (t>2004))
[18] res_x = p_x(1) * log(x / wd) + 0.9815 * log(ur) + 0.001441 * (@trend(60:1) * (t<=2004) + @elem(@trend(60:1) ,
"2004q4") * (t>2004))
7 CHAPTER 7: TESTING THE MODEL THROUGH SIMULATIONS OVER THE PAST
Even if every equation has been accepted individually, assembling them into a model can reveal new problems:
β’ The consistency of the data might hide errors compensating each other.
β’ Some complex equations might have hidden wrong individual properties.
β’ Connections between concepts might have been forgotten or wrongly specified.
β’ The growth rates provided naturally by equations could be the wrong ones.
β’ Some error correction processes could be diverging instead of converging.
β’ Assembling individually acceptable formulations might create a system with unacceptable properties.
β’ Exogenous elements might be linked with each other.
β’ Theoretical balances might not have been specified.
And finally:
Another modeler would have obtained a different model, with potentially different properties.
Let us just give an example for a problem: if in the short run increasing government demand by 1000 creates 800
consumption and 600 investment, while exports do not change and imports increase by 300, the individual equations
might look acceptable, but the model will diverge immediately through an explosive multiplier effect (800 + 600 - 300 = 1100).
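A minimal way to see the divergence: with these figures, each unit of additional demand generates 1.1 units of demand at the next round, so the cumulated effect behaves like the geometric sum 1 + 1.1 + 1.1^2 + ..., which grows without bound instead of converging to a finite multiplier.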
β’ Is there some indication that the model is unsuitable for forecasts, and for policy analysis?
In our opinion, it is only later, by simulations over the future (its normal field of operation, actually) that we can really
validate the use of a model. But as usual, problems should be diagnosed as soon as possible. And the availability of
actual data increases strongly the testing possibilities.
75
In some cases we might have been obliged to calibrate the values.
Finally, the errors evidenced at this stage might help to build a better forecasting structure, before any attempt on the
future.
To solve the model we need to apply a method (an βalgorithmβ). Let us present the different options.
7.1.1 GAUSS-SEIDEL
This is the most natural algorithm: one often uses Gauss-Seidel without knowing it, like M. Jourdain (the Bourgeois
Gentilhomme) makes prose.
The method starts from initial values. They can be the historical values on the past, on the future the values computed
for the previous period or for an alternate base simulation. The whole set of equations will be applied in a given order,
using as input the most recent information (computations replace the initial values). This gives a new set of starting
values. The process is repeated, using always the last information available, until the distance between the two last
solutions is small enough to be considered negligible. One will then consider that the solution has been reached.
As only present values will change during computation, we will not consider the other elements, and will drop the time index:
y = f(y)
We will use a subscript i to identify the particular endogenous variable, and an exponent to denote the iteration count.
a - We start from y0, value before any computation, setting the number of iterations to zero.
b - We add 1 to the number of iterations (which we shall note k); this gives to the first iteration the number 1.
c - We compute y_i^k from i = 1 to n, taking into account the i-1 values we have just produced. This means we compute:
y_i^k = f_i(y_1^k , ..., y_(i-1)^k , y_i^(k-1) , ..., y_n^(k-1))
(at the first iteration, explanatory elements will take the starting value y0 if their index is higher than that of the computed variable)76.
d - At the end of the process, we compare y^k and y^(k-1): if the distance is small enough for every element (using a
criterion we shall present) we stop the process, and use the last value as a solution. If not, we check if we have
reached the maximum number of iterations, in which case we accept the failure of the algorithm, and stop. Otherwise
we resume the process at step b.
Clearly, this algorithm requires an identified model, with a single variable on the left-hand side (or an expression
containing a single simultaneous variable).
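Purely as an illustration (this is not how one solves an actual EViews model, which is left to the solve command), a toy program applying Gauss-Seidel to a hypothetical two-equation system could read:
' hypothetical system: y1 = 0.5*y2 + 10 ; y2 = 0.3*y1 + 5
!y1=0
!y2=0
!crit=0.000001
for !iter=1 to 100
  !y1_old=!y1
  !y2_old=!y2
  !y1=0.5*!y2+10
  ' the computation of y2 already uses the y1 just obtained: the Gauss-Seidel principle
  !y2=0.3*!y1+5
  if @abs(!y1-!y1_old)<!crit and @abs(!y2-!y2_old)<!crit then
    exitloop
  endif
next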
7.1.2 RITZ-JORDAN
The Ritz-Jordan method is similar to the one above: it simply abstains from using values computed at the current iteration:
y^k = f(y^(k-1))
Refusing to consider the last information, it looks less efficient than Gauss-Seidel. In our opinion, its only interest
appears when the model faces convergence problems: it makes their interpretation easier by reducing the
interferences between variables.
Contrary to the two above, the Newton method applies naturally to non-identified formulations. It represents
actually a generalization to an n-dimensional case of the well-known method using a sequence of linearizations to
solve a single equation.
76
This means only variables which are used before they are computed must be given values for initialization. We shall
come back to this later.
that we will simplify as above into:
f(y) = 0
The linearization of f around a starting solution y0 gives, by calling "fl" the value of f linearized:
fl(y) = f(y0) + (∂f/∂y)(y=y0) · (y - y0)
and solving fl(y) = 0 gives:
y = y0 - [(∂f/∂y)(y=y0)]^(-1) · f(y0)
Applied to a model written in the identified form
y - f(y) = 0
this becomes:
y = y0 - [I - (∂f/∂y)(y=y0)]^(-1) · (y0 - f(y0))
[Figure: The Newton method (one equation) - successive linearizations at y0, y1, y2, with the corresponding values f(y0), f(y1), f(y2)]
Linearizing the model again, around the new solution y 1, and solving the new linearized version of the model, we
define an iterative process which, as the preceding, will stop when the distance between the two last values gets small
enough. Implementing this method is more complex: in addition to inverting a matrix, each iteration involves the
computation of a Jacobian. This can be done in practice in two ways:
β’ Analytically, by determining from the start the formal expressions of derivatives. At each iteration, we shall
compute them again from the present values of variables. This method supposes either undertaking the
derivation "by hand" with a high probability of errors, or having access to an automatic formal processor, a
program analyzing the text of equations to produce the formal expression of their derivatives. To a high initial
cost, this method opposes a simplification of computations during the iterative process 77.
β’ By finite differences, determining separately each column of the Jacobian by the numerical computation of a
limited first order development, applied in turn to each variable. One computes the y vector using the base
values, then for starting values differing only by a single variable, and compares the two results to get a
column of the Jacobian. One will speak then of a method of secants, or pseudo-Newton.
77
However, changing some model specifications calls for a new global derivation (or a dangerous manual updating).
78
Unfortunately, the associated code is not apparently available to the user, which would allow interesting
computations.
Σ(j=1 to n) { [ f_i(y^k + e_j · Δy_j) - f_i(y^k) ] / Δy_j } · (y_j - y_j^k) = fl_i(y) - f_i(y^k)
One will have only to compute the y vector n+1 times: one time with no modification and one time for each of the
endogenous variables.
The expensive part of this algorithm being clearly the computation of the Jacobian and its inversion, a variant will
consist in computing it only every m iterations. The convergence will be slower in terms of number of iterations, but
the global cost might decrease.
EViews provides another alternative: Broyden's method, which uses a secant approach and does not require computing the Jacobian at each step. As we shall see later, this method proves often very efficient.
y = f(y)
y - f(y) = 0
y = y0 - [I - (∂f/∂y)(y=y0)]^(-1) · (y0 - f(y0))
or
y = [I - (∂f/∂y)(y=y0)]^(-1) · (f(y0) - (∂f/∂y)(y=y0) · y0)
Broyden's method (also called the secant method) computes the Jacobian only once, in the same way as Newton's, and computes a new value of the variable accordingly.
After that, it updates the Jacobian, not by derivation, but by considering the difference with the previous solution, and the direction leading from the previous solution to the new one:
J(k+1) = J(k) + [ (F(x(k+1)) - F(x(k)) - J(k) · Δx(k)) · Δx(k)' ] / ( Δx(k)' · Δx(k) )
where J is the Jacobian, F the function which should reach zero, x the vector of unknown variables, and Δx(k) = x(k+1) - x(k) the last step.
Let us clarify all this with a graph based on the single equation case.
We can see that the direction improves with each iteration, less than Newton but more than Gauss-Seidel (for
which it does not improve at all).
Otherwise the method shares all the characteristics of Newton's, in particular its independence from equation
ordering. It takes generally more iterations, but each of them is cheaper (except for the first).
We shall see on a set of practical examples that on average it looks like the most efficient option on the whole,
both in terms of speed and probability of convergence 79. But the diagnosis is not so clear cut.
Methods described above have a common feature: starting from initial values, they apply formulations to get a new
set. The process is repeated until the two last sets are sufficiently close to be considered as the solution of the system.
One cannot identify the difference between two iterations with the precision actually reached (or the difference to
the solution). This is valid only for alternate processes. For monotonous ones, it actually can be the reverse: the slower
the convergence, the smaller the change in the criterion from one iteration to the other, and the higher the chance
that the criterion will be reached quite far from the solution. As to cyclical processes, they can reach convergence
mistakenly at the top or bottom of a cycle.
79
The most important feature in our opinion.
[Graph: convergence of the values across iterations]
So one could question the use of this type of method, by stressing that the relative stability of values does not mean
that the solution has been found. However, one can observe that if the values do not change, it means that the
computation which gave a variable would give the same result with the new values of its explanatory variables: it
means also that the equation holds almost true (though very different values might share this property).
In this case, it is clear that we do not get the exact solution. This criticism should not be stretched too much: the
precision of models is in any event limited, and even supposedly exact algorithms are limited by the precision of
computers.
For the algorithm to know at which moment to stop computations, we shall have to establish a test.
In fact, the only criterion used in practice will consider the variation of the whole set of variables in the solution, from
an iteration to the other.
in relative values:
d_i = | (y_i^k - y_i^(k-1)) / y_i^(k-1) |
or in levels:
d_i = | y_i^k - y_i^(k-1) |
The test is applied variable by variable: d_i < c_i , for every i
Generally one will choose a criterion in relative value, each error being compared with a global criterion. This value
will have to be small compared to the expected model precision (so that the simulation error will not really contribute
to the global error), and to the number of digits used for results interpretation.
The most frequent exception should correspond to variables which, like the trade balance, fluctuate strongly and can even change sign: here the choice of a criterion in level seems a natural solution, which will avoid a non-convergence diagnosis due to negligible fluctuations of a variable around a solution which happens (by pure chance) to be very small.
For example, if the convergence threshold is 0.0001 in relative value, convergence will be refused if solutions for the
US trade balance alternate by chance between - 1 billion current US Dollars and - 1.0002 billion80, while a difference of
200 000 Dollars, at the scale of US foreign trade, is obviously very small. And this difference, which represents less
than one millionth of US exports and imports, might never be reduced if the computer precision guarantees only 8
significant figures81.
In practice we shall see that the test could be restricted to a subset of variables in the model, the convergence of
which extends mathematically to global convergence.
o In case of Gauss-Seidel, each additional digit bears roughly the same cost. The convergence is qualified as
linear.
o In case of Newton, the number of digits gained increases with the iterations: beyond the minimum level (say
0.01%) a given gain is cheaper and cheaper (this will be developed later). The convergence is called quadratic
80
There is no risk for this in present times.
81
Exports and imports will be precise to the 8th digit, but the difference, a million times smaller, to the 2nd only.
β’ On the type of simulation:
o For a forecast, one will not be too strict, as we all know the precision is quite low anyway. Forecasting growth
of 2.05% and 2.07% three years from now delivers the same message, especially as the actual growth might
materialize as 1% (more on forecast precision later).
o For a shock analysis, especially if the shock is small, the evaluation of the difference between the two
simulations is obviously more affected by the error: decisions increasing GDP by 0.07% and 0.09% will not
be given the same efficiency.
In a stochastic simulation, it is essential that the consequences for the solution of introducing small random residuals can be attributed precisely to these shocks, and not to the simulation process.
As to the number of iterations, it will be used as a limit, after which we suppose that the model has no chance to
converge. In practice one never considers stopping an apparently converging process, just because it has taken too
much time. So the only relevant case is when the process is not progressing, because it is oscillating between two or
more solutions, and the deadlock has to be broken. Reverting to the use of damping factors (described later) should
solve the problem in the Gauss-Seidel case.
Testing convergence under EViews is not very flexible: the only option allowed is the level of the (relative)
convergence criterion, and it will apply to all variables.
One can also decide on the maximum number of iterations. For most models, after 1000 iterations, convergence
becomes rather improbable. But just to make sure, one can set an initial high number. Observing the actual number
required can allow to improve the figure.
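For instance (the option letters are assumed to follow the standard solve syntax, and the values are only illustrative):
' o= algorithm (g Gauss-Seidel, n Newton, b Broyden), c= convergence criterion,
' m= maximum number of iterations, d=d dynamic deterministic simulation
solve(o=b, c=1e-6, m=1000, d=d) _fra_1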
We are now going to show how the choice of the algorithm affects the convergence process. For this, let us define the incidence matrix A of the model, such that:
β’ Ai,j = 1 if the variable yj appears formally, through its unlagged value, in the equation of rank i.
β’ Ai,j = 0 otherwise.
We will suppose the model to be normalized, therefore put under the form:
π¦ β π(π¦) = 0
where the variable yi will appear naturally to the left of the equation of rank i: the main diagonal of the matrix will be
composed of 1s.
The definition of the incidence matrix, as one can see, depends not only on the model, but also on the ordering of
equations, actually the one in which they are going to be computed.
The formal presence of a variable in an equation does not necessarily mean a numerical influence: it could be affected
by a potentially null coefficient, or intervene only in a branch of an alternative. Often we will not be able to associate
to a model a unique incidence matrix, nor a matrix constant with time, except if one considers the total set of
potential influences (the matrix will be then the Boolean sum of individual Boolean matrices).
One will also notice that defining the incidence matrix does not require the full formulations, or the knowledge of
variable values. We simply need to know the list of variables which appear in each explanation, as well as their
instantaneous or lagged character 82.
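As a purely illustrative example (not our model), consider a three-equation model in which, with the chosen ordering, y1 depends on y3, y2 on y1, and y3 on y2. Its incidence matrix is:
A = | 1 0 1 |
    | 1 1 0 |
    | 0 1 1 |
The diagonal is made of 1s (each variable appears in its own normalized equation), and the single element above the diagonal shows that y3 is used, in the first equation, before it is computed.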
To apply this technique to our model, we can rely on the block structure provided by EViews, through access to:
(double-click)>View>Block structure,
Number of equations: 20
Number of independent blocks: 3
Number of simultaneous blocks: 1
Number of recursive blocks: 2
Block 1: 3 Recursive Equations
cap(1) prle_t(6) x(19)
Block 2: 14 Simultaneous Equations (1 feedback var)
ic(5) ci(13) led(7) le(8)
td(15) m(17) q(3)
Block 3: 3 Recursive Equations
res_m(16) res_x(18) k(20)
82
Following our methodology, the incidence matrix can be produced before any estimation.
First, we can use the above separation to move the three predetermined variables to the beginning, and the three post-determined ones to the end, which gives the following matrix:
We can see that the model has been divided into three parts:
β’ A three equation block, with elements which do not depend on the complement, or on subsequent variables
in the same block. The variables in this can then be computed once and for all, in a single iteration, at the
beginning of the solving process. Actually they do not depend on any variable in the same block, but this is
not required.
This property is called recursiveness, and the block is usually named the prologue.
We can see that variables can belong to this block for various reasons:
o prle_t depends only on time. The only reason for introducing its equation is to allow easy modification in
forecasts.
o cap depends on an exogenous and a predetermined variable.
o x should depend on the rest of the equilibrium (through UR) but this link has not been evidenced
statistically, leaving only the instantaneous influence of the exogenous WD.
In practice, however, respecting the convergence threshold will need two iterations, the starting value being different
from the solution found, unless the recursivity is known from the start, and the first solution accepted without
control83.
83
Which is of course the case for EViews.
• A three-equation block, whose elements do not affect the rest of the model, and do not depend on subsequent variables in the same block. These variables can be computed after all the others, once and for all in one pass. Again, they do not depend on any variable in the same block, but this is not necessary. The only condition is that they do not depend on elements computed later (or, equivalently, that the corresponding part of the matrix is lower triangular).
o The residuals for the cointegration equations, which will only be corrected at the next period.
o The end-of-period capital, which obviously cannot affect the equilibrium for the period.
We shall see later another important category: variables with a purely descriptive role, like the government deficit in GDP points.
• The rest of the model is simultaneous, and sometimes called the heart. We can check on the model graph that for any given pair of variables in the set, there is at least one sequence of direct causal relationships leading from the first to the second, and vice versa. This also means that exogenizing any element (at a value different from the model solution, of course) will affect all the other elements (see the sketch below).
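This three-part decomposition can be obtained mechanically from the dependency graph, by grouping variables into strongly connected components and ordering the components. The following minimal sketch (in Python, assuming the networkx package, with illustrative dependency lists rather than our model's actual equations) shows the idea:

import networkx as nx

# For each variable, the unlagged endogenous variables its equation uses.
# Purely illustrative.
unlagged_deps = {
    "cap": [],               # depends only on exogenous / lagged variables
    "x":   [],
    "q":   ["fd", "m"],
    "fd":  ["co", "i"],
    "co":  ["q"],
    "i":   ["q"],
    "m":   ["fd"],
    "k":   ["i"],            # end-of-period capital: affects nothing this period
}

G = nx.DiGraph()
G.add_nodes_from(unlagged_deps)
for var, deps in unlagged_deps.items():
    G.add_edges_from((dep, var) for dep in deps)     # dep influences var

# A strongly connected component with several members (or a self-loop)
# is a simultaneous block; single, loop-free components form recursive blocks.
cond = nx.condensation(G)                            # the DAG of components
for node in nx.topological_sort(cond):
    members = sorted(cond.nodes[node]["members"])
    simultaneous = len(members) > 1 or G.has_edge(members[0], members[0])
    print("simultaneous" if simultaneous else "recursive", members)

On this stylized example, the output isolates cap and x at the beginning, a single simultaneous block in the middle, and k at the end, mirroring the prologue / heart / epilogue structure described above.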
We can now try to better interpret the simultaneity in the heart. The first stage is observing the presence of loop variables.
The incidence matrix allows us to define loop variables, as variables that enter an equation of rank lower than the one that defines them, or in other words that will be used before they are computed. In matrix notation: y_j is a loop variable if A_{i,j} = 1 for some i < j, that is, if column j of the strictly upper-triangular part of the incidence matrix contains a non-zero element.
The variables appearing as an explanatory factor in their own equation of definition will also have to be added to this set, but in practice this case is rather rare.
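As a minimal sketch, this test can be written directly on the incidence matrix (the tiny example below is illustrative, with q placed first and using FD and M, which are computed later):

import numpy as np

def loop_variables(A, order):
    # A loop variable has a 1 strictly above the main diagonal in its column:
    # it is used in an equation of lower rank than the one that defines it.
    upper = np.triu(A, k=1)
    return [name for j, name in enumerate(order) if upper[:, j].any()]

order = ["q", "fd", "m"]
A = np.array([[1, 1, 1],                     # q uses fd and m, computed later
              [0, 1, 0],
              [0, 0, 1]])
print(loop_variables(A, order))              # ['fd', 'm']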
Let us look at our incidence matrix. Two loop variables are present: FD and M. The reason is that they are used to
compute Q, in an equation which appears at the beginning (of the heart).
Actually X should also be present, but as UR appears only through its lagged value, and WD is exogenous, its exact
value can be computed immediately, which means it is located in the prologue. In a way it is now technically
exogenous (considering only same period relationships).
Of course, a model can contain a sequence of non-recursive blocks. This will happen, for instance, with two successive non-recursive blocks if elements of the second depend on elements of the first, but not vice versa. Between the two blocks, a recursive one can be introduced.
We shall see examples of this situation when we deal with more complex models.
The set of loop variables is interesting for the following reason: if it is empty, the model is recursive, which means that the sequential calculation of each of the equations (in the chosen order) gives the solution of the system. The values obtained at the end of the first iteration satisfy the full set of equations, none of them being put into question by a later modification of an explanatory element, and a second iteration will not modify the result.
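This one-pass property is easy to visualize on a stylized recursive system (a minimal sketch; the equations and coefficients are illustrative, not those of our model):

def solve_recursive(equations, exog):
    # equations: list of (name, function of the current solution dict),
    # given in an order where each right-hand side only uses variables
    # already computed (or exogenous ones), so one pass is enough.
    y = dict(exog)
    for name, f in equations:
        y[name] = f(y)
    return y

equations = [
    ("x",  lambda y: 0.2 * y["wd"]),
    ("m",  lambda y: 0.3 * y["g"]),
    ("fd", lambda y: y["g"] + y["x"]),
    ("q",  lambda y: y["fd"] - y["m"]),
]
print(solve_recursive(equations, {"wd": 1000.0, "g": 200.0}))

Running the loop a second time would leave every value unchanged, which is precisely the point.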
This favorable case is quite rare84. However, one can often identify simultaneous subsets (or « blocks ») with a recursive structure relative to each other, such that the first p equations of the model are not influenced by the last n − p (as we have shown on our example). The simulation process can then be improved, as it suffices to solve in sequence two systems of reduced size; this saves time, as the cost of solution grows more than proportionally with the number of equations. This property is obvious for Newton, where the costs of both Jacobian computation and inversion decrease, less so for Broyden, and even less for Gauss-Seidel, where the only proof comes from practice.
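A back-of-the-envelope illustration of this cost argument for Newton (a sketch, using the usual cubic cost of factorizing or inverting a dense Jacobian; the block sizes are arbitrary):

n, p = 14, 7
print(n**3, p**3 + (n - p)**3)     # 2744 versus 686: about four times cheaper

This only illustrates the order of magnitude; the actual gain depends on the algorithm, as the text notes for Broyden and Gauss-Seidel.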
Discovering the above properties and performing the associated reordering are clearly of interest to the model builder, as they improve the organization of the solution process and therefore reduce computation time. This process also helps detect logical errors, for example by revealing the recursive determination of an element known to belong to a loop (such as investment in the Keynesian loop). Most software packages, including EViews, take care of this search and the associated reorganization, but further improvement may be sought in the solving process by modifying by hand the order in which equations are computed.
The second task, reordering the equations inside the simultaneous block itself, is much less obvious and in any case more complex. One will generally seek to minimize the number of loop variables. The cost of this technique depends on the ambition: searching for one set of loop variables from which no element can be removed (Nepomiaschy and Ravelli) is cheaper than searching over all orderings for the smallest possible number of elements (Gilli and Rossier). The first type of set is called minimal, the second minimum. In fact, minimizing the number of loop variables might not be a good preparation for the use of the Gauss-Seidel algorithm, as we will see later.
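On a very small block, the minimum can be found by brute force over all orderings (a sketch with illustrative dependency lists; real models require the dedicated algorithms cited above):

from itertools import permutations

def count_loop_vars(order, deps):
    rank = {name: k for k, name in enumerate(order)}
    # a variable is a loop variable if some equation uses it before it is computed
    return sum(
        any(rank[name] > rank[user] for user, d in deps.items() if name in d)
        for name in order
    )

heart = {"q": ["fd", "m"], "fd": ["co", "i"], "co": ["q"],
         "i": ["q"], "m": ["fd"]}
best = min(permutations(heart), key=lambda o: count_loop_vars(o, heart))
print(best, count_loop_vars(best, heart))     # one loop variable, with q last

On this stylized block the minimum is one loop variable, obtained by placing the equation for q at the end, which is exactly the kind of reordering discussed below.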
EViews determines the block structure of the model automatically (which is de facto optimal, even if other organizations exist). As for reordering inside the simultaneous blocks, it does not apply an optimization algorithm: it determines the loop variables associated with a given ordering (actually the initial one) and places the corresponding equations at the end of the block. The efficiency of this last action is questionable, as it means that in a given iteration all computations use the previous value of the loop variables, delaying somewhat the impact of "new" information.
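The mechanics can be illustrated on a stylized simultaneous block (a minimal sketch; equations and coefficients are purely illustrative, not those of our model):

def gauss_seidel(equations, y, max_iter=100, tol=1e-8):
    # one pass per iteration, each equation using the most recent values
    for _ in range(max_iter):
        largest_change = 0.0
        for name, f in equations:
            new = f(y)
            largest_change = max(largest_change, abs(new - y[name]))
            y[name] = new                    # immediately available downstream
        if largest_change < tol:
            break
    return y

g = 200.0
equations = [
    ("co", lambda y: 0.6 * y["q"]),
    ("i",  lambda y: 0.2 * y["q"]),
    ("fd", lambda y: y["co"] + y["i"] + g),
    ("m",  lambda y: 0.3 * y["fd"]),
    ("q",  lambda y: y["fd"] - y["m"]),      # the loop variable, placed last
]
start = {name: 0.0 for name, _ in equations}
print(round(gauss_seidel(equations, start)["q"], 1))   # converges to about 318.2

With the equation for q placed at the end of the block, the equations for co and i only see the value of q from the previous iteration, which is exactly the delay described above.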
For instance, in our model, we can reduce the number of loop variables by transferring the equation for Q to the end
of the heart:
84 And the associated model is probably quite poor.
[Incidence matrix of the reordered model: rows and columns in the order cap, prle_t, x (prologue), ur, ic, ci, i, led, le, lt, rhi, ih, co, fd, td, m, q (heart, with q moved to the end), res_x, res_m, k (epilogue); a 1 marks the unlagged presence of the column variable in the equation of the row variable.]
Now Q is the only loop variable.