Advanced Control Using Matlab

This document provides an overview of using MATLAB for computer-aided control system design. It discusses modeling dynamic systems using differential equations, discretizing continuous system models for digital analysis and design, and analyzing stability. The document covers modeling techniques including regressing experimental data, examples from chemical processes, and numerical tools in MATLAB for modeling tasks. The goal is to explain how to simulate and analyze both single-input/single-output and multi-input/multi-output control systems using a digital computer and MATLAB.
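As a taste of the discretisation workflow the book describes, the sketch below converts a continuous first-order model to its discrete equivalent under a zero-order hold. The book itself works in MATLAB (the `c2d` command from the Control System Toolbox); here SciPy's `cont2discrete` is used as a stand-in so the idea is runnable without MATLAB, and the plant and sample time are illustrative choices, not examples from the text.

```python
# A sketch of the discretisation step mentioned above. The book works in
# MATLAB (c2d); scipy.signal.cont2discrete is used here as a stand-in.
import numpy as np
from scipy.signal import cont2discrete

# Continuous first-order plant G(s) = 1/(s + 1), sampled with a zero-order hold
num, den = [1.0], [1.0, 1.0]
dt = 0.1  # sample time in seconds (an arbitrary illustrative choice)

numd, dend, _ = cont2discrete((num, den), dt, method='zoh')
numd = np.ravel(numd)  # scipy returns the numerator as a 2-D array

# Sampled-data theory predicts the discrete pole lands at exp(-dt)
print('G(z) numerator:  ', numd)
print('G(z) denominator:', dend)
```

For this plant the discrete pole is exp(-0.1) ≈ 0.905, which is what the zero-order-hold mapping z = e^{sT} of the continuous pole at s = -1 predicts.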


Advanced Control using
MATLAB

or Stabilising the
unstabilisable
David I. Wilson
Auckland University of Technology
New Zealand
April 12, 2013
Copyright © 2013 David I. Wilson
Auckland University of Technology
New Zealand

Creation date: April, 2013.

All rights reserved. No part of this work may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or
otherwise, without prior permission.
Contents

1 Introduction 1
1.1 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Matlab for computer aided control design . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Alternative computer design aids . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Economics of control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Laboratory equipment for control tests . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4.1 Plants with one input and one output . . . . . . . . . . . . . . . . . . . . . . 6
1.4.2 Multi-input and multi-output plants . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Slowing down Simulink . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2 From differential to difference equations 13


2.1 Computer in the loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.1 Sampling an analogue signal . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.2 Selecting a sample rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.3 The sampling theorem and aliases . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.4 Discrete frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Finite difference models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.1 Difference equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 The z transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.1 z-transforms of common functions . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4 Inversion of z-transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.4.1 Inverting z-transforms symbolically . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4.2 The partial fraction method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4.3 Long division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4.4 Computational approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4.5 Numerically inverting the Laplace transform . . . . . . . . . . . . . . . . . . 31
2.5 Discretising with a sample and hold . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.5.1 Converting Laplace transforms to z-transforms . . . . . . . . . . . . . . . . 37
2.5.2 The bilinear transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.6 Discrete root locus diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.7 Multivariable control and state space analysis . . . . . . . . . . . . . . . . . . . . . . 43
2.7.1 States and state space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.7.2 Converting differential equations to state-space form . . . . . . . . . . . . . 47
2.7.3 Interconverting between state space and transfer functions . . . . . . . . . . 50
2.7.4 Similarity transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.7.5 Interconverting between transfer functions forms . . . . . . . . . . . . . . . 55
2.7.6 The steady state . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.8 Solving the vector differential equation . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.8.1 Numerically computing the discrete transformation . . . . . . . . . . . . . . 61
2.8.2 Using MATLAB to discretise systems . . . . . . . . . . . . . . . . . . . . . . . 63
2.8.3 Time delay in state space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.9 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69


2.9.1 Stability in the continuous domain . . . . . . . . . . . . . . . . . . . . . . . . 70


2.9.2 Stability of the closed loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.9.3 Stability of discrete time systems . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.9.4 Stability of nonlinear differential equations . . . . . . . . . . . . . . . . . . . 74
2.9.5 Expressing matrix equations succinctly using Kronecker products . . . . . . 80
2.9.6 Summary of stability analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
2.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

3 Modelling dynamic systems with differential equations 85


3.1 Dynamic system models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.1.1 Steady state and dynamic models . . . . . . . . . . . . . . . . . . . . . . . . 86
3.2 A collection of illustrative models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.2.1 Simple models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.3 Chemical process models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.3.1 A continuously-stirred tank reactor . . . . . . . . . . . . . . . . . . . . . . . 92
3.3.2 A forced circulation evaporator . . . . . . . . . . . . . . . . . . . . . . . . . . 94
3.3.3 A binary distillation column model . . . . . . . . . . . . . . . . . . . . . . . 95
3.3.4 Interaction and the Relative Gain Array . . . . . . . . . . . . . . . . . . . . . 103
3.4 Regressing experimental data by curve fitting . . . . . . . . . . . . . . . . . . . . . . 107
3.4.1 Polynomial regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
3.4.2 Nonlinear least-squares model identification . . . . . . . . . . . . . . . . . . 112
3.4.3 Parameter confidence intervals . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3.5 Numerical tools for modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
3.5.1 Differential/Algebraic equation systems and algebraic loops . . . . . . . . . 123
3.6 Linearisation of nonlinear dynamic equations . . . . . . . . . . . . . . . . . . . . . . 125
3.6.1 Linearising a nonlinear tank model . . . . . . . . . . . . . . . . . . . . . . . 127
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

4 The PID controller 131


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.1.1 P, PI or PID control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.2 The industrial PID algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.2.1 Implementing the derivative component . . . . . . . . . . . . . . . . . . . . 133
4.2.2 Variations of the PID algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 134
4.2.3 Integral only control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
4.3 Simulating a PID process in Simulink . . . . . . . . . . . . . . . . . . . . . . . . . 135
4.4 Extensions to the PID algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
4.4.1 Avoiding derivative kick . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
4.4.2 Input saturation and integral windup . . . . . . . . . . . . . . . . . . . . . . 140
4.5 Discrete PID controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
4.5.1 Discretising continuous PID controllers . . . . . . . . . . . . . . . . . . . . . 144
4.5.2 Simulating a PID controlled response in Matlab . . . . . . . . . . . . . . . . 146
4.5.3 Controller performance as a function of sample time . . . . . . . . . . . . . 148
4.6 PID tuning methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
4.6.1 Open loop tuning methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
4.6.2 Closed loop tuning methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
4.6.3 Closed loop single-test tuning methods . . . . . . . . . . . . . . . . . . . . . 159
4.6.4 Summary on closed loop tuning schemes . . . . . . . . . . . . . . . . . . . . 166
4.7 Automated tuning by relay feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
4.7.1 Describing functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
4.7.2 An example of relay tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
4.7.3 Self-tuning with noise disturbances . . . . . . . . . . . . . . . . . . . . . . . 173
4.7.4 Modifications to the relay feedback estimation algorithm . . . . . . . . . . . 176
4.8 Drawbacks with PID controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

4.8.1 Inverse response processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181


4.8.2 Approximating inverse-response systems with additional deadtime . . . . 183
4.9 Dead time compensation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
4.10 Tuning and sensitivity of control loops . . . . . . . . . . . . . . . . . . . . . . . . . . 187
4.11 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

5 Digital filtering and smoothing 193


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
5.1.1 The nature of industrial noise . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
5.1.2 Differentiating without smoothing . . . . . . . . . . . . . . . . . . . . . . . . 196
5.2 Smoothing measured data using analogue filters . . . . . . . . . . . . . . . . . . . . 197
5.2.1 A smoothing application to find the peaks and troughs . . . . . . . . . . . . 197
5.2.2 Filter types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
5.2.3 Classical analogue filter families . . . . . . . . . . . . . . . . . . . . . . . . . 201
5.3 Discrete filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
5.3.1 A low-pass filtering application . . . . . . . . . . . . . . . . . . . . . . . . . . 209
5.3.2 Digital filter approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
5.3.3 Efficient hardware implementation of discrete filters . . . . . . . . . . . . . 214
5.3.4 Numerical and quantisation effects for high-order filters . . . . . . . . . . . 217
5.4 The Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
5.4.1 Fourier transform definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
5.4.2 Orthogonality and frequency spotting . . . . . . . . . . . . . . . . . . . . . . 224
5.4.3 Using MATLAB’s FFT function . . . . . . . . . . . . . . . . . . . . . . . . . . 225
5.4.4 Periodogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
5.4.5 Fourier smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
5.5 Numerically differentiating industrial data . . . . . . . . . . . . . . . . . . . . . . . 230
5.5.1 Establishing feedrates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

6 Identification of process models 235


6.1 The importance of system identification . . . . . . . . . . . . . . . . . . . . . . . . . 235
6.1.1 Basic definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
6.1.2 Black, white and grey box models . . . . . . . . . . . . . . . . . . . . . . . . 237
6.1.3 Techniques for identification . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
6.2 Graphical and non-parametric model identification . . . . . . . . . . . . . . . . . . 239
6.2.1 Time domain identification using graphical techniques . . . . . . . . . . . . 239
6.2.2 Experimental frequency response analysis . . . . . . . . . . . . . . . . . . . 246
6.2.3 An alternative empirical transfer function estimate . . . . . . . . . . . . . . 253
6.3 Continuous model identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
6.3.1 Fitting transfer functions using nonlinear least-squares . . . . . . . . . . . . 254
6.3.2 Identification using derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . 256
6.3.3 Practical continuous model identification . . . . . . . . . . . . . . . . . . . . 258
6.4 Popular discrete-time linear models . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
6.4.1 Extending the linear model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
6.4.2 Output error model structures . . . . . . . . . . . . . . . . . . . . . . . . . . 264
6.4.3 General input/output models . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
6.5 Regressing discrete model parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 266
6.5.1 Simple offline system identification routines . . . . . . . . . . . . . . . . . . 268
6.5.2 Bias in the parameter estimates . . . . . . . . . . . . . . . . . . . . . . . . . . 269
6.5.3 Using the System Identification toolbox . . . . . . . . . . . . . . . . . . . . . 270
6.5.4 Fitting parameters to state space models . . . . . . . . . . . . . . . . . . . . 274
6.6 Model structure determination and validation . . . . . . . . . . . . . . . . . . . . . 276
6.6.1 Estimating model order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
6.6.2 Robust model fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280

6.6.3 Common nonlinear model structures . . . . . . . . . . . . . . . . . . . . . . 280


6.7 Online model identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
6.7.1 Recursive least squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
6.7.2 Recursive least-squares in MATLAB . . . . . . . . . . . . . . . . . . . . . . . 286
6.7.3 Tracking the precision of the estimates . . . . . . . . . . . . . . . . . . . . . . 290
6.8 The forgetting factor and covariance windup . . . . . . . . . . . . . . . . . . . . . . 292
6.8.1 The influence of the forgetting factor . . . . . . . . . . . . . . . . . . . . . . . 294
6.8.2 Covariance wind-up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
6.9 Identification by parameter optimisation . . . . . . . . . . . . . . . . . . . . . . . . . 296
6.10 Online estimating of noise models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
6.10.1 A recursive extended least-squares example . . . . . . . . . . . . . . . . . . 302
6.10.2 Recursive identification using the SI toolbox . . . . . . . . . . . . . . . . . . 305
6.10.3 Simplified RLS algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
6.11 Closed loop identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
6.11.1 Closed loop RLS in Simulink . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
6.12 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309

7 Adaptive Control 317


7.1 Why adapt? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
7.1.1 The adaption scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
7.1.2 Classification of adaptive controllers . . . . . . . . . . . . . . . . . . . . . . . 319
7.2 Gain scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
7.3 The importance of identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
7.3.1 Polynomial manipulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
7.4 Self tuning regulators (STRs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
7.4.1 Simple minimum variance control . . . . . . . . . . . . . . . . . . . . . . . . 323
7.5 Adaptive pole-placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
7.5.1 The Diophantine equation and the closed loop . . . . . . . . . . . . . . . . . 326
7.5.2 Solving the Diophantine equation in Matlab . . . . . . . . . . . . . . . . . . 327
7.5.3 Adaptive pole-placement with identification . . . . . . . . . . . . . . . . . . 330
7.6 Practical adaptive pole-placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
7.6.1 Dealing with non-minimum phase systems . . . . . . . . . . . . . . . . . . . 335
7.6.2 Separating stable and unstable factors . . . . . . . . . . . . . . . . . . . . . . 338
7.6.3 Experimental adaptive pole-placement . . . . . . . . . . . . . . . . . . . . . 340
7.6.4 Minimum variance control with dead time . . . . . . . . . . . . . . . . . . . 341
7.7 Summary of adaptive control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

8 Multivariable controller design 351


8.1 Controllability and observability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
8.1.1 Controllability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
8.1.2 Observability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
8.1.3 Computing controllability and observability . . . . . . . . . . . . . . . . . . 355
8.1.4 State reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
8.2 State space pole-placement controller design . . . . . . . . . . . . . . . . . . . . . . 359
8.2.1 Poles and where to place them . . . . . . . . . . . . . . . . . . . . . . . . . . 362
8.2.2 Deadbeat control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
8.3 Estimating the unmeasured states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
8.4 Combining estimation and state feedback . . . . . . . . . . . . . . . . . . . . . . . . 367
8.5 Generic model control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
8.5.1 The tuning parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
8.5.2 GMC control of a linear model . . . . . . . . . . . . . . . . . . . . . . . . . . 373
8.5.3 GMC applied to a nonlinear plant . . . . . . . . . . . . . . . . . . . . . . . . 375
8.6 Exact feedback linearisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
8.6.1 The nonlinear system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380

8.6.2 The input/output feedback linearisation control law . . . . . . . . . . . . . 380


8.6.3 Exact feedback example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
8.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386

9 Classical optimal control 387


9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
9.2 Parametric optimisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
9.2.1 Choosing a performance indicator . . . . . . . . . . . . . . . . . . . . . . . . 388
9.2.2 Optimal tuning of a PID regulator . . . . . . . . . . . . . . . . . . . . . . . . 389
9.2.3 Using Simulink inside an optimiser . . . . . . . . . . . . . . . . . . . . . . . . 395
9.2.4 An optimal batch reactor temperature policy . . . . . . . . . . . . . . . . . . 396
9.3 The general optimal control problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
9.3.1 The optimal control formulation . . . . . . . . . . . . . . . . . . . . . . . . . 399
9.3.2 The two-point boundary problem . . . . . . . . . . . . . . . . . . . . . . . . 401
9.3.3 Optimal control examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
9.3.4 Problems with a specified target set . . . . . . . . . . . . . . . . . . . . . . . 406
9.4 Linear quadratic control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
9.4.1 Continuous linear quadratic regulators . . . . . . . . . . . . . . . . . . . . . 409
9.4.2 Analytical solution to the LQR problem . . . . . . . . . . . . . . . . . . . . . 411
9.4.3 The steady-state solution to the matrix Riccati equation . . . . . . . . . . . . 415
9.4.4 The discrete LQR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
9.4.5 A numerical validation of the optimality of LQR . . . . . . . . . . . . . . . . 423
9.4.6 An LQR with integral states . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
9.5 Estimation of state variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
9.5.1 Random processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
9.5.2 Combining deterministic and stochastic processes . . . . . . . . . . . . . . . 437
9.5.3 The Kalman filter estimation scheme . . . . . . . . . . . . . . . . . . . . . . . 438
9.5.4 The steady-state form of the Kalman filter . . . . . . . . . . . . . . . . . . . . 442
9.5.5 Current and future prediction forms . . . . . . . . . . . . . . . . . . . . . . . 443
9.5.6 An application of the Kalman filter . . . . . . . . . . . . . . . . . . . . . . . . 447
9.5.7 The role of the Q and R noise covariance matrices in the state estimator . . 448
9.5.8 Extensions to the basic Kalman filter algorithm . . . . . . . . . . . . . . . . . 452
9.5.9 The Extended Kalman Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
9.5.10 Combining state estimation and state feedback . . . . . . . . . . . . . . . . . 457
9.5.11 Optimal control using only measured outputs . . . . . . . . . . . . . . . . . 457
9.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459

10 Predictive control 461


10.1 Model predictive control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
10.1.1 Constrained predictive control . . . . . . . . . . . . . . . . . . . . . . . . . . 464
10.1.2 Dynamic matrix control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
10.2 A Model Predictive Control Toolbox . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
10.2.1 A model predictive control GUI . . . . . . . . . . . . . . . . . . . . . . . . . 474
10.2.2 MPC toolbox in MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
10.2.3 Using the MPC toolbox in Simulink . . . . . . . . . . . . . . . . . . . . . . . 476
10.2.4 Further readings on MPC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
10.3 Optimal control using linear programming . . . . . . . . . . . . . . . . . . . . . . . 478
10.3.1 Development of the LP problem . . . . . . . . . . . . . . . . . . . . . . . . . 479

11 Expert systems and neural networks 487


11.1 Expert systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487
11.1.1 Where are they used? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
11.1.2 Features of an expert system . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
11.1.3 The user interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
11.1.4 Expert systems used in process control . . . . . . . . . . . . . . . . . . . . . 490
11.2 Neural networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
11.2.1 The architecture of the neural network . . . . . . . . . . . . . . . . . . . . . . 495
11.2.2 Curve fitting using neural networks . . . . . . . . . . . . . . . . . . . . . . . 499
11.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503

A List of symbols 505

B Useful utility functions in Matlab 507

C Transform pairs 509

D A comparison of Maple and MuPad 511


D.1 Partial fractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
D.2 Integral transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
D.3 Differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
D.4 Vectors and matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513

E Useful test models 515


E.1 A forced circulation evaporator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515
E.2 Aircraft model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
List of Figures

1.1 Traditional vs. Advanced control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5


1.2 Economic improvements of better control . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Blackbox configuration. The manual switch marked will toggle between either 7
or 9 low-pass filters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 The “Black-box” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Balance arm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.6 The flapper wiring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.7 Flapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.8 Helicopter plant with 2 degrees of freedom. See also Fig. 1.9(a). . . . . . . . . . . . 10
1.9 Helicopter control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.10 Helicopter flying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.11 Real-time Simulink simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.1 The computer in the control loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14


2.2 3 bit sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 A time series with unknown frequency components . . . . . . . . . . . . . . . . . . 18
2.4 The frequency component of a sampled signal . . . . . . . . . . . . . . . . . . . . . 18
2.5 Frequency aliases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.6 The Scarlet Letter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.7 Hénon’s attractor in Simulink . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.8 Hénon’s attractor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.9 Inverting z-transforms using dimpulse . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.10 Numerically inverting the Laplace transform using the Bromwich integral . . . . . 33
2.11 Numerically inverting the Laplace transform . . . . . . . . . . . . . . . . . . . . . . 33
2.12 Numerically inverting Laplace transforms . . . . . . . . . . . . . . . . . . . . . . . . 34
2.13 Ideal sampler and zeroth-order hold . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.14 Zeroth-order hold effects on the discrete Bode diagram . . . . . . . . . . . . . . . . 41
2.15 Bode plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.16 The discrete root locus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.17 Various discrete closed loop responses . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.18 A binary distillation column with multiple inputs and multiple outputs . . . . . . 45
2.19 A block diagram of a state-space dynamic system, (a) continuous system: ẋ =
Ax + Bu, and (b) discrete system: x_{k+1} = Φx_k + ∆u_k. (See also Fig. 2.20.) . . . 47
2.20 A complete block diagram of a state-space dynamic system with output and direct
measurement feed-through, Eqn. 2.41. . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.21 Unsteady and steady states for level systems . . . . . . . . . . . . . . . . . . . . . . 57
2.22 Submarine step response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.23 Issues in assessing system stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.24 Nyquist diagram of Eqn. 2.94 in (a) three dimensions and (b) as typically presented
in two dimensions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.25 Liapunov (1857–1918) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.26 Regions of stability for the poles of continuous (left) and discrete (right) systems . 82


3.1 A stable and unstable pendulum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87


3.2 Simple buffer tank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.3 The UK growth based on the GDP . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.4 A CSTR reactor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.5 A forced circulation evaporator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
3.6 Schematic of a distillation column . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.7 Wood-Berry step response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
3.8 Wood-Berry column in Simulink . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
3.9 Distillation tower . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.10 Sparsity of the distillation column model . . . . . . . . . . . . . . . . . . . . . . . . 101
3.11 Open loop distillation column control . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.12 Distillation column control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
3.13 Distillation column control (in detail) . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
3.14 Distillation interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
3.15 Dynamic RGA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.16 Dynamic RGA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.17 Density of Air . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.18 Fitting a high-order polynomial to some physical data . . . . . . . . . . . . . . . . . 111
3.19 A bio-chemical reaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.20 Model of compressed water . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
3.21 Experimental pressure-rate data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
3.22 Parameter confidence regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
3.23 Linear and nonlinear trajectories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
3.24 Linearising a nonlinear tank model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

4.1 Comparing PI and integral-only control for the real-time control of a noisy flapper
plant with sampling time T = 0.08. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
4.2 PID simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
4.3 PID internals in Simulink . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
4.4 Block diagram of PID controllers as implemented in Simulink (left) and classical
textbooks (right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
4.5 Realisable PID controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
4.6 PID controller in Simulink . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
4.7 PID controller with anti-derivative kick. . . . . . . . . . . . . . . . . . . . . . . . . . 139
4.8 Avoiding derivative kick . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
4.9 Illustrating the improvement of anti-derivative kick schemes for PID controllers
when applied to the experimental electromagnetic balance. . . . . . . . . . . . . . . 140
4.10 Derivative control and noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
4.11 Anti-windup comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
4.12 Discrete PID controller in Simulink . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
4.13 Headbox control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.14 Headbox controlled response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
4.15 A PID controlled process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
4.16 Sample time and discrete PID control . . . . . . . . . . . . . . . . . . . . . . . . . . 149
4.17 The parameters T and L to be graphically estimated for the openloop tuning method
relations given in Table 4.2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
4.18 Cohen-Coon model fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
4.19 Cohen-Coon tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
4.20 PID tuning using a GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
4.21 Solving for the ultimate frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
4.22 Ziegler-Nichols tuned responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
4.23 Typical response of a stable system to a P-controller. . . . . . . . . . . . . . . . . . . 160
4.24 A Yuwana-Seborg closed loop step test . . . . . . . . . . . . . . . . . . . . . . . . . . 163
4.25 Closed loop responses using the YS scheme . . . . . . . . . . . . . . . . . . . . . . . 164
4.26 A self-tuning PID controlled process . . . . . . . . . . . . . . . . . . . . . . . . . . . 167


4.27 A process under relay tuning with the PID regulator disabled. . . . . . . . . . . . . 167
4.28 An unknown plant under relay feedback exhibits an oscillation . . . . . . . . . . . 169
4.29 Nyquist & Bode diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
4.30 PID Relay tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
4.31 Relay tuning with noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
4.32 Relay tuning of the blackbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
4.33 Relay tuning results of the blackbox . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
4.34 A relay with hysteresis width h and output amplitude d. . . . . . . . . . . . . . . . 177
4.35 Relay feedback with hysteresis width h. . . . . . . . . . . . . . . . . . . . . . . . . . 178
4.36 Relay feedback with hysteresis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
4.37 Relay feedback with an integrator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
4.38 2-point Relay identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
4.39 The J curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
4.40 An inverse response process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
4.41 A NMP plant controlled with a PI controller . . . . . . . . . . . . . . . . . . . . . . 183
4.42 Approximating inverse-response systems with additional deadtime . . . . . . . . . 184
4.43 The Smith predictor structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
4.44 The Smith predictor structure from Fig. 4.43 assuming no model/plant mis-match. 186
4.45 Smith predictor in Simulink . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
4.46 Dead time compensation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
4.47 Deadtime compensation applied to the blackbox . . . . . . . . . . . . . . . . . . . . 188
4.48 Closed loop with plant G(s) and controller C(s) subjected to disturbances and
measurement noise. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
4.49 Sensitivity transfer functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
4.50 Sensitivity robustness measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

5.1 A filter as a transfer function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193


5.2 A noisy measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
5.3 Noise added to a true, but unknown, signal . . . . . . . . . . . . . . . . . . . . . . . 195
5.4 Derivative action given noisy data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
5.5 Smoothing industrial data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
5.6 Low-pass filter specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
5.7 Three single low-pass filters cascaded together to make a third-order filter. . . . . . 199
5.8 Amplitude response for ideal, low-pass, high pass and band-pass filters. . . . . . . 200
5.9 Analogue Butterworth filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
5.10 Analogue Chebyshev filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
5.11 Butterworth and Chebyshev filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
5.12 Using a Butterworth filter to smooth noisy data . . . . . . . . . . . . . . . . . . . . . 211
5.13 The frequency response for Butterworth filters . . . . . . . . . . . . . . . . . . . . . 212
5.14 Various Butterworth filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
5.15 Advantages of frequency pre-warping . . . . . . . . . . . . . . . . . . . . . . . . . . 214
5.16 Hardware difference equation in Direct Form I . . . . . . . . . . . . . . . . . . . . . 215
5.17 An IIR filter with a minimal number of delays, Direct Form II . . . . . . . . . . . . 216
5.18 Cascaded second-order sections to realise a high-order filter. See also Fig. 5.19. . . 217
5.19 A second-order section (SOS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
5.20 Comparing single precision second-order sections with filters in direct form II
transposed form. Note that the direct form II filter is actually unstable when run
in single precision. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
5.21 Approximating square waves with sine waves . . . . . . . . . . . . . . . . . . . . . 223
5.22 The Fourier approximation to a square wave . . . . . . . . . . . . . . . . . . . . . . 223
5.23 Two signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
5.24 Critical radio frequencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
5.25 Power spectrum for a signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
5.26 Smoothing by Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230


5.27 Differentiating and smoothing noisy measurement . . . . . . . . . . . . . . . . . . . 232
5.28 Filtering industrial data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

6.1 The prediction problem: Given a model and the input, u, can we predict the
output, y? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
6.2 A good model, M, duplicates the behaviour of the true plant, S. . . . . . . . . . . . 237
6.3 An experimental setup for input/output identification. We log both the input and
the response data to a computer for further processing. . . . . . . . . . . . . . . . . 238
6.4 Typical open loop step tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
6.5 Areas method for model identification . . . . . . . . . . . . . . . . . . . . . . . . . . 241
6.6 Examples of the Areas method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
6.7 Identification of the Blackbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
6.8 Balance arm step test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
6.9 Random signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
6.10 A 5-element binary shift register to generate a pseudo-random binary sequence. . 245
6.11 Pseudo-random binary sequence generator in Simulink . . . . . . . . . . . . . . . . 245
6.12 Black box experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
6.13 Black box response analysis using a series of sine waves . . . . . . . . . . . . . . . . 247
6.14 Black box response using an input chirp signal. . . . . . . . . . . . . . . . . . . . . . 249
6.15 Black box frequency response analysis using a chirp signal. . . . . . . . . . . . . . . 249
6.16 Flapper response to a chirp signal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
6.17 Experimental setup to subject a random input into an unknown plant. The
input/output data was collected, processed through Listing 6.2 to give the frequency
response shown in Fig. 6.18. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
6.18 The experimental frequency response compared to the true analytical Bode
diagram. See the routine in Listing 6.2. . . . . . . . . . . . . . . . . . . . . . . . . . . 251
6.19 Black box response given a pseudo-random input sequence. . . . . . . . . . . . . . 252
6.20 Black box frequency response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
6.21 Empirical transfer function estimate . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
6.22 Experimental data from a continuous plant . . . . . . . . . . . . . . . . . . . . . . . 255
6.23 A continuous-time model fitted to input/output data . . . . . . . . . . . . . . . . . 256
6.24 Continuous model identification strategy . . . . . . . . . . . . . . . . . . . . . . . . 257
6.25 Continuous model identification simulation . . . . . . . . . . . . . . . . . . . . . . . 259
6.26 Continuous model identification of the blackbox . . . . . . . . . . . . . . . . . . . . 260
6.27 Identification using Laguerre functions . . . . . . . . . . . . . . . . . . . . . . . . . 261
6.28 A signal flow diagram of an auto-regressive model with exogenous input or ARX
model. Compare this structure with the similar output-error model in Fig. 6.30. . . 262
6.29 A signal flow diagram of an ARMAX model. Note that the only difference between
this and the ARX model in Fig. 6.28 is the inclusion of the C polynomial filtering
the noise term. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
6.30 A signal flow diagram of an output-error model. Compare this structure with the
similar ARX model in Fig. 6.28. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
6.31 A general input/output model structure . . . . . . . . . . . . . . . . . . . . . . . . . 265
6.32 ARX estimation exhibiting bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
6.33 Offline system identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
6.34 Identification of deadtime from the step response . . . . . . . . . . . . . . . . . . . 278
6.35 Deadtime estimation at fast sampling . . . . . . . . . . . . . . . . . . . . . . . . . . 278
6.36 Deadtime estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
6.37 Blackbox step model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
6.38 Hammerstein and Wiener model structures . . . . . . . . . . . . . . . . . . . . . . . 281
6.39 Ideal RLS parameter estimation. (See also Fig. 6.41(a).) . . . . . . . . . . . . . . . . 286
6.40 Recursive least squares estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
6.41 RLS under Simulink . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
6.42 RLS under Simulink (version 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289


6.43 Estimation of a two parameter plant . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
6.44 Confidence limits for estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
6.45 RLS and an abrupt plant change . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
6.46 The memory when using a forgetting factor . . . . . . . . . . . . . . . . . . . . . . . 293
6.47 Identification using various forgetting factors . . . . . . . . . . . . . . . . . . . . . . 294
6.48 Covariance wind-up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
6.49 The MIT rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
6.50 Optimising the adaptation gain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
6.51 Addition of coloured noise to a dynamic process. See also Fig. 6.31. . . . . . . . . . 301
6.52 RLS with coloured noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
6.53 Nonlinear parameter estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
6.54 Recursive extended least-squares estimation . . . . . . . . . . . . . . . . . . . . . . 305
6.55 A simplified recursive least squares algorithm . . . . . . . . . . . . . . . . . . . . . 307
6.56 Furnace input/output data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
6.57 Closed loop estimation using Simulink . . . . . . . . . . . . . . . . . . . . . . . . . . 309
6.58 RLS in Simulink . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

7.1 The structure of an indirect adaptive controller . . . . . . . . . . . . . . . . . . . . . 319


7.2 Varying process gain of a spherical tank . . . . . . . . . . . . . . . . . . . . . . . . . 320
7.3 Simple minimum variance control . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
7.4 Simple minimum variance control (zoomed) . . . . . . . . . . . . . . . . . . . . . . 325
7.5 Adaptive pole-placement control structure . . . . . . . . . . . . . . . . . . . . . . . 325
7.6 Adaptive pole-placement control structure with RLS identification . . . . . . . . . 331
7.7 Control of multiple plants with an adapting controller. We desire the same closed
loop response irrespective of the choice of plant. . . . . . . . . . . . . . . . . . . . . 332
7.8 Three open loop plants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
7.9 Desired closed loop response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
7.10 Adaptive pole-placement with identification . . . . . . . . . . . . . . . . . . . . . . 334
7.11 Comparing the adaptive pole-placement with the reference trajectory . . . . . . . . 335
7.12 Bursting in adaptive control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
7.13 A plant with poorly damped zeros . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
7.14 Adaptive pole-placement with an unstable B . . . . . . . . . . . . . . . . . . . . . . 339
7.15 Pole-zero map of an adaptive pole-placement . . . . . . . . . . . . . . . . . . . . . . 340
7.16 Areas of well-damped poles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
7.17 Adaptive pole-placement of the black-box . . . . . . . . . . . . . . . . . . . . . . . . 342
7.18 Adaptive pole-placement of the black-box . . . . . . . . . . . . . . . . . . . . . . . . 343
7.19 A non-minimum phase plant with an unstable zero which causes an inverse step
response. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
7.20 Moving average control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346

8.1 Reconstructing states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360


8.2 Pole-placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
8.3 Deadbeat control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
8.4 Simultaneous control and state estimation . . . . . . . . . . . . . . . . . . . . . . . . 368
8.5 Control and estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
8.6 GMC tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
8.7 GMC tuning characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
8.8 Linear GMC response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
8.9 Linear GMC comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
8.10 GMC CSTR control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
8.11 A CSTR phase plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
8.12 The configuration of an input/output feedback linearisation control law . . . . . . 381
8.13 Exact feedback linearisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
9.1 IAE areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389


9.2 ITAE breakdown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
9.3 Optimal responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
9.4 Optimal PID tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
9.5 Optimum PI tuning of the blackbox plant . . . . . . . . . . . . . . . . . . . . . . . . 394
9.6 Simulink model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
9.7 Production of a valuable chemical in a batch reactor. . . . . . . . . . . . . . . . . . . 396
9.8 Temperature profile optimisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
9.9 Temperature profile optimisation using 3 temperatures . . . . . . . . . . . . . . . . 398
9.10 Optimum temperature profile comparison for different number of temperatures . 398
9.11 Optimal control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
9.12 Optimal control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
9.13 Optimal control with targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
9.14 Steady-state and time-varying LQR control . . . . . . . . . . . . . . . . . . . . . . . 414
9.15 Steady-state continuous LQR controller . . . . . . . . . . . . . . . . . . . . . . . . . 417
9.16 Comparing discrete and continuous LQR controllers . . . . . . . . . . . . . . . . . . 420
9.17 LQR control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
9.18 Pole-placement and LQR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
9.19 Pole-placement and LQR showing the input . . . . . . . . . . . . . . . . . . . . . . 425
9.20 Trial pole locations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
9.21 Trial pole-placement performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
9.22 State feedback control system with an integral output state . . . . . . . . . . . . . . 428
9.23 State feedback with integral states . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
9.24 Black box servo control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
9.25 A state-based estimation and control scheme . . . . . . . . . . . . . . . . . . . . . . 431
9.26 PDF of a random variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
9.27 Correlated noisy x, y data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
9.28 2D ellipse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
9.29 Kalman filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
9.30 A block diagram of a steady-state prediction-type Kalman filter applied to a linear
discrete plant. Compare with the alternative form in Fig. 9.31. . . . . . . . . . . . . 444
9.31 A block diagram of a steady-state current estimator-type Kalman filter applied to
a linear discrete plant. Compare with the alternative prediction form in Fig. 9.30. . 445
9.32 Kalman filter demonstration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
9.33 The performance of a Kalman filter for different q/r ratios . . . . . . . . . . . . . . 451
9.34 A random walk process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456
9.35 LQG in Simulink . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458

10.1 Horizons used in model predictive control . . . . . . . . . . . . . . . . . . . . . . . 462


10.2 Predictions of the Reserve Bank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
10.3 Inverse plant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
10.4 Predictive control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
10.5 Acausal response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
10.6 Varying the horizons of predictive control . . . . . . . . . . . . . . . . . . . . . . . . 468
10.7 MPC on the blackbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
10.8 Step response coefficients, gi, for a stable system. . . . . . . . . . . . . . . . . . . . 469
10.9 DMC control details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
10.10 DMC control details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
10.11 DMC control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
10.12 Adaptive DMC of the blackbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
10.13 An MPC graphical user interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
10.14 Multivariable MPC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
10.15 Simulink and MPC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
10.16 MPC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
10.17 LP constraint matrix dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
10.18 LP optimal control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484


10.19 LP optimal control with active constraints . . . . . . . . . . . . . . . . . . . . . . . 485
10.20 Non-square LP optimal control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
10.21 LP optimal control showing acausal behaviour . . . . . . . . . . . . . . . . . . . . 486

11.1 Possible neuron activation functions. . . . . . . . . . . . . . . . . . . . . . . . . . . 495


11.2 A single neural processing unit with multiple inputs . . . . . . . . . . . . . . . . . . 496
11.3 Single layer feedforward neural network . . . . . . . . . . . . . . . . . . . . . . . . 497
11.4 A 3 layer fully interconnected feedforward neural network . . . . . . . . . . . . . . 497
11.5 Single layer network with feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
11.6 An unknown input/output function . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
11.7 Fitting a Neural-Network to an unknown input/output function . . . . . . . . . . 501
11.8 Tide predictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
List of Tables

1.1 Computer aids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2.1 Final and initial value theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24


2.2 Inverting a z transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3 Laplace transform pairs used for testing . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.1 Standard nomenclature used in modelling dynamic systems . . . . . . . . . . . . . 90


3.2 Parameters of the CSTR model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.3 The important variables in the forced circulation evaporator . . . . . . . . . . . . . 94
3.4 Compressed water . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.5 The parameter values for the CSTR model . . . . . . . . . . . . . . . . . . . . . . . . 122
3.6 The initial state and manipulated variables for the CSTR simulation . . . . . . . . . 123

4.1 Alternative PID tuning parameter conventions . . . . . . . . . . . . . . . . . . . . . 132


4.2 Ziegler-Nichols open-loop PID tuning rules . . . . . . . . . . . . . . . . . . . . . . . 152
4.3 PID controller settings based on IMC for a small selection of common plants where
the control engineer gets to choose a desired closed loop time constant, τc. . . . . . . 154
4.4 Various alternative ‘Ziegler-Nichols’ type PID tuning rules as a function of the
ultimate gain, Ku, and ultimate period, Pu. . . . . . . . . . . . . . . . . . . . . . . . 155
4.5 Closed-loop single-test PID design rules . . . . . . . . . . . . . . . . . . . . . . . . . 161
4.6 Relay based PID tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

5.1 Filter transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

6.1 Experimentally determined frequency response of the blackbox . . . . . . . . . . . 248


6.2 Identification in state-space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276

8.1 The relationship between regulation and estimation . . . . . . . . . . . . . . . . . . 367


8.2 Litchfield nonlinear CSTR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376

9.1 Common integral performance indices . . . . . . . . . . . . . . . . . . . . . . . . . . 391

11.1 Comparing expert systems and neural networks . . . . . . . . . . . . . . . . . . . . 494

Listings

2.1 Symbolic Laplace to z-transform conversion . . . . . . . . . . . . . . . . . . . . . . 37


2.2 Symbolic Laplace to z-transform conversion with ZOH . . . . . . . . . . . . . . . . 37
2.3 Extracting the gain, time constants and numerator time constants from an arbitrary
transfer function format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.4 Submarine simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.5 Example of the Routh array using the symbolic toolbox . . . . . . . . . . . . . . . . 71
2.6 Solve the continuous matrix Lyapunov equation using Kronecker products . . . . 77
2.7 Solve the matrix Lyapunov equation using the lyap routine . . . . . . . . . . . . . 78
2.8 Solve the discrete matrix Lyapunov equation using Kronecker products . . . . . . 79
3.1 Computing the dynamic relative gain array analytically . . . . . . . . . . . . . . . . 105
3.2 Computing the dynamic relative gain array numerically as a function of ω. See
also Listing 3.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.3 Curve fitting using polynomial least-squares . . . . . . . . . . . . . . . . . . . . . . 109
3.4 Polynomial least-squares using singular value decomposition. This routine
follows from, and provides an alternative to, Listing 3.3. . . . . . . . . . . . . . . . . 111
3.5 Curve fitting using a generic nonlinear optimiser . . . . . . . . . . . . . . . . . . . . 113
3.6 Curve fitting using the OPTI optimisation toolbox. (Compare with Listing 3.5.) . . 113
3.7 Fitting water density as a function of temperature and pressure . . . . . . . . . . . 115
3.8 Parameter confidence limits for a nonlinear reaction rate model . . . . . . . . . . . 118
3.9 Comparing the dynamic response of a pendulum to the linear approximation . . . 121
3.10 Using linmod to linearise an arbitrary S IMULINK module. . . . . . . . . . . . . . . 127
4.1 Constructing a transfer function of a PID controller . . . . . . . . . . . . . . . . . . 133
4.2 Constructing a discrete (filtered) PID controller . . . . . . . . . . . . . . . . . . . . . 145
4.3 A simple PID controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.4 Ziegler-Nichols PID tuning rules for an arbitrary transfer function . . . . . . . . . . 159
4.5 Identifies the characteristic points for the Yuwana-Seborg PID tuner from a trial
closed loop response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
4.6 Compute the closed loop model from peak and trough data . . . . . . . . . . . . . 165
4.7 Compute the ultimate gain and frequency from the closed loop model parameters. 165
4.8 Compute the open loop model, Gm , Eqn. 4.31. . . . . . . . . . . . . . . . . . . . . . 165
4.9 Compute appropriate PI or PID tuning constants based on a plant model, Gm ,
using the IMC schemes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
4.10 Calculates the period and amplitude of a sinusoidal time series using least-squares. 174
5.1 Designing Butterworth Filters using Eqn. 5.4. . . . . . . . . . . . . . . . . . . . . . . 203
5.2 Designing a low-pass Butterworth filter with a cut-off frequency of fc = 800 Hz. . 203
5.3 Designing a high-pass Butterworth filter with a cut-off frequency of fc = 800 Hz. . 203
5.4 Designing Chebyshev Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
5.5 Computing a Chebyshev Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
5.6 Converting a 7th-order Butterworth filter to 4 second-order sections . . . . . . . . . 218
5.7 Comparing DFII and SOS digital filters in single precision. . . . . . . . . . . . . . . 219
5.8 Routine to compute the power spectral density plot of a time series . . . . . . . . . 226


5.9 Smoothing and differentiating a noisy signal . . . . . . . . . . . . . . . . . . . . . . 232


6.1 Identification of a first-order plant with deadtime from an openloop step response
using the Areas method from Algorithm 6.1. . . . . . . . . . . . . . . . . . . . . . . 241
6.2 Frequency response identification of an unknown plant directly from input/out-
put data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
6.3 Non-parametric frequency response identification using etfe. . . . . . . . . . . . . 253
6.4 Function to generate output predictions given a trial model and input data. . . . . 255
6.5 Optimising the model parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
6.6 Validating the fitted model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
6.7 Continuous model identification of a non-minimum phase system . . . . . . . . . . 258
6.8 Generate some input/output data for model identification . . . . . . . . . . . . . . 268
6.9 Estimate an ARX model from an input/output data series using least-squares . . . 269
6.10 An alternative way to construct the data matrix for ARX estimation using Toeplitz
matrices. See also Listing 6.9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
6.11 Offline system identification using arx from the System Identification Toolbox . . 271
6.12 Offline system identification with no model/plant mismatch . . . . . . . . . . . . . 271
6.13 Demonstrate the fitting of an AR model. . . . . . . . . . . . . . . . . . . . . . . . . . 272
6.14 Create an input/output sequence from an output-error plant. . . . . . . . . . . . . 273
6.15 Parameter identification of an output error process using oe and arx. . . . . . . . 273
6.16 A basic recursive least-squares (RLS) update (without forgetting factor) . . . . . . . 284
6.17 Tests the RLS identification scheme using Listing 6.16. . . . . . . . . . . . . . . . . . 286
6.18 A recursive least-squares (RLS) update with a forgetting factor. (See also List-
ing 6.16.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
6.19 Adaption of the plant gain using steepest descent . . . . . . . . . . . . . . . . . . . 299
6.20 Create an ARMAX process and generate some input/output data suitable for sub-
sequent identification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
6.21 Identify an ARMAX process from the data generated in Listing 6.20. . . . . . . . . 303
6.22 Recursively identify an ARMAX process. . . . . . . . . . . . . . . . . . . . . . . . . 303
6.23 Kaczmarz’s algorithm for identification . . . . . . . . . . . . . . . . . . . . . . . . . 306
7.1 Simple minimum variance control where the plant has no time delay . . . . . . . . 323
7.2 A Diophantine routine to solve F A + BG = T for the polynomials F and G. . . . . 328
7.3 Alternative Diophantine routine to solve F A + BG = T for the polynomials F and
G. Compare with Listing 7.2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
7.4 Constructing polynomials for the Diophantine equation example . . . . . . . . . . 329
7.5 Solving the Diophantine equation using polynomials generated from Listing 7.4. . 330
7.6 Adaptive pole-placement control with 3 different plants . . . . . . . . . . . . . . . . 333
7.7 The pole-placement control law when H = 1/B . . . . . . . . . . . . . . . . . . . . 337
7.8 Factorising a polynomial B(q) into stable, B + (q) and unstable and poorly damped,
B − (q) factors such that B = B + B − and B + is defined as monic. . . . . . . . . . . . 338
7.9 Minimum variance control design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
8.1 A simple state reconstructor following Algorithm 8.1. . . . . . . . . . . . . . . . . . 359
8.2 Pole-placement control of a well-behaved system . . . . . . . . . . . . . . . . . . . . 362
8.3 A deadbeat controller simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
8.4 Pole placement for controllers and estimators . . . . . . . . . . . . . . . . . . . . . . 369
8.5 GMC on a Linear Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
8.6 GMC for a batch reactor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
8.7 The dynamic equations of a batch reactor . . . . . . . . . . . . . . . . . . . . . . . . 377
8.8 Find the Lie derivative for a symbolic system . . . . . . . . . . . . . . . . . . . . . . 383
8.9 Establish relative degree, r (ignore degree 0 possibility) . . . . . . . . . . . . . . . . 384
8.10 Design Butterworth filter of order r. . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
8.11 Symbolically create the closed loop expression . . . . . . . . . . . . . . . . . . . . . 384
9.1 Returns the IAE performance for a given tuning. . . . . . . . . . . . . . . . . . . . . 392
9.2 Optimal tuning of a PID controller for a non-minimum phase plant. This script file
uses the objective function given in Listing 9.1. . . . . . . . . . . . . . . . . . . . . . 392
9.3 Returns the ITSE using a SIMULINK model. . . . . . . . . . . . . . . . . . . . . . . 395
9.4 Analytically computing the co-state dynamics and optimum input trajectory as a
function of states and co-states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
9.5 Solving the reaction profile boundary value problem using the boundary value
problem solver, bvp4c.m. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
9.6 Computes the full time-evolving LQR solution . . . . . . . . . . . . . . . . . . . . . 413
9.7 The continuous time differential Riccati equation. This routine is called from List-
ing 9.8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
9.8 Solves the continuous time differential Riccati equation using a numerical ODE
integrator. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
9.9 Calculate the continuous optimal steady-state controller gain. . . . . . . . . . . . . 416
9.10 Closed loop simulation using an optimal steady-state controller gain. . . . . . . . . 416
9.11 Solving the algebraic Riccati equation for P∞ using Kronecker products and vec-
torisation given matrices A, B, Q and R. . . . . . . . . . . . . . . . . . . . . . . . . 417
9.12 Calculate the discrete optimal steady-state gain by ‘iterating until exhaustion’.
Note it is preferable for numerical reasons to use lqr for this computation. . . . . 419
9.13 Comparing the continuous and discrete LQR controllers. . . . . . . . . . . . . . . . 419
9.14 An LQR controller for the blackbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
9.15 Comparing an LQR controller from Listing 9.14 with a pole-placement controller . 424
9.16 Computing the closed loop poles from the optimal LQR controller from Listing 9.14. 426
9.17 Comparing the actual normally distributed random numbers with the theoretical
probability density function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
9.18 Probability and inverse probability distributions for the F -distribution. . . . . . . . 434
9.19 Generate some correlated random data. . . . . . . . . . . . . . . . . . . . . . . . . . 434
9.20 Plot a 3D histogram of the random data from Listing 9.19. . . . . . . . . . . . . . . 435
9.21 Compute the uncertainty regions from the random data from Listing 9.20. . . . . . 436
9.22 Validating the uncertainty regions computed theoretically from Listing 9.21. . . . . 436
9.23 Solving the discrete time Riccati equation using exhaustive iteration around Eqn. 9.98
or alternatively using the dare routine. . . . . . . . . . . . . . . . . . . . . . . . . . 443
9.24 Alternative ways to compute the Kalman gain . . . . . . . . . . . . . . . . . . . . . 445
9.25 State estimation of a randomly generated discrete model using a Kalman filter. . . 447
9.26 Computing the Kalman gain using dlqe. . . . . . . . . . . . . . . . . . . . . . . . . 449
9.27 Demonstrating the optimality of the Kalman filter. . . . . . . . . . . . . . . . . . . . 450
9.28 Potter’s algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453
10.1 Predictive control with input saturation constraints using a generic nonlinear op-
timiser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
10.2 Objective function to be minimised for the predictive control algorithm with input
saturation constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
10.3 Dynamic Matrix Control (DMC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
10.4 Setting up an MPC controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
10.5 Optimal control using linear programming . . . . . . . . . . . . . . . . . . . . . . . 482
11.1 Generate some arbitrary data to be used for subsequent fitting . . . . . . . . . . . . 500
B.1 Polynomial addition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
B.2 Multiple convolution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
B.3 Strip leading zeros from a polynomial. . . . . . . . . . . . . . . . . . . . . . . . . . . 508
Chapter 1

Introduction

Mathematicians may flatter themselves that they possess new ideas which mere human language is as yet
unable to express. Let them make the effort to express those ideas in appropriate words without the aid of
symbols, and if they succeed, they will not only lay us laymen under a lasting obligation but, we venture to
say, they will find themselves very much enlightened during the process, and will even be doubtful
whether the ideas expressed as symbols had ever quite found their way out of the equations into their
minds.
James Clerk Maxwell, 1890

Control, in an engineering sense, is where actions are taken to ensure that a particular physi-
cal process responds in some desired manner. Automatic control is where we have relieved the
human operator from the tedium of consistently monitoring the process and supplying the nec-
essary corrections. Control as a technical discipline is therefore important not only in the fields
of engineering, but also in economics, sociology and indeed in most aspects of our life. When
studying control, we naturally assume that we do conceivably have some chance of influencing
things. For example, it is worthwhile to study the operation of a coal-fired power plant in order
to minimise possibly polluting emissions, but it is not worth our time trying to save the world
from the next ice age. Similarly, a special study group that investigated methods designed to
protect the world from a stray comet (such as the one postulated to have wiped out the dinosaurs
some 65 million years ago) concluded that there was nothing feasible we could do, such as
changing the earth's orbit or blasting the asteroid, to avoid the collision. In these latter examples,
the problem exists, but our influence is negligible.

The teaching of control has changed in emphasis over the last decade from linear single-
input/single-output systems elegantly described in the Laplace domain, to general nonlinear
multiple-input/multiple-output systems best analysed in the state-space domain. This change
has been motivated by increasing demands from industry and the public to produce more, faster
or cleaner, and is now much more attractive owing to the impressive improvements in computer-
aided tools, such as MATLAB used in these notes. This new emphasis is called advanced (or
modern) control as opposed to traditional or classical control. This set of notes is intended for
students who have previously attended a first course in automatic control covering the usual
continuous control concepts like Laplace transforms, Bode diagrams, stability of linear differential
equations, PID controllers, and perhaps some exposure to discrete-time control topics like
z-transforms and state space.

This book attempts to describe what advanced control is, and how it is applied in engineering
applications, with emphasis on the construction of controllers using computer-aided design tools
such as the numerical programming environment MATLAB from The MathWorks, [134]. With
this tool, we can concentrate on the intentions behind the design procedures, rather than the
mechanics of following them.

1.1 Scope

Part one contains some revision material in z-transforms, modelling and PID controller tuning.
The discrete domain, z-transforms, and stability concepts with a brief discussion of appropriate
numerical methods are introduced in chapter 2. A brief potpourri of modelling is summarised
in chapter 3. Chapter 4 is devoted to the most common industrial controller, the three term PID
controller with emphasis on tuning, implementation and limitations. Some basic concepts from
signal processing such as filtering and smoothing are introduced in chapter 5. Identification and
the closely related adaptive control are covered together in chapters 6 and 7. State-space analysis
and optimal control design are given in the larger chapters 8 and 9.

Notation conventions

Throughout these notes I have used some typographical conventions. In mathematical expres-
sions, scalar variables are written in italic, such as a, b, c, or if Greek, γ, ϕ, while vectors x, y are
upright bold lower case, and matrices, A, ∆, are bold upper case. More notation is introduced as
required.

Computer commands, output and listings are given in a fixed-width font, as in A=chol(B*B').
In some cases where you are to type an interactive command, the MATLAB prompt >> is given,
and the computed solution returned. Where no ambiguity exists, such as in the case of functions,
the prompt is omitted.

1.2 Matlab for computer aided control design

Modern control design has heavy computing requirements. In particular one needs to:

1. manipulate symbolic algebraic expressions, and

2. perform intensive numerical calculations and simulations for prototyping and testing quickly
and reliably, and finally

3. implement the controller at high speed in special hardware such as an embedded controller
or a digital signal processing (DSP) chip, perhaps using assembler.

To use this new theory, it is essential to use computer-aided design (CAD) tools efficiently, as
real-world problems can rarely be solved manually. But as [170] point out, 'the use of computers
in the design of control systems has a long and fairly distinguished history'. This book uses
MATLAB for the design, simulation and prototyping of controllers.
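As a small taste of this style of working, the fragment below shows how a few lines of MATLAB, using the Control toolbox functions tf, c2d and pole, take a continuous plant, discretise it, and check stability. The plant chosen here is arbitrary and purely for illustration.

```matlab
% An arbitrary continuous plant, discretised and checked for stability.
G  = tf(1, [5 1 1]);        % G(s) = 1/(5s^2 + s + 1)
Gd = c2d(G, 0.5);           % zero-order hold discretisation, Ts = 0.5 s
p  = pole(Gd);              % poles of the discrete model
stable = all(abs(p) < 1)    % discrete-stable if all poles lie inside the unit circle
```

Constructions of exactly this flavour recur throughout the following chapters.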

MATLAB, (which is short for MATrix LABoratory), is a programming environment that grew out
of an effort to create an easy user-interface to the very popular and well regarded public-domain
FORTRAN linear algebra collections of programmes, LINPACK and EISPACK. With this direct
interpretive interface, one can write quite sophisticated algorithms in a very high-level language
that are consequently easy to read and maintain. Today MATLAB is a commercial package,
(although some public-domain lookalikes exist), that is supported with a variety of toolboxes
comprising collections of source code subroutines organised in areas of specific interest. The
toolboxes we are most interested in, and used in this book, are:

Control toolbox containing functions for controller design, frequency domain analysis, conversions
between various model forms, pole placement, optimal control etc. (Used throughout.)

Symbolic toolbox which contains a gateway to the symbolic capabilities of MAPLE.

Signal processing toolbox containing filters, waveform generation and spectral analysis. (Used
principally in chapter 5.)

System identification toolbox for identifying the parameters of various dynamic model types.
(Used in chapter 6.) You may also find useful the free statistics toolbox available at:
www.maths.lth.se/matstat/stixbox/.

Real-time toolbox which can be used to interface MATLAB to various analogue-to-digital converters.

The student version of MATLAB (at the time of writing) has a special SIGNALS & SYSTEMS
TOOLBOX that has a subset of routines from the control and signal processing toolboxes. Other
toolboxes used for some sections of the notes are the OPTIMISATION TOOLBOX, used in chapter 9,
and the NEURAL NETWORK toolbox.

Additional documentation to that supplied with MATLAB includes the concise and free summary
notes [183] or the more recent [68]. Recently there has been an exponential growth in texts that
heavily use MATLAB (such as this one), and a current list is available from the MathWorks'
anonymous ftp server at www.mathworks.com. This server also contains many user-contributed
codes, as well as updates, bug fixes etc.

If MATLAB, or even programming in a high-level language, is new to you, then [201] is a cheap
recommended compendium, similar in form to this book, covering topics in numerical analysis,
again with many MATLAB examples.

1.2.1 Alternative computer design aids

Table 1.1 lists a number of alternative computer-aided design and modelling environments
similar and complementary to MATLAB.

Product          WWW site                    Comment
SCILAB           www.scilab.org              Free Matlab/Simulink clone
OCTAVE           www.octave.org              Free Matlab clone, inactive
RLAB             rlabplus.sourceforge.net    Matlab clone, Linux
VISUAL MODELQ    www.qxdesign.com            Shareware Simulink clone
MATHVIEWS        www.mathwizards.com         Shareware
MUPAD            www.mupad.de                Interfaces with SCILAB
MAPLE            www.maplesoft.com           Commercial CAS
MATHEMATICA      www.mathematica.com         Commercial CAS

Table 1.1: Shareware or freeware Matlab lookalikes and computer algebra systems

Unlike MATLAB, symbolic manipulators are computer programs that can perform algebra by
manipulating symbols. Such programs are alternatively known as computer algebra systems, or
CAS. The most well-known examples are MATHEMATICA, MAPLE, MUPAD, and MACSYMA,
(see Table 1.1). These programs can find analytical solutions to many mathematical problems
involving integrals, limits, special functions and so forth. They are particularly useful in the
controller design stage.

The Numerics in Control group in Europe (http://www.win.tue.nl/niconet/niconet.html) has
collected together a freeware FORTRAN subroutine library, SLICOT, of routines relevant to
systems and control.

Problem 1.1

1. Familiarise yourself with the fundamentals of MATLAB. Run the MATLAB demo by typing
demo once inside MATLAB.

2. Try the MATLAB tutorial (part 1).

3. Read through the MATLAB primer, [183] or [68], and get acquainted with the MATLAB
user's manual.

1.3 Economics of control

Most people would agree that engineers apply technology, but what do these two words really
mean? Technology is derived from two Greek words: techne, which means skill or art, and logia,
which means science or study. The interesting point here is that the art component is included.
The English language unfortunately confuses the word engine with engineering, so that many
people have the mistaken view that engineers (mostly) drive engines. Actually engineer is derived
from the Latin ingeniatorium, which means one who is ingenious at devising. A far cry from the
relatively simple act of piloting jumbos. An interesting American perspective of the professional
engineer and modern technology is given as light reading in [2] and Florman's The Existential
Pleasures of Engineering, [66].

Chemical engineering is, succinctly put, chemistry constrained by cost. The chemist wants the
reaction to proceed, the chemical engineer takes that for granted, but is interested in increasing
the rate, or pushing the equilibrium, or in most cases both at the same time. As the incentive to
produce better products increases, accompanied by an awareness of potent global competition
driving one to reduce costs, process control becomes an important aspect of engineering.

Obviously modern computerised process control systems are expensive. They are especially
expensive compared with other computers, such as office or financial computers, because the
market is smaller, the environment harsher, the graphical requirements more critical, the duties
more varied, and the potential payoffs larger. Process control has at least two main duties to
perform: first, to ensure that the plant is operated safely (that is, to protect the plant, environment
and people), and second, that the product quality is consistent with specifications demanded by
the customer or a regulatory body. There is always a trade-off between how much control you
apply and the benefits that result. An automobile manufacturer could produce an almost totally
indestructible car (i.e., a tank), but the expense of raw materials required, and the high running
costs, would certainly deem the project an economic failure.

On the other hand, in 1965 Ralph Nader complained in the aptly named Unsafe at Any Speed
about the poor quality of American automotive engineering, the lack of controls, and the unsafe
result. This influential book challenged the balance between commercial profits and quality
control. A product with less variation in quality may be worth more than a product that has
a higher average quality, but more variation. A potentially dangerous production facility that
regularly destroys equipment, people or surroundings is not usually tolerated by the licensing
authorities.

Safety concerns motivate better control.

Fig. 1.1, adapted from [6], gives an industrial perspective of the status of process control in 1994.
The techniques are divided into those considered “classical” or traditional, which demand only
modest digital online computing power (if any) and little in the way of explicit process models or
understanding, and those termed loosely “advanced”.

[Figure 1.1: A comparison of traditional vs. advanced process control techniques. Adapted from
[6]. The layers range from traditional control (field devices such as valves, transmitters, smart
transmitters and onstream analysers; basic analogue control with PLCs, DCS and process
computers; regulatory control with signal conditioning, the PID algorithm, deadtime
compensation and feedforward control) up to advanced control (constraint control, steady-state
and dynamic optimisation, direct search, rule definition and online simulation).]

One of the major concerns for the process control engineer is to reduce the output variance. If
the variation about the setpoint is small, then the setpoint can be shifted closer to the operating
constraint without increasing the frequency of alarms. Fig. 1.2 demonstrates the ideal case which,
while popular in the advertising literature, is harder to achieve unambiguously in practice.
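This economic argument can be made concrete with a back-of-the-envelope calculation. If the setpoint must be backed off, say, three standard deviations from the quality constraint to keep violations rare, then halving the output standard deviation halves the necessary back-off. The numbers below are purely illustrative:

```matlab
% Illustrative only: lower variance permits a setpoint nearer the constraint.
constraint = 100;                   % upper quality constraint
sigma      = [4 2 1];               % output std. dev.: manual, regulatory, advanced
setpoint   = constraint - 3*sigma   % gives 88, 94, 97: ever closer to the limit
```

The difference between those setpoints, multiplied by throughput and margin, is the payback on the control project.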

Many textbooks in the control field are very vague about the actual configuration of real process
control systems used today. Other books, taking a more trade-oriented approach, are vague about
the academic side of control performance. There are a number of reasons for this. First, many
texts try hard to describe only the theoretical aspects of digital control, and anything remotely
applied is not considered worthy of their attention. Secondly, the control systems are rapidly
changing as the cost of microprocessors drops, and different programming methods come into
favour. Thirdly, many industries are deliberately vague about publishing the details of their
control system since they perceive that this information could help their competitors. However
some information of this type is given in [63, pp131-149] and [17]. One good
[Figure 1.2: Economic improvements owing to better control. A time trace of the process output
under manual, regulatory and advanced control shows progressively smaller variance, allowing
the setpoint to be raised from setpoint #1 through #2 to #3, closer to the upper quality constraint
with fewer violations. If the control scheme can reduce the variance, the setpoint can be shifted
closer to the operating or quality constraint, thereby decreasing operating costs.]

balance for the practitioner is [143].

1.4 Laboratory equipment for control tests

Obviously if we are to study automatic control with the aim of eventually controlling chemical
plants, manufacturing processes and robots, or undertaking filtering for active noise cancellation
and so forth, we should practise, preferably on simpler, better understood, and potentially less
hazardous equipment.

In the Automatic Control Laboratory in the Department of Electrical Engineering at the Karlstad
University, Sweden we have a number of simple bench-scale plants to test identification and
control algorithms on.

1.4.1 Plants with one input and one output

The blackbox

Fig. 1.3 and Fig. 1.4(a) show what we perhaps unimaginatively refer to as a “black-box”. It is a
box, and it is coloured black. Subjecting the box to an input voltage from 0 to 5 volts delivers an
output voltage also spanning from around 0 to 5 volts, but lagging behind the input voltage, since
the internals of the blackbox are simply either 7 or 9 (depending on the switch position) low-pass
passive filters cascaded together.
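Knowing the internals, we can anticipate the blackbox behaviour with a crude model of, say, seven identical first-order lags in series. The unity time constant below is a guess for illustration, not a measured value.

```matlab
% Crude blackbox model: 7 identical first-order low-pass filters in series.
tau = 1;                    % assumed time constant of each stage (s)
G1  = tf(1, [tau 1]);       % one low-pass stage
G   = G1^7;                 % seven stages cascaded
step(G)                     % shows the characteristic sluggish S-shaped response
```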

The blackbox is a relatively well-behaved, underdamped, stable system with dominant time
constants of around 5 to 10 seconds. Fig. 1.4(b) shows the response of the blackbox to two input
steps. The chief disadvantage of using this device for control studies is that the output response
[Figure 1.3: Blackbox configuration. The input from the computer's D/A converter drives the
blackbox, and both a sluggish and a fast output are returned to the computer's A/D converter.
The manual switch marked will toggle between either 7 or 9 low-pass filters.]

[Figure 1.4: The “Black-box”. (a) Black-box wiring to the National Instruments LabPC terminator.
(b) The response of the blackbox to 2 step inputs.]

is not ‘visible’ to the naked eye, and that we cannot manually introduce disturbances. One
complication you can introduce is to cascade two blackboxes together to modify the dynamics.

Electro-magnetic balance arm

The electromagnetic balance arm shown in Fig. 1.5(a) is a fast-acting, highly oscillatory plant
with little noise. The aim is to accurately weigh small samples by measuring the current required
to keep the balance arm level, or alternatively just to position the arm at different angles. The
output response to a step in input shown in Fig. 1.5(b) indicates how long it would take for the
oscillations to die away.
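The oscillatory behaviour of Fig. 1.5(b) is the signature of a lightly damped second-order system. The natural frequency and damping ratio below are invented for illustration rather than fitted to the actual arm:

```matlab
% An invented lightly damped second-order model of a balance-arm-like plant.
wn   = 2;                               % natural frequency (rad/s), assumed
zeta = 0.05;                            % damping ratio, highly oscillatory
G    = tf(wn^2, [1 2*zeta*wn wn^2]);    % standard second-order form
step(G, 60)                             % the oscillations persist for a long time
```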
[Figure 1.5: The electromagnetic balance arm. (a) The electromagnetic balance arm. (b) The
response of the arm to step changes in input, sampled at ∆T = 0.05 seconds.]

Flapper

Contrary to the balance arm, the flapper in Fig. 1.6 and Fig. 1.7(b) has few dynamics, but
significant low-pass filtered measurement noise. An interesting exercise is to place two flapper
units in close proximity. The air from one then disturbs the flapper of the other, which makes an
interacting multivariable plant.

Stepper motors

A stepper motor is an example of a totally discrete system.

1.4.2 Multi-input and multi-output plants

It is possible to construct multivariable interacting plants by physically locating two plants close
to each other. One possibility is to locate two flappers adjacent to each other, another possibility
is one flapper and one balance arm. The extent of the interaction can be varied by adjusting the
relative position of the two plants.

Helicopter

The model helicopter, Fig. 1.8, is an example of a highly unstable, multivariable (3 inputs, 2
outputs), nonlinear, strongly interacting plant. It is a good example of where we must apply
control (or crash and burn). Fig. 1.9(b) shows the controlled response using 2 PID controllers to
control the direction and altitude. Fig. 1.10(a) shows a 3-dimensional view of the desired and
actual flight path.
[Figure 1.6: The flapper wiring. The D/A output sets the fan speed via the power to the motor, a
position transducer measures the flapper arm angle for the A/D input, and an auto/manual
switch selects between computer and manual control.]

[Figure 1.7: The fan and flapper. (a) The fan/flapper equipment. (b) Step response of the flapper,
sampled at dt = 0.05 seconds.]


[Figure 1.8: Helicopter plant with 2 degrees of freedom. The control inputs are the top and side
rotors, plus a moveable counterweight; the plant outputs are the azimuth and elevation angles,
measured on a support stand. See also Fig. 1.9(a).]

1.5 Slowing down Simulink

For some applications, like the development of PID controllers, you want to be able to slow down
the SIMULINK simulation to have time to manually introduce step changes, add disturbances,
switch from automatic to manual etc. If left alone in simulation mode, SIMULINK will run as fast
as possible, but it will slow down when it needs to sub-sample the integrator around
discontinuities or during periods when the system is very stiff.

The SIMULINK Execution Control block allows you to specify that the simulation runs at a
multiple of real-time. This is most useful when you want to slow down a simulation, or ensure
that it runs at a constant rate.

The block is available from: http://www.mathworks.com/matlabcentral/fileexchange

The implementation is shown in Fig. 1.11 where the Simulation Execution Control block looks
like an A/D card but in fact is not connected to anything, although it does export some diagnostic
timing information.
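If the Execution Control block is not available, an ordinary m-file loop can be paced at roughly real time with tic and toc, which is often all that is needed to leave time for manual setpoint changes. A minimal sketch:

```matlab
% Pace a simulation loop at (roughly) real time using tic/toc.
Ts = 0.5;                    % sample time (s)
t0 = tic;
for k = 1:20
    % ... one controller/plant update would go here ...
    while toc(t0) < k*Ts     % idle until the wall clock catches up
        pause(0.01)
    end
end
```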

An alternative method to slow down a SIMULINK simulation and force it to run at some set rate
is to use the commercial Humusoft real-time toolbox, but without the A/D card actually
interfaced to anything.
[Figure 1.9: Multivariable PID control of an unstable helicopter. (a) The ‘flying’ helicopter
balanced using 2 PID controllers since it is openloop unstable and would otherwise crash.
(b) Multivariable PID control of the helicopter exhibiting mediocre controlled response and
severe derivative kick.]


[Figure 1.10: Helicopter flying results. (a) Multivariable helicopter control: the desired and actual
flight path plotted in 3 dimensions (up/down vs. east/west vs. time). (b) Model helicopter in the
trees.]

[Figure 1.11: Real-time Simulink simulations. A signal generator drives a first-order transfer
function 1/(s+1) viewed on scopes, with the Simulink Execution Control block simply added to
the model. A parameter in the SIMULINK EXECUTION CONTROL block sets the speed of the
simulation.]
Chapter 2

From differential to difference equations

Excerpt from What a sorry state of affairs, Martin Walker, Guardian Weekly, June 29, 1997.

. . . But then he said something sensible, as he quite often does. . . .

“We need to treat individuals as individuals and we need to address discrete problems for what they are,
and not presume them to be part of some intractable racial issue.” Gingrich, properly understood, is a
national treasure, and not only because he is one of the few Americans who understand the difference
between “discreet” and “discrete”.

2.1 Computer in the loop

The cost and flexibility advantages of implementing a control scheme in software rather than fabricating it in discrete components are today simply too large to ignore. However, inserting a computer to run the software necessitates that we work with discrete, regularly sampled signals. This added complexity, which is more than compensated for by the above-mentioned advantages, introduces a whole new control discipline, that of discrete control.

Fig. 2.1 shows a common configuration of the computer in the loop. For the computer to respond to any outside events, the signals must first be converted from an analogue form, say 1 to 5 volts, to a digital signal which can, with suitable processing, be wired to an input port of the computer’s processor. The device that accomplishes this conversion is called an analogue-to-digital converter, or A/D. Similarly, any binary output from the pins on a processor must first be converted to an analogue signal using a digital-to-analogue converter, or D/A. In some micro-controllers (rather than micro-processors), such as Intel’s 8048 or some versions of the 8051, these converters may be implemented on the microcontroller chip itself.

Digital to analogue conversion is easy and cheap. One simply loads each bit across different resistors, and sums the resultant voltages. The conversion is essentially instantaneous. Analogue to digital conversion is not nearly as easy nor as cheap, and this is the reason that the common data acquisition cards you can purchase for your PC will often multiplex the analogue input channel. There are various schemes for the A/D; one uses a D/A inside a feedback loop running a binary search algorithm. Obviously this conversion is not instantaneous, although this is not normally considered a problem for process control applications. Any introductory electrical engineering text such as [171]


Figure 2.1: The computer in the control loop. The setpoint r(t) is compared with the measured output y(t); the error passes through an A/D converter into the process control computer, and the computer’s output passes through a D/A converter to drive the plant, whose output y(t) closes the feedback loop.

will give further details on the implementation.

2.1.1 Sampling an analogue signal

It is the A/D converter that is the most interesting for our analysis of the discrete control loop.
The A/D converter will at periodic intervals defined by the computer’s clock sample the continu-
ous analogue input signal. The value obtained is typically stored in a device called a zeroth-order
hold until it is eventually replaced by the new sample collected one sample time later. Given con-
straints on cost, the A/D converter will only have a limited precision, or limited number of bits
in which to store the incoming discrete value. Common A/D cards such as the PCLabs card,
[3], use 12 bits giving 2^12, or slightly over 4 thousand, discretisation levels. The residual chopped away is referred to as the quantisation error. For a given number of bits, b, used in the converter, the amplitude quantisation is

    δ = 2^{−b}
Low cost analogue converters may only use 8 bits, while digital audio equipment use between
16 and 18 bits.

Fig. 2.2 shows the steps in sampling an analogue signal with a three-bit (8 discrete levels) A/D sampler. The dashed stair plot gives an accurate representation of the sampled signal, but owing to the quantisation error, we are left with the solid stair plot. You can reproduce Fig. 2.2 in MATLAB using the fix command to do the chopping, and stairs to construct the stair plot. While other types of hold are possible, anything higher than a first-order hold is rarely used.
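The fix-and-stairs construction translates readily outside MATLAB too. The following sketch (a Python/NumPy illustration, with np.floor standing in for fix on a non-negative signal, and a hypothetical test signal spanning the 0–7 range) generates the sampled-and-quantised staircase values of a three-bit converter:

```python
import numpy as np

def quantise(samples, bits=3):
    """Chop each sampled value down to the next lowest of the 2**bits
    integer levels, mimicking MATLAB's fix() for non-negative signals."""
    levels = 2**bits
    return np.clip(np.floor(samples), 0, levels - 1)

T = 1.0                                       # sample time
t = np.arange(0.0, 13.0, T)                   # sampling instants
analogue = 3.5 + 3.2*np.sin(2*np.pi*t/12.0)   # continuous signal at the instants
quantised = quantise(analogue, bits=3)        # staircase values of Fig. 2.2

print(quantised)    # integers between 0 and 7; quantisation error under one level
```

Since the converter always chops downwards, the quantisation error here lies in the interval [0, δ) with δ = 1 level.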

2.1.2 Selecting a sample rate

Once we have decided to implement discrete control, rather than continuous control, we must decide on a reasonable sampling rate. This is a crucial parameter in discrete control systems. The sample time, (T, or sometimes denoted ∆t), is measured in time units, say seconds or, in industrial applications, minutes. The reciprocal of the sample time is the sample frequency, f, usually measured in cycles/second or Hertz. The radial or angular velocity (which some confusingly also term frequency) is denoted ω and is measured in radians/second. The inter-relationships between these quantities are

    f [cycles/second] = 1/T = ω/(2π)        (2.1)

Figure 2.2: Sampling an analogue signal (heavy solid) with a three-bit (8 discrete levels) A/D converter and zeroth-order hold. The sampled values, •, are chopped to the next lowest discrete integer level giving the sampled and quantised output.

The faster the sampling rate, (the smaller the sampling time, T ), the better our discretised signal
approximates the real continuous signal. However, it is uneconomic to sample too fast, as the
computing and memory hardware may become too expensive. When selecting an appropriate
sampling interval, or sample rate, we should consider the following issues:

• The maximum frequency of interest in the signal

• The sampling theorem which specifies a lower limit required on the sampling rate to resolve
any particular frequency unambiguously. (See §2.1.3 following.)

• Any analogue filtering that may be required (to reduce the problem of aliasing)

• The cost of the hardware and the speed of the A/D converters.

Ogata discusses the selection of a sample time qualitatively in [148, p38]. However for most
chemical engineering processes, which are dominated by relatively slow and overdamped pro-
cesses, the sample time should lie somewhere between ten times the computational delay of the
hardware, tc, and some small fraction of the process dominant time constant, τ, say

    10 tc ≤ T ≤ τ/10        (2.2)
For most chemical engineering applications, the computational delay is negligible compared with the process time constant, (tc → 0), so we often choose T ≈ τ/10. Thus for a simple first order rise, we would expect to have about 20–30 data samples from 0 to 99%. Some may argue that even this sampling rate is too high, and opt for a more conservative (larger) sample time of up to τ/6. Note that commonly used guidelines such as presented in Table 22.1 in [179, p535] span a wide range of recommended sample times.
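As a quick worked instance of Eqn. 2.2 (an illustrative sketch only; the five-minute time constant is hypothetical), consider a process with a dominant time constant of τ = 5 minutes and negligible computational delay:

```python
def sample_time_bounds(tau, tc=0.0):
    """Bounds on the sample time T from Eqn. 2.2: 10*tc <= T <= tau/10."""
    return 10.0*tc, tau/10.0

lo, hi = sample_time_bounds(tau=5.0)   # tau in minutes, tc negligible
print(lo, hi)   # 0.0 0.5, so sample at about tau/10 = 0.5 minutes
```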

Overly fast sampling

Apart from the high cost of fast A/D converters, there is another argument against fast sampling.
When one samples a continuous system, the poles of a stable continuous system map to the poles
of a stable discrete system as T goes to zero. However the zeros in the LHP of a continuous
system may not map to zeros inside the unit disk of a discrete system as T tends to zero. This
nonintuitive result could create problems if one samples a system, and then uses the inverse
of this system within a one step controller, since now the zeros outside the unit circle become
16 CHAPTER 2. FROM DIFFERENTIAL TO DIFFERENCE EQUATIONS

unstable poles inside the controller for small sample times. Note that the inverse continuous
system is stable, and the discrete inverse system will be stable for large sample times, and it will
only be unstable for small sample times.
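This behaviour is easy to demonstrate numerically. The sketch below (assuming SciPy is available; the plant G(s) = 1/(s+1)^3 is simply an illustrative choice) discretises a stable system that has no finite zeros at all, and shows that a fast sample time introduces a discretisation zero outside the unit circle, even though the poles remain comfortably inside it:

```python
import numpy as np
from scipy import signal

# G(s) = 1/(s+1)^3: stable, and with no finite zeros at all
num, den = [1.0], [1.0, 3.0, 3.0, 1.0]

T = 0.01                                    # a deliberately fast sample time
numd, dend, _ = signal.cont2discrete((num, den), T, method='zoh')

coeffs = numd.flatten()
coeffs[np.abs(coeffs) < 1e-12] = 0.0        # clean numerical dust in the leading terms
zeros = np.roots(np.trim_zeros(coeffs, 'f'))

print(np.abs(zeros))   # one sampling zero has magnitude well above 1
```

Inverting this discretisation inside a controller would therefore place an unstable pole in the controller, exactly as warned above.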

2.1.3 The sampling theorem and aliases

“Dinsdale was a gentleman. And what’s more he knew how to treat a female impersonator”
John Cleese in Monty Python (15-9-1970)

The sampling theorem gives the conditions necessary to ensure that the inverse sampling pro-
cedure is a “one to one” relationship. To demonstrate the potential problems when sampling,
consider the case where we have two sinusoidal signals, but at different frequencies.1

    y1 = −sin(2π (7/8) t)    and    y2 = sin(2π (1/8) t)

If we sample these two signals relatively rapidly, say at 25Hz (T =40ms) then we can easily see
two distinct sine curves. However if we sample at T = 1, we obtain identical points.

Plotting the two signals confirms this:

    t = [0:0.04:8]';
    y1 = -sin(2*pi*7/8*t);
    y2 = sin(2*pi/8*t);
    plot(t,[y1, y2])

As the resulting plot shows, we note that y1 (solid) and y2 (dashed) just happen to coincide at t = 0, 1, 2, 3, . . . (•).

Consequently, at the slower sampling rate of 1 Hz, we cannot distinguish between the two dif-
ferent signals. This is the phenomenon known as aliasing since one of the frequencies “pretends”
to be another. In conclusion, two specific sinusoids of different frequencies can have identical
sampled signals. Thus in the act of taking samples from a continuous measurement, we have lost
some information. Since we have experienced a problem when we sample too slowly, it is rea-
sonable to ask what the minimum rate is so that no aliasing occurs. This question is answered by
the sampling theorem which states:

To recover a signal from its sample, one must sample at least two times a
period, or alternatively sample at a rate twice the highest frequency of interest
in the signal.

Alternatively, the highest frequency we can unambiguously reconstruct for a given sampling
rate, 1/T , is half of this, or 1/(2T ). This is called the Nyquist frequency, fN .

In the second example above when we sampled at 1 Hz, the sampling radial velocity was ωs =
2π = 6.28 rad/s. This was satisfactory to reconstruct the low frequency signal (f1 = 1/8 Hz) since
1 This example was adapted from Franklin & Powell p81

2ω1 = 1.57 rad/s. We are sampling faster than this minimum, so we can reconstruct this signal. However for the faster signal (f2 = 7/8 Hz), we cannot reconstruct it since 2ω2 = 11.0 rad/s, which is faster than the sampling radial velocity.
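The coincidence of the two sampled sinusoids is easy to verify numerically; a Python/NumPy sketch of the calculation above:

```python
import numpy as np

f1, f2 = 7.0/8.0, 1.0/8.0                # the two frequencies in Hz
y1 = lambda t: -np.sin(2*np.pi*f1*t)
y2 = lambda t:  np.sin(2*np.pi*f2*t)

t_fast = np.arange(0.0, 8.0, 0.04)       # 25 Hz sampling
t_slow = np.arange(0.0, 9.0, 1.0)        # 1 Hz sampling

print(np.max(np.abs(y1(t_fast) - y2(t_fast))))   # clearly different signals
print(np.max(np.abs(y1(t_slow) - y2(t_slow))))   # identical samples: aliasing
```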

2.1.4 Discrete frequency

If we sample the continuous signal

    x(t) = A cos(ωt)

with a sample time of T,

    x(nT) = A cos(ωnT) = A cos(Ωn)

where the digital frequency, Ω, is defined as

    Ω ≜ ωT = 2πfT = 2πf/fs

then the range of analogue frequencies is 0 < f < ∞ while the range of digital frequencies is limited by the Nyquist sampling limit, fs/2, giving the allowable range for the digital frequency as

    0 ≤ Ω ≤ π

MATLAB unfortunately decided on a slightly different standard in the Signal Processing Toolbox. Instead of a range from zero to π, MATLAB uses a range from zero to 2, where 1 corresponds to half the sampling frequency, or the Nyquist frequency. See [42].

In summary:

                            symbol                  units
    sample time             T or ∆t                 s
    sampling frequency      fs = 1/T                Hz
    angular velocity        ω = 2πf                 rad/s
    digital frequency       Ω = ωT = 2πf/fs         –

where the allowable ranges are:

    0 ≤ ω < ∞,    continuous
    0 ≤ Ω ≤ π,    sampled

and the Nyquist frequency, fN, and the dimensionless Nyquist frequency, ΩN, are:

    fN = fs/2 = 1/(2T)  [Hz]
    ΩN = 2πfN/fs = π

It is practically impossible to avoid aliasing problems when sampling using only digital filters. Almost all measured signals are corrupted by noise, and this noise usually has some high or even infinite frequency components. Thus the noise is not band limited. With this noise, no matter how fast we sample, we will always have some reflection of a higher frequency component that appears as an impostor or alias frequency.
18 CHAPTER 2. FROM DIFFERENTIAL TO DIFFERENCE EQUATIONS

If aliasing is still a problem, and you cannot sample at a higher rate, then you can insert a low-pass analogue filter between the measurement and the analogue to digital converter (sampler). The analogue filter, known in this role as an anti-aliasing filter, will band-limit the signal without corrupting it with any aliasing. Expensive high fidelity audio equipment will still use analogue filters in this capacity. Analogue and digital filters are discussed in more detail in chapter 5.

Detecting aliases

Consider the trend y(t) in Fig. 2.3 where we wish to estimate the important frequency compo-
nents of the signal. It is evident that y(t) is comprised of one or two dominating harmonics.

Figure 2.3: Part of a noisy time series with unknown frequency components. The continuous underlying signal (solid) is sampled at T = 0.7 s, (•), and at T = 1.05 s (△).

The spectral density when sampling at T = 0.7 s (• in Fig. 2.3), given in the upper trend of Fig. 2.4, exhibits three distinct peaks. These peaks are the principal frequency components of the signal and are obtained by plotting the absolute value of the Fourier transform of the time signal2, |DFT{y(t)}|. Reading off the peak positions, and for the moment overlooking any potential

Figure 2.4: The frequency components of a signal sampled at Ts = 0.7 s (upper) and Ts = 1.05 s (lower). The Nyquist frequencies for both cases are shown as vertical dashed lines. See also Fig. 2.5.

problems with undersampling, we would expect y(t) to be something like

    y(t) ≈ sin(2π·0.1t) + sin(2π·0.5t) + sin(2π·0.63t)

However in order to construct Fig. 2.4 we had to sample the original time series y(t) possibly
introducing spurious frequency content. The Nyquist frequency fN = 1/(2Ts ) is 0.7143 and is
2 More about the spectral analysis or the power spectral density of signals comes in chapter 5.

shown as a vertical dashed line in Fig. 2.4(top). The power spectrum is reflected in this line, but
is not shown in Fig. 2.4.

If we were to re-sample the process at a different frequency and re-plot the power density plot
then the frequencies that were aliased will move in this second plot. The △ points in Fig. 2.3 are
sampled at ∆t = 1.05s with corresponding spectral power plot in Fig. 2.4. The important data
from Fig. 2.4 is repeated below.

    Curve    Ts (s)   fN (Hz)   peak 1   peak 2   peak 3
    top      0.7      0.7143    0.1      0.50     0.63
    bottom   1.05     0.4762    0.1      0.152    0.452

First we note that the low frequency peak (f1 = 0.1Hz) has not shifted from curve a (top) to
curve b (bottom), so we would be reasonably confident that f1 = 0.1Hz and is not corrupted by
the sampling process.

However, the other two peaks have shifted, and this shift must be due to the sampling process.
Let us hypothesize that f2 = 0.5Hz. If this is the case, then it will appear as an alias in curve b
since the Nyquist frequency for curve b (fN (b) = 0.48) is less than the proposed f2 = 0.5, but it
will appear in the correct position on curve a. The apparent frequency fˆ2 (b) on curve b will be

fˆ2 (b) = 2fN (b) − f2 = 2 × 0.4762 − 0.5 = 0.4524

which corresponds to the third peak on curve b. This would seem to indicate that our hypothesis
is correct for f2 . Fig. 2.5 shows this reflection.

Figure 2.5: Reflecting the frequency response in the Nyquist frequency from Fig. 2.4 shows why one of the peaks is evident in the frequency spectrum at Ts = 1.05 s, but what about the other peak?

Now we turn our attention to the final peak: f3(a) = 0.63 and f2(b) = 0.152. Let us again hypothesize that f3(a) is the true frequency, f3 = 0.63 Hz. If this were the case, the apparent frequency on curve b would be f̂3(b) = 2fN(b) − f3 = 0.3224 Hz. There is no peak on curve b at this frequency so our guess is probably wrong. Let us suppose instead that the peak at f3(a) is itself an alias, reflected once about fN(a). In that case the true frequency will be f3 = 2fN(a) − f̂3(a) = 0.8 Hz. Now we check using curve b. If the true third frequency is 0.8 Hz, then the apparent frequency on curve b will be f̂3(b) = 2fN(b) − f3 = 0.153 Hz. We have a peak here which indicates that our guess is a good one. In summary, a reasonable guess for the unknown underlying function is

    y(t) ≈ sin(2π·0.1t) + sin(2π·0.5t) + sin(2π·0.8t)

although we can never be totally sure of the validity of this model. At best we could either re-
sample at a much higher frequency, say fs > 10Hz, or introduce an analogue low pass filter to cut
out, or at least substantially reduce, any high frequencies that may be reflected.
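The alias bookkeeping above generalises to a one-line folding rule: a true frequency f sampled every Ts seconds appears at the distance from f to the nearest multiple of fs = 1/Ts. The small Python sketch below (an illustrative helper, not from the text) reproduces both rows of the peak table from the hypothesised true frequencies:

```python
def apparent_frequency(f, Ts):
    """Fold a true frequency f (Hz), sampled every Ts seconds, back into the
    observable band 0 <= f_hat <= fN, where fN = 1/(2*Ts)."""
    fs = 1.0/Ts
    return abs(f - fs*round(f/fs))

truth = [0.1, 0.5, 0.8]   # hypothesised underlying frequencies
print([round(apparent_frequency(f, 0.70), 3) for f in truth])
print([round(apparent_frequency(f, 1.05), 3) for f in truth])
```

The first line recovers the peaks 0.1, 0.5 and 0.629 Hz of curve a, and the second the peaks 0.1, 0.452 and 0.152 Hz of curve b, confirming the hand calculation.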

2.2 Finite difference models

To compute complex mathematical operations such as differentiation or integration on a digital computer, we must first discretise the continuous equation into a discrete form. Discretising is where we convert a continuous-time differential equation into a difference equation that can then be solved on a digital computer.
possible to solve on a digital computer.

Figure 2.6: In Nathaniel Hawthorne’s The Scarlet Letter, Hester Prynne is forced to wear a large, red “A” on her chest when she is found guilty of adultery and refuses to name the father of her illegitimate child.

In control applications, we are often required to solve ordinary differential equations such as

    dy/dt = f(t, y)        (2.3)

for y(t). The derivative can be approximated by the simple backward difference formula

    dy/dt ≈ (yt − yt−T)/T        (2.4)

where T is the step size in time or the sampling interval. Provided T is kept sufficiently small, (but not zero), the discrete approximation to the continuous derivative is reasonable. Inserting this approximation, Eqn. 2.4, into our original differential equation, Eqn. 2.3, and re-arranging for yt results in the famous Euler backward difference scheme;

    yt = yt−T + T f(t − T, yt−T)        (2.5)
Such a scheme is called a recurrence scheme since a previous solution, yt−T , is used to calculate
yt , and is suitable for computer implementation. The beauty of this crude method of differen-
tiation is the simplicity and versatility. Note that we can discretise almost any function, linear
or nonlinear without needing to solve analytically the original differential equation. Of course
whether our approximation to the original problem is adequate is a different story and this com-
plex, but important issue is addressed in the field of numerical analysis.
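The recurrence of Eqn. 2.5 is only a few lines of code in any language. The Python sketch below steps it along for the illustrative problem dy/dt = −y, y(0) = 1, whose exact solution is e^{−t}, and shows the approximation error shrinking as T is reduced:

```python
import math

def euler(f, y0, T, t_end):
    """Step the recurrence y_t = y_{t-T} + T*f(t-T, y_{t-T}) of Eqn. 2.5."""
    y, t = y0, 0.0
    while t < t_end - 1e-9:
        y = y + T*f(t, y)   # use the previous solution to compute the next
        t += T
    return y

f = lambda t, y: -y          # dy/dt = -y, exact solution y(t) = exp(-t)
for T in (0.1, 0.01):
    y1 = euler(f, 1.0, T, 1.0)
    print(T, y1, abs(y1 - math.exp(-1.0)))
```

Note the error falls roughly tenfold when T is reduced tenfold, as expected for this first-order scheme.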

Problem 2.1

1. What is the finite difference approximation for a third order derivative?

2. Write down the second order finite difference approximation to

    τ^2 d^2y/dt^2 + 2ζτ dy/dt + y = ku

2.2.1 Difference equations

Difference equations are the discrete counterpart to continuous differential equations. Often the
difference equations we are interested in are the result of the discretisation of a continuous dif-
ferential equation, but sometimes, as in the case below, we may be interested in the difference

equation in its own right. Difference equations are much easier to simulate and study in a digital
computer than a differential equation for example, since they are already written in a form where
we can just step along following a relation

xk+1 = f (xk )

rather than resorting to integration. This step by step procedure is also known as a mathematical
mapping since we map the old vector xk to a new vector xk+1 .

Hénon’s chaotic attractor

The system of discrete equations known as Hénon’s attractor,

    xk+1 = (yk + 1) − 1.4 xk^2,    x0 = 1        (2.6)
    yk+1 = 0.3 xk,                 y0 = 1        (2.7)

is an interesting example of a discrete mapping which exhibits chaotic behaviour.

We will start from the point x0 = (x0, y0) = (1, 1) and investigate the subsequent behaviour using Simulink. Since Hénon’s attractor is a discrete system, we will use unit delays to shift back from xk+1 → xk, and to better visualise the results we will plot an (x, y) phase plot. The Simulink simulation is given in Fig. 2.7. Compare this Simulink construction (which I have drawn deliberately to flow from right to left) so that it matches the way we read the system equations, Eqn. 2.6–Eqn. 2.7.

Figure 2.7: Hénon’s attractor, (Eqns. 2.6–2.7), modelled in Simulink using unit delay (1/z) blocks, gain blocks (1.4, 0.3), a product block and an XY graph. See also Fig. 2.8.

The time response of x and y, (left figures in Fig. 2.8), of this mapping is not particularly interesting, but the phase plot (without joining the dots), (right figure in Fig. 2.8), is an interesting mapping.

Actually what the chaotic attractor means is not important at this stage; this is after all only a demonstration of a difference equation. However the points to note in this exercise are that we never actually integrated any differential equations, but only stepped along using the unit delay blocks in Simulink. Consequently these types of simulations are trivial to program, and run very fast in a computer. Another example of a purely discrete system is given in later Fig. 6.10.
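For comparison, the same mapping needs no block diagram at all; a direct iteration of Eqns. 2.6–2.7, sketched here in Python:

```python
def henon(n, x0=1.0, y0=1.0):
    """Iterate the Hénon map of Eqns. 2.6-2.7 for n steps from (x0, y0)."""
    xs, ys = [x0], [y0]
    for _ in range(n):
        x, y = xs[-1], ys[-1]
        xs.append((y + 1.0) - 1.4*x*x)   # Eqn. 2.6
        ys.append(0.3*x)                 # Eqn. 2.7
    return xs, ys

xs, ys = henon(100)
print(xs[:3], ys[:3])   # bounded, chaotic sequence starting (1,1), (0.6,0.3), ...
```

The sequence hops around chaotically but remains bounded on the attractor, exactly as the Simulink phase plot of Fig. 2.8 shows.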

Figure 2.8: The time response (left) and phase plot (right) of Hénon’s attractor as computed by the Simulink block diagram in Fig. 2.7.

2.3 The z transform

The sampling of a continuous function f(t) at regular sampling intervals equally spaced T time units apart creates a new function f*(t),

    f*(t) = Σ_{k=0}^{∞} f(kT) δ(t − kT)        (2.8)

where δ(x) is the Dirac delta function which is defined as being a perfect impulse at x = 0, and zero elsewhere. Note that the actual value of δ(0) is infinity, but the integral of δ(t) is 1.0. Expanding the summation term in Eqn. 2.8 gives

    f*(t) = f0 δ(t) + fT δ(t − T) + f2T δ(t − 2T) + f3T δ(t − 3T) + · · ·

Now suppose we wish to know the value of the third sampled function, f*(t = 3T). For simplicity we will assume the sample time is T = 1.

    f*(3) = f0 δ(3) + f1 δ(2) + f2 δ(1) + f3 δ(0) + f4 δ(−1) + · · ·

All the terms except for the term containing δ(0) are zero, while δ(0) = ∞. Thus the “value”
of the function f ∗ (3) = ∞. Often you will see a graph of f ∗ (t) depicted where the heights of
the values at the sample times are the same as the height of the continuous distribution such as
sketched in Fig. 2.13. Strictly this is incorrect, as it is the ‘strength’ or integral of f ∗ (t) which is the
same as the value of the continuous distribution f (t). I think of the “function” f ∗ (t) as a series of
impulses whose integral is equal to the value of the continuous function f (t).

If we take the Laplace transform of the sampled function f*(t), we get

    L{f*(t)} = L{ Σ_{k=0}^{∞} f_kT δ(t − kT) }        (2.9)

Now the function f_kT is assumed constant so it can be factored out of the Laplace transform, and the Laplace transform of δ(t) is simply 1.0. If the impulse is delayed kT units, then the Laplace transform is e^{−kTs} × 1. Thus Eqn. 2.9 simplifies to

    L{f*(t)} = Σ_{k=0}^{∞} f_kT e^{−kTs} = Σ_{k=0}^{∞} f_k z^{−k}

where we have defined

    z ≜ e^{sT}        (2.10)

In summary, the z-transform of the sampled function f*(t) is defined as

    F(z) ≜ Z{f*(t)} = Σ_{k=0}^{∞} f_k z^{−k}        (2.11)

The function F (z) is an infinite series, but can be written in a closed form if f (t) is a rational
function.

2.3.1 z-transforms of common functions

We can use the definition of the z-transform, Eqn. 2.11, to compute the z-transform of common
functions such as steps, ramps, sinusoids etc. In this way we can build for ourselves a table of
z-transforms such as those found in many mathematical or engineering handbooks.

Sampled step function

The unit sampled step function is simply s(kT) = 1, k ≥ 0. The z-transform of s(kT) following the definition of Eqn. 2.11 is

    S(z) = Σ_{k=0}^{∞} 1 · z^{−k} = 1 + z^{−1} + z^{−2} + z^{−3} + · · ·        (2.12)

By using the sum to infinity for a geometric series,3 we obtain the closed form equivalent for Eqn. 2.12 as

    S(z) = 1/(1 − z^{−1})        (2.13)
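A quick numerical check of this geometric-series step (plain Python, evaluated at the arbitrary point z = 2): the truncated series of Eqn. 2.12 converges to the closed form of Eqn. 2.13.

```python
z = 2.0
partial = sum(z**(-k) for k in range(50))   # first 50 terms of Eqn. 2.12
closed = 1.0/(1.0 - 1.0/z)                  # closed form, Eqn. 2.13
print(partial, closed)                      # both essentially 2.0
```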

Ramp and exponential functions

For the ramp function, x(kT) = kT for k ≥ 0, and the exponential function, x(kT) = e^{−akT}, k ≥ 0, we could go through the same formal analytical procedure, but in this case we can use the ztrans command from the Symbolic Toolbox in MATLAB.
3 The sum of a geometric series of n terms is

    a + ar + ar^2 + · · · + ar^{n−1} = a(1 − r^n)/(1 − r),    r ≠ 1

and if |r| < 1, then the sum for an infinite number of terms converges to

    S∞ = a/(1 − r)

z-transform of the ramp function, Z{kT}, using the Symbolic Toolbox for MATLAB:

    >> syms k T
    >> ztrans(k*T)
    ans =
    T*z/(z-1)ˆ2

z-transform of the exponential function, Z{e^{−akT}}:

    >> syms k T a
    >> ztrans(exp(-a*k*T))
    ans =
    z/exp(-a*T)/(z/exp(-a*T)-1)
    >> simplify(ans)
    ans =
    z*exp(a*T)/(z*exp(a*T)-1)

Final and initial value theorems

Tables and properties of z-transforms of common functions are given in many handbooks and
texts such as [148, p49,67], but two of the more useful theorems are the initial value and the final
value theorems given in Table 2.1. As in the continuous case, the final value theorem is only
applicable if the transfer function is stable.

Table 2.1: Comparing final and initial value theorems for continuous and discrete systems

                                          Continuous          Discrete
    Initial value, lim_{t→0} y(t)         lim_{s→∞} sY(s)     lim_{z→∞} Y(z)
    Final value, lim_{t→∞} y(t)           lim_{s→0} sY(s)     lim_{z→1} (1 − z^{−1}) Y(z)
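Both discrete theorems of Table 2.1 are easy to verify symbolically. A sketch assuming SymPy is available, applied to the unit step S(z) = z/(z − 1) of Eqn. 2.13, whose first sample and final value are both 1:

```python
import sympy as sp

z = sp.symbols('z')
S = z/(z - 1)                          # z-transform of the unit step, Eqn. 2.13

initial = sp.limit(S, z, sp.oo)        # lim_{z->oo} Y(z)
final = sp.limit((1 - 1/z)*S, z, 1)    # lim_{z->1} (1 - z^-1) Y(z)
print(initial, final)                  # 1 1
```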

2.4 Inversion of z-transforms

The usefulness of the z-transform is that we can do algebraic manipulations on discrete difference
equations in the same way we can do algebraic manipulations on differential equations using the
Laplace transform. Generally we:

1. Convert to z-transforms, then

2. do some relatively simple algebraic manipulations to make it easier to

3. invert back to the time domain.

The final step, the inversion, is the tricky part. The process of obtaining f ∗ (t) back from F (z) is
called inverting the z-transform, and is written as

f ∗ (t) = Z −1 {F (z)} (2.14)

Note that only the sampled function is returned f ∗ (t), not the original f (t). Thus the full inver-
sion process to the continuous domain f (t) is a “one to many” operation.

There are various ways to invert z-transforms:



1. Use tables of z-transforms and their inverses in handbooks, or use a symbolic manipulator.
(See §2.4.1.)
2. Use partial-fraction expansion and tables. In M ATLAB use residue to compute the partial
fractions. (See §2.4.2.)
3. Long division for rational polynomial expressions in the discrete domain. In M ATLAB use
deconv to do the polynomial division. (See §2.4.3.)
4. The computational approach, while suitable for computers, has the disadvantage that the answer is not in a closed form. (See §2.4.4.)
5. Analytical formula from complex variable theory. (Not useful for engineering applications,
see §2.4.5.)

2.4.1 Inverting z-transforms symbolically

The easiest, but perhaps not the most instructive, way to invert z-transforms is to use a computer algebra package or symbolic manipulator such as the Symbolic Toolbox or MuPAD. One simply enters the z-transform, then requests the inverse using the iztrans function in a manner similar to the forward direction shown earlier.

    >> syms z                          % Define a symbolic variable z
    >> G = (10*z+5)/(z-1)/(z-1/4)      % Construct G(z) = (10z + 5)/((z − 1)(z − 1/4))
    G =
    (10*z+5)/(z-1)/(z-1/4)
    >> pretty(G)                       % check it
          10 z + 5
      -----------------
      (z - 1) (z - 1/4)
    >> iztrans(G)                      % Invert the z-transform, Z^{-1}{G(z)}
    ans =
    20*kroneckerDelta(n, 0) - 40*(1/4)ˆn + 20

The Kronecker δ function, kroneckerDelta(n, 0), is a shorthand way of expressing piecewise functions in MATLAB. The expression kroneckerDelta(n, 0) returns 1 when n = 0, and 0 for all other values.

This is a rather messy and needlessly complicated artifact due to the fact that the symbolic toolbox does not know that n is defined as positive. We can explicitly inform MATLAB that n > 0, and then we get a cleaner solution.

    >> syms z
    >> syms n positive
    >> G = (10*z+5)/(z-1)/(z-1/4);
    >> y = iztrans(G)        % Invert to get y[n] = Z^{-1}{G(z)}
    y =
    20-40*(1/4)ˆn
    >> limit(y,n,inf)        % y[∞]
    ans =
    20
The last command computed the steady-state by taking the limit of y[n] as n → ∞.

2.4.2 The partial fraction method

Inverting transforms by partial fractions is applicable for both discrete z-transforms and con-
tinuous Laplace transforms. In both cases, we find an equivalent expression for the transform
in simple terms that are summed together. Hopefully we can then consult a set of tables, such
as [148, Table 2.1 p49], to invert each term individually. The inverse of the sum, is the sum of
the inverses meaning that these expressions when summed together give the full inversion. The
M ATLAB function residue forms partial fractions from a continuous transfer function, although
the help file warns that this operation is numerically poorly conditioned. Likewise the routine
residuez (that is residue with a trailing ‘z’) is designed to extract partial fractions in a form
suitable when using z−transforms.

As an example, suppose we wish to invert

    G(s) = (−4s^2 − 58s − 24)/(s^3 + 2s^2 − 24s) = (−4s^2 − 58s − 24)/(s(s + 6)(s − 4))

using partial fractions noting that it contains no multiple or complex roots.

It is easy to spot the pole at s = 0, and the two remaining poles can be found by synthetic division, or by factorising using the roots([1 2 -24 0]) command. The partial fraction decomposition of G(s) can be written as

    G(s) = (−4s^2 − 58s − 24)/(s(s + 6)(s − 4)) = A/s + B/(s + 6) + C/(s − 4)

where the coefficients A, B and C can be found either by equating coefficients or by using the ‘cover-up’ rule shown below.

    A = [(−4s^2 − 58s − 24)/((s + 6)(s − 4))]_{s=0}  = 1
    B = [(−4s^2 − 58s − 24)/(s(s − 4))]_{s=−6}       = 3
    C = [(−4s^2 − 58s − 24)/(s(s + 6))]_{s=4}        = −8

In M ATLAB we can use the residue command to extract the partial fractions.

>> B = [-4 -58 -24]; % Numerator of G(s) = (−4s2 − 58s − 24)/(s(s + 6)(s − 4))
>> A = [ 1 2 -24 0]; % Denominator of G(s)
>> [r,p,k] = residue(B,A) % Find partial fractions
r = % residues (top line)
5 3
-8
1
p = % factors or poles
-6
10 4
0
k = % No extra parts
[]

The residues and poles produced by residue are listed in corresponding order, so the partial fraction decomposition is again

    G(s) = (−4s^2 − 58s − 24)/(s^3 + 2s^2 − 24s) = 3/(s + 6) − 8/(s − 4) + 1/s

and we can invert each term individually, perhaps using tables, to obtain the time domain solution

    g(t) = 3e^{−6t} − 8e^{4t} + 1

If the rational polynomial has repeated roots or complex poles, slight modifications to the proce-
dure are necessary so you may need to consult the help file.
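For readers working outside MATLAB, SciPy provides the same computation through scipy.signal.residue (a sketch; note that SciPy may list the poles in a different order from MATLAB, so the code pairs each pole with its residue explicitly):

```python
import numpy as np
from scipy import signal

# G(s) = (-4s^2 - 58s - 24)/(s^3 + 2s^2 - 24s)
r, p, k = signal.residue([-4, -58, -24], [1, 2, -24, 0])

# Pair each pole with its residue, independent of the ordering returned
pairs = {int(round(pi.real)): ri.real for pi, ri in zip(p, r)}
print(pairs)   # pole -> residue: 0 -> 1, -6 -> 3, 4 -> -8
```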

Special cases for z-transforms

We can invert z-transforms in a manner similar to Laplace transforms, but if you consult a table of z-transforms and their inverses in standard control textbooks such as Table 2-1 in [149, p49], you will discover that the table is written in terms of factors of the form z/(z + a), or alternatively 1/(1 + az^{−1}), rather than say 1/(z + a). This means that we should first perform the partial fraction decomposition on X(z)/z, rather than just X(z) as in the continuous case. An outline to follow is given in Algorithm 2.1.

Algorithm 2.1 Inverting z-transforms by partial fractions

To invert a z transform, X(z), using partial fractions, do the following:

1. Divide X(z) by z obtaining X(z)/z.


2. Find the partial fractions of X(z)/z perhaps using residue.
3. Multiply each partial fraction by z to obtain a fraction of the form z/(z + a)
4. Find the inverse of each fraction separately using tables, and sum together to find x(t).

Symbolic partial fractions in M ATLAB

Suppose we want to invert


10z
G(z) =
z 2 + 4z + 3
to g[n] using partial fractions. Unfortunately there is no direct symbolic partial fraction command, but Scott Budge from Utah University found the following sneaky trick which involves first integrating, then differentiating the polynomial. This round-about way works because MATLAB integrates the expression by internally forming partial fractions and then integrating term by term. Taking the derivative brings you back to the original expression, but now in partial fraction form.

    >> syms z
    >> G = 10*z/(zˆ2 + 4*z + 3);
    >> G = diff(int(G/z))        % Extract partial fractions of G(z)/z
    G =
    5/(z+1)-5/(z+3)
    >> G = expand(G*z)           % Gives us G(z) = 5z/(z+1) − 5z/(z+3)
    G =
    5*z/(z+1)-5*z/(z+3)
    >> syms n positive; iztrans(G,z,n)
    ans =
    -5*(-3)ˆn+5*(-1)ˆn

While this strategy currently works, we should take note that in future versions of the symbolic
toolbox this may not continue to work. In the above example we also have the option of using
residuez.

2.4.3 Long division

Consider the special case of the rational polynomial where the denominator is simply 1,

F (z) = c0 + c1 z −1 + c2 z −2 + c3 z −3 + · · ·

The inverse of this transform is trivial since the time solution at sample number k or t = kT is sim-
ply the coefficient of the term z −k . Consequently it follows that f (0) = c0 , f (T ) = c1 , f (2T ) =
c2 , . . . , f (kT ) = ck etc. Thus if we have a rational transfer function, all we must do is to synthet-
ically divide out the two polynomials to obtain a single, albeit possibly infinite, polynomial in
z.

Ogata refers to this approach as “direct division” [148, p69]. In general the division will result
in an infinite power series in z, and so is not a particularly elegant closed-form solution, but for
most stable systems, the power series can be truncated eventually with little error.

So to invert a z-transform using long-division, we:

1. Convert the z-transform to a (possibly infinite) power series by dividing the denominator
into the numerator using long division or deconv.

2. The coefficients of z k are the solutions at the sample times kT .

Long division by hand is tedious and error-prone, but we can use the polynomial division capa-
bilities of M ATLAB’s deconv to do this long division automatically. Since deconv only performs
“integer” division returning a quotient and remainder, we must fool it into giving us the “infi-
nite” series by padding the numerator with zeros.

To invert
\[
Y(z) = \frac{z^2 + z}{(z - 1)(z^2 - 1.1z + 1)} = \frac{z^2 + z}{z^3 - 2.1z^2 + 2.1z - 1}
\]
using deconv for the long division, we pad the numerator on the right with say 5 zeros, (that is,
we multiply by z 5 ), and then do the division.

\[
\frac{z^7 + z^6}{z^3 - 2.1z^2 + 2.1z - 1} = \underbrace{z^4 + 3.1z^3 + 4.41z^2 + 3.751z + 1.7161}_{Q} \;+\; \text{remainder}
\]

We are not particularly interested in the remainder polynomial. The more zeros we add, (5 in the
above example), the more solution points we get.

>> Yn = [1,1,0];                 % Numerator of G(z)
>> Yd = conv([1,-1],[1 -1.1 1]); % Denominator of G(z) is (z-1)(z^2-1.1z+1)
>> [Q,R] = deconv([Yn,0,0,0,0,0],Yd) % Multiply by z^5 to zero pad, then do long division
Q =
    1.0000    3.1000    4.4100    3.7510    1.7161

>> dimpulse(Yn,Yd,7);            % Matlab's check
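For readers working outside M ATLAB, the same zero-padded long division can be sketched in Python, where numpy.polydiv plays the role of deconv (an added illustration using the same polynomials as above):

```python
import numpy as np

# Numerator z^2 + z, padded with 5 zeros (i.e. multiplied by z^5)
Yn_padded = np.array([1.0, 1.0, 0.0] + [0.0]*5)

# Denominator (z - 1)(z^2 - 1.1z + 1) = z^3 - 2.1z^2 + 2.1z - 1
Yd = np.convolve([1.0, -1.0], [1.0, -1.1, 1.0])

# Long division: the quotient coefficients are the solution samples
Q, R = np.polydiv(Yn_padded, Yd)
print(Q)   # [1.  3.1  4.41  3.751  1.7161]
```

As in the M ATLAB version, adding more padding zeros simply yields more solution points.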


2.4. INVERSION OF Z -TRANSFORMS 29

Figure 2.9: Inverting z-transforms using dimpulse for the first six samples. It is possible also to construct a discrete transfer function object and then use the generic impulse routine.

We can also numerically verify the inversion using dimpulse for six samples as shown in Fig. 2.9.

Problem 2.2 1. The pulse transfer function of a process is given by
\[
\frac{Y(z)}{X(z)} = \frac{4(z + 0.3)}{z^2 - z + 0.4}
\]
Calculate the response of y(nT) to a unit step change in x using long division.

2. Determine the inverse by long division of G(z)
\[
G(z) = \frac{z(z + 1)}{(z - 1)(z^2 - z + 1)}
\]

2.4.4 Computational approach

The computational method is so-called because it is convenient for a computer solution technique
such as M ATLAB as opposed to an analytical explicit solution. In the computational approach we
convert the z-transform to a difference equation and use a recurrence solution. Consider the
following transform (used in the example from page 25) which we wish to invert back to the time
domain.

\[
\frac{X(z)}{U(z)} = \frac{10z + 5}{(z - 1)(z - 0.25)} = \frac{10z + 5}{z^2 - 1.25z + 0.25} = \frac{10z^{-1} + 5z^{-2}}{1 - 1.25z^{-1} + 0.25z^{-2}} \tag{2.15}
\]

The inverse is equivalent to solving for x(t) when u(t) is an impulse function. The transform can
be expanded and written as a difference equation

xk = 1.25xk−1 − 0.25xk−2 + 10uk−1 + 5uk−2 (2.16)

Since U (z) is an impulse, thus U (z) = 1 which means that u(0) = 1 and u(k) = 0, ∀k > 0. Now
we substitute k = 0 to start, and note that u(k) = x(k) = 0 when k < 0 by definition. We now
have enough information to compute x(0).

\begin{align*}
x_0 &= 1.25x_{-1} - 0.25x_{-2} + 10u_{-1} + 5u_{-2}\\
    &= 1.25 \times 0 - 0.25 \times 0 + 10 \times 0 + 5 \times 0\\
    &= 0
\end{align*}

Continuing, we substitute k = 1 into Eqn 2.16 and using our previously computed x0 and find
the next term.

\begin{align*}
x_1 &= 1.25x_0 - 0.25x_{-1} + 10u_0 + 5u_{-1}\\
    &= 1.25 \times 0 - 0.25 \times 0 + 10 \times 1 + 5 \times 0\\
    &= 10
\end{align*}

and just to clarify this recurrence process further, we can try the next iteration of Eqn. 2.16 with
k = 2,

\begin{align*}
x_2 &= 1.25x_1 - 0.25x_0 + 10u_1 + 5u_0\\
    &= 1.25 \times 10 - 0.25 \times 0 + 10 \times 0 + 5 \times 1\\
    &= 17.5
\end{align*}

Now we can use the full recurrence relation, (Eqn 2.16), to obtain the solution in a stepwise
manner to build up the solution as shown in Table 2.2. All the terms on the right hand side of
Eqn 2.16 are either known initially (u(t)), or involve past information, hence it is possible to solve.
Note that in this inversion scheme, u(t) can be any known input series.
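The recurrence of Eqn. 2.16 can be coded directly in a few lines; this Python sketch (an added illustration, mirroring the M ATLAB computation) reproduces the entries of Table 2.2:

```python
import numpy as np

N = 8
u = np.zeros(N); u[0] = 1.0   # impulse input: u(0)=1, u(k)=0 for k>0
x = np.zeros(N)

# x_k = 1.25 x_{k-1} - 0.25 x_{k-2} + 10 u_{k-1} + 5 u_{k-2},
# with x(k) = u(k) = 0 for k < 0 by definition
for k in range(N):
    xm1 = x[k-1] if k >= 1 else 0.0
    xm2 = x[k-2] if k >= 2 else 0.0
    um1 = u[k-1] if k >= 1 else 0.0
    um2 = u[k-2] if k >= 2 else 0.0
    x[k] = 1.25*xm1 - 0.25*xm2 + 10.0*um1 + 5.0*um2

print(x[:5])   # [ 0.  10.  17.5  19.375  19.84375]
```

As the table shows, the sequence approaches the final value of 20 as k grows.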

Table 2.2: The inversion of a z transform using the computation method. Only the first 5 samples
and the final value are calculated. The final value is as calculated using the symbolic manipulator
earlier on page 25.

time (kT)    u(k)     x(k)
    0         1        0
    1         0       10
    2         0       17.5
    3         0       19.375
    4         0       19.8438
   ...       ...      ...
    ∞         0       20

M ATLAB is well suited for this type of computation. To invert Eqn. 2.15 we can use the discrete
filter command. We rewrite Eqn. 2.15 in the form of a rational polynomial in z −1 ,

\[
G(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}
\]
where the numerator and denominator are entered as row vectors in decreasing powers of z. We
then create an impulse input vector u with say 6 samples and then compute the output of the
system

>> b = [0.0, 10.0, 5.0]    % Numerator (10z^-1 + 5z^-2)
>> a = [1.0, -1.25, 0.25]  % Denominator (1 - 1.25z^-1 + 0.25z^-2)

>> u = [1, zeros(1,5)]     % impulse input
u =
     1     0     0     0     0     0
>> x = filter(b,a,u)       % compute output
x =
         0   10.0000   17.5000   19.3750   19.8438   19.9609

which gives the same results as Table 2.2.



The control toolbox has a special object for linear time invariant (LTI) systems. We can create a
discrete system with the transfer function command, tf, and by specifying a sampling time, we
imply to M ATLAB that it is a discrete system.

>> sysd = tf([10 5],[1 -1.25 0.25],1)  % System of interest (10z+5)/(z^2-1.25z+0.25)

Transfer function:
      10 z + 5
--------------------
zˆ2 - 1.25 z + 0.25

Sampling time: 1

>> y = impulse(sysd)  % compute the impulse response
y =
         0
   10.0000
   17.5000
   19.3750
   19.8438
   19.9609

Finally, the contour integral method can also be used to invert z-transforms, [148, §3–3], but Seborg et al maintain it is seldom used in engineering practice, [179, p571] because it is fraught with numerical implementation difficulties. These difficulties are also present in the continuous equivalent, some of which are illustrated in the next section.

2.4.5 Numerically inverting the Laplace transform

Surprisingly, the numerical inversion of continuous transfer functions is considered far less impor-
tant than the computation of the inverse of discrete transfer functions. This is fortunate because
the numerical inversion of Laplace transforms is devilishly tricky. Furthermore, it is unlikely that
you would ever want to numerically invert a Laplace transform in control applications, since most problems involve little more than a rational polynomial with a possible exponential term. For these problems we can easily factorise the polynomial and use partial fractions or use the step or impulse routines for continuous linear responses.

However in the rare cases where we have a particularly unusual F (s) which we wish to convert
back to the time domain, we might be tempted to use the analytical expression for the inverse
directly
\[
f(t) = \frac{1}{2\pi j} \int_{\sigma - j\infty}^{\sigma + j\infty} F(s)\, e^{st}\, ds \tag{2.17}
\]

where σ is chosen to be larger than the real part of any singularity of F (s). Eqn. 2.17 is sometimes
known as the Bromwich integral or Mellin’s inverse formula.

The following example illustrates a straightforward application of Eqn. 2.17 to invert a Laplace
transform. However be warned that for all but the most well behaved rational polynomial exam-
ples this strategy is not to be recommended as it results in severe numerical roundoff error due
to poor conditioning.
32 CHAPTER 2. FROM DIFFERENTIAL TO DIFFERENCE EQUATIONS

Suppose we try to numerically invert a simple Laplace transform,


\[
F(s) = \frac{3}{s(s+5)} \quad \longleftrightarrow \quad f(t) = \frac{3}{5}\left(1 - e^{-5t}\right)
\]
with a corresponding simple time solution. We can start by defining an anonymous function of
the Laplace transform to be inverted.

Fs = @(s) 3.0./(s+5)./s;        % Laplace function to invert, F(s) = 3/(s(s+5))
Fsexp = @(s,t) Fs(s).*exp(s*t); % and associated integrand F(s)e^{st}

Now since the largest singularity of F (s) is 0, we can choose the contour path safely to be say
σ = 0.1. We can also approximate the infinite integration interval by reasonably large numbers.
It is convenient that the M ATLAB routine integral works directly in the complex domain.

c = 0.1;                % Value sigma > than any singularities of F(s)
a = c-100j; b = c+200j; % Contour path approx: sigma - j*inf to sigma + j*inf

t = linspace(0,8);      % time range of interest

% Numerically approximate Eqn. 2.17.
ft = 1./(2*pi*1j)*arrayfun(@(t) integral(@(x) Fsexp(x,t),a,b), t);
plot(t,ft)              % Compare results in Fig. 2.10(a).

Due to small numerical roundoff however, the returned solution has a small imaginary compo-
nent, which we can in this instance safely ignore. In this rather benign example, where we know
the analytical solution, we can validate the accuracy as shown in Fig. 2.10(a), which it has to be
admitted, is not that wonderful. We should also note in the simplistic implementation above we
have not adequately validated that our algorithmic choices such as the finite integration limits
are appropriate, so we should treat any subsequent result with caution. In fact if you repeat this
calculation with a larger value of σ, or a smaller integration range such as say 1 − 10j to 1 + 10j
then the quality of the solution drops alarmingly. One way to improve the integration accuracy
is to use the waypoints option in the integral routine.
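To make the contour parametrisation explicit, the same Bromwich evaluation can be sketched in plain Python (an added illustration; the symmetric truncation at ±300 rad/s and the grid spacing are ad hoc choices, and the accuracy is only modest, as the text warns):

```python
import numpy as np

F = lambda s: 3.0/(s*(s + 5.0))         # F(s) = 3/(s(s+5)) to be inverted

sigma = 0.1                             # contour to the right of all singularities
w = np.linspace(-300.0, 300.0, 120001)  # truncated imaginary axis
dw = w[1] - w[0]
s = sigma + 1j*w                        # points along the Bromwich contour

def bromwich(t):
    # f(t) = 1/(2*pi*j) * integral of F(s)e^{st} ds; with s = sigma + jw
    # we have ds = j dw, so the 1/j cancels and a plain Riemann sum remains
    return np.real(np.sum(F(s)*np.exp(s*t))*dw/(2.0*np.pi))
```

Comparing bromwich(t) against the analytical $\frac{3}{5}(1 - e^{-5t})$ shows agreement only to a couple of decimal places, consistent with Fig. 2.10(a).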

Over the years there have been many attempts to improve the inversion strategy as reviewed in [1], but nearly all of the proposals run into the problem of precision. For example, Fig. 2.11 shows the problem when attempting to invert $F(s) = 1/\sqrt{s^2+1}$ using the Gaver-Stehfest approximation algorithm with a varying number of terms. The Gaver-Stehfest family of schemes are well known to produce inaccurate results for underdamped systems, but when we increase the number of terms in an attempt to improve the accuracy, the results deteriorate due to numerical round off. Directly applying the Bromwich integral to this transform as shown in Fig. 2.10(b) is no better. While the strategy suggested in [1] seemingly circumvents the problem of precision, it does so by brute-force using multi-precision arithmetic which is hardly elegant.

An alternative numerical algorithm is the inverse Laplace transform function invlap contributed by Karl Hollenbeck from the Technical University of Denmark. You can test this routine using one of the Laplace transform pairs in Table 2.3. The first two are standard, those following are more unusual. A more challenging collection of test transforms is given in [1].

The numerical solution to $\mathcal{L}^{-1}\left\{ \frac{1}{\sqrt{s}} e^{-1/s} \right\}$ is compared with the analytical solution from Table 2.3 in Fig. 2.12 using a modest tolerance of $10^{-2}$. The routine splits the time axis up into decades, and inverts each separately which is why we can see some numerical noise starting at t = 10.

All of these numerical inversion strategies, starting from the direct application of the Bromwich

Figure 2.10: Numerically inverting Laplace transforms by direct evaluation of the Bromwich integral: (a) inverting a benign transfer function, $\mathcal{L}^{-1}\left\{\frac{3}{s(s+5)}\right\}$; (b) inverting a challenging transfer function, $\mathcal{L}^{-1}\left\{\frac{1}{\sqrt{s^2+1}}\right\}$. Since this strategy is susceptible to considerable numerical errors, it is not to be recommended.

Figure 2.11: Demonstrating the precision problems when numerically inverting the Laplace transform using the Gaver-Stehfest algorithm with a varying number of terms (8, 16, 20 and 26). The exact inversion is the red solid line while the approximate inversion is given by –◦–.

Table 2.3: A collection of Laplace transform pairs suitable for testing numerical inversion strate-
gies.

Description    F(s)                           f(t)                              Comment
Easy test      $1/s^2$                        $t$
Control TF     $\frac{1}{(s+2)(s+3)}$         $e^{-2t} - e^{-3t}$
Oscillatory    $\frac{1}{\sqrt{s^2+1}}$       $J_0(t)$                          Bessel function, see Fig. 2.10(b) & Fig. 2.11.
Nasty          $\frac{1}{\sqrt{s}}e^{-1/s}$   $\frac{1}{\sqrt{\pi t}}\cos(\sqrt{4t})$   See Fig. 2.12.

Figure 2.12: Testing the numerical inversion of the Laplace transform $e^{-1/s}/\sqrt{s}$ using the invlap routine.

integral in Eqn. 2.17, the Gaver-Stehfest algorithm and variants and the implementation invlap
require some care to get reasonable results. Unfortunately in practical cases it can be difficult to choose suitable algorithmic tuning constants, which means that it is difficult to generate any sort of error bound on the computed curve.

2.5 Discretising with a sample and hold

In §2.1.1 we saw how the continuous analogue signal must be sampled to be digestible by the
process control computer. To analyse this sampling operation, we can approximate the real sam-
pler with an ideal sampler in cascade with a hold element such as given in Fig. 2.13.

When we sample the continuous function f (t), we get a sampled function f ∗ (t), that is a series
of spikes that exist only at the sampling instant as shown in the middle plot of Fig. 2.13. The
sampled function, f ∗ (t), is undefined for the time between the sampling instants which is incon-
venient given that this is the bulk of the time. Obviously one sensible solution is to retain the
last sampled value until a new one is collected. During the sample period, the sampled value
f ∗ (t) is the same as the last sampled value of the continuous function; f (kT + t) = f (kT ) where

Figure 2.13: Ideal sampler and zeroth-order hold. The analogue signal f(t) passes through an ideal sampler to give the sampled signal f*(t), a train of impulses, which the hold element reconstructs as the piecewise-constant sample & held signal f_h(t).

0 ≤ t < T shown diagrammatically in the third plot in Fig. 2.13. Since the last value collected is
stored or held, this sampling scheme is referred to as a zeroth-order hold. The zeroth-order refers
to the fact that the interpolating function used between adjacent values is just a horizontal line.
Higher order holds are possible, but the added expense is not typically justified even given the
improved accuracy.
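The zeroth-order reconstruction itself, each sample value held constant until the next sample arrives, takes only a couple of lines to mimic numerically. A small Python sketch (an added illustration, with an arbitrary sinusoidal test signal):

```python
import numpy as np

T = 0.5                                # sample time
t_fine = np.arange(0.0, 4.0, 0.01)     # 'continuous' time axis
f = np.sin(t_fine)                     # analogue signal f(t)

# Ideal sampler: keep only the values at t = kT
t_samp = np.arange(0.0, 4.0, T)
f_samp = np.sin(t_samp)

# Zeroth-order hold: f_h(t) = f(kT) for kT <= t < (k+1)T
k = np.floor(t_fine/T).astype(int)     # index of the most recent sample
f_held = f_samp[k]
```

Plotting f, f_samp and f_held side by side reproduces the three panels sketched in Fig. 2.13.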

We can find the Laplace transform of the zeroth-order hold element by noting that the input is an impulse function, and the output is a rectangular pulse of duration T and height y(t).

Directly applying the definition of the Laplace transform,


\begin{align*}
\mathcal{L}\{\text{zoh}\} &= \int_0^\infty y(t)\, e^{-st}\, dt = y_t \int_0^T 1 \times e^{-st}\, dt\\
&= y_t \times \underbrace{\frac{1 - e^{-sT}}{s}}_{\text{zoh}}
\end{align*}

we obtain the transform for the zeroth-order hold. In summary, the transformation of a continu-
ous plant G(s) with zero-order hold is
 
\[
G(z) = (1 - z^{-1})\, \mathcal{Z}\left\{ \frac{G(s)}{s} \right\} \tag{2.18}
\]

Note that this is not the same as simply the z-transform of G(s).

Suppose we wish to discretise the continuous plant


\[
G(s) = \frac{8}{s + 3} \tag{2.19}
\]
with a zero-order hold at a sample time of T = 0.1 seconds. We can do this analytically using ta-
bles, or we can trust it to M ATLAB. First we try analytically and apply Eqn 2.18 to the continuous
process. The difficult bit is the z-transformation. We note that
   
\[
\mathcal{Z}\left\{ \frac{G(s)}{s} \right\} = \frac{8}{3}\, \mathcal{Z}\left\{ \frac{3}{s(s+3)} \right\} \tag{2.20}
\]

and that this is in the form given in tables of z-transforms such as [148, p50], part of which is
repeated here

X(s)                      X(z)
$\dfrac{a}{s(s+a)}$       $\dfrac{(1-e^{-aT})z^{-1}}{(1-z^{-1})(1-e^{-aT}z^{-1})}$

Inverting using the tables, and substituting in Eqn 2.18 gives


   
\begin{align*}
G(z) &= (1 - z^{-1})\, \mathcal{Z}\left\{ \frac{G(s)}{s} \right\} = \frac{8}{3}(1 - z^{-1})\, \mathcal{Z}\left\{ \frac{3}{s(s+3)} \right\}\\
&= \frac{8}{3}(1 - z^{-1}) \cdot \frac{(1 - e^{-3T})z^{-1}}{(1 - z^{-1})(1 - e^{-3T}z^{-1})}\\
&= \frac{8}{3} \cdot \frac{(1 - e^{-0.3})z^{-1}}{1 - e^{-0.3}z^{-1}}\\
&= \frac{0.6912 z^{-1}}{1 - 0.7408 z^{-1}}, \quad \text{for } T = 0.1 \tag{2.21}
\end{align*}
We can repeat this conversion numerically using the continuous-to-discrete function, c2d. We
first define the continuous system as an LTI object, and then convert to the discrete domain using
a zeroth order hold.

>> sysc = tf(8,[1 3])  % Continuous system Gc(s) = 8/(s+3)

Transfer function:
  8
-----
s + 3

>> sysd = c2d(sysc,0.1,'zoh')  % convert to discrete using a zoh

Transfer function:
  0.6912
----------
z - 0.7408

Sampling time: 0.1

which once again is Eqn. 2.21. In fact we need not specify the ’zoh’ option when constructing a
discrete model using c2d or even in S IMULINK, since a zeroth-order hold is employed by default.
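The same zeroth-order-hold conversion can also be cross-checked outside the control toolbox; SciPy's cont2discrete mirrors c2d, with 'zoh' likewise the default method (an added Python aside, not from the original listings):

```python
import numpy as np
from scipy.signal import cont2discrete

# G(s) = 8/(s+3), discretised with a ZOH at T = 0.1
numd, dend, dt = cont2discrete(([8.0], [1.0, 3.0]), dt=0.1, method='zoh')

# Expect G(z) = 0.6912/(z - 0.7408), i.e. Eqn. 2.21 with
# 0.6912 = (8/3)(1 - e^{-0.3}) and the pole at e^{-0.3} = 0.7408
print(np.ravel(numd), dend)
```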

Methods for doing the conversion symbolically are given next in section 2.5.1.

2.5.1 Converting Laplace transforms to z-transforms

In many practical cases we may already have a continuous transfer function of a plant or con-
troller which we wish to discretise. In these circumstances we would like to convert from a con-
tinuous transfer function in s, (such as say a Butterworth filter) to an equivalent discrete transfer
function in z. The two common ways to do this are:

1. Analytically by first inverting the Laplace transform back to the time domain, and then take
the (forward) z-transform.
\[
G(z) \stackrel{\text{def}}{=} \mathcal{Z}\left\{ \mathcal{L}^{-1}\{G(s)\} \right\} \tag{2.22}
\]

2. or by using an approximate method known as the bilinear transform.

The bilinear method whilst approximate has the advantage that it just involves a simple algebraic
substitution for s in terms of z. The bilinear transform is further covered in §2.5.2.

The formal method we have already seen previously, but since it is such a common operation, we
can write a symbolic M ATLAB script to do it automatically for us for any given transfer function.
However we should be aware whether the transformation includes the sample and hold, or if
that inclusion is left up to the user.

Listing 2.1 converts a Laplace expression to a z-transform expression without a zeroth-order hold.
Including the zeroth-order hold option is given in Listing 2.2.

Listing 2.1: Symbolic Laplace to z-transform conversion


function Gz = lap2ztran(G)
% Convert symbolically G(s) to G(z)
syms t k T z
Gz = simplify(ztrans(subs(ilaplace(G),t,k*T),k,z));
return

Listing 2.2: Symbolic Laplace to z-transform conversion with ZOH


function Gz = lap2ztranzoh(G)
% Convert symbolically G(s) to G(z) with a ZOH.
syms s t k T z
Gz = simplify((1-1/z)*ztrans(subs(ilaplace(G/s),t,k*T),k,z));
return

We can test the conversion routines in Listings 2.1 and 2.2 on the trial transfer function G(s) =
1/s2 .

>> Gz = lap2ztran(1/sˆ2);  % Convert G(s) = 1/s^2 to the discrete domain G(z).
>> pretty(Gz)
     T z
  --------
         2
  (z - 1)

>> Gz = lap2ztranzoh(1/sˆ2); % Do the conversion again, but this time include a ZOH.
>> pretty(Gz)
               2
      (z + 1) T
  1/2 ----------
             2
      (z - 1)

2.5.2 The bilinear transform

The analytical ‘one step back–one step forward’ procedure while strictly correct, is a little tedious,
so a simpler, albeit approximate, way to transform between the Laplace domain and the z-domain
is to use the bilinear transform method, sometimes known as Tustin's method. By definition, $z \stackrel{\text{def}}{=} e^{sT}$, or $z^{-1} = e^{-sT}$ (from Eqn. 2.10). If we substituted directly, natural logs would appear in the rational polynomial in z,
\[
s = \frac{\ln(z)}{T} \tag{2.23}
\]
making the subsequent analysis difficult. For example the resulting expression could not be
transformed into a difference equation which is what we desire.

We can avoid the troublesome logarithmic terms by using a Padé approximation for the exponential term as
\[
e^{-sT} = z^{-1} \approx \frac{2 - sT}{2 + sT} \tag{2.24}
\]
or alternatively
\[
s \approx \frac{2}{T}\, \frac{1 - z^{-1}}{1 + z^{-1}} \tag{2.25}
\]
This allows us to transform a continuous time transfer function G(s) to a discrete time transfer function G(z),
\[
G(z) = \left. G(s) \right|_{s = \frac{2}{T}\frac{1 - z^{-1}}{1 + z^{-1}}} \tag{2.26}
\]

Eqn. 2.26 is called the bilinear transform owing to the linear terms in both the numerator and
denominator and it has the advantage that a stable continuous time filter will be stable in the dis-
crete domain. The disadvantage is that the algebra required soon becomes unwieldy if attempted
manually. Other transforms are possible and are discussed in §4-2 p308 of Ogata [148]. Always
remember that this transformation is approximate, being equivalent to a trapezoidal integration.

Here we wish to approximately discretise the continuous plant

\[
G(s) = \frac{1}{(s+1)(s+2)}
\]

at a sample time of T = 0.1 using the bilinear transform. The discrete approximate transfer
function is obtained by substituting Eqn. 2.25 for s and simplifying.

\begin{align*}
G(z) &= \left. \frac{1}{(s+1)(s+2)} \right|_{s = \frac{2}{T}\frac{1-z^{-1}}{1+z^{-1}}}\\
&= \frac{1}{\left( \frac{2}{T}\frac{1-z^{-1}}{1+z^{-1}} + 1 \right)\left( \frac{2}{T}\frac{1-z^{-1}}{1+z^{-1}} + 2 \right)} = \frac{T^2(z+1)^2}{(2T^2 + 6T + 4)z^2 + (4T^2 - 8)z + 2T^2 - 6T + 4}\\
&= \frac{0.0022z^2 + 0.0043z + 0.0022}{z^2 - 1.723z + 0.74} \quad \text{for } T = 0.1
\end{align*}

Since this transformation is just an algebraic substitution, it is easy to execute it symbolically in M ATLAB.

>> syms s T z
>> G = 1/(s+1)/(s+2);
>> Gd = simplify(subs(G,s,2/T*(1-1/z)/(1+1/z)))
Gd =
1/2*Tˆ2*(z+1)ˆ2/(2*z-2+T*z+T)/(z-1+T*z+T)
>> pretty(Gd)
             2        2
            T  (z + 1)
    1/2 -------------------------------------
        (2 z - 2 + T z + T) (z - 1 + T z + T)

Alternatively we could use M ATLAB to numerically verify our symbolic solution.

>> G = zpk([],[-1 -2],1)

Zero/pole/gain:
      1
-----------
(s+1) (s+2)

>> T=0.1;
>> Gd = c2d(G,T,'tustin')

Zero/pole/gain:
   0.0021645 (z+1)ˆ2
---------------------
(z-0.9048) (z-0.8182)

Sampling time: 0.1

>> tf(Gd)

Transfer function:
0.002165 zˆ2 + 0.004329 z + 0.002165
------------------------------------
       zˆ2 - 1.723 z + 0.7403

Sampling time: 0.1

The bilinear command in the S IGNAL P ROCESSING toolbox automatically performs this map-
ping from s to z. This is used for example in the design of discrete versions of common analogue
filters such as the Butterworth or Chebyshev filters. These are further described in §5.2.3.
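The Tustin conversion can likewise be cross-checked numerically outside M ATLAB; the Python sketch below (an added aside) compares SciPy's bilinear discretisation against the symbolic coefficients derived earlier:

```python
import numpy as np
from scipy.signal import cont2discrete

T = 0.1
# G(s) = 1/((s+1)(s+2)) = 1/(s^2 + 3s + 2), Tustin discretised
numd, dend, dt = cont2discrete(([1.0], [1.0, 3.0, 2.0]), dt=T, method='bilinear')

# Symbolic result: T^2 (z+1)^2 / ((2T^2+6T+4)z^2 + (4T^2-8)z + (2T^2-6T+4)),
# normalised so the leading denominator coefficient is 1
scale = 2*T**2 + 6*T + 4
num_expected = np.array([T**2, 2*T**2, T**2]) / scale
den_expected = np.array([scale, 4*T**2 - 8, 2*T**2 - 6*T + 4]) / scale

print(np.ravel(numd))   # ~ [0.0021645, 0.0043290, 0.0021645]
```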

Problem 2.3 1. Use Tustin's method (approximate z-transform) to determine the discrete time response of
\[
G(s) = \frac{4}{(s+4)(s+2)}
\]
to a unit step change in input by long division. The sample time T = 1 and solve for 7 time steps. Compare with the exact solution.

The frequency response characteristics of a hold element

The zeroth order hold element modifies the transfer function, so consequently influences the
closed loop stability, and frequency characteristics of the discrete process. We can investigate this
influence by plotting the discrete Bode and Nyquist plots for a sampled process with and without
a zeroth order hold.

Suppose we have the continuous process


\[
G_p(s) = \frac{1.57}{s(s+1)} \tag{2.27}
\]
which is sampled at a frequency of ω = 4 rad/s. (This corresponds to a sample time of T = π/2
seconds.) First we will ignore the zeroth order hold and transform Eqn 2.27 to the z domain by
using tables such as say Table 2–1, p49 #8 in DCS.

\begin{align}
G_p(z) &= 1.57\, \frac{(1 - e^{-aT})z^{-1}}{(1 - z^{-1})(1 - e^{-aT}z^{-1})} \tag{2.28}\\
&= \frac{1.243z}{(z - 1)(z - 0.208)} \tag{2.29}
\end{align}
To get the discrete model with the zeroth order hold, we again use Eqn 2.18, and the tables. Doing
this, in a similar manner to what was done for Eqn 2.21, we get
\[
G_{h0}G_p(z) = \frac{1.2215z + 0.7306}{(z - 1)(z - 0.208)} \tag{2.30}
\]
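Eqn. 2.30 can be reproduced numerically; a Python sketch using SciPy's cont2discrete (an added cross-check, with the comparison values being the rounded coefficients quoted in the text):

```python
import numpy as np
from scipy.signal import cont2discrete

T = np.pi/2
# Gp(s) = 1.57/(s(s+1)) = 1.57/(s^2 + s), discretised with a ZOH
numd, dend, dt = cont2discrete(([1.57], [1.0, 1.0, 0.0]), dt=T, method='zoh')

print(np.ravel(numd))   # numerator, approx (1.2215 z + 0.7306) to the text's rounding
print(dend)             # denominator (z-1)(z-0.208) = z^2 - 1.208z + 0.208
```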
Now we can plot the discrete frequency responses using M ATLAB to duplicate the figure given
in [110, p649] as shown in Fig. 2.14.

In the listing below, I first construct the symbolic discrete transfer function, substitute the sam-
pling time, then convert to a numeric rational polynomial. This rational polynomial is now in a
suitable form to be fed to the control toolbox routines such as bode.

syms s T z
G = 1.57/(s*(s+1))
Gd = lap2ztran(G)       % convert to a z-transform without ZOH
Gd2 = subs(Gd,T,pi/2)
[num,den] = numden(Gd2) % extract top & bottom lines

B = sym2poly(num); A = sym2poly(den); % convert to polynomials
Gd = tf(B,A,pi/2)       % construct a transfer function

To generate the discretisation including the zeroth-order hold, we could use the symbolic lap2ztranzoh
routine given in Listing 2.2, or we could use the built-in c2d function.

[num,den] = numden(G)   % extract numerator & denominator
Bc = sym2poly(num); Ac = sym2poly(den);
Gc = tf(Bc,Ac)

Gdzoh = c2d(Gc,pi/2,'zoh')

bode(Gc,Gd,Gdzoh)
legend('Continuous','No ZOH','with ZOH')

In this case I have used the vanilla bode function which will automatically recognise the difference between discrete and continuous transfer functions. Note that it also automatically selects a reasonable frequency spacing, and inserts the Nyquist frequency at $f_N = 1/(2T) = 1/\pi$ Hz or $\omega_N = 2$ rad/s.

These Bode plots shown in Fig. 2.14, or the equivalent Nyquist plots, show that the zeroth order
hold destabilises the system. The process with the hold has a larger peak resonance, and smaller
gain and phase margins.

Figure 2.14: The Bode diagram showing the difference between the continuous plant, and a discretisation with and without the zeroth-order hold.

Of course we should compare the discrete Bode diagrams with that for the original continuous
process. The M ATLAB bode function is in this instance the right tool for the job, but I will con-
struct the plot manually just to demonstrate how trivial it is to substitute s = iω, and compute
the magnitude and phase of G(iω) for all frequencies of interest.

num = 1.57;            % Plant of interest G(s) = 1.57/(s^2 + s + 0)
den = [1 1 0];
w = logspace(-2,1)';   % Select frequency range of interest 10^-2 < w < 10^1 rad/s.
iw = 1i*w;             % s = jw
G = polyval(num,iw)./polyval(den,iw); % G(s = jw)
loglog(w,abs(G))       % Plot magnitude |G(jw)|, see Fig. 2.15.
semilogx(w,angle(G)*180/pi)           % and phase

We can compare this frequency response of the continuous system with the discretised version
in Fig. 2.14.
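The same 'substitute s = jω and take magnitude and phase' recipe carries over directly to other environments; a Python version of the fragment above (added for illustration):

```python
import numpy as np

num = [1.57]                 # G(s) = 1.57/(s^2 + s)
den = [1.0, 1.0, 0.0]

w = np.logspace(-2, 1, 50)   # frequencies 10^-2 ... 10^1 rad/s
s = 1j*w                     # s = jw
G = np.polyval(num, s) / np.polyval(den, s)

AR = np.abs(G)                   # amplitude ratio |G(jw)|
phase = np.angle(G, deg=True)    # phase in degrees
```

At $\omega = 1$ rad/s, for instance, $G(j) = 1.57/(-1+j)$, giving an amplitude ratio of $1.57/\sqrt{2}$ and a phase lag of 135 degrees.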

Figure 2.15: A frequency response plot of G(s) where s = jω constructed manually without using the bode command.

2.6 Discrete root locus diagrams

The root locus diagram is a classical technique used to study the characteristic response of trans-
fer functions to a single varying parameter such as different controller gains. Traditionally the
construction of root locus diagrams required tedious manual algebra. However now modern
programs such as M ATLAB can easily construct reliable root locus diagrams which makes the design procedure far more attractive.

For many controlled systems, the loop is stable for small values of controller gain Kc , but unstable
for a gain above some critical value Ku . It is therefore natural to ask how does the stability (and
response) vary with controller gain? The root locus gives this information in a plot form which
shows how the poles of a closed loop system change as a function of the controller gain Kc .
Given a process transfer function G and a controller Kc Q, the closed loop transfer function is the
familiar
\[
G_{cl} = \frac{K_c Q(z) G(z)}{1 + K_c Q(z) G(z)} \tag{2.31}
\]
where the stability is dependent on the denominator (characteristic equation) 1 + Kc Q(z)G(z),
which is of course itself dependent on the controller gain Kc . Plotting the roots of

1 + Kc Q(z)G(z) = 0 (2.32)

as a function of Kc creates the root locus diagram.

In this section4 we will plot the discrete root locus diagram for the process

\[
G_p(z) = \frac{0.5(z + 0.6)}{z^2(z - 0.4)} \tag{2.33}
\]

which is a discrete description of a first order process with dead time. We also add a discrete
integral controller of the form
\[
G_c(z) = K_c \frac{z}{z - 1}
\]
We wish to investigate the stability of the characteristic equation as a function of controller gain.
4 Adapted from [70, p124]

With no additional arguments apart from the transfer function, rlocus will draw the root locus
selecting sensible values for Kc automatically. For this example, I will constrain the plots to have
a square aspect ratio, and I will overlay a grid of constant shape factor ζ and natural frequency
ωn using the zgrid function.

Q = tf([1 0],[1 -1],-1);             % controller without gain
G = tf(0.5*[1 0.6],[1 -0.4 0 0],-1); % plant

Gol = Q*G;                           % open loop

zgrid('new');
xlim([-1.5 1.5]); axis equal
rlocus(Gol)                          % plot root locus
rlocfind(Gol)

Once the plot such as Fig. 2.16 is on the screen, I can use rlocfind interactively to establish the
values of Kc at different critical points in Fig. 2.16. Note that the values obtained are approximate.

Pole description            Name              ≈ Kc
border-line stability       ultimate gain      0.6
cross the ζ = 0.5 line      design gain        0.2
critically damped           breakaway gain     0.097
overdamped, ωn = 36°                           0.09
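The tabulated ultimate gain can also be found without the interactive rlocfind by scanning the closed-loop poles of Eqn. 2.32 directly. The Python sketch below (an added numerical check; the scan grid is arbitrary) looks for the first gain at which the largest pole magnitude leaves the unit circle:

```python
import numpy as np

# Open loop Q(z)G(z) = 0.5(z+0.6)/(z(z-1)(z-0.4)), after cancelling a common z
num = np.array([0.0, 0.0, 0.5, 0.3])    # 0.5z + 0.3, padded to cubic length
den = np.array([1.0, -1.4, 0.4, 0.0])   # z^3 - 1.4z^2 + 0.4z = z(z-1)(z-0.4)

Ku = None
for Kc in np.arange(0.01, 1.0, 0.005):
    charpoly = den + Kc*num             # 1 + Kc Q(z)G(z) = 0, Eqn. 2.32
    if np.max(np.abs(np.roots(charpoly))) > 1.0:
        Ku = Kc                          # first gain with a pole outside |z| = 1
        break

print(Ku)   # close to the ultimate gain of about 0.6 read off the root locus
```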

Once I select the gain that corresponds to the ζ = 0.5 crossing, I can simulate the closed loop step
response.

Gol.Ts = 1;                  % Set sampling time
Kc = 0.2                     % A gain where we expect zeta = 0.5
step(Kc*Gol/(1 + Kc*Gol),50) % Simulate closed loop

A comparison of step responses for various controller gains is given in Fig. 2.17. The curve in
Fig. 2.17 with Kc = 0.2 where ζ ≈ 0.5 exhibits an overshoot of about 17% which agrees well with
the expected value for a second order response with a shape factor of ζ = 0.5. The other curves
do not behave exactly as one would expect since our process is not exactly a second order transfer
function.
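We can check the quoted overshoot numerically by simulating the closed loop of Eqn. 2.31 as a difference equation; a Python sketch (an added illustration) using scipy.signal.lfilter:

```python
import numpy as np
from scipy.signal import lfilter

Kc = 0.2
# Open loop Q(z)G(z) = 0.5(z+0.6)/(z(z-1)(z-0.4)), after cancelling a common z
num_ol = np.array([0.0, 0.0, 0.5, 0.3])    # 0.5z + 0.3, padded to cubic length
den_ol = np.array([1.0, -1.4, 0.4, 0.0])   # z^3 - 1.4z^2 + 0.4z

# Closed loop Kc QG/(1 + Kc QG): shared denominator den + Kc*num
b = Kc*num_ol
a = den_ol + b

y = lfilter(b, a, np.ones(50))   # unit step response over 50 samples
print(y.max() - 1.0)             # peak overshoot, about 0.17
```

The peak of roughly 17% above the final value of 1 agrees with the second-order rule of thumb for a shape factor of ζ ≈ 0.5.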

2.7 Multivariable control and state space analysis

In the mid 1970s, the western world suffered an ‘oil shock’ when the petroleum producers and
consumers alike realised that oil was a scarce, finite and therefore valuable commodity. This had
a number of important implications, and one of these in the chemical processing industries was
the increased use of integrated heat plants. This integration physically ties separate parts of the
plant together and demands a corresponding integrated or ‘plant-wide’ control system. Clas-
sical single-input/single-output (SISO) controller design techniques such as transfer functions,
frequency analysis or root locus were found to be deficient, and with the accompanying devel-
opment of affordable computer control systems that could administer hundreds of inputs and
outputs, many of which were interacting, nonlinear and time varying, new tools were required
and the systematic approach offered by state-space analysis became popular.

Figure 2.16: The discrete root locus of Eqn. 2.33 plotted using rlocus, with the ultimate gain, design gain and breakaway gain marked. See also Fig. 2.17 for the resultant step responses for various controller gains.

Figure 2.17: Various discrete closed loop responses for gains Kc = 0.6, 0.2 and 0.1. Note that with Kc = 0.6 we observe a response on the stability boundary, while for Kc = 0.1 we observe a critically damped response as anticipated from the root locus diagram given in Fig. 2.16.

Many physical processes are multivariable. A binary distillation column such as given in Fig. 2.18
typically has four inputs; the feed flow and composition and reflux and boil-up rates, and at least
four outputs, the product flow and compositions. To control such an interacting system, mul-
tivariable control such as state space analysis is necessary. The Wood-Berry distillation column
model (Eqn 3.24) and the Newell & Lee evaporator model are other examples of industrial pro-
cess orientated multivariable models.

[Block diagram: the disturbances (feed composition and feed rate) and the manipulated variables (reflux rate and reboiler heat) enter the distillation column; the outputs are the distillate composition xD, distillate rate D, tray 5 temperature T5, tray 15 temperature T15, bottoms rate B and bottoms composition xB.]

Figure 2.18: A binary distillation column with multiple inputs and multiple outputs

State space analysis only considers first order differential equations. To model higher order sys-
tems, one needs only to build systems of first order equations. These equations are conveniently
collected, if linear, in one large matrix. The advantage of this approach is a compact represen-
tation, and a wide variety of good mathematical and robust numerical tools to analyse such a
system.

A few words of caution

The state-space analysis is sometimes referred to as the “modern control theory”, despite the fact that it has been around since the early 1960s. However by the end of the 1970s, the promise of the academic advances made in the previous decade was turning out to be ill-founded, and it was felt that this new theory was ill-equipped to cope with the practical problems of industrial control. In many industries, therefore, ‘Process Control’ acquired a bad name. Writing a decade later in 1987, Morari in [136] attempts to rationalise why this disillusionment was around at the time, and whether the subsequent decade of activity alleviated any of the concerns. Morari summarises that commentators such as [69] considered theory such as linear multivariable control theory, (i.e. this chapter), which seemed to promise so much, actually delivered very little ‘and had virtually no impact on industrial practice’. There were other major concerns such as the scarcity of good process models, the increasing importance of operating constraints, operator acceptance etc., but the poor track record of linear multivariable control theory received top billing. Incidentally, Morari, writing almost a decade later still in 1994, [139], revisits the same topic, giving the reader a nice linear perspective.

2.7.1 States and state space

State space equations are just a mathematical equivalent way of writing the original common dif-
ferential equation. While the vector/matrix construction of the state-space approach may initially
appear intimidating compared with high order differential equations, it turns out to be more con-
venient and numerically more robust to manipulate these equations when using programs such
as MATLAB. In addition the state-space form can be readily expanded into multivariable systems
and even nonlinear systems.

The state of a system is the smallest set of values such that, given knowledge of these values, of any future inputs, and of the governing dynamic equations, it is possible to predict everything about the future (and past) output of the system. The state variables are often written as a column vector of length n and denoted x. The state space is the n-dimensional coordinate space in which all possible state vectors must lie.

The input to a dynamic system is the vector of m input variables, u, that affect the state vari-
ables. Historically control engineers further subdivided the input variables into those they could
easily and consciously adjust (known as control or manipulated variable inputs), and those vari-
ables that may change, but are outside the engineer’s immediate sphere of influence (like the
weather) known as disturbance variables. The number of input variables need not be the same as
the number of state variables, and indeed m is typically less than n.

Many mathematical control text books follow a standard nomenclature. Vectors are written using
a lower case bold font such as x, and matrices are written in upper case bold such as A.

The state space equations are the n ordinary differential equations that relate the state derivatives
to the state themselves, the inputs if any, and time. We can write these equations as

ẋ = f (x, u, t) (2.34)

where f (·) is a vector function meaning that both the arguments and the result are vectors. In
control applications, the input is given by a control law which is often itself a function of the
states,
u = h(x) (2.35)
which if we substitute into Eqn. 2.34, we get the closed loop response

ẋ = f (x, h(x), t)

For autonomous closed loop systems there is no explicit dependence on time, so the differential
equation is simply
ẋ = f (x) (2.36)
For the continuous linear time invariant case, Eqn. 2.34 simplifies to

ẋ = Ax + Bu (2.37)

or in discrete form at time t = kT


x_{k+1} = Φx_k + ∆u_k    (2.38)

where the state vector x is an (n × 1) vector, or alternatively written as x ∈ ℜ^n, and the control input vector has m elements, or u ∈ ℜ^m. The model (transition) matrix is (n × n), that is A, Φ ∈ ℜ^{n×n}, and the control or input matrix is (n × m), that is B, ∆ ∈ ℜ^{n×m}.

Block diagrams of both the continuous and discrete formulations of the state-space model are shown in Fig. 2.19. Such a form is suitable for implementing state-space systems in a simulator such as SIMULINK, for example.
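Simulating the discrete recursion of Eqn. 2.38 requires nothing more than a loop. The following sketch is written in Python/NumPy purely for illustration (the worked examples in this text use MATLAB); the Φ and ∆ values below are arbitrary stable choices, not taken from any example here. It marches the state forward and compares the result against the discrete steady state, which satisfies x_ss = Φx_ss + ∆u.

```python
import numpy as np

# Hypothetical stable discrete system: x[k+1] = Phi x[k] + Delta u[k]
Phi = np.array([[0.9, 0.1],
                [0.0, 0.8]])
Delta = np.array([[0.0],
                  [0.1]])

x = np.zeros((2, 1))      # initial state x0 = 0
u = np.array([[1.0]])     # constant (held) input

for k in range(200):      # march the recursion forward 200 samples
    x = Phi @ x + Delta @ u

# the discrete steady state satisfies xss = Phi xss + Delta u
xss = np.linalg.solve(np.eye(2) - Phi, Delta @ u)
print(x.ravel(), xss.ravel())
```

Because both eigenvalues of Φ (0.9 and 0.8) lie inside the unit circle, the simulated state converges to x_ss.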


Figure 2.19: A block diagram of a state-space dynamic system, (a) continuous system: ẋ = Ax +
Bu, and (b) discrete system: xk+1 = Φxk + ∆uk . (See also Fig. 2.20.)

The output or measurement vector, y, collects the variables that are directly measured from the operating plant, since often the true states cannot themselves be directly measured. These outputs are related to the states by

y = g(x)    (2.39)
If the measurement relation is linear, then

y = Cx (2.40)

where the measurement vector has r elements, y ∈ ℜ^r, and the measurement matrix is sized C ∈ ℜ^{r×n}.

Sometimes the input may directly affect the output, bypassing the process, in which case the full linear system in state space is described by

ẋ = Ax + Bu
y = Cx + Du    (2.41)

Eqn. 2.41 is the standard starting point for the analysis of continuous linear dynamic systems, and
are shown in block diagram form in Fig. 2.20. Note that in the diagram, the internal state vector
(bold lines), has typically more elements than either the input or output vectors (thin lines). The
diagram also highlights the fact that as an observer to the plant, we can relatively easily measure
our outputs (or appropriately called measurements), we presumably know our inputs to the
plant, but we do not necessarily have access to the internal state variables since they are always
contained inside the dashed box in Fig. 2.20. Strategies to estimate these hidden internal states are known as state estimation and are described in section 9.5.

2.7.2 Converting differential equations to state-space form

If we have a collection or system of interlinked differential equations, it is often convenient and succinct to group them together in a vector/matrix collection. Alternatively, if we start with an nth order differential equation, for the same reasons it is advisable to rewrite this as a collection of n first order differential equations, known as Cauchy form.

Given a general linear nth order differential equation

y (n) + a1 y (n−1) + a2 y (n−2) + · · · + an−1 ẏ + an y = b0 u(n) + b1 u(n−1) + · · · + bn−1 u̇ + bn u (2.42)



- D

ẋ R x(t) ?
u(t) -- B -+ - - C -+ - y(t)
6

 A 

Figure 2.20: A complete block diagram of a state-space dynamic system with output and direct
measurement feed-through, Eqn. 2.41.

by inspection, we can create the equivalent transfer function as

Y(s)/U(s) = (b_0 s^n + b_1 s^{n−1} + · · · + b_{n−1} s + b_n)/(s^n + a_1 s^{n−1} + · · · + a_{n−1} s + a_n)    (2.43)
We can cast this transfer function (or alternatively the original differential equation) into state
space in a number of equivalent forms. They are equivalent in the sense that the input/output
behaviour is identical, but not necessarily the internal state behaviour.

The controllable canonical form is

ẋ = [ 0 1 0 · · · 0 ; 0 0 1 · · · 0 ; ⋮ ⋮ ⋮ ⋱ ⋮ ; 0 0 0 · · · 1 ; −a_n −a_{n−1} −a_{n−2} · · · −a_1 ] x + [ 0; 0; ⋮; 0; 1 ] u    (2.44)

y = [ b_n − a_n b_0   b_{n−1} − a_{n−1} b_0   · · ·   b_1 − a_1 b_0 ] x + b_0 u    (2.45)

which is useful when designing pole-placement controllers. The observable canonical form is

ẋ = [ 0 0 · · · 0 −a_n ; 1 0 · · · 0 −a_{n−1} ; ⋮ ⋱ ⋮ ⋮ ; 0 0 · · · 1 −a_1 ] x + [ b_n − a_n b_0 ; b_{n−1} − a_{n−1} b_0 ; ⋮ ; b_1 − a_1 b_0 ] u    (2.46)

y = [ 0 0 · · · 0 1 ] x + b_0 u    (2.47)

and is useful when designing observers.



If the transfer function defined by Eqn. 2.43 has real and distinct factors,

Y(s)/U(s) = (b_0 s^n + b_1 s^{n−1} + · · · + b_{n−1} s + b_n)/((s + p_1)(s + p_2) · · · (s + p_n))
          = b_0 + c_1/(s + p_1) + c_2/(s + p_2) + · · · + c_n/(s + p_n)

we can derive an especially elegant state-space form

ẋ = [ −p_1 0 · · · 0 ; 0 −p_2 · · · 0 ; ⋮ ⋱ ⋮ ; 0 0 · · · −p_n ] x + [ 1; 1; ⋮; 1 ] u    (2.48)

y = [ c_1  c_2  · · ·  c_n ] x + b_0 u    (2.49)

that is purely diagonal. As evident from the diagonal system matrix, this system is totally decou-
pled and possess the best numerical properties for simulation. For systems with repeated roots,
or complex roots, or both, the closest we can come to a diagonal form is the block Jordan form.

We can interconvert between all of the above forms using the MATLAB canon command.

>> G = tf([5 6],[1 2 3 4]); % Define transfer function G = (5s + 6)/(s^3 + 2s^2 + 3s + 4)
>> canon(ss(G),'compan') % Convert TF to the companion or observable canonical form

a =
       x1  x2  x3
   x1   0   0  -4
   x2   1   0  -3
   x3   0   1  -2

b =
       u1
   x1   1
   x2   0
   x3   0

c =
       x1  x2  x3
   y1   0   5  -4

d =
       u1
   y1   0

These transformations are discussed further in [148, p515]. M ATLAB uses a variation of this form
when converting from a transfer function to state-space in the tf2ss function.
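The controllable canonical form of Eqns. 2.44–2.45 is mechanical enough to assemble directly from the polynomial coefficients. The sketch below does this in Python/NumPy rather than MATLAB (purely for illustration; controllable_canonical is a hypothetical helper name, not a toolbox routine), and spot-checks the realisation against the transfer function it was built from.

```python
import numpy as np

def controllable_canonical(num, den):
    """Build (A, B, C, D) in controllable canonical form, Eqns. 2.44-2.45.
    num, den are coefficient lists in descending powers of s, den monic."""
    den = np.asarray(den, dtype=float)
    num = np.asarray(num, dtype=float)
    n = len(den) - 1
    num = np.concatenate([np.zeros(n + 1 - len(num)), num])  # pad numerator
    b0, b = num[0], num[1:]               # b0 and b1..bn
    a = den[1:]                           # a1..an
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)            # ones on the superdiagonal
    A[-1, :] = -a[::-1]                   # bottom row: -an, ..., -a1
    B = np.zeros((n, 1)); B[-1, 0] = 1.0
    C = (b - a * b0)[::-1].reshape(1, n)  # [bn - an b0, ..., b1 - a1 b0]
    D = np.array([[b0]])
    return A, B, C, D

# G(s) = (s^2 + 7s + 12)/(s^3 + 8s^2 + 17s + 10), used again later in the text
A, B, C, D = controllable_canonical([1, 7, 12], [1, 8, 17, 10])

# spot-check: C (sI - A)^-1 B + D should equal num(s)/den(s) at any s
s = 1.0
H = (C @ np.linalg.solve(s * np.eye(3) - A, B) + D)[0, 0]
print(H, np.polyval([1, 7, 12], s) / np.polyval([1, 8, 17, 10], s))
```

The eigenvalues of the companion matrix A are, by construction, the poles of the transfer function.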

2.7.3 Interconverting between state space and transfer functions

The transfer function form and the state-space form are, by and large, equivalent, and we can convert from one representation to the other. Starting with our generic state-space model, Eqn. 2.37,

ẋ = Ax + Bu    (2.50)

with initial condition x(0) = 0 and taking Laplace transforms gives

sx(s) = Ax(s) + Bu(s)
(sI − A) x(s) = Bu(s)
x(s) = (sI − A)^{-1}B u(s) = Gp(s) u(s)    (2.51)

where Gp(s) = (sI − A)^{-1}B is a matrix of expressions in s and is referred to as the transfer function matrix. The MATLAB command ss2tf (state-space to transfer function) performs this conversion.

State-space to transfer function conversion

Suppose we wish to convert the following state-space description to transfer function form.

ẋ = [ −7 10 ; −3 4 ] x + [ 4 −1 ; 2 −3 ] u    (2.52)

From Eqn. 2.51, the transfer function matrix is defined as

Gp(s) = (sI − A)^{-1} B
      = ( s [ 1 0 ; 0 1 ] − [ −7 10 ; −3 4 ] )^{-1} [ 4 −1 ; 2 −3 ]
      = [ s + 7  −10 ; 3  s − 4 ]^{-1} [ 4 −1 ; 2 −3 ]    (2.53)

Now at this point, the matrix inversion in Eqn. 2.53 becomes involved because of the presence of the symbolic s in the matrix. Inverting the symbolic matrix expression gives the transfer function matrix,

Gp(s) = 1/((s + 2)(s + 1)) [ s − 4  10 ; −3  s + 7 ] [ 4 −1 ; 2 −3 ]
      = [ 4/(s + 2)   −(s + 26)/(s² + 3s + 2) ; 2/(s + 2)   −3(s + 6)/(s² + 3s + 2) ]
We can directly apply Eqn. 2.51 using the symbolic toolbox in MATLAB.

>> syms s
>> A = [-7 10; -3 4]; B = [4 -1; 2 -3];
>> G = (s*eye(2)-A)\B % Transfer function matrix Gp(s) = (sI − A)^{-1}B
G =
[ 4/(s+2), -(s+26)/(2+s^2+3*s)]
[ 2/(s+2), -3*(s+6)/(2+s^2+3*s)]

The method for converting from a state space description to a transfer function matrix described by Eqn. 2.51 is not very suitable for numerical computation owing to the symbolic matrix inversion required. However [174, p35] describes a method due to Faddeeva that is suitable for numerical computation in MATLAB.

Faddeeva’s algorithm to convert a state-space description, ẋ = Ax + Bu, into transfer function form, Gp(s):

1. Compute a, the characteristic equation of the n × n matrix A. (Use poly in MATLAB.)

2. Set E_{n−1} = I.

3. Compute recursively the following n − 1 matrices

   E_{n−1−k} = A E_{n−k} + a_{n−k} I,    k = 1, 2, · · · , n − 1

4. The transfer function matrix is then given by

   Gp(s) = (s^{n−1} E_{n−1} + s^{n−2} E_{n−2} + · · · + E_0)/(a_n s^n + a_{n−1} s^{n−1} + · · · + a_1 s + a_0) B    (2.54)

Expressed in MATLAB notation, an (incomplete) version of the algorithm is

a = poly(A);         % characteristic polynomial of A
n = length(a);       % dimension of system + 1
I = eye(n-1); E = I; % initialise with E_{n-1} = I
for i=n-2:-1:1
    E = A*E + a(i+1)*I % next numerator matrix; collect & printout
end % for

Note however that it is also possible to use ss2tf, although since MATLAB (version 4) cannot use 3D matrices, we need to call this routine n times to build up the entire transfer function matrix.
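The pieces missing from the listing above (storing each E_i and assembling Eqn. 2.54) are straightforward to add. Below is a complete sketch of the same recursion in Python/NumPy (an illustrative port, not the book's code), checked against the worked example which follows.

```python
import numpy as np

def faddeeva(A):
    """Return the numerator matrices [E_{n-1}, ..., E_0] and the (monic,
    descending-power) characteristic polynomial coefficients of A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    a = np.poly(A)                 # [1, a1, ..., an], descending powers of s
    E = np.eye(n)                  # E_{n-1} = I
    Es = [E]
    for k in range(1, n):          # recursion E_{n-1-k} = A E_{n-k} + a_k I
        E = A @ E + a[k] * np.eye(n)
        Es.append(E)
    return Es, a

A = np.array([[-7.0, 10.0], [-3.0, 4.0]])
B = np.array([[4.0, -1.0], [2.0, -3.0]])
Es, a = faddeeva(A)

# evaluate Gp(s) = (s E1 + E0) B / a(s) at a test point and compare
# with the direct computation (sI - A)^-1 B
s, n = 1.0, 2
num = sum(s ** (n - 1 - j) * Es[j] for j in range(n))
Gp = num @ B / np.polyval(a, s)
print(Gp)
```

For this 2 × 2 example the recursion produces E_1 = I and E_0 = A + 3I, matching the hand calculation below.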

We can repeat the state-space to transfer function example, Eqn. 2.52, given on page 50 using Faddeeva’s algorithm.

ẋ = [ −7 10 ; −3 4 ] x + [ 4 −1 ; 2 −3 ] u

We start by computing the characteristic polynomial of A

a(s) = s² + 3s + 2 = (s + 1)(s + 2)

and with n = 2 we can compute

E_1 = [ 1 0 ; 0 1 ],    E_0 = [ −4 10 ; −3 7 ]

Following Eqn. 2.54 gives a matrix of polynomials in s,

(sI − A)^{-1} = 1/(s² + 3s + 2) ( s [ 1 0 ; 0 1 ] + [ −4 10 ; −3 7 ] )
             = 1/((s + 1)(s + 2)) [ s − 4  10 ; −3  s + 7 ]

which we can recognise as the same expression as we found using the symbolic matrix inverse on page 50. All that remains is to post-multiply by B to obtain the transfer function matrix.

Finally, we can also use the control toolbox for the conversion from state-space to transfer func-
tion form. First we construct the state-space form,

A = [-7 10; -3,4]; B = [4 -1; 2 -3]; % Continuous state-space example from Eqn. 2.52.
C = eye(size(A));
sys = ss(A,B,C,[]) % construct system object

where in this case, if we leave out the D matrix, MATLAB assumes no direct feed-through path. We can now convert to the transfer function form:

>> sys_tf = minreal(tf(sys)) % Convert to transfer function & cancel common factors

Transfer function from input 1 to output...
         4
 #1:   -----
       s + 2

         2
 #2:   -----
       s + 2

Transfer function from input 2 to output...
          -s - 26
 #1:   -------------
       s^2 + 3 s + 2

         -3 s - 18
 #2:   -------------
       s^2 + 3 s + 2

Once again we get the same transfer function matrix.

Transfer function to state space

To convert in the reverse direction, we can use the MATLAB function tf2ss (transfer function to state space), thus converting an arbitrary transfer function description to the state space format of Eqn 2.37. For example, starting with the transfer function

G(s) = (s + 3)(s + 4)/((s + 1)(s + 2)(s + 5)) = (s² + 7s + 12)/(s³ + 8s² + 17s + 10)

we can convert to state-space form using tf2ss.

>> num = [1 7 12];            % Numerator: s^2 + 7s + 12
>> den = [1 8 17 10];         % Denominator: s^3 + 8s^2 + 17s + 10
>> [A,B,C,D] = tf2ss(num,den) % convert to state-space

which returns the following four matrices

A = [ −8 −17 −10 ; 1 0 0 ; 0 1 0 ],    B = [ 1; 0; 0 ]
C = [ 1  7  12 ],    D = 0

This form is a variation of the controllable canonical form described previously in §2.7.2. This is not the only state-space realisation possible however, as the Control Toolbox will return a slightly different A, B, C, D package

>> Gc = tf([1 7 12],[1 8 17 10]) % Transfer function in polynomial form

Transfer function:
      s^2 + 7 s + 12
-----------------------
s^3 + 8 s^2 + 17 s + 10

>> [A,B,C,D] = ssdata(Gc) % extract state-space matrices
A =
   -8.0000   -4.2500   -1.2500
    4.0000         0         0
         0    2.0000         0
B =
     2
     0
     0
C =
    0.5000    0.8750    0.7500
D =
     0

Alternatively we could start with zero-pole-gain form and obtain yet another equivalent state-
space form.

G = zpk([-3 -4],[-1 -2 -5],1) % Transfer function model in factored format
Gss = ss(G)

A later section, (§2.7.4) shows how we can convert between equivalent dynamic forms. The four
matrices in the above description form the linear dynamic system as given in Eqn. 2.41. We can
concatenate these four matrices into one large block thus obtaining a shorthand way of storing
these equations

G = [ A  B
      C  D ]

where A is (n × n), B is (n × m), C is (p × n) and D is (p × m), given n states, m inputs and p measurements,

which is often called a packed matrix notation of the quadruplet (A, B, C, D). If you get confused about the appropriate dimensions of A, B etc., then this is a handy check, or alternatively you can use the diagnostic routine abcdchk. Note that in practice, the D matrix is normally the zero matrix since the input does not generally affect the output immediately without first passing through the system dynamics.

2.7.4 Similarity transformations

The state space description of a differential equation is not unique. We can transform from one
description to another by using a linear invertible transformation such as x = Tz. Geometrically
in 2 dimensions, this is equivalent to rotating the axes or plane. When one rotates the axes, the
inter-relationship between the states do not change, so the transformation preserves the dynamic
model.

Suppose we have the dynamic system

ẋ = Ax + Bu (2.55)

which we wish to transform in some manner using the non-singular transformation matrix, T,
where x = Tz. Naturally the reverse transformation z = T−1 x exists because we have restricted
ourselves to consider only the cases when the inverse of T exists. Writing Eqn. 2.55 in terms of
our new variable z we get

d(Tz)/dt = ATz + Bu
ż = T^{-1}ATz + T^{-1}Bu    (2.56)

Eqn 2.56 and Eqn 2.55 represent the same dynamic system: they have the same eigenvalues, but offer a different viewpoint. The mapping from A to T^{-1}AT is called a similarity transform, and it preserves the eigenvalues; the two matrices are said to be similar. The proof of this is detailed in [86, p300] and [148, p513–514].

The usefulness of these types of transformations is that the dynamics of the states are preserved
(since the eigenvalues are the same), but the shape and structure of the system has changed.
The motivation is that for certain operations (control, estimation, modelling), different shapes
are more convenient. A pure, (or nearly), diagonal shape of the A matrix for example has much
better numerical properties than a full matrix. This also has the advantage that less computer
storage and fewer operations are needed.

To convert a system to a diagonal form, we use the transformation matrix T, where the columns
of T are the eigenvectors of A. Systems where the model (A) matrix is diagonal are especially
easy to manipulate. We can find the eigenvectors T and the eigenvalues e of a matrix A using
the eig command, and construct the new transformed, and hopefully diagonal system matrix,

[T,e] = eig(A) % find eigenvectors & eigenvalues of A


V = T\A*T % New system matrix, T−1 AT, or use canon

Another use is when testing new control or estimation algorithms, it is sometimes instructive to
devise non-trivial systems with specified properties. For example you may wish to use as an
example a 3 × 3 system that is stable and interacting, and has one over-damped mode and two
oscillatory modes. That is we wish to construct a full A matrix with specified eigenvalues. We
can use the similarity transformations to obtain these models.
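Running the similarity transform in reverse gives exactly such a test system: place the desired modes in a block-diagonal Λ, pick a random non-singular T, and form A = TΛT⁻¹. A sketch in Python/NumPy (the particular pole locations are arbitrary illustrations, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# desired modes: an over-damped pole at -2, plus an oscillatory pair -1 +/- 3j
# written as the real 2x2 block [[sigma, omega], [-omega, sigma]]
Lam = np.array([[-2.0,  0.0,  0.0],
                [ 0.0, -1.0,  3.0],
                [ 0.0, -3.0, -1.0]])

T = rng.standard_normal((3, 3))     # random, almost surely non-singular
A = T @ Lam @ np.linalg.inv(T)      # full, interacting system matrix

eigs = np.sort_complex(np.linalg.eigvals(A))
print(eigs)
```

The resulting A is fully populated and interacting, yet its eigenvalues, and hence its modes, are exactly those placed in Λ.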

Other useful transformations such as the controllable and observable canonical forms are covered in [148, §6–4 p646]. The MATLAB function canon can convert state-space models to diagonal or observable canonical form (sometimes known as the companion form). Note however the help file for this routine discourages the use of the companion form due to its numerical ill-conditioning.

MATLAB comes with some utility functions to generate test models, such as ord2 which generates stable second order models of a specified damping and natural frequency, and rmodel, a flexible generator of stable random models of arbitrary order.

2.7.5 Interconverting between transfer functions forms

The previous section described how to convert between different representations of linear dy-
namic systems such as differential equations, transfer functions and state-space descriptions.
This section describes the much simpler task of converting between the different ways we can
write transfer functions.

Modellers tend to think in continuous time systems, G(s), and in terms of process gain and time constants, so will naturally construct transfer functions of the form

K Π_i(α_i s + 1) / Π_j(τ_j s + 1)    (2.57)

where all the variables of interest such as time constants τ_j are immediately apparent. On the other hand, system engineers tend to think in terms of poles and zeros, so naturally construct transfer functions in the form

K′ Π_i(s + z_i) / Π_j(s + p_j)    (2.58)

where once again the poles, p_j, and zeros, z_i, are immediately apparent. This is the form that MATLAB uses in the zeros-pole-gain format, zpk.

Finally the hardware engineer would prefer to operate in the expanded polynomial form, (particularly in discrete cases), where the transfer function is of the form

(b_m s^m + b_{m−1} s^{m−1} + · · · + b_0)/(s^n + a_{n−1} s^{n−1} + · · · + a_0)    (2.59)

This is the form that MATLAB uses in the transfer-function format, tf. Note that the leading coefficient in the denominator is set to 1. As a MATLAB user, you can define a transfer function that does not follow this convention, but MATLAB will quietly convert to this normalised form if you type something like G = tf(zpk(G)).

The inter-conversions between the forms is not difficult; between expressions Eqn. 2.57 and
Eqn. 2.58 simply require some adjusting of the gains and factors, while to convert from Eqn. 2.59
requires one to factorise polynomials.

For example, the following three transfer function descriptions are all equivalent

2(10s + 1)(−3s + 1)/((20s + 1)(2s + 1)(s + 1))              [time constant]
= −1.5(s − 0.3333)(s + 0.1)/((s + 1)(s + 0.5)(s + 0.05))    [zero-pole-gain]
= (−60s² + 14s + 2)/(40s³ + 62s² + 23s + 1)                 [expanded polynomial]

It is trivial to interconvert between the zero-pole-gain, Eqn. 2.58, and the transfer function formats, Eqn. 2.59, in MATLAB, but it is less easy to convert to the time constant description. Listing 2.3 extracts from an arbitrary transfer function form the time constants, τ, the numerator time constants, α, and the plant gain K.

Listing 2.3: Extracting the gain, time constants and numerator time constants from an arbitrary transfer function format

Gplant = tf(2*mconv([10 1],[-3 1]),mconv([20 1],[2 1],[1 1])) % TF of interest
G = zpk(Gplant);                      % Convert TF description to zero-pole-gain
Kp = G.k;
p = cell2mat(G.p); z = cell2mat(G.z); % Extract poles & zeros
delay0 = G.iodelay;                   % Extract deadtime (if any)

tau = sort(-1./p,'descend');  % Convert (& sort) to time constants, 1/(τj s + 1)
ntau = sort(-1./z,'descend'); % Convert to numerator time constants, (αi s + 1)
K = Kp*prod(tau)/prod(ntau);  % Adjust plant gain

We could of course use the control toolbox functions pole and zero to extract the poles and
zeros from an arbitrary LTI model.
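The same extraction can be performed from the raw polynomial coefficients using a root finder. As a cross-check of the three equivalent forms above, here is a sketch in Python/NumPy (mirroring the logic of Listing 2.3 rather than reproducing it) which recovers the gain K = 2 and the time constants from the expanded polynomial form.

```python
import numpy as np

num = np.array([-60.0, 14.0, 2.0])       # -60s^2 + 14s + 2
den = np.array([40.0, 62.0, 23.0, 1.0])  #  40s^3 + 62s^2 + 23s + 1

poles = np.roots(den).real               # all real here: -1, -0.5, -0.05
zeros = np.roots(num).real               # 1/3 and -0.1
Kp = num[0] / den[0]                     # zero-pole-gain gain, -1.5

tau = np.sort(-1.0 / poles)[::-1]        # time constants of the (tau s + 1) factors
ntau = np.sort(-1.0 / zeros)[::-1]       # numerator time constants (alpha s + 1)
K = Kp * np.prod(tau) / np.prod(ntau)    # plant gain in time-constant form
print(K, tau, ntau)
```

The recovered values match the time-constant description given earlier: K = 2, τ = 20, 2, 1 and α = 10, −3.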

2.7.6 The steady state

The steady-state, xss , of a general nonlinear system ẋ = f (x, u) is the point in state space such
that all the derivatives are zero, or the solution of

0 = f (xss , u) (2.60)

If the system is linear, ẋ = Ax+ Bu, then the steady state can be evaluated algebraically in closed
form
xss = −A−1 Bu (2.61)

Consequently, to solve for the steady-state one could invert the model matrix A, but this inversion may be ill-conditioned or computationally time consuming. In MATLAB one should instead use the linear solve xss = -A\B*u.

If A has no inverse, then no (or alternatively infinite) steady states exist. An example of a process
that has no steady state could be a tank-flow system that has a pump withdrawing fluid from
the outlet at a constant rate independent of liquid height say just exactly balancing an input flow
shown in the left hand schematic in Fig. 2.21. If the input flow suddenly increased, then the level
will rise until the tank eventually overflows. If instead the tank was drained by a valve partially
open at a constant value, then as the level rises, the increased pressure (head) will force more
material out through the valve, (right-hand side of Fig. 2.21.) Eventually the system will rise to a
new steady state. It may however overflow before the new steady state is reached, but that is a
constraint on the physical system that is outside the scope of the simple mathematical description
used at this time.

If the system is nonlinear, there is the possibility that multiple steady states may exist. To solve
for the steady state of a nonlinear system, one must use a nonlinear algebraic solver such as
described in chapter 3.

Example. The steady state of the differential equation

d²y/dt² + 7 dy/dt + 12y = 3u

where dy/dt = y = 0 at t = 0 and u = 1, t ≥ 0, can be evaluated using Laplace transforms and the

[Two tank schematics, each with inflow at the top and level h: on the left, the tank is drained by a constant-flow pump (flow = constant); on the right, the tank is drained through a valve restriction (flow = f(height)).]

Figure 2.21: Unsteady and steady states for level systems

final value theorem. Transforming to Laplace transforms we get

s²Y(s) + 7sY(s) + 12Y(s) = 3U(s)
Y(s)/U(s) = 3/(s² + 7s + 12) = 3/((s + 3)(s + 4))

while for a step input

Y(s) = (1/s) · 3/(s² + 7s + 12)
The final value theorem is only applicable if the system is stable. To check, we require that the roots of the denominator, s² + 7s + 12, lie in the left hand plane,

s = (−7 ± √(7² − 4 × 12))/2 = −4 and −3

Given that both roots have negative real parts, we have verified that our system is stable and we are allowed to apply the final value theorem to solve for the steady-state, y(∞),

lim_{t→∞} y(t) = lim_{s→0} sY(s) = lim_{s→0} 3/(s² + 7s + 12) = 3/12 = 0.25
Using the state space approach to replicate the above, we first cast the second order differential equation into two first order differential equations using the controllable canonical form given in Eqns. 2.44–2.45. Let z_1 = y and z_2 = dy/dt, then

ż = [ 0 1 ; −12 −7 ] z + [ 0; 3 ] u

and the steady state is now

z_ss = −A^{-1}Bu = −[ −7/12 −1/12 ; 1 0 ] [ 0; 3 ] · 1 = [ 0.25; 0 ]
Noting that z1 = y, we see that the steady state is also at 0.25. Furthermore, the derivative term
(z2 ) is zero, which is as expected at steady state.
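The same numbers drop out of a quick numerical check (a Python/NumPy sketch; note the use of a linear solve rather than an explicit matrix inverse, as recommended earlier):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-12.0, -7.0]])
B = np.array([[0.0],
              [3.0]])
u = 1.0

zss = -np.linalg.solve(A, B * u)   # steady state xss = -A^-1 B u
print(zss.ravel())                 # -> [0.25 0.]
```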

2.8 Solving the vector differential equation

Since we can solve differential equations by inverting Laplace transforms, we would expect to be
able to solve state-space differential equations such as Eqn. 2.37 in a similar manner. If we look
at the Laplace transform of a simple linear scalar differential equation, ẋ = ax + bu we find two
terms,

sX(s) − x_0 = aX(s) + bU(s)
X(s) = x_0/(s − a) + (b/(s − a)) U(s)    (2.62)
One of these terms is the response of the system owing to the initial condition, x0 eat , and is
called the homogeneous solution, and the other term is due to the particular input we happen
to be using. This is called the particular integral, and we must know the form of the input, u(t),
before we solve this part of the problem. The total solution is the sum of the homogeneous and
particular components.

The homogeneous solution

For the moment we will consider just the homogeneous solution to our vector differential equa-
tion. That is, we will assume no driving input, or u(t) = 0. (In the following section we will add
in the particular integral due to a non-zero input.)

Our vector differential equation, ignoring any input, is simply

ẋ = Ax, x(t = 0) = x0 (2.63)

Taking Laplace transforms, and not forgetting the initial conditions, we have

sx(s) − x_0 = A x(s)
(sI − A) x(s) = x_0
x(s) = (sI − A)^{-1} x_0

Finally inverting the Laplace transform back to the time domain gives

x(t) = L^{-1}{ (sI − A)^{-1} } x_0    (2.64)

Alternatively we can solve Eqn. 2.63 by separating the variables and integrating obtaining

x(t) = eAt x0 (2.65)

where the exponent of a matrix, eAt , is itself a matrix of the same size as A. We call this matrix
exponential the transition matrix because it transforms the state vector at some initial time x0 to
some point in the future, xt . We will give it the symbol Φ(t).

The matrix exponential is defined just as in the scalar case as a Taylor series expansion,

e^{At} or Φ(t) ≝ I + At + A²t²/2! + A³t³/3! + · · · = Σ_{k=0}^{∞} Aᵏtᵏ/k!    (2.66)

although this series expansion method is not recommended as a reliable computational strategy.
Better strategies are outlined on page 62.
Comparing Eqn. 2.64 with Eqn. 2.65 we can see that the matrix e^{At} = L^{-1}{ (sI − A)^{-1} }.

So to compute the solution, x(t), we need to know the initial condition and a strategy to numeri-
cally compute a matrix exponential.
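To see the series definition at work, consider the double integrator used later in this section: its A is nilpotent (A² = 0), so the Taylor series of Eqn. 2.66 terminates after two terms and e^{At} = I + At exactly. A naive truncated-series evaluator is sketched below in Python/NumPy (for illustration only, echoing the warning above):

```python
import numpy as np

def expm_series(A, t, terms=30):
    """Naive truncated Taylor series for e^(At) -- illustration only."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ (A * t) / k      # A^k t^k / k!
        result = result + term
    return result

A = np.array([[0.0, 0.0],
              [1.0, 0.0]])             # double integrator: A @ A = 0
Phi = expm_series(A, 2.0)
print(Phi)                             # equals I + A t = [[1, 0], [2, 1]]
```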

The particular solution

Now we consider the full differential equation with nonzero input, ẋ = Ax + Bu. Building on the solution to the homogeneous part in Eqn. 2.65, we get

x(t) = Φ(t) x_0 + ∫₀ᵗ Φ(t − τ) B u(τ) dτ    (2.67)

where now the second term accounts for the particular input vector u.

Eqn. 2.67 is not particularly useful as written since both terms are time varying. However, the continuous time differential equation can be converted to a discrete time difference equation that is suitable for computer control implementation, provided the sampling rate is fixed and the input is held constant between the sampling intervals. We would like to convert Eqn. 2.37 to the discrete time equivalent, Eqn. 2.38, repeated here

xk+1 = Φxk + ∆uk (2.68)

where xk is the state vector x at time t = kT where T is the sample period. Once the sample pe-
riod is fixed, then Φ and ∆ are also constant matrices. We have also assumed here that the input
vector u is constant, or held, over the sample interval, which is the norm for control applications.

So starting with a known x_k at time t = kT, we desire the state vector at the next sample time, x_{k+1}, or

x_{k+1} = e^{AT} x_k + e^{A(k+1)T} ∫_{kT}^{(k+1)T} e^{−Aτ} B u(τ) dτ    (2.69)

But as we have assumed that the input u is constant using a zeroth-order hold between the sampling intervals kT and (k + 1)T, Eqn. 2.69 simplifies to

x_{k+1} = e^{AT} x_k + ( ∫₀ᵀ e^{Aλ} dλ ) B u_k    (2.70)

where λ = T − t. For convenience, we can define two new matrices as

Φ ≝ e^{AT}    (2.71)
∆ ≝ ( ∫₀ᵀ e^{Aλ} dλ ) B    (2.72)

which gives us our desired transformation in the form of Eqn. 2.68. In summary, to discretise
ẋ = Ax + Bu at sample interval T , we must compute a matrix exponential, Eqn. 2.71, and
integrate a matrix exponential, Eqn. 2.72.

Note that Eqn. 2.70 involves no approximation to the continuous differential equation provided
the input is constant over the sampling interval. Also note that as the sample time tends to zero,
the state transition matrix Φ tends to the identity, I.

If the matrix A is non-singular, then

    ∆ = ( e^{AT} − I ) A⁻¹ B                                   (2.73)

is a simpler expression than the more general Eqn. 2.72.
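The equivalence of Eqn. 2.73 and the general integral of Eqn. 2.72 is easy to confirm numerically in any environment with a matrix exponential. The following is a sketch in Python with NumPy/SciPy, purely for illustration, using the invertible-A system that appears later in this section:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

# Example system with invertible A (see the symbolic example below)
A = np.array([[-1.5, -0.5],
              [ 1.0,  0.0]])
B = np.array([[1.0],
              [0.0]])
T = 2.0

Phi = expm(A*T)                                      # Phi = e^{AT}, Eqn. 2.71
Delta = (Phi - np.eye(2)) @ np.linalg.solve(A, B)    # Eqn. 2.73, A invertible

# General form, Eqn. 2.72: integrate e^{A lambda} elementwise over [0, T]
I_AT = np.array([[quad(lambda lam, i=i, j=j: expm(A*lam)[i, j], 0, T)[0]
                  for j in range(2)] for i in range(2)])
Delta_general = I_AT @ B
```

Both routes give the same ∆; at T = 2 the analytic expression derived in the symbolic example below evaluates to ∆ ≈ [0.465; 0.799].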

Double integrator example. The single-input/single-output double integrator system, G(s) =
1/s², can be represented in continuous state space using two states as

    ẋ = [0 0; 1 0] x + [1; 0] u                                (2.74)
    y = [0 1] x                                                (2.75)
At sample time T , the state transition matrix is, from Eqn. 2.71,

    Φ = e^{AT} = exp( [0 0; 1 0] T ) = [1 0; T 1]

and the control matrix is given by Eqn. 2.72,

    ∆ = ( ∫_0^T e^{Aλ} dλ ) B = ( ∫_0^T [1 0; λ 1] dλ ) [1; 0]
      = [T 0; T²/2 T] [1; 0] = [T; T²/2]
For small problems such as this, the symbolic toolbox helps do the computation.

>> A = [0 0; 1 0]; B = [1; 0];
>> syms T lambda                % Symbolic sample time T and λ
>> Phi = expm(A*T)              % Φ(T)
Phi =
[ 1, 0]
[ T, 1]
>> Delta = int(expm(A*lambda),lambda,0,T)*B   % ∆(T)
Delta =
[ T]
[ 1/2*T^2]
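The symbolic result can also be spot-checked numerically at a particular sample time. A small sketch (Python/NumPy here, as an illustration only) verifies Φ = [1 0; T 1] and ∆ = [T; T²/2] at T = 4:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 0.0],
              [1.0, 0.0]])
B = np.array([[1.0],
              [0.0]])
T = 4.0

Phi = expm(A*T)        # should equal [[1, 0], [T, 1]]

# A is nilpotent here (A @ A = 0), so e^{A lam} = I + A*lam exactly, and
# the integral in Eqn. 2.72 collapses to I*T + A*T^2/2
Delta = (np.eye(2)*T + A*T**2/2) @ B   # should equal [T; T^2/2]
```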

Symbolic example with invertible A matrix. We can discretise the continuous state-space sys-
tem
    ẋ = [−1.5 −0.5; 1 0] x + [1; 0] u

analytically at a sample time T by computing matrix exponentials symbolically.

>> A = [-3/2, -1/2; 1, 0]; B = [1;0];
>> syms T
>> Phi = expm(A*T)
Phi =
[ -exp(-1/2*T)+2*exp(-T),   exp(-T)-exp(-1/2*T)]
[ -2*exp(-T)+2*exp(-1/2*T), 2*exp(-1/2*T)-exp(-T)]
2.8. SOLVING THE VECTOR DIFFERENTIAL EQUATION 61

Since A is invertible in this example, we can use the simpler Eqn. 2.73

>> Delta = (Phi - eye(2))*(A\B)   % ∆ = (e^{AT} − I)A⁻¹B
Delta =
[ -2*exp(-T)+2*exp(-1/2*T)]
[ 2*exp(-T)-4*exp(-1/2*T)+2]

Note the parentheses around A\B: since * and \ have equal precedence in M ATLAB,
(Phi - eye(2))*A\B would be parsed as ((Phi - eye(2))*A)\B, which is not Eqn. 2.73.

Of course it is evident from the above example that the symbolic expressions for Φ(T ) and ∆(T )
rapidly become unwieldy for dimensions much larger than about 2. For this reason, analytical
expressions are of limited practical worth. The alternative numerical schemes are discussed in
the following section.

2.8.1 Numerically computing the discrete transformation

Calculating numerical values for the matrices Φ and ∆ can be done by hand for small dimensions
by converting to a diagonal or Jordan form, or numerically using the exponential of a matrix.
Manual calculations are neither advisable nor enjoyable, but [19, p35] mention that if you first
compute
    Ψ = ∫_0^T e^{Aτ} dτ = IT + AT²/2! + A²T³/3! + · · · + A^{k−1}T^k/k! + · · ·

then
    Φ = I + AΨ    and    ∆ = ΨB                                (2.76)
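The series for Ψ can be summed directly. A short numerical sketch (Python here, with the truncation length chosen somewhat arbitrarily) of Eqn. 2.76 is:

```python
import numpy as np

def discretise_series(A, B, T, nterms=30):
    """Sum Psi = I*T + A*T^2/2! + A^2*T^3/3! + ..., then form
    Phi = I + A*Psi and Delta = Psi*B as in Eqn. 2.76."""
    n = A.shape[0]
    Psi = np.zeros((n, n))
    term = np.eye(n)*T                  # leading term I*T
    for k in range(1, nterms + 1):
        Psi += term
        term = term @ A * (T/(k + 1))   # next term A^k T^{k+1}/(k+1)!
    return np.eye(n) + A @ Psi, Psi @ B

A = np.array([[-1.5, -0.5], [1.0, 0.0]])
B = np.array([[1.0], [0.0]])
Phi, Delta = discretise_series(A, B, 2.0)
```

For well-scaled AT the truncated series converges quickly, but for stiff systems the scaling-and-squaring approach used inside expm is preferable.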
A better approach, at least when using M ATLAB, follows from Eqn. 2.71 and Eqn. 2.72 where

    dΦ/dt = ΦA = AΦ                                            (2.77)
    d∆/dt = ΦB                                                 (2.78)

These two equations can be concatenated to give

    d/dt [Φ ∆; 0 I] = [Φ ∆; 0 I] [A B; 0 0]                    (2.79)

which is in the same form as da/dt = ab. Rearranging this to ∫ da/a = ∫ b dt leads to the analytical
solution
    [Φ ∆; 0 I] = exp( [A B; 0 0] T )                           (2.80)
enabling us to extract the required Φ and ∆ matrices provided we can reliably compute the
exponential of a matrix. The M ATLAB C ONTROL T OOLBOX routine to convert from continuous
to discrete systems, c2d, essentially follows this algorithm.
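The algorithm of Eqn. 2.80 is only a few lines in any language with a matrix exponential. The following is a hedged sketch in Python/SciPy (the function name c2d_zoh is our own, not a library routine):

```python
import numpy as np
from scipy.linalg import expm

def c2d_zoh(A, B, T):
    """Discretise xdot = Ax + Bu under a zero-order hold by exponentiating
    the augmented matrix [[A, B], [0, 0]], as in Eqn. 2.80."""
    n, m = B.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A
    M[:n, n:] = B
    Md = expm(M*T)
    return Md[:n, :n], Md[:n, n:]      # the Phi and Delta blocks

A = np.array([[-1.5, -0.5], [1.0, 0.0]])
B = np.array([[1.0], [0.0]])
Phi, Delta = c2d_zoh(A, B, 2.0)
```

At T = 2 the returned blocks agree with the symbolic expressions derived above.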

We could try the augmented version of Eqn. 2.80 to compute both Φ and ∆ with one call to the
matrix exponential function for the example started on page 60.

>> A = [-3/2, -1/2; 1, 0]; B = [1;0];
>> syms T
>> [n,m] = size(B);           % extract dimensions
>> Aa = [A,B;zeros(m,n+m)]    % Augmented A matrix, Ã = [A B; 0 0]
Aa =
   -1.5  -0.5   1.0
    1.0   0.0   0.0
    0.0   0.0   0.0
>> Phi_a = expm(Aa*T)         % compute exponential, Φ̃ = exp(ÃT)
Phi_a =
[ -exp(-1/2*T)+2*exp(-T),   exp(-T)-exp(-1/2*T),    -2*exp(-T)+2*exp(-1/2*T)]
[ -2*exp(-T)+2*exp(-1/2*T), 2*exp(-1/2*T)-exp(-T),   2*exp(-T)-4*exp(-1/2*T)+2]
[ 0, 0, 1]
>> Phi = Phi_a(1:n,1:n)
Phi =
[ -exp(-1/2*T)+2*exp(-T),   exp(-T)-exp(-1/2*T)]
[ -2*exp(-T)+2*exp(-1/2*T), 2*exp(-1/2*T)-exp(-T)]
>> Delta = Phi_a(1:n,n+1:end)
Delta =
[ -2*exp(-T)+2*exp(-1/2*T)]
[ 2*exp(-T)-4*exp(-1/2*T)+2]

Reliably computing matrix exponentials

There are in fact several ways to numerically compute a matrix exponential. Ogata gives three
computational techniques, [148, pp526–533], and a paper by Cleve Moler (one of the original
M ATLAB authors) and Charles Van Loan is titled Nineteen dubious ways to compute the exponential
of a matrix⁵, and contrasts methods involving approximation theory, differential equations, matrix
eigenvalues and others. Of these 19 methods, two found their way into M ATLAB, namely expm,
which is the recommended default strategy, and expm1, which is intended to compute e^x − 1
accurately for small x.

The one time when matrix exponentials are trivial to compute is when the matrix is diagonal.
Physically this implies that the system is totally decoupled since the matrix A is comprised of
only diagonal elements and the corresponding exponential matrix is simply the exponent of the
individual elements. So given the diagonal matrix

    D = [λ1 0 0; 0 λ2 0; 0 0 λ3],   then the matrix exponential is   exp(D) = [e^{λ1} 0 0; 0 e^{λ2} 0; 0 0 e^{λ3}]

which is trivial and reliable to compute. So one strategy then is to transform our system to a
diagonal form (if possible), and then simply find the standard scalar exponential of the individual
elements. However some matrices, such as those with multiple eigenvalues, are impossible to
convert to diagonal form, so in those cases the best we can do is convert the matrix to a Jordan
block form as described in [148, p527], perhaps using the jordan command from the Symbolic
toolbox.

However this transformation is very sensitive to numerical roundoff and for that reason is not
used for serious computation. For example the matrix

    A = [1 1; ε 1]

with ε = 0 has a Jordan form of

    [1 1; 0 1]

but for ε ≠ 0, the Jordan form drastically changes to the diagonal matrix

    [1 + √ε  0; 0  1 − √ε]

⁵ SIAM Review, vol. 20, 1978, pp802–836 and reprinted in [157, pp649–680]. In fact 25 years later, an update was
published with some recent developments.
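The square-root dependence on ε is easy to see numerically. In this small sketch (Python, for illustration) a perturbation of ε = 10⁻¹⁰ in one element moves the eigenvalues by √ε = 10⁻⁵, five orders of magnitude larger than the perturbation itself:

```python
import numpy as np

eps = 1e-10
A = np.array([[1.0, 1.0],
              [eps, 1.0]])

lam = np.linalg.eigvals(A)           # analytically 1 +/- sqrt(eps)
shift = np.max(np.abs(lam - 1.0))    # about 1e-5, not 1e-10
```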

In summary, for serious numerical calculation we should use the matrix exponential function
expm. Remember not to confuse finding the exponential of a matrix, expm, with the M ATLAB
function exp which simply exponentiates each individual element of the matrix.

2.8.2 Using M ATLAB to discretise systems

All of the complications of discretising continuous systems to their discrete equivalent can be
circumvented by using the M ATLAB command c2d which is short for continuous to discrete.
Here we need only pass the continuous system of interest, and the sampling time. As an
example we can verify the conversion of the double integrator system shown on page 60.

G = tf(1,[1 0 0])   % Continuous system G(s) = 1/s² in transfer function form
Gc = ss(G)          % Convert to continuous state-space
Gd = c2d(Gc,2)      % Convert to discrete state-space with a sample time of T = 2

a =
       x1  x2
   x1   1   0
   x2   2   1

b =
       u1
   x1   2
   x2   2

c =
       x1  x2
   y1   0   1

d =
       u1
   y1   0

Sampling time (seconds): 2
Discrete-time state-space model.

Unlike in the analytical case presented previously, here we must specify a numerical value for
the sample time, T .

Example: Discretising an underdamped second order system with a sample time of T = 3
following Eqn. 2.80 using M ATLAB to compute the matrix exponential of the augmented system.

[A,B,C,D] = ord2(3, 0.5);   % generate a second order system
Ts = 3.0;                   % sample time

[na,nb] = size(B);
X = expm([A,B; zeros(nb,na+nb)]*Ts);   % matrix exponential
Phi = X(1:na,1:na);                    % Pull off blocks to obtain Φ & ∆.
Del = X(1:na,na+1:end);

We can use M ATLAB to numerically verify the expression found for the discretised double inte-
grator on page 60 at a specific sample time, say T = 4. The double integrator in input/output
form is
    G(s) = 1/s²

which in packed state space notation is

    [A B; C D] = [0 0 1; 1 0 0; 0 1 0]

Using a sample time of T = 4, we can compute

    Φ = [1 0; T 1] = [1 0; 4 1],    ∆ = [T; T²/2] = [4; 8]

We can verify this using c2d in M ATLAB as demonstrated below.

>> Gc = tf(1,[1,0,0])   % G(s) = 1/s²
>> Gc_ss = ss(Gc);      % state space
>> dt = 4;              % sample time
>> Gd_ss = c2d(Gc_ss,dt)

a =
        x1     x2
   x1   1.000  0
   x2   4.000  1.000

b =
        u1
   x1   4.000
   x2   8.000

c =
        x1  x2
   y1   0   1.000

d =
        u1
   y1   0

Once the two system matrices Φ and ∆ are known, solving the difference equation for further
values of k just requires the repeated application of Eqn. 2.38. Starting from known initial condi-
tions x0 , the state vector x at each sample instant is calculated thus

    x1 = Φx0 + ∆u0
    x2 = Φx1 + ∆u1 = Φ²x0 + Φ∆u0 + ∆u1
    x3 = Φx2 + ∆u2 = Φ³x0 + Φ²∆u0 + Φ∆u1 + ∆u2
     ⋮
    xn = Φⁿx0 + Σ_{k=0}^{n−1} Φ^{n−1−k} ∆uk                    (2.81)

The M ATLAB function ltitr, which stands for linear time invariant time response, or the more
general dlsim will solve the general linear discrete time model as in Eqn. 2.81. In special cases
such as step or impulse tests, you can use dstep or dimpulse.
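The repeated application of Eqn. 2.38 is just a loop. As a sketch (Python here, using the double integrator discretised at T = 1), and recalling that the discretisation is exact for piecewise-constant inputs, the position state reproduces t²/2 exactly at the sample instants for a unit step input:

```python
import numpy as np

T = 1.0
Phi = np.array([[1.0, 0.0],
                [T,   1.0]])    # double integrator Phi = [1 0; T 1]
Delta = np.array([[T],
                  [T**2/2]])    # and Delta = [T; T^2/2]

x = np.zeros((2, 1))
traj = [x.copy()]
for k in range(5):              # unit step input, u_k = 1
    x = Phi @ x + Delta*1.0     # Eqn. 2.38
    traj.append(x.copy())

x5 = traj[5].ravel()            # velocity = 5, position = 12.5 = 5^2/2
```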

Problem 2.4   1. Evaluate Aⁿ where

        A = [1 1; 1 0]

for different values of n. What is so special about the elements of An ? (Hint: find the
eigenvalues of A.)

2. What is the determinant of An ?

3. Write a couple of M ATLAB m-functions to convert between the A, B, C and D form and the
packed matrix form.

4. Complete the state-space to transfer function conversion analytically started in §2.7.3,


Eqn. 2.53. Compare your answer with using ss2tf.

A submarine example   A third-order model of a submarine from [62, p416 Ex9.17] is

    ẋ = [0 1 0; −0.0071 −0.111 0.12; 0 0.07 −0.3] x + [0; −0.095; 0.072] u        (2.82)

where the state vector x is defined as

    x = [θ; dθ/dt; α]

The state θ is the inclination of the submarine and α is the angle of attack above the horizontal.
The scalar manipulated variable u is the deflection of the stern plane. We will assume that of the
three states, we can only actually measure two of them, θ and α, thus

    C = [1 0 0; 0 0 1]

In this example, we consider just three states x, one manipulated variable u, and two outputs, y.

The stability of the open-loop system is given by the eigenvalues of A. In M ATLAB, the command
eig(A) returns

    eig(A) = [−0.0383 + 0.07i; −0.0383 − 0.07i; −0.3343]

showing that all eigenvalues have negative real parts, indicating that the submarine is oscillatory,
though stable. In addition, the complex conjugate pair indicates that there will be some oscillation
in the response to a step change in stern plane.

We can simulate the response of our submarine system (Eqn 2.82) to a step change in stern plane
movement, u, using the step command. The smooth plot in Fig. 2.22 shows the result of this
continuous simulation, while the ‘staired’ plot shows the result of the discrete simulation using
a sample time of T = 5. We see two curves corresponding to the two outputs.

Listing 2.4: Submarine simulation

A = [0,1,0;-0.0071,-0.111,0.12;0,0.07,-0.3]; B = [0,-0.095,0.072]';
C = [1 0 0;0 0 1];

Gc = ss(A,B,C,0)   % Continuous plant, Gc(s)
dt = 5;            % sample time T = 5 (rather coarse)
Gd = c2d(Gc,dt)    % Create discrete model

step(Gc,Gd);       % Do step response and see Fig. 2.22.


Figure 2.22: Comparing the continuous (Gc) and discrete (Gd, T = 5) step responses of the
submarine model, showing the two outputs x1 and x2 against time.

Fig. 2.22 affirms that the openloop process response is stable, supporting the eigenvalue analysis.
Notice how the step command automatically selected appropriate time scales for the simulation.
How did it do this?

The steady state of the system given that u = 1 is, using Eqn 2.61,

>> u = 1;
>> xss = -A\B*u;   % Steady-state given by xss = −A⁻¹Bu
>> yss = C*xss
yss =              % See steady-state outputs in Fig. 2.22.
   -9.3239
    0.2400
which corresponds to the final values given in Fig. 2.22.

We can verify the results from the c2d routine using Eqns 2.71 and 2.72, or in this case since A is
invertible, Eqn. 2.73.

>> Phi = expm(A*5.0);                    % Don’t forget the ‘m’ in expm()
>> Delta = (Phi-eye(size(A)))*(A\B)      % ∆ = (e^{AT} − I)A⁻¹B

This script should give the same Φ and ∆ matrices as the c2d function since A is non-singular.

Given a discrete or continuous model, we can compute the response to an arbitrary input se-
quence using lsim:

t = [0:dt:100]';   % time vector
U = sin(t/20);     % Arbitrary input U(t)

lsim(Gc,U,t)       % continuous system simulation
lsim(Gd,U)         % discrete system simulation

We could explicitly demand a first-order-hold as opposed to the default zeroth order by setting
the options for c2d.

optn = c2dOptions('Method','foh') % Ask for a first-order hold


Gd2 = c2d(Gc,5,optn) % Compare the step response of this with a zeroth-order hold

Problem 2.5   By using a suitable state transformation, convert the following openloop system

    ẋ1 = x2
    ẋ2 = 10 − 2x2 + u

where the input u is the following function of the reference r,

    u = 10 + 9r − [9 4] x

into a closed loop form suitable for simulation in M ATLAB using the lsim function (i.e. ẋ =
Ax + Br). Write down the relevant M ATLAB code segment. (Note: In practice, it is advisable to
use lsim whenever possible for speed reasons.)

Problem 2.6 1. Write an m-file that returns the state space system in packed matrix form that
connects the two dynamic systems G1 and G2 together,

Gall = G1 · G2

Assume that G1 and G2 are already given in packed matrix form.


2. (Problem from [182, p126]) Show that the packed matrix form of the inverse of Eqn. 2.41,
   u = G⁻¹y, is

       G⁻¹ = [A − BD⁻¹C, −BD⁻¹; D⁻¹C, D⁻¹]                     (2.83)

   assuming D is invertible, and we have the same number of inputs as outputs.

2.8.3 The discrete state space equation with time delay

The discrete state space equation, Eqn. 2.37, does not account for any time delay or dead time
between the input u and the output x. This makes it difficult to model systems with delay.
However, by introducing new variables we can accommodate dead time or time delays. Consider
the continuous differential equation,

ẋt = Axt + But−θ (2.84)

where θ is the dead time and in this particular case, is exactly equal to 2 sample times (θ = 2T ).
The discrete time equivalent to Eqn 2.84 for a two sample time delay is

xk+3 = Φxk+2 + ∆uk (2.85)

which is not quite in our standard state-space form of Eqn. 2.38 owing to the difference in time
subscripts between u and x.

Now let us introduce a new vector of state variables, z, where

    zk ≜ [xk; xk+1; xk+2]                                      (2.86)

is sometimes known as a shift register or tapped delay line of states. Using this new state vector,
we can write the dynamic system, Eqn. 2.85 compactly as
   
xk+1 Ixk+1
zk+1 =  xk+2  =  Ixk+2  (2.87)
xk+3 Φxk+2 + ∆uk

or in standard state-space form as


   
0 I 0 0
zk+1 = 0 0 I  z k +  0  uk (2.88)
0 0 Φ ∆

This augmented system, Eqn. 2.88 (with 2 units of dead time) is now larger than the original
Eqn. 2.85 given that we have 2n extra states. Furthermore, note that the new augmented transi-
tion matrix in Eqn. 2.88 is now no longer of full rank (provided it was in the first place). If we
wish to incorporate a delay θ, of β sample times (θ = βT ), then we must introduce β dummy
state vectors into the system creating a new state vector z with dimensions (nβ × 1);
 ⊤
zk = xk xk+1 xk+2 · · · xk+β

The augmented state transition matrix Ξ and augmented control matrix Ψ are of the form

    Ξ = [0 I 0 · · · 0;
         0 0 I · · · 0;
         ⋮  ⋮  ⋮  ⋱  ⋮;
         0 0 0 · · · I;
         0 0 0 · · · Φ],        Ψ = [0; 0; ⋮; 0; ∆]            (2.89)

Now the discrete time equation is

    zk+1 = Ξzk + Ψuk                                           (2.90)

Since all the introduced states are unobservable, we have

    xk = [I 0 0 · · · 0] zk                                    (2.91)

and the output equation is amended to

    yk = C [I 0 0 · · · 0] zk                                  (2.92)

If the dead time is not an exact multiple of the sample time, then more sophisticated analysis is
required. See [70, p174]. Systems with a large value of dead time become very unwieldy as a
dummy vector is required for each multiple of the sample time. This creates large systems with
many states that possibly become numerically ill-conditioned and difficult to manipulate.
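Building Ξ and Ψ mechanically for a delay of β whole samples is straightforward. The sketch below (Python; augment_delay is our own helper, not a library function) assembles Eqn. 2.89:

```python
import numpy as np

def augment_delay(Phi, Delta, beta):
    """Assemble the augmented matrices Xi and Psi of Eqn. 2.89 for a
    dead time of beta whole sample periods."""
    n, m = Delta.shape
    N = n*(beta + 1)
    Xi = np.zeros((N, N))
    Xi[:N - n, n:] = np.eye(N - n)   # identity blocks on the super-diagonal
    Xi[N - n:, N - n:] = Phi         # Phi in the bottom-right block
    Psi = np.zeros((N, m))
    Psi[N - n:, :] = Delta           # Delta enters only the last block row
    return Xi, Psi

Phi = np.array([[1.0, 0.0], [1.0, 1.0]])   # double integrator, T = 1
Delta = np.array([[1.0], [0.5]])
Xi, Psi = augment_delay(Phi, Delta, 2)     # theta = 2T, as in Eqn. 2.88
```

For β = 2 and n = 2 this reproduces the 3-block structure of Eqn. 2.88 exactly.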

Problem 2.7 1. Simulate the submarine example (Eqn 2.82) but now introducing a dead time
of 2 sample time units.

2. Explain how the functions step and dstep manage to guess appropriate sampling rates,
and simulation horizons. Under what circumstances will the heuristics fail?
2.9. STABILITY 69

Figure 2.23: Issues in assessing system stability. For transfer functions one can use the roots of
the denominator (poles), the Routh array or Jury test, or the eigenvalues, with non-polynomial
terms handled by a Padé approximation; for nonlinear systems one can linearise, simulate, or
apply Lyapunov methods.

2.9 Stability

Stability is a most desirable characteristic of a control system. As a control designer, you should
only design control systems that are stable. For this reason, one must at least be able to analyse
a potential control system in order to see if it is indeed theoretically stable at least. Once the
controller has been implemented, it may be too late and costly to discover that the system was
actually unstable.

One definition of stability is as follows:

A system of differential equations is stable if the output is bounded for all


time for any given bounded input, otherwise it is unstable. This is referred to
as bounded input bounded output (BIBO) stability.

While for linear systems the concept of stability is well defined and relatively straightforward
to evaluate⁶, this is not the case for nonlinear systems which can exhibit complex behaviour. The
next two sections discuss the stability of linear systems. Section 2.9.4 briefly discusses techniques
that can be used for nonlinear systems.

Before any type of stability analysis can be carried out, it is important to define clearly what sort
of analysis is desired. For example is a ‘Yes/No’ result acceptable, or do you want to quantify
how close to instability the current operating point is? Is the system linear? Are the nonlinearities
‘hard’ or ‘soft’? Fig. 2.23 highlights some of these concerns.

⁶ Actually, the stability criterion can be divided into 3 categories: stable, unstable and critically (un)stable. Physically,
critical stability never really occurs in practice.



2.9.1 Stability in the continuous domain

The most important requirement for any controller is that the controller is stable. Here we de-
scribe how one can analyse a continuous transfer function to determine whether it is stable or
not.

First recall the conditions for BIBO stability. In the Laplace domain, the transfer function system
is stable if all the poles (roots of the denominator) have negative real parts. Thus if the roots of the
denominator are plotted on the complex plane, then they must all lie to the left of the imaginary
axis or in the Left Hand Plane (LHP). In other words, the time domain solution of the differential
equation must contain only e−at terms and no terms of the form e+at , assuming a is positive.

Why is the stability only determined by the denominator of the transfer function and not by the
numerator or the input?

1. First the input is assumed bounded and stable (by BIBO definition).

2. If the transfer function is expanded as a sum of partial fractions, then the time solution
terms will be comprised of elements that are summed together. For the system to be un-
stable, at least one of these individual terms must be unstable. Conversely if all the terms
are stable themselves, then the summation of these terms will also be stable. The input will
also contribute separate fractions that are summed, but these are assumed to be stable by
definition, (part 1).

Note that it is assumed for the purposes for this analysis, that the transfer function is written in
its simplest form, that it is physically realisable, and that any time delays are expanded as a Padé
approximation.

In summary to establish the stability of a transfer function, we could factorise the denomina-
tor polynomial using a computer program to compute numerically the roots such as M ATLAB’s
roots routine. Alternatively, we can just hunt for the signs of the real part of the roots, a much
simpler operation, and this is the approach taken by the Routh and Jury tests described next.

Routh stability criterion

The easiest way to establish the absolute stability of a transfer function is simply to extract the
roots of the denominator, perhaps using M ATLAB’s roots command. In fact this is overkill,
since to establish absolute stability we need only know of the presence of any roots with positive
real parts, not their actual values.

Aside from efficiency, there are two cases where this strategy falls down. One is where we do not
have access to M ATLAB or the polynomial is particularly ill-conditioned such that simple root
extraction techniques fail for numerical reasons, and the other case is where we may have a free
parameter such as a controller gain in the transfer function. This has prompted the development
of simpler exact algebraic methods such as the Routh array or Lyapunov’s method to assess
stability.

The Routh criterion states that all the roots of the polynomial characteristic equation,

    an sⁿ + an−1 sⁿ⁻¹ + · · · + a1 s + a0 = 0                  (2.93)

have negative real parts if and only if all the elements of the first column of the Routh table have
the same sign. Otherwise the number of sign changes is equal to the number of right-hand plane
roots.

The Routh table is defined starting with the coefficients of the characteristic polynomial, Eqn. 2.93,

    sⁿ    | an    an−2   an−4   · · ·
    sⁿ⁻¹  | an−1  an−3   an−5   · · ·
    sⁿ⁻²  | b1    b2     b3     · · ·
    sⁿ⁻³  | c1    c2     c3     · · ·
    ⋮     | ⋮     ⋮
and where the new entries b and c are defined as


an−1 an−2 − an an−3 an−1 an−4 − an an−5
b1 = , b2 = , etc.
an−1 an−1
b1 an−3 − an−1 b2 b1 an−5 − an−1 b3
c1 = , c2 = , etc.
b1 b1

The Routh table is continued until only zeros remain.
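For numeric coefficients the tabulation is easy to automate. The sketch below (Python; a bare-bones version that does not handle the degenerate zero-pivot cases) returns the first column, from which the sign changes count the right-half-plane roots:

```python
import numpy as np

def routh_first_column(coeffs):
    """First column of the Routh table for a polynomial given by its
    coefficients in descending powers of s. No handling of the
    special zero-pivot cases."""
    coeffs = np.asarray(coeffs, dtype=float)
    n = len(coeffs)
    width = (n + 1)//2
    table = np.zeros((n, width))
    table[0, :len(coeffs[0::2])] = coeffs[0::2]   # a_n, a_{n-2}, ...
    table[1, :len(coeffs[1::2])] = coeffs[1::2]   # a_{n-1}, a_{n-3}, ...
    for i in range(2, n):
        for j in range(width - 1):
            table[i, j] = (table[i-1, 0]*table[i-2, j+1]
                           - table[i-2, 0]*table[i-1, j+1])/table[i-1, 0]
    return table[:, 0]

col = routh_first_column([1, 3, 3, 2, 1])   # s^4 + 3s^3 + 3s^2 + 2s + 1
```

Here all entries of col are positive, so this polynomial has no right-half-plane roots.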

The Routh array is only applicable for continuous time systems, but for discrete time systems,
the Jury test can be used in a manner similar to the Routh table for continuous systems. The
construction of the Jury table is slightly more complicated than the Routh table, and is described
in [60, 117–118].

The Routh array is most useful when investigating the stability as a function of a variable such as
controller gain. Rivera-Santos has written a M ATLAB routine to generate the Routh Array with
the possibility of including symbolic variables.7

Suppose we wish to determine the range of stability for a closed loop transfer function with
characteristic equation

    s⁴ + 3s³ + 3s² + 2s + K = 0

as a function of the gain K.⁸

Listing 2.5: Example of the Routh array using the symbolic toolbox
>> syms K                       % Define symbolic gain K
>> ra = routh([1 3 3 2 K],K)    % Build Routh array for s⁴ + 3s³ + 3s² + 2s + K
ra =
[        1, 3, K]
[        3, 2, 0]
[      7/3, K, 0]
[ -9/7*K+2, 0, 0]
[        K, 0, 0]

Since all the elements in the first column of the table must be positive, we know that for stability
−(9/7)K + 2 > 0 and that K > 0, or
    0 < K < 14/9
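We can cross-check the first-column condition −(9/7)K + 2 > 0, i.e. 0 < K < 14/9 ≈ 1.56, by extracting the roots directly. A quick sketch (Python, with numpy.roots playing the role of MATLAB's roots):

```python
import numpy as np

def stable(K):
    """True if s^4 + 3s^3 + 3s^2 + 2s + K has all its roots in the LHP."""
    return bool(np.all(np.roots([1, 3, 3, 2, K]).real < 0))

verdicts = {K: stable(K) for K in (0.5, 1.0, 1.5, 1.6, 2.0)}
```

The stable/unstable boundary falls between K = 1.5 and K = 1.6, consistent with the Routh result 14/9.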

Stability and time delays

Time delay or dead time does not affect the stability of the open loop response since it does not
change the shape of the output curve, but only shifts it to the right. This is due to the non-
polynomial term e^{−sθ} in the numerator of the transfer function. Hence dead time can be ignored
7 The routine, routh.m, is available from the MathWorks user’s group collection at
www.mathworks.com/matlabcentral/fileexchange/.
8 Problem adapted from [150, p288]

for stability considerations for the open loop. However in the closed loop the dead time term ap-
pears in both the numerator and the denominator and now does affect the stability characteristics
and can now no longer be ignored. Since dead time is a nonpolynomial term, we cannot simply
find the roots of the denominator, since now there are an infinite number of them, but instead we
must approximate the exponential term with a truncated polynomial such as a Padé approxima-
tion and then apply either a Routh array in the continuous time case, or a Jury test in the discrete
time case.

Note however that the Nyquist stability criterion yields exact results, even for those systems with
time delays. The drawback is that the computation is tedious without a computer.

Problem 2.8   Given

    G(s) = 15(4s + 1) e^{−θs} / ((3s + 1)(7s + 1))

find the value of deadtime θ such that the closed-loop is just stable, using both a Padé approxi-
mation and a Bode or Nyquist diagram.

A first order Padé approximation is

    e^{−θs} ≈ (1 − (θ/2)s) / (1 + (θ/2)s)
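The quality of this approximation can be judged on the imaginary axis, where the Bode and Nyquist analyses live. A short sketch (Python, illustrative only) shows the first-order Padé term is all-pass (magnitude 1, like the true dead time) and matches the exact phase −ωθ only for small ωθ:

```python
import numpy as np

theta = 1.0
w = np.array([0.1, 1.0, 5.0])            # frequencies, rad/s

exact = np.exp(-1j*w*theta)              # true dead time e^{-i w theta}
pade = (1 - 1j*w*theta/2)/(1 + 1j*w*theta/2)

phase_err = np.angle(pade) - (-w*theta)  # exact phase is -w*theta
```

At ωθ = 0.1 the phase error is tiny, but at ωθ = 5 the phases differ markedly, which is why higher-order Padé terms are needed near the crossover frequency.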

2.9.2 Stability of the closed loop

Prior to the days of cheap computers that can readily manipulate polynomial expressions and
extract the roots of polynomials, one could infer the stability of the closed loop from a Bode or
Nyquist diagram of the open loop.

The openloop system

    G(s) = 100(s + 2) e^{−0.2s} / ((s + 5)(s + 4)(s + 3))      (2.94)

is clearly stable, but the closed loop system, G/(1 + G), may or may not be stable. If we substitute
s = iω, and compute the complex quantity G(iω) as a function of ω, we can apply either the Bode
stability criteria, or the equivalent Nyquist stability criteria to establish the stability of the closed
loop without deriving an expression for the closed loop, and subsequently solving for the closed
loop poles.

The Bode diagram consists of two plots; the magnitude of G(iω) versus ω, and the phase of
G(iω) versus ω. Alternatively we could plot the real and imaginary parts of G(iω) as a function
of frequency, but this results in a three dimensional plot. Such a plot for the system given in
Eqn. 2.94 is given in Fig. 2.24(a).

However three dimensional plots are difficult to manipulate, so we normally ignore the fre-
quency component, and just view the shadow of the plot on the real/imaginary plane. The
two dimensional Nyquist curve corresponding to Fig. 2.24(a) is given in Fig. 2.24(b). It is evident
from either plot that the curve does encircle the critical (−1, 0i) point, so the closed loop system
will be unstable.

Establishing the closed loop stability using the Nyquist criteria is slightly more general than when
using the Bode criteria. This is because systems that exhibit non-monotonically decreasing
curves may cross the critical lines more than once, leading to misleading results. An interesting
example from [80] is used to illustrate this potential problem.

Figure 2.24: Nyquist diagram of Eqn. 2.94 in (a) three dimensions and (b) as typically presented
in two dimensions. (a) The Nyquist diagram with frequency information presented in the third
dimension. A ‘flagpole’ is drawn at (−1, 0i); since the curve encircles the flagpole, the closed
loop is unstable. (b) A 2D Nyquist diagram which shows the closed loop is unstable since it
encircles the (−1, 0i) point. In this case both positive and negative frequencies are plotted.

2.9.3 Stability of discrete time systems

The definition of stability for a discrete time system is similar to that of the continuous time
system except that the definition is only concerned with the values of the output and input at the
sample times. Thus it is possible for a system to be stable at the sample points, but be unstable
between the sample points. This is called a hidden oscillation, although it rarely occurs in practice.
Also note that the stability of a discrete time system is dependent on the sample time, T .

Recall that since by definition z = e^{sT}, the poles pi of the continuous transfer function map to
e^{pi T} in discrete space at sample time T . If pi is negative, then e^{pi T} will be less than
1.0. If pi is positive, then the corresponding discrete pole zi will be larger than 1.0. Strictly, if pi
is complex, then it is the magnitude of zi that must be less than 1.0 for the system to be stable.
If the discrete poles zi are plotted on the complex plane, and they all lie within a circle centred
about the origin with a radius of 1, then the system is stable. This circle of radius 1 is called the
unit circle.

The discrete time transfer function G(z) is stable if G(z) has no discrete poles
on, or outside, the unit circle.
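The mapping zi = e^{pi T} can be verified numerically: the eigenvalues of Φ = e^{AT} are exactly the exponentials of the eigenvalues of A. A sketch in Python (illustrative only), using the submarine model of Eqn. 2.82:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[ 0.0,     1.0,   0.0 ],
              [-0.0071, -0.111, 0.12],
              [ 0.0,     0.07, -0.3 ]])
T = 5.0

p = np.linalg.eigvals(A)           # continuous poles, all with Re(p) < 0
z = np.exp(p*T)                    # mapped discrete poles
zd = np.linalg.eigvals(expm(A*T))  # eigenvalues of Phi agree with z
```

Since all the continuous poles lie in the left half plane, all the discrete poles lie inside the unit circle.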

Now the stability evaluation procedure is the same as that for the continuous case. One simply
factorises the characteristic equation of the transfer function and inspects the roots. If any of the
roots zi lie outside the unit circle, the system will be unstable. If any of the poles lie exactly on
the unit circle, there can be some ambiguity about the stability, see [119, p120]. For example the
transfer function with poles on the unit circle,

    G(z) = 1/(z² + 1)

given a bounded input such as a step, u(k) = {1, 1, 1, 1 . . .} produces

y(k) = {0, 0, 1, 1, 0, 0, 1, 1, 0, 0, . . .}

which is bounded and stable. However if we try a different bounded input such as a cycle u(k) =
{1, 0, −1, 0, 1, 0, −1, . . .} we observe

y(k) = {0, 0, 1, 0, −2, 0, 3, 0, −4, 0, 5, 0, −6, 0, . . .}

which is unbounded.

Just as in the continuous case, we can establish the absolute stability of a discrete transfer function
without explicitly solving for the roots of the denominator D(z) = 0, which, in the days before
computers was a main consideration. The main methods for discrete systems are the Jury test,
the bilinear transform coupled with the Routh array, and Lyapunov’s method. All these methods
are analytical and hence exact. The Jury test is the discrete equivalent to the Routh array and
is covered in Ogata [148, p242]. Lyapunov’s method has the special distinction that it is also
applicable to nonlinear ODEs and is covered in §2.9.4.

Suppose we wish to establish the stability of the discretised version of

    G(s) = 1/(6s² + 5s + 1)

at a sampling period of T = 2 and a first-order hold. The discrete transfer function is

    G(z) = (0.0746 + 0.2005z⁻¹ + 0.0324z⁻²)/(1 − 0.8813z⁻¹ + 0.1889z⁻²)

with discrete poles given by the solution of

    1 − 0.8813z⁻¹ + 0.1889z⁻² = 0

or z = 0.5134 and z = 0.3679.

>> G = tf(1,[6 5 1]);
>> Gd = c2d(G,2,'foh')
>> pole(Gd)
ans =
    0.5134
    0.3679
>> pzmap(Gd); axis('equal');

Since both discrete poles lie inside the unit circle, the discretised transfer function with a first-
order hold is still stable.
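The same numbers can be reproduced outside MATLAB; for instance SciPy's cont2discrete also offers a first-order hold. The following cross-check (Python, with loose tolerances since the coefficients above are quoted to four decimal places) recovers both the coefficients and the poles:

```python
import numpy as np
from scipy.signal import cont2discrete

# G(s) = 1/(6s^2 + 5s + 1) discretised at T = 2 with a first-order hold
numd, dend, dt = cont2discrete(([1.0], [6.0, 5.0, 1.0]), 2.0, method='foh')

zpoles = np.roots(dend)   # discrete poles of G(z)
```

Note the discrete poles are simply e^{sT} of the continuous poles s = −1/2 and −1/3; the choice of hold affects the zeros, not the poles.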

If you are using the state space description, then simply check the eigenvalues of the homoge-
neous part of the equation (the A or Φ matrix).

2.9.4 Stability of nonlinear differential equations

Despite the fact that nonlinear differential equations are in general very difficult, if not impossible
to solve, one can attempt to establish the stability without needing to find the solution. Studies of
this sort fall into the realm of nonlinear systems analysis which demands a high degree of math-
ematical insight and competence. [187] is a good introductory text for this subject that avoids
much of the sophisticated mathematics.

Two methods due to the Russian mathematician Lyapunov9 address this nonlinear system sta-
bility problem. The indirect or linearisation method is based on the fact that the stability near
⁹ Some authors, e.g. Ogata, tend to spell it as ‘Liapunov’. Note however that M ATLAB uses the lyap spelling.

the equilibrium point will be closely approximated to the linearised approximation at the point.
However it is the second method, or direct method that being exact is a far more interesting and
powerful analysis. Since the Lyapunov stability method is applicable for the general nonlinear
differential equation, it is of course applicable for linear systems as well.

Figure 2.25: “Alexander Mikhailovich Liapunov (1857–1918) was a Russian mathematician and mechanical engineer. He had the very rare merit of producing a doctoral dissertation of lasting value. This classic work was originally published in 1892 in Russian but is now available in an English translation, Stability of Motion, Academic Press, NY, 1966. Liapunov died by violence in Odessa, which cannot be considered a surprising fate for a middle class intellectual in the chaotic aftermath of the Russian revolution.” Excerpt from Differential Equations with Applications and Historical Notes, Simmons, p465.

The Lyapunov stability theorem says that if we have a differential system

    ẋ = f (x, t)        (2.95)

where f (0, t) = 0 for all t, and if there exists a scalar function V (x) having continuous first partial derivatives such that V (x) is positive definite and the time derivative V̇ (x) is negative definite, then the equilibrium position at the origin is uniformly asymptotically stable. If in addition V (x) → ∞ as ‖x‖ → ∞, then it is stable “in the large”. V (x) is called a Lyapunov function of the system.

If the Lyapunov function is thought of as the total energy of the system, then we know that the total energy must always be positive, hence the restriction that V (x) is positive definite, and also, for the system to be asymptotically stable, this energy must slowly die away with time, hence the requirement that V̇ (x) < 0. The hard part about this method is finding a suitable Lyapunov function in the first place. Testing the conditions is easy, but note that if your particular candidate V (x) fails the requirements for stability, this means either that your system is actually unstable, or that you just have not found the right Lyapunov function yet. The problem is that you don't know which. In some cases, particularly vibrating mechanical ones, good candidate Lyapunov functions can be found by using the energy analogy; however in other cases this may not be productive.

Algorithm 2.2 Lyapunov stability analysis of nonlinear continuous and discrete systems.

The second method of Lyapunov is applicable for both continuous and discrete systems, [187, p65], [148, p557].

Continuous differential equations. Given a nonlinear differential equation,

    ẋ = f (x, t)

and a scalar function V (x), then if

1. V (x) is positive definite, and
2. V̇ (x), the derivative of V (x), is negative definite, and
3. V (x) → ∞ as ‖x‖ → ∞

then the system ẋ = f (x, t) is globally asymptotically stable.

Discrete difference equations. Given a nonlinear difference equation,

    xₖ₊₁ = f (xₖ),    f (0) = 0

and a scalar function V (x), continuous in x, then if

1. V (x) > 0 for x ≠ 0 (or V (x) is positive definite), and
2. ∆V (x), the difference of V (x), is negative definite, where

    ∆V (xₖ) = V (xₖ₊₁) − V (xₖ) = V (f (xₖ)) − V (xₖ)

and
3. V (x) → ∞ as ‖x‖ → ∞

then the system xₖ₊₁ = f (xₖ) is globally asymptotically stable.

Example of assessing the stability of a continuous nonlinear system using Lyapunov's second method.

Consider the dynamic equations from Ogata [150, Ex 9-18, p729]

    ẋ₁ = x₂ − x₁ (x₁² + x₂²)
    ẋ₂ = −x₁ − x₂ (x₁² + x₂²)

and we want to establish if the system is stable or not. The system is nonlinear and the origin is the only equilibrium state. We can propose a trial Lyapunov function V (x) = x₁² + x₂², which is positive definite. (Note that that was the hard bit!) Now differentiating gives

    V̇ (x) = 2x₁ dx₁/dt + 2x₂ dx₂/dt = −2 (x₁² + x₂²)²

which is negative definite. Since V (x) → ∞ as ‖x‖ → ∞, the system is “stable in the large”. Relevant problems in [150, p771] include B-9-15, B-9-16 and B-9-17.
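As a quick numerical sanity check (a Python/SciPy sketch, with an arbitrarily chosen initial condition), we can integrate this system and confirm that V (x) = x₁² + x₂² decays monotonically along the trajectory:

```python
import numpy as np
from scipy.integrate import solve_ivp

# x1' = x2 - x1*(x1^2 + x2^2),  x2' = -x1 - x2*(x1^2 + x2^2)
def f(t, x):
    r2 = x[0]**2 + x[1]**2
    return [x[1] - x[0]*r2, -x[0] - x[1]*r2]

sol = solve_ivp(f, [0, 100], [1.5, -0.8], rtol=1e-9, atol=1e-12)
V = sol.y[0]**2 + sol.y[1]**2       # Lyapunov function along the trajectory

print(np.all(np.diff(V) < 1e-12))   # True: V never increases
print(V[-1] < 0.01)                 # True: the state approaches the origin
```

Since V̇ = −2V², the decay is algebraic rather than exponential, which is why a long integration horizon is needed here.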

The problem with assessing the stability of nonlinear systems is that the stability can change depending on the values of the states and/or inputs. This is not the case for linear systems, where the stability is a function of the system, not the operating point. An important and relevant chemical engineering example of a nonlinear system that can be either stable or unstable is a continuously stirred tank reactor. The reactor can be operated in either the stable or the unstable regime depending on the heat transfer characteristics. The stability of such reactors is important for exothermic reactions, since thermal runaway is potentially very hazardous.

Stability of linear time invariant systems using Lyapunov

Sections 2.9.1 and 2.9.3 established stability criteria for linear systems. In addition to these meth-
ods, the second method of Lyapunov can be used since the linear system is simply a special case
of the general nonlinear system. The advantages of using the method of Lyapunov are:

1. The Lyapunov method is analytical even for nonlinear systems, and so the criterion is, in principle, exact.

2. It is computationally simple and robust, and does not require the solution of the differential equation.

3. We do not need to extract eigenvalues of an n × n matrix, which is an expensive numerical computation (although we may need to check the definiteness of a matrix, which does require at least a Cholesky decomposition).

Of course similar advantages apply to the Routh array procedure, but unlike the Routh array,
this approach can be extended to nonlinear problems.

To derive the necessary condition for stability of ẋ = Ax using the method of Lyapunov, we follow the procedure outlined in Algorithm 2.2 where we choose a possible Lyapunov function in quadratic form,

    V (x) = xᵀPx

The Lyapunov function will be positive definite if the matrix P is positive definite. The time derivative is given by

    V̇ (x) = ẋᵀPx + xᵀPẋ
          = (Ax)ᵀPx + xᵀP(Ax)
          = xᵀ(AᵀP + PA)x
          = −xᵀQx

where we have defined Q = −(AᵀP + PA). So if −Q is negative definite, or alternatively Q is positive definite, Q ≻ 0, then the system is stable at the origin and hence asymptotically stable in the large. Note that this solution procedure is analytical.

Unfortunately there is no cheap and easy computational way to establish if a matrix is positive definite or not, but using MATLAB we would usually attempt a Cholesky factorisation. Apart from the difficulty in deciding if a matrix is positive definite or not, we also have the problem that a failure of Q to be positive definite does not necessarily imply that the original system is unstable. All it means is that if the original system does turn out to be stable, then the postulated P was actually not a Lyapunov function.

A solution procedure that avoids us testing many different Lyapunov candidates is to proceed in reverse; namely, choose an arbitrary symmetric positive definite Q (say the identity matrix), and then solve

    AᵀP + PA = −Q        (2.96)

for the matrix P. Eqn. 2.96 is known as the matrix Lyapunov equation, while the more general form, AP + PB = −Q, is known as Sylvester's equation.

One obvious way to solve for the symmetric matrix P in Eqn. 2.96 is by equating the coefficients, which results in a system of n(n + 1)/2 linear algebraic equations. It is convenient in MATLAB to use the Kronecker tensor product to set up the equations and then use MATLAB's backslash operator to solve them. This has the advantage that it is expressed succinctly in MATLAB, but it has poor storage and algorithmic characteristics and is recommended only for small dimensioned problems. Section 2.9.5 explains Kronecker tensor products and this analytical solution strategy in further detail.

Listing 2.6: Solve the continuous matrix Lyapunov equation using Kronecker products

n = size(Q,1);  % Solve A'P + PA = -Q, with Q = Q'
I = eye(n);     % Identity I_n, the same size as A
P = reshape(-(kron(I,A')+kron(A',I))\Q(:),n,n); % (I ⊗ A' + A' ⊗ I) vec(P) = -vec(Q)
P = (P+P')/2;   % force symmetric (hopefully unnecessary)

The final line, while strictly unnecessary, simply forces the computed result to be symmetric, which of course it should be in the first place.
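The same Kronecker-product construction carries over directly to Python/NumPy; a sketch follows (note that NumPy needs column-major, i.e. Fortran-order, flattening to match the column-stacking vec(·) convention):

```python
import numpy as np

def lyap_kron(A, Q):
    """Solve A'P + PA = -Q via (I kron A' + A' kron I) vec(P) = -vec(Q)."""
    n = A.shape[0]
    I = np.eye(n)
    G = np.kron(I, A.T) + np.kron(A.T, I)
    p = np.linalg.solve(G, -Q.flatten(order='F'))   # column-major stacking
    P = p.reshape(n, n, order='F')
    return (P + P.T)/2                              # force symmetry

A = np.array([[0.0, 1.0], [-1.0, -0.4]])
P = lyap_kron(A, np.eye(2))
print(P)                                          # [[2.7, 0.5], [0.5, 2.5]]
print(np.allclose(A.T @ P + P @ A, -np.eye(2)))   # True
```

The test matrix here is the same one used with lyap later in this section, so the two results can be compared directly.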

Less restrictive conditions on the form of Q exist that could make the solution procedure easier to do manually; these are discussed in problem 9–18 of Ogata [150, p766], and in [148, p554]. Note however that, using MATLAB, we would not normally bother with these modifications.

The CONTROL TOOLBOX for MATLAB includes the function lyap that solves the matrix Lyapunov equation, and which is more numerically robust and efficient than equating the coefficients or using the Kronecker tensor product method given on page 77. However note that the definition of the Lyapunov matrix equation used by MATLAB,

    AP + PAᵀ = −Q        (2.97)

is not exactly the same as that defined by Ogata and used previously in Eqn. 2.96. (See the MATLAB help file for clarification of this and compare with example 9–20 in [150, p734].)

Suppose we wish to establish the stability of the continuous system

    ẋ = Ax = [ 0  1; −1  −0.4 ] x

using lyap, as opposed to say extracting eigenvalues.

Now we wish to solve AᵀP + PA = −Q for P, where Q is some conveniently chosen positive definite matrix such as, say, Q = I. Since MATLAB's lyap function solves the equivalent matrix equation given by Eqn. 2.97, we must pass Aᵀ rather than A as the first argument to lyap.

Listing 2.7: Solve the matrix Lyapunov equation using the lyap routine

>> A = [0,1;-1,-0.4];
>> P = lyap(A',eye(size(A)))  % Use A' instead of A
P =
    2.7000    0.5000
    0.5000    2.5000
>> A'*P + P*A + eye(size(A))  % Does this equal zero?
ans =
  1.0e-015 *
         0   -0.4441
   -0.4441    0.2220

>> Q = eye(size(A));  % alternative method
>> n = max(size(Q));  % solve A'P + PA = -Q where Q ≻ 0
>> I = eye(n);
>> P = reshape(-(kron(I,A')+kron(A',I))\Q(:),n,n)
P =
    2.7000    0.5000
    0.5000    2.5000

Both methods return the same matrix P,

    P = [ 2.7  0.5; 0.5  2.5 ]

which has leading minor determinants of 2.7 and 6.5. Since both are positive, following Sylvester's criterion, P is positive definite and so the system is stable.

We can use Sylvester's criterion to establish if P is positive definite or not, and hence the system's stability. To establish if a symmetric matrix is positive definite, the easiest way is to look at the eigenvalues. If they are all positive, the matrix is positive definite. Unfortunately we were originally trying to avoid solving for the eigenvalues, so this defeats the original purpose somewhat. Another efficient strategy to check for positive definiteness with MATLAB is to attempt a Cholesky decomposition using the [R,p]=chol(A) command. The decomposition will be successful if A is positive definite, and will terminate early with a suitable error message if not. Refer to the help file for further details.
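The same Cholesky-based test is easy to sketch in Python/NumPy, where a failed factorisation raises an exception rather than returning a flag:

```python
import numpy as np

def is_pos_def(P):
    """Positive definiteness test via an attempted Cholesky factorisation."""
    try:
        np.linalg.cholesky(P)
        return True
    except np.linalg.LinAlgError:
        return False

print(is_pos_def(np.array([[2.7, 0.5], [0.5, 2.5]])))  # True -> stable system
print(is_pos_def(np.array([[1.0, 2.0], [2.0, 1.0]])))  # False (eigenvalues 3 and -1)
```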

As a final check, we could evaluate the eigenvalues of A in MATLAB by typing eig(A).

Discrete time systems

For linear discrete time systems, the establishment of stability is similar to the continuous case given previously. Given the discrete time system xₖ₊₁ = Φxₖ, we will again choose a positive definite quadratic function for the trial Lyapunov function, V (x) = xᵀPx where the matrix P is positive definite, and compute the forward difference

    ∆V (x) = V (xₖ₊₁) − V (xₖ)
           = xₖ₊₁ᵀPxₖ₊₁ − xₖᵀPxₖ
           = (Φxₖ)ᵀPΦxₖ − xₖᵀPxₖ
           = xₖᵀ(ΦᵀPΦ − P)xₖ

If the matrix Q,

    Q = −(ΦᵀPΦ − P)        (2.98)

is positive definite, then the system is stable.

The reverse solution procedure for P given Q and Φ is analogous to the continuous time case
given in Listing 2.6 using the Kronecker tensor product.

Listing 2.8: Solve the discrete matrix Lyapunov equation using Kronecker products
n = max(size(Q));  % Solve Φ'PΦ - P = -Q, with Q ≻ 0
I = eye(n);        % identity
P = reshape((kron(I,I)-kron(Phi',Phi'))\Q(:),n,n); % (I - Φ' ⊗ Φ') vec(P) = vec(Q)
P = (P+P')/2;      % force symmetric (hopefully unnecessary)

Once again, the more efficient dlyap routine from the CONTROL TOOLBOX can also be used. In fact dlyap simply calls lyap after some minor pre-processing.

We can demonstrate this by establishing the stability of

    xₖ₊₁ = [ 3  −0.5; 0  0.8 ] xₖ

In this case, since Φ is upper triangular, we can read the eigenvalues by inspection (they are the diagonal elements, 3 and 0.8). Since one of the eigenvalues is outside the unit circle, we know immediately that the process is unstable. However, following the method of Lyapunov,

>> Phi = [3, -0.5; 0, 0.8];
>> Q = eye(size(Phi));  % Ensure Q ≻ 0
>> P = dlyap(Phi',Q)    % Note Φ is transposed.
P =
   -0.1250   -0.1339
   -0.1339    2.9886
>> [R,rc] = chol(P)     % Is P positive definite?
R =
     []
rc =
     1                  % No it isn't, so the system is unstable!

we verify that result.
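The same computation can be sketched in Python/SciPy; note that scipy.linalg.solve_discrete_lyapunov solves AXAᵀ − X = −Q, so we again pass Φᵀ:

```python
import numpy as np
from scipy import linalg

Phi = np.array([[3.0, -0.5], [0.0, 0.8]])
Q = np.eye(2)

P = linalg.solve_discrete_lyapunov(Phi.T, Q)   # solves Φ'PΦ - P = -Q
print(P)                                       # ~ [[-0.125, -0.1339], [-0.1339, 2.9886]]
print(np.min(np.linalg.eigvalsh(P)) < 0)       # True: P is not positive definite
print(np.max(np.abs(np.linalg.eigvals(Phi))))  # a pole outside the unit circle
```

Since P has a negative eigenvalue it cannot be positive definite, which agrees with the chol-based conclusion above.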

We can check the definiteness of P using Sylvester's criterion [150, pp924-926], by using the chol function as in the above example, or even by finding the eigenvalues of P. We can then compare the stability criteria thus established with those obtained by solving for the eigenvalues of Φ.

Algorithm 2.3 Stability of linear systems using Lyapunov


We wish to establish the absolute stability of the dynamic linear continuous system ẋ = Ax, or the discrete counterpart xₖ₊₁ = Φxₖ.

1. Choose a convenient positive definite n × n matrix Q, say the identity, In×n, then

2. Solve either
       AᵀP + PA = −Q
   in the case of a continuous system, or
       ΦᵀPΦ − P = −Q
   in the case of a discrete system, for P, by either equating the coefficients or using lyap or dlyap.

3. The Lyapunov function is V (x) = xᵀPx, so check the definiteness of P. The system is stable if and only if P is positive definite. Check using Sylvester's criterion or attempt a Cholesky factorisation.

2.9.5 Expressing matrix equations succinctly using Kronecker products

The strategy employed in Listings 2.6 and 2.8 to solve the Lyapunov equation used Kronecker
products and vectorisation or stacking to express the matrix equation succinctly. This made it easy
to solve since the resulting expression was now reformulated into a system of linear equations
which can be solved using standard linear algebra techniques. Further details on the uses and
properties of Kronecker products (or tensor products) are given in [88, p256], [115, Chpt 13]
and [128]. The review in [34] concentrates specifically on Kronecker products used in control
applications.

The Kronecker product, given the symbol ⊗, of two arbitrarily sized matrices, A ∈ ℜ^(m×n) and B ∈ ℜ^(p×q), results in a new (large) matrix

    A ⊗ B = [ a₁₁B  ···  a₁ₙB ]
            [  ⋮     ⋱    ⋮   ]  ∈ ℜ^(mp×nq)        (2.99)
            [ aₘ₁B  ···  aₘₙB ]

of size (mp × nq). In MATLAB we would write kron(A,B). Note that in general A ⊗ B ≠ B ⊗ A.

The vectorisation of a matrix A is an operation that converts a rectangular matrix into a single column by stacking the columns of A on top of each other. In other words, if A is an (n × m) matrix, then vec(A) is the resulting (nm × 1) column vector. In MATLAB we convert block matrices or row vectors to columns simply using the colon operator, as in A(:).

We can combine this vectorisation operation with Kronecker products to express matrix multiplication as a linear transformation. For example, for the two matrices A and B of compatible dimensions,

    vec(AB) = (I ⊗ A) vec(B)
            = (Bᵀ ⊗ I) vec(A)        (2.100)

and for the three matrices A, B and C of compatible dimensions,

    vec(ABC) = (Cᵀ ⊗ A) vec(B)
             = (I ⊗ AB) vec(C)        (2.101)
             = (CᵀBᵀ ⊗ I) vec(A)

Table II in [34] summarises these and many other properties of the algebra of Kronecker products and sums.
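These vectorisation identities are easy to verify numerically; here is a small Python/NumPy sketch, with vec(·) implemented as column-major flattening and random matrices chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

vec = lambda M: M.flatten(order='F')   # vec(.): stack the columns

lhs = vec(A @ B)
print(np.allclose(lhs, np.kron(np.eye(2), A) @ vec(B)))   # vec(AB) = (I ⊗ A) vec(B)
print(np.allclose(lhs, np.kron(B.T, np.eye(3)) @ vec(A))) # vec(AB) = (B' ⊗ I) vec(A)
```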

This gives us an alternative way to express matrix expressions such as the Sylvester equation AX + XB = Q, where we wish to solve for the matrix X given the matrices A and B. In this case, using Eqn. 2.100, we can write

    vec(AX + XB) = (I ⊗ A + Bᵀ ⊗ I) vec(X) = vec(Q)

which is in the form of a system of linear equations Gx = q, where the vectors x and q are simply the stacked columns of the matrices X and Q, and the matrix G is given by (I ⊗ A + Bᵀ ⊗ I). We first solve for the unknown vector x using say x = G⁻¹q or some numerically sound equivalent, and then we reassemble the matrix X by un-stacking the columns from x. Of course this strategy is memory intensive because the size of the matrix G is (n² × n²). However [34] describes some modifications to this approach to reduce the dimensionality of the problem.
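A sketch of this vectorisation strategy applied to a Sylvester equation AX + XB = Q in Python/NumPy, cross-checked against the dedicated scipy.linalg.solve_sylvester routine (the random matrices here are illustrative only):

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(1)
n = 3
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
Q = rng.standard_normal((n, n))

# vec(AX + XB) = (I ⊗ A + B' ⊗ I) vec(X) = vec(Q), column-major stacking
G = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(n))
X = np.linalg.solve(G, Q.flatten(order='F')).reshape(n, n, order='F')

print(np.allclose(A @ X + X @ B, Q))                    # True
print(np.allclose(X, linalg.solve_sylvester(A, B, Q)))  # True
```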

Using fsolve to solve matrix equations

The general nonlinear algebraic equation solver fsolve within the OPTIMISATION TOOLBOX has the nice feature that it can solve matrix equations such as the continuous time Lyapunov equation directly. Here we wish to find the square matrix X such that

    AX + XAᵀ + Q = 0

given an arbitrary square matrix A and a positive definite matrix Q. For an initial estimate for X we should start with a positive definite matrix such as I. We can compare this solution with the one generated by the dedicated lyap routine.

[Figure: two pole-region diagrams. In the continuous s-plane the stable region lies to the left of the imaginary axis; in the discrete z-plane the stable region is the interior of the unit circle.]

Figure 2.26: Regions of stability for the poles of continuous (left) and discrete (right) systems

>> n = 3;
>> A = rand(n);                % Create a 'random' matrix A of dimensions (n × n)
>> Q1 = randn(n); Q = Q1'*Q1;  % Create a 'random' positive definite matrix Q

>> LyEqn = @(X) A*X+X*A'+Q;    % Matrix equation to be solved: AX + XA' + Q = 0

>> X = fsolve(LyEqn, eye(n))   % Solve the Lyapunov equation using fsolve
X =
   -0.8574   -0.6776    1.0713
   -0.6776    2.7742   -2.6581
    1.0713   -2.6581    0.7611

>> Xx = lyap(A,Q)              % Solve the Lyapunov equation using the control toolbox
Xx =
   -0.8574   -0.6776    1.0713
   -0.6776    2.7742   -2.6581
    1.0713   -2.6581    0.7611
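The same idea can be sketched with scipy.optimize.fsolve, which expects a flattened vector residual rather than a matrix-valued function (the random test matrices below are illustrative only):

```python
import numpy as np
from scipy import linalg, optimize

rng = np.random.default_rng(2)
n = 3
A = rng.random((n, n))
Q1 = rng.standard_normal((n, n)); Q = Q1.T @ Q1   # positive definite Q

# Residual of the matrix equation A X + X A' + Q = 0, flattened for fsolve
def resid(x):
    X = x.reshape(n, n)
    return (A @ X + X @ A.T + Q).ravel()

X = optimize.fsolve(resid, np.eye(n).ravel()).reshape(n, n)
print(np.allclose(A @ X + X @ A.T, -Q, atol=1e-6))                       # True
print(np.allclose(X, linalg.solve_continuous_lyapunov(A, -Q), atol=1e-6))  # True
```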

2.9.6 Summary of stability analysis

The stability of a linear continuous time transfer function is determined by the sign of the real part of the poles. For the transfer function to be stable, all the poles must lie strictly in the left-hand plane. In the discrete case, the stability is determined by the magnitude of the possibly complex poles. To be stable, the discrete poles must lie within the unit circle. (See Fig. 2.26.) The key difference between the stability of discrete and continuous transfer functions is that the sample time plays an important role. Generally, as one increases the sample time, the discrete system tends towards instability.

To establish the stability of the transfer functions, one need not solve for the roots of the denominator polynomial, since exact algebraic methods such as the Routh array, the Jury test and Lyapunov methods exist. However with the current computer aided tools such as MATLAB, the task of reliably extracting the roots of high order polynomials is not considered the hurdle it once was. The Lyapunov method is also applicable for nonlinear systems. See Ogata's comments, [148, p250].

2.10 Summary

This chapter briefly developed the tools needed to analyse discrete dynamic systems. While most physical plants are continuous systems, most control is performed on a digital computer, and therefore the discrete description is far more natural for computer implementation or simulation. Converting a continuous differential equation to a discrete difference equation can be done either by using a backward difference approximation such as Euler's scheme, or by using z-transforms. The z-transform is the discrete equivalent of the Laplace transform in the continuous domain.

A vector/matrix approach to systems modelling and control has the following characteristics:

1. We can convert any linear time invariant differential equation into the state space form ẋ = Ax + Bu.
2. Once we have selected a sampling time, T, we can convert from the continuous time domain to the discrete time equivalent, xₖ₊₁ = Φxₖ + ∆uₖ.
3. The stability of both the discrete and continuous time systems is determined by the eigenvalues of the A or Φ matrix.
4. We can transform the mathematical description of the process to dynamically similar descriptions by adjusting our base coordinates, which may make certain computations easier.

Stability is one of the most important concepts for the control engineer. Continuous linear systems are stable if all the poles lie in the left hand side of the complex plane, i.e. they have negative real parts. For discrete linear systems, the poles must lie inside the unit circle. No easy check can be made for nonlinear systems, although the method due to Lyapunov can possibly be used, or one can approximate the nonlinear system as a linear one and check the stability of the latter.

When we deal with discrete systems, we must sample the continuous function. Sampling can
introduce problems in that we may miss interesting information if we don’t sample fast enough.
The sampling theorem tells us how fast we must sample to reconstruct particular frequencies.
Over sampling is expensive, and could possibly introduce unwanted artifacts such as RHP zeros
etc.
Chapter 3

Modelling dynamic systems with differential equations

The ability to solve so many problems with a small arsenal of mathematical methods gave rise
to a very optimistic “world view” which in the 18th Century came to be called the “Age of
Reason”. The essential point is that the world was felt to be predictable.
Hans Mark

3.1 Dynamic system models

Dynamic system models are groups of equations that involve time derivative terms that attempt
to reproduce behaviour that we observe around us. I admire the sense of optimism in the quote
given above by Hans Mark: if we know the governing equations, and the initial conditions, then
the future is assured. We now know that the 18th century thinkers were misguided in this sense,
but nevertheless engineers are a conservative lot, and still today with similar tools, they aim to
predict the known universe.

It is important to realise that with proven process models, control system design becomes sys-
tematic, hence the importance of modelling.

Physically dynamic systems are those that change with time. Examples include vibrating struc-
tures (bridges and buildings in earthquakes), bodies in motion (satellites in orbit, fingers on a
typewriter), changing compositions (chemical reactions, nuclear decay) and even human charac-
teristics (attitudes to religion or communism throughout a generation, or attitudes to the neigh-
bouring country throughout a football match) and of course, there are many other examples. The
aim of many modellers is to predict the future. For many of these types of questions, it is infea-
sible to experiment to destruction, but it is feasible to construct a model and test this. Today, it is
easiest to construct a simulation model and test this on a computer. The types of models referred
to in this context are mathematically based simulation models.

Black-box or heuristic models, where we just fit any old curve to the experimental data, and white-box or fundamental models, where the curves are fitted to well established physical laws, represent two possible extremes in modelling philosophy. In practice, most engineering models lie somewhere in the grey zone, known appropriately as 'grey-box' models, where we combine our partial prior knowledge with black-box components to fit any residuals. Computer tools for grey-box modelling are developed in [30].

3.1.1 Steady state and dynamic models

Steady state models only involve algebraic equations. As such they are more useful for design
rather than control tasks. However they can be used for control to give broad guidelines about
the operating characteristics of the process. One example is Bristol’s relative gain array (§ 3.3.4)
which uses steady state data to analyse for multivariable interaction. But it is the dynamic model
that is of most importance to the control engineer. Dynamic models involve differential equa-
tions. Solving systems involving differential equations is much harder than for algebraic equa-
tions, but is essential for the study and application of control. A good reference for techniques of
modelling dynamic chemical engineering applications is [196].

3.2 A collection of illustrative models

An economist is an expert who will know tomorrow why the things he predicted yesterday didn’t happen
today.
Laurence J. Peter

This section reviews the basic steps for developing dynamic process models and gives a few examples of common process models. There are many text books devoted to modelling for engineers, notably [33, 114] for general issues, [43, 195] for solution techniques and [129, 196] for chemical engineering applications. I recommend the following stages in the model building process:

1. Draw a diagram of the system with a boundary, labelling the input/output flows that cross the boundary.
2. Decide on what are state variables, what are parameters, manipulated variables, uncontrolled disturbances etc.
3. Write down any governing equations that may be relevant, such as conservation of mass and energy.
4. Massage these equations into a standard form suitable for simulation.

The following sections will describe models for a flow system into and out of a tank with external
heating, a double integrator such as an inverted pendulum or satellite control, a forced circulation
evaporator, and a binary distillation column.

3.2.1 Simple models

Pendulums

Pendulums provide an easy and fruitful target to model. They can easily be built, are visually
impressive, and the governing differential equations derived from the physics are simple but
nonlinear. Furthermore, if you invert the pendulum such as in a rocket, the system is unstable,
and needs an overlaying control scheme.

torque, Tc

θ
length, l

?
mg
θ
?
mg

Pendulum (stable) Inverted pendulum (unstable)

Figure 3.1: A stable and unstable pendulum

Fig. 3.1 shows the two possible orientations. For the classical stable case, the equation of motion for the pendulum is

    ml² (d²θ/dt²) + mgl sin θ = Tc        (3.1)

where θ is the angle of inclination, and Tc is the applied torque. We can try to solve Eqn. 3.1 analytically using dsolve from the symbolic toolbox,

>> syms m l g positive
>> syms Tc theta
>> dsolve('m*l^2*D2theta + m*g*l*sin(theta) = Tc')

Warning: Explicit solution could not be found; implicit solution returned.
> In C:\MATLAB6p5p1\toolbox\symbolic\dsolve.m at line 292
ans =
[ Int(m/(m*(2*m*g*l*cos(a)+2*Tc*a+C1*m*l^2))^(1/2)*l,a = .. theta)-t-C2 = 0]
[ Int(-m/(m*(2*m*g*l*cos(a)+2*Tc*a+C1*m*l^2))^(1/2)*l,a = .. theta)-t-C2 = 0]

but the solution returned is not much help indicating that this particular nonlinear differential
equation probably does not have a simple explicit solution.

Eqn. 3.1 is a second order nonlinear differential equation, but by defining a new state variable vector, x, as

    x₁ ≜ θ                 (3.2)
    x₂ ≜ ẋ₁ = dθ/dt        (3.3)
then we note that ẋ₂ = ẍ₁, so the single second order differential equation can now be written as two coupled first order equations

    ẋ₁ = x₂                            (3.4)
    ẋ₂ = −(g/l) sin x₁ + Tc/(ml²)      (3.5)

Linearising Eqn. 3.5 is trivial since sin θ ≈ θ for small angles. Consequently, the nonlinear and linear expressions are compared below,

    ẋ = [ x₂ ; −(g/l) sin x₁ ] + [ 0 ; 1/(ml²) ] Tc        (nonlinear)
    ẋ = [ 0  1 ; −g/l  0 ] x + [ 0 ; 1/(ml²) ] Tc          (linear)        (3.6)

while the numerical results are compared on page 120.
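As a small numerical illustration of the quality of the linearisation (a Python/SciPy sketch; the values of g and l are assumed, and the applied torque is taken as zero):

```python
import numpy as np
from scipy.integrate import solve_ivp

g, l = 9.81, 1.0          # assumed gravity and pendulum length; Tc = 0

def nonlinear(t, x): return [x[1], -(g/l)*np.sin(x[0])]
def linear(t, x):    return [x[1], -(g/l)*x[0]]

x0, t = [0.1, 0.0], np.linspace(0, 5, 500)   # small initial angle of 0.1 rad
xn = solve_ivp(nonlinear, [0, 5], x0, t_eval=t, rtol=1e-10, atol=1e-12).y
xl = solve_ivp(linear,    [0, 5], x0, t_eval=t, rtol=1e-10, atol=1e-12).y

err = np.max(np.abs(xn[0] - xl[0]))
print(err < 0.005)        # True: the linearisation is good at small angles
```

Repeating the experiment with a larger initial angle shows the two solutions drifting apart, which is the expected failure of the small-angle approximation.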

In the inverted case, the equation for ẋ₂ is almost the same,

    ẋ₂ = +(g/l) sin x₁ + Tc/(ml²)        (3.7)

but the sign change is enough to place the linearised pole in the right-hand plane.

A double integrator

The double integrator is a simple linear example from classical mechanics. It can describe, amongst other things, a satellite attitude control system,

    J (d²θ/dt²) = u        (3.8)

where J is the moment of inertia, θ is the attitude angle, and u is the control torque (produced by small attitude rockets mounted on the satellite's side). We can convert the second order system in Eqn. 3.8 to two first order systems by again defining a new state vector

    x = [ x₁ ; x₂ ] ≜ [ θ ; θ̇ ]        (3.9)

Substituting this definition into Eqn. 3.8, we get J ẋ₂ = u and ẋ₁ = x₂, or in a matrix-vector form

    [ ẋ₁ ; ẋ₂ ] = [ 0  1 ; 0  0 ] x + [ 0 ; 1/J ] u        (3.10)
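Because the A matrix of Eqn. 3.10 is nilpotent (A² = 0), the exact discrete transition matrix Φ = e^(AT) is easy to verify numerically; here is a Python/SciPy sketch with an arbitrarily chosen sample time T:

```python
import numpy as np
from scipy.linalg import expm

J, T = 1.0, 0.5                        # assumed inertia and sample time
A = np.array([[0.0, 1.0], [0.0, 0.0]])

Phi = expm(A*T)                        # exact, since expm(A*T) = I + A*T here
print(Phi)                             # [[1.0, 0.5], [0.0, 1.0]]
print(np.linalg.eigvals(A))            # both poles at the origin: a double integrator
```

Both eigenvalues of A sit at the origin, which is why the open-loop double integrator is only marginally stable and needs feedback control.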

A liquid level tank

Buffer storage vessels such as shown in Fig. 3.2 are commonly used in chemical plants between various pieces of equipment, such as reactors and separators, to dampen out fluctuations in production flows.

Our system boundary is essentially the figure boundary, so we have one input, the input flow
rate, Fin , and one output, the output flow rate, Fout . What interests us is the amount of fluid in
the tank, M , which physically must always be less than the full tank Mmax , and greater than 0.

The governing relation is the conservation of mass,

    dM/dt = Fin − Fout        (3.11)

flow in, Fin


tank with cross sectional area, A

6
measured height, ĥ ∆P true height, h

?
valve
- flow out, Fout

Figure 3.2: Simple buffer tank

If the tank has a constant cross sectional area, then the amount of water in the tank, M, is proportional to the height of water, h, since most fluids are assumed incompressible. In addition the mass M is proportional to the volume, M = ρV = ρAh. Rewriting Eqn. 3.11 in terms of height,

    dh/dt = (Fin − Fout)/(ρA)        (3.12)

since the area and density are constant. If the liquid passes through a valve at the bottom of the tank, the flowrate out will be a function of height. For many processes, the flow is proportional to the square root of the pressure drop (Bernoulli's equation),

    Fout ∝ √∆P = k√h

This square root relation introduces a mild nonlinearity into the model.

The “tank” level is not actually measured in the tank itself, but in a level leg connected to the tank. The level in this level leg lags behind the true tank level owing to the restriction valve on the interconnecting line. This makes the system slightly more difficult to control owing to the added measurement dynamics. We assume that the level leg can be modelled using a first order dynamic equation, with a gain of 1 and a time constant of about 2 seconds. I can estimate this by simply watching the apparatus. Thus

    dĥ/dt = (h − ĥ)/2        (3.13)

The level signal is a current, I, between 4 and 20 mA. This current is algebraically related (by some function f(·), although often approximately linearly) to the level in the level leg. Thus the variable that is actually measured is

    I = f(ĥ)        (3.14)
The current I is called the output or measured variable. In summary, the full model equations are:

    dh/dt = (Fin − k√h)/(ρA)        (3.15)
    dĥ/dt = (h − ĥ)/2               (3.16)
    I = f(ĥ)                        (3.17)
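A simulation sketch of Eqns. 3.15 and 3.16 in Python/SciPy (the parameter values for ρA, k and Fin below are arbitrary illustrative choices) shows the tank filling to the steady-state level where the inflow balances the square-root outflow, with the level leg trailing behind:

```python
import numpy as np
from scipy.integrate import solve_ivp

rhoA, k, Fin = 1.0, 0.4, 0.2       # assumed rho*A, valve coefficient and inflow

def tank(t, x):
    h, h_leg = x
    dh = (Fin - k*np.sqrt(max(h, 0.0)))/rhoA   # Eqn. 3.15
    dh_leg = (h - h_leg)/2.0                   # Eqn. 3.16: level-leg lag, tau = 2 s
    return [dh, dh_leg]

sol = solve_ivp(tank, [0, 100], [0.1, 0.1], rtol=1e-8)
h_ss = (Fin/k)**2                  # steady state where Fin = k*sqrt(h)
print(abs(sol.y[0, -1] - h_ss) < 1e-3)   # True: tank level settles at (Fin/k)^2
print(abs(sol.y[1, -1] - h_ss) < 1e-3)   # True: the level leg follows with a lag
```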

The input variables (which we may have some hope of changing) are the input flow rate, Fin, and the exit valve position, which changes k. The dependent (state) variables are the actual tank level, h, and the measurable level in the level leg, ĥ. The output variable is the current, I. Constant parameters are the density and the tank cross-sectional area. In Table 3.1, which summarises this nomenclature, I have written vectors in a bold face and scalars in italics. The input variables can be further separated into variables that can be changed easily or on demand, such as the outlet valve position (manipulated variables), and inputs that change outside our control (disturbance variables), in this case the inlet flowrate.

Table 3.1: Standard nomenclature used in modelling dynamic systems

    type         symbol   variable
    -----------  ------   --------
    independent  t        time, t
    states       x        h, ĥ
    inputs       u        Fin, k
    outputs      y        I
    parameters   θ        ρ, A

A stirred tank heater

Suppose an electrical heater is added, causing an increase in water temperature. Now in addition to the mass balance, we have an energy balance. The energy balance in words is: “the rate of temperature rise of the contents in the tank is proportional to the amount of heat that the heater is supplying, plus the amount of heat that is arriving in the incoming flow, minus the heat lost with the outflowing stream, minus any other heat losses”.

We note that for water the heat capacity cp , is almost constant over the limited temperature range
we are considering, and that the enthalpy of water in the tank is defined as

H = M cp ∆T (3.18)

where ∆T is the temperature difference between the actual tank temperature T and some refer-
ence temperature, Tref . Writing an energy balance gives

$$M c_p \frac{dT}{dt} = F_{in} c_p (T_{in} - T_{ref}) - F_{out} c_p \Delta T + Q - q \tag{3.19}$$

where Q is the heater power input (kW) and q is the heat loss (kW). These two equations (Eqns 3.12 and 3.19) are coupled. This means that a change in the mass in the tank, M, will affect the temperature in the tank, T (but not the other way around). This is called one-way coupling.
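At steady state, and with Fout = Fin and constant holdup, Eqn 3.19 rearranges to T = Tin + (Q − q)/(Fin cp), a handy back-of-envelope check. A quick numerical example with illustrative (assumed) numbers:

```python
# Steady state of Eqn 3.19 with Fout = Fin and constant holdup:
#   0 = Fin*cp*(Tin - Tref) - Fin*cp*(T - Tref) + Q - q
# => T = Tin + (Q - q)/(Fin*cp)
Fin = 2.0           # feed flow [kg/s]   (illustrative values, not from the text)
cp = 4.18           # heat capacity of water [kJ/kg.K]
Tin = 20.0          # feed temperature [degC]
Q, q = 100.0, 16.4  # heater power and heat loss [kW]

T_ss = Tin + (Q - q)/(Fin*cp)
print(T_ss)         # -> 30.0 degC
```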

For most purposes, this model would be quite adequate. But you may decide that the heat loss q
to the surroundings is not constant and the variation significant. A more realistic approximation
of the heat loss would be to assume that it is proportional to the temperature difference between
the vessel and the ambient (room) temperature Troom . Thus q = k(T − Troom ). Note that the heat
loss now can be both positive or negative. You may also decide that the heat capacity cp , and
density ρ of water are a function of temperature. Now you replace these constants with functions
as cp (T ) and ρ(T ). These functions are tabulated in the steam tables1 . The more complete model

1 Steam tables for M ATLAB are available from http://www.kat.lth.se/edu/kat061/MfileLib.html



would then be

$$\frac{dh}{dt} = \frac{F_{in} - F_{out}}{\rho(T) A} \tag{3.20}$$

$$A\rho(T)\frac{dT}{dt} = F_{in} c_p(T)(T_{in} - T_{ref}) - F_{out} c_p(T)\Delta T + Q - k(T - T_{room})$$

“Caveat Emptor”

Finally we should always state under what conditions the model is valid and invalid. For this
example we should note that the vessel temperature should not vary outside the range 0 < T <
100◦ C since the enthalpy equation does not account for the latent heat of freezing or boiling. We
should also note that h, Fin , Fout , M, A, k, cp , ρ are all assumed positive.

As a summary and a check on the completeness of the model, it is wise to list all the variables,
and state whether the variable is a dependent or independent variable, whether it is a dynamic
or constant variable, or whether it is a parameter. The degree of freedom for a well posed prob-
lem should be zero. That is, the number of unknowns should equal the number of equations.
Incomplete models, and/or bad assumptions can give very misleading and bad results.

A great example highlighting the dangers of extrapolation from obviously poor models is the bi-
annual UK government forecast of Britain’s economic performance. This performance or growth,
is defined as the change in the Gross Domestic Product (GDP) which is the total amount of goods
and services produced in the country that year. Figure 3.3 plots the actual growth over the last
few years (solid line descending), compared with the forecasts given by the Chancellor of the Ex-
chequer, (dotted lines optimistically ascending). I obtained this information from a New Scientist
article provocatively titled Why the Chancellor is always wrong, [49]. Looking at this performance,
it is easy to be cynical about governments using enormous resources to model things that are
essentially so complicated as to be unmodellable, and then produce results that are politically
advantageous.

Figure 3.3: The UK growth based on the GDP, comparing the actual growth (solid, descending) versus Treasury forecasts (dashed, optimistically ascending) from 1989–1992.

3.3 Chemical process models

The best material model of a cat is another, or preferably the same, cat.

Arturo Rosenblueth, Philosophy of Science, 1945

Modelling is very important in the capital-intensive world of chemical manufacture. Perhaps


there is no better example of the benefits of modelling than in distillation. One industry maga-
zine2 reported that distillation accounted for around 3% of the total world energy consumption.
Clearly this provides a large economic incentive to improve operation. Distillation models predict the extent of the component separation given the type of trays in the column, the column diameter and height, and the operating conditions (temperature, pressure, etc.). More sophisticated dynamic models predict the concentration in the distillate given changes in feed composition, boilup rate and reflux rate over time. An example of a simple distillation dynamic model is further detailed in §3.3.3.

When modelling chemical engineering unit operations, we can consider (in increasing complex-
ity):

1. Lumped parameters systems such as well-mixed reactors or evaporators

2. Staged processes such as distillation columns, multiple effect evaporators, floatation.

3. Distributed parameter systems such as packed columns, poorly mixed reactors, dispersion
in lakes & ponds.

An evaporator is an example of a lumped parameter system and is described in §3.3.2 while a


rigorous distillation column model described on page 98 is an example of a staged process.

At present we will now investigate only lumped parameter systems and staged processes which
involve only ordinary differential equations, as opposed to partial differential equations, albeit
a large number of them. Problems which involve algebraic equations coupled to the ordinary
differential equations are briefly considered in section 3.5.1.

A collection of good test industrial process models is available in the nonlinear model library
from www.hedengren.net/research/models.htm.

3.3.1 A continuously-stirred tank reactor

Tank reactors are a common unit operation in chemical processing. A model presented in [85]
considers a simple A → B reaction with a cooling jacket to adjust the temperature, and hence the
reaction rate as shown in Fig. 3.4.

The reactor model has two states, the concentration of compound A, given the symbol Ca mea-
sured in mol/m3 and the temperature of the reaction vessel liquid, T measured in K. The manip-
ulated variables are the cooling jacket water temperature, Tc , the temperature of the feed Tf and

2 The Chemical Engineer, 21st October, 1999, p16



Figure 3.4: A CSTR reactor, with the feed stream (Tf, Caf), the cooling jacket (Tc), a stirrer, the A → B reaction, and the product stream (T, Ca).

the concentration of the feed, Caf .


 
dT q −E
= (Caf − Ca ) − k0 exp Ca (3.21)
dt V RT
 
dCa q m∆H −E UA
= (Tf − T ) + k0 exp Ca + (Tc − T ) (3.22)
dt V ρCp RT V ρCp

The values for the states and inputs at steady-state are

$$\mathbf{x}_{ss} \overset{\text{def}}{=} \begin{bmatrix} T \\ C_a \end{bmatrix}_{ss} = \begin{bmatrix} 324.5 \\ 0.8772 \end{bmatrix}, \qquad \mathbf{u}_{ss} \overset{\text{def}}{=} \begin{bmatrix} T_c \\ T_f \\ C_{af} \end{bmatrix}_{ss} = \begin{bmatrix} 300 \\ 350 \\ 1 \end{bmatrix} \tag{3.23}$$

The values of the model parameters such as ρ, Cp etc. are given in Table 3.2.

Table 3.2: Parameters of the CSTR model

name                                         value            unit
Volumetric flowrate                          q = 100          m³/sec
Volume of CSTR                               V = 100          m³
Density of A-B mixture                       ρ = 1000         kg/m³
Heat capacity of A-B mixture                 Cp = 0.239       J/kg-K
Heat of reaction for A→B                     m∆H = 5×10⁴      J/mol
Activation energy / gas constant             E/R = 8750       K
Pre-exponential factor                       k0 = 7.2×10¹⁰    1/sec
Overall heat transfer coefficient × area     UA = 5×10⁴       W/K

Note: At a jacket temperature of Tc = 305K, the reactor model has an oscillatory response. The
oscillations are characterized by reaction run-away with a temperature spike. When the concen-
tration drops to a low value, the reactor cools until the concentration builds and there is another
run-away reaction.
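A minimal simulation sketch of the CSTR balances with the Table 3.2 parameters, written here in Python/SciPy rather than MATLAB. It checks that the quoted steady state gives near-zero derivatives at Tc = 300 K, and that at Tc = 305 K the temperature does indeed swing:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Parameters from Table 3.2
q, V, rho, Cp = 100.0, 100.0, 1000.0, 0.239
mdelH, EoverR, k0, UA = 5e4, 8750.0, 7.2e10, 5e4
Tf, Caf = 350.0, 1.0

def cstr(t, x, Tc):
    Ca, T = x
    rA = k0*np.exp(-EoverR/T)*Ca            # reaction rate, A -> B
    dCa = q/V*(Caf - Ca) - rA               # concentration balance
    dT = (q/V*(Tf - T) + mdelH/(rho*Cp)*rA  # energy balance; mdelH enters
          + UA/(V*rho*Cp)*(Tc - T))         # with a positive sign as tabulated
    return [dCa, dT]

# The quoted steady state should give near-zero derivatives at Tc = 300 K
resid = cstr(0.0, [0.8772, 324.5], 300.0)

# At Tc = 305 K the response is oscillatory: simulate and measure the swing in T
sol = solve_ivp(cstr, (0.0, 10.0), [0.8772, 324.5], args=(305.0,),
                max_step=0.01)
T_swing = sol.y[1].max() - sol.y[1].min()
print(resid, T_swing)
```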

3.3.2 A forced circulation evaporator

Evaporators such as illustrated in Fig. 3.5 are used to concentrate fruit juices, caustic soda, alu-
mina and many other mixtures of a solvent and a solute. The incentive for modelling such unit
operations is that the model can provide insight towards better operating procedures or future
design alternatives. A relatively simple dynamic model of a forced circulation evaporator is de-
veloped in [145, chapter 2] is reproduced here as a test process for a wide variety of control tech-
niques of interest to chemical engineers. Other similar evaporator models have been reported
in the literature; [65] documents research over a number of years at the University of Alberta
on a pilot scale double effect evaporator, and [40] describe a linearised pilot scale double effect
evaporator used in multivariable optimal selftuning control studies.

Figure 3.5: A forced circulation evaporator from [145, p7]. The feed is concentrated in a steam-heated evaporator; the vapour passes to a separator (level L, pressure P) and then to a cooling-water condenser, while the circulation pump returns liquid around the loop where the product composition is measured.

The nonlinear evaporator model can be linearised into a state space form with the variables de-
scribed in Table 3.3. The parameters of the linearised state space model are given in Appendix E.1.

Table 3.3: The important variables in the forced circulation evaporator

type          name                variable   unit     range
State         level               L2         m        0–2
              steam pressure      P2         kPa      ??
              product conc.       x2         %        0–100
Manipulated   product flow        F2         kg/min   0–10
              steam pressure      P100       kPa      100–350
              c/w flow            F200       kg/min   0–300
Disturbance   circulating flow    F3         kg/min   30–70
              feed flow           F1         kg/min   5–15
              feed composition    x1         %        0–10
              feed temperature    T1         °C       20–60
              c/w temperature     T200       °C       15–35

The distinguishing characteristics of this evaporator system from a control point of view are that the detailed plant model is nonlinear, the model is non-square (the number of manipulated variables, u, does not equal the number of state variables, x), and that one of the states, the level, is an integrator.
The dynamic model is both observable and controllable, which means that by measuring the
outputs only, we can at least in theory, control all the important state variables by changing the
inputs.

Problem 3.1

1. Look at the development of the nonlinear model given in [145, chapter 2]. What extensions to the model would you suggest? Particularly consider the assumption of constant mass (M = 50 kg) in the evaporator circuit and the observation that the level in the separator changes. Do you think this change will make a significant difference?

2. Describe in detail what tests you would perform experimentally to verify this model if you had an actual evaporator available.

3. Construct a MATLAB simulation of the nonlinear evaporator. The relevant equations can be found in [145, Chpt 2].

3.3.3 A binary distillation column model

Distillation columns, which are used to separate mixtures of different vapour pressures into al-
most pure components, are an important chemical unit operation. The columns are expensive
to manufacture, and the running costs are high owing to the high heating requirement. Hence
there is much incentive to model them with the aim to operate them efficiently. Schematics of
two simple columns are given in Figures 3.6 and 3.9.

Wood and Berry experimentally modelled a binary distillation column in [202] that separated methanol from water, as shown in Fig. 3.6. The transfer function model they derived by step testing a real column has been extensively studied, although some authors, such as [185], point out that the practical impact of all these studies has been, in all honesty, minimal. Other distillation column models that are not obtained experimentally are called rigorous models and are based on fundamental physical and chemical relations. Another more recent distillation column model, developed by Shell and used as a test model, is discussed in [137].

Columns such as that given in Fig. 3.6 typically have at least 5 control valves, but because the hydrodynamics and the pressure loops are much faster than the composition dynamics, we can use the bottoms exit valve to control the level at the bottom of the column, use the distillate exit valve to control the level in the separator, and the condenser cooling water valve to control the column pressure. This leaves the two remaining valves (steam reboiler and reflux) to control the top and bottoms composition.

The Wood-Berry model is written as a matrix of Laplace transforms in deviation variables

$$\mathbf{y} = \underbrace{\begin{bmatrix} \dfrac{12.8e^{-s}}{16.7s+1} & \dfrac{-18.9e^{-3s}}{21s+1} \\[2ex] \dfrac{6.6e^{-7s}}{10.9s+1} & \dfrac{-19.4e^{-3s}}{14.4s+1} \end{bmatrix}}_{\mathbf{G}(s)} \mathbf{u} + \begin{bmatrix} \dfrac{3.8e^{-8s}}{14.9s+1} \\[2ex] \dfrac{4.9e^{-3.4s}}{13.2s+1} \end{bmatrix} d \tag{3.24}$$

where the inputs u1 and u2 are the reflux and reboiler steam flowrates respectively, the outputs y1 and y2 are the mole fractions of methanol in the distillate and bottoms, and the disturbance variable, d, is the feed flowrate.
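Since every element of Eqn. 3.24 is a first-order-plus-deadtime transfer function, its unit step response has the closed form y(t) = K(1 − e^{−(t−θ)/τ}) for t ≥ θ, and zero before the deadtime. A quick check on the (1,1) element, sketched here in Python/NumPy:

```python
import numpy as np

def fopdt_step(t, K, tau, theta):
    """Unit-step response of K*exp(-theta*s)/(tau*s + 1)."""
    t = np.asarray(t, dtype=float)
    return np.where(t < theta, 0.0, K*(1.0 - np.exp(-(t - theta)/tau)))

# G11 of the Wood-Berry model: 12.8 exp(-s)/(16.7 s + 1)
t = np.linspace(0.0, 100.0, 501)
y = fopdt_step(t, K=12.8, tau=16.7, theta=1.0)
print(y[0], y[-1])   # zero during the deadtime, settles near the gain 12.8
```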

It is evident from the transfer function model structure in Eqn. 3.24 that the plant will be inter-
acting and that all the transfer functions have some time delay associated with them, but the

Figure 3.6: Schematic of a distillation column, showing the feed (F, xF), the condenser with cooling water, the reflux R, the distillate (D, xD), the steam-heated reboiler (S) and the bottoms (B, xB).

off-diagonal terms have the slightly larger time delays. Both these characteristics will make con-
trolling the column difficult.

In M ATLAB, we can construct such a matrix of transfer functions (ignoring the feed) with a matrix
of deadtimes as follows:

>> G = tf({12.8, -18.9; 6.6, -19.4}, ...
        {[16.7 1],[21 1]; [10.9 1],[14.4 1]}, ...
        'ioDelayMatrix',[1,3; 7,3], ...
        'InputName',{'Reflux','Steam'}, ...
        'OutputName',{'Distillate','bottoms'})

Transfer function from input "Reflux" to output...

                          12.8
 Distillate:  exp(-1*s) * ----------
                          16.7 s + 1

                          6.6
 bottoms:     exp(-7*s) * ----------
                          10.9 s + 1

Transfer function from input "Steam" to output...

                          -18.9
 Distillate:  exp(-3*s) * --------
                          21 s + 1

                          -19.4
 bottoms:     exp(-3*s) * ----------
                          14.4 s + 1

Once we have formulated the transfer function model G, we can perform the usual types of analysis, such as the step tests shown in Fig. 3.7. An alternative implementation of the column model in SIMULINK is shown in Fig. 3.8.

Figure 3.7: Step responses of the four transfer functions that make up the Wood-Berry binary distillation column model in Eqn. 3.24.

A 3-input/3-output matrix of transfer functions model of a 19-plate ethanol/water distillation column with a variable side stream draw-off from [151] is

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} \dfrac{0.66e^{-2.6s}}{6.7s+1} & \dfrac{-0.61e^{-3.5s}}{8.64s+1} & \dfrac{-0.0049e^{-s}}{9.06s+1} \\[2ex] \dfrac{1.11e^{-6.5s}}{3.25s+1} & \dfrac{-2.36e^{-3s}}{5s+1} & \dfrac{-0.012e^{-1.2s}}{7.09s+1} \\[2ex] \dfrac{-34.68e^{-9.2s}}{8.15s+1} & \dfrac{46.2e^{-9.4s}}{10.9s+1} & \dfrac{0.87(11.61s+1)e^{-2.6s}}{(3.89s+1)(18.8s+1)} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} + \begin{bmatrix} \dfrac{0.14e^{-1.2s}}{6.2s+1} & \dfrac{-0.0011(26.32s+1)e^{-2.66s}}{(3.89s+1)(14.63s+1)} \\[2ex] \dfrac{0.53e^{-10.5s}}{6.9s+1} & \dfrac{-0.0032(19.62s+1)e^{-3.44s}}{(7.29s+1)(8.94s+1)} \\[2ex] \dfrac{-11.54e^{-0.6s}}{7.01s+1} & \dfrac{0.32e^{-2.6s}}{7.76s+1} \end{bmatrix} \begin{bmatrix} d_1 \\ d_2 \end{bmatrix} \tag{3.25}$$
which provides an alternative model for testing multivariable control schemes. In this model
the three outputs are the overhead ethanol mole fraction, the side stream ethanol mole fraction,
and the temperature on tray 19, the three inputs are the reflux flow rate, the side stream product
rate and the reboiler steam pressure, and the two disturbances are the feed flow rate and feed
temperature. This system is sometimes known as the OLMR after the initials of the authors.

Problem 3.2

1. Assuming no disturbances (d = 0), what are the steady state gains of the Wood-Berry column model (Eqn 3.24)? Use the final value theorem.

2. Sketch the response for y1 and y2 for:

(a) a change in reflux flow of +0.02


Figure 3.8: Wood-Berry column model implemented in SIMULINK. (a) The overall column mask, with reflux and steam inputs and distillate and bottoms outputs; (b) inside the mask, four first-order transfer function blocks each with a transport delay. Compare this with Eqn. 3.24.

(b) a change in the reflux flow of −0.02 and a change in reboiler steam flow of +0.025, simultaneously.

3. Modify the SIMULINK simulation to incorporate the feed dynamics.

More rigorous distillation column models

Distillation column models are important to chemical engineers involved in the operation and
maintenance of these expensive and complicated units. While the behaviour of the actual multi-
component tower is very complicated, models that assume ideal binary systems are often good
approximations for many columns. We will deal in mole fractions of the more volatile compo-
nent, x for liquid, y for vapour and develop a column model following [130, p69].

A generic simple binary component column model of a distillation column such as shown in
Fig. 3.9 assumes:

1. Equal molar overflow applies (heats of vapourisation of both components are equal, and
mixing and sensible heats are negligible.)
2. Liquid level on every tray remains above weir height.
3.3. CHEMICAL PROCESS MODELS 99

3. Relative volatility and the heat of vapourisation are constant. In fact we assume a constant relative volatility, α. This simplifies the vapour-liquid equilibria (VLE) model to

$$y_n = \frac{\alpha x_n}{1 + (\alpha - 1)x_n}$$

on tray n, with typically α ≈ 1.2–2.

4. Vapour phase holdup is negligible and the feed is a saturated liquid at the bubble point.
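The constant-relative-volatility VLE expression in assumption 3 is a one-liner; a small sketch (the α values used here are merely illustrative):

```python
def vle(x, alpha=1.5):
    """Constant-relative-volatility VLE: y = alpha*x / (1 + (alpha - 1)*x)."""
    return alpha*x/(1.0 + (alpha - 1.0)*x)

# The curve passes through (0, 0) and (1, 1) for any alpha,
# and lies above y = x for alpha > 1 (the more volatile component enriches)
print(vle(0.0), vle(0.5, alpha=2.0), vle(1.0))   # -> 0.0 0.666... 1.0
```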

Figure 3.9: Distillation tower with N trays numbered from the bottom, the feed (F, xF) entering at the feed tray NF, a condenser and collector returning reflux (R, xD) and drawing distillate (D, xD), and a reboiler drawing bottoms (B, xB).

The N trays are numbered from the bottom to top, (tray 0 is reboiler and tray N + 1 is the con-
denser). We will develop separate model equations for the following parts of the column, namely:

Condenser is a total condenser, where the reflux is a liquid and the reflux drum is perfectly
mixed.

General tray has liquid flows from above, and vapour flows from below. It is assumed to be
perfectly mixed with a variable liquid hold up, but no vapour hold up as it is assumed very
fast.

Feed tray Same as a ‘general’ tray, but with an extra (liquid) feed term.

Top & bottom trays Same as a ‘general’ tray, but with one of the liquid (top) or vapour (bottom)
flows missing, but recycle/reflux added.

Reboiler in a perfectly mixed thermo-siphon reboiler with hold up Mb .



The total mass balance in the condenser and reflux drum is

$$\frac{dM_d}{dt} = V - R - D$$

and the component balance on the top tray

$$\frac{d(M_d x_D)}{dt} = V y_{N_T} - (R + D)x_D \tag{3.26}$$

For the general nth tray, that is trays #2 through N − 1, excepting the feed tray, the total mass balance is

$$\frac{dM_n}{dt} = L_{n+1} - L_n + \underbrace{V_{n-1} - V_n}_{\approx 0} = L_{n+1} - L_n \tag{3.27}$$

and the component balance

$$\frac{d(M_n x_n)}{dt} = L_{n+1}x_{n+1} - L_n x_n + V y_{n-1} - V y_n \tag{3.28}$$

For the special case of the feed (nF th) tray,

$$\frac{dM_{n_F}}{dt} = L_{n_F+1} - L_{n_F} + F, \quad \text{mass balance}$$

$$\frac{d(M_{n_F} x_{n_F})}{dt} = L_{n_F+1}x_{n_F+1} - L_{n_F}x_{n_F} + Fx_F + V y_{n_F-1} - V y_{n_F}, \quad \text{component balance}$$

and for the reboiler and column base

$$\frac{dM_B}{dt} = L_1 - V - B$$

$$\frac{d(M_B x_B)}{dt} = L_1 x_1 - Bx_B - V y_B$$
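The general tray balances (Eqns 3.27–3.28, with optional feed terms so the same function covers the feed tray) can be collected into a small helper; a sketch with arbitrary illustrative flows and compositions:

```python
def tray_balances(L_above, L_n, V, x_above, x_n, y_below, y_n, F=0.0, xF=0.0):
    """General tray total-mass and component balances (Eqns 3.27-3.28),
    with optional feed terms F, xF for the feed tray."""
    dM = L_above - L_n + F                                   # vapour terms cancel
    dMx = L_above*x_above - L_n*x_n + V*y_below - V*y_n + F*xF
    return dM, dMx

# A tray with equal liquid flows in and out has constant holdup (dM = 0),
# while the holdup of light component still changes with the compositions
dM, dMx = tray_balances(10.0, 10.0, 15.0, 0.6, 0.5, 0.55, 0.6)
print(dM, dMx)   # -> 0.0 and 10*0.6 - 10*0.5 + 15*0.55 - 15*0.6 = 0.25
```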
In summary, the variables for the distillation column model are:

              Tray compositions, xn, yn                       2NT
              Tray liquid flows, Ln                           NT
VARIABLES     Tray liquid holdups, Mn                         NT
              Reflux comp., flow & hold up, xD, R, D, MD      4
              Base comp., flow & hold up, xB, yB, V, B, MB    5
              Total # of variables                            4NT + 9

and the number of equations are:

              Tray component balance                          NT
              Tray mass balance                               NT
EQUATIONS     Equilibrium (tray + base)                       NT + 1
              Hydraulic                                       NT
              Reflux comp. & flow                             2
              Base comp. & flow                               2
              Total # of equations                            4NT + 7

Which leaves two degrees of freedom. From a control point of view we normally fix the boilup rate, Q̇, and the reflux flow rate (or ratio) with some sort of controller,

$$R = f(x_D), \qquad V \propto \dot{Q} = f(x_B)$$

Our dynamic model of a binary distillation column is relatively large with two inputs (R, V), two outputs (xB, xD) and 2N + 4 states. Since a typical column has about 20 trays, we will have around n = 44 states, which means 44 ordinary differential equations. However the (linearised) Jacobian for this system, while a large 44 × 44 matrix, is sparse. In our case the percentage of non-zero elements, or sparsity, is ≈ 9%.

$$\begin{bmatrix} \underbrace{\dfrac{\partial \dot{\mathbf{x}}}{\partial \mathbf{x}}}_{44\times 44} & \vdots & \underbrace{\dfrac{\partial \dot{\mathbf{x}}}{\partial \mathbf{u}}}_{44\times 2} \end{bmatrix}$$

The structure of the A and B matrices is shown, using the spy command, in Fig. 3.10.

Figure 3.10: The incidence of the Jacobian and B matrix for the ideal binary distillation column model, with the ODE equations on one axis and the states and inputs ([holdup, comp×holdup | R, V]) on the other. Over 90% of the elements in the matrix are zero.

There are many other examples of distillation column models around (e.g. [140, pp 459]). This model has about 40 trays, and assumes a binary mixture at constant pressure and constant relative volatility.

Simulation of the distillation column

The simple nonlinear dynamic simulation of the binary distillation column model can be used in a number of ways, including investigating the openloop response and interactions, and quantifying the extent of the nonlinearities. It can be used to develop simple linear approximate transfer function models, or we could pose “What if?” type questions, such as quantifying the response given feed disturbances.

An openloop simulation of a distillation column gives some idea of the dominant time constants
and the possible nonlinearities. Fig. 3.11 shows an example of one such simulation where we
step change:

1. the reflux from R = 128 to R = 128 + 0.2, and


2. the reboiler from V = 178 to V = 178 + 0.2

Figure 3.11: Open loop response of the distillation column, showing the distillate and base concentrations following step changes in the reflux and the vapour boilup.

From Fig. 3.11 we can note that the open loop step results are overdamped, and that the steady-state gains are very similar in magnitude. Furthermore the response looks very like a 2 × 2 matrix of second order overdamped transfer functions,

$$\begin{bmatrix} x_D(s) \\ x_B(s) \end{bmatrix} = \begin{bmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{bmatrix} \begin{bmatrix} R(s) \\ V(s) \end{bmatrix}$$

So it is natural to wonder at this point if it is possible to approximate this response with a low-
order model rather than requiring the entire 44 states and associated nonlinear differential equa-
tions.

Controlled simulation of a distillation column

A closed loop simulation of the 20 tray binary distillation column with a feed disturbance from
xF = 0.5 to xF = 0.54 at t = 10 minutes is given in Fig. 3.12.

We can look more closely in Fig. 3.13 at the distillate and base compositions to see if they really
are in control, x⋆b = 0.02, x⋆d = 0.98.

Distillation columns are well known to interact, and these interactions cause difficulties in tuning. We will simulate in Fig. 3.14 a step change in the distillate setpoint from x⋆D = 0.98 to x⋆D = 0.985 at t = 10 min and a step change in bottoms concentration at 150 minutes. The interactions are evident in the base composition transients owing to the changing distillate composition, and vice versa. These interactions can be minimised by either tightly tuning one of the loops, consequently leaving the other ‘loose’, or by using a steady-state or dynamic decoupler, or even a multivariable controller.

Figure 3.12: A closed loop simulation of a binary distillation column given a feed disturbance. The upper plot shows the concentration on all the 20 trays; the lower plots show the manipulated variables (reflux and boil-up) response.

Figure 3.13: Detail of the distillate and base concentrations from a closed loop simulation of a binary distillation column given a feed disturbance. (See also Fig. 3.12.)

3.3.4 Interaction and the Relative Gain Array

The Wood-Berry distillation column model is an example of a multivariable, interacting process.


This interaction is evident from the presence of the non-zero off diagonal terms in Eqn. 3.24. A
quantitative measure of the extent of interaction in a multivariable process is the Relative Gain
Array (RGA) due to Bristol, [35]. The RGA is only applicable to square systems (the number of
manipulated variables equal the number of controlled variables), and is a steady-state measure.

The RGA, given the symbol Λ, is an (n × n) matrix where each element is the ratio of the open
loop gain divided by the closed loop gain. Ideally, if you are the sort who do not like interactions,
you would like Λ to be diagonally dominant. Since all the rows and all the columns must sum to
1.0, then a system with no interaction will be such that Λ = I.

The open loop steady-state gain is easy to determine. Taking the Wood-Berry model, Eqn. 3.24,

Figure 3.14: Distillation interactions are evident when we step change the distillate and bottoms setpoints independently. (The panels show the distillate xD, the base xB, and the reflux and reboiler manipulated variables.)

as an example, the final steady-state as a function of the manipulated variable u, is

$$\mathbf{y}_{ss} = \mathbf{G}_{ss}\mathbf{u} = \begin{bmatrix} 12.8 & -18.9 \\ 6.6 & -19.4 \end{bmatrix}\mathbf{u} \tag{3.29}$$

The ijth element of Gss is the open loop gain between yi and uj. The Gss matrix can be evaluated experimentally or formed by applying the final value theorem to Eqn 3.24. Now the closed loop gain is a function of the open loop gain, so only the open loop gains are needed to evaluate the RGA. Mathematically the relative gain array, Λ, is formed by multiplying together Gss and G−⊤ss elementwise,3

$$\Lambda = \mathbf{G}_{ss} \otimes \left(\mathbf{G}_{ss}^{-1}\right)^\top \tag{3.30}$$

where the special symbol ⊗ means to take the Hadamard product (also known as the Schur product), or simply the elementwise product of two equally dimensioned matrices, as opposed to the normal matrix multiplication.

In MATLAB, the evaluation of the relative gain array Λ is easy.

>>Gss = [12.8, -18.9; 6.6, -19.4] % Steady-state gain from Eqn. 3.29
>>L = Gss.*inv(Gss)' % See Eqn. 3.30. Don’t forget the dot-times (.*)

which should return something like

$$\Lambda = \begin{bmatrix} 2.0094 & -1.0094 \\ -1.0094 & 2.0094 \end{bmatrix}$$

Note that all the columns and rows sum to 1.0. We would expect this system to exhibit severe
interactions, although the reverse pairing would be worse.
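The same calculation is easily reproduced outside MATLAB; a sketch in Python/NumPy (note again the elementwise, not matrix, product):

```python
import numpy as np

Gss = np.array([[12.8, -18.9],
                [ 6.6, -19.4]])      # steady-state gains, Eqn 3.29

Lam = Gss * np.linalg.inv(Gss).T     # Eqn 3.30: elementwise (Hadamard) product
print(Lam)                           # [[ 2.0094 -1.0094], [-1.0094  2.0094]]
print(Lam.sum(axis=0), Lam.sum(axis=1))  # rows and columns each sum to 1
```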

The usefulness of the RGA is in choosing which manipulated variable should go with which
control variable in a decentralised control scheme. We juggle the manipulated/control variable
parings until Λ most approaches the identity matrix. This is an important and sensitive topic in
3 Note that this does not imply that $\Lambda = \mathbf{G}_{ss}\left(\mathbf{G}_{ss}^{-1}\right)^\top$, i.e. a normal matrix product without the Hadamard product. See [179, p456] for further details.

process control and is discussed in more detail in [191, pp494–503], and in [179, p457]. The most
frequently discussed drawback of the relative gain array, is that the technique only addresses the
steady state behaviour, and ignores the dynamics. This can lead to poor manipulated/output
pairing in some circumstances where the dynamic interaction is particularly strong. The next
section further illustrates this point.

The dynamic relative gain array

As mentioned above, the RGA is only a steady-state interaction indicator. However we could use the same idea to generate an interaction matrix, but this time consider the elements of the transfer function matrices as a function of frequency by substituting s = jω. This now means that the dynamic relative gain array, Λ(ω), is a matrix whose elements are functions of ω.

Consider the (2 × 2) system from [181]

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \underbrace{\begin{bmatrix} \dfrac{2.5e^{-5s}}{(15s+1)(2s+1)} & \dfrac{5}{4s+1} \\[2ex] \dfrac{1}{3s+1} & \dfrac{-4e^{-5s}}{20s+1} \end{bmatrix}}_{G(s)} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \tag{3.31}$$
which has as its distinguishing feature significant time delays on the diagonal elements. We could
compute the dynamic relative gain array matrix using the definition

Λ(s) = G(s) ⊗ G(s)−T (3.32)

perhaps using the symbolic toolbox to help us with the possibly unwieldy algebra.

Listing 3.1: Computing the dynamic relative gain array analytically

>>syms s
>>syms w real

>>G=[2.5*exp(-5*s)/(15*s+1)/(2*s+1), 5/(4*s+1); ...
     1/(3*s+1), -4*exp(-5*s)/(20*s+1)] % Plant from Eqn. 3.31

>>RGA = G.*inv(G') % Λ(s), Eqn. 3.32

>>DRGA = subs(RGA,'s',1j*w) % Substitute s = jω

>> abs(subs(DRGA,w,1)) % Magnitude of the DRGA matrix at ω = 1 rad/s

ans =
    0.0379    0.9793
    0.9793    0.0379

You can see that from the numerical values of the elements of Λ at ω = 1 rad/s that this system
is not diagonally dominant at this important frequency. Fig. 3.15 validates this observation.
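As a cross-check, we can evaluate G(jω) of Eqn. 3.31 directly and form Λ(jω) numerically; a sketch in Python/NumPy:

```python
import numpy as np

def G(s):
    """Plant of Eqn. 3.31 evaluated at a (complex) frequency s."""
    return np.array([
        [2.5*np.exp(-5*s)/((15*s + 1)*(2*s + 1)), 5.0/(4*s + 1)],
        [1.0/(3*s + 1), -4.0*np.exp(-5*s)/(20*s + 1)]])

def rga(M):
    """Lambda = M (Hadamard product) M^{-T}, Eqn. 3.32."""
    return M * np.linalg.inv(M).T

Lam = rga(G(1j*1.0))      # dynamic RGA at omega = 1 rad/s
print(np.abs(Lam))        # magnitudes match the MATLAB result above
```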

An alternative numerical way to generate the elements of the DRGA matrix as a function of
frequency is to compute the Bode diagram for the multivariable system, extract the current gains,
and then form the RGA from Eqn. 3.30.

Listing 3.2: Computing the dynamic relative gain array numerically as a function of ω. See also
Listing 3.1.

G = tf({2.5, 5; 1, -4}, ...
    {conv([15 1],[2,1]),[4 1]; [3 1],[20 1]}, ...
    'ioDelayMatrix',[5,0; 0,5]) % Plant from Eqn. 3.31

[~,~,K] = zpkdata(G); sK = sign(K); % get sign of the gains

[Mag,Ph,w] = bode(G); % Compute Bode diagram
DRGA = NaN(size(Mag));

for i=1:length(w)
    K = Mag(:,:,i).*sK; % Gain, including signs
    DRGA(:,:,i) = K.*(inv(K))'; % Λ(ω) = K ⊗ K−⊤
end

OnDiag = squeeze(DRGA(1,1,:)); OffDiag = squeeze(DRGA(1,2,:));

semilogx(w, OnDiag,'-',w,OffDiag,'--'); % See Fig. 3.15

The trends of the diagonal and off-diagonal elements of Λ are plotted in Fig. 3.15. What is interesting about this example is that if we only consider the steady-state case, ω = 0, then Λ is diagonally dominant, and our pairing looks suitable. However what we should really be concentrating on are the values around the corner frequency at ω ≈ 0.1 where now the off-diagonal terms start to dominate.

Figure 3.15: The diagonal (λ1,1) and off-diagonal (λ1,2) elements of the RGA matrix as a function of frequency.

For comparison, Fig. 3.16 shows the dynamic RGA for the (3 × 3) OLMR distillation column
model from Eqn. 3.25. In this case the off-diagonal elements do not dominate at any frequency.

Figure 3.16: The elements of the (3 × 3) RGA matrix from Eqn. 3.25 as a function of frequency, showing the diagonal elements (λ11, λ22, λ33) and the off-diagonal elements.

3.4 Regressing experimental data by curve fitting

In most cases of industrial importance, the models are never completely white-box, meaning that
there will always be some parameters that need to be fitted, or at least fine-tuned to experimental
data. Generally the following three steps are necessary:

1. Collect N experimental independent xi , and dependent yi data points.

2. Select a model structure, M(θ) with a vector of parameters θ that are to be estimated.

3. Search for the parameters that optimise some sort of performance index, such as to maximise the “goodness of fit”, or equivalently minimise the sum of squared residuals.

A common performance index or loss function is to select the adjustable parameters so that the sum of difference squared between the raw data and your model predictions is minimised. This is the classical Gaussian least-squares minimisation problem. Formally we can write the performance index as,

$$\mathcal{J} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \tag{3.33}$$

where the ith model prediction, ŷi, is a function of the model, parameters and input data,

$$\hat{y}_i = f(\mathcal{M}(\theta), x_i) \tag{3.34}$$

We want to find the “best” set of parameters, i.e. the set of θ that minimises J in Eqn. 3.33,

$$\theta^\star = \arg\min_{\theta} \left\{ \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \right\} \tag{3.35}$$

The “arg min” part of Eqn 3.35 can be read as “the argument which minimises . . . ” since we are not too interested in the actual value of the performance function at the optimum, J, but rather the value of the parameters at the optimum, θ⋆. This least-squares estimation is an intuitive performance index, and has certain attractive analytical properties. The German mathematician Gauss is credited with popularising this approach, and indeed using it for analysing data taken while surveying Germany and making astronomical observations. Note however other objectives rather than Eqn. 3.33 are possible, such as minimising the sum of the absolute value of the deviations, or minimising the single maximum deviation. Both these latter approaches have seen a resurgence of interest over the last few years since computer tools have enabled investigators to by-pass the difficulty in analysis.

There are many ways to search for the parameters, and any statistical text book will cover these.
However the least squares approach is popular because it is simple, requires few prior assump-
tions, and typically gives good results.

3.4.1 Polynomial regression

An obvious way to fit smooth curves to experimental data is to find a polynomial that passes
through the cloud of experimental data. This is known as polynomial regression. The nth order
polynomial to be regressed is

    M(θ) : ŷ = θₙxⁿ + θₙ₋₁xⁿ⁻¹ + · · · + θ₁x + θ₀        (3.36)



where we try different values of the (n+1) parameters in the vector θ until the difference between
the calculated dependent variable ŷ is close to the actual measured value y. Mathematically the
vector of parameters θ is obtained by solving a least squares optimisation problem.

Given a set of parameters, we can compute the ith model prediction using

    ŷᵢ = [xⁿ  xⁿ⁻¹  · · ·  x  1] · [θₙ  θₙ₋₁  · · ·  θ₁  θ₀]⊤        (3.37)

or written in compact matrix notation,

    ŷᵢ = xᵢθ
where xᵢ is the data row vector for the ith observation. If all N observations are stacked vertically
together, we obtain the matrix system

    [ŷ₁  ŷ₂  · · ·  ŷN]⊤ = Xθ        (3.38)

where the ith row of X is [xᵢⁿ  xᵢⁿ⁻¹  · · ·  xᵢ  1], or ŷ = Xθ in a more compact matrix notation.
The matrix comprised of the stacked rows of measured independent data, X, in Eqn. 3.38 is called
the Vandermonde matrix or data matrix and can be easily constructed in MATLAB using the
vander command, although it is well known that this matrix can become very ill-conditioned.

We can search for the parameters in a number of ways, but minimising the squared error is
a common and easy approach. In this case our objective function that we want to minimise
(sometimes called the cost function) is the summed squared error. In matrix notation, the error
vector is

    ǫ ≝ y − Xθ        (3.39)
and the scalar cost function then becomes

    J = ǫ⊤ǫ = (y − Xθ)⊤(y − Xθ)
      = y⊤y − θ⊤X⊤y − y⊤Xθ + θ⊤X⊤Xθ

We want to choose θ such that J is minimised, i.e. a stationary point, thus we can set the partial
derivatives of J with respect to the parameters to zero,⁴

    ∂J/∂θ = 0 = −2X⊤y + 2X⊤Xθ        (3.40)

which we can now solve for θ as

    θ = (X⊤X)⁻¹X⊤y        (3.41)

where (X⊤X)⁻¹X⊤ is the pseudo-inverse.
As a consequence of the fact that we carefully chose our model structure in Eqn. 3.36 to be linear
in the parameters, θ, the solution given by Eqn. 3.41 is analytic and therefore very reliable
and straightforward to implement, as opposed to nonlinear regression which requires iterative
solution techniques.

⁴ Ogata, [148, p938], gives some helpful rules when using matrix-vector differentials.

This method of fitting a polynomial through experimental data is called polynomial least-squares
regression. In general, the number of measurements must be greater than (or equal to) the number
of parameters. Even with that proviso, the data matrix X⊤X can get very ill-conditioned, and
hence it becomes hard to invert in a satisfactory manner. This problem occurs more often when
high order polynomials are used or when you are trying to over-parameterise the problem. One
solution to this problem is given later in this section.
The matrix (X⊤X)⁻¹X⊤ is called the left pseudo-inverse of X, and is sometimes denoted X⁺.
The pseudo-inverse is a generalised inverse applicable even for non-square matrices and is
discussed in [150, p928]. MATLAB can compute (X⊤X)⁻¹X⊤ with the pseudo-inverse command,
pinv, enabling the parameters to be evaluated simply by typing theta = pinv(X)*y, although
it is more efficient to simply use the backslash command, theta = X\y.

An example of polynomial curve fitting using least-squares

Tabulated below is the density of air as a function of temperature. We wish to fit a smooth
quadratic curve to this data.

Temperature [°C]       −100    −50     0      60     100    160    250    350
Air density [kg/m³]    1.98    1.53   1.30   1.067  0.946  0.815  0.675  0.566

Figure 3.17: The density of air as a function of temperature. Experimental data, •, and a fitted quadratic curve. (See also Fig. 3.18 following for a higher-order polynomial fit.)

To compute the three model parameters from the air density data we can run the following m-file.

Listing 3.3: Curve fitting using polynomial least-squares


T = [-100 -50 0 60 100 160 250 350]'; % Temperature, T
rho = [1.98, 1.53, 1.30, 1.067, 0.946, 0.815, 0.675, 0.566]'; % Air density, ρ

X = vander(T); % Vandermonde matrix
X = X(:,end-2:end); % Keep only last 3 columns
theta = X\rho; % Solve for θ, where ρ̂ = θ₁T² + θ₂T + θ₃

Ti = linspace(-130,400); % validate with a plot


rho_pred = polyval(theta,Ti);
plot(T,rho,'o',Ti,rho_pred,'r-') % See Fig. 3.17.

The resulting curve is compared with the experimental data in Fig. 3.17.

In the example above, we constructed the Vandermonde data matrix explicitly, and solved for the
parameters using the pseudo-inverse. In practice however, we would normally simply use the
built-in MATLAB command polyfit which essentially does the same thing.

Improvements to the least squares estimation algorithm

There are many extensions to this simple multi-linear regression algorithm that try to avoid the
poor numerical properties of the scheme given above. If you consider pure computational speed,
then solving a set of linear equations is about twice as fast as inverting a matrix. Therefore instead
of Eqn. 3.41, the equivalent

    X⊤Xθ = X⊤y        (3.42)

is the preferred scheme to calculate θ. MATLAB's backslash operator, \, used in the example
above follows this scheme internally. We write it as if we expect an inverse, but it actually solves
a system of linear equations.
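The distinction between forming an inverse and solving the linear system can be sketched as follows (Python/NumPy, arbitrary random data); both give the same answer here, but solving Eqn. 3.42 directly is the preferred route:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))   # hypothetical data matrix
y = rng.standard_normal(20)

# Explicitly forming the inverse, as Eqn. 3.41 is written ...
theta_inv = np.linalg.inv(X.T @ X) @ (X.T @ y)

# ... versus solving the normal equations X'X theta = X'y (Eqn. 3.42).
theta_solve = np.linalg.solve(X.T @ X, X.T @ y)
```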

A numerical technique that uses singular value decomposition (SVD) gives better accuracy (for
the same number of bits to represent the data), than just applying Eqn 3.41, [161, 176]. Singular
values stem from the property that we can decompose any matrix T into the product of two
orthogonal matrices (U, V) and a diagonal matrix Σ,

T = UΣV⊤ (3.43)

where due to the orthogonality,

UU⊤ = I = U⊤ U and VV⊤ = I = V⊤ V (3.44)

The diagonal matrix Σ consists of the square roots of the eigenvalues of T⊤ T which are called
the singular values of T, and the number that differ significantly from zero is the rank of T. We
can make one further modification to Eqn 3.41 by adding a weighting matrix W. This makes the
solution more general, but often in practice one sets W = I. Starting with Eqn 3.42 including the
weighting matrix W⁻¹ = G⊤G,

    X⊤W⁻¹Xθ = X⊤W⁻¹y        (3.45)
    X⊤G⊤GXθ = X⊤G⊤Gy        (3.46)

Now defining T ≝ GX and z ≝ Gy gives

    T⊤Tθ = T⊤z        (3.47)

Now we take the singular value decomposition (SVD) of T in Eqn 3.47 giving

    (VΣU⊤)(UΣV⊤)θ = VΣU⊤z        (3.48)
    VΣ²V⊤θ = VΣU⊤z        (3.49)
Multiplying both sides by V⊤ and noting that Σ is diagonal we get,

    Σ²V⊤θ = ΣU⊤z        (3.50)
    VV⊤θ = VΣ⁻¹U⊤z        (3.51)

and since VV⊤ = I,

    θ = VΣ⁻¹U⊤z        (3.52)

where VΣ⁻¹U⊤ is the pseudo-inverse.

The inverse of Σ is simply the inverse of each of the individual diagonal (and non-zero) elements,
since it is diagonal. The key point here is that we never in the algorithm needed to calculate the
possibly ill-conditioned matrix T⊤ T. Consequently we should always use Eqn. 3.52 in preference
to Eqn. 3.41 due to the more robust numerical properties as shown in Listing 3.4.

Listing 3.4: Polynomial least-squares using singular value decomposition. This routine follows
from, and provides an alternative to Listing 3.3.
[U,S,V] = svd(X,0); % Use the ‘economy sized’ SVD, UΣV⊤ = X
theta2 = V*(S\U')*rho % θ = VΣ−1 U⊤ , Eqn. 3.52.
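The same SVD computation is just as short outside MATLAB; a Python/NumPy sketch on a hypothetical quadratic-fit problem, checked against a standard least-squares solver:

```python
import numpy as np

# Hypothetical quadratic-fit problem.
x = np.linspace(0.0, 1.0, 8)
X = np.vander(x, 3)                        # data matrix
y = np.array([1.0, 1.2, 1.1, 1.5, 1.9, 2.1, 2.6, 3.2])

# Economy-sized SVD, X = U S V'.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Eqn. 3.52: theta = V S^-1 U' y.  Note that X'X is never formed.
theta = Vt.T @ ((U.T @ y) / S)
```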

It does not take much for the troublesome X⊤X matrix to get ill-conditioned. Consider the
temperature/density data for air from Fig. 3.17, but in this case we will use temperature in degrees
Kelvin (as opposed to Celsius), and we will take data over a slightly larger range of temperatures.
This is a reasonable experimental request, and so we should expect to be able to fit a polynomial
in much the same way we did in Listing 3.3.

However as shown in Fig. 3.18, we cannot reliably fit a fifth-order polynomial using standard
least-squares to this data set, although we can reliably fit such a 5th-order polynomial using the
SVD strategy of Eqn. 3.52. Note that M ATLAB’s polyfit uses the reliable SVD strategy.

Figure 3.18: Fitting a high-order polynomial to some physical data, in this case the density of air as a function of temperature in degrees Kelvin. Experimental data, a fitted quintic curve using standard least squares, θ = (X⊤X)⁻¹X⊤y, and one using the more robust singular value decomposition, θ = VΣ⁻¹U⊤y. (See also Fig. 3.17.)

A further refined regression technique is termed partial least squares (PLS), and is summarised
in [74]. Typically these schemes decompose the data matrix X, into other matrices that are
ordered in some manner that roughly corresponds to information. The matrices with the least
information (and typically most noise) are omitted, and the subsequent regression uses only the
remaining information.
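A full PLS implementation is beyond this note, but the flavour of these information-ordering schemes can be sketched with a truncated-SVD (principal-component style) regression on made-up data, where the direction with a negligible singular value is simply dropped before regressing:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 5))
X[:, 4] = X[:, 3] + 1e-8 * rng.standard_normal(30)   # nearly collinear column
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.01 * rng.standard_normal(30)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = int(np.sum(S > 1e-6 * S[0]))   # keep only the informative directions
theta = Vt[:k].T @ ((U[:, :k].T @ y) / S[:k])
```

This is not PLS proper (PLS also uses y when choosing the retained directions), but it illustrates the idea of discarding the low-information, noise-dominated part of X.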

The example on page 109 used the least squares method for estimating the parameters of an
algebraic equation. However the procedure for estimating a dynamic equation remains the same.
This will be demonstrated later in §6.7.

Problem 3.3 Accuracy and numerical stability are very important when we are dealing with
computed solutions. Suppose we wish to invert

    A = [ 1     1   ]
        [ 1   1 + ǫ ]

where ǫ is a small quantity (such as nearly the machine eps).

1. What is the (algebraic) inverse of A?

2. Obviously if ǫ = 0 we will have trouble inverting since A is singular, but what does the
pseudo-inverse, A⁺, converge to as ǫ → 0?
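Before attempting the algebra, the behaviour can be explored numerically; a small Python/NumPy experiment (the value of ǫ is arbitrary):

```python
import numpy as np

# For small eps the ordinary inverse of A grows like 1/eps ...
eps = 1e-9
A = np.array([[1.0, 1.0], [1.0, 1.0 + eps]])
inv_norm = np.linalg.norm(np.linalg.inv(A))

# ... whereas the pseudo-inverse of the singular (eps = 0) limit stays finite.
A0 = np.ones((2, 2))
A0_pinv = np.linalg.pinv(A0)
```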

3.4.2 Nonlinear least-squares model identification

If the model equation is nonlinear in the parameters, then the solution procedure to find the
optimum parameters requires a nonlinear optimiser rather than the relatively robust explicit relation
given by Eqn. 3.41 or equivalent. Nonlinear optimisers are usually based on iterative schemes,
often additionally requiring good initial parameter estimates, and even then may quite possibly
fail to converge to a sensible solution. There are many algorithms for nonlinear optimisation
problems including exhaustive search, the simplex method due to Nelder-Mead, and gradient
methods.

MATLAB provides a simple unconstrained optimiser, fminsearch, which uses the Nelder-Mead
simplex algorithm. A collection of more robust algorithms, including algorithms to optimise
constrained systems, possibly with integer constraints, is the OPTI toolbox⁵.

A nonlinear curve-fitting example

The following biochemical example illustrates the solution of a nonlinear algebraic optimisation
problem. Many biological reactions are of the Michaelis-Menten form, where the cell number y(t)
is given at time t by the relation

    y = αt / (βt + 1)        (3.53)

Suppose we have some experimental data for a particular reaction given as

    time, t        0    0.2   0.5    1     1.5    2     2.5
    cell count, y  0    1.2   2.1   2.43  2.52   2.58  2.62

where we wish to estimate the parameters α and β in Eqn 3.53 using nonlinear optimisation
techniques. Note that the model equation (3.53) is nonlinear in the parameters. While it is impossible
to write the equation in the linear form y = f(t) · θ as done in §3.4, it is possible to linearise the
equation (Lineweaver-Burke plot) by transforming it. However this will also transform the errors
around the parameters. This is not good practice as it introduces bias in the estimates, but it does
give good starting estimates for the nonlinear estimator. Transforming Eqn 3.53 we get

    y = α/β − y/(βt)

Thus plotting y against y/t should result in a straight line with an intercept of α/β and a slope of
−1/β.

⁵ The OPTI toolbox is available from www.i2c2.aut.ac.nz

Assuming we have the experimental data stored as column vectors [t,y] in MATLAB, we could plot

t = [0, 0.2, 0.5:0.5:2.5]';
y = [0,1.2,2.1,2.43,2.52,2.58,2.62]';
plot(y./t,y,'o-') % ignore divide-by-zero at t = 0

(The accompanying figure plots the cell count y against y/t, which is approximately a straight line.)
which should give an approximately straight line (ignoring the first and possibly the second
points). To find the slope and intercept, we use the polyfit function to fit a line, and we get a
slope = −0.2686 and intercept = 2.98. This corresponds to α = 11.1 and β = 3.723.
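These linearised estimates can be reproduced in a few lines; a Python/NumPy sketch (the t = 0 point is dropped since y/t is undefined there):

```python
import numpy as np

t = np.array([0.0, 0.2, 0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([0.0, 1.2, 2.1, 2.43, 2.52, 2.58, 2.62])

# Fit the straight line y = (alpha/beta) - (1/beta)*(y/t) through the
# transformed data, excluding the undefined t = 0 point.
x = y[1:] / t[1:]
slope, intercept = np.polyfit(x, y[1:], 1)

beta = -1.0 / slope        # slope = -1/beta
alpha = intercept * beta   # intercept = alpha/beta
```

which recovers slope ≈ −0.269 and intercept ≈ 2.99, i.e. α ≈ 11.1 and β ≈ 3.72 as quoted above.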

Now we can refine these estimates using the fminsearch Nelder-Mead nonlinear optimiser.
We must first construct a small objective function which evaluates the sum of the squared
error for a given set of trial parameters and the experimental data. This is succinctly coded as an
anonymous function in Listing 3.5 following.

Since we require the experimental data variables y and t in the objective function, but we are not
optimising with respect to them, we must pass these variables as additional parameters to the
optimiser.

Listing 3.5: Curve fitting using a generic nonlinear optimiser

% Compute sum of square errors given trial θ and (t, y) data.
sse = @(theta,t,y) sum((y-theta(1)*t./(theta(2)*t+1)).^2);

optns = optimset('Diagnostics','on','Tolx',1e-5);
theta = fminsearch(sse,[11.1 3.723]',optns,t,y) % Polish the estimates.
plot(t,y,'o',t, theta(1)*t./(theta(2)*t+1)) % Check final fit in Fig. 3.19.

Listing 3.5 returns the refined estimates for θ as

    θ̂ = [α  β]⊤ = [12.05  4.10]⊤

and a comparison of both the experimental data • and the model’s predictions is given in Fig. 3.19.
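The same polishing step can be sketched with SciPy's Nelder-Mead implementation, starting from the linearised estimates; this is a stand-in for fminsearch, not the book's own code:

```python
import numpy as np
from scipy.optimize import minimize

t = np.array([0.0, 0.2, 0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([0.0, 1.2, 2.1, 2.43, 2.52, 2.58, 2.62])

# Sum of squared errors for the model y = alpha*t/(beta*t + 1).
def sse(theta):
    alpha, beta = theta
    return np.sum((y - alpha * t / (beta * t + 1.0)) ** 2)

# Polish the linearised starting estimates (11.1, 3.723) with Nelder-Mead.
res = minimize(sse, x0=[11.1, 3.723], method='Nelder-Mead')
alpha_opt, beta_opt = res.x
```

which converges to essentially the same θ̂ as Listing 3.5.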

Curve fitting using the OPTI optimisation toolbox

The MATLAB OPTIMISATION toolbox contains more sophisticated routines specifically intended
for least-squares curve fitting. Rather than write an objective function to compute the sum of
squares and then subsequently call a generic optimiser as we did in Listing 3.5, we can solve the
problem in a much more direct manner. The opti_lsqcurvefit is the equivalent routine in
the OPTI toolbox that solves least-squares regression problems.

Listing 3.6: Curve fitting using the OPTI optimisation toolbox. (Compare with Listing 3.5.)

>> theta0 = [11,3]'; % Initial guess θ₀
>> F = @(x,t) x(1)*t./(x(2)*t+1); % f(θ,t) = θ₁t/(θ₂t + 1)
>> theta = opti_lsqcurvefit(F,theta0,t,y)
theta =
    12.0440
     4.1035

Figure 3.19: The fitted (dashed) and experimental, •, data for a bio-chemical reaction. The asymptotic final cell count, α/β, is given by the dashed horizontal line.

Note that in Listing 3.6 we encapsulate the function in an anonymous function which is then
passed to the least-squares curve fit routine, opti_lsqcurvefit.

Higher dimensional model fitting

Searching for parameters where we have made two independent measurements is just like
searching when we have made one independent measurement, except that to plot the results we
must resort to contour or three-dimensional plots. This makes the visualisation a little more
difficult, but changes nothing in the general technique.

We will aim to fit a simple four parameter, two variable function to model the compressibility
of water. Water, contrary to what we were taught in school, is compressible, although this is only
noticeable at very high pressures. If we look up the physical properties for compressed water
in steam tables,⁶ we will find something like Table 3.4. Fig. 3.20(a) graphically illustrates the

Table 3.4: Density of compressed water, (ρ × 10⁻³ kg/m³)

Pressure (bar)                       Temperature, °C
                 0.01     100     200     250     300     350    374.15
100             1.0050  0.9634  0.8711  0.8065  0.7158    —       —
221.2           1.0111  0.9690  0.8795  0.8183  0.7391  0.6120    —
500             1.0235  0.9804  0.8969  0.8425  0.7770  0.6930  0.6410
1000            1.0460  1.0000  0.9242  0.8772  0.8244  0.7610  0.7299

strong temperature influence on the density compared with pressure. Note how the missing data,
represented by NaNs in MATLAB, is ignored in the plot.

Our model relates the density of water as a function of temperature and pressure. The proposed
6 Rogers & Mayhew, Thermodynamic and Transport properties of Fluids, 3rd Edition, (1980), p11
Figure 3.20: A 2D model for the density of compressed water. (a) True density of compressed water as a function of temperature and pressure. (b) Model fit: the • mark experimental data points used to construct the 2-D model given as contour lines.

model structure is

    ρ = Pᵏ exp(a + b/T + c/T³)        (3.54)

where the constants (model parameters θ) need to be determined. We define the parameters as

    θ ≝ [a  b  c  k]⊤        (3.55)

Again the approach is to minimise numerically the sum of the squared errors using an optimiser
such as fminsearch. We can check the results of the optimiser using a contour plot with the
experimental data from Table 3.4 superimposed.

The script file in Listing 3.7 calls the minimiser which in turn calls the anonymous function
J_rhowat which, given the experimental data and proposed parameters, returns the sum of
squared errors. This particular problem is tricky, since it is difficult to know appropriate starting
guesses for θ, and the missing data must be eliminated before the optimisation. Before embarking
on the full nonlinear minimisation, I first try a linear fitting to obtain good starting estimates for
θ. I also scale the parameters so that the optimiser deals with numbers around unity.

Listing 3.7: Fitting water density as a function of temperature and pressure

rhowat = @(a,P,T) P.^a(1).*exp(a(2) + ...
    1e2*a(3)./T + 1e7*a(4)./T.^3); % Assumed model ρ = Pᵏ exp(a + b/T + c/T³)

J_rhowat = @(a,P,T,rho) ...
    norm(reshape(rho,[],1)-reshape(rhowat(a,P,T),[],1)); % SSE J = Σ(ρ − ρ̂)²

T = [0.01, 100, 200, 250, 300, 350, 374.15]'; % Temp [deg C]
P = [100 221.2 500 1000]'; % Pressure [Bar]
rhof = 1.0e3*[1.0050, 0.9634, 0.8711, 0.8065, 0.7158, NaN, NaN; ...
    1.0111, 0.9690, 0.8795, 0.8183, 0.7391, 0.6120, NaN; ...
    1.0235, 0.9804, 0.8969, 0.8425, 0.7770, 0.6930, 0.6410; ...
    1.0460, 1.0000, 0.9242, 0.8772, 0.8244, 0.7610, 0.7299]; % density kg/m3

[TT,PP] = meshgrid(T+273,P); Tv = TT(:); Pv = PP(:); % vectorise

A = [ones(size(Tv)), 1.0e2./Tv, 1.0e7./Tv.^3, log(Pv)]; % Scaled data matrix

idx = isnan(rhof(:)); % find missing data points
rhofv = rhof(:); rhofv(idx) = []; % remove bad points
Tv(idx) = []; Pv(idx) = []; A(idx,:) = [];
theta = A\log(rhofv); % first (linear) estimate

% Do nonlinear fit
theta_opt = fminsearch(@(theta) J_rhowat(theta,Pv,Tv,rhofv),theta);

[Ti,Pi] = meshgrid([0.01:10:370]+273,[100:100:1000]); % Compare fit with data
rho_est = rhowat(theta_opt,Pi,Ti);
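The log-linear trick used for the starting estimates rests on taking logarithms of Eqn. 3.54, ln ρ = a + b/T + c/T³ + k ln P, which is linear in the parameters. The idea can be verified on synthetic, noise-free data (arbitrary parameter values, Python/NumPy):

```python
import numpy as np

# Generate synthetic densities from known parameters (a, b, c, k).
a, b, c, k = 5.5, 5.8e2, -1.9e7, 0.03
T = np.linspace(280.0, 640.0, 20)     # temperature [K]
P = np.linspace(100.0, 1000.0, 20)    # pressure [bar]
rho = P**k * np.exp(a + b / T + c / T**3)

# ln(rho) = a + b/T + c/T^3 + k*ln(P): an ordinary linear least-squares fit.
A = np.column_stack([np.ones_like(T), 1.0 / T, 1.0 / T**3, np.log(P)])
theta0, *_ = np.linalg.lstsq(A, np.log(rho), rcond=None)
```

With noise-free data the known parameters are recovered, confirming the linearisation; with real data this fit only supplies the starting point for the nonlinear search.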

Fig. 3.20(b) compares contour plots of the density of water as a function of pressure and
temperature derived from the experimental data and the fitted model. The • shows the location of
the experimental data points. The solid contour lines give the predicted density of water compared
with the contours derived from the experimental data (dashed lines). The optimum parameters
found above are

    θ⋆ = [a  b  c  k]⊤ = [5.5383,  5.8133·10²,  −1.8726·10⁷,  0.0305]⊤        (3.56)

3.4.3 The confidence of the optimised parameters

Finding the optimum model parameters to fit the data is only part of the task. A potentially
more difficult objective is to try to establish how precise these optimum parameters are, or what
the confidence regions of the parameters are. Simply expressing the uncertainty of your model as
parameter values ± some interval is a good first approximation, but it does neglect the interaction
of the other parameters, the so-called correlation effect.

To establish the individual confidence limits of the parameters, we need to know the following:

1. The m optimum parameters θ from the n experimental observations. (For the nonlinear
case, these can be found using a numerical optimiser.)

2. A linearised data matrix X centered about the optimised parameters.

3. The measurement noise sȲ. (This can be approximated from the sum of the squared error
terms, or from prior knowledge.)

4. Some statistical parameters such as the t-statistic as a function of confidence interval (90%,
95% etc) and degrees of freedom. This is easily obtained from statistical tables or using the
qt routine from the STIXBOX collection mentioned on page 3 for m-file implementations of
some commonly used statistical functions.

The confidence interval for parameter θᵢ is therefore

    θᵢ ± t₍₁₋α/₂₎ sȲ √Pᵢ,ᵢ        (3.57)

where t₍₁₋α/₂₎ is the t-statistic evaluated at ν = n − m (number of observations less the number
of parameters) degrees of freedom and Pᵢ,ᵢ are the diagonal elements of the covariance matrix P.
The measurement noise (if not already approximately known) is

    sȲ ≈ √(s²ᵣ) = √(ǫ⊤ǫ/(n − m))        (3.58)

and the covariance matrix P is obtained from the data matrix. Note that if we assume we have
perfect measurements and a perfect model, then we would expect that the sum of the errors
will be exactly zero given the true parameters. This is consistent with Eqn 3.58 when ǫ⊤ ǫ = 0
although of course this occurs rarely in practice!
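Eqns. 3.57-3.58 can be wired together in a few lines; here is a sketch in Python (scipy.stats.t supplies the t-statistic) on made-up straight-line data:

```python
import numpy as np
from scipy import stats

# Hypothetical straight-line data: n observations, m = 2 parameters.
rng = np.random.default_rng(3)
n, m = 25, 2
x = np.linspace(0.0, 10.0, n)
X = np.column_stack([x, np.ones(n)])            # linearised data matrix
y = X @ np.array([2.0, 1.0]) + 0.2 * rng.standard_normal(n)

theta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ theta
s = np.sqrt(resid @ resid / (n - m))            # Eqn. 3.58
P = np.linalg.inv(X.T @ X)                      # covariance matrix
t_val = stats.t.ppf(0.975, n - m)               # two-sided 95% t-statistic
half_width = t_val * s * np.sqrt(np.diag(P))    # Eqn. 3.57
```

The parameters are then quoted as thetaᵢ ± half_widthᵢ, interaction effects aside.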

An example of establishing confidence intervals for a nonlinear model

This section was taken from [87, p198], but with some modifications and corrections. Himmel-
blau, [87, p198], gives some reaction data as:

Pressure, p 20 30 35 40 50 55 60
rate, r 0.068 0.0858 0.0939 0.0999 0.1130 0.1162 0.1190

and proposes a 2 parameter nonlinear model of the form


θ0 p
r̂ = (3.59)
1 + θ1 p
Now suppose that we use a nonlinear optimiser program to search for the parameters θ such that
the sum of squared errors is minimised,

    min_θ ∑ᵢ₌₁ⁿ (rᵢ − r̂ᵢ)² = min_θ (r − r̂)⊤(r − r̂)        (3.60)

If we do this using say the same technique as in §3.4.2, we will find that the optimum parameters
are approximately

    θ = [θ₀  θ₁]⊤ = [5.154·10⁻³  2.628·10⁻²]⊤        (3.61)

and the raw data and model predictions are given in Fig 3.21.

Now the linearised data matrix is an n by m matrix where n is the number of observations and m
is the number of parameters. The partial derivatives of 3.59 with respect to the parameters are

    ∂r̂/∂θ₀ = p/(1 + θ₁p),        ∂r̂/∂θ₁ = −θ₀p²/(1 + θ₁p)²

and the data matrix X is defined as

    Xᵢ,ⱼ ≝ ∂r̂ᵢ/∂θⱼ        (3.62)

Thus each row of X holds the partial derivatives of that particular observation with respect to the
parameters. The covariance matrix P is defined as

    P ≝ (X⊤X)⁻¹        (3.63)

and should be positive definite and symmetric, just as the variance should always be greater
than or equal to zero. However in practice this requirement may not always hold owing to poor
numerical conditioning. The diagonal elements are the variances we will use for the individual
confidence limits. When using MATLAB, it is better to look at the singular values, or check the
rank of X⊤X before doing the inversion.

Listing 3.8 fits parameters to the nonlinear reaction rate model and also illustrates the uncertainties.

Listing 3.8: Parameter confidence limits for a nonlinear reaction rate model

p = [20,30,35,40,50,55,60]'; % Pressure, p
r = [0.068,0.0858,0.0939,0.0999,0.1130,0.1162,0.1190]'; % Reaction rate, r

Rxn_rate = @(x,p) x(1)*p./(1+x(2)*p); % r̂ = θ₀p/(1 + θ₁p)
theta = lsqcurvefit(Rxn_rate,[5e-3 2e-2]',p,r); % Refine estimate of parameters θ
nobs = length(p); mpar = length(theta); % # of observations & parameters

r_est = Rxn_rate(theta,p); % Predicted reaction rate r̂(θ,p)
j = sum((r-r_est).^2); % sum of squared errors J = Σᵢǫᵢ²

d1 = 1+theta(2)*p;
X = [p./d1, -theta(1)*p.*p./d1.^2]; % Data gradient matrix, Eqn. 3.62.
C = inv(X'*X); % not numerically sound ??

% t-distribution statistics
pt = @(x,v) (x≥0).*(1-0.5*betainc(v./(v+x.^2),v/2,0.5)) + ...
    (x<0).*(0.5*betainc(v./(v+x.^2),v/2,0.5));
qt = @(P,v) fsolve(@(x) pt(x,v)-P,sqrt(2)*erfinv(2*P-1), ...
    optimset('display','off'));

alpha = 0.05; % CI level (97.5%)
nu = nobs-mpar; % Degrees of freedom ν = n − m
t = qt(1-alpha,nu); % t statistic from approx function
sy = sqrt(j/nu); % approx measurement noise std
b_ci = t*sy*diag(C); % ± confidence regions

This gives 95% confidence limits for this experiment⁷ as

    [5.1061·10⁻³  2.2007·10⁻²]⊤ < θ < [5.2019·10⁻³  3.0553·10⁻²]⊤        (3.64)

A plot of the raw data (◦), the model with the optimum estimated parameters, and the associated
error bounds using Eqn. 3.64 is given in Fig. 3.21. Note that while the model prediction is quite
good, the region defined by the 95% confidence on the parameters is surprisingly large. The error
bounds were generated by simulating the model with the lower and upper bounds of Eqn 3.64
respectively.

The confidence region (an ellipse in the two variable case) can also be plotted. This gives a deeper
understanding of the parameter interactions, but the visualisation becomes near impossible as
the number of parameters gets much larger than 3. The region for this example is evaluated in the
following section.

The confidence regions of the parameters

An approximate confidence region can be constructed by linearising the nonlinear model about
the optimum parameters. The covariance of the parameter estimate is

    cov(b) ≈ (X⊤X)⁻¹ σ²Ȳ        (3.65)

where b is the estimated parameter vector and β is the true (but unknown) parameter vector. Now
the confidence region, an ellipse in two parameter space (an ellipsoid in three parameter space), is
the region

⁷ These figures differ from the worked example in Himmelblau. I think he swapped two matrices by mistake, then continued with this error.



Figure 3.21: The experimental raw data (•), model (—), and associated 95% error bounds (dotted) for the optimised model fit.

that with a confidence limit of say 95% we are certain that the true parameter β lies within,

    (β − b)⊤(X⊤X)(β − b) = s²Ȳ m F₁₋α[m, n − m]        (3.66)

where F[m, n − m] is the upper limit of the F distribution for m and n − m degrees of freedom.
This value can be found in statistical tables. Note that the middle term of Eqn 3.66 is X⊤X and
not the inverse of this.
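Generating points on the boundary of the ellipse defined by Eqn. 3.66 is straightforward with a Cholesky factorisation of X⊤X; a Python sketch with made-up values for the fitted quantities (scipy.stats.f supplies the F statistic):

```python
import numpy as np
from scipy import stats

# Hypothetical quantities from a 2-parameter fit with n observations.
n, m = 10, 2
XtX = np.array([[4.0, 1.0], [1.0, 2.0]])    # X'X from the linearised model
b = np.array([5.0, 3.0])                    # optimised parameters
s2 = 0.01                                   # estimated noise variance s^2

# Right-hand side of Eqn. 3.66 at the 95% level.
c = s2 * m * stats.f.ppf(0.95, m, n - m)

# With X'X = L L' (Cholesky), beta = b + inv(L') u traces the boundary
# as u sweeps a circle of radius sqrt(c).
L = np.linalg.cholesky(XtX)
phi = np.linspace(0.0, 2.0 * np.pi, 100)
u = np.sqrt(c) * np.vstack([np.cos(phi), np.sin(phi)])
beta = b[:, None] + np.linalg.solve(L.T, u)   # 2 x 100 boundary points
```

Every column of beta satisfies Eqn. 3.66 exactly, so plotting the two rows against each other draws the confidence ellipse.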

For the previous example, we can plot the ellipse about the true optimised parameters. Fig 3.22
shows the 95% confidence ellipse about the optimised parameter (marked with a ×) and the
individual confidence lengths superimposed from Eqn. 3.64. Since the ellipse is slanted, we note
that the parameters are correlated with each other.

Figure 3.22: The approximate 95% confidence region in 2 parameter space with the individual confidence intervals superimposed (marked as ◦).

The ellipse defines an area in which the probability that the true parameter vector lies is better
than 95%. If we use the parameter points marked as a ⋆ in Fig. 3.22, then the model prediction
will differ from the prediction using the optimised parameter vector. This difference is shown as
the errorbars in the lower plot of Fig. 3.22. Note that these errorbars are much smaller than the
errorbars I obtained from considering only the confidence interval for each individual parameter
given in Fig 3.21. Here, owing to the correlation (slanting of the ellipse in Fig 3.22), the parameters
obtained when neglecting the correlation lie far outside the true 95% confidence region. Hence
the bigger errorbars.

3.5 Numerical tools for modelling

“Unfortunately there are no known methods of solving Eqn 1 (a nonlinear differential equation).
This, of course, is very disappointing.”
M. Braun, [33, p493]

Since differential equations are difficult to solve analytically, we typically need to resort to
numerical methods as described in many texts such as [39, 41, 46, 94, 104, 169] and [201].

Differential equations are broadly separated into two families:

ODEs Ordinary differential equations (ODEs) where time is the only independent variable. The
solution to ODEs can be described using standard plots where the dependent variables are
plotted on the ordinate or y axis against time on the x axis.
PDEs Partial differential equations (PDEs) are where space (and perhaps time) are independent
variables. To display the solution of PDEs, contour maps or 3D plots are required in general.

Generally PDEs are much harder to solve than ODEs and are beyond the scope of this text.

The solution of nonlinear differential equations

One of the most important “tools” in the desk-top experimenter’s collection is a good numerical
integrator. We need numerical integrators whenever we are faced with a nonlinear differential
equation which is intractable by any other means. MATLAB supplies a number of different
numerical integrators optimised for different classes of problems. Typically the 4th/5th dual order
Runge-Kutta implementation, ode45, is a good first choice.

To use the integrators, you must supply two components: a script that calls the integration routine, and a function subroutine containing the differential system to be solved.

1. The script file calls the integrator (eg: ode45) with the name of the function subroutine to
be integrated and parameters including initial conditions, tolerance and time span.

2. The function subroutine (named in the script file) calculates the derivatives as a function
of the independent variable (usually time), and the dependent variable (often x).

We will demonstrate this procedure by constructing templates of the above two files to solve a
small nonlinear differential equation. One can reuse this template as the base for solving other
ODE systems.

ODE template example

Suppose our task is to investigate the response of the nonlinear pendulum system given in
Eqn. 3.6 and compare this with the linearised version. The script file given in Listing 3.9 com-
putes the angle θ = x1 (t) trajectory for both the nonlinear and linear approximation.

Listing 3.9: Comparing the dynamic response of a pendulum to the linear approximation

g=8.81; l=1; m = 5; T_c=0; % constants
xdot = @(t,x) [x(2); -g/l*sin(x(1)) + T_c/m/l^2]; % dx/dt

x0 = [1,1]'; % start position x(0)
[t,x] = ode45(xdot,[0,10],x0); % Integrate nonlinear

% Now try linear system
A = [0 1; -g/l 0]; B = [0;0]; C = [1,0]; D = 0;
tv = linspace(0,10)';
[y2,t2] = lsim(ss(A,B,C,D),0*tv,tv,x0);

plot(t,x(:,1),t2,y2) % See Fig. 3.23.

Note how the trajectories for the nonlinear (solid) result as calculated from ode45 and linear
approximation calculated using lsim gradually diverge in Fig. 3.23.

Figure 3.23: The trajectory of the true nonlinear pendulum compared with the linearised approximation.

Problem 3.4 1. One of my favourite differential equations is called the Van der Pol equation;

\[ m \frac{d^2 y}{dt^2} - b\left(1 - y^2\right)\frac{dy}{dt} + ky = 0 \]
where m = 1, b = 3 and k = 2. Solve the equation for 0 < t < 20 using both ode23 and
ode45. Sketch the solution. Which routine is better? You will need to create a function
called vandepol(t,y) which contains the system of equations to be solved.

2. A damped forced pendulum is an example of a chaotic system. The dynamic equations are

\begin{align}
\frac{d\omega}{dt} &= -\frac{\omega}{q} - \sin\theta + g\cos\phi \tag{3.67} \\
\frac{d\theta}{dt} &= \omega \tag{3.68} \\
\frac{d\phi}{dt} &= \omega_d \tag{3.69}
\end{align}
where there are three states, \(x = \begin{bmatrix} \omega & \theta & \phi \end{bmatrix}^\top\), and three parameters, \(\rho = \begin{bmatrix} q & g & \omega_d \end{bmatrix}^\top\).
Simulate this system for \(0 \le t \le 500\) starting from \(x = \begin{bmatrix} -1 & 2 & 0.3 \end{bmatrix}^\top\) and using
\(\rho = \begin{bmatrix} 2 & 1.5 & 2/3 \end{bmatrix}^\top\) as parameters. Be warned, this may take some computer time!
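For part 2, Eqns. 3.67–3.69 translate directly into a derivative function. A minimal sketch (in Python with SciPy here; the MATLAB function for ode45 is analogous, and the shortened horizon of t = 50 is just to keep the run quick):

```python
import numpy as np
from scipy.integrate import solve_ivp

q, g, wd = 2.0, 1.5, 2.0/3.0        # parameters rho = [q, g, wd]

def forced_pendulum(t, x):
    w, theta, phi = x
    dw = -w/q - np.sin(theta) + g*np.cos(phi)   # Eqn 3.67
    dtheta = w                                  # Eqn 3.68
    dphi = wd                                   # Eqn 3.69
    return [dw, dtheta, dphi]

x0 = [-1.0, 2.0, 0.3]
sol = solve_ivp(forced_pendulum, [0, 50], x0, max_step=0.05)
```

Widening the span to [0, 500] reproduces the problem as stated; despite the chaotic wandering of θ, the damping keeps the angular velocity ω bounded.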

Problem 3.5 A particularly nasty nonlinear chemical engineering type model is found in [83]. This
is a model of two CSTRs8 in series, cooled with a co-current cooling coil. The model equations
are:
\begin{align*}
\dot{C}_{a1} &= \frac{q}{V_1}(C_{af} - C_{a1}) - k_0 C_{a1} \exp\left(-\frac{E}{RT_1}\right) \\
\dot{T}_1 &= \frac{q}{V_1}(T_f - T_1) + \frac{(-\Delta H)\, k_0 C_{a1}}{\rho c_p} \exp\left(-\frac{E}{RT_1}\right)
  + \frac{\rho_c c_{pc}}{\rho c_p V_1}\, q_c \left[1 - \exp\left(-\frac{hA_1}{q_c \rho_c c_{pc}}\right)\right](T_{cf} - T_1) \\
\dot{C}_{a2} &= \frac{q}{V_2}(C_{a1} - C_{a2}) - k_0 C_{a2} \exp\left(-\frac{E}{RT_2}\right) \\
\dot{T}_2 &= \frac{q}{V_2}(T_1 - T_2) + \frac{(-\Delta H)\, k_0 C_{a2}}{\rho c_p} \exp\left(-\frac{E}{RT_2}\right)
  + \frac{\rho_c c_{pc}}{\rho c_p V_2}\, q_c \left[1 - \exp\left(-\frac{hA_2}{q_c \rho_c c_{pc}}\right)\right]
    \left[T_1 - T_2 + \exp\left(-\frac{hA_1}{q_c \rho_c c_{pc}}\right)(T_{cf} - T_1)\right]
\end{align*}
where the state, control, disturbance and measured variables are defined as
\[ x \stackrel{\text{def}}{=} \begin{bmatrix} C_{a1} & T_1 & C_{a2} & T_2 \end{bmatrix}^\top, \quad
   u \stackrel{\text{def}}{=} q_c, \quad
   d \stackrel{\text{def}}{=} \begin{bmatrix} C_{af} & T_{cf} \end{bmatrix}^\top, \quad
   y \stackrel{\text{def}}{=} C_{a2} \]
and the parameter values are given in Table 3.5.

Table 3.5: The parameter values for the CSTR model

description                 variable      value       units
reactant flow               q             100         l/min
conc. of feed reactant A    Caf           1.0         mol/l
temp. of feed               Tf            350         K
temp. of cooling feed       Tcf           350         K
volume of vessels           V1 = V2       100         l
heat transfer coeff.        hA1 = hA2     1.67·10⁵    J/min.K
pre-exp. constant           k0            7.2·10¹⁰    min⁻¹
E/R                         E/R           1.0·10⁴     K
reaction enthalpy           −∆H           4.78·10⁴    J/mol
fluid density               ρc = ρ        1.0         g/l
heat capacity               cp = cpc      0.239       J/g.K

Note that the model equations are nonlinear since the state variable T1 (amongst others) appears
nonlinearly in the differential equations. In addition, this model is referred to as control
nonlinear, since the manipulated variable enters nonlinearly. Simulate the response of the con-
centration in the second tank (Ca2) to a step change of ±10% in the cooling flow using the initial
conditions given in Table 3.6.

You will need to write a .m file containing the model equations, and use the integrator ode45.
Solve the system over a time scale of about 20 minutes. What is unusual about this response?
How would you expect a linear system to behave in similar circumstances?

Problem 3.6 Develop a M ATLAB simulation for the high purity distillation column model given in
[140, pp459]. Verify the open loop responses given on page 463. Ensure that your simulation is
easily expanded to accommodate a different number of trays, different relative volatility etc.
8 continuously stirred tank reactors

Table 3.6: The initial state and manipulated variables for the CSTR simulation

description variable value units


coolant flow qc 99.06 l/min
conc. in vessel 1 Ca1 8.53 · 10−2 mol/l
temp. in vessel 1 T1 441.9 K
conc. in vessel 2 Ca2 5.0 · 10−3 mol/l
temp. in vessel 2 T2 450 K

Problem 3.7 A slight complication to the simple ODE with specified initial conditions arises when we
still have the ODE system, but not all the initial conditions. In this case we may know some of the
end conditions instead; thus the system is in principle able to be solved, but not in the standard
manner: we require a trial and error approach. These types of problems are called two point
boundary problems, so named because for ODEs we have two boundaries for the independent
variable, the start and the finish. They arise in heat transfer problems and in optimal control.
Try to solve the following from [196, p178].

A fluid enters an immersed cooling coil 10m long at 200 ◦ C and is required to leave at 40 ◦ C. The
cooling medium is at 20 ◦ C. The heat balance is

\[ \frac{d^2 T}{dx^2} = 0.01\,(T - 20)^{1.4} \tag{3.70} \]
where the initial and final conditions are;
where the initial and final conditions are;

Tx=0 = 200, Tx=10 = 40

To solve the system, we must rewrite Eqn 3.70 as a system of two first order differential equations,
and supply a guess for the missing initial condition \(\left.\frac{dT}{dx}\right|_{x=0}\). We can then integrate the system
until x = 10 and check that the final condition is as required. Solve the system.

Hint: Try \(-47 < \left.\frac{dT}{dx}\right|_{x=0} < -42\).
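The trial and error above is usually automated as a shooting method: guess the missing slope, integrate to x = 10, and adjust the guess until T(10) = 40. A hedged sketch follows (Python with SciPy; a MATLAB version would pair ode45 with fzero in the same way, and the bracket [−47, −42] is simply the hint):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

def coil(x, y):
    # y = [T, dT/dx];  T'' = 0.01 (T - 20)^1.4
    T, dT = y
    # signed power guards against a trial trajectory dipping below 20;
    # for the physical solution T stays above 20, so it is identical
    rhs = 0.01*np.sign(T - 20)*np.abs(T - 20)**1.4
    return [dT, rhs]

def final_temp(slope):
    # integrate from x = 0 to x = 10 with the guessed initial slope
    sol = solve_ivp(coil, [0, 10], [200.0, slope], rtol=1e-8, atol=1e-8)
    return sol.y[0, -1]

# adjust the slope until the exit temperature hits 40 degC
slope = brentq(lambda s: final_temp(s) - 40.0, -47.0, -42.0, xtol=1e-10)
```

Each call of `final_temp` is one trial integration; the root finder simply automates the “guess and check” loop over the missing initial condition.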

3.5.1 Differential/Algebraic equation systems and algebraic loops

In many cases when deriving models from first principles, we end up with a coupled system
of some differential equations and some algebraic equations. This is often due to our practice
of writing down conservation type equations (typically dynamic), and constraint equations, say
thermodynamic, which are typically algebraic. Such a system in general can be written as

\[ f(\dot{x}, x, t) = 0 \tag{3.71} \]

and is termed a DAE or differential/algebraic equation system. If we assume that our model
is “well posed”, that is, we have some hope of finding a solution, then we expect that we have
the same number of variables as equations, and it follows that some of the variables will not
appear in the differential part, but only in the algebraic part. We may be able to substitute those
algebraic variables out, such that we are left with only ODEs, which can then be solved using
standard numerical schemes such as rk2.m or ode45. However it is more likely that we cannot
extract the algebraic variables out, and thus we need special techniques to solve these sorts of
problems (Eqn. 3.71) as one.
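To make the substitution idea concrete, here is a small semi-explicit system of my own construction (not from the text) where the algebraic variable has no convenient closed-form solution, so the algebraic equation is instead solved numerically inside the derivative function; this effectively turns the index-1 DAE back into an ODE (Python with SciPy):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

# Semi-explicit index-1 DAE (illustrative):
#   dx/dt = -z,    0 = g(x, z) = z^3 + z - x
# g is strictly increasing in z, so z(x) exists and is unique.

def z_of_x(x):
    # inner algebraic solve of g(x, z) = 0; the root satisfies |z| <= max(1, |x|)
    b = max(1.0, abs(x))
    return brentq(lambda z: z**3 + z - x, -b, b)

def rhs(t, x):
    # substitute the algebraic variable out at every derivative evaluation
    return [-z_of_x(x[0])]

sol = solve_ivp(rhs, [0, 5], [1.0], rtol=1e-8, atol=1e-10)
x_end = sol.y[0, -1]
```

This works here because the algebraic part is index 1 and well conditioned; for higher-index DAEs this trick fails and the special-purpose methods discussed in the text are needed.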

In a now classic article titled Differential/Algebraic Equations are not ODE’s, [158] gives some in-
sight into the problems. It turns out that even for linear DAE systems, the estimate of the error,
which is typically derived from the difference between the predictor and the corrector, does not
decrease as the step size is reduced. Since most normal ODE schemes are built around this as-
sumption, they will fail. This state of affairs will only occur with certain DAE structures where
the nilpotency or index is ≥ 3. Index 2 systems also tend to cause BDF schemes to fail, but
can be tackled using other methods. The index problem is important because in many cases the
index can be changed (preferably reduced to less than two) by rearranging the equations. Automated
modelling tools tend, if left alone, to create overly complex models with a high index that are im-
possible to solve. However, by rearranging these equations, either with an intelligent symbolic
manipulator or by hand, we may be able to reduce the index.

The difficulties of algebraic loops

One reason that computer aided modelling tools such as SIMULINK have taken so long to mature
is the problem of algebraic loops. This is a particular problem when the job of assembling the
many different differential and algebraic equations in an efficient way is left to a computer.

Suppose we want to simulate a simple feedback process where the gain is a saturated function of
the output, say

K(y) = max(min(y, 5), 0.1)

If we simulate this in SIMULINK using a simple loop (a Step input passing through a Gain and the
transfer function 1/(s + 1) to a Product block and a Scope, with the Product’s second input fed back
from its own output through Saturation and Gain blocks)

we run into an Algebraic Loop error. MATLAB returns the following error diagnostic (or something
similar):

Warning: Block diagram ’sgainae’ contains 1 algebraic loop(s).


Found algebraic loop containing block(s):
’sgainae/Gain1’
’sgainae/Saturation’ (discontinuity)
’sgainae/Product’ (algebraic variable)
Discontinuities detected within algebraic loop(s), may have trouble solving

and the solution stalls.

The simplest way to avoid these types of problems is to insert some dynamics into the feedback
loop. In the example above, we could place a transfer function with a unit gain and very small
time constant in place of the algebraic gain in the feedback loop. While we desire the dynamics
of the gain to be very fast so that it approximates the original algebraic gain, overly fast dynamics
cause numerical stability problems, hence there is a trade off.
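At the equation level this fix amounts to giving the feedback gain its own fast state. The sketch below (Python with SciPy; the plant 1/(s + 1), unit step input and saturated gain K(y) = max(min(y, 5), 0.1) follow the example above, while the lag τ = 0.01 is an arbitrary choice of mine) replaces the algebraic relation k = K(y) with the fast filter τ k̇ = K(y) − k:

```python
import numpy as np
from scipy.integrate import solve_ivp

tau = 0.01                       # small lag: fast, but not so fast it stiffens the problem

def K(y):
    # the saturated gain K(y) = max(min(y, 5), 0.1)
    return float(np.clip(y, 0.1, 5.0))

def loop(t, s):
    y, k = s
    u = 1.0                      # unit step input
    dy = -y + u*k                # plant 1/(s+1) driven by the product u*k
    dk = (K(y) - k)/tau          # fast lag standing in for the algebraic gain
    return [dy, dk]

sol = solve_ivp(loop, [0, 20], [0.0, 0.0], method='Radau')   # implicit solver copes with the stiff lag
y_end = sol.y[0, -1]
```

Starting from rest, the output settles at y = 0.1, where it balances the lower saturation limit of the gain. Shrinking τ further makes the approximation better but the integration stiffer, which is exactly the trade-off mentioned above.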

3.6 Linearisation of nonlinear dynamic equations

While most practical engineering problems are nonlinear to some degree, it is often useful to
be able to approximate the dynamics with a linear differential equation which means we can
apply linear control theory. While it is possible to design compensators for the nonlinear system
directly, this is in general far more complicated, and one has far fewer reliable guidelines and
recipes to follow. One particular version of nonlinear controller design called exact nonlinear
feedback is discussed briefly in §8.6.

Common nonlinearities can be divided into two types: hard nonlinearities such as hysteresis,
“stiction”, and dead zones, and soft nonlinearities such as the Arrhenius temperature depen-
dency, power laws, etc. Hard nonlinearities are characterised by functions that are not differen-
tiable, while soft nonlinearities are. Many strategies for compensating for nonlinearities are only
applicable for systems which exhibit soft nonlinearities.

We can approximate soft nonlinearities by truncating a Taylor series approximation of the origi-
nal system. The success of this approximation depends on whether the original function has any
non-differentiable terms such as hysteresis or saturation elements, and how far we deviate from
the point of linearisation. This section follows the notation of [76, §3.10].

Suppose we have the general nonlinear dynamic plant
\[ \dot{x} = f(x(t), u(t)) \tag{3.72} \]
\[ y = g(x, u) \tag{3.73} \]
and we wish to find an approximate linear model about some operating point (xa, ua). A first-
order Taylor series expansion of Eqns 3.72–3.73 is
\[ \dot{x} \approx f(x_a, u_a)
  + \left.\frac{\partial f}{\partial x}\right|_{x=x_a,\,u=u_a} (x(t) - x_a)
  + \left.\frac{\partial f}{\partial u}\right|_{x=x_a,\,u=u_a} (u(t) - u_a) \tag{3.74} \]
\[ y \approx g(x_a, u_a)
  + \left.\frac{\partial g}{\partial x}\right|_{x=x_a,\,u=u_a} (x(t) - x_a)
  + \left.\frac{\partial g}{\partial u}\right|_{x=x_a,\,u=u_a} (u(t) - u_a) \tag{3.75} \]

where the ijth element of the Jacobian matrix ∂f/∂x is ∂fi/∂xj. Note that for linear systems, the
Jacobian is simply A in this notation, although some authors define the Jacobian as the transpose
of this, or Aᵀ.

The linearised system Eqn 3.74–3.75 can be written as
\[ \dot{x} = Ax(t) + Bu(t) + E \tag{3.76} \]
\[ y = Cx(t) + Du(t) + F \tag{3.77} \]
where the constant matrices A, B, C and D are defined as
\[ A = \left.\frac{\partial f}{\partial x}\right|_{x=x_a,\,u=u_a}, \quad
   B = \left.\frac{\partial f}{\partial u}\right|_{x=x_a,\,u=u_a}, \quad
   C = \left.\frac{\partial g}{\partial x}\right|_{x=x_a,\,u=u_a}, \quad
   D = \left.\frac{\partial g}{\partial u}\right|_{x=x_a,\,u=u_a} \]

and the bias vectors E and F are
\[ E = f(x_a, u_a) - A x_a - B u_a, \qquad F = g(x_a, u_a) - C x_a - D u_a \]

Note that Eqns 3.76–3.77 are almost in the standard state-space form, but they include the extra
bias constant matrices E and F. It is possible by introducing a dummy unit input to convert this
form into the standard state-space form,
\[ \dot{x} = Ax + \begin{bmatrix} B & E \end{bmatrix} \begin{bmatrix} u \\ 1 \end{bmatrix} \tag{3.78} \]
\[ y = Cx + \begin{bmatrix} D & F \end{bmatrix} \begin{bmatrix} u \\ 1 \end{bmatrix} \tag{3.79} \]

which we can then directly use in standard linear controller design routines such as lsim.

In summary, the linearisation requires one to construct matrices of partial derivatives with respect
to state and input. In principle this can be automated using a symbolic manipulator provided the
derivatives exist. In both the SYMBOLIC TOOLBOX and in MAPLE we can use the jacobian
command.

Example: Linearisation using the SYMBOLIC toolbox. Suppose we want to linearise the nonlin-
ear system
\[ \dot{x} = \begin{bmatrix} a x_1 \exp\left(1 - \dfrac{b}{x_2}\right) \\ c x_1 (x_2 - u^2) \end{bmatrix} \]
at an operating point, \(x_a = [1, 2]^T\), \(u_a = 20\).

First we start by defining the nonlinear plant of interest,

>> syms x1 x2 a b c u real
>> x = [x1 x2]';
>> fx = [a*x(1)*exp(1-b/x(2)); c*x(1)*(x(2)-u^2)]
fx =
[ a*x1*exp(1-b/x2)]
[ c*x1*(x2-u^2)]

Now we are ready to construct the symbolic matrix of partial derivatives,

>> Avar = jacobian(fx,x)
Avar =
[ a*exp(1-b/x2), a*x1*b/x2^2*exp(1-b/x2)]
[ c*(x2-u^2), c*x1]
>> Bvar = jacobian(fx,u)
Bvar =
[ 0]
[ -2*c*x1*u]
>> Evar = fx - Avar*x - Bvar*u
Evar =
[ -a*x1*b/x2*exp(1-b/x2)]
[ -c*x1*x2+2*u^2*c*x1]

We can substitute a specific set of constants, say, a = 5, b = 6, c = 7, and an operating point
x = [1, 2]⊤, u = 20, into the symbolic matrices to obtain the numeric matrices.

>> A = subs(Avar,{a,b,c,x1,x2,u},{5,6,7,1,2,20})
A =
  1.0e+003 *
    0.0007    0.0010
   -2.7860    0.0070
>> B = subs(Bvar,{a,b,c,x1,x2,u},{5,6,7,1,2,20})
B =
[    0]
[ -280]
>> E = subs(Evar,{a,b,c,x1,x2,u},{5,6,7,1,2,20})
E =
  1.0e+003 *
   -0.0020
    5.5860

At this point we could compare in simulation the linearised version with the full nonlinear model.
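If a symbolic manipulator is not available, the same Jacobians can be estimated by finite differences. The sketch below (Python with NumPy) perturbs each state and the input in turn for the example system above with a = 5, b = 6, c = 7 about x = [1, 2]ᵀ, u = 20, and recovers A, B and the bias E to within rounding of the symbolic answer:

```python
import numpy as np

a, b, c = 5.0, 6.0, 7.0

def f(x, u):
    # the nonlinear plant of the example above
    return np.array([a*x[0]*np.exp(1 - b/x[1]),
                     c*x[0]*(x[1] - u**2)])

xa = np.array([1.0, 2.0])
ua = 20.0
h = 1e-6

# central differences: one column of A per perturbed state
A = np.zeros((2, 2))
for j in range(2):
    dx = np.zeros(2); dx[j] = h
    A[:, j] = (f(xa + dx, ua) - f(xa - dx, ua))/(2*h)

B = (f(xa, ua + h) - f(xa, ua - h))/(2*h)   # single input, one column
E = f(xa, ua) - A @ xa - B*ua               # bias vector of Eqn 3.76
```

This is essentially what automated linearisation tools do internally when no analytic derivatives are supplied.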

3.6.1 Linearising a nonlinear tank model

Suppose we wish to linearise the model of the level in a tank given on page 88 where the tank
geometry is such that ρA = 1. The nonlinear dynamic system for the level h is then simplified to
\[ \frac{dh}{dt} = -k\sqrt{h} + F_{in} \]
For a constant flow in, \(F_{in}^{ss}\), the resulting steady state level is given by noting that dh/dt = 0, and
so
\[ h^{ss} = \left(\frac{F_{in}^{ss}}{k}\right)^2 \]
We wish to linearise the system about this steady-state, so we will actually work with deviation
variables,
\[ x \stackrel{\text{def}}{=} h - h^{ss}, \qquad u \stackrel{\text{def}}{=} F_{in} - F_{in}^{ss} \]
Now following Eqn. 3.74, we have
\[ \dot{h} = \dot{x} = \underbrace{f(h^{ss}, F_{in}^{ss})}_{=0}
  + \frac{\partial f}{\partial h}(h - h^{ss})
  + \frac{\partial f}{\partial F_{in}}(F_{in} - F_{in}^{ss}) \]
Note that since \(\partial f/\partial h = -k/(2\sqrt{h^{ss}})\), our linearised model about \((F_{in}^{ss}, h^{ss})\) is
\[ \dot{x} = \frac{-k}{2\sqrt{h^{ss}}}\, x + u \]
which is in state-space form in terms of deviation variables x and u.

We can compare the linearised model with the nonlinear model in Fig. 3.24 about a nominal input
flow of Fin^ss = 2 and k = 1, giving a steady-state level of hss = 4. Note that we must subtract and
add the relevant biases when using the linearised model.
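A quick scripted version of this comparison (Python with SciPy; the values k = 1 and F_in^ss = 2, giving hss = 4, match Listing 3.10, and the +10% inflow step is my own choice) steps the inflow and integrates both models:

```python
import numpy as np
from scipy.integrate import solve_ivp

k, Fin_ss = 1.0, 2.0
hss = (Fin_ss/k)**2                 # steady-state level = 4
Fin = 1.1*Fin_ss                    # +10% step in the inflow

# full nonlinear tank: dh/dt = -k*sqrt(h) + Fin
nl = solve_ivp(lambda t, h: [-k*np.sqrt(h[0]) + Fin], [0, 50], [hss],
               rtol=1e-8, atol=1e-10)

# linearised model in deviation variables: dx/dt = -k/(2*sqrt(hss))*x + u
u = Fin - Fin_ss
lin = solve_ivp(lambda t, x: [-k/(2*np.sqrt(hss))*x[0] + u], [0, 50], [0.0],
                rtol=1e-8, atol=1e-10)

h_nl = nl.y[0, -1]                  # nonlinear final level: (2.2)^2 = 4.84
h_lin = hss + lin.y[0, -1]          # linear final level after adding the bias back: 4.8
```

For this modest step the two final levels differ by less than one percent, which is why the linear model is adequate near the operating point.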

An alternative, and much simpler way to linearise a dynamic model is to use linmod which
extracts the Jacobians from a SIMULINK model by finite differences.

Listing 3.10: Using linmod to linearise an arbitrary SIMULINK module.

k=1; Fin_ss = 2;      % Model parameters and steady-state input
hss = (Fin_ss/k)^2    % Steady-state level, hss
[A,B,C,D] = linmod('sNL_tank_linmod',hss,Fin_ss)  % Linearise

Figure 3.24: Comparing the linearised model with the full nonlinear tank level system. (a) A SIMULINK nonlinear tank model and linear state-space model for comparison. (b) Nonlinear and linearised model comparison.

A =
   -0.2500      % Note -k/(2*sqrt(hss)) = -1/4
B =
    1
C =
    1
D =
    0

Quantifying the extent of the nonlinearity

It is important for the control designer to be able to quantify, if only approximately, the extent
of the open-loop process nonlinearity. If, for example, the process was deemed only marginally
nonlinear, then one would be confident that a controller designed assuming a linear underlying
plant would perform satisfactorily. On the other hand, if the plant was strongly nonlinear, then
such a linear controller may not be suitable. Ideally we would like to be able to compute a
nonlinearity metric, say from 0 (identically linear) to 1 (wildly nonlinear), that quantifies this idea
simply by measuring the open-loop input/output data. This of course is a complex task, and is
going to be a function of the type of input signals, duration of the experiment, whether the plant
is stable or unstable, and if feedback is present.

One such strategy is proposed in [82] and used to assess the suitability of linear control schemes
in [177]. The idea is to compute the norm of the difference between the best linear approxima-
tion and the true nonlinear response for the worst input trajectory within a predetermined set
of trajectories. This is a nested optimisation problem with a min-max construction. Clearly the
choice of linear model family from which to choose the best one, and the choice of the set of input
trajectories will have an effect on the final computed nonlinear measure.

3.7 Summary

Stuart Kauffman in a New Scientist article⁹ succinctly paraphrased the basis of the scientific
method. He said

“. . . state the variables, the laws linking the variables, and the initial and boundary
conditions, and from these compute the forward trajectory of the biosphere.”

In actual fact he was lamenting that this strategy, sometimes known as scientific determinism
voiced by Laplace in the early 19th century was not always applicable to our world as we under-
stand it today. Nonetheless, for our aim of modeling for control purposes, this philosophy has
been, and will remain for some time I suspect, to be remarkably successful.

Modelling of dynamic systems is important in industry. These types of models can be used for
design and/or control. Effective modelling is an art. It requires mathematical skill and engi-
neering judgement. The scope and complexity of the model is dependent on the purpose of the
model. For design studies a detailed model is usually required, although normally only steady
state models are needed at this stage. For control, simpler models (although dynamic) can be
used since the feedback component of the controller will compensate for any model-plant mis-
match. However most control schemes require dynamic models which are more complicated
than the steady state equivalent.

Many of the dynamic models used in chemical engineering applications are built from conser-
vation laws with thermodynamic constraints. These are often expressed as ordinary differential
equations where we equate the rate of accumulation of something (mass or energy) to the inputs,
outputs and generation in a defined control volume. In addition there may be some restrictions
on allowable states, which introduces some accompanying algebraic equations. Thus general
dynamic models can be expressed as a combination of dynamic and algebraic relations
\[ \frac{dx}{dt} = f(x, u, \theta, t) \tag{3.80} \]
\[ 0 = g(x, u, \theta, t) \tag{3.81} \]
which are termed DAEs (differential and algebraic equations), and special techniques have been
developed to solve them efficiently. DAEs crop up frequently in automated computer modelling
packages, and can be numerically difficult to solve. [133] provides more details in this field.

Steady state models are a subset of the general dynamic model where the dynamic term, Eqn
3.80, is set equal to zero. We now have an augmented problem of the form Eqn 3.81 only. Linear
9 Stuart Kauffman, God of creativity, 10 May 2008, NewScientist, pp52-53

dynamic models are useful in control because of the many simple design techniques that exist.
These models can be written in the form

ẋ = Ax + Bu (3.82)

where the model structure and parameters are linear and often time invariant.

Models are obtained, at least in part, by writing the governing equations of the process. If these
are not known, experiments are needed to characterise fully the process. If experiments are used
the model is said to be heuristic. If the model has been obtained from detailed chemical and
physical laws, then the model is said to be mechanistic. In practice, most models are a mixture
of these two extremes. However, whatever model is used, it is still only an approximation to the
real world. For this reason, the assumptions that are used in the model development must be
clearly stated and understood before the model is used.
Chapter 4

The PID controller

4.1 Introduction

The PID controller is the most common general purpose controller in the chemical process indus-
try today. It can be used as a stand-alone unit, or it can be part of a distributed computer control
system. Over 30 years ago, PID controllers were pneumatic-mechanical devices, whereas nowa-
days they are implemented in software in electronic controllers. The electronic implementation
is much more flexible than the pneumatic devices since the engineer can easily re-program it to
change the configuration of things like alarm settings, tuning constants etc.

Once we have programmed the PID controller, and have constructed something, either in soft-
ware or hardware to control, we must tune the controller. This is surprisingly tricky to do success-
fully, but some general hints and guidelines will be presented in §4.6. Finally, the PID controller
is not perfect for everything, and some examples of common pitfalls when using the PID are
given in §4.8.

4.1.1 P, PI or PID control

For many industrial process control requirements, proportional only control is unsatisfactory
since the offset cannot be tolerated. Consequently the PI controller is probably the most com-
mon controller, and is adequate when the dynamics of the process are essentially first or damped
second order. PID is satisfactory when the dynamics are second or higher order. However the
derivative component can introduce problems if the measured signal is noisy. If the process
has a large time delay (dead time), derivative action does not seem to help much. In fact PID
control finds it difficult to control processes of this nature, and generally a more sophisticated
controller such as a dead time compensator or a predictive controller is required. Processes that
are highly underdamped with complex conjugate poles close to the imaginary axis are also diffi-
cult to control with a PID controller. Processes with this type of dynamic characteristic are rare
in the chemical processing industries, although more common in mechanical or robotic systems
comprising flexible structures.

My own impression is that the derivative action is of limited use since industrial measurements
such as level, pressure and temperature are usually very noisy. As a first step, I generally use only
a PI controller: the integral part removes any offset, and the two-parameter tuning space is
sufficiently small that one has a chance to find reasonable values for them.


4.2 The industrial PID algorithm

This section describes how to implement a simple continuous-time PID controller. We will start
with the classical textbook algorithm, although industrially available PID controllers are never
this simple, for practical implementation reasons which will soon become clear. Further details
on the subtlety of implementing a practically useful PID controller are described in [53] and the
texts [13, 15, 98].

The purpose of the PID controller is to measure a process error ǫ and calculate a manipulated ac-
tion u. Note that while u is referred to as an input to the process, it is the output of the controller.

The “textbook” non-interacting continuous PID controller follows the equation
\[ u = K_c \left( \epsilon + \frac{1}{\tau_i} \int \epsilon \, dt + \tau_d \frac{d\epsilon}{dt} \right) \tag{4.1} \]

where the three tuning constants are the controller proportional gain, Kc , the integral time, τi
and the derivative time, τd , the latter two constants having units of time, often minutes for in-
dustrial controllers. Personally, I find it more intuitive to use the reciprocal of the integral time,
1/τi , which is called reset and has units of repeats per unit time. This nomenclature scheme has
the advantage that no integral action equates to zero reset, rather than the cumbersome infinite
integral time. Just to confuse you further, for some industrial controllers to turn off the integral
component, rather than type in something like 99999, you just type in zero. It is rare in engineer-
ing where we get to approximate infinity with zero! Table 4.1 summarises these alternatives.

Table 4.1: Alternative PID tuning parameter conventions

Parameter          symbol    units           alternative          symbol    units
Gain               Kc        input/output    Proportional band    PB        %
integral time      τi        seconds         reset                1/τi      seconds⁻¹
derivative time    τd        seconds

The Laplace transform of the ideal PID controller given in Eqn. 4.1 is
\[ \frac{U(s)}{E(s)} = C(s) = K_c \left( 1 + \frac{1}{\tau_i s} + \tau_d s \right) \tag{4.2} \]

and the equivalent block diagram in parallel form is

                     +------------+
                 +-->|  1/(τi s)  |---+    integrator
                 |   +------------+   |
    ǫ --->[Kc]---+--------------------(+)---> u
    error        |   +------------+   |    controller output
                 +-->|    τd s    |---+    differentiator
                     +------------+

where it is clear that the three terms are computed in parallel, which is why this form of the PID
controller is sometimes also known as the parallel PID form.

We could rearrange Eqn. 4.2 in a more familiar numerator/denominator transfer function format,
\[ C(s) = K_c \, \frac{\tau_i \tau_d s^2 + \tau_i s + 1}{\tau_i s} \tag{4.3} \]
where we can clearly see that the ideal PID controller is not proper; that is, the order of the
numerator (2) is larger than the order of the denominator (1), and we have a pole at s = 0.
We shall see in section 4.2.1 that when we come to fabricate these controllers we must physically
have a proper transfer function, and so we will need to modify the ideal PID transfer function
slightly.

4.2.1 Implementing the derivative component

The textbook PID algorithm of Eqn. 4.2 includes a pure derivative term τd s. Such a term is
not physically realisable, and nor would we really want to implement it anyway, since abrupt
changes in setpoint would cause extreme values in the manipulated variable.

There are several approximations that are used in commercial controllers to address this problem.
Most schemes simply add a small factory-set lag term to the derivative term. So instead of simply
τd s, we would use
\[ \text{derivative term} = \frac{\tau_d s}{\frac{\tau_d}{N} s + 1} \tag{4.4} \]
where N is a large value, typically set somewhere between 10 and 100. Using
this derivative term modifies the textbook PID transfer function of Eqn. 4.2 or Eqn. 4.3 to
\[ C(s) = K_c \left( 1 + \frac{1}{\tau_i s} + \frac{\tau_d s}{\frac{\tau_d}{N} s + 1} \right) \tag{4.5} \]
\[ \hphantom{C(s)} = K_c \left( \frac{(\tau_i \tau_d + \tau_i \tau_d N)\, s^2 + (\tau_i N + \tau_d)\, s + N}{\tau_i s \, (\tau_d s + N)} \right) \tag{4.6} \]

which is now physically realisable. Note that as N → ∞, the practical PID controller of Eqn. 4.6
converges to the textbook version of Eqn. 4.3. The derivative-filtered PID controller of Eqn. 4.6
collapses to a standard PI controller if τd = 0.

We can fabricate in MATLAB the transfer function of this D-filtered PID controller with the pidstd
command, which MATLAB refers to as a PID controller in standard form.

Listing 4.1: Constructing a transfer function of a PID controller

>> C = pidstd(1,2,3,100) % Construct a standard form PID controller

Continuous-time PIDF controller in standard form:

                  1           τd s
   C(s) = Kp (1 + ---- + --------------)
                  τi s    (τd/N) s + 1

with Kp = 1, Ti = 2, Td = 3, N = 100

The generated controller is of a special class known as a pidstd, but we can convert that to the
more familiar transfer function with the now overloaded tf command.

>> tf(C)

Transfer function:
101 s^2 + 33.83 s + 16.67
-------------------------
       s^2 + 33.33 s

A related MATLAB function, pid, constructs a slight modification of the above PID controller in
what MATLAB refers to as parallel form,
\[ C_{parallel}(s) = P + \frac{I}{s} + D \, \frac{s}{\tau_f s + 1} \]
where in this case, as the derivative filter time constant, τf, approaches zero, the derivative term
approaches the pure differentiator.
4.2.2 Variations of the PID algorithm

The textbook algorithm of the PID controller given in Eqn. 4.2 is sometimes known as the par-
allel or non-interacting form; however, due to historical reasons, there is another form of the PID
controller that is sometimes used. This is known as the series, cascade or interacting form
\[ G_c(s) = K_c' \left( 1 + \frac{1}{\tau_i' s} \right) (1 + \tau_d' s) \tag{4.7} \]

where the three series PID controller tuning constants, Kc′, τi′ and τd′, are related to, but distinct
from, the original PID tuning constants, Kc, τi and τd. A block diagram of the series PID controller
is given below.

                  differentiator             integrator
                  +----------+             +------------+
              +-->|  τd′ s   |---+     +-->| 1/(τi′ s)  |---+
              |   +----------+   |     |   +------------+   |
    ǫ -->[Kc′]+------------------(+)---+--------------------(+)---> u
    error                                            controller output

A series PID controller in the form of Eqn. 4.7 can always be represented in parallel form
\[ K_c = K_c' \, \frac{\tau_i' + \tau_d'}{\tau_i'}, \qquad \tau_i = \tau_i' + \tau_d', \qquad \tau_d = \frac{\tau_i' \tau_d'}{\tau_i' + \tau_d'} \tag{4.8} \]
but not necessarily the reverse
\[ K_c' = \frac{K_c}{2} \left( 1 + \sqrt{1 - 4\tau_d/\tau_i} \right) \tag{4.9} \]
\[ \tau_i' = \frac{\tau_i}{2} \left( 1 + \sqrt{1 - 4\tau_d/\tau_i} \right) \tag{4.10} \]
\[ \tau_d' = \frac{\tau_i}{2} \left( 1 - \sqrt{1 - 4\tau_d/\tau_i} \right) \tag{4.11} \]
since a series form only exists if τi > 4τd . For this reason, the series form of the PID controller is
less commonly used, although some argue that it is easier to tune, [186]. Note that both controller
forms are the same if the derivative component is not used. A more detailed discussion on the
various industrial controllers and associated nomenclatures is given in [98, pp32-33].
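Eqns. 4.8–4.11 are easy to mis-type, so a numerical round trip makes a worthwhile check. A small sketch (plain Python; the series values Kc′ = 3, τi′ = 3, τd′ = 1 are arbitrary, chosen so that the condition τi > 4τd holds) converts a series controller to parallel form and back:

```python
import math

def series_to_parallel(Kc_s, ti_s, td_s):
    # Eqn 4.8
    Kc = Kc_s*(ti_s + td_s)/ti_s
    ti = ti_s + td_s
    td = ti_s*td_s/(ti_s + td_s)
    return Kc, ti, td

def parallel_to_series(Kc, ti, td):
    # Eqns 4.9-4.11; only real-valued when ti > 4*td
    r = math.sqrt(1.0 - 4.0*td/ti)
    return Kc/2*(1 + r), ti/2*(1 + r), ti/2*(1 - r)

Kc, ti, td = series_to_parallel(3.0, 3.0, 1.0)      # gives parallel (4, 4, 0.75)
Kc_s, ti_s, td_s = parallel_to_series(Kc, ti, td)   # recovers (3, 3, 1)
```

The round trip recovers the original series constants exactly, confirming that the two conversion sets are consistent inverses whenever the series form exists.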

Figure 4.1: Comparing PI and integral-only control for the real-time control of a noisy flapper plant with sampling time T = 0.08. (a) PI control of a flapper with K = 1 and reset = 0.2. (b) Integral-only control of a flapper.

4.2.3 Integral only control

For some types of control, integral only action is desired. Fig. 4.1(a) shows the controlled response
of the laboratory flapper, (§1.4.1), controlled with a PI controller. The interesting characteristic of
this plant’s behaviour is the significant disturbances. Such disturbances make it both difficult to
control and demand excessive manipulated variable action.

If however we drop the proportional term, and use only integral control we obtain much the same
response but with a far better behaved input signal as shown in Fig. 4.1(b). This will decrease the
wear on the actuator.

If you are using Eqn. 4.1 with the gain Kc set to zero, then no control at all will result irrespective
of the integral time. For this reason, controllers either have four parameters as opposed to the
three in Eqn. 4.1, or we can follow the SIMULINK convention shown in Fig. 4.4.

4.3 Simulating a PID process in SIMULINK

SIMULINK is an ideal platform to rapidly simulate control problems. While it does not have the
complete flexibility of raw MATLAB, this is more than compensated for by the ease of construction
and good visual feedback necessary for rapid prototyping. Fig. 4.2 shows a SIMULINK block
diagram for the continuous PID control of a third-order plant.

Figure 4.2: A SIMULINK block diagram of a PID controller and the third-order plant 3/(3s³ + 5s² + 6s + 1.6).

The PID controller block supplied as part of SIMULINK is slightly different from the classical de-
scription given in Eqn. 4.2. SIMULINK’s continuous PID block uses the complete parallel skeleton,
\[ G_c(s) = P + \frac{I}{s} + D \, \frac{N s}{s + N} \tag{4.12} \]
where we choose the three tuning constants, P, I and D, and optionally the filter coefficient N
which typically lies between 10 and 100. You can verify this by unmasking the PID controller
block (via the options menu) to exhibit the internals as shown in Fig. 4.3.

Figure 4.3: An unmasked view of a PID controller block that comes supplied in SIMULINK: the input u feeds a Proportional Gain, an Integral Gain I followed by an integrator 1/s, and a Derivative Gain D whose output is filtered via a feedback loop through the Filter Coefficient N and an integrator 1/s, before the three paths are summed to give y. Note how the configuration differs from the classical form. See also Fig. 4.4.

Note how the derivative component of the SIMULINK PID controller in Eqn. 4.12 follows the
realisable approximation given in Eqn. 4.4 by using a feedback loop with an integrator and gain
of N = 100 as a default.

Block diagrams of both the S IMULINK implementation and the classical PID scheme are com-
pared in Fig. 4.4. Clearly the tuning constants for both schemes are related as follows:

Kc
P = Kc , I= , D = Kc τd (4.13)
τi
or alternatively
Kc D
Kc = P, τi = , τd = (4.14)
I Kc
The S IMULINK scheme has the advantage of allowing integral-only control without modification.
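Given Eqns 4.13 and 4.14, converting between the two parameterisations is a one-line calculation each way. The sketch below (written in Python purely for illustration, since the conversion is language-independent) encodes both directions:

```python
def parallel_to_classical(P, I, D):
    """Convert SIMULINK parallel gains (P, I, D) to classical (Kc, taui, taud), Eqn 4.14.
    (Undefined for I = 0, i.e. no integral action.)"""
    Kc = P
    return Kc, Kc / I, D / Kc

def classical_to_parallel(Kc, taui, taud):
    """Convert classical (Kc, taui, taud) to parallel gains (P, I, D), Eqn 4.13."""
    return Kc, Kc / taui, Kc * taud

# Round-trip with the tuning constants used later in this section
Kc, taui, taud = parallel_to_classical(2, 1.5, 2)
print(Kc, taui, taud)   # → Kc = 2, taui ≈ 1.33, taud = 1.0
```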

[Block diagrams: the SIMULINK PID controller sums the parallel paths P, I/s and Ds acting on the error ǫ; the classical controller passes ǫ through Kc into the parallel paths 1, 1/(τi s) and τd s]

Figure 4.4: Block diagram of PID controllers as implemented in SIMULINK (left) and classical textbooks (right).

If you would rather use the classical textbook style PID controller, then it is easy to modify the
S IMULINK PID controller block. Fig. 4.5 shows a S IMULINK implementation which includes the
filter on the derivative component following Eqn. 4.6. Since PID controllers are very common,
you may like to mask the controller as illustrated in Fig. 4.6, add a suitable graphic, and add this
to your S IMULINK library.

[Block diagram: error → Kc (P-gain) → parallel paths: direct; 1/τi followed by an integrator 1/s (reset); and the filtered derivative τd s/((τd/N)s + 1) → summed to give u]

Figure 4.5: A realisable continuous PID controller implemented in SIMULINK with a filter on the derivative action.

[Block diagram: Step → PID Controller (masked) → plant 3/(3s³ + 5s² + 6s + 1.6) → output]

Figure 4.6: A SIMULINK block diagram of a classical PID controller as a mask.

Note that to have a reverse acting controller, we either require all three constants to be negative,
or just add a negative gain to the output of the controller.

Reasonable SIMULINK controller tuning constants for this example are

    P = 2,    I = 1.5,    D = 2

or in classical form

    Kc = 2,    τi = 1.33,    τd = 1

which gives the step response shown opposite (the output y and reference, and the input u, over 10 time units).

Because the derivative term in the PID controller acts on the error rather than the output, we see
a large derivative kick in the controller output¹. We can avoid this by using the PID block with
anti-windup, or by modifying the PID block itself. Section 4.4.1 shows how this modification
works.

¹ Note that I have re-scaled the axes in the simulation results.



4.4 Extensions to the PID algorithm

Industrial PID controllers are in fact considerably more complicated than the textbook formula
of Eqn. 4.1 would lead you to believe. Industrial PID controllers have typically between 15 and
25 parameters. The following describes some of the extra functionality one needs in an effective
commercial PID controller.

4.4.1 Avoiding derivative kick

If we implement the classical PID controller such as Eqn. 4.3 with significant derivative action, the
input will jump excessively every time either the output or the setpoint changes abruptly.
Under normal operating conditions, the output is unlikely to change rapidly, but during a set-
point change, the setpoint will naturally change abruptly, and this causes a large, though brief,
spike in the derivative of the error. This spike is fed to the derivative part of the PID controller,
and causes unpleasant transients in the manipulated variable. If left unmodified, this may cause
excessive wear in the actuator. Industrial controllers and derivative kick is further covered in
[179, p191].

It is clear that there is a problem with the controller giving a large kick when the setpoint abruptly
changes. This is referred to as derivative kick and is due to the near-infinite derivative of the error
at the instant the setpoint changes. One way to avoid problems of this nature is to use the derivative
of the measurement, ẏ, rather than the derivative of the error, ė = ẏ⋆ − ẏ. If we do this, the
derivative kick is eliminated, and the input is much less
excited. Equations of both the classical and the ‘anti-kick’ PID controller equations are compared
below.
    Normal:     u(t) = Kc ( ǫ + (1/τi) ∫₀ᵗ ǫ dτ + τd dǫ/dt )        (4.15)

    Anti-Kick:  u(t) = Kc ( ǫ + (1/τi) ∫₀ᵗ ǫ dτ − τd dy/dt )        (4.16)
The anti-derivative kick controller is sometimes known as a PI-D controller with the dash indi-
cating that the PI part acts on the error, and the D part acts on the output.

In S IMULINK you can build a PID controller with anti-kick by modifying the standard PID con-
troller block as shown in Fig. 4.7.

Fig. 4.8 compares the controlled performance for the third order plant and tuning constants given
previously in Fig. 4.2 where the derivative term uses the error (lefthand simulation), with the
modification where the derivative term uses the measured variable (righthand simulation).

Evident from Fig 4.8 is that the PID controller using the measurement rather than the error be-
haves better, with much less controller action. Of course, the performance improvement is only
evident when the setpoint is normally stationary, rather than a trajectory following problem. Sta-
tionary setpoints are the norm for industrial applications, but if for example, we subjected the
closed loop to a sine wave setpoint, then the PID that employed the error in the derivative term
would perform better than the anti-kick version.
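The size of the kick is easy to quantify in simulation. The following sketch (Python, for illustration; the first-order plant and the tuning constants are arbitrary illustrative choices) applies a setpoint step to a discrete PID acting on the error, as in Eqn. 4.15, and to the anti-kick PI-D of Eqn. 4.16, and compares the input spikes:

```python
def simulate(use_antikick, n=60, T=0.1, Kc=1.0, taui=1.0, taud=0.5):
    a = 0.9                                  # plant: y[k+1] = a*y[k] + (1-a)*u[k]
    y, yprev, eprev, esum = 0.0, 0.0, 0.0, 0.0
    u_hist = []
    for k in range(n):
        ysp = 1.0 if k >= 10 else 0.0        # setpoint step at k = 10
        e = ysp - y
        esum += e * T                        # rectangular-rule integral
        if use_antikick:
            deriv = -(y - yprev) / T         # derivative of the measurement
        else:
            deriv = (e - eprev) / T          # derivative of the error
        u = Kc * (e + esum / taui + taud * deriv)
        u_hist.append(u)
        yprev, eprev = y, e
        y = a * y + (1 - a) * u              # plant update
    return u_hist

kick = simulate(False)
no_kick = simulate(True)
print(max(abs(u) for u in kick), max(abs(u) for u in no_kick))
```

At the step instant the derivative-of-error version spikes by roughly Kc·τd/T, while the PI-D version sees no spike at all.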

An electromagnetic balance arm

The electromagnetic balance arm described previously in §1.4.1 is extremely oscillatory as shown
in Fig. 1.5(b). The overshoot is about 75% which corresponds to a damping ratio of ζ ≈ 0.1

[Block diagram: Pulse Generator setpoint; the gain Kc and integral 1/(τi s) act on the error while the filtered derivative τd s/((τd/N)s + 1) acts on the output of the plant 3/((s+4)(s+1)(s+0.4))]

Figure 4.7: PID controller with anti-derivative kick. See also Fig. 4.8.

[Two simulations, panel titles ‘Normal PID’ and ‘No derivative Kick’: output & setpoint (upper) and input (lower) versus time]

Figure 4.8: PID control of the plant given in Fig. 4.7. The derivative kick is evident in the input
(lower trend of the left-hand simulation) of the standard PID controller. We can avoid this ‘kick’
by using the derivative of the output rather than the error as seen in the right-hand trend. Note
that the controlled output performance is similar in both cases.

assuming a prototype second-order process model.

To stabilise a system with poles so close to the imaginary axis requires substantial derivative
action. Without derivative action, the integral component needed to reduce the offset would
cause instability. Unfortunately however the derivative action causes problems with noise and
abrupt setpoint changes as shown in Fig. 4.9(a).

[Two experimental trends of flapper angle and input versus time: (a) PID control exhibiting significant derivative kick; (b) PID control with anti-kick.]

Figure 4.9: Illustrating the improvement of anti-derivative kick schemes for PID controllers when
applied to the experimental electromagnetic balance.

The controller output, u(t) exhibits a kick every time the setpoint is changed. So instead of the
normal PID controller used in Fig. 4.9(a), an anti-kick version is tried in Fig. 4.9(b). Clearly there
are no spikes in the input signal when Eqn. 4.16 is used, and the controlled response is slightly
improved.

Abrupt setpoint changes are not the only thing that can trip up a PID controller. Fig. 4.10 shows
another response from the electromagnetic balance, this time with even more derivative action.
At low levels the balance arm is very oscillatory, although this behaviour tends to disappear at
the higher levels owing to the nonlinear friction effects.

4.4.2 Input saturation and integral windup

Invariably in practical cases the actual manipulated variable value, u, demanded by the PID
controller is impossible to implement owing to physical constraints on the system. A control
valve for example cannot shut less than 0% open or open more than 100%. Normally “clamps”
are placed on the manipulated variable to prevent unrealisable input demands occurring such as

    u_min < u < u_max ,        or        u = min(u_max, max(u_min, u))

or in other words, if the input u is larger than the maximum allowable input u_max, it will be
reset to that maximum input, and similarly the input is saturated if it is less than the minimum
allowable limit. In addition to an absolute limit on the position of u, the manipulated variable
cannot instantaneously change from one value to another. This can be expressed as a limit on
the derivative of u, such as |du/dt| < c_d.
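Both constraints take only a line or two to encode. The sketch below (Python, for illustration) implements the position clamp and a simple slew-rate limit built on top of it; the numerical values in the usage lines are arbitrary:

```python
def clamp(u, u_min, u_max):
    """Saturate the manipulated variable: u = min(u_max, max(u_min, u))."""
    return min(u_max, max(u_min, u))

def rate_limit(u, u_prev, dt, c_d):
    """Limit the slew rate so that |du/dt| < c_d over one sample interval dt."""
    du_max = c_d * dt
    return u_prev + clamp(u - u_prev, -du_max, du_max)

print(clamp(1.3, 0.0, 1.0))           # 1.0  (a valve cannot open past 100%)
print(rate_limit(5.0, 0.0, 0.1, 2))   # 0.2  (at most c_d*dt of movement per step)
```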

[Plot: setpoint & output (upper) and input (lower) versus time]

Figure 4.10: Derivative control and noise. Note the difference in response character once the
setpoint is increased over 3000. This is due to nonlinear friction effects.

However the PID control algorithm presented so far assumes that the manipulated variable is
unconstrained so when the manipulated variable does saturate, the integral error term continues
to grow without bound. When finally the plant ‘catches up’ and the error is reduced, the integral
term is still large, and the controller must ‘eat up this error’. The result is an overly oscillatory
controlled response.

This is known as integral windup. Historically with the analogue controllers, integral windup
was not much of a problem since the pneumatic controllers had only a limited integral capacity.
However, this limit is effectively infinite in a digital implementation.

There are a number of ways to prevent integral windup and these are discussed in [15, pp10–14]
and more recently in [138, p60] and [31]. The easiest way is to check the manipulated variable
position, and ignore the integral term if the manipulated variable is saturated.

An alternative modification is to compute the difference between the desired manipulated vari-
able and the saturated version and to feed this value back to the integrator within the controller.
This is known as anti-windup tracking and is shown in Fig. 4.11(a). When the manipulated input is
not saturated, there is no change to normal PID algorithm. Note that in this modified PID config-
uration we differentiate the measured value (not the error), we approximate the derivative term
as discussed in section 4.2.1, and we can turn off the anti-windup component with the manual
switch.

As an example (adapted from [19, Fig 8.9]), suppose we are to control an integrator plant, G(s) =
1/s, with tight limits on the manipulated variable, |u| < 0.1. Without anti-windup, the controller
output rapidly saturates, and the uncompensated response shown in Fig. 4.11(b) is very oscilla-
tory. However with anti-windup enabled, the controlled response is much improved.
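The windup mechanism is easy to reproduce numerically. The sketch below (Python, for illustration) uses the same integrator plant G(s) = 1/s and tight limits |u| ≤ 0.1 as the example above, with arbitrary illustrative PI tunings, and applies the simplest anti-windup scheme mentioned earlier: ignore the integral update while the input is saturated. It returns the peak output after a unit setpoint step:

```python
def simulate(antiwindup, n=2000, dt=0.1, Kc=1.0, taui=2.0, ulim=0.1):
    y, esum, ymax = 0.0, 0.0, 0.0
    for _ in range(n):
        e = 1.0 - y                          # unit step setpoint
        u = Kc * (e + esum / taui)           # PI controller
        usat = min(ulim, max(-ulim, u))      # tight actuator limits |u| <= 0.1
        if not antiwindup or usat == u:      # conditional integration
            esum += e * dt
        y += usat * dt                       # integrator plant G(s) = 1/s
        ymax = max(ymax, y)
    return ymax

print(simulate(False), simulate(True))   # peak output without/with anti-windup
```

Without the modification the integral term winds up while the actuator sits on its limit, and the output overshoots badly; with conditional integration the overshoot almost vanishes.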

[(a) Block diagram of a PID controller with anti-windup: the difference between the saturated and unsaturated controller outputs is fed back to the integrator, the filtered derivative acts on the measurement y, and a manual switch can disable the anti-windup path. (b) Demonstration plot of output & setpoint and input versus time, with anti-windup off and then turned on at t = 660, which dramatically improves the controlled response.]

Figure 4.11: The advantages of using anti-windup are evident after t = 660.

4.5 Discrete PID controllers

To implement a PID controller as given in Eqn. 4.1 on a computer, one must first discretise or
approximate the continuous controller equation. If the error at time t = kT is ǫk , then the contin-
uous expression
    u(t) = Kc ( ǫ(t) + (1/τi) ∫ ǫ(t) dt + τd dǫ(t)/dt )        (4.17)

can be approximated with


 
    u_k = Kc ( ǫ_k + (T/τi) Σ_{j=0}^{k} ǫ_j + (τd/T)(ǫ_k − ǫ_{k−1}) )        (4.18)

where the middle term approximates the integral and the final term the differential.

The integral in Eqn. 4.17 is approximated in Eqn. 4.18 by the rectangular rule and the derivative
is approximated as a first order difference, although other discretisations are possible. Normally
the sample time, T , used by the controller is much faster than the dominant time constants of the
process so the approximation is satisfactory and the discrete PID controller is, to all intents and
purposes, indistinguishable from a continuous controller.
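As a concrete check, here is a direct transcription (in Python, purely for illustration) of the position form of Eqn. 4.18; the numerical values in the examples are arbitrary:

```python
def pid_position(errors, Kc, taui, taud, T):
    """Position form of the discrete PID controller, Eqn 4.18.
    `errors` is the history [eps_0, ..., eps_k]; returns the input u_k."""
    eps_k = errors[-1]
    eps_km1 = errors[-2] if len(errors) > 1 else 0.0
    integral = (T / taui) * sum(errors)         # rectangular rule
    deriv = (taud / T) * (eps_k - eps_km1)      # first-order difference
    return Kc * (eps_k + integral + deriv)

# With Kc=2, taui=4, taud=1, T=0.5 and a constant unit error:
print(pid_position([1.0], 2.0, 4.0, 1.0, 0.5))        # 2*(1 + 0.125 + 2) = 6.25
print(pid_position([1.0, 1.0], 2.0, 4.0, 1.0, 0.5))   # 2*(1 + 0.25  + 0) = 2.5
```

Note that the integral term grows with the full error history, which is exactly the windup mechanism discussed in section 4.4.2.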

It is considerably more convenient to consider the discrete PID controller as a rational polynomial
expression in z. Taking z-transforms of the discrete position form of the PID controller, Eqn. 4.18,
we get
 
    U(z) = Kc ( E(z) + (T/τi)(1 + z⁻¹ + z⁻² + ···) E(z) + (τd/T)(1 − z⁻¹) E(z) )

where the geometric series arising from the integral term sums to 1/(1 − z⁻¹).

This shows that the transfer function of the PID controller is


 
    GPID(z) = Kc ( 1 + T/(τi (1 − z⁻¹)) + (τd/T)(1 − z⁻¹) )                                          (4.19)

            = (Kc/(T τi)) · [ (T² + τi T + τd τi) − τi (T + 2τd) z⁻¹ + τi τd z⁻² ] / (1 − z⁻¹)        (4.20)
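Expansions such as Eqn. 4.20 are easy to get wrong by hand, but simple to confirm by evaluating both forms at an arbitrary point in the z-plane. A quick check (Python, for illustration, with arbitrary tuning values):

```python
# Numerically confirm that the expanded rational form of Eqn 4.20
# agrees with the sum-of-terms form of Eqn 4.19.
Kc, T, taui, taud = 2.0, 0.5, 3.0, 0.8   # arbitrary illustrative constants

def G_419(z):
    zi = 1 / z
    return Kc * (1 + T / (taui * (1 - zi)) + (taud / T) * (1 - zi))

def G_420(z):
    zi = 1 / z
    num = (T**2 + taui*T + taud*taui) - taui*(T + 2*taud)*zi + taui*taud*zi**2
    return Kc * num / (T * taui * (1 - zi))

z = 0.7 + 0.2j                     # any test point with z != 1
print(abs(G_419(z) - G_420(z)))    # ~0 to machine precision
```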

Eqn. 4.19 is the discrete approximation to Eqn. 4.2 and the approximation is reasonable provided
the sample time, T , is sufficiently short. A block diagram of this discrete approximation is

[Block diagram: the error ǫ passes through Kc into three parallel paths — unity, T/(τi(1 − z⁻¹)), and (τd/T)(1 − z⁻¹) — which are summed to give u]

which we could further manipulate using block diagram algebra to use only delay
elements, giving

[Equivalent block diagram realised using only z⁻¹ delay elements: the integral term is formed by positive feedback around a delayed sum, and the derivative term by subtracting a delayed copy of the signal]

There are other alternatives for a discrete PID controller depending on how we approximate the
integral part. For example we could use a forward difference,

    u_kT = u_(k−1)T + T y_(k−1)T ,        Gi(z) = T/(z − 1)

(which is not to be recommended due to stability problems), or a backward difference,

    u_kT = u_(k−1)T + T y_kT ,        Gi(z) = Tz/(z − 1)

or the trapezoidal approximation,

    u_kT = u_(k−1)T + (T/2)(y_kT + y_(k−1)T) ,        Gi(z) = T(z + 1)/(2(z − 1))

In each case the new integral is the old integral plus an add-on term.
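The relative accuracy of these approximations is easy to demonstrate on an integral we know exactly. The sketch below (Python, for illustration; here the update using the previous sample is labelled the forward rule and the one using the current sample the backward rule) integrates y(t) = t² over [0, 1], whose exact value is 1/3:

```python
T = 0.01
N = 100                                     # so that N*T = 1
y = [(k * T) ** 2 for k in range(N + 1)]    # samples y_0 ... y_N

fwd  = sum(T * y[k - 1] for k in range(1, N + 1))              # u_k = u_{k-1} + T*y_{k-1}
bwd  = sum(T * y[k]     for k in range(1, N + 1))              # u_k = u_{k-1} + T*y_k
trap = sum(T / 2 * (y[k] + y[k - 1]) for k in range(1, N + 1)) # trapezoidal

exact = 1 / 3
print(abs(fwd - exact), abs(bwd - exact), abs(trap - exact))
```

The rectangular rules are first-order accurate (error of order T) while the trapezoidal rule is second-order accurate (error of order T²), which is why it is the preferred choice below.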
We can insert any one of these alternatives to give the overall discrete z-domain PID controller,
although the trapezoidal scheme

    GPID(z) = Kc ( 1 + T(1 + z⁻¹)/(2τi (1 − z⁻¹)) + (τd/T)(1 − z⁻¹) )                                                    (4.21)

            = (Kc/(2T τi)) · [ (T² + 2τi T + 2τd τi) + (T² − 2τi T − 4τi τd) z⁻¹ + 2τi τd z⁻² ] / (1 − z⁻¹)              (4.22)

is the most accurate and therefore the preferred implementation.

A SIMULINK discrete PID controller with sample time T using the trapezoidal approximation of
Eqn. 4.21 is given in Fig. 4.12. Note that now, as opposed to the continuous time implementation
of the PID controller, the derivative and integral gain values are functions of the sample time,
T. While continuous versions of PID controllers exist in SIMULINK, discrete versions simulate
much faster.

4.5.1 Discretising continuous PID controllers

The easiest way to generate a discrete PID controller is to simply call the MATLAB standard PID
function pidstd with a trailing sample-time argument to indicate that you want a discrete controller. Since the default

[Block diagram: error → gain Kc → summed parallel paths: direct; trapezoidal integral (T/(2τi))·(1 + z⁻¹)/(1 − z⁻¹); and derivative (τd/T)·(1 − z⁻¹) → u]

Figure 4.12: A discrete PID controller in SIMULINK using a trapezoidal approximation for the
integral with sample time T following Eqn. 4.21. This controller block is used in the simulation
presented in Fig. 4.13.

discretisation strategy uses the forward Euler, it would be prudent to explicitly state the
stable backward Euler option for both the integration and the differentiation.

Listing 4.2: Constructing a discrete (filtered) PID controller

>> C = pidstd(1,2,3,100,0.1, ...
       'IFormula','BackwardEuler','DFormula','BackwardEuler')

Discrete-time PIDF controller in standard form:

  C(z) = Kp * ( 1 + (1/Ti)*(Ts*z/(z-1)) + Td/(Td/N + Ts*z/(z-1)) )

  with Kp = 1, Ti = 2, Td = 3, N = 100, Ts = 0.1

>> tf(C)

Transfer function:
24.13 z^2 - 47.4 z + 23.31
--------------------------
 z^2 - 1.231 z + 0.2308

Sampling time: 0.1
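We can verify the coefficients printed by tf(C) by evaluating the backward Euler formulas directly. The following check (Python, for illustration) encodes the standard-form controller with the substitutions 1/s → Ts·z/(z−1), and compares it against the expanded transfer function at an arbitrary test point; the small residual is simply due to the 4-significant-figure display:

```python
Kp, Ti, Td, N, Ts = 1.0, 2.0, 3.0, 100.0, 0.1

def C(z):
    """Standard-form PIDF with backward Euler integral and derivative."""
    integ = (1 / Ti) * Ts * z / (z - 1)
    deriv = Td * (z - 1) / ((Td / N) * (z - 1) + Ts * z)
    return Kp * (1 + integ + deriv)

def C_tf(z):
    """The expanded form reported by tf(C), to 4 significant figures."""
    return (24.13*z**2 - 47.4*z + 23.31) / (z**2 - 1.231*z + 0.2308)

z = 0.5 + 0.3j
print(abs(C(z) - C_tf(z)))    # small residual from the rounded coefficients
```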

Velocity form of the PID controller

Eqn. 4.18 is called the position form implementation of the PID controller. An alternative form
is the velocity form which is obtained by subtracting the previous control input uk−1 from the
current input uk to get

    ∆u_k = u_k − u_{k−1}
         = Kc ( (1 + T/τi + τd/T) ǫ_k − (1 + 2τd/T) ǫ_{k−1} + (τd/T) ǫ_{k−2} )        (4.23)

The velocity form in Eqn. 4.23 has three advantages over the position form (see [191, pp636–637]):
it requires no initialisation (the computer does not need to know the current u), it has no integral
windup problems, and it offers some protection against computer failure in that if the computer
crashes, the input remains at the previous, presumably reasonable, value.

One drawback, however, is that it should not be used for P or PD controllers since the controller
is unable to maintain the reference value.
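A direct transcription of the velocity form, Eqn. 4.23, takes only a few lines. The sketch below (Python, purely for illustration; the numerical values are arbitrary) also demonstrates the no-initialisation property: with a zero error history the input simply stays where it is:

```python
def pid_velocity(u_prev, e, Kc, taui, taud, T):
    """Velocity form of the PID controller, Eqn 4.23.
    e = [eps_k, eps_km1, eps_km2]; returns the new input u_k = u_prev + du."""
    du = Kc * ((1 + T/taui + taud/T) * e[0]
               - (1 + 2*taud/T) * e[1]
               + (taud/T) * e[2])
    return u_prev + du

# With zero error history the input stays put -- no initialisation needed
print(pid_velocity(3.7, [0.0, 0.0, 0.0], 2.0, 4.0, 1.0, 0.5))   # 3.7
```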

Fig. 4.13 shows a S IMULINK block diagram of a pressure control scheme on a headbox of a paper
machine. The problem was that the pressure sensor was very slow, only delivering a new pres-
sure reading every 5 seconds. The digital PID controller however was running with a frequency
of 1 Hz. The control engineer in this application faced with poor closed loop performance, was
concerned that the pressure sensor was too slow and therefore should be changed. In Fig. 4.13,
the plant is a continuous transfer function while the discrete PID controller runs with sample
period of T = 1 second, and the pressure sensor is modelled with a zeroth-order hold with T = 5
seconds. Consequently the simulation is a mixed continuous/discrete example, with more than
one discrete sample time.

[Block diagram: Signal Generator → discrete PID controller → headbox 3.5/(200s² + 30s + 1) → Scope, with a zeroth-order hold pressure sensor in the feedback path]

Figure 4.13: Headbox control with a slow pressure transducer. The discrete PID controller was
given in Fig. 4.12.

Fig. 4.14 shows the controlled results. Note how the pressure signal to the controller lags behind
the true pressure, but the controller still manages to control the plant.

4.5.2 Simulating a PID controlled response in Matlab

We can simulate a PID controlled plant in M ATLAB by writing a small script file that calls a gen-
eral PID controller function. The PID controller is written in the discrete velocity form following
Eqn. 4.23 in the M ATLAB function file pidctr.m shown in listing 4.3.

Listing 4.3: A simple PID controller


function [u,e] = pidctr(err,u,dt,e,pidt)
% [u,e] = pidctr(err,u,dt,e,pidt)
% PID controller in the velocity form of Eqn. 4.23
% err = ǫ, the current error;  u = u(t), the current manipulated variable
% dt = T, the sample time;  e = row vector of the past 3 errors
% pidt = [Kc, 1/τi, τd] = tuning constants

k = pidt(1); rs = pidt(2); td2 = pidt(3)/dt;

e = [err,e(1:2)];                        % error shift register
du = k*[1+rs*dt+td2, -1-2*td2, td2]*e';  % ∆u from Eqn. 4.23
u = du + u;                              % update control value: unew = uold + ∆u
return

This simple PID controller function is a very naive implementation of a PID controller without
any of the necessary modifications common in robust commercial industrial controllers as de-
scribed in section 4.4.

[Plot: y, y sampled and the setpoint (upper) and u (lower) over 0–150 s]

Figure 4.14: Headbox control with a slow pressure transducer measurement sample time of T = 5
while the control sample time is T = 1. Upper: The pressure setpoint (dotted), the actual pressure
and the sampled-and-held pressure fed back to the PID controller. Lower: The controller output.

The plant to be controlled for this simulation is

    Gp(q⁻¹) = [ 1.2 q⁻¹ / (1 − 0.25 q⁻¹ − 0.5 q⁻²) ] q⁻ᵈ        (4.24)

where the sample time is T = 2 seconds and the dead time is d = 3 sample time units, and the
setpoint is a long period square wave. For this simulation, we will try out the PID controller with
tuning constants of Kc = 0.3, 1/τi = 0.2 and τd = 0.1. How I arrived at these tuning constants is
discussed later in §4.6.

The M ATLAB simulation using the PID function from Listing 4.3 is given by the following script
file:

a = [0.25,0.5]; b = 1.2; theta = [a b]'; dead = 3;   % Plant G(q), Eqn. 4.24
dt = 2.0; t = dt*[0:300]'; yspt = square(t/40);      % time vector & setpoint

y = zeros(size(yspt)); u = zeros(size(y));           % initialise y(t) & u(t)
pid = [0.3, 0.2, 0.1];    % PID tuning constants: Kc = 0.3, 1/τi = 0.2, τd = 0.1
e = zeros(1,3);           % initialise the error shift register

for i = 3+dead:length(y)
  X = [y(i-1), y(i-2), u(i-1-dead)];          % collect i/o data
  y(i) = X*theta;                             % system prediction
  err = yspt(i) - y(i);                       % current error
  [u(i),e] = pidctr(err,u(i-1),dt,e,pid);     % PID controller from Listing 4.3
end % for
plot(t,yspt,'--',t,[y,u])                     % Plot results in Fig. 4.15
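For readers without MATLAB, the same closed-loop simulation can be reproduced in a few lines of Python (illustrative only; the square-wave setpoint is generated from the sign of a sine, mimicking MATLAB's square function):

```python
import math

a1, a2, b = 0.25, 0.5, 1.2           # plant of Eqn 4.24
dead = 3                             # dead time in samples
dt = 2.0
Kc, r, taud = 0.3, 0.2, 0.1          # Kc, 1/taui, taud
n = 301
t = [dt * k for k in range(n)]
yspt = [1.0 if math.sin(tk / 40) >= 0 else -1.0 for tk in t]  # square wave

y = [0.0] * n
u = [0.0] * n
e = [0.0, 0.0, 0.0]                  # error shift register
td2 = taud / dt
for i in range(3 + dead, n):
    y[i] = a1*y[i-1] + a2*y[i-2] + b*u[i-1-dead]   # plant prediction
    err = yspt[i] - y[i]
    e = [err, e[0], e[1]]
    du = Kc * ((1 + r*dt + td2)*e[0] - (1 + 2*td2)*e[1] + td2*e[2])  # Eqn 4.23
    u[i] = u[i-1] + du

print(max(abs(yi) for yi in y))      # bounded: the loop is stable
```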

Figure 4.15 shows the controlled response of this simulation. Note how I have plotted the input
(lower plot of Fig. 4.15) as a series of horizontal lines that show that the input is actually a
piecewise zeroth-order hold for this discrete case using the stairs function.

[Plot: y & setpoint (upper) and the discretised input (lower) versus time]

Figure 4.15: The output (upper solid), setpoint (upper dashed) and discretised input (lower) of a
PID controlled process.

4.5.3 Controller performance as a function of sample time

Given that the discrete PID controller is an approximation to the continuous controller, we must
expect a deterioration in performance with increasing sample time. Our motivation to use coarse
sampling times is to reduce the computational load. Fig. 4.16 compares the controlled response
of the continuous plant,

    Gp(s) = (s + 3) / ( (s + 4)(τ²s² + 2ζτ s + 1) )

with τ = 4, ζ = 0.4, given the same continuous controller settings at different sampling rates.
Note that the reset and derivative controller settings for the discrete controller are functions of
the sample time, and must be adjusted accordingly. Fig. 4.16 shows that the controller performance improves
as the sampling time is decreased and converges to the continuous case. However if the sampling
time is too small, the discrete PID controller is then susceptible to numerical problems.

4.6 PID tuning methods

Tuning PID controllers is generally considered an art and is an active research topic both in
academia and in industry. Typical chemical processing plants have hundreds or perhaps thou-
sands of control loops of which the majority of these loops are non-critical and are of the PID
type and all these loops require tuning. Årzen [8] asserts that “it is a well known fact that many

[Four panels, Ts = 4, 2, 1 and 0.1: the discrete controlled response (y and u) compared with the continuous response in each panel]

Figure 4.16: The effect of varying sampling time, T , when using a discrete PID controller com-
pared to a continuous PID controller. As T → 0, the discrete PID controller converges to the
continuous PID controller.

control loops in (the) process industry are badly tuned, or run in manual mode.” Supporting this
claim, here is a summary of the rather surprising results that Ender, [64], found after investigating
thousands of control loops over hundreds of plants2 ;

• More than 30% of the controllers are operating in manual.

• More than 60% of all installed loops produce less variance in manual than automatic.

• The average loop costs $40,000 to design, instrument and install.

Note however that it is not only industry that seems unable to tune PID controllers since many
publications in the academic world also give mis-tuned PID controllers. This is most common
when comparing the PID with some other sort of more sophisticated (and therefore hopefully
better performing), controller. So it appears that it would be worthwhile to look more closely at
the tuning of PID regulators.

There are two possibilities that we face when tuning PID controllers. One is that we have a model
of the plant to be controlled, perhaps as a transfer function, so then we need to establish a suitable
PID controller such that when combined with the plant, we obtain an acceptable response. The
second case is where we don’t even have a model of the plant to be controlled, so our task is also
to identify (implicitly or explicitly) this plant model as well.

Tuning PID controllers can be performed in the time domain or in the frequency domain with the
controller either operating (closed loop), or disconnected (open loop), and the tuning parameter
calculations can be performed either online or offline. The online tuning technique is the central
component of automated self tuners, discussed more in chapter 7. Section 4.6.1 considers the two
classical time domain tuning methods, one closed loop, the other open loop.
2 mainly in the US

4.6.1 Open loop tuning methods

Open loop tuning methods are where the feedback controller is disconnected and the experimenter
excites the plant and measures the response. The key point here is that since the controller
is now disconnected, the plant is clearly no longer strictly under control. If the loop
is critical, then this test could be hazardous. Indeed if the process is open-loop unstable, then
you will be in trouble before you begin. Notwithstanding this, for many process control applications,
open loop type experiments are usually quick to perform, and deliver informative results.

To obtain any information about a dynamic process, one must excite it in some manner. If the
system is steady at setpoint, and remains so, then you have no information about how the process
behaves. (However you do have good control so why not quit while you are ahead?) The type
of excitation is again a matter of choice. For the time domain analysis, there are two common
types of excitation signal;– the step change, and the impulse test, and for more sophisticated
analysis, one can try a random input test. Each of the three basic alternatives has advantages and
disadvantages associated with them, and the choice is a matter for the practitioner.

Step change The step change method is where the experimenter abruptly changes the input to
the process. For example, if you wanted to tune a level control of a buffer tank, you could
sharply increase (or decrease) the flow into the tank. The controlled variable then slowly
rises (or falls) to the new operating level. When I want a quick feel for a new process, I
like to perform a step test and this quickly gives me a graphical indication of the degree of
damping, overshoot, rise time and time constants better than any other technique.

Impulse test The impulse test method is where the input signal is abruptly changed to a new
value, then immediately equally abruptly changed back to the old value. Essentially you
are trying to physically alter the input such that it approximates a Dirac delta function.
Technically both types of inputs are impossible to perform perfectly, although the step test
is probably easier to approximate experimentally.
The impulse test has some advantages over the step test. First, since the input after the ex-
periment is the same as before the experiment, the process should return to the same pro-
cess value. This means that the time spent producing off-specification (off-spec) product
is minimised. If the process does not return to the same operating point, then this indi-
cates that the process probably contains an integrator. Secondly the impulse test (if done
perfectly) contains a wider range of excitation frequencies than the step test. An excitation
signal with a wide frequency range gives more information about the process. However
the impulse test requires slightly more complicated analysis.

Random input The random input technique assumes that the input is a random variable approx-
imating white noise. Pure white noise has a wide (theoretically infinite) frequency range,
and can therefore excite the process over a similarly wide frequency range. The step test,
even a perfect step test, has a limited frequency range. The subsequent analysis of this type
of data is now much more tedious, though not really any more difficult, but it does require
a data logger (rather than just a chart recorder) and a computer with simple regression
software. Building up on this type of process identification where the input is assumed,
within reason, arbitrary, are methods referred to as Time Series Analysis (TSA), or spectral
analysis;– both of which are dealt with in more detail in chapter 6.

Open-loop or process reaction curve tuning methods

There are various tuning strategies based on an open-loop step response. While they all follow
the same basic idea, they differ slightly in how they extract the model parameters from the

[Sketch: open-loop step response of a plant with gain K; a tangent drawn at the inflection point of the sigmoidal response defines the times L and T on the time axis]

Figure 4.17: The parameters T and L to be graphically estimated for the openloop tuning method
relations given in Table 4.2.

recorded response, and also differ slightly in how they relate appropriate tuning constants to the model
parameters. This section describes three alternatives: the classic Ziegler-Nichols open loop test,
the Cohen-Coon test, and the Åström-Hägglund suggestion.

The classic way of open loop time domain tuning was first published in the early 40s by Ziegler
and Nichols3 , and is further described in [150, pp596–597] and in [179, chap 13]. Their scheme re-
quires you to apply a unit step to the open-loop process and record the output. From the response,
you graphically estimate the two parameters T and L as shown in Fig. 4.17. Naturally if your
response is not sigmoidal or ‘S’ shaped such as that sketched in Fig. 4.17 and exhibits overshoot,
or an integrator, then this tuning method is not applicable.

This method implicitly assumes the plant can be adequately approximated by a first order transfer function with time delay,

    Gp ≈ K e^(−θs) / (τs + 1)                (4.25)

where L is approximately the dead time θ, and T is the open loop process time constant τ. Once
you have recorded the openloop input/output data, and subsequently measured the times T and
L, the PID tuning parameters can be obtained directly from Table 4.2.

A similar open loop step tuning strategy due to Cohen and Coon published in the early 1950s is
where you record the time taken to reach 50% of the final output value, t2, and the time taken to
reach 63% of the final value, t3. You then calculate the effective deadtime with

    θ = ( t2 − ln(2) t3 ) / ( 1 − ln(2) )

and time constant,

    τ = t3 − θ

The open loop gain, K, can be calculated by dividing the final change in output by the change in
the input step.
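The Cohen-Coon two-point calculation is easily scripted. The sketch below (Python, for illustration, assuming t3 is the 63.2% time so that τ = t3 − θ) fits the three parameters of Eqn. 4.25 and checks itself on a known first-order response:

```python
import math

def two_point_fit(t2, t3, dy, du):
    """Fit K, tau, theta of Eqn 4.25 from an open-loop step test:
    t2 = time to reach 50% of the final change, t3 = time to reach 63.2%.
    dy/du is the final output change over the input step size."""
    theta = (t2 - math.log(2) * t3) / (1 - math.log(2))   # effective deadtime
    tau = t3 - theta                                      # time constant
    K = dy / du                                           # open loop gain
    return K, tau, theta

# Sanity check on a known first-order response with K=2, tau=5, theta=1,
# for which t3 = theta + tau and t2 = theta + tau*ln(2):
K, tau, theta = two_point_fit(1 + 5*math.log(2), 6.0, 2.0, 1.0)
print(K, tau, theta)   # ≈ (2.0, 5.0, 1.0)
```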
3 Ziegler & Nichols actually published two methods, one open loop, and one closed loop, [204]. However it is only the

second closed loop method that is generally remembered today as the “Ziegler–Nichols” tuning method.

Once again, now that you have a model of the plant to be controlled in the form of Eqn. 4.25,
you can use one of the alternative heuristics given in Table 4.2. The recommended range of values
for the deadtime ratio for the Cohen-Coon values is 0.1 < θ/τ < 1. Also listed in Table 4.2 are
the empirical suggestions from [16] known as AMIGO, or approximate M-constrained integral
gain optimisation. These values have the same form as the Cohen-Coon suggestions but perform
slightly better.

Table 4.2: The PID tuning parameters as a function of the openloop model parameters K, τ and
θ from Eqn. 4.25 as derived by Ziegler-Nichols (open loop method), Cohen and Coon, or alternatively the AMIGO rules from [16].

                    Controller   Kc                            τi                           τd
    Ziegler-Nichols P            τ/(Kθ)                        –                            –
    (Open loop)     PI           0.9τ/(Kθ)                     θ/0.3                        –
                    PID          1.2τ/(Kθ)                     2θ                           0.5θ

    Cohen-Coon      P            (1/K)(τ/θ)(1 + θ/(3τ))        –                            –
                    PI           (1/K)(τ/θ)(0.9 + θ/(12τ))     θ(30 + 3θ/τ)/(9 + 20θ/τ)     –
                    PID          (1/K)(τ/θ)(4/3 + θ/(4τ))      θ(32 + 6θ/τ)/(13 + 8θ/τ)     4θ/(11 + 2θ/τ)

    AMIGO           PID          (1/K)(0.2 + 0.45 τ/θ)         θ(0.4θ + 0.8τ)/(θ + 0.1τ)    0.5θτ/(0.3θ + τ)
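The Ziegler-Nichols open loop column of Table 4.2 translates directly into code. A sketch (Python, for illustration; None is returned where the table has no entry):

```python
def zn_open_loop(K, tau, theta, mode="PID"):
    """Ziegler-Nichols open-loop tuning rules from Table 4.2.
    Returns (Kc, taui, taud) for the chosen controller type."""
    if mode == "P":
        return tau / (K * theta), None, None
    if mode == "PI":
        return 0.9 * tau / (K * theta), theta / 0.3, None
    if mode == "PID":
        return 1.2 * tau / (K * theta), 2 * theta, 0.5 * theta
    raise ValueError("mode must be 'P', 'PI' or 'PID'")

# e.g. a fitted model with K=2, tau=5, theta=1
Kc, taui, taud = zn_open_loop(2.0, 5.0, 1.0)
print(Kc, taui, taud)   # 3.0 2.0 0.5
```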

Fig. 4.18 illustrates the approximate first-order plus deadtime model fitted to a higher-order over-damped process using the two points at 50% and 63%. The subsequent controlled response using
the values derived from Table 4.2 is given in Fig. 4.19.

[Plot: the actual step response and the fitted model rising to the final value, with the times t2 and t3 marked]

Figure 4.18: Fitting a first-order model with deadtime using the Cohen-Coon scheme. Note how
the fitted model is a reasonable approximation to the actual response just using the two data
points and gain. See Fig. 4.19 for the subsequent controlled response.

Conventional thought now considers that both the Ziegler-Nichols scheme in Table 4.2 and the
Cohen-Coon scheme give controller constants that are too oscillatory, and hence other modified
tuning parameters exist, [178, p329]. Problem 4.1 demonstrates this tuning method.
4.6. PID TUNING METHODS 153


Figure 4.19: The closed loop response for a P, PI and PID controlled system using the Cohen-Coon
strategy from Fig. 4.18.

Problem 4.1 Suppose you have a process that can be described by the transfer function

K
Gp =
(3s + 1)(6s + 1)(0.2s + 1)

Evaluate the time domain response to a unit step change in input and graphically estimate the
deadtime θ and time constant τ . Design a PI and PID controller for this process using Table 4.2.
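One way to attack the estimation part of Problem 4.1 programmatically rather than graphically is sketched below (in Python with SciPy, though the book works in M ATLAB throughout; K = 1 and the simulation horizon are assumptions):

```python
import numpy as np
from scipy import signal

K = 1.0
# G(s) = K/((3s+1)(6s+1)(0.2s+1)); denominator expanded to a cubic
G = signal.TransferFunction([K], [3.6, 19.8, 9.2, 1.0])
t, y = signal.step(G, T=np.linspace(0.0, 80.0, 4001))

# Two-point method: times at 50% and 63.2% of the final value
yinf = y[-1]
t2 = np.interp(0.500 * yinf, y, t)
t3 = np.interp(0.632 * yinf, y, t)

theta = (t2 - np.log(2) * t3) / (1 - np.log(2))   # effective deadtime
tau = t3 - theta                                  # effective time constant

# Ziegler-Nichols open-loop PI settings from Table 4.2
Kc, taui = 0.9 * tau / (K * theta), theta / 0.3
```

The interpolation trick relies on the overdamped response being monotone; for oscillatory data the graphical estimate is safer.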

Controller settings based on the open loop model

If we have gone to all the trouble of estimating a model of the process, then we could in principle
use this model for controller design in a more formal manner than simply relying on the suggestions
given in Table 4.2. This is the thinking behind the Internal Model Control or IMC controller
design strategy. The IMC controller is a very general controller, but if we restrict our attention to
just controllers of the PID form, we can derive simple relations between the model parameters
and appropriate controller settings.

The nice feature of the IMC strategy is that it provides the scope to adjust the tuning with a single
parameter, the desired closed loop time constant, τc , something that is missing from the strategies
given previously in Table 4.2. A suitable starting guess for the desired closed loop time constant
is to set it equal to the dominant open loop time constant.

Table 4.3 gives the PID controller settings based on various common process models. For a more
complete table containing a larger selection of transfer functions, consult [179, p308].

A simplification of this IMC idea, in an effort to make the tuning as effortless as possible, is
given in [186].

Perhaps the easiest way to tune a plant when the transfer function is known is to use the M ATLAB
function pidtune, or the GUI, pidtool as shown in Fig. 4.20.

Table 4.3: PID controller settings based on IMC for a small selection of common plants where the
control engineer gets to choose a desired closed loop time constant, τc .

Plant                           Kc K                    τi         τd
K/(τ s + 1)                     τ/τc                    τ          –
K/((τ1 s + 1)(τ2 s + 1))        (τ1 + τ2)/τc            τ1 + τ2    –
K/s                             2/τc                    2τc        –
K/(s(τ s + 1))                  (2τc + τ)/τc²           2τc + τ    2τc τ/(2τc + τ)
K e−θs/(τ s + 1)                τ/(τc + θ)              τ          –
K e−θs/(τ s + 1)                (τ + θ/2)/(τc + θ/2)    τ + θ/2    τ θ/(2τ + θ)
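To make the last (PID) row of Table 4.3 concrete, the sketch below (Python; the model numbers Km = 1, τm = 3, θm = 1.5 are purely illustrative) converts a first order plus deadtime model into IMC-PID constants, using the suggested starting guess of τc equal to the open loop time constant:

```python
Km, taum, thetam = 1.0, 3.0, 1.5   # illustrative FOPDT model parameters
tauc = taum                        # starting guess: closed loop as fast as open loop

# PID row of Table 4.3 for G = K exp(-theta*s)/(tau*s + 1)
Kc = (taum + thetam / 2) / (Km * (tauc + thetam / 2))
taui = taum + thetam / 2
taud = taum * thetam / (2 * taum + thetam)
```

Shrinking τc tightens the loop (larger Kc) at the cost of robustness; with τc = τm here the controller gain is exactly 1/Km.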

Figure 4.20: PID tuning of an arbitrary transfer function using the M ATLAB GUI.

4.6.2 Closed loop tuning methods

The main disadvantage of the open loop tuning methods is that the test is performed with the controller
switched to manual, i.e. leaving the output uncontrolled in open loop. This is often unreasonable
for systems that are openloop unstable, and impractical for plants where there is a large
vested interest, and the operating engineers are nervous. The Ziegler-Nichols continuous cycling
method described next is a well-known closed loop tuning strategy used to address this
problem, although a more recent single response strategy given later in §4.6.3 is faster, safer, and
easier to use.

Ziegler-Nichols continuous cycling method

The Ziegler-Nichols continuous cycling method is one of the best known closed loop tuning
strategies. The controller is left on automatic, but the reset and derivative components are turned
off. The controller gain is then gradually increased (or decreased) until the process output contin-
uously cycles after a small step change or disturbance. At this point, the controller gain you have
selected is the ultimate gain, Ku , and the observed period of oscillation is the ultimate period,
Pu .

Ziegler and Nichols originally suggested in 1942 PID tuning constants as a function of the ultimate
gain and ultimate period as shown in the first three rows of Table 4.4. While these values
give near-optimum responses for load changes, practical experience and theoretical considerations
(i.e. [15, 29]) have shown that these tuning values tend to give responses that are too
oscillatory for setpoint changes due to the small phase margin. For this reason, various people
have subsequently modified the heuristics slightly as listed in the remaining rows of Table 4.4,
which is expanded from that given in [178, p318].

Table 4.4: Various alternative ‘Ziegler-Nichols’ type PID tuning rules as a function of the ultimate
gain, Ku , and ultimate period, Pu .

Response type                                   Kc           τi        τd
Ziegler-Nichols               P                 0.5Ku        –         –
                              PI                0.45Ku       Pu/1.2    –
                              PID               0.6Ku        Pu/2      Pu/8
Modified ZN                   No overshoot      0.2Ku        Pu/2      Pu/2
                              Some overshoot    0.33Ku       Pu/2      Pu/3
Tyreus-Luyben                 PI                0.31Ku       2.2Pu     –
                              PID               0.45Ku       2.2Pu     Pu/6.3
Chien-Hrones-Reswick          PI                0.47Ku       Pu        –
Åström-Hägglund               PI                0.32Ku       0.94Pu    –
Specified phase-margin, φm    PID               Ku cos(φm)   f τd      Eqn. 4.56

Experience has shown that the Chien-Hrones-Reswick values give an improved response on the
original Ziegler-Nichols, but the Åström-Hägglund values tend, like the ZN, to be overly oscillatory.
While the Tyreus-Luyben values deliver very sluggish responses, they exhibit very little overshoot
and are favoured by process engineers for that reason.

Algorithm 4.1 summarises the ZN ultimate oscillation tuning procedure.



Algorithm 4.1 Ziegler-Nichols closed loop PID tuning

To tune a PID controller using the closed-loop Ziegler-Nichols method, do the following:

1. Connect a proportional controller to the plant to be controlled. I.e. turn the controller on
automatic, but turn off the derivative and integral action. (If your controller uses integral
time, you will need to set τi to the maximum allowable value.)

2. Choose a trial sensible controller gain, Kc to start.

3. Disturb the process slightly and record the output.

4. If the response is unstable, decrease Kc and go back to step 3, otherwise increase Kc and
repeat step 3 until the output response is a steady sinusoidal oscillation. Once a gain Kc
has been found to give a stable oscillation proceed to step 5.

5. Record the current gain, and measure the period of oscillation in time units (say seconds).
These are the ultimate gain, Ku and corresponding ultimate period, Pu .

6. Use Table 4.4 to establish the P, PI, or PID tuning constants.

7. Test the closed loop response with the new PID values. If the response is not satisfactory,
further manual ‘fine-tuning’ may be necessary.

This method has proved so popular that automatic tuning procedures have been developed that
are based around this theory, as detailed in chapter 7. Despite the fact that this closed loop test
is difficult to apply experimentally, and gives only marginal tuning results in many cases, it is widely
used and very well known. Much of the academic control literature uses the Z-N tuning method
as a basis on which to compare more sophisticated schemes, often conveniently forgetting in
the process that the Z-N scheme was really developed for load disturbances as opposed to setpoint
changes. Finally many practicing instrument engineers (who are the ones actually tuning
the plant) know only one formal tuning method— this one.

Consequently it is interesting to read the following quote from a review of a textbook in Process
control:4 “ . . . The inclusion of tuning methods based on the control loop cycling (Ziegler-Nichols
method) without a severe health warning to the user reveals a lack of control room experience on
behalf of the author. ”

Ziegler-Nichols continuous cycling example

Finding the ultimate gain and frequency in practice requires a tedious ‘trial-and-error’ approach.
If however, we already have the model of the process, say in the form of a transfer function, then
establishing the critical frequency analytically is much easier, although we may still need to solve
a nonlinear algebraic equation.

Suppose we have identified a model of our plant as

G(s) = e−3s/(6s² + 7s + 1)    (4.26)

4 Review of A Real-Time Approach to Process Control by Svrcek, Mahoney & Young, reviewed by Patrick Thorpe in The Chemical Engineer, January 2002, p30.



and we want to control this plant using a PID controller. To use Ziegler-Nichols settings we need
to establish the ultimate frequency, ωu , where the angle of G(s = iωu ) is −π radians, or solve the
nonlinear expression

∠G(iωu ) = −π    (4.27)

for ωu . In the specific case given in Eqn. 4.26, we have

∠[e−3iωu /(1 − 6ωu² + 7iωu )] = −π

−3ωu − tan−1(7ωu /(1 − 6ωu²)) = −π    (4.28)

which is a non-trivial function of the ultimate frequency, ωu . However it is easy to graph this relation
as a function of ωu and look for the zero crossing such as shown in Fig. 4.21, or use a numerical
technique such as Newton-Raphson to establish that ωu ≈ 0.4839 rad/s, implying an ultimate
period of Pu = 2π/ωu = 12.98 seconds per cycle.

[Figure: plot of F(ω) = −3ω − atan2(7ω, 1 − 6ω²) + π versus ω, crossing zero near ω = 0.48.]

Figure 4.21: Solving F (ω) = 0 for the ultimate frequency. In this case the ultimate frequency is ωu ≈ 0.48 rad/s.

A quick way to numerically solve Eqn. 4.28 is to first define an anonymous function and then to
use fzero to find the root.

>> F = @(w) -3*w - atan2(7*w,1-6*w.^2)+pi; % F(ω) = −3ω − tan−1(7ω/(1 − 6ω²)) + π
>> ezplot(F,[0 1])   % See Fig. 4.21.
>> wu = fzero(F,0.1) % Solve F(ω) = 0 for ω
wu =
    0.4839

Once we have found the critical frequency, ωu , we can establish the magnitude at this frequency
by substituting s = iωu into the process transfer function, Eqn. 4.26,

|G(iωu )| = |G(0.48i)| ≈ 0.2931

which gives a critical gain, Ku = 1/0.2931 = 3.412. An easy way to do this calculation in M ATLAB
is to simply use bode at a single frequency, as in

>> [M,ph] = bode(G,wu) % Compute G(s = jωu ); should obtain φ = −180◦ , and hence Ku .
M =
    0.2931
ph =
 -180.0000
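The same critical point can be cross-checked outside M ATLAB, for instance with SciPy's brentq root-finder applied to the F(ω) of Eqn. 4.28 (a sketch):

```python
import numpy as np
from scipy.optimize import brentq

# Phase condition for G(s) = exp(-3s)/(6s^2 + 7s + 1), Eqn. 4.28
F = lambda w: -3 * w - np.arctan2(7 * w, 1 - 6 * w**2) + np.pi
wu = brentq(F, 0.1, 1.0)                   # ultimate frequency, rad/s

mag = 1 / np.hypot(1 - 6 * wu**2, 7 * wu)  # |G(i*wu)|
Ku = 1 / mag                               # ultimate gain
Pu = 2 * np.pi / wu                        # ultimate period, s
```

The bracket [0.1, 1] is chosen from the graph of F(ω); brentq needs a sign change across the interval.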

Now that we have both the ultimate gain and frequency we can use the classic Ziegler-Nichols
rules in Table 4.4 to obtain our tuning constants and simulate the controlled process as shown in
Fig. 4.22.

Figure 4.22: The step and load responses of a P (left) and PI (middle) and PID (right) controlled
process using the closed loop Ziegler-Nichols suggestions from Table 4.4.

The tuned response to both a setpoint change and a load disturbance at t = 100 of all three
candidates shown in Fig. 4.22 is reasonable, but as expected the P response exhibits offset, and
the PI is sluggish. We would expect a good response from the PID controller because the actual
plant model structure (second order plus deadtime) is similar to the model structure assumed
by Ziegler and Nichols, which is first order plus deadtime. However, while it could be argued
that the step response is still too oscillatory, the response to the load disturbance is not too bad.

In the very common case of a first order plus deadtime structure, Eqn. 4.25, we can find the
ultimate frequency by solving the nonlinear equation

−θωu − tan−1(τ ωu ) + π = 0    (4.29)

for ωu , and then calculate the ultimate gain from a direct substitution

Ku = 1/|G(iωu )| = √(1 + τ²ωu²)/K    (4.30)
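For example, with the illustrative values K = 1, τ = 10 and θ = 2, Eqn. 4.29 and Eqn. 4.30 deliver the critical point directly (a Python sketch):

```python
import numpy as np
from scipy.optimize import brentq

K, tau, theta = 1.0, 10.0, 2.0    # illustrative FOPDT parameters

# Eqn. 4.29: phase crossover of K exp(-theta*s)/(tau*s + 1)
wu = brentq(lambda w: -theta * w - np.arctan(tau * w) + np.pi,
            1e-3, np.pi / theta)

# Eqn. 4.30: ultimate gain from direct substitution
Ku = np.sqrt(1 + tau**2 * wu**2) / K
```

The upper bracket π/θ is a safe choice because the total phase lag certainly exceeds π there.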

The tedious part of the above procedure is generating the nonlinear equation in ωu for the spe-
cific plant structure of interest. We can however automate this equation generation for standard
polynomial transfer functions with deadtime as shown in the listing below.

[num,den] = tfdata(G,'v') % Or use: cell2mat(G.num)
iodelay = G.iodelay;

% Construct the equation ∠G(iω) = −π and solve for ω
F = @(w) angle(polyval(num,1i*w))-iodelay*w-angle(polyval(den,1i*w))+pi;
wc = fzero(F,0) % Now solve for ωc

Rather than solve for the critical frequency algebraically, we could alternatively rely on the M ATLAB
margin routine which tries to extract the gain and phase margins and associated frequencies
for arbitrary transfer functions. Listing 4.4 shows how we can tune a PID controller for an arbitrary
transfer function, although note that because this strategy attempts to solve a nonlinear
expression for the critical frequency using a numerical iterative scheme, this routine is not infallible.

Listing 4.4: Ziegler-Nichols PID tuning rules for an arbitrary transfer function
G = tf(1,conv([6 1],[1 1]),'iodelay',5); % Plant of interest G = e−5s/((6s + 1)(s + 1))

[Gm,Pm,Wcg,Wcp] = margin(G); % Establish critical points
Ku = Gm; Pu = 2*pi/Wcg; % Critical gain, Ku & period, Pu

Kc = 0.6*Ku; taui = Pu/2; taud = Pu/8; % PID tuning rules (Ziegler-Nichols)

Gc = tf(Kc*[taui*taud taui 1],[taui 0]); % PID controller Kc (τi τd s² + τi s + 1)/(τi s)

Note that while the use of the ultimate oscillation Ziegler-Nichols tuning strategy is generally discouraged
by practitioners, it is the oscillation at near instability which is the chief concern, not
the general idea of relating the tuning constants to the ultimate gain and frequency. If, for example,
we already have a plant model in the form of a transfer function, perhaps derived from first
principles or system identification techniques, then we can by-pass the potentially hazardous
oscillation step, and compute directly the tuning constants as a function of ultimate gain and
frequency.

4.6.3 Closed loop single-test tuning methods

Despite the fact that the Ziegler-Nichols continuous cycling tuning method is performed in closed
loop, the experiment is both tedious and dangerous. The Yuwana-Seborg tuning method described
here, [203], retains the advantages of the ZN tuning method, but avoids many of the
disadvantages. Given the attractions of this closed-loop reaction curve tuning methodology, there
have been subsequently many extensions proposed, some of which are summarised in [48, 99].
The following development is based on the modifications of [47] with the corrections noted by
[193].

The Yuwana-Seborg (YS) tuning technique is based on the assumption that the plant transfer
function, while unknown, can be approximated by the first order plus deadtime structure,

Gm (s) = Km e−θm s/(τm s + 1)    (4.31)

with three plant parameters, process gain Km , time constant, τm , and deadtime θm .

Surrounding this process with a trial, but known, proportional-only controller Kc gives a closed
loop transfer function between reference r(t) and output y(t) as

Y (s)/R(s) = Kc Km e−θm s/(τm s + 1 + Kc Km e−θm s )    (4.32)

A Padé approximation can be used to expand the non-polynomial term in the denominator, although
various people have subsequently modified this aspect of the method, such as using a
different deadtime polynomial approximation. Using some approximation for the deadtime, we
can approximate the closed loop response in Eqn. 4.32 with

Gcl ≈ K e−θs/(τ²s² + 2ζτ s + 1)    (4.33)
and we can extract the model parameters in Eqn. 4.33 from a single closed loop control step test.
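The adequacy of a first-order Padé approximation, e^{−θs} ≈ (1 − θs/2)/(1 + θs/2), is easy to gauge numerically: both the true delay and the approximation have unit magnitude at all frequencies, so only their phases differ, and those agree closely at low frequency. A brief check (Python; θ = 1 assumed):

```python
import numpy as np

theta = 1.0
w = np.linspace(0.01, 2.0, 200)

phase_true = -theta * w                      # phase of exp(-i*theta*w)
phase_pade = -2 * np.arctan(theta * w / 2)   # phase of (1 - i*theta*w/2)/(1 + i*theta*w/2)

err = np.abs(phase_true - phase_pade)        # phase error in radians
```

The error grows like (θω)³/12, which is why the YS strategy degrades for plants with a large relative deadtime.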

[Figure: an underdamped closed loop step response from r0 to r1 , with the first peak yp1 , the trough ym , the second peak yp2 , the offset from setpoint, the final value yss , and the period P between peaks annotated.]

Figure 4.23: Typical response of a stable system to a P-controller.

Suppose we have a trial proportional-only feedback controlled response with a known controller
gain Kc . It does not really matter what this gain is so long as the response oscillates sufficiently
as shown in Fig. 4.23.

If we step change the reference variable setpoint from r0 to r1 , we are likely to see an under-
damped response such as shown in Fig. 4.23. From this response we measure the initial output y0 ,
the first peak yp1 , the first trough, ym1 and the second peak yp2 , and associated times. Under these
conditions we expect to see some offset given the proportional-only controlled loop, and that the
controller gain should be sufficiently high such that the response exhibits some underdamped
(oscillatory) behaviour.

The final value of the output, y∞ , is approximately

y∞ = (yp1 yp2 − ym²)/(yp1 + yp2 − 2ym )    (4.34)

or alternatively, if the experimental test is of sufficient duration, y∞ could simply be just the last
value of y collected.

The closed loop gain K is given by

K = y∞ /(r1 − r0 )    (4.35)

and the overshoot is given by

H = (1/3) [ (yp1 − y∞)/y∞ + (y∞ − ym)/(yp1 − y∞) + (yp2 − y∞)/(y∞ − ym) ]    (4.36)

The deadtime is

θ = 2tp1 − tm    (4.37)

the shape factor

ζ = −ln(H)/√(π² + ln²(H))    (4.38)

and the time constant

τ = (tm − tp1 )√(1 − ζ²)/π    (4.39)
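Eqn. 4.38 is simply the inversion of the standard second-order overshoot relation H = exp(−ζπ/√(1 − ζ²)), a claim easily verified numerically (Python sketch):

```python
import numpy as np

zeta_true = np.linspace(0.05, 0.9, 18)

# Standard overshoot ratio of an underdamped second-order system
H = np.exp(-zeta_true * np.pi / np.sqrt(1 - zeta_true**2))

# Eqn. 4.38 should recover the shape factor exactly
zeta = -np.log(H) / np.sqrt(np.pi**2 + np.log(H)**2)
```

In practice H is formed from the averaged peak ratios of Eqn. 4.36, so the recovered ζ is only as good as the measured peaks.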
Now that we have fitted a closed loop model, we can compute the ultimate gain, Ku , and ultimate
frequency, ωu , by solving the nonlinear expression

−θωu − tan−1( 2τ ζωu /(1 − τ²ωu²) ) = −π    (4.40)

for ωu , perhaps using an iterative scheme starting with ωu = 0. It follows then that the ultimate
gain is

Ku = Kc ( 1 + 1/|Gcl (iωu )| )    (4.41)

where

|Gcl (iωu )| = K/√( (1 − τ²ωu²)² + (2τ ζωu )² )    (4.42)

Alternatively of course we could try to use margin to extract Ku and ωu .

At this point, given Ku and ωu , we can tune PID controllers using the ZN scheme directly fol-
lowing Table 4.4.

Alternatively we can derive the parameters for the assumed open loop plant model, Eqn. 4.31, as

Km = |y∞ − y0 | / ( Kc (|r1 − r0 | − |y∞ − y0 |) )    (4.43)

τm = (1/ωu )√(Ku²Km² − 1)    (4.44)

θm = (1/ωu ) ( π − tan−1(τm ωu ) )    (4.45)

and then subsequently we can tune it either using the ZN strategy given in Listing 4.4, or use
the internal model controller strategy from Table 4.3, or perhaps using ITAE optimal tuning constants.
The optimal ITAE tuning parameters for P, PI and PID controllers are derived from

Kc = (A/Km )(θm /τm )−B ,    1/τi = (1/(Cτm ))(θm /τm )D ,    τd = E τm (θm /τm )F    (4.46)

where the constants A, B, C, D, E, F are given in Table 4.5 for P, PI and PID controllers. We are
now in a position to replace our trial Kc with the hopefully better controller tuning constants.

Table 4.5: Tuning parameters for P, PI, and PID controllers using an ITAE optimisation criterion
where the constants A through F are used in Eqn. 4.46.

mode   A      B      C      D      E      F
P      0.49   1.084  –      –      –      –
PI     0.859  0.977  1.484  0.680  –      –
PID    1.357  0.947  1.176  0.738  0.381  0.99
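Applying the PID row of Table 4.5 via Eqn. 4.46 is only a few lines; the sketch below (in Python) uses for illustration the model identified in the worked example that follows, Km = 1.003, τm = 3.128, θm = 2.24:

```python
Km, taum, thetam = 1.003, 3.128, 2.24   # model from the worked example below
A, B, C, D, E, F = 1.357, 0.947, 1.176, 0.738, 0.381, 0.99   # PID row of Table 4.5

r = thetam / taum               # normalised deadtime, theta_m/tau_m
Kc = (A / Km) * r**(-B)         # controller gain, Eqn. 4.46
taui = C * taum * r**(-D)       # integral time (reciprocal of the reset in Eqn. 4.46)
taud = E * taum * r**F          # derivative time
```

Note the deadtime ratio here (≈ 0.72) is within the range where these correlations are trustworthy.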

Note that the change in setpoint, |r1 − r0 | must be larger than the actual response |y∞ − y0 | owing
to the offset exhibited by a proportional only controller. If the system does not respond in this
manner, then a negative process gain is implied, which in turn implies nonsensical imaginary
time constants. Another often overlooked point is that if the controlled process response does
not oscillate, then the trial gain Kc is simply not high enough. This gain should be increased, and

the test repeated. If the process never oscillates, no matter how high the gain is, then one can use
an infinite gain controller — at least in theory.

Algorithm 4.2 summarises this single-test closed loop tuning strategy.

Algorithm 4.2 Yuwana-Seborg closed loop PID tuning

Tuning a PID controller using the closed-loop Yuwana-Seborg method is very similar to Algo-
rithm 4.1, but avoids the trial and error search for the ultimate gain and period.

1. Connect a proportional only controller to the plant.

2. Choose a trial sensible controller gain, Kc and introduce a set point change and record the
output.

3. Check that the output is underdamped and shaped similar to Fig. 4.23, and if not increase
the trial controller gain, Kc , and repeat step 2.

4. From the recorded underdamped response, measure the peaks and troughs, and note the
demanded setpoint change. You may find Listing 4.5 helpful here.

5. Compute the parameters of the closed loop plant and solve for the ultimate frequency,
Eqn. 4.40 and gain, Eqn. 4.41.

6. Optionally compute the open loop plant model using Eqn. 4.43—Eqn. 4.45.

7. Compute the P, PI or PID tuning constants using Table 4.5 or perhaps using the model and
the ZN strategy in Listing 4.4.

8. Test the closed loop response with the new PID values. If the response is not satisfactory,
further manual ‘fine-tuning’ may be necessary.

A M ATLAB implementation to compute the tuning constants given response data is given in
listings starting on page 164.

As a consequence of the assumption that the plant is adequately modelled using first-order dynamics
with deadtime, the Yuwana-Seborg tuning scheme is not suitable for highly oscillatory
processes, nor plants with inverse response behaviour as illustrated in section 4.8.1. Furthermore,
a consequence of the low order Padé approximation is that the strategy fails for processes with
a large relative deadtime. Ways to use this tuning scheme in the presence of excessive measurement
noise are described in section 5.2.1.

An example of a closed loop response tuner

Suppose we are to tune a PID controller for an inverse response process,

G(s) = (−s + 1)/((s + 1)²(2s + 1))

Such processes with right-half plane zeros are challenging because of the potential instability
owing to the initial response heading off in the opposite direction, as described further in §4.8.1.

The response to a unit setpoint change is given in the top figure of Fig. 4.24. By recording the
values of the peaks and troughs, we can estimate the parameters of an approximate model of the

[Figure: three stacked plots — the trial closed loop step test with Kc = 1 and the characteristic times tp1 , tm1 and tp2 marked; the fitted versus actual closed loop response; and the estimated versus actual open loop response.]

Figure 4.24: A Yuwana-Seborg closed loop step test. The top plot shows the response
given a unit change in setpoint with the key points marked. The middle plot compares
the estimated with the actual closed loop data, while the bottom plot compares the
estimated openloop plant model with the true plant.

closed loop response, Eqn. 4.33. A comparison of this model with the actual data is given in the
middle plot of Fig. 4.24. It is not a perfect fit because the underlying plant is not a pure first order
with deadtime, and because Eqn. 4.33 is only approximate anyway.

Given the closed loop model, we can now extract the open loop model of the plant, which is
compared with the true plant in the bottom trend of Fig. 4.24. Note how the first-order plus
deadtime model, specifically in this case

Ĝ = 1.003 e−2.24s/(3.128s + 1)

approximates within reason the higher order true plant dynamics.

Finally, now that we have a plant model, Ĝ, we can use either the ITAE optimal or IMC tuning
scheme as shown in Fig. 4.25.

As evident in Fig. 4.25, the ITAE responses are overly oscillatory which tends to be a common
failing of PID tuning schemes that are designed to satisfy a performance objective.

In the IMC case, the desired closed loop time constant was set equal to the sum of the dead-
time and the open loop time constant, τc = τm + θm , which as it turns out is perhaps overly
conservative in this case. Better performance would be achieved by reducing τc .

Figure 4.25: The PI (left) and PID (right) closed loop responses using the ITAE (upper) and IMC
(lower) tuning schemes based on the identified model but applied to the actual plant.

Automating the Yuwana-Seborg tuner

Because this is such a useful strategy for PID tuning, it is helpful to have a semi-automated
procedure. The first step once you have collected the closed loop data with a trial gain is to
identify the values and timing of the peaks and troughs, yp1 , ym1 and yp2 . Listing 4.5 is a simple
routine that attempts to automatically extract these values. As with all small programs of this
nature, the identification part can fail, particularly given excessive noise or unusual response
curves.

Listing 4.5: Identifies the characteristic points for the Yuwana-Seborg PID tuner from a trial closed
loop response
function [yp1,tp1,ym1,tm1,yp2,tp2,yss_est] = ys_ident(t,Y);
% Identify the peaks & troughs from an underdamped closed loop test
% Used for Yuwana-Seborg PID tuning

y0 = Y(1); n = length(Y); % start point & total # of points

if n < 20
    fprintf('Probably need more points\n');
end % if
Y = Y*((Y(n) > y0)-0.5)*2; % work out direction (1 if positive)
yss_est = mean(Y(n:-1:n-round(n/50))); % average pt at the end (robust)

[yp1,idx] = max(Y); tp1 = t(idx); % 1st maximum point & time at occurrence
Y = Y(idx:n); t = t(idx:n); % chop off to keep searching
[ym1,idx] = min(Y); tm1 = t(idx); % 1st minimum point & time
Y(1:idx) = []; t(1:idx)=[];
[yp2,idx] = max(Y); tp2 = t(idx); % 2nd maximum point & time at occurrence
return % end ys_ident.m

Using this peak and trough data, we can compute the parameters in the closed loop model fol-
lowing Listing 4.6.

Listing 4.6: Compute the closed loop model from peak and trough data
yinf = (yp1*yp2 - ym1^2)/(yp1+yp2-2*ym1); % Steady-state output, y∞, Eqn. 4.34
K = yinf/A; % Closed loop gain, K, Eqn. 4.35, where A is the setpoint step size
H = 1/3*((yp1-yinf)/yinf+(yinf-ym1)/(yp1-yinf)+(yp2-yinf)/(yinf-ym1)); % Overshoot, H, Eqn. 4.36

d = 2*tp1 - tm1; % Deadtime, θ, Eqn. 4.37
zeta = -log(H)/sqrt(pi^2+log(H)^2); % shape factor, ζ, Eqn. 4.38
tau = (tm1-tp1)*sqrt(1-zeta^2)/pi; % time constant, τ, Eqn. 4.39

Gcl = tf(K,[tau^2 2*zeta*tau 1],'iodelay',d); % Closed loop model, Gcl(s), Eqn. 4.33

Now that we have the closed loop model, we can compute the ultimate gain, Ku , and ultimate
frequency, ωu .

Listing 4.7: Compute the ultimate gain and frequency from the closed loop model parameters.
fwu = @(w) pi - d*w - atan2(2*zeta*tau*w,1-tau^2*w.^2); % LHS of Eqn. 4.40

[Gm,Pm,Wcg,Wcp] = margin(Gcl); % Use Wcg as a good starting estimate for ωu
wu = fsolve(fwu,Wcg); % Solve nonlinear Eqn. 4.40 for ωu

B = K/sqrt((1-tau^2*wu^2)^2+(2*tau*zeta*wu)^2); % |Gcl(iωu)|, Eqn. 4.42
Kcu = Kc*(1+1/B); % Ultimate gain, Ku, Eqn. 4.41

Finally we can now extract the open loop first-order plus deadtime model using Listing 4.8.

Listing 4.8: Compute the open loop model, Gm , Eqn. 4.31.


Km = yinf/(Kc*(A-yinf)); % Plant gain, Km, Eqn. 4.43
taum = 1/wu*sqrt(Kcu^2*Km^2-1); % Plant time constant, τm, Eqn. 4.44
dm = 1/wu*(pi - atan(taum*wu)); % Plant delay, θm, Eqn. 4.45
Gm = tf(Km,[taum 1],'iodelay',dm); % Open loop model estimate, Gm(s), Eqn. 4.31

Now that we have an estimate of the plant model, it is trivial to use say the IMC relations to
compute reasonable PI or PID tuning constants. We do need however to decide on an appropriate
desired closed loop time constant.

Listing 4.9: Compute appropriate PI or PID tuning constants based on a plant model, Gm , using
the IMC schemes.
tauc = (taum+dm)/3; % IMC desired closed loop time constant, τc
controller = 'PID'; % Type of desired controller, PI or PID

switch controller % See IMC tuning schemes in Table 4.3.
    case 'PI'
        Kc = 1/Km*taum/(tauc+dm);
        taui = taum; rs = 1/taui; % Integral time, τi, and reset, 1/τi.
        taud = 0;
    case 'PID'
        Kc = 1/Km*(taum+dm/2)/(tauc+dm/2);
        taui = taum+dm/2; rs = 1/taui;
        taud = taum*dm/(2*taum+dm);
end % switch

Now all that is left to do is load the controller tuning constants into the PID controller, and set
the controller to automatic.

4.6.4 Summary on closed loop tuning schemes

These automated PID tuners are, not surprisingly, quite popular in industry and also in academic
circles. Most of the schemes are based on what was described above, that is the unknown process
is approximated by a simple transfer function, where the ultimate gain and frequency is known
as a function of parameters. Once these parameters are “curve fitted”, then the PID controller is
designed via a Ziegler–Nichols technique. There are numerous modifications to this basic tech-
nique, although most of them are minor. One extension is reported in [48] while [193] summarises
subsequent modifications, corrects errors, and compares alternatives. The relay feedback method
of Hägglund and Åström, [15], described next in §4.7, is also a closed loop single test method.

Problem 4.2 1. Modify the ys_ident.m function to improve the robustness. Add improved
error checking, and try to avoid other common potential pitfalls such as negative controller
gain, integrator responses, severe nonlinearities, excessive noise, spikes etc.

2. Test your auto closed loop tuner on the following three processes Gp (s) (adapted from [48]);

G1 (s) = e−4s/(s + 1)    (4.47)

G2 (s) = e−3s/((s + 1)²(2s + 1))    (4.48)

G3 (s) = e−0.5s/((s − 1)(0.15s + 1)(0.05s + 1))    (4.49)

G1 and G2 show the effect of some dead time, and G3 is an open loop unstable process.

3. Create a m file similar to the one above, that implements the closed loop tuning method as
described by Chen, [48]. You will probably need to use the M ATLAB fsolve function to
solve the nonlinear equation.

4.7 Automated tuning by relay feedback

Because the manual tuning of PID controllers is troublesome and tedious, it is rarely done, which
is the motivation behind the development of self-tuning controllers. This “tuning on demand”
behaviour, after which the controller reverts back to a standard PID controller, is different from
adaptive control where the controller is constantly monitoring and adjusting the controller tuning
parameters or even the algorithm structure. Industrial acceptance of this type of smart controller
was much better than for the fully adaptive controller, partly because the plant engineers
were far more confident about the inherent robustness of a selftuning controller.

One method to selftune, employed by the ECA family of controllers manufactured by ABB shown
in Fig. 4.26, uses a nonlinear element, the relay. The automated tuning by relays is based on the
assumption that the Ziegler-Nichols style of controller tuning is a good one, but is flawed in
practice since finding the ultimate frequency ωu is a potentially hazardous, tedious, trial and error
experiment. The development of this algorithm was a joint project between SattControl and Lund
University, Sweden [15]. This strategy of tuning by relay feedback is alternatively known as the
Auto Tune Variation or ATV method.

Figure 4.26: A PID controller with a self-tuning option using a relay from ABB.

The PID controller with selftuning capability using relay feedback is really two controllers in one
as shown in Fig. 4.27. Here the PID component is disabled, and the relay is substituted. Once the
executive software is confident that the updated controller parameters are an improvement, the
switch is toggled, and the controller reverts back to a normal PID regulator.

[Figure: feedback loop in which the error ǫ = r − y drives either the relay with outputs ±d (active) or the PID controller (inactive); the selected signal u feeds the plant Gp (s) whose output is y.]

Figure 4.27: A process under relay tuning with the PID regulator disabled.

A relay can be thought of as an on/off controller, or a device that approximates a proportional
controller with an infinite gain, but hard limits on the manipulated variable. Thus the relay has
two outputs: if the error is negative, the relay sends a correcting signal of −d units, and if the
error is positive, the relay sends a correcting signal of +d units.

4.7.1 Describing functions

A relay is a nonlinear element so, unlike the situation with a linear component, the output
to a sinusoidal input will not in general be sinusoidal. However for many plants that are low-pass
in nature, the higher harmonics are attenuated, and we need only consider the fundamental
harmonic of the nonlinear output.

The describing function, N, of a nonlinear element is defined as the complex ratio of the fundamental harmonic of the output to the input, or

    N ≜ (Y1/X1) ∠φ    (4.50)

where X1, Y1 are the amplitudes of the input and output, and φ is the phase shift of the fundamental harmonic component of the output, [150, p652]. If there is no energy storage in the nonlinear element, then the describing function is a function only of the input amplitude.

By truncating a Fourier series of the output, and using Eqn. 4.50, one can show that the describing function for a relay with amplitude d is

    N(a) = 4d/(πa)    (4.51)

where a is the input amplitude.
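Eqn. 4.51 is easy to verify numerically: drive an ideal relay with a sinusoid of amplitude a, and compute the fundamental Fourier coefficient of the resulting square wave. The following sketch does this in plain Python (rather than this book's usual MATLAB) so that it runs without any toolboxes; the function name and sample count are arbitrary choices.

```python
import math

def relay_describing_function(d, a, n=100000):
    """Numerically estimate the describing function N(a) of an ideal relay
    of output amplitude d, driven by a sinusoid of input amplitude a."""
    T = 2 * math.pi              # one period of a unit-frequency sinusoid
    dt = T / n
    b1 = 0.0                     # fundamental (sine) Fourier coefficient of the output
    for k in range(n):
        t = k * dt
        u = a * math.sin(t)              # sinusoidal input
        y = d if u >= 0 else -d          # ideal relay output
        b1 += y * math.sin(t) * dt
    b1 *= 2 / T                  # Y1, the amplitude of the fundamental harmonic
    return b1 / a                # N(a) = Y1/X1; the phase shift is zero

d, a = 2.0, 0.5
print(relay_describing_function(d, a))   # should approach 4d/(πa) ≈ 5.09
```

Since the relay has no energy storage, the result is independent of frequency, which is why the loop above can fix the frequency at 1 rad/s.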

Now a sustained oscillation or limit cycle of a linear plant G(iω) together with a nonlinear element with describing function N will occur when 1 + N G(iω) = 0, or

    G(iω) = −1/N    (4.52)

Now since, in the case of a relay, N given by Eqn. 4.51 is purely real, the intersection of G(iω) and −1/N lies somewhere along the negative real axis. This occurs at a gain of πa/(4d), so the ultimate gain, Ku, which is the reciprocal of this, is

    Ku = 4d/(πa)    (4.53)

The ultimate frequency, ωu = 2π/Pu (in rad/s), is simply the frequency of the observed output oscillation, and can be calculated by counting the time between zero crossings; the amplitude, a, is found by measuring the range of the output. From this single simple experiment we now know one point on the Nyquist curve, and a reasonable PID controller can be designed based on this point using, say, the classical Ziegler-Nichols table.
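The measurement just described (period from the zero crossings, amplitude from half the range of the output) can be sketched as follows. This is an illustrative Python fragment running on synthetic data; the values a = 0.64 and Pu = 1.57 s are simply representative numbers chosen for the example, not measurements.

```python
import math

def measure_oscillation(t, y):
    """Estimate the period (from successive upward zero crossings) and the
    amplitude (half the peak-to-peak range) of a recorded oscillation."""
    crossings = []
    for i in range(1, len(y)):
        if y[i-1] < 0 <= y[i]:                       # upward zero crossing
            frac = -y[i-1] / (y[i] - y[i-1])         # linear interpolation
            crossings.append(t[i-1] + frac * (t[i] - t[i-1]))
    periods = [t2 - t1 for t1, t2 in zip(crossings, crossings[1:])]
    P = sum(periods) / len(periods)
    a = (max(y) - min(y)) / 2.0
    return P, a

# Synthetic limit-cycle record (hypothetical numbers: a = 0.64, Pu = 1.57 s)
d = 2.0                                              # relay gain
t = [0.01 * k for k in range(2000)]
y = [0.64 * math.sin(2 * math.pi * tk / 1.57) for tk in t]

P, a = measure_oscillation(t, y)
Ku = 4 * d / (math.pi * a)                           # Eqn. 4.53
print(P, a, Ku)
```

On noisy industrial data this naive range-and-crossing approach degrades quickly, which motivates the least-squares refinements of §4.7.3.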

In summary, under relay feedback control, most plants will exhibit some type of limit cycle, such as a sinusoidal-like wave with a phase lag of −180°. This is the point where the open-loop transfer function crosses the negative real axis on a Nyquist plot, although not necessarily at the critical −1 point. A typical test setup and response is given in Fig. 4.28.

Given that we know a single point on the Nyquist curve, how do we obtain controller parameters? It is convenient to restrict the controller to the PID structure since that is what is mostly used in practice. The original ZN tuning criteria gave responses with a specified gain and phase margin. For good control, a gain margin of about 1.7 and a phase margin of about φm = 30° is recommended. Note that by using different settings of the PID controller, one can move a single point on the Nyquist curve to any other arbitrary point.

Figure 4.28: An unknown plant under relay feedback exhibits an oscillation. The relay switches the plant input over a range of 2d, and the output oscillates with a peak-to-peak amplitude of 2a and period P.

Let us assume that we know the point, ωu, where the open-loop transfer function of the process, Gp(s), crosses the negative real axis. This point has a phase (or argument) of −π or −180°. We wish the open-loop transfer function, Gc Gp, to have a specified phase margin of φm by choosing appropriate controller tuning constants. Thus equating the arguments of the two complex numbers gives
 
    arg[ 1 + 1/(iωu τi) + iωu τd ] − π = φm − π    (4.54)

which simplifies to⁵

    ωu τd − 1/(ωu τi) = tan φm
Since we have two tuning parameters, (τi, τd), and only one specified condition, (φm), there are many PID controller constants that could satisfy this phase margin. Åström, [11], chose that the integral time should be some factor of the derivative time,

    τi = f τd    (4.55)
Then the derivative time, which must be real and positive, is

    τd = [ f tan φm + √(f² tan² φm + 4f) ] / (2 f ωu)    (4.56)

where the arbitrary factor f ≈ 4. Given the ultimate gain Ku, the controller gain is

    Kc = Ku cos φm    (4.57)
⁵ Remember that for a complex number z = x + iy, the argument of z is the angle z makes with the real axis: arg(z) = arg(x + iy) = tan⁻¹(y/x).

Algorithm 4.3 Relay auto-tuning

In summary, to tune a system by relay feedback, you must:

1. Turn off the current PID controller, and swap over to a relay feedback controller with known relay gain d.

2. Apply some excitation if necessary to get the process to oscillate under relay control. Measure the resultant error amplitude a, and period Pu. Calculate the ultimate gain from Eqn. 4.53 and the associated frequency ωu.

3. Decide on a phase margin, (φm ≈ 30°), and the factor, f ≈ 4, and calculate the PID controller tuning constants using Eqns 4.55–4.57.

4. Download these parameters to the PID controller, turn off the relay feedback, and swap back over to the PID controller.

5. Wait until the operator again initiates the self-tuning program, after which return to step 1.

4.7.2 An example of relay tuning

We can compare the experimentally determined ultimate gain, Ku , and ultimate frequency, ωu ,
with the theoretical values for the plant
    G(s) = 1/(τs + 1)⁴    (4.58)
where the time constant is τ = 0.25. The ultimate gain and frequency are given by solving

    arg[ Ku/(τs + 1)⁴ ] = −π   and   | Ku/(τs + 1)⁴ | = 1,   both evaluated at s = iωu,

which is rather involved in general, although in this special case the solution is analytic: the phase, −4 tan⁻¹(τωu), equals −π when τωu = 1, giving ωu = 4 rad/s, and then Ku = (√(1 + τ²ωu²))⁴ = (√2)⁴ = 4. An alternative graphical solution is obtained using Bode and/or Nyquist diagrams in MATLAB.

G = zpk([],[-4 -4 -4 -4],4^4);
nyquist(G,logspace(-1,3,200));
bode(G) % Bode diagram is an alternative

The Nyquist diagram given in the upper plot of Fig. 4.29 shows the curve crossing the negative real axis at −1/Ku ≈ −0.25, thus Ku ≈ 4. To find the ultimate angular frequency, we need the phase plot of the Bode diagram shown in the bottom half of Fig. 4.29. The phase angle crosses the −180° ≡ −π radians point at a frequency of about ωu = 4 rad/s. Thus the ultimate period is Pu = 2π/ωu = 1.57 seconds. Note that it was unnecessary to plot the Nyquist diagram, since we could also extract the ultimate gain information from the magnitude part of the Bode diagram.

Estimating Ku and ωu by relay experiment

Now we can repeat the evaluation of Ku and ωu , but this time by using the relay experiment.
I will use a relay with a relay gain of d = 2 but with no hysteresis (h = 0) and a sample time
of T = 0.05 seconds. To get the system moving, I will choose a starting value away from the
setpoint, y0 = 1.

Figure 4.29: Nyquist (top) and Bode (bottom) diagrams for 1/(0.25s + 1)⁴. The Nyquist curve crosses the negative ℜ-axis at about −1/Ku ≈ −0.25 (a gain margin of 12 dB at 4 rad/sec). The frequency at which the Bode curve passes through φ = −180° is ω ≈ 4 rad/s.

G = zpk([],[-4 -4 -4 -4],4^4); % Plant: G(s) = 1/(0.25s + 1)^4

Ts=0.05; t = [0:Ts:7]'; % sample time
u = 0*t; y=NaN*t; % initialise u & y
d=2; % relay gain

[Phi,Del,C,D] = ssdata(c2d(G,Ts));
x = [0,0,0,0.1]'; % some non-zero initial state

for i=1:length(t)-1; % start relay experiment
    x = Phi*x + Del*u(i); % integrate model 1 step
    y(i) = C*x; % output
    u(i+1) = -d*sign(y(i)); % Relay controller
end % for

plot(t,y,t,u,'r--')

The result of the relay simulation is given in Fig. 4.30, where we can see that once the transients have died out, the amplitude of the oscillation of the output (or error, since the setpoint is constant) is a = 0.64 units, with a period of P = 1.6 seconds. Using Eqn. 4.53, we get estimates for the ultimate gain and angular frequency

    P̂u = P = 1.6 ≈ 1.57 s,   and   K̂u = 4d/(πa) = 3.978 ≈ 4
πa

which approximately equal the estimates that I graphically found using the Bode and Nyquist
diagrams of the true plant.

Figure 4.30: PID tuning using relay feedback. For the first 10 seconds the relay is enabled, and the process oscillates at the ultimate frequency with output amplitude a = 0.64 and period Pu = P = 1.6 s. After 10 s, the relay is turned off, the PID tuner with the updated constants is enabled, and the plant is under control.

Once we have the ultimate gain and frequency, the PID controller constants are obtained from Eqns. 4.55–4.57 and are shown in Table 4.6. Alternatively, we can use any Z–N based tuning scheme.

Table 4.6: The PID controller tuning parameters obtained from a relay-based closed loop experiment

    PID parameter    Hägglund    Ziegler–Nichols    units
    Kc               3.445       0.6Ku = 3.4        –
    τi               0.866       Pu/2  = 0.8        s
    τd               0.217       Pu/8  = 0.2        s

Now that we have the PID tuning constants, we can simulate the closed loop response to setpoint
changes. We could create the closed loop transfer function using G*Gpid/(1 + Gpid*G), how-
ever the preferred approach is to use the routines series and feedback. This has the advantage
that it will cancel common factors and perform a balanced realisation.

G = zpk([],[-4 -4 -4 -4],4^4); % Plant to be controlled, Eqn. 4.58.
K = 3.4; ti = 0.8; td = 0.2; % Controller constants from Table 4.6.

Gpid = tf(K*[ti*td ti 1],[ti 0])
Gcl = feedback(series(G,Gpid),1); % Create closed loop for simulation

R = [zeros(10,1);ones(100,1);-ones(100,1)]; % Design setpoint vector
T = 0.05*[0:1:length(R)-1]'; % time scale
y = lsim(Gcl,R,T); % Simulate continuous
plot(T,[y,R])

The remainder of Fig. 4.30, for the period after 10 seconds, shows the quite reasonable response using the PID constants obtained via a relay experiment. Perhaps the overshoot, at around 40%, and the oscillation are a little high, but this is a well-known problem of controller tuning via the classical Z–N approach, and can be addressed by using the modified ZN tuning constants.

4.7.3 Self-tuning with noise disturbances

The practical implementation of the relay feedback identification scheme requires that we measure the period of oscillation P, and the error amplitude a, automatically. Both these parameters are easily established manually by visual inspection if the process response is well behaved and noise free, but noisy industrial outputs can make even these simple characteristics difficult to identify reliably and automatically.

When considering disturbances, [11] develops two modifications to the period and amplitude identification part of the self-tuning algorithm. The first involves a least-squares fit (described below), and the second involves an extended Kalman filter described in Problem 9.7. Of course hybrids of both these schemes with additional heuristics could also be used. Using the same process and relay as in §4.7.2, we can introduce some noise into the system as shown in the Simulink model given in Fig. 4.31(a). Despite the low noise power (0.001), and the damping of the fourth-order process, the oscillations vary in amplitude and frequency as shown in Fig. 4.31(b).

To obtain reliable estimates of the amplitude and period, we can use the fact that a sampled sinusoidal function with period P and sample time T satisfies

    y(t) − θ1 y(t − T) + y(t − 2T) + θ2 = 0    (4.59)

with

    θ1 = 2 cos(2πT/P)    (4.60)
The extra constant, θ2, is introduced to take into account non-zero means. Eqn. 4.59 is linear in the parameters, and we can solve the least-squares regression problem in MATLAB using the backslash command.

    θ = [θ1, θ2]ᵀ = [ yₜ₋T   −1 ]⁺ (yₜ + yₜ₋₂T)    (4.61)
Once we have fitted θ1 (we are uninterested in θ2), the period is obtained by inverting Eqn. 4.60,

    P = 2πT / cos⁻¹(θ1/2)    (4.62)

Now that we know the period, we can solve another linear least-squares regression problem given the N output data points,

    min over θ of  Σₖ₌₁ᴺ [ y(kT) − θ3 sin(2πkT/P) − θ4 cos(2πkT/P) − θ5 ]²    (4.63)

where the amplitude is given by

    a = √(θ3² + θ4²)

By doing the regression in this two step approach, we avoid the nonlinearity of Eqn. 4.63 if we
were to estimate P as well in an attempt to do the optimisation in one step.
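The two-step regression of Eqns. 4.59–4.63 can be sketched in plain Python as below, using noise-free synthetic data so that the known period and amplitude are recovered almost exactly. (Listing 4.10 is the MATLAB equivalent; the small normal-equation solver here merely stands in for MATLAB's backslash.)

```python
import math

def lstsq(X, y):
    """Solve min ||X*theta - y|| via the normal equations with Gaussian
    elimination; a stand-in for MATLAB's backslash on these tiny problems."""
    n = len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(len(X))) for j in range(n)]
         + [sum(X[k][i] * y[k] for k in range(len(X)))] for i in range(n)]
    for i in range(n):                               # elimination with pivoting
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(i + 1, n):
            m = A[r][i] / A[i][i]
            for c in range(i, n + 1):
                A[r][c] -= m * A[i][c]
    theta = [0.0] * n
    for i in reversed(range(n)):                     # back substitution
        theta[i] = (A[i][n] - sum(A[i][j] * theta[j]
                                  for j in range(i + 1, n))) / A[i][i]
    return theta

def period_and_amplitude(y, T):
    """Two-step least-squares estimate of period and amplitude, Eqns. 4.59-4.63."""
    # Step 1: fit theta1*y(t-T) - theta2 = y(t) + y(t-2T), then invert Eqn. 4.60
    X = [[y[k - 1], -1.0] for k in range(2, len(y))]
    rhs = [y[k] + y[k - 2] for k in range(2, len(y))]
    theta1 = lstsq(X, rhs)[0]
    P = 2 * math.pi * T / math.acos(theta1 / 2)      # Eqn. 4.62
    # Step 2: with P fixed, fit theta3, theta4, theta5 of Eqn. 4.63
    w = 2 * math.pi / P
    X2 = [[math.sin(w * k * T), math.cos(w * k * T), 1.0] for k in range(len(y))]
    th3, th4, th5 = lstsq(X2, y)
    return P, math.hypot(th3, th4)                   # amplitude a

# Noise-free synthetic data with known period 1.8 s, amplitude 0.6 and offset 0.1
T = 0.05
y = [0.6 * math.sin(2 * math.pi * k * T / 1.8 + 0.3) + 0.1 for k in range(400)]
P_est, a_est = period_and_amplitude(y, T)
print(P_est, a_est)      # close to 1.8 and 0.6
```

With noise added, the estimates degrade gracefully rather than failing outright, which is the point of replacing the simple zero-crossing measurement with a regression.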

Running the Simulink simulation given in Fig. 4.31(a) gives a trend something like Fig. 4.31(b).

Figure 4.31: Relay oscillations with noise. (a) Relay feedback of the fourth-order plant 256/(s + 4)⁴ with band-limited white noise added. (b) The controlled output and relay response.

The only small implementation problem is that Simulink does not necessarily return equally sampled data for continuous system simulation. The easiest work-around is to define a regularly spaced time vector and do a table lookup on the data using interp1. The script file given in Listing 4.10 calculates the period and amplitude using least-squares.

Listing 4.10: Calculates the period and amplitude of a sinusoidal time series using least-squares.

N = length(t); % Length of data series
tr = linspace(t(1),t(length(t)),1.5*N)'; dt = tr(2)-tr(1);
yr = interp1(t,y,tr,'linear'); % Simulink is not regular in time
y=yr; t = tr; % Now sample time T is regularly spaced.

% Construct data matrix
npts = length(y); % # of data points
rhs = y(3:npts) + y(1:npts-2);
X = [y(2:npts-1),-ones(size(rhs))];
theta = X\rhs; % Solve LS problem, Eqn. 4.61.
P = 2*pi*dt/acos(theta(1)/2); % Now find period from Eqn. 4.62.

omega = 2*pi/P; % 2nd round to find amplitude .....
k = [1:npts]';
X = [sin(omega*k*dt) cos(omega*k*dt) ones(size(k))]; % data matrix
theta = X\y; % Solve 2nd LS problem, Eqn. 4.63.
a = norm(theta(1:2)); % Extract amplitude.

In my case, the computed results are

    P = 1.822,   a = 0.616

which is not far off the values obtained with no disturbances from page 171 of P = 1.6 and a = 0.64.

Figure 4.32: Experimental implementation of relay tuning of the blackbox. (a) A Simulink implementation of the relay feedback experiment, where a switch selects between the PID controller and the relay. (b) The output of the relay experiment.

Relay based tuning of a ‘black-box’

Using a data acquisition card and a real-time toolbox extension⁶ to MATLAB, we can repeat the trial of the relay-based tuning method, this time controlling a real process, in this case a “black box”. The Simulink controller and input/output data are given in Fig. 4.32.

From Fig. 4.32, we can measure the period and amplitude of the oscillating error

    Pu = 7 seconds,   a = 0.125

giving, with d = 1, an approximate ultimate gain of

    Ku = 4d/(πa) = 4/(0.125π) = 10.2
which leads to appropriate Kc , τi , τd , using the modified ZN rules
⁶ Available from Humusoft.

Kc τi (s) τd (s)
0.33Ku = 3.4 Pu /2 = 3.5 Pu /3 = 2.3

Since Simulink uses a slightly different PID controller scheme from the classical formula,

    P = Kc = 3.4,   I = Kc/τi = 0.97,   D = Kc τd = 7.8

Using these values in the PID controller with the black-box, we obtain the controlled results
shown in Fig. 4.33 which apart from the noisy input and derivative kick (which is easily removed,
refer §4.4.1), does not look too bad.

Figure 4.33: Experimental PID control of the blackbox using parameters obtained by the relay-based tuning experiment. (See also Fig. 4.32.)

4.7.4 Modifications to the relay feedback estimation algorithm

The algorithm using a relay under feedback as described in section 4.7 establishes just one point
on the Nyquist curve. However it is possible by using a richer set of relay feedback experiments
to build up a more complex model, and perhaps better characterise a wider collection of plants.

Relays with hysteresis

Physical relays always have some hysteresis h, to provide mechanical robustness when faced
with noise. By adjusting the hysteresis width we can excite the plant at different points on the
Nyquist curve.

The response of a relay with hysteresis is given in Fig. 4.34. Note that the relay will not flip to
+d from the −d position until we have a small positive error, in this case h. Likewise, from the
reverse direction, the error must drop below a small negative error, −h, before the relay flips from
+d to −d.
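This switching behaviour takes only a few lines to implement. The sketch below is illustrative Python, not a production relay; starting on the low output, and using strict inequalities at ±h, are arbitrary choices.

```python
def make_relay(d, h):
    """A stateful relay with output amplitude d and hysteresis width h.
    (Starting on the low output is an arbitrary assumption.)"""
    state = {'out': -d}
    def relay(error):
        if error > h:
            state['out'] = d        # flips high only once the error exceeds +h
        elif error < -h:
            state['out'] = -d       # flips low only once the error drops below -h
        return state['out']         # otherwise hold the previous output
    return relay

relay = make_relay(d=2.0, h=0.1)
print([relay(e) for e in (0.05, 0.2, 0.05, -0.05, -0.2)])
# → [-2.0, 2.0, 2.0, 2.0, -2.0]: small errors inside the band never flip the relay
```

The hold-inside-the-band behaviour is exactly what gives the relay its robustness to measurement noise.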

Figure 4.34: A relay with hysteresis width h and output amplitude d. The output switches between −d and +d at error values of +h (rising) and −h (falling).

The describing function for a relay with amplitude d and hysteresis width h is

    −1/N(a) = −(π/(4d)) √(a² − h²) − j πh/(4d)    (4.64)

which is a line parallel to the real axis in the complex plane. Compare this with the describing function for the pure relay in Eqn. 4.51.

The intersection of this line and G(iω) is the resultant steady-state oscillation due to the relay, refer Fig. 4.35. By adjusting the relay hysteresis width, h, we can move the −1/N(a) line up and down, thereby establishing different points on the Nyquist diagram. Of course, if we increase the hysteresis too much, then we may shift the −1/N(a) line so far down that it never intercepts the G(iω) path. In this situation, we will not observe any limit cycle in the closed loop experiment.

Fig. 4.36 shows the estimated frequency response for the plant G(s) = 256/(s + 4)4 calculated
using a relay with d = 2 and varying amounts of hysteresis from h = 0 to h = 1.4. Note that the
calculated points do follow the true frequency response of the plant, at least for small values of h.
Obviously if h is too large, then the line −1/N (a) will not intersect the frequency response curve,
G(iω), so there will be no oscillations.
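Each hysteresis setting h therefore converts one measured oscillation amplitude a into one point on the Nyquist plane via Eqn. 4.64, which is how the estimated points in Fig. 4.36 are located. A minimal Python sketch of that conversion:

```python
import math

def nyquist_point(d, h, a):
    """Locate G(i*w_osc) = -1/N(a) for a relay with output amplitude d and
    hysteresis width h, given the observed oscillation amplitude a (Eqn. 4.64)."""
    return complex(-(math.pi / (4 * d)) * math.sqrt(a**2 - h**2),
                   -math.pi * h / (4 * d))

print(nyquist_point(d=2.0, h=0.0, a=0.64))  # pure relay: a point on the real axis
print(nyquist_point(d=2.0, h=0.5, a=0.80))  # hysteresis: shifted below the axis
```

Note that the formula requires a ≥ h; if the hysteresis exceeds the oscillation amplitude, no limit cycle exists, consistent with the discussion above.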

Inserting known dynamics into the feedback loop

A second simple modification is to insert an integrator prior to the relay. The integrator subtracts
−π/2 or −90◦ from the phase, and multiplies the gain by a factor of 1/ω. In this way, under relay
feedback we can estimate the point where the Nyquist curve crosses the negative imaginary
axis as well as the negative real axis. Such a scheme is shown in Fig. 4.37 and is described by
[199] which is a specific case of the general schemes described by [117, 144]. A survey of other
modifications to the basic relay feedback idea for controller tuning is given in [44].

Figure 4.35: Relay feedback with hysteresis width h. The operating point under relay feedback is the intersection of the plant frequency response G(iω) with the line −1/N(a), which lies parallel to, and below, the negative real axis.

Figure 4.36: Using relay feedback with varying amounts of hysteresis from h = 0 to h = 1.4 to obtain multiple points on a Nyquist curve. The true frequency response G(iω) is the solid line.

Figure 4.37: Relay feedback with an integrator option. A manual switch selects whether the integrator is inserted before the relay.

Establishing multiple points on the Nyquist curve enables one to extract transfer function models. In the case of a first-order plus deadtime model

    G(s) = Kp e⁻ᴸˢ/(τs + 1)    (4.65)
we need two arbitrary distinct points. That is, we establish the magnitude and phase of G(iω) at two frequencies, ω1 and ω2, via two relay experiments.

Defining the magnitudes and angles to be

    |G(iω1)| = 1/k1,    |G(iω2)| = 1/k2
    ∠G(iω1) = φ1,       ∠G(iω2) = φ2

then the plant gain is

    Kp = √[ (ω1² − ω2²) / (k2² ω1² − k1² ω2²) ]    (4.66)

the time constant is

    τ = (1/ω1) √(k1² Kp² − 1)    (4.67)

and the deadtime is

    L = −(1/ω1) [ φ1 + tan⁻¹(ω1 τ) ]    (4.68)
In the case of a pure relay and a pure relay with an integrator,

φ1 = −π, φ2 = −π/2.
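Eqns. 4.66–4.68 can be checked by round-tripping a known plant: evaluate G(iω) at two frequencies, then recover Kp, τ and L from the measured magnitudes and the first phase. The Python sketch below does this with arbitrary (hypothetical) plant parameters and test frequencies, and assumes the phases have not wrapped past −π.

```python
import cmath, math

def fit_fopdt(w1, w2, k1, k2, phi1):
    """Recover Kp, tau, L of G(s) = Kp e^{-Ls}/(tau*s + 1) from two measured
    frequency-response points (Eqns. 4.66-4.68)."""
    Kp = math.sqrt((w1**2 - w2**2) / (k2**2 * w1**2 - k1**2 * w2**2))  # Eqn. 4.66
    tau = math.sqrt(k1**2 * Kp**2 - 1) / w1                            # Eqn. 4.67
    L = -(phi1 + math.atan(w1 * tau)) / w1                             # Eqn. 4.68
    return Kp, tau, L

# Round-trip test on a known plant (hypothetical parameters)
Kp0, tau0, L0 = 2.0, 3.0, 1.0
G = lambda w: Kp0 * cmath.exp(-1j * L0 * w) / (tau0 * 1j * w + 1)

w1, w2 = 0.5, 1.0
k1, k2 = 1 / abs(G(w1)), 1 / abs(G(w2))          # reciprocal magnitudes
phi1 = cmath.phase(G(w1))                        # phase at the first frequency
Kp_est, tau_est, L_est = fit_fopdt(w1, w2, k1, k2, phi1)
print(Kp_est, tau_est, L_est)                    # recovers 2.0, 3.0, 1.0
```

In a real relay experiment the two points would come from the pure-relay and relay-plus-integrator tests, so that φ1 = −π and φ2 = −π/2 as noted above.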

The case for a second-order plant

    G(s) = Kp e⁻ᴸˢ/(s² + as + b)    (4.69)

is more complicated. The following algorithm, due to [117], first estimates the deadtime via a nonlinear function, then extracts the coefficients Kp, a and b using least-squares.

First we define the matrices

    A = [A1; A2; A3; A4], with rows A1 = [−ω1², 0, 1], A2 = [−ω2², 0, 1], A3 = [0, ω1, 0], A4 = [0, ω2, 0],

    X = [1/Kp, a/Kp, b/Kp]ᵀ,    Y = [sin Lω1, cos Lω1, sin Lω2, cos Lω2]ᵀ    (4.70)

and B = [B1; B2; B3; B4], with rows

    B1 = [  ℑ{G⁻¹(jω1)},   ℜ{G⁻¹(jω1)},   0,             0            ]
    B2 = [  0,             0,             ℑ{G⁻¹(jω2)},   ℜ{G⁻¹(jω2)}  ]
    B3 = [ −ℜ{G⁻¹(jω1)},   ℑ{G⁻¹(jω1)},   0,             0            ]
    B4 = [  0,             0,            −ℜ{G⁻¹(jω2)},   ℑ{G⁻¹(jω2)}  ]    (4.71)

Now provided we know the deadtime L, we can solve for the remaining parameters in X using

    X = [A1; A2; A3]⁻¹ [B1; B2; B3] Y    (4.72)

Figure 4.38: Identification of transfer function models using a two-point relay experiment. (a) The resultant Nyquist plots and (b) the resultant step responses of the true plant G and the estimated first-order (G1est) and second-order (G2est) models.

However, since we do not know L, we must solve the nonlinear relation

    (A4 C − B4) Y = 0    (4.73)

where C = [A1; A2; A3]⁻¹ [B1; B2; B3] is the matrix from Eqn. 4.72, since the remaining row condition, A4 X = B4 Y, must also be satisfied. Once given the model, any standard model-based controller tuning scheme such as internal model control (IMC) can be used to establish appropriate PID tuning constants.

Fig. 4.38 illustrates the procedure for when the true, but unknown, plant is the third-order system

    G = e⁻⁶ˢ / [(9s + 1)(6s + 1)(4s + 1)]

with estimated first and second order models.

Problem 4.3  1. Draw a Nyquist diagram for the transfer function

    Gp(s) = 4/(τs + 1)³

where τ = 2.0. Investigate what happens to the Nyquist diagram when a PI controller is added. How is the curve shifted?

2. Write down pseudo-code for the automatic tuning (one-button tuning) of a PID controller using the ZN tuning criteria. Ensure that your algorithm will work under a wide variety of conditions and will not cause too much instability, if any. Consider safety and robustness aspects for your scheme.

3. Write a .m file to implement a relay feedback tuning scheme using the Åström and Hägglund tuning method. Use the pidsim file as a starting point. Pass as parameters the relay gain d, and the hysteresis delay h. Return the ultimate period and gain, and the controller tuning parameters.

4.8 Drawbacks with PID controllers

The PID controller, despite being a simple, intuitive and flexible controller, does have some drawbacks. Some of these limitations are highlighted when trying to control an inverse response process as demonstrated in §4.8.1, and a simple add-on for deadtime processes is given in §4.9.

4.8.1 Inverse response processes

An inverse response is where the response to a step change first heads in one direction, but then eventually arrives at a steady state in the reverse direction, as shown in Fig. 4.40(b). Naturally this sort of behaviour is very difficult to control, and if one tends to overreact, perhaps because the controller's gain is too high, then excessive oscillation is likely to result. Inverse responses are common in the control of the liquid water level in a steam drum boiler, as described in [20], due to a phenomenon known as the swell-and-shrink effect. For example, the water level first increases when the steam valve is opened because the drum pressure drops, causing a swelling of the steam bubbles below the surface.

Transfer functions that have an odd number of right-hand-plane zeros have inverse responses.
Inverse response processes belong to a wider group of transfer functions called Non-Minimum
Phase (NMP) transfer functions. Other NMP examples are those processes that have dead time
which are investigated later in §4.9. The term “non-minimum phase” derives from the fact that
for these transfer functions, there will exist another transfer function that has the same amplitude
ratio at all frequencies, but a smaller phase lag.

Figure 4.39: The J curve by Ian Bremmer popularised the notion that things like private investments, or even the openness/stability state of nations, often get worse before they get better. J curves are inverse responses.

I like to think of inverse response systems as the sum of two transfer functions: one that goes downwards a short way quickly, and one that goes up further but much more leisurely. The addition of these two gives the inverse response. For example,

    G(s) = K1/(τ1 s + 1) − K2/(τ2 s + 1)    (4.74)
         = [ (K1 τ2 − K2 τ1)s + K1 − K2 ] / [ (τ1 s + 1)(τ2 s + 1) ]    (4.75)

where the first and second terms of Eqn. 4.74 are G1 and G2 respectively.

If K1 > K2 , then for the system to have right hand plane zeros (numerator polynomial equals
zero in Eqn. 4.75), then
K1 τ2 < K2 τ1 (4.76)
If K1 < K2 , then the opposite of Eqn. 4.76 must hold.

A Simulink simulation of a simple inverse response process,

    G(s) = 4/(3s + 1) − 3/(s + 1) = (−5s + 1) / [(3s + 1)(s + 1)]    (4.77)

of the form given in Eqn. 4.74, and clearly showing the right-hand-plane zero at s = 1/5, is shown in Fig. 4.40(a). The simulation results for the two individual transfer functions that combine to give the overall inverse response are given in Fig. 4.40(b).

Figure 4.40: An inverse response process is comprised of two component curves, G1 + G2. (a) A Simulink simulation of the inverse response process described by Eqn. 4.77. (b) The inverse response G made up of G1 + G2. Note that the inverse response first decreases to a minimum y = −1 at around t = 1.5, but then increases to a steady-state value of y∞ = 1.

Controlling an inverse response process

Suppose we wish to control the inverse response of Eqn. 4.77 with a PI controller tuned using the
ZN relations based on the ultimate gain and frequency. The following listing adapts the strategy
described in Listing 4.4 for the PID tuning of an arbitrary transfer function.

G = tf(4,[3 1]) - tf(3,[1 1]); % Inverse response plant, Eqn. 4.77.

[Ku,Pm,Wcg,Wcp] = margin(G); % Establish critical gain, Ku, and frequency, wu
Pu = 2*pi/Wcg; % Critical period, Pu
Kc = 0.45*Ku; taui = Pu/1.2; % PI tuning rules (Ziegler-Nichols)
Gc = tf(Kc*[taui 1],[taui 0]);

Gcl = feedback(G*Gc,1); % Closed loop
step(Gcl); % Controlled response as shown in Fig. 4.41.

The controlled performance in Fig. 4.41 is extremely sluggish taking about 100 seconds to reach
the setpoint while the open loop response as shown in Fig. 4.40(b) takes only about one tenth of
that to reach steady-state. The controlled response also exhibits an inverse response.
Figure 4.41: A NMP plant controlled with a PI controller is very sluggish, but is stable, and does eventually reach the setpoint.

The crucial problem with using a PID controller for an inverse response process is that if the
controller is tuned too tight, in other words the gain is too high and the controller too fast acting,
then the controller will look at the early response of the system, which being an inverse system is
going the wrong way, and then try to correct it, thus making things worse. If the controller were
more relaxed about the correction, then it would wait and see the eventual reversal in gain, and
not give the wrong action.

In reality, when engineers are faced with inverse response systems, the gains are deliberately detuned, giving a sluggish response. While this is the best one can do with a simple PID controller, far better results are possible with a model-based predictive controller such as Dynamic Matrix Control (DMC). In fact, this type of process behaviour is often used as the selling point for these more advanced schemes. Processes G(s) that have RHP zeros are not open-loop unstable (they require RHP poles for that), but the inverse process, G(s)⁻¹, will be unstable. This means that some internal model controllers, which attempt to create a controller that approximates the inverse process, will be unstable, and thus unsuitable in practice.

4.8.2 Approximating inverse-response systems with additional deadtime

We cannot easily remove the inverse response, but we can derive an approximate transfer function that has no right-hand-plane zeros using a Padé approximation. We may wish to do this to avoid the problem of an unstable inverse in a controller design, for example. The procedure is to replace the non-minimum phase term (γs − 1) in the numerator with [(γs − 1)/(γs + 1)](γs + 1), and then approximate the first factor with a deadtime element. In some respects we are replacing one evil (the right-hand-plane zero) with another, an additional deadtime.

For example, the inverse response plant

    G = (2s − 1) e⁻³ˢ / [(3s + 1)(s + 1)]    (4.78)

has a right-hand zero at s = 1/2 which we wish to approximate as an additional deadtime element. Now we can re-write the plant as

    G = −[ (2s + 1) / ((3s + 1)(s + 1)) ] · [ −(2s − 1)/(2s + 1) ] e⁻³ˢ

where the second factor, −(2s − 1)/(2s + 1) = (1 − 2s)/(1 + 2s), has the form of a Padé approximation,

and approximate this second factor with a Padé approximation⁷ in reverse,

    G ≈ −[ (2s + 1) / ((3s + 1)(s + 1)) ] e⁻⁴ˢ e⁻³ˢ
      ≈ −[ (2s + 1) / ((3s + 1)(s + 1)) ] e⁻⁷ˢ    (4.79)

Note that the original system had an unstable zero and 3 units of deadtime, while the approximate system has no unstable zeros, but its deadtime has increased to 7.

We can compare the two step responses with:

>> G = tf([2 -1],conv([3 1],[1 1]),'iodelay',3); % Original system, Eqn. 4.78.
>> G2 = tf(-[2 1],conv([3 1],[1 1]),'iodelay',7); % No inverse response, Eqn. 4.79.
>> step(G,G2); legend('G','G_{pade}') % Refer Fig. 4.42.

in Fig. 4.42. Apart from the fact that the approximate version fails to capture the inverse response, the two responses look reasonably similar.
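The quality of the approximation can also be judged in the frequency domain: a 1/1 Padé term matches the deadtime it replaces to second order in s, so Eqns. 4.78 and 4.79 should agree closely at low frequencies and drift apart as ω increases. A short Python sketch:

```python
import cmath

G_true = lambda s: (2*s - 1) / ((3*s + 1) * (s + 1)) * cmath.exp(-3*s)    # Eqn. 4.78
G_pade = lambda s: -(2*s + 1) / ((3*s + 1) * (s + 1)) * cmath.exp(-7*s)   # Eqn. 4.79

print(abs(G_true(0j) - G_pade(0j)))              # identical steady-state gains of -1
for w in (0.01, 0.05, 0.1, 0.5):
    print(w, abs(G_true(1j * w) - G_pade(1j * w)))   # error grows with frequency
```

This low-frequency agreement is why the approximation is usually adequate for controller design, where the closed-loop bandwidth of such a detuned loop is low anyway.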

Figure 4.42: Approximating inverse-response systems with additional deadtime. The true system G shows the inverse response, while the Padé approximation Gpade does not, but the two step responses are otherwise similar.

Problem 4.4 A model of a flexible beam pinned to a motor shaft with a sensor at the free end, repeated in [52, p12], is approximated by the non-minimum phase model

    Gp(s) = (1/s) · (−6.475s² + 4.0302s + 175.77) / (5s³ + 3.5682s² + 139.5021s + 0.0929)    (4.80)

1. Verify that Gp (s) is non-minimum phase.

2. Plot the root locus diagram of Gp(s) for the cases (a) including the integrator, and (b) excluding the integrator. Based on the root locus, what can you say about the performance of stabilising the system with a proportional controller?

3. Simulate the controlled response of this system with an appropriately tuned PID controller.
⁷ Recall that e⁻ᶿˢ ≈ (1 − (θ/2)s)/(1 + (θ/2)s) using a 1/1 Padé approximation.

4.9 Dead time compensation

Processes that exhibit dead time are difficult to control, and unfortunately all too common in the chemical processing industries. The PID controller, when controlling a process with deadtime, falls prey to the sorts of problems exhibited by the inverse response processes simulated in §4.8.1, and in practice the gain must be detuned so much that the controlled performance suffers. One successful technique for controlling processes with significant dead time, and one that has subsequently spawned many other types of controllers, is the Smith predictor.

To implement a Smith predictor, we must have a model of the process, which we can artificially split into two components: the estimated model without deadtime, Ĝ(s), and the estimated dead time, e⁻ᶿ̂ˢ,

    Gp(s) = G(s) e⁻ᶿˢ ≈ Ĝ(s) e⁻ᶿ̂ˢ    (4.81)

The fundamental idea of the Smith predictor is that we do not control the actual process, Gp(s), but we control a model of the process without the deadtime, Ĝ(s). Naturally, in real life we cannot remove the deadtime easily (otherwise we would, of course), but in a computer we can easily remove the deadtime. Fig. 4.43 shows a block diagram of the deadtime compensator where D(s) is our standard controller, such as say a PID controller. The important feature of the Smith predictor scheme is that the troublesome pure deadtime component is effectively removed from the feedback loop. This makes the controller's job much easier.

[Block diagram: the setpoint r passes through summing junctions to the controller D(s), which drives the plant G(s)e^(−θs) to give y; the dead time compensator feeds back through the model with deadtime, Ĝ(s)e^(−θ̂s), and the model without deadtime, Ĝ(s).]

Figure 4.43: The Smith predictor structure

Using block algebra on the Smith predictor structure given in Fig. 4.43 we can easily show that
the closed loop transfer function is

Y(s)/R(s) = D(s)G(s)e^(−θs) / (1 + D(s)Ĝ(s) − D(s)Ĝ(s)e^(−θ̂s) + D(s)G(s)e^(−θs))        (4.82)

and assuming our model closely approximates the true plant,

Y(s)/R(s) = [D(s)G(s) / (1 + G(s)D(s))] e^(−θs)    if G = Ĝ and θ = θ̂        (4.83)
The nice property of the Smith predictor is now apparent in Fig. 4.44 in that the deadtime has
been cancelled from the denominator of Eqn. 4.83, and now sits outside the feedback loop.
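The cancellation is easy to confirm numerically. The Python sketch below (Python rather than MATLAB, simply so the check is self-contained) evaluates Eqn. 4.82 and Eqn. 4.83 at a few frequencies for the plant of Eqn. 4.84 and one plausible reading of the PID tuning quoted later in this section; the particular controller form and frequencies are illustrative assumptions only:

```python
import numpy as np

theta = 50.0
G = lambda s: 2.0/(100*s**2 + 6.8*s + 1)     # plant of Eqn. 4.84 (tau=10, zeta=0.34)
D = lambda s: 0.35*(1 + 0.008/s + 0.8*s)     # one plausible reading of the PID tuning

for w in [0.005, 0.02, 0.1, 0.5]:
    s = 1j*w
    E = np.exp(-theta*s)                     # the deadtime element exp(-theta*s)
    # Eqn. 4.82 with Ghat = G and thetahat = theta
    full = D(s)*G(s)*E/(1 + D(s)*G(s) - D(s)*G(s)*E + D(s)*G(s)*E)
    # Eqn. 4.83: the deadtime pulled outside the loop
    reduced = D(s)*G(s)/(1 + D(s)*G(s))*E
    assert abs(full - reduced) < 1e-12
```

With a perfect model the two model-delay terms in the denominator cancel exactly, so the agreement is to machine precision at every frequency.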

[Block diagram: r drives D(s) and G(s) inside a simple feedback loop, with the deadtime e^(−θs) now sitting outside the loop before the output y.]

Figure 4.44: The Smith predictor structure from Fig. 4.43 assuming no model/plant mis-match.

For a successful implementation, the following requirements should hold:

1. The plant must be stable in the open loop.


2. There must be little or no model/plant mismatch.
3. The delay or dead time must be ‘realisable’. In a discrete controller this may just be a
shift register, but if the dead time is equivalent to a large number of sample times, or a
non-integral number of sample times, or we plan to use a continuous controller, then the
implementation of the dead time element is complicated.
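For a continuous-time implementation, one common workaround is to replace the deadtime with the 1-1 Padé approximation, e^(−θs) ≈ (1 − θs/2)/(1 + θs/2). The short Python sketch below (an illustrative check, not part of the original text) confirms that the approximation is all-pass and matches the true phase at low frequencies:

```python
import numpy as np

theta = 5.0                      # deadtime [s] (an illustrative value)
w = np.logspace(-3, 1, 200)      # frequency grid [rad/s]
s = 1j*w

pade = (1 - theta*s/2)/(1 + theta*s/2)   # 1-1 Pade approximation of exp(-theta*s)
exact = np.exp(-theta*s)

assert np.allclose(np.abs(pade), 1.0)    # all-pass: unit magnitude everywhere

low = w*theta < 0.1                      # compare phases at low frequency only
assert np.allclose(np.angle(pade[low]), np.angle(exact[low]), atol=2e-4)
```

The phase error grows with ωθ, which is why a rational approximation is only trusted well below the frequency 1/θ.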

Smith predictors in SIMULINK

Suppose we are to control a plant with significant deadtime,


Gp(s) = [2 / (τ^2 s^2 + 2τζs + 1)] e^(−50s)        (4.84)
where τ = 10, ζ = 0.34. A PID controller with tuning constants K = 0.35, I = 0.008, D = 0.8 is
unlikely to manage very well, so we will investigate the improvement using a Smith predictor.

A SIMULINK implementation of a Smith predictor is given in Fig. 4.45 which closely follows the block diagram of Fig. 4.43. To turn off the deadtime compensation, simply double click on the manual switch. This will break the compensation loop leaving just the PID controller.

[SIMULINK diagram: a PID controller drives the process 2/(100s^2 + 6.8s + 1) with a transport delay of 50; an identical model and delay, a summing block and a manual switch form the compensation loop.]

Figure 4.45: A SIMULINK implementation of a Smith predictor. As shown, the deadtime compensator is activated. Refer also to Fig. 4.43.

The configuration given in Fig. 4.45 shows the predictor active, but if we change the constant
block to zero, the switch will toggle removing the compensating Smith predictor, and we are left
with the classical feedback controller. Fig. 4.46 highlights the usefulness of the Smith predictor showing what happens when we turn on the predictor at t = 2100. Suddenly the controller finds it easier to drive the process since it need not worry about the deadtime. Consequently the controlled performance improves. We could further improve the tuning when using the Smith predictor, but then in this case the uncompensated controlled response would be unstable. In

[Plot: output & setpoint (upper) and input u (lower) versus time, 0–4000; both traces settle markedly once the Smith predictor is switched on at t = 2100.]

Figure 4.46: Dead time compensation. The Smith predictor is turned on at t = 2100 and the
controlled performance subsequently rapidly improves.

the simulation given in Fig. 4.46, there is no model/plant mis-match, but it is trivial to re-run
the simulation with a different dead time in the model, and test the robustness capabilities of the
Smith Predictor.
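Such an experiment is also easy to script outside SIMULINK. The following Python sketch simulates a PI-controlled first-order plant with input deadtime and a Smith predictor built from two delay lines; the plant, deadtime and PI tuning values are all hypothetical choices for illustration, not taken from the text:

```python
import numpy as np
from collections import deque

def simulate(theta=50, theta_model=50, use_smith=True, n_steps=3000, dt=1.0):
    K, tau = 2.0, 20.0               # first-order plant: tau dy/dt = K u(t-theta) - y
    Kc, Ti = 0.4, 30.0               # PI tuning (a hypothetical choice)
    y, ym, integ = 0.0, 0.0, 0.0
    ubuf = deque([0.0]*theta)        # delay line on the plant input
    ymbuf = deque([0.0]*theta_model) # delay line on the model output
    for _ in range(n_steps):
        r = 1.0                                # unit setpoint
        ym_delayed = ymbuf.popleft()           # model output theta_model steps ago
        ymbuf.append(ym)
        if use_smith:
            e = r - (y + ym - ym_delayed)      # y plus (undelayed - delayed) model
        else:
            e = r - y                          # classical feedback
        integ += e*dt
        u = Kc*(e + integ/Ti)                  # PI controller
        y += dt*(K*ubuf.popleft() - y)/tau     # plant (Euler step) with input deadtime
        ubuf.append(u)
        ym += dt*(K*u - ym)/tau                # deadtime-free model
    return y

print(simulate())    # with the predictor active the output settles at the setpoint
```

Re-running with theta_model different from theta is the mismatch experiment described above: with a perfect model the deadtime-free loop is recovered and the output settles cleanly at the setpoint.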

Fig. 4.47 extends this application to the control of an actual plant, (but with artificially added
deadtime simply because the plant has little natural deadtime) and a deliberately poor model
(green box in Fig. 4.47(a)). Fig. 4.47(b) demonstrates that turning on the Smith predictor at t = 100
shows a big improvement in the controlled response. However the control input shows a little
too much noise.

4.10 Tuning and sensitivity of control loops

Fig. 4.48 depicts a plant G controlled by a controller C to a reference r in the presence of process
disturbances v and measurement noise w. A good controller must achieve a number of different
aims. It should encourage the plant to follow a given setpoint, it should reject disturbances
introduced by possible upstream process upsets and minimise the effect of measurement noise.
Finally the performance should be robust to reasonable changes in plant dynamics.

The output Y (s) from Fig. 4.48 is dependent on the three inputs, R(s), V (s) and W (s) and using
block diagram simplification, is given by

Y(s) = [CG/(1 + CG)] R(s) + [G/(1 + CG)] V(s) − [CG/(1 + CG)] W(s)        (4.85)

[SIMULINK diagram: a signal generator and PID controller drive the blackbox plant through a transport delay; a first-order model 1/(s + 1) with its own delay and a switch form the compensation loop.]

(a) Deadtime compensation using SIMULINK and the real-time toolbox

[Plot: setpoint & output (upper) and input (lower) versus time, 0–200 s, with the Smith predictor off initially and on for the latter part of the run.]

(b) Controlled results without, and with, deadtime compensation.

Figure 4.47: Deadtime compensation applied to the blackbox. The compensator is activated at
time t = 100 after which the controlled performance improves.

Ideally a successful control loop is one where the error is small, where the error is E = R − Y , or
E(s) = [1/(1 + CG)] R(s) − [G/(1 + CG)] V(s) + [CG/(1 + CG)] W(s)        (4.86)
We can simplify this expression by defining the open loop transfer function, or sometimes re-
ferred to just as the loop transfer function, as
L(s) = C(s)G(s) (4.87)
which then means Eqn. 4.86 is now
E(s) = [1/(1 + L)] R(s) − [G/(1 + L)] V(s) + [L/(1 + L)] W(s)        (4.88)

By rearranging the blocks in Fig. 4.48, we can derive the effect of the output on the error, the input
on the setpoint and so on. These are known as sensitivity functions and they play an important
role in controller design.

[Block diagram: the reference r and the fed-back measurement form the error e, which the controller C(s) turns into the input u; the process disturbance v enters at the plant G(s), and the measurement noise w is added to the controlled output y before feedback.]

Figure 4.48: Closed loop with plant G(s) and controller C(s) subjected to disturbances and measurement noise.

The sensitivity function is defined as the transfer function from setpoint to error
S(s) = 1/(1 + L(s)) = Ger(s)        (4.89)
The complementary sensitivity function is defined as the transfer function from reference to out-
put
T(s) = L(s)/(1 + L(s)) = Gyr(s) = Gyw(s)        (4.90)
The disturbance sensitivity function is

GS(s) = G(s)/(1 + L(s)) = Gyv(s)        (4.91)
and the control sensitivity function is

CS(s) = C(s)/(1 + L(s)) = Gur(s) = Guw(s)        (4.92)

The error from Eqn. 4.88 (which we desire to keep as small as practical), now in terms of the sensitivity and complementary sensitivity transfer functions, is

E(s) = S(s)R(s) − S(s)G(s)V (s) + T (s)W (s) (4.93)

If we are to keep the error small for a given plant G(s), then we need to design a controller C(s)
to keep both S(s) and T (s) small. However there is a problem because a direct consequence of
the definitions in Eqn. 4.89 and Eqn. 4.90 is that

S(s) + T (s) = 1

which means we cannot keep both small simultaneously. However we may be able to make S(s)
small over the frequencies when R(s) is large, and T (s) small when the measurement noise W (s)
dominates. In many practical control loops the servo response R(s) dominates at low frequencies,
and the measurement noise dominates the high frequencies. This is known as loop shaping.
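These trade-offs are easy to verify numerically. The Python sketch below uses a hypothetical PI controller and first-order plant (illustrative values only, not from the text) to confirm that S + T = 1 at every frequency, with S small at low frequency and T small at high frequency:

```python
import numpy as np

# Hypothetical loop: a PI controller on a first-order plant (illustrative values)
C = lambda s: 0.5*(1 + 1/(10*s))     # PI controller, Kc=0.5, Ti=10
G = lambda s: 2.0/(5*s + 1)          # first-order plant, gain 2, tau 5

w = np.logspace(-4, 3, 500)          # frequency grid [rad/s]
L = C(1j*w)*G(1j*w)                  # loop transfer function L(jw)

S = 1/(1 + L)                        # sensitivity, Eqn. 4.89
T = L/(1 + L)                        # complementary sensitivity, Eqn. 4.90

assert np.allclose(S + T, 1.0)       # S + T = 1 at every frequency
assert abs(S[0]) < 1e-2              # S small at low frequency: good tracking
assert abs(T[-1]) < 1e-2             # T small at high frequency: noise rejected
```

The integrator in the controller is what forces S towards zero at low frequency, while the plant roll-off forces T towards zero at high frequency.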

We are primarily interested in the transfer function T (s), since it describes how the output changes
given changes in reference command. At low frequencies we demand that T (s) = 1 which means
that we have no offset at steady-state and that we have an integrator somewhere in the process.
At high frequencies we demand that T (s) → 0 which means that fast changes in command signal
are ignored.

Correspondingly, since S(s) + T(s) = 1, we demand that S(s) be small at low frequencies, and approach 1 at high frequencies, as shown in Fig. 4.49.

[Bode magnitude plot of S and T versus ω [rad/s]: the sensitivity S(s) is small at low frequency and tends to 1 at high frequency, while the complementary sensitivity T(s) is the mirror image.]

Figure 4.49: Typical sensitivity transfer functions S(s) and T(s). The maximum of |S| is marked ∗.

The maximum values of |S| and |T| are also useful robustness measures. The maximum sensitivity function

‖S‖∞ = max_ω |S(jω)|        (4.94)

is shown as a ∗ in Fig. 4.49 and is inversely proportional to the minimal distance from the loop transfer function to the critical (−1, 0i) point. Note how the circle of radius 1/‖S‖∞ centered at (−1, 0i) just touches the Nyquist curve shown in Fig. 4.50. A large peak in the sensitivity plot in Fig. 4.49 corresponds to a small distance between the critical point and the Nyquist curve, which means that the closed loop is sensitive to modelling errors and hence not very robust. Values of ‖S‖∞ = 1.7 are considered reasonable, [108].
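The relationship between the sensitivity peak and the Nyquist curve can be confirmed numerically. In the Python sketch below (using an illustrative PI/first-order loop, not one from the text), the peak of |S(jω)| over a frequency grid is exactly the reciprocal of the closest approach of L(jω) to the −1 point:

```python
import numpy as np

# Hypothetical loop transfer function (illustrative values, not from the text)
C = lambda s: 0.5*(1 + 1/(10*s))     # PI controller
G = lambda s: 2.0/(5*s + 1)          # first-order plant

w = np.logspace(-4, 3, 20000)
L = C(1j*w)*G(1j*w)

Smax = np.max(np.abs(1/(1 + L)))     # peak sensitivity ||S||inf on this grid
dmin = np.min(np.abs(L + 1))         # closest approach of L(jw) to the -1 point

assert abs(Smax - 1/dmin) < 1e-9     # ||S||inf = 1/(minimum distance to -1)
```

Since |S(jω)| = 1/|1 + L(jω)| identically, the maximum of one is attained where the other is at its minimum.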

[Nyquist plot of L(s): ℑ{L(iω)} versus ℜ{L(iω)}, with a circle of radius 1/‖S‖∞ centered at (−1, 0i) just touching the curve.]

Figure 4.50: A circle of radius 1/‖S‖∞ centered at (−1, 0i) just touches the Nyquist curve. See also Fig. 4.49.

4.11 Summary

PID controllers are simple and common in process control applications. The three terms of a PID
controller relate to the magnitude of the current error, the history of the error (integral), and the
current direction of the error (derivative). The integral component is necessary to remove offset,
but can destabilise the closed loop. While this may be countered by adding derivative action,
noise and abrupt setpoint changes may cause problems. Commercial PID controllers usually add
a small time constant to the derivative component to make it physically realisable, and often only
differentiate the process output, rather than the error.

Many industrial PID controllers are poorly tuned, which is clearly uneconomic. Tuning controllers can be difficult and time consuming, and there is no correct, fail-safe, idiot-proof procedure. The single closed-loop experiment (Yuwana-Seborg) is the safest way to tune a critical loop, although the calculations required are slightly more complicated than those of the traditional methods, and it is designed for processes that approximate a first-order system with time delay. The tuning parameters obtained by all these methods should only be used as starting estimates. The final adjustment of the tuning parameters is best done by an enlightened trial and error procedure on the actual equipment.
Chapter 5

Digital filtering and smoothing

Never let idle hands get in the way of the devil’s work.
Basil Fawlty (circ. ’75)

5.1 Introduction

This chapter is concerned with the design and use of filters. Historically, analogue filters were the sort of filters that electrical engineers have been building into radios for many years, fabricated out of passive components such as resistors, capacitors and inductors or, more recently and reliably, out of operational amplifiers. When we filter something, be it home made wine, sump oil, or undesirable material on the Internet, we are really interested in purifying, or separating the ‘noise’, or unwanted, from the desired.

The classical approach to filtering assumes that the useful signals lie in one frequency band, and
the unwanted noise in another. Then simply by constructing a transfer function as our filter,
we can pass the wanted band (signal), while rejecting the unwanted band (noise) as depicted in
Fig. 5.1.

Unfortunately even if the frequency bands of the signal and noise are totally distinct, we still

[Diagram: a noisy input u(t) passes through the low-pass filter transfer function H(s) to give a smooth output y(t).]

Figure 5.1: A filter as a transfer function


cannot effect complete separation in an online filtering operation. In practice however, we can
design the filters with such sharp cut-offs that a practical separation is achieved. More commonly
the signal is not distinct from the noise (since the noise tends to inhabit all frequency bands), so
now the filter designer must start to make some trade-offs.

The more sophisticated model-based filters that require computer implementation are described in chapter 9. While the classical filters can be realised (manufactured) with passive analogue components such as resistors, capacitors, and more rarely inductors, today they are almost always implemented digitally. This chapter describes these methods. The SIGNAL PROCESSING toolbox contains many useful filter design tools which we will use in this chapter, although the workhorse function for this chapter, filter, is part of MATLAB's kernel.

5.1.1 The nature of industrial noise

Real industrial measurements are always corrupted with noise. This noise (or error) can be at-
tributed to many causes, but even if these causes are known, the noise is usually still unpre-
dictable. Causes for this noise can include mechanical vibrations, poor electrical connections
between instrument and transducer sensor, electrical interference from other equipment, or a combination of the above. Whatever the cause, these errors are usually undesirable. Remember, if it
is not unpredictable, then it is not noise.

An interesting example of the debilitating effect of noise is shown in the controlled response
in Fig. 5.2. In this real control loop, the thermocouple measuring the temperature in a sterilising
furnace was disturbed by a mobile phone a couple of meters away receiving a call. Under normal
conditions the standard deviation of the measurement is around 0.1 degrees which is acceptable,
but with the phone operating, this increases to 2 or 3 degrees.

[Plot: furnace temperature in °C (upper, roughly 110–118) and heater power in % (lower) versus time over 25 minutes; the temperature trace is visibly noisier while the mobile phone is in range.]

Figure 5.2: A noisy temperature measurement due to a nearby mobile phone causes problems in the feedback control loop.

We can mathematically describe the real measurement process as

y = y⋆ + v

where y ⋆ is the true value which is corrupted with some other factor which we call v to give the
“as received” measured value y.

[Diagram: the measurement noise v is added to the true signal y⋆ to give the corrupted measured signal y.]

Figure 5.3: Noise added to a true, but unknown, signal

Naturally we never know precisely either v or y ⋆ . The whole point about filtering or signal
processing is to try and distinguish the “signal”, y ⋆ from the noise, v. There is another reason why
one should study filtering, and that is to disguise horrible experimental data. As one operator
said, “I control the plant by looking at trends of the unfiltered data, but I only give filtered data
to management”.

While the noise itself is unpredictable (non deterministic) it often has constant, known or pre-
dictable statistical properties. The noise is often described by a Gaussian distribution, with
known and stable statistical properties such as mean and variance.

Noise is generally considered a nuisance because

1. It is not a true reflection of the actual process variables, since the transducer, or other exter-
nal factors have introduced unwanted information.

2. It makes it difficult to differentiate the data accurately such as say for the ‘D’ part of a PID
controller, or extracting the flowrate from the rate of change of a level signal.

3. Often for process management applications, only an average value or smoothed character-
istic value is required.

4. Sometimes noise is not undesirable! In some specialised circumstances noise can actually
counter-intuitively improve the strength of a signal. See the article entitled The Benefits of
Background Noise [142] for more details of this paradox.

Some care should be taken with point 1. No one really knows what the true value of the process
is (except possibly Mother Nature), so the assertion that a particular time series is not true is
difficult to rigorously justify. However for most applications, it is a reasonable and defendable
assumption. When filtering the data for process control applications, it is often assumed that the
true value is dominated by low frequency dynamics (large time constants etc), and that anything
at a high frequency is unwanted noise. This is a fair assumption for measurements taken on most
equipment in the processing industries. Large industrial sized vessels, reactors, columns etc
often have large holding volumes, large capacities, and hence slow dynamics. This tendency to
favour the low frequencies by low-pass filtering everything, while common for chemical process
engineers, is certainly not the norm in say telecommunications.

Differentiating (i.e. estimating the slope of) noisy data is difficult. Conversely, integrating noisy
data is easy. Differentiating experimental data is required for the implementation of the ‘D’ part
of PID controllers, and in the analysis of rate data in chemical reaction control amongst other
things. Differentiation of actual industrial data from a batch reaction is demonstrated in §5.5, and
as a motivating example, differentiating actual data without smoothing first is given in §5.1.2.

5.1.2 Differentiating without smoothing

As seen in chapter 4, the Proportional-Integral-Differential controller uses the derivative component, or D part, to stabilise the control scheme and offers improved control. However in practice,
the D part is rarely used in industrial applications, because of the difficulty in differentiating the
raw measured signal in real-time.

The electromagnetic balance arm shown in Fig. 1.5 is a very oscillatory device, that if left un-
compensated would wobble for a considerable time. If we were to compensate it with a PID
controller such that the balance arm closely follows the desired setpoint, we find that we would
need substantial derivative action to counter the oscillatory poles of the plant. The problem is
how we can reliably generate a derivative of a noisy signal for our PID controller.

Fig. 5.4 demonstrates experimentally what happens when we attempt to differentiate a raw and
noisy measurement, using a crude backward difference with a sample time T

dy/dt ≈ (yt − yt−1)/T
The upper plot shows the actual position of the balance arm under real-time closed-loop PID con-
trol using a PC with a 12 bit (4096 discrete levels) analogue input/output card. In this instance,
I tuned the PID controller with a large amount of derivative action, and it is this that dominates
the controller output (lower plot). While the controlled response is reasonable (the arm position follows the desired setpoint), the input is too noisy, and will eventually wear the actuators if used for any length of time. Note that this noise is a substantial fraction of the full scale input range.
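The noise amplification of the backward difference is easy to reproduce. In the Python sketch below (illustrative numbers only), a 1% measurement noise on a smooth signal becomes noise of roughly √2 σ/T on the estimated derivative:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 0.05                                           # sample time [s]
t = np.arange(0, 10, T)
y = np.sin(t) + 0.01*rng.standard_normal(t.size)   # smooth signal + 1% noise

dy = np.diff(y)/T                                  # backward difference (yt - yt-1)/T
err = dy - np.cos(t[1:])                           # error against the true derivative

# the 0.01-sigma noise on y appears as roughly sqrt(2)*0.01/T = 0.28 on dy/dt
print("std of derivative error:", err.std())
assert err.std() > 0.1                             # far larger than the 0.01 input noise
```

Halving the sample time T doubles the noise on the derivative, which is exactly the dilemma faced by the D part of a fast-sampled PID controller.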

[Plot: setpoint & output (upper, roughly 1000–2500 counts) and controller input (lower, 0–5000 counts) versus time over 30 s; the input trace is dominated by high-frequency noise.]

Figure 5.4: The significant derivative action of a PID controller given the slightly noisy data from an electromagnetic balance arm (solid line in upper plot) causes the controller input signal (lower plot) to exhibit very high frequencies.

One solution to minimise the excitedness of the noisy input signal whilst still retaining the dif-
ferentiating characteristic of the controller is to low-pass filter (smooth) the measured variable
before we differentiate it. This is accomplished with a filter, and the design of these filters is what
this chapter is all about. Alternatively the controller could be completely redesigned to avoid the
derivative step.

5.2 Smoothing measured data using analogue filters

If we have all the data at our disposal, we can draw a smooth curve through the data. In the past,
draughtsmen may have used a flexible ruler, or spline, to construct a smoothing curve “by eye”.
Techniques such as least squares splines or other classes of piecewise low-order polynomials
could be used if a more mathematically rigorous fitting is required. Techniques such as these fall
into the realm of regression, and examples using MATLAB functions such as least-squares fitting,
splines, and smoothing splines are covered more in [201]. If the noise is to be discarded and
information about the signal in the future is used, this technique is called smoothing. Naturally
this technique can only be applied offline after all the data is collected since the “future” data
must be known. A smoothing method using the Fourier transform is discussed in §5.4.5.

Unfortunately for real time applications such as the PID controller application in §5.1.2, we can-
not look into the future and establish what the signal will be for certain. In this case we are
restricted to smoothing the data using only output historical values. This type of noise rejection
is called real-time filtering and it is this aspect of filtering that is of most importance to control
engineers. Common analogue filters are discussed in §5.2.3, but to implement using digital hard-
ware we would rather use an equivalent digital description. The conversion from the classical
analogue filter to the equivalent digital filter using the bilinear transform is described in §5.3.

Finally we may wish to predict data in the future. This is called prediction.

5.2.1 A smoothing application to find the peaks and troughs

Section 4.6.3 described one way to tune a PID controller where we first subject the plant to a
closed loop test with a trial gain, and then record the peaks and troughs of the resultant curve.
However as shown in the actual industrial example in Fig. 5.5(a), due to excessive noise on the
thermocouple (introduced by mobile phones in the near vicinity!), it is difficult to extract the
peaks and troughs from the measured data. It is especially difficult to write a robust algorithm
that will do this automatically with such noise.

The problem is that to achieve a reasonable smoothing we also introduce a large lag. This in turn
probably does not affect the magnitude of the peaks and troughs, but it does affect the timing, so
our estimates of the model parameters will be wrong.

One solution is to use an acausal filter which does not exhibit any phase lag. MATLAB provides a double-pass filter called filtfilt which is demonstrated in Fig. 5.5(b). Using the smooth
data from this acausal filter generates reasonable estimates for both the magnitude and timing of
the characteristic points needed for the tuning. The drawback is that the filtering must be done
off-line.
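For readers working in Python, scipy.signal provides equivalents of both approaches: lfilter for the ordinary causal filter and filtfilt for the double-pass acausal version. The sketch below (illustrative cutoff and noise levels, not the furnace data) shows the causal filter lagging the underlying signal by tens of samples while filtfilt introduces essentially no shift:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
t = np.arange(0, 20, 0.01)
y = np.sin(t) + 0.2*rng.standard_normal(t.size)   # noisy measurement

b, a = signal.butter(5, 0.03)         # 5th-order low-pass, normalised cutoff 0.03
causal = signal.lfilter(b, a, y)      # ordinary causal filter: smooth but lagged
acausal = signal.filtfilt(b, a, y)    # forward-backward filter: zero phase lag

def lag(x, ref):
    """Lag (in samples) of x relative to ref, from the cross-correlation peak."""
    c = np.correlate(x - x.mean(), ref - ref.mean(), 'full')
    return np.argmax(c) - (len(ref) - 1)

assert abs(lag(acausal, np.sin(t))) < 10   # essentially no shift
assert lag(causal, np.sin(t)) > 15         # a noticeable delay of tens of samples
```

The zero-phase property is precisely why the acausal filter recovers the timing of the peaks and troughs; the price, as noted above, is that the whole record must be available first.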

5.2.2 Filter types

There are four common filter types: the low-pass, high-pass, band-pass and notch-pass. Each of
these implementations can be derived from the low-pass filter, meaning that it is only necessary

[Plot: output & setpoint versus time, 0–800 s, showing the raw value, the Butterworth-filter smoothed trace (n = 5, ωc = 0.03), and the extracted key points.]

(a) To extract the key points needed for PID tuning, we must first
smooth the noisy data. Unfortunately, to achieve good smoothing
using a causal filter such as the 5th order Butterworth used here,
we introduce a large lag.

[Plot: the same data smoothed with an acausal filter; the extracted key points now line up with the raw response.]

(b) Using an acausal filter removes the phase lag problem

Figure 5.5: In this industrial temperature control of a furnace, we need to smooth the raw mea-
surement data in order to extract characteristic points required for PID tuning.

to study in detail the design of the first. A comparison of the magnitude response of these filters
is given in Fig. 5.8.

Low-pass filters

A linear filter is mathematically equivalent to a transfer function, just like a piece of processing
equipment such as a series of mixing tanks, or distillations columns. Most of the filter applica-
tions used in process industry are low-pass in nature. That is, ideally the filter passes all the low
frequencies, and above some cut off frequency ωc , they attenuate the signal completely. These
types of filters are used to smooth out noisy data, or to retain the long term trends, rather than
the short term noisy contaminated transients.

The ideal low pass filter is a transfer function that has zero phase lag at any input frequency, and
an infinitely sharp amplitude cut off above a specified frequency ωc . The Bode diagram of such
a filter would have a step function down to zero on the amplitude ratio plot and a horizontal
line on the phase angle plot as shown in Fig. 5.8. Clearly it is physically impossible to build
such a filter. However, there are many alternatives for realisable filters that approximate this
behaviour. Fig. 5.6 shows a reasonable attempt to approximate the ideal low-pass filter. Typically
the customer will specify a cut-off frequency, and it is the task of the filter designer to select
sensible values for the pass and stop band oscillations, or ripple, and the pass and stop band frequencies, balancing filter performance against hardware cost.

[Diagram: magnitude |H(iω)| versus ω, showing the pass band (ripple between 1 and 1/(1 + ε^2)) up to ωp, the corner-frequency transition, and the stop band (attenuation below 1/A^2) beyond ωs.]

Figure 5.6: Low-pass filter specification

The most trivial low-pass filter approximation to the ideal filter is a single lag with a cut off frequency (ωc) of 1/τ. If you wanted a higher order filter, say an nth-order filter, you could simply cascade n of these filters together as shown in Fig. 5.7.

Gn(s) = 1/(s/ωc + 1)^n        (5.1)

[Diagram: the input u passes through three identical 1/(s/ωc + 1) blocks in series, each a single low-pass filter, forming a cascaded 3rd-order low-pass filter.]

Figure 5.7: Three single low-pass filters cascaded together to make a third-order filter.
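The droop of the cascade in Eqn. 5.1 is easy to quantify: each stage contributes another factor of 1/√2 at the corner frequency. A short Python check (illustrative only, in place of MATLAB):

```python
import numpy as np

wc = 1.0                              # corner frequency [rad/s]
w = np.array([0.1, 1.0, 10.0])*wc     # below, at, and above the corner
for n in (1, 2, 3):
    Gn = 1/(1j*w/wc + 1)**n           # Eqn. 5.1: n cascaded first-order lags
    print(n, np.abs(Gn))

# each added stage loses another factor of 1/sqrt(2) at the corner frequency
assert np.isclose(abs(1/(1j + 1)**3), (1/np.sqrt(2))**3)
```

So a three-stage cascade is already about 9 dB down at ωc, well inside what should be the pass band.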

Actually Eqn 5.1 is a rather poor approximation to the ideal filter and better approximations exist
for the same complexity or filter order, although they are slightly more difficult to design and
implement in hardware. Two classic analogue filter designs are described in §5.2.3 following.

High-pass filters

There are however applications where high pass filters are desired, although these are far more
common in the signal processing and electronic engineering fields. These filters do the opposite
of the low pass filter; namely they attenuate the low frequencies and pass the high frequencies
unchanged. This can be likened to the old saying “If I’ve already seen it, I’m not interested!”
These remove the DC component of the signal and tend to produce the derivative of the data.
You may use a high pass filter if you wanted to remove long term trends from your data, such as
seasonal effects for example.

Band and notch-pass filters

The third type of filter is a combination of the high and low pass filters. If these two are combined,
they can form a band pass or alternatively a notch filter. Notch filters are used to remove a par-
ticular frequency such as the common 50Hz hum caused by the mains power supply. Band pass
filters are used in the opposite manner, and are used in transistor radio sets so that the listener can
tune into one particular station without hearing all the other stations. The difference between the
expensive radios and the cheap ones is that the expensive radios have a sharp narrow pass band so as to reject more of the neighbouring frequencies. Oddly enough, for some tuning situations a notch
filter is also used to remove neighbouring disturbing signals. Some satellite TV stations intro-
duce into the TV signal a disturbing component to prevent non-subscribers from watching their
transmission without paying for the “secret decoder ring”. This disturbance, which essentially
makes the TV unwatchable, must of course be within the TV station’s allowed frequency
range. The outer band pass filter will tune the TV to the station’s signal, and a subsequent notch
filter will remove the disturbing signal. Fig. 5.8 contrasts the amplitude response of these filters.

[Diagram: four amplitude-response sketches side by side — the ideal brick-wall filter, a low-pass roll-off above ωc, a high-pass rise above ωc, and a band-pass between ωl and ωh.]

Figure 5.8: Amplitude response for ideal, low-pass, high pass and band-pass filters.

Filter transformation

Whatever the type of filter we desire (high-pass, band-pass, notch, etc.), they can all be obtained
by first designing a normalised low-pass filter prototype as described by [159, p415]. This filter
is termed normalised because it has a cutoff frequency of 1 rad/s. Then the desired filter, high-
pass, band-pass or whatever, is obtained by transforming the normalised low-pass filter using the
relations given in Table 5.1 where ωu is the upper cutoff and ωl is the lower cutoff frequencies.
This is very convenient, since we need only to be able to design a low-pass filter from which all

Table 5.1: The filter transformations needed to convert from the prototype filter to other general
filters. To design a general filter, replace the s in the prototype filter with one of the following
transformed expressions.

Desired filter      Transformation
low-pass            s → s/ωu
band-pass           s → (s^2 + ωuωl) / (s(ωu − ωl))
band-stop           s → s(ωu − ωl) / (s^2 + ωuωl)
high-pass           s → ωu/s

the others can be obtained using a variable transformation. Low-pass filter design is described
for the common classical analogue filter families in §5.2.3. MATLAB also uses this approach in the SIGNAL PROCESSING toolbox with what it terms analogue low-pass prototype filters.
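The transformations in Table 5.1 are mechanical to apply. The Python sketch below (an illustrative check, not from the text) converts a normalised first-order low-pass prototype to a high-pass filter with cutoff ωu by substituting s → ωu/s:

```python
import numpy as np

wu = 2.0                        # desired cutoff [rad/s] (illustrative)
H_lp = lambda s: 1/(s + 1)      # normalised low-pass prototype, cutoff 1 rad/s

H_hp = lambda s: H_lp(wu/s)     # low-pass -> high-pass: s replaced by wu/s

assert abs(H_hp(1j*1e6)) > 0.999                     # high frequencies pass
assert abs(H_hp(1j*1e-6)) < 1e-3                     # low frequencies are blocked
assert np.isclose(abs(H_hp(1j*wu)), 1/np.sqrt(2))    # -3 dB at the new cutoff wu
```

The same substitution trick works for the band-pass and band-stop rows of the table, which is why only the low-pass prototype ever needs to be designed from scratch.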

5.2.3 Classical analogue filter families

Two common families of analogue filters superior to the simple cascaded first-order filter net-
work are the Butterworth and the Chebyshev filters. Both these filters require slightly more com-
putation to design, but once implemented, require no more computation time or hardware com-
ponents than any other filter of the same order. Historically these were important continuous-
time filters, and to use discrete filter designs we must convert the description into the z plane by
using the bilinear transform.

In addition to the Butterworth and Chebyshev filters, there are two slightly less common classic
analogue filters; the Chebyshev type II (sometimes called the inverse Chebyshev filter), and the
elliptic or Cauer filter. All four filters are derived by approximating the sharp ideal filter in
different ways. For the Butterworth, the filter is constructed in such a way as to maximise the
number of derivatives to zero at ω = 0 and ω = ∞. The Chebyshev filter exhibits minimum error
over the passband, while the type II version minimises the error over the stop band. Finally the
elliptic filter uses a Chebyshev mini-max approximation in both pass-band and stop-band. The
calculation of the elliptic filter requires substantially more computation than the other three. The
SIGNAL PROCESSING toolbox contains both discrete and continuous design functions for all four analogue filter families.

Butterworth filters

The Butterworth filter, HB , is characterised by the following magnitude relation

|HB(ω)|^2 = 1 / (1 + (ω/ωc)^(2n))        (5.2)

where n is the filter order and ωc is the desired cut off frequency. If we substitute s = iω, we find
that the 2n poles of the squared magnitude function, Eqn. 5.2, are
sp = (−1)^(1/(2n)) (iωc)

That is, they are spaced equally at π/n radians apart around a circle of radius ωc . Given that
we desire a stable filter, we choose just the n stable poles of the squared magnitude function for
HB (s). The transfer function form of the general Butterworth filter in factored form is
    HB(s) = ∏_{k=1}^{n} 1/(s − pk)          (5.3)

where
    pk = ωc exp(−πi (1/2 + (2k−1)/(2n)))          (5.4)
Thus all the poles are equally spaced on the stable half-circle of radius ωc in the s plane.
The MATLAB function buttap, (Butterworth analogue low pass filter prototype), uses Eqn. 5.4
to generate the poles.

We can plot the poles of a 5th order Butterworth filter after converting the zero-pole-gain form
to a transfer function form, using the pole-zero map function, pzmap, or of course just plot the
poles directly.

[z,p,k] = buttap(5)   % Design a 5th order Butterworth proto-type filter
pzmap(zpk(z,p,k))     % Look at the poles & zeros

Note how the poles of the Butterworth filter are equally spread around the stable half circle in
Fig. 5.9. Compare this circle with the ellipse pole-zero map for the Chebyshev filter shown in
Fig. 5.10.

Figure 5.9: Note how the poles of the Butterworth filter, ×, are equally spaced on the stable half circle. See also Fig. 5.10.

Alternatively using only real arithmetic, the n poles of the Butterworth filter are pk = ak ± bk i
where
ak = −ωc sin θ, bk = −ωc cos θ (5.5)
where
    θ = (2k − 1) π/(2n)          (5.6)
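As a quick numerical cross-check, the real-arithmetic form of Eqns. 5.5–5.6 should reproduce the complex-exponential poles of Eqn. 5.4 exactly. A short sketch of that check, written here in Python rather than MATLAB:

```python
import cmath, math

def butter_poles_complex(n, wc):
    """Butterworth poles from Eqn. 5.4: p_k = wc*exp(-i*pi*(1/2 + (2k-1)/(2n)))."""
    return [wc * cmath.exp(-1j * math.pi * (0.5 + (2*k - 1) / (2*n)))
            for k in range(1, n + 1)]

def butter_poles_real(n, wc):
    """The same poles via Eqns. 5.5-5.6: a_k = -wc*sin(theta), b_k = -wc*cos(theta)."""
    poles = []
    for k in range(1, n + 1):
        theta = (2*k - 1) * math.pi / (2*n)          # Eqn. 5.6
        poles.append(complex(-wc*math.sin(theta), -wc*math.cos(theta)))
    return poles

n, wc = 5, 2.0
pc = butter_poles_complex(n, wc)
pr = butter_poles_real(n, wc)
assert all(abs(a - b) < 1e-12 for a, b in zip(pc, pr))  # the two forms agree
assert all(abs(abs(p) - wc) < 1e-12 for p in pc)        # on the circle of radius wc
assert all(p.real < 0 for p in pc)                      # and all stable
```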
Expanding the polynomial given in Eqn. 5.3 gives the general Butterworth filter template in the
continuous time domain,
    HB(s) = 1 / (cn (s/ωc)^n + cn−1 (s/ωc)^(n−1) + · · · + c1 (s/ωc) + 1)          (5.7)

where ci are the coefficients of the filter polynomial. Using Eqn 5.4, the coefficients for the general
nth order Butterworth filter can be evaluated.
5.2. SMOOTHING MEASURED DATA USING ANALOGUE FILTERS 203

Taking advantage of the complex arithmetic abilities of MATLAB, we can generate the poles by
typing Eqn. 5.4 almost verbatim. Note how I use 1i, (that is, I type the numeral ‘1’ followed by an
‘i’ with no intervening space), to ensure I get a complex number.

Listing 5.1: Designing Butterworth Filters using Eqn. 5.4.


>>n = 4;       % Filter order
>>wc = 1.0;    % Cut-off frequency, ωc
>>k = [1:n]';
>>p = wc*exp(-pi*1i*(0.5 + (2*k-1)/2/n)); % Poles, pk = ωc exp(−πi(1/2 + (2k−1)/(2n)))
>>c = poly(p); % should force real
>>c = real(c)  % look at the coefficients
c =
    1.0000    2.6131    3.4142    2.6131    1.0000

>>[B,A] = butter(n,wc,'s'); % compare with toolbox direct method

Once we have the poles, it is a simple matter to expand the polynomial out and find the coeffi-
cients. In this instance, owing to numerical roundoff, we are left with a small complex residual
which we can safely delete. The 4th order Butterworth filter with a normalised cut-off frequency
is
    HB(s) = 1 / (s⁴ + 2.6131s³ + 3.4142s² + 2.6131s + 1)
Of course using the S IGNAL P ROCESSING toolbox, we could duplicate the above with a single
call to butter given as the last line in the above example.

We can design a low-pass filter at an arbitrary cut-off frequency by scaling the normalised analogue
prototype filter using Table 5.1. Listing 5.2 gives the example of the design of a low-pass
filter with a cut-off of fc = 800 Hz.

Listing 5.2: Designing a low-pass Butterworth filter with a cut-off frequency of fc = 800 Hz.
ord = 2;                                     % Desired filter order
[Z,P,K] = buttap(ord); Gc = tf(zpk(Z,P,K));  % 2nd-order Butterworth filter prototype

Fc = 800;       % Desired cut-off frequency in [Hz]
wc = Fc*pi*2;   % [rad/s]

n = [ord:-1:0]; % Scale filter by substituting s/ωc for s
d = wc.^n;
Gc.den{:} = Gc.den{:}./d

[B,A] = butter(2,wc,'s'); Gc2 = tf(B,A); % Alternative continuous design (in Matlab)

p = bodeoptions; p.FreqUnits = 'Hz'; % Set Bode frequency axis units as Hz (not rad/s)
bode(Gc,Gc2,p)
hline(-90); vline(Fc)

A high pass filter can be designed in much the same way, but this time substituting ωc /s for s.

Listing 5.3: Designing a high-pass Butterworth filter with a cut-off frequency of fc = 800 Hz.
ord = 2; [Z,P,K] = buttap(ord); Gc = tf(zpk(Z,P,K)); % Low-pass Butterworth prototype

Fc = 800; wc = Fc*pi*2;   % Frequency specifications

n = [0:ord]; d = wc.^n;   % Scale filter by converting s to ωc/s
Gc.den{:} = Gc.den{:}.*d;
Gc.num{:} = [1,zeros(1,ord)];

bode(Gc,p); title('High pass')

The frequency response of the Butterworth filter has a reasonably flat pass-band, and then falls
away monotonically in the stop band. The high frequency asymptote has a slope of −n on a
log-log scale. Other filter types such as the Chebyshev (see the following section) or elliptic filters
are contained in the SIGNAL PROCESSING toolbox, and [42] details their use.

Algorithm 5.1 Butterworth filter design


Given a cut-off frequency, ωc , (in rad/s), and desired filter order, n,

1. Compute the n poles of the filter, HB(s), using

       pk = ωc exp(−πi (1/2 + (2k−1)/(2n))),    k = 1, 2, . . . , n

2. Construct the filter HB(s) either using the poles directly in factored form, or expanding the
polynomial

       HB(s) = 1/poly(pk)

3. Convert low pass filter to high-pass, band pass etc by substituting for s given in Table 5.1.
4. Convert to a discrete filter using c2dm if desired ensuring that the cut-off frequency is much
less than the Nyquist frequency; ωc << ωN = π/T .

Algorithm 5.1 suffers from the flaw that it assumes the filter designer has already selected a
cut-off frequency, ωc, and filter order, n. However in practical cases it is characteristics such
as the pass-band ripple and stop-band attenuation, such as given in Fig. 5.6, that are specified by
the customer, and not the filter order. We need some way to calculate the minimum filter order (to
construct the cheapest filter) from the specified magnitude response characteristics. Once that is
established, we can proceed with the filter design calculations using the above algorithm. Further
descriptions of the approximate order are given in [92, pp. 305–327] and [194]. The functionally
equivalent routines, buttord and cheb1ord, in the SIGNAL PROCESSING toolbox attempt to
find a suitable order for a set of specified design constraints.

In the case of Butterworth filters, the filter order is given by

    n = ⌈ log10[(10^(Rp/10) − 1) / (10^(As/10) − 1)] / (2 log10(ωp/ωs)) ⌉          (5.8)

where Rp is the pass-band ripple, As is the stop-band attenuation, and ωp, ωs are the pass and stop
band frequencies respectively. In general, the order n given by the value inside the ⌈·⌉ brackets
(without feet) in Eqn. 5.8 will not be an exact integer, so we should select the next largest integer
to ensure that we meet or exceed the customer’s specifications. MATLAB’s ceiling function ceil
is useful here.
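Eqn. 5.8 is straightforward to evaluate directly. The sketch below, in Python with buttord being the toolbox equivalent, computes the minimum order for a hypothetical specification of 3 dB pass-band ripple at 1 kHz and 40 dB stop-band attenuation at 2 kHz; the numbers are illustrative only:

```python
import math

def butterworth_order(Rp, As, wp, ws):
    """Minimum Butterworth filter order from Eqn. 5.8.
    Rp: pass-band ripple [dB], As: stop-band attenuation [dB],
    wp, ws: pass- and stop-band edge frequencies (only their ratio matters)."""
    n_exact = (math.log10((10**(Rp/10) - 1) / (10**(As/10) - 1))
               / (2 * math.log10(wp / ws)))
    return math.ceil(n_exact)       # round up to meet or exceed the spec

# hypothetical specification: 3 dB ripple at 1 kHz, 40 dB attenuation at 2 kHz
print(butterworth_order(Rp=3, As=40, wp=1000, ws=2000))   # -> 7
```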

Problem 5.1

1. Derive the coefficients for a sixth order Butterworth filter.

2. Draw a Bode diagram for a fourth order Butterworth filter with wc = 1 rad/s. On the same
plot, draw the Bode diagram for the filter

    H(s) = 1 / (s/ωc + 1)⁴

with the same wc . Which filter would you choose for a low pass filtering application?
Hints: Look at problem A-4-5 (DCS) and use the M ATLAB function bode. Plot the Bode
diagrams especially carefully around the corner frequency. You may wish to use the Matlab
commands freqs or freqz.

3. Find and plot the poles of a continuous time 5th order Butterworth filter. Also plot a circle
centered on the origin with a radius wc .

4. Plot the frequency response (Bode diagram) of both a high pass and low pass 4th order
Butterworth filter with a cutoff frequency of 2 rad/s.

Chebyshev filters

Related to the Butterworth filter family is the Chebyshev family. The squared magnitude function
of the Chebyshev filter is
    |HC(ω)|² = 1 / (1 + ǫ² Cn²(ω/ωc))          (5.9)

where again n is the filter order, ωc is the nominal cut-off frequency, and Cn is the nth order
Chebyshev polynomial. (For a definition of Chebyshev polynomials, see for example [201], and in
particular the cheb_pol function.) The design parameter ǫ is related to the amount of allowable
passband ripple, δ,
    δ = 1 − 1/√(1 + ǫ²)          (5.10)
Alternatively the filter design specifications may be given in decibels of allowable passband ripple¹, rdB,

    δ = 1 − 10^(−rdB/20)          (5.11)
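The two specifications above are mutually consistent, since eliminating δ between Eqns. 5.10 and 5.11 gives ǫ = √(10^(rdB/10) − 1). A quick check, sketched in Python, for a 3 dB ripple:

```python
import math

r_dB = 3.0                                    # allowable pass-band ripple [dB]
eps = math.sqrt(10**(r_dB/10) - 1)            # ripple factor implied by r_dB

delta_from_eps = 1 - 1/math.sqrt(1 + eps**2)  # Eqn. 5.10
delta_from_dB  = 1 - 10**(-r_dB/20)           # Eqn. 5.11

assert abs(delta_from_eps - delta_from_dB) < 1e-12
print(round(delta_from_dB, 3))   # about 0.292, i.e. roughly a 30% ripple
```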

This gives the Chebyshev filter designer two degrees of freedom, (filter order n and ǫ or equiva-
lent), rather than just order as in the Butterworth case. Similar to the Butterworth filter, the poles
of the squared magnitude function, Eqn. 5.9, are located equally spaced along an ellipse in the s
plane with minor axis radius of αωc aligned along the real axis, where

    α = (γ^(1/n) − γ^(−1/n)) / 2          (5.12)

with
    γ = 1/ǫ + √(1 + ǫ^(−2))          (5.13)

and a major axis radius of βωc aligned along the imaginary axis, where

    β = (γ^(1/n) + γ^(−1/n)) / 2          (5.14)
1 The functions in the S IGNAL P ROCESSING T OOLBOX use rdB rather than δ.

Since the poles pk of the Chebyshev filter, Hc (s) are equally spaced along the stable half of the
ellipse, (refer Fig. 5.10 for a verification of this), the real and imaginary parts are given by:

    ℜ(pk) = αωc cos(π/2 + (2k+1)π/(2n))
    ℑ(pk) = βωc sin(π/2 + (2k+1)π/(2n)),    k = 0, 1, . . . , n − 1          (5.15)

and the corresponding transfer function is given by

    Hc(s) = K / ∏_{k=0}^{n−1} (s − pk)

where the numerator K is chosen to make the steady-state gain equal to 1 if n is odd, or 1/√(1 + ǫ²) for
even n. Algorithm 5.2 summarises the design of a Chebyshev filter with ripple in the passband.

Algorithm 5.2 Chebyshev type I filter design


Given a cut-off frequency, ωc, passband ripple, δ or rdB, and filter order, n,

1. Calculate the ripple factor

       ǫ = √(10^(r/10) − 1) = √(1/(1 − δ)² − 1)

2. Calculate the radius of the minor, α, and major β axis of the ellipse using Eqns 5.12–5.14.
3. Calculate the n stable poles from Eqn. 5.15 and expand out to form the denominator of
Hc (s).
4. Choose the normalising gain K such that at steady-state (DC gain)

       Hc(s = 0) = 1/√(1 + ǫ²)   for n even
       Hc(s = 0) = 1             for n odd

   We can calculate the gain of a transfer function using the final value theorem which,
   assuming a unit step, gives

       K = a0/√(1 + ǫ²)   for n even
       K = a0             for n odd

   where a0 is the coefficient of s⁰ in the denominator of Hc(s).
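As a spot check of steps 1–3, every pole generated by Eqn. 5.15 should satisfy the ellipse equation (ℜ(pk)/(αωc))² + (ℑ(pk)/(βωc))² = 1 and lie in the stable left half plane. A sketch in Python for the 4th order, 3 dB ripple design used below:

```python
import math

n, wc, r_dB = 4, 2.0, 3.0                 # order, cut-off [rad/s], ripple [dB]
eps = math.sqrt(10**(r_dB/10) - 1)        # ripple factor (Algorithm 5.2, step 1)
gam = 1/eps + math.sqrt(1 + 1/eps**2)     # gamma, Eqn. 5.13
alpha = (gam**(1/n) - gam**(-1/n)) / 2    # minor axis factor, Eqn. 5.12
beta  = (gam**(1/n) + gam**(-1/n)) / 2    # major axis factor, Eqn. 5.14

for k in range(n):                        # poles from Eqn. 5.15
    ang = math.pi/2 + (2*k + 1)*math.pi/(2*n)
    re = alpha * wc * math.cos(ang)
    im = beta  * wc * math.sin(ang)
    assert abs((re/(alpha*wc))**2 + (im/(beta*wc))**2 - 1) < 1e-12  # on the ellipse
    assert re < 0                                                   # stable pole
```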

We can test this algorithm by designing a Chebyshev type I filter with 3dB ripple in the passband,
(rdB = 3dB or δ = 1 − 10^(−3/20)), and a cross-over frequency of ωc = 2 rad/s.

Listing 5.4: Designing Chebyshev Filters


n = 4; wc = 2;          % Order & cut-off frequency, n, ωc
r = 3;                  % Ripple. Note −3dB = 1/√2
rn = 1/n;
e = sqrt(10^(r/10)-1);  % δ = 1 − 10^(−r/20), e = sqrt(1/(1−δ)² − 1)
g = 1/e + sqrt(1 + 1/e/e);
minora = (g^rn - g^(-rn))/2;
majorb = (g^rn + g^(-rn))/2;

k = [0:n-1]'; n2 = 2*n;
realp = minora*wc*cos(pi/2 + pi*(2*k+1)/n2); % real parts of the poles
imagp = majorb*wc*sin(pi/2 + pi*(2*k+1)/n2); % imaginary parts
polesc = realp + 1i*imagp;  % poles of the filter
Ac = real(poly(polesc));    % force real
Bc = Ac(n+1);               % overall DC gain = 1 (n odd)
if ~rem(n,2)                % if filter order n is even
  Bc = Bc/sqrt(1+e^2);      % dc gain
end % if

You may like to compare this implementation with the equivalent S IGNAL P ROCESSING toolbox
version of the type 1 Chebyshev analogue prototype by typing type cheb1ap.

We can verify in Fig. 5.10 that the computed poles do indeed lie on the stable half of the ellipse
with minor and major axes αωc and βωc respectively. I used the axis('equal') command for
this figure, forcing the x and y axes to be equally, as opposed to conveniently, spaced to highlight
the ellipse. Compare this with the poles of a Butterworth filter plotted in Fig. 5.9.

Figure 5.10: Verifying that the poles of a fourth-order analogue type 1 Chebyshev filter are equally spaced around the stable half of an ellipse.

As the above relations indicate, the generation of Chebyshev filter coefficients is slightly more
complex than for Butterworth filters, although they are also part of the S IGNAL P ROCESSING
toolbox. Further design relations are given in [152, p221]. We can compare our algorithm with
the equivalent toolbox function cheby1 which also designs type I Chebyshev filters. We want
the continuous description, hence the optional ‘s’ parameter. Alternatively we could use the
more specialised cheb1ap function which designs an analogue low-pass filter prototype which
is in fact called by cheby1.

Listing 5.5: Computing a Chebyshev Filter


n = 4; wc = 2;                 % Order of Chebyshev filter; cut-off, ωc = 2 rad/s
r = 3;                         % dB of ripple. Note −3dB = 1/√2
[Bc,Ac] = cheby1(n,r,wc,'s');  % design a continuous filter
[Hc,w] = freqs(Bc,Ac);         % calculate frequency response
semilogx(w,abs(Hc));           % magnitude characteristics

These filter coefficients should be identical to those obtained following Algorithm 5.2. You should
exercise some care when attempting to design extremely high order IIR filters. While the algorithm
is reasonably stable from a numerical point of view if we restrict ourselves to the factored
form, we may see unstable poles once the polynomial is expanded, owing to numerical
round-off, for orders larger than about 40.

>> [Bc,Ac] = cheby1(50,3,1,'s'); % Very high order IIR filter
>> max(real(roots(Ac)))
ans =
    0.0162                       % +ve, unstable RHP pole
>> [z,p,g] = cheb1ap(50,3);      % repeat, but only calculate poles
>> max(real(p))                  % are any RHP?
ans =
   -5.5478e-004                  % All poles stable

Of course, while in applications it is common to require high orders for FIR filters, it is unusual
that we would need larger than about n = 10 for IIR filters.

Fig. 5.11 compares the magnitude of both the Butterworth and Chebyshev filter and was generated
using, in part, the above code. For this plot, I have broken from convention, and plotted the
magnitude on a linear rather than logarithmic scale to better emphasize the asymptote towards
zero. Note that for even order Chebyshev filters such as the 4th order presented in Fig. 5.11,
the steady-state gain is not equal to 1, but rather 1 minus the allowable pass-band ripple. If this is a
problem, one can compensate by increasing the filter gain. For odd order Chebyshev filters, the
steady-state gain is 1, which perhaps is preferable.

Figure 5.11: Comparing the magnitude characteristics of 4th order Butterworth and Chebyshev filters. The Chebyshev filter is designed to have a ripple equal to −3dB, which is approx 30%. Both filter cut-offs are at ωc = 2 rad/s.

The difference between a maximally flat passband filter such as the Butterworth and the equal
ripple filter such as the Chebyshev is clear. [131, p176] claim that most designers prefer the
Chebyshev filter owing to the sharper transition across the stopband, provided the inevitable
ripple is acceptable.

In addition to the Butterworth and Chebyshev filters, there are two more classic analogue filters;
the Chebyshev type II and the elliptic or Cauer filter. All four filters are derived by approximating
the sharp ideal filter in different ways. For the Butterworth, the filter is constructed in such a way
as to maximise the number of derivatives to zero at ω = 0 and ω = ∞. The Chebyshev filter
exhibits minimum error over the passband, while the type II version minimises the error over
the stop band. Finally the elliptic filter uses a Chebyshev mini-max approximation in both pass-
band and stop-band. The calculation of the elliptic filter requires substantially more computation
than the other three, although all four can be computed using the S IGNAL P ROCESSING toolbox.

5.3 Discrete filters

Analogue filters are fine for small applications and for cases where aliasing may be a problem, but
they are more expensive and less flexible than the discrete equivalent. We would much rather
fabricate a digital filter where the only difference between filter orders is a slightly longer tapped
delay or shift register, and proportionally slower computation, albeit with a longer word length.
Discrete filters are also easy to implement in MATLAB using the filter command.

To design a digital filter, one option is to start the design from the classical analogue filters
and use a transformation such as the bilinear transform described in chapter 2 to the discrete
domain. In fact this is the approach that MATLAB uses, in that it first designs the filter in the
continuous domain, then employs bilinear to convert to the discrete domain. Digital filter
design is discussed further in section 5.3.2.

Traditionally, workers in the digital signal processing (DSP) area have referred to two types of filters:
one where the response to an impulse is finite, the other where the response is theoretically
of infinite duration. The latter, the recursive or Infinite Impulse Response (IIR) filter, is one where
previous filtered values are used to calculate the current filtered value. The former, the non-recursive
or Finite Impulse Response (FIR) filter, is one where the filtered value is dependent only on past
values of the input, hence A(z) = 1. FIR filters find use in some continuous signal processing
applications where the operation is to be run for very long sustained periods, and where the alternative
IIR filters are sensitive to numerical ‘blow-up’ owing to the recursive nature of the filtering
algorithm. FIR filters will never go unstable since they have only zeros, which cannot cause
instability by themselves. FIR filters also exhibit what is known as linear phase lag, that is, each
of the signal’s frequencies is delayed by the same amount of time. This is a desirable attribute,
especially in communications, and one that a causal IIR filter cannot match.

However better filter performance is generally obtained by using IIR filters for the same number
of discrete components. In applications where the FIR filter is necessary, 50 to 150 terms are com-
mon. To convert approximately from an IIR to a FIR, divide A(z) into B(z) using long division,
and truncate the series at some suitably large value. Normally, the filter would be implemented
in a computer in the digital form as a difference equation, or perhaps in hardware using a series
of delay elements and shift registers.
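The long-division conversion is equivalent to truncating the IIR filter's impulse response and using the retained terms as the FIR tap weights. A sketch in Python for a simple hypothetical one-pole filter shows how little is lost with a sensible truncation length:

```python
def filt(b, a, u):
    """Simulate y_k = sum b_i u_{k-i} - sum a_i y_{k-i}, assuming a[0] = 1."""
    y = []
    for k in range(len(u)):
        yk = sum(b[i]*u[k-i] for i in range(len(b)) if k - i >= 0)
        yk -= sum(a[i]*y[k-i] for i in range(1, len(a)) if k - i >= 0)
        y.append(yk)
    return y

# hypothetical one-pole IIR: H(z) = 1/(1 - 0.5 z^-1), impulse response 1, 0.5, 0.25, ...
b, a = [1.0], [1.0, -0.5]
N = 40                                   # truncation length
h = filt(b, a, [1.0] + [0.0]*(N - 1))    # truncated impulse response = FIR taps

# the truncated FIR reproduces the IIR output to within the neglected tail
u = [1.0]*60                             # step input
y_iir = filt(b, a, u)
y_fir = filt(h, [1.0], u)                # A(z) = 1: a non-recursive filter
assert max(abs(p - q) for p, q in zip(y_iir, y_fir)) < 1e-9
```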

5.3.1 A low-pass filtering application

Consider the situation where you are asked to filter a measurement signal from a sensitive pres-
sure transducer in a level leg in a small tank. The level in the tank is oscillating about twice a
second, but this measurement is corrupted by the mains electrical frequency of 50Hz. Suppose
we are sampling at 1 kHz (T = 0.001s), and one notes that this sampling rate is sufficient to avoid
aliasing problems since it is above twice the highest frequency of interest, (2 × 50). We can sim-
ulate this in M ATLAB. First we create a time vector t and then the two signals, one is the true
measurement y1 , and the other is the corrupting noise from the mains y2 . The actual signal, as
received by your data logger, is the sum of these two signals, y = y1 + y2 . The purpose of filtering
is to extract the “true” signal y1 from the corrupted y as closely as possible.

t = [0:0.001:1]';        % time vector
y1 = sin(2*pi*2*t);      % true signal
y2 = 0.1*sin(2*pi*50*t); % noise
y = y1+y2;               % what we read
plot(t,y)

Clearly even from just the time domain plot the signal y is composed of two dominant frequen-
cies. If a frequency domain plot were to be created, it would be clearer still. To reconstruct the
true level reading, we need to filter the measured signal y. We could design a filter with a cut-
off frequency of fc = 30Hz. This should in theory pass the slower y1 signal, but attenuate the
corrupting y2 .

To implement such a digital filter we must:

1. select a filter type, say Butterworth, (refer page 201)

2. select a filter order, say 3, and

3. calculate the dimensionless frequency related to the specified frequency cutoff fc = 30Hz,
and the sample time (or Nyquist frequency fN ),

    Ω = fc/fN = 2 T fc = 2 × 10⁻³ × 30 = 0.06

We are now ready to use the S IGNAL P ROCESSING T OOLBOX to design our discrete IIR filter or
perhaps build our own starting with a continuous filter and transforming to the discrete domain
as described on page 203.

>> wc = 2*1.0e-3*30;    % Dimensionless cut-off frequency Ω
>> [B,A] = butter(3,wc) % Find A(q⁻¹) & B(q⁻¹) polynomials
B =
    0.0007    0.0021    0.0021    0.0007
A =
    1.0000   -2.6236    2.3147   -0.6855
>> yf = filter(B,A,y);      % filter the measured data
>> plot(t,y,t,yf,'--',t,y1) % any better?

Once having established the discrete filter coefficients, we can proceed to filter our raw signal
and compare with the unfiltered data as shown in Fig. 5.12.

The filtered data in Fig. 5.12 looks smoother than the original, but it really is not ‘smooth’ enough.
We should try a higher order filter, say a 6th order filter.

[B6,A6] = butter(6,wc);  % 6th order, same cut-off
yf6 = filter(B6,A6,y);   % smoother?
plot(t,[y,yf,yf6])       % looks any better?

Figure 5.12: Comparing 3rd and 6th order discrete Butterworth filters to smooth noisy data.

This time the 6th order filtered data in Fig. 5.12 is acceptably smooth, but there is quite a notice-
able phase lag such that the filtered signal is behind the original data. This is unfortunate, but
shows that there must be a trade off between smoothness and phase lag for all causal or online
filters.

The difference in performance between the two digital filters can be shown graphically by plot-
ting the frequency response of the filter. The easiest way to do this is to convert the filter to a
transfer function and then use bode.

Ts = t(2)-t(1);  % Sampling time, T, for the correct frequency scale
H3 = tf(B,A,Ts);
H6 = tf(B6,A6,Ts)
bode(H3,H6);     % See frequency response plot in Fig. 5.13.

Alternatively we could use freqz and construct the frequency response manually.

[h,w] = freqz(B,A,200);      % discrete frequency response
[h6,w6] = freqz(B6,A6,200);
w = w/pi*500;                % convert normalised frequency to Hz (fN = 500 Hz)
w6 = w6/pi*500;
loglog(w,abs(h), w6,abs(h6)) % See magnitude plot in Fig. 5.13.

The sixth order filter in Fig 5.13 has a much steeper roll-off after 30Hz, and this more closely approximates
the ideal filter. Given only this curve, we would select the higher order (6th) filter. To plot the
second half of the Bode diagram, we must ‘unwrap’ the phase angle, and for convenience we will
also convert the phase angle from radians to degrees.

ph = unwrap(angle(h))*360/2/pi;
ph6 = unwrap(angle(h6))*360/2/pi;
semilogx(w,ph,w6,ph6) % plot the 2nd half of the Bode diagram

The phase lag at high frequencies for the third order filter is 3 × 90◦ = 270◦ , while the phase lag
for the sixth order filter is 6 × 90◦ = 540◦ . The ideal filter has zero phase lag at all frequencies.

Figure 5.13: The frequency response for a 3rd and 6th order discrete Butterworth filter. The cut-off frequency for the filters is fc = 30Hz and the Nyquist frequency is fN = 500 Hz.

The amplitude ratio plot in Fig 5.13 asymptotes to a maximum frequency of 500 Hz. This may
seem surprising since the Bode diagram of a true continuous filter does not asymptote to any
limiting frequency. What has happened is that Fig 5.13 is the plot of a discrete filter, and the
limiting frequency is the Nyquist frequency, fN = 1/(2T). However the discrete Bode plot is a good
approximation to the continuous filter up to about half the Nyquist frequency.

A matrix of filters

Fig. 5.14 compares the performance of 9 filters using the 2 frequency component signal from
§5.3.1. The 9 simulations in Fig 5.14 show the performance for 3 different orders (2,5 and 8) in
the columns at 3 different cut-off frequencies (20, 50 and 100 Hz) in the rows. This figure neatly
emphasizes the trade off between smoothness and phase lag.

5.3.2 Approximating digital filters from continuous filters

As mentioned in the introduction, one way to design digital filters, is to first start off with a
continuous design, and then transform the filter to the digital domain. The four common classical
continuous filter families: Butterworth, Chebyshev, elliptic and Bessel, are all strictly continuous
filters, and if we are to implement them in a computer or a DSP chip, we must approximate these
filters with a discrete filter.

To convert from a continuous filter H(s), to a digital filter H(z), we could use the bilinear trans-
form as described in section 2.5.2. Since this transformation method is approximate, we will find
that some frequency distortion will be introduced. However we can minimise this distortion by
adjusting the frequency response of the filter H(s) before we transform to the discrete approxi-
mation. This procedure is called using the bilinear transformation with frequency pre-warping.

Figure 5.14: The performance of various Butterworth filters of different orders 2, 5, and 8 (left to right) and different cut-off frequencies, ωc, of 20, 50 and 100 Hz (top to bottom) on a two frequency component signal.

Frequency pre-warping

The basic idea of frequency pre-warping is that we try to modify the transformation from contin-
uous to discrete so that H(z) better approximates H(s) at the frequency of interest, rather than
just at ω = 0. This is particularly noticeable for bandpass filtering applications.

The frequencies in the z plane (ωd) are related to the frequencies in the s plane (ωa) by

    ωd = (2/T) tan⁻¹(ωa T/2)          (5.16)
    ωa = (2/T) tan(ωd T/2)            (5.17)

This means that the magnitude of H(s) at frequency ωa, |H(iωa)|, is equal to the magnitude
of H(z) at frequency ωd, |H(e^(iωd T))|. Suppose we wish to discretise the continuous filter H(s)
using the bilinear transform, but we would also like the digital approximation to have the same
magnitude at the corner frequency (ωc = 10); then we must substitute using Eqn 6.13.

Example of the advantages of frequency pre-warping.

Suppose we desire to fabricate an electronic filter for a guitar tuner. In this application the guitar
player strikes a possibly incorrectly tuned string, and the tuner box recognises which string is
struck, and then subsequently generates the correct tone. The player can then adjust the tuning
of the guitar. Such a device can be fabricated in a DSP chip using a series of bandpass filters
tuned to the frequencies of each string (give or take a note or two).

We will compare a continuous bandpass filter centered around a frequency of 440 Hz (concert
pitch A), with digital filters using a sampling rate of 2kHz. Since we are particularly interested
in the note A, we will take special care that the digital filter approximates the continuous filter
around f = 440Hz. We can do this by frequency pre-warping. The frequency response of the
three filters are shown in Fig. 5.15.

[B,A] = butter(4,[420 450]*2*pi,'s'); % band pass from 420 to 450 Hz
Gc = tf(B,A)                          % Continuous band-pass filter

Ts = 1/2000;                          % sample rate 2kHz
wn = pi/Ts;                           % Nyquist freq. (rad/s)

Gd = c2d(Gc,Ts,'tustin')              % Now sample it using bilinear transform

Gd_pw = c2d(Gc,Ts,'prewarp',440*2*pi) % Now sample it with pre-warping

Fig. 5.15 highlights that the uncompensated digital filter (dashed) actually bandpasses signals
in the range of 350 to 400Hz rather than the 420 to 450Hz required. This significant error is
addressed by designing a filter to match the continuous filter at f = 440Hz using frequency
pre-warping (dotted).

Figure 5.15: Comparing a continuous bandpass filter with a discrete filter (dashed) and a discrete filter with frequency pre-warping (dotted). Note the error in the passband when using the bilinear transform without frequency pre-warping.

For an online filter application, we should

1. Decide on the cut-off frequency, ωc.

2. Select the structure of the analogue filter (Butterworth, Chebyshev, elliptical etc). Essentially
we are selecting the optimality criteria for the filter.

3. Choose the filter order (number of poles).

4. Transform to the discrete domain using the bilinear transform.

All these considerations have advantages and disadvantages associated with them.

5.3.3 Efficient hardware implementation of discrete filters

To implement the discrete difference equation,

    yk A(z⁻¹) = B(z⁻¹) uk

in hardware, we can expand out to a difference equation

    yk = − Σ_{i=1}^{na} ai yk−i + Σ_{i=0}^{nb} bi uk−i          (5.18)

meaning now we can implement this using operational amplifiers and delay elements. One
scheme, following Eqn. 5.18 directly, is given in Fig. 5.16 and is known as Direct Form I. It is easy
to verify that the Direct Form I filter has the transfer function

    H(z⁻¹) = (b0 + b1 z⁻¹ + b2 z⁻² + · · · + bnb z^(−nb)) / (1 + a1 z⁻¹ + a2 z⁻² + · · · + ana z^(−na))

Figure 5.16: Hardware difference equation in Direct Form I

The square boxes in Fig. 5.16 with z⁻¹ are the delay elements, and since we take off, or tap, the
signal after each delay, the lines are called tapped delays.
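In software, the Direct Form I topology can be rendered with the two tapped delay lines kept explicit. The following sketch mirrors Fig. 5.16 in Python, using a hypothetical low-order filter as the example:

```python
from collections import deque

def direct_form_1(b, a, u):
    """Direct Form I: separate tapped delay lines for past inputs and past
    outputs, assuming a monic denominator (a[0] = 1)."""
    x_line = deque([0.0]*(len(b) - 1), maxlen=len(b) - 1)   # input delay line
    y_line = deque([0.0]*(len(a) - 1), maxlen=len(a) - 1)   # output delay line
    y = []
    for xk in u:
        yk = b[0]*xk
        yk += sum(bi*xi for bi, xi in zip(b[1:], x_line))   # numerator taps
        yk -= sum(ai*yi for ai, yi in zip(a[1:], y_line))   # denominator taps
        x_line.appendleft(xk)       # shift both delay lines along one place
        y_line.appendleft(yk)
        y.append(yk)
    return y

# hypothetical example filter: y_k = 0.5 y_{k-1} + x_k + x_{k-1}
y = direct_form_1([1.0, 1.0], [1.0, -0.5], [1.0, 0.0, 0.0, 0.0])
assert y == [1.0, 1.5, 0.75, 0.375]    # its impulse response
```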

For the circuit designer who must fabricate Fig. 5.16, the expensive components are the delay
elements, since they are typically specialised integrated circuits, and one is economically motivated
to achieve the same result using fewer components. Rather than use a topology with
separate delay lines for the input sequence and the output sequence, we
can achieve the identical result using only half the number of delays. This reduces fabrication
cost.

Starting with the original IIR filter model

    Y(z⁻¹)/U(z⁻¹) = B(z⁻¹)/A(z⁻¹)          (5.19)

and introducing an intermediate state, w,

    Y(z⁻¹)/W(z⁻¹) · W(z⁻¹)/U(z⁻¹) = B(z⁻¹)/A(z⁻¹)

which means that we can describe Eqn. 5.19 as one pure infinite impulse response (IIR) and one
finite impulse response (FIR) process

    Y(z⁻¹) = B(z⁻¹) W(z⁻¹)          (5.20)
    A(z⁻¹) W(z⁻¹) = U(z⁻¹)          (5.21)

A block diagram following this topology is given in Fig. 5.17 which is known as Direct Form II,
DFII.

Figure 5.17: An IIR filter with a minimal number of delays, Direct Form II

We could have reached the same topological result by noticing that if we swap the order of
the tapped delay lines in the Direct Form I structure, Fig. 5.16, then the two delay
lines are separated by a unity gain branch. Thus we can eliminate one of the branches, and save
the hardware cost of half the delays.

Further forms of the digital filter are possible and are used in special circumstances such as when
we have parallel processes or we are concerned with the quantisation of the filter coefficients.
One special case is described further in section 5.3.4.

With MATLAB we would normally use the optimised filter command to simulate IIR filters,
but this option is not available when using say C on a microprocessor or DSP chip. The following
script implements the topology given in Fig. 5.17 for the example filter

    B(z⁻¹)/A(z⁻¹) = (z⁻¹ − 0.4)(z⁻¹ − 0.98) / ((z⁻¹ − 0.5)(z⁻¹ − 0.6)(z⁻¹ − 0.7)(z⁻¹ − 0.8)(z⁻¹ − 0.9))
and a random input sequence.

B = poly([0.4 0.98]);          % Example stable process
A = poly([0.5:0.1:0.9]);
U = [ones(30,1); randn(50,1)]; % example input profile
Yx = dlsim(B,A,U);             % Test Matlab's version

% Version with 1 shift element -----------------
nb = length(B); A(1) = []; na = length(A);
nw = max(na,nb);               % remember A = A-1
shiftw = zeros(nw,1);          % initialise with zeros
Y = [];
for i=1:length(U)
  w = -A*shiftw(1:na) + U(i);
  shiftw = [w; shiftw(1:nw-1)];
  Y(i,1) = B*shiftw(1:nb);
end % for

t = [0:length(U)-1]';
[tt,Ys] = stairs(t,Y); [tt,Yx] = stairs(t,Yx);
plot(tt,Ys,tt,Yx,'--')         % plotting verification

MATLAB's dlsim gives a similar output, but shifted one unit to the right.
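The single-delay-line recursion is easy to express outside MATLAB too. As an illustrative sketch (in Python, with our own function names, rather than the MATLAB used in this text), the following codes Eqns. 5.20–5.21 directly and checks the result against the standard difference equation (Direct Form I):

```python
# A minimal Direct Form II sketch (hypothetical helper names).
# w[k] = u[k] - a1*w[k-1] - ... - a_na*w[k-na]    (Eqn 5.21, the IIR part)
# y[k] = b0*w[k] + b1*w[k-1] + ... + b_nb*w[k-nb] (Eqn 5.20, the FIR part)

def df2_filter(b, a, u):
    """Filter input u with B/A using a single shared delay line (DFII)."""
    assert a[0] == 1.0, "normalise so that a0 = 1"
    nw = max(len(a), len(b))    # length of the shared delay line
    w = [0.0] * nw              # delay line, w[0] is the newest sample
    y = []
    for uk in u:
        wk = uk - sum(ai * wi for ai, wi in zip(a[1:], w))  # IIR recursion
        w = [wk] + w[:-1]                                   # shift the delay line
        y.append(sum(bi * wi for bi, wi in zip(b, w)))      # FIR tap sum
    return y

def df1_filter(b, a, u):
    """Reference: the standard difference equation (Direct Form I)."""
    y = []
    for k in range(len(u)):
        acc = sum(b[i] * u[k - i] for i in range(len(b)) if k - i >= 0)
        acc -= sum(a[i] * y[k - i] for i in range(1, len(a)) if k - i >= 0)
        y.append(acc)
    return y

b = [1.0, -1.38, 0.392]   # B(z^-1) = (1 - 0.4 z^-1)(1 - 0.98 z^-1)
a = [1.0, -1.1, 0.3]      # A(z^-1) = (1 - 0.5 z^-1)(1 - 0.6 z^-1)
u = [1.0] * 10            # unit step input
y2 = df2_filter(b, a, u)
y1 = df1_filter(b, a, u)
print(max(abs(p - q) for p, q in zip(y1, y2)))  # the two forms agree exactly
```

Both topologies realise the same transfer function, so in exact arithmetic the outputs are identical; the DFII version simply needs one delay line instead of two.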

5.3.4 Numerical and quantisation effects for high-order filters

Due to commercial constraints, for many digital filtering applications, we want to be able to
realise cheap high-quality filters at high speed. Essentially this means implementing high-order
IIR filters on low-cost hardware with short word-lengths. The problem is that high-order IIR
filters are very sensitive to quantisation effects especially when implemented in the expanded
polynomial form such as DFII.

One way to reduce the effects of numerical roundoff is to rewrite the DFII filter in factored form as a cascaded collection of second-order filters as shown in Fig. 5.18. Each individual filter is a second-order filter as shown in Fig. 5.19. These are known as bi-quad filters, or second-order sections (SOS). For an nth-order filter we will have ⌈n/2⌉ sections; that is, for filters with an odd order, one of the second-order sections will actually be first order.

Figure 5.18: Cascaded second-order sections to realise a high-order filter. See also Fig. 5.19.

We can convert higher-order filters to a collection of second-order sections using the tf2sos or equivalent commands in MATLAB. You will notice that the magnitude and range of the coefficients of the SOS filter realisation are much smaller than those of the equivalent expanded polynomial filter.
Figure 5.19: A second-order section (SOS)

The separation of the factors into groups of second-order sections for a given high-order filter is not unique, but one near-optimal way is to pair the poles nearest the unit circle with the zeros nearest those poles, and so on. This is the strategy that tf2sos uses, and it is explained in further detail in the help documentation. Listing 5.6 demonstrates how we can optimally decompose a 7th-order filter into four second-order sections. Of course the first of the four second-order sections is actually a first-order section. We can see that by noticing that the coefficients b2 and a2 are equal to zero for the first filter section.

Listing 5.6: Converting a 7th-order Butterworth filter to 4 second-order sections


>> [B,A] = butter(7,0.8,'low'); H = tf(B,A,1) % Design a 7th order discrete DFII filter

Transfer function:
0.2363 z^7 + 1.654 z^6 + 4.963 z^5 + 8.271 z^4 + 8.271 z^3 + 4.963 z^2 + 1.654 z + 0.2363
-----------------------------------------------------------------------------------------
   z^7 + 4.182 z^6 + 7.872 z^5 + 8.531 z^4 + 5.71 z^3 + 2.349 z^2 + 0.5483 z + 0.05584

>> [SOS,G] = tf2sos(B,A) % Convert 7th order polynomial to 4 second-order sections

SOS =
    1.0000    0.9945         0    1.0000    0.5095         0
    1.0000    2.0095    1.0095    1.0000    1.0578    0.3076
    1.0000    2.0026    1.0027    1.0000    1.1841    0.4636
    1.0000    1.9934    0.9934    1.0000    1.4309    0.7687
G =
    0.2363

The advantage when using second-order sections is that we can maintain reasonable accuracy even when using relatively short word lengths. Fig. 5.20(a), which is generated by Listing 5.7, illustrates that we cannot successfully run a 7th-order elliptic low-pass filter in single precision in the expanded polynomial form (DFII), one reason being that the resultant filter is actually unstable! However we can reliably implement the same filter in single precision if we use 4 cascaded second-order sections. In fact, for this application the single precision filter implemented in SOS is indistinguishable in Fig. 5.20(a) from the double precision implementation, given that the average error is always less than 10⁻⁶.

Figure 5.20: Comparing single precision second-order sections with filters in direct form II transposed form. (a) The time response of double and single precision filters to a unit step. (b) Pole-zero maps of single and double precision filters. Note that the single precision version has poles outside the unit circle, so the direct form II filter is actually unstable when run in single precision.

Listing 5.7: Comparing DFII and SOS digital filters in single precision.
[B,A] = ellip(7,0.5,20,0.08);   % Design a 7th order elliptic filter
U = ones(100,1);                % Consider a step response

Y = filter(B,A,U);              % Compute filter in double precision
Ys = filter(single(B), single(A), single(U)); % Compute filter in single precision

[SOS,G] = tf2sos(B,A);          % Convert to SOS (in full precision)
n = size(SOS,1);                % # of sections
ysos = single(G)*U;             % Compute SOS filter in single precision
for i=1:n
    ysos = filter(single(SOS(i,1:3)), single(SOS(i,4:end)), ysos);
end
stairs([Y Ys ysos]);            % Compare results in Fig. 5.20(a).

It is also clear from the pole-zero plot in Fig. 5.20(b) that the single precision DFII implementation
has poles outside the unit circle due to the quantisation of the coefficients. This explains why this
single precision filter is unstable.
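The root cause, coefficient quantisation moving the poles, can be illustrated without MATLAB. In this numpy/scipy sketch (our own construction, mirroring the design of Listing 5.7), we round the expanded denominator to single precision and examine the pole radii:

```python
import numpy as np
from scipy import signal

# 7th-order elliptic low-pass filter, as designed in Listing 5.7
b, a = signal.ellip(7, 0.5, 20, 0.08)

poles_dbl = np.roots(a)                     # poles from double precision coefficients
poles_sgl = np.roots(a.astype(np.float32))  # poles after rounding coefficients to single

print(np.abs(poles_dbl).max())  # < 1: the filter is stable as designed
print(np.abs(poles_sgl).max())  # compare with Fig. 5.20(b): the quantised poles
                                # stray, some beyond the unit circle
```

Because the seven poles are tightly clustered near the unit circle, even relative coefficient errors of order 10⁻⁷ move them substantially; this is exactly the sensitivity that the cascaded second-order sections avoid.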

5.4 The Fourier transform

The Fourier Transform (FT) is one of the most well known techniques in the mathematical and
scientific world today. However it is also one of the most confusing subjects in engineering,
partly owing to the plethora of definitions, normalising factors, and other incompatibilities found
in many standard text books on the subject. The Fourier transform is covered in many books on

digital signal processing, (DSP), such as [36, 131], and can also be found in mathematical physics
texts such as [107, 161]. The Fourier Transform is a mathematical transform equation that takes
a function in time and returns a function in frequency. The Fast Fourier Transform, or FFT, is
simply an efficient way to compute this transformation.

Chapter 2 briefly mentioned the term “spectral analysis” when discussing aliases introduced by the sampling process. Actually a very learned authority, [175, p53], tells us that “spectral” formally means “pertaining to a spectre or ghost”, and then goes on to make jokes about ghosts and gremlins causing noisy problems. But nowadays spectral analysis means analysis based around the spectrum of a signal. We are going to use spectral analysis to investigate the frequency components of a measured signal, since in many cases these frequency components may reveal critical information hidden in the time domain description. This decomposition of the time domain signal into its frequency components is just one application of the FFT.

Commonly in mathematics, we can approximate any function f (t) using (amongst other things)
a Taylor series expansion, or a sine and cosine series expansion. If the original signal is periodic,
then the latter technique generally gives better results with fewer terms. This latter series is called
the Fourier series expansion.

The fundamental concept on which the Fourier transform (FT) is based is that almost any periodic wave form can be represented as a sum of sine waves of different periods and amplitudes,

$$f(t) = \sum_{n=0}^{\infty} A_n \sin(2\pi n f_0 t) + \sum_{n=0}^{\infty} B_n \cos(2\pi n f_0 t) \tag{5.22}$$

This means that we can decompose any periodic wave form (PW) with a fundamental frequency² f₀ into a sum of sinusoidal waves (which is known as frequency analysis), or alternatively we can construct complicated waves from simple sine waves (which is known as frequency synthesis).

Since it is impractical to have an infinite number of terms, the series is truncated, and the right-hand side of Eqn. 5.22 now only approximates f(t). Naturally, as the number of terms tends to infinity, the approximation to the true signal f(t) improves. All we have to do now is to determine the constants An and Bn for a given function f(t). These coefficients can be evaluated by
$$A_n = \frac{2}{P}\int_{-P/2}^{P/2} f(t)\sin(2\pi n f_0 t)\,dt \quad\text{and}\quad B_n = \frac{2}{P}\int_{-P/2}^{P/2} f(t)\cos(2\pi n f_0 t)\,dt \tag{5.23}$$

when n ≠ 0. When n = 0, we have the special case

$$A_0 = 0, \quad\text{and}\quad B_0 = \frac{1}{P}\int_{-P/2}^{P/2} f(t)\,dt \tag{5.24}$$

In summary, f (t) tells us how the signal develops in time, while the constants An , Bn give us a
method to generate f (t).

We can re-write Eqn. 5.22 as a sum of cosines and a phase angle



$$f(t) = \sum_{n=0}^{\infty} C_n \cos(2\pi n f_0 t + \psi_n) \tag{5.25}$$

²Remember the relationship between the frequency f, the period P, and the angular velocity ω: f = 1/P = ω/(2π).

where the phase angle, ψn, and amplitude, Cn, are given by

$$\psi_n = \tan^{-1}\left(-\frac{A_n}{B_n}\right) \quad\text{and}\quad C_n = \sqrt{A_n^2 + B_n^2} \tag{5.26}$$

If we plot Cn versus f then this is called the frequency spectrum of f(t). Note that Cn has the same units as An or Bn, which in turn have the units of whatever the time series f(t) is measured in.

5.4.1 Fourier transform definitions

Any function f(t) that is finite in energy can be represented in the frequency domain as F(ω) via the transform

$$F(\omega) = \int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt \tag{5.27}$$
$$\phantom{F(\omega)} = \int_{-\infty}^{\infty} f(t)\cos(\omega t)\,dt - i\int_{-\infty}^{\infty} f(t)\sin(\omega t)\,dt \tag{5.28}$$

We can also convert back to the time domain by using the inverse Fourier transform,
$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\omega)\,e^{i\omega t}\,d\omega$$

You may notice that the transform pair is not quite symmetrical: we have a factor of 2π in the inverse relation. If we use the frequency f measured in Hertz, where ω = 2πf, rather than ω, we obtain a much more symmetrical, and easier to remember, pair of equations. [161, pp381–382] comment on this, and [139, p145] give misleading relations.

The spectrum of a signal f(t) is defined as the square of the absolute value of the Fourier transform,

$$\Phi_f(\omega) = |F(\omega)|^2$$

Parseval’s equation reminds us that the energy of a signal is the same whether we describe it in the frequency domain or the time domain,

$$\int_{-\infty}^{\infty} f^2(t)\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} |F(\omega)|^2\,d\omega$$

We can numerically approximate the function F(ω) by evaluating Eqn 5.28 using a numerical integration technique such as Euler’s method. Since this numerical integration is computationally expensive, and we must do this for every ω of interest, the calculation procedure is quite time consuming. I will refer to this technique as the Slow Fourier Transform (SFT). Remember that F(ω) is a complex function of angular velocity, ω, and that the result is best displayed as a graph with ω, measured in radians/s, as the independent variable.
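As a concrete illustration of this slow transform (a Python sketch with our own naming, not taken from the text), Eqn. 5.28 can be evaluated with a simple rectangle rule at any single trial frequency:

```python
import math

def slow_ft(f, t0, t1, omega, n=20000):
    """Approximate F(omega) = integral of f(t) e^{-i w t} dt over [t0, t1]
    using a rectangle (Euler) rule with n panels."""
    dt = (t1 - t0) / n
    re = im = 0.0
    for k in range(n):
        t = t0 + k * dt
        re += f(t) * math.cos(omega * t) * dt   # real part, Eqn 5.28
        im -= f(t) * math.sin(omega * t) * dt   # imaginary part, Eqn 5.28
    return complex(re, im)

# Test signal: one second of a 5 Hz sine wave
f = lambda t: math.sin(2 * math.pi * 5 * t)

F_at_5Hz = slow_ft(f, 0.0, 1.0, 2 * math.pi * 5)
F_at_9Hz = slow_ft(f, 0.0, 1.0, 2 * math.pi * 9)
print(abs(F_at_5Hz), abs(F_at_9Hz))  # large response at 5 Hz, near zero at 9 Hz
```

The expense is obvious: the whole loop must be repeated for every ω of interest, which is exactly what motivates the fast Fourier transform later in this section.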

Some helpful properties

Here are some helpful properties of the Fourier transform that may make analysis easier.

1. If f(t) = −f(−t), which means that f(t) is an odd function like sin t, then all the cosine terms disappear.

2. If f(t) = f(−t), which means that f(t) is an even function like cos t, then all the sine terms disappear.

3. If f (t) = −f (t + P/2), then the series will contain only odd components; n = 1, 3, 5, . . .

4. If f (t) = f (t + P/2), then the series will contain only even components; n = 2, 4, 6, . . .

Naturally we are not at liberty to change the shape of our measured time series f (t), but we may
be allowed to choose the position of the origin such that points 3 and 4 are satisfied.

The Euler relations

We can also write the Fourier series using complex quantities with the Euler relations.

$$e^{i\theta} = \cos\theta + i\sin\theta, \quad\text{and}\quad e^{-i\theta} = \cos\theta - i\sin\theta \tag{5.29}$$

where i = √−1. This implies that

$$\cos\theta = \frac{e^{i\theta}+e^{-i\theta}}{2}, \quad\text{and}\quad \sin\theta = \frac{e^{i\theta}-e^{-i\theta}}{2i} \tag{5.30}$$

So we can re-write Eqn 5.22 in complex notation as

$$f(t) = \sum_{n=-\infty}^{\infty} D_n e^{i2\pi n f_0 t} \tag{5.31}$$

where

$$D_n = \frac{1}{P}\int_{-P/2}^{P/2} f(t)e^{-i2\pi n f_0 t}\,dt \tag{5.32}$$

Example: A decomposition of a square wave


The square wave is a simple periodic function that can be written as a infinite sine and cosine
series. Fig 5.21 shows the original time series f (t). We note that the original wave is:

1. Periodic, with a period of P = 1 second.

2. f(t) is an odd function such as sin t, so we expect all the cosine terms to disappear. An odd function is one where f(t) = −f(−t), while an even function is one where f(t) = f(−t).

Using Eqn. 5.23, we see that

$$A_n = \frac{1}{\pi n}\left(1 - \cos \pi n\right)$$

which collapses to An = 2/(nπ) for n = 1, 3, 5, . . . and An = 0 for n = 2, 4, 6, . . . This gives the Fourier series approximation to f(t) as

$$f(t) = \frac{2}{\pi}\sin 2\pi t + \frac{2}{3\pi}\sin 6\pi t + \cdots + \frac{2}{(2n-1)\pi}\sin 2(2n-1)\pi t + \cdots \tag{5.33}$$

As the number of terms (n) in Eqn. 5.33 increases, the approximation improves, as illustrated in Fig 5.22 which shows the Fourier approximation for 1 through 4 terms. Later in §5.4.3 we will try to do the reverse: given the square wave signal, we will extract numerically the coefficients of the sine and cosine series.
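As a taste of that numerical extraction (a Python sketch; the ±1 square wave and the quadrature routine are our own constructions), we can evaluate Eqn. 5.23 by brute-force integration. The structural predictions hold regardless of the overall amplitude scaling: the even coefficients vanish, and the odd coefficients decay as 1/n:

```python
import math

P = 1.0        # period of the square wave [s]
f0 = 1.0 / P   # fundamental frequency [Hz]

def f(t):
    """A +/-1 square wave with period P."""
    return 1.0 if (t % P) < P / 2 else -1.0

def A(n, npts=100000):
    """A_n of Eqn. 5.23, via rectangle-rule integration over one period."""
    dt = P / npts
    return (2.0 / P) * sum(f(k * dt) * math.sin(2 * math.pi * n * f0 * k * dt) * dt
                           for k in range(npts))

for n in range(1, 6):
    print(n, round(A(n), 5))  # even n vanish; A_1/A_3 = 3, A_1/A_5 = 5, ...
```

The numerically extracted coefficients reproduce the hand-derived pattern: only the odd harmonics survive, with amplitudes falling off as 1/n exactly as in Eqn. 5.33.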
Figure 5.21: The original square wave signal f(t) (top), and an approximation to f(t) using one sine term (bottom)

Figure 5.22: The Fourier approximation to a square wave with varying number of terms.

5.4.2 Orthogonality and frequency spotting

Consider the function y = sin(3t) sin(5t), which is plotted for 2 full periods in Fig 5.23 (top). If we integrate it over one period, we can see that the positive areas cancel out the negative areas, hence the total integral is zero. Or

$$\int_0^{P=\pi} y(t)\,dt = 0$$

The only time the integral over a full period will not be zero is when the entire curve is either all positive or all negative. The only time that can happen is when the two frequencies are the same, such as in Fig. 5.23 (bottom). Thus if

$$y = \sin(nt)\,\sin(mt) \tag{5.34}$$

then the integral over one full period will be one of the following two results:

$$\int_0^{P} y(t)\,dt = 0 \quad\text{if } n \neq m \tag{5.35}$$
$$\int_0^{P} y(t)\,dt \neq 0 \quad\text{if } n = m \tag{5.36}$$

Figure 5.23: A plot of (top) y = sin(3t) sin(5t) and (bottom) y = sin(3t) sin(3t). In the top figure the shaded regions cancel giving a total integral of zero for the two periods.

Fig 5.23 was created by the following MATLAB commands

t = [0:0.01:2*pi]';        % create a time vector 2 periods long
y3 = sin(3*t); y5 = sin(5*t);
plot(t,[y3.*y3, y3.*y5])   % plot both curves

We can numerically integrate these curves using the sum command. The integral of the top curve
in Fig. 5.23 is

sum(y3.*y5)
ans = -1.9699e-05
which is close enough to zero, while $\int_0^{2\pi} \sin^2 3t \, dt$ is evaluated as

sum(y3.*y3)
ans = 314.1593

and is definitely non-zero. Note that for this example we have integrated over 2 periods, so
we should divide the resultant integral by 2. This however, does not change our conclusions.
Additionally note that the simple sum command when used as an integrator assumes that the
argument is uniformly spaced 1 time unit apart. This is not the case for this example, but the
difference is only a scale factor, and again is not important for this demonstration.

Period

The period of y = sin(nt) is P = 2π/n. The period of y = sin(nt) sin(nt) can be evaluated using an expansion for sin²(nt) as

$$y = \sin^2(nt) = \frac{1}{2}\left(1 - \cos(2nt)\right)$$

thus the period is P = π/n.

The period of the general expression y = sin(nt) sin(mt), where n and m are potentially different, is found by using the expansion

$$y = \frac{1}{2}\left[\cos(n-m)t - \cos(n+m)t\right]$$

which for the example above gives

$$y = \sin(3t)\sin(5t) = \frac{1}{2}\left[-\cos 8t + \cos 2t\right] \tag{5.37}$$

The period of the first term in Eqn 5.37 is P₁ = π/4 and for the second term is P₂ = π. The total period is the smallest common multiple of these two periods, which for this case is P = π.

The property given in Eqn. 5.36 can be exploited for spectral analysis. Suppose we have measured a signal x(t) that is made up of a sine wave with a particular, but unknown, frequency, and we wish to establish this unknown frequency using spectral analysis. All we need to do is to multiply x(t) by a trial sine wave, say sin 5t, integrate over one period, and look at the result. If the integral is close to zero, then our original signal, x(t), had no sin 5t term; if however we obtain a significant non-zero integral, then we conclude that the measured signal had a sin 5t term somewhere. Note that to obtain a complete frequency description of the signal, we must try all possible frequencies from 0 to ∞.

So in summary, the idea is to multiply the signal x(t) with different sine waves containing the
frequency of interest, integrate, and then see if the result is non-zero. We could plot the integral
result against the frequency thus producing a spectral plot of x(t).
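As a small illustration of this recipe (a Python sketch with our own signal and naming), multiplying by trial sines and integrating over a whole period picks out exactly the hidden frequencies:

```python
import math

# An 'unknown' measured signal: it secretly contains sin(5t) and sin(2t)
x = lambda t: 2.0 * math.sin(5 * t) + 0.5 * math.sin(2 * t)

def correlate(trial_freq, P=2 * math.pi, npts=100000):
    """Integrate x(t)*sin(trial_freq * t) over one full fundamental period."""
    dt = P / npts
    return sum(x(k * dt) * math.sin(trial_freq * k * dt) * dt
               for k in range(npts))

for n in range(1, 8):
    print(n, round(correlate(n), 4))  # only n = 2 and n = 5 give non-zero integrals
```

The non-zero integrals land precisely at the trial frequencies present in x(t), and their sizes (here 0.5π and 2π) are proportional to the corresponding amplitudes, which is the essence of Eqn. 5.23.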

5.4.3 Using MATLAB’s FFT function

MATLAB provides a convenient way to analyse the frequency components of a finite sampled signal by using the built-in fft function. This is termed spectral analysis and is, at its simplest, just a plot of the magnitude of the Fourier transform of the time series, or abs(fft(x)). The complexities are due to the scaling of the plot, possible windowing to improve the frequency resolution at the expense of the magnitude value (or vice versa), and possible adjustment for non-periodic signals.

The SIGNAL PROCESSING toolbox includes a command psd which is an improved version of the example given in Technical Support Note #1702 available from www.mathworks.com/support/tech-notes/17 Other functions to aid the spectral analysis of systems are spectrum and the associated specplot, and etfe (empirical transfer function estimate) from the SYSTEM IDENTIFICATION toolbox. All of these routines perform basically the same task.

To look at how we can perform a spectral decomposition, let us start with a simple example where we know the answer, namely

$$x(t) = 4.5\sin(2\pi\,200\,t)$$

It is immediately apparent that at a frequency of 200 Hz we should observe a single Fourier coefficient with a magnitude of 4.5 while all others are zero. Our aim is to verify this using MATLAB's fft routine.

The function spsd (scaled PSD) given in Listing 5.8 will use the Fourier transform to decompose the signal into different frequencies and display this information. This function was adapted from the version given in Technical Support Note #1702 available at the MathWorks ftp server, and further details are given in the note. Since we have assumed that the input signal is real, we know that the Fourier transform will be conjugate symmetrical about the Nyquist frequency, so we can throw the upper half away. Doing this, we must multiply the magnitude by 2 to maintain the correct size. MATLAB also normalises the transform by dividing by the number of data points, so we must also account for this in the magnitude. Since the FFT works much faster when the series length is a power of two, we automatically pad with zeros up to the next power of 2. In the example following, I generate 1000 samples, so 24 zeros are added to make up the difference to 2¹⁰ = 1024.

Listing 5.8: Routine to compute the power spectral density plot of a time series
function [MX,f] = spsd(x,dt)
% [MX,f] = spsd(x,[dt])
% Power spectral plot of x(t) sampled at dt [s], (default dt=1).
% Output: Power of x at frequency, f [Hz]
% See also: PSD.M, SPECTRUM, SPECPLOT, ETFE

if nargin < 2, dt = 1.0; end       % default sample time

Fs = 1/dt; Fn = Fs/2;              % sampling & Nyquist frequency [Hz]
NFFT = 2.^nextpow2(length(x));     % NFFT=2.^(ceil(log(length(x))/log(2)));
FFTX = fft(x,NFFT);                % Take fft, padding with zeros
NumUniquePts = ceil((NFFT+1)/2);

FFTX = FFTX(1:NumUniquePts);       % fft is symmetric, throw away upper half
MX = 2*abs(FFTX);                  % Take magnitude of X, & x2 since threw half away
MX(1) = MX(1)/2;                   % Account for endpoint uniqueness
MX(length(MX)) = MX(length(MX))/2; % We know # of samples is even
MX = MX/length(x);                 % scale for length

f = (0:NumUniquePts-1)*2*Fn/NFFT;  % frequency Hz

if nargout == 0,                   % do plot if requested
    plot(f,MX); xlabel('frequency [Hz]');
end % if
return % SPSD.M

We are now ready to try our test signal to verify spsd.m given in Listing 5.8.

Fs = 1000;                % Sampling freq, Fs, in Hz
t = 0:1/Fs:1;             % Time vector sampled at Fs for 1 second
x = 4.5*sin(2*pi*t*200)'; % scaled sine wave
spsd(x,1/Fs)              % Do we see f = 200 Hz, A = 4.5?

We should expect to see a single distinct spike at f = 200Hz with a height of 4.5.
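The same scaling bookkeeping carries over to numpy's FFT (a Python sketch mirroring what spsd does; rfft conveniently discards the mirrored half for us):

```python
import numpy as np

Fs = 1000                         # sampling frequency [Hz]
t = np.arange(0, 1, 1 / Fs)       # 1000 samples = one second of data
x = 4.5 * np.sin(2 * np.pi * 200 * t)

N = len(x)
X = np.fft.rfft(x)                # one-sided transform (upper half already discarded)
MX = 2 * np.abs(X) / N            # x2 for the discarded half, then scale by length
MX[0] /= 2                        # the DC bin has no mirrored twin
f = np.fft.rfftfreq(N, d=1 / Fs)  # frequency axis [Hz]

peak = f[np.argmax(MX)]
print(peak, MX.max())             # a single spike: 200 Hz with amplitude 4.5
```

Because 200 Hz fits a whole number of cycles into the one-second record, the spike lands exactly on one bin with no leakage, and the recovered amplitude is 4.5 as expected.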

With the power spectral density function spsd, we can also analyse the frequency components of a square wave. We can generate a square wave using the square function in a manner similar to that for generating a sine wave, or simply wrap the signum function around a sine wave.

Fs = 30; t = 0:1/Fs:20;   % Sampling freq, Fs, in Hz & time vector for 20 seconds
s = 0.5*square(2*pi*t);   % Square wave
spsd(s,1/Fs)              % See odd spikes ??
pwelch(s,[],[],[],Fs);    % Try toolbox version: a Welch power spectral density estimate

Note that all the dominant frequency components appear at the odd frequencies (1, 3, 5, 7, . . . ), and that the frequency components at the even frequencies are suppressed, as anticipated from §5.4.1. Of course they theoretically should be zero, but the discretisation, finite signal length and sampling etc. introduce small errors here.

5.4.4 Periodogram

There exists a parallel universe to the time domain called the frequency domain, and all the operations we can do in one domain can be performed in the other. Our gateway to this frequency domain, as previously noted, is the Fast Fourier Transform, known colloquially as the FFT. Instead of plotting the signal in time, we could plot the frequency spectrum of the signal. That is, we plot the power of the signal as a function of frequency. The power Φ of the signal y(t) over a frequency band is defined as the absolute square of the Fourier transform,

$$\Phi_y(\omega) \stackrel{\mathrm{def}}{=} |Y(\omega)|^2 = \left|\sum_t y(t)e^{-i\omega t}\right|^2$$

When we plot the power as a function of frequency, it is much like a histogram where each frequency “bin” contains some power. This plot is sometimes referred to as a periodogram.

Constructing the periodogram for real data can give rather messy, inconclusive and widely fluctuating results. For a data series, you may find that different time segments of the series give different-looking spectra even when you suspect that the true underlying frequency characteristics are unchanged. One way around this is to compute the spectra or periodograms for different segments and average the results, or you could use a windowing function. The spa and etfe functions in the SYSTEM IDENTIFICATION toolbox attempt to construct periodograms from logged data.
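Welch's method automates exactly this segment-and-average idea; SciPy exposes it directly (a Python sketch with our own test signal and parameter choices):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
Fs = 100.0
t = np.arange(0, 60, 1 / Fs)                                 # one minute of data
y = np.sin(2 * np.pi * 5 * t) + rng.standard_normal(t.size)  # noisy 5 Hz sine

# One periodogram from the whole record: a widely fluctuating estimate
f_raw, P_raw = signal.periodogram(y, fs=Fs)

# Welch: average the periodograms of overlapping segments -> smoother estimate
f_w, P_w = signal.welch(y, fs=Fs, nperseg=512)

print(f_raw[np.argmax(P_raw)], f_w[np.argmax(P_w)])  # both peak near 5 Hz
```

The price of the averaging is frequency resolution: the Welch estimate has far fewer (but far less noisy) frequency bins than the raw periodogram, a trade-off set by the segment length `nperseg`.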

Problem 5.2 Fig. 5.24 trends the monthly critical radio frequencies in Washington, D.C. from May
1934 until April 1954. These frequencies reflect the highest radio frequency that can be used for
broadcasting.

Figure 5.24: Critical radio frequencies

This data is available from the collection of ‘classic’ time series data archived in: http://www-personal.buseco.
(See also Problem 6.3 for further examples using data from this collection.)

1. Find the dominant periods (in days) and amplitudes of this series.

2. Can you hypothesise any natural phenomena that could account for this periodicity?

5.4.5 Fourier smoothing

It is very easy to smooth data in the Fourier domain. You simply take all your data, FFT it,
multiply this transformed data by the filter function H(f ), and then do the inverse FFT (IFFT)
to get back into the time domain. The computationally difficult parts are the FFT and IFFT, but
this is easily achieved in M ATLAB or in custom integrated circuits. Note that this section is called
smoothing as opposed to online filtering because these smoothing filters are acausal since future
data is required to smooth present data as well as past data. Filters that only operate on past
data are referred to as causal filters. Acausal filters tend to perform better than causal filters as
they exhibit less phase lag but they cannot be used in real time applications. In process control,
physical realisation constraints mean that only causal filters are possible. However in the audio
world (compact discs and digital tapes), acausal filters are used since a time delay is not noticeable and hence allowed. Surprisingly, time delays are allowed on telephone communications as well. Numerical Recipes [161] has a good discussion of the advantages of Fourier smoothing.

Let us “smooth” the measurement data given in §5.3.1 and compare the smoothed signal with the online filtered signal. This time, however, we will deliberately generate 2¹⁰ = 1024 data pairs rather than an even thousand so that we can take advantage of the fast Fourier transform algorithm. It is still possible to evaluate the Fourier transform of a series with a non-power-of-2 number of pairs, but the calculation is much slower, so its use is discouraged.

Again we will generate a measurement signal as

$$y = \sin(2\pi f_1 t) + \frac{1}{10}\sin(2\pi f_2 t)$$

where f₁ = 2 Hz and f₂ = 50 Hz as before. We will generate 2¹⁰ data pairs at a sampling frequency of 200 Hz.

t = 1/200*[0:pow2(10)-1]';              % 2^10 data pairs sampled at 200 Hz
y = sin(2*pi*2*t) + 0.1*sin(2*pi*50*t); % Signal y(t)

Generating the Fourier transform of the data is easy. We can plot the “amount” or power in each
frequency bin Y(f ) by typing

yp = fft(y);
plot(abs(yp).^2)     % See periodogram in Fig. 5.25
semilogy(abs(yp).^2) % clearer plot

In Fig. 5.25, we immediately note that the plot is mirrored about the central Nyquist frequency, meaning that we can ignore the right-hand side of the plot. The x-axis frequency scale is normalised, but you can easily detect the two peaks corresponding to the two dominant frequencies of 2 and 50 Hz. Since we are trying to eradicate the higher frequency, what we would desire in the power spectrum is only one spike at a frequency of f = 2 Hz. We can generate a power curve of this nature by multiplying it with the appropriate filter function H(f).

Figure 5.25: The frequency power spectrum for the level signal. Both the frequency axis and the power axis are normalised.

Let us generate the filter H(f) in the frequency domain. We want to pass all frequencies less than, say, 30 Hz, and attenuate all frequencies above 30 Hz. We do this by constructing a vector where 1 corresponds to the frequencies we wish to retain, and zero to those we do not want. Then we combine the filter with the data by elementwise multiplying the two frequency series.
n = 20;
h = [ones(1,n) zeros(1,length(y)-2*n) ones(1,n)]'; % Filter H(f)
ypf = yp.*h;                 % multiply signal & filter, Y(f) x H(f)
plot([abs(yp), h, abs(ypf)])

Now we can take the inverse FFT to transform back to the time domain, giving our low-pass filtered signal. Technically the inverse Fourier transform of a Fourier transform of a real signal should return a real time series of numbers. However, owing to numerical roundoff, the computed series may contain a small complex component. In practice we can safely ignore this, and force the signal to be real.

yf = real(ifft(ypf)); % force real signal, yf = Re{IFFT(Y(f))}
plot(t,[y,yf])        % compare raw & smoothed in Fig. 5.26

Figure 5.26: The Fourier smoothed signal, yf, compared to the original noisy level signal, y. Note the absence of any phase lag in the now offline smoothed signal.
offline smoothed signal. time (s)

The result shown in Fig. 5.26 is an acceptably smooth signal without the disturbing phase lag that was unavoidable in the online filtering example.
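The entire recipe — transform, zero the unwanted bins, transform back — is only a few lines in numpy as well (a Python sketch of the same idea; here we use 1000 samples so that both sine components fit a whole number of cycles, since numpy's FFT does not require a power-of-two length):

```python
import numpy as np

Fs = 200                            # sampling frequency [Hz]
N = 1000                            # 5 seconds of data
t = np.arange(N) / Fs
y = np.sin(2 * np.pi * 2 * t) + 0.1 * np.sin(2 * np.pi * 50 * t)

Y = np.fft.fft(y)                   # to the frequency domain
f = np.fft.fftfreq(N, d=1 / Fs)     # signed bin frequencies [Hz]
H = (np.abs(f) < 30).astype(float)  # ideal low-pass: keep only |f| < 30 Hz

yf = np.real(np.fft.ifft(Y * H))    # back to time; imaginary part is roundoff

err = np.abs(yf - np.sin(2 * np.pi * 2 * t)).max()
print(err)                          # the 50 Hz component is gone
```

Because both sines land exactly on FFT bins here, zeroing the bins above 30 Hz removes the 50 Hz component completely and the smoothed signal matches the pure 2 Hz sine to within roundoff, with no phase lag.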

Problem 5.3 Design a notch filter to band-stop 50 Hz. Test your filter using an input signal with a time-varying frequency content such as u(t) = sin(16t⁴). Plot both the original and filtered signal together. Sample around 1000 data pairs using a sampling frequency of 1000 Hz. Also plot the “apparent” frequency as a function of time for u(t).

5.5 Numerically differentiating industrial data

In industrial applications, we often want to know the slope of a measured process variable. For example, we may wish to know the flowrate of material into a vessel when we only measure the level of the vessel. In this case, we differentiate the level signal to calculate the flowrate. However, for industrial signals, which typically have significant noise, calculating an accurate derivative is not trivial. A crude technique such as differencing will give intolerable error. For these types of online applications, we must first smooth the data, then differentiate the smoothed data. This can be accomplished using discrete filters.

5.5.1 Establishing feedrates

This case study demonstrates one way to differentiate noisy data. For this project, we were trying to implement a model-based estimation scheme on a 1000 litre fed-batch chemical reactor. For our model, we needed to know the feedrate of the material being fed to the reactor. However we did not have a flow meter on the feed line, but instead we could measure the weight of the reactor using load cells. As the 3 hour batch progressed, feed was admitted to the reactor, and the weight increased from about 800 kg to 1200 kg. By differentiating this weight, we could at any time approximate the instantaneous feed rate. This is not an optimal arrangement, since we expect considerable error, but in this case it was all that we could do.

The weight measurement of the reactor is quite noisy. The entire reactor is mounted on load cells, and all the lines to and from the reactor (such as cooling/heating etc.) are connected with flexible couplings. Since the reactor was very exothermic, the feed rate was very small (30 g/s), and this is almost negligible compared to the combined weight of the vessel (≈ 3000 kg) and contents (≈ 1000 kg). The raw weight signal (dotted) of the vessel over the batch is shown in the upper figures in Fig. 5.28. An enlargement is given in the upper right plot. The sample time for this application was T = 6 seconds.

A crude method of differentiating

The easiest (and crudest) method to establish the feedrate is to difference the weight signal. Suppose the sampled weight signal is W(t); then the feedrate, F = Ẇ, is approximated by

$$F = \dot W \approx \frac{W_i - W_{i-1}}{T}$$

But due to signal discretisation and noise, this results in an unusable mess³. The differencing can be achieved elegantly in MATLAB using the diff command. Clearly we must first smooth the signal before differencing.

Using a discrete filter

This application is not just smoothing, but also differentiating. We can do this in one step by combining a low pass filter, Glp, with a differentiator, s, as

$$G_d(s) = G_{lp}(s)\cdot s \tag{5.38}$$

We can design any low pass filter (say a 3rd-order Butterworth filter using the butter command), then multiply the filter polynomials by the differentiator. In the continuous domain a differentiator is simply s, but in the discrete domain we have a number of choices of how to implement it. Ogata, [148, pp 308], gives these alternatives:

$$s \approx \begin{cases} \dfrac{1-z^{-1}}{T} & \text{Backward difference}\\[2mm] \dfrac{1-z^{-1}}{z^{-1}\,T} & \text{Forward difference}\\[2mm] \dfrac{2}{T}\,\dfrac{1-z^{-1}}{1+z^{-1}} & \text{Center difference} \end{cases} \tag{5.39}$$

For a causal filter, we need to choose the backward or center difference variants. I will use the backward difference. Thus Eqn 5.38 using the backward difference in the discrete domain is

$$G_d(z) = G_{lp}(z)\cdot\frac{1-z^{-1}}{T} \tag{5.40}$$
³I do not even try to plot this, as it comes out a black rectangle

with sample time T . Thus the denominator is multiplied by T , and the numerator gets convolved
with the factor (1 − z −1 ). We need to exercise some care in doing the convolution, because the
polynomials given by butter are written in decreasing powers of z (not z −1 ). Note however
that fliplr(conv(fliplr(B),[-1,1])); is equivalent to conv(B,[1,-1]);. Listing 5.9
tests this smoothing differentiator algorithm.

Listing 5.9: Smoothing and differentiating a noisy signal

dt = 0.05; t = [0:dt:20]';                   % Sample time T = 0.05
U = square(t/1.5); [num,den] = ord2(2,0.45); % Input u(t) & plant dynamics
y = lsim(num,den,U,t);                       % simulate it ...
y = y + randn(size(y))*0.005;                % ... & add noise

dyn = gradient(y,dt);                        % Do crude approx of differentiation

[B,A] = butter(3,0.1);                       % Design smoother, ω = 0.1

Gd_lp = tf(B,A,dt,'variable','z^-1');        % Convert to low-pass filter, Glp(z⁻¹)
Gdd = series(Gd_lp,tf([1 -1],dt,dt,'variable','z^-1')) % Eqn. 5.40.
yns = filter(B,A,y);                         % Smooth y(t)
dyns = lsim(Gdd,y);                          % Smooth & differentiate y(t), see Fig. 5.27.
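For readers working outside MATLAB, the smoothing differentiator of Eqn. 5.40 can be sketched in Python, where `scipy.signal`'s `butter` and `lfilter` mirror the MATLAB calls used in Listing 5.9. The test signal here (a ramp with known slope, my own choice for illustration) makes the result easy to check: a unity-gain low-pass filter in series with the backward difference should settle at the true slope.

```python
import numpy as np
from scipy import signal

dt = 0.05
t = np.arange(0, 20 + dt, dt)
y_clean = 0.3 * t                         # a ramp: the true derivative is 0.3
rng = np.random.default_rng(1)
y = y_clean + rng.normal(0, 0.005, t.size)

# 3rd order Butterworth low-pass with normalised cut-off 0.1, as in Listing 5.9
B, A = signal.butter(3, 0.1)

# Eqn 5.40: multiply the numerator by (1 - z^-1)/T,
# i.e. convolve B with [1, -1] and scale by 1/T
Bd = np.convolve(B, [1, -1]) / dt
dy_smooth = signal.lfilter(Bd, A, y)      # smoothed derivative estimate

# After the start-up transient the estimate should hover near 0.3
settled = dy_smooth[-50:].mean()
```

Note the coefficient convention: for `lfilter` (like MATLAB's `filter`) the arrays are read in increasing powers of z⁻¹, so the convolution with `[1, -1]` directly implements the (1 − z⁻¹) factor without the `fliplr` gymnastics needed for the `tf` objects.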

The results of the smoothing and the differentiating of the noisy signal are given in Fig. 5.27.
The upper plot compares the noisy measurement with the smoothed version exhibiting the
inevitable phase lag. The lower plot in Fig. 5.27 shows the crude approximation of the derivative
obtained by differencing (with little phase lag, but substantial noise), and the smoothed and
differentiated version.

Figure 5.27: Smoothing (upper) and differentiating (lower) a noisy measurement.

Note that even though the noise on the raw measurement is insignificant, the finite differencing
amplifies the noise considerably. While the degree of smoothing can be increased by decreasing
the low pass cut-off frequency, ω, we pay the price of increased phase lag.

Fig. 5.28 shows actual industrial results using this algorithm. The right hand plots show an
enlarged portion of the signal. The upper left plot shows the raw (dotted) and smoothed (solid)
weight signal. The smoothed signal was obtained from the normal unity gain Butterworth filter
with ωc = 0.02/(2T) = 1/600 s⁻¹. There is a short transient at the beginning, when the filter is first
invoked, since I made no attempt to correct for the non-zero initial conditions. The lower 2 plots
show the differentiated signal, in this case the feed rate, using the digital filter and differentiator.
Note that since the filter is causal, there will be a phase lag, and this is evident in the filtered
signal given in the top right plot. Naturally the differentiated signal will also lag behind the true
feedrate. In this construction, the feed rate is occasionally negative, corresponding to a decrease
in weight or a filter transient. This is physically impossible, and the derived feedrate should be
chopped at zero if it were to be used further in, for example, a model based control scheme.

Figure 5.28: Industrial data showing (upper) the raw & filtered weight of a fed-batch reactor
vessel [kg] during a production run, and (lower) the feedrate [kg/s].

5.6 Summary

Filtering, or better still smoothing, is one way to remove unwanted noise from your data. Smoothing
is best done offline if possible. The easiest way to smooth data is to decide what frequencies
you wish to retain as the signal, and which frequencies you wish to discard. One then constructs
a filter function, H(f), which is multiplied with the signal in the frequency domain to create the
spectrum of the smoothed signal. To get the filtered time signal, you transform back to the time
domain. One uses the Fourier transform to convert between the time and frequency domains.
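The recipe in this paragraph can be sketched in a few lines of Python (illustrative only, not the book's MATLAB): transform, multiply by a low-pass H(f), and transform back. The signal, noise level and cut-off frequency are my own choices.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 512, endpoint=False)
clean = np.sin(2 * np.pi * 3 * t)            # a slow 3 Hz signal to retain
noisy = clean + rng.normal(0, 0.3, t.size)   # broadband noise to discard

# Transform to the frequency domain, apply an ideal low-pass H(f), invert
Y = np.fft.rfft(noisy)
f = np.fft.rfftfreq(t.size, d=t[1] - t[0])
H = (f <= 10.0).astype(float)                # keep everything below 10 Hz
smoothed = np.fft.irfft(Y * H, n=t.size)

# The smoothed signal should be far closer to the clean one than the noisy was
err_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
err_smooth = np.sqrt(np.mean((smoothed - clean) ** 2))
```

Since this operates on the whole record at once it is an offline (acausal) smoother, which is exactly why it avoids the phase lag that plagues the causal filters discussed next.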

If you cannot afford the luxury of offline filtering, then one must use a causal filter. IIR filters
perform better than FIRs, but both have phase lags at high frequencies and rounded edges that are
deleterious to the filtered signal. The Butterworth filter is better than simple cascaded first order
filters: it has only one parameter (the cut-off frequency), is relatively simple to design, and suits
most applications. Other more complicated filters such as Chebyshev or elliptic filters, with
correspondingly more design parameters, are easily implemented using the signal processing
toolbox in M ATLAB if desired. Usually for process engineering applications, we are most interested
in low pass filters, but in some applications where we may wish to remove slow long term trends
(adjustment for seasons, which is common in reporting unemployment figures for example), or
differentiate, or remove a specific frequency such as the 50 Hz mains hum, then high-pass or notch
filters are to be used. These can be designed in exactly the same way as a low pass filter, but with
a variable transformation.
Chapter 6

Identification of process models

Fiedler’s forecasting rules:


Forecasting is very difficult, especially if it’s about the future. Consequently when presenting a forecast:
Give them a number or give them a date, but never both.

6.1 The importance of system identification

Most advanced control schemes require a good process model to be effective. One option is to set
the control law to be simply the inverse of the plant so when the inevitable disturbances occur,
the controller applies an equal but opposite input to maintain a constant output. In practice this
desire must be relaxed somewhat to be physically realisable, but the general idea of inserting
a dynamic element (controller) into the feedback loop to obtain suitable closed loop dynamics
necessitates a process model.

When we talk about ‘models’, we mean some formal way to predict the system outputs knowing
the input. Fig. 6.1 shows the three components of interest; the input, the plant, and the output. If
we know any two components, we can back-calculate the third.

Figure 6.1: The prediction problem: Given a model and the input, u, can we predict the output,
y?

When convenient in this chapter, I will use the convention that inputs and associated parameters
such as the B polynomial are coloured blue, and the outputs and the A polynomial are coloured
red. As a mnemonic aid, think of a hot water heater. Disturbances will be coloured green (be-
cause sometimes they come from nature). These colour conventions may help make some of the
equations in this chapter easier to follow.


In the prediction problem, given the plant and the input, one must compute the output; in the
reconstruction problem, given the output and the plant, one must propose a plausible input to
explain the observed results; and in the identification problem, the one we are concerned with in
this chapter, one must propose a suitable model for the unknown plant, given input and output
data.

Our aim for automatic control problems is to produce a usable model, that given typical plant
input data, will predict within a reasonable tolerance the consequences. We should however
remember the advice by the early English theologian and philosopher, William of Occam (1285–
1349) who said: “What can be accounted for by fewer assumptions is explained in vain by more.”
This is now known as the principle of “Occam’s razor”, where with a razor we are encouraged to
cut all that is not strictly necessary away from our model. This motivates us to seek simple, low
order, low complexity, succinct, elegant models.

For the purposes of controller design we are interested in dynamic models as opposed to simply
curve fitting algebraic models. In most applications, we collect discrete data, and employ digital
controllers, so we are only really interested in discrete dynamic models.

Sections 3.4 and 3.4.2 revise traditional curve and model fitting such as you would use in
algebraic model fitting. Section 6.7 extends this to fit models to dynamic data.

Good texts for system identification include the classic [32] for time series analysis, [188], and
[124] accompanied with the closely related M ATLAB toolbox for system identification, [125]. For
identification applications in the processing industries, [127] contains the basics for modelling
and estimation suitable for those with a chemical engineering background. A good collection
is found in [147], and a now somewhat dated detailed survey of applications, including some
interesting practical hints and observations with over 100 references, is given in [79].

6.1.1 Basic definitions

When we formulate and use engineering models, we typically have in mind a real physical pro-
cess or plant, that is perhaps so complex, large, or just downright inconvenient, that we want to
use a model instead. We must be careful at all times to distinguish between the plant, model,
data and predictions. The following terms are adapted from [188].

System, S This is the system that we are trying to identify. It is the physical process or plant that
generates the experimental data. In reality this would be the actual chemical reactor, plane,
submarine or whatever. In simulation studies, we will refer to this as the “truth” model.
Model, M This is the model that we wish to find such that the model predictions generated by
M are sufficiently close to the actual observed system S output. Sometimes the model may
be non-parametric where it is characterised by a curve or a Bode plot for example, or the
model might be characterised by a finite vector of numbers or parameters, M(θ), such as
say the coefficients of a polynomial transfer function or weights in a neural network model.
In either case, we can think of this as a black-box model where if we feed in the inputs, turn
the handle, we will get output predictions that hopefully match the true system outputs.
Parameters, θ. These are the unknowns in the model M that we are trying to establish such
that our output predictions ŷ are as close as possible to the actual output y. In a transfer
function model, the parameters are the coefficients of the polynomials; in a neural network,
they would be the weights of the neurons.

This then brings us to the fundamental aim of system identification: Given sufficient input/output
data, and a tentative model structure M with free parameters θ, we want to find reasonable
values for θ such that our predictions ŷ closely follow the behaviour of our plant under test, y, as
shown in Fig. 6.2.

Figure 6.2: A good model, M, duplicates the behaviour of the true plant, S.

Models can be classified as linear or nonlinear. In the context of parameter identification, which
is the theme of this chapter, it is assumed that the linearity (or lack thereof) is with respect to the
parameters, and not to the shape of the actual model output. A linear model does not mean just
a straight line, but includes all models where the model parameters, θ, enter linearly. If we have
the general model

    y = f(θ, x)

with parameters θ and independent variables x, then if ∂y/∂θ is a constant vector independent
of θ, the model is linear in the parameters; otherwise the model is termed nonlinear. Linear
models can be written in vector form as

    y = f(x)⊤ · θ

Just as linear models are easier to use than nonlinear models, so too are discrete models compared
with continuous models.
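As an illustration of linearity in the parameters, a model such as y = θ₁x + θ₂x² + θ₃ sin(x) is certainly not a straight line in x, yet ∂y/∂θ = f(x) does not depend on θ, so ordinary least squares recovers the parameters in one shot. The following Python sketch uses my own example model and numbers:

```python
import numpy as np

rng = np.random.default_rng(3)
theta_true = np.array([2.0, -0.5, 1.5])

# A model nonlinear in x, but linear in the parameters:
#   y = theta1*x + theta2*x^2 + theta3*sin(x) = f(x)' * theta
x = np.linspace(0, 5, 100)
F = np.column_stack([x, x**2, np.sin(x)])   # regressor matrix, rows are f(x)'
y = F @ theta_true + rng.normal(0, 0.05, x.size)

# Because dy/dtheta = f(x) is independent of theta, a single linear
# least-squares solve recovers the parameters; no iteration is needed
theta_hat, *_ = np.linalg.lstsq(F, y, rcond=None)
```

A model where θ enters nonlinearly, say y = θ₁ exp(−θ₂x), offers no such one-shot solution and must be regressed iteratively.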

6.1.2 Black, white and grey box models

The two extreme approaches to modelling are sometimes termed black and white box models.
The black box, or data-based model is built exclusively from experimental data, and a deliberate
boundary is cast around the model, the contents inside about which we neither know nor care.
The white box, theoretical or first-principles model, preferred by purists, contains already well
established physical and chemical relations ideally so fundamental, no experiments are needed
to characterise any free parameters.

However most industrial processes are so complex that totally fundamental models are usually
not available. Here the models that are used may be partly empirical and partly fundamental.
The empirical model will have some fitted parameters. Examples of this type of model are the
van der Waals gas law or correlations for the friction factor in turbulent flow. Finally, if the process
is so complex that no fundamental model is possible, then a black box type model may be fitted.
The Wood-Berry column model described in Eqn. 3.24 is a black box model and the parameters
were obtained using the technique of system identification. Particularly hopeless models, based
mostly on whim and what functions happened to be lying around at the time, can be found in
[77]. These are the types of models that should never have been regressed.

Of course many people have argued for many years about the relative merits of these two extreme
approaches. As expected, the practising engineer tends to have the most success somewhere in
the middle using what are, not unexpectedly, termed grey box models. The survey in [184] describes
the advantages and disadvantages of these 'colour coded' models in further detail.

6.1.3 Techniques for identification

There are two main ways to identify process models: offline or online in real-time. In both cases
however, it is unavoidable to collect response data from the system of interest as shown in Fig. 6.3.
Most of the more sophisticated model identification is performed in the offline mode where all
the data is collected first and only analysed later to produce a valid model. This is typically
convenient since before any regressing is performed we know all the data, and we can do this in
comfort in front of our own computer workstation. The offline developed model (hopefully) fits
the data at the time that it was collected, but there is no guarantee that it will be still applicable for
data collected sometime in the future. Process or environmental conditions may change which
may force the process to behave unexpectedly.

Figure 6.3: An experimental setup for input/output identification. We log both the input and the
response data to a computer for further processing.

The second way to identify models is to start the model regression analysis while you are
collecting the data in real-time. Clearly this is not the preferred method when compared with
offline identification, but in some cases we will study later, such as adaptive control, or where we
have little input/output data at our immediate disposal but are gradually collecting more, we
must use online identification. As you collect more data, you continually update and correct the
model parameters that were established previously. This is necessary when we require a model
during the experiment, but the process conditions are likely to change during the period of
interest. We will see in chapter 7 that online least squares recursive parameter estimation is
the central core of adaptive control schemes.

6.2 Graphical and non-parametric model identification

Dynamic model identification is essentially curve fitting as described in section 3.4, but for
dynamic systems. For model based control, (our final aim in this section), it is more useful to have
parametric dynamic models. These are models which have distinct parameters to be estimated in
a specified structure. Conversely, non-parametric models are ones where the model is described
by a continuous curve, such as say a Bode or Nyquist plot, or collections of models
where the model structure has not been determined beforehand, but is established from the data
in addition to establishing the parameters.

In both parametric and non-parametric cases we can make predictions, which is a necessary
requirement of any model, but it is far more convenient in a digital computer to use models
based on a finite number of parameters rather than resort to some sort of table lookup or digitise
a non-parametric curve.

How one identifies the parameters in proposed models from experimental data is mostly a matter
of preference. There are two main environments in which to identify models;

1. Time domain analysis described in §6.2.1, and

2. frequency domain analysis described in §6.2.2.

Perturb the process somehow . . . and watch what happens.

6.2.1 Time domain identification using graphical techniques

Many engineers tend to be more comfortable in the time domain, and with the introduction of
computers, identification in the time domain, while tedious, is now reliable and quite straight
forward. The underlying idea is to perturb the process somehow, and watch what happens.
Knowing both the input and output signals, one can in principle, compute the process transfer
function. The principle experimental design decision is the type of exciting input signal.

Step and impulse inputs

Historically, the most popular perturbation test for visualisation and manual identification was
the 'step test'. The input step test itself is very simple, and the major response types such as first
order, underdamped second order, integrators and so forth are all easily recognisable from their
step tests by the plant personnel, as illustrated in Fig. 6.4.

Figure 6.4: Typical open loop step tests for a variety of different standard plants.

Probably the most common model structure for industrial plants is the first-order plus deadtime
model,

    Gp(s) = K e^(−θs) / (τs + 1)                              (6.1)

where we are to extract reasonable values for the three parameters: plant gain K, time constant
τ, and deadtime θ. There are a number of strategies to identify these parameters, but one of the
more robust is called the "Method of Areas", [120, pp 32–33].

Algorithm 6.1 FOPDT identification from a step response using the Method of Areas.

Collect some step response data (u, y) from the plant, then

1. Identify the plant gain,

       K = (y(∞) − y(0)) / (u(∞) − u(0))

   and normalise the experimental data to the response of a unit input step,

       yus = (y − y(0)) / (u(∞) − u(0))

2. Compute the area

       A0 = ∫₀^∞ (K − yus(t)) dt                              (6.2)

3. Compute the time t1 = A0/K and then compute a second area

       A1 = ∫₀^{t1} yus(t) dt                                 (6.3)

   as illustrated in Fig. 6.5.

4. Now the two remaining parameters of the proposed plant model in Eqn. 6.1 are

       τ = e·A1/K    and    θ = (A0 − e·A1)/K                 (6.4)

   where e = exp(1).

Figure 6.5: Areas method for a first-order plus time delay model identification.

Listing 6.1 gives a simple M ATLAB routine to do this calculation and Fig. 6.5 illustrates the
integrals to be computed. Examples of the identification are given in Fig. 6.6 for a variety of
simulated plants (1st and 2nd order, and an integrator) with different levels of measurement
noise superimposed. Fig. 6.7 shows the experimental results from a step test applied to the
blackbox filter with the identified model.

Listing 6.1: Identification of a first-order plant with deadtime from an openloop step response
using the Areas method from Algorithm 6.1.

Gp = tf(3, [7 2 1],'iodelay',2); % True plant: Gp(s) = 3e^{-2s}/(7s^2 + 2s + 1)
[y,t] = step(Gp);     % Do step experiment to generate data, see Fig. 6.5.

% Calculate areas A0 & A1 as shown in Fig. 6.5.
K = y(end);           % plant gain
A0 = trapz(t,K-y);
t0 = A0/K;
idx = find(t<t0);
t1 = t(idx); y1 = y(idx);
A1 = trapz(t1,y1);

% Create process model Ĝp(s)
tau = exp(1)*A1/K;
Dest = max(0,(A0-exp(1)*A1)/K); % deadtime
Gm = tf(K,[tau 1],'iodelay',Dest);

step(Gp,Gm) % Compare model with plant, see Fig. 6.5.



Figure 6.6: The "Areas method" for plant identification applied to a variety of different plants
with random disturbances.

Figure 6.7: Identification of the experimental Blackbox using the Method of Areas. Note that the
identification strategy is reasonably robust to the noise and outlier in the experimental data.

Of course not all plants are adequately modelled using a first-order plus deadtime model. For
example, Fig. 6.8 shows the experimental results from a step test applied to the electromagnetic
balance arm which is clearly second order with poles very close to the imaginary axis.

However many plant operators do not appreciate the control engineer attempting to “step test”
the plant under open loop conditions since it necessarily causes periods of off-specification prod-
uct at best and an unstable plant at worst. An alternative to the step test is the impulse test where
the plant will also produce off-spec product for a limited period,1 but at least it returns to the
same operating point even using limited precision equipment. This is important if the plant ex-
hibits severe nonlinearities and the choice of operating point is crucial. The impulse response is
ideal for frequency domain testing since it contains equal amounts of stimulation at all
frequencies, but is difficult to approximate in practice. Given that we then must accept a finite
height pulse with finite slope, we may be limited as to the amount of energy we can inject into
the system without hitting a limiting nonlinearity.

¹Provided the plant does not contain an integrator

Figure 6.8: Two step tests subjected to the balance arm. Note the slow decay of the oscillations
indicates that the plant has at least two complex conjugate poles very close to the stable half of
the imaginary axis.

The impulse test is more difficult to analyse using
manual graphical techniques (pencil and graph paper), as opposed to computerised regression
techniques, than the step test.

Random inputs

An alternative to the step or impulse tests is actually not giving any deliberate input to the plant
at all, but just relying on the ever-present natural input disturbances. This is a popular approach
when one wants still to perform an identification, but cannot, or is unable to influence the system
themselves. Economists, astronomers and historians fall into this group of armchair-sitting, non-
interventionist identificationalists. This is sometimes referred to as time series analysis (TSA), see
[32] and [54].

Typically for this scheme to work in practice, one must additionally consciously perturb the input
(control valve, heating coils etc) in some random manner, since the natural disturbance may not
normally be sufficient in either magnitude, or even frequency content by itself. In addition, it is
more complex to check the quality of model solution from comparison with the raw time series
analysis. The plant model analysed could be anything, and from a visual inspection of the raw
data it is difficult to verify that the solution is a good one. The computation required for this type
of analysis is more complex than graphically curve fitting and the numerical algorithms used for
time series analysis are much more sensitive to the initial parameters. However using random
inputs has the advantage that they continually excite or stimulate the process and this excitation,
provided the random sequence is properly chosen, stimulates the process over a wide frequency
range. For complicated models, or for low signal to noise ratios, lengthy experiments may be
necessary, which is infeasible using one-off step or impulse tests. This property of continual
stimulation is referred to as persistent excitation and is a property of the input signal and should
be considered in conjunction with the proposed model. See [18, §2.4] for further details.

Two obvious choices for random sequences are selecting the variable from a finite width uniform
distribution, or from a normal distribution. In practice, when using a normal distribution, one
must be careful not to inject values far from the desired mean input signal, since this could cause
additional unwanted process upsets.

Random binary signals

Electrical engineers have traditionally used random signals for testing various components where
the signal is restricted to one of two discrete signal levels. Such a random binary signal is called
an RBS. A binary signal is sometimes preferred to the normal random signal, since one avoids
possible outliers, and dead zone nonlinearities when using signals with very small amplitudes.
We can generate an RBS in M ATLAB using the random number generator and the signum func-
tion, sign, as suggested in [124, p373]. Using the signum function could be dangerous since
technically it is possible to have three values, −1, 1 and zero. To get different power spectra of
our binary test signal, we could filter the white noise before we “sign” it.

x = randn(100,1);            % white noise
rbsx = 0.0 <= x;             % binary signal

y = filter(0.2,[1 -0.8],x);  % coloured noise
rbsy = 0.0 <= y;             % binary value

Compare in Fig. 6.9 the white noise (left plots) with the white noise filtered through a low-pass
filter (right plots). The filter used in Fig. 6.9 is 0.2/(1 − 0.8q⁻¹).

Figure 6.9: Comparing white and filtered (coloured) random signals. Upper: normal signals;
lower: two-level or binary signals.

If you fabricate in hardware one of these random signal generators using standard components
such as flip-flops or XOR gates, rather than use a process control computer, then to minimise
component cost you want to reduce the number of these physical components. This is done in
hardware using a linear feedback shift register (LFSR), which is an efficient approximation to our
(almost random) random number generator. Fig. 6.10 shows the design of one version which
uses a 5 element binary shift register. Initially the register is filled with all ones, and at each clock
cycle the elements all shift one position to the right, and the new element is formed by adding
the 3rd and 5th elements modulo 2.

Since we have only 5 positions in the shift register in Fig. 6.10, the maximum length the sequence
can possibly be before it starts repeating is 2⁵ − 1 or 31 elements. Fig. 6.11 shows a S IMULINK

Figure 6.10: A 5-element binary shift register to generate a pseudo-random binary sequence. The
new element is the modulo-2 sum of the 3rd and 5th elements.

Figure 6.11: S IMULINK simulation of a 5-element binary shift register: (a) the block diagram built
from unit delays and a mod-2 sum, and (b) the resulting pseudo random binary sequence of
length 31. Note the period of 31 samples. Refer also to Fig. 6.10.

implementation and the resulting binary sequence which repeats every 31 samples.

Random number generators such as Fig. 6.10 are completely deterministic, so are called pseudo
random binary generators since, in principle, anyone can regenerate the exact sequence given
the starting point and the construction in Fig. 6.10. An RBS can be used as an alternative to our
random sequences for identification. These and other commonly used test and excitation signals
are compared and discussed in [200, p143].
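The shift register of Fig. 6.10 is easily mimicked in software. The following Python sketch (my own translation of the figure) starts from the all-ones state, forms the new bit from the 3rd and 5th elements modulo 2, and runs until the state repeats, confirming the maximal period of 2⁵ − 1 = 31:

```python
def lfsr5():
    """5-element LFSR of Fig. 6.10: register starts all ones; the new
    bit is the sum of the 3rd and 5th elements, modulo 2."""
    reg = (1, 1, 1, 1, 1)
    states = [reg]
    while True:
        new = (reg[2] + reg[4]) % 2   # add 3rd and 5th elements mod 2
        reg = (new,) + reg[:-1]       # shift the register one place right
        if reg == states[0]:          # back at the initial state: one full period
            break
        states.append(reg)
    return [s[-1] for s in states]    # the output bit stream, one period

prbs = lfsr5()
```

One period of this maximal-length sequence contains 16 ones and 15 zeros, a standard property of m-sequences that makes the signal almost zero-mean, which is convenient for plant testing.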

6.2.2 Experimental frequency response analysis

When first faced with an unknown plant, it is often a good idea to perform a non-parametric model
identification such as a frequency response analysis. Such information, quantified perhaps as a
Bode diagram, can then be used for subsequent controller design, although is not suitable for
direct simulation. For that we would require a transfer function or equivalent, but the frequency
response analysis will aid us in the choice of appropriate model structure if we were to regress
further a parametric model.

The frequency response of an unknown plant G(s) can be calculated by substituting s = iω and,
following the definition of the Fourier transform, one gets

    G(iω) ≜ Y(iω)/X(iω) = [∫₀^∞ y(t) e^(−iωt) dt] / [∫₀^∞ x(t) e^(−iωt) dt]     (6.5)

For every value of frequency, 0 < ω < ∞, we must approximate the integrals in Eqn. 6.5. If we
choose the input to be a perfect Dirac pulse, then the denominator in Eqn. 6.5 will always equal
1. This simplifies the computation, but complicates the practical aspects of the experiment. If
we use arbitrary inputs, we must compute the integrals numerically, although they are in fact
simply the Fourier transform of y(t) and x(t) and can be computed efficiently using the Fast
Fourier Transform or FFT.
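Here is a hedged Python sketch of this FFT route (the discrete 'plant' and the tolerance are my own choices): excite a known filter with a unit pulse, so that the denominator of Eqn. 6.5 is exactly one, and the FFT of the recorded response gives the frequency response directly.

```python
import numpy as np
from scipy import signal

# A known discrete 'plant' whose frequency response we pretend not to know
b, a = signal.butter(2, 0.2)

# Excite it with a (discrete) unit pulse, so the denominator of Eqn 6.5 is 1
N = 4096
x = np.zeros(N)
x[0] = 1.0
y = signal.lfilter(b, a, x)              # record the plant response

# Empirical frequency response via the FFT: G(w) = Y(w)/X(w)
G_hat = np.fft.rfft(y) / np.fft.rfft(x)

# Compare with the exact frequency response of the 'unknown' plant
w = np.linspace(0, np.pi, len(G_hat))    # rfft bin frequencies, 0 to pi
_, G_exact = signal.freqz(b, a, worN=w)
```

With a stable plant and a record long enough for the impulse response to decay, the two frequency responses agree to within numerical precision; with noisy data or short records the division becomes ill-conditioned wherever the input spectrum is small.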

We will compare three alternative techniques to compute the frequency response of an actual
experimental laboratory plant, namely:

1. subjecting the plant to a series of sinusoids of different frequencies,

2. subjecting the plant to a single sinusoid with a time-varying frequency component (chirp
signal),

3. and subjecting a plant to a pseudo-random white noise input.

In the three alternatives, we must record (or compute) the steady-state amplitude ratio of the
output over the input and the phase lag of the output compared with the input for different
input frequencies.
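One convenient way to compute the amplitude ratio and phase lag at a single test frequency is to correlate the measured output with sin(ωt) and cos(ωt), in effect a one-point Fourier transform. The Python sketch below (my own construction, with a simulated steady-state response standing in for the plant data) recovers both quantities for a unit-amplitude input sine:

```python
import numpy as np

# Simulated steady-state response to a sine test: input u = sin(w t),
# output y = AR * sin(w t + phi), plus a little measurement noise
w, AR_true, phi_true = 2.0, 0.45, -0.8
t = np.arange(0.0, 200.0, 0.01)
rng = np.random.default_rng(6)
y = AR_true * np.sin(w * t + phi_true) + rng.normal(0, 0.01, t.size)

# Correlate with sin and cos at the test frequency (a one-point DFT);
# averaging over many periods rejects the noise
a = 2.0 * np.mean(y * np.sin(w * t))   # in-phase component, AR*cos(phi)
b = 2.0 * np.mean(y * np.cos(w * t))   # quadrature component, AR*sin(phi)

AR_hat = np.hypot(a, b)                # amplitude ratio (unit input amplitude)
phi_hat = np.arctan2(b, a)             # phase lag [rad]
```

This correlation approach is far more robust than reading peaks off a chart, since the averaging suppresses noise and any harmonics introduced by plant nonlinearities.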

Finally, to extract estimates of the model parameters from the frequency data, you could use the
M ATLAB function invfreqs, or the equivalent for the discrete domain, invfreqz. I have no
idea how "robust" this fitting is, and I would treat the results very cautiously at first. (Also
refer to problem 6.1.)

A series of input sine waves

The simplest way to experimentally characterise the frequency response of a plant is to input a
series of pure sine waves at differing frequencies, as shown in the S IMULINK diagram Fig. 6.12,
which uses the real-time toolbox to communicate with the experimental blackbox.

Fig. 6.13 shows the response of the laboratory blackbox when subjected to 12 sine waves of increasing frequency spanning from ω = 0.2 to 6 radians/second. The output clearly shows a reduction in amplitude at the higher frequencies, but also a small amplification at modest frequencies. By carefully reading the output amplitude and phase lag, we can establish the 12 experimental points given in Table 6.1, suitable, say, for a Bode diagram.
6.2. GRAPHICAL AND NON-PARAMETRIC MODEL IDENTIFICATION 247

[Simulink block diagram: Sine Wave → Saturation → DAC (MCC DAQ DAC OUT) → blackbox → ADC (MCC DAQ ADC IN) → Scope and To Workspace (uy). Maximum real-time sampling rate is around 0.1 s.]

Figure 6.12: Using the real-time toolbox in MATLAB to subject sine waves to the blackbox for an experimental frequency analysis.

[Figure: twelve panels of measured sine-wave responses at ω = 0.2, 0.3, 0.5, 0.8, 1.0, 1.3, 1.5, 2.0, 2.5, 3.0, 4.0 and 6.0 rad/s; the output amplitude spans roughly ±2 over 0–25 s in each panel.]

Figure 6.13: The black box frequency response analysis using a series of sine waves of different
frequencies. This data will be subsequently processed and is tabulated in Table 6.1 and plotted
in Fig. 6.20.

However, while conceptually simple, the sine wave approach is neither efficient nor practical. Lees and Hougen, [118], while attempting a frequency response characterisation of a steam/water heat exchanger, noted that it was difficult to produce a good quality sine wave input to the plant when using the control valve. In addition, since the range of frequencies spanned over 3.5 orders of magnitude (from 0.001 rad/s to 3 rad/s), the testing time was extensive.

Additional problems of sine-wave testing are that the gain and phase lag are very sensitive to bias, drift and output saturation. In addition, it is often difficult in practice to measure the phase lag accurately. [141, pp117–118] give further potential problems and some modifications to lessen the effects of nonlinearities. The most important conclusion from the Lees and Hougen study was that they obtained better frequency information, with much less work, using pulse testing and subsequently analysing the input/output data using Fourier transforms.
248 CHAPTER 6. IDENTIFICATION OF PROCESS MODELS

Table 6.1: Experimentally determined frequency response of the blackbox derived from the laboratory data shown in Fig. 6.13. For a Bode plot of this data, see Fig. 6.20.

    Input frequency    Amplitude ratio    Phase lag
    ω (rad/s)          –                  φ (degrees)
    0.2                1.19                 −4.8
    0.3                1.21                 −5.1
    0.5                1.27                −11.1
    0.8                1.42                −20.8
    1.0                1.59                −27.6
    1.3                1.92                −46.7
    1.5                2.08                −64.4
    2.0                1.61               −119.2
    2.5                0.88               −144.3
    3.0                0.54               −156.8
    4.0                0.27               −172.6
    6.0                0.11               −356.1

A chirp signal

If we input a signal with a time-varying frequency component such as a ‘chirp signal’, u(t) = sin(ωt²), to an unknown plant, we can obtain an idea of the frequency response with a single, albeit long drawn-out, experiment, unlike the series of experiments required in the pure sinusoidal case.

Fig. 6.14 shows some actual input/output data from the ‘blackbox’ laboratory plant collected at T = 0.1 seconds. As the input frequency is increased, the output amplitude decreases as expected for a plant with low-pass filtering characteristics. However, the phase lag information is difficult to determine at this scale from the plot given in Fig. 6.14. Our aim is to quantify this amplitude reduction and phase lag as a function of frequency for this experimental plant.
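The expected behaviour is easily reproduced in simulation. A sketch in Python/SciPy, with an arbitrary first-order lag standing in for the blackbox:

```python
import numpy as np
from scipy.signal import lsim, TransferFunction

# A chirp u(t) = sin(w t^2) fed through an (arbitrary) first-order lag;
# the output envelope shrinks as the instantaneous frequency grows.
t = np.linspace(0.0, 60.0, 12001)
u = np.sin(0.05*t**2)                     # instantaneous frequency = 0.1 t [rad/s]
G = TransferFunction([1.0], [2.0, 1.0])   # G(s) = 1/(2s+1), a low-pass plant
_, y, _ = lsim(G, U=u, T=t)

early = np.max(np.abs(y[(t > 5) & (t < 15)]))  # low-frequency portion
late = np.max(np.abs(y[t > 50]))               # high-frequency portion
print(early, late)                             # late is much smaller than early
```

Since |G(iω)| = 1/√(1 + 4ω²) for this test plant, the envelope near the end of the sweep (ω > 5 rad/s) is well below the envelope near the start.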

Since the phase information is hard to visualise in Fig. 6.14, we can plot the input and output signals as a phase plot similar to a Lissajous figure. The resulting curve must be an ellipse (since the input and output frequencies are the same), but the ratio of the height to width is the amplitude ratio, and the eccentricity and orientation are a function of the phase lag.

If we plot the start of the phase plot using the same input/output data as Fig. 6.14, we should see
the trajectory of an ellipse gradually rotating around, and growing thinner and thinner. Fig. 6.15
shows both the two-dimensional phase plot and a three dimensional plot with the time axis
added.

Fig. 6.16 shows the same experiment performed on the flapper. This experiment indicates that the flapper has minimal dynamics since there is little discernible amplitude attenuation or phase lag, but substantial noise. The saturation of the output is also seen in the squared edges of the ellipse in the phase plot, in addition to the flattened sinusoids of the flapper response.

A pseudo random input

A more efficient experimental technique is to subject the plant to an input signal that contains
a wide spectrum of input frequencies, and then compute the frequency response directly. This
has the advantage that the experimentation is considerably shorter, potentially processes the data

[Figure: chirp experiment on the blackbox; output (upper trace, roughly 0 to 1 V) and input (lower trace, roughly −1 to 0 V) versus time over 0–350 s.]

Figure 6.14: Input (lower) /output (upper) data collected from the black box where the input is a
chirp signal.

[Figure: phase plots of output (V) versus input (V) for the chirp experiment; the trajectory is a slowly rotating, thinning ellipse.]

(a) 2D input/output phase plot. (b) 2D input/output phase plot with time in the third dimension.

Figure 6.15: Phase plot of the experimental black box when the input is a chirp signal. See also
Fig. 6.14.

[Figure: the same chirp experiment on the flapper; the output shows little attenuation but substantial noise, and the phase-plot ellipse has squared edges due to saturation.]

(a) Time response. (b) Phase plot of the experimental flapper.

Figure 6.16: Flapper response to a chirp signal. There is little evidence of any appreciable dynamics over the range of frequencies investigated.

more efficiently and removes some of the human bias.

The experimental setup in Fig. 6.17 shows an unknown plant subjected to a random input. If we
collect enough input/output data pairs, we should be able to estimate the frequency response of
the unknown plant.

[Simulink diagram: Band-Limited White Noise → unknown plant G → logged input/output data UY.]

Figure 6.17: Experimental setup to subject a random input into an unknown plant. The input/output data was collected, processed through Listing 6.2 to give the frequency response shown in Fig. 6.18.

Fig. 6.18 compares the magnitude and phase of the ratio of the Fourier transforms of the input and output to the analytically computed Bode plot, given that the ‘unknown’ plant is in fact G(s) = e^{−2s}/(5s² + s + 6).

The actual numerical computations to produce the Bode plot in Fig. 6.18 are given in Listing 6.2.
While the listing looks complex, the key calculation is the computation of the fast Fourier trans-
form, and the subsequent division of the two vectors of complex numbers.

It is important to note that SIMULINK must deliver the equally spaced samples necessary for the subsequent FFT transformation, and that the frequency vector is equally spaced from 0 to half the Nyquist frequency.

[Figure: experimental |G(iω)| (from 10⁰ down to 10⁻⁵) and phase angle (0 to about −600 degrees) versus frequency ω from 10⁻³ to 10² rad/s, with the Nyquist frequency f_N marked.]

Figure 6.18: The experimental frequency response compared to the true analytical Bode diagram. See the routine in Listing 6.2.

Listing 6.2: Frequency response identification of an unknown plant directly from input/output data
G = tf(1,[5 1 6],'iodelay',2);   % ‘Unknown’ plant under test
Ts = 0.2; timespan = [0 2000];   % sample time & period of interest
sl_optns = simset('RelTol',1e-5,'AbsTol',1e-5, ...
    'MaxStep',1e-2,'MinStep',1e-3);
[t,x,UY] = sim('splant_noiseIO',timespan,sl_optns); % Refer Fig. 6.17.

n = ceil((pow2(nextpow2(length(t))-1)+1)/2);
t = t(1:n);                      % discard down to a power of 2 for the FFT
u = UY(1:n,1); y = UY(1:n,2);

Fs = 1/Ts; Fn = Fs/2;            % sampling & Nyquist frequency [Hz]

Giw = fft(y)./fft(u);            % G(iω) = FFT(Y)/FFT(U)
w = 2*pi*(0:n-1)*2*Fn/n;         % frequency ω in [rad/s]

[Mag,Phase,wa] = bode(G);        % Analytical Bode plot for comparison
Mag = Mag(:); Phase = Phase(:);

subplot(2,1,1);                  % Refer Fig. 6.18.
loglog(w,abs(Giw),wa,Mag,'r-'); ylabel('|G(i \omega)|')
subplot(2,1,2);
semilogx(w,phase(Giw)*180/pi, wa,Phase,'r-');
ylim([-720,10])
ylabel('Phase angle [degrees]'); xlabel('frequency \omega [rad/s]');

We can try the same approach to find the frequency response of the blackbox experimentally. Fig. 6.19 shows part of the 2¹³ = 8192 samples collected at a sampling rate of 10 Hz (T = 0.1 s) to be used to construct the Bode diagram.

The crude frequency response obtained simply by dividing the two Fourier transforms of the input/output data from Fig. 6.19 is given in Fig. 6.20 which, for comparison, also shows the frequency results obtained from a series of distinct sine wave inputs and also from using the MATLAB toolbox command etfe, which is explained in section 6.2.3.

Since we have not employed any windows in the FFT analysis, the spectral plot is unduly noisy,
especially at the high frequencies. The 12 amplitude ratio and phase lag data points manually

[Figure: output (upper trace) and input (lower trace) of the blackbox versus time over 0–150 s.]

Figure 6.19: Part of the experimental black box response data series given a pseudo-random input
sequence. This data was processed to produce the frequency response given in Fig. 6.20.

[Figure: Bode magnitude |G(iω)| and phase angle (0 to about −300 degrees) versus ω from 10⁻² to 10² rad/s, comparing the FFT estimate, etfe, and the discrete sine-wave points.]

Figure 6.20: The frequency response of the black box computed using FFTs (cyan line), the System Identification toolbox function etfe (blue line), and the discrete sine wave input data (squares) read from Fig. 6.13. The FFT analysis technique is known to give poor results at high frequencies.

read from Fig. 6.13 (repeated in Table 6.1) show good agreement across the various methods.

6.2.3 An alternative empirical transfer function estimate

The frequency responses obtained from experimental input/output data shown in Fig. 6.18 and particularly Fig. 6.20 illustrate that the naive approach of simply dividing the Fourier transforms numerically is susceptible to excessive noise. A better approach is to take more care in the data pre-processing stage by a judicious choice of windows. The empirical transfer function estimate routine, etfe, from the System Identification toolbox does this and delivers more reliable and consistent results. Compare the result given in Fig. 6.21 from Listing 6.3 with the simpler strategy used to generate Fig. 6.18.

Listing 6.3: Non-parametric frequency response identification using etfe.


G = tf(1,[5 1 6],'iodelay',2);  % Plant under test: G(s) = e^{-2s}/(5s^2 + s + 6)
Ts = 0.2; t = [0:Ts:2000]';     % sample time & span of interest
u = randn(size(t)); y = lsim(G,u,t);
Dat = iddata(y,u,Ts);           % collect data together

Gmodel = etfe(Dat);             % estimate a frequency response model
bode(Gmodel)                    % Refer Fig. 6.21.
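The windowing and averaging that etfe relies on can be mimicked with Welch-style spectral estimates, Ĝ(iω) ≈ Φ̂ᵤᵧ(ω)/Φ̂ᵤᵤ(ω). A Python/SciPy sketch of this closely related spectral-analysis estimator (not the etfe algorithm itself), using an arbitrary discrete test plant:

```python
import numpy as np
from scipy.signal import csd, welch, lfilter

# Windowed, averaged spectral estimate G ~= Puy/Puu (the H1 estimator),
# which is the smoothing idea behind etfe (arbitrary discrete test plant).
rng = np.random.default_rng(1)
N, fs = 2**14, 10.0                       # samples and sample rate [Hz]
u = rng.standard_normal(N)                # white-noise excitation
b, a = [0.2], [1.0, -0.8]                 # y[k] = 0.8 y[k-1] + 0.2 u[k]
y = lfilter(b, a, u)

f, Puy = csd(u, y, fs=fs, nperseg=1024)   # cross-spectrum of u and y
_, Puu = welch(u, fs=fs, nperseg=1024)    # auto-spectrum of u
G_hat = Puy/Puu                           # estimated frequency response

G_true = 0.2/(1 - 0.8*np.exp(-2j*np.pi*f/fs))  # exact response for comparison
print(abs(G_hat[10]), abs(G_true[10]))
```

Averaging over many overlapping, windowed segments trades frequency resolution for a far less noisy estimate than the raw FFT ratio.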

[Figure: estimated and actual Bode plots, |G(iω)| and phase φ (0 to about −600 degrees), over ω ≈ 0.3 to 3 rad/s.]

Figure 6.21: An empirical transfer function estimate from experimental input/output data computed by etfe. The true frequency response is overlayed for comparison. This is preferred over the simpler strategy used to generate Fig. 6.18.

Problem 6.1 1. Plot the frequency response of an “unknown” process on both a Bode diagram and the Nyquist diagram by subjecting the process to a series of sinusoidal inputs. Each input will contribute one point on the Bode or Nyquist diagrams. The “unknown” transfer function is

    G(s) = (0.0521s² + 0.4896s + 1) / (0.0047s⁴ + 0.0697s³ + 0.5129s² + 1.4496s + 1)

Compare your plot with one generated using bode. Note that in practice we do not know the transfer function in question, and we construct a Bode or Nyquist plot to obtain it. This is the opposite of typical textbook student assignments.

Hint: Do the following for a dozen or so different frequencies:

(a) First specify the “unknown” continuous time transfer function.
(b) Choose a test frequency, say ω = 2 rad/s, and generate a sine wave at this frequency.
(c) Simulate the output of the process given this input. Plot the input/output together and read off the phase lag and amplitude ratio.

Gcn = [0.0521 0.4896 1];
Gcd = [0.0047 0.0697 0.5129 1.4496 1];
w = 2;             % rad/s
t = [0:0.1:20]';   % seconds
u = sin(w*t);      % input signal
y = lsim(Gcn,Gcd,u,t); plot(t,[u y])

Note that the perfect frequency response can be obtained using the freqs command.

2. The opposite calculation, that is, fitting a transfer function to an experimentally obtained frequency response, is trivial with the invfreqs command. Use this command to reconstruct the polynomial coefficients of G(s) using only your experimentally obtained frequency response.

Problem 6.2 Write a MATLAB function file to return Gp(iω) given the input/output x(t), y(t) using Eqn. 6.5 for some specified frequency. Test your subroutine using the transfer function from Problem 6.1 and some suitable input signal, and compare the results with the exact frequency response.
Hint: To speed up the calculations, you should vectorise the function as much as possible. (Of course, for the purposes of this educational example, do not use an FFT.)

6.3 Continuous model identification

Historically, graphical methods, especially for continuous-time domain systems, were very popular, partly because one needed little more than a pencil and graph paper. For first-order or for some second-order models common in process control, we have a number of ‘graphical recipes’ which, if followed, enable us to extract reasonable parameters. See [179, Chpt 7] for further details.

While we could in principle invert the transfer function given a step input to the time domain
and then directly compare the model and actual outputs, this has the drawback that the solutions
are often convoluted nonlinear functions of the parameters. For example, the solutions to a step
response for first and second order overdamped systems are:

    K/(τs + 1)               ⟺   y(t) = K(1 − e^{−t/τ})

    K/((τ₁s + 1)(τ₂s + 1))   ⟺   y(t) = K(1 − (τ₁e^{−t/τ₁} − τ₂e^{−t/τ₂})/(τ₁ − τ₂))

and the problem is that while the gain appears linearly in the time domain expressions, the time constants appear nonlinearly. This makes subsequent parameter estimation difficult, and in general we would need to use a tool such as the OPTIMISATION toolbox to find reasonable values for the time constants. The following section describes this approach.

6.3.1 Fitting transfer functions using nonlinear least-squares

Fig. 6.22 shows some actual experimental data collected from a simple “black box” plant sub-
jected to a low-frequency square wave. The data was sampled at 10Hz or Ts = 0.1. You can

[Figure: square-wave experiment; output (upper trace, roughly 1.5–3.5 V) and input (lower trace, roughly 2–3 V) versus time over 0–100 s.]

Figure 6.22: Experimental data from a continuous plant subjected to a square wave.

notice that in this case the noise characteristics seem to be clearly dependent on the magnitude of y. By collecting this input/output data we will try to identify a continuous transfer function.

We can see from a glance at the response in Fig. 6.22 that a suitable model for the plant would be an under-damped second order transfer function

    G(s) = K e^{−as} / (τ²s² + 2ζτs + 1)        (6.6)

where we are interested in estimating the four model parameters of Eqn. 6.6 as θ ≜ [K τ ζ a]ᵀ. This is a nonlinear least-squares regression problem we can solve using lsqcurvefit. Listing 6.4 will be called by the optimisation routine in order to generate the model predictions for a given trial set of parameters.

Listing 6.4: Function to generate output predictions given a trial model and input data.
function ypred = fmodelsim(theta, t,u)
% Compute the predicted output, ŷ(t), given a model & input data, u(t).
K = theta(1); tau = theta(2); zeta = theta(3); deadt = theta(4);

G = tf(K,[tau^2 2*tau*zeta, 1],'iodelay',deadt); % G(s) = K e^{-as}/(τ² s² + 2ζτs + 1)
ypred = lsim(G,u,t);
return

The optimisation routine in Listing 6.5 calls the function given in Listing 6.4 repeatedly in order
to search for good values for the parameters such that the predicted outputs match the actual
outputs. In parameter fitting cases such as these, it is prudent to constrain the parameters to be
non-negative. This is particularly important in the case of the deadtime where we want to ensure
that we do not inadvertently try models with acausal behaviour.

Listing 6.5: Optimising the model parameters.


K0 = 1.26; tau0 = 1.5; zeta0 = 0.5; deadt0 = 0.1; % Initial guesses for θ0
theta0 = [K0,tau0,zeta0,deadt0]';
LowerBound = zeros(size(theta0)); % Avoid negative deadtime
theta_opt = lsqcurvefit(@(x,t) fmodelsim(x,t,u), theta0,t,y,LowerBound)

Once we have some optimised values, we should plot the predicted model output on top of the
actual output as shown in Fig. 6.23 which, in this case, is not too bad.

Listing 6.6: Validating the fitted model.


K = theta_opt(1); tau = theta_opt(2);
zeta = theta_opt(3); deadt = theta_opt(4);
Gopt = tf(K,[tau^2 2*tau*zeta, 1],'iodelay',deadt);

ypred = lsim(Gopt,u,t);
plot(t,y,t,ypred,'r-') % See results of the comparison in Fig. 6.23.

[Figure: measured output and fitted model response versus time over 0–100 s; the fitted model is G(s) = 1.166/(0.5712s² + 0.317s + 1).]

Figure 6.23: A continuous-time model fitted to input/output data from Fig. 6.22.

The high degree of similarity between the model and the actual plant shown in Fig. 6.23 suggests that, with some care, this method does work in practice using real experimental data complete with substantial noise. However, on the downside, we did use a powerful nonlinear optimiser, we were careful to eliminate the possibility that any of the parameters could be negative, we chose sensible starting estimates, and we used nearly 1000 data points.
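The same nonlinear least-squares fit can be sketched outside MATLAB. The following Python/SciPy version fits K, τ and ζ to synthetic noise-free step data, omitting the deadtime for brevity (the plant parameters here are arbitrary, not those of the blackbox):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.signal import lsim, TransferFunction

# Fit K, tau, zeta of G(s) = K/(tau^2 s^2 + 2 zeta tau s + 1) to step-response
# data by nonlinear least squares (synthetic data; deadtime omitted).
t = np.linspace(0.0, 30.0, 601)
u = np.ones_like(t)                        # unit step input

def ypred(theta):
    K, tau, zeta = theta
    G = TransferFunction([K], [tau**2, 2*zeta*tau, 1.0])
    return lsim(G, U=u, T=t)[1]            # simulated model output

theta_true = np.array([1.17, 1.7, 0.25])   # the "unknown" plant
y = ypred(theta_true)

theta0 = np.array([1.0, 1.0, 0.5])         # sensible starting estimates
res = least_squares(lambda th: ypred(th) - y, theta0,
                    bounds=(0.0, np.inf))  # keep all parameters non-negative
print(res.x)                               # close to theta_true
```

As in the MATLAB version, the lower bound of zero guards against physically meaningless negative parameters.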

6.3.2 Identification using derivatives

While most modern system identification and model-based control is done in the discrete domain, it is possible, as shown above, to identify the parameters in continuous models, G(s), directly. This means that we can establish the continuous-time poles and zeros of a transfer function without first identifying a discrete-time model and then converting back to the continuous-time domain.

However, there are two main reasons why continuous-time model identification is far less useful than its discrete-time equivalent. The first reason is that since discrete-time models are more useful for digital control, it makes sense to estimate them directly rather than go through a temporary continuous model. The second drawback is purely practical. Either we must resort to the delicate task of applying nonlinear optimisation techniques such as illustrated above, or, if we wish to use the robust linear least-squares approach, we need to reliably estimate high-order derivatives of both the input and output data. In the presence of measurement noise and possible discontinuities in gradient for the input, such as step changes, it is almost impossible to construct anything higher than a second-order derivative, thus restricting our models to second order or less. Notwithstanding these important implementation issues, [29] describes a recursive least-squares approach for continuous system identification.

As continuous time model identification requires one to measure not only the input and output,
but also the high order derivatives of y(t) and u(t), it turns out that this construction is delicate
even for well behaved simulated data, so we would anticipate major practical problems for actual
experimental industrial data.

If we start with the continuous transfer function form,

    Y(s) = (B(s)/A(s)) U(s)

then by rationalising and expanding out the polynomials in s we get

    (aₙsⁿ + aₙ₋₁sⁿ⁻¹ + · · · + a₁s + 1) Y(s) = (bₘsᵐ + bₘ₋₁sᵐ⁻¹ + · · · + b₁s + b₀) U(s)

Inverting to the time domain and assuming zero initial conditions, we get a differential equation

    aₙy⁽ⁿ⁾ + aₙ₋₁y⁽ⁿ⁻¹⁾ + · · · + a₁y⁽¹⁾ + y = bₘu⁽ᵐ⁾ + bₘ₋₁u⁽ᵐ⁻¹⁾ + · · · + b₁u⁽¹⁾ + b₀u

or in vector format,

    y = [ −y⁽ⁿ⁾  −y⁽ⁿ⁻¹⁾  · · ·  −y⁽¹⁾  u⁽ᵐ⁾  · · ·  u ] [ aₙ  aₙ₋₁  · · ·  a₁  bₘ  · · ·  b₀ ]ᵀ        (6.7)

where the notation y⁽ⁿ⁾ is the nth derivative of y.

Eqn. 6.7 is in a form suitable for least-squares regression of the vector of unknown coefficients of
the A and B polynomials. Note that if we desire na poles, we must differentiate the output na
times. Similarly, we must differentiate the input once for each zero we want to estimate. A block
diagram of this method is given in Fig. 6.24.

[Block diagram: the input u drives the plant giving the output y; both u and y are passed through banks of differentiators, and the resulting derivative signals are regressed to give the parameters θ.]

Figure 6.24: Continuous model identification strategy
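Specialised to a single pole and no zeros, τẏ + y = Ku, the regression of Eqn. 6.7 needs only one derivative, and the whole scheme fits in a few lines. A Python/NumPy sketch on noise-free synthetic data (the plant values are arbitrary):

```python
import numpy as np
from scipy.signal import lsim, TransferFunction

# Identify tau and K of tau*dy/dt + y = K*u by linear least squares on
# derivative data, i.e. Eqn. 6.7 with one pole and no zeros:
#   y = [-y'  u] [tau  K]'
tau, K = 3.0, 2.0                          # "unknown" truth values
t = np.linspace(0.0, 20.0, 2001)
u = np.sin(0.5*t) + 0.5*np.sin(1.3*t)      # smooth, differentiable input
_, y, _ = lsim(TransferFunction([K], [tau, 1.0]), U=u, T=t)

dy = np.gradient(y, t)                     # y^(1) by finite differences
X = np.column_stack([-dy, u])              # data matrix of Eqn. 6.7
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)                               # ~ [tau, K]
```

With clean data and a smooth input the finite-difference derivative is accurate, and the two parameters are recovered almost exactly; the practical difficulties appear once noise enters.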

A simple simulation is given in Listing 6.7. I first generate a “truth” model with four poles and
three zeros. Because I want to differentiate the input, I need to make it sufficiently ‘smooth’ by

passing a random input through a very-low-pass filter. To differentiate the logged data I will use
the built-in gradient command which approximates the derivative using finite differences. To
obtain the higher-order differentials, I just repeatedly call gradient. With the input/output data
and its derivatives, the normal equations are formed, and solved for the unknown polynomial
coefficients.

Listing 6.7: Continuous model identification of a non-minimum phase system


Gplant = tf(3*poly([3,-3,-0.5]),poly([-2 -4 -7 -0.4])); % Plant G(s) = 3(s-3)(s+3)(s+0.5)/((s+7)(s+4)(s+2)(s+0.4))
step(Gplant);               % Step response of a continuous plant

% Design smooth differentiable input
dt = 5e-2; t = [0:dt:10]';  % Keep sample time T small
Ur = randn(size(t));        % white noise
[Bf,Af] = butter(4,0.2);    % very low pass filter
u = filter(Bf,Af,Ur);       % smoothish input

y = lsim(Gplant,u,t);       % do experiment
plot(t,[u,y]);              % verify I/O data

% Now do the identification
na = 2; nb = 1;             % # of poles & zeros to be estimated. (Keep small.)

dy = y;                     % Zeroth derivative, y^(0) = y
for i=1:na                  % build up higher derivatives recursively, y^(1), y^(2), ...
    dy(:,i+1) = gradient(dy(:,i),t); % should be correctly scaled
end % for
du = u;                     % do again for input signal
for i=1:nb                  % but only for nb times
    du(:,i+1) = gradient(du(:,i),t); % could use diff()
end % for

b = y;                      % RHS
X = [-dy(:,2:na+1), du];    % data matrix
theta = X\b;                % solve for parameters

% re-construct linear model polynomials
Gmodel = tf(fliplr(theta(na+1:length(theta))'), ...
            fliplr([1,theta(1:na)']))

yp = lsim(Gmodel,u,t);      % test simulation
plot(t,[y],t,yp,'--');

Even though we know the truth model possessed four poles and three zeros, our fitted model with just two poles and one zero gave a close match as shown in Fig. 6.25. In fact the plant output (heavy line) is almost indistinguishable from the predicted output (light line), and this is due to the lack of noise and the smoothed input trajectory. However, if you get a little over-ambitious in your identification, the algorithm will go unstable owing to the difficulty in constructing reliable derivatives.

6.3.3 Practical continuous model identification

We may be tempted, given the results presented in Fig. 6.25, to suppose that identifying a continuous transfer function from logged data is relatively straightforward. Unfortunately, however, if

[Figure: plant and model outputs (upper, nearly coincident) and the smoothed random input (lower) versus time over 0–10 s.]

Figure 6.25: A simulated example of a continuous model identification comparing the actual plant with the model prediction, showing almost no discernible error. Compare these results with an actual experiment in Fig. 6.26.

you actually try this scheme on a real plant, even a well behaved plant such as the black-box, you will find practical problems of such magnitude that they render the scheme almost worthless. We should also note that if the deadtime is unknown, then the identification problem is nonlinear in the parameters, which makes the subsequent identification problematic. This is partly the reason why many common textbooks on identification and control pay only scant attention to identification in the continuous domain, but instead concentrate on identifying discrete models. (Discrete model identification is covered next in §6.4.)

The upper plot in Fig. 6.26 shows my attempt at identifying a continuous model of the blackbox, which, compared with Fig. 6.25, is somewhat less convincing. Part of the problem lies in the difficulty of constructing the derivatives of the logged input/output data. The lower plot in Fig. 6.26 shows the logged output combined with the first and second derivatives computed crudely using finite differences. One can clearly see the compounding effects on the higher derivatives due to measurement noise, discretisation errors, and so forth.
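The noise amplification is easy to demonstrate: differentiating a signal with even 1% measurement noise inflates the error at each stage. A Python/NumPy sketch:

```python
import numpy as np

# Repeated numerical differentiation is fragile: a small amount of
# measurement noise is amplified at each differentiation stage.
rng = np.random.default_rng(2)
t = np.arange(0.0, 20.0, 0.05)
y_clean = np.sin(t)
y = y_clean + 0.01*rng.standard_normal(t.size)  # 1% measurement noise

dy = np.gradient(y, t)                # first derivative estimate
d2y = np.gradient(dy, t)              # second derivative estimate

err1 = np.std(dy - np.cos(t))         # error in y' (true derivative is cos)
err2 = np.std(d2y + np.sin(t))        # error in y'' (true is -sin)
print(err1, err2)                     # err2 >> err1 >> 0.01
```

Each central-difference stage divides the noise by the (small) sample interval, so the second derivative is far noisier than the first, matching the lower plot of Fig. 6.26.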

For specific cases of continuous transfer functions, there are a number of optimised algorithms to directly estimate the continuous coefficients. Himmelblau, [87, p358–360], gives one such procedure which reduces to a linear system that is easy to solve in MATLAB. However, this algorithm essentially suffers the same drawbacks as the sequential differential schemes given previously.

Improving identification reliability by using Laguerre functions

One scheme that has been found to be practical is based on approximating the continuous response with a series of orthogonal functions known as Laguerre functions. The methodology was developed at the University of British Columbia and is available commercially from BrainWave, now part of Andritz Automation. Further details about the identification strategy are given in [198].

Compared to the standard curve fitting techniques, this strategy using orthogonal functions is
more numerically robust, and only involves numerical integration of the raw data, (as opposed
to the troublesome differentiation). On the downside, manipulating the Laguerre polynomial

Continuous model identification


0.1
output & prediction

0.05

−0.05
Blackbox
Model
−0.1
0 2 4 6 8 10 12 14 16 18 20
Sample time T=0.05 s

Measured output & derivatives


0.15

0.1

0.05
2
y, dy/dt, d y/dt

0
2

−0.05

−0.1

−0.15

−0.2
0 2 4 6 8 10 12 14 16 18 20
time (s)

Figure 6.26: A practical continuous model identification of the blackbox is not as successful as
the simulated example given in Fig. 6.25. Part of the problem is the reliable construction of the
higher derivatives. Upper: The plant (solid) and model prediction (dashed). Lower: The output
y, and derivatives ẏ, ÿ.

functions is cumbersome, and the user must select an appropriate scaling factor to obtain reasonable results. While [198] suggests some rules of thumb, and some algorithms for choosing the scaling factor, it is still a delicate and sensitive calculation and therefore tricky in practice.

Suppose we wish to use Laguerre functions to identify a complex high-order transfer function

    G(s) = (−45s + 1)²(4s + 1)(2s + 1) / [(20s + 1)³(18s + 1)³(5s + 1)³(10s + 1)²(16s + 1)(14s + 1)(12s + 1)]        (6.8)

which was used as a challenging test example in [198, p32]. To perform the identification, the user must decide on an appropriate order, (8 in this case), and a suitable scaling factor, p.

Fig. 6.27 compares the collected noisy step data with the identified step response, compares the identified and actual impulse responses, and gives the error in the impulse response.

6.4 Popular discrete-time linear models

The previous section showed that while it was possible in theory to identify continuous models
by differentiating the raw data, in practice this scheme lacked robustness. Furthermore it makes
sense for evenly spaced sampled data systems not to estimate continuous time models, but to
estimate discrete models directly since we have sampled the plant to obtain the input/output
data anyway. This section describes the various common forms of linear discrete models that we
will find useful in modelling and control.

[Figure: upper panel, the noisy step test data and the identified step response (T_ss = 493.1, p_opt = 0.05); middle panel, the impulse response comparison for order 8 and scale factor p = 0.050; lower panel, the error in the impulse response; all versus time 0–800 s.]

Figure 6.27: Identification of the 14th order system given in Eqn. 6.8 using an 8th-order Laguerre
series and a near-optimal scaling parameter.

If we collect the input/output data at regular sampling intervals, t = kT, for most applications it suffices to use linear difference models such as

    y(k) + a₁y(k−1) + a₂y(k−2) + · · · + aₙy(k−n) = b₀u(k) + b₁u(k−1) + · · · + bₘu(k−m)        (6.9)

where in the above model, the current output y(k) is dependent on m + 1 past and present inputs u and the n immediate past outputs y. The aᵢ's and bᵢ's are the model parameters we seek.
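A difference model such as Eqn. 6.9 is simply a digital filter, so an explicit recursion and a filter routine must agree. A Python/SciPy sketch with arbitrary second-order coefficients:

```python
import numpy as np
from scipy.signal import lfilter

# The linear difference model of Eqn. 6.9 is a digital filter:
# A(q^-1) y(k) = B(q^-1) u(k).  Simulate it explicitly and via lfilter.
A = [1.0, -1.2, 0.5]           # 1 + a1 q^-1 + a2 q^-2 (stable, arbitrary)
B = [0.3, 0.4]                 # b0 + b1 q^-1
rng = np.random.default_rng(3)
u = rng.standard_normal(200)

# Explicit recursion: y(k) = -a1 y(k-1) - a2 y(k-2) + b0 u(k) + b1 u(k-1)
y = np.zeros_like(u)
for k in range(len(u)):
    y[k] = (-A[1]*(y[k-1] if k >= 1 else 0.0)
            - A[2]*(y[k-2] if k >= 2 else 0.0)
            + B[0]*u[k]
            + B[1]*(u[k-1] if k >= 1 else 0.0))

y2 = lfilter(B, A, u)          # identical result from the filter routine
print(np.max(np.abs(y - y2)))  # essentially zero
```

Note the sign convention: the aᵢ coefficients move to the right-hand side with a sign change when the recursion is written out.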

A shorthand description for Eqn. 6.9 is

    A(q⁻¹) y(k) = B(q⁻¹) u(k) + e(k)        (6.10)

where we have added a noise term, e(k), and where q⁻¹ is defined as the backward shift operator. The polynomials A(q⁻¹), B(q⁻¹) are known as shift polynomials,

    A(q⁻¹) = 1 + a₁q⁻¹ + a₂q⁻² + · · · + aₙq⁻ⁿ
    B(q⁻¹) = b₀ + b₁q⁻¹ + b₂q⁻² + · · · + bₘq⁻ᵐ

where the A(q⁻¹) polynomial is defined by convention to be monic, that is, the leading coefficient is 1. In the following development, we will typically drop the argument of the A and B polynomials.

Even with the unmeasured noise term, we can still estimate the current y(k) given old values of y and old and present values of u using

    ŷ(k|k−1) = −a₁y(k−1) − · · · − aₙy(k−n) + b₀u(k) + b₁u(k−1) + · · · + bₘu(k−m)        (6.11)
             = (1 − A)y(k) + Bu(k)        (6.12)

where the first group of terms in Eqn. 6.11 involves past outputs and the second group present and past inputs. Our estimate or prediction of the current y(k) based on information up to, but not including, k is ŷ(k|k−1). If we are lucky, our estimate is close to the true value, or ŷ(k|k−1) ≈ y(k). Note that the prediction model of Eqn. 6.11 involves just known terms, the model parameters and past measured data. It does not involve the unmeasured noise terms.
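As a check of Eqn. 6.11, if the true parameters are known and there is no noise, the one-step-ahead prediction reproduces the output exactly. A Python/SciPy sketch with arbitrary coefficients:

```python
import numpy as np
from scipy.signal import lfilter

# One-step-ahead ARX prediction, Eqn. 6.11: with the true parameters and
# no noise, yhat(k|k-1) reproduces y(k) exactly.
A = [1.0, -1.2, 0.5]                       # 1 + a1 q^-1 + a2 q^-2
B = [0.3, 0.4]                             # b0 + b1 q^-1
rng = np.random.default_rng(4)
u = rng.standard_normal(300)
y = lfilter(B, A, u)                       # noise-free plant data

yhat = np.zeros_like(y)
for k in range(2, len(y)):
    yhat[k] = (-A[1]*y[k-1] - A[2]*y[k-2]  # past outputs
               + B[0]*u[k] + B[1]*u[k-1])  # present & past inputs
print(np.max(np.abs(yhat[2:] - y[2:])))    # zero prediction error
```

With noise present, or with estimated rather than true parameters, this prediction error becomes the residual that identification algorithms seek to minimise.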

The model described by Eqn. 6.10 is very common and is called an Auto-Regressive with eXogenous (externally generated) input, or ARX, model. The auto-regressive refers to the fact that the output y is dependent on old ys (i.e. it is regressed on itself), in addition to an external input, u. A signal flow diagram of the model is given in Fig. 6.28.

[Signal flow diagram: the known input u passes through B, is summed with the noise e, and the sum passes through 1/A to give the output y.]

Figure 6.28: A signal flow diagram of an auto-regressive model with exogenous input or ARX
model. Compare this structure with the similar output-error model in Fig. 6.30.

There is no reason why we could not write our models such as Eqn. 6.10 in the forward shift operator, q, as opposed to the backward shift, q⁻¹. Programmers and those working in the digital signal processing areas tended to favour the backward shift because it is easier and more natural to write computational algorithms such as Eqn. 6.11, whereas some control engineers favoured the forward notation because the time delay is given naturally by the difference in order of the A and B polynomials.

6.4.1 Extending the linear model

Since the white noise term in Eqn. 6.10 is also filtered by the plant denominator polynomial, A(q),
this form is known as an equation error model structure. However in many practical cases we will
need more flexibility when describing the noise term. One obvious extension is to filter the noise
term with a moving average filter

y(k) + a1 y(k − 1) + · · · + an y(k − n) = b0 u(k) + b1 u(k − 1) + · · · + bm u(k − m)      ← ARX model
                                           + e(k)                                          ← white noise
                                           + c1 e(k − 1) + · · · + cn e(k − n)             ← coloured noise

This is now known as “coloured” noise, the colouring filter being the C polynomial. The Auto-Regressive Moving-Average with eXogenous input, or ARMAX, model is written in compact polynomial form as

    A(q)y(k) = B(q)u(k) + C(q)e(k)        (6.13)

where we have dropped the notation showing the explicit dependence on the sample index k. The
ARMAX signal flow is given in Fig. 6.29. This model now has two inputs, one deterministic u,
and one disturbing noise, e, and one output, y. If the input u is zero, then the system is known
simply as an ARMA process.

[Figure 6.29: A signal flow diagram of a ARMAX model: the noise e is filtered by C before joining the B-filtered input u at the summation feeding 1/A. Note that the only difference between this, and the ARX model in Fig. 6.28, is the inclusion of the C polynomial filtering the noise term.]

While the ARMAX model offers more flexibility than the ARX model, the regression of the parameters is now a nonlinear optimisation problem. The nonlinear estimation of the parameters in the A, B and C polynomials is demonstrated on page 302. Also note that the noise term is
still filtered by the A polynomial meaning that both the ARX and ARMAX models are suitable
if the dominating noise enters the system early in the process such as a disturbance in the input
variable.

Example We can simulate such a process in raw MATLAB, although handling the indices and initialisation requires some care. Suppose we have an ARMAX model of the form Eqn. 6.13 with the polynomials

    A(q) = q² − q + 0.5,   B(q) = 2q + 0.5,   C(q) = q² − 0.2q + 0.05

Note that these polynomials are written in the forward shift operator, and the time delay given
by the difference in degrees, na − nb , is 1. A M ATLAB script to construct the model is:

A = [1 -1 0.5];    % Model A(q) = q^2 - q + 0.5
B = [2 0.5];       % No leading zeros: B(q) = 2q + 0.5
C = [1 -0.2 0.05]; % Noise model C(q) = q^2 - 0.2q + 0.05
ntot = 10;         % # of simulated data points
U = randn(ntot,1); E = randn(ntot,1);

Now to compute y given the inputs u and e, we use a for-loop and we write out the equation
explicitly as

y(k + 2) = y(k + 1) − 0.5y(k) + 2u(k + 1) + 0.5u(k) + e(k + 2) − 0.2e(k + 1) + 0.05e(k)
           \________________/   \_______________/   \_____________________________/
               old outputs          old inputs                   noise

It is more convenient to wind the indices back two steps so we calculate the current y(k) as
opposed to the future y(k + 2) as above. We will assume zero initial conditions to get started.
264 CHAPTER 6. IDENTIFICATION OF PROCESS MODELS

n = length(A)-1;          % Order na
z = zeros(n,1);           % Padding: assume ICs = zero
Uz = [z;U]; Ez = [z;E]; y = [z;NaN*U]; % Initialise
d = length(A)-length(B);  % deadtime = na - nb
zB = [zeros(1,d),B];      % pad B(q) with leading zeros

for i=n+1:length(Uz)
    y(i) = -A(2:end)*y(i-1:-1:i-n) + zB*Uz(i:-1:i-n) + C*Ez(i:-1:i-n);
end
y(1:n) = [];              % strip off initial conditions

This strategy of writing out the difference equations explicitly is rather messy and error-prone, so it is cleaner, but less transparent, to use the idpoly routine to create an ARMAX model and the idmodel/sim command in the identification toolbox to simulate it. Again we must pad the front of the B polynomial with zeros to account for the deadtime.

>>G = idpoly(A,[0,B],C,1,1) % Create polynomial model A(q)y(t) = B(q)u(t) + C(q)e(t)
Discrete-time IDPOLY model: A(q)y(t) = B(q)u(t) + C(q)e(t)
A(q) = 1 - qˆ-1 + 0.5 qˆ-2

B(q) = 2 qˆ-1 + 0.5 qˆ-2

C(q) = 1 - 0.2 qˆ-1 + 0.05 qˆ-2

This model was not estimated from data.

Sampling interval: 1

>>y = sim(G,[U E]); % test simulation

Both methods (writing the equations out explicitly, or using idpoly) will give identical results.
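As a cross-check, the same recursion can be sketched in Python/NumPy (an illustrative translation, not from the book; the function name simulate_armax is ours and it assumes A, B and C are given in the forward shift operator with C the same length as A). With zero noise and a unit pulse input the first few outputs can be verified by hand from the difference equation above.

```python
import numpy as np

def simulate_armax(A, B, C, u, e):
    """Simulate A(q)y = B(q)u + C(q)e as an explicit difference equation.
    B is padded with leading zeros so the deadtime na - nb is honoured."""
    d = len(A) - len(B)              # deadtime = na - nb
    B = [0.0]*d + list(B)            # pad B to the same length as A
    n = len(A) - 1                   # order na
    y = np.zeros(len(u))
    for k in range(len(u)):
        for i in range(1, n+1):      # -a1 y(k-1) - ... - an y(k-n)
            if k - i >= 0:
                y[k] -= A[i]*y[k-i]
        for i in range(n+1):         # input and noise terms, aligned with A
            if k - i >= 0:
                y[k] += B[i]*u[k-i] + C[i]*e[k-i]
    return y

A = [1, -1, 0.5]                     # A(q) = q^2 - q + 0.5
B = [2, 0.5]                         # B(q) = 2q + 0.5
C = [1, -0.2, 0.05]                  # C(q) = q^2 - 0.2q + 0.05
u = np.zeros(6); u[0] = 1.0          # unit pulse input, no noise
y = simulate_armax(A, B, C, u, np.zeros(6))
print(y[:5].tolist())                # [0.0, 2.0, 2.5, 1.5, 0.25]
```

Hand-checking: y(1) = 2u(0) = 2, y(2) = y(1) + 0.5u(0) = 2.5, y(3) = y(2) − 0.5y(1) = 1.5, exactly as the recursion demands.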

6.4.2 Output error model structures

Both the linear ARX model of Eqn. 6.10, and the version with the filtered noise, Eqn. 6.13, as-
sume that both the noise and the input are filtered by the plant denominator, A. A more flexible
approach is to treat the input and noise sequences separately as illustrated in Fig. 6.30. This is
known as an output error model. This model is suitable when the dominating noise term enters
late in the process, such as a disturbance in the measuring transducer for example.

[Figure 6.30: A signal flow diagram of an output-error model: the known input u is filtered by B/F and the noise e is added directly at the output y. Compare this structure with the similar ARX model in Fig. 6.28.]

Again, a drawback of this model structure compared to the ARX model is that the optimisation function is a nonlinear function of the unknown parameters. This is because the estimate ŷ is not directly observable, but is itself a function of the parameters in polynomials B and F. This makes the solution procedure more delicate and iterative. The System Identification toolbox can identify output error models with the oe command.

6.4.3 General input/output models

The most general linear input/output model with one noise input, and one known input is de-
picted in Fig. 6.31. The relation is described by

    A(q)y(k) = (B(q)/F(q)) u(k) + (C(q)/D(q)) e(k)        (6.14)

although it is rare that any one particular application will require all the polynomials. For example, [91, p76] points out that only the process dynamics B and F are usually identified, and D is a design parameter. Estimating the noise C polynomial is difficult in practice because the noise sequence is unknown and so must be approximated by the residuals.

[Figure 6.31: A general input/output model structure: the known input u is filtered by B/F, the noise e by C/D, and their sum passes through 1/A to the output y.]

Following the block diagram in Fig. 6.31 or Eqn. 6.14, we have the following common reduced
special cases:

1. ARX: Autoregressive with eXogenous input; C = D = F = 1

2. AR: Autoregressive with no external input; C = D = 1, and B = 0

3. FIR (Finite Impulse Response): C = D = F = A = 1

4. ARMAX: Autoregressive, moving average with eXogenous input; D = F = 1

5. OE (Output error): C = D = A = 1

6. BJ (Box-Jenkins): A = 1

In all these above models, our aim is to estimate the model parameters which are the coefficients
of the various polynomials given the observable input/output experimental plant data. As al-
ways, we want the smallest, most efficient model that we can get away with. The next section
will describe how we can find suitable parameters to these models. Whether we have the correct
model structure is another story, and that tricky and delicate point is described in section 6.6.
Nonlinear models are considered in section 6.6.3.

6.5 Regressing discrete model parameters

The simple auto-regressive with exogenous input (ARX) model,


A(q −1 ) y(k) = B(q −1 ) u(k) (6.15)
can be arranged to give the current output in terms of the past outputs and current and past
inputs as,
y(k) = (1 − A(q −1 ))y(k) + B(q −1 ) u(k) (6.16)

or written out explicitly as

y(k) = −a1 y(k − 1) − · · · − an y(k − n) + b0 u(k) + b1 u(k − 1) + · · · + bm u(k − m)        (6.17)
       \____________________________/   \________________________________________/
                past outputs                      current & past inputs

Our job is to estimate reasonable values for the unknown parameters a1 , . . . , an and b0 , . . . , bm
such that our model is a reasonable approximation to data collected from the true plant. If we
collectively call the parameter column vector, θ, as
    θ ≜ [ a1  a2  · · ·  an  |  b0  b1  · · ·  bm ]ᵀ

and the past data vector as

    ϕ ≜ [ −y(k − 1)  −y(k − 2)  · · ·  −y(k − n)  |  u(k)  u(k − 1)  · · ·  u(k − m) ]ᵀ

then we could write Eqn. 6.17 compactly as

    y(k) = ϕᵀθ        (6.18)
The data vector ϕ is also known as the regression vector since we are using it to regress the values
of the parameter vector θ.

While Eqn. 6.18 contains (n + m + 1) unknown parameters, we have only one equation. Clearly we need additional equations to reduce the degrees of freedom, which in turn means collecting more input/output data pairs.

Suppose we collect a total of N data pairs. The first n are used to make the first prediction, and
the next N − n are used to estimate the parameters. Stacking all the N − n equations, of which
Eqn. 6.17 is the first, in a matrix form gives
 
    [ yn   ]   [ −yn−1    −yn−2    · · ·   −y0       un      un−1    · · ·   un−m   ]
    [ yn+1 ]   [ −yn      −yn−1    · · ·   −y1       un+1    un      · · ·   un−m+1 ]
    [  ..  ] = [   ..       ..              ..        ..      ..              ..    ] θ        (6.19)
    [ yN   ]   [ −yN−1    −yN−2    · · ·   −yN−n     uN      uN−1    · · ·   uN−m   ]

or in a compact form

    yN = XN θ        (6.20)
Now the optimisation problem is, as always, to choose the parameter vector θ such that the
model M is an optimal estimate to the true system S. We will choose to minimise the sum of the
squared errors, so our cost function becomes
          N
    J =   Σ   ( yk − ϕkᵀθ )²        (6.21)
         k=n

which we wish to minimise. In practice, we are rarely interested in the numerical value of J but
in the value of the parameter vector, θ, that minimises it,
                    (  N               )
    θ = argminθ     (  Σ (yk − ϕkᵀθ)²  )        (6.22)
                    ( k=n              )

Eqn. 6.22 reads as “the optimum parameter vector is given by the argument that minimises the
sum of squared errors”. Eqn. 6.22 is a standard linear regression problem which has the same
solution as in Eqn 3.41, namely
    θ = ( XNᵀ XN )⁻¹ XNᵀ yN        (6.23)

For the parameters to be unique, the inverse of the matrix XNᵀXN must exist. We can ensure this by constructing an appropriate input sequence to the unknown plant. This is known as experimental design.

Example of offline discrete model identification.

Suppose we have collected some input/output data from a plant

    time, k    input    output
       0          1        1
       1          4        2
       2         −3       −7
       3          2       16

to which we would like to fit a model of the form A(q)yk = B(q)uk . We propose a model
structure
    A(q) = 1 + a1 q⁻¹,   and   B(q) = b0

which has two unknown parameters, θ ≜ [ a1  b0 ]ᵀ, and we are to find values for a1 and b0 such that our predictions adequately match the measured output above. We can write our model
as a difference equation at time k,

yk = −a1 yk−1 + b0 uk

which if we write out in full, at time k = 3 using the 4 previously collected input/output data,
gives
   
    [ y1 ]   [ −y0   u1 ]
    [ y2 ] = [ −y1   u2 ] [ a1 ]
    [ y3 ]   [ −y2   u3 ] [ b0 ]

or by inserting the numerical values

    [  2 ]   [ −1    4 ]
    [ −7 ] = [ −2   −3 ] [ a1 ]
    [ 16 ]   [  7    2 ] [ b0 ]

The system of equations for the parameter vector θ is over-determined as we have 3 equations but only 2 unknowns, so we will use Eqn. 6.23 to solve for the optimum θ,

    θN = ( XNᵀ XN )⁻¹ XNᵀ yN

and substituting N = 3 gives


  −1  
  −1 4   2
−1 −2 7  −1 −2 7 
θ3 =  −2 −3  −7 
4 −3 2 4 −3 2
7 2 16
  
1 29/2 8 124
=
655 8 27 61
 
2
=
1
This means the parameters are a1 = 2 and b0 = 1 giving a model
1
M(θ) =
1 + 2q −1
This then should reproduce the measured output data, and if we are lucky, be able to predict
future outputs too. It is always a good idea to compare our predictions with what was actually
measured.

    time    input    actual output    predicted output
      0        1           1                  —
      1        4           2                  2
      2       −3          −7                 −7
      3        2          16                 16

Well, surprise surprise, our predictions look unrealistically perfect! (This problem is continued
on page 285 by demonstrating how by using a recursive technique we can efficiently process new
incoming data pairs.)
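The worked example above is easily checked numerically. This Python/NumPy fragment (an illustrative translation of the MATLAB backslash solution, not from the book) solves the same over-determined system by least-squares.

```python
import numpy as np

# Data matrix and output vector from the worked example
X = np.array([[-1.0,  4.0],
              [-2.0, -3.0],
              [ 7.0,  2.0]])
y = np.array([2.0, -7.0, 16.0])

# Least-squares solution theta = (X^T X)^-1 X^T y, i.e. Eqn. 6.23
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(theta, 10).tolist())   # [2.0, 1.0], i.e. a1 = 2, b0 = 1
```

Here the least-squares residual happens to be exactly zero, which is why the textbook predictions above look "unrealistically perfect".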

6.5.1 Simple offline system identification routines

The following routines automate the offline identification procedure given in the previous section. However it should be noted that if you are serious about system identification, then a better option is to use the more robust routines from the System Identification Toolbox. These routines will be described in section 6.5.3.

In Listing 6.8 below, we will use the true ARX process with polynomials
A(q) = 1 − 1.9q −1 + 1.5q −2 − 0.5q −3 , and B(q) = 1 + 0.2q −1 (6.24)
to generate 50 input/output data pairs.

Listing 6.8: Generate some input/output data for model identification


G = tf([1,0.2,0,0],[1 -1.9 1.5 -0.5],1); % 'Unknown' plant (1+0.2q^-1)/(1-1.9q^-1+1.5q^-2-0.5q^-3)
G.variable = 'zˆ-1';
u = randn(50,1); % generate random input
y = lsim(G,u);   % Generate output with no noise

Now that we have some trial input/output data, we can write some simple routines to calculate
using least-squares the coefficients of the underlying model by solving Eqn. 6.19. Once we have
performed the regression, we can in this simulated example, compare our estimated coefficients
with the true coefficients given in Eqn. 6.24.

Listing 6.9: Estimate an ARX model from an input/output data series using least-squares
na = 3; nb = 2;      % # of unknown parameters in the denominator, na, & numerator, nb
N = length(y);       % length of data series
nmax = max(na,nb)+1; % max order

X = y(nmax-1:N-1);   % Construct Eqn. 6.19
for i=2:na
    X = [X, y(nmax-i:N-i)];
end
for i=0:nb-1
    X = [X, u(nmax-i:N-i)];
end
y_lhs = y(nmax:N);   % left hand side of y = Xθ
theta = X\y_lhs      % θ = X⁺y

G_est = tf(theta(na+1:end)',[1 -theta(1:na)'],-1,'Variable','zˆ-1')

If you run the estimation procedure in Listing 6.9, your estimated model, G_est, should be
identical to the model used to generate the data, G, in Listing 6.8. This is because in this simulation
example we have no model/plant mismatch, no noise and we use a rich, persistently exciting
identification signal.

As an aside, it is possible to construct the Vandermonde matrices (the data matrix X in Eqn. 6.20) using Toeplitz (or the closely related Hankel) matrices, as shown in Listing 6.10.

Listing 6.10: An alternative way to construct the data matrix for ARX estimation using Toeplitz
matrices. See also Listing 6.9.
na = length(A)-1; nb = length(B); % # of parameters to be estimated
nmax = max(na,nb);

Y = toeplitz(y(nmax:end-1),y(nmax:-1:nmax-na+1));
U = toeplitz(u(nmax+1:end),u(nmax+1:-1:nmax-nb+2));
X = [Y,U];              % Form data matrix, X

theta = X\y(nmax+1:end) % θ = X⁺y

Note that in this implementation, na is both the order of the A(q −1 ) polynomial and the number
of unknown coefficients to be estimated given that A is monic, while nb is the order of B plus
one.
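For readers without MATLAB, the same Toeplitz construction can be sketched in Python/NumPy. This is our illustration, not the book's: a minimal toeplitz helper stands in for MATLAB's toeplitz, and the data are generated from an invented noise-free ARX plant so the regression recovers the coefficients exactly.

```python
import numpy as np

def toeplitz(c, r):
    """Minimal Toeplitz constructor: entry (i,j) is c[i-j] when i >= j,
    otherwise r[j-i] (first column c, first row r)."""
    c, r = np.asarray(c, float), np.asarray(r, float)
    return np.array([[c[i-j] if i >= j else r[j-i] for j in range(len(r))]
                     for i in range(len(c))])

# Noise-free ARX data: y(k) = 0.7 y(k-1) - 0.1 y(k-2) + 2 u(k)
rng = np.random.default_rng(1)
u = rng.standard_normal(30)
y = np.zeros(30)
for k in range(2, 30):
    y[k] = 0.7*y[k-1] - 0.1*y[k-2] + 2.0*u[k]

na, nb = 2, 1                           # two AR terms, one input term (b0)
Y = toeplitz(y[na-1:-1], y[na-1::-1])   # columns y(k-1), y(k-2)
U = toeplitz(u[na:], [u[na]])           # column u(k)
X = np.hstack([Y, U])                   # data matrix as in Eqn. 6.20

theta, *_ = np.linalg.lstsq(X, y[na:], rcond=None)
print(np.round(theta, 8).tolist())      # [0.7, -0.1, 2.0]
```

Each row of the Toeplitz blocks is simply the previous row shifted one sample, which is exactly the structure of the stacked regression matrix in Eqn. 6.19.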

6.5.2 Bias in the parameter estimates

The strategy to identify ARX systems has the advantage that the regression problem is linear, and
the estimated parameters will be consistent, that is, they can be shown to converge to the true pa-
rameters as the number of samples increases provided the disturbing noise is white. If however,
the disturbances are non-white as in the case where we have a non-trivial C(q) polynomial, then
the estimated A(q) and B(q) parameters will exhibit a bias.

To illustrate this problem, suppose we try to estimate the A and B polynomial coefficients in the ARMAX process


    (1 + 0.4q⁻¹ − 0.5q⁻²)yk = q⁻¹(1.2 − q⁻¹)uk + (1 + 0.7q⁻¹ + 0.3q⁻²) ek        (6.25)
                                                 \__________________/
                                                      noise filter

where we have a non-trivial colouring C polynomial disturbing the process. However the results shown in Fig. 6.32 illustrate the failure of the ARX estimation routine to properly identify the A(q) and B(q) parameters. Note that the estimate of a1 after 300 iterations is around 0.2 which is some distance from the true result of 0.4. Similar errors are evident for a2 and b1, and it does not look like they will converge even if we collect more data. This inconsistency of the estimated parameters is a direct result of the C(q) polynomial which filters the white noise, ek. As the disturbing signal is no longer white, the ARX estimation routine is now no longer consistent.

[Figure 6.32: Input/output data and the trajectories of the parameter estimates a1, a2, b0 and b1 over 300 samples. In this example, the ARX estimation routine fails to converge to the true A(q) and B(q) parameters. This bias is due to the presence of the C(q) polynomial which ‘colours’ the noise.]

To obtain the correct parameters without the bias, we have no option other than to estimate the C(q) polynomial as well. This ARMAX estimation strategy is unfortunately a nonlinear estimation problem, and appropriate identification strategies are available in the System Identification Toolbox described next.

6.5.3 Using the System Identification toolbox

As an alternative to using the rather simple and somewhat ‘home grown’ identification routines developed in section 6.5.1 above, we could use the collection in the System Identification Toolbox. This comprehensive set of routines for dynamic model identification closely follows [124] and is far more reliable and robust than the purely illustrative routines presented so far.

The simplest model considered by the System Identification toolbox is the auto-regressive with exogenous input, or ARX model, described in Eqn. 6.10. However for computational reasons, the toolbox separates the deadtime from the B polynomial so that the model is now written

A(q −1 )y = q −d B(q −1 )u + e (6.26)

where the integer d is the assumed number of delays. The arx routine will, given input/output
data, try to establish reasonable values for the coefficients of the polynomials A and B using
least-squares functionally similar to the procedure given in Listing 6.9.

Of course in practical identification cases we do not know the structure of the model (i.e. the order of B, A and the actual sample delay), so we must guess a trial structure. The arx command returns both the estimated parameters and associated uncertainties in an object which we will call testmodel and which we can view with present. Once we have fitted our model, we can compare the model predictions with the actual output data using the compare command. The script in Listing 6.11 demonstrates the loading, fitting and comparing, but in this case we deliberately use a structurally deficient model.

Listing 6.11: Offline system identification using arx from the System Identification Toolbox
>>plantio = iddata(y,u,1); % Use the data generated from Listing 6.8.

>>testmodel = arx(plantio,[2 2 0]) % estimate model coefficients
Discrete-time IDPOLY model: A(q)y(t) = B(q)u(t) + e(t)
A(q) = 1 - 1.368 qˆ-1 + 0.5856 qˆ-2

B(q) = 1.108 + 0.8209 qˆ-1

Estimated using ARX from data set plantio
Loss function 0.182791 and FPE 0.214581
Sampling interval: 1

>>compare(plantio,testmodel) % See Fig. 6.33.

[Figure 6.33: Measured output and simulated model output (testmodel fit: 71.34%). Offline system identification using the System Identification Toolbox and a structurally deficient model.]

Note however that with no structural mismatch, the arx routine in Listing 6.12 should manage
to find the exact values for the model coefficients,

Listing 6.12: Offline system identification with no model/plant mismatch


>> testmodel = arx(plantio,[3 2 0]) % Data from Listing 6.11 with correct model structure.
2 Discrete-time IDPOLY model: A(q)y(t) = B(q)u(t) + e(t)
A(q) = 1 - 1.9 qˆ-1 + 1.5 qˆ-2 - 0.5 qˆ-3
272 CHAPTER 6. IDENTIFICATION OF PROCESS MODELS

B(q) = 1 + 0.2 qˆ-1

7 Estimated using ARX from data set plantio


Loss function 1.99961e-029 and FPE 2.44396e-029
Sampling interval: 1

which it does.

Note that in this case the identification routine ‘perfectly’ identified our plant and that the loss function given in the result summary is practically zero (≈ 10⁻³⁰). This indicates an extraordinary goodness of fit. This is not unexpected, since we have used a perfect linear process whose structure we knew perfectly, although naturally we could not expect this in practice.

The arx command essentially solves a linear regression problem using the special MATLAB command \ (backslash), as shown in the simple version given in Listing 6.9. Look at the source of arx for further details.

Purely auto-regressive models

As the name suggests, purely auto-regressive models have no measured input,

A(q −1 ) yk = ek (6.27)

and are used when we have a single time series of data which we suspect is simply disturbed
by random noise. In Listing 6.13 we attempt to fit a 3rd order AR model and we get reasonable
results with the coefficients within 1% of the true values. (We should of course compare the
pole-zero plots of the model and plant because the coefficients of the polynomials are typically
numerically ill-conditioned.) Note however that we used 1000 data points for this 3 parameter
model, far more than we would need for an ARX model.

Listing 6.13: Demonstrate the fitting of an AR model.


>> N = 1e3; e = randn(N,1); % Take 1000 data points
>> B = 1; A = [1 0.4 0.1 0.2]; % True AR plant is: A(q^-1) = 1 + 0.4q^-1 + 0.1q^-2 + 0.2q^-3
>> y = filter(B,A,e);
>> Gm = ar(iddata(y),3); % Fit AR model with no model/plant mismatch

Discrete-time IDPOLY model: A(q)y(t) = e(t)
A(q) = 1 + 0.419 qˆ-1 + 0.1008 qˆ-2 + 0.1959 qˆ-3

Estimated using AR ('fb'/'now') from data set Dat
Loss function 1.00981 and FPE 1.01587
Sampling interval: 1

Estimating noise models

Estimating noise models is complicated since some of the regressors are now functions themselves of the unknown parameters. This means the optimisation problem is now nonlinear compared to the relatively straightforward linear arx case. However, as illustrated in section 6.5.2, we need to estimate the noise models to eliminate the bias in the parameter estimation.

In Listing 6.14 we generate some input/output data from an output-error process,


    y(k) = (B/F) u(k − d) + e(k)
         = (1 + 0.5q⁻¹)/(1 − 0.5q⁻¹ + 0.2q⁻²) · u(k − 1) + e(k)
whose parameters are, of course, in reality unknown. The easiest way to simulate an output-error
process is to use the idmodel/sim command from the System ID toolbox.

Listing 6.14: Create an input/output sequence from an output-error plant.


>> Ts = 1.0; % Sample time T
>> NoiseVariance = 0.1;
>> F = [1 -0.5 0.2]; % F(q^-1) = 1 - 0.5q^-1 + 0.2q^-2
>> B = [0 1 0.5];    % Note one unit of delay: B(q^-1) = q^-1 + 0.5q^-2
>> A = 1; C = 1; D = 1; % OE model: y = (B/F)u + e

>> G = idpoly(A,B,C,D,F,NoiseVariance,Ts)
Discrete-time IDPOLY model: y(t) = [B(q)/F(q)]u(t) + e(t)
B(q) = qˆ-1 + 0.5 qˆ-2

F(q) = 1 - 0.5 qˆ-1 + 0.2 qˆ-2

This model was not estimated from data.

Sampling interval: 1

>> N = 10000; % # of samples
>> U = randn(N,1); E = 0.1*randn(N,1); % Input & noise
>> Y = sim(G,[U E]);

Using the data generated in Listing 6.14, we can attempt to fit both an output-error and an arx
model. Listing 6.15 shows that in fact the known incorrect arx model is almost as good as the
structurally correct output-error model.

Listing 6.15: Parameter identification of an output error process using oe and arx.
>> Gest_oe = oe(Z,'nb',2,'nf',2,'nk',1,'trace','on');
>> present(Gest_oe);
Discrete-time IDPOLY model: y(t) = [B(q)/F(q)]u(t) + e(t)
B(q) = 1.002 (+-0.0009309) qˆ-1 + 0.5009 (+-0.002302) qˆ-2

F(q) = 1 - 0.4987 (+-0.001718) qˆ-1 + 0.1993 (+-0.00107) qˆ-2

Estimated using OE from data set Z
Loss function 0.00889772 and FPE 0.00890484
Sampling interval: 1
Created: 02-Jul-2007 11:35:15
Last modified: 02-Jul-2007 11:35:16

>> Gest_arx = arx(Z,'na',2,'nb',2,'nk',1);
>> present(Gest_arx);
Discrete-time IDPOLY model: A(q)y(t) = B(q)u(t) + e(t)
A(q) = 1 - 0.4816 (+-0.001879) qˆ-1 + 0.1881 (+-0.001366) qˆ-2

B(q) = 1.001 (+-0.001061) qˆ-1 + 0.518 (+-0.002154) qˆ-2

Estimated using ARX from data set Z
Loss function 0.0111952 and FPE 0.0112041
Sampling interval: 1
Created: 02-Jul-2007 11:35:16
Last modified: 02-Jul-2007 11:35:16

The System Identification toolbox has optional parameters for the fitting of the nonlinear models. One of the more useful options is to be able to fix certain parameters to known values. This can be achieved using the 'FixParameter' option.

6.5.4 Fitting parameters to state space models

So far we have concentrated on finding parameters for transfer function models, which means
we are searching for the coefficients of just two polynomials. However in the multivariable case
we need to identify state-space models which means estimating the elements in the Φ, ∆ and C
matrices. If we are able to reliably measure the states, then we can use the same least-squares
approach (using the pseudo inverse) that we used in the transfer function identification. If how-
ever, and more likely, we cannot directly measure the states, x, but only the output variables, y
then we need to use a more complex subspace approach described on page 276.

The state-space identification problem starts with a multivariable discrete model

xk+1 = Φxk + ∆uk , x ∈ ℜn , u ∈ ℜm (6.28)

where we want to establish elements in the model matrices Φ, ∆ given the input/state sequences
u and x. To ensure we have enough degrees of freedom, we would expect to need to collect at
least (n + m) × m data pairs, although typically we would collect many more if we could.

Transposing Eqn. 6.28 gives xᵀk+1 = xᵀk Φᵀ + uᵀk Δᵀ, or

                                     [ φ1,1   φ2,1   · · ·   φn,1 ]
    xᵀk+1 = [ x1  x2  · · ·  xn ]k   [ φ1,2   φ2,2   · · ·   φn,2 ]
                                     [   ..     ..            ..  ]
                                     [ φ1,n   φ2,n   · · ·   φn,n ]
                                     \___________________________/
                                                 n×n

                                     [ Δ1,1   Δ2,1   · · ·   Δn,1 ]
            + [ u1  · · ·  um ]k     [   ..     ..            ..  ]
                                     [ Δ1,m   Δ2,m   · · ·   Δn,m ]
                                     \___________________________/
                                                 m×n

we can see that it is simple to rearrange this equation into

                                                       [ φ1,1   φ2,1   · · ·   φn,1 ]
                                                       [ φ1,2   φ2,2   · · ·   φn,2 ]
                                                       [   ..     ..            ..  ]
    xᵀk+1 = [ x1  x2  · · ·  xn | u1  · · ·  um ]k     [ φ1,n   φ2,n   · · ·   φn,n ]        (6.29)
                                                       [ ————   ————          ———— ]
                                                       [ Δ1,1   Δ2,1   · · ·   Δn,1 ]
                                                       [   ..     ..            ..  ]
                                                       [ Δ1,m   Δ2,m   · · ·   Δn,m ]
                                                       \___________________________/
                                                          Θ matrix, (n+m)×n

where we have stacked the unknown model matrices Φ and Δ together into one large parameter matrix Θ. If we can estimate this matrix, we have achieved our model fitting.

Currently Eqn. 6.29 comprises n equations, but far more, (n + m)n, unknowns: the elements in the Φ and Δ matrices. To address this degree-of-freedom problem, we need to collect more input/state data. If we collect say N more input/states and stack them underneath Eqn. 6.29, we get
     
    [ xᵀk+1   ]   [ x1  x2  · · ·  xn | u1  · · ·  um ]k
    [ xᵀk+2   ]   [ x1  x2  · · ·  xn | u1  · · ·  um ]k+1
    [   ..    ] = [                 ..                 ]      Θ        (6.30)
    [ xᵀk+N+1 ]   [ x1  x2  · · ·  xn | u1  · · ·  um ]k+N
    \_________/   \___________________________________/
        N×n                     N×(n+m)

or Y = XΘ. Inverting this, using the pseudo-inverse of the data matrix X, gives the least-squares parameters

    Θ = [ Φᵀ ] = X⁺ Y        (6.31)
        [ Δᵀ ]

and hence the Φ and Δ matrices as desired.

The computation of the model matrices Φ and Δ is a linear least-squares regression which is easy to program, and robust to execute. However the normal restrictions on the invertibility and conditioning of the data matrix X still apply in order to obtain meaningful solutions. The largest drawback of this method is that we must measure the states x, as opposed to only outputs. (If we only have outputs then we need to use the more sophisticated technique described on page 276.) If we had measured the outputs in addition to the states, it is easy to calculate the C matrix subsequently using another least-squares regression.

In MATLAB we could use the fragment below to compute Φ and Δ given a sequence of x(k) and u(k).

[y,t,X] = lsim(G,U,[],x0);        % Discrete I/O data collected
Xdata = [X,U]; Xdata(end,:) = []; % data matrix
y = X; y(1,:) = [];               % delete first row
theta = Xdata\y;                  % solve for parameters Θ = X⁺Y
Phiest = theta(1:n,:)'; Delest = theta(n+1:end,:)';

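The same regression is easy to express in Python/NumPy. This is an illustrative stand-in for the MATLAB fragment above (the plant matrices below are invented for the demonstration); with noise-free state measurements and a persistently exciting input, the least-squares solution of Eqn. 6.31 recovers Φ and Δ exactly.

```python
import numpy as np

# An invented 2-state, 1-input plant used only for this demonstration
Phi = np.array([[0.8, 0.1],
                [0.0, 0.5]])
Delta = np.array([[1.0],
                  [0.5]])

rng = np.random.default_rng(3)
N = 50
U = rng.standard_normal((N, 1))
X = np.zeros((N + 1, 2))
for k in range(N):                    # x(k+1) = Phi x(k) + Delta u(k)
    X[k+1] = Phi @ X[k] + Delta @ U[k]

# Stack the data as in Eqn. 6.30: rows [x(k) | u(k)], targets x(k+1)
Xdata = np.hstack([X[:-1], U])
Y = X[1:]

# Theta = [Phi^T; Delta^T] from the pseudo-inverse, Eqn. 6.31
Theta, *_ = np.linalg.lstsq(Xdata, Y, rcond=None)
Phi_est = Theta[:2].T
Delta_est = Theta[2:].T
print(np.allclose(Phi_est, Phi), np.allclose(Delta_est, Delta))  # True True
```

With measurement noise the recovery would only be approximate, and the conditioning of Xdata (i.e. the richness of the input) governs how well.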
If any of the parameters are known a priori, they can be removed from the parameter matrix, and the constants substituted. If any dependencies between the parameters are known (such as a = 3b), then this may be incorporated as a constraint, and a nonlinear optimiser used to search for the resulting best-fit parameter vector. The System Identification Toolbox incorporates both these extensions.

Estimating the confidence limits for parameters in dynamic systems uses the same procedure as outlined in §3.4.3. The difficult calculation is to extract the data matrix, X, (as defined by Eqn 3.62) accurately. [23, pp226–228] describes an efficient method which develops and subsequently integrates sensitivity equations, which is, in general, the preferred technique compared to using finite differences.

Identification of state-space models without state measurements

The previous method required state measurements, but it would be advantageous to still be able to estimate the model matrices while only being required to measure input, u, and output, y, data. This is addressed in a method known as subspace identification, described in [71, p536] and in more detail by the developers in [153]. The state-space subspace algorithm for identification, known as n4sid, is available in the System Identification toolbox.

Summary of identification alternatives in state-space

Given the underlying system

xk+1 = Φxk + ∆uk


yk = Cxk

then Table 6.2 summarises the identification possibilities.

Table 6.2: Identification alternatives in state-space

    We measure           Wish to calculate     Algorithm                   page #
    uk, yk, Φ, Δ, C      xk                    Kalman filter               431
    uk, yk, xk           Φ, Δ, C               Normal linear regression    274
    uk, yk               n, Φ, Δ, C, xk        Subspace identification     276

6.6 Model structure determination and validation

The basic linear ARX model using in Eqn. 6.26 has three structural parameters: the number of
poles, na , the number of zeros nb and the number of samples of delay, d. Outside of regressing
suitable parameters given a tentative model structure, we need some method to establish what a
tentative structure is. Typically this involves some iteration in the model fitting procedure.

If we inadvertently choose too high an order for the model, we are in danger of overfitting the data, which will probably mean that our model will perform poorly on new data. Furthermore, for reasons of efficiency, we would like the simplest model, with the smallest number of parameters, that still adequately predicts the output. This is the rationale for dividing the collected experimental data into two sets: one estimation set used for model fitting, and one validation set used to distinguish between model structures.

One way to penalise overly complex models is to use the Akaike Information Criterion (AIC) which for normally distributed errors is

    AIC = 2p + N ln( SSE / N )        (6.32)

where p is the number of parameters, N is the number of observations, and SSE is the sum of squared errors. The System Identification toolbox computes the AIC.

If we increase the number of parameters to be estimated, the sum of squared errors decreases,
but this is partially offset by the 2p term which increases. Hence by using the AIC we try to find
a balance between models that fit the data, but are parsimonious in parameters.
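Eqn. 6.32 is trivial to evaluate, and a small numerical example makes the trade-off concrete (this sketch and the function name aic are ours). Moving from 3 to 5 parameters costs 4 units of penalty, so the larger model must reduce the SSE by roughly a factor of exp(−4/N) before its AIC improves.

```python
import math

def aic(sse, N, p):
    """Akaike Information Criterion for normally distributed errors, Eqn. 6.32."""
    return 2*p + N*math.log(sse/N)

# 100 observations: a 3-parameter model with SSE = 100 scores 6, since ln(1) = 0
print(aic(100.0, 100, 3))    # 6.0

# A 5-parameter model needs SSE below 100*exp(-4/100) ~ 96.1 to win
print(aic(96.0, 100, 5) < aic(100.0, 100, 3))   # True
```

In practice the AIC from the estimation data set is combined with a check on the separate validation data set before settling on a structure.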

6.6.1 Estimating model order

If we restrict our attention to linear models, the most crucial decision to make is that of model
order. We can estimate the model order from a spectral analysis, perhaps by looking at the high-
frequency roll-off of the experimentally derived Bode diagram.

An alternative strategy is to monitor the rank of the matrix XᵀX. If we over-estimate the order, then the matrix XᵀX will become singular. Of course with additive noise, the matrix may not be exactly singular, but it will certainly become ill-conditioned. Further details are given in [126, pp496–497].

Identification of deadtime

The number of delay samples, d, in Eqn. 6.26 is a key structural parameter to be established prior to any model fitting. In some cases, an approximate value of the deadtime is known from the physics of the system, but this is not always a reliable indicator. For example, the blackbox from Fig. 1.4 should not possess any deadtime since it is composed only of a passive resistor/capacitor network. However the long time constant, the high system order, or the enforced one-sample delay in the A/D sampler may make it appear as if there is some deadtime.

Question: If we suspect some deadtime, how do we confirm & subsequently measure it?

1. Do a step test and look for a period of inactivity.


2. Use the S YSTEM I DENTIFICATION T OOLBOX (SITB) and optimise the fit for deadtime. This
will probably require a trial & error search.
3. Use some statistical tests (such as correlation) to estimate the deadtime.

Note: Obviously the estimated deadtime will be an integer multiple of the sample time. So if you want an accurate deadtime, you will need a small sample time T. However overly fast sampling will cause problems in the numerical coefficients of the discrete model.

Method #1 Apply a step response & look for the ‘first sign of life’.

1. First we do an open loop step test using a sample time T = 0.5 seconds.
2. We should repeat the experiment at a different sampling rate to verify the deadtime. Fig. 6.35
compares both T = 1 and T = 0.1 seconds.

[Figure: Blackbox step response, plotting input and output against time (s), with zoomed regions around t ≈ 20 s (the input step) and t ≈ 41 s (the first output response).]

Figure 6.34: Identification of deadtime from the step response of the blackbox. Note that the deadtime is approximately 2T ≈ 1 second.

[Figure: Zoomed input/output step responses over 20–25 s at the two sampling rates.]

Figure 6.35: Deadtime estimation at both T = 1 and T = 0.1 s.

Method #2: Use ‘trial & error’ with the S YSTEM IDENTIFICATION T OOLBOX. We can collect some
input/output data and identify a model M(θ) with different deadtimes, namely d = 0 to 3.
Clearly the middle case in Fig. 6.36 where d = 2 has the least error.

This shows, at least for the model M(θ) with na = nb = 2, that the optimum number of delays is d = 2.

Modelling the blackbox

A model of the experimental blackbox is useful to test control schemes before you get to the lab. Based on the resistor/capacitor internals of the blackbox, we expect an overdamped plant with time constants in the range of 2 to 5 seconds.

Identification tests on my blackbox gave the following continuous transfer function model

    Ĝ_bb(s) = (0.002129s^2 + 0.02813s + 0.5788) / (s^2 + 2.25s + 0.5362) · e^{-0.5s}

or in factored form

    Ĝ_bb(s) = 0.002129(s^2 + 13.21s + 271.9) / ((s + 1.9791)(s + 0.2709)) · e^{-0.5s}
Fig. 6.37 illustrates the S IMULINK experimental setup and compares the model predictions to the
actual measured data.

[Figure: Four panels of model predictions overlaid on the measured data for tentative deadtimes d = 0, 1, 2 and 3.]

Figure 6.36: Model prediction and actual output using a variety of tentative deadtimes, d = 0, 1, 2, 3.

[Figure (a): S IMULINK experimental setup: band-limited white noise passes through a manual switch and saturation into the blackbox hardware (bb_pcl711), in parallel with the fitted model (0.002129s^2 + 0.02813s + 0.5788)/(s^2 + 2.25s + 0.5362) plus a delay block.]

[Figure (b): Blackbox model predictions compared to the actual data over 25–75 seconds.]

(a) Blackbox experimental setup in S IMULINK. (b) Blackbox model predictions compared to actual data.

Figure 6.37: A dynamic model for the blackbox compared to actual output data.

6.6.2 Robust model fitting

So far we have chosen to fit models by selecting parameters that minimise the sum of squared residuals. This is known as the principle of least-squares prediction error and is equivalent to the maximum likelihood method when the uncertainty distribution is Gaussian. This strategy has some very nice properties, first identified by Gauss in 1809: the estimates are consistent, that is, they eventually converge to the true parameters; the estimate is efficient, in that no other unbiased estimator has a smaller variance; and the numerical strategies are simple and robust, as described in [10].

However by squaring the error, we run into problems in the presence of outliers in the data. In these cases, which are clearly non-Gaussian but actually quite common, we must modify the form of the loss function. One strategy, due to [90], is to use a quadratic form when the errors are small, but a linear form for large errors; another, suggested in [10], is to use something like

    f(ǫ) = ǫ^2 / (1 + aǫ^2)          (6.33)

and there are many others. Numerical implementations of these robust fitting routines are given in [201].
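The attraction of Eqn. 6.33 is that the loss saturates at 1/a for large residuals, so a single outlier cannot dominate the fit. A small sketch (in Python; the value a = 0.1 is an arbitrary illustrative choice):

```python
def quadratic_loss(eps):
    return eps ** 2

def robust_loss(eps, a=0.1):
    """Saturating loss of Eqn. 6.33: approximately eps**2 for small
    errors, but bounded above by 1/a for large errors."""
    return eps ** 2 / (1.0 + a * eps ** 2)

small, outlier = 0.1, 100.0
ratio_small = robust_loss(small) / quadratic_loss(small)  # ~1: the losses agree
capped = robust_loss(outlier)                             # capped near 1/a = 10
```

For the small residual the two losses are practically identical, while the outlier contributes nearly 10^4 to a quadratic objective but only about 10 to the robust one.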

6.6.3 Common nonlinear model structures

Nonlinear models, or more precisely, dynamic models that are nonlinear in the parameters are
considerably more complex to identify than linear models such as ARX, or even pseudo-linear
models such as ARMAX as outlined in the unified survey presented in [184]. For this reason fully
general nonlinear models are far less commonly used for control and identification than linear
models. However there are cases where we will need to account for some process nonlinearities,
and one popular way is to only consider static nonlinearities that can be separated from the linear
dynamics.

Fig. 6.38 shows how these block-oriented static nonlinear models can be structured. These are known as Hammerstein-Wiener models. The only difference between the two structures is the position of the nonlinear block. In the Hammerstein model it precedes the linear dynamics, such as when we are modelling input nonlinearities like equal-percentage control valves; in the Wiener model the nonlinearity follows the linear dynamics, such as when we are modelling nonlinear thermocouples. Note of course that, unlike for linear systems, the order of the blocks does make a difference, since in general nonlinear systems do not commute.

The S YSTEM I DENTIFICATION TOOLBOX can identify a selection of commonly used nonlinear
models in Hammerstein and Wiener structures using the nlhw routine.
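To make the distinction concrete, the sketch below (in Python; both the tanh nonlinearity and the first-order filter are arbitrary illustrative choices) passes the same step input through the two orderings. Since the blocks do not commute, the transients differ, even though this particular pair happens to share the same steady state because both blocks have unity DC gain.

```python
import numpy as np

def nonlinearity(x):
    return np.tanh(x)                       # an arbitrary static nonlinearity N(.)

def linear_dynamics(u, a=0.7):
    """Unity-gain first-order filter y(k) = a*y(k-1) + (1-a)*u(k)."""
    y = np.zeros_like(u)
    for k in range(1, len(u)):
        y[k] = a * y[k - 1] + (1 - a) * u[k]
    return y

u = 2.0 * np.ones(50)                             # a simple step input
y_hammerstein = linear_dynamics(nonlinearity(u))  # N(.) before the dynamics
y_wiener = nonlinearity(linear_dynamics(u))       # N(.) after the dynamics
```

Swapping the block order changes the shape of the transient response, which is exactly the structural choice the Hammerstein and Wiener forms encode.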

6.7 Online model identification

In many practical cases such as adaptive control, or where the system may change from day to
day, we will estimate the parameters of our models online. Other motivations for online identifi-
cation given by [124, p303] include optimal control with model following, using matched filters,
failure prediction etc. Because the identification is online, it must also be reasonably automatic.
The algorithm must pick a suitable model form (number of dead times, order of the difference
equation etc.), guess initial parameters and then calculate residuals and measures of fit.

Hammerstein model:

    u_t → [ static nonlinearity, v_t = N(u_t) ] → [ linear dynamics, B(q^{-1})/A(q^{-1}) ] → y_t

Wiener model:

    u_t → [ linear dynamics, B(q^{-1})/A(q^{-1}) ] → [ static nonlinearity, y_t = N(v_t) ] → y_t

Figure 6.38: Hammerstein and Wiener model structures

As time passes, we should expect that owing to the increase in number of data points that we are
continually collecting, we are able to build better and better models. The problem with using the
offline identification scheme as outlined in §6.4 is that the data matrix, X in Eqn. 6.20 will grow
and grow as we add more input/output rows to it. Eventually this matrix will grow too large to
store or manipulate in our controller. There are two obvious solutions to this problem:

1. Use a sliding window where we retain only the last say 50 input/output data pairs

2. Use a recursive scheme where we achieve the same result but without the waste of throwing
old data away.

The following section develops this second, and more attractive alternative known as RLS or
recursive least-squares.

6.7.1 Recursive least squares

As more data is collected, we can update the current estimate of the model parameters. If we make the update at every sample time, our estimation procedure is no longer off-line, but now online. One approach to take advantage of the new data pair would be to add another row to the X matrix in Eqn. 6.20 as each new data pair is collected. Then a new θ estimate can be obtained using Eqn. 6.23 with the now augmented X matrix. This equation would be solved every sample time. However this scheme has the rather big disadvantage that the X matrix grows as each new data point is collected, so consequently one will run out of storage memory, to say nothing of the growing impracticality of the matrix inversion required. The solution is to use a recursive formula for the estimation of the new θ_{k+1} given the previous θ_k. Using this scheme,

we have both constant sized matrices to store and process, but without the need to sacrifice old
data.

Mean and variance calculated recursively

Digressing somewhat, let us briefly look at the power of a recursive calculation scheme, and the
possible advantages that has for online applications. Consider the problem of calculating the
mean or average, x̄, of a collection of N samples, xi . The well known formula for the mean is
    x̄ = (1/N) Σ_{i=1}^{N} x_i
Now what happens, (as it often does when I collect and grade student assignments) is that after
I have laboriously summed and averaged the class assignments, I receive a late assignment or
value x_{N+1}. Do I need to start my calculations all over again, or can I use a recursive formula making use of the old mean x̄_N based on N samples to calculate the new mean based on N + 1 samples? Starting with the standard formula for the new mean, we can develop an equivalent recursive equation
    x̄_{N+1} = (1/(N+1)) Σ_{i=1}^{N+1} x_i = (1/(N+1)) ( x_{N+1} + Σ_{i=1}^{N} x_i )
            = (1/(N+1)) ( x_{N+1} + N x̄_N )
            = x̄_N + (1/(N+1)) ( x_{N+1} − x̄_N )

which is in the form of new estimate = old estimate + gain × error. Note that now the computation
of the new mean, assuming we already have the old mean, is much faster and more numerically
sound.

In a similar manner, the variance can also be calculated in a recursive manner. The variance at
sample number N + 1 based on the mean and variance using samples 1 to N is
 
    σ²_{N+1} = σ²_N + (1/(N+1)) [ (N/(N+1)) (x_{N+1} − x̄_N)² − σ²_N ]          (6.34)
which once again is the linear combination of the previous estimate and a residual. For further details, see [121, p29]. This recursive way to calculate the variance, published by B.P. Welford in 1962, has been shown to have good numerical properties.
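Both recursions are easy to verify numerically. The sketch below (in Python for illustration) applies the "old estimate + gain × error" updates, with the variance line in the Welford-style form of Eqn. 6.34, and checks them against the batch formulas for the mean and the population variance.

```python
import random

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(200)]

mean, var, n = 0.0, 0.0, 0
for x in data:
    n += 1
    err = x - mean                              # the 'error' term
    mean += err / n                             # recursive mean: old + gain*error
    var += ((n - 1) / n * err ** 2 - var) / n   # recursive population variance

batch_mean = sum(data) / len(data)
batch_var = sum((x - batch_mean) ** 2 for x in data) / len(data)
```

The recursive values agree with the batch calculations to machine precision, but require only constant memory per update.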

A recursive algorithm for least-squares

Suppose we have an estimate of the model parameters at time t = N, θ_N, perhaps calculated offline, and we subsequently collect a new data pair (u_{N+1}, y_{N+1}). To find the new improved parameters, θ_{N+1}, Eqn. 6.20 is augmented by adding a new row,

    ϕ_{N+1}^T = [ y_N  y_{N−1} ⋯ y_{N−n+1} | u_{N+1}  u_N ⋯ u_{N−m+1} ]          (6.35)

to the old data matrix X_N to obtain the new, now slightly larger, system

    [ y_N ; y_{N+1} ] = [ X_N ; ϕ_{N+1}^T ] θ_{N+1}          (6.36)

or y_{N+1} = X_{N+1} θ_{N+1}. This we can solve in exactly the same way as in Eqn. 6.23,

    θ_{N+1} = ( X_{N+1}^T X_{N+1} )^{-1} X_{N+1}^T y_{N+1}          (6.37)
            = ( [ X_N^T  ϕ_{N+1} ] [ X_N ; ϕ_{N+1}^T ] )^{-1} [ X_N^T  ϕ_{N+1} ] [ y_N ; y_{N+1} ]          (6.38)

except that now our problem has grown in the vertical dimension. Note however, the number of unknowns in the parameter vector θ remains the same.

However we can simplify the matrix to invert by noting

    ( [ X_N^T  ϕ_{N+1} ] [ X_N ; ϕ_{N+1}^T ] )^{-1} = ( X_N^T X_N + ϕ_{N+1} ϕ_{N+1}^T )^{-1}          (6.39)

The trick is to realise that we have already computed the inverse of X_N^T X_N previously in Eqn. 6.23 and we would like to exploit this in the calculation of Eqn. 6.39. We can do so using the matrix inversion lemma, which states

    (A + BDC)^{-1} = A^{-1} − A^{-1} B ( D^{-1} + C A^{-1} B )^{-1} C A^{-1}          (6.40)

If B and C are n × 1 and 1 × n (column and row vectors respectively), then Eqn. 6.40 simplifies to

    (A + BC)^{-1} = A^{-1} − ( A^{-1} B C A^{-1} ) / ( 1 + C A^{-1} B )          (6.41)

If we compare Eqn. 6.41 with Eqn. 6.39, then with

    A ≜ X_N^T X_N,   B ≜ ϕ_{N+1},   C ≜ ϕ_{N+1}^T

we can rewrite Eqn. 6.39 as

    ( X_N^T X_N + ϕ_{N+1} ϕ_{N+1}^T )^{-1} = ( X_N^T X_N )^{-1} − ( X_N^T X_N )^{-1} ϕ_{N+1} ϕ_{N+1}^T ( X_N^T X_N )^{-1} / ( 1 + ϕ_{N+1}^T ( X_N^T X_N )^{-1} ϕ_{N+1} )          (6.42)

and we can also rewrite Eqn. 6.38 as

    θ_{N+1} = [ ( X_N^T X_N )^{-1} − ( X_N^T X_N )^{-1} ϕ_{N+1} ϕ_{N+1}^T ( X_N^T X_N )^{-1} / ( 1 + ϕ_{N+1}^T ( X_N^T X_N )^{-1} ϕ_{N+1} ) ] ( X_N^T y_N + ϕ_{N+1} y_{N+1} )          (6.43)

and recalling that the old parameter value, θ_N, was given by Eqn. 6.37, we start to develop the new parameter vector in terms of the old,

    θ_{N+1} = θ_N − ( X_N^T X_N )^{-1} ϕ_{N+1} ϕ_{N+1}^T θ_N / ( 1 + ϕ_{N+1}^T ( X_N^T X_N )^{-1} ϕ_{N+1} ) + ( X_N^T X_N )^{-1} ϕ_{N+1} y_{N+1}
              − ( X_N^T X_N )^{-1} ϕ_{N+1} ϕ_{N+1}^T ( X_N^T X_N )^{-1} ϕ_{N+1} y_{N+1} / ( 1 + ϕ_{N+1}^T ( X_N^T X_N )^{-1} ϕ_{N+1} )          (6.44)

            = θ_N + ( X_N^T X_N )^{-1} ϕ_{N+1} ( y_{N+1} − ϕ_{N+1}^T θ_N ) / ( 1 + ϕ_{N+1}^T ( X_N^T X_N )^{-1} ϕ_{N+1} )          (6.45)

If we define P as the covariance matrix,

    P_N ≜ ( X_N^T X_N )^{-1}          (6.46)

then the parameter update equation, Eqn. 6.45, is more concisely written as

    θ_{N+1} = θ_N + K_{N+1} ( y_{N+1} − ϕ_{N+1}^T θ_N )          (6.47)

with the gain K in Eqn. 6.47 and new covariance given by Eqn. 6.42 as

    K_{N+1} = P_N ϕ_{N+1} / ( 1 + ϕ_{N+1}^T P_N ϕ_{N+1} )          (6.48)

    P_{N+1} = P_N [ I − ϕ_{N+1} ϕ_{N+1}^T P_N / ( 1 + ϕ_{N+1}^T P_N ϕ_{N+1} ) ] = P_N [ I − ϕ_{N+1} K_{N+1}^T ]          (6.49)

Note that the form of the parameter update equation, Eqn. 6.47, is such that the new value of
θN +1 is the old value θN with an added correction term which is a recursive form. As expected,
the correction term is proportional to the observed error. A summary of the recursive least-square
(RLS) estimation scheme is given in Algorithm 6.2.

Algorithm 6.2 Recursive least-squares estimation


Initialise the parameter vector, θ_0, to something sensible (say random values), and set the covariance to a large value, P_0 = 10^6 I, to reflect the initial uncertainty in the trial guesses.

1. At sample N , collect a new input/output data pair (yN +1 and uN +1 )

2. Form the new ϕ_{N+1} vector by shifting in the newest input/output data.

3. Evaluate the new gain matrix KN +1 , Eqn 6.48

4. Update the parameter vector θ N +1 , Eqn 6.47

5. Update the covariance matrix PN +1 , Eqn 6.49 which is required for the next iteration.

6. Wait out the remainder of one sample time T , increment sample counter, N ← N + 1, then
go back to step 1.

The routine in listing 6.16 implements the basic recursive least-squares (RLS) algorithm 6.2 in
M ATLAB. This routine will be used in subsequent identification applications, although later in
section 6.8 we will improve this basic algorithm to incorporate a forgetting factor.

Listing 6.16: A basic recursive least-squares (RLS) update (without forgetting factor)

function [thetaest,P] = rls0(y,phi,thetaest,P)
% Basic Recursive Least-Squares (RLS) parameter update
% Inputs: y_k, phi_k: current output & past input/output column vector
% Input/Outputs: thetaest, P: parameter estimates & covariance matrix

K = P*phi/(1 + phi'*P*phi);                  % Gain K, Eqn. 6.48
P = P - (P*phi*phi'*P)/(1 + phi'*P*phi);     % Covariance update, P, Eqn. 6.49
thetaest = thetaest + K*(y - phi'*thetaest); % Parameter update, Eqn. 6.47
return % end rls0.m
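For readers without MATLAB, Listing 6.16 translates almost line for line; below is an illustrative Python/NumPy version of the same three update equations, exercised on noise-free data from an assumed static model y = ϕ^T θ with made-up "true" parameters.

```python
import numpy as np

def rls0(y, phi, theta, P):
    """Basic recursive least-squares update (no forgetting factor).
    y: scalar output, phi: regressor vector,
    theta: parameter estimate, P: covariance matrix."""
    denom = 1.0 + phi @ P @ phi
    K = P @ phi / denom                          # gain, Eqn. 6.48
    P = P - np.outer(P @ phi, phi @ P) / denom   # covariance update, Eqn. 6.49
    theta = theta + K * (y - phi @ theta)        # parameter update, Eqn. 6.47
    return theta, P

rng = np.random.default_rng(0)
theta_true = np.array([-1.9, 1.5, -0.5])         # assumed 'unknown' parameters
theta, P = np.zeros(3), 1e6 * np.eye(3)          # large P0: low initial confidence
for _ in range(50):
    phi = rng.standard_normal(3)                 # exciting random regressor
    theta, P = rls0(phi @ theta_true, phi, theta, P)  # noise-free data
```

With noise-free data and a large initial covariance, the estimates converge to the true parameters essentially exactly.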



A recursive estimation example

On page 267 we solved for the parameters of an unknown model using an offline technique. Now
we will try the recursive scheme given that we have just collected a new input/output data pair
at time t = 4,

    time    input    output
      4       6        10

and we wish to update our parameter estimates. Previously we had computed that θ_3 = [−2, 1]^T, but now with the new data we have

    θ_4 = θ_3 + K_4 ( y_4 − ϕ_4^T θ_3 )

where ϕ_4^T = [y_3  u_4] = [16, 6]. The gain K_4 can be calculated from equations 6.48 and 6.49. Now

    y_4 − ϕ_4^T θ_3 = 10 − [16  6] · [−2 ; 1] = 36
which is clearly non-zero. Since this is the residual, we should continue to update the parameter estimates.

The covariance matrix at time t = 3 has already been computed and (repeated here) is

    P_3 = ( X_3^T X_3 )^{-1} = (1/655) [ 29/2  8 ; 8  27 ] = [ 0.0221  0.0122 ; 0.0122  0.0412 ]

so the new gain matrix K_4 is now

    K_4 = P_3 ϕ_4 / ( 1 + ϕ_4^T P_3 ϕ_4 ) = [ 0.0407 ; 0.0422 ]

Thus the new parameter estimate, θ_4, is modified to

    θ_4 = [ −2 ; 1 ] + [ 0.0407 ; 0.0422 ] · 36 = [ −0.5338 ; 2.5185 ]

and the updated covariance matrix P_4 is

    P_4 = P_3 − P_3 ϕ_4 ϕ_4^T P_3 / ( 1 + ϕ_4^T P_3 ϕ_4 ) = [ 0.0047  −0.0058 ; −0.0058  0.0225 ]
Note that P_4 is still symmetric and, less immediately apparent, it is also still positive definite, which we could verify by computing the eigenvalues. If we subsequently obtain another (u, y) data pair, we can just continue with this recursive estimation scheme.

We can compare the new estimates, θ_4, with the values obtained if we used the non-recursive method of Eqn. 6.23 on the full data set. At this level, both methods show much the same complexity in terms of computation count and numerical round-off problems, but as N tends to ∞, the non-recursive scheme is disadvantaged.
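The arithmetic in this example is easily checked; the Python sketch below reproduces K_4, θ_4 and P_4 from the stated P_3, θ_3 and the new data pair.

```python
import numpy as np

P3 = np.array([[29 / 2, 8], [8, 27]]) / 655       # covariance at t = 3
theta3 = np.array([-2.0, 1.0])
phi4 = np.array([16.0, 6.0])                      # [y3, u4]
y4 = 10.0

resid = y4 - phi4 @ theta3                        # innovation, = 36
denom = 1.0 + phi4 @ P3 @ phi4
K4 = P3 @ phi4 / denom                            # gain, Eqn. 6.48
theta4 = theta3 + K4 * resid                      # parameter update, Eqn. 6.47
P4 = P3 - np.outer(P3 @ phi4, phi4 @ P3) / denom  # covariance update, Eqn. 6.49
```

The computed values match the hand calculation to the quoted number of decimal places.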

Starting the recursive scheme

In practice there is a small difficulty in starting these recursive style estimation schemes, since initially we do not know the covariance P_N. However we have two alternatives: we could calculate the first parameter vector using a non-recursive scheme, and then switch to a recursive scheme later once we know P_N, or we could just assume that at the beginning, since we have no estimates, our expected error will be large, say P_N = 10^4 I. The latter method is more common in practice, although it takes a few iterations to converge to the "correct" estimates, if they exist.

[Figure: Block diagram. White noise drives the plant under test, (1 + 1.2q^{-1})/(1 − 1.9q^{-1} + 1.5q^{-2} − 0.5q^{-3}); both the input and the output feed an RLS estimator, initialised with covariance P_0 = 10^3 I and random θ_0, which delivers the parameter estimate θ.]

Figure 6.39: Ideal RLS parameter estimation. (See also Fig. 6.41(a).)

6.7.2 Recursive least-squares in M ATLAB

M ATLAB can automate the estimation procedure so that it is suitable for online applications. To
test this scheme we will try to estimate the same system S from Eqn. 6.24 used previously on page 268, repeated here

    S:  G(q^{-1}) = (1 + 0.2q^{-1}) / (1 − 1.9q^{-1} + 1.5q^{-2} − 0.5q^{-3})

where the unknown vector of A and B polynomial coefficients consists of the five values

    θ = [ −1.9  1.5  −0.5  1  0.2 ]^T

which are to be estimated. Note of course we do not need to estimate the leading 1 in the monic
A polynomial.

The estimation is performed in open loop using a white noise input simply to ensure persistent excitation. We will select a model structure the same as our true system. Our initial parameter estimate is a random vector, and the initial covariance matrix is set to the relatively large value of 10^6 I as shown in Fig. 6.39. Under these conditions, we expect near perfect estimation.

Listing 6.17 calls the rls0 routine from Listing 6.16 to update the estimated parameters θ and associated statistics using Eqns. 6.48 and 6.49.

Listing 6.17: Tests the RLS identification scheme using Listing 6.16.

G = tf([1,0.2,0,0],[1 -1.9 1.5 -0.5],1); % 'unknown' plant (1+0.2q^-1)/(1-1.9q^-1+1.5q^-2-0.5q^-3)
G.variable = 'z^-1';
N = 12;                   % # of samples
U = randn(N,1);           % design random input u(k) = N(0,1)
Y = lsim(G,U);            % compute output

n = 5; P = 1e6*eye(n);    % identification initialisation, P0 = 1e6*I
thetaest = randn(n,1); Param = [];

for i=4:N                 % start estimation loop
  phi = [Y(i-1:-1:i-3); U(i:-1:i-1)];        % shift register (column) phi
  [thetaest, P] = rls0(Y(i),phi,thetaest,P); % RLS update of theta
  Param = [Param; thetaest'];                % collect data
end

If you run the script in Listing 6.17 above you should expect something like Fig. 6.40 giving
the online estimated parameters (lower) which quickly converge to the true values after about 5
iterations. The upper plot in Fig. 6.40 shows the input/output sequence I used for this open loop
identification scheme. Remember for this identification application, we are not trying to control
the process, and that provided the input is sufficiently exciting for the identification, we do not
really care precisely what particular values we are using.

[Figure: Upper panel shows the input/output trend; lower panel shows the estimated parameters a_1, a_2, a_3, b_0, b_1 converging over 12 samples.]

Figure 6.40: Recursive least-squares parameter estimation. Upper: the input/output trend for an unknown process. Lower: the online estimated parameters quickly converge to the true parameters. Note that no estimation is performed until after we have collected 4 data pairs.

The near perfect estimation in Fig. 6.40 is to be expected, and is due to the fact that we have no systematic error or random noise. This means that after a few iterations (we have five parameters), we are solving a consistent system of linear equations for the exact parameter vector. Of course if we add noise to y, which is more realistic, then our estimation performance will suffer.

Recursive least-squares in S IMULINK

The easiest way to perform recursive estimation in S IMULINK is to use the identification routines from the S YSTEM I DENTIFICATION T OOLBOX. Fig. 6.41 shows both the S IMULINK block diagram for the example given in Fig. 6.39, and the resultant trend of the prediction error.

[Figure (a): S IMULINK diagram with band-limited white noise driving the plant under test, (1 + 1.2z^{-1})/(1 − 1.9z^{-1} + 1.5z^{-2} − 0.5z^{-3}), whose input u and output y feed the ARX model estimator block.]

[Figure (b): The output of the RLS block: the actual output (red) versus the simulated predicted model output (blue), and the error in the simulated model, over 40–85 seconds.]

Figure 6.41: RLS under S IMULINK. In this case, with no model/plant mismatch and no disturbances, we rapidly obtain perfect estimation.

The ARX block also stores the resultant models in a cell array which can be accessed from the
workspace once the simulation has finished. The structure of the model to be estimated, and the
frequency at which it is updated is specified by double clicking on the ARX block.

If we don’t have the System Identification toolbox, we can still reproduce the functionality of the ARX block by writing out the three recursive equations for the update in raw S IMULINK as shown in Fig. 6.42. The phi block in Fig. 6.42(a) generates the shifted input/output data vector using a simple discrete state-space system, while the bulk of the RLS update is contained in the RLS block shown in Fig. 6.42(b), which simply implements the three equations given in Algorithm 6.2 on page 284.

An elegant way to generate the past values of input and output vector required for the ϕ vector is
to implement a shift register in discrete state-space. If we have a stream of data, and we want to
keep a rolling collection of the most recent three values, then we could run a discrete state-space

[Figure (a): S IMULINK diagram: the "unknown" plant (1 + 1.2z^{-1})/(1 − 1.9z^{-1} + 1.5z^{-2} − 0.5z^{-3}) is driven by band-limited white noise; a phi subsystem forms the regressor vector which feeds a hand-built RLS block, delivering θ and P to the workspace.]

[Figure (b): The internals of the RLS block: unit delays for ϕ and P, matrix multiplies forming P·ϕ and ϕ^T·P·ϕ, and the gain computation K = Pϕ/(1 + ϕ^T Pϕ). The inputs are the current plant output and ϕ, and the outputs are the parameter estimate, θ, and the covariance matrix.]

Figure 6.42: Implementing RLS under S IMULINK without using the System Identification toolbox.

system as

    [ y_k ; y_{k−1} ; y_{k−2} ] = [ 0 0 0 ; 1 0 0 ; 0 1 0 ] [ y_{k−1} ; y_{k−2} ; y_{k−3} ] + [ 1 ; 0 ; 0 ] y_k          (6.50)

where the state transition matrix has ones on the first sub-diagonal.

In general, if we want to construct a shift register of multiple old values of input and output, we

can construct a discrete state-space system as

    [ y_k      ]   [ 0 0 ... 0 :           ] [ y_{k-1}    ]   [ 1 0 ]
    [ y_{k-1}  ]   [ 1 0 ... 0 :           ] [ y_{k-2}    ]   [ 0 0 ]
    [   ...    ]   [ 0 1  .  0 :     0     ] [    ...     ]   [ . . ]
    [ y_{k-na} ]   [ 0 0 ... 1 :           ] [ y_{k-na-1} ]   [ 0 0 ]   [ y_k ]
    [ u_k      ] = [ ..........:...........] [ u_{k-1}    ] + [ 0 1 ] · [ u_k ]          (6.51)
    [ u_{k-1}  ]   [           : 0 0 ... 0 ] [ u_{k-2}    ]   [ 0 0 ]
    [   ...    ]   [     0     : 1 0 ... 0 ] [    ...     ]   [ . . ]
    [ u_{k-nb} ]   [           : 0 1 ... 1 ] [ u_{k-nb-1} ]   [ 0 0 ]

where the state transition matrix is block-diagonal, each block having ones on the first sub-diagonal as in Eqn. 6.50, and where we are collecting na shifted values of y, and nb values of u.
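The shift register of Eqn. 6.50 is easily checked numerically; the sketch below (in Python for illustration) builds the sub-diagonal transition matrix and confirms that the state always holds the three most recent values of the stream.

```python
import numpy as np

n = 3                                # keep the 3 most recent values
A = np.diag(np.ones(n - 1), k=-1)    # ones on the first sub-diagonal
B = np.zeros(n)
B[0] = 1.0                           # new value enters the first state

x = np.zeros(n)                      # state = [y(k), y(k-1), y(k-2)]
for yk in [1.0, 2.0, 3.0, 4.0, 5.0]:
    x = A @ x + B * yk               # the update of Eqn. 6.50
```

After the stream 1..5 the state is [5, 4, 3]: the most recent value and its two predecessors.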

6.7.3 Tracking the precision of the estimates

One must be careful not to place too much confidence in the computed or ‘inherited’ covariance matrix P. Ideally this matrix gives an indication of the errors surrounding the parameters, but since we started with an arbitrary value for P, we should not expect to believe these values. Even if we did start with the non-recursive scheme, and then later switched, the now supposedly correct covariance matrix will be affected by the ever-present nonlinearities and the underlying assumptions of the least-squares regression algorithm (such as perfect independent variable knowledge), which are rarely satisfied in practice.

Suppose we attempt to estimate the two parameters of the plant

    G = b_0 / ( 1 + a_1 q^{-1} )          (6.52)

where the exact values for the parameters are b_0 = 1 and a_1 = −0.5. Fig. 6.43 shows the parameter estimates converging to the true values, and the elements of the covariance matrix P decreasing.

Now even though in this simulated example we have the luxury of knowing the true parameter values, the RLS scheme also delivers an uncertainty estimate of the estimated parameters via the covariance matrix P. Fig. 6.44, which uses the same data as Fig. 6.43, superimposes the confidence ellipses around the estimates once the estimate has converged sufficiently, after k = 2. It is interesting to note that even after 3 or 4 samples, where the estimates look perfect in Fig. 6.43, the parameters have, according to Fig. 6.44, surprisingly large confidence ellipses with large associated uncertainties.

A more realistic estimation example

The impressive results of the fully deterministic estimation example shown in Fig. 6.40 are somewhat misleading since we have no process noise, good input excitation, no model/plant mismatch in structure, and we only attempt to identify the model once. All of these issues are crucial for the practical implementation of an online estimator. The following simulation examples will investigate some of these problems.

[Figure: Three panels against sample number: the input/output trend; the two parameter estimates b_0 and −a_1; and the elements of P on a logarithmic scale.]

Figure 6.43: Estimation of a two parameter plant. Note that the parameter estimates converge to their correct values, and that the elements of the covariance matrix decrease with time. See also Fig. 6.44 for the associated confidence ellipses.

[Figure: Confidence ellipses in the (a_1, b_0) plane, shrinking as more samples are processed.]

Figure 6.44: The 95% confidence regions at various times for the two estimated parameters shown in Fig. 6.43.

Suppose we want to identify a plant where there is an abrupt change in dynamics at sample time k = 40,

    S(θ) = 1.2q^{-1} / ( 1 − 0.2q^{-1} − 0.4q^{-2} )   for k < 40,
    S(θ) = 1.5q^{-1} / ( 1 − 0.3q^{-1} − 0.7q^{-2} )   for k ≥ 40

In this case, as shown in Fig. 6.45 and as in Fig. 6.40, the parameters quickly converge to the correct values for the first plant, but after the abrupt model change, they only reluctantly converge to the new values.

This problem of an increasing tendency to resist further updating is addressed by incorporating


a forgetting factor into the standard RLS algorithm and is described later in §6.8.

[Figure: Three panels against sample number: the input/output trend; the prediction error; and the parameter estimates b_0, −a_2, −a_3 tracking the abrupt plant changes.]

Figure 6.45: Identifying an abrupt plant change at k = 40. The estimator’s parameters initially converge quickly to the first plant (lower trend), but find difficulty in converging to the second and subsequent plants.

6.8 The forgetting factor and covariance windup

Sometimes it is a good idea to forget about the past. This is especially true when the process
parameters have changed in the interim. If we implement Eqns 6.48 and 6.47, or alternatively
run the routine in Listing 6.17, we will find that it works well initially, the estimates converge
to the true values with luck, and the trace of the covariance matrix decreases as our confidence
improves. However if the process subsequently changes again, we find that the parameters do
not converge as well to the new parameters. The reason for this failure to converge a second time
is due to the fact that the now minuscule covariance matrix, reflecting our large, but misplaced
confidence, is inhibiting any further large changes in θ.

One solution to this problem, is to reset the covariance matrix to some large value (103 I for ex-
ample) when the system change occurs. While this works well, it is not generally feasible, since
we do not know when the change is likely to take place. If we did, then gain scheduling may be
more appropriate.

An alternative scheme is to introduce a forgetting factor, λ, to decrease the influence of old samples. With the inclusion of the forgetting factor, the objective function of the fitting exercise is modified from Eqn. 6.21 to

    J = Σ_{k=1}^{N} λ^{N−k} ( y_k − ϕ_k^T θ )²,   λ < 1          (6.53)

where data n samples in the past is weighted by λ^n. The smaller λ is, the quicker the identification scheme discounts old data. Actually it would make more sense if the forgetting factor λ were called the “remembering” factor, since a higher value means more memory!

Incorporating the forgetting factor into the recursive least-squares scheme modifies the gain and covariance matrix updates very slightly to

    K_{N+1} = P_N ϕ_{N+1} / ( λ + ϕ_{N+1}^T P_N ϕ_{N+1} )          (6.54)

    P_{N+1} = (1/λ) [ P_N − P_N ϕ_{N+1} ϕ_{N+1}^T P_N / ( λ + ϕ_{N+1}^T P_N ϕ_{N+1} ) ] = (P_N / λ) [ I − ϕ_{N+1} K_{N+1}^T ]          (6.55)

To choose an appropriate forgetting factor, we should note that the information dies away with a time constant of N sample units, where N is given by

    N = 1 / (1 − λ)          (6.56)

As evident from Fig. 6.46, a reasonable value of λ is 0.995, which gives a time constant of 200 samples.

[Figure: N = 1/(1 − λ) plotted against λ, rising sharply as λ → 1.]

Figure 6.46: The ‘memory’, N, when using a forgetting factor λ, from Eqn. 6.56. Typically λ = 0.995, which corresponds to a time constant of 200 samples.
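Eqn. 6.56 is trivial to evaluate; a quick check (in Python) of the effective memory for some typical forgetting factors:

```python
def memory(lam):
    """Effective memory in samples for forgetting factor lam, Eqn. 6.56."""
    return 1.0 / (1.0 - lam)

N_0995 = memory(0.995)   # 200 samples
N_095 = memory(0.95)     # 20 samples
```

So λ = 0.995 remembers roughly ten times further back than λ = 0.95.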

Consequently most textbooks recommend a forgetting factor of between 0.95 and 0.999, but lower
values may be suitable if you expect to track fast plant changes. However if we drop λ too much,
we will run into covariance windup problems which are further described later in section 6.8.2.

We can demonstrate the usefulness of the forgetting factor by creating and identifying a time varying model, but first we need to make some small changes to the recursive least-squares update function rls0 given previously in Listing 6.16 to incorporate the forgetting factor, as shown in Listing 6.18, which we should use from now on. Note that if λ is not specified explicitly, a value of 0.995 is used by default.

Listing 6.18: A recursive least-squares (RLS) update with a forgetting factor. (See also List-
ing 6.16.)
function [thetaest,P] = rls(y,phi,thetaest,P,lambda)
% Basic Recursive Least-Squares (RLS) parameter update
% Inputs: y, phi, lambda: current output, column vector of past inputs/outputs, & forgetting factor
% Input/Outputs: thetaest, P: parameter estimates & covariance matrix

if nargin < 5
    lambda = 0.995;  % Set default forgetting factor, lambda
end % if

K = P*phi/(lambda + phi'*P*phi);                        % Gain K, Eqn. 6.54
P = (P - (P*phi*phi'*P)/(lambda + phi'*P*phi))/lambda;  % Covariance update P, Eqn. 6.55
thetaest = thetaest + K*(y - phi'*thetaest);            % theta_{N+1} = theta_N + K_{N+1}(y_{N+1} - phi'_{N+1} theta_N)
return % end rls.m
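For readers working outside MATLAB, the same update can be rendered in Python/NumPy. This is a sketch with my own function and variable names, not a listing from the text; the noise-free check at the end simply confirms that the estimates converge to the true parameters:

```python
import numpy as np

def rls_update(y, phi, theta, P, lam=0.995):
    """One RLS parameter update with forgetting factor lam (Eqns. 6.54-6.55).
    phi and theta are column vectors; P is the covariance matrix."""
    denom = lam + float(phi.T @ P @ phi)
    K = P @ phi / denom                              # gain K, Eqn. 6.54
    P = (P - (P @ phi) @ (phi.T @ P) / denom) / lam  # covariance update, Eqn. 6.55
    theta = theta + K * float(y - phi.T @ theta)     # parameter update
    return theta, P

# Noise-free sanity check: estimates should converge to the true parameters.
rng = np.random.default_rng(1)
theta_true = np.array([[0.5], [-0.3]])
theta, P = np.zeros((2, 1)), 1e3 * np.eye(2)         # vague prior: large initial P
for _ in range(100):
    phi = rng.standard_normal((2, 1))
    theta, P = rls_update(float(phi.T @ theta_true), phi, theta, P)
print(theta.ravel())   # close to [0.5, -0.3]
```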

6.8.1 The influence of the forgetting factor

Figure 6.47 shows how varying the forgetting factor λ alters the estimation performance when
faced with a time-varying system. The true time-varying parameters S(θ) are the dotted lines,
and the estimated parameters M(θ) for different choices of λ from 0.75 to 1.1 are the solid lines.

When λ = 1 (top trend in Fig. 6.47), effectively no forgetting factor is used, and the simulation
shows quick convergence of the estimated parameters to the true parameters initially. However,
when the true plant S(θ) abruptly changes at t = 200, the estimates M(θ) follow only very
sluggishly, and indeed never really converge convincingly to the new S(θ). By incorporating a
forgetting factor of λ = 0.95 (second trend in Fig. 6.47), better convergence is obtained for the
second and subsequent plant changes.

[Figure 6.47: Comparing the estimation performance for an abruptly time-varying unknown plant
with different forgetting factors λ. Each panel plots the estimated parameters b1, −a2 and −a3
(solid) against the true parameters (dashed) over 600 samples, for (a) λ = 1, (b) λ = 0.95,
(c) λ = 0.75, (d) λ = 1.1. Note that normally one would use λ ≈ 0.95, and never λ > 1.]

Reducing the forgetting factor still further to λ = 0.75 (third trend in Fig. 6.47) increases the
convergence speed, although this scheme will exhibit less robustness to noisy data. We should also
note that there are larger errors in the parameter estimates, S(θ) − M(θ), during the transients
when λ = 0.75 than in the case where λ = 1. This is a trade-off that should be considered in the
design of these estimators. We could decrease the forgetting factor further still and expect even
faster convergence, although with larger errors during the transients. In this simplified simulation
we have no model/plant mismatch, no noise, and abrupt true process changes. Consequently, in this
unrealistic environment a very small forgetting factor is optimal. Clearly in practice the above
conditions are not met, and values just shy of 1.0, say 0.995, are recommended.

One may speculate what would happen if the forgetting factor were set greater than unity (λ > 1).
Here the estimator is heavily influenced by past data: the more distant the past, the greater the
influence. In fact, it essentially disregards all recent data. How recent is recent? Actually it
will disregard all data except perhaps the first few data points collected, so this will not make
an effective estimator. A simulation where λ = 1.1 is shown in the bottom trend of Fig. 6.47, and
demonstrates that the estimated parameters converge to the correct parameters initially, but fail
to budge from then on. The conclusion is not to let λ assume values greater than 1.0.
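The qualitative behaviour in Fig. 6.47 is easy to reproduce. The scalar sketch below (in Python; the model, seed and numbers are my own illustration, not the text's simulation) identifies a plant gain that jumps from 1 to 2 halfway through the record. With λ = 0.95 the estimator tracks the jump, with λ = 1 it averages over both regimes, and with λ = 1.1 the estimate freezes near its initial converged value:

```python
import numpy as np

def rls_scalar(y, phi, theta, P, lam):
    """Scalar RLS update with forgetting factor (Eqns. 6.54-6.55)."""
    denom = lam + phi * P * phi
    K = P * phi / denom
    P = (P - P * phi * phi * P / denom) / lam
    return theta + K * (y - phi * theta), P

rng = np.random.default_rng(0)
u = rng.standard_normal(200)                         # exciting input
a_true = np.where(np.arange(200) < 100, 1.0, 2.0)    # abrupt plant change at k = 100

final = {}
for lam in (1.0, 0.95, 1.1):
    theta, P = 0.0, 100.0                            # vague prior
    for k in range(200):
        theta, P = rls_scalar(a_true[k] * u[k], u[k], theta, P, lam)
    final[lam] = theta

print(final)   # lam=0.95 tracks the jump to ~2; lam=1 lags; lam=1.1 is stuck near 1
```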

Modifications on the ‘forgetting’ theme

Introducing a forgetting factor is just one attempt to control the behaviour of the covariance
matrix. As you might anticipate, there are many more modifications to the RLS scheme along
these lines such as variable forgetting factors, [67], directional forgetting facto