Lecture Notes 1
Introduction
1 Signals
“A collection of sensible objects”
It is not necessary that ‘sensible’ objects come from a physical sensing device. A signal can
be a conceptual entity like an impulse, or an abstract collection in some space. Our aim is
to develop this naive representation into concrete mathematical abstractions, simple enough
to understand and manipulate, yet sufficient to capture the true properties.
2 Symbols
Symbols perhaps stand next to signals in the evolutionary cycle (ahead or behind?). A
symbol is a creative signal; Webster makes it easy for us. Some symbols that we hold dear
in this course will now be introduced as notations.
N : Natural numbers
Z : Integers
Q : Rational Numbers
R : Real Numbers
C : Complex Numbers
f (⋅) : function from R → C
Ck : Class of k-times continuously differentiable functions
R+ : Non-negative real numbers
We will denote a closed interval by square brackets, i.e. [a, b] for the interval from point a
to point b on the real axis, including the endpoints. On the other hand, (a, b) will denote
the open interval.
3 Systems
“ Meaningful operations on signals ”
Our discussion on signals applies here too. In other words, though meaningful and
sensible are semantically alright, we need an approach with more syntax, to create a
precise (more formal) notion of signals and systems. Systems can be treated as a mapping
from signals to signals. Pictorially,

Input Signal → [ System ] → Output Signal
In this representation, the terms input and output carry their conventional meaning
that we engineers are familiar with. While we will deal with almost any signal thrown at
us, say by theory or practice, most of our discussions will consider only a particular class of
systems, the so-called linear systems¹. As the name implies, linear systems perform linear
operations on the input signals (examples: add two or more input signals, scale an input
signal). Understanding linear systems is the backbone of any signal processing course.
Since many of you come from diverse backgrounds, it is instructive to have an example
in mind which we all can visualize. We will introduce the concept of interpolation to sketch
a system-level view of signal manipulation. This should reassure newcomers to the field
that every aspect we learn has great practical impact. At times down the line, we may
spend considerable time on seemingly mundane mathematical aspects, but these will be
followed by seasons where we apply the theory to real-life problems. Once mastered, the
beauty of the whole machinery will be greatly appreciated.
4 Interpolation
Suppose a function f (⋅) is known only at finitely many points t0 , . . . , tN . Can we uniquely
recover the values in between? You will not take much time in figuring out that the answer
is NO. There are so many answers that will fit the above description, i.e. the problem is
ill-posed given no other assumption about the function. However, the answer is YES if the
function f (⋅) is assumed to be a polynomial of degree at most N . We can determine it
completely from N + 1 points. This, then, is the kind of philosophy that we can use in
getting the in-between values.
4.1 Strategy I
Assume the unknown function f (⋅) is a polynomial of degree at most N . Given f (ti ),
0 ≤ i ≤ N , construct a polynomial of minimum degree g(t) such that g(ti ) = f (ti ), 0 ≤ i ≤ N .
The polynomial construction can be accomplished by the elegant interpolation formula by
Lagrange. Define the Lagrange polynomial
$$L_i(t) = \prod_{j \neq i} \frac{t - t_j}{t_i - t_j}.$$
Notice that
$$L_i(t) = \begin{cases} 1 & \text{if } t = t_i \\ 0 & \text{if } t = t_j,\; j \neq i. \end{cases} \qquad (1)$$
Now, we can form g(t) as
$$g(t) = \sum_i f(t_i)\, L_i(t). \qquad (2)$$
Example 1. Consider a function specified at three points as shown in Figure 1. Our aim
is to find the interpolating polynomial of second degree.
Applying the Lagrange interpolation formula,
$$g(x) = \frac{(x-2)(x-3)}{2} - 2\,\frac{(x-1)(x-3)}{1} + \frac{(x-1)(x-2)}{2} \qquad (3)$$
$$= -x^2 + 4x - 2. \qquad (4)$$
See Figure 2 for a plot.
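To make this concrete, here is a minimal Python sketch of the construction in (1)-(2). The three sample values (1, 1), (2, 2), (3, 1) are inferred from (3)-(4), since g(1) = 1, g(2) = 2 and g(3) = 1 for g(x) = −x² + 4x − 2.

def lagrange_interp(ts, fs, x):
    # Evaluate g(x) = sum_i f(t_i) L_i(x), with L_i as in (1)-(2).
    total = 0.0
    for i, (ti, fi) in enumerate(zip(ts, fs)):
        Li = 1.0
        for j, tj in enumerate(ts):
            if j != i:
                Li *= (x - tj) / (ti - tj)
        total += fi * Li
    return total

# Example 1: the quadratic through (1, 1), (2, 2), (3, 1)
for x in (1.0, 1.5, 2.0, 2.5, 3.0):
    assert abs(lagrange_interp([1, 2, 3], [1, 2, 1], x) - (-x**2 + 4*x - 2)) < 1e-12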
While Lagrange interpolation seems a natural step to take, it has several limitations.
For example, in our problem of plotting temperatures, we may get a polynomial of degree
as high as 300. While modern computers can perform such computations, scalability will
become an issue, particularly for images, videos etc. For many applications, we may need
online interpolation, as data values keep coming in real time. Furthermore, many quantities
of interest vary sufficiently slowly that fitting low-degree polynomials between neighboring
values is itself sufficient. This last point is more like a greedy approach to interpolation,
and let us in particular target piece-wise polynomial approximation.
[Figure 1: the given function values f (xi ) at the points 1, 2, 3. Figure 2: a plot of the interpolating polynomial g(x) = −x² + 4x − 2.]
Example 2. Consider the 5 points shown in the top of Figure 3. Let us construct an
approximation for the interval [1/2, 11/2].
Let the successive samples be τ units apart. Take every sample point and draw a
horizontal line of length τ with the given point as its center. This approximation is shown
in the bottom half of Figure 3.
Apart from their simplicity, piece-wise constant approximations are of immense value
in integration theory. As far as signal processing is concerned, piece-wise constants fall
terribly short in quality. But our motivation for introducing them here is to learn another
fundamental idea, that of system representation. In particular, we can visualize the
samples entering a blackbox and interpolated values coming out of the box.
To see this, imagine each sample which enters the blackbox getting replaced by a rectangle
of constant width and variable height. More precisely, an input sample at time t0 of value
f (t0 ) will be replaced by a rectangle of height f (t0 ), its base sitting on the x-axis with its
mid-point at t0 .
[Figure: input samples (discrete-time) enter the Interpolator/Blackbox and an interpolated signal (continuous-time) comes out; a sample of value f (t0 ) at time t0 is replaced by a rectangle of height f (t0 ) centered at t0 .]
We can treat the rectangle with unit base and height as an atom (more formally it
is called the interpolating kernel). The system is built using such atoms. Each in-
put sample stretches the atomic height while keeping the base-width the same. The
center of the atom coincides with the position of the sample. Notice further that the
input height can be positive or negative or even complex, with the appropriate sub-
stitution clear from the context.
[Figure 6: input impulses at 1, . . . , 5 → system (labeled by its atom) → piece-wise constant output.]
We can label the black-box by the picture of the atom. This will open new possibilities
by using different atoms³. The piece-wise constant interpolation system is pictorially shown
in Figure 6. The important take-home point here, in addition to the concept of kernel, is
that the output waveform is the superposition of the effects of the individual samples
entering the interpolator/system.
A careful reader might have noticed the arrows on top of the input samples in Figure 6.
This is a symbolic representation. It tells us that each arrow is replaced by a scaled
atom/kernel at the output of the system. A vertical line with an arrow on top symbolizes
an impulse. A unit impulse at the origin is also known as a Dirac delta measure. Such
representations belong to what is often known as the ‘symbolic calculus of impulses’, whose
further details we will encounter later.
³ Unlike chemistry, here we are not bothered about putting together an integer number of atoms.
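A minimal sketch of this blackbox in Python, assuming unit sample spacing (τ = 1) and borrowing the sample values of Figure 11 purely for illustration: each input sample stretches a unit-width rectangular atom, and the stretched atoms are superposed.

def rect(t):
    # The atom: a rectangle of unit height and unit base, centered at 0.
    return 1.0 if -0.5 <= t < 0.5 else 0.0

samples = {1: 1.0, 2: 2.0, 3: 1.0, 4: 2.5, 5: 3.0}  # (t_i, f(t_i)) pairs

def interpolate(t):
    # Superposition: each sample contributes f(t_i) * rect(t - t_i).
    return sum(f_i * rect(t - t_i) for t_i, f_i in samples.items())

print(interpolate(2.2))  # 2.0: inside the rectangle owned by the sample at t = 2
print(interpolate(4.4))  # 2.5: inside the rectangle owned by the sample at t = 4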
4.2.2 A Small Detour: Digital Signals
The preceding section and example dealt with a set of samples at the input. On the
other hand, the output produced continuous-time signals. One can take the reference
time from the output signal and associate each input with a time instant. This
leads to the idea of discrete-time signals.
Most discrete-time signals we deal with are regular, i.e. the time instants are integral
multiples of some fundamental time-period T . Thus, x(nT ), n ∈ Z represents a discrete-
time signal. Once we strip the time factor, a discrete-time signal simply becomes an indexed
set of values, which we call a digital signal. For example, in a still image, each index
can correspond to a pixel position and the y-axis can plot the corresponding intensity.
This can be stored in an indexed memory register. We will denote a digital signal by x[n],
where n represents the index.
Let us now continue with our story on interpolation.
[Figure: samples at 1, . . . , 5 enter the system and the linearly interpolated output on 0, . . . , 6 emerges.]
A natural question is whether we can again replace each sample by a scaled atomic diagram
and get linearly interpolated values at the output using superposition. The answer is YES,
but we need to pay attention to a small detail.
The constructed atomic width is now two sample intervals, as opposed to the single unit
we had in piece-wise constants. In the above example, the base of the atomic function will
occupy the interval [−1, 1]. This is mildly confusing, since successive atomic replacements
will now overlap in time at the output; see the illustration in Figure 9.
[Figure 9: the triangular atoms, centered at the samples, overlap on each interval between sample instants.]
The overlap of atomic replacements will result in their superposition. In other words, we
need to add the effect of all overlapping segments at any given time to get the total effect.
Notice that on any interval (i, i + 1), 1 ≤ i ≤ 4, on the right side of Figure 9, there are two
overlapping lines.
The sum of two lines is another line. Further, at each sample instant ti the lines sum to
f (ti ). This uniquely identifies the superposed line, shown as the thick line in Figure 10.
[Figure 10: the superposed output on 0, . . . , 6; the thick line is the linear interpolation of the samples.]
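The same sketch as before carries over: swapping the rectangular atom for a triangular one whose base spans two sample intervals (again borrowing the sample values of Figure 11 for illustration) produces exactly the straight-line segments described above.

def hat(t):
    # Triangular atom: height 1 at 0, base occupying [-1, 1].
    return max(0.0, 1.0 - abs(t))

samples = {1: 1.0, 2: 2.0, 3: 1.0, 4: 2.5, 5: 3.0}

def linear_interp(t):
    # On each (i, i+1) exactly two hats overlap; their sum is the chord
    # joining (t_i, f(t_i)) and (t_{i+1}, f(t_{i+1})).
    return sum(f_i * hat(t - t_i) for t_i, f_i in samples.items())

print(linear_interp(2.5))  # 1.5, the midpoint of f(2) = 2 and f(3) = 1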
Let us find the cubic polynomial approximation between the points t1 and t2 with function
values f (t1 ) and f (t2 ) respectively. Once we construct this, we can use the same technique
to construct the cubic interpolation between any two points. The governing equations for
any cubic polynomial at³ + bt² + ct + d satisfying (5) are,
By ensuring this derivative condition for the cubic polynomial that we find for every seg-
ment, we will indeed have g(t) a differentiable function. A very common choice for βi is
the difference approximation to the derivative. Specifically,
$$\beta_i = \frac{f(t_{i+1}) - f(t_{i-1})}{2}. \qquad (11)$$
We thus have four equations (7) to (11) to evaluate the parameters a, b, c, d. The following
properties will help us easily find the polynomial coefficients.
1. W.l.o.g. assume t1 = 0, as we can port the design to any starting time by simply shifting
the polynomial (drawing the picture from some t instead of 0).
2. The interval length t2 − t1 can w.l.o.g. be taken to be unity, since we already assumed
that the sample points are equi-distant in time. This is justified, since the obtained
interpolation for [0, 1] can be stretched/time-shortened to fit into [0, t2 − t1 ], and then
we use the previous property.
Figure 11: Cubic interpolation for (1, 1), (2, 2), (3, 1), (4, 2.5), (5, 3).
[Figure 12: the cubic interpolation kernel; it peaks at 1.0 at the origin and is supported on [−2, 2].]
We now understand cubic interpolation, which has a lot of applications. However, our
interest here is to use this model to say something more general about systems. For
example, can we write the output of the interpolator as atomic superpositions, the same
way we did for the constant and linear interpolators? At first this looks a bit awkward, as
the polynomials are of degree 3 and linearity is not visible in their very nature. Nevertheless,
we indeed have such a representation; the atom is nothing but the response of the cubic
interpolator to a Dirac delta measure, depicted in Figure 12.
Notice that we obtained the kernel by passing an impulse through the system. This
generalizes: we can find the response of many systems by passing an impulse through
them. In this sense, the kernel function is also known as the impulse response.
We can now show that the atomic superpositions of the kernel shown in Figure 12,
when driven by the input samples, will indeed result in cubic interpolation. Without loss
of generality, let us show this for the interpolation between points 0 and 1. Let f−1 , f0 , f1 , f2
be the set of inputs and the interpolating polynomial be p(t) = at³ + bt² + ct + d. Clearly,
$$\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{2} & \tfrac{3}{2} & -\tfrac{3}{2} & \tfrac{1}{2} \\ 1 & -\tfrac{5}{2} & 2 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 0 & \tfrac{1}{2} & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} f_{-1} \\ f_0 \\ f_1 \\ f_2 \end{bmatrix}. \qquad (16)$$
With G as the matrix above, we can write (16) as p̄ = Gf¯, where the column vector p̄ has
the coefficients of polynomial p(t), and f¯ contains the input values. Take two vectors ᾱ
and β̄ such that f¯ = ᾱ+ β̄. We will show that if we interpolate input ᾱ and β̄ separately and
superpose, we will get p(t), the interpolation of f¯. In particular, if ᾱ is the input sequence,
our interpolation polynomial will have coefficients Gᾱ. Clearly,
$$G\bar{\alpha} + G\bar{\beta} = G(\bar{\alpha} + \bar{\beta}) = G\bar{f} = \bar{p}.$$
Thus, the sum of the polynomial coefficients for ᾱ and β̄ is the same as the coefficients of p(t).
Since any discrete-time input can be written as a superposition of values at discrete-time
instants, atomic superpositions are sufficient to get us cubic interpolation.
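The argument is easy to check numerically. A small sketch with the matrix G of (16) and an arbitrary split f̄ = ᾱ + β̄ (the numbers below are illustrative, not from the notes):

import numpy as np

# G from (16): maps (f_{-1}, f_0, f_1, f_2) to the coefficients (a, b, c, d).
G = np.array([[-0.5,  1.5, -1.5,  0.5],
              [ 1.0, -2.5,  2.0, -0.5],
              [-0.5,  0.0,  0.5,  0.0],
              [ 0.0,  1.0,  0.0,  0.0]])

f = np.array([1.0, 2.0, 1.0, 2.5])       # f_{-1}, f_0, f_1, f_2
alpha = np.array([0.5, 1.0, -1.0, 2.0])  # any split f = alpha + beta
beta = f - alpha

# Interpolating alpha and beta separately and superposing gives the
# same coefficients as interpolating f directly: G(alpha + beta) = G f.
assert np.allclose(G @ alpha + G @ beta, G @ f)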
We studied interpolation to enable visualizing the superposition property. It can be
hidden from a cursory look, as in the case of cubic interpolation. More generally, how do
we figure out that such representations exist? To our relief, it turns out that every linear
time-invariant (LTI) system has such a property. What is an LTI system? How do we
check whether a system is LTI? Please read on, the answers are ahead.
5 Class of Signals
Our structured approach is to divide the set of sensible objects (signals) and meaningful
operations (systems) into classes. To see how classification helps, consider the collection of
citizens of India. The operation we want: administer meaningful governance (say imparting
basic education). We can define sets of people who have some properties in common (such
as the same language), and define operations on each of these classes. The classification can
also be based on convenience, observed facts or pedagogical reasons. Thus the partitions
that we introduce are not that rigid.
The discrete-time signals that we consider have a property that the interval between
successive time instants is an integer multiple of some real number, the largest of which we
call the fundamental sampling interval (unofficial terminology).
⁴ Recall that R, Z and N stand for the reals, integers and natural numbers respectively.
Exercise 1. Does every discrete-time signal have a fundamental sampling interval? Write
a proof for your answer.
From now onwards, the discrete-time signals that we consider are assumed to have
successive time-indices equi-distant. Thus index n happens at time nT , where T is the
sampling interval.
A continuous-time signal is defined for all t ∈ R. One can interchange the term with
functions f (x), x ∈ R. Care should be taken: only the time index is necessarily continuous,
and a continuous-time signal can still have discrete jumps. To emphasize, we will say that
the signal itself is continuous when there are no jumps (see examples below).
Our idea of a meaningful operation is clearer now. We will improve it further, and place
this concept on a sound logical and mathematical footing.
Stable Signals
A signal f (t) is stable or integrable if,
∫R ∣f (t)∣dt < ∞.
Not all signals are stable. For example, a periodic sinusoid is unstable by our defi-
nition. The notion of local integrability is central for the mathematical treatment of
unstable signals. A signal f (t) is locally integrable over the finite interval (a, b) if
$$\int_a^b |f(t)|\, dt < \infty.$$
For the signals shown in Figure 1, the signal pattern repeats after constant intervals. On
the other hand, the well known Gaussian function is continuous and belongs to Cᵏ, ∀k ∈ N,
but it does not repeat itself. Our second classification is based on the property of periodicity.
[Figure: the Gaussian (1/√(2π)) e^(−t²/2); recall that ∫_R e^(−x²/2) dx = √(2π).]
The energy of a signal f is
$$E_f = \int_{\mathbb{R}} |f(t)|^2\, dt.$$
⁵ Some may disagree, and argue that periodicity is indirectly used in most discrete-time operations.
5.2.1 Energy Signals
If Ef < ∞, then the signal f is called a square integrable signal or energy signal.
Exercise 6. Show that any bounded function with finite support is square integrable.
If $\int_a^b |f(t)|^2\, dt < \infty$ for every finite real interval of the form (a, b), we will say
that the signal is locally square integrable.
The properties described above cover most of the signal terminology that we use in this
course. Let us now look at some of the common system operations.
6 System Operations
The most common operations that we encounter in this course are scaling, shifting and
addition of signals. In an abstract sense, a system (operation) can be represented by a
deterministic mapping from the space of input signals to the space of output signals.
6.1 Scaling
The simplest scaling example is that of amplification: for a constant a,
g(t) = af (t), ∀t
is a meaningful operation.
Notice that the frequency of the waveform is unchanged by amplitude scaling. Yet another
scaling operation, very common in wavelet analysis, is one where the argument of the
function is scaled, as opposed to the function itself. The argument scaling (time-scaling)
will change the frequency components. See the following example for f (t) = sin(t) and
f (2t). The frequency of the time-scaled waveform is double that of the original, though
there is no change in amplitude. Amplitude scaling is a time-invariant linear operation,
while time-scaling, though linear, is not time-invariant. If you wonder what this means, a
major part of our study is to make this differentiation, which will become clearer as we
move on.
[Figure: f (t) = sin t with its amplitude-scaled version 2f (t), and f (t) with its time-scaled version f (2t).]
6.2 Shifting
Shifting a signal f (t) in time by T0 gives
g(t) = f (t − T0 ).
As an example, consider 3 cycles of a triangle wave⁶ shown in Figure 17.
[Figure 17: f (t) and its shifted version f (t − T0 ), displaced by T0 along the time-axis.]
Keep in mind that T0 can be any real number, positive or negative. If T0 is negative, the
waveform will be shifted to the left. It is evident that if g(t) = f (t − T0 ), then
f (t) = g(t + T0 ), which corresponds to shifting g(⋅) back along the time-axis.
⁶ Only 3 cycles are considered just for illustration; the same idea works for periodic waveforms too.
7 Echo-System
This section summarizes some of the properties above with an example. We will describe
a system with echoes, which will involve scaling and time-shifting of input signals. In
particular, an echo is a time-delayed version of the original signal. We will introduce the
notion of adding signals, wherein the original signal and echoes are added to obtain a
new waveform. A practical example of a system with echoes occurs in mobile cellular
communications. The echo phenomenon there is known as multi-path propagation. Here
a signal transmitted from the antenna reaches the destination antenna through multiple
paths with different delays, see figure. Apart from its relevance to a key technology sector,
Exercise 7. Justify the usage ’a low-pass filter’ for the system in (18).
Exercise 8. Can you suggest one change which will render the system to a high-pass one.
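Equation (18) itself is not reproduced in these notes, but the parallel diagram in Section 9 suggests the two-tap average y[n] = (x[n] + x[n−1])/2. Here is a sketch under that assumption, which also hints at Exercise 7: a rapidly alternating input is wiped out, while a slowly varying one passes nearly unchanged.

import numpy as np

def echo_average(x):
    # Assumed form of (18): y[n] = (x[n] + x[n-1]) / 2, with x[n] = 0 for n < 0.
    x = np.asarray(x, dtype=float)
    delayed = np.concatenate(([0.0], x[:-1]))  # the delay element D
    return 0.5 * x + 0.5 * delayed

print(echo_average([1, -1, 1, -1]))  # [0.5 0.  0.  0. ] -- high frequency crushed
print(echo_average([0, 1, 2, 3]))    # [0.  0.5 1.5 2.5] -- slow ramp survives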
The above examples are those of linear systems. There are non-linear systems too. In
fact, most systems in real life have some non-linearities (we all have them), and we will
first learn how to live with linear approximations. We can launch ourselves to bigger
‘depths’ once we master linear systems.
Let us now have a more formal definition of linear systems.
8 Linear Systems
Let us go back to the previous example. This belongs to the class of ‘linear’ systems. In
Euclidean coordinates, linear is synonymous with the geometric path (hyper-plane) defined
by
$$a_1 x_1 + a_2 x_2 + \cdots + a_k x_k = c,$$
where xi is the i-th coordinate in Rᵏ. We can extend this to abstract spaces, by treating
every member (element) of the space as a coordinate. For example, a linear combination of
sinusoids, where every coordinate is a sine wave (how many parameters do you need to
specify a wave?). While a Euclidean line and a sum of sine-waves are straightforward to
imagine, what connection do these have to linear systems? Or plainly, what is a linear
system? To answer this, we give a precise characterization.
Linear System
A system defined by the mapping y = f (x) is linear iff the following two properties hold:
(i) Superposition: If y1 = f (x1 ) and y2 = f (x2 ) are the outputs produced by inputs x1
and x2 respectively, then
f (x1 + x2 ) = y1 + y2 .
(ii) Homogeneity: If y1 = f (x1 ) and x2 = ax1 , a ∈ R, then
f (x2 ) = ay1 .
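The definition suggests a direct numerical probe: feed random input pairs to a candidate system and test both properties. A small sketch (a passed test is evidence, not a proof, of linearity):

import numpy as np

def looks_linear(system, trials=100, n=16, tol=1e-9):
    # Check superposition and homogeneity on random discrete inputs.
    rng = np.random.default_rng(0)
    for _ in range(trials):
        x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
        a = rng.standard_normal()
        if not np.allclose(system(x1 + x2), system(x1) + system(x2), atol=tol):
            return False  # superposition fails
        if not np.allclose(system(a * x1), a * system(x1), atol=tol):
            return False  # homogeneity fails
    return True

print(looks_linear(lambda x: 3 * x))   # True: amplification
print(looks_linear(lambda x: x + 1))   # False: affine, but not linear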
Exercise 12. From the definition of time-invariance, our coffee example seems to be time-
invariant when we pay in rupees and time-variant when we buy in dollars. Is this a
discrepancy? Explain your answer.
Systems which are linear and time-invariant belong to an important class, and for most
of the lectures we consider only such systems. We give it a pet name: LTI systems. The
reader can now guess what LTV stands for.
Exercise 13. Show that a cascade of an LTI and an LTV system is LTV. What about an
LTV and another LTV?
We should not be prejudiced about having only serial connection of systems. Parallel
systems not only are possible, but also are key to our understanding of complex systems.
For illustration, let us express (18) in parallel form.
[Diagram: x[n] splits into two parallel branches, one through an amplifier of gain 1/2, the other through a delay D followed by an amplifier of gain 1/2; the branch outputs are summed (Σ) to give y[n].]
Here the box D is a delay element which delays the sample by one fundamental interval.
Exercise 14. Comment on whether an LTI system in parallel with an LTI system is still
LTI. What about LTI in parallel with LTV?
We have seen that time-variations pose a number of problems, increasing the complexity
of analysis. What will we do as a coffee vendor (engineer) to make things simple? One
idea is to take the closing exchange rate of a day, and use it as a fixed conversion rate
for the whole of the next day, ignoring changes within the trading hours. So within a
day, we have a linear time-invariant system. Come next day, we may have another LTI
system, depending on what the rates are. Doing this for every day, we have approximated
the time-variant system by a collection of time-invariant ones, making life easy for all.
This is the reason why LTI systems are important, not only in theory, but also in practice.
A keen reader may have observed that, to reduce complexity further, we can fix the
exchange rate for weeks or months instead of on a 24-hour basis.
Let us now zoom into the LTI class. Assume that the vendor has fixed the exchange
rates for a long time (say an average butterfly lifetime). He now starts accepting all
currencies, converting to rupees first and then doing business as usual. So even in the LTI
class, converting the input into a form which is simple and easy to manipulate is important.
Those who attended the first lecture will realize that we were trying to motivate the very
same thing.
Example 7. A computer is like our coffee shop! Yes, you are in the right class. It accepts
a program in some language (currency), converts it to machine language, executes it and
in turn outputs some service (coffee).
Exercise 15. What is the standard currency that the computer accepts? (Give it a try,
there is no right or wrong answer.)
Exercise 16. Is the computer that we considered in the previous example time-invariant?
Comment on the linearity of the system.
10 Time Variance
We already mentioned the naive fact that time-variance means not time-invariant. To
complete the story, let us have a closer look at time-variance. In several physical examples,
time-variance is caused by the relative movement of the signal source (input) and the
system. A well known example is Doppler spread, which corresponds to a frequency shift.
(Here we describe a simplified version, which is just to convey the concept. Please see a
physics textbook or Wikipedia for a more systematic and correct explanation.) If the
relative motion takes the source and observer closer, then a sine-wave, assuming no
amplitude degradation over the path (and not considering path delays), will be observed
as a sinusoid of a different, shifted frequency.
(Please note that the above is merely a mathematically convenient representation rather
than the physics behind it. In particular, a sine wave starting at a certain time has many
more frequency components than the visible number of cycles per time unit.)
[Diagram: currency x → Currency Converter → x̂ → Coffee Machine → y = αx̂.]
In the picture, x̂ is the equivalent Indian currency which the machine can easily manipulate.
Let us try to emulate the first step of converting x to x̂, but now x is a signal given to us,
instead of some currency.
Let x(t) ∈ C⁰; one such signal is shown below for some interval of time on the positive
axis. It is assumed that the signal is continuous and well-defined on the rest of the axis
too.
[Figure: a continuous signal x(t) on an interval of the positive time-axis.]
How will we represent this signal in a concise manner for the shown interval? At first, it
appears that we need an infinite number of values, say if we store x(t) for every rational t.
This thought leads to the further (simpler) idea that we can store the values at equi-distant
intervals (at integer multiples of a given time interval). Most students may rightly guess
that if the samples are far apart, we may lose the ‘fidelity’ of the original signal, and
on the other hand, too many samples will make life difficult. Is there a middle line? The
Sampling Theorem deals with this, and we will cover it soon. However, we will first focus
on an alternate representation called the Fourier Series for periodic signals. And guess what,
studying Fourier Series will give us the Sampling Theorem with little extra effort. Our goal
now is to explore the thread of events and connections in this treasure hunt.
[Diagram: x[n] is fed through a gain α to give yA [n] at point A, and through a delay D followed by a gain β to give yB [n] at point B.]
For simplicity, assume that x[n] = 0, ∀n < 0. Let us observe the waveforms at points A and
B, when n ≥ 0.
$$Y_\Sigma = \begin{bmatrix} (y_A[n])_{n \geq 0} \\ (y_B[n])_{n \geq 0} \end{bmatrix}, \qquad (25)$$
then the output at each n is nothing but the sum of the nth column of YΣ . This gives us a
handle on representing systems. We will simply denote the system as
[α β] (26)
More generally, a discrete system which at any time takes a linear combination of M + 1
consecutive samples can be represented by a parallel arrangement of M delays. We will
simply represent this system as
[h[0] h[1] ⋅ ⋅ h[M ]]. (27)
Sometimes, when the context is clear, to have some economy of space, we will write h[j]
simply as hj , excluding the square brackets; the same applies to x[i], which is written as xi .
Can we write a relation similar to (25) for a general discrete-time system h? It is
straightforward, as shown in the matrix representation below.
$$\begin{bmatrix} h_0 x_0 & h_0 x_1 & h_0 x_2 & \cdots & h_0 x_M & h_0 x_{M+1} & \cdots \\ 0 & h_1 x_0 & h_1 x_1 & \cdots & h_1 x_{M-1} & h_1 x_M & \cdots \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \\ 0 & 0 & 0 & \cdots & h_M x_0 & h_M x_1 & \cdots \end{bmatrix}$$
Observe that the sum of each column corresponds to the output of the linear time-invariant
system defined by h = (h0 h1 ⋅ ⋅ hM ). We can represent this concisely as
$$y[n] = \sum_{j=0}^{M} h[j]\, x[n-j]. \qquad (28)$$
The sum represented above is known as convolution. We will often write this as
$$y[n] = (h * x)[n], \qquad (29)$$
where ∗ stands for the convolution sum. When the time index n is clear from the context,
we may even shorten this as
y = h ∗ x. (30)
For those who like matrices, the convolution sum can be written in a concise, pleasing form:
y = hX̄, (31)
where X̄ is a matrix with the input vector (x[n]) in the first row, and whose k-th row is
(x[n]) shifted to the right by k − 1 elements, i.e. (x[n − k + 1]). There are M + 1 such rows,
so that the multiplication is well-defined. When M < ∞, the system h is known as a finite
impulse response filter (FIR filter).
There are infinite impulse response (IIR) filters too, but we will study them after learning
how to design FIR ones. By the way, FIR filter design is all about choosing the coefficients
h0 ⋅ ⋅hM . We have to choose M as well as the values. Higher M , as expected, gives us more
control over the filter, but will require more computations to obtain the convolution sum.
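A direct transcription of the convolution sum (28) into Python, for finite sequences starting at index 0 (a sketch; in practice one would use a library routine):

def convolve(h, x):
    # y[n] = sum_j h[j] * x[n - j]; output length is len(x) + len(h) - 1.
    M, N = len(h) - 1, len(x)
    y = [0.0] * (N + M)
    for n in range(len(y)):
        for j in range(min(n, M) + 1):
            if n - j < N:
                y[n] += h[j] * x[n - j]
    return y

print(convolve([1, 1], [1, 2, 3]))  # [1.0, 3.0, 5.0, 3.0]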
There is something amusing about the convolution sum. Do you remember seeing this
operation, illustrated in the matrix form before equation (28), anywhere else? Guess
hard and long, 13-14 years being such a long time. Where were you at that time? Likely
in your second standard class, where the teacher with a stick taught you multiplication
on the board. Here is one such procedure.
           1   3   4   3   1
×                  4   2   3
----------------------------
           3   9  12   9   3
       2   6   8   6   2
   4  12  16  12   4
----------------------------
   4  14  25  29  22  11   3
The final step is to move the carry from right to left, so that we get a decimal representation.
But the brave may ask: what if we don’t use the decimal system? For example, the
hexadecimals use A, B, C, D, E, F for the numbers 10 to 15. Similarly, a system based on
0, 1, ⋅ ⋅ 9 and A-Z has 36 symbols, and can represent the numbers 0 to 35 with single digits.
In such a system, we will not have carries in the above multiplication, as all the column
sums are at most 35.
Exercise 19. Represent the output of our multiplication in the base-36 number system
described above.
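The claim is easy to verify numerically: convolving the digit sequences reproduces the carry-free bottom row (a sketch using numpy's convolve, which implements the same sum (28)):

import numpy as np

x = [1, 3, 4, 3, 1]  # digits of 13431, the 'signal'
h = [4, 2, 3]        # digits of 423, the 'system'
print(np.convolve(x, h))  # [ 4 14 25 29 22 11  3] -- the carry-free product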
This seems like an unnecessary digression. The point we wish to make is that multi-
plication without carry is conceivable, and quite natural if we do not stick to just the
decimals. With this, we present our main result: convolving two finite sequences of digits
is the same as multiplying the corresponding numbers without carry.
The above theorem gives us a simple means to compute the convolution of any two
discrete sequences. Typically, one will be the signal and the other the system. Nevertheless,
a question remains about the carry: why is the carry-adding step omitted? To understand
that, let us compare the multiplication example and convolution. Take the first number
as a sequence of decimals, and imagine it as a signal. So the signal is x[0] = 1, x[1] =
3, x[2] = 4, x[3] = 3 and x[4] = 1. Similarly, treat the second number as the system with
h[0] = 4, h[1] = 2 and h[2] = 3. The system outputs h0 x0 at time 0, assuming that
x[n] = 0, ∀n < 0. Let us consider two cases. If no carry propagates into a column of our
multiplication, the system output read at that instant is the same as the corresponding
digit of the product. On the other hand, if there was a carry, we will know about it only
at the next time instant, when the next column is summed up. But by then we have
already read out the sum of the earlier column as output, so we cannot now add the carry
to it. This is the reason that a carry-addition step is missing in convolution.
This extension is possible under the assumption that the above Riemann integral is well
defined. This is not the case for a number of pathological inputs and systems. To develop
the integral above in a more concrete fashion, we need tools from measure theory, which
are beyond the scope of these lectures. But this does not stop us from trying to understand
why the above definition works at instants where the integral is well-defined. The important
formalism that we need to carry over from our understanding of the discrete case is the
so-called Kronecker delta or discrete impulse. It is defined as,
$$\delta[n - m] = \begin{cases} 1, & \text{whenever } n = m \\ 0, & \text{otherwise.} \end{cases} \qquad (33)$$
In particular, with the definition above, we have
$$x[n] = \sum_m x[m]\, \delta[n - m].$$
From this observation, we go ahead and define a continuous-time impulse or a Dirac
measure δ(t) as that object for which, for every t ∈ R, the property
$$x(t) = \int_{\mathbb{R}} x(\tau)\, \delta(t - \tau)\, d\tau$$
holds.
Example 8. The following example illustrates the convolution of a continuous-time signal
with an impulse (Dirac) at time zero. The impulse has a weight c, i.e., its area is c instead
of unity. (In the notes, we will always draw the impulse as an upward arrow with a closed
tip reaching up to c, but this may not be the standard convention in your disciplines.)
[Figure: a pulse of height b on [0, a], convolved with an impulse of weight c at the origin, yields a pulse of height b·c on [0, a].]
Example 9. What will happen if the impulse in the previous example is shifted to time t0 ?
[Figure: the same pulse, convolved with an impulse of weight c at t0 , yields a pulse of height b·c on [t0 , t0 + a].]
Example 10. Suppose that instead of one impulse, we have two: one at time 0 and the
other at time t0 . This system is linear (why?). The system is the sum of those in the last
two examples. Since the input remains the same, the corresponding output will be the sum
of the two outputs in the last examples.
[Figure: impulses of weight c at 0 and t0 ; the output is the superposition of the two pulses of height b·c, which overlap on [t0 , a] and together occupy [0, t0 + a].]
Observe that the big triangle regions at the output correspond to the past two examples,
and the dashed region is the sum of those triangles, obtained by point-wise addition at
each instant of time.
Exercise 24. Consider applying the input shown below to the system in Example 10. Let
c = 1 (unit impulse) and let the height of the input waveform also be 1. Draw the output
signal ∀t when t0 = 2 and t1 = 3, with the time axis properly marked.
[Figure: the input waveform for Exercise 24, supported between 0 and t1 , with t0 marked.]
14 Non-causal Systems
The last three examples considered causal systems. But convolution itself has no such
restrictions; it works just as well for non-causal systems. The physical interpretations of
causality apart, including non-causal systems in our study gives us more flexibility and
adds an inherent symmetry in time to the system.
[Figure: a pulse of height b on [−a/2, a/2], convolved with an impulse of weight c at the origin, yields a pulse of height b·c on [−a/2, a/2].]
Notice that though the system starts at time zero, there is non-zero output from time −a/2.
This is in some sense analogous to people making each payment for our holiday party a few
days in advance. We saw that delays caused a shift to the right. Advances do the reverse,
a shift to the left. This system actually is causal, as the output at any time t depends
only on current and past input values.
Now imagine that we interchange the system and the signal in the previous example. Since
the convolution formula remains the same, the output is the same as that in the example.
But the system now becomes non-causal, as the output at time t = −a/2 depends on the
input at time zero.
From now onward, we will not demarcate between causal and non-causal systems. This
is not just for mere convenience; a lot of the notions that we are going to introduce need
both sides of the time axis. For example, a sine wave having just a single frequency has to
span itself over all time, from −∞ to ∞. This is not just an imaginary concept. Consider
yourself walking inside a circular tube; let us say anticlockwise is the positive direction,
and clockwise negative. It is a never-ending journey, and we can extend this to time too.
15 Time Reversion
We have used the convention all along that time t runs from left to right. Imagine a
traveling wave (say a wave in the ocean), and you are watching it from a podium which
points south, with waves traveling in the east-to-west direction. The wave as such moves
on the surface of the sea. So the values close to time zero (the creation of the wave) are
actually moving to the west (right) in space, with more and more values added to the tail
part. Similarly, a signal can be visualized as being passed through the system (just like
our tourists to Goa). In this case, notice that the time is reversed. The first person to
undertake the trip comes out first and keeps walking forever through the calendar.
This time-reversal is actually inherent in the multiplication without carry that we de-
scribed. So another way of computing the convolution sum is to take the signal, reverse
it in time, and slide it through the system. At every time instant, compute the cumulative
effect, i.e., each signal value which is inside the system is weighted by the corresponding
segment of the system, and all such effects are added up. But this is not anything new.
We actually started with this idea in the first section of this chapter. We repeat it here to
facilitate computing convolutions in continuous time.
Notice that there is a reciprocity between signals and systems. The output will be un-
changed if you interchange the signal and the system. This helps in our computation.
Instead of time-reversing the signal, we can keep the signal and time-reverse the system,
and the output will be the same. In many cases that we consider, signal sequences are
longer than the system (filter), so it is better to reverse the system sequence.
On careful observation, the above equation reveals that convolution can be determined in
the following algorithmic fashion.
(a) Time-reverse the system to obtain h(−τ ).
(b) Shift h(−τ ) to the right by t units to get h(−(τ − t)) = h(t − τ ).
(c) Weight the signal by the shifted system and find the sum over τ , giving y(t).
Exercise 26. Did you use the above technique while doing Exercise 24? If not, try it this
way and compare both answers.
16 A Computational Simplification
Suppose you are given two continuous functions f (x) and g(x), say with bounded sup-
port. How can we program a computer to compute f (x) ∗ g(x)? This may appear a
dubious question at first: the computer is a fixed-precision device, so how are we going
to compute integrals using it? Notice, however, that this problem is more fundamen-
tal than convolution, and numerical integration is a powerful tool in such contexts. So
can we find approximate numerical values for the output of convolution using numerical
techniques? This is illustrated by an example below, for f (x) = exp(−x2 ), x ∈ [−2, 2] and
g(x) = ∣sin(πx)∣, x ∈ [−1, 1].
[Figure: plots of f (x) on [−2, 2] and g(x) on [−1, 1].]
Observe that f (x) can be approximated by a piece-wise constant function. A piece-wise
constant function is nothing but an interpolation of equi-spaced samples of f (x). We al-
ready learnt at the start of the chapter that piece-wise constant interpolation can be written
as a convolution of the input samples with a rectangular waveform. This is illustrated in
Figure 19. Thus f (x) ≈ fn ∗ h(x), where h(x) is a rectangle of unit height and width τ ,
and fn = f (nτ ), n ∈ Z. Here τ is the time interval between consecutive input samples. In
a similar fashion, we can write g(x) also as a convolution, say g(x) ≈ gn ∗ h(x), where
gn = g(nτ ), n ∈ Z. This is shown in Figure 20.
[Figure 19: f (x) on [−2, 2] expressed as the convolution of its samples fn with the rectangle h(x). Figure 20: g(x) on [−1, 1] expressed likewise as gn ∗ h(x).]
Carrying this through the convolution, f (x) ∗ g(x) ≈ (fn ∗ gn ) ∗ ∆(x), where ∆(x) is a
triangular waveform centered at the origin with width 2τ and height τ . In other words,
we can find the discrete convolution fn ∗ gn first and then linearly interpolate the
resulting values to obtain an approximation to f (x) ∗ g(x). This provides an algorithmic
framework to evaluate the convolution of most functions that we encounter. For integrable
functions, the approximation gives the actual values in the limit τ → 0.
We would like to remind the reader that both functions f (x) and g(x) should be sam-
pled at the same spacing τ for the above procedure to work. To exemplify this, we
have drawn the approximations of Figures 19 and 20 at different scales of the x-axis. In
particular, g(x) is only half as wide as f (x), but they are drawn to the same display
dimensions. The decompositions on the right side inherit the dimensions from the
left-most curve.
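Putting the recipe together, here is a sketch of the numerical procedure for the running example: sample both functions at the same spacing τ, and scale the discrete convolution by τ to approximate the integral.

import numpy as np

tau = 0.001  # common sample spacing for both functions
xf = np.arange(-2.0, 2.0 + tau, tau)
xg = np.arange(-1.0, 1.0 + tau, tau)
f = np.exp(-xf**2)
g = np.abs(np.sin(np.pi * xg))

# tau * (f_n * g_n) is a Riemann-sum approximation of the convolution
# integral; linearly interpolating these values approximates (f * g)(x).
y = tau * np.convolve(f, g)
x_out = xf[0] + xg[0] + tau * np.arange(len(y))  # output supported on [-3, 3]
print(x_out[np.argmax(y)], y.max())  # peak of the smoothed result, near x = 0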