
STAT 5102: Statistical Signal Processing

(Lecture 3: Signal Estimation Theory Overview)

Dr. Md Bipul Hossen


Associate Professor
Department of Statistics, BRUR

Question 1: What is signal estimation? Differentiate between parameter estimation and signal detection. Explain the role of estimation in signal recovery.

Answer

Signal Estimation refers to the process of extracting or reconstructing the original signal
from observed data that may be corrupted by noise or distortion. In practice, signals
received in real-world systems are often contaminated by random noise, and estimation
techniques help in recovering the underlying useful signal from such noisy observations.

Mathematically, if the observed signal is given by:

y(t) = x(t) + n(t)

where x(t) is the true signal and n(t) is the noise, the goal of signal estimation is to
construct an estimator x̂(t) that approximates x(t) as accurately as possible.
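As a minimal illustrative sketch (not part of the lecture; the constant signal level, noise standard deviation, and sample size below are arbitrary choices), the following Python snippet simulates noisy observations y(t) = x(t) + n(t) of a constant signal and recovers it by simple averaging:

import numpy as np

rng = np.random.default_rng(0)
x_true = 1.5                                   # true constant signal level (arbitrary)
y = x_true + rng.normal(0.0, 0.5, size=1000)   # y(t) = x(t) + n(t) with Gaussian noise
x_hat = y.mean()                               # sample-mean estimator of the constant signal
print(f"true x = {x_true}, estimate = {x_hat:.3f}")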

Parameter Estimation vs Signal Detection

• Parameter Estimation: It involves estimating unknown parameters of a signal or system. For example, in the model y = ax + n, we may estimate the parameter a based on observations of y and x.

• Signal Detection: This is a decision-making process where the objective is to determine whether a particular signal is present or not. It typically involves hypothesis testing between two (or more) alternatives:

H0: y(t) = n(t) (noise only)
H1: y(t) = x(t) + n(t) (signal plus noise)

Key Difference:

• Parameter estimation deals with continuous-valued outputs (e.g., estimating mean, amplitude, frequency).

• Signal detection deals with discrete decisions (e.g., signal present or not).

Role of Estimation in Signal Recovery

Signal estimation plays a crucial role in the recovery of signals from noisy environments
in various applications such as:

• Communication Systems: Recovering transmitted signals distorted by the channel.

• Medical Imaging: Estimating physiological signals like ECG or MRI from noisy measurements.

• Radar and Sonar: Estimating range and velocity of targets using reflected signals.

Signal estimation methods such as the Maximum Likelihood Estimator (MLE), Minimum
Mean Square Error (MMSE), and Bayesian Estimation are widely used for accurate signal
recovery.


Question 2: What is Maximum Likelihood Estimation (MLE)? Write down the ML estimator formula. Discuss the advantages and limitations of ML.

Answer

Maximum Likelihood Estimation (MLE) is a method of estimating the parameters of a statistical model by maximizing the likelihood function, which represents the probability of observing the given data as a function of the model parameters.

General ML Estimator Formula

Let X = {x_1, x_2, . . . , x_n} be a set of independent observations from a probability distribution with parameter θ. The likelihood function is:

\[ L(\theta; x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} f(x_i; \theta) \]

The Maximum Likelihood Estimator θ̂_ML is defined as:

\[ \hat{\theta}_{ML} = \arg\max_{\theta} L(\theta; x_1, \dots, x_n) \]

Equivalently, using the log-likelihood:

\[ \hat{\theta}_{ML} = \arg\max_{\theta} \log L(\theta; x_1, \dots, x_n) \]
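As a small numerical sketch (assuming Gaussian data purely for illustration, with an arbitrary true mean), the following Python code evaluates the log-likelihood on a grid of candidate θ values and confirms that the maximizer coincides with the sample mean:

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=500)     # i.i.d. samples, true mean 2.0, variance 1

def log_likelihood(theta, data, sigma2=1.0):
    # Gaussian log-likelihood of the mean parameter theta (variance treated as known)
    n = len(data)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * np.sum((data - theta) ** 2) / sigma2

grid = np.linspace(0.0, 4.0, 4001)
theta_ml = grid[np.argmax([log_likelihood(t, x) for t in grid])]
print(theta_ml, x.mean())                        # the grid maximizer agrees with the sample mean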

Advantages of ML Estimation

• Provides asymptotically unbiased and asymptotically efficient estimators.

• Based on solid theoretical foundations in probability and statistics.

• Works well with large sample sizes (consistency and efficiency).

• Easily generalizable to multivariate and nonlinear models.


Limitations of ML Estimation

• May yield biased estimates for small samples.

• Can be computationally intensive for complex models.

• Requires the correct specification of the underlying probability distribution.

• May involve solving non-linear equations without a closed-form solution.

Question 3: Let x_1, . . . , x_n be i.i.d. samples from x_i ∼ N(µ, σ²), where both µ and σ² are unknown. Derive the maximum-likelihood estimators µ̂_ML and σ̂²_ML.

Answer:

Let x_1, x_2, . . . , x_n be independent and identically distributed (i.i.d.) samples from a Gaussian distribution:

x_i ∼ N(µ, σ²)

Both µ and σ² are unknown; we first estimate the mean µ and then the variance σ².

Step 1: Write the likelihood function

\[ L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right) \]

Step 2: Take the natural logarithm (log-likelihood)

Using \( \log \prod = \sum \log \),

\[ \ell(\mu, \sigma^2) := \log L(\mu, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 . \]


Step 3: Differentiate w.r.t. µ

First expand the quadratic:

\[ \sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} (x_i^2 - 2\mu x_i + \mu^2) = \sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2 . \]

Hence

\[ \ell(\mu, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \left( \sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2 \right) . \]

Differentiate:

\[ \frac{d\ell}{d\mu} = -\frac{1}{2\sigma^2} \left( -2 \sum_{i=1}^{n} x_i + 2n\mu \right) = \frac{1}{\sigma^2} \left( \sum_{i=1}^{n} x_i - n\mu \right) . \]

Set to zero (score equation):

\[ \frac{d\ell}{d\mu} = 0 \iff \sum_{i=1}^{n} x_i - n\mu = 0 \iff \hat{\mu}_{ML} = \frac{1}{n} \sum_{i=1}^{n} x_i =: \bar{x} . \]

Step 4: Verify maximum (second derivative)

\[ \frac{d^2 \ell}{d\mu^2} = \frac{1}{\sigma^2} (0 - n) = -\frac{n}{\sigma^2} < 0 , \]

so ℓ is strictly concave in µ and the stationary point is the (unique) maximizer.

Step 5: Estimate the variance

Differentiate w.r.t. σ²:

\[ \frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2} \sum_{i=1}^{n} (x_i - \mu)^2 . \]

Set to zero and plug in µ = µ̂_ML:

\[ -\frac{n}{2\hat{\sigma}^2} + \frac{1}{2(\hat{\sigma}^2)^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 = 0 , \]

so

\[ \hat{\sigma}^2_{ML} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 . \]


Remarks: µ̂_ML is unbiased; σ̂²_ML is biased (the unbiased estimator divides by n − 1 instead of n). Both are standard textbook results.
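A quick numerical check of these closed-form results (a sketch using simulated data, with arbitrarily chosen true parameters and a deliberately small sample size):

import numpy as np

rng = np.random.default_rng(2)
mu_true, sigma2_true, n = 5.0, 4.0, 10             # arbitrary true parameters, small n
mu_hats, var_hats = [], []
for _ in range(20000):
    x = rng.normal(mu_true, np.sqrt(sigma2_true), size=n)
    mu_hats.append(x.mean())                        # mu_hat_ML = sample mean
    var_hats.append(np.mean((x - x.mean()) ** 2))   # sigma2_hat_ML (divides by n)

print(np.mean(mu_hats))    # close to 5.0: the mean estimator is unbiased
print(np.mean(var_hats))   # close to (n-1)/n * 4.0 = 3.6: the variance estimator is biased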

Question 4: Consider the discrete-time model

y[n] = A s[n] + w[n],  n = 0, 1, . . . , N − 1,

where s[n] is a known deterministic signal (known shape), A is an unknown scalar amplitude, and w[n] ∼ N(0, σ²) i.i.d. (noise variance σ² known). (i) Derive the MLE Â_ML. (ii) Compute the Fisher information and the Cramér–Rao lower bound (CRLB) for unbiased estimators of A.

Solution:

(i) Derivation of the MLE Â_ML

Stack into vectors: y = As + w, where y, s ∈ R^N. The pdf of y is

\[ p(\mathbf{y}; A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left( -\frac{1}{2\sigma^2} \lVert \mathbf{y} - A\mathbf{s} \rVert^2 \right) . \]

The log-likelihood (vector form) is

\[ \ell(A) = -\frac{N}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \lVert \mathbf{y} - A\mathbf{s} \rVert^2 . \]

Up to an additive constant independent of A,

\[ \ell(A) = -\frac{1}{2\sigma^2} \lVert \mathbf{y} - A\mathbf{s} \rVert^2 . \]

Expand the quadratic term:

\[ \lVert \mathbf{y} - A\mathbf{s} \rVert^2 = (\mathbf{y} - A\mathbf{s})^\top (\mathbf{y} - A\mathbf{s}) = \mathbf{y}^\top \mathbf{y} - 2A\, \mathbf{s}^\top \mathbf{y} + A^2\, \mathbf{s}^\top \mathbf{s} . \]

Thus

\[ \ell(A) = -\frac{1}{2\sigma^2} \left( \mathbf{y}^\top \mathbf{y} - 2A\, \mathbf{s}^\top \mathbf{y} + A^2\, \mathbf{s}^\top \mathbf{s} \right) . \]

Differentiate ℓ(A) with respect to A:

\[ \frac{d\ell}{dA} = -\frac{1}{2\sigma^2} \left( -2\mathbf{s}^\top \mathbf{y} + 2A\, \mathbf{s}^\top \mathbf{s} \right) = \frac{1}{\sigma^2} \left( \mathbf{s}^\top \mathbf{y} - A\, \mathbf{s}^\top \mathbf{s} \right) . \]

Set the derivative to zero to obtain the stationary point:

\[ \mathbf{s}^\top \mathbf{y} - A\, \mathbf{s}^\top \mathbf{s} = 0 \implies \hat{A}_{ML} = \frac{\mathbf{s}^\top \mathbf{y}}{\mathbf{s}^\top \mathbf{s}} = \frac{\sum_{n=0}^{N-1} s[n]\, y[n]}{\sum_{n=0}^{N-1} s[n]^2} . \]

Second-derivative check:

\[ \frac{d^2 \ell}{dA^2} = -\frac{1}{\sigma^2} \mathbf{s}^\top \mathbf{s} < 0 , \]

so the stationary point is a maximum.

(ii) Fisher information and Cramér–Rao lower bound (CRLB)

The log-likelihood (vector form) is

\[ \ell(A) = -\frac{N}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \lVert \mathbf{y} - A\mathbf{s} \rVert^2 . \]

Score:

\[ \frac{d\ell}{dA} = \frac{1}{\sigma^2} \mathbf{s}^\top (\mathbf{y} - A\mathbf{s}) . \]

Fisher information:

\[ I(A) = E\left[ \left( \frac{d\ell}{dA} \right)^2 \right] = \frac{1}{\sigma^4} E\left[ \left( \mathbf{s}^\top (\mathbf{y} - A\mathbf{s}) \right)^2 \right] . \]

Let w = y − As be the noise vector; then w ∼ N(0, σ²I), so we can write

\[ I(A) = \frac{1}{\sigma^4} E\left[ (\mathbf{s}^\top \mathbf{w})^2 \right] . \]

Notice that

\[ (\mathbf{s}^\top \mathbf{w})^2 = \mathbf{s}^\top \mathbf{w}\mathbf{w}^\top \mathbf{s} = \mathbf{s}^\top (\mathbf{w}\mathbf{w}^\top) \mathbf{s} . \]

Since w ∼ N(0, σ²I) implies E[ww⊤] = Cov(w) = σ²I,

\[ I(A) = \frac{1}{\sigma^4} \mathbf{s}^\top E[\mathbf{w}\mathbf{w}^\top] \mathbf{s} = \frac{1}{\sigma^4} \mathbf{s}^\top (\sigma^2 I)\, \mathbf{s} = \frac{1}{\sigma^2} \mathbf{s}^\top \mathbf{s} . \]

Therefore the CRLB for any unbiased estimator Ã of A is

\[ \operatorname{Var}(\tilde{A}) \ge \frac{1}{I(A)} = \frac{\sigma^2}{\mathbf{s}^\top \mathbf{s}} . \]
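As an illustrative check (a sketch only; the signal shape, amplitude, noise variance, and number of trials below are arbitrary choices), the following Python code computes Â_ML = sᵀy / sᵀs over many Monte Carlo trials and compares the empirical variance with the CRLB σ² / sᵀs:

import numpy as np

rng = np.random.default_rng(3)
N, A_true, sigma2 = 50, 1.7, 0.25                 # arbitrary settings for the sketch
s = np.cos(2 * np.pi * 0.05 * np.arange(N))       # known deterministic signal shape

A_hats = []
for _ in range(20000):
    y = A_true * s + rng.normal(0.0, np.sqrt(sigma2), size=N)
    A_hats.append(s @ y / (s @ s))                # MLE: correlate with s and normalize

print(np.var(A_hats))      # empirical variance of the estimator
print(sigma2 / (s @ s))    # CRLB; the MLE attains it in this linear-Gaussian model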

Question 5: State the EM algorithm: what problem does it solve, and what are the E-step and M-step? What are its motivations?

Answer: EM solves maximum-likelihood estimation problems when data are incomplete or contain latent variables. Let the observed data be Y, the latent (missing) variables Z, and the parameters θ. The complete-data log-likelihood is ℓ_c(θ) = log p(Y, Z | θ). Given the current iterate θ^(t):

• E-step: compute the conditional expectation (the Q-function)

\[ Q(\theta \mid \theta^{(t)}) = E_{Z \mid Y, \theta^{(t)}} \left[ \log p(Y, Z \mid \theta) \right] . \]

• M-step: update the parameters by maximizing Q:

\[ \theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)}) . \]

EM iterates these two steps until convergence; it guarantees a non-decreasing observed-data likelihood p(Y | θ).

Motivation

In many signal processing problems, we observe incomplete or noisy data. For example:

• Mixtures of signals (e.g., in communications or biomedical data).

• Missing data (e.g., some samples lost).

• Hidden labels (e.g., clustering signals into sources).


The EM algorithm provides a systematic way to estimate unknown parameters of the signal model when the data are incomplete or contain latent (hidden) variables.

Question 6: Why is EM useful? Give three concrete signal-processing situations where EM is applicable.

Answer: EM is useful when direct maximization of the marginal likelihood \( p(Y \mid \theta) = \int p(Y, Z \mid \theta)\, dZ \) is difficult because the integral (or sum) over the latent variables is intractable. EM replaces this difficult problem by iteratively maximizing a simpler expected complete-data log-likelihood.

Examples in signal processing:

1. Mixture signals / source separation: signals modeled as mixtures (Gaussian mixture); EM estimates the mixture weights, means, and variances.

2. Missing samples / packet loss: reconstructing signal statistics (mean, variance) when some samples are missing.

3. Hidden Markov Models (HMMs) for speech: Baum–Welch is EM for HMMs, used to estimate transition and emission parameters.

Question 7: Describe how EM would be used to cluster 1D signal samples into K clusters modeled by a Gaussian mixture with unknown means, variances, and weights.

Answer: Model: \( p(y_i) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(y_i \mid \mu_k, \sigma_k^2) \). Latent labels \( Z_i \in \{1, \dots, K\} \) indicate the component. EM steps:

• E-step: compute responsibilities

\[ \gamma_{ik}^{(t)} = \Pr\left( Z_i = k \mid y_i, \theta^{(t)} \right) = \frac{\pi_k^{(t)}\, \mathcal{N}(y_i \mid \mu_k^{(t)}, \sigma_k^{2(t)})}{\sum_{j=1}^{K} \pi_j^{(t)}\, \mathcal{N}(y_i \mid \mu_j^{(t)}, \sigma_j^{2(t)})} . \]


• M-step: update parameters

\[ \pi_k^{(t+1)} = \frac{N_k}{n}, \qquad \mu_k^{(t+1)} = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik}^{(t)} y_i, \qquad \sigma_k^{2(t+1)} = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik}^{(t)} (y_i - \mu_k^{(t+1)})^2 , \]

where \( N_k = \sum_{i=1}^{n} \gamma_{ik}^{(t)} \).

Repeat until convergence.
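A compact Python sketch of these updates for 1D data (illustrative only; the random initialization, fixed iteration count, and simulated test data are simple choices, not prescribed by the lecture):

import numpy as np

def gmm_em_1d(y, K, n_iter=100, seed=0):
    # EM for a 1D Gaussian mixture: returns weights, means, variances
    rng = np.random.default_rng(seed)
    n = len(y)
    pi = np.full(K, 1.0 / K)
    mu = rng.choice(y, size=K, replace=False)          # initialize means at random samples
    var = np.full(K, np.var(y))
    for _ in range(n_iter):
        # E-step: responsibilities gamma[i, k]
        dens = np.exp(-0.5 * (y[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        gamma = pi * dens
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: update weights, means, variances
        Nk = gamma.sum(axis=0)
        pi = Nk / n
        mu = (gamma * y[:, None]).sum(axis=0) / Nk
        var = (gamma * (y[:, None] - mu) ** 2).sum(axis=0) / Nk
    return pi, mu, var

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 0.3, 200), rng.normal(3.0, 0.5, 300)])
print(gmm_em_1d(y, K=2))   # weights near (0.4, 0.6), means near (0.0, 3.0), up to component ordering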

Question 8: For the linear-Gaussian latent model

x_i ∼ N(µ, τ²),  y_i = x_i + ε_i,  ε_i ∼ N(0, σ²),

derive the EM updates for estimating θ = (µ, τ²) when σ² is known. Treat x_i as latent.

Answer:

1. Complete-data log-likelihood

The complete-data density factorizes as

\[ p(\mathbf{y}, \mathbf{x} \mid \mu, \tau^2) = \left[ \prod_{i=1}^{n} \mathcal{N}(y_i \mid x_i, \sigma^2) \right] \left[ \prod_{i=1}^{n} \mathcal{N}(x_i \mid \mu, \tau^2) \right] . \]

Hence, up to additive constants (w.r.t. µ, τ²),

\[ \ell_c(\mu, \tau^2) = \log p(\mathbf{y}, \mathbf{x} \mid \mu, \tau^2) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - x_i)^2 - \frac{n}{2} \log \tau^2 - \frac{1}{2\tau^2} \sum_{i=1}^{n} (x_i - \mu)^2 + \text{const.} \]

2. E-step: compute Q(θ | θ^(t))

Given the current parameters \( \theta^{(t)} = (\mu^{(t)}, \tau^{2(t)}) \), we need

\[ Q(\mu, \tau^2 \mid \theta^{(t)}) = E_{\mathbf{x} \mid \mathbf{y}, \theta^{(t)}} \left[ \ell_c(\mu, \tau^2) \right] . \]


2.1 Posterior \( p(x_i \mid y_i, \theta^{(t)}) \) (by completing the square). For each i,

\[ p(x_i \mid y_i, \theta^{(t)}) \propto \exp\left( -\frac{(y_i - x_i)^2}{2\sigma^2} - \frac{(x_i - \mu^{(t)})^2}{2\tau^{2(t)}} \right) . \]

Collect the quadratic and linear terms in x_i:

\[ -\frac{1}{2} \left( \frac{1}{\sigma^2} + \frac{1}{\tau^{2(t)}} \right) x_i^2 + \left( \frac{y_i}{\sigma^2} + \frac{\mu^{(t)}}{\tau^{2(t)}} \right) x_i + (\text{terms free of } x_i) . \]

Thus the posterior is Gaussian with variance and mean

\[ v^{(t)} = \left( \frac{1}{\sigma^2} + \frac{1}{\tau^{2(t)}} \right)^{-1}, \qquad m_i^{(t)} = v^{(t)} \left( \frac{y_i}{\sigma^2} + \frac{\mu^{(t)}}{\tau^{2(t)}} \right) . \]

Hence the moments needed in Q are

\[ E[x_i \mid y_i, \theta^{(t)}] = m_i^{(t)}, \qquad E[x_i^2 \mid y_i, \theta^{(t)}] = (m_i^{(t)})^2 + v^{(t)} . \]

2.2 Assemble Q. Only the prior part depends on (µ, τ²); the likelihood part \( -\frac{1}{2\sigma^2} \sum_i (y_i - x_i)^2 \) is constant in (µ, τ²) after taking the expectation, so it can be dropped for the M-step. Therefore,

\[
\begin{aligned}
Q(\mu, \tau^2 \mid \theta^{(t)}) &= -\frac{n}{2} \log \tau^2 - \frac{1}{2\tau^2} \sum_{i=1}^{n} E\left[ (x_i - \mu)^2 \mid y_i, \theta^{(t)} \right] + \text{const} \\
&= -\frac{n}{2} \log \tau^2 - \frac{1}{2\tau^2} \sum_{i=1}^{n} \left( E[x_i^2] - 2\mu\, E[x_i] + \mu^2 \right) + \text{const} \\
&= -\frac{n}{2} \log \tau^2 - \frac{1}{2\tau^2} \sum_{i=1}^{n} \left( (m_i^{(t)})^2 + v^{(t)} - 2\mu\, m_i^{(t)} + \mu^2 \right) + \text{const.}
\end{aligned}
\]

3. M-step: maximize Q w.r.t. (µ, τ²)

3.1 Update for µ. Differentiate Q w.r.t. µ:

\[ \frac{\partial Q}{\partial \mu} = -\frac{1}{2\tau^2} \sum_{i=1}^{n} \left( -2 m_i^{(t)} + 2\mu \right) = -\frac{1}{\tau^2} \left( -\sum_{i=1}^{n} m_i^{(t)} + n\mu \right) . \]

Set to zero:

\[ -\sum_{i=1}^{n} m_i^{(t)} + n\mu = 0 \implies \mu^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} m_i^{(t)} . \]


3.2 Update for τ². Treat µ as fixed at µ^(t+1). Define

\[ S^{(t)}(\mu) \equiv \sum_{i=1}^{n} \left( (m_i^{(t)})^2 + v^{(t)} - 2\mu\, m_i^{(t)} + \mu^2 \right) = \sum_{i=1}^{n} \left( v^{(t)} + (m_i^{(t)} - \mu)^2 \right) . \]

Then

\[ Q(\mu, \tau^2 \mid \theta^{(t)}) = -\frac{n}{2} \log \tau^2 - \frac{1}{2\tau^2} S^{(t)}(\mu) + \text{const.} \]

Differentiate w.r.t. τ² and set to zero:

\[ \frac{\partial Q}{\partial \tau^2} = -\frac{n}{2}\, \frac{1}{\tau^2} + \frac{1}{2}\, \frac{S^{(t)}(\mu)}{(\tau^2)^2} = 0 \implies \tau^2 = \frac{S^{(t)}(\mu)}{n} . \]

With µ = µ^(t+1),

\[ \tau^{2(t+1)} = \frac{1}{n} \sum_{i=1}^{n} \left( v^{(t)} + (m_i^{(t)} - \mu^{(t+1)})^2 \right) . \]

4. Final EM updates (collected)

With

\[ v^{(t)} = \left( \frac{1}{\sigma^2} + \frac{1}{\tau^{2(t)}} \right)^{-1}, \qquad m_i^{(t)} = v^{(t)} \left( \frac{y_i}{\sigma^2} + \frac{\mu^{(t)}}{\tau^{2(t)}} \right) , \]

the M-step gives

\[ \mu^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} m_i^{(t)}, \qquad \tau^{2(t+1)} = \frac{1}{n} \sum_{i=1}^{n} \left( v^{(t)} + (m_i^{(t)} - \mu^{(t+1)})^2 \right) . \]

Problem 9: Given samples y = {0.2, 0.4, 2.2, 2.4}, modeled by a two-Gaussian mixture with known common variance σ² = 0.1² = 0.01. Start from

\[ \pi^{(0)} = 0.5, \qquad \mu_1^{(0)} = 0.3, \qquad \mu_2^{(0)} = 2.3 . \]

Perform one EM iteration and report \( \pi^{(1)}, \mu_1^{(1)}, \mu_2^{(1)} \).

Solution:

E-step. For each y_i,

\[ \gamma_{i1}^{(0)} = \frac{\pi^{(0)} \exp\left( -\frac{(y_i - \mu_1^{(0)})^2}{2\sigma^2} \right)}{\pi^{(0)} \exp\left( -\frac{(y_i - \mu_1^{(0)})^2}{2\sigma^2} \right) + (1 - \pi^{(0)}) \exp\left( -\frac{(y_i - \mu_2^{(0)})^2}{2\sigma^2} \right)}, \qquad \gamma_{i2}^{(0)} = 1 - \gamma_{i1}^{(0)} . \]

Compute the exponents with σ² = 0.01 (so 2σ² = 0.02):

i = 1, y_1 = 0.2:

\[ \frac{(y_1 - \mu_1^{(0)})^2}{2\sigma^2} = \frac{(0.2 - 0.3)^2}{0.02} = 0.5, \qquad \frac{(y_1 - \mu_2^{(0)})^2}{2\sigma^2} = \frac{(0.2 - 2.3)^2}{0.02} = 220.5 . \]

Thus \( \gamma_{11}^{(0)} \approx \frac{0.5\, e^{-0.5}}{0.5\, e^{-0.5} + 0.5\, e^{-220.5}} \approx 1.0000 \).

i = 2, y_2 = 0.4:

\[ \frac{(0.4 - 0.3)^2}{0.02} = 0.5, \qquad \frac{(0.4 - 2.3)^2}{0.02} = 180.5 \;\Rightarrow\; \gamma_{21}^{(0)} \approx 1.0000 . \]

i = 3, y_3 = 2.2:

\[ \frac{(2.2 - 0.3)^2}{0.02} = 180.5, \qquad \frac{(2.2 - 2.3)^2}{0.02} = 0.5 \;\Rightarrow\; \gamma_{31}^{(0)} \approx 0.0, \; \gamma_{32}^{(0)} \approx 1.0000 . \]

i = 4, y_4 = 2.4:

\[ \frac{(2.4 - 0.3)^2}{0.02} = 220.5, \qquad \frac{(2.4 - 2.3)^2}{0.02} = 0.5 \;\Rightarrow\; \gamma_{41}^{(0)} \approx 0.0, \; \gamma_{42}^{(0)} \approx 1.0000 . \]

Hence (within machine precision):

\[ \gamma_{11} = \gamma_{21} = 1, \qquad \gamma_{31} = \gamma_{41} = 0 . \]

M-step. Effective counts:

\[ N_1 = \sum_i \gamma_{i1} = 2, \qquad N_2 = \sum_i \gamma_{i2} = 2 . \]

Updated mixture weight:

\[ \pi^{(1)} = \frac{N_1}{n} = \frac{2}{4} = 0.5 . \]

Updated means:

\[ \mu_1^{(1)} = \frac{\sum_i \gamma_{i1} y_i}{N_1} = \frac{0.2 + 0.4}{2} = 0.3, \qquad \mu_2^{(1)} = \frac{\sum_i \gamma_{i2} y_i}{N_2} = \frac{2.2 + 2.4}{2} = 2.3 . \]

Thus after one iteration the parameters remain (0.5, 0.3, 2.3) because the initialization already matches the two tight clusters perfectly.
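The arithmetic above can be reproduced with a few lines of Python (a sketch of this single iteration only):

import numpy as np

y = np.array([0.2, 0.4, 2.2, 2.4])
mu = np.array([0.3, 2.3])
pi0, sigma2 = 0.5, 0.01

# E-step: responsibilities (equal weights and equal variances for both components)
expo = -(y[:, None] - mu) ** 2 / (2 * sigma2)       # exponents for each component
w = np.array([pi0, 1 - pi0]) * np.exp(expo)
gamma = w / w.sum(axis=1, keepdims=True)            # rows are approximately [1, 0] or [0, 1]

# M-step
N = gamma.sum(axis=0)                               # effective counts [2, 2]
pi1 = N[0] / len(y)                                 # 0.5
mu1 = (gamma * y[:, None]).sum(axis=0) / N          # [0.3, 2.3]
print(pi1, mu1)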

Question 10: Given observations y = {0.0, 0.4, 1.9, 2.1}, model them as a two-component Gaussian mixture with equal initial weights and known equal variances σ² = 0.04 (i.e., σ = 0.2). Initialize

\[ \pi^{(0)} = 0.5, \qquad \mu_1^{(0)} = 0.2, \qquad \mu_2^{(0)} = 2.0 . \]

Perform one EM iteration (compute the E-step responsibilities and the M-step updates). Show numeric values (rounded reasonably).

Answer: Given σ² = 0.04, 1/(2σ²) = 1/0.08 = 12.5.

E-step: for each y_i compute

\[ r_{i1} = \frac{\pi^{(0)} \exp\left( -\frac{(y_i - \mu_1^{(0)})^2}{2\sigma^2} \right)}{\sum_{k=1}^{2} \pi_k^{(0)} \exp\left( -\frac{(y_i - \mu_k^{(0)})^2}{2\sigma^2} \right)} . \]


Compute the exponents for (y_1, y_2, y_3, y_4) = (0.0, 0.4, 1.9, 2.1):

\[
\begin{aligned}
(y_1 - \mu_1)^2 &= (0.0 - 0.2)^2 = 0.04 &&\Rightarrow\; 0.04 / 0.08 = 0.5 \\
(y_1 - \mu_2)^2 &= (0.0 - 2.0)^2 = 4.00 &&\Rightarrow\; 4.00 / 0.08 = 50 \\
(y_2 - \mu_1)^2 &= (0.4 - 0.2)^2 = 0.04 &&\Rightarrow\; 0.5 \\
(y_2 - \mu_2)^2 &= (0.4 - 2.0)^2 = 2.56 &&\Rightarrow\; 32 \\
(y_3 - \mu_1)^2 &= (1.9 - 0.2)^2 = 2.89 &&\Rightarrow\; 36.125 \\
(y_3 - \mu_2)^2 &= (1.9 - 2.0)^2 = 0.01 &&\Rightarrow\; 0.125 \\
(y_4 - \mu_1)^2 &= (2.1 - 0.2)^2 = 3.61 &&\Rightarrow\; 45.125 \\
(y_4 - \mu_2)^2 &= (2.1 - 2.0)^2 = 0.01 &&\Rightarrow\; 0.125
\end{aligned}
\]

Now compute the responsibilities (the dominant exponent drives each value to nearly 1 or 0):

• i = 1: \( r_{11} \approx \frac{0.5\, e^{-0.5}}{0.5\, e^{-0.5} + 0.5\, e^{-50}} \approx 1.0 \).

• i = 2: \( r_{21} \approx \frac{0.5\, e^{-0.5}}{0.5\, e^{-0.5} + 0.5\, e^{-32}} \approx 1.0 \).

• i = 3: \( r_{31} \approx \frac{0.5\, e^{-36.125}}{0.5\, e^{-36.125} + 0.5\, e^{-0.125}} \approx 0.0 \).

• i = 4: \( r_{41} \approx \frac{0.5\, e^{-45.125}}{0.5\, e^{-45.125} + 0.5\, e^{-0.125}} \approx 0.0 \).

So (within numerical precision) the first two points are assigned to component 1 and the last two to component 2.
M-step:

\[ N_1 = \sum_i r_{i1} \approx 2, \qquad N_2 \approx 2 . \]

\[ \pi^{(1)} = N_1 / n = 2/4 = 0.5 . \]

\[ \mu_1^{(1)} = \frac{0.0 + 0.4}{2} = 0.2, \qquad \mu_2^{(1)} = \frac{1.9 + 2.1}{2} = 2.0 . \]

Hence the parameters remain essentially the same after one iteration because the initialization already matched the cluster centers.
