Nonlinear regression

S. Benzekry
1. Fitting a model

[Diagram: from data and theory to a mathematical model]

Linear regression

$$y = \theta_1 + \theta_2 t + \varepsilon$$

Question: what is the "best" linear approximation of $y$?
𝑦" 1 𝑡"
⋮ = ⋮ ⋮ ⋅ 𝜃$ 𝑀 rectangular
𝑛

𝑦# 𝜃"
1 𝑡#
no solution

⇔𝑦 =𝑀⋅𝜃
!
× 𝑀 (∈ 𝑀%,# ) one unique solution
⇒ 𝑀! 𝑦 = 𝑀! 𝑀 ⋅ 𝜃
(if the square matrix 𝑀! 𝑀 is invertible)
𝑀%,# ⋅ 𝑀#," 𝑀%,# ⋅ 𝑀#,% ⋅ 𝑀%,"

𝑀%," 𝑀%,% ⋅ 𝑀%,"


^ 1𝟏 𝑻
𝜽= 𝑻
𝑴 𝑴 𝑴 𝒚
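As a minimal sketch, the normal equations can be solved directly in Python (the data values below are made up for illustration; in practice `numpy.linalg.lstsq` is preferred for numerical stability):

```python
import numpy as np

# Hypothetical data for illustration (not the slide's data)
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([10.1, 10.9, 12.2, 12.8, 13.9])

# Design matrix M: a column of ones (intercept) and the times
M = np.column_stack([np.ones_like(t), t])

# Normal equations: theta_hat = (M^T M)^{-1} M^T y
theta_hat = np.linalg.solve(M.T @ M, M.T @ y)
print(theta_hat)  # [theta_1 (intercept), theta_2 (slope)]
```
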
Formalism

• Observations: $n$ couples of points $(t_j, y_j)$, with $y_j \in \mathbb{R}$ (or $\mathbb{R}^d$ for vector-valued observations). We will denote $y = (y_1, \cdots, y_n) \in \mathbb{R}^n$ and $t = (t_1, \cdots, t_n)$.

• Structural model: a function $(t, \theta) \mapsto M(t; \theta)$.

• The (unknown) vector of parameters $\theta^* \in \mathbb{R}^p$.

Goal = find $\theta^*$
Statistical model

$$y_j = M(t_j; \theta^*) + e_j$$

• "True" parameter $\theta^*$

• $e_j$ = error = measurement error + structural error

• $(y_1, \cdots, y_n)$ are realizations of random variables:

$$Y_j = M(t_j; \theta^*) + \varepsilon_j$$

where $Y_j, \varepsilon_j$ are random variables and $y_j, e_j$ their realizations.

• $(y_1, \cdots, y_n)$ = sample with probability density function $p(y|\theta^*)$

An estimator of $\theta$ is a random variable function of $Y$, denoted $\hat{\theta}$:

$$\hat{\theta} = h(Y_1, \cdots, Y_n)$$
Linear least-squares: statistical properties

$$Y = M\theta^* + \varepsilon$$

$$\hat{\theta}_{LS} = \left(M^T M\right)^{-1} M^T Y \quad\Leftrightarrow\quad \hat{\theta}_{LS} = \operatorname*{argmin}_{\theta \in \mathbb{R}^p} \|Y - M\theta\|^2$$

Proposition:
Assume that $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$. Then

$$\hat{\theta}_{LS} \sim \mathcal{N}\left(\theta^*, \sigma^2 \left(M^T M\right)^{-1}\right)$$

From this, standard errors and confidence intervals can be computed on the parameter estimates:

$$s^2 = \frac{1}{n-p}\|y - M\hat{\theta}_{LS}\|^2, \quad se\left(\hat{\theta}_{LS,k}\right) = s\sqrt{\left[\left(M^T M\right)^{-1}\right]_{k,k}}, \quad IC_\alpha\left(\theta_k^*\right) = \hat{\theta}_{LS,k} \pm t_{n-p}^{1-\alpha/2}\, se\left(\hat{\theta}_{LS,k}\right)$$
Example: tumor growth

Exponential growth of the number of cells:

$$nb_t \simeq N_0 e^{\lambda t} = N_0 e^{\theta_2 t} \quad\Rightarrow\quad \ln(nb_t) \simeq \ln N_0 + \lambda t$$

i.e. the linear model $y = \theta_1 + \theta_2 t + \varepsilon$ on the log-transformed data.

[Figure: log(number of cells) versus time (d), with the fitted line]

$$\hat{\lambda} = \hat{\theta}_2 = 0.865 \;\Rightarrow\; DT = \frac{\log 2}{\hat{\lambda}} = 19.2 \text{ hours}$$

$$se\left(\hat{\theta}_2\right) = 0.004, \quad rse\left(\hat{\theta}_2\right) = 0.005, \quad IC(DT) = (18.8, 19.7) \text{ hours}$$
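A sketch of this log-linear fit and the doubling-time computation (the log cell counts below are invented for illustration, not the slide's data):

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])            # time (days)
log_nb = np.array([10.2, 11.0, 11.9, 12.7, 13.6])  # hypothetical log cell counts

M = np.column_stack([np.ones_like(t), t])
theta = np.linalg.solve(M.T @ M, M.T @ log_nb)     # log-linear least squares
lam = theta[1]                                     # growth rate lambda (per day)
DT_hours = np.log(2) / lam * 24                    # doubling time, in hours
print(f"lambda = {lam:.3f}/day, DT = {DT_hours:.1f} h")
```
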
Statistical test for the model parameters

$$\hat{\theta} \sim \mathcal{N}\left(\theta^*, \sigma^2 \left(M^T M\right)^{-1}\right)$$

For $k = 1, 2, \ldots, p$:

$$t_k = \frac{\hat{\theta}_k - \theta_k^*}{se_k} \;\sim\; \text{t-distribution with } n - p \text{ degrees of freedom}$$

⟹ t-test (Wald test):

$H_0$: "$\theta_k = 0$" versus $H_1$: "$\theta_k \neq 0$"

Under the null hypothesis, $t_{stat} = \hat{\theta}_k / se_k$ follows a t-distribution with $n - p$ degrees of freedom.

p-value:

$$\mathbb{P}\left(|t_{n-p}| \geq |t_{stat}|\right) = 2\left(1 - \mathbb{P}\left(t_{n-p} \leq |t_{stat}|\right)\right)$$
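A sketch of the two-sided p-value computation with `scipy.stats` (the helper name is illustrative):

```python
from scipy import stats

def wald_p_value(theta_hat_k, se_k, df):
    """Two-sided p-value for H0: theta_k = 0 (Wald t-test, df = n - p)."""
    t_stat = abs(theta_hat_k) / se_k
    return 2 * stats.t.sf(t_stat, df=df)   # sf = 1 - cdf

# Example: wald_p_value(0.865, 0.004, df=3)
```
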
Nonlinear regression: least-squares

$$Y = M(t; \theta^*) + \varepsilon$$

$$\hat{\theta}_{LS} = \operatorname*{argmin}_{\theta \in \mathbb{R}^p} \|Y - M(t; \theta)\|^2$$

Linearization: $M(t, \theta) = M(t, \theta^*) + J \cdot (\theta - \theta^*) + o\left(\|\theta - \theta^*\|\right)$, with $J = D_\theta M(t, \theta^*)$ the Jacobian.

Proposition:
Assume $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$. Then, for large $n$, approximately

$$\hat{\theta}_{LS} \sim \mathcal{N}\left(\theta^*, \sigma^2 \left(J^T J\right)^{-1}\right)$$

⇒ standard errors, confidence intervals
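One practical way to compute this is `scipy.optimize.curve_fit`, which returns the estimate together with the covariance approximation $s^2 (J^T J)^{-1}$ (the exponential model and data here are illustrative assumptions, not the slide's example):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(t, N0, lam):
    """Structural model M(t; theta): exponential growth (illustrative choice)."""
    return N0 * np.exp(lam * t)

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.1, 4.3, 8.6, 17.5])   # hypothetical measurements

theta_hat, cov = curve_fit(model, t, y, p0=[1.0, 0.5])
se = np.sqrt(np.diag(cov))                 # from the s^2 (J^T J)^{-1} approximation
print(theta_hat, se)
```
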


Confidence interval and prediction interval

$$Y = M(t; \theta) + \varepsilon$$

• Prediction at a new time $t_{new}$: $\hat{M}_{new} = M(t_{new}, \hat{\theta})$

• Uncertainty on the parameter estimate $\hat{\theta}$ ⟹ confidence interval on $\hat{M}_{new}$:

$$\hat{M}_{new} \sim \mathcal{N}\left(M_{new}, Var\left(\hat{M}_{new}\right)\right)$$

• Uncertainty on the parameter estimate $\hat{\theta}$ + uncertainty on the observation $\varepsilon$ (e.g. measurement error) ⟹ prediction interval on $y_{new} = M_{new} + \varepsilon$:

$$y_{new} \sim \mathcal{N}\left(\hat{M}_{new}, Var\left(\hat{M}_{new}\right) + \sigma^2\right)$$
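The variance $Var(\hat{M}_{new})$ can be approximated by the delta method, $\nabla_\theta M^T\, Cov(\hat{\theta})\, \nabla_\theta M$; a sketch with a finite-difference gradient (the `model(t, theta)` convention and all names are assumptions):

```python
import numpy as np

def intervals(model, theta_hat, cov, sigma2, t_new, z=1.96, h=1e-6):
    """Approximate 95% confidence and prediction intervals at a new time t_new."""
    p = len(theta_hat)
    grad = np.zeros(p)
    for k in range(p):                        # finite-difference gradient of M wrt theta
        d = np.zeros(p)
        d[k] = h
        grad[k] = (model(t_new, theta_hat + d) - model(t_new, theta_hat - d)) / (2 * h)
    m = model(t_new, theta_hat)
    var_m = grad @ cov @ grad                 # delta-method variance of M_new_hat
    ci = (m - z * np.sqrt(var_m), m + z * np.sqrt(var_m))
    pi = (m - z * np.sqrt(var_m + sigma2), m + z * np.sqrt(var_m + sigma2))
    return ci, pi
```

The prediction interval is always wider than the confidence interval, since it adds the observation noise $\sigma^2$ to the parameter uncertainty.
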
Confidence interval vs prediction interval
Error models for tumor volume

$\varepsilon_j$ i.i.d. $\mathcal{N}(0, \sigma_j^2)$

• Constant: $\sigma_j = \sigma, \;\forall j$ ($p = 0.004$)
• Proportional: $\sigma_j = \sigma\, M(t_j, \hat{\theta})$ ($p = 0.083$)
• Specific ($p = 0.2$)

Note: combined error model = $\sigma_j = a + b\, M(t_j, \hat{\theta})$
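Under a non-constant error model, the natural fitting criterion is weighted least squares, with residuals scaled by $1/\sigma_j$; a sketch for the proportional model (the structural model and data are illustrative, not from the slides):

```python
import numpy as np
from scipy.optimize import least_squares

def model(t, theta):
    """Illustrative structural model: exponential growth."""
    return theta[0] * np.exp(theta[1] * t)

def weighted_residuals(theta, t, y):
    """Residuals scaled for the proportional error model sigma_j = sigma * M(t_j, theta)."""
    m = model(t, theta)
    return (y - m) / m          # dividing by M downweights large values

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.0, 4.5, 8.1, 18.0])     # hypothetical tumor volumes
fit = least_squares(weighted_residuals, x0=[1.0, 0.5], args=(t, y))
print(fit.x)
```
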
Nonlinear regression: Likelihood maximization

$$Y = M(t; \theta^*) + \varepsilon$$

The likelihood is defined (assuming independent observations) by

$$L(\theta) = p(y_1, \cdots, y_n | \theta) = \prod_{j=1}^{n} p(y_j | \theta)$$

It is the probability (density) of observing $y$ when the parameter is $\theta$.

The maximum likelihood estimator (MLE) is the value of $\theta$ that maximizes the likelihood:

$$\hat{\theta}_{ML} = \operatorname*{argmax}_{\theta} L(\theta)$$
Asymptotic properties of the MLE

Proposition:
Under regularity assumptions on $L$, when $n \to +\infty$:

1. $\hat{\theta}_{ML} \to \theta^*$ (consistency)
2. $\hat{\theta}_{ML}$ is asymptotically of minimal variance (it reaches the Cramér-Rao bound):

$$\sqrt{n}\left(\hat{\theta}_{ML} - \theta^*\right) \rightharpoonup \mathcal{N}\left(0, I_{\theta^*}^{-1}\right)$$

where $I_{\theta^*}$ is the Fisher information matrix:

$$\left[I_{\theta^*}\right]_{l,k} = \mathbb{E}\left[\frac{\partial \log p(Y|\theta^*)}{\partial \theta_l} \frac{\partial \log p(Y|\theta^*)}{\partial \theta_k}\right] = \mathbb{E}\left[-\frac{\partial^2 \log p(Y|\theta^*)}{\partial \theta_l \partial \theta_k}\right]$$
Precision of the estimates

[Figure: fits with 95% C.I. for two cases, rse = 10% and rse = 50%]

[Figure: $-2LL$ contours in the (alpha, beta) plane, showing correlation between estimates: small r.s.e. on alpha and beta, but large correlation]

MLE: normal errors

$$Y_j = M(t_j; \theta^*) + \varepsilon_j, \quad \varepsilon_j \sim \mathcal{N}(0, \sigma^2)$$

$$p(y_j|\theta, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{\left(y_j - M(t_j, \theta)\right)^2}{2\sigma^2}}, \qquad L(\theta, \sigma) = \frac{1}{\left(\sigma\sqrt{2\pi}\right)^n} e^{-\frac{\|y - M(t, \theta)\|^2}{2\sigma^2}}$$

Maximize $L(\theta, \sigma)$ ⇔ minimize $F(\theta, \sigma) = -\log L(\theta, \sigma)$:

$$F(\theta, \sigma) = n \log\left(\sigma\sqrt{2\pi}\right) + \frac{\|y - M(t, \theta)\|^2}{2\sigma^2}$$

$$\frac{\partial F}{\partial \sigma}\left(\hat{\theta}, \hat{\sigma}\right) = 0 \;\Rightarrow\; \hat{\sigma}^2 = \frac{1}{n}\|y - M(t, \hat{\theta})\|^2$$

$$\Rightarrow \hat{\theta} = \operatorname*{argmin}_{\theta} \|y - M(t, \theta)\|^2$$

Maximum likelihood ⇔ Least-squares
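This equivalence can be checked numerically by minimizing $F(\theta, \sigma)$ directly; a sketch (the exponential model and data are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def model(t, theta):
    return theta[0] * np.exp(theta[1] * t)   # illustrative structural model

def F(params, t, y):
    """F(theta, sigma) = n log(sigma sqrt(2 pi)) + ||y - M(t, theta)||^2 / (2 sigma^2)."""
    theta, sigma = params[:-1], params[-1]
    r = y - model(t, theta)
    n = len(y)
    return n * np.log(sigma * np.sqrt(2 * np.pi)) + (r @ r) / (2 * sigma**2)

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.1, 4.3, 8.6, 17.5])     # hypothetical data
fit = minimize(F, x0=[1.0, 0.5, 0.5], args=(t, y),
               method="L-BFGS-B",
               bounds=[(None, None), (None, None), (1e-6, None)])  # keep sigma > 0
theta_hat, sigma_hat = fit.x[:-1], fit.x[-1]
# Check: sigma_hat^2 should equal ||y - M(t, theta_hat)||^2 / n
```
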


Application: tumor growth
What are the minimal biological processes able to recover the kinetics of (experimental) tumor growth?

[Figure: fits of four growth models to experimental tumor growth data]

• Exponential
• Gompertz: fits very well, but lacks physiological interpretation
• Logistic: competition
• Power law: fits very well, and has physiological interpretation

Benzekry et al., PLoS Comp Biol, 2014


Goodness of fit metrics

• Sum of Squared Errors (SSE)
• Root Mean Squared Error (RMSE)
• $R^2$
• Akaike Information Criterion (AIC), which penalizes the number of parameters

Parameter values and identifiability:

• NSE = Normalized Standard Error ⟹ practical identifiability


Information criteria

$$AIC = -2LL\left(\hat{\theta}\right) + 2p \qquad BIC = -2LL\left(\hat{\theta}\right) + \log(n)\, p$$

Best model = smallest AIC or BIC.
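Under the normal error model, $-2LL(\hat{\theta}) = n\left(\log(2\pi\hat{\sigma}^2) + 1\right)$ with $\hat{\sigma}^2 = SSE/n$, so both criteria follow directly from the residuals. A sketch (note that conventions differ on whether $\sigma$ is counted among the $p$ parameters):

```python
import numpy as np

def aic_bic(y, y_pred, p):
    """AIC and BIC under i.i.d. normal errors; p = number of parameters."""
    n = len(y)
    sse = np.sum((y - y_pred) ** 2)
    sigma2_hat = sse / n                                # MLE of the error variance
    m2ll = n * (np.log(2 * np.pi * sigma2_hat) + 1)     # -2 LL(theta_hat)
    return m2ll + 2 * p, m2ll + np.log(n) * p
```
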
Population modeling: the two-steps approach

Population data: $N$ individuals,

$$Y^i = M\left(t; \theta^i\right) + \varepsilon, \quad i = 1, \ldots, N$$

Individual fits of the structural model give estimates $\hat{\theta}^1, \ldots, \hat{\theta}^N$, from which the population model $\mathcal{N}(\hat{\theta}_{pop}, \hat{\Omega})$ is built:

$$\hat{\theta}_{pop} = \frac{1}{N} \sum_{i=1}^{N} \hat{\theta}^i, \qquad \hat{\Omega} = VCov\left(\hat{\theta}^i\right)$$
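A sketch of the two-steps computation, given individual estimates stacked in an $(N, p)$ array (names are illustrative):

```python
import numpy as np

def two_steps(theta_individual):
    """theta_individual: (N, p) array of individual estimates theta_hat^i."""
    theta_pop = theta_individual.mean(axis=0)           # population mean
    omega = np.cov(theta_individual, rowvar=False)      # empirical variance-covariance
    return theta_pop, omega
```
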
Population modeling: mixed-effects approach

The population data are fitted all at once (population fit, by MLE), with the population model

$$\theta^i = \theta_{pop} + \eta^i, \quad \eta^i \sim \mathcal{N}(0, \Omega)$$

where $\theta_{pop}$ are the fixed effects and $\eta^i$ the random effects. Individual fits are then derived from the individual structural model.
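A sketch simulating individual parameters from this population model (the values are illustrative; actually fitting a mixed-effects model requires dedicated algorithms such as SAEM, not shown here):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_pop = np.array([1.0, 0.5])      # fixed effects (illustrative values)
omega = np.diag([0.04, 0.01])         # covariance Omega of the random effects
N = 20                                # number of individuals

eta = rng.multivariate_normal(np.zeros(2), omega, size=N)   # eta^i ~ N(0, Omega)
theta_i = theta_pop + eta             # individual parameters theta^i
```
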


References

• Lavielle, M. Course "Statistics in Action with R". http://sia.webpopix.org/index.html

• Seber, G. A. F., & Wild, C. J. (2003). Nonlinear Regression. Hoboken, NJ: Wiley-Interscience.
