Nonlinear regression
S. Benzekry
1. Fitting a model
[Figure: from data and theory to a mathematical model]
Linear regression
$$y = \theta_1 + \theta_2 t + \varepsilon$$
Question: what is the "best" linear approximation of $y$?
$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_n \end{pmatrix}}_{M \ \text{rectangular}} \cdot \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} \iff y = M \cdot \theta \quad \text{(no exact solution in general)}$$

Multiplying both sides by $M^T \in M_{2,n}$ gives a square system (dimensions: $M_{2,n} \cdot M_{n,1} = M_{2,n} \cdot M_{n,2} \cdot M_{2,1}$):
$$M^T y = M^T M \cdot \theta$$
which has one unique solution (if the square matrix $M^T M$ is invertible):
$$\hat{\theta} = (M^T M)^{-1} M^T y$$
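As a minimal numerical sketch of these normal equations (the synthetic data and all variable names are illustrative, not from the slides):

```python
import numpy as np

# Synthetic data: y = 2 + 0.5 t + noise
rng = np.random.default_rng(0)
t = np.linspace(0, 4, 20)
y = 2 + 0.5 * t + 0.1 * rng.standard_normal(t.size)

# Design matrix M: a column of ones (intercept) and a column of times
M = np.column_stack([np.ones_like(t), t])

# Normal equations: theta_hat = (M^T M)^{-1} M^T y
theta_hat = np.linalg.solve(M.T @ M, M.T @ y)
print(theta_hat)  # approximately [2, 0.5]
```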
Formalism
• Observations: $n$ couples of points $(t_j, y_j)$, with $y_j \in \mathbb{R}$ (or $\mathbb{R}^d$ for vector-valued observations).
We will denote $y = (y_1, \cdots, y_n) \in \mathbb{R}^n$ and $t = (t_1, \cdots, t_n)$.
• Structural model: a function $(t, \theta) \mapsto M(t; \theta)$ giving the model prediction at time $t$ for parameter value $\theta$.
• The (unknown) vector of parameters $\theta^* \in \mathbb{R}^p$.

Goal = find $\theta^*$
Statistical model
$$y_j = M(t_j; \theta^*) + e_j$$

• "True" parameter $\theta^*$
• $e_j$ = error = measurement error + structural error
• $(y_1, \cdots, y_n)$ are realizations of random variables $Y_j$:
$$Y_j = M(t_j; \theta^*) + \varepsilon_j$$
where $Y_j$, $\varepsilon_j$ are random variables and $y_j$, $e_j$ their realizations.
• $(y_1, \cdots, y_n)$ = sample with probability density function $p(y \mid \theta^*)$
• An estimator of $\theta^*$ is a random variable function of $Y$, denoted $\hat{\theta}$:
$$\hat{\theta} = h(Y_1, \cdots, Y_n)$$
Linear least-squares: statistical properties
$$Y = M\theta^* + \varepsilon$$

$$\hat{\theta}_{LS} = (M^T M)^{-1} M^T Y \iff \hat{\theta}_{LS} = \underset{\theta \in \mathbb{R}^p}{\mathrm{argmin}} \, \| Y - M\theta \|^2$$

Proposition:
Assume that $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$. Then
$$\hat{\theta}_{LS} \sim \mathcal{N}\left(\theta^*, \, \sigma^2 (M^T M)^{-1}\right)$$

From this, standard errors and confidence intervals can be computed for the parameter estimates:

$$s^2 = \frac{1}{n - p} \| y - M\hat{\theta}_{LS} \|^2, \qquad se(\hat{\theta}_{LS,k}) = s \sqrt{\left[(M^T M)^{-1}\right]_{k,k}}$$

$$IC_\alpha(\theta_k^*) = \hat{\theta}_{LS,k} \pm t_{1 - \alpha/2}^{n - p} \, s \sqrt{\left[(M^T M)^{-1}\right]_{k,k}}$$
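These formulas translate directly into code; a sketch on the same kind of synthetic linear data as above (a 95% level is assumed):

```python
import numpy as np
from scipy import stats

# Same synthetic linear data as in the previous sketch
rng = np.random.default_rng(0)
t = np.linspace(0, 4, 20)
y = 2 + 0.5 * t + 0.1 * rng.standard_normal(t.size)
M = np.column_stack([np.ones_like(t), t])
theta_hat = np.linalg.solve(M.T @ M, M.T @ y)

n, p = M.shape
s2 = np.sum((y - M @ theta_hat) ** 2) / (n - p)   # s^2 = ||y - M theta_hat||^2 / (n - p)
cov = s2 * np.linalg.inv(M.T @ M)                 # Var(theta_hat) = s^2 (M^T M)^{-1}
se = np.sqrt(np.diag(cov))                        # standard errors
tq = stats.t.ppf(0.975, df=n - p)                 # Student quantile t_{1-alpha/2}^{n-p}
ci = np.column_stack([theta_hat - tq * se, theta_hat + tq * se])
print(se, ci)
```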
Example: tumor growth
[Figure: log(number of cells) versus time (days); experimental data points with the fitted regression line]

$$nb_t \simeq N_0 e^{\lambda t} = N_0 e^{\theta_2 t} \implies \ln(nb_t) \simeq \ln N_0 + \lambda t$$

i.e. the linear model $y = \theta_1 + \theta_2 t + \varepsilon$.

$$\hat{\lambda} = \hat{\theta}_2 = 0.865 \ \text{day}^{-1} \implies DT = \frac{\log 2}{\hat{\lambda}} = 19.2 \ \text{hours}$$

$$se(\hat{\theta}_2) = 0.004, \quad rse(\hat{\theta}_2) = 0.005, \quad IC(DT) = (18.8, 19.7) \ \text{hours}$$
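A sketch reproducing this type of computation on synthetic cell counts (the growth rate `lam_true` and the initial count are hypothetical values chosen to mimic the slide's orders of magnitude):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.array([0., 1., 2., 3., 4.])                 # days
lam_true = 0.865                                   # per day (hypothetical)
y = np.log(1e4 * np.exp(lam_true * t)) + 0.05 * rng.standard_normal(t.size)

# Linear fit of log counts: y = theta1 + theta2 t
M = np.column_stack([np.ones_like(t), t])
theta_hat = np.linalg.solve(M.T @ M, M.T @ y)
lam_hat = theta_hat[1]
doubling_time_hours = 24 * np.log(2) / lam_hat     # DT = log(2)/lambda, in hours
print(lam_hat, doubling_time_hours)                # ~0.865 /day, ~19.2 h
```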
Statistical test for the model parameters
$$\hat{\theta} \sim \mathcal{N}\left(\theta^*, \, \sigma^2 (M^T M)^{-1}\right)$$

For $k = 1, 2, \ldots, p$, the statistic
$$t_k = \frac{\hat{\theta}_k - \theta_k^*}{se_k}$$
follows a t-distribution with $n - p$ degrees of freedom $\implies$ t-test (Wald test).

H0: "$\theta_k = 0$" versus H1: "$\theta_k \neq 0$"

Under the null hypothesis, $t_{stat} = \hat{\theta}_k / se_k$ follows a t-distribution with $n - p$ degrees of freedom.

p-value:
$$\mathbb{P}\left(|t_{n-p}| \geq |t_{stat}|\right) = 2\left(1 - \mathbb{P}\left(t_{n-p} \leq |t_{stat}|\right)\right)$$
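A sketch of the Wald test computation, using hypothetical estimates and standard errors:

```python
import numpy as np
from scipy import stats

# Hypothetical output of a linear fit: estimates, standard errors, n - p
theta_hat = np.array([2.01, 0.49])
se = np.array([0.05, 0.02])
df = 18                                       # n - p degrees of freedom

t_stat = theta_hat / se                       # Wald statistic under H0: theta_k = 0
p_values = 2 * (1 - stats.t.cdf(np.abs(t_stat), df=df))  # 2(1 - P(t_{n-p} <= |t_stat|))
print(p_values)
```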
Nonlinear regression: least-squares
$$Y = M(t; \theta^*) + \varepsilon$$

$$\hat{\theta}_{LS} = \underset{\theta \in \mathbb{R}^p}{\mathrm{argmin}} \, \| Y - M(t; \theta) \|^2$$

Linearization: $M(t, \theta) = M(t, \theta^*) + J \cdot (\theta - \theta^*) + o(\|\theta - \theta^*\|)$, with $J = D_\theta M(t, \theta^*)$.

Proposition:
Assume $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$. Then, for large $n$, approximately
$$\hat{\theta}_{LS} \sim \mathcal{N}\left(\theta^*, \, \sigma^2 (J^T J)^{-1}\right)$$

$\Rightarrow$ standard errors, confidence intervals
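A sketch of this result with `scipy.optimize.least_squares`, fitting a hypothetical exponential model (all names and values are illustrative):

```python
import numpy as np
from scipy.optimize import least_squares

def model(t, theta):
    # Hypothetical structural model: M(t; theta) = theta0 * exp(theta1 * t)
    return theta[0] * np.exp(theta[1] * t)

rng = np.random.default_rng(2)
t = np.linspace(0, 4, 30)
y = model(t, [1.0, 0.8]) + 0.2 * rng.standard_normal(t.size)

fit = least_squares(lambda th: y - model(t, th), x0=[0.5, 0.5])
n, p = t.size, fit.x.size
sigma2 = np.sum(fit.fun ** 2) / (n - p)       # residual variance estimate
J = fit.jac                                   # Jacobian of the residuals at theta_hat
cov = sigma2 * np.linalg.inv(J.T @ J)         # approximate Var(theta_hat) = sigma^2 (J^T J)^{-1}
se = np.sqrt(np.diag(cov))
print(fit.x, se)
```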
Confidence interval and prediction interval
$$Y = M(t; \theta) + \varepsilon$$

• Prediction at a new time $t_{new}$: $\hat{M}_{new} = M(t_{new}, \hat{\theta})$
• Uncertainty on the parameter estimate $\hat{\theta}$ $\implies$ confidence interval on $M_{new}$:
$$\hat{M}_{new} \sim \mathcal{N}\left(M_{new}, \, Var(\hat{M}_{new})\right)$$
• Uncertainty on the parameter estimate $\hat{\theta}$ + uncertainty on the observation $\varepsilon$ (e.g. measurement error) $\implies$ prediction interval on $y_{new} = M_{new} + \varepsilon$:
$$y_{new} \sim \mathcal{N}\left(\hat{M}_{new}, \, Var(\hat{M}_{new}) + \sigma^2\right)$$
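Continuing the exponential sketch above (same `model`, `fit`, `cov`, `sigma2`), the delta method gives approximate intervals at a hypothetical new time `t_new`:

```python
# Delta method: Var(M_new) ~ g^T Var(theta_hat) g, with g = dM/dtheta at t_new
t_new = 5.0
th = fit.x
g = np.array([np.exp(th[1] * t_new),                     # dM/dtheta0 at t_new
              th[0] * t_new * np.exp(th[1] * t_new)])    # dM/dtheta1 at t_new
m_new = model(t_new, th)
var_m = g @ cov @ g
half_ci = 1.96 * np.sqrt(var_m)                          # parameter uncertainty only
half_pi = 1.96 * np.sqrt(var_m + sigma2)                 # + observation noise
print((m_new - half_ci, m_new + half_ci),                # confidence interval
      (m_new - half_pi, m_new + half_pi))                # prediction interval
```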
Confidence interval vs prediction interval

[Figure: fitted curve with a confidence band (parameter uncertainty only) inside a wider prediction band (parameter + observation uncertainty)]
Error models for tumor volume
$$\varepsilon_j \ \text{i.i.d.} \ \mathcal{N}(0, \sigma_j^2)$$

• Constant: $\sigma_j = \sigma, \ \forall j$ ($p = 0.004$)
• Proportional: $\sigma_j = \sigma \, M(t_j, \hat{\theta})$ ($p = 0.083$)
• Specific: $\sigma_j$ specific to each time point ($p = 0.2$)

Note: combined error model: $\sigma_j = a + b \, M(t_j, \hat{\theta})$
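One common way to implement a non-constant error model in least squares is to divide each residual by its $\sigma_j$; a sketch for the proportional model, on a hypothetical exponential model with synthetic data:

```python
import numpy as np
from scipy.optimize import least_squares

def model(t, theta):
    # Hypothetical structural model
    return theta[0] * np.exp(theta[1] * t)

def weighted_residuals(theta, t, y):
    # Proportional error: sigma_j = sigma * M(t_j, theta),
    # so residuals are scaled by the model prediction
    m = model(t, theta)
    return (y - m) / m

rng = np.random.default_rng(3)
t = np.linspace(0.5, 4, 30)
y = model(t, [1.0, 0.8]) * (1 + 0.1 * rng.standard_normal(t.size))  # proportional noise

fit = least_squares(weighted_residuals, x0=[0.5, 0.5], args=(t, y),
                    bounds=(1e-6, np.inf))   # keep parameters positive
print(fit.x)
```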
Nonlinear regression: Likelihood maximization
$$Y = M(t; \theta^*) + \varepsilon$$

The likelihood is defined by
$$L(\theta) = p(y_1, \cdots, y_n \mid \theta) = \prod_{j=1}^{n} p(y_j \mid \theta)$$

It is the probability (density) of observing $y$ when the parameter value is $\theta$.

The maximum likelihood estimator (MLE) is the value of $\theta$ that maximizes the likelihood:
$$\hat{\theta}_{ML} = \underset{\theta}{\mathrm{argmax}} \, L(\theta)$$
Asymptotic properties of the MLE
Proposition:
Under regularity assumptions on $L$, when $n \to +\infty$:

1. $\hat{\theta}_{ML} \longrightarrow \theta^*$ (consistency)
2. $\hat{\theta}_{ML}$ is asymptotically of minimal variance (it reaches the Cramér-Rao bound):
$$\sqrt{n}\left(\hat{\theta}_{ML} - \theta^*\right) \rightharpoonup \mathcal{N}\left(0, \, I_{\theta^*}^{-1}\right)$$
where $I_{\theta^*}$ is the Fisher information matrix:
$$\left(I_{\theta^*}\right)_{k,l} = \mathbb{E}\left[\frac{\partial \log p(Y \mid \theta^*)}{\partial \theta_k} \, \frac{\partial \log p(Y \mid \theta^*)}{\partial \theta_l}\right] = \mathbb{E}\left[-\frac{\partial^2 \log p(Y \mid \theta^*)}{\partial \theta_k \, \partial \theta_l}\right]$$
Precision of the estimates
[Figure: fits and 95% confidence intervals for rse = 10% versus rse = 50%; $-2LL$ contour plot in the (alpha, beta) plane showing correlation between estimates: small r.s.e. on alpha and beta, but large correlation]
MLE: normal errors
$$Y_j = M(t_j; \theta^*) + \varepsilon_j, \quad \varepsilon_j \sim \mathcal{N}(0, \sigma^2)$$

$$p(y_j \mid \theta, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(y_j - M(t_j, \theta))^2}{2\sigma^2}}, \qquad L(\theta, \sigma) = \left(\frac{1}{\sigma \sqrt{2\pi}}\right)^n e^{-\frac{\| y - M(t, \theta) \|^2}{2\sigma^2}}$$

Maximizing $L(\theta, \sigma)$ $\iff$ minimizing $F(\theta, \sigma) = -\log L(\theta, \sigma)$:

$$F(\theta, \sigma) = n \log\left(\sigma \sqrt{2\pi}\right) + \frac{\| y - M(t, \theta) \|^2}{2\sigma^2}$$

$$\frac{\partial F}{\partial \sigma}(\hat{\theta}, \hat{\sigma}) = 0 \implies \hat{\sigma}^2 = \frac{1}{n} \| y - M(t, \hat{\theta}) \|^2$$

$$\implies \hat{\theta} = \underset{\theta}{\mathrm{argmin}} \, \| y - M(t, \theta) \|^2$$

Maximum likelihood $\iff$ Least-squares
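This equivalence can be checked numerically; a sketch comparing direct minimization of $F(\theta, \sigma)$ with a plain least-squares fit, on a hypothetical exponential model with synthetic data:

```python
import numpy as np
from scipy.optimize import minimize, least_squares

def model(t, theta):
    # Hypothetical structural model
    return theta[0] * np.exp(theta[1] * t)

rng = np.random.default_rng(4)
t = np.linspace(0, 4, 30)
y = model(t, [1.0, 0.8]) + 0.2 * rng.standard_normal(t.size)

def neg_log_lik(x):
    # x = (theta0, theta1, log sigma); F = n log(sigma sqrt(2 pi)) + ||y - M||^2 / (2 sigma^2)
    theta, sigma = x[:2], np.exp(x[2])
    r = y - model(t, theta)
    return t.size * np.log(sigma * np.sqrt(2 * np.pi)) + r @ r / (2 * sigma**2)

mle = minimize(neg_log_lik, x0=[0.5, 0.5, 0.0], method="Nelder-Mead")
ls = least_squares(lambda th: y - model(t, th), x0=[0.5, 0.5])
print(mle.x[:2], ls.x)   # the two theta estimates coincide (up to tolerance)
```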
Application: tumor growth
What are the minimal biological processes able to recover the kinetics of (experimental) tumor growth?

[Figure: fits of the exponential, Gompertz, logistic and power law models to experimental tumor growth data]

• Exponential
• Gompertz: fits very well, but lacks physiological interpretation
• Logistic: competition
• Power law: fits very well and has a physiological interpretation
Benzekry et al., PLoS Comp Biol, 2014
Goodness of fit metrics
• Sum of Squared Errors (SSE), Root Mean Squared Error (RMSE), $R^2$
• Akaike Information Criterion (penalizes the number of parameters)
• Parameter values and identifiability: NSE = Normalized Standard Error (practical identifiability)

Information criteria

$$AIC = -2LL(\hat{\theta}) + 2p, \qquad BIC = -2LL(\hat{\theta}) + \log(n) \, p$$

Best model = smallest AIC or BIC: $-2LL(\hat{\theta})$ measures goodness of fit, and the second term penalizes the number of parameters $p$.
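For the Gaussian error model above, $-2LL(\hat{\theta}) = n(\log(2\pi\hat{\sigma}^2) + 1)$ with $\hat{\sigma}^2 = RSS/n$, so AIC and BIC follow directly from the residual sum of squares; a sketch (the `rss`, `n`, `p` values are hypothetical):

```python
import numpy as np

def aic_bic(rss, n, p):
    """AIC/BIC for a Gaussian error model, from the residual sum of squares.
    -2LL(theta_hat) = n * (log(2*pi*rss/n) + 1); p counts the structural
    parameters plus the estimated sigma."""
    m2ll = n * (np.log(2 * np.pi * rss / n) + 1)
    return m2ll + 2 * p, m2ll + np.log(n) * p

print(aic_bic(rss=3.2, n=30, p=3))  # hypothetical values
```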
Population modeling: the two-steps approach
Population data: $N$ individuals, each described by the individual structural model
$$Y^i = M(t; \theta^i) + \varepsilon, \quad i = 1, \ldots, N$$

Individual fits give estimates $\hat{\theta}^1, \ldots, \hat{\theta}^N$, from which the population parameters are computed:

$$\hat{\theta}_{pop} = \frac{1}{N} \sum_{i=1}^{N} \hat{\theta}^i, \qquad \hat{\Omega} = VCov\left(\hat{\theta}^i\right)$$

Population model: $\mathcal{N}\left(\hat{\theta}_{pop}, \hat{\Omega}\right)$
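A sketch of the two-steps approach on synthetic individuals (an assumed exponential structural model; `theta_pop_true` and `omega` only serve to simulate the data):

```python
import numpy as np
from scipy.optimize import least_squares

def model(t, theta):
    # Hypothetical individual structural model
    return theta[0] * np.exp(theta[1] * t)

rng = np.random.default_rng(5)
t = np.linspace(0, 4, 15)
N = 20
theta_pop_true, omega = np.array([1.0, 0.8]), np.array([0.1, 0.1])

estimates = []
for i in range(N):
    theta_i = theta_pop_true + omega * rng.standard_normal(2)  # individual parameters
    y_i = model(t, theta_i) + 0.1 * rng.standard_normal(t.size)
    fit = least_squares(lambda th: y_i - model(t, th), x0=[0.5, 0.5])
    estimates.append(fit.x)                                    # step 1: individual fits

estimates = np.array(estimates)
theta_pop_hat = estimates.mean(axis=0)        # step 2: theta_pop = mean of individual fits
omega_hat = np.cov(estimates, rowvar=False)   #         Omega = empirical covariance
print(theta_pop_hat, omega_hat)
```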
Population modeling: mixed-effects approach
Population data $\to$ population fit (MLE of the population parameters).

Population model:
$$\theta^i = \theta_{pop} + \eta^i, \quad \eta^i \sim \mathcal{N}(0, \Omega)$$
where $\theta_{pop}$ are the fixed effects and $\eta^i$ the random effects.

Individual structural model $\to$ individual fit.
References
• Lavielle, M. Course "Statistics in Action with R". http://sia.webpopix.org/index.html
• Seber, G. A. F., & Wild, C. J. (2003). Nonlinear Regression. Hoboken, NJ: Wiley-Interscience.