---
title: "The QR decomposition"
author: "Rafa"
date: "January 31, 2015"
output: html_document
layout: page
---
```{r options, echo=FALSE}
library(knitr)
opts_chunk$set(fig.path=paste0("figure/", sub("(.*).Rmd","\\1",basename(knitr:::knit_concord$get('infile'))), "-"))
```
## The QR Factorization (Advanced)
We have seen that in order to calculate the LSE, we need to invert a matrix. In previous sections we used the function `solve`, but `solve` is not numerically stable for this purpose. When coding the LSE computation, we use the QR decomposition instead.
#### Inverting $\mathbf{X^\top X}$
Remember that to minimize the RSS:
$$
(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})^\top
(\mathbf{Y}-\mathbf{X}\boldsymbol{\beta})
$$
We need to solve:
$$
\mathbf{X}^\top \mathbf{X} \boldsymbol{\hat{\beta}} = \mathbf{X}^\top \mathbf{Y}
$$
The solution is:
$$
\boldsymbol{\hat{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{Y}
$$
Thus, we need to compute $(\mathbf{X}^\top \mathbf{X})^{-1}$.
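For a well-conditioned design matrix, this computation works fine. As a quick sanity check (this small simulated example is not from the original text), we can solve the normal equations directly:

```r
# Simulate a small, well-conditioned regression problem
set.seed(1)
n <- 100
x <- rnorm(n)
X <- cbind(1, x)                  # design matrix with an intercept column
beta <- c(2, 3)                   # true coefficients
y <- X %*% beta + rnorm(n, sd = 0.5)

# Solve (X'X) betahat = X'y; passing both arguments to solve
# avoids forming the inverse explicitly
betahat <- solve(crossprod(X), crossprod(X, y))
betahat                           # close to the true values 2 and 3
```

Note that even here, `solve(A, b)` is preferable to `solve(A) %*% b`, since it solves the system without explicitly computing an inverse.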
#### `solve` is numerically unstable
To demonstrate what we mean by _numerically unstable_, we construct an extreme case:
```{r}
n <- 50;M <- 500
x <- seq(1,M,len=n)
X <- cbind(1,x,x^2,x^3)
colnames(X) <- c("Intercept","x","x2","x3")
beta <- matrix(c(1,1,1,1),4,1)
set.seed(1)
y <- X%*%beta+rnorm(n,sd=1)
```
The standard R function for computing the inverse, `solve`, gives an error:
```{r,eval=FALSE}
solve(crossprod(X)) %*% crossprod(X,y)
```
To see why this happens, look at the magnitudes of the entries of $\mathbf{X}^\top \mathbf{X}$:
```{r}
options(digits=4)
log10(crossprod(X))
```
Note the differences of several orders of magnitude between the entries. On a digital computer, we can only represent a limited range of numbers, so when very small and very large values must be handled together, the small ones are effectively rounded to 0. This in turn leads to operations that behave like division by 0 and produce errors.
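One way to quantify this instability (not shown in the original text) is the condition number, which base R estimates with `kappa`; very large values mean that inversion will amplify rounding error:

```r
# Rebuild the polynomial design matrix from the example above
n <- 50; M <- 500
x <- seq(1, M, len = n)
X <- cbind(1, x, x^2, x^3)

# kappa estimates the condition number; values near or above
# 1/.Machine$double.eps (about 4.5e15) mean the matrix is
# effectively singular in double precision
kappa(crossprod(X))
```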
#### The factorization
The QR factorization is based on a mathematical result that tells us that we can decompose any full rank $N\times p$ matrix $\mathbf{X}$ as:
$$
\mathbf{X = QR}
$$
with:
* $\mathbf{Q}$ a $N \times p$ matrix with $\mathbf{Q^\top Q=I}$
* $\mathbf{R}$ a $p \times p$ upper triangular matrix.
Upper triangular matrices are very convenient for solving systems of equations.
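These two properties are easy to verify numerically with R's built-in `qr` function, applied here to the polynomial design matrix constructed earlier:

```r
# The same ill-conditioned design matrix as before
n <- 50; M <- 500
x <- seq(1, M, len = n)
X <- cbind(1, x, x^2, x^3)

QR <- qr(X)
Q <- qr.Q(QR)   # N x p matrix with orthonormal columns
R <- qr.R(QR)   # p x p upper triangular factor

max(abs(crossprod(Q) - diag(ncol(X))))  # essentially 0: Q'Q = I
all(R[lower.tri(R)] == 0)               # TRUE: R is upper triangular
max(abs(Q %*% R - X))                   # tiny relative to the entries of X
```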
#### Example of upper triangular matrix
In the example below, the matrix on the left is upper triangular: it only has 0s below the diagonal. This greatly simplifies solving the system of equations:
$$
\,
\begin{pmatrix}
1&2&-1\\
0&1&2\\
0&0&1\\
\end{pmatrix}
\begin{pmatrix}
a\\
b\\
c\\
\end{pmatrix}
=
\begin{pmatrix}
6\\
4\\
1\\
\end{pmatrix}
$$
We immediately know that $c=1$, which implies that $b+2=4$ and thus $b=2$. This in turn implies $a+4-1=6$, so $a = 3$. Writing an algorithm to do this is straightforward for any upper triangular matrix.
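R's `backsolve` function performs exactly this back-substitution. Applied to the system above, it recovers $a=3$, $b=2$, $c=1$:

```r
# The upper triangular system from the example above
R <- matrix(c(1, 2, -1,
              0, 1,  2,
              0, 0,  1), nrow = 3, byrow = TRUE)
b <- c(6, 4, 1)
backsolve(R, b)   # 3 2 1
```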
#### Finding LSE with QR
If we rewrite the equations of the LSE using $\mathbf{QR}$ instead of $\mathbf{X}$ we have:
$$\mathbf{X}^\top \mathbf{X} \boldsymbol{\beta} = \mathbf{X}^\top \mathbf{Y}$$
$$(\mathbf{Q}\mathbf{R})^\top (\mathbf{Q}\mathbf{R}) \boldsymbol{\beta} = (\mathbf{Q}\mathbf{R})^\top \mathbf{Y}$$
$$\mathbf{R}^\top (\mathbf{Q}^\top \mathbf{Q}) \mathbf{R} \boldsymbol{\beta} = \mathbf{R}^\top \mathbf{Q}^\top \mathbf{Y}$$
$$\mathbf{R}^\top \mathbf{R} \boldsymbol{\beta} = \mathbf{R}^\top \mathbf{Q}^\top \mathbf{Y}$$
$$(\mathbf{R}^\top)^{-1} \mathbf{R}^\top \mathbf{R} \boldsymbol{\beta} = (\mathbf{R}^\top)^{-1} \mathbf{R}^\top \mathbf{Q}^\top \mathbf{Y}$$
$$\mathbf{R} \boldsymbol{\beta} = \mathbf{Q}^\top \mathbf{Y}$$
$\mathbf{R}$ being upper triangular makes solving this system more stable. Also, because $\mathbf{Q}^\top\mathbf{Q}=\mathbf{I}$, we know that the columns of $\mathbf{Q}$ are on the same scale, which stabilizes the right-hand side.
Now we are ready to find LSE using the QR decomposition. To solve:
$$\mathbf{R} \boldsymbol{\beta} = \mathbf{Q}^\top \mathbf{Y}$$
We use `backsolve`, which takes advantage of the upper triangular nature of $\mathbf{R}$.
```{r}
QR <- qr(X)
Q <- qr.Q( QR )
R <- qr.R( QR )
(betahat <- backsolve(R, crossprod(Q,y) ) )
```
In practice, we do not need to do any of this due to the built-in `solve.qr` function:
```{r}
QR <- qr(X)
(betahat <- solve.qr(QR, y))
```
#### Fitted values
This factorization also simplifies the calculation for fitted values:
$$\mathbf{X}\boldsymbol{\hat{\beta}} =
(\mathbf{QR})\mathbf{R}^{-1}\mathbf{Q}^\top \mathbf{Y} = \mathbf{Q}\mathbf{Q}^\top\mathbf{Y}$$
In R, we simply do the following:
```{r,fig.width=5,fig.height=5,fig.align="center"}
library(rafalib)
mypar(1,1)
plot(x,y)
fitted <- tcrossprod(Q)%*%y
lines(x,fitted,col=2)
```
#### Standard errors
To obtain the standard errors of the LSE, we note that:
$$(\mathbf{X^\top X})^{-1} = (\mathbf{R^\top Q^\top QR})^{-1} = (\mathbf{R^\top R})^{-1}$$
The function `chol2inv` is specifically designed to find this inverse. So all we do is the following:
```{r}
df <- length(y) - QR$rank
sigma2 <- sum((y-fitted)^2)/df
varbeta <- sigma2*chol2inv(qr.R(QR))
SE <- sqrt(diag(varbeta))
cbind(betahat,SE)
```
This gives us identical results to the `lm` function.
```{r}
summary(lm(y~0+X))$coef
```