
Common statistical tests are linear models

Last updated: 02 April, 2019


See worked examples and more details at the accompanying
notebook: https://lindeloev.github.io/tests-as-linear
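
The short R snippets interspersed below are not part of the original cheat sheet; they are minimal sketches you can paste into a console to check a few of the equivalences yourself. They assume the signed_rank helper defined in the caption at the bottom and the following simulated example data (all variable names are illustrative):

    # Signed-rank helper used by the non-parametric rows (defined in the caption below)
    signed_rank <- function(x) sign(x) * rank(abs(x))

    # Simulated example data; any data in this vector/long format would do
    set.seed(1)
    N     <- 50
    y     <- rnorm(N, mean = 0.5)                                  # one sample
    y1    <- rnorm(N); y2 <- y1 + rnorm(N, mean = 0.3)             # two (paired) samples
    x     <- rnorm(N)                                              # a continuous predictor
    group <- factor(sample(c("a", "b", "c"), N, replace = TRUE))   # a 3-level factor
    sex   <- factor(sample(c("female", "male"), N, replace = TRUE))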

Common name | Built-in function in R | Equivalent linear model in R | Exact? | The linear model in words

y is independent of x
P: One-sample t-test | t.test(y) | lm(y ~ 1) | ✓ | One number (intercept, i.e., the mean) predicts y.
N: Wilcoxon signed-rank | wilcox.test(y) | lm(signed_rank(y) ~ 1) | for N > 14 | - (Same, but it predicts the signed rank of y.)
P: Paired-sample t-test | t.test(y1, y2, paired=TRUE) | lm(y2 - y1 ~ 1) | ✓ | One intercept predicts the pairwise y2 - y1 differences.
N: Wilcoxon matched pairs | wilcox.test(y1, y2, paired=TRUE) | lm(signed_rank(y2 - y1) ~ 1) | for N > 14 | - (Same, but it predicts the signed rank of y2 - y1.)
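
A quick check of the block above, using the simulated data from the sketch at the top (the t and p values should agree exactly; the signed-rank versions agree approximately):

    # One-sample t-test: the intercept of lm(y ~ 1) is the mean of y,
    # and its t and p values match t.test(y)
    t.test(y)
    summary(lm(y ~ 1))

    # Wilcoxon signed-rank: the same intercept-only model on the signed ranks (approximate)
    wilcox.test(y)
    summary(lm(signed_rank(y) ~ 1))

    # Paired-sample t-test: one intercept predicts the pairwise y2 - y1 differences
    # (t.test(y1, y2, ...) reports the mean of y1 - y2, so the sign flips; t and p match)
    t.test(y1, y2, paired = TRUE)
    summary(lm(y2 - y1 ~ 1))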

y ~ continuous x (simple regression: lm(y ~ 1 + x))
P: Pearson correlation | cor.test(x, y, method='pearson') | lm(y ~ 1 + x) | ✓ | One intercept plus x multiplied by a number (slope) predicts y.
N: Spearman correlation | cor.test(x, y, method='spearman') | lm(rank(y) ~ 1 + rank(x)) | for N > 10 | - (Same, but with ranked x and y.)
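
Same idea for the correlation rows: the slope's t and p in the lm output match Pearson's, and the ranked version approximates Spearman (again a sketch on the simulated data above):

    # Pearson correlation: the slope of lm(y ~ 1 + x) has the same t and p value
    cor.test(x, y, method = "pearson")
    summary(lm(y ~ 1 + x))

    # Spearman correlation: the same model on rank-transformed x and y (approximate)
    cor.test(x, y, method = "spearman")
    summary(lm(rank(y) ~ 1 + rank(x)))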

y ~ discrete x
P: Two-sample t-test | t.test(y1, y2, var.equal=TRUE) | lm(y ~ 1 + G2)^A | ✓ | An intercept for group 1 (plus a difference if group 2) predicts y.
P: Welch's t-test | t.test(y1, y2, var.equal=FALSE) | gls(y ~ 1 + G2, weights=...^B)^A | ✓ | - (Same, but with one variance per group instead of one common.)
N: Mann-Whitney U | wilcox.test(y1, y2) | lm(signed_rank(y) ~ 1 + G2)^A | for N > 11 | - (Same, but it predicts the signed rank of y.)
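
For the two-group rows, the two samples first have to be stacked into long format with a dummy-coded indicator for group 2 (called G2 below); the Welch line uses gls from the nlme package as in footnote B. A sketch, assuming the simulated y1 and y2 from above:

    # Long format: stack the two samples and dummy-code membership of group 2
    d <- data.frame(value = c(y1, y2),
                    g     = factor(rep(c("y1", "y2"), each = N)))
    d$G2 <- as.numeric(d$g == "y2")

    # Two-sample t-test (equal variances) vs. the dummy-coded lm
    t.test(y1, y2, var.equal = TRUE)
    summary(lm(value ~ 1 + G2, data = d))

    # Welch's t-test vs. gls with one variance per group (see footnote B; needs nlme)
    library(nlme)
    t.test(y1, y2, var.equal = FALSE)
    summary(gls(value ~ 1 + G2, data = d,
                weights = varIdent(form = ~ 1 | g), method = "ML"))

    # Mann-Whitney U: the same dummy-coded model on the signed ranks (approximate)
    wilcox.test(y1, y2)
    summary(lm(signed_rank(value) ~ 1 + G2, data = d))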
Multiple regression: lm(y ~ 1 + x1 + x2 + ...)

P: One-way ANOVA | aov(y ~ group) | lm(y ~ 1 + G2 + G3 + ... + GN)^A | ✓ | An intercept for group 1 (plus a difference if group ≠ 1) predicts y.
N: Kruskal-Wallis | kruskal.test(y ~ group) | lm(rank(y) ~ 1 + G2 + G3 + ... + GN)^A | for N > 11 | - (Same, but it predicts the rank of y.)
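
For one-way ANOVA and Kruskal-Wallis, R's default treatment coding of a factor builds the G2, G3, ... indicators automatically, so the hand-coded version in this sketch is only there to make the dummy coding explicit:

    # One-way ANOVA: aov and lm fit the same model (identical F and p)
    summary(aov(y ~ group))
    anova(lm(y ~ 1 + group))

    # The same model with the indicators spelled out by hand for the 3-level group;
    # the coefficients are identical, only the names differ
    G2 <- as.numeric(group == "b")
    G3 <- as.numeric(group == "c")
    coef(lm(y ~ 1 + group))
    coef(lm(y ~ 1 + G2 + G3))

    # Kruskal-Wallis: the same model on rank-transformed y (approximate p value)
    kruskal.test(y ~ group)
    anova(lm(rank(y) ~ 1 + group))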

P: One-way ANCOVA | aov(y ~ group + x) | lm(y ~ 1 + G2 + G3 + ... + GN + x)^A | ✓ | - (Same, but plus a slope on x.) Note: this is discrete AND continuous. ANCOVAs are ANOVAs with a continuous x.
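
The ANCOVA row is just the previous model with the continuous x added; a short sketch, reusing the G2 and G3 indicators defined above:

    # One-way ANCOVA: same sequential ANOVA table from aov and lm
    summary(aov(y ~ group + x))
    anova(lm(y ~ 1 + group + x))

    # With the group indicators spelled out by hand, the coefficients are identical
    coef(aov(y ~ group + x))
    coef(lm(y ~ 1 + G2 + G3 + x))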

P: Two-way ANOVA | aov(y ~ group * sex) | lm(y ~ 1 + G2 + G3 + ... + GN + S2 + S3 + ... + SK + G2*S2 + G3*S3 + ... + GN*SK)^A | ✓ | Interaction term: changing sex changes the y ~ group parameters. Note: G2 to GN are indicators (0 or 1) for each non-intercept level of the group variable; similarly S2 to SK for sex. The Gi terms are the main effect of group, the Si terms the main effect of sex, and the Gi*Sj products the group × sex interaction. For two levels (e.g. male/female), the sex part would be just S2 and the interaction part S2 multiplied with each Gi.
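
For the two-way ANOVA, the interaction columns are just products of the group and sex indicators. A sketch with the 3-level group and 2-level sex simulated above (so the sex part is a single S2 indicator):

    # Two-way ANOVA: same sequential table from aov and lm
    summary(aov(y ~ group * sex))
    anova(lm(y ~ 1 + group * sex))

    # By hand: S2 indicates the non-reference sex level, and the interaction
    # terms are products of the G and S indicators (same estimates, different names)
    S2 <- as.numeric(sex == "male")
    coef(lm(y ~ 1 + group * sex))
    coef(lm(y ~ 1 + G2 + G3 + S2 + G2:S2 + G3:S2))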

Counts ~ discrete x (equivalent log-linear model; run glm using the following arguments: glm(model, family=poisson()))
N: Chi-square test | chisq.test(groupXsex_table) | glm(y ~ 1 + G2 + G3 + ... + GN + S2 + S3 + ... + SK + G2*S2 + G3*S3 + ... + GN*SK, family=...)^A | ✓ | Interaction term: (Same as Two-way ANOVA.) As a linear model, the Chi-square test is log(y_i) = log(N) + log(α_i) + log(β_j) + log(α_i β_j), where α_i and β_j are proportions. See more info in the accompanying notebook.

N: Goodness of fit | chisq.test(y) | glm(y ~ 1 + G2 + G3 + ... + GN, family=...)^A | ✓ | (Same as One-way ANOVA and see the Chi-square note.)
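
One way to check the count rows in practice (a sketch, not the notebook's exact code): build the contingency table from the simulated factors, fit the Poisson log-linear model with and without the term being tested, and compare the two fits; the Rao score test should reproduce Pearson's chi-square statistic, while test="LRT" gives the closely related G² statistic.

    # Chi-square test of independence on a group x sex contingency table
    tab <- table(group, sex)
    chisq.test(tab)

    # Log-linear version: Poisson glm on the cell counts in long format;
    # the test of the group:sex interaction plays the role of the chi-square test
    counts  <- as.data.frame(tab)                 # columns: group, sex, Freq
    full    <- glm(Freq ~ 1 + group * sex, family = poisson(), data = counts)
    reduced <- glm(Freq ~ 1 + group + sex, family = poisson(), data = counts)
    anova(reduced, full, test = "Rao")

    # Goodness of fit: one-way counts, intercept-only model vs. one indicator per group
    tab1    <- table(group)
    chisq.test(tab1)
    counts1 <- as.data.frame(tab1)
    anova(glm(Freq ~ 1,         family = poisson(), data = counts1),
          glm(Freq ~ 1 + group, family = poisson(), data = counts1), test = "Rao")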

List of common parametric (P) and non-parametric (N) tests and equivalent linear models. The notation y ~ 1 + x is R shorthand for y = 1·b + a·x, which most of us learned in school. Models in similar colors are highly similar, but really, notice how similar they all are across colors! For non-parametric models, the linear models are reasonable approximations for non-small sample sizes (see the "Exact?" column and click the links to see simulations). Other, less accurate approximations exist, e.g., Wilcoxon for the sign test and Goodness-of-fit for the binomial test. The signed rank function is signed_rank = function(x) sign(x) * rank(abs(x)). The variables Gi and Si are "dummy coded" indicator variables (either 0 or 1), exploiting the fact that when Δx = 1 between categories the difference equals the slope. Subscripts (e.g., G2 or y1) indicate different columns in data. lm requires long-format data for all non-continuous models. All of this is exposed in greater detail, with worked examples, at https://lindeloev.github.io/tests-as-linear.

A: See the note to the two-way ANOVA for an explanation of the notation.
B: Same model, but with one variance per group: gls(value ~ 1 + G2, weights = varIdent(form = ~1|group), method="ML").

Jonas Kristoffer Lindeløv, https://lindeloev.net
