Summary: Statistical Testing

The document covers various statistical methods including sampling techniques, correlation measures, hypothesis testing, and error types. It explains concepts such as null and alternative hypotheses, significance levels, and critical values, as well as specific tests like the Chi-squared test, Z and T-tests, and correlation tests. Additionally, it discusses the Central Limit Theorem and the properties of random variables in relation to expected values and variances.

Sampling methods: random, quota, convenience, systematic. Validity and reliability.

Correlation (with data): how much you can trust the regression line
Pearson correlation r for linear relations
Spearman rank correlation for monotonic (not necessarily linear) relations (equal values get the average of their ranks)
GDC: LinReg (r: correlation coefficient, r²: coefficient of determination)
Strength of |r|: 0 < |r| < 0.5 very weak; 0.5 < |r| < 0.7 weak; 0.7 < |r| < 0.87 moderate;
0.87 < |r| < 0.95 strong; 0.95 < |r| < 1 very strong
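
As a cross-check outside the GDC, here is a minimal Python sketch (not part of the original notes; the data is made up and scipy.stats is assumed available):

import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2.1, 3.9, 9.2, 15.8, 25.1, 36.3])   # made-up, roughly quadratic but increasing

r, p_r = stats.pearsonr(x, y)        # linear correlation coefficient r
rho, p_rho = stats.spearmanr(x, y)   # rank correlation (monotonic relations)
print(r, rho)                        # rho is 1.0 here because the relation is strictly increasing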

Definitions:
• Null Hypothesis H0: the hypothesis we base the test on

• Alternative Hypothesis H1: the hypothesis we adopt if we reject H0

H1 can be ≠ (two-tailed, each tail has α/2); > (one-tailed, right); < (one-tailed, left)
• Degrees of freedom: the number of values that are free to vary without changing the statistic's
parameters

• Significance level α: the probability of rejecting H0 when it is in fact true, which we accept as a risk

• Critical values: the threshold values for rejecting H0 (critical values lie in the rejection
region)

• Acceptance region: above the critical value if H1 is <
below the critical value if H1 is >
between the critical values if H1 is ≠

• test statistic t: a value that summarizes the observed data


if t is in the acceptance region, we accept H0

• p-value: the probability of being above t if H1 is >
below t if H1 is <
in both tails beyond ±t if H1 is ≠
if p-value > α, we accept H0
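
A minimal sketch of how the critical value and the p-value lead to the same decision, assuming a right-tailed test (H1: >) whose statistic is standard normal under H0 (Python/scipy, not part of the original GDC-based notes):

from scipy.stats import norm

alpha = 0.05
t = 1.9                       # illustrative test statistic
c = norm.ppf(1 - alpha)       # critical value (the GDC's invNorm), about 1.645
p_value = 1 - norm.cdf(t)     # probability of being above t

print(t > c, p_value < alpha)  # the two conditions give the same reject/accept decision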
χ² test of independence: are two categorical variables dependent?
Prerequisites/assumptions: random sampling, no expected frequencies below 5 (group categories if needed)
H0: the variables are independent, H1: they are dependent

Expected frequency for each cell = (row total × column total) / grand total

Degrees of freedom: (number of columns − 1) × (number of rows − 1)

GDC: enter the observed frequencies as a matrix (Matrices), then Stat Test, then the χ² test


if p-value < α or if χ² > critical value: reject H0
if p-value > α or if χ² < critical value: accept H0
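
A minimal sketch of the same test in Python (an illustrative 2×3 table; scipy.stats instead of the GDC, not part of the original notes):

import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[30, 20, 10],
                     [20, 25, 15]])

chi2, p, dof, expected = chi2_contingency(observed)
print(chi2, p, dof)   # dof = (rows - 1) x (columns - 1) = 2
print(expected)       # expected cell = row total x column total / grand total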

χ² GOF test: check whether one categorical variable matches a given distribution (or whether two
samples are distributed alike)
Prerequisites/assumptions: random sampling, no expected frequencies below 5 (group categories if needed)
H0: the data is distributed as expected, H1: it is not distributed as expected
Degrees of freedom: number of categories − 1 − number of estimated parameters

Test statistic: χ² = Σ (fo − fe)² / fe, summed over all categories (fo = observed frequency, fe = expected frequency)

You can use a GOF test to check estimators (parameters estimated from the data); remember to subtract each estimated parameter in the degrees of freedom


GDC: List, then Stat Test
if p-value < α or if χ² > critical value: reject H0
if p-value > α or if χ² < critical value: accept H0
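
A minimal sketch of a GOF test in Python (made-up die-roll counts tested against a uniform distribution; scipy.stats instead of the GDC, not part of the original notes):

from scipy.stats import chisquare

observed = [18, 22, 16, 25, 19, 20]   # 120 rolls of a die
expected = [20, 20, 20, 20, 20, 20]   # uniform distribution under H0

chi2, p = chisquare(f_obs=observed, f_exp=expected)   # ddof=k would subtract k estimated parameters
print(chi2, p)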

Random variables:

Expected value of a squared random variable: E(X²) = VAR(X) + [E(X)]²


Linear combination of random variables: E(aX + bY) = aE(X) + bE(Y), and if X and Y are
independent, VAR(aX + bY) = a²VAR(X) + b²VAR(Y)
Example: we have bags of apples that contain random numbers of green and red apples:
X is the number of green apples in a bag and a green apple costs 1€
Y is the number of red apples in a bag and a red apple costs 2€
if W is the price of a bag of apples then W = X + 2Y

Assuming X and Y are independent: E(W) = E(X) + 2E(Y) and VAR(W) = VAR(X) + 4VAR(Y)

Do not confuse a linear combination of random variables with independent picks from a random
variable. Example:
If Z is the price of two bags of apples then Z ≠ 2W, because we are not looking at double the
price of one bag but at the price of two different bags (two independent picks from the random
variable W).
Then VAR(Z) ≠ 4VAR(W); instead VAR(Z) = VAR(W) + VAR(W) = 2VAR(W)

Looking at double the price of one bag is not the same as looking at the price of two different bags.

(this distinction does not change the value of E(Z) which remains E(Z)=2E(W))
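
A minimal simulation sketch of this distinction (the Poisson counts for apples per bag are made up; Python/numpy, not part of the original notes):

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = rng.poisson(3, n)     # green apples per bag (1€ each)
Y = rng.poisson(2, n)     # red apples per bag (2€ each)
W = X + 2 * Y             # price of one bag

W1, W2 = W[:n // 2], W[n // 2:]   # prices of two independent bags
print(np.var(2 * W))              # about 4·VAR(W): double the price of one bag
print(np.var(W1 + W2))            # about 2·VAR(W): price of two different bags

With VAR(W) = VAR(X) + 4VAR(Y) = 3 + 8 = 11 here, the two printed values come out close to 44 and 22.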

CLT: if we take several same-size samples from a population, the means of the samples are
normally distributed: X̄ ~ N(μ, σ²/n), with
n = sample size, μ = mean of the population, σ² = variance of the population

The CLT applies if the population is already normally distributed or if the sample size is > 30.

If the CLT applies you can estimate the population mean and variance from the sample:

the unbiased estimator of the population mean μ is x̄ (the mean of the sample)

the unbiased estimator of the population variance σ² is Sn-1² = n/(n − 1) × Sn² (with Sn² the
variance of the sample)
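
A minimal sketch of the two estimators on a made-up sample (Python/numpy, not part of the original notes):

import numpy as np

sample = np.array([10.2, 9.8, 11.1, 10.5, 9.9, 10.7])
n = len(sample)

x_bar = sample.mean()              # unbiased estimate of the population mean
s2_n = np.var(sample)              # sample variance Sn² (divides by n)
s2_n1 = np.var(sample, ddof=1)     # unbiased estimate Sn-1² (divides by n - 1)
print(x_bar, s2_n1, n / (n - 1) * s2_n)   # the last two values are equal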
Z- and t-tests: used to assess whether a sample mean is close enough to the supposed
population mean.
The z-test is used when the population standard deviation, σ, is known.
To find the critical value(s) c, use invNorm:
left-tail area α if H1 is <
left-tail area 1 − α if H1 is >
two critical values with α/2 in each tail if H1 is ≠ (use "center" with area 1 − α in the GDC)

The t-test is used when only the sample standard deviation, Sn, is known.
You will not be asked to find a critical region for a t-test.
Use the GDC (1-Var Stats) to find Sn-1 (Sx in the GDC).
Types of t- and z-tests:
one sample against a theoretical mean (Student's t-test)
two unrelated samples (unpaired test)
two related samples (paired test): testing the same group before and after an event
then H0: μ1 = μ2 (or μ = μ0 for one sample) and H1 is ≠, > or <
Unless stated otherwise, the two-sample tests are pooled.
The tests can be run with the GDC (with data or statistics)
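
A minimal sketch of a one-sample t-test and a hand-computed z-test in Python (made-up sample, μ0 = 50, σ assumed known only for the z-test; scipy.stats instead of the GDC, and a reasonably recent scipy is assumed for the alternative= keyword):

import numpy as np
from scipy import stats

sample = np.array([51.2, 49.8, 52.4, 50.9, 48.7, 51.5, 50.3, 52.0])
mu0 = 50

# t-test (sigma unknown, uses Sn-1), H1: mu > 50
t, p = stats.ttest_1samp(sample, mu0, alternative='greater')
print(t, p)

# z-test (sigma known, here assumed sigma = 1.5), H1: mu > 50
sigma = 1.5
z = (sample.mean() - mu0) / (sigma / np.sqrt(len(sample)))
print(z, 1 - stats.norm.cdf(z))   # right-tailed p-value

# two-sample versions: stats.ttest_ind(a, b) (unpaired, pooled by default)
# and stats.ttest_rel(before, after) (paired)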

Poisson test: Testing the mean of a Poisson distribution

Binomial test: Testing the proportion of a characteristic within a population

For Poisson and binomial tests, be careful with the "or equal" when calculating the p-value (P(X ≥ k) is not the same as P(X > k)).
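
A minimal sketch of the "or equal" point in Python (made-up values: H0 Poisson mean m = 6 with H1: m > 6, and H0 proportion p = 0.3 with H1: p > 0.3; scipy.stats with binomtest, which assumes a recent scipy, not part of the original notes):

from scipy.stats import poisson, binom, binomtest

# Poisson: observed x = 10, p-value = P(X >= 10) under H0
x, m = 10, 6
print(1 - poisson.cdf(x - 1, m))   # note the "x - 1": 1 - P(X <= 9) gives P(X >= 10)

# Binomial: 12 successes out of 25, p-value = P(X >= 12) under H0
print(1 - binom.cdf(11, 25, 0.3))                            # by hand
print(binomtest(12, 25, 0.3, alternative='greater').pvalue)  # same value from binomtest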
Pearson correlation test: testing whether a correlation can be extended to the whole population.
H0: there is no correlation, ρ = 0 (ρ is the "population" correlation coefficient)

GDC: LinRegTTest (t is the test value, p is the p-value)
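
A minimal sketch of the same test in Python (made-up data): the test statistic is t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom, and scipy's pearsonr reports the corresponding two-tailed p-value:

import numpy as np
from scipy import stats

x = np.array([2, 4, 5, 7, 9, 11, 12])
y = np.array([3.1, 4.8, 6.2, 7.9, 9.6, 12.2, 12.9])

r, p = stats.pearsonr(x, y)              # p tests H0: rho = 0 (two-tailed)
n = len(x)
t = r * np.sqrt((n - 2) / (1 - r**2))    # test statistic with n - 2 degrees of freedom
print(r, t, p)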

Error Types:
Type 1: reject H0 when H0 is true; for a continuous variable, P(Type 1 error) = α
Type 2: accept H0 when H1 is true (the H1 parameters will be given)
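
A minimal sketch of a Type 2 error calculation for a z-test on a mean (all numbers are made up: H0: μ = 100, H1: μ = 103, σ = 8, n = 16, α = 0.05, right-tailed; Python/scipy, not part of the original notes):

import numpy as np
from scipy.stats import norm

mu0, mu1, sigma, n, alpha = 100, 103, 8, 16, 0.05
se = sigma / np.sqrt(n)                      # standard deviation of the sample mean

c = norm.ppf(1 - alpha, loc=mu0, scale=se)   # critical value for the sample mean (reject H0 if x_bar > c)
beta = norm.cdf(c, loc=mu1, scale=se)        # P(accept H0 | H1 true) = P(Type 2 error)
print(c, beta)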
