What is Mathematical Statistics?
The science of investigating the laws governing populations
a) Population: the set of all objects targeted by a study
- Socio-demographic study: all citizens of a given country
- Forestry survey: all trees in a study region
- Quality control: all products issued by a factory
b) Sample
A reasonably small number of individuals selected
from a given population for a specific study
[Diagram: Population → Sampling → Sample → Estimation, Hypothesis tests → Laws of the population]
How to sample:
- The sample must be representative of the study population
- The sample must correspond to the study objective
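A minimal Python sketch of one common way to meet the first requirement: simple random sampling without replacement, in which every individual has the same chance of being selected. The function name `simple_random_sample` and the population of tree ID numbers are hypothetical illustrations.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every subset of size n is equally likely."""
    if not 0 <= n <= len(population):
        raise ValueError("sample size must be between 0 and the population size")
    rng = random.Random(seed)  # seeded generator so a draw can be reproduced
    return rng.sample(population, n)

# Hypothetical population: ID numbers of all 1000 trees in a study region
population = list(range(1, 1001))
sample = simple_random_sample(population, 20, seed=1)
print(sorted(sample))
```

Fixing the seed makes the same sample reproducible, which is useful when the selection has to be documented.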
Sampling models
A. One-sample model
• The one-sample model usually concerns an intervention on a
population: does the intervention produce some change in the
population?
• Individuals are chosen randomly from the population to form a
sample
◼Example 1. Do 90% of motorcyclists in Ha Dong wear helmets?
◼Example 2. Is the proportion of female students less than 50%?
◼Example 3. In Viet Nam, is breastfeeding common among
more than 70% of women?
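For a question like Example 1, one standard analysis (not specified in these notes) is the normal-approximation z-test for a single proportion, here testing H0: p = 0.90. The sketch assumes hypothetical data, 171 helmet wearers among 200 randomly sampled motorcyclists, and the function name is illustrative.

```python
import math

def one_proportion_z_test(successes, n, p0):
    """Normal-approximation z-test of H0: p = p0 for one sample.

    Returns the sample proportion, the z statistic and a two-sided p-value.
    The approximation is reasonable when n*p0 and n*(1 - p0) are both large.
    """
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)           # standard error under H0
    z = (p_hat - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_hat, z, p_value

# Hypothetical data for Example 1: 171 of 200 sampled motorcyclists wear helmets
p_hat, z, p_value = one_proportion_z_test(171, 200, 0.90)
print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, p-value = {p_value:.4f}")
```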
B. Two-independent-samples model
Model of two groups of objects that differ in
a) intervention levels, or
b) individual properties
• Example 1. Are women better at foreign languages than
men?
• Example 2. Is there any difference between Ha Noi and
Ho Chi Minh City in migration from rural areas?
• Example 3. Does the quality of coffee produced in Lam Dong
differ from that produced in Dak Lak?
• Example 4. Did the monthly number of traffic accidents in Ba
Dinh district decrease after 31/12/2022?
Notes:
In the model of two independent samples
a) The numbers of observations in the two groups (sample sizes) may
differ
b) Observations in each group are independent of
those in the other group
c) Sampling: observations must be randomly
selected from each of the two groups
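Welch's t statistic is one common analysis that fits these notes: the groups may have different sizes, and each group's variance is divided by its own sample size because no observation is linked to one in the other group. A minimal sketch, assuming hypothetical test scores for Example 1; the function and variable names are illustrative, and a p-value would additionally need the t distribution (for instance `scipy.stats.ttest_ind(a, b, equal_var=False)`).

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two
    independent samples, which may differ in size and variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # sample variances (denominator n - 1)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    se2_a, se2_b = var_a / n_a, var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical scores from two independently drawn groups of unequal size
group_women = [7, 8, 6, 9, 8]
group_men = [6, 5, 7, 6]
t, df = welch_t(group_women, group_men)
print(f"t = {t:.3f}, df = {df:.2f}")
```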
C. Model of two dependent (paired) samples
• The two-dependent-samples model is used in a study when
• A) each object in the first sample is chosen together with a similar
(paired) object in the second sample, or
• B) each object in the second sample is the same object as in the first
sample, but the measurements in the two samples are taken under
different conditions.
Notes: In the model of two paired samples
A) The two samples have equal numbers of observations
(sample sizes)
B) Information taken from one observation is related to
that of its paired observation
C) When pairing objects to form the samples, all factors that may
influence the study outcome must be taken into account
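Because each observation is linked to its partner (note B), paired data are usually analysed through the within-pair differences, which turns the problem back into a one-sample model. A minimal sketch of the paired t statistic, assuming hypothetical systolic blood pressure values for five matched pairs in the spirit of Example 1; the names are illustrative.

```python
import math
import statistics

def paired_t(first, second):
    """Paired t statistic: treat the within-pair differences as one sample."""
    if len(first) != len(second):
        raise ValueError("paired samples must have equal sizes")
    diffs = [y - x for x, y in zip(first, second)]  # second minus first, per pair
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)                  # sample standard deviation
    t = mean_d / (sd_d / math.sqrt(n))
    return mean_d, t, n - 1                         # df = number of pairs - 1

# Hypothetical pairs matched on age, sex, etc.: non-smoker vs paired smoker
non_smokers = [120, 118, 125, 130, 122]
smokers = [128, 121, 130, 133, 129]
mean_d, t, df = paired_t(non_smokers, smokers)
print(f"mean difference = {mean_d:.1f}, t = {t:.3f}, df = {df}")
```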
Example 1. To investigate the influence of smoking on
hypertension: form two samples of smokers and
non-smokers, in which each person in the non-smoking
group is paired with a smoker of similar age,
sex, weight, height, occupation, etc.
Example 2. Comparing 2022 and 2023, did people’s opinions
about Covid vaccination change?
D. Model of multiple independent samples
Notes:
In the model of multiple independent samples
a) The numbers of observations in the groups (sample sizes) may
differ
b) Observations in each group are independent of
those in the other groups
c) Sampling: observations must be randomly
selected from each of the groups
• Example 1. Compare examination results of several high
schools in Ha Noi
• Example 2. Compare salaries across different economic sectors
• Example 3. Compare water supply among ethnic groups
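One standard analysis for this model (the notes do not prescribe one) is one-way analysis of variance, whose F statistic compares the spread between group means with the spread within groups. A minimal sketch, assuming hypothetical examination scores from three schools of unequal size; `scipy.stats.f_oneway` computes the same statistic together with a p-value.

```python
import statistics

def one_way_anova_f(*groups):
    """F statistic of one-way ANOVA for k independent groups (sizes may differ)."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = statistics.mean(all_values)
    # between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # within-group sum of squares: observations around their own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical examination scores from three high schools
school_a = [6, 7, 8]
school_b = [5, 6, 7, 6]
school_c = [8, 9, 10]
f, df_b, df_w = one_way_anova_f(school_a, school_b, school_c)
print(f"F = {f:.2f} with ({df_b}, {df_w}) degrees of freedom")
```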
Data - Coding
DATA: Information, usually numerical or categorical
a) Variable (quantity, characteristic, etc.)
The characteristic measured or observed when an experiment is carried
out or an observation is made, including
- Characteristics: nationality, sex, occupation, etc.
- Measures: weight, height, age, monthly income, …
- Answers to interview questions
- States or forms of companies or other study objects, etc.
b) Observation (individual, sample unit)
The set of values of all variables recorded for a given
object, person, sample, etc.
c) Value set of a variable
The set of all possible values of a given variable
Example: variables Name, Age, Sex, Height, Weight, Housing
VSET(Name) = {A, …, Ba, …, Tien, …, Yen, …, Xuan, …}
VSET(Age) = {1, 2, …, 100, …},
VSET(Sex) = {Male, Female},
VSET(Height) = [0.6 m, 2.30 m],
VSET(Weight) = [2 kg, 150 kg],
VSET(Housing) = {thatched house, brick house, apartment, villa}
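Value sets can be used to check recorded observations for entry errors. The sketch below is a hypothetical encoding of four of the value sets above, with finite sets for categorical variables and closed intervals (in metres and kilograms) for measurements; the names `VSET`, `in_value_set` and `invalid_fields` are illustrative.

```python
# Hypothetical encoding of some value sets: finite sets for categorical
# variables, closed intervals (low, high) for measured variables.
VSET = {
    "Sex": {"Male", "Female"},
    "Housing": {"thatched house", "brick house", "apartment", "villa"},
    "Height": (0.6, 2.30),    # metres
    "Weight": (2.0, 150.0),   # kilograms
}

def in_value_set(variable, value):
    allowed = VSET[variable]
    if isinstance(allowed, tuple):        # interval [low, high]
        low, high = allowed
        return low <= value <= high
    return value in allowed               # finite set of categories

def invalid_fields(observation):
    """Return the variables whose recorded value lies outside its value set."""
    return [var for var, val in observation.items()
            if var in VSET and not in_value_set(var, val)]

obs = {"Sex": "Female", "Height": 1.55, "Weight": 55, "Housing": "villa"}
bad = {"Sex": "F", "Height": 3.1, "Weight": 55, "Housing": "villa"}
print(invalid_fields(obs))  # []
print(invalid_fields(bad))  # ['Sex', 'Height']
```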
2. Variable types
a) Quantitative variables (measures)
- Continuous variables
Example: Weight, Temperature, Density of a chemical substance in water
- Discrete variables
Example: Income, Salary, Price
- Integer variables
Example: Age, Number of children in a household
b) Qualitative (categorical) variables
Characteristics of study objects, usually with non-numeric values
Example: Sex (male, female), Place of residence
Reason for borrowing (health care, education, etc.)
Occupation (farmer, worker, vendor)
Transport (on foot, by boat, bicycle, motorbike, car, etc.)
- Ordered qualitative variables:
Values of the variable can be ordered in a certain way, reflecting
their levels of importance.
Example: Housing, Water source, Means of transport, etc.
- Unordered qualitative variables (nominal variables):
Values of the variable cannot be ranked in any order
Example: Ethnicity, Occupation, Reason for migration, etc.
3. Coding
Converting collected information into a numerical form suitable for
computer processing
i) Coding quantitative variables
Values of quantitative variables are measurements;
the measurements themselves are used directly as the codes
ii) Coding qualitative variables
- For ordered qualitative variables:
Assign integers as codes for the ordered levels of the variable
- For unordered qualitative variables:
+ 1st way: code in the same way as for ordered variables:
each value of the variable → one integer
+ 2nd way: from the given variable, create new auxiliary binary
(dummy) variables, each of which takes only the two values
0 and 1
Example:
a) Coding ordered qualitative variables
“Transport means”
~ By foot → 0
~ By bicycle → 1
~ By motorbike → 2
“Housing”
~ Homeless → 0
~ Thatched house → 1
~ Wooden house → 3
~ Apartment → 5
~ Villa → 6
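Ordered coding amounts to a lookup table. A minimal sketch using the codes listed above; the dictionary and function names are illustrative. Because the codes follow the order of the levels, comparisons between codes keep their meaning.

```python
# Code books copied from the "Transport means" and "Housing" examples
TRANSPORT_CODES = {"By foot": 0, "By bicycle": 1, "By motorbike": 2}
HOUSING_CODES = {"Homeless": 0, "Thatched house": 1, "Wooden house": 3,
                 "Apartment": 5, "Villa": 6}

def encode(value, codes):
    """Replace a category by its integer code; unknown categories raise KeyError."""
    return codes[value]

answers = ["By bicycle", "By foot", "By motorbike", "By bicycle"]
coded = [encode(a, TRANSPORT_CODES) for a in answers]
print(coded)  # [1, 0, 2, 1]

# The order of the levels is preserved, so comparing codes is meaningful
print(encode("Villa", HOUSING_CODES) > encode("Wooden house", HOUSING_CODES))  # True
```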
b) Coding unordered qualitative variables
“Reason for borrowing”: Production, Shopping, Health care, Education, Wedding
1st way: ~ Production → 1
~ Shopping → 2
~ Health care → 3
~ Education → 4
~ Wedding → 5
2nd way: create 5 new auxiliary binary variables
Main variable | Var. 1: Production | Var. 2: Shopping | Var. 3: Health care | Var. 4: Education | Var. 5: Wedding
Production    | 1 | 0 | 0 | 0 | 0
Shopping      | 0 | 1 | 0 | 0 | 0
Health care   | 0 | 0 | 1 | 0 | 0
Education     | 0 | 0 | 0 | 1 | 0
Wedding       | 0 | 0 | 0 | 0 | 1
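Both ways of coding an unordered variable can be sketched as follows, assuming the category list above; the function names are illustrative. With the 1st way the integers are only labels, so averaging or comparing them has no meaning; the 2nd way avoids implying any order.

```python
# Categories of "Reason for borrowing", in the order used by the 1st way
REASONS = ["Production", "Shopping", "Health care", "Education", "Wedding"]

def integer_code(reason):
    """1st way: one integer per category (1..5 in the order listed)."""
    return REASONS.index(reason) + 1

def dummy_code(reason):
    """2nd way: one 0/1 auxiliary variable per category; exactly one equals 1."""
    return [1 if reason == r else 0 for r in REASONS]

print(integer_code("Education"))  # 4
print(dummy_code("Education"))    # [0, 0, 0, 1, 0]
```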
4. Organizing data
Data matrix:
- Columns → variables
- Rows → observations
Example: Demographic survey
          | Name  | Age | Sex    | Income | Height | Weight | Watching TV | Housing
Person 1  | Vân   | 27  | Female | 650000 | 1m55   | 55Kg   | Every day   | Hired
Person 2  | Bường | 46  | Male   | 980000 | 1m68   | 67Kg   | Rarely      | Brick H.
...       | ...   | ... | ...    | ...    | ...    | ...    | ...         | ...
Person 40 | Việt  | 31  | Male   | 775000 | 1m73   | 58Kg   | Every day   | Wooden
Person 41 | Canh  | 77  | Female | 325000 | 1m49   | 46Kg   | Never       | Thatched
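Coded data matrix:
ID  | Name  | Age | Sex | Income (×1000) | Height | Weight | Watching TV | Housing
1   | VAN   | 27  | 2   | 650            | 1.55   | 55     | 2           | 0
2   | BUONG | 46  | 1   | 980            | 1.68   | 67     | 1           | 5
... | ...   | ... | ... | ...            | ...    | ...    | ...         | ...
40  | VIET  | 31  | 1   | 775            | 1.73   | 58     | 2           | 3
41  | CANH  | 77  | 2   | 325            | 1.49   | 46     | 0           | 1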
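The coding step that turns each raw survey row into a row of the coded data matrix can be sketched as one function. The code books below are read off the four example rows only (the full survey may use other categories), stripping diacritics from names is an assumption based on VAN, BUONG and VIET in the coded table, and the function names are hypothetical; `str.removesuffix` needs Python 3.9 or later.

```python
import unicodedata

# Hypothetical code books, read off the example rows of the survey
SEX = {"Male": 1, "Female": 2}
WATCHING_TV = {"Never": 0, "Rarely": 1, "Every day": 2}
HOUSING = {"Hired": 0, "Thatched": 1, "Wooden": 3, "Brick H.": 5}

def strip_accents(text):
    """Remove Vietnamese diacritics, e.g. 'Bường' -> 'Buong'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def code_row(row_id, name, age, sex, income, height, weight, tv, housing):
    """Turn one raw survey row into one row of the coded data matrix."""
    return [
        row_id,
        strip_accents(name).upper(),
        age,                               # quantitative: kept as measured
        SEX[sex],
        income // 1000,                    # income recorded in thousands
        float(height.replace("m", ".")),   # "1m55" -> 1.55 (metres)
        int(weight.removesuffix("Kg")),    # "55Kg" -> 55 (kilograms)
        WATCHING_TV[tv],
        HOUSING[housing],
    ]

print(code_row(1, "Vân", 27, "Female", 650000, "1m55", "55Kg", "Every day", "Hired"))
# [1, 'VAN', 27, 2, 650, 1.55, 55, 2, 0]
```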