[DAY 1 - 6/8/2024]:
After session 5: MID-TERM
Business Analytics life cycle -> Identify Problem (most important)
+ Who is your business?
+ What
+ How
Power BI
Looker Studio
[DAY 2]:
+ Mode -> the worst choice because it does not meet the needs of even 50% of customers
+ Median/Mean -> meets the needs of up to 50% of customers -> not a good one
- The midrange => the average of Min and Max
- Dispersion: how much the data vary from the mean
+ If values are close to the mean => low dispersion
+ If values are far from the mean => high dispersion
=> ∑(xi - x̄) = 0 => to make the deviations positive, square them
- Midspread (interquartile range): the interval from Q1 to Q3
- Standard Deviation
Ex: Noodles: 80g ± 5g
- CV = std/mean => the larger it is, the higher the risk
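The measures above can be sketched in Python with only the standard library. The `weights` list is hypothetical data made up for illustration, and `statistics.quantiles` uses its default quartile method, which may differ slightly from what other software reports.

```python
import statistics

def describe(values):
    """Return the summary measures listed in the notes for one numeric sample."""
    mean = statistics.mean(values)
    q1, _q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "mode": statistics.mode(values),
        "median": statistics.median(values),
        "mean": mean,
        "midrange": (min(values) + max(values)) / 2,
        # Deviations from the mean cancel out, which is why they get squared.
        "sum_of_deviations": sum(x - mean for x in values),
        "midspread": q3 - q1,
        "std": std,
        "cv": std / mean,
    }

# Hypothetical noodle-pack weights in grams (not from the notes).
weights = [75, 78, 80, 80, 82, 85]
summary = describe(weights)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A larger `cv` means the spread is large relative to the mean, which is the "high risk" reading in the notes.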
Note: WHY do banks usually announce the mean of employees' salaries?
+ attract talent
+ attract investors, shareholders, stakeholders and satisfy customers.
+ attract media / public relations
CHAPTER 7:
1. ONE-SAMPLE T-TEST: x̄ vs C
x̄ income vs 60,000 (a constant)
- Step 1: Hypothesis:
+ Ho: x̄ income = 60K => There is no difference between the average
household income and 60,000
+ H1: x̄ income ≠ 60K (income < 60,000 or > 60,000) => There is a
difference between the average household income and 60,000
- Step 2: Methodology: One-sample t-test
- Step 3: Descriptive Statistics: N, x̄, s
A survey of 6,400 people showed that the average household income is 69,474$ and the std =
78,718$
- Step 4: Result of t-test
t = 9.629, p < 0.001. There is a significant difference between 60K and the average household
income. 60K is 9,474$ smaller than the average income, and the difference is significant at
the 0.001 level.
(Different or not different => if different, state whether it is smaller or bigger)
Note: If there is no significant difference, there is no need to report the mean difference. Ho is then kept as true
=> nothing more to report
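As a rough check of Step 4, the t statistic can be recomputed from the summary statistics alone. This is only a sketch: the sample mean is taken as 60,000 + 9,474 (the test value plus the reported mean difference), and the p-value uses a normal approximation, which is an assumption that is reasonable only because df = N - 1 is very large here.

```python
import math
from statistics import NormalDist

def one_sample_t(mean, std, n, constant):
    """t statistic for Ho: mean == constant, from summary statistics."""
    standard_error = std / math.sqrt(n)
    return (mean - constant) / standard_error

n, std = 6400, 78718
mean = 60000 + 9474  # test value plus the reported mean difference
t = one_sample_t(mean, std, n, 60000)

# Normal approximation to the two-sided p-value (assumption: large df).
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
print(f"t = {t:.3f}, p < 0.001: {p_approx < 0.001}")
```

The result lands close to the reported t; the small gap comes from the rounded mean and std.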
2. INDEPENDENT SAMPLES T-TEST
- S1: Hypothesis:
+ Ho: Income F = Income M
+ H1: Income F ≠ Income M => state which is bigger or smaller
- S2: Methodology: Independent samples t-test
- S3: Descriptive Statistics:
+ N1, x̄1, s.dev1 (of each group)
+ N2, x̄2, s.dev2
- S4: Homogeneity of Variance
Note: Only compare when the 2 groups are homogeneous
F(Levene) = ? (through Levene's test)
● p < 0.05 => Equal variances NOT assumed (because the variances are different) =>
There is a difference …. (see p or Sig. (2-tailed))
● p > 0.05 => Equal variances assumed
=> Identify p < 0.05? and t = ?
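The choice between the two rows of the t-test output can be sketched as below. The incomes are hypothetical, and `p_levene` is a made-up placeholder standing in for the p-value of Levene's test, which is not computed here. The function returns t and the degrees of freedom; looking up p from the t distribution is left out.

```python
import math
import statistics

def independent_t(group1, group2, equal_variance):
    """Return (t, df) for two independent samples.

    equal_variance=True  -> pooled-variance t-test (equal variances assumed)
    equal_variance=False -> Welch's t-test (equal variances not assumed)
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    if equal_variance:
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(v1 / n1 + v2 / n2)
        # Welch-Satterthwaite approximation of the degrees of freedom
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )
    return (m1 - m2) / se, df

# Hypothetical incomes (in thousands) and a placeholder Levene p-value.
income_f = [42, 55, 61, 48, 50]
income_m = [58, 75, 90, 66, 120]
p_levene = 0.03
t, df = independent_t(income_f, income_m, equal_variance=p_levene > 0.05)
print(f"t = {t:.3f}, df = {df:.3f}")
```

A negative t here means the first group (F) has the smaller mean.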
Solution:
S1: Hypothesis
● Ho: There is no significant difference in average household income between
males and females
● H1: There is a significant difference in average household income between
males and females
S2: Methodology: Independent samples t-test
S3: Descriptive Statistics
Go to Analyze -> Compare Means -> Independent-Samples T Test
S4: Homogeneity of Variance
NOTE:
● Correlation => the two-way relationship between two variables
● Independent samples t-test (2 groups) => measures the effect of one variable on the other
(one-way)
● One-sample t-test (1 variable) => compares one variable with a number
Ex: Impact of Marital Status on Car => use the independent samples t-test
REVIEW:
1. One sample: x̄ vs C
2. Two-sample tests:
- xA vs xB => Independent samples t-test
- xA vs xA' (same group but in a different context) => Paired samples t-test
3. PAIRED SAMPLES T-TEST
Step 1:
Ho: There is no significant difference between the Estimated and Actual data
H1: There is a significant difference………………………………………………..
Step 2: Methodology: Paired samples t-test
Step 3: Descriptives
N, x̄, x̄', std of the 2 variables
Note: Homogeneity is not needed because there is only 1 group
Step 4: Correlation
r = ?, p <= 0.05?
The movement of the data => whether the 2 variables move in the same direction or not
If they move together => in the best case => perfect movement => correlation should be = 1
Moving up or moving down
Step 5: t-test
t = ?, p <= 0.05?
If p <= 0.05, report:
+ smaller or bigger (the variable with the larger mean is the bigger one)
+ Mean difference
+ Sig-level
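Steps 4 and 5 can be sketched as follows. The `estimated` and `actual` lists are hypothetical values, not the exercise data. The function returns the correlation r, the paired t statistic and the mean difference; the p-values (from the t distribution with n - 1 degrees of freedom) are left out of this sketch.

```python
import math
import statistics

def paired_t(first, second):
    """Return (r, t, mean_difference) for two measurements on the same cases."""
    n = len(first)
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    t = mean_diff / (statistics.stdev(diffs) / math.sqrt(n))

    # Pearson correlation, written out by hand
    m1, m2 = statistics.mean(first), statistics.mean(second)
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    r = cov / math.sqrt(
        sum((a - m1) ** 2 for a in first) * sum((b - m2) ** 2 for b in second)
    )
    return r, t, mean_diff

# Hypothetical paired measurements (same cases, two contexts).
estimated = [10.0, 12.5, 9.0, 14.0, 11.5]
actual = [12.0, 15.0, 10.5, 17.5, 13.0]
r, t, mean_diff = paired_t(estimated, actual)
print(f"r = {r:.3f}, t = {t:.3f}, mean difference = {mean_diff:.3f}")
```

A negative mean difference means the first variable has the smaller mean, which is what gets reported as "smaller".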
Exercise review: Pile Foundation:
S1:
Ho: There is no significant difference between Estimated data and Actual data
H1: There is a significant difference between Estimated data and Actual data
S2: Paired samples t-test
S3: Statistics
S4: Correlation
r = 0.797, p < 0.001 => The Estimated data are highly correlated with the Actual data
S5: Result of t-test
t = -10.929, p < 0.001
=> There is a significant difference between Estimated data and Actual data
=> The Estimated data are smaller than the Actual data, with a mean difference of 6.38, and
it is significant at the 0.001 level.
4. ANOVA:
>= 2 groups
Step 1:
Ho: x̄A = x̄B = x̄C
H1: At least 1 significant difference
Step 2: Methodology: ANOVA
Step 3: Descriptive Statistics
F=?
p<=0.05?
Step 4: Assumption of Homogeneity of Variance
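The F value in the ANOVA table is the ratio of between-group to within-group variance. A minimal sketch, using hypothetical groups rather than the demo file; the p-value lookup from the F distribution is left out.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical samples for three groups A, B, C.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f, df_between, df_within = one_way_anova_f(groups)
print(f"F({df_between}, {df_within}) = {f:.3f}")
```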
Exercise: Demo file
carcart (Factor) -> Income ??? => Income: Dependent list
Step 1:
Ho: There is no significant difference in average Household income among the people who
drive different types of car
H1: There is AT LEAST 1 significant difference in average Household income among the
people who drive different types of car.
Step 2: 3 car categories => use ANOVA
Step 3: Descriptive statistics
There are 1,841 respondents who drive an economy car; their average household
income is 21,887$ and std = 5,241$
The highest average household income belongs to the group of respondents who drive a
luxury car, with a mean of 134,621$ and std = 102,355$
Step 4: Assumption of Homogeneity of Variance
F(Levene's test) = 1129.720, p(Levene's test) < 0.001
=> Equal variances NOT assumed
=> ANOVA table could be wrong
=> Additional test for Equality of Mean (Welch test/ Brown Forsythe)
Note:
● p(Levene) <= 0.05 => Equal variances NOT assumed => Robust test
● p(Levene) > 0.05 => Equal variances assumed => ANOVA (enough, no need for the Robust
test)
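Levene's statistic is itself a one-way ANOVA F, computed on the absolute deviations of each value from its group mean. The sketch below assumes the mean-centred variant (an assumption about which variant the software reports) and uses hypothetical groups; the p-value lookup is left out.

```python
import statistics

def levene_statistic(groups):
    """Mean-centred Levene statistic: one-way ANOVA F on |x - group mean|."""
    deviations = [[abs(x - statistics.mean(g)) for x in g] for g in groups]
    k = len(deviations)
    n_total = sum(len(z) for z in deviations)
    grand = sum(sum(z) for z in deviations) / n_total
    between = sum(
        len(z) * (statistics.mean(z) - grand) ** 2 for z in deviations
    )
    within = sum((v - statistics.mean(z)) ** 2 for z in deviations for v in z)
    return (between / (k - 1)) / (within / (n_total - k))

# Hypothetical groups; the second one is more spread out than the first.
w = levene_statistic([[1, 2, 3], [2, 4, 6]])
print(f"F(Levene) = {w:.3f}")
```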
Step 5:
F(Welch's test) = 5752.869, p < 0.001
=> There is at least 1 significant difference in the average Household income among the
people who drive different types of car.
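For reference, Welch's F weights each group by n / variance, so groups with unequal variances are handled without pooling. A minimal sketch with hypothetical data (not the demo file); the p-value lookup from the F distribution with (df1, df2) is left out.

```python
import statistics

def welch_anova_f(groups):
    """Return (F, df1, df2) for Welch's test of equality of means."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [statistics.mean(g) for g in groups]
    weights = [n / statistics.variance(g) for n, g in zip(sizes, groups)]
    w_total = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_total
    numerator = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    lam = sum((1 - w / w_total) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return numerator / denominator, k - 1, (k ** 2 - 1) / (3 * lam)

# Hypothetical incomes (in thousands) for three car categories.
economy = [20, 22, 25, 21, 23]
standard = [40, 45, 38, 50, 42]
luxury = [90, 150, 120, 200, 110]
f_welch, df1, df2 = welch_anova_f([economy, standard, luxury])
print(f"F(Welch) = {f_welch:.3f}, df1 = {df1}, df2 = {df2:.3f}")
```

With only two groups, this F reduces to the square of the Welch t statistic, which is a handy sanity check.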
Step 6: Post-hoc test
The average household income of respondents who drive an economy car is 20,672$
smaller than the average household income of respondents who drive a standard car. The
difference is significant at the 0.001 level.
…. (compare the remaining groups)
NOTE: Keep only 3 decimal places