What is Statistics?
Statistics (stats) is the field concerned with:
1. Collecting the right data (a sample)
2. Analyzing it (summarizing the data)
3. Drawing conclusions from it
Example: Covid vaccination
Types of Stats
Descriptive Stats: analyze and summarize the data
Inferential Stats: draw conclusions from the data
Descriptive Stats: analyze and summarize the data
Measures of Central Tendency
  Mean
  Median
  Mode
Measures of Variability: how far is the data spread away from the mean?
  Range = Max - Min
  Variance
  Standard Deviation
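The measures above can be written as formulas for values x_1, ..., x_n with mean x̄. The variance shown divides by n (population variance), which matches the worked numbers later in these notes; a sample variance would divide by n - 1 instead.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\text{Range} = \max_i x_i - \min_i x_i
\qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
\qquad
\sigma = \sqrt{\sigma^2}
```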
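Which player is more suitable to play at the 3rd position?
Match   Virat Kohli   Ishan Kishan
1       40            60
2       50            40
3       60            20
4       45            0
5       55            50
6       47            40
7       53            5
8       48            10
9       52            35
10      50            25

Measure         Virat Kohli   Ishan Kishan
Mean/Average    50            28.5
Median          50            30
Mode            50            40
Range           20            60
Mean = sum of all values / number of values
Answer: Virat Kohli. His mean, median, and mode are all higher, and his much smaller range (20 vs 60) shows more consistent scoring.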
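A minimal sketch that reproduces the summary table, assuming Python's standard `statistics` module (these notes do not name a tool). The helper name `describe` is illustrative only.

```python
import statistics

# Runs scored in matches 1-10 (from the table above)
virat = [40, 50, 60, 45, 55, 47, 53, 48, 52, 50]
ishan = [60, 40, 20, 0, 50, 40, 5, 10, 35, 25]

def describe(scores):
    """Return the central-tendency and range measures used in the table."""
    return {
        "mean": statistics.mean(scores),      # sum of values / number of values
        "median": statistics.median(scores),  # middle value of the sorted scores
        "mode": statistics.mode(scores),      # most frequent score
        "range": max(scores) - min(scores),   # max - min
    }

for name, scores in [("Virat Kohli", virat), ("Ishan Kishan", ishan)]:
    print(name, describe(scores))
```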
When to use which measure of central tendency?
It depends on whether the variable (column/feature) is numeric or categorical.
Numeric: mean or median (the median is preferred when the data has extreme values)
Categorical: mode
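The rule above can be sketched as a small helper that looks at the variable type. The function `central_tendency` is hypothetical, written only to illustrate the rule; it does not decide between mean and median for you, so it returns both for numeric data.

```python
import statistics
from numbers import Number

def central_tendency(values):
    """Hypothetical helper: pick measures of central tendency by variable type.

    Numeric column     -> mean and median (prefer the median if there are extreme values)
    Categorical column -> mode
    """
    # bool is a subclass of int, so exclude it explicitly from "numeric"
    is_numeric = all(isinstance(v, Number) and not isinstance(v, bool) for v in values)
    if is_numeric:
        return {"mean": statistics.mean(values), "median": statistics.median(values)}
    return {"mode": statistics.mode(values)}

print(central_tendency([40, 50, 60]))
print(central_tendency(["M", "F", "F"]))
```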
Example of a categorical variable: Gender
M, F, F, F, M, M, F, F, F, F
Mode: F (7 of the 10 values)
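For a categorical column the mode comes from a frequency count. A minimal sketch using `collections.Counter` from the Python standard library (our choice of tool, not something these notes specify):

```python
from collections import Counter

# Gender column from the example above
gender = ["M", "F", "F", "F", "M", "M", "F", "F", "F", "F"]

counts = Counter(gender)               # frequency of each category
mode, freq = counts.most_common(1)[0]  # the most frequent category is the mode
print(dict(counts), mode, freq)
```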
Match      Virat Kohli   (x - mean)²
Match 1    60            56.25
Match 2    59            42.25
Match 3    56            12.25
Match 4    50            6.25
Match 5    50            6.25
Match 6    50            6.25
Match 7    50            6.25
Match 8    50            6.25
Match 9    50            6.25
Match 10   50            6.25
Mean = 525 / 10 = 52.5
Range = 60 - 50 = 10
Variance = sum of (x - mean)² / n = 154.5 / 10 = 15.45
Variance is a measure of spread: how far the data points move away from the mean.
Variance is low when most data points are close to the mean.
Variance is high when many data points are far from the mean.
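The variance calculation above can be followed step by step in plain Python. This sketch divides by n (population variance) to match the notes, and also takes the square root to get the standard deviation, which the notes list but do not compute.

```python
import math

# Virat Kohli's scores from the table above
scores = [60, 59, 56, 50, 50, 50, 50, 50, 50, 50]

n = len(scores)
mean = sum(scores) / n                            # 525 / 10 = 52.5
squared_devs = [(x - mean) ** 2 for x in scores]  # the (x - mean)^2 column
variance = sum(squared_devs) / n                  # population variance: divide by n
std_dev = math.sqrt(variance)                     # standard deviation = sqrt(variance)
value_range = max(scores) - min(scores)           # max - min

print(mean, squared_devs[:3], variance, round(std_dev, 2), value_range)
```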
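Sorted scores from the first table (needed for the median)
Virat Kohli   Ishan Kishan
40            0
45            5
47            10
48            20
50            25
50            35
52            40
53            40
55            50
60            60
Median: n = 10 is even, so take the average of the 5th and 6th sorted values.
Virat Kohli: (50 + 50) / 2 = 50
Ishan Kishan: (25 + 35) / 2 = 30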
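The median rule can be sketched by hand: sort, then pick the middle position, averaging the two middle values when n is even. The function below is illustrative (the name `median` is ours); Python's built-in `statistics.median` gives the same results.

```python
def median(values):
    """Median via the odd/even position rule (positions are 1-based)."""
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        # n odd: the value at position (n + 1) / 2
        return s[(n + 1) // 2 - 1]
    # n even: average of the values at positions n/2 and n/2 + 1
    return (s[n // 2 - 1] + s[n // 2]) / 2

virat = [40, 50, 60, 45, 55, 47, 53, 48, 52, 50]
ishan = [60, 40, 20, 0, 50, 40, 5, 10, 35, 25]
print(median(virat), median(ishan), median([40, 50, 60]))
```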
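Salary    Salary with an extreme value
10000     10000
15000     15000
35000     35000
12000     12000
18000     18000
20000     20000
11000     11000
14000     14000
10000     10000
14000     140000

Measure   Salary   With extreme value   Change
Mean      15900    28500                12600   (sum of values / number of values)
Median    14000    14500                500     (middle value(s) of the sorted data)
The mean is affected far more by the extreme value than the median (a change of 12600 vs 500), so the median is the better measure of central tendency when the data contains extreme values.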
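A minimal sketch of the salary comparison, assuming Python's standard `statistics` module. The second list is the same column with the last value replaced by the extreme value 140000, as in the table.

```python
import statistics

salary = [10000, 15000, 35000, 12000, 18000, 20000, 11000, 14000, 10000, 14000]
# Same column, but the last value is replaced by an extreme value (140000)
salary_extreme = salary[:-1] + [140000]

for label, data in [("original", salary), ("with extreme value", salary_extreme)]:
    print(label, statistics.mean(data), statistics.median(data))

# How much each measure moves because of the single extreme value
mean_shift = statistics.mean(salary_extreme) - statistics.mean(salary)
median_shift = statistics.median(salary_extreme) - statistics.median(salary)
print(mean_shift, median_shift)
```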
Player A
Scores: 40, 50, 60
Mean: 50
Variance: 66.67
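Median = middle value of the sorted data
If n is odd: the value at position (n + 1) / 2
If n is even: the average of the values at positions n/2 and (n + 2)/2
e.g. Player A (n = 3): sorted 40, 50, 60, so the median is the 2nd value, 50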
Player B
Scores: 100, 0, 50
Mean: 50
Variance: 1666.67
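Which player will have higher variance?
Player B. Both players have a mean of 50, but Player B's scores (100, 0, 50) lie much farther away from the mean than Player A's (40, 50, 60), so Player B's variance is far higher (1666.67 vs 66.67).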
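A minimal sketch of the Player A vs Player B comparison, assuming Python's standard `statistics` module. `statistics.pvariance` divides by n, matching the variances in these notes; `statistics.variance` would divide by n - 1 and give larger numbers.

```python
import statistics

player_a = [40, 50, 60]
player_b = [100, 0, 50]

for name, scores in [("Player A", player_a), ("Player B", player_b)]:
    # Same mean, very different spread around it
    print(name, statistics.mean(scores), round(statistics.pvariance(scores), 2))
```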