
Tutorial - 1 Introduction To Statistics and Data Analysis

The document outlines a course on statistics and data analysis, covering topics such as probability, random variables, and measures of central tendency. It includes examples of statistical concepts like sample mean, median, mode, and variance, along with practical applications in data collection and analysis. The document also discusses the importance of box-plots in comparing data distributions.

Uploaded by

mahmoud khairy

Course Contents

1. Introduction to Statistics and Data Analysis
2. Probability
3. Random Variables and Probability Distributions
4. Mathematical Expectation
5. Some Discrete Probability Distributions
6. Some Continuous Probability Distributions

Copyright © 2010 Pearson Addison-Wesley. All rights reserved.


Chapter 1

Introduction to Statistics and Data Analysis

Chapter Outline

1.1 Overview: Statistical Inference, Samples, Populations, and the Role of Probability
1.2 Sampling Procedures; Collection of Data
1.3 Measures of Location: The Sample Mean and Median
1.4 Measures of Variability
1.5 Discrete and Continuous Data
1.6 Statistical Modeling, Scientific Inspection, and Graphical Diagnostics

Example

Two samples of 10 northern red oak seedlings were planted in a greenhouse, one containing seedlings treated with nitrogen and the other containing seedlings with no nitrogen. The stem weights in grams were recorded after the end of 140 days. The data are given as follows:

The Dot Plot

Fundamental Relationship between Probability and Inferential Statistics

Measures of Location (Central Tendency)

• The data (observations) often tend to be concentrated around the center of the data.
• Some measures of location are the mean, the mode, and the median.
• These measures are considered representative (or typical) values of the data. They are designed to give a quantitative measure of where the center of the data lies in the sample.

Sample Mean

Example

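Suppose that the following data are the ages (in years) of a sample of 3 men:

\[ x_1 = 30, \quad x_2 = 35, \quad x_3 = 27. \]

Then the sample mean is:

\[ \bar{x} = \frac{30 + 35 + 27}{3} = \frac{92}{3} \approx 30.67 \]

Note: the deviations from the mean sum to zero (exactly when \(\bar{x} = 92/3\) is used, and up to rounding when 30.67 is used):

\[ \sum_{i=1}^{3} (x_i - \bar{x}) = (30 - 30.67) + (35 - 30.67) + (27 - 30.67) \approx 0 \]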
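As a rough check of the arithmetic above, the same calculation can be sketched in Python. The function name `sample_mean` is an illustrative choice and does not come from the slides.

```python
# Ages (in years) from the example above.
ages = [30, 35, 27]

def sample_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)

x_bar = sample_mean(ages)

# The deviations from the mean sum to zero, up to floating-point rounding.
deviation_sum = sum(x - x_bar for x in ages)

print(round(x_bar, 2))            # sample mean rounded to two decimals
print(abs(deviation_sum) < 1e-9)  # True when the deviations cancel out
```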

Sample Mean as a Centroid of the With-Nitrogen Stem Weight

Median

• e.g. 4, 2, 1, 4, 5, 2, 1
Sorted: 1, 1, 2, 2, 4, 4, 5
Therefore the median is 2 (the middle value)
• e.g. 4, 2, 1, 4, 5, 2
Sorted: 1, 2, 2, 4, 4, 5
Therefore the median is (2 + 4)/2 = 3 (the average of the two middle values)
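The two cases above (odd and even number of observations) can be sketched as a small Python function. The name `sample_median` is hypothetical; Python's standard `statistics.median` follows the same rule.

```python
def sample_median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)          # step 1: sort the observations
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle value.
        return ordered[mid]
    # Even n: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(sample_median([4, 2, 1, 4, 5, 2, 1]))  # odd case from the slide
print(sample_median([4, 2, 1, 4, 5, 2]))     # even case from the slide
```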

Mode

• The mode of a set of quantitative data is the most frequently occurring measurement in the data set.
• If no measurement occurs more than once, then there is no mode.
• There may be several modes if more than one value occurs with the same highest frequency.

e.g. 2, 4, 5, 1, 7, 9, 0 : No mode
2, 4, 2, 5, 4, 2 : Mode is 2
2, 4, 2, 5, 4, 2, 4, 7 : Modes are 2 and 4
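A minimal Python sketch of the three rules above, using the convention from this slide that data with no repeated value have no mode. The function name `sample_modes` is an assumption made for illustration.

```python
from collections import Counter

def sample_modes(values):
    """Return a sorted list of modes; empty if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # No measurement occurs more than once: no mode.
        return []
    # Every value that reaches the highest frequency is a mode.
    return sorted(v for v, c in counts.items() if c == highest)

print(sample_modes([2, 4, 5, 1, 7, 9, 0]))     # no mode
print(sample_modes([2, 4, 2, 5, 4, 2]))        # one mode
print(sample_modes([2, 4, 2, 5, 4, 2, 4, 7]))  # two modes
```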

Sample Variance

Example 1

Compute the sample variance and standard deviation of the following observations (ages in years): 10, 21, 33, 53, 54.

Solution: n = 5

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{10 + 21 + 33 + 53 + 54}{5} = \frac{171}{5} = 34.2 \text{ years} \]

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{5} (x_i - 34.2)^2}{5 - 1} \]

\[ = \frac{(10 - 34.2)^2 + (21 - 34.2)^2 + (33 - 34.2)^2 + (53 - 34.2)^2 + (54 - 34.2)^2}{4} = \frac{1506.8}{4} = 376.7 \text{ (years)}^2 \]

\[ s = \sqrt{s^2} = \sqrt{376.7} = 19.41 \text{ years} \]
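The computation in this example can be reproduced with a short Python sketch. The helper name `sample_variance` is illustrative; it uses the n - 1 denominator shown in the solution.

```python
import math

def sample_variance(values):
    """Sample variance with the n - 1 denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

ages = [10, 21, 33, 53, 54]   # ages in years, from the example
s2 = sample_variance(ages)    # in (years)^2
s = math.sqrt(s2)             # in years

print(round(s2, 1), round(s, 2))
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 definition and could be used instead.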


Example 2

A sample of 10 students scored the following grades: 40, 42, 35, 54, 57, 54, 46, 42, 54, 57.
(i) Find the sample mean, mode and median.
(ii) Compute the standard deviation.

Solution:
(i) Listing the scores in order: 35, 40, 42, 42, 46, 54, 54, 54, 57, 57

\[ \text{Mean} = \bar{x} = \frac{35 + 40 + 42 + 42 + 46 + 54 + 54 + 54 + 57 + 57}{10} = \frac{481}{10} = 48.1 \]

\[ \text{Mode} = 54, \qquad \text{Median} = \tilde{x} = \frac{46 + 54}{2} = 50 \]

(ii)

\[ s = \sqrt{\frac{1}{9}\left[(35 - 48.1)^2 + (40 - 48.1)^2 + \cdots + (57 - 48.1)^2\right]} = \sqrt{\frac{578.9}{9}} \approx 8.02 \]
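As a cross-check, the four statistics for these grades can be computed with Python's standard `statistics` module. This is only a verification sketch; the variable names are illustrative.

```python
import statistics

grades = [40, 42, 35, 54, 57, 54, 46, 42, 54, 57]

mean = statistics.mean(grades)      # arithmetic mean
median = statistics.median(grades)  # average of the two middle values (n is even)
mode = statistics.mode(grades)      # most frequent grade
s = statistics.stdev(grades)        # sample standard deviation (n - 1 denominator)

print(mean, median, mode, round(s, 2))
```

Evaluated this way, the sample standard deviation comes out at about 8.02.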
Comparing

• The range is the numerical difference between the largest and the smallest values in a batch of data:
range = max – min
• The lower quartile, denoted by Q1, is the median of the lower half of the batch of data.
• The upper quartile, denoted by Q3, is the median of the upper half of the batch of data.
• The inter-quartile range is defined by Q3 – Q1.
• A box-plot is a diagram consisting of a box and whiskers that displays the median, the quartiles, and the maximum and minimum values in a batch of data.

[Figure: box-plot with min, Q1, median, Q3, and max labelled]
Example 1

For the batch of data

4, 5, 6, 6, 7, 11, 12, 14, 16, 20, 22, 29

Min = 4
Max = 29
Q1 = (6 + 6)/2 = 6
Q3 = (16 + 20)/2 = 18
Inter-quartile range = 18 – 6 = 12
Median = (11 + 12)/2 = 11.5

[Figure: box-plot with whiskers from min = 4 to max = 29, box from Q1 = 6 to Q3 = 18, and median line at 11.5]
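The five values marked on a box-plot can be computed with a short Python sketch that follows the "median of each half" definition of the quartiles. The function names are hypothetical, and the sketch assumes that for an odd number of observations the overall median is left out of both halves. Library routines often use interpolation conventions that can give slightly different quartiles.

```python
def median_of_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return min, Q1, median, Q3, max and the inter-quartile range."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]        # lower half of the batch
    upper = ordered[n - half:]    # upper half (skips the median when n is odd)
    q1 = median_of_sorted(lower)
    q3 = median_of_sorted(upper)
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median_of_sorted(ordered),
        "q3": q3,
        "max": ordered[-1],
        "iqr": q3 - q1,
    }

batch = [4, 5, 6, 6, 7, 11, 12, 14, 16, 20, 22, 29]
summary = five_number_summary(batch)
print(summary)
```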
Example 2

The table below gives the gross weekly earnings, including overtime, in pounds of 20 actors working in a theatre (9 women and 11 men):

Women 221 272 334 361 372 399 415 456 510
Men 258 315 333 353 398 420 435 462 495 523 587

(a) Draw an accurate diagram of the box-plots.
(b) What do the box-plots tell you about the relative earnings of male and female actors?

Example 2

          For women                  For men
Min       221                        258
Max       510                        587
Q1        (272 + 334)/2 = 303        333
Q3        (415 + 456)/2 = 435.5      495
Median    372                        420

[Figure: box-plots for women and men drawn on a common axis with ticks at 160, 320, 480, 640]

Example 2 (continued)

From the box-plots it is clear that the men’s earnings are generally higher than the women’s: all five values marked on the box-plots (minimum, Q1, median, Q3, and maximum) are higher for the men than for the women.
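The comparison can be checked numerically with a minimal Python sketch. The helper names are illustrative, and the quartiles are taken as medians of the lower and upper halves with the overall median excluded when the number of observations is odd, which matches the values in the solution.

```python
def median_of_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a batch of data."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower, upper = ordered[:half], ordered[n - half:]
    return (ordered[0], median_of_sorted(lower), median_of_sorted(ordered),
            median_of_sorted(upper), ordered[-1])

# Gross weekly earnings in pounds, from the table in Example 2.
women = [221, 272, 334, 361, 372, 399, 415, 456, 510]
men = [258, 315, 333, 353, 398, 420, 435, 462, 495, 523, 587]

women_summary = five_number_summary(women)
men_summary = five_number_summary(men)

# True only if each of the five marked values is higher for the men.
men_higher_everywhere = all(m > w for m, w in zip(men_summary, women_summary))

print(women_summary)
print(men_summary)
print(men_higher_everywhere)
```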
