QUAN201: Introduction to Business Statistics
Presentation 1
Lecture Topics
Define Statistics
Applications of Statistics in Business
Difference between Descriptive and Inferential Statistics
Some Basic Concepts
Data Collection
Data Presentation via Tables and Graphs
What is Statistics?
Facts and figures
Branch of mathematics
Science of gathering, analyzing, interpreting, and presenting data
Application of Statistics in Business
Accounting: auditing and cost estimation
Economics: regional, national, and international economic performance
Finance: investments and portfolio management
Management: human resources and quality management
Management Information Systems: performance of systems which gather, summarize, and disseminate information to various managerial levels
Marketing: market analysis and consumer research
International Business: market and demographic analysis
Some Statistical Concepts
Population vs. Sample
Census
Descriptive vs. Inferential Statistics
Parameter vs. Statistic
Variables, Data
Population Versus Sample
Population: the whole; a collection of persons, objects, or items under study
Sample: a portion of the whole; a subset of the population
Census: gathering data from the entire population
Population and Census Data
Identifier   Color
RD1          Red
RD2          Red
RD3          Red
RD4          Red
RD5          Red
BL1          Blue
BL2          Blue
GR1          Green
GR2          Green
GY1          Gray
GY2          Gray
GY3          Gray
Sample and Sample Data
Identifier   Color
RD2          Red
RD5          Red
GR1          Green
GY2          Gray
Descriptive vs. Inferential Statistics
Descriptive Statistics: using data gathered on a group to describe or reach conclusions about that same group only.
Inferential Statistics: using sample data to reach conclusions about the population from which the sample was taken.
Descriptive Statistics
Collect data, e.g. survey
Present data, e.g. tables and graphs
Characterize data, e.g. sample mean $\bar{x} = \frac{\sum x_i}{n}$
Descriptive Statistics
Descriptive statistics involves the arrangement, summary, and presentation of data to enable meaningful interpretation and to support decision making.
Descriptive statistics methods make use of graphical techniques and numerical descriptive measures.
The methods presented apply both to the entire population and to a sample of the population.
Inferential Statistics
Drawing conclusions and/or making decisions concerning a population based on sample results.
Estimation, e.g. estimate the population mean weight using the sample mean weight
Hypothesis testing, e.g. test the claim that the population mean weight is 120 pounds
Quiz
Which of the following statements involve descriptive statistics as opposed to inferential statistics?
The Alcohol, Tobacco and Firearms Department reported that Houston had 1,791 registered gun dealers in 2006.
Based on a survey of 400 magazine readers, the magazine reports that 45% of its readers prefer double column articles.
Based on a sample of 300 professional tennis players, a tennis magazine reported that 25% of the parents of all professional tennis players did not play tennis.
Parameter vs. Statistic
Parameter: descriptive measure of the population
Usually represented by Greek letters
Statistic: descriptive measure of a sample
Usually represented by Roman letters
Symbols for Population Parameters
$\mu$ denotes population mean
$\sigma^2$ denotes population variance
$\sigma$ denotes population standard deviation
Symbols for Sample Statistic
$\bar{x}$ denotes sample mean
$S^2$ denotes sample variance
$S$ denotes sample standard deviation
Process of Inferential Statistics
Select a random sample from the population.
Calculate $\bar{x}$ (statistic) from the sample to estimate $\mu$ (parameter) of the population.
Definitions
A variable is some characteristic of a population or sample that is of interest to us, e.g. student grades. Typically denoted with a capital letter: X, Y, Z
Data are the observed values of a variable. E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
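As a small illustration, the sample mean of the student marks listed above can be computed as the sum of the observations divided by their count. This is only a minimal Python sketch; the variable names are chosen here for illustration and are not part of the lecture.

```python
# Observed values of the variable X = student marks (from the example above)
marks = [67, 74, 71, 83, 93, 55, 48]

n = len(marks)                # number of observations
sample_mean = sum(marks) / n  # sum of the observations divided by n

print(f"n = {n}, sample mean = {sample_mean:.2f}")
```

Because the marks are treated here as a sample, the result is a statistic; the same arithmetic applied to every member of a population would give a parameter.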
Why We Need Data
To provide input to survey
To provide input to study
To measure performance of service or production process
To evaluate conformance to standards
To assist in formulating alternative courses of action
To satisfy curiosity
Data Sources
Primary (Data Collection): Observation, Survey, Experimentation
Secondary (Data Compilation): Print or Electronic
Types of Data
Knowing the type of data is necessary to properly select the technique to be used when analyzing data.
Data
Categorical (Qualitative)
Numerical (Quantitative): Discrete or Continuous
Types of data - examples
Quantitative data
Age   Income
55    75000
42    68000
.     .
Weight gain
+10
+5
.
Qualitative data
Person   Marital status
1        married
2        single
3        single
.        .
Computer   Brand
1          IBM
2          Dell
3          IBM
.          .
Data Presentation via Tables and Graphs
Organizing numerical/Quantitative data
The ordered array and stem-leaf display
Tabulating and graphing Univariate numerical/quantitative data
Frequency distributions: tables, histograms, polygons
Cumulative distributions: tables, the Ogive
Data Presentation via Tables and Graphs (continued..)
Tabulating and graphing Univariate categorical/qualitative data
The frequency distribution table
Bar and pie charts, the Pareto diagram
Graphing Bivariate numerical data
Organizing Numerical/Quantitative Data
Numerical Data
41, 24, 32, 26, 27, 27, 30, 24, 38, 21
Ordered Array
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Stem and Leaf Display
2 | 144677
3 | 028
4 | 1
Frequency Distributions: Tables, Histograms, Polygons
Cumulative Distributions: Tables, Ogive
Ungrouped Versus Grouped Data
Ungrouped data: have not been summarized in any way; are also called raw data
Grouped data: have been organized into a frequency distribution
Example of Ungrouped Data
Ages of a Sample of Managers in the United Arab Emirates
42 30 53 50 52 30 55 49 61 74
26 58 40 40 28 36 30 33 31 37
32 37 30 32 23 32 58 43 30 29
34 50 47 31 35 26 64 46 40 43
57 30 49 40 25 50 52 32 60 54
Frequency Distribution of Managers Ages (An example of a grouped data)
Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1
How to Construct a Frequency Distribution (or Tally) Table/Chart?
Find range: (51)
Select number of classes: (6)
Compute class interval (width): (10)
Determine class boundaries (limits): (20, 30, 40, 50, 60, 70, 80)
Count observations & assign to classes
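The construction steps above can be sketched in Python using the managers' ages from this lecture. This is a minimal illustration rather than a prescribed method: the starting boundary of 20 and the variable names are assumptions made for the sketch, and the class width of 10 is the rounded value chosen in the lecture.

```python
# Ages of the sample of managers (50 observations from the lecture)
ages = [
    42, 30, 53, 50, 52, 30, 55, 49, 61, 74,
    26, 58, 40, 40, 28, 36, 30, 33, 31, 37,
    32, 37, 30, 32, 23, 32, 58, 43, 30, 29,
    34, 50, 47, 31, 35, 26, 64, 46, 40, 43,
    57, 30, 49, 40, 25, 50, 52, 32, 60, 54,
]

# Step 1: find the range (largest minus smallest)
data_range = max(ages) - min(ages)

# Step 2: select the number of classes
num_classes = 6

# Step 3: approximate class width, then a convenient rounded-up width
approx_width = data_range / num_classes
class_width = 10  # convenient value used in the lecture

# Step 4: class boundaries, starting at a convenient value at or below
# the smallest observation (assumption: 20)
start = 20
boundaries = [start + i * class_width for i in range(num_classes + 1)]

# Step 5: count the observations falling in each "lower-under upper" class
frequencies = [0] * num_classes
for age in ages:
    frequencies[(age - start) // class_width] += 1

for lower, freq in zip(boundaries, frequencies):
    print(f"{lower}-under {lower + class_width}: {freq}")
```

Integer division by the class width assigns a value equal to a boundary (for example 30) to the class that starts at that boundary, which matches the "30-under 40" convention.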
Data Range
42 30 53 50 52 30 55 49 61 74 26 58 40 40 28 36 30 33 31 37 32 37 30 32 23 32 58 43 30 29 34 50 47 31 35 26 64 46 40 43 57 30 49 40 25 50 52 32 60 54
Range = Largest - Smallest = 74 - 23 = 51
Number of Classes and Class Width
The number of classes should be between 5 and 15.
Fewer than 5 classes cause excessive summarization.
More than 15 classes leave too much detail.
Divide the range by the number of classes for an approximate class width.
Round up to a convenient number.
Class Width
Approximate Class Width $= \frac{51}{6} = 8.5$
Class Width $= 10$
Tally Chart of Managers Ages
Class Interval   Tallies
20-under 30      IIIII I
30-under 40      IIIII IIIII IIIII III
40-under 50      IIIII IIIII I
50-under 60      IIIII IIIII I
60-under 70      III
70-under 80      I
Class Midpoints, Relative Frequencies, and Cumulative Frequencies
Class Midpoint $= \frac{\text{beginning class endpoint} + \text{ending class endpoint}}{2} = \frac{30 + 40}{2} = 35$
Relative Frequency
Class Interval   Frequency   Relative Frequency
20-under 30      6           .12  (= 6/50)
30-under 40      18          .36  (= 18/50)
40-under 50      11          .22
50-under 60      11          .22
60-under 70      3           .06
70-under 80      1           .02
Total            50          1.00
Cumulative Frequency
Class Interval   Frequency   Cumulative Frequency
20-under 30      6           6
30-under 40      18          24  (= 18 + 6)
40-under 50      11          35  (= 11 + 24)
50-under 60      11          46
60-under 70      3           49
70-under 80      1           50
Total            50
Class Midpoints, Relative Frequencies, and Cumulative Frequencies
Class Interval   Frequency   Midpoint   Relative Frequency   Cumulative Frequency
20-under 30      6           25         .12                  6
30-under 40      18          35         .36                  24
40-under 50      11          45         .22                  35
50-under 60      11          55         .22                  46
60-under 70      3           65         .06                  49
70-under 80      1           75         .02                  50
Total            50                     1.00
Cumulative Relative Frequencies
Class Interval   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
20-under 30      6           .12                  6                      .12
30-under 40      18          .36                  24                     .48
40-under 50      11          .22                  35                     .70
50-under 60      11          .22                  46                     .92
60-under 70      3           .06                  49                     .98
70-under 80      1           .02                  50                     1.00
Total            50          1.00
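The columns of these tables can be derived from the class frequencies alone. The following Python sketch shows one way to do it, assuming the class limits and frequencies from the managers' ages example; the variable names are illustrative.

```python
from itertools import accumulate

# Lower class limits, class width, and frequencies from the managers' ages example
lower_limits = [20, 30, 40, 50, 60, 70]
class_width = 10
frequencies = [6, 18, 11, 11, 3, 1]

total = sum(frequencies)

# Midpoint = (beginning class endpoint + ending class endpoint) / 2
midpoints = [(lower + (lower + class_width)) / 2 for lower in lower_limits]

# Relative frequency = class frequency / total frequency
relative = [f / total for f in frequencies]

# Cumulative frequency = running total of the class frequencies
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency = cumulative frequency / total frequency
cumulative_relative = [c / total for c in cumulative]

for lower, f, m, r, c, cr in zip(
    lower_limits, frequencies, midpoints, relative, cumulative, cumulative_relative
):
    print(f"{lower}-under {lower + class_width}: "
          f"f={f}, mid={m:.0f}, rel={r:.2f}, cum={c}, cum rel={cr:.2f}")
```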
Histogram
Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1
[Figure: histogram of the managers' ages, Frequency (0 to 20) on the vertical axis against Years (10 to 80) on the horizontal axis]
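As a rough stand-in for the drawn histogram, the same frequencies can be rendered as a text chart in which each bar is a row of asterisks whose length equals the class frequency. This is a simplified sketch for illustration only (a true histogram draws adjacent vertical bars over the class intervals); the names used are assumptions.

```python
# Class intervals and frequencies from the managers' ages example
class_labels = ["20-under 30", "30-under 40", "40-under 50",
                "50-under 60", "60-under 70", "70-under 80"]
frequencies = [6, 18, 11, 11, 3, 1]

# One text bar per class: the number of asterisks equals the class frequency
bars = [f"{label:<12} {'*' * freq} ({freq})"
        for label, freq in zip(class_labels, frequencies)]

print("\n".join(bars))
```

Even in this crude form the longer tail toward the older classes is visible.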
Shapes of Histograms
Symmetry
A histogram is said to be symmetric if, when we draw a vertical line down the center of the histogram, the two sides are identical in shape and size:
[Figure: three examples of symmetric histograms, each plotting Frequency against Variable]
Shapes of Histograms
Skewness
A skewed histogram is one with a long tail extending to either the right or the left:
[Figure: two histograms plotting Frequency against Variable: Positively Skewed (long tail to the right) and Negatively Skewed (long tail to the left)]
Shapes of Histograms
Bell Shape
A special type of symmetric histogram is one that is bell shaped.
Many statistical techniques require that the population be bell shaped. Drawing the histogram helps verify the shape of the population in question.
[Figure: bell-shaped histogram plotting Frequency against Variable]
Why do we need Histograms?
Suppose a cereal manufacturer wants to compare the performance of two plants, that is, whether each plant is accurately filling boxes with 500 grams of cereal. The manufacturer picks a sample of 100 cereal boxes from each plant and prepares a separate histogram for each sample. These histograms can provide information about how accurately each plant is working.
Histogram Comparison
Compare & contrast the following histograms based on data from Example 2.6 & Example 2.7.
The two courses have very different histograms:
unimodal vs. bimodal
spread of the marks (narrower vs. wider)
Frequency Polygon
Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1
[Figure: frequency polygon of the managers' ages, Frequency (0 to 20) on the vertical axis against Years (10 to 80) on the horizontal axis]
Ogive
Class Interval   Cumulative Frequency
20-under 30      6
30-under 40      24
40-under 50      35
50-under 60      46
60-under 70      49
70-under 80      50
[Figure: ogive of the managers' ages, cumulative Frequency (0 to 60) on the vertical axis against Years (0 to 80) on the horizontal axis]
Relative Frequency Ogive
Class Interval   Cumulative Relative Frequency
20-under 30      .12
30-under 40      .48
40-under 50      .70
50-under 60      .92
60-under 70      .98
70-under 80      1.00
[Figure: relative frequency ogive, Cumulative Relative Frequency (0.00 to 1.00) on the vertical axis against Years (0 to 80) on the horizontal axis]
Stem and Leaf Display
This is a graphical technique most often used in a preliminary analysis. Stem and leaf diagrams use the actual values of the original observations, whereas the histogram does not.
Stem and Leaf Display
Split each observation into two parts. There are several ways of doing that:
Observation   Stem   Leaf
42.19         42     19
42.19         4      2
Safety Examination Scores for Plant Trainees
Raw Data
86 77 91 60 55
76 92 47 88 67
23 59 72 75 83
77 68 82 97 89
81 75 74 39 67
79 83 70 78 91
68 49 56 94 81
Stem   Leaf
2      3
3      9
4      79
5      569
6      07788
7      0245567789
8      11233689
9      11247
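The stem and leaf display for the safety examination scores can be reproduced with a short Python sketch. It assumes two-digit scores, with the tens digit as the stem and the units digit as the leaf; the function name `stem_and_leaf` is a hypothetical helper introduced here for illustration.

```python
from collections import defaultdict

# Safety examination scores for plant trainees (35 observations from the lecture)
scores = [
    86, 77, 91, 60, 55, 76, 92, 47, 88, 67,
    23, 59, 72, 75, 83, 77, 68, 82, 97, 89,
    81, 75, 74, 39, 67, 79, 83, 70, 78, 91,
    68, 49, 56, 94, 81,
]

def stem_and_leaf(values):
    """Group each value's units digit (leaf) under its tens digit (stem)."""
    groups = defaultdict(list)
    for value in sorted(values):         # sorting puts the leaves in ascending order
        stem, leaf = divmod(value, 10)   # e.g. 86 -> stem 8, leaf 6
        groups[stem].append(leaf)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(groups.items())}

display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```

Note that this sketch lists only stems that contain at least one observation; a stem with no leaves would be omitted rather than shown as an empty row.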
Graphical Techniques for Qualitative data
When the raw data can be naturally categorized in a meaningful manner, we can display frequencies by
Pie chart: emphasizes the proportion of occurrences of each category.
Bar chart: emphasizes the frequency of occurrences of the different categories.
The Pie Chart
The pie chart is a circle, subdivided into a number of slices that represent the various categories.
The size of each slice is proportional to the percentage corresponding to the category it represents.
Second Quarter U.S. Truck Production (Example 1)
[Figure: pie chart of second quarter U.S. truck production by company, with slices of 39%, 39%, 17%, 4%, and 1%]
Pie Chart Calculations for Company A
Company   2d Quarter Truck Production   Proportion   Degrees
A         357,411                       .388         140
B         354,936                       .386         139
C         160,997                       .175         63
D         34,099                        .037         13
E         12,747                        .014         5
Totals    920,190                       1.000        360
Proportion for Company A: $\frac{357{,}411}{920{,}190} = .388$
Degrees for Company A: $.388 \times 360 = 140$
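The pie chart calculations for the truck production data can be checked with a few lines of Python. This is a minimal sketch; the dictionary layout and variable names are assumptions made for illustration, and the degrees are computed from the unrounded proportions, which gives the same rounded values as the table.

```python
# Second quarter truck production by company (from the lecture table)
production = {"A": 357_411, "B": 354_936, "C": 160_997, "D": 34_099, "E": 12_747}

total = sum(production.values())

proportions = {}
degrees = {}
for company, units in production.items():
    proportions[company] = units / total        # share of total production
    degrees[company] = proportions[company] * 360  # slice angle out of 360 degrees

for company in production:
    print(f"{company}: proportion = {proportions[company]:.3f}, "
          f"degrees = {degrees[company]:.0f}")
```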
The Pie Chart
Example 2
The student placement office at a university wanted to determine the general areas of employment of last year's graduates.
Data were collected, and the count of occurrences was recorded for each area. These counts were converted to proportions, and the results were presented as a pie chart and a bar chart.
Frequency and Relative Frequency Distributions for Example 2
Area                 Frequency   Relative Frequency
Accounting           73          28.8%
Finance              52          20.6%
General Management   36          14.2%
Marketing/Sales      64          25.3%
Other                28          11.1%
Total                253         100%
The Pie Chart
[Figure: pie chart with slices Accounting 28.9%, Finance 20.6%, General management 14.2%, Marketing 25.3%, Other 11.1%]
Angle of the Accounting slice: $(28.9/100)(360^\circ) = 104^\circ$
The Bar Chart
Rectangles represent each category.
The height of the rectangle represents the frequency.
The base of the rectangle is arbitrary.
[Figure: bar chart of Frequency (0 to 80) by Area, with bar heights 73, 52, 36, 64, and 28]
Graphing the Relationship Between Two Quantitative Variables
To explore this relationship, we employ a scatter diagram, which plots two variables against one another. The independent variable is labeled X and is usually placed on the horizontal axis, while the other, dependent variable, Y, is mapped to the vertical axis.
Scatter Diagram
Example 2.9 A real estate agent wanted to know to what extent the selling price of a home is related to its size
1) Collect the data
2) Determine the independent variable (X = house size) and the dependent variable (Y = selling price)
3) Use Excel to create a scatter diagram
Scatter Diagram
It appears that there is in fact a relationship: the greater the house size, the greater the selling price.
Patterns of Scatter Diagrams
Linearity and Direction are two concepts we are interested in
[Figure: three scatter diagrams showing a Positive Linear Relationship, a Negative Linear Relationship, and a Weak or Non-Linear Relationship]