Module 1

This document provides an overview of descriptive statistics and its applications. It discusses what statistics is, the different types of statistics (descriptive and inferential), why statistics is important to study, and examples of how statistics is applied in business. It also includes a case study on an Indian ridesharing company, Ola Cabs, that analyzes data on over 25,000 rides to identify reasons for vehicle unavailability and cancellations. Key concepts discussed include data, variables, levels of measurement, and types of variables.

Uploaded by

SusheelKumar

MODULE 1

KNOWING DATA AND DESCRIPTIVE STATISTICS

WHAT IS STATISTICS

Science of problem-solving using data. It involves:
▪ collection
▪ classification
▪ summarization
▪ organization
▪ analysis
▪ presentation
▪ interpretation
of data to draw conclusions.

Data are information or propositions used for decision making or drawing conclusions.

A major characteristic of data is variability.
- Number of airline tickets sold by a major airline company on the same date or same time every year varies
- Calories consumed every day vary

One of the major objectives in statistics is to study and understand the source of variability.
TYPES OF STATISTICS
Descriptive statistics include organizing,
presenting and summarizing data. These
describe data through numerical summaries,
tables, and graphs.
E.g. numerical summaries (mean, median,
standard deviation etc.), graphs (histogram,
Pareto chart, box plot etc.)

Inferential statistics include generalizing the
results obtained from a sample, extending
them to the entire population and assigning
measures of reliability to the results.
E.g. parameter estimation, confidence and
prediction intervals, hypothesis testing

Concepts of probability and randomness are
intrinsic in the study of statistics
WHY STUDY STATISTICS
“Core skills for businesspeople” – a BusinessWeek article
▪ Analytical and quantitative skills rank high among the most sought-after attributes while hiring.
▪ Individuals with these skills are more likely to earn higher salaries.
▪ For a company, statistical knowledge provides a competitive edge versus those who don’t understand market data.

Communication and Technical Literacy
▪ Statistical terminologies, e.g., normal distribution, confidence interval, p-value, six sigma, are widely used in all areas of business.
▪ Statistics acts as a common language of communication across various interdisciplinary areas.

Computer Skills and Information Management
▪ Various computer software packages and spreadsheets come with in-built data analysis features. Knowing statistics enhances computer skills too.
▪ Insufficient data imply more surveys and sampling; sufficient or excess data reveal underlying relationships when properly mined.
APPLICATION OF STATISTICS IN BUSINESS
▪ Auditing: estimating proportion (sampling distribution) of incorrectly paid invoices to suppliers every
month based on a sample

▪ Marketing: correlation, regression and data mining help identify specific requirements for a
targeted group of customers to market products more efficiently

▪ Health Care: t-tests, ANOVA, survival analysis can help in identifying differences between two or
more therapists in evaluating patients, or differences in the efficacy of two cancer drugs etc.

▪ Quality Improvement: concepts of distribution theory and the process measure standard deviation
(sigma) greatly help in reducing product defects

▪ Operation Management: predictive analyses based on regression techniques or other machine
learning algorithms are useful in studying patterns and forecasting periodic customer demands,
thereby managing supplies strategically

▪ Product Warranty: estimation of average cost of product warranty claims in the first year of sale
based on collected data, and using the estimate to predict future cost incurred to the companies
along with a degree of reliability of the estimate
CASE STUDY – DEMAND SUPPLY GAP
OLA CABS SERVICE (INDIAN CASES C.3 CASE 1)

▪ Bangalore based ridesharing company launched by ANI Technologies Pvt. Ltd. in 2010 in Mumbai
▪ Offers transportation services: superior luxury cars, Ola auto
▪ Functions in 102 cities with 450000 vehicles
▪ Customer care registers many complaints on vehicle unavailability and last moment cancellations on
the Bangalore city – Airport route bookings
▪ Problems:
▪ Lack of car availability during peak hours
▪ Cancellation by the drivers
both leading to loss of potential revenue
▪ The concerned team in the company wanted to identify the cause and find a possible solution to the
problem
▪ Data were collected on 25541 rides for the month of November 2019 on seven important features –
Request ID, Driver ID, Time of Request, Pick-up time, Drop-off time, Pick-up point and Status of the
request (Completed/Cancelled/Not Available)
VARIABLES AND DATA

▪ Subject/Individual/Item: a person or an object or an item under study


▪ Variable: characteristic of the item, e.g., Driver ID, Status of the Request
▪ Observation: a single unit in a collection of information needed for a study
▪ Data set: collection of information on all variables over all subjects considered for a study

Request ID | Driver ID | Time of Request | Pick-up time | Drop-off time | Pick-up point | Status of the Request
1278 | A1 | 01/11/19 9:15 | 01/11/19 9:36 | 01/11/19 10:55 | Airport | Completed
567 | A2 | 01/11/19 13:01 | 01/11/19 13:35 | 01/11/19 15:28 | Airport | Completed
432 | C1 | 03/11/19 17:22 | 03/11/19 17:45 | 03/11/19 20:05 | Airport | Completed
23 | B1 | 03/11/19 14:59 | NA | NA | City | Cancelled
1989 | A3 | 01/11/19 11:24 | NA | NA | City | Not Available

Row: Observation; Column: Variable
Subjects/Items: 25541; Observations: 25541; Variables: 7

This is a sample extracted from the ‘Ola Cabs Service’ data.
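
The row/column terminology can be illustrated with a minimal Python sketch. It assumes the five sample rows are stored as a list of dictionaries; the names `sample` and `variables` are illustrative and not part of the case study, and `None` stands in for "NA".

```python
# Five sample rows from the 'Ola Cabs Service' table; None stands in for "NA".
sample = [
    {"Request ID": 1278, "Driver ID": "A1", "Time of Request": "01/11/19 9:15",
     "Pick-up time": "01/11/19 9:36", "Drop-off time": "01/11/19 10:55",
     "Pick-up point": "Airport", "Status of the Request": "Completed"},
    {"Request ID": 567, "Driver ID": "A2", "Time of Request": "01/11/19 13:01",
     "Pick-up time": "01/11/19 13:35", "Drop-off time": "01/11/19 15:28",
     "Pick-up point": "Airport", "Status of the Request": "Completed"},
    {"Request ID": 432, "Driver ID": "C1", "Time of Request": "03/11/19 17:22",
     "Pick-up time": "03/11/19 17:45", "Drop-off time": "03/11/19 20:05",
     "Pick-up point": "Airport", "Status of the Request": "Completed"},
    {"Request ID": 23, "Driver ID": "B1", "Time of Request": "03/11/19 14:59",
     "Pick-up time": None, "Drop-off time": None,
     "Pick-up point": "City", "Status of the Request": "Cancelled"},
    {"Request ID": 1989, "Driver ID": "A3", "Time of Request": "01/11/19 11:24",
     "Pick-up time": None, "Drop-off time": None,
     "Pick-up point": "City", "Status of the Request": "Not Available"},
]

n_observations = len(sample)        # each dictionary (row) is one observation
variables = list(sample[0].keys())  # each key (column) is one variable
n_variables = len(variables)

print("Observations in this sample:", n_observations)
print("Variables:", n_variables)
```

The full data set has the same 7 variables but 25541 observations.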


TYPES OF DATA

The sample data are mainly collected in one of the two ways.
▪ Cross sectional data: records on attributes of the items are collected at the same point in time without
considering time difference
▪ Time series data: data collected over several time periods like hourly, daily, weekly, monthly, quarterly,
or annually etc.

Cross sectional data: Motor Trend Data

Time series data: Air Passenger Data
Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
1949 | 112 | 118 | 132 | 129 | 121 | 135 | 148 | 148 | 136 | 119 | 104 | 118
1950 | 115 | 126 | 141 | 135 | 125 | 149 | 170 | 170 | 158 | 133 | 114 | 140
1951 | 145 | 150 | 178 | 163 | 172 | 178 | 199 | 199 | 184 | 162 | 146 | 166
1952 | 171 | 180 | 193 | 181 | 183 | 218 | 230 | 242 | 209 | 191 | 172 | 194
TYPES OF VARIABLES
Variable
▪ Qualitative: allows for classification of individuals based on attribute or characteristic, e.g., gender, political affiliation, car brand
▪ Quantitative: numerical measures of individuals, allows for arithmetic operations, e.g., sales, salary, expenditure, demand
o Discrete: countably finite or countably infinite numbers, e.g., no. of defects, no. of cars sold, no. of applications
o Continuous: measured, uncountable or real numbers, e.g., salary, failure time of an instrument, sales
▪ Designator

Let’s discuss the types of the variables given in the ‘Ola Cabs Service’ data (Request ID, Driver ID, Time of Request, Pick-up time, Drop-off time, Pick-up point, Status of the Request).
LEVEL OF MEASUREMENTS – SCALES

Measurement Scale
▪ Qualitative
o Nominal: un-ordered/un-ranked, no ordering of arrangements: name, brand, level, class, type, category
o Ordinal: like nominal but with specific order of arrangements
▪ Quantitative
o Interval: measured in real numbers, operations + or - are meaningful but * or / are not, no defined zero
o Ratio: measured in real numbers, ratio of two values makes sense, properly defined zero implying absence of quantity
LEVEL OF MEASUREMENTS – DISCUSSION
For each example below, determine the scale of measurement:

1. Customer rating for a product: poor, average, fair, good, excellent


o Qualitative variable measured in ordinal scale

2. Total revenue generated by a business every month in the last year (in Rs.)
o Quantitative variable measured in ratio scale

3. Car brand: Tata, Maruti Suzuki, Honda, Hyundai, Toyota, Ford, Volkswagen,
Kia, Nissan, Mercedes
o Qualitative variable measured in nominal scale

4. Today’s temperature in Toronto (°C)


o Quantitative variable measured in interval scale

5. ‘Pick-up time’ or ‘Drop-off time’ in the ‘Ola Cabs Service’ data


o Quantitative variable measured in interval scale
DATA VISUALIZATION
UNDERSTAND DEMAND SUPPLY GAP GRAPHICALLY

▪ For the ‘Ola Cabs Services’ data:

o In order to gain insights, what kind of graphs can we plot?

o If we want to represent percentage of requests for each ‘Status of the Request’ (Cancelled,
Not Available, Completed) category, what kind of visual techniques would be useful?

o If we want to represent number of requests in each ‘Status of the Request’ category further
categorized by ‘Pick-up point’, what kind of visual techniques would be useful?

o How would you represent the data based on ‘Trip Duration’?


DATA VISUALIZATION: PIE CHART
▪ Frequency Distribution: table that lists frequency corresponding to each category of a qualitative variable
o Frequency: number of occurrences of observations in a category/class/group
o Relative frequency: (frequency/total no. of observations) in a category/class/group

▪ Pie Chart: This is represented by a circle divided into sectors. Each sector represents a category and the area of
each sector is proportional to the frequency of that category. Pie charts are applied to the cases where
relative percentages are of more importance than absolute counts.

Relative Frequency Distribution Table
Row Labels | Count of Request ID
Cancelled | 19.24%
Completed | 40.92%
Not Available | 39.84%
Grand Total | 100.00%

[Figure: Pie Chart]

▪ Only 41% of 25541 requests were completed
▪ 40% of 25541 requests did not find a vehicle
▪ 19% of 25541 requests were cancelled
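
The frequency and relative frequency definitions can be sketched in a few lines of Python. This is a minimal illustration that uses only the five sample statuses shown earlier in the deck, not the full 25541 rides, so the percentages differ from the table; the variable names are illustrative.

```python
from collections import Counter

# 'Status of the Request' for the five sample rows (not the full data set).
statuses = ["Completed", "Completed", "Completed", "Cancelled", "Not Available"]

# Frequency: number of observations in each category.
frequency = Counter(statuses)

# Relative frequency: frequency / total number of observations.
total = sum(frequency.values())
relative_frequency = {category: count / total for category, count in frequency.items()}

for category, count in frequency.items():
    print(f"{category}: frequency={count}, relative frequency={relative_frequency[category]:.0%}")
```

The relative frequencies always add up to 1 (100%), which is why they map naturally onto the sectors of a pie chart.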
DATA VISUALIZATION: BAR GRAPH
▪ Bar Graph: equal width rectangular blocks (with gaps in between) corresponding to each category of a qualitative
variable, where the height of a block represents frequency/relative frequency of that particular category

▪ An overwhelmingly large number of cancellations happened for the city pick-ups – traffic jam, requests from short-distance commuters

▪ Airport pick-ups faced many ‘Not Available’ requests – cabs couldn’t return, demand higher than supply

▪ The above two points indicate the existence of a demand supply gap

[Figure: Bar Graph]
DATA VISUALIZATION: SIDE-BY-SIDE BAR GRAPH
▪ Most cancellations occurred during the ‘Morning’ slot

▪ Most ‘Not Available’ requests occurred during the ‘Night’ slot

▪ Most cancellations during the ‘Morning’ slot were for city pick-ups

▪ Most ‘Not Available’ requests during the ‘Night’ slot were for airport pick-ups

▪ Both time slots correspond to peak hours, and hence the company faced many complaints about this demand supply gap

[Figure: Bar Graph]
DATA VISUALIZATION: HISTOGRAM – I
▪ Histograms are constructed to present quantitative data by drawing rectangles corresponding to each class of data. The
height of a rectangle denotes frequency or relative frequency of that class. There is no gap in between rectangles denoting a
continuous scale.
▪ There is no unique way to choose lower class limit of the first class and class width. Use whichever summarizes data in a
convenient way.
▪ Histogram represents the distribution of the
variable ‘Trip Duration’ (in minutes).
▪ A representative middle value of the
distribution lies somewhere between 40-70 mins.
▪ The spread of the distribution varies from
around 30 mins to 100 mins.
▪ The shape of the histogram reveals a skewed
(non-symmetric) shape.
▪ More frequencies are concentrated toward lower
values, hence, the data is positively-skewed or
right-skewed.
▪ Many trips took more than 70 mins to complete
▪ Large variability in ‘Trip Duration’ may indicate
heavy traffic
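
The class construction behind a histogram can be sketched as follows. The durations below are hypothetical (illustrative only, not the case-study data), and the lower class limit and class width are free choices, as noted above.

```python
# Hypothetical trip durations in minutes (illustrative only).
durations = [32, 41, 43, 44, 47, 52, 55, 58, 61, 63, 66, 72, 78, 85, 97]

lower_limit = 30  # lower class limit of the first class (a free choice)
class_width = 10  # class width (also a free choice)

# Count how many observations fall in each class [start, start + class_width).
class_counts = {}
for d in durations:
    start = lower_limit + ((d - lower_limit) // class_width) * class_width
    label = f"{start}-{start + class_width}"
    class_counts[label] = class_counts.get(label, 0) + 1

# Text histogram: one '#' per observation in the class.
for label in sorted(class_counts, key=lambda s: int(s.split("-")[0])):
    print(label.ljust(7), "#" * class_counts[label])
```

In this made-up sample more observations sit in the lower classes, with a thin tail toward higher values, which is the pattern of a right-skewed distribution.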
DATA VISUALIZATION: HISTOGRAM – II
Shape of a distribution from histogram

Uniform distribution Symmetric distribution

Negatively skewed distribution Positively skewed distribution


DATA VISUALIZATION: LINE CHART
▪ Line Chart: This is a graphical technique to present mainly time series data, see trends and compare
different time periods.
Sales revenue (in lakhs) from products A and B in 1981-1999
Year | Product A | Product B
1981 | 9.35 | 5.57
1982 | 4.68 | 10.4
1983 | 4.21 | 16.21
1984 | 9.77 | 18.55
1985 | 14.35 | 18.78
1986 | 12.05 | 18.21
1987 | 14.15 | 20.86
1988 | 14.95 | 19.36
1989 | 11.68 | 17.96
1990 | 17 | 18.31
1991 | 20.52 | 19.64
1992 | 19.92 | 22.27
1993 | 21.93 | 24.69
1994 | 22.89 | 28.41
1995 | 28.15 | 34.04
1996 | 29.28 | 37.36
1997 | 26.68 | 34.83
1998 | 30.92 | 38.64
1999 | 29 | 41.55

▪ Sales revenue for both products shows increasing trends over the years.
▪ Sales revenue for product B is higher than A almost consistently over the years (every year except 1981 and 1991).
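
The year-by-year comparison of the two products can be checked directly from the table. A minimal sketch, with the table values typed in as a dictionary (the name `sales` is illustrative):

```python
# Sales revenue (in lakhs) from the table: year -> (Product A, Product B)
sales = {
    1981: (9.35, 5.57), 1982: (4.68, 10.4), 1983: (4.21, 16.21),
    1984: (9.77, 18.55), 1985: (14.35, 18.78), 1986: (12.05, 18.21),
    1987: (14.15, 20.86), 1988: (14.95, 19.36), 1989: (11.68, 17.96),
    1990: (17, 18.31), 1991: (20.52, 19.64), 1992: (19.92, 22.27),
    1993: (21.93, 24.69), 1994: (22.89, 28.41), 1995: (28.15, 34.04),
    1996: (29.28, 37.36), 1997: (26.68, 34.83), 1998: (30.92, 38.64),
    1999: (29, 41.55),
}

# Years in which each product had the higher revenue.
years_a_higher = [year for year, (a, b) in sales.items() if a > b]
years_b_higher = [year for year, (a, b) in sales.items() if b > a]

print("A higher in:", years_a_higher)
print("B higher in", len(years_b_higher), "of", len(sales), "years")
```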
RECAPITULATION

Qualitative Data:
▪ Frequency (Relative Frequency) Distribution Table for categories
▪ Graphs:
o Bar Plot – absolute/relative value representation – horizontal/vertical
o Pie Chart – percentages/relative value representation
o Pareto Chart – most frequent categories
o Side-by-Side Bar Plot (comparison across groups by same attribute)

Quantitative Data:
▪ Frequency (Relative Frequency) Distribution Table for classes
▪ Graphs:
o Histogram – applicable for large number of observations
o Stem-and-Leaf plot – raw data can be retrieved, not suitable for large observations
o Dot Plot – not suitable for large observations
o Line Chart – display time series data, several variables simultaneously
NUMERICAL SUMMARY
To describe/summarize data, we apply mainly 3 kinds of measures:
▪ Measures of Central Tendency: provide a representative or aggregate number around which all observations lie
o Mean
o Median
o Mode
▪ Measures of Dispersion: denote the expanse or spread of the observations
o Range
o Standard Deviation (SD)
o Mean Absolute Deviation (MAD)
o Variance
o Coefficient of variation (CV)
▪ Measures of Position: denote location of observations in an ordered arrangement
o Quantiles (Percentiles or Quartiles)
MEASURES OF CENTRAL TENDENCY
▪ Mean: sum of all observations divided by the number of observations
▪ Median: middle observation when the observations are arranged in increasing order
▪ Mode: most frequent observation/class

For ‘Trip Duration’: Mean (62.58), Median (61.36), Mode (40-45)

▪ The average ‘Trip Duration’ is 62.58 mins.
▪ Half of the requests had ‘Trip Duration’ less than or equal to 61.36 mins and the remaining half had ‘Trip Duration’ greater than 61.36 mins.
▪ Most requests had ‘Trip Duration’ between 40-45 mins.
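
The three measures can be computed with Python's standard `statistics` module. A minimal sketch on hypothetical durations (illustrative only, not the case-study data):

```python
import statistics

# Hypothetical trip durations in minutes (illustrative only).
durations = [40, 42, 42, 55, 61, 70, 80]

mean = statistics.mean(durations)      # sum of observations / number of observations
median = statistics.median(durations)  # middle value of the sorted observations
mode = statistics.mode(durations)      # most frequent observation

print(f"mean={mean:.2f}, median={median}, mode={mode}")
```

Here the mean is larger than the median, as tends to happen when a few long trips pull the average upward.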
MEASURES OF DISPERSION − AN ILLUSTRATION
We want to construct a measure which denotes variation or spread of the data.
▪ Consider 5 numbers: {2, 4, 5, 6, 8}
▪ Mean = $\frac{2+4+5+6+8}{5} = \frac{25}{5} = 5$
▪ Deviation of each number from the mean:
{(2 − 5), (4 − 5), (5 − 5), (6 − 5), (8 − 5)} ≡ {−3, −1, 0, 1, 3}
These denote the distances of each observation from the average.
▪ Average/mean deviation from the mean = $\frac{(-3) + (-1) + 0 + 1 + 3}{5} = 0$
▪ Average/mean squared deviation from the mean = $\frac{(-3)^2 + (-1)^2 + (0)^2 + (1)^2 + (3)^2}{5} = \frac{20}{5} = 4$ (Is dividing by 5 ok?)
▪ Square root of average/mean squared deviation from the mean = $\sqrt{4} = 2$
▪ Average/mean absolute deviation from the mean = $\frac{|-3| + |-1| + |0| + |1| + |3|}{5} = \frac{8}{5} = 1.6$
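
The arithmetic of this illustration can be reproduced step by step. A minimal sketch in which every quantity divides by the number of observations, exactly as in the illustration (variable names are illustrative):

```python
from math import sqrt

data = [2, 4, 5, 6, 8]
n = len(data)

mean = sum(data) / n
deviations = [x - mean for x in data]

# Positive and negative deviations cancel, so their average is always 0.
mean_deviation = sum(deviations) / n

# Squaring (or taking absolute values) prevents the cancellation.
mean_squared_deviation = sum(d ** 2 for d in deviations) / n
root_mean_squared_deviation = sqrt(mean_squared_deviation)
mean_absolute_deviation = sum(abs(d) for d in deviations) / n

print(mean, mean_deviation, mean_squared_deviation,
      root_mean_squared_deviation, mean_absolute_deviation)
```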
MEASURES OF DISPERSION – II
▪ Range: maximum − minimum
▪ Standard Deviation: square root of the sum of the squared deviations from the mean divided by (n − 1)
▪ Variance: sum of the squared deviations from the mean divided by (n − 1)
▪ Mean Absolute Deviation: sum of the absolute deviations from the mean divided by n
▪ Coefficient of variation (CV): $\frac{\text{SD}}{\text{Mean}} \times 100$

For ‘Trip Duration’: Range (68.61), Variance (342.40), SD (18.61), MAD (15.92), CV (29.73), Mean (62.58)

Interpretation of SD: The approximate average distance/deviation of ‘Trip Duration’ values
from the ‘Trip Duration’ average (62.58) is 18.61 mins; the higher this number, the greater
the variation in the data.
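
These definitions can be sketched with the standard `statistics` module, whose `variance` and `stdev` functions divide by (n − 1). The example reuses the five numbers {2, 4, 5, 6, 8} rather than the ‘Trip Duration’ data, which are not reproduced in this deck.

```python
import statistics

data = [2, 4, 5, 6, 8]
n = len(data)
mean = statistics.mean(data)

data_range = max(data) - min(data)
variance = statistics.variance(data)  # sum of squared deviations / (n - 1)
sd = statistics.stdev(data)           # square root of the variance
mad = sum(abs(x - mean) for x in data) / n  # mean absolute deviation about the mean
cv = sd / mean * 100                  # coefficient of variation, in percent

print(f"range={data_range}, variance={variance}, SD={sd:.3f}, MAD={mad}, CV={cv:.2f}")
```

Dividing by (n − 1) gives a variance of 5 here, compared with 4 when dividing by n.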
MEASURES OF DISPERSION – III
▪ How do measures of dispersion help?
▪ Example:
A research team in an electronic parts manufacturing firm wants to know the average number of defective ICs manufactured every day. To study this, they chose daily manufacturing data for four months: January, June, October and November. Assume that the manufacturing machine is properly maintained throughout the year and produces the same number of ICs every day.

Month | Mean | SD | MAD
January | 11.42 | 2.17 | 1.59
June | 11.61 | 1.99 | 1.35
October | 11.58 | 4.97 | 3.06
November | 11.76 | 2.32 | 1.78

If they want to publish the best-case scenario in their annual report based on a single month, what value of the average number of defects per day would they want to present?
WHEN TO USE WHAT
Central Tendency
Mean:
▪ Best for continuous data, also used for discrete data
▪ Not suitable for skewed data, works well with symmetric data
▪ Sensitive to outliers
Median:
▪ Continuous or discrete data, ordinal data
▪ Suitable for skewed data
▪ NOT sensitive to outliers
Mode:
▪ Works well with nominal or ordinal data

Dispersion
SD:
▪ Quantitative data
▪ Sensitive to outliers
▪ Not suitable for comparing groups with different units
MAD about mean:
▪ Quantitative data
▪ Algebraically difficult to handle
▪ Sensitive to outliers
▪ Not suitable for comparing groups with different units
CV:
▪ Quantitative data
▪ Suitable for comparing groups with different units
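
The different sensitivity of the mean and the median to outliers can be seen on a small example. A minimal sketch, assuming one deliberately extreme value (80) is appended to the five numbers used in the dispersion illustration:

```python
import statistics

data = [2, 4, 5, 6, 8]
with_outlier = data + [80]  # one extreme value added for illustration

mean_before = statistics.mean(data)
median_before = statistics.median(data)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

The single outlier moves the mean from 5 to 17.5, while the median only shifts from 5 to 5.5.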
MEASURES OF POSITION – I
▪ Percentiles: Pk is called the k-th percentile of a set of observations if k percent of the observations are less than or equal to the value.
▪ Quartiles: set of 3 numbers which partition the observation set into 4 equal parts, so basically, the 25th (Q1), 50th (Q2) and 75th (Q3) percentiles.

For ‘Trip Duration’: P25=46.81, P50=61.36, P75=77.95

Interpretations:
▪ P25 is 46.81, meaning 25% of Trip Durations were less than or equal to 46.81 mins.
▪ 25% of all requests had Trip Duration greater than 77.95 mins, which further indicates the shortage of cabs on the route.
▪ At least half of all requests took more than 1 hour to complete.
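
Quartiles can be computed with `statistics.quantiles` from the Python standard library. A minimal sketch on the five numbers {2, 4, 5, 6, 8}; note that software packages use slightly different interpolation conventions, so other tools may report slightly different quartiles for the same data.

```python
import statistics

data = [2, 4, 5, 6, 8]

# n=4 asks for the three cut points that split the data into 4 equal parts.
q1, q2, q3 = statistics.quantiles(data, n=4)

print(f"Q1 (P25) = {q1}, Q2 (P50) = {q2}, Q3 (P75) = {q3}")
```

Q2 always coincides with the median of the data.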
BOX AND WHISKER PLOT – I
Inter-Quartile Range (IQR) = Q3 – Q1
Lower inner fence (LIF) = Q1 - 1.5*IQR; upper inner fence (UIF) = Q3 + 1.5*IQR
Lower outer fence (LOF) = Q1 - 3*IQR; upper outer fence (UOF) = Q3 + 3*IQR
Outlier: observation between an inner fence and the corresponding outer fence
Extreme Outlier: observation beyond an outer fence
5 number summary: Min, Q1, Q2, Q3, Max

▪ For both ‘Airport’ and ‘City’ pick-ups, Trip Duration distributions were very similar – slightly right skewed, similar mean, Q1, Q2, Q3, Min and Max values

▪ There is no outlier in the ‘Trip Duration’.

▪ Box-plots are applied:
o To understand the shape of a distribution
o To present the 5-number summary
o To compare 2 or more distributions to have further insights
o To detect outliers
o To understand the center and spread of the distribution

Box plots for ‘Trip Duration’
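
The fence formulas can be turned into a small outlier check. A minimal sketch on hypothetical observations (illustrative only, not the ‘Trip Duration’ data), using the quartile convention of `statistics.quantiles`; other tools may compute slightly different quartiles.

```python
import statistics

# Hypothetical observations; 30 is deliberately far from the rest.
data = [2, 4, 5, 6, 8, 30]

q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

lif, uif = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # inner fences
lof, uof = q1 - 3 * iqr, q3 + 3 * iqr      # outer fences

# Outliers lie between an inner and an outer fence; extreme outliers lie beyond an outer fence.
outliers = [x for x in data if (lof <= x < lif) or (uif < x <= uof)]
extreme_outliers = [x for x in data if x < lof or x > uof]

print("5 number summary:", min(data), q1, q2, q3, max(data))
print("outliers:", outliers, "extreme outliers:", extreme_outliers)
```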


BOX AND WHISKER PLOT – II

▪ For all 6 time slots, Trip Duration distributions were very similar – almost symmetric, similar mean, Q1, Q2, Q3, Min and Max values with no outliers

Box plots for ‘Trip Duration’ for every ‘Time Slot’


BOX AND WHISKER PLOT – III

▪ Indicates a right skewed distribution

▪ Indicates a left skewed distribution

▪ Indicates a symmetric distribution


BELL SHAPED AND SYMMETRIC CURVE

Empirical Rule:
For any bell-shaped, nearly symmetric data distribution, the empirical rule says:

1. approximately 68% of the data lie within 1 standard deviation of the mean, i.e., in the interval $(\bar{x} - s, \bar{x} + s)$.

2. approximately 95% of the data lie within 2 standard deviations of the mean, i.e., in the interval $(\bar{x} - 2s, \bar{x} + 2s)$.

3. approximately 99.7% of the data lie within 3 standard deviations of the mean, i.e., in the interval $(\bar{x} - 3s, \bar{x} + 3s)$.
BELL SHAPED AND SYMMETRIC CURVE

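[Figure: bell-shaped curve centered at $\bar{x}$. Areas between successive marks from $\bar{x} - 3s$ to $\bar{x} + 3s$: 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%, with 0.15% in each tail beyond $\bar{x} \pm 3s$. Totals: 68% within $\bar{x} \pm s$, 95% within $\bar{x} \pm 2s$, 99.7% within $\bar{x} \pm 3s$.]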
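
The 68%, 95% and 99.7% figures are rounded values for an exactly normal (bell-shaped) distribution. A minimal sketch that recovers them with `statistics.NormalDist`, assuming a standard normal distribution; real data will only follow these shares approximately.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```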
Z-SCORES
▪ Suppose that a z-score (of salaries) of 1.68 or above indicates almost certain repayment of a loan of Rs. 30L.

▪ A financial institution has the following information for all those who took a loan:
o They found the annual salary distribution is bell shaped and symmetric.
o They found the average salary to be Rs. 15.5L.
o They found the SD of the salaries to be Rs. 2.3L.

▪ Will a person whose annual salary is Rs. 17.8L almost certainly repay the loan?

With $\bar{x} = 15.5$ and $s = 2.3$:
$z = \frac{x - \bar{x}}{s} = \frac{17.8 - 15.5}{2.3} = 1 < 1.68$
Since the z-score is below 1.68, repayment is not almost certain by this criterion.

The z-score is a measure of relative position in a group; it also helps in comparison across heterogeneous groups.
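
The z-score calculation for this example can be sketched as a small function. The function name `z_score` and the variable names are illustrative; the numbers are those given in the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

mean_salary = 15.5  # average annual salary, in Rs. lakh
sd_salary = 2.3     # SD of annual salaries, in Rs. lakh
threshold = 1.68    # z-score threshold stated in the example

z = z_score(17.8, mean_salary, sd_salary)
meets_threshold = z >= threshold

print(f"z = {z:.2f}, meets the 1.68 threshold: {meets_threshold}")
```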
BIVARIATE DATA
▪ Bivariate data analysis includes study of data considering two variables together

▪ Gasoline Consumption Data:
o Miles per gallon (Y)
o Weight of car (X10) – in pounds

▪ Study two variables together to derive insights into their association

▪ Even when one variable is not observed, trend or behavior of the other variable can be determined

▪ Response (target) variable: primary variable of interest that we want to study; conventionally denoted by Y

▪ Explanatory (predictor or feature or factor) variable: variable which aids in explaining or predicting the response; conventionally denoted by X

miles per gallon | weight of car
18.9 | 3910
17 | 3860
20 | 3510
18.3 | 3890
20.1 | 3365
11.2 | 4215
22.1 | 3020
21.5 | 3180
34.7 | 1905
30.4 | 2320
16.5 | 3885
36.5 | 2009
21.5 | 2655
19.7 | 3375
20.3 | 2700
17.8 | 3890
14.4 | 5290
14.9 | 5185
17.8 | 3910
16.4 | 3660
23.5 | 3050
21.5 | 4250
31.9 | 2275
13.3 | 5430
23.9 | 2535
19.7 | 4370
13.9 | 4540
13.3 | 4715
13.8 | 4215
16.5 | 3660
SCATTER PLOT
▪ Scatter plot helps in studying relationship between two variables
▪ X axis: explanatory variable; Y axis: response/target variable

▪ Downward sloping pattern of the scatter plot

▪ As X10 increases, Y decreases and vice versa

▪ The relationship between Y and X10 is negative (inverse)

▪ The relationship is slightly curvilinear



MEASURE OF CORRELATION – I

▪ To numerically quantify the degree of linear association between two variables, a measure called Pearson’s product moment correlation coefficient ($r$) is used.
▪ $r$ lies between −1 and +1
▪ $r$ is unit less
▪ Correlation doesn’t imply causation

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
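
Pearson's correlation coefficient can be computed directly from its definition. A minimal sketch, assuming the 30 (miles per gallon, weight of car) pairs are read off the Gasoline Consumption table under BIVARIATE DATA; the function name `pearson_r` is illustrative.

```python
from math import sqrt

# Miles per gallon (Y) and weight of car in pounds (X10), paired row by row.
mpg = [18.9, 17, 20, 18.3, 20.1, 11.2, 22.1, 21.5, 34.7, 30.4,
       16.5, 36.5, 21.5, 19.7, 20.3, 17.8, 14.4, 14.9, 17.8, 16.4,
       23.5, 21.5, 31.9, 13.3, 23.9, 19.7, 13.9, 13.3, 13.8, 16.5]
weight = [3910, 3860, 3510, 3890, 3365, 4215, 3020, 3180, 1905, 2320,
          3885, 2009, 2655, 3375, 2700, 3890, 5290, 5185, 3910, 3660,
          3050, 4250, 2275, 5430, 2535, 4370, 4540, 4715, 4215, 3660]

def pearson_r(x, y):
    """Pearson's product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sqrt(sum((xi - x_bar) ** 2 for xi in x) *
                       sum((yi - y_bar) ** 2 for yi in y))
    return numerator / denominator

r = pearson_r(weight, mpg)
print(f"r between weight and miles per gallon = {r:.3f}")
```

The result is negative, in line with the downward sloping scatter plot, and it does not depend on the units of either variable.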
MEASURE OF CORRELATION – II
▪ For the ‘Gasoline Consumption’ data, generate the correlation matrix with variables Y (mileage, miles per gallon) and X1-X5.

 | Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Column 6
Column 1 | 1
Column 2 | -0.87182 | 1
Column 3 | -0.79656 | 0.940646 | 1
Column 4 | -0.84934 | 0.989585 | 0.964359 | 1
Column 5 | 0.422415 | -0.34959 | -0.2899 | -0.326 | 1
Column 6 | 0.635232 | -0.67143 | -0.55096 | -0.67287 | 0.413781 | 1

▪ $r = -0.87182$ is the correlation coefficient value between Y (miles per gallon, Column 1) and X1 (displacement in cubic inches, Column 2)
▪ This indicates a strong negative linear relationship between the two variables

Range of Correlation Coefficient | Strength
-0.3 to +0.3 | Weak
-0.7 to -0.3 OR +0.3 to +0.7 | Moderate
Less than -0.7 OR more than +0.7 | Strong

Source: Pearson's Correlation Coefficient - SAGE Research Methods


[Link] › pearson-in-gho-2012
CORRELATION AND SCATTER PLOT

𝒓 = +𝟏 perfect positive linear 𝒓 ≈ +𝟏 strong positive linear 𝒓 ≈ 𝟎 uncorrelated

𝒓 = −𝟏 perfect negative linear 𝒓 ≈ −𝟏 strong negative linear 𝒓 ≈ 𝟎 non-linear
