Class Test 1 Revision Note
Chapter 1 Descriptive Statistics
Types of Variables
- Categorical variables
- Ordinal variables
- Quantitative variables
Charts
- Histogram
- Histogram vs. Bar Chart
One implication of this distinction: it is always appropriate to talk about the
skewness of a histogram; that is, the tendency of observations to fall more
on the low end or the high end of the x-axis
With bar charts, however, the x-axis does not have a low end or a high end,
because the labels on the x-axis are categorical, not quantitative. As a
result, it is less appropriate to comment on the skewness of a bar chart.
- Pros and Cons of the Four Visual Displays for Quantitative Variables
Box plots, stem-and-leaf plots, dot plots, and histograms organize
quantitative data in ways that let us begin to find the information in a data
set.
As to the question of which type of display is the best, there is no unique
answer.
The answer depends on what feature of the data may be of interest and, to a
certain degree, on the sample size.
Box plot
Strength:
Gives a direct look at central location and spread, since it displays
the five-number summary.
Can identify outliers.
Side-by-side box plots are an excellent tool for comparing two or
more groups.
Weakness:
Not entirely useful for judging shape.
Cannot distinguish between bell-shaped and bimodal distributions.
Stem-and-Leaf plot
Strength:
Excellent for sorting data.
With a sufficient sample size, it can be used to judge shape.
Weakness:
With a large sample size, a stem-and-leaf plot may be too
cluttered because the display shows all individual data values.
More restricted in the choices for “intervals” when compared to
histograms.
Dot plot
Strength:
Can present all individual data values.
Easy to create.
Weakness:
With a large sample size, a dot plot may be too cluttered.
Histogram
Strength:
Excellent for judging the shape of a data set with moderate or
large sample sizes.
Flexible in choosing the number as well as the width of the intervals
for the display.
Between 6 and 15 intervals usually gives a good picture of the
shape.
Weakness:
With a small sample size, a histogram may not “fill in”
sufficiently well to show the shape of the data.
With either too few intervals or too many, we may not see the true
shape of the data.
- Misleading Graphs
Statistics can be misleading if not presented appropriately.
The same data can appear very different when graphed.
E.g. break in the vertical axis.
Frequency on the vertical axis should be continuous from zero.
When we put a break in the axis, we lose proportional relationship
among class interval frequencies.
- Shape of Frequency Distributions
J-shaped
Positively skewed
Negatively skewed
Rectangular
Bimodal
Bell-shaped
Numerical Summaries
- Measures of Central Location: Mean, Mode, Median
Mean as the Balance Point of a Distribution:
Unlike the median and the mode, the mean is responsive to the exact
position of each score in the distribution. It is the balance point of a
distribution.
Median in the Case with Outliers:
The median is less sensitive than the mean to the presence of a
few extreme scores (outliers)
Is it permissible to calculate the mean for tests in the behavioral
sciences? First of all, we have to ask ourselves a question: “Is the
measurement on this scale interval or ordinal?” Sometimes it may be
neither interval nor ordinal.
Measures of Variability: Standard Deviation, Range, Interquartile Range
The standard deviation, like the mean, is responsive to the exact
position of every score in the distribution, because it is calculated
from deviations about the mean. If a score is shifted to a position more
deviant from the mean, the standard deviation will increase; if the shift
is to a position closer to the mean, the standard deviation decreases.
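A minimal sketch (plain Python; the score values are invented for illustration) of both points above: the mean and standard deviation respond to every score, while the median shrugs off an outlier.

```python
# Sketch: sensitivity of mean, median, and s.d. (invented data).
import statistics

scores = [4, 5, 5, 6, 6, 7, 8]
with_outlier = scores + [40]              # one extreme score added

print(statistics.mean(scores), statistics.median(scores))              # 5.857..., 6
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 10.125, 6: mean shifts, median barely moves

# Shifting one score further from the mean raises the s.d.;
# shifting it closer to the mean lowers it.
base    = [2, 4, 6, 8, 10]                # s.d. about 2.83
further = [2, 4, 6, 8, 14]                # 10 -> 14, more deviant: s.d. about 4.12
closer  = [2, 4, 6, 8, 7]                 # 10 -> 7, closer to the mean: s.d. about 2.15
for d in (base, further, closer):
    print(round(statistics.pstdev(d), 3))
```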
Measures of Shape: Skewness, Kurtosis
Skewness is a measure of a data set’s deviation from symmetry
\[ \text{skewness} = \frac{m_3}{m_2^{3/2}}, \qquad m_2 = \frac{\sum (x - \bar{x})^2}{n}, \qquad m_3 = \frac{\sum (x - \bar{x})^3}{n} \]
The value of this measure generally lies between -3 and +3. The
closer the value is to -3, the more the distribution is skewed to the
left, and vice versa. A value close to 0 indicates a symmetric
distribution. A normal distribution is symmetric and has a skewness of 0.
There are other measures of skewness:
1. Pearson mode skewness or first skewness coefficient
\[ \text{skewness} = \frac{\text{mean} - \text{mode}}{\text{s.d.}} \]
If mean < (>) mode, the distribution is negatively (positively) skewed.
2. Pearson median skewness or second skewness coefficient
\[ \text{skewness} = \frac{3(\text{mean} - \text{median})}{\text{s.d.}} \]
If mean < (>) median, the distribution is negatively (positively) skewed.
3. Bowley skewness or quartile skewness coefficient
\[ \text{skewness} = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1} \]
Distribution        | Coefficient of Skewness | Measures of Central Location
Symmetrical         | 0                       | Mean = Median = Mode
Skewed to the right | > 0                     | Mean > Median > Mode
Skewed to the left  | < 0                     | Mean < Median < Mode
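The sketch below (Python, invented data) computes the moment-based, Pearson median, and Bowley skewness coefficients for one right-skewed sample; all three come out positive, consistent with the table above. Note that statistics.quantiles requires Python 3.8+ and that quartile conventions differ slightly between packages.

```python
# Sketch (invented data) of three skewness measures defined above.
import statistics

x = [1, 2, 2, 3, 3, 4, 5, 7, 12]            # a right-skewed sample
n = len(x)
mean = sum(x) / n
m2 = sum((v - mean) ** 2 for v in x) / n    # second central moment
m3 = sum((v - mean) ** 3 for v in x) / n    # third central moment
print(m3 / m2 ** 1.5)                       # moment skewness, > 0 here

# Pearson median (second) skewness coefficient
s = statistics.pstdev(x)                    # population s.d. = sqrt(m2)
print(3 * (mean - statistics.median(x)) / s)

# Bowley (quartile) skewness coefficient
q1, q2, q3 = statistics.quantiles(x, n=4)   # Python 3.8+
print((q3 - 2 * q2 + q1) / (q3 - q1))       # also > 0 for this sample
```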
Kurtosis is a measure of peakedness of a distribution.
\[ \text{kurtosis} = \frac{m_4}{m_2^2}, \qquad m_4 = \frac{\sum (x - \bar{x})^4}{n} \]
Excess kurtosis is defined as the kurtosis minus 3, i.e.
excess kurtosis = kurtosis – 3
A normal distribution has a kurtosis of 3 and hence an excess kurtosis of 0.
Generally, a distribution with greater excess kurtosis has a higher
peak and thicker tails than another distribution of the same kind.
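A quick check (Python, simulated data) of the fact above: for normal data the kurtosis m4/m2^2 is about 3, so the excess kurtosis is about 0.

```python
# Sketch: moment kurtosis of a large simulated normal sample.
import random

random.seed(1)
x = [random.gauss(0, 1) for _ in range(100_000)]
n = len(x)
mean = sum(x) / n
m2 = sum((v - mean) ** 2 for v in x) / n
m4 = sum((v - mean) ** 4 for v in x) / n
kurt = m4 / m2 ** 2
print(kurt, kurt - 3)    # roughly 3 and 0 for normal data
```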
An outlier is a data point that is not consistent with the bulk of the data.
If an observation falls outside the range [Q1 - 1.5 IQR, Q3 + 1.5 IQR],
it is regarded as an outlier.
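A small sketch of the 1.5 x IQR rule above (Python 3.8+, invented data); how the quartiles themselves are computed varies slightly between packages.

```python
# Sketch: flagging outliers with the 1.5*IQR rule.
import statistics

data = [12, 13, 14, 14, 15, 16, 16, 17, 18, 35]
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in data if v < low or v > high]
print(outliers)    # 35 falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
```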
Possible reasons for outliers and what to do about them:
The outlier is a legitimate data value and represents natural
variability for the group and variable(s) measured. Such values
may not be discarded; they provide important information
about location and spread.
A mistake was made while taking the measurement or entering it into
the computer. If verified, the value should be discarded or corrected.
The individual in question belongs to a group other than the
bulk of the individuals measured. Values may be discarded if a
summary is desired and reported for the majority group only.
Coefficient of Variation
The standard deviation measures the variation in a set of data. For
decision makers, the standard deviation indicates how spread out a
distribution is.
For distributions having the same mean, the distribution with the
largest standard deviation has the greatest relative spread.
When two or more distributions have different means, the relative
spread cannot be determined by merely comparing the standard
deviations.
The coefficient of variation (CV) is used to measure the relative
variation for distributions with different means.
\[ \text{Sample coefficient of variation} = \frac{s}{\bar{x}} \times 100\% \]
When the coefficients of variation for two or more distributions are
compared, the distribution with the largest CV is said to have the
greatest relative spread.
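The sketch below (invented price data) shows why the CV matters: sample B has the larger standard deviation, but sample A has the greater relative spread.

```python
# Sketch: comparing relative spread via the coefficient of variation.
import statistics

def cv(sample):
    """Sample coefficient of variation, (s / x-bar) * 100%."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100

prices_a = [10, 11, 12, 13, 14]        # small mean, s about 1.58
prices_b = [100, 102, 104, 106, 108]   # larger mean AND larger s (about 3.16)
print(cv(prices_a), cv(prices_b))      # about 13.2% vs 3.0%: A has the greater relative spread
```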
Normal Distribution
Percentile
- k-th percentile is a number that has k% of the data values at or below it and
(100-k)% of the data values at or above it. Lower quartile, median, upper quartile
are special cases of percentile. Lower quartile = 25th percentile, median = 50th
percentile, upper quartile = 75th percentile.
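A sketch of this definition in Python (invented data). Statistical packages use several slightly different interpolation rules for percentiles; this is the simplest "at least k% at or below" version.

```python
# Sketch: k-th percentile as the smallest value with at least k% of
# the data at or below it.
import math

def percentile(data, k):
    """Smallest data value with at least k% of the data at or below it."""
    xs = sorted(data)
    idx = math.ceil(k / 100 * len(xs)) - 1
    return xs[max(idx, 0)]

data = [2, 4, 4, 5, 6, 7, 8, 9, 9, 10]
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))
```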
Value-at-Risk (VaR)
- One important application of percentile in risk management is VaR.
- VaR is defined as the worst loss over a target horizon that will not be exceeded
at a given confidence level. For instance, the VaR at the 95% confidence level
gives a loss value that will not be exceeded with probability of at least 95%.
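A sketch of historical VaR using the percentile idea above; the daily P&L figures are invented, and losses are recorded as positive numbers.

```python
# Sketch: historical 95% VaR as the 95th percentile of losses.
import math

daily_pnl = [1.2, -0.4, 0.8, -2.1, 0.3, -1.5, 2.0, -0.9, 0.5, -3.2,
             1.1, -0.2, 0.7, -1.8, 0.9, -0.6, 1.4, -2.7, 0.1, -1.0]
losses = sorted(-p for p in daily_pnl)             # losses as positive values
var_95 = losses[math.ceil(0.95 * len(losses)) - 1]
print(var_95)   # 2.7: losses stay at or below this on 95% of days
```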
Z-score
- z = (x - x̄)/s measures how many standard deviations a score lies from the mean
- x̄ ± 1s contains about 68% of the scores
- x̄ ± 2s contains about 95% of the scores
- x̄ ± 3s contains about 99.7% of the scores
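A quick empirical check of these three figures on simulated normal scores (Python):

```python
# Sketch: the empirical (68-95-99.7) rule on simulated normal data.
import random
import statistics

random.seed(2)
x = [random.gauss(50, 10) for _ in range(100_000)]
mean, sd = statistics.fmean(x), statistics.pstdev(x)
for k in (1, 2, 3):
    share = sum(abs(v - mean) <= k * sd for v in x) / len(x)
    print(k, round(share, 4))    # about 0.68, 0.95, 0.997
```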
Chapter 2 Correlation and Regression
Scatterplot
- Positive/negative association, linear relationship/nonlinear (curvilinear)
relationship
Correlation Coefficient r
- Strength
It is determined by the closeness of the points to a straight line.
- Direction
It is determined by whether one variable generally increases or generally
decreases when the other variable increases
- Linear
When the pattern is nonlinear, the correlation coefficient is not an
appropriate way to measure the strength of the relationship.
- The measure is also called Pearson product-moment correlation coefficient.
\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \]
where
\[ S_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2 = \sum x^2 - \frac{(\sum x)^2}{n} \]
\[ S_{yy} = \sum (y - \bar{y})^2 = \sum y^2 - n\bar{y}^2 = \sum y^2 - \frac{(\sum y)^2}{n} \]
\[ S_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - n\bar{x}\bar{y} = \sum xy - \frac{(\sum x)(\sum y)}{n} \]
- r always lies between -1 and +1.
- Magnitude indicates the strength of the linear relationship.
- Sign indicates the direction of the association.
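A minimal sketch (invented data) computing r from the shortcut sums Sxx, Syy, and Sxy above:

```python
# Sketch: Pearson r via the shortcut formulas.
import math

x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 7, 8]
n = len(x)
sxx = sum(v * v for v in x) - sum(x) ** 2 / n
syy = sum(v * v for v in y) - sum(y) ** 2 / n
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
r = sxy / math.sqrt(sxx * syy)
print(r)   # about 0.90: strong positive linear association
```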
Rank Correlation Coefficient rs
- Since rankings are qualitative rather than quantitative data, even though they
are numerical, the sample correlation coefficient r cannot be used.
- Instead, we use the nonparametric counterpart of r, the rank correlation
coefficient rs, to perform correlation analysis on a form of qualitative data:
bivariate rankings.
- If we wish to assess the strength of the relation between the two sets of ranks, we
can compute the sample rank correlation coefficient rs.
- The Spearman correlation coefficient rs is defined as the Pearson correlation
coefficient between the ranks of the data.
\[ r_s = \frac{\sum (R_x - \bar{R}_x)(R_y - \bar{R}_y)}{\sqrt{\sum (R_x - \bar{R}_x)^2 \sum (R_y - \bar{R}_y)^2}} \]
where Rx and Ry are the ranks of the two variables of interest.
If there are no tied ranks in the data, then the following shortcut formula also
works (see the sketch after the notes below):
\[ r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \]
where di = Rank(xi) - Rank(yi) = Rxi - Ryi (the difference between a pair of
ranks) and n = the number of pairs of ranks.
- When to use rs instead of r?
Situation 1: Data are given in the form of ranks.
Situation 2: Data are given in the form of scores, but what matters is that
one score is higher than another; how much higher is not really
important. Then translating scores to ranks is suitable.
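The sketch below applies the shortcut formula to two invented sets of rankings with no ties:

```python
# Sketch: Spearman rank correlation via the shortcut formula.
x_ranks = [1, 2, 3, 4, 5, 6, 7]       # e.g. judge A's rankings
y_ranks = [2, 1, 4, 3, 5, 7, 6]       # judge B's rankings of the same items
n = len(x_ranks)
d2 = sum((rx - ry) ** 2 for rx, ry in zip(x_ranks, y_ranks))
rs = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(rs)   # about 0.89: the two sets of ranks agree closely
```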
- Cautions in the use of correlation
Bear in mind the following five cautions in the use of correlation.
Correlation does not prove causation
If variation in X causes variation in Y, that causal connection will
appear in some degree of correlation between X and Y.
However, we cannot reason backward from a correlation to a
causal relationship.
We must always remember “correlation does not imply
causation”.
There are at least four possible explanations for an observed correlation.
Denote X as the explanatory variable, Y as the response variable.
(a) Causation – X is a cause of Y.
(b) Reverse of causation – Y is a cause of X.
(c) A third variable influences both X and Y.
(d) A complex of interrelated variables influences X and Y.
Note: Two or more of these situations may occur simultaneously.
For example, X and Y may influence each other. (a+b)
r and rs capture only linear relationships
When data for one or both variables are not linear, other measures
of association are better.
effect of variability
The correlation coefficient is sensitive to the variability
characterizing the measurements of the two variables.
For example, suppose we examine the relationship between total SAT
scores and, say, first-year grades at two universities: one with only
minimal entrance requirements and a more selective private university
that admits only students with SAT scores of 1200 or higher. The
correlation will be weaker in the latter case.
Therefore, restricting the range, whether in X, in Y, or in both,
results in a lower correlation coefficient (in magnitude).
effect of discontinuity
The correlation tends to be an overestimate in discontinuous
distributions.
Usually, discontinuity, whether in X, in Y, or in both, results in a
higher correlation coefficient.
correlation for combined data
the correlation coefficient may increase or decrease, depending on how
the groups are combined (see the Simpson's paradox sketch below).
- Examples of deceiving relationship
Outliers can substantially inflate or deflate correlations.
An outlier that is consistent with the trend of the rest of the data will
inflate the correlation.
An outlier that is not consistent with the rest of the data can
substantially decrease the correlation.
Groups combined inappropriately may mask relationships.
The missing link is a third variable.
Simpson’s Paradox
Two or more groups
Variables for each group may be strongly correlated
When the groups are combined into one, there is very little
correlation between the two variables.
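The sketch below constructs an invented two-group data set in which each group has r = +1, yet the pooled data have r = 0, illustrating how combining groups can mask a relationship:

```python
# Sketch: Simpson's paradox, where pooling two strongly correlated
# groups erases the correlation entirely (invented data).
import math

def pearson_r(x, y):
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    sxx = sum(a * a for a in x) - sum(x) ** 2 / n
    syy = sum(b * b for b in y) - sum(y) ** 2 / n
    return sxy / math.sqrt(sxx * syy)

g1_x, g1_y = [1, 2, 3, 4], [1, 2, 3, 4]     # group 1: r = +1
g2_x, g2_y = [6, 7, 8, 9], [0, 1, 2, 3]     # group 2: r = +1, shifted
print(pearson_r(g1_x, g1_y))                # 1.0
print(pearson_r(g2_x, g2_y))                # 1.0
print(pearson_r(g1_x + g2_x, g1_y + g2_y))  # 0.0: the correlation vanishes
```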
Simple Linear Regression