Data Collection Statistics

Data collection plays a crucial role in statistical analysis. In research, the various methods used to gather information fall into two categories: primary data and secondary data. As the names suggest, primary data is collected for the first time by the researcher, while secondary data has already been collected or produced by others.

There are many differences between primary and secondary data, which are discussed in this article. The most important difference is that primary data is factual and original, whereas secondary data is essentially the analysis and interpretation of primary data. Primary data is collected with the aim of solving the problem at hand, while secondary data was collected for other purposes.

Content: Primary Data vs. Secondary Data


1. Comparison Chart
2. Definition
3. Key Differences
4. Conclusion

Comparison Chart

Meaning: Primary data is first-hand data gathered by the researcher himself; secondary data is data collected by someone else earlier.

Data: Primary is real-time data; secondary relates to the past.

Process: Primary is very involved; secondary is quick and easy.

Source: Primary comes from surveys, observations, experiments, questionnaires, personal interviews, etc.; secondary comes from government publications, websites, books, journal articles, internal records, etc.

Cost effectiveness: Primary is expensive; secondary is economical.

Collection time: Primary takes long; secondary is short.

Specificity: Primary is always specific to the researcher's needs; secondary may or may not be.

Available in: Primary is in crude form; secondary is in refined form.

Accuracy and reliability: Primary is more accurate and reliable; secondary is relatively less so.

Definition of Primary Data

Primary data is data originated for the first time by the researcher through direct efforts and experience, specifically for the purpose of addressing his research problem. It is also known as first-hand or raw data. Primary data collection is quite expensive, as the research is conducted by the organisation or agency itself, which requires resources like investment and manpower. The data collection is under the direct control and supervision of the investigator.

The data can be collected through various methods such as surveys, observations, physical testing, mailed questionnaires, questionnaires filled in and sent by enumerators, personal interviews, telephonic interviews, focus groups, case studies, etc.

Definition of Secondary Data

Secondary data refers to second-hand information that has already been collected and recorded by someone other than the user, for a purpose not related to the current research problem. It is the readily available form of data, collected from sources such as censuses, government publications, internal records of the organisation, reports, books, journal articles, websites and so on.

Secondary data offers several advantages: it is easily available and saves the researcher time and cost. But there are disadvantages as well; because the data was gathered for purposes other than the problem in mind, its usefulness may be limited in ways such as relevance and accuracy.

Moreover, the objective and the method adopted for acquiring the data may not be suitable to the current situation. These factors should therefore be kept in mind before using secondary data.

Key Differences Between Primary and Secondary Data


The fundamental differences between primary and secondary data are
discussed in the following points:

1. The term primary data refers to data originated by the researcher for the first time. Secondary data is already existing data, collected earlier by investigating agencies and organisations.
2. Primary data is real-time data, whereas secondary data relates to the past.
3. Primary data is collected to address the problem at hand, while secondary data was collected for purposes other than the problem at hand.
4. Primary data collection is a very involved process. On the other hand, the secondary data collection process is rapid and easy.
5. Primary data collection sources include surveys, observations, experiments, questionnaires, personal interviews, etc. On the contrary, secondary data collection sources are government publications, websites, books, journal articles, internal records, etc.
6. Primary data collection requires a large amount of resources like time, cost and manpower. Conversely, secondary data is relatively inexpensive and quickly available.
7. Primary data is always specific to the researcher's needs, and the researcher controls the quality of the research. In contrast, secondary data is neither specific to the researcher's needs, nor does the researcher have control over the data quality.
8. Primary data is available in raw form, whereas secondary data is a refined form of primary data. It can also be said that secondary data is obtained when statistical methods are applied to primary data.
9. Data collected through primary sources is more reliable and accurate than data collected from secondary sources.

Conclusion

As can be seen from the above discussion, primary data is original and unique data collected directly by the researcher from a source according to his requirements. Secondary data, by contrast, is easily accessible but is not as pure, as it has already undergone many statistical treatments.

Descriptive Statistics
Descriptive statistics are used to describe the basic features of the data
in a study. They provide simple summaries about the sample and the
measures. Together with simple graphics analysis, they form the basis of
virtually every quantitative analysis of data.

Descriptive statistics are typically distinguished from inferential statistics.
With descriptive statistics you are simply describing what is or what the
data shows. With inferential statistics, you are trying to reach
conclusions that extend beyond the immediate data alone. For instance,
we use inferential statistics to try to infer from the sample data what the
population might think. Or, we use inferential statistics to make
judgments of the probability that an observed difference between groups
is a dependable one or one that might have happened by chance in this
study. Thus, we use inferential statistics to make inferences from our
data to more general conditions; we use descriptive statistics simply to
describe what's going on in our data.

Descriptive Statistics are used to present quantitative descriptions in a
manageable form. In a research study we may have lots of measures.
Or we may measure a large number of people on any measure.
Descriptive statistics help us to simplify large amounts of data in a
sensible way. Each descriptive statistic reduces lots of data into a
simpler summary. For instance, consider a simple number used to
summarize how well a batter is performing in baseball, the batting
average. This single number is simply the number of hits divided by the
number of times at bat (reported to three significant digits). A batter who
is hitting .333 is getting a hit one time in every three at bats. One batting
.250 is hitting one time in four. The single number describes a large
number of discrete events. Or, consider the scourge of many students,
the Grade Point Average (GPA). This single number describes the
general performance of a student across a potentially wide range of
course experiences.
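As a rough illustration, here is a minimal Python sketch of how such single-number summaries are computed. The sample figures are invented, and the GPA helper assumes a credit-weighted average, which is one common convention rather than something stated in the text.

```python
# Minimal sketch: reducing many discrete events to one descriptive number.
# The sample figures below are invented for illustration.

def batting_average(hits, at_bats):
    # Number of hits divided by times at bat, reported to three digits.
    return round(hits / at_bats, 3)

def grade_point_average(grade_points, credit_hours):
    # Credit-weighted average of grade points across courses (one common convention).
    weighted = sum(g * h for g, h in zip(grade_points, credit_hours))
    return weighted / sum(credit_hours)

print(batting_average(1, 3))                            # 0.333, "hitting .333"
print(batting_average(1, 4))                            # 0.25, "hitting .250"
print(grade_point_average([4.0, 3.0, 2.0], [3, 3, 4]))  # 2.9
```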

Every time you try to describe a large set of observations with a single
indicator you run the risk of distorting the original data or losing important
detail. The batting average doesn't tell you whether the batter is hitting
home runs or singles. It doesn't tell whether she's been in a slump or on
a streak. The GPA doesn't tell you whether the student was in difficult
courses or easy ones, or whether they were courses in their major field
or in other disciplines. Even given these limitations, descriptive statistics
provide a powerful summary that may enable comparisons across
people or other units.

Univariate Analysis
Univariate analysis involves the examination across cases of one
variable at a time. There are three major characteristics of a single
variable that we tend to look at:

 the distribution

 the central tendency

 the dispersion

In most situations, we would describe all three of these characteristics for each of the variables in our study.

The Distribution. The distribution is a summary of the frequency of
individual values or ranges of values for a variable. The simplest
distribution would list every value of a variable and the number of
persons who had each value. For instance, a typical way to describe the
distribution of college students is by year in college, listing the number or
percent of students at each of the four years. Or, we describe gender by
listing the number or percent of males and females. In these cases, the
variable has few enough values that we can list each one and
summarize how many sample cases had the value. But what do we do
for a variable like income or GPA? With these variables there can be a
large number of possible values, with relatively few people having each
one. In this case, we group the raw scores into categories according to
ranges of values. For instance, we might look at GPA according to the
letter grade ranges. Or, we might group income into four or five ranges
of income values.

Table 1. Frequency distribution table.

One of the most common ways to describe a single variable is with
a frequency distribution. Depending on the particular variable, all of
the data values may be represented, or you may group the values into
categories first (e.g., with age, price, or temperature variables, it would
usually not be sensible to determine the frequencies for each value.
Rather, the value are grouped into ranges and the frequencies
determined.). Frequency distributions can be depicted in two ways, as a
table or as a graph. Table 1 shows an age frequency distribution with
five categories of age ranges defined. The same frequency distribution
can be depicted in a graph as shown in Figure 1. This type of graph is
often referred to as a histogram or bar chart.
Figure 1. Frequency distribution bar chart.
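As a rough sketch of how such a frequency table is built, the following Python snippet groups raw ages into ranges and counts the cases in each range. The ages and the bin boundaries are invented for illustration; the actual categories in Table 1 are not reproduced here.

```python
from collections import Counter

# Rough sketch: grouping raw ages into ranges and counting frequencies.
# The ages and the bin boundaries are invented for illustration.
ages = [18, 21, 24, 29, 33, 35, 41, 44, 47, 52, 56, 61, 63, 67, 72]
bins = [(15, 25), (26, 35), (36, 45), (46, 55), (56, 75)]

def age_range(age):
    for low, high in bins:
        if low <= age <= high:
            return f"{low}-{high}"
    return "other"

frequencies = Counter(age_range(a) for a in ages)
for category, count in sorted(frequencies.items()):
    print(category, count)
```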

Distributions may also be displayed using percentages. For example,
you could use percentages to describe the:

 percentage of people in different income levels

 percentage of people in different age ranges

 percentage of people in different ranges of standardized test scores

Central Tendency. The central tendency of a distribution is an estimate
of the "center" of a distribution of values. There are three major types of
estimates of central tendency:

 Mean

 Median

 Mode

The Mean or average is probably the most commonly used method of
describing central tendency. To compute the mean all you do is add up
all the values and divide by the number of values. For example, the
mean or average quiz score is determined by summing all the scores
and dividing by the number of students taking the exam. For example,
consider the test score values:

15, 20, 21, 20, 36, 15, 25, 15


The sum of these 8 values is 167, so the mean is 167/8 = 20.875.

The Median is the score found at the exact middle of the set of values.
One way to compute the median is to list all scores in numerical order,
and then locate the score in the center of the sample. For example, if
there are 500 scores in the list, score #250 would be the median. If we
order the 8 scores shown above, we would get:

15,15,15,20,20,21,25,36

There are 8 scores, and scores #4 and #5 represent the halfway point.
Since both of these scores are 20, the median is 20. If the two middle
scores had different values, you would have to interpolate to determine
the median.
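A minimal Python sketch of this procedure (the helper function below is ours, not from the text), including the averaging step used when there is an even number of scores:

```python
# Minimal sketch of the median procedure: sort the scores, then take the
# middle one (odd n) or average the two middle ones (even n).
def median(scores):
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # interpolate between the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([15, 20, 21, 20, 36, 15, 25, 15]))  # 20.0 (scores #4 and #5 are both 20)
```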

The mode is the most frequently occurring value in the set of scores. To
determine the mode, you might again order the scores as shown above,
and then count each one. The most frequently occurring value is the
mode. In our example, the value 15 occurs three times and is the mode.
In some distributions there is more than one modal value. For instance,
in a bimodal distribution there are two values that occur most frequently.

Notice that for the same set of 8 scores we got three different values --
20.875, 20, and 15 -- for the mean, median and mode respectively. If the
distribution is truly normal (i.e., bell-shaped), the mean, median and
mode are all equal to each other.

Dispersion. Dispersion refers to the spread of the values around the
central tendency. There are two common measures of dispersion, the
range and the standard deviation. The range is simply the highest value
minus the lowest value. In our example distribution, the high value is 36
and the low is 15, so the range is 36 - 15 = 21.

The Standard Deviation is a more accurate and detailed estimate of dispersion, because an outlier can greatly exaggerate the range (as was true in this example, where the single outlier value of 36 stands apart from the rest of the values). The Standard Deviation shows the relation that the set of scores has to the mean of the sample. Again let's take the set of scores:

15,20,21,20,36,15,25,15

To compute the standard deviation, we first find the distance between
each value and the mean. We know from above that the mean is 20.875.
So, the differences from the mean are:

15 - 20.875 = -5.875

20 - 20.875 = -0.875

21 - 20.875 = +0.125

20 - 20.875 = -0.875

36 - 20.875 = 15.125

15 - 20.875 = -5.875

25 - 20.875 = +4.125

15 - 20.875 = -5.875

Notice that values that are below the mean have negative discrepancies
and values above it have positive ones. Next, we square each
discrepancy:

-5.875 * -5.875 = 34.515625

-0.875 * -0.875 = 0.765625

+0.125 * +0.125 = 0.015625

-0.875 * -0.875 = 0.765625

15.125 * 15.125 = 228.765625

-5.875 * -5.875 = 34.515625

+4.125 * +4.125 = 17.015625

-5.875 * -5.875 = 34.515625

Now, we take these "squares" and sum them to get the Sum of Squares
(SS) value. Here, the sum is 350.875. Next, we divide this sum by the
number of scores minus 1. Here, the result is 350.875 / 7 = 50.125. This
value is known as the variance. To get the standard deviation, we take
the square root of the variance (remember that we squared the
deviations earlier). This would be SQRT(50.125) = 7.079901129253.

Although this computation may seem convoluted, it's actually quite simple. To see this, consider the formula for the standard deviation:

s = SQRT( sum of (X - mean)^2 / (n - 1) )

In the top part of the ratio, the numerator, we see that each score has the mean subtracted from it, the difference is squared, and the squares are summed. In the bottom part, we take the number of scores minus 1. The ratio is the variance and the square root is the standard deviation. In English, we can describe the standard deviation as:

the square root of the sum of the squared deviations from the mean, divided by the number of scores minus one

Although we can calculate these univariate statistics by hand, it gets
quite tedious when you have more than a few values and variables.
Every statistics program is capable of calculating them easily for you.
For instance, I put the eight scores into SPSS and got the following table
as a result:
N 8

Mean 20.8750

Median 20.0000

Mode 15.00

Std. Deviation 7.0799

Variance 50.1250

Range 21.00

which confirms the calculations I did by hand above.
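If SPSS is not at hand, the same summary table can be reproduced with, for example, Python's standard statistics module; this is a minimal sketch, not the program used in the text:

```python
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]

print("N             ", len(scores))                 # 8
print("Mean          ", statistics.mean(scores))     # 20.875
print("Median        ", statistics.median(scores))   # 20.0
print("Mode          ", statistics.mode(scores))     # 15
print("Std. Deviation", statistics.stdev(scores))    # 7.0799...
print("Variance      ", statistics.variance(scores)) # 50.125
print("Range         ", max(scores) - min(scores))   # 21
```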

The standard deviation allows us to reach some conclusions about
specific scores in our distribution. Assuming that the distribution of
scores is normal or bell-shaped (or close to it!), the following conclusions
can be reached:

 approximately 68% of the scores in the sample fall within one standard deviation of the mean

 approximately 95% of the scores in the sample fall within two standard deviations of the mean

 approximately 99% of the scores in the sample fall within three standard deviations of the mean

For instance, since the mean in our example is 20.875 and the standard
deviation is 7.0799, we can from the above statement estimate that
approximately 95% of the scores will fall in the range of 20.875-
(2*7.0799) to 20.875+(2*7.0799) or between 6.7152 and 35.0348. This
kind of information is a critical stepping stone to enabling us to compare
the performance of an individual on one variable with their performance
on another, even when the variables are measured on entirely different
scales.
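A quick check of that arithmetic in Python:

```python
# Quick check of the "within two standard deviations" range used above.
mean, sd = 20.875, 7.0799
low, high = mean - 2 * sd, mean + 2 * sd
print(round(low, 4), round(high, 4))  # 6.7152 35.0348
```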
Deciles are similar to quartiles. But while quartiles sort data into four quarters, deciles sort data into ten
equal parts: The 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, 90th and 100th percentiles.
A decile rank assigns a number to a decile:
Decile Rank    Percentile
1              10th
2              20th
3              30th
4              40th
5              50th
6              60th
7              70th
8              80th
9              90th
10             100th

The higher your place in the decile rankings, the higher your overall ranking. For example, if you were in
the 99th percentile for a particular test, that would put you in the decile ranking of 10. A person who
scored very low (say, the 5th percentile) would find themselves in a decile rank of 1.
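A minimal sketch of this percentile-to-decile-rank mapping (the helper function is illustrative, not from the text):

```python
import math

# Minimal sketch of the percentile-to-decile-rank mapping described above.
def decile_rank(percentile):
    # Map a percentile (0 < p <= 100) to a decile rank from 1 to 10.
    return math.ceil(percentile / 10)

print(decile_rank(99))  # 10
print(decile_rank(5))   # 1
print(decile_rank(10))  # 1 (the 10th percentile tops decile rank 1)
```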

A chart showing decile rankings for discharged stroke patients. Image: SUNY Buffalo

This section on deciles looks at these partition values for a given set of data.

Decile: Definition

Deciles are the nine partition values that divide the data, or a given set of observations, into ten equal parts. These 9 values are represented by D₁, D₂, D₃, D₄, D₅, D₆, D₇, D₈ and D₉.

They mark the 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90% points of the data.

For ungrouped data:

Example 1:

Given the series 3, 5, 7, 4, 6, 2 and 9.

Calculate the 2nd and 4th decile.


Solution:

To find the deciles, first we have to arrange the data in order:

2, 3, 4, 5, 6, 7 and 9.

Here n = 7.

D₂ = value of the 2[(n+1)/10]th item
   = value of the 2[(7+1)/10]th item
   = value of the 1.6th item
   = 1st value + 0.6 of the distance between the 1st and 2nd values
   = 2 + 0.6(3 - 2)

D₂ = 2.6

Now let us find the value of D₄.

The ordered data is 2, 3, 4, 5, 6, 7 and 9, and n = 7.

D₄ = value of the 4[(n+1)/10]th item
   = value of the 4[(7+1)/10]th item
   = value of the 3.2th item
   = 3rd value + 0.2 of the distance between the 3rd and 4th values
   = 4 + 0.2(5 - 4)

D₄ = 4.2
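As a check, here is a minimal Python sketch (the decile function is ours, not from the text) implementing the same k(n+1)/10 interpolation rule:

```python
# Minimal sketch of the ungrouped-data rule used above: D_k is the value of
# the k(n+1)/10-th ordered item, interpolating when the position is fractional.
def decile(data, k):
    ordered = sorted(data)
    n = len(ordered)
    position = k * (n + 1) / 10      # 1-based position of D_k
    whole = int(position)
    fraction = position - whole
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])

series = [3, 5, 7, 4, 6, 2, 9]
print(decile(series, 2))  # 2.6
print(decile(series, 4))  # 4.2
```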

For grouped data, the decile formula (reconstructed here in the notation used in the worked example below) is:

Dₖ = Lᵢ + {[(k·N)/10 - Fᵢ₋₁]/fᵢ}·aᵢ

Where

Lᵢ = lower limit of the decile class
N = sum of the absolute frequencies
Fᵢ₋₁ = cumulative frequency of the class immediately below the decile class
fᵢ = absolute frequency of the decile class
aᵢ = width of the decile class

Note: The decile is independent of the width of the classes.

Let us consider an example for grouped data.

Example:
Calculate the decile D₁ and D₃ for the following table.

Solution:

Calculation for the first decile:

D₁ = L₁ + {[(k·N)/10 - F₁₋₁]/f₁}·a₁
   = 40 + {[(1·70)/10 - 0]/8}·10
   = 40 + [(7 - 0)/8]·10
   = 40 + 70/8
   = 40 + 8.75
   = 48.75
Calculation for the third decile:

D₃ = L₃ + {[(k·N)/10 - F₃₋₁]/f₃}·a₃
   = 60 + {[(3·70)/10 - 20]/14}·10
   = 60 + [(21 - 20)/14]·10
   = 60 + 10/14
   = 60 + 0.71
   = 60.71
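For reference, here is a minimal Python sketch of the grouped-data formula (our helper, not from the text), using the class values that appear in the worked calculations above, since the frequency table itself is not reproduced here:

```python
# Minimal sketch of the grouped-data decile formula used above:
#   D_k = L + ((k*N/10 - F_prev) / f) * a
# The class values are taken from the worked calculations; the full
# frequency table itself is not reproduced here.
def grouped_decile(k, L, N, F_prev, f, a):
    return L + ((k * N / 10 - F_prev) / f) * a

print(grouped_decile(k=1, L=40, N=70, F_prev=0,  f=8,  a=10))  # 48.75
print(grouped_decile(k=3, L=60, N=70, F_prev=20, f=14, a=10))  # about 60.71
```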

The following formula can also be used to find the deciles for grouped data:

Dₖ = l + (h/f)·[(k·n)/10 - c]

Where

l = lower boundary of the class containing the kth decile
h = width of that class
f = frequency of that class
n = total number of frequencies
c = cumulative frequency preceding that class
Both methods give the same answer, so students can choose whichever formula they find more convenient for computing the deciles.
