Chapter - 3 Organisation of Data
Organisation of data refers to the systematic arrangement of collected figures
(raw data), so that the data become easy to understand and more convenient
for further statistical treatment.
Classification is the process of arranging data into sequences and groups
according to their common characteristics, or separating them into different but
related parts.
Characteristics of classification:
1. Homogeneity
2. Suitability
3. Clarity
4. Flexibility
5. Diversification
A variable is a characteristic which is capable of being measured and capable of
change in its value from time to time.
Basis of classification:
Raw data can be classified as:
1. Chronological classification: In such a classification, data are classified
either in ascending or in descending order with reference to time, such as years,
quarters, months, weeks, etc.
2. Geographical/Spatial classification: The data are classified with reference
to geographical location/place, such as countries, states, cities, districts, blocks,
etc.
3. Qualitative classification: Data are classified with reference to descriptive
characteristics like sex, caste, religion, literacy, etc.
4. Quantitative classification: Data are classified on the basis of some
measurable characteristics such as height, age, weight, income, marks of
students.
5. Conditional classification: When data are classified with respect to a condition,
the type of classification is called conditional classification.
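The bases of classification listed above can be illustrated with a minimal Python sketch. The records, the field order and the helper name classify are hypothetical and chosen only for illustration; the sketch shows chronological ordering and grouping by a geographical and a qualitative attribute.

```python
from collections import defaultdict

# Hypothetical raw records (year, state, sex, income); values are illustrative only.
records = [
    (2021, "Kerala", "F", 300),
    (2019, "Punjab", "M", 250),
    (2020, "Kerala", "M", 400),
    (2019, "Punjab", "F", 350),
]

# Chronological classification: arrange the records in ascending order of time.
chronological = sorted(records, key=lambda r: r[0])


def classify(rows, key_index):
    """Group rows by the attribute stored at position key_index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return dict(groups)


geographical = classify(records, 1)  # grouped by state
qualitative = classify(records, 2)   # grouped by sex

print([r[0] for r in chronological])                   # [2019, 2019, 2020, 2021]
print({k: len(v) for k, v in geographical.items()})    # {'Kerala': 2, 'Punjab': 2}
print({k: len(v) for k, v in qualitative.items()})     # {'F': 2, 'M': 2}
```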
A mass of data in its original form is called raw data. It is an unorganized mass
of various items.
A characteristic which is capable of being measured and changes its value
over time is called a variable. It is of two types:
(a) Discrete
(b) Continuous
Discrete variable: Discrete variables are those variables that increase in jumps
or in complete numbers and are not fractional. Ex.: the number of students in a
class could be 2, 4, 10, 15, 20, 25, etc. It does not take any fractional value
between them.
Continuous variable: Continuous variables are those variables that can take
any value, i.e. an integral or a fractional value, in a specified interval. Ex.: wages
of workers in a factory.
A frequency distribution is a comprehensive way to classify raw data of a
quantitative variable. It shows how the different values of a variable are
distributed in different classes, along with their corresponding class frequencies.
The class mid-point or class mark is the middle value of a class. It lies halfway
between the lower class limit and the upper class limit of a class and can be
ascertained in the following manner.
Class mid-point = (upper class limit + lower class limit) / 2
Class frequency: It means the number of values in a particular class.
Class width: It is the difference between the upper class limit and the lower
class limit.
Class width = upper class limit - lower class limit
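The two formulas for class mid-point and class width can be checked with a small Python sketch. The function names are assumptions made for illustration; the class 10-20 is used as the example.

```python
def class_mid_point(lower_limit, upper_limit):
    """Class mark: the value halfway between the two class limits."""
    return (upper_limit + lower_limit) / 2


def class_width(lower_limit, upper_limit):
    """Difference between the upper and the lower class limit."""
    return upper_limit - lower_limit


# Example with the class 10-20.
print(class_mid_point(10, 20))  # 15.0
print(class_width(10, 20))      # 10
```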
Class limits: There are two ends of a class. The lowest value is called the lower
class limit and the highest value is called the upper class limit.
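In the exclusive method, the classes are formed in such a way that the upper
class limit of one class equals the lower class limit of the next class, e.g. 0-10,
10-20. An observation equal to the upper class limit is excluded from that class
and counted in the next class.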
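A minimal Python sketch of counting by the exclusive method is given here. The function name, its parameters and the list of values are hypothetical; the rule applied is that a value equal to an upper class limit goes into the next class.

```python
def exclusive_frequencies(values, lower, width, n_classes):
    """Count values into classes lo-hi where lo <= value < hi."""
    classes = [(lower + i * width, lower + (i + 1) * width) for i in range(n_classes)]
    freq = {c: 0 for c in classes}
    for x in values:
        for lo, hi in classes:
            if lo <= x < hi:  # the upper class limit is excluded
                freq[(lo, hi)] += 1
                break
    return freq


# Hypothetical marks; both 10s fall in the class 10-20, not in 0-10.
marks = [0, 5, 10, 10, 15, 19, 20]
print(exclusive_frequencies(marks, lower=0, width=10, n_classes=3))
# {(0, 10): 2, (10, 20): 4, (20, 30): 1}
```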
In comparison to the exclusive method, the inclusive method does not exclude
the upper class limit from a class interval. It includes the upper class limit in the
class. Thus both class limits are part of the class interval, e.g. 0-9, 10-19.
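For comparison, counting by the inclusive method can be sketched in Python as follows. The function name and the values are hypothetical, and whole-number data are assumed so that no value falls between 9 and 10.

```python
def inclusive_frequencies(values, classes):
    """Count values into classes lo-hi where lo <= value <= hi."""
    freq = {c: 0 for c in classes}
    for x in values:
        for lo, hi in classes:
            if lo <= x <= hi:  # both class limits belong to the class
                freq[(lo, hi)] += 1
                break
    return freq


# Hypothetical whole-number values; 9 stays in 0-9 and 19 stays in 10-19.
values = [0, 5, 9, 10, 19]
print(inclusive_frequencies(values, [(0, 9), (10, 19)]))
# {(0, 9): 3, (10, 19): 2}
```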
The classification of data as a frequency distribution has an inherent
shortcoming. While it summarizes the raw data, making it concise and
comprehensible, it does not show the details that are found in the raw data. So
there is a loss of information in classifying raw data.
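The loss of information can be seen in a short Python sketch: two different sets of raw data give exactly the same frequency distribution, so the raw values cannot be recovered from the classes. The first list uses the daily wages from the individual series in this chapter; the second list, the function name and the choice of classes (20-30, 30-40, ... by the exclusive method) are assumptions made for illustration.

```python
from collections import Counter


def to_frequency_distribution(values, lower=20, width=10):
    """Place each value in its exclusive class (lo, hi) and count the classes."""
    counts = Counter()
    for x in values:
        lo = lower + ((x - lower) // width) * width
        counts[(lo, lo + width)] += 1
    return dict(sorted(counts.items()))


raw_a = [25, 50, 35, 40, 20, 45]  # daily wages from the individual series
raw_b = [21, 29, 31, 41, 49, 59]  # hypothetical second set of wages

dist_a = to_frequency_distribution(raw_a)
dist_b = to_frequency_distribution(raw_b)

print(dist_a)            # {(20, 30): 2, (30, 40): 1, (40, 50): 2, (50, 60): 1}
print(dist_a == dist_b)  # True, although the raw data differ
```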
Classification of data implies conversion of raw data into statistical series.
The difference between univariate and bivariate frequency distribution:
Basis           Univariate frequency distribution     Bivariate frequency distribution
Meaning         When data is classified on the        When data is classified on the
                basis of a single variable, the       basis of two variables, the
                distribution is known as a            distribution is known as a
                univariate frequency distribution.    bivariate frequency distribution.
Alternate name  One-way frequency                     Two-way frequency
Example         Height of students in a class         Height and weight of students in a class
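The difference can be sketched in Python with the height and weight example. The six (height, weight) pairs, the class widths and the helper name class_of are hypothetical; the univariate distribution counts one variable, while the bivariate distribution counts pairs of classes.

```python
from collections import Counter

# Hypothetical (height in cm, weight in kg) pairs for six students.
students = [(150, 45), (152, 48), (161, 52), (165, 58), (158, 47), (168, 61)]


def class_of(x, lower, width):
    """Return the exclusive class (lo, hi) that contains x."""
    lo = lower + ((x - lower) // width) * width
    return (lo, lo + width)


# Univariate (one-way): height only.
univariate = Counter(class_of(h, 150, 10) for h, _ in students)

# Bivariate (two-way): height class and weight class together.
bivariate = Counter(
    (class_of(h, 150, 10), class_of(w, 40, 10)) for h, w in students
)

print(dict(univariate))
# {(150, 160): 3, (160, 170): 3}
for (height_class, weight_class), f in sorted(bivariate.items()):
    print(height_class, weight_class, f)
```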
Broadly, statistical series are of two types.
Types of series
1. Individual series
2. Frequency series
a. Discrete series or frequency array
b. Frequency distribution or continuous series
Individual series are those series in which the items are listed singly. For
example:
Sr. No. of workers    Daily wages (in Rs.)
1                     25
2                     50
3                     35
4                     40
5                     20
6                     45
A discrete series or frequency array is that series in which data are presented
in a way that exact measurements of items are clearly shown. The example in
the following table illustrates a frequency array.
Frequency array of the size of household
Size of the household    Number of households (Frequency)
1                        5
2                        15
3                        25
4                        35
5                        10
6                        5
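A frequency array can be built from raw observations of a discrete variable by counting how often each exact value occurs. The sketch below uses Python's collections.Counter; the list of household sizes is hypothetical and much shorter than the data behind the table above.

```python
from collections import Counter

# Hypothetical raw data: size of ten households.
household_sizes = [3, 4, 2, 4, 3, 1, 4, 5, 3, 4]

# Each distinct value becomes a row; its count is the frequency.
frequency_array = dict(sorted(Counter(household_sizes).items()))

for size, frequency in frequency_array.items():
    print(size, frequency)
# 1 1
# 2 1
# 3 3
# 4 4
# 5 1
```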
A continuous series is that series in which items cannot be exactly
measured. The items assume a range of values and are placed within the range
of limits. In other words, data are classified into different classes with a range;
each range is called a class interval.
Frequency distribution or continuous series
Marks    Frequency
10-20    4
20-30    5
30-40    8
40-50    5
50-60    4
60-70    3
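The marks table above can be written as a small data structure in Python. This is only a sketch; the variable names are assumptions. It checks that the classes follow the exclusive method and works out the class marks and the total frequency.

```python
# The marks table as (lower class limit, upper class limit, frequency) rows.
marks_series = [
    (10, 20, 4),
    (20, 30, 5),
    (30, 40, 8),
    (40, 50, 5),
    (50, 60, 4),
    (60, 70, 3),
]

# Exclusive method: each upper class limit equals the next lower class limit.
is_exclusive = all(
    marks_series[i][1] == marks_series[i + 1][0]
    for i in range(len(marks_series) - 1)
)

class_marks = [(lo + hi) / 2 for lo, hi, _ in marks_series]
total_frequency = sum(f for _, _, f in marks_series)

print(is_exclusive)     # True
print(class_marks)      # [15.0, 25.0, 35.0, 45.0, 55.0, 65.0]
print(total_frequency)  # 29
```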