Chapter 4 Organizing Data, Classification and Tabulation

This chapter introduces you to the tabulation of data so that you can make sense out of the collected data.

Uploaded by

vg3162
Copyright
© All Rights Reserved

Chapter 4 Organizing Data

4.1 Introduction
Data in itself is not information. We need to summarize and present data in useful ways to
support insight and help take effective decisions. Classification is the process of arranging
things (either actually or notionally) in groups or classes according to their resemblances
and affinities, and gives expression to the unity of attributes that may subsist amongst a
diversity of individuals. It serves the following purposes:
• eliminates unnecessary details,
• brings out clearly the points of similarity and dissimilarity, and
• allows comparisons and drawing of inferences.
The first step in the process of classification is to select the basis of classification. Statistical
facts are classified according to their characteristics. Thus, the students of a college may be
classified according to their marital status, height, religion, etc. When a particular
characteristic has been chosen for this purpose, the next step in the process of classification
would be to note the similarity and dissimilarity as regards this chosen characteristic in the
various items. Items that would be alike in respect of this characteristic will be grouped
together. Thus, if the students are to be classified according to their marital status, all
married students would be put into one group and all unmarried in another. If the students
are classified on the basis of religion, there will be different groups for Hindus, Muslims,
Sikhs, etc. When the classification is made according to heights, each group will include only
those students whose heights lie within a certain range. Note that the three
characteristics (marital status, religion and height) give us groups of significantly
different kinds. In the first case the characteristic is either present or absent,
e.g., married or unmarried. In the second the characteristic differs in quality,
e.g., students may be Hindus, Muslims or Sikhs. In the third the characteristic is
measured numerically, and each group covers a range of values.
The characteristics of a population may be broadly divided into two categories: attributes
and variables. Attributes are qualitative characteristics which are not capable of being
described numerically, e.g., sex, nationality, colour of eye, etc. These characteristics are
called attributes or descriptive characteristics. When classification is to be made on the basis
of attributes, groups are differentiated either by the presence and absence of the attribute
(e.g., married or not married), or by its differing qualities. The qualities of an attribute can
easily be differentiated by means of some natural or physical line of demarcation, and their
natural differences determine the group into which a particular item is to be placed. Thus, if
we select colour of eye as the basis of classification, there will be a group of brown-eyed
people, another of blue-eyed people, and so on.
Data are organized by creating summaries in the form of tables, which are, more often than not,
also presented as visualizations. Large amounts of data can be reviewed rapidly through
visual summaries in the form of graphs and charts. Such visualizations reveal significant
patterns in the data. Visualization is taken up in the next chapter.
Because the methods used to organize categorical variables differ from the methods used to
organize and visualize numerical variables, this chapter discusses methods for classification
for numerical and categorical variables in separate sections.
It may be noted that creating tabular summaries may often risk distorting the information
that they present and undermining the usefulness of those summaries for decision making.
Surely, a summary of any kind means that some details have been suppressed. Resulting
information can end up being distorted if the class intervals are not chosen appropriately.

4.2 Organizing Numerical Variables


Consider the marks obtained by 44 college students in an examination, given in
Table 4.1.
Table 4.1
74 69 79 62 66 80 83 72 72 75 74
75 69 65 73 65 62 84 64 73 61 76
71 70 68 68 65 68 81 64 61 71 66
74 70 62 73 65 71 87 79 72 67 69
Note that the marks obtained is a discrete variable which takes integer values between 0
and 100, though in the present case there are no values from 0 to 60 or from 88 to 100. As
was maintained earlier, this data is a shapeless mass not capable of being readily
assimilated or interpreted. It is, therefore, to be organised properly. The first thing that is to
be done in the matter of arranging the collected data is to prepare an array. The array is
prepared by arranging the values of the variable in an ascending or descending order. This
will enable us to know the range over which the items are spread and we will also get an
idea of their general distribution.
If we rearrange the above marks in ascending order, we get the following as an array:
Table 4.2
61 61 62 62 62 64 64 65 65 65 65
66 66 67 68 68 68 69 69 69 70 70
71 71 71 72 72 72 73 73 73 74 74
74 75 75 76 79 79 80 81 83 84 87
This is much more illuminating than the previous organization. We can right away see that
marks obtained lie between 61 and 87. However, this in itself is not sufficient. In order to
make the data easily understandable, the first task of the statistician is to condense and
simplify them in such a manner that irrelevant details are eliminated and their significant
features stand out prominently.
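The array step can be reproduced in a few lines of Python. The following is only a sketch: the variable names are illustrative and are not part of the original text, and the data are typed in from Table 4.1.

```python
# Marks from Table 4.1 (44 values), typed in row by row.
marks = [
    74, 69, 79, 62, 66, 80, 83, 72, 72, 75, 74,
    75, 69, 65, 73, 65, 62, 84, 64, 73, 61, 76,
    71, 70, 68, 68, 65, 68, 81, 64, 61, 71, 66,
    74, 70, 62, 73, 65, 71, 87, 79, 72, 67, 69,
]

# The array is simply the data arranged in ascending order.
array = sorted(marks)

# The first and last items of the array give the range of the data.
lowest, highest = array[0], array[-1]

print(array)
print("lowest:", lowest, "highest:", highest, "range:", highest - lowest)
```

Sorting in descending order (`sorted(marks, reverse=True)`) would serve equally well as an array.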
Frequency distribution
A frequency distribution shows how many times the various values (or classes of values) of
the numerical variable occur in the data. After arranging the data as in Table 4.2, their bulk
must be reduced so that the eye can take them in easily, the mind can comprehend them and
computational methods can deal with them efficiently. A first step in such a condensation is
to represent the repetitions of a particular value of the marks by tallies
instead of rewriting the value itself, as in Table 4.3. The number of tallies corresponding to
any given value is the frequency of that value, usually represented by the symbol f. The
traditional method of tallying is to make one stroke for each occurrence until four have been
made, and then to cross the four strokes with a fifth. This procedure gives the preliminary
sheet shown in Table 4.3.
Table 4.3 Tallying marks to obtain frequency distribution of data of Table 4.1
Marks   Tallies   Frequency, f     Marks   Tallies   Frequency, f
61      ||        2                75      ||        2
62      |||       3                76      |         1
63                0                77                0
64      ||        2                78                0
65      ||||      4                79      ||        2
66      ||        2                80      |         1
67      |         1                81      |         1
68      |||       3                82                0
69      |||       3                83      |         1
70      ||        2                84      |         1
71      |||       3                85                0
72      |||       3                86                0
73      |||       3                87      |         1
74      |||       3
Breaking down the data into the form of Table 4.3 makes much more information apparent.
Even a superficial scrutiny reveals the number of students getting a certain number of marks.
Thus, 65 is scored by 4 students, while no student scores 63, 77, 78, 82, 85 or 86.
Frequency, thus, means the number of times a certain value of the variable is repeated in the
given data. The table so formed is known as a frequency distribution table.
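The tallying procedure amounts to counting how often each value occurs. A minimal Python sketch is given here, assuming the data of Table 4.1 and using the standard-library `Counter`; the names `freq` and `not_scored` are illustrative only.

```python
from collections import Counter

# Marks from Table 4.1 (44 values).
marks = [
    74, 69, 79, 62, 66, 80, 83, 72, 72, 75, 74,
    75, 69, 65, 73, 65, 62, 84, 64, 73, 61, 76,
    71, 70, 68, 68, 65, 68, 81, 64, 61, 71, 66,
    74, 70, 62, 73, 65, 71, 87, 79, 72, 67, 69,
]

# Counter maps each distinct mark to the number of times it occurs.
freq = Counter(marks)

# Walk through every mark in the observed range, including marks that
# nobody scored (Counter returns 0 for a missing key).
for value in range(min(marks), max(marks) + 1):
    f = freq[value]
    print(f"{value:>3}  {'|' * f:<6} {f}")

# Marks within the range that have zero frequency.
not_scored = [v for v in range(min(marks), max(marks) + 1) if freq[v] == 0]
print("marks scored by nobody:", not_scored)
```

The printed strokes are plain vertical bars; the cross stroke for every fifth item is a pencil-and-paper convenience and is not imitated here.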
Even after so much simplification, the data contain too many figures. To make them
more readily comprehensible, the bulk of the data is further reduced by preparing a
grouped frequency distribution. Such a distribution arranges the values of a numerical variable
into a set of numerically ordered classes. Each class covers a range of
values, called a class interval, and the classes are mutually exclusive. Each value can be
assigned to only one class, and every value must be contained in one of the class intervals.
To create a useful frequency distribution, you must consider how many classes would be
appropriate for your data as well as determine a suitable width for each class interval. In
general, a frequency distribution should have at least 5 and no more than 15 classes. In our
data given in Table 4.1, the variable under study is marks scored and its different values are
the marks which may be (or are) obtained by 44 students individually. The objects in
relation to which the values of the variable are obtained are called items. In our data of
Table 4.1 students are the items.
Now, if we add up the number of items (students) obtaining marks equal to any of five
consecutive values, beginning with 61, the grouped frequency distribution takes the
form shown in Table 4.4.
Each group of 5 consecutive values of marks, viz. 61-65, 66-70, 71-75, 76-80, etc., is called a
class. Since each class includes five values, the width of the class is 5. The first figure of each
class is known as the lower limit of the class, and the last figure of each class is called its
upper limit. Thus, 61, 66, 71, 76, etc. are, respectively, the lower limits of the classes, and 65,
70, 75, 80, etc. are their upper limits.
Table 4.4 Grouped frequency distribution of data of Table 4.1
Class interval Frequency
61-65 11
66-70 11
71-75 14
76-80 4
81-85 3
86-90 1
N=44
A look at this table makes it evident that grouping the frequencies
considerably reduces the size of the data. As a result, the data become more
comprehensible and the salient features of the variable stand out. Thus, we find that the
largest number of students (14) have marks between 71 and 75, that only 8 have more
than 75, and that only one has more than 85.
It should be noted that there is a price that we have paid for the simplification obtained.
Thus, while the frequency distribution of Table 4.3 had all the details, the grouped
frequency distribution has masked certain details. We no longer know how the 14 items
within the class 71-75 are distributed. Are they all close to 71 or to 75 or more or less
uniformly distributed across the class interval? This smudging of the details is the price we
pay for easier comprehension of the essential pattern of the distribution. Clearly the larger
the class interval we select, the more would be this loss of information. There is thus, an
optimum class size for each problem at which the best compromise is made between the
demands of details and simplification.
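The grouping into inclusive classes, and the loss of detail as the class width grows, can be sketched in Python as follows. The function name `grouped_frequencies` and its parameters are illustrative assumptions, not part of the original text.

```python
# Marks from Table 4.1 (44 values).
marks = [
    74, 69, 79, 62, 66, 80, 83, 72, 72, 75, 74,
    75, 69, 65, 73, 65, 62, 84, 64, 73, 61, 76,
    71, 70, 68, 68, 65, 68, 81, 64, 61, 71, 66,
    74, 70, 62, 73, 65, 71, 87, 79, 72, 67, 69,
]


def grouped_frequencies(values, start, width):
    """Count integer values in inclusive classes start..start+width-1, and so on."""
    counts = {}
    lower = start
    while lower <= max(values):
        upper = lower + width - 1          # both limits belong to the class
        counts[(lower, upper)] = sum(lower <= v <= upper for v in values)
        lower += width
    return counts


# Width 5 reproduces the classes 61-65, 66-70, ... of Table 4.4.
narrow = grouped_frequencies(marks, start=61, width=5)
# Width 10 gives fewer classes and hides more detail within each class.
wide = grouped_frequencies(marks, start=61, width=10)

for (lower, upper), f in narrow.items():
    print(f"{lower}-{upper}: {f}")
for (lower, upper), f in wide.items():
    print(f"{lower}-{upper}: {f}")
```

Comparing the two printouts shows the trade-off discussed above: the wider classes are easier to read but say less about where the items lie.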

4.3 Selecting Class Intervals


The main problem in preparing a frequency distribution as detailed above is that of selecting
class intervals. The first question that has to be settled is the number of classes to be used.
No hard and fast rules can be given for determining the number of classes. This is decided
by balancing the two extremes: one, where the number is so large that enough
simplification is not obtained, and two, when the number is so small that much useful
details are lost. A good rule of thumb is to choose between 5 and 15 classes, the exact
number being determined by other considerations as detailed below.
The class width used should be some convenient number like 1, 5, 10, 25, 100, etc., and
ordinarily not 7, 11, 13, 26, 29, etc. Thus, in the distribution of marks of Table 4.4 the class
width is 5, giving 6 intervals. We could have chosen a width of 6 or 7, either of which gives a
manageable number of classes, but both would have been awkward to work with.
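A quick way to compare candidate class widths is to compute how many classes each would need. The sketch below assumes integer data, inclusive classes, and a first class that starts at the lowest observed value; the helper name `classes_needed` is hypothetical.

```python
import math


def classes_needed(lowest, highest, width):
    """Number of inclusive integer classes of the given width covering lowest..highest."""
    return math.ceil((highest - lowest + 1) / width)


# Marks in Table 4.1 run from 61 to 87.
for width in (1, 2, 5, 10):
    n = classes_needed(61, 87, width)
    verdict = "within 5 to 15" if 5 <= n <= 15 else "outside 5 to 15"
    print(f"width {width:>2}: {n:>2} classes ({verdict})")
```

The rule of thumb only narrows the choice; the final width is still picked for convenience, as explained above.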
For discrete variable data, i.e., where there are finite jumps between the consecutive
possible values of the variable, the limits can be specified in a number of ways. One way is
that shown in Table 4.4 for the marks obtained. Here the limits are both inclusive, i.e., items
having values equal to the lower and the upper limits of a class are included in that class. In
the other method, we use what are called exclusive classes wherein the items equal to the
size of either the upper limit or the lower limit are excluded from the frequency of that
class. These may be either of the lower limit excluded type as given in Table 4.5, or the
upper limit excluded type as given in Table 4.6.
Table 4.5 Lower limit excluded classes
Table 4.6 Upper limit excluded classes
When the variable involved is a continuous one, i.e., when the variable takes values on a
continuous scale, the meaning of the class limit becomes slightly different. Now, we cannot
have the limits as in Tables 4.4 to 4.6. The intervals must cover the entire
range, and therefore there cannot be any gaps between them. When classifying weights of pre-teen kids,
we cannot use classes like 31-35, 36-40, 41-45, etc. (all in kg), because the weights
are not restricted to whole numbers. A weight of 40.6 kg will not fit any of these classes. A
little thought will reveal that only such classes as 30-35, 35-40, 40-45, etc. will do. Here any
weight from 30 up to, but less than, 35 will be included in the first class, and any weight
greater than 35 and less than 40 in the second
class. How about a weight of exactly 35 kg? Theoretically, a weight of exactly 35 kg has very
little chance of occurring. If it were measured accurately enough, we would probably find that it is
1 gm (or even less) on one side or the other. But usually weights are not measured to
such accuracy. If we get one that is reported as 35 kg, where do we place it?
There are at least three courses that are available to a statistician: (1) include half an item in
the class 30-35 and the other half in 35-40, (2) toss a coin to decide, and (3) make a thumb
rule to include in the higher class the first time such an item occurs, and in the lower class
when it occurs the second time. The first option is perhaps the best statistically, but the idea
of half a boy in 35-40 class is diverting, and is best avoided. The second option suffers from
the fact that it will not permit rechecking the results. The third, or some variant of it, is
usually employed.

Illustration 4.1
From the following observations prepare a frequency distribution in ascending order
starting with the class of 5-10 (exclusive method):

Marks in English:
12 36 40 30 28 20 19 10 10 16
19 27 15 26 20 19 7 45 33 21
56 37 6 20 11 17 37 30 20 5
Solution:
Note the variable involved is a discrete one. We chose the 'exclusive' type of limits. The
resulting distributions with upper limit excluded and with lower limit excluded are as given in
Table 4.7.
Table 4.7
With upper limit excluded          With lower limit excluded
Marks               Frequency      Marks                Frequency
0 - less than 5     0              More than 0 - 5      1
5 - less than 10    3              More than 5 - 10     4
10 - less than 15   4              More than 10 - 15    3
15 - less than 20   6              More than 15 - 20    9
20 - less than 25   5              More than 20 - 25    1
25 - less than 30   3              More than 25 - 30    5
30 - less than 35   3              More than 30 - 35    1
35 - less than 40   3              More than 35 - 40    4
40 - less than 45   1              More than 40 - 45    1
45 - less than 50   1              More than 45 - 50    0
50 - less than 55   0              More than 50 - 55    0
55 - less than 60   1              More than 55 - 60    1
Total               30             Total                30
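The two exclusive conventions differ only in which end of the class is left open. A small Python sketch for the marks of Illustration 4.1 follows; the list names are illustrative and the class edges 0, 5, ..., 60 are an assumption chosen to cover all the data.

```python
# Marks in English from Illustration 4.1 (30 values).
english = [
    12, 36, 40, 30, 28, 20, 19, 10, 10, 16,
    19, 27, 15, 26, 20, 19, 7, 45, 33, 21,
    56, 37, 6, 20, 11, 17, 37, 30, 20, 5,
]

# Class edges 0, 5, 10, ..., 60 give twelve classes of width 5.
edges = list(range(0, 65, 5))
classes = list(zip(edges, edges[1:]))

# Upper limit excluded: lower <= mark < upper.
upper_excluded = [sum(lo <= m < hi for m in english) for lo, hi in classes]
# Lower limit excluded: lower < mark <= upper.
lower_excluded = [sum(lo < m <= hi for m in english) for lo, hi in classes]

for (lo, hi), a, b in zip(classes, upper_excluded, lower_excluded):
    print(f"{lo:>2}-{hi:<2}  upper excluded: {a}   lower excluded: {b}")
```

A mark that sits exactly on a class boundary, such as 20, moves from one class to its neighbour when the convention changes, which is why the two columns differ.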
Illustration 4.2
Prepare a frequency table for the following reported weights (in kg) of 40 individuals.
47 50 79 45 46 80 82 72
75 74 57 69 65 52 55 60
64 73 61 60 71 70 68 68
65 55 59 61 60 66 54 70
62 53 65 56 52 72 67 58
Solution:
The variable is a continuous one with values ranging from 45 to 82. If we select an interval
of 5 kg, it gives 8 classes, and hence is quite appropriate. We note the preponderance of
figures ending in 5 and 0 (14 out of 40, compared to a normal of about 8), denoting gross
rounding off. Therefore, the mid-points may be placed at 0's and 5's. Thus, we select the
class intervals 42.5 - 47.5, 47.5 - 52.5, ..., 77.5 - 82.5. The resulting distribution is given in
Table 4.8.

Table 4.8
Apparent class limits   Real class limits   Frequency
43-47                   42.5-47.5           3
48-52                   47.5-52.5           3
53-57                   52.5-57.5           6
58-62                   57.5-62.5           8
63-67                   62.5-67.5           6
68-72                   67.5-72.5           8
73-77                   72.5-77.5           3
78-82                   77.5-82.5           3
N=40
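The classification by real limits can be checked with a short Python sketch. It assumes the 40 reported weights of Illustration 4.2; because the weights are reported in whole kilograms, none can fall exactly on a real limit such as 47.5.

```python
# Reported weights (kg) of 40 individuals from Illustration 4.2.
weights = [
    47, 50, 79, 45, 46, 80, 82, 72,
    75, 74, 57, 69, 65, 52, 55, 60,
    64, 73, 61, 60, 71, 70, 68, 68,
    65, 55, 59, 61, 60, 66, 54, 70,
    62, 53, 65, 56, 52, 72, 67, 58,
]

# How many reported figures end in 0 or 5 (a sign of rounding off).
rounded = sum(w % 5 == 0 for w in weights)

# Real class limits 42.5-47.5, 47.5-52.5, ..., 77.5-82.5,
# so that the class mid-points fall at 45, 50, ..., 80.
real_limits = [(42.5 + 5 * k, 47.5 + 5 * k) for k in range(8)]

# Whole-number weights never equal a real limit, so strict
# inequalities on both sides are safe here.
freqs = [sum(lo < w < hi for w in weights) for lo, hi in real_limits]

print("figures ending in 0 or 5:", rounded)
for (lo, hi), f in zip(real_limits, freqs):
    print(f"{lo}-{hi}: {f}")
```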

4.4 Two Way Frequency Distribution


In the preceding sections we have dealt with classifying data when one variable was
involved. We now proceed to describe the construction of frequency tables when two
variables are present. Table 4.9 shows the marks obtained by 24 students in tests in
Accountancy and in Statistics. A good table should be able to condense this data while
showing the performance in the two subjects simultaneously.
Table 4.9
Roll number of Students 1 2 3 4 5 6 7 8 9 10 11 12
Marks in Statistics 15 1 1 3 16 2 18 5 4 17 6 19
Marks in Accountancy 13 1 2 7 8 9 12 9 17 16 6 18
Roll number of Students 13 14 15 16 17 18 19 20 21 22 23 24
Marks in Statistics 14 9 8 13 10 13 11 11 12 18 9 7
Marks in Accountancy 11 3 5 4 10 11 14 17 18 15 15 3
A two-way frequency table is a suitable way to present such data. It has class intervals for
one variable as columns and for the other variable as rows, as in Table 4.10. The boxes
formed at the intersection of rows and columns, thus, represent a joint-class. The frequency
of this joint class is the number of items that has the value of the first variable in the class
given by the column heading and the value of the second variable in the class given by the
row heading.
The method of construction of a two-way table consists of the following steps:
• Determine the class intervals for each of the variables. For the data of Table 4.9, since the
variables are discrete, we can use inclusive-type intervals for both variables. Use of 4
classes of width 5 for each variable gives 16 joint-classes, which is a reasonable
number.
• Place one of the variables at the top of the table (here, marks in Statistics) and the
other on the left-hand side.
• Place each item in the appropriate box. Thus, Roll no. 1 with 15 in Statistics and 13 in
Accountancy belongs to the box at the intersection of column 11-15 and row 11-15.
• Total the tallies in each box and in each row and column. The grand totals of rows and
columns should agree with the total number of items.
The two-way table for marks in Statistics and Accountancy constructed in this manner is
shown as Table 4.10.
Table 4.10 A 2-Way Table or The Contingency Table for Data in Table 4.9
                 Statistics →
Accountancy ↓    1-5        6-10       11-15      16-20      Total
1-5              || (2)     ||| (3)    | (1)      -          6
6-10             ||| (3)    || (2)     -          | (1)      6
11-15            -          | (1)      |||| (4)   || (2)     7
16-20            | (1)      -          || (2)     || (2)     5
Total            6          6          7          5          24
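The steps for building a two-way table can be sketched in Python as follows. The data are those of Table 4.9; the helper `class_index` and the layout of `table` (rows for Accountancy, columns for Statistics) are illustrative assumptions.

```python
# Marks of the 24 students from Table 4.9, in roll-number order.
stats_marks = [15, 1, 1, 3, 16, 2, 18, 5, 4, 17, 6, 19,
               14, 9, 8, 13, 10, 13, 11, 11, 12, 18, 9, 7]
acc_marks = [13, 1, 2, 7, 8, 9, 12, 9, 17, 16, 6, 18,
             11, 3, 5, 4, 10, 11, 14, 17, 18, 15, 15, 3]

labels = ["1-5", "6-10", "11-15", "16-20"]


def class_index(mark, width=5):
    """Inclusive classes 1-5, 6-10, 11-15, 16-20 map to indices 0, 1, 2, 3."""
    return (mark - 1) // width


# table[row][column]: row = Accountancy class, column = Statistics class.
table = [[0] * 4 for _ in range(4)]
for s, a in zip(stats_marks, acc_marks):
    table[class_index(a)][class_index(s)] += 1

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

print("Acc \\ Stat", *labels, "Total")
for label, row, total in zip(labels, table, row_totals):
    print(label, *row, total)
print("Total", *col_totals, sum(row_totals))
```

The final check in the procedure corresponds to `sum(row_totals)` and `sum(col_totals)` both being equal to the number of students.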

4.5 Cumulative Frequencies


A cumulative frequency distribution shows how many cases lie below the upper real limit of
each class interval. Consider the distribution of the weekly earnings of handloom
employees given in Table 4.11.
Table 4.11 Weekly Earnings of Handloom Employees

Earnings, ₹ Number of Employees


3,000 - 4,000 20
4,000 - 5,000 50
5,000 - 6,000 100
6,000 - 7,000 40
7,000 - 8,000 35
8,000 - 9,000 10
9,000 - 10,000 5
N=260
It is at times desirable to know the number of employees earning less than or more than
certain amounts. This information is useful if we are planning some social-security measures
based on their earnings.
In order to provide this information, it is necessary to change the form of the frequency
distribution from a simple distribution as shown above to a cumulative frequency
distribution as shown below. When frequencies of two or more classes are added up, such
totals are called cumulative frequencies. There are, in general, two types of cumulative
frequencies used: the ‘less than’ and ‘more than’ cumulative frequencies. ‘Less than’
cumulative frequency of a class refers to number of cases that lie below the real upper limit
of the class. ‘More than’ cumulative frequency of a class refers to number of cases that lie
above the real lower limit of the class.
If we are interested in preparing a ‘less than’ cumulative frequency table, the cumulation is
started from the lowest size of the variable to the highest size. Thus, the frequency of less
than 4,000 is frequency of class-interval 3,000 - 4,000; of less than 5,000 is the total of the
frequencies of first two classes; of less than 6,000 the total of the frequencies of the first
three classes, and so on.
When it is desired to construct a ‘more than’ cumulative frequency distribution, the
cumulation proceeds from the greatest to the least. Thus, in order to determine the workers
whose earnings are ₹ 7,000 or more, the frequencies of the last three classes are added up.
Tables 4.12 and 4.13 show the two types of cumulative frequency distributions.

Table 4.12 ‘Less-Than’ Cumulative Distribution
Earnings, ₹         Number of Employees
Less than 10,000    260
Less than 9,000     255
Less than 8,000     245
Less than 7,000     210
Less than 6,000     170
Less than 5,000     70
Less than 4,000     20

Table 4.13 ‘More-Than’ Cumulative Distribution
Earnings, ₹         Number of Employees
More than 3,000     260
More than 4,000     240
More than 5,000     190
More than 6,000     90
More than 7,000     50
More than 8,000     15
More than 9,000     5
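Both kinds of cumulation are running totals, taken from opposite ends of the distribution. A minimal Python sketch using the standard-library `itertools.accumulate` is given here; the variable names are illustrative.

```python
from itertools import accumulate

# Table 4.11: class limits (in rupees) and number of employees per class.
lower_limits = [3000, 4000, 5000, 6000, 7000, 8000, 9000]
upper_limits = [4000, 5000, 6000, 7000, 8000, 9000, 10000]
freqs = [20, 50, 100, 40, 35, 10, 5]

# 'Less than' totals run from the lowest class upwards and are
# paired with the upper limit of each class.
less_than = list(accumulate(freqs))

# 'More than' totals run from the highest class downwards and are
# paired with the lower limit of each class.
more_than = list(accumulate(reversed(freqs)))[::-1]

for upper, cf in zip(upper_limits, less_than):
    print(f"Less than {upper}: {cf}")
for lower, cf in zip(lower_limits, more_than):
    print(f"More than {lower}: {cf}")
```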

4.6 Percentile Ranks and Percentile Points


Percentile representation is widely used to show how an individual has performed within a
large group. A percentile rank is the percentage of cases falling below a given point on the
measurement scale. A percentile point (or more simply, percentile) on the other hand is the
point on the measurement scale below which a given percentage of the cases falls. If, for
example, 20 % of population in a state have an annual income less than ₹4,50,000, the 20th
percentile is ₹4,50,000.
Calculating percentile rank
Percentile rank is based on the ‘less than’ type of cumulative frequency distribution, and is
obtained by converting the cumulative frequencies into percentages of the total number of
cases in the distribution. Table 4.14 shows the calculations of percentile ranks for the data
of Table 4.11. After obtaining the ‘less-than’ cumulative frequencies as in Table 4.12, we
obtain the cumulative proportion for each class by dividing the cumulative frequencies
by the total number of cases (260 here). The percentile ranks are finally obtained by multiplying the
proportions by 100. These are shown in the last column. The percentile rank for the class ₹ 5,000
- 6,000 is shown as 65, signifying that 65% of all handloom employees surveyed have incomes
below the upper limit of this class, which is ₹ 6,000.

Table 4.14 Cumulative Percentage Distribution and Percentile Calculations
Earnings, ₹       Frequency   Cumulative frequency,    Cumulative    Percentile
                              less than upper limit    proportion    rank
9,000 - 10,000    5           260                      1.00          100
8,000 - 9,000     10          255                      0.98          98
7,000 - 8,000     35          245                      0.94          94
6,000 - 7,000     40          210                      0.81          81
5,000 - 6,000     100         170                      0.65          65
4,000 - 5,000     50          70                       0.27          27
3,000 - 4,000     20          20                       0.08          8

The example below illustrates the procedure for calculating the percentile rank for a given
value of the variable.
Illustration 4.3
Let us determine from the data of Table 4.14 the percentile rank of an individual handloom
employee who earns ₹ 5,300 per week. This earning falls in the class ₹ 5000 – 6000. The
percentage of employees earning less than ₹ 6,000 (the upper limit of the class) is 65 and of
those earning less than ₹ 5,000 is 27. So, the percentile rank for one earning 5,300 is
between these two limits, 27 and 65. In the absence of any other information, we assume
that the earnings of 100 employees within this band of ₹ 5,000 – 6,000 are uniformly
distributed. We can, thus, use linear interpolation as illustrated in Fig. 4.1.
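The interpolation under the uniform-distribution assumption can be sketched in Python. The function below is an illustration only: its name and signature are assumptions, and it works from the exact cumulative frequencies of Table 4.11 rather than from the rounded percentages of Table 4.14, so its result may differ from a hand calculation in the first decimal place.

```python
def percentile_rank(x, lower_limits, freqs, width):
    """Percentage of cases lying below x, assuming items are spread
    uniformly within each class of the given width."""
    total = sum(freqs)
    below = 0                      # items in all classes lying wholly below x
    for lower, f in zip(lower_limits, freqs):
        if lower <= x < lower + width:
            # Share of this class's items lying below x.
            inside = f * (x - lower) / width
            return 100 * (below + inside) / total
        below += f
    raise ValueError("x lies outside the tabulated range")


# Table 4.11: weekly earnings of handloom employees.
lower_limits = [3000, 4000, 5000, 6000, 7000, 8000, 9000]
freqs = [20, 50, 100, 40, 35, 10, 5]

rank_5300 = percentile_rank(5300, lower_limits, freqs, width=1000)
print(round(rank_5300, 1))
```

Of the 100 employees in the class ₹ 5,000 - 6,000, three-tenths are taken to earn less than ₹ 5,300, and these are added to the 70 employees in the lower classes.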

$$PR_X = 65 - \frac{6000 - 5300}{6000 - 5000} \times (65 - 27) = 38.4$$

Fig. 4.1 Calculating the Percentile Rank for Weekly Earnings of ₹ 5,300 in Data of Table 4.14

The same value is obtained by working up from the lower limit of the class: 27 + 0.3 × 38 = 38.4.
The following formula has been used to calculate the percentile rank for a variable value of X:
$$PR_X = PR_{UL} - \frac{UL_n - X}{i} \times d \qquad (4.1)$$
where:
• n is the serial number of the class in which the value X is located,
• PR_UL is the percentile rank of the upper limit of this nth class,
• d is the percentage of all items contained within the nth class interval,
• UL_n is the upper limit of the nth class interval, and
• i is the width of the nth class interval.
Calculating percentile point
As stated above, percentile point (or more simply, percentile) is the point on the
measurement scale below which a given percentage of the cases falls. We use the following
to illustrate its approximate calculation.

Illustration 4.4
Consider the marks obtained out of a maximum of 300 in the Joint Entrance Examination -
Main (JEE). In one particular year 10,25,029 candidates appeared in the examination. Table
4.15 shows the distribution of the candidates according to the scores obtained.
Table 4.15 Distribution of Marks Obtained in JEE – Main
Class interval   Real limits
Lower   Upper    Lower   Upper   Frequency   Cumulative frequency,   Percentile rank
limit   limit    limit   limit               less than upper limit   at upper limit
275 300 274.5 300 106 10,25,029 100.00
250 274 249.5 274.5 418 10,24,923 99.99
225 249 224.5 249.5 1,474 10,24,505 99.95
200 224 199.5 224.5 2,669 10,23,031 99.81
175 199 174.5 199.5 6,079 10,20,362 99.54
150 174 149.5 174.5 10,499 10,14,283 98.95
125 149 124.5 149.5 16,929 10,03,784 97.93
100 124 99.5 124.5 27,584 9,86,855 96.28
75 99 74.5 99.5 63,571 9,59,271 93.58
50 74 49.5 74.5 97,188 8,95,700 87.38
25 49 24.5 49.5 4,07,563 7,98,512 77.90
0 24 0 24.5 3,90,949 3,90,949 38.14
This is the starting point for determining percentiles. Suppose that we want to find the value
of the 90th percentile, denoted as P90. It is, by definition, the point on the variable scale below
which 90% of the actual scores lie. The first step in the process is to determine the class in
which this percentile is located. From the last column of Table 4.15, we note that 93.58% of
cases are below the real upper limit of class 75 - 99, that is, below 99.5 marks, and that
87.38% of cases are below the real upper limit of class 50 - 74 (which is also the real lower
limit of the class 75 - 99), that is, below 74.5 marks. This clearly means that the 90th percentile
lies within the class with real limits 74.5 and 99.5.
At this point, it is not clear what score value we should assign because the point we want
lies somewhere within this interval. There are 63,571 scores in this interval. We will assume
that the scores are uniformly distributed throughout the interval. This assumption underlies
the procedure termed as linear interpolation, the same that was used for locating percentile
rank in Illustration 4.3.
P90 is the mark below which 90% of the total candidates lie, that is, the mark obtained by
the candidate who has
$$10{,}25{,}029 \times \frac{90}{100} = 9{,}22{,}526$$
candidates below him.
Fig. 4.2 shows the linear interpolation calculations using this assumption of uniform
distribution within this class.

Fig. 4.2 Location of the 90th Percentile Point in the Data of Table 4.15

The value of the 90th percentile will be located at a point
9,22,526 scores up from the bottom of the distribution. Because there are 8,95,700 cases
below the lower limit of this class interval, we must come up (9,22,526 − 8,95,700) = 26,826 more to
reach P90. This means that we must come up 26,826 out of the
(9,59,271 − 8,95,700) = 63,571 equal parts in the interval's real width of 25 marks. We add this
quantity to the real lower limit of the interval, which is 74.5. Thus, we get
$$P_{90} = 74.5 + 25 \times \frac{26{,}826}{63{,}571} = 85.05$$
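The same interpolation can be sketched in Python for any percentile. The function name `percentile_point` and the representation of each class as a tuple (real lower limit, real upper limit, frequency) are illustrative assumptions; the data are taken from Table 4.15, listed from the lowest class upwards.

```python
def percentile_point(p, classes):
    """Score below which p per cent of the cases lie, assuming scores are
    spread uniformly within each class. `classes` is a list of
    (real_lower, real_upper, frequency) tuples in ascending order."""
    total = sum(f for _, _, f in classes)
    target = p / 100 * total       # number of cases that must lie below the point
    below = 0                      # cases below the lower limit of the current class
    for lower, upper, f in classes:
        if f > 0 and below + f >= target:
            # Come up (target - below) of the f equal parts of this class.
            return lower + (upper - lower) * (target - below) / f
        below += f
    raise ValueError("p must lie between 0 and 100")


# Table 4.15, from the lowest class upwards.
jee = [
    (0, 24.5, 390949), (24.5, 49.5, 407563), (49.5, 74.5, 97188),
    (74.5, 99.5, 63571), (99.5, 124.5, 27584), (124.5, 149.5, 16929),
    (149.5, 174.5, 10499), (174.5, 199.5, 6079), (199.5, 224.5, 2669),
    (224.5, 249.5, 1474), (249.5, 274.5, 418), (274.5, 300, 106),
]

p90 = percentile_point(90, jee)
print(round(p90, 2))
```

Calling `percentile_point(50, jee)` in the same way would give the median of the distribution.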
What we did above can be converted into the following formula to calculate the Pth percentile:
$$P = LL + i \times \frac{\text{c.f. for the percentile} - \text{c.f. for the } (n-1)\text{th class}}{\text{frequency of the class}} \qquad (4.2)$$
where:
• n represents the class number in which the required percentile is located,
• LL is the real lower limit of this nth class interval,
• i is the width of this class interval,
• c.f. for the percentile is the number of items lying below the percentile, and
• c.f. for the (n−1)th class is the number of scores lying below LL.

4.7 Organizing Categorical Variables


Categorical variables are organized by tallying values for the variable by categories and
tabulating the results. If the data are classified on the basis of one attribute only, the
process is termed as simple classification. Tables 4.16 and 4.17 are two examples of such
tables. These are also known as summary tables.

Table 4.16 Employment in MANREGA (in thousands)
State    Average daily number of workers employed in MANREGA
Assam 79
Bihar 258
Gujarat 405
Haryana 77
Kerala 204
Tamil Nadu 411
Uttar Pradesh 384
West Bengal 850

Table 4.17 Outlay for Village and Small Industries in Public Sector, 2022-23
(in thousand crores of rupees)
Industry                Expenditure
Small scale Industry    39.35
Industrial Estates      7.58
Handloom Industry       14.05
Village Industry        89.33
Coir Industry           1.79
In cases, where more than one attribute is studied, resulting in a subdivision of classes, the
classification is known as manifold. Thus, the population of a city may be divided into
literate and illiterate. Literate persons may again be divided into literate males and literate
females. The following illustration depicts an example of manifold classification:
Table 4.18 Distribution of Buildings in a District According to Habitation
(in thousands)
District Inhabited Uninhabited Under Construction Total
Administrative 571 40 5 616
Other Urban 4,064 285 45 4,394
Rural 1,625 124 12 1,761
Total 6,260 449 62 6,771
When it is desired to represent three or more characteristics in a single table, such a table is
called higher-order table. Thus, if it is desired to represent the age, sex and course, of the
students, the table would take the form as shown in Table 4.19, and would be called a
higher order table.
Table 4.19 Skeleton Table Showing Distribution of Students in a High School According to Age,
Sex and Course

Course
Arts Science Commerce
Age in years Male Female Male Female Male Female
14-15
15-16
16-17
17 and over

Illustration 4.5
In a trip organized by a college, there were 80 persons, each of whom paid ₹ 202.50 on an
average. There were 60 students, each of whom paid ₹ 200. Members of teaching staff were
charged at a higher rate. The number of helpers (all males) was six, and they were not
charged anything. The number of women was 20 per cent of the total, and there was only
one woman staff member. Tabulate this information.
Solution:
Table 4.20 shows the data. The notation (G) denotes given information. Numbers in brackets
denote the sequence in which the remaining entries are derived from the given information. The final
calculation is (9), the rate of contribution of teachers, which works out to ₹ 300.

Table 4.20
                  Sex                               Rate of         Total
                  Female     Male       Totals      contribution    contribution
Students          (4) 15     (5) 45     (G) 60      (G) 200         (7) 12,000
Teaching staff    (G) 1      (2) 13     (1) 14      (9) 300         (8) 4,200
Helpers           (G) 0      (G) 6      (G) 6       (G) 0           (G) 0
Totals            (G) 16     (3) 64     (G) 80      (G) 202.50      (6) 16,200
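The chain of deductions behind Illustration 4.5 can be written out as simple arithmetic. The Python sketch below follows one possible order of derivation; the variable names are illustrative and only the figures stated in the problem are taken as given.

```python
# Given information from Illustration 4.5.
persons = 80
average_paid = 202.50
students = 60
student_rate = 200
helpers = 6                 # all male, not charged
female_percent = 20         # women as a percentage of all persons
female_staff = 1

# Derived head counts.
teachers = persons - students - helpers              # remaining persons are teaching staff
females = persons * female_percent // 100
males = persons - females
female_students = females - female_staff             # no helper is female
male_students = students - female_students
male_teachers = teachers - female_staff

# Derived contributions.
total_contribution = persons * average_paid
student_contribution = students * student_rate
teacher_contribution = total_contribution - student_contribution  # helpers paid nothing
teacher_rate = teacher_contribution / teachers

print("teachers:", teachers, "| female students:", female_students,
      "| male students:", male_students, "| male teachers:", male_teachers)
print("teacher contribution:", teacher_contribution, "| rate per teacher:", teacher_rate)
```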

4.8 Using EXCEL to Tabulate Data


We explain the use of EXCEL for tabulating data through the following illustrations.

Illustration 4.6
Classify the data of Table 4.1 using EXCEL.
Solution:
Frequency distributions are constructed using the Histogram module in the Data Analysis
package¹ of EXCEL.
Open a sheet and copy the data onto it. Then determine the minimum and maximum
values of the variable in this data. For this, use the MIN and MAX functions by
typing =MIN(range of cells in which the data is located) and =MAX(range of cells in which
the data is located).

¹ There is a good video of the process titled Use Excel 2016 to make Frequency distribution and Histogram for
quantitative data by Kwai Chan at https://www.youtube.com/watch?v=Giewd9yH4q0
Next, we have to decide on the class intervals. The minimum and maximum values suggest
using class intervals 61-65, 66-70, etc., till 86-90, a total of 6 classes. The real limits of these
are 60.5-65.5, 65.5-70.5, etc.
EXCEL uses the terminology bin for classes. The bins are specified by the upper limits of the
classes they represent. We enter the upper limits of the six classes as shown in the Screen
shot 4.1 here. We are now ready for using the histogram module.
Screen Shot 4.1

Under the Data tab, choose the Data Analysis package. Then choose the Histogram module in the
pop-up menu. A form titled Histogram is displayed, as shown in Screen Shot 4.2a. Fill in the
Input range (A1:K4 here) and the Bin range (A10:A15). Fill in the Output range too. Here
we want the output to be displayed at Cell C9. On pressing OK, we get the output table
starting at cell C9, as shown in Screen Shot 4.2b.
Screen Shot 4.2

(a) Histogram Form (b) Output

If we had selected the chart output in the histogram form, we would have got a plot of this
data as well. (That is why this software module is named Histogram). The resulting table
can be edited to insert proper class intervals in place of bins.
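
The way bins specified by upper limits sort the data can be sketched in Python. This is a minimal illustration, assuming that each bin counts the values greater than the previous bin limit and up to and including its own limit, and that anything above the last limit goes into a final "More" bin. The function name and the marks are hypothetical and are not the data of Table 4.1.

```python
from bisect import bisect_left

def histogram_counts(values, bin_upper_limits):
    """Count values per bin, where each bin is given by its upper limit.

    A value v goes into the first bin whose upper limit is >= v.
    The last entry of the result counts values above every limit ("More").
    """
    limits = sorted(bin_upper_limits)
    counts = [0] * (len(limits) + 1)
    for v in values:
        counts[bisect_left(limits, v)] += 1
    return counts

# Hypothetical marks, with the six upper limits used in this illustration
marks = [61, 65, 66, 70, 72, 75, 78, 81, 84, 86, 90, 63]
bins = [65, 70, 75, 80, 85, 90]
counts = histogram_counts(marks, bins)
print(counts)  # [3, 2, 2, 1, 2, 2, 0]
```

Note that a mark of exactly 65 is counted in the bin 65, which corresponds to the class 61-65.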

Illustration 4.7
Thirty kids in a primary school were surveyed to determine the connection between intake
of fast food and general health. The data is shown in Screen Shot 4.3. The codes for
intake and general health are shown on the sheet.
Screen Shot 4.3

Present this data in a contingency table.


Solution:
Let us construct a table with the codes for intake shown in rows and the codes for health shown in
columns. The skeleton table is shown starting at Cell M6 in the screenshot above. Select the
first cell of this destination table (Cell N7), and then select the function button fx shown.
An Insert Function menu pops up. In Select a category, select Statistical, and choose the
function COUNTIFS in it. It counts the number of items that meet multiple criteria. In the
Function Arguments form that pops up, enter the criteria ranges and criteria as shown in
Screen Shot 4.4a below².
A $ sign before the column letter or row number of a cell reference signifies that the part
immediately following it is frozen, and does not change when you drag the formula from row
to row or column to column. The screen shot also shows the completed contingency table.
Screen Shot 4.4

(a) Function Argument Form (b) Output
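
The counting that COUNTIFS performs for each cell of the contingency table can be mirrored in Python. The sketch below is only an illustration: the function name and the ten pairs of codes are hypothetical, not the survey data of Screen Shot 4.3.

```python
from collections import Counter

def contingency_table(row_codes, col_codes):
    """Cross-tabulate two equally long sequences of category codes.

    Returns the sorted row codes, the sorted column codes, and a matrix in
    which entry [r][c] is the number of cases with that pair of codes.
    """
    pair_counts = Counter(zip(row_codes, col_codes))
    rows = sorted(set(row_codes))
    cols = sorted(set(col_codes))
    table = [[pair_counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

# Hypothetical codes for ten children
intake = [1, 1, 2, 3, 2, 1, 3, 3, 2, 1]
health = [1, 2, 2, 3, 2, 1, 3, 2, 1, 1]
rows, cols, table = contingency_table(intake, health)
for code, counts in zip(rows, table):
    print(code, counts)
```

Each cell answers the same question as one COUNTIFS formula: how many cases have this intake code and this health code at the same time.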

4.9 Using PSPP for Tabulating Data


PSPP is very easy to use for tabulating data³. It is introduced here with the following
illustration.

Illustration 4.8
Use PSPP for tabulating the data given in Table 4.2.
Solution:
After we open the saved data file (with extension .sav), choose Analyze > Descriptive
Statistics > Frequencies. A form titled Frequencies, as shown in Screen Shot 4.5,
pops up.

² There are many good videos on YouTube that explain this procedure in detail. One such video is Creating
Contingency Tables in Excel by Erik Heineman at https://www.youtube.com/watch?v=hpiI_HZfmIY
³ A very good video tutorial on YouTube on this topic:
Toussaint, L., Frequencies Analysis in SPSS: https://www.youtube.com/watch?v=JNGI_-n3dKo
Screen Shot 4.5

Here the variables are listed in the panel on the left, from which we can select the variables
that we want to tabulate. There is only one variable here. We select it and press the arrow
button to transfer it to the Variable(s) panel on the right. We can select the statistics that we
want as the output. Since we are only interested in tabulation here, we can deselect all of
them, or leave Minimum and Maximum selected. We could also choose what charts to
plot. We will not do so here. By pressing the Frequency Tables button, we can choose how
the table is displayed, such as whether the values appear in ascending or descending order. On
pressing OK, a new window titled Output – PSPPIRE Output Viewer opens. It contains the
output of the operation and is shown in Screen Shot 4.6. The table shows the Frequency,
Percent, Valid Percent (which is the percent when missing data are excluded from the
calculations), and Cumulative Percent.
Screen Shot 4.6
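
The four columns of this output can be reproduced with a short Python sketch. It is a minimal illustration, assuming that the cumulative percent is accumulated from the valid percent; the function name is hypothetical, the data are made up, and None stands for a missing observation.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, frequency, percent, valid percent, cumulative percent).

    Percent is based on all cases; valid percent and cumulative percent
    are based only on the cases that are not missing (not None).
    """
    n_total = len(values)
    valid = [v for v in values if v is not None]
    n_valid = len(valid)
    counts = Counter(valid)
    rows, running = [], 0
    for value in sorted(counts):
        freq = counts[value]
        running += freq
        rows.append((value, freq,
                     100 * freq / n_total,
                     100 * freq / n_valid,
                     100 * running / n_valid))
    return rows

# Hypothetical marks out of 5, with two missing observations
marks = [3, 2, None, 3, 5, 2, 3, None, 4, 3]
freq_rows = frequency_table(marks)
for row in freq_rows:
    print(row)
```

With two of the ten cases missing, a frequency of 4 is 40 per cent of all cases but 50 per cent of the valid cases, which is the distinction the Valid Percent column draws.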
We next illustrate creating grouped-frequency tables. This is a bit more complicated, since
there is no direct way to do so.

Illustration 4.9
Tabulate the data of Table 4.2 as a grouped-frequency distribution.
Solution:
Since there is no direct way to group data in PSPP, the trick lies in transforming the variable
values, with one value for a group⁴. For this we use the Transform menu. Let us say we group
the data of Table 4.2 into six groups, namely 61-65, 66-70, 71-75, 76-80, 81-85, and 86-90, with
mid-points at 63, 68, 73, 78, 83 and 88, respectively. Let us name these groups by their mid-points.
After selecting the Transform menu, select Recode into Different Variables. A new
screen opens up, as shown in Screen Shot 4.7. The panel on the left shows the names of the variables. There
is only one variable, Marks, which we select. Press the arrow to transfer it to the right panel. Then,
under the group Output Variable, enter a name for the grouped variable. We have entered it
simply as GroupedVariable. The same is entered as the label, and the Old and New Values button at the bottom
is pressed. A form titled Recode into Different Variables: Old and New Values opens up, as shown in
Screen Shot 4.8, ready to accept the desired transformation.
Screen Shot 4.7

Screen Shot 4.8

⁴ A video tutorial on this subject: Dr J., Grouped Frequency Table Trick in PSPP SPSS:
https://www.youtube.com/watch?v=-3Vq2MP6hK8
Screen Shot 4.9

We select the Range radio button and enter the limits of the first group, 61 through 65, and
enter 63, the mid-value of this group, into the New Value field, and press Add. The old and the
new values show up in the window on the right. Repeat this for each of the six groups
that we have. After all groups show up, press Continue to go back to the earlier form Recode
into Different Variables. Press Change and then OK. The Data view of PSPP opens up with the
values of the new variable GroupedVariable for each item (Screen Shot 4.9).
If we construct a frequency table now (using the procedure explained in Illustration 4.8), all
the cases for class 61-65 would be listed against the mid-value 63. It is useful, then, to label
63 as 61-65, and similarly the other mid-values. This is achieved by opening the Variable
view of PSPP and clicking on the Value Labels cell against the variable GroupedVariable. A form
titled Value Labels opens up (Screen Shot 4.10).
Screen Shot 4.10
The grouped frequencies can now be obtained using the process outlined in Illustration 4.8.
The navigation is the usual: Analyze > Descriptive Statistics > Frequencies, and then choosing
GroupedVariable. The output is as shown in Screen Shot 4.11.
Screen Shot 4.11
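
The recode-then-label trick of this illustration can be sketched in Python as follows. The sketch is only an illustration of the idea: the function names and the marks are hypothetical (not the data of Table 4.2), and the mid-point of the group 86-90 is taken as 88, its arithmetic mid-point.

```python
from collections import Counter

# (lower limit, upper limit, mid-point) of each group
GROUPS = [(61, 65, 63), (66, 70, 68), (71, 75, 73),
          (76, 80, 78), (81, 85, 83), (86, 90, 88)]

def recode(mark):
    """Replace a mark by the mid-point of its group (None if outside all groups)."""
    for low, high, mid in GROUPS:
        if low <= mark <= high:
            return mid
    return None

def grouped_frequencies(marks):
    """Tabulate the recoded marks, labelling each mid-point by its class interval."""
    counts = Counter(recode(m) for m in marks)
    return {f"{low}-{high}": counts[mid] for low, high, mid in GROUPS}

# Hypothetical marks
marks = [62, 64, 67, 70, 71, 75, 79, 83, 85, 88, 90, 66]
grouped = grouped_frequencies(marks)
for label, freq in grouped.items():
    print(label, freq)
```

The dictionary keys play the role of the value labels: the counts are made on the mid-points, but the table is read in terms of the class intervals.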

Concepts Introduced
It is easier to understand the meaning of a large number of observations when they are
ordered and are grouped. A frequency distribution shows the number of observations for
various groups. Although putting scores into class intervals is convenient, we lose the
detailed information. For this reason, class intervals should not be too wide. A good
compromise is to use between 5 and 15 intervals.
We can construct distributions to show actual frequency (how many?) or relative frequency
(what percentage of the whole?). Relative frequency distributions are usually best for
comparing two or more distributions containing different numbers of cases.
The cumulative frequency distribution gives the number of scores below the upper real limit
of each score interval.
A percentile rank is the percentage of cases falling below a given point on the measurement
scale. Thus, it allows us to see how an individual has performed relative to the entire group.
A percentile is a point along the measurement scale below which a specified percentage of
the cases in the distribution falls.
The following formula can be used to calculate the percentile rank for a variable value of X:
$$PR_X = PR_{UL} - \frac{UL_n - X}{i} \times d \qquad (4.1)$$
where:
• n is the serial number of the class in which the value X is located,
• $PR_{UL}$ is the percentile rank of the upper limit of this nth class,
• d is the percentage of all items contained within the nth class interval,
• $UL_n$ is the upper limit of the nth class interval, and
• i is the width of the nth class interval.
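
A small numerical sketch of formula (4.1) in Python may help. It takes the adjustment as a subtraction, since a value lying below the upper limit of its class cannot rank higher than that limit. The function name and the class used in the example are hypothetical.

```python
def percentile_rank(x, upper_limit, width, pr_upper, pct_in_class):
    """Percentile rank of x by linear interpolation within its class.

    upper_limit  : real upper limit of the class containing x (UL_n)
    width        : width of that class (i)
    pr_upper     : percentile rank of the upper limit (PR_UL)
    pct_in_class : percentage of all items lying in that class (d)
    """
    if not (upper_limit - width <= x <= upper_limit):
        raise ValueError("x must lie inside the class")
    return pr_upper - (upper_limit - x) / width * pct_in_class

# Hypothetical class with real limits 69.5-74.5 that holds 20% of the cases,
# where 60% of all cases lie below 74.5.
rank = percentile_rank(72.0, upper_limit=74.5, width=5, pr_upper=60, pct_in_class=20)
print(rank)  # 50.0
```

Here 72.0 lies halfway through the class, so half of the 20 per cent in the class is deducted from 60, giving a percentile rank of 50.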
Percentile point (or more simply, percentile) is the point on the measurement scale below
which a given percentage of the cases falls. The following formula can be used to calculate
the Pth percentile:
$$P = LL + i \times \frac{\text{c.f. for the percentile} - \text{c.f. for the } (n-1)\text{th class}}{\text{frequency of the } n\text{th class}} \qquad (4.2)$$
where:
• n represents the class number in which the required percentile is located,
• LL is the real lower limit of this nth class interval,
• i is the width of this class interval,
• c.f. for the percentile is the number of items lying below the percentile, and
• c.f. for the (n−1)th class is the number of scores lying below LL.
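
Formula (4.2) can be applied to a whole grouped distribution with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, the classes are supplied in ascending order as (real lower limit, width, frequency), and the distribution used is made up for illustration.

```python
def percentile_point(classes, p):
    """Return the p-th percentile of a grouped frequency distribution.

    classes : list of (real_lower_limit, width, frequency) in ascending order
    p       : required percentage, between 0 and 100
    """
    total = sum(freq for _, _, freq in classes)
    target = p / 100 * total      # c.f. for the percentile
    cum_below = 0                 # c.f. for the (n-1)th class
    for lower, width, freq in classes:
        if freq > 0 and cum_below + freq >= target:
            return lower + width * (target - cum_below) / freq
        cum_below += freq
    raise ValueError("p must be between 0 and 100")

# Hypothetical distribution of 40 scores in classes of width 5
classes = [(59.5, 5, 4), (64.5, 5, 10), (69.5, 5, 8), (74.5, 5, 12), (79.5, 5, 6)]
median = percentile_point(classes, 50)
print(median)  # 73.25
```

For the 50th percentile, 20 of the 40 scores must lie below the point. Fourteen scores lie below 69.5, so the point is 6/8 of the way through the class 69.5-74.5, that is, 69.5 + 5 × 6/8 = 73.25.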
Categorical variables are organized by tallying values for the variable by categories and
tabulating the results. If the data are classified on the basis of one attribute only, the
process is termed simple classification. The resulting tables are also known as summary tables.
Where more than one attribute is studied, resulting in a subdivision of classes, the
classification is known as manifold classification.
Frequency tables in EXCEL can be constructed by using the navigation Data > Data Analysis >
Histogram. EXCEL uses the term bin for a class. The bins are specified by the upper
limits of the classes they represent. We enter the upper limits of the various classes in a
column, and use it according to the procedure described in Sec. 4.8.
In PSPP we use the navigation Analyze > Descriptive Statistics > Frequencies.
Since there is no direct way to group data in PSPP, the trick lies in transforming the variable
values, with one value for a group. For this we use the Transform menu. The process is
explained in Illustration 4.9 above.

Conceptual Questions and Problems


Conceptual Questions
4.1 Describe the considerations which are to guide you in fixing the range, the class interval
and upper and lower limits of class intervals for a frequency distribution.
4.2 Discuss the objectives of classification of a raw mass of collected data.
4.3 Distinguish between:
(a) Continuous series and discrete series.
(b) Exclusive and inclusive class intervals.
(c) Ordinary and cumulative frequencies.
(d) More than and less than frequency tables.
4.4 Describe what considerations are to guide you in constructing a statistical table.
4.5 For each of the following intervals, give the interval width, the real limits, the apparent
limits of the next higher interval, and the real limits of the next higher interval. (a) 21 –
25, where the scores were rounded to the nearest whole number, (b) 20 – 29, where
the scores were rounded to the nearest whole number, (c) 8.5 – 8.9, where the scores
were rounded to the nearest tenth, (d) 50 – 60, where scores are accurate to the
nearest ten, and (e) 3.25 – 3.49, where scores were rounded to the nearest hundredth.
4.6 The real limits for the lowest two intervals in a frequency distribution are set at 39.5 –
49.5 and 49.5 – 59.5. (a) Is the overlapping of intervals at 49.5 a problem? Explain. (b) What
are the apparent limits and the interval width for both intervals?
4.7 For each set of class intervals, identify any mistakes or shortcomings:
(a) 50 and up, 44 – 49, 38 – 43, 26 – 31
(b) 20 – 25, 14 – 19, 8 – 13, 0 – 7
(c) 5 – 9, 10 – 14, 14 – 19, 20 – 24
4.8 Given below are the highest and the lowest score for different samples. Each sample is
to be grouped into class intervals. For each, give the range, your choice of width for the
class intervals, the apparent limits for the lowest interval, and the apparent limits for
the highest interval: (a) 75, 36; (b) 117, 54; (c) 171, 27; (d) 21, 22; (e) 3.47, 1.13; and (f )
821, 287.
Problems
4.9 Data below gives the scores of some college students on a test:
44 35 20 40 38 52 29 36 38 38
38 38 41 35 42 50 31 43 30 41
32 47 43 41 47 32 38 29 23 48
41 51 48 49 37 26 34 48 35 41
38 47 41 33 39 48 38 20 59 37
29 44 29 33 35 58 41 38 26 29
32 54 24 38 38 56 56 48 34 35
26 26 38 37 57 24 44 62 29 41
4.10 Following are the marks obtained by a group of 34 students in a class test carrying
a maximum of 5 marks. Tabulate the data in the form of a frequency distribution.
3, 2, 0, 1, 3, 4, 2, 5, 3, 3, 1, 3, 2, 3, 1, 3, 3, 0, 4, 3, 5, 2, 2, 5, 3, 1, 4, 2, 1, 2, 3, 4, 1, 3, 2
4.11 Compile a table showing the number of letters in each word of the extract given
below, treating the number of letters in a word as variable.
Sample method has another advantage over the census method if the information is
collected from only a small proportion of the population. Its completeness and accuracy
can be easily ensured.
4.12 The following is a record of sales of a shop on some days in thousands of Rupees.
Tabulate the data in the form of a frequency distribution taking the lowest class as 60-69.
61, 13, 93, 107, 112, 16, 78, 69, 96, 12, 80, 88, 85, 109, 103, 84, 84, 106, 91, 15, 91, 92,
102, 91, 101, 90, 17, 105, 90, 86, 113, 101, 114, 72, 77, 98, 95, 63, 99, 82, 100, 106, 87,
89, 92, 107, 111, 75, 83, 86, 106, 107, 62, 94, 73, 108, 115, 85, 98, 93, 109, 97, 74, 98,
67, 82, 104, 88, 88, 92.
4.13 Following are the marks (out of 100) obtained by 50 students on a test in Statistics:
70, 45, 33, 64, 50, 25, 65, 75, 30, 20, 55, 60, 65, 58, 52, 36, 45, 42, 35, 40, 51, 47, 39, 61,
53, 59, 49, 41, 15, 55, 42, 63, 82, 65, 45, 63, 54, 52, 48, 46, 57, 53, 55, 42, 45, 32, 64, 35,
26, 18
Make a frequency distribution taking a class-interval of 10 marks (Take the first class
interval as 0-10).
4.14 Following are the weekly wages in hundreds of rupees of 70 workers. Tabulate them
by taking the class interval of size 10.
32, 47, 57, 67, 62, 92, 117, 87, 27, 102, 93, 63, 73, 83, 123, 108, 63, 98, 113, 68, 63, 78,
98, 133, 98, 128, 118, 68, 73, 92, 82, 62, 57, 82, 72, 92, 52, 42, 36, 46, 41, 86, 136, 146,
96, 66, 46, 26, 114, 89, 79, 129, 24, 89, 99, 94, 84, 85, 102, 115, 40, 35, 125, 105, 35, 75,
45, 76, 84, 125.
4.15 Following figures give the height in centimetres of 80 plants. Represent the data by a
frequency distribution with suitable class-intervals.
62.1, 65.5, 63.0, 62.2, 64.7, 63.1, 65.8, 62.3, 60.7, 63.2, 64.1, 59.6, 64.5, 61.1, 65.7, 60.2,
64.6, 67.3, 64.5, 66.4, 64.2, 62.4, 63.3, 64.0, 62.5, 63.4, 66.3, 59.9, 63.5, 61.8, 65.4, 67.3,
60.4, 65.6, 59.1, 64.8, 61.9, 62.6, 67.0, 68.1, 59.4, 63.6, 64.4, 62.0, 63.7, 66.3, 63.8, 66.7,
63.9, 60.8, 63.0, 64.3, 61.2, 62.7, 64.6, 64.9, 60.5, 64.4, 61.7, 66.5, 65.3, 63.5, 65.2, 66.2,
59.7, 67.6, 63.5, 67.4, 63.5, 68.6, 60.0, 61.3, 63.6, 61.5, 65.1, 63.8, 61.6, 64.0, 68.7, 66.6.
4.16 In a survey of small business owners, the owners were asked a number of
questions on their experience with implementation of GST. It was found that half of all
small businesses report they have had serious issues. The following frequency
distribution summarizes the average time needed by small business owners to resolve
their GST issues.
Average Time to Resolve GST Issues
Less than 1 day 111
Between 1 and less than 3 days 94
Between 3 and less than 7 days 42
Between 7 and less than 14 days 12
14 days or more 41
a) What percentage of small businesses took less than 3 days, on the average, to
resolve GST problems?
b) What percentage of small businesses took between 1 and less than 14 days, on the
average, to resolve GST problems?
c) What percentage of small businesses took 3 or more days, on the average, to
resolve GST problems?
4.17 A group of 30 visitors to a fair were queried on their total cost in ₹ of the visit to a
local fair, and the following data was obtained:
246.39 444.16 404.60 212.40 477.32 271.74 312.20 322.50 261.20 336.52
369.86 232.44 435.72 541.00 223.92 468.20 325.85 281.06 221.80 676.42
295.40 263.10 278.90 341.90 317.08 280.28 340.60 289.71 275.74 258.78
a) Organize these costs as an ordered array.
b) Construct a frequency distribution and a percentage distribution for these costs.
c) Around which class grouping, if any, are the costs of attending the fair
concentrated? Explain.
4.18 The following data give the water bill (in ₹) during June 2023 for a random sample
of 50 one-bedroom apartments in a large city.
96 171 202 178 147 102 153 197 127 82 157 185 90 116 172 111 148 213 130 165
141 149 206 175 123 128 144 168 109 167 95 163 150 154 130 143 187 166 139
149 108 119 183 151 114 135 191 137 129 158
a) Construct a frequency distribution and a percentage distribution that have class
intervals with the upper-class boundaries ₹ 99, ₹ 119, and so on.
b) Construct a cumulative percentage distribution.
c) Around what amount does the monthly water bill seem to be concentrated?
4.19 Prepare a statistical table from the following:
Daily pocket expenditure (in Rupees) of some boys in a school
88 23 27 28 88 96 94 93 86 99
82 24 24 55 88 99 55 86 82 36
96 39 26 54 87 100 56 84 84 46
102 48 27 26 19 100 59 83 84 48
104 46 30 29 40 101 60 89 45 49
106 33 36 30 40 103 70 90 49 50
104 36 37 40 40 106 72 94 50 60
24 26 29 39 78 67 49 50 56 46
44 99 66 43 93 107 46 48 76 96
79 99 80 102 46 36 32 67 68 51
4.20 The following is a record of marks obtained by students in two sections, A and B, out
of maximum of 150 marks. Tabulate the data in the form of a frequency distribution in
such a way that a visual look at them would easily indicate the comparative
performance of students in the two sections without the use of other sophisticated
tools of statistical analysis.
Section A: 61, 73, 93, 107, 112, 80, 88, 96, 109, 103, 91, 92, 102, 91, 103, 113, 101, 114,
72, 77, 100, 106, 87, 89, 92, 106, 107, 62, 94, 73, 109, 97, 74, 98, 67.
Section B: 76, 78, 69, 96, 72, 84, 84, 106, 91, 75, 90, 77, 105, 90, 86, 118, 95, 63, 99, 82,
107, 111, 76, 83, 86, 108, 115, 85, 98, 93, 82, 104, 88, 88, 92.
4.21 Following are the marks obtained by students of a class in certain tests of statistics
and law. Represent the data by one frequency table.
S.No. of student      1   2   3   4   5   6   7   8   9  10  11  12
Marks in Statistics  15   0   1   3  16   2  18   5   4  17   6  19
Marks in Law         13   1   2   7   8   9  12   9  17  16   5  18
S.No. of student     13  14  15  16  17  18  19  20  21  22  23  24
Marks in Statistics  14   9   8  13  10  13  11  11  12  18   9   7
Marks in Law         11   3   5   4   1  11  14   7  18  15  15   3
4.22 The ages of 20 husband-and-wife pairs are given below. Form a two-way frequency table
showing the relationship between the ages of husbands and their wives with the class-
intervals 20-25, 25-30, etc.
S.No.   Age of husband   Age of wife   S.No.   Age of husband   Age of wife
1       28               23            11      27               24
2       37               30            12      39               34
3       42               40            13      23               20
4       25               26            14      33               31
5       29               25            15      36               29
6       47               31            16      32               35
7       37               35            17      22               23
8       35               25            18      29               27
9       23               21            19      38               34
10      41               38            20      48               47

4.23 A class of 32 students obtained the following marks in 2020 and 2021.

S. No. of student   Marks in 2020   Marks in 2021   S. No. of student   Marks in 2020   Marks in 2021
1 41 42 17 38 43
2 55 54 18 78 64
3 46 31 19 10 16
4 49 56 20 36 33
5 40 30 21 38 31
6 22 24 22 18 53
7 57 50 23 21 11
8 75 62 24 24 26
9 48 51 25 18 24
10 31 35 26 61 68
11 78 63 27 29 14
12 23 26 28 13 20
13 31 37 29 30 36
14 19 25 30 36 30
15 45 31 31 38 30
16 36 45 32 37 42
Classify these marks into the following form (your answer should clearly show
the actual process of classification)

                   Numbers of students who obtained
Marks in 2020      More marks in 2020 than in 2021     Less marks in 2020 than in 2021     Total
Less than 30
30-47
48-59
60 and above
4.24 Consider the following distribution
Limits Real Limits f Cumulative f Cumulative %
96–98 95.5–98.5 1 50 100
93–95 92.5–95.5 0 49 98
90–92 89.5–92.5 2 49 98
87–89 86.5–89.5 7 47 94
84–86 83.5–86.5 10 40 80
81–83 80.5–83.5 6 30 60
78–80 77.5–80.5 8 24 48
75–77 74.5–77.5 4 16 32
72–74 71.5–74.5 3 12 24
69–71 68.5–71.5 4 9 18
66–68 65.5–68.5 3 5 10
63–65 62.5–65.5 0 2 4
60–62 59.5–62.5 2 2 4
Determine: (a) the 30th percentile; (b) P60; (c) the 10th percentile; (d) the percentile
rank of value 62.0; (e) the percentage of cases with a score less than 77.0.
4.25 The following are the scores obtained by 50 students on a test.
52 84 93 78 75 71 99 81 86 81 65 70 72 71 91 87 82 77 66 63 90 58 89 60 79
77 72 83 87 87 83 79 55 97 74 71 86 75 83 63 82 70 90 95 92 75 85 83 71 88
Use EXCEL to construct a grouped-frequency distribution. Also construct a
cumulative frequency distribution.
4.26 Point out the mistakes made in the following blank table drawn to show the
distribution of population according to sex, age, marital status and literacy.
             0 - 25               25 - 50              50 - 75              75 and above         Totals
             Unmarried  Married   Unmarried  Married   Unmarried  Married   Unmarried  Married
             M    F     M    F    M    F     M    F    M    F     M    F    M    F     M    F
Literate
Illiterate

Reconstruct the above table.


4.27 Out of the total number of 1,807 women who were interviewed for employment
in a textile factory in Bombay, 512 were from textile areas and the rest from the non-
textile areas. Amongst the married women who belonged to textile areas, 247 were
experienced and 73 inexperienced, while for non-textile areas, the corresponding
figures were 49 and 550. The total number of inexperienced women was 134 of
whom 111 resided in textile areas. Of the total number of women, 918 were
unmarried, and of these, the number of experienced women in the textile and non-
textile areas was 154 and 16 respectively. Tabulate.
4.28 In a newspaper account describing the incidence of influenza among tubercular
persons living in the same family, the following paragraph appeared: “Exactly a fifth
of the 100,000 inhabitants showed signs of tuberculosis and no fewer than 5,000
among them had an attack of influenza, but among them only 1,000 lived in
uninfected houses. In contrast with this, 1/15th of the tubercular persons who did
not have influenza were still exposed to infection. Altogether 21,000 were attacked
by influenza and 41,000 were exposed to risk of infection, but the number who
having influenza but not tubercular lived in houses where no other cases of influenza
occurred, was only 2,000.”
Redraft the information in a concise and elegant tabular form.
4.29 Present the following information in a suitable tabular form: “In 1975-76 the total
production in India (in thousand tons) of the principal oilseeds was as follows:
Groundnuts 3,102; linseed 434; rape and mustard 1,193; castor 105; sesamum 433.
Next year the production of each of the first three items fell by 36% and the
remaining items fell by 10% each. In 1977-78 there was an increase compared to the
preceding year of 8% in groundnuts, 12% in linseed, 1 % in rape and mustard, 50% in
castor, and 10% in sesamum. In the next year the figures were respectively 3,823,
395, 955, 140 and 447.”
4.30 “The total number of accidents in Southern Railway in 1970 was 3,500 and it
decreased by 300 in 1971, and by 700 in 1972. The total number of accidents in
Metre Gauge section showed a progressive increase from 1970 to 1972. It was 245 in
1970, 346 in 1971 and 428 in 1972. In the Metre Gauge 'not compensated' cases
were 49 in 1970, 76 in 1971 and 108 in 1972. Compensated cases in the Broad-Gauge
section were 2,867, 2,587 and 2,152 in these three years respectively.”
From the above report, prepare a neat table as per rules of tabulation.
4.31 Use PSPP to construct a grouped frequency distribution of the data in Prob. 4.23.
Also construct a cumulative frequency distribution.
4.32 Use PSPP to construct a grouped frequency distribution of the data in Prob. 4.25.
Also construct a cumulative frequency distribution.
