PROCESSING, ANALYSIS AND
INTERPRETATION OF DATA
PROCESSING OF DATA
The processing, analysis and interpretation of data begin before the report is written. Once the data are received from the field, the researcher has the important duty of processing the data for subsequent statistical analysis.
The processing of data includes :
(i) Data preparation (ii) Classification
(iii) Tabulation (iv) Graphical representation
(v) Diagrammatic representation (vi) Computation of statistical derivatives
Data preparation itself consists of five important steps :
1. Editing, 2. Coding, 3. Data entry, 4. Transcribing, 5. Data cleaning
EDITING :
Editing is the review of the questionnaires with the objective of increasing accuracy and precision. It is needed to detect and, if possible, eliminate errors in the filled-in questionnaires. Editing work, although uninteresting and dull in nature, is no doubt necessary for faultless analysis of survey data.
There are three points - Completeness, Accuracy and Uniformity - to be checked while editing the data.
Completeness : While checking the questionnaires for completeness it should be remembered that there must be an answer to every question. If not, the answer should be deduced from other data or referred back to the concerned respondent.
Accuracy : Besides checking that all questionnaires are provided with answers, one must also check whether the answers are accurate. Inconsistencies in answers should be looked for and resolved.
Business Research Methods
The questionnaires with unsatisfactory responses may be returned to the field, where the interviewers recontact the respondents. If returning to the field to correct the unsatisfactory responses and to fill in the missing values is not feasible, the editor may assign seemingly appropriate values to the unsatisfactory responses. Alternatively, the editor may discard the unsatisfactory responses, if these are few in number.
Uniformity : While editing the schedules, efforts should be made to check whether the interviewers have interpreted questions and instructions uniformly. In case of lack of uniformity, the editing staff may try to make corrections, refer the schedules back to the respondents, or omit the schedules from analysis.
CODING : After editing the data, it may be required in most surveys to put the results in questionnaire form by coding the answers before summarization and analysis begin. This may also be conveniently carried out at the time of editing, if not coded in the questionnaires/schedules. The purpose of coding in surveys is to put the answers to a particular question into meaningful and unambiguous categories to bring out the essential pattern concealed in the mass of information. Essentially, coding means assigning a code, usually a number, to each possible response to each question.
Closed questions can be easily handled by the researchers for coding. As regards open-ended questions, the researcher should note the varieties of answers and, after preliminary evaluation, settle on response categories for coding. Although most responses can be accounted for by the derived categories, another category might be established to meet the coding rule of exhaustiveness. It is to be noted that open questions are more difficult to code since answers are not prepared in advance. However, they do encourage disclosure of complete information without the restriction imposed by prior suggestive answers.
Transcribing :
Transcribing data involves transferring the coded data from the questionnaires or coding sheets into a computer for subsequent data treatment.
Data Cleaning
Data cleaning involves :
(i) Range checks
(ii) Consistency checks
(iii) Treatment of missing observations
Range checks compare each data item to the set of usual and permissible values for that
variable. Range checks are used to (a) detect and correct invalid values (b) note and investigate
unusual values (c) note outliers which may need special statistical treatment.
Consistency checks examine each pair of related data items in relation to the set of usual and permissible values for the variables as a pair. Consistency checks are used to (a) detect and correct impermissible combinations (b) note and correct unusual combinations.
Inconsistent responses may have a noticeable impact on estimates and can alter comparisons across groups.
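A minimal sketch of such range and consistency checks in Python; the field names, permissible ranges and the schooling-versus-age rule are illustrative assumptions, not taken from the text:

```python
# Sketch of range and consistency checks on survey records.
# Field names and limits below are hypothetical examples.

RANGES = {"age": (0, 99), "monthly_income": (0, 10_000_000)}

def range_check(record):
    """Return names of fields whose values fall outside permissible ranges."""
    bad = []
    for field, (lo, hi) in RANGES.items():
        if field in record and not (lo <= record[field] <= hi):
            bad.append(field)
    return bad

def consistency_check(record):
    """Flag impermissible combinations of related items."""
    problems = []
    # Example rule: a respondent cannot have more years of schooling than age.
    if record.get("years_of_schooling", 0) > record.get("age", 0):
        problems.append("years_of_schooling > age")
    return problems

rec = {"age": 12, "monthly_income": 5000, "years_of_schooling": 16}
print(range_check(rec))        # []
print(consistency_check(rec))  # ['years_of_schooling > age']
```

Flagged values can then be corrected, investigated as unusual, or noted as outliers for special statistical treatment, as described above.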
Missing responses correspond to unknown values of the variable, arising because of ambiguous answers provided by the respondents and also because the interviewers fail to record answers properly.
The treatment of missing responses may be made by
• substituting a neutral value such as the mean response, etc.
• substituting an imputed response by studying the pattern of responses.
• deleting the missing responses from the analysis.
• taking into account only the available responses for each question.
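Two of these treatments, mean substitution and deletion, can be sketched as follows; the response values are hypothetical:

```python
# Sketch of two treatments for missing responses:
# neutral-value (mean) substitution and deletion. Data are hypothetical.

def mean_substitute(responses):
    """Replace None entries with the mean of the available responses."""
    available = [r for r in responses if r is not None]
    mean = sum(available) / len(available)
    return [mean if r is None else r for r in responses]

def delete_missing(responses):
    """Drop missing responses from the analysis."""
    return [r for r in responses if r is not None]

answers = [4, None, 5, 3, None]
print(mean_substitute(answers))  # [4, 4.0, 5, 3, 4.0]
print(delete_missing(answers))   # [4, 5, 3]
```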
DATA ENTRY
Data entry implies conversion of information gathered from secondary or primary sources to a medium for viewing and manipulation. Keyboarding helps the researchers who need to create a data file immediately and store it in a minimum of space on a variety of media.
For large research projects involving a large bulk of data, database programs serve as valuable data entry devices. A database is a collection of data organized for computerized retrieval. Spreadsheets are a specialized type of database. They provide an easy-to-learn mechanism for organizing and tabulating data and computing simple statistics. Data entry on a spreadsheet uses numbered rows and lettered columns with a matrix of thousands of cells into which an entry may be placed.
A data warehouse organizes large volumes of data into categories to facilitate retrieval, interpretation and sorting by end users.
SUMMARIZATION OF DATA:
The raw data are often in bulky form and are difficult to comprehend easily for ready reference. This calls for devising suitable methods for processing the data, such as condensing / summarizing the data for easy comprehension, providing direction for subsequent analysis, computing required statistical derivatives and applying further statistical treatment.
The most prominent and universally adopted data processing methods which present data in a summary form are :
Classification
Tabulation
Graphical Representation
Diagrammatic Representation
Computation of Basic Statistical Derivatives for univariate / multivariate data.
CLASSIFICATION OF DATA :
Classification is a process by which the data are arranged according to resemblances, affinities, internal homogeneity and common characteristics. This process is the forerunner to the tabular, graphical and diagrammatic representation of data. For example, in a socio-economic enquiry, data can be classified according to age, sex, educational qualification, religion, caste, income group, occupation etc.
TABULATION :
After classification of data, the next step in the process of summarization is to put the classified data in rows and columns having special characteristics. Such representation of data in an orderly and easily comprehensible fashion is called tabulation. Classification is a pre-requisite for tabulation.
General format of a table :
Table number
Title
Head note (Row Headings and Column Headings)
Body of the table
Foot note (if any)
Source note (if any)
An ideal table should have (1) title (2) stub (3) caption (or box head) (4) body
Frequency Table : In a frequency table, data are classified according to class intervals. Suppose we have monthly income data of 1000 households in a large community. What is the number of households whose monthly incomes lie between Rs.4000 and Rs.5000 ? Suppose from the given data we find there are 400 households having monthly income between Rs.4000 and Rs.5000. Now the class interval is stated as Rs.4000 - Rs.5000. The frequency in that class is 400. Rs.4000 is the lower limit of the class and Rs.5000 is the upper limit. The difference between the upper limit and the lower limit is defined as the size or length of the class interval. In the present case the size is Rs.1000.
The class intervals may be either continuous or discontinuous depending on the nature of the variable. In practice, the values observed on a continuous variable (say height, weight etc.) are classified into continuous class intervals and the values observed on a discontinuous / discrete variable (number of children in households, number of books in shelves etc.) are classified into discontinuous class intervals.
Illustration - 1 :
Continuous class intervals : 1000 - 2000, 2000 - 3000, 3000 - 4000 etc.
Discontinuous class intervals : 1000 - 1999, 2000 - 2999, 3000 - 3999 etc.
The principal steps in the construction of a frequency distribution are
(i) decision as to the number and size of class intervals
(ii) selection of class limits
(iii) counting of frequencies through tally marks, etc.
The size of the class interval depends on the number of class intervals. A very large number of class intervals defeats the purpose of condensation / summarization of data. If the number of class intervals is very small, essential features of the data may be concealed in the frequency distribution. In practice it is expected that the number of class intervals should not exceed 15.
After the class limits of the class intervals are fixed, the number of observations falling in each class interval is counted by using the oldest method, tally marks, or by the use of computers.
The frequency distribution may consist of equal-sized class intervals or unequal class intervals. Equal-sized class intervals facilitate comparison of frequencies in different class intervals. The beginning and end class intervals may be either open or closed. In open-ended class intervals, the beginning class interval is written as 'less than' the lower limit of the succeeding class interval and the end class interval is expressed as 'greater than or equal to' the upper limit of the preceding class interval.
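The counting step can be sketched in Python; the data values and class scheme below are illustrative assumptions:

```python
# Sketch of counting frequencies in fixed equal-sized class intervals,
# mirroring the tally-mark step. Values and classes are hypothetical.

def frequency_distribution(values, lower, width, k):
    """Count observations falling in k continuous classes
    [lower, lower+width), [lower+width, lower+2*width), ..."""
    freq = [0] * k
    for v in values:
        idx = int((v - lower) // width)
        if 0 <= idx < k:
            freq[idx] += 1
    return freq

data = [12, 17, 23, 25, 31, 34, 38, 41, 47]
print(frequency_distribution(data, 10, 10, 4))  # [2, 2, 3, 2]
```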
Illustration - 2 :
TABLE - 1 (Frequency Distribution)

Class       Frequency   Relative   Cumulative frequency
interval        f         freq.    Less than   Greater than
                                     type          type
<30             4         0.08        4             50
30-40          12         0.24       16             46
40-50          18         0.36       34             34
50-60          14         0.28       48             16
>=60            2         0.04       50              2
Total          50         1.00        -              -

The open-ended class intervals are used when the extreme class intervals contain very small frequencies, and hence grouping is resorted to with inequalities.
Relative frequency and cumulative frequency
Relative frequency is defined as the ratio of the frequency in a particular class interval to the total frequency in the data set (Table - 1).
Cumulative frequency is the consecutive addition of class frequencies when the class intervals are arranged in either increasing or decreasing order (Table - 1).
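These definitions can be checked against Table - 1 by recomputing its relative and cumulative frequency columns from the class frequencies:

```python
# Recomputing the relative and cumulative frequencies of Table - 1
# from its class frequencies.

freq = [4, 12, 18, 14, 2]   # class frequencies from Table - 1
total = sum(freq)           # 50

relative = [f / total for f in freq]

less_than = []              # cumulative frequency, less-than type
running = 0
for f in freq:
    running += f
    less_than.append(running)

# Greater-than type: observations at or above the class's lower limit.
greater_than = [total - c + f for c, f in zip(less_than, freq)]

print(relative)      # [0.08, 0.24, 0.36, 0.28, 0.04]
print(less_than)     # [4, 16, 34, 48, 50]
print(greater_than)  # [50, 46, 34, 16, 2]
```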
GRAPHICAL REPRESENTATION OF DATA
Graphs provide an alternative method of representing data in a condensed and summary form. A graph is a scale-dependent geometrical figure and provides a visual presentation of statistical data.
Graphs are immensely useful to depict economic relationships explicitly through time and have become an indispensable tool for economic analysis (e.g., price vs demand, income vs expenditure, time vs population, time vs agricultural production, economic development vs population growth etc.).
Graphs with more than two variables lose their ready recognisability and are not in common use.
An ideal graph should be self-explanatory and drawn neatly, with indication of scales on both axes and with a clear and unambiguous title, head note, foot notes and source notes at suitable places in the body. All descriptions used in the graph should be written horizontally.
Illustration - 3 : Growth of Population in India
TABLE - 2 (Population of India)
Fig.-1 : Population of India (1901 - 2001); population in millions, Y(t), plotted against the census years 1901, 1911, 1921, 1931, 1941, 1951, 1961, 1971, 1981, 1991 and 2001.
Frequency graphs
A frequency distribution can be represented by suitable graphs to show its characteristics. These are
1. Histogram
2. Frequency curve
3. Frequency polygon
4. Cumulative frequency curves (ogives)
A histogram is a graphical representation of a frequency distribution, where the frequencies in the form of rectangles are erected over the respective consecutive class intervals. The areas of the rectangles are proportional to the class frequencies. The total area of the rectangles represents the total frequency.
The frequency polygon is drawn by joining the mid-points of the tops of the rectangles. The area bounded by the frequency polygon is supposed to be equal to the area bounded by the histogram, which in turn represents the total frequency.
The frequency curve is a smoothed curve passing approximately through the extreme points of the frequency polygon, and the area bounded by the frequency curve and the X-axis represents the total frequency.
The cumulative frequency curve is drawn by plotting cumulative frequencies of the greater-than type / less-than type against the lower class limits / upper class limits of the class intervals. The cumulative frequency curves are also called ogives.
DESCRIPTIVE STATISTICS
Besides the classification and tabulation of data, and their representation through graphical
and diagrammatic representation, there may be further necessity of calculating certain important
statistical derivatives (numerical quantities) like percentages, rates, ratios, indices, measures of
central tendency (averages) / measures of location, measures of variation / dispersion, coefficient
of variation and analytical ones such as differences, correlation coefficient, regression coefficient,
etc. which also lead to ultimate reduction of data.
Such reduction to some numerical quantities is useful for the sophisticated analysis and
interpretation of data. Thus, descriptive statistics provide a clear, concise, useful and informative
picture of a mass of numerical figures.
Measures of Central Tendency
Given a data series, a researcher may like to know the average value around which the observations in the series lie. This is answered by computing statistical quantities termed averages, grouped under the broad name Measures of Central Tendency / Measures of Location. These measures are the Arithmetic mean, Geometric mean, Harmonic mean, Median and Mode. The most popular measure of central tendency is the Arithmetic mean.
Arithmetic Mean
The arithmetic mean of a set of values is the sum of the values divided by the total number of values, and represents the average of the values in the data set.
Illustration - 4 :
Suppose the daily cereal expenditures of five households in a locality are Rs.60, Rs.80, Rs.65, Rs.45 and Rs.55. Then, the average or arithmetic mean or simply mean daily cereal expenditure of these five households is computed as

Arithmetic mean (AM) = (Sum of expenditures of five households) / (Number of households)
                     = (60 + 80 + 65 + 45 + 55) / 5 = Rs.61

Mathematically, given a set of n observed values (ratio or interval data) x1, x2, ..., xn, the arithmetic mean is defined by

x̄ = (x1 + x2 + ... + xn) / n = (1/n) Σ xi
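The computation of Illustration - 4 can be verified directly:

```python
# Arithmetic mean of the five household expenditures in Illustration - 4.
expenditures = [60, 80, 65, 45, 55]
mean = sum(expenditures) / len(expenditures)
print(mean)  # 61.0
```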
In spite of its simplicity of calculation, the arithmetic mean is affected by extreme values in the data set, which is a major drawback in its application. This has led to two other measures built on the concept of average :
• Geometric mean
• Harmonic mean
The geometric mean of x1, x2, ..., xn is defined as
x̄G = (x1 × x2 × ... × xn)^(1/n), provided xi > 0 for all i = 1, 2, ..., n.
The harmonic mean of x1, x2, ..., xn is defined as
x̄H = n / (1/x1 + 1/x2 + ... + 1/xn), provided xi ≠ 0 for all i = 1, 2, ..., n.
The geometric mean is an appropriate average for observations which tend to be logarithmic in form. It is the correct average of percentage rates of increase and of ratios.
The harmonic mean is useful when the data are given in terms of rates. Sometimes it is conventional to deal with data in kilometres per hour, units purchased per rupee or units produced per hour. Mathematically it can be proved that x̄ ≥ x̄G ≥ x̄H.
For example, given four observations 5, 8, 10 and 117,
x̄ = (5 + 8 + 10 + 117) / 4 = 35
x̄G = (5 × 8 × 10 × 117)^(1/4) = (46,800)^(1/4) = 14.71
x̄H = 4 / (1/5 + 1/8 + 1/10 + 1/117) = 9.23
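A quick check of the three averages and the inequality for the same four observations:

```python
# Checking the three averages for the observations 5, 8, 10 and 117,
# and the inequality mean >= GM >= HM.
import math

x = [5, 8, 10, 117]
n = len(x)

am = sum(x) / n                      # arithmetic mean
gm = math.prod(x) ** (1 / n)         # geometric mean: (5*8*10*117)^(1/4)
hm = n / sum(1 / v for v in x)       # harmonic mean

print(round(am, 2), round(gm, 2), round(hm, 2))  # 35.0 14.71 9.23
assert am > gm > hm
```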
However, in practice the arithmetic mean is invariably used because of its simplicity and its advanced applications in statistical analysis.
Median
The median is another measure of position which divides a data set into two equal halves and is the middle-most value. Half of the values lie below the median and half above it. To compute the median from a data set, the values are to be ordered either from the lowest to the highest or from the highest to the lowest.
Illustration - 5 :
The data in Illustration - 4 regarding the daily cereal consumption expenditure of five households may be ordered as
Rs.45, Rs.55, Rs.60, Rs.65, Rs.80
The median is the value of the middle-most observation when arranged either in ascending or descending order. This comes out to be the 3rd observation, i.e., Rs.60/-. If the number of observations is even, the median is taken as the mean of the two middle-most observations.
The median is not affected by extreme values in the data set, but it involves one drawback: it is not based on the numerical values of all the observations. Further, it lacks the simplicity of the mean in advanced statistical applications.
Mode
Sometimes we may be interested in the most typical or most frequent value in the data array, or the value around which the maximum concentration of items occurs. That value is called the Mode. For instance, a garment manufacturer may like to know the size of men's shirts that has the maximum demand in the market, so that his production can concentrate around that shirt size. In a series of 6 observations - 10, 15, 18, 15, 20, 15 - the modal value is 15, because it is the most frequent value. However, the mode has limited application in statistical studies.
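Both measures are available in the Python standard library; here they are checked on the series used above:

```python
# Median and mode for the small series used in the illustrations above,
# via the Python standard library.
import statistics

expenditures = [60, 80, 65, 45, 55]
med = statistics.median(expenditures)
print(med)  # 60

series = [10, 15, 18, 15, 20, 15]
mode_val = statistics.mode(series)
print(mode_val)  # 15
```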
Measures of Dispersion / Variability
Dispersion implies spread or variability in the values of the items in the given data set. Referring to the data in Illustration - 4 on the daily cereal consumption expenditure of 5 households, Rs.60, Rs.65, Rs.80, Rs.45, Rs.55, we find that all the values are not the same and hence there is variability in the data set.
In order to throw light on this variability we need a measure of variation / dispersion. In the statistical literature there are a number of measures of dispersion to measure the variability in a data set. If there is no variability in the data set, it is said to have zero dispersion and as such all the measures will result in zero values.
Suppose there are n observations having the values x1, x2, ..., xn. The methods of computation of some important measures of dispersion are stated below.

Variance (σ²) :
σ² = (1/n) Σ (xi - x̄)², where x̄ = (1/n) Σ xi

Standard deviation (σ) :
σ = √σ² = √[(1/n) Σ (xi - x̄)²]

Mean absolute deviation (MAD) :
MAD = (1/n) Σ |xi - x̄|

Note : Variance is measured in squared units of measurement, unlike the standard deviation or mean absolute deviation.

Range (R) :
The range is a very simple and rough measure of variation and is defined as
R = Xmax - Xmin
where Xmax and Xmin are the maximum and minimum values in the set of observations.

Mean difference (Δ) :
Δ = [1 / n(n-1)] Σi Σj |xi - xj|, i ≠ j
The mean difference is attributed to Gini and measures the intrinsic spread of the distribution, independent of any central value.

Quartile Deviation :
The median divides the data array into two equal parts, i.e., 50% of the observations lie above the median and 50% lie below it. The first quartile (Q1) in an ordered data array is the value above which 75% of the values lie and below which 25% of the values lie. The second quartile is the median. The third quartile (Q3) is the value below which 75% of the values lie and above which 25% of the values lie. Thus, the interquartile range is Q3 - Q1.
The quartile deviation is defined as
QD = (Q3 - Q1) / 2
This is also termed the semi-interquartile range.
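A sketch computing these measures for the five expenditures of Illustration - 4 (Rs.60, Rs.80, Rs.65, Rs.45, Rs.55):

```python
# Computing the dispersion measures defined above for the five
# household expenditures of Illustration - 4.

def dispersion(xs):
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((v - mean) ** 2 for v in xs) / n       # sigma^2
    sd = variance ** 0.5                                  # sigma
    mad = sum(abs(v - mean) for v in xs) / n              # mean absolute deviation
    rng = max(xs) - min(xs)                               # range
    # Gini mean difference over all ordered pairs (the i = j terms are zero)
    delta = sum(abs(a - b) for a in xs for b in xs) / (n * (n - 1))
    return variance, sd, mad, rng, delta

var, sd, mad, rng, delta = dispersion([60, 80, 65, 45, 55])
print(var, round(sd, 2), mad, rng, delta)  # 134.0 11.58 9.2 35 16.0
```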
Relative Measures of Dispersion
Sometimes we may be required to compare variations in two data series where the observations are measured in different units of measurement. For example, suppose we are interested in comparing the variations in heights (measured in cm) and weights (measured in kg) of a group of individuals. Here we need to construct relative measures of variation which are free from the units of measurement. Some measures of this type are given below.

Coefficient of variation :
Let σ and x̄ be the standard deviation and arithmetic mean of the values in a data series. Then, the coefficient of variation is defined as
CV = standard deviation / mean = σ / x̄
which is free from the unit of measurement, whatever it may be. It is usually expressed as a percentage.

Other relative measures of variation are
(i) Mean absolute deviation from mean / Arithmetic mean
(ii) Mean absolute deviation from median / Median
(iii) (Q3 - Q1) / (Q3 + Q1)
(iv) Coefficient of Range = (Xmax - Xmin) / (Xmax + Xmin)
Illustration - 6 :
Referring to the data in Illustration - 4, we compute x̄ = 61 and the median = 60.
σ² = [(45-61)² + (55-61)² + (60-61)² + (65-61)² + (80-61)²] / 5 = 134 (squared units)
Thus, σ = 11.58
MAD (from mean) = (1/5)(46) = 9.20
Mean absolute deviation from median = (1/5)(45) = 9
QD = (Q3 - Q1) / 2 = (72.50 - 50) / 2 = 11.25
Relative measures of variation :
(i) CV = σ / x̄ = 11.58 / 61 = 0.1898
(ii) Mean absolute deviation from median / Median = 9 / 60 = 0.15
(iii) (Q3 - Q1) / (Q3 + Q1) = (72.50 - 50) / (72.50 + 50) = 22.50 / 122.50 = 0.1837
(iv) Coefficient of Range = (Xmax - Xmin) / (Xmax + Xmin) = (80 - 45) / (80 + 45) = 0.28
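These relative measures can be recomputed from the summary figures of Illustration - 6:

```python
# Checking the relative measures of variation from Illustration - 6.
sd, mean = 11.58, 61
mad_from_median, median = 9, 60
q1, q3 = 50, 72.50
xmin, xmax = 45, 80

cv = round(sd / mean, 4)                          # coefficient of variation
rel_mad = round(mad_from_median / median, 2)      # MAD from median / median
coeff_qd = round((q3 - q1) / (q3 + q1), 4)        # coefficient of quartile deviation
coeff_range = round((xmax - xmin) / (xmax + xmin), 2)

print(cv, rel_mad, coeff_qd, coeff_range)  # 0.1898 0.15 0.1837 0.28
```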
The foregoing measures are computed from ungrouped data. For grouped data, the measures are computed as follows.
Measures of Central Tendency :
Let x1, x2, ..., xk represent the mid-values of k class intervals with frequencies f1, f2, ..., fk respectively, and let N = f1 + f2 + ... + fk be the total frequency.
Arithmetic mean : x̄ = (f1 x1 + f2 x2 + ... + fk xk) / (f1 + f2 + ... + fk)
Geometric mean : x̄G = (x1^f1 × x2^f2 × ... × xk^fk)^(1/N)
Harmonic mean : x̄H = N / (f1/x1 + f2/x2 + ... + fk/xk)
Median :
The median in a frequency distribution is calculated by forming a cumulative frequency table. As the exact value of the median cannot be located in a grouped frequency distribution, a numerical interpolation formula is used, such as

Median = Md = l1 + [(l2 - l1) / f] (m - c)

where l1 = the lower limit of the median class
l2 = the upper limit of the median class
f = the frequency in the median class
m = N/2 = (f1 + f2 + ... + fk)/2, half the total frequency
c = the cumulative frequency of the class preceding the median class.

It may be pointed out here that the computation of the median in a frequency distribution needs continuous ordered class intervals.
Mode :
The modal class is the class for which the frequency is maximum. To compute the mode from a grouped frequency distribution with continuous class intervals, the numerical interpolation formula

Mo = l1 + [Δ1 / (Δ1 + Δ2)] × i

is adopted, where
l1 = lower limit of the modal class
i = size of the class interval
Δ1 = difference between the frequency in the modal class and that in the preceding class
Δ2 = difference between the frequency in the modal class and that in the succeeding class.
Measures of Dispersion :
For grouped data the corresponding measures take the forms
(i) Variance (σ²) : σ² = (1/N) Σ fi (xi - x̄)²
(ii) Standard deviation (σ) : σ = √σ²
(iii) Mean absolute deviation : MAD = (1/N) Σ fi |xi - x̄|
Illustration - 7 :
Given below is the frequency distribution of the number of defective electric tubes in 50 lots, each of size 100.

TABLE - 3
Frequency distribution of defective tubes

Defective   Frequency   Cumulative   Mid-        f×x    Median /
tubes          (f)      frequency    value (x)          Modal class
0-5             4           4          2.5         10
5-10            8          12          7.5         60
10-15          14          26         12.5        175    median class
15-20          18          44         17.5        315    modal class
20-25           6          50         22.5        135
Total          50           -           -         695

Median (Md) = l1 + [(l2 - l1) / f] (m - c)
            = 10 + [(15 - 10) / 14] (25 - 12) = 14.64   (the median lies in the class 10 - 15)
where l1 = 10, l2 = 15, f = 14, m = 25 and c = 12.

Mode (Mo) = l1 + [Δ1 / (Δ1 + Δ2)] × i   (the mode lies in the class 15 - 20)
          = 15 + 5 × [4 / (4 + 12)] = 15 + 1.25 = 16.25

Standard deviation (σ) = √[(1/N) Σ fi (xi - x̄)²] = √(1552/50) = 5.571
Mean Absolute Deviation (MAD) = (1/N) Σ fi |xi - x̄| = 232.8/50 = 4.656

TABLE - 4
Computations for standard deviation and mean deviation

Mid-value(x)   f    x - x̄               (x - x̄)²   f(x - x̄)²   |x - x̄|   f|x - x̄|
2.5            4    2.5 - 13.9 = -11.4    129.96      519.84      11.4       45.6
7.5            8    7.5 - 13.9 = -6.4      40.96      327.68       6.4       51.2
12.5          14   12.5 - 13.9 = -1.4       1.96       27.44       1.4       19.6
17.5          18   17.5 - 13.9 =  3.6      12.96      233.28       3.6       64.8
22.5           6   22.5 - 13.9 =  8.6      73.96      443.76       8.6       51.6
Total         50         0                   -        1552          -       232.8
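The interpolation formulas and the grouped standard deviation can be verified against Illustration - 7:

```python
# Recomputing the grouped-data measures of Illustration - 7:
# median, mode and standard deviation of the defective-tube distribution.

limits = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)]
freq = [4, 8, 14, 18, 6]
n = sum(freq)                                   # 50

# Median by interpolation in the median class (10 - 15)
cum = 0
for (l1, l2), f in zip(limits, freq):
    if cum + f >= n / 2:
        median = l1 + (l2 - l1) / f * (n / 2 - cum)
        break
    cum += f
print(round(median, 2))  # 14.64

# Mode by interpolation in the modal class (15 - 20); assumes the
# modal class is an interior class so both neighbours exist.
m = freq.index(max(freq))
d1 = freq[m] - freq[m - 1]
d2 = freq[m] - freq[m + 1]
l1, l2 = limits[m]
mode = l1 + (l2 - l1) * d1 / (d1 + d2)
print(round(mode, 2))    # 16.25

# Standard deviation from the mid-values
mids = [(a + b) / 2 for a, b in limits]
mean = sum(f * x for f, x in zip(freq, mids)) / n     # 13.9
var = sum(f * (x - mean) ** 2 for f, x in zip(freq, mids)) / n
print(round(var ** 0.5, 3))  # 5.571
```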
ANALYSIS OF DATA
Data do not speak for themselves; the researchers or analysts make them speak.
Processing and summarization of data help one get a feel for the data. Exploration of data at certain stages becomes descriptive analysis and involves
• statistical representation of data
• logical ordering of data
such that questions can be raised and answered.
The statistical representation of data needs classification, frequency distribution or tabulation. The tabulation, in order to have utility, must have internal logic and order.
The most common summary statistics / descriptive statistics are ratios, proportions, measures of location / averages, and measures of dispersion.
Consider the age - sex two-way classification of some population below 15 years of age given in Table - 5. The proportions of males in the age groups 0-4, 5-9 and 10-14 are 0.526, 0.532 and 0.517 respectively.

TABLE - 5
Age group   Ratio (F/M)   Proportion of males M/(M+F)
0-4            0.90          0.526 (52.6%)
5-9            0.88          0.532 (53.2%)
10-14                        0.517 (51.7%)
Total                        0.525 (52.5%)

A cross tabulation of bivariate data displays the frequency (or percentage) of all combinations of two or more nominal or categorical variables to detect association / correlation or cause - effect relationships. The foregoing table is a cross table with sex and age group as the two variables (attributes), representing a frequency for each combination.
The diagrammatic and graphical representations of data are some of the classical methods to understand the data and to open up concealed features therein, which are of interest to the researchers and also to laymen.
Two useful exploratory displays are (a) the stem and leaf plot and (b) the box plot / box-whisker plot. To construct a stem and leaf plot, divide each number into two groups - the first few digits form the stem and the remaining digits form the leaf. A box plot is a summary of the information contained in the quartiles.
The Pareto diagram is a bar chart whose percentages sum to 100. The causes of the problem under investigation are sorted in decreasing importance, with bar heights decreasing from left to right.
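Preparing the data for a Pareto diagram amounts to sorting the causes by importance; a minimal sketch with hypothetical complaint counts:

```python
# Sketch of preparing data for a Pareto diagram: causes sorted in
# decreasing importance, with percentages summing to 100.
# The cause names and counts are hypothetical.

causes = {"late delivery": 42, "damaged goods": 18,
          "wrong item": 25, "billing error": 15}
total = sum(causes.values())

pareto = sorted(causes.items(), key=lambda kv: kv[1], reverse=True)
for cause, count in pareto:
    print(f"{cause}: {100 * count / total:.0f}%")
# late delivery: 42%
# wrong item: 25%
# damaged goods: 18%
# billing error: 15%
```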
The computation of statistical measures of central tendency indicates the average picture of the observations in the data set, and the measures of variation indicate the amount of dispersion or variability in the set.
Analytical statistics in statistical analysis include the comparison of two data sets as regards their averages, ratios and variations. If the researcher likes to know the relationship between two variables, correlation and regression analysis may be used, the latter estimating the value of the dependent variable for a given value of the independent variable.
When the population under study is large, we resort to a sample study because of time and cost constraints and limited infrastructure facilities. The estimates computed from the sample for population characteristics, such as the population mean and proportions, will be subject to sampling and non-sampling errors, which ought to be controlled for valid inferences. Statistical procedures are available in the literature to deal with such cases for efficient statistical analysis.
Research questions are translated into research hypotheses. These hypotheses are tested with the help of samples randomly selected from the populations under study, using infinite theoretical population models, such as the normal, exponential, binomial and Poisson populations, which fit many real-life situations. The concept of theoretical models is considered while testing hypotheses in order to generalize the conclusions derived from samples drawn from finite populations.
When sampling from finite or infinite populations, two inferential problems arise :
• Estimation of unknown population characteristics such as the mean, total, proportion, variance, correlation coefficient etc.
• Testing hypotheses about the unknown population characteristics mentioned above.
The first problem requires computing the estimates along with standard errors or confidence intervals. The standard error, which is the square root of the sampling variance, is a measure of sampling error and indicates the extent of reliability to be placed on the sample estimates. Alternatively, we may compute the confidence interval in which the unknown population parameter is expected to lie with a certain probability.
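A sketch of the confidence-interval computation, using the normal approximation with z = 1.96 for a roughly 95% interval; the sample values reused here are the expenditures of Illustration - 4:

```python
# Sketch of an approximate 95% confidence interval for a population mean
# from a simple random sample, using the normal approximation (z = 1.96).

def mean_ci(sample, z=1.96):
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)   # sample variance
    se = (var / n) ** 0.5                                   # standard error
    return mean - z * se, mean + z * se

lo, hi = mean_ci([60, 80, 65, 45, 55])
print(round(lo, 1), round(hi, 1))  # 49.7 72.3
```

For samples this small a t-multiplier would ordinarily replace z; the normal approximation is used here only to keep the sketch minimal.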
For the second problem we set up hypotheses based on the research questions and decide from the sample evidence whether to reject or accept the hypothesis at a certain level of significance.
Sometimes it may not be possible to guess the form of the distribution from which a particular sample is assumed to be drawn, except that the population is continuous and certain moments exist. For such situations certain non-parametric or distribution-free tests have been developed in the literature. Non-parametric tests are less efficient than parametric tests, for which the assumption of normality of observations is an essential requirement.
INTERPRETATION OF DATA :
After the collection of data and its subsequent analysis, the researcher has to accomplish the task of drawing inferences, followed by report writing. This has to be done carefully, otherwise misleading conclusions may be drawn and the whole process of doing research may get vitiated. It is only through interpretation that the researcher can expose the relations and processes that underlie his findings. If the hypotheses are tested and upheld several times, the researcher may arrive at generalizations.
What is interpretation ?
Interpretation refers to drawing inferences from the collected facts after an analysis and/or
experimental study. It has two major aspects :
(a) the effort to establish continuity in research through linking the results of a given study
with those of another, and
(b) the establishment of some explanatory concepts.
Why Interpretation ?
Interpretation is essential for the simple reason that the usefulness and utility of research findings lie in proper interpretation. It constitutes a basic component of the research process. The reasons are :
1. It is through interpretation that the researcher can well understand the abstract principle that works beneath his findings.
2. Interpretation leads to the establishment of explanatory concepts which can serve as a guide for future research.
3. Only through interpretation can the researcher appreciate why his findings are what they are, and make others understand the real significance of his research findings.
Techniques of Interpretation :
The task of interpretation is not a very easy one. It requires the knowledge and expertise of the researcher, along with a basic understanding of the subject of research and its theoretical background.
A scientific analysis of statistical data and survey findings helps realistic interpretation. The researcher should base his conclusions and interpretation on reliable data collected for the purpose and on the research done earlier. Broad generalizations of the results should be avoided unless verified by repeated experiments and field surveys. The researcher should avoid making comments and interpretations without verifying them from all possible angles. Limited sample data should be interpreted with caution. If necessary, the researcher should consult experts in his subject of study while interpreting the results obtained in his researches.
MODEL QUESTIONS
A. Fill in the blanks.
1. The averages around which the values of observations lie are called ______.
2. The most popular average is ______.
3. Arithmetic mean is affected by ______ values.
4. Median is not affected by ______ value in the data set.
5. The variability in the data set is measured by ______.
6. The variance is measured in ______ units.
7. The standard deviation is ______ of variance.
8. Mean deviation and Quartile deviation are measures of ______.
9. Relative measures of dispersion are free from ______.
10. Coefficient of variation is a ______ measure of dispersion.
Answers :
1. Measures of Central tendency, 2. Arithmetic mean, 3. Extreme, 4. Extreme,
5. Measures of variation / dispersion, 6. Squared, 7. Positive square root, 8. Dispersion,
9. Units of measurement, 10. Relative.
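Blanks 3 and 4 above (the mean is affected by extreme values, the median is not) can be checked with a quick sketch using Python's statistics module; the data set here is invented for illustration:

```python
from statistics import mean, median

# A small data set, then the same set with one extreme value appended.
data = [10, 12, 14, 16, 18]
with_outlier = data + [180]

# The mean shifts sharply toward the extreme value; the median barely moves.
print(mean(data), median(data))                  # 14 14
print(mean(with_outlier), median(with_outlier))  # mean jumps, median is 15.0
```

This is why the median is preferred as an average for data sets containing outliers.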
B. Multiple Choice Questions
1. Point out the relative measure of dispersion
(a) Variance (b) Quartile deviation
(c) Range (d) Coefficient of variation
Ans. (d) Coefficient of variation
2. What is the first step in analysis of data ?
(a) Data entry to computer (b) Tabulation
(c) Editing (d) None of the above
Ans. (c) Editing
3. Which of the following measures is free from the unit of measurement ?
(a) Mean (b) Variance
(c) Standard deviation (d) Coefficient of variation
Ans. (d) Coefficient of variation
4. What happens if you change the origin of the measurement of a variable ?
(a) Variance may be increased (b) Variance does not change
(c) Standard deviation may be reduced (d) None of the above
Ans. (b) Variance does not change
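The change-of-origin result in Q.4 can be verified numerically. A minimal sketch with Python's statistics module, using an arbitrary made-up data set: adding a constant to every value (change of origin) leaves the variance unchanged, while multiplying by a constant k (change of scale) multiplies the variance by k squared.

```python
from statistics import pvariance

x = [2, 4, 6, 8]

# Change of origin: add a constant to every value -> variance unchanged.
shifted = [v + 100 for v in x]
assert pvariance(x) == pvariance(shifted)

# Change of scale: multiply every value by k -> variance multiplied by k**2.
scaled = [3 * v for v in x]
assert pvariance(scaled) == 9 * pvariance(x)
```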
C. Find out True / False statements
1. The variance of a set of observations corresponding to weights of 10 objects measured in
kg is 5 kg.
2. Depending on the data, the correlation coefficient can be greater than unity.
3. Two sets of observations (1, 3, 5) and (2, 4, 6) have the same variance.
4. Variance or standard deviation is invariant under the change of origin.
5. Variance or standard deviation is invariant under change of scale.
6. Coefficient of variation is the ratio of standard deviation to the mean, usually expressed in
percentage.
Answers :
1. False, 2. False, 3. True, 4. True, 5. False, 6. True.
D. Short answer-type questions
1. (a) What are the steps for data preparation ?
Ans. Editing, Coding, and Data entry.
(b) What is transcribing ?
Ans. Transferring coded data into a computer.
(c) What is coding ?
Ans. Survey coding is the process of taking open-ended responses and categorizing them into
groups. Once coded, they can be analyzed in the same way as multiple-response questions. Codes
may vary from person to person depending on what codes you use for open-ended comments.
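As an illustrative sketch of coding, the snippet below assigns numeric codes to open-ended responses using a hypothetical code book; the categories, codes, and responses are all invented examples, not from the text:

```python
# Hypothetical code book for an open-ended question ("Why do you shop here?").
# The category labels and numeric codes are illustrative only.
code_book = {
    "price": 1,
    "quality": 2,
    "location": 3,
    "other": 9,  # residual category for unclassifiable answers
}

def assign_code(response: str) -> int:
    """Return the numeric code of the first code-book keyword found in a response."""
    text = response.lower()
    for keyword, code in code_book.items():
        if keyword in text:
            return code
    return code_book["other"]

responses = ["Good quality products", "Close location to my home", "Cheap!"]
codes = [assign_code(r) for r in responses]
print(codes)  # [2, 3, 9]
```

Once coded this way, the responses can be tabulated and analyzed like any closed-ended question.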
2. How would you summarize data ?
Ans. Classification, Tabulation, Graphs, Diagrams, Computation of statistical derivatives.
3. (a) What is classification of data ?
Ans. Representing data in different classes.
(b) What is class frequency ?
Ans. Number of entities in a particular class.
4. (a) What is class interval ?
Ans. A particular numerical interval having upper and lower limits within which certain
observations fall.
(b) What are closed class interval and open class interval ?
Ans. Example : In age classification one can group ages of a population in 15-20 years, 20-
25 years, etc. as closed intervals, and above (>) 20 or below (<) 15 as open intervals.
(c) What is size of class interval ?
Ans. (Upper limit - lower limit).
(d) What is Relative frequency ?
Ans. Ratio of Frequency in a class interval to total frequency.
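The ideas in (a)-(d) above can be tied together in a short sketch: grouping invented age data into closed class intervals of size 5 (treating the lower limit as inclusive, an assumption the text leaves open), then computing each class frequency and relative frequency:

```python
from collections import Counter

ages = [16, 17, 19, 21, 22, 23, 24, 18, 20, 22]  # invented sample data

def interval_label(age, width=5, start=15):
    """Label the closed class interval of the given width that an age falls in."""
    lower = start + ((age - start) // width) * width
    return f"{lower}-{lower + width}"

freq = Counter(interval_label(a) for a in ages)   # class frequencies
total = sum(freq.values())
for cls in sorted(freq):
    # class interval, frequency, relative frequency (frequency / total frequency)
    print(cls, freq[cls], freq[cls] / total)
```

The relative frequencies always sum to 1, which is what makes them comparable across data sets of different sizes.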
5. (a) What is Tabulation of data ?
Ans. Representing classified data in orderly fashion enclosed by horizontal and vertical
lines.
(b) What is the general format of a Table ?
Ans. Table number, Title, Head note, Row and column captions, Body of the table, Foot note,
Source note.
(c) What is a Master table ?
Ans. Reference table / Information table / General purpose table, i.e. the detailed table
presenting all available data of the enquiry in orderly fashion.
(d) What is a summary table ?
Ans. Special purpose table derived from the General purpose table.
(e) Given a frequency table, how would you reproduce the data ?
Ans. Graphs and Diagrams.
(f) What is a graph ?
Ans. A scale-dependent geometrical figure.
(g) What is a Histogram ?
Ans. Graphical representation of a frequency distribution.
(h) What is a Diagram ?
Ans. Diagrammatical representation of data in one dimension, two dimension and multi-
dimensions.
(i) What is the importance of diagrams ?
Ans. For easy visual interpretation.
(j) What are statistical derivatives from a data set ?
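As a rough illustration of the frequency-distribution and histogram ideas above, here is a text-mode sketch in Python; the marks data are invented, and a real histogram would of course be drawn to scale on graph paper or with a plotting library:

```python
from collections import Counter

# Frequency distribution of a small (invented) data set,
# drawn as a crude text-mode bar picture of the frequencies.
marks = [3, 1, 2, 3, 4, 3, 2, 4, 3, 1]
freq = Counter(marks)
for value in sorted(freq):
    print(f"{value} | {'#' * freq[value]}")
```

Each row shows one value and a bar whose length equals its frequency, which is the visual idea behind a histogram of class frequencies.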
E. Essay-type questions
1. What do you understand by processing of data ? What are its different components ? Explain.
2. What are the different steps for data preparation ? Discuss.
3. Write in brief the reasons for :
(a) Graphical representation of data
(b) Diagrammatical representation of data
(c) Classification of data
(d) Tabulation of data
4. What is cross tabulation of data ? Explain its uses.
5, Mention considerations for the analysis of data.
6. What is interpretation ? Why do you interpret ? What are the techniques for interpretation ?