Data Collection, Assessment of Qualitative Data, Data Processing: Key Issues
Presentation Layout
• Introduction to data
• Classification of data
• Collection of data
• Methods of data collection
• Assessment of qualitative data
• Processing of data
- Editing
- Coding
- Tabulation
- Graphical representation
What is data?
Data are observations or evidence about the social world
Data, the plural of datum, can be quantitative or qualitative in nature
‘Data is produced, not given’; that is, researchers choose what to call data; it is not just ‘there’ to be ‘found’ (Marsh 1988)
- The Sage Dictionary of Social Research Methods
Data & Information
The terms 'data' and 'information' are often used interchangeably
However, the terms have distinct meanings
Data
- Facts, events and transactions which have been recorded
- The input raw materials from which information is produced
Information
- Data that have been processed in such a way as to be useful to the recipient
- Basic data are processed in some way to form information
Nature of Data
Research studies in behavioral science are mainly concerned with characteristics or traits
Tools are therefore administered to quantify these characteristics, but not all traits or characteristics can be quantified
Data can thus be classified into two broad categories:
1. Qualitative data or attributes
2. Quantitative data or variables
1. Qualitative Data or Attributes
Characteristics or traits to which a numerical value cannot be assigned are called attributes
E.g. gender, motivation, etc.
2. Quantitative Data or Variables
Characteristics or traits to which a numerical value can be assigned are called variables
E.g. height, weight, etc.
Constants
A constant is a characteristic or condition that is the same for all the observed units or sample subjects of a study
Variables
A characteristic or trait in behavioral science that can be quantified is termed a variable
Variables are of two types: continuous variables and discrete variables
1. Continuous variables
A characteristic whose observations can take any value over a particular range
It can assume either fractional or integral values
E.g. weight of children in kg, height of patients
2. Discrete variables
Discrete variables, on the other hand, exist only in whole units, not fractional values (usually units of one)
E.g. number of cataract patients in a village, WBC count
Attribute vs. Variable
Attribute
- A category of a characteristic to which a subject either belongs or does not belong, or a property that a subject either possesses or does not possess
- E.g. becoming sick, blood group, etc.
Variable
- Describes a characteristic in terms of a numerical value, which is expressed in units of measurement
- E.g. height, weight, blood pressure, age of patients, etc.
Qualitative Data
In such data there is no notion of the magnitude or size of the characteristic
The observations are simply categorized
The data are classified by counting the individuals having the same characteristic or attribute, not by measurement
For example: Gender: male/female
Disease: present/absent
Smoking status: smoking/not smoking
These data can be measured on nominal and ordinal scales
Quantitative Data
Anything that can be expressed as a number, quantity or magnitude
Describes characteristics in terms of a numerical value, which is expressed in units of measurement
E.g. level of hemoglobin in the blood, number of glaucoma patients, intraocular pressure, weight, etc.
These are quantitative observations, as each individual is represented by a number
These data can be measured on interval and ratio scales
Measurement Scale
The choice of an appropriate statistical technique depends upon the type of data in question
Qualitative data
• Nominal scale
• Ordinal scale
Quantitative data
• Interval scale
• Ratio scale
Nominal Scale
The least precise (crudest) of the four basic scales of measurement
Classifies items into two or more categories without any notion of extent or magnitude
No particular order is assigned to the categories
Numbers, if used, serve only as names for the categories; the frequencies in each category can be used to determine percentages and the mode
E.g. boys and girls; pass and fail; rural and urban
Ordinal Scale
The ordinal scale is a more precise scale than the nominal scale
The categories of the variable are ranked in a meaningful natural order
However, there is no information about the size of the intervals between categories
E.g. Pain: none, mild, moderate, severe
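Because ordinal categories are ranked but their spacing is unknown, order-based summaries such as the median remain meaningful. A minimal Python sketch, assuming pain is coded with a hypothetical `IntEnum` whose integer values express rank only, with made-up ratings for seven respondents:

```python
from enum import IntEnum
from statistics import median_low


class Pain(IntEnum):
    # Hypothetical coding: the integers express rank only;
    # the gaps between them have no meaning.
    NONE = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3


# Made-up ratings from seven respondents (illustrative only).
ratings = [Pain.MILD, Pain.SEVERE, Pain.NONE, Pain.MODERATE,
           Pain.MILD, Pain.MODERATE, Pain.MODERATE]

# Sorting is valid because the categories have a natural order.
ordered = sorted(ratings)

# median_low always returns an observed category instead of averaging
# two neighbouring categories, which would assume equal spacing.
middle = Pain(median_low(ratings))

print([r.name for r in ordered])
print("Median pain:", middle.name)
```

Taking the mean of these codes would silently treat the gap between "mild" and "moderate" as equal to the gap between "moderate" and "severe", which the ordinal scale does not guarantee.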
Interval Scale
The interval scale is a more precise and refined scale than the nominal and ordinal scales
It has all the characteristics and relationships of the ordinal scale; in addition, the distance between any two numbers on the scale is known
The size of the interval between two observations can be measured
Its zero point is arbitrary, so ratios of values are not meaningful
E.g. body temperature in °C or °F
Ratio Scale
It has the same properties as an interval scale, as well as a true or absolute zero value
Ratio-scale numerals have the qualities of real numbers and can be added, subtracted, multiplied or divided
E.g. mean systolic BP
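The difference between the interval and ratio scales shows up when values are divided. A small sketch with illustrative numbers: the Celsius scale has an arbitrary zero, so the ratio of two Celsius temperatures changes when the same temperatures are converted to kelvin (which has an absolute zero), while their difference does not. Systolic BP in mmHg has a true zero, so its ratios are meaningful.

```python
def celsius_to_kelvin(c):
    # Kelvin has an absolute zero; Celsius is offset from it by 273.15.
    return c + 273.15


t1_c, t2_c = 20.0, 10.0            # illustrative temperatures in degrees C
ratio_celsius = t1_c / t2_c        # 2.0, but 20 C is not "twice as hot"
ratio_kelvin = celsius_to_kelvin(t1_c) / celsius_to_kelvin(t2_c)

# Differences survive the change of units (the interval property).
diff_celsius = t1_c - t2_c
diff_kelvin = celsius_to_kelvin(t1_c) - celsius_to_kelvin(t2_c)

bp1, bp2 = 160.0, 80.0             # illustrative systolic BP in mmHg
ratio_bp = bp1 / bp2               # meaningful: true zero exists

print(f"Celsius ratio: {ratio_celsius:.3f}, kelvin ratio: {ratio_kelvin:.3f}")
print(f"Differences: {diff_celsius:.2f} vs {diff_kelvin:.2f}")
print(f"BP ratio: {ratio_bp:.1f}")
```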
Collection of Data
The process of systematically gathering data for a particular purpose from various sources, in which the data are systematically observed, recorded and organized
It is the first step of a statistical study
There are several ways of collecting data
The choice of procedures usually depends on the
objectives and design of the study and the availability of
time, money and personnel
Purpose of Data Collection
To obtain information
To keep on record
To make decisions about important issues
To pass information on to others
For research study
How Important Is It?
Data collection is an extremely important part of any
research because the conclusions of a study are
based on what the data reveal
Factors to be considered before data collection
Nature, scope & objective of the enquiry
Sources of information
Availability of funds
Techniques of data collection
Availability of trained persons
Sources of Data
• Internal and external sources
• Primary data, e.g. surveys, interviews, documents, creative works, man-made materials
• Secondary data, e.g. unpublished theses and dissertations, manuscripts, books, journals
Internal & External Sources of Data
Internal sources of data
o Many institutions and departments have information about their regular functions, for their own internal purposes
o When such information is used in a survey, it is called an internal source of data
o E.g. social welfare society
External sources of data
o When information is collected from outside agencies, it is called an external source of data
o Such data are either primary or secondary
o This type of information can be collected by census or sampling method by conducting surveys
Primary Data
Data collected by the investigator personally, e.g. through experimental studies, for a specific research goal are called primary data
The data are collected specially for a research project
Used when secondary data are unavailable or inappropriate
The data are expected to be unique, original, reliable and accurate in nature
Primary data have not been changed or altered by other people; therefore, their validity is greater than that of secondary data
Primary Data
Merits
Targeted issues are addressed
Data interpretation is better
High accuracy of data
Addresses specific research issues
Greater control
Demerits
High cost
Time consuming
More resources are required
Inaccurate feedback
Requires a lot of skill and labor
Primary Data Collection Techniques
Interview (direct/indirect)
Schedule
Questionnaire survey
Focus group discussion (FGD)
Community forums and public hearings
Observation
Case studies
Key informant interview
Internet/e-mail/SMS
Direct personal observation
The data are collected by the investigator personally, so he/she must be a keen observer
He/she asks or cross-examines the informant and collects the necessary information
It is original in character
Suitability of direct personal observation
Direct personal observation is adopted in the following cases
Where greater accuracy is needed
Where the field of enquiry is not large
Where confidential data are to be collected
Where sufficient time is available
Direct personal observation
Merits
Original data
True and reliable data
Encouraging response because of the personal approach
A high degree of accuracy
Demerits
Unsuitable for a large area
Expensive and time-consuming
An untrained investigator brings the worst results
Information may be collected according to the convenience of the informant
Indirect oral interview
The investigator approaches witnesses or third parties who are in touch with the informant
The enumerator interviews people who are directly or indirectly connected with the problem under study
This method is generally employed by enquiry committees and commissions
The police department generally adopts this method to get clues about thefts, riots, murders, etc.
Suitability of indirect oral interview
It is more suitable when the area to be studied is large
It is used when direct information cannot be obtained
This system is generally adopted by governments
Indirect oral interview
Merits
Simple and convenient
Saves time, money and labor
Useful in investigation of a large area
Adequate information can be obtained
Demerits
Information cannot be fully relied on, as there is no direct contact
Interviewing an unsuitable person will spoil the results
To get real data, a sufficient number of people must be interviewed
A careless attitude on the part of the informant affects the degree of accuracy
Information through agencies
Local agents or correspondents are appointed; they collect the information and transmit it to the office or person concerned
They collect it according to their own methods and preferences
Adopted by newspapers, agencies, etc.
The informants are generally called correspondents
Suitable where information is to be obtained at regular intervals from a wide area
Information through agencies
Merits
Extensive information can be obtained
It is the cheapest and most economical method
Speedy information is possible
It is useful where information is needed regularly
Demerits
The information may be biased
Degree of accuracy cannot be maintained
Uniformity cannot be maintained
Data may not be original
Mailed questionnaires
The questionnaire, with blank spaces for answers, is sent to the respondents
A covering letter is also sent along with the questionnaire, requesting the respondents to extend their full cooperation
Adopted by research workers, private individuals, non-official agencies and governments
Appropriate where informants are spread over a wide area
Mailed questionnaires
Merits
Of all the methods, the mailed questionnaire is the
most economical
It can be widely used, when the area of investigation is large
It saves money, labor and time
Demerits
It is not possible to be sure about the accuracy and reliability of the data
There is a long delay in receiving duly filled-in questionnaires
Data Collection Through Schedules
Very similar to the questionnaire method
The main difference is that a schedule is filled in by an enumerator who is specially appointed for the purpose
The enumerator goes to the respondents, asks them the questions on the pro forma in the order listed, and records the responses in the space provided
Enumerators must be trained in administering the schedule
Survey
A detailed study of a geographical area to gather data, attitudes, impressions, opinions, satisfaction levels, etc., by polling a section of the population
Types
• Census survey: conducted regularly at long intervals of time
• Continuous survey: conducted regularly and frequently
• Ad-hoc survey: conducted at specific times for a specific need, ‘as and when’ required
Survey
Merits
Covers a large population
Less expensive
Information is accurate
Demerits
Avoided in small-scale studies
Time consuming
Information does not penetrate deeply
The researcher must have good knowledge
Case Study
It is a method of comprehensive study of a social unit, which may be a person, a family, an institution, an organization or a community
Merits
Direct behavioral study
Real and personal experience record
Makes possible the study of social change
Increases analytical ability and skills
Demerits
One case is almost always different from another case
Personal bias
Can be used only in a limited sphere
Consumes more time and money
Focus Group Discussion
Useful for further exploring a topic, providing a broader understanding of why the target group may behave or think in a particular way
Assists in determining the reasons for attitudes and beliefs
Conducted with a small sample of the target group and used to stimulate discussion and gain greater insights
Focus Group Discussion
Merits
Useful when exploring cultural values and health beliefs
Can be used to explore complex issues
Can be used to develop hypotheses for further research
Does not require participants to be literate
Demerits
Lack of privacy/anonymity
Potential for the risk of ‘group think’
Potential for group to be dominated by one or two people
The group leader needs to be skilled at conducting focus groups, dealing with conflict and drawing out passive participants
Time consuming to conduct and analyse
Triangulation
Application and combination of several research methods in
the study of the same phenomenon
Beating the Bias
Researchers can hope to overcome the weaknesses or intrinsic biases and the problems that come from single-method, single-observer and single-theory studies
The purpose of triangulation in qualitative research is to increase the credibility and validity of the results
Types (Denzin 1978)
• Data triangulation
• Investigator triangulation
• Theory triangulation
• Methodological triangulation
Secondary Data
Secondary data are those data which have been already
collected and analysed by some earlier agency for its own
use and later the same data are used by a different agency
Sources of Secondary Data
• Published sources
• Unpublished sources
Published Sources
Various governmental, international and local
agencies publish statistical data, and chief among
them are:
International publications: e.g. UNO, WHO, Nature, etc.
Official publications of Government: Department of Drug
Administration, Central Bureau of Statistics
Semi-Official publications: Semi-Govt. institutions like
Municipal Corporation, District Board, etc. publish
reports
Publications of Research Institutions: Nepal Development Research Institute, Nepalese Journal of Ophthalmology, etc. publish the findings of their research programs
Journals and Newspapers: Current and important materials on statistics and socio-economic problems can be obtained from journals and newspapers like Swasthya Khabar Patrika, Health Today Magazine, The Sight, etc.
Unpublished Sources
Records maintained by various government and private
offices
Researches carried out by individual research scholars in
the universities or research institutes
According to Prof. Bowley, “It is never safe to take published statistics
at their face value without knowing their meaning and limitations
and it is always necessary to criticize arguments that can be based on
them.”
Precautions in the use of Secondary Data
Before using the secondary data, the investigators should
consider the following factors:
Suitability of data
Adequacy of data
Reliability of data
Secondary data must possess the following characteristics
Reliability of data – may be tested by checking:
Who collected the data?
What were the sources of the data?
Were the data collected properly?
Suitability of data
Data that are suitable for one enquiry may not necessarily be suitable for another enquiry
The objective, scope and nature of the original enquiry must be studied
Adequacy of data – data are considered inadequate if they relate to an area that is either narrower or wider than the area of the present enquiry
Primary data vs. secondary data
Primary data
o Real-time data
o Sure about the sources of data
o Help to give results/findings
o Costly and time-consuming process
o Avoid bias in response data
o More flexible
Secondary data
o Past data
o Not sure about the sources of data
o Help in refining the problem
o Cheap and not time-consuming process
o Cannot know whether the data are biased or not
o Less flexible
Assessment of Qualitative Data
Characteristics or traits to which a numerical value cannot be assigned are called qualitative data (attributes)
E.g. gender, color, honesty, etc.
Methods of collecting qualitative data
• Direct observation
• In-depth interview
• Case study
• Triangulation
• Use of secondary data
Classification of Qualitative data
• Geographical classification
• Chronological classification
• Qualitative classification
Tabulation of Qualitative Data
Qualitative data values can be organized by a
frequency distribution
A frequency distribution lists
– Each of the categories
– The frequency/counts for each category
Frequency Table
A simple data set is: cataract, cataract, keratoconus,
glaucoma, glaucoma, cataract, glaucoma, cataract
A frequency table for this qualitative data is
Eye condition Frequency
Cataract 4
Keratoconus 1
Glaucoma 3
The most commonly occurring eye condition is cataract
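The same tally can be done programmatically. A minimal Python sketch using the standard library's `collections.Counter` on the data set above; the table layout is just an illustrative choice:

```python
from collections import Counter

# The data set from the slide above.
observations = ["cataract", "cataract", "keratoconus", "glaucoma",
                "glaucoma", "cataract", "glaucoma", "cataract"]

# Counter tallies how many times each category occurs
# (categories are kept in order of first appearance).
frequency = Counter(observations)

print(f"{'Eye condition':<15}{'Frequency':>10}")
for condition, count in frequency.items():
    print(f"{condition.capitalize():<15}{count:>10}")

# The mode is the most frequent category.
mode, mode_count = frequency.most_common(1)[0]
print("Most common:", mode, mode_count)
```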
What Is A Relative Frequency?
The relative frequencies are the proportions (or
percents) of the observations out of the total
A relative frequency distribution lists
– Each of the categories
– The relative frequency for each category
Relative frequency = Frequency/Total
Relative Frequency Table
A relative frequency table for this qualitative data is
Eye condition Relative Frequency
Cataract .500 (=4/8)
Keratoconus .125 (=1/8)
Glaucoma .375 (=3/8)
A relative frequency table can also be constructed
with percents (50%, 12.5% and 37.5% for the above
table)
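Continuing with the eye-condition data set, a minimal Python sketch that applies Relative frequency = Frequency/Total and also expresses each value as a percentage:

```python
from collections import Counter

observations = ["cataract", "cataract", "keratoconus", "glaucoma",
                "glaucoma", "cataract", "glaucoma", "cataract"]
frequency = Counter(observations)
total = sum(frequency.values())

# Relative frequency = frequency / total; the percent form is the same value x 100.
relative = {cond: count / total for cond, count in frequency.items()}

for cond, rf in relative.items():
    print(f"{cond.capitalize():<12} {rf:.3f}  ({rf:.1%})")
```

A useful check is that the relative frequencies always sum to 1 (100%).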
Graphical representation of qualitative data
• Bar diagram
• Pie or sector diagram
• Line diagram
• Pictogram
• Map diagram or cartogram
Data Processing
After collection, the data have to be prepared for analysis
Collected data are raw and must undergo some processing before analysis
The results of the analysis are strongly affected by the form of the data
So proper data processing is a must to obtain reliable results
Objectives of Data Processing
Checking the questionnaires and schedules
Reducing the mass of data to manageable proportions
Summing up the material so as to prepare tables, charts, graphs and various groupings and breakdowns for presenting the results
Minimizing the errors which may creep in at various stages of the survey
Types of Data Processing
1. Manual Data Processing
Involves human intervention
Offers many chances for error, such as delays in data capture and frequent operator misprints
Implies higher labor expenses, as well as spending on equipment and supplies, rent, etc.
2. Mechanical Data Processing
Calculations and processing are performed using mechanical devices such as calculators
The use of mechanical devices makes data processing easier and less time-consuming
The chances of error are also far lower than in manual data processing
3. Electronic Data Processing
Processing of data using computers and their programs
4. Real Time Processing
There is continual input, processing and output of data
Data have to be processed within a short stipulated time period (real time)
E.g. when a bank customer withdraws a sum of money from his or her account, it is vital that the transaction be processed and the account balance updated as soon as possible
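A toy sketch of the bank example, assuming a hypothetical `Account` class (not any real banking API): each withdrawal is validated and applied the moment it arrives, so every later transaction sees the current balance.

```python
class Account:
    """Hypothetical toy account, updated as each transaction arrives."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        # Real-time processing: validate and apply the transaction at once,
        # so the next transaction always sees the up-to-date balance.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        return self.balance


acct = Account(1000)
acct.withdraw(300)           # processed immediately; balance is now 700
try:
    acct.withdraw(800)       # rejected at once because it exceeds the balance
except ValueError as err:
    print("Rejected:", err)
print("Balance:", acct.balance)
```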
5. Batch Processing
In batch processing, a group of transactions collected over a period of time is entered and processed together, and then the batch results are produced
Batch processing requires separate programs for input, processing and output
It is an efficient way of processing a high volume of data
E.g. payroll, examination and billing systems
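For contrast with real-time processing, a minimal batch sketch of a payroll run. The timesheet records, employee IDs and hourly rates are hypothetical; the commented stages stand in for the separate input, processing and output programs, collapsed here into one script for brevity.

```python
# Hypothetical timesheet records collected over one pay period: (employee, hours).
timesheets = [("E01", 40), ("E02", 35), ("E01", 8), ("E03", 20), ("E02", 5)]
hourly_rate = {"E01": 10.0, "E02": 12.0, "E03": 15.0}  # illustrative rates


def run_payroll(batch, rates):
    """Process the whole accumulated batch in a single run."""
    # Input stage: total the hours per employee across the period.
    hours = {}
    for emp, h in batch:
        hours[emp] = hours.get(emp, 0) + h
    # Processing stage: compute pay for every employee at once.
    return {emp: total * rates[emp] for emp, total in hours.items()}


# Output stage: results are produced only after the whole batch is processed.
for emp, amount in sorted(run_payroll(timesheets, hourly_rate).items()):
    print(emp, f"{amount:.2f}")
```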
Important Steps in Data Processing
The processing of data involves activities such as:
• Questionnaire checking
• Editing
• Coding
• Classification
• Tabulation
• Graphical representation
• Data cleaning
• Data adjusting
Questionnaire Checking
When data are collected through questionnaires, the first step of data processing is to check whether each questionnaire is acceptable
A questionnaire is not accepted if it:
Gives the impression that the respondent could not understand the questions
Is partially or fully incomplete
Was answered by a person who has inadequate knowledge
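The first two acceptance criteria can be partly automated. A minimal Python sketch, assuming a hypothetical list of required questions and assuming that interviewers mark a misunderstood question with "?"; whether the respondent had adequate knowledge still needs human judgment.

```python
# Hypothetical questions that every accepted questionnaire must answer.
REQUIRED = ["age", "sex", "vision_problem", "uses_spectacles"]


def check_questionnaire(responses, required=REQUIRED):
    """Return (accepted, reasons) for one returned questionnaire."""
    reasons = []
    # Incomplete: a required answer is absent or left blank.
    missing = [q for q in required if responses.get(q) in (None, "")]
    if missing:
        reasons.append("incomplete: " + ", ".join(missing))
    # Assumption: "?" marks a question the respondent did not understand.
    confused = [q for q, a in responses.items() if a == "?"]
    if confused:
        reasons.append("respondent did not understand: " + ", ".join(confused))
    return (not reasons, reasons)


ok = check_questionnaire({"age": 34, "sex": "F", "vision_problem": "yes",
                          "uses_spectacles": "no"})
bad = check_questionnaire({"age": 34, "sex": "", "vision_problem": "?"})
print(ok)
print(bad)
```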
Data Editing
Process of examining the data collected
in questionnaires/schedules
to detect errors and omissions
to correct these when possible
to make sure the schedules are ready for tabulation
Data Editing
The editor is responsible for seeing that the data are:
As accurate as possible
Consistent with other facts secured
Uniformly entered
As complete as possible
Acceptable for tabulation and arranged to facilitate coding and tabulation
Types of Editing
• Editing for quality: the data form is complete and free of bias, errors, inconsistency and dishonesty
• Editing for tabulation: modification to facilitate tabulation, e.g. ignoring extremely high/low values
• Field editing: translating or rewriting responses
• Central editing: correcting wrong entries and replacing them
Necessity of Editing
To gather information
To make data relevant and appropriate for analysis
To find errors and modify them
To ensure that the information provided is accurate
To establish the consistency of data
To determine whether or not the data are complete
To obtain the best possible data available
Coding of Data
The process of assigning numerals or other symbols to answers so that responses can be put into a limited number of categories or classes
Translating answers into numerical values or assigning numbers to the various categories of a variable to be used in data analysis
Coding is done using a codebook, code sheet and computer card
Coding is done on the basis of the instructions given in the codebook
The codebook gives a numerical code for each variable
Codebook
• A codebook contains coding instructions and the
necessary information about variables in the data set
• A codebook generally contains the following information:
- column number
- record number
- variable number
- variable name
- question number
- instructions for coding
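A minimal sketch of coding with a codebook, assuming a hypothetical three-variable codebook (variable name, question number, response codes) and an assumed code of 9 for missing or uncodable answers; the codes are illustrative, not taken from any real survey.

```python
# Hypothetical codebook: variable name -> (question number, response codes).
CODEBOOK = {
    "sex": ("Q1", {"male": 1, "female": 2}),
    "residence": ("Q2", {"rural": 1, "urban": 2}),
    "pain": ("Q3", {"none": 0, "mild": 1, "moderate": 2, "severe": 3}),
}
MISSING = 9  # assumed code for "no answer / not codable"


def code_response(record, codebook=CODEBOOK):
    """Translate one respondent's answers into numeric codes."""
    coded = {}
    for variable, (_question, codes) in codebook.items():
        # Normalize the answer so "Female" and "female" get the same code.
        answer = str(record.get(variable, "")).strip().lower()
        coded[variable] = codes.get(answer, MISSING)
    return coded


print(code_response({"sex": "Female", "residence": "rural", "pain": "Severe"}))
print(code_response({"sex": "male", "pain": "unsure"}))
```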
Necessity of Coding
To organize the data
To form a structure for coding
For interpretation of the data
To draw conclusions from the coded data
To translate answers into numerical values
To assign numbers to the various categories for data analysis
It is necessary for efficient analysis
Classification of Data
The process of arranging the primary data in a definite pattern and presenting them in a systematic way
The crude data obtained from an experiment or survey are classified according to their properties
Classification can be done qualitatively or quantitatively
Objectives of classification
The classified data is more easily understood
It presents the facts in a simpler form
It facilitates quick comparison
It helps further statistical treatment, such as computing averages, dispersion, etc.
It makes errors easier to detect
Types of classification
Qualitative classification
• Geographical classification
• Chronological classification
Quantitative classification
• Discrete classification
• Continuous classification
Qualitative classification
Geographical Classification
Data are classified by location of occurrence (i.e. area, region)
E.g. cataract patients classified district-wise
Chronological classification
Data are classified by the time of occurrence of the observations or events
The categories are arranged in chronological order
E.g. number of trachoma patients recorded from 2000 to 2010
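A minimal sketch of both kinds of classification, assuming hypothetical patient records with placeholder district names (not real data): geographical classification groups by place, and chronological classification groups by year and lists the categories in time order.

```python
from collections import Counter

# Hypothetical patient records: (district, year recorded, diagnosis).
records = [
    ("District A", 2001, "trachoma"), ("District B", 2000, "trachoma"),
    ("District A", 2000, "cataract"), ("District A", 2001, "cataract"),
    ("District B", 2002, "trachoma"), ("District C", 2001, "cataract"),
    ("District C", 2001, "trachoma"),
]

# Geographical classification: cataract patients district-wise.
by_district = Counter(d for d, _, dx in records if dx == "cataract")

# Chronological classification: trachoma patients per year, in time order.
by_year = sorted(Counter(y for _, y, dx in records if dx == "trachoma").items())

print(dict(by_district))
print(by_year)
```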
Qualitative classification (Classification according to attributes)
Data are classified according to some quality, such as religion, literacy, sex, occupation, etc.
Simple classification
Classification is made into two classes, e.g. classification by male or female
Manifold classification
Two or more attributes are studied simultaneously
E.g. classification according to sex, then by marital status, and then by literacy
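Simple and manifold classification can be seen as one-way versus multi-way tallies. A minimal sketch with hypothetical respondents: counting a single attribute gives the simple classification, while counting (sex, marital status, literacy) tuples gives every combined class of the manifold classification.

```python
from collections import Counter

# Hypothetical respondents: (sex, marital status, literacy).
people = [
    ("male", "married", "literate"), ("female", "single", "literate"),
    ("female", "married", "illiterate"), ("male", "married", "literate"),
    ("male", "single", "illiterate"), ("female", "married", "illiterate"),
]

# Simple classification: one attribute split into two classes.
simple = Counter(sex for sex, _, _ in people)

# Manifold classification: each combination of the three attributes is a class.
manifold = Counter(people)

print(dict(simple))
for (sex, marital, literacy), n in sorted(manifold.items()):
    print(f"{sex:<7}{marital:<9}{literacy:<11}{n}")
```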
Tabulation
The process of systematically organizing and recording a long series of data in rows and columns for further analysis and interpretation
It is a concise, logical and orderly arrangement of data in columns and rows
Usefulness of Tabulation
It presents an overall view of findings in a simpler way
It helps to identify trends
It displays relationships in a comparable way between
parts of the findings
It conserves space and reduces explanatory and
descriptive statement to a minimum
It facilitates the process of comparison
It provides a basis for various statistical computations
Graphical Representation
Graphs help to understand the data easily
“A single picture is worth a thousand words”, so goes a common saying
Even people who are not statistically minded can easily understand and compare the data
The most common graphs are bar charts and pie charts for qualitative data and histograms for quantitative data
Graphical Representation
Advantages
It is easier to read
Can show the relationship between two or more sets of observations at a glance
Universally applicable
Has high communication power
Simplifies complex data
Has a more lasting effect on the brain
Presentation of Qualitative data
1. Bar Diagram
• Consists of equally spaced vertical (or horizontal) rectangular bars of equal width placed on a common horizontal (or vertical) baseline
• The categories are placed on the X-axis and their frequencies on the Y-axis
Figure: Simple, multiple and component bar diagrams, e.g. "Health Program at IOM" (number of students in the BPH, MBBS, B.Optom and B.Pharma programs)
2. Pie Chart
• Circular diagram divided into segments, each segment representing the frequency of a category
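The angle of each sector follows from its relative frequency: angle = (frequency / total) × 360°. A small sketch reusing the eye-condition data set from the frequency table earlier in this presentation; plotting libraries such as matplotlib (`matplotlib.pyplot.pie`) perform this computation internally.

```python
from collections import Counter

observations = ["cataract", "cataract", "keratoconus", "glaucoma",
                "glaucoma", "cataract", "glaucoma", "cataract"]
frequency = Counter(observations)
total = sum(frequency.values())

# Each sector's angle is proportional to its category's frequency.
angles = {cond: count / total * 360 for cond, count in frequency.items()}

for cond, angle in angles.items():
    print(f"{cond:<12}{angle:6.1f} degrees")
```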
Figure: Line diagram, pictogram (e.g. yearly production of health manpower) and cartogram
Presentation of Quantitative Data
1. Histogram
• Graphical representation of a set of contiguously drawn bars
• The most popular graph for a continuous variable
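To draw a histogram, a continuous variable is first grouped into contiguous class intervals of equal width and counted. A minimal sketch, assuming hypothetical children's weights and a class width of 3 kg; each interval includes its lower limit and excludes its upper limit, and text bars stand in for the plotted rectangles.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in contiguous intervals [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        # Values outside the chosen range are ignored in this sketch.
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


# Hypothetical weights of children in kg (illustrative only).
weights = [12.5, 14.0, 15.2, 16.8, 17.1, 18.9, 19.5, 21.0, 22.4, 15.9]
counts = histogram_counts(weights, start=12, width=3, n_bins=4)

for k, c in enumerate(counts):
    lo = 12 + k * 3
    print(f"{lo:>2}-{lo + 3:<3} kg | {'#' * c}")
```

Because adjacent intervals share a boundary, the bars touch, which is what distinguishes a histogram from a bar diagram.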
Figure: Frequency curve, frequency polygon, scatter diagram, time plot, stem-and-leaf display and box-and-whisker plot
Data Cleaning
Includes consistency checks and treatment of missing responses
Although preliminary consistency checks are made during editing, the checks at this stage are more thorough and extensive because they are made by computer
Computer packages like SPSS, SAS, Excel and Minitab can be programmed to identify out-of-range values for each variable
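SPSS, SAS, Excel and Minitab each have their own facilities for this; the sketch below is not their syntax but a plain-Python illustration of the same idea, assuming hypothetical valid ranges per variable and an assumed missing-value code of 9.

```python
# Hypothetical valid values for each variable (as a codebook might specify).
VALID = {
    "age": range(0, 121),      # years
    "sex": {1, 2},             # 1 = male, 2 = female
    "pain": {0, 1, 2, 3},      # ordinal pain code
}
MISSING = 9  # assumed code for a missing response


def find_problems(rows, valid=VALID, missing=MISSING):
    """Return (row index, variable, value, problem) for each suspect entry."""
    problems = []
    for i, row in enumerate(rows):
        for var, allowed in valid.items():
            value = row.get(var)
            # Treat 9 as missing only where 9 is not itself a valid value.
            if value is None or (value == missing and missing not in allowed):
                problems.append((i, var, value, "missing"))
            elif value not in allowed:
                problems.append((i, var, value, "out of range"))
    return problems


rows = [
    {"age": 34, "sex": 1, "pain": 2},
    {"age": 150, "sex": 2, "pain": 1},
    {"age": 27, "sex": 9, "pain": 5},
]
for problem in find_problems(rows):
    print(problem)
```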
Data Adjusting
If any correction needs to be made for the statistical analysis, the data are adjusted accordingly
Data adjusting is not always necessary, but it can sometimes improve the quality of the analysis
Data Analysis