DATA COLLECTION AND PROCESSING
UNIT 3
DATA COLLECTION
Data collection is defined as the procedure of collecting, measuring and
analyzing accurate insights for research using standard validated techniques.
A researcher can evaluate their hypothesis on the basis of collected data.
In most cases, data collection is the primary and most important step for
research, irrespective of the field of research. The approach of data collection is
different for different fields of study, depending on the required information.
TYPES OF DATA
• PRIMARY DATA
• SECONDARY DATA
PRIMARY DATA
Primary data is data that researchers collect directly from original
sources through interviews, surveys, experiments, etc. Because it comes
straight from the source where the data originates, it is regarded as
the best kind of data in research.
• Example: An organization doing market research about a new product
(say, a phone) it is about to release will need to collect data like
purchasing power, feature preferences, daily phone usage, etc. from the
target market. The data from past surveys are not used because the
product differs.
FEATURES OF PRIMARY DATA
FIRST HAND INFORMATION
TIME CONSUMING
EXPENSIVE
PAPER WORK AND DOCUMENTATION
VARIOUS METHODS
AVAILABILITY
RELIABILITY
ACCURACY
FACTORS INFLUENCING PRIMARY DATA COLLECTION
• OBJECTIVES OF RESEARCH
• TIME
• COST
• AVAILABILITY OF RESEARCH STAFF
• AVAILABILITY OF RESPONDENTS
METHODS OF COLLECTING PRIMARY DATA
• OBSERVATION METHOD
• EXPERIMENTATION METHOD
• SURVEY METHOD
• INTERVIEW METHOD
OBSERVATION METHOD
• Observation, as the name implies, is a way of collecting data through
observing. The observation data collection method is classified as a
participatory study, because the researcher has to immerse herself in
the setting where her respondents are, while taking notes and/or
recording.
Advantages
• Simplest way
• Useful for framing hypotheses
• Greater accuracy
• Universal method
• Useful where verbal information fails
• Independent of people’s willingness to give information
Disadvantages
• Observation requires a great deal of time
• It is expensive, as staff must be trained to make and record the
observations
• It may not give complete information
• Observation may be judgemental
Types of Observation
• Structured and Unstructured
• Disguised and Undisguised
• Mechanical
Experimentation Method
• The experimental method involves the manipulation of variables to
establish cause and effect relationships. The key features are
controlled methods and the random allocation of participants into
control and experimental groups.
•An experiment is an investigation in which a
hypothesis is scientifically tested. In an experiment,
an independent variable (the cause) is manipulated
and the dependent variable (the effect) is measured;
any extraneous variables are controlled.
Advantages
• It provides first-hand information
• It gives reliable and relevant information
• It helps in developing new techniques and methods
Disadvantages
• Expensive
• Time consuming
• Unsuccessful or delayed result
Types of Experimentation
• Field Experiment
• Lab Experiment
• Natural Experiment
Interview Method
• An interview is generally a qualitative research technique which
involves asking open-ended questions to converse with respondents and
elicit data about a subject.
• Interviews are conducted with a
sample from a population and the key
characteristic they exhibit is their
conversational tone.
• Focused Group Interview
• In-depth Interview
Advantages
• RELIABILITY
• DETAILED INFORMATION
• HELPS IN HYPOTHESIS FORMULATION
• FLEXIBILITY
• PERSONAL TOUCH
Disadvantages
• Time consuming
• Expensive
• Documentation and Paperwork
• Respondent and interviewer bias
• Sampling problem
Types of Interview
Personal Interview; Focused Group Interview
• Line of thought of the interviewer
• Formal and Informal
• Structured and Unstructured
• Individual and Group
• General or specific interview
Survey Method
A survey is a research method used
for collecting data from a predefined
group of respondents to gain
information and insights into various
topics of interest. Surveys can have
multiple purposes, and researchers
can conduct them in many ways
depending on the methodology chosen
and the study's goal.
Types
• Telephone
• Mail
• Internet
Schedules
The schedule is a formalized
set of questions, statements,
and spaces for answers,
provided to the enumerators
who ask questions to the
respondents and note down
the answers. While
a questionnaire is filled by the
informants themselves,
enumerators fill
the schedule on behalf of the
respondent.
Purpose
• To provide a standardized tool for observation
• To act as a memory tickler
• To facilitate the work of tabulation and analysis
Types of Schedules
• Rating Schedule
• Documents Schedule
• Survey Schedule
• Observation Schedule
Framing of a Schedule
• Structured or Unstructured
• Study all aspects of problem
• Clarity
• Sequencing of Questions
• Pre-testing of Schedule
• Division of Schedule
• Appropriate form of questionnaires
Features
• Personal Contact
• Identity of Respondents
• Biasedness
• Nature of Respondents
• Use of Computers
• Time
• Response Rate
• Area of Coverage
• Cost
Questionnaire
• A questionnaire is a research instrument that consists of a set of
questions or other types of prompts that aim to collect information from
a respondent. A research questionnaire is typically a mix of close-ended
questions and open-ended questions. Open-ended, long-form questions
offer the respondent the ability to elaborate on their thoughts.
Research questionnaires were developed in 1838 by the Statistical
Society of London.
• The data collected from a data collection questionnaire can be both
qualitative as well as quantitative in nature. A questionnaire may or
may not be delivered in the form of a survey, but a survey always
consists of a questionnaire.
Importance of Questionnaire
• It collects the viewpoints of people
• More data can be collected
• It gives a summary of the demographic situation
• Less time consuming
• It helps study behaviour
• It collects sensitive information
• It creates a database
Essentials of Good Questionnaire
Relevant Questions
Clarity
Restricted no. of Questions
Type of Questions: Open and Close ended
Sequence of Questions
Pilot Study
Advantages of Primary Data
• Data collected is up-to-date.
• Relevant and specific to your research objectives.
• Primary research can deliver ‘trade secrets’.
• Competitors have no access to your data, giving you a competitive
edge.
• Findings can be applied to the entire market.
• It’s possible to conduct low-level market research, such as an online
survey, cheaply and easily.
Limitations of Primary Data
• It can be expensive.
• It is time-consuming and can take a long time to complete if it
involves face-to-face contact with customers.
• It requires some prior information about the subject, and ideally
market research skills, to get the best results.
• Attracting enough customers to take part in your survey, especially
when doing it yourself, can be challenging.
Secondary Research/Data
Secondary research means research that has previously been undertaken,
usually by another business or organisation, but is publicly available,
either for free (such as government statistics) or paid-for (such as a
research paper by an organization).
Secondary research is characterized by:
• Gathering previously researched information;
• Being based on already analysed and interpreted information and data;
• Use of data that has been collected by someone other than the
researcher;
• Same data being available to both you and your competitors;
• Immediate data availability;
• Being fast and easy, ideal for gaining a broad understanding of a
market quickly and cheaply.
Sources of Secondary Data
Advantages of Secondary Data
• Supplement Primary Data
• It could be less expensive.
• Quick Decision
• Less Time consuming
• Less Processing of Data
• No Sampling Errors
• Large volume of Data
Limitations of Secondary Data
• Problem of Accuracy and Reliability
• Problem of Adequacy
• Lack of In-depth information
• Lack of potential in handling specific problem
• Problem of Biased information
• May involve huge cost
TASK
Write down the difference between Primary data and Secondary data.
Sampling
Significance of Sampling
• TIME SAVING
• LESS COMPLEX
• DETAILED INFORMATION CAN BE COLLECTED
• CONVENIENT TO RESEARCHER
• ECONOMICAL
• SUITABLE FOR ACADEMIC AND MARKET-BASED RESEARCH
• QUALITY RESEARCH WORK
Methods of Sampling
Probability Method
Probability sampling is a sampling technique where a researcher sets a
few selection criteria and chooses members of a population randomly.
With this selection parameter, all the members have an equal opportunity
to be a part of the sample.
Non-Probability Method
In non-probability sampling, the researcher chooses members for research
by judgement or convenience rather than by random selection. This
sampling method is not a fixed or predefined selection process, which
makes it difficult for all elements of a population to have equal
opportunities to be included in a sample.
Simple Random Sampling
• It is a reliable method of obtaining information where every single
member of a population is chosen randomly, merely by chance. Each
individual has the same probability of being chosen to be a part of a
sample.
For example, in an organization of 500 employees, if the HR team decides
on conducting team building activities, it is highly likely that they
would prefer picking chits out of a bowl. In this case, each of the 500
employees has an equal opportunity of being selected.
• Lottery Method
• Random Tables
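The lottery method can be sketched in a few lines of Python. This is only an illustration: the employee IDs, the sample size of 20 and the fixed seed are assumptions made for the example, not part of the slide.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct members; every member is equally likely to be picked."""
    rng = random.Random(seed)          # a seed makes the draw repeatable
    return rng.sample(population, k)   # sampling without replacement


# Hypothetical staff list for the 500-employee example.
employees = [f"EMP{i:03d}" for i in range(1, 501)]
chosen = simple_random_sample(employees, 20, seed=42)
print(len(chosen), "employees drawn, e.g.", chosen[:3])
```

Drawing without replacement mirrors picking chits out of a bowl: once a chit is drawn it cannot be drawn again.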
Systematic Sampling
• Researchers use the systematic sampling method to choose the sample
members of a population at regular intervals. It requires the selection
of a starting point for the sample and a sample size that can be
repeated at regular intervals. This type of sampling method has a
predefined range, and hence this sampling technique is the least
time-consuming.
For example, a researcher intends to collect a systematic sample of 500
people in a population of 5000. He/she numbers each element of the
population from 1-5000 and will choose every 10th individual to be a
part of the sample (Total population / Sample size = 5000/500 = 10).
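The 5000/500 example can be reproduced with a short Python sketch. The function name and the random choice of starting point within the first interval are assumptions for illustration; the slide only fixes the interval of 10.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Pick every k-th member, where k = population size // sample size."""
    interval = len(population) // sample_size   # 5000 // 500 = 10
    rng = random.Random(seed)
    start = rng.randrange(interval)             # starting point in the first interval
    return population[start::interval][:sample_size]


population = list(range(1, 5001))               # members numbered 1-5000
sample = systematic_sample(population, 500, seed=1)
print(len(sample), "members, first five:", sample[:5])
```

Only the starting point is random; every later member follows from the fixed interval, which is why the method is quick to carry out.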
Cluster Sampling
• Cluster sampling is a method where the researchers divide the entire
population into sections or clusters that represent a population.
Clusters are identified and included in a sample based on demographic
parameters like age, sex, location, etc. This makes it very simple for a
survey creator to derive effective inference from the feedback.
• For example, if the United States government wishes to evaluate the
number of immigrants living in the US, they can divide it into clusters
based on states such as California, Texas, Florida, Massachusetts,
Colorado, Hawaii, etc. This way of conducting a survey will be more
effective as the results will be organized into states and provide
insightful immigration data.
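A minimal sketch of one-stage cluster sampling, assuming clusters are chosen at random and every member of a chosen cluster is surveyed. The resident records below are made up for illustration and carry no real data.

```python
import random
from collections import defaultdict


def cluster_sample(records, cluster_key, n_clusters, seed=None):
    """Group records into clusters, pick whole clusters at random, keep all their members."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec[cluster_key]].append(rec)
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [rec for name in chosen for rec in clusters[name]]


# Hypothetical records: 60 residents spread evenly over six states.
STATES = ["California", "Texas", "Florida", "Massachusetts", "Colorado", "Hawaii"]
records = [{"id": i, "state": STATES[i % len(STATES)]} for i in range(60)]

surveyed = cluster_sample(records, "state", n_clusters=2, seed=7)
print(sorted({rec["state"] for rec in surveyed}), len(surveyed))
```

The unit of selection is the cluster (here a state), not the individual, which is what distinguishes this from simple random sampling.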
Stratified Sampling
Stratified random sampling is a method in which the researcher divides
the population into smaller groups that don’t overlap but represent the
entire population. While sampling, these groups can be organized and a
sample then drawn from each group separately.
For example, a researcher looking to analyze the characteristics of
people belonging to different annual income divisions will create strata
(groups) according to the annual family income, e.g. less than
Rs. 2,00,000, Rs. 2,00,000 to Rs. 4,00,000, Rs. 4,00,000 to Rs. 6,00,000,
Rs. 6,00,000 to Rs. 8,00,000, etc. By doing this, the researcher
concludes the characteristics of people belonging to different income
groups. Marketers can analyze which income groups to target and which
ones to eliminate to create a roadmap that would bear fruitful results.
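The income example can be sketched as follows. The bracket boundaries come from the slide; treating each lower bound as inclusive, drawing an equal number from every stratum, and the generated incomes are all assumptions made for the illustration.

```python
import random
from collections import defaultdict

# Upper bounds (exclusive) of the income strata from the example, in rupees.
BRACKETS = [
    (200_000, "below 2,00,000"),
    (400_000, "2,00,000 to 4,00,000"),
    (600_000, "4,00,000 to 6,00,000"),
    (800_000, "6,00,000 to 8,00,000"),
]


def income_stratum(income):
    """Return the label of the non-overlapping stratum an income falls into."""
    for upper, label in BRACKETS:
        if income < upper:
            return label
    return "8,00,000 and above"


def stratified_sample(values, stratum_of, per_stratum, seed=None):
    """Split values into strata, then draw a random sample from each stratum separately."""
    strata = defaultdict(list)
    for value in values:
        strata[stratum_of(value)].append(value)
    rng = random.Random(seed)
    return {label: rng.sample(members, min(per_stratum, len(members)))
            for label, members in strata.items()}


# Hypothetical annual family incomes.
incomes = [50_000 + 7_500 * i for i in range(120)]
by_stratum = stratified_sample(incomes, income_stratum, per_stratum=5, seed=3)
for label, members in by_stratum.items():
    print(label, sorted(members))
```

Because each stratum is sampled on its own, every income group is guaranteed to appear in the final sample.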
Uses of probability sampling
• Reduce Sample Bias
• Diverse Population
• Create an Accurate Sample
Convenience Sampling
• Convenience sampling is data collection that depends on ease of access
to data. This non-probability sampling method is used when there are
time and cost limitations in collecting feedback. It is used in
situations where there are resource limitations, such as the initial
stages of research.
For example, startups and NGOs usually conduct
convenience sampling at a mall to distribute leaflets of
upcoming events or promotion of a cause – they do that by
standing at the mall entrance and giving out pamphlets
randomly.
Judgmental Sampling
• Judgmental or purposive samples are formed at the discretion of the
researcher. Researchers purely consider the purpose of the study, along
with the understanding of the target audience. For instance, suppose
researchers want to understand the thought process of people interested
in studying for their master’s degree. The selection criterion will be:
“Are you interested in doing your masters in …?” and those who respond
with a “No” are excluded from the sample.
Accidental Sampling
• Accidental sampling (sometimes known as grab, convenience, or
opportunity sampling) is a type of non-probability sampling in which the
sample is drawn from that part of the population which is close to hand.
That is, the sample is selected because it is readily available and
convenient.
Quota Sampling
• In quota sampling, the selection of members happens based on a pre-set
standard. Because the sample is formed based on specific attributes, the
created sample will have the same qualities found in the total
population. It is a rapid method of collecting samples.
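One way to picture quota sampling is to accept respondents in the order they turn up until each group's quota is full. The sketch below assumes a quota of two per gender and a made-up arrival list; neither comes from the slide.

```python
def quota_sample(respondents, group_of, quotas):
    """Accept respondents in arrival order until every group's quota is filled."""
    remaining = dict(quotas)
    selected = []
    for person in respondents:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            selected.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):   # all quotas met, stop collecting
            break
    return selected


# Hypothetical arrival order of respondents as (name, gender code).
arrivals = [("R1", "F"), ("R2", "M"), ("R3", "F"), ("R4", "F"),
            ("R5", "M"), ("R6", "M"), ("R7", "F")]
picked = quota_sample(arrivals, group_of=lambda p: p[1], quotas={"F": 2, "M": 2})
print(picked)  # R4 is skipped because the quota for "F" is already full
```

No random draw is involved, which is why the method is fast but remains a non-probability technique.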
Snowball Sampling
Snowball sampling is a sampling method that researchers
apply when the subjects are difficult to trace. For example, it
will be extremely challenging to survey shelterless people or
illegal immigrants. In such cases, using the snowball theory,
researchers can track a few categories to interview and derive
results. Researchers also implement this sampling method in
situations where the topic is highly sensitive and not openly
discussed, for example, surveys to gather information about
HIV/AIDS. Not many victims will readily respond to the
questions. Still, researchers can contact people they might
know or volunteers associated with the cause to get in touch
with the victims and collect information.
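The referral chain behind snowball sampling can be sketched as a walk over "who introduced whom". The referral map, the single starting contact and the cap of five respondents are hypothetical values chosen for the example.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Start from known contacts and follow their referrals until max_size is reached."""
    selected = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(selected) < max_size:
        person = queue.popleft()
        selected.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:       # do not approach the same person twice
                seen.add(contact)
                queue.append(contact)
    return selected


# Hypothetical referrals: each respondent names further hard-to-reach contacts.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["F"]}
respondents = snowball_sample(referrals, seeds=["A"], max_size=5)
print(respondents)  # ['A', 'B', 'C', 'D', 'E']
```

The sample grows through existing respondents, so it depends heavily on who the first contacts are.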
Uses of non-probability sampling
• Create a hypothesis
• Exploratory research
• Budget and time constraints
Task
Difference between probability
sampling and non-probability
sampling methods.
Factors determining Sample Size
• Area of Research
• Availability of Funds
• Availability of Manpower
• Nature of Research
• Time Frame
• Method of Sampling
• Method of Data Collection
• Judgement of the Researcher
• Accuracy
Good Sampling
• Sample that provides correct and quality information
• Goal oriented
• Simple and practical
• Random selection and variability of data
• Suitability
Data Processing
Data processing is the method
of collecting raw data and
translating it into usable
information. It is usually
performed in a step-by-step
process by a team of data
scientists and data
engineers in an organization.
The raw data is collected,
filtered, sorted, processed,
analyzed, stored and then
presented in a readable format.
Stages in Data Processing
• Editing
• Coding
• Classification
• Tabulation
• Graphic Presentation
Editing of data
Editing is the first step of data processing. Editing is the process of
examining the data collected through questionnaires or any other method.
It starts after all the data has been collected, to check the data and
reform it into useful data.
Coding of data
Coding is the process of categorizing data according to the research
subject or topic and the design of the research. In the coding process,
the researcher sets a code for a particular thing, like Male - M,
Female - F, to indicate gender in the questionnaire without writing the
full spelling. In the same way, the researcher can use colors to
highlight something, or numbers like 1+, 1-. This type of coding makes
it easy to calculate or evaluate results in tabulation.
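A small Python sketch of the Male - M, Female - F coding. The codebook follows the slide; the raw answers, the clean-up of spacing and capitalisation, and the "NA" code for answers outside the codebook are assumptions added for the example.

```python
# Codebook from the example: full answer -> short code.
GENDER_CODES = {"male": "M", "female": "F"}


def code_response(value, codebook, missing="NA"):
    """Translate a raw answer into its code, ignoring case and stray spaces."""
    return codebook.get(value.strip().lower(), missing)


# Hypothetical raw answers as they might be written on questionnaires.
raw_answers = ["Male", "female ", "FEMALE", "male", "prefer not to say"]
coded = [code_response(answer, GENDER_CODES) for answer in raw_answers]
print(coded)  # ['M', 'F', 'F', 'M', 'NA']
```

Keeping the codebook in one place means every questionnaire is coded the same way before tabulation.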
Classification of Data
Classification or categorization is the process of grouping the
statistical data under various understandable homogeneous groups for the
purpose of convenient interpretation. A uniformity of attributes is the
basic criterion for classification; and the grouping of data is made
according to similarity. Classification becomes necessary when there is
diversity in the data collected, for meaningful presentation and
analysis. However, it is meaningless in respect of homogeneous data. A
good classification should have the characteristics of clarity,
homogeneity, equality of scale, purposefulness and accuracy.
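As an illustration of grouping by similarity with equality of scale, the sketch below classifies ages into class intervals of equal width. The ages, the width of 10 and the starting point of 10 are hypothetical values.

```python
def classify(values, width, start=0):
    """Group numeric values into equal-width class intervals [lower, upper)."""
    groups = {}
    for value in values:
        lower = start + ((value - start) // width) * width
        groups.setdefault((lower, lower + width), []).append(value)
    return dict(sorted(groups.items()))


# Hypothetical respondent ages.
ages = [18, 22, 25, 31, 37, 40, 44, 52]
classes = classify(ages, width=10, start=10)
for (lower, upper), members in classes.items():
    print(f"{lower}-{upper - 1}: {members}")
```

Every interval has the same width, so the classes do not overlap and each value falls into exactly one of them.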
Tabulation of data
Tabulation is the process of summarizing raw data and displaying it in
compact form for further analysis. Therefore, preparing tables is a very
important step. The researcher can tabulate by hand or in digital mode.
The choice is made largely based on the size and type of study,
alternative costs, time pressures, and the availability of computers and
computer programmes. If the number of questionnaires is small, and their
length short, hand tabulation is quite satisfactory.
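For tabulation in digital mode, a frequency table is the simplest case. This sketch assumes the answers are already coded; the list of codes is invented for the example.

```python
from collections import Counter


def frequency_table(values):
    """Summarize raw values as (value, count, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, n, round(100 * n / total, 1))
            for value, n in counts.most_common()]


# Hypothetical coded answers from five questionnaires.
coded_answers = ["M", "F", "F", "M", "F"]
table = frequency_table(coded_answers)

print(f"{'Code':<6}{'Count':>6}{'Percent':>9}")
for value, n, pct in table:
    print(f"{value:<6}{n:>6}{pct:>9}")
```

The same counts could be tallied by hand for a handful of questionnaires; the benefit of a program grows with the number of responses.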
Graphical Representation
Diagrams are charts and graphs used to present data. They attract the
reader’s attention and help present data more effectively. Creative
presentation of data is possible. The data diagrams are classified into:
• Pie Chart
• Bar Graphs
• Line Graphs
• Gantt Charts
• Histograms
Gantt Chart