Data analysis
Data Preparation:
An Introduction to Getting Data Ready for
Analysis
Data Preparation Overview
Getting Started
Organizing Data
Cleaning Data
Data Entry Verification
Checking for Errors and Consistency
Formatting for Analysis: Data Transformations
Getting Started
Bridging the Gap
Analysi
Research s
Methodolo
gy
What Is Data Preparation?
Creating and preparing a dataset to be
analyzed
Data can come from one or more data
sources
Survey data (phone, web, mail)
Data tracked by your organization (internal
reports)
Customer/client databases
Program data
Quality control data
Coded interviews
What Is Data Preparation?
Before you start analysis, your data should
be:
Organized
Consistently recorded
Error-free
Formatted for analysis
Allow for Ample Time
Data Preparation can take up 50% or more
of the time you dedicate to analysis
Rushing/skipping data preparation
Data errors
Low confidence
Starting analysis over
Document, Document, Document!
Data collection and compilation process
should be replicable
Where did you get the data?
How did you obtain it?
Document problems
Collecting
Recording
Extracting
Document, Document, Document!
Methodology report
Survey/Instrument/Experiment/Process Design
Sampling procedures (if applicable)
Response rates (if applicable)
Limitations
Compile Your Data
Start by figuring out all of the data
components you will need for analysis.
What are the sources?
Do you have access to all data that you
need?
How do you get access to data that you need?
Track contact/retrieval information and date
expectations
How much time do you need to build into your
timeline?
Create Codes
Code = A number or set of letters that
stands for something else
Question codes provide a way to
reference a question
Response codes provide a way to easily
record results or answers
Create Codes
Create Codes
Create a Codebook
Enter Data Into a Database
Two Methods:
Export from existing database
SurveyMonkey Excel
Data entry
Paper survey data Excel or SPSS
Create a codebook
Verification
Data Entry
Enter information into your database one
record at time
Use your codebook to determine what you
should enter into your database
Statistics program (i.e., SPSS)
Enter answer codes for data and define the codes
in the program (i.e., Male = 1; Female = 2)
Excel or other spreadsheet program
Enter answer labels into spreadsheet (i.e., Male
or Female)
Data Entry Considerations
Create a unique ID for each record
Illegible handwriting or unclear
markings
Missing data
How much can you tolerate?
Key questions?
Data Cleaning
Data Entry Validation
GIGO: Garbage In, Garbage Out
Double user verification
Double entry verification
Enter exact same data into two Excel tabs
Use a formula on a third tab to check the first
two tabs
Dataentry software / SPSS (using
Compare Datasets)
Checking the Data: Ranges
Are all answers within the accepted
minimum and maximum values?
Age values of 150 or 16
Value of “7” on a 5-point scale
Checking the Data: Data Types
Is data formatted in the correct way?
Age entered as thirty instead of 30
Date entered 29-10-2015 instead of 10-29-
2015
Checking the Data: Data length
Do all of your data entries have the
correct number of digits or letters?
Examples
Zip code with four digits instead of five or
nine
Phone number with 11 digits instead of 10
Checking the Data: Fixing Errors
Can the data be clarified with
assistance?
Trace data back to point of origin
Review original
data/database/instrument/source
Ifsomeone answered a survey, can you
contact that person for clarification?
Checking the Data: Fixing Errors
Is it reasonably correctable on your own?
If valid values are 1 – 5 and you have “11”,
entering a “1” might be considered reasonable
If the valid values are between 0 and 100 and
you have “232”, you can not make a reasonable
determination between 23 and 32.
Do not guess or choose a value at
random; make it a missing value
Checking the Data: Missing Data
Data can be missing for a variety of
reasons:
Unanswered questions (forgot/declined to
answer, illegible handwriting)
Data point was not applicable for a portion
of records
Errors in recording data
Manually removed because you lacked
confidence in the data
Checking the Data: Missing Data
Create a unique code for missing values
Missing responses: -9998
Not applicable fields: -9999
Don’t know responses: 8888, 888, 88
Declined to answer responses: 9999, 999, 99
Checking the Data: Missing Data
Statistical programs can remove your
missing data from analysis
Statistical Analysis in Excel
Can do statistical analysis in Excel
Excel 2010 and 2013 (Windows):
File Options Add-Ins
New Data Analysis option appears under the Data tab
Statistical Analysis in Excel
Data Transformations in
SPSS
Questions?
Thank you!
TIMOTH Y SERVIN SKY
PROJECT MANAGER
C EN TER FOR SURVEY R ESEAR C H
T J S 3 2 @ P S U. E D U
717-948-4312
H T T P S : / /C S R . H B G . P S U. E D U