8/1/2019
BASIC COURSE IN BIO-MEDICAL RESEARCH
nie.gov.in
Data management includes
• Define variables
• Create study database and data dictionary
• Enter data and correct errors
• Create dataset for analysis
• Back up and archive the dataset
Key elements of data management
• Data structure
• Data entry
• Individual and aggregated databases
• Mother and daughter databases
Basic structure of a database
• Rows represent records
• Columns represent variables
Identifier Variable 1 Variable 2 Variable 3 Variable 4 Etc…
Record 1
Record 2
Record 3
Etc…
Data documentation
• Structure
• Name, number of records, etc.
• Variables
• Name, values, coding
• History
• Creation, modification
• Storage information
• Media, location, back up
• Additional information
• Unique
• Maintained by a computerized index
• Secured by quality assurance procedures
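The three properties above can be sketched in Python. This assumes identifiers are held in memory. `IdentifierIndex` and `find_duplicate_ids` are hypothetical names, not part of any package. The index refuses a duplicate at entry time, and the quality-assurance check reports any identifier that appears more than once in data already entered.

```python
from collections import Counter


class IdentifierIndex:
    """Hypothetical computerized index that refuses duplicate identifiers."""

    def __init__(self):
        self._records = {}

    def add(self, identifier, record):
        # Reject an identifier that is already in the index
        if identifier in self._records:
            raise ValueError(f"Duplicate identifier: {identifier}")
        self._records[identifier] = record

    def get(self, identifier):
        return self._records[identifier]


def find_duplicate_ids(records, id_field="ID"):
    """Quality-assurance check: identifiers that occur more than once, sorted."""
    counts = Counter(record[id_field] for record in records)
    return sorted(identifier for identifier, n in counts.items() if n > 1)


# Usage: ID 2 was keyed twice by mistake
entered = [{"ID": 1, "Place": "A"}, {"ID": 2, "Place": "B"}, {"ID": 2, "Place": "C"}]
print(find_duplicate_ids(entered))  # [2]
```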
Using codes within the unique identifier
• A unique identifier may contain all the information about the unit it identifies
• Each digit or set of digits refers to specific information
• Example (7-digit identifier):
• First and second digits: village
• Third and fourth digits: street
• Fifth digit: house
• Sixth and seventh digits: person
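The digit layout in the example (village, street, house, person) can be composed and decoded in a few lines. This is a minimal sketch that assumes exactly that 2-2-1-2 layout. `build_identifier` and `parse_identifier` are hypothetical helper names.

```python
def build_identifier(village, street, house, person):
    """Compose a 7-digit ID: 2 digits village, 2 street, 1 house, 2 person."""
    if not (0 <= village <= 99 and 0 <= street <= 99 and 0 <= house <= 9 and 0 <= person <= 99):
        raise ValueError("A component does not fit in its digits")
    # Zero-padding keeps every component in its fixed position
    return f"{village:02d}{street:02d}{house:d}{person:02d}"


def parse_identifier(identifier):
    """Split a 7-digit ID string back into its coded components."""
    if len(identifier) != 7 or not identifier.isdigit():
        raise ValueError(f"Expected 7 digits, got {identifier!r}")
    return {
        "village": identifier[0:2],
        "street": identifier[2:4],
        "house": identifier[4],
        "person": identifier[5:7],
    }


person_id = build_identifier(3, 12, 4, 1)
print(person_id)                    # 0312401
print(parse_identifier(person_id))  # {'village': '03', 'street': '12', 'house': '4', 'person': '01'}
```

The identifier is kept as text rather than as a number. Stored as a number, the leading zero of village "03" would be lost and every later digit would shift position.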
Structure of the variables in the database
• Integer
• Specify the number of digits
• Numeric
• Specify the number of decimals
• Alpha-numeric
• Specify length
• Turn all letters to capitals
• Date
• Specify the format (e.g., DD/MM/YYYY)
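Each variable type above implies a simple entry-time check. The sketch below assumes values arrive as typed text, and it assumes DD/MM/YYYY as the agreed date format. The function names are hypothetical. The numeric check requires exactly the specified number of decimals.

```python
import re
from datetime import datetime


def check_integer(text, max_digits):
    """Integer: optional minus sign and at most `max_digits` digits."""
    return re.fullmatch(rf"-?\d{{1,{max_digits}}}", text) is not None


def check_numeric(text, decimals):
    """Numeric: exactly `decimals` digits after the decimal point."""
    pattern = r"-?\d+" if decimals == 0 else rf"-?\d+\.\d{{{decimals}}}"
    return re.fullmatch(pattern, text) is not None


def clean_alphanumeric(text, length):
    """Alpha-numeric: trim, convert to capitals and enforce the maximum length."""
    value = text.strip().upper()
    if len(value) > length:
        raise ValueError(f"{value!r} is longer than {length} characters")
    return value


def check_date(text, date_format="%d/%m/%Y"):
    """Date: must match the agreed format (DD/MM/YYYY assumed here)."""
    try:
        datetime.strptime(text, date_format)
        return True
    except ValueError:
        return False


print(check_integer("42", 2), check_integer("420", 2))      # True False
print(check_numeric("72.5", 1), check_numeric("72.55", 1))  # True False
print(clean_alphanumeric(" chennai ", 10))                  # CHENNAI
print(check_date("01/08/2019"), check_date("2019-08-01"))   # True False
```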
Creating variable names
• Clear
• Should refer to the questionnaire item
• Understandable (e.g., “EXERDAILY” for “Exercise daily”)
• Short, with no spaces
• Some software packages limit the length of variable names; keeping names under 10 characters is a safe choice
• Consistent
• “EXERPASTDLY” for “Exercise daily in the past”
• “EXERCURRDLY” for “Exercise daily at present”
• “EXERPASTOCC” for “Exercise occasionally in the past”
• “EXERCURROCC” for “Exercise occasionally at present”
• Base name (“VARIAB”) for all crude variables (e.g., EXERCISE)
• Base name plus “_12” (“VARIAB_12”) for all dichotomized variables (e.g., EXERCISE_12)
• No duplicates
• Software that truncates long names can create duplicate names
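The naming rules above, and the truncation pitfall in particular, can be checked before a database is created. The sketch below assumes a hypothetical 8-character limit to mimic a package that trims names. `check_variable_names` is an illustrative name, not a function from any statistics package.

```python
def check_variable_names(names, max_length=10):
    """List naming problems: spaces, excess length, duplicates after truncation."""
    problems = []
    kept = {}  # truncated name -> first original name that produced it
    for name in names:
        if " " in name:
            problems.append(f"{name}: contains a space")
        if len(name) > max_length:
            problems.append(f"{name}: longer than {max_length} characters")
        truncated = name[:max_length]
        if truncated in kept:
            problems.append(f"{name}: becomes {truncated}, same as {kept[truncated]}")
        else:
            kept[truncated] = name
    return problems


# With an 8-character limit, EXERCURRDLY and EXERCURROCC both become EXERCURR
for problem in check_variable_names(["EXERDAILY", "EXERCURRDLY", "EXERCURROCC"], max_length=8):
    print(problem)
```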
Design a data entry-friendly data collection instrument
• Outline
• Identifiers
• Demographics
• Outcome (Health problem/disease)
• Exposures (variables, including third factors)
• Auto-coding function
Coding
• Prefer numerical coding
• Decide on codes for:
• Missing values (.) or (9, 99, 999)
• Not applicable (8, 98, 998)
• Avoid cumbersome codes
• e.g., WALKING (1), CYCLING (2) and doing both WALKING and CYCLING (12)
• Code binary variables (Yes/No or Present/Absent) as “1”/“0” or “1”/“2”, and use “0” or “1” as the baseline of gradients, as appropriate for the analysis software
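The missing and not-applicable conventions can be applied mechanically before analysis, and the combined code can be split into separate variables. This sketch assumes the number of digits in the field decides which codes apply: 9/8 for one digit, 99/98 for two, 999/998 for three. Splitting the combined activity code into two yes/no variables is one common remedy, and it is an assumption here, since the slide does not prescribe it. All names are hypothetical.

```python
def classify_code(code, width):
    """Classify a raw code for a field `width` digits wide (None stands for '.')."""
    missing = int("9" * width)                     # 9, 99, 999
    not_applicable = int("9" * (width - 1) + "8")  # 8, 98, 998
    if code is None or code == missing:
        return "missing"
    if code == not_applicable:
        return "not applicable"
    return "valid"


def split_activity(code):
    """Replace the combined code (1 walking, 2 cycling, 12 both) by two 1/0 variables."""
    walking, cycling = {1: (1, 0), 2: (0, 1), 12: (1, 1)}[code]
    return {"WALKING": walking, "CYCLING": cycling}


print(classify_code(99, 2), classify_code(98, 2), classify_code(45, 2))
print(split_activity(12))  # {'WALKING': 1, 'CYCLING': 1}
```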
Constructing a data dictionary
• Contains, for each variable:
• Variable name
• Description of the questionnaire item
• Possible values of the variable (e.g., 1, 2, 3)
• Meaning of each value (e.g., 1 = Yes, 2 = No)
Example:
Question | Variable name | Type | Format | Values | Logical checks
1 | EXERDAILY | Integer | | Yes = 1, No = 2 | Skip pattern
2 | EXERTYPE | Integer | | Walking = 1, Cycling = 2 |
Etc…
• Some software packages create the variable catalogue automatically; ideally, the investigator constructs it
• The catalogue is particularly useful:
• When a database is shared with others
• If the researcher has to get back to the database later
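A data dictionary can also be kept in machine-readable form, so the same file both documents the variables and decodes stored values. This is a minimal sketch using the two example variables. The description "Type of exercise" for EXERTYPE is assumed wording, and `decode` and `print_catalogue` are hypothetical helpers.

```python
DATA_DICTIONARY = {
    "EXERDAILY": {
        "question": 1,
        "description": "Exercise daily",
        "type": "Integer",
        "values": {1: "Yes", 2: "No"},
        "logical_checks": "Skip pattern",
    },
    "EXERTYPE": {
        "question": 2,
        "description": "Type of exercise",  # assumed wording
        "type": "Integer",
        "values": {1: "Walking", 2: "Cycling"},
        "logical_checks": "",
    },
}


def decode(variable, code, dictionary=DATA_DICTIONARY):
    """Translate a stored code into its meaning, flagging codes the dictionary does not define."""
    return dictionary[variable]["values"].get(code, f"undefined code {code}")


def print_catalogue(dictionary=DATA_DICTIONARY):
    """Print a plain-text variable catalogue, one line per variable, in question order."""
    for name, entry in sorted(dictionary.items(), key=lambda item: item[1]["question"]):
        values = ", ".join(f"{label} = {code}" for code, label in entry["values"].items())
        print(f"{entry['question']} | {name} | {entry['type']} | {values} | {entry['logical_checks']}")


print_catalogue()
print(decode("EXERTYPE", 2))  # Cycling
```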
Check specifications before data entry
• Minimum and maximum values
• Legal codes
• Set of values that will be accepted, e.g., 1, 0 and 9 for “Yes”, “No” and “Missing”
• Skip patterns
• Automatic coding
• Copying data from the preceding record
• Calculations
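Several of the specifications above translate directly into entry-time rules: a range check, legal codes, a skip pattern and copying from the preceding record. The sketch uses the legal codes from the example (1, 0, 9). It assumes a hypothetical AGE range of 0 to 120, and a skip rule under which EXERTYPE is asked only when EXERDAILY is Yes. The function names are illustrative.

```python
def check_record(record):
    """Return the list of entry-time errors for one record (hypothetical rules)."""
    errors = []
    # Minimum and maximum values (blank AGE is left to the missing-value rules)
    age = record.get("AGE")
    if age is not None and not 0 <= age <= 120:
        errors.append(f"AGE out of range 0-120: {age}")
    # Legal codes: 1 = Yes, 0 = No, 9 = Missing
    exerdaily = record.get("EXERDAILY")
    if exerdaily not in {1, 0, 9}:
        errors.append(f"EXERDAILY illegal code: {exerdaily}")
    # Skip pattern: EXERTYPE is filled only when EXERDAILY is Yes
    exertype = record.get("EXERTYPE")
    if exerdaily == 1 and exertype not in {1, 2}:
        errors.append("EXERTYPE must be 1 or 2 when EXERDAILY is Yes")
    if exerdaily in {0, 9} and exertype is not None:
        errors.append("EXERTYPE must be skipped when EXERDAILY is not Yes")
    return errors


def copy_from_previous(record, previous, fields):
    """Fill blank fields (None) with the value from the preceding record."""
    filled = dict(record)
    for field in fields:
        if filled.get(field) is None:
            filled[field] = previous.get(field)
    return filled


print(check_record({"AGE": 30, "EXERDAILY": 0, "EXERTYPE": 1}))
print(copy_from_previous({"ID": 2, "VILLAGE": None}, {"ID": 1, "VILLAGE": "03"}, ["VILLAGE"]))
```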
Data entry
• Use data entry as an opportunity for partial data cleaning
• Write comments
• Seek clarification
• Use checks
• Mark each paper form once its data entry is completed
• Validate after data entry
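One common way to validate after data entry is double entry: the same forms are keyed twice, and the two files are compared field by field. The slide does not name a validation method, so treat this as an assumed approach. `compare_entries` is a hypothetical helper. A tuple with `"<record>"` means the identifier is present in only one of the two files.

```python
def compare_entries(first, second, id_field="ID"):
    """Double-entry validation: list every disagreement between two entry passes."""
    first_by_id = {record[id_field]: record for record in first}
    second_by_id = {record[id_field]: record for record in second}
    discrepancies = []
    for identifier in sorted(first_by_id.keys() | second_by_id.keys()):
        a = first_by_id.get(identifier)
        b = second_by_id.get(identifier)
        if a is None or b is None:
            # (ID, "<record>", present in first pass?, present in second pass?)
            discrepancies.append((identifier, "<record>", a is not None, b is not None))
            continue
        for field in sorted(a.keys() | b.keys()):
            if a.get(field) != b.get(field):
                discrepancies.append((identifier, field, a.get(field), b.get(field)))
    return discrepancies


first_pass = [{"ID": 1, "AGE": 34, "SEX": 1}, {"ID": 2, "AGE": 51, "SEX": 2}]
second_pass = [{"ID": 1, "AGE": 43, "SEX": 1}, {"ID": 2, "AGE": 51, "SEX": 2}]
print(compare_entries(first_pass, second_pass))  # [(1, 'AGE', 34, 43)]
```

Each discrepancy is then resolved by going back to the paper form.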
Individual and aggregated databases
• Individual databases
• Each record is an observation
• Aggregated database
• Records contain counts
• Normalized database
• Only one count per record
• Facilitates further aggregation
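A normalized aggregated database holds a single count per record, so it can be re-aggregated along any of its keys with one generic sum. The figures below are hypothetical (place, sex, count) records. By contrast, a layout with male and female counts in separate columns on one record would need a custom rule for each new grouping.

```python
from collections import defaultdict

# Normalized aggregated database: (Place, Sex, Count), one count per record.
# The figures are hypothetical.
records = [
    ("A", 1, 3),
    ("A", 2, 2),
    ("B", 1, 1),
    ("B", 2, 2),
]


def aggregate(records, key_index):
    """Sum the counts (last field) over the key found at position `key_index`."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key_index]] += record[-1]
    return dict(totals)


print(aggregate(records, 0))  # by place: {'A': 5, 'B': 3}
print(aggregate(records, 1))  # by sex: {1: 4, 2: 4}
```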
Individual and aggregated databases
Aggregating individual data
Individual data:
ID | Place | Age | Sex | Onset
1 | A | 3 | 1 | 1 Jan 06
2 | B | 1 | 2 | 1 Jan 06
3 | C | 35 | 2 | 3 Jan 06
4 | D | 67 | 1 | 4 Jan 06
5 | A | 2 | 1 | 2 Jan 06
6 | B | 2 | 1 | 4 Jan 06
7 | C | 2 | 1 | 5 Jan 06
… | … | … | … | …
Aggregated file:
ID | Place | Count
1 | A | 5
2 | B | 3
3 | C | 37
4 | D | 67
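Aggregation collapses individual records into one count per place. The sketch uses only the seven individual records listed above, so its counts differ from the aggregated file on the slide, which summarizes the full (truncated) dataset. `aggregate_by_place` is a hypothetical name.

```python
from collections import Counter

# Individual data (Place, Age, Sex, Onset): the seven records listed above
individual = [
    ("A", 3, 1, "1 Jan 06"),
    ("B", 1, 2, "1 Jan 06"),
    ("C", 35, 2, "3 Jan 06"),
    ("D", 67, 1, "4 Jan 06"),
    ("A", 2, 1, "2 Jan 06"),
    ("B", 2, 1, "4 Jan 06"),
    ("C", 2, 1, "5 Jan 06"),
]


def aggregate_by_place(rows):
    """Build the aggregated file: one record (ID, Place, Count) per place."""
    counts = Counter(place for place, _age, _sex, _onset in rows)
    return [(i, place, counts[place]) for i, place in enumerate(sorted(counts), start=1)]


for row in aggregate_by_place(individual):
    print(row)
```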
Mother and daughter databases
• Information is available at various levels
• Village
• Household
• Individual
• Illness episode
• Store information at each level in separate databases
• Link databases together with identifiers
Mother and daughter databases
Household level data:
HousID | Location | Community | HousIncom
1 | A | 3 | 1
2 | B | 1 | 2
3 | C | 35 | 2
4 | D | 67 | 1
5 | E | 2 | 1
6 | F | 2 | 1
7 | G | 2 | 1
… | … | … | …
Individual level data:
HousID | PersonID | Diseased | Exposed
1 | 101 | 1 | 1
1 | 102 | 2 | 1
2 | 201 | 2 | 2
2 | 202 | 1 | 2
• Each database has its own unique identifier
• Link these relational databases using a common index identifier
• Merge files when needed
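Linking the mother (household) and daughter (individual) databases through the shared HousID is a many-to-one merge. Each person record receives the data of its household. The sketch copies a subset of the household columns from the example, and `merge_households` is a hypothetical helper. It stops with an error when a person points to a household that does not exist, which is one of the linkage errors the unique identifiers are meant to prevent.

```python
# Mother database: one record per household, keyed by HousID (subset of columns)
households = {
    1: {"Location": "A", "Community": 3},
    2: {"Location": "B", "Community": 1},
}

# Daughter database: one record per person, carrying the household's HousID
individuals = [
    {"HousID": 1, "PersonID": 101, "Diseased": 1, "Exposed": 1},
    {"HousID": 1, "PersonID": 102, "Diseased": 2, "Exposed": 1},
    {"HousID": 2, "PersonID": 201, "Diseased": 2, "Exposed": 2},
    {"HousID": 2, "PersonID": 202, "Diseased": 1, "Exposed": 2},
]


def merge_households(individuals, households):
    """Many-to-one merge: copy each person's household data onto the person record."""
    merged = []
    for person in individuals:
        household = households.get(person["HousID"])
        if household is None:
            # An orphan daughter record points to a household that does not exist
            raise KeyError(f"No household record for HousID {person['HousID']}")
        merged.append({**person, **household})
    return merged


for row in merge_households(individuals, households):
    print(row)
```

Household data are stored once, in the mother database, and merged only when an analysis needs them at the individual level.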
Summing up on data management
• Code data numerically
• Enter data using quality assurance procedures
• Store information at the level it describes (e.g., household or individual)
• Relate or merge files as needed