Econometrics Data Cleaning Guide

The document provides instructions for cleaning and manipulating data in Stata. It demonstrates how to rename variables, assign variable labels, generate new variables, recode values, convert variable types, import and export datasets, and perform other common data management tasks. Key steps shown include renaming variables, assigning value labels, generating binary indicator variables, converting between string and numeric variable types, and extracting substrings from string variables.

Uploaded by

mayakhalid26

cd "C:\Users\Lenovo\Documents\LUMS\Fall 2022\Econometrics\01 dta"

use "roster"

*opens the roster dataset. You can also directly click the dta file or

*write the full path after use.

tab age

set more off

*Disables the "more" prompt that pauses output every time a command produces

*results too long to fit on the screen at once.

*Numeric variables are stored as byte, int, long, float or double.

*Byte, int and long can only hold integers

*String variables are stored as str9, str13 etc.

*String variables can hold non-numeric characters as well. No mathematical operation can be performed on them.
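
*A small illustration of the storage types described above (a sketch, using

*two variables that this file works with later): describe reports each

*variable's storage type, display format and labels.

describe age sb1q4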

tab age

summarize age

drop if age==2018

*drops observations in which age=2018

tab sb1q4

tab sb1q4, nolab

*Or

label list sb1q4

*finding the value labels of gender

drop if sb1q4==0

summarize age if sb1q4==2

*restricting the observations for which the age variable is being summarized,

*to only female respondents (compare female age stats with overall stats)
count if sb1q4==1 & age<18

*211,185 respondents are male and below 18

*TASK: How many respondents are above or equal to 18 and either separated or

*divorced?

count if age>=18 & (sb1q7==4 | sb1q7==5)

*2178 respondents above or equal to 18 are either separated or divorced.

sum age if sb1q7==2

tab sb1q7 sb1q4

*Two-way tabulation: the breakdown of respondents in each marital status

*category by gender, or of male and female respondents by marital status

*category (depending on whether you read the table horizontally or vertically).

*Try the "column", "row", "column nofreq" and "row nofreq" options with the tab

*command (see "help tabulate twoway" for how to use them) and see how they

*enhance the results of this two-way tabulation.
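
*A short sketch of two of the options named above. In "tab sb1q7 sb1q4" the

*rows are marital status and the columns are gender, so:

*row adds row percentages (the gender split within each marital status)

tab sb1q7 sb1q4, row

*column nofreq shows only column percentages (the marital status split

*within each gender) and hides the raw counts

tab sb1q7 sb1q4, column nofreq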

cd "C:\Users\Lenovo\Documents\LUMS\Fall 2022\Econometrics\01 dta"

use "roster", clear

set more off

drop if age==2018

*drops observations in which age=2018

preserve

drop sb1q62

restore

*restores the dataset as it was when preserve was run, undoing any changes made

*since then (here, the dropped variable sb1q62 comes back)

codebook age

*provides more detailed summary statistics

*TASK: How many districts are in this dataset?

codebook district
list age in 1/5

*lists the first 5 observations

*saving dataset

save "roster_new", replace

/*Data Cleaning involves:

1) Renaming Variables

2) Assigning/Changing Variable Labels

3) Defining Value Labels

4) Generating New Variables

5) Creating smaller subsets of data

6) Merging Datasets

7) Reshaping Datasets

8) Removing special characters from string variables

9) Converting variables from string to numeric or vice versa

10) Recoding Missing Values

*/

*To rename a variable:

rename sb1q4 gender

*TASK: Rename the variable sb1q11 to HHmember

rename sb1q11 HHmember

*To assign a variable label:

label variable hhcode "Household Code"

*TASK: Assign the variable label "Primary Sampling Unit" to the psu variable

label variable psu "Primary Sampling Unit"

*Changing Variable Label

*Omitting the label text removes the existing variable label

label variable age

label variable age "Age"

*Recoding Missing Values

tab gender

replace gender=. if gender==0

tab gender

tab gender, missing

tab gender, nolab missing

*TASK: In the marital status variable, replace the 0 values with missing values

replace sb1q7=. if sb1q7==0
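
*A possible alternative, offered as a suggestion rather than the course's

*method: the built-in mvdecode command converts a list of numeric codes to

*missing in one step. On sb1q7 it has the same effect as the replace command,

*so if the zeros have already been recoded it finds nothing left to change.

mvdecode sb1q7, mv(0)

tab sb1q7, missing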

cd "C:\Users\Lenovo\Documents\LUMS\Fall 2022\Econometrics\01 dta"

use "roster", clear

set more off

rename sb1q4 gender

replace gender=. if gender==0

*Generating Variables and Assigning Value Labels

gen female=0

replace female=1 if gender==2

replace female=. if gender==.

/*We can now assign value labels to the female variable through the following

two-step procedure:

1) Define a value label

2) Assign that value label to our variable of interest

*/

tab female

lab define gender 0 "Male" 1 "Female"

lab val female gender

tab female

label list gender

*TASK: Now generate a variable named "male". It should take the value 1 for

*male respondents and 0 for female respondents. Assign value labels to this

*variable.
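
*A minimal sketch of one possible solution. It assumes gender is coded

*1 = male and 2 = female, as in the commands earlier in this file; the label

*name male_lbl is a hypothetical name chosen for this example.

*The logical expression is 1 when true and 0 when false; the if qualifier

*leaves male missing wherever gender is missing.

gen male = (gender==1) if !missing(gender)

lab define male_lbl 0 "Female" 1 "Male"

lab val male male_lbl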

*To verify that you have generated the correct binary variable for male:

tab gender male

*Changing Value Labels

tab sb1q11

tab sb1q11, nolab

lab val sb1q11

tab sb1q11

recode sb1q11 (2=0)

tab sb1q11

lab def hhmember 0 "No" 1 "Yes"

lab val sb1q11 hhmember

tab sb1q11

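*Conversion from numeric to string

*decode creates a string variable holding the value labels (a value label

*must be attached to the variable)

decode sb1q7, gen(marital)

*tostring creates a string variable holding the numeric values themselves

tostring age, gen(age_string)

*sb1q4 was renamed to gender above, so convert gender and use a new name

tostring gender, gen(gender_string)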

cd "C:\Users\Lenovo\Documents\LUMS\Fall 2022\Econometrics\01 dta"

*Importing from excel

import excel "C:\Users\Lenovo\Documents\Job Applications\LUMS\Spring 2023\Econometrics\roster_subset.xlsx", sheet("Sheet1") firstrow clear

*Generating variables from string variables

describe sb1q7

gen divorced=0

replace divorced=1 if sb1q7=="divorced"

replace divorced=. if sb1q7==""

tab divorced

*TASK: Generate a binary variable and name it "married". It should take the

*value 1 if the respondent is currently married and 0 otherwise.
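
*A minimal sketch of one way to do this task. The spelling "married" is an

*assumption: string comparisons are case-sensitive, so run "tab sb1q7" first

*and copy the category text exactly as it appears in the data.

*married is left missing wherever sb1q7 is empty, as for divorced.

gen married = (sb1q7=="married") if sb1q7!=""

tab married, missing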

*Convert a string variable to a numeric categorical variable

encode sb1q7, generate(marital)

*TASK: Convert sb1q4 from string to numeric

encode sb1q4, gen(gender)

use "[Link]", clear

*Convert province from numeric to string

decode province, gen(province_string)

*upper(), lower() and proper() functions

*To make all the characters into uppercase

replace province_string = upper(province_string)

*To make all the characters into lowercase

replace province_string = lower(province_string)

*To capitalize the first letter of each word and make the remaining letters lowercase

replace province_string = proper(province_string)

*TASK: Convert the district variable from numeric to string. Transform the

*variable such that all the characters are in uppercase.
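
*A minimal sketch of one possible solution. It assumes district has a value

*label attached (decode stops with an error otherwise); district_string is a

*hypothetical name chosen for this example.

decode district, gen(district_string)

replace district_string = upper(district_string)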

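*subinstr(s, old, new, n) replaces the first n occurrences of old in s with

*new; a missing value (.) for n replaces every occurrence

di subinstr("Ec00nometrics","00","o",.)

*displays Econometrics

di subinstr("IceCCCCCCCCCCream","C","",.)

*displays Iceream (only the uppercase C characters are removed)

di subinstr("Iceream","er","ecr",.)

*displays Icecream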

replace province_string = subinstr(province_string,"a","@",.)

***Replace @ with a

replace province_string = subinstr(province_string,"@","a",.)

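*substr(s, start, length) extracts length characters of s beginning at

*position start; a negative start counts from the end, and a missing value (.)

*for length means "to the end of the string"

di substr("Econometrics",1,3)

*displays Eco

di substr("Econometrics",6,.)

*displays metrics

di substr("Econometrics",-3,.)

*displays ics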

gen prov_code = substr(province_string,1,3)

*stores the first three characters of province_string in the new variable prov_code
