Stata Class Notes - Modifying Data

Quick guide to stata

Uploaded by

Ismaila Yusuf

Stata Class Notes: Modifying Data http://www.ats.ucla.edu/stat/stata/notes/modifying12.htm


Stata Class Notes


Modifying Data

1.0 Stata commands in this unit


codebook        Show codebook information for the data file
order           Order the variables in a data set
label data      Apply a label to a data set
label variable  Apply a label to a variable
label define    Define value labels for a categorical variable
label values    Apply value labels to a variable
encode          Create a numeric version of a string variable
list            List the observations
rename          Rename a variable
recode          Recode the values of a variable
notes           Apply notes to the data file or its variables
generate        Create a new variable
replace         Replace values of an existing variable
egen            Extended generate: create a new variable using special functions

2.0 Demonstration and explanation


use http://www.ats.ucla.edu/stat/data/hs0, clear

Let's use the codebook command to see what our variables look like. Because we have not listed any
variables after the command, Stata will show us the codebook for all of the variables.

codebook
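The full codebook for every variable can be long. This sketch assumes hs0 has just been loaded as above. It shows two ways to get less output: the compact option prints a one-line summary per variable, and listing variable names restricts the report to those variables.

```stata
* One summary line per variable instead of the full report
codebook, compact

* The full codebook for only the variables listed
codebook gender race
```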

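First, let's put the variables in an order that makes sense. Several orderings would be logical. We will put the id variable first, followed by the demographic variables, such as gender, ses and prgtype, and put the test-score variables at the end. The order command moves the variables you list to the front of the data set and leaves the others in their existing order. Listing only id and gender is therefore enough when the remaining variables are already arranged this way.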

order id gender
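To confirm the new arrangement, describe with the simple option prints just the variable names in their current order. This is a small check that assumes hs0 is in memory and order id gender has just been run.

```stata
* List variable names in their current order; id and gender
* should now come first
describe, simple
```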

Now let's include some variable and value labels so that we know a little more about the variables.

label variable schtyp "type of school"


label define scl 1 public 2 private
label values schtyp scl
codebook schtyp
list schtyp in 1/10
list schtyp in 1/10, nolabel
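The list commands above show the labels for only the first ten observations. Two other commands give the whole picture. label list prints the complete code-to-text mapping stored in scl, and tabulate shows how many observations carry each value. The sketch assumes the commands above have been run in the same session.

```stata
* Show the mapping stored in the value label scl
label list scl

* tabulate displays the labels; nolabel shows the codes underneath
tabulate schtyp
tabulate schtyp, nolabel
```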

Now let's create a new numeric version of the string variable prgtype. We will call our new variable
prog.

encode prgtype, gen(prog)


label variable prog "type of program"

1 of 3 4/27/2015 5:12 PM

codebook prog
list prog in 1/10
list prog in 1/10, nolabel
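encode assigns the codes 1, 2, 3, ... to the distinct strings in alphabetical order. It stores that mapping in a value label which, by default, has the same name as the new variable (here prog). The sketch below assumes prog has just been created as above. It shows the mapping and cross-tabulates the old and new variables so you can see that each string corresponds to exactly one code.

```stata
* The value label created by encode has the same name as the new variable
label list prog

* Each string in prgtype should line up with a single code in prog
tabulate prgtype prog
```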

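The variable gender may give us trouble later because its name does not tell us whether 1 or 2 means female. We will rename it female and recode it so that 1 = female and 0 = male. After that, the name alone makes the meaning clear.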

rename gender female


recode female (1=0)(2=1)
label define fm 1 female 0 male
label values female fm
codebook female
list female in 1/10
list female in 1/10, nolabel
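After a recode like this, it is worth confirming that no unexpected values remain. The sketch below assumes the rename and recode above have been run. assert stops with an error if any observation of female is something other than 0, 1 or missing. The two tables then show the counts with and without the new labels.

```stata
* female should now contain only 0, 1 or missing
assert inlist(female, 0, 1) | missing(female)

* Counts with the labels from fm, then with the underlying codes
tabulate female
tabulate female, nolabel
```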

Let's recode the value 5 in the variable race to be missing.

list race if race == 5


recode race 5 = .
list race if race == .
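Two points are worth checking after this step. First, no observation should still carry the code 5. Second, race == . matches only the system missing value, whereas missing(race) also catches extended missing values such as .a. (The command mvdecode race, mv(5) is another way to turn the code 5 into missing.) The sketch assumes the recode above has been run.

```stata
* No observation should still be coded 5
count if race == 5
assert race != 5

* missing() counts . as well as extended missing values (.a-.z)
count if missing(race)
```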

Now let's create a variable that is the total of four of the test scores: read, write, math and science.

generate total = read + write + math + science


summarize total

Note that total has five missing values because science has five missing values. A sum computed with + is missing whenever any of its terms is missing.
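The missing-value rule for + can be checked directly, assuming total has just been created as above. When missing() is given several arguments, it is true if any of them is missing. For contrast, the sketch also builds the same sum with egen's rowtotal(), which treats missing values as 0. That would give a misleadingly small total for the observations with no science score. total_rt is an illustrative name only.

```stata
* total is missing exactly when at least one of the four scores is missing
assert missing(total) == missing(read, write, math, science)
count if missing(science)
count if missing(total)

* rowtotal() treats missing as 0, so it does not return missing here
* (total_rt is a hypothetical name used only for this comparison)
egen total_rt = rowtotal(read write math science)
list read write math science total total_rt if missing(total)
drop total_rt
```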

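Now let's assign letter grades based on total. Each rule in the recode command maps a range of totals to a numeric code with a value label; for example, 0/140=0 F sends totals from 0 through 140 to 0, labeled F. The gen(grade) option puts the result in a new variable, grade, and leaves total unchanged.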

recode total (0/140=0 F) (141/180=1 D) (181/210=2 C) (211/234=3 B) (235/300=4 A), gen(grade)


label variable grade "combined grades of read, write, math, science"
codebook grade
list read write math science total grade in 1/10
list read write math science total grade in 1/10, nolabel
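recode copies any value that matches none of the rules unchanged into grade. It is therefore worth confirming that every non-missing total falls inside the ranges. The ranges leave gaps only between whole numbers (for example, between 140 and 141), so total should also be an integer. This is a sketch of those checks, assuming the recode above has been run. If either assertion fails, the rules need adjusting.

```stata
* Every non-missing total should be a whole number from 0 to 300
assert total == int(total) if !missing(total)
assert inrange(total, 0, 300) if !missing(total)

* Missing totals stay missing, so grade is missing for the same observations
assert missing(grade) == missing(total)
tabulate grade, missing
```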

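Let's label the data set itself so that we will remember what the data are. We can also attach notes to individual variables. Here we record the changes we made to female and race. Typing notes by itself lists all the notes in the data set.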

label data "High School and Beyond"

notes female: the variable gender was renamed to female


notes race: values of race coded as 5 were recoded to be missing
notes
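A note can also be attached to the data set as a whole: typing notes: followed by text, with no variable name, stores it under _dta. You can then list the notes for one variable or for the data set alone. The note text below is only an illustration, and the sketch assumes the label data and notes commands above have been run.

```stata
* A note with no variable name is attached to the data set (_dta);
* the wording here is illustrative only
notes: created from hs0 while working through the Modifying Data notes

* List the notes for one variable, then the data-set notes
notes female
notes _dta
```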

Stata has another command for creating new variables, egen, which stands for "extended generate." Its functions, such as std(), mean() and rowmean(), are useful in many specialized situations.

In our first example, we will use egen to create standard scores for the variable read.

egen zread = std(read)


summarize zread
list read zread in 1/10
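std() computes (read minus the mean of read) divided by the standard deviation of read, using the sample standard deviation. This sketch assumes zread has just been created as above. It rebuilds the same z-scores from the mean and standard deviation that summarize leaves in r(), then compares the two versions up to float rounding. zread_check is an illustrative name.

```stata
* Rebuild the z-scores by hand from summarize's stored results
quietly summarize read
generate zread_check = (read - r(mean)) / r(sd)

* The two versions should agree apart from float rounding
assert reldif(zread, zread_check) < 1e-5
drop zread_check
```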

Next we will create a variable that holds the mean of read for each level of ses, so every observation gets the mean reading score of its ses group.

egen readmean = mean(read), by(ses)


list read ses readmean in 1/10
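One way to check readmean is to compute the mean of read separately for each level of ses and compare. The loop below does that without sorting the data, so the observation order that the later list ... in 1/10 commands rely on is unchanged. tabstat then shows the group means as a compact table. The sketch assumes readmean has just been created as above.

```stata
* Compare readmean with summarize, one level of ses at a time
levelsof ses, local(levels)
foreach s of local levels {
    quietly summarize read if ses == `s'
    assert reldif(readmean, r(mean)) < 1e-6 if ses == `s'
}

* The same group means as a table
tabstat read, by(ses) statistics(mean)
```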

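Now we will compute the average of several variables within each observation. Note that observation 9 still gets a mean even though its value of science is missing. rowmean() averages whichever of the listed variables are non-missing, so for that observation it is the average of the scores that are present.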

egen row_mean = rowmean(read write math science)


list read write math science row_mean in 1/10
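The way rowmean() handles missing values can be reproduced with two other egen functions. rowtotal() sums the listed variables, treating missing as 0, and rownonmiss() counts how many of them are non-missing; their ratio should equal row_mean. The sketch assumes row_mean has just been created as above, and rsum, rn and row_mean_check are illustrative names.

```stata
* Sum (missing treated as 0) and count of non-missing scores per row
egen rsum = rowtotal(read write math science)
egen rn = rownonmiss(read write math science)
generate row_mean_check = rsum / rn

* Should match row_mean apart from float rounding
assert reldif(row_mean, row_mean_check) < 1e-5
list read write math science row_mean row_mean_check if missing(science)
drop rsum rn row_mean_check
```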

These are just a few of the many useful egen functions built into Stata.

Finally, we will save our data and continue on to the next unit.

save hs1
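If you work through these notes more than once, save hs1 will stop with an error the second time because hs1.dta already exists. The replace option tells save to overwrite the existing file.

```stata
* Overwrite hs1.dta if it already exists from an earlier run
save hs1, replace
```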


3.0 For more information

Data Management Using Stata: A Practical Handbook, Chapters 4-5
Statistics with Stata 12, Chapter 2
Gentle Introduction to Stata, Revised Third Edition, Chapter 3
Data Analysis Using Stata, Third Edition, Chapter 5
An Introduction to Stata for Health Researchers, Third Edition, Chapters 7-8

Stata Learning Modules
    Labeling data
    Creating and recoding variables

Stata Frequently Asked Questions
    How can I quickly convert many string variables into numeric variables?
    How can I quickly recode continuous variables into groups?
    How do I standardize variables in Stata?
