5/08/2016
AN INTRODUCTION TO STATA
ECMT1020
What is Stata?
Stata is a computer program that provides a very flexible
environment in which one can conduct various kinds of data
analysis
Stata is a full-featured statistical programming language for
Windows, Macintosh, Unix and Linux
It can be considered a “stat package,” like SAS, SPSS, RATS, or
EViews
The number of variables is limited to 2,047 in standard (Intercooled)
Stata, but can be much larger in Stata/SE
The number of observations is limited only by computer memory
Strengths of Stata
Stata is a command-driven statistical package for statistical analyses, data
management and graphic presentations
Stata is designed for researchers in the fields of econometrics, social science
and biostatistics
In particular, Stata allows you to carry out the following kinds of analysis:
• Panel and Survey Data Analysis
• Discrete and Limited Dependent Variable Analysis
• Maximum Likelihood Estimation
• Regression Analysis and Regression Diagnostics
• Data Management and Transformation
The capabilities of Stata are not limited to the features above. Stata code can
be shared, reused and made into extensions for specialized routines
Strengths of Stata
Stata has 3 major strengths:
• Data manipulation: moving data from external sources into the program,
cleaning it up, generating new variables, generating summary data sets,
merging data sets, reshaping data sets from “long” to “wide”
• Statistics: Stata provides everything from the standard descriptive statistics
and t-tests through to sophisticated regression methods. This includes regression
diagnostics, prediction, robust estimation of standard errors, instrumental
variables and two-stage least squares, seemingly unrelated regressions, vector
autoregressions and error correction models, logit, probit, ordered logit and
probit, multinomial logit etc.
• Graphics: Stata's graphics are excellent tools for exploratory data analysis,
and can be produced in high quality and in many different forms. Every aspect of
a graph may be programmed and customized
Updating Stata
One of Stata’s great strengths is that it can be updated over the
Internet
Updates during the life of the version you own are free
You need only have a licensed copy of Stata and access to the
Internet to check for and, if desired, download the updates
Launch Stata
Double click on the program icon on your desktop
or
Go to the Start menu/Programs/Stata
or
Click on a Stata dataset icon
You will end Stata by typing exit in the command window
Stata windows and Interface
[Screenshot of the Stata interface showing the Review window, the Variable
window, the Output window and the Command window]
Stata Menus and dialogs
Stata is a command-driven application
You can also access most of the commands by selecting items from
Stata’s menus to open dialogs that build Stata commands
Stata’s Data, Graphics and Statistics menus provide point-and-click
access to almost every command in Stata
Where to find help (1/2)
The manuals are useful, particularly the User’s Guide and Getting Started
with Stata
The manuals are included with the software as PDFs, and can be accessed
from the Help menu.
The command findit keyword can also be used to locate Stata
materials, including descriptions of built-in commands, Stata FAQs, and
hundreds of user-written routines
Useful links:
• http://www.stata.com/support/
• http://www.stata.com/links/resources.html
• http://www.ats.ucla.edu/stat/stata/
• http://data.princeton.edu/stata/default.html
• http://saproject.psc.isr.umich.edu/content/lesson_modules.html
Where to find help (2/2)
Command              Description
help command         Display the description and the syntax of the
(e.g. help regress)  Stata command (when you already know the
                     name of the command)
lookup topic         Display information on the requested
                     concepts, including useful commands and
                     FAQs
search topic         Find the Stata command you are looking for
                     (when you don’t know the specific name)
Stata Viewer
The Viewer is where you can see help information, view or print logs, and check
for and install the latest official updates to Stata. You can open it from the
main Stata menu: Window - Viewer
Back: returns to the previous content
Refresh: reloads the current content
Search: allows keyword search
Help: gives options for using the Viewer
Contents: table of contents for the Stata help files
What’s New/News: new features of Stata
Simple rules in Stata
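Variable names cannot be longer than 32 characters
Use letters, digits and/or the underscore for a variable name
A variable name cannot begin with a digit. It is preferable to start with a
letter, because Stata uses a leading underscore for its own variables (e.g. _n)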
Stata is case-sensitive. For instance, A and a are considered as two
different variable names. However, most Stata commands are in
lowercase
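The naming rules above can be tried out with a short sketch like the following. It is meant to be run from a do file; the variable names (a, A, gdp_2016, 2016gdp) are made up for illustration.

```stata
clear
set obs 3                      // create an empty data set with 3 observations

generate a = 1                 // lowercase name
generate A = 2                 // a different variable: names are case-sensitive
generate gdp_2016 = 3          // letters, digits and underscores are allowed

describe                       // lists three distinct variables

* A name that begins with a digit is rejected
capture generate 2016gdp = 4
display _rc                    // nonzero return code: the command failed
```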
File extensions in Stata
You will work with 3 kinds of files:
• Do file (or program file): filename.do
Do files allow you to save your commands and to keep track of what
you have been doing. It is essential that you write comments in
your program, so that you know what the last modifications were
• Log file: filename.smcl (formatted) or filename.log (text file)
The Stata log file records the commands you entered and the
corresponding output for the current session
• Data set: filename.dta
Reproducibility
Keeping track of your work using a do file is very important in order to
be able to reproduce your research
If you are conducting scientific research you must be able to reproduce
your results
Ideally, anyone with your programs and data should be able to do so
without your assistance
Reproducibility also makes it very easy to perform an alternate
analysis of a particular model
E.g. you can keep track of what would happen if you added an
interaction, introduced an additional variable, or decided to handle zero
values as missing.
Getting started with Stata
1. Open a do file
2. Start every do file with an indication of your working directory
3. The working directory displayed at the bottom left hand corner of the
main window is your default directory. Any files you save without
specifying a directory will be saved here
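4. You may change the working directory by entering: cd directoryname
5. Set memory, if necessary, by using the command set mem # (e.g.
set mem 100m). This applies to Stata 11 and earlier; from Stata 12 onward
memory is managed automatically and the command is ignored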
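As a rough illustration of the steps above, a do file might open with lines like these. The directory in the commented-out cd line is a hypothetical path, and clear all and set more off are common housekeeping choices rather than requirements.

```stata
* Hypothetical opening lines of a do file
clear all                      // start from an empty session
set more off                   // do not pause output at each screenful

pwd                            // show the current working directory
* cd "C:\ECMT1020\week1"       // hypothetical path: edit and uncomment to change it

* c(pwd) holds the working directory as a string
display "Files saved without a path go to: `c(pwd)'"
```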
Stata Do file
To use the Stata do file editor, open the Window menu in Stata and
go down to the “Do-File Editor”.
You can run a single line by clicking on [toolbar icon]
You can run the whole do file by clicking on [toolbar icon]
Write comments starting with *
Stata log file
The Stata log file records the commands you entered and the corresponding
outputs for the current session
To record the log file as plain text (so it can be read with any word processor or
editor) use the command: log using filename, text
Keep in mind that you need to close a log file when you want to stop recording
the Stata output. You can do this by typing log close
To open a log file, you can either use the open log button, or enter the
command log using c:\directory\filename.log in the command window
You can temporarily suspend a log file by typing log off and then resume it
by typing log on
The command log using filename, text replace overwrites an
existing log file
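The log commands described above fit together roughly as follows. The log name session1 is hypothetical, and the auto data set that ships with Stata is used only to produce some output to record.

```stata
capture log close                    // close any log that is already open
log using session1, text replace     // hypothetical name; writes session1.log

sysuse auto, clear                   // example data set shipped with Stata
summarize price                      // recorded in the log

log off                              // suspend logging
describe                             // not recorded
log on                               // resume logging

summarize mpg                        // recorded in the log
log close                            // stop recording and close the file
```

Starting with capture log close is a common precaution: it avoids an error when an earlier run stopped before reaching its own log close.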
Open a Stata dataset
If you want to open a Stata dataset, enter the command:
use filename.dta, clear or click on the window icon
In some cases you may get the message “no room to add more
observations” or “no room to add more variables”. This is because not
enough memory has been assigned to Stata
Change the memory settings as explained above
To save a data file, enter the command: save, replace (overwrite
the current file) or save filename, replace (to save the current
file as filename)
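A minimal sketch of opening and saving data, assuming the auto example data set that ships with Stata. The file name mycars and the variable price1000 are hypothetical.

```stata
sysuse auto, clear                 // load an example data set shipped with Stata
save mycars, replace               // hypothetical name; writes mycars.dta

clear                              // empty the data in memory
use mycars.dta, clear              // reopen the file just saved
describe, short                    // confirm what was loaded

generate price1000 = price/1000    // change the data in memory
save, replace                      // overwrite mycars.dta with the changed data
```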
Various ways to look at your data
Describe: If you are using a data set created by someone else, you
may want to know what is in the data set.
Type describe using filename for the basic contents of a data file on disk
Type describe to get a description of the data in memory
Summarize: This gives summary statistics for all variables in memory
(number of observations, mean, standard deviation, min, max)
summarize, detail gives more detailed summary statistics
summarize var1 gives summary statistics of var1
List: This command lets you browse through the dataset.
list var1: list all observations of var1
list in 3/8: list all variables for observations 3 through 8
list if condition: list all observations that satisfy the condition
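The commands above can be tried on the auto example data set that ships with Stata. This is only a sketch; the variables used (make, price, mpg) belong to that data set.

```stata
sysuse auto, clear              // example data set shipped with Stata

describe                        // variables, storage types and labels
summarize                       // obs, mean, sd, min, max for every variable
summarize price, detail         // adds percentiles, skewness and kurtosis

list make price in 3/8          // observations 3 through 8 only
list make mpg if mpg > 30       // observations satisfying a condition

* summarize leaves its results behind in r()
summarize mpg
display "mean mpg = " r(mean) "  N = " r(N)
```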
Various ways to look at your data
Tabulate: this command produces one-way and two-way tables, including
frequencies, percentages and cumulative percentages
Type tabulate var1 for a one-way table of var1, or
tabulate var1 var2 for a two-way table
Correlate: The correlate command displays the correlation
matrix or covariance matrix for a group of variables or for the
coefficients of the most recent estimation
If the variables are not specified, the matrix is displayed for all variables in
the data
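A short sketch of tabulate and correlate, again assuming the auto example data set. The row and covariance options are standard options of these commands.

```stata
sysuse auto, clear                       // example data set shipped with Stata

tabulate rep78                           // one-way table: frequency, percent, cumulative
tabulate rep78 foreign                   // two-way table of counts
tabulate rep78 foreign, row              // add row percentages

correlate price mpg weight               // correlation matrix
correlate price mpg weight, covariance   // covariance matrix instead
```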
Some notation in Stata
- for use in conditioning or logical statements
Equal: ==
Not equal: != or ~=
Greater or equal: >=
Smaller or equal: <=
And: &
Or: |
Not: !
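The operators above are typically used after if. The sketch below counts observations in the auto example data set under different conditions. The last two lines show a point worth knowing: Stata treats a missing value as larger than any number.

```stata
sysuse auto, clear                          // example data set shipped with Stata

count if foreign == 1                       // equal
count if rep78 != 3                         // not equal (missing values also count)
count if mpg >= 25 & foreign == 1           // and
count if mpg < 15 | mpg > 35                // or
count if !(foreign == 1)                    // not

* Missing values are treated as larger than any number,
* so exclude them explicitly when using > or >=
count if rep78 >= 4                         // includes the missing values
count if rep78 >= 4 & !missing(rep78)       // excludes them
```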
Data Manipulation (1/2)
Generating new variables: The generate command is used to create a
new variable. Generate can create a new variable that is an algebraic
expression of other variables.
generate newvar = exp [where exp is an algebraic expression]
To change the contents of an existing variable you must use the
replace command
replace oldvar = exp
Example: to create a new variable agerange from an existing variable
age (from 0 to 45)
gen agerange = . if age<16 [where . is a missing value]
replace agerange=1 if 16<=age & age<25
replace agerange=2 if 25<=age & age<=45
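To see generate and replace at work without needing a data file, the sketch below types in a tiny data set by hand. The ages and the extra variable age_sq are made up for illustration.

```stata
clear
input age
10
16
24
25
44
end

generate agerange = .                          // start with all values missing
replace agerange = 1 if age >= 16 & age < 25
replace agerange = 2 if age >= 25 & age < 45

generate age_sq = age^2                        // algebraic expression of another variable

list
```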
Data Manipulation (2/2)
Example 2: to create a dummy variable age16 identifying all 16-year-olds
in the dataset
gen age16=0
replace age16=1 if age==16 [note the == to test for equality; a single = assigns a value]
Drop: this command drops the selected variables and/or observations
e.g.: to drop var1: drop var1
To drop all data in memory: drop _all
To drop observations 5 to 10: drop in 5/10
Keep: this command keeps only the specified variables and/or
observations
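keep works like drop in reverse. A minimal sketch on the auto example data set that ships with Stata:

```stata
sysuse auto, clear               // example data set shipped with Stata

keep make price mpg foreign      // keep only these four variables
drop in 1/5                      // drop the first five observations
keep if foreign == 1             // keep only the foreign cars
drop mpg                         // drop a single variable

describe, short                  // 3 variables remain
```

Both commands change only the data in memory; the file on disk is untouched until you save.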
Stata command syntax
The general syntax in Stata is:
[by varlist:] command [varlist] [=exp] [if exp] [in range]
[, options]
The brackets denote optional elements
If no variable is specified, the command is applied to all variables
The if and in qualifiers restrict the command to a specific subset of the data
e.g.: in 5/10: observations 5 to 10
e.g.: if var1>10; if var1!=.
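Each element of the general syntax can be seen in a concrete command. The sketch below uses the auto example data set; the variable cheap is made up for illustration.

```stata
sysuse auto, clear                  // example data set shipped with Stata

* command varlist
summarize price mpg

* command varlist if exp
summarize price if foreign == 1

* command varlist in range
list make price in 5/10

* command varlist, options
summarize price, detail

* by varlist: command   (the data must be sorted by the by-variable)
sort foreign
by foreign: summarize price

* command newvar = exp if exp
generate cheap = 1 if price < 5000
```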
Working with graphs
Use the graph button to open a Graph window
Draw a scatter plot of the variables yvar1 yvar2 (y-axis) against xvar:
scatter yvar1 yvar2 ... xvar
Draw a line graph, i.e. a scatter plot with connected points:
line yvar1 yvar2 ... xvar
Draw a histogram of the variable var:
histogram var
Draw a scatter plot with regression line:
scatter yvar xvar || lfit yvar xvar
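The graph commands above can be tried on two example data sets that ship with Stata: auto (cross-section) and uslifeexp (yearly series). This is only a sketch; each command replaces the previous graph in the Graph window.

```stata
sysuse auto, clear                       // example data set shipped with Stata

scatter mpg weight                       // scatter plot: mpg (y) against weight (x)
histogram price                          // histogram of one variable
scatter mpg weight || lfit mpg weight    // scatter plot with fitted regression line

sysuse uslifeexp, clear                  // yearly series, also shipped with Stata
line le_male le_female year              // two y-variables against one x-variable
```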
Transformation in Stata
Command/Syntax          Task
var^2                   Square
var^3                   Cube
sqrt(var)               Square root
log(var)                Natural logarithm
1/var                   Reciprocal
abs(var)                Absolute value
max(x1,...,xn)          Maximum of x1,...,xn
gen varlag=var[_n-1]    Taking lags
normal(x)               Cumulative standard normal distribution (formerly norm(x))
normalden(x)            Standard normal density (formerly normden(x))
normalden(x,m,s)        Normal density with mean m and sd s (formerly normden(x,m,s))
invnormal(p)            Inverse cumulative standard normal distribution (formerly invnorm(p))
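A few of these transformations are applied below to a small hand-typed data set. The variable names are made up. The sketch uses normal() and invnormal(), the current names of the functions that older Stata releases called norm() and invnorm(). It also shows the L. operator as an alternative way of taking lags once the data have been declared as a time series.

```stata
clear
input t x
1  4
2  9
3 16
4 25
end

generate x_sq   = x^2            // square
generate x_root = sqrt(x)        // square root
generate x_ln   = log(x)         // natural logarithm
generate x_inv  = 1/x            // reciprocal
generate x_lag  = x[_n-1]        // lag: missing in the first observation

* Alternative for lags: declare the time variable and use the L. operator
tsset t
generate x_lag2 = L.x

display normal(1.96)             // cumulative standard normal
display invnormal(0.975)         // inverse cumulative standard normal
list
```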
Useful regression commands
Command/Syntax        Task
regress y x1 x2 x3    Run linear regression of dependent variable y on independent
                      variables x1, x2 and x3. By default, the constant is included
rreg                  Robust regression
logit                 Logit analysis
probit                Probit analysis
glm                   Generalized linear model
predict               Calculate predictions or residuals after estimation
test                  Test linear hypotheses on parameters
mfx                   Marginal effects (superseded by margins from Stata 11)
ivreg                 Instrumental variable regression (superseded by ivregress from Stata 10)
qreg                  Quantile regression
Useful regression commands
Command/Syntax     Task
tobit              One and two limit Tobit model
ologit, oprobit    Ordered logit and probit models
mlogit             Multinomial logit model
poisson            Poisson regression
arima              Box-Jenkins models
arch               Models of autoregressive conditional heteroskedasticity
xtreg              Random effects estimator
xtreg, fe          Fixed effects estimator
xtlogit            Panel data logit model
xtprobit           Panel data probit model
var                Vector autoregressive model
xtivreg            Instrumental variable panel data estimator
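A typical sequence using a few of the commands in these tables might look like the sketch below, run on the auto example data set. The choice of regressors is arbitrary, price_hat and resid are made-up names, and margins is used for marginal effects on the assumption that Stata 11 or later is installed.

```stata
sysuse auto, clear                      // example data set shipped with Stata

regress price mpg weight foreign        // linear regression; constant included by default

predict price_hat                       // fitted values (default option xb)
predict resid, residuals                // residuals

test mpg = 0                            // test one linear restriction
test mpg weight                         // joint test that both coefficients are zero

regress price mpg weight foreign, vce(robust)   // heteroskedasticity-robust standard errors

logit foreign mpg weight                // logit model for a binary outcome
margins, dydx(*)                        // average marginal effects (Stata 11 or later)
```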
Test commands
Command/Syntax                          Task
ttest                                   t test on equality of means
ttest varname [if exp], by(groupvar)    Test of the hypothesis that varname has the same mean
                                        within the two groups defined by the dummy variable groupvar
test                                    Test linear hypotheses on the parameters
lrtest                                  Likelihood ratio test after estimation
lincom                                  Linear combinations of the parameters (use testnl for nonlinear hypotheses)
dfgls                                   DF-GLS unit root test
wntestq                                 Portmanteau (Q) test for white noise
pperron                                 Phillips-Perron unit root test
dfuller                                 Augmented Dickey-Fuller unit root test
dwstat                                  Durbin-Watson d statistic
durbina                                 Durbin’s alternative test for serial correlation
bgodfrey                                Breusch-Godfrey test for higher-order serial correlation
vargranger                              Granger causality tests after var/svar
archlm                                  Engle’s LM test for the presence of autoregressive conditional heteroskedasticity
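Some of these tests are illustrated in the sketch below. The first part uses the auto example data set. The second part simulates a hypothetical random walk, because the time-series tests require data declared with tsset. The seed, sample size and lag length are arbitrary choices.

```stata
sysuse auto, clear                    // example data set shipped with Stata

ttest mpg, by(foreign)                // equal means of mpg for domestic and foreign cars?
ttest mpg == 20                       // one-sample test against a fixed value

regress price mpg weight
test mpg = weight                     // linear hypothesis on the coefficients
lincom mpg + weight                   // estimate and test a linear combination
testnl _b[mpg]/_b[weight] = 1         // nonlinear hypothesis

* Time-series tests need tsset data; a hypothetical random walk is simulated here
clear
set seed 12345
set obs 200
generate t = _n
tsset t
generate e = rnormal()                // white-noise shocks
generate y = sum(e)                   // random walk: running sum of the shocks
dfuller y, lags(2)                    // augmented Dickey-Fuller unit root test
wntestq e                             // portmanteau test for white noise
```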