Data Analysis Using Stata

The document provides a comprehensive guide on data analysis using STATA, detailing steps to import data, perform basic operations, and generate statistical analyses. It covers commands for data manipulation, summary statistics, correlation, regression analysis, and graphical representations. Additionally, it explains the interpretation of regression results, including coefficients, R-squared values, and significance levels.

Uploaded by

lydia

TOPIC 2: DATA ANALYSIS USING STATA

Before starting to work with STATA, make sure you have the data you want to work with, preferably in an Excel spreadsheet.

For example, the STATA folder contains an Excel file named "Data on GPA, TUCE, PSI and GRADE". This data set consists of 4 variables and 32 observations.

To start STATA, open the STATA folder provided, then double-click the application. This opens the STATA interface, which has four windows:

- Review window
- Results window
- Variables window
- Command window

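To start the process of data analysis, open a log by clicking File > Log > Begin. Stata will then ask you to provide a name for the log file, say ANALYSIS 1. The log records the commands and results that appear in the Results window, so you can review them later.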
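The same log can be opened from a do-file. The sketch below is a minimal example: the file name analysis1 is only an illustration, and the text option (a choice, not a requirement) writes a plain-text .log file instead of Stata's default .smcl format.

```stata
capture log close                      // close any log that is already open
log using "analysis1", text replace    // start recording to analysis1.log
* ... commands such as describe, summarize and regress go here ...
log close                              // stop recording and close the file
```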

Now, minimize the Stata application, then open the Excel file containing the data on gpa, tuce, psi and grade. Copy the data from Excel (you can close the Excel file after copying), then maximize the Stata application. In the Command window, type edit and press Enter. This opens the Data Editor, Stata's spreadsheet. Paste your data into the cell highlighted in blue (the top-left cell of the spreadsheet), then close the Data Editor. The Results window reports what was pasted, here "6 variables and 32 observations have been pasted into the data editor". The Review window keeps a history of all the commands you have run, which is useful for replication. Finally, the Variables window lists the variables you are working with. With the data pasted into the Data Editor, you are ready to begin the analysis.
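Instead of copying and pasting, the spreadsheet can be read directly with import excel (available in Stata 12 or later). This sketch assumes the workbook is saved as gpa_tuce_psi_grade.xlsx (a hypothetical file name) in the current working directory and that its first row holds the variable names.

```stata
* Hypothetical file name; replace it with the name of your own workbook
import excel using "gpa_tuce_psi_grade.xlsx", firstrow clear
describe, short        // number of observations and variables read in
list in 1/5            // first five rows, to check the import
```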

However, the makers of Stata have also installed some example data sets to aid teaching and training. Therefore, instead of using our data on gpa, tuce, psi and grade, we will use one of these installed data sets. To discard the data we have just entered, type clear in the Command window and press Enter. Whenever you type a command in the Command window, you must press Enter to execute it.

One of the best-known example data sets installed with Stata is the 1978 Automobile Data, which describes various automobiles and their characteristics as of 1978. To load it, type sysuse auto in the Command window and press Enter (the command use auto works only if a copy of auto.dta is in your current working directory). Now check the Variables window. You will see the variables make, price, mpg, rep78, headroom, trunk, weight, length, turn, displacement, gear_ratio and foreign. Thus, we have 12 variables.

To view the data, type browse in the Command window and press Enter. The Data Browser displays 12 variables and 74 observations.
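As a do-file, loading and viewing the example data might look like the sketch below. browse opens a window, so that line is meant for interactive sessions.

```stata
clear                  // remove any data currently in memory
sysuse auto            // load the 1978 Automobile Data shipped with Stata
describe, short        // reports 74 observations and 12 variables
browse                 // open the Data Browser (read-only spreadsheet view)
```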

To describe the data, type describe in the Command window and press Enter. A description of all your variables appears in the Results window (the dark screen). make is the make and model of the car; price is the price of the car; mpg is mileage in miles per gallon; rep78 is the repair record as of 1978; headroom is headroom in inches; trunk is trunk space in cubic feet; weight is weight in pounds; length is length in inches; turn is the turn circle in feet; displacement is engine displacement in cubic inches; gear_ratio is the gear ratio; and finally, foreign is a dummy (indicator) variable for car type, defined as 1 if the car is foreign and 0 if the car is domestic.

From the output, notice the storage type column: some variables are string variables (str), others are integer variables (int), and others are float variables. A string variable holds text rather than numbers. Thus make is str18, which means it can hold names of up to 18 characters. price, mpg, rep78, trunk, weight, length, turn and displacement are int, so they hold whole numbers. headroom and gear_ratio are float variables, which means their values can have decimal places. foreign is stored as a byte, Stata's smallest integer storage type, which is sufficient because foreign only takes the values 0 and 1. Note that byte is only a storage type: foreign is a dummy (indicator) variable because of the values it takes, not because it is stored as a byte.
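The storage types can also be checked with confirm, which stops with an error if a variable does not have the stated type. The sketch below encodes the types described above for auto.dta; the assert line shows that foreign is a dummy because of its values.

```stata
sysuse auto, clear
confirm string variable make
confirm int variable price mpg rep78 trunk weight length turn displacement
confirm float variable headroom gear_ratio
confirm byte variable foreign
assert inlist(foreign, 0, 1)     // every car is coded 0 (domestic) or 1 (foreign)
tabulate foreign                 // number of cars in each category
```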

To get summary statistics for the data, type summarize in the Command window and press Enter. The summary statistics show the number of observations, the mean, the standard deviation, and the minimum and maximum values. You can copy these statistics from Stata and paste them into your Word document to interpret the results (you may wish to review what you learnt in statistics or econometrics). The results are as follows:

Table 1: Summary Statistics for 1978 Automobile Data

    Variable |  Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       price |   74    6165.257    2949.496      3291      15906
         mpg |   74     21.2973    5.785503        12         41
       rep78 |   69    3.405797    .9899323         1          5
    headroom |   74    2.993243    .8459948       1.5          5
-------------+-----------------------------------------------------
       trunk |   74    13.75676    4.277404         5         23
      weight |   74    3019.459    777.1936      1760       4840
      length |   74    187.9324    22.26634       142        233
        turn |   74    39.64865    4.399354        31         51
displacement |   74    197.2973    91.83722        79        425
-------------+-----------------------------------------------------
  gear_ratio |   74    3.014865    .4562871      2.19       3.89
     foreign |   74    .2972973    .4601885         0          1

Source: Author
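summarize also stores its results, so they can be reused instead of retyped. The sketch below prints the stored mean, standard deviation and count for price, then shows why rep78 has only 69 observations in Table 1.

```stata
sysuse auto, clear
summarize price
display "mean = " r(mean) "   sd = " r(sd) "   N = " r(N)
summarize rep78
display "cars with rep78 missing: " (_N - r(N))
```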

For more detailed summary statistics (percentiles, variance, skewness and kurtosis), type summarize, detail in the Command window and press Enter. If you want detailed summary statistics for only one variable, say price, type summarize price, detail.
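For a compact table of chosen statistics, tabstat is a common alternative. The sketch below (the variables and statistics are chosen only as an example) runs tabstat first, then shows that summarize, detail stores the median as r(p50).

```stata
sysuse auto, clear
tabstat price mpg weight, statistics(n mean sd p50 min max) columns(statistics)
summarize price, detail       // percentiles, variance, skewness and kurtosis
display "median price = " r(p50) "   skewness = " r(skewness)
```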

STATA also allows the user to generate new variables from the data set, such as products, squares, square roots, logarithms and reciprocals (press Enter after each command):

- To create the product of mpg and weight, the command is: generate productmpgweight = mpg * weight
- To create the square of a variable (say mpg), the command is: generate squarempg = mpg * mpg
- To create the square root of a variable (say price), the command is: generate sqrootprice = price^0.5
- To create the natural logarithm of a variable (say headroom), the command is: generate logheadroom = ln(headroom)
- To create the reciprocal of a variable (say mpg), the command is: generate reciprocalmpg = 1/mpg
- To see your new variables, type browse in the Command window. Notice that the spreadsheet now contains the new variables, and they are also listed in the Variables window.
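Collected in a do-file, the commands above read as follows. Starting from a fresh copy of the data with sysuse auto, clear avoids the error Stata gives when generate is asked to create a variable that already exists.

```stata
sysuse auto, clear
generate productmpgweight = mpg * weight   // product of two variables
generate squarempg = mpg * mpg             // square
generate sqrootprice = price^0.5           // square root; sqrt(price) is equivalent
generate logheadroom = ln(headroom)        // natural log; missing if headroom <= 0
generate reciprocalmpg = 1/mpg             // reciprocal; missing if mpg == 0
summarize productmpgweight squarempg sqrootprice logheadroom reciprocalmpg
```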

Stata can also produce graphs, including scatter plots, line graphs, bar graphs and pie charts.

- To create a scatter plot of price against mpg, the command is: scatter price mpg

Figure 1: Scatter plot between price and mpg

Source: Author

- To create a line graph of price against mpg, the command is: line price mpg. Adding the sort option (line price mpg, sort) joins the points in order of mpg rather than in the order the cars appear in the data.
- To create a bar graph, the command is: graph bar price mpg. Note that this draws one bar for the mean of price and one bar for the mean of mpg; it does not show the relationship between the two variables.
- To create a pie chart, the command is: graph pie price mpg. Each slice represents the sum of one variable, so this chart is only meaningful when the variables are parts of a common total.
- Repeat the above procedure using more than two variables.
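A sketch of the graph commands as a do-file. The name() option keeps each graph in its own window; the graph names, the fitted line from lfit, and the over(foreign) versions of the bar and pie charts are suggested variations, not part of the original exercise.

```stata
sysuse auto, clear
scatter price mpg, name(fig1, replace)                            // Figure 1
twoway (scatter price mpg) (lfit price mpg), name(fig1b, replace) // add OLS fitted line
line price mpg, sort name(fig2, replace)                          // points joined in mpg order
graph bar (mean) price, over(foreign) name(fig3, replace)         // mean price by car type
graph pie, over(foreign) name(fig4, replace)                      // share of cars by car type
```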

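Stata can also perform correlation and regression analysis. For example, to correlate price and mpg, type correlate price mpg in the Command window and press Enter. The correlation coefficient between price and mpg is -0.4686, a moderate negative correlation: cars with higher mileage tend to be cheaper. Also try correlate price mpg rep78 weight length foreign. What can you say about the correlation coefficients given? Note that correlate uses only the observations with no missing values in any of the listed variables, so this second table is based on 69 cars (rep78 is missing for 5 cars), and the correlation between price and mpg changes slightly to -0.4559.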
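correlate stores the coefficient in r(rho) and the number of observations in r(N). The sketch below also uses pwcorr, which drops missing values pair by pair, to contrast with the casewise deletion that correlate applies.

```stata
sysuse auto, clear
correlate price mpg
display "r = " %6.4f r(rho) "   N = " r(N)
pwcorr price mpg rep78, obs sig                   // pairwise: each pair uses all cars available for it
correlate price mpg rep78 weight length foreign   // casewise: only the 69 complete cars
display "N = " r(N)
```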

Stata also performs regression analysis, which estimates the effect of the independent variables on the dependent variable. The command is regress, followed by the dependent variable and then the list of independent variables. For example, type regress price mpg rep78 weight length foreign and press Enter.
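regress uses only the cars with no missing value in any variable of the model, and it stores its results in e(). The sketch below counts the complete cases first (69, because rep78 is missing for 5 cars) and then lists the stored results that reproduce the numbers in the output.

```stata
sysuse auto, clear
* How many cars have no missing value in any of the model variables?
count if !missing(price, mpg, rep78, weight, length, foreign)
regress price mpg rep78 weight length foreign
ereturn list        // stored results such as e(N), e(mss), e(rss), e(r2), e(V)
```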

correlate price mpg
(obs=74)

             |    price      mpg
-------------+------------------
       price |   1.0000
         mpg |  -0.4686   1.0000

correlate price mpg rep78 weight length foreign
(obs=69)

             |    price      mpg    rep78   weight   length  foreign
-------------+------------------------------------------------------
       price |   1.0000
         mpg |  -0.4559   1.0000
       rep78 |   0.0066   0.4023   1.0000
      weight |   0.5478  -0.8055  -0.4003   1.0000
      length |   0.4425  -0.8037  -0.3606   0.9478   1.0000
     foreign |  -0.0174   0.4538   0.5922  -0.6460  -0.6110   1.0000

regress price mpg rep78 weight length foreign

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  5,    63) =   15.90
       Model |   321789308     5  64357861.7           Prob > F      =  0.0000
    Residual |   255007650    63  4047740.48           R-squared     =  0.5579
-------------+------------------------------           Adj R-squared =  0.5228
       Total |   576796959    68  8482308.22           Root MSE      =  2011.9

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -26.01325   75.48927    -0.34   0.732    -176.8665      124.84
       rep78 |   244.4242    318.787     0.77   0.446    -392.6208    881.4691
      weight |   6.006738    1.03725     5.79   0.000      3.93396    8.079516
      length |  -102.2199   34.74826    -2.94   0.005    -171.6587   -32.78102
     foreign |   3303.213   813.5921     4.06   0.000     1677.379    4929.047
       _cons |   5896.438   5390.534     1.09   0.278    -4875.684    16668.56
------------------------------------------------------------------------------

Covariance matrix of coefficients of regress model

        e(V) |        mpg       rep78      weight      length     foreign       _cons
-------------+------------------------------------------------------------------------
         mpg |  5698.6301
       rep78 | -6545.3892   101625.14
      weight |  19.667013   .94772928   1.0758867
      length |  630.02684  -1456.3491  -28.839384   1207.4416
     foreign |   16171.29  -133572.57   209.84955   2564.5577   661932.17
       _cons | -282211.05   105230.53   1682.2409  -149140.82  -1209971.1    29057853

Having obtained the regression results, we may also wish to obtain the variance-covariance matrix of the estimated coefficients. To do so, type estat vce in the Command window and press Enter (older releases of Stata used the command vce). The matrix derives its name from its structure: the elements on the main diagonal are the VARIANCES of the estimated coefficients, whereas the elements off the main diagonal are the COVARIANCES between pairs of estimated coefficients.
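In a do-file, the matrix can be displayed and also copied from the stored result e(V). In this sketch the matrix name V is arbitrary; its first diagonal element is the variance of the mpg coefficient.

```stata
sysuse auto, clear
quietly regress price mpg rep78 weight length foreign
estat vce                  // variance-covariance matrix of the coefficients
estat vce, correlation     // the same matrix rescaled to correlations
matrix V = e(V)
display "variance of the mpg coefficient = " el(V, 1, 1)
```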

The first (top-left) part of the regression output is the ANOVA table. It shows the SOURCE (model, residual and total), the SUM OF SQUARES (SS), the DEGREES OF FREEDOM (df) and the MEAN SQUARES (MS). The top-right panel reports the number of observations, the F statistic and its probability value, R-squared, adjusted R-squared and the root mean square error.

The lower table provides the regression coefficients, their standard errors, the t statistics, the probability values and the 95 percent confidence intervals.

The sum of squares for the model is 321,789,308. This is also known as the explained sum of squares (ESS). The sum of squares for the residual is 255,007,650, otherwise known as the residual sum of squares (RSS). The total sum of squares (TSS) is 576,796,959. Notice that 321,789,308 + 255,007,650 = 576,796,958, which matches the TSS up to rounding in the displayed figures. Hence, ESS + RSS = TSS.
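regress stores the model and residual sums of squares as e(mss) and e(rss). The total sum of squares is not stored directly, so this sketch computes it over the estimation sample from the sample variance of price, using TSS = (n – 1) × Var(price).

```stata
sysuse auto, clear
quietly regress price mpg rep78 weight length foreign
display "ESS = " %12.0f e(mss) "   RSS = " %12.0f e(rss)
quietly summarize price if e(sample)    // the same 69 cars used by regress
display "TSS = " %12.0f (r(Var)*(r(N) - 1)) "   ESS + RSS = " %12.0f (e(mss) + e(rss))
```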

The degrees of freedom for the model are 5. The formula for this is k – 1, where k is the number of parameters being estimated, including the constant (five slope coefficients plus _cons). Hence, k – 1 = 6 – 1 = 5. The degrees of freedom for the residual are 63. The formula for this is n – k, where n is the number of observations and k is defined as before. Hence, n – k = 69 – 6 = 63. The total degrees of freedom are 68. The formula for this is n – 1. Hence, n – 1 = 69 – 1 = 68. Alternatively, 5 + 63 = 68.
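The degrees of freedom are stored as e(df_m) and e(df_r). This sketch uses e(rank), the number of estimated parameters, as k; with a constant term and no dropped variables, it matches k in the formulas above.

```stata
sysuse auto, clear
quietly regress price mpg rep78 weight length foreign
* e(rank) = k, the number of estimated parameters (5 slopes + the constant)
display "k = " e(rank) "   model df = " e(df_m) "   residual df = " e(df_r) "   total df = " (e(N) - 1)
```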

The mean square is the ratio of a sum of squares to its degrees of freedom, that is, MS = SS/df. The mean square for the model is therefore 321,789,308/5 ≈ 64,357,861.7; the mean square for the residual is 255,007,650/63 = 4,047,740.48; and the total mean square is 576,796,959/68 = 8,482,308.22.
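The mean squares can be reproduced from the stored results; as a check, their ratio should equal the F statistic stored in e(F). The scalar names ms_model and ms_resid are illustrative.

```stata
sysuse auto, clear
quietly regress price mpg rep78 weight length foreign
scalar ms_model = e(mss)/e(df_m)
scalar ms_resid = e(rss)/e(df_r)
display "MS model = " %12.2f scalar(ms_model) "   MS residual = " %12.2f scalar(ms_resid)
display "F = " %6.2f (scalar(ms_model)/scalar(ms_resid)) "   Stata's F = " %6.2f e(F)
```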

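The model is estimated on 69 observations (the 5 cars with a missing rep78 are dropped). The F statistic, F(5, 63) = 15.90, is the ratio of the model mean square to the residual mean square (64,357,861.7/4,047,740.48 ≈ 15.90) and tests the null hypothesis that all five slope coefficients are jointly zero. Its probability value is reported as Prob > F = 0.0000, so the null hypothesis is rejected and the model is statistically significant at the 1 percent level. The lower the probability value, the stronger the evidence against the null hypothesis.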
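Prob > F is the upper-tail probability of the F distribution with (5, 63) degrees of freedom, which Stata's Ftail() function computes. The test command repeats the same joint hypothesis test explicitly; its F statistic should match the one in the header of the regression output.

```stata
sysuse auto, clear
quietly regress price mpg rep78 weight length foreign
display "Prob > F = " %6.4f Ftail(e(df_m), e(df_r), e(F))
test mpg rep78 weight length foreign    // H0: all five slope coefficients are zero
```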

The goodness of fit (R-squared) of the model is reported as 0.5579. R-squared is the ratio of the explained sum of squares (ESS) to the total sum of squares (TSS). Thus, R-squared = ESS/TSS = 321,789,308/576,796,959 = 0.5579. Hence mpg, rep78, weight, length and foreign jointly explain 55.79 percent of the variation in price.
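A one-line check of R-squared = ESS/TSS from the stored results, using TSS = ESS + RSS:

```stata
sysuse auto, clear
quietly regress price mpg rep78 weight length foreign
display "ESS/TSS = " %6.4f (e(mss)/(e(mss) + e(rss))) "   stored R-squared = " %6.4f e(r2)
```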

Adjusted R-squared is reported as 0.5228. It adjusts R-squared for the degrees of freedom used up by the regressors, so once degrees of freedom are taken into account, mpg, rep78, weight, length and foreign explain about 52.28 percent of the variation in price.

The formula for adjusted R-squared is: Adj R² = 1 – (1 – R²) × [(n – 1)/(n – k)]. Thus, Adj R² = 1 – (1 – 0.5579) × [(69 – 1)/(69 – 6)] = 0.5228.
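The formula can be checked against the stored value e(r2_a). In this sketch n is e(N) and k is e(rank); the scalar name adjr2 is illustrative.

```stata
sysuse auto, clear
quietly regress price mpg rep78 weight length foreign
* n = e(N) = 69, k = e(rank) = 6
scalar adjr2 = 1 - (1 - e(r2))*(e(N) - 1)/(e(N) - e(rank))
display "formula = " %6.4f scalar(adjr2) "   stored adj R-squared = " %6.4f e(r2_a)
```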

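Root MSE, the root mean square error, is 2011.9. It is the square root of the mean square of the residual: Root MSE = √4,047,740.48 ≈ 2011.9. It estimates the standard deviation of the error term and is measured in the same units as price.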
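The same calculation from the stored results, compared with e(rmse):

```stata
sysuse auto, clear
quietly regress price mpg rep78 weight length foreign
display "sqrt(RSS/df) = " %8.1f sqrt(e(rss)/e(df_r)) "   stored Root MSE = " %8.1f e(rmse)
```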

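A coefficient measures how a one-unit change in an explanatory variable affects the dependent variable, holding all other factors constant. For example, the coefficient of mpg is –26.01325. This means that if the mpg of a car increases by one mile per gallon, the price of the car is predicted to fall by about 26.01 units (dollars), holding all other factors constant. However, with a probability value of 0.732, this coefficient is not statistically significant. The rest of the coefficients are interpreted in a similar way; because foreign is a dummy variable, for example, its coefficient means that foreign cars are predicted to cost about 3,303.21 more than otherwise similar domestic cars.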
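The phrase "holding all other factors constant" can be made concrete: raise mpg by one for every car in the estimation sample, leave the other regressors unchanged, and compare fitted prices. In this sketch the names p0, p1, change and maxdev are illustrative; every car's fitted price should change by exactly _b[mpg].

```stata
sysuse auto, clear
quietly regress price mpg rep78 weight length foreign
preserve                                   // keep a copy of the original data
predict double p0 if e(sample), xb         // fitted price at the observed values
replace mpg = mpg + 1                      // one more mile per gallon; other variables unchanged
predict double p1 if e(sample), xb         // fitted price after the change
generate double change = p1 - p0
summarize change                           // minimum and maximum both equal _b[mpg]
scalar maxdev = max(abs(r(min) - _b[mpg]), abs(r(max) - _b[mpg]))
display "_b[mpg] = " _b[mpg] "   largest deviation = " scalar(maxdev)
restore                                    // bring back the original data
```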

The second column provides the standard errors (Std. Err.) of the coefficients. Each standard error is the square root of the estimated variance of the coefficient, and these variances are the elements on the main diagonal of the variance-covariance matrix. Check that the square roots of the diagonal values reproduce the reported standard errors; for example, √5698.6301 ≈ 75.489, the standard error of mpg.
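The check suggested above can be automated: the loop below prints, for each coefficient, the square root of its diagonal element of e(V) next to the standard error that Stata stores in _se[]. The two columns should agree. The matrix name V and local i are illustrative.

```stata
sysuse auto, clear
quietly regress price mpg rep78 weight length foreign
matrix V = e(V)
foreach v in mpg rep78 weight length foreign _cons {
    local i = colnumb(V, "`v'")
    display "`v'" _col(12) %10.4f sqrt(el(V, `i', `i')) _col(26) %10.4f _se[`v']
}
```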

The third column provides the t statistics. The t value is the ratio of a coefficient to its standard error, that is, t = coefficient/std. err. For example, the t value for mpg = –26.01325/75.48927 = –0.34, and similarly for the remaining t values.
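The t statistics can be recomputed from the stored coefficients _b[] and standard errors _se[]:

```stata
sysuse auto, clear
quietly regress price mpg rep78 weight length foreign
foreach v in mpg rep78 weight length foreign _cons {
    display "`v'" _col(12) "t = " %6.2f (_b[`v']/_se[`v'])
}
```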

The next column gives the probability values (P>|t|), which help in determining the significance of the coefficients. For example, if p < 0.01, the coefficient is significant at the 1 percent level; if p < 0.05, it is significant at the 5 percent level; and if p < 0.10, it is significant at the 10 percent level. If p > 0.10, the coefficient is not significant at conventional levels. In this regression, weight, length and foreign are significant at the 1 percent level, whereas mpg, rep78 and the constant are not significant.
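Each p-value is the two-sided tail probability of the t distribution with e(df_r) = 63 degrees of freedom, which Stata's ttail() function computes. The sketch below adds stars following the 1, 5 and 10 percent rule above; the star convention and the scalar names tval and pval are illustrative choices (scalar() is used so the names cannot be mistaken for abbreviated variable names).

```stata
sysuse auto, clear
quietly regress price mpg rep78 weight length foreign
foreach v in mpg rep78 weight length foreign _cons {
    scalar tval = _b[`v']/_se[`v']
    scalar pval = 2*ttail(e(df_r), abs(scalar(tval)))    // two-sided p-value
    local stars = cond(scalar(pval) < 0.01, "***", cond(scalar(pval) < 0.05, "**", cond(scalar(pval) < 0.10, "*", "")))
    display "`v'" _col(12) "p = " %5.3f scalar(pval) "  `stars'"
}
```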

Stata Resources

The following resources are useful for data analysis using Stata:

(i) Getting Started with Stata (GSW)

(ii) Stata Users Guide (U)

(iii) Stata Base Reference Manual (R)

(iv) Stata Data Management Reference Manual (D)

(v) Stata Programming Reference Manual (P)

(vi) Stata Time Series Reference Manual (TS)

(vii) Stata Quick Reference and Index (I)

(viii) Stata Website – www.stata.com

(ix) Stata demonstration videos on YouTube.
