Stata Notebook

The document provides an overview of using Stata for data analysis, including general syntax for commands, troubleshooting with the 'help' command, and exploring datasets with commands like 'describe' and 'codebook'. It also covers creating and labeling variables, generating new variables, and editing data, along with practical examples for visualizing data through bar charts and histograms. Additionally, it explains how to categorize continuous data and includes questions and answers related to the use of bar charts and histograms.

Uploaded by

Emmanuel


General syntax

There are a number of ways to ‘get to know’ your dataset. As you proceed
through this section, note that the general syntax used for commands in
Stata is as follows:

command {space} variable_name(s) {space} [if expression], {space} options

Note that after the command and the variable name, there is a comma
before any options are listed. Some commands can be abbreviated as well.
Conditional statements involving the word “if” come before the comma.
Keep this general syntax in mind as you work with the commands below.
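
As a minimal sketch of this pattern, the lines below use the auto example dataset that ships with Stata. This is an assumption made so the example can be run anywhere; it is not the dataset used in this practical. The condition comes before the comma and the option after it.

```stata
* Load the example dataset that ships with Stata (not the course dataset)
sysuse auto, clear

* command  variable  [if expression] , options
summarize price if foreign == 1, detail

* The same command with the command and option abbreviated
su price if foreign == 1, d
```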

Troubleshooting – how to find help

The ‘help’ command in Stata can be very useful to understand more about
the commands in Stata. Simply type help and the command you want more
information on and you will open a help window. For example, if you want
more information on the tabulate command, you can type:

help tabulate

You can also search for a command. For example, if you wanted to look for
general help for histograms, you can type:

search histogram

Exploring your data

The ‘describe’ command will give you the variable names and
their labels. You can look at the whole dataset or at specific variables. Try:

describe

describe bmi_grp4

The ‘codebook’ command provides a little more information about the
variables in the dataset, such as the minimum and maximum values and
information on missing data. Stata's own missing value for numeric variables
is a full stop (.), but a dataset may also use codes such as 99 or 0 to
represent missing values, so you need to be clear about how missing values
are coded before exploring your dataset. Try:

codebook bmi_grp4

You can also get a feel for the dataset by using the ‘list’ and ‘tabulate’
commands. Try looking at the variables ‘currsmoker’ and ‘frailty’:

tabulate currsmoker

tab currsmoker, missing

tab currsmoker frailty, row

tab currsmoker frailty, col
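
The text mentions ‘list’ without showing it, so here is a small sketch of ‘list’ next to ‘tabulate’. It assumes Stata's built-in auto dataset rather than the course data; rep78 is used because it contains some missing values.

```stata
sysuse auto, clear

* List the first five observations for two variables
list make rep78 in 1/5

* One-way table, with missing values shown as their own row
tabulate rep78, missing

* Two-way table with row percentages
tabulate rep78 foreign, row
```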

You can use the ‘if’ condition to view specific observations. Try browsing
the data for current smokers aged 60 to 70 years:

browse if currsmoker==1 & age_grp==1

Now try to tabulate frailty level among the current smokers aged 60-70 using
the if condition:

tab frailty if currsmoker==1 & age_grp==1

Notice that in Stata, if you want to test whether a variable is “equal to” some
value, you need to type two equals signs, like this: “==”. A single “=” is
used only to assign values, as in the generate command.
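
The sketch below shows a few common comparison and logical operators inside an ‘if’ condition. It assumes the built-in auto dataset, so the variable names (foreign, price, rep78) are not those of the course data.

```stata
sysuse auto, clear

* "==" tests equality; a single "=" is only used for assignment
count if foreign == 1

* "&" means and, "|" means or, "!=" means not equal to
count if foreign == 1 & price > 6000
count if rep78 == 1 | rep78 == 2
list make price if foreign != 1 & price > 12000
```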

Labelling variables

If you look at the ‘Variables’ window, you will notice that some of the
variables do not have labels, or the given name of the variable is not very
clear. The data might be easier to work with if you have a short description
(or label) of the variables. You can label variables using the label command.

General syntax:

label variable variable_name "label"

When adding a label to a variable, the command is ‘label variable’;
simply typing label is incorrect. For example:

label variable prior_cvd "Prior CVD"

If you look at the ‘Variables’ window, you should see that there now is a label
next to ‘prior_cvd’. Now look at this variable using either
the codebook or tab command:

codebook prior_cvd … (or tab prior_cvd)

You can see that ‘currsmoker’ is a binary variable that takes the value
0 or 1. These numbers are not very meaningful on their own; in fact, 0 = no
and 1 = yes. Therefore, we need to label the numeric values within the
variable as ‘yes’ or ‘no’. Labelling the values of a variable is a two-step
process. First, you must define the label, and then assign the label to the
variable. The general syntax is presented below:

label define label_name 0 "label1" 1 "label2" … [, add modify replace]

label values variable_name(s) label_name

For the currsmoker variable, we first use the ‘label define’ command
to create a value label called ‘smoke_lab’, which maps 0 to “no” and 1 to
“yes”. For example:

label define smoke_lab 0 "no" 1 "yes"

Next, we need to apply the new label (smoke_lab) to the currsmoker
variable. For example:

label values currsmoker smoke_lab

Now look at currsmoker (using either tab or codebook) and you should see that
the values 0 and 1 are now labelled as “no” and “yes”. You can also look at the
contents of a value label by typing:

label list label_name

For example, type: label list smoke_lab
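
Putting the labelling steps together, the following is a minimal sketch on the built-in auto dataset. The variable ‘expensive’ and the label name ‘yesno_lab’ are hypothetical names made up for this illustration.

```stata
sysuse auto, clear

* Hypothetical binary variable: 1 if the price is above 6,000, otherwise 0
generate byte expensive = price > 6000 if !missing(price)

* Describe the variable itself
label variable expensive "Price above 6,000"

* Step 1: define the value label; step 2: attach it to the variable
label define yesno_lab 0 "no" 1 "yes"
label values expensive yesno_lab

* Check the result
label list yesno_lab
tabulate expensive
```

Keeping the value label (yesno_lab) separate from the variable means the same label can be attached to any other 0/1 variable.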

Creating new variables – generate

The generate command allows you to create new variables. The general
syntax is:

generate new_variable = expression [if expression]

Try the following:

generate var1 = 1

generate var2 = 5

browse

You can also copy existing variables or generate new variables based on
existing data. For example:

generate age4 = age_grp

You can also use mathematical operators and functions such as + (add), -
(subtract), * (multiply), / (divide), ^ (to the power of), sqrt() (square
root), ln() (natural log) and exp() (exponential). For example:

generate var3 = var1+var2

gen var4 = var3*var2

These operations can be used to derive new variables. For example,
you could compute the total number of prior diseases a participant has been
diagnosed with:

gen prior_disease = prior_cvd + prior_t2dm + prior_cancer

tab prior_disease, m

There is also an extension to the generate command, called egen, which can
be useful (see help egen for more information).
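
As a brief, non-exhaustive sketch of what egen offers, the lines below use three of its functions on the built-in auto dataset (an assumption; the new variable names are hypothetical). One practical difference from generate is that adding variables with + gives a missing result whenever any component is missing, whereas rowtotal() treats missing components as 0.

```stata
sysuse auto, clear

* Mean price within each level of foreign, stored on every row
egen mean_price = mean(price), by(foreign)

* Number of missing values across the listed variables, per observation
egen n_missing = rowmiss(rep78 price)

* Sum with "+" is missing if rep78 is missing ...
generate sum_plus = rep78 + headroom

* ... whereas rowtotal() treats the missing component as 0
egen sum_row = rowtotal(rep78 headroom)

tabulate n_missing
list make rep78 headroom sum_plus sum_row if missing(rep78)
```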

Replace and recode

You can edit variables using the ‘replace’ and ‘recode’ commands. Please be
aware that there are multiple (correct) ways to recode and generate new
variables. You will see some examples in this practical, and more
examples in future sessions. Choosing a method to replace or recode
variables is generally a matter of personal preference, and it will rarely
matter which method you use, as several different commands will produce
the same result.

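Try using the ‘bmi’ variable to create a binary variable that indicates
who has obesity (a BMI of 30 or more) and who does not. Never recode
the original variable, in case you change your mind! Duplicate the variable
first, and then recode the copy. For example:

gen bmi2=bmi

recode bmi2 (30/max=1) (min/30=0)

The first matching rule is applied, so a BMI of exactly 30 is coded 1, and
non-integer values between 29 and 30 are coded 0 rather than being left
unchanged.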

Now compare ‘bmi’ and ‘bmi2’:

browse bmi bmi2

Here is another way to recode BMI:

gen bmi_bin = 1 if bmi_grp4<=2

replace bmi_bin = 0 if bmi_grp4>2

It is good practice to cross-tabulate your binary and categorical variables to
check your coding:

tab bmi_bin bmi_grp4, miss

Stata considers missing values to be higher than any number, so a condition
such as >2 or >4 is also true for missing values. Notice where the missing
values end up in the table above, and after running this code:

gen ldl2=ldlc

replace ldl2=0 if ldl2<=4

replace ldl2=1 if ldl2>4

tab ldl2

label define ldl 0 "Under 4" 1 "Over 4"

label values ldl2 ldl

tab ldl2, m
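
If you want missing values to stay missing, one common approach is to exclude them explicitly in the condition. The sketch below assumes the built-in auto dataset, where rep78 has some missing values; ‘rep_hi’ is a hypothetical variable name.

```stata
sysuse auto, clear

* Work on a copy so the original variable is untouched
generate rep_hi = rep78

* Without the !missing() check, missing values would be set to 1,
* because Stata treats missing as larger than any number
replace rep_hi = 0 if rep78 <= 3
replace rep_hi = 1 if rep78 > 3 & !missing(rep78)

* Missing values should still appear as missing here
tabulate rep_hi rep78, missing
```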

Note, as with many commands in Stata, there are alternative correct ways to generate our new
variable. One way would be to use the ‘recode’ command.

If you used the ‘recode’ command above, the recoded information is stored in the variable being
recoded, i.e. the original information stored in this variable is overwritten. This is the reason why
we created a copy of the variable first. We can skip this step, and make our code more efficient,
by using the “, gen()” option of the ‘recode’ command, which recodes the original variable (ldlc)
into a new variable (named ldl2_a here so that it does not clash with the existing ldl2), as shown below:

recode ldlc (min/4=0) (4/max=1), gen(ldl2_a)

The code can be made even more efficient by defining the value labels within the
‘recode’ command:

recode ldlc (min/4=0 "under 4") (4/max=1 "over 4"), gen(ldl2_b)
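
The same pattern can be tried on the built-in auto dataset (an assumption; ‘price_bin’ and the cut-off of 6000 are made up for illustration). When two rules share a boundary value, the first matching rule is used, so a price of exactly 6000 is coded 0 here.

```stata
sysuse auto, clear

* price is left unchanged; the recoded values go into price_bin
recode price (min/6000 = 0 "6000 or under") (6000/max = 1 "over 6000"), generate(price_bin)

* Cross-check the new coding against the original values
tabstat price, by(price_bin) statistics(n min max)
```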

Dropping variables

You can drop variables from the dataset if you no longer want to use them.
But once you do this, you cannot undo it, so be careful when using this
command. A variable can be dropped from the dataset by typing the
following command:

drop var1
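
While experimenting, one way to protect yourself is to take a temporary snapshot of the data with ‘preserve’ and bring it back with ‘restore’. A minimal sketch, assuming the built-in auto dataset:

```stata
sysuse auto, clear

* Take a temporary snapshot of the data in memory
preserve

drop price
describe

* Bring back the snapshot, including the dropped variable
restore
describe price
```

This only protects the data in memory during the current session or do-file; it is not a substitute for keeping the raw data file unchanged.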

Editing data – making changes in the Data Editor

Stata has a Data Editor, where you can see the data in your dataset
and make changes to it. To access the edit window, you can either
type edit or open it from the drop-down menu (Data > Data Editor >
Data Editor (Edit)). You can then click on the appropriate cell in the edit
window and change its value. HOWEVER, when data cleaning it is strongly
recommended that you save the relevant commands in a do-file (.do) and
run that file in each session. This keeps your original dataset intact in
case you make a mistake while editing, or need to recall a change you made
a long time ago, and it gives you a permanent record of the data cleaning
process. “Data cleaning” is the process of getting all the variables you
received in your raw dataset ready to be used in your analysis.
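
A do-file for data cleaning might look something like the sketch below. The file names (clean_auto.do, auto_clean.dta) and the 99 code are hypothetical, and the built-in auto dataset stands in for your own raw data file.

```stata
* clean_auto.do  (hypothetical file name)
* Run with:  do clean_auto.do

clear all
sysuse auto, clear        // stands in for: use "your_raw_data.dta", clear

* Each cleaning step is recorded here instead of being typed into the editor
replace rep78 = . if rep78 == 99        // example: treat a 99 code as missing
label variable rep78 "Repair record 1978"

* Save under a new name so the raw file is never overwritten
save "auto_clean.dta", replace
```

Because every step is written down, the whole cleaning process can be rerun from the raw data at any time.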

Practical on bar charts in Stata

Bar charts are a useful way of comparing groups by a particular
characteristic. We can tell Stata what summary statistic we wish to include in
the bar chart, for example, the frequency within each category of a variable,
or the mean of one variable within each level of another categorical variable.
For categorical variables, it can be useful to look at frequencies within each
level. To do this, we use the ‘graph bar’ command and include ‘(count),’
followed by ‘over([variable name])’.

The following code will present a bar chart comparing the frequencies within
each age group category:

graph bar (count), over (age_grp)

To look at percentages within each category:

graph bar (percent), over (age_grp)

Explore the above command with some variables within your dataset. We
can also look at summary statistics of a continuous variable within each level
of another, categorical variable. For example, the following code will produce
a bar chart that presents the mean of vitamin D serum levels within each
age group category:

graph bar vitd, over (age_grp)

To present the median of vitamin D by age group, you simply include
(median) after the command ‘graph bar’:

graph bar (median) vitd, over (age_grp)

It is also possible to present the bar chart with multiple categorical variables.
The following code will produce a bar chart presenting the mean vitamin D
by age groups and history of cardiovascular disease.

graph bar vitd, over (age_grp) over (prior_cvd)

To add a title to the y-axis, we can use the following code:

graph bar (mean) vitd, over(age_grp) ytitle("Mean vitamin D concentration")

To remove the y-axis labels or change their size, you can use the following code:

graph bar (mean) vitd, over(age_grp) ytitle("Mean vitamin D concentration") ylabel(, nolabels)

graph bar (mean) vitd, over(age_grp) ytitle("Mean vitamin D concentration") ylabel(, labels labsize(small))

If comparing multiple variables on one chart, it can be useful to change the
colour of the bars.

To do this, add the option ‘bar(1, fcolor([insert colour]))’:

graph bar (mean) vitd, bar(1, fcolor(black)) ytitle("Mean vitamin D concentration")

Try producing a number of different bar charts and play around with
changing different features.
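
The options above can be combined in one command. The sketch below assumes the built-in auto dataset rather than the course data, and the titles and colour are arbitrary choices. In a do-file, a long command can be continued over several lines by ending each line with ///.

```stata
sysuse auto, clear

* Frequency of cars in each repair-record category
graph bar (count), over(rep78) ytitle("Frequency")

* Mean price by repair record, split by domestic/foreign, with styling
graph bar (mean) price, over(rep78) over(foreign) ///
    ytitle("Mean price") title("Mean price by repair record") ///
    bar(1, fcolor(navy))
```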

Histograms

When you want to look at the distribution of a variable, rather than
comparing characteristics, you can use a histogram. A histogram can be
produced for a continuous or categorical variable, as long as it is
measured on an interval scale.

Type ‘histogram [variable name]’.

histogram sbp

If the variable is not continuous, type ‘, discrete’ afterwards:

histogram bmi, discrete

A histogram is often used to check whether a variable is normally
distributed. To add a normal distribution curve to the histogram, use the
following code:

histogram bmi, discrete normal

To adjust the number of bins, include ‘, bin ([number of bins])’

histogram sbp, bin (20)

histogram sbp, bin (10)
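
As a sketch of how these histogram options combine, the commands below use the built-in auto dataset (an assumption; price is treated as continuous and rep78 as discrete).

```stata
sysuse auto, clear

* Continuous variable: 10 bins, percentages on the y-axis, normal curve overlaid
histogram price, bin(10) percent normal title("Price")

* Discrete variable: one bar per value, frequencies on the y-axis
histogram rep78, discrete frequency xlabel(1(1)5)
```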

You can also add a title and labels to the x-axis:

histogram bmi, discrete normal title("Body Mass Index")

histogram bmi_grp4, discrete normal title("Body Mass Index") xlabel(1 "Underweight" 2 "Normal weight" 3 "Overweight" 4 "Obese")

It is also possible to show the percentage or frequency on a histogram. To do
this, amend the option at the end of the histogram command:

histogram bmi, discrete percent

histogram bmi, discrete frequency

Grouping continuous data

There are different ways you can group continuous data to create a
categorical variable. To do this, first generate a duplicate variable, so that
you are not altering the original. The examples below that create sbp_cat are
alternatives: if sbp_cat already exists, drop it first or choose a different
name.

1. ‘xtile’

If you want to create a new variable based on percentiles, the ‘xtile’ command
is useful. For example, if you wish to produce deciles of systolic blood
pressure:

xtile sbp10=sbp, nquantiles(10)

Or quartiles of systolic blood pressure:

xtile sbp4=sbp, nquantiles(4)

2. ‘cut’

If you want to create a variable with specific categories, you can use the
‘egen’ command with the ‘cut()’ function. The code below is an example of
creating a new categorical systolic blood pressure variable. The new variable
categories are <90 = low sbp; 90-<120 = normal sbp; 120-<130 = elevated sbp;
≥130 = high sbp.

egen sbp_cat=cut(sbp), at(0, 90, 120, 130, 231)

Note that the maximum systolic blood pressure recorded in this population is
230, so the final cut-off of 231 includes all values below 231. Each category
is coded with its lower bound (0, 90, 120 or 130).

3. ‘recode’

The recode() function works in a similar way to cut() above, except that each
observation is given the upper bound of its category, and a value equal to a
cut-off falls into the category below it (for example, an sbp of exactly 90 is
coded 90).

gen sbp_cat=recode(sbp, 90, 120, 130, 231)

4. ‘autocode’

The autocode() function creates evenly spaced categories of a continuous
variable:

gen [new var name]=autocode([original var name], [number of categories], [minimum], [maximum])

To create a new systolic blood pressure categorical variable, with 4 evenly
spaced categories between 0 and 230:

gen sbp_cat=autocode(sbp, 4, 0, 230)

You can use the tab and tabstat commands to check that your new categorical
variables contain the correct categories. Use the ‘label’ command to label the
variable and the categories in your new variable.
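
The four approaches can be compared side by side. This sketch assumes the built-in auto dataset and uses made-up cut-offs for price, with a different (hypothetical) variable name for each method so that none of them clash.

```stata
sysuse auto, clear

* 1. Quartiles of price (codes 1 to 4)
xtile price4 = price, nquantiles(4)

* 2. Fixed cut points; each category is coded with its lower bound
egen price_cut = cut(price), at(0, 5000, 10000, 16000)

* 3. recode() function; each category is coded with its upper bound
generate price_rec = recode(price, 5000, 10000, 16000)

* 4. Four equal-width categories between 0 and 16000
generate price_auto = autocode(price, 4, 0, 16000)

* Check the range of price inside each new category
tabstat price, by(price_cut) statistics(n min max)
tabulate price_rec price_auto
```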

Questions A1.3b: Which type of variable can you plot with a bar chart? When should you use a
histogram? Plot a histogram of total cholesterol and describe the distribution. Can you change the
number of bins used to plot the histogram? What is the effect of changing the number of bins? Split
total cholesterol into groups and make a bar chart of the number of participants in each cholesterol
group. Can you give this graph a title? Can you label the y axis and change the colour of the bars in the
chart?

Answers

Answer A1.3b.i: A bar chart can be used to compare the frequency and percentage of participants
within each level of a categorical variable. Bar charts can also be used to look at summary statistics of
continuous variables, but only within the levels of a categorical variable. Histograms should be used to
look at the distribution of data.

Answer A1.3b.ii: Total cholesterol is normally distributed.

histogram chol

Answer A1.3b.iii: Yes, using the bin() option. With too few bins it becomes difficult to identify the
distribution of the data.

histogram chol

histogram chol, bin(3)

Answer A1.3b.iv: Value labels can only be attached to whole numbers, so a category coded 7.5 cannot be
labelled. The irecode() function uses the same cut-offs as recode() but codes the categories 0, 1 and 2.

tab chol

gen chol_cat=irecode(chol, 5, 7.5)

label var chol_cat "Categories of cholesterol"

label define chol_cat 0 "Normal" 1 "High" 2 "Very high"

label values chol_cat chol_cat

tab chol_cat

graph bar (count), over(chol_cat) bar(1, fcolor(black)) ytitle("Frequency")
