
IS BF 2024 25 Week 1


Intro to Stata for Finance

Class 1 – Getting started

Dario Maimone Ansaldo Patti

M.Sc. in Banking & Finance

King’s Business School


Outline

Myself
Education & Academic Position
Publications
Current Research
Introduction
Why Stata?
References
Books
Websites
Introducing Stata
Program Layout
Getting started
Loading data in Stata
Main commands
Appendix
Myself
Education & Academic Position

▶ Education:
▶ B.Sc. in Law (University of Messina)
▶ M.Sc. in Economics (University of York)
▶ Ph.D. in Economics & Institutions (University of Messina)
▶ Ph.D. in Economics (University of Essex)

▶ Current Position
▶ Associate Professor of Economics (University of Messina)

▶ Previous Position
▶ Visiting Lecturer (School of Economics & Finance, Queen Mary
University of London)

Myself
Publications

▶ Main Publications:
▶ Book:
▶ Happiness and the Pursuit of Freedom (with S. Bavetta & P. Navarra),
Cambridge University Press, 2014.
▶ Selected Papers:
▶ Autonomy in Decision-Making and Freshmen’s Performance at
University: Evidence from Italy (with S. D’Arrigo, L. Leonida, E.
Muzzupappa & P. Navarra), Studies in Higher Education, 2024.
▶ Does Economic Liberalization Foster Corporate Investment? Theory and
Evidence from US and Canadian Firms (with A. Iona, L. Leonida, M.
Limosani & P. Navarra), Socio-Economic Planning Sciences, 2024.
▶ Freedom, Diversity and the Taste for Revolt (with A. Marino & P.
Navarra), Kyklos, 2021.
▶ A tale of soil and seeds: the external environment and entrepreneurial
entry (with D. Baglieri, M. Mudambi & P. Navarra), Small Business
Economics, 2016.
Myself
Current Research

▶ Current Research:

▶ Political Competition and Economic Development (with L. Leonida &
P. Navarra);
▶ Economic Freedom, Firm Investment and Financing Constraints.
Theory and Empirical Evidence (with A. Iona, L. Leonida & P.
Navarra);
▶ Regional Interdependence and the Nexus between Culture and Growth
(with L. Leonida & P. Navarra).

Introduction

▶ The aim of this workshop is to learn how to use Stata to carry out
empirical research.

▶ At the end of this module, students are expected to have a basic
knowledge of the software, enabling them to manage and analyze
data and to carry out empirical analysis. Students will be able to
estimate an econometric model and to comment upon the results.

▶ After a brief introduction to the main features of the software, we
will use real data to explore some capabilities of the software.

▶ Notice that, throughout the file, the words in orange are links,
which will redirect you to the relevant web page.

Introduction
Why Stata?

▶ Preliminarily: Why do we choose Stata? Is there something better?


▶ Different alternative packages exist:
1. Matlab
▶ Very powerful package allowing you to do virtually anything (but it
needs good programming skills).
2. EViews
▶ Quite popular in finance (probably you have used it in the past), but
static (only a few add-ins are available and substantial improvements
come only with a new release).
3. R
▶ Open source, flexible and powerful. However, it needs good
programming skills.
Introduction
Why Stata?

▶ Stata displays some important characteristics:


1. Students can easily learn its commands, even if they are not yet
familiar with its syntax, by studying the executed commands that are
reported in a specific Review window.
2. Stata contains an easy do-file editor where you can write all the
commands you want to execute. Therefore, there is no need to repeat
all the steps every time you use the software. Just one click and you
execute all the commands available in the do-file.
3. Stata is highly extensible. Since many researchers around the
world use this software, they develop and share specific
commands, which can be downloaded and used on your machine free
of charge.
4. StataCorp periodically releases updates.
References
Books
▶ There are many good references about Stata. The ones I prefer
are:
1. Baum, C.F. (2006). “An Introduction to Modern Econometrics Using
Stata”. Stata Press.
2. Cameron A.C. & P.K. Trivedi (2022). “Microeconometrics Using
Stata, Vol. I and II”. Stata Press.
▶ Both references explain how to exploit Stata capabilities. The first
reference is a bit old, but still useful, since most of the commands
in Stata have not changed over time. The second one covers all the
features in Stata and is the companion of Cameron A.C. & P.K.
Trivedi (2005). “Microeconometrics: Methods and Applications”.
Cambridge University Press.
▶ Finally, if you are looking for a reference covering a specific topic,
you can browse the Stata Press catalogue.
References
Websites

▶ Besides the large number of available books, several alternative
resources exist on the internet. The only problem is that there are
so many of them:
1. Oscar Torres-Reyna webpage at Princeton University;
2. The Advanced Research Computing website at UCLA;
3. Statalist forum (registration required);
4. Official Stata channel on YouTube.

▶ However, with a little googling you will find thousands of other
resources on the net (for instance, many researchers
have their own Stata page).
Introducing Stata
Program Layout

▶ After opening the software, you may notice that it is divided into five
different windows:
1. Left window (History): here the list of the previously executed
commands is reported (if you need to, you can click on one of the
previous commands to run it again);
2. Top-middle window (Results): it displays the results of each
command you executed;
3. Bottom-middle window (Command): here you can write the
command you want to execute;
4. Top-right window (Variables): it contains the list of the variables
included in the dataset that you are currently using;
5. Bottom-right window (Properties): it reports some information
about the file in use (usually not very informative).
Introducing Stata
Program Layout

Figure 1: Stata main screen

Introducing Stata
Program Layout

▶ Some buttons at the top of the main page are worthy of attention.
▶ While the first three allow you to open a dataset, to save it and to
print, you may want to consider the following:
▶ Button n. 4: it opens the *.do file editor, i.e. a file where you can
write all your code and run it;
▶ Button n. 5: it opens a Data editor (however, my suggestion is to
arrange your data in Excel);
▶ Button n. 6: it opens a Data browser (you may inspect the data, but
you cannot edit them);
▶ Button n. 7: a break button to stop the execution of your code, if
needed.

Introducing Stata
Program Layout

Figure 2: Stata shortcuts

Introducing Stata
Program Layout

▶ Finally, the menu bar at the top of the main window is very useful.
In particular:
▶ Button 1: it allows you to open and save datasets in Stata format, to
import data from different formats to Stata, to export your dataset in
another format, to use example datasets, to change your working
directory;
▶ Button 2: it allows you to manage your data;
▶ Button 3: it allows you to generate different types of graphs (really
many!);
▶ Button 4: it contains all the commands that should be used to carry
out analysis in Stata;
▶ Button 5: it allows you to access help resources, such as the Stata
manuals (they are installed along with the software), other resources,
the Statalist forum and the Stata Journal.
Introducing Stata
Program Layout

Figure 3: Stata menu bar

Introducing Stata
Tip 1 – Inspecting a Stata command

Although you can use the help tab, you can get information
about any specific command in Stata in other ways. For
instance, if you already know the command that you want to
inspect, simply type in the command window:

help aaa

where aaa is the name of the command you want to review.
For instance:

help regress

reg is the short name of regress.
Getting started
Loading data in Stata

▶ Along with the notes, I uploaded a file, named capm.dta.


▶ It contains information about the closing price of the S&P500,
some stocks (Ford, Microsoft, General Electric and Oracle) and
3-month Treasury bills (ustb3m) over the period
2002m1–2013m4.
▶ Data have been downloaded from Yahoo Finance. There exists a
nice command in Stata (which is not part of the official release, but
which you can download and install as I will show later), called
getsymbols.
▶ It allows you to download financial data directly from Yahoo
Finance to Stata.
▶ In the Appendix, I give you some idea about it. ( Using getsymbols )
Getting started
Loading data in Stata

▶ Many data providers offer datasets in Stata format. In this case, it
is easy to open them, by double-clicking on the file.
▶ Alternatively, in the command window you can use:

use "C:\your_directory\capm.dta", clear

▶ Notice that the path of the folder where the dataset is located
should be enclosed in double quotes (this is required whenever the
path contains spaces). The option clear removes any dataset
currently in memory before loading the new one.
▶ Instead, if your data file has a different format, the easiest thing to
do is to click on File -> Import. You see that Stata supports
different formats. Simply choose the one you need and follow the
instructions.
Getting started
Loading data in Stata

▶ For example suppose that our data are in Excel. You choose File
-> Import -> Excel spreadsheet. The following window appears:

Figure 4: How to import an Excel file

Getting started
Loading data in Stata

▶ You should click on Browse to locate the file on your PC.
▶ Notice that you should specify that the first row of the dataset
contains the names of the variables.
▶ Failing to do so means that the names of the variables are
imported as the first observation. Since a column that contains text
cannot be stored as a number, all the data will be read as strings and
appear in red, thus indicating that there is a problem in your data
(usually, an element like a letter or a / that Stata cannot read as a
number).
▶ Alternatively, you can type the following line in the command
window:
import excel "Your directory and file", sheet("name of the sheet") firstrow
Getting started
Loading data in Stata

▶ In the above line of code:


1. import excel: it indicates that you want to import an Excel file;
2. "Your directory and file": it locates the file on your PC;
3. sheet("name of the sheet"): it is optional. If the Excel file
contains several sheets, you can specify the one where the data are.
Important: if a Stata command supports options, they can be
included after a comma in the order you prefer.
4. firstrow is another option, which indicates that the names of the
variables are included in the first row.
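
A minimal sketch of the same import, written so that it can be run without any external file: it first exports the example dataset auto (shipped with Stata) to an Excel file and then reads it back. The file name auto_demo.xlsx and the sheet name data are my own choices for illustration, not part of the class material.

```stata
* Load an example dataset that ships with Stata
sysuse auto, clear

* Write it to Excel (hypothetical file name), variable names in the first row
export excel using "auto_demo.xlsx", sheet("data") firstrow(variables) replace

* Read it back: sheet() picks the sheet, firstrow takes the names from row 1,
* clear drops the dataset currently in memory
import excel using "auto_demo.xlsx", sheet("data") firstrow clear

* Check what has been imported
describe
```

The options after the comma can appear in any order, so firstrow clear sheet("data") would work in the same way.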

Getting started

Hint 1
Suppose you do not know how to write down the line of code
to import data. You can use the menu bar to import data that
are saved in Excel, for instance.

After you import the data using the menu bar, go to the
Review window and click on the last executed command. In
the Command window, you will see the Stata syntax used
to import the data. This shows that learning Stata’s
syntax is easy.

Next time, you can write the line of code above
directly.
Getting started

Hint 2
There is another way to import data from Excel to Stata. You
can simply copy and paste the data.

Open the Excel file, highlight all the data (do not highlight
empty cells at the end of the data) and copy them using
CTRL+C.

In Stata, click on the shortcut to open the spreadsheet in
edit mode (button 5 in Figure 2). Paste data using CTRL+V.
Getting started

Tip 2 – Decimal separator

Stata uses (.) to separate decimals and (,) to separate
thousands. This is the standard notation in the UK, US, China and
Australia, for instance. Instead, in many European countries
the convention is the other way round. So two-and-a-half is
written as 2.5 in the UK and 2,5 in Italy. This may create problems
when you import data from another application, such as Excel,
into Stata, since the former typically uses the local convention.

The easiest way to overcome this problem is to open Excel
and then go to File -> Options -> Advanced. There you can
change the decimal and thousands separators so that they match
the notation used in Stata.
Getting started
generate

▶ We will now review some commands in Stata. They are those that
you will frequently use in your research.
▶ Stata can generate new variables using those contained in your
dataset.
▶ This is done using the command generate – or its abbreviation
gen.
▶ For instance, we want to generate the log of the variable sandp.

Getting started
generate

▶ In Stata:

gen lnsp = ln(sandp)

where ln( ) is the operator we apply.


▶ Clearly gen supports more complex calculations.
▶ A more general command is egen, which allows you to generate
new variables, based on some summary measures, such as mean,
min, max and so on.
▶ For instance, suppose we want to calculate the mean of the closing
price of sandp. We can type:
egen mean_sandp=mean(sandp)
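
As a small sketch of the difference between the two commands, the lines below build a three-row toy dataset (the values are invented, they are not taken from capm.dta) and apply both gen and egen to it. The variables max_sandp and dev_sandp are hypothetical names used only for this illustration.

```stata
* Toy data: three made-up closing prices
clear
input sandp
1100
1150
1050
end

* gen works observation by observation
gen lnsp = ln(sandp)

* egen functions summarise across observations and store the result in every row
egen mean_sandp = mean(sandp)
egen max_sandp = max(sandp)

* The two can be combined: deviation of each price from the sample mean
gen dev_sandp = sandp - mean_sandp

list
```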
Getting started
preserve, restore, keep, drop and replace

▶ If we want to keep or drop some variables, we can use the
commands keep and drop.
▶ We may want to do this on a permanent basis or just because we
want to make a temporary change to the data. In the latter case it is
wise to use preserve and restore.
▶ Suppose we want to keep only microsoft and date variables (drop
works in the same way). We can write:

preserve
keep date microsoft
restore

Getting started
preserve, restore, keep, drop and replace

▶ preserve temporarily stores a copy of our dataset.

▶ keep date microsoft allows you to keep only the two variables in
the list.
▶ restore allows you to bring the dataset back to the state it was in
when preserve was issued.
▶ replace allows us to change the content of a variable. For
instance, we previously generated a new variable, named lnsp.
Suppose that, for whatever reason, we want to multiply that
value by 100. Hence we type:

replace lnsp=100*lnsp
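
To see preserve and restore at work without needing capm.dta, here is a short sketch based on the example dataset auto that ships with Stata (my choice, only for illustration). The system value c(k) reports the number of variables currently in memory.

```stata
sysuse auto, clear
* The full dataset has 12 variables
display c(k)

preserve
* Temporary change: keep only two variables
keep make price
display c(k)
restore

* All 12 variables are back
display c(k)
```

Notice that if preserve is used inside a do-file and the do-file ends before restore is reached, Stata restores the dataset automatically.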

Getting started
sort and order
▶ The command sort allows you to sort your data in ascending
order.
▶ If you want to sort them in descending order, you can use the more
general gsort, e.g. gsort -date (remember to put “-” in front of the
variable if you want descending order).
▶ The command order with the option first instead allows you to
move the columns. For instance, the line of code order microsoft,
first indicates that the first column of the dataset should contain
the variable microsoft.
▶ If you type order microsoft sandp, first, the first two
variables of the dataset will be microsoft and sandp.
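
The commands above can be tried on a tiny toy dataset. The three rows below are invented numbers that only mimic the structure of capm.dta; they are not the actual data.

```stata
* Toy data entered out of order (made-up values)
clear
input date microsoft sandp
504 30 1130
506 28 1147
505 29 1106
end

* Ascending order of date
sort date

* Descending order of date: note the minus sign attached to the variable
gsort -date

* Move microsoft and sandp to the first two columns
order microsoft sandp, first

list
```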

Getting started
if

▶ A very important and useful tool is the if qualifier. Added at the
end of a command, it restricts the command to the observations that
satisfy a condition.
▶ For instance, suppose that we want to generate a dummy variable
equal to 1 if sandp is smaller than 1000 and 0 otherwise.
▶ We can proceed as follows:

gen dummy=0
replace dummy=1 if sandp<1000
▶ Notice that if the conditional statement contains an equality, i.e.
replace dummy=1 if sandp==1000 we need to use “==” rather
than simply “=”.
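
One point worth checking when building dummies in this way is how missing values are handled. Stata treats a missing value (.) as larger than any number, so with the two lines above an observation with missing sandp receives dummy=0. The sketch below, on invented data, shows this and one possible one-line alternative that leaves the dummy missing in that case; the name dummy_b is hypothetical.

```stata
* Toy data with one missing price (made-up values)
clear
input sandp
950
1200
.
end

* Two-step approach from the slide: the missing row ends up with 0
gen dummy = 0
replace dummy = 1 if sandp < 1000

* One-step alternative: the logical expression returns 1 or 0,
* and the if qualifier leaves the dummy missing when sandp is missing
gen dummy_b = sandp < 1000 if !missing(sandp)

list
```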
Getting started
& and |

▶ The conditional statement can be composite. Suppose that we
want to generate a second dummy which is equal to 1 if sandp is
smaller than 1000 and microsoft is smaller than 25.
▶ We type:

gen dummy2=0
replace dummy2=1 if sandp<1000 & microsoft<25

▶ Instead, if the conditional statement is such that we want to
generate a dummy that is equal to 1 if sandp is larger than 1000
OR microsoft is larger than 25, we will use | rather than &.
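
A short sketch of the OR case, again on invented numbers. The names dummy3 and dummy4 are hypothetical. The last line mixes & and |; I add parentheses there so that the intended grouping is explicit.

```stata
* Toy data (made-up values)
clear
input sandp microsoft
950 20
950 30
1200 20
end

* Equal to 1 if at least one of the two conditions holds
gen dummy3 = 0
replace dummy3 = 1 if sandp > 1000 | microsoft > 25

* Mixing & and |: parentheses make the grouping explicit
gen dummy4 = (sandp < 1000 & microsoft < 25) | sandp > 1100

list
```

Remember that a missing value counts as larger than any number, so a condition such as sandp > 1000 is also true when sandp is missing.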

Getting started
tostring and destring

▶ When you load data into Stata, numeric variables are shown in black
in the Data Editor. If, instead, the values are shown in red, this
means that the software reads the numbers as strings.
▶ Stata allows you to convert a string to a number, using destring,
and, vice versa, using tostring.
▶ For example, the variable date is a number.

▶ We want to convert it into a string:

tostring date, replace
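
The opposite direction is the one you will meet most often after an import: numbers that arrive as text, for instance because they contain a thousands separator. The sketch below uses invented values; price_str, price and price_back are hypothetical names. The option generate() creates a new variable instead of overwriting the old one, and ignore(",") tells destring to strip the commas.

```stata
* Toy data: numbers stored as text with a thousands separator (made-up values)
clear
input str8 price_str
"1,250"
"980"
"1,075"
end

* String to number, removing the commas
destring price_str, generate(price) ignore(",")

* Number back to string, kept in a separate variable
tostring price, generate(price_back)

describe
list
```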

Getting started
tostring and destring

Figure 5: date coded as a number

Figure 6: date coded as a string
Getting started
substr

▶ Date is now a three–digit string.

▶ Suppose we want to ask the software to extract the first two digits,
using substr:

gen newvbl=substr(date, 1, 2)

▶ In the line above substr is the function that extracts part of a
string. The first number in the brackets, 1, denotes the position in
the string where to start, while 2 indicates the length of the piece
to extract.
▶ Hence, we ask the software to read the first two characters of the
string.
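
A self-contained sketch of substr on two invented three-character strings. The variable lastchar is a hypothetical name; it illustrates that a negative starting position counts from the end of the string.

```stata
* Toy data: date stored as a three-character string (made-up values)
clear
input str3 date
"504"
"505"
end

* First two characters, starting at position 1
gen newvbl = substr(date, 1, 2)

* Last character: a negative start counts from the end of the string
gen lastchar = substr(date, -1, 1)

list
```

Notice that newvbl and lastchar are themselves strings; destring is needed if you want to use them as numbers.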

Getting started
Working with dates

▶ An important issue refers to dates. When we convert the
variable date into a string, you see that it comes out in a strange
format, for instance 504.
▶ This depends on the way Stata stores dates: as the number of
periods elapsed since 01/01/1960, which in Stata language is equal
to 0. Daily dates count days, while monthly dates count months, so
for monthly data 504 corresponds to 2002m1. Any date before 1960
takes a negative value.
▶ When you obtain a dataset in any other format, dates can be
reported in different formats, for instance 01011960 or 01/01/1960
or 01Jan1960.
▶ In these cases Stata does not automatically recognise the values as
dates. However, you can fix this issue.
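
The way Stata counts time can be verified directly with display. The sketch below also builds a three-row toy variable and attaches the monthly display format %tm to it; I am assuming here that date in capm.dta is a monthly date, which is consistent with the value 504 and the sample starting in 2002m1.

```stata
* Number of months between 1960m1 and 2002m1: returns 504
display ym(2002, 1)

* The same number shown with a monthly display format: 2002m1
display %tm 504

* Daily dates count days from 1 January 1960: returns 0
display mdy(1, 1, 1960)

* The same number shown with a daily display format: 01jan1960
display %td 0

* Toy variable holding 504, 505 and 506, displayed as months
clear
set obs 3
gen date = 503 + _n
format date %tm
list
```

The format only changes how the number is displayed; the stored value is still 504, 505 and so on.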
Getting started
Tip 3 – Date format dd/mm/yyyy

If your date is stored as 01/01/1960, it will be read as a string. You can
fix it in the following way:

gen date2=date(date,"DMY")
format date2 %td

Notice that if your data have the format mm/dd/yyyy, you simply
need to change the order of the letters in the above command,
i.e. instead of "DMY", we type "MDY".
The line format date2 %td changes the display format
to one that is easy to read (instead of 0, we read
01jan1960). Notice that format takes no comma before %td.
Getting started

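Tip 4 – Date format 01011960


* If date was imported as a number, the leading zero is lost (1011960),
* so we pad it back to eight digits while converting it to a string
tostring date, replace format(%08.0f)
* Extract day, month and year as strings
gen day=substr(date, 1, 2)
gen month=substr(date, 3, 2)
gen year=substr(date, 5, 4)
* Convert the three pieces to numbers
destring day month year, replace
* mdy() takes month, day and year and returns a Stata daily date
gen newdate=mdy(month, day, year)
format newdate %td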
Getting started
Installing new commands

▶ As I mentioned previously, we can extend Stata in many
respects, by downloading some routines from the web.
▶ In fact Stata is used a lot around the world and many people
have developed additional commands that can be used by anyone free
of charge. The majority of those commands are hosted at Boston
College (the SSC archive). A small part can be obtained from
researchers’ websites.
▶ If you want to install a new command, you should type:
ssc install xxx
where xxx is the name of the routine.
▶ The software will immediately install the new command on your
machine. This operation should be done only once. It is not
necessary to repeat it in the future.
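
Since the installation is needed only once, a do-file that other people will run can check whether the command is already available before installing it. The lines below are one possible sketch of this idea, using getsymbols as the example routine; they require an internet connection.

```stata
* Read the description of the package on the SSC archive
ssc describe getsymbols

* which returns an error code if Stata cannot find the command;
* capture stores that code in _rc instead of stopping the do-file
capture which getsymbols
if _rc != 0 {
    ssc install getsymbols
}

* Show where the installed command is located
which getsymbols
```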
Appendix
getsymbols

▶ Although you can download your data and arrange them in Excel
before importing them in Stata, there is another way to obtain
financial data directly in Stata.
▶ Specifically, we can use the user–written routine getsymbols

▶ You can download and install it by using:

ssc install getsymbols

▶ Notice that the line ssc install is the standard way to download
an additional command on your machine.
▶ There are some advantages/disadvantages in using it.
Appendix
getsymbols

▶ Advantages/disadvantages of getsymbols:
▶ It allows you to access several sources of data (Yahoo Finance,
Google, Alpha Vantage, Quandl);
▶ Depending on the source you access, you can download data for
stocks, currencies and cryptocurrencies;
▶ The Alpha Vantage database allows you to download high frequency
data, too (1m, 5m, 15m and 30m);
▶ However, some data cannot be downloaded (for instance, corporate
data from Yahoo Finance, although available there).

▶ An example of the usage of this routine (as well as other useful
tools for financial analysis) can be found in the presentation
delivered by Alberto Dorantes at the 2019 Stata meeting.
Appendix
getsymbols

▶ Interestingly, you can download stocks from any country.

▶ Moreover, you can choose the frequency of data (yearly,
semi-annual, quarterly, monthly and daily) and, of course, the time
period.

▶ However, you should use exactly the same ticker that is used in the
source that you want to access. For instance, if you need data for
the S&P500 market index from Yahoo Finance, you should use ^GSPC.
▶ Below I provide an example of the usage of this routine.
Appendix
getsymbols

Tip 3 – How to use getsymbols

Suppose we want to collect data for S&P500, for Microsoft
and Oracle. The first thing is to check the ticker
code for those assets. An inspection in Yahoo Finance shows
S&P500=^GSPC, Microsoft=MSFT and Oracle=ORCL. In
Stata we type (on a single line):

getsymbols ^GSPC MSFT ORCL, fm(1) fd(1) fy(2002) lm(4) ld(30) ly(2013) frequency(m) price(adjclose) clear yahoo
Appendix
getsymbols

Tip 3 continues – How to use getsymbols

Using that line of code, we obtain the required data
directly in Stata. However, we may also obtain additional data that
we are not interested in. Some editing is required. Hence:
drop r_*
drop R_*
drop volume_*
renpfix p_adjclose_
rename _GSPC sp500
rename MSFT microsoft
rename ORCL oracle
Appendix
getsymbols

Tip 3 continues – How to use getsymbols

Notice the following things:

1. If we want to drop several variables whose names begin in the
same way, for instance r_MSFT, r_ORCL and so on, we
can write the initial part of the name, r_, followed by *. All
the variables which start with r_ will be removed;
2. If we want to remove a prefix, such as p_adjclose_, we can use
the command renpfix.
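
The same clean-up can be tried without downloading anything. The sketch below creates a few empty-handed toy variables whose names only mimic those produced by the download (the values are invented), and it uses the wildcard form of rename as an alternative to renpfix; this assumes a recent version of Stata in which rename accepts wildcards.

```stata
* Toy variables that mimic the names created by the download (made-up values)
clear
set obs 2
gen p_adjclose_MSFT = 30
gen p_adjclose_ORCL = 15
gen r_MSFT = .
gen volume_MSFT = 1000

* Drop every variable whose name starts with r_ or volume_
drop r_* volume_*

* Remove the prefix p_adjclose_ from all the variables that carry it
rename p_adjclose_* *

* Rename several variables at once: old names first, new names second
rename (MSFT ORCL) (microsoft oracle)

describe
```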