Introduction to Handling Data
ECON20222 - Lecture 1
Ralf Becker and Martyn Andrews
What is this course unit about?
Help you implement and interpret the main estimation and
inference techniques used in Economics
Focus on:
- causal inference
- the main pitfalls of time-series analysis
This Week’s Empirical Question
Card, David; Krueger, Alan B. (1994). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania. The American Economic Review, 84, 772-793.
Do higher minimum wages decrease employment (as predicted by common sense and a competitive labour market model)?
The Research Question
“This paper presents new evidence on the effect of minimum wages on
establishment-level employment outcomes. We analyze the experiences
of 410 fast-food restaurants in New Jersey and Pennsylvania following
the increase in New Jersey’s minimum wage from $4.25 to $5.05 per
hour. Comparisons of employment, wages, and prices at stores in New
Jersey and Pennsylvania before and after the rise offer a simple method
for evaluating the effects of the minimum wage.”
Card, David; Krueger, Alan B. (1994, p. 772)
Why Data Matter
The debate is still alive:
Overall negative effect on employment, IZA.
"Research findings are not unanimous, but especially for the US,
evidence suggests that minimum wages reduce the jobs available to
low-skill workers."
An overview of the empirical evidence is provided in this report by
Arindrajit Dube for the UK Government.
"Especially for the set of studies that consider broad groups of
workers, the overall evidence base suggests an employment impact
of close to zero."
At the end of this unit ...
You will be able to:
Understand and discuss the challenges of making causal inferences
Perform inference appropriate for the model being estimated
Interpret empirical results (with due caution!)
Discuss strengths and weaknesses of particular empirical
applications
Do intermediate data work in R
Confidently apply regression analysis in R
Apply more advanced causal inference techniques in R
Find coding help for any new challenges in R
What you need to do
To learn in this unit you need to:
coding, cleaning data, struggling, self-learning, answering real questions, amazement at what you can do, and accepting that there is not always a clear answer
Assessment Structure and feedback
Online test (on the use of R) - 10%
End-of-Term exam (short answer questions) - 50%
Group coursework - 40% (see extra info)
Aim for today
Statistics/Econometrics:
- Summary Statistics
- Difference between population and sample
- Hypothesis testing
- Graphical Data Representations
- Diff-in-Diff Analysis
- Simple regression analysis

R Coding:
- Introduce you to R and RStudio
- How do I learn R?
- Import data into R
- Perform some basic data manipulation
- Perform hypothesis tests
- Estimate a regression
This Week’s Plan
Replicate some of the basic results presented in Card and Krueger
(1994)
Introduce the Difference-in-Difference methodology (Project!!)
[Sometimes known as “Diff-in-Diff” or DiD.]
Use this example to
- introduce you to R
- review some summary statistics
- review simple regression and its implementation
- introduce some basic visualisations
Introduce R/R-Studio
R is a statistical software package; it is open source
and free
a lot of useful functionality is added by independent
researchers via packages (also for free)
RStudio is a user interface which makes working with
R easier. You need to install R before you install
RStudio.
ECLR is a web-resource we have set up to support
you in your R work.
Welcome to RStudio
[Screenshot: the RStudio interface]
Write Code Files or the Basic Workflow
keep an original data file (usually .xlsx or .csv) and do not overwrite this file
any manipulation we make to the data (data cleaning, statistical analysis etc.) is command-based, and we collect all these commands in a script file. R then interprets and executes these commands. A script file is hence like a recipe which you present to a chef. Script files have the extension .r
you can also learn to write Rmarkdown files (.rmd). They combine code with normal text and output.
when you write code you should add comments. Comments are bits of text which R ignores (everything after a #), but which help you or someone else decipher what the code does.
By following the above advice you make it easy for yourself and others
to replicate your work.
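A minimal example of a script with comments (a sketch, not from the original slides):

# example.r - everything after a # is ignored by R
x <- c(1, 2, 3, 4)   # create a numeric vector
mean(x)              # compute and print its mean: 2.5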
Prepare your code
We start by loading the extra packages we need in our code.
The first time you need these packages on a computer you may have to install them. Use the following code to do this:
install.packages(c("readxl","tidyverse","ggplot2","stargazer"))
This only needs to be done once on a particular computer. However,
every time you want to use any of these packages in a script you need to
make them available to your code (load them):
library(tidyverse) # for almost all data handling tasks
library(readxl) # to import Excel data
library(ggplot2) # to produce nice graphics
library(stargazer) # to produce nice results tables
The data
Then we load the data from Excel:
CKdata <- read_xlsx("CK_public.xlsx", na = ".")
na = "." indicates how missing data are coded.
Check some characteristics of the data which are now stored in CKdata:
Discuss the structure of the data frame: the number of observations and the number of variables, their names and their variable types.
str(CKdata) # prints some basic info on variables
## tibble[,46] [410 x 46] (S3: tbl_df/tbl/data.frame)
## $ SHEET : num [1:410] 46 49 506 56 61 62 445 451 455 458 ...
## $ CHAIN : num [1:410] 1 2 2 4 4 4 1 1 2 2 ...
## $ CO_OWNED: num [1:410] 0 0 1 1 1 1 0 0 1 1 ...
## $ STATE : num [1:410] 0 0 0 0 0 0 0 0 0 0 ...
## $ SOUTHJ : num [1:410] 0 0 0 0 0 0 0 0 0 0 ...
## $ CENTRALJ: num [1:410] 0 0 0 0 0 0 0 0 0 0 ...
## $ NORTHJ : num [1:410] 0 0 0 0 0 0 0 0 0 0 ...
## $ PA1 : num [1:410] 1 1 1 1 1 1 0 0 0 1 ...
## $ PA2 : num [1:410] 0 0 0 0 0 0 1 1 1 0 ...
## $ SHORE : num [1:410] 0 0 0 0 0 0 0 0 0 0 ...
## $ NCALLS : num [1:410] 0 0 0 0 0 2 0 0 0 2 ...
The data
To see the entire dataset (like in a spreadsheet):
Either click the little spreadsheet symbol next to the data frame (CKdata) in the
Environment tab, or
View(CKdata) # opens the data in a spreadsheet-like viewer
The data - Unit of observation
A unit of observation is a fast food restaurant.
Observation 27 in our dataset, say, is a Roy Rogers (CHAIN = 3) store in
Pennsylvania (STATE = 0) with 7 full-time employees (EMPFT), 19
part-time employees (EMPPT) and 4 managers (NMGRS) in Feb 1992, and
17.5 in Dec 1992:
CKdata[27,]
## # A tibble: 1 x 46
## SHEET CHAIN CO_OWNED STATE SOUTHJ CENTRALJ NORTHJ PA1
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <
## 1 515 3 1 0 0 0 0 0
## # ... with 35 more variables: EMPFT <dbl>, EMPPT <dbl>, NMG
## # WAGE_ST <dbl>, INCTIME <dbl>, FIRSTINC <dbl>, BONUS <db
## # MEALS <dbl>, OPEN <dbl>, HRSOPEN <dbl>, PSODA <dbl>, PF
## # PENTREE <dbl>, NREGS <dbl>, NREGS11 <dbl>, TYPE2 <dbl>,
## # DATE2 <dbl>, NCALLS2 <dbl>, EMPFT2 <dbl>, EMPPT2 <dbl>,
## # WAGE_ST2 <dbl>, INCTIME2 <dbl>, FIRSTIN2 <dbl>, SPECIAL
Addressing particular variables
If you want to call/use the entire spreadsheet/data frame/tibble then
you call CKdata.
But often you want to call one variable only:
CKdata$CHAIN, calls CHAIN only
CKdata["CHAIN"], calls CHAIN only
CKdata[2], calls CHAIN only, as it is the 2nd variable
And sometimes you want to call several, but not all, variables:
CKdata[c("STATE","CHAIN")]
c("STATE","CHAIN") creates a list of names. c really represents a
function, c for concatenation.
Also note: R is case sensitive, CHAIN ≠ Chain
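A short sketch of what each of these calls returns (using the CKdata tibble from above):

CKdata$CHAIN                 # the CHAIN variable as a plain vector
CKdata["CHAIN"]              # the CHAIN variable as a one-column tibble
CKdata[2]                    # the 2nd variable, again CHAIN
CKdata[c("STATE","CHAIN")]   # a two-column tibble with STATE and CHAIN
c("STATE","CHAIN")           # c() on its own just creates a character vector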
Variable types
There are five basic data types.
character: "a", "swc"
numeric: 2, 15.5
integer: 2L (the L tells R to store this as an
integer)
logical: TRUE, FALSE
factor: a set number of categories
It is important that you know and understand the differences between data
types. Each variable has a particular type, and some operations only
work for particular data types. For instance, we need num or int for any
mathematical operations.
In our data frame we have only num variable types.
We will encounter logical variables frequently. They are very powerful.
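You can check the type of any object with the class() function, and logical vectors let you select observations. A quick sketch (the last line uses CKdata from above):

class("a")     # "character"
class(2.5)     # "numeric"
class(2L)      # "integer"
class(TRUE)    # "logical"
CKdata$WAGE_ST[CKdata$STATE == 1]   # logical selection: starting wages in New Jersey only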
factor variables
We store categorical variables as factor variables.
Sometimes you need to type convert to factor variables.
str(CKdata[c("STATE","CHAIN")]) # prints some basic info on variables
## tibble[,2] [410 x 2] (S3: tbl_df/tbl/data.frame)
## $ STATE: num [1:410] 0 0 0 0 0 0 0 0 0 0 ...
## $ CHAIN: num [1:410] 1 2 2 4 4 4 1 1 2 2 ...
STATE, 1 if New Jersey (NJ); 0 if Pennsylvania (Pa)
CHAIN, 1 = Burger King; 2 = KFC; 3 = Roy Rogers; 4 = Wendy’s
factor variables
CKdata$STATEf <- as.factor(CKdata$STATE)
levels(CKdata$STATEf) <- c("Pennsylvania","New Jersey")
CKdata$CHAINf <- as.factor(CKdata$CHAIN)
levels(CKdata$CHAINf) <- c("Burger King","KFC", "Roy Rogers", "Wendy's")
CKdata$STATE calls the variable STATE in the data frame CKdata
<- assigns what is on the right, as.factor(CKdata$STATE), to the
variable on the left, CKdata$STATEf
as.factor(CKdata$STATE) calls the function as.factor and applies
it to CKdata$STATE
str(CKdata[c("STATEf","CHAINf")]) # prints some basic info on variables
## tibble[,2] [410 x 2] (S3: tbl_df/tbl/data.frame)
## $ STATEf: Factor w/ 2 levels "Pennsylvania",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ CHAINf: Factor w/ 4 levels "Burger King",..: 1 2 2 4 4 4 1 1 2 2 ...
factor variables
Factor variables are variables with discrete categories. You can find out
which categories a factor has with the levels() function:
levels(CKdata$CHAINf)
## [1] "Burger King" "KFC" "Roy Rogers" "Wendy's"
Learn more about your data
Use the summary function for some initial summary stats for num or int
variables
WAGE_ST, starting wage ($/hr), Wave 1, before min wage increase,
Feb 1992
EMPFT, # full-time employees before policy implementation
summary(CKdata[c("WAGE_ST","EMPFT")])
## WAGE_ST EMPFT
## Min. :4.250 Min. : 0.000
## 1st Qu.:4.250 1st Qu.: 2.000
## Median :4.500 Median : 6.000
## Mean :4.616 Mean : 8.203
## 3rd Qu.:4.950 3rd Qu.:12.000
## Max. :5.750 Max. :60.000
## NA's :20 NA's :6
Learn more about your data
How many observations are there in each state, and how are the chains distributed across states?
Tab1 <- CKdata %>% group_by(STATEf) %>%
summarise(n = n()) %>%
print()
## # A tibble: 2 x 2
## STATEf n
## <fct> <int>
## 1 Pennsylvania 79
## 2 New Jersey 331
prop.table(table(CKdata$CHAINf, CKdata$STATEf, dnn = c("Chain", "State")), margin = 2)
## State
## Chain Pennsylvania New Jersey
## Burger King 0.4430380 0.4108761
## KFC 0.1518987 0.2054381
## Roy Rogers 0.2151899 0.2477341
## Wendy's 0.1898734 0.1359517
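Here margin = 2 gives column proportions (within each state). For comparison, a sketch of the alternative, margin = 1, which would give for each chain the share of its stores located in each state:

prop.table(table(CKdata$CHAINf, CKdata$STATEf, dnn = c("Chain", "State")), margin = 1)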
Scatter plot of the data
p1 <- ggplot(CKdata,aes(WAGE_ST,EMPFT)) +
geom_point(size=0.5) + # this produces the scatter plot
geom_smooth(method = "lm", se = FALSE) # adds the line
p1
[Figure: scatter plot of EMPFT (vertical axis) against WAGE_ST (horizontal axis), with the fitted line]
Each dot represents the data for one store. Note the line of best fit.
Regression Line
The line in the previous plot is the line of best fit coming from a linear
regression:

$$EMPFT = \alpha + \beta\, WAGE\_ST + u \qquad \text{(Population Model)}$$

The population model is defined by unknown parameters $\alpha$ and $\beta$
and the unknown error terms $u$. We will use sample data to obtain
sample estimates of these parameters.
The error terms $u$ contain the effects of any omitted variables and
reflect that any modelled relationship will only be an
approximation. The $u$ are random variables.

$$EMPFT_{it} = \hat{\alpha} + \hat{\beta}\, WAGE\_ST_{it} + \hat{u}_{it} \qquad \text{(Estimated Sample Model)}$$

Here we have two subscripts as the data have a cross-section ($i$) and a
time-series dimension ($t$).
The regression line in the previous figure is represented by

$$\widehat{EMPFT}_{it} = \hat{\alpha} + \hat{\beta}\, WAGE\_ST_{it} \qquad \text{(Regression Line)}$$
Simple Regression Model and OLS
Regression analysis is the core technique used in Econometrics. It is
based on certain assumptions about the Population Model and the error
terms u (more on this in the next few weeks).
How do we estimate the parameters (i.e. obtain $\hat{\alpha}$ and $\hat{\beta}$) using the available sample of
data? This is typically done by Ordinary Least Squares (OLS).
Simple Regression Model and OLS
mod1 <- lm(EMPFT ~ WAGE_ST, data = CKdata)
summary(mod1)
##
## Call:
## lm(formula = EMPFT ~ WAGE_ST, data = CKdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.091 -5.898 -2.100 3.005 51.304
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.468 5.807 -1.114 0.2660
## WAGE_ST 3.193 1.255 2.544 0.0114 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.5 on 383 degrees of freedom
## (25 observations deleted due to missingness)
## Multiple R-squared: 0.01662, Adjusted R-squared: 0.01405
## F-statistic: 6.472 on 1 and 383 DF, p-value: 0.01135
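Once the estimated model is stored in mod1, results can be extracted with standard accessor functions. A short sketch:

coef(mod1)     # the estimated coefficients, alpha-hat and beta-hat
confint(mod1)  # 95% confidence intervals for the coefficients
resid(mod1)    # the estimated residuals u-hat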
OLS - nice output
stargazer(mod1,type="text")
##
## ===============================================
## Dependent variable:
## ---------------------------
## EMPFT
## -----------------------------------------------
## WAGE_ST 3.193**
## (1.255)
##
## Constant -6.468
## (5.807)
##
## -----------------------------------------------
## Observations 385
## R2 0.017
## Adjusted R2 0.014
## Residual Std. Error 8.500 (df = 383)
## F Statistic 6.472** (df = 1; 383)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
OLS - calculation and interpretation
How were $\hat{\beta}$ and $\hat{\alpha}$ calculated?

$$\hat{\beta} = \frac{\widehat{Cov}(EMPFT_{it},\, WAGE\_ST_{it})}{\widehat{Var}(WAGE\_ST_{it})}$$

$$\hat{\alpha} = \overline{EMPFT}_{it} - \hat{\beta}\, \overline{WAGE\_ST}_{it}$$

How to interpret $\hat{\beta} = 3.193$?
An increase of one unit in WAGE_ST (= USD 1) is associated with an increase of
about 3 full-time employees (EMPFT).
Have we established that higher wages cause higher employment?
NO
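These formulas can be verified directly in R. A sketch (complete.cases keeps only the rows where both variables are observed, mirroring the observations lm() uses):

ok <- complete.cases(CKdata$WAGE_ST, CKdata$EMPFT)  # drop missing values
b_hat <- cov(CKdata$EMPFT[ok], CKdata$WAGE_ST[ok]) / var(CKdata$WAGE_ST[ok])
a_hat <- mean(CKdata$EMPFT[ok]) - b_hat * mean(CKdata$WAGE_ST[ok])
c(a_hat, b_hat)  # should match coef(mod1): about -6.468 and 3.193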
Regression Analysis - Underneath the hood
We need to recognise that in a sample $\hat{\beta}$ and $\hat{\alpha}$ are really random variables.
For short, write $EMPFT = E$ and $WAGE\_ST = W$:

$$\begin{aligned}
\hat{\beta} &= \frac{\widehat{Cov}(E, W)}{\widehat{Var}(W)} = \frac{\widehat{Cov}(\alpha + \beta W + u,\; W)}{\widehat{Var}(W)} \\
&= \frac{\widehat{Cov}(\alpha, W) + \beta\, \widehat{Cov}(W, W) + \widehat{Cov}(u, W)}{\widehat{Var}(W)} \\
&= \beta\, \frac{\widehat{Var}(W)}{\widehat{Var}(W)} + \frac{\widehat{Cov}(u, W)}{\widehat{Var}(W)} = \beta + \frac{\widehat{Cov}(u, W)}{\widehat{Var}(W)}
\end{aligned}$$

So $\hat{\beta}$ is a function of the random term $u$ and hence is itself a random
variable. Once $\widehat{Cov}(E, W)$ and $\widehat{Var}(W)$ are replaced by sample
estimates we get ONE value, which is a draw from a random distribution.
OLS - estimator properties
What can we learn from this?
If $u_{it}$ is a random variable, so is $\hat{\beta}$
Any particular value we get is a draw from a random distribution
An estimator is unbiased if, on average, the estimates would be
equal to the unknown $\beta$
At this stage the concept of unbiasedness may still be a little hazy,
and that is fine
For this to happen we need to assume that $Cov(u, x) = 0$, as then
$E(\hat{\beta}) = \beta$
Why do we need to assume this? Because while we do have values
for $x_{it}$, we do not have values for the unobserved error terms $u_{it}$.
Hence we cannot test this. As you will find out, this is a thinking
exercise, and whether it is true/false/sensible/appropriate is at the
core of what we do.
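A small simulation (a sketch, not from the slides; all names are made up) illustrates what goes wrong when $Cov(u, x) \neq 0$:

set.seed(123)
n <- 10000
z <- rnorm(n)        # an unobserved common driver
x <- z + rnorm(n)    # explanatory variable, correlated with z
u <- z + rnorm(n)    # error term, also correlated with z, hence Cov(u, x) > 0
y <- 1 + 2 * x + u   # the true beta is 2
coef(lm(y ~ x))      # slope around 2.5: biased upwards
y2 <- 1 + 2 * x + rnorm(n)  # same model with an exogenous error term
coef(lm(y2 ~ x))     # slope close to the true value of 2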
OLS - the exogeneity assumption
For $\hat{\beta}$ in $y_{it} = \alpha + \beta x_{it} + u_{it}$ to be unbiased (i.e. on average correct) we
needed

$$Cov(u_{it}, x_{it}) = 0$$

This is sometimes called the exogeneity assumption. The error term has
to be uncorrelated with the explanatory variable $x_{it}$.
There are a lot of reasons why this assumption may be breached:
Simultaneity ($WAGE\_ST \rightarrow EMPFT$ and $EMPFT \rightarrow WAGE\_ST$).
Discuss: if causality goes in both directions, and you could just as well
estimate the model the other way round, we cannot attach a one-directional
causal interpretation to the estimated coefficient.
Omitted relevant variables or unobserved heterogeneity
Measurement error in $x_{it}$
So how to make causal statements
Once we have found reasons to believe in the exogeneity assumption, the
next few lectures will introduce various standard techniques that use
this assumption:
First Difference
Diff-in-Diff, to be used in Project
Instrumental Variables
Regression Discontinuity
All of them can be thought of as specific ways to apply a regression
model.
Diff-in-Diff - The Problem
Do higher minimum wages decrease employment (as predicted by a
simplistic labour market model)?
The Research Question
“This paper presents new evidence on the effect of minimum wages on
establishment-level employment outcomes. We analyze the experiences
of 410 fast-food restaurants in New Jersey and Pennsylvania following
the increase in New Jersey’s minimum wage from $4.25 to $5.05 per
hour. Comparisons of employment, wages, and prices at stores in New
Jersey and Pennsylvania before and after the rise offer a simple method
for evaluating the effects of the minimum wage.”
Card, David; Krueger, Alan B. (1994, p. 772)
Wage distribution - Pre
Look at the distribution of starting wages before the change in minimum
wage in New Jersey (WAGE_ST).
At this stage it is not so important to understand the commands for
these plots.
The easiest way to plot a histogram is
hist(CKdata$WAGE_ST[CKdata$STATEf == "Pennsylvania"])
where, in square brackets, we select only the data from Pennsylvania.
hist(CKdata$WAGE_ST[CKdata$STATEf == "Pennsylvania"])
hist(CKdata$WAGE_ST[CKdata$STATEf == "New Jersey"])
Wage distribution - Pre
Or here an alternative visualisation.
ggplot(CKdata,aes(WAGE_ST, colour = STATEf), colour = STATEf) +
geom_histogram(position="identity",
aes(y = ..density..),
bins = 10,
alpha = 0.2) +
ggtitle(paste("Starting wage distribution, Feb/Mar 1992"))
[Figure: overlaid density histograms of WAGE_ST for Pennsylvania and New Jersey, titled "Starting wage distribution, Feb/Mar 1992"]
Wage distribution - Pre
Both plots show that the starting wage distribution is fairly similar in
both states, with peaks at the minimum wage of $4.25 and at $5.00.
Policy Evaluation
First we can evaluate whether the legislation has been implemented.
Tab1 <- CKdata %>% group_by(STATEf) %>%
  summarise(wage_FEB = mean(WAGE_ST, na.rm = TRUE),
            wage_DEC = mean(WAGE_ST2, na.rm = TRUE)) %>%
print()
## # A tibble: 2 x 3
## STATEf wage_FEB wage_DEC
## <fct> <dbl> <dbl>
## 1 Pennsylvania 4.63 4.62
## 2 New Jersey 4.61 5.08
Average wage in New Jersey has increased.
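As a small sketch (not in the original slides) of how a hypothesis test looks in R, we could test whether mean starting wages differed between the two states before the policy change:

t.test(WAGE_ST ~ STATEf, data = CKdata)  # Welch two-sample t-test, Pennsylvania vs New Jersey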
Policy Evaluation - Wage distribution
ggplot(CKdata,aes(WAGE_ST2, colour = STATEf), colour = STATEf) +
geom_histogram(position="identity",
aes(y = ..density..),
bins = 10,
alpha = 0.2) +
ggtitle(paste("Starting wage distribution, Nov/Dec 1992"))
[Figure: overlaid density histograms of WAGE_ST2 for Pennsylvania and New Jersey, titled "Starting wage distribution, Nov/Dec 1992"]
Policy Evaluation - Employment outcomes
Let’s measure employment before and after the policy change.
We calculate two new variables, FTE and FTE2 (full-time-equivalent
employment before and after the policy change):
CKdata$FTE <- CKdata$EMPFT + CKdata$NMGRS + 0.5*CKdata$EMPPT
CKdata <- CKdata %>% mutate(FTE2 = EMPFT2 + NMGRS2 + 0.5*EMPPT2)
TabDiD <- CKdata %>% group_by(STATEf) %>%
  summarise(meanFTE_FEB = mean(FTE, na.rm = TRUE),
            meanFTE_DEC = mean(FTE2, na.rm = TRUE)) %>%
print()
## # A tibble: 2 x 3
## STATEf meanFTE_FEB meanFTE_DEC
## <fct> <dbl> <dbl>
## 1 Pennsylvania 23.3 21.2
## 2 New Jersey 20.4 21.0
Policy Evaluation - Diff-in-Diff estimator
ggplot(CKdata, aes(1992,FTE, colour = STATEf)) +
geom_point(alpha = 0.2) +
geom_point(aes(1993,FTE2),alpha = 0.2) +
labs(x = "Time") +
ggtitle(paste("Employment, FTE"))
[Figure: FTE for each store, plotted at 1992 (FTE) and 1993 (FTE2) and coloured by state, titled "Employment, FTE"]
Policy Evaluation - Diff-in-Diff estimator
ggplot(CKdata, aes(1992,FTE, colour = STATEf)) +
geom_jitter(alpha = 0.2) +
geom_jitter(aes(1993,FTE2),alpha = 0.2) +
labs(x = "Time") +
ggtitle(paste("Employment, FTE"))
[Figure: the same scatter with jittered points, titled "Employment, FTE"]
Policy Evaluation - Diff-in-Diff estimator
ggplot(TabDiD, aes(1992,meanFTE_FEB, colour = STATEf)) +
geom_point(size = 3) +
geom_point(aes(1993,meanFTE_DEC),size=3) +
ylim(17, 24) +
labs(x = "Time") +
ggtitle(paste("Employment, mean FTE"))
[Figure: mean FTE by state at 1992 and 1993, titled "Employment, mean FTE"]
Policy Evaluation - Diff-in-Diff estimator
print(TabDiD)
## # A tibble: 2 x 3
## STATEf meanFTE_FEB meanFTE_DEC
## <fct> <dbl> <dbl>
## 1 Pennsylvania 23.3 21.2
## 2 New Jersey 20.4 21.0
Numerically the DiD estimator is calculated as follows:
(21.0 - 20.4) - (21.2 - 23.3) = 2.7
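The same number can be computed from the TabDiD tibble created above (a sketch; row 1 is Pennsylvania, row 2 New Jersey):

did <- (TabDiD$meanFTE_DEC[2] - TabDiD$meanFTE_FEB[2]) -
       (TabDiD$meanFTE_DEC[1] - TabDiD$meanFTE_FEB[1])
did  # approximately 2.7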
Later: this can be calculated using a regression approach (which has some
additional advantages)
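As a preview, a sketch of that regression approach (the long-format data frame CKlong and the dummy names nj and post are made up here, not defined in these slides):

CKlong <- CKdata %>%
  select(STATEf, FTE, FTE2) %>%
  pivot_longer(c(FTE, FTE2), names_to = "wave", values_to = "fte") %>%
  mutate(post = (wave == "FTE2"),        # TRUE for the Dec 1992 wave
         nj = (STATEf == "New Jersey"))  # TRUE for New Jersey stores
lm(fte ~ nj*post, data = CKlong)  # the coefficient on nj:post is the DiD estimate

Note that lm() drops stores with missing employment, so the estimate can differ slightly from the difference of the na.rm means above.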
Outlook
Over the next weeks you will learn
to perform more advanced statistical analysis in R, such as:
- Hypothesis testing
- Multivariate regression analysis
- Specification testing
to devise methods to draw causal inference
to understand the main pitfalls of time-series modelling and
forecasting