Project Report

Ekaterina Kelman

2024-05-01

Group Project Report

Research Question and Hypothesis

Research Question: Our research question is “What Factors Determine Fertility in the Poorest Societal
Groups?” To investigate it, we focus on the Philippines, specifically on agricultural field workers, who
represent the most economically disadvantaged segment of society. Using the Philippines fieldworker dataset,
we examine how religion, education, type of employment, and living conditions influence the number of
living children. Our aim is to identify the social and economic factors that drive fertility in this
marginalized population and to provide insights that could inform targeted policies and interventions.
The project is relevant for two reasons. First, it centers on agricultural workers in the Philippines, who
are the poorest group within the workforce. The economic hardships these individuals face are exacerbated by
the instability and seasonal nature of agricultural employment, which often leads to lower financial
stability and poorer access to healthcare. By focusing on this group, we aim to provide a detailed analysis
of fertility patterns in a context of pronounced economic vulnerability, which could help in developing
tailored health and social programs. Second, fertility among the poorest echelons of society is typically
underrepresented in academic research, largely due to social stigma and discrimination. This leaves a
significant gap in understanding the dynamics that influence reproductive decisions in these populations.
Our study aims to offer new insights that could influence public policy and support mechanisms for these
marginalized groups.
Hypotheses (Individual Significance):
Null Hypothesis: None of the outcomes of the factor variables is statistically significant at the 5% level
in determining fertility.
Alternative Hypothesis: At least one of the outcomes of the factor variables is statistically significant
at the 5% level in determining fertility.
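
In terms of the regression coefficients used later in the report, each individual-significance test can be written as follows. This is a sketch under the usual classical linear-model assumptions; the symbols for the estimate, its standard error, the sample size n, and the number of regressors k are notation introduced here, not taken from the report.

```latex
H_0:\ \beta_j = 0
\quad \text{vs.} \quad
H_1:\ \beta_j \neq 0,
\qquad
t_j = \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)} \sim t_{\,n-k-1}
\ \text{ under } H_0 .
```

The null is rejected at the 5% level for coefficient j when the two-sided p-value of t_j is below 0.05.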

Brief Literature Review

For our research project, we reviewed several papers that consider determinants of fertility in both
developing and developed countries. The findings fall into the following three categories:
1. Religion as a Determinant of Fertility.
The paper “Role of Religion in Fertility Decline: The Case of Indian Muslims” seeks to understand the
role of religion in the context of fertility differentials in India, focusing on the disparities between Hindus
and Muslims. The authors analyze micro-level data from the National Family Health Surveys to assess the
contribution of socio-economic factors to the fertility differential by religion. One of the tables clearly shows
the diverging trends in Hindu-Muslim fertility. It shows the average number of children ever born reported
by Hindu and Muslim women of age beyond 40 years in five-year age intervals, separately for rural and urban
areas of India. Reaching the end of their reproductive span, the difference is about one birth per woman, or
an excess of about 25 per cent among Muslim women. The rise in Hindu-Muslim fertility difference has been
particularly sharp in urban areas where the recent cohorts show a difference of 1.2 children, or an excess of
more than 30 per cent in Muslim fertility over that of the Hindus.
2. Education as a Determinant of Fertility.
The research paper “The effect of parental education on marital fertility in developing countries” underscores
the impact of educational attainment on fertility. One of its models shows how the degree of fertility
control changes with each additional step in educational status. For example, in Latin America the degree of
fertility control tends to rise strongly with each step in the educational status of the husband, and there
are parallel effects on fertility. In the Caribbean there is a very large difference between the small
category where the husband has had no schooling and the incomplete primary group, but more modest rises
thereafter.
3. Employment as a Determinant of Fertility.
Unlike previous research papers, the following one centers its analysis on a developed country (Italy). Insights
from “Economic insecurity and fertility intentions: The case of Italy” reveal how economic conditions and
job security influence fertility intentions and outcomes. Economic insecurity, which includes job instability,
especially women’s job instability, income unpredictability, and low household wealth, tends to discourage
the decision to have children, particularly the decision to have a first child. From the models described in
this paper, we can conclude that economic insecurity, whether through job status, income, or wealth, has a
clear negative effect on fertility intentions in Italy. Women who face higher economic uncertainty are less
likely to plan to have children, likely due to concerns about financial stability and the ability to provide for
future children.

Data Cleaning

First, I imported, cleaned, and merged the relevant datasets.

The haven and foreign packages are used for reading and writing data in formats from other statistical
software (such as Stata).
dplyr is used for data manipulation operations such as filtering and summarizing.
stargazer is used for creating well-formatted tables, lmtest for testing linear regression models, and knitr
for dynamic report generation.

Then, I assigned the file variables to new, more readable R variables (‘permortemp’, ‘age’, ‘livchild’), using
the functions revalue() (from the plyr package) and as.character() / as.factor(). as.character() and
as.factor() ensure that the data is in the correct format for relabeling. revalue() maps old values to new,
more descriptive names. relevel() changes the reference level of a factor, which matters in regression
modeling because the reference level is the base category against which the other categories are compared.
Then, I converted the variables Residence, Religion, Employment Type, Education, and Marital Status from
numeric or less informative labels to descriptive names, improving readability and interpretability for
statistical analysis. Releveling some variables sets the base level for categorical comparisons in
subsequent analyses.
The PH$fw103 variable, which contains numeric codes for different types of residence, is first converted to a
character vector and then recoded to “city”, “town”, and “rural”. PH$fw111, which contains numeric codes for
different religions, is converted to a factor. The numeric codes are then mapped to descriptive names for
each religion. After relabeling, the base level for the factor religion is set to “other”.
The variable PH$fw117 is converted to character and then recoded to “perm” for permanent and “temp” for
temporary employment. The education variable, which represents educational attainment with numeric codes, is
converted to a factor and relabeled with descriptive names indicating levels of education. The base level for
the education factor is set to “seniors”, making it the reference category for comparisons in models.
The variable fw106 is converted to character and recoded to represent different marital statuses. Codes 1
and 2 are both mapped to “livingtogether”, and 3 and 5 to “Notlivingtogether”.
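
The recoding steps described above can be illustrated on toy data. This is a minimal sketch, not the report's actual cleaning code: it uses base R's factor() in place of revalue(), and the data frame toy, its column names, and the code-to-label mapping for religion are made up for illustration.

```r
# Toy data standing in for the numeric survey codes (all values hypothetical).
toy <- data.frame(
  residence_code = c(1, 3, 3, 2, 1),
  religion_code  = c(1, 2, 9, 1, 9)
)

# factor() maps codes to labels in one step; the order of `levels`
# must match the order of `labels`.
toy$residence <- factor(
  toy$residence_code,
  levels = c(1, 2, 3),
  labels = c("city", "town", "rural")
)

toy$religion <- factor(
  toy$religion_code,
  levels = c(1, 2, 9),
  labels = c("catholic", "protestant", "other")
)

# relevel() moves the chosen level to the front, which makes it the
# reference category in any later regression.
toy$religion <- relevel(toy$religion, ref = "other")

levels(toy$religion)
table(toy$residence)
```

Checking levels() after relevel() is a quick way to confirm which category a regression will treat as the base.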

Summary Statistics

# Proportion of Field Workers by Number of Children (Fertility)


table0 = prop.table(table(PH$livchild))
print(table0)

## numeric(0)

# Proportion of Field Workers by Religion


table1 = prop.table(table(PH$religion))
print(table1)

## numeric(0)

# Proportion of Field Workers by Marital Status


table2 = prop.table(table(PH$marriagestatus))
print(table2)

## numeric(0)

# Proportion of Field Workers by Education


table3 = prop.table(table(PH$education))
print(table3)

## numeric(0)

# Proportion of Field Workers by Sex


table4 = prop.table(table(PH$sex))
print(table4)

## numeric(0)

# Proportion of Field Workers by Employment


table5 = prop.table(table(PH$permortemp))
print(table5)

## numeric(0)

# Proportion of Field Workers by Residence


table6 = prop.table(table(PH$residence))
print(table6)

## numeric(0)
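
The numeric(0) results above indicate that each table was computed on an empty input. One plausible explanation, offered here as an assumption and not as a confirmed diagnosis, is that the recoded vectors (livchild, religion, and so on) existed as stand-alone objects and had not yet been stored as columns of PH, since $ returns NULL for a column that does not exist. The toy example below (made-up data, hypothetical names) reproduces that behaviour and shows the usual remedy.

```r
# Toy data frame with one column; `religion` is deliberately NOT a column yet.
toy <- data.frame(id = 1:6)
religion <- c("catholic", "catholic", "other", "islam", "catholic", "other")

# `$` on a missing data-frame column returns NULL, so the table is empty.
empty_tab <- prop.table(table(toy$religion))
length(empty_tab)

# Storing the vector as a column makes the same call return proportions.
toy$religion <- religion
full_tab <- prop.table(table(toy$religion))
full_tab
```

If this is indeed the cause, adding each recoded vector to the data frame before tabulating would give the intended proportions.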

# Bar plots of the average number of living children by each independent, non-control variable

ggplot(PH, aes(x = permortemp, y = livchild, fill = permortemp)) +
  stat_summary(fun = mean, geom = "bar") +
  labs(title = "Fertility by Employment Type", x = "Employment Type", y = "Average Number of Living Children") +
  theme_minimal()

[Figure: bar chart “Fertility by Employment Type”. x-axis: Employment Type (perm, temp); y-axis: Average Number of Living Children.]
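
The bar heights in the plot above are group means, which is what stat_summary(fun = mean, geom = "bar") computes. A minimal base R sketch of the same summary on made-up data (the data frame toy and its values are hypothetical) is:

```r
# Hypothetical observations: employment type and number of living children.
toy <- data.frame(
  permortemp = factor(c("perm", "temp", "temp", "perm", "temp")),
  livchild   = c(0, 2, 1, 1, 3)
)

# tapply() applies mean() to livchild within each level of permortemp,
# giving one value per bar.
group_means <- tapply(toy$livchild, toy$permortemp, mean)
group_means
```

Printing these means next to the plot makes it easier to quote exact values in the text.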

ggplot(PH, aes(x = education, y = livchild, fill = education)) +
  stat_summary(fun = mean, geom = "bar") +
  labs(title = "Fertility by Education", x = "Highest Level of Education", y = "Average Number of Living Children") +
  theme_minimal()

[Figure: bar chart “Fertility by Education”. x-axis: Highest Level of Education (seniors, postsecondary, terciary, bachelor, masters, doctoral); y-axis: Average Number of Living Children.]

ggplot(PH, aes(x = religion, y = livchild, fill = religion)) +


stat_summary(fun = mean, geom = "bar") +
labs(title = "Fertility by Religion", x = "Religion", y = "Average Number of Living Children") +
theme_minimal()

[Figure: bar chart “Fertility by Religion”. x-axis: Religion (other, catholic, protestant, iglesia_ni_cristo, aglipay, islam); y-axis: Average Number of Living Children.]

ggplot(PH, aes(x = residence, y = livchild, fill = residence)) +
  stat_summary(fun = mean, geom = "bar") +
  labs(title = "Fertility by Residence", x = "Type of Residence", y = "Average Number of Living Children") +
  theme_minimal()

[Figure: bar chart “Fertility by Residence”. x-axis: Type of Residence (city, rural, town); y-axis: Average Number of Living Children.]

Main Specification

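The main specification is the following:

$$
\begin{aligned}
\text{livchild} ={} & \beta_0 + \beta_1\,\text{religion}_1 + \beta_2\,\text{religion}_2 + \beta_3\,\text{religion}_3 + \beta_4\,\text{religion}_4 + \beta_5\,\text{religion}_5 \\
& + \beta_6\,\text{residence}_1 + \beta_7\,\text{residence}_2 + \beta_8\,\text{permortemp} + \beta_9\,\text{age} \\
& + \beta_{10}\,\text{education}_1 + \beta_{11}\,\text{education}_2 + \beta_{12}\,\text{education}_3 + \beta_{13}\,\text{education}_4 + \beta_{14}\,\text{education}_5 \\
& + \beta_{15}\,\text{maritalstatus}_1 + \beta_{16}\,\text{maritalstatus}_2 + e
\end{aligned}
$$
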
This is a multiple linear regression model where:
livchild is the dependent variable, the number of living children. It directly measures fertility, which is
the primary outcome of interest.
religion is an independent factor variable indicating the religious affiliation of the individual, with the
categories Catholic, Protestant, Iglesia ni Cristo, Aglipay, Islam, and Other (the base category). Religion
can influence fertility rates due to differing beliefs about contraception and family size.
residence (Independent Variable) is a factor variable indicating the type of residence, with the categories
City (the base category), Town, and Rural. Urban vs. rural residence may affect access to healthcare
services, including reproductive health services, and thus fertility.
permortemp (Independent Variable) is a dummy variable for the type of employment, with the categories
Permanent (the base category) and Temporary. Job security and type of employment can influence family
planning decisions and economic stability, thereby affecting fertility.
age (Independent Variable) is a control variable representing the age of the individual. Age is a primary
factor in fertility: fertility rates typically decrease as age increases, although the number of living
children, a cumulative measure, can still rise with age.
education (Independent Variable) is a factor variable describing the highest level of education attained,
with the categories High school (the base category, labeled “seniors” in the data), Postsecondary, Tertiary,
Bachelor, Masters, and Doctoral. Education level is often inversely related to fertility rates; higher
education levels are associated with lower fertility, possibly due to career priorities and better knowledge
of family planning.
maritalstatus (Independent Variable) is a control factor variable describing the marital status of the
individual, with the categories Married (the base category), Never married, and Not living together. Marital
status influences fertility decisions, with different family structures presenting varying fertility norms
and opportunities.
This specification relates directly to the research question: each variable in the model was selected for
its potential impact on fertility, particularly within the context of the poorest societal groups in the
Philippines. The analysis allows us to explore how socio-economic factors (such as employment and education),
demographic characteristics (such as age and marital status), cultural factors (reflected in religious
affiliation), and geographical differences (urban vs. rural residence) relate to fertility rates.
By fitting this linear regression model, we can quantify the influence of each variable on the number of living
children. For example, coefficients for education levels will show how fertility rates vary with educational
attainment, holding other factors constant. Similarly, the impact of living in rural versus urban settings
on fertility can be assessed. This comprehensive analysis helps in understanding the complex interplay
of various factors that determine fertility rates among the poorest societal groups, contributing to more
informed policy-making aimed at addressing demographic and socio-economic challenges.
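
To make the specification concrete, the sketch below shows how R expands a factor into one dummy variable per non-reference level, which is where the indexed terms such as residence1 and residence2 come from. The data frame toy and its values are invented for illustration; only the variable names follow the report.

```r
# Hypothetical mini data set with one factor and one numeric regressor.
toy <- data.frame(
  livchild  = c(0, 2, 3, 1, 0, 2),
  age       = c(24, 35, 46, 31, 23, 40),
  residence = factor(c("city", "town", "rural", "town", "city", "rural"),
                     levels = c("city", "rural", "town"))
)

# model.matrix() shows the columns lm() actually estimates: an intercept,
# one dummy per non-reference level of residence, and age.
X <- model.matrix(livchild ~ residence + age, data = toy)
colnames(X)

# The fitted model has one coefficient per column of X.
fit <- lm(livchild ~ residence + age, data = toy)
coef(fit)
```

The first factor level ("city" here) has no column of its own; its effect is absorbed into the intercept, so every dummy coefficient is a difference relative to that base category.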
An alternative specification will also be used:

$$
\begin{aligned}
\text{livchild} ={} & \beta_0 + \beta_1\,\text{religion}_1 + \beta_2\,\text{religion}_2 + \beta_3\,\text{religion}_3 + \beta_4\,\text{religion}_4 + \beta_5\,\text{religion}_5 \\
& + \beta_6\,\text{residence}_1 + \beta_7\,\text{residence}_2 + \beta_8\,\text{permortemp} + \beta_9\,\text{age} \\
& + \beta_{10}\,\text{education}_1 + \beta_{11}\,\text{education}_2 + \beta_{12}\,\text{education}_3 + \beta_{13}\,\text{education}_4 + \beta_{14}\,\text{education}_5 \\
& + \beta_{15}\,\text{maritalstatus}_1 + \beta_{16}\,\text{maritalstatus}_2 + \beta_{17}\,\text{sex} + e
\end{aligned}
$$

This model estimates the influence of religion, residence, type of employment (permanent/temporary), age,
education, marital status, and sex on the number of living children. It includes all the previously
mentioned variables plus the sex of the respondent, which allows us to examine whether there are any
gender-specific differences in fertility within the population studied.
The key difference between the two models is the inclusion of sex in the expanded model. This addition
tests whether male and female respondents have different levels of fertility, controlling for all other factors
like religion, socioeconomic status (as inferred from residence and employment type), age, educational
attainment, and marital status.
I added this factor using the following rationale: Fertility can be influenced by both biological factors and
gender roles within societies, which can lead to different reproductive behaviors and outcomes for men
and women. Moreover, understanding gender differences in fertility can help in tailoring public health
and educational programs that aim to address family planning and reproductive health. Finally, including
sex ensures that the analysis accounts for one more potential confounder, providing a more nuanced
understanding of what determines fertility rates.

Visual Representations of the Data

library(ggplot2)
ggplot(PH, aes(x=livchild, fill=education)) +
geom_histogram(binwidth=1, position="dodge") +
labs(title="Fertility by Education Level", x="Number of Living Children", y="Frequency")

[Figure: dodged histogram “Fertility by Education Level”, filled by education (seniors, postsecondary, terciary, bachelor, masters, doctoral). x-axis: Number of Living Children; y-axis: Frequency.]

library(dplyr)
PH <- PH %>%
mutate(
livchild = as.numeric(livchild),
permortemp = as.factor(permortemp),
education = as.factor(education),
age = as.numeric(age)
)
str(PH)

## tibble [470 x 34] (S3: tbl_df/tbl/data.frame)


## $ fw101 : num [1:470] 100 101 110 111 112 113 120 121 122 123 ...
## ..- attr(*, "label")= chr "fieldworker code"
## ..- attr(*, "format.stata")= chr "%8.0g"
## $ fw000 : chr [1:470] "PH8" "PH8" "PH8" "PH8" ...
## ..- attr(*, "label")= chr "country code and phase"
## ..- attr(*, "format.stata")= chr "%3s"
## $ fw102r : dbl+lbl [1:470] 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## ..@ label : chr "fieldworker region of residence"
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:17] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..- attr(*, "names")= chr [1:17] "ilocos" "cagayan valley" "central luzon" "calabarzon" ...
## $ fw102p : dbl+lbl [1:470] 33, 55, 28, 28, 28, 28, 29, 29, 29, 29, 29, 33, 33, 33...
## ..@ label : chr "province"
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:92] 1 2 3 4 5 6 7 8 9 10 ...

## .. ..- attr(*, "names")= chr [1:92] "abra" "agusan del norte" "agusan del sur" "aklan" ...
## $ fw103 : dbl+lbl [1:470] 1, 3, 3, 1, 3, 2, 1, 2, 3, 2, 2, 2, 3, 1, 3, 2, 2, 3, ...
## ..@ label : chr "fieldworker type of place of residence"
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:3] 1 2 3
## .. ..- attr(*, "names")= chr [1:3] "city" "town" "rural"
## $ fw104 : num [1:470] 26 46 46 27 26 43 31 31 23 23 ...
## ..- attr(*, "label")= chr "fieldworker age"
## ..- attr(*, "format.stata")= chr "%8.0g"
## $ fw105 : dbl+lbl [1:470] 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...
## ..@ label : chr "fieldworker sex"
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:2] 1 2
## .. ..- attr(*, "names")= chr [1:2] "male" "female"
## $ fw106 : dbl+lbl [1:470] 6, 1, 1, 6, 6, 1, 1, 1, 6, 6, 1, 6, 1, 2, 6, 2, 1, 1, ...
## ..@ label : chr "fieldworker marital status"
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:6] 1 2 3 4 5 6
## .. ..- attr(*, "names")= chr [1:6] "currently married" "living with a man/woman" "widowed" "divorc
## $ fw107 : num [1:470] 0 3 2 0 0 2 2 2 0 0 ...
## ..- attr(*, "label")= chr "fieldworker number of living children"
## ..- attr(*, "format.stata")= chr "%8.0g"
## $ fw108 : dbl+lbl [1:470] 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, ...
## ..@ label : chr "fieldworker ever had a child who died"
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:2] 1 2
## .. ..- attr(*, "names")= chr [1:2] "yes" "no"
## $ fw109 : dbl+lbl [1:470] 6, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 6, 5, 6, ...
## ..@ label : chr "highest level of school"
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:9] 0 1 2 3 4 5 6 7 8
## .. ..- attr(*, "names")= chr [1:9] "level 0 - early childhood education" "level 1 - primary educat
## $ fw110 : dbl+lbl [1:470] 607, 702, 603, 607, 607, 607, 603, 607, 607, 607, 607,...
## ..@ label : chr "highest grade/year completed"
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:48] 0 1 2 101 102 103 104 105 106 108 ...
## .. ..- attr(*, "names")= chr [1:48] "no grade completed" "nursery" "kindergarten" "grade 1" ...
## $ fw111 : dbl+lbl [1:470] 1, 2, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## ..@ label : chr "fieldworker religion"
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:7] 1 2 3 4 5 95 96
## .. ..- attr(*, "names")= chr [1:7] "roman catholic" "protestant" "iglesia ni cristo" "aglipay" ...
## $ fw112 : dbl+lbl [1:470] 3, 96, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3...
## ..@ label : chr "fieldworker ethnicity"
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:9] 1 2 3 4 5 6 7 8 96
## .. ..- attr(*, "names")= chr [1:9] "tagalog" "cebuano" "ilokano" "ilongo" ...
## $ fw113a : chr [1:470] "A" "A" "A" "A" ...
## ..- attr(*, "label")= chr "english"
## ..- attr(*, "format.stata")= chr "%1s"
## $ fw113b : chr [1:470] "B" "B" "B" "B" ...
## ..- attr(*, "label")= chr "tagalog"
## ..- attr(*, "format.stata")= chr "%1s"
## $ fw113c : chr [1:470] "C" "C" "C" "C" ...

## ..- attr(*, "label")= chr "ilocano"
## ..- attr(*, "format.stata")= chr "%1s"
## $ fw113d : chr [1:470] "" "" "" "" ...
## ..- attr(*, "label")= chr "bikol"
## ..- attr(*, "format.stata")= chr "%1s"
## $ fw113e : chr [1:470] "" "" "" "" ...
## ..- attr(*, "label")= chr "waray"
## ..- attr(*, "format.stata")= chr "%1s"
## $ fw113f : chr [1:470] "" "" "" "" ...
## ..- attr(*, "label")= chr "hiligaynon"
## ..- attr(*, "format.stata")= chr "%1s"
## $ fw113g : chr [1:470] "" "" "" "" ...
## ..- attr(*, "label")= chr "cebuano"
## ..- attr(*, "format.stata")= chr "%1s"
## $ fw113x : chr [1:470] "" "X" "" "" ...
## ..- attr(*, "label")= chr "other language"
## ..- attr(*, "format.stata")= chr "%1s"
## $ fw113y : chr [1:470] "" "" "" "" ...
## ..- attr(*, "label")= chr "no other language"
## ..- attr(*, "format.stata")= chr "%1s"
## $ fw114 : dbl+lbl [1:470] 3, 96, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3...
## ..@ label : chr "fieldworker’s mother tongue/native language"
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:8] 1 2 3 4 5 6 7 96
## .. ..- attr(*, "names")= chr [1:8] "english" "tagalog" "ilocano" "bikol" ...
## $ fw115a : dbl+lbl [1:470] 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, ...
## ..@ label : chr "has fieldworker ever worked on a dhs prior to this survey"
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:2] 1 2
## .. ..- attr(*, "names")= chr [1:2] "yes" "no"
## $ fw115b : dbl+lbl [1:470] NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## ..@ label : chr "has fieldworker ever worked on an mis prior to this survey"
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:2] 1 2
## .. ..- attr(*, "names")= chr [1:2] "yes" "no"
## $ fw115c : dbl+lbl [1:470] 2, 1, 1, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
## ..@ label : chr "has fieldworker ever worked on any other survey prior to this survey"
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:2] 1 2
## .. ..- attr(*, "names")= chr [1:2] "yes" "no"
## $ fw116 : dbl+lbl [1:470] 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## ..@ label : chr "was fieldworker working for implementing agency at the time employed for dh
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:3] 1 2 3
## .. ..- attr(*, "names")= chr [1:3] "yes, philippine statistics authority" "yes, department of heal
## $ fw117 : dbl+lbl [1:470] 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, ...
## ..@ label : chr "is fieldworker a permanent or temporary employee of agency"
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:2] 1 2
## .. ..- attr(*, "names")= chr [1:2] "permanent" "temporary"
## $ fw118 : dbl+lbl [1:470] 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...
## ..@ label : chr "does fieldworker have any comments"
## ..@ format.stata: chr "%8.0g"
## ..@ labels : Named num [1:2] 1 2

## .. ..- attr(*, "names")= chr [1:2] "yes" "no"
## $ livchild : num [1:470] 0 3 2 0 0 2 2 2 0 0 ...
## $ permortemp: Factor w/ 2 levels "perm","temp": 1 1 2 2 2 2 2 2 2 2 ...
## $ education : Factor w/ 6 levels "seniors","postsecondary",..: 4 5 4 4 4 4 4 4 4 4 ...
## $ age : num [1:470] 26 46 46 27 26 43 31 31 23 23 ...

library(ggplot2)
ggplot(PH, aes(x = permortemp, y = livchild, fill = permortemp)) +
  stat_summary(fun = mean, geom = "bar") +
  labs(title = "Fertility by Employment Type", x = "Employment Type", y = "Average Number of Living Children") +
  theme_minimal()

[Figure: bar chart “Fertility by Employment Type”. x-axis: Employment Type (perm, temp); y-axis: Average Number of Living Children.]

# `fun` replaces the `fun.y` argument, which is deprecated as of ggplot2 3.3.0
ggplot(PH, aes(x = age, y = livchild)) +
  stat_summary(fun = mean, geom = "line", aes(group = 1)) +
  labs(title = "Fertility Trends by Age", x = "Age", y = "Average Number of Living Children") +
  theme_minimal()

[Figure: line chart “Fertility Trends by Age”. x-axis: Age; y-axis: Average Number of Living Children.]

Results

The results from the main specification are reported below:

lm.fit3 <- lm(livchild ~ religion+residence+permortemp+age+education+marriagestatus, data = PH)


summary(lm.fit3)

##
## Call:
## lm(formula = livchild ~ religion + residence + permortemp + age +
## education + marriagestatus, data = PH)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.3970 -0.5764 -0.0581 0.4266 4.6634
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.553870 1.130253 0.490 0.6243
## religioncatholic -0.177243 0.174957 -1.013 0.3116
## religionprotestant -0.361492 0.265872 -1.360 0.1746
## religioniglesia_ni_cristo -0.408695 0.420969 -0.971 0.3321
## religionaglipay -0.494848 0.517561 -0.956 0.3395
## religionislam -0.223479 0.245209 -0.911 0.3626
## residencerural 0.067124 0.125457 0.535 0.5929
## residencetown 0.219063 0.123625 1.772 0.0771 .
## permortemptemp 0.446308 0.174893 2.552 0.0110 *

## age 0.045706 0.006673 6.849 2.43e-11 ***
## educationpostsecondary -0.166571 1.146998 -0.145 0.8846
## educationterciary -0.627976 1.134663 -0.553 0.5802
## educationbachelor -0.666793 1.111512 -0.600 0.5489
## educationmasters -0.548162 1.121590 -0.489 0.6253
## educationdoctoral -3.067727 1.565051 -1.960 0.0506 .
## marriagestatusnevermarried -1.382170 0.126649 -10.913 < 2e-16 ***
## marriagestatusNotlivingtogether 0.032251 0.229106 0.141 0.8881
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 1.091 on 453 degrees of freedom
## Multiple R-squared: 0.4507, Adjusted R-squared: 0.4313
## F-statistic: 23.23 on 16 and 453 DF, p-value: < 2.2e-16

Intercept (0.553870): This is the expected number of living children when every factor variable is at its
reference category and age is zero. It is not statistically significant (p = 0.6243).
Religion: Compared to the reference category (“other”), no religion has a statistically significant effect
on the number of living children; all p-values exceed 0.17.
Residence: Living in a town (residencetown = 0.219063) is associated with more living children than living
in a city, but the effect is significant only at the 10% level (p = 0.0771). The rural coefficient (0.067124)
is not significant (p = 0.5929).
Employment (permortemptemp = 0.446308): Being temporarily employed is associated with a higher number of
living children than permanent employment, and the effect is statistically significant at the 5% level
(p = 0.0110).
Age (0.045706): Age has a positive and statistically significant effect on the number of living children,
indicating that as age increases, so does the number of living children.
Education: Every education level has a negative coefficient relative to the reference category, but none is
statistically significant at the 5% level; the doctoral coefficient (-3.067727) comes closest (p = 0.0506).
Marital Status: Never having been married (marriagestatusnevermarried = -1.382170) significantly decreases
the number of living children (p < 2e-16).
The F-statistic is 23.23 with a highly significant p-value (< 2.2e-16). The predictors are therefore jointly
significant: the model explains significantly more of the variability in the number of living children than
a model with no predictors. The R-squared of 0.4507 indicates that it accounts for about 45% of that
variability.
Null Hypothesis Testing:
Null Hypothesis: None of the outcomes of the factor variables is statistically significant at 5% in
determining fertility.
Alternative Hypothesis: At least one of the outcomes of the factor variables is statistically significant at
5% in determining fertility.
Based on the t-tests in the regression output, the effect of age is highly statistically significant
(p < 0.0001), implying that age is a strong predictor of the number of living children.
marriagestatusnevermarried is also highly significant (p < 0.0001), indicating a substantial negative effect
of never having been married on the number of living children.
permortemptemp is statistically significant at the 5% level (p = 0.0110), showing that temporary employment,
compared to permanent employment, is positively associated with the number of living children.
residencetown (p = 0.0771) and educationdoctoral (p = 0.0506) are significant only at the 10% level.
Conclusion: From these results, we reject the null hypothesis for age, marriagestatusnevermarried, and
permortemptemp at the 5% significance level, as their p-values are below the threshold, indicating that these
predictors significantly influence the number of living children.
Thus, the analysis supports the conclusion that certain factors, such as type of employment and marital
status, do significantly influence the number of living children, contributing to our understanding of
fertility in this population segment.
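
Reading p-values off a printed table is error-prone, so it can help to pull them out of the fitted model programmatically. The sketch below illustrates the idea on the built-in mtcars data, which stands in for the survey data; the model formula is arbitrary and not part of the report.

```r
# mtcars ships with R; the formula is only an illustration.
fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# summary()$coefficients is a matrix with one row per term and the columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)".
coefs <- summary(fit)$coefficients
p_values <- coefs[, "Pr(>|t|)"]

# Names of the terms whose individual t-test rejects H0: beta = 0 at 5%.
significant <- names(p_values)[p_values < 0.05]
significant
```

Applied to lm.fit3, the same two lines would list exactly the terms that are individually significant at the 5% level.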

anova(lm.fit3)

## Analysis of Variance Table


##
## Response: livchild
## Df Sum Sq Mean Sq F value Pr(>F)
## religion 5 14.39 2.878 2.4169 0.03526 *
## residence 2 2.78 1.391 1.1681 0.31190
## permortemp 1 0.03 0.031 0.0261 0.87166
## age 1 264.27 264.265 221.9394 < 2e-16 ***
## education 5 16.57 3.314 2.7836 0.01725 *
## marriagestatus 2 144.45 72.226 60.6577 < 2e-16 ***
## Residuals 453 539.39 1.191
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

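I used the ANOVA table to assess the joint effect of each factor variable, as opposed to the individual
categories tested in the regression model. Note that anova() applied to a single lm fit reports sequential
(Type I) sums of squares, so each row tests a variable given only the variables listed above it.
From this, I conclude that religion, age, education, and marriage status are jointly significant in
influencing fertility within this population. In contrast, residence and type of employment are not
significant in this sequential test. For type of employment, this differs from the t-test in the regression
output (p = 0.0110), which conditions on all other variables, including age and marital status.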
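
Because anova() on a single lm fit is sequential, its rows depend on the order of the terms in the formula, whereas drop1() tests each term given all the others. The sketch below demonstrates the difference on the built-in mtcars data (an illustrative stand-in, not the report's data).

```r
# Same model, two term orders.
fit           <- lm(mpg ~ hp + wt, data = mtcars)
fit_reordered <- lm(mpg ~ wt + hp, data = mtcars)

# Sequential (Type I) tests: each row is tested given only the rows above it,
# so the F value for hp changes when the order changes.
seq_a <- anova(fit)
seq_b <- anova(fit_reordered)

# Marginal tests: each term is tested given all the other terms,
# so the order of the formula does not matter.
marg <- drop1(fit, test = "F")

seq_a
seq_b
marg
```

Only the last term of a sequential table agrees with the marginal test, which is why a variable entered early (such as permortemp, entered before age and marital status) can look different in anova() than in the coefficient t-tests.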
The results for the alternative specification are reported below:

sex <- revalue(as.character(PH$fw105), c(`1` = "male", `2` = "female" ))


lm.fit4 <- lm(livchild ~ religion+residence+permortemp+age+education+marriagestatus+sex, data = PH)
summary(lm.fit4)

##
## Call:
## lm(formula = livchild ~ religion + residence + permortemp + age +
## education + marriagestatus + sex, data = PH)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.3966 -0.5764 -0.0578 0.4284 4.6631
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.554837 1.131250 0.490 0.62404
## religioncatholic -0.179157 0.175162 -1.023 0.30695
## religionprotestant -0.363369 0.266138 -1.365 0.17283
## religioniglesia_ni_cristo -0.408078 0.421342 -0.969 0.33330
## religionaglipay -0.497202 0.518043 -0.960 0.33768
## religionislam -0.223504 0.245425 -0.911 0.36295
## residencerural 0.069872 0.125715 0.556 0.57862
## residencetown 0.222074 0.123914 1.792 0.07377 .
## permortemptemp 0.458493 0.177120 2.589 0.00995 **
## age 0.045861 0.006688 6.857 2.32e-11 ***
## educationpostsecondary -0.184502 1.148696 -0.161 0.87247
## educationterciary -0.646875 1.136435 -0.569 0.56949
## educationbachelor -0.684716 1.113201 -0.615 0.53881

## educationmasters -0.557597 1.122772 -0.497 0.61969
## educationdoctoral -3.077175 1.566569 -1.964 0.05011 .
## marriagestatusnevermarried -1.384615 0.126877 -10.913 < 2e-16 ***
## marriagestatusNotlivingtogether 0.031996 0.229309 0.140 0.88909
## sexmale 0.501273 1.111512 0.451 0.65222
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 1.092 on 452 degrees of freedom
## Multiple R-squared: 0.4509, Adjusted R-squared: 0.4303
## F-statistic: 21.83 on 17 and 452 DF, p-value: < 2.2e-16

Null Hypothesis for the alternative specification: Sex is Statistically Significant at 5% in Determining
Fertility
Alternative Hypothesis: Sex is not Statistically Significant at 5% in Determining Fertility

sexmale Coefficient (0.501273) represents the expected difference in the number of living children when
comparing males to females, holding the other regressors fixed. The coefficient is positive, suggesting that
males are associated with reporting a higher number of living children compared to females, under the
model’s conditions. Its standard error (1.111512), however, is more than twice the size of the estimate.
T-Test for sex: t-value for sexmale: 0.451
P-value for sexmale: 0.65222
The t-test assesses whether the coefficient for sexmale is significantly different from zero. Given that the
p-value is approximately 0.652, it far exceeds the typical significance level of 0.05.
F-Test for the Model:
F-statistic: 21.83 on 17 and 452 degrees of freedom
The F-test evaluates whether at least one of the coefficients in the model is different from zero, and with
a p-value < 2.2e-16, we can conclude that the model is statistically significant as a whole.
The reported p-value (0.65222) is from a two-sided test, which tests for the possibility of sexmale having
either a positive or negative impact on the number of living children compared to females. The corresponding
one-sided p-value for a positive effect is half of this, about 0.326. The results of both one-sided and
two-sided tests indicate that sex is not significant at the 5% level.
Thus, we fail to reject the null hypothesis. This result implies that sex is not a significant predictor of
fertility within this model context.
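The t-test and F-test figures can be checked by hand from the printed regression output. The following is a minimal sketch, not part of the original analysis: it copies the sexmale estimate, its standard error, the residual degrees of freedom, and the F-statistic from the summary above and recomputes the p-values. The variable names are illustrative.

```r
# Values copied from the summary(lm.fit4) output above.
est      <- 0.501273  # estimate for sexmale
se       <- 1.111512  # standard error for sexmale
df_resid <- 452       # residual degrees of freedom

# t-statistic and its p-values under the t distribution with 452 df.
t_stat      <- est / se
p_two_sided <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)
p_one_sided <- pt(t_stat, df = df_resid, lower.tail = FALSE)  # H1: coefficient > 0

# Overall F-test: F = 21.83 on 17 and 452 degrees of freedom.
p_model <- pf(21.83, df1 = 17, df2 = 452, lower.tail = FALSE)

round(c(t = t_stat, two_sided = p_two_sided, one_sided = p_one_sided), 4)
p_model < 0.05
```

The recomputed t-statistic and two-sided p-value should agree with the sexmale row of the coefficient table up to rounding, and the one-sided p-value is exactly half of the two-sided one because the estimate is positive.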

anova(lm.fit4)

## Analysis of Variance Table


##
## Response: livchild
## Df Sum Sq Mean Sq F value Pr(>F)
## religion 5 14.39 2.878 2.4126 0.03556 *
## residence 2 2.78 1.391 1.1660 0.31254
## permortemp 1 0.03 0.031 0.0261 0.87177
## age 1 264.27 264.265 221.5491 < 2e-16 ***
## education 5 16.57 3.314 2.7787 0.01742 *
## marriagestatus 2 144.45 72.226 60.5510 < 2e-16 ***
## sex 1 0.24 0.243 0.2034 0.65222
## Residuals 452 539.15 1.193
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

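From the table above we can conclude that sex is not statistically significant (F = 0.2034, p = 0.65222),
indicating that gender does not significantly influence the number of living children in this model. Because
sex enters the model last and has a single degree of freedom, this F-test is equivalent to the t-test on
sexmale in the regression output.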
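R's anova() on a single lm fit reports sequential tests: each term is tested given only the terms listed before it, so only the last term's test matches its t-test in summary(). This is why a variable such as permortemp can look different in the two tables. The sketch below illustrates the point on simulated data. The data frame `toy` and its values are hypothetical stand-ins, not the PH dataset.

```r
# Simulated stand-in data (hypothetical; not the Philippines fieldworker data).
set.seed(42)
n <- 200
toy <- data.frame(
  age = runif(n, 20, 60),
  sex = factor(sample(c("female", "male"), n, replace = TRUE))
)
toy$livchild <- 0.05 * toy$age + rnorm(n)

fit <- lm(livchild ~ age + sex, data = toy)

# For the last, single-df term, the sequential F equals the squared t-statistic.
t_sex <- coef(summary(fit))["sexmale", "t value"]
F_sex <- anova(fit)["sex", "F value"]
isTRUE(all.equal(t_sex^2, F_sex))

# drop1() tests each term given all the others, so it does not depend on term order.
marginal <- drop1(fit, test = "F")
marginal
```

For factors with several levels, such as religion or education, drop1() gives an order-independent joint test of all the level coefficients, which the individual t-tests do not provide.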

Conclusion and Discussion

From the models described above, we can conclude that, ceteris paribus, the type of employment of the
worker is statistically significant at the 5% level in determining the worker's fertility, measured as the
number of living children. Moreover, ceteris paribus, the religion and educational level of the worker are
also statistically significant at the 5% level in determining the worker's fertility.
Despite the conclusions above, our analysis has some limitations. Firstly, there was no specific income
or poverty-related data available. Hence, we could only approximate economic stability by checking whether
the fieldworker lives in a rural, urban, or town area and whether she or he is employed in a contracted
(temporary) or permanent job. Secondly, we could not obtain information about total fertility, as the survey
does not specify the number of deceased children, so we were unable to find the total number of children a
fieldworker had. Therefore, we had to use the number of surviving children as a proxy, which is imperfect
in a country like the Philippines with a high infant mortality rate. Finally, in all of the regressions the
coefficient of determination is less than 50%, which means that the models leave more than half of the
variation in fertility unexplained.
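For the last regression, the coefficient of determination can be recovered from the ANOVA table, which makes the "less than 50%" statement concrete. The sketch below is an illustrative check rather than part of the original analysis. It copies the sums of squares and degrees of freedom from the anova(lm.fit4) output and recomputes R-squared, adjusted R-squared, and the overall F-statistic.

```r
# Sums of squares copied from the anova(lm.fit4) table.
ss_terms <- c(religion = 14.39, residence = 2.78, permortemp = 0.03,
              age = 264.27, education = 16.57, marriagestatus = 144.45,
              sex = 0.24)
ss_resid <- 539.15
df_model <- 17    # 5 + 2 + 1 + 1 + 5 + 2 + 1
df_resid <- 452

ss_total <- sum(ss_terms) + ss_resid

# Share of variation explained, and the degrees-of-freedom-adjusted version.
r2     <- 1 - ss_resid / ss_total
adj_r2 <- 1 - (ss_resid / df_resid) / (ss_total / (df_model + df_resid))

# Overall F-statistic: explained mean square over residual mean square.
f_stat <- (sum(ss_terms) / df_model) / (ss_resid / df_resid)

round(c(R2 = r2, adj_R2 = adj_r2, F = f_stat), 4)
```

Up to rounding in the printed table, these values should match the Multiple R-squared, Adjusted R-squared, and F-statistic lines of the regression summary.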

References

Espenshade, Thomas J., et al. “Family size and Economic Welfare.” Family Planning Perspectives, vol. 15,
no. 6, Nov. 1983, p. 289, https://doi.org/10.2307/2135299.

Cools, Sara, et al. “Children and careers: How family size affects parents’ labor market outcomes in the
Long Run.” Demography, vol. 54, no. 5, 6 Sept. 2017, pp. 1773–1793,
https://doi.org/10.1007/s13524-017-0612-0.

Modena, Francesca, et al. “Economic insecurity and fertility intentions: The case of Italy.” Review of
Income and Wealth, vol. 60, no. S1, 17 May 2013, https://doi.org/10.1111/roiw.12044.

Cleland, John, and Germán Rodríguez. “The effect of parental education on marital fertility in developing
countries.” Population Studies, vol. 42, no. 3, Nov. 1988, pp. 419–442,
https://doi.org/10.1080/0032472031000143566.

Mari, Bhat, et al. “Role of religion in Fertility decline: The case of Indian muslims.” Economic and
Political Weekly, vol. 40, Jan. 2005, p. 385, https://www.jstor.org/stable/4416130
