Original Article

The Case Time Series Design


Antonio Gasparrini (a, b)

(Epidemiology 2021;32: 829–837)


Abstract: Modern data linkage and technologies provide a way to reconstruct detailed longitudinal profiles of health outcomes and predictors at the individual or small-area level. Although these rich data resources offer the possibility to address epidemiologic questions that could not be feasibly examined using traditional studies, they require innovative analytical approaches. Here we present a new study design, called case time series, for epidemiologic investigations of transient health risks associated with time-varying exposures. This design combines a longitudinal structure and flexible control of time-varying confounders, typical of aggregated time series, with individual-level analysis and control-by-design of time-invariant between-subject differences, typical of self-matched methods such as case–crossover and self-controlled case series. The modeling framework is highly adaptable to various outcome and exposure definitions, and it is based on efficient estimation and computational methods that make it suitable for the analysis of highly informative longitudinal data resources. We assess the methodology in a simulation study that demonstrates its validity under defined assumptions in a wide range of data settings. We then illustrate the design in real-data examples: a first case study replicates an analysis on influenza infections and the risk of myocardial infarction using linked clinical datasets, while a second case study assesses the association between environmental exposures and respiratory symptoms using real-time measurements from a smartphone study. The case time series design represents a general and flexible tool, applicable in different epidemiologic areas for investigating transient associations with environmental factors, clinical conditions, or medications.

Keywords: AirRater; Case-only; Epidemiologic methods; Longitudinal data; Self-controlled; Study design; Self-matched; Time series

Submitted November 26, 2019; accepted July 15, 2021.

(a) Department of Public Health, Environments and Society, London School of Hygiene & Tropical Medicine, London, United Kingdom; and (b) Centre for Statistical Methodology, London School of Hygiene & Tropical Medicine, London, United Kingdom.
Supported by the Medical Research Council-UK (Grant ID: MR/R013349/1).
The authors report no conflicts of interest.
Online supplemental material includes documents for simulating data with the same features of the datasets used in the two case studies, and for reproducing the steps and results of the analyses presented in the article. An updated version complemented with scripts of the R statistical software is available at [Link]
Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article ([Link]).
Correspondence: Antonio Gasparrini, London School of Hygiene & Tropical Medicine, 15-17 Tavistock Place, London WC1H 9SH, United Kingdom. E-mail: [Link]@[Link].
Copyright © 2021 The Author(s). Published by Wolters Kluwer Health, Inc. This is an open access article distributed under the terms of the Creative Commons Attribution-Non Commercial License 4.0 (CC BY-NC), where it is permissible to download, share, remix, transform, and build upon the work provided it is properly cited. The work cannot be used commercially without permission from the journal.
ISSN: 1044-3983/21/326-829
DOI: 10.1097/EDE.0000000000001410
Epidemiology • Volume 32, Number 6, November 2021
Copyright © 2021 Wolters Kluwer Health, Inc. Unauthorized reproduction of this article is prohibited.

BACKGROUND
Observational studies aim to discover and understand causal relationships between exposures and health outcomes through the analysis of epidemiologic data.1 Paramount to this objective is removing biases due to the nonexperimental setting, confounding in the first place. It is, therefore, no surprise that traditional approaches based on cohort and case–control methods have been complemented with, and extended by, alternative study designs and statistical techniques applicable in specific contexts. An active area of research is so-called self-matched studies, which investigate acute effects of intermittent exposures by comparing observations sampled at different times within the same unit. These include individual-level designs such as the case–crossover,2 the case-only,3 the case–time–control,4 the exposure–crossover,5 and the self-controlled case series,6 among others. An alternative but related epidemiologic method for aggregated data is the time series design, applied in particular in environmental studies.7 A thorough overview of self-matched methods is provided in a recent publication by Mostofsky et al.8

This landscape is likely to be transformed further by ongoing technologic and methodologic developments in data science, which offers unique opportunities for epidemiologic investigations, for instance through electronic health records linkage,9 exposure modeling,10 and real-time measurement technologies.11,12 Ultimately, these data resources can be used to reconstruct detailed longitudinal profiles with repeated measures of health outcomes and various risk factors, offering the chance to investigate complex etiological mechanisms and to test elaborate causal hypotheses. However, existing self-matched methods present limitations in this context, and new analytical techniques must be developed for epidemiologic investigations in these intensive longitudinal and big data settings.13

In this contribution, we present the case time series design, a novel self-matched method for the analysis of transient changes in risk of acute outcomes associated with time-varying exposures. This innovative design combines the longitudinal modeling structure of time series analysis with the individual-level setting of other self-matched methods, offering a flexible and generally applicable tool for modern epidemiologic studies. First, we introduce the case time series design and its features, including the design structure, modeling framework, estimation methods, and key assumptions. Later, we assess the methodology in a simulation study that
evaluates its performance under various data-generating scenarios. Then, we demonstrate its application through two real-data epidemiologic analyses. In Discussion, we describe the epidemiologic context, advantages, limitations, and areas for further development. We add documents for reproducing the real-data examples and the simulation study as eAppendices 1–3; [Link], with an updated version complemented with R scripts available at the personal website and GitHub webpage of the author (see “Data and Code”).

A NOVEL SELF-MATCHED DESIGN
The study design proposed here, called case time series, is a generally applicable tool for the analysis of transient health associations with time-varying risk factors. This novel design considers multiple observational units, defined as cases, for which data are longitudinally collected over a predefined follow-up period. The main design feature that defines the case time series methodology is the split of the follow-up period into equally spaced time intervals, which results in a set of multiple case-level time series. Data forming the series can originate from actual sequential observations or be reconstructed by aggregating or averaging longitudinal measurements, but, ultimately, they are assumed to represent a continuous temporal frame. A graphical representation is provided in Figure 1, showing case-specific time series data with various types of measurements of outcome and exposure collected for multiple subjects.

The case time series data setting provides a flexible framework that can be adapted for studying a wide range of epidemiologic associations. For instance, outcomes, exposures, and other predictors can be represented either by indicators for events or episodes or by continuous measurements that vary across units and times, as in Figure 1. The time intervals can be of any length (from seconds to years), depending on the temporal association between outcome and exposures and on practical design considerations. A case is a general definition, and it can represent a subject or other entities, such as a geographic area to which observations are assigned, thus allowing analyses to be conducted either at the individual level or with aggregated data. Ultimately, the case time series structure combines characteristics of various other study designs: it allows individual-level analyses of transient risk associations as in traditional self-matched methods, but it retains the longitudinal temporal frame typical of time series data, with ordered repeated measures of outcomes, exposures, and other predictors. As discussed later, this flexible design setting offers important advantages.

Modeling Framework
A case time series model can be written in a regression form by defining the expectation of a given health outcome $y_{it}$ for case $i$ at time $t$ in relation to a series of predictor terms. Algebraically, the model can be written as follows:

$$ g\left[E(y_{it})\right] = \xi_{i(k)} + f(x_{it}, \ell) + \sum_{j=1}^{J} s_j(t) + \sum_{p=1}^{P} h_p(z_{ipt}) \qquad (1) $$

The definition in Equation (1) resembles a classic time series regression model traditionally used in environmental epidemiology, where the ordered and sequential nature of the data allows the application of cutting-edge analytical techniques.7 Specifically, the function $f(x, \ell)$ specifies the association with the exposure of interest $x$, defined either as a binary episode indicator or as a continuous variable, optionally allowing for nonlinearity and complex temporal dependencies along the lag dimension $\ell$. These complex relationships can be modeled through distributed lag linear and nonlinear models (DLMs and DLNMs), which can flexibly define cumulative effects of multiple exposure episodes.14 The terms $s_j$ represent functions expressed at different timescales to model temporal variations in risk associated with underlying trends or seasonality, among others.15 Other measurable time-varying confounders $z_p$ can be modeled through functions $h_p$, and these can include, for instance, age or time since a specific intervention. The two sets of terms $s_j$ and $h_p$ ensure a strict control of temporal variation in risks over multiple time axes. The outcome $y$ can represent binary indicators, counts of rare or frequent events, or continuous measures. The analysis can be performed on multiple cases $i = 1, \ldots, n$, with intercepts $\xi_{i(k)}$ expressing baseline risks for different risk sets, optionally stratified further in time strata $k = 1, \ldots, K_i$ nested within them, allowing an additional within-case control for temporal variations in risk.

Estimation
The estimation procedures in case time series analyses rely on estimators and efficient computational algorithms provided by the general framework of fixed-effects models.16 These were developed in econometrics and are often applied in panel studies with repeated observations.10,17 Fixed-effects methods allow the estimation of coefficients for the various functions in Equation (1) without including the potentially high number of case/stratum-specific intercepts $\xi_{i(k)}$, which are treated as nuisance (or incidental) parameters.16

Fixed-effects estimators are available for the three main types of outcomes and distributions within the extended exponential family of generalized linear models (GLMs). Specifically, for continuous outcomes with a Gaussian distribution, the estimation procedure involves mean-centring and a simple correction of the degrees of freedom. For event-type indicator or count outcomes following a Bernoulli or Poisson distribution, respectively, estimators for fixed-effects models with canonical logit and log links can be defined through conditional likelihoods for logistic and Poisson regression.18,19 These are forms of partial likelihoods that are derived by defining reduced sufficient statistics for $\xi_{i(k)}$, obtained by conditioning on the total number of events within each of the $n$ cases or $n \times K$ strata.
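For the Poisson case, the conditioning argument can be made concrete with a minimal sketch. It assumes a single exposure term with no lags, splines, or other covariates, so that $f(x, \ell)$ reduces to $\beta x_{it}$. The function name `fit_conditional_poisson` is hypothetical, and the code is a Python illustration, not the R code distributed with the eAppendices. Conditioning on the total count $n_i$ of each stratum gives a multinomial likelihood with probabilities $\exp(\beta x_{it}) / \sum_s \exp(\beta x_{is})$. The intercept $\xi_{i(k)}$ cancels from this likelihood, and $\beta$ is then found by Newton–Raphson.

```python
import math


def fit_conditional_poisson(strata, beta=0.0, tol=1e-10, max_iter=100):
    """Conditional (fixed-effects) Poisson estimate of a single log rate ratio.

    strata: list of (y, x) pairs, one per case or case/time stratum, where
    y holds the event counts and x the exposure values over the stratum's
    time intervals. Conditioning on the stratum total n = sum(y) turns each
    stratum into a multinomial term with probabilities
    w_t = exp(beta * x_t) / sum_s exp(beta * x_s), so the intercept xi_i(k)
    cancels and is never estimated. Returns (beta_hat, standard_error).
    """
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for y, x in strata:
            n = sum(y)
            if n == 0:
                continue  # no events: the stratum carries no information
            w = [math.exp(beta * xs) for xs in x]
            total = sum(w)
            w = [ws / total for ws in w]
            mean_x = sum(ws * xs for ws, xs in zip(w, x))
            var_x = sum(ws * (xs - mean_x) ** 2 for ws, xs in zip(w, x))
            # Score and information of the conditional log-likelihood.
            score += sum(yt * xt for yt, xt in zip(y, x)) - n * mean_x
            info += n * var_x
        if info == 0.0:
            raise ValueError("no stratum has variation in both outcome and exposure")
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    # The information comes from the last iterate, which differs from
    # beta_hat by less than tol at convergence.
    return beta, 1.0 / math.sqrt(info)


if __name__ == "__main__":
    exposed = [1, 1, 0, 0, 0, 0, 0, 0]  # exposure episode on the first 2 days
    case_a = ([2, 2, 1, 0, 1, 0, 1, 1], exposed)
    case_b = ([20, 20, 10, 0, 10, 0, 10, 10], exposed)  # 10x higher baseline
    case_c = ([5, 3, 4, 2, 6, 1, 0, 2], [0] * 8)  # never exposed
    beta_hat, se = fit_conditional_poisson([case_a, case_b, case_c])
    print(round(math.exp(beta_hat), 6), round(se, 4))  # prints 3.0 0.2132
```

In the usage example, the second case has a tenfold higher baseline rate than the first, yet the estimated rate ratio is unchanged. The never-exposed third case adds neither score nor information. A full analysis would add the lag, spline, and covariate terms of Equation (1) as further components of the linear predictor.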


FIGURE 1. Graphical representation of data configurations for the case time series design applied in the analysis of transient
health risks of time-varying exposures. The figure represents three examples of data for three subjects (cases) followed for a period
of time, with equally spaced measures of outcome and exposure that form case-level time series. This setting allows the definition
of predictors and time axes as unique and sequential observations. The three examples illustrate different measures of outcome
and exposure. The former is represented as counts (top), a binary indicator (middle), or a continuous measure (bottom). Similarly,
exposure can be represented by a simple binary episode indicator (top), or continuous term (middle and bottom). Continuous
variables are represented by shaded colors. The graphical representation demonstrates the potential of the case time series design
to be applied in various research areas for modeling associations defined by different types of measurements.
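The data configuration in Figure 1 can be sketched for a single case with the structure of the top example, which has event counts and a binary exposure episode indicator. This minimal sketch assumes daily intervals and Python `date` objects. The function `case_time_series` and the `x_lag{l}` column names are hypothetical. Each row is one day of follow-up and carries the event count plus one indicator per lag. This is the kind of lagged exposure matrix that an unconstrained distributed lag model would use.

```python
from collections import Counter
from datetime import date, timedelta


def case_time_series(start, end, event_dates, episode_dates, max_lag=3):
    """Split one case's follow-up [start, end] into daily intervals.

    Returns one dict per day with the event count "y" and, for each lag
    l = 0..max_lag, an indicator "x_lag{l}" equal to 1 if an exposure
    episode started exactly l days earlier. Episodes before `start` are
    allowed, so that early follow-up days keep their lagged exposure history.
    Events outside the follow-up period are ignored.
    """
    counts = Counter(event_dates)
    episodes = set(episode_dates)
    rows = []
    for d in range((end - start).days + 1):
        day = start + timedelta(days=d)
        row = {"date": day, "y": counts.get(day, 0)}
        for lag in range(max_lag + 1):
            row[f"x_lag{lag}"] = int(day - timedelta(days=lag) in episodes)
        rows.append(row)
    return rows


if __name__ == "__main__":
    series = case_time_series(
        start=date(2020, 1, 1),
        end=date(2020, 1, 10),
        event_dates=[date(2020, 1, 4), date(2020, 1, 4), date(2020, 1, 9)],
        episode_dates=[date(2020, 1, 3)],
        max_lag=2,
    )
    for row in series[1:6]:
        print(row["date"], row["y"], row["x_lag0"], row["x_lag1"], row["x_lag2"])
    # 2020-01-02 0 0 0 0
    # 2020-01-03 0 1 0 0
    # 2020-01-04 2 0 1 0
    # 2020-01-05 0 0 0 1
    # 2020-01-06 0 0 0 0
```

Stacking the rows of all cases, each tagged with its case identifier, gives the set of case-level time series to which a fixed-effects model is fitted. Whether episodes before the start of follow-up contribute lagged exposure is a design decision, and this sketch lets them contribute.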

The main advantage of fixed-effects models is that the effect of any unmeasured predictor that does not vary within each risk set is absorbed by the intercept $\xi_{i(k)}$. Therefore, the related confounding effect is controlled for implicitly by design, as in other self-matched methods.8 In addition, the within-case design offers important computational advantages, especially from a big data perspective. First, the analysis is restricted to informative strata, that is, cases and risk sets with variation in both outcome and exposure. Second, the estimators are based on efficient computational schemes, where the conditional or fixed-effects likelihood is defined by the sum of parts related to multiple risk sets, and the corresponding nuisance parameters $\xi_{i(k)}$ are not directly estimated.

Key Assumptions and Threats to Validity
As discussed earlier, the case time series framework has interesting design and modeling features that offer important advantages. On the other hand, its self-controlled structure, while appealing, only operates within an elementary causal framework and requires relatively strict assumptions to protect against key threats to validity. Specifically, the main requirements are the following:
1. Distributional assumptions on the outcome. The outcome $y_{it}$ must represent conditionally independent observations originating from one of the standard exponential family distributions, for instance, Poisson counts, Bernoulli binary indicators, or Gaussian continuous measures.
2. Outcome-independent follow-up period. The period of observation for each case $i$ must be independent of a given outcome, meaning that the follow-up period cannot be defined or modified by the outcome itself.
3. Outcome-independent exposure distribution. The probability of the exposure $x_t$ must be independent of the outcome history before $t$, meaning that the occurrence of a given outcome must not modify the exposure distribution in the following period.
4. Constant baseline risk conditionally on measured time-varying predictors. The baseline risk along the (strata of the) follow-up period of each case $i$ must be constant, meaning that variations in risk must be fully explained by model covariates.

These requirements enable a valid conditional comparison of observations at different times within the follow-up of each case. Departures from these assumptions can produce imbalances in the temporal distribution of the outcome, the exposure, or unmeasured risk factors, thus generating spurious associations.

Some of these assumptions have been separately described in the literature on self-matched designs and fixed-effects models.20–23 Specifically, assumption 1 dictates that outcomes must occur independently, and in particular that the occurrence of a given outcome level or event must not modify the risk of subsequent outcomes.24 This assumption indirectly implies that outcomes are recurrent, and nonrecurrent events can only be analyzed if rare in the population of interest.25,26 Assumptions 2 and 3 are those that pose the most limitations to the application of self-matched methods, as for many associations of interest an outcome can modify both the follow-up period and the exposure distribution.27,28 These requirements often restrict case time series analyses to exogenous exposures, which are by definition outcome-independent, and for which the observation period can be extended even beyond a terminal event, as in bidirectional case–crossover schemes.29 Assumption 4 requires a constant baseline risk to ensure conditional exchangeability between observations within each risk set.20,30,31 This means that relevant time-varying confounders must be included and all the terms in Equation (1) must be correctly specified.

Importantly, the design setting described earlier is not suited to represent complex causal scenarios characterized by dynamic mechanisms between time-varying terms. Specifically, feedback between outcomes and feedback between outcome and exposure are ruled out by assumptions 1 and 3, respectively, while more generally exposure–confounder feedback cannot be validly handled through traditional regression-based methods for longitudinal data.32

SIMULATION STUDY
We evaluated the performance of the case time series design in a set of simulated scenarios that involved various data-generating processes and assumptions (Table). Detailed information on the simulation settings, definitions, and additional results is provided in eAppendix 3; [Link]/EDE/B841. Briefly, we simulated and analyzed data for 500 subjects followed up for 1 year, testing the method in terms of relative bias, coverage, and relative root mean square error (RMSE) in 50,000 replications. The basic scenario involves an outcome represented by repeated event counts and binary indicators of exposure episodes associated with a constant increase in risk over the following 10 days.

TABLE. Results of the Simulation Study, with Ten Scenarios Representing Increasingly Complex Data Settings (Scenarios 1–10), and Four Additional Scenarios Simulating Data Where the Key Design Assumptions Are Violated (Scenarios 11–14)

                                            Relative            Relative
Scenario                                    bias (%)  Coverage  RMSE (%)
Scenario 1: basic                                0.0     0.951       8.8
Scenario 2: rare outcome/exposure               −4.5     0.951      86.0
Scenario 3: continuous exposure                 −0.1     0.950      15.2
Scenario 4: binary outcome                       0.3     0.949       9.1
Scenario 5: continuous outcome                   0.0     0.950      14.7
Scenario 6: common trend                        −0.1     0.950      28.8
Scenario 7: subject-specific trend               0.1     0.948      35.2
Scenario 8: unobserved baseline confounder       0.2     0.951      25.8
Scenario 9: time-varying confounder             −0.2     0.949      35.1
Scenario 10: complex lag structure               0.0     0.950      29.2
Scenario 11: outcome-dependent risk            −18.9     0.738      24.7
Scenario 12: outcome-dependent follow-up        16.8     0.797      22.7
Scenario 13: outcome-dependent exposure         11.1     0.744      14.4
Scenario 14: variation in baseline risk         40.7     0.222      43.3

The table reports empirical figures of relative bias (%), coverage, and relative RMSE (%) in 50,000 replications. Detailed descriptions of the scenarios and definitions, as well as additional results and graphs, are provided in eAppendix 3; [Link]/EDE/B841.

The first part of the simulation study (scenarios 1–10) evaluates the performance of the new design in recovering the true association under increasingly complex data settings. Specifically, the scenarios depict different outcome and exposure types, the presence of common or subject-specific trends, time-invariant and time-dependent confounders, and more complex lag structures. Results in the Table indicate that the case time series design provides correct point estimates and confidence intervals in almost all ten scenarios. The small underestimation in scenario 2 is consistent with the asymptotic bias of maximum likelihood estimators originating from the extreme imbalance of expected events between risk and control periods, previously described and defined analytically in the self-controlled case series literature.33 eFigure 1; [Link]/EDE/B841, shows that the case time series models can correctly recover the true association, both in the basic scenario 1 with constant risk and no confounding, and in the more complex scenario 10 representing varying lag effects, strong temporal trends, and highly correlated confounders.

The second part of the simulation study (scenarios 11–14) illustrates basic applications in which each of the four assumptions, in turn, does not hold. Specifically, scenario 11 describes the case where the occurrence of an outcome can change the risk status of a subject and temporarily reduce their underlying risk. This can occur, for instance, when the event results in the prescription of drugs or therapies. This induces a form of dependency in the outcome series that violates
assumption 1 and, in this example, results in a negative bias (Table). Scenario 12 simulates a different situation, namely when the outcome event carries a risk of censoring the follow-up, for instance, if it increases the probability of death. This contravenes assumption 2 and generates a bias in the opposite direction. In scenario 13, the outcome event instead reduces the probability of exposure episodes in the following 2 weeks, a situation that can occur, for example, if the event results in hospitalization or lifestyle changes. Here, assumption 3 does not hold, and the estimators are again biased upward. Finally, scenario 14 illustrates the case of unobserved periods of lower baseline risk within the follow-up, for instance, corresponding to holiday periods with a reduced probability of an outcome being reported. This undermines the conditional exchangeability requirement of assumption 4 and induces a large positive bias.

ILLUSTRATIVE EXAMPLES
This section illustrates the application of the case time series design in two real-data examples. These case studies are described here only for illustrative purposes, and they are not meant to offer substantive epidemiologic evidence on the associations under study. Detailed information on the setting and sources of data can be found in the cited references. Documents in eAppendices 1 and 2; [Link]/EDE/B841, provide notes and R code that reproduce the steps of these analyses using simulated data, and they offer details on the specific modeling choices.

Flu and Myocardial Infarction
The first example replicates a published analysis that assessed the role of influenza infection as a trigger for acute myocardial infarction (AMI).34 The data, retrieved by linking electronic health records from primary care and cohort databases for England and Wales, include 3,927 AMI cases with at least one flu episode in the period 2003–2009. A representation of a subinterval of the follow-up for six subjects is reported in eFigure 2; [Link]. The original analysis relied on the self-controlled case series design to examine the association, using exposure windows in the 1–91 days after each flu episode and controlling for trends using 5-year age strata and trimester indicators. Limitations of this approach are the use of stratification to describe smooth continuous dependencies and the fact that multiple flu episodes experienced by some subjects caused the long exposure windows to overlap (eFigure 2; [Link]/EDE/B841), requiring ad hoc fixes that can generate biases.35 Conversely, the rarity of the exposure, with most of the subjects experiencing a single flu episode, prevents the application of the case–crossover design, as most control sampling schemes would generate nondiscordant case–referent sets.

We replicated the analysis with a case time series design, splitting the follow-up period of each subject into daily time series (eAppendix 1; [Link]/EDE/B841). We fitted a fixed-effects Poisson model to estimate the flu–AMI association while controlling for underlying trends across multiple time scales. The model includes smooth functions to define the baseline risk, specifically using natural splines (with two knots at the interquartile range) for age and cyclic splines (with three degrees of freedom) for seasonality. More importantly, we applied DLMs defined by either splines (with knots at 3, 10, and 29 lags) or step functions (with strata 1–3, 4–7, 8–14, 15–28, and 29–91 lags) to describe temporal effects along the exposure window.

Results are reported in Figure 2. The left and middle panels display the variation in risk of AMI by age and season, showing how the case time series design allows modeling baseline trends fluctuating smoothly across multiple time axes. The right panel illustrates the risk after a flu episode within the selected lag period, as estimated using a DLM with spline functions. The graph indicates a high risk in the first days after a flu episode, which then attenuates and disappears after approximately 1 month. The same panel also includes the fit of the alternative distributed lag model defined by step functions, which assumes a constant risk within exposure windows (see also eFigure 3; [Link]). This specification matches the stratification approach in the original self-controlled case series analysis,34 although the case time series design with DLMs accounts for cumulative effects of potentially overlapping periods of flu episodes.

FIGURE 2. Results of the analysis on the association between influenza infection and AMI, as incidence rate ratio (IRR) and 95% confidence intervals. The three panels show the AMI risk by age (left) and by season (middle), and the lag-response curve representing the risk in the 1–91 days after a flu episode (right). The latter is estimated in the main model using natural splines (continuous red line), with the results from an alternative model using step functions superimposed (dashed gray line).

Environmental Exposures and Respiratory Symptoms
The second example illustrates a preliminary analysis of the role of multiple environmental stressors in increasing the risk of respiratory symptoms using smartphone technology. Data were collected within AirRater, an integrated online platform operating in Tasmania that combines symptom surveillance, environmental monitoring, and real-time notifications.12 A smartphone app allowed the self-reported recording of respiratory symptoms and the reconstruction of personalized exposure series by linking geolocated positions with high-resolution spatiotemporal maps derived from environmental monitors (Figure 3). Standard cohort analyses based on between-subject comparisons are unsuitable in this complex study setting, characterized by continuous recruitment, high dropout rates, and intermittent participation (eFigure 4; [Link]). Similarly, the frequent and highly seasonal outcome poses problems in adopting a case–crossover design, with issues in selecting control times and in the assumption of constant within-stratum risk. Finally, the presence of multiple continuous exposures prevents the application of the self-controlled case series design, either in its standard or extended forms.36,37

FIGURE 3. Graphical representation of the individual time series of a subject participating in the AirRater study on the association between environmental exposures and respiratory symptoms. The four panels (from top to bottom) display the daily series of indicators of allergic events and levels of the three environmental stressors, represented by pollen (grains/m3), PM2.5 (μg/m3), and temperature (°C).

We, therefore, applied a case time series design (eAppendix 2; [Link]). The analysis included 1,601 subjects followed between October 2015 and November 2018, with a total of 364,384 person–days. The
event-type outcome was defined as daily indicators of reported respiratory symptoms and associated with individual exposure to pollen (grains/m3), fine particulate matter (PM2.5, μg/m3), and temperature (°C) (Figure 3). We modeled the relationships using a fixed-effects logistic regression over a lag period of 0–3 days, using an unconstrained distributed lag model for the linear association with PM2.5, and bidimensional spline DLNMs for specifying nonlinear dependencies with pollen and temperature.14,38 A strict temporal control was enforced by using subject/month strata intercepts, natural splines of time (with 8 df/year), and indicators of the day of the week, thus modeling individually varying baseline risks on top of shared long-term, seasonal, and weekly trends.

Figure 4 shows the preliminary results, with estimated associations reported as odds ratios (ORs) from the model that simultaneously includes the three environmental stressors. The graphs display the overall cumulative exposure–response relationships (top panels), interpreted as the net effects across lags, and the full bidimensional exposure–lag–response associations (bottom panels).14,38 The left-hand panels indicate a positive association between risk of allergic symptoms and pollen, with a step increase in risk that flattens out at high exposures, and a lagged effect up to 2 days. The middle panels suggest an independent association with PM2.5, where the risk is entirely limited to the same-day exposure. Finally, results in the right-hand panels show a positive association with high ambient temperature, with the OR increasing above 1 beyond daily averages of 15°C.

FIGURE 4. Results of the analysis on the association between environmental exposures and respiratory symptoms, as odds ratio (OR) and 95% confidence intervals. The three columns of panels show estimated associations with pollen (left, grains/m3), PM2.5 (middle, μg/m3), and temperature (right, °C). The top row of panels displays the net risk cumulated in the lag period 0–3 days as overall cumulative exposure–response associations, assumed linear for PM2.5 and nonlinear for pollen and temperature. The bottom row of panels instead shows the full exposure–lag–response associations, represented as the bidimensional risk surface for pollen and temperature or the lag-specific risks for a 10 μg/m3 increase in PM2.5.

DISCUSSION
The novel case time series methodology offers a general modeling framework for the analysis of epidemiologic associations with time-varying exposures. The design is adaptable to various data settings for the analysis of highly informative longitudinal measurements, and it is particularly well suited to applications with modern data resources such as individual-level exposure models and real-time technologies.

The main feature of the methodology is a flexible scheme that embeds a longitudinal time series structure in a within-subject design, providing unique modeling advantages. For instance, the sequential order of observations offers the opportunity to assess complex temporal relationships with multiple
exposures, where patterns of cumulative effects for linear or nonlinear dependencies can be easily modeled. Furthermore, the time series and self-controlled features offer a structure that enables strict control for confounding: time-invariant and time-varying factors can be adjusted for by stratifying the baseline risk between and within subjects, respectively, while residual temporal variations can be directly modeled through time-varying predictors that represent confounders or shared trends across multiple time axes.

The new design complements and extends the already rich set of self-matched methods for observational studies described in the epidemiologic literature.8 Previous methodological contributions have highlighted links and similarities between various designs,18,21,29,30,39–41 and ultimately these can be seen as alternative approaches to modeling the same risk associations. However, each method relies on different sets of assumptions and modeling choices, which partly explain their separate areas of application. The case time series methodology, nevertheless, offers a general framework that combines and extends features of existing designs, with important advantages. For example, it borrows flexible modeling tools from the aggregated-data time series design, but implements them in individual-level analyses that allow a finer reconstruction of outcomes, exposures, and other risk factors. Like the case–crossover design, it can assess associations with multiple continuous predictors, and like the self-controlled case series, it can model recurrent events, either common or rare; in addition, it can be extended to outcomes represented by binary indicators or continuous measures simply by assuming different distributions. Finally, its time series structure allows the application of sophisticated techniques such as smoothing methods and distributed lag models, characterized by well-defined parameterizations, computational efficiency, and standard software implementations. A thorough and critical comparison of the case time series methodology with alternative approaches will be provided in future contributions.

Like other self-matched methods, the case time series design relies on strict assumptions to protect against key threats to validity. However, these conditions are not always met in practice, and their violation can lead to important biases. Specifically, the requirement that both exposures and follow-up periods are independent of the outcome poses severe limitations to the application of the method, in particular in clinical and pharmacoepidemiologic studies. In fact, the temporal distribution of endogenous predictors such as behaviors, clinical therapies, or drug prescriptions is often modified by an outcome event. In contrast, the case time series and other self-controlled designs are well suited for the analysis of exogenous exposures such as environmental factors, as discussed before. Extensions to test and relax these strong assumptions have been developed for the self-controlled case series design,27,28 but further research is needed to implement them and assess their validity in case time series models. Conversely, the new design is well suited to control for temporal confounding that can invalidate the assumption of constant baseline risk, through the stratification of the follow-up period and the inclusion of lagged and smooth continuous terms in the model.

Other limitations and areas of current research must be discussed. First, as a method based on a within-subject comparison, the case time series design is ideal for investigating phenomena with short-term changes in risk relative to the study period, while it is less suitable for the analysis of long-term effects and chronic exposures. In fact, while it is in theory possible to extend the lag period indefinitely within the follow-up interval, there is a limit to how far the model can disentangle long-lagged effects from seasonal and other trends.42 In addition, splitting the follow-up period into individual-level time series produces a substantial data expansion, with considerable computational demand, especially in the presence of a high number of subjects or long study periods. Schemes based on risk-set sampling, previously proposed for cohort and nested case–control studies,43–45 are currently under development to address this issue. Finally, the simulation study and the two real-data examples involved only basic epidemiologic relationships between time-varying variables. However, more complex causal dependencies, involving, for instance, dynamic feedback or multiple pathways, explicitly violate the strict assumptions underpinning the case time series design and cannot be modeled in the proposed framework. The definition, limitations, and potential extensions of fixed-effects models and related designs within a general causal inference setting are an area of current research.23

In conclusion, the case time series design represents a novel epidemiologic method for the analysis of transient health associations with time-varying exposures. Its flexible modeling framework can be adapted to various contexts and research areas, for instance, in clinical, environmental, and pharmacoepidemiology, and it is suitable for the analysis of intensive longitudinal data provided by modern data technologies.

ACKNOWLEDGMENTS
The author is thankful to Dr. Charlotte Warren-Gash, and Dr. Fay Johnston and Mr. Iain Koolhof for providing data access and information for the two case studies used as illustrative examples. The author is also grateful to colleagues who provided comments on various drafts of the manuscript and analyses, in particular Mr. Francesco Sera, Dr. Ana Maria Vicedo-Cabrera, and Prof. Ben Armstrong. Finally, the author is indebted to Prof. Paddy Farrington for offering critical insights on asymptotic biases of maximum likelihood estimators in self-controlled case series. The study on influenza and AMI was originally approved by the Independent Scientific Advisory Committee (ISAC) of the Clinical Practice Research Datalink (Ref: 09_034), the Cardiovascular Disease Research Using Linked Bespoke Studies and Electronic Records (CALIBER) Scientific oversight committee and Myocardial Ischaemia National Audit Project (MINAP) Academic Group
(Ref: 09_08), and the UCL Research Ethics committee (Ref: 2219/001). This study, which used the analysis dataset only, was approved through a minor ISAC amendment (granted on 12/01/2016) and a MINAP Academic Group amendment (granted on 11/01/2016). More information about AirRater is available at [Link]

REFERENCES
1. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. Lippincott Williams & Wilkins; 2008.
2. Maclure M. The case-crossover design: a method for studying transient effects on the risk of acute events. Am J Epidemiol. 1991;133:144–153.
3. Armstrong BG. Fixed factors that modify the effects of time-varying factors: applying the case-only approach. Epidemiology. 2003;14:467–472.
4. Suissa S. The case-time-control design. Epidemiology. 1995;6:248–253.
5. Redelmeier DA. The exposure-crossover design is a new method for studying sustained changes in recurrent events. J Clin Epidemiol. 2013;66:955–963.
6. Farrington CP. Relative incidence estimation from case series for vaccine safety evaluation. Biometrics. 1995;51:228–235.
7. Bhaskaran K, Gasparrini A, Hajat S, Smeeth L, Armstrong B. Time series regression studies in environmental epidemiology. Int J Epidemiol. 2013;42:1187–1195.
8. Mostofsky E, Coull BA, Mittleman MA. Analysis of observational self-matched data to examine acute triggers of outcome events with abrupt onset. Epidemiology. 2018;29:804–816.
9. Herrett E, Gallagher AM, Bhaskaran K, et al. Data resource profile: Clinical Practice Research Datalink (CPRD). Int J Epidemiol. 2015;44:827–836.
10. Janes H, Sheppard L, Shepherd K. Statistical analysis of air pollution panel studies: an illustration. Ann Epidemiol. 2008;18:792–802.
11. Dixon WG, Beukenhorst AL, Yimer BB, et al. How the weather affects the pain of citizen scientists using a smartphone app. NPJ Digit Med. 2019;2:1–9.
12. Johnston FH, Wheeler AJ, Williamson GJ, et al. Using smartphone technology to reduce health impacts from atmospheric environmental hazards. Environ Res Lett. 2018;13:044019.
13. Walls TA, Schafer JL. Models for Intensive Longitudinal Data. Oxford University Press; 2006.
14. Gasparrini A. Modeling exposure-lag-response associations with distributed lag non-linear models. Stat Med. 2014;33:881–899.
15. Touloumi G, Atkinson R, Le Tertre A, et al. Analysis of health outcome time series data in epidemiological studies. Environmetrics. 2004;15:101–117.
16. Gunasekara FI, Richardson K, Carter K, Blakely T. Fixed effects analysis of repeated measures data. Int J Epidemiol. 2014;43:264–269.
17. Arellano M, Honoré B. Panel Data Models: Some Recent Developments. Handbook of Econometrics. Vol. 5. Elsevier; 2001:3229–3296.
18. Xu S, Zeng C, Newcomer S, Nelson J, Glanz J. Use of fixed effects models to analyze self-controlled case series data in vaccine safety studies. J Biom Biostat. 2012;(suppl 7):006.
19. Allison PD. Fixed Effects Regression Methods for Longitudinal Data Using SAS. SAS Institute Inc; 2005.
20. Janes H, Sheppard L, Lumley T. Case-crossover analyses of air pollution exposure data: referent selection strategies and their implications for bias. Epidemiology. 2005;16:717–726.
21. Lumley T, Levy D. Bias in the case–crossover design: implications for studies of air pollution. Environmetrics. 2000;11:689–704.
22. Whitaker HJ, Ghebremichael-Weldeselassie Y, Douglas IJ, Smeeth L, Farrington CP. Investigating the assumptions of the self-controlled case series method. Stat Med. 2018;37:643–658.
23. Imai K, Kim IS. When should we use unit fixed effects regression models for causal inference with longitudinal data? Am J Pol Sci. 2019;63:467–490.
24. Farrington CP, Hocine MN. Within-individual dependence in self-controlled case series models for recurrent events. J R Stat Soc Ser C Appl Stat. 2010;59:457–475.
25. Whitaker HJ, Steer CD, Farrington CP. Self-controlled case series studies: just how rare does a rare non-recurrent outcome need to be? Biom J. 2018;60:1110–1120.
26. Ghebremichael-Weldeselassie Y, Whitaker HJ. Self-controlled case series methodology. Annu Rev Stat Appl. 2019;6:241–261.
27. Farrington CP, Anaya-Izquierdo K, Whitaker HJ, Hocine MN, Douglas I, Smeeth L. Self-controlled case series analysis with event-dependent observation periods. J Am Stat Assoc. 2011;106:417–426.
28. Farrington CP, Whitaker HJ, Hocine MN. Case series analysis for censored, perturbed, or curtailed post-event exposures. Biostatistics. 2009;10:3–16.
29. Navidi W. Bidirectional case-crossover designs for exposures with time trends. Biometrics. 1998;54:596–605.
30. Lu Y, Zeger SL. On the equivalence of case-crossover and time series methods in environmental epidemiology. Biostatistics. 2007;8:337–344.
31. Mittleman MA, Mostofsky E. Exchangeability in the case-crossover design. Int J Epidemiol. 2014;43:1645–1655.
32. Mansournia MA, Etminan M, Danaei G, Kaufman JS, Collins G. Handling time varying confounding in observational research. BMJ. 2017;359:j4587.
33. Musonda P, Hocine MN, Whitaker HJ, Farrington CP. Self-controlled case series analyses: small-sample performance. Comput Stat Data Anal. 2008;52:1942–1957.
34. Warren-Gash C, Hayward AC, Hemingway H, et al. Influenza infection and risk of acute myocardial infarction in England and Wales: a CALIBER self-controlled case series study. J Infect Dis. 2012;206:1652–1659.
35. Whitaker HJ, Farrington CP, Spiessens B, Musonda P. Tutorial in biostatistics: the self-controlled case series method. Stat Med. 2006;25:1768–1797.
36. Farrington CP, Whitaker HJ. Semiparametric analysis of case series data. J R Stat Soc Ser C Appl Stat. 2006;55:553–594.
37. Ghebremichael-Weldeselassie Y, Whitaker HJ, Farrington CP. Spline-based self-controlled case series method. Stat Med. 2017;36:3022–3038.
38. Gasparrini A, Armstrong B, Kenward MG. Distributed lag non-linear models. Stat Med. 2010;29:2224–2234.
39. Armstrong BG, Gasparrini A, Tobias A. Conditional Poisson models: a flexible alternative to conditional logistic case cross-over analysis. BMC Med Res Methodol. 2014;14:122.
40. Greenland S. A unified approach to the analysis of case-distribution (case-only) studies. Stat Med. 1999;18:1–15.
41. Navidi W, Weinhandl E. Risk set sampling for case-crossover designs. Epidemiology. 2002;13:100–105.
42. Schwartz J. The distributed lag between air pollution and daily deaths. Epidemiology. 2000;11:320–326.
43. Langholz B, Goldstein L. Risk set sampling in epidemiologic cohort studies. Stat Sci. 1996;11:35–53.
44. Borgan O, Goldstein L, Langholz B. Methods for the analysis of sampled cohort data in the Cox proportional hazards model. Ann Stat. 1995;23:1749–1778.
45. Langholz B, Goldstein L. Conditional logistic analysis of case-control studies with complex sampling. Biostatistics. 2001;2:63–84.
