We present a novel approach to constraint-based causal discovery that takes the form of straightforward logical inference, applied to a list of simple logical statements about causal relations that are derived directly from observed (in)dependencies. It is both sound and complete, in the sense that all invariant features of the corresponding partial ancestral graph (PAG) are identified, even in the presence of latent variables and selection bias. The approach shows that every identifiable causal relation corresponds to one of just two fundamental forms. More importantly, as the basic building blocks of the method do not rely on the detailed (graphical) structure of the corresponding PAG, it opens up a range of new opportunities, including more robust inference, detailed accountability, and application to large models.
The Maximal Ancestral Graph (MAG) formalism is an important generalization of Bayesian Networks for representing causal processes that admit the possibility of latent confounding variables. Thus, when learning MAGs from data for Causal Discovery, the often unrealistic assumption of Causal Sufficiency can be dropped. However, the causal interpretation of edges in a MAG is not trivial and can mislead unfamiliar practitioners. An edge X → Y may denote either (a) X causes Y and no latent confounding variable is present (pure-causal edge) or (b) X causes Y with the potential presence of a latent common cause. In addition, an edge X → Y may denote (I) X causes Y directly (direct-causal edge), i.e., without any modeled variables mediating the causation, or (II) X causes Y possibly indirectly. In this paper, we present polynomial-time algorithms and tools that can distinguish among the above cases and facilitate the causal interpretation of MAGs. In addition, we run simulated experiments to quantify the percentage of edges that can be labeled as pure-causal or direct-causal. Our results show that the percentage of edges that can be labeled as pure-causal reaches a minimum for sparse or dense networks and a maximum for in-between values of edge density. In contrast, the percentage of edges that can be labeled as direct-causal decreases as the edge density of the MAG increases.
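To make the pure-causal case concrete, the sketch below checks one sufficient condition for it. It implements only the first clause of the standard visibility criterion for MAGs (an edge X → Y is visible, i.e., free of latent confounding, if some vertex V not adjacent to Y has an edge into X); the second clause, involving collider paths into X whose vertices are all parents of Y, is omitted, so a False result means "undetermined", not "confounded". The edge encoding below is an assumption of this sketch, not the paper's representation.

```python
# Partial check for "pure-causal" (visible) directed edges in a MAG.
# A MAG edge is stored as (a, b, mark_at_a, mark_at_b); marks: ">" arrowhead,
# "-" tail.  So X -> Y is ("X", "Y", "-", ">"), and X <-> Y is ("X", "Y", ">", ">").

def is_adjacent(mag, a, b):
    return any({a, b} == {u, v} for u, v, _, _ in mag)

def edge_into(mag, v, x):
    """True if there is an edge between v and x with an arrowhead at x."""
    for u, w, mu, mw in mag:
        if (u, w) == (v, x) and mw == ">":
            return True
        if (u, w) == (x, v) and mu == ">":
            return True
    return False

def visible_first_clause(mag, x, y):
    """True: X -> Y is definitely visible. False: undetermined by this sketch."""
    nodes = {n for u, v, _, _ in mag for n in (u, v)}
    return any(edge_into(mag, v, x)
               for v in nodes - {x, y} if not is_adjacent(mag, v, y))

# Example: in V <-> X -> Y with V not adjacent to Y, the edge X -> Y is visible.
mag = [("V", "X", ">", ">"), ("X", "Y", "-", ">")]
print(visible_first_clause(mag, "X", "Y"))  # True
```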
We consider the incorporation of causal knowledge about the presence or absence of (possibly indirect) causal relations into a causal model. Such causal relations correspond to directed paths in a causal model. This type of knowledge naturally arises from experimental data, among other sources. Specifically, we consider the formalisms of Causal Bayesian Networks and Maximal Ancestral Graphs and their Markov equivalence classes: Partially Directed Acyclic Graphs and Partially Oriented Ancestral Graphs. We introduce sound and complete procedures that can incorporate causal prior knowledge in such models. In simulated experiments, we show that often considering even a few causal facts leads to a significant number of new inferences. In a case study, we also show how to use real experimental data to infer causal knowledge and incorporate it into a real biological causal network. The code is available at mensxmachina.org.
2011
We present two inference rules, based on so-called minimal conditional independencies, that are sufficient to find all invariant arrowheads in a single causal DAG, even when selection bias may be present. It turns out that the seven graphical orientation rules that are usually employed to identify these arrowheads are, in fact, just different instances of these two rules. The resulting algorithm to obtain the definite causal information is elegant and fast, once the (often surprisingly small) set of minimal independencies is found. * This research was funded by NWO Vici grant 639.023.604.
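The abstract does not spell the two rules out, but a plausible schematic reading is: a *minimal* conditional independence X ⟂ Y given Z ∪ {W} (one destroyed by removing W) implies W is an ancestor of X, Y, or a member of Z; a *minimal* conditional dependence (one created by adding W) implies W is an ancestor of none of them. The sketch below merely shows how such statements could be represented and dispatched; the statement format and rule details are assumptions of this sketch, not the paper's formulation.

```python
# Schematic rendering of the two rules over a list of logical statements.
#   ("min_indep", X, Y, Z, W): X _||_ Y | Z u {W}, but not X _||_ Y | Z
#                              -> W is an ancestor of X, Y, or some node in Z
#   ("min_dep",   X, Y, Z, W): not X _||_ Y | Z u {W}, but X _||_ Y | Z
#                              -> W is an ancestor of none of X, Y, Z
def apply_rules(statements):
    pos, neg = [], []   # (w, targets): w IS / IS NOT an ancestor of a target
    for kind, x, y, z, w in statements:
        targets = {x, y} | set(z)
        (pos if kind == "min_indep" else neg).append((w, targets))
    return pos, neg

facts = [("min_indep", "X", "Y", (), "W"),   # X _||_ Y | {W}, minimal
         ("min_dep", "A", "B", (), "C")]     # C unshields a dependence
pos, neg = apply_rules(facts)
print(pos)  # [('W', {'X', 'Y'})]  W is an ancestor of X or Y (or of selection)
print(neg)  # [('C', {'A', 'B'})]  C is an ancestor of neither A nor B
```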
2011
In this paper we address the problem of incorporating prior knowledge, in the form of causal relations, in causal models. Prior approaches mostly consider knowledge about the presence or absence of edges in the model. We use the formalism of Maximal Ancestral Graphs (MAGs) and adapt cSAT+ to solve this problem, an algorithm for reasoning with datasets defined over different variable sets.
This paper considers a method that combines ideas from Bayesian learning, Bayesian network inference, and classical hypothesis testing to produce a more reliable and robust test of independence for constraint-based (CB) learning of causal structure. Our method produces a smoothed contingency table N_XYZ that can be used with any test of independence that relies on contingency table statistics. N_XYZ can be calculated in the same asymptotic time and space required to calculate a standard contingency table, allows the specification of a prior distribution over parameters, and can be calculated when the database is incomplete. We provide theoretical justification for the procedure, and with synthetic data we demonstrate its benefits empirically over both a CB algorithm using the standard contingency table and a greedy Bayesian algorithm. We show that, even when used with noninformative priors, it results in better recovery of structural features and produces networks with smaller KL-divergence, especially as the number of nodes increases or the number of records decreases. Another benefit is the dramatic reduction in the probability that a CB algorithm will stall during the search, providing a remedy for an annoying problem that plagues CB learning when the database is small.
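The paper's exact construction of N_XYZ is not reproduced here, but the generic idea of a smoothed contingency table can be sketched as adding Dirichlet pseudo-counts before applying a standard test such as G². The alpha parameter and the G² test below are illustrative stand-ins, not the paper's procedure.

```python
import numpy as np
from scipy.stats import chi2

def smoothed_table(counts, alpha=1.0):
    """Add Dirichlet pseudo-counts alpha to every cell of an X x Y x Z table;
    a generic stand-in for the paper's smoothed table N_XYZ."""
    return counts + alpha

def g2_ci_test(n_xyz):
    """G^2 test of X _||_ Y | Z on a three-way contingency table."""
    n_xz = n_xyz.sum(axis=1, keepdims=True)
    n_yz = n_xyz.sum(axis=0, keepdims=True)
    n_z = n_xyz.sum(axis=(0, 1), keepdims=True)
    expected = n_xz * n_yz / n_z          # expected counts under independence
    g2 = 2.0 * np.sum(n_xyz * np.log(n_xyz / expected))
    dof = (n_xyz.shape[0] - 1) * (n_xyz.shape[1] - 1) * n_xyz.shape[2]
    return g2, chi2.sf(g2, dof)

rng = np.random.default_rng(0)
raw = rng.poisson(3, size=(2, 2, 2)).astype(float)  # sparse raw counts
g2, p = g2_ci_test(smoothed_table(raw, alpha=0.5))  # smoothing avoids zero cells
print(f"G2 = {g2:.2f}, p = {p:.3f}")
```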
Causality discovery without manipulation is considered a crucial problem in a variety of applications, such as genetic therapy. State-of-the-art solutions, e.g., LiNGAM, return accurate results when the number of labeled samples is larger than the number of variables. These approaches are thus applicable only when large numbers of samples are available or the problem domain is sufficiently small. Motivated by the local sparsity properties of causal structures, we propose a general Split-and-Merge strategy, named SADA, to enhance the scalability of a wide class of causality discovery algorithms. SADA is able to accurately identify the causal variables even when the sample size is significantly smaller than the number of variables. In SADA, the variables are partitioned into subsets by finding cuts on the sparse probabilistic graphical model over the variables. By running mainstream causal discovery algorithms, e.g., LiNGAM, on the subproblems, the complete causal structure can be reconstructed by combining all the partial results. SADA benefits from the recursive division technique, since each small subproblem generates a more accurate result under the same number of samples. We theoretically prove that SADA always reduces the scale of problems without significant sacrifice of result accuracy, depending only on the local sparsity condition over the variables. Experiments on real-world datasets verify the improvements in scalability and accuracy achieved by applying SADA on top of existing causal discovery algorithms.
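A structural sketch of such a split-and-merge recursion is given below. The helpers find_causal_cut and base_algorithm are hypothetical placeholders for, respectively, the search for an (approximately) separating cut (V1, V2, C) and the underlying discovery method such as LiNGAM; the details are assumptions, not SADA's actual implementation.

```python
# Structural sketch of a split-and-merge strategy in the spirit of SADA:
# partition the variables by a (near-)separating cut, solve small subproblems
# with any base causal discovery algorithm, and union the recovered edges.

def split_and_merge(variables, data, base_algorithm, find_causal_cut, max_size=10):
    """variables: a set of variable names; base_algorithm returns a set of edges."""
    if len(variables) <= max_size:
        return base_algorithm(variables, data)      # small enough: solve directly
    cut = find_causal_cut(variables, data)          # (v1, v2, c) with v1 _||_ v2 | c
    if cut is None:                                 # no usable cut: stop splitting
        return base_algorithm(variables, data)
    v1, v2, c = cut
    edges = set()
    for part in (v1 | c, v2 | c):                   # recurse on overlapping parts
        edges |= split_and_merge(part, data, base_algorithm,
                                 find_causal_cut, max_size)
    return edges                                    # merged partial results
```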
2009
This paper (which is mainly expository) sets up graphical models for causation, having a bit less than the usual complement of hypothetical counterfactuals. Assuming the invariance of error distributions may be essential for causal inference, but the errors themselves need not be invariant. Graphs can be interpreted using conditional distributions, so that we can better address connections between the mathematical framework and causality in the world. The identification problem is posed in terms of conditionals. As will be seen, causal relationships cannot be inferred from a data set by running regressions unless there is substantial prior knowledge about the mechanisms that generated the data. There are few successful applications of graphical models, mainly because few causal pathways can be excluded on a priori grounds. The invariance conditions themselves remain to be assessed.
Proceedings of the 22nd Conference on …, 2006
Most causal discovery algorithms in the literature exploit an assumption usually referred to as the Causal Faithfulness or Stability Condition. In this paper, we highlight two components of the condition used in constraint-based algorithms, which we call ...
arXiv (Cornell University), 2022
Instrumental variable (IV) is a powerful approach to inferring the causal effect of a treatment on an outcome of interest from observational data even when there exist latent confounders between the treatment and the outcome. However, existing IV methods require that an IV is selected and justified with domain knowledge. An invalid IV may lead to biased estimates. Hence, discovering a valid IV is critical to the applications of IV methods. In this paper, we study and design a data-driven algorithm to discover valid IVs from data under mild assumptions. We develop the theory based on partial ancestral graphs (PAGs) to support the search for a set of candidate Ancestral IVs (AIVs) and, for each possible AIV, the identification of its conditioning set. Based on the theory, we propose a data-driven algorithm to discover a pair of IVs from data. Experiments on synthetic and real-world datasets show that the developed IV discovery algorithm produces accurate estimates of causal effects in comparison with state-of-the-art IV-based causal effect estimators.
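Once a valid IV and its conditioning set have been discovered, the causal effect is typically estimated with a standard IV estimator. The sketch below is generic two-stage least squares with the conditioning set included in both stages; it is not necessarily the estimator used in the paper.

```python
import numpy as np

def two_stage_least_squares(z, w, x, y):
    """Estimate the effect of treatment x on outcome y using instrument z,
    conditioning on covariates w (e.g., a discovered conditioning set).
    Generic 2SLS; the paper's estimator may differ in details."""
    n = len(y)
    ones = np.ones((n, 1))
    # Stage 1: regress the treatment on the instrument plus covariates.
    a = np.column_stack([ones, z, w])
    x_hat = a @ np.linalg.lstsq(a, x, rcond=None)[0]
    # Stage 2: regress the outcome on the fitted treatment plus covariates.
    b = np.column_stack([ones, x_hat, w])
    beta = np.linalg.lstsq(b, y, rcond=None)[0]
    return beta[1]  # coefficient on the (instrumented) treatment

# Toy example with a latent confounder u affecting both x and y.
rng = np.random.default_rng(1)
n, true_effect = 5000, 2.0
u, z = rng.normal(size=n), rng.normal(size=n)
w = rng.normal(size=(n, 1))
x = 0.8 * z + u + w[:, 0] + rng.normal(size=n)
y = true_effect * x - u + 0.5 * w[:, 0] + rng.normal(size=n)
print(two_stage_least_squares(z, w, x, y))  # close to 2.0; naive OLS is biased
```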
2016
Many applications call for learning causal models from relational data. We investigate Relational Causal Models (RCM) under relational counterparts of adjacency-faithfulness and orientation-faithfulness, yielding a simple approach to identifying a subset of relational d-separation queries needed for determining the structure of an RCM using d-separation against an unrolled DAG representation of the RCM.We provide original theoretical analysis that offers the basis of a sound and efficient algorithm for learning the structure of an RCM from relational data. We describe RCD-Light, a sound and efficient constraint-based algorithm that is guaranteed to yield a correct partially-directed RCM structure with at least as many edges oriented as in that produced by RCD, the only other existing algorithm for learning RCM. We show that unlike RCD, which requires exponential time and space, RCDLight requires only polynomial time and space to orient the dependencies of a sparse RCM.
International Conference on Artificial Intelligence and Statistics, 2020
The discovery of causal relationships is a core part of scientific research. Accordingly, over the past several decades, algorithms have been developed to discover the causal structure for a system of variables from observational data. Learning ancestral graphs is of particular interest due to their ability to represent latent confounding implicitly with bi-directed edges. The well-known FCI algorithm provably recovers an ancestral graph for a system of variables encoding the sound and complete set of causal relationships identifiable from observational data. Additional causal relationships become identifiable with the incorporation of background knowledge; however, it is not known for what types of knowledge FCI remains complete. In this paper, we define tiered background knowledge and show that FCI is sound and complete with the incorporation of this knowledge.
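Tiered background knowledge, such as a temporal ordering of variables, can be encoded as a set of forbidden causal directions: no variable in a later tier may cause a variable in a strictly earlier tier. A minimal sketch of generating such constraints for an FCI-style learner follows; the function name and the tier example are illustrative, not from the paper.

```python
def forbidden_by_tiers(tiers):
    """Given tiers (earliest first), return the ordered pairs (a, b) such that
    'a causes b' is forbidden: nothing in a later tier may cause anything
    in a strictly earlier tier."""
    rank = {v: i for i, tier in enumerate(tiers) for v in tier}
    return {(a, b) for a in rank for b in rank if rank[a] > rank[b]}

# Example: demographics precede exposures, which precede outcomes.
tiers = [["age", "sex"], ["smoking"], ["cancer"]]
print(sorted(forbidden_by_tiers(tiers)))
# [('cancer', 'age'), ('cancer', 'sex'), ('cancer', 'smoking'),
#  ('smoking', 'age'), ('smoking', 'sex')]
```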
We address the problem of constraint-based causal discovery with mixed data types, such as (but not limited to) continuous, binary, multinomial, and ordinal variables. We use likelihood-ratio tests based on appropriate regression models and show how to derive symmetric conditional independence tests. Such tests can then be directly used by existing constraint-based methods with mixed data, such as the PC and FCI algorithms for learning Bayesian networks and maximal ancestral graphs, respectively. In experiments on simulated Bayesian networks, we employ the PC algorithm with different conditional independence tests for mixed data and show that the proposed approach outperforms alternatives in terms of learning accuracy.
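For a continuous target, one such likelihood-ratio test compares nested linear regressions with and without the other variable; for binary or ordinal targets the same pattern applies with logistic or ordinal models. The sketch below shows the Gaussian case and one simple way to symmetrize, taking the larger of the two directional p-values; the paper's regression models and combination rule may differ.

```python
import numpy as np
from scipy.stats import chi2

def lrt_gaussian(target, other, z):
    """Likelihood-ratio test of target _||_ other | z via nested linear models."""
    n = len(target)
    ones = np.ones((n, 1))
    full = np.column_stack([ones, z, other])
    restricted = np.column_stack([ones, z])
    def rss(design):
        resid = target - design @ np.linalg.lstsq(design, target, rcond=None)[0]
        return resid @ resid
    stat = n * (np.log(rss(restricted)) - np.log(rss(full)))  # ~ chi2(1) asympt.
    return chi2.sf(stat, df=1)

def symmetric_ci_test(x, y, z):
    """Run the test in both directions and combine; taking the maximum
    p-value is one simple symmetrization."""
    return max(lrt_gaussian(x, y, z), lrt_gaussian(y, x, z))

rng = np.random.default_rng(2)
z = rng.normal(size=(1000, 1))
x = z[:, 0] + rng.normal(size=1000)
y = z[:, 0] + rng.normal(size=1000)
print(symmetric_ci_test(x, y, z))  # large p-value: x _||_ y given z
```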
New Generation Computing, 2016
Learning causal models hidden in the background of observational data has been a difficult issue. Dealing with latent common causes and selection bias when constructing causal models from real data is often necessary, because observing all relevant variables is difficult. Ancestral graph models are effective and useful for representing causal models with some information about such latent variables. The causal faithfulness condition, which is usually assumed for determining the models, is known to often be weakly violated, in a statistical sense, for finite data. One of the authors developed a constraint-based causal learning algorithm that is robust against such weak violations while assuming no latent variables. In this study, we applied and extended the ideas of that algorithm to the inference of ancestral graph models. The practical validity and effectiveness of the algorithm are confirmed using some standard datasets, in comparison with the FCI and RFCI algorithms.
JMLR workshop and conference proceedings, 2016
We present an algorithm for estimating bounds on causal effects from observational data which combines graphical model search with simple linear regression. We assume that the underlying system can be represented by a linear structural equation model with no feedback, and we allow for the possibility of latent variables. Under assumptions standard in the causal search literature, we use conditional independence constraints to search for an equivalence class of ancestral graphs. Then, for each model in the equivalence class, we perform the appropriate regression (using causal structure information to determine which covariates to include in the regression) to estimate a set of possible causal effects. Our approach is based on the "IDA" procedure of Maathuis et al. (2009), which assumes that all relevant variables have been measured (i.e., no unmeasured confounders). We generalize their work by relaxing this assumption, which is often violated in applied contexts. We validat...
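The per-model regression step can be sketched as follows: each graph in the equivalence class dictates an adjustment set, each set yields one OLS estimate, and the spread of estimates gives the bounds. The graph search itself is not shown; the adjustment sets are passed in, and the function names are illustrative.

```python
import numpy as np

def effect_for_adjustment_set(data, x, y, adjust):
    """OLS coefficient of column x in a regression of column y on x plus
    the covariates dictated by one graph in the equivalence class."""
    n = data.shape[0]
    design = np.column_stack([np.ones(n), data[:, x], data[:, list(adjust)]])
    beta = np.linalg.lstsq(design, data[:, y], rcond=None)[0]
    return beta[1]

def effect_bounds(data, x, y, adjustment_sets):
    """adjustment_sets: one covariate set per model in the equivalence class
    (obtained from the graph search, which is not shown here)."""
    effects = [effect_for_adjustment_set(data, x, y, s) for s in adjustment_sets]
    return min(effects), max(effects)

rng = np.random.default_rng(3)
z = rng.normal(size=1000)
x = z + rng.normal(size=1000)
y = 1.5 * x + z + rng.normal(size=1000)
data = np.column_stack([x, y, z])          # columns: 0 = x, 1 = y, 2 = z
print(effect_bounds(data, x=0, y=1, adjustment_sets=[set(), {2}]))
# roughly (1.5, 2.0): adjusting for z recovers 1.5; omitting it inflates the estimate
```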
Discovering causal relationships in large databases of observational data is challenging. The pioneering work in this area was rooted in the theory of Bayesian network (BN) learning, which, however, is an NP-complete problem. Hence several constraint-based algorithms have been developed to efficiently discover causal relationships in large databases. These methods usually use the idea of BN learning, directly or indirectly, and are focused on causal relationships with single cause variables. In this paper, we propose an approach to mine causal rules in large databases of binary variables. Our method expands the scope of causality discovery to causal relationships with multiple cause variables, and we utilise partial association tests to exclude non-causal associations, ensuring the high reliability of discovered causal rules. Furthermore, an efficient algorithm is designed for performing the tests in large databases. We assess the method with a set of real-world diagnostic data. The results show that our method can effectively discover interesting causal rules in large databases.
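A standard instantiation of a partial association test for binary variables is the Cochran-Mantel-Haenszel statistic over 2x2 tables stratified by the controlled variables; whether the paper uses exactly this form is an assumption of the sketch below.

```python
import numpy as np
from scipy.stats import chi2

def cmh_partial_association(tables):
    """Cochran-Mantel-Haenszel test over stratified 2x2 tables.
    tables: array of shape (k, 2, 2), one 2x2 count table per stratum of the
    controlled variables. Returns (statistic, p-value); a small p-value means
    the association persists after controlling for the strata."""
    t = np.asarray(tables, dtype=float)
    a = t[:, 0, 0]                                    # top-left cell per stratum
    row1, col1, n = t[:, 0, :].sum(1), t[:, :, 0].sum(1), t.sum((1, 2))
    expected = row1 * col1 / n                        # E[a] under no association
    row2, col2 = n - row1, n - col1
    var = row1 * row2 * col1 * col2 / (n**2 * (n - 1))
    stat = (abs(a.sum() - expected.sum()) - 0.5) ** 2 / var.sum()  # continuity corr.
    return stat, chi2.sf(stat, df=1)

# Two strata in which the association persists after controlling.
tables = [[[30, 10], [10, 30]],
          [[25, 15], [15, 25]]]
print(cmh_partial_association(tables))  # small p: association not explained away
```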
2013
We propose new principles and techniques to speed up constraint-based algorithms for learning dependency structures from data. The novelty of the proposed framework comes from rules for accelerating inductive inference, which can radically reduce the search space for skeleton inference. These acceleration rules enable fast identification of the skeleton by recognizing the presence (or absence) of an edge and by recognizing some variables as non-members (or obligate members) of a supposed separator. We demonstrate that an algorithm equipped with the proposed rules can learn Bayesian networks (of moderate density) many times faster than the well-known PC algorithm. The improvement extends to non-recursive graphical models, i.e., causal networks with hidden variables.
Lecture Notes in Statistics, 1993
A discovery problem is composed of a set of alternative structures, one of which is the source of data, but any of which, for all the investigator knows before the inquiry, could be the structure from which the data are obtained. There is something to be found out about the actual structure, whichever it is. It may be that we want to settle a particular hypothesis that is true in some of the possible structures and false in others, or it may be that we want to know the complete theory of a certain sort of phenomenon. In this book, and in much of the social sciences and epidemiology, the alternative structures in a discovery problem are typically directed acyclic graphs paired with joint probability distributions on their vertices. We usually want to know something about the structure of the graph that represents causal influences, and we may also want to know about the distribution of values of variables in the graph for a given population. A discovery problem also includes a characterization of a kind of evidence; for example, data may be available for some of the variables but not others, and the data may include the actual probability or conditional independence relations or, more realistically, simply the values of the variables for random samples. Our theoretical discussions will usually consider discovery problems in which the data include the true conditional independence relations among the measured variables, but our examples and applications will always involve inferences from statistical samples. A method solves a discovery problem in the limit if as the sample size increases without bound the method converges to the true answer to the question or to the true theory, whatever
The PC algorithm learns maximally oriented causal Bayesian networks. However, there is no equivalent complete algorithm for learning the structure of relational models, a more expressive generalization of Bayesian networks. Recent developments in the theory and representation of relational models support lifted reasoning about conditional independence. This enables a powerful constraint for orienting bivariate dependencies and forms the basis of a new algorithm for learning structure. We present the relational causal discovery (RCD) algorithm that learns causal relational models. We prove that RCD is sound and complete, and we present empirical results that demonstrate effectiveness.
2022
Unobserved confounding is the main obstacle to causal effect estimation from observational data. Instrumental variables (IVs) are widely used for causal effect estimation when there exist latent confounders. With the standard IV method, when a given IV is valid, unbiased estimation can be obtained, but the validity requirement of standard IV is strict and untestable. Conditional IV has been proposed to relax the requirement of standard IV by conditioning on a set of observed variables (known as a conditioning set for a conditional IV). However, the criterion for finding a conditioning set for a conditional IV needs complete causal structure knowledge or a directed acyclic graph (DAG) representing the causal relationships of both observed and unobserved variables. This makes it impossible to discover a conditioning set directly from data. In this paper, by leveraging maximal ancestral graphs (MAGs) in causal inference with latent variables, we propose a new type of IV, ancestral IV i...