2005, Markov Chain Monte Carlo
This chapter provides a tutorial introduction to the use of MCMC in the analysis of data observed for multiple genetic loci on members of extended pedigrees in which there are many missing data. We introduce the specification of pedigrees and inheritance, and the structure of genetic models defining the dependence structure of data. We review exact computational algorithms which can provide a partial solution, and can be used to improve MCMC sampling of inheritance patterns. Realization of inheritance patterns can be used in several ways. Here, we focus on the estimation of multilocus linkage lod scores for the location of a locus affecting a disease trait relative to a known map of genetic marker loci.
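As a much simpler illustration of what a lod score measures, the sketch below computes a two-point lod score for fully informative, phase-known meioses. This is not the multilocus MCMC estimator the chapter describes; the function name `two_point_lod` and the assumption that recombinant meioses can be counted directly are illustrative only.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Return log10 of L(theta) / L(0.5) for phase-known meioses.

    Assumes each meiosis can be scored as recombinant or non-recombinant,
    which rarely holds for real pedigree data with missing observations.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    # Binomial log-likelihood at theta (the binomial coefficient cancels).
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    # Log-likelihood under no linkage, theta = 0.5.
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


if __name__ == "__main__":
    # One recombinant in ten meioses, evaluated at theta = 0.1.
    print(round(two_point_lod(1, 10, 0.1), 3))
```

With missing data the recombinant counts are not observed, which is where sampling inheritance patterns by MCMC becomes useful.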
2002
Abstract: Multipoint linkage analyses of genetic data on extended pedigrees can involve exact computations which are infeasible. Markov chain Monte Carlo methods represent an attractive alternative, greatly extending the range of models and data sets for which analysis is practical. In this paper, several advances in Markov chain Monte Carlo theory, namely joint updates of latent variables across loci and meioses, integrated proposals, Metropolis-Hastings restarts via sequential imputation and...
This research plan is being produced as a technical report, first, because it summarises much preliminary work that has not yet been published, and second, and more importantly, because it provides the methodological formulation for use of the Gibbs sampler in the Monte Carlo evaluation of likelihoods for complex genetic models. These methodological foundations have been presented in several seminars and at meetings, March-June 1990.
Statistical Science, 2003
Multipoint linkage analyses of data collected on related individuals are often performed as a first step in the discovery of disease genes. Through the dependence in inheritance of genes segregating at several linked loci, multipoint linkage analysis detects and localizes chromosomal regions (called trait loci) which contain disease genes. Our ability to correctly detect and position these trait loci is increased with the analysis of data observed on large pedigrees and multiple genetic markers. However, large pedigrees generally contain substantial missing data and exact calculation of the required multipoint likelihoods quickly becomes intractable. In this paper, we present a new Markov chain Monte Carlo approach to multipoint linkage analysis which greatly extends the range of models and data sets for which analysis is practical. Several advances in Markov chain Monte Carlo theory, namely joint updates of latent variables across loci or meioses, integrated proposals, Metropolis-Hastings restarts via sequential imputation and Rao-Blackwellized estimators, are incorporated into a sampling strategy which mixes well and produces accurate results in real time. The methodology is demonstrated through its application to several data sets originating from a study of early-onset Alzheimer's disease in families of Volga-German ethnic origin.
Institute of Mathematical Statistics Lecture Notes - Monograph Series, 1999
Genetic Analysis Workshop 10 identified five key factors contributing to the resolution of the genetic factors affecting complex traits. These include analysis with multipoint methods, use of extended pedigrees, and selective sampling of pedigrees. By sampling the affected individuals in an extended pedigree, we obtain individuals who have an increased probability of sharing genes identical by descent (IBD) at marker loci that are linked to the trait locus or loci. Given marker data on specified members of a pedigree, the conditional IBD status among relatives can be assessed, but exact computation is often impractical for multiple linked markers on complex pedigrees. The use of Markov chain Monte Carlo (MCMC) methods greatly extends the range of models and data sets for which analysis is computationally feasible. Many forms of MCMC have now been implemented in the context of genetic analysis. Here we propose a new sampler, which takes as latent variables the segregation indicators at marker loci, and jointly updates all indicators corresponding to a given meiosis. The sampler has good mixing properties. Questions of irreducibility are also addressed.
1. Introduction. Relatives share common ancestors. A single gene in such an ancestor may therefore descend via repeated segregations to each of the relatives. Such genes, which are copies of a single ancestral gene within a defined pedigree, are said to be identical by descent (IBD). Disregarding mutation, IBD genes must be of like type. It is the sharing of IBD genes that underlies phenotypic similarities among relatives. The probabilities of patterns of gene identity by descent are determined by the pedigree structure, and in turn determine the probability distribution of observed data on individuals of the pedigree. Genetic linkage is the dependent cosegregation of genes at different loci on the same chromosome.
Linkage detection and linkage analysis on the basis of data observed on related individuals require the computation of multilocus probabilities of observed phenotypic data on pedigree structures. Genetic Analysis Workshop 10 identified five key factors contributing to the resolution of the genetic factors affecting complex traits (Wijsman and Amos 1997). These include analysis with multipoint methods, use of extended pedigrees, and selective sampling of pedigrees. Here we consider an approach to linkage detection which uses only data on affected individuals. However, calculation of multilocus probabili-
Work supported in part by NIH grant GM-46255 and NSF grant BIR-9305835. AMS 1991 subject classifications. Primary 62F03; secondary 92D10.
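The idea of segregation indicators and IBD sharing can be sketched with an unconditional "gene drop" for a pair of full siblings at a single locus. This is a minimal sketch under stated assumptions, not the sampler proposed in the paper: it ignores marker data and linkage, and the function names `sib_pair_ibd_counts` and `ibd_proportions` are hypothetical.

```python
import random
from collections import Counter


def sib_pair_ibd_counts(n_draws: int, seed: int = 0) -> Counter:
    """Simulate segregation indicators for two full sibs at one locus.

    Each indicator is 0 if the parent transmits its paternal gene and 1 if
    it transmits its maternal gene. Sibs share a parental gene IBD exactly
    when their indicators for that parent agree.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_draws):
        # (indicator for the father's meiosis, indicator for the mother's meiosis)
        sib1 = (rng.randint(0, 1), rng.randint(0, 1))
        sib2 = (rng.randint(0, 1), rng.randint(0, 1))
        genes_ibd = int(sib1[0] == sib2[0]) + int(sib1[1] == sib2[1])
        counts[genes_ibd] += 1
    return counts


def ibd_proportions(n_draws: int, seed: int = 0) -> dict:
    """Return the simulated proportion of draws sharing 0, 1 or 2 genes IBD."""
    counts = sib_pair_ibd_counts(n_draws, seed)
    return {k: counts[k] / n_draws for k in (0, 1, 2)}


if __name__ == "__main__":
    print(ibd_proportions(20000, seed=1))
```

For full sibs the proportions should approach 1/4, 1/2 and 1/4. An MCMC sampler of the kind described here would instead sample the indicators conditional on observed marker data.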
Genetics, 1997
A Bayesian method for mapping linked quantitative trait loci (QTL) using multiple linked genetic markers is presented. Parameter estimation and hypothesis testing were implemented via Markov chain Monte Carlo (MCMC) algorithms. Parameters included were allele frequencies and substitution effects for two biallelic QTL, map positions of the QTL and markers, allele frequencies of the markers, and polygenic and residual variances. Missing data were polygenic effects and multi-locus marker-QTL genotypes. Three different MCMC schemes for testing the presence of a single or two linked QTL on the chromosome were compared. The first approach includes a model indicator variable representing two unlinked QTL affecting the trait, one linked and one unlinked QTL, or both QTL linked with the markers. The second approach incorporates an indicator variable for each QTL into the model for phenotype, allowing or not allowing for a substitution effect of a QTL on phenotype, and the third approach is ba...
Some Recent Advances in Mathematics and Statistics, 2013
The gene genealogy is a tree describing the ancestral relationships among genes sampled from unrelated individuals. Knowledge of the tree is useful for inference of population-genetic parameters such as migration or recombination rates. It also has potential application in gene-mapping, as individuals with similar trait values will tend to be more closely related genetically at the location of a trait-influencing mutation. One way to incorporate genealogical trees in genetic applications is to sample them conditional on observed genetic data. We have implemented a Markov chain Monte Carlo based genealogy sampler that conditions on observed haplotype data. Our implementation is based on an algorithm sketched by Zöllner and Pritchard but with several differences described herein. We also provide insights from our interpretation of their description that were necessary for efficient implementation. Our sampler can be used to summarize the distribution of tree-based association statistics, such as case-clustering measures.
Genetics, 1996
Markov chain Monte Carlo (MCMC) techniques are applied to simultaneously identify multiple quantitative trait loci (QTL) and the magnitude of their effects. Using a Bayesian approach a multi-locus model is fit to quantitative trait and molecular marker data, instead of fitting one locus at a time. The phenotypic trait is modeled as a linear function of the additive and dominance effects of the unknown QTL genotypes. Inference summaries for the locations of the QTL and their effects are derived from the corresponding marginal posterior densities obtained by integrating the likelihood, rather than by optimizing the joint likelihood surface. This is done using MCMC by treating the unknown QTL genotypes, and any missing marker genotypes, as augmented data and then by including these unknowns in the Markov chain cycle along with the unknown parameters. Parameter estimates are obtained as means of the corresponding marginal posterior densities. High posterior density regions of the marginal densities are obtained as confidence regions. We examine flowering time data from double haploid progeny of Brassica napus to illustrate the proposed method.
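Summarizing a marginal posterior from MCMC output, as described above, might look like the following sketch. It assumes a list of scalar draws for one parameter and a unimodal posterior, so that the high posterior density region is a single interval; `posterior_mean` and `hpd_interval` are illustrative names, not functions from the paper.

```python
import math


def posterior_mean(samples):
    """Estimate the posterior mean as the average of the MCMC draws."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(samples) / len(samples)


def hpd_interval(samples, mass=0.9):
    """Return the narrowest interval containing at least `mass` of the draws.

    Assumes a unimodal marginal posterior; a multimodal posterior would need
    a union of intervals rather than a single one.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must lie in (0, 1]")
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of draws the interval must cover
    # Slide a window of k consecutive ordered draws and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


if __name__ == "__main__":
    draws = [0, 10, 11, 12, 20, 50]
    print(posterior_mean(draws), hpd_interval(draws, mass=0.5))
```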
The American Journal of Human Genetics, 2000
Markov chain Monte Carlo (MCMC) techniques for multipoint mapping of quantitative trait loci have been developed on nuclear-family and extended-pedigree data. These methods are based on repeated sampling-peeling and gene dropping of genotype vectors and random sampling of each of the model parameters from their full conditional distributions, given phenotypes, markers, and other model parameters. We further refine such approaches by improving the efficiency of the marker haplotype-updating algorithm and by adopting a new proposal for adding loci. Incorporating these refinements, we have performed an extensive simulation study on simulated nuclear-family data, varying the number of trait loci, family size, displacement, and other segregation parameters. Our simulation studies show that our MCMC algorithm identifies the locations of the true trait loci and estimates their segregation parameters well, provided that the total number of sibship pairs in the pedigree data is reasonably large, heritability of each individual trait locus is not too low, and the loci are not too close together. Our MCMC algorithm was shown to be significantly more efficient than LOKI (Heath 1997) in our simulation study using nuclear-family data.
public.iastate.edu
Probability functions such as likelihoods and genotype probabilities play an important role in the analysis of genetic data. When genotype data are incomplete, Markov chain Monte Carlo (MCMC) methods, such as the Gibbs sampler, can be used to sample genotypes at the marker and trait loci. The Markov chain that corresponds to the scalar Gibbs sampler may not work due to slow mixing. Further, the Gibbs chain may not be irreducible when sampling genotypes at marker loci with more than two alleles. These problems do not arise if the genotypes are sampled jointly from the entire pedigree. When the pedigree does not have loops, a joint sample of the genotypes can be obtained efficiently via modification of the Elston-Stewart algorithm. When the pedigree has many loops, obtaining a joint sample can be time consuming. We propose a method for sampling genotypes from a pedigree so modified as to make joint sampling efficient. These samples, obtained from the modified pedigree, are used as candidate draws in the Metropolis-Hastings algorithm.
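The use of draws from an approximating distribution as Metropolis-Hastings candidates can be sketched as an independence sampler on a toy discrete state space. This is only an illustration under assumptions: the three genotype labels and their probabilities are invented for the example, and the `proposal` dictionary stands in for the distribution obtained from the modified pedigree.

```python
import random
from collections import Counter


def independence_mh(target, proposal, n_steps, seed=0):
    """Run an independence Metropolis-Hastings sampler on a finite state space.

    `target` and `proposal` map each state to a strictly positive weight;
    `target` may be unnormalized. Candidates are drawn from `proposal`
    regardless of the current state.
    """
    if any(target[s] <= 0 or proposal[s] <= 0 for s in proposal):
        raise ValueError("all target and proposal weights must be positive")
    rng = random.Random(seed)
    states = list(proposal)
    weights = [proposal[s] for s in states]
    current = rng.choices(states, weights=weights)[0]
    visits = Counter()
    for _ in range(n_steps):
        candidate = rng.choices(states, weights=weights)[0]
        # Hastings ratio for an independence proposal q:
        # pi(candidate) q(current) / (pi(current) q(candidate)).
        ratio = (target[candidate] * proposal[current]) / (
            target[current] * proposal[candidate])
        if rng.random() < min(1.0, ratio):
            current = candidate
        visits[current] += 1
    return visits


if __name__ == "__main__":
    # Hypothetical genotype probabilities, used only for illustration.
    target = {"AA": 0.5, "Aa": 0.3, "aa": 0.2}
    proposal = {"AA": 0.4, "Aa": 0.4, "aa": 0.2}
    visits = independence_mh(target, proposal, 50000, seed=2)
    print({s: visits[s] / 50000 for s in target})
```

The closer the proposal is to the target, the higher the acceptance rate, which is the motivation for choosing a modified pedigree on which joint sampling is cheap but still close to the original.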
Molecular Biotechnology, 2004
One of the most challenging areas in human genetics is the dissection of quantitative traits. In this context, the efficient use of available data is important, including, when possible, use of large pedigrees and many markers for gene mapping. In addition, methods that jointly perform linkage analysis and estimation of the trait model are appealing because they combine the advantages of a model-based analysis with the advantages of methods that do not require prespecification of model parameters for linkage analysis. Here we review a Markov chain Monte Carlo approach for such joint linkage and segregation analysis, which allows analysis of oligogenic traits in the context of multipoint linkage analysis of large pedigrees. We provide an outline for practitioners of the salient features of the method, interpretation of the results, effect of violation of assumptions, and an example analysis of a two-locus trait to illustrate the method.