Modeling Demographic Processes In Marked Populations, 2009
We investigated the utility of state-space models for determining the demographic causes of population declines, using the Song Thrush as an example. A series of integrated state-space models were fitted to census and ring-recovery data from the United Kingdom for the period 1968-2000. The models were fitted using Bayesian MCMC techniques with uniform priors and were ranked using the Deviance Information Criterion (DIC). Ring-reporting rates were modelled as a declining logit-linear function of year, with separate slopes for first-year birds and adults. The system process involved three demographic parameters: first-year survival, adult survival and productivity. Survival rates were modelled as year-specific, as specific to blocks with uniform population growth rates, or as logit-linear functions of weather or year. Productivity rates were modelled as random annual effects, as block-specific or as log-linear functions of year. We fitted 17 such models chosen on the basis of our prior knowledge of this system, given that it was not practical to fit all potential models. Six models within 10 points of the smallest DIC value were selected for inference. The posterior distributions from these preferred models suggest that population growth rates are best correlated with first-year survival and that there is also a pattern of consistent but weaker correlations between population growth rate and adult survival. Correlations between population growth rates and productivity were more variable, and may have been influenced by errors in other parts of the model, as productivity is essentially measured by difference. Thus in this analysis the evidence for productivity having a substantial influence on population changes is equivocal.
The interpretation of these results and the potential value of integrated state-space models for research into the population dynamics of declining populations are discussed.
We consider mark-recapture-recovery (MRR) data of animals where the model parameters are a function of individual time-varying continuous covariates. For such covariates, the covariate value is unobserved if the corresponding individual is unobserved, in which case the survival probability cannot be evaluated. For continuous-valued covariates, the corresponding likelihood can only be expressed in the form of an integral that is analytically intractable and, to date, no maximum likelihood approach that uses all the information in the data has been developed. Assuming a first-order Markov process for the covariate values, we accomplish this task by formulating the MRR setting in a state-space framework and considering an approximate likelihood approach which essentially discretizes the range of covariate values, reducing the integral to a summation. The likelihood can then be efficiently calculated and maximized using standard techniques for hidden Markov models. We initially assess the approach using simulated data before applying it to real data relating to Soay sheep, specifying the survival probability as a function of body mass. Models that have previously been suggested for the corresponding covariate process are typically of the form of diffusive random walks. We consider an alternative nondiffusive AR(1)-type model which appears to provide a significantly better fit to the Soay sheep data.
R. LANGROCK AND R. KING. This reprint differs from the original in pagination and typographic detail.
... such as a ring or tag) and released back into the population. At each subsequent survey all individuals observed are recorded, and those that have not previously been observed are again uniquely identified, before all are released back into the population. We assume that individuals can be observed alive or recovered dead in each survey.
The resulting MRR data can be summarised as the observed encounter histories for each individual observed within the population, detailing for each survey event whether an individual was observed alive or recovered dead. Conditioning on the initial capture time of each individual leads to Cormack-Jolly-Seber-type models [see for a review of these models]. The original Cormack-Jolly-Seber model considered only live captures (i.e., mark-recapture data) and was extended to additional recoveries by . The corresponding MRR likelihood function of these data can be written as a function of survival, recapture and recovery probabilities. Recent research has focussed on linking environmental and individual covariates to demographic parameters, most notably the survival probabilities, in order to explain temporal and individual variability [Brooks, Catchpole name but a few]. We consider individual time-varying continuous covariates. These have traditionally been difficult to deal with due to the missing covariate values (if an individual is unobserved, the corresponding covariate value is also unknown). One of the initial approaches to dealing with such covariates was to (coarsely) discretize the covariate space, essentially defining discrete covariate "states." considered data relating to meadow voles (Microtus pennsylvanicus) and categorised weight into four different categories. Such a discretization reduces the model to the Arnason-Schwarz model [Brownie et al. (1993), Schwarz, Schweigert and]. Transition probabilities between the covariate states are estimated within the optimisation of the likelihood (possibly with additional restrictions on the state transitions). With the coarse discretization arbitrarily defined, this approach leads to a (potentially significant) loss of information. Catchpole, Morgan and Tavecchia (2008) have proposed a conditional likelihood approach (often referred to as the "trinomial approach").
By conditioning on only the observed covariate values, this approach results in a simple, closed-form likelihood expression. However, this involves discarding a proportion of the available data, leading to a decreased precision of the parameter estimates. In addition, Bayesian approaches have been proposed Schwarz (2006), King, Brooks and] and the corresponding model fitted using a data augmentation approach ]. Within the Bayesian approach priors need to be specified on the model parameters (and possibly models in the presence of model
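The discretization idea described in the abstract above can be sketched as a small hidden Markov model. This is a minimal illustration under stated assumptions, not the authors' implementation: it handles live recaptures only (no dead recoveries), snaps each observed covariate value to the nearest grid midpoint, and uses hypothetical names throughout (`build_transition`, `history_likelihood`, `beta0`, `beta1`, `eta`, `mu`, `sigma`, `p`). The covariate is assumed to follow a mean-reverting AR(1)-type process, and survival is assumed logistic in the covariate.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def build_transition(mid, beta0, beta1, eta, mu, sigma):
    """Transition matrix over m equal-width covariate bins plus one 'dead' state."""
    m = len(mid)
    # Assumed AR(1)-type covariate model: next value ~ N(b + eta * (mu - b), sigma^2).
    mean = mid + eta * (mu - mid)
    dens = np.exp(-0.5 * ((mid[None, :] - mean[:, None]) / sigma) ** 2)
    # Row-normalise, so the Gaussian constant and the bin width cancel.
    G = dens / dens.sum(axis=1, keepdims=True)
    phi = logistic(beta0 + beta1 * mid)      # survival probability per bin
    gamma = np.zeros((m + 1, m + 1))
    gamma[:m, :m] = phi[:, None] * G         # survive and move between bins
    gamma[:m, m] = 1.0 - phi                 # die
    gamma[m, m] = 1.0                        # dead is absorbing
    return gamma

def history_likelihood(cov, mid, gamma, p):
    """Forward-algorithm likelihood of one history, conditional on first capture.

    cov[t] is the covariate recorded at occasion t, or np.nan if the animal
    was not captured; cov[0] must be observed. p is the recapture probability.
    """
    m = len(mid)
    alpha = np.zeros(m + 1)
    alpha[np.argmin(np.abs(mid - cov[0]))] = 1.0
    for x in cov[1:]:
        alpha = alpha @ gamma
        obs = np.zeros(m + 1)
        if np.isnan(x):
            obs[:m] = 1.0 - p                # alive but missed
            obs[m] = 1.0                     # dead animals are never seen
        else:
            obs[np.argmin(np.abs(mid - x))] = p
        alpha = alpha * obs
    return float(alpha.sum())

# Illustrative grid and parameter values (all hypothetical).
edges = np.linspace(10.0, 35.0, 51)
mid = 0.5 * (edges[:-1] + edges[1:])
gamma = build_transition(mid, beta0=-2.0, beta1=0.15, eta=0.3, mu=22.0, sigma=2.0)
lik = history_likelihood(np.array([20.0, np.nan, 22.0, np.nan]), mid, gamma, p=0.6)
```

The sum over bins inside the forward recursion is what replaces the intractable integral over unobserved covariate values; a finer grid gives a closer approximation at higher computational cost.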
The aim of this paper is to demonstrate the R package conting for the Bayesian analysis of complete and incomplete contingency tables using hierarchical log-linear models. This package allows a user to identify interactions between categorical factors (via complete contingency tables) and to estimate closed population sizes using capture-recapture studies (via incomplete contingency tables). The models are fitted using Markov chain Monte Carlo methods. In particular, implementations of the Metropolis-Hastings and reversible jump algorithms appropriate for log-linear models are employed. The conting package is demonstrated on four real examples.
The 10-year drug strategy for England and Wales was published in February 2008. It dropped drugs-related deaths (DRDs) as a key performance indicator. Scotland retained a necessary strong focus on DRDs. Scotland's DRDs numbered 1006 in 2000-02 and 1009 in 2003-05. The previous Scottish administration's claim that its number of current injectors had decreased substantially between 2000 and 2003 implied, paradoxically, that their DRD rate would have to have increased. Worse was to come: Scotland's DRDs had increased to 876 in 2006+2007. We analyse UK's DRDs by sex and age-group to reveal temporal trends (2000-02 versus 2003-05 versus 2006+2007) with different public health and epidemiological implications. We also address the above Scottish paradox and assess, by age-group, how consistent Scotland's 876 DRDs in 2006+2007 are with Scottish injectors' DRD rate in 2003-05 of around 1 per 100 injector-years. Public health success in the UK in reducing DRDs at young...
Journal of Agricultural, Biological, and Environmental Statistics, 2014
Summary: We consider mark-recapture-recovery data where the model parameters are expressed as functions of time-varying individual continuous covariates. The issue arises of missing covariate values, for (at least) the times when an individual is not observed. We ...
1. Group dynamics are a fundamental aspect of many species' movements. The need to adequately model individuals' interactions with other group members has been recognized, particularly in order to differentiate the role of social forces in individual movement from environmental factors. However, to date, practical statistical methods, which can include group dynamics in animal movement models, have been lacking. 2. We consider a flexible modelling framework that distinguishes a group-level model, describing the movement of the group's centre, and an individual-level model, such that each individual makes its movement decisions relative to the group centroid. The basic idea is framed within the flexible class of hidden Markov models, extending previous work on modelling animal movement by means of multistate random walks. 3. While in simulation experiments parameter estimators exhibit some bias in non-ideal scenarios, we show that generally the estimation of models of this type is both feasible and ecologically informative. 4. We illustrate the approach using real movement data from 11 reindeer (Rangifer tarandus). Results indicate a directional bias towards a group centroid for reindeer in an encamped state. Though the attraction to the group centroid is relatively weak, our model successfully captures group-influenced movement dynamics. Specifically, as compared to a regular mixture of correlated random walks, the group dynamic model more accurately predicts the non-diffusive behaviour of a cohesive mobile group. 5. As technology continues to develop, it will become easier and less expensive to tag multiple individuals within a group in order to follow their movements. Our work provides a first inferential framework for understanding the relative influences of individual versus group-level movement decisions.
This framework can be extended to include covariates corresponding to environmental influences or body condition. As such, this framework allows for a broader understanding of the many internal and external factors that can influence an individual's movement.
... 2. Bayesian field theory. ... This is done by means of a very simple relationship, known as Bayes' Theorem, which is given in Chapter 4. When certain of the parameters have particular importance, then it is their marginal posterior distribution that is the end-point of the analysis. ...
Estimating the size of hidden or difficult-to-reach populations is often of interest for economic, sociological or public health reasons. In order to estimate such populations, administrative data lists are often collated to form multi-list cross-counts and displayed in the form of an incomplete contingency table. Log-linear models are typically fitted to such data to obtain an estimate of the total population size by estimating the number of individuals not observed by any of the data-sources. This approach has been taken to estimate the current number of people who inject drugs (PWID) in Scotland, with the Hepatitis C virus diagnosis database used as one of the data-sources to identify PWID. However, the Hepatitis C virus diagnosis data-source does not distinguish between current and former PWID, which, if ignored, will lead to overestimation of the total population size of current PWID. We extend the standard model-fitting approach to allow for a data-source which contains a mixture of target and non-target individuals (i.e. in this case, current and former PWID). We apply the proposed approach to data for PWID in Scotland in 2003, 2006 and 2009 and compare with the results from standard log-linear models.
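As a minimal sketch of how a log-linear model fills the unobserved cell of an incomplete contingency table, consider the simplest special case: two lists and an independence model (no list-by-list interaction). In that case the fitted count for individuals seen by neither list has a closed form. This is far simpler than the multi-list, mixture-source models of the abstract above; the function name and the counts are hypothetical and purely illustrative.

```python
def two_list_estimate(n11, n10, n01):
    """Population estimate from two lists under the independence log-linear model.

    n11: seen on both lists; n10: list A only; n01: list B only.
    Independence implies an odds ratio of 1, i.e. n11 * n00 = n10 * n01,
    which gives the estimated unobserved count n00.
    """
    if n11 <= 0:
        raise ValueError("the overlap n11 must be positive for the estimate to exist")
    n00 = n10 * n01 / n11
    return n00, n11 + n10 + n01 + n00

# Made-up counts for illustration only.
unseen, total = two_list_estimate(n11=60, n10=140, n01=240)
print(unseen, total)  # 560.0 1000.0
```

With three or more lists, interaction terms between sources can be included and the unobserved cell no longer has such a simple expression, which is where model fitting and model uncertainty enter.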
Using Bayesian capture-recapture methods, we estimate current injectors in Scotland in 2003, and, thereby, injectors' drug-related death rates for the period 2003-2005. Four different data sources are considered [Hepatitis C Virus (HCV) database, hospital admissions, social enquiry reports, and drug misuse database reports by General Practices or Drug Treatment Agencies] which provide covariate information on sex, region (Greater Glasgow versus elsewhere in Scotland) and age group (15-34 years and 35+ years). We quantified Scotland's current injectors in 2003 at 27,400 (95% highest probability density interval: 20,700-32,100) by incorporating underlying model uncertainty in terms of the possible interactions present between data sources and/or covariates. The posterior probability was 72% that Scotland had more current injectors in 2003 than in 2000. Detailed comparison with 2000 gave evidence of importantly changed numbers of current injectors for different covariate classes. Also of particular social interest is the estimation of injectors' drug-related death rates. Expert information was used to construct upper and lower bounds on the number of drug-related deaths pertaining to injectors, which were then used to provide bounds on injectors' drug-related death rates. Failure to incorporate expert information could result in over-estimation of drug-related death rates for subclasses of injectors.
We provide a closed form likelihood expression for multi-state capture–recapture–recovery data when the state of an individual may be only partially observed. The corresponding sufficient statistics are presented in addition to a matrix formulation which facilitates an efficient calculation of the likelihood. This likelihood framework provides a consistent and unified framework with many standard models applied to capture–recapture–recovery data as special cases.
A default prior distribution is proposed for the Bayesian analysis of contingency tables. The prior is specified to allow for dependence between levels of the factors. Different dependence structures are considered, including conditional autoregressive and distance correlation structures. To demonstrate the prior distribution, a dataset is considered which involves estimating the number of injecting drug users in the eleven National Health Service board regions of Scotland using an incomplete contingency table where the dependence structure relates to geographical regions.
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2003
The classical approach to statistical analysis is usually based upon finding values for model parameters that maximise the likelihood function. Model choice in this context is often also based upon the likelihood function, but with the addition of a penalty term for the number of parameters. Though models may be compared pairwise using likelihood ratio tests for example, various criteria such as the AIC have been proposed as alternatives when multiple models need to be compared. In practical terms, the classical approach to model selection usually involves maximising the likelihood function associated with each competing model and then calculating the corresponding criteria value(s). However, when large numbers of models are possible, this quickly becomes infeasible unless a method that simultaneously maximises over both parameter and model space is available. In this paper we propose an extension to the traditional simulated annealing algorithm that allows for moves that not only change parameter values but that also move between competing models. This trans-dimensional simulated annealing algorithm can therefore be used to locate models and parameters that minimise criteria such as the AIC, but within a single algorithm, removing the need for large numbers of simulations to be run. We discuss the implementation of the trans-dimensional simulated annealing algorithm and use simulation studies to examine its performance in realistically complex modelling situations. We illustrate our ideas with a pedagogic example based upon the analysis of an autoregressive time series and two more detailed examples: one on variable selection for logistic regression and the other on model selection for the analysis of integrated recapture/recovery data.
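The general idea of annealing jointly over models and parameters can be sketched for variable selection in a Gaussian linear regression. This is an illustrative toy under stated assumptions, not the algorithm of the paper: the two move types, the geometric cooling schedule, the Gaussian AIC form and every name (`trans_dimensional_sa`, `t0`, `cooling`, `step`) are choices made for this sketch, and the data are simulated.

```python
import numpy as np

def aic(y, X, beta, active):
    """Gaussian AIC (up to an additive constant) for the active columns."""
    resid = y - X[:, active] @ beta[active]
    rss = float(resid @ resid)
    return len(y) * np.log(rss / len(y)) + 2 * int(active.sum())

def trans_dimensional_sa(y, X, n_iter=5000, t0=1.0, cooling=0.999, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    k = X.shape[1]                       # column 0 is the intercept; requires k >= 2
    active = np.zeros(k, dtype=bool)
    active[0] = True                     # start from the intercept-only model
    beta = np.zeros(k)
    cur = aic(y, X, beta, active)
    best = (cur, active.copy(), beta.copy())
    temp = t0
    for _ in range(n_iter):
        new_active, new_beta = active.copy(), beta.copy()
        if rng.random() < 0.5:
            # Within-model move: perturb one active coefficient.
            j = rng.choice(np.flatnonzero(active))
            new_beta[j] += rng.normal(0.0, step)
        else:
            # Between-model move: add or remove one non-intercept column.
            j = rng.integers(1, k)
            new_active[j] = not new_active[j]
            new_beta[j] = rng.normal(0.0, step) if new_active[j] else 0.0
        new = aic(y, X, new_beta, new_active)
        # Always accept improvements; accept worse states with annealing probability.
        if new <= cur or rng.random() < np.exp((cur - new) / temp):
            active, beta, cur = new_active, new_beta, new
            if cur < best[0]:
                best = (cur, active.copy(), beta.copy())
        temp *= cooling                  # geometric cooling (an assumption)
    return best

# Simulated data: only the first covariate truly matters.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 3))])
y = 2.0 + 3.0 * X[:, 1] + rng.normal(size=200)
start_aic = aic(y, X, np.zeros(4), np.array([True, False, False, False]))
best_aic, best_active, best_beta = trans_dimensional_sa(y, X)
```

A single run explores both which covariates enter the model and what their coefficients are, rather than fitting every candidate model separately and comparing criterion values afterwards.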
Journal of the Royal Statistical Society: Series A (Statistics in Society), 2014
Injecting drug users (IDUs) have a direct social and economic effect yet can typically be regarded as a hidden population within a community. We estimate the size of the IDU population across the nine different Government Office regions of England in 2005-2006 by using capture-recapture methods with age (ranging from 15 to 64 years) and gender as covariate information. We consider a Bayesian model averaging approach using log-linear models, where we can include explicit prior information within the analysis in relation to the total IDU population (elicited from the number of drug-related deaths and injectors' drug-related death rates). Estimation at the regional level allows for regional heterogeneity with these regional estimates aggregated to obtain a posterior mean estimate for the number of England's IDUs of 195840 with 95% credible interval (181700, 210480). There is significant variation in the estimated regional prevalence of current IDUs per million of population aged 15-64 years, and in injecting drug-related death rates across the gender age cross-classifications. The propensity of an IDU to be seen by at least one source also exhibits strong regional variability with London having the lowest propensity of being observed (posterior mean probability 0.21) and the South West the highest propensity (posterior mean 0.46).
1. We present new developments in statistical methodology allowing in-depth analysis of realistic, complex biological models for longitudinal data sets. Important biological details such as mark-loss and recapture heterogeneity can be identified. 2. We use a Bayesian hidden process framework for a comparative analysis of long-term capture-recapture data with various combinations of marking methods for adult female grey seals Halichoerus grypus at two UK colonies. 3. Seals were identified using three different methods: flipper tags, brands, or natural pelage markings. Animals identified by brands or natural markings were re-sighted more effectively than those with tags. 4. Flipper tag-loss rates differed between colonies, and there was evidence for non-independent tag-loss in double-tagged animals. There was also evidence at one colony for the presence of transient animals, which attend the colony for 1 year only. Apparent survival was higher and more consistent at one site, and the differences in survival between the two colonies were able to explain contrasting pup production trends at these sites. 5. Synthesis and applications. Longitudinal studies allow for the estimation of demographic parameters which have important implications for our understanding of population dynamics and for the conservation and management of populations. Using new statistical developments to allow for the analysis of missing/incomplete data and partial observations, we show how survival can be estimated from complex mark-recapture data, allowing for the effects of mark loss. The re-sightability of different marks is estimated, indicating that photo-ID based on natural pelage markings is a very effective method for identifying grey seals.
There are notable contrasts in survival estimates between breeding colonies which can explain contrasts in population trends at these sites, confirming the importance of adult survival in driving population dynamics in this long-lived species.
Journal of Agricultural, Biological, and Environmental Statistics, 2014
The few distance sampling studies that use Bayesian methods typically consider only line transect sampling with a half-normal detection function. We present a Bayesian approach to analyse distance sampling data applicable to line and point transects, exact and interval distance data and any detection function possibly including covariates affecting detection probabilities. We use an integrated likelihood which combines the detection and density models. For the latter, densities are related to covariates in a log-linear mixed effect Poisson model which accommodates correlated counts. We use a Metropolis-Hastings algorithm for updating parameters and a reversible jump algorithm to include model selection for both the detection function and density models. The approach is applied to a large-scale experimental design study of northern bobwhite coveys where the interest was to assess the effect of establishing herbaceous buffers around agricultural fields in several states in the US on bird densities. Results were compared with those from an existing maximum likelihood approach that analyses the detection and density models in two stages. Both methods revealed an increase of covey densities on buffered fields. Our approach gave estimates with higher precision even though it does not condition on a known detection function for the density model.
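For orientation, the conventional baseline mentioned in the first sentence of the abstract above (line transects with a half-normal detection function) can be sketched as follows. This is the textbook plug-in calculation with a known scale parameter, not the integrated Bayesian approach of the paper; the function names and the numbers are hypothetical. The detection function is assumed to be g(x) = exp(-x^2 / (2 sigma^2)) with truncation distance w.

```python
import math

def half_normal_esw(sigma, w):
    """Effective strip half-width: the integral of exp(-x^2 / (2 sigma^2)) over [0, w]."""
    return sigma * math.sqrt(math.pi / 2.0) * math.erf(w / (sigma * math.sqrt(2.0)))

def line_transect_density(n, total_length, sigma, w):
    """Density estimate n / (2 * L * esw) for n detections along total line length L."""
    return n / (2.0 * total_length * half_normal_esw(sigma, w))

# Illustrative values only: distances in metres, so density is per square metre.
esw = half_normal_esw(sigma=10.0, w=1000.0)
density = line_transect_density(n=50, total_length=10_000.0, sigma=10.0, w=1000.0)
```

In practice sigma is estimated from the observed distances, and the uncertainty in that estimate should carry through to the density, which is one motivation for combining the detection and density models in a single likelihood.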
Journal of Agricultural, Biological, and Environmental Statistics, 2012
A Bayesian approach to fitting Gibbs processes with temporal random effects. Ruth King, Janine B. Illian, Stuart E. King and Glenna Evans ...
Modeling Demographic Processes In Marked Populations, 2009
We investigated the utility of state-space models for determining the demographic causes of popul... more We investigated the utility of state-space models for determining the demographic causes of population declines, using the Song Thrush as an example. A series of integrated state-space models were fitted to census and ring-recovery data from the United Kingdom for the period 1968-2000. The models were fitted using Bayesian MCMC techniques with uniform priors and were ranked using the Deviance Information Criterion (DIC). Ring-reporting rates were modelled as a declining logit-linear function of year, with separate slopes for first-year birds and adults. The system process involved three demographic parameters, first-year survival, adult survival and productivity. Survival rates were modelled as yearspecific, as specific to blocks with uniform population growth rates, or as logit-linear functions of weather or year. Productivity rates were modelled as random annual effects, as block-specific or as log-linear functions of year. We fitted 17 such models chosen on the basis of our prior knowledge of this system, given that it was not practical to fit all potential models. Six models within 10 points of the smallest DIC value were selected for inference. The posterior distributions from these preferred models suggest that population growth rates are best correlated with first year survival and that and that there is also a pattern of consistent but weaker correlations between population growth rate and adult survival. Correlations between population growth rates and productivity were more variable, and may have been influenced by errors in other parts of the model, as productivity is essentially measured by difference. Thus in this analysis the evidence for productivity having a substantial influence of population changes is equivocal. 
The interpretation of these results and the potential value of integrated state-space models for research into the population dynamics of declining populations are discussed.
We consider mark-recapture-recovery (MRR) data of animals where the model parameters are a functi... more We consider mark-recapture-recovery (MRR) data of animals where the model parameters are a function of individual time-varying continuous covariates. For such covariates, the covariate value is unobserved if the corresponding individual is unobserved, in which case the survival probability cannot be evaluated. For continuous-valued covariates, the corresponding likelihood can only be expressed in the form of an integral that is analytically intractable and, to date, no maximum likelihood approach that uses all the information in the data has been developed. Assuming a first-order Markov process for the covariate values, we accomplish this task by formulating the MRR setting in a state-space framework and considering an approximate likelihood approach which essentially discretizes the range of covariate values, reducing the integral to a summation. The likelihood can then be efficiently calculated and maximized using standard techniques for hidden Markov models. We initially assess the approach using simulated data before applying to real data relating to Soay sheep, specifying the survival probability as a function of body mass. Models that have previously been suggested for the corresponding covariate process are typically of the form of diffusive random walks. We consider an alternative nondiffusive AR(1)-type model which appears to provide a significantly better fit to the Soay sheep data. . This reprint differs from the original in pagination and typographic detail. 1 2 R. LANGROCK AND R. KING such as a ring or tag) and released back into the population. At each subsequent survey all individuals observed are recorded, and those that have not previously been observed are again uniquely identified, before all are released back into the population. We assume that individuals can be observed alive or recovered dead in each survey. 
The resulting MRR data can be summarised as the observed encounter histories for each individual observed within the population, detailing for each survey event whether an individual was observed alive or recovered dead. Conditioning on the initial capture time of each individual leads to Cormack-Jolly-Seber-type models [see for a review of these models]. The original Cormack-Jolly-Seber model considered only live captures (i.e., markrecapture data) and was extended to additional recoveries by . The corresponding MRR likelihood function of these data can be written as a function of survival, recapture and recovery probabilities. Recent research has focussed on linking environmental and individual covariates to demographic parameters, most notably the survival probabilities, in order to explain temporal and individual variability [Brooks, Catchpole name but a few]. We consider individual time-varying continuous covariates. These have traditionally been difficult to deal with due to the missing covariate values (if an individual is unobserved, the corresponding covariate value is also unknown). One of the initial approaches to dealing with such covariates was to (coarsely) discretize the covariate space, essentially defining discrete covariate "states." considered data relating to meadow voles (Microtus pennsylvanicus) and categorised weight into four different categories. Such a discretization reduces the model to the Arnason-Schwarz model [Brownie et al. (1993), Schwarz, Schweigert and]. Transition probabilities between the covariate states are estimated within the optimisation of the likelihood (possibly with additional restrictions on the state transitions). With the coarse discretization arbitrarily defined, this approach leads to a (potentially significant) loss of information. Catchpole, Morgan and Tavecchia (2008) have proposed a conditional likelihood approach (often referred to as the "trinomial approach"). 
By conditioning on only the observed covariate values, this approach results in a simple, closed-form likelihood expression. However, this involves discarding a proportion of the available data, leading to a decreased precision of the parameter estimates. In addition, Bayesian approaches have been proposed Schwarz (2006), King, Brooks and] and the corresponding model fitted using a data augmentation approach ]. Within the Bayesian approach priors need to specified on the model parameters (and possibly models in the presence of model
The aim of this paper is to demonstrate the R package conting for the Bayesian analysis of comple... more The aim of this paper is to demonstrate the R package conting for the Bayesian analysis of complete and incomplete contingency tables using hierarchical log-linear models. This package allows a user to identify interactions between categorical factors (via complete contingency tables) and to estimate closed population sizes using capture-recapture studies (via incomplete contingency tables). The models are fitted using Markov chain Monte Carlo methods. In particular, implementations of the Metropolis-Hastings and reversible jump algorithms appropriate for log-linear models are employed. The conting package is demonstrated on four real examples.
The 10-year drug strategy for England and Wales was published in February 2008. It dropped drugs-... more The 10-year drug strategy for England and Wales was published in February 2008. It dropped drugs-related deaths (DRDs) as a key performance indicator. Scotland retained a necessary strong focus on DRDs. Scotland's DRDs numbered 1006 in 2000-02 and 1009 in 2003-05. The previous Scottish administration's claim that its number of current injectors had decreased substantially between 2000 and 2003 implied, paradoxically, that their DRD rate would have to have increased. Worse was to come: Scotland's DRDs had increased to 876 in 2006+2007. We analyse UK's DRDs by sex and age-group to reveal temporal trends (2000-02 versus 2003-05 versus 2006+2007) with different public health and epidemiological implications. We also address the above Scottish paradox and assess, by age-group, how consistent Scotland's 876 DRDs in 2006+2007 are with Scottish injectors' DRD rate in 2003-05 of around 1 per 100 injector-years. Public health success in the UK in reducing DRDs at young...
Journal of Agricultural, Biological, and Environmental Statistics, 2014
Summary: We consider mark-recapture-recovery data where the model parameters are expressed as functions of time-varying individual continuous covariates. The issue arises of missing covariate values, for (at least) the times when an individual is not observed. We ...
1. Group dynamics are a fundamental aspect of many species' movements. The need to adequately model individuals' interactions with other group members has been recognized, particularly in order to differentiate the role of social forces in individual movement from environmental factors. However, to date, practical statistical methods, which can include group dynamics in animal movement models, have been lacking. 2. We consider a flexible modelling framework that distinguishes a group-level model, describing the movement of the group's centre, and an individual-level model, such that each individual makes its movement decisions relative to the group centroid. The basic idea is framed within the flexible class of hidden Markov models, extending previous work on modelling animal movement by means of multistate random walks. 3. While in simulation experiments parameter estimators exhibit some bias in non-ideal scenarios, we show that generally the estimation of models of this type is both feasible and ecologically informative. 4. We illustrate the approach using real movement data from 11 reindeer (Rangifer tarandus). Results indicate a directional bias towards a group centroid for reindeer in an encamped state. Though the attraction to the group centroid is relatively weak, our model successfully captures group-influenced movement dynamics. Specifically, as compared to a regular mixture of correlated random walks, the group dynamic model more accurately predicts the non-diffusive behaviour of a cohesive mobile group. 5. As technology continues to develop, it will become easier and less expensive to tag multiple individuals within a group in order to follow their movements. Our work provides a first inferential framework for understanding the relative influences of individual versus group-level movement decisions.
This framework can be extended to include covariates corresponding to environmental influences or body condition. As such, this framework allows for a broader understanding of the many internal and external factors that can influence an individual's movement.
... 2. Bayesian field theory. ... This is done by means of a very simple relationship, known as Bayes' Theorem, which is given in Chapter 4. When certain of the parameters have particular importance, then it is their marginal posterior distribution that is the end-point of the analysis. ...
Estimating the size of hidden or difficult to reach populations is often of interest for economic, sociological or public health reasons. In order to estimate such populations, administrative data lists are often collated to form multi-list cross-counts and displayed in the form of an incomplete contingency table. Log-linear models are typically fitted to such data to obtain an estimate of the total population size by estimating the number of individuals not observed by any of the data-sources. This approach has been taken to estimate the current number of people who inject drugs (PWID) in Scotland, with the Hepatitis C virus diagnosis database used as one of the data-sources to identify PWID. However, the Hepatitis C virus diagnosis data-source does not distinguish between current and former PWID, which, if ignored, will lead to overestimation of the total population size of current PWID. We extend the standard model-fitting approach to allow for a data-source which contains a mixture of target and non-target individuals (i.e. in this case, current and former PWID). We apply the proposed approach to data for PWID in Scotland in 2003, 2006 and 2009 and compare with the results from standard log-linear models.
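As a rough illustration of the incomplete-contingency-table idea in the abstract above, the simplest case has two lists and a log-linear model with no interaction between them. The unobserved cell is then estimated by n10 * n01 / n11 (the Lincoln-Petersen form). This is only a sketch of that textbook special case, not the multi-source Bayesian model the abstract describes; the function name and the counts are hypothetical.

```python
def estimate_total_two_lists(n11: int, n10: int, n01: int) -> float:
    """Estimate total population size from a 2x2 incomplete table.

    n11: individuals seen on both lists
    n10: individuals seen on list A only
    n01: individuals seen on list B only
    Under the independence log-linear model, the missing cell n00
    (seen on neither list) is estimated by n10 * n01 / n11.
    """
    if n11 <= 0:
        raise ValueError("n11 must be positive for the estimate to exist")
    n00_hat = n10 * n01 / n11
    return n11 + n10 + n01 + n00_hat


# Hypothetical counts, for illustration only (not data from the paper).
total = estimate_total_two_lists(n11=50, n10=150, n01=100)
print(total)  # 300 observed plus 300 estimated unseen
```

With more than two lists, interaction terms between sources become estimable, which is where model choice among log-linear models enters.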
Using Bayesian capture-recapture methods, we estimate current injectors in Scotland in 2003, and, thereby, injectors' drug-related death rates for the period 2003-2005. Four different data sources are considered [Hepatitis C Virus (HCV) database, hospital admissions, social enquiry reports, and drug misuse database reports by General Practices or Drug Treatment Agencies] which provide covariate information on sex, region (Greater Glasgow versus elsewhere in Scotland) and age group (15-34 years and 35+ years). We quantified Scotland's current injectors in 2003 at 27,400 (95% highest probability density interval: 20,700-32,100) by incorporating underlying model uncertainty in terms of the possible interactions present between data sources and/or covariates. The posterior probability was 72% that Scotland had more current injectors in 2003 than in 2000. Detailed comparison with 2000 gave evidence of importantly changed numbers of current injectors for different covariate classes. In addition, and of particular social interest, is the estimation of injectors' drug-related death rates. Expert information was used to construct upper and lower bounds on the number of drug-related deaths pertaining to injectors, which were then used to provide bounds on injectors' drug-related death rates. Failure to incorporate expert information could result in over-estimation of drug-related death rates for subclasses of injectors.
We provide a closed form likelihood expression for multi-state capture–recapture–recovery data when the state of an individual may be only partially observed. The corresponding sufficient statistics are presented in addition to a matrix formulation which facilitates an efficient calculation of the likelihood. This likelihood framework provides a consistent and unified framework with many standard models applied to capture–recapture–recovery data as special cases.
A default prior distribution is proposed for the Bayesian analysis of contingency tables. The prior is specified to allow for dependence between levels of the factors. Different dependence structures are considered, including conditional autoregressive and distance correlation structures. To demonstrate the prior distribution, a dataset is considered which involves estimating the number of injecting drug users in the eleven National Health Service board regions of Scotland using an incomplete contingency table where the dependence structure relates to geographical regions.
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2003
The classical approach to statistical analysis is usually based upon finding values for model parameters that maximise the likelihood function. Model choice in this context is often also based upon the likelihood function, but with the addition of a penalty term for the number of parameters. Though models may be compared pairwise using likelihood ratio tests for example, various criteria such as the AIC have been proposed as alternatives when multiple models need to be compared. In practical terms, the classical approach to model selection usually involves maximising the likelihood function associated with each competing model and then calculating the corresponding criteria value(s). However, when large numbers of models are possible, this quickly becomes infeasible unless a method that simultaneously maximises over both parameter and model space is available. In this paper we propose an extension to the traditional simulated annealing algorithm that allows for moves that not only change parameter values but that also move between competing models. This trans-dimensional simulated annealing algorithm can therefore be used to locate models and parameters that minimise criteria such as the AIC, but within a single algorithm, removing the need for large numbers of simulations to be run. We discuss the implementation of the trans-dimensional simulated annealing algorithm and use simulation studies to examine its performance in realistically complex modelling situations. We illustrate our ideas with a pedagogic example based upon the analysis of an autoregressive time series and two more detailed examples: one on variable selection for logistic regression and the other on model selection for the analysis of integrated recapture/recovery data.
Journal of the Royal Statistical Society: Series A (Statistics in Society), 2014
Injecting drug users (IDUs) have a direct social and economic effect yet can typically be regarded as a hidden population within a community. We estimate the size of the IDU population across the nine different Government Office regions of England in 2005-2006 by using capture-recapture methods with age (ranging from 15 to 64 years) and gender as covariate information. We consider a Bayesian model averaging approach using log-linear models, where we can include explicit prior information within the analysis in relation to the total IDU population (elicited from the number of drug-related deaths and injectors' drug-related death rates). Estimation at the regional level allows for regional heterogeneity with these regional estimates aggregated to obtain a posterior mean estimate for the number of England's IDUs of 195840 with 95% credible interval (181700, 210480). There is significant variation in the estimated regional prevalence of current IDUs per million of population aged 15-64 years, and in injecting drug-related death rates across the gender age cross-classifications. The propensity of an IDU to be seen by at least one source also exhibits strong regional variability with London having the lowest propensity of being observed (posterior mean probability 0.21) and the South West the highest propensity (posterior mean 0.46).
1. We present new developments in statistical methodology allowing in-depth analysis of realistic, complex biological models for longitudinal data sets. Important biological details such as mark-loss and recapture heterogeneity can be identified. 2. We use a Bayesian hidden process framework for a comparative analysis of long-term capture-recapture data with various combinations of marking methods for adult female grey seals Halichoerus grypus at two UK colonies. 3. Seals were identified using three different methods: flipper tags, brands, or natural pelage markings. Animals identified by brands or natural markings were re-sighted more effectively than those with tags. 4. Flipper tag-loss rates differed between colonies, and there was evidence for non-independent tag-loss in double-tagged animals. There was also evidence at one colony for the presence of transient animals, which attend the colony for 1 year only. Apparent survival was higher and more consistent at one site, and the differences in survival between the two colonies were able to explain contrasting pup production trends at these sites. 5. Synthesis and applications. Longitudinal studies allow for the estimation of demographic parameters which have important implications for our understanding of population dynamics and for the conservation and management of populations. Using new statistical developments to allow for the analysis of missing/incomplete data and partial observations, we show how survival can be estimated from complex mark-recapture data, allowing for the effects of mark loss. The re-sightability of different marks is estimated, indicating that photo-ID based on natural pelage markings is a very effective method for identifying grey seals.
There are notable contrasts in survival estimates between breeding colonies which can explain contrasts in population trends at these sites, confirming the importance of adult survival in driving population dynamics in this long-lived species.
Journal of Agricultural, Biological, and Environmental Statistics, 2014
The few distance sampling studies that use Bayesian methods typically consider only line transect sampling with a half-normal detection function. We present a Bayesian approach to analyse distance sampling data applicable to line and point transects, exact and interval distance data and any detection function possibly including covariates affecting detection probabilities. We use an integrated likelihood which combines the detection and density models. For the latter, densities are related to covariates in a log-linear mixed effect Poisson model which accommodates correlated counts. We use a Metropolis-Hastings algorithm for updating parameters and a reversible jump algorithm to include model selection for both the detection function and density models. The approach is applied to a large-scale experimental design study of northern bobwhite coveys where the interest was to assess the effect of establishing herbaceous buffers around agricultural fields in several states in the US on bird densities. Results were compared with those from an existing maximum likelihood approach that analyses the detection and density models in two stages. Both methods revealed an increase of covey densities on buffered fields. Our approach gave estimates with higher precision even though it does not condition on a known detection function for the density model.
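For readers unfamiliar with the half-normal detection function mentioned in the abstract above, the following sketch shows its standard form and the conventional line-transect density estimate built on it. It assumes the scale parameter sigma is already known and uses a plug-in calculation, so it is not the integrated Bayesian approach the abstract describes; all function names and the example values are hypothetical.

```python
import math


def half_normal_detection(x: float, sigma: float) -> float:
    """Probability of detecting an object at perpendicular distance x."""
    return math.exp(-x * x / (2.0 * sigma * sigma))


def effective_strip_half_width(sigma: float, w: float) -> float:
    """Integral of the half-normal detection function from 0 to w.

    w is the truncation distance; the closed form uses the error function.
    """
    return sigma * math.sqrt(math.pi / 2.0) * math.erf(w / (sigma * math.sqrt(2.0)))


def line_transect_density(n: int, line_length: float, sigma: float, w: float) -> float:
    """Conventional plug-in density estimate n / (2 * L * mu)."""
    mu = effective_strip_half_width(sigma, w)
    return n / (2.0 * line_length * mu)


# Hypothetical values: 40 detections along 10 km of line, sigma = 0.05 km,
# distances truncated at 0.2 km. Result is in objects per square km.
print(line_transect_density(n=40, line_length=10.0, sigma=0.05, w=0.2))
```

In practice sigma is estimated from the observed distances, and its uncertainty propagates into the density estimate, which is one motivation for fitting the detection and density models jointly.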
Journal of Agricultural, Biological, and Environmental Statistics, 2012
... A Bayesian approach to fitting Gibbs processes with temporal random effects. Ruth King, Janine B. Illian, Stuart E. King and Glenna Evans ...
Papers by Ruth King