This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the author's institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier's archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright
Very often the data collected by social scientists involve dependent observations, without, however, the investigators having any substantive interest in the nature of the dependencies. Although these dependencies are not important for the answers to the research questions concerned, they must still be taken into account in the analysis. Standard statistical estimation and testing procedures assume independent and identically distributed observations, and need to be modified for observations that are clustered in some way. Marginal models provide the tools to deal with these dependencies without having to make restrictive assumptions about their nature. In this paper, recent developments in the (maximum likelihood) estimation and testing of marginal models for categorical data will be explained, including marginal models with latent variables. The differences and commonalities with other ways of dealing with these nuisance dependencies will be discussed, especially with GEE and also briefly with (hierarchical) random coefficient models. The usefulness of marginal modeling will be illuminated by showing several common types of research questions and designs for which marginal models may provide the answers, along with two extensive real world examples. Finally, a brief evaluation will be given, covering shortcomings and strong points, computer programs, and future work to be done.
We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.
We describe a previously unnoted problem which, if it occurs, causes the empirical likelihood method to break down. It is related to the empty set problem, recently described in detail by Grendár and Judge (2009), which is the problem that the empirical likelihood model is empty, so that maximum empirical likelihood estimates do not exist. An example is the model that the mean is zero, while all observations are positive. A related problem, which appears to have gone unnoted so far, is what we call the zero likelihood problem. This occurs when the empirical likelihood model is nonempty but all its elements have zero empirical likelihood. Hence, also in this case inference regarding the model under investigation breaks down. An example is the model that the covariance is zero, and the sample consists of monotonically associated observations. In this paper, we define the problem generally and give examples. Although the problem can occur in many situations, we found it to be especially prevalent in marginal modeling of categorical data, when the problem often occurs with probability close to one for large, sparse contingency tables.
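The distinction between the two failure modes can be illustrated in the simplest setting, the scalar model "the mean is zero". The sketch below is an illustration added here, not code from the paper: the function name `classify_mean_zero_el` and its labels are hypothetical. It rests on the fact that weights p_i >= 0 with sum p_i = 1 and sum p_i x_i = 0 exist only if zero lies between the smallest and largest observation, and that all weights can be strictly positive only if zero lies strictly inside that range (or every observation equals zero).

```python
def classify_mean_zero_el(xs):
    """Classify the empirical likelihood model 'mean = 0' for a scalar sample.

    Returns one of three (hypothetical) labels:
      "empty"           - no weight vector satisfies the constraint
      "zero-likelihood" - feasible weights exist, but every one of them
                          puts weight 0 on some observation, so the
                          empirical likelihood prod(p_i) is 0
      "regular"         - strictly positive feasible weights exist
    """
    lo, hi = min(xs), max(xs)
    if lo > 0 or hi < 0:
        # Zero lies outside the convex hull of the data.
        return "empty"
    if lo < 0 < hi or lo == hi == 0:
        # Zero is strictly inside the range, or all observations are zero.
        return "regular"
    # Zero sits exactly on the boundary of the range: only the observations
    # equal to zero can carry weight, so the likelihood is zero.
    return "zero-likelihood"


if __name__ == "__main__":
    for sample in ([1.0, 2.0, 3.0], [0.0, 1.0, 2.0], [-1.0, 2.0]):
        print(sample, classify_mean_zero_el(sample))
```

The first sample reproduces the abstract's empty set example (all observations positive). The second is a boundary case chosen for illustration; the abstract's own zero likelihood example, zero covariance with monotonically associated observations, is not implemented here.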
Statistical models defined by imposing restrictions on marginal distributions of contingency tables have received considerable attention recently. This paper introduces a general definition of marginal log-linear parameters and describes conditions for a marginal log-linear parameter to be a smooth parameterization of the distribution, and to be variation independent. Statistical models defined by imposing affine restrictions on the marginal log-linear parameters are investigated. These models generalize ordinary log-linear and multivariate logistic models. Sufficient conditions for a log-affine marginal model to be nonempty, and to be a curved exponential family are given. Standard large sample theory is shown to apply to maximum likelihood estimation of log-affine marginal models for a variety of sampling procedures.
The standard two-variable chi-square test is typically consistent for all alternatives to independence, but effectively treats the data as nominal, which may lead to loss of power for ordinal data. Alternatively, a test based on Kendall's tau does take ordinality into account, but only has power against a narrow set of alternatives. This paper introduces a new test aimed at filling this gap, i.e., it is designed for ordinal data and to have omnibus asymptotic power. Our test is a permutation test based on a modification of Kendall's tau, denoted $\ ...
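As background for the comparison above, a permutation test based on the ordinary Kendall's tau can be sketched as follows. This is a minimal illustration using the unmodified tau-a statistic, not the modified statistic the paper proposes (its definition is cut off in the abstract). The function names, the number of permutations, and the seed are assumptions made for the example.

```python
import random
from itertools import combinations


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    s = sum(
        sign(x[i] - x[j]) * sign(y[i] - y[j])
        for i, j in combinations(range(n), 2)
    )
    return s / (n * (n - 1) / 2)


def tau_permutation_test(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for independence, based on |tau-a|.

    y is shuffled n_perm times; the p-value is the share of permutations
    (counting the observed arrangement itself) whose |tau-a| is at least
    as large as the observed value.
    """
    rng = random.Random(seed)
    observed = abs(kendall_tau_a(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(kendall_tau_a(x, y_perm)) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)


if __name__ == "__main__":
    x = list(range(10))
    print(tau_permutation_test(x, x))  # monotone relation: small p-value
    # Symmetric U-shaped relation: tau-a is exactly 0, so this statistic
    # cannot detect the dependence.
    print(kendall_tau_a([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```

The U-shaped case shows the limitation noted in the abstract: the dependence is perfect, yet concordant and discordant pairs cancel, so a test built on plain tau has no power against it.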
Most genome-wide association studies (GWASs) use randomly selected samples from the population (hereafter bases) as the control set. This approach is successful when the trait of interest is rare; otherwise, a loss in the statistical power to detect disease-associated variants is expected. To address this, a proposal to combine the three sample types (cases, controls and bases) is introduced, for instances when the disease under study is prevalent. This is done by modelling the bases as a mixture of multinomial logistic functions of cases and controls, according to the disease prevalence. The maximum likelihood method is used to estimate the underlying parameters using the EM algorithm. Three classical tests of association (score, Wald, and likelihood ratio tests) are derived, and their power of detecting genetic associations under different designs is compared. Simulations show that combining the three samples can increase the power to detect disease-associated variants, thou...
Papers by Wicher Bergsma