2018, Proceedings of the National Academy of Sciences of the United States of America
We describe and demonstrate an empirical strategy useful for discovering and replicating empirical effects in psychological science. The method involves the design of a metastudy, in which many independent experimental variables, each a potential moderator of an empirical effect, are indiscriminately randomized. Radical randomization yields rich datasets that can be used to test the robustness of an empirical claim to some of the vagaries and idiosyncrasies of experimental protocols, and it enhances the generalizability of these claims. The strategy is made feasible by advances in hierarchical Bayesian modeling that allow for the pooling of information across unlike experiments and designs, and it is proposed here as a gold standard for replication research and exploratory research. The practical feasibility of the strategy is demonstrated with a replication of a study on subliminal priming.
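To give a flavor of the pooling step, the sketch below is a toy normal-normal hierarchical Bayes analysis in Python, not the authors' model: many simulated micro-experiments with randomly drawn design factors are combined by integrating over the between-experiment heterogeneity on a grid. All variable names, design factors, and simulation settings are assumptions for illustration.

```python
# Minimal sketch: pool effect estimates from many randomized micro-experiments
# with a normal-normal hierarchical model (grid approximation, flat prior on mu).
import numpy as np

rng = np.random.default_rng(1)

K = 200                                     # number of micro-experiments
# Radical randomization: each micro-experiment gets its own randomly drawn design.
design = {
    "soa_ms":   rng.integers(16, 300, K),   # e.g. prime-target SOA (hypothetical factor)
    "n_trials": rng.integers(20, 200, K),
    "masking":  rng.integers(0, 2, K),
}
true_mu, true_tau = 0.20, 0.10              # assumed grand effect and heterogeneity
theta = rng.normal(true_mu, true_tau, K)    # per-experiment true effects
se = 1.0 / np.sqrt(design["n_trials"])      # crude standard errors
y = rng.normal(theta, se)                   # observed effect estimates

# Model: y_k ~ N(theta_k, se_k^2), theta_k ~ N(mu, tau^2), flat prior on mu.
tau_grid = np.linspace(1e-3, 1.0, 400)
log_post = np.empty_like(tau_grid)
for i, tau in enumerate(tau_grid):
    v = se**2 + tau**2
    w = 1.0 / v
    mu_hat = np.sum(w * y) / np.sum(w)
    # Marginal posterior of tau (mu integrated out analytically), up to a constant.
    log_post[i] = (0.5 * np.log(1.0 / np.sum(w))
                   - 0.5 * np.sum(np.log(v))
                   - 0.5 * np.sum((y - mu_hat) ** 2 / v))
post = np.exp(log_post - log_post.max())
post /= post.sum()

tau_pm = np.sum(post * tau_grid)            # posterior mean heterogeneity
v = se[None, :] ** 2 + tau_grid[:, None] ** 2
mu_hat_grid = np.sum(y / v, axis=1) / np.sum(1.0 / v, axis=1)
mu_pm = np.sum(post * mu_hat_grid)          # posterior mean grand effect
print(f"pooled effect ~ {mu_pm:.3f}, heterogeneity ~ {tau_pm:.3f}")
```

Because every micro-experiment contributes through the same hierarchical structure, information is shared across otherwise unlike designs, which is what makes the radical-randomization strategy tractable.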
Psychonomic Bulletin & Review, 2012
Empirical replication has long been considered the final arbiter of phenomena in science, but replication is undermined when there is evidence for publication bias. Evidence for publication bias in a set of experiments can be found when the observed number of rejections of the null hypothesis exceeds the expected number of rejections. Application of this test reveals evidence of publication bias in two prominent investigations from experimental psychology that have purported to reveal evidence of extrasensory perception and to indicate severe limitations of the scientific method. The presence of publication bias suggests that those investigations cannot be taken as proper scientific studies of such phenomena, because critical data are not available to the field. Publication bias could partly be avoided if experimental psychologists started using Bayesian data analysis techniques.
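The core of the test is a comparison between the observed and expected number of significant results. Below is a rough sketch of such an excess-significance check; the group sizes and pooled effect size are made up for illustration, and the power calculation is a standard noncentral-t computation rather than the article's exact procedure.

```python
# Sketch of an excess-significance check: given each study's group sizes and a
# pooled standardized effect size, estimate per-study power and ask how likely
# it is that *all* reported studies would have rejected H0.
import numpy as np
from scipy import stats

alpha = 0.05
pooled_d = 0.30                      # assumed pooled effect size (Cohen's d)
group_sizes = [(20, 20), (25, 25), (30, 30), (18, 22), (40, 40)]  # hypothetical

def two_sample_power(d, n1, n2, alpha=0.05):
    """Power of a two-sided two-sample t test via the noncentral t distribution."""
    df = n1 + n2 - 2
    ncp = d * np.sqrt(n1 * n2 / (n1 + n2))
    crit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(crit, df, ncp)) + stats.nct.cdf(-crit, df, ncp)

powers = np.array([two_sample_power(pooled_d, n1, n2, alpha) for n1, n2 in group_sizes])
expected_rejections = powers.sum()
observed_rejections = len(group_sizes)          # every published study was significant
p_all_significant = powers.prod()               # chance that all studies reject H0

print(f"expected significant results: {expected_rejections:.2f} of {observed_rejections}")
print(f"probability all {observed_rejections} are significant: {p_all_significant:.3f}")
# A very small probability suggests the published record is "too clean", i.e. biased.
```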
Statistical methods in medical research, 2017
Consider a statistical analysis that draws causal inferences from an observational dataset, inferences that are presented as being valid in the standard frequentist senses; i.e., the analysis produces: (1) consistent point estimates, (2) valid p-values, valid in the sense of rejecting true null hypotheses at the nominal level or less often, and/or (3) confidence intervals, which are presented as having at least their nominal coverage for their estimands. For the hypothetical validity of these statements, the analysis must embed the observational study in a hypothetical randomized experiment that created the observed data, or a subset of that hypothetical randomized data set. This multistage effort with thought-provoking tasks involves: (1) a purely conceptual stage that precisely formulates the causal question in terms of a hypothetical randomized experiment where the exposure is assigned to units; (2) a design stage that approximates a randomized experiment before any outcome data ar...
We describe a method of quantifying the effect of Questionable Research Practices (QRPs) on the results of meta-analyses. As an example we simulated a meta-analysis of a controversial telepathy protocol to assess the extent to which these experimental results could be explained by QRPs. Our simulations used the same numbers of studies and trials as the original meta-analysis, and the frequencies with which various QRPs were applied in the simulated experiments were based on surveys of experimental psychologists. Results of both the meta-analysis and the simulations were characterized by four metrics: two describing the trial and mean experiment hit rates (HR) of around 31%, where 25% is expected by chance, one the correlation between sample size and hit rate, and one the complete p-value distribution of the database. A genetic algorithm optimized the parameters describing the QRPs, and the fitness of the simulated meta-analysis was defined as the sum of the squares of Z-scores for the four metrics. Assuming no anomalous effect, a good fit to the empirical meta-analysis was found only by using QRPs with unrealistic parameter values. Restricting the parameter space to ranges observed in studies of QRP occurrence, under the untested assumption that parapsychologists use comparable QRPs, the fit to the published Ganzfeld meta-analysis with no anomalous effect was poor. We allowed for a real anomalous effect, be it unidentified QRPs or a paranormal effect, with an HR ranging from 25% (chance) to 31%. With an anomalous HR of 27% the fitness became F = 1.8 (p = 0.47, where F = 0 is a perfect fit). We conclude that the very significant probability cited by the Ganzfeld meta-analysis is likely inflated by QRPs, though the results remain significant (p = 0.003) with QRPs. Our study demonstrates that quantitative simulations of QRPs can assess their impact. Since meta-analyses in general might be polluted by QRPs, this method has wide applicability outside the domain of experimental parapsychology.
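The sketch below illustrates the simulation-and-scoring idea on a single QRP (optional stopping) applied to ganzfeld-style binomial data, with a crude grid search standing in for the paper's genetic algorithm and a single simplified squared-z fitness in place of the four metrics. Every parameter value is invented for illustration.

```python
# Sketch: simulate ganzfeld-style studies (4 choices, chance HR = 25%) under one
# QRP (optional stopping), then score how well each QRP setting reproduces a
# target hit rate with a squared-z "fitness" (lower is better).
import numpy as np

rng = np.random.default_rng(7)
N_STUDIES, MAX_TRIALS, MIN_TRIALS = 100, 50, 20
CHANCE, TARGET_HR = 0.25, 0.31          # chance level and the meta-analytic hit rate

def simulate_database(p_stop_early):
    """Return the overall hit rate across simulated studies, with optional stopping."""
    hits = trials = 0
    for _ in range(N_STUDIES):
        outcomes = rng.random(MAX_TRIALS) < CHANCE
        n = MAX_TRIALS
        if rng.random() < p_stop_early:
            # Stop at the interim point (>= MIN_TRIALS) where the running hit rate peaks.
            running = np.cumsum(outcomes) / np.arange(1, MAX_TRIALS + 1)
            n = int(np.argmax(running[MIN_TRIALS - 1:]) + MIN_TRIALS)
        hits += outcomes[:n].sum()
        trials += n
    return hits / trials

def fitness(p_stop_early, reps=20):
    """Squared z-score of the simulated hit rate against the target hit rate."""
    hrs = np.array([simulate_database(p_stop_early) for _ in range(reps)])
    z = (hrs.mean() - TARGET_HR) / (hrs.std(ddof=1) / np.sqrt(reps))
    return z**2

for p in (0.0, 0.25, 0.5, 1.0):
    print(f"P(optional stopping) = {p:.2f}  ->  fitness = {fitness(p):8.1f}")
# QRP settings whose simulated databases match the target hit rate get low fitness;
# the paper's genetic algorithm searches this space over many QRPs at once.
```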
2014
While direct replications such as the "Many Labs" project are extremely valuable in testing the reliability of published findings across laboratories, they reflect the common reliance in psychology on single vignettes or stimuli, which limits the scope of the conclusions that can be reached. New experimental tools and statistical techniques make it easier to routinely sample stimuli and to appropriately treat them as random factors. We encourage researchers to get into the habit of including multiple versions of the content (e.g., stimuli or vignettes) in their designs, to increase confidence in cross-stimulus generalization and to yield more realistic estimates of effect size. We call on editors to be aware of the challenges inherent in such stimulus sampling, to expect and tolerate unexplained variability in observed effect size between stimuli, and to encourage stimulus sampling instead of the deceptively cleaner picture offered by the current reliance on single stimuli.
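One common way to treat stimuli as random factors is a crossed random-effects model. The sketch below shows one way to do this in Python with statsmodels (the abstract does not prescribe a particular tool); the data are simulated and all column names and effect sizes are arbitrary.

```python
# Sketch: treat both participants and stimuli as random factors using a crossed
# random-effects model in statsmodels (single dummy group + variance components).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_subj, n_stim = 30, 12
subj_re = rng.normal(0, 0.5, n_subj)        # by-participant random intercepts
stim_re = rng.normal(0, 0.5, n_stim)        # by-stimulus random intercepts

rows = []
for s in range(n_subj):
    for k in range(n_stim):
        for cond in (0, 1):                 # within-subject manipulation
            y = 1.0 + 0.3 * cond + subj_re[s] + stim_re[k] + rng.normal(0, 1.0)
            rows.append({"y": y, "cond": cond, "subject": s, "stimulus": k})
df = pd.DataFrame(rows)
df["group"] = 1                             # one group, so both factors enter as crossed VCs

model = smf.mixedlm(
    "y ~ cond", df, groups="group",
    vc_formula={"subject": "0 + C(subject)", "stimulus": "0 + C(stimulus)"},
)
fit = model.fit()
print(fit.summary())   # fixed effect of cond plus both variance components
```

With only one stimulus, the by-stimulus variance component is not estimable at all, which is exactly the generalization problem the authors describe.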
Perspectives on Psychological Science, 2011
Statistical inference in psychology has traditionally relied heavily on p-value significance testing. This approach to drawing conclusions from data, however, has been widely criticized, and two types of remedies have been advocated. The first proposal is to supplement p values with complementary measures of evidence, such as effect sizes. The second is to replace inference with Bayesian measures of evidence, such as the Bayes factor. The authors provide a practical comparison of p values, effect sizes, and default Bayes factors as measures of statistical evidence, using 855 recently published t tests in psychology. The comparison yields two main results. First, although p values and default Bayes factors almost always agree about what hypothesis is better supported by the data, the measures often disagree about the strength of this support; for 70% of the data sets for which the p value falls between .01 and .05, the default Bayes factor indicates that the evidence is only anecdota...
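The sketch below compares a two-sided p-value with one common default Bayes factor for a two-sample t test, the JZS Bayes factor of Rouder et al. (2009), computed by numerical integration with a Cauchy prior scale of sqrt(2)/2. The t value and group sizes are made up; this is an illustration of the comparison, not a reproduction of the article's 855-test analysis.

```python
# Sketch: p-value versus default (JZS) Bayes factor for a two-sample t test.
import numpy as np
from scipy import stats
from scipy.integrate import quad

def jzs_bf10(t, n1, n2, r=np.sqrt(2) / 2):
    """Default Bayes factor BF10 for a two-sample t test (JZS / Cauchy prior)."""
    df = n1 + n2 - 2
    n_eff = n1 * n2 / (n1 + n2)

    def integrand(g):
        return ((1 + n_eff * g * r**2) ** -0.5
                * (1 + t**2 / ((1 + n_eff * g * r**2) * df)) ** (-(df + 1) / 2)
                * (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1 / (2 * g)))

    marginal_h1 = quad(integrand, 0, np.inf)[0]
    marginal_h0 = (1 + t**2 / df) ** (-(df + 1) / 2)
    return marginal_h1 / marginal_h0

t, n1, n2 = 2.2, 25, 25                        # hypothetical "just significant" result
p = 2 * stats.t.sf(abs(t), n1 + n2 - 2)
bf10 = jzs_bf10(t, n1, n2)
print(f"p = {p:.3f}, BF10 = {bf10:.2f}")
# A p-value just under .05 often corresponds to BF10 < 3, i.e. only anecdotal evidence.
```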
Advances in Experimental Philosophy of Medicine, 2023
We simulate trial data to test speculative claims about research methods, such as the impact of publication bias.
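As a minimal example of this kind of simulation, the sketch below generates many small trials of a modest true effect, "publishes" only the significant ones, and compares the naive average of published effects with the truth. The sample sizes and effect size are arbitrary choices for illustration.

```python
# Sketch: simulate publication bias and measure how much it inflates the
# apparent effect in the published record.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
true_d, n_per_group, n_trials = 0.2, 30, 2000

published = []
for _ in range(n_trials):
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(true_d, 1.0, n_per_group)
    t, p = stats.ttest_ind(b, a)
    d_hat = (b.mean() - a.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    if p < 0.05:                      # file drawer: non-significant trials vanish
        published.append(d_hat)

print(f"true effect d = {true_d}")
print(f"mean published effect = {np.mean(published):.2f} "
      f"({len(published)}/{n_trials} trials published)")
# The published record overestimates the effect because only "lucky" trials survive.
```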
Journal of the American Statistical Association, 2016
Investigators from a large consortium of scientists recently performed a multi-year study in which they replicated 100 psychology experiments. Although statistically significant results were reported in 97% of the original studies, statistical significance was achieved in only 36% of the replicated studies. This article presents a reanalysis of these data based on a formal statistical model that accounts for publication bias by treating outcomes from unpublished studies as missing data, while simultaneously estimating the distribution of effect sizes for those studies that tested nonnull effects. The resulting model suggests that more than 90% of tests performed in eligible psychology experiments tested negligible effects, and that publication biases based on p-values caused the observed rates of nonreproducibility. The results of this reanalysis provide a compelling argument for both increasing the threshold required for declaring scientific discoveries and for adopting statistical summaries of evidence that account for the high proportion of tested hypotheses that are false. Supplementary materials for this article are available online.
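The sketch below is a toy simulation of the mechanism described in this abstract, not the article's missing-data model: if most tested effects are negligible and only significant results are published, replications of those published results succeed far less often than the original significance rate suggests. The proportion of null effects, the non-null effect size, and the assumption that replications reuse the original sample size are all simplifications chosen to echo the abstract.

```python
# Sketch: mixture of null and non-null effects + selection on significance,
# then replication of the "published" (significant) results at the same n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_tests, prop_null, n_per_group, true_d = 20000, 0.9, 30, 0.6

is_null = rng.random(n_tests) < prop_null
d = np.where(is_null, 0.0, true_d)

def run_study(d_true):
    """One two-sample study; returns True if p < .05."""
    a = rng.normal(0, 1, n_per_group)
    b = rng.normal(d_true, 1, n_per_group)
    return stats.ttest_ind(b, a)[1] < 0.05

original_sig = np.array([run_study(di) for di in d])
published = d[original_sig]                     # only significant originals get published
replication_sig = np.array([run_study(di) for di in published])

print(f"originals that were significant (and published): {original_sig.mean():.0%}")
print(f"published results that replicate:                {replication_sig.mean():.0%}")
# With ~90% negligible effects, the replication rate lands in the rough ballpark of
# the 36% observed empirically, even though 100% of the published originals were significant.
```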
Evaluation Practice, 1987
Adjusting for publication bias is essential when drawing meta-analytic inferences. However, most methods that adjust for publication bias are sensitive to the particular research conditions, such as the degree of heterogeneity in effect sizes across studies. Sladekova et al. (2022) tried to circumvent this complication by selecting the methods that are most appropriate for a given set of conditions, and concluded that publication bias on average causes only minimal over-estimation of effect sizes in psychology. However, this approach suffers from a "catch-22" problem: to know the underlying research conditions, one needs to have adjusted for publication bias correctly, but to correctly adjust for publication bias, one needs to know the underlying research conditions. To alleviate this problem we conduct an alternative analysis, Robust Bayesian meta-analysis (RoBMA), which is not based on model-selection but on model-averaging. In RoBMA, models that predict the observed results better are give...
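The sketch below illustrates the model-averaging idea behind this approach in general terms; it is not the RoBMA package (which is an R library) and does not compute real marginal likelihoods. The candidate models, their marginal likelihoods, and their effect estimates are invented placeholders used only to show how posterior model probabilities weight the estimates instead of selecting a single "best" model.

```python
# Sketch: Bayesian model averaging over candidate meta-analytic models
# (effect vs. no effect, with vs. without publication bias).
import numpy as np

models = ["null", "effect", "null + pub. bias", "effect + pub. bias"]
log_marginal_lik = np.array([-102.0, -100.5, -99.8, -99.6])    # hypothetical values
effect_estimate = np.array([0.00, 0.35, 0.00, 0.12])           # per-model posterior means
prior_prob = np.full(4, 0.25)                                  # equal prior model odds

log_post = np.log(prior_prob) + log_marginal_lik
post_prob = np.exp(log_post - log_post.max())
post_prob /= post_prob.sum()

averaged_effect = np.sum(post_prob * effect_estimate)
for m, w, e in zip(models, post_prob, effect_estimate):
    print(f"{m:<22s} posterior prob = {w:.2f}  effect = {e:.2f}")
print(f"model-averaged effect = {averaged_effect:.2f}")
# Models that predict the data better receive more weight, but none is discarded outright.
```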