Psychometrika, 2021
The emergence of computer-based assessments has made response times, in addition to response accuracies, available as a source of information about test takers' latent abilities. The development of substantively meaningful accounts of the cognitive process underlying item responses is critical to establishing the validity of psychometric tests. However, existing substantive theories such as the diffusion model have been slow to gain traction due to their unwieldy functional form and regular violations of model assumptions in psychometric contexts. In the present work, we develop an attention-based diffusion model based on process assumptions that are appropriate for psychometric applications. This model is straightforward to analyse using Gibbs sampling and can be readily extended. We demonstrate our model's good computational and statistical properties in a comparison with two well-established psychometric models.
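For readers unfamiliar with the diffusion framework this abstract builds on, the sketch below simulates a standard two-boundary Wiener diffusion process, not the paper's attention-based variant; the drift rate, boundary separation, and non-decision time are illustrative assumptions.

```python
import numpy as np

def simulate_diffusion(drift, boundary, ndt, dt=0.001, noise_sd=1.0, rng=None):
    """Simulate one trial of a standard two-boundary Wiener diffusion process.

    The accumulator starts midway between the boundaries (unbiased start) and
    drifts with rate `drift` plus Gaussian noise until it crosses 0 (error)
    or `boundary` (correct). `ndt` is the non-decision time added to the RT.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, t = boundary / 2.0, 0.0
    while 0.0 < x < boundary:
        x += drift * dt + noise_sd * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return int(x >= boundary), ndt + t  # (accuracy, response time)

rng = np.random.default_rng(1)
trials = [simulate_diffusion(drift=1.0, boundary=1.5, ndt=0.3, rng=rng) for _ in range(2000)]
acc = np.mean([a for a, _ in trials])
mean_rt = np.mean([t for _, t in trials])
print(f"P(correct) ~ {acc:.2f}, mean RT ~ {mean_rt:.2f}s")
```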
British Journal of Mathematical and Statistical Psychology, 2010
Marginal maximum-likelihood procedures for parameter estimation and testing the fit of a hierarchical model for speed and accuracy on test items are presented. The model is a composition of two first-level models for dichotomous responses and response times along with a second-level multivariate normal model for their item and person parameters, which represents the speed-accuracy distribution in the population of respondents. It is shown how the item parameters can easily be estimated using Fisher's identity. To test the fit of the model, Lagrange multiplier tests were derived for the assumptions of subpopulation invariance of the item parameters (i.e., no differential item functioning), the shape of the response functions, and three different types of conditional independence. Simulation studies were used to show the feasibility of the estimation and testing procedures and to estimate the power and Type I error rate of the latter. In addition, the procedures were applied to an empirical data set from a computerized adaptive test of language comprehension.
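A minimal data-generating sketch of a hierarchical speed-accuracy model of this kind, assuming a two-parameter logistic model for accuracy, a lognormal model for response times, and a bivariate normal population model for the person parameters; the parameter values and parameterization are illustrative, not the authors' exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 500, 20

# Second level: person parameters (ability theta, speed tau) drawn from
# a bivariate normal population distribution.
person_cov = np.array([[1.0, 0.4], [0.4, 1.0]])
theta, tau = rng.multivariate_normal([0.0, 0.0], person_cov, size=n_persons).T

# Item parameters: discrimination a, difficulty b (accuracy model);
# time intensity beta, time discrimination alpha (response-time model).
a = rng.lognormal(0.0, 0.3, n_items)
b = rng.normal(0.0, 1.0, n_items)
beta = rng.normal(0.0, 0.5, n_items)
alpha = rng.lognormal(0.5, 0.2, n_items)

# First level: 2PL for responses, lognormal for response times.
p_correct = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
responses = rng.binomial(1, p_correct)
log_rt = rng.normal(beta - tau[:, None], 1.0 / alpha)
response_times = np.exp(log_rt)

print(responses.shape, response_times.shape)  # (500, 20) each
```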
Large-scale Assessments in Education
Understanding the cognitive processes, skills and strategies that examinees use in testing is important for construct validity and score interpretability. Although response processes evidence has long been included as an important aspect of validity (i.e., Standards for Educational and Psychological Testing, 1999), relevant studies are often lacking, especially in large-scale educational and psychological testing. An important method for studying response processes involves explanatory mathematical modeling of item responses and item response times from variables that represent sources of cognitive complexity. For many item types, examinees may differ in the strategies they apply when responding to items. Mixture class item response theory models can identify latent classes of examinees with different processes, skills and strategies based on their pattern of item responses. This study will illustrate the use of response times in conjunction with explanatory item response theory models and mixture class models.
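As a concrete pointer to what a mixture class model looks like, here is a hedged sketch of the marginal likelihood of a single response pattern under a mixture Rasch model, one common member of this family; the class-specific difficulties and class probabilities are hypothetical, and this is not the specific model used in the study.

```python
import numpy as np

def mixture_rasch_likelihood(responses, theta, item_difficulties, class_probs):
    """Marginal likelihood of one examinee's response pattern under a
    mixture Rasch model: item difficulties differ by latent class, and
    class membership is marginalized out with weights `class_probs`."""
    total = 0.0
    for pi_g, b_g in zip(class_probs, item_difficulties):
        p = 1.0 / (1.0 + np.exp(-(theta - np.asarray(b_g))))
        lik = np.prod(np.where(np.asarray(responses) == 1, p, 1.0 - p))
        total += pi_g * lik
    return total

# Two hypothetical classes with different difficulty profiles on 4 items.
b_by_class = [[-1.0, 0.0, 0.5, 1.0], [1.0, 0.5, 0.0, -1.0]]
print(mixture_rasch_likelihood([1, 1, 0, 0], theta=0.2,
                               item_difficulties=b_by_class,
                               class_probs=[0.6, 0.4]))
```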
1982
Methodology Project of the Center for the Study of Evaluation. Methods for characterizing test accuracy are reported in the first two papers. "Bounds on the K Out of N Reliability of a Test, and an Exact Test for Hierarchically Related Items" describes and illustrates how an extension of a latent structure model can be used in conjunction with results in Sathe (1980) to estimate the upper and lower bounds of the probability of making at least k correct decisions. "An Approximation of the K Out of N Reliability of a Test, and a Scoring Procedure for Determining Which Items an Examinee Knows" proposes a probability approximation that can be estimated with an answer-until-correct (AUC) test. "How Do Examinees Behave When Taking Multiple Choice Tests?" deals with empirical studies of AUC assumptions. Over two hundred examinees were asked to record the order in which they chose their responses. Findings indicate that Horst's (1933) assumption that examinees eliminate as many distractors as possible and guess at random from among those that remain appears to be a tolerable approximation of reality in most cases. (Author/PN)
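A small worked illustration of Horst's (1933) assumption referenced above: an examinee eliminates the distractors they can rule out and guesses at random among the rest. The option counts below are hypothetical.

```python
from fractions import Fraction

def horst_first_try(n_options, n_eliminated):
    """P(correct on first attempt) when an examinee rules out
    `n_eliminated` distractors and guesses uniformly among the rest."""
    remaining = n_options - n_eliminated
    return Fraction(1, remaining)

def expected_attempts(n_options, n_eliminated):
    """Expected number of attempts on an answer-until-correct item,
    guessing without replacement among the non-eliminated options
    (the key is equally likely to sit in any remaining position)."""
    remaining = n_options - n_eliminated
    return Fraction(remaining + 1, 2)

# Eliminate 0..3 distractors on a 4-option item.
for k in range(4):
    print(k, horst_first_try(4, k), expected_attempts(4, k))
```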
2020
The goal of item response theoretic (IRT) models is to provide estimates of latent traits from binary observed indicators and at the same time to learn the item response functions (IRFs) that map from latent trait to observed response. However, in many cases observed behavior can deviate significantly from the parametric assumptions of traditional IRT models. Nonparametric IRT models overcome these challenges by relaxing assumptions about the form of the IRFs, but standard tools are unable to simultaneously estimate flexible IRFs and recover ability estimates for respondents. We propose a Bayesian nonparametric model that solves this problem by placing Gaussian process priors on the latent functions defining the IRFs. This allows us to simultaneously relax assumptions about the shape of the IRFs while preserving the ability to estimate latent traits. This in turn allows us to easily extend the model to further tasks such as active learning. GPIRT therefore provides a simple and intuitive approach to flexible, nonparametric item response modeling.
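A minimal sketch of the core idea, assuming a squared-exponential kernel and a logistic link (the paper's exact kernel and link choices may differ): draw a latent function over the trait scale from a Gaussian process prior and map it to a probability, yielding a flexible, nonparametric item response function.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between trait grids x and y."""
    d = x[:, None] - y[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(42)
theta_grid = np.linspace(-3, 3, 100)  # latent trait values

# Draw a latent item response function f(theta) from a GP prior and map
# it through a logistic link to get P(correct | theta).
K = rbf_kernel(theta_grid, theta_grid) + 1e-8 * np.eye(len(theta_grid))
f = rng.multivariate_normal(np.zeros(len(theta_grid)), K)
irf = 1.0 / (1.0 + np.exp(-f))

print(irf[:5])  # a flexible, possibly non-monotone IRF over the trait grid
```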
Multivariate Behavioral Research, 2016
Current approaches to model responses and response times to psychometric tests solely focus on between-subject differences in speed and ability. Within subjects, speed and ability are assumed to be constants. Violations of this assumption are generally absorbed in the residual of the model. As a result, within-subject departures from the between-subject speed and ability level remain undetected. These departures may be of interest to the researcher as they reflect differences in the response processes adopted on the items of a test. In this article, we propose a dynamic approach for responses and response times based on hidden Markov modeling to account for within-subject differences in responses and response times. A simulation study is conducted to demonstrate acceptable parameter recovery and acceptable performance of various fit indices in distinguishing between different models. In addition, both a confirmatory and an exploratory application are presented to demonstrate the practical value of the modeling approach.
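To make the modeling idea concrete, here is a hedged data-generating sketch for one examinee under a two-state hidden Markov process in which the hidden state governs both accuracy and the response-time distribution; the states, transition matrix, and distributional choices are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n_items = 40

# Hypothetical two-state hidden Markov process over items:
# state 0 = solution behavior, state 1 = fast guessing.
transition = np.array([[0.92, 0.08],
                       [0.30, 0.70]])
p_correct = np.array([0.80, 0.25])    # accuracy per state
log_rt_mean = np.array([2.0, 0.5])    # mean log response time per state
log_rt_sd = np.array([0.4, 0.3])

state = 0
states, responses, rts = [], [], []
for _ in range(n_items):
    states.append(state)
    responses.append(rng.binomial(1, p_correct[state]))
    rts.append(np.exp(rng.normal(log_rt_mean[state], log_rt_sd[state])))
    state = rng.choice(2, p=transition[state])  # within-subject state switch

print(sum(states), np.mean(responses), np.mean(rts))
```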
This article analyzes latent variable models from a cognitive psychology perspective. We start by discussing work by Tuerlinckx and De Boeck (2005), who proved that a diffusion model for 2-choice response processes entails a 2-parameter logistic item response theory (IRT) model for individual differences in the response data. Following this line of reasoning, we discuss the appropriateness of IRT for measuring abilities and bipolar traits, such as pro versus contra attitudes. Surprisingly, if a diffusion model underlies the response processes, IRT models are appropriate for bipolar traits but not for ability tests. A reconsideration of the concept of ability that is appropriate for such situations leads to a new item response model for accuracy and speed based on the idea that ability has a natural zero point. The model implies fundamentally new ways to think about guessing, response speed, and person fit in IRT. We discuss the relation between this model and existing models as well as implications for psychology and psychometrics.
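A quick numerical check of the equivalence discussed in this abstract, under the standard assumptions of an unbiased starting point and unit diffusion variance: the probability that the diffusion hits the correct boundary coincides with a two-parameter logistic function in which boundary separation plays the role of discrimination and drift rate the role of the person-by-item effect.

```python
import numpy as np

def p_correct_diffusion(drift, boundary, start=None, noise_var=1.0):
    """Closed-form probability that a Wiener diffusion with the given drift
    is absorbed at the upper boundary (a 'correct' response)."""
    z = boundary / 2.0 if start is None else start  # unbiased start by default
    num = 1.0 - np.exp(-2.0 * drift * z / noise_var)
    den = 1.0 - np.exp(-2.0 * drift * boundary / noise_var)
    return num / den

def p_correct_2pl(drift, boundary):
    """Two-parameter logistic form implied by the unbiased diffusion model:
    boundary separation acts as discrimination, drift as the trait effect."""
    return 1.0 / (1.0 + np.exp(-boundary * drift))

for v, a in [(0.5, 1.0), (1.0, 1.5), (-0.8, 2.0)]:
    print(v, a, p_correct_diffusion(v, a), p_correct_2pl(v, a))  # pairs agree
```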
SSRN Electronic Journal, 2000
The Item Characteristic Curve describes the relationship between the probability of correctly answering a question and ability. Ability is a latent variable. Therefore one has to impose distributional assumptions on ability in order to estimate the relationship. In this paper we overcome the need to impose an assumption on the distribution of abilities by using the properties of concentration curves and the Gini Mean Difference. As a result we are able to investigate whether the probability of correctly answering a question is monotonically related to a specific ability. The paper demonstrates the properties of the technique.
National Research Council, 2000
In recent years, as cognitive theories of learning and instruction have become richer, and computational methods to support assessment have become more powerful, there has been increasing pressure to make assessments truly criterion referenced, that is, to "report" on student achievement relative to theory-driven lists of examinee skills, beliefs and other cognitive features needed to perform tasks in a particular assessment domain. Cognitive assessment models must generally deal with a more complex goal than linearly ordering examinees, or partially ordering them in a low-dimensional Euclidean space, which is what item response theory (IRT) has been designed and optimized to do. In this paper we consider some usability and interpretability issues for single-strategy cognitive assessment models that posit a stochastic conjunctive relationship between a set of cognitive attributes to be assessed, and performance on particular items or tasks in the assessment. The attributes are coded as present or absent in each examinee, and the tasks are coded as performed correctly or incorrectly. The models we consider make few assumptions about the relationship between latent attributes and task performance beyond a simple conjunctive structure: all attributes relevant to task performance must be present to maximize probability of correct performance of the task. We show by example that these models can be sensitive to cognitive attributes even in data that was designed to be well-fit by the Rasch model, and we consider several stochastic ordering and monotonicity properties that enhance the interpretability of the models. We also identify some simple data summaries that are informative about the presence or absence of cognitive attributes, when the full computational power needed to estimate the models is not available.
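One well-known instance of the stochastic conjunctive structure described here is the DINA (deterministic input, noisy "and") model; the sketch below shows its item response probability, with the attribute pattern, Q-matrix row, slip, and guess values chosen only for illustration.

```python
import numpy as np

def conjunctive_p_correct(alpha, q_row, slip, guess):
    """Probability of a correct response under a DINA-style conjunctive model.

    alpha : 0/1 vector of attributes possessed by the examinee
    q_row : 0/1 vector of attributes the item requires
    An examinee mastering every required attribute answers correctly with
    probability 1 - slip; otherwise with probability guess.
    """
    eta = int(np.all(alpha >= q_row))  # conjunctive mastery indicator
    return (1.0 - slip) if eta else guess

alpha = np.array([1, 0, 1])   # examinee masters attributes 1 and 3
q_row = np.array([1, 0, 1])   # item requires attributes 1 and 3
print(conjunctive_p_correct(alpha, q_row, slip=0.1, guess=0.2))  # 0.9
```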
Journal of Educational Measurement, 2006
The validity of inferences based on achievement test scores is dependent on the amount of effort that examinees put forth while taking the test. With low-stakes tests, for which this problem is particularly prevalent, there is a consequent need for psychometric models that can take into account differing levels of examinee effort. This article introduces the effort-moderated IRT model, which incorporates item response time into proficiency estimation and item parameter estimation. In two studies of the effort-moderated model when rapid guessing (i.e., reflecting low examinee effort) was present, one based on real data and the other on simulated data, the effort-moderated model performed better than the standard 3PL model. Specifically, it was found that the effort-moderated model (a) showed better model fit, (b) yielded more accurate item parameter estimates, (c) more accurately estimated test information, and (d) yielded proficiency estimates with higher convergent validity.
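A minimal sketch of an effort-moderated response probability of the kind described here, assuming a simple response-time threshold to flag rapid guessing and a chance-level success probability for flagged responses; the threshold and item parameters are illustrative only.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """Standard three-parameter logistic item response function."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def effort_moderated_p(theta, rt, a, b, c, rt_threshold, n_options):
    """Effort-moderated response probability: responses faster than the
    item's rapid-guessing threshold are treated as random guesses, while
    slower (solution-behavior) responses follow the 3PL model."""
    if rt < rt_threshold:  # rapid guess flagged from response time
        return 1.0 / n_options
    return p_3pl(theta, a, b, c)

print(effort_moderated_p(theta=0.5, rt=1.2, a=1.2, b=0.0, c=0.2,
                         rt_threshold=3.0, n_options=4))  # rapid guess: 0.25
print(effort_moderated_p(theta=0.5, rt=8.0, a=1.2, b=0.0, c=0.2,
                         rt_threshold=3.0, n_options=4))  # solution behavior
```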
2016
Inferences on ability in item response theory (IRT) have been mainly based on item responses, while response time is often ignored. This is a loss of information, especially with the advent of computerized tests. Most IRT models may not apply to these modern computerized tests, as they still suffer from at least one of three problems (local independence, randomized items, and individually varying test dates) due to the flexibility and complex designs of computerized (adaptive) tests. In Chapter 2, we propose a new class of state space models, namely dynamic item responses and response times models (DIR-RT models), which conjointly model response time with time series of dichotomous responses. It aims to improve the accuracy of ability estimation via auxiliary information from response time. A simulation study is conducted to ensure correctness of the proposed sampling schemes to estimate parameters, whereas an empirical study is conducted using MetaMetrics datasets to demonstrate its practical value.