Papers by Alastair Pollitt
Shanghai Foreign Language Education Press (上海外语教育出版社), 1999
Contributions from a wide range of related areas are assembled in this book: psycholinguistics, pragmatics, second language acquisition, syntax, text linguistics and sociolinguistics, as well as educational and applied linguistics. These contributions enable the applied linguist to keep up to date with current thinking in diverse fields. The editorial introductions to the papers show how their contents relate to each other and discuss some of the practical implications for language teaching and language assessment.

iaea.info
What could be more valid than judging that one piece of work is more creative than another? Or more effective? Or just better? And if many judges agree that the same one is better, isn't that the best evidence for validity we could ask for? This paper describes progress in applying comparative judgement (first reported to IAEA in 2004) to the assessment of holistic traits like overall achievement, including effectiveness, quality and creativity. Marking was invented (in Cambridge) during the 18th-century Enlightenment, not in pursuit of validity or even reliability but to overcome serious problems of bias and prejudice in the examinations of the day. Its unintended consequence has been a most serious loss of validity in most of our formal assessments. Some progress has been made in abolishing marksism in UK assessment. A web-based system has been developed for presenting pairs of 'scripts' and collecting judgements, and the estimation procedure has been shown to be remarkably robust with the extremely sparse data 'matrices' that result. A simple initiation algorithm has also been developed. Some technical aspects of the procedure are described, as is a procedure for qualitatively describing the scale for public use.
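The comparative-judgement abstract above turns many pairwise decisions ("script X is better than script Y") into a single measurement scale. A minimal sketch of how such an estimate can be made, assuming a Bradley–Terry (logistic paired-comparison) model, which is one common formulation of comparative judgement and not necessarily the estimation procedure used in the paper. The function name `fit_bradley_terry`, the learning rate, the small ridge penalty and the example judgements are all hypothetical.

```python
import math
from collections import defaultdict


def fit_bradley_terry(judgements, n_iter=500, lr=0.1, ridge=0.01):
    """Estimate a quality parameter (in logits) for each script.

    judgements: list of (winner, loser) pairs, one per paired comparison.
    Returns a dict mapping each script to its estimate, centred on 0.
    Hypothetical helper: plain gradient ascent on a ridge-penalised
    Bradley-Terry log-likelihood, chosen for clarity rather than speed.
    """
    scripts = sorted({s for pair in judgements for s in pair})
    theta = {s: 0.0 for s in scripts}
    for _ in range(n_iter):
        grad = defaultdict(float)
        for winner, loser in judgements:
            # Model probability that `winner` is judged better than `loser`
            p = 1.0 / (1.0 + math.exp(theta[loser] - theta[winner]))
            # d(log p)/d(theta_winner) = 1 - p; the loser gets the negative
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        for s in scripts:
            # The ridge term keeps estimates finite for scripts that never
            # lose (or never win), which is common with very sparse data
            theta[s] += lr * (grad[s] - ridge * theta[s])
        # Only differences between scripts are identified, so fix the origin
        mean = sum(theta.values()) / len(theta)
        for s in scripts:
            theta[s] -= mean
    return theta


if __name__ == "__main__":
    # Invented example data: each tuple is (preferred script, other script)
    judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"),
                  ("C", "B"), ("B", "D"), ("C", "D")]
    scores = fit_bradley_terry(judgements)
    for script in sorted(scores, key=scores.get, reverse=True):
        print(script, round(scores[script], 2))
```

Each script needs to be compared with only a few others, because every judgement links two scripts onto the same scale; under this model the data can be very sparse as long as the comparisons form a connected network.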
The opinions expressed in this paper are those of the authors and should not be taken as
Measure tests, time tests, money tests, fractions, shape and pictorial representation tests, number tests, computation tests, Kidmap overlay sheet, pupil profile record sheet.

Oral assessment has often been disregarded due to problems of reliability. However, it may provide a more valid way of assessing candidates' level of knowledge than the widespread method of written assessment. Oral assessment offers the opportunity for the marker to enter into a dialogue with the candidate, and to use strategies such as prompting which can improve communication with the candidate. It also involves different time constraints from written examinations. At present, other than in Modern Languages, oral tests are used only in Certificate of Achievement (CoA) examinations, but they may be appropriate at other levels. Questionnaires were sent to teachers carrying out the oral examination for Geography CoA, and 4 teachers and 16 students were interviewed. Transcripts of 18 of the teachers' orals have been analysed in detail to discover more about the language used, and the relationship between oral and written grades has also been investigated.
Paper presented at …, 1999
Summary: This study aims to investigate the effects of time-induced stress on text comprehension, and the implications for performance in an examination setting. In the pilot phase, which has just been completed, we aimed to identify which features of language, ...
8th International Conference on Thinking, …, 1999
Oral exams have historically been a popular method for assessment, but in the last 50 years the majority of assessment in Britain has been in the form of written tests, in part due to Hartog and Rhodes' (1936) criticisms of orals. However, the proportion of the population ...
… Association for Educational …, 2008
What is our warrant for saying Student X deserves a Grade C? It must be based on evidence, and the only evidence we see is what students produce during the exam. For valid assessment two criteria must be met: the examination must elicit proper evidence of the ...
European …, 2000
This study aims to investigate the effects of time-induced stress on making inferences in text comprehension, and to account for these effects in terms of working memory processing limitations. Using a program called HyperCard, a narrative text was ...
Assessment in Education: Principles, Policy & Practice, 2007
Setting examination questions in real‐world contexts is widespread. However, when students are reading contextualized questions there is a risk that the cognitive processes provoked by the context can interfere with their understanding of the concepts in the ...
IAEA conference, Slovenia, May …, 1999
The central phenomenon of the whole examination process is what happens when a candidate meets a question. This is the focus of all our activity: no amount of good administration, good teaching or wise judgement can compensate if there is something wrong with the ...
BERA, Queen's …
Two techniques from the psychological literature were used to identify demands in exam questions: Edwards' scale of cognitive demand (1981) and Kelly's Repertory Grid technique (1955). These two techniques were used to develop a tool for identifying and gauging the demands made in GCSE and A Level History, Chemistry and Geography questions. This paper reports on three phases of the development: Phase 1, the adaptation of Edwards' scale to assessment tasks in a number of subjects; Phase 2, subject specialists' elaboration of the scale; and Phase 3, the integration of examiners' perceptions of demands in exam questions.

Assessment in Education: Principles, Policy & Practice, 2010
The two most common models for assessment involve measuring how well students perform on a task (the quality model), and how difficult a task students can succeed on (the difficulty model). By exploiting the interactive potential of computers we may be able to use a third model: measuring how much help a student needs to complete a task. We assume that every student can complete it, but some need more support than others. This kind of tailored support will give students a positive experience of assessment, and a learning experience, while allowing us to differentiate them by ability. The computer can offer several kinds of support, such as help with understanding a question, hints on the meanings of key concepts, and examples or analogies. A further type of support has particular importance for test validity: the computer can probe students for a deeper explanation than they have so far given. In subjects like geography or science, markers often would like to ask 'yes, but why?', suspecting that students understand more than they have written. We describe a pilot study in which students were given a high level task as an oral interview with varying types of support. Implications of the support model for future modes of assessment are discussed.
The opinions expressed in this paper are those of the authors and are not to be taken as the opinions of the University of Cambridge Local Examinations Syndicate (UCLES) or any of its subsidiaries.

In this paper we will discuss both the currently used ‘Thurstone’ methodology for establishing comparability and the previously used ‘home and away’ ratification method. A brief description and history of the use of human judgement in comparability studies using both these methods will be accompanied by a detailed exploration of the issues arising from them. We propose to present the findings of an initial investigation into the extent and predictability of ‘home Board’ bias amongst judges, derived from a recent inter-Board study. In conclusion, we shall consider the ideal role for human judgement in comparability, and whether human judgement should be used at all during Awarding meetings. Disclaimer: The opinions expressed in this paper are those of the author and are not to be taken as the opinions of the University of Cambridge Local Examinations Syndicate or any of its subsidiaries. Note: Part of this paper uses data collected by the Assessment & Qualifications All...
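The ‘Thurstone’ methodology mentioned above rests on paired comparisons: judges repeatedly say which of two scripts is better, and the proportions of preferences are converted into positions on a scale. A minimal sketch of the classical textbook version, Thurstone's Case V scaling, in which each proportion becomes a standard normal deviate and each script's scale value is the mean of its deviates. This is an illustration of the general technique, not necessarily the analysis used in the inter-Board studies; the function name `thurstone_case_v`, the clipping bound `eps` and the example proportions are hypothetical.

```python
from statistics import NormalDist


def thurstone_case_v(prop, eps=0.01):
    """Thurstone Case V scale values from a square matrix of preference proportions.

    prop[i][j] is the proportion of judgements preferring item i to item j,
    so prop[i][j] + prop[j][i] == 1. Diagonal entries are ignored.
    Returns one scale value per item, in units of the standard deviation
    of the discriminal difference; values are centred on 0 when the
    matrix is consistent.
    """
    n = len(prop)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scale = []
    for i in range(n):
        total = 0.0  # z_ii is taken as 0 (p = 0.5), so skipping it adds nothing
        for j in range(n):
            if i == j:
                continue
            # Proportions of exactly 0 or 1 have infinite normal deviates,
            # so clip them to [eps, 1 - eps] (an ad hoc but common fix)
            p = min(max(prop[i][j], eps), 1 - eps)
            total += std_normal.inv_cdf(p)
        # Case V scale value: mean normal deviate across all n columns
        scale.append(total / n)
    return scale


if __name__ == "__main__":
    # Invented proportions for three scripts
    prop = [
        [0.5, 0.7, 0.9],
        [0.3, 0.5, 0.8],
        [0.1, 0.2, 0.5],
    ]
    print([round(s, 3) for s in thurstone_case_v(prop)])
```

Case V assumes that every script's discriminal dispersion is equal and uncorrelated, which is what allows a simple average of normal deviates to serve as the scale value; logistic paired-comparison models make an analogous assumption with a logistic rather than a normal curve.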

Aim: Examiners and many varieties of commentator have long talked about how ‘demanding’ a particular examination is, or seems to be, but there is no clear understanding of what ‘demands’ means nor of how it differs from ‘difficulty’. In this chapter we describe the main efforts that have tried to elucidate the concept of demands, and aim to establish a common interpretation, so that it may be more useful in future for the description and evaluation of examination standards. Definition of comparability: No definition of comparability is necessarily assumed. Researchers sometimes appear to operate with a default assumption that two examinations are expected to show the same level in every aspect of demand, but it would be quite reasonable for one of them to require, for example, a deeper treatment of a smaller range of content than the other; comparability then requires these differences in demands somehow to balance each other out. It is asking a lot of examiners to ...

What is our warrant for saying “Student X deserves a Grade C”? It must be based on evidence, and the only evidence we see is what students produce during the exam. For valid assessment two criteria must be met: the examination must elicit proper evidence of the trait, and we must evaluate the evidence properly. This highlights the importance of ensuring quality in the mark schemes with which we evaluate the evidence as well as in the questions which elicit it. Our recent research shows that improving mark schemes can make more impact on validity than further work on improving questions. In this paper we will outline a procedural model for maximising construct validity: at its heart is the concept of Outcome Space, the range of evidence that students produce. The model aims to ensure that our mark schemes evaluate this evidence properly in terms of the achievement trait we want to assess. This model has been developed in consultation with senior examiners and exam board personnel. ...
• Prompts are categorised as: Reading prompts, Understanding prompts, Activation prompts, Writing prompts and Affective prompts. The prompts allow us to use the Support Model to assess the pupils' understanding of the important concepts in a subject, without their Speech and ...