Papers by Judith Runnels
Pluricultural Language Education and the CEFR, 2021

CEFR Journal, 2019
Published in 2001, the Common European Framework of Reference for Languages (CEFR), a reference framework which informs teaching, learning and assessment in language education, appears to be increasingly recognized, referenced and utilized in language education contexts worldwide. To date, however, the extent, provenance and adoption of the collected body of knowledge concerning the CEFR has yet to be systematically analysed, rendering it difficult for any conclusions to be made about its impact. A bibliometric analysis was therefore conducted to explore the CEFR from the document’s more formal origins in 1990 to the end of 2017 for the bibliometric indicators of number of publications per year, geographical location of research, highly cited works and journals with the highest number of relevant publications. The findings show that research on the CEFR has increased significantly over the examined time. The majority of publications with a focus on the CEFR are European, but numbers are increasing in geographical areas outside of Europe, and particularly in Asia. The framework is discussed in numerous types of publications covering a range of topics in language education. These findings suggest that the CEFR has been used in contexts beyond its origins and has influenced many aspects of language education around the globe. Diffusion of innovations theory suggests that the CEFR’s impact and influence are likely to increase over the next ten years in and outside of Europe and especially in Asia.

Since its release in 1979, the TOEIC® (Test of English for International Communication) has been consistently and widely used by educational institutions and companies of Japan despite criticisms that it provides little useable information about language ability. In order to both reduce the extreme focus on and also aid with the practical interpretability of TOEIC® test scores, other approaches to the assessment of language proficiency have started to gain popularity. One notable shift seems to be towards the usage of the Common European Framework of Reference (CEFR), which is purported to provide a highly learner-centered approach to the teaching, learning and assessment of languages. The CEFR promotes the development of learner autonomy and supports learner self-assessment through the usage of can do statements, which describe the communicative actions learners are able to perform at any given time. Due to the increasing interest in using the CEFR as an assessment tool for learning in Japan, further study of the relationship between language proficiency and self-assessment is required. The current study thus explored the relationship between Japanese English language learners' self-assessment scores on listening and reading can do statements from the Common European Framework of Reference-Japan (CEFR-J, a modified version of the CEFR) with test scores from the TOEIC®. Moderate correlations between the TOEIC® and can do self-assessment scores were found for listening, while no correlations were found for reading. The factors that may influence a learner's self-assessment tendencies, the efficacy of a self-assessment system for Japanese learners and the interpretability of TOEIC® scores are discussed.

Differential item functioning (DIF) analyses are used to determine if there are any items that affect the probability of particular groups of test-takers endorsing an item, after controls for ability are taken into account. If DIF occurs on a wide scale, this means that test scores do not represent the same measurement over the population of test-takers. This is known as differential test functioning (DTF). This study examined the item functioning of an in-house designed low-stakes achievement vocabulary test designed to measure how well second-year students in four different academic disciplines acquired words on a 250-word study list. The same test has been previously examined using Rasch analysis, for the purposes of highlighting items that were causing unexpected response patterns (Runnels, 2011). The current analysis offers additional validity evidence related to score equivalence across majors. It was found that even though DTF is unlikely, there were several items that favored and hindered some majors. The importance of establishing a process to check for DTF and DIF, especially when the test-takers are from different disciplines of study and even for low-stakes tests, is discussed.
The current study investigated the current perceptions of the CEFR held by tertiary level language teachers in Japanese universities. This entailed examining teachers’ familiarity with the CEFR, the areas of language education in which it has been used, its estimated impact, problems arising through its previous usages, and intended future usage. Particular focus was paid to the issues that CEFR users have previously faced and the suggestions they made to resolve them.

Both the CEFR (Common European Framework of Reference) and the CEFR-J (CEFR-Japan) use illustrative can-do descriptors to describe a learner’s communicative competences in five language skills across six levels of language proficiency. This paper reports on Japanese English learners’ self-assessment on the CEFR-J’s 50 A-level descriptors using either a four-point or a five-point scale to determine if a neutral response option (neither agree nor disagree) influenced participants’ responses. Self-assessment by Japanese language learners has been shown to be subject to cultural factors related to social desirability phenomena, resulting in high selection rates of mid-scale response options no matter the content of the item or the size of the scale. Overall, no significant differences between mean responses on a four-point (no neutral category) and a five-point (contains an inherent mid-point) rating scale were found following controls for scale size. Conversely, significant interactions were found for rating scale, skill (reading and spoken production) and descriptor difficulty level (A1.1 and A2.2). When the distance between responses and scale mid-point was measured and compared across rating scale to determine whether the inclusion of a neutral option appeared to influence selection rates, no significant differences were found for 68% of all descriptors. While inclusion of a middle response option had far less impact on responses than has been previously shown, further research is required to determine the impact of differing scale types on Japanese English learner self-assessments. This paper discusses the influence on responses from socio-cultural factors, response styles, task-familiarity, language skill, the number of response scale categories and language proficiency.

Just as the CEFR (Common European Framework of Reference, Council of Europe, 2001) did in Europe, the CEFR-J (Common European Framework of Reference-Japan, Negishi, Takada & Tono, 2013) has, since its release in March 2012, begun to impact the foreign language education industry of Japan. The CEFR has been shown to act as a useful descriptive scheme for analyzing the needs, goals, materials and outcomes of language learners’ studies (Alanen, Huhta & Tarnanen, 2010) and it is hoped that the CEFR-J may do the same for language education in Japan. To summarize briefly, both of the systems operate using illustrative descriptors or can do statements, which describe communicative competences of a learner in five language sub-skills (listening, reading, spoken interaction, spoken production, writing), across levels of proficiency. In contrast to the CEFR’s six global levels of proficiency, the CEFR-J has twelve: Pre-A1, A1.1, A1.2, A1.3, A2.1, A2.2, B1.1, B1.2, B2.1, B2.2, C1, C2. Groups of can-do statements, often presented in table form on what is known as a self-assessment grid, can be used for a number of purposes: as a basis for materials design, curriculum development or formal assessment, as a tool for self-assessment by a language learner, the measurement of proficiency or progress, or for any other reason. Despite significant activity on implementing the system at many institutions across the nation, currently there are few published resources or examples specific to a Japanese context from which other teachers, learners or institutions could draw.
Inspired by Little, Goullier and Hughes’ (2011) article “The European Language Portfolio: The story so far (2001-2011)”, which summarises ten years of activity involving the ELP, this article provides summaries of as comprehensive a list as possible of primarily English language CEFR-J publications. It is hoped that this list will act as a starting point, aid anyone interested in learning more about what has happened with the CEFR-J so far, and encourage the publication of further CEFR-J-specific research in the future.

This paper outlines one method through which learner self-regulation can be
promoted in CEFR-informed courses using a learning cycle. Previous reports of
learning cycles in use have not adequately described how they can be
operationalised within the classroom—typically, they have been limited to
descriptions of the cycle alone. This paper provides specific examples of how a
CEFR-informed learning cycle has been implemented in an EFL process writing
class. Cyclical learning and the CEFR as the tools for bringing learner self-regulation
practices forward are first introduced. Next, a description of self-regulation
practices in the classroom context using the example of an essay
writing task in a process writing class is provided. The discussion then focuses on
how instructors can encourage learners to carry their self-regulation practices
forward to their future learning once a class has been completed. We conclude by
suggesting possible benefits of this learning approach, and future directions for
research.

Both the Common European Framework of Reference (CEFR) and the CEFR-Japan (CEFR-J), an alternate version designed for Japanese learners of English, provide measurements of language proficiency via assessment or self-assessment on scales of descriptors of communicative competences (known as can-do statements). Although extensive empirical evidence supports these claims for the CEFR, the same cannot yet be said of the CEFR-J. Mokken scaling was thus used to measure the reliability of can-do statement scales from the five skills of the CEFR-J’s five A sublevels of A1.1, A1.2, A1.3, A2.1, and A2.2. Statements that negatively affected the reliability of the scale were analysed. Lower reliability was attributed to characteristics specific to participants (homogeneity of the population, familiarity with the task, and if the material was recently studied), and content of the statement itself (whether it implied more than one language skill or none at all, whether it contained a contradiction, or was confusing or unfamiliar). Modifications to increase the reliability of can-do statement scales and limitations of using illustrative descriptor-based systems as measurement instruments are discussed.

The newly released Common European Framework of Reference Japan (CEFR-J) was designed to address the issue that a consistent system for measuring learner proficiency and progress in foreign language pedagogy in Japan is lacking. This tailored version of the Common European Framework of Reference (CEFR) was developed to better discriminate incremental differences in proficiency for Japanese learners of English, who tend to fall mostly within the A1 and A2 levels. Changes from the original CEFR included the creation of can-do illustrative descriptors that separated 4 of the existing 6 levels into sub-levels. The goal of the current analysis is to test the suitability of the new sub-levels of A1 and A2 for target users of the system in two ways: 1) by determining if newly developed descriptors are empirically rank ordered by difficulty as specified by the CEFR-J, and 2) by testing the statistical significance of differences in difficulty ratings between the sub-levels. The current analysis found that the rank ordering of levels was the same as predicted by the CEFR-J, and that the higher-order A1 and A2 levels varied in difficulty to a statistically significant degree, but significant differences between adjacent CEFR-J sub-levels were not found. This raises questions about how users of the system can effectively distinguish features representative of each level and whether the additional sub-levels in the CEFR-J can function as intended. Limitations of using a system of illustrative descriptors based primarily on estimates of difficulty and the process of contextualizing a generalized framework are discussed.

The Japanese adaptation of the Common European Framework of Reference (CEFR-J) is a tailored version of the Common European Framework of Reference (CEFR), designed to better meet the needs of Japanese learners of English. The CEFR-J, like the CEFR, uses illustrative descriptors known as can-do statements, that describe achievement goals for five skills (listening, reading, spoken production, spoken interaction and writing) across twelve levels instead of the CEFR's original six. The goal of the present analysis is to provide validity evidence in support of the inherent difficulty hierarchy within the 5 A level sub-categories (A1.1, A1.2, A1.3, A2.1 and A2.2) in two ways: 1) by testing whether the difficulty of the can-do statements for each skill increases with the levels, and 2) by determining if there are significant differences in difficulty ratings between each level. It was found that for most skills, the rank ordering from difficulty ratings made by Japanese university students somewhat matched the level hierarchy of the CEFR-J but that significant differences between many adjacent levels were not found. The localization of a general framework for use by a specific population of users and the limitations related to using a system of can-dos that is derived from estimates of difficulty are discussed.

Differential item functioning (DIF) is when a test item favors or hinders a characteristic exhibited by group members of a test-taking population. DIF analyses are statistical procedures used to determine to what extent the content of an item affects the item endorsement of sub-groups of test-takers. If DIF is found for many items on the test, the final test scores do not represent the same measurement across groups in the population of test-takers. This is known as differential test functioning (DTF). DTF is of particular concern in tertiary level language tests, where test-takers often differ in academic discipline. This study examined the DIF and DTF of an in-house developed assessment designed to measure how well first year students of five academic disciplines achieved material over the course of a year of English language study. The DIF and DTF tests were performed using Rasch analysis, which controls for ability across groups, ensuring that items are only flagged if groups of test-takers of the same ability levels exhibit a significantly different probability of endorsing the item. The current analysis outlines the process for checking for DIF and DTF and finds that even though DTF is unlikely, there were several items that favored and hindered some majors. Recommendations for modification of items are made and the importance of establishing a process to check for DTF and DIF, especially when the test-takers are from different disciplines of study, is discussed.

Rasch analyses have been linked by numerous scholars to the six facets of Messick's (1989) concept of construct validity and are commonly used to evaluate pedagogical assessment. Compared to deterministic statistics from classical test theory, the Rasch model's prescriptive methods have been argued to provide stronger validity evidence. Rasch-based methods estimate probabilities of item endorsements according to person ability and item difficulty parameters, highlighting items that produce some degree of unexpected response patterning. In the current article, a multiple-choice achievement test taken by English as a Foreign Language students at a private university in Japan was analyzed. The results show very little misfit to the Rasch model and that the level of the test was appropriately targeted to the abilities of the test-taking population, covering a range of statistically distinct difficulties. Fit statistics, an item-person map, item strata, Rasch measures, point-measure correlations and their relation to Messickian validity are discussed.
and 17, 2012. The theme of the conference was Literacy: SIGnals of Emergence, and the event was a collaborative effort from 22 Special Interest Groups (SIGs) within JALT (Japan Association for Language Teaching). The conference was highly successful, as more than 200 participants attended over 130 presentations on a variety of topics and interests from a wide spectrum in the field of language teaching.

The comprehensibility of written task instructions from language classroom learning materials is
an important area to examine since language learners need to understand what is expected of
them in order to perform classroom tasks successfully. This study describes how teacher-written
instructions on classroom materials developed for a General English curriculum were analyzed
and modified to improve comprehensibility. Written instructions were modified to simplify
vocabulary, reduce sentence length, eliminate extraneous information, and reflect the sequencing
of task performance. An analysis using three readability formulas showed that readability
increased for all indices. Both the pre- and post-change task instructions were then rated by
language learners for comprehensibility; Rasch analysis and ANOVAs comparing the
comprehensibility ratings on the pre- and post-change instructions revealed that 78% of the
instructions were rated as easier to understand than their pre-change equivalents, illustrating that
the modifications were effective at increasing both readability and comprehensibility.

The JALT CALL Journal, Dec 2013
A paperless classroom, in which all materials required to complete a class are available in electronic form, has been shown to have positive impacts on student and teacher motivation, engagement, productivity, and efficiency. Recent trends suggest that of all of the technological tools available, tablet PCs can support many aspects of a paperless classroom for both students and teachers. A variety of resources describing the development and implementation of courses using tablet PCs are currently available, though comparatively less research specific to individual stages of the process or details involved in selecting appropriate tools has been performed. The current study was designed to provide preliminary evidence for how the screen size of a tablet PC affected interactions with electronic handouts from an English language class. Teachers and students completed tasks on both a 10-inch tablet PC as well as on the miniature version of the same tablet, to determine the impact screen size had on usability. It was found that while teachers significantly preferred interacting with classroom materials on the regular-sized tablet, students did not show a preference toward either device for classroom use. However, students suggested that for everyday use, such as doing homework, the miniature version was preferred. The implications of the results for materials design and for mobility as a component of a paperless classroom are discussed.
Non-Refereed Articles by Judith Runnels

Oral achievement tests aim to give students the opportunity to demonstrate how well they can use language they were previously exposed to and practiced using in the classroom. Literature in the field of language assessment, however, focuses largely on proficiency or placement testing, much to the dismay of educators. The current study expands on the work of Fulcher (2004) and Luoma (2004) and presents some novel steps that can be taken in the preliminary stages of achievement test development to ensure assessment is more representative of classroom conditions and leads to equivalent alternate forms of a test. The equivalence of eight forms of a second year General English speaking test was examined and analyses revealed no significant differences across test forms. Involving teachers in the development stages appears to be one way to ensure equivalency across alternate test forms. Consulting teachers also appears to contribute to producing achievement tests that are representative of the curriculum.

The Rasch model has recently been used in educational measurement as an evaluative tool. Rasch analyses have been shown to map onto the six aspects of Messick’s (1989) construct validity and, compared to a more classic model of test theory and deterministic analysis measures, make stronger arguments in providing validity evidence for tests. The Rasch model estimates the probability of a specific response according to person ability and item difficulty parameters, placing both on an interval scale. In the current study, an 83-item multiple-choice English vocabulary achievement test was administered to second-year non-English majors at a Japanese university. The test was developed from a 250-word study list. The results were analysed using a combination of Rasch measures and deterministic statistics, including logistic regression. The analyses highlighted several test items that exhibited unusual response patterning and suggested that the test was not an effective tool in measuring how well the students acquired the 250 words on their study list. Deterministic parametric and Rasch analyses were both effective as evaluative tools, although Rasch produced more precise information that can subsequently be used by test developers or educators to revisit potentially problematic test items, ultimately improving the validity of the test.
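
As a rough illustration of the model this abstract refers to, the sketch below computes the probability of a correct response under the standard dichotomous Rasch model, where person ability and item difficulty sit on the same logit scale. It is not taken from the paper: the function name and the ability and difficulty values are hypothetical, chosen only to show how the probability changes with the gap between the two parameters.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Both parameters are in logits; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical person abilities and item difficulties (logits), for illustration only.
abilities = [-1.0, 0.0, 1.5]
difficulties = [-0.5, 0.0, 2.0]

for theta in abilities:
    row = [round(rasch_probability(theta, b), 3) for b in difficulties]
    print(f"ability {theta:+.1f}: {row}")
```

When ability equals difficulty the probability is exactly 0.5, and it rises toward 1 as ability exceeds difficulty. Items whose observed responses depart strongly from these expected probabilities are the ones a fit analysis would flag.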
http://www.cambridgeenglish.org/research-and-validation/published-research/
http://www.englishprofile.org/images/English_Profile/Pluricultural-Language-Education-and-the-CEPR.pdf