International Journal of Assessment Tools in Education, Dec 20, 2020
This study tested the applicability of the theoretical Examination for Candidates of Driving License (ECODL) in Turkey as a computerized adaptive test (CAT). Firstly, various simulation conditions were tested for the live CAT through an item response theory-based calibrated item bank. The application of the simulated CAT was based on data from e-exams administered by the Ministry of National Education (MoNE). Results of the first stage of the study were used to determine the rules for starting, continuing, and terminating the live CAT exam for ECODL. Secondly, the live CAT exam was applied according to the results of the simulation. Candidate drivers (n = 280) who had taken the ECODL as an e-test participated in the second stage. Thirdly, the opinions of the individuals who took the computer-based test about the computer-based testing application were mapped. Among the termination rules for the CAT-based ECODL, testing with a fixed number of questions yielded the smallest estimated measurement error. We also found that when ECODL was implemented as CAT, it could reliably differentiate among test takers in terms of their theoretical knowledge of driving and provide a basis for accurate decisions regarding their proficiency. The candidates' opinions indicated that they considered the computer-based application more practical and easier as a form of testing.
Language proficiency testing serves an important function of classifying examinees into different categories of ability. However, misclassification is to some extent inevitable and may have important consequences for stakeholders. Recent research suggests that classification efficacy may be enhanced substantially using computerized adaptive testing (CAT). Using real data simulations, the current study investigated the classification performance of CAT on the reading section of an English language proficiency test and made comparisons with the paper-based version of the same test. Classification analysis was carried out to estimate classification accuracy (CA) and classification consistency (CC) by applying different locations and numbers of cutoff points. The results showed that classification was suitable when a single cutoff score was used, particularly for high- and low-ability test takers. Classification performance declined significantly when multiple cutoff points were simulta...
The purpose of the present study is to compare ability estimates obtained from a computerized adaptive testing (CAT) procedure with the paper and pencil administration results of the Student Selection Examination (SSE) science subtest, considering different ability estimation methods and test termination rules. The study has two phases. In the first phase, a post-hoc simulation was conducted to examine the relationship between examinee ability levels estimated by the CAT and paper and pencil versions of the SSE. Maximum Likelihood Estimation and Expected A Posteriori were used as ability estimation methods. The test termination rules were a standard error threshold and a fixed number of items. In the second phase, a live CAT was administered to a group of examinees to investigate the performance of CAT outside a simulated environment. Findings of the post-hoc simulations indicated that CAT could be implemented for the SSE by using the Expected A Posteriori estimation method with a standard error threshold value of 0.30 or higher. The correlation between ability estimates obtained by CAT and the real SSE was 0.95. The mean number of items given to examinees by CAT was 18.4. The correlation between live CAT and real SSE ability estimates was 0.74. The number of items used in the CAT administration was approximately 50% of the items in the paper and pencil SSE science subtest. Results indicated that CAT for the SSE science subtest provided ability estimates with higher reliability and fewer items compared to the paper and pencil format.
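The post-hoc procedure described in this abstract (replaying recorded answers as an adaptive test, estimating ability with Expected A Posteriori, and stopping at a standard error threshold of 0.30) can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the 2PL model, the standard normal prior, the quadrature grid, maximum-information item selection, and the simulated item bank and examinee are all assumptions introduced here.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response (the model choice is an assumption)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap(responses, a, b, grid):
    """Expected A Posteriori estimate and posterior SD under a N(0, 1) prior."""
    post = np.exp(-0.5 * grid ** 2)            # unnormalised prior on the grid
    for u, ai, bi in zip(responses, a, b):
        p = p_correct(grid, ai, bi)
        post = post * (p if u == 1 else 1.0 - p)
    post = post / post.sum()
    theta = float(np.sum(grid * post))
    sd = float(np.sqrt(np.sum((grid - theta) ** 2 * post)))
    return theta, sd

def posthoc_cat(full_responses, a, b, se_stop=0.30, max_items=None):
    """Replay one examinee's recorded answers as an adaptive test."""
    grid = np.linspace(-4.0, 4.0, 81)
    available = np.ones(len(a), dtype=bool)
    max_items = len(a) if max_items is None else min(max_items, len(a))
    used, theta, sd = [], 0.0, 1.0
    while len(used) < max_items:
        p = p_correct(theta, a, b)
        # Fisher information at the current estimate; used items are masked out
        info = np.where(available, a ** 2 * p * (1.0 - p), -np.inf)
        nxt = int(np.argmax(info))
        available[nxt] = False
        used.append(nxt)
        theta, sd = eap(full_responses[used], a[used], b[used], grid)
        if sd <= se_stop:                      # standard error threshold rule
            break
    return theta, sd, used

# Hypothetical item bank and one simulated examinee (not data from the study)
rng = np.random.default_rng(0)
a = rng.uniform(0.8, 2.0, size=200)
b = rng.normal(0.0, 1.0, size=200)
answers = (rng.random(200) < p_correct(0.5, a, b)).astype(int)
theta_hat, se, used = posthoc_cat(answers, a, b)
print(f"theta = {theta_hat:.2f}, SE = {se:.2f}, items used = {len(used)}")
```

Passing `max_items` together with `se_stop=0.0` turns the same loop into the fixed-number-of-items termination rule, which makes the two rules easy to compare on the same response data.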
This study examined teacher- and school-related variables that can distinguish between low-achieving and high-achieving disadvantaged students. There is a substantial achievement gap between these two groups. To this end, discriminant analysis was used to examine whether selected factors from the PISA 2012 data set distinguish low-achieving from high-achieving students. Twenty-two items from 5 different dimensions were included in the study: Student-Teacher Relations (5 items), Sense of Belonging (9 items), Attitude towards Learning at School (4 items), and Attitude toward School (4 items). The results showed that some items were able to distinguish between low-achieving and high-achieving students. The results of this study are expected to provide important information for increasing the proportion of high-achieving students.
This study investigates the combined role of utility value, expectancy for success, intrinsic reasons, and self-worth concerns in predicting learning strategies and test anxiety. The study examined this potential prediction with 1,009 university students who were studying in a language preparatory program of a university. In the qualitative phase of this exploratory sequential mixed methods case study, semi-structured interviews facilitated understanding of students' perceptions of the motivational variables they believe are influential in their language learning process. Interviews were held with students from three different categories: non-repeaters (i.e., those who never failed), past-repeaters (i.e., those who had experienced failure), and current-repeaters (i.e., those who failed and were repeating the current period of study). Quantitative data were gathered through a survey approach and enabled exploration of the relationship among motivational components and learning strategies. Five hierarchical regression analyses were conducted, for effort regulation, learning strategies (i.e., rehearsal, critical thinking, and metacognitive self-regulation), and test anxiety. The results of the regression analyses showed that intrinsic reasons positively predicted learning strategies across the three groups of students. Self-worth concerns were found to positively predict test anxiety. The results of the study suggest that intrinsic reasons have an important role in contexts where there is psychological pressure to be successful.
One important function of the school mathematics curriculum is to prepare high school students with the knowledge and skills needed for university education. Identifying this knowledge and these skills empirically will help in making sound decisions about the contents of the high school mathematics curriculum. It will also help students to make informed choices in course selection at high school. In this study, we surveyed university faculty members who teach first-year university students about the mathematical knowledge and skills that they would like to see in incoming high school graduates. Data were collected from 122 faculty members from social science (history, law, psychology) and engineering departments (electrical/electronics and computer engineering). Participants were asked to indicate which high school mathematics topics and skills they thought were important for success in university education in their field. Results were compared across social science and engineering departments. Implications were drawn for curriculum specialists, students, and mathematics educators.
This paper presents software developed by the author. The software conducts post-hoc simulations for computerized adaptive testing based on examinees' real responses to paper and pencil tests, under different user-defined parameters. The paper first gives brief information about post-hoc simulations. It then describes the working principle of the software and shows a sample simulation with the required input files. Finally, the output files are described.
In the present study, comparability of scores from student evaluation of teaching forms was investigated. This is an important issue because scores given by students are used in decision making in higher education institutions. Three course-related variables (grade level, course type, and course credit) were used to define student subgroups. Then, multi-group confirmatory factor analysis was used to assess invariance of factorial structure, factor loadings, and factor means across groups. It was found that although a common factorial structure held across groups, fully invariant factor loadings were observed only across instructors who teach different course types. For other groups, only partial invariance of factor loadings was obtained. Analyses also revealed that none of the subgroups had invariant factor means, indicating a possible bias. Results indicate that comparison of instructors based on student ratings may not be as valid as is commonly assumed.
Student evaluations of teaching (SET) have been the principal instrument to elicit students' opinions in higher education institutions. Many decisions, including high-stakes ones, are made based on SET scores reported by students. In this respect, reliability of SET scores is of considerable importance. This paper argues that there are problems in choosing and using reliability indices in the SET context. Three hypotheses were tested: (i) using internal consistency measures is misleading in the SET context since the variability is mainly due to disagreement between students' ratings, which requires use of inter-rater reliability coefficients, (ii) the minimum number of student responses is not reached in most classes, resulting in unreliable decisions, and (iii) calculating a reliability coefficient assuming a common factor structure across all classes is misleading because a common model may not be tenable for all. Results showed that using internal consistency alone to assess reliability of SET scores may result in wrong decisions. A considerably large number of responses were found to be missing relative to what is needed to achieve acceptable reliability levels. Findings also indicated that the factorial model differed across several groups.
The purpose of the present study is to define instructional profiles and investigate the relationship between these profiles and learning indicators such as end-of-semester grades and self-reported amount of learning. Instructional profiles were obtained using a segmentation method. Student ratings were used as indicators of instructional effectiveness. Results revealed that instructors who receive higher scores from students seem to be effective instructors in learning. However, instructors with high ratings from students did not receive high scores for all measures of instructional effectiveness. Effective instructors seem to have varying scores due to the imperfect relationship between instructional effectiveness and learning. It can be concluded that the definition of an effective instructor can vary across subgroups. For an instructor to be defined as effective, it is not necessary for them to receive higher scores for all measures. Low-rated aspects of effectiveness can be compensated for by showing high performance in other areas. Based on the results of the present study, instructional profiles or any other related traits should be investigated under subgroups that show differences.
The present study seeks to determine, through discriminant analysis, the variables explaining differences between the scores of student ratings given to instructors within the context of the university. Ratings given by students were split into two groups based on their means, and instructors were labeled as low-rated or high-rated. Predictors identified by discriminant analysis are (i) class size, (ii) credit, (iii) grade level, (iv) mean grade, and (v) number of sections. Results of the study suggested that low-rated instructors are those who teach courses with fewer students, lower credits, higher grade levels, higher mean grades, and one section. Identifying the sources of differences between ratings may provide invaluable information for those who are interested in assessment of instructional effectiveness.
The present study sought to define the profile of resilient students in comparison with low-achieving/low-SES students. To this end, several school- and teacher-related variables, taken from the PISA 2012 student questionnaire, that were considered to be influential on students' reading literacy were examined. A total of 28 items from 5 dimensions were selected: Student-Teacher Relations (5 items), Sense of Belonging (9 items), Attitude towards Learning at School (4 items), Attitude toward School (4 items), and Perceived Control (6 items). Using binary logistic regression, significant variables explaining literacy differences between the two groups of students were identified. Then, the profile of resilient students was defined. Results indicated that resilient students had more positive attitudes towards school and teachers compared with low achievers. The findings of the present study may provide significant information for increasing the rate of resilient students.
The purpose of the present study is to discuss the applicability of the computerized adaptive testing format as an alternative to the current student selection examinations for higher education in Turkey. In the study, problems associated with the current student selection system are given first. These problems exert pressure on students that results in test anxiety, produce measurement experiences that can be criticized, and lessen the credibility of the student selection system. Next, computerized adaptive tests are introduced and the advantages they provide are presented. Then, results of a study that used two research designs (simulation and live testing) are presented. Results revealed that (i) the computerized adaptive format provided a reduction of up to 80% in the number of items given to students compared to the paper and pencil format of the student selection examination, and (ii) ability estimations have high reliabilities. Correlations between ability estimations obtained from simulation and traditional format were h...
The present study investigated differences between disadvantaged and resilient students in terms of sense of belonging, as measured in PISA 2012. To this end, a segmentation method was employed to define student segments differing in ratios of resilient students. Results indicated that there is a relationship between academic resiliency and sense of belonging. While the relationship between resiliency and some predictors seemed to vary, there are some predictors with direct relationships with academic resiliency.
One of the issues that higher education institutions face is accountability. Assessment practices are among the aspects in which universities seek excellence, and language proficiency tests are one of these practices. Validity, reliability, and precision of these tests and their scores are of considerable importance since they are an important barrier prior to formal university education. Students either start their education directly in their faculty or are placed into English preparatory schools based on proficiency levels. The major issue with these proficiency exams is that students may not be classified into the correct proficiency level. More seriously, students who have a sufficient level of English may score below the threshold because paper and pencil tests lack the required level of precision at the threshold point. However, language proficiency tests are expected to yield precise scores, especially at the pass/fail threshold. Computerized adaptive testing (CAT) is proposed as a solution to this classification precision problem. A substantial body of research has shown that CAT can be used for efficient testing in educational settings, including language proficiency exams. Among its many advantages (fewer items, high reliability, etc.), CAT has the potential to produce a high degree of precision in discriminating pass and fail decisions. Computer simulations were used to investigate the classification accuracy of a language proficiency test at university level. Preliminary analyses showed that CAT classified individuals effectively for pass and fail decisions with a high degree of reliability, as well as a significant reduction in the number of items, as compared to paper-and-pencil tests.
Assessments of reliability are mostly carried out in terms of internal consistency. Internal consistency of scores is commonly assessed by Cronbach's Alpha and several other indices. Although these indices provide information about the consistency of an individual's responses, they do not answer the question of whether individuals agree in their responses on the aspects being evaluated. To answer this question, inter-rater reliability coefficients are used. Among the indices available, Intraclass Correlation Coefficients are the most commonly used. There is no direct relationship between internal consistency and inter-rater consistency indices. Thus, it should not be naively assumed that a high degree of internal consistency among raters' scores guarantees a high degree of agreement among raters. While responding to a measure, individuals may be consistent about the object being assessed; however, half of the individuals may have positive opinions about the object and the rest may have negative ones. Thus, it is important to assess inter-rater reliability as well as internal consistency if one wishes to obtain information about the trait being rated. In the present study, differences between reliability approaches were investigated in the context of the Programme for International Student Assessment (PISA) 2012. Several decisions, some of them high-stakes, can be made based on PISA 2012 results. If this is the case, inter-rater reliability should be assessed in addition to internal consistency. Although an instrument may be valid, scores obtained from it may not be reliable. Thus, reliability of scores from PISA should be assessed after the implementation phase.
In this study, PISA 2012 Turkey results were studied in terms of internal and inter-rater consistency for two groups of students: (i) students with low SES and (ii) resilient students (low-SES and high-achieving). Turkey is among the countries with the highest resiliency rates, despite its very low rank in the reading, science, and mathematics domains. The researchers of this study have conducted several studies on the resiliency of Turkish students. Thus, they focused on reliability analysis of scores from the two groups defined above. Preliminary analysis indicated that although there was a high degree of internal consistency for both groups, inter-rater reliability indices showed that resilient and non-resilient students do not have a similar degree of agreement on the dimensions stated in PISA questionnaires.
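The distinction this abstract draws between internal consistency and inter-rater agreement can be illustrated with a small simulation. The sketch below is not the study's analysis: the data are simulated, the one-way random-effects ICC(1) is one of several possible intraclass coefficients, and the school, rater, and item counts are hypothetical. It reproduces the scenario described above, where half of the raters hold a positive opinion and half a negative one, each rater answering consistently across items.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var_sum / total_var)

def icc1(scores):
    """One-way random-effects ICC(1) for an (n_targets, k_raters) matrix."""
    n, k = scores.shape
    row_means = scores.mean(axis=1)
    msb = k * np.sum((row_means - scores.mean()) ** 2) / (n - 1)   # between targets
    msw = np.sum((scores - row_means[:, None]) ** 2) / (n * (k - 1))  # within targets
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical design: 20 schools (targets), 10 student raters each, 5 items
rng = np.random.default_rng(0)
n_schools, n_raters, n_items = 20, 10, 5

# In every school, half of the raters are positive (+1) and half negative (-1)
stance = np.tile(np.repeat([1.0, -1.0], n_raters // 2), (n_schools, 1))
noise = rng.normal(0.0, 0.3, size=(n_schools, n_raters, n_items))
ratings = 3.0 + 1.5 * stance[:, :, None] + noise   # continuous, Likert-like scores

alpha = cronbach_alpha(ratings.reshape(-1, n_items))  # consistency across items
icc = icc1(ratings.mean(axis=2))                      # agreement among raters per school
print(f"Cronbach's alpha = {alpha:.3f}, ICC(1) = {icc:.3f}")
```

Because each rater answers all items in the same direction, the items covary strongly and alpha comes out high, while the raters within a school split evenly, so the between-school variance is small relative to the within-school variance and ICC(1) stays low.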
