
Mahmut Sami Yigiter




Conference Presentations by Mahmut Sami Yigiter

AI-Supported Feedback for Teacher-Made Tests [Öğretmen Yapımı Testler İçin Yapay Zekâ Destekli Geribildirim]

Journal of Applied Measurement and Assessment , 2024

Teacher-made tests help teachers get to know their students more closely and identify their strengths and weaknesses. However, because these tests usually cannot be piloted or reviewed by experts, various problems may arise regarding their psychometric properties. Artificial intelligence (AI) is thought to be able to provide significant support in developing teacher-made tests and improving their psychometric properties. This study aimed to examine the feedback that AI-supported tools offer for achievement tests. To this end, the feedback provided by AI-supported tools (ChatGPT 4o, Gemini, and Copilot) was examined for an 8th grade English achievement test of 20 multiple-choice items administered to 5115 students. This feedback concerns the test's content validity, the item difficulty indices, and the item response times. According to the results, the content validity index across the three AI tools was very high (CVI=0.97). The agreement among the AI tools' classifications of item difficulty was moderate (α=0.71). Finally, all three AI tools indicated that the total response time for the items did not exceed one class period (40 min). These results show that AI tools can provide teachers with significant support in improving the psychometric properties of teacher-made tests and, by offering various analyses and feedback, can contribute to increasing the validity and reliability of the test.
Keywords: teacher-made tests, artificial intelligence, feedback, validity, item response time
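
The abstract reports a content validity index (CVI) but does not say how it was computed. As a rough sketch only, the Python below assumes one common convention: each rater scores each item on a 1 to 4 relevance scale, the item-level CVI is the share of raters giving 3 or 4, and the scale-level CVI is the average of the item-level values. The function names and the rating matrix are hypothetical and are not taken from the study.

```python
from statistics import mean


def item_cvi(ratings, relevant_min=3):
    """Share of raters who judged one item relevant (rating >= relevant_min)."""
    return sum(r >= relevant_min for r in ratings) / len(ratings)


def scale_cvi_ave(rating_matrix, relevant_min=3):
    """Average of the item-level CVIs (the 'S-CVI/Ave' convention, assumed here)."""
    return mean(item_cvi(row, relevant_min) for row in rating_matrix)


# Made-up ratings: 20 items, 3 raters, 1-4 relevance scale.
# 18 items are rated relevant by all raters, 2 items by two of three.
ratings = [[4, 4, 3]] * 18 + [[4, 3, 2]] * 2

print(round(scale_cvi_ave(ratings), 2))
```

With only three raters, a single dissenting rating moves an item-level CVI from 1.00 to 0.67, so scale-level values from small panels are coarse.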

Item Response Theory Assumptions: A Comprehensive Review of Studies with Document Analysis

International Journal of Educational Studies and Policy, 2024

Item Response Theory (IRT), over its nearly 100-year history, has become one of the most popular methodologies for modeling response patterns in measures in education, psychology and health. Due to its advantages, IRT is particularly popular in large-scale assessments. A precondition for the validity of the estimations obtained from IRT is that the data meet the model assumptions. The purpose of this study is to examine the testing of model assumptions in studies using IRT models. For this purpose, 107 studies in the National Thesis Center of the Council of Higher Education that use the IRT model on real data were examined. The studies were analyzed according to sample size, unidimensionality, local independence, overall model fit, item fit and non-speedness test criteria. According to the results, the unidimensionality assumption was tested at a high level (89%), predominantly with factor-analytic approaches. The local independence assumption was not tested in 36% of the studies; unidimensionality was cited as evidence for it in 40% of the studies, and it was tested directly in 24%. Overall model fit was tested at a moderate level (51%), using log-likelihood and information criteria. Item fit and non-speedness were tested at a low level (26% and 9%, respectively). IRT assumptions should be considered as a whole and all assumptions should be tested from an evidence-based perspective.
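
To make the local independence assumption concrete, the sketch below uses the two-parameter logistic (2PL) model as an illustration; the abstract does not tie the reviewed studies to any one model. Under local independence, the probability of a whole response pattern at a fixed ability is the product of the item probabilities. The item parameters and function names are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL item response function: P(correct | theta) with discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pattern_likelihood(theta, items, responses):
    """Likelihood of a 0/1 response pattern at ability theta.

    Multiplying the item probabilities is valid only if responses are
    independent once theta is fixed (local independence).
    """
    likelihood = 1.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        likelihood *= p if u == 1 else 1.0 - p
    return likelihood


# Hypothetical (a, b) parameters for three items and one response pattern.
items = [(1.2, -0.5), (0.8, 0.0), (1.5, 1.0)]
print(pattern_likelihood(0.0, items, [1, 1, 0]))
```

If two items share variance beyond the latent trait, for example because they refer to the same reading passage, this product no longer gives the pattern probability, which is why the assumption needs its own check rather than being inferred from unidimensionality.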

Assessing Cyber-Emotional Skills in the Digital Age: The Turkish Adaptation and Measure Invariance of the E-Motions Scale

HAYEF: Journal of Education, 2025

In an era dominated by digital connectivity, online platforms have emerged as critical arenas where digital natives’ behavioral patterns, emotional expressions, and social interactions converge and crystallize. While extensive research has examined various aspects of digital relationships, there is a compelling imperative to prioritize the investigation of emotional dimensions, as these components offer crucial insights into psychological well-being and interpersonal dynamics in virtual spaces. This study addresses a significant methodological gap by validating and evaluating the psychometric properties of the E-motions questionnaire in the Turkish context. Employing stratified sampling, data were collected from 332 high school students. Confirmatory factor analysis results indicate a robust fit of the adapted scale to a four-dimensional, 21-item structure. The scale demonstrates high internal consistency, with a Cronbach’s α coefficient of 0.933 and a McDonald’s ω coefficient of 0.947. Measurement invariance assessments show strict invariance across gender, school type, social media use, and social media platforms. These findings not only validate the instrument’s psychometric integrity but also substantiate its utility for conducting meaningful cross-group comparisons in cyber-emotional research, contributing significantly to the growing body of literature on emotional competencies in online spaces.
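
For readers unfamiliar with the internal consistency coefficient reported above, here is a minimal sketch of Cronbach's α computed from a persons-by-items score matrix using the standard formula α = k/(k-1) · (1 - Σ item variances / variance of total scores). The data are invented for illustration and have nothing to do with the E-motions sample; McDonald's ω needs a fitted factor model and is not shown.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])
    # Sample variance of each item (column) and of the total score.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented 5-point responses: 6 respondents, 4 items.
scores = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 5, 5],
]
print(cronbach_alpha(scores))
```

The function needs at least two items and a non-zero total-score variance; otherwise the ratio is undefined.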

Examining the Performance of Artificial Intelligence in Scoring Students' Handwritten Responses to Open-Ended Items

TED Education and Science, 2025

Open-ended items, which have been used as a measurement method for centuries in the evaluation of student achievement, have many advantages, such as measuring high-level skills, providing rich diagnostic information about the student, and not having chance success. However, today, open-ended items cannot be used in exams with a large number of students due to the potential for errors in the scoring process and disadvantages in terms of labour, time, and cost. At this point, Artificial Intelligence (AI) has an important potential in scoring open-ended items. The aim of this study is to examine the scoring performance of AI in scoring students' handwritten responses to open-ended items. In the study, an achievement test consisting of 3 open-ended and 10 multiple-choice items was developed within the scope of the Measurement and Assessment in Education course at a state university. Open-ended items were scored in a structured way (0-1-2), while multiple-choice items were scored as true-false (0-1). 84 participants took part in the study, and the open-ended items were scored by the expert group and the AI tool (ChatGPT-4o). Images of the students' handwritten responses were scored by the AI tool in two different scenarios. In the first scenario, the AI tool was asked to score without being given any scoring criteria, whereas in the second scenario it was asked to score according to the standard scoring criteria. The findings showed low agreement and correlation coefficients between the AI scores without criteria and the expert scores, and high agreement and correlation coefficients between the AI scores with standard scoring criteria and the expert scores. Similarly, while the item discriminations of the AI scoring without criteria were quite low, the item discriminations of the AI scores with standard scoring criteria were high. In the study, the reasons for the discrepancies between expert scores and AI scores with standard criteria were also investigated and reported. The results show that AI can score handwritten open-ended items with standardized scoring criteria at a good level. In the future, with the development and transformation of AI, it is thought that it can reach scoring accuracy comparable to expert raters in terms of consistency.
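
The abstract mentions agreement coefficients between AI and expert scores without naming them. As an assumption for illustration, the sketch below computes exact agreement and a quadratic weighted kappa, a coefficient often used for ordinal scores such as 0-1-2. The score vectors are invented and the function names are hypothetical.

```python
def exact_agreement(r1, r2):
    """Share of responses on which the two raters gave the same score."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def quadratic_weighted_kappa(r1, r2, categories=(0, 1, 2)):
    """Weighted kappa with quadratic disagreement weights for ordinal scores."""
    n, k = len(r1), len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[idx[a]][idx[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            num += weight * observed[i][j]   # observed weighted disagreement
            den += weight * row[i] * col[j]  # disagreement expected by chance
    return 1 - num / den


# Invented scores for 10 responses to one open-ended item.
expert = [2, 1, 0, 2, 1, 1, 0, 2, 2, 1]
ai_with_rubric = [2, 1, 0, 2, 1, 2, 0, 2, 1, 1]
print(exact_agreement(expert, ai_with_rubric))
print(quadratic_weighted_kappa(expert, ai_with_rubric))
```

The kappa is undefined when both raters use a single category for every response, because the chance-disagreement term is then zero.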

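Examining the Performance of Artificial Intelligence in Scoring Students' Handwritten Responses to Open-Ended Items [Öğrencilerin El Yazısıyla Yanıtladığı Açık Uçlu Maddelerin Puanlanmasında Yapay Zekâ Performansının İncelenmesi]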

TED Eğitim ve Bilim, 2025

Open-ended items, used for centuries as a measurement method in evaluating student achievement, have many advantages, such as measuring higher-order skills, providing rich diagnostic information about the student, and leaving no room for success by chance. Today, however, open-ended items cannot be used in exams taken by large numbers of students, because error can enter the scoring process and because they are costly in labour, time, and money. At this point, Artificial Intelligence (AI) holds important potential for scoring open-ended items. The aim of this study is to examine the scoring performance of AI in scoring students' handwritten responses to open-ended items. An achievement test consisting of 3 open-ended and 10 multiple-choice items was developed within the Measurement and Assessment in Education course at a state university. The open-ended items were scored in a structured way (0-1-2), while the multiple-choice items were scored as right or wrong (0-1). The study involved 84 participants, and the open-ended items were scored by an expert group and by the AI tool (ChatGPT-4o). Images of the students' handwritten responses were scored by the AI tool in two scenarios. In the first, the AI tool was asked to score without being given any scoring criteria; in the second, it was asked to score according to standard scoring criteria. The findings showed low agreement and correlation coefficients between the criterion-free AI scores and the expert scores, and high agreement and correlation coefficients between the AI scores based on standard criteria and the expert scores. Similarly, the item discriminations of the criterion-free AI scoring were quite low, whereas those of the AI scoring based on standard criteria were high. The reasons for the discrepancies between the expert scores and the AI scores based on standard criteria were also investigated and reported. The results show that AI can score handwritten open-ended items at a good level when given standard scoring criteria. In the future, as AI develops and changes, it is thought that it may reach scoring accuracy comparable to expert raters in terms of consistency.

KEYWORDS
Open-ended item, Artificial intelligence, AI, ChatGPT, Automated scoring, Handwritten responses, Constructed-response item

Reliability Generalization of the Social Appearance Anxiety Scale with Meta-Analysis [Sosyal Görünüş Kaygısı Ölçeği’nin Meta Analiz ile Güvenirlik Genellemesi]

2022

The Social Appearance Anxiety Scale (SAAS) is one of the self-report scales that measure the anxiety arising when people form a negative body image about their body and appearance. This study presents a reliability generalization of the internal consistency estimates of the Social Appearance Anxiety Scale, developed by Hart et al. (2008) and consisting of 16 items and a single factor. The search of the selected databases yielded 99 studies. Of these, 5 did not use the scale, 23 did not report a reliability coefficient, and 1 could not be accessed. The reliability generalization was conducted with 68 studies that included a reliability coefficient for the scale. The pooled reliability coefficient was 0.939 [0.869, 0.972]. The sources of variability among the reliability coefficients reported in the studies were examined by moderator variables. The findings of this study show that researchers can use the Social Appearance Anxiety Scale reliably. Keywords: social appearance anxiety, reliability, internal consistency, reliability generalization, meta-analysis.

The Relationship Between Social Media Addiction and Depression in Turkey: A Meta-Analysis Study [Türkiye’de Sosyal Medya Bağımlılığı ile Depresyon Arasındaki İlişki: Bir Meta Analiz Çalışması]

8. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Kongresi, 2022

The aim of this research is to examine the relationship between social media addiction and depression in Turkey using the meta-analysis method. To this end, searches were conducted in various databases with the selected keywords. The meta-analysis, carried out with a random-effects model, included 29 studies: articles and graduate theses conducted with samples from Turkey that met the inclusion criteria. The data were analyzed in RStudio using the metafor package. According to the preliminary findings, there is a positive, moderate relationship between social media addiction and depression. Keywords: social media addiction, depression, meta-analysis
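
The study ran its random-effects model with the metafor package in R; the abstract does not state the estimator or the effect-size transformation. Purely as an illustration of the idea, the Python sketch below pools correlations on the Fisher z scale with the DerSimonian-Laird estimator of between-study variance, which is a substitute chosen here for simplicity and may differ from what the authors used. The input correlations and sample sizes are invented.

```python
import math


def random_effects_correlation(rs, ns):
    """Pool correlations with a DerSimonian-Laird random-effects model.

    Returns (pooled_r, (ci_low, ci_high), tau2). Needs at least two studies.
    """
    zs = [math.atanh(r) for r in rs]       # Fisher z transform
    vs = [1.0 / (n - 3) for n in ns]       # sampling variance of each z
    w = [1.0 / v for v in vs]
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(rs) - 1)) / c)  # between-study variance
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    ci = (math.tanh(z_re - 1.96 * se), math.tanh(z_re + 1.96 * se))
    return math.tanh(z_re), ci, tau2


# Invented study-level correlations and sample sizes.
pooled, ci, tau2 = random_effects_correlation(
    rs=[0.25, 0.41, 0.33, 0.18], ns=[120, 300, 210, 95]
)
print(pooled, ci, tau2)
```

When the studies disagree more than sampling error would predict, tau2 grows, the weights become more equal across studies, and the confidence interval widens.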

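The Effect of Test Design on Routing Test Takers to Non-Optimal Modules in Computerized Multistage Tests [Bireyselleştirilmiş Çok Aşamalı Testlerde Test Tasarımının Test Katılımcılarının Optimal Olmayan Modüle Yönlendirilmesine Etkisi]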

8. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Kongresi, 2022

Computerized Multistage Testing (MST) is a testing model in which the test taker progresses by completing stages and modules on a pre-assembled panel. In MST, routing the test taker to the module best suited to their ability level (the optimal module) is important both for measurement precision and for the logic of adaptive testing. The aim of this research is to examine the extent to which the basic components of MST design affect optimal routing. To this end, the research was conducted as a Monte Carlo simulation study. It was concluded that test length, routing module length, and constructing the routing module over a wide ability range positively affect the rate of routing to the optimal module. Keywords: computerized multistage testing, optimal routing, measurement precision, adaptive testing, computerized adaptive test.
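
The abstract does not describe the simulation design, so the following is only a toy Monte Carlo sketch of how a misrouting rate could be estimated. It assumes a 1-3 panel, a Rasch routing module with evenly spaced difficulties, number-correct routing, and ability cut points at -0.5 and 0.5; every one of these choices, and all names, are assumptions made for illustration.

```python
import math
import random


def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def optimal_module(theta, cuts=(-0.5, 0.5)):
    """Module a test taker should get: 0 = easy, 1 = medium, 2 = hard."""
    return sum(theta > c for c in cuts)


def routed_module(theta, difficulties, rng, cuts=(-0.5, 0.5)):
    """Module actually assigned from the simulated number-correct score."""
    score = sum(rng.random() < rasch_p(theta, b) for b in difficulties)
    # Cut scores: expected number correct at each ability cut point.
    thresholds = [sum(rasch_p(c, b) for b in difficulties) for c in cuts]
    return sum(score > t for t in thresholds)


def misrouting_rate(n_items, n_examinees=5000, seed=1):
    """Share of simulated test takers routed to a non-optimal module."""
    rng = random.Random(seed)
    difficulties = [-2 + 4 * i / (n_items - 1) for i in range(n_items)]  # n_items >= 2
    missed = 0
    for _ in range(n_examinees):
        theta = rng.gauss(0, 1)
        if routed_module(theta, difficulties, rng) != optimal_module(theta):
            missed += 1
    return missed / n_examinees


for length in (5, 10, 20):
    print(length, misrouting_rate(length))
```

Varying `n_items` or the spread of `difficulties` in a setup like this is one way to explore the kind of design effects the study reports, though the numbers from this toy model should not be read as reproducing its results.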

Papers by Mahmut Sami Yigiter

The Effect of Test Design on Misrouting in Computerized Multistage Tests [Bireyselleştirilmiş Çok Aşamalı Testlerde Test Tasarımının Yanlış Yönlendirmeye Etkisi]

Uluslararası Türk Eğitim Bilimleri Dergisi

Computerized Multistage Testing (MST) is an adaptive testing approach in which the test taker completes stages and modules on a pre-assembled panel according to his/her ability level. In MST, the test taker is routed to a module in the following stage based on his/her responses to the module in each stage. The test taker is expected to be routed to the module that fits his/her ability level best in the following stages. If the test taker is not routed to the module appropriate to his/her ability level, misrouting is said to occur. Misrouting is thought to affect both measurement accuracy and the test taker's psychology. Although it is very difficult to completely eliminate misrouting, it is assumed that it can be reduced with the basic components of the MST design. The purpose of this study is to determine the level of misrouting according to different MST designs and to investigate the effects of changes in test design on the level of misrouting. The main components that are co...

Comparison of Expert Opinions and ChatGPT Performance in Predicting Item Difficulties [Madde Güçlüklerinin Tahmin Edilmesinde Uzman Görüşleri ve ChatGPT Performansının Karşılaştırılması]

Disiplinlerarası Eğitim Araştırmaları Dergisi

This study investigated the use of ChatGPT artificial intelligence technology as a supporting element in education. ChatGPT's performance in answering multiple-choice test items and in classifying the difficulty levels of these items was examined. Item difficulty levels were determined from the responses of 4930 students to a test of 20 five-option multiple-choice items. The relationships between these difficulty levels and the classifications made by ChatGPT and by experts were examined. According to the findings, ChatGPT's performance in answering the multiple-choice items correctly was not at a high level (55%). However, in classifying item difficulty levels, ChatGPT showed a correlation of 0.748 with the actual item difficulty levels and 0.870 with expert opinions. According to these results, it is thought that ChatGPT can provide support in the test development stages in cases where a pilot administration cannot be conducted or expert opinions cannot be consulted...

Reliability Generalization of Social Appearance Anxiety Scale: A Meta Analysis Study

Hacettepe Üniversitesi Eğitim Fakültesi dergisi/Hacettepe eğitim dergisi, Jan 25, 2024

The Social Appearance Anxiety Scale (SAAS) is one of the self-report scales that measure the anxiety that occurs when people form a negative body image about their body and appearance. This study provides a reliability generalization about the internal consistency estimates of the Social Appearance Anxiety Scale, which consists of 16 items and a single factor developed by Hart et al. (2008). As a result of the search in the identified databases, 96 studies were found. In 4 of these studies, the scale was not used, 23 did not report the reliability coefficient and 1 study could not be accessed. Reliability generalization study was conducted with 68 studies including the reliability coefficient of the relevant scale. It was concluded that the average reliability coefficient was .937 [.930-.943]. As a result of moderator analyses, it was concluded that there was a statistically significant difference in Cronbach's alpha coefficient according to the subcategories of "language of the scale" and "country of the participants" variables, but there was no statistically significant difference according to the subcategories of "language of the article", "sample type" and "field of study" variables and "average age" variable. With this study, it was concluded that it would not be appropriate to generalize, that is, to use reliability induction, since the reliability coefficients of the Social Appearance Anxiety Scale obtained in different languages and different countries differ. It is recommended that the authors calculate reliability estimates for the data sets they have and report the reliability coefficients obtained.

Cross-National Measurement of Mathematics Intrinsic Motivation: An Investigation of Measurement Invariance with MG-CFA and Alignment Method Across Fourteen Countries

Kuramsal eğitim bilim dergisi, Jan 28, 2024

One of the main objectives of international large-scale assessments is to make comparisons between different countries, education policies, education systems, or subgroups. One of the main criteria for making comparisons between different groups is to ensure measurement invariance. The purpose of this study was to test the measurement invariance of the mathematics intrinsic motivation scale across 14 countries. For this purpose, the "students like learning mathematics" scale, which measures intrinsic motivation for mathematics, was included in the TIMSS 2019 cycle. The study sample consisted of a total of 152992 students, 70192 4th grade and 82800 8th grade students from 14 different countries participating in the TIMSS 2019 cycle. Measurement invariance was tested with Multi-Group Confirmatory Factor Analysis (MG-CFA) and Alignment Method. The mathematics intrinsic motivation scale provides only configural invariance according to MG-CFA at the 4th grade level, whereas the scale provides approximate invariance according to the alignment method. At the 8th grade level, the scale provides configural and metric invariance according to MG-CFA, whereas the scale provides approximate invariance according to the alignment method. The results indicate that the mathematics intrinsic motivation scale provides approximate measurement invariance at both grade levels and that comparisons can be made between the scores of the identified countries.
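
MG-CFA tests invariance as a ladder of nested models (configural, metric, scalar, strict), stopping at the first step where fit deteriorates too much. The abstract does not say which decision rule was applied. The sketch below assumes one widely used heuristic, a drop in CFI of no more than 0.01 between adjacent steps, and takes already-fitted CFI values as input; the values shown are hypothetical, and fitting the models themselves is outside this example.

```python
LEVELS = ("configural", "metric", "scalar", "strict")


def highest_invariance_level(cfi_by_level, max_drop=0.01):
    """Highest invariance step supported under a delta-CFI rule.

    Assumes the configural model already fits acceptably. max_drop is the
    largest CFI decrease tolerated between adjacent nested models.
    """
    reached = LEVELS[0]
    for previous, current in zip(LEVELS, LEVELS[1:]):
        if current not in cfi_by_level:
            break
        if cfi_by_level[previous] - cfi_by_level[current] > max_drop:
            break
        reached = current
    return reached


# Hypothetical CFI values from four nested multi-group models.
cfi = {"configural": 0.970, "metric": 0.965, "scalar": 0.940, "strict": 0.939}
print(highest_invariance_level(cfi))
```

With many groups, exact scalar invariance under a rule like this often fails, which is the situation the alignment method is designed for: it looks for approximate invariance instead of requiring every loading and intercept to be equal.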

Computerized Multistage Testing: Principles, Designs and Practices with R

Measurement: Interdisciplinary Research and Perspectives

Comparison of Expert Opinions and ChatGPT Performance in Predicting Item Difficulties [Madde Güçlüklerinin Tahmin Edilmesinde Uzman Görüşleri ve ChatGPT Performansının Karşılaştırılması]

Disiplinlerarası Eğitim Araştırmaları Dergisi, 2023

In this study, ChatGPT's performance in answering multiple-choice test items and classifying the difficulty levels of these items was examined. The items' actual difficulty levels were determined from the responses of 4930 students to a test of 20 five-option multiple-choice items. The relationships between these difficulty levels and the classifications made by ChatGPT and experts were tested. The findings demonstrated that ChatGPT's performance in giving correct answers to multiple-choice items was at a moderate level (55%). However, in terms of classifying item difficulty levels, ChatGPT showed a correlation of 0.748 with actual item difficulty levels and 0.870 with expert opinions. According to these results, it is thought that ChatGPT can be used to support test development in cases where a trial administration cannot be conducted or expert opinions cannot be consulted. In large-scale exams, ChatGPT-like artificial intelligence technologies can be utilized under expert supervision.
Keywords: ChatGPT, artificial intelligence, item difficulties, expert opinion
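
The comparison described above rests on two simple quantities: an empirical difficulty index per item and a correlation between that index and someone's predicted difficulty. The sketch below assumes the classical index (proportion of correct answers) and a Pearson correlation; the abstract does not specify either, and the response matrix and predictions are invented.

```python
import math


def item_difficulties(responses):
    """Classical difficulty index per item: proportion of correct (1) answers."""
    n = len(responses)
    return [sum(row[j] for row in responses) / n for j in range(len(responses[0]))]


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented 0/1 responses: 6 students x 4 items.
responses = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]
p_values = item_difficulties(responses)

# Hypothetical predicted "easiness" per item (3 = easy, 1 = hard).
predicted_easiness = [3, 3, 2, 1]
print(p_values, pearson(p_values, predicted_easiness))
```

Note the direction of the index: a higher proportion correct means an easier item, so predictions coded as "hardness" would correlate negatively with it.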

Investigation of Measurement Invariance of Mathematics Affective Characteristic Factors According to Gender: TIMSS 2019 Turkey Sample [Matematik Duyuşsal Özellik Faktörlerinin Cinsiyete Göre Ölçme Değişmezliğinin İncelenmesi: TIMSS 2019 Türkiye Örneği]

Anadolu University Journal of Education Faculty (AUJEF), 2023

One of the main objectives of large-scale assessments is to draw conclusions about education policies or education systems by making comparisons between different countries or subgroups. One of the main criteria for making comparisons between different groups is to satisfy measurement invariance. Measurement invariance indicates that the measured construct is psychometrically equivalent between groups. Claims of differences in comparisons without evidence of measurement invariance can be unreliable. The aim of this study was to test the measurement invariance of the model created with mathematics affective characteristics according to gender. For this purpose, the Mathematics Affective Characteristics Model was created with the scales of Like Learning Mathematics (MOS), Instructional Clarity in Mathematics Lessons (MON), Disorderly Behavior During Mathematics Lessons (MDDD), Students Confident in Mathematics (MKG) and Students Value Mathematics (MDV) in the TIMSS 2019 cycle. The sample of the study consists of 3658 students from Turkey who participated in the TIMSS 2019 cycle at the 8th grade level. In the first part of the study, Confirmatory Factor Analysis (CFA) was conducted to examine the factor structure of the mathematics affective characteristics model. The CFA results show that model-data fit was achieved (RMSEA=0.046, SRMR=0.051, CFI=0.973 and TLI=0.975). In the measurement invariance analysis, the stages were tested hierarchically with Multi-Group CFA (MG-CFA). The findings show that the mathematics affective characteristics model meets the configural, metric, scalar, and strict invariance stages, respectively. Therefore, the factor loadings, variances, error variances and covariances of the mathematics affective characteristics model were equivalent according to gender, and it was concluded that meaningful comparisons could be made between the groups. After examining measurement invariance, t-test analyses were conducted to examine the significant differences of the variables in the model according to gender. The results indicate that there is a significant difference in favor of boys in the MON scale, in favor of girls in the MKG and MDDD scales, while there is no significant difference in the MDV and MOS variables according to gender.

Does Quantum Learning Model Increase Academic Achievement A Meta-Analysis Study [Kuantum Öğrenme Modeli Akademik Başarıyı Arttırıyor mu Bir Meta-Analiz Çalışması]

Cumhuriyet International Journal of Education, 2023

Quantum Learning Model (QLM) is a model that enables students to have a joyful learning experience, aims to realise permanent learning, and aims to learn by making sense in the mind of the individual. This study aimed to systematically synthesise the effect of QLM on academic achievement through a meta-analysis of existing research. A search of five databases yielded 25 studies that met the inclusion criteria. The findings of the random effects meta-analysis showed that the effect of QLM on academic achievement was positive and large (d=1.051 [0.769, 1.331], p<.05). The moderator analyses indicated that publication year, sample size, publication type, course, country and pretest status were not significant sources of heterogeneity. The highest effect of QLM on academic achievement was found at the middle school level, followed by primary school, high school and university levels, respectively. The results of the study suggest that QLM is effective on academic achievement. The study also provides suggestions for future studies on QLM.
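
The pooled effect above is reported as a standardized mean difference d. As a reminder of what one study contributes to such a synthesis, the sketch below computes Cohen's d from group summary statistics together with the usual large-sample approximation of its sampling variance. The means, standard deviations and group sizes are invented, and the abstract does not say whether a small-sample correction was applied.

```python
import math


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    return (mean_t - mean_c) / pooled_sd


def d_variance(d, n_t, n_c):
    """Large-sample approximation of the sampling variance of d."""
    return (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))


# Invented post-test results: treatment group vs. control group.
d = cohens_d(mean_t=80, sd_t=10, n_t=30, mean_c=70, sd_c=10, n_c=30)
print(d, d_variance(d, 30, 30))
```

In a random-effects synthesis, each study's d is weighted by the inverse of this variance plus the estimated between-study variance.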

The Relationship Between Problematic Social Media Use and Depression: A Meta-Analysis Study

Current Psychology

Scaling the Problems Teachers Experience in Distance Education with the Law of Rank-Order Judgments [Öğretmenlerin Uzaktan Eğitimde Yaşadığı Sorunların Sıralama Yargıları Kanunuyla Ölçeklenmesi]

Boğaziçi Üniversitesi dergisi, eğitim bilimleri, Jul 20, 2022

The aim of this study is to determine the order of importance of the problems teachers experience in distance education. To this end, the survey method, a type of quantitative research, was used. First, the problems teachers experience in distance education were identified. The scale built from these problems was then administered to the 906 teachers in the study group, all of whom were actively teaching via distance education. The data were analyzed using scaling by rank-order judgments, one of the scaling approaches. According to the results, the two most important problems teachers experienced were "students' reluctance to attend classes" and "internet access problems", while the two problems seen as less important were "online course software problems" and "insufficient course materials in distance education". In addition, "students' reluctance to attend classes" was not seen as an important problem at the primary school level, but was found to be an important problem at the middle and high school levels.
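
The abstract names the scaling method but not its computational steps. The sketch below shows one common variant, stated here as an assumption: derive pairwise "ranked above" proportions from each judge's ranking, convert them to unit-normal deviates, average them per stimulus (a Thurstone Case V style solution), and shift the scale so that the lowest value is zero. The rankings and names are invented.

```python
from statistics import NormalDist


def scale_from_rankings(rankings, n_stimuli, eps=0.01):
    """Scale values from rank-order judgments.

    rankings: one list per judge, stimulus indices ordered from most to
    least important. Proportions are clipped to [eps, 1 - eps] so that
    unanimous judgments do not produce infinite z values.
    """
    wins = [[0] * n_stimuli for _ in range(n_stimuli)]
    for order in rankings:
        for pos, a in enumerate(order):
            for b in order[pos + 1:]:
                wins[a][b] += 1  # a was ranked as more important than b
    n_judges = len(rankings)
    normal = NormalDist()
    values = []
    for j in range(n_stimuli):
        zs = []
        for i in range(n_stimuli):
            if i == j:
                continue  # the diagonal is left out of the average
            p = min(max(wins[j][i] / n_judges, eps), 1 - eps)
            zs.append(normal.inv_cdf(p))
        values.append(sum(zs) / len(zs))
    lowest = min(values)
    return [v - lowest for v in values]


# Invented rankings of three problems (0, 1, 2) by four judges.
rankings = [[0, 1, 2], [0, 2, 1], [1, 0, 2], [0, 1, 2]]
print(scale_from_rankings(rankings, 3))
```

The resulting values form an interval-type scale: differences between problems are interpretable, while the zero point is arbitrary.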


Books by Mahmut Sami Yigiter

Gender Differences Measurement Invariance in Test Anxiety across the World Evidence From PISA 2015 Yigiter Boduroglu

Recently, students have become more anxious in their education process due to increasingly competitive conditions and rising academic expectations. Pressure from the environment, family, or school stakeholders to get high grades increases the stress on students through the fear of failure. Test anxiety refers to the feelings of fear, worry, and tension that students may experience in relation to their academic performance, school workload, and the expectations placed on them (Chamberlain, Daly, and Spalding, 2011; Putwain and Daly, 2014). In other words, test anxiety describes the change in students' stress and anxiety levels due to the exams or studies they perform at school. The main triggers of test anxiety include fear of failure, pressure from high demands, competition with peers, and lack of self-confidence (Ringeisen & Raufelder, 2015; Tan & Pang, 2023). There is broad consensus in the literature that test anxiety is related to academic achievement (Ali & Mohsin, 2013; Crişan & Copaci, 2015; von der Embse, Jester, Roy, & Post, 2018). In their meta-analysis of 238 studies on test anxiety published since 1988, von der Embse et al. (2018) reported that test anxiety has a negative effect on many educational performance indicators. In addition, studies show that test anxiety is closely related to psychological symptoms such as depressive symptoms, stress, and emotional balance disorders (Augner, 2015; von der Embse, Barterian, & Segool, 2013). Test anxiety can also lead to physical symptoms such as heart palpitations, rapid pulse, rapid breathing, sweating, headache, abdominal pain or nausea, sleep problems, and fatigue (Mashayekh & Hashemi, 2011; Chishti & Rana, 2021).
In addition, the literature states that test anxiety can have future-oriented effects such as skipping classes and delaying or giving up academic goals (Pekrun, 2006; Lowe et al., 2008). Therefore, test anxiety can be expected to harm students' academic performance, mental health, and overall quality of life if it is not given due importance.

Test Anxiety

Any attempt to measure students' academic development in which academic performance is evaluated brings to mind the concept of test anxiety, which in a sense causes students to react with anxiety (Hodapp, Glanzmann, & Laux, 1995). Test anxiety comprises the mental, psychological, or physical behavioral reactions that occur due to worry about the possible negative consequences of failing an exam or an assessment (Zeidner, 1998). School assignments, exams, pressure to get high grades, and fear of getting low grades are seen as the most prominent causes of test anxiety (McDonald, 2001; Yakıcı and Kandemir, 2021; Demir, 2022). Some students develop anxiety when they cannot solve tasks at school, when they have problems with homework, when they are preparing for an exam, or when they feel that they will take an exam (Zeidner, 2007). Anxiety is expected to be higher in students with low confidence in themselves or their abilities and in students whose parents have high expectations. Students with test anxiety are more likely to underperform, be absent frequently, or drop out of school completely (Cortina, 2008; Ramirez & Beilock, 2011). Anxiety can affect students' motivation and disrupt their learning strategies (Varasteh, Ghanizadeh, & Akbari, 2016). In addition, anxiety has many effects on mental, psychological, and physical health and on quality of life (Lohiya et al., 2021; Chen et al., 2023).

Gender Differences in Test Anxiety

There are many studies reporting a gender difference in test anxiety between male and female groups.
In self-reported test anxiety, girls are generally found to have more test anxiety than boys (Hembree, 1988; Seipp & Schwarzer, 1996; von der Embse et al., 2018). Donati et al. (2020) stated that girls have higher test anxiety than boys on the German Test Anxiety Inventory. Devine, Fawcett, Szűcs, and Dowker (2012) examined the relationships between mathematics performance, mathematics anxiety, and test anxiety by gender and found that girls have higher test anxiety than boys. Robson, Johnstone, Putwain, and Howard (2023), in a meta-analysis of articles written on test anxiety in the last 20 years, report that girls show higher test anxiety than boys. Overall, many studies in the literature indicate that girls have higher test anxiety than boys. Some studies have also examined why girls may have higher levels of test anxiety than boys. These studies state that women may be more prone to anxiety, stress, or depression due to coping style, socialization, or genetic factors (Goodwin & Gotlib, 2004; Olatunji et al., 2013). Women show their emotions more easily and therefore, in terms of socialization practice, are more likely to have higher levels of anxiety (Chaplin, 2015). A higher anxiety tendency may in turn lead to higher test anxiety (McLean et al., 2011). Gender differences in test anxiety may also stem from the sub-dimensions of test anxiety: the difference is sometimes higher in both the emotionality and anxiety components (Putwain, 2007) and sometimes only in the emotionality component (Zeidner & Nevo, 1992). The self-report method of measuring test anxiety may also be a variable in explaining the gender difference. As a result, it is thought that new studies should be conducted to better determine the gender difference in test anxiety.
Measurement Invariance

Large-scale international assessments play an important role in comparing the qualifications of individuals across countries. The Programme for International Student Assessment (PISA) provides data on students' academic achievement for cross-country comparisons. In such large-scale assessments, the precondition for making comparisons between groups is to ensure statistical equivalence between groups (Millsap & Olivera-Aguilar, 2012). While test adaptation takes place before data collection, the measurement invariance of scores is assessed after data collection to decide on score equivalence between groups. Measurement invariance indicates that the item parameters of a scale are equivalent across groups (Vandenberg & Lance, 2000). Scores meet the condition of measurement invariance if individuals with similar characteristics are equally likely to choose a particular item response, regardless of group membership. In the development of each measurement tool, it is assumed as a basic principle that the tool measures the same trait in each group to which it is applied. In practice, however, the results may differ across the groups to which it is applied. If the results obtained from the groups do not have equivalent psychometric qualities, it would not be correct to compare the results between the groups (Başusta & Gelbal, 2015; Yiğiter, 2023). Therefore, the measurement tool should measure the same construct in the same way in each subgroup. Showing that the factor loadings, inter-dimensional correlations, and error variances of a measurement model are the same in each group demonstrates that the measurement tool has the same structure in different groups (Jöreskog & Sörbom, 1993). Researchers thereby obtain evidence on whether the scale measures the same construct in subgroups (Millsap & Olivera-Aguilar, 2012). If measurement invariance cannot be ensured, a validity problem arises for the measurement tool.
Therefore, the interpretations drawn from group comparisons based on the scores obtained from this measurement tool may also be incorrect (Vandenberg & Lance, 2000). Measurement invariance also provides evidence of the validity of the instrument. He et al. (2019), in their research on cross-cultural comparability with TIMSS and PISA data, state that comparisons made without examining measurement invariance may lead to incorrect results, hence the importance of testing measurement invariance. If measurement invariance is not ensured, it is not possible to know whether a difference between scores obtained from the measurement tool reflects a real difference in the construct or is an artifact of the scale items (Horn & McArdle, 1992).

Öğretmen Yapımı Testler İçin Yapay Zekâ Destekli Geribildirim [Artificial Intelligence-Supported Feedback for Teacher-Made Tests]

Journal of Applied Measurement and Assessment, 2024

Teacher-made tests help teachers get to know their students more closely and identify their strengths and weaknesses. However, because these tests usually cannot be piloted and expert opinion cannot be obtained, various problems may arise in terms of their psychometric properties. Artificial intelligence (AI) is thought to be able to provide important support in developing teacher-made tests and improving their psychometric properties. This study aimed to examine the feedback that AI-supported tools provide for achievement tests. To this end, the feedback provided by AI-supported tools (ChatGPT 4o, Gemini, and Copilot) was examined for an 8th grade English achievement test that had been administered to 5115 students with 20 multiple-choice items. The feedback concerned the test's content validity, item difficulty indices, and item response times. According to the results, the content validity index across the three AI tools was quite high (CVI=0.97). The agreement among the AI tools' classifications of item difficulty was found to be moderate (α=0.71). Finally, all three AI tools indicated that the total response time for the items did not exceed one class period (40 min). These results show that AI tools can give teachers important support in improving the psychometric properties of teacher-made tests and can contribute to increasing the validity and reliability of a test by offering various analyses and feedback.
Keywords: Teacher-made tests, artificial intelligence, feedback, validity, item response time
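
As an illustration of how a content validity index such as the one reported in this abstract can be computed, the following minimal Python sketch uses one common convention: the item-level CVI is the share of raters who mark an item as relevant, and the scale-level value is the average of the item-level values. The abstract does not state which CVI variant the study used, so the 4-point relevance scale, the cut-off of 3, the function names, and the ratings below are all hypothetical.

```python
def item_cvi(ratings, relevant_from=3):
    """Item-level CVI: share of raters whose rating is at least `relevant_from`.

    The 4-point relevance scale and the cut-off of 3 are assumptions."""
    return sum(r >= relevant_from for r in ratings) / len(ratings)


def scale_cvi_average(rating_matrix, relevant_from=3):
    """Scale-level CVI (averaging approach): mean of the item-level CVIs.

    Rows are items, columns are raters (here, three hypothetical AI tools)."""
    values = [item_cvi(row, relevant_from) for row in rating_matrix]
    return sum(values) / len(values)


# Hypothetical relevance ratings for four items from three raters.
example_ratings = [
    [4, 4, 3],
    [4, 3, 4],
    [2, 4, 4],
    [4, 4, 4],
]

if __name__ == "__main__":
    print(round(scale_cvi_average(example_ratings), 3))
```

With real data, each row would hold the relevance ratings one item received from each tool or expert.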

Item Response Theory Assumptions: A Comprehensive Review of Studies with Document Analysis

International Journal of Educational Studies and Policy, 2024

Item Response Theory (IRT), over its nearly 100-year history, has become one of the most popular methodologies for modeling response patterns in measures in education, psychology, and health. Due to its advantages, IRT is particularly popular in large-scale assessments. A precondition for the validity of the estimates obtained from IRT is that the data meet the model assumptions. The purpose of this study is to examine how model assumptions are tested in studies using IRT models. For this purpose, 107 studies in the National Thesis Center of the Council of Higher Education that apply an IRT model to real data were examined. The studies were analyzed according to sample size, unidimensionality, local independence, overall model fit, item fit, and non-speedness criteria. According to the results, the unidimensionality assumption was tested at a high rate (89%), predominantly with factor analytic approaches. The local independence assumption was not tested in 36% of the studies, unidimensionality was cited as evidence for it in 40% of the studies, and it was tested directly in 24% of the studies. Overall model fit was tested at a moderate rate (51%), using log-likelihood and information criteria. Item fit and non-speedness were tested at low rates (26% and 9%, respectively). IRT assumptions should be considered as a whole, and all assumptions should be tested from an evidence-based perspective.

Assessing Cyber-Emotional Skills in the Digital Age: The Turkish Adaptation and Measure Invariance of the E-Motions Scale

HAYEF: Journal of Education, 2025

In an era dominated by digital connectivity, online platforms have emerged as critical arenas where digital natives’ behavioral patterns, emotional expressions, and social interactions converge and crystallize. While extensive research has examined various aspects of digital relationships, there is a compelling need to prioritize the investigation of emotional dimensions, as these components offer crucial insights into psychological well-being and interpersonal dynamics in virtual spaces. This study addresses a significant methodological gap by validating and evaluating the psychometric properties of the E-motions questionnaire in the Turkish context. Employing stratified sampling, data were collected from 332 high school students. Confirmatory factor analysis results indicate a robust fit of the adapted scale to a four-dimensional, 21-item structure. The scale demonstrates high internal consistency, with a Cronbach’s α coefficient of 0.933 and a McDonald’s ω coefficient of 0.947. Measurement invariance assessments show strict invariance across gender, school type, social media use, and social media platforms. These findings not only validate the instrument’s psychometric integrity but also substantiate its utility for conducting meaningful cross-group comparisons in cyber-emotional research, contributing significantly to the growing body of literature on emotional competencies in online spaces.

Examining the Performance of Artificial Intelligence in Scoring Students' Handwritten Responses to Open-Ended Items

TED Education and Science, 2025

Open-ended items, which have been used as a measurement method for centuries in the evaluation of student achievement, have many advantages, such as measuring high-level skills, providing rich diagnostic information about the student, and leaving no room for success by chance. However, today, open-ended items cannot be used in exams with a large number of students due to the potential for errors in the scoring process and disadvantages in terms of labour, time, and cost. At this point, Artificial Intelligence (AI) has important potential in scoring open-ended items. The aim of this study is to examine the performance of AI in scoring students' handwritten responses to open-ended items. In the study, an achievement test consisting of 3 open-ended and 10 multiple-choice items was developed within the scope of the Measurement and Assessment in Education course at a state university. Open-ended items were scored in a structured way (0-1-2), while multiple-choice items were scored as true-false (0-1). 84 participants took part in the study, and the open-ended items were scored by the expert group and the AI tool (ChatGPT-4o). Images of the students' handwritten responses were scored by the AI tool in two different scenarios. In the first scenario, the AI tool was asked to score without being given any scoring criteria, whereas in the second scenario, it was asked to score according to the standard scoring criteria. The findings showed low agreement and correlation coefficients between the AI scores without criteria and the expert scores, and high agreement and correlation coefficients between the AI scores with standard scoring criteria and the expert scores.
Similar to these findings, while the item discriminations of the AI scoring without criteria were quite low, the item discriminations of the AI scores with standard scoring criteria were high. In the study, the reasons for the discrepancies between expert scores and AI scores with standard criteria were also investigated and reported. The results show that AI can score handwritten open-ended items with standardized scoring criteria at a good level. In the future, with the development and transformation of AI, it is thought that it can reach scoring accuracy comparable to expert raters in terms of consistency.

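Öğrencilerin El Yazısıyla Yanıtladığı Açık Uçlu Maddelerin Puanlanmasında Yapay Zekâ Performansının İncelenmesi [Examining the Performance of Artificial Intelligence in Scoring Students' Handwritten Responses to Open-Ended Items]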

TED Eğitim ve Bilim, 2025

Open-ended items, which have been used as a measurement method for centuries in the evaluation of student achievement, have many advantages, such as measuring high-level skills, providing rich diagnostic information about the student, and leaving no room for success by chance. However, today, open-ended items cannot be used in exams with a large number of students because errors can enter the scoring process and because they are disadvantageous in terms of labour, time, and cost. At this point, Artificial Intelligence (AI) has important potential in scoring open-ended items. The aim of this study is to examine the performance of AI in scoring students' handwritten responses to open-ended items. In the study, an achievement test consisting of 3 open-ended and 10 multiple-choice items was developed within the scope of the Measurement and Assessment in Education course at a state university. Open-ended items were scored in a structured way (0-1-2), while multiple-choice items were scored as true-false (0-1). 84 participants took part in the study, and the open-ended items were scored by the expert group and the AI tool (ChatGPT-4o). Images of the students' handwritten responses were scored by the AI tool in two different scenarios. In the first scenario, the AI tool was asked to score without being given any scoring criteria, whereas in the second scenario, it was asked to score according to the standard scoring criteria. The findings showed low agreement and correlation coefficients between the AI scores without criteria and the expert scores, and high agreement and correlation coefficients between the AI scores with standard criteria and the expert scores. Similarly, the item discriminations of the AI scoring without criteria were quite low, while the item discriminations of the AI scoring with standard criteria were high. The study also investigated and reported the reasons for the discrepancies between the expert scores and the AI scores with standard criteria. The results show that AI can score handwritten responses to open-ended items at a good level when given standard scoring criteria. In the future, with the development and transformation of AI, it is thought that it can reach scoring accuracy comparable to expert raters in terms of consistency.

KEYWORDS
Open-ended item, Artificial intelligence, AI, ChatGPT, Automated scoring, Handwritten responses, Constructed-response item

Sosyal Görünüş Kaygısı Ölçeği’nin Meta Analiz ile Güvenirlik Genellemesi [Reliability Generalization of the Social Appearance Anxiety Scale with Meta-Analysis]

Sosyal Görünüş Kaygısı Ölçeği’nin Meta Analiz ile Güvenirlik Genellemesi, 2022

The Social Appearance Anxiety Scale (SAAS) is one of the self-report scales that measure the anxiety that arises when people form a negative body image about their body and appearance. This study presents a reliability generalization of the internal consistency estimates of the Social Appearance Anxiety Scale, which was developed by Hart et al. (2008) and consists of 16 items and a single factor. The search of the selected databases yielded 99 studies. In 5 of these studies the scale was not used, 23 did not report a reliability coefficient, and 1 study could not be accessed. The reliability generalization was conducted with 68 studies that included the reliability coefficient of the scale. The pooled reliability coefficient was 0.939 [0.869, 0.972]. The sources of variability among the reliability coefficients reported in the studies were examined through moderator variables. The findings of this study indicate that researchers can use the Social Appearance Anxiety Scale reliably. Keywords: social appearance anxiety, reliability, internal consistency, reliability generalization, meta-analysis.

Türkiye’de Sosyal Medya Bağımlılığı ile Depresyon Arasındaki İlişki: Bir Meta Analiz Çalışması [The Relationship Between Social Media Addiction and Depression in Turkey: A Meta-Analysis Study]

8. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Kongresi, 2022

The aim of this research is to examine the relationship between social media addiction and depression in Turkey using meta-analysis. To this end, various databases were searched with selected keywords. The meta-analysis, carried out with a random-effects model, included 29 studies consisting of articles and graduate theses that were conducted with samples from Turkey and met the inclusion criteria. The data were analyzed in RStudio using the metafor package. According to the preliminary findings of the research, there is a positive, moderate relationship between social media addiction and depression. Keywords: social media addiction, depression, meta-analysis

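Bireyselleştirilmiş Çok Aşamalı Testlerde Test Tasarımının Test Katılımcılarının Optimal Olmayan Modüle Yönlendirilmesine Etkisi [The Effect of Test Design on Routing Test Takers to Non-Optimal Modules in Computerized Multistage Tests]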

8. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Kongresi, 2022

Computerized Multistage Testing [MST] is a testing model in which the test taker progresses by completing stages and modules on a pre-assembled panel. In MST, routing the test taker to the module best suited to his or her ability level (the optimal module) is important both for measurement precision and for the logic of adaptive testing. The aim of this research is to examine the extent to which the basic components of MST design affect optimal routing. To this end, the research was conducted as a Monte Carlo simulation study. It was concluded that test length, routing module length, and constructing the routing module over a wide ability range positively affect the rate of routing to the optimal module. Keywords: computerized multistage testing, optimal routing, measurement precision, adaptive testing, computerized adaptive test.
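
To make the idea of optimal versus non-optimal routing concrete, here is a minimal Monte Carlo sketch in Python. It is not the design used in the study: the 1-3 panel, the Rasch response model, the ability cut points of -0.5 and 0.5, the number-correct routing rule, and every function name are assumptions chosen only for illustration.

```python
import math
import random


def p_correct(theta, b):
    """Rasch probability of a correct answer (assumed response model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def optimal_module(theta, cuts=(-0.5, 0.5)):
    """Second-stage module that fits the true ability: 0=easy, 1=medium, 2=hard."""
    return sum(theta > c for c in cuts)


def routed_module(theta, difficulties, rng, cuts=(-0.5, 0.5)):
    """Route by number-correct score on the routing module.

    Score thresholds are the expected scores at the ability cut points."""
    score = sum(rng.random() < p_correct(theta, b) for b in difficulties)
    thresholds = [sum(p_correct(c, b) for b in difficulties) for c in cuts]
    return sum(score > t for t in thresholds)


def misrouting_rate(n_items, n_examinees=5000, seed=1):
    """Share of simulated examinees sent to a module other than the optimal one."""
    rng = random.Random(seed)
    # Routing items spread evenly over a wide difficulty range (requires n_items >= 2).
    difficulties = [-2.0 + 4.0 * i / (n_items - 1) for i in range(n_items)]
    wrong = 0
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)
        if routed_module(theta, difficulties, rng) != optimal_module(theta):
            wrong += 1
    return wrong / n_examinees


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, misrouting_rate(n))
```

Varying `n_items` or the spread of `difficulties` lets one explore, under these assumptions, how routing module length and ability range relate to the misrouting rate.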

Bireyselleştirilmiş Çok Aşamalı Testlerde Test Tasarımının Yanlış Yönlendirmeye Etkisi [The Effect of Test Design on Misrouting in Computerized Multistage Tests]

Uluslararası Türk Eğitim Bilimleri Dergisi

Computerized Multistage Testing (MST) is an adaptive testing approach in which the test taker completes stages and modules on a pre-assembled panel according to his/her ability level. In MST, the test taker is routed to a module in the following stage based on his/her responses to the module in each stage. The test taker is expected to be routed to the module that fits his/her ability level best in the following stages. If the test taker is not routed to the module appropriate to his/her ability level, misrouting occurs. Misrouting is thought to affect both measurement accuracy and the test taker's psychology. Although it is very difficult to completely eliminate misrouting, it is assumed that it can be reduced with the basic components of the MST design. The purpose of this study is to determine the level of misrouting according to different MST designs and to investigate the effects of changes in test design on the level of misrouting. The main components that are co...

Madde Güçlüklerinin Tahmin Edilmesinde Uzman Görüşleri ve ChatGPT Performansının Karşılaştırılması [Comparison of Expert Opinions and ChatGPT Performance in Predicting Item Difficulties]

Disiplinlerarası Eğitim Araştırmaları Dergisi

This study investigated the use of the ChatGPT artificial intelligence technology as a supporting element in education. ChatGPT's performance in answering multiple-choice test items and in classifying the difficulty levels of those items was examined. Item difficulty levels were determined from the responses of 4930 students to a test of 20 five-option multiple-choice items. The relationships between these difficulty levels and the classifications made by ChatGPT and by experts were examined. According to the findings, ChatGPT's performance in answering the multiple-choice items correctly was not at a high level (55%). However, in classifying item difficulty levels, ChatGPT showed a correlation of 0.748 with the actual item difficulty levels and 0.870 with expert opinions. According to these results, in cases where a trial administration cannot be carried out or expert opinions cannot be consulted, it is thought that ChatGPT can support the test development stages...

Reliability Generalization of Social Appearance Anxiety Scale: A Meta Analysis Study

Hacettepe Üniversitesi Eğitim Fakültesi dergisi/Hacettepe eğitim dergisi, Jan 25, 2024

The Social Appearance Anxiety Scale (SAAS) is one of the self-report scales that measure the anxiety that occurs when people form a negative body image about their body and appearance. This study provides a reliability generalization of the internal consistency estimates of the Social Appearance Anxiety Scale, which was developed by Hart et al. (2008) and consists of 16 items and a single factor. The search of the identified databases yielded 96 studies. In 4 of these studies the scale was not used, 23 did not report the reliability coefficient, and 1 study could not be accessed. The reliability generalization was conducted with 68 studies that included the reliability coefficient of the scale. The average reliability coefficient was .937 [.930-.943]. The moderator analyses showed a statistically significant difference in Cronbach's alpha coefficient across the subcategories of the "language of the scale" and "country of the participants" variables, but no statistically significant difference across the subcategories of the "language of the article", "sample type", and "field of study" variables or by the "average age" variable. Because the reliability coefficients of the Social Appearance Anxiety Scale obtained in different languages and different countries differ, it was concluded that it would not be appropriate to generalize, that is, to use reliability induction. It is recommended that authors calculate reliability estimates for their own data sets and report the reliability coefficients obtained.

Cross-National Measurement of Mathematics Intrinsic Motivation: An Investigation of Measurement Invariance with MG-CFA and Alignment Method Across Fourteen Countries

Kuramsal eğitim bilim dergisi, Jan 28, 2024

One of the main objectives of international large-scale assessments is to make comparisons between different countries, education policies, education systems, or subgroups. One of the main criteria for making comparisons between different groups is to ensure measurement invariance. The purpose of this study was to test the measurement invariance of the mathematics intrinsic motivation scale across 14 countries. For this purpose, the study used the "students like learning mathematics" scale from the TIMSS 2019 cycle, which measures intrinsic motivation for mathematics. The study sample consisted of a total of 152,992 students (70,192 in 4th grade and 82,800 in 8th grade) from 14 different countries participating in the TIMSS 2019 cycle. Measurement invariance was tested with Multi-Group Confirmatory Factor Analysis (MG-CFA) and the Alignment Method. At the 4th grade level, the mathematics intrinsic motivation scale provides only configural invariance according to MG-CFA, whereas it provides approximate invariance according to the alignment method. At the 8th grade level, the scale provides configural and metric invariance according to MG-CFA, whereas it provides approximate invariance according to the alignment method. The results indicate that the mathematics intrinsic motivation scale provides approximate measurement invariance at both grade levels and that comparisons can be made between the scores of the identified countries.

Computerized Multistage Testing: Principles, Designs and Practices with R

Measurement: Interdisciplinary Research and Perspectives

Madde Güçlüklerinin Tahmin Edilmesinde Uzman Görüşleri ve ChatGPT Performansının Karşılaştırılması [Comparison of Expert Opinions and ChatGPT Performance in Predicting Item Difficulties]

Disiplinlerarası Eğitim Araştırmaları Dergisi, 2023

This study examined ChatGPT's performance in answering multiple-choice test items and in classifying the difficulty levels of those items. The items' actual difficulty levels were determined from the responses of 4930 students to a test of 20 five-option multiple-choice items. The relationships between these difficulty levels and the classifications made by ChatGPT and by experts were then examined. The findings demonstrated that ChatGPT's performance in answering the multiple-choice items correctly was at a moderate level (55%). However, in classifying item difficulty levels, ChatGPT showed a correlation of 0.748 with the actual item difficulty levels and 0.870 with expert opinions. According to these results, it is thought that ChatGPT can support test development in cases where a trial administration cannot be conducted or expert opinions cannot be consulted. In large-scale exams, ChatGPT-like artificial intelligence technologies can also be utilized under expert supervision.
Keywords: ChatGPT, artificial intelligence, item difficulties, expert opinion
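
The comparison described in this abstract can be sketched in a few lines of Python: compute each item's difficulty index as the proportion of correct answers, sort the items into difficulty classes, and correlate two sets of classifications. This is only an illustrative sketch. The abstract does not say which cut-offs or which correlation coefficient were used, so the 0.40/0.70 thresholds, the choice of a rank (Spearman) correlation, the function names, and the toy data are all assumptions.

```python
from statistics import mean


def item_difficulty(responses):
    """Proportion of correct (1) answers per item; rows are students."""
    n_items = len(responses[0])
    return [mean(row[j] for row in responses) for j in range(n_items)]


def classify(p, hard_below=0.40, easy_above=0.70):
    """Map a difficulty index to 1=easy, 2=medium, 3=hard (assumed cut-offs)."""
    if p >= easy_above:
        return 1
    if p < hard_below:
        return 3
    return 2


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy data: 4 students x 3 items, plus a hypothetical AI classification.
toy_responses = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1]]
actual_classes = [classify(p) for p in item_difficulty(toy_responses)]
ai_classes = [1, 2, 3]  # hypothetical classes assigned by an AI tool

if __name__ == "__main__":
    print(actual_classes, spearman(actual_classes, ai_classes))
```

With real data, `toy_responses` would be replaced by the scored student-by-item matrix and `ai_classes` by the classifications collected from the tool or from experts.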

Matematik Duyuşsal Özellik Faktörlerinin Cinsiyete Göre Ölçme Değişmezliğinin İncelenmesi: TIMSS 2019 Türkiye Örneği [Investigation of Measurement Invariance of Mathematics Affective Characteristic Factors According to Gender: TIMSS 2019 Turkey Sample]

Anadolu University Journal of Education Faculty (AUJEF), 2023

One of the main objectives of large-scale assessments is to draw conclusions about education policies or education systems by making comparisons between different countries or subgroups. One of the main criteria for making comparisons between different groups is to satisfy measurement invariance. Measurement invariance indicates that the measured construct is psychometrically equivalent between groups. Claims of differences in comparisons made without evidence of measurement invariance can be unreliable. The aim of this study was to test the measurement invariance, by gender, of a model built from mathematics affective characteristics. For this purpose, the Mathematics Affective Characteristics Model was created with the scales of Like Learning Mathematics (MOS), Instructional Clarity in Mathematics Lessons (MON), Disorderly Behavior During Mathematics Lessons (MDDD), Students Confident in Mathematics (MKG), and Students Value Mathematics (MDV) in the TIMSS 2019 cycle. The sample of the study consists of 3658 students from Turkey who participated in the TIMSS 2019 cycle at the 8th grade level. In the first part of the study, Confirmatory Factor Analysis (CFA) was conducted to examine the factor structure of the mathematics affective characteristics model. The CFA results show that model-data fit is achieved (RMSEA=0.046, SRMR=0.051, CFI=0.973 and TLI=0.975). In the measurement invariance analysis, the invariance stages were tested hierarchically with Multi-Group CFA (MG-CFA). The findings show that the mathematics affective characteristics model meets the configural, metric, scalar, and strict invariance stages, respectively. Therefore, the factor loadings, variances, error variances, and covariances of the mathematics affective characteristics model were equivalent across gender, and it was concluded that meaningful comparisons could be made between the groups. After examining measurement invariance, t-test analyses were conducted to examine whether the variables in the model differ significantly by gender. The results indicate that there is a significant difference in favor of boys on the MON scale and in favor of girls on the MKG and MDDD scales, while there is no significant difference by gender on the MDV and MOS variables.

Does Quantum Learning Model Increase Academic Achievement? A Meta-Analysis Study [Kuantum Öğrenme Modeli Akademik Başarıyı Arttırıyor mu? Bir Meta-Analiz Çalışması]

Cumhuriyet International Journal of Education, 2023

Quantum Learning Model (QLM) is a model that enables students to have a joyful learning experience, aims at permanent learning, and aims for learning through meaning-making in the mind of the individual. This study aimed to systematically synthesise, through a meta-analysis of existing research, the effect of QLM on academic achievement. A search of five databases yielded 25 studies that met the inclusion criteria. The findings of the random effects meta-analysis showed that the effect of QLM on academic achievement was positive and large (d=1.051 [0.769, 1.331], p<.05). The moderator analysis indicated that publication year, sample size, publication type, course, country and pretest status were not significant sources of heterogeneity. The highest effect of QLM on academic achievement was found at the middle school level, followed by primary school, high school and university levels, respectively. The results of the study suggest that QLM is effective on academic achievement. The study also provides suggestions for future studies on QLM.
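The abstract reports a pooled effect from a random effects meta-analysis but does not say which estimator was used. As a rough illustration only, the sketch below assumes the common DerSimonian-Laird estimator and a normal-approximation 95% interval; the function name `random_effects_pool` and the four effect sizes are hypothetical and are not the study's data.

```python
import math


def random_effects_pool(effects, variances):
    """Pool effect sizes with the DerSimonian-Laird random-effects estimator.

    effects   -- per-study effect sizes (e.g. standardized mean differences)
    variances -- per-study sampling variances, same order as effects
    Returns (pooled effect, (ci_low, ci_high), tau squared).
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and mean, needed for Q.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures how far studies scatter around the fixed mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau2 to every study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Hypothetical effect sizes and variances, for illustration only.
example_effects = [0.6, 1.2, 0.9, 1.5]
example_variances = [0.04, 0.09, 0.05, 0.12]
pooled, ci, tau2 = random_effects_pool(example_effects, example_variances)
print(f"pooled d = {pooled:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}], tau^2 = {tau2:.3f}")
```

When the between-study variance estimate is zero, the random-effects weights reduce to the fixed-effect weights, so the two models give the same pooled value.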

The Relationship Between Problematic Social Media Use and Depression: A Meta-Analysis Study

Current Psychology

Öğretmenlerin Uzaktan Eğitimde Yaşadığı Sorunların Sıralama Yargıları Kanunuyla Ölçeklenmesi [Scaling the Problems Teachers Experience in Distance Education with the Law of Rank-Order Judgments]

Boğaziçi Üniversitesi dergisi, eğitim bilimleri, Jul 20, 2022

The aim of this study is to determine the order of importance of the problems teachers experience in distance education. To this end, the survey method, one of the quantitative research designs, was used. First, the problems teachers experience in distance education were identified. A scale built from these problems was then administered to the 906 teachers in the study group, all of whom were actively teaching remotely. The data obtained from the study group were analyzed with scaling by rank-order judgments, one of the scaling approaches. According to the results, the two most important problems teachers experienced were "students' reluctance to attend class" and "internet access problems", while the two problems seen as less important were "problems with online course software" and "insufficient course materials in distance education". In addition, "students' reluctance to attend class" was not seen as an important problem at the primary school level, whereas it was found to be an important problem at the middle and high school levels.
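The abstract names scaling by rank-order judgments without giving the computation. The sketch below shows one common way such scale values are obtained, assuming Thurstone's Case V simplification: pairwise preference proportions are derived from the rankings, converted to z-scores, and averaged. The function name `scale_from_rankings`, the clipping constant `eps`, and the three-item rankings are assumptions for illustration, not the study's procedure or data.

```python
from statistics import NormalDist


def scale_from_rankings(rankings, n_items, eps=0.01):
    """Estimate scale values from rank-order data (Thurstone Case V assumed).

    rankings -- one list per judge; rankings[r][i] is the rank judge r gave
                item i, where 1 means "most important"
    n_items  -- number of items being ranked
    eps      -- proportions are clipped to [eps, 1 - eps] so that the
                inverse normal stays finite (an assumption of this sketch)
    Returns scale values shifted so the least important item sits at 0.
    """
    n_judges = len(rankings)
    inv = NormalDist().inv_cdf
    scale = []
    for j in range(n_items):
        z_sum = 0.0
        for i in range(n_items):
            if i == j:
                continue  # the diagonal contributes z = 0
            # Share of judges who ranked item j as more important than item i.
            p = sum(1 for r in rankings if r[j] < r[i]) / n_judges
            p = min(max(p, eps), 1.0 - eps)
            z_sum += inv(p)
        scale.append(z_sum / n_items)
    lowest = min(scale)
    return [s - lowest for s in scale]


# Hypothetical rankings from four judges over three items.
example_rankings = [
    [1, 2, 3],
    [1, 3, 2],
    [2, 1, 3],
    [1, 2, 3],
]
print(scale_from_rankings(example_rankings, n_items=3))
```

Larger values mean the item was judged more important more consistently; only the ordering and the relative distances are interpretable, since the origin is set arbitrarily at the lowest item.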

Gender Differences Measurement Invariance in Test Anxiety across the World Evidence From PISA 2015 Yigiter Boduroglu

Recently, students have become more anxious in their education process due to increasing competitive conditions and rising academic expectations. The pressure from the environment, family, or school stakeholders to get high grades increases the stress on the student with the fear of failure. Test anxiety refers to the feelings of fear, worry, and tension that students may experience in relation to their academic performance, school workload, and the expectations placed on them (Chamberlain, Daly, and Spalding, 2011; Putwain and Daly, 2014). In other words, test anxiety describes the change in students' stress and anxiety levels due to the exams or studies they perform at school. The main triggers of test anxiety can be factors such as fear of failure, pressure from high demands, competition with peers, or lack of self-confidence (Ringeisen & Raufelder, 2015; Tan & Pang, 2023). There is a broad consensus in the literature that test anxiety is related to academic achievement (Ali & Mohsin, 2013; Crişan & Copaci, 2015; von der Embse, Jester, Roy, & Post, 2018). von der Embse et al. (2018) reported that test anxiety has a negative effect on many educational performance indicators in their meta-analysis of 238 studies on test anxiety since 1988. In addition, there are studies showing that test anxiety is closely related to many psychological symptoms such as depressive symptoms, stress, and emotional balance disorders (Augner, 2015; von der Embse, Barterian, & Segool, 2013). Test anxiety can also lead to different physical symptoms such as heart palpitations, rapid pulse, rapid breathing, sweating, headache, abdominal pain or nausea, sleep problems, and fatigue (Mashayekh & Hashemi, 2011; Chishti & Rana, 2021).
In addition, the literature also reports that test anxiety can have future-oriented effects such as skipping classes and delaying or giving up academic goals (Pekrun, 2006; Lowe et al., 2008). Therefore, it can be said that test anxiety will have negative effects on students' academic performance, mental health and overall quality of life if it is not given due importance.

Test Anxiety

Any attempt to measure students' academic development in which academic performance is evaluated brings to mind the concept of test anxiety, which in a sense causes students to react with anxiety (Hodapp, Glanzmann, & Laux, 1995). Test anxiety is the mental, psychological, or physical behavioral reactions that occur due to the worry of the possible negative consequences of failing an exam or an assessment (Zeidner, 1998). School assignments, exams, pressure to get high grades, and fear of getting low grades are seen as the most prominent causes of test anxiety (McDonald, 2001; Yakıcı and Kandemir, 2021; Demir, 2022). Some students develop anxiety when they cannot solve tasks at school, when they have problems with homework, when they are preparing for an exam or when they feel that they will take an exam (Zeidner, 2007). Anxiety is expected to be higher in students with low levels of confidence in themselves or their abilities or in students with high levels of parental expectations. Students with test anxiety are more likely to underperform, be absent frequently, or drop out of school completely (Cortina, 2008; Ramirez & Beilock, 2011). Anxiety can affect students' motivation and disrupt their learning strategies (Varasteh, Ghanizadeh, & Akbari, 2016). In addition, anxiety has many effects on mental, psychological, and physical health and on quality of life (Lohiya et al., 2021; Chen et al., 2023).

Gender Differences in Test Anxiety

There are many studies reporting that there is a gender difference in test anxiety between male and female groups.
In self-reported test anxiety, it is generally seen that girls have more test anxiety than boys (Hembree, 1988; Seipp & Schwarzer, 1996; von der Embse et al., 2018). Donati et al. (2020) stated that girls have higher test anxiety than boys in the German Test Anxiety Inventory. Devine, Fawcett, Szűcs, and Dowker (2012) examined the relationships between mathematics performance, mathematics anxiety and test anxiety by gender and found that girls have higher test anxiety than boys. Robson, Johnstone, Putwain, and Howard (2023), in a meta-analysis of articles written on test anxiety in the last 20 years, report that girls show higher test anxiety than boys. In general, there are many studies in the literature indicating that girls have higher test anxiety than boys. Some studies have also been conducted on why girls may have higher levels of test anxiety than boys. In these studies, it is stated that women may be more prone to anxiety, stress or depression due to coping style, socialization, or genetic factors (Goodwin & Gotlib, 2004; Olatunji et al., 2013). Women show their emotions more easily and therefore, in terms of socialization practice, women are more likely to have higher levels of anxiety (Chaplin, 2015). Therefore, a higher anxiety tendency may lead to higher test anxiety (McLean et al., 2011). On the other hand, gender differences in test anxiety may also stem from the sub-dimensions of test anxiety. It can be seen that this difference is sometimes higher in the emotionality and anxiety components of test anxiety (Putwain, 2007) and sometimes only in the emotionality component (Zeidner & Nevo, 1992). On the other hand, the self-report method of measuring test anxiety may also be a variable in explaining the gender difference. As a result, it is thought that new studies should be conducted to better determine the difference in test anxiety.

Measurement Invariance

Large-scale international assessments play an important role in comparing the qualifications of individuals across countries. The Programme for International Student Assessment (PISA) provides data on students' academic achievement for cross-country comparisons. In such large-scale assessments, the precondition for making comparisons between groups is to ensure statistical equivalence between groups (Millsap & Olivera-Aguilar, 2012). While test adaptation takes place before data collection, assessment of the measurement invariance of scores takes place after data collection to decide on score equivalence between groups. Measurement invariance indicates that the item parameters of a scale are equivalent across groups (Vandenberg & Lance, 2000). Scores meet the condition of measurement invariance if individuals with similar characteristics are equally likely to choose a particular item response, regardless of group membership. In the development of each measurement tool, it is assumed as a basic principle that it measures the same trait in each group to which it is applied. However, in practice, the results may differ according to the groups to which they are applied. If the results obtained from the groups do not have equivalent psychometric qualities, it would not be correct to compare the results between the groups (Başusta & Gelbal, 2015; Yiğiter, 2023). Therefore, the measurement tool should measure the same construct in the same way in each subgroup. Showing that the factor loadings, inter-dimensional correlations and error variances of a measurement model are the same in each group indicates that the measurement tool has the same structure in different groups (Jöreskog & Sörbom, 1993). Researchers obtain evidence on whether the scale measures the same construct in subgroups (Millsap & Olivera-Aguilar, 2012). If measurement invariance cannot be ensured, a validity problem occurs for the measurement tool.
Therefore, the interpretations to be obtained from group comparisons based on the scores obtained from this measurement tool may also be incorrect (Vandenberg & Lance, 2000). Measurement invariance also provides evidence of the validity of the instrument. He et al. (2019), in their research on cross-cultural comparability with TIMSS and PISA data, state that comparisons made without examining measurement invariance may lead to incorrect results, hence the importance of testing measurement invariance. If measurement invariance is not ensured, it is not possible to know whether the difference between the scores to be obtained from the measurement tool is due to a real difference in construct or scale items (Horn & McArdle, 1992).
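
The equivalence of parameters described above can be written compactly in the usual multi-group CFA notation. The symbols below are the conventional ones and are an assumption of this sketch, since the text itself introduces no notation: for group g, x is the vector of observed item scores, τ the item intercepts, Λ the factor loadings, ξ the latent factors, and Θ the covariance matrix of the residuals δ.

```latex
% Measurement model for group g (conventional notation, assumed here)
\[
  x^{(g)} = \tau^{(g)} + \Lambda^{(g)} \xi^{(g)} + \delta^{(g)},
  \qquad \operatorname{Cov}\bigl(\delta^{(g)}\bigr) = \Theta^{(g)}
\]

% Nested invariance levels for two groups, each adding one set of constraints
\begin{align*}
  \text{Configural:} &\quad \text{same pattern of zero and free loadings in } \Lambda^{(1)} \text{ and } \Lambda^{(2)} \\
  \text{Metric:}     &\quad \Lambda^{(1)} = \Lambda^{(2)} \\
  \text{Scalar:}     &\quad \Lambda^{(1)} = \Lambda^{(2)}, \quad \tau^{(1)} = \tau^{(2)} \\
  \text{Strict:}     &\quad \Lambda^{(1)} = \Lambda^{(2)}, \quad \tau^{(1)} = \tau^{(2)}, \quad \Theta^{(1)} = \Theta^{(2)}
\end{align*}
```

Under this formulation, comparing latent means across groups is generally taken to require at least the scalar level, because only then do equal latent scores imply equal expected item scores in both groups.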