Papers by Violetta Cavalli-Sforza

Educational Data Mining, Jul 1, 2020
EDM brings together researchers from computer science, education, psychology, psychometrics, and statistics to analyze large data sets to answer educational research questions. The increase in instrumented educational software and databases of student test scores has created large repositories of data reflecting how students learn. The EDM conference focuses on computational approaches for using those data to address important educational questions. The broad collection of research disciplines ensures cross-fertilization of ideas, with the central questions of educational research serving as a unifying focus. We received a total of 54 submissions from 24 countries. Submissions were reviewed by three reviewers, and 20 were accepted as full papers (37.03% acceptance rate). A further 13 submissions were accepted as posters or student papers. All papers will appear both on the web, at www.educationaldatamining.org, and in the printed proceedings. The conference also included invited talks by Professor Arthur C.
A prototype system for transnational information sharing and process coordination: system demo
Abstract: Global problems such as disease detection and control, terrorism, immigration and border control, illicit drug trafficking, etc. require information sharing, coordination and collaboration among government agencies within a country and across national boundaries. This paper presents a prototype of a transnational information system which aims at achieving information sharing, process coordination and enforcement of policies, constraints, regulations, and security and privacy rules by...

Science Education, Nov 1, 1994
Computer environments could support students in engaging in cognitive activities that are essential to scientific practice and to the understanding of the nature of scientific knowledge, but that are difficult to manage in science classrooms. The authors describe a design for a computer-based environment to assist students in conducting dialectical activities of constructing, comparing, and evaluating arguments for competing scientific theories. Their choice of activities and their design respond to educators' and theorists' criticisms of current science curricula. They give detailed specifications of portions of the environment. © 1994 John Wiley & Sons, Inc.
BUILDING A SENSE OF SCIENTIFIC KNOWLEDGE
For more than a decade, curriculum study groups have proposed "understanding the nature of scientific knowledge" as an important goal of science learning (National Science Teachers' Association, 1982). Recently, there has been interest in an instructional focus on how scientific theories develop and are revised (Duschl, 1990; Duschl et al., 1992; Kuhn, 1991, 1993; Ohlsson, 1992). Warnings of the difficulty of implementing this epistemological focus have been raised, however (Duschl & Gitomer, 1991). A constructivist perspective suggests that, if we are to make students' understanding of scientific knowledge a goal, we should attend to the conceptions of scientific knowledge that they are likely to hold when they enter the science classroom. Perry (1970) and Kitchener (1992) have characterized young people's conceptions of knowledge, based on large-scale samples. Perry, who studied elite college students in the 1960s, characterized the beliefs of many as either "dualist" or "multiplicist." That is, a large proportion of the students believed that knowledge claims can be classified
An earlier version of this article (V. Cavalli-Sforza, G. Gabrys, A. Lesgold, and A. Weiner, "Engaging Students in Scientific Argumentation and Scientific Controversy") appeared in the Workshop Notes of the AAAI '92 Workshop Program: Communicating Scientific & Technical Knowledge, San Jose, CA, pp. 99-106. We thank the American Association for Artificial Intelligence for permission to use it.

Springer eBooks, 2000
Graphical representations have long been associated with more efficient problem solving. More recently, researchers have begun looking at how representation may affect the information that students attend to and what they learn. In this paper we report on a study of how graphical representation may influence interaction between a human coach and a student engaged in analyzing argument texts. We compared coaching interaction with subjects working with a predefined graphical representation to subjects who developed their own representation. The predefined representation, with a better "cognitive fit" to the task, allowed subjects to do more work on their own. Coaching was more systematic and both more efficient and more effective.
2 Study Design and General Results
Four subjects, non-science major college undergraduates, participated in an extended experiment in which they studied several concepts pertaining to the description, support and critique of causal theories. They analyzed short texts drawn from a historical scientific debate. They drew diagrams representing the information in the texts, either the description of a causal theory or arguments in support of and/or against such a theory. They received coaching from the experimenter throughout diagram construction. Two subjects (in the FIXED condition) used a predefined box-and-arrow graphical representation that specifically encoded important concepts of scientific argument. Distinct shapes, enclosing text, represented scientific propositions with different statuses (e.g., rectangle = observation, rounded rectangle = explanation). Links with specific names, directionality, and arrowheads provided different types of relationships between propositions and/or other links.
The remaining two subjects (in the FREE condition) had similar graphical primitives available but could use them and label them at will, thereby effectively constructing their own graphical representation. FIXED condition subjects studied sample analyses of texts that used the predefined graphical representation. FREE condition subjects viewed the same analyses through a schematic text-based representation in which sentences were labeled by their role (e.g., "premises", "conclusion", "claim", "grounds", and "warrant"). Subjects in the FIXED condition adapted rather easily to the representation they were given. Subjects in the FREE condition went down very different representational paths. One subject (Free-1) developed, rather laboriously, a representation similar to that used in the FIXED condition, although it remained plagued by inconsistencies and other problems throughout the experiment. The other subject (Free-2) drew the causal theory diagrams using a curious mix of analogical and abstract representation (e.g., using 7 shapes to represent 7 continents). Since the chosen representation could not be easily extended to express more abstract content, for texts containing primarily arguments Free-2 fell back on a labeled text strategy similar to the one used in the instructional materials. Consonant with Suthers' [6] hypotheses, only subjects who used a box-and-arrow abstract representation (Free-1 and FIXED condition subjects) expressed in their diagrams some of the more complex relational concepts (e.g., multi-step support, dialectical argument patterns), which were realized with distinctive linkage patterns. There was also a link between subjects' use of the concept in the diagram and the ability to give a good definition. The crucial factor underlying these results appeared to be whether the representation used by subjects was relation-centered, as box-and-arrow representations are, or role-centered, as the labeled-text schema adopted by Free-2 is.
These findings point out that it is risky to let students develop their own representation for a task, though it may lead to deeper processing. At best, like Free-1, students will develop a sufficiently expressive representation, but at a significant cost in time, clarity of the resulting work, and attention that could be focused on target instructional concepts. At worst, like Free-2, they may fail to find an adequate representation and consequently may not learn to apply the target concepts.

Arabic Readability Assessment for Foreign Language Learners
Lecture Notes in Computer Science, 2018
Reading in a foreign language is a difficult task, especially if the texts presented to readers are chosen without taking into account the reader’s skill level. Foreign language learners need to be presented with reading material suitable to their reading capacities. A basic tool for determining if a text is appropriate to a reader’s level is the assessment of its readability, a measure that aims to represent the human capacities required to comprehend a given text. Readability prediction for a text is an important aspect in the process of teaching and learning, for reading in a foreign language as well as in one’s native language, and continues to be a central area of research and practice. In this paper, we present our approach to readability assessment for Modern Standard Arabic (MSA) as a foreign language. Readability prediction is carried out using the Global Language Online Support System (GLOSS) corpus, which was developed for independent learners to improve their foreign language skills and was annotated with the Interagency Language Roundtable (ILR) scale. In this study, we introduce a frequency dictionary, which was developed to calculate frequency-based features. The approach gives results that surpass the state-of-the-art results for Arabic.

Approaches, Methods, and Resources for Assessing the Readability of Arabic Texts
ACM Transactions on Asian and Low-Resource Language Information Processing, Mar 25, 2023
Text readability assessment is a well-known problem that has acquired even more importance in today’s information-rich world. In this article, we survey various approaches to measuring and assessing the readability of texts. Our specific goal is to provide a perspective on the state-of-the-art in readability assessment research for Arabic, which differs significantly from other languages on which readability studies have tended to focus. We provide background on readability assessment research and tools for English, for which readability studies are the most advanced. We then survey approaches adopted for Arabic, both classical formula-based approaches and studies that combine Machine Learning (ML) with Natural Language Processing (NLP) techniques. The works we cover target text corpora for different audiences: school-age first language readers (L1), foreign language learners (L2), and adult readers in non-academic contexts. Therefore, we explore differences between reading in L1 and L2 and consider how they play out specifically in Arabic after describing language characteristics that may impact readability. Finally, we highlight challenges for Arabic readability research and propose multiple future directions to improve readability assessment and related applications that would benefit from more attention.
Lexical Simplification of Arabic Educational Texts Through a Classification Approach
Springer eBooks, 2023

MoSAR
Today there is a large amount of valuable research on corpora, and the availability of corpora has increased significantly in recent years. Unfortunately, this is not the case for all types of corpora. Research in the field of Arabic language processing suffers from a severe shortage of annotated educational corpora. In this work, we have built a new educational corpus drawn from Moroccan primary school books. This corpus will help education researchers and computational linguists provide appropriate tools to support school students who are learning formal Arabic. We annotated the corpus with morphosyntactic information that can be used in several natural language processing applications. We also added a text difficulty measure, linked to the Moroccan primary school levels, so that the corpus can be used in the development of readability measurement applications. The result is a Modern Standard Arabic Language corpus dedicated to young learners of Arabic as a first language (L1). The corpus is manually labeled with seven levels, namely the primary levels of the Moroccan educational system from 1st to 6th grade, in addition to a more basic level we called level 0.
Arabic Computational Morphology: A Trade-off Between Multiple Operations and Multiple Stems
Springer eBooks, Sep 30, 2007
We present a computational approach to Arabic morphology description that draws from Lexeme-Based Morphology (Aronoff, 1994; Beard, 1995), giving priority to stems and granting a subordinate status to inflectional prefixes and suffixes. Although the morphology of Arabic is non-concatenative, we make the process of generating inflected forms concatenative by separating the generation of stems from that of other inflectional affixes.

Language Resources and Evaluation, May 1, 2004
We describe ongoing efforts towards developing language resources for a transnational digital government project aimed at applying information technology (IT) to a problem of international concern: detecting and monitoring activities related to the transnational movement of illicit drugs. The project seeks to support information sharing, coordination and collaboration among government agencies within a country and across national boundaries by combining a variety of technologies including a distributed query processor with form-based and conversational user interfaces, a language translation system, an event server for event filtering and notification, and an event-trigger-rule server. The prototype system is being developed by U.S. universities in collaboration with an international agency and with universities and government agencies in Belize and the Dominican Republic. This paper focuses on the linguistic resources and their use in Example-Based Machine Translation (EBMT). We are in the process of developing an English-Spanish parallel corpus, focused on the domain of information elicited and used at border crossings, to fuel the EBMT system. While significant parallel corpora are available for these two languages in the newswire domain, they were found to be of very limited use for the border crossings application, spurring the need to develop our own resources.
Building Intelligent Chatbots: Tools, Technologies, and Approaches
2023 3rd International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET)


IEEE Access
Text-to-graphics systems encompass three types of tools: text-to-picture, text-to-scene and text-to-animation. They are an artificial intelligence application wherein users can create 2D and 3D scenes or animations and, more recently, immersive environments from natural language. These complex tasks require the collaboration of various fields, such as natural language processing, computational linguistics and computer graphics. Text-to-animation systems have received more interest than their counterparts, and have been developed for various domains, including theatrical pre-production, education or training. In this survey we focus on text-to-animation systems, discussing their requirements and challenges and proposing solutions, and we investigate the natural language understanding approaches adopted in previous research works to solve the challenge of animation generation. We review text-to-animation systems developed over the period 2001-2021, and investigate their recent trends in order to paint the current landscape of the field.
INDEX TERMS Natural language interface, natural language understanding, computer graphics, semantic parsing, visual semantics.
Global problems such as disease detection and control, terrorism, immigration and border control, illicit drug trafficking, etc. require information sharing, coordination and collaboration among government agencies within a country and across national boundaries. This paper presents a prototype of a transnational information system which aims at achieving information sharing, process coordination and enforcement of policies, constraints, regulations, and security and privacy rules by integrating a distributed query processor with form-based and conversational user interfaces, a language translation system, an event server for event filtering and notification, and an event-trigger-rule server. The Web-services infrastructure is used to achieve the interoperation of these heterogeneous component systems.

Abstract. We describe ongoing efforts towards and challenges in using an Example-Based Machine Translation (EBMT) system in the context of a multi-national, multi-university and multi-agency transnational digital government project. The project is aimed at applying information technology to the problem of collecting and sharing information securely in a multilingual context. We report on a number of issues encountered in obtaining and using language data for the EBMT system, discuss our current solutions, and briefly describe ongoing enhancements to the system to meet some of the technical and practical challenges posed by using this machine translation approach in the project domain.
1. Background
We describe ongoing efforts towards and challenges in adapting and using an Example-Based Machine Translation (EBMT) system in the context of a transnational digital government project (Cavalli-Sforza, et al., 2003; Su et al., under
Intelligent Learning by Doing Tools for Technical and Dialectical Knowledge
Organizational Learning and Technological Change, 1995
New members entering productive organizations require considerable training. Computer tools can support such training by providing an opportunity to learn while engaging in authentic activities and receiving appropriate coaching. We describe two tools that incorporate this approach. Sherlock, an existing computer coach, is an effective environment for learning how to troubleshoot complex electronic devices. A newer research effort focuses on tools for supporting knowledge-building argumentation and scientific theory evaluation in post-elementary school science education. Both tools offer users opportunities for reflecting on their own performance and support individual as well as collaborative learning.
Combining Classical and Non-classical Features to Improve Readability Measures for Arabic First Language Texts
Advanced Intelligent Systems for Sustainable Development (AI2SD’2020), 2022
