0% encontró este documento útil (0 votos)

75 vistas18 páginas

A Comparison of Synthetic and Human Speech: An Evaluation by English As A Foreign Language Students in A Public Costa Rican University

This document provides an abstract for an article that compares perceptions of synthetic and human speech among English as a foreign language students at a public Costa Rican university. The study was conducted from April to September 2022 using a quantitative survey to collect students' perceptions of computer-generated voices, human voices, and listening instruction. The data were analyzed using descriptive statistics. The analysis found that students generally find human voices more appealing than artificial voices, but some students cannot fully distinguish between the two. It also found preferences for female voices when generated by computers and characteristics of synthetic voices that some students find appealing. The study suggests current listening instruction and materials should be reexamined.

Cargado por

Solano Hernández Ronald

Derechos de autor

Nos tomamos en serio los derechos de los contenidos. Si sospechas que se trata de tu contenido, reclámalo aquí.

Formatos disponibles

Descarga como PDF, TXT o lee en línea desde Scribd

0% encontró este documento útil (0 votos)

75 vistas18 páginas

A Comparison of Synthetic and Human Speech: An Evaluation by English As A Foreign Language Students in A Public Costa Rican University

Cargado por

Solano Hernández Ronald

Derechos de autor

Nos tomamos en serio los derechos de los contenidos. Si sospechas que se trata de tu contenido, reclámalo aquí.

Formatos disponibles

Descarga como PDF, TXT o lee en línea desde Scribd

Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp.

41-58)

A Comparison of Synthetic and Human

Speech: an Evaluation by English
as a Foreign Language Students in
a Public Costa Rican University
Recibido: 12 de febrero, 2023
Aceptado: 13 de noviembre, 2023
Por: William Charpentier-Jiménez1, Universidad de Costa Rica,
ORCID: https://orcid.org/0000-0002-8554-7819

Abstract William Charpentier-Jiménez. A Comparison

of Synthetic and Human Speech: an
The possible role of text-to-speech (TTS) audio for pedagogical purposes has not Evaluation by English as a Foreign Language
been fully explored. This study examines ESL students’ perceptions of artificial Students in a Public Costa Rican University.
Revista Comunicación. Año 44, volumen
intelligence and human voices. It also explores students’ opinions on listening 32, número 2, junio-diciembre, 2023.
instruction. The investigation was conducted from April to September 2022 and Instituto Tecnológico de Costa Rica. ISSN:
involved 36 TESOL students enrolled in a BA in English or English teaching at a 0379-3974/e-ISSN1659-3820
Costa Rican public university. It used a quantitative survey design. The researcher
gathered student responses through a survey designed to collect students’ percep-
tions of computer-generated voices, human voices, and listening instruction. The
data were quantitatively analyzed using descriptive statistics. Data analyses indi-
cate that: 1) students find human voices more appealing than artificial intelligence
voices; 2) students find female voices more appealing than male voices when a
computer generates them; 3) artificial intelligence voices share some characteris-
tics that students find more appealing; and 4) current listening instruction policies
and materials should be reexamined in the language program. Consistent with the
reviewed literature, these findings demonstrate that although TTS does not appeal
to students as much as human voices, a part of the population finds computer-
generated voices appealing. The analysis also suggests that some students can-
not fully discern between computer-generated and human voices; thus, their use
may be appropriate in some contexts. Finally, these findings confirm that listening
instruction policies and materials should be revised to improve students’ language
acquisition processes.

Comparación del Habla Sintética y Humana: una Evaluación de Estudiantes

de Inglés como Lenguas Extranjera en una Universidad Pública
Costarricense

Resumen PALABRAS CLAVE:

Inteligencia artificial, enseñanza de una
lengua extranjera, educación superior,
El posible papel de audios texto-a-voz (TTS) para usos pedagógicos no ha sido material pedagógico de escucha, texto-
completamente explorado. Este estudio examina las percepciones de estudiantes a-voz
1 William Charpentier-Jiménez ha trabajado para la Universidad de Costa Rica por más de ca-
torce años como profesor y coordinador de varias secciones y proyectos. Ha impartido clases KEY WORDS:
tanto en el Bachillerato en la Enseñanza del Inglés así como en el Bachillerato en Inglés, la Artificial intelligence, Foreign language
Licenciatura en la Enseñanza del Inglés y la Maestría en la Enseñanza del Inglés. Posee una instruction, Higher education, Teaching
Maestría en Lingüística Aplicada y una Maestría en Administración Universitaria, ambas de listening materials, Text-to-speech
la Universidad de Costa Rica. Sus principales intereses de investigación se relación con la
adquisición de vocabulario, aprendizaje de la lengua asistido por computadora y dispositivos
móviles, y la pronunciación. Contacto: [email protected]

41
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

de ILE acerca de las voces humanas y de inteligencia artificial. Asimismo, explora las opiniones de estudiantes sobre la
instrucción de la escucha. Esta investigación se llevó a cabo de abril a setiembre de 2022 e incluyó a 36 estudiantes de ILE
matriculados en un Bachillerato en Inglés o Enseñanza del Inglés en una universidad pública costarricense. Se utilizó un
modelo cuantitativo de encuestas. El investigador recolectó las respuestas mediante una encuesta diseñada para recabar
las percepciones del estudiantado acerca de las voces generadas por computadora, las voces humanas, y la instrucción
de la escucha. Los datos fueron analizados de manera cuantitativa utilizando estadística descriptiva. El análisis de los da-
tos indica que: 1) el estudiantado encuentra las voces humanas más atractivas que las voces generadas con inteligencia
artificial; 2) el estudiantado considera las voces femeninas más atractivas que las masculinas cuando son generadas por
computadora; 3) las voces generadas por inteligencia artificial comparten algunas características que el estudiantado en-
cuentra más atractivas; y 4) las presentes políticas y materiales para la instrucción de la escucha deben ser reexaminadas
en el programa de idiomas. Consistente con la literatura revisada, estos resultados demuestran que aunque las voces
TTS no llaman tanto la atención del estudiantado como las voces humanas, una parte de la población considera las voces
generadas por computadora interesantes. El análisis también sugiere que una parte del estudiantado no puede discernir
en su totalidad entre voces humanas y generadas por computadora; por lo tanto, su uso puede ser apropiado en algunos
contextos. Finalmente, los resultados confirman que las políticas y los materiales para la enseñanza de la escucha deben
ser revisados para mejorar los procesos de adquisición del lenguaje del estudiantado.

INTRODUCTION centered on using them to read abstracts. No other lan-

guage program in the university where this study took
In recent years, humanity has moved beyond viewing place has considered using TTS synthesis, and no at-
technology as a tool to recognizing its potential as a tention has been paid to the use of computer-generated
creative entity capable of imitating human characteris- voices in educational settings, especially in ESL settings
tics. With the advent of artificial intelligence (AI), self- where listening materials are abundant but sometimes
driving cars, automated assistants, computer-generated difficult to find or do not fully adapt to the course re-
text, and synthetic speech have become commonplace quirements.
in our daily lives (Adamopoulou & Moussiades, 2020;
Luo et al., 2022). However, the potential of these inven- This study aims to identify students’ perceptions of
tions in educational settings has yet to be fully explored. synthetic and human voices. The findings may explain
As AI becomes more capable of processing language the possible advantages of incorporating TTS voice re-
and closely interacting with humans, its application cordings in the language class. By comparing human
in language learning settings stands out. For example, voices to synthetic voices, the findings will also aid in
TTS software could help the visually impaired, produce understanding students’ preferences and the differences
voices with realistic accents, or provide fully custom- and similarities between both. Thus, the results may
ized, appropriate audio input for language learners. encourage the development of audio materials, audio-
However, many professors remain unaware or passive books, audio instructions, and auditory aids for visually
about the possible advantages or disadvantages of this impaired students.
technology. Consequently, this hinders students from
accessing a potential source of input tailored to their The findings of this research will potentially benefit
specific requirements, interests, and language level. two populations. On the one hand, professors will have
research-based evidence to know whether or not using
Although this paper focuses on the English language, TTS is appropriate. They can also consider using TTS
public institutions in Costa Rica frequently offer other in some circumstances but not in others. For example,
languages as a major (e.g., French) or as required or professors may determine that TTS is useful for assist-
elective courses for other majors. Therefore, the poten- ing students with visual impairments or creating audio
tial application of TTS may impact several language instructions for listening tasks but not good enough for
programs. Regardless of the potential benefits of imple- longer text passages. Also, professors may incorporate
menting TTS in ESL settings (Bione et al., 2017; Car- audio clips to create other kinds of materials (for exam-
doso et al., 2015; Craig & Schroeder, 2019; Hillaire et ple, audio prompts or word lists in audio) or tasks where
al., 2019; Kang & chatGPT, 2023), currently, the only students respond to questions or react to a short listen-
discussion about TTS systems in the English major has ing passage. On the other hand, students may benefit

42
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

from being exposed to materials that are more contex- mans. It also involves creating machines that can learn
tualized to their needs, more adequate to their level, and from data, make predictions, make decisions, and per-
more appropriate to their interests. Students can also be form tasks that would typically require human intelli-
exposed to AI speakers of various accents, ages, or gen- gence, such as visual perception, speech recognition,
ders if needed. Therefore, having this variety and adap- decision-making, and translating (Abbott, 2020; Arora,
tation may enrich the class dynamics in the ESL class. 2022; Cameron, 2019; Jeste et al., 2020; Kent, 2022).
There are various types of AI. Narrow or weak AI is
This paper is divided into five distinct sections. The in- designed to perform a single task (Gulson et al., 2022;
troduction describes the importance and potential bene- Kindersley, 2023), while general or strong AI can per-
fits of including TTS audios in the context of English as form any intellectual task that a human can (Kindersley,
a second language or foreign language (ESL/EFL). The 2023; Mitchell, 2019). Artificial intelligence is used in
literature review presents the most relevant concepts various applications, such as self-driving cars, virtual
of artificial intelligence (AI) and natural language pro- personal assistants, and biometric authentication meth-
cessing (NLP). It also introduces some core concepts ods. In addition, AI research aims to create systems ca-
related to the human voice, TTS theory, and listening pable of performing tasks that typically require human
instruction. The methods section describes the partici- intelligence, such as understanding natural language,
pants, materials, methodology, procedure, and data col- recognizing images, playing games, and solving com-
lection and interpretation steps. The results section of- plex problems (Jeste et al., 2020).
fers a statistical analysis of the data collected. Finally,
the discussion summarizes some possible limitations Natural Language Processing (NLP) is a subset of arti-
and proposes the main results of the study and their im- ficial intelligence that focuses on the interaction of com-
plications in the field of language teaching, particularly puters and humans through natural language (Kochmar,
listening instruction. 2022; McRoy, 2021). It involves the development of
models and algorithms that can analyze, comprehend,
Aims and produce human language. Natural Language Pro-
The article aims to compare students’ perceptions of cessing is used in various applications, such as sentiment
synthetic and human speech. This article also focuses analysis, online searches, predictive text, and machine
on the perceived differences or similarities between translation (Adamopoulou & Moussiades, 2020; Luo et
male and female voices. al., 2022). It has also been instrumental in advancing
virtual assistants and chatbots. In addition, NLP tech-
niques are based on a combination of computer science,
REVIEW OF THE LITERATURE
linguistic theory, and machine learning (Raaijmakers,
The role of listening instruction has been extensively 2022). The goal of NLP is to create systems that can ac-
studied in recent years. However, to the best of the re- curately compute and analyze large amounts of data and
searcher’s knowledge, studies comparing AI-generated use this information to perform specific tasks. Overall,
audio and human-created audio in ESL settings are NLP is a rapidly growing field that has the potential to
scarce. This literature review summarizes some of the revolutionize how computers and humans interact and
main concepts related to this study. It does not intend has a wide range of practical applications, including
to be comprehensive but to provide an overview of the customer service, marketing, healthcare, etc.
central aspects of speech synthesis and language in-
The Human Voice
struction, particularly the listening skill.
The human voice is the sound produced by the vibration
Artificial Intelligence and Natural
of the vocal folds in the larynx. Sound waves are pro-
Language Processing
duced by the vibration of the vocal folds, which travel
Artificial intelligence is the simulation of human intel- through the oral and nasal cavities to produce speech
ligence in machines designed to think and act like hu- or singing; these waves interact with our articulators
(tongue, jaw, teeth, etc.) to produce specific sounds

43
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

(Calais-Germain & Germain, 2016). The human voice such as large print keyboards, mouse devices, screen
is a powerful and unique tool for communication and magnifiers, and adapted joysticks for individuals with
self-expression. It can convey a wide range of emotions mobility or dexterity issues. Examples of software as-
and has been used throughout history for interacting, sistive technology include screen readers and TTS soft-
storytelling, singing, and other forms of artistic expres- ware for individuals who are blind or have low vision.
sion (Karpf, 2006).
Although screen readers and TTS systems are similar,
The study of the human voice, including its production they also have some differences. A screen reader is a
and perception, is known as voice science or phonet- type of assistive technology that reads out loud the text
ics (Akmajian et al., 2017). Voice science deals with the on a computer screen. It is primarily designed to help
sound and quality of the voice, which in turn is influ- individuals who are blind or have low vision access
enced by several factors, including age, gender, physi- the information and functions of a computer (Evans &
cal attributes, and emotional state. This field of study Blenkhorn, 2008). In this case, the program reads what
is essential for understanding how the voice works and is already on the screen. Text-to-speech is an advanced
developing techniques for improving vocal health and technology that converts written text into speech. One
performance, helping people with trouble speaking, or of its main goals is to be very similar or even indistin-
developing techniques and strategies to help students guishable from the human voice (Dutoit, 1997). It uses
learn a new language. natural language processing and speech synthesis to
generate human-like speech from input text (Taylor,
In addition to its role in communication and expression, 2009). The output speech can be played back using
the human voice also plays an essential role in identi- speakers or headphones or stored as an audio file. For
ty and socialization, as it is frequently used to convey instance, unlike screen readers, a user can deliberately
personal and cultural information (Norton & Toohey, enter text to be read. This first user can modify the text
2011). Thus, the human voice is a complex and unique and how it will be presented to the end user. Text-to-
aspect of human physiology and behavior and contin- speech technology is commonly used for accessibility
ues to be studied by scientists and artists alike. It pro- purposes, for individuals with visual impairments, and
vides essential information such as gender, personality, for various applications in fields such as education, en-
accent, race, and emotion, among other aspects (Nass tertainment, and business (Narayanan & Alwan, 2005).
& Brave, 2005). However, the role of the voice as an
instrument that carries a message has been frequently Text-to-speech technology breaks down written text
overlooked (Karpf, 2006). Listening exercises frequent- into words and phrases and then uses a computer-gener-
ly focus more on the quality of the audio in general than ated voice to read them aloud. The process of TTS typi-
on the characteristics of the voice, and research about cally involves the following stages: text analysis (the
the role of the human voice in learning remains scarce text is analyzed and processed to determine pronun-
(Craig & Schroeder, 2019). ciation, rhythm, and stress patterns), voice synthesis (a
computer-generated voice is created by concatenating
Assistive and text-to-speech technology or piecing together segments of pre-recorded speech),
Assistive technology refers to tools, devices, or soft- and speech production (the processed text is combined
ware created to help people with disabilities perform with the generated voice to produce spoken language)
tasks that they would otherwise be unable to perform (Hersh et al., 2008; Holmes & Holmes, 2001).
or may complete with difficulty (Emiliani & Associa- Several studies have found potential benefits from using
tion for the Advancement of Assistive Technology in TTS in language classes or have found no significant
Europe, 2009). These technologies can aid people with differences between using a synthetic or human voice
several disabilities, including physical, sensory, and (Bione et al., 2017; Cardoso et al., 2015; Craig & Schro-
cognitive impairments (Bouck, 2017; Cook, 2019; Dell eder, 2019; Hillaire et al., 2019; Kang et al., 2008) and
et al., 2017; Green, 2018). Examples of hardware as- the possibility of more interactive models where people
sistive technology include adaptive computer hardware, can keep conversations in real time with machines (Ku-

44
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

mar et al., 2023). In addition, TTS systems may use var- related to language features or contextual characteris-
ious techniques to improve the quality and naturalness tics of the message. For example, the speed of delivery
of the generated speech, such as adjusting the rhythm (Brown & Lee, 2015; Ur, 2012) or the speakers’ ac-
and intonation to match that of a human speaker or add- cent, especially if no adequate training has been previ-
ing natural-sounding pauses and inflections. Since this ously provided (Charpentier-Jiménez, 2019; Derwing
technology is rather new, it is constantly evolving and & Munro, 2015; Field, 2011; Harmer, 2007), may limit
improving (Chen et al, 2023; Wang et al., 2023). There- students’ processing time and frustrate their attempts to
fore, the accuracy and quality of TTS systems can vary decode the message. Additionally, the type of vocabu-
widely, depending on factors such as the complexity of lary (Hadfield & Hadfield, 2008; Watkins, 2010) and
the text, the quality of the voice synthesis, and the so- the level of formality (Hadfield & Hadfield, 2008) could
phistication of the TTS algorithms used. slow down students’ ability to comprehend the mes-
sage. On the one hand, the words, expressions, or gram-
Teaching Listening mar used could be too specific, elaborate, or technical
Teaching listening skills involves providing students for students to understand. On the other hand, language
with opportunities to practice and develop their ability could be too colloquial and culturally bound, making
to understand spoken language. It also includes strate- understanding the message more challenging.
gies such as providing opportunities for authentic lis- Another aspect to consider is the message and its char-
tening, using varied listening materials, and incorporat- acteristics. For example, audio input should present
ing interactive activities (Brace et al., 2006; Ur, 2012). students with authentic input while considering various
Teaching listening skills also requires dedication and task types and audio formats (Brown & Lee, 2015; Bur-
a focus on the process and the outcome. Thus, regular gess & Head, 2005; Celce-Murcia et al., 2010). Content
practice and ongoing feedback are essential for helping is another aspect professors should examine (Harmer,
students improve their listening abilities. 2007). These aspects make finding voice recordings
Listening has historically been viewed as a receptive more difficult for professors. Despite the myriad pos-
skill (Field, 2011; Harmer, 2013). To understand a lis- sibilities the Internet brings, audio recordings do not
tening passage, the listener uses their linguistic abili- always adapt to students’ levels, the desired task, the ap-
ties and schemata. In this regard, one of the main dif- propriate accent, or the content under study. Moreover,
ficulties for ESL students is making sense of the sound professors should consider aspects like length, audio re-
system of English, especially if they are learning it as cording quality, or any other aspect that interferes with
adults (Field, 2011; Nation & Newton, 2009). On the the message, such as background noise (Watkins, 2010),
other hand, the topics used in language classes should since the audio input should provide students with an
consider students’ schemata. A schema is a cognitive appropriate model to imitate (Patel & Jain, 2008).
framework or mental model that helps us organize and Finally, the advancement of AI and text-to-speech
understand information (Brown & Lee, 2015; Harmer, systems have proven effective in improving language
2013). Schemata can refer to general concepts or mental learning. A study by Al-Jarf (2022) highlighted nota-
structures about the world or specific knowledge struc- ble improvements in decoding skills, reading fluency,
tures about a particular topic or situation. For example, and pronunciation accuracy when using these tools,
we may have a schema about what a typical car looks although there was no significant enhancement in vo-
like, which helps us understand and categorize new in- cabulary knowledge. Additionally, the integration of
formation about cars we encounter in real-life situations AI-driven techniques in ELT has been instrumental in
or through written, pictorial, or audio messages. There- bossing motivation and fostering heightened learner
fore, the learner’s linguistic proficiency and schemata engagement. As highlighted by Anis (2023), learners
are crucial when decoding the message. experience heightened involvement due to the effects
In addition, some other aspects may constrain students’ of adaptive instruction, intelligent tutoring systems, and
listening comprehension. Some of these limitations are personalized learning applications. These innovative

45
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

approaches not only stimulate motivation but also en- be used freely without special permission. At no point in
courage active participation in language-related activi- the study did students have access to the script. The four
ties. Furthermore, as Moybeka et al. (2023) emphasize, different audios included this same passage. To record
text-to-speech applications serve as pivotal tools in dis- the audio, the software Speechelo was used. Speechelo
mantling language barriers, leading to a more inclusive is an AI-enabled TTS and voiceover, paid software that
and equitable approach to English language education. turns text into human-sounding voiceovers (BlasterOn-
TTS also offers a unique advantage, assisting students line, 2023). It can also create audio in 23 languages. It
in refining their listening and reading proficiencies was chosen because of its quality and the number of au-
(Hartono et al., 2023). Text-to-Speech tools could be dios it has available. Two of the audios were read by a
a valuable asset in acquainting students with a diverse male and female human, both native American English
range of accents, further enriching their auditory experi- speakers. Students were not informed that some audios
ence and understanding of the language (Fitria, 2023). could be computerized as this could have biased their
perception. The other two audios were read in American
This literature review presents some main concepts English by a male and female AI voice using Speechelo.
related to using TTS in ESL classes. The researcher All audios were encoded in an MP3 format. Participants
must grant that some concepts related to TTS systems listened to the audio using noise-canceling, over-the-ear
or listening instruction have been purposely left aside headphones, the Bose QC35 Series II, which guaran-
as they do not directly relate to the objective of this tees optimal listening conditions. These headphones
study. However, this omission does not limit or impair were wirelessly connected to a different audio system
the findings of this paper. to avoid any interference with students’ answers. Fi-
nally, the survey was divided into four sections: a) de-
METHODS mographic information, b) participants’ perceptions of
voice recordings in English classes, c) the evaluation of
Participants
the AI or human audio, and d) an optional open-ended
This study includes Costa Rican university students en- question. The survey used two question formats: forced-
rolled in their second language course. The researcher choice and open-ended questions. Except for the open-
visited the students’ oral classes to invite them to partic- ended question, items included Likert scales for all sec-
ipate. The participants were selected because they were tions. For example, some items asked the participants
currently enrolled in an oral course in their second aca- to rate the audio quality in their English classes. These
demic year. Their proficiency level corresponds to B1- items were placed on a 5-point Likert scale that ranged
B2. Thirty-six participants were willing to participate; from 1 (Very poor) to 5 (Very good). This format, or a
however, they did not receive monetary compensation similar one, was also used for other questions.
for their participation. All participants speak Spanish as
The last part of the survey contained one optional, open-
their first language.
ended question. This question invited participants to
Materials add any other comments they believed were relevant to
the study. The total time to complete the survey was es-
The materials include written consent, a listening script timated at 10 to 15 minutes.
(see Appendix 1), four different audios (see Appendix
3), the software, the necessary equipment for the listen- PROCEDURE
ing part, and an electronic survey (see Appendix 2) to
collect participants’ answers. The written consent was This study used a quantitative survey design. First, the
sent to participants electronically before their participa- researcher selected an appropriate text to create the au-
tion, and a checkbox labeled “agree to terms and condi- dio recordings. The text was selected because it is copy-
tions” was included to certify voluntary participation in right-free and normally used in language analysis. The
the study. The listening script used was Comma Gets a researcher then pilot-tested several female and male AI
Cure, a diagnostic passage for dialect and accent that can voices with ten participants from the same affiliation

46
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

as the target population. This stage aimed to extract the derived from participants’ survey answers. The analy-
two voices that sounded more human-like. The AI voic- sis included descriptive statistics, where percentages,
es chosen (Mathew and Grace) were fed the proposed nominal data, and the standard deviation, among other
text. These two voices were chosen from a list of 17 basic statistics, were performed to compare participants’
voices offered by the software. Although Speechelo al- opinions about the audios and their listening training.
lows the user to add breathing and pauses, among other
changes, the audios were not modified in any way. The ANALYSIS OF THE RESULTS
human voices were professional voiceover actors. The
speakers also read the same text, and their voices were The following summary of the results presents the main
in no way altered. findings of the study in four distinct sections. The first
section includes the participants’ demographic informa-
After preparing the materials, the researcher created tion. The second section compares the four voiceovers
the survey. The survey included sections about par- based on participants’ ratings. The third section summa-
ticipants’ demographic information, their perception of rizes the main features under analysis and their ratings.
audio quality in English classes, a list of ten descrip- Finally, the fourth section describes participants’ gener-
tors to evaluate the four voiceovers, and an open-ended al perceptions of the audios used during English classes
question. To explore participants’ perceptions of audio and the type of listening instruction they received.
recordings, the list of ten descriptors was extracted from
a list of 17. This list was compiled by the researcher, Demographic Information
considering the most common characteristics asso-
Of the 36 study participants, 27 (75%) were females
ciated with vocal features (Memon, 2020, Paz et al,
and 8 were males (22.22%). One participant (2.78%)
2022). Some items from the initial list were discarded
chose to be identified as non-binary. Overall, 33 partici-
since they did not fit the study’s scope (i.e., background
pants (91.67%) reported being between the ages of 18
noise, length, and volume, among others). By default,
and 24, while two (5.56%) were between 25 and 34.
some of these features were either objectively the same
Only one participant (2.78%) was between 35 and 44.
in all audios or could be adjusted by the participants.
All participants are native Spanish speakers and study
All students had access to a sample survey before their
English as a foreign language. Regarding studies, the
appointment.
study participants are enrolled in the BA in English (n =
Finally, during the data-gathering stage, participants 29, 80.56%) or English teaching (n = 6, 16.67%). Only
were summoned to a vacant office with a silent environ- one participant reported studying both majors (n = 1,
ment. Students were able to choose their appointments 2.78%). The two majors share the same core language
at their own convenience. The researcher provided writ- courses, including oral courses. All of the participants
ten and oral instructions to all participants. Participants are currently in their second or third year.
used noise-canceling, over-the-ear headphones to mini-
Participants’ ratings of voiceovers
mize any background noise during this stage. Although
participants could listen to the audio more than once, The following analysis delves into the realm of par-
no student asked to listen again. Participants’ answers ticipant voiceover preferences. Table 1 shows that par-
were collected through an anonymous electronic survey ticipants’ voiceover ratings can be analyzed from two
that was partially completed while listening to the au- perspectives. First, participants had a slight preference
dio. Other sections did not require the audio to be com- for female voices. Although the difference was almost
pleted. non-existent when comparing human voices, the female
AI was more than six points above the male AI. Sec-
Data Processing and Analysis
ond, participants showed a marked preference for hu-
The original data set in Excel format (xls) was subjected man voices. Even though all maximum grades were at
to computational analysis using the statistical package or above 90, the minimums for AI voices were below
for social sciences (SPSS) Version 26. The data was 55, while human voices exceeded the 70 threshold. The

47
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

standard deviation also shows that, when evaluating hu- voices, they appealed to part of the population. This was
man voices, ratings tend to be more uniform. However, especially evident when overlapping those results with
ratings are more spread when evaluating AI, indicat- the mean and maximum grades.
ing that, although AI voices ranked lower than human

Table 1. Summary of participants’ perceptions of each voiceover: mean and standard deviation

Min. Max. X̄ Median SD

Female AI 54.00 98.00 78.44 76.00 14.24
Male AI 54.00 90.00 72.22 74.00 12.35
Female Human 76.00 96.00 86.44 86.00 6.69
Male Human 74.00 98.00 86.22 90.00 8.74
Note. N = 36. Min. = Minimum; Max. = Maximum; X̄ = arithmetic mean; SD = Standard Deviation
Source: Compiled by the author based on survey responses.

Participants’ voiceover rating per criteria ing to each audio, participants used a five-point Likert
scale to rate one of the audios that were evenly and ran-
To analyze participants’ perceptions of each audio, ten domly assigned to them. Table 2 presents a summary of
criteria were chosen. As previously stated, some criteria the main findings of this section.
from the initial list of 17 were discarded. While listen-

Table 2. Summary of participants’ ratings of each criterion: means of raw data and percentage

Criteria Female AI Male AI Female Human Male Human

Intonation (monotonous – varied) 26 22 36 40
(57.78%) (48.89%) (80.00%) (88.89%)
Voice quality (unclear – clear) 45 39 43 42
(100%) (86.67%) (95.56%) (93.33%)
Voice quality (harsh – pleasant) 37 27 38 38
(82.22%) (60.00%) (84.44%) (84.44%)
Voice quality (lifeless – enthusiastic) 30 32 38 36
(66.67%) (71.11%) (84.44%) (80.00%)
Speed (paused – fluent) 41 43 42 41
(91.11%) (95.56%) (93.33%) (91.11%)
Speed (unvaried – varied) 29 33 36 34
(64.44%) (73.33%) (80.00%) (75.56%)
Vocal variety (does not convey 26 27 34 32
emotion-conveys emotion) (57.78%) (60.00%) (75.56%) (71.11%)
Vocal variety (unfriendly – friendly) 38 37 39 40
(84.44%) (82.22%) (86.67%) (88.89%)
Vocal variety (strained – natural) 39 30 42 41
(86.67%) (66.67%) (93.33%) (91.11%)
General audio quality (unintelligible 42 35 41 44
– clear) (93.33%) (77.78%) (91.11%) (97.78%)
Note. N = 36.
Source: Compiled by the author based on survey responses.

48
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

According to Table 2, some criteria show more con- important. No participant believed that listening in-
trast, while others are more similar. In terms of simi- struction was slightly important or not important. These
lar characteristics, speed (paused – fluent) (SD = 2.13) results show that participants recognize the importance
and vocal variety (unfriendly – friendly) (SD = 2.87) of listening instruction in ESL settings; however, a sig-
are three or fewer points apart from each other. In both nificant number of participants consider that listening
cases, participants perceive the friendliness and fluency instruction needs improvement.
of the voice as good and very good, respectively. On
the other hand, some criteria were different. For exam- Additionally, the survey requested that participants
ple, according to the participants’ answers, intonation evaluate the audio quality and quantity during language
(monotonous – varied) (SD = 18.68) and vocal variety classes. Concerning quality, the results were varied. Six
(strained-natural) (SD = 12.17) are characteristics that participants (16.67%) rated the audio quality as very
show great variation, favoring human voices. Another good, and ten (27.78%) labeled the audio quality and
characteristic worth mentioning is voice quality (harsh- quantity a good. The majority of the participants (n =
pleasant) (SD =11.90). In this last case, the variation oc- 11; 30.56%) considered the audios acceptable, while
curred mainly because of the perceived harshness of the nine (25.00%) mentioned that the audios were poor in
male AI voice. quality. No participant ranked them as very poor. In re-
gard to the audio quantity, eight participants (22.22%)
In addition, some criteria had higher or lower marks believed it was very good. The same number of par-
overall. For example, voice quality (unclear – clear) ticipants (n = 14; 38.89%) ranked the audio quantity as
(93.33%) and speed (slow – fluent) (93.33%) were good or acceptable. No participant chose poor or very
the two highest-ranked criteria for AI voices. In the poor for this section. These results demonstrate that the
case of human voices, voice quality (unclear – clear) use of audio should be revised, especially in terms of
(94.44%) and general audio quality (unintelligible – quality.
clear) (94.44%) were the highest. This shows that over-
all voice quality (unclear – clear) was the characteristic Finally, participants were asked about the challenges
that appealed most to participants. On the other hand, they faced when listening to the audios. The first ques-
AI voices ranked the lowest in intonation (monotonous- tion asked participants if the overall audio quality (back-
varied) (53.33%) and vocal variety (does not convey ground noise or music, static, etc.) had ever increased
emotion – conveys emotion) (58.89%), while human the difficulty level of an audio exercise in language
voices ranked the lowest in speed (unvaried – varied) classes at the university. Most participants answered
(77.78%) and vocal variety (does not convey emotion – affirmatively (n = 28; 77.78%). Only two participants
conveys emotion) (73.33%). On average, vocal variety (5.56%) mentioned that overall audio quality had not
(does not convey emotion – conveys emotion) was the been an issue. Six participants (16.67%) did not remem-
characteristic participants found more unappealing. ber any event where overall audio quality had been an
issue. The second question asked participants whether
Participants’ perceptions of audio use and instruction the speaker’s voice (accent, volume, speed, etc.) had
ever increased the difficulty level of an audio exercise
Participants’ perceptions of audio quality and instruc- in language classes at the university. Most participants
tion were analyzed based on six survey questions. First, answered affirmatively (n = 29; 80.56%). Five partici-
the survey considered listening instruction. Eight par- pants (13.89%) answered that this has never been an
ticipants (22.22%) ranked listening instruction as very issue. Only two participants (5.56%) claim to not re-
good, while 17 (47.22%) ranked it as good. Eleven par- member any instance where the voice quality hindered
ticipants (30.56%) mentioned that listening instruction their understanding. The findings demonstrate that par-
was acceptable. No participant classified listening in- ticipants do not always consider that audios are appro-
struction as poor or very poor. In addition, 17 (47.22%) priate for ESL settings.
participants considered listening instruction important
and very important. Only two participants (5.56%) The strengths and weaknesses of each audio were a
mentioned that listening instruction was moderately recurring theme in the responses to the optional, open-

49
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

ended question. The following examples summarize ing one specific piece of software. Other software may
participants’ opinions concerning the audio. include voices that are more appealing to students or
have a more remarkable resemblance to human voices.
Example 3. It sounds kind of robotic sometimes, but On the part of human voices, the researcher used pro-
it’s acceptable enough. (Participant 6, Female AI fessional voiceover experts. They have the necessary
voice) equipment and record in a professional studio. Although
Example 7. The audio is clear but the voice is too this was done with the intention of replicating the null
robotic and does not sound natural. (Participant 14, environment of AI voices, not all audios used in ESL
Male AI voice) classes share similar characteristics. Finally, AI audios
can be manipulated. In the present study, the audios
Example 13. It’s very pleasant, however, it’s (sic) were not manipulated to standardize procedures. How-
feels rushed and even though it certainly has emotion, ever, using the source software or a third-party appli-
it’s not necessary (sic) to be overlly (sic) happy nor cation, modified audio may improve AI audios. These
too excited. It’s very fluent yet it feels like some air is modifications may also alter the results from one study
neccesary (sic) in order to continue the reading. Pretty to the next. Therefore, the results included in this study
good though. (Participant 20, Female human voice) cannot be generalized but should serve as a base for fu-
ture research.
Example 19. Speed was a bit quick for a short story,
maybe a little bit of excitement would be good. (Par- Researchers should replicate this study in other ESL
ticipant 33, Male human voice) settings or other types of TTS software. For example,
not all institutions or professors may have access to
In general, participants also commented on the quality the same software. Although TTS free software ex-
of the headphones used. According to the participants’ ists, its quality and number of available voices may not
comments, they were very pleased with the equipment. compare to paid software. In addition, future research
The equipment used during listening instruction was should consider other types of environments in which
beyond the scope of this study; however, it should be noise and background noise may play a part in regular
considered in future research on listening instruction or listening instruction. Further research should also deter-
AI voiceovers. mine whether other characteristics or criteria may trig-
This investigation shed light on how students perceive ger other results.
human and AI voices. It also discussed the different
criteria used to rank voices in ESL environments. Fi- CONCLUSIONS
nally, it described students’ perceptions of audio use
Although this study’s findings are not generalizable
and instruction.
beyond the study sample, several conclusions can be
drawn from the analysis of the results. First, AI voices
DISCUSSION are not yet at the same level as human voices. In gen-
Limitations and Future Directions eral, human voices are preferred over human voices;
however, this does not imply that AI voices should not
This study has three main limitations worth mentioning. be used. Some students did not notice that they were
First, the number and type of criteria used are limited. listening to a non-human voice; even human voices,
Only ten criteria were used, and other options could recorded by experts and with professional equipment,
have been considered for this study. However, due to were criticized in some aspects. In addition, AI voices
time constraints, the ten most relevant choices, accord- cannot be used in all scenarios and contexts. For ex-
ing to the researchers’ pilot test, were included. Other or ample, AI voices are limited since they cannot create
more criteria could trigger different results. Second, the role-plays, dialogues, or other interactive communica-
researcher used four types of audio. The selection was tive instances without a lot of human intervention, at
based on the results of the pilot test for AI voices us- least not with the type of TTS software used. As AI

50
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

voices are not as appealing as human voices, they can dents perceive that audios pose additional challenges
be used to generate instructions for listening exercises, created by static, unnecessary background noise or mu-
provide audio support for readings (especially for visu- sic, volume, or accent, among other factors, they may
ally impaired students), give people who have lost their develop negative feelings towards listening exercises.
voices for medical reasons the ability to communicate Nevertheless, this does not mean that students should
orally, or create introductions or summaries of listening not be challenged. Students may face real-life situations
exercises. Finally, AI voices may be modified to play a where some of these added difficulties are present; how-
more pedagogical role by providing extra audio input ever, institutions should develop clear guidelines to pro-
or audio prompts for students to discuss various topics. vide students with appropriate materials for their level,
age, or other conditions.
Second, AI voices do not fall behind in all criteria. This
information may be useful for two populations. On the It is important to remember that users of TTS software,
one hand, people who program TTS applications may including Speechelo, can adjust pitch level, breathing,
seek to adjust, to the best of current technological ca- speed, and emphasis to make voices sound more natu-
pabilities, those characteristics that mark AI voices as ral. Although Speechelo was created with video creators
non-human. On the other hand, language professors and in mind, its use may provide opportunities to improve
material developers may take advantage of this informa- students’ language learning capabilities. The author
tion and include AI or human voices according to their suggests that other language programs replicate this
specific needs. For example, in lieu of having a human study to examine other possible TTS software uses or
record specific audios for beginners, a professor may test its possible improvement in the coming years.
decide to use AI voices since students’ main challenge
is speed, a feature that is easily adjusted in a computer- BIBLIOGRAPHICAL REFERENCES
generated environment. On the other hand, audio that
requires enthusiasm, emotion, or varied intonation may Abbott, R. (2020). The Reasonable Robot: Artificial In-
be more suitable for human voices. In addition, AI voic- telligence and the Law (1st ed.). Cambridge Univer-
es may be useful where resources and exposure to real- sity Press. https://doi.org/10.1017/9781108631761
life languages are limited. Although the Internet is an
excellent source for audio input, finding suitable audios Adamopoulou, E., & Moussiades, L. (2020). An Over-
for students’ specific needs (accent, speed, topic, dura- view of Chatbot Technology. In I. Maglogiannis, L.
tion, vocabulary or grammar level, etc.) may be time- Iliadis, & E. Pimenidis (Eds.), Artificial Intelligence
consuming or virtually impossible without considering Applications and Innovations (Vol. 584, pp. 373–
that some audios may be subject to copyright laws. 383). Springer International Publishing. https://doi.
org/10.1007/978-3-030-49186-4_31
The results of this study call for a revision of the pro-
gram’s listening instructions. Although students recog- Al-Jarf, R. (2022). Text-to-speech software for promot-
nize the importance of listening instruction, they per- ing EFL freshman students’ decoding skills and pro-
ceive some weaknesses in the instruction they receive. nunciation accuracy. Journal of Computer Science
In particular, a relevant group of students considers that and Technology Studies, 4(2), 19-30.
the number of audios, their quality, and the general qual- Akmajian, A., Farmer, A. K., Bickmore, L., Demers, R.
ity of instruction are areas that need improvement. The A., & Harnish, R. M. (Eds.). (2017). Linguistics: an
results do not indicate that these areas need to be com- introduction to language and communication (Sev-
pletely restructured; however, they point to a system- enth edition). The MIT Press.
atic revision of current policies and materials to provide
students with better and more substantial exposure to Anis, M. (2023). Leveraging Artificial Intelligence for
auditory input. Currently, policies and materials should Inclusive English Language Teaching: Strategies And
also be examined to guarantee that students are exposed Implications For Learner Diversity. Journal of Multi-
to audios according to their level and needs. When stu-

51
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

disciplinary Educational Research. 12(6). http://ijmer. Celce-Murcia, M., Brinton, D., & Goodwin, J. M.
in.doi./2023/12.06.89 (2010). Teaching pronunciation: a course book and
reference guide (2nd ed). Cambridge University
Arora, V. (2022). Artificial intelligence in schools: a Press.
guide for teachers, administrators, and technology
leaders. Routledge. Charpentier-Jiménez, W. (2019). University students´
perception of exposure to various English accents
Bione, T., Grimshaw, J., & Cardoso, W. (2017). An and their production. Actualidades Investigativas En
evaluation of TTS as a pedagogical tool for pronun- Educación, 19(2), 1–27. https://doi.org/10.15517/aie.
ciation instruction: the ‘foreign’ language context. v19i2.36908
In K. Borthwick, L. Bradley, & S. Thouësny (Eds.),
CALL in a climate of change: adapting to turbulent Chen, L. W., Watanabe, S., & Rudnicky, A. (2023). A
global conditions – short papers from EUROCALL vector quantized approach for text to speech synthe-
2017 (pp. 56–61). Research-publishing.net. https:// sis on real-world spontaneous speech. arXiv preprint
doi.org/10.14705/rpnet.2017.eurocall2017.689 arXiv:2302.04215.

BlasterOnline. (2023). Speechelo [Computer software]. Cook, A. M. (2019). Assistive technologies: principles
Romania. Retrieved from: https://app.blasteronline. and practice (5th edition). Elsevier.
com/speechelo/
Craig, S. D., & Schroeder, N. L. (2019). Text-to-Speech
Bouck, E. C. (2017). Assistive technology. Sage Publi- Software and Learning: Investigating the Relevancy
cations. of the Voice Effect. Journal of Educational Com-
puting Research, 57(6), 1534–1548. https://doi.
Brace, J., Brockhoff, V., Sparkes, N., & Tuckey, J. org/10.1177/0735633118802877
(2006). Speaking and listening map of development:
addressing current literacy challenges (2nd ed). Rig- Dell, A. G., Newton, D. A., & Petroff, J. G. (2017). As-
by-Harcourt EducationRigby. sistive technology in the classroom: enhancing the
school experiences of students with disabilities (Third
Brown, H. D., & Lee, H. (2015). Teaching by princi- edition). Pearson.
ples: an interactive approach to language pedagogy
(Fourth edition). Pearson Education. Derwing, T. M., & Munro, M. J. (2015). Pronunciation
fundamentals: evidence-based perspectives for L2
Burgess, S., & Head, K. (2005). How to teach for ex- teaching and research. John Benjamins Publishing
ams. Longman. Company.
Calais-Germain, B., & Germain, F. (2016). Anatomy Dutoit, T. (1997). An introduction to text-to-speech syn-
of voice: how to enhance and project your best voice thesis. Kluwer Academic Publishers.
(First U.S. edition). Healing Arts Press.
Emiliani, P. L., & Association for the Advancement of
Cameron, R. M. (2019). A.I. - 101: a primer on using Assistive Technology in Europe (Eds.). (2009). Assis-
artifical intelligence in education. publisher not iden- tive technology from adapted equipment to inclusive
tified. environments: AAATE 2009. Washington, DC : IOS
Cardoso, W., Smith, G., & Garcia Fuentes, C. (2015). Press.
Evaluating text-to-speech synthesizers. Critical CALL Evans, G., & Blenkhorn, P. (2008). Screen Readers and
– Proceedings of the 2015 EUROCALL Conference, Screen Magnifiers. In M. A. Hersh, M. A. Johnson,
Padova, Italy, 108–113. https://doi.org/10.14705/ & D. Keating (Eds.), Assistive technology for visually
rpnet.2015.000318 impaired and blind people. Springer.

52
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

Field, J. (2011). Psycholinguistics. In J. Simpson (Ed.), Honorof, D., McCullough, J., & Somerville, B. Comma
The Routledge handbook of applied linguistics (1st Gets A Cure | IDEA: International Dialects of Eng-
ed). Routledge. lish Archive. https://www.dialectsarchive.com/com-
ma-gets-a-cure
Fitria, T. N. (2023). English Accent Variations of Amer-
ican English (Ame) and British English (Bre): An Jeste, D. V., Graham, S. A., Nguyen, T. T., Depp, C. A.,
Implication in English Language Teaching. Sketch Lee, E. E., & Kim, H.-C. (2020). Beyond artificial
Journal: Journal of English Teaching, Literature and intelligence: exploring artificial wisdom. Interna-
Linguistics, 3(1), 1-16. tional Psychogeriatrics, 32(8), 993–1001. https://doi.
org/10.1017/S1041610220000927
Green, J. L. (2018). Assistive technology in special edu-
cation: resources to support literacy, communication, Kang, M., Kashiwagi, H., Treviranus, J., & Kaburagi,
and learning differences (Third edition). Prufrock M. (2008). Synthetic speech in foreign language
Press, Inc. learning: an evaluation by learners. International
Journal of Speech Technology, 11(2), 97–106. https://
Gulson, K. N., Sellar, S., & Webb, P. T. (2022). Algo- doi.org/10.1007/s10772-009-9039-3
rithms of education: how datafication and artificial
intelligence shape policy. University of Minnesota Karpf, A. (2006). The human voice: how this extraordi-
Press. nary instrument reveals essential clues about who we
are (1st U.S. ed). Bloomsbury Publishing.
Hadfield, J., & Hadfield, C. (2008). Introduction to
teaching English (1. publ). Oxford Univ. Press. Kent, D. (2022). Artificial intelligence in education:
fundamentals for educators. Kotesol DDC.
Harmer, J. (2007). How to teach English. (New ed., 6.
impr). Pearson/Longman. Kindersley, D. (2023). Simply Artificial Intelligence.
DK PUBLISHING.
Harmer, J. (2013). The practice of English language
teaching: with DVD (4. ed., 8. impression). Pearson King, M. R., & chatGPT. (2023). A Conversation on
Education. Artificial Intelligence, Chatbots, and Plagiarism in
Higher Education. Cellular and Molecular Bioengi-
Hartono, W. J., Nurfitri, N., Ridwan, R., Kase, E. B., neering, 16(1), 1–2. https://doi.org/10.1007/s12195-
Lake, F., & Zebua, R. S. Y. (2023). Artificial Intel- 022-00754-8
ligence (AI) Solutions In English Language Teach-
ing: Teachers-Students Perceptions And Experiences. Kochmar, E. (2022). Getting started with Natural Lan-
Journal on Education, 6(1), 1452-1461. guage Processing. Manning Publications.

Hersh, M. A., Johnson, M. A., Keating, D., & Hoff- Kumar, Y., Koul, A. & Singh, C. (2023). A deep learn-
mann, R. (Eds.). (2008). Speech, Text and Braille ing approaches in text-to-speech system: a systematic
Conversion Technology. In Assistive technology for review and recent research perspective. Multimed
visually impaired and blind people. Springer. Tools Appl 82, 15171–15197 https://doi.org/10.1007/
s11042-022-13943-4
Hillaire, G., Iniesto, F., & Rienties, B. (2019). Humanis-
ing Text-to-Speech Through Emotional Expression in Luo, B., Lau, R. Y. K., Li, C., & Si, Y. (2022). A criti-
Online Courses. Journal of Interactive Media in Edu- cal review of state‐of‐the‐art chatbot designs and
cation, 2019(1), 12. https://doi.org/10.5334/jime.519 applications. WIREs Data Mining and Knowledge
Discovery, 12(1). https://doi.org/10.1002/widm.1434
Holmes, J. N., & Holmes, W. (2001). Speech synthesis
and recognition (2nd ed). Taylor & Francis. McRoy, S. (2021). Principles of natural language pro-
cessing. Susan McRoy.

53
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

Memon, S. A. (2020). Acoustic Correlates of the Voice Patel, M. F., & Jain, P. M. (2008). English language
Qualifiers: A Survey (arXiv:2010.15869). arXiv. teaching: (methods, tools & techniques). Sunrise
https://doi.org/10.48550/arXiv.2010.15869 Publishers & Distributors.

Mitchell, M. (2019). Artificial intelligence: a guide for Paz, K. E. D. S., Almeida, A. A., Behlau, M., & Lopes,
thinking humans. Farrar, Straus and Giroux. L. W. (2022). Descritores de qualidade vocal sopro-
sa, rugosa e saudável no senso comum. Audiology
Moybeka, A. M., Syariatin, N., Tatipang, D. P., Mush- - Communication Research, 27, e2602. https://doi.
thoza, D. A., Dewi, N. P. J. L., & Tineh, S. (2023). org/10.1590/2317-6431-2021-2602
Artificial Intelligence and English Classroom: The
Implications of AI Toward EFL Students’ Motivation. Raaijmakers, S. (2022). Deep learning for natural lan-
Edumaspul: Jurnal Pendidikan, 7(2), 2444-2454. guage processing. Manning Publications Co.

Narayanan, S. S., & Alwan, A. (Eds.). (2005). Text to Taylor, P. A. (2009). Text-to-speech synthesis. Cam-
speech synthesis: new paradigms and advances. bridge University Press.
Prentice Hall Professional Technical Reference.
Ur, P. (2012). A course in English language teaching
Nass, C. I., & Brave, S. (2005). Wired for speech: how (2nd ed). Cambridge University Press.
voice activates and advances the human-computer re-
lationship. MIT Press. Wang, C., Chen, S., Wu, Y., Zhang, Z., Zhou, L., Liu, S.,
& Wei, F. (2023). Neural codec language models are
Nation, I. S. P., & Newton, J. (2009). Teaching ESL/ zero-shot text to speech synthesizers. arXiv preprint
EFL listening and speaking. Routledge. arXiv:2301.02111.

Norton, B., & Toohey, K. (2011). Identity, language Watkins, P. (2010). Learning to teach English: a practi-
learning, and social change. Language Teach- cal introduction for new teachers (Reprinted). Delta
ing, 44(4), 412–446. https://doi.org/10.1017/ Publishing.
S0261444811000309

54
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

APPENDIX 1

Comma Gets a Cure

A Diagnostic Passage for Dialect and Accent Study

by Jill McCullough & Barbara Somerville
Edited by Douglas N. Honorof
Incorporating the standard lexical set words of J.C. Wells.
Well, here’s a story for you: Sarah Perry was a veterinary nurse who had been working daily at an old zoo in a
deserted district of the territory, so she was very happy to start a new job at a superb private practice in North Square
near the Duke Street Tower. That area was much nearer for her and more to her liking. Even so, on her first morning,
she felt stressed. She ate a bowl of porridge, checked herself in the mirror and washed her face in a hurry. Then she
put on a plain yellow dress and a fleece jacket, picked up her kit and headed for work. When she got there, there was a
woman with a goose waiting for her. The woman gave Sarah an official letter from the vet. The letter implied that the
animal could be suffering from a rare form of foot and mouth disease, which was surprising, because normally you
would only expect to see it in a dog or a goat. Sarah was sentimental, so this made her feel sorry for the beautiful bird.
Before long, that itchy goose began to strut around the office like a lunatic, which made an unsanitary mess. The
goose’s owner, Mary Harrison, kept calling, “Comma, Comma,” which Sarah thought was an odd choice for a name.
Comma was strong and huge, so it would take some force to trap her, but Sarah had a different idea. First, she tried
gently stroking the goose’s lower back with her palm, then singing a tune to her. Finally, she administered ether.
Her efforts were not futile. In no time, the goose began to tire, so Sarah was able to hold onto Comma and give her
a relaxing bath.
Once Sarah had managed to bathe the goose, she wiped her off with a cloth and laid her on her right side. Then
Sarah confirmed the vet’s diagnosis. Almost immediately, she remembered an effective treatment that required her
to measure out a lot of medicine. Sarah warned that this course of treatment might be expensive – either five or six
times the cost of penicillin. I can’t imagine paying so much, but Mrs. Harrison – a millionaire lawyer – thought it
was a fair price for a cure.
Comma Gets a Cure and derivative works may be used freely for any purpose without special permission, provided
the present sentence and the following copyright notification accompany the passage in print, if reproduced in print,
and in audio format in the case of a sound recording: Copyright 2000 Douglas N. Honorof, Jill McCullough & Bar-
bara Somerville. All rights reserved.

55
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

APPENDIX 2

Survey

The purpose of this survey is to determine current practices in pronunciation instruction and the use of audio re-
cordings and their perceived quality.

This survey should take no more than 10 minutes of your time. All answers are anonymous. Your participation in
this brief survey is greatly appreciated.

Best regards,

Prof. William Charpentier

Escuela de Lenguas Modernas
University of Costa Rica

I. Demographic Information

The University of Costa Rica does not discriminate on the basis of sexual orientation, gender identity or expres-
sion, age, or national origin. In order to track the reach and effectiveness of our learning experiences and ensure we
consider the needs of all, please consider the following questions:

1. What is your gender?

Female
Male
Non-binary / third gender
Prefer to self-describe: ______
Prefer not to say

2. What is your age?

Below 18
18 – 24
25 – 34
35 – 44
45 – 54
Above 54

3. What is your native language?

English
Spanish
Other:

4. What year are you?

Freshman (1st year)
Sophomore (2nd year)
Junior (3rd year)
Senior (4th year)

56
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

II. Answer the following questions taking into account any exposure you have had to the use of dictionaries
during your major.

5. On a scale of 1 to 5, with 1 being poor and 5 being excellent, how would you rate listening instruction in the ma-
jor?
1-2-3-4-5

6. On a scale of 1 to 5, with 1 being poor and 5 being excellent, how important is listening instruction for you?
1-2-3-4-5

7. Has overall audio quality (background noise or music, static, etc.) ever increased the difficulty level of an audio
exercise in language classes at the university?
Yes
No
I don’t remember.

8. Has the speaker’s voice (accent, volume, speed, etc.) ever increased the difficulty level of an audio exercise in
language classes at the university?
Yes
No
I don’t remember.

9. On a scale of 1 to 5, with 1 being poor and 5 being excellent, how would you rate the quantity of listening exer-
cises in your oral courses?
1-2-3-4-5

10. On a scale of 1 to 5, with 1 being poor and 5 being excellent, how would you rate the quality of audios in your
oral courses?
1-2-3-4-5

You will answer this part while listening to the audio.

III. On a scale of 1 to 5, how would you rate this audio? Use the words at both ends to guide your answer.

11. The intonation of the speaker is __________________.

Monotonous 1-2-3-4-5 Varied

12. The voice quality of the speaker is __________________.

Unclear 1-2-3-4-5 Clear
Harsh, raspy 1-2-3-4-5 Mellow, pleasant
Lifeless 1-2-3-4-5 Enthusiastic

13. The timing or rate (speed) of the speaker is __________________.

Slow 1-2-3-4-5 Fluent
Unvaried 1-2-3-4-5 Varied, exciting

57
Revista Comunicación. Año 44, vol. 32, núm. 2, julio-diciembre 2023 (pp. 41-58)

14. The vocal variety:of the speaker is __________________.

Emotionless 1-2-3-4-5 Conveys emotion
Unfriendly 1-2-3-4-5 Friendly
Strained (showing signs of 1-2-3-4-5 Natural
tiredness or nervous tension)

15. How would you define the audio quality of this recording?
Unintelligible or Poor 1-2-3-4-5 Clear or Excellent

16. Please add any other comment you believe necessary.

__________________________________________________________________

APPENDIX 3

También podría gustarte

IA en Enseñanza de Idiomas
Aún no hay calificaciones
IA en Enseñanza de Idiomas
10 páginas
Uso de Tecnología en Educación Sorda
Aún no hay calificaciones
Uso de Tecnología en Educación Sorda
12 páginas
La IA en La Ensenanza de Idiomas Chatbots y Formac
Aún no hay calificaciones
La IA en La Ensenanza de Idiomas Chatbots y Formac
12 páginas
Trabajo de Grado Sánchez - Duque
Aún no hay calificaciones
Trabajo de Grado Sánchez - Duque
54 páginas
IA para Aprender Idiomas Entre Universitarios Japoneses - 2025
Aún no hay calificaciones
IA para Aprender Idiomas Entre Universitarios Japoneses - 2025
24 páginas
La Enseñanza Del Ingles en El Siglo XXI
Aún no hay calificaciones
La Enseñanza Del Ingles en El Siglo XXI
38 páginas
Reconocimiento de Voz en Educación
Aún no hay calificaciones
Reconocimiento de Voz en Educación
19 páginas
Chatgpt para Aprender Español
Aún no hay calificaciones
Chatgpt para Aprender Español
18 páginas
Desafíos Éticos y Morales de La Inteligencia Artificial
Aún no hay calificaciones
Desafíos Éticos y Morales de La Inteligencia Artificial
32 páginas
Revisión de La Tecnología de Síntesis de Voz y Recursos Lingüísticos Existentes para El Idioma Español
Aún no hay calificaciones
Revisión de La Tecnología de Síntesis de Voz y Recursos Lingüísticos Existentes para El Idioma Español
14 páginas
Tesis
100% (2)
Tesis
128 páginas
Ia en La Literatura
Aún no hay calificaciones
Ia en La Literatura
22 páginas
Efectos de herramientas tecnológicas en inglés
Aún no hay calificaciones
Efectos de herramientas tecnológicas en inglés
11 páginas
ChatGPT en La Clase de Preparación Al DELE
Aún no hay calificaciones
ChatGPT en La Clase de Preparación Al DELE
20 páginas
492 Chicaiza++
Aún no hay calificaciones
492 Chicaiza++
19 páginas
Articulo 2
Aún no hay calificaciones
Articulo 2
10 páginas
Reconocimiento de Voz en Educación
Aún no hay calificaciones
Reconocimiento de Voz en Educación
18 páginas
Exploring The Potential of An AI-based Chatbot (ChatGPT)
Aún no hay calificaciones
Exploring The Potential of An AI-based Chatbot (ChatGPT)
23 páginas
Chat GPT Dele
Aún no hay calificaciones
Chat GPT Dele
19 páginas
1195-Texto Del Artículo-11645-2-10-20250607
Aún no hay calificaciones
1195-Texto Del Artículo-11645-2-10-20250607
11 páginas
Uso de Chat GPT Como Herramienta
Aún no hay calificaciones
Uso de Chat GPT Como Herramienta
25 páginas
Audiolibros como herramienta en inglés
Aún no hay calificaciones
Audiolibros como herramienta en inglés
9 páginas
TFM Perez Garcia, Juan Antonio
Aún no hay calificaciones
TFM Perez Garcia, Juan Antonio
25 páginas
Nepturne M
Aún no hay calificaciones
Nepturne M
21 páginas
RILEX v7 n3 7 9111
Aún no hay calificaciones
RILEX v7 n3 7 9111
19 páginas
Chat GPT en La Educación Superior
Aún no hay calificaciones
Chat GPT en La Educación Superior
10 páginas
2024 TEyET
Aún no hay calificaciones
2024 TEyET
10 páginas
Traducción Robots Con Mal Acento, Convivir Con El Habla Sintética (2008) - Marc Bohlen, Colectivo Pliegue
Aún no hay calificaciones
Traducción Robots Con Mal Acento, Convivir Con El Habla Sintética (2008) - Marc Bohlen, Colectivo Pliegue
12 páginas
13-El Uso de Podcasts en La Enseñanza Secundaria
Aún no hay calificaciones
13-El Uso de Podcasts en La Enseñanza Secundaria
9 páginas
Dialnet HablaYEscucha 5920343
Aún no hay calificaciones
Dialnet HablaYEscucha 5920343
20 páginas
ChatGPT en El Ambito Educativo
Aún no hay calificaciones
ChatGPT en El Ambito Educativo
10 páginas
Integración de ChatGPT en La Formación Inicial de Profesorado
Aún no hay calificaciones
Integración de ChatGPT en La Formación Inicial de Profesorado
19 páginas
Herramientas Interactivas de Lenguas
Aún no hay calificaciones
Herramientas Interactivas de Lenguas
16 páginas
Universidad Del Zulia. Revista de La Facultad de Ciencias Económicas y Sociales Vol. XXX, Núm Especial 10 Julio-Diciembre, 2024
Aún no hay calificaciones
Universidad Del Zulia. Revista de La Facultad de Ciencias Económicas y Sociales Vol. XXX, Núm Especial 10 Julio-Diciembre, 2024
17 páginas
Proyecto2.0 (1) Mejorado
Aún no hay calificaciones
Proyecto2.0 (1) Mejorado
37 páginas
0718 5006 Formuniv 16 06 61
Aún no hay calificaciones
0718 5006 Formuniv 16 06 61
10 páginas
Luca No Huan Calu Is Anthony
Aún no hay calificaciones
Luca No Huan Calu Is Anthony
41 páginas
Borr1-Ramos Lozano Ipanaque
Aún no hay calificaciones
Borr1-Ramos Lozano Ipanaque
5 páginas
Hostetter Et Al Preprint ChatGPT - En.es
Aún no hay calificaciones
Hostetter Et Al Preprint ChatGPT - En.es
40 páginas
Explorando El Papel de La IA en La Educación Universitaria de La Informática A Través de Una Conversación
Aún no hay calificaciones
Explorando El Papel de La IA en La Educación Universitaria de La Informática A Través de Una Conversación
8 páginas
Posibilidades y Limitaciones de Chatgpt para El Aprendizaje Del Español
Aún no hay calificaciones
Posibilidades y Limitaciones de Chatgpt para El Aprendizaje Del Español
12 páginas
Ia para El Aprendizaje Del Inglés Como Lengua Extranjera.
Aún no hay calificaciones
Ia para El Aprendizaje Del Inglés Como Lengua Extranjera.
14 páginas
Introducción al Voice AI y su Impacto
Aún no hay calificaciones
Introducción al Voice AI y su Impacto
29 páginas
Beneficios y Desafíos de Los Asistentes Virtuales en El Aprendizaje
Aún no hay calificaciones
Beneficios y Desafíos de Los Asistentes Virtuales en El Aprendizaje
16 páginas
Medios Audiovisuales y Comprensión Auditiva en Inglés
Aún no hay calificaciones
Medios Audiovisuales y Comprensión Auditiva en Inglés
24 páginas
Análisis Crítico Sobre El Uso de ChatGPT en La Educación
Aún no hay calificaciones
Análisis Crítico Sobre El Uso de ChatGPT en La Educación
3 páginas
La Inteligencia Artificial
Aún no hay calificaciones
La Inteligencia Artificial
3 páginas
05 Articulo DosierIA 1 2024
Aún no hay calificaciones
05 Articulo DosierIA 1 2024
10 páginas
Conclusión General de Entrevistas PDF
Aún no hay calificaciones
Conclusión General de Entrevistas PDF
2 páginas
Colaboración Humano-Máquina en El
Aún no hay calificaciones
Colaboración Humano-Máquina en El
23 páginas
Literacidad de Corpus en Educación Docente
Aún no hay calificaciones
Literacidad de Corpus en Educación Docente
7 páginas
Vol137 5 Estudio Comparativo Experimental Del Uso de Chatgpt y Su Influencia
Aún no hay calificaciones
Vol137 5 Estudio Comparativo Experimental Del Uso de Chatgpt y Su Influencia
14 páginas
Crítica al Uso de ChatGPT en Universidades
Aún no hay calificaciones
Crítica al Uso de ChatGPT en Universidades
8 páginas
Lopez Arenas Armando Arturo1
Aún no hay calificaciones
Lopez Arenas Armando Arturo1
61 páginas
Chatgpt en El Ámbito Educativo: Chatgpt in The Educational Field
Aún no hay calificaciones
Chatgpt en El Ámbito Educativo: Chatgpt in The Educational Field
10 páginas
35153-Texto Do Trabalho-163701-1-10-20240708
Aún no hay calificaciones
35153-Texto Do Trabalho-163701-1-10-20240708
20 páginas
Caso de Franz Xaver Forgo
Aún no hay calificaciones
Caso de Franz Xaver Forgo
5 páginas
ACTUARIALMTAHANDER 1y2
83% (6)
ACTUARIALMTAHANDER 1y2
57 páginas
Cuadernillo de Comprensión Lectora Nivel 1 El Profe Cool
Aún no hay calificaciones
Cuadernillo de Comprensión Lectora Nivel 1 El Profe Cool
20 páginas
Permiso de Trabajo para Alturas 2023
Aún no hay calificaciones
Permiso de Trabajo para Alturas 2023
2 páginas
Grasa Viperlube
Aún no hay calificaciones
Grasa Viperlube
4 páginas
Inventario Cociente Emocional Bar-On
100% (5)
Inventario Cociente Emocional Bar-On
29 páginas
Hipótesis y Métodos en Resistencia Materiales
100% (2)
Hipótesis y Métodos en Resistencia Materiales
3 páginas
Técnicas de Aprestamiento para Niños
50% (2)
Técnicas de Aprestamiento para Niños
34 páginas
Convocatoria Docente Ii-2025
Aún no hay calificaciones
Convocatoria Docente Ii-2025
2 páginas
C.C Acusado Como Testigo en Su Juicio.
Aún no hay calificaciones
C.C Acusado Como Testigo en Su Juicio.
32 páginas
Glosario de Tang Soo Do
Aún no hay calificaciones
Glosario de Tang Soo Do
1 página
S6-Matriz Operacionalización de Variable
Aún no hay calificaciones
S6-Matriz Operacionalización de Variable
3 páginas
Guía para Docentes en Educación Inicial
Aún no hay calificaciones
Guía para Docentes en Educación Inicial
3 páginas
Reveduhumanismo, MAQUETA-El Sociograma
Aún no hay calificaciones
Reveduhumanismo, MAQUETA-El Sociograma
12 páginas
Geologia Sur de Paipa
100% (1)
Geologia Sur de Paipa
1 página
Cronometría Dentaria de Los Ovinos
Aún no hay calificaciones
Cronometría Dentaria de Los Ovinos
5 páginas
Ideas de Juegos
100% (1)
Ideas de Juegos
2 páginas
Mapa de Guatemala Con Departamentos y Cabeceras
100% (1)
Mapa de Guatemala Con Departamentos y Cabeceras
3 páginas
Historia de La Seguridad Social en RD
Aún no hay calificaciones
Historia de La Seguridad Social en RD
6 páginas
Teorías del Derecho Internacional Público
100% (1)
Teorías del Derecho Internacional Público
3 páginas
Certificado de Salud 2021: Aptitud Física
Aún no hay calificaciones
Certificado de Salud 2021: Aptitud Física
1 página
Adicción A La Marihuana en Los Jóvenes
Aún no hay calificaciones
Adicción A La Marihuana en Los Jóvenes
9 páginas
Ficha Tecnica Brisa 90w
Aún no hay calificaciones
Ficha Tecnica Brisa 90w
4 páginas
Guía de Aprendizaje Grado 8
Aún no hay calificaciones
Guía de Aprendizaje Grado 8
7 páginas
Nombres y Titulaciones
Aún no hay calificaciones
Nombres y Titulaciones
1 página
Historia de La Música Desde La Antigüedad Hasta Hoy.
Aún no hay calificaciones
Historia de La Música Desde La Antigüedad Hasta Hoy.
3 páginas
Avaluo
Aún no hay calificaciones
Avaluo
12 páginas
Ficha Técnica KPIs
Aún no hay calificaciones
Ficha Técnica KPIs
5 páginas
El Grafo del Deseo y el Sujeto
Aún no hay calificaciones
El Grafo del Deseo y el Sujeto
3 páginas
Trabajo de Investigacion
Aún no hay calificaciones
Trabajo de Investigacion
80 páginas