2019
Mexico has great linguistic diversity. In addition to Spanish, there are 68 language groups and 364 variants (INALI, 2008), divided into 11 families. However, this wealth has been threatened by discrimination against speakers. Indeed, Spanish has been imposed from the legislative, political and economic points of view, which has interrupted the intergenerational transmission of indigenous languages and, with it, caused the gradual loss of spaces of use and communicative functions. Likewise, few technologies have been developed for these languages, because few texts written in them exist on the internet. The CPLM is a collaborative parallel corpus that contains aligned texts in Spanish and in six indigenous languages: Mayan, Ch'ol, Mazatec, Mixtec, Otomi and Nahuatl. This article describes the development of the CPLM, as well as the difficulties encountered throughout the process.
International Conference on Language Resources and Evaluation, 2020
Mexico is a Spanish-speaking country with great linguistic diversity: 68 linguistic groups and 364 varieties. Because these languages lack representation in education, government, public services and media, they show high levels of endangerment. Due to the lack of data available on social media and the internet, few technologies have been developed for them. To analyze different linguistic phenomena in the country, the Language Engineering Group developed the Corpus Paralelo de Lenguas Mexicanas (CPLM) [The Mexican Languages Parallel Corpus], a collaborative parallel corpus for the low-resourced languages of Mexico. The CPLM aligns Spanish with six indigenous languages: Maya, Ch'ol, Mazatec, Mixtec, Otomi, and Nahuatl. First, this paper describes the process of building the CPLM: text searching, digitization and alignment. We also present some difficulties regarding dialectal and orthographic variation. Second, we present the interface and the types of search, as well as the use of filters.
Lacorte, Manel (ed.): The Routledge Handbook of Hispanic Applied Linguistics. New York: Routledge, 2014, 371-387, 2014
Corpus linguistics (CL) is one of the most important orientations in current theoretical and applied linguistics. This chapter considers the main characteristics of this approach in Spanish linguistics. After a brief examination of the main general features of Spanish corpora, we summarize both the antecedents of CL and the different phases in its development, analyze the current situation with regard to some fundamental topics, and include references to different Spanish corpora. Finally, a number of notions that are likely to be of central concern in the coming years are examined.
2019
Computational technologies play a key role in Computational Linguistics. Thanks to the ability to compile and analyze large collections of texts with computers, many resources and applications have been designed that have driven rapid development in Natural Language Processing and Artificial Intelligence. Corpora and parallel corpora are basic instruments for approaching natural language, making possible the implementation of models for machine translation, automatic summarization, information extraction and other methods for language understanding and analysis. All these advances in language technologies need large amounts of data. The most widespread and best-represented languages in the media and on the internet generate gigabytes of information every day that can easily be processed and studied. However, most of the languages in the world are under-represented in social life, the media, and the internet. These are low-resourced languages. An example of this is indigenous languages in...
2002
Abstract: Collections of texts with syntactic annotation are nowadays useful resources. They are employed for diverse tasks in theoretical research and natural language applications. The most important collections are dedicated to English, but great efforts have been made to develop corresponding collections for other languages. In this work we present the initial steps in the compilation of a Mexican Spanish text corpus with syntactic annotation. Keywords: text collection, annotated corpus, corpus compilation
2024
This lecture proposes a twofold historicist reading of the Spanish-language linguistic landscape (LL) in Switzerland (with a primary focus on the city of Lausanne): 1) it reads the history of the Spanish-speaking migrant community in the LL since Switzerland and Spain signed a bilateral immigration treaty more than sixty years ago; 2) it observes this LL through a data collection spanning a decade (2013 to 2024) at four points in time: 2013/14, 2016, 2019 and 2024, producing a time-lapse view that shows the constant evolution of Spanish-language signs in step with the changes the Spanish-speaking community in this country is undergoing. I assume, on the one hand, with Blommaert (2013: 51), that "Linguistic landscaping can be all kinds of things, but not an a-historical inquiry; it is an instrument for historical research"; and, on the other, with historians specializing in democratic memory, such as the Spanish historian Sergio Molina (2024), that democratic memory initiatives need to include the memory of Spanish emigration to Europe, in order to help combat current xenophobic attitudes in Spain: In the current context, marked by identity battles and the rise of the far right, it is necessary to include the stories of Spanish emigration in the account of our most recent past in order to understand the movement of societies and the permeability of borders. It should be stressed, for example, how important Spanish emigration to Europe was during the Franco regime and what impact it had on the economy, society and politics of the time. This research began in 2013, and I have since published my results on two occasions (Castillo Lluch 2019 and 2022).
This occasion will allow me to complete a comparative view across a decade and to check whether the dynamics observed in recent years (chiefly the dismantling of the premises of last century's Spanish associations, with the consequent disappearance of their signs from the LL, and the expansion of signs of the Hispanic American community) persist and continue on their course.
Two proposals for conferences to be given in the last two weeks of June from Aztlán to Cuzcatlán...
Journal of Language Contact, 2010
2016
The essays in and Lipski (2007) raised relevant considerations about the status of Hispanic Linguistics in American universities, and Del Valle's comments (2014) confirmed the ongoing validity of those remarks on Hispanic Linguistics as a research area and as an educational field. These are two sides of the same coin: our research, divulged in conference papers, journal articles, book chapters, and books; and our activities, such as teaching courses, advising students, or directing theses and dissertations. When there is a good fit between the two sides, research and teaching may be closely related. Due to circumstances to be commented on below, however, the synergy between research and teaching is often tenuous. Whereas literature professors at research universities rarely give courses outside their specialization, in the same institutions it is not unusual for the one or two linguists in the Spanish department to teach courses in areas (phonetics, phonology, morphology, syntax, semantics, pragmatics, and language variation) unrelated to their research field. In addition, Hispanic linguists are often responsible for activities that, while benefitting from their specialized training, do not constitute linguistics per se, such as training teaching assistants, designing language courses, or directing language programs. Comments on this arrangement appeared in several of the essays mentioned above and need not be revisited here. Regarding research, the past decade has seen an impressive output in Spanish linguistics. 
Traditional venues, such as the conventions of the American Association of Teachers of Spanish and Portuguese (AATSP) and, to a lesser extent, the Modern Language Association (MLA), have featured essays on Spanish linguistics; the Congress on Spanish in the United States, first held in 1980, and the Congress of Spanish in Contact with other Languages (held jointly with the latter since 1991) have continued to be active venues for a growing number of specialists; and the first congress of the recently reactivated Academia Norteamericana de la Lengua Española (ANLE), held in June 2014, featured essays on Spanish in the United States.
2016
This paper reflects on the specific needs of linguistic research regarding the construction of bilingual parallel corpora, and primarily on the conclusions to be drawn for their design, compilation and domains. A research group of the university in Santiago is currently building a bilingual parallel corpus (Corpus PaGeS) consisting of original texts in German and Spanish together with their translations into the other language, as well as German and Spanish translations from a third language. This corpus was originally intended for linguistic research purposes, specifically the analysis of the expression of spatial relations. First, a brief survey of some significant existing related corpora is given, and their limitations for linguistic studies are outlined. The different issues that were taken into account in the design of the corpus will be explained, such as type of texts, domains, regional language variety, and quality and direction of translations. Afte...
Revista Estudios del Discurso Digital (REDD)
CHIMERA: Revista de Corpus de Lenguas Romances y Estudios Lingüísticos, 2018
In this work, we present the EspaDA-UNCuyo Corpus to share the findings and future directions of our research. The project covers the corpus design stage. Its general aim is to establish external criteria in order to systematize a representative and balanced sample of academic written and spoken discourse in Spanish at UNCuyo (Mendoza, Argentina). Among the most relevant findings, we established external criteria and started a compilation of authentic communicative events. In this presentation, we will make connections between the external criteria system and the performance of each criterion's categories in the initial stage of sample compilation.
Lusophone, Galician, and Hispanic Linguistics: Bridging Frames and Traditions, 2019
2008
This paper describes part of a three-year collaboration between Carnegie Mellon University's Language Technologies Institute, the
Procedia Social and Behavioral Sciences, 2013
Everyone working on general language would like their corpus to be bigger, wider-coverage, cleaner, duplicate-free, and with richer metadata. As a response to that wish, Lexical Computing Ltd. has a programme to develop very large web corpora. In this paper we introduce the Spanish corpus, esTenTen, of 8 billion words and 19 different national varieties of Spanish. We investigate the distance between the national varieties as represented in the corpus, and examine in detail the keywords of Peninsular Spanish vs. American Spanish, finding a wide range of linguistic, cultural and political contrasts.
List of Spanish dictionaries in the United States. Presented at the "25th Conference on Spanish in the United States". New York, March 26-29, 2015
Nueva Revista de Filología Hispánica (NRFH), 2006
Language Problems and Language Planning, 2010
the cultural significance of Spanish and its unique roles as a first language, a foreign language and a border language. Third, Humberto López Morales points out that the encyclopedia addresses US Spanish in its multiple and diverse manifestations including demographic, legal, linguistic, pedagogical, artistic and other domains. The volume reads like a Who's Who of Hispanic language and culture in the USA; its 49 contributors represent the best scholars on the various topics covered and are too numerous to mention here. Several authors contribute more than one entry, either as single authors or as co-authors. There are sixteen sections, each with separate essays by outstanding scholars, and each chapter contains subdivisions with individual encyclopedic entries on various aspects of the main topic. The theme of each section merits mention here since the spectrum of topics covered is quite broad: 1. Las primeras huellas hispanas (The first Hispanic traces) 2. La demografía hispánica en suelo norteamericano (Demography on North American soil)