Edited books by Marco Passarotti
This keynote focuses on the larger picture of how the different disciplines of classics and archaeologies, rooted in analogue times, change when they are taken to the digital level. It also considers what that change looks like from the perspective of the often misunderstood role of computational linguists within the data ecosystem of classics and archaeologies.
Papers by Marco Passarotti
This paper examines the diachronic distribution of one of the richest classes of nouns in Latin, namely those ending in -io. The work combines a morphological analyser for Latin (Lemlat) with a database collecting all word forms occurring through different periods of the Latin language (TF-CILF).

Despite a centuries-long tradition in lexicography, Latin lacks state-of-the-art computational lexical resources. This situation is closely related to the still quite limited amount of linguistically annotated textual data for Latin, which can support the building of new lexical resources with empirical evidence. However, projects for creating new language resources for Latin have been launched over the last decade to fill this gap. In this paper, we present Latin Vallex, a valency lexicon for Latin built in mutual connection with the semantic and pragmatic annotation of two Latin treebanks featuring texts of different eras. On the one hand, this connection between the empirical evidence provided by the treebanks and the lexicon makes it possible to enrich each frame entry in the lexicon with its frequency in real data. On the other hand, each valency-capable word in the treebanks is linked to a frame entry in the lexicon.
This paper investigates the distribution of word formation data through network visualisation, as an entry point for the exploration and analysis of productivity in affixal derivation in Classical Latin. This study uses data from the Word Formation Latin lexicon, a derivational morphology resource for Latin, where entries are analysed into their formative components, and relationships between them are established on the basis of word formation rules.
The recent enhancement of the morphological analyser for Latin Lemlat with a large Onomasticon enables us to analyse both the morphology and the distribution of loanwords in the Latin lexicon. In this paper, we first describe the categories of proper names that could not be inserted into Lemlat automatically, showing that a large part of them are loanwords. Then, we present the results of a qualitative analysis of loanwords to detect those ‘exceptional’ endings that identify loanwords whose inflectional properties have not been assimilated to the regular ones of the Latin morphological system. Finally, we report a quantitative analysis of the data to study the frequency of such loanwords in Latin texts.
The Word Formation Latin project is developing a new lexicon of Latin based on derivational morphology, a branch of linguistics that is attracting growing interest in NLP thanks to its connection with semantics. This paper describes an easy-to-use web application for accessing this resource through a combination of queries and interactive visualisations.
This paper introduces the main components of the downloadable package of the 3.0 version of the morphological analyser for Latin Lemlat. The processes of word form analysis and treatment of spelling variation performed by the tool are detailed, as well as the different output formats and the connection of the results with a recently built resource for derivational morphology of Latin. A light evaluation of the tool's lexical coverage against a diachronic vocabulary of the entire Latin world is also provided.
We present a valency lexicon for Latin verbs extracted from the Index Thomisticus Treebank, a syntactically annotated corpus of Medieval Latin texts by Thomas Aquinas. In our corpus-based approach, the lexicon reflects the empirical evidence of the source data. Verbal arguments are induced directly from annotated data. The lexicon contains 432 Latin verbs with 270 valency frames. The lexicon is useful for NLP applications and is able to support annotation.
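The idea of inducing verbal arguments directly from annotated data can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy clauses, the argument labels and the `induce_frames` helper are hypothetical and do not reproduce the Index Thomisticus Treebank annotation scheme or the authors' extraction procedure.

```python
from collections import Counter, defaultdict

# Toy input: one (verb lemma, argument labels) pair per annotated clause.
# Both the clauses and the label names are invented for this example.
CLAUSES = [
    ("do", ("subject", "object", "indirect_object")),
    ("do", ("subject", "object")),
    ("amo", ("subject", "object")),
    ("amo", ("object", "subject")),
    ("venio", ("subject",)),
]


def induce_frames(clauses):
    """Group observed argument sets by verb and count each distinct frame."""
    lexicon = defaultdict(Counter)
    for verb, labels in clauses:
        # Sort the labels so that word order does not create spurious frames.
        frame = tuple(sorted(labels))
        lexicon[verb][frame] += 1
    return lexicon


lexicon = induce_frames(CLAUSES)
for verb in sorted(lexicon):
    for frame, count in lexicon[verb].most_common():
        print(verb, frame, count)
```

Because each frame is stored with its count, a lexicon built this way reflects the frequencies observed in the source data rather than a hand-written inventory.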
... 25 Rapid Adaptation of NE Resolvers for Humanities Domains using Active Annotation Asif Ekbal, Francesca Bonin, Sriparna Saha, Egon Stemle, Eduard Barbu, Fabio ... 105 Exploring New High German Texts for Evidence of Phrasemes Cerstin Mahlow, Britta Juska-Bacher ...
... this is an especially appropriate mode of representation for languages with a moderately free word order (such as Greek, Latin and Czech), where the linear order of constituents is interrupted by elements ... mora ADV ceteri sine mora veniunt ...

Research in the Humanities is predominantly text-based. For centuries scholars have studied documents such as historical manuscripts, literary works, legal contracts, diaries of important personalities, old tax records etc. Manual analysis of such documents is still the dominant research paradigm in the Humanities. However, with the advent of the digital age this is increasingly complemented by approaches that utilise digital resources. More and more corpora are made available in digital form (theatrical plays, contemporary novels, critical literature, literary reviews etc.). This has a potentially profound impact on how research is conducted in the Humanities. Digitised sources can be searched more easily than traditional, paper-based sources, allowing scholars to analyse texts more quickly and more systematically. Moreover, digital data can also be (semi-)automatically mined: important facts, trends and interdependencies can be detected, complex statistics can be calculated and the results can be visualised and presented to the scholars, who can then delve further into the data for verification and deeper analysis. Digitisation encourages empirical research, opening the road for completely new research paradigms that exploit 'big data' for humanities research. This has also given rise to Digital Humanities (or E-Humanities) as a new research area. Digitisation is only a first step, however. In their raw form, electronic corpora are of limited use to humanities researchers. The true potential of such resources is only unlocked if corpora are enriched with different layers of linguistic annotation (ranging from morphology to semantics).
While corpus annotation can build on a long tradition in (corpus) linguistics and computational linguistics, corpus and computational linguistics on the one side and the Humanities on the other side have grown apart over the past decades. We believe that a tighter collaboration between people working in the Humanities and the research community involved in developing annotated corpora is now needed because, while annotating a corpus from scratch still remains a labor-intensive and time-consuming task, today this is simplified by intensively exploiting prior experience in the field. Indeed, such a collaboration is still quite far from being achieved, as a gap still exists between computational linguists (who sometimes do not involve humanists in ...
The ACRH-2 Co-Chairs and Organisers
In the Word Formation Latin lexicon, entries are analysed into their formative components, and relationships between them are established on the basis of Word Formation Rules (WFRs). For example, amo (to love) and amator (lover) are connected by a relationship that describes a change from a verb to a noun through the addition of a suffix (-a-tor) that itself bears semantic information (in this case it characterises agentive and instrumental nouns, i.e. someone or something performing an action).
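A relationship of this kind can be sketched as a small data structure. The following Python example is only an illustration: the class names, field names and the `derived_from` helper are hypothetical and do not reflect the actual schema of the Word Formation Latin lexicon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    lemma: str
    pos: str  # part of speech, e.g. "V" (verb) or "N" (noun)


@dataclass(frozen=True)
class WordFormationRule:
    # Hypothetical rule record; the fields are illustrative only.
    source_pos: str
    target_pos: str
    affix: str
    meaning: str


@dataclass(frozen=True)
class Derivation:
    source: Lexeme
    target: Lexeme
    rule: WordFormationRule


# The suffixation rule from the example: verb -> agentive/instrumental noun.
V_TO_N_TOR = WordFormationRule("V", "N", "-a-tor", "agentive/instrumental noun")

amo = Lexeme("amo", "V")
amator = Lexeme("amator", "N")
edges = [Derivation(amo, amator, V_TO_N_TOR)]


def derived_from(lemma, derivations):
    """Return the lemmas directly derived from the given lemma."""
    return [d.target.lemma for d in derivations if d.source.lemma == lemma]


print(derived_from("amo", edges))
```

Storing the rule on each edge keeps the part-of-speech change and the meaning of the affix available when the derivations are later queried or drawn as a network.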
The Word Formation Latin project received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 658332-WFL. It ran from November 2015 to October 2017 and resulted in a word formation based lexicon and tool for Latin. The work was carried out at the CIRCSE Research Centre of Università Cattolica del Sacro Cuore in Milan.
The first Workshop on Resources and Tools for Derivational Morphology (DeriMo), whose contributions are collected in these proceedings, was organised to celebrate the end of the project and to consider the current status of research in the field.