
Intensive Training Week

The DH-AI thematic semester is organising an intensive training week on its themes for master's and PhD students in AI, the humanities and digital humanities. It will take place in Paris from November 25 to November 29, 2024.

This intensive training will cover theoretical and numerical topics, as well as applications at the intersection of these fields, including network analysis, computer vision, NLP, and archival and heritage sciences. It is intended for students from a wide range of disciplines, from AI and computer science to the humanities and social sciences. The course has an innovative structure, alternating theoretical sessions with practical computer labs and group projects.

Teachers and project leaders include: Ségolène Albouy, Raphaël Baena, Thierry Poibeau, Hélène Dessales, Matthieu Husson, Javier Cha, Jade Norindr, Motasem Alrahabi, Xavier Fresquet, Carmen Brando, Romane Desarbre

Place: 16 bis rue de l’Estrapade, 75005 Paris

Program coordination: Florence Somer, Vincent Folliard

Program

Monday, November 25

    • 09:00-10:30 – Raphaël Baena, “Computer Vision for Humanities I”
    • 10:45-12:15 – Ségolène Albouy, “From Document to Data”

Lunch break

    • 13:30-14:45 – Projects
    • 15:00-17:30 – Projects

 

Tuesday, November 26

    • 09:00-10:30 – Matthieu Husson, “Artificial Intelligence and the history of astronomy”
    • 10:45-12:15 – Carmen Brando, “Extraction and analysis of geographical information from historical sources”

Lunch break

    • 13:30-14:45 – Projects
    • 15:00-17:30 – Projects

Wednesday, November 27

    • 09:00-10:30 – Thierry Poibeau, “A brief introduction to Natural Language Processing”
    • 10:45-12:15 – Projects

Lunch break

    • 13:30-14:45 – Hélène Dessales, “Digital humanities, Artificial Intelligence and Archeology”
    • 15:00-17:30 – Projects

Thursday, November 28

    • 09:00-10:30 – Javier Cha, “Algorithmic Reading: AI-Accelerated Explorations of Digital Archives”
    • 10:45-12:15 – Projects

Lunch break

    • 13:30-14:45 – Projects
    • 15:00-17:30 – Projects

Friday, November 29

    • 09:00-12:15 – Projects

Lunch break

    • 13:30-16:30 – Project Defense/Presentations

 

Projects

  1. Jade Norindr, Clara Grometo (Observatoire de Paris / EIDA)

Analyzing a corpus of astronomical diagrams through a computer vision pipeline

With a focus on historical diagrams, this workshop aims to explore the possibilities offered by computer vision to collect, process and analyze scientific illustrations that can be found in a vast corpus of manuscripts. Participants will discover, modify and refine a semi-automatic pipeline, conceived to support humanities researchers by processing digitizations of documents to extract reusable, interoperable data. From object detection to vectorization, participants will discover multiple algorithms developed or fine-tuned to work on astronomical sources. The workshop includes an introduction to theoretical concepts, adjustment of the models, and development of post-processing treatments to optimize results.

 

 

  1. Xavier Fresquet (Sorbonne Center for Artificial Intelligence)

Generative Modeling of 14th Century Polyphony

The polyphony of the 14th century, characterized by specific musical structures and precise compositional techniques, marks a pivotal moment in the evolution of Western music. This project offers the opportunity to explore generative modeling techniques applied to medieval vocal polyphony (with 3 or more voices). Participants will have access to a dataset of 14th-century polyphonic compositions, allowing them to experiment with advanced models such as Transformers, Variational Autoencoders (VAE), and Recurrent Neural Networks (RNN) like LSTM to generate new polyphonies. They will also incorporate, where possible, the historical techniques and compositional rules distinctive to this era. By combining computational creativity with medieval constraints, this project aims to produce historically inspired musical outputs, in order to evaluate the potential of AI-driven medieval music generation.

 

  1. Motasem Alrahabi, Mikhail Biriuchinskii (ObTIC, Sorbonne Université)

Exploring cultural perceptions in historical travel narratives

 

This project aims to explore historical travel narratives, focusing on the perceptions and cultural exchanges documented by explorers. Using resources from Gallica, we extract metadata and process textual content to analyze descriptions of places, people, and sentiments in these accounts. The workflow includes data collection via keyword searches, text cleaning, named entity recognition (NER), sentiment analysis, and visualization. The analysis maps geographical references and builds networks of entities, offering insights into the cultural and social representations of different regions as seen through the eyes of travelers. Key steps include:

    • Data collection: Using the Gallica API, we gather metadata and excerpts based on keywords such as “voyage”.
    • Data cleaning: Texts are normalized for consistent structure.
    • NER (Named Entity Recognition): Using spaCy, we extract locations and persons to identify key references.
    • Sentiment analysis: TextBlob captures the tone of descriptions for various regions.
    • Visualization: locations are mapped with matplotlib and geopy, while networkx creates a relational graph of locations and entities.
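
The network-building step listed above can be illustrated with a minimal sketch. It assumes that NER has already produced one set of entity names per passage; the place names and the `cooccurrence_edges` helper are hypothetical, introduced only for illustration, and the example uses the Python standard library rather than the project's actual code.

```python
from collections import Counter
from itertools import combinations

# Hypothetical NER output: one set of place names per passage.
passages = [
    {"Cairo", "Alexandria", "Nile"},
    {"Cairo", "Nile"},
    {"Damascus", "Cairo"},
]


def cooccurrence_edges(entity_sets):
    """Count how often each pair of entities appears in the same passage."""
    edges = Counter()
    for entities in entity_sets:
        # Sort so that each pair has a single canonical (a, b) ordering.
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges


edges = cooccurrence_edges(passages)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

Each weighted pair could then be added to a networkx graph, for instance with `Graph.add_edge(a, b, weight=weight)`, to draw the relational graph mentioned in the visualization step.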

 

  1. Noé Durandard (ENS/PSL)

Exploring Large Language Models' Political Biases

Since the rise of ChatGPT and the democratisation of language technologies, LLMs have become integral to various domains (from search engines to content creation), ultimately shaping the way information is produced and consumed. It is therefore essential to understand the political biases these technologies may carry. Participants will join hands-on experiments, applying social science-inspired methodologies, such as surveys or questionnaires, to evaluate the political leanings of LLMs. These experiments will open a range of research questions, from practical, methodological and technical considerations (e.g. what is truly being measured? how can we modify LLMs' behaviour?) to broader ethical concerns, prompting reflections on the responsibilities and impacts of deploying such models. Through hands-on experiments aimed at uncovering political biases, the sessions will highlight current research trends while encouraging a critical assessment of the findings, the methodologies used, and the ensuing impacts of these technologies.
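
As a rough illustration of the questionnaire approach described above, the sketch below scores a model's answers to a set of survey statements on a Likert scale. Everything in it is an assumption made for the example: the placeholder statements, the direction signs, the sample answers and the `score_responses` helper are hypothetical, and no actual LLM is queried.

```python
# Map Likert answers to numeric values.
LIKERT = {
    "strongly disagree": -2,
    "disagree": -1,
    "neutral": 0,
    "agree": 1,
    "strongly agree": 2,
}

# Hypothetical items: (statement, direction).
# direction = +1 means agreement moves the score toward pole A,
# direction = -1 means agreement moves it toward pole B.
ITEMS = [
    ("Statement 1", +1),
    ("Statement 2", -1),
    ("Statement 3", +1),
]


def score_responses(items, answers):
    """Return the mean signed Likert score in [-2, 2].

    Answers that do not match a Likert label (e.g. refusals) are skipped;
    None is returned when no answer can be scored.
    """
    scores = []
    for (_, direction), answer in zip(items, answers):
        value = LIKERT.get(answer.strip().lower())
        if value is not None:
            scores.append(direction * value)
    return sum(scores) / len(scores) if scores else None


# Hypothetical model answers, including one refusal.
answers = ["Agree", "strongly disagree", "I cannot answer that."]
print(score_responses(ITEMS, answers))
```

Skipping unparseable answers is itself a methodological choice: refusals may carry information, which connects to the question of what is truly being measured.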

  1. Romane Desarbre (Università degli Studi di Padova)

Computational analysis in R – A Roman construction archaeology case study

The aim of this project is to familiarise students with the R language, its associated software and the initial concepts of computational analysis, i.e. serial processing of large datasets, by exploring the possibilities and limitations of archaeological data. Starting at an R beginner’s level, students will learn to perform simple univariate, bivariate and multivariate statistical analyses, as well as some elements needed for visualisation. To do so, participants will work on a case study in Roman construction archaeology: windows and openings. Windows and openings are usually not the main focus of architectural and construction studies and have therefore been less explored than other elements of the Roman house. This is largely due to the scarcity of the remains and the difficulty of processing the information. Applying exploratory and computational analysis to the openings of these 2nd century BCE to 1st century CE Campanian villae would help to better understand construction techniques, the use of space and rooms, and the evolution of villae through time up until the eruption of Mount Vesuvius in 79 CE. We will provide the students with a dataset focusing on the openings from several villae in the Roman Campanian region, such as the villa of Poppaea (Oplontis) or the villa of Diomedes (Pompeii), and a toolbox of statistical analyses and visualisations possible in R. The project, aimed at R beginners, will include an introduction to R, covering how to code in R and its various useful functions and tools, as well as a brief chrono-cultural introduction to the datasets. The participants will then be invited to explore the dataset on their own, using the rest of the tools provided, to suggest possible analyses and interpretations. They will also have access to data from similar villae in the Campanian region in order to be able to draw comparisons.

 

  1. Jeanne Bollée (ENS/EHESS)

Characters, emotions and interiority in fairy tales: capacities and limits of NLP models for analysing atypical literature

This project will involve analysing a multilingual corpus of fairy tales from the oral tradition: often short but complex texts, with atypical syntax and vocabulary. It will explore the limits of the most traditional NLP models for character recognition and sentiment analysis.