By Alleiah Kall
Historical archives are full of sensitive data and discriminatory language because they reflect political, social, or colonial biases. As these archives are increasingly digitised and made accessible today, many questions arise as to how historical scholarship can engage responsibly with the imperial and colonial heritage and inherited institutional practices of most European archives. The thorniest question in this context is: does minimising the inherent biases and discriminatory potential in archival language have any effect upon the communities being discriminated against?
This question was one of the topics discussed at the international conference on ‘Data ethics for historical research in a digital era: Critical reflections and best practices’, organised by the Leibniz Institut für Europäische Geschichte (IEG) in Mainz in November 2025. The panel “Session II: Sensitive Data and Discriminatory language”, in particular, focused on the crucial question of how to build an ethical relationship between the people whose data is digitised, and those who digitise and reuse that data to further the aims of historical research. The presentations and discussions explored how ethical responsibility should be exercised in the digitisation process. It quickly became clear that this means more than just thinking about the manifestation of bias in metadata and re-evaluating decisions for or against certain terms during digitisation.
Story as Resource: Decisions around Anonymisation and Biographical Exposition
When does an individual’s story start playing a role in history? This may happen when an individual produces an autobiographical text; and it often happens once an individual is dead, and can no longer determine the course of their own story. This is why historian Ian Brunskill has called obituaries the “first drafts of history”.[1] When a deceased individual is memorialised, the story written about their life has a certain narrative thrust, often determined by the community grieving or remembering them. An even deeper layer of control over this sort of story is held by the social majority. To this, no matter how sensitively it treats data, the archive adds a layer of description and categorisation that can influence how an individual’s story continues to live on, or play a role in history.
The power and potential effect of archival classification systems is all the more visible when the archive is digital. However, by the same token, there is more scope to include a multiplicity of voices when it comes to setting up those systems. This archival power was interrogated by Milan Van Lange and Carlijn Keijzer’s presentation on “Egodocuments, Data Ethics, and Navigating Transformations in the Archive”, which focused on correspondence during World War II gathered by their institution, the Netherlands Institute for War, Holocaust, and Genocide Studies (NIOD), between 1935 and 1950. They asked what it actually meant when a letter written by a child in wartime was classified by the digital archive under “Jews in hiding” rather than, say, “children”, or “hunger/winter”. In a nutshell, it would mean reducing the whole life of the person to one traumatic experience. Along similar lines, in their investigation of the practices of memorial sites commemorating the victims of the Krankenmorde (the Nazi murders of patients in psychiatric institutions), presenters Anne Klammt and Christoph Hanzig, both from the Hannah Arendt Institute for Totalitarianism Studies in Dresden, explained how their digitisation work depended entirely on descriptions produced by the perpetrators. The biographical descriptions of victims they were able to produce thus drew on violently biased sources. On occasion, they were able to ask family members of the deceased if they would like to contribute and help write a biography for exhibitions based on the digital archive, which would allow a wider story of the individual to emerge.
Yet, since relatives of victims are thin on the ground the further back one goes in history, sometimes the wider story remains to be discerned by someone who has a grasp of the historical context and can extrapolate from it. Gabriele Zöllner and Rebekka Reichert, both from the Coordination Centre for Scientific University Collections in Germany at the Humboldt University Berlin, in their presentation “Discriminatory?! Navigating Historical and Contemporary Sources for Provenance Research”, highlighted the possibility for an archivist composing metadata to spot even visual stereotypes, such as a particular Nazi-era drawing style used to portray a rabbi’s nose, and hence to include tags such as “antisemitism”. The onus thus remains on the archivist, and much depends on their capacity to discern bias and apply ethical standards.
However, archivists’ policies aren’t always the best solution from the point of view of those who possess the histories to be digitised. Klammt and Hanzig, on behalf of the Hannah Arendt Institute, presented the dilemma of anonymisation: including no names of victims would mean no derogatory language and would be relatively easy; putting in names but no identification would still include discriminatory words and be searchable, though this would mean significantly more work; and providing names and referencing wherever possible, including the full context and consequences of events, would mean so much work that it might stall the project altogether. Yet, in an example cited by Van Lange and Keijzer, anonymity, generally considered a safeguard against misuse, felt offensive to a 94-year-old whose childhood letters were digitised by the NIOD; she wanted full visibility, which was counterintuitive for the archivists. Disagreements over archival donations and potential legal disputes, they warned nonetheless, should not drive digitisation towards a focus on only “easy” material. How, then, can the “hard” material be approached? And what does “hard” mean to the people whose histories are to be archived?
Responsibility and Respect: Data as Gift, Weapon, or Ongoing Relationship?
One of the hardest questions to resolve is the place of the individual and of the community when it comes to the ethical reuse of data by historians, not least because the relationship between all three of those parties may be fraught. For Anne Klammt and Christoph Hanzig, it is important to illustrate injustice by recounting individual fates, and including names and biographies is very important in order to acquire funding. Since there is often sensitive data in biographies, they recommend focusing, based on their study of the “Krankenmorde”, on the general living conditions in traumatic circumstances in order to contextualise each individual’s life and fate. They also recommend contextualising any language used by the perpetrators, and focusing more on the individual and less on the person’s relatives. It is nonetheless an assumption, they admit, that those individuals would want to be remembered.
What are the rights of the individuals who are still alive? Sometimes people only become aware of the murder of one of their family members when they consult one of these databases. Klammt and Hanzig added that if lists of victims were to be published online in cross-referenceable ways, it would be a major infringement of the rights of those affected, according to the Federal Data Protection Officer, Diethelm Gerhold. In their database on the digitised NSDAP newspaper Der Freiheitskampf, they referred to authority data in the historians’ database FactGrid, but did not include the personal names of victims in order to avoid revictimising them. Excluding names from metadata, they added, might nonetheless not be entirely effective when a lot of online newspapers today use OCR (optical character recognition) software. Bad pun intended: not all characters in history desire recognition.
The ethical dilemma thus lies also in whether or not to “rescue” an individual’s story from oblivion as a moral duty. Virginia Gg Niri, an oral historian from Genoa, talked about the Roads to Oral Archives Development and Sustainability project. Her own work focuses on 127 interviews with people about life during WWII, recorded between 1993 and 2004 and digitised by the University of Padova. Often these people used pseudonyms, so there was no way of identifying them. She identified two possible strategies in this case, citing the Italian Vademecum sulle Fonte Orali published by the Ministry of Culture. The first is public notice, and thus a demonstrably diligent search for family or heirs of the interviewee; the alternative is to look for direct contacts, such as neighbours of the deceased. She presented the further dilemma of risking intrusion into someone’s life (since the heirs were perhaps not even aware of the interviewee’s actions) versus the ethical responsibility to search for consent. Ultimately, she used anthropological conceptions of the gift to conclude that interviews were voluntarily donated to the researcher, who became their guardian thereafter.
Yet, there is a definite hierarchy and paternalism inherent in the concept of guardianship. Klammt and Hanzig also reflected upon the problem of differentiating between victim and perpetrator, citing the example of the mayor of Freital, a member of the Nazi party, who was later condemned by the Nazis for homosexuality. Furthermore, the question arose as to whether future perpetrators might gain access to public digital archives about minority groups, and use information gained to cause harm. Considering these scholars’ collaboration with the Israeli database Yad Vashem, the crucial question was raised during the Q & A session as to how the memorialisations on that website “might be weaponised against Palestinian victims of Israeli genocide today”. The social roots and impact of bias are all the plainer when bureaucratic and State structures that sustain the archive and its relationships stem from, or still support, colonial power.
The questions this leaves us with are: whose sensitivities are being considered? What effect does minimising bias have upon research communities, and communities from which the data originates?

The Role and Rights of Communities in the Reduction of Bias and Marginalisation
Qualifying data as “sensitive” carries within it the connotation of potential harm that could stem from that data. It is important, in this case, to focus on the word “harmful”, used in the title of the keynote presentation by Kerstin Herlt, coordinator of the European project DE-BIAS, “Detecting and cur(at)ing harmful language in cultural heritage collections”.
Kerstin Herlt shared the typology of bias in the DE-BIAS project as being divided into three themes: migration and colonial past, gender and sexual identity, and ethnicity and ethnoreligious identity. According to her, bias can occur at the linguistic level, at the visual level, and within the relation between both those levels. The layers of archival description at which it can occur are metadata creation (such as captions, titles, and descriptions), additional metadata (controlled vocabulary, tagging, subject headings), and digital deliberation (enrichments and translations). Herlt also emphasised that it was important to pinpoint who was affected by the descriptions and representations, why, how, and by what information and data gaps. The project's goals were to create a vocabulary of contentious terms, a typology for the analysis of patterns of bias, an AI tool to detect problematic language, and to work on capacity building. This is very useful for the research community, and will no doubt lead to more thoughtful academic output by historians. However, it leaves the audience wondering about the harm already inflicted by the research community as a result of the biases that exist and need to be eliminated.
The gap between academia and non-academic communities or activists manifests here as a concrete question: how do the communities affected by this potential discrimination participate in the archival de-biasing process, to what end, and what are the mechanics of that collaboration? Herlt revealed that one can contribute to the DeBias Hub by leaving comments on their website. However, generally speaking, this was the stickiest issue during the keynote lecture, and indeed, the whole conference. On the one hand, there were “co-creation workshops” held for each bias area during the DE-BIAS project. For Migration and Colonialism, they collaborated with the Congolese community in Belgium, with whom the University of Leuven had a pre-existing link, as well as the Surinamese community, with whom the Netherlands Institute for Sound and Vision had a pre-existing link. For Gender and Sexual Identity, it was the queer community in Rome and London, via the European Fashion Heritage Association. For Ethnicity and Ethnoreligious Identity, it was the Jewish community through the Deutsches Filminstitut & Filmmuseum. However, one is left wondering about the composition and inner dynamics of these groups, as well as their relationship to the DE-BIAS project members. How about their individual identities and stories?
Herlt recognised that remunerating the community was the place where good intentions tripped themselves up. When asked about remuneration, she admitted that members of the communities concerned had to be associate members of the project, and not full members, often because of immigration constraints. Similarly, money from the project budget had to be set aside specifically to pay them; they did not get paid as project members. In the end, it was admitted that the relationship with the community was never an equal one. How to remunerate the community was something that needed to be advocated for and put into funding applications. The conclusion on the topic was that “who profits and who benefits is a critical question.”
Conclusion: My Opinion as an Attendee
There is a disconnection between the good intentions of digitisation projects, and the social conditions and needs of communities who are perceived as being at risk of harm by historical research that reproduces biased language. How do communities harmed by that language deal with it outside of the archive, and how does the curation of that language within the digital archive actually make a difference? That is a difficult question. In my opinion, the emphasis needs to be on the archive’s ongoing structural relationship, not only with the communities in question, but with the bureaucracy and funding authorities it depends upon. Any attempt to “decolonise” the archive needs to go beyond the inclusion of marginalised voices and the sanitisation of metadata.
An archive needs to make itself into a public actor that is actively anti-colonial; it needs to constantly strive to counter and protest against and provide alternatives to inherited colonially-influenced practices. Digitisation could be a starting point for institutional reform, and the emphasis could be placed upon creating long-term collaborations with local history initiatives and rights organisations who would greatly benefit from improved access to digital material. The ethical standards set by historians and archivists would be stronger still, if the archive were to work positively against discrimination, instead of simply eliminating the traces it will not cease to leave.
Alleiah Kall from the European University Institute in Florence is currently a PhD Fellow at the IEG. Her dissertation deals with the historical memory of Spanish Republicans since the 1970s, focusing on the themes of “death” and “childhood.”
Header image: Kerstin Herlt presenting the DE-BIAS tool. © IEG Mainz.
Appendix
[1] Ian Brunskill, Great Lives: A Century in Obituaries (London: Times Books, HarperCollins, 2005), 113.
The text only may be used under licence Creative Commons Attribution Share Alike 4.0 International. All other elements (illustrations, imported files) are “All rights reserved”, unless otherwise stated.
OpenEdition suggests that you cite this post as follows:
Alleiah Kall (December 15, 2025). Digitising People: The Ethical Use of Historical Data. Writing European History / Europäische Geschichte schreiben. Retrieved April 5, 2026 from https://ieg.hypotheses.org/4095