September 2023

It’s that time again: the events of the series “Digital Humanities – Wie geht das?” for the third quarter of 2023 are coming up.

Workshop: OCR4all – Open-source Text Recognition of (pre-)Modern Prints and Manuscripts

November 13, 2023, at SUB Hamburg.

What data and file types are required for OCR? How does the use of the OCR or HTR workflow integrated in OCR4all change depending on the source material, and how much (manual) effort should be expected? How much can the workflow be automated depending on the material at hand? What are OCR models, and how can you train your own text recognition models? What recognition accuracy can be expected? How much effort actually makes sense with regard to the later use of the texts produced?

These and other questions will be addressed during the workshop as part of the event “OCR4all – Open-source Text Recognition of (pre-)Modern Prints and Manuscripts”, so that by the end of the day all participants will be able to work on complex OCR projects independently.

OCR (Optical Character Recognition) and HTR (Handwritten Text Recognition) remain a challenge in the humanities and cultural sciences. OCR4all offers all users a freely available and easy-to-use option for carrying out their own OCR workflows. Florian Langhanki (JMU) will introduce the basic concepts of OCR and present the OCR4all software.

You can either work directly with your own texts or use prepared materials. No prior technical knowledge is required to participate. All you need to bring is an internet-enabled laptop, texts relevant to your research (optional) and a great deal of curiosity about OCR.

The number of participants is limited to 15, so please register at [email protected]

OCR4all – Open-source Text Recognition from Mass Processing of Prints to High-quality Transcription of Handwriting

November 8, 2023, online via Zoom. Registration is not required.

A central aspect of the work of researchers in the humanities, cultural and human sciences is the examination of historical sources in the form of printed and handwritten textual evidence. These are often only available as scans, which severely limits their usability, as automatic indexing approaches such as full-text search or quantitative analysis methods cannot be applied to images alone. To make this possible, machine-processable full text must first be extracted from the digital copies, with methods for the automatic text recognition of prints (Optical Character Recognition, OCR) or handwriting (Handwritten Text Recognition, HTR) playing an increasingly important role. Very old prints and manuscripts in particular often pose a major challenge for a variety of reasons.

The freely available open-source tool OCR4all, developed at the Center for Philology and Digitality (ZPD) at the University of Würzburg, aims to give even less technically experienced users the opportunity to transcribe demanding prints and manuscripts independently and in the highest quality. OCR4all encapsulates the entire text recognition workflow and all the tools required for it in a single application that can be easily installed and operated via a convenient graphical user interface.

During the lecture, Christian Reul will explain the basics of automatic text recognition and present OCR4all and how it works in a live demo. In addition, he will demonstrate its applicability and performance on different materials and give an overview of current work as well as an outlook on future developments.


Event information: “Digital Humanities – How does it work?” in the third quarter of 2023