AI accelerates access to insect collections

Researchers at the Museum für Naturkunde Berlin, together with data scientists, have developed a new method to largely automate the extraction of label information from digitized insect specimens. The pipeline, named ELIE, uses artificial intelligence to reliably detect and process printed labels. This significantly reduces the time-consuming manual transcription work and represents an important advance for the digitization of natural history collections worldwide. The paper is published in the journal Methods in Ecology and Evolution.

With more than 1 million described species, insects represent the most diverse group of living organisms on Earth. Natural history collections worldwide house about 500 million insect specimens collected over the past three centuries. Each specimen carries labels containing essential information such as collection locality, date, and collector. These data form a crucial foundation for research in taxonomy, evolutionary biology, and ecology.

Despite the availability of high-throughput digitization workflows for collection objects, the transcription of label information is still largely performed manually. Researchers at the Museum für Naturkunde Berlin, working closely with experts in digitization and data science, have now developed a new pipeline that substantially simplifies and accelerates this process.

The pipeline, ELIE ("Entomological Label Information Extraction"), automates several steps of label processing. Using image analysis and machine learning techniques, ELIE detects individual labels in digital images, aligns them, and classifies them as either printed or handwritten. Printed labels are automatically processed using optical character recognition, while handwritten information is separated for targeted manual transcription. In addition, the system groups identical or highly similar labels, ensuring that recurring information only needs to be reviewed once.
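The grouping step can be illustrated with a small sketch. This is not ELIE's actual implementation, which the article does not detail. It assumes that label text is already available as plain strings and uses Python's standard `difflib` as a stand-in similarity measure. The function names, the 0.9 threshold, and the sample label texts are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences vanish."""
    return " ".join(text.lower().split())


def group_similar(labels: list[str], threshold: float = 0.9) -> list[list[str]]:
    """Greedily place each label in the first group whose representative
    (the group's first member) is at least `threshold` similar to it.
    The threshold value is an assumption chosen for illustration."""
    groups: list[list[str]] = []
    for label in labels:
        key = normalize(label)
        for group in groups:
            if SequenceMatcher(None, normalize(group[0]), key).ratio() >= threshold:
                group.append(label)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([label])
    return groups


# Made-up label transcriptions, including a spacing variant and an OCR slip.
labels = [
    "Berlin, Grunewald 12.V.1931 leg. Schmidt",
    "Berlin,  Grunewald 12.V.1931 leg. Schmidt",
    "Berlin, Grunewald 12.V.1931 leg. Schmldt",
    "Kamerun, Buea 3.XI.1902 leg. Preuss",
]
groups = group_similar(labels)
for group in groups:
    print(len(group), "label(s), review once:", group[0])
```

In this sketch a reviewer would check one representative per group rather than every label, which is the kind of saving the article attributes to label redundancy.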

"With ELIE, we address one of the major bottlenecks in the digitization of entomological collections," says Margot Belot, Data Manager at the Museum für Naturkunde Berlin. "Automating the transcription of printed labels significantly relieves researchers and curators and allows us to make our collections available for scientific use more quickly and systematically."

The pipeline was tested on several datasets, including 26,000 label images drawn from the 650,000 insect specimens digitized at the Museum für Naturkunde Berlin (MfN) between 2022 and 2023 with a high-speed, conveyor-based imaging system developed by the company Picturae. The results show that, depending on the degree of label redundancy, information from up to nearly 90% of printed labels can be extracted automatically. Further tests with datasets from the Smithsonian National Museum of Natural History in Washington, D.C., and the Museum of Comparative Zoology at Harvard University demonstrate that ELIE can be reliably applied to previously unseen collections.

The researchers see ELIE as an important building block for the future digitization of natural history collections and as a contribution to making these unique archives of biodiversity more accessible for research.

Publication details

Margot Belot et al., High‐throughput information extraction of printed specimen labels from large‐scale digitization of entomological collections using a semi‐automated pipeline, Methods in Ecology and Evolution (2026). DOI: 10.1111/2041-210x.70235

Journal information: Methods in Ecology and Evolution

Who's behind this story?
Lisa Lock
BA art history, MA material culture. Former museum editor, paramedic, and transplant coordinator. Editing for Science X since 2021.

Robert Egan
Bachelor's in mathematical biology, Master's in creative writing. Well-traveled with unique perspectives on science and language.

Citation: AI accelerates access to insect collections (2026, February 5) retrieved 7 June 2026 from https://phys.org/news/2026-02-ai-access-insect.html
