Data Standards & Workflows

eDNAqua-Plan’s Guide to Datasets and Metadata

The eDNAqua-Plan project is committed to building a harmonized digital ecosystem for DNA-based biodiversity monitoring. A critical component of this effort is ensuring that datasets, databases, and metadata standards are consistent, interoperable, and FAIR-compliant across Europe.

This document provides a comprehensive overview of the datasets, databases, and metadata standards recommended within the eDNAqua-Plan digital landscape. It is a practical guide for researchers, data managers, and policymakers who need DNA-based data to be reproducible, comparable, and ready for integration into environmental monitoring frameworks.

Key Features

This guide outlines the workflow steps of DNA-based studies, from sampling to taxonomic assignment, and specifies the applicable methodologies, including:

Metabarcoding
Metagenomics
Targeted assays
Morphology-based approaches

Standardized File Naming Conventions
In line with the FAIR principles, this document provides clear conventions for naming files, so that file names stay consistent and easy to interpret across studies.
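
As an illustration of how a naming convention can be checked automatically, the sketch below validates file names against a pattern. The pattern itself (`<project>_<sample>_<marker>_<YYYYMMDD>.<extension>`), the function name `parse_filename`, and the example file names are assumptions made for this example only; they are not the convention defined in the guide, which should be consulted for the actual rules.

```python
import re

# Hypothetical pattern, NOT taken from the guide:
# <project>_<sample>_<marker>_<YYYYMMDD>.<extension>, optionally gzip-compressed.
FILENAME_PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)_"
    r"(?P<sample>[A-Za-z0-9]+)_"
    r"(?P<marker>[A-Za-z0-9]+)_"
    r"(?P<date>\d{8})"
    r"\.(?P<extension>[a-z0-9]+(?:\.gz)?)$"
)


def parse_filename(name):
    """Return the named parts of a file name, or None if it does not match."""
    match = FILENAME_PATTERN.match(name)
    return match.groupdict() if match else None


# A name that follows the assumed pattern is split into its parts.
print(parse_filename("LakeSurvey_S001_COI_20240315.fastq.gz"))
# A free-form name is rejected.
print(parse_filename("sample 1 final.fastq"))
```

A check like this could run before data are uploaded, so that non-conforming names are caught early rather than after publication.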

Recommended Metadata Standards
The guide highlights essential metadata standards and the platforms that support reproducibility, including:

MIxS (Minimum Information about any (x) Sequence)
Darwin Core (DwC) for biodiversity data
ENVO (Environment Ontology) for environmental context
Reproducibility Management Systems (e.g., NCBI BioProject, Zenodo, GitHub)
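
To make the role of these standards concrete, the following sketch checks a sample record for a handful of Darwin Core terms (`eventDate`, `decimalLatitude`, `decimalLongitude`) and MIxS environmental-context terms (`env_broad_scale`, `env_local_scale`, `env_medium`). Which terms count as required is an assumption made for this example, as are the function name `missing_terms` and the record values; the guide's own list of mandatory fields may differ.

```python
# Illustrative subset of terms; the guide's actual required fields may differ.
REQUIRED_DWC_TERMS = {"eventDate", "decimalLatitude", "decimalLongitude"}
REQUIRED_MIXS_TERMS = {"env_broad_scale", "env_local_scale", "env_medium"}


def missing_terms(record):
    """Return, sorted, the required terms that are absent or empty in a record."""
    required = REQUIRED_DWC_TERMS | REQUIRED_MIXS_TERMS
    missing = []
    for term in required:
        value = record.get(term)
        if value is None or str(value).strip() == "":
            missing.append(term)
    return sorted(missing)


# Hypothetical sample record; ENVO values are placeholders, not real term IDs.
sample = {
    "eventDate": "",
    "decimalLatitude": 47.5,
    "decimalLongitude": 9.4,
    "env_broad_scale": "<ENVO term for the biome>",
    "env_local_scale": "<ENVO term for the local feature>",
}

print(missing_terms(sample))
```

Here the record lacks `env_medium` and has an empty `eventDate`, so both are reported.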

File Types and Extensions
The document specifies the file formats and extensions to use for each kind of data file, such as:

CSV (for tabular data)
FASTQ (for raw sequencing data)
FASTA (for sequence data)
Code and scripts (for bioinformatics analyses)
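
The mapping between extensions and data types listed above can be sketched as a simple lookup. The specific extensions (`.fq`, `.fa`, `.py`, `.r`, `.sh`), the handling of gzip-compressed files, and the function name `classify` are assumptions based on common practice, not rules quoted from the guide.

```python
from pathlib import PurePath

# Extension lists are common conventions assumed for illustration only.
EXTENSION_TO_TYPE = {
    ".csv": "tabular data",
    ".fastq": "raw sequencing data",
    ".fq": "raw sequencing data",
    ".fasta": "sequence data",
    ".fa": "sequence data",
    ".py": "code and scripts",
    ".r": "code and scripts",
    ".sh": "code and scripts",
}


def classify(name):
    """Map a file name to a data type, ignoring a trailing .gz suffix."""
    suffixes = [s.lower() for s in PurePath(name).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]  # look at the extension under the compression
    if not suffixes:
        return "unknown"
    return EXTENSION_TO_TYPE.get(suffixes[-1], "unknown")


for example in ["reads.fastq.gz", "otu_table.csv", "refs.fasta", "notes.txt"]:
    print(example, "->", classify(example))
```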

Guidance on Data Types
The guide covers the data types handled in DNA-based data management, from project and sample metadata to raw and processed sequencing data, including:

Reference libraries
Bioinformatics code
Taxonomic assignment workflows

Download the Guide

By adhering to these standards and workflows, researchers can ensure that their DNA-based data is:

Consistent across studies and regions
Reproducible for future analyses
FAIR-compliant (Findable, Accessible, Interoperable, Reusable)

This document is an essential resource for anyone involved in DNA-based biodiversity monitoring, from field sampling to data publishing.