Data Standards & Workflows

eDNAqua-Plan’s Guide to Datasets and Metadata

The eDNAqua-Plan project is committed to building a harmonized digital ecosystem for DNA-based biodiversity monitoring. A critical component of this effort is ensuring that datasets, databases, and metadata standards are consistent, interoperable, and FAIR-compliant across Europe.

This document provides a comprehensive overview of the datasets, databases, and metadata standards recommended within the eDNAqua-Plan digital landscape. It serves as a practical guide for researchers, data managers, and policymakers, ensuring that DNA-based data is reproducible, comparable, and ready for integration into environmental monitoring frameworks.

Key Features

This guide outlines the workflow steps for DNA-based studies, from sampling through to taxonomic assignment, and specifies the methodologies they apply to, including:

  • Metabarcoding
  • Metagenomics
  • Targeted assays
  • Morphology-based approaches

Standardized File Naming Conventions
In line with the FAIR principles, the guide defines conventions for naming files so that they are consistent, easy to identify, and easy to reuse across studies.
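The guide's actual naming convention is not reproduced here, so the sketch below assumes a hypothetical pattern (project, site, ISO 8601 sampling date, sample ID and marker, joined by underscores) purely to show how a convention can be enforced in code rather than by hand. The field order, the allowed characters and the function name `build_filename` are all illustrative assumptions.

```python
import re
from datetime import date

# Hypothetical rule: each field uses only letters, digits or '-';
# '_' is reserved as the field separator and spaces are never allowed.
FIELD_RE = re.compile(r"^[A-Za-z0-9-]+$")


def build_filename(project: str, site: str, sampled_on: date,
                   sample_id: str, marker: str, ext: str) -> str:
    """Assemble a file name under an assumed <project>_<site>_<YYYYMMDD>_<sample>_<marker> scheme."""
    # The date is written as YYYYMMDD (ISO 8601 basic format) so names sort chronologically.
    fields = [project, site, sampled_on.strftime("%Y%m%d"), sample_id, marker]
    for value in fields:
        if not FIELD_RE.match(value):
            raise ValueError(f"invalid field {value!r}: use letters, digits or '-' only")
    # Accept the extension with or without a leading dot.
    return "_".join(fields) + "." + ext.lstrip(".")
```

For example, `build_filename("PROJ", "SITE01", date(2023, 6, 14), "S001", "COI", "fastq.gz")` produces `PROJ_SITE01_20230614_S001_COI.fastq.gz`. Validating at creation time means a badly formed name fails immediately instead of surfacing later during data integration.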

Recommended Metadata Standards
The guide highlights essential metadata standards, including:

  • MIxS (Minimum Information about any (x) Sequence)
  • Darwin Core (DwC) for biodiversity data
  • ENVO (Environment Ontology) for environmental context
  • Repositories and reproducibility management systems (e.g., NCBI BioProject, Zenodo, GitHub)
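To illustrate how these standards fit together, the sketch below checks a sample record for a small subset of MIxS core fields, requires the MIxS environmental fields to carry ENVO terms in the "label [ENVO:xxxxxxxx]" form, and maps a few values onto Darwin Core terms. The field subset, the function name `mixs_to_dwc` and the example values are assumptions for illustration; the full MIxS checklists and Darwin Core mappings are larger, and ENVO term IDs should be checked against the ontology before use.

```python
import re

# A small subset of MIxS core fields; real MIxS checklists require more.
REQUIRED_MIXS = ["collection_date", "lat_lon", "geo_loc_name",
                 "env_broad_scale", "env_local_scale", "env_medium"]
# Environmental context is expected as "term label [ENVO:8-digit ID]".
ENVO_TERM = re.compile(r"^.+ \[ENVO:\d{8}\]$")


def mixs_to_dwc(sample: dict) -> dict:
    """Validate a MIxS-style record and map a few fields to Darwin Core terms (sketch)."""
    missing = [f for f in REQUIRED_MIXS if not sample.get(f)]
    if missing:
        raise ValueError(f"missing MIxS fields: {missing}")
    for field in ("env_broad_scale", "env_local_scale", "env_medium"):
        if not ENVO_TERM.match(sample[field]):
            raise ValueError(f"{field} should look like 'label [ENVO:xxxxxxxx]'")
    # MIxS lat_lon holds "latitude longitude" in decimal degrees, space-separated.
    lat, lon = (float(x) for x in sample["lat_lon"].split())
    return {
        "basisOfRecord": "MaterialSample",
        "eventDate": sample["collection_date"],
        "decimalLatitude": lat,
        "decimalLongitude": lon,
    }
```

The MIxS record remains the primary sample description here, and the Darwin Core view is derived from it, so the two cannot drift apart.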

File Types and Extensions
The document specifies the file types and extensions to use for different kinds of data, such as:

  • CSV (for tabular data)
  • FASTQ (for raw sequencing data)
  • FASTA (for sequence data)
  • Code and scripts (for bioinformatics analyses)
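The difference between FASTQ (raw reads with per-base quality scores) and FASTA (sequences only) can be shown with a minimal reader and converter. The sketch below assumes uncompressed FASTQ with exactly four lines per record (header starting with "@", sequence, "+" separator, quality string). The names `read_fastq` and `to_fasta` are illustrative; production pipelines would normally use an established parser.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return  # clean end of input
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Keep only the read ID (text before the first whitespace).
        yield (header[1:].split() or [""])[0], seq, qual


def to_fasta(records: Iterable[Tuple[str, str, str]]) -> str:
    """Drop quality strings and emit FASTA text (one sequence line per record)."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq, _ in records)
```

Because `read_fastq` accepts any iterable of lines, it works on an open file handle as well as on a list of strings.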

Guidance on Data Types
From project and sample metadata to raw and processed sequencing data, the guide covers all aspects of DNA-based data management, including:

  • Reference libraries
  • Bioinformatics code
  • Taxonomic assignment workflows
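As an example of the last step, one common strategy for taxonomic assignment against a reference library is a lowest common ancestor (LCA) consensus. The query is assigned to the deepest lineage shared by all reference hits above an identity threshold. The sketch below is not presented as the workflow eDNAqua-Plan recommends. The 97% cut-off, the rank order of the lineage tuples and the function name `lca_assign` are assumptions.

```python
from typing import List, Sequence, Tuple

# A lineage is an ordered tuple of ranks, e.g. (phylum, class, order, family, genus, species).
Lineage = Tuple[str, ...]


def lca_assign(hits: Sequence[Tuple[float, Lineage]], min_identity: float = 97.0) -> Lineage:
    """Return the deepest lineage shared by all hits with identity >= min_identity (percent)."""
    kept = [lineage for identity, lineage in hits if identity >= min_identity]
    if not kept:
        return ()  # no hit passes the threshold: leave the query unassigned
    shared: List[str] = []
    # Walk the ranks top-down; stop at the first rank where the kept hits disagree.
    for ranks in zip(*kept):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return tuple(shared)
```

With hits to *Perca fluviatilis* at 99.5% and *Sander lucioperca* at 98.1%, the query is assigned only to the family Percidae. Raising `min_identity` to 99.0 keeps the single *Perca* hit and yields the full species-level lineage. The threshold therefore trades assignment depth against the risk of over-confident species calls.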

By adhering to these standards and workflows, researchers can ensure that their DNA-based data is:

  • Consistent across studies and regions
  • Reproducible for future analyses
  • FAIR-compliant (Findable, Accessible, Interoperable, Reusable)

This document is an essential resource for anyone involved in DNA-based biodiversity monitoring, from field sampling to data publishing.